EE3J2 Data Mining - 2010

 

Practice questions April 2004

Solutions to practice questions

See also the 2004 – 2006 exams!

 

Recommended book  

 Richard K Belew

 “Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW” Cambridge University Press, 2001

FOA website

 

OHP Slides  

Lecture 1 – Introduction (slides) (handouts) (PPT)

Lecture 2 – Texts & Zipf (slides) (handouts) (PPT)

Lecture 3 – Zipf, stemming, stop words (slides) (handouts) (PPT)

Lecture 4 – Matching (slides) (handouts) (PPT)

Lecture 5 – Index (slides) (handouts) (PPT)

Lecture 6 – LSI (slides) (handouts) (PPT)

Lecture 7 – Query expansion (slides) (handouts) (PPT)

Lecture 8 – Topic spotting (slides) (handouts) (PPT)

Lecture 9 – PageRank (slides) (handouts) (PPT)

Lecture 10 – Statistical models (slides) (handouts) (PPT)

Lecture 11 – PCA (slides) (handouts) (PPT)

Lecture 12 – Clustering (slides) (handouts) (PPT)

Lecture 13 – K-means (slides) (handouts) (PPT)

Lecture 14 – Sequence analysis – Dynamic programming (slides) (handouts) (PPT)

Lecture 15 – HMMs (slides) (handouts) (PPT)

   

2007 Exam and Solutions

May 2007 exam questions (PDF)

Solution 1 (PDF, XLS)

Solution 2 (PDF, XLS)

Solution 3 (PDF, XLS)

Solution 4 (PDF, XLS)

 

Software Tools

Text analysis tool from lecture 2: zipf.c

Stop word removal tool from lecture 3: stop.c

Index generator from lecture 4: index.c

Retrieval tool from lecture 5: retrieve.c

Clustering tool from lecture 10: agglom.c

K-means tool from lecture 11: k-means.c

Vector representation of docs tool: doc2vec.c

Data file from clustering examples: sa1.txt

Edit distance tool from lecture 13: edit-dist.c

Zipf tool for first lab: zipf2.c

K-means example (Excel spreadsheet)

 

Laboratories

Lab sheet 1 (week 5)

Lab sheet 2 (week 9)

Files for lab 2:

Lab2Data

k-means-2010.c

agglom-2010.c

 

Resources

Online Library of Literature

Porter Stemmer in C, Java, Perl, C#

Scott Weiss (JHU) (Porter stemmer)

Small Stop List: stopList(50)

Large Stop List: stopList(Brown)

WordNet

 

 

Last changed 28/04/2010 MJR