This is the schedule of topics for Computational Linguistics II, Spring 2011.
Unless otherwise specified, readings are from Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in.
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the class mailing list or email me for "official" dates.
Class  Topic  Readings  Assignments  Other

Jan 26  Course administrivia, semester plan; some statistical NLP fundamentals 
Ch 1, 2.1.[1-9] (for review). Historical overview; Zipf's law; probability spaces; finite-state and Markov models; Bayes' Rule; Bayesian updating; conjugate priors 
Assignment 1  Language Log (the linguistics blog), Hal Daumé's NLP blog (excellent blog, often technical machine learning stuff, but just as often more general interest) 
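To make the "Bayesian updating; conjugate priors" topic concrete, here is a small illustrative sketch (not part of the assignment; the numbers are made up) using the Beta-Bernoulli conjugate pair:

```python
from fractions import Fraction

def beta_update(alpha, beta, observations):
    """Update a Beta(alpha, beta) prior on a coin's heads probability.

    Because the Beta is conjugate to the Bernoulli likelihood, the
    posterior is just another Beta whose parameters add the observed
    counts of heads (1) and tails (0).
    """
    heads = sum(observations)
    tails = len(observations) - heads
    return alpha + heads, beta + tails

# Start from a uniform prior Beta(1, 1) and observe 3 heads, 1 tail.
a, b = beta_update(1, 1, [1, 1, 0, 1])
posterior_mean = Fraction(a, a + b)   # E[p | data] = a / (a + b)
print(a, b, posterior_mean)           # 4 2 2/3
```

The point of conjugacy is exactly this: updating reduces to adding counts, so the posterior after each coin flip can itself serve as the prior for the next one.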
Feb 2  Words and lexical association 
Ch 5. Collocations; hypothesis testing; mutual information 
Assignment 2  Dunning (1993); Goodman S (1999), "Toward evidence-based medical statistics. 1: The P value fallacy," Ann Intern Med 130 (12): 995–1004, PMID 10383371; Goodman S (1999), "Toward evidence-based medical statistics. 2: The Bayes factor," Ann Intern Med 130 (12): 1005–13, PMID 10383350; Kilgarriff (2005); Gries (2005); Bland and Altman (1995) 
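As a quick preview of the lexical association measures we'll cover, here is a toy sketch of pointwise mutual information for bigrams (the corpus is invented, and the token count is used as an approximate normalizer for both unigrams and bigrams):

```python
import math
from collections import Counter

def pmi(bigram, unigrams, bigrams, n):
    """Pointwise mutual information: log2 [ P(w1,w2) / (P(w1) P(w2)) ].

    High PMI suggests the pair co-occurs more often than chance --
    a classic (if noisy) collocation score.
    """
    w1, w2 = bigram
    p_pair = bigrams[bigram] / n          # n = token count (a common
    p1, p2 = unigrams[w1] / n, unigrams[w2] / n   # simplification)
    return math.log2(p_pair / (p1 * p2))

tokens = "new york is a new city in new york state".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n = len(tokens)

print(pmi(("new", "york"), unigrams, bigrams, n))
```

Note PMI's well-known bias toward rare events, one reason Ch 5 also covers hypothesis-testing alternatives such as the likelihood-ratio test of Dunning (1993).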
Feb 9  Information theory 
Ch 2.2, Ch 6 Information theory essentials; entropy, relative entropy, mutual information; noisy channel model; cross entropy and perplexity 
Assignment 3  
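The entropy/perplexity relationship from Ch 2.2 can be sketched in a few lines (toy distributions, purely for illustration):

```python
import math

def entropy(dist):
    """Shannon entropy in bits: H(p) = -sum_x p(x) log2 p(x)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x); always >= H(p), with equality
    iff q matches p -- the basis for perplexity-based model comparison."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

# A uniform distribution over 4 outcomes has entropy 2 bits,
# i.e. perplexity 2**2 = 4 "effective choices".
p = {w: 0.25 for w in "abcd"}
q = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}

print(entropy(p))                 # 2.0
print(2 ** cross_entropy(p, q))   # perplexity of model q on source p
```

Perplexity is just 2 raised to the cross entropy, which is why lower perplexity and lower cross entropy are the same goal when evaluating language models.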
Feb 16  Maximum likelihood estimation and Expectation Maximization 
Skim Ch 9-10 and Chapter 6 of Lin and Dyer (forthcoming); read my EM recipe discussion. Maximum likelihood estimation overview; quick review of smoothing; EM overview; HMM review; deriving the forward-backward algorithm as an instance of EM; Viterbi algorithm review. 
Assignment 4 
An empirical study of smoothing techniques for language modeling (Stanley Chen and Joshua Goodman, Technical Report TR-10-98, Harvard University, August 1998); revised Chapter 4 from the updated Jurafsky and Martin textbook. 
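For the HMM review, here is a minimal Viterbi decoder (the toy POS-style HMM and its probabilities are made up for illustration):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi decoding: the most probable hidden state sequence.

    V[t][s] holds the probability of the best path ending in state s
    at time t; back[t][s] remembers which predecessor achieved it.
    """
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][r] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states
            )
            V[t][s], back[t][s] = prob, prev
    # Follow backpointers from the best final state.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ["N", "V"]
start = {"N": 0.8, "V": 0.2}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"dogs": 0.6, "bark": 0.4}, "V": {"dogs": 0.1, "bark": 0.9}}

print(viterbi(["dogs", "bark"], states, start, trans, emit))  # ['N', 'V']
```

The forward-backward algorithm fills the same trellis but replaces the max with a sum, which is exactly the E-step quantity EM needs.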
Feb 23  Probabilistic grammars and parsing 
Ch 11-12, Abney (1996) [alternative link], my EM recipe discussion, and the EM recipe used to derive the inside-outside algorithm. Parsing as inference; the distinction between logic and control; memoization and dynamic programming; brief review of CKY, probabilistic CKY (inside probabilities), and Viterbi CKY; revisiting EM: the inside-outside algorithm. CFG extensions (parent and grandparent node annotation, lexicalization); syntactic dependency trees. 
Extra credit assignment, worth 50% of a homework assignment  Jason Eisner's great parsing song; Pereira (2000); Detlef Prescher, A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars; McClosky, Charniak, and Johnson (2006), Effective Self-Training for Parsing 
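As a warm-up for the CKY review, here is a bare-bones CKY recognizer for a grammar in Chomsky Normal Form (the tiny grammar is invented; the probabilistic version fills the same chart with inside probabilities instead of sets):

```python
def cky_recognize(words, lexicon, rules):
    """CKY recognition for a CNF grammar.

    chart[i][j] is the set of nonterminals spanning words[i:j]; each
    span is built bottom-up by trying every split point k.
    """
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = {a for a, word in lexicon if word == w}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, (b, c) in rules:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(a)
    return "S" in chart[0][n]

# Tiny CNF grammar (illustrative only).
lexicon = [("NP", "she"), ("V", "eats"), ("NP", "fish")]
rules = [("S", ("NP", "VP")), ("VP", ("V", "NP"))]

print(cky_recognize("she eats fish".split(), lexicon, rules))  # True
```

This is "parsing as inference" in miniature: the chart memoizes partial results so the dynamic program runs in O(n^3) rather than enumerating trees.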
Mar 2  Advanced topic: parsing and psychological plausibility. Guest presenter/facilitator: Kristy Hollingshead 
Stolcke (1995), An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities (through Section 4.4), for the Earley algorithm; Resnik (1992), Left-Corner Parsing and Psychological Plausibility. Left-corner parsing; Earley's algorithm; using parsing as a diagnostic tool for Alzheimer's (and, to a lesser extent, autism) 
Take-home midterm handed out  Roark et al. (2007), Syntactic complexity measures for detecting Mild Cognitive Impairment 
Mar 9  Supervised classification 
Ch 16 (except 16.2). Supervised learning: k-nearest neighbor classification; naive Bayes; decision lists; decision trees; transformation-based learning (Sec 10.4); linear classifiers; the kernel trick; perceptrons; SVM basics. 
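Here is a compact sketch of the simplest of these classifiers, multinomial naive Bayes with add-one smoothing (the three-document "corpus" is invented):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Collect the counts a multinomial naive Bayes model needs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify_nb(words, class_counts, word_counts, vocab):
    """argmax_c log P(c) + sum_w log P(w | c), Laplace-smoothed."""
    total_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        total_words = sum(word_counts[c].values())
        score = math.log(n_c / total_docs)
        for w in words:
            score += math.log(
                (word_counts[c][w] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best, best_score = c, score
    return best

docs = [
    ("great fun great".split(), "pos"),
    ("fun film".split(), "pos"),
    ("boring dull film".split(), "neg"),
]
model = train_nb(docs)
print(classify_nb("great film".split(), *model))  # 'pos'
```

Working in log space avoids underflow, and the add-one smoothing is the same remedy for zero counts discussed in the language-modeling material.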

Mar 16  Beyond supervised learning 
Class imbalance; model and search errors; rescoring; oracle evaluations; self-training, active learning, co-training. Using text to predict the real world.  Och et al.'s "Smorgasbord" paper; Noah Smith and Philip Resnik, Using Text to Predict the Real World.  
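To illustrate the shape of a self-training loop, here is a deliberately toy sketch: the "model" is a hypothetical 1-D nearest-centroid classifier, and "confidence" is the relative distance to the two centroids (both choices are for illustration only):

```python
def self_train(labeled, unlabeled, threshold=0.75, rounds=5):
    """Self-training sketch: repeatedly label the unlabeled pool with
    the current model and absorb only the confident predictions."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        # Fit: one centroid per class from the current labeled set.
        cents = {}
        for c in {y for _, y in labeled}:
            pts = [x for x, y in labeled if y == c]
            cents[c] = sum(pts) / len(pts)
        newly, rest = [], []
        for x in pool:
            dists = {c: abs(x - m) for c, m in cents.items()}
            c_best = min(dists, key=dists.get)
            total = sum(dists.values())
            conf = 1 - dists[c_best] / total if total else 1.0
            (newly if conf >= threshold else rest).append((x, c_best))
        if not newly:   # nothing confident left: stop early
            break
        labeled += newly
        pool = [x for x, _ in rest]
    return labeled

seed = [(0.0, "a"), (10.0, "b")]
result = self_train(seed, [1.0, 2.0, 8.5, 9.0])
print(sorted(result))
```

The confidence threshold is what keeps self-training from drifting: errors absorbed early get amplified on later rounds, which is one reason the lecture pairs this topic with model/search errors and oracle evaluations.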
Mar 23  Spring Break 
Have fun!  
Mar 30  Evaluation in NLP 
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox, and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook. Evaluation paradigms for NLP; parser evaluation 
Team project handed out  
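Parser evaluation will center on PARSEVAL-style bracket scoring, which reduces to multiset precision/recall over labeled spans; a minimal sketch (the gold and predicted trees are invented):

```python
from collections import Counter

def bracket_prf(gold, predicted):
    """PARSEVAL-style scoring over labeled constituent spans.

    Each bracket is a (label, start, end) triple; precision and recall
    count matching brackets (with multiplicity), F1 is their harmonic
    mean.
    """
    g, p = Counter(gold), Counter(predicted)
    matched = sum((g & p).values())   # multiset intersection
    precision = matched / sum(p.values())
    recall = matched / sum(g.values())
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
pred = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("PP", 2, 3)]
print(bracket_prf(gold, pred))  # (0.75, 0.75, 0.75)
```

Keeping precision and recall separate matters because a parser can trade one for the other (e.g., by emitting fewer, safer brackets), which is part of what makes evaluation paradigms a topic in their own right.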
Apr 6  More on supervised learning: maximum entropy models and conditional random fields [Guest lecturer TBA] 
Using Maximum Entropy for Text Classification (Kamal Nigam, John Lafferty, Andrew McCallum); Shallow Parsing with Conditional Random Fields (Fei Sha and Fernando Pereira). The maximum entropy principle; maxent classifiers (for predicting a single variable); CRFs (for predicting interacting variables); L2 regularization. 
Optionally, some good introductory material appears in Adam Berger's maxent tutorial, Dan Klein and Chris Manning's Maxent Models, Conditional Estimation, and Optimization, without the Magic, and Noah Smith's notes on log-linear models (which provide explicit details for a lot of the math). Another useful reading, focused on estimating the parameters of maxent models, is A Comparison of Algorithms for Maximum Entropy Parameter Estimation (Rob Malouf). Also, Manning and Schuetze section 16.2 can be read as supplementary material. Of historical interest: Adwait Ratnaparkhi's A Simple Introduction to Maximum Entropy Models for Natural Language Processing (1997).  
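For the single-variable case, a binary maxent classifier is just logistic regression trained by gradient ascent; this sketch (toy features and data, slow batch updates rather than the L-BFGS-style methods Malouf compares) shows the "observed minus expected counts" gradient and the L2 penalty:

```python
import math

def train_maxent(data, n_feats, l2=0.1, lr=0.5, iters=200):
    """Binary maxent (logistic regression) by batch gradient ascent.

    Maximizes sum_i log P(y_i | x_i) - (l2/2) * ||w||^2, i.e. maximum
    likelihood with an L2 (Gaussian prior) regularizer; the gradient is
    observed minus expected feature counts, minus l2 * w.
    """
    w = [0.0] * n_feats
    for _ in range(iters):
        grad = [-l2 * wj for wj in w]
        for x, y in data:
            p = 1 / (1 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))
            for j, xj in enumerate(x):
                grad[j] += (y - p) * xj   # observed - expected
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    return 1 / (1 + math.exp(-sum(wj * xj for wj, xj in zip(w, x)))) > 0.5

# Features: (bias, contains-"good", contains-"bad"); label 1 = positive.
data = [((1, 1, 0), 1), ((1, 1, 0), 1), ((1, 0, 1), 0), ((1, 0, 1), 0)]
w = train_maxent(data, 3)
print(predict(w, (1, 1, 0)), predict(w, (1, 0, 1)))
```

The L2 term is what keeps weights finite on separable data like this; without it, maximum likelihood would push them toward infinity. CRFs use the same objective and gradient, but the expectations require dynamic programming over label sequences.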
Apr 13  Unsupervised methods and topic modeling [topic tentative] 
Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated (new version to be linked shortly); M. Steyvers and T. Griffiths (2007), Latent Semantic Analysis: A Road to Meaning. [tentative] Graphical model representations of generative models; MLE, MAP, and Bayesian inference; Markov Chain Monte Carlo (MCMC) and Gibbs sampling; Latent Dirichlet Allocation (LDA) 
Blei, Ng, and Jordan (2003), Latent Dirichlet Allocation  
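A bare-bones sketch of collapsed Gibbs sampling for LDA, in the spirit of the Steyvers and Griffiths chapter (the four-document corpus and hyperparameter values are made up; real implementations add burn-in, multiple chains, and hyperparameter choices):

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Each token's topic is resampled from P(z | everything else), which
    is proportional to (doc-topic count + alpha) * (topic-word count +
    beta) / (topic total + V * beta), with the token's own current
    assignment removed from the counts first.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    ndk = defaultdict(int)   # (doc, topic) counts
    nkw = defaultdict(int)   # (topic, word) counts
    nk = defaultdict(int)    # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                weights = [
                    (ndk[d, t] + alpha) * (nkw[t, w] + beta)
                    / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights)[0]
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return z, nkw

docs = [["cat", "dog", "cat"], ["stock", "market", "stock"],
        ["dog", "cat"], ["market", "stock"]]
z, nkw = lda_gibbs(docs, n_topics=2)
print(z)
```

Note that the conditional never needs the integrated-out multinomial parameters: the Dirichlet priors collapse into the count-plus-hyperparameter terms, which is what makes the sampler so short.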
Apr 20  Word sense disambiguation 
Ch 8.5, 15.{1,2,4}. Semantic similarity; relatedness; synonymy; polysemy; homonymy; entailment; ontology-based similarity measures; vector representations and similarity measures; sketch of LSA. Characterizing the WSD problem; WSD as a supervised classification problem; the Lesk algorithm; semi-supervised learning and Yarowsky's algorithm; WSD in applications; WSD evaluation. 
Optional: Adam Kilgarriff (1997), I don't believe in word senses, Computers and the Humanities 31(2), pp. 91-113; Philip Resnik (2006), WSD in NLP Applications (Google Books)  
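The simplified Lesk algorithm fits in a few lines: choose the sense whose dictionary gloss overlaps most with the target word's context (the two-sense inventory and glosses below are made up for illustration):

```python
def simplified_lesk(word, context, sense_glosses):
    """Simplified Lesk: pick the sense whose gloss shares the most
    words with the sentence context of the target word."""
    context_words = set(context)
    def overlap(gloss):
        return len(set(gloss.split()) & context_words)
    return max(sense_glosses, key=lambda s: overlap(sense_glosses[s]))

# Hypothetical two-sense inventory for "bank" (glosses invented).
glosses = {
    "bank#finance": "an institution that accepts deposits and lends money",
    "bank#river": "sloping land beside a body of water such as a river",
}
context = "he sat on the bank of the river watching the water".split()
print(simplified_lesk("bank", context, glosses))  # 'bank#river'
```

Real implementations add stopword removal and weighting; even so, Lesk's knowledge-lean overlap idea is a standard baseline against which supervised and Yarowsky-style semi-supervised WSD systems are compared.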
Apr 27  Machine translation 
Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008. Historical view of MT approaches; the noisy channel for SMT; IBM Models 1 and 4; the HMM distortion model; going beyond word-level models 
Also potentially useful or of interest: Kevin Knight, A Statistical MT Tutorial Workbook; Mihalcea and Pedersen (2003); Philip Resnik, Exploiting Hidden Meanings: Using Bilingual Text for Monolingual Annotation, in Alexander Gelbukh (ed.), Lecture Notes in Computer Science 2945: Computational Linguistics and Intelligent Text Processing, Springer, 2004, pp. 283-299. 
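IBM Model 1 training is short enough to sketch in full; this follows the standard EM recipe on a three-sentence toy corpus (the German-English pairs echo the classic textbook example):

```python
from collections import defaultdict

def ibm_model1(pairs, iters=20):
    """EM for IBM Model 1 word-translation probabilities t(e | f).

    E-step: distribute each English word's count over the foreign words
    in its sentence, weighted by the current t.  M-step: renormalize
    the expected counts per foreign word.
    """
    t = defaultdict(lambda: 1.0)   # uniform-ish initialization
    for _ in range(iters):
        count = defaultdict(float)
        total = defaultdict(float)
        for f_sent, e_sent in pairs:
            for e in e_sent:
                norm = sum(t[e, f] for f in f_sent)
                for f in f_sent:
                    frac = t[e, f] / norm
                    count[e, f] += frac
                    total[f] += frac
        t = defaultdict(float,
                        {(e, f): count[e, f] / total[f]
                         for (e, f) in count})
    return t

pairs = [("das haus".split(), "the house".split()),
         ("das buch".split(), "the book".split()),
         ("ein buch".split(), "a book".split())]
t = ibm_model1(pairs)
print(round(t["house", "haus"], 3), round(t["the", "das"], 3))
```

Even without any alignment annotation, co-occurrence across sentence pairs gradually concentrates the probability mass on the right word pairs, which is the core intuition behind the whole IBM model series.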

May 4 [tentative]  Phrase-based statistical MT 
This material may be folded into the previous class in order to make room for a different topic. Papineni, Roukos, Ward, and Zhu (2001), BLEU: a Method for Automatic Evaluation of Machine Translation. Components of a phrase-based system: language modeling, translation modeling; sentence alignment, word alignment, phrase extraction, parameter tuning, decoding, rescoring, evaluation. 
Take-home final handed out  Koehn, PHARAOH: A Beam Search Decoder for Phrase-Based Statistical Machine Translation; Koehn (2004) presentation on the PHARAOH decoder 
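The BLEU metric from the Papineni et al. paper can be sketched for the single-reference case (no smoothing, so short candidates with no 4-gram match score zero, as in the original formulation):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU sketch: geometric mean of clipped
    (modified) n-gram precisions times a brevity penalty."""
    def ngrams(tokens, n):
        return Counter(zip(*(tokens[i:] for i in range(n))))
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        matched = sum((cand & ref).values())   # clipped counts
        if matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / sum(cand.values())))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "the cat sat on the mat".split()
print(bleu(ref, ref))                          # 1.0
print(bleu("the cat on the mat".split(), ref))
```

The count clipping (via the multiset intersection) is what stops a candidate from gaming unigram precision by repeating a common reference word, and the brevity penalty plays the role recall would play if there were a single reference length.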