Foundations of statistical natural language processing / Christopher D. Manning, Hinrich Schütze.

By: Manning, Christopher D.
Contributor(s): Schütze, Hinrich
Material type: Text
Publisher: Cambridge, Mass. : MIT Press, 2000
Edition: 2nd print., with corrections
Description: xxxvii, 680 pages ; 24 cm
Content type: text
Media type: unmediated
Carrier type: volume
ISBN: 0262133601; 9780262133609
Subject(s): Computational linguistics -- Statistical methods | Linguistische Datenverarbeitung | Natürliche Sprache | Sprachstatistik | Computerlinguistik
DDC classification: 410.285
LOC classification: P98.5.S83 M36 2000
Other classification: ES 910 | ST 306
Online resources: Companion website

Companion Website with errata available via the Internet.

Includes bibliographical references (pages 611-655) and index.

I Preliminaries 1
1.1 Rationalist and Empiricist Approaches to Language 4 -- 1.2 Scientific Content 7 -- 1.2.1 Questions that linguistics should answer 8 -- 1.2.2 Non-categorical phenomena in language 11 -- 1.2.3 Language and cognition as probabilistic phenomena 15 -- 1.3 Ambiguity of Language: Why NLP Is Difficult 17 -- 1.4 Dirty Hands 19 -- 1.4.1 Lexical resources 19 -- 1.4.2 Word counts 20 -- 1.4.3 Zipf's laws 23 -- 1.4.4 Collocations 29 -- 1.4.5 Concordances 31
2 Mathematical Foundations 39 -- 2.1 Elementary Probability Theory 40 -- 2.1.1 Probability spaces 40 -- 2.1.2 Conditional probability and independence 42 -- 2.1.3 Bayes' theorem 43 -- 2.1.4 Random variables 45 -- 2.1.5 Expectation and variance 46 -- 2.1.6 Notation 47 -- 2.1.7 Joint and conditional distributions 48 -- 2.1.8 Determining P 48 -- 2.1.9 Standard distributions 50 -- 2.1.10 Bayesian statistics 54 -- 2.2 Essential Information Theory 60 -- 2.2.1 Entropy 61 -- 2.2.2 Joint entropy and conditional entropy 63 -- 2.2.3 Mutual information 66 -- 2.2.4 Noisy channel model 68 -- 2.2.5 Relative entropy or Kullback-Leibler divergence 72 -- 2.2.6 Relation to language: Cross entropy 73 -- 2.2.7 Entropy of English 76 -- 2.2.8 Perplexity 78
3 Linguistic Essentials 81 -- 3.1 Parts of Speech and Morphology 81 -- 3.1.1 Nouns and pronouns 83 -- 3.1.2 Words that accompany nouns: Determiners and adjectives 87 -- 3.1.3 Verbs 88 -- 3.1.4 Other parts of speech 91 -- 3.2 Phrase Structure 93 -- 3.2.1 Phrase structure grammars 96 -- 3.2.2 Dependency: Arguments and adjuncts 101 -- 3.2.3 X' theory 106 -- 3.2.4 Phrase structure ambiguity 107 -- 3.3 Semantics and Pragmatics 109 -- 3.4 Other Areas 112
4 Corpus-Based Work 117 -- 4.1 Getting Set Up 118 -- 4.1.1 Computers 118 -- 4.1.2 Corpora 118 -- 4.1.3 Software 120 -- 4.2 Looking at Text 123 -- 4.2.1 Low-level formatting issues 123 -- 4.2.2 Tokenization: What is a word? 124 -- 4.2.3 Morphology 131 -- 4.2.4 Sentences 134 -- 4.3 Marked-up Data 136 -- 4.3.1 Markup schemes 137 -- 4.3.2 Grammatical tagging 139
II Words 149
5 Collocations 151 -- 5.1 Frequency 153 -- 5.2 Mean and Variance 157 -- 5.3 Hypothesis Testing 162 -- 5.3.1 T test 163 -- 5.3.2 Hypothesis testing of differences 166 -- 5.3.3 Pearson's chi-square test 169 -- 5.3.4 Likelihood ratios 172 -- 5.4 Mutual Information 178 -- 5.5 Notion of Collocation 183
6 Statistical Inference: n-gram Models over Sparse Data 191 -- 6.1 Bins: Forming Equivalence Classes 192 -- 6.1.1 Reliability vs. discrimination 192 -- 6.1.2 n-gram models 192 -- 6.1.3 Building n-gram models 195 -- 6.2 Statistical Estimators 196 -- 6.2.1 Maximum Likelihood Estimation (MLE) 197 -- 6.2.2 Laplace's law, Lidstone's law and the Jeffreys-Perks law 202 -- 6.2.3 Held out estimation 205 -- 6.2.4 Cross-validation (deleted estimation) 210 -- 6.2.5 Good-Turing estimation 212 -- 6.2.6 Briefly noted 216 -- 6.3 Combining Estimators 217 -- 6.3.1 Simple linear interpolation 218 -- 6.3.2 Katz's backing-off 219 -- 6.3.3 General linear interpolation 220 -- 6.3.4 Briefly noted 222 -- 6.3.5 Language models for Austen 223
7 Word Sense Disambiguation 229 -- 7.1 Methodological Preliminaries 232 -- 7.1.1 Supervised and unsupervised learning 232 -- 7.1.2 Pseudowords 233 -- 7.1.3 Upper and lower bounds on performance 233 -- 7.2 Supervised Disambiguation 235 -- 7.2.1 Bayesian classification 235 -- 7.2.2 An information-theoretic approach 239 -- 7.3 Dictionary-Based Disambiguation 241 -- 7.3.1 Disambiguation based on sense definitions 242 -- 7.3.2 Thesaurus-based disambiguation 244 -- 7.3.3 Disambiguation based on translations in a second-language corpus 247 -- 7.3.4 One sense per discourse, one sense per collocation 249 -- 7.4 Unsupervised Disambiguation 252 -- 7.5 What Is a Word Sense? 256
8 Lexical Acquisition 265 -- 8.1 Evaluation Measures 267 -- 8.2 Verb Subcategorization 271 -- 8.3 Attachment Ambiguity 278 -- 8.3.1 Hindle and Rooth (1993) 280 -- 8.3.2 General remarks on PP attachment 284 -- 8.4 Selectional Preferences 288 -- 8.5 Semantic Similarity 294 -- 8.5.1 Vector space measures 296 -- 8.5.2 Probabilistic measures 303 -- 8.6 Role of Lexical Acquisition in Statistical NLP 308
III Grammar 315
9 Markov Models 317 -- 9.1 Markov Models 318 -- 9.2 Hidden Markov Models 320 -- 9.2.1 Why use HMMs? 322 -- 9.2.2 General form of an HMM 324 -- 9.3 Three Fundamental Questions for HMMs 325 -- 9.3.1 Finding the probability of an observation 326 -- 9.3.2 Finding the best state sequence 331 -- 9.3.3 Third problem: Parameter estimation 333 -- 9.4 HMMs: Implementation, Properties, and Variants 336 -- 9.4.1 Implementation 336 -- 9.4.2 Variants 337 -- 9.4.3 Multiple input observations 338 -- 9.4.4 Initialization of parameter values 339
10 Part-of-Speech Tagging 341 -- 10.1 Information Sources in Tagging 343 -- 10.2 Markov Model Taggers 345 -- 10.2.1 Probabilistic model 345 -- 10.2.2 Viterbi algorithm 349 -- 10.2.3 Variations 351 -- 10.3 Hidden Markov Model Taggers 356 -- 10.3.1 Applying HMMs to POS tagging 357 -- 10.3.2 Effect of initialization on HMM training 359 -- 10.4 Transformation-Based Learning of Tags 361 -- 10.4.1 Transformations 362 -- 10.4.2 Learning algorithm 364 -- 10.4.3 Relation to other models 365 -- 10.4.4 Automata 367 -- 10.5 Other Methods, Other Languages 370 -- 10.5.1 Other approaches to tagging 370 -- 10.5.2 Languages other than English 371 -- 10.6 Tagging Accuracy and Uses of Taggers 371 -- 10.6.1 Tagging accuracy 371 -- 10.6.2 Applications of tagging 374
11 Probabilistic Context Free Grammars 381 -- 11.1 Some Features of PCFGs 386 -- 11.2 Questions for PCFGs 388 -- 11.3 Probability of a String 392 -- 11.3.1 Using inside probabilities 392 -- 11.3.2 Using outside probabilities 394 -- 11.3.3 Finding the most likely parse for a sentence 396 -- 11.3.4 Training a PCFG 398 -- 11.4 Problems with the Inside-Outside Algorithm 401
12 Probabilistic Parsing 407 -- 12.1 Some Concepts 408 -- 12.1.1 Parsing for disambiguation 408 -- 12.1.2 Treebanks 412 -- 12.1.3 Parsing models vs. language models 414 -- 12.1.4 Weakening the independence assumptions of PCFGs 416 -- 12.1.5 Tree probabilities and derivational probabilities 421 -- 12.1.6 There's more than one way to do it 423 -- 12.1.7 Phrase structure grammars and dependency grammars 428 -- 12.1.8 Evaluation 431 -- 12.1.9 Equivalent models 437 -- 12.1.10 Building parsers: Search methods 439 -- 12.1.11 Use of the geometric mean 442 -- 12.2 Some Approaches 443 -- 12.2.1 Non-lexicalized treebank grammars 443 -- 12.2.2 Lexicalized models using derivational histories 448 -- 12.2.3 Dependency-based models 451 -- 12.2.4 Discussion 454
IV Applications and Techniques 461
13 Statistical Alignment and Machine Translation 463 -- 13.1 Text Alignment 466 -- 13.1.1 Aligning sentences and paragraphs 467 -- 13.1.2 Length-based methods 471 -- 13.1.3 Offset alignment by signal processing techniques 475 -- 13.1.4 Lexical methods of sentence alignment 478 -- 13.2 Word Alignment 484 -- 13.3 Statistical Machine Translation 486
14 Clustering 495 -- 14.1 Hierarchical Clustering 500 -- 14.1.1 Single-link and complete-link clustering 503 -- 14.1.2 Group-average agglomerative clustering 507 -- 14.1.3 An application: Improving a language model 509 -- 14.1.4 Top-down clustering 512 -- 14.2 Non-Hierarchical Clustering 514 -- 14.2.1 K-means 515 -- 14.2.2 EM algorithm 518
15 Topics in Information Retrieval 529 -- 15.1 Some Background on Information Retrieval 530 -- 15.1.1 Common design features of IR systems 532 -- 15.1.2 Evaluation measures 534 -- 15.1.3 Probability ranking principle (PRP) 538 -- 15.2 Vector Space Model 539 -- 15.2.1 Vector similarity 540 -- 15.2.2 Term weighting 541 -- 15.3 Term Distribution Models 544 -- 15.3.1 Poisson distribution 545 -- 15.3.2 Two-Poisson model 548 -- 15.3.3 K mixture 549 -- 15.3.4 Inverse document frequency 551 -- 15.3.5 Residual inverse document frequency 553 -- 15.3.6 Usage of term distribution models 554 -- 15.4 Latent Semantic Indexing 554 -- 15.4.1 Least-squares methods 557 -- 15.4.2 Singular Value Decomposition 558 -- 15.4.3 Latent Semantic Indexing in IR 564 -- 15.5 Discourse Segmentation 566 -- 15.5.1 TextTiling 567
16 Text Categorization 575 -- 16.1 Decision Trees 578 -- 16.2 Maximum Entropy Modeling 589 -- 16.2.1 Generalized iterative scaling 591 -- 16.2.2 Application to text categorization 594 -- 16.3 Perceptrons 597 -- 16.4 k Nearest Neighbor Classification 604
Tiny Statistical Tables 609.
