Introduction to Language Engineering
1. Word-Level Tools BMETE91MX12
2. Syntax and Tools Over the Word Level BMETE91MX13

Course code: BMETE91MX12 - BMETE91MX13
The course introduces the processing of natural language from the point of view of computer engineering, highlighting the peculiarities of Hungarian.

To follow the lectures, basic knowledge of automata theory and formal languages is recommended (for example, Introduction to the Algebraic Theory of Automata TE917077, Languages and Automata TE915023, Languages and Automata VISZM104, Formal Languages VIMA2208, Information Technology II. VIAU2024, VIAU2017).
Minimal knowledge of probability theory and the theory of algorithms, as well as programming skills, is also recommended.

Related subjects at BME:
András Kornai: The mathematical foundations of natural language processing
Péter Szeredi: Introduction to semantic technologies
Géza Gordos, Géza Németh: Speech information systems
Klára Vicsi: Speech acoustics: human and automatic speech processing
Klára Vicsi: Speech communication

Time and place of the lectures: The time will be fixed at the first lecture to suit every student.

Lecturer: Mátyás Naszódi

REQUIREMENTS
TOPIC
HOME WORKS
REVIEWS of literature
LECTURE NOTES
Study in 1990

REQUIREMENTS


TOPIC (subject to change)

1. Word-Level Tools
  • The role and the task of computational linguistics
  • History of languages; language categorization, levels of linguistic tasks
  • Characters and codes. Letter statistics and their uses; Shannon's language identification.
  • Sorting algorithms - fast letter trees: B-trees and gamma-trees.
  • Statistics in linguistics, its reliability issues and limits.
  • Morphology - words and word forms. Number of different words and word forms. Active and passive languages
  • Generative models of morphology. From a generative model to fast analysis - efficient tools for word analysis (Ispell, Humor, finite and multi-level automata, Frey's algorithm). The trade-off between space and time requirements in implementations
  • Statistical methods of spell checking, correcting, and generating words (n-grams, Markov chains, shake and bake)
  • Quality control and testing of spell checkers
  • Measuring the immeasurable - theoretical and practical limits on the quality of spell checkers
  • Word-level ambiguities and their resolution
  • Applications of morphology - dictionaries, intelligent search...
2. Syntax and Tools Over the Word Level
  • Types of natural languages: languages with isolating, inflecting, conjugating, and agglutinative morphology
  • Ambiguity is a basic characteristic of natural languages
  • Application of context-free grammars for natural languages and its limits
  • Two-level grammars: affix and unification grammars
  • Syntax - ordering rules and free word order in Hungarian grammar
  • Verbal and nominal phrases
  • The role of pragmatics and dependencies in syntax
  • Statistical methods and the use of corpora in computational linguistics
  • Machine translation and computer-aided translation
  • Rule-based, transfer-based, direct, and statistical translation
  • Translation quality issues - evaluation methods
  • Weak grammars: flat, local and partial syntax, and their applications
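As a small illustration of the "letter statistics" topic in part 1, the sketch below identifies the language of a text by comparing its letter-frequency profile with stored profiles. It is only a minimal example under stated assumptions: the cosine-similarity comparison, the function names, and the two tiny sample texts are all chosen here for illustration and are not taken from the course material or from Shannon's work.

```python
from collections import Counter
import math


def letter_profile(text):
    """Return the relative frequencies of the alphabetic characters in text."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency vectors (dicts)."""
    dot = sum(value * q.get(ch, 0.0) for ch, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def identify_language(text, profiles):
    """Return the name of the stored profile most similar to the text."""
    target = letter_profile(text)
    return max(profiles, key=lambda name: cosine_similarity(target, profiles[name]))


# Tiny illustrative samples (an assumption; far too small for real use).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "hungarian": "árvíztűrő tükörfúrógép és a gyors barna róka átugorja a lusta kutyát az erdőben",
}
PROFILES = {name: letter_profile(sample) for name, sample in SAMPLES.items()}

if __name__ == "__main__":
    print(identify_language("áé", PROFILES))
```

With realistic profiles built from large corpora, the same comparison can be applied to letter pairs or triples instead of single letters, which usually separates languages more reliably.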

HOME WORKS (subject to change)

1. Word-Level Tools
  • Universal character code conversion
  • Character statistics
  • Text generator based on statistics
  • Comparison of spell checkers
  • Comparison of text correction algorithms
  • Word statistics
2. Syntax and Tools Over the Word Level

REVIEWS of literature (subject to change)

1. Word-Level Tools
  • TLFSM
  • ISPELL, MYSPELL
  • n-grams
2. Syntax and Tools Over the Word Level
  • AGFL
  • HPSG
  • LFG
  • Metamorphosis Grammar
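
The n-gram item of part 1, together with the "Text generator based on statistics" home work, can be illustrated with a character-level Markov chain. The sketch below is one possible approach, not the course's reference solution; the function names `build_model` and `generate`, and the choice of characters rather than words as units, are assumptions made for this example.

```python
import random
from collections import defaultdict


def build_model(text, order=2):
    """Map each context of `order` characters to the characters that followed it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model


def generate(model, seed, length, rng=None):
    """Extend `seed` one character at a time.

    The seed must have exactly as many characters as the model order
    (at least one). Generation stops at `length` characters or when the
    current context was never seen in the training text.
    """
    rng = rng or random.Random()
    order = len(seed)
    out = seed
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            break  # dead end: unseen context
        # Repeated followers in the list make frequent continuations more likely.
        out += rng.choice(followers)
    return out


if __name__ == "__main__":
    training_text = "the cat sat on the mat and the cat ate the rat"
    model = build_model(training_text, order=2)
    print(generate(model, "th", 40, random.Random(0)))
```

A larger `order` makes the output resemble the training text more closely but needs much more text to avoid dead ends, which is the usual trade-off when choosing the n-gram length.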