Offline databases

Hungarian Reference Speech Database (MRBA)
Developer: BME TMIT Laboratory of Speech Acoustics; Info: Klára Vicsi
The Hungarian Reference Speech Database (MRBA) was created jointly by BME TMIT Laboratory of Speech Acoustics and the Institute of Informatics at the University of Szeged. The aim was to produce a database of continuous texts read aloud that could be used to build and test PC speech recognition programs. The texts were designed so that the sentences contain enough instances of the recognition units typically used by speech recognition programs (speech sounds, diphones and triphones). In addition to the sentences, the database includes phonetically rich words that increase the occurrence of certain rarer speech sounds. In total, 332 speakers contributed to the database, each reading 12 different sentences and 12 further words (independent of the sentences).

Hungarian Telephone Speech Database (MTBA)
Developer: BME TMIT Laboratory of Speech Acoustics; Info: Klára Vicsi
For this speech database, staff of the Budapest University of Technology and Economics (Department of Telecommunications and Media Informatics, Laboratory of Speech Acoustics) and of the University of Szeged (Department of Computer Science) collected Hungarian sentences recorded over the telephone. Its structure follows the one proposed in the European Union project MLAP LRE-63343 SPEECHDAT (M). The database contains 500 Hungarian speakers: 297 were recorded over a regular landline telephone and 203 over mobile phones.

Szeged Corpus 2.0, Szeged Treebank 2.0
Developer: University of Szeged, Institute of Informatics
A Hungarian natural-language database annotated with disambiguated part-of-speech tags and full syntactic parses.