Automatic Speech Recognition and Understanding Workshop
December 8-12, 2013 | Olomouc, Czech Republic
This talk will discuss present and future applications of ASRU technologies, analyze the state of the art, and offer insights into future directions for research and applications.
Dr. Olive is a SETA consultant at DARPA, where he served as a program manager for six and a half years. During that time he designed, advocated for, acquired funding for, and managed three speech and language programs: Global Autonomous Language Exploitation (GALE), Multi-lingual Automated Document Classification, Analysis and Translation (MADCAT), and Robust Automatic Transcription of Speech (RATS). Before leaving DARPA, Dr. Olive developed and secured funding for a fourth program addressing translation of informal language genres (e-mail, messaging, conversations) and bilingual speech and text communication. Before joining DARPA, Dr. Olive had more than thirty years of experience in research and development at Bell Laboratories, including 19 years in management. Dr. Olive graduated from the University of Chicago with a Doctor of Philosophy in Physics and a Master of Arts in music theory and composition.
This presentation will show how speech recognition, natural language processing, and information retrieval techniques are being combined to drive an innovative anticipatory search engine: one that listens in on spontaneous, human-to-human conversations and augments them by providing, proactively and in real time, information directly relevant to the content being discussed, culled from a variety of sources including the web and the social graph.
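As a rough illustration of the pipeline described above (the abstract gives no implementation details, so every function name and document below is a hypothetical stand-in), a minimal Python sketch of the transcribe-extract-retrieve loop might look like this:

    # Hypothetical anticipatory-search loop: ASR transcripts are mined for
    # salient terms, which proactively trigger queries against an index.
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
                 "it", "we", "so", "has", "about", "were", "much"}

    # Toy collection standing in for web and social-graph sources.
    DOCUMENTS = {
        "doc1": "olomouc is a historic city in the czech republic",
        "doc2": "speech recognition accuracy has improved with deep learning",
        "doc3": "the czech republic hosts many scientific workshops",
    }

    def salient_terms(utterance, top_k=3):
        # Keep the most frequent non-stopword terms as a crude topic signal.
        words = [w for w in utterance.lower().split() if w not in STOPWORDS]
        return [term for term, _ in Counter(words).most_common(top_k)]

    def search(terms):
        # Rank documents by how many of the query terms they contain.
        scores = {doc: sum(t in text.split() for t in terms)
                  for doc, text in DOCUMENTS.items()}
        return sorted((d for d, s in scores.items() if s > 0),
                      key=lambda d: -scores[d])

    # Simulated stream of ASR output from an ongoing conversation.
    for utterance in ["so we were talking about the czech republic",
                      "speech recognition has gotten so much better"]:
        terms = salient_terms(utterance)
        print(terms, "->", search(terms))

A real system would replace the canned transcripts with streaming ASR and the toy index with production retrieval backends; the sketch shows only the proactive query loop.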
Dr. Marsal Gavalda is a senior R&D leader with deep expertise in speech and language technologies. Before joining Expect Labs as Director of Research, Marsal served as VP and Chief of Research at Verint Systems, where he led Verint's global R&D in speech and text analytics. Prior to that, Marsal served as VP of Research and Incubation at Nexidia, where he conducted original research in speech recognition and natural language understanding and developed disruptive solutions for the call center, intelligence, and media markets. Marsal holds a PhD in Language Technologies and an MS in Computational Linguistics, both from Carnegie Mellon University, and a BS in Computer Science from the Universitat Politecnica de Catalunya in Barcelona. Marsal is a frequent speaker at academic and technology conferences, and every summer he organizes a summit in Barcelona on topics as diverse as machine translation, music, and the neuroscience of free will.
Automatic pattern classifiers that output soft, probabilistic classifications, rather than hard decisions, can be more widely and more profitably applied, provided the probabilistic output is well calibrated. In the fields of automatic speaker recognition and automatic spoken language recognition, the regular NIST technology evaluations have placed a strong emphasis on cost-effective application and therefore on calibration. This talk will describe calibration solutions for these technologies, with emphasis on criteria for measuring the goodness of calibration: if we can measure it, we can also optimize it.
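The abstract does not name specific criteria, but the best-known measure of this kind from the speaker's own prior work (implemented in his FoCal Toolkit, mentioned below) is the log-likelihood-ratio cost, Cllr. A minimal Python sketch of how it can be computed from detection scores:

    # Log-likelihood-ratio cost (Cllr): a calibration-sensitive measure for
    # detectors that output log-likelihood-ratio (LLR) scores. A perfect
    # recognizer achieves Cllr = 0; a system that always outputs LLR = 0
    # (i.e. no information) scores exactly 1 bit.
    import math

    def cllr(target_llrs, nontarget_llrs):
        # Average cost, in bits, over target and non-target trials.
        c_tar = sum(math.log2(1 + math.exp(-llr)) for llr in target_llrs)
        c_non = sum(math.log2(1 + math.exp(llr)) for llr in nontarget_llrs)
        return 0.5 * (c_tar / len(target_llrs)
                      + c_non / len(nontarget_llrs))

    # Well-calibrated scores: confident and pointing the right way.
    print(cllr([2.0, 3.5, 1.0], [-2.5, -1.5, -3.0]))  # low cost
    # Same separation between classes, shifted by a constant bias:
    # discrimination is unchanged, but calibration is poor.
    print(cllr([6.0, 7.5, 5.0], [1.5, 2.5, 1.0]))     # much higher cost

Because Cllr is a smooth function of the scores, it can serve both as an evaluation criterion and as a training objective for calibration transformations, which is the sense of "if we can measure it, we can also optimize it".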
Niko Brummer received B.Eng (1986), M.Eng (1988), and Ph.D. (2010) degrees, all in electronic engineering, from Stellenbosch University. He worked as a researcher at DataFusion (later called Spescom DataVoice) and is currently chief scientist at AGNITIO. Most of his research over the last two decades has been applied to automatic speaker and language recognition, and he has participated in most of the NIST SRE and LRE evaluations of these technologies from the year 2000 to the present. He has contributed to the Odyssey Workshop series since 2001 and was the organizer of Odyssey 2008 in Stellenbosch. His FoCal Toolkit is widely used for fusion and calibration in speaker and language recognition research.
His research interests include the development of new algorithms for speaker and language recognition, as well as evaluation methodologies for these technologies; in both cases, his emphasis is on probabilistic modelling. He has worked with both generative (eigenchannel, JFA, i-vector PLDA) and discriminative (system fusion, discriminative JFA and PLDA) recognizers. In evaluation, his focus is on judging the goodness of classifiers that produce probabilistic outputs in the form of well-calibrated class likelihoods.
This talk will discuss several use cases of speech technologies (mainly speech transcription, keyword spotting, language recognition, and speaker recognition) in call centers, banks, governmental agencies, and broadcast service providers, for speech data mining, voice analytics, and voice biometry. Each client and use case places specific requirements on technology, data handling, and services; these requirements and their implications for technology development and research will also be covered.
Petr Schwarz joined the Speech Recognition Group at Brno University of Technology (BUT) in 1997, when it was founded by Honza Cernocky, and helped build it into one of the most successful research groups in the field today. Petr has worked on many research projects sponsored by US, EU, and Czech grant agencies, as well as on many industrial projects. His main research interests have been neural networks and phoneme recognition, but he has worked on many technologies, including speech transcription, keyword spotting, language recognition, speaker recognition, and gender recognition. Many of these systems were evaluated by NIST. Petr was awarded the silver medal of Brno University of Technology for his research. In 2006, Petr and his colleagues founded the Phonexia company, where Petr is director. Phonexia focuses on speech data mining, voice analytics, and voice biometry, and today provides speech technologies and solutions to call centers, banks, telecommunication companies, governmental agencies, broadcast service providers, universities, and other clients in more than 15 countries worldwide.
Speech technologies like ASR have made tremendous progress in recent years, with great strides in speed, accuracy, vocabulary, and robustness, to name just a few improvements. As exciting as the research results are, they often require significant resources, such as large microphone arrays, enormous amounts of data, or hundreds to thousands of processor cores, limiting the kinds of applications that can make use of them. This talk will discuss the numerous challenges of real-world deployment of speech technology to a difficult user population, children, outside of relatively friendly lab environments or competitive evaluations. It will also present some of the solutions to these challenges that ToyTalk has used or considered when designing our applications, along with what we have learned from usage in the wild, including an examination of why children might be more reasonable users than expected, despite their obvious difficulties.
Brian Langner is a Co-Founder and Senior Speech Scientist at ToyTalk, a family entertainment company. He earned a Ph.D. and an M.S. in Language Technologies from Carnegie Mellon University, as well as a B.S. in Computer Science and Engineering from Michigan State University. His primary research work has been in improving human-machine spoken interaction, at the intersection of spoken dialog systems, natural language generation, and speech synthesis. At ToyTalk, Brian is responsible for using and improving speech and language technologies as part of building entertaining conversation-driven experiences for children.
In this talk I'll give an overview of speech at Google: its history, the products where speech is a central component, our existing APIs, and how these products and APIs have evolved over time. I will also talk about how we see speech becoming a more and more important component of Google's search capabilities. I'll describe the challenges we face and some of the research problems we will have to tackle in the coming years to fulfill the dream of a Star Trek-style conversational, all-knowing search engine.
Dr. Pedro J. Moreno co-leads the language modeling group within the Speech team at Google. His team is in charge of deploying speech recognition services in all supported languages and improving their quality. He joined Google 9 years ago after working as a research scientist at HP Labs. Dr. Moreno completed his Ph.D. studies at Carnegie Mellon University; before that, he completed an Electrical Engineering degree at the Universidad Politecnica de Madrid, Spain.