Keynote Speakers

Prof Shrikanth (Shri) Narayanan

University of Southern California, Los Angeles, CA

Title: Bridging Speech Science and Technology — Now and Into the Future

Monday, 21 August, 09:30 – 10:30 – The Auditorium


Speech research is remarkable in so many ways: in its essential human-centeredness, in the rich interconnections between the science and the technology, and in its wide-ranging impact, both fundamental and applied. Crucial advances in speech science catalyze and leverage technological advances across the machine intelligence ecosystem, from sensing and imaging to signal processing and machine learning. Likewise, the creation of speech-centric societal applications benefits from an understanding of how humans produce, process and use speech in communication. In these complementary endeavors, two intertwined lines of inquiry endure: illuminating the rich information tapestry and inherent variability in speech, and creating trustworthy speech technologies.

This talk will highlight some advances and possibilities in this multifaceted speech research realm. The first is capturing and modeling the human vocal instrument during speaking, and the technological and clinical applications that build on this capability. The second focuses on speech-based informatics tools to support research and clinical translation related to human health and wellbeing. Finally, the talk will highlight the critical goal of designing trustworthy speech and spoken language machine intelligence tools that are inclusive, equitable, robust, safe, and secure.


Shrikanth (Shri) Narayanan is University Professor and Niki & C. L. Max Nikias Chair in Engineering at the University of Southern California (USC), where he is Professor of Electrical & Computer Engineering, Computer Science, Linguistics, Psychology, Neuroscience, Pediatrics, and Otolaryngology—Head & Neck Surgery, Director of the Ming Hsieh Institute, and Research Director of the Information Sciences Institute. Prior to USC, he was with AT&T Bell Labs and AT&T Research. His interdisciplinary research focuses on human-centered sensing/imaging, signal processing, and machine intelligence centered on human communication, interaction, emotions, and behavior. He is a Fellow of the Acoustical Society of America, the IEEE, ISCA, the American Association for the Advancement of Science, the Association for Psychological Science, the Association for the Advancement of Affective Computing, the American Institute for Medical and Biological Engineering, and the National Academy of Inventors. He is a Guggenheim Fellow, a member of the European Academy of Sciences and Arts, and a recipient of many research and education awards. He has published widely, and his inventions have led to technology commercialization, including through two startups he co-founded: Behavioral Signals Technologies, focused on AI-based conversational assistance, and Lyssn, focused on mental health care and quality assurance.


Prof Virginia Dignum

Umeå University, Umeå, Sweden

Title: Beyond the AI Hype: Balancing Innovation and Social Responsibility

Tuesday, 22 August, 08:30 – 09:30 – The Auditorium


AI can extend human capabilities, but doing so requires addressing challenges in education, jobs, and biases. Taking a responsible approach involves understanding AI’s nature, design choices, societal role, and ethical considerations. Recent AI developments, including foundational models, transformer models, generative models, and large language models (LLMs), raise questions about whether they are changing the paradigm of AI, and about the responsibility of those who are developing and deploying AI systems. In all these developments, it is vital to understand that AI is not an autonomous entity but rather depends on human responsibility and decision-making.

In this talk, I will further discuss the need for a responsible approach to AI that emphasizes trust, cooperation, and the common good. Taking responsibility involves regulation, governance, and awareness. Ethics and dilemmas are ongoing considerations, but they require understanding that trade-offs must be made and that decision processes are always contextual. Taking responsibility requires designing AI systems with values in mind, and implementing regulations, governance, monitoring, agreements, and norms. Rather than viewing regulation as a constraint, it should be seen as a stepping stone for innovation, ensuring public acceptance, driving transformation, and promoting business differentiation. Responsible Artificial Intelligence (AI) is not an option but the only possible way forward in AI.


Virginia Dignum is Professor of Responsible Artificial Intelligence at Umeå University, Sweden, and director of WASP-HS, the Wallenberg Program on Humanities and Society for AI, Autonomous Systems and Software, the largest Swedish national research program on fundamental multidisciplinary research into the societal and human impact of AI. She is a member of the Royal Swedish Academy of Engineering Sciences (IVA) and a Fellow of the European Artificial Intelligence Association (EURAI). She is a member of the Global Partnership on AI (GPAI), the World Economic Forum’s Global Artificial Intelligence Council, the UNESCO expert group on the implementation of AI recommendations, the Executive Committee of the IEEE Initiative on Ethically Aligned Design, and of ALLAI, the Dutch AI Alliance. She was a member of the EU’s High-Level Expert Group on Artificial Intelligence and leader of UNICEF’s guidance for AI and children. She is the author of “Responsible Artificial Intelligence: developing and using AI in a responsible way”.

Roger Moore

Panel Chairperson

Panel Discussion

End-to-End Models – Friend or Foe of Speech Research?

Wednesday, 23 August – 08:30 – 09:30 – The Auditorium

End-to-end architectures have revolutionised performance in many areas of speech technology. You no longer need to be an expert in speech to build, for example, an ASR system with performance our community only dreamed of a decade ago. INTERSPEECH has always valued the symbiotic relationship between speech science and speech technology, with linguists, phoneticians, computer scientists and engineers all learning from one another. Put simply, we need each other — or so we have always liked to believe. But where now? Does the dominance of end-to-end architectures, coupled with vast amounts of speech data and compute power, mean that we can learn anything we need directly from a speech signal, without needing to understand what’s going on? Can the speech technologists go it alone? Do speech scientists care? Can speech technology and speech science working alongside one another achieve greater research outcomes than apart?

This Keynote session sees a panel of experts from our INTERSPEECH community discuss this important topic.

Dilek Hakkani-Tür

Julia Hirschberg
Columbia University

Dan Jurafsky
Stanford University

Ralf Schlüter
RWTH Aachen University

Prof Martine Grice

University of Cologne

Title: What’s in a Rise? The Relevance of Intonation for Attention Orienting

Thursday, 24 August, 08:30 – 09:30 – The Auditorium


In this talk I will explore why and how intonational rises are used to orient attention towards the words and phrases bearing them. The attention-orienting function of rising pitch is known outside the linguistic domain, with evidence from auditory looming, a phenomenon whereby a signal that increases in loudness or pitch appears to be approaching the listener and is perceived as an immediate threat.

This attention-orienting function extends to speech communication, where rises in pitch are crucial for directing listeners’ attention to the most important parts of the linguistic message. I will provide evidence from event-related brain potentials that such rises affect both preattentive and conscious attention. Moreover, the lack of a rise can, in some situations, direct attention away from parts of the message, leading to information being missed. I will also discuss the influence of intonational rises on short-term memory, showing that rises can boost recall of items in a list. This effect can be local to a particular item if the rise is accentual, or more global if the rise is at the edge of a domain. However, despite the cross-linguistic effect of rises on attention, their influence can be modulated by language-specific prosodic structure and linguistic expectations.


Martine Grice is Professor of Phonetics at the University of Cologne. She has served as President of the Association for Laboratory Phonology, is a member of the editorial boards of the Journal of Phonetics and Laboratory Phonology, and is editor-in-chief of Studies in Laboratory Phonology.

Her work on intonation theory investigates complex tonal structures at the heads and edges of prosodic constituents, and the interplay between tune and text. Besides her analyses of Italian, English and German, she has tackled particularly challenging languages, such as Tashlhiyt Berber (where sonorant material for bearing intonational tones can be scarce), Vietnamese (where intonation can overwrite tone) and Maltese (where lexical and post-lexical prominences do not always align).

Her current projects deal with attention orienting (looking for evidence of prosodic structure in the effects of prosody on serial recall and on the processing of incongruence), with individual-specific patterns in face-to-face dyadic communication, with a focus on autism, and with the modelling of prominence in language.