Forthcoming: Developments in the modeling of speech prosody

Plínio A. Barbosa (ed), Nick Campbell (ed), Dafydd Gibbon (ed), Keikichi Hirose (ed), Daniel Hirst (ed)

Synopsis

Over the last 20 years, speech recognition and synthesis have made tremendous progress, supported by the technologies of Generative Artificial Intelligence. This has made an important contribution to our knowledge of the nature of spoken language and of how it may be represented in our linguistic/phonetic descriptions. There still remain, however, many issues to be solved, especially in the area of speech prosody. Prosody covers broad time spans of speech at and above the syllable level and varies greatly due to paralinguistic and non-linguistic factors. Currently, most statistical (deep neural network) speech synthesizers do not explicitly handle prosodic features, which limits the quality of the resulting speech and can be a source of misunderstandings. To solve this problem, and to improve our understanding of spoken language, the empirically exact and theoretically well-founded modeling of prosodic features is of paramount importance. An approach which fulfils these criteria is the ‘Fujisaki Model’, a seminal model that has stimulated generations of researchers and inspired a family of related approaches in what has become a standard paradigm of automatic prosody generation in text-to-speech systems, and that has also shed light on how articulatory laryngeal gestures produce vocal fold vibration. This collection of papers aims to stimulate discussion on issues in speech science and speech technology which start with and go beyond this paradigm, and to inspire new generations of researchers in the era of Artificial Intelligence.

Chapters

  • Introduction
    Plínio A. Barbosa, Nick Campbell, Dafydd Gibbon, Keikichi Hirose, Daniel Hirst
  • A computational approach for determining the base frequency for the F0 contour generation model
    Yoshiko Arimoto, Yasuo Horiuchi
  • Probing melodic differences in German and Brazilian Portuguese with the Fujisaki model parameters and time series cross-correlations
    Plínio A. Barbosa
  • AI-Speech Synthesis
    Too big for its boots?
    Nick Campbell, Toshiyuki Sadanobu, Akiko Mokhtari
  • A generative dynamical model of discrete events in continuous f0
    Jennifer Cole, Khalil Iskarous
  • Spoken language prosody and its modelling using the Fujisaki F0 Model for the Bengali language
    Shyamal Kumar Das Mandal
  • On the linearity of prosody
    Dafydd Gibbon
  • Revisiting the acoustic cues of Mandarin rising and falling tones
    Roles of F0 range, F0 slope and duration
    Wentao Gu, Wei Zhang
  • The OMProDat prosodic annotation protocol
    Daniel Hirst, Mohammad Bani Younes, Hyongsil Cho, Hongwei Ding, Sophie Herment, Mortaza Taheri-Ardal, Martti Vainio, John Wakefield
  • Linguistic and paralinguistic influences on prosody annotation in German and Brazilian Portuguese
    Hansjörg Mixdorff, Pablo Arantes
  • Revisiting the physiological motivation of Fujisaki's intonation model
    Bernd Möbius
  • Information density and the predictability of prosodic structure
    Bernd Möbius, Bistra Andreeva, Ivan Yuen
  • Patterns of acoustic cues to prosodic structure
    Stefanie Shattuck-Hufnagel, Alejna Brugos, Jonathan Barnes, Nanette Veilleux
  • Processing the prosody of emotional speech using machine learning
    Jianhua Tao
  • Modeling tone and intonation by simulating learning
    Yi Xu, Yue Chen

Biographies

Plínio A. Barbosa

Plínio A. Barbosa obtained his PhD degree in 1994 from the Institut National Polytechnique de Grenoble, France. He is a full professor in the Department of Linguistics at the State University of Campinas, Brazil, and is responsible for the Speech Prosody Studies and the Forensic Phonetic Studies Groups, both formed by a team of researchers and students working on (dynamical) analysis and modelling of speech prosody. He has published five books, on speech rhythm, experimental phonetics, prosody, speech sciences and experimental prosody. His interests include experimental phonetics, speech rhythm and intonation, dynamical systems theory, forensic phonetics, and acoustic correlates of interpretations of literary texts. He is Editor of the Journal of Speech Sciences and a member of the editorial boards of the International Journal of Speech Technology, Phonetica, and the Journal of Phonetics. He is currently a member of the IPA and ISCA, and a member of the board of the ISCA Speech Prosody SIG.

Nick Campbell

Nick Campbell is Professor and Fellow Emeritus in the Department of Computer Science and Statistics at Trinity College Dublin (The University of Dublin), Ireland. He is credited with over 10,000 citations for more than 500 peer-reviewed publications, currently with an h-index of 50 and an i10-index of 175. His research includes studies on emotional speech, speech databases, and speech synthesis. He has worked at AT&T Bell Labs, IBM UK Scientific Centre and Yorktown Heights labs, as well as ATR and NiCT in Japan, where he currently resides. His expertise is mainly in spoken interaction, with a focus on machine understanding of natural human conversational speech. His third most cited reference is a patent for concatenative speech synthesis.

Dafydd Gibbon

Dafydd Gibbon is Emeritus Professor of Linguistics in the Faculty of Linguistics and Literature, Bielefeld University, Germany, specialising in computational linguistics and documentation of endangered languages, with particular interests in computational lexicography and in the acoustic analysis of the rhythms and melodies of speech. He has published numerous articles on prosody, phonology and phonetics, particularly of Niger-Congo and Sino-Tibetan languages, and in other areas of linguistics. He has also been chief editor of two handbooks on techniques and standards in speech technology and co-editor of a third, on technical communication. For his life’s work in services to language documentation, technology and education in West Africa he was appointed Officier de l'ordre du mérite ivoirien (Officer of the Ivory Coast Order of Merit) and received the Silver Jubilee Award of the Language Association of Nigeria. For services to Polish Linguistics and Phonetics he was inducted as Honorary Member of the Polish Phonetics Society and received the Bronze Medal for Lifetime Achievement from Adam Mickiewicz University, Poznań, Poland. In his spare time he is an amateur radio operator and gardener, and composes meditative guitar music.

Keikichi Hirose

Keikichi Hirose received his Ph.D. degree in electronic engineering in 1977 from the University of Tokyo. He was a professor at the University of Tokyo from 1994 until his retirement in 2015, when he received the title of Professor Emeritus. From March 1987 to January 1988, he was a Visiting Scientist at the Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, U.S.A. He served as a project professor at the National Institute of Informatics from June 2019 to March 2023. He has been engaged in a wide range of research on spoken language processing, including analysis, synthesis, recognition, dialogue systems, and computer-assisted language learning. From 2000 to 2004, he was Principal Investigator of the national project “Realization of advanced spoken language information processing utilizing prosodic features.” He served as the general chair for INTERSPEECH 2010, Makuhari, Japan. From 2010 to 2018, he served as the Chair of ISCA SProSIG. He also served as a board member of ISCA from 2009 to 2017, and has been a member of the International Advisory Council since January 2021. He became an honorary member of the Polish Phonetic Association in 2013 and was named an ISCA Fellow in 2018. For his long-term and remarkable contribution to spoken language processing, he received the Achievement Award from the Acoustical Society of Japan in 2020. In 2015, he was honored as a Named Person of Merit in Science and Technology by the Mayor of Tokyo.

Daniel Hirst

Daniel Hirst has been working on speech prosody and phonology for over fifty years (PhD 1974, Habilitation thesis 1987). Currently Emeritus Research Director at CNRS in Aix-en-Provence, France, he has published numerous articles, chapters and books. His book Speech Prosody: From Acoustics to Interpretation was published in September 2024 (Springer). He has developed software for the automatic analysis of speech prosody (Momel, Intsint, ProZed), implemented as plugins for the Praat software and all freely available from the ResearchGate website. In 2000 he founded SProSIG (the Speech Prosody Special Interest Group), affiliated with both ISCA and the IPA, and in 2002 he organised the first International Conference on Speech Prosody, in Aix-en-Provence. The Conference has since been held every two years at venues around the world. In 2013 he was elected a Fellow of ISCA, in 2014 a member of the Permanent Council for the Organisation of ICPhS, and in 2018 an ISCA Distinguished Lecturer (2018-2019).

Published

June 24, 2025
LaTeX source on GitHub
Cite as
Barbosa, Plínio A., Campbell, Nick, Gibbon, Dafydd, Hirose, Keikichi & Hirst, Daniel (eds.). Forthcoming. Developments in the modeling of speech prosody. (Spoken Language Research). Berlin: Language Science Press.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.