Forthcoming: Developments in the modeling of speech prosody
Synopsis
Over the last 20 years, speech recognition and synthesis have made tremendous progress, supported by Generative Artificial Intelligence technologies. This progress has contributed substantially to our knowledge of the nature of spoken language and of how it may be represented in linguistic and phonetic descriptions. Many issues remain to be solved, however, especially in the area of speech prosody. Prosody covers broad time spans of speech at and above the syllable level and varies greatly due to paralinguistic and non-linguistic factors. Currently, most statistical (deep neural network) speech synthesizers do not explicitly handle prosodic features, which limits the quality of the resulting speech and can be a source of misunderstandings. To solve this problem, and to improve our understanding of spoken language, the empirically exact and theoretically well-founded modeling of prosodic features is of paramount importance. An approach which fulfils these criteria is the ‘Fujisaki Model’, a seminal model that has stimulated generations of researchers and inspired a family of related approaches in what has become a standard paradigm of automatic prosody generation in text-to-speech systems. It has also served as a blueprint for shedding light on how articulatory laryngeal gestures produce vocal fold vibration. This collection of papers aims to stimulate discussion on issues in speech science and speech technology which start with and go beyond this paradigm, and to inspire new generations of researchers in the era of Artificial Intelligence.
Chapters
- Introduction
- A computational approach for determining the base frequency for the F0 contour generation model
- Probing melodic differences in German and Brazilian Portuguese with the Fujisaki model parameters and time series cross-correlations
- AI-Speech Synthesis: Too big for its boots?
- generative dynamical model of discrete events in continuous f0
- Spoken language prosody and its modelling using the Fujisaki F0 Model for the Bengali language
- On the linearity of prosody
- Revisiting the acoustic cues of Mandarin rising and falling tones: Roles of F0 range, F0 slope and duration
- The OMProDat prosodic annotation protocol
- Linguistic and paralinguistic influences on prosody annotation in German and Brazilian Portuguese
- Revisiting the physiological motivation of Fujisaki's intonation model
- Information density and the predictability of prosodic structure
- Patterns of acoustic cues to prosodic structure
- Processing the prosody of emotional speech using machine learning
- Modeling tone and intonation by simulating learning
