Saturday, January 10, 2009

Phonetics Lab, Academia Sinica

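A Brief Overview of Research on Discourse Prosody in Spoken Mandarin
Abstract: Is the prosody of the spoken speech flow merely the concatenation and smoothing of lexical tones and intonation? Our research results say no. Using a large body of speech data, we show that the prosody of fluent (continuous) speech is in fact the algebraic sum of superimposed tone, intonation, and discourse prosody, the outcome of interaction among prosodic domains from small to large, with discourse prosody, the largest domain, playing the most important role. We stress that discourse prosody is not a collection of the individual realizations of lexical tones or phrase intonations; rather, it is how speakers express discourse-semantic association and coherence through prosodic association. We propose the Hierarchical Prosodic Phrase Grouping framework hypothesis. By quantitatively analyzing multi-phrase spoken discourse narratives, we extract acoustic parameters of cross-phrase prosodic association among the phrases of a prosodic phrase group, and through linear regression analysis we obtain layered, cumulative contributions corresponding to the prosodic hierarchy. This shows that cross-phrase discourse-prosody templates exist in discourse and that the interaction among prosodic domains of all sizes is systematic: phrase intonation causes tonal variation, and discourse prosody causes variation in phrase intonation. The variation of tones and intonation in the speech flow is therefore constrained by higher-level discourse prosody, so that tones and intonations at different discourse positions are "under-realized" (they do not fully reach their targets) and behave differently from tones and intonation in phrases or isolated sentences. We further argue that, for Mandarin, discourse prosody is a prosodic component above tone and sentence intonation, and that for sentence intonation it is the source of dynamic variation. Once the position of a lower-level prosodic unit within a multi-phrase prosodic group is known, the dynamic variation of tones and sentence intonation in the speech flow can be predicted.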




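Regarding spoken prosody, my research hypothesis is that at the suprasegmental level the prosody of the speech flow includes, in addition to lexical tones and sentence intonation, a unit above the sentence that expresses the semantic cohesion and coherence between sentences: the "multiple-phrase speech paragraph." This unit governs and constrains syntactic sentence intonation, systematically causing intonation variation in order to convey discourse information. Lexical tones and sentence intonation are thus only subordinate prosodic units of the speech paragraph, the speech paragraph is in turn a subordinate unit of the discourse, and the prosody of the speech flow is in fact discourse prosody. In terms of the prosodic hierarchy, lexical tones are governed by word meaning and phrase intonation by syntactic structure, so isolated characters, isolated words, and isolated phrases all carry static prosody; the prosody of the speech flow is governed by discourse, which causes systematic variation in phrase intonation, so discourse prosody is dynamic prosody.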




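My unit of analysis for spoken prosody is primarily the multiple-phrase speech paragraph in spoken discourse. Following Yuen Ren Chao (趙元任), the father of Chinese linguistics, who described how tone and intonation add to or cancel each other with the image of small ripples riding on large waves, I propose a hierarchical multi-phrase prosody framework. It stresses that phrases, being governed by higher-level information, build prosodic relationships with one another, so that the phrases of a speech paragraph are related like siblings. This contrasts with the view that treats phrases as unrelated, each carrying its own intonation. Starting from cognitive and physiological perspectives, I have gathered evidence, layer by layer, for the cadence, musicality, and rhythm that span phrases in the speech flow, proposed the prosodic structure of multi-phrase groups, and on the basis of this structure proposed a modular cross-phrase acoustic model. This top-down research approach has been praised by the Japanese scholar Hiroya Fujisaki as pioneering work in speech analysis.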


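Beginning in 2003, I carried out a series of studies to test these hypotheses about spoken prosody. Using corpus-phonetics methods I developed myself, I quantitatively analyzed suprasegmental acoustic parameters, pauses, and boundary effects in a large body of speech data. In 2004 I proposed the five-layer hierarchical Prosodic Phrase Grouping (PG) hypothesis and obtained statistical evidence for cross-phrase rhythm and boundary effects in the speech flow. This showed that syllable-duration distribution in fluent speech is dynamic, being the sum of contributions from the tone, prosodic word, prosodic phrase, and discourse layers, which supports the hierarchical PG hypothesis. The PG account not only explains that the constraints on speech-flow prosody come from discourse semantics rather than from syntactic structure alone, but also explains why lexical tones and syntactic intonation cannot by themselves account for spoken prosody and why speech-flow intonation is so variable. I subsequently obtained evidence from intensity (loudness) and boundary pauses, which allowed me in 2005 to propose the PG templates and a corresponding modular acoustic-phonetic mathematical model, and to publish the paper "Fluent speech prosody: Framework and modeling" in the Speech Communication special issue on Quantitative Prosody Modeling for Natural Speech Description and Generation. In the same year I extended the PG hierarchy upward to the discourse level, for a total of six layers. In 2006 I obtained quantitative evidence for cross-phrase PG intonation templates in F0 contours and included the results in the chapter "Prosody Analysis," which I was invited to write for the book Advances in Chinese Spoken Language Processing. With this, I had obtained evidence for the PG framework in every acoustic-phonetic dimension.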
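The claim that syllable duration is the sum of contributions from successive prosodic layers can be illustrated with a layered (hierarchical) linear regression: predictor blocks for each layer are added in turn, from the lowest layer up, and the gain in R^2 at each step is read as that layer's contribution. The sketch below is a hypothetical illustration on synthetic data, not the authors' actual features or procedure; the predictors (tone category, normalized position in the prosodic word, prosodic phrase, and phrase group) and all effect sizes are assumptions chosen only to show the mechanics.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

def layered_contributions(layers, y):
    """Add predictor blocks one layer at a time (lowest layer first).

    Returns (layer name, cumulative R^2, gain in R^2 from this layer).
    """
    results, blocks, prev = [], [], 0.0
    for name, X in layers:
        blocks.append(X)
        r2 = r_squared(np.column_stack(blocks), y)
        results.append((name, r2, r2 - prev))
        prev = r2
    return results

# Synthetic data (assumption): 2000 syllables, durations in ms.
rng = np.random.default_rng(0)
n = 2000
tone = rng.integers(0, 5, n)              # four lexical tones plus neutral tone
tone_onehot = np.eye(5)[tone][:, 1:]      # drop one level; the intercept covers it
pw_pos = rng.random(n)                    # hypothetical position in prosodic word (0-1)
pph_pos = rng.random(n)                   # hypothetical position in prosodic phrase (0-1)
pg_pos = rng.random(n)                    # hypothetical position in phrase group (0-1)
tone_effect = np.array([0.0, 10.0, 20.0, 25.0, -40.0])[tone]
duration = (200.0 + tone_effect + 30.0 * pw_pos + 20.0 * pph_pos
            + 40.0 * pg_pos + rng.normal(0.0, 15.0, n))

layers = [
    ("tone", tone_onehot),
    ("PW", pw_pos[:, None]),
    ("PPh", pph_pos[:, None]),
    ("PG", pg_pos[:, None]),
]
results = layered_contributions(layers, duration)
for name, r2, gain in results:
    print(f"{name:>4}: cumulative R^2 = {r2:.3f}, gain = {gain:.3f}")
```

Because the synthetic predictors are independent, the order in which layers are added barely matters here; with real, correlated predictors the gains depend on the order, which is why entering layers bottom-up mirrors the hierarchy being tested.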


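The four acoustic modules of the multi-phrase speech-flow prosody modular framework are the F0 framework, the rhythm template, the energy distribution pattern, and the break and pause structure. Together they arguably make up the most complete acoustic-phonetic model among current mathematical prosody models. The theory not only explains why phrase intonation is not the largest prosodic unit and how intonations are prosodically associated within discourse, but can also be extended and applied to offer concrete suggestions, both theoretical and practical, for the areas where current TTS (text-to-speech) technology falls short in speech-flow prosody.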
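In addition, in the spirit of sharing academic resources, the fluent-speech prosody databases collected during this period and the perception-based corpus analysis platform we developed have been compiled into a single package named the Sinica Continuous Speech Prosody Corpus and Toolkit (Sinica COSPRO and Toolkit), officially released on January 1, 2006 (see http://www.myet.com/COSPRO). We hope it will promote vigorous and comprehensive development of speech research in Taiwan. The figure below shows the corpus analysis platform we developed: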







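Looking back on how the research direction has shifted over the past decade: taking the speech-flow prosody of Mandarin discourse as spoken in Taiwan as its object, the work has moved from traditional tone research to a comprehensive exploration of speech-flow prosody, with major breakthroughs in both methodology and results. It has developed the methods of corpus phonetics and broadened the perspective of traditional phonetics. From a macro perspective, it proposes a complete hierarchical framework for the prosody of spoken narrative, emphasizes acoustic-phonetic evidence and modules, and explains how the melodic, rhythmic, loudness, and pause structures of speech-flow prosody arise from the layered contributions of each prosodic level, superimposed and integrated into the surface prosody of fluent speech. It also builds mathematical models that contribute directly to speech technology development.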







The major differences that set my research apart from both my predecessors and peers in the field of phonetics are the following:


1. Research Problems and Perspectives
I have been studying fluent continuous speech prosody of Mandarin Chinese from a macro, top-down perspective, taking into consideration units larger than phrase/sentence intonation. This perspective made it possible to bring out the major feature of fluent speech prosody, namely the systematic cross-phrase prosodic association that constitutes the prosodic context, rather than patterns of individual phrase intonation examined in isolation and treated as intonation variants. Based on the quantitative evidence obtained, I was able to postulate a hierarchical prosody framework that describes how spoken discourse is formed in layers. My research also brought out cross-phrase prosodic association from each layer of the prosodic hierarchy in every acoustic parameter. Consequently, I was able to construct the Hierarchical Phrase Grouping (HPG) model and multiple-phrase prosody templates for F0 trajectory patterns, syllable duration patterns, and intensity distribution patterns, as well as boundary properties in relation to boundary breaks (Tseng et al., 2004b, 2005a, 2006a). I have further obtained evidence that these templates are in fact the default base form, a deep structure in the linguistic sense, that applies across prosodic styles and formats (Tseng et al., 2007 and forthcoming). This perspective also allows me to examine boundary information, both in the speech signal and in post-boundary silent pauses, in relation to discourse information, and to ask what significance boundary information bears in speech planning and speech processing (Tseng et al., forthcoming).
2. Quantity of Speech Samples/Data Used
I have collected 10.58GB of speech data since the late 1990s, consisting mostly of readings of texts of various types by various speakers, designed to bring out the acoustic properties of continuous fluent speech prosody while removing factors related to spontaneous speech. Along the way, I have developed annotation systems and a toolkit for acoustic analysis and manipulation (Tseng et al., 1999, 2005b). The result is COSPRO (Sinica Mandarin Continuous Speech Prosody Corpora and Toolkit, http://www.myet.com/cospro; 7.9GB of the 10.58GB annotated), now available to the research community and the public for a fee. I believe the corpora are useful for both research on and teaching of Mandarin (Tseng et al., 2003, 2005b).
3. Research Methodology: Corpus Phonetics Developed to Investigate Acoustic Properties of Continuous Fluent Speech
I have chosen to deal with realistic research problems of continuous speech in large chunks, for example speech paragraphs of over 180 syllables (or 70 seconds, COSPRO 01). I developed experimental methods by integrating engineering- and speech-technology-oriented techniques into acoustic-phonetic investigation, thereby moving phonetics from studying limited samples from a few speakers to multiple speakers and large amounts of data (large by traditional phonetics standards, though perhaps modest to the speech technology community), and from observation and description toward a more scientific methodology. The annotation and quantitative analyses now standard in corpus linguistics were painstakingly adopted before a full-fledged methodology was available. Along the way I had to adjust and develop consistent methods to examine the acoustic-phonetic properties of larger domains and units. Even those who choose not to agree with my perspectives, arguments, or framework would not and could not refute my data. The corpus-phonetics approach I developed and used has thus made replication possible and phonetics a more responsible science.
The Fujisaki Model (Fujisaki, 1984) was adopted to analyze F0 trajectories; my group and I have since made it both analytical and predictive (Tseng et al., 2006b, 2007), thus extending and strengthening it. I have also developed methods to analyze speech rhythm and the loudness of prosodic units (Tseng et al., 2004a, 2005c), and adopted and adapted linear regression analysis to account for the layered, cumulative contributions of each prosodic layer of the HPG hierarchy. The statistical method also made it possible to show the presence of higher-level discourse information in the speech signal and to explain the interaction among prosodic units and layers, making clear why surface intonation variations are not random at all but constrained and defined at higher levels (Tseng et al., 2004b, 2005a, 2006a, 2006e).
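For readers unfamiliar with the Fujisaki model, its standard command-response form models log F0 as a baseline plus phrase components and accent components: ln F0(t) = ln Fb + sum_i Ap_i * Gp(t - T0_i) + sum_j Aa_j * [Ga(t - T1_j) - Ga(t - T2_j)], where Gp(t) = alpha^2 * t * exp(-alpha*t) and Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, both zero before onset. In applications to Mandarin the accent commands serve as tone commands and may take negative amplitudes. The sketch below implements only this textbook formulation; it does not reproduce the analytical and predictive extensions by Tseng et al., and the parameter values (alpha = 3/s, beta = 20/s, gamma = 0.9) and the example commands are commonly used defaults and made-up values, assumed here for illustration.

```python
import numpy as np

def phrase_response(t, alpha=3.0):
    """Phrase control mechanism: Gp(t) = alpha^2 * t * exp(-alpha t) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tp = np.clip(t, 0.0, None)  # clip first so exp() never sees large negative times
    return np.where(t >= 0, alpha**2 * tp * np.exp(-alpha * tp), 0.0)

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent/tone control mechanism: Ga(t) = min(1 - (1 + beta t) exp(-beta t), gamma)."""
    t = np.asarray(t, dtype=float)
    tp = np.clip(t, 0.0, None)
    g = np.minimum(1.0 - (1.0 + beta * tp) * np.exp(-beta * tp), gamma)
    return np.where(t >= 0, g, 0.0)

def fujisaki_f0(t, fb, phrase_cmds, tone_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """Superpose commands in the log-F0 domain and return F0 in Hz.

    phrase_cmds: list of (T0, Ap) impulse commands.
    tone_cmds:   list of (T1, T2, Aa) step commands; Aa may be negative.
    """
    log_f0 = np.full_like(t, np.log(fb), dtype=float)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in tone_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return np.exp(log_f0)

# Hypothetical example: two phrase commands, one positive and one negative tone command.
t = np.linspace(0.0, 2.0, 401)
f0 = fujisaki_f0(t, fb=90.0,
                 phrase_cmds=[(0.0, 0.5), (1.0, 0.3)],
                 tone_cmds=[(0.2, 0.4, 0.4), (0.6, 0.8, -0.3)])
print(f"F0 range: {f0.min():.1f}-{f0.max():.1f} Hz")
```

Working in the log domain makes the components additive, which matches the superposition view of tone and intonation described above.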
By studying Mandarin speech prosody in relation to higher-level discourse information, I have also moved acoustic-phonetic studies of Mandarin Chinese to phenomena other than tones and (phrase/sentence) intonation. The quantitative evidence I have obtained shows that additional discourse information is present in the speech signal, and that such higher-level information can be accounted for statistically by establishing a prosodic hierarchy above the sentence. From the viewpoint of linguistic research, the HPG framework has made it possible to study phonetic information above phrases and sentences; the evidence obtained explains how fluent speech prosody is generated, and together these results make it possible to derive abstract linguistic knowledge from surface speech variation within and between speakers. As a result, surface variations are shown to be not random but systematic and predictable. I believe that, through fluent speech prosody and extensive studies of speech corpora, I have brought forth more linguistic knowledge of and about phonetic facts, thus expanding phonetics from description to explanation and generalization and considerably narrowing the gap between linguistic knowledge and speech facts. I have shown how large quantities of speech data can be used to extract abstract linguistic knowledge in a concrete sense.
4. Interdisciplinary Approach Implemented
Adopting a more technology-oriented approach has allowed me to conduct phonetic investigation (of fluent speech prosody) with clearer goals in mind. A mathematical model was constructed on the basis of the research results and is ready to be tested in speech synthesis and recognition (Tseng et al., 2004c, 2004d, 2004e, 2005a, 2006b, 2006c).


Address
Phonetics Lab, Institute of Linguistics
No. 128, Sec. 2, Academia Road, Nankang, Taipei 11529 Taiwan R.O.C.
Contact
Tel: (886) 02-26525000 ext. 6143
Fax: (886) 02-26525048

Source: http://phslab.ling.sinica.edu.tw/