The Phonetician
A Publication of ISPhS/International Society of Phonetic Sciences
Number 101/102, 2010 – I/II
Cover: Historic larynx models from Franz Wethlo

ISPhS – International Society of Phonetic Sciences

President: Ruth Huntley Bahr
Secretary General: Mária Gósy
Honorary President: Harry Hollien
Vice Presidents: Angelika Braun, Marie Dohalská-Zichová, Mária Gósy, Damir Horga, Eric Keller, Heinrich Kelz, Stephen Lambacher, Asher Laufer, Judith Rosenhouse
Past Presidents: Jens-Peter Köster, Harry Hollien, William A. Sakow †, Martin Kloster-Jensen, Milan Romportl †, Bertil Malmberg †, Eberhard Zwirner †, Daniel Jones †
Honorary Vice Presidents: A. Abramson, S. Agrawal, L. Bondarko, E. Emerit, G. Fant, P. Janota, W. Jassem, M. Kloster-Jensen, M. Kohno, E.-M. Krech, A. Marchal, H. Morioka, R. Nasr, T. Nikolayeva, R. K. Potapova, M. Rossi, R. Weiss, M. Shirt, E. Stock, M. Tatham, F. Weingartner
Auditor: Angelika Braun
Treasurer: Ruth Huntley Bahr

Affiliated Members (Associations): American Association of Phonetic Sciences; Dutch Society of Phonetics; International Association for Forensic Phonetics and Acoustics; Phonetic Society of Japan; Polish Phonetics Association
Affiliated Members (Institutes and Companies): KayPENTAX, Lincoln Park, NJ, USA; Inst. for Advanced Study of the Communication Processes, University of Florida, USA; Dept. of Phonetics, University of Trier, Germany; Dept. of Phonetics, University of Helsinki, Finland; Dept. of Phonetics, University of Zürich, Switzerland; Centre of Poetics and Phonetics, University of Geneva, Switzerland
B. Schouten; P. French; I. Oshima & K. Maekawa; G. Demenko; J. Crump; H. Hollien; A. Braun; A. Iivonen; S. Schmid; S. Vater

International Society of Phonetic Sciences (ISPhS) – Addresses
www.isphs.org

President: Professor Ruth Huntley Bahr, Ph.D.
President's Office: University of South Florida, Dept. of Communication Sciences & Disorders, 4202 E. Fowler Ave., PCD 1017, Tampa, FL 33620-8200, USA
Tel.: ++1-813-974-3182
Fax: ++1-813-974-0822
e-mail: rbahr@usf.edu

Secretary General: Prof. Dr. Mária Gósy
Secretary General's Office: Kempelen Farkas Speech Research Laboratory, Hungarian Academy of Sciences, Benczúr u. 33, H-1068 Budapest, Hungary
Tel.: ++36 (1) 321-4830 ext. 172
Fax: ++36 (1) 322-9297
e-mail: [email protected]

Guest Editor: Dr. Jürgen Trouvain
FR 4.7 Computational Linguistics and Phonetics, Saarland University, Campus C7.2, D-66041 Saarbrücken, Germany
Tel.: +49 (681) 302-4694
Fax: +49 (681) 302-4684
e-mail: [email protected]

Book Review Editor: Prof. Judith Rosenhouse, Ph.D.
Swantech, 89 Hagalil St, Haifa 32684, Israel
Tel.: ++972-4-8235546
Fax: ++972-4-8235546
e-mail: [email protected]

FROM THE PRESIDENT

I hope that you are enjoying the new format of the Phonetician. The ability to include color photographs and graphs makes the text come alive. I am grateful to the individuals who volunteer to edit an issue. Prof./Dr. Mária Gósy is doing an excellent job of recruiting editors; however, we would welcome you to volunteer to edit an issue. The Phonetician would be an excellent way to showcase your area of phonetics and your institute. We all benefit from hearing about each other's work. So, please consider editing an issue for us. A quick email to me or Prof./Dr. Gósy and we will help you get started and guide you through the process. Many thanks go out to Dr. Jürgen Trouvain for editing the current issue. My favorite thing about this issue is the variety of topics covered. The research articles range from a description of long-term formant distributions in read and spontaneous speech to throat singing.
There is a good article on the acoustic-phonetic collection in Dresden, as well as an article on a lesser-studied language, Lower Sorbian. Finally, we have an article in French dealing with prosody. There is definitely something for everyone. We would love to hear your comments on the recent issues of the Phonetician and its new online format.

FROM THE EDITOR

After various guest editorships, this double issue of the Phonetician comes from Saarbrücken. It brings together different research contributions which reflect a large range of the phonetic sciences: from the acoustics of individual speaker characteristics to the physiology of throat singing, from the collection of historical phonetic instruments via the acquisition of a corpus of an endangered language to an experimental study at the syntax-prosody interface. In addition to the research articles, the reader finds conference reports, the presentation of phonetic institutes, book reviews, as well as obituaries. My warm thanks go to all contributors of this issue. I would like to express my gratitude to all colleagues who supported me as a guest editor, be it as a reviewer or in another form.

Jürgen Trouvain
Saarbrücken, March 2012

The Phonetician
A Publication of ISPhS/International Society of Phonetic Sciences
ISSN 0741-6164
Numbers 101/102 / 2010-I/II

Contents

From the President .......... 4
From the Editor .......... 4

Articles and Research Notes
Long-term formant distribution as a measure of speaker characteristics in read and spontaneous speech by Anja Moos .......... 7
On the Physiology of Voice Production in South-Siberian Throat Singing – Extended Abstract by Sven Grawunder .......... 25
The Historical Phonetic-Acoustic Collection of the TU Dresden by Rüdiger Hoffmann & Dieter Mehnert .......... 33
GENIE: The Corpus for Spoken Lower Sorbian (GEsprochenes NIEdersorbisch) by Roland Marti, Bistra Andreeva & William J. Barry .......... 47
Adjectif épithète et attribut de l'objet. Qu'en est-il de la prosodie? by Denis Ramasse .......... 60

Obituaries
Eli Fischer-Jørgensen (1911-2010) by Jack Windsor Lewis .......... 78
Eva Sivertsen (1922-2010) by Jack Windsor Lewis .......... 79
Gösta Bruce (1947-2010) by Merle Horne .......... 80
Ilse Lehiste (1922-2010) by Viola Váradi .......... 83

Awards
Svend Smith Award 2008 for Elisabeth Lhote by Jens-Peter Köster .......... 86

Phonetic Institutes Present Themselves
The Department of Language and Communication Studies at Norwegian University of Science and Technology, Trondheim, Norway by Jacques Koreman .......... 88
Phonetics Lab and the Phonogram Archives at Zurich University, Switzerland by Volker Dellwo & Dieter Studer .......... 91

Conference Reports
Speech Prosody 2010 Chicago (USA) by Stefan Baumann .......... 96
19th Annual Conference of the IAFPA 2010 Trier (Germany) by Peter Knopp .......... 96
New Sounds 2010 – 6th International Symposium of the Acquisition of Second Language Speech Poznań (Poland) by Matthias Jilka .......... 99

Book Reviews
Steve Parker (ed) 2009. Phonological Argumentation. Essays on Evidence and Motivation. reviewed by Péter Siptár .......... 106
Géza Németh & Gábor Olaszy (eds.) 2010. A magyar beszéd. Beszédkutatás, beszédtechnológia, beszédinformációs rendszerek [Hungarian Speech. Speech research, speech technology, speech information systems] reviewed by Péter Siptár .......... 110
Halicki, Shannon D. 2010. Learner Knowledge of Target Phonotactics: Judgements of French Word Transformations. reviewed by Chantal Paboudjian .......... 112

Meetings, Conferences and Workshops .......... 116
Call for Papers .......... 118
Instruction for Book Reviewers .......... 118
ISPhS Membership Application Form .......... 119
News on Dues .......... 120

LONG-TERM FORMANT DISTRIBUTION AS A MEASURE OF SPEAKER CHARACTERISTICS IN READ AND SPONTANEOUS SPEECH

Anja Moos
GULP (Glasgow University Laboratory of Phonetics) and School of Psychology, University of Glasgow, UK
e-mail: [email protected]

Abstract
The simple method of averaging formant values of a recording of a speaker, known as Long-Term Formant Distribution (LTF), is applied here to German speech in the context of forensic speaker identification. Introduced by Nolan and Grigoras (2005), the advantage of LTF is that it is not necessary to categorize and label each vowel produced. Instead, for each speaker, the formants of all vocalic portions are averaged, thus leading to one mean value per formant. The volume of speech data necessary to attain reliable LTF values is also examined. LTF values of 71 German-speaking males in spontaneous and read speech recorded via mobile phone connections were analysed. Good speaker characterisation is possible using the LTF values of F2 and F3; LTF values of F3 seem slightly more useful because F3 is less variable within speakers than F2. Comparison of spontaneous and read speech revealed significant differences between the LTF values of F2 and F3 of the two speaking styles. The LTF values of formants of read speech are higher. As LTF values only return the average and standard deviation of formants, they are not suitable for speaker recognition on their own. However, LTF is independent of many other measures of a speaker, such as speaking rate, dialect, and fundamental frequency. Therefore, LTF values can be used as an additional independent factor in speaker recognition.

Keywords
Long-term formant distribution, LTF, read vs. spontaneous speech, mobile phone recordings, speaker comparison

Definition of LTF
Long-Term Formant Distribution (LTF) is a method used to determine average formant values of a speaker. For each formant, all formant measurements of all vowels produced by a speaker are averaged (across the entire recording or appropriate sub-portions of a recording). This average is the LTF value for this formant. That means that every speaker has one LTF value and a standard deviation (SD) per formant, which shall be called LTF1, LTF2 and so on. It is a frame-by-frame measurement, meaning that long vowels carry more weight than short vowels (see the sketch below).

1. Introduction
To identify a speaker by his or her phonetic speaker characteristics, various acoustic and auditory measures are taken into account. According to Jessen (2007), auditory measures such as estimation of age, health, sex, dialect and sociolect mostly refer to group characteristics, whereas fundamental frequency, articulation rate, formants and voice quality, which are often measured both acoustically and aurally, are more speaker-specific. This paper focuses on formants, as the importance of and interest in formant measures for forensic cases grows.
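To make the frame-by-frame averaging in the definition above concrete, the following is a minimal sketch of an LTF computation, not the exact procedure used in this study. It assumes that the vocalic portions have already been selected and that a formant tracker (WaveSurfer was used here; other tools would also do) has produced one F1/F2/F3 measurement per 10 ms frame; the function name and data layout are illustrative only.

# Minimal sketch of an LTF computation (illustrative, not the study's exact code).
# Input: a list of (f1, f2, f3) tuples, one per 10-ms analysis frame taken from
# the concatenated vocalic stream of one speaker. Long vowels contribute more
# frames and therefore carry more weight, as in the definition above.
from statistics import mean, stdev

def ltf(frames):
    """Return per-formant LTF means and standard deviations."""
    results = {}
    for i in range(3):                          # F1, F2, F3
        values = [frame[i] for frame in frames]
        results[f"LTF{i+1}"] = (mean(values), stdev(values))
    return results

# Example with three dummy frames (values in Hz):
print(ltf([(480, 1350, 2400), (500, 1500, 2350), (470, 1420, 2430)]))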
Many studies in the last decade have shown that formants carry speaker-specific information and that their analysis is also possible under forensic conditions, i.e. given poor quality and the band-pass filtering of phone recordings (see Rose, 2006; Nolan, 2002; Byrne & Foulkes, 2004). This paper follows Nolan & Grigoras (2005), who state: "It is argued here that formants, whose frequencies and dynamics are the product of the interaction of an individual vocal tract with the idiosyncratic articulatory gestures needed to achieve linguistically agreed targets, are so central to speaker identity that they must play a pivotal role in speaker identification." (Nolan & Grigoras, 2005: 143) Of course formants of different people are not unique; but when combined with other speaker characteristics listed above, they may lead to a very idiosyncratic speaker description. Each additional independent feature can help to identify a speaker.
The most commonly used method for formant measures in forensic phonetics to date is the centre frequency of different vowels (cf. Jessen, 2008; Rose, 2002). Here, formants are measured at the midpoint, which is defined as the articulatory target of the vowel produced. Usually one tries to find a number of representatives of a couple of different vowels, mostly /i a o/, to compare their formant values from the suspect with those of the perpetrator. Comparison of vowels in speech can be problematic using this method as the context influences the formants. It might also be difficult to define vowel phonemes in general or their centre frequency in particular when dealing with a foreign language and/or poor recording quality. Although it is an accurate method, it is very time-consuming.
Another method is the study of formant dynamics. McDougall did this for /aI/ (McDougall, 2004) and /u/ (McDougall & Nolan, 2007). They found within-speaker consistency and between-speaker differences in the data and argued that more attention should be paid to the development of techniques to measure dynamic features (McDougall, 2006). However, this method bears unknown effects of the vowel context and further research is necessary.
Long-term spectra (LTS) are also used to show formant average distributions (see e.g. Nolan & Grigoras, 2005; Hollien, 1990). An LTS is the average of all spectral slices of a sound sample. As well as voiced speech, LTS takes everything else in the signal into account, including voiceless portions of speech, background noise etc.
Long-Term Formant Distribution (LTF) was developed by Nolan and Grigoras (2005) in order to address the flaws of the single vowel phoneme measures and LTS. This method does not require a categorization of vowels; instead, every vowel is used for the measurements. It is also less time-consuming to select all vowels by reading the spectrogram rather than carefully listening to the file repeatedly to detect single vowel phonemes. In addition to saving time, being easy to use and suitable for foreign languages, Nolan & Grigoras (2005) mention two more benefits. First, the distribution of the formants not only reflects the dimensions of the vocal tract but also shows habits in articulatory settings like palatalization or lip rounding. Second, the shape of the distribution of a formant might show useful information about the speaker insofar that a broad-peaked or narrow-peaked distribution might reflect the speakers' vowel space.
The disadvantages of LTF are that inter-individual differences on single vowels cannot be detected and speech dynamics like transitions and coarticulation are lost.
The work of Nolan & Grigoras (2005) showed the benefits, usefulness and efficiency of the LTF method on an English forensic case. This study will show its applicability to German and also provide information on the following aspects:
● Testing for correlation of LTF values with the fundamental frequency, articulation rate, and dialect groups. If they correlate, it is not necessary to use LTF in addition because no further information is gained. If they do not correlate, LTF can be used as an independent measure that adds further information to the characterisation or discrimination.
● Determination of how many seconds of vocalic stream or of speech recordings are needed to derive reliable LTF values. This is an essential issue for forensic case work because voice recordings are often limited in duration.
● Comparison of different speaking styles (read and spontaneous speech). It is important to know whether, and to what degree, recordings of the same voice differ in their LTF values between speaking styles so that it can be determined whether spontaneous speech of a perpetrator can be reliably compared with read speech of a suspect.
● Creation of a reference database for German LTF values comprising 71 speakers. This will be useful for future use in Bayesian methods like the likelihood ratio (see Jessen, 2008; Morrison, 2009; Rose, 2002 for usage of likelihood ratio in forensic speaker comparison).

2. Methods
2.1. Data
Recordings of the speech corpus "Pool2010" (Jessen et al., 2005) were used. From this German corpus, recordings of 71 male participants who read out the German version of "North wind and the sun" were used for this experiment. For spontaneous speech, participants were asked to describe objects to another person without using predefined words, similar to the game "Taboo". The person guessing the object played ignorant to encourage the speaker to describe the items more extensively, thereby triggering longer stretches of spontaneous speech. All the recordings were made in high studio quality and later played back through speakers and re-recorded through mobile phones to have data close to forensic case data. The mobile phone data was used for this experiment.
The recordings of spontaneous speech were 79-313 seconds long (M = 178 seconds). Recordings of the read story were 31-54 seconds long (M = 39 seconds). For the LTF analysis, recordings were cut in a way described in section "2.3 Data preparation" below, so that only vowels remained. After that, the vocalic stream of spontaneous speech was 12-83 seconds (M = 40 seconds), and the vocalic stream of read speech was 8-16 seconds (M = 12 seconds). In total, 142 sound files were used (71 speakers × 2 speaking styles).

2.2. Speakers
Recordings of 71 male German speakers were used. Speakers were 25 to 55 years of age (M = 38 years). Roughly half of them had recognizable but generally weak dialectal features of Hessian German ('Hessisch'); 45 of the participants were actually from that area. The remaining participants were from other parts of Germany. None of the speakers had heavy dialectal features, and everyone had an average or above-average educational background. No noticeable speech or voice disorders were present. Speaker IDs ranged from 35-107 (excluding 61 because of lack of data); speakers will be referred to later in the text by their IDs.
2.3. Data preparation
For LTF, only the vocalic stream is used (i.e., every recording was cut in such a way that only vowel sounds remained). WaveSurfer (Sjölander & Beskow, 2005) was used for the cutting procedure. The selection process was based on several criteria:
● Clear and visible formant structure of the first three formants (intensity settings were sometimes increased to find F3, especially for back vowels which tend to have a higher spectral tilt) (1)
● Laterals and approximants were kept
● Filled pauses and hesitations were kept if vocalic
● Creaky voice was kept if vocalic
● No nasals or strong nasality (because of zero formants at 2-3 kHz)
● No vowels spoken with a very high pitch so that harmonics rather than formants were visible
This procedure resulted in sound files of pure vocalic stream, without any pauses or consonants other than those stated above. These criteria were applied while reading the spectrogram and deleting all unwanted regions. When it was unclear whether nasality was present or not from reading the spectrogram, additional auditory judgements were made.
(1) Because the formant structure of laterals and approximants is very similar to vowels, they were kept. It doesn't distort the data but saves working hours if no auditory inspection is needed after visual selection of the vocalic stream.

2.4. Data analysis
The cut sound files were used for the formant measurements of F1, F2 and F3 with WaveSurfer. The automatic formant tracking was set to four formants, an LPC order of 12, a frame interval of 0.01 seconds and a nominal F1 of 500 Hz. Recordings were down-sampled to 10 kHz. Usually the bandwidth of telephone recordings (roughly 300 Hz to 3-4 kHz) does not display F4 correctly or at all because of the upper cut-off frequency. However, without a fourth dummy formant, the tracking of F2 and F3 was often found to be unreliable, so it was kept. Every file was manually checked and corrected if necessary. This correction was needed because, due to the cutting procedure, samples could contain jumps (e.g., from /i/ to /u/) without the usual formant transition. The prediction algorithm would find such unnatural jumps quite problematic.

3. Results
3.1. General results for LTF
Figure 1 presents individual LTF2 and LTF3 values for every speaker for spontaneous (Figure 1a) and read speech (Figure 1b). LTF1 is not shown as it is too error-prone due to the lower cut-off frequency in mobile phone transmission. Byrne & Foulkes (2004) showed that F1 on average shifts 29 % in mobile phone recordings compared to direct high-fidelity recordings. Table 1 lists the average LTF values for every formant and speaking style averaged across all speakers. Both figures and the table show that LTF values are higher for read than for spontaneous speech. A t-test for paired samples showed that this difference is significant for all formants: t=-6.016, p<0.0001 for LTF1; t=-11.449, p<0.0001 for LTF2; t=-6.917, p<0.0001 for LTF3. Regarding the within-speaker comparison, Figure 1a shows that hardly any LTF2 in read speech was lower than for spontaneous speech. Only very few LTF3 values for read speech were lower than for spontaneous speech, as Figure 1b displays.
[Figure 1: two panels plotting LTF2 and LTF3 (Hz, roughly 1200-2700) per speaker for read and spontaneous speech (series F2_read, F2_spont, F3_read, F3_spont), with speaker IDs on the x-axis; (a) speakers ordered by ascending spontaneous LTF2, (b) speakers ordered by ascending spontaneous LTF3.]
Figure 1. LTF2 and LTF3 of every speaker in read and spontaneous speech. Speakers ordered by ascending LTF values of spontaneous speech.

Table 1. LTF values in Hz for spontaneous and read speech and their standard deviations (SD), averaged across all speakers.

            LTF (Hz)   SD (Hz)
F1_spont    470        24
F1_read     484        21
F2_spont    1400       79
F2_read     1463       70
F3_spont    2378       128
F3_read     2422       125

3.2. Between-speaker comparison
Speaker-specific features can be identified in the distribution of LTF values, as well as their mean value. Figure 2 shows the distribution of F2 and F3 for two speakers with very different LTF values at the top, and two different speakers with very similar values at the bottom. As the top graph shows, speakers not only differed in their LTF mean value (with up to 500 Hz difference), but also in the distribution. While the distribution of speaker 44 is more platykurtic (broad peak), the distribution of speaker 95 is more leptokurtic (narrow peak). As the bottom graph shows, both speakers have a double-peak distribution for F3, but their main peaks lie 250 Hz apart while having very similar F2 distributions. While this is not a very distinctive feature, it would still raise some doubt whether these two distributions are from the same speaker or not.

[Figure 2: overlaid F2 and F3 distributions, shown as the percentage of frames per 125-Hz bin between 600 and 3350 Hz; top panel: speakers 44 and 95, bottom panel: speakers 35 and 66.]
Figure 2. F2 and F3 distributions of two speakers producing spontaneous speech in comparison. Top: Clearly distinguishable formant distributions of speaker 44 and 95. Bottom: Similar formant distributions of speaker 35 and 66.

In comparison, Figure 3 (top) shows the distribution of F2 and F3 for speaker 44 only, with the recording of his spontaneous speech divided into two halves. The same was done for speaker 35 in Figure 3 (bottom). For speaker 44, the distributions of F2 and F3 are very similar in the two parts of his spontaneous speech; however, there is a peak shift of 125 Hz for F3. No differences of the distributions of F2 and F3 were found for speaker 35, indicating no within-speaker differences for spontaneous speech.

Figure 3. F2 and F3 distributions of speaker 44 (top) and 35 (bottom) producing spontaneous speech; first half of recorded speech in black, second half in grey.

To sum up, it can be very useful to look at the distribution of F2 and F3 for speakers with very similar LTF means because their distributions can be manifold: They can be single- vs. double-peaked and/or lepto- vs. platykurtic, and these distribution shapes seem to be stable within speakers but can vary between speakers.

3.3. Within-speaker comparison
3.3.1. Effect of speaking style on mean LTF
Recordings of the perpetrator are sometimes compared to recordings of the suspect reading what has been said by the perpetrator during the crime. For this reason it is important to know whether, and to what degree, recordings of the same voice differ in their LTF values between spontaneous and read speech. To determine whether spontaneous speech of the perpetrator can be reliably compared with read speech of the suspect, LTF values within speakers were analysed across speaking styles.
Table 2 shows the results of a t-test for paired samples. LTF values of spontaneous and read speech were paired for every speaker. A negative mean value indicates that spontaneous speech has lower values than the read speech. This is the case for all three formants. The mean difference is given in Hz, so LTF2 of spontaneous speech is 62.21 Hz lower than that of read speech and is the formant with the largest difference between speaking styles. Given a mean difference of -62.21 Hz, the standard deviation (here 45.78 Hz) indicates that the LTF2 difference of 68 % of the speakers lies between -107.99 and -16.43 Hz. These numbers were derived like this:
(1) -62.21 – 45.78 = -107.99
(2) -62.21 + 45.78 = -16.43
A positive frequency indicates that some speakers of the typical 68 % have a higher LTF in spontaneous speech. This is the case for LTF3, where the mean difference ± SD of read and spontaneous speech ranges between -97.6 and +9.6 Hz. All the differences are highly significant (p<0.0001). As already mentioned, LTF1 should not be taken as a reliable measure because of the lower cut-off frequency of the mobile phone transmission. LTF3 seems most reliable to use for speaker identification because it shows less difference between speaking styles and is less influenced by the mobile phone bandwidth than LTF1.

Table 2. t-test for paired samples. Pairs: LTF values of read and spontaneous speech of every speaker. SD = standard deviation, SE = standard error. All T values are significant with p<0.0001.

        paired differences
        mean      SD       SE      T         df
LTF1    -14.08    19.72    2.34    -6.016    70
LTF2    -62.21    45.78    5.43    -11.449   70
LTF3    -44.00    53.60    6.36    -6.917    70

Despite the fact that LTF values differ significantly across speaking styles, they can still correlate strongly. In this case, a stable difference in their values can be assumed. To find correlations, a Pearson product-moment correlation for interval-scaled data was conducted between all LTF values across all speakers to look at relationships of formant-specific LTF values. Table 3 shows the statistically significant correlations between LTF values. Correlations between LTF values of the same formant across the two speaking styles were stronger (indicated in bold print) than correlations within the same speaking style across different formants. The strongest correlation was for LTF3, which is known to be the most stable formant within a speaker.

Table 3. Pearson product-moment correlation of all LTF values. All r values are significant with p<0.01. n.s. = not significant.

            LTF2spon   LTF3spon   LTF1read   LTF2read   LTF3read
LTF1spon    0.395      n.s.       0.615      0.370      n.s.
LTF2spon    1          0.514      0.400      0.819      0.484
LTF3spon    0.514      1          n.s.       0.502      0.910
LTF1read    0.400      n.s.       1          0.377      n.s.
LTF2read    0.819      0.502      0.377      1          0.575

These correlations were made using the data of all the speakers. Correlations of individuals may vary, so these r values can only be used as guide values.
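As an illustration of the statistics reported in Tables 2 and 3, the sketch below shows how a paired t-test and a Pearson correlation of per-speaker LTF values could be computed. It is a minimal example assuming two arrays with one LTF2 value per speaker (read and spontaneous); the dummy data and names are illustrative, not the original analysis script.

# Minimal sketch of the paired t-test (Table 2) and Pearson correlation (Table 3),
# assuming ltf2_read and ltf2_spont each hold one LTF2 value (Hz) per speaker.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ltf2_read = rng.normal(1463, 70, 71)             # dummy data in place of the real values
ltf2_spont = ltf2_read - rng.normal(62, 46, 71)  # spontaneous values shifted downwards

t, p = stats.ttest_rel(ltf2_spont, ltf2_read)    # paired samples, as in Table 2
r, p_r = stats.pearsonr(ltf2_spont, ltf2_read)   # cross-style correlation, as in Table 3
print(f"t = {t:.3f}, p = {p:.4f}; r = {r:.3f}")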
Combining the results of the t-test and the Pearson correlation, it was found that there is a stable difference in LTF insofar that read speech produces mostly higher LTF values than spontaneous speech. The scatter plot in Figure 4 shows the downshift of LTF2 and LTF3 from read to spontaneous speech in the F2-F3 vowel space. The LTF values of every speaker are connected with a grey arrow indicating the direction of change from read (red circle) to spontaneous (blue x) speech. The general trend leads to a lower LTF2 and LTF3, but some speakers also show upward shifts or downward shifts of only one formant; only three speakers show an upward shift of both LTF2 and LTF3.

Figure 4. Scatter plot of LTF2 and LTF3 of all speakers. Circle = read speech, x = spontaneous speech. Values of every speaker connected through grey arrows.

3.3.2. Effect of speaking style on formant distribution
When investigating the distributions of read and spontaneous speech within one speaker, it is not only interesting to see the differences between the LTF means but also the distributions. Are they similar apart from a little upward shift? No clear answer can be given, as shown in Figure 5. While speaker 35 has very different F3 distributions across speaking styles, the F3 distribution of speaker 100 is nearly identical. This raises problems discussed earlier in section "3.2 Between-speaker comparison". When the mean is similar but the distribution different, it is still not clear whether the samples are from different speakers or whether the same speaker is using different speaking styles.

Figure 5. F2 and F3 distributions of spontaneous and read speech in comparison. Spontaneous speech in black, read speech in grey. F2 solid line, F3 dashed line. Top: Speaker 35. Bottom: Speaker 100.

3.4. Amount of data necessary for LTF
One of the most important questions regarding LTF values for forensic phonetics use is: How much speech data is necessary to get reliable LTF measurements? The amount of data is crucial because an LTF value is only meaningful if enough (different) vowels are used. In a sample of 2 seconds of pure vocalic stream, for example, it might well be that only /e/, /a/ and /ə/ are present and this would skew the data towards the open front side of the vowel quadrilateral and therefore not represent the vowel space of a speaker. Because in most forensic cases there will not be extensive recordings to extract many seconds of vocalic stream, it is necessary to find out whether short recordings are sufficient.
For this, each LTF sound file was divided into packages. Each package represents a short sound file of one speaker. If the LTF values of the packages (of one sound file) do not differ much from each other, it is assumed that this size is sufficient to get reliable LTF data. The difference between packages was detected by calculating the standard deviation between packages. These calculations were made with various package sizes to detect the threshold package size (because the package size is an approximation of the length of vocalic stream needed to get reliable LTF values). Every sound file was divided into packages of 1, 1.5, 2, 2.5 … 10 sec. The average number of packages per size per speaker is listed in Table 4. Within each package size, the LTF package values were taken, and a standard deviation was determined for every speaker separately and for every package size.
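The package procedure just described can be sketched as follows. This is a simplified illustration under the assumption that the vocalic stream is available as a frame-wise formant track (one measurement per 10 ms), so that a package of, say, 2 seconds corresponds to 200 consecutive frames; function and variable names are illustrative only.

# Sketch of the package analysis: split one speaker's vocalic formant track into
# packages of a given duration, compute the LTF (mean) of each package, and take
# the standard deviation of those package means as a stability measure.
import numpy as np

FRAME_DUR = 0.01  # seconds per formant frame, as in the paper's analysis settings

def package_sd(track, package_size_sec, formant=3):
    """SD of per-package LTF values for one formant (1, 2 or 3)."""
    frames_per_package = int(package_size_sec / FRAME_DUR)
    values = track[:, formant - 1]
    n_packages = len(values) // frames_per_package
    package_means = [values[i * frames_per_package:(i + 1) * frames_per_package].mean()
                     for i in range(n_packages)]
    return np.std(package_means, ddof=1) if n_packages > 1 else np.nan

# Example: a dummy 40-second vocalic stream (4000 frames x 3 formants)
track = np.random.default_rng(1).normal([470, 1400, 2378], [60, 180, 200], (4000, 3))
for size in (1, 2, 4, 6, 8, 10):
    print(size, "s:", round(package_sd(track, size, formant=3), 1), "Hz")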
As the package size increases, the number of packages naturally decreases, so the standard deviation might be influenced by size and, therefore, the number of packages. On the other hand, in bigger packages there is much more variation within a package, so LTF values do not differ much any more and not many packages are needed to get a stable SD.

Table 4. Average number of packages per speaker used to calculate standard deviations. Top: spontaneous speech. Bottom: read speech. Package size in seconds.

Spontaneous speech
package size          1     1.5   2     2.5   3     3.5   4     4.5   5     5.5
number of packages    39.9  26.5  19.6  15.7  13.0  11.0  9.6   8.5   7.6   6.8
package size          6     6.5   7     7.5   8     8.5   9     9.5   10
number of packages    6.3   5.8   5.3   5.0   4.7   4.3   4.1   3.9   3.8

Read speech
package size          1     1.5   2     2.5   3     3.5   4     4.5   5     5.5
number of packages    11.5  7.5   5.5   4.4   3.6   3.0   2.6   2.1   2.1   2.0
package size          6     6.5   7     7.5   8
number of packages    2.0   2.0   2.0   2.0   2.0

If the standard deviation asymptotically reaches a constant, package sizes do not differ much anymore and it can be assumed that the amount of data of this package size is enough to get reliable LTF values. Figure 6 shows the course of the SD curves across the different package sizes. The x-axis lists the package sizes and the y-axis the SD. It is shown that for both read and spontaneous speech, the SD was smallest for LTF1 and largest for LTF2. LTF1 is not very meaningful because the lower cut-off frequency of mobile phone transmission shifts the formant values in unpredictable ways and amounts, mostly upwards. LTF3 has a smaller SD and is regarded as being more speaker-specific (see Rose, 2002, p. 237; Ladefoged, 2001, p. 194). It is therefore best to work with LTF3. For spontaneous speech, LTF3 seems to become stable at a package size of 6 seconds (see Figure 6a), which equals about 27 seconds of spontaneous speech dialogue recording. It has to be noted that the curve does not seem to have reached its asymptotical level but, nonetheless, there is very little change in its course anymore.

[Figure 6: two panels plotting the standard deviation (Hz, 0-140) of F1, F2 and F3 against package size (sec); top panel: spontaneous speech, package sizes 1-10 s; bottom panel: read speech, package sizes 1-8 s.]
Figure 6. Standard deviation of package sizes averaged across all speakers. It can be assumed that enough data is collected to get reliable LTF values when the curve reaches an asymptotical level.

For read speech, the LTF3 threshold is difficult to detect. It might be at 5 seconds, equal to about 16 seconds of read speech recording. Empty symbols were used in Figure 6 for read speech at a package size of 7 seconds or larger because they were only based on 6, 4 and 1 speaker(s) respectively. The other speakers produced passages too short to be divided into packages larger than 6.5 seconds. As the reading passage was not very long, very few speakers produced vocalic data of that length and it cannot be assumed that the data represented by the empty symbols act in a typical way. For LTF2 a package size of 4.5 seconds seems to give reliable LTF data in read speech (equivalent to about 14.5 seconds of read speech recording). For spontaneous speech, the threshold is also difficult to detect. The safest choice is the package size of 9 seconds (equivalent to about 50 seconds of spontaneous dialogue recording) but 5.5 seconds (≈25 sec) seems to be a justifiable choice as well. In sum, LTF values of speech samples with at least 6 seconds of pure vocalic stream can be considered reliable.
This estimation is based on the average behaviour of all speakers. There can sometimes be large variation between speakers as to the threshold of sufficient LTF data (see Moos, 2008, Figure 3.15).

4. Discussion
In this study, LTF has been shown to be a valuable measure for speech comparison and can aid in speaker identification. Some speakers had very similar LTF values, but the distribution of the formants may vary, resulting in leptokurtic, platykurtic or double-peaked curve shapes. Other speakers had easily distinguishable distributions with clearly distinct means.
Within-speaker comparisons of speaking style revealed that read speech had significantly higher LTF values than spontaneous speech. It is unclear whether this upward shift is a shift or an expansion of the vowel quadrilateral. Hyper-articulation in read speech would explain an expansion of the vowel space (an expansion would also result in an upward shift of LTF because front and open vowels are used more often than close back vowels in German, see Simpson, 1998). But, as the SD remained constant (see Table 1), a simple upward shift rather than an expansion is assumed (for an expansion, the SD would increase as well). Despite the shift, LTF values within formants correlated strongly across speaking styles. The curve distribution within speakers across speaking styles can also vary in different ways but generally does not show drastic changes and shifts.
LTF is a measure of speaker characterisation that is independent of f0, dialect and speech rate; Moos (2008) showed no correlation between LTF and these measures using a dataset common to both studies. One aspect that could not be covered in this article is the correlation between LTF and the physiognomy of the speakers (e.g., body height). Several studies found weak negative correlations between body height and formant measures (Greisbach, 1999 for German; Gonzalez, 2004 for Spanish; Rendall et al., 2005 for Canadian English, but only for males). The same was found by Jessen (2010) using the same data the current study is based on. The size of the vocal tract might be a mediator of these correlations. Although no clear assumptions can be drawn from weak correlations, it is very unlikely that someone with high formants will be tall and that someone with low formants will be small.
Before working with LTF measures, it is very important to know whether one has a sufficient amount of data. Because LTF is an average of all vowels produced in a speech sample, short samples are not suitable for this measurement. By dividing the given samples into smaller packages, it was estimated that roughly 6 seconds of pure vocalic stream (equivalent to 27 seconds of dialogue or 19 seconds of read speech) are, on average, enough to produce reliable LTF values.
An important aim of this work was to create a reference database for LTF to work towards probability statements using likelihood ratios (LR). In court, evidence has to be weighed, and probabilities have to be given in a strength-of-evidence statement: how similar or different are the LTF values of two voice samples, and how typical are they, i.e. do many people of the population have those LTF values? (See Jessen, 2008; Morrison, 2009; Rose, 2002 to learn about LR in forensic speaker comparison.) To be able to give a strength-of-evidence statement (i.e., to be able to say how much more likely it is that two LTF values are from the same or different speakers), the creation of a reference database is essential.
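The likelihood-ratio reasoning referred to above can be summarised with the standard forensic formula; this is the general LR definition used in the cited literature, not a formula specific to this paper:

\mathrm{LR} = \frac{p(E \mid H_{\mathrm{same\ speaker}})}{p(E \mid H_{\mathrm{different\ speakers}})}

The numerator expresses how probable the observed similarity of the two LTF measurements is if they come from the same speaker, while the denominator expresses their typicality in the relevant population, which is what a reference database such as the one described below is needed to estimate.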
If there are, for example, two very similar LTF values of a suspect and the perpetrator, it does not necessarily mean that they are from the same speaker; if the LTF values are very typical in the population, there is relatively more evidence that they are from different speakers than if they are very atypical (e.g., very low or high). An LTF database was constructed from 71 German speakers producing read and spontaneous speech recorded through mobile phone transmission as part of this work. This database, which enables such likelihood ratio statements, is more extensively described in Moos (2008).
Prospects for future work are to compare the mobile phone data with high-fidelity recordings, which exist for the data that has been used here as well. Another interesting investigation would be to explore the influence telephone bandwidth has on LTF values. The results could then be compared with those of Byrne & Foulkes (2004), with the advantage that the same speech data was used for both hi-fi and mobile phone qualities. Comparisons across different languages should also be made to investigate whether LTF measures of recordings of one person speaking different languages can be reliably compared. A further important test concerns the reliability of LTF measures across different phoneticians taking the measures: will every expert include and exclude the same vocalic portions and hence produce the same data for analysis? The same question can be applied to different formant tracking algorithms used in different programmes like Praat, WaveSurfer, Emu, etc. Statistical measures to evaluate the amount of LTF data necessary to be reliable would improve the validity of the prediction. Research is currently being undertaken to answer many of these questions and will hopefully give insight into these neglected areas of LTF research.

References
Byrne, C. & Foulkes, P. (2004). The 'mobile phone effect' on vowel formants. International Journal of Speech, Language and the Law 11(1), pp. 13501771.
Greisbach, R. (1999). Estimation of speaker height from formant frequencies. Forensic Linguistics 6(2), pp. 265-277.
Gonzalez, J. (2004). Formant frequencies and body size of speaker: A weak relationship in adult humans. Journal of Phonetics 32(2), pp. 277-287.
Hollien, H. (1990). The Acoustics of Crime: The New Science of Forensic Phonetics. New York: Plenum Press.
Jessen, M. (2007). Speaker classification in forensic phonetics and acoustics. In: C. Mueller (ed): Speaker Classification I, pp. 180-204. New York, Berlin: Springer.
Jessen, M. (2008). Forensic phonetics. Language and Linguistics Compass 2(4), pp. 671-711.
Jessen, M. (2010). The forensic phonetician. Forensic speaker identification by experts. In: M. Coulthard & A. Johnson (eds): The Routledge Handbook of Forensic Linguistics, pp. 378-394. London, New York: Routledge.
Jessen, M., Köster, O. & Gfroerer, S. (2005). Influence of vocal effort on average and variability of fundamental frequency. International Journal of Speech, Language and the Law 12(2), pp. 174-213.
Ladefoged, P. (2001). A Course in Phonetics. USA: Heinle & Heinle.
McDougall, K. (2004). Speaker-specific formant dynamics: An experiment on Australian English /ai/. International Journal of Speech, Language and the Law 11(1), pp. 103-130.
McDougall, K. (2006). Dynamic features of speech and the characterization of speakers: Towards a new approach using formant frequencies. International Journal of Speech, Language and the Law 13(1), pp. 89-126.
McDougall, K. & Nolan, F. (2007). Discrimination of speakers using the formant dynamics of /u:/ in British English. Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken, Germany, pp. 1825-1828.
Moos, A. (2008). Forensische Sprechererkennung mit der Messmethode LTF (long-term formant distribution). Unpublished Master thesis (Magisterarbeit), Saarbrücken, Universität des Saarlandes. http://www.psy.gla.ac.uk/docs/download.php?type=PUBLS&id=1286 (accessed 17/08/2010).
Morrison, G. (2009). Forensic voice comparison and the paradigm shift. Science & Justice 49(4), pp. 298-308.
Nolan, F. (2002). The 'telephone effect' on formants: A response. Forensic Linguistics 9(1), pp. 74-82.
Nolan, F. & Grigoras, C. (2005). A case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law 12(2), pp. 143-173.
Rendall, D., Kollias, S., Ney, C. & Lloyd, P. (2005). Pitch (F0) and formant profiles of human vowels and vowel-like baboon grunts: The role of vocalizer body size and voice-acoustic allometry. The Journal of the Acoustical Society of America 117(2), pp. 944-955.
Rose, P. (2002). Forensic Speaker Identification. London: Taylor & Francis.
Rose, P. (2006). Technical forensic speaker recognition: Evaluation, types and testing of evidence. Computer, Speech and Language 20, pp. 159-191.
Simpson, A. (1998). Phonetische Datenbanken des Deutschen in der empirischen Sprachforschung und der phonologischen Theoriebildung. Habilitationsschrift, Christian-Albrechts-Universität zu Kiel.
Sjölander, K. & Beskow, J. (2005). WaveSurfer 1.8.5, Stockholm, KTH Royal Institute of Technology. Software available online: http://www.speech.kth.se/wavesurfer/index.html (accessed 06/10/2007).

ON THE PHYSIOLOGY OF VOICE PRODUCTION IN SOUTH-SIBERIAN THROAT SINGING – EXTENDED ABSTRACT

Sven Grawunder
Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
e-mail: [email protected]

This paper is an extended abstract of a PhD project that was finished in 2005 and published as a book (Grawunder, 2009). The project represents the first fieldwork-based phonetic study of the extraordinary voice production mechanisms that occur in throat singing. Throat singing (ThS) is practiced in four areas in South Siberia: the Republic of Tuva, the Republic of Hakassia, the Republic of Gorno-Altai as well as parts of the Russian Federation and adjacent Mongolia. ThS is a defined genre among and intertwined with other oral folk-arts and singing types, and it is distinct from Western overtone singing. Like Western overtone singing, South-Siberian ThS uses reinforced harmonics as carriers of sung melodies and enforced phonation modes. However, different from Western overtone singing, such targeted use of harmonics appears as common but not essential to ThS in the conceptions of singers (cf. van Tongeren, 2002; Grawunder, 2003b). There are two (sometimes three) main styles with regard to voice use, as discussed by singers and ethnomusicologists (cf. Kyrgys, 2002): first, a tensed medial (chest-) register voice and, second, a raspy growling low-register voice. Often these voice registers are referred to in the literature with the Tuvan style names khöömei and kargyraa, respectively. Including a small-scale endoscopic study of one singer (the author), which contributes to the few available articulatory studies of throat-singers
(e.g. Dmitriev et al., 1983, Edgerton, 2005, Grawunder, 2003a, 2003b, Lindestad et al., 2001, 2004, Sakakibara et al., 2002), the laryngoscopic evidence suggests that throat-singers make use of three voice production mechanisms. All mechanisms share an excessive constriction of the larynx entrance resulting, at various levels, in an approximation of the aryepiglottic folds and the epiglottis. Therefore the study focuses on phonation types which result, in addition to the normal activity of the vocal folds (VF), from various combinations of phonation activities involving the aryepiglottic sphincter chain (AES), the ventricular folds (VTF) and sometimes even the aryepiglottic folds (AEF).
Two main types are therefore proposed for voice production in South-Siberian throat singing: a voice production by means of the vocal folds featuring a constriction of the AES (Phonation Mode 1, henceforth PM1), and a voice production with involvement of the ventricular folds (Phonation Mode 2, henceforth PM2). VTF involvement in PM2 appears as a double cyclic period, with the vocal folds vibrating twice as fast as the ventricular folds; every second cycle consists of a (near-)synchronous closure of VF and VTF (cf. Bailly et al., 2010). A third proposed mechanism for PM2 is the involvement of the AEF (cf. Sakakibara, 2004), similar to epiglottic trill (Esling et al., 2007). On the one hand, the mechanisms of the constriction of VTFs and AEFs are discussed with respect to histoanatomical findings of muscular tissue that facilitates the medial compression of the VTFs (Kotby et al., 1991; Reidenbach, 1998a) as well as with respect to the findings of muscular and ligamentous components for an AEF-sphincter framework (Reidenbach, 1998b) that takes part in the anterior-posterior constriction by means of the AES. On the other hand, the constriction mechanisms are discussed with respect to the anterior-posterior constriction that is often found in professional singing (Yanagisawa et al., 1989; Koufman et al., 1996; Stager et al., 2003). Finally, the occurrence of these structures in linguistically relevant sound patterns (cf. Esling et al., 2007; Edmondson & Esling, 2006) emphasizes the significance of these phonation modes to general phonetic research.
The typical oro-pharyngeal configurations in ThS are described in a rough scheme of at least three (overtone) articulation techniques (denoted here as articulation types, AT) that are generally also found in overtone singing (cf. Edgerton, 2005; Neuschaefer-Rube et al., 2001; Saus, 2004; Trân, 1991): an [l]-like articulation of the tongue tip (AT1), an [n]-, [ŋ]-, [i]- or [u]-like articulation of the tongue dorsum (AT2), and a mid-low vowel articulation of different heights of front and back vowels (AT3) including larger jaw movement than in the other two ATs. The three main ATs can be easily linked with techniques that are commonly associated with particular (here Tuvan) styles (AT1 – sygyt, AT2 – khöömei, AT3 – kargyraa). Although all combinations of PMs and ATs are found to be used in ThS, PM1 is mainly combined with AT1 and AT2, whereas PM2 is mainly combined with AT3. Further 'articulatory substyles', such as ezeŋgileer (AT2 or AT1 with a strong nasal (AT4) component) or birlaŋnaadyr (AT1/AT2 with a strong labial component (AT5)), were considered but have been excluded from the analysis to a great extent. AT1 and AT2 display the highest prominence of the single 'melodic' harmonic, measured as amplitude difference to the previous and next harmonic (12-14 dB).
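One plausible formalisation of the prominence measure just mentioned (the level of the reinforced harmonic relative to its two neighbours) is given below; it is an illustration of the idea, not necessarily the exact definition used in the project. With L_k the level in dB of the k-th harmonic and k the index of the melodic harmonic:

P_k = L_k - \tfrac{1}{2}\left(L_{k-1} + L_{k+1}\right)

On this reading, the reported 12-14 dB for AT1 and AT2 would mean that the reinforced harmonic stands roughly 12-14 dB above the mean level of its immediate neighbours.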
However, the bandwidths for AT1 tend to be wider since here F2 and F3 usually merge.
Besides a general explorative investigation of the properties of the phonation modes and articulation types, it was essential that the project investigate possible areal patterns, i.e. differences between the four groups of singers with regard to their origin in Southern Central Siberia. Questions addressing the areal typology of traditional music performance (especially singing) have gained more attention recently (see Blench & Dendo, 2006). In particular cases, the analyses of specific ThS styles may help to retrieve parts of the unwritten demographic history of the area, including population contact.

Figure 1: Sample sequence of the Tuvan singer Oleg Kuular, starting out with PM1AT3 and PM1AT2 (khöömei), switching to PM1AT1 (sygyt) and proceeding further with PM2AT3 (kargyraa); the third tier contains the reinforced harmonics for AT1/AT2 and vowel qualities for AT3.

The current study is comprised of data from 69 male singers. The material was in part collected during fieldwork in South Siberia in the years 1999 to 2002, where 25 singers were recorded by use of a specific field setting for acoustic (Vx), electroglottographic (Lx) and subglottal resonance (Sx) signal acquisition. For the latter, an approach by Neumann et al. (2003) was adopted, which makes use of a signal acquired with a small condenser microphone placed in contact with the skin of the singer's jugular notch, i.e. the dip at the superior border of the sternum, between the clavicular notches. The cricothyroid ligament (ligamentum conicum), which is palpable as a small depression below the Adam's apple, had recently been suggested by Wokurek and Madsack (2009) as an alternative measure point for subglottal resonance. Supplementary recordings from 44 available professional music recordings and field recordings of other researchers were added to the acoustic analysis.
The results of the perturbation measures for acoustic signals show dominance of individual variability over areal (cultural) factors. As one could expect, there is a strong influence of the articulatory strategy. Nonetheless there are some parameters, such as APQ11, the amplitude perturbation quotient (i.e. shimmer) over 11 cycles, which seem to allow areal grouping, e.g. with Mongolian singers, who show the highest values for PM1. However, the articulatory reinforcement strategy interacted strongly with the phonation mode and showed the highest amplitude perturbation values for AT1 (see Fig. 2). Another clear areal group tendency is observable for Hakas singers, with a clear preference for lower F0 in PM1 (median values: 110 Hz for AT2/AT3, 160 Hz for AT1) and PM2 (60 Hz for AT3). PM2 samples of Hakas singers also show higher harmonics-to-noise ratio (HNR) values and lower mean spectral slopes in all three bands investigated (0-2 kHz; 2-5 kHz; 5-8 kHz).

Figure 2. Areal tendencies represented by the median (full circle within the box) for the acoustic shimmer (APQ11) measures of 44 singers.

For perturbation measures of Lx-signals of the double-cyclic phonation (PM2), the perturbation parameters have been adapted so that every second cycle could be taken into account (see bottom channel in Figure 3). This reveals a very stable vibratory pattern, unlike similarly labeled pathological patterns (cf. Fuks et al., 1998).
Figure 3. Double-cycle phonation mode (kargyraa) sequence of a three-channel recording (VxLxSx) of the Tuvan singer SI.

For the ultra-structure of the Lx, besides a schematic description of the cycle shape, the applicable phase quotients (closed, closing, open quotient) and symmetry indicators (speed quotient, contact index) were analyzed. Based on the values of the closed quotient and closing quotient for PM1 (AES-VF), the impression of a tensed (sometimes pressed) voice seems to be justified. In AT1, the subglottal wave is fully dominated by reinforced harmonic formants (usually F2). For the low PM2, there was only one singer for whom the involvement of an AEF-VF phonation type seemed reasonably certain. A controlled imitation of AEF-VF phonation by the author was added. For both singers, the lack of a peak around 3 kHz in the long-term-average spectra (cf. spectrogram in Figure 1) comes into question. However, the noise-to-harmonics ratio value, that is the ratio of nonharmonic energy (frequency range: 70 Hz – 4500 Hz) in the spectrum, which is therefore taken as an indicator of higher-frequency noise, was not particularly noticeable.
The majority of the investigated singers seem to use a phonation type of a double-cycle ventricular fold/vocal fold oscillation (VTF-VF). Based on the synchronous analysis of Vx, Lx and inverse-filtered Vx, it can be concluded that the main vocal tract excitation occurs with the closure phase of the 'pure' vocal fold cycle (cf. Henrich et al., 2006; Bailly et al., 2010). One cycle, presumably the VF-cycle, showed short closing phases and higher symmetry indicators. Then the vibration of the VFs triggers the VTF vibration at F0/2. However, in terms of cycle-to-cycle amplitude difference, the subcorpus of Vx-Lx signals contains examples with exactly the reverse patterns: the Vx excitation instant aligns well with the higher closing peak in Lx but more frequently with the lower peak (see Figure 3). It also remains uncertain to what degree the subglottal wave is able to support one of the two cycles. For the one case where all three channels were successfully recorded, the subglottal sound pressure maximum seemed to precede the supraglottal peak, appearing right at the end of the opening phase.
Overall, the acquired data support a model of reinforcement of harmonics by four different means (cf. Edgerton, 2005). First, there is voice source variation (shortened closing phase, with increased excitation strength presumably via increased subglottal pressure, while air flow remains constant or lowered, for the tensed mode (PM1); and double-cycle modes involving mass bodies of supralaryngeal structures for the low mode (PM2), enabling fundamentals at half of VF-F0). Second, a specific formant adjustment for F2 comes into play that results, for some articulatory strategies, in formant merging (F1/F2 or F2/F3) due to multiple vocal tract constrictions (e.g. the sublaminal cavity; Engstrand et al., 2007), including a coupling of source and adjacent epi- and supralaryngeal rooms (of approx. 1/6 vocal-tract length; cf. Titze & Story, 1997). Third, a specific bandwidth tuning results partially from adjustment of lip radiation and partially from a stiffness of the articulators. Finally, the fourth mechanism of reinforcing harmonics is the aryepiglottic sphincter, which facilitates F1 and F0 damping, a mechanism that is used individually to a very different extent.
[A short audio sample from the Tuvan singer Ayas Danzyrin, recorded by the author in 2000, can be found here: http://www.eva.mpg.de/~grawunde/otsths/phdxtdabs.html]

References
Bailly, L., Henrich, N. & Pelorson, X. (2010). Vocal fold and ventricular fold vibration in period-doubling phonation: Physiological description and aerodynamic modeling. Journal of the Acoustical Society of America 127(5), pp. 3212–3222.
Blench, R. & Dendo, M. (2006). Musical instruments and musical practice as markers of the Austronesian expansion post-Taiwan. Paper presented at the 18th Congress of the Indo-Pacific Prehistory Association, University of the Philippines, Manila, 20–26 March 2006. Retrieved 2011-04-01 from http://www.rogerblench.info/Ethnomusicology%20data/Papers/Asia/General/Roger%20Blench%20AN%20music%20II%20paper%20submit.pdf
Dmitriev, L. B., Chernov, B. P. & Maslov, V. T. (1983). Functioning of the voice mechanism in double-voice touvinian singing. Folia Phoniatrica 35(5), pp. 193–197.
Edgerton, M. (2005). The 21st-century voice: contemporary and traditional extra-normal voice. The New Instrumentation (Vol. 9). Scarecrow, Lanham (ML), Toronto, Oxford.
Edmondson, J. & Esling, J. (2006). The valves of the throat and their functioning in tone, vocal register, and stress: laryngoscopic case studies. Phonology 23, pp. 157–191.
Engstrand, O., Frid, J. & Lindblom, B. (2007). A perceptual bridge between coronal and dorsal /r/. In: Solé, M.-J., Beddor, P. S. & Ohala, M. (eds), Experimental approaches to phonology, pp. 175–191. Oxford University Press.
Esling, J. H., Zeroual, C. & Crevier-Buchman, L. (2007). A study of muscular synergies at the glottal, ventricular and aryepiglottic levels. Proc. of the 16th ICPhS, Saarbrücken, pp. 585-588.
Fuks, L., Hammarberg, B. & Sundberg, J. (1998). A self-sustained vocal-ventricular phonation mode: acoustical, aerodynamic and glottographic evidences. TMH-QPSR 3/1998, pp. 49–59.
Grawunder, S. (2003a). Comparison of voice production types of 'western' overtone singing and south siberian throat singing. Proc. of the 15th ICPhS, Barcelona, pp. 1699–1702.
Grawunder, S. (2003b). Der südsibirische Kehlgesang als Gegenstand phonetischer Untersuchungen. In: Krech, E.-M. & Stock, E. (eds) Gegenstandsauffassung und aktuelle Forschungen der halleschen Sprechwissenschaft (Hallesche Schriften zur Sprechwissenschaft und Phonetik vol. 10), pp. 53–91. Peter Lang, Frankfurt am Main.
Grawunder, S. (2009). On the Physiology of Voice Production in South-Siberian Throat Singing - Analysis of Acoustic and Electrophysiological Evidences. Frank & Timme, Berlin.
Kotby, M. N., Kirchner, J. A., Kahane, J. C., Basiouny, S. E. & el Samaa, M. (1991). Histo-anatomical structure of the human laryngeal ventricle. Acta Otolaryngol 111(2), pp. 396–402.
Koufman, J. A., Radomski, T. A., Joharji, G. M., Russell, G. B. & Pillsbury, D. C. (1996). Laryngeal biomechanics of the singing voice. Otolaryngol Head Neck Surg 115(6), pp. 527–537.
Kyrgys, Z. K. (2002). Tuvinskoe gorlovoe penie - etnomuzikovečeskoe issledovanie [Tuvan Throat Singing - ethnomusicological studies]. Nauka, Novosibirsk.
Lindestad, P. A., Sødersten, M., Merker, B. & Granqvist, S. (2001). Voice source characteristics in mongolian "throat singing" studied with high-speed imaging technique, acoustic spectra, and inverse filtering. Journal of Voice 15(1), pp. 78–85.
Ventricular fold vibration in voice production: a high-speed imaging study with kymographic, acoustic and perceptual analyses of a voice patient and a vocally healthy subject. Logoped Phoniatr Vocol 29(4), pp. 162–70. Neumann, K., Gall, V., Schutte, H. K. & Miller, D. G. (2003). A new method to record subglottal pressure waves: potential applications. Journal of Voice 17(2), pp.140–59. Neuschaefer-Rube, C., Saus, W., Matern, G., Kob, M. & Klajman, S. (2001). Sono-graphische und endoskopische Untersuchungen beim Obertonsingen. In: Geissner, H. (ed) Stimmkulturen – 3. Stuttgarter Stimmtage 2000, pp. 219–222. Röhrig Universitätsverlag, St. Ingbert. Reidenbach, M. M. (1998a). Aryepiglottic fold: normal topography and clinical implications. Clin Anat 11(4), pp. 223–35. Reidenbach, M. M. (1998b). The muscular tissue of the vestibular folds of the larynx. Eur Arch Otorhinolaryngol 255(7), pp.365–7. Sakakibara, K.-I., Kimura, M., Imagawa, H., Niimi, S. & Tayama, N. (2004). Physiological study of the supraglottal structure. In ICVPB 2004, Marseille. Stager, S. V., Neubert, R., Miller, S., Regnell, J. R. & Bielamowicz, S. A. (2003). Incidence of supraglottic activity in males and females: a preliminary report. Journal of Voice 17(3), pp. 395–402. Saus, W. (2004). Oberton singen – Das Geheimnis einer magischen Stimmkunst. Traumzeit-Verlag, Schönau, Odenwald. Titze, I. R. & Story, B. H. (1997). Acoustic interactions of the voice source with the lower vocal tract. Journal of the Acoustical Society of America 101(4), pp. 2234–2243. 31 Trân, Q. H. (1991). New experiments about the overtone singing style. Bulletin d’ Audio-phonologie. Ann. Sc. Univ. Franche-Comté, Vol. VII (N◦5&6), pp. 607–618. van Tongeren, M. (2002). Overtone Singing - physics and metaphysics of harmonics in east and west. The Harmonic Series (vol. 1). Fusica, Amsterdam. Wokurek, W., & Madsack, A. (2009). Comparison of manual and automated estimates of subglottal resonances, Proc. Interspeech, Brighton, pp. 16711674. Yanagisawa, E., Estill, J., Kmucha, S. T. & Leder, S. B. (1989). The contribution of aryepiglottic constriction to “ringing” voice quality - a videolaryngoscopic study with acoustic analysis. Journal of Voice 3(4), pp. 342–350. 32 THE HISTORIC ACOUSTIC-PHONETIC COLLECTION OF THE TU DRESDEN Rüdiger Hoffmann, Dieter Mehnert Technische Universität Dresden, Institut für Akustik und Sprachkommunikation email: [email protected], [email protected] 1 Introduction At the beginning of the last century, the growing interest in foreign cultures and languages led to a rapid development in experimental phonetics. In Germany, Rousselot’s scholar, Panconcelli-Calzia, introduced experimental phonetics as a scientific discipline in Hamburg, as did Gutzmann and Wethlo in Berlin. With the development of electronic computing in the middle of the century, the interest in human hearing and speaking was extended to machines, and the field of speech technology, with the main topics of speech recognition and synthesis, started to be investigated. In this way, we have far more than one century of fascinating development of experimental phonetics and speech technology. It can be illustrated by numerous material objects coming from phonetic or acoustic laboratories. The Dresden University of Technology, which was one of the pioneering institutions in German speech technology, hosts a collection of such objects, called Historic Acoustic-phonetic Collection (HAPS). 
HAPS was formally founded more than a decade ago, in 1999, but its roots go back to very renowned German institutes of the past. This paper describes the history and the recent activities of this university-owned collection.
2 History of the Collection
2.1 Forming a Collection in Speech Technology
Information Technology at the TU Dresden goes back to Heinrich Barkhausen (1881–1956), the "father of the electron valve," who taught from 1911 to 1953. He was also interested in psychoacoustics and invented the first measurement device for loudness. Speech research in a narrower sense started with the development of a vocoder in the 1950s. Walter Tscheschner (1927–2004, Figure 1) started his extensive investigations of the speech signal using components of this vocoder.
Figure 1. Walter Tscheschner (right), pioneer of speech technology in Dresden, with the founder of the Institute of Telecommunication of the former TH Dresden, Kurt Freitag. Photograph from about 1960.
In 1969, a scientific unit for Communication and Measurement was founded in Dresden. It was the main root of the present Institute of Acoustics and Speech Communication. W. Tscheschner was appointed Professor of Speech Communication and started research in speech synthesis and recognition. A number of representative devices for speech synthesis and recognition have been developed in Dresden. Over six decades, they formed a historic collection, which demonstrates how speech technology developed depending on the technological base, starting with electronic valves and ending with embedded devices [1].
2.2 Expanding the Collection towards Experimental Phonetics
At Berlin University, phonetics was established as an institution out of two disciplines: linguistics and medicine. The linguistic root was formed by the Phonographic Commission, founded in 1915, which was established to record the voices of speakers representing foreign peoples on wax cylinders or records. This institution developed in several stages into the Institute of Sound Research at Berlin University. In 1951, the institute was renamed Institute of Phonetics. The second root of phonetics at Berlin University is represented by Hermann Gutzmann sen. (1865–1922), who worked as a voice and speech pathologist. Gutzmann, who made speech therapy part of the university's curriculum, collected all the new instruments and research devices that had been used since 1900 by the emerging discipline of experimental phonetics. It was on Gutzmann's initiative that the first Phonetics Laboratory was founded in Berlin. In 1926, the Phonetics Laboratory became an independent institution under the direction of Franz Wethlo (1866–1960). Wethlo received the teaching assignment for Experimental Phonetics in 1926, which gave him the opportunity to extend the laboratory and to purchase new equipment. He developed numerous pieces of equipment himself. After the re-opening of Berlin University in 1947, the Phonetics Laboratory became part of the Institute for Special Education in 1950, which had just been founded. More details and a description of how the two roots came together can be found in [2] and [3]. After the restructuring following German reunification in 1990, phonetics was organized under the roof of the School of Rehabilitation Sciences. As a result of the higher education reform at the three Berlin universities, enrollment for the course of study 'Science of Speech/specialisation Voice and Speech Therapy' was stopped by decree in the autumn semester of 1993.
This led to the closing of the subject area Phonetics in Berlin at the end of the year 1996. Based on the long-lasting cooperation between phonetics in Berlin and speech technology in Dresden [4], the historical remnants of the phonetic equipment were transferred to Dresden following the closing of the Chair of Phonetics in Berlin. This equipment set was complemented by a number of devices which came from numerous other German institutions, mainly from a former laboratory in Chemnitz which was founded by Georg Zöppel (1892–1963). With this merger, the Dresden collection expanded to represent one century of continuous development of experimental phonetics and speech technology. The merger was completed in 1999. Therefore, we consider this the founding year of the HAPS.
2.3 The Merger with the Former Hamburg Phonetic Collection
The Humanities Faculty of Hamburg University goes back mainly to the Hamburg Colonial Institute, which was opened in 1908. It included a number of chairs working with foreign languages. There, a phonetics laboratory was founded in 1910 as a part of the Department of African Languages, developing later into a separate institute of Hamburg University, which was founded in 1919. From 1910 to 1949, the Phonetics Laboratory or Institute, respectively, was directed by Giulio Panconcelli-Calzia (1878–1966, [5], Figure 2), who was a student of the Abbé Rousselot. He was an ingenious researcher who built the institute into a place of international scientific importance. He founded the journal VOX, which served as an international platform for experimental phonetics. It is notable that the First International Congress of Experimental Phonetics took place in Hamburg back in 1914.
Figure 2. Giulio Panconcelli-Calzia demonstrates the application of a kymograph. Photograph from the HAPS collection.
A detailed description of the history of the institute is given in [6]. In the 1990s, the educational branch of the institute was transferred to another department. The remaining part, which focused on general phonetics, was closed at the end of the winter term 2006/07 due to the restructuring of Hamburg University. The large collection of phonetic devices, which was part of the Phonetic Institute, fortunately survived the destruction of Hamburg during World War II and was opened to the public in 1986 [7]. To preserve this valuable collection despite the closing of the institute, the responsible department proposed a merger with the collection in Dresden. The collection was transferred to Dresden in 2005. Since 2006, the united collection can be visited in two rooms of the Barkhausen building of the TU Dresden (Figure 3).
Figure 3. View of one room of the collection in the Barkhausen building of the TU Dresden.
3 Recent Status of the Collection
The HAPS preserves parts of the material estate of several important institutions in Germany. It represents, therefore, the development of experimental phonetics and speech technology in Germany with a high degree of completeness.
In more detail, the following groups of exhibits are available:
Historic phonetic devices of the pre-electronic era
These devices from the first half of the 20th century are mainly mechanical and include different groups: instruments for the experimental work of the phonetician (devices for recording speech and related signals, devices for interpreting the recordings, e.g. for measuring pitch contours, and devices for measuring frequencies and performing spectral analysis), objects for teaching purposes (models of voicing and articulation), and early devices for speech training and rehabilitation of handicapped people.
Historic phonetic devices of the early electronic era
The purpose of these objects from the second half of the 20th century is similar to that of the mechanical devices mentioned above, but it is now accomplished by electronic means. This collection stops with the introduction of the computer into the phonetic laboratories.
Historic objects demonstrating the development of speech technology
A few objects of this collection demonstrate how sounds and speech can be produced by mechanical means. Of course, the real development of devices for speech synthesis and speech recognition is connected to the electronic and, primarily, the computer era. The collection includes not only objects from the research and development in Dresden (following the vocoder from the 1950s), but also a number of early speech synthesizers from other laboratories.
Historic sound recordings
First, it must be noted that the whereabouts of the important collection of wax cylinders from the former Hamburg Colonial Institute and its successor chairs are not known. Hence, they did not come to Dresden. However, the HAPS includes a large number of shellac records. Some of them were produced in the laboratory of Panconcelli-Calzia for demonstration purposes. The main collection, however, consists of commercial music records of lesser scientific importance. They were collected by Wilhelm Heinitz (1883–1963), who directed a research unit for ethnomusicology in Hamburg until 1948. Furthermore, the HAPS includes tapes with sound examples of the Dresden vocoder and early speech synthesizers.
Historic photographs and transparencies
The collection includes, among other visual media, a set of valuable photographic plates from Panconcelli-Calzia's laboratory. Some of them are very useful because they demonstrate the correct application of early phonetic devices.
4 Public Activities
The HAPS is a collection of the university which is used in teaching and research. The university collections in Dresden are managed by a curator who is responsible for the inventory. Due to the rapid growth of the collection during the last decade, the simple activity of producing such an inventory was very important. The objects have been photographed, and a first selection of the images is available on the websites of the institute [8]. A printed catalogue of the collection is in preparation. A first volume, which includes the historic phonetic devices, will be published around the end of this year by the publisher Thelem in Dresden. The HAPS can be visited on demand and on special occasions like the dies academicus or the annual "night of sciences".
Additionally, selected objects have been presented at special exhibitions as follows:
Exhibition about measuring pitch with historic instruments at the 3rd International Conference on Speech Prosody in Dresden, 2006,
Participation with selected objects at the exhibition "Kempelen – Man in the Machine" in the Hall of Arts, Budapest, 2007,
Exhibition of selected objects at the 16th International Congress of Phonetic Sciences (ICPhS) in Saarbrücken, 2007 (Figure 4),
Special exhibition "SprachSignale" (SpeechSignals) in the Technical Museum Dresden, 2009–2010.
Figure 4. Selected exhibits from the HAPS at the International Congress of Phonetic Sciences in Saarbrücken, Germany, 2007.
5 Scientific Projects
A number of historical research projects have been carried out during the last decade. They have been partially supported by the German Acoustic Society (DEGA). A short overview of these activities follows:
5.1 History of the Institutions
The HAPS illustrates more than a century of development in experimental phonetics and more than half a century in speech technology. It is important to connect the exhibits with the scientific development at the places where they originated. Therefore, we are collecting and publishing material on the development in Dresden and Berlin [4], Hamburg and other places. In particular, we are working on a monograph about the development of speech technology in Dresden.
5.2 Investigations on Selected Phonetic Devices
It is sometimes not easy to understand how the historic phonetic devices worked. Many questions had to be answered for the descriptions of the instruments in the catalogue that is now being prepared for printing. Some devices were investigated in more detail.
Wethlo's cushion pipes
An early project dealt with the reconstruction of historical larynx models. In 1898, Ewald had proposed an improvement of the existing larynx models by replacing the simple membranes with air-pressurized cushions. Wethlo investigated this more natural construction in great detail from 1913 onwards [9]. The model, which was critical in the development of voicing theories, is known as "Wethlo's Polsterpfeife" (cushion pipe). The Dresden collection includes a number of these objects in different sizes (Figure 5). Some of them are originals from Wethlo's estate. They were reconstructed, and a number of experiments and measurements were performed [10].
Figure 5. Historic larynx models from Franz Wethlo, so-called cushion pipes.
History of pitch measurement
Pitch measurement has always played an important role in phonetics. There were different methods for recording speech signals, but the application of a kymograph was the predominant one. After recording the speech signal, the trace had to be measured to produce a curve showing pitch vs. time. The whole procedure of converting kymographic waveforms into pitch contours required a number of steps, which had to be performed with great precision. Because this was a very time-consuming process, a number of aids were proposed, which were in use until the 1950s. We have tried to explain their application [11].
Pitch measurement with Boeke's rack
Another way to measure pitch contours and other parameters is based on the measurement of the "glyphs" on the surface of the wax cylinders of Edison's phonograph. This was performed using a very sophisticated instrument which was designed by J. D. Boeke. One of these devices is part of the HAPS (Figure 6) and was described in more detail in [12].
Figure 6. Boeke's rack for measuring the "glyphs" on wax cylinders.
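As a reading aid only (and not part of the historical procedures described in [11] and [12]), the conversion that both the kymographic method and Boeke's rack ultimately perform can be summarized in a few lines of code: a period marked off on a trace becomes a fundamental frequency once the writing speed is known. The writing speed and the measured period lengths below are invented values for illustration, not measurements from the HAPS devices.

```python
# Minimal sketch of the arithmetic behind historic pitch measurement.
def f0_from_trace(period_mm, writing_speed_mm_per_s):
    """Fundamental frequency from one period length measured on the trace."""
    period_s = period_mm / writing_speed_mm_per_s
    return 1.0 / period_s

# Example: a drum surface assumed to move at 250 mm/s, and successive period
# marks measured with a ruler or measuring microscope (in mm):
speed = 250.0
periods_mm = [2.1, 2.0, 1.9, 1.8, 1.9]
contour_hz = [round(f0_from_trace(d, speed), 1) for d in periods_mm]
print(contour_hz)   # [119.0, 125.0, 131.6, 138.9, 131.6] – a rising, then falling contour
```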
Accuracy of measuring frequencies with mechanical resonators
A simple and widespread method for measuring the frequency of sounds was the application of Helmholtz resonators (fixed frequencies) or resonator tubes of Schaefer (tunable frequencies, see Figure 7). It is interesting to know more about the accuracy of these historic measuring devices. Therefore, we performed a number of listening experiments, which showed high accuracy in general, but a systematic deviation in the case of Schaefer's resonators [13].
Figure 7. A set of tunable resonators from Schaefer.
Transfer functions of Marey's capsules
Transducers, which convert speech sounds into mechanical movements of writing pins, were used successfully for waveform recording early in experimental phonetics. The sound is transmitted through a hose into a flat, normally circular capsule, which is closed by a thin rubber membrane. The movement of the membrane is transferred to a light lever with an attached pin. The tip of the pin scratches the waveform into the sooted paper on the revolving drum of a kymograph. This approach dates back to E. J. Marey (1830–1904), who used it for recordings of the movement of the pulse artery (sphygmograph) and other physiological motions. Later, it was widely applied in experimental phonetics by P. J. Rousselot (1846–1924), his student G. Panconcelli-Calzia, and other successors. The properties of the transducers of the Marey type were evaluated by interferometric measurements of the transfer functions of numerous capsules from the HAPS collection [13]. It became clear that the transfer functions are not at all flat over the frequency range of interest. They show several maxima, which are determined by the interplay of the system components, mainly the hose and the capsule. Fortunately, this lack of flatness does not influence the period lengths of the recorded signals, which are measured for determining the pitch contour.
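The statement that a non-flat (but linear) transfer function leaves the measured period lengths untouched can be illustrated with a small numerical experiment. The resonance used below is an arbitrary stand-in chosen for the sketch, not a measured capsule response from [13].

```python
# Small numerical check: a strongly non-flat linear filter does not change the
# period of a periodic signal.  The 900 Hz resonance is an arbitrary stand-in
# for a hose/capsule resonance, not a measured Marey-capsule response.
import numpy as np
from scipy.signal import iirpeak, lfilter

fs = 16000
f0 = 120.0                                   # "true" fundamental frequency
n = int(0.5 * fs)
x = np.zeros(n)
x[::int(round(fs / f0))] = 1.0               # idealized periodic excitation

b, a = iirpeak(900.0, Q=5.0, fs=fs)          # non-flat linear transfer function
y = lfilter(b, a, x)

def period_samples(sig, fs, fmin=60.0):
    """Period estimate from the highest autocorrelation peak beyond a lag bound."""
    sig = sig - sig.mean()
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lo = int(fs / 500.0)                     # ignore very short lags
    return lo + int(np.argmax(ac[lo:int(fs / fmin)]))

print(period_samples(x, fs), period_samples(y, fs))   # same lag before and after filtering
```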
Historic devices for rehabilitation purposes
Rehabilitation engineering is a classical application field of speech technology. Therefore, it is interesting to study the early attempts, mainly from the pre-electronic era. Prototypes of such devices are rare exhibits. The HAPS owns some examples, which have been demonstrated in [14].
5.3 History of Speech Technology
Speech technology is the main research focus of the chair where the HAPS is maintained. Therefore, the development of speech analysis and synthesis is one of the foci of historical interest. During recent years, special attention was directed to the following problems.
History of mechanical speech synthesis
This research activity is due to the existence of small mechanical sound or word synthesizers which came to the HAPS from the Hamburg collection (Figure 8). In the year 1899, the notable otologist Johannes Kessel (1839–1907) presented such instruments at a scientific meeting in Munich [15]. Kessel aimed to use them to teach people with a significant degree of deafness. He recognized, however, that the quality of the synthetic voice was still insufficient for this purpose. Later, the original devices came to the Hamburg laboratory. The mechanical voices are interesting as early mechanical speech synthesizers. Therefore, we started a project to explore the development of this technology [16]. It can be interpreted as a late spin-off of Kempelen's speaking machine, the principle of which came (via Mälzel) to the puppet manufacturers.
Figure 8. A collection of voice mechanics by Hugo Hölbe, arranged in a demonstration box.
In our case, Hugo Hölbe (1844–1931) from Sonneberg was the manufacturer of the voices. Sonneberg is a town in Thuringia and was known as the world capital of toys in former times (Figure 9). We learned that "Stimmenmacher" (voice maker) was a separate profession in the production of puppets and cuddly toys.
Figure 9. The "speaking picture book" was patented in 1874 by the bookseller Theodor Brand from Sonneberg, Germany. It applies voice mechanics similar to those from Figure 8. Left: view of the title; right: the interior.
History of early vocoders
The development of the vocoder in the 1930s had a profound impact on speech research in general. The first patent on the principle of the channel vocoder was obtained by K.-O. Schmidt [17], but the most important prototype originated with H. Dudley [18], who also coined the name. A number of other prototypes were developed during and after World War II in different countries. We tried to collect all available information about this period [19]. It was not always easy because much of the work was secret at that time.
History of electronic speech synthesis
As already mentioned, there was also a vocoder developed in Dresden in the 1950s (Figure 10). In the following decades, many prototypes of a speech synthesis terminal were developed [20], partially in cooperation with the computer company Robotron. We demonstrated these objects in the special exhibition SprachSignale (cf. Section 4) and included the historic examples in our lectures on speech technology.
Figure 10. Photograph of the Dresden vocoder from the 1950s.
6 Conclusion
The HAPS has developed well during the last decade. We are confident that we will be able to continue the work of collecting equipment, as well as continuing some research activity. We hope that the Department of Electrical Engineering and Information Technology at the TU Dresden will specify a final place for all scientific collections in the near future, which would guarantee stable conditions for the future of the HAPS.
References
[1] Hoffmann, R.: 40 Jahre institutionalisierte Sprachtechnologie in Dresden. Studientexte zur Sprachkommunikation, vol. 54. Dresden: TUDpress 2009, 7–35.
[2] Mehnert, D.: Phonetics at the University of Berlin – a history. The Phonetician, No. 92 (2005–II), 34–39.
[3] Mehnert, D.: Phonetik an der Berliner Universität – ein Rückblick auf ihre Geschichte und auf Forschungsarbeiten der letzten Jahre. Studientexte zur Sprachkommunikation, vol. 35. Dresden: Universitätsverlag 2005, 33–54.
[4] Hoffmann, R.; Mehnert, D.: Berlin-Dresden traditions in experimental phonetics and speech communication. In: Boë, L.-J.; Vilain, C.-E. (eds.): Un siècle de phonétique expérimentale. Lyon: ENS Éditions 2010, 191–210.
[5] Köster, J.: Giulio Panconcelli-Calzia. The Phonetician, CL-61, 1992, 3–10.
[6] Neppert, J.; Pétursson, M.: Death of a Phonetic Institute: The Phonetic Institute of the University of Hamburg. Studientexte zur Sprachkommunikation, vol. 54. Dresden: TUDpress 2009, 36–39.
[7] Grieger, W.: Führer durch die Schausammlung, Phonetisches Institut. Hamburg: Christians 1989.
[8] www.ias.et.tu-dresden.de/sprache
[9] Wethlo, F.: Versuche mit Polsterpfeifen. Passow-Schaefers Beiträge für die gesamte Physiologie 6 (1913) 3, 268–280.
[10] Hoffmann, R.; Mehnert, D.; Dietzel, R.; Kordon, U.: Acoustic experiments with Wethlo's larynx model. International Workshop to the Memory of Wolfgang von Kempelen, Budapest, March 11–13, 2004. Grazer Linguistische Studien 62 (2004), 51–60.
[11] Mehnert, D.; Hoffmann, R.: Measuring Pitch with Historic Phonetic Devices. 3rd International Conference Speech Prosody, Dresden, May 2–5, 2006. Dresden: TUDpress 2006, 927–931.
[12] Mehnert, D.; Dietzel, R.: Von Glyphen zu Tonhöhen und Intensitäten – das Boekesche Gestell, ein historisches Auswertegerät. Studientexte zur Sprachkommunikation, vol. 52. Dresden: TUDpress 2009, 198–208.
[13] Hoffmann, R.; Mehnert, D.; Dietzel, R.: Measuring the accuracy of historic phonetic instruments. Proc. 17th Int. Congress of Phonetic Sciences, Hong Kong 2011, 176–179.
[14] Mehnert, D.; Dietzel, R.; Kordon, U.: Aus den Anfängen der Experimentalphonetik – Hilfsgeräte zur Behandlung Hör- und Sprachbehinderter. Fortschritte der Akustik, DAGA 2011, Düsseldorf, 147–148.
[15] Denker, A.: Bericht über die Versammlung deutscher Ohrenärzte und Taubstummenlehrer zu München. Archiv für Ohrenheilkunde 47, Nr. 3, Nov. 1899, 198–208.
[16] Hoffmann, R.; Mehnert, D.: Die Kesselschen Stimm-Mechaniken in der historischen akustisch-phonetischen Sammlung der TU Dresden. DAGA, Stuttgart, March 19–22, 2007.
[17] Schmidt, K.-O.: Verfahren zur besseren Ausnutzung des Übertragungsweges. German Patent 594 976, patented February 27, 1932.
[18] Dudley, H. W.: Signaling System. US Patent 2,098,956, patented Nov. 16, 1937.
[19] Hoffmann, R.: On the development of early vocoders. Proc. IEEE Histelcon, Madrid 2010, 6 p.
[20] Hoffmann, R.: Sprachsynthese an der TU Dresden: Wurzeln und Entwicklung. Studientexte zur Sprachkommunikation, vol. 35. Dresden: Universitätsverlag 2005, 55–77.
GENIE: The Corpus for Spoken Lower Sorbian (GEsprochenes NIEdersorbisch)
Roland Marti, Bistra Andreeva, William J. Barry
Department of Slavonic Languages, Saarland University, Saarbrücken, Germany
Phonetics, Saarland University, Saarbrücken, Germany
e-mail: [email protected], [email protected], [email protected]
Abstract
Lower Sorbian is a Slavonic minority language spoken in Eastern Germany in German-speaking surroundings. The language is on the brink of extinction, as there are basically no native speakers below the age of sixty. Therefore, the documentation of spoken Lower Sorbian is crucial. The corpus of spoken Lower Sorbian GENIE (GE[sprochenes] NIE[dersorbisch]: http://genie.coli.uni-saarland.de/) is the first documentation of this kind. It brings together various kinds of spoken Lower Sorbian: recordings from the archive of Sorbian broadcasts (1956–2006), recordings from the Archive of Sorbian Culture (dialect recordings, 1951–1971), and new recordings from native speakers made especially for the corpus in 2005/2006. The paper presents the corpus and its defining features, paying special attention to the particular situation of Lower Sorbian and its bilingual speakers. On the one hand, there is a very strong German influence; but on the other, Upper Sorbian interference is also clearly recognizable in the recordings. Furthermore, the paper illustrates the problem of what constitutes the speech of a native speaker in the case of minority languages. Finally, the problems of corpora of endangered languages are discussed.
1. Sorbian
Sorbian is currently the geographically westernmost part of the Slavic-speaking area. It is at present a language island (more exactly, an archipelago of islets) within a German-speaking area, situated in Upper and Lower Lusatia.
This represents the remainder of the originally much larger territory, which, by means of language shift, was gradually Germanized; a process that was repeatedly triggered and fostered by language-political measures that still continue (cf. Figure 1).
Figure 1: The Sorbian-speaking region in Germany (map labels: LUSATIA – English; ŁUŽYCA – Lower Sorbian; ŁUŽICA – Upper Sorbian; LAUSITZ – German).
This language area can be roughly divided into Upper and Lower Sorbian. Only in the Upper Sorbian area, more precisely in the Catholic districts, are there still villages where Sorbian is the common language (Scholze, 2008); elsewhere it remains nothing more than a family language, or rather the language of the older generation(s). The number of people with an active command of Sorbian can only be estimated. The estimates vary between 15,000 and 30,000 for Upper Sorbian and between 5,000 and 10,000 for Lower Sorbian (Jodlbauer, Spieß & Steenwijk, 2001). Upper as well as Lower Sorbian are autonomous languages. They are officially acknowledged as minority languages in Germany, first, in the constitutions and appropriate laws concerning Sorbs (or Sorbs/Wends) in the Free State of Saxony and the state of Brandenburg (n. 1) and, second, in the European Charter for Regional or Minority Languages.
[Footnote 1: The official name in Brandenburg is "Sorbs/Wends" ("Sorben/Wenden") and "Sorbian/Wendish" ("sorbisch/wendisch"), since a part of the Lower Sorbian speaking community rejects the names "Sorbs" ("Sorben") and "Sorbian" ("sorbisch") where native speakers are concerned. According to linguistic (Slavic) tradition, only "Sorbs" ("Sorben") and "Sorbian" ("sorbisch") are used.]
The main problem for the Sorbian language is the dying-off of the Sorbian-speaking community due to the lack of younger native speakers and the consequent shrinking of the area in which Sorbian is spoken. Geographical shrinkage is a phenomenon that has been observed since the 16th century. Both trends have been accelerating since the mid-19th century, and neither the revival measures nor the fostering throughout the German Democratic Republic era could stop them. There are language preservation and revitalization measures at present (especially the so-called WITAJ project; Budar & Norberg, 2006), which can, however, at best slow down the language assimilation process. The situation of Lower Sorbian is particularly dramatic since inter-generational transmission no longer exists and children are led by means of (partial) immersion to the status of a kind of "secondary native speaker". There are yet other specific problems concerning Lower Sorbian. The revival of Sorbian life and its organization after the Second World War was primarily initiated in the Upper Sorbian region and by Upper Sorbian exponents. This led to the perception that cultural life was Upper-Sorbian oriented, which was in fact partially the case. This was experienced especially intensively in the language domain. The spelling reform of 1949–1952 brought about the approximation of Lower Sorbian to Upper Sorbian orthography. Since a pronunciation oriented towards the written language was fostered and required at school and in the media, the spelling reform also had orthoepic consequences (so-called "spelling pronunciation").
The Upper Sorbian linguistic influence was further strengthened by the fact that, owing to the small number of autochthonous Lower Sorbian experts, functionaries in Sorbian organizations and teachers came predominantly from Upper Lusatia, and their language did not conform to the linguistic features of Lower Sorbian. This resulted in the popular impression that the Lower Sorbian standard language does not represent real Lower Sorbian at all, but an overall Sorbian hybrid language at best, or a kind of Upper Sorbian that had been adjusted slightly to Lower Sorbian. Many native speakers of Lower Sorbian therefore refused to participate in official efforts to strengthen the language and restricted its use to private life. Often they even stopped transmitting the language to the next generation. On the other hand, the official language policy, centred on the standard language and neglecting dialects, gave rise to the feeling in Lower Sorbian speakers that they could not speak correct Sorbian (an opinion that is heard repeatedly during field recordings). This explains the wish for reinforced demarcation from Upper Sorbian which emerged when state control over cultural life ceased. The latter finds expression in the adoption of different terminology (“Wendish” instead of “Lower Sorbian”, cf. n. 1), in the withdrawal of some parts of the spelling reform from 1949-1952, and in the rejection of a purist language that is felt to be Upper Sorbian.2 2. The Corpus for Spoken Lower Sorbian GENIE In view of the precarious situation of Lower Sorbian that was described in relevant studies (Jodlbauer, Spieß & Steenwijk, 2001; Norberg, 1996), it was foreseeable that the “authentic” mother tongue would no longer exist within one generation at best. That turned out to be particularly fatal for the spoken language since the 2 This results in the current (re)appearance of lexical Germanisms (lazowaś instead of cytaś, hundert instead of sto), that have always been in colloquial use, also in written language. The similar situation can be observed in the grammar section, e.g. with determination (occasional use of the definite and marginally also the indefinite article). 49 “secondary mother tongue” (the maximum goal aimed at by efforts of revitalization) differs strongly from the “authentic” mother tongue, especially in its pronunciation.3 In this respect, it was important, and extremely urgent, to document spoken Lower Sorbian. With this objective in mind, the corpus GENIE: GEsprochenes NIEdersorbisch (Spoken Lower Sorbian) was created. The corpus creation was partially funded by the Scientific Committee of the University of Saarland in the years 2005-2006. The endeavour was also financially supported by the Radio Berlin-Brandenburg (RBB) and the Sorbian Institute/Serbski Institut. In order to make this corpus internationally usable for the scientific research, it was made available on the web (http://genie.coli.uni-saarland.de). The GENIE website is supported by the Insitut für Phonetik (http://www.coli.unisaarland.de/groups/WB/Phonetics/index.php) together with the Institut für Slavistik (http://www.uni-saarland.de/fak4/fr44/) at the University of Saarland. Due to copyright and data privacy protection rights, it could not be made generally available; its use is permitted for scientific purpose by application (http://genie.coli.uni-saarland.de/cgi-bin/benutzer.html). 
The corpus arrangement was structured to meet the special features of the situation of Lower Sorbian presented above and, where possible, to take into account the diachronic level.4 There are more than sixty hours of spoken Lower Sorbian in its distinct variants available in GENIE. Even though the period of time covered by the recordings ranges only from 1951 to 2006, the speakers' dates of birth indicate that the diachrony is considerably deeper: the oldest speaker was born in 1860 (he was 94 years old at the time of the recording), the youngest speaker was born in 1973. Individual diachrony is also traceable since several people are represented in multiple recordings that were produced at different times. 2.1 Sources The corpus consists of recordings from three different sources: a) Archive of the Sorbian Radio (Studio Cottbus of the Radio Berlin-Brandenburg RBB, formerly ORB, earlier still Radio of GDR) This source consists of 110 recordings made between 1956 and 2006. Speakers of dialects and of the standard language (native speakers of Lower Sorbian/ Wendish, Upper Sorbian or German) are both represented in different variants of the standard language. The text types are very different: conversation, interview, address, report etc. 3 The reason for this is primarily due to the fact that the teachers employed in the revitalization project WITAJ, apart from a few exceptions, do not have a command of Lower Sorbian as their mother tongue, but at best as their secondary mother tongue. 4 Owing to copyright, the oldest recordings of Sorbian could not be adopted from the Berlin Archive, therefore only marginal diachronic depth is taken into consideration: the recordings were made in the years 1951-2006. 50 b) Archive of Sorbian Culture/Serbski kulturny archiw (SKA) in the Sorbian Institute/Serbski Institut The source contains 135 recordings made between 1951 and 1971.The recordings were compiled for linguistic purposes by the Institute, in particular for the Sorbian Linguistic Atlas (cf. References SSA 1-13 1965-1993). Its aim was the recording of local dialects (story, interview, elicitation etc.). c) The field study project specifically for this corpus The source consists of 100 recordings made between 2005 and 2006. They involve conversations between J. Frahnow (pastor and native speaker) and mostly elderly native speakers whose speech usually represents a local dialect. While selecting the recordings and test persons, we attempted to depict the complexity of dialectal forms of Lower Sorbian/Wendish along with diverse standard linguistic variants employing the three sources mentioned. 2.2 Metadata files There is a data record sheet for every recording containing the most important information about the recording. Specifically, these are: call number (the recording identifier): this consists of the letters f, r or s and a four-character-number where f means field recording created by J. Frahnow, r stands for recordings from the radio archive of the RBB, and s signifies recordings from the Archive of Sorbian Culture. In addition to the call numbers valid for this corpus, there are archive call numbers as used in the source. 
text type (e.g., conversation, interview, report)
contents (e.g., village life, customs, farming)
place of the recording
date of the recording
indication of sex (names are not given to protect the person's identity)
speaker's place of birth
speaker's date of birth
dialect
family language: it is specified here whether the family language was Lower Sorbian/Wendish, German or mixed (or Upper Sorbian where applicable)
places of residence
education
The place names in the arrays (place of the recording, place of birth, dialect and place of residence) are given in German and Lower Sorbian/Wendish and can be shown and arranged in three sections: place, municipality, and district. Additionally, all the Lower Sorbian places covered are allocated to the dialect areas. In doing so, the classification of the Sorbian Language Atlas was taken into consideration, which ultimately goes back to the categorization by Muka (1911–1926). In it, only Lower Sorbian dialects proper or transitional dialects are distinguished. In the case of native speakers of Upper Sorbian, there is only a reference to this fact without indication of the dialect area. In the case of non-native speakers or native speakers that use the standard language, the word "standard" is used. There are several metadata sets available for some recordings, namely in cases where more than one speaker participates in the recording (hosts and interviewers were usually not taken into account). The call numbers of the metadata sets are identified in these cases by attached index letters (e.g., a, b, etc.). Access to the datasets and audio recordings in the corpus may be obtained either directly, by stating the call number, or indirectly by using a search form, within which all specified arrays can be searched or classified with intelligent filter functions.
2.3 Technical data of the recordings
In addition to the specified background information, the data record sheets comprise the following information:
length of the recording in minutes and seconds
size of the .wav file in bytes/kilobytes/megabytes
size of the .mp3 file (n. 5) in bytes/kilobytes/megabytes
sampling rate in Hz
amplitude quantization rate in bits per sample
number of channels (1 for mono, 2 for stereo)
signal-to-noise ratio SNR (as yet only with data from the field study project)
bit rate (.mp3 file) in kBit/s
[Footnote 5: mp3 audio files are highly compressed in size. They take much less time to transmit over the internet.]
3. Examples from GENIE
It is evident from the description of the GENIE corpus that the material can be analysed with various objectives in mind. For one thing, the description and the comparison of the structural characteristics of the various dialect areas are an attractive challenge in themselves. Even though the spontaneous speech of the recordings does not allow for an exhaustive grammatical description, the newly recorded material provides a valuable supplement to the (not immediately accessible) dialect recordings made during the German Democratic Republic era. Another important question is to what extent the spoken standard language may vary and, depending on the speaker's origin, adopt a dialectal form, thus actually containing Lower Sorbian, Upper Sorbian or German features. The focus of our first analyses, though, will be on the influence of German on spoken Lower Sorbian; an influence that grew steadily over the 20th century, but which had been present a long time before.
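To make the structure of the record sheets concrete, the following is a minimal sketch of one record represented as a data structure, together with the kind of selection needed for the young-versus-old comparisons discussed below. The field names follow the description above; the class itself and all concrete values are invented for illustration and do not reflect the actual GENIE file format or access interface.

```python
# Minimal sketch (not the actual GENIE file format): one data record sheet as
# a data structure, plus a simple speaker selection by year of birth.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GenieRecord:
    call_number: str               # f/r/s + four digits (+ index letter per speaker)
    text_type: str                 # e.g. conversation, interview, report
    contents: str                  # e.g. village life, customs, farming
    place_of_recording: str
    date_of_recording: str
    sex: str                       # names are withheld to protect identity
    place_of_birth: str
    year_of_birth: int
    dialect: str                   # dialect area, "standard", or Upper Sorbian
    family_language: str           # Lower Sorbian/Wendish, German, mixed, Upper Sorbian
    places_of_residence: list = field(default_factory=list)
    education: str = ""
    duration_s: float = 0.0        # technical data from the record sheet
    sampling_rate_hz: int = 0
    bits_per_sample: int = 0
    channels: int = 1              # 1 = mono, 2 = stereo
    snr_db: Optional[float] = None # so far only available for the field recordings

records = [   # invented example records
    GenieRecord("s0012", "story", "customs", "Burg/Bórkowy", "1963", "m",
                "Burg/Bórkowy", 1897, "Burg dialect area", "Lower Sorbian/Wendish"),
    GenieRecord("f0042a", "conversation", "village life", "Drachhausen/Hochoza",
                "2005", "f", "Drachhausen/Hochoza", 1931, "Drachhausen dialect area",
                "Lower Sorbian/Wendish", sampling_rate_hz=44100, bits_per_sample=16),
    GenieRecord("r0077", "interview", "farming", "Cottbus/Chóśebuz", "1998", "m",
                "Cottbus/Chóśebuz", 1960, "standard", "mixed"),
]

older = [r.call_number for r in records if r.year_of_birth < 1930]
younger = [r.call_number for r in records if r.year_of_birth >= 1950]
print(older, younger)   # ['s0012'] ['r0077']
```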
The comparison of recordings of younger and older people can shed light on the extent of this influence, as well as on the linguistic features affected by it. More striking yet is the comparison of recordings of the same person made at different times. According to the existing descriptions (Schwela, 1906; Janaš, 1984; Starosta, 1991), there are well-known phonetic dissimilarities between German and Lower Sorbian on the segmental level: vowel quality and quantity, the R sound, the realization of plosives with regard to voicing and aspiration, as well as the existence of a dark L or [w] and of the correlation of palatalization widespread in Slavic languages. There are, above all, characteristic features of intonation and word stress known from impressionistic descriptions of the prosody. Other rarely mentioned, though important, discrepancies concern word-chaining modes, such as the division of neighbouring vowels by means of a glottal stop or the type of voice assimilation (progressive or regressive). As examples of the existing and growing impact of the influence of German on Lower Sorbian, we show here four of the phenomena mentioned above in utterances of an elderly speaker (A, born in 1890) and of a younger speaker (B, born in 1960). Figure 2, a representation of the microphone signal and the spectrogram of the utterance "Chtož tu rolu wobźěłajo" (English "Who works on the land"), illustrates several pronunciation features in one short stretch of speech that prove the influence of German, three of which we comment on below:
1. In the word "rolu" the /r/ is realized as a uvular approximant [ʁ] (see I).
2. "wobźěłajo" /'obʑewajo/ starts with a glottal onset instead of a smooth transition from "rolu" (see II) or an alternatively possible [h].
3. The syllable-final /b/ and the following syllable-initial /ʑ/ are voiceless (see III).
Figure 2. The utterance "Chtož tu rolu wobźěłajo" (here: [xtɔʃ tʊ ʁɔlu ʔɔpʃevajɔ]) by speaker B (born in 1960) with (I) uvular [ʁ], (II) hard vowel onset (glottal stop) and (III) devoicing at the word coda with progressive devoicing of a voiced initial fricative.
In Figure 3, depicting the oscillogram of the acoustic time signal and the spectrogram of the utterance "tak daloko" (English "so far"), the voiceless plosives /t/ (see I) and /k/ (see II) demonstrate, contrary to the claim that voiceless plosives are unaspirated in Lower Sorbian, clear features of a moderate degree of aspiration (in both cases 26 ms). The measured duration of aspiration is relatively short compared with that of monolingual speakers of German. Therefore, it is important to examine whether an intermediate form (similar to the weak aspiration found with Canadian speakers of French; Sundara et al., 2006; Fowler et al., 2008) has become established in Sorbian, within this generation or with this speaker alone.
Figure 3. The utterance "tak daloko" (here: [tʰak dalɔkʰɔ]) by speaker B (born in 1960), where clear aspiration (I) of /t/ and (II) of /k/ can be noticed.
The older speaker (born in 1890) demonstrates a different articulation pattern. Indeed, in Figure 4, in her utterance "To njejo tak dobre" (English "It's not that good"), a tendency to aspirate can be observed: /t/ in "to" manifests an aspiration duration of 37 ms (see I). On the other hand, following /k/ in "tak" she produces a fully voiced initial /d/ in "dobre" that affects /k/ regressively, making it voiced (see II). This suggests that the assimilation process contrasts with the common German pattern but corresponds to what is typical of other Slavic languages. The apical [r] in "dobre" also differs from the German standard /r/, which is a uvular fricative [ʁ]. The two taps of the apical [r], each briefly attenuating the signal, can be seen in the spectrogram as well as in the microphone signal (see III).
Figure 4. The utterance "To njejo tak dobre" (here: [tʰɔ ne tʰag dɔbrə]) by speaker A (born in 1890), where (I) aspiration of /t/, (II) a fully voiced /d/ with partial voicing of the preceding /k/ and (III) a double-contact apical /r/ can be observed.
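The aspiration values quoted for Figures 3 and 4 reduce to a simple measurement: the interval between the labelled plosive release and the labelled onset of voicing. The sketch below illustrates this arithmetic; the landmark times are invented so that they reproduce the reported 26 ms and 37 ms values and are not taken from the GENIE recordings, where such landmarks would be hand-labelled in the oscillogram and spectrogram.

```python
# Minimal sketch of the aspiration (VOT) measurement behind the values above.
tokens = [
    # (speaker, token, burst time in s, voicing onset in s) - invented times
    ("B", "tak /t/",    1.204, 1.230),
    ("B", "daloko /k/", 1.402, 1.428),
    ("A", "to /t/",     0.810, 0.847),
]

def vot_ms(burst_s, voicing_onset_s):
    """Voice onset time (aspiration duration) in milliseconds."""
    return (voicing_onset_s - burst_s) * 1000.0

for speaker, token, burst, onset in tokens:
    print(f"{speaker:>2} {token:<12} VOT = {vot_ms(burst, onset):.0f} ms")
# B  tak /t/      VOT = 26 ms
# B  daloko /k/   VOT = 26 ms
# A  to /t/       VOT = 37 ms
```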
As far as the fourth phenomenon in the younger speaker's recording is concerned (the missing smooth transition from one vowel to the next across a word boundary), it cannot be maintained that in earlier times glottal constriction according to the German pattern did not appear. In a short utterance ("a to ak," English "as") of speaker A, there is clear glottalization at the beginning of the utterance and at the word boundary between "to" and "ak" (see I and II in Figure 5). Further studies will allow us to determine how often such instances of glottalization occur in her speech. It also cannot be ruled out that Slavic languages behave similarly to other "binding" languages (French, Italian, English etc.) and dialects (such as Alemannic). That is to say, a stressed word with an initial vowel in an emphatic context can very well start with a hard glottal onset. In the younger speaker's example, however, the glottalization appears in a non-emphatic context. The older speaker's utterances are characterized by a generally emphatic "word by word" style. The utterance is not distinctively emphatic, but the glottalization might be attributed to this general style. A further uncertainty, when comparing the two speakers, results from age-related differences in voice quality that add to the difficulty of interpreting glottal phenomena.
Figure 5. The utterance "a to ak" (here: [ʔa tʰɔ ʔak]) by speaker A (born in 1890) with glottalization (I) at the beginning of the utterance and (II) at the word boundary between "to" and "ak."
4. Corpora of endangered languages – an exceptional case?
Following the presentation of a concrete corpus of an endangered language, we should ask whether, from a general linguistic perspective, corpora of endangered languages, or of micro-languages in the broader sense (see The UCLA Phonetics Lab Archive [http://archive.phonetics.ucla.edu/], The Endangered Language Fund [ELF: http://endangeredlanguagefund.org/], DOkumentation BEdrohter Sprachen/documentation of endangered languages [DOBES: http://www.mpi.nl/DOBES/], and the Leipzig Endangered Languages Archive [LELA: http://www.eva.mpg.de/lingua/resources/lela.php], among others), are essentially different from the corpora of other languages and whether this has consequences for their planning, composition and supervision. In fact, there are differences, but they are not of a fundamental nature. An important difference concerns the information value or, in other words, the representativeness of the corpora. Paradoxically, the corpora of endangered languages are simultaneously more and less representative than those of other languages. The higher degree of representativeness becomes especially clear in the case of written corpora. Only for languages with a limited written tradition can a corpus include a high percentage of all that has ever been written. There are two reasons for lower representativeness.
First, endangered languages are either not documented at all, or if they are, then by relatively smallsized corpora and only rarely by means of several corpora. In addition, the data that exists has usually been collected by chance and does not reflect an intentional selection. The second reason for lower representativeness lies in the fact that the norm of endangered languages is less fixed, and so there is greater variability within them that can only be imperfectly represented. It is even possible that idiolectal predominance in a corpus may distort linguistic structures. A further discrepancy is related to the composition, processing and supervision of the corpora. As far as endangered languages are concerned, the group of people that are interested in the corpora and are capable to put them together is rather small. The same applies to the financial possibilities of minorities. As a consequence, corpora of minority languages, if they are created at all, cannot be specialized (they are the proverbial 'all-in-one' tools) and will only be partially annotated, if at all. Continuous development, updating and documentation are only possible to a very limited degree. A major difference is ultimately inherent in the function of the corpora. As far as endangered languages are concerned, the corpus is not a linguistic working tool in the first place. It is, rather, a memorial with a quite distinct culture-political objective. It shall document what still exists and what will possibly soon disappear.6 This may well have consequences for the choice of the texts to be recorded if the “antiquarian” idea prevails. Corpora of endangered languages are clearly an exceptional case. Both producers and consumers must take this into consideration. The producers must take into account the limiting general conditions and the additional functions and ensure that such corpora will be supervised in spite of limited resources. The users must show understanding for the particularities of such corpora and also be willing to contribute actively to their optimization, for example, by making the transcriptions and annotations they created themselves available for the corpus. References Budar, L. & Norberg, M. (2006). „Les écoles sorabes après 1990“. Education et Sociétés Plurilingues 20 (juin): 27-38. Fowler, C. A., Sramko, V., Ostry, D. J., Rowland, S. & Halle, P. (2008). Crosslanguage phonetic influences on the speech of French-English bilinguals. Journal of Phonetics 36, pp. 649-663. Janaš, Pětr (1984). Niedersorbische Grammatik für den Schulgebrauch. Bautzen: Domowina. 6 It is not a coincidence that in the “Archive of vanished places” (“Archiv verschwundener Orte/archiw zgubjonych jsow”) in the village of Baršć/Forst, recordings of Sorbian language are to be heard in order to demonstrate how “Devastation” (open-cast lignite mining) affected the cultural heritage of the region (www.forst-lausitz.de/sixcms/media.php/674/Broschuere_AVO_Aufl2.pdf). 58 Jodlbauer, R., Spieß, G. & Steenwijk, H. (2001). Die aktuelle Situation der niedersorbischen Sprache: Ergebnisse einer soziolinguistischen Untersuchung der Jahre 1993-1995. Bautzen: Domowina (= Schriften des Sorbischen Instituts 27). Muka, Ernst (1911-1926). Słownik dolnoserbskeje rěcy a jeje narěcow I–III. Petrograd: RAN; Praha: ČAVU. Norberg, Madlena (1996). Sprachwechselprozeß in der Niederlausitz. Soziolinguistische Fallstudie der deutsch-sorbischen Gemeinde Drachhausen/ Hochoza. Uppsala (= Acta Universitatis Upsaliensis. Studia Slavica Upsaliensia 37). 
Scholze, Lenka (2008). Das grammatische System der obersorbischen Umgangssprache im Sprachkontakt. Bautzen: Domowina (= Schriften des Sorbischen Instituts 45). Schwela, Gotthold (1906). Lehrbuch der Niederwendischen Sprache. Erster Teil: Grammatik. Heidelberg: Ficker. SSA 1-15 1965-1996 Sorbischer Sprachatlas (Serbski rěčny atlas), bearbeitet von H. Faßke, H. Jentsch und S. Michalk, 1-15, Bautzen (Budyšin) 1965-1996. Starosta, Manfred (1991). Niedersorbisch schnell und intensiv 1. Bautzen: Domowina. Sundara, M., Polka, L., & Baum, S. (2006). Production of coronal stops by simultaneously bilingual adults. Bilingualism: Language and Cognition 9, pp. 97– 114. Internet sources (accessed 30.03.2011): www.forst-lausitz.de/sixcms/media.php/674/Broschuere_AVO_Aufl2.pdf http://genie.coli.uni-saarland.de http://www.mpi.nl/DOBES/ http://www.eva.mpg.de/lingua/resources/lela.php http://endangeredlanguagefund.org/ 59 Adjectif épithète et attribut de l’objet. Qu’en est-il de la prosodie ? Denis Ramasse CRISCO EA 4255, Université de Caen, France e-mail: [email protected] Résumé En français, un adjectif placé juste après un nom peut avoir deux fonctions différentes : épithète et attribut du complément d’objet (a.c.o.). Une confusion peut ainsi naître dans l’interprétation d’une phrase comme : J’ai cru cet homme sincère qui peut être comprise de deux façons : cet homme était vraiment sincère et je l’ai cru, cela correspond à la fonction épithète ; ou j’ai cru que cet homme était sincère et je me suis peut-être trompé, dans cette interprétation l’adjectif est attribut de l’objet homme. On a cherché à savoir si la prosodie permettait de lever cette ambiguïté sous deux aspects : celui de l’encodage et celui du décodage. 10 phrases ambiguës, présentées dans deux cotextes (l’un forçant l’analyse de l’adjectif en épithète, l’autre en a.c.o.), ont été enregistrées par 6 locuteurs (3 hommes, 3 femmes). L’analyse acoustique de ce corpus a révélé 4 indices prosodiques susceptibles de différencier les deux fonctions: un court silence entre nom et adjectif (appelé pausette dans une description précédente), une montée mélodique finale, un allongement moyen de durée et une élévation moyenne de hauteur. Une analyse statistique des données a montré l’importance des deux premiers indices. Un double test de perception a permis de vérifier que cette hiérarchie des indices n’était pas la même au niveau du décodage parce qu’elle a révélé aussi qu’une élévation moyenne de hauteur venait renforcer le rôle de la pausette pour indiquer une fonction attribut de l’objet. Abstract Can prosody help to decide whether an adjective is epithet or attribute of the object in a sentence? In French, there can be an ambiguity when you don’t know by the context the exact function of an adjective. In the sentence J’ai cru cet homme sincère, you can understand: I trusted this sincere man (the adjective is epithet) or I thought this man was sincere (the adjective is attribute of the object man). Perhaps prosodic cues could disambiguate such sentences. To check this hypothesis 20 sentences, in fact 10 sentences but realized in two different co-texts, were recorded by 3 men and 3 women. The acoustic analysis of the recordings revealed 4 cues which could establish a distinction between the two functions. 
The adjective was analyzed as an attribute of the object when i) there was a short silence between the noun and the adjective, ii) there was a melodic rising at the end of the sentence, iii) and iv) the average duration and the average height of the sentence were a little greater. A statistical analysis of the data showed that the silence and the final rising were the 60 most important cues. A perceptual test was then prepared to check whether these cues were used in perception. It proved that there was not the same hierarchy between cues in the perception, because the average height of the sentence seems to be a useful cue which completes the role of the silence. 1 Introduction Une séquence Verbe (V) + Nom (N) + Adjectif (A) peut être source d’ambiguïté en français. Il y a, en effet, deux rattachements possibles de l’adjectif (Fuchs 1996) : soit il dépend du nom, il n’y a pas de frontière syntaxique entre N et A — la parenthétisation est V(NA) —, et il est épithète soit il dépend du verbe ; il y a une frontière entre V et N — la structure syntaxique est ((V N) A) — l’adjectif est alors attribut du complément d’objet (abrégé en a.c.o. selon Riegel, Pellat & Rioul 1994). Riegel (1991) propose un ensemble de tests pour mettre en évidence la fonction d’un adjectif selon tel ou tel emploi et ainsi faire la distinction entre épithète et a.c.o. Par exemple, en prenant une phrase du corpus qui sera étudié: Il a acheté cette voiture neuve. Table 1: épithète pronominalisation Il l’a achetée. interrogation en qu(e) Qu’est-ce qu’il a acheté ? ou qu’est-ce qu(e) transformation en: Cette voiture neuve qu’il a nom+relative achetée. passivation Cette voiture neuve a été achetée. Extraction en C’est cette voiture neuve c’est… que qu’il a achetée. détachement Cette voiture neuve, il l’a achetée. a.c.o. Il l’a achetée neuve. Qu’est-ce qu’il a acheté neuf ? Cette voiture qu’il a achetée neuve. Cette voiture a été achetée neuve. C’est cette voiture qu’il a achetée neuve. Cette voiture, il l’a achetée neuve. (Un septième test (en fait, le troisième dans la liste qu’il présente) semble difficile à appliquer, même dans l’exemple qu’il donne (Le jury a jugé ce travail remarquable.); il s’agit de l’interrogation en comment : 61 Table 2: interrogation en comment épithète Comment a-t-il acheté cette voiture neuve? a.c.o. ? Comment a-t-il acheté cette voiture ? Ce test, implicitement, tend à considérer l’adjectif en fonction a.c.o. comme un complément circonstanciel ; c’est pourquoi il semble préférable de ne pas l’utiliser.) Si la distinction, à l’écrit, entre les deux fonctions est délicate, on peut se demander s’il n’y a pas, à l’oral, des indices permettant de lever cette ambiguïté. Les locuteurs pourraient, en effet, à l’encodage, ajouter des éléments prosodiques que les auditeurs seraient, au décodage, habitués à retrouver. L’étude présentée ici s’attachera à mettre en évidence, dans la prosodie des phrases, l’existence éventuelle d’indices permettant d’opposer les deux fonctions de l’adjectif. 1.1. a.c.o. essentiel et a.c.o. accessoire (ou accidentel) Après certains verbes par exemple d’opinion (juger, croire, trouver, voir, sentir, etc.) ou déclaratifs (dire, prétendre, assurer, affirmer, etc.), l’adjectif attribut du complément d’objet est considéré comme essentiel car il détermine l’acception de ces verbes (d’après Noailly 1999: 120, et Le Goffic 1993, en particulier § 263). 
Avec ces verbes, d’après Fuchs (1996: 133), apparaissent des constructions dites "réduites", soit réduction d’une complétive en que si l’adjectif a une fonction a.c.o. essentiel : J’ai cru que cet homme était sincère. soit réduction d’une relative si l’adjectif a une fonction épithète : J’ai cru cet homme qui était sincère. Le résultat de la réduction est : J’ai cru cet homme sincère. (phrase 10 du corpus). Avec les autres verbes, il n’y a pas de réduction, ce sont des attributs accessoires. Dans le corpus qui sera étudié, on peut ainsi faire une distinction entre : 1. Il a trouvé cette idée folle. 10. J’ai cru cet homme sincère. où, si l’adjectif est attribut de l’objet, il est considéré comme attribut essentiel de l’objet ─ ce seront les seuls attributs essentiels du corpus ─, et, par exemple : 6. Il boit son chocolat froid. 8. J’ai connu cet homme intraitable. où, le cas échéant, sera analysée une fonction attribut accessoire de l’objet. On peut faire remarquer à propos de la phrase 1 du corpus : Il a trouvé cette idée folle. que la langue anglaise fait intervenir l’ordre des mots pour éviter l’ambiguïté créée par les deux fonctions possibles, épithète ou a.c.o., de l’adjectif ; en effet, s’il est épithète, on a la phrase : He found this crazy idea. 62 et inversement, s’il est attribut de l’objet, il est postposé au nom : He found this idea crazy. Il est alors tout à fait justifié de supposer que, s’il y a désambiguïsation dans une langue par des moyens syntactiques, des indices prosodiques pourront avoir le même rôle dans une autre langue. 1.2. Sémantisme des deux fonctions La fonction épithète ou attribut de l’objet fera ressortir telle ou telle clique d’un verbe, une clique étant un sens microscopique, selon le dictionnaire des synonymes du CRISCO, et un sous-graphe complet maximal dans la représentation graphique de la synonymie d’un mot, selon Ploux et Victorri (1998). Par exemple pour la phrase n° 1 : trouver = 22 : concevoir, créer, découvrir, imaginer, inventer, trouver, avoir avec adjectif épithète; = 48 : estimer, juger, penser, trouver, être d'avis avec adjectif attribut de l’objet. Par ailleurs, les objets possèdent une propriété intrinsèque, ontologique (pour reprendre le terme de Thomas 2003) ; par exemple pour la phrase n° 3 du corpus présenté ci-dessous, la propriété d’un feu de circulation peut être sa couleur. L’adjectif vient définir cette propriété. De même, la propriété, à la phrase n° 6, du chocolat est sa température. L’adjectif épithète la définit de façon « durable » (pour reprendre le terme de Blanche-Benvéniste 1991), tandis que l’adjectif attribut de l’objet la définit de façon passagère. Pour cette phrase n° 6 ainsi que pour la phrase n° 8, il y aurait avec les attributs de l’objet, selon Fabienne Martin (2006), simultanéité de deux procès et juxtaposition de deux prédicats, le second étant un dépictif (prédicat second descriptif) : 6. Il boit son chocolat froid. (=Il boit son chocolat alors qu’il est froid) 8. J’ai connu cet homme intraitable. (=J’ai connu cet homme alors qu’il était intraitable.) Fabienne Martin oppose les dépictifs aux prédicats seconds résultatifs que l’on trouve dans les phrases n° 4 et n° 5 du corpus : 4. Il a rendu son devoir irréprochable. 5. Il a gardé sa chemise propre. Dans la première de ces deux phrases, le caractère irréprochable est le résultat obtenu par le premier procès. Même si c’est moins évident pour la seconde phrase, l’aspect propre de la chemise est le résultat d’un procès implicite de protection. 
Une autre opposition sémantique subjectif/objectif peut être véhiculée par cette différence de fonction. Les attributs essentiels des phrases n° 1 et n° 10 s’opposent en effet par leur aspect subjectif au caractère objectif conféré par la fonction épithète. Par exemple dans la phrase n° 10, la sincérité de l’homme est le fruit d’une impression ou d’un jugement dans un cas et une réalité dans l’autre cas. 63 Un aspect objectif et durable véhiculé par la fonction épithète s’oppose ainsi au caractère subjectif et éphémère apporté par la fonction attribut de l’objet avec des nuances circonstancielles de simultanéité (dans les dépictifs) ou de finalité (dans les prédicats seconds résultatifs). 2. Étude présentée Dans cette étude, on a cherché à mettre en évidence une différence dans la prosodie de deux phrases, identiques d’un point de vue segmental, mais comportant l’une, un adjectif épithète l’autre, le même adjectif en fonction attribut de l’objet. Par exemple, la phrase 1 du corpus Il a trouvé cette idée folle. peut se paraphraser en Il a conçu cette idée folle. (adjectif épithète) d’une part, et en Il a jugé cette idée folle. (adjectif a.c.o.) d’autre part. Pour parvenir à ce résultat, la même phrase d’un point de vue segmental a été placée dans deux cotextes différents induisant deux fonctions différentes de l’adjectif. Ces cotextes étaient très simples, n’avaient rien de littéraire, mais avaient été imaginés dans le seul but de donner une fonction très distincte à l’adjectif. Par exemple, pour cette première phrase : Cotexte 1 : Il cherche toujours à se faire remarquer. Il a trouvé cette idée folle. Il s’est acheté une chemise violette. Cotexte 2 : Elle lui a suggéré d’acheter une chemise violette. Il a trouvé cette idée folle. Ou, pour prendre la phrase 6 du corpus : Il boit son chocolat froid. Cotexte 1 : Il fait très chaud. Il entre dans un café et se commande un chocolat froid. Il regarde sa montre. Il boit son chocolat froid. Il sort. Cotexte 2 : Il se sert son chocolat bien chaud, bien fumant. Il s’attarde plus qu’il ne l’aurait fallu à sa lecture. Il boit son chocolat froid. Il part travailler. 10 phrases ont ainsi été réunies dans un petit corpus 1, en pratique deux corpus, l’un avec les adjectifs épithètes, l’autre avec les adjectifs en fonction attribut de l’objet. 1 Un premier corpus réalisé par une seule locutrice a d’abord été analysé dans une étude préliminaire qui a été présentée à un groupe de recherche sur l’adjectif du CRISCO. Ce corpus a été modifié, en partie grâce aux remarques faites par des membres du groupe, car il comportait deux phrases présentant des problèmes dans l’analyse. 1° il y avait une phrase qui était en quelque sorte une "intruse", puisque l’adjectif est susceptible d’être non pas attribut du complément d’objet, mais attribut du complément du présentatif. Il s’agissait de la phrase Voilà la question insoluble. Elle figurait dans le corpus pour tester sur le plan prosodique ce que disent RIEGEL et coll. (1994) à propos de l’attribut du complément du présentatif (p. 241), à savoir : Les séquences introduites par les présentatifs voici, voilà et par le verbe impersonnel falloir occupent la position structurelle d'un c.o.d. Elles peuvent être suivies d'un élément prédicatif fonctionnant comme un a.c.o. : Le voici enfin libre. 
Mais l’analyse a montré que 64 Le but était de dégager des indices utilisés pour encoder les deux fonctions et non pas de faire une étude statistique exhaustive et rigoureuse de la production des phrases avec adjectif épithète ou attribut de l’objet. Ce corpus (cf figure 1) a donc été enregistré par six locuteurs (dont moi-même2), trois femmes (désignées par loc1, loc2 et loc3) et trois hommes (loc4, loc5 et loc6). Le corpus des adjectifs épithètes a toujours été enregistré avant celui des attributs de l’objet, selon l’ordre adopté dans la présentation de la figure 1. la réalisation prosodique d’une telle phrase était différente de celle comportant un adjectif a.c.o. 2° une autre phrase a été modifiée car la forme phonétique de l’adjectif était la même au masculin et au féminin. Il s’agissait de :Il a acheté cette voiture chère. Ceci avait pour conséquence une réalisation de cet adjectif comme l’adverbe cher. La réalisation prosodique de cette phrase se distinguait nettement de celle des autres, car l’adjectif chère n’avait plus la fonction a.c.o. mais complément circonstanciel. 2 Il n’y a pas de différence significative entre ma réalisation et celle de chacun des deux locuteurs masculins. Ceci est prouvé par l’application de 2 tests de Wilcoxon signés (degré de significativité à .05) sur les différences des moyennes, pour les 3 derniers paramètres, entre chaque phrase avec adjectif épithète et la phrase correspondante avec adjectif a.c.o. ; par ailleurs, je n’ai pas réalisé de pausette. 65 3. Contributions antérieures à la description de la prosodie de l’adjectif épithète et a.c.o. Il n’y a pas de véritables études de la prosodie des phrases dans lesquelles apparaissent les adjectifs épithètes et attributs de l’objet, mais des descriptions "impressives" plus ou moins détaillées. La plus récente date de 1999, c’est celle de Noailly (p.120), qui commente la phrase : Lise voudrait un mur jaune. en disant : on est indécis, tout dépendant du contexte, et de l'intonation, plus ou moins liée. Ce qui peut être glosé de la façon suivante : l’adjectif jaune est épithète si l’intonation est liée il est attribut de l’objet s’il y a une rupture dans l’intonation. Cette rupture dans l’intonation apparaissant dans des phrases avec des adjectifs attributs de l’objet avait déjà été décrite par Damourette et Pichon (1911-1940) dans le tome II de leur ouvrage (p.18), à propos de la phrase : Je veux ma robe rouge. Selon eux : La confusion peut se produire pour la dianathète de l'ayance. Soit la phrase: « Je veux ma robe rouge ». Rouge est-il épithète ou diathète ? Plusieurs critères permettent de préciser : S'il y a pausette après robe, on a affaire à une (échoite) dianathète : en même temps que je commande ma robe, j'indique ma volonté qu'elle soit rouge; s'il n'y a pas de pause, on a affaire à une épithète: j'exprime la volonté d'avoir celle de mes robes qui est rouge. L'allocutaire est donc renseigné sur l'intention du locuteur par la pause vocale. Selon le glossaire des termes spéciaux ou de sens spécial employés dans l’Essai de grammaire qu’ils font figurer dans leurs Compléments : DIATHÈTE : attribut à valeur adjective : « Je suis grand.» ÉCHOITE : attribut d’un complément autre que le sujet. DIANATHÈTE : attribut à valeur adjective d’attache moyennement serrée : « Petit poisson deviendra grand.» AYANCE : complément direct d’objet. On en déduit que l’expression dianathète de l’ayance désigne la fonction attribut du complément direct d’objet. 4. 
La pausette Damourette et Pichon emploient, dans le passage cité, le terme de pausette fondé sur une classification des pauses proposée dans le tome I (§169 p.188). Ils distinguent trois types de pauses : 66 1. Grandes pauses : pauses finales des phrases marquées d’ordinaire par le point. 2. Pausules : petites pauses marquées d’ordinaire par la virgule. 3. Pausettes : très petites pauses pour lesquelles la graphie actuelle ne dispose malheureusement d’aucun signe de ponctuation, encore que le besoin s’en fasse à chaque instant sentir. Ou plus simplement selon leur glossaire : PAUSULE : pose (sic) vocale marquée ordinairement par une virgule. PAUSETTE : pose vocale moindre que celle marquée ordinairement par une virgule. Donc la présence d’une courte pause entre le nom et l’adjectif est, selon eux, l’indice prosodique d’une fonction attribut de l’objet, l’absence d’une telle pause contribuant à faire interpréter l’adjectif comme épithète. 5. A la recherche de pausettes La première question à laquelle il fallait répondre était si les pausettes de Damourette et Pichon était un indice permettant d’opposer adjectifs épithètes et attributs de l’objet. Dans l’affirmative on pouvait alors se demander quelle était l’importance de cet indice, s’il apparaissait systématiquement. Il suffisait pour cela de chercher un silence entre le nom et l’adjectif subséquent. Cette analyse a révélé la présence de six pausettes entre 50 ms et 100 ms (la moyenne étant de 72 ms) Sur les 60 phrases avec un adjectif en fonction a.c.o. , cela ne représente que 10% de ce qui aurait pu être réalisé. Il n’y a qu’une seule pausette réalisée par un homme, toutes les autres se trouvant dans la réalisation des 3 locutrices. L’exemple de Damourette et Pichon pour illustrer le rôle de la pausette comme indice prosodique de l’a.c.o., est en conformité avec cette constatation, puisqu’il ne peut être prononcé que par une locutrice : Je veux ma robe rouge. Et 3 des 6 pausettes apparaissent dans la phrase 8, entre homme et intraitable de : J’ai connu cet homme intraitable (suivi de la phrase : Il est maintenant doux comme un agneau.). Ceci est illustré dans la Figure 2 où on peut remarquer une pausette de 68 ms entre homme et intraitable dans la courbe supérieure. 67 Figure 2. Mise en regard de la forme de deux courbes pour la phrase 8 réalisée par la première locutrice (loc1). On remarquera la pausette de 68 ms (entre homme et intraitable) manifestée par l’interruption du trait dans la courbe supérieure. D’autres indices prosodiques ont été cherchés. Ils seront d’abord présentés séparément puis leur importance relative sera évaluée. 6. Comparaison de la forme des courbes Elle a été appréhendée par des mesures prises sur les voyelles : leur durée et leur fréquence ; plus exactement : la fréquence initiale du fondamental la fréquence finale du fondamental La fréquence du fondamental des consonnes voisées n’a pas été mesurée pour la comparaison entre voyelles, mais elle a été prise en compte pour la description de la fin des phrases. La mesure de la fréquence a été effectuée le plus souvent par : l’AMDF (Average Magnitude Difference Function Pitch Extractor) proposé par Ross et al. (1974), mais, parfois, pour des parties où le signal était trop faible, trois autres algorithmes ont été utilisés : la fonction peigne proposée par Martin (1981), un algorithme fondé sur une méthode d’autocorrélation de Boersma (1993), une simple F.F.T. (Fast Fourier Transform), qui a , semble-t-il, donné des mesures précises. 
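As an illustration of the AMDF approach cited just above (Ross et al., 1974), here is a minimal sketch in Python; the function name, frame length and search range are illustrative choices and are not taken from the paper or from the software packages mentioned in the next section.

    import numpy as np

    def f0_amdf(frame, sr, fmin=75.0, fmax=400.0):
        """Estimate F0 of one voiced frame with the Average Magnitude
        Difference Function: D(tau) = mean |x[n] - x[n + tau]|.
        The estimated period is the lag that minimises D."""
        lag_min = int(sr / fmax)
        lag_max = int(sr / fmin)
        amdf = np.array([
            np.mean(np.abs(frame[:-lag] - frame[lag:]))
            for lag in range(lag_min, lag_max + 1)
        ])
        best_lag = lag_min + int(np.argmin(amdf))
        return sr / best_lag

    # Example: a synthetic frame with a 120 Hz fundamental, sampled at 16 kHz.
    sr = 16000
    t = np.arange(0, 0.04, 1 / sr)
    frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
    print(round(f0_amdf(frame, sr), 1))   # ≈ 120 Hz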
Ces algorithmes sont proposés dans trois suites logicielles : PHONÉDIT, SPEECH ANALYZER et PRAAT. La hauteur du fondamental a ensuite été évaluée par une conversion des fréquences en demi-tons avec 100 Hz comme valeur de référence (100 Hz = 0 demi-ton, toute valeur inférieure à 100 Hz, exprimée en demi-tons, devenant négative).

7. La durée des voyelles

Une stratégie de différenciation des fonctions aurait pu être d'allonger les voyelles des phrases comportant un adjectif a.c.o. Une comparaison, voyelle par voyelle (échantillons appariés), des durées pouvait montrer que les voyelles des phrases comportant un adjectif attribut de l'objet sont plus longues que celles des phrases comportant un adjectif épithète ; c'est ce qui a été vérifié en comparant, pour chaque paire de voyelles, leur durée. Pour chaque phrase, le degré de signification des différences a été vérifié soit par le test du t de Student (si les distributions des échantillons étaient normales selon le test de Lilliefors) soit par le test de Wilcoxon. Ceci n'est avéré que pour 3 phrases, n° 1, 4 et 6, prononcées par la première locutrice (loc1), où les voyelles des phrases avec a.c.o. sont plus longues que celles des phrases avec adjectif épithète de 20,3 ms en moyenne (degré de signification du t de Student < .02). Un modèle mixte3, avec, en variable dépendante, la durée, en facteur fixe, la catégorie épithète/a.c.o. et, en facteur aléatoire, les locuteurs, a été utilisé pour tester la pertinence de cet indice dans la distinction des catégories. Ceci a été confirmé avec un degré de significativité inférieur à .01 (< .0001).

3 Dans le modèle mixte utilisé pour cette étude, ce sont les moyennes de la durée des voyelles, d'une part, des phrases avec adjectif épithète et, d'autre part, de celles avec adjectif a.c.o., qui ont été comparées. Les deux autres modèles mixtes, dont il sera question plus loin, portaient sur les moyennes des hauteurs et celles des montées mélodiques finales.

8. Recherche d'une différence de hauteur systématique

Pour chaque paire de voyelles, la hauteur a été comparée. Pour chaque phrase, le degré de signification des différences a été contrôlé soit par le test du t de Student (si les distributions des échantillons étaient normales selon le test de Lilliefors) soit par le test de Wilcoxon. Il y a différence significative de hauteur pour 10 phrases, 7 différences positives (soit 11,66 %) et 3 différences négatives (soit 5 %). Les phrases 4 et 8 présentent à elles seules 5 des différences positives et on se souvient qu'elles comportent 4 des 6 pausettes apparues dans le corpus, 3 pour la phrase 8, et 2 pour la phrase 4. Mais cette phrase 4 présente une différence négative de 3 tons. Un modèle mixte, analogue à celui utilisé pour confirmer la pertinence de l'indice de durée, mais avec la hauteur en variable dépendante, a été utilisé. Un degré de significativité inférieur à .01 (= .007) a, là aussi, confirmé cette pertinence.

9. L'indice de la montée mélodique finale

48 des 120 phrases (soit 40 %) présentent une montée finale de la mélodie ; cette montée se réalise parfois seulement au sein de la dernière voyelle : il y a alors glissando (montée mélodique au sein d'une voyelle), mais la montée peut aussi commencer depuis le début de la consonne précédant cette voyelle finale et continuer jusqu'à la fin de la consonne suivante : dans ces deux derniers cas il y aurait ce qu'on peut appeler « montée intraconsonantique » que l'on peut opposer à « glissando simple ».
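The semitone conversion described above (100 Hz = 0 semitone) and the measurement of a final melodic rise can be sketched as follows; the F0 values and names below are hypothetical, and real use would start from an F0 track extracted with one of the tools listed in this section.

    import numpy as np

    def hz_to_semitones(f_hz, ref_hz=100.0):
        """Conversion used in the paper: 100 Hz = 0 semitone,
        values below 100 Hz become negative."""
        return 12.0 * np.log2(np.asarray(f_hz, dtype=float) / ref_hz)

    # Hypothetical F0 track (Hz) over the final portion of a sentence.
    f0_final = [96.0, 102.0, 118.0, 131.0, 145.0]
    st = hz_to_semitones(f0_final)
    rise = st[-1] - st[0]          # size of the final rise in semitones
    print(f"final rise ≈ {rise:.1f} semitones")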
A deux ou trois reprises, la montée se termine dans la réalisation d'un schwa, s'étendant sur deux voyelles et une consonne. Quatre cas de figure sont donc possibles :
– glissando
– montée intraconsonantique + glissando
– montée intraconsonantique + glissando + montée intraconsonantique
– montée intraconsonantique + glissando + montée intraconsonantique + schwa (cf. figure 3)

Figure 3. Phrase 3 Il a vu le feu orange. avec adjectif a.c.o. prononcée par le locuteur 6. Cette réalisation a la particularité de présenter une montée mélodique finale d'un peu plus de 7 tons (montée la plus importante relevée dans cette étude).

Parmi ces phrases, 29 comportent un adjectif attribut de l'objet (ce qui constitue environ 24 % de l'ensemble des 120 phrases) et 19 phrases ont un adjectif épithète. Le tableau 1 présente les données concernant les deux ensembles de phrases. Là aussi, la pertinence de cet indice est confirmée par un modèle mixte (avec la montée mélodique finale en variable dépendante), le degré de significativité étant ici égal à .001.

10. Étude de l'importance relative de chaque indice

Des analyses en termes d'agrégation (« clustering »), en l'occurrence des classifications k-means, ont été pratiquées pour vérifier l'importance de chaque indice dans la discrimination entre les deux fonctions épithète et attribut de l'objet. Pour chaque analyse, il y avait 1000 itérations. L'idéal aurait été d'obtenir une répartition avec 60 phrases comportant un adjectif épithète dans une classe 1 et 60 phrases avec adjectif a.c.o. dans une classe 2. Le résultat est très loin de ce qui aurait été souhaité. Mais même si l'on ne s'en approche que de très loin, les proportions entre les différents résultats peuvent servir à évaluer l'importance relative des 4 indices. Ainsi la différence entre le nombre de phrases avec adjectif épithète et celui de phrases avec adjectif a.c.o. dans chaque classe peut donner une estimation de l'importance de chaque indice pour la discrimination entre les deux fonctions. Ceci est résumé dans le tableau 2.

Tableau 2. Résultat des tests de classification k-means. La classe 1 comporte (sauf exception pour l'indice de la hauteur) un plus grand nombre de phrases avec adjectif épithète et la classe 2 un plus grand nombre de phrases avec adjectif a.c.o. ; la différence entre chaque type de phrase dans chaque classe figure à la dernière ligne.

                                     pausette     montée mél. finale     durée        hauteur
classe                               1      2     1      2               1      2     1      2
phrases avec adjectif épithète       60     0     47     13              39     21    30     30
phrases avec adjectif a.c.o.         54     6     43     17              37     23    30     30
différence                           6            4                      2            0

Il apparaît que la pausette est l'indice qui a la fonction discriminante la plus importante. Ceci peut paraître surprenant parce qu'il n'y a que 6 pausettes réalisées sur les 60 possibles. Mais la manifestation des autres indices est aussi très restreinte. La montée mélodique finale vient après, dans cette classification, avec une différence de 4 éléments dans chaque classe (4 phrases avec adjectif épithète de plus dans la classe 1, et 4 phrases avec adjectif a.c.o. de plus dans la classe 2). Le rôle discriminant de l'indice de la durée est très réduit puisqu'il n'y a qu'une différence de 2 éléments. Enfin l'indice de la hauteur des voyelles n'a aucun rôle discriminant puisqu'on a un nombre égal de phrases avec un adjectif de chaque fonction dans les deux classes. La hiérarchie des indices est donc la suivante : pausette, montée mélodique finale, durée et hauteur.
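A sketch of the k-means procedure described in section 10: one clustering per cue, followed by a cross-tabulation against the real épithète/a.c.o. label, as in Tableau 2. The file name, column names and the use of n_init as a stand-in for the "1000 iterations" mentioned above are assumptions, since the original analyses were run in XLSTAT.

    import pandas as pd
    from sklearn.cluster import KMeans

    # Hypothetical table: one row per sentence (60 épithète + 60 a.c.o.),
    # with the four prosodic cues as columns and the true function as label.
    df = pd.read_csv("indices.csv")  # pausette_ms, montee_st, duree_moy_ms, hauteur_moy_st, fonction

    # One k-means per cue: cluster on a single cue into two classes and
    # cross-tabulate the classes against the épithète/a.c.o. label.
    for cue in ["pausette_ms", "montee_st", "duree_moy_ms", "hauteur_moy_st"]:
        km = KMeans(n_clusters=2, n_init=1000, random_state=0)
        classes = km.fit_predict(df[[cue]])
        print(cue)
        print(pd.crosstab(df["fonction"], classes))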
La hauteur ne semblant avoir aucune fonction discriminante, elle n'a pas la fonction d'un indice permettant de différencier les deux fonctions de l'adjectif contenu dans les phrases. Par ailleurs, ce résultat est obtenu à partir d'un nombre de locuteurs trop faible pour qu'il soit significatif d'un point de vue statistique. Néanmoins, quelques indices susceptibles d'être utilisés à l'encodage ont été dégagés. On pouvait se demander, s'ils intervenaient au décodage, comment les auditeurs faisaient pour obtenir une indication sur la fonction d'un adjectif placé immédiatement après un nom complément d'objet direct ; c'est ce qui a été cherché par des tests de perception.

11. Tests de perception

Pour savoir plus précisément quel est le rôle de chaque indice dans l'indication de la fonction, deux tests faisant intervenir des stimuli naturels et synthétiques ont été préparés à partir de deux phrases du corpus prononcées par la première locutrice (loc1) : la phrase n° 8 : J'ai connu cet homme intraitable. et la phrase n° 6 : Il boit son chocolat froid. Pour chaque test, la phrase avec adjectif épithète et celle avec adjectif a.c.o. prononcées par la locutrice 1 ont constitué respectivement le stimulus st000 et le stimulus st100. Les autres stimuli ont été manipulés à partir de ces phrases grâce au logiciel Praat. Dans le premier test, la durée de la pausette était de 68 ms et, dans le second test, la montée mélodique finale était de 5,5 demi-tons. La nature des stimuli est résumée dans le tableau 3. Après une phase d'écoute préliminaire, ces stimuli ont ensuite été présentés dans un ordre aléatoire, à 7 s d'intervalle, à 18 sujets (étudiants de 1ère année de lettres modernes). Chaque série de stimuli était répétée 3 fois.

Tableau 3. Description des stimuli utilisés pour les deux tests de perception ; la pausette avait une durée de 68 ms et la montée mélodique finale était de 5,5 demi-tons.

Une feuille de réponses devait être remplie de la façon suivante : dans le premier test (phrase : J'ai connu cet homme intraitable.), pour chaque stimulus présenté, il était demandé de cocher une case oui ou non en réponse à la question : l'homme a-t-il changé ? La question dans le second test (phrase : Il boit son chocolat froid.) était : le chocolat a-t-il refroidi ? et il fallait aussi cocher une case. Les cas de refus de réponse ont été pris en compte dans l'analyse (ils seront notés nsp sur les graphiques).

L'analyse globale des résultats a été l'occasion de constater la difficulté des tests puisque aucune conclusion significative d'un point de vue statistique n'a pu être obtenue pour les 18 auditeurs. C'est pourquoi il a fallu faire une sélection parmi les résultats : 6 auditeurs ont donc été choisis en fonction de la diversité et de l'exactitude de leurs réponses en ce qui concernait les stimuli non modifiés. Une analyse factorielle des correspondances (AFC) a été pratiquée, pour le premier test, sur les données obtenues, représentées selon le tableau de contingence de la figure 4. Le test d'indépendance entre lignes et colonnes (khi2) est significatif à .006.

Figure 4. Tableau de contingence (vue 3D) des résultats du premier test ; lignes : stimuli st000 à st111, colonnes : épi, aco, nsp. Sur les 18 sujets initiaux, 6 ont été sélectionnés et il y a eu 3 présentations d'une série de 8 stimuli, ce qui fait un total de 144 données.

La figure 5 représente le « mapping » de cette analyse.
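The χ² test of independence reported for the contingency table of Figure 4 (8 stimuli × 3 response categories, 144 observations) could be reproduced along the following lines; the counts below are invented placeholders, not the study's data, and the correspondence analysis itself (run in XLSTAT for the article) is not reimplemented here.

    import numpy as np
    from scipy.stats import chi2_contingency

    # Placeholder 8 x 3 contingency table: rows = stimuli st000..st111,
    # columns = responses (épithète, a.c.o., nsp). Each row sums to 18
    # (6 listeners x 3 repetitions), i.e. 144 observations in total.
    table = np.array([
        [14,  3, 1],
        [12,  5, 1],
        [10,  6, 2],
        [ 8,  8, 2],
        [ 9,  7, 2],
        [ 4, 12, 2],
        [ 5, 11, 2],
        [ 3, 14, 1],
    ])

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3f}")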
Il montre une répartition très nette des stimuli selon l'axe F1 :
o les stimuli dont l'adjectif est perçu en fonction a.c.o. comportent tous une pausette ;
o les stimuli dont l'adjectif est perçu en fonction épithète ne comportent pas de pausette ;
o une exception cependant : quand la pausette est le seul indice de la fonction a.c.o. (st100), le stimulus est à la limite mais perçu avec adjectif épithète.
Selon l'axe F2 :
o une élévation de hauteur moyenne des phrases assure une meilleure identification de la fonction a.c.o. de l'adjectif ;
o l'absence d'une telle élévation laisse apparaître le doute, même s'il y a présence simultanée d'une pausette et d'un allongement relatif (st101).

Figure 5. Mapping de l'analyse factorielle des correspondances sur les résultats du premier test (graphique symétrique, axes F1 : 84,46 % et F2 : 15,54 %). L'axe F1 représente, d'un côté, les phrases avec pausette et, de l'autre, les autres phrases. Selon l'axe F2, une élévation de hauteur moyenne des phrases assure une meilleure identification de la fonction a.c.o. de l'adjectif et l'absence d'une telle élévation laisse apparaître le doute.

Une AFC a aussi été pratiquée sur les données du second test ; leur tableau de contingence est présenté figure 6. Le test d'indépendance entre lignes et colonnes (khi2) est ici encore plus significatif (.0001).

Figure 6. Tableau de contingence (vue 3D) des résultats du second test ; lignes : stimuli st000, st011, st100, st111, colonnes : épi, aco, nsp. Comme il n'y avait que 4 stimuli par série, le nombre de données présentées n'est que de 72.

La figure 7 représente le mapping de cette analyse. On remarque que :
o selon l'axe F1, la montée mélodique finale assure exactement le même rôle que la pausette dans l'identification de la fonction a.c.o. (stimuli avec montée) et épithète (stimuli sans montée) ; il y a donc ici analogie avec les résultats du premier test ;
o l'axe F2 montre que la présence de la montée mélodique finale permet à elle seule une reconnaissance sûre de la fonction de l'adjectif, contrairement à la pausette.

Figure 7. Mapping de l'AFC sur les résultats du second test (graphique symétrique, axes F1 : 87,60 % et F2 : 12,40 %). L'axe F1 montre que la montée mélodique finale est perçue comme un indice prosodique de la fonction a.c.o. ; il y a analogie et complémentarité, au niveau de la perception, entre montée mélodique finale et pausette.

12. Conclusion

Deux indices principaux permettant d'opposer la fonction épithète et a.c.o., à l'encodage comme au décodage, se dégagent : la montée mélodique finale et la pausette. Mais il est surprenant de constater que la pausette, le seul indice qui ait déjà été décrit, est peu utilisée et, à une exception près, uniquement par des locutrices. La montée mélodique finale est l'objet d'un emploi plus important, mais on la trouve aussi dans des phrases avec adjectif épithète ; ceci n'entrave pas son rôle discriminant entre les deux fonctions, comme le prouve le test de perception. Enfin, une hauteur moyenne plus importante de la phrase vient renforcer le rôle discriminant de la pausette au décodage.
Ces indices sont peu utilisés mais ils relèvent de la prosodie, ce qui explique leur caractère facultatif. Cette étude a porté sur un corpus de phrases lues. La validité des indices trouvés doit maintenant être vérifiée sur des corpus de parole spontanée. Références bibliographiques Bachelet, R. (2010). L’analyse lille.fr factorielle des correspondances. http://rb.ec- 76 Blanche-Benvéniste, C. (1991). Deux relations de solidarité utiles pour l’analyse de l’attribut. Gaulmyn, M.M, Rémi-Giraud, S. & Basset, L. (éds), À la recherche de l'attribut, PUL, Lyon, pp. 83-98. Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proc. of the Institute of Phonetic Sciences of the University of Amsterdam 17, pp.97-110. Cibois, P. (2007). Les méthodes d’analyse d’enquêtes. PUF, Paris. CRISCO (2011), Dictionnaire des synonymes, http://www.crisco.unicaen.fr/ Damourette, J. & Pichon, E. (1911-1940). Des mots à la pensée. Essai de grammaire de la langue française, tomes I, II et Compléments. D'Artrey, Paris. Fuchs, C. (1996). Les ambiguïtés du français. Ophrys, Gap-Paris. Le Goffic, P. (1993). Grammaire de la phrase française. Hachette, Paris. Martin, F. (2006). Prédicats statifs, causatifs et résultatifs en discours – Sémantique des adjectifs évaluatifs et des verbes psychologiques. Thèse présentée à l’Université libre de Bruxelles. Martin, Ph. (1981). Mesure de la fréquence fondamentale par intercorrélation avec une fonction Peigne. Actes des XIIèmes Journées d’Étude sur la Parole, Montréal. Noailly, M. (1999). L'adjectif en français. Ophrys, Gap-Paris. Ploux, S. (1997). Modélisation et traitement informatique de la synonymie. Linguisticae Investigationes, 21/1, pp.1-28. Ploux, S. & Victorri, B. (1998). Construction d’espaces sémantiques à l’aide de dictionnaires de synonymes. Traitement automatique des langues 39, n°1, pp.161-182. Riegel, M. (1991). Pour ou contre la notion grammaticale d'attribut de l'objet: critères et arguments. Gaulmyn, M.M., Rémi-Giraud, S. & Basset, L. (éds), À la recherche de l'attribut. PUL, Lyon, pp.99-118. Riegel, M., Pellat, J.C. & Rioul, R. (1994). Grammaire méthodique du français. PUF, Paris. Ross, M.J., Schaeffer, H.L., Cohen, A., Freudberg, R. & Manley, H.J. (1974). Average Magnitude Difference Function Pitch Extraction. IEEE Trans ASSP22, pp.353-362. Thomas, I. (2003). Quels types de données pour la traduction automatique de l’adjectif qualificatif dans les groupes ADJ NOM/NOM ADJ : vers une approche ontologique et contextuelle. Bulletin de Linguistique appliquée et générale 28, pp.255-274. Logiciels utilisés PHONÉDIT développé par la société S.Q.Lab en collaboration avec le Laboratoire Parole et Langage d’Aix-en-Provence (C.N.R.S. URA 261). PRAAT logiciel d’analyse et de synthèse de la parole développé par Paul Boersma and David Weenink, Phonetic Sciences, University of Amsterdam. SPEECH ANALYZER version 3.0.1. (2007) développé par la SIL (Dallas) XLSTATS logiciel de statistiques et d’analyse de données développé par Addinsoft 77 OBITUARIES Eli Fischer-Jørgensen (1911- 2010) At the age of ninety-nine, Emeritus Professor Eli Fischer-Jørgensen died at her home in Denmark in February, 2010. This marked the end of a very long and distinguished career that had begun in 1929 with studies of the French and German languages which were firmly in the Danish tradition stemming from great scholars of the linguistic sciences, such as Otto Jespersen. 
While still a student, she was accepted into the Linguistic Circle of Copenhagen, which was famous for the "glossematic" theories of Louis Hjelmslev. He was a scholar who may be easy to overlook due to the fact that he collaborated with a colleague (Poul Andersen) to produce a practical textbook for their students of phonetics. While still a student, Eli developed her lifelong passion for integrating observational and instrumental phonetic work with phonological theory. Graduating MA in 1936, she set off on travels to and sojourns in places which included Marburg (for German dialectology), Paris to work with Martinet and Marguérite Durand and Berlin to study with Eberhard Zwirner. Returning home just before the outbreak of World War II, she got work in the Department of German which, in due course, morphed into a lectureship in phonetics created for her under the aegis of Hjelmslev. After the War, she extended her experience by visits to London to the Phonetics Department at University College to study with Jones and Hélène Coustenoble and also to the School of Oriental and African Studies to attend lectures by J. R. Firth and on Yoruba and Chinese as well. Other journeys took her to America to the Haskins Laboratories and to Stockholm to cooperate with Gunnar Fant. At home, her work became recognised by the creation of a Chair of 78 Phonetics for her in 1966 and an associated institute. Fruitful connections with colleagues at Lund also followed. As time went by, she became the host herself of researchers from abroad, including individuals from Japan, Edinburgh, Berkeley and Germany. A most memorable and brilliantly managed (very much by her) occasion was the 1979 visit to Copenhagen for the Ninth International Congress of Phonetic Sciences. This was something of a swan song for her since two years later, on her reaching 70, regulations no doubt required her to relinquish her post. Her varied publications were far too many to detail here. They included a classic account of the Danish stød, the historical Tryk i ældre dansk (on Stress in Old Danish), Trends in phonological theory and her accounts of the phonetic symbolisms of vowels. Nor should her modest, concise, clear summary of general phonetics for her Danish students, Almen Fonetik, be quite forgotten. She was held in high esteem amongst her friends for her gifted water colours. She'll be remembered for a long time to come. Jack Windsor-Lewis Eva Sivertsen (1922-2010) Eva Sivertsen was born on the 8th of July, 1922 at Trondheim, the ancient city on the shores of a fjord in the middle of Norway's thousand-mile coastline. She graduated in English at the University of Oslo continuing her studies there with a Ph.D. on the famous dialect of working-class Londoners, known as Cockney. This activity developed after some years of further work into the 280-page book published by Oslo University Press in 1960 as Cockney Phonology. She did much of her work on Cockney from a base at University College London's Department of Phonetics, but also lived for a while among her main informants at a social settlement in the East End area of Bethnal Green. Besides the influence of the contemporary and previous UCL staff which she clearly acknowledged, she 79 became a great enthusiast for the work of the American structuralists. The influence of Charles F Hockett certainly pervades the whole book. A three-page review of it in Le Maître Phonétique by J. D. 
O'Connor began "the standard work on Cockney Phonetics has now been written" and ended with "altogether a splendid book". She included in it also an admirable "conspectus of the general problems posed by the phonological analysis of English" thus making it "two books in one". Besides being a brilliant scholar she was an equally gifted administrator, as was seen when she became a principal organiser of the Eighth International Congress of Linguists in 1957 and edited its volume of Proceedings. In 1960, she headed the Department of English at Trondheim University. She ultimately became the Rektor of the whole University. She always maintained an interest in the teaching of English as an extra language in its grammar and other linguistic features, as well as its phonology. She was an outstandingly energetic person physically, as well as intellectually — much given to outdoor pursuits with remarkable endurance. She never married, but she had many friends by whom she was well liked. Jack Windsor-Lewis Gösta Bruce (1947-2010) (picture courtesy of Daniel Bruce) Gösta Bruce, Professor of Phonetics at Lund University, Sweden, passed away on June 15, 2010, following a short period of hospitalization. He was 63 years old. Gösta Bruce is survived by his wife, Barbro, and his children Sara (with partner Valtteri), Daniel, and Niklas. Born and brought up in the southern Swedish town of Helsingborg, Gösta chose to continue his higher education at Lund University, 60 km south of 80 Helsingborg. After an undergraduate degree in Russian, Gösta went on to study phonetics, drawn to the department where Bertil Malmberg and Kerstin Hadding had developed the field of phonetics as an experimental discipline at the Humanities faculty at Lund University. Under the direction of Hadding’s successor, Eva Gårding, Gösta Bruce developed the Lund model of intonation. He carried the phonetic analysis of Swedish word accents in a new direction by analysing them with respect to their syntactic position and pragmatic function (focus) in utterances. His seminal dissertation, Swedish Word Accents in Sentence Perspective (1977) laid the theoretical foundation for the development of ideas about how intonational phenomena could be analysed as components in a hierarchical prosodic structure. These fundamental ideas on intonational structure and their relation to syntax and pragmatics have since been adopted and developed by many researchers the world over. Following a research stay at Bell Labs in 1984, as well as a period as a visiting professor at Stockholm University during 1985-1986, Gösta Bruce was appointed to the chair of phonetics at Lund University in 1986. The contributions to the festschrift to Gösta on the occasion of his 50th birthday in 1997 (Horne, 2000) bear witness to the influence that his work had for researchers, not only in phonetics, but also in general linguistics and in speech technology. Although Gösta Bruce’s model was based on the prosodic patterning of ‘standard’ central Swedish, Gösta’s own dialect, that of Helsingborg in the southern province of Scania, differed quite considerably from that of the standard variety. This variation in the patterning of word accents in Swedish dialects was an area that intrigued Gösta as it had earlier Eva Gårding (1977) and Ernst Meyer (1937–1954). Gösta Bruce followed in their footsteps and carried the investigation of dialectal variation to new heights in his work on prosodic modeling. 
Although the phonetic realization of the two Swedish word accents differs quite considerably dialectally, the crucial timing difference between the word accents with respect to the stressed syllable is something that is constant for all dialects and is something which fascinated Gösta. He had an extremely sensitive ear for tonal variation and timing, and in recent years, his work was focused on systematizing this variation as regards Swedish dialect prosody in several externally financed research projects such as SweDia 2000 and SIMULEKT. Shortly before his untimely death, his vast accumulated knowledge on the varieties of Swedish was published in his book Vår fonetiska geografi ‘Our phonetic geography’ (Bruce, 2010). Gösta’s sensitivity for timing differences also lead to a number of novel studies on rhythmic structure in Swedish. By carrying out a number of innovative experimental studies on differences in the duration of unstressed syllables, he could show how rhythmic alternation was created postlexically in strings of nonprominent syllables (Bruce, 1987). Gösta Bruce was not only a creative researcher and scientist; he was also a dedicated and respected teacher. His undergraduate courses on prosody, Swedish 81 dialect variation and sounds of the world’s languages were always highly evaluated. At the time of his premature death, Gösta was planning to rework and update his very popular course book on Swedish prosody (Bruce, 1998). On the graduate level, Gösta was regularly engaged in doctoral courses on both a local and national level. He was a devoted teacher and supervisor, and during his career, Gösta supervised 13 doctoral dissertations. He sincerely cared about his students and constantly inspired and encouraged them, both by his words of wisdom and by his empathetic manner. His humor, often spontaneously expressed in terms of perfect sound imitation (everything from different Swedish dialects to Russian intonation to complex African click consonants), was another productive outlet for his very creative mind. Despite all his research and teaching duties, Gösta Bruce played an important role in academic leadership at Lund University. During his time as professor, he served as head of the department of linguistics and phonetics, vice dean of the humanities faculty, chairman of the appointments’ board for language and linguistics, and most recently, member of the board of research at the Center for Languages and Literature. He was also engaged as an expert evaluator at the Swedish and Norwegian Research Councils and was a member of the editorial board of Phonetica. In addition, he was an active member in several learned societies, including The Royal Swedish Academy of Letters, History, and Antiquities. In 2007, Gösta Bruce was appointed president of the International Phonetic Association. In this role, Gösta saw the opportunity to approach a discussion of fundamental issues related to the future of the discipline of phonetics, including the relationship of prosodic research within a larger interdisciplinary perspective where phonetics plays a central role in understanding speech processing phenomena. Due to his untimely death, however, many of Gösta’s plans were tragically left at the planning stage. Following a suggestion by Gösta’s family at the time of his funeral, the IPA set up a memorial fund to honor Gösta and his accomplishments. Since that time, the IPA Council has decided to make the fund a permanent fund. 
The Gösta Bruce Memorial Fund is intended to serve as a means to support students in phonetics and speech sciences by awarding scholarships in Gösta’s name that will assist them in traveling to ICPhS conferences in order to meet other speech scientists and present their research results to the international community. Nothing could be more fitting to keep the memory of Gösta Bruce’s many scientific accomplishments and his constant devotion to developing knowledge of phonetics alive. References Bruce, Gösta. 1977. Swedish word accents in sentence perspective. (Travaux de l’Institut de linguistique de Lund XII). Lund: Gleerup. Bruce, Gösta. 1987. On the phonology and phonetics of rhythm: Evidence from Swedish. 82 In Dressler, W., Luschützky, H., Pfeiffer, O. & Rennison, J. (Eds.), Phonologica 1984. Proceedings of the Fifth International Phonology Meeting, Eisenstadt, 25–28 June 1984, pp. 21-32. Cambridge: Cambridge University Press. Bruce, Gösta. 1998. Allmän och svensk prosodi [General and Swedish prosody]. (Praktisk lingvistik 16). Dept. of linguistics and phonetics, Lund University. Bruce, Gösta. 2010. Vår fonetiska geografi [Our phonetic geography]. Lund: Studentlitteratur. Gårding, Eva. 1977. The Scandinavian word accents (Travaux de l’Institut de linguistique de Lund XI). Lund: Gleerup. Horne, Merle (Ed.). 2000. Prosody: Theory and experiment. Studies presented to Gösta Bruce. Dordrecht: Kluwer. Meyer, Ernst A. 1937-1954. Die Intonation im Schwedischen [Intonation in Swedish], 2 vols. (Stockholm Studies in Scandinavian Philology, 0562-1097). Stockholm: Fritzes. Merle Horne Professor of general linguistics Dept. of linguistics and phonetics Lund University, Sweden Ilse Lehiste (1922 – 2010) (picture by courtesy of Sarah Ritschert) One of the greatest phoneticians who, was a remarkable scientist, passed away. Ilse Lehiste, born on January 31, 1922 in Tallinn, Estonia, died at Riverside Methodist Hospital on Saturday, December 25, 2010. She was born into the family of a higher officer. She started her studies in Estonia: graduated from the Lender high school, then studied piano for one year at the Conservatory of Tallinn, and she came up to the University of Tartu, Faculty of Arts (1942). 83 After two years, she continued her studies in Germany because she left Estonia as a refugee in 1944, fleeing the Soviet invasion of her homeland. At first, she studied at the University of Leipzig and then at the University of Hamburg. Her postgraduate studies concentrated on the work of William Morris, the manysided Victorian designer, artist, writer, and socialist. She was especially interested in the motives of the Nordic literature in his work. She defended her PhD in Philology at the University of Hamburg in 1948. At that time, she lived in a refugee camp in Germany. During the next year she moved to United States, where she continued her studies. Here, she was engaged especially in linguistics. In 1959, she defended her second PhD at the University of Michigan. Her main research was acoustic phonetics, besides this she was engaged in other fields of linguistics: prosody, language contact, Estonian, phonetics and phonology, Serbo-Croatian accentology. After receiving her PhD, she spent four years at the Communication Sciences Laboratory there as a research associate. In 1963, Ilse Lehiste joined the linguistics faculty at The Ohio State University (OSU), Columbus. 
At first, she spent two years in the Slavic Department, then she was elected to be the Linguistics Department’s first Chair when it was founded in 1965. She enjoyed a long and especially distinguished career at OSU: she was elected Professor in Linguistic in 1965. Since 1987, she was continuing as Professor Emeritus. She has given exciting lectures at universities and at conferences all over the world. She was not only a linguist, but a phonetician. She worked to build a bridge between the linguists of Estonia and the West. That interest is exemplified by the 11th International Phonetics Conference which was organized in Tallinn in 1987 because of her suggestion. She was a Renaissance person: linguist, literateur, poet, musician, etc. Her poems were published in 1989 (Noorest peast kirjutatud laulud). She analyzed the Estonian literature and she wrote several overviews for the World Literature Today in the United States. In the past decade, she was cooperating with the Institute of Estonian and General Linguistics of the University of Tartu to investigate Finno-Ugric prosody. Lehiste left behind an enormous body of work: she was author, co-author or editor of twenty books, two hundred articles and around a hundred reviews. I would like to emphasize only one of her admirable books. She was employed in researching the production and perception of suprasegmental features, and the general work, called Suprasegmentals was published in 1970. Lehiste summarized what was known about the phonetic nature of suprasegmentals and evaluated the available evidence from the point of view of linguistic theory. Ilse Lehiste attended the Speech Research ’89 Conference in Budapest (Hungary) more than 20 years ago offering her help to the conference organizers. It was a great experience for the Hungarian phoneticians to meet her personally. The title of her talk was The experimental studies of poetic rhythm. 84 The importance of her scientific work was well recognized by a number of professional bodies around the world. Lehiste has received a honorary doctorate from Essex University, England (1977), the University of Lund, Sweden (1982), Tartu University, Estonia (1989), and The Ohio State University (1999). She was a Fellow of the American Academy of Arts and Sciences (1990), Foreign Member of the Finnish Academy of Sciences (1998), and Foreign Member of the Estonian Academy of Sciences (2008). Ilse Lehiste will be remembered both personally and professionally. Viola Váradi Eötvös Loránd University Phonetics Department Budapest, Hungary 85 Svend Smith Award 2008 for Elisabeth Lhote Elisabeth Lhote was born in Toul. After graduating from high school, she studied French literature and linguistics at the University of Lille and was introduced to phonetics there. Motivated by her growing interest in general, experimental and applied phonetics, she moved to the Institute of Phonetics of Strasbourg University where she joined the research team around Georges Straka. Under his guidance, Elisabeth Lhote specialized in voice production and earned her doctorate in Phonetics in 1970 with a thesis on "La méthode glottospectro-graphique et la simulation de la parole" (Glottospectrography and the simulation of speech). 
She continued her career as a researcher under the supervision of Péla Simon, who had succeeded Georges Straka in the position of head of the Phonetics Department in 1971, and presented an excellent habilitation treatise in 1980 on "Analyse et synthèse de faits de langue au niveau du larynx" (Analysis and synthesis of laryngeal features). In 1980, Elisabeth Lhote was appointed Professor of Phonetics and head of the Phonetics Laboratory at the University of Franche-Comté in Besançon. In 1986, she became director of the Center of Applied Linguistics and head of the Laboratory of Speech Analysis. In these positions she was able to substantially develop and foster phonetics and applied linguistics at her university until her retirement in 1997. Elisabeth Lhote’s list of publications comprises 4 books and 65 articles. She started publishing the results of her research activities in the late sixties. Her first publications may be characterized as reports on detailed experimental investigations of the activities of the vocal cords by glottography and glottospectrography. Her findings shed new light on the acoustics of the glottal source, provided new impulses to the theory of phonation and stimulated new research initiatives in the domaines of intonation and tones in the tone languages. Later, her interests shifted to speech pathology and therapy, speech perception and comprehension, speaker recognition and foreign language teaching. As an academic teacher, Professor Lhote has supervised 18 doctoral and 2 habilitation theses. By her outstanding commitment, devotion and excellence as a researcher and academic teacher, she has profoundly promoted the phonetic sciences and applied linguistics in France, Europe and the world. ISPhS’s membership is proud to confer the 2008 Svend Smith Award to her. Jens-Peter Koester [email protected] 86 References Lhote, E. (1970). La méthode glottospectrographique et la simulation de la parole. Dr. dissertation, Strasbourg. Lhote E. (1973). Contribution à l'étude de la fonction linguistique du larynx. Phonetica, n° 28, p. 26-41. Lhote, E. (1982). La parole et la voix. Hamburg (Buske). Lhote, E. (Ed.) (1990). Le paysage sonore d'une langue, le français. Hamburg (Buske). Lhote, E. (1995). Enseigner l'ora1 en interaction. Percevoir, écouter, comprendre. Paris (Hachette). 87 PHONETICS INSTITUTES PRESENT THEMSELVES THE DEPARTMENT OF LANGUAGE AND COMMUNICATION STUDIES NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY, TRONDHEIM, NORWAY The Department of Language and Communication Studies, or in Norwegian: Institutt for språk- og kommunikasjonsstudier (ISK), is the only department in Norway where it is possible to study Phonetics. Its research is both fundamental and applied, and often cross-disciplinary. The Dragvoll campus, which houses the Department of Language and Communication Studies Study programmes The Department of Language and Communication Studies <http://www.ntnu.edu/isk> offers a full BA/MA programme in Phonetics <http://www.ntnu.edu/studies/bfon>. The programme covers all traditional areas of phonetics (transcription, physiology and articulation, acoustics, and speech perception) and focuses on experimental phonetics. All courses aim to combine phonetic theory with practical exercises, usually in the studio or in the phonetic lab. The Phonetics section is represented by two professors, Wim van Dommelen and Jacques Koreman. 
In addition to Phonetics, the Department of Language and Communication Studies offers full study programmes in General Linguistics and Applied Linguistics, as well as subsidiary programmes in Swahili and Norwegian as a Second Language. It is responsible for all Norwegian courses for exchange students at the Norwegian University of Science and Technology (NTNU). This varied environment, and collaboration with speech technologists at NTNU, opens up possibilities for a wide range of research themes. 88 Research The research in the Department of Language and Communication Studies covers comparative language studies and foreign language acquisition, speech perception, speaker recognition and speech technology. In a long-standing collaboration with Norwegian as a Second Language, Wim van Dommelen <http://www.hf.ntnu.no/hf/isk/Ansatte/wim.van.dommelen/ personInfo.html> has investigated the difficulties foreigners have in learning Norwegian. His research covers both segmental and supra-segmental properties. Tone and intonation has been (and is) an area of interest, especially the realization of Norwegian lexical tones, in which he has a tight collaboration with Linguistics. As a spin-off result of his involvement in the Sound-to-Sense project <http://www.sound2sense.eu/>, he is also involved in experiments on foreigners’ perception of English sounds in noise. This research is carried out in collaboration with University College London, the University of the Basque Country (Bilbao) and Radboud University in Nijmegen. The Sound-to-Sense project is a MarieCurie Research Training Network in which Ph.D. students and post-docs are trained outside their native country. It also brought Helena Spilková to Trondheim. Helena is carrying out her Ph.D. research on reductions in spontaneous conversational speech and comparing productions of native English speakers with productions of two groups of non-native speakers of English (Czech and Norwegian speakers). This research involves detailed phonetic analysis as well as evaluation of various context influences on the word realizations. In the same project, there is a collaborative research effort with Radboud University in Nijmegen on the systematic phonetic variation of word-final /t/ in Dutch, where the influence of linguistic (e.g. morphological structure) and probabilistic factors (word frequency) on the realization of canonical /t/’s is being investigated and compared to the way an automatic speech recognition system deals with such phonetic variation. Recently, the department has started a collaborative project which brings together theoretical expertise from phonetics with pedagogical experience from the Norwegian teachers in the department to build a computer-assisted pronunciation teaching system (CAPT). This system is based on VILLE <http://www.speech.kth.se/ville>, which was developed by KTH in Stockholm, who are also one of the partners in the project, “Computer-Assisted Listening and Speaking Tutor (CALST)” <http://www.ntnu.edu/isk/projects>. This project aims to not only adapt the Swedish system to Norwegian, but also extend it so that users can train with different dialects. The reason for this is that there is no accepted pronunciation standard for Norwegian, so that foreigners must learn to deal with different dialects in their communication with Norwegians to be able to understand different speakers. 
Besides focusing on different target dialects, the system is being developed for specific source languages (or native languages of the users), so that learners of Norwegian can be guided through pronunciation exercises that are relevant for their native language. This is done in detail for a few 89 major learner groups in Norway, but we also analyse a large number of languages in less detail. In order to do this, an automatic contrastive analysis of the phoneme inventory is made on the basis of UPSID (UCLA Phonological Segment Inventory Database) <http://www.linguistics.ucla.edu/faciliti/sales/software.htm#upsid>. The aim is to build a flexible, extendable interface for contrastive analysis between any language pair that can be used in CAPT applications for any language. Jacques Koreman <http://www.hf.ntnu.no/isk/koreman> is the project manager. He is interested in speech technology, and has previously worked on speech recognition with the use of phonetic features. He also coordinated research on biometric user authentication in the SecurePhone Project <http://www.secure-phone.info>, where he and his colleagues specifically worked with speaker recognition and fusion (combination) of different modalities (voice, face and signature). The biometric recognizer was also implemented on a PDA/mobile phone. Besides speech technology, he is interested in the voice and voice pathology. He has carried out research on the phonetic consequences of unilateral vocal fold paralysis with a colleague at Saarland University, Germany, where he worked before moving to Trondheim. He also investigated vocal fold aerodynamics using a Rothenberg mask. He is now involved in other research projects in collaboration with Saarland University (project leader) and with the Technical University in Berlin. These projects investigate the production and perception of prominent syllables in several languages, of which Norwegian is one. The investigations so far show that languages use different prosodic properties to signal that a syllable is prominent, which of course has implications for second language acquisition and perception. Equipment The department has a high-quality recording studio. Besides audio recordings, it is possible to record electroglottograms (Glottal Enterprises EG-2) as well as aerodynamic signals (Rothenberg mask). In addition, a motion capture system is being installed. Recording of the airflow and microphone signals in the studio 90 Location The Norwegian University of Science and Technology < http://www.ntnu.edu> (NTNU) consists of two campuses <http://www.ntnu.edu/about-ntnu/campuses>. The Gløshaugen campus is home to the engineering sciences, while Dragvoll hosts the humanist and social sciences. Dragvoll is just outside Trondheim, and a bus ride into the city centre takes 15 minutes. Most of the buildings are connected by glass-roofed streets, with a bookshop, a café, small shops and a student cafeteria, in addition to the university library, lecture halls and offices. Walking the indoor streets of Dragvoll or enjoying the sunny spell we call winter What else? Students can use the university’s sports facilities, and there is ample opportunity for hiking in the beautiful surroundings of Trondheim, which is situated next to a fjord. During the long winters, you can go skiing in “lysløper” (lighted ski trails) in the Estenstadsmarka close to Dragvoll, or in the Bymarka. There are also ski jumps, as well as alpine slopes, in the vicinity of Trondheim. 
There are many lakes where you can go for a swim in summer or skate in winter. The city itself is the third-largest city in Norway, but it is still small. It has a cozy atmosphere with its wooden houses, and is at the same time alive with its large student population and rich cultural life.

Jacques Koreman
e-mail: [email protected]

THE PHONETICS LAB AND THE PHONOGRAM ARCHIVES AT ZURICH UNIVERSITY, SWITZERLAND

The need for phonetic knowledge in his work as a language expert was probably one of the main motivations for the English philology professor Eugen Dieth to found the Phonetics Lab at the University of Zürich (UZH) in 1935 and to carry out phonetics research using early versions of palatography and sound kymography (Dieth, 1950). Apart from focusing on speech research activities, Dieth was also involved in descriptive work on dialectal variability. For this reason, he was keen to maintain the 'Phonogram Archives', which were co-founded in 1909 at UZH by Albert Bachmann and Louis Gauchat with the aim of collecting vernacular language recordings in the four Swiss national languages (German, French, Italian and Rhaeto-Romance). At present, the Phonetics Lab and the Phonogram Archives form two inseparable institutions in the Faculty of Philosophy at UZH that have been actively involved in phonetics and dialectology research and teaching for the past decade. UZH is the largest of the 10 Swiss universities in terms of the number of students and staff members. A need for knowledge in phonetics and speech sciences in both research and education exists across a wide variety of disciplines, such as the philologies (German, English and Romance languages), psychology, general linguistics and others. The Phonetics Lab/Phonogram Archives can be viewed as a hybrid institute which serves research needs in a variety of departments and offers students from a wide range of disciplines the facilities and expertise to carry out projects in phonetics and speech sciences at Graduate, Postgraduate and Doctoral level. We do not offer degree courses specifically in phonetics, but attending the phonetics lectures provided by the Phonetics Lab is part of the required programme for most philology students (English, German and Romance languages). Students with a deeper interest in the subject then take part in voluntary higher-level phonetics courses and graduate in a related discipline (at any level) with a focus on a phonetic topic. Supervision and examination of such students are provided by staff members of the Phonetics Lab. Our lab consists of a sound-proof booth with an observation window that is well suited for high-quality speech recordings and speech perception experiments. The booth has high-end recording equipment permanently installed, and we apply standard speech measurement and analysis techniques such as laryngography, palatography and phonatory aerodynamic analysis. We also own a large variety of portable recording devices and perceptual testing equipment for field work. In addition, we have our own research library with the main journals in the area of phonetics and speech sciences and a large number of monographs from all areas of spoken language, phonetics, linguistics, acoustics, and speech and hearing sciences. All of our facilities are easily accessible in the tower of the main UZH building right in the heart of Zurich.
At present our team is formed by the following researchers, who are actively involved in teaching and/or research in phonetics and speech archiving (alphabetically by surname):

Camilla Bernardasci (Student Research Assistant)
Dario Brander (Post-graduate Research Assistant)
Volker Dellwo (PhD, Assistant Professor of Phonetics/Phonology)
Elvira Glaser (PhD, Professor of German Linguistics and member of the permanent leading board)
Lea Hagmann (Student Research Assistant)
Ingrid Hove (PhD, part-time Lecturer)
Marie-José Kolly (Research Assistant and PhD student)
Adrian Leemann (PhD, Post-Doc in Phonetics/Speaker identification)
Michele Loporcaro (PhD, Professor of Romance Linguistics and Head of Lab)
Mathias Müller (Student Research Assistant)
Stephan Schmid (PhD, PD, Senior Lecturer of Phonetics)
Daniel Schreier (PhD, Professor of English Linguistics and member of the permanent leading board)
Michael Schwarzenbach (lic. phil., Research Assistant)
Jürg Strässler (PhD, part-time Lecturer)
Dieter Studer (lic. phil., Research Assistant)
Sibylle Sutter (Post-graduate Research Assistant)

Our research interests range from historical sound development through synchronic dialectology to speech production, acoustics and perception, and we work at segmental as well as suprasegmental/prosodic levels of analysis. Work is currently being carried out on the distribution of rhythmic patterns across Italian and Swiss German dialects (Stephan Schmid), and we are interested in which functions rhythmic and timing variability may have in human speech communication (Volker Dellwo, Lea Hagmann, Mathias Müller). In a number of pilot studies, we found that there is significant rhythmic variability between speakers. We are now interested in how this variability can be used in areas like speaker identification (Volker Dellwo, Adrian Leemann, Marie-José Kolly, Stephan Schmid). For this project we received three years of major grant funding from the Swiss National Science Foundation (SNF). We are also interested in how this variability may help listeners to segregate two speakers speaking simultaneously (Volker Dellwo, Dario Brander, Sibylle Sutter; see Cushing & Dellwo, 2010). For this project we received one year of start-up funding from the University of Zurich Research Fund. Another significant area of expertise in the group is the dialectal distribution of sound patterns and the diachronic phonological development of Italian dialects (Michele Loporcaro & Stephan Schmid) and Swiss German (Elvira Glaser), as well as the socio-phonetic distribution of speech features across non-standard varieties of English (Daniel Schreier). Every year, the Romance-language-oriented members of the group organize fieldwork trips to various regions of the Italian-speaking world to systematically record a wide variety of Italian accents and dialects. These recordings have led to research on the distribution and functions of phonemic vowel quantity across different accents of Italian and to arguments about the historical phonological development of Romance languages (Loporcaro, 2007). For research into the historical development and synchronic dialectal variability of Swiss German (Fleischer & Schmid, 2006; Christen, Glaser & Friedli, 2010), the Phonogram Archives offer an impressive collection of sound carriers which have been collected and archived over the past 100 years.
This material contains valuable specimens of language varieties that have since become extinct or near-extinct, such as the West Yiddish dialect spoken in Lengnau and Endingen (Aargau), or the Franco-Provençal "patois" formerly spoken all over the western (now French-speaking) part of Switzerland. It also contains early recordings on wax disc (collaboratively recorded with the Phonogram Archives of Vienna between 1909 and 1923), which are now part of the UNESCO Memory of the World Programme (Fleischer & Gadmer, 2002). Major current projects of the archives (Dieter Studer, Michael Schwarzenbach, Lea Hagmann & Camilla Bernardasci) are the compilation of an on-line catalogue, the production of a digital version of the entire historic archive holdings (in collaboration with the Swiss National Sound Archives in Lugano) and the presentation of a major exhibition on Swiss dialects together with the Swiss National Library in Bern in 2012.

In teaching, we offer a variety of lectures, seminars and practical lab sessions at introductory and advanced levels of phonetics. For students of philology, we have specifically designed courses in German, English and Romance phonetics. Additionally, we offer lab sessions in which higher-level and postgraduate students learn experimental techniques in speech production, acoustic measurements and speech perception. In different lecture series, students are introduced to the main concepts as well as to specialist areas of phonetics (e.g. speaker-idiosyncratic features or speech rhythmic variability). We have strong links to other departments, like Experimental Audiology or Psychology, with whom we provide collaborative PhD supervision. There are currently four PhD students in the lab, and the interest is growing.

At present, both the Phonetics Lab and the Phonogram Archives are in a highly dynamic situation of change. Both institutions are co-directed in different ways by a board of professors from the philologies: Michele Loporcaro (Romance Linguistics), Elvira Glaser (German Linguistics) and Daniel Schreier (English Linguistics). While both institutions were rather separate entities during the past decades, a proposal to unite them in a single unit is currently being implemented (on a practical level, this process is nearly completed). In addition, the university recently decided to invest in the area of spoken language sciences and established a new Assistant Professorship in Phonetics/Phonology, for which Volker Dellwo (formerly University College London) was hired in August 2010. With the merger of the Phonetics Lab and the Phonogram Archives, we expect to strengthen phonetics and dialectology research and teaching at UZH in the future. The group has managed to attract grant funding in the past and at present, and further major and minor grant applications have been submitted over the past months. We thus hope to further enlarge our research team and be able to offer more funded PhD research in Phonetic Sciences at UZH in the near future. Should we manage to convince UZH to make further investments in our lab (for example, a full professorship in Phonetics), our aim would be to set up a degree course in phonetics at the postgraduate level.

Further information on the Phonetics Lab, the Phonogram Archives and our dynamic situation can be found at our (still separate) webpages www.pholab.uzh.ch and www.phonogrammarchiv.uzh.ch.

References

Christen, H., Glaser, E. and Friedli, M. (2010) Kleiner Sprachatlas der deutschen Schweiz. Frauenfeld: Huber.
Cushing, I. R. and Dellwo, V. (2010) The role of speech rhythm in attending to one of two simultaneous speakers. In: Electronic Proceedings of Speech Prosody, Chicago/USA (http://speechprosody2010.illinois.edu/papers/100039.pdf).
Dieth, E. (1950) Vademekum der Phonetik. Bern: Francke.
Fleischer, J. and Gadmer, T. (2002) Schweizer Aufnahmen – Enregistrements Suisses – Ricordi sonori Svizzeri – Registraziuns Svizras. Sound Documents from the Phonogrammarchiv of the Austrian Academy of Science. The Complete Historical Collections 1899-1950, Series 6/1-6/3. Wien: Österreichische Akademie der Wissenschaften; Zürich: Phonogrammarchiv der Universität Zürich.
Fleischer, J. and Schmid, S. (2006) Zurich German. Journal of the International Phonetic Association 36(2): 243-253.
Loporcaro, M. (2007) Facts, theory and dogmas in historical linguistics: vowel quantity from Latin to Romance. In: Salmons, J. C. and Dubenion-Smith, S. (eds.), Historical Linguistics 2005. Selected papers from the 17th International Conference on Historical Linguistics, Madison, Wisconsin, 31 July-5 August 2005. Amsterdam/Philadelphia: John Benjamins, 311-336.

Some staff members of the Phonetics Lab and the Phonogram Archives at Zurich University in front of our recording cabin/sound lab (from left to right: Volker Dellwo, Michael Schwarzenbach, Stephan Schmid, Ingrid Hove, Dieter Studer, Camilla Bernardasci).

Volker Dellwo & Dieter Studer
e-mail: [email protected]

CONFERENCE REPORTS

Speech Prosody 2010
Chicago, USA, 11-14 May 2010

Speech Prosody is the biennial meeting of ISCA's (the International Speech Communication Association) Speech Prosody Special Interest Group (SProSIG). In 2010, it was held in Chicago and was co-organized by various departments of the University of Illinois at Urbana-Champaign, the Northwestern Institute on Complex Systems and the Toyota Technological Institute. For five days (an externally organized Satellite Workshop on the perceptual and automatic identification of prosodic prominence took place on May 10th), more than 300 participants attended the 270 oral and poster presentations on aspects of prosody which play a role in various disciplines besides Linguistics, such as Psychology, Computer Science, Speech and Hearing Science, and Electrical Engineering. The general theme of Speech Prosody 2010 was the great diversity, as well as the universality, of prosody, a theme also addressed in the Keynote lectures: the role of prosody research in enriching speech engineering (Shrikanth Narayanan), prosodic cues in first and second sign language acquisition (Diane Brentari), representations of prosodic cues in computational models for language processing (Mari Ostendorf), prosody from an evolutionary perspective (Steven Mithen) and from a psycho- and neuro-linguistic perspective (Aniruddh Patel). Interestingly, the last two Keynote lectures related language to music, adding another interdisciplinary facet. In addition to the Keynotes, three of the special sessions included in the program can be regarded as highlights of this year's conference. Their topics were computer-aided pronunciation training and prosody, experimental approaches to focus, and the shape, scaling and alignment of F0 events. It should be noted, however, that the quality of the papers and posters was generally very high. In particular, there were a large number of excellent student papers, which is a promising sign for the workshops and conferences to come.
Stefan Baumann, Cologne

19th Annual Conference of the International Association for Forensic Phonetics and Acoustics (IAFPA)
Trier, Germany, 18-21 July 2010

Seventeen years after the last IAFPA conference in Germany's oldest city, the Phonetics Department of the University of Trier hosted the 19th Annual Conference of the International Association for Forensic Phonetics and Acoustics. Prof. Dr. Angelika Braun and her team of organizers were pleased to surpass, for the first time, the threshold of 100 participants, and welcomed phoneticians and acousticians from 14 countries. The main topics presented and discussed in 27 presentations and 10 posters were formants, whispered voice, speech databases, automatic voice/speaker comparison, and language analysis for the determination of origin (LADO). The conference was opened by the President of the University of Trier, Prof. Dr. Schwenkmezger, who commemorated 40 years of (forensic) phonetic expertise at the university and at the same time gave assurances that degrees in phonetics will continue to be awarded in the future. In her opening address, the Dean of the Department of Languages, Literature and Media Science, Prof. Dr. Hilaria Gössmann, referred to the large number of students at this university attending this year's conference, citing it as evidence of an active and interested student body and of the spirit of cooperation in the phonetics department. Both Prof. Schwenkmezger and Prof. Gössmann stressed the importance for Trier of hosting this high-profile international conference and wished all the participants a successful and enjoyable time.

The first session of the 2010 conference was chaired by Jens-Peter Köster, the founder and long-time head of Trier's phonetics department. It started with a presentation by Francis Nolan, Kirsty McDougall and Toby Hudson entitled Perceived voice similarity and acoustic measures, following up on previous research towards a model of voice similarity for linguistically homogeneous voices. Their perception experiment showed that telephone recordings level out the perceived difference between different speakers. Furthermore, the mixing of studio and telephone recordings increases the perceived difference between samples from the same speaker. In a second step, Nolan et al. applied multidimensional scaling (MDS, dim1-dim5) to the perceptual results of the studio recordings and correlated them with acoustic parameters (sketched schematically below). Some correlation was found between dim2 and F3, dim3 and F2, and dim4 and F1. The strongest correlation, however, was found between dim1 and F0, indicating the importance of fundamental frequency to naive listeners when judging voice similarity. These results were supported by Mette Hjortshøj Sørensen in her paper on Perception of voice similarity by different groups of listeners. Her experiment included three groups of listeners (Danish L1, Danish L2 and no knowledge of Danish) who listened to paired Danish voice samples with the task of judging degrees of similarity or dissimilarity. Her preliminary findings suggest that most listeners used fundamental frequency as the main cue for their decision making, although L1 listeners utilised linguistic cues as well. She also noted that, regardless of linguistic background, listener performance varied considerably, indicating that voice-discrimination ability differs among listeners. Both findings are relevant for earwitness testimony evaluation. Her presentation received the 2010 IAFPA student paper award.
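The MDS-plus-correlation procedure reported for Nolan et al. above can be illustrated with a minimal, purely schematic sketch. The code below is not the authors' actual analysis: the file names, the use of scikit-learn, and the choice of five dimensions are assumptions made only for the sake of the example. It embeds a matrix of perceived pairwise voice dissimilarities in a five-dimensional perceptual space and then correlates each dimension with per-speaker acoustic means such as F0 and formant frequencies.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical input files:
#   perceived_dissimilarity.csv - square speaker-by-speaker matrix of perceived dissimilarities
#   speaker_acoustics.csv       - one row per speaker with columns F0, F1, F2, F3
dissim = np.loadtxt("perceived_dissimilarity.csv", delimiter=",")
acoustics = np.loadtxt("speaker_acoustics.csv", delimiter=",")

# Embed the perceptual dissimilarities in a 5-dimensional space (dim1-dim5).
mds = MDS(n_components=5, dissimilarity="precomputed", random_state=0)
dims = mds.fit_transform(dissim)  # one 5-dimensional point per speaker

# Correlate each perceptual dimension with each acoustic parameter.
for d in range(dims.shape[1]):
    for a, name in enumerate(["F0", "F1", "F2", "F3"]):
        r = np.corrcoef(dims[:, d], acoustics[:, a])[0, 1]
        print(f"dim{d + 1} vs {name}: r = {r:.2f}")
```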
The first day of the conference ended with a session in which two papers shifted the focus from forensic speech evidence proper to the meta-level of evidence presentation. Allen Hirson, in his talk Electronic presentation of evidence in Forensic Phonetics: A critical appraisal, argued that electronic presentation of evidence promotes effectiveness and efficiency in court. The analysis and decision-making process of the expert becomes more comprehensible when explained with the help of digital presentations or interactive visualizations. Jonas Lindh, Anders Eriksson and Gustaf Nelhans concerned themselves with the phrasing of conclusions, questioning the claim made by some scientists that the Bayesian framework actually constitutes a paradigm shift as compared to traditional verbal scales.

The tell-tale dialect: Analysis of dialectal variation of German native speakers in telephone conversations by Karen Masthoff, Yasmin Hadj Boubaker and Olaf Köster showed that, when experts are given the task of dialect identification on telephone voice samples, their performance does not correlate with time spent, the number and type of methods applied, or the perceived degree of difficulty. Individual skill and experience appear to be the dominant factors for dialect identification performance. Anna Czajkowski's contribution, Vocal Tract Resonances in Voiced and Whispered Speech and Listeners' Perception of Voice Depth and Pitch, compared mean F1 and F2 LPC values of voiced and whispered recordings. F1 was higher in whispered speech for all vowels and all speakers. The same proved to be the case for F2, except with /i/ and /u/. She also presented findings from an experiment on listeners' perception of a deep voice, concluding that untrained listeners may associate low mid-points of F1/F2 vowel spaces with a 'deep' voice even if F0 values do not indicate a low voice.

Probably the most anticipated talk of the conference was Tina Cambier-Langeveld's presentation on Performance of native speakers and linguists in LADO cases with true origin established. She presented results based on actual LADO cases in which the speakers' true origins could be confirmed beyond reasonable doubt after the forensic speech analysis had been done. The combination of trained native speakers and supervising linguists turned out to perform very well, with 120/124 cases (primary aim: verification of claimed origin) and 65/69 cases (secondary aim: identification of real origin) correctly established. Counter-expert reports produced by specialized linguists alone on some of the same cases did not show this level of accuracy: 1/8 correct for the primary aim but incorrect for the secondary aim, 5/8 incorrect for the primary aim, and 2/8 inconclusive. She concluded that both trained native speakers and linguists can contribute to LADO and that the a priori exclusion of trained native speakers is unfounded.

The session on automatic speaker and voice comparison opened with Automatic Forensic Voice Comparison: Experiments on Real Case Data from the BKA by Timo Becker et al. They presented findings based on experiments with their own SPES system using real case material, confirming that transmission channel and speaking-style mismatch, as well as short recording durations, reduce system performance. As a result, the use of global EER measures for automatic voice comparison systems was discouraged. In fact, system evaluation requires suitable data, matching the conditions of the case recordings in question, in order to provide meaningful EERs.
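For readers unfamiliar with these evaluation metrics, the sketch below shows in a purely illustrative way how an equal error rate (EER) can be computed from same-speaker and different-speaker comparison scores, and, for comparison, the log-likelihood-ratio cost (Cllr) discussed in the Morrison presentation reported next. The function names, the simple threshold sweep and the toy numbers are assumptions made for this example and are not part of any system mentioned above; the Cllr formula follows the standard definition used in forensic voice comparison.

```python
import numpy as np

def eer(same_speaker_scores, diff_speaker_scores):
    """Equal error rate: the operating point where the false-accept rate
    (different-speaker trials accepted) equals the false-reject rate
    (same-speaker trials rejected). Higher scores mean 'more similar'."""
    same = np.asarray(same_speaker_scores, dtype=float)
    diff = np.asarray(diff_speaker_scores, dtype=float)
    thresholds = np.sort(np.concatenate([same, diff]))
    far = np.array([(diff >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(same < t).mean() for t in thresholds])   # false rejects
    i = np.argmin(np.abs(far - frr))                          # closest crossing point
    return (far[i] + frr[i]) / 2.0

def cllr(same_speaker_lrs, diff_speaker_lrs):
    """Log-likelihood-ratio cost: penalizes likelihood ratios that point in the
    wrong direction, averaged over same- and different-speaker trials."""
    lr_same = np.asarray(same_speaker_lrs, dtype=float)
    lr_diff = np.asarray(diff_speaker_lrs, dtype=float)
    return 0.5 * (np.mean(np.log2(1.0 + 1.0 / lr_same)) +
                  np.mean(np.log2(1.0 + lr_diff)))

# Toy example with made-up comparison scores and likelihood ratios:
print(eer([2.1, 1.8, 2.5, 1.2], [0.4, 1.3, 0.2, 0.9]))
print(cllr([8.0, 3.5, 12.0], [0.2, 0.6, 0.1]))
```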
Herman Künzel presented Automatic Speaker Identification with Multilingual Speech Material, in which he tested Batvox 3.1 for three channel conditions (studio, landline, GSM) and language mismatch conditions (GER-RUS, GER-POL, GER-ENG, GER-SPAN, GER-SPAN CATL). He confirmed that system performance generally decreases with reduction of channel quality (studio > landline > GSM). His language mismatch settings, however, seemed to have little or no effect on the system's EERs, leading him to the conclusion that language mismatch, at least for non-tone languages, can be ignored when using Batvox or similar systems for automatic speaker identification. In his presentation Empirically Assessing the Validity and Reliability of Forensic-Comparison Systems, Geoffrey Morrison explained and supported the use of the log-likelihood-ratio cost (Cllr) as an appropriate measure of accuracy for automatic speaker recognition systems used in forensic voice comparison. The well-received poster sessions featured, among others, three contributions concerning speech databases: A Swedish Dialect Database by Jonas Lindh, an Alcohol Language Corpus by Florian Schiel et al. and a Database of Chinese Female Voice Recordings by Cuiling Zhang and Geoffrey Morrison. The conference ended on Wednesday afternoon with the announcement that the next IAFPA annual conference in 2011 will be hosted by the Austrian Academy of Sciences in Vienna, Austria.

Peter Knopp, Trier

New Sounds 2010
Sixth International Symposium on the Acquisition of Second Language Speech
Poznań, Poland, 1-3 May 2010

The sixth New Sounds meeting took place at Adam Mickiewicz University in Poznań. As the name (and subtitle) suggest, "New Sounds" aims to describe and investigate the acquisition of second language speech, i.e. the phonetic/phonological aspects of second language acquisition. The idea of a "New Sounds" conference was originally developed by Allan James and Jonathan Leather, who organized the first meeting in Amsterdam in 1990, as well as the following three meetings in 1992 (Amsterdam), 1997 (Klagenfurt) and 2000 (Amsterdam once again). New Sounds returned in 2007, taking place in Florianópolis, Brazil (organized by Barbara Baptista, Michael Watkins and Andréia Rauber). For 2010, the responsibility for setting up the conference was taken over by Katarzyna Dziubalska-Kołaczyk, Magdalena Wrembel and Małgorzata Kul. With 180 participants, the Poznań conference (see also http://ifa.amu.edu.pl/newsounds/introduction) can safely be said to be the largest and most successful one yet.

The New Sounds conferences have always stood out thanks to being very well organized and providing an especially friendly and relaxed atmosphere, which allows for fruitful and extensive discussions both during and outside of the actual presentation sessions. The Poznań conference did not break with this tradition. On the contrary, the excellent lunches, the very pleasant conference reception and a cultural program, including a guided tour of the old city and an exhilarating choir performance, can be described as exceptional. Each of the three conference days was introduced by a keynote speech that provided an overview of a core area of phonetic/phonological SLA studies while presenting new insights into its theoretical underpinnings. Conference co-founder Allan James opened the meeting with a talk entitled "Sounds new?
Extending the explanatory remit of second language phonology: identifications, multivalent sound categories and a use take on acquisition", in which he argued that various recent sociolinguistically influenced conceptions of language, which involve 'unordered scenarios' of selective learning, partial competence and performance without competence, should also be reflected in the acquisition process and thus in the phonetic and phonological paradigms used to describe it. For the second keynote speech, the organizers were fortunate to succeed in coaxing a relaxed, serene and helpful Jim Flege out of retirement in Italy (many of the younger researchers at the conference benefitted from his advice and encouragement). Now an "immigrant" and late L2 learner himself, Flege spoke about his latest insights into an area to which he has already contributed a great deal, namely "Age effects on second language acquisition". He concentrated especially on the factor of age of arrival (AOA), on what underlying variables (neural maturation, cognitive changes across the life span, change in the way L1 and L2 systems interact, and difference in L2 input) may be correlated with it, and on how this co-variation among multiple variables might be controlled. Finally, Martha Young-Scholten started the last day of the conference by introducing her most recent ideas and undertakings in the study of "Development in L2 phonology". She convincingly argued that, in order to effectively compare the different stages of phonological development in native and non-native learners, there is a need for longitudinal studies that involve naturalistic L2 learners, i.e., learners under conditions comparable to those applying to younger L1 learners (who do, of course, receive regular and plentiful input from the native speakers of their target language, but have no or very limited exposure to written text). She presented data from three learners of L2 German, analyzing their progress in terms of the successive re-ranking of OT constraints.

The fact that the Poznań meeting has been the biggest New Sounds conference to date can certainly be interpreted to mean that the study of the acquisition of second language speech phonetics/phonology is a growing area. This is also reflected by the increasing variety within the field. In order to provide an impression of the multitude of different subjects addressed during the conference, a classification of major blocks of topics seems useful, even though it is of course subjective (and deviates slightly from the categories the organizers had proposed before the conference). Similarly, the following overview of papers given at the conference is just as subjective and guided by what the author of this report witnessed himself and/or perceived as interesting.

Segmental production of second language speech

"Production of English interdental fricatives by Dutch, German, and English speakers" by Adriana Hanulikova and Andrea Weber examined the substitution of /θ/ by other sounds. German learners tend towards /s/, while the majority of Dutch learners prefer /t/. Besides the distribution of the substitutions, the study also aimed to compare these productions with actually intended /t, s/ productions and acoustically analyzed those instances of /θ/ where the speakers succeeded.
In his study on “Voiced obstruents in L2 French: the case of Swiss German learners” Stephan Schmid showed that speakers of Swiss German, depending on phonotactic context, frequently did not reproduce voicing in obstruents when speaking French, realizing contrasts instead by means of longer/shorter durations. Thorsten Piske (co-authors James Flege, Ian MacKay and Diane Meador) gave a presentation “Investigating native and non-native vowels produced in conversational speech” arguing that true mastery of L2 vowels should be determined with respect to this more realistic and more challenging criterion. An instrumental approach measuring “Language-specific articulatory settings in L2 speech” and comparing them to native speaker settings was demonstrated in a paper by Sonja Schäffler, Ineke Mennen and James Scobbie. Rob Drummond combined L2 research with sociolinguistic aspects in his study of native Polish speakers in Manchester adopting local features, i.e., northern high, rounded pronunciation of the STRUT vowel vs. more widespread features like t-glottaling (“Speaking like the locals - the acquisition of local accent features by native Polish speakers living in Manchester”) L2 speech perception Silke Hamann, Paul Boersma and Małgorzata Ćavar examined whether closely related languages show a similar use of perceptual cues to identify phonological categories, thus facilitating L2 learning (“Language-specific differences in the weighting of perceptual cues for labiodentals”). They investigated such perceptual cues as duration, amplitude of friction noise and percentage of voicing, for the Dutch labiodentals /f, v, υ/ and how they would be perceived by native speakers of German, English, Croatian and Polish. Preliminary results indicated that the number of labiodental categories in these second languages was more influential than being a member of the same language family. 101 Joan C. Mora, James L. Keidel and James Flege argued that the perception of the contrasts between the mid vowels /e/ - /ε/ and /o/ - /ɔ/ was difficult even for Spanish-Catalan bilinguals because of a smaller degree of categoriality. A higher percentage of language use/experience was the most important factor for success (“Why are Catalan contrasts between /e/ - /ε/ and /o/ - /ɔ/ so difficult for even early Spanish-Catalan bilinguals to perceive?”). In their study of “The impact of visual cues and lexical knowledge on the perception of a non-native consonant contrast for Colombian adults” Michele Thompson and Valerie Hazan showed not only that both of the mentioned parameters were indeed used to support the identification of contrasts (e.g., /b/ vs. /v/), but also that there seemed to be a culture-specific bias with respect to the use of visual cues, as the Colombian speakers relied much more on them than Korean or mainland Spanish speakers did in earlier studies. Several studies, of course, combined production and perceptual data from L2 speakers, e.g. “Speech production and perception findings for native German speakers learning English as a second language” by Bruce L. Smith and Rachel Hayes-Harb or “Individual variation in the production and perception of SL phonemes: French speakers learning /i - ɪ/” by Georgina Oliver and Paul Iverson, who showed in their experiment that L2 vowel production was not highly linked to L2 vowel perception. They interpreted this result as indicating that learning an L2 category did not rely on just a single underlying ability or representation. 
There were also a number of studies that examined perceptual abilities employing neurolinguistic methods. Nuria Kaufmann, Martin Meyer and Stephan Schmid, for example, performed an EEG experiment using mismatch negativity paradigms to investigate contrasts between Serbian affricates as perceived by native speakers of Swiss German and of Rhaeto-Romance ("Phonetic contrasts in foreign language perception: A neuropsychological study on Serbian affricates"). Cheryl Frenck-Mestre and colleagues also used event-related potentials to investigate the perception of contrasts between the American English vowels /ɛ/, /æ/ and /ɪ/ by native speakers of American English, of French, and by late French-English bilinguals ("ERP evidence of the acquisition of non-native contrasts in late learners").

Prosody

The number of studies dealing with prosodic features has increased in recent years, and the field was also well represented at New Sounds 2010. Ineke Mennen, Aoju Chen and Fredrik Karlsson's paper "Characterising the internal structure of learner intonation and its development over time" examined the internal organization and longitudinal development of L2 learner intonation. Their approach thus did not look at individual aspects of intonation, but aimed to describe each learner intonation variety in its entirety. Results suggested that, apart from language-specific transfer phenomena, learners started out with a set of basic elements to build a simple but efficient intonation system. "Categorizing Mandarin tones into prosodic categories: the role of phonetic properties" by Connie K. So and Catherine T. Best described how L2 learners perceived foreign tones according to the pitch patterns of the intonational categories in their native prosodic systems. Speakers of non-tone languages (e.g. English or French) therefore assimilated Mandarin tones into the corresponding categories (e.g., Mandarin tone 3 (fall-rise) may be interpreted as expressing uncertainty). The realization of different types of focus (narrow, broad, contrastive) as a source of foreign accent was discussed in Mary O'Brien and Ulrike Gut's paper "Phonological and phonetic realisation of different types of focus in L2 speech." Johannes Schliesser's poster on "Prosodic encoding of focus and sentence mode in L2" also addressed the realization of focus in L2 speech and especially considered Gussenhoven's biological codes as an explanation for patterns that transfer from the L1 cannot easily account for.

Foreign accent detection/identification

Steven Weinberger and Stephen Kunath introduced "A computational model for accent identification", the Speech Transcription Analysis Tool (STAT), which used segment and syllable structure generalizations, such as vowel shortening, final obstruent devoicing, palatalization, interdental fricative substitution, vowel epenthesis or consonant deletion, to derive a specific set of phonological speech patterns that are characteristic of a particular foreign accent. Sylwia Scheuer's presentation "How sure are judges about their foreign accent judgments?", on the other hand, dealt with human quality judgments of foreign accent. Scheuer confirmed that judges are consistent in their ratings and on that basis attempted to identify those phonetic features (in this case of L2 English) that promise to provide the greatest reliability.

Teaching

The studies just described do of course have a close connection to the applied aspects of the study of second language speech, i.e. pronunciation teaching.
New Sounds also offered various papers dealing with particular phonetic phenomena that trigger the impression of foreign accent. Walcir Cardoso, co-host of New Sounds 2013 in Montréal, looked at the production of foreign /s/ + consonant clusters by learners, e.g. speakers of Brazilian Portuguese, who were not familiar with them ("Teaching foreign sC onset clusters: Comparing the effect of three types of instruction"). He tested the success of three different forms of instruction (and their underlying philosophies), finding that the Projection Model of Markedness showed the largest instructional effect. Wiktor Gonet, Jolanta Szpyra-Kozłowska and Radosław Święciński investigated why the velar nasal /ŋ/ is especially difficult for Polish learners of English to acquire when it is not followed by a velar plosive ("Acquiring angma – the velar nasal in advanced learners' English"), while Esther Gómez Lacabex and María Luisa García Lecumberri demonstrated success in instructing native speakers of Spanish to produce correct instances of vowel reduction in English ("Investigating training effects in the production of English weak forms by Spanish learners").

Factors influencing second language performance

The study of the various individual parameters that play a role in a learner's overall competence has always been one of the major subjects in second language speech research. New Sounds again included many interesting papers devoted to particular aspects of the individual and demonstrated their relevance. Various areas were covered, ranging from cognitive psychology, e.g. "Phonological short-term memory and L2 speech learning in adulthood" by Cristina Aliaga-Garcia, Joan C. Mora and Eva Cerviño-Povedano, to "classic" factors like age, albeit from the unusual perspective of very young learners, as in Henning Wode's talk on "L2 phonological acquisition by young learners: Evidence from production", to other, somewhat external, linguistic aspects, as in Yasaman Rafat's paper on "Orthography as a conditioning factor in L2 transfer: evidence from English speakers' production of Spanish consonants." Several presentations also attempted to investigate the possible interactions between different phonetic abilities and the many known relevant psychological and neurological factors, as well as those describing the external circumstances of acquisition, in order to isolate the significance of a particular parameter. This is the case in the study "Investigating the concept of talent in phonetic performance" by Matthias Jilka, Natalie Lewandowska and Giuseppina Rota and a connected investigation of the phenomenon of phonetic convergence as an indicator of talent ("Is dynamic phonetic adaptation in dialog related to talent?" by Lewandowski, Jilka and Grzegorz Dogil). Yoon Hyun Kim and Valerie Hazan's study on "Individual variability in perceptual learning of L2 speech sounds and its cognitive correlates" also followed a similar methodology (use of a test battery covering various cognitive abilities) in order to investigate individual variability in discriminating non-native phonetic contrasts.

Models and theories of the acquisition of second language speech

Another important aspect of second language acquisition research was provided by studies that explicitly attempt to contribute to the (further) development and explanatory/predictive power of models of sound acquisition and/or representation. Ocke-Schwen Bohn and Catherine T.
Best attempted to account for native German listeners' abilities to perceive the contrasts between the American English approximants /r/, /l/, /w/ and /j/ in terms of Flege's Speech Learning Model and Best's own Perceptual Assimilation Model. John Archibald argued for the existence of an L1 phonological filter that can be overcome by especially robust cues, explaining why certain articulations, although equally unfamiliar to learners, are acquired more easily than others ("Conditions for overriding the L1 phonological filter"). Finally, conference host Katarzyna Dziubalska-Kołaczyk and co-author Daria Zielińska presented an approach predicting preferred and dispreferred consonant clusters based on the recognition of phonotactic and morphonotactic (sound clusters across morphological boundaries) structures. Phonotactic preferences were based on the notion of markedness, which in turn was defined by the perceptual distance between segments (as measured according to Dziubalska-Kołaczyk's own Net Auditory Distance Principle). Morphonotactic clusters behaved differently, as they contained morphological information and markedness was used to signal their function.

As indicated earlier, this can only be a subjective, somewhat impressionistic summary of the many interesting presentations given at New Sounds 2010. Full Proceedings can be found at http://ifa.amu.edu.pl/newsounds/Proceedings_guidelines. The conference organizers intend to publish two books with more elaborate versions of many of the presented papers early next year. The next New Sounds conference will take place in 2013 at Concordia University in Montréal, Canada!

Matthias Jilka, Stuttgart

BOOK REVIEWS

Steve Parker (ed.) (2009) Phonological Argumentation. Essays on Evidence and Motivation. London/Oakville: Equinox (377 pp. ISBN 978-1-84553-221-5)
Reviewed by: Péter Siptár
Eötvös Loránd University, Budapest, Hungary
e-mail: [email protected]

The Equinox series Advances in Optimality Theory (series editors: Ellen Woolford and Armin Mester) was launched in 2007 with John J. McCarthy's monograph Hidden Generalizations: Phonological Opacity in Optimality Theory. The present volume is the fifth in the series and is a Festschrift for McCarthy, written by his former students, all of them alumni of the graduate school of the University of Massachusetts at Amherst (except Joe Pater, who is McCarthy's colleague, a professor in the Department of Linguistics there). The book has a Foreword by Elisabeth Selkirk, and the editor's Introduction includes excerpts from some of the authors' personal comments on John McCarthy. The eleven chapters of the collection all discuss the process of phonological argumentation, the way the validity (or otherwise) of particular phonological analyses can (or must) be demonstrated within the framework of Optimality Theory (and in general). The chapters are divided into two main sections: the first six chapters discuss the evidence for, and the methodology used in, discovering the bases of phonological theory (i.e., how constraints are formed and what sort of evidence is relevant in positing them); the last five chapters present case studies that focus on particular theoretical issues within OT through various phenomena in one or several languages, arguing in favour of or against specific formal analyses. Andries W. Coetzee's "Grammar is both categorical and gradient" (pp. 9–42) motivates the claim in its title by presenting the results of psycholinguistic experiments involving speakers of English and Hebrew.
In particular, the author shows that the subjects' mental grammars are capable of making both categorical and gradient judgements about the well-formedness of hypothetical word-like forms. He also proposes a new type of comparative OT tableau to model both types of decision-making behaviour, pointing out that traditional grammars are unable to handle them. Standard derivational models of generative grammar can easily account for the categorical distinction between grammatical and ungrammatical forms but have some difficulty with gradient well-formedness distinctions. On the other hand, models in which the bifurcation of grammatical and ungrammatical forms does not exist, that is, where an ungrammatical form is taken to be simply a form with extremely low probability of occurrence, are also challenged by the experimental results. The author argues that the inherent comparative character of OT grammars enables that theory to model both kinds of behaviours in a straightforward manner. Paul de Lacy's contribution on "Phonological evidence" (pp. 43–77) examines the innatist theory of generative grammar's phonological component and related modules, asking what such a framework identifies as empirical evidence that supports it. The chapter also refers to predicted ambiguities where two or more modules influence the same phenomenon. Specifically, the author discusses phenomena like alternations, phonotactics, phonetic neutralization, free variation, diachronic change, loanword adaptation, language games, language acquisition data, and typological frequency, and concludes that the theory – or at least its phonological component – does not claim responsibility for many of these phenomena. Based on his earlier work on markedness, he proposes methods to help separate valid from spurious evidence. Elliott Moreton's "Underphonologization and modularity bias" (pp. 79–101) proposes a stochastic learning algorithm to capture the relative frequency of phonologization effects, showing that the model derives the correct results in a simulation of typological patterns involving tones interacting with other tones. The author concludes that the hypothesis pairing "hard typology" (what grammars are cognitively possible) with Universal Grammar and "soft typology" (how frequent they are) with other factors affecting language change is probably too strong. "Cognition and phonetics interact to determine typology in ways more complicated (and interesting) than has been generally acknowledged. Further progress will require a better quantitative understanding of the typology of phonetic precursors, and of the differential receptiveness of learners to different patterns" (p. 100). Máire Ní Chiosáin and Jaye Padgett's "Contrast, comparison sets, and the perceptual space" (pp. 103–121) uses a systemic approach couched in Flemming's Dispersion Theory to argue for a principled restriction of the perceptual space of comparison sets which resolves the problem of infinite candidate generation. The discussion focuses on secondary palatalization contrasts in onset versus coda position, using perceptual data from Irish. Joe Pater's "Morpheme-specific phonology: Constraint indexation and inconsistency resolution" (pp. 123–154) argues that exceptions and other instances of morpheme-specific phonology are best analysed in OT in terms of lexically indexed markedness and faithfulness constraints (as opposed to lexically specified rankings, i.e., cophonologies).
This approach can capture locality restrictions, distinctions between exceptional and truly impossible patterns, distinctions between blocking and triggering, and distinctions between variation and exceptionality. The chapter discusses data from Assamese, Finnish, and Yine (formerly known as Piro) and provides a learnability account of the genesis of lexically indexed constraints. Jennifer L. Smith's "Source similarity in loanword adaptation: Correspondence Theory and the posited source-language representation" (pp. 155–177) assumes a correspondence relation between loanwords and their "pLs representations", i.e., the borrower's posited representation of the source-language form, allowing for a consistent account of the interaction between phonological adaptation processes and factors such as perception and orthography. The author provides empirical support from Japanese, Finnish, Hmong, and Sranan, predicting multiple phonological adaptation strategies for loanwords.

Part Two of the volume includes five case studies. John Alderete's "Exploring recursivity, stringency, and gradience in the Pama-Nyungan stress continuum" (pp. 181–202) reviews contemporary approaches to the morphological influences on stress in Diyari, Dyirbal, Warlpiri, and other Pama-Nyungan languages. The author develops nine different theories to account for the variation found, which differ in the constraints responsible for edge effects in stress and the alignment of morphological and prosodic structure. Analysing the factorial typology of each theory, the author comes up with three conclusions. First, stringency (special-general) relations between morpho-prosodic alignment constraints are necessary because theories that ignore them either fail to describe all relevant data or predict the existence of implausible (and unattested) stress patterns. Second, some gradiently evaluated constraints have to stay even though some others can (and must) be dispensed with. And third, McCarthy and Prince's recursive prosodic word analysis can be given both theoretical and empirical support. Maria Gouskova and Nancy Hall's "Acoustics of epenthetic vowels in Lebanese Arabic" (pp. 203–225) examines Lebanese epenthetic vowels through acoustic experiments and shows that such vowels have phonetic traces that can help learners distinguish them from underlying vowels. Although epenthetic and lexical vowels are often transcribed as identical, they turn out to be acoustically distinct: epenthetic vowels are either shorter or backer or both. The authors propose a learning strategy based on McCarthy's theory of Candidate Chains that provides a way to model this incomplete neutralization and its opaque interaction with stress assignment. In particular, they suggest that phonetic implementation optionally accesses an intermediate level of phonological derivation, that is, a stage that is closer to the underlying representation than the (fully neutralized) surface phonological form of the given item. Junko Ito and Armin Mester's "The onset of the prosodic word" (pp. 227–260) is my personal favourite in the whole volume. In one of the pioneering works of OT, McCarthy offered a comprehensive analysis of r-insertion in non-rhotic English dialects, suggesting that the constraint driving the process was not an onset-related one but rather a constraint requiring prosodic words to end in a consonant.
This paper shows that this counter-intuitive 'anti-wellformedness' constraint can be done away with on the basis of an enriched view of prosodic constituent structure involving functional morphemes and the onset properties of the maximal prosodic word. "Empirically, our analysis not only accounts for the complex distribution of the linking r-consonant in RP and the Eastern Massachusetts dialect, but also extends straightforwardly to the different distributions in other dialects. While preserving the central insights of [McCarthy's paper], which remains not just a classic but also a model of optimality-theoretic analysis, the present proposal is theoretically grounded in correspondence theory (positional faithfulness), and is a natural outgrowth of a conception of prosodic structure that views function words as occupying positions within extended word structures (maximal prosodic words)" (pp. 256–7). Ania Łubowicz's "Infixation as morpheme absorption" (pp. 261–284) presents evidence that infixes in Palauan and Akkadian are subject to feature co-occurrence restrictions (OCP) on the root domain, whereas segmentally identical prefixes are not. In order to account for this asymmetry, the author proposes that infixes are structurally incorporated into the root morpheme in the output through a process called morpheme absorption. Finally, Sam Rosenthall's "Vowel length in Arabic verb stems" (pp. 285–307) relies on a foundational insight of OT, the interaction between ranked and violable constraints, in analysing the intricate morphophonemics of Arabic verb roots containing a glide as one of their radicals. Vowel coalescence and compensatory lengthening are both seen to arise from the same subhierarchy of constraints, but only if verb roots are crucially triliteral underlyingly. The chapter also argues for a prosodic analysis of verb stems, in accordance with McCarthy and Prince's Prosodic Morphology Hypothesis. The back matter includes a cumulative list of References (pp. 308–347), as well as an author index, an index of constraints, an index of languages, and a subject index.

All in all, this is an important book and, although by no means easy bedside reading, it is thoroughly enjoyable even for readers whose acquaintance with the current OT scene is somewhat superficial. It is a pity that the volume is riddled with a substantial number of typos of various sorts, from simple misalignments (as on p. 275 (26) or p. 294 (14)) through cases like "it is difficult how to see how" (p. 138), "a language that that neutralizes contrasts" (p. 149), "it less likely to affect" (p. 210), "the fact that that the optimal stem has a long vowel" (p. 298), "as well in as clusters" (p. 154, fn. 3), "such as constraint" (for such a constraint, p. 258, fn. 8), to truly embarrassing instances like "case ending suffix" for infinitive suffix (p. 283 fn. 20), "obstruent-sonorant clusters" for sonorant-obstruent clusters (p. 205 (4)), and even transcription errors (in nonsense items) like "stʌt" for stɔɪt (p. 37). Perhaps the most serious error is this: "a special PRECEDENCE constraint requires that epenthesis precede insertion of stress", where the correct requirement is that stress assignment precedes epenthesis (p. 219). The typographic details of referencing conventions are not uniform throughout (Alderete's chapter is the odd man out in this respect). And even the editor's own name is misspelt at one point as "Stever Parker" (p. 75, fn. 1).
Such minor (or not-so-minor) imperfections notwithstanding, the book will be of interest to anyone who seriously follows what is going on in the field of phonology in general and Optimality Theory in particular.

Géza Németh & Gábor Olaszy (eds.) (2010) A magyar beszéd. Beszédkutatás, beszédtechnológia, beszédinformációs rendszerek [Hungarian Speech. Speech research, speech technology, speech information systems] Budapest: Akadémiai Kiadó (708 pp. ISBN 978-963-05-8966-6)
Reviewed by: Péter Siptár
Eötvös Loránd University, Budapest, Hungary
e-mail: [email protected]

Speech technology is one of the new industries of the late twentieth and early twenty-first centuries, and this volume is its first systematic book-size overview in Hungarian and on Hungarian. As the various devices and services of speech technology, with their functions growing fast both in number and in diversity, become part of our everyday lives and especially part and parcel of our children's lives, it is increasingly important that they are made interesting, attractive, easy to learn and simple to use. The forms and functions of human speech communication have taken several millennia to emerge; their application for information exchange between man and machine is therefore a great opportunity and a great challenge. Scientists have only taken the very first steps in that direction so far; their machines have but a tiny fraction of the communicative endowments of human speakers at their disposal, especially with respect to the realm of meaning or semantic interpretation. With respect to the timing of a potential financial breakthrough for speech technology solutions, serious experts had predicted back in the 1980s that an exponential increase was to be expected in the English-language speech recognition market in a matter of two years or so. This did not happen; at best, linear development took place, a fact that discouraged decision makers who wielded influence over financial resources. Ever since, due to a tension between marketing promises and actual performance, cycles of increased attention followed by less awareness can be observed every five or six years. However, if we compare the early eighties with the present day, the overall rate of development is enormous. Fortunately, speech research has a significant tradition in Hungary. Hence, it is not necessary for Hungarians to wait for technologies from big multinational companies to fill the relatively small market of this country. Instead, the Hungarians have found competitive solutions based on their own intellectual and material resources.

This book is a compendium of what current results of scientific and technological research have to tell us about Hungarian speech in the twenty-first century. The aim of the authors, as the editors point out in the preface, is to present an overview of the acoustic structure of present-day Hungarian speech, and to review the recent results, problem areas, and applications of speech technology as a relatively new interdisciplinary area of research, especially insofar as it pertains to Hungary. The book has essential chapters (for instance, those on speech acoustics or signal processing), as well as chapters on various applications and technologies that characterize the state of the art. The book has an associated homepage (http://magyarbeszed.tmit.bme.hu) that contains a host of relevant data that had to be left out of the book due to lack of space.
The authors are leading speech technology experts of this country: Géza Németh and Gábor Olaszy (the two editors), as well as Kálmán Abari, Mátyás Bartalis, Tamás Bőhm, Tamás Gábor Csapó, László Czap, Tibor Fegyó, Géza Kiss, Péter Mihajlik, György Szaszák, György Takács, Péter Tatai, Bálint Tóth, Klára Vicsi, Ákos Viktóriusz, and Csaba Zainkó. The volume also has a "supervising editor", Géza Gordos. The book has four large sections, preceded by a preface, a list of authors, and a key to abbreviations, and followed by a large list of references, an appendix and an index. The first section (People, language, and speech, pp. 1–92) consists of four introductory chapters (Speech and the information society, pp. 3–7, The complex structure of speech, pp. 9–18, Physiological and physical basics, pp. 19–71, The connection between speech and writing, pp. 73–92). The second section, still on a preliminary note, but focusing on Hungarian, discusses The structural analysis of speech (pp. 93–205). This is the part of the book that is closest to linguistic phonetics and is divided into two chapters (The segmental structure of speech, pp. 95–170, and The suprasegmental structure of speech, pp. 171–205). The third and largest section (Speech technology, pp. 207–522) discusses The science of speech technology (pp. 209–259), Data bases serving speech technology (pp. 261–331), Speech perception and recognition by machine (pp. 333–409), and Speech production by machine (pp. 411–522). Finally, the fourth section (Applications of speech technology, pp. 523–655) tells us about Speech information systems (pp. 525–539), provides Examples of the areas of application of speech technology (pp. 541–629), lists Interfaces, standards, homepages, and programs (pp. 631–651), and concludes with a very brief chapter by Nick Campbell and Géza Németh on The future of speech technology (pp. 653–655), the last sentence of which is "Speech technology is roughly at a stage of development that the vehicle industry had reached by 1900." It remains for the reader to decide whether this is an optimistic or a pessimistic note to end a book like this on.

The book is primarily intended as a textbook for students of informatics. However, it will also be useful for experts and decision makers in telecommunication, speech technology research and development, designers of content-providing services, the health industry and rehabilitation. But the authors had an even larger audience in mind when writing it. In their view, the book may turn out to be useful in a range of less technologically minded university courses in the humanities and elsewhere (phonetics, speech analysis, linguistics, speech psychology, health promotion and disease prevention, mass communication, and so on). The authors furthermore recommend this book for secondary schools, and indeed for anybody who might be interested (like physicists, linguists, people who work for radio or television or in the movie industry, or media experts in popular science). The comprehensive contents and the relatively popular attitude of the book make it readable and even enjoyable for everybody from philosophers to engineers (and beyond).

Halicki, Shannon D. (2010) Learner Knowledge of Target Phonotactics: Judgements of French Word Transformations. Lincom GmbH (LINCOM Studies in Language Acquisition Series (LSLA), 27), ix + 234 pages, ISBN 9783895867408, price: €65,10 / USD 79.70 / EUR 64.80 / GBP 55.10.
Reviewed by: Chantal Paboudjian, University of Provence, Aix-en-Provence, France

This 27th volume of the LINCOM Studies in Language Acquisition series is the published version of the Ph.D. dissertation of Dr. Shannon Halicki, who is now assistant professor of French and Spanish at the Department of Humanities of West Liberty University in West Virginia. The dissertation was defended in 2009 at Indiana University in Bloomington. Throughout its seven chapters, the book addresses the relationship between language learners’ interlanguage phonology and Universal Grammar (UG). More precisely, it seeks to determine the extent to which interlanguage phonology is constrained by UG principles. Following Chomsky’s (1965) Aspects of the Theory of Syntax, it is assumed that an innate language learning mechanism remains intact in adult second language acquisition. Learners would thus be equipped with preexisting knowledge that makes acquisition possible. The author takes a view opposed to that of most studies on the subject, which hold that second language phonology is not native-like. She investigates whether English-speaking adult second-language (L2) learners of French acquire L2 phonotactic constraints at abstract levels in the same way as native learners, and hypothesizes that learners can reconfigure L1 parameters to accommodate new L2 material. Two major research questions are addressed: “Do non-native speakers of a language exhibit consistent judgments of wordlikeness in their target language?” and, if they do, “Are the judgements native-like, driven by L1 transfer and inhabit the niche occupied by native language phonologies?” To answer these questions, the author tested L2 knowledge of three structural features that differ between the native language (L1) and the L2, i.e., consonant cluster limits in French, sonorancy assimilation at morpheme boundaries, and similarity avoidance at morpheme boundaries.

Chapter 1. Introduction and Background (pp. 1-33) reviews arguments that grammars (including phonological grammars) are generative systems whose acquisition is driven by an innate learning mechanism. It presents research on language acquisition through innate ability, particularly on syntactic well-formedness and interpretation, for both native and L2 learners. It also briefly addresses issues such as evidence of native speaker intuition about phonotactics, L2 learners’ acquisition of non-learnable knowledge about the target language (knowledge not transferred from the L1), the relationship between UG constraints and phonology, and L2 phonological systems (with a focus on learners’ pronunciation). The chapter concludes with a presentation of the research questions the author studies in the volume.

Chapter 2. Studies in native and learner phonotactic performance (pp. 32-66). Since L2 learners seem to demonstrate native-like judgments of syntactic well-formedness and interpretation, the author asks whether they demonstrate similar abilities in L2 phonotactics. She thus surveys the literature relevant to the study of L2 phonotactic knowledge, with special focus on syllable well-formedness contrasts (the relationship between markedness and language universals) and the concept of ‘wordlikeness’ in cognitive linguistics. She also reviews two relevant L2 studies carried out in an Optimality Theory framework and presents studies showing the importance of an abstract phonological level in accounting for the data.

Chapter 3. The learnability of French syllable constraints by L1 English speakers (pp. 67-110).
The author examines here the ‘learnability’ (the author’s expression) of constraints on French consonant clusters exhibiting L1-L2 contrasts. She describes facts of syllable structure in French and English in order to specify the nature of the learning task, the type of input available to learners, and the representations that may be transferred from the L1 system. The parametric difference in syllable structure between English and French is presented using McCarthy and Prince’s Prosodic Morphology analysis. Syllable structure constraints and a detailed description of the French maximal syllable in codas are provided and illustrated, and the validity of some minor linguistic phenomena such as word transformations is discussed. A further section analyses the rules of popular French re-suffixation (comparable to slang manipulations of words), which are difficult for L2 learners to acquire.

Chapter 4. Experimental Design and Methodology (pp. 111-130) describes the design of the word-building experiment and the statistical procedures used in the data analysis. Three structural features, i.e., consonant cluster limits, sonorancy assimilation, and continuancy dissimilation, were tested. The tests, designed to probe intuitions regarding the well-formedness of re-suffixed items in French, asked intermediate and advanced English-speaking learners of French as well as native French speakers to give their levels of acceptance of a series of items with sequences varying at the phonotactic level. The stimuli, the questionnaire, and the experimental hypotheses for the tests are described. Moreover, the questionnaires for participants and the lists of test items are provided in the volume’s appendix.

Chapter 5. Quantitative Results (pp. 131-161) provides quantitative results for the three tests in the experiment, illustrated by 24 tables and figures. Both French native speakers and English learners of French appear to exhibit similar judgments of asymmetries in the well-formedness of the proposed nonce re-suffixed items, with the level of confidence increasing with language proficiency. However, a difference between learners and native speakers is noted in the rates of acceptance of some consonant clusters in nonce words. Sequences that are legal in French were accepted within the context of roots but not in derivations.

Chapter 6. Discussion (pp. 162-189). In this chapter, the author interprets and discusses the central findings of Chapter 5, namely that advanced and intermediate learners as well as native French speakers rejected some items but accepted others as well-formed. The author concludes that a formal phonological grammar (knowledge of phonotactic constraints and knowledge of alternations) is the primary locus of judgment for both groups. She argues that L2 learners construct the phonological shape of the suffix at the prosodic level and obey constraints on operations having to do with the preservation of roots and the specification of the phonological features of allomorphs. She discusses the potential influences of lexical frequency and universal markedness, which would predict outcomes in the word judgment task.

Chapter 7. Conclusion (pp. 190-203). This chapter contains a general discussion with conclusions drawn from the experimental findings.
The author points out two novel aspects of the research presented: (1) her adoption of a new approach to issues such as learner simplification strategies, namely the Optimality Theory framework, which assumes the universality of constraints on language output as well as parametric differences between languages; and (2) her adoption of a new psycholinguistic approach in phonological testing, that is, the introduction of the notion of relative acceptability/rejection, which takes listeners’ gradient judgments into account. Finally, two sections stress the role of UG in L2 phonology and of lexical frequencies in phonotactic knowledge. The concluding remarks show that a line of research has been opened up into important issues in language acquisition. Further research could focus on the order of acquisition of structures and on the definition of the cues needed to establish correct parameter settings.

The first impression made by this book is that it is geared towards language acquisition specialists who are fluent readers of English. Other readers may be discouraged by the complexity of a presentation (particularly in the last two chapters) more suited to a dissertation than to the communication of a scientific work to a larger public. In addition, the small font size does not make reading any easier for some readers. However, language acquisition specialists will acknowledge the tremendous work that has gone into the presentation of the reviews, and language teachers will appreciate the analysis of research on language acquisition mechanisms. French teachers will also find helpful and sometimes practical information that they can use directly in their teaching.

Reference
Chomsky, Noam (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

WORKSHOPS AND CONFERENCES

+++ 2-3 May 2012: The Listening Talker (LISTA) Workshop, Edinburgh, Scotland
+++ 2-4 May 2012: 2nd Workshop on Sound Change, Kloster Seeon, Germany
+++ 21-27 May 2012: 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey
+++ 22-25 May 2012: Speech Prosody 2012, Shanghai, China
+++ 26 May 2012: 4th International Workshop on Corpora for Research on Emotion, Sentiment & Social Signals, Istanbul, Turkey
+++ 19-21 July 2012: Interdisciplinary Workshop on Perspectives on Rhythm and Timing, Glasgow, UK
+++ 27-29 July 2012: LabPhon 13, Stuttgart, Germany
+++ 5-8 August 2012: Annual Conference of the International Association for Forensic Phonetics and Acoustics (IAFPA), Santander, Spain
+++ 3-5 September 2012: ISICS 2012: International Symposium on Imitation and Convergence in Speech, Aix-en-Provence, France
+++ 7-8 September 2012: Interdisciplinary Workshop on Feedback Behaviors in Dialog, Portland, U.S.
+++ 9-13 September 2012: Interspeech 2012, Portland, U.S.
+++ 25-29 August 2013: Interspeech 2013, Lyon, France
+++ 7-11 September 2014: Interspeech 2014, Singapore
+++ August 2015: 18th International Congress of the Phonetic Sciences (ICPhS), Glasgow, Scotland
+++ September 2015: Interspeech 2015, Dresden, Germany

CALL FOR PAPERS

The Phonetician will publish peer-reviewed papers and short articles in all areas of speech science, including articulatory and acoustic phonetics, speech production and perception, speech synthesis, speech technology, applied phonetics, psycholinguistics, sociophonetics, the history of phonetics, etc. Contributions should primarily focus on experimental work, but theoretical and methodological papers will also be considered.
Papers should be original works that have not been published and are not under consideration for publication elsewhere. Authors should follow the guidelines of the Journal of Phonetics in preparing their manuscripts. Manuscripts will be reviewed anonymously by two experts in the field. The title page should include the authors’ names and affiliations, address, e-mail address, and telephone and fax numbers. Manuscripts should include an abstract of no more than 150 words and up to four keywords. The final version of the manuscript should be sent both as a .doc file and as a .pdf file. It is the authors’ responsibility to obtain written permission to reproduce copyright material. All manuscripts should be sent in electronic form (.doc and .pdf) to the Editor.

We encourage our colleagues to send manuscripts for our newly launched section entitled Master’s research: Introduction. Master’s students are invited to sum up their research in the area of phonetics, addressing their motivation, topic, goals, and results (in no more than 1,200 words).

INSTRUCTIONS FOR BOOK REVIEWERS

Reviews in the Phonetician are dedicated to books related to phonetics and phonology. Usually the editor contacts prospective reviewers. Readers who wish to review a book mentioned in the list of “Publications Received”, or any other book, should contact the editor about it. A review should begin with the author’s surname and name, the publication date, the book title and subtitle, the place of publication, the publisher, the ISBN number, the price, the number of pages, and other relevant information, such as the number of indexes, tables, or figures. The reviewer’s name, surname, and address should follow “Reviewed by” on a new line. The review should be factual and descriptive rather than interpretive, unless reviewers can relate a theory or other information to the book in a way that could benefit our readers. Review length usually ranges between 700 and 2,500 words. All reviews should be sent in electronic form to Prof. Judith Rosenhouse (e-mail: [email protected]).

ISPhS MEMBERSHIP APPLICATION FORM

Please mail the completed form to:
Treasurer: Prof. Dr. Ruth Huntley Bahr, Ph.D.
Treasurer’s Office: Dept. of Communication Sciences and Disorders, 4202 E. Fowler Ave., PCD 1017, University of South Florida, Tampa, FL 33620, USA

I wish to become a member of the International Society of Phonetic Sciences.
Title: ____ Last Name: _________________ First Name: _________________
Company/Institution: ________________________________________________
Full mailing address: ________________________________________________
________________________________________________________________
Phone: __________________________ Fax: ____________________________
E-mail: ___________________________________________________________
Education degrees: __________________________________________________
Area(s) of interest: __________________________________________________

The Membership Fee Schedule (check one):
1. Members (Officers, Fellows, Regular): $ 30.00 per year
2. Student Members: $ 10.00 per year
3. Emeritus Members: NO CHARGE
4. Affiliate (Corporate) Members: $ 60.00 per year
5. Libraries (plus overseas airmail postage): $ 32.00 per year
6. Sustaining Members: $ 75.00 per year
7. Sponsors: $ 150.00 per year
8. Patrons: $ 300.00 per year
9. Institutional/Instructional Members: $ 750.00 per year

Go online at www.isphs.org and pay your dues via PayPal using your credit card.
I have enclosed a cheque (in US $ only), made payable to ISPhS.
Date ___________________ Full Signature _____________________________
Students should provide a copy of their student card.

News on Dues

Your dues should be paid as soon as it is convenient for you to do so. Please send them directly to the Treasurer in US$:
Prof. Ruth Huntley Bahr, Ph.D.
Dept. of Communication Sciences & Disorders
4202 E. Fowler Ave., PCD 1017
University of South Florida
Tampa, FL 33620-8200 USA
Tel.: +1.813.974.3182, Fax: +1.813.974.0822
e-mail: rbahr@usf.edu

VISA and MASTERCARD: You now have the option to pay your ISPhS membership dues by credit card using PayPal if you hold a VISA or MASTERCARD. Please visit our website, www.isphs.org, click on the Membership tab, and look under Dues for the underlined phrase “paid online via PayPal.” Click on this phrase and you will be directed to PayPal.

The Fee Schedule:
1. Members (Officers, Fellows, Regular): $ 30.00 per year
2. Student Members: $ 10.00 per year
3. Emeritus Members: NO CHARGE
4. Affiliate (Corporate) Members: $ 60.00 per year
5. Libraries (plus overseas airmail postage): $ 32.00 per year
6. Sustaining Members: $ 75.00 per year
7. Sponsors: $ 150.00 per year
8. Patrons: $ 300.00 per year
9. Institutional/Instructional Members: $ 750.00 per year

Special members (categories 6–9) will receive certificates; Patrons and Institutional members will receive plaques, and Affiliate members will be permitted to appoint/elect members to the Council of Representatives (two for each national group; one each for other organizations).

Libraries: Please encourage your library to subscribe to The Phonetician. Library subscriptions are quite modest – and they aid us in funding our mailings to phoneticians in Third World countries.

Life members: At the request of several members, the Board of Directors has approved the following rates for Life Membership in ISPhS:
Age 60 or older: $ 150.00
Age 50–60: $ 250.00
Younger than 50 years: $ 450.00