
Abstract

The precise nature of the prosodic contribution to disambiguating open and polar questions with indefinite content pro-forms in Korean (e.g., nwukwu “who/someone”) remains a matter of debate. One possible relevant prosodic feature is expanded F0 range at the site of question focus. We report the results of a pilot experiment followed by a large-scale online speech perception gating study (n=124) that manipulated the natural prosody of identical strings read as statements or open or polar questions. We found a tendency for all questions to be perceived as open questions, regardless of prosody. Open questions were reliably disambiguated from other utterance types, and there was no effect of the prosodic manipulation. Polar questions did show a significant effect of the prosodic manipulation (p<.001), but even with full natural prosody, these stimuli were never correctly identified above chance levels. We suggest that in the absence of context, there is a strong preference for hearers to interpret content pro-forms as question words, and that subsequent prosodic information may be discounted if a hearer has already committed to their interpretation. The present findings have implications for formal accounts of the Korean syntax-prosody interface.

1 Introduction

This article contributes to the debate surrounding the nature of the prosodic contribution to disambiguating otherwise identical strings in Korean that can be interpreted as open questions, polar questions, or statements. Grammatically, the different meanings are accounted for by differences in the scope of question focus, the at-issue content of the question. For open questions such as “Who did grandmother meet?,” the at-issue content is the identity of the person who grandmother met. Thus, question focus is on the word who and is often described as having a narrow scope. For polar questions such as “Did grandmother meet someone?,” the at-issue content is the whole proposition that grandmother met someone. Accordingly, question focus is on the whole utterance, often described as having broad scope. It is generally accepted that each reading is associated with characteristic prosody. However, although various proposals have been made as to which prosodic features are critical in the disambiguation, including the placement of Accentual Phrase (henceforth AP) boundaries (e.g., Jun & Oh, 1996; Yun, 2012) and pitch expansion associated with the region of focus (e.g., Kang, 1988), conclusive evidence has to date been elusive, and the exact nature of some of these features is also hard to define.
We report the findings of two experiments: a pilot comprehension study testing the application of the gating paradigm to naturally produced stimuli, followed by a large-scale speech comprehension experiment with natural and manipulated stimuli. The experiments were designed on the assumption that expanded pitch range during a focus-bearing constituent was the critical prosodic feature in disambiguation and aimed to specify more closely what level of F0 range would be interpreted by hearers as pitch expansion. Naturally produced stimuli were manipulated to reduce the range of F0 in the AP containing the focus-bearing constituent. The placement of AP boundaries was not controlled. Participants heard repeated, incrementally longer fragments of the stimuli and were asked to identify whether they had heard an open question (wh-question), a polar question (yes-no question), or a statement, or whether they did not yet know.
Our results suggest that F0 range alone does not determine hearers’ interpretation of an ambiguous utterance. Although a gradient effect of the size of the F0 range at the verb was found in the interpretation of polar question stimuli, this competed with a strong tendency for hearers to interpret indefinite pro-forms as signaling open questions, whether or not they were associated with a greater F0 range. These findings have implications for our understanding of the role of prosody within the overall content of spoken language. Furthermore, they suggest that in attempting to account for the prosodic contribution to the meaning of an utterance within formal theories of grammar, it is insufficient to add prosody only as a module alongside syntax and semantics. Instead, grammatical models must also take account of an interaction between prosody and lexical semantics.
The organization of this article is as follows. We begin with an introduction to the role of prosody in Korean and the ways in which it has previously been examined and analyzed. We then provide the broader context in which this research is situated, at the interface of syntax and prosody. Both the theoretical and experimental motivations of our study are presented, followed by a report of the study itself, detailing the methods and the results, and a discussion of the implications of our findings and of methodological issues. We also touch on issues regarding formal analysis of the Korean syntax-prosody interface using Lexical Functional Grammar (LFG) and conclude by proposing avenues for future research.

2 Background

Prosody in Korean is associated with grammatical mood, with characteristic prosodic patterns for declarative, interrogative, propositive, and imperative moods (Jun, 2005). It can also disambiguate open and polar questions in sentences with ambiguous content pro-forms (CPFs), often termed wh-words (Jun & Oh, 1996) or wh-phrases (Hwang, 2009). In speech styles such as the polite speech style, where there is no morphological marking of mood, this can lead to sentences with three possible readings that are disambiguated only prosodically, as in Example (1), taken from Jun and Oh (1996):
(1) acwumeni-nun encey ecilewe-yo
madam-top  when/sometimes feel.dizzy-pst-pol
Declarative: “Madam sometimes feels dizzy.”
Open: “Madam, when do you feel dizzy?”
Polar: “Madam, is there any time that you feel dizzy?”

2.1 Korean prosody

Jun (2005) provides an account of standard Korean prosody without lexical tone, stress, or pitch accent, in which prosodic phrases are marked by tone patterns at their boundaries. Above the level of the prosodic word, the core building block is the AP, which Jun defines as “a tonally demarcated unit which can contain more than one lexical item” (p. 205). The underlying tonal pattern for an AP is THLH, with T-H at the left edge of the AP and L-H at the right edge of the AP. The first tone, underspecified for the value of “Tone” and given here as T, is usually L but can appear as H when it is associated with a syllable that has either an aspirated (/kʰ/, /tʰ/, /pʰ/) or tense (/k*/, /p*/, /t*/) initial obstruent (Jun, 1996; Jun & Oh, 1996, p. 39). It is also possible for the final tone to be realized as L (Jun, 2000). The four tones that specify an AP are associated autosegmentally with the syllables contained in the AP, illustrated schematically in Example (2), taken from Jun and Oh (1996, p. 40). Where an AP has four syllables, each tone is associated with its own syllable, as shown in Example (2a). If there are more than four syllables, the pitch declines from the T-H on the left edge to the L-H on the right edge across the AP, as shown in Example (2b). Where an AP has three or fewer syllables, either or both of the second and third tones of the underlying form are not realized. This makes the specifying pattern T-(H)-(L)-H, so for a three-syllable AP, tonal options are T-H-H or T-L-H, as shown in Example (2c), and for a two-syllable AP, the specifying pattern is T-H, as shown in Example (2d):
(2) a. T H L H
       σ σ σ σ
    b. T H   L H
       σ σ σ σ σ
    c. T H/L H
       σ σ   σ
    d. T H
       σ σ
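The mapping from syllable count to surface tones described above can be sketched in a few lines of Python. This is only an illustrative restatement of the pattern in Example (2), not an implementation from Jun (2005) or Jun and Oh (1996). The function name `ap_tone_options` and the use of `None` for syllables without a tone of their own are assumptions made for this sketch. The initial tone is left as the underspecified T, and the possibility of a final L is not modelled.

```python
def ap_tone_options(n_syllables):
    """Return the possible tone sequences for an AP with n_syllables syllables.

    Each sequence has one slot per syllable. None marks a syllable that
    carries no tone of its own (pitch is interpolated across it).
    """
    if n_syllables < 2:
        # The description covers APs of two or more syllables only,
        # so shorter APs are deliberately not modelled here.
        raise ValueError("pattern described only for APs of two or more syllables")
    if n_syllables == 2:
        return [["T", "H"]]
    if n_syllables == 3:
        # Either the second or the third underlying tone is left unrealized.
        return [["T", "H", "H"], ["T", "L", "H"]]
    # Four or more syllables: T-H at the left edge, L-H at the right edge.
    unspecified = [None] * (n_syllables - 4)
    return [["T", "H"] + unspecified + ["L", "H"]]


for n in (2, 3, 4, 6):
    print(n, ap_tone_options(n))
```

For a six-syllable AP, the sketch returns one sequence with two medial `None` slots, corresponding to the syllables across which the pitch declines between the two edges.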
Jun (2005) defines Intonation Phrases (henceforth IPs) as one or more APs that have a final boundary tone pattern, indicated in text with the symbol % after the tones. The final boundary tone replaces the final H of the last AP in the IP, which no longer appears. This boundary tone pattern is observed on the final syllable of the IP, which is lengthened. From Jun’s inventory of nine IP-final boundary tones (Jun, 2005, p. 216), the relevant tonal patterns are HL%, characteristic of declarative statements, and LH%, which is characteristic of questions. Jun and Oh (1996, p. 44) found that, in Seoul Korean, polar questions mostly ended in H% with occasional LH% cases, and wh-questions mostly ended with an LH%, although H% and HL% were also observed. These findings suggest that, in most cases, statements end with a low-tone target and both types of question end with a high-tone target.

2.2 Studies examining question focus in Seoul Korean

An early investigation of question focus in Korean was the production study reported by Jun and Oh (1996), who accounted for the disambiguation of sentences such as (1) by the placement of AP boundaries. In their study, stimuli were presented as a two-sentence dialogue. The ambiguous target utterance contained an initial element, a CPF, and a verb. This was either preceded or followed by a sentence that indicated the intended reading, which could be an open question, a yes/no question, or an incredulity question, where the speaker is surprised by the preceding statement. For the polar question reading, an AP boundary is predicted before the final verb, whereas for an open question, this boundary is predicted to be absent. Alongside their analysis of final boundary tones, Jun and Oh (p. 46) found in their experimental data that where a CPF functions as a question word in an open question, it appears in a single AP with the following verb, whereas when the CPF functions as an indefinite pronoun in a polar question, there is an AP boundary before the verb (Figure 5, p. 48).
Jones (2016), responding to Jun and Oh (1996), carried out a speech production experiment where native speakers of Seoul Korean (n=9, seven female and two male) were asked to produce sentences with a three-way ambiguity between statements, open questions, and polar questions such that hearers would understand one particular meaning out of the three possibilities. The stimuli included pairs of sentences: a short form, Example (3), where the CPF was adjacent to the verb as in Jun & Oh, and a long form, Example (4), where a modifier phrase of six to nine syllables intervened between the CPF and the verb:
(3) namtongsayng-i mwe-lul masyess-eyo
younger.brother-sbj what/something.obj drink.pst-pol
 Declarative: “Younger brother drank something.”
Open: “What did younger brother drink?”
Polar: “Did younger brother drink something?”
(4) namtongsayng-i mwe-lul yaoi hayngsa-eyso masyess-eyo
younger.brother-sbj what/something.obj outdoor festival-loc  drink.pst-pol
Declarative: “Younger brother drank something at the open-air festival.”
Open: “What did younger brother drink at the open-air festival?”
 Polar: “Did younger brother drink something at the open-air festival?”
For the short questions, 66% of open questions were produced with no boundary between the CPF and the verb, as predicted by Jun & Oh, and 34% were produced with an intervening boundary. Polar questions showed the converse pattern, with 34% of utterances having no boundary between the CPF and the verb (contra Jun & Oh) and 66% of utterances having an intervening boundary, in line with Jun & Oh’s predictions. For the long utterances, whether open or polar, there was always an AP boundary immediately after the CPF, and while 90% of polar questions had the predicted pattern of an AP boundary immediately before the verb, 78% of open questions also had this pattern. Jones (2016) did, however, find systematic differences between open and polar questions in both short and long utterances in the heights of the F0 peaks at the CPF and the verb, with the F0 peak at the CPF being higher for open questions than for polar questions, and the F0 peak at the verb being higher for polar questions than for open questions. Based on these findings, Jones claimed that AP boundaries alone cannot account for the resolution of the ambiguity. He proposed an account including a prosodic feature, expanded pitch range, and generated a formal syntactic analysis using LFG that associated the right edge of the expanded pitch range with the right edge of the syntactic scope of question focus. We return to this analysis in Section 5.3.
In contrast, Yun (2019) contends that only dephrasing after the CPF contributes to the reading of an utterance, whereas raising the pitch of the CPF does not. Sentences in which the CPF itself was manipulated to have a higher pitch were interpreted as open questions only 10% of the time. Sentences in which pitch points following the CPF were erased (dephrased), however, were interpreted as open questions 66% of the time. The interaction between pitch raising and dephrasing after the CPF was statistically significant (p<.022), indicating that pitch raising, when combined with dephrasing, actually lowers the probability of an open question interpretation. In addition, for utterances that were interpreted as declarative, a positive correlation was observed between pitch raising at the CPF and a wide scope interpretation, in which the pro-form, for example, someone, did not refer to a specific entity. When a sentence had neither pitch raising nor dephrasing, the CPF was more commonly interpreted to have narrow scope, thus referring to a specific entity.
In later research, Yun and Lee (2022) argue that three prosodic factors, namely the F0 peak height of the CPF, an L tone following the CPF, and the IP boundary tone, are the cues that speakers exploit when distinguishing between polar and open readings of ambiguous questions. In their study, which comprised both a perception and a production experiment, Yun and Lee found that F0 values of the H tone pitch peak in the CPF are higher when reading open questions than when reading polar questions. Female speakers tended to show a greater difference in pitch across the readings than male speakers. In their perception experiment, for stimuli that were read as a polar question with natural prosody, the removal of the L tone following the CPF and the changing of the sentence boundary tone from H% to LH% increased the likelihood of an open question interpretation. Figure 1 shows the effect of the manipulation, which removed the L tone but maintained the H% boundary tone. Representations of all their manipulations are given in Yun and Lee (2022, Figures 7 and 8, pp. 33–34). Likewise, for stimuli originally read as an open question, changing the sentence boundary tone from LH% to H% was effective in eliciting a polar question response. However, the addition of an L tone after the CPF was not, by itself, effective. They further found that changing only the F0 height, without also changing other factors, did not alter the participants’ interpretation: when the pitch of the CPF in a sentence originally read as a polar question was raised, and no other factors were manipulated, participants rarely interpreted the manipulated sentence as an open question.
Figure 1. Schematic of one of Yun and Lee’s experimental manipulations. Diagram p1 shows the naturally produced stimulus; p3 shows the effect of removing the L tone. Reproduced from Yun and Lee (2022, Figure 7, p. 33).
Yun and Lee describe these findings as surprising, in particular the lack of effect of raising the pitch of the CPF on perceiving a question as an open question. They offer two possible accounts: that the raised tone was under-represented in the experiment, or that the raised tone is not significant when disambiguating question types. They posit that increasing the F0 and intensity of the CPF would possibly enhance perceptual salience. They also note the interaction between the pitch peak of the CPF and the final IP boundary tone; when both a raised pitch peak and an LH% boundary tone are present, perception of an open question increases.

2.3 Related studies

Related studies have examined the prosodic realization of question focus in other Korean dialects and the realization of contrastive focus in Seoul Korean. Hwang (2009), investigating question focus in Kyungsang Korean, posits that the critical difference between open and polar questions is the final boundary tone. A perception experiment was conducted in which sentences were constructed from one interrogative clause being embedded within another. In some cases, the wh-phrase occurred in the embedded clause (in situ), whereas in others, it occurred in the matrix clause (scrambled). In addition, clauses varied by prosody; in some cases, both clauses would have the same prosody (both indicative of either an open or a polar question), whereas in others, clauses would differ in prosody (i.e., Clause 1 having open question prosody, with Clause 2 having polar question prosody). Participants (n=6) listened to recordings of the questions and then indicated whether each question was acceptable, along with whether it was an open or a polar question.
The results from Hwang indicate that in questions that also contained embedded questions, the prosody of the matrix clause (cueing either an open or a polar question) influenced the interpretation of the sentence as the corresponding question type. In addition, these interpretations arose regardless of the position of the wh-phrase itself (either embedded or in the matrix clause). In Kyungsang Korean, which contains distinct particles for open questions (-no) and polar questions (-na), Hwang finds that the presence of those particles holds greater influence than the prosody in almost all environments. However, in questions in which the polar question particle is present, yet the prosody of the question matches that of an open question, the question is more likely to be interpreted as an open question, whether the wh-phrase was embedded or in the matrix clause. In Hwang’s analysis of the Kyungsang Korean speakers, open questions are always marked by an L% boundary tone regardless of the position of the wh-phrase, whereas polar questions are realized with either H% or HL% boundary tone. Hwang ultimately interprets her findings in support of the claim that the final boundary tone, specifically the falling boundary tone present in open questions, is the key difference between the two question types, although she does suggest the possibility that other prosodic features may be at play, requiring further research. From a cross-dialectal point of view, Korean varieties can also make use of boundary tones to disambiguate the three possible readings for the utterances. However, such evidence is lacking in Seoul Korean, where this is not a reliable cue, as both rises and falls can be used for all three types of utterance (see Yun & Lee, 2022, p. 20 for a discussion). Thus the interaction between ambiguous CPFs, particles, and prosody suggests that it may not be safe to view any particular dimension of the speech signal as the main factor in disambiguation.
Turning to contrastive focus, production studies of Seoul Korean show that a raised F0 peak at the focus site is generally important, although often observed alongside other elements. Jun and Lee (1998) found that the start of the focus scope was marked with an AP boundary and that there was a tendency for the AP with the focused constituent to extend into following words. Pitch expansion was also observed in the focused AP, and this signal was more important than duration or post-focus compression in Korean. Lee & Xu (2010) reported F0 expansion during the focused AP, but in this case, it was reliably followed by F0 compression. Hatcher et al. (2024) also found that contrastive focus was expressed primarily through F0 modulation rather than through phrase boundaries. In this study, the nature of prosodic expression depended on the position of the focused constituent in an AP: focus at the start of the AP could result in an elevated F0 peak. They found no clear evidence for the impact of contrastive focus on phrase formation and argue that focus is “just one of several potentially competing structures that determine a sentence’s phrasing” (p. 1). However, Lee et al. (2015), comparing prosodic patterns in Seoul Korean, Mandarin Chinese, and English, found no conclusive evidence for the role of F0 or other prosodic elements, commenting that prosody was “neither clearly marked in production nor accurately recognised in perception” (p. 4754).

2.4 Summary

The role of prosody in perceiving and processing ambiguity in Korean remains a source of robust debate, with expanded and compressed F0 range and lexical biases being seen as factors in disambiguation alongside the AP boundaries. The present study aims not only to contribute further to the debate but also to sharpen our conception of the role of prosodic cues in syntactic analyses.
The gating paradigm (Grosjean, 1980, 1996a) has been used to investigate perceptions of prosody and its contribution to hearers’ interpretation and predictions of audio stimuli, including questions (e.g., Grosjean, 1996b; Hansen et al., 2023; Petrone & Niebuhr, 2014a). Accordingly, we chose this paradigm to explore the potential contribution of the proposed expanded pitch range feature.

3 Experiment 1

We began with a pilot study to test the practical application of the gating paradigm to this question and to validate the claim that prosody alone allows hearers to identify the scope of focus and thus disambiguate between statements, open questions, and polar questions.
Participants were played repeated, incrementally longer fragments of six naturally produced utterances, starting each time at the beginning of the utterance, and were asked to categorize the utterance as statement/polar/open/unknown on the basis of what they had heard.

3.1 Method

3.1.1 Stimuli

Six stimulus sets were created. For each set, a three-way ambiguous sentence containing a CPF was used. All of the ambiguous sentences followed the template in Figure 2. Open and polar question stimuli were created from recordings collected as part of Jones (2016), from eight native speakers of Seoul Korean (six female and two male) aged between 18 and 35, studying at the University of Oxford. For each recording, participants were presented with a screen showing contextual information and the ambiguous sentence. They were asked to read the sentence aloud as a question in such a way that it made sense in context, and such that a hearer would be able to infer the context. Where the target was an open question, the context was “You know some, but not all, details of an event.” and where the target was a polar question, the context was “You don’t know whether an event happened.”
Figure 2. Template for preparing stimuli for Experiment 1.
A native speaker of Seoul Korean reviewed the recordings from Jones’ experiment, and for the open and polar question types, selected the recording that most clearly portrayed the associated meaning, resulting in recordings made by five female speakers and one male speaker. The native speaker identifying the utterance types was naïve to the experiment and the underlying research. The native speaker also had broad training in linguistics but very minimal training in syntax and/or prosody.
Statement stimuli were recorded at the time of this experiment by a further female native speaker of Seoul Korean who was asked to read the sentence aloud so that it would be unambiguously understood as a statement. All of the above-mentioned stimuli were recorded digitally with a sampling rate of 44.1 kHz using a professional-grade microphone in a sound-attenuated room.
Using Praat (Boersma & Weenink, 2023) and scripts provided by Lennes (2017), the selected recordings were divided into audio segments following Row 2 of the template in Figure 2. From these segments, a series of incrementally longer utterances was generated, each utterance adding one segment. Thus, for participants, the first constituent and the intermediate constituents were each presented as word-length fragments, and the CPF and verb were presented as fragments that increased one syllable at a time.
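The construction of the gated series can be illustrated with a short Python sketch. It works on segment labels only, not audio, and is not the Praat script used in the study. The function name `gated_fragments` is hypothetical, and the segmentation of Example (4) below, including the syllable divisions of the romanized CPF and verb, is an assumption made for illustration.

```python
def gated_fragments(segments):
    """Return cumulative fragments, each one segment longer than the last.

    Every fragment starts at the beginning of the utterance, as in the
    gating paradigm.
    """
    return [segments[:end] for end in range(1, len(segments) + 1)]


# Illustrative segmentation of Example (4); syllable divisions are assumed.
SEGMENTS = [
    "namtongsayng-i",          # first constituent: one word-length step
    "mwe", "lul",              # CPF: one syllable per step
    "yaoi", "hayngsa-eyso",    # intermediate constituents: one word per step
    "ma", "syess", "e", "yo",  # verb: one syllable per step
]

for gate, fragment in enumerate(gated_fragments(SEGMENTS), start=1):
    print(gate, " ".join(fragment))
```

Under this segmentation, the sentence yields nine gates, the last of which is the complete utterance.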

3.1.2 Procedure

Participants accessed the stimuli via the PsyToolkit website (Stoet, 2010, 2017), after confirming that they were native speakers of Korean and giving their consent to participate. Stimuli were divided into three cohorts using a Latin square design such that each participant was presented with six trials, two of each stimulus type (open, polar, statement). These trials were presented in a random order. Within each trial, the individual utterances were presented in turn, in increasing length (see Figure 6, which illustrates the similar procedure for Experiment 2). After each presentation, participants selected a checkbox to indicate whether they had heard a statement, an open question, or a polar question, or whether they did not know. Participants then had to click a button to confirm their choice and move to the next item.
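One way to realize such a Latin square allocation is sketched below in Python. The rotation rule, the function names, and the numbering of sentences from 0 to 5 are assumptions for illustration. The text above specifies only that there were three cohorts, six trials per participant, two trials of each stimulus type, and a random presentation order.

```python
import random
from collections import Counter

TYPES = ["open", "polar", "statement"]


def latin_square_cohorts(n_sentences=6, types=TYPES):
    """Map each sentence index to a stimulus type, once per cohort.

    Rotating the type by cohort means every sentence appears in every
    type across cohorts, and never twice within one cohort.
    """
    k = len(types)
    return [
        {sentence: types[(sentence + cohort) % k] for sentence in range(n_sentences)}
        for cohort in range(k)
    ]


def trial_order(cohort, seed=None):
    """Return the cohort's (sentence, type) trials in a random order."""
    trials = list(cohort.items())
    random.Random(seed).shuffle(trials)
    return trials


cohorts = latin_square_cohorts()
for number, cohort in enumerate(cohorts, start=1):
    print(number, Counter(cohort.values()), trial_order(cohort, seed=number))
```

With six sentences and three types, each cohort receives exactly two trials of each type, matching the design described above.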

3.1.3 Participants

In total, 26 participants started the experiment, of whom 12 completed all six trials and a further participant completed three trials. The remaining 13 participants dropped out of the experiment during the first trial; we discuss this further in Section 3.3. All participants were native speakers of Seoul Korean residing in Korea or the United States, recruited through social networks. Further demographic information was not collected.

3.2 Results

Results are presented for the 75 completed trials. Table 1 shows the responses given by participants to the stimuli during their presentation. The figures for CPF and verb, where the incremental step was by syllable, include all of the steps.
Table 1. Hearers’ Disambiguation of the Stimuli.
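| Sentence type | Number of trials | Identified as | First constituent | Content pro-form | Intermediate constituents | Verb | Final syllable |
|---|---|---|---|---|---|---|---|
| Statement | 25 | **Statement** | **4** | **1** | **6** | **6** | **19** |
| | | Open | 0 | 8 | 5 | 8 | 4 |
| | | Polar | 0 | 0 | 1 | 2 | 0 |
| | | Don’t know | 21 | 16 | 13 | 9 | 2 |
| Open | 25 | Statement | 3 | 1 | 2 | 1 | 1 |
| | | **Open** | **0** | **14** | **13** | **19** | **24** |
| | | Polar | 1 | 0 | 1 | 0 | 0 |
| | | Don’t know | 21 | 10 | 9 | 5 | 0 |
| Polar | 25 | Statement | 3 | 0 | 1 | 5 | 1 |
| | | Open | 0 | 10 | 7 | 12 | 16 |
| | | **Polar** | **1** | **1** | **1** | **3** | **8** |
| | | Don’t know | 21 | 14 | 16 | 5 | 0 |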
Note. Correct responses are shown in bold text.
The pilot results show that statements (19/25 trials) and open questions (24/25 trials) were ultimately reliably disambiguated, and that for open questions, the raised F0 peak at the focused CPF often allowed disambiguation even before the possibility of post-focus compression was available. For statements, there was a tendency to erroneously disambiguate at the CPF or subsequently, with evidence for the correct meaning building during the later stages of the utterance. The picture for polar questions is more complicated. Similar to statements, there was a tendency to identify the utterance as an open question once the CPF had been heard, but ultimately disambiguation was not reliably successful, with only 8/25 trials correctly identified and 16/25 trials incorrectly identified as open questions.
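The final-syllable counts from Table 1 can be expressed as proportions with a short calculation. The Python sketch below uses only the counts reported in the table. The dictionary layout and the function name `response_rates` are illustrative choices, not part of the original analysis.

```python
# Responses at the final syllable, taken from Table 1 (25 trials per type).
FINAL_SYLLABLE = {
    "statement": {"statement": 19, "open": 4, "polar": 0, "dont_know": 2},
    "open": {"statement": 1, "open": 24, "polar": 0, "dont_know": 0},
    "polar": {"statement": 1, "open": 16, "polar": 8, "dont_know": 0},
}


def response_rates(counts):
    """Convert response counts for one stimulus type into proportions."""
    total = sum(counts.values())
    return {response: n / total for response, n in counts.items()}


for stimulus, counts in FINAL_SYLLABLE.items():
    rates = response_rates(counts)
    print(f"{stimulus}: correct {rates[stimulus]:.0%}, heard as open {rates['open']:.0%}")
```

On these counts, 76% of statements and 96% of open questions were correctly identified at the final syllable, whereas polar questions were identified as polar in 32% of trials and as open in 64%.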

3.3 Interim discussion

The gating paradigm worked, but the high participant dropout rate suggested that the nature of the interface and the size of the incremental steps needed to be improved for the full experiment. Feedback from participants suggested that the detailed operation of the pilot experimental website, where multiple actions were required between each presentation of an utterance, may have had an effect, and our method for Experiment 2 was amended to address this by reducing the number of increments within each trial and by building a smoother interface with only one click needed to progress to the next utterance.
One possible explanation for the incorrect disambiguation for statements and polar questions at the CPF could be that these indefinite pro-forms are preferentially parsed as a question word in the absence of an easily accessible antecedent. Because the utterances are out-of-the-blue, this leads to a default question reading for the word, which is associated with an open question reading for the whole utterance. For statements, the unambiguous HL% boundary tone is inconsistent with an open question reading and so forces the correct reanalysis, but for questions, the cue from the LH% boundary tone is still consistent with an open question reading. For reanalysis to occur, the hearer must also pay attention to the expanded pitch range at the verb, and if they have already committed to the open reading, this may be less likely.

4 Experiment 2

The results from Experiment 1, the pilot study, were only partially in line with our predictions. For open questions, the presence of a CPF (and potentially its associated prosody) seemed to provide a strong early cue to disambiguation, but for polar questions, this seemed to be a distractor, even without the prosody associated with open questions. We also had concerns about the high dropout rate of participants early in the experiment and that the cumbersome experimental interface might have provided a confound.
In light of this, we investigated the impact of manipulating F0 levels, using an improved web interface and fewer repeated presentations of each stimulus. Our research question was:
RQ1. How does reducing the naturally produced F0 range of focused constituents in utterances with indefinite CPFs affect hearers’ comprehension of those utterances?

4.1 Hypothesis and predictions

Our main hypothesis, and the basis on which we started the study, was that expanded pitch range is the primary cue used by hearers to decide which constituent of the question is in focus, and therefore, how to disambiguate the utterance. Expanded pitch range, following Jones (2016), was assumed to be present at the focused constituent. This gave us specific regions of interest in our test stimuli. For open questions, the region of interest was the AP containing the CPF, and for polar questions, the region of interest was the AP containing the verb plus the sentence-final question tune LH%.
The predictions according to this hypothesis for the two extremes of the variation continuum (naturally produced stimuli vs. stimuli with expanded pitch range removed) are shown in Table 2. We predicted that for the intermediate levels of variation, a gradient response would be observed, such that the stimuli where the F0 pitch range has been reduced will be correctly identified less often.
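To make the idea of a graded reduction in F0 range concrete, the following Python sketch scales the excursions of an F0 contour toward a reference value. It is a generic illustration only and should not be read as the manipulation procedure used in this study, which is not specified at this point in the text. The function names, the choice of the contour minimum as the default reference, the linear scaling in Hz, and the example values are all assumptions.

```python
def compress_f0(contour_hz, retain, reference_hz=None):
    """Scale F0 excursions above a reference value by the factor `retain`.

    retain=1.0 leaves the contour unchanged; retain=0.0 flattens it to the
    reference. Intermediate values give a graded reduction in F0 range.
    """
    if not 0.0 <= retain <= 1.0:
        raise ValueError("retain must lie between 0 and 1")
    if reference_hz is None:
        # Assumption: use the lowest F0 value in the region as the reference.
        reference_hz = min(contour_hz)
    return [reference_hz + retain * (f0 - reference_hz) for f0 in contour_hz]


def f0_range(contour_hz):
    """Return the F0 range (maximum minus minimum) of a contour in Hz."""
    return max(contour_hz) - min(contour_hz)


# Hypothetical F0 samples (Hz) across one AP.
contour = [180.0, 260.0, 220.0, 200.0]
for retain in (1.0, 0.5, 0.0):
    scaled = compress_f0(contour, retain)
    print(retain, scaled, f0_range(scaled))
```

In this sketch, halving `retain` halves the F0 range of the region, which is the kind of continuum along which a gradient response would be expected under the main hypothesis.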
Table 2. Main Hypothesis Predictions—Expanded Pitch Range (EPR) Assumed as the Determiner.
| Sentence type | Natural stimulus (100% EPR for questions) | Zero EPR |
|---|---|---|
| Open | Correctly identified as open at CPF | Random identification at yo |
| Polar | Identified as polar at verb | Random identification at yo |
| Statement | n/a | Identified as the statement at yo |
However, from the pilot study, there was a tendency for polar questions to be mistakenly identified as open questions. We thus had an additional hypothesis that there is a lexical preference for questions with CPFs to be interpreted as open questions, and that this preference may override the effect of polar question prosody. The predictions for this second situation are shown in Table 3. Again, we predicted a gradient effect for those situations where prosody is involved in hearers’ decision-making, larger F0 ranges being associated with more successful disambiguation of the utterance type.
Table 3. Additional Hypothesis Predictions—Interaction Between Expanded Pitch Range (EPR) and Lexical Preference.
| Sentence type | Natural stimulus (100% EPR for questions) | Zero EPR |
|---|---|---|
| Open | Correctly identified as open at CPF | Correctly identified as open at CPF |
| Polar | Incorrectly identified as open at CPF, possibly corrected to Polar at verb | Incorrectly identified as open at CPF |
| Statement | n/a | Incorrectly identified as open at CPF, corrected to statement at yo |

4.2 Method

4.2.1 Participants

Participants were recruited through Prolific.1 All participants reported their first language as Korean. A total of 124 participants completed the experiment, of whom 85 identified as female, 38 as male, and 1 did not state a gender. The age range of participants was 18–68 years with a mean age of 32.08 years and an interquartile range of 25–36 years. In total, 102 participants were born in Korea, 17 in the United States, 3 in Canada, 1 in Germany, and for 1 participant, these data were unavailable. In total, 57 participants were residing in the United States, 26 in Korea, and 19 in Canada. A total of 14 participants were residing in other English-speaking countries, and eight were residing in other countries. Participants were paid GBP 1.75 for participation, which represented a payment of GBP 12.00 per hour at the median length of time to carry out the experiment.

4.2.2 Stimuli

Twenty-one sets of stimuli were generated by a native speaker of Seoul Korean, recorded in a sound-attenuated room at a sampling frequency of 44.1 kHz. Each set consists of the same ambiguous sentence, read aloud three times with the speaker asked to produce the sentence as a statement, an open question, or a polar question, respectively, as in Example (5):
(5) hakchangsicel ttay nwukwu-lul mollay sarangh-ayss-eyo
school.days during who/someone-obj secretly love-pst-pol
a. “(I) secretly loved someone when I was at school.” (statement)
b. “Who did you secretly love when you were at school?” (open question)
c. “Did you secretly love someone when you were at school?” (polar question)
Once the recordings were generated, we then created the corresponding TextGrids in Praat (Boersma & Weenink, 2023) using a script readily available online.2 Working within Jun’s (2005) description of the Korean AP, we followed Jones (2016) in assuming that focus is associated with expanded pitch range. The schematic diagram in Figure 3 shows an idealized version of the prosodic patterns that we are assuming; the placement of phrase boundaries in the naturally produced stimuli is shown in Table 4. All 63 naturally produced stimuli had an AP boundary after the first constituent, and no stimuli had AP boundaries within either the CPF or the verb. In total, 23 of the 63 stimuli had a boundary pattern matching the schematic diagram (pattern F in Table 4).
Figure 3. Schematic diagram of the prosody for the three types of stimuli showing constituent boundaries and declination. The regions of interest for open and polar questions are shown in gray.
Table 4. AP Boundary Placement Within Stimuli.
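AP boundary pattern | After first constituent | After CPF | During adverbial | After adverbial | Open | Polar | Statement | Total
A | + | – | – | – | 1 | 0 | 0 | 1
B | + | – | – | + | 12 | 6 | 6 | 24
C | + | – | + | – | 1 | 1 | 1 | 3
D | + | – | + | + | 3 | 1 | 1 | 5
E | + | + | – | – | 0 | 1 | 2 | 3
F | + | + | – | + | 4 | 10 | 9 | 23
G | + | + | + | + | 0 | 2 | 2 | 4
(+ = AP boundary present; – = AP boundary absent. The Open, Polar, and Statement columns give the number of stimuli in each category.)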
There are no significant differences in the distribution of patterns between the three stimulus categories.
The remaining 40 stimuli showed some variation from the idealized pattern in the placement of AP boundaries after the first constituent. Table 5 shows how the presence or absence of AP boundaries at specific points during the stimuli was distributed between the different categories. Open question stimuli were significantly different from the other two categories in the presence of an AP boundary immediately after the CPF (p<.01), but this was not deterministic: for polar question and statement stimuli, eight stimuli in each category did not have an AP boundary after the CPF. There were no differences between the three categories for boundary placement during or after the adverbial.
Table 5. Differences Between Stimulus Categories in the Placement of AP Boundaries at Specific Points During the Utterances.
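Position | AP boundary present | Boundary patterns | Open | Polar | Statement | Comment
After CPF | – | A B C D | 17 | 8 | 8 | χ2 = 10.31, df = 2
After CPF | + | E F G | 4 | 13 | 13 | p < .01
During adverbial | – | A B E F | 17 | 17 | 17 | χ2 = 0, df = 2
During adverbial | + | C D G | 4 | 4 | 4 | p = 1
After adverbial | – | A C E | 2 | 2 | 3 | χ2 = 0.32, df = 2
After adverbial | + | B D F G | 19 | 19 | 18 | p = .85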
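The test statistics in Table 5 can be checked from the counts alone. The following is a minimal sketch, assuming that the reported values come from a Pearson chi-square test of independence on each 2 × 3 table; the article does not state which software or test variant was used, and the function names are illustrative only. The p-value helper relies on the closed form of the chi-square upper tail for exactly two degrees of freedom.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

def p_value_df2(statistic):
    """Upper-tail p-value of a chi-square distribution with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)

# Rows: boundary absent, boundary present. Columns: open, polar, statement (Table 5).
after_cpf = [[17, 8, 8], [4, 13, 13]]
after_adverbial = [[2, 2, 3], [19, 19, 18]]

for name, table in [("After CPF", after_cpf), ("After adverbial", after_adverbial)]:
    stat, df = chi_square_independence(table)
    print(f"{name}: chi2 = {stat:.2f}, df = {df}, p = {p_value_df2(stat):.3f}")
```

Under this assumption the sketch reproduces the reported statistics of 10.31 and 0.32, each with two degrees of freedom.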
In 10 of the 21 stimulus sets, the AP boundary patterns were identical across all three categories. Of the remaining 11 sets, 10 had polar questions and statements patterning together; one set had open and polar questions patterning together; and one set had open questions and statements patterning together.
Having recorded and analyzed the baseline stimuli, we proceeded to generate test stimuli by manipulating the F0 contour in the region of interest, which was the CPF for open question stimuli and the verb for polar question stimuli. Stimuli for declarative statements were not manipulated. All manipulations were based around F0 measurements from the entire AP that included the region of interest. Up to four points were measured, depending on how Jun’s T-H . . . L-H tone pattern was realized. An example of the measurement points for one AP in one stimulus is shown in Figure 4.
Figure 4. Praat-generated F0 contour for the final AP of a polar question stimulus. The four reference points T, H1, H2, and L are used in calculating F0 values for the variant stimuli.
From each baseline open and polar question utterance, we created a set of five test stimuli. The sets had four equal steps between the full extent of pitch expansion in the region of interest and a baseline F0 contour measured in the corresponding region of the comparator stimulus. For open questions, the comparator was the polar question, and for polar questions, the comparator was the open question. All manipulations reduced the height of the F0 peaks in the regions of interest, and the minimum and maximum values of those peaks matched F0 values that were naturally produced.
For open questions, the region of interest was not at the edge of the sentence, and so initial and final F0 were the same for original and manipulated stimuli. For polar questions, the verb was in focus, and so the region of interest was the AP containing the verb. Because this AP was also IP-final, the AP-final H tone was replaced by the LH% boundary tone associated with questions. Because the statement had a sentence-final HL% boundary tone, the pitch expansion contrast for polar questions was with the corresponding open question. Again, we created test stimuli that have four equal steps in the region of interest, shown in Figure 5.
Figure 5. Creation of variants for the open question: FOPEN and FPOLAR are the F0 range from the start to the maximum for the open and polar questions, respectively. Zero expanded pitch range is taken to be FPOLAR and full expanded pitch range is taken to be FOPEN. The pitch maximum of the open question AP is manipulated to produce variants at 0%, 25%, 50%, and 75% of the pitch range difference.
For polar questions, we followed Jones (2016) and assumed that there was also an expanded pitch range at the final LH% tone. We therefore also created four equal steps in the utterance-final verb and particle. Here, the extent of manipulation was the natural F0 difference between open questions and polar questions.
The aforementioned variants were produced by manipulating the F0 contour in Praat (Boersma & Weenink, 2023) according to the following procedure:
1. For each stimulus in a set, the regions of interest were identified.
2. For each element of the region of interest, the log F0 was taken at the start, the maximum, the minimum, and the end of the phrase. Logarithms were used so that the manipulated variants would be equally spaced in terms of pitch rather than frequency. The start-maximum F0 range for APs, here called M, was calculated by subtracting the starting log F0 from the maximum log F0, Formula (1):
$$M = \log\frac{F0_{\text{max}}}{F0_{\text{start}}} \tag{1}$$
For the open and polar questions, the start-maximum F0 range for the LH% tune T was calculated in the same way, Formula (2).
$$T = \log\frac{F0_{\text{max}}}{F0_{\text{start}}} \tag{2}$$
The start-end F0 range for APs E was calculated by subtracting the starting log F0 from the ending log F0, Formula (3).
$$E = \log\frac{F0_{\text{end}}}{F0_{\text{start}}} \tag{3}$$
3. For the open question, there was one element of the region of interest and one point to manipulate: the F0max. The F0 range F for the AP containing the constituent in focus was calculated as the open question start-maximum F0 range, here called MO, minus the statement start-maximum F0 range, here called MS. Revised F0 maxima (F0′max) for the manipulated stimuli were calculated using Formula (4), where x is the proportion of the F0 range for the region of interest included in the manipulated stimulus:
$$\log F0'_{\text{max}} = \log F0_{\text{start}} + M_S + x \cdot F \tag{4}$$
The F0 contour of the natural open question was streamlined to remove all points except the start, the maximum, the minimum (if this was not also the end of the phrase), and the end of the phrase. The maximum point was then changed to the revised F0. Four variants were produced for each open question, with the proportion x being equal to 0%, 25%, 50%, and 75%, respectively.
4. For the polar question, the AP forming the region of interest includes the sentence-final LH% tone. Within the region of interest, there are three points to manipulate: the maximum of the AP before the sentence-final LH% tone, the pitch at the start of the sentence-final LH% tone, and the maximum of the sentence-final LH% tone.
(a) For the maximum of the AP, the focus F0 range F is the difference between the polar question AP start-maximum pitch range MP minus the corresponding open question start-maximum F0 range MO. The revised F0 maxima (F0′max) were calculated using Formula (5):
$$\log F0'_{\text{max}} = \log F0_{\text{start}} + M_O + x \cdot F \tag{5}$$
(b) For the pitch at the start of the sentence-final LH% tone, the focus pitch range F is the difference between the polar question AP start-end pitch range EP minus the corresponding open question start-end pitch range EO. The revised pitch levels (F0′boundary) were calculated using Formula (6):
$$\log F0'_{\text{boundary}} = \log F0_{\text{start}} + E_O + x \cdot F \tag{6}$$
(c) For the maximum of the sentence-final LH% tone, the focus pitch range F is the difference between the start-maximum pitch range TP of the polar question final LH% tone and the corresponding open question start-maximum pitch range TO. The revised pitch maxima (F0′tune) were calculated using Formula (7).
$$\log F0'_{\text{tune}} = \log F0_{\text{boundary}} + T_O + x \cdot F \tag{7}$$
(d) Having calculated the manipulated values, the F0 contour of the natural polar question baseline was streamlined to remove all pitch points except the start, the AP maximum, the AP minimum (if this was not also the end of the phrase), the boundary between the AP and the LH% tone, the maximum of the LH% tone, and the end of the LH% tone (if this was not also the maximum). The AP maximum, the boundary pitch, and the LH% tone maximum points were then changed to the revised pitches. Again, four variants were produced for each polar question, with x = 0%, 25%, 50%, and 75%, respectively.
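The log-scale interpolation used in this procedure can be illustrated with a short sketch. It follows the pattern of Formula (4): a revised F0 value is obtained by adding a baseline range plus a proportion x of the focus range F to the log of the starting F0. The function names and all F0 values in Hz are hypothetical and are not measurements from the study; the actual manipulation was carried out in Praat.

```python
import math

def log_range(f0_start, f0_target):
    """Log-scale F0 range from a start point to a target point, as in Formulas (1)-(3)."""
    return math.log(f0_target / f0_start)

def revised_f0(f0_anchor, baseline_range, full_range, x):
    """Revised F0 in Hz: log F0' = log F0_anchor + baseline_range + x * F,
    where F = full_range - baseline_range and x is a proportion between 0 and 1."""
    focus_range = full_range - baseline_range
    return math.exp(math.log(f0_anchor) + baseline_range + x * focus_range)

# Hypothetical measurements in Hz, for illustration only (not taken from the study).
open_start, open_max = 200.0, 320.0
comparator_start, comparator_max = 190.0, 237.5

full = log_range(open_start, open_max)                  # plays the role of M_O
baseline = log_range(comparator_start, comparator_max)  # plays the role of M_S in Formula (4)

proportions = (0.0, 0.25, 0.5, 0.75, 1.0)
variants = {x: revised_f0(open_start, baseline, full, x) for x in proportions}
for x, f0 in variants.items():
    print(f"x = {x:.2f}: revised F0 maximum = {f0:.1f} Hz")
```

Because the interpolation is done on log F0, successive variants differ by a constant frequency ratio rather than a constant number of Hz, which is what makes the steps equal in pitch.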
We expected that manipulation might reduce the audio quality and thus the intelligibility of the stimuli. All 168 manipulated stimuli were validated by asking native speakers of Seoul Korean (n=9) to judge their intelligibility on a five-point Likert-type scale (1 = completely unintelligible, 5 = completely intelligible). The mean intelligibility rating across all stimuli was 4.54, but 16 of the 105 stimuli had a mean score below 4.0, and these were excluded from the results.
Once the full set of manipulated utterances was ready, the individual gating stimuli were prepared, with one set of stimuli for each manipulated utterance. The stimuli were segmented using Praat following the model in Figure 6; there were five segments for each stimulus. The open question region of interest with the CPF was first presented in Stimulus 2, and the polar question region of interest was first presented in Stimulus 4. Only in Stimulus 5 did participants hear the tune associated with either a question or a declarative statement.
Figure 6. Content of individual stimuli within a set. Regions of interest are marked in gray.
Following segmentation, the gating stimulus files were produced using a script amended from the Speech Corpus Toolkit for Praat (Lennes, 2017).
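The gating structure can be sketched as follows, assuming that each gate is cumulative, that is, gate k contains the utterance from its onset to the end of segment k. This matches the presentation of stimuli in increasing order of length, but the function name and the segment end times are hypothetical; the actual files were cut with a Praat script.

```python
def gate_intervals(segment_ends):
    """Return (start, end) times in seconds for cumulative gates, one per segment.

    Gate k runs from the utterance onset to the end of segment k, so each gate
    contains all earlier segments plus one more.
    """
    if any(later <= earlier for earlier, later in zip(segment_ends, segment_ends[1:])):
        raise ValueError("segment end times must be strictly increasing")
    return [(0.0, end) for end in segment_ends]

# Hypothetical end times for the five segments:
# introduction, CPF, adverbial, verb, sentence-final particle.
ends = [0.82, 1.31, 1.74, 2.40, 2.58]
gates = gate_intervals(ends)
for number, (start, end) in enumerate(gates, start=1):
    print(f"Stimulus {number}: {start:.2f}-{end:.2f} s")
```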

4.2.3 Procedure

Participants were presented with stimuli via a website written using OpenSesame (Mathôt et al., 2012) and jsPsych (de Leeuw et al., 2023), which was powered by a JATOS server (Lange et al., 2015) hosted at the University of Groningen. After giving consent to participate, participants were shown instructions, which included explanations for what statements, open questions, and polar questions are, respectively. Having confirmed that they had read the instructions, participants continued to the data collection screen. Four buttons were presented in a horizontal row at the center of the screen with labels in Korean acik molukessta “Don’t know yet”; kaypanghyeng cilmun “Open question”; phyeyswayhyeng cilmun “Closed question”; cinsul “Statement.” Below the buttons was the question etten mwuncangul tutko issnayo? “What sort of sentence are you listening to?,” and at the top of the screen was a bar showing progress through the experiment.
Stimuli were played automatically when the page loaded, and once the participant had made a choice, the page re-loaded to play the next stimulus. It was not possible for participants to replay the stimuli.
Because each stimulus set contained 11 members (five variants of open questions, five variants of polar questions, and one declarative statement), participants were randomly allocated to one of 11 cohorts. Each cohort heard one member of each of the stimulus sets in a Latin Square design, a total of 21 trials with no repetition of stimulus sets. During the experiment, each participant was presented with a mixture of open questions, polar questions, and statements, and for the open and polar questions, there was a mixture of the five variant levels of prosody. The order of presentation of the stimulus sets was random for each participant.
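One way to realize such an allocation is a cyclic rotation, sketched below. The rotation rule, the member labels, and the function name are assumptions made for illustration; the article does not report how members were assigned to cohorts.

```python
from collections import Counter

N_SETS = 21
MEMBERS = (
    [f"open_{p}" for p in (0, 25, 50, 75, 100)]
    + [f"polar_{p}" for p in (0, 25, 50, 75, 100)]
    + ["statement"]
)

def latin_square_allocation(n_sets, members):
    """Map each cohort to the member it hears from each stimulus set.

    Cohort c hears member (c + s) mod n of set s, so every member of every set
    is heard by exactly one cohort, and no cohort hears a set twice.
    """
    n = len(members)
    return {
        cohort: [members[(cohort + s) % n] for s in range(n_sets)]
        for cohort in range(n)
    }

allocation = latin_square_allocation(N_SETS, MEMBERS)
print(Counter(allocation[0]))
```

With 21 sets and 11 members, each cohort in this sketch hears ten member types twice and one member type once, so every cohort receives a mixture of utterance types and variant levels.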
For each trial, participants were presented with the five stimuli in the utterance set, in increasing order of length. Once all five stimuli had been heard, the next utterance set was presented. Four times during the experiment, at the end of an utterance set, participants were asked a question to confirm they were paying attention, in line with guidance from Prolific. The question was a multiple-choice question, and the question included the answer that was required to be given. Participants who answered two or more of these attention questions incorrectly were excluded from the study.

4.3 Results

We begin with a presentation of the data in Section 4.3.1 before introducing a descriptive statistical model in Section 4.3.2.

4.3.1 The data

In this section, we present the raw experimental data. Participants who failed the attention checks as described above were excluded (2/126). All data points from the remaining 124 participants were included in the analysis.
Did participants accurately disambiguate the stimuli? Figure 7(a) shows how participants disambiguated open question stimuli after the whole of the stimulus had been heard, and Figure 7(b) shows the same for the polar question stimuli.
Figure 7. The impact of variant on participants’ responses to question stimuli after the stimulus had been completely heard. (a) Open stimuli. (b) Polar stimuli. X-axis = the percentage of natural prosody present. Y-axis = the number of responses.
Open question stimuli were reasonably reliably disambiguated (range = 79%–84%). However, polar question stimuli were disambiguated much less reliably (range = 29%–55%). Only with 100% of natural prosody were more than 50% of the stimuli reported as polar questions. For all other manipulations, there was a preference to report the stimuli as open questions.
Statement stimuli had no prosodic variants, so Figure 8 shows how participants’ responses to these stimuli changed over time. In some trials, statements were identified as open questions at the CPF and following adverbial, and at the verb, the responses were spread between don’t know, statement, and open question. However, by the end of the stimulus set, statement stimuli were being reliably identified.
Figure 8. Responses to statement stimuli during iterative presentation. X-axis = segments, 1: introduction; 2: CPF; 3: adverbial; 4: verb; 5: sentence-final particle -yo. Y-axis = the number of responses.
How did prosody affect disambiguation during the utterances? Figure 9(a) shows the proportion of open question stimuli that were correctly identified with different levels of natural prosody, and Figure 9(b) shows the same for polar question stimuli.
Figure 9. The impact of manipulating prosody on the timing of correct responses. X-axis = segments, 1: introduction; 2: CPF; 3: adverbial; 4: verb; 5: sentence-final particle -yo. Y-axis = the number of responses. (a) Open question stimulus correctly perceived. (b) Polar question stimulus correctly perceived.
For the open questions, accuracy increases as more of the stimulus is heard, with accuracy increasing most strongly after Segment 3 and a smaller increase in accuracy from Segment 2 to Segment 3. There is some gradient effect associated with the proportion of natural prosody at the CPF and the subsequent adverbial, but this disappears by the end of the utterance. It appears that Segment 2 is more important in disambiguation than Segment 3. However, this is not to the extent that would support a claim that expanded pitch range at the CPF (Segment 2) is unambiguously associated with open questions. If this were the case, we would have expected a higher level of disambiguation at Segment 2 modulated by the proportion of focus prosody present.
For the polar questions, disambiguation seems to begin at the verb (Segment 4), but the highest level of disambiguation takes place once the sentence-final LH% tone at the particle -yo has been heard (Segment 5). This is later than the theory would predict, although there does appear to be a gradient effect of prosody in the correct disambiguation.
Figure 10(a) shows how prosody affected the incorrect disambiguation of open question stimuli during the repeated presentations. Figure 10(b) shows the same for polar question stimuli.
Figure 10. The impact of manipulating prosody on the timing of incorrect responses. X-axis = segments, 1: introduction; 2: CPF; 3: adverbial; 4: verb; 5: sentence-final particle -yo. Y-axis = the number of responses. (a) Open question stimulus incorrectly perceived as a polar question. (b) Polar question stimulus incorrectly perceived as an open question.
For open questions, there is a slight increase toward the end of the sentence but no discernible gradient effect of prosody. For polar questions, some participants identify the stimulus as an open question as soon as the CPF is heard (Segment 2), and misidentifications continue to increase as more of the stimulus is heard. In this case, there appears to be a gradient effect of prosody at Segment 5 but not earlier, with increasing natural prosody leading to fewer inaccurate disambiguations. A gradient effect would be in line with theoretical predictions, but the level of incorrect disambiguations is not, particularly the continuing increase at Segments 4 and 5, once the verb has been heard.

4.3.2 Statistical model

Five generalized additive models (GAMs; Hastie & Tibshirani, 1990) were constructed using the packages mgcv (Wood, 2011) and itsadug (van Rij et al., 2022) within R (R Core Team, 2018). GAMs were chosen because the nature of the experimental variables does not satisfy the requirement of independence necessary to use a linear mixed-effects model. However, their results must be interpreted with caution. Two GAMs were constructed for each of the open and polar question categories, respectively, with the dependent variable being a binary indicator of whether the stimulus had been correctly identified. Fixed effects were segment and variant. Random effects were participant and item in relation to segment and order in relation to participant. For each category, one of the models also included a fixed effect of the interaction between segment and variant. For the statement category, where there was no manipulation of the stimuli and therefore no variants, one GAM was produced with segment as a fixed effect and the same random effects as the other models. Models were visually inspected to check for structure in the random effects; none was found.
The maximal formula used for the models is shown at (6). The left-hand side term correct is a derived Boolean that is true when a participant correctly disambiguated the stimulus. The right-hand side terms (a)–(b) represent fixed experimental effects, term (c) represents a possible interaction between the fixed effects, and terms (d)–(f) represent the random effects of participant, item, and order of presentation of the stimuli, respectively. The terms in (g) are the other parameters of the calculation:
(6) gam(correct ~ s(variant, k = 4)                              # (a)
          + s(segment, k = 4)                                    # (b)
          + ti(variant, segment, k = 4)                          # (c)
          + s(segment, participant, bs = "fs", m = 1, k = 4)     # (d)
          + s(segment, item, bs = "fs", m = 1, k = 4)            # (e)
          + s(participant, order, bs = "fs", m = 1, k = 20),     # (f)
          data = dataset, family = binomial, discrete = TRUE)    # (g)
For statements, right-hand side terms (a) and (c) were not used in the model because no pitch manipulation was used in preparing statement stimuli, and the value of the parameter k in (f) was set to 10 because of the smaller number of data points compared with the other stimulus types. The statement model explained 67.3% of the variance with a highly significant effect of segment (p<.001).
Figure 11 shows the effect of the segment on participants’ correct identification of statements. By the region of interest, Segment 5, statements are being correctly identified as expected. The model predicts a slight reduction in accuracy at Segment 2, when the CPF is heard.
Figure 11. Effect of segment on the correct identification of statement stimuli. The region of interest is segment = 5.
The pairs of models with and without the fixed interaction between segment and variant were compared. The amount of deviance explained was similar between pairs (63.0% for the open category, 51.8% and 51.7% for the polar category). Using the Akaike information criterion (AIC), for the polar category, the model with the interaction between variant and segment was preferred (AIC 2464 vs. 2481), and so this model was selected. For the open category, the model without the interaction was slightly preferred (AIC 3400 vs. 3396), and the probability of an interaction was not significant (p=.790). Accordingly, for the open category, the model without an interaction was chosen, omitting the right-hand side term (c).
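The selection rule applied here is simply that the model with the lower AIC is preferred. The following minimal sketch applies that rule to the AIC values reported above; the function name and dictionary layout are illustrative, and each value is assigned to a model on the basis that the preferred model in each pair is the one with the lower AIC.

```python
def prefer_by_aic(aic_by_model):
    """Return the label of the model with the lowest AIC."""
    return min(aic_by_model, key=aic_by_model.get)

# AIC values as reported in the text for each pair of models.
reported_aic = {
    "polar": {"with interaction": 2464, "without interaction": 2481},
    "open": {"with interaction": 3400, "without interaction": 3396},
}

for category, aic_by_model in reported_aic.items():
    preferred = prefer_by_aic(aic_by_model)
    difference = max(aic_by_model.values()) - min(aic_by_model.values())
    print(f"{category}: prefer the model {preferred} (AIC difference = {difference})")
```

The difference of 17 for the polar category is much larger than the difference of 4 for the open category, which is consistent with the description of the open-category preference as slight.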
The open category model explained 66.9% of the variance in the data, with a significant (p=.041) effect of variant and a highly significant effect of segment (p<.001). The polar category model explained 51.7% of the variance in the data. This is lower than the open question model but not unreasonable. For the polar category, the model found no significant effect of variant alone (p=.090), but highly significant effects of both segment (p<.001) and the interaction between segment and variant (p<.001).
Figure 12(a) shows the effects of segment and variant according to the preferred model on participants’ correct prediction of open question stimuli, and Figure 12(b) shows the same for polar question stimuli. For open questions, the stimuli were correctly identified significantly above chance levels (0.25) after Segment 4. There is no evidence of an effect of the variant on the time that stimuli were correctly identified. In other words, there is no evidence that expanded pitch range played a role in participants’ decisions.
Figure 12. Effect of segment and variant on the correct identification of question stimuli. Region of interest for open questions is Segment = 2, and for polar questions, it is Segment = 4–5. The color scales are the same on the two plots. Negative values (green through white to light purple) represent the incorrect identification or don’t know; positive values (dark purple) represent the correct identification. The area shaded in gray represents no significant difference from chance. (a) Open question stimuli. (b) Polar question stimuli.
For polar questions, correct identification never rose significantly above chance levels. There is, however, evidence of an interaction between variant and segment; in other words, a greater amount of expanded pitch range increased the likelihood that participants would correctly identify the stimuli. Even so, it was only at Segment 5 and with 75% or higher expanded pitch range that correct identifications were at chance levels; earlier in the sentence for all levels of pitch expansion, and at Segment 5 for 50% or lower pitch expansion, participants were significantly more likely to identify the stimulus incorrectly (as an open question, a statement, or unknown).

5 Discussion

We undertook a large-scale online study where participants listened to recordings of syntactically ambiguous utterances that had been produced using prosodic patterns that are canonically associated with statements, open questions, or polar questions. We manipulated the size of F0 variation in the stimuli to explore the role of a proposed feature, expanded pitch range, and we used a gating methodology to identify the critical point during the utterances where disambiguation was occurring. We expected to see open questions disambiguated once the expanded pitch range was recognized at the CPF, polar questions to be disambiguated once expanded pitch range was recognized at the verb, and statements to be disambiguated by the characteristic utterance-final HL% boundary tone. Given the role of expanded pitch range, we expected to see a gradient effect of accuracy as the size of the F0 range was artificially reduced.
The results confounded our expectations. Statements, where the stimuli had no prosodic variation, were reliably disambiguated at the end of the utterance, as predicted. However, for the two question types, predictions were not met, but in different ways for each question type. Open questions were ultimately reliably disambiguated, but disambiguation rose above chance levels only once the verb had been heard. There was no gradient effect arising from the prosodic manipulation. Polar questions were never reliably disambiguated, and there was a significant effect of the prosodic manipulation only in interaction with the position in the sentence. But even the most accurate disambiguation, with 75% or more of natural prosody once the whole utterance had been heard, was not significantly above chance levels. Accordingly, we cannot support a position that prosody is the primary determinant of disambiguation in this case.

5.1 The role of prosody

5.1.1 The nature of the prosodic expression

Although Jones (2016) takes expanded pitch range as applying across a number of syllables in the focused constituent, the method we used to construct the stimuli used the F0 peak within the relevant AP, streamlining the contour between this point and the boundaries of the phrase. This approach is more in line with the F0 peak as described by Yun and Lee (2022), where a positive association was seen between the height of the F0 peak and an open question reading. However, we did not see a reduction in open question interpretation as the F0 peak at the CPF decreased, which would have been predicted by Yun and Lee’s results.

5.1.2 Post-focus compression

It is also possible that post-focus compression is a necessary element of prosody, alongside expanded pitch range. In a similar case involving disambiguating wh-interrogatives from wh-declaratives in Mandarin, Yang et al. (2020) found that open questions showed a more compressed F0 range relative to their declarative counterparts. The stimuli for open questions all had the natural prosody of an open question after the AP containing the CPF. Thus, even for the variants where the expanded F0 range had been removed, the subsequent F0 range compression may have been detectable. We did not control for this, and so the question remains open for further investigation.

5.1.3 The status of AP boundaries

Our study did not set out to explore the role of AP boundaries in disambiguation, but it is possible to make some inferences about their impact. If the placement of AP boundaries is the crucial determiner of disambiguation, then we would expect to see no gradient effect in either the open or polar categories, because only the pitch peaks within the regions of interest were manipulated, and the low tones at phrasal edges were unchanged. For open questions, we would also expect to see successful disambiguation at or shortly after the region of interest Segment 2; the study design means that there is time to fully process the sentence fragment before making a decision. However, a gradient effect was observed in the polar category, and for the open category, it was not until Segment 4 that disambiguation reached chance levels. The data therefore suggest that AP boundaries are not crucial in disambiguation.

5.2 Factors other than prosody

5.2.1 The status of CPFs

Our design assumed that there is no preferred reading for CPFs, but the data suggest that there is a degree of lexical preference for interpreting them as question words rather than as indefinite pronouns. For statement stimuli, where there was no manipulation of natural prosody, the 10%–15% of participants who identified an utterance type at the CPF or the subsequent adverbial largely thought the utterance was an open question. Even when the sentence-final boundary tune had been heard, just more than 10% of participants continued to identify the statement stimuli as open questions. For the polar category, at least 75% of natural prosody at the verb, Segments 4 and 5, was required to bring disambiguation up to chance levels, and even with full natural prosody at the verb, an open question reading was as likely as a polar question reading. Within the limits of the study, it was not possible to carry out a corpus investigation to explore this further.

5.2.2 The role of context

Cross-linguistic evidence shows that context interacts with prosody in disambiguating ambiguous utterances. Snedeker and Trueswell (2003) studied syntactic ambiguity in prepositional phrase attachment in English and found that speakers produced strong prosodic differences when contextual information was insufficient to disambiguate between syntactic structures. These prosodic cues significantly contributed to listeners’ ability to disambiguate. However, when speakers were unaware of the ambiguity, they produced weaker prosodic cues, making it more difficult for listeners to rely on them for disambiguation. Similarly, Hansen et al. (2023), investigating prosodic grouping in coordinated name sequences in German, examined how prosodic cues such as F0 range, final lengthening, and pause signaled internal grouping within three-name sequences. Using a gating paradigm, they tested whether listeners could predict these groupings based on boundary-related prosodic information. They found that only minimal prosodic information related to grouping was necessary; most of the listeners were able to disambiguate after the first name before the grouping information was available. Interestingly, listeners used different disambiguation strategies: some preferred to wait for as much information as possible, whereas others started the identification process early on. In our study, where utterances were presented out of the blue with no supporting context, we also found late disambiguation, which underscored the interpretation that listeners tried to wait for as much information as possible before they attempted disambiguation. Moreover, we saw that listeners relied little on the prosodic information from the CPF; instead, the lexical meaning of the CPF appeared to have biased the listeners’ interpretation toward questions. These findings echo Song et al. (2022), showing that prosodic features are not the only factors in disambiguating Korean polar and wh-questions, as subtle lexical meaning can also influence the interpretations. A similar phenomenon has also been reported in other languages (e.g., see Zhang, 2018, p. 146, for Tianjin Mandarin).

5.2.3 Other methodological points

Our sample size is relatively large, and this may have contributed to unexpected patterns being revealed. However, we note that our pilot study (n=16) also showed differences in successful disambiguation between the different stimulus categories. The online environment for the experiment is less controlled than testing participants in the laboratory, but this may also more closely reflect how people are using language in their daily lives.
One possible confound is that, over the course of the study, participants may have forgotten the meanings of the terms open question, polar question, and statement. Although we provided an explanation and relevant examples at the start of the study to acclimatize participants to the key terms, we recognize that these terms are technical in nature and thus potentially not very accessible to the everyday speaker. Participants may therefore have defaulted to a particular response as the experiment progressed, in this case the open interpretation for questions, which would have resulted in later responses showing a different pattern from earlier responses. However, this is not borne out by our statistical modeling. In our preferred models, order was treated as a random effect, and its inclusion as such did not add structure to the residuals. There was no evidence to suggest that order should have been included as a fixed effect.
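The concern about a default response can also be illustrated with a simple descriptive check. The following is a minimal sketch only, and it is not the statistical modeling reported in this article: it assumes a hypothetical list of (presentation order, response) pairs for question stimuli and compares the proportion of open responses in the first and second halves of a session. The function name, the data layout, and the toy data are all illustrative assumptions.

```python
from typing import List, Tuple


def open_rate_by_half(trials: List[Tuple[int, str]]) -> Tuple[float, float]:
    """Return the proportion of 'open' responses in the early and late halves.

    `trials` is a hypothetical list of (presentation_order, response) pairs,
    where response is one of 'open', 'polar', or 'statement'.
    """
    if len(trials) < 2:
        raise ValueError("need at least two trials to compare halves")

    # Sort by presentation order so the split reflects time in the session.
    ordered = sorted(trials, key=lambda t: t[0])
    mid = len(ordered) // 2
    early, late = ordered[:mid], ordered[mid:]

    def rate(block: List[Tuple[int, str]]) -> float:
        return sum(1 for _, response in block if response == "open") / len(block)

    return rate(early), rate(late)


# Toy data, invented for illustration only (not experimental results).
toy = [(1, "open"), (2, "polar"), (3, "open"), (4, "statement"),
       (5, "open"), (6, "polar"), (7, "open"), (8, "statement")]
early_rate, late_rate = open_rate_by_half(toy)
print(early_rate, late_rate)
```

If participants drifted toward a default open response, the second value would be noticeably higher than the first. A split-half comparison of this kind is much coarser than modeling order directly, so it would serve at most as a sanity check.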
An aspect that may have prevented the early identification of open questions was the presence of statements in the stimuli. Because the defining distinction in Korean between statements and questions is the utterance-final tune (HL% for statements vs. LH% for questions), which in the polite speech style is associated with the particle -yo, participants may have waited until Segment 5 to make their decision, even if they had formed a strong hypothesis having heard F0 variation against declination at Segment 2, the CPF. We did not expect this to happen, and it was not seen in the pilot study, but restricting the experiment to a choice between open and polar questions would remove this as a potential confound.

5.3 Implications for the prosody–syntax interface

Jones (2016) presents a model of this phenomenon in Lexical Functional Grammar (LFG; Bresnan & Kaplan, 1982). LFG is a modular, declarative, constraint-based, computationally robust grammar theory that supports analyses of language from the spoken or written utterance through to representations of meaning and discourse. Different elements of language such as syntax, semantics, prosodic structure, and information structure are represented in distinct modules; LFG analyses propose constraints both within individual modules and at the interfaces between modules. The theory is thus well suited to developing accounts of the relationship between prosody and meaning.
Two main approaches to analyzing the interface between prosody and meaning have been proposed in LFG. Bögel (2015, 2022) takes a bottom-up approach where information on F0 and syllable duration is combined with lexical representations such that the interface is modeled on a word-by-word basis. Dalrymple and Mycock (2011) and Mycock and Lowe (2013) take a top-down approach which models the interface at the edges of prosodic and syntactic constituents. More information about the formal treatment of prosody in LFG can be found in Bögel (2023).
Jones’s (2016) model follows the edge-based approach. In his analysis of the data, F0 expansion was seen not only at the F0 peak in an AP but also at the following syllables in the phrase. Accordingly, he assumed that F0 expansion spread leftwards from the right edge of an AP. He also assumed that the position within the sentence of the constituent edge carrying this expansion corresponded to the right edge of the syntactic element bearing question focus, from which he derived a formal account of the different readings of open and polar questions. Our results do not support that analysis. In our experiment, F0 expansion was linked to an H tone within the AP, rather than to its right edge. There also seems to be a lexical preference for a question-word reading of the CPF, whether or not the CPF is produced with F0 expansion (see Table 6).
Table 6. Participants’ Interpretations of Question Stimuli.
Experimental stimuli | EPR at CPF | EPR at verb | Participants’ interpretation
Open question with natural prosody | + | | Open
Polar question with natural prosody | | + | Mixed
Questions with natural prosody removed | | | Open
However, the contribution of prosody is also not entirely absent; the presence of natural canonical polar question prosody at the verb partially inhibited participants from interpreting the stimuli as open questions and brought decisions to chance levels. A successful analysis needs to allow for this interaction while recognizing that there are differences—whether individual or situational—in the weight that is given to prosodic evidence in making a decision. An initial analysis of these data using lexical preferences in the edge-based approach is presented in Jones et al. (2024).

6 Conclusion

We began our research for this article with an assumption—based on the results of a previous production experiment and in line with the prevailing view in the literature—that prosody was central to the correct perception of Korean sentences containing indefinite content pro-forms that are ambiguous between statements, open questions, and polar questions. Our results lead us to believe that the situation is considerably more complex. Although we found that prosody does play a role in hearers’ correct identification of polar questions, it appears that in the absence of other contextual information, the presence of an indefinite content pro-form creates a strong bias toward an utterance being interpreted as an open question.

Acknowledgments

Our thanks go to Lillian Phillips for her help in creating the pitch-manipulated stimuli and recruiting participants, to Jacolien van Rij for her support in constructing and interpreting the statistical model, and to the editor, Hae-Sung Jeon, and three anonymous reviewers for their comments, which have substantially improved the paper. No external funding was used to carry out the study.

References

Boersma P., Weenink D. (2023). Praat: Doing phonetics by computer [computer program]. https://www.fon.hum.uva.nl/praat/
Bögel T. (2015). The syntax-prosody interface in Lexical Functional Grammar [PhD thesis, University of Konstanz, Konstanz].
Bögel T. (2022). The prosody-syntax interface: A computational implementation. In Butt M., Findlay J. Y., Toivonen I. (Eds.), Proceedings of the LFG22 Conference (pp. 61–78). University of Konstanz.
Bögel T. (2023). Prosody and its interfaces. In Dalrymple M. (Ed.), Handbook of Lexical Functional Grammar (pp. 779–821). Language Science Press.
Bresnan J., Kaplan R. M. (1982). Grammars as mental representations of language. In Bresnan J. (Ed.), The mental representation of grammatical relations (chapter Introduction, pp. xvii–lii). MIT Press.
Dalrymple M., Mycock L. (2011). The prosody-semantics interface. In Butt M., King T. H. (Eds.), Proceedings of LFG11 (pp. 173–193). CSLI Publications.
de Leeuw J., Gilbert R., Luchterhandt B. (2023). jsPsych: Enabling an open-source collaborative ecosystem of behavioral experiments. Journal of Open Source Software, 8(85), 5351.
Grosjean F. (1980). Spoken word recognition processes and the gating paradigm. Perception and Psychophysics, 28(4), 267–283.
Grosjean F. (1996a). Gating. Language and Cognitive Processes, 11, 597–604.
Grosjean F. (1996b). Using prosody to predict the end of sentences in English and French. Language and Cognitive Processes, 11, 107–134.
Hansen M., Huttenlauch C., de Beer C., Wartenburger I., Hanne S. (2023). Individual differences in early disambiguation of prosodic grouping. Language and Speech, 66(3), 706–733.
Hastie T., Tibshirani R. (1990). Generalized additive models. Chapman & Hall/CRC.
Hatcher R., Joo H., Kim S., Cho T. (2024). Focus-induced tonal distribution in Seoul Korean as an edge-prominence language. Journal of Phonetics, 107, 101353.
Hwang H. (2009). Wh-phrase questions and prosody in Korean. Japanese/Korean Linguistics, 17, 295–309.
Jones S. (2016). The syntax-prosody interface in Korean: Resolving ambiguity in questions. In Arnold D., Butt M., Crysmann B., King T. H., Müller S. (Eds.), Proceedings of the Joint 2016 Conference on Head-driven Phrase Structure Grammar and Lexical Functional Grammar (pp. 318–338). CSLI Publications.
Jones S. M., Kim Y., Zhang C. (2024). The syntax-prosody interface in LFG: Revisiting Korean question focus. In Butt M., Findlay J. Y., Toivonen I. (Eds.), Proceedings of the LFG’24 Conference (pp. 186–206). International Lexical Functional Grammar Association, PubliKon.
Jun S.-A. (1996). Influence of microprosody on macroprosody: A case of phrase initial strengthening (pp. 97–116). UCLA Working Papers in Phonetics.
Jun S.-A. (2000). Korean ToBI, Version 3. UCLA Working Papers in Phonetics, 99, 149–173.
Jun S.-A. (2005). Korean intonational phonology and prosodic transcription. In Jun S.-A. (Ed.), Prosodic typology: The phonology of intonation and phrasing (Chapter 8, pp. 202–228). Oxford University Press.
Jun S.-A., Lee H.-J. (1998, November 30–December 4). Phonetic and phonological markers of contrastive focus in Korean [Conference session]. International Conference on Spoken Language Processing (ICSLP), Sydney, NSW, Australia.
Jun S.-A., Oh M. (1996). A prosodic analysis of three types of wh-phrases in Korean. Language and Speech, 39(1), 37–61.
Kang M.-Y. (1988). Topics in Korean syntax: Phrase structure, variable binding and movement [PhD thesis, Massachusetts Institute of Technology, Cambridge, MA].
Lange K., Kühn S., Filevich E. (2015). “Just Another Tool for Online Studies” (JATOS): An easy solution for setup and management of web servers supporting online studies. PLOS ONE, 10(7), Article e0130834.
Lee Y.-C., Wang B., Chen S., Adda-Decker M., Amelot A., Nambu S., Liberman M. (2015). A crosslinguistic study of prosodic focus. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4754–4758). IEEE.
Lee Y.-C., Xu Y. (2010). Phonetic realization of contrastive focus in Korean. In Mark H.-J. (Ed.), Speech Prosody 2010 (pp. 1–4). University of Illinois.
Lennes M. (2017). The speech corpus toolkit for Praat. https://lennes.github.io/spect/
Mathôt S., Schreij D., Theeuwes J. (2012). OpenSesame: An open-source, graphical experimental builder for the social sciences. Behavior Research Methods, 44(2), 314–324.
Mycock L., Lowe J. (2013). The prosodic marking of discourse functions. In Butt M., King T. H. (Eds.), Proceedings of LFG13 (pp. 440–460). CSLI Publications.
Petrone C., Niebuhr O. (2014). On the intonation of German Intonation Questions: The role of the prenuclear region. Language and Speech, 57(1), 108–146.
R Core Team. (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Snedeker J., Trueswell J. (2003). Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language, 48(1), 103–130.
Song J., Jeon H.-S., Kiaer J. (2022). Use of prosodic and lexical cues for disambiguating wh-words in Korean. In Interspeech 2022 (pp. 81–85). ISCA.
Stoet G. (2010). PsyToolkit: A software package for programming psychological experiments using Linux. Behavior Research Methods, 42(4), 1096–1104.
Stoet G. (2017). PsyToolkit: A novel web-based method for running online questionnaires and reaction-time experiments. Teaching of Psychology, 44(1), 24–31.
van Rij J., Wieling M., Baayen R. H., van Rijn H. (2022). itsadug: Interpreting time series and autocorrelated data using GAMMs [R package version 2.4.1]. https://cran.r-project.org/web/packages/itsadug/index.html
Wood S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B), 73(1), 3–36.
Yang Y., Gryllia S., Cheng L. L.-S. (2020). Wh-question or wh-declarative? Prosody makes the difference. Speech Communication, 118, 21–32.
Yun J. (2012). The deterministic prosody of indeterminates. In Proceedings of the 29th West Coast Conference on Formal Linguistics (pp. 285–293). Cascadilla Proceedings Project.
Yun J. (2019). Meaning and prosody of wh-indeterminates in Korean. Linguistic Inquiry, 50(3), 630–647.
Yun J., Lee H.-S. (2022). Prosodic disambiguation of questions in Korean: Theory and processing. Korean Linguistics, 18(1), 18–47.
Zhang C. (2018). Tianjin Mandarin tones and tunes [DPhil thesis, University of Oxford, Oxford].