4.2.2 Stimuli
Twenty-one sets of stimuli were produced by a native speaker of Seoul Korean and recorded in a sound-attenuated room at a sampling frequency of 44.1 kHz. Each set consisted of the same ambiguous sentence read aloud three times: the speaker was asked to produce it as a statement, an open question, and a polar question, respectively, as in Example (5):
(5) hakchangsicel ttay nwukwu-lul mollay sarangh-ayss-eyo
school.days during who/someone-obj secretly love-pst-pol
a. “(I) secretly loved someone when I was at school.” (statement)
b. “Who did you secretly love when you were at school?” (open question)
c. “Did you secretly love someone when you were at school?” (polar question)
Once the recordings were generated, we then created the corresponding TextGrids in Praat (Boersma & Weenink, 2023) using a script readily available online.² Working within Jun’s (2005) description of the Korean AP, we followed Jones (2016) in assuming that focus is associated with expanded pitch range. The schematic diagram in Figure 3 shows an idealized version of the prosodic patterns that we are assuming; the placement of phrase boundaries in the naturally produced stimuli is shown in Table 4. All 63 naturally produced stimuli had an AP boundary after the first constituent, and no stimuli had AP boundaries within either the CPF or the verb. In total, 23 of the 63 stimuli had a boundary pattern matching the schematic diagram (pattern F in Table 4).
The remaining 40 stimuli showed some variation from the idealized pattern in the placement of AP boundaries after the first constituent. Table 5 shows how the presence or absence of AP boundaries at specific points during the stimuli was distributed between the different categories. Open question stimuli were significantly different from the other two categories in the presence of an AP boundary immediately after the CPF, but this was not deterministic: for polar question and statement stimuli, eight stimuli in each category did not have an AP boundary after the CPF. There were no differences between the three categories for boundary placement during or after the adverbial.
In 10 of the 21 stimulus sets, the AP boundary patterns were identical across all three categories. Of the remaining 11 sets, 10 had polar questions and statements patterning together; one set had open and polar questions patterning together; and one set had open questions and statements patterning together.
Having recorded and analyzed the baseline stimuli, we proceeded to generate test stimuli by manipulating the F0 contour in the region of interest, which was the CPF for open question stimuli and the verb for polar question stimuli. Stimuli for declarative statements were not manipulated. All manipulations were based around F0 measurements from the entire AP that included the region of interest. Up to four points were measured, depending on how Jun’s T-H . . . L-H tone pattern was realized. An example of the measurement points for one AP in one stimulus is shown in Figure 4.
From each baseline open and polar question utterance, we created a set of five test stimuli. The sets had four equal steps between the full extent of pitch expansion in the region of interest and a baseline F0 contour measured in the corresponding region of the comparator stimulus. For open questions, the comparator was the polar question, and for polar questions, the comparator was the open question. All manipulations reduced the height of the F0 peaks in the regions of interest, and the minimum and maximum values of those peaks matched F0 values that were naturally produced.
For open questions, the region of interest was not at the edge of the sentence, and so initial and final F0 were the same for original and manipulated stimuli. For polar questions, the verb was in focus, and so the region of interest was the AP containing the verb. Because this AP was also IP-final, the AP-final H tone was replaced by the LH% boundary tone associated with questions. Because the statement had a sentence-final HL% boundary tone, the pitch expansion contrast for polar questions was with the corresponding open question. Again, we created test stimuli that had four equal steps in the region of interest, shown in Figure 5.
For polar questions, we followed Jones (2016) and assumed that there was also an expanded pitch range at the final LH% tone. We therefore also created four equal steps in the utterance-final verb and particle. Here, the extent of manipulation was the natural F0 difference between open questions and polar questions.
The aforementioned variants were produced by manipulating the F0 contour in Praat (Boersma & Weenink, 2023) using the following procedure:
1. For each stimulus in a set, the regions of interest were identified.
2. For each element of the region of interest, the log F0 was taken at the start, the maximum, the minimum, and the end of the phrase. Logarithms were used so that the manipulated variants would be equally spaced in terms of pitch rather than frequency. The start-maximum F0 range for APs was calculated by subtracting the starting log F0 from the maximum log F0 (Formula 1).
For the open and polar questions, the start-maximum F0 range for the LH% tune was calculated in the same way (Formula 2).
The start-end F0 range for APs was calculated by subtracting the starting log F0 from the ending log F0 (Formula 3).
3. For the open question, there was one element of the region of interest and one point to manipulate: the F0 maximum. The F0 range for the AP containing the constituent in focus was calculated as the open question start-maximum F0 range minus the statement start-maximum F0 range. Revised F0 maxima (F0′max) for the manipulated stimuli were calculated using Formula (4), in which the proportion is the share of the F0 range for the region of interest that is included in the manipulated stimulus.
The F0 contour of the natural open question was streamlined to remove all points except the start, the maximum, the minimum (if this was not also the end of the phrase), and the end of the phrase. The maximum point was then changed to the revised F0 maximum. Four variants were produced for each open question, with the proportion equal to 0%, 25%, 50%, and 75%, respectively.
4. For the polar question, the AP forming the region of interest included the sentence-final LH% tone. Within the region of interest, there were three points to manipulate: the maximum of the AP before the sentence-final LH% tone, the pitch at the start of the sentence-final LH% tone, and the maximum of the sentence-final LH% tone.
(a) For the maximum of the AP, the focus F0 range was the polar question AP start-maximum F0 range minus the corresponding open question start-maximum F0 range. The revised F0 maxima (F0′max) were calculated using Formula (5).
(b) For the pitch at the start of the sentence-final LH% tone, the focus pitch range was the polar question AP start-end pitch range minus the corresponding open question start-end pitch range. The revised pitch levels (F0′boundary) were calculated using Formula (6).
(c) For the maximum of the sentence-final LH% tone, the focus pitch range was the start-maximum pitch range of the polar question final LH% tone minus the corresponding open question start-maximum pitch range. The revised pitch maxima (F0′tune) were calculated using Formula (7).
(d) Having calculated the manipulated values, the F0 contour of the natural polar question baseline was streamlined to remove all pitch points except the start, the AP maximum, the AP minimum (if this was not also the end of the phrase), the boundary between the AP and the LH% tone, the maximum of the LH% tone, and the end of the LH% tone (if this was not also the maximum). The AP maximum, the boundary pitch, and the LH% tone maximum points were then changed to the revised pitches. Again, four variants were produced for each polar question, with the proportion equal to 0%, 25%, 50%, and 75%, respectively.
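The arithmetic in steps 2 to 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the script used in the study: the revised value is taken to be the starting log F0 plus the comparator range plus the chosen proportion of the focus range, converted back to Hz. The function names and the Hz values are hypothetical.

```python
import math


def log_range(from_hz: float, to_hz: float) -> float:
    """Log-F0 range between two measurement points (e.g. start to maximum)."""
    return math.log(to_hz) - math.log(from_hz)


def revised_f0(start_hz: float, natural_range: float,
               comparator_range: float, proportion: float) -> float:
    """Revised F0 (Hz) for one manipulated point.

    Assumed form: log F0' = log F0_start + comparator_range
                            + proportion * (natural_range - comparator_range)
    so proportion 0 gives the comparator's range and 1 the natural range.
    """
    focus_range = natural_range - comparator_range
    return math.exp(math.log(start_hz) + comparator_range
                    + proportion * focus_range)


# Hypothetical measurements in Hz (not taken from the study's stimuli).
open_start, open_max = 200.0, 320.0   # natural open question AP
comp_start, comp_max = 190.0, 228.0   # comparator AP

natural = log_range(open_start, open_max)
comparator = log_range(comp_start, comp_max)

# 0% to 75% are the manipulated variants; 100% is the natural stimulus.
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
maxima = [revised_f0(open_start, natural, comparator, p) for p in levels]
```

Because the interpolation is done on log F0, successive values in `maxima` differ by a constant ratio rather than a constant number of Hz. Under the same assumption, the function could be applied to each of the three polar question points with the matching start value and pair of ranges.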
We expected that manipulation might reduce the audio quality, and thus the intelligibility, of the stimuli. All 168 manipulated stimuli were validated by asking native speakers of Seoul Korean to judge their intelligibility on a five-point Likert-type scale (1 = completely unintelligible, 5 = completely intelligible). The mean rating across all stimuli was 4.54, but 16 of the 105 stimuli had a mean score below 4.0, and these were excluded from the results.
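The exclusion rule amounts to a per-stimulus mean with a cutoff. A minimal sketch follows, assuming ratings are stored as a mapping from stimulus ID to a list of scores; the IDs and scores are invented for illustration.

```python
from statistics import mean


def excluded_stimuli(ratings, threshold=4.0):
    """Return IDs of stimuli whose mean rating is below the threshold."""
    return sorted(sid for sid, scores in ratings.items()
                  if mean(scores) < threshold)


# Hypothetical ratings on the 1-5 intelligibility scale.
ratings = {
    "set01_open_25": [5, 4, 5, 4],
    "set01_polar_0": [3, 4, 4, 3],
    "set02_open_50": [4, 4, 4, 4],
}
dropped = excluded_stimuli(ratings)
```

A stimulus with a mean of exactly 4.0 is retained here, matching the "below 4.0" wording.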
Once the full set of manipulated utterances was ready, the individual gating stimuli were prepared, with one set of stimuli for each manipulated utterance. The stimuli were segmented using Praat following the model in Figure 6; there were five segments for each stimulus. The open question region of interest with the CPF was first presented in Stimulus 2, and the polar question region of interest was first presented in Stimulus 4. Only in Stimulus 5 did participants hear the tune associated with either a question or a declarative statement.
Following segmentation, the gating stimulus files were produced using a script amended from the Speech Corpus Toolkit for Praat (Lennes, 2017).
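The relationship between the five segments and the five gating stimuli can be sketched as follows. The sketch assumes cumulative gating, in which every gate starts at the utterance onset and each successive gate adds one more segment. The function name and boundary times are hypothetical, and the code only computes the extraction intervals; it does not cut audio.

```python
def gate_intervals(boundaries):
    """Map segment boundaries to (onset, offset) intervals, one per gate.

    boundaries: strictly increasing times in seconds, beginning with the
    utterance onset and followed by the end of each segment.
    """
    if len(boundaries) < 2 or any(
            later <= earlier
            for earlier, later in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    onset = boundaries[0]
    # Gate k runs from the utterance onset to the end of segment k.
    return [(onset, end) for end in boundaries[1:]]


# Hypothetical boundary times for one utterance with five segments.
segment_boundaries = [0.0, 0.9, 1.6, 2.1, 2.8, 3.4]
gates = gate_intervals(segment_boundaries)
```

With five segments this yields five intervals of increasing length, the last of which spans the whole utterance.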
4.2.3 Procedure
Participants were presented with stimuli via a website written using OpenSesame (Mathôt et al., 2012) and jsPsych (de Leeuw et al., 2023), which was powered by a JATOS server (Lange et al., 2015) hosted at the University of Groningen. After giving consent to participate, participants were shown instructions, which included explanations of what statements, open questions, and polar questions are. Having confirmed that they had read the instructions, participants continued to the data collection screen. Four buttons were presented in a horizontal row at the center of the screen with labels in Korean: acik molukessta “Don’t know yet”; kaypanghyeng cilmun “Open question”; phyeyswayhyeng cilmun “Closed question”; and cinsul “Statement.” Below the buttons was the question etten mwuncangul tutko issnayo? “What sort of sentence are you listening to?”, and at the top of the screen was a bar showing progress through the experiment.
Stimuli were played automatically when the page loaded, and once the participant had made a choice, the page re-loaded to play the next stimulus. It was not possible for participants to replay the stimuli.
Because each stimulus set contained 11 members (five variants of open questions, five variants of polar questions, and one declarative statement), participants were randomly allocated to one of 11 cohorts. Each cohort heard one member of each of the stimulus sets in a Latin Square design, for a total of 21 trials with no repetition of stimulus sets. During the experiment, each participant was presented with a mixture of open questions, polar questions, and statements, and for the open and polar questions, there was a mixture of the five variant levels of prosody. The order of presentation of the stimulus sets was random for each participant.
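One common way to build such an allocation is a cyclic rotation, sketched below. The rotation rule, the variant labels, and the function name are assumptions made for illustration; the actual assignment of variants to cohorts may have been constructed differently.

```python
import random

# Hypothetical labels for the 11 members of each stimulus set.
VARIANTS = (
    [f"open_{p}" for p in (0, 25, 50, 75, 100)]
    + [f"polar_{p}" for p in (0, 25, 50, 75, 100)]
    + ["statement"]
)
N_SETS = 21


def cohort_trials(cohort, rng=None):
    """Return (set_index, variant) pairs for one cohort in random order.

    Cyclic rotation: cohort c hears variant (s + c) mod 11 of set s, so
    across the 11 cohorts every member of every set is heard exactly once.
    """
    trials = [(s, VARIANTS[(s + cohort) % len(VARIANTS)])
              for s in range(N_SETS)]
    (rng or random).shuffle(trials)  # random set order per participant
    return trials


# Example: the trial list for a participant in cohort 3.
example = cohort_trials(3, random.Random(0))
```

Because 21 sets are spread over 11 variants, each cohort under this scheme hears every variant either once or twice.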
For each trial, participants were presented with the five stimuli in the utterance set, in increasing order of length. Once all five stimuli had been heard, the next utterance set was presented. Four times during the experiment, at the end of an utterance set, participants were asked a question to confirm that they were paying attention, in line with guidance from Prolific. Each of these was a multiple-choice question whose wording stated the answer that participants were required to give. Participants who answered two or more of these attention questions incorrectly were excluded from the study.