Research article
First published online June 15, 2023

A Corpus Study on the Difference of Turn-Taking in Online Audio, Online Video, and Face-to-Face Conversation

Abstract

Daily conversation is usually face-to-face and is characterized by a rapid and fluent exchange of turns between interlocutors. With the growing need to communicate across long distances and with advances in communication media, online audio and online video communication have become convenient alternatives for an increasing number of people. However, the fluency of turn-taking may be affected when people communicate through these different modes. In this study, we conducted a corpus analysis of face-to-face, online audio, and online video conversations collected from the internet. The fluency of turn-taking in face-to-face conversations differed from that in online audio and video conversations. Specifically, turn transitions were shorter and overlaps were more frequent in face-to-face conversations than in online audio and video conversations. This difference can be explained by network latency and by the limited ability of online communication modes to transmit non-verbal cues. However, our study could not completely exclude an effect of the formality of the conversations. The present findings have implications for the rules of turn-taking in human online conversations, in that the traditional rule of no-gap–no-overlap may not be fully applicable to online conversations.
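
As an illustration of the kind of measure the abstract refers to, turn-transition timing is commonly quantified as the offset between the end of one speaker's turn and the start of the next speaker's turn: a positive offset is a gap and a negative offset is an overlap. The sketch below is a minimal example under that assumption, not the authors' actual pipeline. The `Turn` class, the function names, and the sample timings are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Turn:
    """One annotated turn (hypothetical structure, times in seconds)."""
    speaker: str
    start: float
    end: float


def transition_offsets(turns):
    """Return the offset at each speaker change: next start minus previous end.

    Positive values are gaps, negative values are overlaps.
    Consecutive turns by the same speaker are not counted as transitions.
    """
    ordered = sorted(turns, key=lambda t: t.start)
    return [
        nxt.start - prev.end
        for prev, nxt in zip(ordered, ordered[1:])
        if nxt.speaker != prev.speaker
    ]


def summarize(offsets):
    """Summarize transitions: count, mean offset, and share that overlap."""
    if not offsets:
        return {"n_transitions": 0, "mean_offset_s": None, "overlap_rate": None}
    n_overlaps = sum(1 for o in offsets if o < 0)
    return {
        "n_transitions": len(offsets),
        "mean_offset_s": mean(offsets),
        "overlap_rate": n_overlaps / len(offsets),
    }


# Made-up timings for illustration only.
turns = [
    Turn("A", 0.0, 2.0),
    Turn("B", 1.75, 4.0),  # starts before A ends: overlap
    Turn("A", 4.5, 6.0),   # starts after B ends: gap
    Turn("A", 6.25, 7.0),  # same speaker continues: not a transition
]
offsets = transition_offsets(turns)
summary = summarize(offsets)
```

Computing this summary separately for each communication mode would give the quantities being compared: a shorter mean offset and a higher overlap rate correspond to the pattern the abstract reports for face-to-face conversation.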

Data availability statement

The data that support the findings of this study are openly available in OSF at https://osf.io/s2nq6.
