International Journal of Chinese Education

Open access

Research article

First published online February 20, 2026

Exploring Factors that Influence Higher Education Chinese Students’ Adoption of Online Videos for English Pronunciation Learning

Simon Wong (https://orcid.org/0000-0003-3408-9747), Wience Wing-sze Lai, […], Yui-yip Lau, and Kwong-cheong Wong


https://doi.org/10.1177/2212585X261427635

Abstract

This paper presents a quantitative approach to explore Chinese students’ acceptance of using online English pronunciation video clips to learn English pronunciation in a Hong Kong higher education institution. A convenience sample of 145 students completed online questionnaires designed to measure the effects of the students’ beliefs on their behavioral intention to use the video clips for English pronunciation learning. These beliefs concern the relevance of using the video clips to the students’ major of study, learning performance enhancement by the video clips, ease of use of the video clips, and influence from the students’ social groups. The paired-samples t-test results showed that the Chinese students are more inclined to accept the use of these video clips for English pronunciation learning after watching them. The results of correlation and multiple regression analyses indicate that relevance for major strongly influences the students’ performance expectancy, while their performance expectancy and social influence significantly affect their behavioral intention. These findings indicate that the Chinese students, especially those with a major related to English pronunciation, like to use the online English pronunciation video clips for English pronunciation learning, and they provide implications for how the students’ social groups can facilitate adoption of the online English pronunciation videos.

Introduction

English has been the medium of instruction for most non-Chinese subjects in Hong Kong higher education (Gibbons, 1987; Wong, 2012, 2015) for two main reasons: (1) to maintain the local standard of English and thereby Hong Kong’s status as an international trade center (Education Commission, 1996, 1999) and (2) to facilitate Hong Kong higher education students’ grasp of the latest and advanced knowledge in various disciplines as presented in books and journals, which are dominated by the English language (Johnson et al., 1993; Li et al., 2001). However, the students’ actual use of Chinese rather than English in their daily conversations (Evans & Morrison, 2011) has resulted in Chinese-English bilingual education. Under Chinese-English bilingual education in the Hong Kong higher education context, a natural question arises: How can higher education students’ English pronunciation be enhanced for better comprehension and communication in the global context to safeguard Hong Kong’s international status?

Many studies (e.g., Carnaghan et al., 2011; Lau et al., 2021; Tang et al., 2021) explored how technologies enhance learning. One major exploration is how the use of video clips enhances learning. The studies (e.g., Hammer, 2000; Schwartz & Hartman, 2007; Sherin, 2004; Snelson, 2018) found how students learn from watching video clips. For example, Sherin (2004) found that videos can be watched repeatedly and used to direct viewers’ attention to the important events for learning enhancement. For language studies, video clips that contain rich audio-visual information (e.g., sound effects and facial expressions) and cultural references (Canning-Wilson, 2000; Galbraith & Rodriguez, 2018) reinforce the second language (L2) acquisition (Gong et al., 2019; Lin, 2010, 2011) and enhance the conversation and pronunciation skills of the students (Watkins & Wilkins, 2011).

Inspired by the previous studies (e.g., Chan, 2010; Deterding et al., 2008) indicating the problem of English pronunciation by Hong Kong higher education students and the study by Watkins and Wilkins (2011) revealing that the use of video clips can help English pronunciation learning, this study was conducted to explore how English pronunciation video clips can be adopted for Chinese students in Hong Kong higher education to learn English pronunciation appropriately irrespective of their video production and digital literacy skills. Taking into account many factors, including the widespread use of computing devices among students (Gikas & Grant, 2013), the availability of Internet access facilities on campus, higher education students’ learning autonomy (Benson & Voller, 1997; Henri et al., 2018; Lau et al., 2021), their controlled video viewing strategies such as pausing, repeating, slow-motion playing and fast forwarding a video (Costley et al., 2021), and their lapse in attention for viewing a video longer than 20 minutes (Costley et al., 2021), the researchers decided to develop short online video clips (each of which is around 2 to 5 minutes long) in this study for the students to learn English pronunciation at any time, place, and pace. On the other hand, through identifying the typical mistakes made by Hong Kong students of higher education, these online video clips were targeted at the common English academic vocabulary used in higher education (Coxhead, 2000; Durrant, 2016; Gardner & Davies, 2013).

Nevertheless, a crucial precondition emerges: if the students do not accept using the online English pronunciation video clips, then that technology cannot help them to enhance their pronunciation accuracy. Thus, to increase the student involvement in using the online English pronunciation video clips for identifying pronunciation errors and improving pronunciation, the students’ acceptance of adopting the online English pronunciation video clips must be investigated. For this investigation, the researchers considered using a quantitative approach to measure the constructs related to technology acceptance to explore how Chinese students accept using online English pronunciation video clips to learn English pronunciation in a Hong Kong higher education context.

Theoretical Grounds

To address the students’ acceptance of adopting the online English pronunciation video clips, we considered the Technology Acceptance Model (TAM) pioneered by Davis (1989). This model theorizes that an individual’s actual usage behavior (UB) of a technology is determined by that individual’s behavioral intention (BI) to use that technology, which is in turn determined by that individual’s perceived usefulness and perceived ease of use of that technology. Perceived usefulness is the individual’s belief that his or her performance can be enhanced by using that technology. Perceived ease of use refers to the individual’s perception that using that technology is easy. TAM has been adjusted to fit different situations (e.g., Park et al., 2012; Venkatesh & Davis, 2000; Wong et al., 2019).

After reviewing TAM and its extended models, the unified model called Unified Theory of Acceptance and Use of Technology (UTAUT) by Venkatesh et al. (2003) was found to be applicable to this study as the constructs in UTAUT are more relevant to this study on exploring the students’ acceptance of adopting the online English pronunciation video clips. As theorized by UTAUT, the students’ acceptance of the online English pronunciation videos for English pronunciation learning is indicated by their actual UB which is, in turn, determined by their BI to use those videos and facilitating conditions. Facilitating conditions refer to students’ perceptions that the institutional and technical infrastructure (e.g., Internet access facilities, computing devices provided, and websites) exist to support them to use the online English pronunciation videos. In this study, since the participating students came from the same education institution with the same facilitating conditions, measuring this construct of the facilitating conditions should not give different results. Therefore, this construct of the facilitating conditions in the original UTAUT was left out in this study. The student’s BI is in turn determined by the following three factors: the student’s performance expectancy (PE) which is the same as the perceived usefulness in TAM, effort expectancy (EE) which is the extent of the student’s perception of the digital literacy and perceived ease of use of the online English pronunciation video clips, and social influence (SI) which is the extent to which the student perceives the influence from the social environment such as the expectation of the student’s classmates, teachers, friends and parents that the student should use the technology. 
UTAUT contains moderating (or indirect) effects (i.e., gender, age, experience, and voluntariness of use), but these moderators were not examined as this study focused on obtaining the findings applicable to any gender and expected no different moderating effects from the participating students with similar ages, similar experience in using the technology and the same voluntary basis.

The UTAUT model has been adopted in some previous studies on students’ acceptance of technology for learning a second language (L2). For example, a study by Ovchinnikova (2021) on a university in the Netherlands that uses L2 English as a medium of instruction for some programs explored the dominant UTAUT factors that affect the students’ acceptance of computer-assisted language learning for learning L2 English pronunciation. That study found SI to be the significant factor. Tan (2013) found that PE, EE, and SI have positive effects on Taiwanese students’ intention to use English e-learning websites to learn L2 English.

The relevance for major (MR) construct, which is one’s belief that the use of online English pronunciation video clips is relevant to his or her major of study (e.g., English, linguistics, and journalism), was considered in the models by Park et al. (2012) and by Wong et al. (2019) as it fits the education case. Figure 1 shows the theoretical model based on UTAUT and the models by Park et al. (2012) and by Wong et al. (2019) for this study, where an arrow indicates direct determination as follows:

• PE, EE, SI → BI means PE, EE and SI directly determine BI.

• MR → PE means MR directly influences PE.

• BI → UB means BI has a direct influence on UB.

Figure 1. An extended and adjusted UTAUT model

Research Objectives and Significance

Having the theoretical grounds in mind, research on the Chinese students’ acceptance of using the online English pronunciation videos for learning English pronunciation in the Hong Kong higher education context could be designed. Specifically, based on the extended and adjusted UTAUT model in Figure 1, we addressed the following research question: What influences the students’ decision to adopt the online English pronunciation videos for learning?

This research adopted the model in Figure 1 as a theoretical framework to focus on the effect of the students’ MR mediated through PE (i.e., MR → PE) and the combined effect and the relative effects of the factors influencing the students’ intention (i.e., PE, EE, SI → BI) to use the online English pronunciation videos for learning in a Hong Kong higher education institution. Specifically, this research tested the following five hypotheses:

H1: There is a significant effect of MR on PE.

H2: There is a significant combined effect of PE, EE and SI on BI.

H3: There is a significant effect of PE on BI.

H4: There is a significant effect of EE on BI.

H5: There is a significant effect of SI on BI.

The significance of this study is twofold. First, as shown in the reported studies reviewed in this paper and the study by Williams et al. (2015), various technologies (e.g., digital television, educational portals, instant messaging, the Internet, smartphones, speech recognition systems, and tablets) were tested in previous studies. So far, few studies have tested video technology. Video technology is restricted to one-way communication with learners, whereas accurate and timely feedback is essential for learners to notice their L2 mispronunciations (Rogerson-Revell, 2021; Xu et al., 2021); nevertheless, videos such as those on YouTube provide a handy and easy-to-use way for learners to learn. This study tested video technology, which is just one small part of technology, but doing so helped to extend the list of technologies covered by technology acceptance studies in the literature. Second, in addition to the policies and budgets that education management and teachers consider when choosing a technology for students’ English pronunciation learning, the students’ perspectives can help the management and teachers to understand the effects of the technology and evaluate the effectiveness of its provision (Cooper, 1993). This study explored the rationale behind the students’ intention to use the online English pronunciation video technology for learning with reference to the extended and adjusted UTAUT model, so that education management and teachers can implement that technology appropriately.

Literature Search

Initially, articles to be reviewed were determined by setting the inclusion criteria and deriving appropriate search terms. The inclusion criteria included the following:

• Empirical studies related to TAM and its extensions (e.g., UTAUT and UTAUT follow-up model) on students’ intention to use online English pronunciation learning videos.

• Studies on any construct of TAM and its extensions that influences the students’ intention to use online English pronunciation learning videos.

The search terms derived from the inclusion criteria were categorized to Who the user is, Which model the study is based on, What technology to use or What constructs to explore, How the technology is used for, and When the technology is used. Table 1 shows the search terms used initially for the literature search.

Table 1. Search Terms for Initial Literature Search

Category	Search Term
Who	Students
Which	Technology acceptance model, TAM, TAM extensions, unified theory of acceptance and use of technology, UTAUT, UTAUT follow-up model
What	Video, relevance for major, performance expectancy, effort expectancy, social influence, behavioral intention
How	English pronunciation
When	Learning

Combinations of the search terms using logical operators, together with keywords such as “influence” and “effect”, were used to search through Internet search engines and databases (e.g., ERIC, Google, ProQuest, Scopus, and Web of Science) and libraries. The search results showed no previous studies related to TAM, TAM extensions, UTAUT, or UTAUT follow-up model that explored the effect of any of the constructs on the students’ intention to use English pronunciation videos for learning. Some previous studies related to using videos for learning English, including its pronunciation, were found, but they are not related to any technology acceptance model. For example, Watkins and Wilkins (2011) found that YouTube videos enhance the students’ speaking, listening, and pronunciation skills.

Methodology

A convenience sample of 145 Chinese students at the Hong Kong higher education institution who enrolled in a subject that involved English pronunciation learning with the use of online English pronunciation videos was invited to participate in this study. This English subject is an elective subject taken by students from various disciplines, primarily aged between 19 and 20, in the second year of their associate degree study. The participating students were invited to complete a survey (termed the pre-watch survey), then watch English pronunciation video clips, and after that complete another survey (termed the post-watch survey). Each video clip lasted for about 3 to 8 minutes. These pre-watch and post-watch surveys were adopted to explore whether the students’ expected technology acceptance of watching the online video clips for learning English pronunciation differs from their actual perceived technology acceptance after watching the video clips. For a new technology (e.g., blockchain and metaverse) that is unfamiliar to students, a large difference between the students’ expectations and their perceptions of utilizing that new technology is probable.

The online English pronunciation video clips were developed in accordance with the phonetics course materials and the common English pronunciation mistakes recorded in the students’ submitted English pronunciation coursework, presentations, quizzes, and pronunciation activities in tutorials. This development was also based on the students’ English pronunciation learning difficulties reported in student-staff meetings and post-teaching reports. The online video clips show scenarios of conversations in English with Chinese explanations on how to pronounce some English words properly. For example, one video clip with some screenshots displayed in Figure 2 shows the scenario of instant messaging between two persons in which one person mispronounces the words “similar”, “data”, “structure” and “analyse” because of a lack of vowel reduction in unstressed syllables. Another video clip with some screenshots displayed in Figure 3 shows a scenario of voice messaging between two people in which one person mispronounces the words “benefit”, “create” and “percent” because of consonant deletion. The other person in these video clips points out the mispronunciations and presents proper pronunciations.

Figure 2. Screenshots presenting lack of vowel reduction in unstressed syllables

Figure 3. Screenshots presenting consonant deletion

For the surveys, online questionnaires were designed to contain measuring items similar to the validated measuring items used in the studies by Park et al. (2012) and Venkatesh et al. (2003). The online questionnaires for the pre-watch survey and the post-watch survey are shown in the Appendix. In these tables, the short form in round brackets ( ) is a code for that measuring item. In the questionnaires, a construct was operationalized by more than one similar item (e.g., PE was operationalized by PE1 to PE4) and was measured on a 5-point Likert’s (1932) scale with 5 = strongly agree, 4 = agree, 3 = neutral, 2 = disagree, and 1 = strongly disagree. Similar measuring items were used to operationalize a construct as those similar items could be used to measure the internal consistency reliability using Cronbach’s (1951) coefficient alpha. The similar measuring items of each of the constructs (i.e., PE, EE, SI, BI, and MR) should yield similar Likert’s scores. Cronbach’s coefficient alpha of 0.7 or above indicates similar Likert’s scores for a construct and therefore ensures the internal consistency reliability for that construct (DeVellis, 2012; Nunnally, 1978). Correlation and multiple regression analyses were then adopted to analyze the collected data to explore the effect of MR on PE and the combined effect and the relative effects of the factors PE, EE, and SI on BI. For the multiple regression analysis, the threshold for the sample size is 50 + 8v, where v is the number of independent variables (Tabachnick & Fidell, 2013). For the regression model PE, EE, SI → BI with three independent variables in Figure 1, the sample size should be at least 50 + 8 × 3 = 74. Our sample size n = 145 is larger than this required threshold.
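The reliability check and the sample-size rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `cronbach_alpha` and `min_sample_size` and the six-student response data are hypothetical, and the study itself used SPSS rather than this code.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one construct.

    item_scores: list of items, each a list of Likert scores
    (one score per respondent, same respondent order in every item).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score of each respondent across the k items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


def min_sample_size(num_predictors):
    """Rule-of-thumb threshold 50 + 8v quoted in the text."""
    return 50 + 8 * num_predictors


# Hypothetical responses from six students to four items (illustration only,
# not data from the study).
pe_items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 3, 4, 1, 4],
    [4, 5, 2, 4, 2, 5],
]
alpha = cronbach_alpha(pe_items)
print(f"alpha = {alpha:.2f}, acceptable = {alpha >= 0.7}")
print(min_sample_size(3))  # 74 for the three predictors PE, EE, and SI
```

Items that move together across respondents push alpha toward 1, which is why the 0.7 cut-off is read as evidence that the items of a construct yield similar scores.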

Data Collection

Once this research study was approved by the ethics committee of the case Hong Kong higher education institution, the researchers invited the Chinese students to participate in this research study. The students taking an online course relating to English pronunciation at the institution during the COVID-19 pandemic in the academic years from 2020 to 2023 were given access to the online questionnaires and the English pronunciation video clips. Each academic year contains two 13-week semesters: Semester 1 and Semester 2. For this online course relating to English pronunciation, a two-hour lecture and a one-hour tutorial were conducted each week. The students were requested to view the English pronunciation video clips in some lectures and tutorials. A pre-watch survey was conducted at the beginning of each semester while a post-watch survey was performed at the end of each semester. Before conducting the surveys, the purpose, procedures, and scope of the research were explained to the students. In total, 145 Chinese students responded with implied consent (Berg & Lune, 2012, p. 92) by completing the online questionnaires. For this research study, implied consent was approved by the ethics committee of the Hong Kong higher education institution. The participants clearly understood the study design and objectives. To maintain confidentiality, none of the participants’ particulars are disclosed in the study. When filling out the online questionnaires, the 145 participating Chinese students were not required to provide their identities, and the collected survey data were stored and protected securely in a database provided by the institution to ensure informant anonymity and confidentiality.

Data Analyses

To ensure the reliability of the students’ completed measuring items, Cronbach’s alpha coefficients were computed. Then, a paired-samples t-test was used to explore the difference between the students’ expected technology acceptance constructs and the students’ perceived technology acceptance constructs. For the model MR → PE in Figure 1, to evaluate the effect of the independent variable MR on the dependent variable PE, correlation analysis was carried out. Correlation analysis can determine the strength of the effect of MR on PE and the direction of the linear relationship between MR and PE. For the other model PE, EE, SI → BI in Figure 1, to explore whether a difference exists in the effects of PE, EE, and SI on the students’ BI, multiple regression analysis was used. As this regression model PE, EE, SI → BI was already in mind, multiple regression analysis was appropriate for explanatory research to determine the combined and relative effects of the set of independent variables PE, EE, and SI on the dependent variable BI (Keith, 2019).
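As a rough illustration of the paired-samples t-test mentioned above, the statistic can be computed from the per-student differences between post-watch and pre-watch scores. The function name `paired_t` and the eight pairs of scores are hypothetical; the sketch stops at the t statistic and its degrees of freedom, because turning them into a probability value needs the t distribution, which a statistics package such as SPSS supplies.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    # Each student contributes one difference: post-watch minus pre-watch.
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # Standard error of the mean difference.
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical pre-watch and post-watch scores for eight students
# (illustration only, not data from the study).
pre_watch = [4, 3, 4, 4, 3, 5, 4, 3]
post_watch = [5, 4, 4, 5, 4, 5, 5, 3]
t_stat, df = paired_t(pre_watch, post_watch)
print(f"t({df}) = {t_stat:.3f}")
```

A positive t statistic here means the post-watch scores are higher on average than the pre-watch scores for the same students.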

Results

The internal consistency reliability of scales for the questionnaires’ measuring items was generated by the statistical tool Statistical Package for the Social Sciences version 26 (in short, SPSS), as shown in Table 2. The internal consistency reliability of all the measuring items is acceptable with all their Cronbach’s alpha coefficients above 0.7. Cronbach’s alpha coefficients for PE, SI, and BI are 0.84 or above, suggesting a very good internal consistency reliability for these scales (Pallant, 2020, p. 105).

Table 2. Combined Means and Internal Consistency Reliabilities of the Measuring Items

Construct	Item (n = 145)	Item Mean (Pre-Watch)	Item Mean (Post-Watch)	Standard Deviation (Pre-Watch)	Standard Deviation (Post-Watch)	Cronbach’s Alpha (Pre-Watch)	Cronbach’s Alpha (Post-Watch)	Paired-Samples T-Test Result (Probability Value)
PE	PE1	3.92	4.31	0.750	0.777	0.90	0.94	0.000
	PE2	3.85	4.30	0.785	0.811			0.000
	PE3	3.70	4.19	0.851	0.900			0.000
	PE4	3.79	4.28	0.843	0.812			0.000
EE	EE1	3.88	4.37	0.777	0.772	0.93	0.93	0.000
	EE2	3.85	4.32	0.885	0.832			0.000
	EE3	3.86	4.30	0.787	0.809			0.000
	EE4	3.86	4.36	0.764	0.733			0.000
	EE5	3.86	4.41	0.764	0.722			0.000
SI	SI1	3.63	4.21	0.889	0.875	0.86	0.91	0.000
	SI2	3.61	4.21	0.836	0.889			0.000
	SI3	3.54	4.12	0.913	0.912			0.000
	SI4	3.80	4.19	0.855	0.827			0.000
BI	BI1	3.71	4.14	0.857	1.014	0.92	0.96	0.000
	BI2	3.68	4.11	0.841	1.001			0.000
	BI3	3.68	4.06	0.831	1.053			0.001
MR	MR1	3.92	4.24	0.795	0.827	0.84	0.89	0.001
	MR2	3.86	4.19	0.816	0.876			0.001

The SPSS-generated paired-samples t-test results show all the significance values between each pair of the pre-watch and post-watch technology acceptance constructs are less than 0.05, meaning that there is a significant difference between the students’ expectation and their perception of utilizing the English pronunciation video clips. It can be seen from Table 2 that all the means for the post-watch technology acceptance constructs are greater than 4 out of the maximum value of 5 while those for pre-watch are less than 4, indicating that there is a significant increase in the students’ technology acceptance constructs from pre-watch to post-watch. That is, the students tend to accept utilizing the online English pronunciation videos after experiencing watching those videos for English pronunciation learning.

For the correlation and multiple regression analyses, the actual students’ perceived technology acceptance constructs (i.e., post-watch technology acceptance constructs) were used. For the model MR → PE, the SPSS-generated results from the correlation analysis are shown in Table 3. MR is positively correlated with PE, as indicated by the strong Pearson correlation coefficient value of 0.741 with the significance level p < 0.01. According to Cohen (1988, pp. 79–81), a Pearson correlation coefficient larger than 0.5 indicates a strong correlation.

Table 3. Correlation Between MR and PE

Construct	PE	MR
PE	1	0.741 **
MR	0.741 **	1

**p < 0.01 (2-tailed) n = 145.
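The correlation analysis for MR → PE can be sketched as follows. The helper names `pearson_r` and `is_strong` and the construct scores are hypothetical; the 0.5 cut-off is the one the text attributes to Cohen (1988), and no significance test is included.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


def is_strong(r):
    """Apply the 0.5 cut-off for a strong correlation cited in the text."""
    return abs(r) > 0.5


# Hypothetical construct scores, e.g. the mean of each student's item ratings
# (illustration only, not data from the study).
mr_scores = [4.0, 3.5, 5.0, 2.5, 4.5, 3.0]
pe_scores = [4.25, 3.0, 4.75, 3.0, 4.0, 3.5]
r = pearson_r(mr_scores, pe_scores)
print(f"r = {r:.3f}, strong = {is_strong(r)}")
```

The sign of r gives the direction of the linear relationship and its magnitude gives the strength, which are the two things the correlation analysis is used for here.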

For the regression model PE, EE, SI → BI, simultaneous multiple regression was run by SPSS. By simultaneous multiple regression, all the independent variables PE, EE, and SI were entered into the regression equation simultaneously to come up with the combined effect on the students’ BI. This combined effect is shown by the adjusted R² value. To determine the relative effects of PE, EE, and SI on BI, the standardized coefficients for different independent variables (i.e., PE, EE, and SI) which had been converted to the same scale could be used for comparison (Keith, 2019).
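One way to see how standardized coefficients and the adjusted R² arise from simultaneous entry is the sketch below. It assumes the textbook route of converting every variable to z-scores and solving the normal equations on the correlation matrix, which is not necessarily how SPSS computes them internally; all function names and the eight-student data set are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a variable to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def standardized_regression(predictors, outcome):
    """Return (standardized coefficients, adjusted R^2) with all predictors entered at once."""
    n, k = len(outcome), len(predictors)
    zx = [zscores(p) for p in predictors]
    zy = zscores(outcome)

    def corr(u, v):
        # Mean product of z-scores equals the Pearson correlation.
        return sum(a * b for a, b in zip(u, v)) / n

    rxx = [[corr(zx[i], zx[j]) for j in range(k)] for i in range(k)]
    rxy = [corr(zx[i], zy) for i in range(k)]
    betas = solve(rxx, rxy)
    r2 = sum(b * r for b, r in zip(betas, rxy))
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return betas, adj_r2


# Hypothetical construct scores for eight students (not data from the study).
pe = [4.0, 3.5, 5.0, 2.5, 4.5, 3.0, 4.0, 3.5]
ee = [4.2, 3.8, 4.6, 3.0, 4.0, 3.4, 4.4, 3.2]
si = [3.5, 3.0, 4.75, 2.5, 4.5, 3.25, 3.75, 4.0]
bi = [4.0, 3.33, 5.0, 2.33, 4.67, 3.0, 4.0, 3.67]
coefs, adj = standardized_regression([pe, ee, si], bi)
print([round(c, 3) for c in coefs], round(adj, 3))
```

Because every variable is on the same z-score scale, the coefficients can be compared directly to judge the relative effects, while the adjusted R² summarizes the combined effect.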

The assumptions for multiple regression (i.e., normality, linearity, homoscedasticity, and multicollinearity) were checked with the SPSS-generated results. The SPSS-generated normal probability plot of the regression standardized residuals and the SPSS-generated scatterplot were used to check the assumptions of normality, linearity, and homoscedasticity. As the points lie around the straight diagonal line in the normal probability plot and only a few outliers with a standardized residual of more than 3.3 or less than −3.3 occur in the scatterplot (Tabachnick & Fidell, 2013), there is no violation of the assumptions of normality, linearity, and homoscedasticity. The threshold for the presence of multicollinearity in the regression model is the tolerance value of 0.1 (Pallant, 2020, p. 156). Any tolerance value for each independent variable of less than 0.1 indicates that the independent variable has a high correlation with some other independent variables. Since the SPSS-generated tolerance values are all above 0.1, the multicollinearity assumption is not violated.
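The tolerance check can be illustrated with a short sketch. It relies on the standard identity that the tolerance of a predictor is 1 minus the R² obtained by regressing it on the other predictors, which equals the reciprocal of the corresponding diagonal entry of the inverse correlation matrix. The function names and the data are hypothetical, and SPSS may compute the value by a different route.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equally long samples."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tolerances(predictors):
    """Tolerance of each predictor: 1 - R^2 from regressing it on the others."""
    k = len(predictors)
    corr = [[pearson(predictors[i], predictors[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    # Each diagonal entry of the inverse correlation matrix is that
    # predictor's variance inflation factor, the reciprocal of its tolerance.
    return [1 / inv[j][j] for j in range(k)]


# Hypothetical construct scores for eight students (not data from the study).
pe = [4.0, 3.5, 5.0, 2.5, 4.5, 3.0, 4.0, 3.5]
ee = [4.2, 3.8, 4.6, 3.0, 4.0, 3.4, 4.4, 3.2]
si = [3.5, 3.0, 4.75, 2.5, 4.5, 3.25, 3.75, 4.0]
for name, tol in zip(["PE", "EE", "SI"], tolerances([pe, ee, si])):
    print(f"{name}: tolerance = {tol:.3f}, below 0.1 = {tol < 0.1}")
```

A tolerance near 1 means a predictor shares little variance with the others, while a value below the 0.1 threshold cited above would flag multicollinearity.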

Table 4 shows the results of the multiple regression analysis that explained BI, namely the effects of PE, EE, and SI on BI. This model explained 73.6% of the variance in BI. The significant results, indicated by p < 0.05, show that SI has the largest standardized regression coefficient, meaning that SI is the strongest determinant of BI, while PE also influences BI. There is no significant effect of EE on BI.

Table 4. Regression Results Explaining the Students’ BI

Independent Variable	Adjusted R²	Standardized Regression Coefficient
	0.736 *
PE		0.390 *
EE		0.103
SI		0.412 *

*p < 0.05 n = 145.

Discussion

The effect of how various technologies enhance learning has been discussed in literature (Carnaghan et al., 2011; Lau et al., 2021; Tang et al., 2021). However, only a few studies (e.g., Hammer, 2000; Schwartz & Hartman, 2007; Sherin, 2004; Snelson, 2018; Watkins & Wilkins, 2011) in the literature explored how video technology enhances learning. This study is in line with the research direction of these few studies by exploring how video technology enhances learning in the Hong Kong higher education context. While the findings by Watkins and Wilkins (2011) revealed that the use of short video clips can help English pronunciation learning, some new implications applicable for the education management and practitioners to implement video technology for English pronunciation learning emerge from the findings of this study.

In this study, the Chinese students in the Hong Kong higher education context were invited to complete the online questionnaire for the pre-watch survey, then experienced watching the online English pronunciation video clips that were targeted at the common English academic vocabulary used in higher education and then completed the online questionnaire for the post-watch survey. The collected quantitative data from the surveys were analyzed with paired-samples t-test, correlation and multiple regression.

The analytical results from the paired-samples t-test in this study revealed that there is a significant increase in the students’ technology acceptance constructs after experiencing watching the online English pronunciation videos for English pronunciation learning. In this regard, the Chinese students in Hong Kong higher education like to use the online English pronunciation videos for English pronunciation learning once they have experienced watching the videos. Therefore, teachers who actively show these online English pronunciation videos in English pronunciation activities such as teaching, student activities, and seminars can encourage the Chinese students to adopt this video technology for English pronunciation learning.

In Figure 1, the model MR → PE indicates that the relevance for major influences the performance expectancy. As this model involves one predictor variable and one outcome, correlation analysis is sufficient. The analytical result from correlation analysis showed that there is a significant strong effect of MR on PE. In this regard, H1 is accepted.

The model PE, EE, SI → BI in Figure 1 indicates a combined effect of performance expectancy, effort expectancy and social influence on the behavioral intention. As this model involves three predictor variables and one outcome, multiple regression analysis is needed. In Table 4, the adjusted R² in the multiple regression analysis result indicates the combined effect of the three predictor variables on the outcome. The adjusted R² of 0.736 is significant, so H2 is accepted. The standardized regression coefficients indicate a significant effect of PE on BI, so H3 is accepted, and a significant effect of SI on BI, so H5 is accepted. However, there is no significant effect of EE on BI, so H4 is rejected.

The significant strong effect of MR on PE and the significant effect of PE on BI indicate that the Chinese students’ major of study is relevant to their performance expectancy, which in turn influences their behavioral intention of using the English pronunciation video clips. If a student’s major of study is closely related to English pronunciation, then that student has a high expectation of English pronunciation and intends to adopt the English pronunciation video clips for learning proper English pronunciation.

The significant effect of SI on BI indicates that the Chinese students’ perception of social influence influences their acceptance of using the online English pronunciation videos for learning. By comparison with the effect of PE, the effect of SI is stronger, indicating that the Chinese students’ perception of social influence is a stronger motive for adopting the online English pronunciation video clips for learning. In this sense, for the Chinese students in Hong Kong higher education to learn English pronunciation by using the online English pronunciation videos, the influence of their social groups such as the education administrators, their teachers, peer students, friends, and parents plays an important role.

Conclusions and Implications

This study adopted a quantitative approach to answer the research question: What influences the students’ decision to adopt the online English pronunciation videos for learning? The paired-samples t-test results showed that, once the Chinese students have experienced watching the online English pronunciation video clips, they are more inclined to accept the use of these video clips for English pronunciation learning. The results of the correlation and multiple regression analyses revealed that the Chinese students’ relevance for major strongly influences their performance expectancy. In turn, the Chinese students’ performance expectancy significantly influences their behavioral intention to use the video clips for English pronunciation learning.

In addition, social influence significantly affects the Chinese students’ behavioral intention to adopt the video clips for English pronunciation learning. This result suggests a direction for further study: how each of the social groups (e.g., teachers, peer students, and friends) can support students’ English pronunciation learning with the online English pronunciation videos in Hong Kong’s higher education context. Ganotice and King (2014) and King and Ganotice (2014) found that social support from parents, teachers, and peers helps students’ academic achievement. Tang et al. (2022) found that social groups have a significant impact on deep learning and provide an effective method to encourage higher-order thinking and metacognition. How social groups help Chinese students in Hong Kong higher education to learn using the online English pronunciation videos remains to be explored.

This study can be extended to any similar situation in which the students are familiar with their L1 but learn using their less familiar L2, as in China (Deterding, 2006), Singapore (Deterding, 2003, 2007; Kirkpatrick & Saunders, 2005), and some other Asian countries (Deterding & Kirkpatrick, 2006). Such a comparative study may help generalize the findings of this study and improve their validity. Moreover, a controlled study is recommended to compare the students’ English pronunciation performance with and without the use of online English pronunciation video clips. The comparison results can then be used to explore the relationship between pronunciation performance and the students’ acceptance of using the online English pronunciation videos for learning. In addition, this study mainly used a quantitative research approach from the students’ perspectives. To offset the weakness of this methodology, future work may add a qualitative research approach via semi-structured, in-depth interviews with various stakeholders such as policymakers, educators, education associations, and management of higher education institutions to gain insights into how to improve teaching pedagogy and how to set the research agenda in the forthcoming years. Finally, a limitation of this study is that the UTAUT moderator effects (e.g., gender) were not thoroughly examined. Further study in this area should also consider these moderator effects.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Pedagogical Innovation Fund from the College of Professional and Continuing Education, an affiliate of The Hong Kong Polytechnic University.

ORCID iD

Simon Wong https://orcid.org/0000-0003-3408-9747

References

Benson P., Voller P. (1997). Introduction: Autonomy and Independence in language learning. In Benson P., Voller P. (Eds.), Autonomy and Independence in language learning. Longman.

Appendix

Construct	Measuring Item
Performance expectancy (PE)	I expect English pronunciation learning video clips to be useful for my learning of English pronunciation. (PE1)
	Using the English pronunciation learning video clips enables me to learn English pronunciation more quickly. (PE2)
	The English pronunciation learning video clips can motivate me to learn English pronunciation. (PE3)
	I am convinced that the use of English pronunciation learning video clips will add value to my experience of learning English pronunciation. (PE4)
Effort expectancy (EE)	I expect the English pronunciation learning video clips to be easy to use. (EE1)
	I expect that I won’t feel stressed when using the English pronunciation learning video clips. (EE2)
	I expect that I won’t need much technical expertise to effectively use the English pronunciation learning video clips for learning English pronunciation. (EE3)
	I expect that the use of the English pronunciation learning video clips will reduce both time and effort associated with traditional learning methods. (EE4)
	I expect the use of the English pronunciation learning video clips for learning English pronunciation to be not frustrating. (EE5)
Social influence (SI)	People who tend to influence my behavior think that I should use the English pronunciation learning video clips to learn English pronunciation. (SI1)
	People who are important to me think that I should use the English pronunciation learning video clips to learn English pronunciation. (SI2)
	I found myself encouraged by other classmates to use the English pronunciation learning video clips to learn English pronunciation. (SI3)
	The use of the English pronunciation learning video clips for learning English pronunciation is encouraged by my lecturer(s). (SI4)
Behavioral intention (BI)	I expect that after viewing the English pronunciation learning video clips, I will intend to re-watch the English pronunciation learning video clips to learn English pronunciation in the next 6 months. (BI1)
	I expect that after viewing the English pronunciation learning video clips, I will re-watch the English pronunciation learning video clips to learn English pronunciation in the next 6 months. (BI2)
	I expect that after viewing the English pronunciation learning video clips, I will plan to re-watch the English pronunciation learning video clips to learn English pronunciation in the next 6 months. (BI3)
Major of relevance (MR)	Using the English pronunciation learning video clips for learning English pronunciation is relevant to my study. (MR1)
	Using the English pronunciation learning video clips for learning English pronunciation can help me understand the courses in my study. (MR2)

Construct	Measuring Item
Performance expectancy (PE)	I found English pronunciation learning video clips useful for my learning of English pronunciation. (PE1)
	Using the English pronunciation learning video clips enabled me to learn English pronunciation more quickly. (PE2)
	The English pronunciation learning video clips can motivate me to learn English pronunciation. (PE3)
	I am convinced that the use of English pronunciation learning video clips will add value to my experience of learning English pronunciation. (PE4)
Effort expectancy (EE)	The English pronunciation learning video clips were easy to use. (EE1)
	I didn’t feel stressed when using the English pronunciation learning video clips. (EE2)
	I didn’t need much technical expertise to effectively use the English pronunciation learning video clips for learning English pronunciation. (EE3)
	The use of the English pronunciation learning video clips reduced both time and effort associated with traditional learning methods. (EE4)
	The use of the English pronunciation learning video clips for learning English pronunciation was not frustrating. (EE5)
Social influence (SI)	People who tend to influence my behavior think that I should use the English pronunciation learning video clips to learn English pronunciation. (SI1)
	People who are important to me think that I should use the English pronunciation learning video clips to learn English pronunciation. (SI2)
	I found myself encouraged by other classmates to use the English pronunciation learning video clips to learn English pronunciation. (SI3)
	The use of the English pronunciation learning video clips for learning English pronunciation is encouraged by my lecturer(s). (SI4)
Behavioral intention (BI)	I intend to re-watch the English pronunciation learning video clips to learn English pronunciation in the next 6 months. (BI1)
	I predict that I will re-watch the English pronunciation learning video clips to learn English pronunciation in the next 6 months. (BI2)
	I plan to re-watch the English pronunciation learning video clips to learn English pronunciation in the next 6 months. (BI3)
Major of relevance (MR)	Using the English pronunciation learning video clips for learning English pronunciation is relevant to my study. (MR1)
	Using the English pronunciation learning video clips for learning English pronunciation can help me understand the courses in my study. (MR2)