USING R IN THE RESEARCH BY FUTURE PHILOLOGISTS

Victoriia V. Zhukovska, Oleksandr O. Mosiiuk, Veronika V. Komarenko

Abstract

Corpus linguistics is a newly emerging field of applied linguistics that deals with the construction, processing, and exploitation of text corpora. At present, a high-quality analysis of the vast amounts of empirical language data provided by computerized corpora is impossible without computer technologies and appropriate statistical methods. Teaching future philologists to apply statistical software effectively is therefore an important stage in their research training. The article discusses the possibilities of using the R statistical software environment, a package for statistical data analysis that is among the leading tools in Western linguistics but is not well known in Ukraine, in research by future philologists. The paper outlines the advantages and disadvantages of R in comparison with similar software packages (SPSS and Statistica) and provides links to online tutorials for learning R independently. The flexibility and efficacy of R for linguistic research are demonstrated through a statistical analysis of the use of hedges in a corpus of academic speech. To help novice philologists understand the specifics of conducting a statistical linguistic experiment with R, each stage of the study is described in detail. Differences in the use of hedges in the speech of students and lecturers were verified statistically with the Kolmogorov-Smirnov test and the Mann-Whitney U test. The article presents the algorithms developed for calculating these tests with both built-in R commands and functions from specialized packages created by the R user community to extend the software's functionality. Each R script for the statistical calculations is accompanied by a detailed description and an interpretation of the results.
Further research will involve a number of activities aimed at raising future philologists' awareness of R statistical software and improving their skills in using it, which is important for their professional development as researchers.
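The two tests named in the abstract can be illustrated with a short sketch. The article's own scripts are written in R, where base `stats` provides `ks.test()` and `wilcox.test()` and the `nortest` package provides `lillie.test()`; the Python code below is not the authors' code but a minimal, standard-library-only reimplementation. It assumes one normalized hedge frequency per speaker, and the sample values are hypothetical. It computes the Kolmogorov-Smirnov D statistic in the Lilliefors setting (normal mean and standard deviation estimated from the sample, as the Lilliefors and `nortest` references suggest) and the Mann-Whitney U statistic with a tie-corrected normal-approximation p-value. No p-value is computed for D, which must be checked against Lilliefors' critical values.

```python
import math
from statistics import mean, stdev


def normal_cdf(z):
    """Standard normal CDF expressed through the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def lilliefors_d(sample):
    """Kolmogorov-Smirnov D between the sample's empirical CDF and a normal CDF
    whose mean and standard deviation are estimated from the same sample
    (the Lilliefors setting). Returns D only; its p-value needs Lilliefors' tables."""
    xs = sorted(sample)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = normal_cdf((v - m) / s)
        # D+ uses the ECDF value at this point, (i+1)/n; D- uses the value just before the step, i/n
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided p-value from the normal
    approximation with tie correction (no continuity correction)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


if __name__ == "__main__":
    # Hypothetical hedge frequencies per 1,000 words (illustrative only, not corpus data)
    students = [3.1, 4.0, 2.2, 5.6, 3.8, 4.4]
    lecturers = [6.2, 7.5, 5.9, 8.1, 6.6, 7.0]
    print("Lilliefors D (students):", round(lilliefors_d(students), 3))
    print("Lilliefors D (lecturers):", round(lilliefors_d(lecturers), 3))
    u, p = mann_whitney_u(students, lecturers)
    print("U =", u, "p ~", round(p, 4))
```

For small samples without ties, R's `wilcox.test()` reports an exact p-value by default, and its normal approximation applies a continuity correction, so its output can differ from this sketch; the sketch only shows how the rank-based statistic is built.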

Keywords

R statistical software environment; corpus of academic speech; hedges; Kolmogorov-Smirnov test; Mann-Whitney U test




REFERENCES (TRANSLATED AND TRANSLITERATED)

L. A. Janda, Quantitative methods in cognitive linguistics: An introduction, in Cognitive Linguistics: The Quantitative Turn. The Essential Reader, Berlin: De Gruyter Mouton, 2013, (in English).

S. N. Buk, The Basics of Statistical Linguistics: an educational and methodological manual, Lviv: Ivan Franko National University of Lviv Publishing Center, 2008, (in Ukrainian).

What is R? [Online]. Available: https://www.r-project.org/about.html, (in English).

R resources (free courses, books, tutorials, & cheat sheets). [Online]. Available: https://paulvanderlaken.com/2017/08/10/r-resources-cheatsheets-tutorials-books/, (in English).

Why RStudio? [Online]. Available: https://www.rstudio.com/about/, (in English).

Michigan Corpus of Academic Spoken English. [Online]. Available: https://quod.lib.umich.edu/m/micase/, (in English).

G. Lakoff, Hedges: A study in meaning criteria and the logic of fuzzy concepts, Journal of Philosophical Logic, vol. 2, no. 4, 1973, pp. 458-508, (in English).

A. V. Yarkho, Referential hedging as an etiquette strategy in the discourse of the English-language research article: a contrastive analysis, The Journal of V. N. Karazin Kharkiv National University, no. 930, Series "Romance and Germanic Philology. Methods of Teaching Foreign Languages", 2010, issue 64, pp. 82-90, (in Ukrainian).

V. V. Shyliuk, Classification of the means of expressing the speaker's position in oral communication: a comparative analysis, Bulletin of Zhytomyr State University, issue 2 (80), 2015, pp. 302-308, (in Ukrainian).

E. V. Sidorenko, Methods of mathematical processing in psychology, St. Petersburg: Rech, 2000, (in Russian).

L. V. Shelekhova, Mathematical Methods in Pedagogy and Psychology: in Schemes and Tables: a textbook, Maykop: ASU Publishing House, 2010, (in Russian).

V. V. Levitsky, Quantitative methods in linguistics, Chernivtsi: Ruta, 2004, (in Russian).

R. G. Piotrovsky, K. B. Bektaev, A. A. Piotrovskaya, Mathematical Linguistics: a textbook for pedagogical institutes, Moscow: Higher School, 1977, (in Russian).

H. W. Lilliefors, On the Kolmogorov-Smirnov test for normality with mean and variance unknown, Journal of the American Statistical Association, vol. 62, 1967, pp. 399-402, (in English).

Package ‘nortest’. [Online]. Available: https://cran.r-project.org/web/packages/nortest/nortest.pdf, (in English).

R. M. Conroy, What hypotheses do “nonparametric” two-group tests actually test?, The Stata Journal, vol. 12, no. 2, 2012, pp. 182-190, (in English).

F. Wilcoxon, Individual comparisons by ranking methods, Biometrics Bulletin, vol. 1, 1945, pp. 80-83, (in English).

H. B. Mann, D. R. Whitney, On a test of whether one of two random variables is stochastically larger than the other, Annals of Mathematical Statistics, vol. 18, no. 1, 1947, pp. 50-60, (in English).

A. B. Shipunov, A. I. Korobeinikov, E. M. Baldin, Data analysis with R (II). [Online]. Available: http://www.inp.nsk.su/~baldin/DataAnalysis/R/R-05-2var.pdf, (in Russian).





Copyright (c) 2018 Victoriia V. Zhukovska, Oleksandr O. Mosiiuk, Veronika V. Komarenko


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.