Abstract
The study develops and methodologically justifies a psychometric-didactic model for validating and systematically integrating micro-assessment based on Large Language Models (LLMs) into the preparation of international students for the Ukrainian Language Proficiency Test (ULPT). It addresses the "psychometric gap" between traditional linear testing environments and the rapid, large-scale content generation of modern generative artificial intelligence. The research employs an integrated "DBR-ABV Loop" model, which combines Design-Based Research (DBR) for iterative task improvement with Argument-Based Validation (ABV) for continuous collection of evidence on assessment reliability. The methodological framework follows a four-stage cycle: Domain Definition, in which prompts are designed from B1/B2 descriptors; Production of micro-tasks and distractors; empirical Testing using student interaction logs; and Reflection, which updates the design principles. Implementation of the proposed model demonstrates that LLM-generated tasks, when supported by adaptive feedback and by distractors generated dynamically from interference errors, significantly enhance diagnostic accuracy and reduce random guessing. The study also shows that the system's ability to adjust linguistic complexity in real time keeps cognitive load at an optimal level for each learner. Instant, explanatory feedback operates within the student's Zone of Proximal Development, providing the scaffolding that fosters linguistic autonomy and reduces exam-related anxiety. The DBR-ABV Loop thus reconciles high-speed AI content generation with the requirement for scientific validity in language testing. The transition from static testing to adaptive micro-assessment transforms ULPT preparation from a stressful control mechanism into a supportive, personalized learning process.
The model provides a foundation for personalized educational trajectories and opens the way to scaling the system to assess productive skills such as writing and speaking.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright (c) 2026 Galyna Prisovska, Olena Ivanova

