A Comparative Analysis of Data Mining Methods and Hierarchical Linear Modeling Using PISA 2018 Data

Wenting Weng¹ and Wen Luo²
¹Johns Hopkins University, USA
²Texas A&M University, USA

Abstract

Educational research often involves clustered data sets, in which observations are organized into multiple levels: lower-level units (individuals) are nested within higher-level units (clusters). However, many studies in education apply tree-based methods such as Random Forest without accounting for this hierarchical structure. Neglecting the clustered structure of the data can lead to biased or inaccurate results. To address this issue, this study conducted a comprehensive comparison of three tree-based data mining algorithms and hierarchical linear modeling (HLM). Using data from the Programme for International Student Assessment (PISA) 2018, the study compared different methods, including non-mixed-effects tree models (e.g., Random Forest) and mixed-effects tree mo...