Saturday, October 20, 2012

Predicting Student Performance in Distance Learning Systems

In any distance learning environment, the ability to predict a student's performance is very important, because it helps teachers and tutors identify students with different capabilities and capacities. In university education, where many students follow their studies through open and distance learning, there needs to be a process for checking whether each student is reaching the required level of performance. Otherwise, because of the nature of distance education, some students may fall far behind while their peers move miles ahead. If teachers and tutors can recognize these students at an early stage of a course module, they can take the necessary steps to prevent them from dropping out.

S. Kotsiantis, C. Pierrakeas, and P. Pintelas [2003] have suggested an approach that applies machine learning algorithms to Learning Management System (LMS) data in order to prevent student dropout in university distance education. They investigated the efficiency of machine learning techniques in such an environment, using training data sets provided by the "informatics" course of the Hellenic Open University.

In their research they applied five different algorithms to the student data and found that these algorithms can be used to predict student dropout in study programs. The algorithms were drawn from five of the most common families of machine learning techniques: Decision Trees, Bayesian Nets, Perceptron-based Learning, Instance-Based Learning, and Rule Learning.

During data collection they gathered student data under two categories of attributes: demographic attributes and performance attributes. The demographic attributes covered each student's sex, age, marital status, number of children, and occupation. The performance attributes were taken from tutors' records and covered the students' marks on the written assignments and their presence or absence at face-to-face meetings.
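The two attribute groups can be pictured as one record per student. The sketch below is a minimal illustration only; the class name `StudentRecord`, its field names, and the use of `None` for a missing mark are hypothetical choices, not the encoding used in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudentRecord:
    # Demographic attributes (field names and value types are hypothetical)
    sex: str
    age: int
    marital_status: str
    children: int
    occupation: str
    # Performance attributes from tutors' records; None marks a missing mark
    assignment_marks: list[Optional[float]]
    meetings_attended: list[bool]
    # Class label to be predicted; None while it is still unknown
    dropped_out: Optional[bool] = None

    def to_features(self) -> dict:
        """Flatten the record into one attribute -> value mapping (label excluded)."""
        features = {
            "sex": self.sex,
            "age": self.age,
            "marital_status": self.marital_status,
            "children": self.children,
            "occupation": self.occupation,
        }
        for i, mark in enumerate(self.assignment_marks, start=1):
            features[f"assignment_{i}"] = mark
        for i, present in enumerate(self.meetings_attended, start=1):
            features[f"meeting_{i}"] = present
        return features


# Hypothetical student who has not yet submitted the second assignment
record = StudentRecord("F", 30, "married", 1, "teacher", [7.5, None], [True, False])
print(record.to_features())
```

Keeping the label separate from `to_features()` means the same record can be fed to a classifier before the outcome is known.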

From each of the above categories they chose one representative algorithm: C4.5 for decision trees, Naive Bayes for Bayesian networks, RIPPER for rule learning, WINNOW for perceptron-based learning, and 3-NN (3-Nearest Neighbor) for instance-based learning.
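Of the five representatives, 3-NN is the simplest to show in a few lines. The following is a minimal sketch that assumes every attribute has already been turned into a categorical value (such as "pass" or "fail") and measures closeness with the overlap (Hamming) distance. The function names, attribute names, and toy data are hypothetical, and the study's actual distance measure and preprocessing may differ.

```python
from collections import Counter


def overlap_distance(a: dict, b: dict) -> int:
    """Number of attributes on which two categorical records disagree."""
    keys = set(a) | set(b)
    return sum(1 for key in keys if a.get(key) != b.get(key))


def knn_predict(train: list, query: dict, k: int = 3) -> str:
    """Label a query record by majority vote among its k nearest training records.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda pair: overlap_distance(pair[0], query))[:k]
    # With k = 3 and two classes, the majority vote cannot tie.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two assignment grades and meeting attendance
train = [
    ({"a1": "fail", "a2": "fail", "meetings": "no"}, "dropout"),
    ({"a1": "fail", "a2": "pass", "meetings": "no"}, "dropout"),
    ({"a1": "pass", "a2": "pass", "meetings": "yes"}, "completed"),
    ({"a1": "pass", "a2": "fail", "meetings": "yes"}, "completed"),
    ({"a1": "pass", "a2": "pass", "meetings": "no"}, "completed"),
]

print(knn_predict(train, {"a1": "fail", "a2": "fail", "meetings": "no"}))   # dropout
print(knn_predict(train, {"a1": "pass", "a2": "pass", "meetings": "yes"}))  # completed
```

Instance-based learners like this do no training at all; every prediction scans the stored records, which is one reason they tend to be slower at prediction time than a model such as Naive Bayes.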

To rank the representative algorithms, they used prediction accuracy as the criterion. The evaluation found no statistically significant difference between the algorithms, although Naive Bayes and RIPPER achieved the highest accuracy. Between these two, Naive Bayes has the advantage of a short computation time and, importantly, the Naive Bayes classifier can accept inputs with missing values, whereas RIPPER cannot. This indicates that Naive Bayes is the most appropriate learning algorithm for building a software support tool for Learning Management Systems.
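The missing-value advantage follows from how Naive Bayes scores a class: it multiplies one conditional probability per attribute, so an attribute with no value can simply be left out of the product. The sketch below is a minimal categorical Naive Bayes with Laplace smoothing, plus a plain accuracy function. The class and function names, the `None` convention for missing values, and the toy data are hypothetical; this is not the implementation used in the study.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    def fit(self, rows: list, labels: list) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # value_counts[label][attr][value] -> count of that value within the class
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        # attr_totals[label][attr] -> number of non-missing values seen
        self.attr_totals = defaultdict(Counter)
        # values[attr] -> distinct values observed, used for Laplace smoothing
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for attr, value in row.items():
                if value is None:
                    continue  # missing values are skipped during training too
                self.value_counts[label][attr][value] += 1
                self.attr_totals[label][attr] += 1
                self.values[attr].add(value)
        return self

    def predict(self, row: dict) -> str:
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Work in log space: log P(class) + sum of log P(value | class)
            score = math.log(count / self.n)
            for attr, value in row.items():
                if value is None or attr not in self.values:
                    continue  # a missing attribute drops out of the product
                k = len(self.values[attr])
                num = self.value_counts[label][attr][value] + 1
                den = self.attr_totals[label][attr] + k
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, rows: list, labels: list) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(model.predict(r) == y for r, y in zip(rows, labels))
    return correct / len(labels)


# Hypothetical toy data; None marks a value missing from the tutor's records
rows = [
    {"a1": "fail", "meetings": "no"},
    {"a1": "fail", "meetings": None},
    {"a1": "pass", "meetings": "yes"},
    {"a1": "pass", "meetings": "yes"},
    {"a1": None, "meetings": "yes"},
    {"a1": "fail", "meetings": "no"},
]
labels = ["dropout", "dropout", "completed", "completed", "completed", "dropout"]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict({"a1": "fail", "meetings": None}))  # dropout
print(accuracy(model, rows, labels))                    # 1.0
```

On this toy data the model classifies all six training rows correctly, including the rows with a missing attribute. Accuracy measured on the training rows themselves is optimistic; a fair comparison between algorithms needs held-out data, for example via cross-validation.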

Beyond this, the study found that some obvious and some less obvious attributes show a strong correlation with student performance, and that some of these attributes carry more weight than others. It can also be argued that learning algorithms could enable tutors to predict student performance with satisfactory accuracy long before the final examination.

Reference: S. Kotsiantis, C. Pierrakeas, and P. Pintelas, “Efficiency of Machine Learning Techniques in Predicting Students’ Performance in Distance Learning Systems,” Citeseer, 2002.
