Thursday, October 25, 2012

Can we prevent school dropout in Distance Learning???

Distance and open education is an emerging educational approach, and many universities and institutions now use e-learning to deliver their distance study programs. Because communication is virtual, teachers and students do not meet face to face. As a result, many students drop out because they cannot cope with the requirements of their study programs. Understanding each student's performance level therefore helps teachers identify students' differing capability levels and support those who are struggling, so that they can keep pace with their peers.

S. B. Kotsiantis, C. J. Pierrakeas, and P. E. Pintelas [2003] applied data mining methods to educational data in order to limit student dropout in university-level distance learning. According to them, dropout can be caused by professional, academic, health, family and personal reasons, and it varies with the education system adopted by the institution providing distance learning as well as with the selected subject of study.

They based their research on a course module offered by the Hellenic Open University, which delivers its educational programs mainly in distance mode. They built a data set of 365 student instances and divided the attributes into two groups: the ‘Curriculum-based’ group and the ‘Students' performance’ group. The ‘Curriculum-based’ group covered the students' sex, age, marital status, number of children and occupation, and the ‘Students' performance’ group covered the students' marks on the first two written assignments and their presence or absence at the first two face-to-face meetings.

In this research they used six machine learning techniques: Decision Trees, Neural Networks, the Naive Bayes algorithm, Instance-Based Learning, Logistic Regression and Support Vector Machines. Each technique was represented by one algorithm. C4.5 represented decision trees, and the Back Propagation (BP) algorithm was used to estimate the weights of the neural network. The Naive Bayes (NB) algorithm represented the Bayesian approach, and the 3-Nearest Neighbour algorithm represented instance-based learning. Maximum Likelihood Estimation (MLE) was the statistical method used to estimate the coefficients of the logistic model, and finally the Sequential Minimal Optimization (SMO) algorithm represented Support Vector Machines.
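To give a feel for what an instance-based learner such as 3-Nearest Neighbour does, here is a minimal sketch in plain Python. It is not the authors' implementation. The records, the attribute encoding (two assignment marks plus the number of meetings attended) and the function name knn_predict are all assumptions made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy records, NOT data from the study:
# (mark on assignment 1, mark on assignment 2, meetings attended) -> outcome
TRAIN = [
    ((2.0, 1.0, 0), "dropout"),
    ((3.0, 0.0, 0), "dropout"),
    ((1.0, 2.0, 1), "dropout"),
    ((8.0, 7.5, 2), "continue"),
    ((7.0, 9.0, 2), "continue"),
    ((9.0, 8.0, 1), "continue"),
]


def knn_predict(train, query, k=3):
    """Label a new student by majority vote among the k closest records."""
    # Sort the stored records by Euclidean distance to the query student.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the outcomes of the k nearest records and return the most common.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

There is no training step at all: the "model" is simply the stored records, and all the work happens at prediction time. In a real setting the attributes would need to be scaled so that no single one dominates the distance.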

Of these six algorithms, they found that the Naive Bayes algorithm and the Back Propagation (BP) algorithm had the best accuracy on the data set. However, they noted that the differences were generally small and, because the study was based on only one course module, that the ranking may be different on another data set from the same domain. They also concluded that Naive Bayes has a short training time, produces predictions that are easy to communicate, and has a smaller programming cost than the other algorithms.
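The point about the short training time and small programming cost of Naive Bayes is easy to see in code. The following is only a rough sketch in plain Python, not the code used in the paper. The toy records, the two categorical attributes (mark band on the first assignment, attendance at the first meeting) and the function names are assumptions for illustration, and Laplace smoothing is my own addition to avoid zero probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical toy records, NOT data from the study:
# (mark band on assignment 1, first meeting attendance, outcome)
TRAIN = [
    ("low", "absent", "dropout"),
    ("low", "absent", "dropout"),
    ("low", "present", "dropout"),
    ("high", "present", "continue"),
    ("high", "present", "continue"),
    ("high", "absent", "continue"),
]


def train_naive_bayes(rows):
    """Training is a single counting pass over the records."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for *features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, features):
    """Return the class with the highest prior times attribute likelihoods."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior
        for i, value in enumerate(features):
            # Laplace-smoothed estimate of P(value | class)
            count = value_counts[(i, label)][value]
            score *= (count + 1) / (n + len(values_seen[i]))
        scores[label] = score
    return max(scores, key=scores.get)
```

The whole learner is two short functions, and the counts it stores can be read directly as "how often did dropouts have a low first mark", which may be part of why its predictions are easy to explain to tutors.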

Reference: S. Kotsiantis, C. Pierrakeas, and P. Pintelas, “Preventing student dropout in distance learning using machine learning techniques,” in Knowledge-Based Intelligent Information and Engineering Systems, 2003, pp. 267–274.
