Distance and open education is an emerging
educational model in which many universities and institutions use
e-learning to deliver their study programs. Because communication is
virtual, teachers and students do not meet face to face. Partly for this
reason, many students drop out of these programs because they cannot cope
with the requirements of their studies. Understanding the performance level
of each student therefore helps teachers identify students of differing
ability and support those who are struggling, so that they can keep pace
with their peers.
S. B. Kotsiantis, C. J.
Pierrakeas, and P. E. Pintelas [2003] applied data mining
methods to educational data in order to limit student dropout in university-level distance
learning. According to them, dropout can be caused by professional,
academic, health, family and personal reasons, and it varies with the
education system adopted by the institution providing distance learning, as
well as with the selected subject of study.
They based their research on a
course module offered by the Hellenic Open University, which delivers
its educational programs mainly in distance mode. They built a data set of 365
student instances and divided the attributes into two
groups: the ‘Curriculum-based’ group and the ‘Students' performance’
group. The ‘Curriculum-based’ group contained the students' sex,
age, marital status, number of children and occupation. The ‘Students'
performance’ group contained the students' marks on the first two written
assignments and their presence or absence at the first two face-to-face
meetings.
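The two attribute groups can be pictured as one record per student. The sketch below is only an illustration: the field names, the types, the use of `None` for a missing mark and the example values are all assumptions, not the encoding used in the paper.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical field names for the two attribute groups described above.
CURRICULUM_FIELDS = ("sex", "age", "marital_status", "children", "occupation")
PERFORMANCE_FIELDS = ("assignment1", "assignment2", "meeting1", "meeting2")


@dataclass
class StudentRecord:
    # 'Curriculum-based' group
    sex: str
    age: int
    marital_status: str
    children: int
    occupation: str
    # 'Students' performance' group
    assignment1: Optional[float]  # mark; None is an assumed code for "not submitted"
    assignment2: Optional[float]
    meeting1: bool                # present at the first face-to-face meeting
    meeting2: bool
    # Class label to be predicted
    dropped_out: bool


def split_groups(record):
    """Return the two attribute groups of a record as separate dictionaries."""
    row = asdict(record)
    curriculum = {name: row[name] for name in CURRICULUM_FIELDS}
    performance = {name: row[name] for name in PERFORMANCE_FIELDS}
    return curriculum, performance


# Invented example values, not taken from the data set.
example = StudentRecord("female", 34, "married", 2, "employed",
                        7.5, None, True, False, True)
curriculum, performance = split_groups(example)
```

Keeping the class label outside both groups makes it easy to train a model on either group alone or on both together.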
In this research they used six
machine learning techniques: Decision Trees, Neural Networks, Naive
Bayes, Instance-Based Learning, Logistic Regression and
Support Vector Machines. Each technique was represented by one
algorithm. C4.5 represented decision trees, and the Back Propagation (BP)
algorithm was used to estimate the weights of the neural network.
The Naive Bayes (NB) algorithm represented the Bayesian
approach, and the 3-Nearest Neighbour algorithm represented instance-based
learning. Maximum Likelihood Estimation (MLE) was the statistical method used
to estimate the coefficients of the logistic model and, finally, the Sequential Minimal
Optimization (SMO) algorithm represented Support Vector
Machines.
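To give a feel for the simplest of these techniques, here is a minimal sketch of 3-Nearest Neighbour classification. It assumes categorical attributes compared with a Hamming distance, and the toy records and labels are invented for illustration. The distance measure and attribute encoding actually used in the study are not stated here.

```python
from collections import Counter


def hamming(a, b):
    """Count the attribute positions at which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Return the majority label among the k training records nearest to query.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda pair: hamming(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy records:
# (submitted assignment 1, submitted assignment 2, attended meeting 1)
train = [
    (("yes", "yes", "yes"), "continues"),
    (("yes", "yes", "no"), "continues"),
    (("yes", "no", "yes"), "continues"),
    (("no", "no", "no"), "drops out"),
    (("no", "no", "yes"), "drops out"),
    (("no", "yes", "no"), "drops out"),
]

prediction = knn_predict(train, ("no", "no", "no"))
```

Instance-based methods like this one have no training phase at all. The cost is paid at prediction time, when the query is compared against every stored record.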
Of these six algorithms,
they found that Naive Bayes and Back Propagation (BP) achieved
the best accuracy on the data. However, they noted that the
differences were generally small and that, because the results were based on only one course
module, the ranking may be different on another data set from the same
domain. They also concluded that Naive Bayes has a shorter training time, produces predictions
that are easier to communicate, and has a lower programming cost than the other algorithms.
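The low programming cost of Naive Bayes is easy to see in code. The following is a minimal sketch for categorical attributes with Laplace (add-one) smoothing. The smoothing choice, the function names and the toy records are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict_nb(model, row):
    """Return the label with the highest log posterior score for row."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((count + 1) / (n + len(values_seen[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy records:
# (submitted assignment 1, submitted assignment 2, attended meeting 1)
rows = [
    ("yes", "yes", "yes"), ("yes", "yes", "no"), ("yes", "no", "yes"),
    ("no", "no", "no"), ("no", "no", "yes"), ("no", "yes", "no"),
]
labels = ["continues"] * 3 + ["drops out"] * 3

model = train_nb(rows, labels)
prediction = predict_nb(model, ("no", "no", "no"))
```

Training is a single counting pass over the data, which is consistent with the short training time the authors report. The per-attribute counts can also be read directly, which may be part of why the predictions are easy to explain to tutors.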
Reference: S. Kotsiantis, C.
Pierrakeas, and P. Pintelas, “Preventing student dropout in distance learning
using machine learning techniques,” in Knowledge-Based Intelligent Information
and Engineering Systems, 2003, pp. 267–274.