In web-based educational environments, predicting student performance is very
important: students who are at risk of failing examinations can be identified at
an early stage of the course modules, and educators can then take the necessary
actions to raise their knowledge to a higher level and increase their learning
capacity.
In the data mining context, applying classification to educational data is an
emerging research area. It aims to discover groups of students with similar
characteristics, to identify learners with low motivation, and to find
corrective actions that lower drop-out rates.
C. Romero, S. Ventura, P. G. Espejo, and C. Hervás [2008] applied different
classification approaches to student data in order to compare the applicability
of data mining techniques for classifying students into groups and predicting
the final marks obtained in the course modules.
In their research they used KEEL, an open source framework for building data
mining models, including classification, regression, clustering, and pattern
mining. Based on this framework they developed a data mining tool that can be
integrated into the Moodle environment.
To construct the dataset, they extracted activity information from the Moodle
database, covering the students' Moodle activities and the final marks they
achieved in 7 course modules.
For this research they used 25 classification algorithms, grouped as follows:
statistical classification (linear discriminant analysis, least mean square
quadratic, kernel, and k-nearest neighbors); decision trees (C4.5 and CART);
rule induction (CN2, AprioriC, XCS, the Supervised Inductive Algorithm (SIA), a
genetic algorithm using real-valued genes (Corcoran), and a grammar-based
genetic programming algorithm (GGP)); fuzzy rule induction (LogitBoost,
MaxLogitBoost, AdaBoost, grammar-based genetic programming (GP), a hybrid
grammar-based genetic programming/genetic algorithm method (GAP), a hybrid
simulated annealing/genetic programming algorithm (SAP), and an adaptation of
the Wang-Mendel algorithm (Chi)); and neural networks (multilayer perceptron, a
radial basis function neural network (RBFN), incremental RBFN, decremental RBFN,
a hybrid Genetic Algorithm Neural Network (GANN), and Neural Network
Evolutionary Programming (NNEP)).
According to their research, models obtained from categorical data are more
comprehensible than those obtained from numerical data, because categorical
values are easier for a teacher to interpret than precise magnitudes and ranges.
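To illustrate the difference between numerical and categorical attributes, here is a minimal sketch of turning a numerical activity count into categorical labels. It assumes simple equal-width binning into three labels; the post does not say which discretization method the authors used, and the function name `discretize` and the sample quiz counts are hypothetical.

```python
def discretize(values, labels=("LOW", "MEDIUM", "HIGH")):
    """Map each numeric value to a label using equal-width bins.

    Assumption: equal-width binning is used here only for illustration.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / len(labels) or 1.0
    result = []
    for v in values:
        # The maximum value would land one bin too far, so clamp the index.
        idx = min(int((v - lo) / width), len(labels) - 1)
        result.append(labels[idx])
    return result


# Hypothetical number of quizzes attempted by five students.
quiz_counts = [0, 2, 5, 9, 12]
print(discretize(quiz_counts))
```

A teacher reading "quizzes attempted = LOW" does not need to know where the numeric thresholds lie, which is the kind of readability gain described above.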
Decision trees are considered easily understood models because a reasoning
process can be given for each conclusion. However, a tree with a large number of
nodes and leaves is less comprehensible. The C4.5 and CART algorithms are simple
for instructors to understand and interpret.
Rule induction algorithms are normally also considered to produce comprehensible
models, because they discover a set of IF-THEN classification rules that form a
high-level knowledge representation and can be used directly for decision
making. Algorithms such as GGP have higher expressive power, allowing the user
to determine the specific format of the rules.
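As a rough illustration of why IF-THEN rules can be used directly for decision making, the sketch below applies a small hand-written rule list to one student record. The rules, attribute names, and default class are hypothetical and are not taken from the paper; a rule induction algorithm would learn such rules from the data rather than have them written by hand.

```python
# Each rule is (conditions, predicted class); the first matching rule wins.
# Assumption: these rules and attribute names are invented for illustration.
RULES = [
    ({"quizzes_passed": "HIGH"}, "PASS"),
    ({"assignments": "LOW", "forum_posts": "LOW"}, "FAIL"),
]
DEFAULT_CLASS = "PASS"  # used when no rule matches


def classify(student, rules=RULES, default=DEFAULT_CLASS):
    """Return (predicted class, conditions of the rule that fired or None)."""
    for conditions, label in rules:
        if all(student.get(attr) == value for attr, value in conditions.items()):
            return label, conditions
    return default, None


student = {"quizzes_passed": "LOW", "assignments": "LOW", "forum_posts": "LOW"}
label, fired = classify(student)
print(label, fired)
```

Because the function returns the rule that fired along with the prediction, an instructor can see exactly which conditions led to the decision.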
Fuzzy rule algorithms obtain IF-THEN rules that use linguistic terms, which makes
them more interpretable by humans. These rules are very intuitive and easily
understood by problem-domain experts such as teachers.
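A linguistic term such as LOW or HIGH can be sketched as a membership function that returns a degree between 0 and 1 instead of a hard yes/no. The example below assumes triangular membership functions on a 0 to 10 score scale and uses the minimum as the fuzzy AND. These choices, the term boundaries, and the rule itself are assumptions made for illustration, not the configuration used by the algorithms listed in the post.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangle with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for a score on a 0..10 scale.
# LOW peaks at 0 and HIGH peaks at 10, so their triangles extend past the scale.
TERMS = {
    "LOW": (-5.0, 0.0, 5.0),
    "MEDIUM": (0.0, 5.0, 10.0),
    "HIGH": (5.0, 10.0, 15.0),
}


def memberships(score):
    """Degree to which a score belongs to each linguistic term."""
    return {term: triangular(score, *abc) for term, abc in TERMS.items()}


def rule_strength(quiz_score, assignment_score):
    """IF quiz is LOW AND assignment is LOW THEN FAIL (min used as fuzzy AND)."""
    return min(memberships(quiz_score)["LOW"], memberships(assignment_score)["LOW"])


print(memberships(2.0))
print(rule_strength(2.0, 1.0))
```

A quiz score of 2 is mostly LOW but also partly MEDIUM, which is how fuzzy rules keep readable terms without imposing a sharp cut-off between them.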
Statistical methods and neural networks are deemed less suitable for data mining
purposes because of their lack of comprehensibility: even though they attain
very good accuracy rates, their models are very difficult for people to
understand. However, algorithms such as ADLinear, PolQuadraticLMS, Kernel and
NNEP obtain functions that express the possible strong interactions among the
variables.
Reference: C. Romero, S. Ventura, P. G. Espejo, and C. Hervás, “Data mining algorithms to classify students,” Proceedings of Educational Data Mining, pp. 20–21, 2008.