Sunday, October 28, 2012

Can We Classify Students with Data Mining?

In web-based educational environments, predicting students' performance is very important: students who are at risk of failing examinations can be identified at an early stage of the course modules, and educators can take the necessary actions to improve their knowledge and increase their learning capacity.

In the data mining context, applying classification to educational data is an emerging research area, used to discover potential student groups with similar characteristics, to identify learners with low motivation, and to find corrective actions that lower drop-out rates.

C. Romero, S. Ventura, P. G. Espejo, and C. Hervás [2008] applied different classification approaches to student data in order to compare the applicability of data mining techniques for classifying students into groups and for predicting the final marks obtained in the course modules.

In their research they used KEEL, an open-source framework for building data mining models (classification, regression, clustering, and pattern mining), and on top of this framework they developed a data mining tool that can be integrated into the Moodle environment.


To construct the dataset, they used the activity information stored in the Moodle database, extracting data on Moodle activities together with the final marks the students achieved in 7 course modules.
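Purely as an illustration (the authors extract this information with their own Moodle mining tool, not with the code below), here is a minimal Python sketch of how such an activity summary might be assembled into a feature table; the column names (n_assignments, n_quizzes, n_posts, total_time_h, final_mark) are hypothetical, not the attributes used in the study.

```python
import pandas as pd

# Hypothetical summary of Moodle activity per student, plus the final mark
# achieved in one course module (not the actual attributes from the paper).
records = [
    # student_id, n_assignments, n_quizzes, n_posts, total_time_h, final_mark
    ("s01", 8, 5, 12, 34.5, 8.2),
    ("s02", 3, 1,  2, 10.0, 3.9),
    ("s03", 6, 4,  7, 25.0, 6.5),
    ("s04", 9, 6, 15, 40.2, 9.1),
]
columns = ["student_id", "n_assignments", "n_quizzes",
           "n_posts", "total_time_h", "final_mark"]

df = pd.DataFrame(records, columns=columns)

# Features are the activity attributes; the target is the final mark.
X = df.drop(columns=["student_id", "final_mark"])
y = df["final_mark"]

print(df)
```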

For this research they used 25 classification algorithms, grouped into the following families:

- Statistical classifiers: linear discriminant analysis (ADLinear), quadratic least mean squares (PolQuadraticLMS), kernel classification, and k-nearest neighbours.
- Decision trees: C4.5 and CART.
- Rule induction: CN2, AprioriC, XCS, the Supervised Inductive Algorithm (SIA), a genetic algorithm using real-valued genes (Corcoran), and a grammar-based genetic programming algorithm (GGP).
- Fuzzy rule induction: LogitBoost, MaxLogitBoost, AdaBoost, grammar-based genetic programming (GP), a hybrid grammar-based genetic programming/genetic algorithm method (GAP), a hybrid simulated annealing/genetic programming algorithm (SAP), and an adaptation of the Wang-Mendel algorithm (Chi).
- Neural networks: the multilayer perceptron, a radial basis function network (RBFN), incremental RBFN, decremental RBFN, a hybrid genetic algorithm neural network (GANN), and neural network evolutionary programming (NNEP).
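The study ran these algorithms through KEEL; as a much smaller illustration of the same kind of comparison, the sketch below cross-validates one representative from a few of the families (LDA, k-NN, a CART-style tree, and a small multilayer perceptron) using scikit-learn on synthetic data instead of the real Moodle dataset.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the student dataset (the real study used Moodle data).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

models = {
    "LDA (statistical)": LinearDiscriminantAnalysis(),
    "k-NN (statistical)": KNeighborsClassifier(n_neighbors=5),
    "CART (decision tree)": DecisionTreeClassifier(max_depth=4, random_state=0),
    "MLP (neural network)": MLPClassifier(hidden_layer_sizes=(16,),
                                          max_iter=2000, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name:22s} mean accuracy = {scores.mean():.3f}")
```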

According to their research, models obtained using categorical data are more comprehensible than those obtained using numerical data, because categorical values are easier for a teacher to interpret than precise magnitudes and ranges.
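As a toy example of turning numerical marks into categorical labels of that kind, the following sketch discretises marks with pandas; the bin edges and the FAIL/PASS/GOOD/EXCELLENT labels are assumptions for illustration, not necessarily the ranges used in the paper.

```python
import pandas as pd

# Hypothetical numerical final marks on a 0-10 scale.
marks = pd.Series([2.5, 4.8, 5.5, 7.2, 9.1, 6.0], name="final_mark")

# Discretise into categorical labels; the bin edges and labels here are
# illustrative, not the exact ranges used in the paper.
labels = pd.cut(marks,
                bins=[0, 5, 7, 9, 10],
                labels=["FAIL", "PASS", "GOOD", "EXCELLENT"],
                include_lowest=True)

print(pd.concat([marks, labels.rename("category")], axis=1))
```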

Decision trees are considered easily understood models because a reasoning process can be given for each conclusion. However, a tree with a large number of nodes and leaves is less comprehensible. The models produced by the C4.5 and CART algorithms are simple for instructors to understand and interpret.
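To see why a shallow tree stays readable, here is a small sketch using scikit-learn's DecisionTreeClassifier (a CART-style learner; scikit-learn does not implement C4.5) on synthetic data, with the depth capped so the printed reasoning remains short. The feature names are invented for the example.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for discretised student activity data.
X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=1)
feature_names = ["assignments", "quizzes", "forum_posts", "time_online"]

# A shallow CART-style tree: limiting the depth keeps the model small enough
# for an instructor to read, at the possible cost of some accuracy.
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

# Print the tree as nested if/else conditions, one path per conclusion.
print(export_text(tree, feature_names=feature_names))
```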

Rule induction algorithms are normally also considered to produce comprehensible models, because they discover a set of IF-THEN classification rules that form a high-level knowledge representation and can be used directly for decision making. Algorithms such as GGP have a higher expressive power, allowing the user to determine the specific format of the rules.
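For a feel of how directly such rules can drive decisions, here is a hand-written (not learned) IF-THEN rule set in plain Python; the thresholds and class labels are invented for illustration and do not come from the paper.

```python
def classify_student(n_assignments, n_quizzes, total_time_h):
    """A hand-written IF-THEN rule set (illustrative only, not produced
    by any of the rule induction algorithms in the study)."""
    if n_assignments <= 2 and n_quizzes <= 1:
        return "FAIL"
    if n_assignments >= 7 and total_time_h >= 30:
        return "EXCELLENT"
    if n_quizzes >= 3:
        return "GOOD"
    return "PASS"

print(classify_student(1, 0, 8.0))    # -> FAIL
print(classify_student(8, 5, 35.0))   # -> EXCELLENT
```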

Fuzzy rule algorithms obtain IF-THEN rules that use linguistic terms, which makes them more interpretable by humans; such rules are very intuitive and easily understood by problem-domain experts such as teachers.
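As a toy illustration of linguistic terms (unrelated to the KEEL fuzzy learners actually used in the paper), the sketch below defines triangular membership functions for a hypothetical "number of assignments" variable and evaluates one fuzzy rule; the membership ranges are assumptions.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms for "number of assignments submitted".
def low(n):    return triangular(n, -1, 0, 4)
def medium(n): return triangular(n, 2, 5, 8)
def high(n):   return triangular(n, 6, 10, 15)

# One fuzzy rule: IF assignments is LOW THEN risk of failing is HIGH.
# Its firing strength is simply the membership degree of the input in LOW.
for n in (1, 3, 5, 9):
    strength = low(n)
    print(f"assignments={n:2d}  LOW={low(n):.2f}  MEDIUM={medium(n):.2f}  "
          f"HIGH={high(n):.2f}  -> rule fires with strength {strength:.2f}")
```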

Statistical methods and neural networks are deemed less suitable for these data mining purposes due to their lack of comprehensibility: even though they attain very good accuracy rates, their models are very difficult for people to understand. However, algorithms such as ADLinear, PolQuadraticLMS, Kernel, and NNEP obtain functions that can express possible strong interactions among the variables.

Reference: C. Romero, S. Ventura, P. G. Espejo, and C. Hervás, “Data mining algorithms to classify students,” Proceedings of Educational Data Mining, pp. 20–21, 2008.
