Wednesday, October 17, 2012

Using K-Nearest Neighbor Algorithm on Student Data

The k-nearest neighbor (KNN) method is a classical prediction technique among the machine learning algorithms used in data mining. It has been widely used because of its simplicity and its adaptability to many different types of data. The main advantage of KNN for prediction is that it is a lazy method: it does not build a model to summarize the statistics and distribution of the training data, but instead works directly on the actual training instances. Although KNN is a simple predictive algorithm, it is one we can rely on, and it makes no assumptions about the prior probabilities of the training data. KNN also performs satisfactorily when the data set contains noisy or incomplete data.
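To make the "lazy" idea concrete, here is a minimal sketch of a KNN regressor in Python (not from the paper): fitting only stores the raw instances, and all the distance computation happens at prediction time.

```python
import numpy as np

class LazyKNNRegressor:
    """Minimal k-nearest-neighbor regressor: 'training' only stores the data."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learning: no model is built, the raw training instances are kept as-is.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        return self

    def predict(self, x):
        # All the work happens at query time: compute the Euclidean distance to
        # every stored instance and average the targets of the k closest ones.
        dists = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1)
        nearest = np.argsort(dists)[:self.k]
        return float(self.y[nearest].mean())
```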

Because of these advantages and the simplicity of the algorithm, T. Tanner and H. Toivonen [2010] built a model using the k-nearest neighbor algorithm to identify students who are at high risk of failing a specific course. Their research suggests that good results in predicting final scores indicate that students with learning problems can be found reliably. Their approach to the student data is to predict the performance of a given student based on similarity to all instances in the training set, taking the k most similar objects in the data set. Similarity is calculated as the Euclidean distance between the features of the test subject and the corresponding features of each instance in the training set.
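The following sketch illustrates that idea on student data. The feature values and final scores below are invented for illustration only; the paper's actual features and data are different. Each training row stands for one past student's scores on early lessons, and the prediction is the average final score of the k most similar students by Euclidean distance.

```python
import numpy as np

def predict_final_score(student_features, train_features, train_finals, k=5):
    """Predict a student's final score from the k most similar training students.

    Similarity is the Euclidean distance between the student's feature vector
    (e.g. early lesson scores) and each training instance.
    """
    diffs = np.asarray(train_features, dtype=float) - np.asarray(student_features, dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]  # indices of the k most similar students
    return float(np.mean(np.asarray(train_finals, dtype=float)[nearest]))

# Hypothetical data: each row holds one past student's scores on the first two
# lessons; train_finals holds their final course scores (values made up).
train_features = [[0.8, 0.9], [0.4, 0.5], [0.2, 0.1], [0.9, 0.7], [0.6, 0.6]]
train_finals = [85, 55, 30, 80, 65]

print(predict_final_score([0.3, 0.2], train_features, train_finals, k=3))
```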

In their research they showed that KNN can produce accurate predictions of the final scores even after the first lesson. Another interesting result is that in skill-based courses early skill tests can be used as predictors of the final scores. They suggest that predicted final scores can be used to identify students with learning problems and can be implemented directly as an early-warning feature for teachers, so that students can be alerted if they are likely to fail the final tests. Based on the features they used in their experiments with the KNN algorithm, they suggest that the method could be just as effective in other learning management systems (LMSs) such as Moodle, where only a single lesson score may be available for student assessment, and that other skill-based courses in particular could be a good fit for the KNN method.
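As a rough sketch of how such predictions might drive an early-warning feature (the student ids, predicted scores, and the pass mark of 50 are all assumptions, not taken from the paper):

```python
def at_risk_students(predicted_finals, pass_mark=50):
    """Return the students whose predicted final score falls below the pass mark.

    `predicted_finals` maps a student id to a predicted final score, e.g. the
    output of a KNN prediction made after the first lesson.
    """
    return [sid for sid, score in predicted_finals.items() if score < pass_mark]

# Hypothetical predictions for three students.
print(at_risk_students({"s001": 72.0, "s002": 41.5, "s003": 58.0}))
```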

Reference: T. Tanner and H. Toivonen, "Predicting and preventing student failure – using the k-nearest neighbour method to predict student performance in an online course environment," International Journal of Learning Technology, vol. 5, no. 4, pp. 356–377, 2010.
