2010 KDD Cup Competition
Educational Data Mining Challenge
https://pslcdatashop.web.cmu.edu/KDDCup/
CALL FOR PARTICIPATION
The KDD Cup is the annual Data Mining and Knowledge Discovery
competition in which some of the best data mining teams in the
world compete to solve an important practical data mining
problem.
This year, 15,000 USD in cash prizes and travel support will be
provided thanks to our sponsors: Facebook, Elsevier, and the
Pittsburgh Science of Learning Center.
THIS YEAR'S CHALLENGE
How generally or narrowly do students learn? How quickly or
slowly? Will the rate of improvement vary between students? What
does it mean for one problem to be similar to another? It might
depend on whether the knowledge required for one problem is the
same as the knowledge required for another. But is it possible to
infer the knowledge requirements of problems directly from
student performance data, without human analysis of the tasks?
This year's challenge asks you to predict student performance on
mathematical problems from logs of student interaction with
Intelligent Tutoring Systems. This task presents significant
technical challenges, has practical importance, and is
scientifically interesting.
TECHNICAL CHALLENGES
Here are just a few of the technical challenges:
- The data matrix is sparse: not all students are given every
problem, and some problems have only 1 or 2 students who
completed each item. Contestants therefore need to exploit
relationships among problems to bring enough data to bear on
the learning task.
- There is a strong temporal dimension to the data: students
improve over the course of the school year, students must
master some skills before moving on to others, and incorrect
responses to some items lead to incorrect assumptions in
other items. Contestants must therefore attend to temporal
as well as conceptual relationships among items.
- Which problems a given student sees is determined in part by
student choices or past success: e.g., students only see
remedial problems if they are having trouble with the
non-remedial problems. Contestants therefore need to pay
attention to causal relationships to avoid selection bias.
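To make the sparsity point concrete, here is a minimal Python
sketch, assuming the tutoring logs can be reduced to
(student, problem) pairs. The toy records and the helper name
sparsity_report are hypothetical illustrations, not the
competition's data format.

```python
from collections import defaultdict

# Hypothetical toy log: one (student_id, problem_id) pair per attempted problem.
# Invented for illustration only; the real logs carry many more fields.
log = [
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p1"), ("s3", "p4"),
]


def sparsity_report(records):
    """Return (density, students_per_problem) for the student-by-problem matrix."""
    students = {s for s, _ in records}
    problems = {p for _, p in records}
    seen = set(records)  # distinct observed (student, problem) cells
    density = len(seen) / (len(students) * len(problems))
    per_problem = defaultdict(set)
    for s, p in seen:
        per_problem[p].add(s)
    counts = {p: len(ss) for p, ss in per_problem.items()}
    return density, counts


density, counts = sparsity_report(log)
# Problems seen by only one or two students carry very little direct evidence.
thin = sorted(p for p, n in counts.items() if n <= 2)
print(f"density = {density:.2f}")               # 7 of 12 cells observed
print("problems with 1 or 2 students:", thin)
```

Even in this tiny example most problems have at most two
students, which is why a model has to pool evidence across
related problems rather than learn each one in isolation.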
SCIENTIFIC AND PRACTICAL IMPORTANCE
From a practical perspective, improved models could save
millions of hours of students' time (and effort) in learning
algebra. These models should both increase achievement levels and
reduce the time needed. Focusing on just the latter, for the 0.5
million students who spend about 50 hours per year with
Cognitive Tutors for mathematics, let's say these optimizations
can reduce time to mastery by at least 10%. One experiment showed
a time reduction of about 15% (Cen et al. 2007). A 10% reduction
is 5 hours per student, or 2.5 million student hours per year
saved. And these 0.5 million students are less than 5% of all
algebra-studying students in the US. If we include all algebra
students (20x) and grades 6-11, for which there are Carnegie
Learning and Assistment applications (5x), that brings our rough
estimate to 250 million student hours per year saved! In that
time, students can be moving on in math and science or doing
other things they enjoy.
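The back-of-the-envelope estimate above can be checked with a
few lines of Python. Every figure is one of the rough
assumptions stated in the paragraph (including the 10% lower
bound rather than the 15% experimental result); the variable
names are illustrative only.

```python
# Rough estimate from the paragraph above; inputs are stated assumptions,
# not measured results.
students = 500_000          # students using Cognitive Tutors for mathematics
hours_per_year = 50         # tutor hours per student per year
time_reduction = 0.10       # assumed reduction in time to mastery (at least 10%)

hours_saved_per_student = hours_per_year * time_reduction   # 5 hours
tutor_users_saved = students * hours_saved_per_student      # 2.5 million hours

all_algebra_factor = 20     # current users are under 5% of US algebra students
grades_6_to_11_factor = 5   # extension to grades 6-11
total_saved = tutor_users_saved * all_algebra_factor * grades_6_to_11_factor

print(f"{hours_saved_per_student:.0f} hours per student")
print(f"{tutor_users_saved:,.0f} hours per year for current users")
print(f"{total_saved:,.0f} hours per year overall")
```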
From a scientific viewpoint, the ability to achieve low
prediction error on unseen data is evidence that the learning
algorithm has accurately discovered the underlying factors that
make items easier or harder for students. Knowing these factors is essential
for the design of high-quality curricula and lesson plans (both
for human instructors and for automated tutoring software). So
you, the contestants, have the potential to influence lesson
design, improving retention, increasing student engagement,
reducing wasted time, and increasing transfer to future lessons.
Currently, K-12 education is extremely focused on assessment. The
No Child Left Behind Act has put incredible pressure on schools
to "teach to the test", meaning that a significant amount of time
is spent preparing and taking standardized tests. Much of the
time spent drilling for and taking these tests is wasted from the
point of view of deep learning (long-term retention, transfer,
and desire for future learning); so any advances which allow us
to reduce the role of standardized tests hold the promise of
increasing deep learning.
To this end, a model which accurately predicts long-term future
performance as a byproduct of day-to-day tutoring could augment
or replace some of the current standardized tests: this idea is
called "assistment", from the goal of assessing performance while
simultaneously assisting learning. Previous work has suggested
that assistment is indeed possible: e.g., an appropriate analysis
of 8th-grade tutoring logs can predict 10th-grade standardized
test performance as well as 8th-grade standardized test results
can predict 10th-grade standardized test performance (Feng,
Heffernan, & Koedinger, 2009). But it is far from clear what the
best prediction methods are; so, the contestants' algorithms may
provide insights that allow important improvements in assistment.
IMPORTANT DATES
- March 15 - Call for participation
- April 1 - Web site opens for registration
- April 15 - Competition begins (Updated!)
- June 1 - Competition ends
For more information, please visit the official KDD Cup 2010
Competition website:
https://pslcdatashop.web.cmu.edu/KDDCup/
_______________________________________________
uai mailing list
uai@ENGR.ORST.EDU
https://secure.engr.oregonstate.edu/mailman/listinfo/uai