EDM Projects

Grant by the Office for Learning and Teaching, Australian Government

Longbing Cao, Jiuyong Li, Dharmendra Sharma. Data mining of learning behaviors and interactions for improved sentiment and performance, 2012-2014, the Office for Learning and Teaching (OLT) Grants, Department of Industry, Innovation, Science, Research, Climate Change and Tertiary Education.


EDM projects at the University of Technology Sydney

Since 2008, UTS has sponsored several EDM projects, carried out by the Educational Data Mining team at UTS jointly with teaching and learning (T&L) experts.

The EDM-UTS projects aim to apply data mining technologies to educational data to facilitate data-driven decision making for better learning and teaching. The projects involve applied innovation in the education sector, working towards a deeper and more comprehensive understanding of student learning behaviors, performance, pathways, sentiment and dynamics. They combine data and information from on and off campus, including online access, social media, and socio-economic and cultural information.

More information on the EDM-UTS projects is available under the following headings:

  • Project summary
  • Project objectives
  • Project team
  • Project methodology
  • Project scheduling
  • Project management (internal)

Project Summary


The following projects are on EDM:

Project 1:


  • Project title: Improving The Support for Student Learning Needs Through Pattern Analysis of The Student Experience

  • Investigators: Longbing Cao, Richard Raban, Paul Kennedy, Gordon Lingard, Andrew Litchfield

  • Source: UTS FEIT TEDD fund

  • Time: 2009


Project 2:

  • Project title: Curriculum Renewal and Better Support of Student Learning Needs Through Pattern Analysis of Student’s Learning Experiences

  • Investigators: Longbing Cao, Richard Raban, Paul Kennedy, Gordon Lingard, Andrew Litchfield

  • Source: UTS Learning & Teaching Performance Fund

  • Time: 2009


Project 3:


  • Project title: Inform Course Learning and Renewal by Deeply Understanding Cause-Effect of Student Learning Behaviors and Performance

  • Investigators: Longbing Cao, Richard Raban, Paul Kennedy, Tich Phuoc Tran, Peter Kandlbinder

  • Source: UTS VC’s T&L Strategic Grant

  • Time: 2010


Project 4:


  • Project title: Data Mining Modelling and System Prototype for the Detection and Prediction of Students at Risk in the “Killer Subjects”

  • Investigators: Longbing Cao, et al

  • Source: UTS VC’s T&L Strategic Grant

  • Time: 2011-12


Project 5:


  • Project title: “At risk” student analytics project

  • Investigators: Brett Smout, Longbing Cao, et al

  • Source: UTS ITCMP grant

  • Time: 2011-12


Project Summary

The above projects apply Educational Data Mining (EDM) to identify students who are at risk of dropping out or of being held back in a higher education system. Among the most useful capabilities of EDM are clustering and prediction. In particular, clustering student data into homogeneous groups can provide a comprehensive understanding of student characteristics, while predicting the likelihood that a student will fail a subject can help teaching staff concentrate academic assistance on that student. We propose a complete EDM framework with several steps: student data collection and analysis (statistical and visual), clustering, prediction, and evaluation and interpretation of the analysis results. The outcome of such data-driven analysis can be integrated with well-studied pedagogical theories to complement traditional decision-making procedures in various educational processes (e.g. assessment, evaluation, and counseling) and to automatically extract useful knowledge and hidden patterns from immense quantities of educational data. Such information can be used to identify and prioritize learning needs for different groups of students, increase graduation rates, assess institutional performance effectively, make the most of campus resources, and optimize subject curriculum renewal. The framework and algorithms from these projects will extend EDM research to external use for improving T&L performance.
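As an illustration of the clustering step in this framework, the following is a minimal sketch of plain k-means in Python. The projects do not state which clustering algorithm was used, so k-means is an assumption here, and the two features (attendance rate and average assessment mark) and all student records are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster points (lists of floats) into k groups with plain k-means."""
    # Deterministic start: use the first k points as initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical records: [attendance rate, average assessment mark / 100].
students = [
    [0.95, 0.82], [0.40, 0.45], [0.90, 0.78],
    [0.35, 0.50], [0.88, 0.85], [0.30, 0.40],
]
labels, centers = kmeans(students, k=2)
print(labels)   # one cluster index per student
print(centers)  # one centroid per cluster
```

On this toy data the students fall into a high-engagement group and a low-engagement group. In practice the number of clusters and the choice of features would need to be justified with domain knowledge.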

The distinctive results of these projects also include deep analysis of the dynamics of student progression data and of the risk associated with student learning and teaching performance along that progression. Social network analysis is used in the projects to understand the performance of naturally grouped students, and the driving forces and differences associated with students who fall into implicit linkages and groups.
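One simple way to look at the performance of naturally grouped students is to treat interactions as an undirected graph, take its connected components as groups, and compare average marks per group. The sketch below assumes this approach. The projects do not specify their social network analysis method, and the interaction pairs, student identifiers and marks are hypothetical.

```python
from collections import defaultdict

def find_groups(edges, students):
    """Return connected components of an undirected interaction graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in students:
        if start in seen:
            continue
        # Depth-first walk collects everyone reachable from `start`.
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups

# Hypothetical data: final marks, and who interacted with whom.
marks = {"s1": 78, "s2": 82, "s3": 45, "s4": 51, "s5": 66}
interactions = [("s1", "s2"), ("s3", "s4")]

groups = find_groups(interactions, list(marks))
group_means = {tuple(g): sum(marks[s] for s in g) / len(g) for g in groups}
print(group_means)
```

A student with no recorded interactions, such as `s5` here, forms a group of one, which may itself be a signal worth examining.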


Learning analytics system

We have developed an online system for learning analytics and for predicting students at academic risk. It can be used for any course or subject at any university, college or school. The system is based on data mining and risk management: its data mining models and rules predict which student may be at risk, at what time, in which subject, with what likelihood and for what reasons.
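The kind of output such a system produces (who, when, which subject, how likely, and why) can be sketched as a small record filled in by rules. This is not the system's actual schema or rule set, which are not described here. The `RiskAlert` fields, the three rules, their weights and the sample record are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class RiskAlert:
    student_id: str      # who
    week: int            # at what time
    subject: str         # in which subject
    likelihood: float    # how likely, between 0 and 1
    reasons: list = field(default_factory=list)  # why

# Hypothetical rules: (reason, test on a record, weight added to the likelihood).
RULES = [
    ("low attendance", lambda r: r["attendance"] < 0.5, 0.4),
    ("failed first assessment", lambda r: r["assessment_1"] < 50, 0.4),
    ("no online activity", lambda r: r["logins"] == 0, 0.2),
]

def assess(record):
    """Apply every rule and return an alert, or None when no rule fires."""
    fired = [(reason, weight) for reason, test, weight in RULES if test(record)]
    if not fired:
        return None
    likelihood = min(1.0, sum(w for _, w in fired))
    return RiskAlert(record["student_id"], record["week"], record["subject"],
                     likelihood, [reason for reason, _ in fired])

alert = assess({"student_id": "s3", "week": 5, "subject": "Programming 1",
                "attendance": 0.4, "assessment_1": 62, "logins": 0})
print(alert)
```

Keeping the reasons alongside the likelihood means each alert can be explained to teaching staff rather than presented as a bare score.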




Project Team

Project Leader:

Prof Longbing Cao: Business modeling and data mining

Chief Investigators:

Dr. Richard Raban: Business definition and evaluation

Dr. Paul Kennedy: Data Mining

Mr. Gordon Lingard: Business definition and evaluation

Mr. Andrew Litchfield: Business definition and evaluation

Dr. Peter Kandlbinder: Interactive Media and Learning

Research Fellows:

Dr. Patrick Tran: Data Mining

Dr. Yiling Zeng: Data Mining

Mr. Zhigang Zheng: EDM system

Mr. Mu Li: EDM system

Dr. Yuming Ou: EDM system


Research Methodology

A Two-Phase Learning Approach

The figure above presents a two-phase learning approach deployed to mine student data. First, a full lifecycle of the DM process is applied to the educational data. As described in Figure 2, this data mining process involves several steps:

  • Data Collection
  • Data Preparation / Visualization
  • Feature Extraction / Clustering
  • Predictive Modeling


The outcomes of the DM process include descriptive statistical reports, student clusters and patterns discovered in student features. Such patterns are the building blocks of a predictive model, which is applied to new student data to predict the probability that a newly enrolled student will have problems successfully completing the course. Based on this prediction/classification, UTS may proactively take remedial action to improve the situation, i.e. concentrate academic assistance on those students most at risk. One of the most interesting steps in classification is naming the induced classes/clusters. With the help of knowledge about higher education systems, more meaningful labels can be used to describe these classes, allowing pedagogically appropriate actions to be planned.

The output of the first phase is then fed into the second phase, in which education domain knowledge is used to fine-tune the DM processes and provide meaningful interpretation of the results. In particular, we use well-studied pedagogical theories to guide the DM discovery processes, construct a predictive model from useful patterns to identify students at risk, transform statistical outcomes into actionable knowledge, and facilitate the selection of remedial actions.
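To make the prediction step concrete, here is a minimal sketch that fits a logistic regression by gradient descent on a past cohort and then flags new students whose predicted probability of failing crosses a threshold. Logistic regression is an assumption, since the model family used in the projects is not stated. The features, records, learning rate and the 0.5 threshold are likewise hypothetical, and in the second phase the threshold and the resulting action would be chosen with education domain knowledge.

```python
import math

def predict_risk(weights, x):
    """Probability that a student with features x fails the subject."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=1.0, epochs=5000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_risk(weights, x) - y
            grad[0] += error
            for j, xj in enumerate(x):
                grad[j + 1] += error * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical past cohort: [attendance rate, prior average mark / 100].
history = [[0.95, 0.82], [0.40, 0.45], [0.90, 0.78],
           [0.35, 0.50], [0.88, 0.85], [0.30, 0.40]]
failed = [0, 1, 0, 1, 0, 1]  # 1 = failed the subject
model = train_logistic(history, failed)

# Assumed policy: flag anyone whose predicted risk is at least 0.5.
new_students = {"s7": [0.36, 0.45], "s8": [0.92, 0.80]}
at_risk = [sid for sid, x in new_students.items()
           if predict_risk(model, x) >= 0.5]
print(at_risk)
```

A probability output, rather than a hard label, lets staff rank students and direct limited academic assistance to those most at risk first.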