UI researchers use machine learning techniques to improve students’ learning

By Willie Cui, News Editor
October 6, 2022

In a research project funded by the National Science Foundation, Dr. Dong Wang and Dr. Nigel Bosch, both professors in Information Sciences, intend to use machine learning techniques to help students better estimate their abilities and understanding in the classroom.

“This is an interdisciplinary project that combines really 2 disciplines — one is on (artificial intelligence) and machine learning, and one is on education,” Wang said. “For this program, it’s really an interdisciplinary innovation, both on the education side and also the machine learning and AI side.”

Wang highlighted two main priorities for the project: fairness and privacy.

“For education and in AI or machine software you’re using, you definitely want it to be fair, especially to people in different demographic groups,” Wang said. “As you are doing some prediction, you will definitely need some training data to train your model, then you want to protect the privacy of the data from the students.”

Using a machine learning technique called federated learning, the two researchers are planning to develop a predictive model that estimates student performance and then compare the model’s predictions to students’ self-estimates in an attempt to help students improve their metacognitive skills.


“The educational motivation for this project is a common problem … when to study, what to study, how long to study — those kinds of things,” Bosch said. “For those kinds of learning behaviors, it’s really important that students are able to accurately self-assess how well they know something.”

Students and self-evaluation

If a student knows they are falling behind in a course, then they can spend more time studying for that course or ask others for help, Bosch noted. Similarly, if a student feels comfortable with a particular topic, then they can move on and spend more time on other things.

“But it’s actually very difficult for students to properly estimate that, and, in general, difficult for people to estimate how well they know some particular topic,” Bosch said.

In the project, Wang and Bosch are looking specifically at how well students can predict their performance on an upcoming exam, which Bosch said is a very important and common issue.

“The typical approach to trying to teach students how to estimate their knowledge is to give them feedback after an exam,” Bosch said. “So maybe they make some self assessment before the exam, then they take the exam. Then you give them some feedback on how well they thought they were going to do versus what they actually got on the exam, and then they can recalibrate.”

The two researchers hope that a predictive model built with machine learning can give students similar feedback before the exam takes place.

“If you only get feedback after the exam, it’s over — it’s too late to do anything about that particular exam; you can only hopefully do better at estimating your knowledge on the next one,” Bosch said. “So that’s where AI comes into this, trying to predict (student performance) before an exam actually happens.”
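As a rough illustration of the comparison Bosch describes, the sketch below contrasts a student's self-estimate with a model's predicted score before an exam. The function name, the 0-100 score scale and the five-point tolerance are assumptions made for this example, not details of the researchers' system.

```python
def calibration_feedback(self_estimate, model_prediction, tolerance=5.0):
    """Compare a student's self-estimated exam score with a predicted score.

    Both scores are assumed to be on a 0-100 scale. `tolerance` is a
    hypothetical cutoff for how large a gap counts as miscalibrated.
    """
    gap = self_estimate - model_prediction
    if gap > tolerance:
        return gap, "overconfident"
    if gap < -tolerance:
        return gap, "underconfident"
    return gap, "well calibrated"


# A student expects a 90, while the (hypothetical) model predicts a 72.
gap, label = calibration_feedback(90.0, 72.0)
print(gap, label)
```

A student flagged as overconfident would have time to study more before the exam, rather than finding out afterward.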

Privacy and federated learning

As with any AI or machine learning model, Wang and Bosch’s model needs training data in order to make accurate predictions.

For their project, the two will examine various potential indicators of student success to use as predictors in the model.

Along with more conventional indicators such as previous assignments, Bosch said the dataset will include things like discussion forum usage and how often, and when, students visit the course website.

“We create this AI based on lots of information about the students and what they’ve done in the course,” Bosch said. “And that raises all kinds of privacy and various types of issues on the AI side.”

Wang noted a “knowledge gap” in existing AI and machine learning literature. Currently, most machine learning models are built and trained in a centralized way, where data is collected and stored together before all of it is used to train the model.

“But in that sense, you already kind of violated (students’) privacy a little bit by collecting all the students’ raw data — their exam scores, their pre-scores, their performance in different aspects of the class,” Wang said. “Some students may not feel comfortable sharing that kind of raw data with your machine learning model.”

To help mitigate these privacy concerns, the researchers plan on using a “distributed machine learning model.”

This technique, called federated learning and also known as collaborative learning, trains separate local models, each on its own limited set of data. Only model parameters, not raw data, are exchanged between the local models and the global model.

“We can run some local models on the students’ machines, like their computers, without sending all the raw data to some central servers,” Wang said. “So in that way, we protect the students’ data.”
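The exchange of parameters rather than raw data can be sketched with federated averaging, one common form of federated learning. The article does not say which algorithm the researchers will use, so this is only an illustration under assumptions: the data are made up, the model is a one-feature linear fit, and all function names are hypothetical.

```python
def local_update(weights, data, lr=0.3, epochs=5):
    """Run a few gradient steps on one device; the raw data never leaves it."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    # Only the parameters and a sample count are sent back.
    return (w, b), n


def federated_average(updates):
    """Combine local parameters, weighting each device by its sample count."""
    total = sum(n for _, n in updates)
    w = sum(params[0] * n for params, n in updates) / total
    b = sum(params[1] * n for params, n in updates) / total
    return (w, b)


def train(clients, rounds=200):
    """Alternate local training and server-side averaging."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates)
    return global_weights


# Made-up data: (fraction of assignments completed, exam score as a fraction).
# Each inner list stays on a separate device.
clients = [
    [(0.2, 0.5), (0.4, 0.6)],
    [(0.6, 0.7), (0.8, 0.8), (1.0, 0.9)],
]
w, b = train(clients)
print(round(w, 2), round(b, 2))
```

The central server in this sketch only ever sees the pairs returned by `local_update`, never the individual records.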

Fairness and accuracy

Wang noted that ensuring fairness in their predictive model is important, especially in the realm of education.

“People definitely want more fairness in educational software,” Wang said. “But the reality is that there is a trade-off between the performance or accuracy of your prediction model and fairness.”

Wang explained that having imbalanced training data can lead to the resulting model being biased against underrepresented demographics.

“For example, if you have more male students than female students in your class, then you will get more training data on male students compared to female students,” Wang said. “Then, it is very likely that your model’s performance will bias against female students simply because you don’t have enough data on them.”

This is a “very natural problem” for machine learning models, according to Wang.

“So what people normally do in the machine learning literature is adaptive sampling,” Wang said.

While this would make the model fairer, it would reduce its accuracy.

“If I have more male data than female data, I can downsample the male data — I just remove some male data and keep the data more balanced,” Wang said. “However, that will unfortunately downgrade the performance of the male students, while you keep their performance more balanced.”
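The downsampling Wang describes can be sketched as follows. This is a minimal example with made-up records; the function name and the `gender` field are assumptions for illustration, and real projects often use more careful sampling schemes.

```python
import random
from collections import Counter


def downsample_to_balance(records, key, seed=0):
    """Randomly drop records so every group under `key` has the same size."""
    rng = random.Random(seed)
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record)
    # The smallest group sets the size that every group is cut down to.
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Made-up class: six male records and two female records.
records = [{"gender": "male", "id": i} for i in range(6)]
records += [{"gender": "female", "id": i} for i in range(6, 8)]

balanced = downsample_to_balance(records, key="gender")
print(Counter(r["gender"] for r in balanced))
```

The balanced set has equal group sizes, but four of the six male records are discarded, which is the loss of training data behind the accuracy cost Wang mentions.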

Additionally, this trade-off becomes complicated when dealing with multiple demographic groups.

Continuing with his previous example, Wang noted that dropping samples from male students to balance the data becomes a problem when some of those students also belong to minority groups.

“What happens if some of the male samples are from African American students?” Wang asked. “When you drop the data from those male students, you also drop the data from the African American male students. Then you can make it more biased against that group.”
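A small worked calculation shows the scale of the effect. The numbers are invented for illustration, and the helper name is hypothetical. It assumes records are dropped uniformly at random, in which case a subgroup is expected to shrink by the same fraction as the group it sits in.

```python
def expected_subgroup_after_downsampling(group_size, subgroup_size, target_size):
    """Expected number of subgroup records left after uniform random downsampling.

    `group_size` records are cut to `target_size`; `subgroup_size` of the
    original records belong to the subgroup of interest.
    """
    return subgroup_size * target_size / group_size


# Invented numbers: 60 male records are cut to 20 to match 20 female records.
# Six of the 60 male records come from African American students.
remaining = expected_subgroup_after_downsampling(60, 6, 20)
print(remaining)
```

A subgroup that started with only six records would be expected to end up with two, leaving the model even less to learn from for that group.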

For the project, Wang and Bosch hope to find a balance between the fairness of their predictive model and its accuracy.

“That’s definitely one challenge we want to address,” Wang said. “We want to build a model that can probably hit the sweet spot of the trade-off between the fairness and the accuracy of the model.”
