Student data collected by the University of Arizona has ignited a debate over privacy and research ethics.
On March 7, UA announced it could use CatCard data to predict a student’s probability of returning for their second year of college.
“By getting their digital traces, you can explore their patterns of movement, behavior and interactions,” said professor Sudha Ram, director of UA’s INSITE: Center for Business Intelligence and Analytics, in the announcement. “That tells you a great deal about them.”
All UA students must get a CatCard, the university’s identification card, when starting their college careers. For the study, Ram drew on three years of data from freshmen’s CatCard use.
“It’s kind of like a sensor that’s embedded in them, which can be used for tracking them,” Ram said in the announcement. “It’s really not designed to track their social interactions, but you can.”
Mariette Marsh, director of UA’s Human Subjects Protection Program, explained that the university needed no approval from its Institutional Review Board (IRB) to collect student data.
Ram submitted the portion of the data she used to the IRB for review, and the board deemed it unidentifiable.
“She had no direct identifiers,” Marsh said. “It is still not a person, it may be research, but it is not a person according to the federal law.”
Federal requirements for IRBs grew out of the 1974 National Research Act and were shaped by the 1979 release of The Belmont Report, which lists ethical principles and guidelines for research involving human subjects. The need arose from the revelation that the government had studied, and knowingly left untreated, black men with syphilis in Tuskegee, Alabama.
Jacob Metcalf, a researcher with the Data and Society Research Institute, has debated the privacy and ethics of such datasets for years.
The UA study caught his attention.
“Unless the study is explained to them (students) up front,” he said, “I think it is very ethically problematic.”
Metcalf’s concerns centered on student privacy, students’ lack of knowledge about the research and their inability to withdraw from the study.
“There should be a special burden for researchers that are using sensitive data, which people have no choice but to hand over,” he said. “Just because it is being used for a good cause – freshman retention is a real problem – that doesn’t mean the students’ rights to choose to participate in the research study should be dismissed.”
Marsh said the IRB’s determination was based on government guidelines listing specific identifiers, none of which were present in the data.
“In computer science they theorize and argue that there is no way to truly anonymize a dataset,” Marsh said. “You have to have large computational abilities, at this time, to pinpoint someone.”
Metcalf said those guidelines need to address more than the average person’s ability to use a dataset to identify someone.
“There has to be significant safeguards to protect students’ right to anonymity and privacy,” Metcalf said. “If the idea is just to produce anonymized, aggregated scores, like to study a pattern in population, then the privacy burden is a lot less.”
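The disagreement turns on a well-documented point in computer science: stripping names from a dataset does not guarantee anonymity, because the remaining fields can often be matched against outside information. The sketch below illustrates that idea, known as a linkage attack, using invented data and field names; it is not drawn from the actual UA study.

```python
# Illustrative sketch of a "linkage attack": re-identifying records in a
# dataset whose direct identifiers were removed, by joining on
# quasi-identifiers (here, place and time). All data below is invented.

from collections import defaultdict

# "Anonymized" card-swipe log: names replaced with opaque tokens.
swipes = [
    {"token": "a91f", "location": "rec_center",  "hour": 7},
    {"token": "a91f", "location": "library",     "hour": 22},
    {"token": "c03d", "location": "dining_hall", "hour": 12},
]

# Auxiliary information an attacker might hold: observed sightings,
# a public class schedule, social media check-ins, and so on.
sightings = [
    {"name": "Alex", "location": "rec_center",  "hour": 7},
    {"name": "Alex", "location": "library",     "hour": 22},
    {"name": "Sam",  "location": "dining_hall", "hour": 12},
]

# Link tokens to names wherever place and time coincide.
candidates = defaultdict(set)
for swipe in swipes:
    for seen in sightings:
        if (swipe["location"], swipe["hour"]) == (seen["location"], seen["hour"]):
            candidates[swipe["token"]].add(seen["name"])

for token, names in candidates.items():
    print(token, "->", names)  # a91f -> {'Alex'}, c03d -> {'Sam'}
```

Researchers have demonstrated attacks like this on real datasets, most famously by re-identifying users in the supposedly anonymized Netflix Prize data; the more distinctive the quasi-identifiers, the less computation the attack requires.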
Metcalf also expressed concern about how the resulting data would be used.
“If there is a way for any student to be seen individually through some portal, like if the end goal is to assign a retention risk score to a student, then this becomes a much more problematic study and the IRB should be thinking about it differently,” Metcalf said. “It is also highly problematic if this data is de-anonymized, in any fashion, so that personal interventions can be made.”
Both the UA INSITE Center and University Information Technology Services were contacted, but no one from either department would comment on whether student data would be used to identify at-risk students or to intervene with them.
Jason Weir is a reporter for Arizona Sonora News, a service of the School of Journalism at the University of Arizona. Contact him at jasonweir@email.arizona.edu.