Is this the right way to use machine learning in education?

An article, ‘Predicting Employment through Machine Learning’ by Linsey S. Hugo, on the National Association of Colleges and Employers website confirms some of my worries about the use of machine learning in education.

The article presents a scenario which, it is said, “illustrates the role that machine learning, a form of predictive analytics, can play in supporting student career outcomes.” It is based on a recent study at Ohio University (OHIO) which leveraged machine learning to forecast successful job offers before graduation with 87 percent accuracy. “The study used data from first-destination surveys and registrar reports for undergraduate business school graduates from the 2016-2017 and 2017-2018 academic years. The study included data from 846 students for which outcomes were known; these data were then used in predicting outcomes for 212 students.”

A key step in the project was “identifying employability signals”, based on the idea that “it is well-recognized that employers desire particular skills from undergraduate students, such as a strong work ethic, critical thinking, adept communication, and teamwork.” These signals were adapted as proxies for the “well recognised” skills.

The data were used to develop numerous machine learning models, from commonly recognized methodologies, such as logistic regression, to advanced, non-linear models, such as a support-vector machine. Following the development of the models, new student data points were added to determine if the model could predict those students’ employment status at graduation. It correctly predicted that 107 students would be employed at graduation and 78 students would not be employed at graduation—185 correct predictions out of 212 student records, an 87 percent accuracy rate.

Additionally, this research assessed sensitivity, identifying which input variables were most predictive. In this study, internships were the most predictive variable, followed by specific majors and then co-curricular activities.
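The workflow described above can be sketched with scikit-learn. This is an illustrative sketch only: the data are synthetic stand-ins for the study's employability signals, and the feature names, weights and model settings are my assumptions, not the study's actual design.

```python
# Illustrative sketch: synthetic data standing in for the OHIO study's
# "employability signals" (internship, major, co-curricular activities).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1058  # 846 training records + 212 held-out, as in the study

# Hypothetical binary signals: internship, specific major, co-curricular
X = rng.integers(0, 2, size=(n, 3)).astype(float)
# Synthetic outcome: internship weighted most, echoing the reported sensitivity
logits = 1.5 * X[:, 0] + 0.8 * X[:, 1] + 0.4 * X[:, 2] - 1.2
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=212, random_state=0)

# Compare a commonly recognised model with a non-linear one, as the study did
for model in (LogisticRegression(), SVC()):
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, round(acc, 2))

# Sensitivity analysis: which input variables are most predictive?
lr = LogisticRegression().fit(X_train, y_train)
imp = permutation_importance(lr, X_test, y_test, n_repeats=30, random_state=0)
print(imp.importances_mean)
```

On synthetic data like this the exact accuracy figures are meaningless; the point is the shape of the pipeline: fit on known outcomes, score on held-out students, then rank the inputs by how much shuffling each one degrades the predictions.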

As in many learning analytics applications, the data could then be used as a basis for interventions to support students' employability on graduation. If a student had not already undertaken a summer internship, for example, they could be supported in doing so.

Now, on the one hand, this is an impressive application of learning analytics to support overworked careers advisers and to improve the chances of graduates finding a job. The detailed testing of different machine learning and AI approaches is also both exemplary and unusually well documented.

However, I still find myself uneasy with the project. Firstly, it reduces the purpose of degree-level education to employment. Secondly, it accepts that employers call the shots, through proxies based on unquestioned and unchallenged “well recognised” skills demanded by employers. It may equally be “well recognised” that employers are biased against certain social groups or have a preference for upper-class students. Should this be incorporated in the algorithm? Thirdly, it places responsibility for employability on individual students, rather than looking more closely at societal factors in employment. Participation in unpaid internships is an increasing factor in employment in the UK: fairly obviously, the financial ability to undertake such unpaid work is the preserve of the more wealthy.

And suppose all students were assisted in achieving the most predictive input variables. Does that mean they would all achieve employment on graduation? Graduate unemployment is not only predicated on individual student achievement (whatever variables are taken into account) but also on the availability of graduate jobs. In the UK many graduates are employed in what are classified as non-graduate jobs (the classification system is something I will return to in another blog). But is this because they fail to develop their employability signals, or simply because there are not enough jobs?

Having said all this, I remain optimistic about the role of learning analytics and AI in education and in careers guidance. But there are many issues to be discussed and pitfalls to overcome.


Travel time to university a factor in student performance

My summer morning’s work is settling into a routine. First I spend about half an hour learning Spanish on DuoLingo. Then I read the morning newsletters – OLDaily, WONKHE, The Canary and Times Higher Education (THE).

THE is probably the most boring of them. But this morning it led with an interesting and important research report. In an article entitled ‘Long commutes make students more likely to drop out’, Ana McKie says:

Students who have long commutes to their university may be more likely to drop out of their degrees, a study has found.

Researchers who examined undergraduate travel time and progression rates at six London universities found that duration of commute was a significant predictor of continuation at three institutions, even after other factors such as subject choice and entry qualifications were taken into account.

THE reports that the research, commissioned by London Higher, which represents universities in the city, found that “at the six institutions in the study, many students had travel times of between 10 and 20 minutes, while many others traveled for between 40 and 90 minutes. Median travel times varied between 40 and 60 minutes.”

At one university, every additional 10 minutes of commuting reduced the likelihood of progression beyond end-of-first-year assessments by 1.5 per cent. At another, the prospect of continuation declined by 0.63 per cent with each additional 10 minutes of travel.

At yet another institution, a one-minute increase in commute was associated with a 0.6 per cent reduction in the chances of a student’s continuing, although at this university it was only journeys of more than 55 minutes that were particularly problematic for younger students, and this might reflect the area these students were traveling from.
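Taken at face value, the reported per-10-minute effects can be read as simple linear penalties on the chance of progression. A back-of-the-envelope sketch (my own illustration, assuming the effects really are linear in minutes, which the 55-minute threshold at the third institution suggests they may not be):

```python
# Back-of-the-envelope reading of the reported commute effects, assuming
# a linear relationship between extra minutes and progression chances.
def progression_drop(extra_minutes: float, drop_per_10min: float) -> float:
    """Total percentage-point drop for a given extra commute time."""
    return extra_minutes / 10 * drop_per_10min

# A 60-minute commute versus a 20-minute one, at the two reported rates:
print(progression_drop(40, 1.5))   # → 6.0 percentage points
print(progression_drop(40, 0.63))  # → 2.52 percentage points
```

So even under the milder of the two reported effects, the student at the far end of the median travel-time range faces a measurably lower chance of progressing than a near neighbour of the campus.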

I think there are a number of implications from this study. It is highly probable that the students traveling the longest distances are either living with their parents or cannot afford the increasingly expensive accommodation in central London. Commuting is thus effectively a barrier to less well-off students. It is also worth noting that much work in Learning Analytics has focused on predicting which students are likely to drop out. Most reports suggest that failure to complete, or to succeed in, initial assignments is the most reliable predictor. Yet it may be that Learning Analytics needs to take a wider look at the social, cultural, environmental and financial context of student study, with a view to providing more practical support for students.

I work on the LMI for All project, which provides an API and open data for Labour Market Information, mainly for use in careers counselling, advice and guidance and to help young people choose their future careers or education. We already provide data on travel-to-work distances, based on the 2011 UK census. But I am wondering if we should also provide data on housing costs, possibly on a zonal basis around universities (although I am not sure if there is reliable data). If the distance (and time) traveling to college is so important to student attainment, this may be a factor students need to include in their choice of institution and course.


Data and the future of universities

I’ve been doing quite a lot of thinking about how we use data in education. In the last few years two things have combined – the computing ability to collect and analyse large datasets, allied to the movement by many governments and administrative bodies towards open data.

Yet despite all the excitement and hype about the potential of using such data in education, it isn't as easy as it sounds. I have written before about issues with Learning Analytics – in particular that it tends to be used for student management rather than for improving learning.

With others I have been working on how to use data in careers advice, guidance and counselling. I don't envy young people today in trying to choose a university or college course and career. Things got pretty tricky with the great recession of 2009. I think just before the banks collapsed we had been putting out data showing that banking was one of the fastest-growing jobs in the UK. Add to the unstable economies and labour markets the increasing impact of new technologies such as AI and robotics on future employment, and it is very difficult for anyone to predict the jobs of the future. And the main impact may well be not so much in new occupations emerging, or occupations disappearing, but in the changing skills and knowledge required in different jobs.

One reaction to this from many governments, including the UK's, has been to push the idea of employability. To make their point, they have tried to measure the outcomes of university education. But once more, just as student attainment is used as a proxy for learning in many learning analytics applications, pay is being used as a proxy for employability. Thus the Longitudinal Education Outcomes (LEO) survey, an experimental survey in the UK, uses administrative data to measure the pay of graduates after 3, 5 and 10 years, per broad subject grouping per university. The trouble is that the survey does not record where graduates are working. And one thing we know for a certainty is that pay in most occupations in the UK varies greatly between regions. The LEO survey presents a wealth of data, but it is pretty hard to make any sense of it. A few things stand out. First, UK labour markets look pretty chaotic. Secondly, there are consistent gender disparities for graduates of the same subject group from individual universities. Thirdly, prior attainment before entering university seems a pretty good predictor of future pay post-graduation. And we already know that prior attainment is closely related to social class.

A lot of this data is excellent for research purposes, and it is great that it is being made available. But the collection and release of different datasets may also be ideologically determined, shaping what we want potential students to be able to find out. In the same way, collecting particular data gives a strong steer to the directions universities take in planning for the future. It may well be that a broader curriculum and more emphasis on process and learning would most benefit students. Yet the steer towards employability could be seen to encourage a narrower focus on the particular skills and knowledge employers say they want in the short term, and to inhibit the wider debates we should be having around learning and social inclusion.