Anonymising open data

Here is the next in our occasional series about open and linked data. I wrote in a previous post that we are working on developing an application for visualising Labour Market Information for use in careers guidance.

One of the major issues we face is the anonymity of the data. Fairly obviously, the more sources of data are linked, the more possible it may become to identify people through the data. The UK Information Commissioner’s Office has recently published a code of practice on “Anonymisation: managing data protection risk” and set up an Anonymisation Network. In the foreword to the code of practice they say:

The UK is putting more and more data into the public domain.

The government’s open data agenda allows us to find out more than ever about the performance of public bodies. We can piece together a picture that gives us a far better understanding of how our society operates and how things could be improved. However, there is also a risk that we will be able to piece together a picture of individuals’ private lives too. With ever increasing amounts of personal information in the public domain, it is important that organisations have a structured and methodical approach to assessing the risks.

The key points about the code are listed as:

  • Data protection law does not apply to data rendered anonymous in such a way that the data subject is no longer identifiable. Fewer legal restrictions apply to anonymised data.
  • The anonymisation of personal data is possible and can help service society’s information needs in a privacy-friendly way.
  • The code will help all organisations that need to anonymise personal data, for whatever purpose.
  • The code will help you to identify the issues you need to consider to ensure the anonymisation of personal data is effective.
  • The code focuses on the legal tests required in the Data Protection Act.

Particularly useful are the appendices, which present a list of key anonymisation techniques, examples and case studies, and a discussion of the advantages and disadvantages of each. These include:
  • Partial data removal
  • Data quarantining
  • Pseudonymisation
  • Aggregation
  • Derived data items and banding
The report is well worth reading for anyone interested in open and linked data – even if you are not from the UK. Note that for some reason the files are downloading with an ashx suffix, but if you just change this locally to pdf they will open fine.
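To make a few of the techniques listed above more concrete, here is a minimal Python sketch of pseudonymisation, banding and aggregation. It is only an illustration and does not come from the code of practice: the field names (name, occupation, age), the key, the ten-year bands and the minimum cell size of three are all assumptions made up for the example.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key, for illustration only. In practice it would be
# kept separately from the data and never published.
SECRET_KEY = b"example-key-not-for-real-use"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), shortened to 16 hex characters."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def band_age(age: int, width: int = 10) -> str:
    """Banding: map an exact age to a range such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def aggregate(records, min_count: int = 3):
    """Aggregation: count people per (occupation, age band), dropping cells below min_count."""
    counts = Counter((r["occupation"], band_age(r["age"])) for r in records)
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Invented sample records; not real data.
records = [
    {"name": "Alice", "occupation": "nurse", "age": 31},
    {"name": "Bob", "occupation": "nurse", "age": 34},
    {"name": "Carol", "occupation": "nurse", "age": 38},
    {"name": "Dave", "occupation": "nurse", "age": 52},
    {"name": "Erin", "occupation": "teacher", "age": 45},
    {"name": "Frank", "occupation": "teacher", "age": 47},
    {"name": "Grace", "occupation": "teacher", "age": 41},
]

if __name__ == "__main__":
    # Record-level view: names replaced by pseudonyms, ages replaced by bands.
    for r in records:
        print(pseudonymise(r["name"]), r["occupation"], band_age(r["age"]))
    # Aggregated view: only cells with at least three people are kept.
    print(aggregate(records))
```

In this sketch the single nurse in the 50-59 band is left out of the aggregated table, because a cell containing one person could point to an individual. Note that pseudonymised records can still be re-identified if the key leaks or if the remaining fields are distinctive enough, which is why a structured assessment of risk still matters.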

Open data and Careers Choices

A number of readers have asked me about our ongoing work on using data for careers guidance. I am happy to say that after our initial ‘proof of process’ or prototype project undertaken for the UK Commission for Employment and Skills (UKCES), we have been awarded a new contract as part of a consortium to develop a database and open API. The project is called LMI4All and we will work with colleagues from the University of Warwick and Raycom.

The database will draw on various sources of labour market data including the Office for National Statistics (ONS) Labour Force Survey (LFS) and the Annual Survey of Hours and Earnings (ASHE). Although we will be developing some sample clients and will be organising a hackday and a modding day with external developers, it is hoped that the availability of an open API will encourage other organisations and developers to design and develop their own apps.

Despite the support for open data at a policy level in the UK and the launch of a series of measures to support the development of an open data community, projects such as this face a number of barriers. In the coming weeks, I will write a short series of articles looking at some of these issues.

In the meantime, here is an extract from the UKCES Briefing Paper about the project. You can download the full press release (PDF) at the bottom of this post. And if you would like to be informed about progress with the project, or better still are interested in being involved as a tester or early adopter, please get in touch.

What is LMI for All?

LMI for All is a data tool that the UK Commission for Employment and Skills is developing to bring together existing sources of labour market information (LMI) that can inform people’s decisions about their careers.

The outcome won’t be a new website for individuals to access but a tool that seeks to make the data freely available and to encourage open use by applications and websites which can bring the data to life for varying audiences.

At heart this is an open data project, which will support the wider government agenda to encourage use and re-use of government data sets.

What will the benefits be?

The data tool will put people in touch with some of the most robust LMI from our national surveys/sources therefore providing a common and consistent baseline for people to use alongside wider intelligence.

The data tool will have an access layer which will include guidance for developers about what the different data sources mean and how they can be used without compromising quality or confidentiality. This will help ensure that data is used appropriately and encourage the use of data in a form that suits a non-technical audience.

What LMI sources will be included?

The data tool will include LMI that can answer the questions people commonly ask when thinking about their careers, including ‘what do people get paid?’ and ‘what type of person does that job?’. It will include data about characteristics of people who work in different occupations, what qualifications they have, how much they get paid, and allow people to make comparisons across different jobs.

The first release of the data tool will include information from the Labour Force Survey and the Annual Survey of Hours and Earnings. We will be consulting with other organisations that own data during the project to extend the range of LMI available through the data tool.

LMI for All Briefing Paper

The problem – a major shortage of jobs

We are constantly being told that we have to improve our employability skills and qualifications for finding employment. Yet whilst more qualifications may help in getting a job, from a policy perspective it ignores the obvious. There is simply a shortage of work.

New research from the UK Joseph Rowntree Foundation explores the difficulty of job searching for young people seeking low-skilled work in three areas in England and Wales. Their overall finding is that the main problem for disadvantaged young people looking for work is fundamental – a major shortage of jobs.

Other key findings include:

  • Over two-thirds of applications (69%) received no response at all.
  • 78% of the jobs applied for paid under £7 an hour, while 54% offered the minimum wage. Only 24% of the vacancies offered full-time, daytime work.
  • In the weak labour market, 10 jobseekers chased every job compared to five jobseekers in the strong one.
  • Jobseekers who do not have high-speed internet at home are at a substantial disadvantage and can only search for jobs sporadically, rather than the daily basis that is required.
  • Applications sent a week after jobs were first advertised were half as likely to receive positive responses as those sent in the first three days.
  • The research found there was strong evidence that good-quality applicants from neighbourhoods with poor reputations were not more likely to be rejected by employers.
  • However, employers expressed a preference for local candidates with easy journeys to work.

Knowledge is social

I like this presentation by Harold Jarche. In another post on his website, Harold says: “Innovation is inextricably linked to both networks and learning. We can’t be innovative unless we integrate learning into our work. It sounds easy, but it’s a major cultural change. Why? Because it questions our basic, Taylorist, assumptions about work; assumptions like:

A JOB can be described as a series of competencies that can be “filled” by the best qualified person.

Somebody in a classroom, separate from the work environment, can “teach” you all you need to know.

The higher you are on the “org chart”, the more you know (one of the underlying premises of job competency models).

PKM is a framework that enables the re-integration of learning and work and can help to increase our potential for innovation. It’s time to design workplaces for individuals, and their Personal KM, instead of getting everyone to conform to a sub-optimal structure that maximizes capital but not labour.”

Who owns the e-Portfolio?

Over the years I have had a fair bit of interest in this diagram, produced in a paper for the e-Portfolio conference in Cambridge in 2005.

I had some discussion about it with Gemma Tur at the PLE2012 Conference in Aveiro. And now Gemma, who is writing her doctoral dissertation on ePortfolios, has written to me to remind me of our discussion. Gemma says:

I thought I could add that eportfolios built with web 2.0 tools may have another process which is based on networking. Cambridge (2009, 2010) argues about the construction of two selves, the networked self and the symphonic self. The first is about documenting learning quickly, in everyday life, taking brief notes with short and quick reflection, sharing and networking. The second is about presenting learning, reorganizing learning, linking learning evidence, with longer and more profound reflection… no networking in this final stage, as it is an inner process

As I am working with learning eportfolios, with web 2.0 tools, networking is a learning process for my students. Therefore, they are building their networked self.

So, if I argue networking is an eportfolio process of web 2.0 eportfolios, who owns the process? Looking at your article and your illustration, I thought it could be a process owned by both the learner and the external world. If networking is a process of sharing, visiting, linking, connecting, commenting, does it mean that it involves both the learner and the audience? This is what I thought before you told me that it is the learner’s process for sure.

So do you think that definitely I should argue that it is only owned by the learner? Then although it could need someone else to comment and connect, in fact, the act of networking is the student’s responsibility? Is this the reason why you think that? Do you think I should argue it is owned by the learner?

These are interesting questions, with an impact on wider areas than ePortfolios. In particular I think the issue of control is important to the emerging MOOC discussion.

Returning to Gemma’s questions – although I have not read the paper – I don’t think I agree with Cambridge’s idea of the networked self and the symphonic self – at least in this context. I think that networking becomes more important when presenting learning, reorganizing learning, linking learning evidence, and engaging in longer and more profound reflection. These processes are inherently social and therefore take place in a social environment.

However, it is interesting that social networking was hardly on the radar as a learning process in 2005. And when I referred to the ‘external world’ I was thinking about external organisations – qualification and governmental bodies, trade unions and employers – rather than broad social networks. Probably the diagram needs completely redrawing to reflect the advent and importance of Personal Learning Networks.

However, despite the fact that personal social networks exist in the external world (the ‘audience’), I think the owner of the process is the learner. And I would return again to Ilona Buchem’s study of the psychological ownership of Personal Learning Environments. Ilona says:

One of the most interesting outcomes of the study was the relation between control and ownership. The results show that perceived control of intangible aspects of a learning environment (such as being able to determine the subject matter or access rights) has a much larger impact on the feeling of ownership of a learning environment than perceived control of tangible aspects (such as being able to choose the technology).

Personal Learning Networks are possibly the most important of the intangible aspects of a learning environment. The development of PLEs (which I would argue come out of the ePortfolio debate) and the connectivist MOOCs are shifting control from educational institutions to learners and, possibly more importantly, from institutions to wider communities of practice and learning. Up to now, institutions have been able to keep some elements of control (and monopoly) through verifying, moderating, accrediting and certifying learning. That is now being challenged by a range of factors including open online courses, new organisations such as the Social Science Centre in Lincoln in the UK, and Open Badges.

Such a trend will almost inevitably continue as technology affords ever wider access to resources and learning. The issue of power and control is however unlikely to go away but will appear in different forms in the future.