Anonymising open data

Here is the next in our occasional series about open and linked data. I wrote in a previous post that we are working on developing an application for visualising Labour Market Information for use in careers guidance.

One of the major issues we face is the anonymity of the data. Fairly obviously, the more sources of data are linked, the more possible it may become to identify people through the data. The UK Information Commissioner’s Office has recently published a code of practice on “Anonymisation: managing data protection risk” and set up an Anonymisation Network. In the foreword to the code of practice they say:

The UK is putting more and more data into the public domain.

The government’s open data agenda allows us to find out more than ever about the performance of public bodies. We can piece together a picture that gives us a far better understanding of how our society operates and how things could be improved. However, there is also a risk that we will be able to piece together a picture of individuals’ private lives too. With ever increasing amounts of personal information in the public domain, it is important that organisations have a structured and methodical approach to assessing the risks.

The key points about the code are listed as:

Data protection law does not apply to data rendered anonymous in such a way that the data subject is no longer identifiable. Fewer legal restrictions apply to anonymised data.
The anonymisation of personal data is possible and can help service society’s information needs in a privacy-friendly way.
The code will help all organisations that need to anonymise personal data, for whatever purpose.
The code will help you to identify the issues you need to consider to ensure the anonymisation of personal data is effective.
The code focuses on the legal tests required in the Data Protection Act.

Particularly useful are the appendices, which present a list of key anonymisation techniques, with examples and case studies, and a discussion of the advantages and disadvantages of each. These include:

Partial data removal
Data quarantining
Pseudonymisation
Aggregation
Derived data items and banding
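
To make three of these techniques a little more concrete, here is a minimal Python sketch of pseudonymisation, banding and aggregation. It is only an illustration under stated assumptions, not the method set out in the code of practice: the function names, the secret key, the sample records and the minimum group size of three are all invented for this example, and real data would need a proper risk assessment before release.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key: in practice this would be stored securely
# and never published alongside the data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymise(identifier, key=SECRET_KEY):
    """Replace an identifier with a keyed hash (pseudonymisation).

    The same input always gives the same token, so records can still be
    linked, but the original value cannot be read from the token.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def age_band(age, width=10):
    """Turn an exact age into a band such as '30-39' (banding)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def aggregate(records, min_count=3):
    """Count people per (region, age band) and drop small groups (aggregation).

    Groups with fewer than min_count people are suppressed, since very
    small counts can make individuals easier to identify.
    """
    counts = Counter((r["region"], age_band(r["age"])) for r in records)
    return {group: n for group, n in counts.items() if n >= min_count}


# Invented sample records, for illustration only.
records = [
    {"name": "Person A", "region": "Wales", "age": 31},
    {"name": "Person B", "region": "Wales", "age": 35},
    {"name": "Person C", "region": "Wales", "age": 38},
    {"name": "Person D", "region": "Kent", "age": 52},
]

pseudonymised = [
    {"id": pseudonymise(r["name"]), "region": r["region"], "age_band": age_band(r["age"])}
    for r in records
]
summary = aggregate(records)
```

In this sketch the single Kent record is dropped from the summary because its group falls below the assumed threshold, while the three Wales records are reported only as a count for the 30-39 band.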

The report is well worth reading for anyone interested in open and linked data, even if you are not from the UK. Note that, for some reason, the files download with an .ashx suffix. If you change this locally to .pdf, they will open fine.
