For anyone interested in privacy and security, it has been difficult in the last couple of days to avoid the congressional hearing and Mark Zuckerberg’s testimony on the Cambridge Analytica data story.  In between some of the amusing back-and-forths as Zuckerberg tried to explain basic concepts of how “the internets” works to people with zero technical comprehension, and beyond the hype of the media circus around the hearing, there seems to have been a fundamental shift in the way certain online activities will be perceived.

At the core of this story is the use of apps on Facebook, and the use of your Facebook credentials to log in to other services – either embedded on the site or elsewhere.  Users were invited to log in to “This Is Your Digital Life” to learn interesting things that could be shown from their online data… and of course immediately provided all that data to that system as part of the process.

This is of interest to REFEDS and Identity Federations for very good reasons.  The design of our approach – letting users log in with a single account to other services – operates in exactly the same field as Facebook Login or “Sign Up With Google”, but with a combination of the very real desire to ensure R&E federations are privacy preserving and the fear among organisations that they might be fined for releasing data.  This means that services which legitimately need basic information about users often struggle to get that data from us.  Logging in with social media accounts to scholarly services may “just be easier”.  This latest news shows why easier is often not better.

Cambridge Analytica: was it a data breach?

Facebook has been consistent in referring to the Cambridge Analytica data usage as a data breach, and most of the press stories have picked up that vocabulary and run with it…but was it actually a breach?

A breach can be defined as “a security incident in which sensitive, protected or confidential data is copied, transmitted, viewed, stolen or used by an individual unauthorized to do so”.

The key word here being “unauthorized”.  The app in question certainly wasn’t unauthorised – Facebook permitted it to use their system.  Users had to log in and “consent” to use the app (more on that below), so again the data was being passed across in EXACTLY the way that the system is designed to work.  Any cute quiz or questionnaire on Facebook really has no interest in helping you find out which superhero’s pet you are most likely to be – the aim of these surveys is always to collect data from you.

Something we have to ask in the “This Is Your Digital Life” scenario is whether Facebook understood that this data was being passed on to Cambridge Analytica and that permission existed for them to further process the information.  Different messages are being spread about this and there is no conclusive answer either way.  These kinds of scenarios are exactly why the new data protection regulation has moved towards stressing not only WHAT you collect but WHY you collect it and WHO you share it with.

Whether this can really be classified as a data breach is hard to call, but it certainly was a breach of privacy expectations for users.

Harvesting by design

The General Data Protection Regulation (GDPR) is built around a principle of “privacy by design” – a useful catchphrase to use when explaining the document.  It simply means that every single system we build and use should start from the basis of protecting the privacy of the user, and work out from there.  This is clearly at odds with what we have tended to call social media approaches, where the user is actively encouraged to disengage from any concept of privacy.  As The Circle says, “Sharing is Caring”.  [1]

This certainly used to be the attitude of Facebook.  Back in 2011, Mark Zuckerberg stated that “People have really gotten comfortable not only sharing more information and different kinds, but more openly and with more people….that social norm is just something that has evolved over time.” [2]

We’ve been pretty used to information about us being out there, and most vaguely clued-up users of social media will tell you that they never really had any expectations that their data wasn’t being used.  So why is it so different this time?  In accepting that companies will use the data, most people will also think “but what can they possibly do with posts about my cats and latte drinking habits?”  We now know the answer is – A LOT.  The clear evidence that this data has been used to influence voters recalls a statement made by Neelie Kroes in 2014: “It is clear that the cord connecting technology and democracy has been severed. This is bad for democracy and bad for technology and it will not be easy to stitch the two back together.” [3]

Consent and legitimate interest

How often have you logged in to another service with Facebook?  Most people I ask say “not very often”.  Now check the “Apps and Websites” menu under Settings on Facebook.  How many apps are authorised to have access to your account?  More than you thought?  It’s very easy to permit access to an app because you want to do something quickly, and then forget exactly how many times you have done this.  Many of these apps and websites are quite responsible in what they collect, but some are not.  You may find yourself asking “why does a bubble pop game need to access my religious views?” and also “seriously, there’s a religious views field on Facebook profiles?  Thank goodness I never filled that in.”

Services will argue that you have consented to all of this information being shared.  When you choose to log in with Facebook, a box appears telling you exactly what information will be shared with the app.  In many scenarios, however, this “consent” is not consent at all.  The GDPR is very clear that consent must be unambiguous and freely given.  If you have to release unnecessary data in order to use a service, that is not consent.
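To make the mechanics concrete: in OAuth 2.0-style “login with” flows, everything that consent box lists is encoded as scopes in the authorization request the app constructs.  Here is a minimal sketch in Python – the endpoint, client id and scope names are invented for illustration (real providers define their own), but the shape of the request is the standard one: the app, not the user, decides up front how much data to ask for.

```python
from urllib.parse import urlencode

# Hypothetical sketch of an OAuth 2.0 authorization request.
# Endpoint and scope names are illustrative, not a real provider's API.
def build_authorization_url(client_id, redirect_uri, scopes):
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        # The scope parameter carries the whole consent story:
        # everything the app will receive is listed here, space-separated.
        "scope": " ".join(scopes),
    }
    return "https://idp.example.org/oauth2/authorize?" + urlencode(params)

# A quiz app asking for far more than it needs -- the pattern described above.
url = build_authorization_url(
    "quiz-app-123",
    "https://quiz.example.com/callback",
    ["public_profile", "email", "user_friends", "user_religion_politics"],
)
print(url)
```

The user’s only choices at the consent screen are typically “accept everything in that scope list” or “don’t use the service” – which is exactly why the GDPR’s “freely given” test is so hard for these flows to pass.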

Services in question must also have a legitimate interest in the information that they gather.  This interest can be pretty broad and can absolutely include the interest in marketing to you, but it has to be in the context of the service in question.

In some ways, this is why “This Is Your Digital Life” was pretty clever.  It’s quite easy for an app that is all about analysing your digital activity to, well, need access to all the data about your digital activity.  However, even if the user did agree to that not-really-consenting dialogue box AND wanted to use the app, it’s pretty clear that users weren’t informed about who else would be using the data, and that the data was used in a different context from the “permission” given.  This is again why the GDPR puts so much emphasis on telling users WHY you need their data and WHO you share it with.

Uneasy neighbours

Several years ago, at TNC2012, I argued that security, privacy and usability were very uneasy neighbours, and this continues to be true.  Michael Hausding from SWITCH-CERT has written a great piece on the problems caused by the GDPR and “WHOIS”, a topic that is high priority in the CSIRT space at the moment.  As Michael points out, the move to hide the personal data of natural persons will also provide greater protection for criminals and make contacting domain owners during incidents more difficult.

Back to our Facebook story, and we see the tension between usability and privacy.  Clicking on that “login via Facebook” button is just so quick and easy that the potential problems it causes can easily be brushed aside.

Here’s me again, talking to the EUNIS 2014 conference on the theme of “Anonymity, Security, Trust and Me”.  The talk gives several examples of the problems of balancing privacy, security, usability and anonymity in our everyday transactions – particularly in the face of laws and regulations that don’t map well to the non-geographically bound world of online interaction.

So What Next?

One of the biggest potential fallouts of this recent story could be Privacy Shield.  Privacy Shield was brought in to replace Safe Harbor as a mechanism for sharing data between the EU and the US – but the same campaigners who brought down Safe Harbor continue to use Facebook as an example that the protections provided by Privacy Shield are not adequate.  This has limited impact on REFEDS, as many of the organisations we work with are not eligible for Privacy Shield, but it will cause headaches for a lot of people undergoing GDPR assessments right now.

In terms of the work that we do in REFEDS – we continue to work to ensure that R&E Identity Federations support principles of privacy by design, but ALSO to ensure that data can flow when and where it should.  A lot of the focus with the GDPR is on what you can’t do – but one of the first sentences of the GDPR states that it exists to protect but also “to ensure the free flow of personal data between Member States” (Recital 3).

REFEDS continues to strive to enable that flow of data by providing mechanisms such as the Research and Scholarship Entity Category, Sirtfi and support for the GÉANT Code of Conduct, which are GDPR-compliant ways of releasing data.  We are also actively looking at how to make these official safeguards under the GDPR, as explained on the REFEDS wiki.
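For readers running an identity provider, a sketch of what the Research and Scholarship Entity Category looks like in practice may help.  Assuming a Shibboleth IdP (other SAML products have equivalents), a single attribute filter policy can release the small, fixed R&S attribute bundle to any service provider tagged with the R&S entity category in federation metadata – the attribute ids below are the conventional ones and may differ in your local configuration:

```xml
<!-- Sketch: release the R&S attribute bundle to any SP tagged with the
     R&S entity category in metadata. Attribute ids are illustrative and
     should match your own attribute-resolver configuration. -->
<AttributeFilterPolicy id="releaseToRandS">
    <PolicyRequirementRule xsi:type="EntityAttributeExactMatch"
        attributeName="http://macedir.org/entity-category"
        attributeValue="http://refeds.org/category/research-and-scholarship"/>
    <AttributeRule attributeId="displayName" permitAny="true"/>
    <AttributeRule attributeId="givenName" permitAny="true"/>
    <AttributeRule attributeId="sn" permitAny="true"/>
    <AttributeRule attributeId="mail" permitAny="true"/>
    <AttributeRule attributeId="eduPersonPrincipalName" permitAny="true"/>
    <AttributeRule attributeId="eduPersonScopedAffiliation" permitAny="true"/>
</AttributeFilterPolicy>
```

The point of the category is exactly the balance discussed above: data flows, but only a minimal, well-defined set, and only to services that have been vetted against the R&S criteria – rather than the all-or-nothing scope list of a social login.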

If you have any concerns, questions or ideas about any of the topics covered in this blogpost, please do not hesitate to get in touch.