Sunday, December 7, 2014

Stop Making Web Surveillance Bugs by Mistake!

Since I've been writing about library websites that leak privacy, I figured it would be a good idea to do an audit of Unglue.it to make sure it wasn't leaking privacy in ways I wasn't aware of. I knew that some pages leak some privacy via referer headers to Google, to Twitter, and to Facebook, but we force HTTPS and make sure that user accounts can be pseudonyms. We try not to use any services that push ids for advertising networks. (Facebook "Like" button, I'm looking at you!)

I've worried about using static assets loaded from third party sites. For example, we load jQuery from https://ajax.googleapis.com (it's likely to be cached, and should load faster) and Font Awesome from https://netdna.bootstrapcdn.com (ditto). I've verified that these services don't set any cookies and allow caching, which makes it unlikely that they could be used for surveillance of unglue.it users.

It turned out that my worst privacy leakage was to Creative Commons! I'd been using the button images for the various licenses served from https://i.creativecommons.org/ I was surprised to see that id cookies were being sent in the request for these images.
In theory, the folks at Creative Commons could track the usage for any CC-licensed resource that loaded button images from Creative Commons! And it could have been worse. If I had used the HTTP version of the images, anyone in the network between me and Creative Commons would be able to track what I was reading!

Now, to be clear, Creative Commons is NOT tracking anyone. The reason my browser is sending id cookies along with button image requests is that the Creative Commons website uses Google Analytics, and Google Analytics sets a domain-wide id cookie. Google Analytics doesn't see any of this traffic- it doesn't have access to server logs. But without anyone intending it, the combination of Creative Commons, Google Analytics, and websites like mine that want to promote use of Creative Commons have conspired to build a network of web surveillance bugs BY MISTAKE.

When I inquired about this to Creative Commons, I found out they were way ahead of the issue. They've put in redirects to HTTPS version of their button images. This doesn't plug any privacy leakage, but it discourages people from using the privacy spewing HTTP versions. In addition, they'd already started to process of moving static assets like button images to a special-purpose domain. The use of this domain,  licensebuttons.net, will ensure that id cookies aren't sent and nobody could use them for surveillance.

If you care about user privacy and you have a website, here's what you should do:
  1. Avoid loading images and other assets from 3rd party sites. consider self-hosting these.
  2. When you use 3rd party hosted assets, use HTTPS references only!
  3. Avoid loading static assets from domains that use Google Analytics and set id domain cookies.
For Creative Common license buttons, use the buttons from licensebuttons.net. If you use the Creative Commons license chooser, replace "i.creativecommons.org" in the code it makes for you with "licensebuttons.net". This will help the web respect user privacy. The buttons will also load faster, because the "i.creativecommons.org" requests will get redirected there anyway.

0 comments:

Contribute a Comment

Note: Only a member of this blog may post a comment.