
Forskarbloggen: Big Brother vous regarde

We are sometimes asked what impact current research in authorship attribution may come to have on personal privacy. In the short run, I would say “not very much”. Narayanan et al. report that in over 20 % of cases, their classification tool identifies the correct author among 100 000 candidate authors. These are of course impressive figures, but as there are approximately 2 000 000 000 English-speaking users online, Internet-scale authorship attribution is not quite here. Furthermore, in the experiment by Narayanan et al., the goal is to match up different entries written by the same author on an online forum. This means that the text segments written by a single author are likely to cover a limited number of topics and to be written within a relatively short time frame, two aspects which simplify the problem. To distinguish between authors, Narayanan et al. consider features such as vocabulary, use of punctuation, and simple syntactical patterns. When the feature set is known, it is easy to confuse the system. In this particular case it should be sufficient to send the text back and forth through Google Translate, and then run a spell-checker on the result. This may confuse the intended audience as well, but the author can feel pretty safe.
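To give a feel for what "features such as vocabulary and use of punctuation" can mean in practice, here is a minimal sketch in Python. It is not the method of Narayanan et al.: the word list, the punctuation list, and the function names are all assumptions made up for illustration, and a real system would use far more features and a trained classifier instead of a simple similarity score.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny feature set chosen for illustration only;
# it is NOT the feature set used by Narayanan et al.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]
PUNCTUATION = [",", ".", ";", ":", "!", "?"]


def stylometric_features(text):
    """Return relative frequencies of function words and punctuation marks."""
    words = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(words)
    n_words = max(len(words), 1)  # avoid division by zero on empty text
    n_chars = max(len(text), 1)
    features = [word_counts[w] / n_words for w in FUNCTION_WORDS]
    features += [text.count(p) / n_chars for p in PUNCTUATION]
    return features


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_author(unknown_text, known_texts):
    """known_texts maps an author name to a sample text; return the closest name."""
    target = stylometric_features(unknown_text)
    return max(
        known_texts,
        key=lambda name: cosine_similarity(
            target, stylometric_features(known_texts[name])
        ),
    )
```

Calling `most_similar_author(some_text, {"alice": sample_a, "bob": sample_b})` returns the name whose sample has the most similar profile. It also shows why such a system is easy to confuse once the feature set is known: anything that changes word choice and punctuation habits, such as a round trip through a translation service, shifts exactly the numbers being compared.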

Big Brother graffiti in France 2

In the long run, however, the situation is more severe. There are currently great advances in machine learning: the automatic speech recognition and translation systems that power Siri and Google are automated systems trained on huge quantities of data. Since nothing that has been made public on the web can ever really be erased, the methods that we use to protect our anonymity must be effective not only against the current forms of analysis, but also against all methods developed in the future that is relevant to us (perhaps our lifetime?). For those who feel concerned, the Tor Project has several useful tools and recommendations on how to protect oneself against network surveillance and data analysis.

A related question is what responsibility falls on those of us who are active in the field. I would say that the ethics of anonymity is a complicated question in itself, and one that can be more competently investigated by researchers in the humanities. What I think we who work on the technological side can and should do is publish our results in open forums, so that the methods and algorithms are available to all, and also publish in a form understandable to the general public. What we all can and must do is continue to work towards a democratic, transparent, and tolerant society. In the wrong hands, I doubt that even Post-It notes are perfectly safe. A small push in this direction is to sign the petition for the blogger Raif Badawi, who was sentenced to 10 years in prison and 1000 lashes for starting a website for social and political debate. 1000 lashes is too much for anyone, no matter how you divide them up.

Blog post written by Johanna Björklund, CTO, for Umeå University's "Forskarbloggen" (the Researcher Blog).
