I had a discussion the other day about GDPR, ePrivacy and all those problems for publishers, and I pointed out that the way we think about this is wrong.
We were never supposed to be able to do what most publishers and tech companies do today. In fact, what if I were to tell you that the original specification for how cookies should be implemented in browsers pretty much defined what GDPR is today?
Yep, that’s right. The original cookie specification was GDPR compliant. It was only later that the ad tech industry completely messed everything up.
It’s a really fascinating story that you, as a publisher, should know, because it puts the problem that we face today into perspective.
So, let me tell this story.
Before cookies we had a passive web
Cookies today have a really bad reputation, and many people are saying we shouldn’t even have cookies at all. But it’s important to remember why the cookie was actually invented in the first place.
Before cookies, the web was solely a place where you could look up information on a page. You would type in a URL or click on a link, and then you would see what was on that page. But neither the browser nor the server had any idea about who you were or what you had done before.
For a purely text-based website, this was not a problem. But imagine that you wanted to create a web shop.
If the browser was unable to remember you between pages, how would you buy something?
Take something like a shopping cart. If you tried to do this before cookies, people could put a product into a shopping cart on the first page they visited, but as soon as they clicked on anything else, the site would think this was a completely new visit, and… well… the shopping cart would be empty.
This was not very useful, and we needed to find a way to fix this. And so back in 1994, Netscape and MCI got together to invent a solution that would allow websites to keep track of information across pages to enable things like web shops.
And the solution that they came up with was the cookie. A very small text file that contained a small amount of data that could be used to identify people or to, for instance, store a shopping cart ID.
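To make this a bit more concrete, here is a minimal sketch in Python of the round trip a shopping cart cookie makes. It uses the standard library's http.cookies module. The cookie name "cart_id" and its value are made up for illustration, and a real shop would of course do much more than this.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header sent along with the first response.
# "cart_id" and "abc123" are hypothetical values used only for this example.
response_cookie = SimpleCookie()
response_cookie["cart_id"] = "abc123"
response_cookie["cart_id"]["path"] = "/"
set_cookie_header = response_cookie.output(header="Set-Cookie:")

# Browser side: on every later request to the same site, the browser sends
# the stored value back to the server in a Cookie header like this one.
cookie_header = "cart_id=abc123"

# Server side again: parse the Cookie header to find the visitor's cart.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header)
cart_id = request_cookie["cart_id"].value
```

The important part is the last step. Because the browser sends the value back on every request, the site can recognize the same cart from page to page.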
This was great. In fact, the invention of the cookie was the single most important thing we have ever added to the internet. It changed the internet from just a passive place for us to read things, to an interactive place. Just think about how many of the things you use every single day rely on cookies to work.
Take something like Twitter or Instagram. Without cookies, they would not be able to update without you logging in every single time. Meaning every time you wanted to see a new tweet or post a reply, you would have to log in again. Every time your browser had to load anything, the site would have forgotten who you were.
Imagine how annoying this would be?
Or think about what we do as publishers. Take subscriptions. Without cookies, we would have to ask people to manually log in every single time they clicked on a link. Not just the first time, but between every page view.
If people visited your front page, they would have to log in to read that. Then as they looked at all the articles, every time they clicked on any of them, they would have to log in again … and again.
So, the invention of cookies is amazing. The internet wouldn’t work without it.
The problem, of course, was that it also introduced a huge potential for abuse.
Think about what a cookie is doing. It is identifying who you are, and keeping track of you over time.
This has obvious problems because there are about a million ways that this could be used wrongly. Not only could this be used to violate people’s privacy, there were also many security problems.
So, in 1994 Netscape implemented the cookie in their browser, but they never really told the public about it. And also, there were very few safeguards put into place. And over the next couple of years, as developers started realizing what they could do with cookies, all the bad things started happening.
It was like a gold rush for bad people!
This went on until February 1996, when the Financial Times realized what was going on, and they published the first article about the dangers of cookies.
It was called “This bug in your PC is a smart cookie” (archived in full here):
Considering that this was written in 1996, it is a really good article. It highlighted the problems with cookies in much the same way as we talk about them today. It pointed out that people had not been made aware that cookies exist, and that people had no control over them.
As they wrote:
…keeping tabs on the behavior of customers are possible today in cyberspace. Technology is already in place – and ready to be put to use on the World Wide Web of the Internet – that will allow Web site owners to gather an alarming range of information on people who look at their Web pages from PCs at home.
Most internet users are not aware that such possibilities exist. They believe, correctly, that when they surf the Web, the information sent from their PC to the Web site is an IP address – a string of digits that specify the Internet location of the computer they are logging in from. Tracking down the customer from that information alone is an inexact science, since a single IP address can be shared by hundreds of people working at a company or thousands of people using an online service.
But the leading software used on the Web contains a little-known wrinkle that increases the power of companies to find out who their customers are and what they are up to. It allows companies to track which Web pages an individual looks at, when, for how long, and in what order.
That information can be tallied against information the customer provides of his own free will – for instance, when he ‘registers’ for membership by giving a name and e-mail address or provides a credit card number and address when ordering a delivery – to produce a comprehensive record of individual behaviour.
Most extraordinary of all, this information can be stored on customers’ own PCs without their knowledge. It can be kept in a form so that only the company that collected the information can benefit from it. And when the customer connects to the Web site, the site can silently interrogate this PC and pick up the information.
It then continues:
Yet the tale of these cookies is an illustration of the possibilities that internet marketing opens up. In the old days, placing an advertisement was like firing a blunderbuss; remember the old quip that half the money spent on advertising was wasted, but no-one knew which half. Today, technology has created a silver bullet that allows companies to target people individually.
In the long term, this is a good thing, for it will tailor advertising more closely to what consumers want. But at stake is the issue of privacy, which needs to be debated.
The only consolation is that breaches of privacy using this technology are unlikely to have any life-and-death consequences. The worst thing that most companies will do, after all, is try to sell you something.
Despite the somewhat conciliatory conclusion, this article had a big impact. Soon after, many other newspapers started looking into this, which led to two Federal Trade Commission hearings, and the web industry felt pressured to respond.
BTW: As a complete side-note. On the same day, the Financial Times also wrote about the V-chip and the ‘Clean TV’ regulation that was supposed to limit the broadcasting of sexual and violent shows. And the TV Networks were reluctant to act. As they said:
With the V-chip law now in place, US TV networks and cable companies are under pressure to respond, but it is not yet clear whether they will raise legal challenges to the new law. The Industry risks a backlash of public opinion if it opposes the law, but fears the loss of advertising revenue on “X-rated” programmes if it complies.
I just find this amusing, because it’s exactly the same way publishers are today responding to the upcoming ePrivacy law. So, 1996 was an exciting year.
Anyway, back to the cookie.
So, before all of this started, back in 1995, Netscape had started trying to create a formal cookie specification and different working groups were coming up with different ideas. But after the Financial Times article, this work was suddenly put into high gear.
But not only that, they also seemed to realize that they needed to make some dramatic changes.
As Wikipedia puts it:
In February 1996, the working group identified third-party cookies as a considerable privacy threat. The specification produced by the group was eventually published as RFC 2109 in February 1997. It specifies that third-party cookies were either not allowed at all, or at least not enabled by default.
Yep, you read that right.
The original cookie specification did not allow third-party cookies. All the problems that we have today could have been solved back then if the industry had just followed the original specification.
But, the whole document is pretty amazing, and it sounds almost like how GDPR is defined.
Let me give you some of the details.
The first problem that they wanted to address was the problem of websites setting cookies for other domains than the one you were visiting.
This is a general problem that we have today, and I can illustrate this in a very simple way.
If I take a completely fresh browser (with nothing stored), and visit a web site, say, like one of the newspapers in my country, as a reader, I’m only visiting that one website domain.
However, when we then check what cookies were saved by my browser, we suddenly see a very long list of domains that have apparently set cookies in my browser.
As a reader, I didn’t visit any of these sites. None of these domains ever showed up in the URL bar in my browser. Instead, these were set entirely in secret, behind the scenes.
Think about how insane this is. Here we have hundreds of completely unknown secretive ad tech companies, tracking what I’m doing, without me ever visiting any of them directly.
This is not okay.
What is worse about this is that these companies provide no transparency of any kind.
In the media industry, we spend so much time pointing to Google or Facebook. But I’m not really worried about them. Google and Facebook are big companies that I know about. And they are constantly facing public scrutiny.
But for all these hundreds of unknown companies that are on publishers’ sites? I have no idea who they are or what they are actually doing. This is the dark web of ad tech. And it’s all happening via publishers’ sites. And it’s exactly this problem that the Financial Times pointed out in 1996.
So, when the cookie specification was originally defined, they realized this could be a problem, and they decided that this should not be how cookies were used. And in the specifications they came up with a rule as to when browsers were required to reject cookies from being read or set.
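They wrote that, to “prevent possible security or privacy violations”, a browser must reject a cookie if any of four conditions is true: the cookie’s path is not a prefix of the request-URI; the cookie’s domain contains no embedded dots or does not start with a dot; the host being requested does not domain-match the cookie’s domain; or the host being requested has more than one extra level in front of the cookie’s domain.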
Two of these are just technical specifications for how cookies are defined, but the other two are very interesting.
First of all, it says that when defining the path in a cookie, that path must be a prefix of the original request-URI … or in plain words, it must match the start of the path in the URL being requested, which for a page you visit is the one you see at the top of your browser.
Secondly, they talk about domain-matching. They say that the domain for the cookie must match the domain that people are visiting.
This means that if you visit a newspaper site, it should not be able to set cookies for other domains. Or in other words, third-party cookies are not allowed. And they say that this must be the rule to “prevent possible security or privacy violations”.
This is amazing. They realized the problem with cookies all the way back in 1997. They knew how this could be abused, and they instructed browsers to stop it by rejecting any cookie that violated this rule.
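For the technically curious, the rejection checks can be sketched roughly like this in Python. This is my simplified reading of section 4.3.2 of RFC 2109, not a complete or authoritative implementation. The function names are hypothetical, and the domain-matching is deliberately reduced to the basics.

```python
import ipaddress


def is_ip_address(host: str) -> bool:
    """Return True if host is a literal IP address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_matches(host: str, domain: str) -> bool:
    """Simplified domain-match: identical names, or host ends in '.domain'."""
    host, domain = host.lower(), domain.lower()
    if host == domain:
        return True
    if is_ip_address(host):
        return False
    return (domain.startswith(".")
            and host.endswith(domain)
            and len(host) > len(domain))


def should_reject(request_host: str, request_path: str,
                  cookie_domain: str, cookie_path: str) -> bool:
    """Return True if a browser following these rules would refuse the cookie."""
    # 1. The cookie's Path must be a prefix of the path that was requested.
    if not request_path.startswith(cookie_path):
        return True
    # 2. The Domain must start with a dot and contain an embedded dot,
    #    so something as broad as ".com" is refused.
    if not cookie_domain.startswith(".") or "." not in cookie_domain[1:]:
        return True
    # 3. The host that was requested must domain-match the cookie's Domain.
    if not domain_matches(request_host, cookie_domain):
        return True
    # 4. The part of the host in front of the Domain must not contain a dot.
    if not is_ip_address(request_host):
        prefix = request_host[: -len(cookie_domain)]
        if "." in prefix:
            return True
    return False
```

With this sketch, a response from www.example.com may set a cookie for .example.com, but an attempt to set one for .tracker.net is refused by the third check.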
I just love this!
So why do we have this problem today? Well, there are two reasons, and I will get back to the second reason in a moment. But the main reason is how the ad tech world worked around this.
You see, when I go to a newspaper site, all these extra cookies that are being set aren’t actually being set by the newspaper. Instead, the newspaper has embedded all these third-party scripts, which then set the cookies from their own domains. So technically, they can claim that the domains match, because they are actually setting the cookies from them.
This, of course, is a load of bullshit, because they are basically just saying “screw you” to the user. Technically, they are right, but this is clearly not how this was intended.
But, this is where the original specification is even more amazing, because they realized that the ad-tech companies would probably do this. And so they tried to stop this as well.
What they did was that they started talking about when cookies were allowed to be set automatically, and when they were not. And they called this “verifiable transactions” versus “unverifiable transactions” (or, as I will call them here, verified and unverified transactions).
Essentially, what it means is that something is verified when the user knows what domain they are on. So for instance, if I visit the New York Times, I know that I’m on their site, and then the browser is allowed to set a cookie for the nytimes.com domain.
But, any other type of request, like when content is loaded from some code embedded into the site, which people would have no way of knowing about, that is an unverified transaction.
And they go on to define that in all those unverified cases, cookies can only be set for domains that match the ‘origin’ request. Meaning, only to the original domain.
This is another way of saying that third-party cookies are not allowed, and ad tech companies can’t ‘cheat the system’ by just claiming that the domain matched because it was embedded in some part of the page.
In a later revision of the original specification, from 2000 (RFC 2965), they make this even clearer. Here they very strongly tell browsers that they cannot set or read cookies for unverified transactions to third-party sites.
They even called out the bad actors as the reason for this rule.
They also gave an example of this. They used the example of the difference between a link that people had chosen to click on (a known action), and an image being automatically loaded by the page (an unknown action).
Here, again, they are talking about the difference between whether people understand what site they are interacting with or not. When you click on a link, you know what site you are visiting, but when loading an image, people wouldn’t have any idea where that image is actually loaded from.
This should remind you of something, because this is essentially what we now call the Facebook Pixel.
The Facebook Pixel is a tiny image that many publishers have embedded into their sites that, when automatically loaded, allows Facebook to record and track people … which can then be used for a number of things, including ad targeting.
What they were saying, in 1997, was that this would be an unverified transaction, and should therefore not be allowed to store cookies.
They said this 7 years before Facebook was even invented, because they knew that some bad companies would be tempted to do this.
They also introduced an exception. They realized that there might be some cases where this type of functionality was needed, and so they allowed the ability for people to change this configuration. But, as they also said, this should only be allowed so long as this override is defaulted to “off”.
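Put together, the rule looks something like the following Python sketch. This is only my illustration of the principle, not the wording of the specification. The function names, the parameters and the example hosts are all hypothetical, and the way it decides whether two hosts belong to the same site (comparing the last two parts of the host name) is a naive assumption. Real browsers use a maintained list of public suffixes for that.

```python
def site_of(host: str) -> str:
    """Naive assumption: the 'site' is the last two labels of the host name."""
    return ".".join(host.lower().split(".")[-2:])


def may_set_cookie(page_host: str, request_host: str,
                   user_initiated: bool,
                   allow_third_party: bool = False) -> bool:
    """Decide whether a response from request_host may set a cookie.

    page_host is the site the reader chose to visit. user_initiated is True
    for a link the reader clicked (a verified transaction) and False for
    something the page loads by itself, like an embedded image or script.
    """
    if user_initiated:
        return True  # the reader knows which site they are talking to
    if site_of(page_host) == site_of(request_host):
        return True  # loaded automatically, but from the site being visited
    # Unverified request to a third party: only allowed if the reader has
    # switched on the override, which is off by default.
    return allow_third_party
```

So a tracking pixel from pixel.tracker.example embedded on www.newspaper.example gets no cookie, unless the reader has deliberately changed the setting.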
Again, this is absolutely amazing.
But more than that. This is GDPR!
You were not allowed to just set third-party cookies by default. No… you had to provide a configurable option that people manually had to change before you were allowed to set any third-party cookies.
In fact, this whole thing sounds a lot like GDPR, including the part of verified and unverified transactions. In GDPR, this is defined as the difference between legitimate interest and explicit consent.
For example, with GDPR, you are allowed to set a cookie to manage a shopping cart as a ‘legitimate interest to fulfil a contract’, because that is something people expect your site to do. But you are not allowed to use legitimate interest to set third-party cookies from embedded ad blocks, because people would not be able to understand what was happening behind the scenes.
So the original cookie specification was GDPR compliant!
They also talked about the potential problem with something they called ‘cookie sharing’. Today the ad tech world calls this ‘cookie matching’, where they try to match individual cookies set for different domains to the same person, which would allow them to track people across services and sites.
And the original cookie specification frowns upon this. They are specifically calling this out as something that we should make every effort to prevent.
This also applies to other things, like browser-fingerprinting, or how Google will now try to identify people using machine learning without actually using cookies. Sometimes this is called “cookie-less tracking”.
Even though those technologies don’t actually use a cookie, the principle still applies. This is a bad thing that these companies are doing, and we should make every attempt to prevent it.
Again, think about this. They wrote this in 1997!
Finally, they even started talking about cookie management. They said that people should be in control of their data, and if they choose to, they should be allowed to delete it.
They also suggested that this should be made possible to control via some type of interface, and they ended by pointing out that “privacy considerations dictate that the user have considerable control over cookie management”.
Again, this is completely the same as what we see now with GDPR. GDPR also dictates that people should be in control of their data. That publishers must provide a way for people to manage it, and if they choose so, to delete it.
But again, it’s amazing that all of this was defined in the original specification from 1997.
So, where are we today?
Well, as we all know, we messed this up. The ad tech world today is now doing every single thing that we in 1997 said they shouldn’t be allowed to do.
The browsers messed up too! They failed to implement this cookie specification the right way. They failed to reject cookies from being set when the transaction was unverified or when the domain didn’t match.
They allowed the ad tech companies to work around the specifications by setting the cookies in embedded code blocks or tracking ‘pixels’, even though that was explicitly mentioned as something that should not be allowed to happen.
And when we started seeing the problem with cookie matching, fingerprinting and now machine learning, the big browsers were very slow to do anything about it.
We messed up!
What is even worse is that the cookie specification has since been updated several times, and in the latest version (from 2011), they have basically given up trying to do this right.
Here is what the specification says today:
7.1. Third-Party Cookies
Third-party cookie blocking policies are often ineffective at achieving their privacy goals if servers attempt to work around their restrictions to track users. In particular, two collaborating servers can often track users without using cookies at all by injecting identifying information into dynamic URLs.
Basically, what they are saying is: “Yes, we see that there is this big problem with third-party cookies, but we don’t want to get involved, and instead it should just be up to industry to ‘balance’ how much people’s privacy should be violated. And besides, even if we tried to fix this, it probably wouldn’t work because there are other ways to do bad things too.”
This is embarrassing to read. Compare this to what it said in 1997 and 2000.
In fact, this sounds like ad tech lobbyism. It’s a complete failure of the web industry.
Again, we messed up big time! We knew how to deal with this in 1997, but we failed to implement it.
A fix is happening
Things are getting better though. Several browsers, like Firefox, Apple’s Safari and Brave (although I’m not a fan of Brave) are now blocking tracking by default. In other words, they have actually started implementing cookies the way they were originally supposed to.
Chrome on the other hand is falling behind. And I hate saying this because I have used Chrome for years and I loved it (now I use Firefox).
At the same time, we see legislation like GDPR, the upcoming ePrivacy law, and CCPA (California Consumer Privacy Act) that also step in to do what the web industry clearly could not.
However, there is one thing that really worries me, and that is what I hear from many publishing executives.
For instance, Jessica Davies from Digiday recently wrote a good article about what is happening with ePrivacy, and here she reported this:
News publisher trade groups warn the regulation will be catastrophic in its present form.
“Online news content is freely accessible to all because of its underlying cookie-based advertising business model,” said Iacob Gammeltoft, policy adviser at News Media Europe. “If advertisement cookies are undermined, journalism could ultimately be pushed behind paywalls, making it only available to those who can afford it.”
And publishers are actively trying to undermine how cookies can be controlled by the browser.
That has caused publisher trade bodies concern that publishers may be cut out of the dialogue and instead the browsers will be the consent gatekeepers. There has been to-ing and fro-ing on this for months, with the result that this article has been deleted from the current version. But according to policy advisor sources, a large number of European Union member states want to reintroduce it. That has sent shivers down the spine of publishers across Europe.
In other words, we are now seeing publishers arguing that they should be allowed to violate people’s privacy because that’s how they make money.
I’m reminded here of a similar story from the US about how US car companies are dealing with climate change.
As you may know, things are a bit of a mess in the US. The US Government, with Trump at the helm, doesn’t believe in climate change and just thinks car companies should be allowed to continue to pollute as much as they want. Whereas the state of California, already feeling the impact of climate change, wants to prevent any further damage by imposing much tougher rules on emissions.
And what we are now seeing is that different car companies choose different sides to support. General Motors and Toyota chose to side with Trump, so that they don’t have to reduce emissions, whereas Ford and Honda have chosen to side with California to create a better future… as was recently reported by Hiroko Tabuchi from the New York Times.
And as Aram Zucker-Scharff commented, we are “gonna remember which car manufacturers decided to side against the planet”.
He is right!
But this is the same thing we have with publishers and privacy. Just like climate change, privacy is going through a pivotal moment in history, in response to a very real trend and a very real demand for the public to fix this problem.
And so I’m really sad to see so many media people actually choosing to lobby against it, spending their time trying to convince politicians that they should be allowed to continue to track people in a privacy violating way just so that they can continue to make money the old way.
You are doing what General Motors and Toyota are doing. You are acting how the tobacco industry used to act, and how the oil industry used to. You picked the side with all the bad people.
As a media analyst, I cannot stress enough the importance of changing the way you think about this. This change will happen with you or without you, and you don’t want to end up being known as one of the publishers who tried to do the bad thing.
So, my official recommendation to you is to get on the right side of this. Just like Ford, which is now going to work with California to define a new cleaner future for cars, as a publisher, you need to start working with the public to create a privacy respecting future.
We need to invent new ad models that don’t rely on tracking people without their consent, or on telling people that you want to send their personal data to hundreds of third-party partners.
Is this going to be easy? No … but this is where the future is, and you want to be firmly on the right side when it’s all over.