While cookies are still widely used to track users on the Internet, recent privacy-related developments have forced marketers and companies to experiment with other means of tracking. One method that has been in use for at least several years relies on so-called ETags.
Think of an ETag as a unique value that a web server assigns to each cached element. This unique value is then compared in consecutive visits by the server to determine whether the cached file needs to be replaced. If the identifier differs, the new element is downloaded from the website and a new unique identifier is assigned to it.
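The comparison described above can be sketched in a few lines. This is a simplified illustration, not how any particular web server implements it: the function names `make_etag` and `respond` are made up for the example, and deriving the ETag from a content hash is just one common choice assumed here.

```python
import hashlib
from typing import Dict, Optional, Tuple

def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (assumed scheme: content hash)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET request for `body`."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The browser's cached copy is still current, so no payload is sent.
        return 304, {"ETag": etag}, b""
    # No cached copy, or the identifier differs: send the element and its ETag.
    return 200, {"ETag": etag}, body

# First visit: nothing is cached yet, so the full body is delivered.
status, headers, _ = respond(b"<h1>hello</h1>", None)
# Repeat visit: the browser echoes the stored ETag back in If-None-Match.
status_again, _, payload = respond(b"<h1>hello</h1>", headers["ETag"])
```

A real server would also have to handle a list of ETags in `If-None-Match` and weak validators, which this sketch leaves out.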
Since unique identifiers are assigned to cached resources, ETags can be used to track users on the Internet. What makes ETags special is that it takes some expertise to spot them. While most Internet users are aware of cookies, either directly through the web browser's cookie management option or through third-party services such as Disconnect or Mozilla Lightbeam, it is difficult to spot ETags without proper tools such as the Live Headers add-on for the Firefox web browser.
To test this right now in your browser of choice, visit Noc. Here you should see ETag information next to the other header information. Not every website that makes use of ETags is using them to track you.
The primary purpose is caching, but if you want to be on the safe side, you will handle all ETags the same way. You have several options to check if a site uses ETags. Since caching is used to set ETags, clearing the browser cache will remove them. While they will be set again the next time you visit the site, the site can no longer compare them with the old values and therefore cannot use them to track you across sessions. To find out how you can configure your browser to clear the cache, check out our guide that explains how to do so.
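To make the tracking mechanism and the effect of clearing the cache concrete, here is a minimal sketch. The `TrackingServer` class is hypothetical and only models the idea: the server hands out a unique ETag, the browser sends it back in `If-None-Match`, and the server treats it as a visitor ID.

```python
import uuid

class TrackingServer:
    """Hypothetical server that misuses the ETag as a visitor identifier."""

    def __init__(self):
        self.visits = {}  # identifier -> number of requests seen

    def handle(self, if_none_match=None):
        # A returning browser echoes its stored ETag; a new one gets a fresh ID.
        visitor_id = if_none_match or '"' + uuid.uuid4().hex + '"'
        self.visits[visitor_id] = self.visits.get(visitor_id, 0) + 1
        return {"ETag": visitor_id}

server = TrackingServer()
cached_etag = server.handle()["ETag"]        # first visit: new ID is assigned
server.handle(if_none_match=cached_etag)     # repeat visit: same ID is recognized
after_clearing = server.handle()["ETag"]     # cache cleared: nothing to echo back
```

After the cache is cleared the browser has no ETag to send, so the server sees what looks like a new visitor and cannot link the two sessions through this channel.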
In order for content to be served from the cache, the URL has to be an exact match to the content in the cache. Some web developers will add random numbers to part of the query string to ensure that the content is not cached and is always "fresh." Cache-Control headers specify whether or not the content can be cached and for how long.
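As a rough illustration of how a Cache-Control value answers the "whether" and "how long" questions, the sketch below splits a header value into its directives. The helper names are made up for the example, and the parsing is deliberately naive (it splits on commas and ignores quoted strings that contain commas).

```python
from typing import Dict, Optional

def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives

def may_store(directives: Dict[str, Optional[str]]) -> bool:
    """Whether the response may be kept in a cache at all."""
    return "no-store" not in directives

def max_age(directives: Dict[str, Optional[str]]) -> Optional[int]:
    """How long the response stays fresh, in seconds, if max-age is given."""
    arg = directives.get("max-age")
    return int(arg) if arg and arg.isdigit() else None

parsed = parse_cache_control("public, max-age=3600")
```

Here `parsed` holds the two directives `public` and `max-age`, and `max_age(parsed)` gives the freshness lifetime as a number of seconds.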
Cache-Control values can include directives such as public, private, no-cache, no-store, and max-age. If no Cache-Control or Expires headers are present, the browser will cache the content with no expiration date. Using the HTTP header is the preferred and recommended way of controlling the cache behavior. Refresh elements can be used to tell the browser to either redirect the user to another page or to refresh the page after a certain amount of time.
The refresh tag works the same way as hitting the refresh button in the browser. Even if content has a valid expiration date, the browser will ask the origin server to validate that it has not changed. This essentially defeats the purpose of setting content expiration dates.
How content is pulled from the cache on repeat visits depends on the manner in which the request is issued. While in the same browser session, all content for a site will be served from the local browser cache. If a user clicks through multiple pages of an application and the same graphics and elements are found on each page, the request will not be sent to the origin web server. Instead it will be served from the local cache. If the user re-visits a page during that session, all of the content, including the HTML, will be retrieved from the local cache, depending on the browser settings.
As soon as the browser is closed, the session cache is cleared. For the next session, the only cache that will be used is the disk cache. Users might also hit refresh on a page to check for new content, such as an updated sports score or news article. Hitting refresh results in an "If-None-Match" header being sent to the origin web server for all content that is currently in the disk cache, independent of the expiration date of the cached content. This results in a 304 response code for each reusable item that is currently in the browser's cache.
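The difference between normal navigation and a refresh can be sketched as a small decision function. This is a simplified model, not the logic of any specific browser; `CacheEntry` and `plan_request` are hypothetical names, and the freshness rule is reduced to a single `max_age` value.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class CacheEntry:
    etag: str
    stored_at: float  # time (in seconds) at which the response was cached
    max_age: float    # freshness lifetime in seconds

def plan_request(entry: Optional[CacheEntry], now: float,
                 refresh: bool) -> Tuple[str, Dict[str, str]]:
    """Decide how to obtain a resource: (action, request headers)."""
    if entry is None:
        return "fetch", {}  # nothing cached: plain request
    fresh = now - entry.stored_at < entry.max_age
    if fresh and not refresh:
        return "cache", {}  # served locally, no request is sent
    # Expired content, or a user-initiated refresh regardless of expiry:
    # ask the origin server whether the cached copy is still valid.
    return "revalidate", {"If-None-Match": entry.etag}

entry = CacheEntry(etag='"abc"', stored_at=1000.0, max_age=600.0)
```

With this entry, `plan_request(entry, 1100.0, refresh=False)` stays local, while the same call with `refresh=True` sends `If-None-Match` even though the content has not expired.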
All objects will contain a response code of 200, indicating that all were served directly from the servers. If a new browser session is started and a user returns to a frequently visited site, the local browser cache will be used based on the browser settings.
If a valid expiration date exists for cached content, it will be delivered directly from the cache and no request will be issued to the origin web server. Strong validators are ideal for comparisons but can be very difficult to generate efficiently. Weak ETag values of two representations of the same resource might be semantically equivalent, but not byte-for-byte identical.
This means weak ETags prevent caching when byte range requests are used, but strong ETags mean range requests can still be cached. The entity tag itself is a string that uniquely represents the requested resource.
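The distinction between strong and weak ETags shows up in how two tags are compared. The sketch below follows the two comparison functions described in the HTTP conditional requests specification (RFC 7232); the helper names are chosen for this example. A weak tag carries the `W/` prefix.

```python
def is_weak(etag: str) -> bool:
    """Weak validators are marked with a leading W/ prefix."""
    return etag.startswith("W/")

def opaque_tag(etag: str) -> str:
    """Return the quoted part of the tag without the weakness prefix."""
    return etag[2:] if is_weak(etag) else etag

def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and identical."""
    return not is_weak(a) and not is_weak(b) and a == b

def weak_match(a: str, b: str) -> bool:
    """Weak comparison: the quoted parts must match, prefixes are ignored."""
    return opaque_tag(a) == opaque_tag(b)
```

Range requests rely on the strong comparison, which is why a tag such as `W/"1"` never qualifies there even when compared with itself.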
The method by which ETag values are generated is not specified. Typically, the ETag value is a hash of the content, a hash of the last modification timestamp, or just a revision number. For example, a wiki engine can use a hexadecimal hash of the documentation article content. With the help of the ETag and the If-Match headers, you can detect mid-air edit collisions.
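Mid-air collision detection can be illustrated with a small in-memory model. `WikiPage` is a hypothetical class, not part of any wiki engine; it assumes the ETag is a hexadecimal hash of the article content and that a failed `If-Match` check is answered with 412 (Precondition Failed).

```python
import hashlib

class WikiPage:
    """Hypothetical in-memory wiki article with ETag-guarded updates."""

    def __init__(self, content: str):
        self.content = content

    def etag(self) -> str:
        # Assumed scheme: quoted hexadecimal hash of the current content.
        digest = hashlib.sha256(self.content.encode("utf-8")).hexdigest()
        return '"' + digest + '"'

    def put(self, new_content: str, if_match: str) -> int:
        """Apply an edit only if the editor saw the current version."""
        if if_match != self.etag():
            return 412  # Precondition Failed: someone else edited first
        self.content = new_content
        return 204      # edit accepted

page = WikiPage("original text")
seen_by_both = page.etag()                          # two editors load the page
alice = page.put("Alice's edit", seen_by_both)      # first save succeeds
bob = page.put("Bob's edit", seen_by_both)          # second save is rejected
```

Bob's save fails because the ETag he sends no longer matches the stored content, so Alice's edit is not silently overwritten.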