Tracking your users in the access logs


Philip Hoyt


Most server log analysis applications on the market simply present usage information grouped by date, with sub-groupings like daily averages and top downloads by file size. You can see trends this way, such as the spike in traffic after you send out an email, or whether you're getting most of your hits from users at work at 3 PM or at home after work. You can also get basic information about where your users are coming from and how much data they are downloading. While this can be useful, it doesn't begin to touch the range of information that can be gleaned from the logs with a little creativity.

Server access logs, while limited in their flexibility, are the best source available for real, hard statistics on what your users are doing. Not even an expensive usability test gives a more accurate portrayal of usage patterns, since server logs are not affected by laboratory conditions or limitations on the sample size.

Note also that this article is not an introduction to server logs. For that you might want to try this Evolt article by Marlene Bruce.

Where do your users live?

Host information can be used to extrapolate what region users are accessing your site from. Most log stats systems categorize users by their top-level domain (TLD) only - .com (often categorized as 'US Commercial') and .net (labeled with the largely meaningless tag 'network'), for example. Only those users whose ISPs have a TLD indicating nationality will be categorized in a meaningful way. Canadians will recognize what a problem this is - the dominant ISP in Western Canada, for example, is Telusplanet.net. This piece of trivia is relatively easy to come by, but a web-server stats system with its default configuration will likely report a Telusplanet hit as coming from 'network'.

A recent analysis of a client's server logs, for example, categorizing ISPs by region wherever possible, revealed that nearly as many users were hitting this particular site from Ontario alone as from the USA as a whole. This information could not have been derived without altering the presets on my log analysis tool.

Analog is a popular log analysis application and is configurable enough to let you easily sort your users by ISP using a syntax like

HOSTALIAS *.cgocable.net "Ontario.ca"
HOSTALIAS *.charter.com "USA.com"

The .ca or .com suffix is necessary because Analog treats the HOSTALIAS value as a host name and therefore will only accept strings that are formed like host names.
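For readers working outside Analog, the same grouping idea can be sketched in a few lines of Python. This is only an illustration: the suffix table, the function names and the sample host names are made up for the example (the ISPs themselves are the ones mentioned above), and it assumes the hosts in your logs have already been resolved to domain names.

```python
from collections import Counter

# Hypothetical suffix-to-region table. The first two entries mirror the
# HOSTALIAS lines above; the third is the Western Canadian ISP mentioned earlier.
REGION_BY_SUFFIX = {
    ".cgocable.net": "Ontario",
    ".charter.com": "USA",
    ".telusplanet.net": "Western Canada",
}


def region_for(host):
    """Return the region for a resolved host name, or 'unknown'."""
    host = host.lower()
    for suffix, region in REGION_BY_SUFFIX.items():
        if host.endswith(suffix):
            return region
    return "unknown"


def hits_by_region(hosts):
    """Count hits per region for an iterable of host names."""
    return Counter(region_for(host) for host in hosts)


# Made-up host names, as they might appear in the first field of a log line.
sample_hosts = [
    "d57-12-34.home.cgocable.net",
    "24-151-0-1.dhcp.charter.com",
    "modem-7.cgocable.net",
    "192.0.2.10",  # an address that was never resolved
]
counts = hits_by_region(sample_hosts)
print(counts)
```

Unresolved addresses fall into 'unknown', which is a useful reminder of how much of the log could not be categorized at all.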

Where did your users find your site?

Referers are incredibly useful. Probably the most obvious use is to find sites that link to the one whose logs you are studying. This will turn up influential link pages on the internet, blogs whose authors share an interest in your site, and sometimes forum discussions. It pays to know the internet community that brings you traffic - this can help you determine the motivations of your users and cater to them better, and it can also help with search engine optimization. For example, Google also likes interlinking - it will help your Google rank if you link back to some of the people that are already linking to you.

The ability to view external site referers is well-supported by popular log analysis systems which hyperlink listings so that you can view the referring pages with a single click.

What are your users searching for?

While searches that fail to satisfy a user's request obviously won't turn up in the logs, you can learn a lot from those that succeed. You may be surprised what keyword searches lead people to your site. You can use this information, as well as results from less successful searches, to reorganize content, optimize for searches, and learn what people who visit your site may be looking for. Analysis of the logs on a recent project, for example, revealed that an inordinate number of searches were landing on PDFs which revealed nothing about the site that hosted them and provided no links back to it.

On the other hand, no log analysis application I have used has an adequate system for viewing the referring searches first-hand. You might want to see the complete referring URL for each search, including its parameters and the particular URL of the search engine used, so that you can perform the search yourself. That way you can see why, for example, a plausible search for a European villa rental service like "European villa rental" isn't bringing your site traffic - there are a million other sites ranked higher in this search - while a seemingly less intuitive search like "culinary chateau" is the 8th most popular search phrase - your site is on the second page of results.

To simplify the presentation of server statistics, all the systems I have used, including Analog and Webalizer, leave the parameters and URLs off, displaying search queries as plain text and preventing you from seeing your own search result rankings first-hand. This is a major handicap, and the only workaround I'm aware of is to view the actual server logs, find the relevant information manually (using grep on Mac OS X or Linux), and paste the referring URLs into your browser.
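The manual grep step can also be scripted. The following is a minimal sketch, not a feature of any of the tools named above. It assumes the logs are in the common "combined" format with the referer in the second-to-last quoted field, and that the search engine puts the phrase in a parameter named q, p or query. Both are assumptions you should check against your own logs, and the sample lines are invented.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed layout: host ident user [time] "request" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed names of the query-string parameter that carries the search phrase.
QUERY_PARAMS = ("q", "p", "query")


def search_referers(lines):
    """Yield (search phrase, complete referring URL) for search referers."""
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a combined-format line
        referer = match.group("referer")
        params = parse_qs(urlsplit(referer).query)
        for name in QUERY_PARAMS:
            if name in params:
                yield params[name][0], referer
                break


# Invented sample lines for illustration only.
sample = [
    '192.0.2.1 - - [22/Mar/2002:15:00:00 -0500] "GET /villas.html HTTP/1.0" '
    '200 5120 "http://www.google.com/search?q=culinary+chateau&start=10" "Mozilla/4.0"',
    '192.0.2.2 - - [22/Mar/2002:15:01:00 -0500] "GET /index.html HTTP/1.0" '
    '200 2048 "-" "Mozilla/4.0"',
]
results = list(search_referers(sample))
for phrase, url in results:
    print(phrase, "<-", url)
```

Because the complete URL is kept alongside the phrase, it can be pasted straight into a browser to repeat the search. An internal page that happens to use one of these parameter names would also be reported, so you may want to filter on the referring host as well.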

Which internal links are people following?

In the case of internal pages you can observe which links are most used to access certain types of information and which pages may not be getting the attention they deserve, again by tracking referers. One log analysis application I found, Wusage, creates an ingenious, if ugly, visual navigation map in pdf form which allows you to see popular documents and the most common link paths in one view using a simple tree diagram.
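If your tool has no such map, a rough count of internal link paths can be pulled from the logs directly. This sketch assumes the combined log format and a site host of www.mydomain.com; the host name, function name and sample lines are all placeholders.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

SITE_HOST = "www.mydomain.com"  # hypothetical; use your own host name

# Picks the requested path and the referer out of a combined-format line.
LINE_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def internal_link_counts(lines):
    """Count (referring page, requested page) pairs within the site."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        referer = urlsplit(match.group("referer"))
        if referer.netloc.lower() == SITE_HOST:
            counts[(referer.path or "/", match.group("path"))] += 1
    return counts


# Invented sample lines for illustration only.
sample = [
    '192.0.2.1 - - [22/Mar/2002:15:00:00 -0500] "GET /books.html HTTP/1.0" '
    '200 1200 "http://www.mydomain.com/index.html" "Mozilla/4.0"',
    '192.0.2.2 - - [22/Mar/2002:15:02:00 -0500] "GET /books.html HTTP/1.0" '
    '200 1200 "http://www.mydomain.com/index.html" "Mozilla/4.0"',
    '192.0.2.3 - - [22/Mar/2002:15:03:00 -0500] "GET /index.html HTTP/1.0" '
    '200 2048 "http://www.google.com/search?q=books" "Mozilla/4.0"',
]
counts = internal_link_counts(sample)
for (source, target), hits in counts.most_common():
    print(source, "->", target, hits)
```

Sorting the pairs by count gives a crude text version of the tree diagram: the most-followed links come first, and pages that never appear as a target are the ones not getting attention.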

Extending basic stats functionality with redirects

By adding a redirect page, you can track how many users are following a link from your site out to one particular document on the web. For example, on a site that has two options for selling books - downloading an order-form PDF or following a link to Amazon - the click-throughs from the author's site can be tracked by linking to a blank page with a redirect rather than directly to Amazon.
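The article does not say how the redirect itself is done. One common option is a meta refresh, so this sketch generates such a page; treat the mechanism as an assumption. The function name and the Amazon URL are placeholders.

```python
from html import escape


def redirect_page(target_url, delay_seconds=0):
    """Build a near-blank HTML page that forwards the browser to target_url.

    Every request for this page shows up in the access log, which is what
    makes the outbound click countable.
    """
    safe_url = escape(target_url, quote=True)
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{delay_seconds};url={safe_url}">'
        "</head><body>"
        # Fallback link for browsers that ignore the refresh.
        f'<a href="{safe_url}">Continue</a>'
        "</body></html>"
    )


# Placeholder target; in practice this would be the book's page on Amazon.
page = redirect_page("http://www.amazon.com/")
print(page)
```

Saved under its own file name on your site, the page gets one log entry per click-through, and you link to it instead of linking to Amazon directly.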

Tracking links from a newsletter back to your site - observing these hits independently from hits that come from Google or elsewhere, as well as tracking hits from different issues of your newsletter - can be facilitated by adding parameters to the url which don't have to be processed by an application server in order to be tracked by the server logs.

Changing http://www.MyDomain.com/ to http://www.MyDomain.com/?issue=12 will not affect the display of the page unless you want it to, but will allow you to track the success of each consecutive newsletter issue separately without any additional work.
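Counting those tagged hits takes only a few lines. This sketch assumes the query string is recorded in the request field of each log line, as it is in the common and combined log formats. The parameter name issue comes from the example above; the function name and sample lines are invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Picks the requested URL (path plus query string) out of the request field.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<target>\S+)')


def hits_by_issue(lines, param="issue"):
    """Count hits per value of the given query-string parameter."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        values = parse_qs(urlsplit(match.group("target")).query).get(param)
        if values:
            counts[values[0]] += 1
    return counts


# Invented sample lines for illustration only.
sample = [
    '192.0.2.1 - - [22/Mar/2002:09:00:00 -0500] "GET /?issue=12 HTTP/1.0" 200 2048 "-" "Mozilla/4.0"',
    '192.0.2.2 - - [22/Mar/2002:09:05:00 -0500] "GET /?issue=12 HTTP/1.0" 200 2048 "-" "Mozilla/4.0"',
    '192.0.2.3 - - [29/Mar/2002:09:00:00 -0500] "GET /?issue=13 HTTP/1.0" 200 2048 "-" "Mozilla/4.0"',
    '192.0.2.4 - - [29/Mar/2002:09:10:00 -0500] "GET / HTTP/1.0" 200 2048 "-" "Mozilla/4.0"',
]
counts = hits_by_issue(sample)
print(counts)
```

Hits without the parameter, such as the last sample line, are simply left out, so newsletter traffic stays separate from visitors who arrived some other way.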

Track user habits and browser settings with response codes

The variety of information which can be gleaned from server response codes is as broad as the number of possible uses for such codes as "303 See Other" or "403 Forbidden". See this page from the W3 Consortium's HTTP specification for the complete list of server codes.

"304 Not Modified" can be used to determine how often external CSS and JS files are being served from the browser cache (about two thirds of the time on one site) compared with how often users had to download them (returning "200 OK"). This shows how much bandwidth is conserved by moving style information out of HTML files and into external files.

Given that first-time visitors will not have cached versions of files on your site, and that cached files expire eventually, "304 Not Modified" could also be used to measure fluctuations in the numbers of returning vs. new visitors, if not the absolute figures. Unfortunately cache expiry depends on user settings, and statistics about these settings cannot easily be found.
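The 304-versus-200 comparison for external CSS and JS files can be sketched as follows. It assumes a log format in which the status code directly follows the quoted request, as in the common and combined formats; the function name and sample lines are invented.

```python
import re
from collections import Counter

# Picks the requested path and the status code out of a log line.
LINE_RE = re.compile(r'"GET (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def cache_hit_ratio(lines, extensions=(".css", ".js")):
    """Return the share of 304 responses among 200 and 304 responses
    for files with the given extensions, or None if there were none."""
    statuses = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        path = match.group("path").split("?", 1)[0].lower()
        if path.endswith(extensions):
            statuses[match.group("status")] += 1
    total = statuses["304"] + statuses["200"]
    return statuses["304"] / total if total else None


# Invented sample lines for illustration only.
sample = [
    '192.0.2.1 - - [22/Mar/2002:15:00:00 -0500] "GET /style.css HTTP/1.0" 200 3100 "-" "Mozilla/4.0"',
    '192.0.2.1 - - [22/Mar/2002:16:00:00 -0500] "GET /style.css HTTP/1.0" 304 - "-" "Mozilla/4.0"',
    '192.0.2.2 - - [22/Mar/2002:16:30:00 -0500] "GET /menu.js HTTP/1.0" 304 - "-" "Mozilla/4.0"',
    '192.0.2.2 - - [22/Mar/2002:16:30:01 -0500] "GET /index.html HTTP/1.0" 200 2048 "-" "Mozilla/4.0"',
]
ratio = cache_hit_ratio(sample)
print(ratio)
```

Running the same function over logs from different weeks gives the kind of fluctuation measure described above, with the same caveat about user cache settings.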

Conclusion

Server logs are there to be used and contain real information about real users on your site. There is no more reliable source for information on user patterns, though there are many tools that are more flexible. My one misgiving is that the log analysis tools out there today are so consistently mediocre that they will frustrate many attempts to study the data more deeply. I hope that these ideas will help provide the impetus for these features to be added to existing log analysis tools.

Behind firewalls?

Submitted by notabene on November 25, 2003 - 11:49.

I see your point, of course.

But how do you deal with several people being connected behind the same firewall?

More precisely, how do you know when the data is to be considered relevant or not?


Where users live

Submitted by omerida on December 9, 2003 - 11:00.

Matching ISPs to geographic regions is a pretty good approach to figuring out where traffic is coming from. Is there a compiled list like this for Analog available anywhere? Keep in mind that doing so will give you a better idea of WHERE traffic to your site is coming from, but because of proxying and firewalls you won't really know how many people have accessed your site. Anyone using Analog should read How the Web Works, especially section 4 - "What you can't know".

Also, for this technique to work you'll have to have the IP addresses in your logs resolved to the correct domain names. The best package I've found to do that is jdresolve, which is much faster than Analog's built-in resolution, and with tweaking you can get "100%" resolution (not really, but close).


Tracking users in the access logs in real time

Submitted by cs2004 on February 10, 2004 - 00:02.

Do you have any solutions for tracking visitors in real time? As far as I know, Webalizer and similar programs only work by means of crontab.


Google Pagerank

Submitted by pappa on November 18, 2004 - 07:37.

Google also likes interlinking - it will help your google rank if you link back to some of the people that are already linking to you.

Actually, that's not true. You cannot improve your own Google PR by linking to others, only incoming links count.

Also, you can damage your PR by linking to other sites through two related methods. The first is described by Google as linking to "Bad Sectors of the Web", which they do not elaborate upon. The second is the hotly contested Pagerank Leak. It's difficult to say, but the "Bad Sectors of the Web" thing may just be an extreme example of Pagerank Leak occurring when Google has artificially removed the PR of sites you link to.

Pappa


RE: Where users live

Submitted by jayZ on March 14, 2005 - 08:08.

There's another new trick I've found in the webmaster/search engine optimisation toolbox. Ubiquitous search engine Google has now started 'forcing' users to use their local geographical portal (e.g. Google.co.uk if you're in the UK rather than google.com) by redirecting them.

The interesting effect is that this means you can see the Google URL they were referred by, and therefore where they are. Seeing as this is based on Google's Geo-IP database, it's very accurate without the cost of a commercial Geo-IP list (i.e. just look at the referring search engines table in your traffic report).

It only works for search engine referrals, but it's a useful trick all the same - especially when you're compiling your client's monthly internet marketing report from the scantiest of data...


Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author; please do not redistribute or use elsewhere without checking with the author.