Today's article comes from Rob Tracy of Remedial Comics. Recently the subject of metric sites came up on the forums, and in that discussion (and in another topic) there was some question about the value of traffic numbers and metric sites. This is something that used to concern me a great deal until I learned the truth about it, so I wanted to share what I've learned and maybe clear up some confusion on the subject.
One thing I want to say right up front is that this is something I did not understand myself until weeks of pestering Chadm1n, our resident admin, bore fruit in a long discussion of what is and is not measurable on the internet. So while I'm writing this article, the only reason I can share this knowledge with you is HIS expertise, and I plan on having him look this article over before I post it. Hopefully he will also field any questions that pop up on the forums.
Traffic numbers are VERY important to webcomic creators. As Howard Tayler pointed out in his amazing presentation on Open Source TV, if you have verifiable traffic numbers you can literally "take them to the bank." (Howard had done a survey of his readers, and that should ring some bells: you might recall that almost every popular webcomic has been running surveys for some time now.)
But why, you might ask yourself, would these large webcomics do surveys? The practical uses are to gauge possible merchandise sales, set ad rates, and establish some hard numbers on the size of their committed fan base (and probably a whole bunch of other things some marketing person could tell us that I don't know and don't really want to know). But the REASON these surveys have to be done is that traffic on the internet is largely unmeasurable.
Let's talk about the different ways that traffic is measured on the web, and then I can tell you why they don't really work. If you are really into the technical jargon of this subject, this post at Wikipedia does an excellent job of explaining it all, though it doesn't draw the conclusions I will.
Most metric sites require the site being measured to install a small amount of hidden code for the purposes of measurement. As a Google Analytics member myself, I know that there is a tiny bit of code that goes off like a roman candle every time my home page gets hit. The same goes for my Project Wonderful ads. The code has to be there or the hits don't get counted.
The big problem with this code is that it requires the user and his/her browser to cooperate and allow themselves to be counted. Many people on the web browse with JavaScript disabled, and for the most part those folks will be uncountable by this kind of web analytics collection.
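The browser-cooperation problem can be sketched with a toy simulation. (This is Python with made-up visit data, purely for illustration; a real analytics beacon is vendor-supplied JavaScript and works nothing like this internally.)

```python
# Illustrative sketch: a beacon-style counter only "sees" visitors whose
# browsers actually run the tracking code. All visit data here is invented.
visits = [
    {"page": "/", "runs_javascript": True},
    {"page": "/", "runs_javascript": True},
    {"page": "/", "runs_javascript": False},  # JS disabled: the beacon never fires
    {"page": "/", "runs_javascript": True},
    {"page": "/", "runs_javascript": False},  # another silent visitor
]

beacon_hits = 0
for visit in visits:
    if visit["runs_javascript"]:  # the snippet only executes in cooperating browsers
        beacon_hits += 1

print(f"real page loads: {len(visits)}, loads the analytics tool saw: {beacon_hits}")
```

The gap between the two numbers is invisible to the analytics tool itself, which is exactly the trouble: it cannot report what it never saw.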
Zach Weiner recently made a Reddit post about Ad-Block and the implications it has for web-based businesses like webcomics, which outlines some of the other potential problems here. If he's right and 10-20% of his traffic is unrecordable because of Ad-Block, then whatever you are using to gauge your traffic is already off by that much. Would you rely on statistics that are only 80% accurate? When you have 1,000 unique visitors to your site and your metric is telling you 800, that might not be so bad. But what happens when you have 10,000 and your metric says 8,000? How about when you hit 100,000 and your metric is only reporting 80,000?
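The arithmetic is easy to verify. A quick sketch using the hypothetical 20% blocked figure from the paragraph above:

```python
# Hypothetical figure from the article: 20% of readers block the tracker.
# The error *rate* stays constant, but the absolute number of uncounted
# readers grows right along with your audience.
blocked_fraction = 0.20

for real_visitors in (1_000, 10_000, 100_000):
    measured = int(real_visitors * (1 - blocked_fraction))
    missing = real_visitors - measured
    print(f"real: {real_visitors:>7,}  measured: {measured:>7,}  uncounted: {missing:>6,}")
```

At 100,000 real visitors, 20,000 readers simply vanish from the report.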
And that's only one way that type of measurement can fail. Some measurement systems rely on cookies, but many web browsers turn cookies off or let the user manage which they accept and which they do not.
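Here is a toy model of why cookie-rejecting browsers distort the numbers. (The visitor IDs and cookie handling below are invented for illustration; no actual analytics vendor's scheme is being shown.)

```python
import uuid

# Assumed model: a cookie-based counter assigns each browser an ID and stores
# it in a cookie. A browser that rejects cookies comes back looking brand-new
# every time, so one person gets counted as several "unique" visitors.
unique_ids = set()

def record_visit(cookie_jar, accepts_cookies):
    """Return the visitor ID the analytics system sees for this visit."""
    if accepts_cookies and "visitor_id" in cookie_jar:
        vid = cookie_jar["visitor_id"]   # returning visitor, same ID as before
    else:
        vid = str(uuid.uuid4())          # looks like a brand-new visitor
        if accepts_cookies:
            cookie_jar["visitor_id"] = vid
    unique_ids.add(vid)
    return vid

alice = {}  # accepts cookies: counted once no matter how often she visits
bob = {}    # rejects cookies: his jar never persists anything
for _ in range(3):
    record_visit(alice, accepts_cookies=True)
    record_visit(bob, accepts_cookies=False)

# Two real people, but the counter sees 1 (Alice) + 3 (Bob, once per visit).
print("unique visitors reported:", len(unique_ids))
```

So cookie loss pushes the unique-visitor count up, while the JavaScript and Ad-Block problems push the raw counts down; the errors don't even point the same direction.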
Then there are subdomains, the thissite.atthatsite.com kind of thing. For example:
Gamespy is an incredibly popular gaming website with tons and tons of traffic. It also offers lesser-known features, like the webcomics in its humor section. If you were to copy the link to one of these webcomics and then run an Alexa search for traffic, you would think these comics were some of the most popular on the internet, when in fact the traffic numbers are high because Alexa does not differentiate between the main site (Gamespy) and the subdomain where those comics live.
There is also the problem of IP tracking. How many universities, hotels, businesses, and homes have multiple users on the same IP address? Think about that for a moment. Then consider all the times your internet service provider has changed your IP address. IP address issues have the potential either to radically reduce your real numbers or, in the case of a single unique user showing up twice under two different IP addresses, to slightly, falsely inflate them.
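A small sketch with made-up traffic shows both failure modes at once, the shared campus IP and the reassigned home IP:

```python
# Invented traffic: counting "unique visitors" by IP address both undercounts
# (a whole computer lab shares one address) and overcounts (one reader's ISP
# hands her a new address between visits).
visits = [
    ("140.82.1.1", "lab-student-1"),
    ("140.82.1.1", "lab-student-2"),  # same campus IP, different person
    ("140.82.1.1", "lab-student-3"),  # and a third
    ("66.249.5.9", "alice"),
    ("66.249.7.2", "alice"),          # same reader, ISP reassigned her IP
]

unique_ips = {ip for ip, _ in visits}
real_people = {person for _, person in visits}

print(f"IP-based uniques: {len(unique_ips)}, actual people: {len(real_people)}")
```

Three "unique visitors" by IP, four real people, and the two errors partially cancel, which makes the final number look plausible while being wrong.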
Not to mention multiple users on the same computer. Ever been in a university computer lab? Aren't college students some of our biggest readers? Two or more people loading the same page from the same computer will increase your page views, but it won't do a thing for your unique users.
Finally, the type of web analytics that has been around the longest, and is often trotted out as the most accurate (also the most complex and difficult to interpret), is the server log file. By counting the exact number of times certain elements are pulled from the server, you can say with absolute confidence that your page was requested X number of times by a web browser somewhere.
Here's the problem: caching of certain elements of your page will skew those numbers. Unless you know precisely what portion of your page is being cached by every computer that loaded it initially, you cannot say for sure how many times your page really got loaded. Then there are archive sites. Supposedly Google is now recording everything on the web that isn't secured behind some kind of firewall, so people could be loading your older comics from an archive other than the one on your site. And lastly, there are the bots.
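For the curious, counting page loads out of a log file looks roughly like this. (The Apache-style log lines are fabricated for the example; note that the 304 response is a browser revalidating its cached copy, and a fully cached load never reaches the server, so it never appears in the log at all.)

```python
# Sketch: counting raw page loads from made-up Apache-style log lines.
raw_log = """\
203.0.113.5 - - [12/Aug/2011:10:01:02 -0500] "GET /comic/42 HTTP/1.1" 200 5120
203.0.113.5 - - [12/Aug/2011:10:01:02 -0500] "GET /img/strip42.png HTTP/1.1" 200 90210
198.51.100.7 - - [12/Aug/2011:10:05:17 -0500] "GET /comic/42 HTTP/1.1" 304 0
"""

page_loads = 0
for line in raw_log.splitlines():
    request = line.split('"')[1]       # e.g. 'GET /comic/42 HTTP/1.1'
    path = request.split()[1]          # just the path portion
    if path.startswith("/comic/"):     # count comic pages, not image files
        page_loads += 1

print("page loads seen by the server:", page_loads)
```

The count is exact for what hit the server, but silent about everything served from a cache, an archive, or a proxy along the way.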
We all know that the web is literally crawling with tons and tons of little computer programs that run around recording, analyzing, and indexing everything. The day with the most people online at my forum is still a day last August, when 59 people were on at once. There was no comic update that day and no posts were made. I was the only user who logged in. The other 58 "people" were either guests who didn't care to post anything or bots. Most were undoubtedly bots.
Just measuring your server logs leaves you open to WILD traffic fluctuations based on whether or not you say or do something these little programs find interesting, and there is currently no easy way to tell the difference between man and machine. Look at how much trouble we all go through to keep these things out of our comments sections and forums; thus far, no one has figured out a way to stop their page loads from counting in the server logs.
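Filtering bots by user-agent string is the usual first attempt, and it only catches the polite ones that announce themselves. A sketch with invented log entries:

```python
# Sketch: a crude user-agent filter over made-up hits. Well-behaved crawlers
# identify themselves; plenty do not, so this only removes the polite ones.
hits = [
    ("GET /forum/ HTTP/1.1", "Mozilla/5.0 (Windows NT 6.1) Firefox/5.0"),
    ("GET /forum/ HTTP/1.1", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("GET /forum/ HTTP/1.1", "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"),
    ("GET /forum/ HTTP/1.1", "Mozilla/5.0 (Macintosh) Safari/533.21"),
    ("GET /forum/ HTTP/1.1", "SneakyScraper/1.0"),  # unknown bot: slips right through
]

KNOWN_BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

human_looking = [
    ua for _, ua in hits
    if not any(marker in ua.lower() for marker in KNOWN_BOT_MARKERS)
]
print(f"{len(hits)} hits, {len(human_looking)} look human (one is still a bot)")
```

Any bot that lies about its user agent, and many do, sails straight past a filter like this and lands in your "human" traffic.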
There are some hybrid solutions, and some web analytics are better than others. But, and this really is the message I want you to take away from all of this: for the most part, traffic numbers are voodoo. The best use for this type of traffic measurement is establishing a baseline so you can note increases and decreases in traffic, identify the patterns of behavior on your site that contributed to those upticks or declines, and react accordingly. Because as an actual, factual, measurable indication of your true traffic numbers, they are completely worthless.