My Top 100 Blogs for Developers is, without a doubt, the single most popular post I ever created. On this page I want to share with you the (somewhat complicated) process I use in making such a list. I am sure many people in the world would be interested in other lists, like a Top 100 Blogs for Secretaries, a Top 100 Blogs for Managers, a Top 50 Blogs for Dog Lovers, or a Top 200 Blogs for Tree Huggers. However, I will not be the one to make those lists. But if someone else wants to, then I invite them to follow the steps described below.
Warning: if you want to make a Top Blog list in the way I describe here, you must have some skills in working with spreadsheet formulas! And some stamina is useful as well, because it’s a lot of work…
Research
Making a Top X Blogs begins with finding the right candidate blogs. Here are a couple of suggestions…
For my Top 100 Blogs for Developers I ended up with between 200 and 300 blogs. Important: doing statistics and calculations with that many blogs is a lot of work, so make sure you limit the size of your top list to something you can manage time-wise!
Statistics
When you’ve found enough candidates for your top list, it is time to start collecting their statistics. Most of the statistics mentioned in this section are controversial, for some reason or another. But it’s the only thing we have. So the best thing we can do is not to rely on each of them too heavily. That’s why I use multiple statistics. When you take the average of multiple statistics, the deficiencies tend to cancel each other out. And no blog is punished too heavily for doing badly in one specific category.
For some platforms Alexa only tracks the traffic of the entire platform, and not that of the individual blogs hosted on it. Blogger.com and TypePad.com are not a problem: for each blog on those platforms, Alexa maintains a separate ranking. If you check the ranking for Evolving Web (http://ourfounder.typepad.com/) you see that Alexa tracks its ranking separately. But for MSDN.com, Alexa has just one ranking: that of the MSDN site as a whole. If you check the ranking for J.D. Meier’s blog (http://blogs.msdn.com/jmeier/) you will see it has an extremely high ranking, but it would be unfair to use that ranking for each individual blog hosted on MSDN.
Likewise, there may be a corporate site that draws a lot of traffic, with a minor blog hosted on the parent site that forms only a small part of the corporate site. In such a case Alexa would give you the ranking of the parent (corporate) site, and it would not be fair to use that number for the blog. If you check the ranking for ThoughtBlogs (http://blogs.thoughtworks.com/) you see the Alexa ranking is for the corporate site ThoughtWorks.com, while the blogs are only a smaller part of that site.
One last issue is that Alexa does not always have traffic ranks available. In some cases they simply have no data. All things considered, it means that the Alexa rank for some blogs must be set to unknown. This is important to take into account when doing your calculations (see next section). In my experience, about 9% of the blogs I checked out have no rating on Alexa.
Similar to Alexa’s ranking, the Technorati Authority numbers are not always available. Blog authors have to submit their blogs to Technorati, or else Technorati will not maintain their authority numbers. This means that the Technorati Authority for some blogs must be set to unknown. In my experience, about 12% of the blogs I checked out have no rating on Technorati.
Note that some blog authors don’t allow comments on their blogs. Like before, in those cases the resulting statistics will be unknown. In my experience, only about 4% of the blogs I checked out have no comments.
When you check FeedRank it is important to consider this: many blogs offer their feeds in multiple formats (Atom and RSS). You should check the URL of each feed format because sometimes the different formats turn out to have different FeedRank values. (In those cases I simply take the highest number.)
Like before, some blogs have no FeedRank, and the resulting statistics will be unknown. In my experience, this applies to about 9% of the blogs I checked out.
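The "take the highest number" rule for feeds offered in several formats can be sketched in a few lines of Python. This is only an illustration: the function name `best_feedrank` and the sample values are made up, and `None` stands in for an unknown FeedRank.

```python
from typing import Optional


def best_feedrank(by_format: dict[str, Optional[int]]) -> Optional[int]:
    """Return the highest known FeedRank across feed formats, or None if all are unknown."""
    known = [value for value in by_format.values() if value is not None]
    return max(known) if known else None


# Hypothetical values for one blog that offers both an RSS and an Atom feed.
print(best_feedrank({"rss": 4, "atom": 6}))
# A blog without any FeedRank stays unknown.
print(best_feedrank({"rss": None, "atom": None}))
```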
One last comment about finding these statistics: for each category you should aim to collect all data on the same day. All statistics are regularly updated, and you don’t want that to happen right in the middle of your statistical analysis!
Calculations
When you’ve collected the statistics for each blog in your spreadsheet (each blog on a new row, and each statistical category in its own column) it is time to do the calculations. I will show you my methods, and my motivations behind them:
=IF(_authority_<>""; RANK(_authority_;_columnofallauthorities_; 0); "")
For each blog, this formula determines whether the blog is #1 in this statistical category, or #2, or #3, etc. (Blogs with the same authority number end up with the same rank, but that’s ok.) A similar formula must be constructed for PageRank, Alexa Rank, Google hits, Comments and FeedRank. Basically, it means that we’re doing away with the different scales and types of the six statistical categories. We simply end up with six columns of rankings, where the best blog scores #1, and the others follow behind it.
Important: the Alexa statistic is the only one where a low number means a good score! In that case you must change the formula so that the lowest, and not the highest, number is ranked as #1:
=IF(_alexarank_<>""; RANK(_alexarank_;_rangeofallalexaranks_; 1); "")
One last thing to point out is that we must deal with statistics that are unknown (or empty). In this formula unknown values will simply lead to unknown rankings (empty cells).
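For readers who would rather script this step than build it in a spreadsheet, here is a minimal Python sketch of the same ranking logic. It assumes unknown statistics are stored as `None`; the function name `competition_rank` and the sample numbers are invented for illustration. Ties share a rank, as they do with the spreadsheet RANK function.

```python
from typing import Optional, Sequence


def competition_rank(
    values: Sequence[Optional[float]], lower_is_better: bool = False
) -> list[Optional[int]]:
    """Rank values so the best one gets 1; ties share a rank; None stays None."""
    known = [v for v in values if v is not None]
    ranks: list[Optional[int]] = []
    for v in values:
        if v is None:
            ranks.append(None)  # unknown statistic -> unknown ranking
        elif lower_is_better:
            # Alexa-style: count how many known values are strictly lower.
            ranks.append(1 + sum(1 for other in known if other < v))
        else:
            # Authority-style: count how many known values are strictly higher.
            ranks.append(1 + sum(1 for other in known if other > v))
    return ranks


# Made-up sample data for five blogs.
authority = [540, None, 1200, 540, 87]
alexa = [15000, 230000, None, 4200, 15000]

authority_ranks = competition_rank(authority)
alexa_ranks = competition_rank(alexa, lower_is_better=True)
print(authority_ranks)  # [2, None, 1, 2, 4]
print(alexa_ranks)      # [2, 4, None, 1, 2]
```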
=IF(_authorityrank_<>""; COUNT(_rangeofallauthorityranks_)+1 - _authorityrank_; "")
This formula takes the results of step 1 as its input (authorityrank). Again, it knows when a statistic was unknown, and it gives an empty result in those cases.
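The same conversion from ranks to points could look like this in Python. It is a sketch under the assumption that unknown rankings are stored as `None`; the name `rank_to_points` and the sample ranks are illustrative only.

```python
from typing import Optional, Sequence


def rank_to_points(ranks: Sequence[Optional[int]]) -> list[Optional[int]]:
    """Turn ranks into points: with N known ranks, rank 1 earns N points."""
    known_count = sum(1 for r in ranks if r is not None)  # like COUNT(range)
    return [None if r is None else known_count + 1 - r for r in ranks]


# Made-up ranks for five blogs, one of them unknown.
authority_ranks = [2, None, 1, 2, 4]
authority_points = rank_to_points(authority_ranks)
print(authority_points)  # [3, None, 4, 3, 1]
```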
=IF(_authoritypoints_<>""; 100/COUNT(_rangeofallauthoritypoints_) * _authoritypoints_; "")
This formula takes the number of points from step 2 as its input (authoritypoints). It then makes sure that all points are re-scaled to a new scale of 0 to 100. This enables us to prepare the last ratings in the final steps.
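As a rough Python equivalent of the re-scaling formula above (again assuming `None` for unknown values, with an invented function name and sample points):

```python
from typing import Optional, Sequence


def rescale_points(points: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Re-scale points so that the maximum possible number of points maps to 100."""
    known_count = sum(1 for p in points if p is not None)  # like COUNT(range)
    return [None if p is None else 100 / known_count * p for p in points]


# Made-up points for five blogs, one of them unknown.
authority_points = [3, None, 4, 3, 1]
authority_scores = rescale_points(authority_points)
print(authority_scores)  # [75.0, None, 100.0, 75.0, 25.0]
```

With this formula the lowest-ranked blog ends up at 100/N rather than exactly 0, so in practice the scale runs from just above 0 up to 100.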
20% Google PageRank
20% Technorati Authority
20% Alexa Rank
20% Google hits
10% RSSMicro FeedRank
10% Comments
It means that, in my final calculations, the measure of RSS feed subscribers accounts for 10% of the final ranking, while Technorati’s authority numbers influence 20% of my results.
After you’ve calculated the average scores in step 5, it will be easy for you to order the results according to those scores, and you will have made yourself your own Top Blogs list!
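The weighting and the final ordering can be sketched in Python as follows. The weights are the ones listed above. The handling of unknown statistics is an assumption on my part: an unknown category is skipped and the remaining weights are re-normalised, so a blog is not punished for missing data. The blog names, scores and function name are made up.

```python
from typing import Optional

# Weights per category, as listed in this post.
WEIGHTS = {
    "pagerank": 0.20,
    "authority": 0.20,
    "alexa": 0.20,
    "google_hits": 0.20,
    "feedrank": 0.10,
    "comments": 0.10,
}


def weighted_score(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Weighted average of the 0-100 scores; unknown (None) categories are skipped."""
    known = {k: v for k, v in scores.items() if v is not None}
    total_weight = sum(WEIGHTS[k] for k in known)
    if total_weight == 0:
        return None  # nothing known about this blog at all
    # Assumption: re-normalise by the weights of the known categories only.
    return sum(WEIGHTS[k] * v for k, v in known.items()) / total_weight


# Hypothetical re-scaled scores for two blogs; Blog B has no Alexa rank.
blogs = {
    "Blog A": {"pagerank": 80, "authority": 90, "alexa": 70,
               "google_hits": 60, "feedrank": 50, "comments": 40},
    "Blog B": {"pagerank": 100, "authority": 80, "alexa": None,
               "google_hits": 90, "feedrank": 60, "comments": 70},
}

final_scores = {name: weighted_score(stats) for name, stats in blogs.items()}

# Order the blogs by final score, best first; blogs without any score are left out.
top_list = sorted(
    (name for name, score in final_scores.items() if score is not None),
    key=lambda name: final_scores[name],
    reverse=True,
)
print(top_list)  # ['Blog B', 'Blog A']
```

If you would rather count an unknown statistic as a zero, replace the division by `total_weight` with a division by the sum of all weights. That choice punishes blogs with missing data much more heavily.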
If you have created a list that you want to share with others, or if you think the research/calculation process can be further improved, please feel free to share this with us in the comments section.