Friday, February 02, 2007

I've just been experimenting with Windows Live searches (via the MSN Search Web Services programming interface) and produced results which surprised me, although they are quite logical in a way.
It seems that for queries with many results (e.g., more than 8,000) the hit counts reported by the search engine tend to be estimates of the number of matching URLs, and are probably overestimates by about 20%. For queries with few results (e.g., fewer than 500), the hit counts instead seem to be the number of URLs left after eliminating duplicates, near duplicates (results displaying similar snippets) and multiple URLs from the same site. Hence high hit counts tend to measure something different to low hit counts! This may be true of other search engines too.
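To make the distinction concrete, here is a hypothetical Python sketch of the kind of filtering described above. It keeps only one result per site and drops any result whose snippet is nearly identical to one already kept, so the filtered count can be much smaller than the raw count. The URLs, the word-overlap (Jaccard) similarity measure and the 0.8 threshold are my own assumptions for illustration. I don't know how Windows Live actually detects near duplicates.

```python
from urllib.parse import urlparse


def word_set(text):
    """Lower-cased set of words in a snippet (a deliberately crude representation)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two word sets (an assumed near-duplicate test)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def collapse_results(results, threshold=0.8):
    """Hypothetical filter: one result per site, no near-duplicate snippets.

    `results` is a list of (url, snippet) pairs. Exact duplicate URLs are
    removed too, because they share a host with the copy already kept.
    """
    kept = []
    seen_hosts = set()
    kept_snippets = []
    for url, snippet in results:
        host = urlparse(url).netloc.lower()
        if host in seen_hosts:
            continue  # another URL from a site we already have
        words = word_set(snippet)
        if any(jaccard(words, s) >= threshold for s in kept_snippets):
            continue  # snippet too similar to one already kept
        seen_hosts.add(host)
        kept_snippets.append(words)
        kept.append((url, snippet))
    return kept


# Made-up results, using reserved example domains.
raw = [
    ("http://www.example.com/a", "Hit counts in web search engines"),
    ("http://www.example.com/b", "Another page on the same site"),
    ("http://mirror.example.org/a", "Hit counts in web search engines"),
    ("http://www.example.net/x", "Something completely different here"),
]

filtered = collapse_results(raw)
print(len(raw), len(filtered))  # prints "4 2"
```

In this toy case, a "raw" count would report 4 matches while a count taken after filtering would report 2, which is the sort of gap I suspect separates high and low hit counts.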