When using Google to test for importance or existence, bear in mind that this will be biased in favor of modern subjects of interest to people from developed countries with Internet access, so it should be used with some judgment. For example, a current popular-music group from the United States will probably need many thousands of Google hits before most Wikipedians consider it worthy of inclusion. A similarly important group in a country with less Internet presence will have many fewer hits, if any. An important musician of the 14th century might not show up on Google at all.
Q. What is the minimum number of matches you should see if a term is not made up? (3? 27? 81?)
A. Perhaps a few hundred, but this depends on several things:
Further judgment: the Google test checks popular usage, not correctness. For example, a search for the incorrect Charles Windsor gives 10 times more results than the correct Charles Mountbatten-Windsor.
Also, some topics may not be on the Web because of low Internet use in certain areas and cultures of the world.
Google's search results are heavily biased towards popular culture. The article Scientists Use Google To Measure Fame vs. Merit, for example, points out that Barry Williams ("Greg Brady" from The Brady Bunch) has 45% more Google hits than Albert Einstein (2,400,000 vs. 1,660,000).
Especially when trying to determine the frequency of use of diacritic vs. non-diacritic versions of a word, the internet (and therefore Google) is extremely biased towards the non-diacritic versions. This is often more an example of laziness and cluelessness of those who created the webpages than a real test of usage. For example, spelling the weather phenomenon El Niño as 'El Nino' is just plain wrong (it doesn't rhyme with keno, vino, or Zeno). When Spanish words that have the ñ letter get naturalized into English the ñ often gets converted to "ny" (as when cañon became canyon), but "El Niño" is rarely spelled "El Ninyo" (and that spelling is more likely not on an English-language website). Yet despite the fact that the spelling should be El Niño, a Google test shows that there are more web pages with "El Nino" than "El Niño" (8,830,000 vs. 7,970,000 as of September 2005). Much better criteria for deciding upon the use of the diacritic vs. non-diacritic versions of a word would be the entries in dictionaries, other encyclopedias, and style guides.
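The hit-count comparisons quoted above are simple relative differences, and a short sketch shows how small some of these margins are. The function name percent_more is a hypothetical helper introduced here for illustration; the figures are the ones cited in the text, not fresh search results.

```python
def percent_more(hits_a: int, hits_b: int) -> float:
    """Return how many percent more hits A has than B (hypothetical helper)."""
    if hits_b <= 0:
        raise ValueError("hits_b must be positive")
    return (hits_a - hits_b) / hits_b * 100


# Figures quoted in the text above.
print(round(percent_more(2_400_000, 1_660_000), 1))  # Barry Williams vs. Albert Einstein
print(round(percent_more(8_830_000, 7_970_000), 1))  # "El Nino" vs. "El Niño"
```

On these figures the first comparison comes out at roughly 45% and the second at roughly 11%, a margin small enough that it says little about which spelling is correct.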
Note that other Google searches, particularly Google Books, have a different systemic bias from Google Web searches and so provide a useful cross-check and a somewhat independent view.
The simple Google test by number of hits is not applicable to people or titles within a number of internet-based businesses, most notably pornography. This is because an entire sub-industry has appeared with the sole purpose of increasing the number of Google hits certain subjects receive. They achieve this by use of a number of techniques, including multiple mirror sites, and spamming of notice boards and Wikipedia. Also, pornographic actors tend to appear in production-line quantities of entirely non-notable films. It is therefore necessary, as per Wikipedia:criteria for inclusion of biographies, for the researcher to prove that the actor or actress has established notoriety. This usually requires finding journalistic coverage, independent biographies or extensive fan clubs.
This guide is licensed under the GNU Free Documentation License. It uses material from the Wikipedia.