One can estimate how strongly a term B is correlated with another term A by using Google search result counts.
cor_B(A) = (number of search results for the query 'A B') / (number of search results for the query 'B')
So, for example, if you search for the query 'cricket bat', the number of results returned is approximately 746000,
while the number of results returned for 'bat' alone is 105000000, which means that
cor_bat(cricket) = 746000/105000000 = 1/141 approximately. This can be interpreted as follows: out of every 141 pages containing the word bat, only 1 refers to a cricket bat. Now for the term lbw:
cor_lbw(cricket) = 1730000/247000 = 1/1.4 (approx). This is understandable because lbw hardly has any other meaning outside of cricket jargon.
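The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `correlation` and its parameter names are made up here, and the counts are typed in by hand from the 'cricket bat' example rather than fetched from Google.

```python
def correlation(joint_count: int, term_count: int) -> float:
    """Return cor_B(A): the share of results for B that also match 'A B'.

    joint_count is the number of results for the query 'A B',
    term_count is the number of results for the query 'B'.
    """
    if term_count <= 0:
        raise ValueError("term_count must be positive")
    return joint_count / term_count


# Counts quoted in the post for 'cricket bat' and 'bat'.
cor_bat_cricket = correlation(746_000, 105_000_000)
print(f"cor_bat(cricket) = {cor_bat_cricket:.5f}, "
      f"about 1 in {1 / cor_bat_cricket:.0f}")
```

Search engines report only rough estimates of result counts, so the ratio is best read as an order-of-magnitude indicator.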
Now that the definitions are clear, the game is to find the terms with the highest correlation to each of the following terms:
1. cricket
2. java
3. radiohead
4. einstein
So let's see how well you can do! And of course B = A would give you the maximum correlation, but that is not allowed.
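If you want to keep score, here is a rough sketch of how guesses could be ranked. The helper `rank_terms` and the shape of its `counts` argument are assumptions made for this illustration; you would fill in the result counts yourself after running the searches.

```python
def rank_terms(target, counts):
    """Rank candidate terms B by cor_B(target), highest first.

    counts maps each candidate term B to a pair:
    (results for the query 'target B', results for the query 'B').
    """
    scores = {}
    for term, (joint_count, term_count) in counts.items():
        # B = A trivially maximizes the ratio, so the game disallows it.
        if term.strip().lower() == target.strip().lower():
            continue
        # Skip terms with no results to avoid dividing by zero.
        if term_count <= 0:
            continue
        scores[term] = joint_count / term_count
    # Sort by correlation, best guess first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_terms("cricket", counts)` returns a list of (term, correlation) pairs, so the first entry is the current best guess for cricket.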