Zipf's law
- Zipf's law states that only a few words in a language are used very often, while most are used rarely.
- George Kingsley Zipf (1902-1950) first proposed this law in 1935.
- His work was one of the first academic studies of word frequency.
Formula for Zipf's law
- The frequency f of an event, such as the occurrence of a word, is inversely proportional to its rank r.
- Frequency is given approximately by f(r) ≅ 0.1/r.
Explanation
- The most common word in English (rank 1) occurs about one-tenth of the time in a typical text.
- The next most common word (Rank 2) occurs about one-twentieth of the time and so forth.
- Another way of looking at this is that the rank-r word occurs 1/r times as often as the most frequent word.
- That is, the rank-2 word occurs half as often as the rank-1 word,
- the rank-3 word one-third as often,
- the rank-4 word one-fourth as often, and so forth.
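The relationship above can be sketched in a few lines of Python. This is only an illustration: the function names `zipf_frequency` and `rank_frequencies` are made up for this example, the constant 0.1 is the approximate value quoted above for English, and splitting text on whitespace is a simplifying assumption about what counts as a word.

```python
from collections import Counter


def zipf_frequency(rank, c=0.1):
    """Approximate relative frequency of the rank-r word: f(r) = c / r."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    return c / rank


def rank_frequencies(text):
    """Return (rank, word, relative frequency) tuples, most frequent word first.

    Assumption: words are whatever remains after lowercasing and
    splitting on whitespace; real texts need better tokenization.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return [
        (rank, word, count / total)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Predicted frequency for ranks 1-4, and the ratio to the rank-1 word (1/r).
for r in range(1, 5):
    print(r, zipf_frequency(r), zipf_frequency(r) / zipf_frequency(1))
```

Running `rank_frequencies` on a long text and comparing each observed frequency with `zipf_frequency(rank)` gives a rough check of how closely that text follows the law.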
Uses
- It is useful in schemes for data compression and in the allocation of resources by urban planners.
- For example, in 1949 Zipf claimed that the largest city in a country is about twice the size of the next largest, three times the size of the third-largest, and so forth.
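The city-size claim follows the same 1/r pattern and can be sketched as below. The function name `expected_city_size` and the population figure are hypothetical, chosen only to show the arithmetic, not taken from any real country.

```python
def expected_city_size(largest_population, rank):
    """Size predicted for the city of a given rank: largest / rank."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    return largest_population / rank


# Hypothetical largest city of 9,000,000 people (illustrative figure only).
largest = 9_000_000
for rank in range(1, 4):
    print(rank, expected_city_size(largest, rank))
```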
Responses