Calculate word frequency of files in Bash
I was reading Ryan Tomayko’s blog post AWK-ward Ruby, which explains how the Unix tool AWK is among the ancestors of Ruby and Perl. He wrote a few examples showing some of AWK’s advanced features, one of which lists the word frequencies of any file provided. I found this example quite useful and extracted it as a function into my dotfiles.
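#!/bin/bash

# Print each word read from stdin with its count, most frequent first.
function word_frequency() {
  awk '
    # Treat any run of non-letters as a field separator.
    BEGIN { FS = "[^a-zA-Z]+" }
    {
      for (i = 1; i <= NF; i++) {
        # Leading or trailing punctuation yields empty fields; skip them.
        if ($i == "") continue
        words[tolower($i)]++
      }
    }
    END {
      for (w in words)
        printf("%3d %s\n", words[w], w)
    }
  ' |
    sort -rn
}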
Now you can pipe the contents of any file to this function, and it will list every word with its frequency in descending order.
# Examples:
# Pipe the contents of a text file to the function using `cat`
cat my_text_file.txt | word_frequency
# Get the word frequency of a file on the internet
curl -s https://github.com/humans.txt | word_frequency
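To see what the output looks like without needing a file, here is a small self-contained sketch. The sample sentence and the trailing `head -n 1` are my own additions for illustration and are not from Ryan's post. The function body is a compact copy so the snippet runs on its own, and it assumes empty fields (from leading or trailing punctuation) should not be counted as words.

```shell
#!/bin/bash

# Compact copy of the function so this snippet is runnable by itself.
word_frequency() {
  awk '
    BEGIN { FS = "[^a-zA-Z]+" }
    {
      for (i = 1; i <= NF; i++)
        if ($i != "") words[tolower($i)]++   # skip empty fields
    }
    END { for (w in words) printf("%3d %s\n", words[w], w) }
  ' | sort -rn
}

# Illustrative input: "the" appears three times, every other word once.
printf 'The cat and the hat. The end.\n' | word_frequency | head -n 1
# prints "  3 the"
```

Because the counts are padded to three columns, the result lines up nicely for small files. The order of words that share the same count depends on `sort`, so only the top entry is shown here.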
Looking forward to using AWK more and more.