Search This Blog

Saturday, January 29, 2011

Caching lengths and frequencies of lettered words in a paragraph

Suppose that we have a very short paragraph like this "Roses are red. Violets are blue. This verse doesn't rhyme. And neither does this one," how can we find and save all different word lengths and their frequencies of occurrence? For example, "red" is 3 letter long and there are a total of five letters of this length (are, red, are, and, one) in the paragraph. Then one of the items in our cache should be 3 and 5

Just like other problems where we have to keep track of the number of occurrences, we should use a hash table. The algorithm is like this:

public HashMap <Integer, Integer> countWordLengthFrequency(String[] paragraph)
{
    HashMap <Integer, Integer> frequencyTable = new HashMap<Integer, Integer>();

    for (int i = 0; i < paragraph.length; i++)
    {
      if (!frequencyTable.containsKey(paragraph[i].length()))
        frequencyTable.put(paragraph[i].length(), 1);
      else
      {
        Integer count = frequencyTable.get(paragraph[i].length()) + 1;
        frequencyTable.put(paragraph[i].length(), count);
      }
    }

    return frequencyTable;
}

Explanation:we just loop through the words in the paragraph. For each word, we check to see if its length is already in the hash table. Therefore, there are two cases. 1) If the word's length is already in the table, we increase the frequency count by one. 2) Otherwise, we hash the length into the table and set the frequency to 1 because this is the first time this length is hashed into the table.

Obviously, the time complexity is O(n) because we have to check every word in the paragraph. Moreover, in the worst case, the space complexity is O(n). That's when every word in the paragraph has a different length than the other words. If you know any better algorithm for this problem, please leave it in the comment below. Thanks for reading!

No comments:

Post a Comment