Java Program to Implement the Bloom Filter to Test whether a Given Word is in a Set using MD5 Hashing

There are many circumstances where we need to find out if something is a member of a set, and there are many algorithms for doing it. For example a spell checker checks to see whether a given word is in the dictionary. Holding 250,000 words in memory for a spell checker might be too big an overhead if your target environment has low memory.

Bloom filters are one solution for the above mentioned problem. It works as follows :
• Take a big array of bits, initially all zero.
• Then take each word from your dictionary of words.
•  Produce ‘n’ independent hash values for each word. Set the bit positions of the array at the corresponding hash values. Sometimes there’ll be clashes, where the bit will already be set from some other word. This doesn't matter.
To check whether a given word is in the dictionary, do the following :

• Produce the n hash values of the given word.
• Then check to see if each of the bits corresponding to these hash values is set.
• If any bit is not set, then you never loaded that word in, and you can say that word is not in the dictionary

To find the hash values, MD5 algorithm can be used. MD5 algorithm produces a 128-bit (16-byte) hash value and then take smaller hash values by extracting sequences of bits from the result.

Example : To store the word "computer"
MD5 value in decimal for the word "computer" is - 296852903797016122161588463667493754506
Taking 3 digits from the hash value, bit positions 296, 852, 903, 797, 016, 122, 161, 588, 463, 667,493, 754 and 506 needed to be set.

Example 2: ATX - Hash value: 71643585294187099263809470221633738211. Bit positions to set are 716, 435, 852, 941, 870, 992, 638, 094, 702, 216, 337, 382, 11

Output :

Check for the word "chipset" in the list
--------------------------
Success : Word is in the List
+++++++++++++++++++++++++++
Check for the word "data" in the list
--------------------------
Success : Word is in the List
+++++++++++++++++++++++++++
Check for the word "achchuthan" in the list
--------------------------
Word is NOT in the List
+++++++++++++++++++++++++++

Thank you for vising

1. Great tutorial. Am looking forward to using this to develop my own spell checker in java.

2. I have created a text file by the name of terms but it does use it. Instead it says:

Error: Could not find or load main class md5file.MD5File
Java Result: 1

What is this md5file?