31 January 2022

Bloom Filter

Bloom filter is a space-efficient probabilistic data structure for testing if an element is in a Set. It is a space-efficient probabilistic data structure.[1]

It cannot return False negative, nevertheless it can return false positives. So there is a probability that the element belongs to the set but cannot not be acute 100%.[8]

In other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with the counting Bloom filter variant); the more items added, the larger the probability of false positives.[1]

Use a hash function to map a key to a bucket. However, it will not store that key in that bucket, it will simply mark it as filled. So, many keys might map to same filled bucket, creating false positives.

Interestingly a Bloom filter can also trade accuracy for space.


An example of a Bloom filter, representing the set {x, y, z} . The colored arrows show the positions in the bit array that each set element is mapped to. The element w is not in the set {x, y, z} , because it hashes to one bit-array position containing 0. For this figure, m = 18 and k = 3. (wikipedia)

 

A Guava BloomFilter is created by calling the static method create on the BloomFilter class,
passing in a Funnel object and an int representing the expected number of insertions. A Funnel, also new in Guava 11, is an object that can send data into a Sink.[6]

//Creating the BloomFilter
BloomFilter bloomFilter = BloomFilter.create(Funnels.byteArrayFunnel(), 1000);

//Putting elements into the filter
//A BigInteger representing a key of some sort
bloomFilter.put(bigInteger.toByteArray());

//Testing for element in set
boolean mayBeContained = bloomFilter.mayContain(bitIntegerII.toByteArray());

 

https://github.com/devwebcl/java-samples/tree/master/src/main/java/cl/devweb/guava/bloomfilter


Bad sample where expected elemts is 5 and we insert 100,000, will give too many false positives:

BloomFilter<Integer> filter = BloomFilter.create(
 Funnels.integerFunnel(),
 5,
 0.01);
IntStream.range(0, 100_000).forEach(filter::put);

 

Funnel

A Funnel describes how to decompose a particular object type into primitive field values. For example, if we had

class Person {
 final int id;
 final String firstName;
 final String lastName;
 final int birthYear;
}

our Funnel might look like

Funnel<Person> personFunnel = new Funnel<Person>() {
 @Override
 public void funnel(Person person, PrimitiveSink into) {
 into
 .putInt(person.id)
 .putString(person.firstName, Charsets.UTF_8)
 .putString(person.lastName, Charsets.UTF_8)
 .putInt(birthYear);
 }
};


  1. https://en.wikipedia.org/wiki/Bloom_filter
  2. https://llimllib.github.io/bloomfilter-tutorial/
  3. https://www.baeldung.com/guava-bloom-filter
  4. https://www.geeksforgeeks.org/bloom-filters-introduction-and-python-implementation/
  5. https://github.com/google/guava/wiki/HashingExplained
  6. https://dzone.com/articles/using-guava-bloomfilter-guard 
  7. https://github.com/bbejeck/guava-blog/blob/master/src/test/java/bbejeck/guava/hash/BloomFilterTest.java
  8. https://www.i-programmer.info/programming/theory/2404.html 

 

PS: there is a The Invertible Bloom Filter !... ?

 

Blog Archive

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.