13 April 2022

El software se deteriora

El software no se desgasta, pero sí se deteriora
Software doesn’t wear out, but it does deteriorate.

 

--Pressman, Ingenieria del Software, 7th ed.

--Software Engineering

 

31 January 2022

Bloom Filter

Bloom filter is a space-efficient probabilistic data structure for testing if an element is in a Set. It is a space-efficient probabilistic data structure.[1]

It cannot return False negative, nevertheless it can return false positives. So there is a probability that the element belongs to the set but cannot not be acute 100%.[8]

In other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with the counting Bloom filter variant); the more items added, the larger the probability of false positives.[1]

Use a hash function to map a key to a bucket. However, it will not store that key in that bucket, it will simply mark it as filled. So, many keys might map to same filled bucket, creating false positives.

Interestingly a Bloom filter can also trade accuracy for space.


An example of a Bloom filter, representing the set {x, y, z} . The colored arrows show the positions in the bit array that each set element is mapped to. The element w is not in the set {x, y, z} , because it hashes to one bit-array position containing 0. For this figure, m = 18 and k = 3. (wikipedia)

 

A Guava BloomFilter is created by calling the static method create on the BloomFilter class,
passing in a Funnel object and an int representing the expected number of insertions. A Funnel, also new in Guava 11, is an object that can send data into a Sink.[6]

//Creating the BloomFilter
BloomFilter bloomFilter = BloomFilter.create(Funnels.byteArrayFunnel(), 1000);

//Putting elements into the filter
//A BigInteger representing a key of some sort
bloomFilter.put(bigInteger.toByteArray());

//Testing for element in set
boolean mayBeContained = bloomFilter.mayContain(bitIntegerII.toByteArray());

 

https://github.com/devwebcl/java-samples/tree/master/src/main/java/cl/devweb/guava/bloomfilter


Bad sample where expected elemts is 5 and we insert 100,000, will give too many false positives:

BloomFilter<Integer> filter = BloomFilter.create(
 Funnels.integerFunnel(),
 5,
 0.01);
IntStream.range(0, 100_000).forEach(filter::put);

 

Funnel

A Funnel describes how to decompose a particular object type into primitive field values. For example, if we had

class Person {
 final int id;
 final String firstName;
 final String lastName;
 final int birthYear;
}

our Funnel might look like

Funnel<Person> personFunnel = new Funnel<Person>() {
 @Override
 public void funnel(Person person, PrimitiveSink into) {
 into
 .putInt(person.id)
 .putString(person.firstName, Charsets.UTF_8)
 .putString(person.lastName, Charsets.UTF_8)
 .putInt(birthYear);
 }
};


  1. https://en.wikipedia.org/wiki/Bloom_filter
  2. https://llimllib.github.io/bloomfilter-tutorial/
  3. https://www.baeldung.com/guava-bloom-filter
  4. https://www.geeksforgeeks.org/bloom-filters-introduction-and-python-implementation/
  5. https://github.com/google/guava/wiki/HashingExplained
  6. https://dzone.com/articles/using-guava-bloomfilter-guard 
  7. https://github.com/bbejeck/guava-blog/blob/master/src/test/java/bbejeck/guava/hash/BloomFilterTest.java
  8. https://www.i-programmer.info/programming/theory/2404.html 

 

PS: there is a The Invertible Bloom Filter !... ?

 

25 January 2022

The Bank of San Serriffe 2

After a long time, again my 2 cents for FG Knuth

Added:
+      Germán González-Morris           0x$6.40

Update 2025.01

                  Germán González-Morris 0x$6.60


https://www-cs-faculty.stanford.edu/~knuth/boss.html

13 January 2022

Jenkins directory structure

From official documentation:

JENKINS_HOME has a fairly obvious directory structure that looks like the following:

JENKINS_HOME
 +- config.xml     (jenkins root configuration)
 +- *.xml          (other site-wide configuration files)
 +- userContent    (files in this directory will be served under your http://server/userContent/)
 +- fingerprints   (stores fingerprint records)
 +- nodes          (slave configurations)
 +- plugins        (stores plugins)
 +- secrets        (secretes needed when migrating credentials to other servers)
 +- workspace (working directory for the version control system)
     +- [JOBNAME] (sub directory for each job)
 +- jobs
     +- [JOBNAME]      (sub directory for each job)
         +- config.xml     (job configuration file)
         +- latest         (symbolic link to the last successful build)
         +- builds
             +- [BUILD_ID]     (for each build)
                 +- build.xml      (build result summary)
                 +- log            (log file)
                 +- changelog.xml  (change log)

Although the version (legacy) I am using is a bit different.

Blog Archive

Disclaimer

Qux