[Dailydave] approximate string matching - Bloom filters

Mateusz Berezecki mateuszb at gmail.com
Fri Sep 1 13:19:23 EST 2006


On 9/1/06, Fausett, Mark (US SSA) <mark.fausett at baesystems.com> wrote:
> Bloom filters are approximate in a different sense though -- Think of
> them as space efficient, but lossy token sets; you put a bunch of tokens
> in, and subsequently can query whether a particular token was placed
> into the set; to some degree of confidence.
> Bloom filters are subject to false positives -- they'll sometimes
> incorrectly tell you that a token is in the set -- but not false
> negatives.  Because hashing functions are used to insert tokens into the
> bloom filter, the false positives have nothing to do with approximate
> string matches.
>

By trial end error I already discovered this really unwanted behavior :-/

It's very good for representing what is not in the set rather than representing
the set itself.

Mateusz


More information about the Dailydave mailing list