[Dailydave] approximate string matching - Bloom filters
Fausett, Mark (US SSA)
mark.fausett at baesystems.com
Fri Sep 1 12:23:43 EST 2006
Bloom filters are approximate in a different sense though -- Think of
them as space efficient, but lossy token sets; you put a bunch of tokens
in, and subsequently can query whether a particular token was placed
into the set; to some degree of confidence.
Bloom filters are subject to false positives -- they'll sometimes
incorrectly tell you that a token is in the set -- but not false
negatives. Because hashing functions are used to insert tokens into the
bloom filter, the false positives have nothing to do with approximate
string matches.
Cheers,
Mark Fausett
-----Original Message-----
From: dailydave-bounces at lists.immunitysec.com
[mailto:dailydave-bounces at lists.immunitysec.com] On Behalf Of Martin
Roesch
Sent: Friday, September 01, 2006 10:06 AM
To: Mateusz Berezecki
Cc: dailydave at lists.immunitysec.com
Subject: Re: [Dailydave] approximate string matching
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Have you taken a look at Bloom Filters? I'm not sure what your exact
requirements are but Bloom Filters are fast and do probabilistic pattern
matching. Check out:
http://en.wikipedia.org/wiki/Bloom_filter
http://www.arl.wustl.edu/~sarang/analysis.pdf
They're easy to implement and fast, so maybe that will work for you.
Hope that helps!
-Marty
...SNIP...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Fausett, Mark (US SSA).vcf
Type: text/x-vcard
Size: 216 bytes
Desc: Fausett, Mark (US SSA).vcf
Url : http://lists.immunitysec.com/pipermail/dailydave/attachments/20060901/2104be93/attachment.vcf
More information about the Dailydave
mailing list