[Dailydave] Unknown Application Protocol Analysis
William McVey
wam at cisco.com
Wed Sep 6 15:06:12 EST 2006
On Wed, 2006-09-06 at 22:59 +0800, Rhys Kidd wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi List,
>
> I've been thinking about a problem faced when approaching an unknown traffic
> flow, and think this list probably contains an expert or two in this area.
>
>
> Q. How do you run a quick one pass analysis of some proprietary application
> protocol?
>
>
> I know it's fairly easy to look at small subsets of traffic manually,
> looking for the \x00 and slowly guess-timate where fields begin and end,
> what constitutes a record, what are static offsets etc, but I'm imagining a
> tool that would take in a batch of traffic and work out roughly what's what,
> seeing the big picture.
>
> I'd imagine this tool would run a first check, looking for what might
> constitute discrete units of information, (possibly all those bounded by
> \x00).
>
> I'd imagine this tool would then look for some of the basic layouts of TLV
> protocols (which seem most common IMHO) by working out lengths of what
> appear to be strings, and look for those ints before or after. Maybe even
> looking for md5 or sha1 hashes that correspond to other data fields. Then
> look for repeating byte patterns etc.
>
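A minimal sketch of the length-prefix hunt described above, in Python. It is not taken from any existing tool: the function name, the single-byte length field, the printable-ASCII test, and the 3-byte minimum are all illustrative assumptions, and a real protocol may well use multi-byte or trailing lengths instead.

```python
import string

# Printable ASCII bytes, excluding control whitespace (tab, newline, etc.).
PRINTABLE = set(string.printable.encode("ascii")) - set(b"\t\n\r\x0b\x0c")


def find_length_prefixed_strings(data: bytes, min_len: int = 3):
    """Return (offset, length, text) wherever a byte with value n is
    followed by n bytes that are all printable ASCII."""
    hits = []
    i = 0
    while i < len(data) - 1:
        n = data[i]
        value = data[i + 1 : i + 1 + n]
        if n >= min_len and len(value) == n and all(b in PRINTABLE for b in value):
            hits.append((i, n, value.decode("ascii")))
            i += 1 + n  # resume scanning after the matched value
        else:
            i += 1
    return hits


if __name__ == "__main__":
    sample = b"\x01\x05hello\x02\x03abc\xff"
    for offset, length, text in find_length_prefixed_strings(sample):
        print(f"offset {offset}: length {length}, value {text!r}")
```

Expect false positives on real traffic, since any byte can coincidentally equal the length of the printable bytes after it; the hits are candidates to confirm across many packets, not conclusions.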
> Once it understands the structure of a single packet, then compare it over
> time with other packets between similar hosts, looking for which fields are
> constant, which ones change randomly (signifying GUID or Message IDs) and
> those that only change slightly (perhaps timing fields). This would be where
> the real knowledge would lie, as assumptions made about individual packets
> (eg what is really static or dynamic) could be rectified over a larger
> data-set.
>
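The cross-packet comparison described above could be roughed out as follows. This is only a sketch under a strong assumption: that fields sit at fixed byte offsets, so packets can be compared column by column. The labels and the "at most a quarter of the samples" threshold are arbitrary illustrative choices; variable-length fields would need the packets aligned first.

```python
def classify_offsets(packets):
    """Label each byte offset by how much its value varies across packets.

    Only offsets present in every packet are considered.
    """
    width = min(len(p) for p in packets)
    few = max(2, len(packets) // 4)  # arbitrary cutoff for "few distinct values"
    labels = []
    for off in range(width):
        distinct = {p[off] for p in packets}
        if len(distinct) == 1:
            labels.append("constant")    # likely static header or magic byte
        elif len(distinct) <= few:
            labels.append("few-values")  # likely a flag or type code
        else:
            labels.append("varying")     # likely an ID, counter, or payload
    return labels


if __name__ == "__main__":
    captured = [
        b"\x01\x00A\x10",
        b"\x01\x00B\x11",
        b"\x01\x01C\x12",
        b"\x01\x00D\x13",
    ]
    for off, label in enumerate(classify_offsets(captured)):
        print(off, label)
```

Telling a random message ID from a slowly changing timestamp would need one more step, such as looking at the differences between successive values rather than just counting distinct ones.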
> Then print this out in a way like:
>
> <static header><record 1><length><Unicode content><\x88\x88\x88><record
> 2><length><COMPUTER_NAME><record 3><CURRENT_TIME><unknown static crud>
>
> Producing an Ethereal protocol definition file at the end would be icing on
> the cake!
>
> I've had a look at:
> [1]
> http://research.microsoft.com/workshops/sysml/papers/sysml-Gopalratnam.pdf
> [2] http://www.ub.utwente.nl/webdocs/ctit/1/000000ef.pdf
>
> But can't seem to find any public code that has attempted to solve the same
> problem.
> Has anyone else thought about this, or know of code I should look at?
There have been a couple of papers on a technique dubbed Protocol
Informatics. Marshall Beddoe wrote a proof-of-concept implementation and
some whitepapers/presentations that used to be available at
http://www.baselineresearch.net/PI/ (now a dead domain, though perhaps
still in the Google cache or the Wayback Machine). The code appears to
live on at PacketStorm:
File Name: PI.tgz http://packetstormsecurity.org/sniffers/PI.tgz
Description:
The protocol informatics project is a software
framework that allows for advanced sequence and
protocol stream analysis by utilizing
bioinformatics algorithms. The sole purpose of
this software is to identify protocol fields in
unknown or poorly documented network protocol
formats. The algorithms that are utilized
perform comparative analysis on a series of
samples to better understand the underlying
structure of the otherwise random-looking data.
The PI framework was designed for
experimentation through the use of a
widget-based component set.
Author: Marshall Beddoe
Homepage: http://www.baselineresearch.net/PI
MD5 Checksum: 26b4efae961542718a9208bca030a7e7
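To give a flavour of what "bioinformatics algorithms" means here, below is a textbook Needleman-Wunsch global alignment applied to two byte strings. This is not code from PI.tgz, and the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions. The idea is that gaps inserted by the alignment hint at where a variable-length field sits, while columns that line up across samples hint at fixed structure.

```python
def align(a: bytes, b: bytes, match=1, mismatch=-1, gap=-1):
    """Globally align two byte strings (Needleman-Wunsch).

    Returns two equal-length lists of byte values, with None marking a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(None)
            i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]


if __name__ == "__main__":
    x, y = align(b"ABCD", b"ABD")
    show = lambda seq: " ".join("--" if v is None else f"{v:02x}" for v in seq)
    print(show(x))
    print(show(y))
```

Aligning a whole batch of samples, rather than one pair, would need a multiple-alignment step on top of this, which is beyond this sketch.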
I seem to recall another app doing automated field boundary detection,
posted fairly recently; but I'm afraid I can't find it right now. :-(
-- William