[Dailydave] [fuzzing] Coverage and a recent paper by L. Suto
matthew wollenweber
mwollenweber at gmail.com
Wed Oct 17 00:28:44 EDT 2007
>
> The main downside is that I don't know why it is that finding bugs
> should depend linearly on anything. is it really true that if run A
> explores 10 more blocks than run B then run A has a .10 better chance of
> finding a bug in the program?
I think the above is the crux of the problem. Despite lots of intuition on
where the bugs are or how to find them, there hasn't been much analysis (to
my knowledge) of modeling bug prediction. Clearly a lot of bugs occur where
unchecked data is transferred into fixed sized buffers, but from a system
perspective I've not seen anything quantifiable that suggests where bugs
will be. For your weights a,b,c (or even the linear equation) you're right
that we don't have metrics on what the best fit looks like.
I think it would be interesting for an academic or two to put a lot of
"small enough" programs through a lot of fuzzers. Fuzz almost all/most of
the possible paths and then analyze the common traits to answer basic
questions as to how diversity, coverage, and interesting functions relate to
finding bugs. (to any professors out there: I haven't finished my masters
yet and would love to find an interesting program). The cool research out
there (Molnar, Aitel, DeMott, etc) seems to focus a lot on the feedback
systems to find more bugs, but as of yet, I've seen no reason as to why any
fitness function is well suited to the problem.
On 10/16/07, David Molnar <dmolnar at eecs.berkeley.edu> wrote:
>
> On Mon, 2007-10-15 at 21:38 -0700, J.M. Seitz wrote:
>
> > So, let's try this:
> >
> > spankage = code coverage + (path diversity weight * 2) + (num
> > interesting funcs hit * 5)
> >
> > This stems from many talks with Jared, Pedram, and Charlie. I also
> > proposed that a proximity value to dangerous functions be thrown in, but
> > let's not get too far ahead of ourselves.
>
> Cool, this is interesting. I wonder if you could also flip it around and
> try to estimate the values of those coefficients instead of picking "2"
> and "5". I don't think those are bad values, just it'd be interesting if
> there is a way to derive them somehow from data about which fuzzing runs
> have been successful so far.
>
> One way this could work is as follows: suppose we have a set of fuzzing
> runs R_1, R_2,...,R_n, and suppose for each run we know whether the run
> found a bug or not. Let's define Y_i to be 1 if the run R_i found at
> least one bug and Y_i to be 0 otherwise. Then we could set up a linear
> model:
>
> Y_i = a + b*coverage_i + c*path diversity_i + d*num int.funcs_i +error_i
>
> This just says we think whether a bug is found on run i depends linearly
> on coverage, path diversity, and number of interesting functions, plus
> an "intercept term" a and an error term. In your equation above for
> "spankage", we have a = 0, b = 1, c = 2, d =5.
>
> The error represents among other things the effect of all the other
> variables we did not measure. Like, there's no variable in this equation
> for "did the developers do a lot of threat modeling," so if that has a
> lot of impact on whether a specific run finds a bug, then we will see
> that as "error" in estimating how important coverage, path diversity,
> and #interesting functions is to finding a bug. The error can also
> account for things like randomization in the fuzzer, difficulty in
> measuring path diversity, and so on.
>
> The main point is that if we assume this model holds we can then
> estimate values for a,b,c, and d from the data using any of several
> methods. For example, if we believe the errors are independent of the
> other variables, and if we believe the errors have a mean value of 0,
> then we can use ordinary least squares regression. We can then look at
> the resulting estimates for a,b,c, and d to give us some idea of whether
> coverage, path diversity, and number of interesting functions in fact
> have an impact on finding bugs. Finally, we could feed the resulting
> estimates back into your Fuzzing Proxy Coverage Spanker-abob.
>
> The main downside is that I don't know why it is that finding bugs
> should depend linearly on anything. is it really true that if run A
> explores 10 more blocks than run B then run A has a .10 better chance of
> finding a bug in the program? Mainly this came to mind because I'm just
> learning about linear regressions now and have to read dozens of papers
> where people run, like, regressions of computer use vs. wages and
> compare them to regressions of pencil use vs. wages to see whether
> computers or pencils better predict high wages. Still, it seems like we
> should be able to leverage the data generated on which fuzzer runs
> produce bugs and which do not.
>
> -David Molnar
>
>
--
Matthew Wollenweber
mwollenweber at gmail.com | mjw at cyberwart.com
www.cyberwart.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.immunitysec.com/pipermail/dailydave/attachments/20071017/40a4645a/attachment.htm
More information about the Dailydave
mailing list