LogAnalysis
Re: [logs] regexless parsing, again? Sep 14 2007 07:14PM
Mordechai T. Abzug (morty frakir org)
On Fri, Sep 14, 2007 at 12:45:12PM -0400, Marcus J. Ranum wrote:

> In the case of a lookahead in a parser you're still going to prune
> the search with a single lookahead, whereas with a regex, you've got
> to try N additional regexes no matter what. Unless you order the
> regexes into a parse tree (of sorts) - which I've seen done - but
> that's just putting lipstick on a pig.

One approach I've used to improve the performance of regexes is to OR
all the regexes together. I.e. instead of trying N regexes, regex1,
regex2, regex3, etc., in sequence, you do:

(regex1{sideeffect1}|regex2{sideeffect2}|regex3{sideeffect3}|etc)

The regex engine compiles the big regex once into a single state
machine, so matching is usually a lot faster than trying N regexes in
turn. You test each message against the big regex. The side effect
tells you which sub-regex actually matched, so you then do a big case
statement on the side effect to decide how to handle the data. [You
also want to rerun the message against just the matching sub-regex,
but that's not a big deal.]
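
A minimal sketch of the OR-everything-together idea in Python, using a
named group per alternative in place of the {sideeffect} markers. The
message types and patterns below are made up for illustration. Note
that Python's re module is a backtracking matcher, so the speedup
described above is not guaranteed there; the sketch only shows the
tagging and dispatch.

```python
import re

# Hypothetical message types; names and patterns are illustrative only.
PATTERNS = {
    "ssh_fail": r"Failed password for (\S+) from (\S+)",
    "ssh_ok": r"Accepted password for (\S+) from (\S+)",
    "link_down": r"Interface (\S+) changed state to down",
}

# One big alternation; each sub-regex is wrapped in a named group that
# plays the role of the "side effect" identifying which branch matched.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{rx})" for name, rx in PATTERNS.items())
)

# The individual sub-regexes, compiled once, for the follow-up match.
SUB = {name: re.compile(rx) for name, rx in PATTERNS.items()}


def classify(message):
    """Return (kind, fields) for a message, or (None, ()) if nothing matches."""
    m = COMBINED.search(message)
    if m is None:
        return None, ()
    kind = m.lastgroup  # name of the outer group that matched
    # Rerun just the matching sub-regex to pull out its own captures.
    fields = SUB[kind].search(message).groups()
    return kind, fields
```

The caller can then switch on the returned kind, e.g.
classify("sshd[42]: Failed password for root from 10.0.0.5").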

I've also implemented a pre-processor that allows token-like
subregexes for things like IPs, zones, mail addresses, etc.
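
A pre-processor like that might look roughly like the following Python
sketch. The <<NAME>> placeholder syntax, the token names, and the
deliberately loose token patterns are all assumptions for illustration,
not the actual implementation.

```python
import re

# Hypothetical token table; the patterns are loose on purpose.
TOKENS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "HOST": r"[A-Za-z0-9](?:[A-Za-z0-9.-]*[A-Za-z0-9])?",
    "EMAIL": r"[^\s@]+@[^\s@]+",
}


def expand(template):
    """Replace each <<NAME>> placeholder with its regex fragment.

    Fragments are wrapped in a non-capturing group so that quantifiers
    or alternation around a token behave as expected. An unknown token
    name raises KeyError.
    """
    def repl(m):
        return "(?:" + TOKENS[m.group(1)] + ")"

    return re.sub(r"<<(\w+)>>", repl, template)
```

For example, expand(r"relay=(<<IP>>)") yields a regex whose first
capture group holds the relay address.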

However, as you say, it's still lipstick on a pig. In particular, if
a message fails to match the big regex, you have to debug from square
one. Debugging badly-formed regexes also becomes harder, although you
can mitigate this with a pre-pass that tests each regex individually
at the start of the run. And I never even considered testing for
overlapping regexes.
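
A rough Python sketch of such a pre-pass, plus one crude way to look
for overlap. The function names are hypothetical, and the overlap
check is only empirical: it reports sample messages matched by more
than one regex and proves nothing about messages not in the sample.

```python
import re


def precheck(patterns):
    """Compile each regex on its own; return {name: error} for the bad ones."""
    errors = {}
    for name, rx in patterns.items():
        try:
            re.compile(rx)
        except re.error as exc:
            errors[name] = str(exc)
    return errors


def overlaps(patterns, samples):
    """Return (line, [names]) for each sample line matched by 2+ regexes."""
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    found = []
    for line in samples:
        hits = [name for name, c in compiled.items() if c.search(line)]
        if len(hits) > 1:
            found.append((line, hits))
    return found
```

Running precheck before building the big regex means a malformed
sub-regex is reported by name instead of as an error somewhere inside
the combined pattern.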

- Morty
_______________________________________________
LogAnalysis mailing list
LogAnalysis (at) loganalysis (dot) org [email concealed]
http://www.loganalysis.org/mailman/listinfo/loganalysis