LogAnalysis
Re: [logs] regexless parsing, again? Sep 13 2007 10:55PM Marcus J. Ranum (mjr ranum com) (1 replies)
Re: [logs] regexless parsing, again? Sep 14 2007 09:18AM Mordechai T. Abzug (morty frakir org) (1 replies)
Re: [logs] regexless parsing, again? Sep 14 2007 04:45PM Marcus J. Ranum (mjr ranum com) (2 replies)
Re: [logs] regexless parsing, again? Sep 14 2007 07:59PM David Corlette (dcorlette novell com) (1 replies)
> In the case of a lookahead in a parser you're still going to prune
> the search with a single lookahead, whereas with a regex, you've got
> to try N additional regexes no matter what. Unless you order the
> regexes into a parse tree (of sorts) - which I've seen done - but
> that's just putting lipstick on a pig.
One approach I've used to improve the performance of regexes is to OR
all the regexes together. I.e. instead of trying N regexes, regex1,
regex2, regex3, etc., in sequence, you do:
(regex1{sideeffect1}|regex2{sideeffect2}|regex3{sideeffect3}|etc)
The regex engine compiles the big regex once into a state machine, so
it's a lot faster than N regexes. You test each message against the
big regex. The side effect tells you which sub-regex actually matched,
so you then do a big case statement on the side effect to decide how
to handle the data. [You will also want to rerun the message against
just the sub-regex that matched, but that's not a big deal.]
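A minimal sketch of the idea in Python, with several assumptions: the
rule names and patterns below are made up for illustration, named
groups stand in for the "side effects", and the rerun against the
sub-regex is assumed to be there to extract fields. Note that Python's
re module is a backtracking engine rather than a compiled state
machine, so this shows the dispatch structure only, not the speedup
you would get from a lex-style engine.

```python
import re

# Hypothetical rules: name -> sub-regex. The names play the role of
# the side effects in the big alternation.
RULES = {
    "ssh_fail": r"Failed password for (\S+) from (\S+)",
    "ssh_ok": r"Accepted password for (\S+) from (\S+)",
    "session_open": r"session opened for user (\S+)",
}

# Each sub-regex compiled on its own, for the follow-up field extraction.
SUB = {name: re.compile(rx) for name, rx in RULES.items()}

# One big alternation: (?P<name1>regex1)|(?P<name2>regex2)|...
BIG = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in RULES.items()))


def classify(message):
    """Return (rule_name, fields), or (None, ()) if nothing matched."""
    m = BIG.search(message)
    if m is None:
        return None, ()
    # The outermost named group closes last, so lastgroup names the
    # alternative that matched.
    name = m.lastgroup
    # Rerun just the matching sub-regex to pull out its own groups.
    fields = SUB[name].search(message).groups()
    return name, fields
```

The caller can then switch on the returned name, e.g.
classify("sshd[42]: Failed password for root from 10.0.0.5 port 22")
gives the rule name "ssh_fail" plus the user and source address.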
I've also implemented a pre-processor that allows token-like
subregexes for things like IPs, zones, mail addresses, etc.
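One way such a pre-processor might look, sketched in Python. The
{{NAME}} placeholder syntax, the TOKENS table, and the expand() helper
are all assumptions made for this example, and the token patterns are
deliberately loose rather than strict validators.

```python
import re

# Hypothetical token table: placeholder name -> sub-regex.
TOKENS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "EMAIL": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "ZONE": r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}",
}

_PLACEHOLDER = re.compile(r"\{\{([A-Z]+)\}\}")


def expand(pattern):
    """Replace each {{NAME}} with a capturing group around its sub-regex.

    An unknown NAME raises KeyError, so typos surface before the
    pattern is ever compiled.
    """
    return _PLACEHOLDER.sub(lambda m: "(" + TOKENS[m.group(1)] + ")", pattern)
```

A rule can then be written as r"connect from {{IP}} to {{IP}}" and
passed through expand() before it is compiled.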
However, as you say, it's still lipstick on a pig. In particular, if
a message fails to match the big regex, you have to debug from square
one. Debugging badly-formed regexes also becomes harder, although you
can improve this with a pre-pass that tests each regex individually
at the start of the run. And I never even considered testing for
overlapping regexes.
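A sketch of such a pre-pass in Python, plus a crude overlap check. Both
helper names are hypothetical. The overlap check only reports rules
that match the same sample message, so it depends entirely on the
samples you feed it and proves nothing about regexes in general.

```python
import re


def find_bad_rules(rules):
    """Compile each sub-regex alone; return {name: error text} for failures."""
    bad = {}
    for name, rx in rules.items():
        try:
            re.compile(rx)
        except re.error as exc:
            bad[name] = str(exc)
    return bad


def find_overlaps(rules, samples):
    """Return {message: [rule names]} for samples matched by more than one rule."""
    compiled = {name: re.compile(rx) for name, rx in rules.items()}
    overlaps = {}
    for msg in samples:
        hits = [name for name, rx in compiled.items() if rx.search(msg)]
        if len(hits) > 1:
            overlaps[msg] = hits
    return overlaps
```

Running find_bad_rules() before building the big alternation pins a
syntax error on one named rule instead of on the combined pattern.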
- Morty
_______________________________________________
LogAnalysis mailing list
LogAnalysis (at) loganalysis (dot) org [email concealed]
http://www.loganalysis.org/mailman/listinfo/loganalysis