Content Inspection - Statistical methods Aug 08 2009 05:45PM
Glenn Wilkinson (glenn wilkinson gmail com)
Re: Content Inspection - Statistical methods Aug 11 2009 07:03PM
Richard Bejtlich (taosecurity gmail com)
On Sat, Aug 8, 2009 at 1:45 PM, Glenn Wilkinson
<glenn.wilkinson (at) gmail (dot) com> wrote:
> Hello IDS folks,
> I'm currently doing a mini-project involving applying machine learning
> techniques to the identification of hostile network traffic. My focus
> is on TCP traffic, and I'm looking at header- and content-based
> inspection. I'm wrapping up my feature extraction code now, whereby
> I've imported all TCP sessions from the DARPA training sets into a DB
> and have tagged the hostile sessions.
> My question is, does anyone have any bright ideas for useful,
> simple content analysis attributes? As it's a statistical/ML approach,
> I'm trying to come up with ideas that are as generic as possible. So
> far I'm calculating things like session data entropy, most frequent
> character, and counts of certain characters.
> I'm brand new to this field, but am really excited about this project.
> Any feedback/advice would be greatly appreciated.
> Thanks!
> G

Hi Glenn,

How about NOT using the DARPA data sets? Maybe something more modern?
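
For reference, the content attributes listed in the quoted message (session
data entropy, most frequent character, counts of certain characters) could be
computed roughly as follows. This is only a minimal sketch, not the original
poster's code: the function name content_features, the idea of passing the
reassembled session payload as a bytes object, and the default set of tracked
characters are all assumptions made for illustration.

```python
import math
from collections import Counter


def content_features(payload: bytes, tracked: bytes = b"%/\\'\x00") -> dict:
    """Compute simple, protocol-agnostic statistics over one session's payload.

    `payload` is assumed to be the reassembled TCP session data.
    `tracked` is a hypothetical choice of byte values to count individually.
    """
    n = len(payload)
    counts = Counter(payload)  # maps byte value (0-255) to its occurrence count

    if n == 0:
        # An empty session has no byte distribution to measure.
        entropy = 0.0
        most_frequent = None
    else:
        # Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0.
        entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
        most_frequent = counts.most_common(1)[0][0]

    features = {
        "length": n,
        "entropy": entropy,
        "most_frequent_byte": most_frequent,
    }
    # One count feature per tracked byte value, keyed by its hex code.
    for b in tracked:
        features[f"count_0x{b:02x}"] = counts.get(b, 0)
    return features
```

Each session then yields one flat dictionary of numeric features, which can be
stored alongside the hostile/benign tag and fed to whichever classifier is
being evaluated.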



