LogAnalysis
RE: [logs] How to define Log, Event, and Alert? Jul 24 2008 04:43PM
Tina Bird (tbird precision-guesswork com) (1 replies)
RE: [logs] How to define Log, Event, and Alert? Jul 25 2008 02:23AM
Bill Scherr IV (bschnzl cotse net) (1 replies)
Re: [logs] How to define Log, Event, and Alert? Jul 25 2008 04:14AM
Anton Chuvakin (anton chuvakin org) (1 replies)
RE: [logs] How to define Log, Event, and Alert? Jul 25 2008 09:23AM
Rainer Gerhards (rgerhards hq adiscon com) (1 replies)
RE: [logs] How to define Log, Event, and Alert? Jul 26 2008 02:48AM
Bill Scherr IV (bschnzl cotse net) (1 replies)
RE: [logs] How to define Log, Event, and Alert? Jul 29 2008 03:31PM
Rainer Gerhards (rgerhards hq adiscon com) (1 replies)
RE: [logs] How to define Log, Event, and Alert? Jul 30 2008 07:41AM
Eric Fitzgerald (Eric Fitzgerald microsoft com) (1 replies)
RE: [logs] How to define Log, Event, and Alert? Jul 30 2008 09:56AM
Rainer Gerhards (rgerhards hq adiscon com)
Hi Eric,

On Wed, 2008-07-30 at 00:41 -0700, Eric Fitzgerald wrote:
> Rainer,
>
> I have really enjoyed reading your last few posts. I would invite you
> to join the CEE working group.

I've just joined it, waiting for the first message. And now I have the
dilemma where to post this reply ;) For the time being, I stick with
loganalysis, as it started here...

> Your discussion of state was very similar to a private one that I sent
> offline in CEE. I think that from a developer standpoint, events are
> typically raised on internal state changes, but users of a system
> typically view it as a black box and don't focus on the state changes
> so much as the observable macroscopic occurrence, which might appear
> to have been a state change but might not appear to be so.

I fully agree. To me, it is just important to note that an event
originates in a state change, no matter if the user sees that or not. As
I have written in my last post, the level of granularity of the def is
also depending on whom you are targeting. I am currently at a low level,
trying to understand the implications of what happens (and you have a
couple of good points below).

> My working definition of "event" is:
>
> "An observable occurrence in an IT system."
> * observable - If we can't observe it, then we can't raise an event on
> it, so it's not interesting.
> * occurrence - We raise an event about an instantaneous occurrence,
> not about something ongoing (the beginning, end, and perhaps state
> changes within a long duration process might be events).
> * IT system - This restricts the problem domain to computer science,
> seemed like a reasonable restriction at the time. Perhaps there is a
> better term to restrict this to the realm of computers and software.

Restricting the domain is important and I missed that. That removes a
lot of ambiguity.

> For completeness, I define the following:
> Event record- a persistable data structure containing information
> about an event
> Event log- an ordered database of event records (typically but not
> always a sequential access file) [this also captures your and Bill's
> [Scherr] point about the temporal aspect of events, but does not
> restrict us to any specific technology]

One might argue that there is a third entity: an Event Log stream. Then,
that would be an "ordered set" of event records. This could then be used
to describe the on-the-wire event records as they traverse between
systems.

An event log then simply is a persisted event log stream.

Again, your use case defines the granularity to look it. I find it
important to apply the concept of "event logs" not only to databases of
any kind, but also dynamic, in-transit event records. The reason is they
have much in common and so I can apply general knowledge to both of this
cases.

> Now a few comments regarding sequence:
>
> Sequence information in the log is also often insufficient for
> determining the actual order of events; I agree that it is better than
> timestamp for the reasons that you state but I also feel that most
> logs are significantly less useful (if not useless) without
> timestamps, being the primary correlator with any other observable or
> discoverable information in the IT environment.

I agree, a timestamp is most useful, especially if provided in
high-resolution. That was a primary driving force for my IETF work to
get high-precision timestamps into the syslog RFCs.

In many cases it even is sufficient. But, for reasons you give below,
one should not assume they are totally reliable (or even sufficient for
the task). Depending on the app, this might actually be sufficient (like
web stats), but for others (evidence in court) it may be at least
questionable. Fine-grained definitions help differentiate between valid
in questionable use-cases (aka "I most know with what I deal before I
put it to use").

>
> Consider a multi-processor system where the log server is in a
> different process than the processes which are performing logged
> activities.

Even a different thread will do...

> If multiple loggable activities occur at roughly the same time, then
> whichever process' inter-process call to the log server is serviced
> first, will get the lower sequence number. Since logging often
> involves allocating memory for and marshaling string data, etc., often
> the "smaller" event makes it to the log first, or in some cases, the
> "better prepared" event- if a process that logs frequently leaves
> allocated buffers then it might be able to call the log server without
> delay, while another process is still marshaling log data, even though
> the logged activities occurred simultaneously.
>
> It gets worse. Process prioritization might influence the order in
> which the kernel services the calls to the log server, as will the
> time that the logged activity and the call to the logging service
> happened within the quantum of execution.

This is why you don't even need two processes. Let two threads execute
logging code. For the reasons you gave, you never know which one will
log first.

However, this also raises the point if order of execution (not only
order of logging) is actually something you can rely on. As everyone who
has dealt at least a bit with threads knows, you can not. (more below)

> So an activity that occurred at an earlier time on a lower-priority
> process, or an activity that occurred at an earlier time, but near the
> end of the execution quantum for the process raising the event, will
> frequently be logged sequentially after events raised by
> higher-priority processes.
>
> To an external observer, log ordering on a busy multiprocessor server
> might appear essentially random.

(continued) And, in fact, it actually is random. With increasing
parallelism in todays multi-core CPUs, developers begin to notice that.
Everything actually is random (not from a cryptographer's view, but
maybe even from their POV...) except if you explicitely serialize it.

If you use a logging server process, you lose even minimum order of
events because you now have the process switching and buffering
involved. So from the log server's POV, a file may actually be deleted
before it is created...

> A group here at Microsoft did some research recently into reliable
> ordering of events. I am currently unable to locate a published paper
> on this but I am still looking. If I find it I will forward the link.
> In short it is possible, even across multiple systems, to be able to
> reliably reconstruct order in limited cases but there are severe
> constraints on the scenarios. I can't go into more detail until I
> determine the IP status of the work.

This paper would of course be a very interesting read. I guess one of
its main points is that you actually need to serialize calls to the
logging function. This is probably not a big constraint on a dual cpu
system and a process running on few threads. But the more parallel
processing you get, the higher is the toll you have to pay for
synchronizing activities. One relatively cheap way around this is a
sequence counter. But it requires that the logging function is executed
in shared memory and, again, you get a bottleneck if there are enough
competing processes. The idea of the sequence counter is that you do an
atomic increment on a sufficiently large integer and include its value
with the log messages (thus there is such a field in the upcomig syslog
protocol RFC). That does not remove the thread scheduling issues. And it
is NOT lightweight if you need to do this on a global system basis
(because you no longer can use a simple atomic add CPU opcode), but is
one thing to approach it.

The bottom line, and here I fully agree, is that reliable log sequence
must place big constraints on any system that tries it. IMHO it is NOT
suited for tomorrows highly-parallel systems. In these environments, we
must have a granular log function, where sequence is preserved only for
a given subsystem (thread).

I was long hesitant to include asynchronous processing in rsyslog
( http://www.rsyslog.com ): in a very obvious way, async processing and,
even more, queuing of in-transit log messages destroys order of events.
Because, in my terms, not all meta-properties are usually perserved.
Over time, however, there is no way to deny using this functionality.
You can't do a high performance log server (or <whatever server>)
without going concurrent these days. So I had to give up trying to keep
(naive) sequence. Now, if rsyslog is configured to use async actions and
queues, users give up even naive sequence. The only thing I currently
can advise is to disable all these features if someone actually needs
this sequence. So far, nobody asked. I know this is not a good enough
sample of the logging world, because enterprises only gradually turn to
rsyslog (and have a couple of different solutions). If the missing
sequence indicator really becomes a problem, I know I can save this by
recording (at least some of) the meta properties. Thus I find it vitally
important to know they exist ;)

> Sequence information IS useful for gap detection and for a rough idea
> of ordering. I'm not sure that high-precision timestamping on logs is
> very useful except perhaps on single-processor machines with
> cooperative rather than pre-emptive multitasking OS, or on real-time
> OS.

It reduces the magnitude of the problem, but, as said above, does not
solve it. One may just think about the fact that any real high precision
timestamp (at least on the usual hardware) is of course an approximation
in itself...
>
> I do think that adding a "collection timestamp" at a log collection
> server is very useful as this solves almost all the problems of
> timestamp normalization, and it doesn't make ordering any worse than
> it already is.

I agree, but would like to note that its utility is sharply reduced in
relay chains, especially with occasionally disconnected system in them.
I remember doing log collection for ships, which had connectivity only
at certain spots but most of the time were disconnected - obviously,
queueing plus "sufficiently" reliable and precise timestamps were a good
solution in that case. Meta-property recording would have been another.

But, in most use cases, a collection timestamp (being itself
approximate ;)) is good enough.

Rainer

>
> Eric
>
>
> ______________________________________________________________________
> From: loganalysis-bounces (at) loganalysis (dot) org [email concealed]
> [loganalysis-bounces (at) loganalysis (dot) org [email concealed]] On Behalf Of Rainer Gerhards
> [rgerhards (at) hq.adiscon (dot) com [email concealed]]
> Sent: Tuesday, July 29, 2008 8:31 AM
> To: bschnzl (at) cotse (dot) net [email concealed]
> Cc: loganalysis (at) loganalysis (dot) org [email concealed]
> Subject: RE: [logs] How to define Log, Event, and Alert?
>
>
>
> <inline...>
>
> On Fri, 2008-07-25 at 22:48 -0400, Bill Scherr IV wrote:
> > Wow! Comments interlaced!
> >
> > Circa 11:23, 25 Jul 2008, a note, claiming source Rainer Gerhards
> <rgerhards (at) hq.adiscon (dot) com [email concealed]>, was sent to me:
> >
> > Subject: RE: [logs] How to define Log, Event, and
> Alert?
> > Date sent: Fri, 25 Jul 2008 11:23:40 +0200
> > From: "Rainer Gerhards" <rgerhards (at) hq.adiscon (dot) com [email concealed]>
> > To: "Anton Chuvakin" <anton (at) chuvakin (dot) org [email concealed]>,
> <bschnzl (at) cotse (dot) net [email concealed]>
> > Copies to: <loganalysis (at) loganalysis (dot) org [email concealed]>
> >
> > > If I may generalize things a bit...
> > >
> > > I'd replace TIME by "sequence identifier", with sequence
> identifier
> > > defined as being something that is monotonically increasing. A
> timestamp
> > > is an object that we think to be a natural example of a
> monotonically
> > > increasing function. HOWEVER, if we look at existing technology,
> this is
> > > not always the case. In fact, it is more often NOT the case than
> it
> > > is... If we have two systems a and b and these systems do not have
> time
> > > synchronized, and have a system c which is the event collector
> (and
> > > collects only events from a and b), then c may record time stamps
> inside
> > > its log that do not monotonically increase. For example, it may
> record:
> > >
> > > 02:00:00 event a1
> > > 01:00:00 event b1
> > > 02:01:00 event a2
> > > 01:01:00 event b2
> > >
> > If we keep the original log intact, the out of sync timestamps are
> information in themselves.
>
> My point was just in relation to that log. You even do not know - with
> only the information you see - if the log is out of sync. For example,
> some time zone may be improperly configured. As many devices do not
> convey timezone info, you need to reset to metadata to do that. So in
> short: the timestamp alone is insufficient.
>
> >
> > > Of course, this is still a TIMED record of occurrences. However,
> in this
> > > sense this is used, "TIMED" includes a sense of temporal order (at
> least
> > > to me). In the above log, we may not have the correct temporal
> order.
> > > We may be able to reconstruct it by sorting on the timestamp. That
> would
> > > be a valid approach if the timestamps are indeed correct (compared
> to
> > > universal time). But if a and/or b has incorrect time, we would
> create a
> > > wrong temporal order. Indeed, in this sense the monotonically
> increasing
> > > identity of the log in question would actually not be the
> timestamp but
> > > rather the *sequence of recording*, kind of a meta-property not
> directly
> > > contained in the property set of the individual event record (but
> rather
> > > obtained by its relationship to its predecessor in the log file).
> > >
> > If a file is deleted before it is created, then we locate the shift
> in timestamps. Different context has different meaning. For
> > instance, a timestamp shift coincident with the file deletion says
> real badness, while a timeshift occuring an odd time before
> > (3hrs, 24mins, 5 secs) may just indicate that an admin noticed that
> ntpd had died. Rather than try to sync the time in the
> > logs, what can we do with variations? Automatically?
>
> I am not talking about a shift in timestamp. I am talking about
> insufficient resolution. The system I describe is working fine, but it
> emits only second-resolution timestamps. For a file delete and file
> create, you can not differentiate any sequence without resorting to
> meta-information (like sequence of records inside the event log).
>
> >
> > > Now let's assume a log without a timestamp. These things happens,
> e.g.
> > > in debug logs (and all too often in others I have seen).
> > >
> > > If we define
> > >
> > > > Log = a TIMED record of the above occurence.
> > >
> > > such a "log" would obviously not be a log, because it does not
> fulfill
> > > the requirement to be timed.
> > >
> > > If a log instead is "a record of events with a sequence
> identifier",
> > > that problem does not exist. The sequence identifier in that case
> would
> > > be the derived property I mentioned above.
> > >
> > The sequence is the order in which it appears in the log, plus or
> minus network latency. (wait for it)
>
> What happens if the logs were queued for some time at an interim
> system?
> Again, you need to have the meta-information, just the timestamp is
> insufficient.
>
> >
> > > The question remains if such a definition is actually useful. The
> > > sequence identifier is obviously something with very vague
> semantics.
> > > They depend on the observer as well as the correctness of the
> "sequence
> > > identifier generating function" on all systems in question.
> > >
> > > Let's get back to the simple case of timestamps: as outlined
> above, the
> > > semantics of timestamps depend on time sync. Even if there is ntp
> > > timesync, timestamps (with reasonable precision) are always
> > > questionable. They are approximate, even on the same system. With
> > > standard syslog timestamps (second precision!) the problem is easy
> to
> > > see: one may receive hundreds of events within the same second. So
> even
> > > if time is correct, an observer is unable to detect any order of
> events.
> > > If looking just at the timestamps, one must conclude that all
> events
> > > happened at once. If looking at the semantics of the messages, one
> most
> > > often also can conclude this is impossible (e.g. how to delete a
> file
> > > before it is created?). Obviously, the timestamp alone is never
> > > sufficient to detect order of events, even on a single system.
> Granted,
> > > for practical purposes a high resolution timestamp (with good time
> > > synchronization) is most often a sufficiently well approximation
> of the
> > > time an event happened. But do you really trust it? ...always?
> Have a
> > > look at your own correlation engines: do they work on pure
> timestamps -
> > > or do they include some other properties, like the order of event
> log
> > > records inside the log?
> > >
> > Why are you looking at logs? Is the machine broke? There should be
> plenty of context to render a correct determination. Is
> > the machine compromised? Then you must find an independent log that
> says roughly the same thing.
>
> Of course. But, and this is my point, you can not just rely on the
> timestamp (as you say).
>
> >
> > > Now let me try to define what I think a log actually is:
> > >
> > > An EVENT is a set of properties that describe a state change (in
> the
> > > sense I have described state change yesterday). The contents of
> this set
> > > is depending on the entity who's state changes as well as on the
> > > observer. [so it may actually be a set of two sets: entity-related
> > > properties and observer-related properties]
> > >
> > > An event is generated when a state changes.
> > >
> > "when" being the temporal aspect, recorded from the view of the
> program sensing the state change.
>
> I fully agree. IMO a "state change" is also a temporal aspect. I think
> "state *change*" implies that something has *now* state y while it had
> state x a (maybe very) short while ago. So I think there is always a
> temporal component inside a state change.
>
> <snip>
>
> >
> > > There is no inherent order inside event logs. In practice we
> experience
> > > a "natural order" (one record begins before another), but that
> actually
> > > is a meta-property of the event record. So we can order event
> records
> > > based on their meta-properties. It just happens that a text log is
> > > physically ordered by the sequence meta property.
> > >
>
> <snip>
>
> > > >
> > > > On Thu, Jul 24, 2008 at 7:23 PM, Bill Scherr IV
> <bschnzl (at) cotse (dot) net [email concealed]>
> > > > wrote:
> > > > > So...
> > > > >
> > > > > I gather a temporal mention to be appropriate beyond the
> definition
> > > > of the Log. Also, most systems break off their logs by
> > > > > size, not time. Although there is a definite time to each
> log, they
> > > > are not consistent, even with the same log gatherer. Right or
> > > > > wrong, that is how I find them. Suggestions below (if I may
> be so
> > > > bold):
> > > > >
> > > > > Circa 11:26, 23 Jul 2008, a note, claiming source Heinbockel,
> Bill
> > > > <heinbockel (at) mitre (dot) org [email concealed]>, was sent to me:
> > > > >
> > > > > From: "Heinbockel, Bill"
> <heinbockel (at) mitre (dot) org [email concealed]>
> > > > > To: <loganalysis (at) loganalysis (dot) org [email concealed]>
> > > > > Subject: [logs] How to define Log, Event, and
> Alert?
> > > > >
> > > > >>
> > > > >>
> > > > >> Here is our initial shot at defining these terms:
> > > > >>
> > > > >>
> > > > >> Event:
> > > > >> A discrete, distinct, and discernible state change in
> an
> > > > >> environment.
> > > > >
> > > > > A discrete, distinct, and discernible state change in an
> environment
> > > > at a recorded (or given) time.
> > > > >>
>
_______________________________________________
LogAnalysis mailing list
LogAnalysis (at) loganalysis (dot) org [email concealed]
http://www.loganalysis.org/mailman/listinfo/loganalysis

[ reply ]


 

Privacy Statement
Copyright 2010, SecurityFocus