> Dominik, wasn't the original idea for VAST to provide an event
> description language that would create the link between the values
> coming over the wire and their interpretation? Such a specification
> could be auto-generated from Bro's knowledge about the events it
> generates.
We were actually thinking about auto-generating the schema. But
broker::data simply has no meta information that we could use. Even
distinguishing records/tuples from actual lists is impossible, because
broker::vector is used for both. Of course we can make a couple of
assumptions (that the top-level vector is a record, for example), but
then VAST users can only ever use type queries. In other words, they can
ask for IP addresses, for example, but not specifically for originator
IPs.

In a sense, broker's representation is an inverted JSON: in JSON, we
have field names but no type information (everything is a string),
whereas in broker we have (ambiguous) type information but no field
names. :)

>> Though the Broker data corresponding to log entry content is also
>> opaque at the moment (I recall that was maybe for performance or
>> message volume optimization),
>
> Yeah, but generally this is something I could see opening up. The log
> structure is pretty straight-forward and self-describing, it'd be
> mostly a matter of clean-up and documentation to make that directly
> accessible to external consumers, I think. Events, on the other hand,
> are semantically tied very closely to the scripts generating them, and
> also much more diverse, so that self-description doesn't really seem
> feasible/useful. Republishing a relevant subset certainly sounds
> better for that; or, if it's really a bulk feed that's desired, some
> out-of-band mechanism to convey the schema information somehow.

Opening that up would be great. However, our goal was to have Broker as
a source of structured data that we can import in a generic fashion for
later analysis. That, of course, relies on a standard / convention /
best practice for making the schema programmatically accessible.
Currently, it seems that we need a schema definition provided by the
user offline. This will work as long as all published data for a given
topic is uniform.
Multiplexing multiple event types already makes things complicated, but
it seems like this is actually the standard use case. OSQuery, for
example, will generate different events that we then either need to
separate into different topics or multiplex on a single topic while
merging in some meta information. And once we mix meta information with
actual data, a simple schema definition no longer cuts it. At worst,
importing data from Broker requires a separate parser for each import
format.

> broker/bro.hh is basically all there is right now

I'm a bit hesitant to rely on this header at the moment, because of:

    /// A Bro log-write message. Note that at the moment this should be used only
    /// by Bro itself as the arguments aren't publicly defined.

Is the API stable enough on your end at this point to make it public?

Also, there are LogCreate and LogWrite events. LogCreate has the
`fields_data` (a list of field names?). Does that mean I need to receive
the LogCreate event first in order to understand subsequent LogWrite
events? That would mean I cannot parse logs whose LogCreate event
occurred before I was able to subscribe to the topic.

    Dominik
_______________________________________________
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev