Re: Audit logging to tables.

Joshua McKenzie Fri, 01 Mar 2019 09:38:49 -0800

Is there a world in which a general purpose, side-channel file storage
format for transient things like this (hints, batches, audit logs, etc)
could be useful as a first class citizen in the codebase? i.e. a world in
which we refactored some of the hints-specific reader/writer code to be
used for things like this if/when they come up?
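
To make the question concrete, here is a minimal sketch of what a shared side-channel record file could look like. It is not Cassandra's hints format or reader/writer code: the length-plus-CRC32 record layout and the names `append_record` and `read_records` are assumptions made purely for illustration.

```python
import struct
import zlib
from pathlib import Path
from typing import Iterator

# Hypothetical record layout: 4-byte payload length, 4-byte CRC32, then payload.
_HEADER = struct.Struct(">II")


def append_record(path: Path, payload: bytes) -> None:
    """Append one opaque record (a hint, a batch, an audit entry) to the file."""
    with path.open("ab") as f:
        f.write(_HEADER.pack(len(payload), zlib.crc32(payload)))
        f.write(payload)


def read_records(path: Path) -> Iterator[bytes]:
    """Replay records in order, stopping at the first torn or corrupt one."""
    with path.open("rb") as f:
        while True:
            header = f.read(_HEADER.size)
            if len(header) < _HEADER.size:
                return  # clean end of file, or a truncated header
            length, crc = _HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                return  # partial or corrupt tail: stop replay here
            yield payload
```

The payload is opaque bytes, so any of the transient use cases could share the same writer and reader, and the write path never touches the regular table, flush, or compaction machinery.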


On Thu, Feb 28, 2019 at 12:04 PM Jonathan Haddad <[email protected]> wrote:

> Agreed with Dinesh and Josh.  I would *never* put the audit log back in
> Cassandra.
>
> This is extendable, Sagar, so you're free to do as you want, but I'm very
> opposed to putting a ticking time bomb in Cassandra proper.
>
> Jon
>
>
> On Thu, Feb 28, 2019 at 8:38 AM Dinesh Joshi <[email protected]>
> wrote:
>
> > I strongly echo Josh’s sentiment. Imagine losing audit entries because C*
> > is overloaded? It’s fine if you don’t care about losing audit entries.
> >
> > Dinesh
> >
> > > On Feb 28, 2019, at 6:41 AM, Joshua McKenzie <[email protected]> wrote:
> > >
> > > > One of the things we've run into historically, on a *lot* of axes, is that
> > > > "just put it in C*" for various functionality looks great from a user and
> > > > usability perspective, and proves to be something of a nightmare from an
> > > > admin / cluster behavior perspective.
> > >
> > > > i.e. - cluster suffering so you're writing hints? Write them to C* tables
> > > > and watch the cluster suffer more! :)
> > > > Same thing probably holds true for audit logging - at a time frame when
> > > > things are getting hairy w/a cluster, if you're writing that audit logging
> > > > into C* proper (and dealing with ser/deser, compaction pressure, flushing
> > > > pressure, etc) from that, there's a compounding effect of pressure and pain
> > > > on the cluster.
> > >
> > > > So the TL;DR we as a project kind of philosophically have been moving
> > > > towards (I think that's valid to say?) is: use C* for the things it's
> > > > absolutely great at, and try to side-channel other recovery operations as
> > > > much as you can (see: file-based hints) to stay out of its way.
> > >
> > > > Same thing held true w/design of CDC - I debated "materialize in memory for
> > > > consumer to take over socket", and "keep the data in another C* table", but
> > > > the ramifications to perf and core I/O operations in C* the moment things
> > > > start to go badly were significant enough that the route we went was "do no
> > > > harm". For better or for worse, as there's obvious tradeoffs there.
> > >
> > > >> On Thu, Feb 28, 2019 at 7:46 AM Sagar <[email protected]> wrote:
> > >>
> > >> Thanks all for the pointers.
> > >>
> > >> @Joseph,
> > >>
> > > >> I have gone through the links shared by you. Also, I have been looking at
> > > >> the code base.
> > >>
> > > >> I understand that pushing the logs to ES or Solr is a lot easier to do.
> > > >> Having said that, the reason I thought something like this might help is
> > > >> that it avoids adding more pieces while still providing a central,
> > > >> queryable audit log within Cassandra itself.
> > >>
> > > >> In terms of use cases, one of them could definitely be CDC-related.
> > > >> With data being stored in tables and being queryable, it becomes a lot
> > > >> easier to expose this data to external systems like Kafka Connect or
> > > >> Debezium, which have the ability to push data to Kafka, for example. Note
> > > >> that pushing data to Kafka is just an example; what I mean is, if we
> > > >> can have data in tables, then instead of everyone writing custom
> > > >> loggers, they can hook into this table info and take action.
> > >>
> > > >> Regarding the infinite loop question, I have done some analysis, and in my
> > > >> opinion, instead of tweaking the behaviour of Binlog and the way it
> > > >> functions currently, we can actually spin up another tailer thread to the
> > > >> same Chronicle Queue which can do the needful. This way the config options
> > > >> etc. all remain the same (apart from the logger, of course).
> > >>
> > >> Let me know if any of it makes sense :D
> > >>
> > >> Thanks!
> > >> Sagar.
> > >>
> > >>
> > > >> On Thu, Feb 28, 2019 at 1:09 AM Dinesh Joshi <[email protected]> wrote:
> > >>
> > >>>
> > >>>
> > > >>>> On Feb 27, 2019, at 10:41 AM, Joseph Lynch <[email protected]> wrote:
> > >>>>
> > > >>>> Vinay can confirm, but as far as I am aware we have no current plans to
> > > >>>> implement audit logging to a table directly, but the implementation is
> > > >>>> fully pluggable (like compaction, compression, etc ...). Check out the
> > > >>>> blog post [1] and documentation [2] Vinay wrote for more details, but the
> > > >>>> short
> > >>>
> > > >>> +1. I am still curious as to why you'd want to store audit log entries
> > > >>> back in Cassandra? Depending on the scale it can generate a lot of load and
> > > >>> I think you'd end up in an infinite loop because as you're inserting the
> > > >>> audit log entry you'll generate a new one and so on unless you blacklist
> > > >>> audits to that table / keyspace.
> > >>>
> > > >>> Ideally you'd insert this data into ElasticSearch / Solr or some other
> > > >>> place that can be then used for analytics or search.
> > >>>
> > >>> Dinesh
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: [email protected]
> > >>> For additional commands, e-mail: [email protected]
> > >>>
> > >>>
> > >>
> >
> >
> >
> >
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
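
The infinite-loop concern raised in the thread (storing an audit entry in a table is itself a write that would be audited) can be sketched as follows. This is an illustration only, not Cassandra's audit logger API: the class `TableAuditSink`, the keyspace name `audit_ks`, and the in-memory `rows` list standing in for a table are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TableAuditSink:
    """Toy audit sink that 'stores' entries in a list standing in for a table."""
    audit_keyspace: str = "audit_ks"  # hypothetical keyspace holding audit rows
    rows: List[Tuple[str, str]] = field(default_factory=list)

    def on_statement(self, keyspace: str, statement: str) -> bool:
        # Guard: never audit writes to the audit keyspace itself.
        if keyspace == self.audit_keyspace:
            return False
        self.rows.append((keyspace, statement))
        # Persisting the entry is itself a statement against the audit
        # keyspace; without the guard above this call would recurse forever.
        self.on_statement(self.audit_keyspace, "INSERT audit row")
        return True
```

With the guard in place, one application statement produces exactly one audit row. The guard only breaks the loop; it does nothing about the extra write load that the thread also warns about.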
