Is there a world in which a general purpose, side-channel file storage format for transient things like this (hints, batches, audit logs, etc) could be useful as a first class citizen in the codebase? i.e. a world in which we refactored some of the hints-specific reader/writer code to be used for things like this if/when they come up?
On Thu, Feb 28, 2019 at 12:04 PM Jonathan Haddad <j...@jonhaddad.com> wrote: > Agreed with Dinesh and Josh. I would *never* put the audit log back in > Cassandra. > > This is extendable, Sagar, so you're free to do as you want, but I'm very > opposed to putting a ticking time bomb in Cassandra proper. > > Jon > > > On Thu, Feb 28, 2019 at 8:38 AM Dinesh Joshi <djos...@icloud.com.invalid> > wrote: > > > I strongly echo Josh’s sentiment. Imagine losing audit entries because C* > > is overloaded? It’s fine if you don’t care about losing audit entries. > > > > Dinesh > > > > > On Feb 28, 2019, at 6:41 AM, Joshua McKenzie <jmcken...@apache.org> > > wrote: > > > > > > One of the things we've run into historically, on a *lot* of axes, is > > that > > > "just put it in C*" for various functionality looks great from a user > and > > > usability perspective, and proves to be something of a nightmare from > an > > > admin / cluster behavior perspective. > > > > > > i.e. - cluster suffering so you're writing hints? Write them to C* > tables > > > and watch the cluster suffer more! :) > > > Same thing probably holds true for audit logging - at a time frame when > > > things are getting hairy w/a cluster, if you're writing that audit > > logging > > > into C* proper (and dealing with ser/deser, compaction pressure, > flushing > > > pressure, etc) from that, there's a compounding effect of pressure and > > pain > > > on the cluster. > > > > > > So the TL;DR we as a project kind of philosophically have been moving > > > towards (I think that's valid to say?) is: use C* for the things it's > > > absolutely great at, and try to side-channel other recovery operations > as > > > much as you can (see: file-based hints) to stay out of its way. > > > > > > Same thing held true w/design of CDC - I debated "materialize in memory > > for > > > consumer to take over socket", and "keep the data in another C* table", > > but > > > the ramifications to perf and core I/O operations in C* the moment > things > > > start to go badly were significant enough that the route we went was > "do > > no > > > harm". For better or for worse, as there's obvious tradeoffs there. > > > > > >> On Thu, Feb 28, 2019 at 7:46 AM Sagar <sagarmeansoc...@gmail.com> > > wrote: > > >> > > >> Thanks all for the pointers. > > >> > > >> @Joseph, > > >> > > >> I have gone through the links shared by you. Also, I have been looking > > at > > >> the code base. > > >> > > >> I understand the fact that pushing the logs to ES or Solr is a lot > > easier > > >> to do. Having said that, the only reason I thought having something > like > > >> this might help is, if I don't want to add more pieces and still > > provide a > > >> central piece of audit logging within Cassandra itself and still be > > >> queryable. > > >> > > >> In terms of usages, one of them could definitely be CDC related use > > cases. > > >> With data being stored in tables and being queryable, it can become a > > lot > > >> more easier to expose this data to external systems like Kafka > Connect, > > >> Debezium which have the ability to push data to Kafka for example. > Note > > >> that pushing data to Kafka is just an example, but what I mean is, if > we > > >> can have data in tables, then instead of everyone writing custom > custom > > >> loggers, they can hook into this table info and take action. > > >> > > >> Regarding the infinite loop question, I have done some analysis, and > in > > my > > >> opinion, instead of tweaking the behaviour of Binlog and the way it > > >> functions currently, we can actually spin up another tailer thread to > > the > > >> same Chronicle Queue which can do the needful. This way the config > > options > > >> etc all remain the same(apart from the logger ofcourse). > > >> > > >> Let me know if any of it makes sense :D > > >> > > >> Thanks! > > >> Sagar. > > >> > > >> > > >> On Thu, Feb 28, 2019 at 1:09 AM Dinesh Joshi > <djos...@icloud.com.invalid > > > > > >> wrote: > > >> > > >>> > > >>> > > >>>> On Feb 27, 2019, at 10:41 AM, Joseph Lynch <joe.e.ly...@gmail.com> > > >>> wrote: > > >>>> > > >>>> Vinay can confirm, but as far as I am aware we have no current plans > > to > > >>>> implement audit logging to a table directly, but the implementation > is > > >>>> fully pluggable (like compaction, compression, etc ...). Check out > the > > >>> blog > > >>>> post [1] and documentation [2] Vinay wrote for more details, but the > > >>> short > > >>> > > >>> +1. I am still curious as to why you'd want to store audit log > entries > > >>> back in Cassandra? Depending on the scale it can generate a lot of > load > > >> and > > >>> I think you'd end up in an infinite loop because as you're inserting > > the > > >>> audit log entry you'll generate a new one and so on unless you black > > list > > >>> audits to that table / keyspace. > > >>> > > >>> Ideally you'd insert this data into ElasticSearch / Solr or some > other > > >>> place that can be then used for analytics or search. > > >>> > > >>> Dinesh > > >>> --------------------------------------------------------------------- > > >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org > > >>> > > >>> > > >> > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > For additional commands, e-mail: dev-h...@cassandra.apache.org > > > > > > -- > Jon Haddad > http://www.rustyrazorblade.com > twitter: rustyrazorblade >