Iterate over all of the possible time buckets.
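Something like this, for example (an untested sketch in Python with the DataStax driver; the events table, keyspace name, and bucket format are assumptions based on Jeff's day-of-year example below):

    from datetime import timedelta
    from cassandra.cluster import Cluster

    # Assumed schema, per the bucketing discussed below:
    #   CREATE TABLE events (
    #       app_id text,
    #       day_bucket int,          -- e.g. 2019032 = year 2019, day 032
    #       event_time timeuuid,
    #       payload text,
    #       PRIMARY KEY ((app_id, day_bucket), event_time)
    #   );

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")
    by_bucket = session.prepare(
        "SELECT * FROM events WHERE app_id = ? AND day_bucket = ?")

    def all_events(app_id, start_day, end_day):
        """Walk every day bucket from newest to oldest for one app_id."""
        day = end_day
        while day >= start_day:
            bucket = int(day.strftime("%Y%j"))   # 2019-02-01 -> 2019032
            for row in session.execute(by_bucket, (app_id, bucket)):
                yield row
            day -= timedelta(days=1)

The "all events for app_id" query then becomes a loop over a bounded, enumerable set of partitions rather than one read of a giant partition.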
On Fri, Feb 1, 2019 at 1:36 PM Carl Mueller <carl.muel...@smartthings.com.invalid> wrote:

> I'd still need an "all events for app_id" query. We have seconds-level events :-(
>
> On Fri, Feb 1, 2019 at 3:02 PM Jeff Jirsa <jji...@gmail.com> wrote:
>
> > On Fri, Feb 1, 2019 at 12:58 PM Carl Mueller <carl.muel...@smartthings.com.invalid> wrote:
> >
> > > Jeff: so the partition key with timestamp would then need a separate index table to track the app_id -> partition keys. Which isn't horrible, but it also ties into another desire of mine: some way to make the replica mapping match locally between the index table and the data table.
> > >
> > > So in the composite partition key for the TWCS table, you'd have app_id + timestamp, BUT ONLY THE app_id GENERATES the hash/key.
> >
> > Huh? No, you'd have a composite partition key of app_id + timestamp ROUNDED/CEIL/FLOORed to some time window, and both would be used for the hash/key.
> >
> > And you don't need any extra table, because app_id is known and the timestamp can be calculated (e.g., 4 digits of year + 3 digits for day of year makes today 2019032).
> >
> > > Thus it would match with the index table that is just partition key app_id, column key timestamp.
> > >
> > > And then theoretically a node-local "join" could be done without an additional query hop, and batched updates would be more easily atomic to a single node.
> > >
> > > Now how we would communicate all that in CQL/etc: who knows. Hm. Maybe materialized views cover this, but I haven't tracked that since we don't have versions that support them and they got "deprecated".
> > >
> > > On Fri, Feb 1, 2019 at 2:53 PM Carl Mueller <carl.muel...@smartthings.com> wrote:
> > >
> > > > Interesting. Now that we have semiautomated upgrades, we are hopefully going to get everything to 3.11.x once we get the intermediate hop to 2.2.
> > > >
> > > > I'm thinking we could also use sstable metadata markings + custom compactors for things like multiple customers on the same table. You could sequester a customer's data in their own sstables, and then queries could effectively be subdivided against only the sstables that have that customer. Maybe the min and max would cover that; I'd have to look at the details.
> > > >
> > > > On Thu, Jan 31, 2019 at 8:11 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
> > > >
> > > > > In addition to what Jeff mentioned, there was an optimization in 3.4 that can significantly reduce the number of sstables accessed when a LIMIT clause is used. This can be a pretty big win with TWCS.
> > > > >
> > > > > http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
> > > > >
> > > > > On Thu, Jan 31, 2019 at 5:50 PM Jeff Jirsa <jji...@gmail.com> wrote:
> > > > >
> > > > > > In my original TWCS talk a few years back, I suggested that people make the partitions match the time window to avoid exactly what you're describing. I added that to the talk because my first team that used TWCS (the team for which I built TWCS) had a data model not unlike yours, and the read-every-sstable thing turns out not to work that well if you have lots of windows (or very large partitions).
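(Jeff's ROUNDED/CEIL/FLOOR bucket above, concretely - a sketch where the 12-hour window size is my assumption, chosen to match the sstable blocks mentioned further down:)

    from datetime import datetime, timezone

    WINDOW_SECONDS = 12 * 3600   # assumed: must match the TWCS window

    def window_bucket(ts: datetime) -> int:
        """FLOOR a timestamp to the start of its time window, in epoch
        seconds. Writers and readers both derive the bucket from the
        event timestamp, so no app_id -> partition-key index table is
        needed."""
        epoch = int(ts.timestamp())
        return epoch - (epoch % WINDOW_SECONDS)

    # window_bucket(datetime(2019, 2, 1, 13, 36, tzinfo=timezone.utc))
    # -> the epoch second of 2019-02-01 12:00 UTC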
> > > > > > If you do this, you can fan out a bunch of async reads for the first few days and ask for more as you need to fill the page - this means the reads are more distributed, too, which is an extra bonus when you have noisy partitions.
> > > > > >
> > > > > > In 3.0 and newer (I think, don't quote me on the specific version), the sstable metadata has the min and max clustering, which helps exclude sstables from the read path quite well if everything in the table is using timestamp clustering columns. I know there was some issue with this and RTs recently, so I'm not sure if it's current state, but it's worth considering that this may be much better on 3.0+.
> > > > > >
> > > > > > --
> > > > > > Jeff Jirsa
> > > > > >
> > > > > > On Jan 31, 2019, at 1:56 PM, Carl Mueller <carl.muel...@smartthings.com.invalid> wrote:
> > > > > >
> > > > > > > Situation:
> > > > > > >
> > > > > > > We use TWCS for a task history table (partition is user, column key is timeuuid of task; TWCS is used due to tombstone TTLs that rotate out the tasks every, say, month).
> > > > > > >
> > > > > > > However, sometimes we want to get a "slice" of tasks (say, tasks in the last two days, while we are using TWCS sstable blocks of 12 hours).
> > > > > > >
> > > > > > > The problem is, this is a frequent user, and they have tasks in ALL the sstables that TWCS has organized into time buckets.
> > > > > > >
> > > > > > > So Cassandra has to first read in, say, 80 sstables to reconstruct the row, and only THEN can it exclude/slice on the column key.
> > > > > > >
> > > > > > > Question:
> > > > > > >
> > > > > > > Or am I wrong that the read path needs to grab all relevant sstables before applying column key slicing, and this is already possible? Admittedly we are on 2.1 for this table (we're in the process of upgrading now that we have an automated upgrade program that seems to work pretty well).
> > > > > > >
> > > > > > > If my assumption is correct, then the compaction strategy knows, as it writes the sstables, what it is bucketing them as (and could encode that in the sstable metadata?). If slicing really does require reconstructing the whole row, and we had a perfect infinite-monkey coding team that could generate whatever we wanted within some feasibility, could we provide special hooks to do sstable exclusion based on metadata, where the metadata indicates inclusion/exclusion of column ranges?
> > > > > > >
> > > > > > > Goal:
> > > > > > >
> > > > > > > The overall goal would be to support exclusion of sstables from a read path, in case we had compaction strategies hand-tailored for other queries. Essentially we would be doing a first-pass bucket-sort exclusion, with the sstable metadata marking the buckets.
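(Jeff's fan-out above might look roughly like this - a sketch reusing the session and prepared statement from the first snippet; the bucket list and page size are the caller's:)

    def fan_out_page(app_id, buckets, page_size):
        """Issue async reads for several time buckets at once, then
        fill one page from the results, newest bucket first."""
        futures = [(b, session.execute_async(by_bucket, (app_id, b)))
                   for b in buckets]
        rows = []
        for bucket, future in futures:
            rows.extend(future.result())   # block on this bucket's read
            if len(rows) >= page_size:
                break                      # page is full; stop early
        return rows[:page_size]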
> > > > > > > This might aid support of superwide rows and paging through column keys, if we allowed the table creator to specify bucketing as flushing occurs. In general, query performance appears to degrade quickly with the number of sstables required for a lookup.
> > > > > > >
> > > > > > > I still don't know the code nearly well enough to do patches, but from my look at custom compaction strategies and the basic read path, it seems this would be a useful extension for advanced users.
> > > > > > >
> > > > > > > The fallback would be a set of tables to serve as buckets, and we span the buckets with queries when one bucket runs out. The tables rotate.
> > > > >
> > > > > --
> > > > > Jon Haddad
> > > > > http://www.rustyrazorblade.com
> > > > > twitter: rustyrazorblade
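For reference, the fallback Carl describes at the end (rotating bucket tables spanned by queries) might look roughly like this - the per-month table naming and the three-month span are invented for illustration:

    def month_tables(prefix, now, months):
        """Yield hypothetical per-month table names, newest first,
        e.g. tasks_2019_02, tasks_2019_01, ..."""
        year, month = now.year, now.month
        for _ in range(months):
            yield f"{prefix}_{year:04d}_{month:02d}"
            month -= 1
            if month == 0:
                year, month = year - 1, 12

    def span_tasks(user_id, now, limit, months=3):
        """Query each bucket table in turn until the page is full."""
        rows = []
        for table in month_tables("tasks", now, months):
            cql = f"SELECT * FROM {table} WHERE user_id = %s LIMIT %s"
            rows.extend(session.execute(cql, (user_id, limit - len(rows))))
            if len(rows) >= limit:
                break
        return rows

Presumably dropping whole tables as they age out would then replace tombstone-based expiry, at the cost of client-side query spanning like the above.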