On Fri, Feb 1, 2019 at 12:58 PM Carl Mueller <carl.muel...@smartthings.com.invalid> wrote:
> Jeff: so the partition key with timestamp would then need a separate
> index table to track the appid -> partition keys. Which isn't horrible,
> but it also ties into another desire of mine: some way to make the
> replica mapping match locally between the index table and the data
> table:
>
> So in the composite partition key for the TWCS table, you'd have app_id
> + timestamp, BUT ONLY THE app_id GENERATES the hash/key.

Huh? No, you'd have a composite partition key of app_id + timestamp
ROUNDED/CEIL/FLOOR to some time window, and both would be used for the
hash/key. And you don't need any extra table, because app_id is known
and the timestamp can be calculated (e.g., 4 digits of year + 3 digits
for day of year makes today 2019032).
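Concretely, something like this (table and column names are invented
for illustration, not an agreed schema) - note that both partition key
components feed the token, and the compaction window matches the
bucket:

    -- hypothetical sketch of the bucketed layout
    CREATE TABLE task_history (
        app_id      text,
        time_bucket text,     -- yyyyDDD, e.g. '2019032' for Feb 1, 2019
        task_id     timeuuid,
        task        blob,
        PRIMARY KEY ((app_id, time_bucket), task_id)
    ) WITH CLUSTERING ORDER BY (task_id DESC)
      AND compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '1'
      };

No lookup table is needed: a client can always recompute the bucket
string for any day it wants to read.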
> Thus it would match with the index table that is just partition key
> app_id, column key timestamp.
>
> And then theoretically a node-local "join" could be done without an
> additional query hop, and batched updates would be more easily atomic
> on a single node.
>
> Now how we would communicate all that in CQL/etc.: who knows. Hm.
> Maybe materialized views cover this, but I haven't tracked that since
> we don't have versions that support them and they got "deprecated".
>
> On Fri, Feb 1, 2019 at 2:53 PM Carl Mueller
> <carl.muel...@smartthings.com> wrote:
>
> > Interesting. Now that we have semiautomated upgrades, we are
> > hopefully going to get everything to 3.11.x once we make the
> > intermediate hop to 2.2.
> >
> > I'm thinking we could also use sstable metadata markings + custom
> > compactors for things like multiple customers on the same table. So
> > you could sequester the data for a customer in their own sstables,
> > and then queries could effectively be subdivided against only the
> > sstables that had that customer. Maybe the min and max would cover
> > that; I'd have to look at the details.
> >
> > On Thu, Jan 31, 2019 at 8:11 PM Jonathan Haddad <j...@jonhaddad.com>
> > wrote:
> >
> >> In addition to what Jeff mentioned, there was an optimization in
> >> 3.4 that can significantly reduce the number of sstables accessed
> >> when a LIMIT clause is used. This can be a pretty big win with
> >> TWCS.
> >>
> >> http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
> >>
> >> On Thu, Jan 31, 2019 at 5:50 PM Jeff Jirsa <jji...@gmail.com> wrote:
> >>
> >> > In my original TWCS talk a few years back, I suggested that
> >> > people make the partitions match the time window to avoid exactly
> >> > what you're describing. I added that to the talk because my first
> >> > team that used TWCS (the team for which I built TWCS) had a data
> >> > model not unlike yours, and the read-every-sstable thing turns
> >> > out not to work that well if you have lots of windows (or very
> >> > large partitions). If you do this, you can fan out a bunch of
> >> > async reads for the first few days and ask for more as you need
> >> > to fill the page - this means the reads are more distributed,
> >> > too, which is an extra bonus when you have noisy partitions.
> >> >
> >> > In 3.0 and newer (I think, don't quote me on the specific
> >> > version), the sstable metadata has the min and max clustering,
> >> > which helps exclude sstables from the read path quite well if
> >> > everything in the table is using timestamp clustering columns. I
> >> > know there was some issue with this and RTs (range tombstones)
> >> > recently, so I'm not sure of the current state, but it's worth
> >> > considering that this may be much better on 3.0+.
> >> >
> >> > --
> >> > Jeff Jirsa
> >> >
> >> > > On Jan 31, 2019, at 1:56 PM, Carl Mueller
> >> > > <carl.muel...@smartthings.com.invalid> wrote:
> >> > >
> >> > > Situation:
> >> > >
> >> > > We use TWCS for a task history table (the partition is the
> >> > > user, the column key is the timeuuid of the task; TWCS is used
> >> > > because tombstone TTLs rotate out the tasks every month or so).
> >> > >
> >> > > However, we may want to get a "slice" of tasks (say, tasks in
> >> > > the last two days, while we are using TWCS sstable blocks of 12
> >> > > hours).
> >> > >
> >> > > The problem is, this is a frequent user, and they have tasks in
> >> > > ALL the sstables that TWCS has organized into time buckets.
> >> > >
> >> > > So Cassandra has to first read in, say, 80 sstables to
> >> > > reconstruct the row; THEN it can exclude/slice on the column
> >> > > key.
> >> > >
> >> > > Question:
> >> > >
> >> > > Or am I wrong that the read path needs to grab all relevant
> >> > > sstables before applying column key slicing, and this is
> >> > > already possible? Admittedly we are on 2.1 for this table
> >> > > (we're in the process of upgrading now that we have an
> >> > > automated upgrade program that seems to work pretty well).
> >> > >
> >> > > If my assumption is correct, then the compaction strategy
> >> > > knows, as it writes the sstables, what it is bucketing them as
> >> > > (and could encode that in the sstable metadata?). If the whole
> >> > > row really does need reconstruction, and we had a perfect
> >> > > infinite-monkey coding team that could generate whatever we
> >> > > wanted within some bounds of feasibility, could we provide
> >> > > special hooks to do sstable exclusion based on metadata, if we
> >> > > know that the metadata will indicate exclusion/inclusion of
> >> > > columns?
> >> > >
> >> > > Goal:
> >> > >
> >> > > The overall goal would be to support exclusion of sstables from
> >> > > a read path, in case we had compaction strategies hand-tailored
> >> > > for other queries. Essentially we would be doing a first-pass
> >> > > bucket-sort exclusion, with the sstable metadata marking the
> >> > > buckets. This might aid support of superwide rows and paging
> >> > > through column keys, if we allowed the table creator to specify
> >> > > bucketing as flushing occurs. In general it appears query
> >> > > performance quickly degrades based on the number of sstables
> >> > > required for a lookup.
> >> > >
> >> > > I still don't know the code nearly well enough to do patches,
> >> > > but based on my look at custom compaction strategies and the
> >> > > basic read path, it would seem this would be a useful extension
> >> > > for advanced users.
> >> > >
> >> > > The fallback would be a set of tables to serve as buckets, and
> >> > > we span the buckets with queries when one bucket runs out. The
> >> > > tables rotate.
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >> --
> >> Jon Haddad
> >> http://www.rustyrazorblade.com
> >> twitter: rustyrazorblade
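To make the fan-out read Jeff describes concrete, here is a sketch
against the bucketed table above (the bucket values and the page size
of 50 are illustrative, not from the thread): one bounded statement per
time window, newest first, issued concurrently through the driver's
async API and merged client-side.

    -- hypothetical sketch: each query is one partition = one window
    SELECT task_id, task FROM task_history
     WHERE app_id = ? AND time_bucket = '2019032' LIMIT 50;
    SELECT task_id, task FROM task_history
     WHERE app_id = ? AND time_bucket = '2019031' LIMIT 50;
    -- page still not full? fan out to '2019030', '2019029', ...

Because each statement names a full partition key, it touches only the
sstables of its own window, which is the exclusion behavior the thread
is after.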