Re: Getting partition min/max timestamp

Benedict Elliott Smith Sun, 14 Jan 2018 08:12:06 -0800

We already store the minimum timestamp in the EncodingStats of each
partition, to support more efficient encoding of atom timestamps.  This
just isn't exposed beyond UnfilteredRowIterator, though it probably could
be.


Storing the max alongside would still require justification, though its
cost would actually be fairly nominal (probably only a few bytes; it
depends on how far apart min/max are).
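
To make "a few bytes" concrete, here is a rough Python sketch. It assumes
the max would be written as a variable-length integer delta from the min,
at 7 payload bits per byte, with timestamps in microseconds. That layout and
the helper name varint_size are illustrative assumptions, not Cassandra's
actual serialization format.

```python
def varint_size(delta: int) -> int:
    """Bytes needed for a non-negative integer at 7 payload bits per byte.

    Assumption: a generic base-128 varint, not Cassandra's exact encoding.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    # Even a zero delta occupies one byte.
    return max(1, (delta.bit_length() + 6) // 7)


# Timestamps are assumed to be in microseconds.
MICROS_PER_SECOND = 1_000_000

for label, seconds in [("identical", 0), ("1 second", 1), ("1 day", 86_400)]:
    delta = seconds * MICROS_PER_SECOND
    print(f"min/max {label} apart: {varint_size(delta)} byte(s)")
```

Under these assumptions the extra cost grows slowly with the spread: one
byte when min and max coincide, three bytes for a one-second spread, and six
bytes for a one-day spread.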

IMO, even a fairly nominal cost would be hard to justify without
widespread benefit, and I'm not sure this would provide that.  Maintaining
a patched variant of your own that stores the max probably wouldn't be too
hard, though.

In the meantime, exposing and utilising the minimum timestamp from
EncodingStats is probably a good place to start to explore the viability of
the approach.

On 14 January 2018 at 15:34, Jeremiah Jordan <[email protected]> wrote:

> Don’t forget about deleted and missing data. The bane of all on-replica
> aggregation optimizations.
>
> > On Jan 14, 2018, at 12:07 AM, Jeff Jirsa <[email protected]> wrote:
> >
> >
> > You’re right, it’s not stored in metadata now. Adding this to metadata
> > isn’t hard; what’s hard is doing it in a way that’s useful to people
> > with other data models (besides yours), so that it can make it upstream
> > (if that’s your goal). In particular, the worst possible case is a table
> > with no clustering key and a single non-partition-key column. In that
> > case, storing these two extra long timestamps may take 2-3x more storage
> > than without, which would be a huge regression, so you’d need a way to
> > turn the feature off.
> >
> >
> > Worth mentioning that there are ways to do this without altering
> > Cassandra - consider using static columns that represent the min
> > timestamp and max timestamp. Create them both as ints or longs and write
> > them on all inserts/updates (as part of a batch, if needed). The only
> > thing you’ll have to do is find a way for “min timestamp” to work - you
> > can set the min timestamp column with an explicit “using timestamp”
> > timestamp = 2^31-NOW, so that future writes won’t overwrite those
> > values. That gives you first-write-wins behavior for that column, which
> > gives you an effective min timestamp for the partition as a whole.
> >
> > --
> > Jeff Jirsa
> >
> >
> >> On Jan 13, 2018, at 4:58 AM, Arthur Kushka <[email protected]> wrote:
> >>
> >> Hi folks,
> >>
> >> I am currently working on a custom CQL operator that should return the
> >> max timestamp for a given partition.
> >>
> >> I don't think that scanning the partition for that kind of data is a
> >> good idea. Instead, I am thinking about adding metadata to the
> >> partition. I want to store minTimestamp and maxTimestamp for every
> >> partition, as is already done in Memtables. Those timestamps would be
> >> updated on each mutation, which is quite cheap compared to a full scan.
> >>
> >> I am quite new to the Cassandra codebase and would like some criticism
> >> and ideas; maybe that kind of data is already stored somewhere, or you
> >> have better ideas. Is my assumption right?
> >>
> >> Best,
> >> Artur
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
>
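
The static-column suggestion in Jeff Jirsa's message quoted above relies on
last-write-wins reconciliation: if the min column is written with an
inverted timestamp, earlier writes carry larger timestamps and later writes
can never replace them. Below is a minimal Python simulation of that idea,
not Cassandra code. LwwCell, record_write and CEILING are hypothetical
names, and NOW is assumed to be in seconds so that 2^31-NOW stays positive.

```python
from typing import Optional, Tuple

# Taken from the quoted formula; assumes `now` is in seconds and below 2**31.
CEILING = 2**31


class LwwCell:
    """Toy last-write-wins cell: the write with the highest timestamp survives."""

    def __init__(self) -> None:
        self._state: Optional[Tuple[int, int]] = None  # (write_timestamp, value)

    def write(self, value: int, write_timestamp: int) -> None:
        # Keep the incoming write only if it is strictly newer than what we hold.
        if self._state is None or write_timestamp > self._state[0]:
            self._state = (write_timestamp, value)

    def read(self) -> Optional[int]:
        return None if self._state is None else self._state[1]


def record_write(min_cell: LwwCell, max_cell: LwwCell, now: int) -> None:
    # Inverted timestamp: earlier writes get larger timestamps, so the first wins.
    min_cell.write(now, CEILING - now)
    # Normal timestamp: the most recent write wins.
    max_cell.write(now, now)


min_cell, max_cell = LwwCell(), LwwCell()
for now in (300, 100, 200):  # arrival order does not matter
    record_write(min_cell, max_cell, now)

print(min_cell.read(), max_cell.read())
```

In CQL, the inverted value would be supplied through USING TIMESTAMP on the
write to the min column. Since Cassandra's default write timestamps are in
microseconds, a real deployment would presumably need a larger ceiling than
2^31; the constant here only mirrors the quoted formula.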
