Jeremiah, this might be the exception, since the value being
aggregated is exactly the value that determines the liveness of the
data, and all the more so since the aggregation requested is the *max* of the
timestamp, given that Cassandra is Last-Write-Wins (i.e. it looks at the
maximum timestamp).
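A small sketch of the point above (hypothetical code, not Cassandra's): because Last-Write-Wins reconciliation keeps the cell with the highest timestamp, the max timestamp over the reconciled partition equals the max of each replica's local max, so this particular aggregate commutes with the merge.

```python
# Hypothetical sketch: LWW reconciliation keeps the (value, timestamp)
# pair with the larger timestamp, so max-timestamp aggregation gives the
# same answer whether computed per replica or on the merged partition.

def lww_merge(a, b):
    """Merge two replica views of a partition under Last-Write-Wins."""
    merged = dict(a)
    for key, (value, ts) in b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged

def max_ts(partition):
    return max(ts for _, ts in partition.values())

replica1 = {"row1": ("v1", 10), "row2": ("v2", 30)}
replica2 = {"row1": ("v1b", 25)}  # replica2 never saw row2

merged = lww_merge(replica1, replica2)
# Combining the per-replica aggregates matches aggregating the merge:
assert max(max_ts(replica1), max_ts(replica2)) == max_ts(merged) == 30
```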
It's been a long time since I looked at the code, but I'm pretty sure that
comment is explaining why we translate *no* timestamp to *epoch*, to save
space when serializing the encoding stats, not stipulating that the data
may be inaccurate.
However, being such a long time since I looked, I forgot we s
Finding the max timestamp of a partition is an aggregation. Doing that
calculation purely on the replica (whether pre-calculated or not) is problematic
for any CL > 1 in the face of deletions or updates that are missing, as the
contents of the partition on a given replica are different than what
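The failure mode described above can be illustrated with a hypothetical sketch: a replica that missed a deletion keeps reporting a max timestamp for data that no longer exists, and once the tombstone has been purged elsewhere (after gc_grace), reconciling the per-replica aggregates cannot recover the truth.

```python
# Hypothetical sketch: replica B applied DELETE at ts=20 and has since
# purged the tombstone; replica A never saw the delete at all.

replica_a = {"row1": ("v1", 10)}   # missed the delete entirely
replica_b = {}                     # delete applied, tombstone purged

def local_max_ts(partition):
    return max((ts for _, ts in partition.values()), default=None)

# A coordinator combining the pre-computed per-replica aggregates:
candidates = [m for m in (local_max_ts(replica_a), local_max_ts(replica_b))
              if m is not None]
answer = max(candidates) if candidates else None

# It reports ts=10 for a row that was actually deleted at ts=20 --
# the stale aggregate cannot be repaired by reconciliation.
assert answer == 10
```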
First of all, thanks for all the ideas.
Benedict Elliott Smith, in the code comments I found a notice that data in
EncodingStats can be wrong; I'm not sure it's a good idea to use it for accurate
results. As I understand it, incorrect data is not a problem for its current use
case, but it would be for mine.
(Obviously, not to detract from the points that Jon and Jeremiah make, i.e.
that if TTLs or tombstones are involved the metadata we have, or can add,
is going to be worthless in most cases anyway)
On 14 January 2018 at 16:11, Benedict Elliott Smith
wrote:
> We already store the minimum timestamp
We already store the minimum timestamp in the EncodingStats of each
partition, to support more efficient encoding of atom timestamps. This
just isn't exposed beyond UnfilteredRowIterator, though it probably could
be.
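A simplified, hypothetical sketch of why keeping the minimum timestamp helps encoding: each atom's timestamp can be serialized as a delta from the partition minimum, and small deltas cost far fewer bytes under a variable-length integer encoding (Cassandra uses vints; the helper below is a generic LEB128-style stand-in).

```python
# Simplified sketch (names hypothetical): storing the partition's minimum
# timestamp lets atoms encode timestamps as small deltas, which need far
# fewer bytes in a varint encoding than absolute microsecond values.

def varint_size(n):
    """Bytes needed for an unsigned LEB128-style varint (7 bits/byte)."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size

# Microsecond timestamps within one partition tend to cluster together:
timestamps = [1_515_800_000_000_000 + d for d in (0, 12, 350, 4_100)]

raw_cost = sum(varint_size(ts) for ts in timestamps)
min_ts = min(timestamps)
delta_cost = varint_size(min_ts) + sum(varint_size(ts - min_ts)
                                       for ts in timestamps)

assert delta_cost < raw_cost  # deltas are much cheaper than absolute values
```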
Storing the max alongside would still require justification, though its
cost wou
Don’t forget about deleted and missing data: the bane of all on-replica
aggregation optimizations.
> On Jan 14, 2018, at 12:07 AM, Jeff Jirsa wrote:
>
>
> You’re right it’s not stored in metadata now. Adding this to metadata isn’t
> hard, it’s just hard to do it right where it’s useful to p
You’re right it’s not stored in metadata now. Adding this to metadata isn’t
hard, it’s just hard to do it right where it’s useful to people with other data
models (besides yours) so it can make it upstream (if that’s your goal). In
particular the worst possible case is a table with no clusterin
Do you need to support TTLs? That might be a bit of an issue.
On Sat, Jan 13, 2018 at 12:41 PM Arthur Kushka wrote:
> Hi folks,
>
> Currently, I am working on a custom CQL operator that should return the max
> timestamp for some partition.
>
> I don't think that scanning of partition for that kind of