Hi all,
CASSANDRA-14227 has undergone review and the perf numbers look OK. Now I
have to tackle the downgradability issue and then, hopefully, merge. This
is what I have gathered from the many conversations; please let me know
if this is correct or if I am missing something:
- Everything will be based off a feature flag. I will add a transient
feature flag while waiting for CASSANDRA-18301 to land. I will merge to
trunk and when CASSANDRA-18301 lands it should replace it. That makes
CASSANDRA-18301 a release blocker (think multiple feature flags, avoid
future feature flag deprecations,...). If the effort for the TTL feature
flag is comparable to implementing CASSANDRA-18301 I might just do that
(TBD).
- My code will have to behave as it always has and produce sstables
_not_ in the new format. Once that feature flag toggles I can write
sstables in the _new_ format with the new behavior. I will add testing
for both behaviors and synthetically emulate the flag toggle.
- Providing a tool to downgrade sstables already written in the _new_
format in the _previous_ format is not in scope for 14227. That would be
CASSANDRA-8928 in any case.
Is this correct?
Thx in advance.
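To make the second point concrete, here is a minimal Java sketch of gating the write format on a flag. Every name in it (SSTableFormatGate, Format, enableNewFormat) is hypothetical and not taken from the patch or from CASSANDRA-18301; it only illustrates "legacy until the flag toggles" and how a test could emulate the toggle.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: none of these names exist in Cassandra.
final class SSTableFormatGate {
    enum Format { LEGACY, NEW }

    // Transient flag; off by default so the node keeps writing the legacy format.
    private final AtomicBoolean newFormatEnabled = new AtomicBoolean(false);

    // Tests can call this to emulate the feature flag toggle.
    void enableNewFormat() {
        newFormatEnabled.set(true);
    }

    // Format to use for the next sstable that gets written.
    Format writeFormat() {
        return newFormatEnabled.get() ? Format.NEW : Format.LEGACY;
    }

    public static void main(String[] args) {
        SSTableFormatGate gate = new SSTableFormatGate();
        System.out.println(gate.writeFormat()); // LEGACY
        gate.enableNewFormat();
        System.out.println(gate.writeFormat()); // NEW
    }
}
```

Sstables written before the toggle stay in the legacy format, which is what keeps a downgrade possible up to that point.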
On 3/2/23 15:24, Henrik Ingo wrote:
In that case I agree that increasing from 20 years is an interesting
opportunity but clearly out of scope for your current ticket.
On Fri, Feb 3, 2023 at 3:48 PM Berenguer Blasi
<berenguerbl...@gmail.com> wrote:
Hi,
20y is the current and historic value. 68y is what an integer can
accommodate hence the current 2038 limit since the 1970 Unix
epoch. I wouldn't make it a configurable value, off the top of my
head it would make for some interesting bugs and debugging
sessions when nodes had different values. Food for another ticket
in any case imo.
Regards
On 3/2/23 14:18, Henrik Ingo wrote:
Naive PHB questions to follow...
Why are 68y and 20y special? Could you pick any value? Could we
allow it to be configurable? (Last one probably overkill, just
asking to understand...)
If we can pick any values we want, instinctively I would
personally suggest to have TTL higher than 20 years, but also
kicking the can further than 2035, which is only 13 years from
now. Just to suggest a specific number, why not 35y and 2071?
henrik
On Fri, Feb 3, 2023 at 12:32 PM Berenguer Blasi
<berenguerbl...@gmail.com> wrote:
Hi All,
a version using Uints, 20y max TTL and kicking the can down
the road until 2086 has been put up for review #justfyi
Regards
On 15/11/22 7:06, Berenguer Blasi wrote:
Hi all,
Thanks for your answers!
To Benedict's point about the uvint encoding of
deletionTime: it is true that it happens here
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170.
But we also have a DeletionTime serializer here
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166
that writes an int and a long, and that would now write 2 longs.
TTL itself (the delta) remains an int in the new PR so it
should have no effect in size.
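As a rough illustration of the size difference in that serializer, here is a Java sketch that lays out one deletion time both ways with a plain ByteBuffer. It is not the Cassandra serializer; the class and the parameter names are assumptions for illustration only.

```java
import java.nio.ByteBuffer;

// Illustrative only: mimics the fixed-width layouts, not Cassandra's serializer.
final class DeletionTimeSizeSketch {
    // Old layout: a 4-byte int followed by an 8-byte long.
    static byte[] writeIntAndLong(int localDeletionTime, long markedForDeleteAt) {
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                         .putInt(localDeletionTime)
                         .putLong(markedForDeleteAt)
                         .array();
    }

    // Layout after moving the deletion time to long: two 8-byte longs.
    static byte[] writeTwoLongs(long localDeletionTime, long markedForDeleteAt) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                         .putLong(localDeletionTime)
                         .putLong(markedForDeleteAt)
                         .array();
    }

    public static void main(String[] args) {
        System.out.println(writeIntAndLong(1_668_000_000, 1L).length); // 12
        System.out.println(writeTwoLongs(1_668_000_000L, 1L).length);  // 16
    }
}
```

That is 4 extra bytes for every deletion time written through this fixed-width path.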
Did I reference the correct parts of the codebase? No
sstable expert here.
On 14/11/22 19:28, Josh McKenzie wrote:
in 2035 we'd hit the same problem again.
In terms of "kicking a can down the road", this would be a
pretty vigorous kick. I wouldn't push back against this
deferral. :)
On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:
I’m confused why we see *any* increase in sstable size -
TTLs and deletion times are already written as unsigned
vints as offsets from an sstable epoch for each value.
I would dig in more carefully to explore why you’re seeing
this increase? For the same data there should be no change
to size on disk.
On 14 Nov 2022, at 06:36, C. Scott Andreas
<sc...@paradoxica.net> wrote:
A 2-3% increase in storage volume is roughly equivalent
to giving up the gain from LZ4 -> LZ4HC, or a one to
two-level bump in Zstandard compression levels. This
regression could be very expensive for storage-bound use
cases.
From the perspective of storage overhead, the unsigned
int approach sounds preferable.
On Nov 13, 2022, at 10:13 PM, Berenguer Blasi
<berenguerbl...@gmail.com> wrote:
Hi all,
We have done some more research on CASSANDRA-14227. The
current patch solves the TTL limit issue by
switching TTL to long instead of int. This approach does
not have a negative impact on memtable memory usage, as
C* controls the memory used by the Memtable, but based
on our testing it increases the bytes flushed by 4 to 7%
and the bytes on disk by 2 to 3%.
As a mitigation to this problem it is possible to encode
/localDeletionTime/ as a vint. It results in a 1%
improvement but might cause additional computations
during compaction or some other operations.
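To show why a variable-length encoding can claw back some of that space, here is a generic unsigned varint sketch in Java (7 payload bits per byte, LEB128 style). This is an assumption chosen for illustration and is not Cassandra's own vint format; the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Generic LEB128-style unsigned varint; NOT Cassandra's vint wire format.
final class UnsignedVarintSketch {
    // Emits 7 bits per byte, low bits first; the high bit marks "more bytes follow".
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Reassembles the value from the 7-bit groups.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode(1_668_000_000L).length); // 5 bytes instead of a fixed 8
        System.out.println(encode(86_400L).length);        // 3 bytes for a one-day offset
    }
}
```

The extra loop per value is the kind of additional computation mentioned above, and small offsets benefit far more than absolute timestamps.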
Benedict's proposal to keep on using ints for TTL but as
a delta to nowInSec would work for memtables but not
for SSTables, where nowInSec does not
exist. As a consequence we would still suffer from the
impact on bytes flushed and bytes on disk.
Another approach that was suggested is the use of
unsigned integers. Java 8 has an unsigned integer API
that would allow us to use unsigned ints for TTLs. By our
computation, unsigned ints would give us a maximum
time of 136 years since the Unix epoch and therefore a
maximum expiration timestamp in 2106. We would have to
keep TTL at 20y instead of 68y to give us enough
breathing room though, otherwise in 2035 we'd hit the
same problem again.
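The arithmetic behind those limits can be checked with the Java 8 unsigned helpers on Integer. This is a sketch under assumptions: the class and constant names are hypothetical, and the 20y cap is taken as 20 years of 365 days.

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper; only the java.lang.Integer and java.time calls are real APIs.
final class TtlEpochLimits {
    static final long MAX_SIGNED = Integer.MAX_VALUE;            // 2^31 - 1 seconds
    static final long MAX_UNSIGNED = Integer.toUnsignedLong(-1); // 2^32 - 1 seconds
    static final long TWENTY_YEARS = 20L * 365 * 24 * 60 * 60;   // assumed 20y TTL cap

    // Calendar year (UTC) of a timestamp given in seconds since the Unix epoch.
    static int yearOf(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC).getYear();
    }

    // An unsigned expiration time lives in a plain int and is widened on read.
    static long readUnsigned(int stored) {
        return Integer.toUnsignedLong(stored);
    }

    public static void main(String[] args) {
        System.out.println(yearOf(MAX_SIGNED));                  // 2038
        System.out.println(yearOf(MAX_UNSIGNED));                // 2106
        System.out.println(yearOf(MAX_UNSIGNED - TWENTY_YEARS)); // 2086
        // Ordering must use the unsigned comparison once values pass 2^31 - 1.
        System.out.println(Integer.compareUnsigned((int) 3_000_000_000L, Integer.MAX_VALUE) > 0); // true
    }
}
```

The last figure is the latest year in which a write with the maximum 20y TTL still fits in the unsigned range.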
Happy to hear opinions.
On 18/10/22 10:56, Berenguer Blasi wrote:
Hi,
apologies for the late reply as I have been OOO. I have
done some profiling and results look virtually
identical on trunk and 14227. I have attached some
screenshots to the ticket
https://issues.apache.org/jira/browse/CASSANDRA-14227.
Unless my eyes are fooling me everything in the JFRs
looks the same.
Regards
On 30/9/22 9:44, Berenguer Blasi wrote:
Hi Benedict,
thanks for the reply! Yes some profiling is probably
needed, then we can see if going down the delta
encoding big refactor rabbit hole is worth it?
Let's see what other concerns people bring up.
Thx.
On 29/9/22 11:12, Benedict Elliott Smith wrote:
My only slight concern with this approach is the
additional memory pressure. Since 64yrs should be
plenty at any moment in time, I wonder if it wouldn’t
be better to represent these times as deltas from the
nowInSec being used to process the query. So, long
math would only be used to normalise the times to
this nowInSec (from whatever is stored in the
sstable) within a method, and ints would be stored in
memtables and any objects used for processing.
This might admittedly be more work, but I don’t
believe it should be too challenging - we can
introduce a method deletionTime(int nowInSec) that
returns a long value by adding nowInSec to the
deletionTime, and make the underlying value private,
refactoring call sites?
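A minimal Java sketch of that idea, under stated assumptions: the class name and the factory method are hypothetical, and nowInSec is taken as a long here (the message above writes it as an int) so that the example stays valid past 2038.

```java
// Hypothetical sketch of storing a deletion time as an int delta from nowInSec.
final class DeltaDeletionTime {
    // Private on purpose: callers must go through deletionTime(nowInSec).
    private final int deltaFromNowInSec;

    private DeltaDeletionTime(int deltaFromNowInSec) {
        this.deltaFromNowInSec = deltaFromNowInSec;
    }

    // Normalises an absolute time (e.g. read from an sstable) against nowInSec.
    // Math.toIntExact throws if the delta does not fit in an int (about 68 years).
    static DeltaDeletionTime fromAbsolute(long absoluteDeletionTime, long nowInSec) {
        return new DeltaDeletionTime(Math.toIntExact(absoluteDeletionTime - nowInSec));
    }

    // Long math happens only here; the stored state stays 4 bytes wide.
    long deletionTime(long nowInSec) {
        return nowInSec + deltaFromNowInSec;
    }

    public static void main(String[] args) {
        long nowInSec = 4_000_000_000L;           // a moment past the 2038 limit
        long absolute = nowInSec + 630_720_000L;  // 20 * 365 days later
        DeltaDeletionTime dt = fromAbsolute(absolute, nowInSec);
        System.out.println(dt.deletionTime(nowInSec) == absolute); // true
    }
}
```

The delta is only meaningful against the same nowInSec it was normalised with, which is why the sketch forces every read through the accessor.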
On 29 Sep 2022, at 09:37, Berenguer Blasi
<berenguerbl...@gmail.com> wrote:
Hi all,
I have taken a stab at this in a PR that you can find attached
to the ticket. Mainly:
- I have moved deletion times, gc and nowInSec
timestamps to long. That should get us past the 2038
limit.
- TTL is maxed now to 68y. Think CQL API
compatibility and a sort of a 'free' guardrail.
- A new NONE overflow policy is the default but
everything is backwards compatible by keeping the
previous ones in place. Think upgrade scenarios or
apps relying on the previous behavior.
- The new limit is around year 292,471,208,677 which
sounds ok given the Sun will start collapsing in 3
to 5 billion years :-)
- Please feel free to drop by the ticket and take a
look at the PR even if it's cursory
Thx in advance.
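For reference, the 292,471,208,677 figure falls out of dividing the largest long by the number of seconds in a year. A short Java sketch, assuming 365-day years (the class name is hypothetical):

```java
// Hypothetical helper reproducing the back-of-the-envelope limit for long seconds.
final class LongSecondsLimit {
    static final long SECONDS_PER_YEAR = 365L * 24 * 60 * 60; // 365-day year, an assumption

    // Whole years of seconds that fit in a signed 64-bit value.
    static long yearsRepresentable() {
        return Long.MAX_VALUE / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println(yearsRepresentable()); // 292471208677
    }
}
```

Strictly speaking this is a count of years since the 1970 epoch rather than a calendar year, which makes no practical difference at this scale.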
--
Henrik Ingo
c. +358 40 569 7354
w. www.datastax.com <http://www.datastax.com>