Hi All,
a version using Uints, 20y max TTL and kicking the can down the road
until 2086 has been put up for review #justfyi
Regards
On 15/11/22 7:06, Berenguer Blasi wrote:
Hi all,
thanks for your answers!.
To Benedict's point: In terms of the uvint enconding of deletionTime
i.e. it is true it happens here
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170.
But we also have a DeletionTime serializer here
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166
that is writing an int and a long that would now write 2 longs.
TTL itself (the delta) remains an int in the new PR so it should have
no effect in size.
Did I reference the correct parts of the codebase? No sstable expert here.
On 14/11/22 19:28, Josh McKenzie wrote:
in 2035 we'd hit the same problem again.
In terms of "kicking a can down the road", this would be a pretty
vigorous kick. I wouldn't push back against this deferral. :)
On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:
I’m confused why we see *any* increase in sstable size - TTLs and
deletion times are already written as unsigned vints as offsets from
an sstable epoch for each value.
I would dig in more carefully to explore why you’re seeing this
increase? For the same data there should be no change to size on disk.
On 14 Nov 2022, at 06:36, C. Scott Andreas <sc...@paradoxica.net>
wrote:
A 2-3% increase in storage volume is roughly equivalent to giving
up the gain from LZ4 -> LZ4HC, or a one to two-level bump in
Zstandard compression levels. This regression could be very
expensive for storage-bound use cases.
From the perspective of storage overhead, the unsigned int approach
sounds preferable.
On Nov 13, 2022, at 10:13 PM, Berenguer Blasi
<berenguerbl...@gmail.com> wrote:
Hi all,
We have done some more research on c14227. The current patch for
CASSANDRA-14227 solves the TTL limit issue by switching TTL to
long instead of int. This approach does not have a negative impact
on memtable memory usage, as C* controles the memory used by the
Memtable, but based on our testing it increases the bytes flushed
by 4 to 7% and the byte on disk by 2 to 3%.
As a mitigation to this problem it is possible to encode
/localDeletionTime/ as a vint. It results in a 1% improvement but
might cause additional computations during compaction or some
other operations.
Benedict's proposal to keep on using ints for TTL but as a delta
to nowInSecond would work for memtables but not for work in the
SSTable where nowInSecond does not exist. By consequence we would
still suffer from the impact on byte flushed and bytes on disk.
Another approach that was suggested is the use of unsigned
integer. Java 8 has an unsigned integer API that would allow us to
use unsigned int for TTLs. Based on computation unsigned ints
would give us a maximum time of 136 years since the Unix Epoch and
therefore a maximum expiration timestamp in 2106. We would have to
keep TTL at 20y instead of 68y to give us enough breathing room
though, otherwise in 2035 we'd hit the same problem again.
Happy to hear opinions.
On 18/10/22 10:56, Berenguer Blasi wrote:
Hi,
apologies for the late reply as I have been OOO. I have done some
profiling and results look virtually identical on trunk and
14227. I have attached some screenshots to the ticket
https://issues.apache.org/jira/browse/CASSANDRA-14227
<https://issues.apache.org/jira/browse/CASSANDRA-14227>. Unless
my eyes are fooling me everything in the jfrs look the same.
Regards
On 30/9/22 9:44, Berenguer Blasi wrote:
Hi Benedict,
thanks for the reply! Yes some profiling is probably needed,
then we can see if going down the delta encoding big refactor
rabbit hole is worth it?
Let's see what other concerns people bring up.
Thx.
On 29/9/22 11:12, Benedict Elliott Smith wrote:
My only slight concern with this approach is the additional
memory pressure. Since 64yrs should be plenty at any moment in
time, I wonder if it wouldn’t be better to represent these
times as deltas from the nowInSec being used to process the
query. So, long math would only be used to normalise the times
to this nowInSec (from whatever is stored in the sstable)
within a method, and ints would be stored in memtables and any
objects used for processing.
This might admittedly be more work, but I don’t believe it
should be too challenging - we can introduce a method
deletionTime(int nowInSec) that returns a long value by adding
nowInSec to the deletionTime, and make the underlying value
private, refactoring call sites?
On 29 Sep 2022, at 09:37, Berenguer Blasi
<berenguerbl...@gmail.com> <mailto:berenguerbl...@gmail.com>
wrote:
Hi all,
I have taken a stab in a PR you can find attached in the
ticket. Mainly:
- I have moved deletion times, gc and nowInSec timestamps to
long. That should get us past the 2038 limit.
- TTL is maxed now to 68y. Think CQL API compatibility and a
sort of a 'free' guardrail.
- A new NONE overflow policy is the default but everything is
backwards compatible by keeping the previous ones in place.
Think upgrade scenarios or apps relying on the previous behavior.
- The new limit is around year 292,471,208,677 which sounds ok
given the Sun will start collapsing in 3 to 5 billion years :-)
- Please feel free to drop by the ticket and take a look at
the PR even if it's cursory
Thx in advance.