FWIW, I’m not -0; I just think that, long after the freeze date, a change like
this needs a strong mandate from the community. I think the change is a good one.

> On 17 Oct 2018, at 22:09, Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> Hi,
> 
> It's really not appreciably slower compared to the decompression we are going
> to do, which will take several microseconds. Decompression should also get
> faster overall: we will do less unnecessary decompression, and the
> decompression itself may be faster since the smaller chunks may fit better in
> a higher-level cache. I ran a microbenchmark comparing them.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988
> 
> Fetching a long from memory:       56 nanoseconds
> Compact integer sequence   :       80 nanoseconds
> Summing integer sequence   :      165 nanoseconds
> 
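
For anyone skimming the thread, here is a rough sketch of the kind of structure
being discussed (illustrative only, not the CASSANDRA-13241 patch; the block
size and field widths are made up): store a full 8-byte offset for every Nth
chunk and a small delta from that anchor for the rest, so a lookup costs a
couple of dependent loads instead of one.

    // Illustrative sketch only: a monotonically increasing chunk-offset
    // sequence stored as one 8-byte anchor per block of 64 entries plus a
    // 4-byte delta per entry. The real patch may pack entries more tightly.
    final class CompactOffsets
    {
        private static final int BLOCK = 64;
        private final long[] anchors; // one full offset per BLOCK entries
        private final int[] deltas;   // offset - anchor, for every entry

        CompactOffsets(long[] offsets)
        {
            anchors = new long[(offsets.length + BLOCK - 1) / BLOCK];
            deltas = new int[offsets.length];
            for (int i = 0; i < offsets.length; i++)
            {
                if (i % BLOCK == 0)
                    anchors[i / BLOCK] = offsets[i];
                deltas[i] = (int) (offsets[i] - anchors[i / BLOCK]);
            }
        }

        long offset(int chunk)
        {
            // Two array reads and an add instead of a single read; roughly the
            // extra cost measured in the microbenchmark above.
            return anchors[chunk / BLOCK] + deltas[chunk];
        }
    }
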
> Currently we have one +1 from Kurt for changing the representation and possibly
> a -0 from Benedict. That's not really enough to make an exception to the code
> freeze. If you want it to happen (or not), you need to speak up; otherwise only
> the default will change.
> 
> Regards,
> Ariel
> 
> On Wed, Oct 17, 2018, at 6:40 AM, kurt greaves wrote:
>> I think if we're going to drop it to 16k, we should invest in the compact
>> sequence as well. Just lowering it to 16k could have a painful impact on
>> anyone running low-memory nodes, but if we can do it without the memory
>> impact, I don't think there's any reason to wait another major version to
>> implement it.
>> 
>> Having said that, we should probably benchmark the two representations
>> Ariel has come up with.
>> 
>> On Wed, 17 Oct 2018 at 20:17, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>> 
>>> +1
>>> 
>>> I would guess a lot of C* clusters/tables have this option set to the
>>> default value, and not many of them actually need to read such big chunks
>>> of data.
>>> I believe this will greatly limit disk overreads for a fair number (a big
>>> majority?) of new users. It seems fair enough to change this default value,
>>> and I also think 4.0 is a good place to do it.
>>> 
>>> Thanks for taking care of this, Ariel, and for making sure there is a
>>> consensus here as well,
>>> 
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France / Spain
>>> 
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>> 
>>> On Sat, 13 Oct 2018 at 08:52, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> This would only impact new tables; existing tables would get their
>>>> chunk_length_in_kb from the existing schema. It's something we record in a
>>>> system table.
>>>> 
>>>> I have an implementation of a compact integer sequence that only requires
>>>> 37% of the memory used today. So we could do this while only slightly more
>>>> than doubling the memory used. I'll post that to the JIRA soon.
>>>> 
>>>> Ariel
>>>> 
>>>> On Fri, Oct 12, 2018, at 1:56 AM, Jeff Jirsa wrote:
>>>>> 
>>>>> 
>>>>> I think 16k is a better default, but it should only affect new tables.
>>>>> Whoever changes it, please make sure you think about the upgrade path.
>>>>> 
>>>>> 
>>>>>> On Oct 12, 2018, at 2:31 AM, Ben Bromhead <b...@instaclustr.com> wrote:
>>>>>> 
>>>>>> This is something that's bugged me for ages. TBH, the performance gain
>>>>>> for most use cases far outweighs the increase in memory usage, and I
>>>>>> would even be in favor of changing the default now and optimizing the
>>>>>> storage cost later (if it's found to be worth it).
>>>>>> 
>>>>>> For some anecdotal evidence:
>>>>>> 4kb is usually what we end up setting it to. 16kb feels more reasonable
>>>>>> given the memory impact, but what would be the point if, practically,
>>>>>> most folks set it to 4kb anyway?
>>>>>> 
>>>>>> Note that chunk_length will largely depend on your read sizes, but 4k is
>>>>>> the floor for most physical devices in terms of their block size.
>>>>>> 
>>>>>> +1 for making this change in 4.0, given the small size of the change and
>>>>>> the large improvement to new users' experience (as long as we are
>>>>>> explicit in the documentation about memory consumption).
>>>>>> 
>>>>>> 
>>>>>>> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241
>>>>>>> 
>>>>>>> This ticket has languished for a while. IMO it's too late in 4.0 to
>>>>>>> implement a more memory-efficient representation for compressed chunk
>>>>>>> offsets. However, I don't think we should put out another release with
>>>>>>> the current 64k default, as it's pretty unreasonable.
>>>>>>> 
>>>>>>> I propose that we lower the value to 16kb. 4k might never be the correct
>>>>>>> default anyway, as there is a cost to compression, and 16k will still be
>>>>>>> a large improvement.
>>>>>>> 
>>>>>>> Benedict and Jon Haddad are both +1 on making this change for 4.0. In
>>>>>>> the past there has been some consensus about reducing this value,
>>>>>>> although perhaps paired with a more memory-efficient representation.
>>>>>>> 
>>>>>>> The napkin math for what this costs is:
>>>>>>> "If you have 1TB of uncompressed data, with 64k chunks that's 16M chunks
>>>>>>> at 8 bytes each (128MB).
>>>>>>> With 16k chunks, that's 512MB.
>>>>>>> With 4k chunks, it's 2G.
>>>>>>> Per terabyte of data (pre-compression)."
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
>>>>>>> 
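
If anyone wants to sanity-check those numbers, they fall directly out of
(uncompressed bytes / chunk size) * 8 bytes per offset; a throwaway snippet
(not from the codebase):

    // Off-heap memory for compressed chunk offsets per 1TB of uncompressed
    // data, assuming the current representation of one 8-byte long per chunk.
    public class ChunkOffsetCost
    {
        public static void main(String[] args)
        {
            long uncompressed = 1L << 40; // 1TB
            for (int chunkKb : new int[] { 64, 16, 4 })
            {
                long chunks = uncompressed / (chunkKb * 1024L);
                long bytes = chunks * 8;
                // Prints 64k -> 128MB, 16k -> 512MB, 4k -> 2048MB,
                // matching the figures quoted above.
                System.out.printf("%dk chunks: %dMB of offsets%n", chunkKb, bytes >> 20);
            }
        }
    }
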
>>>>>>> By way of comparison, memory mapping the files has a similar cost of 8
>>>>>>> bytes per 4k page. Multiple mappings make this more expensive. With a
>>>>>>> default of 16kb this would be 4x less expensive than memory mapping a
>>>>>>> file. I only mention this to give a sense of the costs we are already
>>>>>>> paying. I am not saying they are directly related.
>>>>>>> 
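
To put a number on that comparison (same napkin-math style, not a measurement):
per terabyte mapped, 8 bytes per 4k page is about 2GB, versus 512MB of offsets
with 16k chunks.

    // Back-of-envelope: page-table cost of mapping 1TB at 8 bytes per 4k page,
    // vs. one 8-byte chunk offset per 16k chunk.
    public class MmapVsOffsets
    {
        public static void main(String[] args)
        {
            long data = 1L << 40;                             // 1TB
            long pageTableBytes = data / 4096 * 8;            // 2GB
            long offsetBytes = data / (16 * 1024) * 8;        // 512MB
            System.out.println(pageTableBytes / offsetBytes); // prints 4
        }
    }
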
>>>>>>> I'll wait a week for discussion and, if there is consensus, make the
>>>>>>> change.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Ariel
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> --
>>>>>> Ben Bromhead
>>>>>> CTO | Instaclustr <https://www.instaclustr.com/>
>>>>>> +1 650 284 9692
>>>>>> Reliability at Scale
>>>>>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
> 