FWIW, I’m not -0; I just think that, long after the freeze date, a change like this needs a strong mandate from the community. I think the change itself is a good one.
> On 17 Oct 2018, at 22:09, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> Hi,
>
> It's really not appreciably slower compared to the decompression we are
> going to do, which will take several microseconds. Decompression will
> also be faster because we will do less unnecessary decompression, and
> the decompression itself may be faster since it may fit better in a
> higher-level cache. I ran a microbenchmark comparing them.
>
> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988
>
> Fetching a long from memory: 56 nanoseconds
> Compact integer sequence:    80 nanoseconds
> Summing integer sequence:   165 nanoseconds
>
> Currently we have one +1 from Kurt to change the representation, and
> possibly a -0 from Benedict. That's not really enough to make an
> exception to the code freeze. If you want it to happen (or not), you
> need to speak up; otherwise only the default will change.
>
> Regards,
> Ariel
>
> On Wed, Oct 17, 2018, at 6:40 AM, kurt greaves wrote:
>> I think if we're going to drop it to 16k, we should invest in the
>> compact sequencing as well. Just lowering it to 16k will have a
>> potentially painful impact on anyone running low-memory nodes, but if
>> we can do it without the memory impact, I don't think there's any
>> reason to wait another major version to implement it.
>>
>> Having said that, we should probably benchmark the two representations
>> Ariel has come up with.
>>
>> On Wed, 17 Oct 2018 at 20:17, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>>> +1
>>>
>>> I would guess a lot of C* clusters/tables have this option set to the
>>> default value, and not many of them need to read such big chunks of
>>> data. I believe this will greatly limit disk overreads for a fair
>>> amount (a big majority?) of new users. It seems fair enough to change
>>> this default value, and I also think 4.0 is a nice place to do this.
>>>
>>> Thanks for taking care of this, Ariel, and for making sure there is a
>>> consensus here as well.
>>>
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France / Spain
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Sat, 13 Oct 2018 at 08:52, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>
>>>> Hi,
>>>>
>>>> This would only impact new tables; existing tables would get their
>>>> chunk_length_in_kb from the existing schema. It's something we
>>>> record in a system table.
>>>>
>>>> I have an implementation of a compact integer sequence that only
>>>> requires 37% of the memory required today. So we could do this with
>>>> only slightly more than doubling the memory used. I'll post that to
>>>> the JIRA soon.
>>>>
>>>> Ariel
>>>>
>>>> On Fri, Oct 12, 2018, at 1:56 AM, Jeff Jirsa wrote:
>>>>>
>>>>> I think 16k is a better default, but it should only affect new
>>>>> tables. Whoever changes it, please make sure you think about the
>>>>> upgrade path.
>>>>>
>>>>>> On Oct 12, 2018, at 2:31 AM, Ben Bromhead <b...@instaclustr.com> wrote:
>>>>>>
>>>>>> This is something that's bugged me for ages. TBH, for most use
>>>>>> cases the performance gain far outweighs the increase in memory
>>>>>> usage, and I would even be in favor of changing the default now
>>>>>> and optimizing the storage cost later (if it's found to be worth it).
>>>>>>
>>>>>> For some anecdotal evidence: 4kb is usually what we end up setting
>>>>>> it to. 16kb feels more reasonable given the memory impact, but
>>>>>> what would be the point if, practically, most folks set it to 4kb
>>>>>> anyway?
>>>>>>
>>>>>> Note that chunk_length will largely depend on your read sizes, but
>>>>>> 4k is the floor for most physical devices in terms of block size.
>>>>>>
>>>>>> +1 for making this change in 4.0, given the small size of the
>>>>>> change and the large improvement to the new-user experience (as
>>>>>> long as we are explicit in the documentation about memory
>>>>>> consumption).
>>>>>>
>>>>>>> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241
>>>>>>>
>>>>>>> This ticket has languished for a while. IMO it's too late in 4.0
>>>>>>> to implement a more memory-efficient representation for
>>>>>>> compressed chunk offsets. However, I don't think we should put
>>>>>>> out another release with the current 64k default, as it's pretty
>>>>>>> unreasonable.
>>>>>>>
>>>>>>> I propose that we lower the value to 16kb. 4k might never be the
>>>>>>> correct default anyway, as there is a cost to compression, and
>>>>>>> 16k will still be a large improvement.
>>>>>>>
>>>>>>> Benedict and Jon Haddad are both +1 on making this change for
>>>>>>> 4.0. In the past there has been some consensus about reducing
>>>>>>> this value, although maybe with more memory efficiency.
>>>>>>>
>>>>>>> The napkin math for what this costs is:
>>>>>>> "If you have 1TB of uncompressed data, with 64k chunks that's 16M
>>>>>>> chunks at 8 bytes each (128MB).
>>>>>>> With 16k chunks, that's 512MB.
>>>>>>> With 4k chunks, it's 2GB.
>>>>>>> Per terabyte of data (pre-compression)."
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
>>>>>>>
>>>>>>> By way of comparison, memory-mapping the files has a similar cost
>>>>>>> of 8 bytes per 4k page. Multiple mappings make this more
>>>>>>> expensive. With a default of 16kb, this would be 4x less
>>>>>>> expensive than memory-mapping a file. I only mention this to give
>>>>>>> a sense of the costs we are already paying; I am not saying they
>>>>>>> are directly related.
>>>>>>>
>>>>>>> I'll wait a week for discussion and, if there is consensus, make
>>>>>>> the change.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ariel
>>>>>>
>>>>>> --
>>>>>> Ben Bromhead
>>>>>> CTO | Instaclustr <https://www.instaclustr.com/>
>>>>>> +1 650 284 9692
>>>>>> Reliability at Scale
>>>>>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
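The napkin math quoted in the thread can be checked with a short sketch. The only assumptions are the ones in the quoted figures themselves: 1 TB = 2^40 bytes of uncompressed data, and one 8-byte long offset per compressed chunk (the function name is invented for illustration):

```python
def offset_metadata_bytes(data_bytes: int, chunk_kb: int) -> int:
    """Bytes of chunk-offset metadata for `data_bytes` of uncompressed data,
    assuming one 8-byte long offset per compressed chunk."""
    chunk_bytes = chunk_kb * 1024
    num_chunks = data_bytes // chunk_bytes  # one offset entry per chunk
    return num_chunks * 8                   # 8 bytes per long offset

TB = 1 << 40
for chunk_kb in (64, 16, 4):
    mb = offset_metadata_bytes(TB, chunk_kb) / (1 << 20)
    print(f"{chunk_kb:>2}k chunks: {mb:>6.0f} MB per TB")
# → 64k chunks:    128 MB per TB
# → 16k chunks:    512 MB per TB
# →  4k chunks:   2048 MB per TB
```

This reproduces the quoted numbers: 128 MB at the old 64k default, 512 MB at the proposed 16k, and 2 GB at 4k, per terabyte of pre-compression data.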
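The compact integer sequence Ariel benchmarked is not shown in the thread, so the following is only an illustrative sketch of the general idea, not the CASSANDRA-13241 implementation. Since chunk offsets increase monotonically, they can be stored as one full-width base per block of entries plus small fixed-width per-entry deltas, trading a little lookup arithmetic for memory. The class name, block size, and delta width below are all invented for illustration:

```python
import array

class CompactOffsets:
    """Illustrative compact store for monotonically increasing chunk offsets.

    One 8-byte base per BLOCK entries plus a 4-byte delta per entry costs
    (8 / BLOCK + 4) bytes per entry, versus 8 bytes per entry for plain longs.
    """
    BLOCK = 16  # entries per 8-byte base (invented value)

    def __init__(self, offsets):
        self.bases = array.array("q")   # 8-byte signed base, one per block
        self.deltas = array.array("i")  # 4-byte signed delta, one per entry
        for i, off in enumerate(offsets):
            if i % self.BLOCK == 0:
                self.bases.append(off)  # start a new block at this offset
            self.deltas.append(off - self.bases[-1])

    def __getitem__(self, i):
        # Reconstruct the original offset: block base plus stored delta.
        return self.bases[i // self.BLOCK] + self.deltas[i]

    def __len__(self):
        return len(self.deltas)
```

With `BLOCK = 16` this stores 4.5 bytes per entry instead of 8 (about 56% of the plain-long cost); the implementation mentioned in the thread reportedly reaches 37%, presumably via a tighter encoding, at the price of the extra lookup work shown in the microbenchmark (80 ns vs. 56 ns for a plain long fetch).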