I've granted access to the account "Dmitry Konstantinov (netudima)"

Kind Regards,
Brandon

On Fri, Jan 3, 2025 at 8:18 AM Dmitry Konstantinov <netud...@gmail.com> wrote:
>
> Hi Brain, I wanted it to be created under 
> https://cwiki.apache.org/confluence/display/CASSANDRA/Discussion but it looks 
> like I do not have grants to add a page there and Confluence automatically 
> selected this space to store the page.
> I do not have permission to move it too :-(
> Can I get grants to create pages under 
> https://cwiki.apache.org/confluence/display/CASSANDRA/ ?
>
> Thank you,
> Dmitry
>
> On Fri, 3 Jan 2025 at 14:12, Brian Proffitt <b...@apache.org> wrote:
>>
>> Dmitry:
>>
>> You are using a section of the Confluence wiki that is dedicated to 
>> Community Over Code, the Apache Conference. Please move that page to a more 
>> appropriate part of the Apache wiki as soon as you can.
>>
>> Thanks!
>> BKP
>>
>> On 2025/01/03 13:55:49 Dmitry Konstantinov wrote:
>> > I have summarized information from this mail thread to
>> > https://cwiki.apache.org/confluence/display/COC/SSTable%27s+partition+cardinality+implementation
>> > Probably later it can be transformed to a CEP..
>> > Regarding experience of DataSketches library's authors and publications
>> > here there is a good summary in Background section:
>> > https://cwiki.apache.org/confluence/display/INCUBATOR/DataSketchesProposal
>> > . It looks good..
>> >
>> > On Fri, 3 Jan 2025 at 13:06, Štefan Miklošovič <smikloso...@apache.org>
>> > wrote:
>> >
>> > > Right ... that sounds reasonable. Let's "sleep on it" for a while. It is
>> > > not something which is urgent to deal with right now but I find myself
>> > > quite often to identify the functionality where we go to the disk more
>> > > often than necessary and this was next on the list to take a look at
>> > > reading CASSANDRA-13338. So I took a look ... and here we are.
>> > >
>> > > If you guys go to bump SSTable version in 5.1 / 6.0, this change might be
>> > > just shipped with that too.
>> > >
>> > > On Fri, Jan 3, 2025 at 1:47 PM Benedict <bened...@apache.org> wrote:
>> > >
>> > >> I’ve had a quick skim of the data sketches library, and it does seem to
>> > >> have made some more efficient decisions in its design than clearspring,
>> > >> appears to maybe support off-heap representations, and has reasonably 
>> > >> good
>> > >> documentation about the theoretical properties of the sketches. The 
>> > >> chair
>> > >> of the project is a published author on the topic, and the library has
>> > >> newer algorithms for cardinality estimation than HLL.
>> > >>
>> > >> So, honestly, it might not be a bad idea to (carefully) consider a
>> > >> migration, even if the current library isn’t broken for our needs.
>> > >>
>> > >> It would not be high up my priority list for the project, but I would
>> > >> support it if it scratches someone’s itch.
>> > >>
>> > >> On 3 Jan 2025, at 12:16, Štefan Miklošovič <smikloso...@apache.org>
>> > >> wrote:
>> > >>
>> > >> 
>> > >> Okay ... first problems.
>> > >>
>> > >> These 2000 bytes I have mentioned in my response to Chris were indeed
>> > >> correct, but that was with Datasketches and the main parameter for Hall
>> > >> Sketch (DEFAULT_LG_K) was 12. When I changed that to 13 to match what we
>> > >> currently have in Cassandra with Clearspring, that doubled the size to
>> > >> ~4000 bytes.
>> > >>
>> > >> When we do not use Datasketches, what Clearspring generates is about
>> > >> ~5000 bytes for the array itself but that array is wrapped into an
>> > >> ICardinality object of Clearspring and we need that object in order to
>> > >> merge another ICardinality into that. So, we would need to cache this
>> > >> ICardinality object instead of just an array itself. If we don't cache
>> > >> whole ICardinality, we would then need to do basically what
>> > >> CompactionMetadata.CompactionMetadataSerializer.deserialize is doing 
>> > >> which
>> > >> would allocate a lot / often (ICardinality cardinality =
>> > >> HyperLogLogPlus.Builder.build(that_cached_array)).
>> > >>
>> > >> To avoid the allocations every time we compute, we would just cache that
>> > >> whole ICardinality of Clearspring, but that whole object measures like
>> > >> 11/12 KB. So even 10k tables would occupy like 100MB. 50k tables 500MB.
>> > >> That is becoming quite a problem.
>> > >>
>> > >> On the other hand, HllSketch of Datasketches, array included, adds
>> > >> minimal overhead. Like an array has 5000 bytes and the whole object like
>> > >> 5500. You got the idea ...
>> > >>
>> > >> If we are still OK with these sizes, sure ... I am just being 
>> > >> transparent
>> > >> about the consequences here.
>> > >>
>> > >> A user would just opt-in into this (by default it would be turned off).
>> > >>
>> > >> On the other hand, if we have 10k SSTables, reading that 10+KB from disk
>> > >> takes around 2-3ms so we would read the disk 20/30 seconds every time we
>> > >> would hit that metric (and we haven't even started to merge the logs).
>> > >>
>> > >> If this is still not something which would sell Datasketches as a viable
>> > >> alternative then I guess we need to stick to these numbers and cache it 
>> > >> all
>> > >> with Clearspring, occupying way more memory.
>> > >>
>> > >> On Thu, Jan 2, 2025 at 10:15 PM Benedict <bened...@apache.org> wrote:
>> > >>
>> > >>> I would like to see somebody who has some experience writing data
>> > >>> structures, preferably someone we trust as a community to be competent 
>> > >>> at
>> > >>> this (ie having some experience within the project contributing at this
>> > >>> level), look at the code like they were at least lightly reviewing the
>> > >>> feature as a contribution to this project.
>> > >>>
>> > >>> This should be the bar for any new library really, but triply so for
>> > >>> replacing a library that works fine.
>> > >>>
>> > >>> On 2 Jan 2025, at 21:02, Štefan Miklošovič <smikloso...@apache.org>
>> > >>> wrote:
>> > >>>
>> > >>> 
>> > >>> Point 2) is pretty hard to fulfil, I can not imagine what would be
>> > >>> "enough" for you to be persuaded. What should concretely happen? 
>> > >>> Because
>> > >>> whoever comes and says "yeah this is a good lib, it works" is probably 
>> > >>> not
>> > >>> going to be enough given the vague requirements you put under 2) You 
>> > >>> would
>> > >>> like to see exactly what?
>> > >>>
>> > >>> The way it looks to me is to just shut it down because of perceived
>> > >>> churn caused by that and there will always be some argument against 
>> > >>> that.
>> > >>>
>> > >>> Based on (1) I don't think what we have is bug free.
>> > >>>
>> > >>> Jeff:
>> > >>>
>> > >>> Thank you for that answer, I think we are on the same page that caching
>> > >>> it is just fine, that's what I got from your last two paragraphs.
>> > >>>
>> > >>> So the path from here is
>> > >>>
>> > >>> 1) add datasketches and cache
>> > >>> 2) don't add datasketches and cache it anyway
>> > >>>
>> > >>> The introduction of datasketches lib is not the absolute must in order
>> > >>> to achieve that, we can cache / compute it parallel with Clearspring as
>> > >>> well, it is just a bitter-sweet solution which just doesn't feel right.
>> > >>>
>> > >>> (1) https://github.com/addthis/stream-lib/issues
>> > >>>
>> > >>> On Thu, Jan 2, 2025 at 9:26 PM Benedict <bened...@apache.org> wrote:
>> > >>>
>> > >>>> Your message seemed to be all about the caching proposal, which I have
>> > >>>> proposed we separate, hence my confusion.
>> > >>>>
>> > >>>> To restate my answer to your question, I think that unless the new
>> > >>>> library actually offers us concrete benefits we can point to that we
>> > >>>> actually care about then yes it’s a bad idea to incur the churn of
>> > >>>> migration.
>> > >>>>
>> > >>>> I’m not inherently opposed to a migration but simply “new is better” 
>> > >>>> is
>> > >>>> just plain wrong. Nothing you’ve presented yet convinces me this 
>> > >>>> library is
>> > >>>> worth the effort of vetting given our current solution works fine.
>> > >>>>
>> > >>>> My position is that for any new library we should:
>> > >>>>
>> > >>>> 1) Point to something it solves that we actually want and is worth the
>> > >>>> time investment
>> > >>>> 2) Solicit folk in the community competent in the relevant data
>> > >>>> structures to vet the library for the proposed functionality
>> > >>>>
>> > >>>> The existing solution never went through (2) because it dates from the
>> > >>>> dark ages where we just threw dependencies in willynilly. But it has 
>> > >>>> the
>> > >>>> benefit of having been used for a very long time without incident.
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> On 2 Jan 2025, at 20:12, Štefan Miklošovič <smikloso...@apache.org>
>> > >>>> wrote:
>> > >>>>
>> > >>>> 
>> > >>>> Hi Benedict,
>> > >>>>
>> > >>>> you wrote:
>> > >>>>
>> > >>>> I am strongly opposed to updating libraries simply for the sake of it.
>> > >>>> Something like HLL does not need much ongoing maintenance if it works.
>> > >>>> We’re simply asking for extra work and bugs by switching, and some 
>> > >>>> risk
>> > >>>> without understanding the quality control for the new library 
>> > >>>> project’s
>> > >>>> releases.
>> > >>>>
>> > >>>> I understand this. But really, do you think that it is a bad idea to
>> > >>>> switch to a well maintained library which is already used quite 
>> > >>>> widely (the
>> > >>>> website mentions extensions for sketches in Apache Druid, Hive, Pig, 
>> > >>>> Pinot
>> > >>>> and PostgreSQL) and using the library which was abandoned for 6 years?
>> > >>>>
>> > >>>> As I mentioned there is also extensive comparison with Clearspring (1)
>> > >>>> where all performance benefits / speedups etc present in detail with 
>> > >>>> charts
>> > >>>> attached.
>> > >>>>
>> > >>>> I think this is a mature project, under Apache, so when we think that 
>> > >>>> a
>> > >>>> 6 years old and abandoned library is better than what Apache 
>> > >>>> Datasketches
>> > >>>> provides, then the question is what are we doing here? Are we not 
>> > >>>> believing
>> > >>>> what Apache itself offers and we need to rely on a 6 years old and 
>> > >>>> dead
>> > >>>> library instead of that? Huh? That lib has 3k commits, releases 
>> > >>>> often, it's
>> > >>>> a pretty active project ...
>> > >>>>
>> > >>>> I don't say that we should not test more deeply how it behaves, we
>> > >>>> might even re-consider the parameters of hyperloglog as we do that. 
>> > >>>> But I
>> > >>>> don't think that having this library introduced would cause some kind 
>> > >>>> of a
>> > >>>> widespread / systemic risk.
>> > >>>>
>> > >>>> (1) https://datasketches.apache.org/docs/HLL/Hll_vs_CS_Hllpp.html
>> > >>>>
>> > >>>> On Thu, Jan 2, 2025 at 5:03 PM Benedict <bened...@apache.org> wrote:
>> > >>>>
>> > >>>>> I am strongly opposed to updating libraries simply for the sake of 
>> > >>>>> it.
>> > >>>>> Something like HLL does not need much ongoing maintenance if it 
>> > >>>>> works.
>> > >>>>> We’re simply asking for extra work and bugs by switching, and some 
>> > >>>>> risk
>> > >>>>> without understanding the quality control for the new library 
>> > >>>>> project’s
>> > >>>>> releases.
>> > >>>>>
>> > >>>>> That said, I was not very impressed with the clear spring library 
>> > >>>>> when
>> > >>>>> I looked at it, so I would be open to a stronger argument about data
>> > >>>>> sketches being superior otherwise in a way that matters to us.
>> > >>>>>
>> > >>>>> If we are to replace the library, we should at the very least do
>> > >>>>> proper due diligence by reviewing the new library’s implementation(s)
>> > >>>>> ourselves. We cannot simply assume the new library behaves well for 
>> > >>>>> our use
>> > >>>>> cases, or is well maintained.
>> > >>>>>
>> > >>>>> We should also not use the fallback intersection method, as this 
>> > >>>>> would
>> > >>>>> represent a regression to compaction on upgrade. We should really 
>> > >>>>> convert
>> > >>>>> from one HLL to another.
>> > >>>>>
>> > >>>>> The proposal to reduce allocations appears to be orthogonal to this
>> > >>>>> library, so let’s separate out that discussion? If there’s evidence 
>> > >>>>> this
>> > >>>>> library alone improves the memory profile let’s discuss that.
>> > >>>>>
>> > >>>>>
>> > >>>>> On 2 Jan 2025, at 15:26, Chris Lohfink <clohfin...@gmail.com> wrote:
>> > >>>>>
>> > >>>>> 
>> > >>>>> I think switching to datasketches is a good idea first off simply
>> > >>>>> because of the lack of maintenance and improvements from 
>> > >>>>> clearspring. I am
>> > >>>>> however, am not sold that it will actually improve anything 
>> > >>>>> significantly.
>> > >>>>> Caches might help on small cases, but those small cases probably are 
>> > >>>>> not
>> > >>>>> actually impacted. In the large cases the caches cost more in 
>> > >>>>> complexity,
>> > >>>>> memory, and ultimately wont matter when theres 50k sstables and the 
>> > >>>>> cache
>> > >>>>> holds 1k so everythings hitting disk anyway.
>> > >>>>>
>> > >>>>> The 5% is missing some relevant information like what the allocation
>> > >>>>> rate was, how many tables there are etc. On an idle system thats
>> > >>>>> meaningless, if there were 5gb/s allocations of reads/writes 
>> > >>>>> happening at
>> > >>>>> the time thats huge.
>> > >>>>>
>> > >>>>> On Thu, Jan 2, 2025 at 8:42 AM Štefan Miklošovič <
>> > >>>>> smikloso...@apache.org> wrote:
>> > >>>>>
>> > >>>>>> Interesting, thanks for this. Well ... 5% here, 5% there ... it
>> > >>>>>> compounds. I think it is worth trying to do something with this. 
>> > >>>>>> Would be
>> > >>>>>> great if you were part of this effort!
>> > >>>>>>
>> > >>>>>> On Thu, Jan 2, 2025 at 3:38 PM Dmitry Konstantinov <
>> > >>>>>> netud...@gmail.com> wrote:
>> > >>>>>>
>> > >>>>>>> I have seen this place in async profiler memory allocation profile
>> > >>>>>>> on one of production environments some time ago, it was visible 
>> > >>>>>>> but not
>> > >>>>>>> dramatic, about 5% of allocations:
>> > >>>>>>> <image.png>
>> > >>>>>>>
>> > >>>>>>> The amount of overhead also depends on a metric collection 
>> > >>>>>>> frequency
>> > >>>>>>> (in my case it was once per 60 seconds)
>> > >>>>>>>
>> > >>>>>>> Regards,
>> > >>>>>>> Dmitry
>> > >>>>>>>
>> > >>>>>>> On Thu, 2 Jan 2025 at 14:21, Štefan Miklošovič <
>> > >>>>>>> smikloso...@apache.org> wrote:
>> > >>>>>>>
>> > >>>>>>>> Indeed, I plan to measure it and compare, maybe some bench test
>> > >>>>>>>> would be cool to add ..
>> > >>>>>>>>
>> > >>>>>>>> I strongly suspect that the primary reason for the slowness (if it
>> > >>>>>>>> is verified to be true) is us going to the disk every time and 
>> > >>>>>>>> reading
>> > >>>>>>>> stats for every SSTable all over again.
>> > >>>>>>>>
>> > >>>>>>>> While datasketches say that it is way faster to update (1), we are
>> > >>>>>>>> living in a realm of nanoseconds here and I don't think that 
>> > >>>>>>>> itself would
>> > >>>>>>>> make any meaningful difference when merging one hyperloglog with 
>> > >>>>>>>> another as
>> > >>>>>>>> part of partition rows estimation computation. The only place we 
>> > >>>>>>>> are
>> > >>>>>>>> updating is SortableTableWriter#endParition which calls
>> > >>>>>>>> metadatacollector.addKey(key.getKey()) which eventually updates 
>> > >>>>>>>> the
>> > >>>>>>>> estimator via cardinality#offeredHashed.
>> > >>>>>>>>
>> > >>>>>>>> In other words, I think that going to the disk and reading it
>> > >>>>>>>> repeatedly is disproportionally more IO / time intensive than 
>> > >>>>>>>> switching the
>> > >>>>>>>> hyperloglog implementation.
>> > >>>>>>>>
>> > >>>>>>>> However, I consider the replacement of the library still 
>> > >>>>>>>> important.
>> > >>>>>>>> I feel uneasy about staying with an abandoned library where there 
>> > >>>>>>>> is
>> > >>>>>>>> clearly a well-maintained replacement.
>> > >>>>>>>>
>> > >>>>>>>> What we could do is to cache all cardinality estimators and just
>> > >>>>>>>> merge it all when asked upon metric resolution. That is different 
>> > >>>>>>>> from
>> > >>>>>>>> going to disk to deserialize StatsComponent's all over again.
>> > >>>>>>>>
>> > >>>>>>>> Then on SSTable removal, we would remove that from cache too. I
>> > >>>>>>>> think there is some kind of an observer when SSTable is removed 
>> > >>>>>>>> ...
>> > >>>>>>>>
>> > >>>>>>>> However, I am not sure I can just hold it all in the memory, it
>> > >>>>>>>> works for laptop testing but if we have thousands of SSTables with
>> > >>>>>>>> non-trivial number of rows things start to get interesting.
>> > >>>>>>>>
>> > >>>>>>>> (1) https://datasketches.apache.org/docs/HLL/Hll_vs_CS_Hllpp.html
>> > >>>>>>>> - section HllSketch vs. HyperLogLogPlus Update Speed Behavior
>> > >>>>>>>>
>> > >>>>>>>> On Thu, Jan 2, 2025 at 2:46 PM Jon Haddad 
>> > >>>>>>>> <j...@rustyrazorblade.com>
>> > >>>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Sounds interesting.  I took a look at the issue but I'm not 
>> > >>>>>>>>> seeing
>> > >>>>>>>>> any data to back up "expensive".  Can this be quantified a bit 
>> > >>>>>>>>> more?
>> > >>>>>>>>>
>> > >>>>>>>>> Anytime we have a performance related issue, there should be some
>> > >>>>>>>>> data to back it up, even if it seems obvious.
>> > >>>>>>>>>
>> > >>>>>>>>> Jon
>> > >>>>>>>>>
>> > >>>>>>>>> On Thu, Jan 2, 2025 at 8:20 AM Štefan Miklošovič <
>> > >>>>>>>>> smikloso...@apache.org> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> Hello,
>> > >>>>>>>>>>
>> > >>>>>>>>>> I just stumbled upon this library we are using for getting
>> > >>>>>>>>>> estimations of the number of partitions in a SSTable which are 
>> > >>>>>>>>>> used e.g. in
>> > >>>>>>>>>> EstimatedPartitionCount metric. (1)
>> > >>>>>>>>>>
>> > >>>>>>>>>> A user reported in (1) that it is an expensive operation. When
>> > >>>>>>>>>> one looks into what it is doing, it calls
>> > >>>>>>>>>> SSTableReader.getApproximateKeyCount() (6) which basically goes 
>> > >>>>>>>>>> to disk
>> > >>>>>>>>>> every single time, it loads all Stats components and it looks 
>> > >>>>>>>>>> into
>> > >>>>>>>>>> CompactionMetadata where the cardinality estimator is located.
>> > >>>>>>>>>>
>> > >>>>>>>>>> We are serializing the hyperloglog to disk as part of a SSTable
>> > >>>>>>>>>> and we deserialize it back in runtime for every SSTable in a CF 
>> > >>>>>>>>>> and we
>> > >>>>>>>>>> merge them all to one cardinality again.
>> > >>>>>>>>>>
>> > >>>>>>>>>> I do not think there is a way around this because of the nature
>> > >>>>>>>>>> of how a cardinality estimator works (hyperloglog). We can not 
>> > >>>>>>>>>> "cache it",
>> > >>>>>>>>>> it would work only in case we are adding SSTables only - hence 
>> > >>>>>>>>>> we would
>> > >>>>>>>>>> just merge again - but if we remove an SSTable as part of the 
>> > >>>>>>>>>> compaction,
>> > >>>>>>>>>> we can not "unmerge" it.
>> > >>>>>>>>>>
>> > >>>>>>>>>> That being said, we are currently using this library for
>> > >>>>>>>>>> hyperloglog (1) which was archived in summer 2020 and nothing 
>> > >>>>>>>>>> was
>> > >>>>>>>>>> contributed to that for 6 years. That lib is dead.
>> > >>>>>>>>>>
>> > >>>>>>>>>> There is very nice replacement of that (2) directly from Apache
>> > >>>>>>>>>> (!!!) and they are even giving the detailed and in-depth 
>> > >>>>>>>>>> comparison of
>> > >>>>>>>>>> hyperloglog implementation found in stream-lib we happen to use 
>> > >>>>>>>>>> (3)
>> > >>>>>>>>>> (stream-lib = Clearspring) where they are saying that updating 
>> > >>>>>>>>>> is way
>> > >>>>>>>>>> faster and it is also giving better estimations in general.
>> > >>>>>>>>>>
>> > >>>>>>>>>> I have implemented the usage of both cardinality estimators (4),
>> > >>>>>>>>>> (5). The reason we need to keep the old one around is that we 
>> > >>>>>>>>>> may have old
>> > >>>>>>>>>> SSTables around and we need to work with them too. That 
>> > >>>>>>>>>> translates to a new
>> > >>>>>>>>>> SSTable version (ob) which uses new implementation and for 
>> > >>>>>>>>>> versions < ob,
>> > >>>>>>>>>> it uses the old one. When SSTables are upgraded from oa to ob, 
>> > >>>>>>>>>> the old
>> > >>>>>>>>>> estimator will not be used anymore.
>> > >>>>>>>>>>
>> > >>>>>>>>>> There is also a case of a user not upgrading his oa SSTables,
>> > >>>>>>>>>> turning a node on and creating new SSTables with ob version. 
>> > >>>>>>>>>> When this
>> > >>>>>>>>>> happens and we ask what is the cardinality (e.g via nodetool 
>> > >>>>>>>>>> tablestats), I
>> > >>>>>>>>>> am checking if all SSTables are on the same version or not. If 
>> > >>>>>>>>>> they are,
>> > >>>>>>>>>> they will use either an old or new estimator. (we can not merge 
>> > >>>>>>>>>> estimations
>> > >>>>>>>>>> from two different hyperloglog implementations). If they are 
>> > >>>>>>>>>> not, it will
>> > >>>>>>>>>> compute that from index summaries. (The computation for index 
>> > >>>>>>>>>> summaries was
>> > >>>>>>>>>> already in place (6) as a fail-over in case the estimation 
>> > >>>>>>>>>> computation
>> > >>>>>>>>>> failed / was not present).
>> > >>>>>>>>>>
>> > >>>>>>>>>> Does this all make sense to drive further to the completion and
>> > >>>>>>>>>> eventually merge this work to trunk?
>> > >>>>>>>>>>
>> > >>>>>>>>>> Worth to add that Apache Datasketches are just two dependencies
>> > >>>>>>>>>> for us, it has zero external dependencies.
>> > >>>>>>>>>>
>> > >>>>>>>>>> (1) https://github.com/addthis/stream-lib
>> > >>>>>>>>>> (2) https://datasketches.apache.org/
>> > >>>>>>>>>> (3) 
>> > >>>>>>>>>> https://datasketches.apache.org/docs/HLL/Hll_vs_CS_Hllpp.html
>> > >>>>>>>>>> (4) https://issues.apache.org/jira/browse/CASSANDRA-13338
>> > >>>>>>>>>> (5) https://github.com/apache/cassandra/pull/3767
>> > >>>>>>>>>> (6)
>> > >>>>>>>>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L284-L338
>> > >>>>>>>>>>
>> > >>>>>>>>>> Regards
>> > >>>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>
>> > >>>>>>> --
>> > >>>>>>> Dmitry Konstantinov
>> > >>>>>>>
>> > >>>>>>
>> >
>> > --
>> > Dmitry Konstantinov
>> >
>
>
>
> --
> Dmitry Konstantinov

Reply via email to