Thanks for that. I figured out how to manage it in the Java lib. You need to use a WritableMemory to wrap the byte array and then explicitly instantiate an UpdateSketch with the WritableMemory. This is now working and I'm doing some prototyping. Ideally I could use this from the C++ library as well, but I will work with the Java lib for now while investigating.
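For anyone reading this later, the general shape of that round trip (serialize, restore into something updatable, keep updating) can be shown without the library. The following is a toy K-minimum-values counter in plain Java, not the DataSketches API: the class ToyKmvSketch, its methods, and the byte layout are all invented for illustration. With the real library the restore step would go through WritableMemory and UpdateSketch as described above; those calls are deliberately left out here because their exact signatures vary by version.

```java
import java.nio.ByteBuffer;
import java.util.TreeSet;

/** Toy K-minimum-values distinct counter. Hypothetical; NOT the DataSketches API. */
public class ToyKmvSketch {
    private final int k;                                   // max number of retained hashes
    private final TreeSet<Long> hashes = new TreeSet<>();  // the k smallest hashes seen so far

    public ToyKmvSketch(int k) {
        this.k = k;
    }

    /** SplitMix64-style mixer, shifted down to a non-negative 63-bit value. */
    private static long hash(long item) {
        long z = item + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return (z ^ (z >>> 31)) >>> 1;
    }

    public void update(long item) {
        hashes.add(hash(item));
        if (hashes.size() > k) {
            hashes.pollLast();                             // drop the largest, keep the k smallest
        }
    }

    public double estimate() {
        if (hashes.size() < k) {
            return hashes.size();                          // under capacity: plain count of retained hashes
        }
        double theta = hashes.last() / (double) Long.MAX_VALUE;
        return (k - 1) / theta;
    }

    /** Made-up layout: int k, int count, then the retained hashes in ascending order. */
    public byte[] toByteArray() {
        ByteBuffer buf = ByteBuffer.allocate(8 + 8 * hashes.size());
        buf.putInt(k).putInt(hashes.size());
        for (long h : hashes) {
            buf.putLong(h);
        }
        return buf.array();
    }

    /** Rebuilds an updatable sketch from bytes written by toByteArray(). */
    public static ToyKmvSketch fromByteArray(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        ToyKmvSketch sketch = new ToyKmvSketch(buf.getInt());
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            sketch.hashes.add(buf.getLong());
        }
        return sketch;
    }

    public static void main(String[] args) {
        ToyKmvSketch live = new ToyKmvSketch(1024);
        for (long i = 0; i < 50_000; i++) {
            live.update(i);
        }
        byte[] saved = live.toByteArray();                 // e.g. written to disk here

        ToyKmvSketch restored = fromByteArray(saved);      // later: reload and keep updating
        for (long i = 25_000; i < 100_000; i++) {
            restored.update(i);
        }
        System.out.printf("estimate after reload and further updates: %.0f%n", restored.estimate());
    }
}
```

Because the retained state is only the k smallest hashes seen so far, a sketch that was saved, reloaded, and updated ends up in the same state as one that was never serialized.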
I will spend some time seeing if I can simplify a series model to do what I want.

On Thu, Aug 26, 2021 at 12:07 AM Alexander Saydakov <sayda...@verizonmedia.com> wrote:

> I believe that Java code still has the functionality to serialize and
> deserialize updatable Theta sketches. You point to a "wrap" operation,
> which is one of two ways to deserialize: heapify (instantiate an object on
> heap from a given chunk of bytes, involves copying data) and wrap (directly
> operate on a given chunk of bytes, often off-heap).
>
> Perhaps you could explain your use case a little more? What would the life
> cycle of your sketches be? When would you serialize them? When deserialize?
> How many do you anticipate to keep overall? How many would you like to
> update? What is the reason for serializing? And so on.
>
> On Wed, Aug 25, 2021 at 2:26 PM Karl Matthias <k...@community.com> wrote:
>
>> Thank you, I will dig around the old source and see if I can find it.
>> AFAICT it was already removed from the Java implementation as well [1]. You
>> can serialize an UpdateSketch but when deserializing they are read-only.
>>
>> I do deeply understand time series data (I was on the team that designed
>> the second generation metrics pipeline at New Relic) but the problem I'm
>> trying to solve is not nicely modeled as a time series. Of course that is
>> possible, but doing it that way will require much more data and many more
>> calculations than I want at reporting time. The reported data will always
>> be for all time. So modeling as a time series will require an increasingly
>> large number of sketches, and possibly thus also a periodic
>> roll-up/compaction phase. None of which is necessary if I can simply update
>> the same sketch (really a set of them representing various dimensions) until
>> I rebuild it/them from the source events on a periodic basis. It is also
>> too much cardinality across too many dimensions to use the sketches simply
>> as a roll-up tool for distinct counting on the original data.
>>
>> I was hoping a private fork wasn't necessary to do it, but I can
>> understand that you folks intentionally chose not to support it. I will
>> have a go at it and see what I can make work.
>>
>> Thanks for the replies!
>>
>> [1]
>> https://github.com/apache/datasketches-java/blob/27ecce938555d731f29df97f12f4744a0efb663d/src/main/java/org/apache/datasketches/theta/Sketch.java#L139
>>
>> On Wed, Aug 25, 2021 at 9:46 PM Alexander Saydakov <sayda...@verizonmedia.com> wrote:
>>
>>> It is possible, and we used to have serialization and deserialization of
>>> updatable Theta sketches. At some point we decided that it is more
>>> confusing than useful and might encourage anti-patterns in big systems
>>> (such as deserialize-update-serialize sequences on every update). So we
>>> removed this functionality from the C++ code, but not from Java (yet).
>>> Again, I would suggest treating serialization as finalizing a sketch. If
>>> you want to update it, create a fresh one for this new time frame or
>>> whatever classifier makes sense (batch, session, transaction). Hopefully
>>> this new sketch can be kept for updating for a while (until some
>>> close-of-books for a period of time or until the whole batch is processed
>>> or something). Finalized sketches can be easily merged as needed. Say, you
>>> create a new sketch every minute and serialize the previous one. Later you
>>> can have your report to show the last 60-min rolling window or a calendar
>>> day or something like that by aggregating the appropriate set of sketches
>>> for that report.
>>>
>>> On Wed, Aug 25, 2021 at 1:20 PM Karl Matthias <k...@community.com> wrote:
>>>
>>>> Thanks for the reply. Yes I could do time series sketches, but what I
>>>> want actually is a summary representation of the current set, which I
>>>> update over time and eventually replace entirely. It's an evented system
>>>> and I want to use Theta sketches as a sort of summary. I can rebuild them
>>>> entirely at any time, but if maintained live they would be a fast
>>>> approximation that is combinable with other Theta sketches. Ideally I would
>>>> not have to keep them all in memory to do that and could serialize and
>>>> deserialize at will.
>>>>
>>>> It sounds like it's not currently implemented. But if I can manage the
>>>> code to do it, it is possible?
>>>>
>>>> On Wed, Aug 25, 2021 at 8:09 PM Alexander Saydakov <sayda...@verizonmedia.com> wrote:
>>>>
>>>>> Is there a good reason to necessarily update the same sketch you
>>>>> decided to serialize?
>>>>> I would suggest considering that sketch finalized. Perhaps, in your
>>>>> system these sketches would represent different time periods or different
>>>>> categories or something like that. Later on you may want to merge (union)
>>>>> some of them to obtain an estimate for a longer time frame or a total
>>>>> across categories and so on.
>>>>>
>>>>> On Wed, Aug 25, 2021 at 11:14 AM Karl Matthias <k...@community.com> wrote:
>>>>>
>>>>>> Hey folks,
>>>>>>
>>>>>> I am working with both the Java library and the C++ library and the
>>>>>> Theta sketch.
>>>>>>
>>>>>> What I would like to do is update a sketch, save it somewhere (e.g.
>>>>>> disk), then reload it later and possibly update it then. The
>>>>>> CompactSketch doesn't support updates; when an UpdateSketch is serialized
>>>>>> and loaded, it is read-only.
>>>>>>
>>>>>> From looking at the Java code it seems like it would be possible to
>>>>>> create an UpdateSketch from the contents of a CompactSketch but there
>>>>>> doesn't appear to be an existing method that does this. Am I missing
>>>>>> something that already does this? Or is it not possible?
>>>>>>
>>>>>> Many thanks
>>>>>> Karl