GitHub user mapleFU added a comment to the discussion: TimeSeries Proposal

For (4), you can compute it dynamically. In fact, for (1), I think the timestamp 
in the root metadata can be eliminated if we care about performance.
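The "compute it dynamically" idea can be sketched as below. The function names, the `window_ms` parameter, and the window arithmetic are illustrative assumptions, not Kvrocks' actual encoding:

```python
# Sketch: with fixed time-window chunks, the chunk covering a timestamp can be
# derived arithmetically, so it need not be stored in the root metadata.

def chunk_id(timestamp_ms: int, window_ms: int) -> int:
    """Return the id of the fixed-window chunk covering timestamp_ms."""
    return timestamp_ms // window_ms

def chunk_bounds(cid: int, window_ms: int) -> tuple[int, int]:
    """Start (inclusive) and end (exclusive) of a chunk's time range."""
    return cid * window_ms, (cid + 1) * window_ms

# Example: a 1-hour window (3,600,000 ms)
cid = chunk_id(7_500_000, 3_600_000)
print(cid, chunk_bounds(cid, 3_600_000))  # -> 2 (7200000, 10800000)
```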

> We could allow users to specify chunk_size during time series creation to 
> avoid oversized chunks? A key advantage of fixed time-window chunks is that 
> given a timestamp, the corresponding chunk can be quickly located through 
> simple calculations. Could you elaborate further on the "internal merging 
> rules" and "dynamic chunking"?

The problem is that, assuming many tags and timelines on a running server, there 
will sometimes be burst writes, making a chunk extremely large, and sometimes 
very slow writes, producing very small or even empty chunks. It's hard for users 
to tune this in most cases. If `chunk_size` is in bytes, it guarantees the 
system won't have "extremely large" or "extremely small" chunks. If we switch 
chunks by time first, we might need to merge small chunks into larger ones; 
otherwise the system can get slower and slower.
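The byte-based switching described above can be sketched as follows. The `Chunk`/`Series` classes and `chunk_size_bytes` parameter are hypothetical illustrations, not the proposal's actual data structures:

```python
# Sketch: seal chunks by accumulated byte size rather than by time window,
# so burst writes cannot create one oversized chunk and slow writers cannot
# leave a long tail of tiny chunks that later needs a merge pass.

class Chunk:
    def __init__(self):
        self.samples = []      # (timestamp, value) pairs
        self.size_bytes = 0

class Series:
    def __init__(self, chunk_size_bytes: int):
        self.chunk_size_bytes = chunk_size_bytes
        self.chunks = [Chunk()]

    def append(self, ts: int, value: bytes):
        cur = self.chunks[-1]
        # Seal the current chunk once it would exceed the byte budget.
        if cur.size_bytes + len(value) > self.chunk_size_bytes and cur.samples:
            cur = Chunk()
            self.chunks.append(cur)
        cur.samples.append((ts, value))
        cur.size_bytes += len(value)

s = Series(chunk_size_bytes=64)
for ts in range(100):
    s.append(ts, b"x" * 8)    # burst of 100 samples, 8 bytes each
print(len(s.chunks))          # -> 13 (800 bytes / 64-byte chunks)
```

Note that every chunk stays within the byte budget regardless of write rate, which is the guarantee a pure time-window switch cannot give.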

> The Compressed Chunk Type might be optional, as the performance benefits of 
> compression are still unclear

Generally this requires something like "byte-level" handling and rules out 
existing SIMD decoding, but I haven't tested that. Personally, a compressed 
chunk could be regarded as "sealed" (meaning no or few further writes), so it 
should be optimized for size and aggregation. A two-stride layout is good for 
scans (RANGE) but not for point gets (GET).
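A minimal sketch of the scan-versus-point-get trade-off in a "two stride" (columnar) layout, assuming the two strides are parallel sorted arrays of timestamps and values (names and layout here are illustrative, not the proposal's format):

```python
# Sketch: a two-stride chunk stores timestamps and values as parallel arrays.
# A RANGE query pays one binary search and then walks sequentially (cache
# friendly); a GET still pays a full binary search for a single sample.
import bisect

timestamps = [10, 20, 30, 40, 50]        # sorted timestamp stride
values     = [1.0, 2.0, 3.0, 4.0, 5.0]   # matching value stride

def range_scan(lo: int, hi: int):
    """RANGE: locate the window once, then read contiguously."""
    i = bisect.bisect_left(timestamps, lo)
    j = bisect.bisect_right(timestamps, hi)
    return list(zip(timestamps[i:j], values[i:j]))

def point_get(ts: int):
    """GET: a binary search per lookup, with no O(1) path to one sample."""
    i = bisect.bisect_left(timestamps, ts)
    if i < len(timestamps) and timestamps[i] == ts:
        return values[i]
    return None

print(range_scan(20, 40))  # -> [(20, 2.0), (30, 3.0), (40, 4.0)]
print(point_get(30))       # -> 3.0
```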

GitHub link: 
https://github.com/apache/kvrocks/discussions/3044#discussioncomment-13726774
