GitHub user yezhizi edited a comment on the discussion: TimeSeries Proposal
Thank you very much for taking the time to provide these suggestions!

1. The key format will be revised (I haven’t had time to update the proposal yet, my apologies). The current `key|version` will change to something like `key|version|LABEL_IDX|label_key`.

2. Thank you for your attention to detail! In fact, our proposal aligns closely with your approach, although the proposal did not state this clearly. `lastTimestamp` can easily be updated on every data write, but `firstTimestamp` is not updated on every write, because:

   > Note: TS.INFO needs to display the oldest unexpired firstTimestamp. This
   > field in datameta isn’t updated during data addition/expiration but is
   > instead retrieved when TS.INFO is explicitly called.

   Thus, when a user explicitly calls `TS.INFO`, we start iterating from the expiration boundary time, locate the specific chunk via the `Chunk datameta` key (which contains the chunk’s `first` and `end` timestamps), and then fetch the exact timestamp from Data storage.

3. Could we let users specify `chunk_size` when creating a time series, to avoid oversized chunks? A key advantage of fixed time-window chunks is that, given a timestamp, the corresponding chunk can be located quickly through simple calculations. Could you elaborate further on the "internal merging rules" and "dynamic chunking"?

4. The `memoryUsage` field is displayed in Redis’s [`TS.INFO`](https://redis.io/docs/latest/commands/ts.info/), but tracking it in Kvrocks seems impractical. Should we retain this field solely for Redis compatibility? I'm not sure. cc @PragmaTwice @git-hulk

   Also:

   > Some fields may not align with our design:
   > - `chunkSize`: In `Redis`, this is a fixed byte size specified at creation.
   > - `memoryUsage`: This metric might be challenging to calculate and maintain
   > accurately. As an alternative, we could potentially use `diskUsage` instead,
   > with support from the `chunk datameta` structure?

5. The Compressed Chunk Type might be optional, as the performance benefits of compression are still unclear. The format `ts1 value1 ts2 value2` is adopted because each time-series data point consists of two fields, `(timestamp, value)`, that need to be written or queried together, so storing them adjacently is logically sound. In comparison, using separate sequences such as `ts1 ts2...` for timestamps and `value1 value2...` for values seems to offer minimal improvement during data append operations?

GitHub link: https://github.com/apache/kvrocks/discussions/3044#discussioncomment-13722681
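
To illustrate the "simple calculations" mentioned in point 3, here is a minimal sketch of how a fixed time-window layout could map a timestamp to its chunk. It is not taken from the proposal: the function names, the use of millisecond timestamps, and the `window_ms` parameter (a time width, which may or may not be what `chunk_size` ends up meaning) are all assumptions made for illustration.

```python
from typing import Tuple


def chunk_id(timestamp_ms: int, window_ms: int) -> int:
    """Index of the fixed-width window containing timestamp_ms (hypothetical helper)."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    # Integer division: every timestamp in [k * window_ms, (k + 1) * window_ms)
    # maps to the same index k.
    return timestamp_ms // window_ms


def chunk_bounds(timestamp_ms: int, window_ms: int) -> Tuple[int, int]:
    """Half-open [start, end) time range of the window containing timestamp_ms."""
    start = chunk_id(timestamp_ms, window_ms) * window_ms
    return start, start + window_ms


def first_live_chunk(now_ms: int, retention_ms: int, window_ms: int) -> int:
    """Index of the window holding the expiration boundary (assumed: now - retention).

    A TS.INFO-style scan for the oldest unexpired sample could start here
    instead of scanning from the beginning of the series.
    """
    boundary_ms = max(now_ms - retention_ms, 0)
    return chunk_id(boundary_ms, window_ms)
```

With a 60-second window, `chunk_id(ts, 60_000)` needs no index lookup at all, which is the advantage point 3 refers to. The trade-off is that a window's byte size then depends on the write rate, which is the oversized-chunk concern raised in the discussion.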
