GitHub user yezhizi edited a comment on the discussion: TimeSeries Proposal

Thank you very much for taking the time to provide these suggestions!
1. The key format will be revised (I haven’t had time to update the proposal 
yet, my apologies). The current `key|version` will be modified to something 
like: `key|version|LABEL_IDX|label_key`.

2. Thank you for your attention to detail! In fact, our proposal aligns closely 
with your approach (though it wasn’t clearly stated in the proposal). While 
`lastTimestamp` can be easily updated during data writes, `firstTimestamp` is 
not updated with every write. This is because:

> Note: TS.INFO needs to display the oldest unexpired firstTimestamp. This 
> field in datameta isn’t updated during data addition/expiration but is 
> instead retrieved when TS.INFO is explicitly called.

Thus, when users explicitly call `TS.INFO`, we start iterating from the 
expiration boundary time, locate the specific chunk via the `Chunk datameta` 
key (which contains the chunk’s `first` and `end` timestamps), and then fetch 
the exact timestamp from Data storage.
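
As a rough illustration of that lookup, here is a minimal Python sketch. It is not Kvrocks code: `chunk_meta`, `chunk_samples`, and `first_unexpired_timestamp` are hypothetical names, plain in-memory containers stand in for the `Chunk datameta` keys and Data storage, and it assumes a sample counts as unexpired when its timestamp is greater than or equal to the boundary.

```python
from bisect import bisect_left


def first_unexpired_timestamp(chunk_meta, chunk_samples, boundary):
    """Return the oldest timestamp >= boundary, or None if nothing is left.

    chunk_meta:    list of (first_ts, end_ts) tuples sorted by first_ts
                   (stand-in for the chunk metadata keys).
    chunk_samples: dict mapping first_ts -> sorted list of sample timestamps
                   (stand-in for the data storage).
    boundary:      expiration boundary, e.g. lastTimestamp - retention.
    """
    # Seek to the first chunk whose end is not older than the boundary.
    ends = [end_ts for _, end_ts in chunk_meta]
    start = bisect_left(ends, boundary)

    for first_ts, _ in chunk_meta[start:]:
        timestamps = chunk_samples[first_ts]
        # Find the first sample inside the chunk that has not expired.
        idx = bisect_left(timestamps, boundary)
        if idx < len(timestamps):
            return timestamps[idx]
    return None


meta = [(0, 90), (100, 190), (200, 250)]
samples = {0: [0, 50, 90], 100: [100, 150, 190], 200: [200, 250]}
print(first_unexpired_timestamp(meta, samples, 120))
```

The cost is paid only when `TS.INFO` is called, which is the trade-off described above: writes and expiration never have to touch `firstTimestamp`.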

3. Could we allow users to specify `chunk_size` during time series creation to 
avoid oversized chunks? A key advantage of fixed time-window chunks is that 
given a timestamp, the corresponding chunk can be quickly located through 
simple calculations. Could you elaborate further on the "internal merging 
rules" and "dynamic chunking"?
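
To make the "simple calculations" concrete, here is a small Python sketch. It assumes `chunk_size` is a time span in the same unit as the timestamps (for example milliseconds) and that each chunk is identified by the start of its window; the function names and the `chunk_span` parameter are illustrative only.

```python
def chunk_start(timestamp, chunk_span):
    """Start of the fixed time window that contains `timestamp`."""
    return timestamp - timestamp % chunk_span


def chunks_for_range(start, end, chunk_span):
    """Window starts of every chunk overlapping the inclusive range [start, end]."""
    return list(range(chunk_start(start, chunk_span), end + 1, chunk_span))


# With one-second windows (1000 ms), timestamp 2500 falls in the chunk starting at 2000.
print(chunk_start(2500, 1000))
print(chunks_for_range(2500, 4100, 1000))
```

No index lookup is needed to find the chunk for a write or the set of chunks for a range query, which is the property that dynamic chunking would have to give up or replace with a metadata scan.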

4. The `memoryUsage` field is displayed in Redis’s 
[`TS.INFO`](https://redis.io/docs/latest/commands/ts.info/), but it seems 
impractical to track this in Kvrocks. Should we retain this field solely for 
Redis compatibility? I have no idea. cc @PragmaTwice @git-hulk
Also:
> Some fields may not align with our design:
> - `chunkSize`: In `Redis`, this is a fixed byte size specified at creation.
> - `memoryUsage`: This metric might be challenging to calculate and maintain 
> accurately. As an alternative, we could potentially use `diskUsage` instead, 
> with support from the `chunk datameta` structure?

5. The Compressed Chunk Type might be optional, as the performance benefits of 
compression are still unclear. The format `ts1 value1 ts2 value2` is adopted 
because each time-series data point consists of two fields `(timestamp, value)` 
that need to be written or queried together. Therefore, storing them adjacently 
is logically sound. In comparison, using separate sequences like `ts1 ts2...` 
for timestamps and `value1 value2...` for values seems to offer minimal 
improvement during data append operations?
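
The difference between the two layouts on append can be sketched in Python as follows. This is only an illustration, not the proposal's actual encoding: it assumes fixed-width samples (a little-endian int64 timestamp and a float64 value) and a single buffer per chunk, and all function names are hypothetical.

```python
import struct

SAMPLE = struct.Struct("<qd")  # int64 timestamp + float64 value = 16 bytes


def append_interleaved(chunk, ts, value):
    """Layout `ts1 value1 ts2 value2`: an append only touches the tail."""
    chunk.extend(SAMPLE.pack(ts, value))


def read_interleaved(chunk):
    return [SAMPLE.unpack_from(chunk, off) for off in range(0, len(chunk), SAMPLE.size)]


def append_columnar(chunk, ts, value):
    """Layout `ts1 ts2 ... value1 value2 ...` in one buffer:
    the new timestamp has to be inserted in the middle, shifting all values."""
    n = len(chunk) // SAMPLE.size          # samples already stored
    split = n * 8                          # end of the timestamp block
    chunk[split:split] = struct.pack("<q", ts)
    chunk.extend(struct.pack("<d", value))


def read_columnar(chunk):
    n = len(chunk) // SAMPLE.size
    timestamps = struct.unpack_from(f"<{n}q", chunk, 0)
    values = struct.unpack_from(f"<{n}d", chunk, n * 8)
    return list(zip(timestamps, values))


row, col = bytearray(), bytearray()
for ts, value in [(1000, 1.5), (2000, 2.5)]:
    append_interleaved(row, ts, value)
    append_columnar(col, ts, value)
print(read_interleaved(row) == read_columnar(col))
```

Both layouts hold the same data in the same number of bytes when uncompressed; the columnar form mainly pays off when each column is compressed separately, which is why it matters only if the Compressed Chunk Type is kept.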

GitHub link: 
https://github.com/apache/kvrocks/discussions/3044#discussioncomment-13722681
