Comments inline.

On Thu, Feb 2, 2017 at 6:28 PM, Jason Gustafson <ja...@confluent.io> wrote:

> Took me a while to remember why we didn't do this. The timestamp that is
> included at the message set level is the max timestamp of all messages in
> the message set as is the case in the current message format (I will update
> the document to make this explicit). We could make the message timestamps
> relative to the max timestamp, but that makes serialization a bit awkward
> since the timestamps are not assumed to be increasing sequentially or
> monotonically. Once the messages in the message set had been determined, we
> would need to go back and adjust the relative timestamps.
>
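To make the serialization concern concrete, here is a minimal Python sketch of encoding timestamps relative to the message set's max timestamp. The function names are hypothetical and this is not Kafka's serializer. Because the max is only known once the set is complete, appending a message with a later timestamp shifts every offset computed so far. That is the "go back and adjust" step described above.

```python
def encode_relative_to_max(timestamps: list[int]) -> tuple[int, list[int]]:
    """Hypothetical sketch: store each timestamp as an offset from the set's max.

    Every offset depends on the max, so nothing can be finalized until all
    messages in the set are known.
    """
    max_ts = max(timestamps)
    return max_ts, [ts - max_ts for ts in timestamps]


def decode_relative_to_max(max_ts: int, offsets: list[int]) -> list[int]:
    """Recover absolute timestamps by adding each offset back to the max."""
    return [max_ts + off for off in offsets]


if __name__ == "__main__":
    print(encode_relative_to_max([500, 501, 499]))       # (501, [-1, 0, -2])
    # Appending a later message changes the max and therefore every offset.
    print(encode_relative_to_max([500, 501, 499, 505]))  # (505, [-5, -4, -6, 0])
```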

Yes, I thought this would be a bit tricky, which is why I mentioned the
option of adding a new field at the message set level for the first
timestamp, even though that's not ideal either.

> Here's one idea. We let the timestamps in the messages be varints, but we
> make their values be relative to the timestamp of the previous message,
> with the timestamp of the first message being absolute. For example, if we
> had timestamps 500, 501, 499, then we would write 500 for the first
> message, 1 for the next, and -2 for the final message. Would that work? Let
> me think a bit about it and see if there are any problems.
>
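A minimal Python sketch of the previous-message delta scheme, assuming the deltas are written as zigzag-encoded unsigned varints (the protobuf `sint64` style, so that small negative deltas such as -2 still fit in one byte). The function names are hypothetical and this is not Kafka's actual serializer. Seeding `prev` with 0 is what makes the first value absolute.

```python
def zigzag_encode(n: int) -> int:
    # Interleave signs so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3.
    return 2 * n if n >= 0 else -2 * n - 1


def zigzag_decode(z: int) -> int:
    # Inverse of zigzag_encode: even values are non-negative, odd are negative.
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def write_uvarint(value: int, out: bytearray) -> None:
    # 7 payload bits per byte, high bit set on every byte except the last.
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    # Returns (value, position just past the varint).
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7


def to_deltas(timestamps: list[int]) -> list[int]:
    # First entry is relative to 0 (i.e. absolute), the rest to the previous message.
    prev, deltas = 0, []
    for ts in timestamps:
        deltas.append(ts - prev)
        prev = ts
    return deltas


def encode_timestamps(timestamps: list[int]) -> bytes:
    out = bytearray()
    for delta in to_deltas(timestamps):
        write_uvarint(zigzag_encode(delta), out)
    return bytes(out)


def decode_timestamps(buf: bytes) -> list[int]:
    # Running sum of the decoded deltas restores the absolute timestamps.
    timestamps, prev, pos = [], 0, 0
    while pos < len(buf):
        z, pos = read_uvarint(buf, pos)
        prev += zigzag_decode(z)
        timestamps.append(prev)
    return timestamps


if __name__ == "__main__":
    print(to_deltas([500, 501, 499]))          # [500, 1, -2]
    encoded = encode_timestamps([500, 501, 499])
    print(encoded.hex())                       # e8070203
    print(decode_timestamps(encoded))          # [500, 501, 499]
```

The deltas 1 and -2 each take a single byte; only the absolute first value needs more than one.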

It's an interesting idea. Compared to the option of having the first
timestamp in the message set, it's a little more space-efficient as we
don't have both a full timestamp in the message set _and_ a varint in the
first message (which would always be 0, so we avoid the extra byte) and
also the deltas could be a little smaller in the common case. The main
downside is that it introduces a semantic inconsistency between the first
message and the rest. Not ideal, but maybe we can live with that.
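The space trade-off can be checked with a rough Python sketch. It assumes (this is not stated in the thread) that the alternative adds a fixed 8-byte int64 first-timestamp field to the message set, and that both schemes write per-message values as zigzag varints. The helper names are hypothetical, and the example timestamps are illustrative epoch milliseconds.

```python
FIRST_TIMESTAMP_FIELD_BYTES = 8  # assumed fixed-width int64 field in the message set


def zigzag(n: int) -> int:
    # Map signed to unsigned so small negatives stay small.
    return 2 * n if n >= 0 else -2 * n - 1


def uvarint_len(value: int) -> int:
    # Number of 7-bit groups needed to write `value` as an unsigned varint.
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def size_relative_to_first(timestamps: list[int]) -> int:
    # Option 1: full first timestamp in the set header, every message relative to it
    # (so the first message always carries a one-byte 0).
    first = timestamps[0]
    return FIRST_TIMESTAMP_FIELD_BYTES + sum(
        uvarint_len(zigzag(ts - first)) for ts in timestamps
    )


def size_relative_to_previous(timestamps: list[int]) -> int:
    # Option 2: first message absolute, the rest relative to the previous message.
    prev, total = 0, 0
    for ts in timestamps:
        total += uvarint_len(zigzag(ts - prev))
        prev = ts
    return total


if __name__ == "__main__":
    ts = [1486060080000, 1486060080001, 1486060079999]
    print(size_relative_to_first(ts), size_relative_to_previous(ts))  # 11 8
```

Under these assumptions the saving is three bytes per message set: the 8-byte field becomes a 6-byte varint, and the first message's one-byte 0 disappears.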

Ismael
