I'm getting to this a little late, but as for the missing timestamp semantics, it's a +1 from me for using Long.MIN_VALUE for missing timestamps for the reasons outlined by Matthias previously.
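For what it's worth, a minimal sketch of what the consumer-side check could look like if we switch the sentinel (the NO_TIMESTAMP constant below is only illustrative, not the current API, which still uses -1):

import org.apache.kafka.clients.consumer.ConsumerRecord;

public class TimestampCheck {

    // Hypothetical constant mirroring the KIP-228 proposal; not part of today's client API.
    public static final long NO_TIMESTAMP = Long.MIN_VALUE;

    public static boolean hasTimestamp(ConsumerRecord<?, ?> record) {
        // Anything other than the sentinel is a real timestamp,
        // including legitimate negative (pre-1970) values.
        return record.timestamp() != NO_TIMESTAMP;
    }
}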
Thanks,
Bill

On Wed, Dec 6, 2017 at 2:05 AM, Dong Lin <lindon...@gmail.com> wrote:

> Sounds good. I don't think there is any concern with using Long.MIN_VALUE to indicate that the timestamp is not available.
>
> As Matthias also mentioned, using Long.MIN_VALUE to indicate a missing timestamp seems better than overloading the -1 semantics. Do you want to update the "NO_TIMESTAMP (-1) problem" section in the KIP? It may also be useful to briefly mention the alternative solution we discussed (I realized that Ted also mentioned this alternative).
>
> Thanks,
> Dong
>
> On Tue, Dec 5, 2017 at 8:26 PM, Boerge Svingen <bsvin...@borkdal.com> wrote:
>
> > Thank you for the suggestion. We considered this before. It works, but it’s a hack, and we would be providing a bad user experience for our consumers if we had to explain, “if you want to start consuming in 2014, you have to pretend to want 2214”.
> >
> > We would rather solve the underlying problem. These are perfectly valid timestamps, and I can’t see any reason why Kafka shouldn’t support them - I don’t think using `Long.MIN_VALUE` instead of -1 would necessarily add complexity here?
> >
> > Thanks,
> > Boerge.
> >
> > > On 2017-12-05, at 21:36, Dong Lin <lindon...@gmail.com> wrote:
> > >
> > > Hey Boerge,
> > >
> > > Thanks for the blog link. I will read this blog later.
> > >
> > > Here is another alternative solution which may be worth thinking about. We know that Unix time 0 corresponds to January 1, 1970. Let's say the earliest time you may want to use as the timestamp of a Kafka message is within X milliseconds before January 1, 1970. Then you can add X to the timestamp before you produce the Kafka message, and you can make the same conversion when you use `offsetsForTimes()` or after you consume messages. This seems to address your use-case without introducing negative timestamps.
> > >
> > > IMO, this solution requires a bit more logic in your application code, but it keeps the Kafka timestamp logic simple and we reserve the capability to use timestamp -1 for messages without a timestamp for most Kafka users who do not need negative timestamps. Do you think this would be a good alternative solution?
> > >
> > > Thanks,
> > > Dong
> > >
> > > On Tue, Dec 5, 2017 at 5:39 PM, Boerge Svingen <bsvin...@borkdal.com> wrote:
> > >
> > >> Yes. To provide a little more detail, we are using Kafka to store everything ever published by The New York Times, and to make this content available to a range of systems and applications. Assets are published to Kafka chronologically, so that consumers can seek to any point in time and start consuming from there, like Konstantin is describing, all the way back to our beginning in 1851.
> > >>
> > >> https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/ has more information on the use case.
> > >>
> > >> Thanks,
> > >> Boerge.
> > >>
> > >> --
> > >>
> > >> Boerge Svingen
> > >> Director of Engineering
> > >> The New York Times
> > >>
> > >>> On 2017-12-05, at 19:35, Dong Lin <lindon...@gmail.com> wrote:
> > >>>
> > >>> Hey Konstantin,
> > >>>
> > >>> According to KIP-32 the timestamp is also used for log rolling and log retention.
> > >>> Therefore, unless the broker is configured to never delete any message based on time, messages produced with negative timestamps in your use-case will be deleted by the broker anyway. Do you actually plan to use Kafka as a persistent storage system that never deletes messages?
> > >>>
> > >>> Thanks,
> > >>> Dong
> > >>>
> > >>> On Tue, Dec 5, 2017 at 1:24 PM, Konstantin Chukhlomin <chuhlo...@gmail.com> wrote:
> > >>>
> > >>>> Hi Dong,
> > >>>>
> > >>>> Currently we are storing the historical timestamp in the message.
> > >>>>
> > >>>> What we are trying to achieve is to make it possible to do Kafka lookup by timestamp. Ideally I would do `offsetsForTimes` to find articles published in the 1910s (if we are storing articles on the log).
> > >>>>
> > >>>> So the first two suggestions aren't really covering our use-case.
> > >>>>
> > >>>> We could create a new timestamp type like "HistoricalTimestamp" or "MaybeNegativeTimestamp", and the only difference between this one and CreateTime would be that it could be negative. I tend to use CreateTime for this purpose because it's easier to understand from a user perspective as a timestamp which the publisher can set.
> > >>>>
> > >>>> Thanks,
> > >>>> Konstantin
> > >>>>
> > >>>>> On Dec 5, 2017, at 3:47 PM, Dong Lin <lindon...@gmail.com> wrote:
> > >>>>>
> > >>>>> Hey Konstantin,
> > >>>>>
> > >>>>> Thanks for the KIP. I have a few questions below.
> > >>>>>
> > >>>>> Strictly speaking, Kafka actually allows you to store historical data, and users are free to encode an arbitrary timestamp field in their Kafka messages. For example, your Kafka message can currently have a Json or Avro format and you can put a timestamp field there. Do you think that could address your use-case?
> > >>>>>
> > >>>>> Alternatively, KIP-82 introduced Record Headers in Kafka and you can also define your own customized key/value pair in the header. Do you think this can address your use-case?
> > >>>>>
> > >>>>> Also, currently there are two types of timestamp according to KIP-32. If the type is LogAppendTime then the timestamp value is the time when the broker receives the message. If the type is CreateTime then the timestamp value is determined when the producer produces the message. With these two definitions, the timestamp should always be positive. We probably need a new type here if we cannot put the timestamp in the Record Header or the message payload. Does this sound reasonable?
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Dong
> > >>>>>
> > >>>>> On Tue, Dec 5, 2017 at 8:40 AM, Konstantin Chukhlomin <chuhlo...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> I have created a KIP to support negative timestamps:
> > >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-228+Negative+record+timestamp+support
> > >>>>>>
> > >>>>>> Here are the proposed changes:
> > >>>>>> https://github.com/apache/kafka/compare/trunk...chuhlomin:trunk
> > >>>>>>
> > >>>>>> I'm pretty sure that not all cases are covered, so comments and suggestions are welcome.
> > >>>>>>
> > >>>>>> Thank you,
> > >>>>>> Konstantin
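For anyone skimming the thread: the epoch-shift workaround Dong describes above (add a fixed offset X to the timestamp before producing, apply the same offset when calling offsetsForTimes(), and subtract it after consuming) would look roughly like the sketch below. The 200-year offset and the helper names are arbitrary illustrations, not a recommendation:

import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class EpochShift {

    // X: how far before January 1, 1970 the application needs to go.
    // Roughly 200 years in milliseconds, chosen arbitrarily for illustration.
    static final long SHIFT_MS = 200L * 365 * 24 * 60 * 60 * 1000;

    // Producer side: store timestamp + SHIFT_MS so the broker never sees a negative value.
    static <K, V> ProducerRecord<K, V> shiftedRecord(String topic, K key, V value, long realTimestampMs) {
        return new ProducerRecord<>(topic, null, realTimestampMs + SHIFT_MS, key, value);
    }

    // Lookup side: apply the same shift before calling offsetsForTimes().
    static OffsetAndTimestamp lookup(KafkaConsumer<?, ?> consumer, TopicPartition tp, long realTimestampMs) {
        Map<TopicPartition, OffsetAndTimestamp> result =
                consumer.offsetsForTimes(Collections.singletonMap(tp, realTimestampMs + SHIFT_MS));
        return result.get(tp);
    }

    // Consumer side: undo the shift to recover the original (possibly pre-1970) time.
    static long realTimestampMs(long storedTimestampMs) {
        return storedTimestampMs - SHIFT_MS;
    }
}

As Boerge says, this works, but it pushes the translation into every producer and consumer, which is exactly the user-experience problem the KIP is trying to avoid.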
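And for completeness, the KIP-82 record-header alternative Dong mentions might look roughly like this (the header name and the fixed-width encoding are arbitrary choices, and the reader assumes the header is always present). As Konstantin points out, this keeps the historical value attached to the message but does not make it usable with offsetsForTimes():

import java.nio.ByteBuffer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;

public class HistoricalTimestampHeader {

    // Application-defined header key; the name is arbitrary.
    static final String HEADER_KEY = "publication-timestamp";

    // The record keeps a normal (non-negative) CreateTime; the possibly negative
    // historical timestamp rides along as a header value.
    static ProducerRecord<String, String> withHistoricalTimestamp(String topic, String key, String value, long historicalMs) {
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, key, value);
        record.headers().add(HEADER_KEY, ByteBuffer.allocate(Long.BYTES).putLong(historicalMs).array());
        return record;
    }

    // Reader side; assumes the header was set by the producer.
    static long historicalTimestamp(ConsumerRecord<?, ?> record) {
        return ByteBuffer.wrap(record.headers().lastHeader(HEADER_KEY).value()).getLong();
    }
}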