Hey Jun,

I am thinking about the interface. It seems that returning a timestamp for
earliestOffsets() and latestOffsets() is not that useful because the
timestamp of the first message is not necessarily the earliest timestamp,
and the timestamp of the last offset might be -1. Also exposing the
timestamp of high watermark seems unwanted. It seems better to return the
offsets without timestamps.

There are probably use cases that people want to get the timestamp of the
last message, but that requires users to consume the last message even if
we return the timestamp in the latestOffsets() because the returned
timestamp could be -1. So we can probably just make the interface only
return the offsets.

I'll make this change and start a voting thread.

Thanks for all the feedback.

Jiangjie (Becket) Qin

On Fri, Sep 9, 2016 at 10:55 AM, Becket Qin <becket....@gmail.com> wrote:

> Completely agree that we should have a consistent representation of
> something missing/unknown in the code.
>
> My understanding of the convention is that -1 means "not
> available"/"unknown". For example, when producer request failed, the offset
> we used in the callback is -1. And Message.NoTimestamp is also -1. Using -1
> instead of null has at least two benefits.
> 1) it works for primitive type as well as classes.
> 2) it is easy to send via wire protocols
>
> For the other use cases, Option/Null could be better.
>
> Thanks,
> Jiangjie (Becket) Qin
>
> On Fri, Sep 9, 2016 at 8:06 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
>> Our inconsistent usage of magic values like `-1` and `null` to represent
>> missing values makes it hard to reason about code. We had a recent major
>> regression in MirrorMaker as a result of this[1], for example. `null` by
>> itself is hard enough (often called the billion dollar mistake), so can we
>> decide what we prefer and stick with that going forward?
>>
>> My take is: `Option`/`Optional` > `null` > `-1` unless it's a particularly
>> performance sensitive area where the boxing incurred by having `null` as a
>> possible value is a problem. Since we're still on Java 7, we can't use
>> `Optional` in Java code.
>>
>> Ismael
>>
>> [1] The fix:
>> https://github.com/apache/kafka/commit/4e4e2fb5085758ee9ccf6
>> 307433ad531a33198d3
>>
>> On Fri, Sep 9, 2016 at 3:48 PM, Jun Rao <j...@confluent.io> wrote:
>>
>> > Jiangjie,
>> >
>> > Returning TimestampOffset(-1, -1) sounds reasonable to me.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Thu, Sep 8, 2016 at 8:46 PM, Becket Qin <becket....@gmail.com>
>> wrote:
>> >
>> > > Hi Jun,
>> > >
>> > > > 1. latestOffsets() returns the next offset. So there won't be a
>> > timestamp
>> > > > associated with it. Would we use something like -1 for timestamp?
>> > >
>> > > The returned value would high watermark, so there might be an
>> associated
>> > > timestamp. But if it is log end offset, it seems that -1 is a
>> reasonable
>> > > value.
>> > >
>> > > > 2. Jason mentioned that if no message has timestamp >= the provided
>> > > > timestamp, we return a null value for that partition. Could we
>> document
>> > > > that in the wiki?
>> > >
>> > > I made a minor change to return a TimestampOffset(-1, -1) in that
>> case.
>> > Not
>> > > sure which on is better but there seems only minor difference. What do
>> > you
>> > > think?
>> > >
>> > > I haven't seen a planned release date yet, but I can probably get it
>> done
>> > > in 2-3 weeks with reasonable rounds of reviews.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > >
>> > > On Thu, Sep 8, 2016 at 6:24 PM, Jun Rao <j...@confluent.io> wrote:
>> > >
>> > > > Hi, Jiangjie,
>> > > >
>> > > > Thanks for the updated KIP. A couple of minor comments.
>> > > >
>> > > > 1. latestOffsets() returns the next offset. So there won't be a
>> > timestamp
>> > > > associated with it. Would we use something like -1 for timestamp?
>> > > >
>> > > > 2. Jason mentioned that if no message has timestamp >= the provided
>> > > > timestamp, we return a null value for that partition. Could we
>> document
>> > > > that in the wiki?
>> > > >
>> > > > BTW, we are getting close to the next release. This is a really nice
>> > > > feature to have. Do you think you will have a patch ready for the
>> next
>> > > > release?
>> > > >
>> > > > Thanks.
>> > > >
>> > > > Jun
>> > > >
>> > > > On Wed, Sep 7, 2016 at 2:47 PM, Becket Qin <becket....@gmail.com>
>> > wrote:
>> > > >
>> > > > > That sounds reasonable to me. I'll update the KIP wiki page.
>> > > > >
>> > > > > On Wed, Sep 7, 2016 at 1:34 PM, Jason Gustafson <
>> ja...@confluent.io>
>> > > > > wrote:
>> > > > >
>> > > > > > Hey Becket,
>> > > > > >
>> > > > > > I don't have a super strong preference, but I think this
>> > > > > >
>> > > > > > earliestOffset(singleton(partition));
>> > > > > >
>> > > > > > captures the intent more clearly than this:
>> > > > > >
>> > > > > > offsetsForTimes(singletonMap(partition, -1));
>> > > > > >
>> > > > > > I can understand the desire to keep the API footprint small,
>> but I
>> > > > think
>> > > > > > the use case is common enough to justify separate APIs. A couple
>> > > > > additional
>> > > > > > points:
>> > > > > >
>> > > > > > 1. If we had separate methods, it might make sense to treat
>> > negative
>> > > > > > timestamps as illegal in offsetsForTimes. That seems safer from
>> the
>> > > > user
>> > > > > > perspective since legitimate timestamps should always be
>> positive.
>> > > > > > 2. The expected behavior of offsetsForTimes is to return the
>> > earliest
>> > > > > > offset which is greater than or equal to the passed offset, so
>> > having
>> > > > > > Long.MAX_VALUE return the latest value doesn't seem very
>> intuitive
>> > to
>> > > > > me. I
>> > > > > > would actually expect it to return null.
>> > > > > >
>> > > > > > Given that, I think I prefer having the custom methods. What do
>> you
>> > > > > think?
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Jason
>> > > > > >
>> > > > > > On Wed, Sep 7, 2016 at 1:00 PM, Becket Qin <
>> becket....@gmail.com>
>> > > > wrote:
>> > > > > >
>> > > > > > > Hi Jason,
>> > > > > > >
>> > > > > > > Thanks for the feedback. That is a good point. For the -1 and
>> -2
>> > > > > > semantics,
>> > > > > > > I was just thinking we will preserve the semantics in the wire
>> > > > > protocol.
>> > > > > > > For the user facing API, I agree that is not intuitive. We
>> can do
>> > > one
>> > > > > of
>> > > > > > > the following:
>> > > > > > > 1. Add two separate methods: earliestOffsets() and
>> > latestOffsets().
>> > > > > > > 2. just have offsetsForTimes() and return the earliest if the
>> > > > timestamp
>> > > > > > is
>> > > > > > > negative and the latest if the timestamp is Long.MAX_VALUE.
>> > > > > > >
>> > > > > > > The good thing about doing (1) is that we kind of have
>> symmetric
>> > > > > function
>> > > > > > > signatures like seekToBeginning() and seekToEnd(). However,
>> even
>> > if
>> > > > we
>> > > > > do
>> > > > > > > (1), we may still need to do (2) to handle the negative
>> timestamp
>> > > and
>> > > > > the
>> > > > > > > Long.MAX_VALUE timestamp in offsetsForTimes(). Then they
>> > > essentially
>> > > > > > become
>> > > > > > > redundant to earliestOffsets() and latestOffsets().
>> > > > > > >
>> > > > > > > Personally I prefer option (2) because of the conciseness and
>> it
>> > > > seems
>> > > > > > > intuitive enough. But I am open to option (1) as well.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Jiangjie (Becket) Qin
>> > > > > > >
>> > > > > > > On Wed, Sep 7, 2016 at 11:06 AM, Jason Gustafson <
>> > > ja...@confluent.io
>> > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hey Becket,
>> > > > > > > >
>> > > > > > > > Thanks for the KIP. As I understand, the intention is to
>> > preserve
>> > > > the
>> > > > > > > > current behavior with a timestamp of -1 indicating latest
>> > > timestamp
>> > > > > and
>> > > > > > > -2
>> > > > > > > > indicating earliest timestamp. So users can query these
>> offsets
>> > > > using
>> > > > > > the
>> > > > > > > > offsetsForTimes API if they know the magic values. I'm
>> > wondering
>> > > if
>> > > > > it
>> > > > > > > > would make the usage a little nicer to have a separate API
>> > > instead
>> > > > > for
>> > > > > > > > these special cases? Sort of in the way that we expose a
>> > generic
>> > > > > seek()
>> > > > > > > and
>> > > > > > > > a seekToBeginning(), maybe we could have an
>> earliestOffset() in
>> > > > > > addition
>> > > > > > > to
>> > > > > > > > offsetsForTimes()?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Jason
>> > > > > > > >
>> > > > > > > > On Wed, Sep 7, 2016 at 10:04 AM, Becket Qin <
>> > > becket....@gmail.com>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Thanks everyone for all the feedback.
>> > > > > > > > >
>> > > > > > > > > If there is no further concerns or comments I will start a
>> > > voting
>> > > > > > > thread
>> > > > > > > > on
>> > > > > > > > > this KIP tomorrow.
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >
>> > > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > > >
>> > > > > > > > > On Tue, Sep 6, 2016 at 9:48 AM, Becket Qin <
>> > > becket....@gmail.com
>> > > > >
>> > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Magnus,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for the comments. I agree that querying messages
>> > > within
>> > > > a
>> > > > > > time
>> > > > > > > > > > range is a valid use case (actually this is an example
>> use
>> > > case
>> > > > > in
>> > > > > > my
>> > > > > > > > > > previous email). The current proposal can achieve this
>> by
>> > > > having
>> > > > > > two
>> > > > > > > > > > ListOffsetRequest, right? I think the current API
>> already
>> > > > > supports
>> > > > > > > the
>> > > > > > > > > use
>> > > > > > > > > > cases that require the offsets for multiple timestamps.
>> The
>> > > > > > question
>> > > > > > > is
>> > > > > > > > > > that whether it is worth adding more complexity to the
>> > > protocol
>> > > > > to
>> > > > > > > make
>> > > > > > > > > it
>> > > > > > > > > > easier for multiple timestamp query. Personally I think
>> > given
>> > > > > that
>> > > > > > > > query
>> > > > > > > > > > multiple timestamps is likely an infrequent operation,
>> > there
>> > > is
>> > > > > no
>> > > > > > > need
>> > > > > > > > > to
>> > > > > > > > > > optimize for it and complicates the protocol.
>> > > > > > > > > >
>> > > > > > > > > > Thanks,
>> > > > > > > > > >
>> > > > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > > > >
>> > > > > > > > > > On Mon, Sep 5, 2016 at 11:21 PM, Magnus Edenhill <
>> > > > > > mag...@edenhill.se
>> > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > >> Good write-up Qin, the API looks promising.
>> > > > > > > > > >>
>> > > > > > > > > >> I have one comment:
>> > > > > > > > > >>
>> > > > > > > > > >> 2016-09-03 5:20 GMT+02:00 Becket Qin <
>> > becket....@gmail.com
>> > > >:
>> > > > > > > > > >>
>> > > > > > > > > >> > The currently offsetsForTimes() API obviously does
>> not
>> > > > support
>> > > > > > > > > querying
>> > > > > > > > > >> > multiple timestamps for the same partition. It
>> doesn't
>> > > > seems a
>> > > > > > > > feature
>> > > > > > > > > >> for
>> > > > > > > > > >> > ListOffsetRequest v0 either (sounds more like a
>> bug). My
>> > > > > > intuition
>> > > > > > > > is
>> > > > > > > > > >> that
>> > > > > > > > > >> > it's a rare use case. Given it does not exist before
>> and
>> > > we
>> > > > > > don't
>> > > > > > > > see
>> > > > > > > > > a
>> > > > > > > > > >> > strong need from the community either, maybe it is
>> > better
>> > > to
>> > > > > > keep
>> > > > > > > it
>> > > > > > > > > >> simple
>> > > > > > > > > >> > for ListOffsetRequest v1. We can add it later if it
>> > turns
>> > > > out
>> > > > > to
>> > > > > > > be
>> > > > > > > > a
>> > > > > > > > > >> > useful feature (that may need a interface change,
>> but I
>> > > > > honestly
>> > > > > > > do
>> > > > > > > > > not
>> > > > > > > > > >> > think people would frequently query many different
>> > > > timestamps
>> > > > > > for
>> > > > > > > > the
>> > > > > > > > > >> same
>> > > > > > > > > >> > partition)
>> > > > > > > > > >> >
>> > > > > > > > > >>
>> > > > > > > > > >> I argue that the current behaviour of OffsetRequest
>> with
>> > > > regards
>> > > > > > to
>> > > > > > > > > >> duplicate partitions is a bug
>> > > > > > > > > >> and think it would be a mistake to move the same
>> semantics
>> > > > over
>> > > > > to
>> > > > > > > > thew
>> > > > > > > > > >> new
>> > > > > > > > > >> ListOffset API.
>> > > > > > > > > >> One use case is that an application may want to know
>> the
>> > > > offset
>> > > > > > > range
>> > > > > > > > > >> between two timestamps,
>> > > > > > > > > >> e.g., for reprocessing, batching, searching, etc.
>> > > > > > > > > >>
>> > > > > > > > > >>
>> > > > > > > > > >> Thanks,
>> > > > > > > > > >> Magnus
>> > > > > > > > > >>
>> > > > > > > > > >>
>> > > > > > > > > >>
>> > > > > > > > > >> >
>> > > > > > > > > >> > Have a good long weekend!
>> > > > > > > > > >> >
>> > > > > > > > > >> > Thanks,
>> > > > > > > > > >> >
>> > > > > > > > > >> > Jiangjie (Becket) Qin
>> > > > > > > > > >> >
>> > > > > > > > > >> >
>> > > > > > > > > >> >
>> > > > > > > > > >> >
>> > > > > > > > > >> > On Fri, Sep 2, 2016 at 6:10 PM, Ismael Juma <
>> > > > > ism...@juma.me.uk>
>> > > > > > > > > wrote:
>> > > > > > > > > >> >
>> > > > > > > > > >> > > Thanks for the proposal Becket. Looks good
>> overall, a
>> > > few
>> > > > > > > > comments:
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > ListOffsetResponse => [TopicName
>> [PartitionOffsets]]
>> > > > > > > > > >> > > >   PartitionOffsets => Partition ErrorCode
>> Timestamp
>> > > > > [Offset]
>> > > > > > > > > >> > > >   Partition => int32
>> > > > > > > > > >> > > >   ErrorCode => int16
>> > > > > > > > > >> > > >   Timestamp => int64
>> > > > > > > > > >> > > >   Offset => int
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > It should be int64 for `Offset` right?
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > Implementation wise, we will migrate to
>> > > > > o.a.k.common.requests.
>> > > > > > > > > >> > > ListOffsetRequest
>> > > > > > > > > >> > > > class on the broker side.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > Could you clarify what you mean here? We already
>> > > > > > > > > >> > > use o.a.k.common.requests.ListOffsetRequest in
>> > > KafkaApis.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > long offset = consumer.offsetForTime(
>> > > > > > Collections.singletonMap(
>> > > > > > > > > >> > > topicPartition,
>> > > > > > > > > >> > > > targetTime)).offset;
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > The result of `offsetForTime` is a Map, so we can't
>> > just
>> > > > > call
>> > > > > > > > > >> `offset` on
>> > > > > > > > > >> > > it. You probably meant something like:
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > long offset = consumer.offsetForTime(
>> > > > > > Collections.singletonMap(
>> > > > > > > > > >> > > topicPartition,
>> > > > > > > > > >> > > targetTime)).get(topicPartition).offset;
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > Test searchByTimestamp with CreateTime and
>> > LogAppendTime
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > Do you mean `Test offsetForTime`?
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > And:
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 1. In KAFKA-1588, the following issue was described
>> > > "When
>> > > > > > > > performing
>> > > > > > > > > >> an
>> > > > > > > > > >> > > OffsetRequest, if you request the same topic and
>> > > partition
>> > > > > > > > > combination
>> > > > > > > > > >> > in a
>> > > > > > > > > >> > > single request more than once (for example, if you
>> > want
>> > > to
>> > > > > get
>> > > > > > > > both
>> > > > > > > > > >> the
>> > > > > > > > > >> > > head and tail offsets for a partition in the same
>> > > > request),
>> > > > > > you
>> > > > > > > > will
>> > > > > > > > > >> get
>> > > > > > > > > >> > a
>> > > > > > > > > >> > > response for both, but they will be the same
>> offset".
>> > > Will
>> > > > > the
>> > > > > > > new
>> > > > > > > > > >> > request
>> > > > > > > > > >> > > version support the use case where multiple
>> timestamps
>> > > are
>> > > > > > > passed
>> > > > > > > > > for
>> > > > > > > > > >> the
>> > > > > > > > > >> > > same topic partition? And if we do support it at
>> the
>> > > > > protocol
>> > > > > > > > level,
>> > > > > > > > > >> do
>> > > > > > > > > >> > we
>> > > > > > > > > >> > > also want to support it at the API level or do we
>> > think
>> > > > the
>> > > > > > > > > additional
>> > > > > > > > > >> > > complexity is not worth it?
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2. Is `offsetForTime` the right method name given
>> that
>> > > we
>> > > > > are
>> > > > > > > > > getting
>> > > > > > > > > >> > > multiple offsets? Maybe it should be
>> `offsetsForTimes`
>> > > or
>> > > > > > > > something
>> > > > > > > > > >> like
>> > > > > > > > > >> > > that.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > Ismael
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > On Wed, Aug 31, 2016 at 4:38 AM, Becket Qin <
>> > > > > > > becket....@gmail.com
>> > > > > > > > >
>> > > > > > > > > >> > wrote:
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > > Hi Kafka devs,
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > I created KIP-79 to allow consumer to precisely
>> > query
>> > > > the
>> > > > > > > > offsets
>> > > > > > > > > >> based
>> > > > > > > > > >> > > on
>> > > > > > > > > >> > > > timestamp.
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > In short we propose to :
>> > > > > > > > > >> > > > 1. add a ListOffsetRequest/ListOffsetResponse
>> v1,
>> > and
>> > > > > > > > > >> > > > 2. add an offsetForTime() method in new consumer.
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > The KIP wiki is the following:
>> > > > > > > > > >> > > > https://cwiki.apache.org/confl
>> uence/pages/viewpage.
>> > > > > > > > > >> > > action?pageId=65868090
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > Comments are welcome.
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > Thanks,
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > Jiangjie (Becket) Qin
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > >
>> > > > > > > > > >> >
>> > > > > > > > > >>
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Reply via email to