Hi Habib,

With regard to your earlier question about timezones, I've updated the KIP to remove the LatencyTime abstraction since it is no longer relevant. I also added a note about epoch time.
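
To illustrate the epoch time point, here is a minimal sketch. It is not part of the KIP, and the class and method names (EpochTimeSketch, epochMillisIn, latencyMs) are made up for this example. It assumes that record timestamps and the consumer's clock are both expressed as epoch milliseconds, and shows that the zone attached to a java.time.Clock changes how an instant is rendered locally but not its epoch value:

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class EpochTimeSketch {

    // Epoch milliseconds for one instant, read through a clock set to the given zone.
    public static long epochMillisIn(Instant instant, String zoneId) {
        Clock clock = Clock.fixed(instant, ZoneId.of(zoneId));
        return clock.millis();
    }

    // Hypothetical helper: latency is the consumer's wall clock time minus the
    // record timestamp, both as epoch milliseconds.
    public static long latencyMs(long recordTimestampMs, long consumerNowMs) {
        return consumerNowMs - recordTimestampMs;
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2020-01-15T13:28:00Z");
        long singapore = epochMillisIn(instant, "Asia/Singapore");
        long toronto = epochMillisIn(instant, "America/Toronto");

        // The local rendering of the instant differs between the two zones...
        System.out.println(ZonedDateTime.ofInstant(instant, ZoneId.of("Asia/Singapore")));
        System.out.println(ZonedDateTime.ofInstant(instant, ZoneId.of("America/Toronto")));

        // ...but the epoch value is identical, so no per-zone clock is needed.
        System.out.println(singapore == toronto); // prints "true"

        // A record stamped 1500 ms before "now" has a latency of 1500 ms,
        // whichever zone the producer or consumer runs in.
        long recordTimestampMs = singapore - 1500;
        System.out.println(latencyMs(recordTimestampMs, toronto)); // prints "1500"
    }
}
```

This only holds when the record timestamp really is an epoch value; a user-supplied timestamp with some other meaning would need separate handling.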

Thanks,
Sean

On Wed, Jan 15, 2020 at 8:28 AM Habib Nahas <ha...@hbnet.io> wrote:

> Hi Sean,
>
> That's great, look forward to it.
>
> Thanks,
> Habib
>
> On Tue, Jan 14, 2020, at 2:55 PM, Sean Glover wrote:
> > Hi Habib,
> >
> > Thank you for the reminder. I'll update the KIP this week and address
> > the feedback from you and Gokul.
> >
> > Regards,
> > Sean
> >
> > On Tue, Jan 14, 2020 at 9:06 AM Habib Nahas <ha...@hbnet.io> wrote:
> >
> > > Any chance of an update on the KIP? We are interested in seeing this
> > > move forward.
> > >
> > > Thanks,
> > > Habib
> > > Sr SDE, AWS
> > >
> > > On Wed, Dec 18, 2019, at 3:27 PM, Habib Nahas wrote:
> > > > Thanks Sean. Look forward to the updated KIP.
> > > >
> > > > Regards,
> > > > Habib
> > > >
> > > > On Fri, Dec 13, 2019, at 6:22 AM, Sean Glover wrote:
> > > > > Hi,
> > > > >
> > > > > After my last reply I had a nagging feeling something wasn't
> > > > > right, and I remembered that epoch time is UTC. This makes the
> > > > > discussion about timezone irrelevant, since we're always using
> > > > > UTC. This makes the need for the LatencyTime interface that I
> > > > > proposed in the design irrelevant as well, since I can no longer
> > > > > think of how it might be useful. I'll update the KIP. I'll also
> > > > > review KIP-32 to understand message timestamps better so I can
> > > > > explain the different types of latency results that could be
> > > > > reported with this metric.
> > > > >
> > > > > Regards,
> > > > > Sean
> > > > >
> > > > > On Thu, Dec 12, 2019 at 6:25 PM Sean Glover
> > > > > <sean.glo...@lightbend.com> wrote:
> > > > >
> > > > > > Hi Habib,
> > > > > >
> > > > > > Thanks for the question! If the consumer is in a different
> > > > > > timezone than the timezone used to produce messages to a
> > > > > > partition then you can use an implementation of LatencyTime to
> > > > > > return the current time of that timezone.
> > > > > > The current design assumes that messages produced to a
> > > > > > partition must all be produced from the same timezone. If
> > > > > > timezone metadata were encoded into each message then it would
> > > > > > be possible to automatically determine the source timezone and
> > > > > > calculate latency; however, the current design will not pass
> > > > > > individual messages into LatencyTime to retrieve message
> > > > > > metadata. Instead, the LatencyTime.getWallClockTime method is
> > > > > > only called once per fetch request response per partition, and
> > > > > > then the metric is recorded once the latency calculation is
> > > > > > complete. This follows the same design as the current consumer
> > > > > > lag metric, which calculates offset lag based on the last
> > > > > > message of the fetch request response for a partition. Since
> > > > > > the metric is just an aggregate (max/mean) over some time
> > > > > > window we only need to occasionally calculate latency, which
> > > > > > will have negligible impact on the performance of consumer
> > > > > > polling.
> > > > > >
> > > > > > A simple implementation of LatencyTime that returns wall clock
> > > > > > time for the Asia/Singapore timezone for all partitions:
> > > > > >
> > > > > > import java.time.*;
> > > > > > import org.apache.kafka.common.TopicPartition;
> > > > > >
> > > > > > class SingaporeTime implements LatencyTime {
> > > > > >   ZoneId zoneSingapore = ZoneId.of("Asia/Singapore");
> > > > > >   Clock clockSingapore = Clock.system(zoneSingapore);
> > > > > >
> > > > > >   @Override
> > > > > >   public long getWallClockTime(TopicPartition tp) {
> > > > > >     // instant() is a method on Clock; returns epoch seconds
> > > > > >     return clockSingapore.instant().getEpochSecond();
> > > > > >   }
> > > > > >
> > > > > >   ...
> > > > > > }
> > > > > >
> > > > > > Regards,
> > > > > > Sean
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 12, 2019 at 6:18 AM Habib Nahas <ha...@hbnet.io>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Sean,
> > > > > >>
> > > > > >> Thanks for the KIP.
> > > > > >>
> > > > > >> As I understand it, users are free to set their own timestamp
> > > > > >> on ProducerRecord. What is the recommendation for the proposed
> > > > > >> metric in a scenario where the user sets this timestamp in
> > > > > >> timezone A and consumes the record in timezone B? It's not
> > > > > >> clear to me if a custom implementation of LatencyTime will
> > > > > >> help here.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Habib
> > > > > >>
> > > > > >> On Wed, Dec 11, 2019, at 4:52 PM, Sean Glover wrote:
> > > > > >> > Hello again,
> > > > > >> >
> > > > > >> > There has been some interest in this KIP recently. I'm
> > > > > >> > bumping the thread to encourage feedback on the design.
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> > Sean
> > > > > >> >
> > > > > >> > On Mon, Jul 15, 2019 at 9:01 AM Sean Glover
> > > > > >> > <sean.glo...@lightbend.com> wrote:
> > > > > >> >
> > > > > >> > > To hopefully spark some discussion I've copied the
> > > > > >> > > motivation section from the KIP:
> > > > > >> > >
> > > > > >> > > Consumer lag is a useful metric to monitor how many
> > > > > >> > > records are queued to be processed. We can look at
> > > > > >> > > individual lag per partition or we may aggregate metrics.
> > > > > >> > > For example, we may want to monitor the maximum lag of
> > > > > >> > > any partition in our consumer subscription so we can
> > > > > >> > > identify hot partitions caused by an inadequate producer
> > > > > >> > > partitioning strategy. We may want to monitor a sum of
> > > > > >> > > lag across all partitions so we have a sense as to our
> > > > > >> > > total backlog of messages to consume.
> > > > > >> > > Lag in offsets is useful when you have a good
> > > > > >> > > understanding of your messages and processing
> > > > > >> > > characteristics, but it doesn’t tell us how far behind
> > > > > >> > > *in time* we are. This is known as wait time in queueing
> > > > > >> > > theory, or more informally it’s referred to as latency.
> > > > > >> > >
> > > > > >> > > The latency of a message can be defined as the difference
> > > > > >> > > between when that message was first produced to when the
> > > > > >> > > message is received by a consumer. The latency of records
> > > > > >> > > in a partition correlates with lag, but a larger lag
> > > > > >> > > doesn’t necessarily mean a larger latency. For example, a
> > > > > >> > > topic consumed by two separate application consumer
> > > > > >> > > groups A and B may have similar lag, but different
> > > > > >> > > latency per partition. Application A is a consumer which
> > > > > >> > > performs CPU intensive business logic on each message it
> > > > > >> > > receives. It’s distributed across many consumer group
> > > > > >> > > members to handle the load quickly enough, but since its
> > > > > >> > > processing time is slower, it takes longer to process
> > > > > >> > > each message per partition. Meanwhile, Application B is a
> > > > > >> > > consumer which performs a simple ETL operation to land
> > > > > >> > > streaming data in another system, such as HDFS. It may
> > > > > >> > > have similar lag to Application A, but because it has a
> > > > > >> > > faster processing time its latency per partition is
> > > > > >> > > significantly less.
> > > > > >> > >
> > > > > >> > > If the Kafka Consumer reported a latency metric it would
> > > > > >> > > be easier to build Service Level Agreements (SLAs) based
> > > > > >> > > on non-functional requirements of the streaming system.
> > > > > >> > > For example, the system must never have a latency of
> > > > > >> > > greater than 10 minutes. This SLA could be used in
> > > > > >> > > monitoring alerts or as input to automatic scaling
> > > > > >> > > solutions.
> > > > > >> > >
> > > > > >> > > On Thu, Jul 11, 2019 at 12:36 PM Sean Glover
> > > > > >> > > <sean.glo...@lightbend.com> wrote:
> > > > > >> > >
> > > > > >> > >> Hi kafka-dev,
> > > > > >> > >>
> > > > > >> > >> I've created KIP-489 as a proposal for adding latency
> > > > > >> > >> metrics to the Kafka Consumer in a similar way as
> > > > > >> > >> record-lag metrics are implemented.
> > > > > >> > >>
> > > > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/489%3A+Kafka+Consumer+Record+Latency+Metric
> > > > > >> > >>
> > > > > >> > >> Regards,
> > > > > >> > >> Sean
> > > > > >> > >>
> > > > > >> > >> --
> > > > > >> > >> Principal Engineer, Lightbend, Inc.
> > > > > >> > >>
> > > > > >> > >> <http://lightbend.com>
> > > > > >> > >>
> > > > > >> > >> @seg1o <https://twitter.com/seg1o>, in/seanaglover
> > > > > >> > >> <https://www.linkedin.com/in/seanaglover/>
> > > > > >> > >>
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > --
> > > > > >> > > Principal Engineer, Lightbend, Inc.
> > > > > >> > >
> > > > > >> > > <http://lightbend.com>
> > > > > >> > >
> > > > > >> > > @seg1o <https://twitter.com/seg1o>, in/seanaglover
> > > > > >> > > <https://www.linkedin.com/in/seanaglover/>
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sean Glover
> > > > > Principal Engineer, Alpakka, Lightbend, Inc.
> > > > > <https://lightbend.com>
> > > > > @seg1o <https://twitter.com/seg1o>, in/seanaglover
> > > > > <https://www.linkedin.com/in/seanaglover/>