The replication.timeout is how long a producer is willing to wait for its
produced data to be replicated according to the ack mode. Hence I think it
is really a producer config rather than a broker config: one producer could
set a smaller replication timeout, indicating that it does not want to wait
long for the response confirming its data has been replicated, but would
rather fail fast and retry, while another producer could set a larger
value, saying it prefers to wait longer for its produced data to be
replicated. (Today the ack mode is set as a global client config, hence a
single producer sending to different topics would always use the same
replication requirement.) I am not sure what its meaning would be if we
moved this replication timeout to the broker side.

That said, I agree that it is usually hard for producers to set this
config right, since the real replication latency depends on the brokers'
inter-broker communication pattern rather than on the client-broker network
characteristics. As for KIP-19 itself, I think we can merge the
"replication timeout" with the new "network.request.timeout.ms", with its
semantics as "the max time to wait for a response". For the implementation,
we can do the checking around client.poll() with this timeout, and also set
the ProduceRequest's timeout to the same value. Hence both broker and
producer would check this timeout, to return early responses and to give up
waiting on responses, respectively.
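
To make the "both sides check the same timeout" idea concrete, here is a
rough sketch. It is not the actual NetworkClient or broker code: the class
RequestTimeoutSketch, the InFlight record and the method names are all
made up for illustration, and time is passed in explicitly so the logic is
easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these types exist in Kafka.
class RequestTimeoutSketch {

    // One outstanding request, stamped with the time it was sent.
    static final class InFlight {
        final int correlationId;
        final long sendTimeMs;
        final long timeoutMs; // the same value would be carried in the ProduceRequest

        InFlight(int correlationId, long sendTimeMs, long timeoutMs) {
            this.correlationId = correlationId;
            this.sendTimeMs = sendTimeMs;
            this.timeoutMs = timeoutMs;
        }

        boolean isExpired(long nowMs) {
            return nowMs - sendTimeMs >= timeoutMs;
        }
    }

    private final long requestTimeoutMs;
    private final List<InFlight> inFlight = new ArrayList<>();

    RequestTimeoutSketch(long requestTimeoutMs) {
        this.requestTimeoutMs = requestTimeoutMs;
    }

    // Producer side: remember the request and the timeout it was sent with.
    InFlight send(int correlationId, long nowMs) {
        InFlight request = new InFlight(correlationId, nowMs, requestTimeoutMs);
        inFlight.add(request);
        return request;
    }

    // Producer side: run around each poll; removes and returns the requests
    // the producer should stop waiting on (and possibly retry).
    List<InFlight> expire(long nowMs) {
        List<InFlight> expired = new ArrayList<>();
        inFlight.removeIf(request -> {
            if (request.isExpired(nowMs)) {
                expired.add(request);
                return true;
            }
            return false;
        });
        return expired;
    }

    // Broker side: answer early (with a timeout error) once the timeout
    // carried in the request has elapsed and replication is still pending.
    static boolean brokerShouldRespondEarly(long receivedMs, long nowMs,
                                            long timeoutMs, boolean replicated) {
        return !replicated && nowMs - receivedMs >= timeoutMs;
    }
}
```

With a single value driving both checks, the producer gives up even when
the broker never answers, while a healthy broker still returns an early
response at roughly the same deadline.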

Guozhang


On Thu, May 28, 2015 at 8:48 PM, Jiangjie Qin <j...@linkedin.com.invalid>
wrote:

> Jun, Jay and Joel, thanks a lot for the explanations.
>
> Just to summarize what I learned - please correct me if I understand wrong:
> 1. The actual replication time is determined by inter-broker
> (intra-cluster) communication.
> 2. The actual replication time for different topics may be different.
> 3. The replication timeout config means how long a broker wants to wait
> for replication. From the producer point of view, this prevents producer
> from waiting too long for the response, assuming broker will ever respond.
>
> I agree with Jay that replication timeout should be a part of request
> timeout. I actually still think we can remove it from producer side.
>
> As mentioned in (3), before we have request timeout, replication timeout
> implicitly becomes some kind of timeout to keep producer from waiting too
> long, assuming broker will ever respond. For a producer, after it sends
> out a request, it either receives a response or does not receive a
> response. So the producer can only control how long it will wait for the
> response. And that is request timeout. In this sense request timeout will
> do better because it works even when broker does not respond.
> Hence it looks to me from producer point of view, request timeout covers
> all the things replication timeout does. So if we have request timeout,
> replication timeout on producer side is no longer needed.
>
> In addition, replication timeout on producer side also looks a little bit
> awkward. Does it mean different producers can set different replication
> timeouts for the same topic, even though the actual replication time of
> that topic would be the same for different producers? If so, it looks the
> producer side setting is only reasonable when it is actual replication
> time + some safety buffer. So it looks this configuration does not really
> mean "what client is willing to wait", but "what client has to wait if it
> wants to produce data". So does it mean replication timeout is a producer
> side config but producer does not really have control over it if the
> producer wants to produce data?
>
>
> Please correct me if I miss something.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On 5/28/15, 4:26 PM, "Joel Koshy" <jjkosh...@gmail.com> wrote:
>
> >> have a Kafka cluster across availability zones, data will be
> >> replicated to
> >...
> >> single timeout on the broker side. In theory, different producers may
> >> want to pick different replication time depending on the topics being
> >> sent.
> >
> >I think Becket raises a good point here in that the above
> >configurations are best known by the Kafka cluster operators and not
> >necessarily by the users (producers). So right now users end up having
> >to either know such details about the deployment when in fact it
> >should be set by the people who (may manually) assign partitions to
> >brokers; or they have to "guess" the timeouts or be content with
> >defaults.
> >
> >Actually this would end up being a LogConfig which would be per-topic
> >- i.e., it won't necessarily be a single timeout on the broker.
> >
> >Thanks,
> >
> >Joel
> >
> >On Thu, May 28, 2015 at 04:17:08PM -0700, Jun Rao wrote:
> >> Hi, Jiangjie,
> >>
> >> The replication time may vary a bit for different partitions. For
> >> example, a partition with more replicas may take a bit more time to
> >> propagate the messages. Also, the replication time depends on network
> >> latency. If you have a Kafka cluster across availability zones, data
> >> will be replicated to nodes within the same zone a bit faster than
> >> those outside of the zone. So, I am not sure if it's better to just
> >> reason about the replication time as a single timeout on the broker
> >> side. In theory, different producers may want to pick different
> >> replication time depending on the topics being sent.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Tue, May 26, 2015 at 4:46 PM, Jiangjie Qin <j...@linkedin.com.invalid>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am updating the wiki for KIP-19 and wondering why we have a
> >> > replication timeout on producer side and in producer request?
> >> >
> >> > From what I understand this is a server side setting and the reason
> >> > we need this replication timeout is because we want to control the
> >> > purgatory size. If that is the case should we just have the
> >> > replication timeout as a broker configuration?
> >> > The downside of having it on server side might be that producer could
> >> > have a request timeout/socket timeout smaller than replication
> >> > timeout. In this case we can put request timeout in producer request
> >> > and if the request timeout is smaller than replication timeout on
> >> > server side, we return a mis-configuration exception.
> >> >
> >> > So we can have a producer request V1 which removes ack timeout but
> >> > adds request timeout. This will give user cleaner timeout
> >> > configurations on producer side as well.
> >> >
> >> > What do people think about this?
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> >
> >
>
>


-- 
-- Guozhang
