Hi,
Please note that REPLICATION_FACTOR_CONFIG is already set to three. What we observe is that no matter what the producer request timeout is increased to, one or two partitions still get timed out after that time.
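For reference, this is roughly how the relevant settings are applied on our side (a simplified sketch only; the application id and broker list below are placeholders, not our actual values):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// placeholder application id and broker list
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "new-part-advice");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
// internal topics (changelogs and repartition topics) get replication factor 3
props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
// producer overrides are passed through the same properties
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 1800000); // currently 30 minutes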
The streams-side log simply has a message like this:
org.apache.kafka.common.errors.TimeoutException: Expiring x record(s) for
new-part-advice-key-table-changelog-n: xxxxxx ms has passed since last append
where xxxxxx is usually a few milliseconds higher than
ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG.

1. So we need to check the broker logs. Can you please let us know which logs
on the broker side would contain information that correlates with this error?
2. Also, what specifically should we look for on the broker side? We have
three brokers and there is a lot of logging, so we want to know how this error
can be narrowed down.

Another thing we observe is that sometimes (not always) the partition which
throws this error gets leader -1 assigned to it. I have posted more details on
this in the thread "What could cause unavailable partitions issue and how to
fix it". Please let us know there what can be done to fix such issues.

Thanks
Sachin

On Sat, Jun 10, 2017 at 1:20 AM, Eno Thereska <eno.there...@gmail.com> wrote:

> Hi Sachin,
>
> As Damian mentioned it'd be useful to see some logs from both broker and
> streams.
>
> One thing that comes to mind is whether your topics are replicated at all.
> You could try setting the replication factor of streams topics (e.g.,
> changelogs and repartition topics) to 2 or 3 using
> StreamsConfig.REPLICATION_FACTOR_CONFIG.
>
> Thanks
> Eno
>
>
> > On 9 Jun 2017, at 20:01, Sachin Mittal <sjmit...@gmail.com> wrote:
> >
> > Hi All,
> > We still intermittently get this error.
> >
> > We had added the config
> > props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
> >
> > and the timeout as mentioned above is set as:
> > props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 1800000);
> >
> > So we increased it from the default 30 seconds to 3 minutes to 30 minutes.
> >
> > If this is a connectivity issue, does that mean the client could not
> > connect to the broker for 30 minutes?
> > I doubt that is the case, because we see this error on only one or two
> > partitions at a time.
> >
> > Also we note that the changelog topic partitions that get this error
> > sometimes become unavailable with the leader set as -1.
> >
> > Also, for this client-side error, what kind of server exception should we
> > expect, so we can correlate it with the server logs to get a better
> > understanding?
> >
> > Thanks
> > Sachin
> >
> >
> > On Mon, Dec 19, 2016 at 5:43 PM, Damian Guy <damian....@gmail.com> wrote:
> >
> >> Hi Sachin,
> >>
> >> This would usually indicate that there is a connectivity issue with the
> >> brokers. You would need to correlate the logs etc. on the brokers with
> >> the streams logs to try and understand what is happening.
> >>
> >> Thanks,
> >> Damian
> >>
> >> On Sun, 18 Dec 2016 at 07:26 Sachin Mittal <sjmit...@gmail.com> wrote:
> >>
> >>> Hi all,
> >>> I have a simple stream application pipeline
> >>> src.filter.aggragteByKey.mapValues.forEach
> >>>
> >>> From time to time I get the following exception:
> >>> Error sending record to topic test-stream-key-table-changelog
> >>> org.apache.kafka.common.errors.TimeoutException: Batch containing 2
> >>> record(s) expired due to timeout while requesting metadata from brokers
> >>> for test-stream-key-table-changelog-0
> >>>
> >>> What could be causing the issue?
> >>> I investigated a bit and saw that none of the stages takes a long time.
> >>> Even the forEach stage, where we commit the output to an external db,
> >>> takes sub-100 ms in the worst case.
> >>>
> >>> I have for now done a workaround of
> >>> props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 1800000);
> >>>
> >>> which increases the default timeout from 30 seconds to 3 minutes.
> >>>
> >>> However, to dig deeper into the issue, where could the problem be?
> >>>
> >>> Is it that some stage is taking beyond 30 seconds to execute? Or is it
> >>> some network issue where it is taking a long time to connect to the
> >>> broker itself?
> >>>
> >>> Is there any logging I can enable on the streams side to get more
> >>> complete stack traces?
> >>>
> >>> Note that the issue occurs in bunches: everything works fine for a while,
> >>> then these exceptions come in a bunch, then it works fine for some time,
> >>> then again exceptions, and so on.
> >>>
> >>> Note that my version is kafka_2.10-0.10.0.1.
> >>>
> >>> Thanks
> >>> Sachin
> >>>
> >>
> >