I think increasing the segment size for repartition topic should
mitigate the issues.

-Matthias


On 6/5/19 4:43 AM, Pieter Hameete wrote:
> Hi Guozhang,
> 
> Some additional finding: it seems to only happen on Kakfa Streams repartition 
> topics. We haven't seen this happening for any other topics so far.
> 
> Best,
> 
> Pieter
> 
> -----Oorspronkelijk bericht-----
> Van: Pieter Hameete <pieter.hame...@blockbax.com> 
> Verzonden: Wednesday, 5 June 2019 11:23
> Aan: users@kafka.apache.org
> Onderwerp: RE: Repeating UNKNOWN_PRODUCER_ID errors for Kafka streams 
> applications
> 
> Hi Guozhang,
> 
> Thanks for your reply! I noticed my original mail went out twice by accident, 
> sorry for that.
> 
> We currently have a small variety of keys so not all partitions are 'actively 
> used' indeed. The strange thing is though is that the errors occur for the 
> partitions that actively receive records every few seconds. I have checked 
> this using kafkacat to consume the specific partitions. Something I noticed 
> was that for each received record the partition offset was 2 higher than the 
> previous record, instead of the expected 1. Could that be due to the 
> producers retrying (see warning logs in my original mail)?
> 
> I've done the override for the configs in the repartition topics as follows, 
> on one of the brokers:
> 
> The values are taken from your KIP-443 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-443%3A+Return+to+default+segment.ms+and+segment.index.bytes+in+Streams+repartition+topics
> 
> kafka-topics --zookeeper localhost:2181 --alter --topic 
> event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition --config 
> segment.index.bytes=10485760 kafka-topics --zookeeper localhost:2181 --alter 
> --topic event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition 
> --config segment.bytes= 52428800 kafka-topics --zookeeper localhost:2181 
> --alter --topic 
> event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition --config 
> segment.ms=604800000 kafka-topics --zookeeper localhost:2181 --alter --topic 
> event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition --config 
> retention.ms=-1
> 
> Verifying afterwards:
> 
> kafka-topics --zookeeper localhost:2181 --describe --topic 
> event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition
> 
> Topic:event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition     
>  
>  PartitionCount:32       ReplicationFactor:3     
> Configs:segment.bytes=52428800,retention.ms=-1,segment.index.bytes=10485760,segment.ms=604800000,cleanup.policy=delete
> 
> Is there anything that seems off to you? Or something else I can investigate 
> further? We'd really like to nail this issue down. Especially because the 
> cause seems different than the 'low traffic' cause in JIRA issue KAFKA-7190 
> as the partitions for which errors are thrown are receiving data.
> 
> Best,
> 
> Pieter
> 
> -----Oorspronkelijk bericht-----
> Van: Guozhang Wang <wangg...@gmail.com>
> Verzonden: Wednesday, 5 June 2019 02:23
> Aan: users@kafka.apache.org
> Onderwerp: Re: Repeating UNKNOWN_PRODUCER_ID errors for Kafka streams 
> applications
> 
> Hello Pieter,
> 
> If you only have one record every few seconds that may be too small given you 
> have at least 25 partitions (as I saw you have a xxx--repartition-24 
> partition), which means that for a single partition, it may not see any 
> records for a long time, and in this case you may need to override it to very 
> large values. On the other hand, if you can reduce your num.partitions that 
> may also help increasing the traffic per partition.
> 
> Also could you show me how did you override the configs in the repartition 
> topics?
> 
> 
> Guozhang
> 
> On Tue, Jun 4, 2019 at 2:10 AM Pieter Hameete <pieter.hame...@blockbax.com>
> wrote:
> 
>> Hello,
>>
>> Our Kafka streams applications are showing the following warning every 
>> few seconds (on each of our 3 brokers, and on each of the 2 instances 
>> of the streams application):
>>
>>
>> [Producer
>> clientId=event-rule-engine-dd71ae9b-523c-425d-a7c0-c62993315b30-Stream
>> Thread-1-1_24-producer, transactionalId=event-rule-engine-1_24]
>> Resetting sequence number of batch with current sequence 1 for 
>> partition
>> event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24
>> to 0
>>
>>
>>
>> Followed by:
>>
>>
>>
>> [Producer
>> clientId=event-rule-engine-dd71ae9b-523c-425d-a7c0-c62993315b30-Stream
>> Thread-1-1_24-producer, transactionalId=event-rule-engine-1_24] Got 
>> error produce response with correlation id 5902 on topic-partition 
>> event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24
>> , retrying (2147483646 attempts left). Error: UNKNOWN_PRODUCER_ID
>>
>> The brokers are showing errors that look related:
>>
>>
>> Error processing append operation on partition
>> event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24
>> (kafka.server.ReplicaManager)
>>
>> org.apache.kafka.common.errors.UnknownProducerIdException: Found no 
>> record of producerId=72 on the broker. It is possible that the last 
>> message with the producerId=72 has been removed due to hitting the retention 
>> limit.
>>
>>
>>
>> We would expect the UNKNOWN_PRODUCER_ID error to occur once. After a 
>> retry the record would be published on the partition and the 
>> PRODUCER_ID would be known. However, this error keeps occurring every 
>> few seconds. This is roughly at the same rate at which records are 
>> produced on the input topics partitions, so it seems like it occurs for 
>> (nearly) every input record.
>>
>>
>>
>> The following JIRA issue: 
>> https://issues.apache.org/jira/browse/KAFKA-7190
>> looks related. Except the Jira issue mentions ‘little traffic’, and I 
>> am not sure if a message per every few seconds is regarded as little traffic.
>> Matthias mentions in the issue that a workaround seems to be to 
>> increase topic configs `segment.bytes`, `segment.index.bytes`, and 
>> `segment.ms` for the corresponding repartition topics. We’ve tried 
>> manually overriding these configs for a relevant topic to the config 
>> values in the linked pull request 
>> (https://github.com/apache/kafka/pull/6511) but this did not result in the 
>> errors disappearing.
>>
>>
>>
>> Could anyone help us to figure out what is happening here, and why the 
>> proposed fix for the above JIRA issue is not working in this case?
>>
>>
>>
>> Best,
>>
>>
>>
>> Pieter
>>
>>
> 
> --
> -- Guozhang
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to