It's probably worth noting that these jobs originally ran with
taskmanager.numberOfTaskSlots set to 1 before it was increased to 2, which
may also help explain the issue. I'm mentioning it for context in case
it's relevant.

On Mon, Dec 9, 2024 at 8:13 AM Rion Williams <rionmons...@gmail.com> wrote:

> Hi all,
>
> In trying to optimize the performance of some existing Flink jobs
> running in production, I've recently been experimenting with the
> taskmanager.numberOfTaskSlots configuration for some of those jobs and
> noticed an issue.
>
> It appears that when this configuration is set to a value greater than 1,
> the Kafka-based producers fail with the following error, which seems to
> be directly related to the configuration change itself:
>
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000ms while awaiting InitProducerId
>
>
> I searched the existing Apache JIRA project for a similar documented
> issue, but I didn't find anything that directly pointed to it. Is this a
> potential issue/bug, or is this the expected behavior (i.e.
> numberOfTaskSlots must be 1 to support this)?
>
> Any advice would be appreciated!
>
> Thanks,
>
> Rion
>
