Thanks Yi. We recovered this time around by processing all of the
checkpoints manually, deleting and recreating the checkpoint topic, then
re-populating the checkpoint topic with the saved offsets using the
checkpoint tool. If it happens again, I'll turn up logging to try and get
more information. Fwiw, we're using Kafka 0.11.0.2 with Samza 0.14.1; my
understanding is that there should be version compatibility between Kafka
0.11.0.x-2.x.
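
A rough sketch of the re-population step described above: rendering saved
offsets into a properties file that can be handed to the checkpoint tool.
The key layout (tasknames.<task>.systems.<system>.streams.<stream>.partitions.<partition>=<offset>)
and the --new-offsets option are assumptions based on the tool's documented
format and should be verified against the Samza version in use. The function
name and the sample task names, topic, and offsets are hypothetical.

```python
# Sketch: build the properties text for Samza's checkpoint tool.
# ASSUMPTION: the key layout below matches what the tool expects; check the
# documentation for your Samza version before relying on it.

def escape_key(text):
    # Java .properties keys end at unescaped whitespace, '=' or ':',
    # so those characters (and backslash itself) are backslash-escaped.
    for ch in ("\\", " ", "=", ":"):
        text = text.replace(ch, "\\" + ch)
    return text


def format_checkpoint_properties(offsets):
    """offsets maps (task_name, system, stream, partition) -> offset."""
    lines = []
    for (task, system, stream, partition), offset in sorted(offsets.items()):
        key = (
            f"tasknames.{task}.systems.{system}"
            f".streams.{stream}.partitions.{partition}"
        )
        lines.append(f"{escape_key(key)}={offset}")
    return "\n".join(lines) + "\n"


# Hypothetical saved offsets, for illustration only.
saved = {
    ("Partition 0", "kafka", "my-topic", 0): "1234",
    ("Partition 1", "kafka", "my-topic", 1): "5678",
}
print(format_checkpoint_properties(saved), end="")
```

The output would be written to a file and passed to the checkpoint tool
after the checkpoint topic has been recreated. Task names such as
"Partition 0" contain a space, which is why the key is escaped.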

If you have any other ideas, I'd be interested in hearing them.

Cheers,
Malcolm McFarland
Cavulus


On Fri, Apr 10, 2020 at 8:36 PM Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Malcolm,
>
> Samza 0.14.1 is pretty old, and if you have already upgraded Kafka to 2.2.1,
> I would highly recommend migrating to the latest Samza version, which has
> the Kafka 2.0 client. Depending on how you configure your broker, a 2.0
> Kafka broker can be incompatible with an older client like 0.8.6, since
> there have been wire-protocol changes in Kafka since 0.11.
>
> P.S. a more detailed log + configuration would be required to debug your
> issue (probably, turn on debug log for Kafka consumer lib), if you still
> choose to stay with Samza 0.14.1.
>
> Best,
>
> -Yi
>
> On Thu, Apr 9, 2020 at 12:50 PM Malcolm McFarland <mmcfarl...@cavulus.com>
> wrote:
>
> > Hey folks,
> >
> > We're occasionally seeing an issue when starting Samza containers where
> > none of the streamtasks for a job will instantiate consumers. We'll see
> > the beginnings of an attempt to read the checkpoint stream and nothing
> > further (including no errors). All of these streamtasks seem to be
> > creating producers -- there are plenty of "Registering
> > TaskName-SystemStreamPartition [kafka, <topic>, <partition>] with
> > producer." messages. We're not seeing any of the usual messages about
> > instantiating consumers; all we're seeing is this:
> >
> > 2020-04-09T19:24:14.968Z Validating offset 0 for topic and partition
> > [__samza_checkpoint_ver_1_for_<job_name>_1,0]
> > 2020-04-09T19:24:14.969Z Able to successfully read from offset 0 for
> > topic and partition [__samza_checkpoint_ver_1_for_<job_name>_1,0]. Using
> > it to instantiate consumer.
> > 2020-04-09T19:24:14.974Z Reading checkpoint for taskName
> > SystemStreamPartition [kafka, <topic>, <partition>]
> >
> > ...and that's as far as it will go. I am also noticing that when using
> > different versions of librdkafka, I *can* read the checkpoint stream with
> > librdkafka 1.3.0, but *not* with librdkafka 0.8.6. Could there be a
> > version incompatibility with how the data is being stored on the Kafka
> > server? We're running Samza 0.14.1 and using AWS MSK, which is running
> > Kafka 2.2.1.
> >
> >
> > Thanks so much,
> > Malcolm McFarland
> > Cavulus
> >
>
