Hi Pushkar. Just for your information, https://github.com/line/decaton is a Kafka consumer framework that supports parallel processing within a single partition.
It tracks the committable offset internally (i.e. the offset below which all preceding offsets have been processed), so it preserves at-least-once semantics even when records are processed in parallel.

On Tue, Nov 24, 2020 at 1:16, Pushkar Deole <pdeole2...@gmail.com> wrote:
> Thanks Liam!
> We don't have a requirement to maintain order of processing for events even
> within a partition. Essentially, these are events for various accounts
> (customers) that we want to support and do necessary database provisioning
> for those in our database. So they can be processed in parallel.
>
> I think the 2nd option would suit our requirement to have a single consumer
> and a bounded thread pool for processors. However, the issue we may face is
> committing the offsets only after processing an event, since we don't want
> the consumer to auto commit offsets before the provisioning is done for the
> customer. How can that be achieved with model #2?
>
> On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <liam.cla...@adscale.co.nz> wrote:
>
> > Hi Pushkar,
> >
> > No. You'd need to combine a consumer with a thread pool or similar as you
> > prefer. As the docs say (from
> > https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html)
> >
> > > We have intentionally avoided implementing a particular threading model for
> > > processing. This leaves several options for implementing multi-threaded
> > > processing of records.
> > >
> > > 1. One Consumer Per Thread
> > > A simple option is to give each thread its own consumer instance. Here are
> > > the pros and cons of this approach:
> > >
> > > - *PRO*: It is the easiest to implement
> > > - *PRO*: It is often the fastest as no inter-thread co-ordination is
> > >   needed
> > > - *PRO*: It makes in-order processing on a per-partition basis very
> > >   easy to implement (each thread just processes messages in the order it
> > >   receives them).
> > > - *CON*: More consumers means more TCP connections to the cluster (one
> > >   per thread). In general Kafka handles connections very efficiently so
> > >   this is generally a small cost.
> > > - *CON*: Multiple consumers means more requests being sent to the
> > >   server and slightly less batching of data which can cause some drop in
> > >   I/O throughput.
> > > - *CON*: The number of total threads across all processes will be
> > >   limited by the total number of partitions.
> > >
> > > 2. Decouple Consumption and Processing
> > > Another alternative is to have one or more consumer threads that do all
> > > data consumption and hands off ConsumerRecords
> > > <https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html>
> > > instances to a blocking queue consumed by a pool of processor threads
> > > that actually handle the record processing. This option likewise has
> > > pros and cons:
> > >
> > > - *PRO*: This option allows independently scaling the number of
> > >   consumers and processors. This makes it possible to have a single
> > >   consumer that feeds many processor threads, avoiding any limitation on
> > >   partitions.
> > > - *CON*: Guaranteeing order across the processors requires particular
> > >   care as the threads will execute independently an earlier chunk of
> > >   data may actually be processed after a later chunk of data just due to
> > >   the luck of thread execution timing. For processing that has no
> > >   ordering requirements this is not a problem.
> > > - *CON*: Manually committing the position becomes harder as it
> > >   requires that all threads co-ordinate to ensure that processing is
> > >   complete for that partition.
> > >
> > > There are many possible variations on this approach.
> > > For example each processor thread can have its own queue, and the
> > > consumer threads can hash into these queues using the TopicPartition to
> > > ensure in-order consumption and simplify commit.
> >
> > Cheers,
> >
> > Liam Clarke-Hutchinson
> >
> > On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pdeole2...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Is there any configuration in kafka consumer to specify multiple threads
> > > the way it is there in kafka streams?
> > > Essentially, can we have a consumer with multiple threads where the
> > > threads would divide partitions of topic among them?

--
========================
Okada Haruki
ocadar...@gmail.com
========================
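To make the "committable offset" idea concrete for model #2, here is a minimal sketch in plain Java (no Kafka client involved), assuming one tracker per assigned partition, a single polling thread, and a worker pool. PartitionOffsetTracker and CommittableOffsetDemo are hypothetical names for illustration only; this is not how Decaton is implemented and not a Kafka API. The tracker remembers which dispatched offsets are still in flight; the committable offset is the lowest in-flight offset, or the highest dispatched offset + 1 when nothing is in flight, so an event is never committed before its provisioning has finished, even when workers complete out of order.

```java
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not a Kafka or Decaton API): tracks the records of ONE
 * partition that were handed to worker threads and reports the committable
 * offset, i.e. an offset such that every record below it has been processed.
 * Methods are synchronized because workers call completed() concurrently
 * while the polling thread calls dispatched() and committableOffset().
 */
class PartitionOffsetTracker {
    private final TreeSet<Long> inFlight = new TreeSet<>(); // dispatched, not yet done
    private long highestDispatched = -1L;                   // -1 means nothing dispatched yet

    /** Polling thread: call BEFORE handing the record to a worker. */
    synchronized void dispatched(long offset) {
        inFlight.add(offset);
        highestDispatched = Math.max(highestDispatched, offset);
    }

    /** Worker thread: call once the record is fully processed. */
    synchronized void completed(long offset) {
        inFlight.remove(offset);
    }

    /**
     * Offset to commit (Kafka convention: the next offset to read),
     * or -1 if nothing has been dispatched yet.
     */
    synchronized long committableOffset() {
        if (highestDispatched < 0) {
            return -1L;
        }
        // Lowest unfinished offset: every record before it is done.
        // Nothing in flight: everything up to highestDispatched is done.
        return inFlight.isEmpty() ? highestDispatched + 1 : inFlight.first();
    }
}

class CommittableOffsetDemo {
    /** Hypothetical demo: processes offsets [0, count) on a pool, returns the committable offset. */
    static long processAll(int count, int threads) {
        PartitionOffsetTracker tracker = new PartitionOffsetTracker();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (long offset = 0; offset < count; offset++) {
            final long o = offset;
            tracker.dispatched(o);                   // recorded before the worker can finish
            pool.submit(() -> tracker.completed(o)); // stands in for the DB provisioning
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return tracker.committableOffset();
    }

    public static void main(String[] args) {
        // Out-of-order completion: offset 11 finishes before offset 10.
        PartitionOffsetTracker t = new PartitionOffsetTracker();
        t.dispatched(10);
        t.dispatched(11);
        t.dispatched(12);
        t.completed(11);
        System.out.println(t.committableOffset()); // 10: offset 10 is still in flight
        t.completed(10);
        System.out.println(t.committableOffset()); // 12: 10 and 11 done, 12 in flight
        t.completed(12);
        System.out.println(t.committableOffset()); // 13: everything dispatched is done

        System.out.println(processAll(100, 4));    // 100
    }
}
```

Wiring this into a KafkaConsumer is not shown or compiled here, but one way would be: set enable.auto.commit=false, call dispatched(record.offset()) in the poll loop before submitting each record to the pool, have the worker call completed(record.offset()) after provisioning succeeds, and periodically call consumer.commitSync(Map<TopicPartition, OffsetAndMetadata>) from the polling thread (KafkaConsumer is not safe for multi-threaded access) with new OffsetAndMetadata(offset) for each partition whose committable offset is non-negative. With a bounded pool, the polling thread should keep calling poll() (pausing the partitions with pause()/resume() while the pool is full) so it stays within max.poll.interval.ms. Records above the lowest unfinished offset may already be done when a crash or rebalance happens and will then be delivered again, so the provisioning step should be idempotent; that is the usual at-least-once trade-off.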