So, the general scalability approach with Kafka is to add more partitions.
If you are using consumer groups and the High Level Consumer API,
redistribution of partitions is automatic when a member of a consumer
group fails over. But the High Level Consumer doesn't offer a
configuration to split up a single partition among consumers, as is noted here:
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
There isn't really any way for multiple separate clients on separate JVMs
to coordinate their consumption of a single partition efficiently. So
the solution is simply to break a topic into enough partitions that a
single partition is a reasonable unit to scale a consumer by. If a
consumer can only handle a single partition, or worse, is falling behind,
your partitions are too large and the partition count needs to be increased.

And if for some reason a process hangs on a partition, kill it and start
up a new one. Provided partitions are a reasonable unit of scale, that
shouldn't be a problem. There will be a latency spike, but that's better
than starvation. You can split processing of a single partition fairly
easily within a JVM: the Kafka consuming runnable can put messages into a
concurrent queue of some sort, and a large thread pool can pull from that
queue to do the processing. That way, if a thread in the pool gets hung,
there are many others left to consume from the queue, so nothing gets
stuck. But this adds some risk on failover, because of how Kafka does
offset management for the High Level Consumer.
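The queue-plus-thread-pool split described above can be sketched in plain
Java. This is only an illustration of the pattern: the Kafka consuming
runnable is stubbed out as a simple feed loop, and the class and method
names (FanOutWorkers, drain) are my own, not Kafka APIs.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class FanOutWorkers {

    // Feed messageCount messages through a bounded queue into a pool of
    // `workers` threads; returns how many messages were processed.
    public static int drain(int messageCount, int workers) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64);
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(messageCount);

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String msg = queue.take();   // blocks until work arrives
                        processed.incrementAndGet(); // real processing would go here
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // pool is shutting down
                }
            });
        }

        // In a real application this loop would be the Kafka consuming
        // runnable iterating its message stream and handing each message off.
        for (int i = 0; i < messageCount; i++) {
            queue.put("message-" + i); // blocks if the whole pool falls behind
        }

        done.await();
        pool.shutdownNow();
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("processed " + drain(1000, 8) + " messages");
    }
}
```

Because the queue is bounded, the feeding thread blocks when every worker
is busy, so one hung worker slows nothing down as long as others remain.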

So, I don't think that sending backoff messages to a producer to let up on
a partition is a good design pattern for Kafka. Again, the solution is
more partitions. Offset data is stored in either Kafka or ZooKeeper,
depending on your configuration, and it can tell you how many messages
your consumer is behind by. But since published messages should be
evenly distributed across all partitions for a topic, all partitions
should be lagging equally.
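To see that lag from the command line, a sketch assuming a 0.8.x-era Kafka
installation with ZooKeeper-based offsets (tool name and flags vary by
version; mygroup and mytopic are placeholders, and this needs a live
cluster to run against):

```shell
# Prints, per partition, the log-end offset, the group's committed
# offset, and the difference (lag) between them.
bin/kafka-consumer-offset-checker.sh \
  --zookeeper localhost:2181 \
  --group mygroup \
  --topic mytopic
```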

If you need a true unified queue, RabbitMQ might be right for your needs.
But if order doesn't matter at all, Kafka should give you more throughput
with enough partitions. And since order doesn't matter, you have a lot of
flexibility here.

Also, another option to doing everything in a native Java client is to use
a Spark application. It makes fanning out your data very easy, and it has
some semantics that make it well suited to some of these concerns.



On 9/10/15, 9:54 AM, "Reza Aliakbari" <raliakb...@gmail.com> wrote:

>Hi Everybody,
>
>I have 2 questions regarding the way consumers consume messages of a
>partition.
>
>   - *Is it possible to configure Kafka to allow concurrent message
>   consumption from one partition? The order is not my concern at all.*
>
>           I couldn't find any way to do that with the consumer group
>approach. If it is possible, please let me know; if it is impossible,
>then let me know how to address this problem: for some reason, a consumer
>that is assigned to a partition could get very slow, and its messages
>would be processed very slowly. How can I detect this and stop producing
>to this slow partition?
>
>   - *Suppose I have 5 partitions and 3 consumers, and I am using the
>   consumer group model (I had 5 consumers at the start, but 2 servers
>   crashed). The 3 consumers are kept busy with their 3 partitions and
>   never finish, since the producer produces to their partitions non-stop
>   and a little faster than they can consume. What happens to the other 2
>   partitions that are missing consumers? How can the consumer group
>   handle this issue?*
>
>*Order does not matter to me. I need a simple configuration that
>addresses my concurrency needs, and I need to make sure no message gets
>into a starvation scenario where it is never consumed.*
>
>Please let us know; we want to select between Kafka and RabbitMQ, and we
>prefer Kafka because of its growing community and high throughput, but
>first we need to address these basic needs.
>
>
>Thanks,
>
>Reza Aliakbari
