Hi Hemant,

Looking up specific records by key is not possible in Kafka itself. As a distributed streaming platform built on the concept of a commit log, Kafka organizes data sequentially: each record has an offset that identifies not what the record is but where it sits in the log (more precisely, within its partition).
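
To make the difference concrete, here is a minimal sketch in plain Java of a single log that is addressed by offset. It is only a toy model, not Kafka client code; `ToyLog`, `append`, `read`, and `scanByKey` are hypothetical names, and the keys and values mirror your xyz example. Reading by offset is direct, while finding everything for a key means scanning the whole log.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of one partition's log. Not the Kafka API.
final class ToyLog {
    // One stored record; the log itself never indexes the key.
    static final class Entry {
        final String key;
        final String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Appends a record and returns its offset (its position in the log).
    long append(String key, String value) {
        entries.add(new Entry(key, value));
        return entries.size() - 1;
    }

    // Reading by offset is plain positional access.
    Entry read(long offset) {
        return entries.get((int) offset);
    }

    // Finding records by key requires scanning from the start of the log.
    List<String> scanByKey(String key) {
        List<String> values = new ArrayList<>();
        for (Entry e : entries) {
            if (e.key.equals(key)) {
                values.add(e.value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog();
        log.append("xyz", "A");
        log.append("abc", "A");
        long offset = log.append("xyz", "B");
        System.out.println("offset of last append: " + offset);
        System.out.println("states for xyz: " + log.scanByKey("xyz"));
    }
}
```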

To implement record lookup by key you would need to use Kafka Streams or ksqlDB. I would recommend ksqlDB, since you can easily create a stream from your existing topic and then turn that stream into a table. Note, however, that ksqlDB currently requires every table that serves pull queries (i.e., queries that look up rows for a given key) to be created using an aggregation construct. So you might need to work that out in order to get the behavior you want.
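
Conceptually, the stream-to-table step folds every event into per-key state, so a lookup becomes a map access instead of a log scan. A minimal plain-Java sketch of that idea follows. It is not ksqlDB or Kafka Streams code; `StateTable`, `onEvent`, and `lookup` are hypothetical names, and the item key and states mirror your xyz example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a table materialized from a stream by aggregating per key.
// Plain Java only; this is not the ksqlDB or Kafka Streams API.
final class StateTable {
    private final Map<String, List<String>> statesByItem = new LinkedHashMap<>();

    // Called once per stream event, in log order; appends the state to the
    // list kept for that item key (the "aggregation").
    void onEvent(String itemKey, String state) {
        statesByItem.computeIfAbsent(itemKey, k -> new ArrayList<>()).add(state);
    }

    // Lookup by key against the materialized state; can be repeated freely.
    List<String> lookup(String itemKey) {
        return List.copyOf(statesByItem.getOrDefault(itemKey, List.of()));
    }

    public static void main(String[] args) {
        StateTable table = new StateTable();
        table.onEvent("xyz", "A");
        table.onEvent("abc", "A");
        table.onEvent("xyz", "B");
        System.out.println("states for xyz: " + table.lookup("xyz"));
    }
}
```

The point of the sketch is only that the per-key state has to be built by consuming the log; in ksqlDB that state would be maintained for you by the aggregation that defines the table.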

Thanks,

-- Ricardo

On 6/19/20 1:07 PM, Hemant Bairwa wrote:
Thanks Ricardo.

I need some information on one more use case.
In my application I need to use Kafka to maintain the different workflow states of message items as they pass through different processes. For example, all messages in my application transit from Process A to Process Z, and I need to keep every processed state for each item. So for item xyz there should be a total of 26 entries in the Kafka topic:
xyz, A
xyz, B... and so on.

Users should be able to retrieve all the messages for any specific key, as many times as needed. That is, a database-like feature is required.

1. Is Kafka alone able to cater to this requirement?
2. Or do I need to use ksqlDB to meet this requirement? I did some research on it, but I don't want to run a separate ksqlDB server.
3. Any other suggestions?

Regards,



On Thu, 18 Jun 2020, 6:51 pm Ricardo Ferreira <rifer...@riferrei.com> wrote:

    Hemant,

    This behavior might be the result of the version of AK (Apache
    Kafka) that you are using. Before AK 2.4 the default behavior of
    the DefaultPartitioner was to load balance data production across
    the partitions as you described. But it was found that this
    behavior hurt the performance of the batching that each producer
    performs. Therefore, AK 2.4 introduced a new behavior into the
    DefaultPartitioner called sticky partitioning. You can read more
    about this change in the KIP that was created for it: KIP-480
    <https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner>.

    The only downside that I see in your workaround is that you are
    handling the assignment to partitions programmatically. That
    makes your code fragile, because if the number of partitions for
    the topic changes, your code will not know about it. Instead,
    just use the RoundRobinPartitioner
    <https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/producer/RoundRobinPartitioner.html>
    explicitly in your producer:

    ```

    configs.put("partitioner.class",
    "org.apache.kafka.clients.producer.RoundRobinPartitioner");

    ```
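
    For comparison, the manual approach only stays correct if the
    partition count is re-read on every send instead of being cached
    at startup. A minimal plain-Java sketch of that selection logic,
    under stated assumptions: `RoundRobinChooser` and `next` are
    hypothetical names, not part of the Kafka client, and in real
    producer code the count would have to come from the producer's
    partition metadata (for example
    `KafkaProducer.partitionsFor(topic).size()`).

    ```java
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical helper for manual round-robin; not a Kafka class.
    final class RoundRobinChooser {
        private final AtomicInteger counter = new AtomicInteger(0);

        // The partition count is passed in on every call so that a change
        // in the topic's partition count is picked up immediately.
        int next(int partitionCount) {
            if (partitionCount <= 0) {
                throw new IllegalArgumentException("partitionCount must be positive");
            }
            // floorMod keeps the result non-negative even after the
            // counter overflows past Integer.MAX_VALUE.
            return Math.floorMod(counter.getAndIncrement(), partitionCount);
        }

        public static void main(String[] args) {
            RoundRobinChooser chooser = new RoundRobinChooser();
            for (int i = 0; i < 5; i++) {
                System.out.println("partition " + chooser.next(3));
            }
        }
    }
    ```

    Even with this in place you would be re-implementing what the
    RoundRobinPartitioner already does, which is why the configuration
    above is the simpler choice.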

    Thanks,

    -- Ricardo

    On 6/18/20 12:38 AM, Hemant Bairwa wrote:
    Hello All

    I have a single producer service which is queuing messages into a topic with,
    let's say, 12 partitions. I want to distribute the messages evenly across all
    the partitions in a round-robin fashion.
    Even after using default partitioning and keeping the key 'NULL', the messages
    are not getting distributed evenly. Rather, some partitions are getting none
    of the messages while some are getting multiple.
    One reason I found for this behaviour, somewhere, is that if there are
    fewer producers than partitions, it distributes the messages to fewer
    partitions to limit the number of open sockets.
    However, I have achieved even distribution through code by first getting
    the total number of partitions and then passing the partition number, in
    incremental order, along with the message into the producer record. Once the
    partition number reaches the last partition, the next partition number is
    reset to zero.

    Query:
    1. Can there be any downside to the approach used above?
    2. If yes, how can I achieve even distribution of messages in an optimized way?
