It's better to derive the partition key from some function f(user) rather than creating a partition per user. That way a partition always holds the same set of users, and any new user is automatically assigned to one of the existing partitions.
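A minimal sketch of such an f(user): hash the user id and take it modulo the partition count, so the mapping is deterministic and new users fall into existing partitions. (This mirrors what Kafka's default partitioner already does when you key messages by user id; the function name and the md5 choice here are illustrative, not Kafka's actual murmur2 implementation.)

```python
import hashlib

def partition_for(user_id: str, num_partitions: int) -> int:
    """Deterministically map a user id to a partition.

    The same user always lands in the same partition, and a brand-new
    user id is simply absorbed by one of the existing partitions --
    no topic changes needed when users sign up.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

In practice you would not call this yourself: producing with `user_id` as the message key gives equivalent behavior via Kafka's built-in key-based partitioning.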
You can probably check https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html for Kafka-to-Spark integration.

On Sun, Dec 22, 2019 at 7:54 PM Girish Vasmatkar <girish.vasmat...@hotwaxsystems.com> wrote:

> Hi All
>
> I have recently subscribed and am fairly new to Kafka, so please pardon me
> if the question sounds too naive!
>
> I'm trying to build a POC on clickstream analysis for logged-in and
> anonymous users for our e-commerce application. I am coming here after
> visiting this thread:
>
> https://stackoverflow.com/questions/32761598/is-it-possible-to-create-a-kafka-topic-with-dynamic-partition-count
>
> I have the following questions in this regard:
>
> 1. I am planning to have as many partitions as the number of users we
>    have in the system, so that clickstream events for individual users go
>    into the dedicated partition for that user. The dedicated partition id
>    is to be derived from the user id. Does this sound like a decent
>    approach? If not, what is the suggested way to go about this?
> 2. If a partition-per-user strategy is good enough, then what happens
>    when a new user signs up and will obviously have a new and unique user
>    id? I am not sure if we can add a new partition to an existing topic?
> 3. This Kafka stream is going to be consumed by a Spark Streaming job
>    (Kafka consumer). How do I set it up so that it gets clickstream events
>    from the Kafka topic for all users (irrespective of the partition id)?
>    In other words, can we have a one-for-all consumer for a topic?
>
> Best,
> Girish
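On question 3: yes, a single consumer subscribed to a topic receives records from all of its partitions, so no per-partition wiring is needed. A rough sketch, assuming Spark Structured Streaming's Kafka source (the broker address and the topic name `clickstream` are placeholders, not values from the thread):

```python
def spark_kafka_options(bootstrap_servers: str, topic: str) -> dict:
    """Build option map for Spark Structured Streaming's Kafka source.

    Subscribing by topic name (rather than assigning individual
    partitions) makes the job read every partition of the topic,
    so it sees clickstream events for all users.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # placeholder broker list
        "subscribe": topic,           # topic-level subscription covers all partitions
        "startingOffsets": "latest",  # or "earliest" to replay existing events
    }

# These options would be fed to the reader roughly like:
#   spark.readStream.format("kafka") \
#        .options(**spark_kafka_options("broker:9092", "clickstream")) \
#        .load()
```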