Re: Clickstream partition design question

2019-12-22 Thread Sachin Mittal
It's better to base the partition key on some f(user). This way a partition will always hold the same set of users, and any new user will get assigned to one of these partitions. You can probably check https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html for Kafka to Spark integr
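A minimal sketch of the f(user) idea, assuming logged-in users are keyed by user ID and anonymous users by a session ID. The names `partition_for`, `user_key` and `NUM_PARTITIONS` are hypothetical, and MD5 is a stand-in hash, not Kafka's actual murmur2-based default partitioner. With a real producer you would normally just set the record key to f(user) and let the partitioner hash it.

```python
import hashlib
from typing import Optional

NUM_PARTITIONS = 12  # hypothetical partition count for the clickstream topic


def user_key(user_id: Optional[str], session_id: str) -> str:
    """Hypothetical f(user): user ID when logged in, session ID when anonymous."""
    return f"user:{user_id}" if user_id else f"anon:{session_id}"


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable key -> partition mapping (MD5 stand-in, not Kafka's murmur2)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Read the first 4 bytes as an unsigned int and reduce modulo the partition count.
    return int.from_bytes(digest[:4], "big") % num_partitions


if __name__ == "__main__":
    # (user_id, session_id) pairs; None marks an anonymous visitor.
    clicks = [("alice", "s1"), (None, "s2"), ("alice", "s3"), ("bob", "s4")]
    for uid, sid in clicks:
        key = user_key(uid, sid)
        print(key, "->", partition_for(key))
```

Because the mapping is a pure function of the key, all of a user's events land in the same partition, but only as long as the partition count stays fixed: adding partitions changes the modulo result and reshuffles users.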

Re: MM2 startup delay

2019-12-22 Thread Vishal Santoshi
And can you share the patch...

On Sun, Dec 22, 2019 at 10:34 PM Vishal Santoshi wrote:
> We also have a large number of topics 1500 plus and in a cross DC
> replication. How do we increase the default timeouts?
>
> On Wed, Dec 11, 2019 at 2:26 PM Ryanne Dolan wrote:
>> Hey Peter. Do you

Re: MM2 startup delay

2019-12-22 Thread Vishal Santoshi
We also have a large number of topics (1,500-plus) and are doing cross-DC replication. How do we increase the default timeouts?

On Wed, Dec 11, 2019 at 2:26 PM Ryanne Dolan wrote:
> Hey Peter. Do you see any timeouts in the logs? The internal scheduler will
> timeout each task after 60 seconds by defa

Clickstream partition design question

2019-12-22 Thread Girish Vasmatkar
Hi all, I have recently subscribed and am fairly new to Kafka, so please pardon me if the question sounds too naive! I'm trying to build a POC on clickstream analysis for logged-in and anonymous users of our e-commerce application. I came here after reading this thread: https://stackoverflow.com/qu