Thanks, I got the point. That solves my problem.
On Wed, Oct 5, 2016 at 10:58 PM, Matthias J. Sax <matth...@confluent.io> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > Hi, > > even if you have more distinct keys than partitions (ie, different key > go to the same partition), if you do "aggregate by key" Streams will > automatically separate the keys and compute an aggregate per key. > Thus, you do not need to worry about which keys is hashed to what > partition. > > - -Matthias > > On 10/5/16 1:37 PM, Adrienne Kole wrote: > > Hi, > > > > @Ali IMO, Yes. That is the job of kafka server to assign kafka > > instances partition(s) to process. Each instance can process more > > than one partition but one partition cannot be processed by more > > than one instance. > > > > @Michael, Thanks for reply. > >> Rather, pick the number of partitions in a way that matches your > >> needs to > > process the data in parallel I think this should be ' pick number > > of partitions that matches max number of possible keys in stream to > > be partitioned '. At least in my usecase , in which I am trying to > > partition stream by key and make windowed aggregations, if there > > are less number of topic partitions than possible keys, then > > application will not work correctly. > > > > That is, if the number of topic partitions is less than possible > > stream keys, then different keyed stream tuples will be assigned to > > same topic. That was the problem that I was trying to solve and it > > seems the only solution is to estimate max number of possible keys > > and assign accordingly. > > > > Thanks Adrienne > > > > > > > > > > > > On Wed, Oct 5, 2016 at 5:55 PM, Ali Akhtar <ali.rac...@gmail.com> > > wrote: > > > >>> It's often a good > >> idea to over-partition your topics. For example, even if today > >> 10 machines (and thus 10 partitions) would be sufficient, pick a > >> higher number of partitions (say, 50) so you have some wiggle > >> room to add more machines (11...50) later if need be. > >> > >> If you create e.g 30 partitions, but only have e.g 5 instances of > >> your program, all on the same consumer group, all using kafka > >> streams to consume the topic, do you still receive all the data > >> posted to the topic, or will you need to have the same instances > >> of the program as there are partitions? > >> > >> (If you have 1 instance, 30 partitions, will the same rules > >> apply, i.e it will receive all data?) > >> > >> On Wed, Oct 5, 2016 at 8:52 PM, Michael Noll > >> <mich...@confluent.io> wrote: > >> > >>>> So, in this case I should know the max number of possible > >>>> keys so that I can create that number of partitions. > >>> > >>> Assuming I understand your original question correctly, then > >>> you would > >> not > >>> need to do/know this. Rather, pick the number of partitions in > >>> a way > >> that > >>> matches your needs to process the data in parallel (e.g. if you > >>> expect > >> that > >>> you require 10 machines in order to process the incoming data, > >>> then you'd need 10 partitions). Also, as a general > >>> recommendation: It's often a > >> good > >>> idea to over-partition your topics. For example, even if today > >>> 10 > >> machines > >>> (and thus 10 partitions) would be sufficient, pick a higher > >>> number of partitions (say, 50) so you have some wiggle room to > >>> add more machines (11...50) later if need be. > >>> > >>> > >>> > >>> On Wed, Oct 5, 2016 at 9:34 AM, Adrienne Kole > >>> <adrienneko...@gmail.com> wrote: > >>> > >>>> Hi Guozhang, > >>>> > >>>> So, in this case I should know the max number of possible > >>>> keys so that > >> I > >>>> can create that number of partitions. > >>>> > >>>> Thanks > >>>> > >>>> Adrienne > >>>> > >>>> On Wed, Oct 5, 2016 at 1:00 AM, Guozhang Wang > >>>> <wangg...@gmail.com> > >>> wrote: > >>>> > >>>>> By default the partitioner will use murmur hash on the key > >>>>> and mode > >> on > >>>>> current num.partitions to determine which partitions to go > >>>>> to, so > >>> records > >>>>> with the same key will be assigned to the same partition. > >>>>> Would that > >> be > >>>> OK > >>>>> for your case? > >>>>> > >>>>> > >>>>> Guozhang > >>>>> > >>>>> > >>>>> On Tue, Oct 4, 2016 at 3:00 PM, Adrienne Kole < > >> adrienneko...@gmail.com > >>>> > >>>>> wrote: > >>>>> > >>>>>> Hi, > >>>>>> > >>>>>> From Streams documentation, I can see that each Streams > >>>>>> instance is processing data independently (from other > >>>>>> instances), reads from > >>> topic > >>>>>> partition(s) and writes to specified topic. > >>>>>> > >>>>>> > >>>>>> So here, the partitions of topic should be determined > >>>>>> beforehand > >> and > >>>>> should > >>>>>> remain static. In my usecase I want to create > >>>>>> partitioned/keyed (time) windows and aggregate them. I > >>>>>> can partition the incoming data to specified topic's > >>>>>> partitions > >> and > >>>>> each > >>>>>> Stream instance can do windowed aggregations. > >>>>>> > >>>>>> However, if I don't know the number of possible keys (to > >> partition), > >>>> then > >>>>>> what should I do? > >>>>>> > >>>>>> Thanks Adrienne > >>>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- -- Guozhang > >>>>> > >>>> > >>> > >> > > > -----BEGIN PGP SIGNATURE----- > Comment: GPGTools - https://gpgtools.org > > iQIcBAEBCgAGBQJX9Wl2AAoJECnhiMLycopPVNQQAJnLVIEFTWRdUY41jLEjHEdJ > Nwqk/M/VrZ3/s8BR9+XKKN+lTd+lQaBFgQUxyae18kIchnEe5r+QB+PoDB4IkTV8 > zS6XhDTr7RwiHdhykGK9bKxhF/0gAiQ4qFu8iBlmwTfH3mOSDgY76z4/wQVnS7Sf > C1/s2ubvQFgEp0W1OOiTAy2uYhPkeskLjHpFL7Nxc19zGy4a8IeHFo2r1CYCsJHJ > VBOsLaBgstICTcWnx1lJBjqwhqlXPPo4+dOo+e6h71vuHhFMePhsPuxHQ9nBVKw/ > 0S0X4m+fB2FInx9XOG9rHA3nYvK5zr5eijKMNGGdJfU9lItcM5nhnEnPOI1QLnak > rrAgwbdeUlv0clo04tAyaxGxz2/F0Z5S3xJa1M5vvAd5895jeKdh1l7UdByQWA5R > BTkYWodEZ01Zn6fqHkhR5tsWzKLfvFr2bXps/21WzpC90bJK4snUXSs97ugVdT0U > UgngxEeD9566EENIFzF2HGOrkZd74B5sEs4p5Tp16JhzOydnv9xGGOfxDJXwr0lh > 5TBcKRqF/998zyil7UOFFecvR7DUYDc/pJIJVffRo7DyjvkOCK1OYBBQB50JTh3s > blMCHsNu7iXDbRocLT2EigkqKZtQ5w4Xm7e3pEkqQJ/KOnmvJsbg4JFPRC3sw+7X > h+bHtn7Nbc7HCUhho4nJ > =9zvn > -----END PGP SIGNATURE----- >