Hi Shekar,

Please refer to [1]. You can set a custom partitioner through the producer config (the partitioner.class property). You will have to implement your own partitioner based on your application and partitioning strategy.
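As a rough illustration of the partitioning logic such a partitioner could contain, here is a minimal sketch. It assumes the three columns from your feed (user_id, ethnicity, location) are joined into one composite key and hashed onto the partition count. The class name, method names, separator character, and the use of String.hashCode() are all hypothetical choices of mine, not anything Kafka prescribes. The sketch deliberately leaves out the Kafka dependency; in a real producer this logic would sit inside the partition method of a class implementing kafka.producer.Partitioner, as in the example at [1].

```java
// Hypothetical helper, not part of Kafka: maps a composite key to a partition id.
final class CompositeKeyPartitioner {

    private CompositeKeyPartitioner() {}

    // Join the fields with a control character that is unlikely to occur in the
    // data, so that ("ab", "c", ...) and ("a", "bc", ...) yield different keys.
    static String compositeKey(String userId, String ethnicity, String location) {
        return String.join("\u0001", userId, ethnicity, location);
    }

    // Deterministically map a key to a partition id in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Clear the sign bit instead of calling Math.abs, which returns a
        // negative value for Integer.MIN_VALUE.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

With 10 partitions, every message carrying the same (user_id, ethnicity, location) combination lands in the same partition, so each partition holds a subset of the data. A single partition will still contain many different combinations, so any grouping downstream has to be done per key rather than per partition.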
Thanks,
Milinda

[1] https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example

On Thu, Mar 26, 2015 at 2:25 PM, Shekar Tippur <ctip...@gmail.com> wrote:

> So if I have a feed with
>
> {user_id:12345,
> ethnicity: asian,
> location: "cerritos, ca",
> Height:"5.9",
> weight: "150 lbs"}
>
> I am referring to https://kafka.apache.org/081/ops.html#topic-config
>
> How do I map the 3 columns - (user_id, ethnicity, and location) to a
> partition id. If I map it this way and say create 10 partitions, each
> partition will contain a subset of data grouped by these columns - right?
>
> - Shekar
>
> On Thu, Mar 26, 2015 at 9:38 AM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Hi Richard,
> >
> > You can also partition by a key like "user_id" so that all messages for a
> > given user would end up in the same partition. This can be useful for
> > calculating user-specific aggregations or doing a distributed join where
> > the local state is also partitioned on user_id.
> >
> > Cheers,
> >
> > Roger
> >
> > On Thu, Mar 26, 2015 at 9:28 AM, Richard Lee <rd...@tivo.com> wrote:
> >
> > > Is there a typo below? Are all of these actually in the same topic, just
> > > different partitions? Partitioning, AFAIK, is mainly done for parallelism
> > > & throughput reasons. What is the reason for partitioning your dataset by
> > > ‘columns’?
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic
> > > ?
> > >
> > > Richard
> > >
> > > > On Mar 26, 2015, at 8:22 AM, Shekar Tippur <ctip...@gmail.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Want to confirm a basic understanding of Kafka.
> > > > If I have a dataset that needs to be partitioned by 4 columns, then the
> > > > progression is
> > > >
> > > > {topic1:partition_key1} -> {Group by samza on partition_key1}
> > > > ->
> > > > {topic2:partition_key2} -> {Group by samza on partition_key2}
> > > > ->
> > > > {topic3:partition_key3} -> {Group by samza on partition_key3}
> > > > ->
> > > > {topic4:partition_key4} -> {Group by samza on partition_key4}
> > > >
> > > > Can you please confirm if my understanding is right?
> > > >
> > > > - Shekar

--
Milinda Pathirage
PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University
twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org