Hi Shekar,

Each Kafka partition is identified by just a number, so you need to specify which partitioner strategy to use when mapping your event key to a partition number. You can take the 4 columns you have in the event and map them to a partition number; the partitioner in that case would be a function along the lines of: (a, b, c, d) -> (int)
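To make the (a, b, c, d) -> (int) idea concrete, here is a minimal, self-contained Java sketch. It is not Kafka's Partitioner interface. The class name CompositeKeyPartitioner and the sample events are hypothetical, and it uses Objects.hash as a stand-in for whatever hash you choose (Kafka's default partitioner hashes the serialized key bytes with murmur2 instead). It also groups a few events by the partition they land in:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch: maps a four-column key (a, b, c, d) to a partition number.
// Uses Objects.hash as an assumed hash; Kafka's default partitioner uses murmur2 on key bytes.
public class CompositeKeyPartitioner {

    // (a, b, c, d) -> int: identical column values always yield the same partition.
    public static int partition(String a, String b, String c, String d, int numPartitions) {
        int hash = Objects.hash(a, b, c, d);
        // floorMod keeps the result in [0, numPartitions) even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Hypothetical events, each with four key columns.
        List<String[]> events = Arrays.asList(
            new String[] {"us", "web", "2015", "click"},
            new String[] {"eu", "app", "2015", "view"},
            new String[] {"us", "web", "2015", "click"});

        // Collect events per partition, roughly what the consumer of each partition would see.
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String[] e : events) {
            int p = partition(e[0], e[1], e[2], e[3], numPartitions);
            byPartition.computeIfAbsent(p, k -> new ArrayList<>()).add(String.join(",", e));
        }
        byPartition.forEach((p, rows) -> System.out.println("partition " + p + ": " + rows));
    }
}
```

Both "us,web,2015,click" events end up in the same partition because the function is deterministic. Different keys can still collide on one partition, so a consumer may need to group by the full key within a partition.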
Once you partition your data across the topic's partitions, each partition holds the subset of the dataset whose keys map to it, which is similar to what an SQL "group by" statement would have done (a single partition can still hold several distinct keys).

Hope that helps,
Dotan

On Thu, Mar 26, 2015 at 5:22 PM, Shekar Tippur <ctip...@gmail.com> wrote:
> Hello,
>
> Want to confirm a basic understanding of Kafka.
> If I have a dataset that needs to be partitioned by 4 columns, then the
> progression is
>
> {topic1:partition_key1} -> {Group by samza on partition_key1}
> ->
> {topic2:partition_key2} -> {Group by samza on partition_key2}
> ->
> {topic3:partition_key3} -> {Group by samza on partition_key3}
> ->
> {topic4:partition_key4} -> {Group by samza on partition_key4}
>
> Can you please confirm if my understanding is right?
>
> - Shekar
>