Is there a typo below? Are all of these actually in the same topic, just different partitions? Partitioning, AFAIK, is mainly done for parallelism & throughput reasons. What is the reason for partitioning your dataset by ‘columns’?
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic? Richard > On Mar 26, 2015, at 8:22 AM, Shekar Tippur <ctip...@gmail.com> wrote: > > Hello, > > Want to confirm a basic understanding of Kafka. > If I have a dataset that needs to be partitioned by 4 columns, then the > progression is > > {topic1:partition_key1} -> {Group by samza on partition_key1} > -> > {topic2:partition_key2} -> {Group by samza on partition_key2} > -> > {topic3:partition_key3} -> {Group by samza on partition_key3} > -> > {topic4:partition_key4} -> {Group by samza on partition_key4} > > Can you please confirm if my understanding is right? > > - Shekar ________________________________ This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.