stream keyBy without repartition

Bart Wyatt Tue, 24 May 2016 07:23:05 -0700

(migrated from IRC)


Hello All,


My situation is this:

I have a large amount of data partitioned in kafka by "session" (natural 
partitioning).  After I read the data, I would like to do as much as possible 
before incurring re-serialization or network traffic due to the size of the 
data.  I am on 1.0.3 in the java API.


What I'd like to do is:


while maintaining the natural partitioning (so that a single thread can perform 
this) read data from kafka, perform a window'd fold over the incoming data 
keyed by a _different_ field("key") then take the product of that window'd fold 
and allow re-partitioning to colocate data with equivalent keys in a new 
partitioning scheme where they can be reduced into a final product.  The hope 
is that the products of such a windowed fold are orders of magnitude smaller 
than the data that would be serialized/sent if we re-partitioned before the 
window'd fold.


Is there a way to .keyBy(...) such that it will act within the physical 
partitioning of the data and not force a  re-partitioning of the data by that 
key?


thanks

-Bart


________________________________
This e-mail may contain CONFIDENTIAL AND PROPRIETARY INFORMATION and/or 
PRIVILEGED AND CONFIDENTIAL COMMUNICATION intended solely for the recipient 
and, therefore, may not be retransmitted to any party outside of the 
recipient's organization without the prior written consent of the sender. If 
you have received this e-mail in error please notify the sender immediately by 
telephone or reply e-mail and destroy the original message without making a 
copy. Deep Silver, Inc. accepts no liability for any losses or damages 
resulting from infected e-mail transmissions and viruses in e-mail attachments.

stream keyBy without repartition

Reply via email to