Hi Andy,

Could you share a bit more information or some pseudocode so we can understand the scenario better, especially around the streams at the edges? How are they created, and what is the join meant to do?
Thanks
Eno

> On 3 Sep 2016, at 02:43, Andy Chambers <achambers.h...@gmail.com> wrote:
>
> Hey Folks,
>
> We are having quite a bit of trouble modelling the flow of data through a very
> Kafka-centric system.
>
> As I understand it, every stream you might want to join with another must
> be partitioned the same way. But often streams at the edges of a system
> *cannot* be partitioned the same way, because they don't have the partition
> key yet (often the work for this process is to find the key in some lookup
> table based on some other key we don't control).
>
> We have come up with a few solutions, but everything seems to add complexity
> and backs our designs into a corner.
>
> What is frustrating is that most of the data is not really that big, but we
> have a handful of topics we expect to require a lot of throughput.
>
> Is this just unavoidable complexity associated with scale, or am I thinking
> about this in the wrong way? We're going all in on the "turning the
> database inside out" architecture, but we end up spending more time thinking
> about how stuff gets broken up into tasks and distributed than we do about
> our business.
>
> Do these problems seem familiar to anyone else? Did you find any patterns
> that helped keep the complexity down?
>
> Cheers
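For readers following the thread: the co-partitioning problem Andy describes is usually handled by re-keying the edge stream against the lookup table before the join (the `selectKey`-then-repartition pattern in the Kafka Streams DSL). The sketch below simulates that pattern in plain Python rather than real Kafka Streams code; all names (`external_id`, `canonical_key`, the sample records) are illustrative assumptions, not from the original mail.

```python
NUM_PARTITIONS = 4

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Deterministic partitioner: the same key always lands on the same partition."""
    return hash(key) % num_partitions

# Lookup table: an external key we don't control -> the partition key we need.
lookup = {"ext-17": "cust-1", "ext-42": "cust-2"}

# Edge stream keyed by the external id; it cannot be co-partitioned as-is.
edge_stream = [("ext-17", {"amount": 10}), ("ext-42", {"amount": 5})]

# Internal stream already keyed by the canonical key.
internal_stream = [("cust-1", {"region": "EU"}), ("cust-2", {"region": "US"})]

# Step 1: re-key the edge stream using the lookup table
# (what selectKey/map does in the Streams DSL).
rekeyed = [(lookup[k], v) for k, v in edge_stream]

# Step 2: because both streams now share the same key space and partitioner,
# matching records land on the same partition, so a per-partition join works.
internal_by_key = dict(internal_stream)
joined = [(k, {**v, **internal_by_key[k]}) for k, v in rekeyed]
```

In real Kafka Streams, the re-key triggers a repartition topic behind the scenes, which is exactly the extra moving part (and throughput cost) being discussed here.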