Hi Andy,

Could you share a bit more information or some pseudocode so we can understand the scenario better, especially around the streams at the edges? How are they created, and what is the join meant to do?
Thanks
Eno

> On 3 Sep 2016, at 02:43, Andy Chambers <achambers.h...@gmail.com> wrote:
>
> Hey Folks,
>
> We are having quite a bit of trouble modelling the flow of data through a very
> Kafka-centric system.
>
> As I understand it, every stream you might want to join with another must
> be partitioned the same way. But often streams at the edges of a system
> *cannot* be partitioned the same way, because they don't have the partition
> key yet (often the work for this process is to find the key in some lookup
> table based on some other key we don't control).
>
> We have come up with a few solutions, but everything seems to add complexity
> and backs our designs into a corner.
>
> What is frustrating is that most of the data is not really that big, but we
> have a handful of topics we expect to require a lot of throughput.
>
> Is this just unavoidable complexity associated with scale, or am I thinking
> about this in the wrong way? We're going all in on the "turning the
> database inside out" architecture, but we end up spending more time thinking
> about how stuff gets broken up into tasks and distributed than we do about
> our business.
>
> Do these problems seem familiar to anyone else? Did you find any patterns
> that helped keep the complexity down?
>
> Cheers
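For readers following the thread: the co-partitioning problem Andy describes is usually handled by re-keying the edge stream against the lookup table before the join (the `selectKey`-then-repartition pattern in the Kafka Streams DSL). The sketch below simulates that pattern in plain Python rather than real Kafka Streams code; all names (`external_id`, `canonical_key`, the sample records) are illustrative assumptions, not from the original mail.

```python
NUM_PARTITIONS = 4

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Deterministic partitioner: the same key always lands on the same partition."""
    return hash(key) % num_partitions

# Lookup table: an external key we don't control -> the partition key we need.
lookup = {"ext-17": "cust-1", "ext-42": "cust-2"}

# Edge stream keyed by the external id; it cannot be co-partitioned as-is.
edge_stream = [("ext-17", {"amount": 10}), ("ext-42", {"amount": 5})]

# Internal stream already keyed by the canonical key.
internal_stream = [("cust-1", {"region": "EU"}), ("cust-2", {"region": "US"})]

# Step 1: re-key the edge stream using the lookup table
# (what selectKey/map does in the Streams DSL).
rekeyed = [(lookup[k], v) for k, v in edge_stream]

# Step 2: because both streams now share the same key space and partitioner,
# matching records land on the same partition, so a per-partition join works.
internal_by_key = dict(internal_stream)
joined = [(k, {**v, **internal_by_key[k]}) for k, v in rekeyed]
```

In real Kafka Streams, the re-key triggers a repartition topic behind the scenes, which is exactly the extra moving part (and throughput cost) being discussed here.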