Hey folks, we are having quite a bit of trouble modelling the flow of data through a very Kafka-centric system.
As I understand it, every stream you might want to join with another must be partitioned the same way. But streams at the edges of a system often *cannot* be partitioned that way, because they don't carry the partition key yet; frequently the whole job of the process is to find that key in a lookup table based on some other key we don't control. We have come up with a few solutions, but every one of them adds complexity and backs our designs into a corner. What is frustrating is that most of the data is not really that big; we only have a handful of topics that we expect to need serious throughput.

Is this just unavoidable complexity associated with scale, or am I thinking about this the wrong way? We're going all in on the "turning the database inside out" architecture, but we end up spending more time thinking about how work gets broken up into tasks and distributed than about our business. Do these problems seem familiar to anyone else? Did you find any patterns that helped keep the complexity down?
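For concreteness, here is roughly the shape of one of our current workarounds, as a minimal Kafka Streams sketch. The topic names (`edge-events`, `id-lookup`, `edge-events-rekeyed`) and the pipe-delimited value encoding are placeholders, not our real schema. The lookup lives in a `GlobalKTable`, which is fully replicated to every instance, so that join itself doesn't require co-partitioning; we then re-key and repartition so downstream joins line up:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Repartitioned;

import java.util.Properties;

public class EdgeRekeyApp {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Edge stream keyed by an external id we don't control.
        KStream<String, String> edge =
            builder.stream("edge-events", Consumed.with(Serdes.String(), Serdes.String()));

        // Lookup table mapping external id -> internal partition key.
        // A GlobalKTable is replicated to every instance, so this join
        // does NOT require the two topics to be co-partitioned.
        GlobalKTable<String, String> lookup =
            builder.globalTable("id-lookup", Consumed.with(Serdes.String(), Serdes.String()));

        edge
            // Join on the external id to pull the internal key into the value.
            .join(lookup,
                  (externalId, value) -> externalId,            // selector into the lookup
                  (value, internalKey) -> internalKey + "|" + value)
            // Re-key the stream by the internal key we just looked up...
            .selectKey((externalId, enriched) -> enriched.split("\\|", 2)[0])
            // ...drop the temporary key prefix from the value...
            .mapValues(enriched -> enriched.split("\\|", 2)[1])
            // ...and force a repartition topic so downstream joins against
            // internally-keyed streams are properly co-partitioned.
            .repartition(Repartitioned.with(Serdes.String(), Serdes.String())
                                      .withName("edge-events-by-internal-key"))
            .to("edge-events-rekeyed", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edge-rekey");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```

This works, but it's exactly the kind of thing I mean: one extra repartition topic and one fully replicated lookup table per edge stream, and the topology grows every time we add a source.

Cheers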