Hey folks, we are having quite a bit of trouble modelling the flow of data through a very Kafka-centric system.
As I understand it, every stream you might want to join with another must be partitioned the same way. But streams at the edges of a system often *cannot* be partitioned that way, because they don't carry the partition key yet; frequently the whole job of the process is to find that key in a lookup table based on some other key we don't control. We have come up with a few solutions, but every one of them adds complexity and backs our designs into a corner. What is frustrating is that most of the data is not really that big; we only have a handful of topics that we expect to need serious throughput.

Is this just unavoidable complexity associated with scale, or am I thinking about this the wrong way? We're going all in on the "turning the database inside out" architecture, but we end up spending more time thinking about how work gets broken up into tasks and distributed than about our business. Do these problems seem familiar to anyone else? Did you find any patterns that helped keep the complexity down?
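For concreteness, here is roughly the shape of one of our current workarounds, as a minimal Kafka Streams sketch. The topic names (`edge-events`, `id-lookup`, `edge-events-rekeyed`) and the pipe-delimited value encoding are placeholders, not our real schema. The lookup lives in a `GlobalKTable`, which is fully replicated to every instance, so that join itself doesn't require co-partitioning; we then re-key and repartition so downstream joins line up:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Repartitioned;

import java.util.Properties;

public class EdgeRekeyApp {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Edge stream keyed by an external id we don't control.
        KStream<String, String> edge =
            builder.stream("edge-events", Consumed.with(Serdes.String(), Serdes.String()));

        // Lookup table mapping external id -> internal partition key.
        // A GlobalKTable is replicated to every instance, so this join
        // does NOT require the two topics to be co-partitioned.
        GlobalKTable<String, String> lookup =
            builder.globalTable("id-lookup", Consumed.with(Serdes.String(), Serdes.String()));

        edge
            // Join on the external id to pull the internal key into the value.
            .join(lookup,
                  (externalId, value) -> externalId,            // selector into the lookup
                  (value, internalKey) -> internalKey + "|" + value)
            // Re-key the stream by the internal key we just looked up...
            .selectKey((externalId, enriched) -> enriched.split("\\|", 2)[0])
            // ...drop the temporary key prefix from the value...
            .mapValues(enriched -> enriched.split("\\|", 2)[1])
            // ...and force a repartition topic so downstream joins against
            // internally-keyed streams are properly co-partitioned.
            .repartition(Repartitioned.with(Serdes.String(), Serdes.String())
                                      .withName("edge-events-by-internal-key"))
            .to("edge-events-rekeyed", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edge-rekey");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```

This works, but it's exactly the kind of thing I mean: one extra repartition topic and one fully replicated lookup table per edge stream, and the topology grows every time we add a source.

Cheers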