For a Kafka Streams application, we keep data per team. Data from two teams never meets, but within a team the data is highly integrated. A team has members and also several types of equipment. A team has a lifespan of about 1-3 days, after which the team is removed and all data relating to it should be evicted.
How would you partition the data?

- Using the team id as the key for all streams seems less than ideal, because it means every aggregation happens per team and involves deserializing and re-serializing the entire team's data. Suppose a team has 10 members and only one of them is sending events that need to be aggregated: each of those events still requires a serialization round trip of the entire aggregated team state. I'm afraid this would add quite a bit of overhead.
- Using the user id or equipment id as the key would give much smaller aggregates, but it means quite a bit of repartitioning whenever we aggregate or join users of the same team.

I ended up using the second approach, but I wonder whether that was really a good idea, because the streaming logic becomes quite involved.

What is your experience with this type of data?

Best regards,
Jan
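To make the overhead argument for the team-keyed option concrete, here is a toy model in plain Java with no Kafka dependency. It is not Kafka Streams code: `serializedSize` is a hypothetical stand-in for a real Serde, and the member names, team size and event count are made up for illustration. It assumes every event reads the current aggregate from the store and writes the updated one back, and counts the bytes that pass through the serializer when only one of ten members is active.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyGranularitySketch {

    // Hypothetical stand-in for a real Serde: encodes a map of
    // member id -> event count and returns the number of bytes produced.
    static int serializedSize(Map<String, Long> state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(state.size());
            for (Map.Entry<String, Long> e : state.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.size();
    }

    // Team id as key: one aggregate holds every member, so each event pays
    // for reading (deserializing) and writing (serializing) the whole team.
    static long bytesWithTeamKey(int members, int events) {
        Map<String, Long> teamState = new LinkedHashMap<>();
        for (int m = 0; m < members; m++) {
            teamState.put("member-" + m, 0L);
        }
        long total = 0;
        for (int i = 0; i < events; i++) {
            total += serializedSize(teamState);         // read the old aggregate
            teamState.merge("member-0", 1L, Long::sum); // only member-0 is active
            total += serializedSize(teamState);         // write the new aggregate
        }
        return total;
    }

    // Member id as key: each event touches only the active member's aggregate.
    // The same map encoding is reused so the two variants are comparable.
    static long bytesWithMemberKey(int events) {
        Map<String, Long> memberState = new LinkedHashMap<>();
        memberState.put("member-0", 0L);
        long total = 0;
        for (int i = 0; i < events; i++) {
            total += serializedSize(memberState);
            memberState.merge("member-0", 1L, Long::sum);
            total += serializedSize(memberState);
        }
        return total;
    }

    public static void main(String[] args) {
        long team = bytesWithTeamKey(10, 100);
        long member = bytesWithMemberKey(100);
        System.out.println("team-keyed bytes:   " + team);   // 36800
        System.out.println("member-keyed bytes: " + member); // 4400
    }
}
```

With 10 members, the team-keyed variant moves roughly eight times as many bytes (184 vs. 22 bytes per read or write), and the gap grows with team size and with the size of each member's state. The member-keyed variant is cheaper per event, but, as the question notes, it shifts cost into repartitioning whenever team-level joins need the team id as key, which this toy model does not capture.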