For a Kafka Streams application, we keep data per team. Data from two teams
never meets, but within a team the data is highly integrated. A team has team
members, but also several types of equipment.
A team has a lifespan of about 1-3 days after which the team is removed and
all data relating to that team should be evicted.
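To make the eviction requirement concrete, here is a minimal plain-Java sketch. It assumes each team is tracked by the timestamp of its last event and is dropped once that timestamp is older than 3 days. `TeamStateEvictor`, `TEAM_LIFESPAN` and the in-memory map are hypothetical stand-ins. In a real topology the same loop would more likely run over a state store from a punctuator scheduled via `ProcessorContext#schedule`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical in-memory stand-in for a team-scoped state store.
class TeamStateEvictor {
    // Assumed upper bound on a team's lifespan (about 1-3 days).
    static final Duration TEAM_LIFESPAN = Duration.ofDays(3);

    // teamId -> timestamp of the latest event seen for that team
    private final Map<String, Instant> lastSeen = new HashMap<>();

    // Keeps the later of the stored and the incoming event time.
    void recordEvent(String teamId, Instant eventTime) {
        lastSeen.merge(teamId, eventTime, (a, b) -> a.isAfter(b) ? a : b);
    }

    // Removes every team whose last event is older than TEAM_LIFESPAN
    // relative to 'now' and returns how many teams were evicted.
    int evictExpired(Instant now) {
        int evicted = 0;
        Iterator<Map.Entry<String, Instant>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Instant> e = it.next();
            if (Duration.between(e.getValue(), now).compareTo(TEAM_LIFESPAN) > 0) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    boolean contains(String teamId) {
        return lastSeen.containsKey(teamId);
    }
}
```

Keying this bookkeeping by team id keeps eviction to a single delete per team. With member-keyed stores, every member and equipment key of the team has to be found and deleted as well.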

How would you partition the data?
- Using the team id as the key for all streams does not seem ideal, because it
means all aggregations happen per team and involve a ser/deser of the entire
team data. Suppose a team has 10 members and only 1 of them is sending events
that need to be aggregated. Even then, every event requires a ser/deser of the
entire aggregated team data. I'm afraid this would result in quite a bit of
overhead.
- Using the user id or equipment id as the key would result in much smaller
aggregations, but it does mean quite a bit of repartitioning when aggregating
and joining users of the same team.
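The overhead in the first option can be estimated with a toy plain-Java sketch. It assumes an aggregate that is simply a map of member id to a running count, serialized with `DataOutputStream`. It counts the bytes rewritten to the store for one event from a single member, once with the aggregate keyed by team and once keyed by member. `AggregateSizeSketch` and its serializer are hypothetical. The sketch ignores the real serdes, the record cache and the matching deserialization cost, so treat it only as a rough proxy.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class AggregateSizeSketch {
    // Toy serializer: entry count, then each member id and its running count.
    static byte[] serialize(Map<String, Long> aggregate) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(aggregate.size());
            for (Map.Entry<String, Long> e : aggregate.entrySet()) {
                out.writeUTF(e.getKey());   // 2-byte length + UTF-8 bytes
                out.writeLong(e.getValue()); // 8 bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Team-keyed: one event from member-0 rewrites the whole team map.
    static int bytesPerUpdateTeamKeyed(int members) {
        Map<String, Long> team = new LinkedHashMap<>();
        for (int i = 0; i < members; i++) {
            team.put("member-" + i, 0L);
        }
        team.merge("member-0", 1L, Long::sum);
        return serialize(team).length;
    }

    // Member-keyed: the same event only rewrites member-0's own aggregate.
    static int bytesPerUpdateMemberKeyed() {
        Map<String, Long> single = new LinkedHashMap<>();
        single.put("member-0", 1L);
        return serialize(single).length;
    }
}
```

With 10 members, `bytesPerUpdateTeamKeyed(10)` returns 184 bytes while `bytesPerUpdateMemberKeyed()` returns 22. The gap grows linearly with team size, which is the trade-off the second option pays for with extra repartitioning.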

I ended up using the second approach, but I wonder whether that was really a
good idea, because the entire streaming logic does become quite involved.

What is your experience with this type of data?

Best regards
Jan
