In my use case my source stream contain small size messages, but as part of flink processing I will be aggregating them into large messages and further processing will happen on these large messages. The structure of this large message will be something like this:
Class LargeMessage { String key List <String> messages; // this is where the aggregation of smaller messages happen } In some cases this list field of LargeMessage can get very large (1000’s of messages). Is it ok to create an intermediate stream of these LargeMessages? What should I be concerned about while designing the flink job? Specifically with parallelism in mind. As these LargeMessages flow from one flink subtask to another, do they get serialized/deserialized ? Thanks.