Hi there,

I am thinking of implementing side outputs in Flink, which currently seems
to lack support for them.
https://cloud.google.com/dataflow/model/par-do#side-outputs

This is useful because it lets a pipeline author redirect corrupted input or
records that hit a code bug to a side stream, or write them to a table and
reconcile afterwards.

After some hacky prototyping, I was able to get it working for simple tests.
Basically, it lets the env register a side output typeInfo, which is passed
to the configuration during graph building. It adds a new transform that is
similar to the selection transform but holds a different input type.
StreamEdge gets a boolean indicating whether it is a side output edge; if
so, the output writer that is created loads the side output type serializer
and emits records only when sideOutput is called.
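
To make the routing idea concrete, here is a minimal sketch in plain Java.
It is not the prototype code and uses no Flink classes: SketchEdge,
SketchCollector and SideOutputSketch are hypothetical names, and parsing
strings into integers is just an illustrative workload.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an edge in the stream graph (not Flink's StreamEdge).
final class SketchEdge<T> {
    final boolean sideOutput;                  // true if this is a side output edge
    final List<T> received = new ArrayList<>();

    SketchEdge(boolean sideOutput) {
        this.sideOutput = sideOutput;
    }
}

// Hypothetical collector: OUT is the main output type, SIDE the side output type.
final class SketchCollector<OUT, SIDE> {
    private final SketchEdge<OUT> mainEdge;
    private final SketchEdge<SIDE> sideEdge;

    SketchCollector(SketchEdge<OUT> mainEdge, SketchEdge<SIDE> sideEdge) {
        // The boolean flag decides which role an edge may play.
        if (mainEdge.sideOutput || !sideEdge.sideOutput) {
            throw new IllegalArgumentException("edge flags do not match their roles");
        }
        this.mainEdge = mainEdge;
        this.sideEdge = sideEdge;
    }

    // Regular records go to the main edge only.
    void collect(OUT record) {
        mainEdge.received.add(record);
    }

    // The side output edge sees a record only when sideOutput is called.
    void sideOutput(SIDE record) {
        sideEdge.received.add(record);
    }
}

final class SideOutputSketch {
    // Parses each input; unparsable (corrupted) inputs are redirected to the side output.
    static String run(List<String> inputs) {
        SketchEdge<Integer> mainEdge = new SketchEdge<>(false);
        SketchEdge<String> sideEdge = new SketchEdge<>(true);
        SketchCollector<Integer, String> out = new SketchCollector<>(mainEdge, sideEdge);
        for (String raw : inputs) {
            try {
                out.collect(Integer.parseInt(raw));
            } catch (NumberFormatException e) {
                out.sideOutput(raw);
            }
        }
        return "main=" + mainEdge.received + " side=" + sideEdge.received;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("1", "oops", "2")));
    }
}
```

With the inputs "1", "oops", "2", the main edge receives 1 and 2 and the
side edge receives only "oops".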

I have a problem passing the side output type as a type parameter to each
data stream. It means every output stream would have to be exposed with two
type parameters. As you can imagine, the API change would be sizable.
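
A rough illustration of why the change spreads, again in plain Java with
made-up names (TwoTypeStream and TwoTypeDemo are hypothetical, not Flink's
DataStream): once the stream type carries a second parameter, every
transformation signature has to carry it along, even when it never touches
the side output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stream type; SIDE is the extra side output type parameter.
final class TwoTypeStream<OUT, SIDE> {
    final List<OUT> main;
    final List<SIDE> side;

    TwoTypeStream(List<OUT> main, List<SIDE> side) {
        this.main = main;
        this.side = side;
    }

    // map only changes OUT, yet SIDE still appears in the return type.
    <R> TwoTypeStream<R, SIDE> map(Function<OUT, R> fn) {
        List<R> mapped = new ArrayList<>();
        for (OUT value : main) {
            mapped.add(fn.apply(value));
        }
        return new TwoTypeStream<>(mapped, side);
    }
}

final class TwoTypeDemo {
    static String run() {
        List<Integer> main = new ArrayList<>();
        main.add(1);
        main.add(2);
        List<String> side = new ArrayList<>();
        side.add("oops");

        TwoTypeStream<Integer, String> parsed = new TwoTypeStream<>(main, side);
        // User code must now spell out both types at every step.
        TwoTypeStream<String, String> labelled = parsed.map(v -> "n" + v);
        return labelled.main + " " + labelled.side;
    }
}
```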

Any suggestions?

Chen
