Hey all,

We've got a number of Flink jobs deployed and I'm wrestling with how best to collect data lineage information from them. Our most common connector on either end is Kafka, as either a source or a sink. I've found that I can subclass the Kafka record de/serializers and collect topics off of the messages on the way in/out for the jobs that use the DataStream API, but I'm finding it significantly more cumbersome to do the same for the ones going through the SQL connector.
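For reference, this is roughly the shape of what I'm doing on the DataStream side today (just a sketch; LineageTrackingDeserializationSchema and reportTopic are my own placeholder names, and the actual reporting hook is omitted):

import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;

/**
 * Wraps the real deserialization schema and records each source topic it sees.
 * reportTopic() is a placeholder for whatever lineage collector you use.
 */
public class LineageTrackingDeserializationSchema<T> implements KafkaRecordDeserializationSchema<T> {

    private final KafkaRecordDeserializationSchema<T> delegate;
    private final Set<String> seenTopics = ConcurrentHashMap.newKeySet();

    public LineageTrackingDeserializationSchema(KafkaRecordDeserializationSchema<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void open(DeserializationSchema.InitializationContext context) throws Exception {
        delegate.open(context);
    }

    @Override
    public void deserialize(ConsumerRecord<byte[], byte[]> record, Collector<T> out) throws IOException {
        // Note the topic the first time we see it, then hand off to the real schema.
        if (seenTopics.add(record.topic())) {
            reportTopic(record.topic());
        }
        delegate.deserialize(record, out);
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return delegate.getProducedType();
    }

    // Placeholder: in our jobs this pushes the topic name to our lineage collector.
    private void reportTopic(String topic) {
        System.out.println("lineage: consuming from topic " + topic);
    }
}

The sink side looks much the same: a wrapper around KafkaRecordSerializationSchema that reads the topic off the ProducerRecord the delegate returns before handing it back.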
I've taken a step back from that approach and am wondering if maybe there's a passive way I can collect this information by observing the Flink job externally. I've found my source topics sprinkled through the logs, the checkpoint files, and the metric names on the job graph, and if source topics were all I was worried about I think I could collect them from any of those places. I'm having a hard time finding the sink topics by any passive means, though. Is it possible to get that information anywhere? Thanks!
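P.S. In case it clarifies what I mean by "passive": for sources, the sketch below is the kind of thing I had in mind, listing the metric names for a vertex via the standard REST endpoints and grepping the topic out of them. The regex is just a guess at how the KafkaSource names its per-partition metrics, so treat it as illustrative rather than something that works everywhere.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetricTopicScraper {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();
    // e.g. "KafkaSourceReader.topic.my-topic.partition.0.currentOffset"
    private static final Pattern TOPIC_IN_METRIC = Pattern.compile("topic\\.([^.]+)\\.partition");

    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8081"; // JobManager REST address (adjust for your setup)
        String jobId = args[0];                // job id from GET /jobs
        String vertexId = args[1];             // source vertex id from GET /jobs/<jobid>

        // GET /jobs/<jobid>/vertices/<vertexid>/metrics lists the available metric names.
        String body = get(base + "/jobs/" + jobId + "/vertices/" + vertexId + "/metrics");

        Matcher m = TOPIC_IN_METRIC.matcher(body);
        while (m.find()) {
            System.out.println("source topic seen in metrics: " + m.group(1));
        }
    }

    private static String get(String url) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }
}

That covers sources, but I haven't found anything equivalent for sink topics, which is really the gap I'm asking about.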