davidradl commented on code in PR #24618:
URL: https://github.com/apache/flink/pull/24618#discussion_r1613290123


##########
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/lineage/LineageGraph.java:
##########
@@ -20,13 +20,12 @@
 package org.apache.flink.streaming.api.lineage;
 
 import org.apache.flink.annotation.PublicEvolving;
-import org.apache.flink.streaming.api.graph.StreamGraph;
 
 import java.util.List;
 
 /**
- * Job lineage is built according to {@link StreamGraph}, users can get sources, sinks and
- * relationships from lineage and manage the relationship between jobs and tables.
+ * Job lineage graph that users can get sources, sinks and relationships from lineage and manage the

Review Comment:
   > Thanks David for your comments. Yes, the documentation will be added after 
adding the job lineage listener, which is more user-facing. It is planned in 
this Jira: https://issues.apache.org/jira/browse/FLINK-33212. This PR only 
considers source/sink-level lineage. Column-level lineage is not included in 
this work, so internal transformations do not need lineage info for now. Would 
you please elaborate on "I assume a sink could be a source - so could be in 
both current lists"?
   
   Hi Peter, usually we think of lineage assets as the nodes in the lineage 
graph (e.g. in OpenLineage). So the asset could be a Kafka topic, and that 
topic would be used as a source for some flows and a sink for others. I was 
wondering how this fits with lineage at the table level, where there could be 
a table defined as a sink and a table defined as a source on the same Kafka 
topic. I guess when exporting / exposing to OpenLineage, many Flink tables 
referring to the same topic would end up as one OpenLineage node. The natural 
way for Flink to store the lineage is at the table level rather than at the 
asset level. So thinking about it, I think this is fine.
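
   To illustrate the table-level versus asset-level distinction, here is a 
minimal sketch. All names in it (`LineageAssetSketch`, `TableLineage`, `Role`, 
`groupByAsset`, and the `kafka://...` asset strings) are hypothetical and are 
not part of this PR, the Flink lineage API, or OpenLineage. It only shows how 
several table-level entries could collapse into one node per physical asset 
at export time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Flink or OpenLineage.
public class LineageAssetSketch {

    /** Role a Flink table plays in a job. */
    public enum Role { SOURCE, SINK }

    /** Table-level lineage entry: a table, its role, and the asset behind it. */
    public static final class TableLineage {
        public final String tableName;
        public final Role role;
        public final String asset; // assumed identifier, e.g. a Kafka topic

        public TableLineage(String tableName, Role role, String asset) {
            this.tableName = tableName;
            this.role = role;
            this.asset = asset;
        }
    }

    /** Groups table-level entries by asset, so each asset becomes one node. */
    public static Map<String, List<TableLineage>> groupByAsset(List<TableLineage> tables) {
        Map<String, List<TableLineage>> nodes = new LinkedHashMap<>();
        for (TableLineage t : tables) {
            nodes.computeIfAbsent(t.asset, k -> new ArrayList<>()).add(t);
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<TableLineage> tables = Arrays.asList(
                new TableLineage("orders_sink", Role.SINK, "kafka://orders"),
                new TableLineage("orders_source", Role.SOURCE, "kafka://orders"),
                new TableLineage("payments_source", Role.SOURCE, "kafka://payments"));

        Map<String, List<TableLineage>> nodes = groupByAsset(tables);
        // Two tables share the "orders" topic, so three tables yield two nodes.
        System.out.println(nodes.size() + " asset nodes from " + tables.size() + " tables");
    }
}
```

   In this sketch the sink table and the source table on the same topic stay 
separate at the table level and only merge when grouped by asset, which matches 
the idea that Flink can keep table-level lineage and leave the asset-level view 
to the exporter.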


