[ https://issues.apache.org/jira/browse/FLINK-31275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781613#comment-17781613 ]
Fang Yong commented on FLINK-31275: ----------------------------------- Hi [~mobuchowski], thanks for your comments. In the currently FLIP the `LineageVertex` is the top interface for vertexes in lineage graph, it will be used in flink sql jobs and datastream jobs. 1. For table connectors in sql jobs, there will be `TableLineageVertex` which is generated from flink catalog based table and provides catalog context, table schema for specified connector. The table lineage vertex and edge implementations will be created from dynamic tables for connectors in flink, and they will be updated when the connectors are updated. 2. For customized source/sink in datastream jobs, we can get source and slink `LineageVertex` implementations from `LineageVertexProvider`. When users implement customized lineage vertex and edge, they need to update them when their connectors are updated. IIUC, do you mean we should give an implementation of `LineageVertex` for datastream jobs and users can provide source/sink information there just like `TableLinageVertex` in sql jobs? Then listeners can use the datastream lineage vertex which is similar with table lineage vertex? Due to the flexibility of the source and sink in `DataStream`, we think it's hard to cover all of them, so we just provide `LineageVertex` and `LineageVertexProvider` for them. So we left this flexibility to users and listeners. If a custom connector is a table in `DataStream` job, users can return `TableLineageVertex` in the `LineageVertexProvider`. And for the following `LineageVertex` ``` public interface LineageVertex { /* Config for the lineage vertex contains all the options for the connector. */ Map<String, String> config(); /* List of datasets that are consumed by this job */ List<Dataset> inputs(); /* List of datasets that are produced by this job */ List<Dataset> outputs(); } ``` We tend to provide independent edge descriptions of connectors in `LineageEdge` for lineage graph instead of adding dataset in `LineageVertex`. The `LineageVertex` here is the `DataSet` you mentioned. WDYT? Hope to hear from you, thanks > Flink supports reporting and storage of source/sink tables relationship > ----------------------------------------------------------------------- > > Key: FLINK-31275 > URL: https://issues.apache.org/jira/browse/FLINK-31275 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Planner > Affects Versions: 1.18.0 > Reporter: Fang Yong > Assignee: Fang Yong > Priority: Major > > FLIP-314 has been accepted > https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener -- This message was sent by Atlassian Jira (v8.20.10#820010)