[jira] [Commented] (FLINK-31275) Flink supports reporting and storage of source/sink tables relationship

Fang Yong (Jira) Tue, 31 Oct 2023 23:34:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-31275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781613#comment-17781613
 ]


Fang Yong commented on FLINK-31275:
-----------------------------------

Hi [~mobuchowski], thanks for your comments. In the current FLIP, 
`LineageVertex` is the top-level interface for vertices in the lineage graph; 
it will be used in both Flink SQL jobs and DataStream jobs. 

1. For table connectors in SQL jobs, there will be a `TableLineageVertex`, which 
is generated from a Flink catalog-based table and provides the catalog context 
and table schema for the specified connector. The table lineage vertex and edge 
implementations will be created from the dynamic tables for connectors in Flink, 
and they will be updated when the connectors are updated.

2. For customized sources/sinks in DataStream jobs, we can get the source and 
sink `LineageVertex` implementations from `LineageVertexProvider`. When users 
implement a customized lineage vertex and edge, they need to update them when 
their connectors are updated.
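
To make point 2 concrete, here is a minimal sketch of a custom DataStream 
source exposing its lineage vertex. The interfaces are declared locally as 
stand-ins; their method signatures, as well as `QueueLineageVertex` and 
`MyQueueSource`, are assumptions for illustration and not the final FLIP-314 
API.

```java
import java.util.Map;

class ProviderSketch {
    public static void main(String[] args) {
        LineageVertexProvider source = new MyQueueSource("orders");
        // A listener would only see the LineageVertex returned by the provider.
        System.out.println(source.getLineageVertex().name());
    }
}

// Hypothetical stand-ins for the FLIP-314 interfaces; real signatures may differ.
interface LineageVertex {
    String name();
}

interface LineageVertexProvider {
    LineageVertex getLineageVertex();
}

// Hypothetical user-defined vertex describing a message queue.
final class QueueLineageVertex implements LineageVertex {
    private final String queue;
    private final Map<String, String> options;

    QueueLineageVertex(String queue, Map<String, String> options) {
        this.queue = queue;
        this.options = Map.copyOf(options);
    }

    @Override
    public String name() {
        return "queue://" + queue;
    }

    Map<String, String> options() {
        return options;
    }
}

// Hypothetical custom source; the user keeps the vertex in sync with the connector.
final class MyQueueSource implements LineageVertexProvider {
    private final String queue;

    MyQueueSource(String queue) {
        this.queue = queue;
    }

    @Override
    public LineageVertex getLineageVertex() {
        return new QueueLineageVertex(queue, Map.of("queue", queue));
    }
}
```

In this sketch the connector author owns the vertex class, so any change to 
the connector's options has to be reflected in `QueueLineageVertex` by hand.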

IIUC, do you mean that we should provide an implementation of `LineageVertex` 
for DataStream jobs, where users can supply source/sink information just as 
with `TableLineageVertex` in SQL jobs, so that listeners can use the DataStream 
lineage vertex in a similar way to the table lineage vertex? 

Due to the flexibility of sources and sinks in `DataStream`, we think it's 
hard to cover all of them, so we only provide `LineageVertex` and 
`LineageVertexProvider` and leave that flexibility to users and 
listeners. If a custom connector is a table in a `DataStream` job, users can 
return a `TableLineageVertex` from the `LineageVertexProvider`.
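
As a rough illustration of that last case, the sketch below shows a 
table-backed DataStream sink returning a `TableLineageVertex`, and a listener 
that checks the vertex type before reading table details. All signatures here 
(`name()`, `tablePath()`, `columns()`, `getLineageVertex()`) and the classes 
`CatalogTableVertex` and `TableBackedSink` are assumptions, not the FLIP's 
actual definitions.

```java
import java.util.List;

class ListenerSketch {
    // A listener can branch on the vertex type it receives.
    static String describe(LineageVertex vertex) {
        if (vertex instanceof TableLineageVertex) {
            TableLineageVertex table = (TableLineageVertex) vertex;
            return "table " + table.tablePath() + " columns=" + table.columns();
        }
        return "custom " + vertex.name();
    }

    public static void main(String[] args) {
        LineageVertexProvider sink = new TableBackedSink();
        System.out.println(describe(sink.getLineageVertex()));
    }
}

// Hypothetical stand-ins for the FLIP-314 interfaces; real signatures may differ.
interface LineageVertex {
    String name();
}

interface TableLineageVertex extends LineageVertex {
    String tablePath();

    List<String> columns();
}

interface LineageVertexProvider {
    LineageVertex getLineageVertex();
}

// Hypothetical table vertex carrying catalog path and schema columns.
final class CatalogTableVertex implements TableLineageVertex {
    private final String path;
    private final List<String> columns;

    CatalogTableVertex(String path, List<String> columns) {
        this.path = path;
        this.columns = List.copyOf(columns);
    }

    @Override
    public String name() {
        return path;
    }

    @Override
    public String tablePath() {
        return path;
    }

    @Override
    public List<String> columns() {
        return columns;
    }
}

// Hypothetical DataStream sink that writes to a catalog table.
final class TableBackedSink implements LineageVertexProvider {
    @Override
    public LineageVertex getLineageVertex() {
        return new CatalogTableVertex("my_catalog.db.orders", List.of("id", "amount"));
    }
}
```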

As for the following `LineageVertex`:
```
public interface LineageVertex {
    /* Config for the lineage vertex contains all the options for the connector. */
    Map<String, String> config();
    /* List of datasets that are consumed by this job */
    List<Dataset> inputs();
    /* List of datasets that are produced by this job */
    List<Dataset> outputs();
}
```
We would rather describe the connections between connectors independently in 
`LineageEdge` for the lineage graph than add datasets to `LineageVertex`. In 
the FLIP, `LineageVertex` plays the role of the `Dataset` you mentioned.
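
A minimal sketch of that idea, assuming a `LineageEdge` that simply pairs a 
source vertex with a sink vertex (the constructor, accessors, `NamedVertex`, 
and the `connect` helper are hypothetical and only for illustration):

```java
import java.util.ArrayList;
import java.util.List;

class EdgeSketch {
    // Build one edge per source; the vertices themselves hold no inputs/outputs.
    static List<LineageEdge> connect(List<LineageVertex> sources, LineageVertex sink) {
        List<LineageEdge> edges = new ArrayList<>();
        for (LineageVertex source : sources) {
            edges.add(new LineageEdge(source, sink));
        }
        return edges;
    }

    public static void main(String[] args) {
        LineageVertex orders = new NamedVertex("orders");
        LineageVertex users = new NamedVertex("users");
        LineageVertex report = new NamedVertex("report");
        for (LineageEdge edge : connect(List.of(orders, users), report)) {
            System.out.println(edge);
        }
    }
}

// Hypothetical stand-in for the FLIP-314 interface; the real signature may differ.
interface LineageVertex {
    String name();
}

// Hypothetical vertex identified only by a name.
final class NamedVertex implements LineageVertex {
    private final String name;

    NamedVertex(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }
}

// Hypothetical edge: the relationship lives here, not inside the vertex.
final class LineageEdge {
    private final LineageVertex source;
    private final LineageVertex sink;

    LineageEdge(LineageVertex source, LineageVertex sink) {
        this.source = source;
        this.sink = sink;
    }

    LineageVertex source() {
        return source;
    }

    LineageVertex sink() {
        return sink;
    }

    @Override
    public String toString() {
        return source.name() + " -> " + sink.name();
    }
}
```

With this shape, the same vertex can appear in several edges without its own 
definition changing.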

WDYT? Hope to hear from you, thanks






> Flink supports reporting and storage of source/sink tables relationship
> -----------------------------------------------------------------------
>
>                 Key: FLINK-31275
>                 URL: https://issues.apache.org/jira/browse/FLINK-31275
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner
>    Affects Versions: 1.18.0
>            Reporter: Fang Yong
>            Assignee: Fang Yong
>            Priority: Major
>
> FLIP-314 has been accepted 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
