Hi devs, According to the online-discussion in FLINK-3127 [1] and offline-discussion with Maciej Obuchowski and Zhenqiu Huang, we would like to update the lineage vertex relevant interfaces in FLIP-314 [2] as follows:
1. Introduce `LineageDataset` which represents source and sink in `LineageVertex`. The fields in `LineageDataset` are as follows: /* Name for this particular dataset. */ String name; /* Unique name for this dataset's storage, for example, url for jdbc connector and location for lakehouse connector. */ String namespace; /* Facets for the lineage vertex to describe the particular information of dataset, such as schema and config. */ Map<String, Facet> facets; 2. There may be multiple datasets in one `LineageVertex`, for example, kafka source or hybrid source. So users can get dataset list from `LineageVertex`: /** Get datasets from the lineage vertex. */ List<LineageDataset> datasets(); 3. There will be built in facets for config and schema. To describe columns in table/sql jobs and datastream jobs, we introduce `DatasetSchemaField`. /** Builtin config facet for dataset. */ @PublicEvolving public interface DatasetConfigFacet extends LineageDatasetFacet { Map<String, String> config(); } /** Field for schema in dataset. */ public interface DatasetSchemaField<T> { /** The name of the field. */ String name(); /** The type of the field. */ T type(); } Thanks for valuable inputs from @Maciej and @Zhenqiu. And looking forward to your feedback, thanks Best, Fang Yong On Mon, Sep 25, 2023 at 1:18 PM Shammon FY <zjur...@gmail.com> wrote: > Hi David, > > Do you want the detailed topology for Flink job? You can get > `JobDetailsInfo` in `RestCusterClient` with the submitted job id, it has > `String jsonPlan`. You can parse the json plan to get all steps and > relations between them in a Flink job. Hope this can help you, thanks! > > Best, > Shammon FY > > On Tue, Sep 19, 2023 at 11:46 PM David Radley <david_rad...@uk.ibm.com> > wrote: > >> Hi there, >> I am looking at the interfaces. If I am reading it correctly,there is one >> relationship between the source and sink and this relationship represents >> the operational lineage. Lineage is usually represented as asset -> process >> - > asset – see for example >> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph >> >> Maybe I am missing it, but it seems to be that it would be useful to >> store the process in the lineage graph. >> >> It is useful to have the top level lineage as source -> Flink job -> >> sink. Where the Flink job is the process, but also to have this asset -> >> process -> asset pattern for each of the steps in the job. If this is >> present, please could you point me to it, >> >> Kind regards, David. >> >> >> >> >> >> From: David Radley <david_rad...@uk.ibm.com> >> Date: Tuesday, 19 September 2023 at 16:11 >> To: dev@flink.apache.org <dev@flink.apache.org> >> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job >> Lineage Listener >> Hi, >> I notice that there is an experimental lineage integration for Flink with >> OpenLineage https://openlineage.io/docs/integrations/flink . I think >> this feature would allow for a superior Flink OpenLineage integration, >> Kind regards, David. >> >> From: XTransfer <jiabao....@xtransfer.cn.INVALID> >> Date: Tuesday, 19 September 2023 at 15:47 >> To: dev@flink.apache.org <dev@flink.apache.org> >> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job >> Lineage Listener >> Thanks Shammon for this proposal. >> >> That’s helpful for collecting the lineage of Flink tasks. >> Looking forward to its implementation. >> >> Best, >> Jiabao >> >> >> > 2023年9月18日 20:56,Leonard Xu <xbjt...@gmail.com> 写道: >> > >> > Thanks Shammon for the informations, the comment makes the lifecycle >> clearer. >> > +1 >> > >> > >> > Best, >> > Leonard >> > >> > >> >> On Sep 18, 2023, at 7:54 PM, Shammon FY <zjur...@gmail.com> wrote: >> >> >> >> Hi devs, >> >> >> >> After discussing with @Qingsheng, I fixed a minor issue of the lineage >> lifecycle in `StreamExecutionEnvironment`. I have added the comment to >> explain that the lineage information in `StreamExecutionEnvironment` will >> be consistent with that of transformations. When users clear the existing >> transformations, the added lineage information will also be deleted. >> >> >> >> Please help to review it again, and If there are no more concerns >> about FLIP-314[1], I would like to start voting later, thanks. cc @ >> <>Leonard >> >> >> >> Best, >> >> Shammon FY >> >> >> >> On Mon, Jul 17, 2023 at 3:43 PM Shammon FY <zjur...@gmail.com <mailto: >> zjur...@gmail.com>> wrote: >> >> Hi devs, >> >> >> >> Thanks for all the valuable feedback. If there are no more concerns >> about FLIP-314[1], I would like to start voting later, thanks. >> >> >> >> >> >> [1] >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener >> < >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener >> > >> >> >> >> Best, >> >> Shammon FY >> >> >> >> >> >> On Wed, Jul 12, 2023 at 11:18 AM Shammon FY <zjur...@gmail.com >> <mailto:zjur...@gmail.com>> wrote: >> >> Thanks for the valuable feedback, Leonard. >> >> >> >> I have discussed with Leonard off-line. We have reached some >> conclusions about these issues and I have updated the FLIP as follows: >> >> >> >> 1. Simplify the `LineageEdge` interface by creating an edge from one >> source vertex to sink vertex. >> >> 2. Remove the `TableColumnSourceLineageVertex` interface and update >> `TableColumnLineageEdge` to create an edge from columns in one source to >> each sink column. >> >> 3. Rename `SupportsLineageVertex` to `LineageVertexProvider` >> >> 4. Add method `addLineageEdges(LineageEdge ... edges)` in >> `StreamExecutionEnviroment` for datastream job and remove previous methods >> in `DataStreamSource` and `DataStreamSink`. >> >> >> >> Looking forward to your feedback, thanks. >> >> >> >> Best, >> >> Shammon FY >> > >> >> Unless otherwise stated above: >> >> IBM United Kingdom Limited >> Registered in England and Wales with number 741598 >> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU >> >> Unless otherwise stated above: >> >> IBM United Kingdom Limited >> Registered in England and Wales with number 741598 >> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU >> >