Hi devs,

Following the online discussion in FLINK-3127 [1] and an offline discussion
with Maciej Obuchowski and Zhenqiu Huang, we would like to update the
lineage vertex interfaces in FLIP-314 [2] as follows:

1. Introduce `LineageDataset`, which represents a source or sink in
`LineageVertex`. The fields in `LineageDataset` are as follows:
    /* Name for this particular dataset. */
    String name;
    /* Unique name for this dataset's storage, for example, the URL for a JDBC
       connector or the location for a lakehouse connector. */
    String namespace;
    /* Facets that describe dataset-specific information, such as schema and
       config. */
    Map<String, Facet> facets;

2. There may be multiple datasets in one `LineageVertex`, for example, in a
Kafka source or a hybrid source, so users can get the list of datasets from
`LineageVertex`:
    /** Get datasets from the lineage vertex. */
    List<LineageDataset> datasets();
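
To make items 1 and 2 more concrete, here is a rough sketch of how a vertex might expose several datasets. It is only an illustration, not the FLIP definition: `KafkaSourceLineageVertex`, the `kafka://` namespace format and the empty facet map are hypothetical, the three fields from item 1 are rendered as accessor methods, and the facet type is written as `LineageDatasetFacet` (the name used in item 3).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Placeholder facet type (assumption; the real contents are defined by the FLIP). */
interface LineageDatasetFacet {}

/** The three fields from item 1, rendered as accessor methods (assumption). */
interface LineageDataset {
    String name();

    String namespace();

    Map<String, LineageDatasetFacet> facets();
}

/** The vertex from item 2: it may expose more than one dataset. */
interface LineageVertex {
    List<LineageDataset> datasets();
}

/** Hypothetical vertex for a Kafka source that reads several topics. */
final class KafkaSourceLineageVertex implements LineageVertex {
    private final String bootstrapServers;
    private final List<String> topics;

    KafkaSourceLineageVertex(String bootstrapServers, List<String> topics) {
        this.bootstrapServers = bootstrapServers;
        this.topics = new ArrayList<>(topics);
    }

    @Override
    public List<LineageDataset> datasets() {
        List<LineageDataset> result = new ArrayList<>();
        for (String topic : topics) {
            // One dataset per topic; the namespace identifies the Kafka cluster.
            result.add(new LineageDataset() {
                @Override
                public String name() {
                    return topic;
                }

                @Override
                public String namespace() {
                    return "kafka://" + bootstrapServers;
                }

                @Override
                public Map<String, LineageDatasetFacet> facets() {
                    // Facets are omitted in this sketch.
                    return Collections.emptyMap();
                }
            });
        }
        return result;
    }
}
```

With this shape, a listener could iterate over `vertex.datasets()` and report each topic as its own dataset, with all of them sharing the namespace of the cluster.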

3. There will be built-in facets for config and schema. To describe columns
in Table/SQL jobs and DataStream jobs, we introduce `DatasetSchemaField`.
    /** Builtin config facet for dataset. */
    @PublicEvolving
    public interface DatasetConfigFacet extends LineageDatasetFacet {
        Map<String, String> config();
    }

    /** Field for schema in dataset. */
    public interface DatasetSchemaField<T> {
        /** The name of the field. */
        String name();
        /** The type of the field. */
        T type();
    }
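
As a rough illustration of how these facets could be put together for one dataset, here is a small sketch. The interfaces `DatasetConfigFacet` and `DatasetSchemaField` follow the definitions above; everything else is an assumption made for the example: the empty `LineageDatasetFacet` placeholder, the `DatasetSchemaFacet` wrapper, the `FacetExample` helper, the facet keys "config" and "schema", and the use of plain strings as field types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Placeholder facet type (assumption; the real contents are defined by the FLIP). */
interface LineageDatasetFacet {}

/** Built-in config facet, as listed above. */
interface DatasetConfigFacet extends LineageDatasetFacet {
    Map<String, String> config();
}

/** Field for schema in dataset, as listed above. */
interface DatasetSchemaField<T> {
    String name();

    T type();
}

/** Hypothetical schema facet that groups fields; name and shape are assumptions. */
interface DatasetSchemaFacet<T> extends LineageDatasetFacet {
    List<DatasetSchemaField<T>> fields();
}

/** Hypothetical helper that builds facets for an "orders" table. */
final class FacetExample {
    /** Builds a field whose type is a plain string such as "BIGINT". */
    static DatasetSchemaField<String> field(String name, String type) {
        return new DatasetSchemaField<String>() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public String type() {
                return type;
            }
        };
    }

    /** Builds the facet map; the keys "config" and "schema" are assumptions. */
    static Map<String, LineageDatasetFacet> ordersFacets() {
        DatasetConfigFacet config = () -> Collections.singletonMap("connector", "jdbc");
        DatasetSchemaFacet<String> schema =
                () -> Arrays.asList(field("id", "BIGINT"), field("amount", "DECIMAL(10, 2)"));

        Map<String, LineageDatasetFacet> facets = new HashMap<>();
        facets.put("config", config);
        facets.put("schema", schema);
        return facets;
    }
}
```

The type parameter `T` on `DatasetSchemaField` is what lets Table/SQL jobs and DataStream jobs use different type representations; the sketch uses `String` only to stay self-contained.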

Thanks to @Maciej and @Zhenqiu for their valuable input. Looking forward
to your feedback, thanks.

Best,
Fang Yong

On Mon, Sep 25, 2023 at 1:18 PM Shammon FY <zjur...@gmail.com> wrote:

> Hi David,
>
> Do you want the detailed topology for the Flink job? You can get
> `JobDetailsInfo` from `RestClusterClient` with the submitted job id; it has
> `String jsonPlan`. You can parse the JSON plan to get all steps and the
> relations between them in a Flink job. Hope this helps, thanks!
>
> Best,
> Shammon FY
>
> On Tue, Sep 19, 2023 at 11:46 PM David Radley <david_rad...@uk.ibm.com>
> wrote:
>
>> Hi there,
>> I am looking at the interfaces. If I am reading it correctly, there is one
>> relationship between the source and sink and this relationship represents
>> the operational lineage. Lineage is usually represented as asset -> process
>> -> asset; see for example
>> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
>>
>> Maybe I am missing it, but it seems to me that it would be useful to
>> store the process in the lineage graph.
>>
>> It is useful to have the top-level lineage as source -> Flink job ->
>> sink, where the Flink job is the process, but also to have this asset ->
>> process -> asset pattern for each of the steps in the job. If this is
>> present, please could you point me to it?
>>
>>       Kind regards, David.
>>
>>
>>
>>
>>
>> From: David Radley <david_rad...@uk.ibm.com>
>> Date: Tuesday, 19 September 2023 at 16:11
>> To: dev@flink.apache.org <dev@flink.apache.org>
>> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job
>> Lineage Listener
>> Hi,
>> I notice that there is an experimental lineage integration for Flink with
>> OpenLineage: https://openlineage.io/docs/integrations/flink . I think
>> this feature would allow for a superior Flink OpenLineage integration.
>>         Kind regards, David.
>>
>> From: XTransfer <jiabao....@xtransfer.cn.INVALID>
>> Date: Tuesday, 19 September 2023 at 15:47
>> To: dev@flink.apache.org <dev@flink.apache.org>
>> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job
>> Lineage Listener
>> Thanks Shammon for this proposal.
>>
>> That’s helpful for collecting the lineage of Flink tasks.
>> Looking forward to its implementation.
>>
>> Best,
>> Jiabao
>>
>>
>> > On Sep 18, 2023, at 20:56, Leonard Xu <xbjt...@gmail.com> wrote:
>> >
>> > Thanks Shammon for the information, the comment makes the lifecycle
>> clearer.
>> > +1
>> >
>> >
>> > Best,
>> > Leonard
>> >
>> >
>> >> On Sep 18, 2023, at 7:54 PM, Shammon FY <zjur...@gmail.com> wrote:
>> >>
>> >> Hi devs,
>> >>
>> >> After discussing with @Qingsheng, I fixed a minor issue of the lineage
>> lifecycle in `StreamExecutionEnvironment`. I have added the comment to
>> explain that the lineage information in `StreamExecutionEnvironment` will
>> be consistent with that of transformations. When users clear the existing
>> transformations, the added lineage information will also be deleted.
>> >>
>> >> Please help to review it again, and if there are no more concerns
>> about FLIP-314[1], I would like to start voting later, thanks. cc @Leonard
>> >>
>> >> Best,
>> >> Shammon FY
>> >>
>> >> On Mon, Jul 17, 2023 at 3:43 PM Shammon FY <zjur...@gmail.com> wrote:
>> >> Hi devs,
>> >>
>> >> Thanks for all the valuable feedback. If there are no more concerns
>> about FLIP-314[1], I would like to start voting later, thanks.
>> >>
>> >>
>> >> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
>> >>
>> >> Best,
>> >> Shammon FY
>> >>
>> >>
>> >> On Wed, Jul 12, 2023 at 11:18 AM Shammon FY <zjur...@gmail.com> wrote:
>> >> Thanks for the valuable feedback, Leonard.
>> >>
>> >> I have discussed with Leonard off-line. We have reached some
>> conclusions about these issues and I have updated the FLIP as follows:
>> >>
>> >> 1. Simplify the `LineageEdge` interface by creating an edge from one
>> source vertex to one sink vertex.
>> >> 2. Remove the `TableColumnSourceLineageVertex` interface and update
>> `TableColumnLineageEdge` to create an edge from columns in one source to
>> each sink column.
>> >> 3. Rename `SupportsLineageVertex` to `LineageVertexProvider`
>> >> 4. Add the method `addLineageEdges(LineageEdge ... edges)` to
>> `StreamExecutionEnvironment` for DataStream jobs and remove the previous
>> methods in `DataStreamSource` and `DataStreamSink`.
>> >>
>> >> Looking forward to your feedback, thanks.
>> >>
>> >> Best,
>> >> Shammon FY
>> >
>>
>> Unless otherwise stated above:
>>
>> IBM United Kingdom Limited
>> Registered in England and Wales with number 741598
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>>
>
