Re: Table API thrift support

Chen Qin Mon, 30 May 2022 18:36:39 -0700

On Mon, May 30, 2022 at 7:35 AM Martijn Visser <[email protected]>
wrote:


> Hi Chen,
>
> I think the best starting point would be to create a FLIP [1]. One of the
> important topics from my point of view is to make sure that such changes
> are not only available for SQL users, but are also being considered for
> Table API, DataStream and/or Python. There might be reasons why not to do
> that, but then those considerations should also be captured in the FLIP.
>
> > thanks for piointer, working on Flip-237, stay tune

> Another thing that would be interesting is how Thrift translates into Flink
> connectors & Flink formats. Or is your Thrift implementation only a
> connector?
>
> it's flink-format for most part, hope it can help with pyflink not sure.

>
> Best regards,
>
> Martijn
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>
> Op zo 29 mei 2022 om 19:06 schreef Chen Qin <[email protected]>:
>
> > Hi there,
> >
> > We would like to discuss and potentially upstream our thrift support
> > patches to flink.
> >
> > For some context, we have been internally patched flink-1.11.2 to support
> > FlinkSQL jobs read/write to thrift encoded kafka source/sink. Over the
> > course of last 12 months, those patches supports a few features not
> > available in open source master, including
> >
> >    - allow user defined inference thrift stub class name in table DDL,
> >    Thrift binary <-> Row
> >    - dynamic overwrite schema type information loaded from HiveCatalog
> >    (Table only)
> >    - forward compatible when kafka topic encode with new schema (adding
> new
> >    field)
> >    - backward compatible when job with new schema handles input or state
> >    with old schema
> >
> > With more FlinkSQL jobs in production, we expect maintenance of divergent
> > feature sets to increase in the next 6-12 months. Specifically challenges
> > around
> >
> >    - lack of systematic way to support inference based table/view ddl
> >    (parity with hiveql serde
> >    <
> >
> https://cwiki.apache.org/confluence/display/hive/serde#:~:text=SerDe%20Overview,-SerDe%20is%20short&text=Hive%20uses%20the%20SerDe%20interface,HDFS%20in%20any%20custom%20format
> > .>
> >    )
> >    - lack of robust mapping from thrift field to row field
> >    - dynamic update set of table with same inference class when
> performing
> >    schema change (e.g adding new field)
> >    - minor lack of handle UNSET case, use NULL
> >
> > Please kindly provide pointers around the challenges section.
> >
> > Thanks,
> > Chen, Pinterest.
> >
>

Re: Table API thrift support

Reply via email to