Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2023-06-30 Thread Ran Tao
Hi, Ilya. Thanks for your reply. If your first S3 source and your second Kafka source have the same schema, the hybrid table source currently works. As for your question: >> But because Flink’s optimizer removes unused fields from internal records in the batch mode, the problem of inconsistent schema arises at runtime.

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2023-05-22 Thread Raman Verma
Hello Ran Tao, Thanks for this FLIP. I have a comment about the handover of switching context between sources. You have proposed to define interfaces named around timestamps, SupportsGetEndTimestamp and SupportsSwitchedStartTimestamp. These work well with KafkaSource as the downstream child sour
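
A minimal sketch of the timestamp handover that the two interface names suggest. Only the names SupportsGetEndTimestamp and SupportsSwitchedStartTimestamp come from the thread; the method signatures, the two source classes, and the handOver helper are hypothetical and are not taken from the FLIP or from Flink.

```java
// Hypothetical shape: a finished source reports the timestamp it read up to.
interface SupportsGetEndTimestamp {
    long getEndTimestamp();
}

// Hypothetical shape: the next source accepts a start timestamp at switch time.
interface SupportsSwitchedStartTimestamp {
    void applySwitchedStartTimestamp(long startTimestamp);
}

// Stand-in for a bounded first source (for example, files); not a Flink class.
final class BoundedFileSource implements SupportsGetEndTimestamp {
    private final long endTimestamp;

    BoundedFileSource(long endTimestamp) {
        this.endTimestamp = endTimestamp;
    }

    @Override
    public long getEndTimestamp() {
        return endTimestamp;
    }
}

// Stand-in for an unbounded second source (for example, Kafka); not a Flink class.
final class StreamingLogSource implements SupportsSwitchedStartTimestamp {
    private long startTimestamp = -1L; // -1 means "no switched position applied"

    @Override
    public void applySwitchedStartTimestamp(long startTimestamp) {
        this.startTimestamp = startTimestamp;
    }

    long getStartTimestamp() {
        return startTimestamp;
    }
}

final class HandoverSketch {
    // Passes the end timestamp of the previous source to the next one,
    // but only when both sides opt in to the timestamp-based contract.
    static void handOver(Object previous, Object next) {
        if (previous instanceof SupportsGetEndTimestamp
                && next instanceof SupportsSwitchedStartTimestamp) {
            long end = ((SupportsGetEndTimestamp) previous).getEndTimestamp();
            ((SupportsSwitchedStartTimestamp) next).applySwitchedStartTimestamp(end);
        }
    }

    public static void main(String[] args) {
        BoundedFileSource files = new BoundedFileSource(1_700_000_000_000L);
        StreamingLogSource log = new StreamingLogSource();
        handOver(files, log);
        System.out.println("next source starts at " + log.getStartTimestamp());
    }
}
```

In this sketch the contract is expressed purely in timestamps, so a source whose position is not a timestamp would have nothing to implement.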

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2023-05-11 Thread Ilya Soin
Hi, Ran Tao. Thanks for the reply! I agree that a way to manage inconsistent field names / numbers will need to be provided and that for POC it’s enough to support the case where the batch and streaming schemas are consistent. However, in the example provided by me, the schemas in batch and str

Re: Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2023-05-10 Thread Ran Tao
Hi, Илья. Thanks for your opinions! You are right, and in fact, in addition to the number of fields differing, the names may also be different. Currently, we can also support inconsistent schemas, which was discussed in the previous design; for example, we can provide a `schema.fields.mappings` para

RE: Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2023-05-10 Thread Илья Соин
Hi devs, I think for this approach to work, the internal record schema generated by Flink must be exactly the same for batch and stream records, because at runtime Flink will use the same serializer to send them downstream. However, it’s not always the case, because in batch mode Flink’s optimi
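
The mismatch described above can be illustrated with a toy model in plain Java. This is a sketch only: it models a record layout as a list of field names and is not Flink's optimizer or serializer code. The field names f0, f1, f2 are borrowed from the DDL elsewhere in this thread; the helper names and the assumption that only the batch side is pruned are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SchemaConsistencySketch {
    // Keeps only the declared fields that the query reads, in declared order.
    // This stands in for the batch-mode removal of unused fields.
    static List<String> pruneUnused(List<String> declared, List<String> used) {
        List<String> kept = new ArrayList<>();
        for (String field : declared) {
            if (used.contains(field)) {
                kept.add(field);
            }
        }
        return kept;
    }

    // A single serializer can only handle both sides if the layouts match exactly.
    static boolean sameLayout(List<String> a, List<String> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList("f0", "f1", "f2");
        List<String> usedByQuery = Arrays.asList("f0", "f2");

        // Assumption for illustration: the batch side is pruned, the streaming side is not.
        List<String> batchLayout = pruneUnused(declared, usedByQuery);
        List<String> streamLayout = declared;

        System.out.println("batch layout:  " + batchLayout);
        System.out.println("stream layout: " + streamLayout);
        System.out.println("same layout:   " + sameLayout(batchLayout, streamLayout));
    }
}
```

Under these assumptions the two layouts differ even though both sources were declared with the same table schema.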

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2023-05-09 Thread Ran Tao
Hi, devs. I don't know if you have any other considerations for this FLIP. All discussions are welcome. If there are no other opinions in the next few days, I will try to initiate a vote. Thank you all. Best Regards, Ran Tao On Mon, Apr 10, 2023 at 15:33, Ran Tao wrote: > Hi, devs. I want to reopen this discus

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2023-04-10 Thread Ran Tao
Hi, devs. I want to reopen this discussion because some questions have been resolved and others need more discussion. In the previous discussion, there were some questions and problems. @Timo 1. About the option prefix, we decided to use identifiers, e.g. ``` create table hybrid_source( f0 varchar, f1 varcha

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2023-02-03 Thread Ran Tao
Hi, all. I have updated FLIP-278 [1]. I think all problems and comments have been addressed. 1. About the option prefix, we use identifiers. 2. Table API implementation and demo. 3. About the switched dynamic position (the hybrid source uses it to switch automatically from the previous source to the next one). More details can be found at d

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2023-02-03 Thread Ran Tao
Hi, Martijn. I have updated the FLIP regarding the Table API and the switched start timestamp. Thanks. On Fri, Dec 16, 2022 at 16:59, Martijn Visser wrote: > Hi Ran, > > For completeness, this is a new thread that was already previously started > at https://lists.apache.org/thread/xptn2ddzj34q9f5vtbfb62lsybmvcwjq. I'm > li

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2022-12-19 Thread Ran Tao
A mistake, childSources.get(sourceIndex).setStartTimetamp(switchedTimestamp); On Mon, Dec 19, 2022 at 16:10, Ran Tao wrote: > Hi, John. Thanks for your comments. > About question 2: the "handoff" is used for switching to the next source > seamlessly, but it is optional. Not every hybrid source job needs to use > th

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2022-12-19 Thread Ran Tao
Hi, John. Thanks for your comments. About question 2: the "handoff" is used for switching to the next source seamlessly, but it is optional. Not every hybrid source job needs to use this mode. The hybrid source in SQL or the Table API needs to support two modes, like the DataStream API below. One for fixed position, u

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2022-12-18 Thread John Roesler
Hello all, Thanks for the FLIP, Ran! The HybridSource is a really cool feature, and I was glad to see a proposal to expose it in the Table and SQL APIs. My main question is also about the switching control (question 2). It seems like the existing Kafka connector has all the options you'd want

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2022-12-16 Thread Ran Tao
Hi, Martijn. Thanks for your comments. Using an identifier instead of an index as the child source prefix may be a good approach. I will update the FLIP to illustrate how we can read from the hybrid schema to generate child schemas for question 1. Question 2 is about the start position for the next Kafka source. But curre

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2022-12-16 Thread Martijn Visser
Hi Ran, For completeness, this is a new thread that was already previously started at https://lists.apache.org/thread/xptn2ddzj34q9f5vtbfb62lsybmvcwjq. I'm linking them because I think Timo's comments are relevant to be kept with this discussion thread. I agree with Timo's comments from there tha

Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2022-12-15 Thread Ran Tao
FYI, this FLIP uses an index as the child source option prefix because we may use the same connector for more than one hybrid child source, e.g. create table hybrid_source( f0 varchar, f1 varchar, f2 bigint ) with( 'connector'='hybrid', 'sources'='filesystem,filesystem', '0.path' = '/tmp/a.csv', '0.format' =