Hi Becket,
thanks for your feedback and the healthy discussion.
I think the connector story will keep many of us busy for some time to
come. It would be great if concepts from SQL could positively influence
the design of the Source/Sink abstractions. In particular, we should think about some
guidelines of h
Hi Timo and Dawid,
Thanks for the patient explanation. I just had a phone call with Kurt and
Jark. I do see that there are a few abstractions whose only use case so
far is SQL. Therefore, while thinking of a Source abstraction that may be
shared with different use cases semantics is theoretica
Hi Becket,
Generally I don't think connector developers should bother with
understanding any of the SQL concepts.
I am not sure if we understand "connector developer" the same way. Let
me describe how I see the process of writing a new source (one that can be
used in both the Table & DataStream APIs):
1.
Hi Timo,
> Regarding "connector developers just need to know how to write an
> ExpressionToParquetFilter":
>
> This is the entire purpose of the DynamicTableSource/DynamicTableSink.
> The bridging between SQL concepts and connector specific concepts.
> Because this is the tricky part. How to get fro
Hi Becket,
Regarding "PushDown/NestedPushDown which is internal to optimizer":
Those concepts cannot be entirely internal to the optimizer, at some
point the optimizer needs to pass them into the connector specific code.
This code will then convert them to, e.g., Parquet expressions. So there must
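To make that hand-over concrete, here is a minimal sketch, assuming a hypothetical, simplified predicate model: `Comparison`, `PushDownResult` and `ExpressionToConnectorFilter` are illustrative names, not Flink classes, and the split into "accepted" and "remaining" predicates is an assumption of this sketch. The planner passes predicates into connector-specific code, which translates the ones its format understands.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner-side predicate of the form `field op literal` (not Flink's expression classes).
record Comparison(String field, String op, Object literal) {}

// Hypothetical result: filters the connector will apply, plus predicates the planner must still evaluate.
record PushDownResult(List<String> connectorFilters, List<Comparison> remaining) {}

// Hypothetical converter in the spirit of an "ExpressionToParquetFilter".
final class ExpressionToConnectorFilter {
    // Operators this imaginary columnar format can evaluate while reading.
    private static final List<String> SUPPORTED_OPS = List.of("=", "<", ">");

    static PushDownResult convert(List<Comparison> predicates) {
        List<String> accepted = new ArrayList<>();
        List<Comparison> remaining = new ArrayList<>();
        for (Comparison c : predicates) {
            if (SUPPORTED_OPS.contains(c.op())) {
                // Translate into the connector's own filter representation (plain text here).
                accepted.add(c.field() + " " + c.op() + " " + c.literal());
            } else {
                // Unsupported predicates stay in the plan and are applied after reading.
                remaining.add(c);
            }
        }
        return new PushDownResult(accepted, remaining);
    }
}
```

For example, `price > 10` would be translated, while a `LIKE` predicate would be left for the planner.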
Hi Timo,
Thanks for the reply. I totally agree that there must be something new
added to the connector in order to make it work for SQL / Table. My concern
is mostly about what these additions should be and how to add them. To be honest, I
was kind of lost when looking at the interfaces such as
DataStructure
Hi Becket,
Let me clarify a few things first: Historically, we thought of the Table
API/SQL as a library on top of the DataStream API, similar to Gelly or
CEP. We used TypeInformation in the Table API to integrate nicely with the
DataStream API. However, the last few years have shown that SQL is not just a library
Hi Kurt,
I do not object to promoting the concepts of SQL, but I don't think we should
do that by introducing a new dedicated set of public connector interfaces
that is only for SQL. The same argument can be applied to Gelly, CEP, and
Machine Learning, claiming that they need to introduce a dedicated
Hi Becket,
I don't think we should discuss this from a purely engineering
perspective. Your proposal tries to let SQL connector developers
understand as few SQL concepts as possible. Quite the opposite, we are
designing those interfaces to emphasize the SQL concepts, to bridge
high level concepts in
Hi Jark,
It is good to know that we do not expect the end users to touch those
interfaces.
Then the question boils down to whether the connector developers should be
aware of the interfaces that are only used by the SQL optimizer. It would be
a win if we could avoid that.
Two potential solutions off
Hi Becket,
Regarding Flavor1 and Flavor2, I want to clarify that users will never
use a table source like this:
{
MyTableSource myTableSource = MyTableSourceFactory.create();
myTableSource.setSchema(mySchema);
myTableSource.applyFilterPredicate(expression);
...
}
TableFactory and Tab
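A minimal sketch of that division of labour, assuming hypothetical ability interfaces: `TableSourceSketch`, `SupportsProjection`, `SupportsFilter` and the planner stand-in `PlannerSketch` are made up for illustration, and discovering abilities via `instanceof` is an assumption. The connector only implements the abilities; it is the planner, not the end user, that calls them during optimization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table source plus optional ability interfaces ("feature traits").
interface TableSourceSketch { String describe(); }
interface SupportsProjection { void applyProjection(List<String> fields); }
interface SupportsFilter { void applyFilters(List<String> predicates); }

// Connector-provided table source that happens to support both abilities.
final class MyTableSource implements TableSourceSketch, SupportsProjection, SupportsFilter {
    private List<String> fields = List.of("*");
    private final List<String> filters = new ArrayList<>();

    public void applyProjection(List<String> fields) { this.fields = List.copyOf(fields); }
    public void applyFilters(List<String> predicates) { filters.addAll(predicates); }
    public String describe() { return "fields=" + fields + ", filters=" + filters; }
}

// Stand-in for the planner: the only caller of the ability methods.
final class PlannerSketch {
    static TableSourceSketch optimize(TableSourceSketch source, List<String> neededFields, List<String> predicates) {
        // Apply each optimization only if the source declares the matching ability.
        if (source instanceof SupportsProjection p) {
            p.applyProjection(neededFields);
        }
        if (source instanceof SupportsFilter f) {
            f.applyFilters(predicates);
        }
        return source;
    }
}
```

A source that implements neither ability would simply pass through `optimize` unchanged, and the planner would keep projection and filtering in the plan.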
Hi Timo and Dawid,
Thanks for the clarification. It really helps. You are right that we are
on the same page regarding the hierarchy. I think the only difference
between our views is the flavor of the interfaces. There are two flavors of
the source interface for DataStream and Table source.
*Flav
Hi Becket,
I really think we don't have differing opinions. We might not see the
changes in the same way yet. Personally, I think of the
DynamicTableSource as a factory for a Source implemented for the
DataStream API. The important fact about the DynamicTableSource and all
feature traits (Supp
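Read as a factory, the table-level object only records what was negotiated with the planner and then creates a regular runtime source. A minimal sketch under that reading; `ParquetLikeTableSource`, `ParquetLikeSource` and `createRuntimeSource()` are hypothetical names, and plain strings stand in for real expressions:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical runtime source for the DataStream API; it knows nothing about SQL or the planner.
final class ParquetLikeSource {
    final List<String> columns;
    final List<String> filters;

    ParquetLikeSource(List<String> columns, List<String> filters) {
        this.columns = List.copyOf(columns);
        this.filters = List.copyOf(filters);
    }
}

// Hypothetical table-level source: records the negotiation results, then acts as a factory.
final class ParquetLikeTableSource {
    private List<String> columns;
    private final List<String> filters = new ArrayList<>();

    ParquetLikeTableSource(List<String> schemaColumns) {
        this.columns = List.copyOf(schemaColumns);
    }

    // Called by the planner during optimization, never at runtime.
    void applyProjection(List<String> projectedColumns) {
        this.columns = List.copyOf(projectedColumns);
    }

    // Called by the planner with the predicates the connector accepted.
    void applyFilters(List<String> acceptedFilters) {
        this.filters.addAll(acceptedFilters);
    }

    // Once negotiation is done, build a regular runtime source configured with the results.
    ParquetLikeSource createRuntimeSource() {
        return new ParquetLikeSource(columns, filters);
    }
}
```

The same `ParquetLikeSource` could be constructed directly by a DataStream user, which is what avoids duplicating the connector itself.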
Hi Becket,
it is true that concepts such as projection and filtering are worth
having in the DataStream API as well, and a SourceFunction can provide
interfaces for those concepts. In the table related classes we will
generate runtime classes that adhere to those interfaces and deal with
RowData
Hi Jark,
> However, the interfaces proposed by FLIP-95 are mainly used during
> optimization (compiling), not runtime.
Yes, I am aware of that. I am wondering whether the SQL planner could use
the counterpart interface in the Source to apply the optimizations. It
seems that this should also work, right?
If w
Hey Kurt,
> I don't think DataStream should see some SQL specific concepts such as
> Filtering or ComputedColumn.
Projectable and Filterable do not seem to be necessarily SQL concepts;
they could be applicable to DataStream sources as well, to reduce the
network load. For example, ORC and Parquet should proba
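A sketch of what such DataStream-level abilities could look like, assuming hypothetical `Projectable` and `FilterableSource` interfaces (named after this thread, not existing Flink API) and an in-memory source whose rows are column-to-value maps, standing in for an ORC/Parquet reader. Filtering and projection happen inside the source, so fewer rows and columns leave it:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical DataStream-level abilities; not existing Flink interfaces.
interface Projectable {
    void project(List<String> columns);
}

interface FilterableSource<T> {
    void filter(Predicate<T> predicate);
}

// Hypothetical in-memory source; rows are column -> value maps.
final class InMemoryColumnarSource implements Projectable, FilterableSource<Map<String, Object>> {
    private final List<Map<String, Object>> rows;
    private List<String> columns = null; // null means "all columns"
    private Predicate<Map<String, Object>> predicate = row -> true;

    InMemoryColumnarSource(List<Map<String, Object>> rows) {
        this.rows = rows;
    }

    @Override
    public void project(List<String> columns) {
        this.columns = List.copyOf(columns);
    }

    @Override
    public void filter(Predicate<Map<String, Object>> p) {
        // Multiple filters are combined with AND.
        this.predicate = this.predicate.and(p);
    }

    // Filters on the full row first, then keeps only the requested columns.
    List<Map<String, Object>> read() {
        return rows.stream()
                .filter(predicate)
                .map(this::projectRow)
                .collect(Collectors.toList());
    }

    private Map<String, Object> projectRow(Map<String, Object> row) {
        if (columns == null) {
            return row;
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (String c : columns) {
            out.put(c, row.get(c));
        }
        return out;
    }
}
```

Because the predicate runs before projection, a filter may reference a column (here `price`) that is not part of the projected output.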
Thanks Timo for updating the formats section. That will be very helpful
for changelog support (FLIP-105).
I just left 2 minor comments about some method names. In general, I'm +1 to
start a vote.
Hi Becket,
I don't think DataStream should see some SQL specific concepts such as
Filtering or ComputedColumn. It's better to stay within the SQL area and
translate to more generic concepts when translating to the
DataStream/Runtime layer, such as using a MapFunction to represent
computed column logic.
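A minimal sketch of that translation for a hypothetical computed column `total AS price * quantity`. To stay dependency-free it implements `java.util.function.Function` where a Flink job would use a `MapFunction`; the column names and the class `ComputedColumnMapper` are illustrative only:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical: the planner has turned `total AS price * quantity` into plain per-row logic.
final class ComputedColumnMapper implements Function<Map<String, Object>, Map<String, Object>> {
    @Override
    public Map<String, Object> apply(Map<String, Object> row) {
        // Copy the input row and append the computed column.
        Map<String, Object> out = new LinkedHashMap<>(row);
        double price = ((Number) row.get("price")).doubleValue();
        double quantity = ((Number) row.get("quantity")).doubleValue();
        out.put("total", price * quantity);
        // The runtime only sees a generic row-to-row function; the source never learns about SQL.
        return out;
    }
}
```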
Best,
Hi Timo and Dawid,
It's really great that we have the same goal. I am actually wondering if we
can go one step further to avoid some of the interfaces in Table as well.
For example, if we have the FilterableSource, do we still need the
FilterableTableSource? Should DynamicTableSource just become
+1. Thanks Timo for the design doc.
We could also consider @Experimental. But I am +1 for @PublicEvolving; we
should be confident in the current change.
Best,
Jingsong Lee
On Tue, Mar 24, 2020 at 4:30 PM Timo Walther wrote:
@Becket: We totally agree that we don't need table specific connectors
during runtime. As Dawid said, the interfaces proposed here are just for
communication with the planner. Once the properties (watermarks,
computed columns, filters, projection, etc.) are negotiated, we can
configure a regular
Hi Becket,
To answer your question, we have the same intention: not to duplicate
connectors between the DataStream and Table APIs. The interfaces proposed in
the FLIP are a way to describe relational properties of a source. The
intention is as you described to translate all of those expressed as
expres
Thanks Timo for the design doc.
In general I'm +1 to this, with a minor comment. Since we are introducing
dozens of interfaces all at once, I'm not sure if it's good to annotate
them with @PublicEvolving already. I can imagine these interfaces would
only be stable after 1 or 2 major releases. Given the f
Hi Timo,
Thanks for the proposal. I completely agree that the current Table
connectors could be simplified quite a bit. I haven't finished reading
everything, but here are some quick thoughts.
Actually, to me the biggest question is why there should be two different
connector systems for DataStrea
Hi Timo,
Thank you for the proposal. I think it is an important improvement that
will benefit many parts of the Table API. The proposal looks really good
to me and personally I would be comfortable with voting on the current
state.
Best,
Dawid
On 23/03/2020 18:53, Timo Walther wrote:
Hi everyone,
I received some questions around how the new interfaces play together
with formats and their factories.
Furthermore, for MySQL or Postgres CDC logs, the format should be able
to return a `ChangelogMode`.
Also, I incorporated the feedback around the factory design in general.
I
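A minimal sketch of a format declaring its changelog mode, assuming simplified stand-ins: `ChangeKind`, `DecodingFormatSketch`, `AppendOnlyFormat`, `CdcLogFormat` and `ChangelogCheck` are hypothetical and are not Flink's actual `ChangelogMode` API. An append-only format reports only inserts, while a format for MySQL or Postgres CDC logs also reports updates and deletes:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified kinds of change a row can carry.
enum ChangeKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

// Hypothetical format abstraction that can report its changelog mode.
interface DecodingFormatSketch {
    Set<ChangeKind> changelogMode();
}

// A plain append-only format: every record is an insert.
final class AppendOnlyFormat implements DecodingFormatSketch {
    public Set<ChangeKind> changelogMode() {
        return EnumSet.of(ChangeKind.INSERT);
    }
}

// A CDC-log format: records may also describe updates and deletes.
final class CdcLogFormat implements DecodingFormatSketch {
    public Set<ChangeKind> changelogMode() {
        return EnumSet.allOf(ChangeKind.class);
    }
}

final class ChangelogCheck {
    // A source backed by this format is insert-only if the format emits nothing but inserts.
    static boolean isInsertOnly(DecodingFormatSketch format) {
        return format.changelogMode().equals(EnumSet.of(ChangeKind.INSERT));
    }
}
```

With this information available at planning time, the planner could, for example, reject a CDC-backed source in a plan that only accepts append-only input.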
Hi Benchao,
this is a very good question. I will update the FLIP about this.
The legacy planner will not support the new interfaces. It will only
support the old interfaces. With the next release, I think the Blink
planner is stable enough to be the default one as well.
Regards,
Timo
On 18.
Hi Timo,
Thank you and the others for the effort in preparing this FLIP.
Generally, the FLIP LGTM.
+1 for moving the Blink data structures to table-common; they will be
useful for UDFs too in the future.
A small question: do we plan to support the new interfaces and data
types in the legacy planner?
Or do we only plan
Hi everyone,
I'm happy to present the results of long discussions that we had
internally. Jark, Dawid, Aljoscha, Kurt, Jingsong, me, and many more
have contributed to this design document.
We would like to propose new long-term table source and table sink
interfaces:
https://cwiki.apache.o