It's passion for the future of the project rather than passion for SQL ;-)

I always try to think like someone from industry. And IMO people in industry are still thinking in SQL. If you want to persuade someone coming from the SQL world, you should first offer a SQL interface to run their legacy code (similar to the Hadoop operators). Rewriting old queries in the Table API is not very convenient.
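
For illustration, rewriting even a trivial legacy query by hand looks roughly like this (a hypothetical sketch; the exact Table API class and method names may differ from the current staging code):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.table.TableEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.table.Table;

    // The legacy query an analyst already has:
    //   SELECT name, SUM(amount) FROM orders GROUP BY name
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple2<String, Double>> orders = env.fromElements(
        new Tuple2<String, Double>("alice", 10.0),
        new Tuple2<String, Double>("bob", 5.0));

    // The same query rewritten by hand against the Table API:
    TableEnvironment tableEnv = new TableEnvironment();
    Table result = tableEnv.fromDataSet(orders, "name, amount")
        .groupBy("name")
        .select("name, amount.sum as total");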

I share Stephan's opinion. Building both APIs concurrently would be a good way to test and extend the Table API. Currently, the Table API is only half-done, but I think the goal is full SQL functionality. I can implement a SQL operator and extend the Table API wherever functionality is missing.

On 27.05.2015 16:41, Fabian Hueske wrote:
+1 for committer passion!

Please don't get me wrong, I think SQL on Flink would be a great feature.
I just wanted to make the point that the Table API needs to mirror all SQL
features if SQL is implemented on top of the Table API.


2015-05-27 16:34 GMT+02:00 Kostas Tzoumas <ktzou...@apache.org>:

I think Fabian's arguments make a lot of sense.

However, if Timo *really wants* to start SQL on top of Table, that is what
he will do a great job at :-) As usual, we can keep it in beta status in
flink-staging until it is mature... and it will help create issues for the
Table API and give direction to its development. Perhaps we will have a
feature-poor SQL for a bit, then switch to hardening the Table API to
support more features and then back to SQL.

I'm just advocating for "committer passion"-first here :-) Perhaps Timo
should weigh in.

On Wed, May 27, 2015 at 4:19 PM, Fabian Hueske <fhue...@gmail.com> wrote:

IMO, it is better to have one feature that is reasonably well developed instead of two half-baked features. That's why I proposed to advance the Table API a bit further before starting the next big thing. I played around with the Table API recently and I think it definitely needs a bit more contributor attention and more features to be actually usable. Also, since all features of the SQL interface need to be included in the Table API (given we follow the SQL on Table approach), it makes sense IMO to push the Table API a bit further before going for the next thing.

2015-05-27 16:06 GMT+02:00 Stephan Ewen <se...@apache.org>:

I see no reason why a SQL interface cannot be "bootstrapped" concurrently. It would initially not support many operations, but would act as a good source to test and drive functionality from the Table API.
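
A minimal sketch of what such a bootstrapped entry point could look like (everything here is hypothetical; no such class exists yet):

    import org.apache.flink.api.table.Table;

    // Hypothetical facade: accept a SQL string, map the few supported
    // constructs onto Table API calls, and reject everything else loudly.
    public final class FlinkSql {
        public Table query(String sql) {
            // First iteration: parse with a library such as Apache Calcite,
            // walk the AST, and translate SELECT / WHERE / GROUP BY onto
            // select() / filter() / groupBy(). Anything else throws, which
            // is acceptable for a beta component in flink-staging.
            throw new UnsupportedOperationException(
                "only basic SELECT queries are supported so far");
        }
    }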


@Ted:

I would like to learn a bit more about the stack and internal abstractions of Drill. It may make sense to reuse some of the query execution operators from Drill. I especially like the "learning schema on the fly" part of Drill.

Flink DataSets and Streams have a schema, but it may in several cases be a "schema lower bound", like the greatest common superclass. Those cases may benefit big time from Drill's ability to refine schema on the fly.
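
To make "schema lower bound" concrete, a small sketch (the class hierarchy is invented for illustration):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    // Elements are Clicks and Purchases, but the DataSet's declared schema
    // is only the common superclass Event.
    public static class Event { public long timestamp; }
    public static class Click extends Event { public String url; }
    public static class Purchase extends Event { public double amount; }

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    List<Event> data = Arrays.asList(new Click(), new Purchase());
    // The compile-time schema is Event: the url and amount fields are
    // invisible to the type system until the schema is refined, which is
    // exactly what Drill does on the fly.
    DataSet<Event> events = env.fromCollection(data);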

That may also be useful in the Table API, making it available in turn to LINQ-like programs and SQL scripts.

On Wed, May 27, 2015 at 3:49 PM, Robert Metzger <rmetz...@apache.org> wrote:

I didn't know about that paper...  Thanks for sharing.

I worked on a SQL layer for Stratosphere some time ago, using Apache Calcite (called Optiq back then). I think the project provides a lot of very good tooling for creating a SQL layer. So if we decide to go for SQL on Flink, I would suggest using Calcite.
I can also help you a bit with Calcite to get started with it.
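
For anyone curious, parsing a query with Calcite is only a few lines (a sketch against the Calcite parser API as I know it; the API has moved around between releases, so treat the details as approximate):

    import org.apache.calcite.sql.SqlNode;
    import org.apache.calcite.sql.parser.SqlParseException;
    import org.apache.calcite.sql.parser.SqlParser;

    public class CalciteParseDemo {
        public static void main(String[] args) throws SqlParseException {
            // Parse a SQL string into Calcite's AST. Plain parsing needs no
            // table metadata; validation and optimization would come later.
            SqlParser parser = SqlParser.create(
                "SELECT name, SUM(amount) FROM orders GROUP BY name");
            SqlNode ast = parser.parseQuery();
            System.out.println(ast);
        }
    }

Validation and planning on top of that would then be the place where the Table API plugs in.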

I agree with Fabian that it would probably make more sense for now to enhance the Table API.
I think the biggest limitation right now is that it only supports POJOs. We should also support Tuples (I know that's difficult to do), data from HCatalog (that includes Parquet & ORC), JSON, ...
Then, I would add filter and projection pushdown into the Table API.
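
To sketch what that pushdown could mean (the interface below is purely hypothetical, invented for illustration):

    // Hypothetical contract for a pushdown-aware table source: the source
    // prunes columns and evaluates predicates itself, so less data is read
    // and deserialized before it ever reaches a Flink operator.
    public interface PushdownTableSource {
        // Projection pushdown: read only the named columns.
        PushdownTableSource selectFields(String[] fieldNames);
        // Filter pushdown: evaluate the predicate at the source and return
        // the unsupported remainder for the framework to apply later.
        String pushPredicate(String predicate);
    }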



On Tue, May 26, 2015 at 10:03 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

It would also be relatively simple (I think) to retarget Drill to Flink if Flink doesn't provide enough typing metadata to do traditional SQL.


On Tue, May 26, 2015 at 12:52 PM, Fabian Hueske <fhue...@gmail.com> wrote:
Hi,

Flink's Table API is pretty close to what SQL provides. IMO, the best approach would be to leverage that and build a SQL parser (maybe together with a logical optimizer) on top of the Table API. The parser (and optimizer) could be built using Apache Calcite, which provides exactly this.
Since the Table API is still a fairly new component and not very feature-rich, it might make sense to extend and strengthen it before putting something major on top.

Cheers, Fabian

2015-05-26 21:38 GMT+02:00 Timo Walther <twal...@apache.org>:

Hey everyone,

I would be interested in having a complete SQL API in Flink. What is the status there? Is someone already working on it? If not, I would like to work on it. I found http://ijcsi.org/papers/IJCSI-12-1-1-169-174.pdf but I couldn't find anything on the mailing list or Jira. Otherwise I would open an issue and start a discussion about it there.

Regards,
Timo

