Re: Stream SQL and Dynamic tables

Fabian Hueske Fri, 27 Jan 2017 12:42:10 -0800

Hi Radu,

the idea is to only support operations that are bounded in space and
compute time:


- space: the size of state may not infinitely grow over time or with
growing key domains. For these cases, the optimizer will enforce a cleanup
timeout and all data that is past that timeout will be discarded.
Operations which cannot be bounded in space will be rejected.

- compute time: certain queries cannot be executed efficiently because
newly arriving data (late data or just newly appended rows) might trigger
recomputation of large parts of the current state. Operations that will
result in such a computation pattern will be rejected. One example would be
event-time OVER ROWS windows as we discussed in the other thread.
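
To illustrate the compute-time concern, here is a toy sketch in plain Python (not Flink code). It assumes a running sum ordered by event time with unique timestamps, and the helper names are made up for this example. It counts how many already-emitted results change when one row arrives late.

```python
from bisect import insort
from itertools import accumulate
from typing import List, Tuple

Row = Tuple[int, int]  # (event timestamp, value); timestamps assumed unique


def running_sums(rows: List[Row]) -> List[Tuple[int, int]]:
    """Running sum over rows already sorted by event time."""
    totals = accumulate(value for _, value in rows)
    return [(ts, total) for (ts, _), total in zip(rows, totals)]


def results_changed_by_late_row(rows: List[Row], late_row: Row) -> int:
    """Count results that are new or differ after inserting late_row in order."""
    before = dict(running_sums(rows))
    updated = list(rows)
    insort(updated, late_row)  # place the late row at its event-time position
    after = running_sums(updated)
    return sum(1 for ts, total in after if before.get(ts) != total)


rows = [(1, 10), (2, 20), (3, 30), (4, 40)]
changed_early = results_changed_by_late_row(rows, (0, 5))   # row older than all others
changed_in_order = results_changed_by_late_row(rows, (5, 1))  # row arrives in order
```

In this toy case the in-order row only adds one result, while the late row changes every result computed so far.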

So the plan is that the optimizer takes care of limiting the space
requirements and computation effort.
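
As a rough sketch of what a cleanup timeout means for keyed state, here is a toy per-key counter in plain Python. This is not the planned implementation: the class name and the timeout_ms parameter are hypothetical, and a real operator would rely on timers rather than scanning all keys.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class BoundedKeyedCount:
    """Toy per-key counter whose state is dropped after a cleanup timeout."""
    timeout_ms: int  # hypothetical cleanup timeout, not a real Flink setting
    # key -> (count, timestamp of last update)
    state: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def add(self, key: str, ts_ms: int) -> int:
        """Expire stale keys as of ts_ms, then count the row; return the new count."""
        self.expire(ts_ms)
        count, _ = self.state.get(key, (0, ts_ms))
        self.state[key] = (count + 1, ts_ms)
        return count + 1

    def expire(self, now_ms: int) -> None:
        """Discard every key that has not been updated within the timeout."""
        stale = [k for k, (_, last) in self.state.items()
                 if now_ms - last > self.timeout_ms]
        for k in stale:
            del self.state[k]


counter = BoundedKeyedCount(timeout_ms=1000)
first = counter.add("a", 0)
second = counter.add("a", 500)
other = counter.add("b", 2000)      # "a" is now past the timeout and is dropped
restarted = counter.add("a", 2100)  # counting for "a" starts from scratch
```

The price of the bounded state is visible in the last call: once the state for a key has been discarded, a later row for that key no longer sees the earlier count.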
However, you are of course right. Retraction and long-running windows can
result in significant amounts of operator state.
I don't think this is a special requirement for the Table API (there are
users of the DataStream API with jobs that manage TBs of state). Persisting
state to disk with RocksDB and scaling out to more nodes should address the
scaling problem initially. In the long run, the Flink community will work
to improve the handling of large state with features such as incremental
checkpoints and new state backends.
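
To make the link between retraction and state size concrete, here is a minimal sketch in plain Python, not the Table API. The message format and the function name are assumptions for illustration. The point is that the operator has to remember the last emitted result per key in order to retract it.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, int]  # (kind, key, value); kind is "add" or "retract"


def sum_with_retraction(rows: List[Tuple[str, int]]) -> List[Message]:
    """Per-key sum that retracts the previous result before emitting an update."""
    sums: Dict[str, int] = {}  # state: last emitted sum per key
    out: List[Message] = []
    for key, value in rows:
        if key in sums:
            # The old aggregate is no longer valid, so retract it first.
            out.append(("retract", key, sums[key]))
        sums[key] = sums.get(key, 0) + value
        out.append(("add", key, sums[key]))
    return out


messages = sum_with_retraction([("a", 1), ("b", 5), ("a", 2)])
```

Here the state holds one entry per key, so without a cleanup timeout it grows with the key domain.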

Looking forward to your comments.

Best,
Fabian

2017-01-27 11:01 GMT+01:00 Radu Tudoran <[email protected]>:

> Hi,
>
> Thanks for the clarification Fabian - it is really useful.
> I agree that we should consolidate the module and avoid the need to
> further maintain 3 different "projects". It does make sense to see the
> current (I would call it) "Stream SQL" as a table with append semantics.
> However, one thing that should be clarified is the best way, from the
> implementation point of view, to keep the state of the table (if we can
> actually keep it, though the need is clear for supporting retraction). As
> the input is a stream and the table is append-only, we of course run into
> the classic problem of unbounded growth that streams have. What should the
> approach be? Should we consider keeping the data in something like the
> state backend used now for windows, and then pushing it to disk (e.g., as
> with the FsStateBackend)? Perhaps with the disk we can at least enlarge the
> horizon of what we keep.
> I will give some comments and some thoughts in the document about this.
>
>
> Dr. Radu Tudoran
> Senior Research Engineer - Big Data Expert
> IT R&D Division
>
>
> HUAWEI TECHNOLOGIES Duesseldorf GmbH
> European Research Center
> Riesstrasse 25, 80992 München
>
> E-mail: [email protected]
> Mobile: +49 15209084330
> Telephone: +49 891588344173
>
> HUAWEI TECHNOLOGIES Duesseldorf GmbH
> Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
> Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
> Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
> Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
> Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
> This e-mail and its attachments contain confidential information from
> HUAWEI, which is intended only for the person or entity whose address is
> listed above. Any use of the information contained herein in any way
> (including, but not limited to, total or partial disclosure, reproduction,
> or dissemination) by persons other than the intended recipient(s) is
> prohibited. If you receive this e-mail in error, please notify the sender
> by phone or email immediately and delete it!
>
>
> -----Original Message-----
> From: Fabian Hueske [mailto:[email protected]]
> Sent: Thursday, January 26, 2017 3:37 PM
> To: [email protected]
> Subject: Re: Stream SQL and Dynamic tables
>
> Hi Radu,
>
> the idea is to have dynamic tables as the common ground for Table API and
> SQL.
> I don't think it is a good idea to implement and maintain 3 different
> relational APIs with possibly varying semantics.
>
> Actually, you can see the current status of the Table API / SQL on streams
> as a subset of the proposed semantics.
> Right now, all streams are implicitly converted into Tables with APPEND
> semantics. The currently supported operations (selection, filter, union,
> group windows) return streams.
> The only thing that would change for these operations is the output mode,
> which would be retraction mode by default in order to be able to emit
> updated records (e.g., updated aggregates due to late records).
>
> The document is not final and we can of course discuss the proposal.
>
> Best, Fabian
>
> 2017-01-26 11:33 GMT+01:00 Radu Tudoran <[email protected]>:
>
> > Hi all,
> >
> >
> >
> > I have a question about the scope of the initiative on relational
> > queries on data streams:
> >
> > https://docs.google.com/document/d/1qVVt_16kdaZQ8RTfA_f4konQPW4tnl8THw6rzGUdaqU/edit#
> >
> >
> >
> > Is the approach of using dynamic tables intended to replace the
> > implementation and mechanisms built now in Stream SQL? Or will the
> > two co-exist, one built on top of the other?
> >
> >
> >
> > Also – is the document in the final form or can we still provide
> > feedback / ask questions?
> >
> >
> >
> > Thanks for the clarification (and sorry if I missed at some point the
> > discussion that might have clarified this)
> >
> >
> >
> > Dr. Radu Tudoran
>
