Re: Parallelizing a tumbling group window

Timo Walther Mon, 11 Dec 2017 04:16:39 -0800

Hi Colin,

unfortunately, selecting the parallelism for parts of a SQL query is notsupported yet. By default, tumbling window operators use the defaultparallelism of the environment. Simple project and select operationshave the same parallelism as the inputs they are applied on.

I think the easiest solution so far is to explicilty set the parallelismof operators that are not part of the Table API and use theenvironment's parallelism to scale the SQL query.


I hope that helps.

Regards,
Timo


Am 12/9/17 um 3:06 AM schrieb Colin Williams:

Hello,

I've inherited some flink application code.
We're currently creating a table using a Tumbling SQL query similar tothe first example in
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/sql.html#group-windows<https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/sql.html#group-windows>
Where each generated SQL query looks something like
SELECT measurement, `tag_AppId`(tagset), P99(`field_Adds`(fieldset)),TUMBLE_START(rowtime, INTERVAL '10' MINUTE) FROM LINES GROUP BYmeasurement, `tag_AppId`(tagset), TUMBLE(rowtime, INTERVAL '10' MINUTE)
We are also using a UDFAGG function in some of the queries which Ithink might be cleaned up and optimized a bit (using scala types andpossibly not well implemented)
We then turn the result table back into a datastream usingtoAppendStream, and eventually add a derivative stream to a sink.We've configured TimeCharacteristic to event-time processing.
In some streaming scenarios everything is working fine with aparallelism of 1, but in others it appears that we can't keep up withthe event source.
Then we are investigating how to enable parallelism specifically onthe SQL table query or aggregator.
Can anyone suggest a good way to go about this? It wasn't clear fromthe documentation.
Best,

Colin Williams

Re: Parallelizing a tumbling group window

Reply via email to