Hi Colin,

unfortunately, selecting the parallelism for parts of a SQL query is not supported yet. By default, tumbling window operators use the default parallelism of the environment. Simple project and select operations have the same parallelism as the inputs they are applied on.

I think the easiest solution so far is to explicilty set the parallelism of operators that are not part of the Table API and use the environment's parallelism to scale the SQL query.

I hope that helps.

Regards,
Timo


Am 12/9/17 um 3:06 AM schrieb Colin Williams:
Hello,

I've inherited some flink application code.

We're currently creating a table using a Tumbling SQL query similar to the first example in

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/sql.html#group-windows <https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/sql.html#group-windows>

Where each generated SQL query looks something like

SELECT measurement, `tag_AppId`(tagset), P99(`field_Adds`(fieldset)), TUMBLE_START(rowtime, INTERVAL '10' MINUTE) FROM LINES GROUP BY measurement, `tag_AppId`(tagset), TUMBLE(rowtime, INTERVAL '10' MINUTE)


We are also using a UDFAGG function in some of the queries which I think might be cleaned up and optimized a bit (using scala types and possibly not well implemented)

We then turn the result table back into a datastream using toAppendStream, and eventually add a derivative stream to a sink. We've configured TimeCharacteristic to event-time processing.

In some streaming scenarios everything is working fine with a parallelism of 1, but in others it appears that we can't keep up with the event source.

Then we are investigating how to enable parallelism specifically on the SQL table query or aggregator.

Can anyone suggest a good way to go about this? It wasn't clear from the documentation.

Best,

Colin Williams




Reply via email to