Hi Dian,

Thanks very much - I suppose the concept I'm struggling with is understanding 
how parallelism works when using SQL. I understand that in the DataStream world 
parallelism means each slot gets a subset of events. However, how does that 
work in the SQL world, where you need to do joins between tables? If the events 
in tables A and B are being processed in parallel, does this not mean that any 
given slot will hold only some of the events for tables A and B? How does 
Flink ensure consistent results irrespective of the parallelism used? Or does 
it just copy all events to all slots, in which case I don't understand how 
parallelism helps?
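For what it's worth, the mental model I'm questioning could be sketched in plain Python (purely illustrative - not Flink internals - with made-up tables and a made-up route() helper):

```python
# Sketch of key-based routing: each parallel instance receives only
# the rows whose join key hashes to it, so matching rows from A and B
# always meet in the same instance and can be joined locally.

PARALLELISM = 2

def route(row, key):
    """Pick a parallel instance index from the join key."""
    return hash(row[key]) % PARALLELISM

table_a = [{"id": 1, "x": "a1"}, {"id": 2, "x": "a2"}]
table_b = [{"id": 1, "y": "b1"}, {"id": 2, "y": "b2"}]

# Partition both tables by join key.
slots = [{"A": [], "B": []} for _ in range(PARALLELISM)]
for row in table_a:
    slots[route(row, "id")]["A"].append(row)
for row in table_b:
    slots[route(row, "id")]["B"].append(row)

# Each slot joins only its local subset, yet no match is lost,
# because equal keys were routed to the same slot.
joined = [
    (a, b)
    for slot in slots
    for a in slot["A"]
    for b in slot["B"]
    if a["id"] == b["id"]
]
print(len(joined))  # → 2: both matches found despite partitioning
```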

Many thanks,

John

________________________________
From: Dian Fu <dian0511...@gmail.com>
Sent: 20 July 2022 05:19
To: John Tipper <john_tip...@hotmail.com>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: Re: PyFlink SQL: force maximum use of slots

Hi John,

All the operators in the same slot sharing group will be put in one slot. The 
slot sharing group is only configurable in the DataStream API [1]. Usually you 
don't need to set the slot sharing group explicitly [2], and sharing a slot 
lets the operators running in it share resources. If performance becomes a 
problem, you could simply increase the parallelism or the resources (CPU or 
memory) of the TaskManager. For example, if the parallelism is set to 2, you 
will see two running slots, each containing all the operators.
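As a rough illustration (plain Python, not Flink's scheduler): with a single slot sharing group, the number of slots a job needs equals the highest operator parallelism, and each slot holds one subtask of every operator:

```python
# Sketch: with one slot sharing group, required slots equal the
# highest operator parallelism, and every slot runs one subtask
# of each operator (the operator names here are just examples).

operators = {"source": 2, "join": 2, "sink": 2}  # operator -> parallelism

slots_needed = max(operators.values())
slots = [
    [f"{op}[{i}]" for op, p in operators.items() if i < p]
    for i in range(slots_needed)
]

print(slots_needed)  # → 2
for s in slots:
    print(s)  # each slot contains one subtask of every operator
```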

Regards,
Dian

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/operators/overview/#set-slot-sharing-group
[2] 
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/concepts/flink-architecture/#task-slots-and-resources
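For example, to get two slots per TaskManager and a default parallelism of 2, you could set the following in flink-conf.yaml (the values are just an example - tune them for your deployment):

```yaml
# flink-conf.yaml
parallelism.default: 2            # default parallelism for all jobs
taskmanager.numberOfTaskSlots: 2  # slots offered by each TaskManager
```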

On Tue, Jul 19, 2022 at 12:23 AM John Tipper 
<john_tip...@hotmail.com<mailto:john_tip...@hotmail.com>> wrote:
Hi all,

Is there a way of forcing a pipeline to use as many slots as possible?  I have 
a program in PyFlink using SQL and the Table API, and currently my entire 
pipeline is using just a single slot.  I've tried this:

    StreamExecutionEnvironment.get_execution_environment().disable_operator_chaining()

I did have a pipeline with 17 different tasks running in just a single slot; 
all this does is give me 79 different operators, but they are all still 
running in that single slot. Is there a way to get Flink to run different 
tasks in different slots whilst using the Table API and SQL?

Many thanks,

John
