Hi Dian,

Thanks very much. I suppose the concept I'm struggling with is understanding how parallelism works when using SQL. I understand that in the DataStream world, parallelism means each slot gets a subset of events. However, how does that work in the SQL world, where you need to join tables? If the events in tables A and B are processed in parallel, does this not mean that any given slot will hold only some of the events for tables A and B? How does Flink ensure consistent results irrespective of the parallelism used? Or does it just copy all events to all slots, in which case I don't understand how parallelism helps?
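The short answer is that for an equi-join Flink hash-partitions both inputs on the join key, so each subtask sees only some keys, but all rows (from both tables) for the keys it owns. The idea can be sketched in plain Python, with no Flink involved; the `partition` helper and the sample rows are made up for illustration:

```python
# Toy sketch (plain Python, not Flink code): why a hash shuffle on the
# join key keeps a parallel equi-join consistent.

PARALLELISM = 2

def partition(key, n=PARALLELISM):
    # Every record with the same key is routed to the same subtask,
    # regardless of which table it came from.
    return hash(key) % n

# Sample rows for tables A and B: (join_key, payload)
table_a = [("k1", "a1"), ("k2", "a2"), ("k1", "a3")]
table_b = [("k1", "b1"), ("k2", "b2")]

# Shuffle both inputs by join key: each subtask holds only SOME keys,
# but ALL rows for the keys it owns.
subtasks = [{"A": [], "B": []} for _ in range(PARALLELISM)]
for key, payload in table_a:
    subtasks[partition(key)]["A"].append((key, payload))
for key, payload in table_b:
    subtasks[partition(key)]["B"].append((key, payload))

# Each subtask joins its own slice independently; the union of the
# per-subtask results is exactly the full join, so the answer does not
# depend on the parallelism chosen.
results = []
for st in subtasks:
    for ka, pa in st["A"]:
        for kb, pb in st["B"]:
            if ka == kb:
                results.append((ka, pa, pb))

print(sorted(results))
```

So nothing is copied to every slot: each parallel instance joins a disjoint subset of keys, and the combined output is the same for any parallelism.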
Many thanks,
John

________________________________
From: Dian Fu <dian0511...@gmail.com>
Sent: 20 July 2022 05:19
To: John Tipper <john_tip...@hotmail.com>
Cc: user@flink.apache.org
Subject: Re: PyFlink SQL: force maximum use of slots

Hi John,

All the operators in the same slot sharing group will be put into one slot. The slot sharing group is only configurable in the DataStream API [1]. Usually you don't need to set the slot sharing group explicitly [2], and sharing resources between the operators running in the same slot is beneficial. If performance becomes a problem, you can simply increase the parallelism or the resources (CPU or memory) of the TaskManager. For example, if the parallelism is set to 2, you will see two running slots, each containing all of the operators.

Regards,
Dian

[1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/operators/overview/#set-slot-sharing-group
[2] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/concepts/flink-architecture/#task-slots-and-resources

On Tue, Jul 19, 2022 at 12:23 AM John Tipper <john_tip...@hotmail.com> wrote:

Hi all,

Is there a way of forcing a pipeline to use as many slots as possible? I have a PyFlink program using SQL and the Table API, and currently my entire pipeline is using just a single slot. I've tried this:

StreamExecutionEnvironment.get_execution_environment().disable_operator_chaining()

I did have a pipeline with 17 different tasks running in a single slot; all this does is give me 79 different operators, but they are all still running in a single slot. Is there a way to get Flink to run different tasks in different slots whilst using the Table API and SQL?

Many thanks,
John
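For reference, the slot sharing groups Dian mentions are set per operator in the PyFlink DataStream API. A minimal sketch follows; it assumes a local Flink environment is available (it will not run without `pyflink` installed), and the group names and pipeline are illustrative only:

```python
# Sketch: splitting operators across slot sharing groups in PyFlink.
# Operators in different groups cannot share a slot, so with
# parallelism 2 this pipeline needs slots from both groups.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(2)

ds = env.from_collection([1, 2, 3])

# "stage-1" and "stage-2" are arbitrary group names for illustration.
doubled = ds.map(lambda x: x * 2).slot_sharing_group("stage-1")
shifted = doubled.map(lambda x: x + 1).slot_sharing_group("stage-2")

shifted.print()
env.execute("slot-sharing-sketch")
```

Note that, as stated above, this knob is not exposed through the Table API or SQL; with those, the usual lever is raising the parallelism or the TaskManager resources.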