Re: Is there an equivalent for --numberOfWorkerHarnessThreads in Python SDK?

2020-08-21 Thread Kamil Wasilewski
No, I'm not. But thanks anyway, I totally missed that option! It occurs in a simple pipeline that executes CoGroupByKey over two PCollections. It reads from a bounded source, 20 million and 2 million elements, respectively. One global window. Here's a link to the code, it's one of our tests: http
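
For readers unfamiliar with the operation mentioned above, here is a rough plain-Python sketch of what a co-group over two keyed collections computes. It is only an illustration of the semantics, not Beam's implementation or API; the function name `co_group_by_key` and the sample data are hypothetical, and the linked test code is not reproduced here.

```python
from collections import defaultdict


def co_group_by_key(left, right):
    """Group values from two (key, value) collections by key.

    Hypothetical helper for illustration only. Returns a dict mapping
    each key to a pair (values_from_left, values_from_right).
    """
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)
    for key, value in right:
        grouped[key][1].append(value)
    return dict(grouped)


if __name__ == "__main__":
    big = [("a", 1), ("b", 2), ("a", 3)]
    small = [("a", "x"), ("c", "y")]
    for key, (from_big, from_small) in sorted(co_group_by_key(big, small).items()):
        print(key, from_big, from_small)
```

Every key that appears in either input shows up in the result, with an empty list on the side where it has no values. In a single global window, all values for a key have to be brought together before that key's result can be emitted.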

Re: Is there an equivalent for --numberOfWorkerHarnessThreads in Python SDK?

2020-08-21 Thread Reuven Lax
Streaming Dataflow relies on high thread count for performance. Streaming threads spend a high percentage of time blocked on IO, so in order to get decent CPU utilization we need a lot of threads. Limiting the thread count risks causing performance issues. On Fri, Aug 21, 2020 at 8:00 AM Kamil Was
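
The point about IO-blocked threads can be illustrated with a small standard-library sketch. This is not Dataflow code; `fake_io_call`, `run_with_threads`, and the task counts and delays are made-up stand-ins, with `time.sleep` assumed to play the role of a blocking IO call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(delay_s):
    """Stand-in for a blocking IO request: the thread just waits."""
    time.sleep(delay_s)
    return delay_s


def run_with_threads(num_threads, num_tasks=8, delay_s=0.05):
    """Run num_tasks blocking calls on a pool and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() forces all results to be collected before timing stops.
        list(pool.map(fake_io_call, [delay_s] * num_tasks))
    return time.perf_counter() - start


if __name__ == "__main__":
    print("1 thread :", round(run_with_threads(1), 3), "s")
    print("8 threads:", round(run_with_threads(8), 3), "s")
```

With one thread the waits happen one after another; with eight threads they overlap, so the same work finishes in roughly the time of a single wait. Capping the thread count in a workload like this leaves the CPU idle while threads sit blocked.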

Re: Resource Consumption increase With TupleTag

2020-08-21 Thread Luke Cwik
On Thu, Aug 20, 2020 at 12:54 PM Talat Uyarer wrote:
> Hi Lucas,
>
>> Not really. It is more about pipeline complexity, logging, debugging,
>> monitoring which become more complex.
>
> Should I use a different consumer group or should I use the same consumer
> group ?
>
I don't know what you're a