Re: Resource Consumption increase With TupleTag

2020-08-21 Thread Luke Cwik
On Thu, Aug 20, 2020 at 12:54 PM Talat Uyarer wrote: > Hi Lucas, > >> Not really. It is more about pipeline complexity, logging, debugging, >> monitoring which become more complex. > > Should I use a different consumer group or should I use the same consumer > group ? > I don't know what you're a

Re: Resource Consumption increase With TupleTag

2020-08-20 Thread Talat Uyarer
Hi Lucas, > Not really. It is more about pipeline complexity, logging, debugging, > monitoring which become more complex. Should I use a different consumer group or should I use the same consumer group ? And also How Autoscaling will decide worker count ? What do you mean by it's not working pr

Re: Resource Consumption increase With TupleTag

2020-08-20 Thread Luke Cwik
Do you mean I can put my simple pipeline multiple times for all topics in one dataflow job ? Yes Is there any side effect having multiple independent DAG on one DF job ? Not really. It is more about pipeline complexity, logging, debugging, monitoring which become more complex. And also why the Tu

Re: Resource Consumption increase With TupleTag

2020-08-19 Thread Talat Uyarer
Filter step is an independent step. We can think it is an etl step or something else. MessageExtractor step writes messages on TupleTags based on the kafka header. Yes, MessageExtractor is literally a multi-output DoFn already. MessageExtractor is processing 48kps but branches are processing their

Re: Resource Consumption increase With TupleTag

2020-08-19 Thread Talat Uyarer
Hi Robert, I calculated process speed based on worker count. When I have separate jobs. topic1 job used 5 workers, topic2 job used 7 workers. Based on KafkaIO message count. they had 4kps processing speed per worker. After I combine them in one df job. That DF job started using ~18 workers, not 12

Re: Resource Consumption increase With TupleTag

2020-08-19 Thread Robert Bradshaw
Is this 2kps coming out of Filter1 + 2kps coming out of Filter2 (which would be 4kps total), or only 2kps coming out of KafkaIO and MessageExtractor? Though it /shouldn't/ matter, due to sibling fusion, there's a chance things are getting fused poorly and you could write Filter1 and Filter2 instea

Resource Consumption increase With TupleTag

2020-08-19 Thread Talat Uyarer
Hi, I have a very simple DAG on my dataflow job. (KafkaIO->Filter->WriteGCS). When I submit this Dataflow job per topic it has 4kps per instance processing speed. However I want to consume two different topics in one DF job. I used TupleTag. I created TupleTags per message type. Each topic has dif