Is this 2kps coming out of Filter1 + 2kps coming out of Filter2 (which would be 4kps total), or only 2kps coming out of KafkaIO and MessageExtractor?
Though it /shouldn't/ matter, due to sibling fusion, there's a chance things are getting fused poorly and you could write Filter1 and Filter2 instead as a DoFn with multiple outputs (see https://beam.apache.org/documentation/programming-guide/#additional-outputs).

- Robert

On Wed, Aug 19, 2020 at 3:37 PM Talat Uyarer <tuya...@paloaltonetworks.com> wrote:
>
> Hi,
>
> I have a very simple DAG on my dataflow job. (KafkaIO->Filter->WriteGCS).
> When I submit this Dataflow job per topic it has 4kps per instance processing
> speed. However I want to consume two different topics in one DF job. I used
> TupleTag. I created TupleTags per message type. Each topic has different
> message types and also needs different filters. So my pipeline turned to
> below DAG. Message Extractor is a very simple step checking header of kafka
> messages and writing the correct TupleTag. However after starting to use this
> new DAG, dataflow can process 2kps per instance.
>
>                             |--->Filter1-->WriteGCS
> KafkaIO->MessageExtractor-> |
>                             |--->Filter2-->WriteGCS
>
> Do you have any idea why my data process speed decreased ?
>
> Thanks
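
For illustration, here is a rough, library-free sketch of the idea behind the multi-output suggestion in the reply: each message is inspected once and routed straight to the output for its type, with the per-type filter applied in the same pass. This is plain Python, not the Beam API. The tag names, the `header` and `size` fields, and the two predicates are all hypothetical stand-ins for the real TupleTags and filters. In Beam itself this would be a single ParDo with additional outputs, as described in the programming guide linked above.

```python
from collections import defaultdict

# Hypothetical tags, standing in for the two TupleTags in the pipeline.
TYPE_A = "type_a"
TYPE_B = "type_b"


def keep_a(msg):
    # Hypothetical stand-in for Filter1.
    return msg["size"] > 10


def keep_b(msg):
    # Hypothetical stand-in for Filter2.
    return msg["size"] > 100


def extract_and_filter(messages):
    """Route each message to at most one tagged output in a single pass."""
    outputs = defaultdict(list)
    for msg in messages:
        tag = msg["header"]  # assumed header field naming the message type
        if tag == TYPE_A and keep_a(msg):
            outputs[TYPE_A].append(msg)
        elif tag == TYPE_B and keep_b(msg):
            outputs[TYPE_B].append(msg)
        # Anything else is dropped, as a filter would drop it.
    return outputs


# Made-up sample input, only to exercise the routing.
messages = [
    {"header": TYPE_A, "size": 5},
    {"header": TYPE_A, "size": 50},
    {"header": TYPE_B, "size": 50},
    {"header": TYPE_B, "size": 500},
    {"header": "other", "size": 1},
]

outputs = extract_and_filter(messages)
for tag in (TYPE_A, TYPE_B):
    print(tag, len(outputs[tag]))
```

Counting records per output like this also helps with the first question in the reply: it separates the rate coming out of each branch from the rate coming out of the shared upstream step.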