RE: Registering Protobuf schema

2020-08-19 Thread Robert.Butcher
Hi Brian, Many thanks for your mail. Yes, I figured that one out in the end from the docs, but many thanks for confirming. I did subsequently discover some other issues with Protobuf-derived schemas (essentially they don’t seem to be properly supported by BigQueryIO.Write or allow for optional

Re: Registering Protobuf schema

2020-08-19 Thread Luke Cwik
Brian, Coders have a provider model: each provider can be queried to resolve a coder for a given type, and the providers are consulted in a specific order. This gives the flexibility to handle situations like the one you described. On Wed, Aug 19, 2020 at 12:30 AM wrote: > Hi Brian, > > > > Many thanks

Re: Registering Protobuf schema

2020-08-19 Thread Brian Hulette
Ah yes, the SchemaRegistry and SchemaProvider follow the same model, but none of the SchemaProviders are registered by default. Users can register the proto schema provider with registerSchemaProvider(Class) [1]: p.getSchemaRegistry().registerSchemaProvider(ProtoMessageSchema.class); Then Schem
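Brian's registration call can be sketched in context as follows. This is a minimal sketch, not the thread's actual pipeline; the only call taken from the mail itself is `registerSchemaProvider(ProtoMessageSchema.class)`, and the commented `MyProtoMessage` type is a hypothetical name.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.protobuf.ProtoMessageSchema;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RegisterProtoSchema {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        // Register the proto schema provider; per Brian's mail, no
        // SchemaProvider is registered by default.
        p.getSchemaRegistry().registerSchemaProvider(ProtoMessageSchema.class);

        // Afterwards, schema inference should work for protobuf-generated
        // classes used in the pipeline, e.g. (hypothetical message type):
        // p.getSchemaRegistry().getSchema(MyProtoMessage.class);
    }
}
```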

Re: Registering Protobuf schema

2020-08-19 Thread Luke Cwik
With providers there is also an ordering issue: since multiple providers could work for a given type, we want to apply them using some stable ordering. On Wed, Aug 19, 2020 at 10:08 AM Brian Hulette wrote: > Ah yes, the SchemaRegistry and SchemaProvider follow the same model, but > none of the
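The ordering behaviour Luke describes can be illustrated with a plain-Java sketch (not Beam's actual implementation): providers are tried in registration order, and the first one that can handle the requested type wins, so resolution stays stable even when several providers match. The `ProviderChain` class and all names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of an ordered provider chain: first match wins.
public class ProviderChain<T> {
    private final List<Function<Class<?>, Optional<T>>> providers = new ArrayList<>();

    public void register(Function<Class<?>, Optional<T>> provider) {
        providers.add(provider);
    }

    public Optional<T> resolve(Class<?> type) {
        for (Function<Class<?>, Optional<T>> provider : providers) {
            Optional<T> handler = provider.apply(type);
            if (handler.isPresent()) {
                return handler; // first match wins: stable ordering
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ProviderChain<String> chain = new ProviderChain<>();
        // A narrow provider registered first, then a catch-all.
        chain.register(c -> c == String.class ? Optional.of("A") : Optional.empty());
        chain.register(c -> Optional.of("B"));
        // String resolves via the narrow provider because it was registered first.
        System.out.println(chain.resolve(String.class).get());  // prints A
        System.out.println(chain.resolve(Integer.class).get()); // prints B
    }
}
```

Even though both providers can handle `String`, the fixed iteration order makes the result deterministic, which is the point of applying providers "using some stable ordering".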

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-08-19 Thread Brian Hulette
It looks like this is occurring because we don't actually support mixing SchemaProviders in nested types. The current SchemaProvider implementations only support nested types for homogeneous types (e.g. an AutoValue with an AutoValue field). So when you use JavaFieldSchema as the SchemaProvider for
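The limitation Brian describes might look like the following sketch: a `JavaFieldSchema` POJO holding a protobuf-generated field. `Event` and `MyProtoPayload` are made-up names; the point is only the heterogeneous nesting.

```java
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;

// Hypothetical POJO mixing schema styles: the outer type uses
// JavaFieldSchema, but one field is a protobuf-generated class.
// JavaFieldSchema recurses into nested fields with itself rather than
// delegating to ProtoMessageSchema, so this mixing is not supported.
@DefaultSchema(JavaFieldSchema.class)
public class Event {
    public String id;
    public MyProtoPayload payload; // protobuf Message (hypothetical); inference fails here
}
```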

Resource Consumption increase With TupleTag

2020-08-19 Thread Talat Uyarer
Hi, I have a very simple DAG on my Dataflow job (KafkaIO->Filter->WriteGCS). When I submit this Dataflow job per topic, it has 4kps per-instance processing speed. However, I want to consume two different topics in one DF job. I used TupleTag. I created TupleTags per message type. Each topic has dif

Re: Resource Consumption increase With TupleTag

2020-08-19 Thread Robert Bradshaw
Is this 2kps coming out of Filter1 + 2kps coming out of Filter2 (which would be 4kps total), or only 2kps coming out of KafkaIO and MessageExtractor? Though it /shouldn't/ matter, due to sibling fusion, there's a chance things are getting fused poorly and you could write Filter1 and Filter2 instea
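Robert's alternative of writing Filter1 and Filter2 as separate transforms can be sketched like this. Everything here is hypothetical (`messages`, the `startsWith` predicates); it only shows the shape of the pipeline, with each branch filtering the shared input directly instead of going through one multi-output extractor.

```java
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.values.PCollection;

// Sketch of two sibling Filter branches over the same input collection,
// instead of a single multi-output routing DoFn.
public class SiblingFilters {
    static void buildBranches(PCollection<String> messages) {
        PCollection<String> topic1 =
            messages.apply("Filter1", Filter.by(m -> m.startsWith("topic1:")));
        PCollection<String> topic2 =
            messages.apply("Filter2", Filter.by(m -> m.startsWith("topic2:")));
        // Each branch can then be written to GCS independently.
    }
}
```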

Re: Resource Consumption increase With TupleTag

2020-08-19 Thread Talat Uyarer
Hi Robert, I calculated processing speed based on worker count. When I had separate jobs, the topic1 job used 5 workers and the topic2 job used 7 workers. Based on KafkaIO message count, they had 4kps processing speed per worker. After I combined them in one DF job, that DF job started using ~18 workers, not 12

Re: Resource Consumption increase With TupleTag

2020-08-19 Thread Talat Uyarer
The Filter step is an independent step; we can think of it as an ETL step or something else. The MessageExtractor step writes messages to TupleTags based on the Kafka header. Yes, MessageExtractor is literally a multi-output DoFn already. MessageExtractor is processing 48kps but branches are processing their
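A MessageExtractor-style multi-output DoFn, as Talat confirms he has, generally takes this shape in the Beam Java SDK. This is a hedged sketch: the class, tag names, and the use of a `KV` key to stand in for the Kafka header are all hypothetical.

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Hypothetical multi-output DoFn that routes each record to a TupleTag
// based on a per-record key standing in for the Kafka header.
public class MessageExtractorSketch {
    // Anonymous subclasses so Beam can capture the element type.
    static final TupleTag<String> TOPIC1 = new TupleTag<String>() {};
    static final TupleTag<String> TOPIC2 = new TupleTag<String>() {};

    static PCollectionTuple extract(PCollection<KV<String, String>> input) {
        return input.apply("MessageExtractor",
            ParDo.of(new DoFn<KV<String, String>, String>() {
                @ProcessElement
                public void process(ProcessContext c) {
                    if ("topic1".equals(c.element().getKey())) {
                        c.output(c.element().getValue());         // main output (TOPIC1)
                    } else {
                        c.output(TOPIC2, c.element().getValue()); // tagged side output
                    }
                }
            }).withOutputTags(TOPIC1, TupleTagList.of(TOPIC2)));
    }
}
```

The branches are then read back with `extract(input).get(TOPIC1)` and `.get(TOPIC2)`, each feeding its own downstream Filter and GCS write.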