Hi Brian,
Many thanks for your mail.
Yes, I figured that one out in the end from the docs, but many thanks for
confirming.
I did subsequently discover some other issues with protobuf-derived schemas
(essentially they don't seem to be properly supported by BigQueryIO.Write or
allow for optional
Brian, Coders have a provider model where the provider can be queried to
resolve a coder for a given type, and the providers are resolved in a specific
order. This gives the flexibility to handle situations like the one you
described.
On Wed, Aug 19, 2020 at 12:30 AM wrote:
Ah yes, the SchemaRegistry and SchemaProvider follow the same model, but
none of the SchemaProviders are registered by default. Users can register
the proto schema provider with
registerSchemaProvider(Class) [1]:
p.getSchemaRegistry().registerSchemaProvider(ProtoMessageSchema.class);
Then Schem
With providers there is also an ordering issue, since multiple providers
could work for a given type, so we want to apply them using some stable
ordering.
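To illustrate the registration step Brian describes, here is a minimal sketch (a hedged example, not the poster's code; it assumes a Beam Java project with the beam-sdks-java-extensions-protobuf module on the classpath):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.protobuf.ProtoMessageSchema;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RegisterProtoSchema {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Opt in to proto-derived schemas: none of the SchemaProviders are
    // registered by default, so this must be done explicitly.
    p.getSchemaRegistry().registerSchemaProvider(ProtoMessageSchema.class);

    // From here on, PCollections of generated proto message types can have
    // their schemas inferred through the registered provider.
  }
}
```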
On Wed, Aug 19, 2020 at 10:08 AM Brian Hulette wrote:
It looks like this is occurring because we don't actually support mixing
SchemaProviders in nested types. The current SchemaProvider implementations
only support nested types for homogeneous types (e.g. an AutoValue with an
AutoValue field). So when you use JavaFieldSchema as the SchemaProvider for
Hi,
I have a very simple DAG in my Dataflow job (KafkaIO->Filter->WriteGCS).
When I submit this Dataflow job per topic, it has 4kps per-instance
processing speed. However, I want to consume two different topics in one DF
job. I used TupleTag; I created TupleTags per message type. Each topic has
dif
Is this 2kps coming out of Filter1 + 2kps coming out of Filter2 (which
would be 4kps total), or only 2kps coming out of KafkaIO and
MessageExtractor?
Though it /shouldn't/ matter, due to sibling fusion, there's a chance
things are getting fused poorly and you could write Filter1 and
Filter2 instea
Hi Robert,
I calculated processing speed based on worker count. When I had
separate jobs, the topic1 job used 5 workers and the topic2 job used 7
workers. Based on KafkaIO message count, they had 4kps processing speed per
worker. After I combined them into one DF job, that DF job started using ~18
workers, not 12
The Filter step is an independent step; we can think of it as an ETL step or
something else. The MessageExtractor step writes messages to TupleTags based
on the Kafka header. Yes, MessageExtractor is literally a multi-output DoFn
already. MessageExtractor is processing 48kps but the branches are processing
their
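For context, a multi-output DoFn of the kind MessageExtractor is described as being might look like the following sketch (all names here are hypothetical illustrations, not the poster's actual code; it assumes the Beam Java SDK):

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class RoutingExample {
  // Tags for the two topic branches. The anonymous subclasses preserve the
  // element type at runtime.
  static final TupleTag<String> TOPIC1_TAG = new TupleTag<String>() {};
  static final TupleTag<String> TOPIC2_TAG = new TupleTag<String>() {};

  // Routes each record to a tagged output based on its key (standing in here
  // for the Kafka header the poster mentions).
  static class MessageExtractor extends DoFn<KV<String, String>, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      if ("topic1".equals(c.element().getKey())) {
        c.output(TOPIC1_TAG, c.element().getValue());
      } else {
        c.output(TOPIC2_TAG, c.element().getValue());
      }
    }
  }

  // Applying it: the first tag is the main output, the rest are additional.
  static PCollectionTuple route(PCollection<KV<String, String>> input) {
    return input.apply(
        ParDo.of(new MessageExtractor())
             .withOutputTags(TOPIC1_TAG, TupleTagList.of(TOPIC2_TAG)));
  }
}
```

Each branch is then retrieved with `routed.get(TOPIC1_TAG)` / `routed.get(TOPIC2_TAG)` and can have its own downstream Filter and sink.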