from:"Heejong Lee"

Re: Python SDK ReadFromKafka: Timeout expired while fetching topic metadata

2020-06-08 Thread Heejong Lee

gt; wrote: > >> Seems like Java dependency is not being properly set up when running the >> cross-language Kafka step. I don't think this was available for Beam 2.21. >> Can you try with the latest Beam HEAD or Beam 2.22 when it's released ? >> +Heejong

Re: beam python on spark-runner

2020-05-14 Thread Heejong Lee

How did you start spark job server and what version of Apache Beam SDK did you use? There were some protocol changes recently so if both versions are not matched you may see gRPC errors. If you used the gradle command on the latest head for starting spark job server, I would recommend checking out

Re: PubsubIO writeProtos with proto3

2020-05-13 Thread Heejong Lee

What protobuf version did you use for generating the Message object? It looks like ProtoCoder supports both 2 and 3. https://github.com/apache/beam/blob/master/sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoCoder.java#L70 On Wed, May 13, 2020 at 6:43 AM T

Re: Behavior of KafkaIO

2020-05-11 Thread Heejong Lee

If we assume that there's only one reader, all partitions are assigned to a single KafkaConsumer. I think the order of reading each partition depends on KafkaConsumer implementation i.e. how KafkaConsumer.poll() returns messages. Reference: assigning partitions: https://github.com/apache/beam/blob

Re: Merging different PCollections for writing if BigQuery

2020-02-11 Thread Heejong Lee

What do you mean by "PCollection of dicts, each having different key values"? What's the type of the PCollections? I assume that you want to merge two PCollections of KV such as PCollection[("a", 1), ("b", 2), ("c", 3)] + PCollection[("a", 4), ("d", 5), ("e", 6)]. Is that correct? On Tue, Feb 11,

Re: (python) Using ToList output as input for AsSingleton or AsList in Apache Beam

2019-04-17 Thread Heejong Lee

Looks like your code is not grammatically correct. It's impossible to chain pcollection and pcollection with a vertical bar. Each chain should start with an initial pipeline or pcollection and generates another pcollection via a given ptransform. If you modify your code like below, it should work:

Re: Python SDK ReadFromKafka: Timeout expired while fetching topic metadata

Re: beam python on spark-runner

Re: PubsubIO writeProtos with proto3

Re: Behavior of KafkaIO

Re: Merging different PCollections for writing if BigQuery

Re: (python) Using ToList output as input for AsSingleton or AsList in Apache Beam

6 matches

Site Navigation

Mail list logo

Footer information