unable to read kerberized HDFS using dataflow

2020-06-16 Thread Vince Gonzalez
Is there specific configuration required to ensure that workers get access to UserGroupInformation when using TextIO? I am using Beam 2.22.0 on the Dataflow runner. My main method is shown below. My HdfsTextIOOptions extends HadoopFileSystemOptions, and I set the HdfsConfiguration on the op

Re: Beam supports Flink RichAsyncFunction

2020-06-16 Thread Eleanore Jin
Thanks Luke for the info. I will take a look. Eleanore On Mon, Jun 15, 2020 at 12:48 PM Luke Cwik wrote: > The intent is that users shouldn't have to use async I/O since the idea is > that the runner should increase the number of workers/threads being > processed automatically so that you never

KafkaIO Exactly once vs At least Once

2020-06-16 Thread Eleanore Jin
Hi All, I previously asked a few questions about enabling EOS (exactly-once semantics); please see below. Our Beam pipeline uses KafkaIO to read from a source topic and then uses KafkaIO to publish to a sink topic. According to Max's answer to my previous questions, enabling EOS with KafkaIO will int

Webinar: Feature Powered by Apache Beam – Beyond Lambda (eBay)

2020-06-16 Thread Aizhamal Nurmamat kyzy
Hi all, We are resuming our webinars on Beam Learning Month! Please join us this Wednesday at *9:45am PDT/4:45pm GMT/12:45pm EST*, where Kobe Feng from eBay will deliver a talk about leveraging Apache Beam to build large-scale feature pipelines at eBay. Register: https://learn.xnextcon.com/even

Preparing to sunset Python 2 offering in Apache Beam

2020-06-16 Thread Valentyn Tymofieiev
Hi Beam User community, In line with the pledge[1] to sunset Python 2 offering in new releases in 2020 that Apache Beam has committed to [1,2], we are discussing[3] a more concrete proposal on dev@ mailing list to make 2.23.0 the final Beam release supporting Python 2. We have now had more than 1

Re: unable to read kerberized HDFS using dataflow

2020-06-16 Thread Luke Cwik
Posted comments on your SO question. On Tue, Jun 16, 2020 at 4:32 AM Vince Gonzalez wrote: > Is there specific configuration required to ensure that workers get access > to UserGroupInformation when using TextIO? I am using Beam 2.22.0 on the > dataflow runner. > > My main method looks like this

ValueProviderOptions and templates

2020-06-16 Thread Marco Mistroni
Hi all, I am creating Dataflow jobs using the Python API by creating templates, which I then run on GCP. Suppose my Dataflow job accepts two input parameters that I need to supply at invocation time. Do I need to specify these parameters when I create my template? Here's a sample. Suppose I need two p