Re: Flink Runner with HDFS

2020-05-29 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
m.Pipeline(options=options) as p: > (p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs) > |"WriteMyFile" >> beam.io.WriteToText(output_file_hdfs)) -Max On 28.05.20 17:00, Ramanan, Buvana (Nokia - US/Murray Hill) wrote: > Hi Max,

Worker pool question

2020-05-29 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Hello, I came across the following in SDK Harness Documentation page: https://beam.apache.org/documentation/runtime/sdk-harness-config/ environment_type: EXTERNAL: User code will be dispatched to an external service. For example, one can start an external service for Python workers by running do

Re: HDFS I/O with Beam on Spark and Flink runners - consistent Error messages

2020-05-28 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
_sdk" And I get an error message (see attachment) similar to what I get with Spark and Flink Runners where clusters are external. thanks, Buvana ____________ From: Ramanan, Buvana (Nokia - US/Murray Hill) Sent: Thursday, May 28, 2020 11:47 PM To: user@beam.apache.org Subj

Re: HDFS I/O with Beam on Spark and Flink runners - consistent Error messages

2020-05-28 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
PM Ramanan, Buvana (Nokia - US/Murray Hill) mailto:buvana.rama...@nokia-bell-labs.com>> wrote: Kyle, Max, All, I am desperately trying to get Beam working on at least one of the runners of Flink or Spark. Facing failures in both cases with similar message. Flink runner issue (Beam v

HDFS I/O with Beam on Spark and Flink runners - consistent Error messages

2020-05-28 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Kyle, Max, All, I am desperately trying to get Beam working on at least one of the runners of Flink or Spark. Facing failures in both cases with similar message. Flink runner issue (Beam v 2.19.0) was reported yesterday with a permalink: https://lists.apache.org/thread.html/r4977083014eb2d25271

Re: Flink Runner with HDFS

2020-05-28 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
gt; lines = p | 'ReadMyFile' >> beam.Create(hdfs_client.open(input_file_hdfs)) Use: > lines = p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs) Best, Max On 28.05.20 06:06, Ramanan, Buvana (Nokia - US/Murray Hill) wrote: > Further to

Re: Flink Runner with HDFS

2020-05-27 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
? I came across this thread, which helped to some extent, but not completely: https://lists.apache.org/thread.html/a9b6f019b22a65640c272a9497a3c9cc34d68cc2b5c1c9bdebc7ff38%40%3Cuser.beam.apache.org%3E From: "Ramanan, Buvana (Nokia - US/Murray Hill)" Reply-To: "user@beam.apache.org&qu

Flink Runner with HDFS

2020-05-27 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Hello, I am trying to read from, process and write data to HDFS with beam pipelines in python. Using Flink Runner. Beam version 2.19.0. Flink 1.9.2 My initial code (pasted below my sign) to make sure I can read and write, works fine on Local Runner. However, I get the following error message (p

Re: SparkRunner on k8s

2020-04-22 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
cker host inside the container(s). But the more scalable solution is to use a distributed file system, such as HDFS, Google Cloud Storage, or Amazon S3. Check out the Beam programming guide for more info: https://beam.apache.org/documentation/programming-guide/#pipeline-io On Mon, Apr 13, 2020 at

Re: SparkRunner on k8s

2020-04-16 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
(read) transform you are using in your pipeline, that would be helpful for debugging. Thanks, Kyle On Wed, Apr 15, 2020 at 11:21 PM Ramanan, Buvana (Nokia - US/Murray Hill) mailto:buvana.rama...@nokia-bell-labs.com>> wrote: Kyle, I also built Python SDK from source of the same branch (rel

Re: SparkRunner on k8s

2020-04-15 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
nners, and I assume GCP's (unofficial) Flink and Spark [3] operators, are probably similar enough that it shouldn't be too hard to port the YAML from the Flink operator to the Spark operator. I filed an issue for it [4], but I probably won't have the bandwidth to work on it myself for a while

Re: SparkRunner on k8s

2020-04-15 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
have checked out the same version as the Python SDK you are using. Hope that helps. Kyle On Mon, Apr 13, 2020 at 7:41 PM Ramanan, Buvana (Nokia - US/Murray Hill) mailto:buvana.rama...@nokia-bell-labs.com>> wrote: Hello, I am trying to test the Beam Python pipeline on SparkRunner with Spa

Re: SparkRunner on k8s

2020-04-13 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
ead.java:748) Caused by: java.lang.IllegalArgumentException at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:127) at org.apache.beam.runners.core.construction.graph.PipelineValidator.validateParDo(PipelineValidator.java:238) at org.apache.beam.runners.core.constr

Re: SparkRunner on k8s

2020-04-13 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
or. I filed an issue for it [4], but I probably won't have the bandwidth to work on it myself for a while. - Kyle [1] https://beam.apache.org/roadmap/portability/ [2] https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md [3] https://github.com/Google

Re: SparkRunner on k8s

2020-04-11 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
efer https://spark.apache.org/docs/latest/running-on-kubernetes.html to know more about how to run Spark on k8s. The Beam pipeline will be translated to a Spark Pipeline using Spark APIs in Runtime. Regards, Rahul On Sat, Apr 11, 2020 at 4:38 AM Ramanan, Buvana (Nokia - US/Murray Hill) mailto:b

SparkRunner on k8s

2020-04-10 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Hello, I newly joined this group and I went through the archive to see if any discussion exists on submitting Beam pipelines to a SparkRunner on k8s. I run my Spark jobs on a k8s cluster in the cluster mode. Would like to deploy my beam pipeline on a SparkRunner with k8s underneath. The Beam d