with beam.Pipeline(options=options) as p:
    (p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs)
       | 'WriteMyFile' >> beam.io.WriteToText(output_file_hdfs))
-Max
On 28.05.20 17:00, Ramanan, Buvana (Nokia - US/Murray Hill) wrote:
> Hi Max,
Hello,
I came across the following on the SDK Harness Configuration documentation page:
https://beam.apache.org/documentation/runtime/sdk-harness-config/
environment_type:
EXTERNAL: User code will be dispatched to an external service. For example, one
can start an external service for Python workers by running do
_sdk"
And I get an error message (see attachment) similar to what I get with Spark
and Flink Runners where clusters are external.
thanks,
Buvana
____________
From: Ramanan, Buvana (Nokia - US/Murray Hill)
Sent: Thursday, May 28, 2020 11:47 PM
To: user@beam.apache.org
Subj
PM Ramanan, Buvana (Nokia - US/Murray Hill)
<buvana.rama...@nokia-bell-labs.com>
wrote:
Kyle, Max, All,
I am desperately trying to get Beam working on at least one of the runners, Flink or Spark, and I am facing failures in both cases with a similar message.
Flink runner issue (Beam v 2.19.0) was reported yesterday with a permalink:
https://lists.apache.org/thread.html/r4977083014eb2d25271
> lines = p | 'ReadMyFile' >> beam.Create(hdfs_client.open(input_file_hdfs))
Use:
> lines = p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs)
Best,
Max
On 28.05.20 06:06, Ramanan, Buvana (Nokia - US/Murray Hill) wrote:
> Further to
?
I came across this thread, which helped to some extent, but not completely:
https://lists.apache.org/thread.html/a9b6f019b22a65640c272a9497a3c9cc34d68cc2b5c1c9bdebc7ff38%40%3Cuser.beam.apache.org%3E
From: "Ramanan, Buvana (Nokia - US/Murray Hill)"
Reply-To: "user@beam.apache.org"
Hello,
I am trying to read from, process, and write data to HDFS with Beam pipelines in
Python, using the Flink runner. Beam version 2.19.0, Flink 1.9.2.
My initial code (pasted below my signature), just to make sure I can read and write,
works fine on the local runner. However, I get the following error message (p
cker host inside the container(s). But the more scalable solution is to
use a distributed file system, such as HDFS, Google Cloud Storage, or Amazon
S3. Check out the Beam programming guide for more info:
https://beam.apache.org/documentation/programming-guide/#pipeline-io
On Mon, Apr 13, 2020 at
(read)
transform you are using in your pipeline, that would be helpful for debugging.
Thanks,
Kyle
On Wed, Apr 15, 2020 at 11:21 PM Ramanan, Buvana (Nokia - US/Murray Hill)
<buvana.rama...@nokia-bell-labs.com>
wrote:
Kyle,
I also built Python SDK from source of the same branch (rel
nners, and I assume GCP's (unofficial) Flink and
Spark [3] operators, are probably similar enough that it shouldn't be too hard
to port the YAML from the Flink operator to the Spark operator. I filed an
issue for it [4], but I probably won't have the bandwidth to work on it myself
for a while
have checked out the same version as the Python SDK you are
using.
Hope that helps.
Kyle
On Mon, Apr 13, 2020 at 7:41 PM Ramanan, Buvana (Nokia - US/Murray Hill)
<buvana.rama...@nokia-bell-labs.com>
wrote:
Hello,
I am trying to test the Beam Python pipeline on SparkRunner with Spa
ead.java:748)
Caused by: java.lang.IllegalArgumentException
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:127)
    at org.apache.beam.runners.core.construction.graph.PipelineValidator.validateParDo(PipelineValidator.java:238)
    at org.apache.beam.runners.core.constr
or. I filed an
issue for it [4], but I probably won't have the bandwidth to work on it myself
for a while.
- Kyle
[1] https://beam.apache.org/roadmap/portability/
[2]
https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md
[3] https://github.com/Google
Refer to https://spark.apache.org/docs/latest/running-on-kubernetes.html to learn
more about how to run Spark on k8s.
The Beam pipeline will be translated to a Spark pipeline using Spark APIs at
runtime.
Regards,
Rahul
On Sat, Apr 11, 2020 at 4:38 AM Ramanan, Buvana (Nokia - US/Murray Hill)
mailto:b
Hello,
I recently joined this group and went through the archive to see if there is any
existing discussion on submitting Beam pipelines to a SparkRunner on k8s.
I run my Spark jobs on a k8s cluster in cluster mode, and would like to deploy
my Beam pipeline on a SparkRunner with k8s underneath.
The Beam d