Hooray! Thanks, Etienne!
On Thu, Aug 5, 2021 at 3:11 AM Etienne Chauchot
wrote:
> Hi all,
>
> Just to let you know that the Spark Structured Streaming runner was migrated
> to Spark 3.
>
> Enjoy !
>
> Etienne
>
>
Hi,
Sorry for the late answer: streaming mode in the Spark Structured
Streaming runner is blocked by the watermark implementation in the Spark
Structured Streaming framework on the Apache Spark side. See
https://echauchot.blogspot.com/2020/11/watermark-architecture-proposal-for.html
be
Thanks. That was the issue. The latest release of Beam indicates it
supports Spark 3, and I find references in the code to Hadoop 3.2.1. Is it
possible to configure Beam to run on Hadoop 3.2.1?
Trevor
On Mon, Jun 21, 2021 at 6:19 PM Kyle Weaver wrote:
> Looks like a version mismatch between Hado
Hi Alex -- Please see the details you are looking for.
I am running a sample pipeline and my environment is this.
python "SaiStudy - Apache-Beam-Spark.py" --runner=PortableRunner
--job_endpoint=192.168.99.102:8099
My Spark is running in a Docker container and I can see that the JobService is
ru
Hi Ramesh,
By “+ Docker” do you mean the Docker SDK Harness or running Spark in Docker? For
the former I believe it works fine.
Could you share more details of what kind of error you are facing?
> On 27 Oct 2020, at 21:10, Ramesh Mathikumar wrote:
>
> Hi Group -- Has anyone got this to work?
That was it Juan Carlos. Can't thank you enough.
I now have generic Kettle transformations running on the Direct Runner,
Dataflow and Spark.
Cheers,
Matt
---
Matt Casters <attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
On Tue, Jan 29, 2019 at 18:19 Matt Casters wrote:
No, you're right. I got so focused on getting
org.apache.hadoop.fs.FileSystem in order that I forgot about the other
files. Doh!
Thanks for the tip!
---
Matt Casters <attcast...@gmail.com>
Senior Solution Architect, Kettle Project Founder
On Tue, Jan 29, 2019 at 16:41 Juan Carlos Garcia wrote:
Hi Matt,
Are the META-INF/services files merged correctly on the fat-jar?
On Tue, Jan 29, 2019 at 2:33 PM Matt Casters wrote:
> Hi Beam!
>
> After I have my pipeline created and the execution started on the Spark
> master I get this strange nugget:
>
> java.lang.IllegalStateException: No Tran
OK folks, I figured it out.
For the other people desperately clutching at years-old Google results in
the hope of finding any hint...
The Spark requirement to work with a fat jar caused a collision in the
packaging of the file:
META-INF/services/org.apache.hadoop.fs.FileSystem
This in turn erased cert
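For reference, a minimal maven-shade-plugin sketch that merges the
META-INF/services entries instead of letting one dependency's copy overwrite
the others (this assumes the fat jar is built with Maven, as in the later
pom.xml discussion; adapt to your build tool otherwise):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <!-- Concatenates all META-INF/services files (e.g.
           org.apache.hadoop.fs.FileSystem) from the dependencies
           instead of keeping only the last one on the classpath -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    </transformers>
  </configuration>
</plugin>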
Yeah for this setup I used flintrock to start up a bunch of nodes with
Spark and HDFS on AWS. I'm launching the pipeline on the master and all
possible HDFS libraries I can think of are available and hdfs dfs commands
work fine on the master and all the slaves.
It's a problem of transparency I thin
Matt, is the machine from which you are launching the pipeline different
from the one where it should run?
If that's the case, make sure the launching machine has all the HDFS
environment variables set, as the pipeline is configured on the launching
machine before it hits the worker machines.
Thanks for the suggestion, but throwing another server into the mix wouldn't
help in my case. I'm still betting that using SparkLauncher would solve a
lot.
Building a fat jar isn't that big of a deal. However, all the Kettle libs
along with the ones from Beam clock in at 1.4GB at this point in ti
Hi Matt,
I just wanted to remind you that you can also use Apache Livy [1] to launch Spark
jobs (or Beam pipelines built with the SparkRunner) on Spark using just its
REST API [2].
And of course, you need to manually create a “fat” jar and put it somewhere
Spark can find it.
[1]
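Purely as an illustration, a sketch of such a submission against Livy's batch
endpoint (the host, jar path and main class below are placeholders, not taken
from this thread):

# Submit a batch job to Livy (default port 8998); Livy runs spark-submit for you.
curl -X POST http://livy-host:8998/batches \
  -H "Content-Type: application/json" \
  -d '{
        "file": "hdfs:///jars/my-beam-pipeline-fat.jar",
        "className": "com.example.MyBeamPipeline",
        "args": ["--runner=SparkRunner"]
      }'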
Hi Matt,
With Flink you will be able to launch your pipeline just by invoking the main
method of your main class; however, it will run as a standalone process and
you will not have the advantage of distributed computation.
On Fri, Jan 18, 2019, 09:37 Matt Casters wrote:
> Thanks for the rep
Thanks for the reply JC, I really appreciate it.
I really can't force our users to use antiquated stuff like scripts, let
alone command line things, but I'll simply use SparkLauncher and your
comment about the main class doing Pipeline.run() on the Master is
something I can work with... somewhat.
Hi Matt, during the time we were using Spark with Beam, the solution was
always to pack the jar and use the spark-submit command pointing to your
main class, which will do `pipeline.run`.
The spark-submit command has a flag to decide how to run it
(--deploy-mode), whether to launch the job on the
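As a rough sketch (the class name, master URL and jar name are placeholders,
not taken from this thread), such a submission could look like:

# --deploy-mode cluster runs the driver (and thus pipeline.run()) on the
# cluster; use "client" to keep the driver on the launching machine.
spark-submit \
  --class com.example.MyBeamPipeline \
  --master spark://spark-master:7077 \
  --deploy-mode cluster \
  my-beam-pipeline-fat.jar \
  --runner=SparkRunner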
Hi Mike,
From the documentation on
https://beam.apache.org/documentation/runners/spark/#pipeline-options-for-the-spark-runner
storageLevel: The StorageLevel to use when caching RDDs in batch pipelines.
The Spark Runner automatically caches RDDs that are evaluated repeatedly.
This is a batch-only
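For example, a small sketch of setting it programmatically (the value here is
illustrative, not a recommendation):

// Assuming org.apache.beam.runners.spark.SparkPipelineOptions, this is
// equivalent to passing --storageLevel=MEMORY_AND_DISK on the command line.
SparkPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
options.setStorageLevel("MEMORY_AND_DISK");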
+dev
On Fri, Jun 29, 2018 at 1:32 AM Akanksha Sharma B <
akanksha.b.sha...@ericsson.com> wrote:
> Hi,
>
>
> I just started using an Apache Beam pipeline for processing unbounded data, on
> Spark. So it essentially uses Spark Streaming.
>
>
> However, I came across following statement in Spark Runner
Let me create a branch on my GitHub and share it with you. Let's continue the
discussion via direct message (to avoid flooding the mailing list).
Regards
JB
On 02/22/2018 06:51 PM, Gary Dusbabek wrote:
> Yes, that would be very helpful. If possible, I'd like to understand how it is
> constructed
Yes, that would be very helpful. If possible, I'd like to understand how it
is constructed so that I can maintain it. A link to a git repo would be
great.
I've spent some time trying to understand how the Beam project is
built/managed. It looks like the poms are intended primarily for developers
a
OK, do you want me to provide a Scala 2.10 build for you?
Regards
JB
On 02/22/2018 06:44 PM, Gary Dusbabek wrote:
> Jean-Baptiste,
>
> Thanks for responding. I agree--it would be better to use Scala 2.11. I'm in
> the
> process of creating a Beam POC with an existing platform and upgrading
> e
Jean-Baptiste,
Thanks for responding. I agree--it would be better to use Scala 2.11. I'm
in the process of creating a Beam POC with an existing platform, and
upgrading everything in that platform to Scala 2.11 as a prerequisite is
out of scope.
It would be helpful to know if Beam in its current s
Hi Gary,
Beam 2.3.0 and the Spark runner use Scala 2.11.
I can help you have a smooth transition by creating a local branch using
Scala 2.10. However, I strongly advise upgrading to 2.11, as some other parts of
Beam (other runners and IOs) already use 2.11.
Regards
JB
On 02/22/2018 05:55 PM
Hi Ron,
Could you show the whole pom.xml which you actually used?
I can guess that it was an issue with the transformers configuration, because it
seems ServicesResourceTransformer doesn't support a “resource” parameter.
WBR,
Alexey
> On 15 Jan 2018, at 07:51, Ron Gonzalez wrote:
>
> Hi,
> I added
Hi,
I second Luke here: you have to use Spark 1.x or use the PR supporting Spark
2.x.
Regards
JB
On 12/04/2017 08:14 PM, Lukasz Cwik wrote:
It seems like you're trying to use Spark 2.1.0. Apache Beam currently relies on
users using Spark 1.6.3. There is an open pull request [1] to migrate to Spa
It seems like you're trying to use Spark 2.1.0. Apache Beam currently relies
on users using Spark 1.6.3. There is an open pull request [1] to migrate to
Spark 2.2.0.
1: https://github.com/apache/beam/pull/4208/
On Mon, Dec 4, 2017 at 10:58 AM, Opitz, Daniel A
wrote:
> We are trying to submit a Spa
That's correct, but I would avoid doing this, generally speaking.
IMHO, it's better to use the default Spark context created by the runner.
Regards
JB
On 09/27/2017 01:02 PM, Romain Manni-Bucau wrote:
You have the SparkPipelineOptions you can set
(org.apache.beam.runners.spark.SparkPipelineOpti
What type of Spark configuration are you trying to augment?
As Romain mentioned, you can supply a custom Spark conf to your Beam
pipeline, but depending on your use case this may not be necessary and is
best avoided.
Also, keep in mind that if you use spark-submit you can add your
configurations whe
You have the SparkPipelineOptions you can set
(org.apache.beam.runners.spark.SparkPipelineOptions -
https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java);
most of these options are explained at
https://beam.apache.org/documen
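A minimal sketch of what that looks like in code (the master URL is a
placeholder; anything not visible in the linked class is an assumption):

import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Reads --sparkMaster, --storageLevel, etc. from the command line, then
// pins the runner and master programmatically.
SparkPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
options.setRunner(SparkRunner.class);
options.setSparkMaster("spark://spark-master:7077"); // placeholder master URL
Pipeline pipeline = Pipeline.create(options);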
My last email wasn't clear, please ignore.
Thanks, it looks better (no errors or exceptions).
My problem now is how to set the Spark conf for my pipeline; this is what I have:
conf = SparkConf()
conf.setAppName("InsightEdge Python Example")
conf.set("my.field1", "XXX")
conf.set("my.field2", "YYY")
Thanks, it looks better (no errors or exceptions).
My problem now is how to set the Spark conf for my pipeline; this is what I have:
conf = SparkConf()
conf.setAppName("InsightEdge Python Example")
conf.set("spark.insightedge.space.name", "insightedge-space")
conf.set("spark.insightedge.space.lookup
Yes, you need a coder for your product if it is passed to the output for the
next "apply". You can register it on the pipeline or through the Beam SPI. Here
is a sample using Java serialization:
pipeline.getCoderRegistry().registerCoderForClass(Product.class,
SerializableCoder.of(Product.class));
Roma
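Note that SerializableCoder requires Product to implement java.io.Serializable.
An alternative sketch, setting the coder directly on the output PCollection
instead (ProductFn and input are placeholder names, not from this thread):

// Set the coder on the PCollection produced by the transform rather than
// registering it globally on the pipeline's CoderRegistry.
PCollection<Product> products =
    input
        .apply(ParDo.of(new ProductFn()))  // placeholder DoFn producing Product
        .setCoder(SerializableCoder.of(Product.class));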
Hi,
I tried what you wrote; the arguments that I use are:
--sparkMaster=local --runner=SparkRunner
I already have Spark running.
Now I'm getting the following error:
.IllegalStateException: Unable to return a default Coder for
ParDo(Anonymous)/ParMultiDo(Anonymous).out0 [PCollectio
Thanks Tal
Hi Tal,
Did you try something like that:
public static void main(final String[] args) {
final Pipeline pipeline =
Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
pipeline.apply(GenerateSequence.from(0).to(10L))
.apply(ParDo.of(new DoFn() {
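The snippet above is cut off by the archive; purely as an illustration, a
runnable completion could look like the following (the DoFn body and the final
run() call are assumptions, not part of the original mail):

Pipeline pipeline =
    Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
pipeline.apply(GenerateSequence.from(0).to(10L))
    .apply(ParDo.of(new DoFn<Long, String>() {
      @ProcessElement
      public void processElement(ProcessContext context) {
        // Placeholder body: emit one string per generated element.
        context.output("element " + context.element());
      }
    }));
pipeline.run().waitUntilFinish();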
Hi,
I looked at the links you sent me, and I haven't found any clue as to how to
adapt it to my current code.
My code is very simple:
val sc = spark.sparkContext
val productsNum = 10
println(s"Saving $productsNum products RDD to the space")
val rdd = sc.parallelize(1 to productsNum).map { i =>
Pro
Hi Tal,
Thanks for reaching out!
Please take a look at our documentation:
Quickstart guide (Java):
https://beam.apache.org/get-started/quickstart-java/
This guide will show you how to run our WordCount example using any of
the runners (for example, the direct runner or the Spark runner in your case
Glad to have helped!
Thanks again for reaching out. Keep on Beaming and let us know about any
issues you encounter.
On Fri, Aug 18, 2017 at 2:12 PM Sathish Jayaraman
wrote:
> Hey Aviem,
>
> Thank you for noticing. You were right, I was able to run all my jobs in
> Spark 1.6.3. You are awesome!!!
Hey Aviem,
Thank you for noticing. You were right, I was able to run all my jobs in Spark
1.6.3. You are awesome!!!
Regards,
Sathish. J
On 16-Aug-2017, at 3:23 PM, Aviem Zur
mailto:aviem...@gmail.com>> wrote:
2.0.0. I really doubt it's a Spark setup issue. I wrote a native Spark
applicatio
Hi Jayaraman,
Thanks for reaching out.
We run Beam using the Spark runner daily on a YARN cluster.
It appears that in many of the logs you sent there is hanging when
connecting to certain servers on certain ports. Could this be a network
issue or an issue with your Spark setup?
Could you please shar
Hi,
Thanks for trying it out.
I was running the job in a local single-node setup. I also spawned an HDInsight
cluster on the Azure platform just to test the WordCount program. It's the same
result there too, stuck at the Evaluating ParMultiDo step. It runs fine with mvn
compile exec, but when bundled int
Hi,
Yes, but not with YARN.
Let me bootstrap a quick cluster with YARN to test it.
Regards
JB
On 08/03/2017 09:34 AM, Sathish Jayaraman wrote:
Hi,
Was anyone able to run a Beam application on Spark at all?
I tried all possible options and still no luck. No executors are getting assigned
for the
Hi,
Was anyone able to run a Beam application on Spark at all?
I tried all possible options and still no luck. No executors are getting assigned
for the job submitted by the command below, even though explicitly specified:
$ ~/spark/bin/spark-submit --class org.apache.beam.examples.WordCount --master
ya
Hi Sathish,
Do you see the tasks submitted on the history server?
Regards
JB
On 08/01/2017 11:51 AM, Sathish Jayaraman wrote:
Hi,
I am trying to execute a Beam example in a local Spark setup. When I try to submit
the sample WordCount jar via spark-submit, the job just hangs at 'INFO
SparkRunne