Re: Spark Structured Streaming runner migrated to Spark 3

2021-08-05 Thread Austin Bennett
Hooray! Thanks, Etienne! On Thu, Aug 5, 2021 at 3:11 AM Etienne Chauchot wrote: > Hi all, > > Just to let you know that the Spark Structured Streaming runner has been migrated > to Spark 3. > > Enjoy! > > Etienne

Re: Spark Structured Streaming Runner Roadmap

2021-08-03 Thread Etienne Chauchot
Hi, sorry for the late answer: streaming mode in the Spark Structured Streaming runner is stuck because of the watermark implementation in the Spark Structured Streaming framework on the Apache Spark project side. See https://echauchot.blogspot.com/2020/11/watermark-architecture-proposal-for.html be

Re: spark-submit and the portable runner

2021-06-24 Thread Trevor Kramer
Thanks. That was the issue. The latest release of Beam indicates it supports Spark 3, and I find references in the code to Hadoop 3.2.1. Is it possible to configure Beam to run on Hadoop 3.2.1? Trevor On Mon, Jun 21, 2021 at 6:19 PM Kyle Weaver wrote: > Looks like a version mismatch between Hado

Re: Spark Portable Runner + Docker

2020-10-28 Thread Ramesh Mathikumar
Hi Alex -- Please see the details you are looking for. I am running a sample pipeline and my environment is this: python "SaiStudy - Apache-Beam-Spark.py" --runner=PortableRunner --job_endpoint=192.168.99.102:8099 My Spark is running in a Docker container and I can see that the JobService is ru

Re: Spark Portable Runner + Docker

2020-10-28 Thread Alexey Romanenko
Hi Ramesh, By “+ Docker” do you mean the Docker SDK Harness or running Spark in Docker? For the former I believe it works fine. Could you share more details about the kind of error you are facing? > On 27 Oct 2020, at 21:10, Ramesh Mathikumar wrote: > > Hi Group -- Has anyone got this to work?

Re: Spark: No TransformEvaluator registered

2019-01-29 Thread Matt Casters
That was it, Juan Carlos. Can't thank you enough. I now have generic Kettle transformations running on the Direct Runner, Dataflow and Spark. Cheers, Matt --- Matt Casters attcast...@gmail.com Senior Solution Architect, Kettle Project Founder On Tue, Jan 29, 2019 at 18:19 Matt Casters wrote:

Re: Spark: No TransformEvaluator registered

2019-01-29 Thread Matt Casters
No, you're right. I got so focused on getting org.apache.hadoop.fs.FileSystem in order that I forgot about the other files. Doh! Thanks for the tip! --- Matt Casters attcast...@gmail.com Senior Solution Architect, Kettle Project Founder On Tue, Jan 29, 2019 at 16:41 Juan Carlos Garcia wrote:

Re: Spark: No TransformEvaluator registered

2019-01-29 Thread Juan Carlos Garcia
Hi Matt, Are the META-INF/services files merged correctly in the fat jar? On Tue, Jan 29, 2019 at 2:33 PM Matt Casters wrote: > Hi Beam! > > After I have my pipeline created and the execution started on the Spark > master I get this strange nugget: > > java.lang.IllegalStateException: No Tran

Re: Spark progress feedback

2019-01-29 Thread Matt Casters
OK folks, I figured it out. For the other people desperately clinging to years-old Google results in the hope of finding any hint... The Spark requirement to work with a fat jar caused a collision in the packaging of the file META-INF/services/org.apache.hadoop.fs.FileSystem. This in turn erased cert
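For anyone landing here from a search later: the usual cure is the maven-shade-plugin's ServicesResourceTransformer, which appends the META-INF/services entries instead of letting one jar's file overwrite another's. A quick way to verify a shaded jar is to list what the ServiceLoader actually sees; the class below is only a rough diagnostic sketch, not code from this thread:

import java.util.ServiceLoader;
import org.apache.hadoop.fs.FileSystem;

public class ListFileSystems {
  public static void main(String[] args) {
    // Prints every FileSystem implementation registered via
    // META-INF/services/org.apache.hadoop.fs.FileSystem on the classpath.
    // If the services file was overwritten during shading, HDFS support will be missing here.
    for (FileSystem fs : ServiceLoader.load(FileSystem.class)) {
      System.out.println(fs.getClass().getName());
    }
  }
}

Run it with the fat jar on the classpath and check that the HDFS implementation shows up.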

Re: Spark progress feedback

2019-01-28 Thread Matt Casters
Yeah, for this setup I used flintrock to start up a bunch of nodes with Spark and HDFS on AWS. I'm launching the pipeline on the master, all possible HDFS libraries I can think of are available, and hdfs dfs commands work fine on the master and all the slaves. It's a problem of transparency I thin

Re: Spark progress feedback

2019-01-28 Thread Juan Carlos Garcia
Matt, is the machine from which you are launching the pipeline different from the one where it will run? If that's the case, make sure the launching machine has all the HDFS environment variables set, as the pipeline is configured on the launching machine before it hits the worker machines.

Re: Spark

2019-01-19 Thread Matt Casters
Thanks for the suggestion, but throwing another server into the mix wouldn't help in my case. I'm still betting that using SparkLauncher would solve a lot. Building a fat jar isn't that big of a deal. However, all the Kettle libs along with the ones from Beam clock in at 1.4GB at this point in ti
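For the record, the SparkLauncher route I have in mind would look roughly like the sketch below; the jar path, main class and master URL are placeholders rather than our actual setup:

import org.apache.spark.launcher.SparkLauncher;

public class LaunchBeamOnSpark {
  public static void main(String[] args) throws Exception {
    Process spark = new SparkLauncher()
        .setAppResource("/path/to/pipeline-fat.jar")  // placeholder: the shaded jar with the Beam pipeline
        .setMainClass("com.example.MyBeamPipeline")   // placeholder: main class that calls pipeline.run()
        .setMaster("spark://spark-master:7077")       // placeholder master URL
        .setDeployMode("client")
        .addAppArgs("--runner=SparkRunner")
        .launch();
    // Block until the driver process finishes and report its exit code.
    System.out.println("Spark driver exited with code " + spark.waitFor());
  }
}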

Re: Spark

2019-01-18 Thread Alexey Romanenko
Hi Matt, I just wanted to remind you that you can also use Apache Livy [1] to launch Spark jobs (or Beam pipelines built with the SparkRunner) on Spark using just a REST API [2]. And of course, you need to manually create a “fat" jar and put it somewhere Spark can find it. [1]
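As a rough illustration (the host, jar path and class name below are placeholders), submitting that fat jar as a Livy batch boils down to a single POST against the /batches endpoint:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubmitViaLivy {
  public static void main(String[] args) throws Exception {
    // The jar must live somewhere the cluster can read it, e.g. on HDFS.
    String body = "{"
        + "\"file\": \"hdfs:///jars/beam-pipeline-fat.jar\","
        + "\"className\": \"com.example.MyBeamPipeline\","
        + "\"args\": [\"--runner=SparkRunner\"]"
        + "}";
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://livy-host:8998/batches"))  // 8998 is Livy's default port
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}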

Re: Spark

2019-01-18 Thread Juan Carlos Garcia
Hi Matt, With Flink you will be able to launch your pipeline just by invoking the main method of your main class; however, it will run as a standalone process and you will not have the advantage of distributed computation. On Fri, Jan 18, 2019, 09:37 Matt Casters wrote: > Thanks for the rep

Re: Spark

2019-01-18 Thread Matt Casters
Thanks for the reply JC, I really appreciate it. I really can't force our users to use antiquated stuff like scripts, let alone command line things, but I'll simply use SparkLauncher and your comment about the main class doing Pipeline.run() on the Master is something I can work with... somewhat.

Re: Spark

2019-01-17 Thread Juan Carlos Garcia
Hi Matt, during the time we were using Spark with Beam, the solution was always to pack the jar and use the spark-submit command pointing to your main class, which will do `pipeline.run`. The spark-submit command has a flag to decide how to run it (--deploy-mode), whether to launch the job on the

Re: Spark storageLevel not taking effect

2018-10-12 Thread Juan Carlos Garcia
Hi Mike, From the documentation at https://beam.apache.org/documentation/runners/spark/#pipeline-options-for-the-spark-runner: storageLevel: The StorageLevel to use when caching RDDs in batch pipelines. The Spark Runner automatically caches RDDs that are evaluated repeatedly. This is a batch-only
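If you prefer setting it in code rather than on the command line, a minimal sketch (assuming the Java SDK and the batch SparkRunner, since the option only applies there) would be:

import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class StorageLevelExample {
  public static void main(String[] args) {
    SparkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
    options.setRunner(SparkRunner.class);
    // Batch-only: RDDs the runner evaluates repeatedly are cached with this level
    // (e.g. MEMORY_AND_DISK instead of the default MEMORY_ONLY).
    options.setStorageLevel("MEMORY_AND_DISK");

    Pipeline pipeline = Pipeline.create(options);
    // ... build the pipeline here ...
    pipeline.run().waitUntilFinish();
  }
}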

Re: spark streaming

2018-06-29 Thread Lukasz Cwik
+dev On Fri, Jun 29, 2018 at 1:32 AM Akanksha Sharma B <akanksha.b.sha...@ericsson.com> wrote: > Hi, > > > I just started using an Apache Beam pipeline for processing unbounded data on > Spark, so it essentially uses Spark Streaming. > > > However, I came across the following statement in Spark Runner

Re: spark runner for Scala 2.10

2018-02-22 Thread Jean-Baptiste Onofré
Let me create a branch on my GitHub and share it with you. Let's continue the discussion via direct message (to avoid flooding the mailing list). Regards JB On 02/22/2018 06:51 PM, Gary Dusbabek wrote: > Yes, that would be very helpful. If possible, I'd like to understand how it is > constructed

Re: spark runner for Scala 2.10

2018-02-22 Thread Gary Dusbabek
Yes, that would be very helpful. If possible, I'd like to understand how it is constructed so that I can maintain it. A link to a git repo would be great. I've spent some time trying to understand how the Beam project is built/managed. It looks like the poms are intended primarily for developers a

Re: spark runner for Scala 2.10

2018-02-22 Thread Jean-Baptiste Onofré
OK, do you want me to provide a Scala 2.10 build for you? Regards JB On 02/22/2018 06:44 PM, Gary Dusbabek wrote: > Jean-Baptiste, > > Thanks for responding. I agree--it would be better to use Scala 2.11. I'm in > the > process of creating a Beam POC with an existing platform and upgrading > e

Re: spark runner for Scala 2.10

2018-02-22 Thread Gary Dusbabek
Jean-Baptiste, Thanks for responding. I agree--it would be better to use Scala 2.11. I'm in the process of creating a Beam POC with an existing platform, and upgrading everything in that platform to Scala 2.11 as a prerequisite is out of scope. It would be helpful to know if Beam in its current s

Re: spark runner for Scala 2.10

2018-02-22 Thread Jean-Baptiste Onofré
Hi Gary, Beam 2.3.0 and the Spark runner use Scala 2.11. I can help you to have a smooth transition by creating a local branch using Scala 2.10. However, I strongly advise upgrading to 2.11, as some other parts of Beam (other runners and IOs) use 2.11 already. Regards JB On 02/22/2018 05:55 PM

Re: Spark runner maven shade plugin

2018-01-15 Thread Alexey Romanenko
Hi Ron, Could you show the whole pom.xml which you actually used? I can guess that it was an issue with the transformers configuration, because it seems ServicesResourceTransformer doesn’t support a “resource" parameter. WBR, Alexey > On 15 Jan 2018, at 07:51, Ron Gonzalez wrote: > > Hi, > I added

Re: Spark Runner Issues with YARN

2017-12-04 Thread Jean-Baptiste Onofré
Hi, I second Luke here: you have to use Spark 1.x or use the PR supporting Spark 2.x. Regards JB On 12/04/2017 08:14 PM, Lukasz Cwik wrote: It seems like you're trying to use Spark 2.1.0. Apache Beam currently relies on users using Spark 1.6.3. There is an open pull request[1] to migrate to Spa

Re: Spark Runner Issues with YARN

2017-12-04 Thread Lukasz Cwik
It seems like you're trying to use Spark 2.1.0. Apache Beam currently relies on users using Spark 1.6.3. There is an open pull request [1] to migrate to Spark 2.2.0. 1: https://github.com/apache/beam/pull/4208/ On Mon, Dec 4, 2017 at 10:58 AM, Opitz, Daniel A wrote: > We are trying to submit a Spa

Re: Spark and Beam

2017-09-29 Thread Jean-Baptiste Onofré
That's correct, but I would avoid doing this, generally speaking. IMHO, it's better to use the default Spark context created by the runner. Regards JB On 09/27/2017 01:02 PM, Romain Manni-Bucau wrote: You have the SparkPipelineOptions you can set (org.apache.beam.runners.spark.SparkPipelineOpti

Re: Spark and Beam

2017-09-27 Thread Aviem Zur
What type of Spark configuration are you trying to augment? As Romain mentioned, you can supply a custom Spark conf to your Beam pipeline, but depending on your use case this may not be necessary, and best avoided. Also, keep in mind that if you use spark-submit you can add your configurations whe

Re: Spark and Beam

2017-09-27 Thread Romain Manni-Bucau
You have the SparkPipelineOptions you can set (org.apache.beam.runners.spark.SparkPipelineOptions - https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java), most of these options are explained at https://beam.apache.org/documen
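If a fully custom SparkConf really is needed (keeping in mind JB's point above that the runner-created context is usually preferable), one option is SparkContextOptions with a provided context. Here is a rough sketch; the app name, master URL and configuration key are placeholders:

import org.apache.beam.runners.spark.SparkContextOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ProvidedContextExample {
  public static void main(String[] args) {
    // Build the Spark context yourself with whatever extra configuration you need.
    SparkConf conf = new SparkConf()
        .setAppName("beam-on-spark-example")       // placeholder app name
        .setMaster("local[2]")                     // placeholder master
        .set("spark.some.custom.key", "value");    // placeholder configuration entry
    JavaSparkContext jsc = new JavaSparkContext(conf);

    // Hand the pre-configured context to the Spark runner instead of letting it create one.
    SparkContextOptions options =
        PipelineOptionsFactory.fromArgs(args).as(SparkContextOptions.class);
    options.setRunner(SparkRunner.class);
    options.setUsesProvidedSparkContext(true);
    options.setProvidedSparkContext(jsc);

    Pipeline pipeline = Pipeline.create(options);
    // ... build the pipeline here ...
    pipeline.run().waitUntilFinish();
  }
}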

Re: Spark and Beam

2017-09-27 Thread tal m
My last email wasn't clear, please ignore. Thanks, it looks better (no errors or exceptions). My problem now is how to set the Spark conf for my pipeline; this is what I have: conf = SparkConf(); conf.setAppName("InsightEdge Python Example") conf.set("my.field1", "XXX") conf.set("my.field2", "YYY")

Re: Spark and Beam

2017-09-27 Thread tal m
Thanks, it looks better (no errors or exceptions). My problem now is how to set the Spark conf for my pipeline; this is what I have: conf = SparkConf(); conf.setAppName("InsightEdge Python Example") conf.set("spark.insightedge.space.name", "insightedge-space") conf.set("spark.insightedge.space.lookup

Re: Spark and Beam

2017-09-26 Thread Romain Manni-Bucau
Yes, you need a coder for your product if it is passed to the output for the next "apply". You can register it on the pipeline or through the Beam SPI. Here is a sample using Java serialization: pipeline.getCoderRegistry().registerCoderForClass(Product.class, SerializableCoder.of(Product.class)); Roma
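Put into a complete (if contrived) pipeline, with Product as a stand-in Serializable class rather than a type from Tal's code, it looks roughly like this:

import java.io.Serializable;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class CoderRegistrationExample {
  // Stand-in element type; any Serializable class works with SerializableCoder.
  public static class Product implements Serializable {
    final long id;
    Product(long id) { this.id = id; }
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Tell the runner how to (de)serialize Product elements flowing between transforms.
    pipeline.getCoderRegistry()
        .registerCoderForClass(Product.class, SerializableCoder.of(Product.class));
    pipeline
        .apply(GenerateSequence.from(0).to(10L))
        .apply(ParDo.of(new DoFn<Long, Product>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            c.output(new Product(c.element()));
          }
        }));
    pipeline.run().waitUntilFinish();
  }
}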

Re: Spark and Beam

2017-09-26 Thread tal m
Hi, I tried what you wrote. The arguments that I use are: --sparkMaster=local --runner=SparkRunner I already have Spark running. Now I'm getting the following error: .IllegalStateException: Unable to return a default Coder for ParDo(Anonymous)/ParMultiDo(Anonymous).out0 [PCollectio Thanks Tal

Re: Spark and Beam

2017-09-26 Thread Romain Manni-Bucau
Hi Tal, Did you try something like this: public static void main(final String[] args) { final Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create()); pipeline.apply(GenerateSequence.from(0).to(10L)) .apply(ParDo.of(new DoFn() {

Re: Spark and Beam

2017-09-26 Thread tal m
Hi, I looked at the links you sent me, and I haven't found any clue about how to adapt it to my current code. My code is very simple: val sc = spark.sparkContext val productsNum = 10 println(s"Saving $productsNum products RDD to the space") val rdd = sc.parallelize(1 to productsNum).map { i => Pro

Re: Spark and Beam

2017-09-26 Thread Aviem Zur
Hi Tal, Thanks for reaching out! Please take a look at our documentation: Quickstart guide (Java): https://beam.apache.org/get-started/quickstart-java/ This guide will show you how to run our wordcount example using any of the runners (for example, the direct runner or the Spark runner in your case

Re: Spark job hangs up at Evaluating ParMultiDo(ParseInput)

2017-08-20 Thread Aviem Zur
Glad to have helped! Thanks again for reaching out. Keep on Beaming and let us know about any issues you encounter. On Fri, Aug 18, 2017 at 2:12 PM Sathish Jayaraman wrote: > Hey Aviem, > > Thank you for noticing. You were right, I was able to run all my jobs in > Spark 1.6.3. You are awesome!!!

Re: Spark job hangs up at Evaluating ParMultiDo(ParseInput)

2017-08-18 Thread Sathish Jayaraman
Hey Aviem, Thank you for noticing. You were right, I was able to run all my jobs in Spark 1.6.3. You are awesome!!! Regards, Sathish. J On 16-Aug-2017, at 3:23 PM, Aviem Zur <aviem...@gmail.com> wrote: 2.0.0. I really doubt it's a Spark setup issue. I wrote a native Spark applicatio

Re: Spark job hangs up at Evaluating ParMultiDo(ParseInput)

2017-08-10 Thread Aviem Zur
Hi Jayaraman, Thanks for reaching out. We run Beam using the Spark runner daily on a YARN cluster. It appears from many of the logs you sent that connections to certain servers on certain ports hang; could this be a network issue or an issue with your Spark setup? Could you please shar

Re: Spark job hangs up at Evaluating ParMultiDo(ParseInput)

2017-08-03 Thread Sathish Jayaraman
Hi, Thanks for trying it out. I was running the job in a local single-node setup. I also spawned an HDInsight cluster on the Azure platform just to test the WordCount program. It's the same result there too, stuck at the Evaluating ParMultiDo step. It runs fine in mvn compile exec, but when bundled int

Re: Spark job hangs up at Evaluating ParMultiDo(ParseInput)

2017-08-03 Thread Jean-Baptiste Onofré
Hi, yes, but not with YARN. Let me bootstrap a quick cluster with YARN to test it. Regards JB On 08/03/2017 09:34 AM, Sathish Jayaraman wrote: Hi, Was anyone able to run Beam application on Spark at all?? I tried all possible options and still no luck. No executors getting assigned for the

Re: Spark job hangs up at Evaluating ParMultiDo(ParseInput)

2017-08-03 Thread Sathish Jayaraman
Hi, Was anyone able to run a Beam application on Spark at all? I tried all possible options and still no luck. No executors get assigned for the job submitted by the command below, even though explicitly specified: $ ~/spark/bin/spark-submit --class org.apache.beam.examples.WordCount --master ya

Re: Spark job hangs up at Evaluating ParMultiDo(ParseInput)

2017-08-01 Thread Jean-Baptiste Onofré
Hi Sathish, Do you see the tasks submitted on the history server? Regards JB On 08/01/2017 11:51 AM, Sathish Jayaraman wrote: Hi, I am trying to execute the Beam example in a local Spark setup. When I try to submit the sample WordCount jar via spark-submit, the job just hangs at 'INFO SparkRunne