Hi, I am trying to get the application id after I use SparkSubmit.main for a
yarn submission. I am able to make it asynchronous using
spark.yarn.submit.waitAppCompletion=false configuration option, but I can't seem to
figure out how I can get the application id for this job. I read both
SparkSubmit.s
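[Editor's note: for later readers, a hedged sketch of one way to get the application id without going through SparkSubmit.main directly. The launcher API (Spark 1.6+) submits asynchronously and hands back a SparkAppHandle whose getAppId() is populated once YARN accepts the app. The jar path and main class below are placeholders.]

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// Submit asynchronously via the launcher API and poll the handle for the app id.
val handle: SparkAppHandle = new SparkLauncher()
  .setAppResource("/path/to/my-app.jar")        // placeholder
  .setMainClass("com.example.MyApp")            // placeholder
  .setMaster("yarn")
  .setDeployMode("cluster")
  .startApplication()

// getAppId is null until YARN has accepted the application
while (handle.getAppId == null && !handle.getState.isFinal) Thread.sleep(500)
println(s"application id: ${handle.getAppId}")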
wrote:
Hi Ron,
You can try using the toDebugString method on the RDD, this will print the RDD
lineage.
Regards, Keith.
http://keith-chapman.com
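[Editor's note: a minimal illustration of the toDebugString suggestion above; the pipeline and path are made up and an existing SparkContext sc is assumed.]

// Build a small lineage and print it; toDebugString shows one indented line per parent RDD.
val words = sc.textFile("hdfs:///tmp/input.txt")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)

println(words.toDebugString)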
On Fri, Jul 21, 2017 at 11:24 AM, Ron Gonzalez
wrote:
Hi, Can someone point me to a test case or share sample code that is able to
extract the RDD graph from a Spark job anywhere during its lifecycle? I
understand that Spark has UI that can show the graph of the execution so I'm
hoping that is using some API somewhere that I could use. I know RDD
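[Editor's note: a hedged sketch of walking the graph programmatically via the public dependencies field on RDD, which is roughly the information the UI's DAG view is derived from.]

import org.apache.spark.rdd.RDD

// Recursively print each RDD and its parents, indenting by depth.
def printLineage(rdd: RDD[_], depth: Int = 0): Unit = {
  println(("  " * depth) + rdd.toString)
  rdd.dependencies.foreach(dep => printLineage(dep.rdd, depth + 1))
}

// usage: printLineage(someRdd)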
Hi,
After I create a table in Spark SQL and load an HDFS file into it with LOAD
DATA INPATH, the file no longer shows up when I do hadoop fs -ls.
Is this expected?
Thanks,
Ron
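[Editor's note on the behavior described above: LOAD DATA INPATH (without LOCAL) moves the file into the table's location rather than copying it, so it disappearing from its original HDFS path is expected. A minimal sketch, assuming a Hive-backed sqlContext; the table name and path are hypothetical.]

sqlContext.sql("CREATE TABLE events (id INT, name STRING)")
sqlContext.sql("LOAD DATA INPATH '/tmp/events.txt' INTO TABLE events")
// hadoop fs -ls /tmp/events.txt now reports "No such file or directory";
// the data lives under the table's location (e.g. the Hive warehouse directory).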
I'd use Random Forest. It will give you better generalizability. There
are also a number of things you can do with RF that allow you to train on
samples of the massive data set and then just average over the resulting
models...
Thanks,
Ron
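[Editor's note: a hedged MLlib sketch of the random-forest suggestion above. Spark 1.x mllib API; the input path and parameter values are illustrative only, and an existing SparkContext sc is assumed.]

import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.util.MLUtils

// Load a labeled dataset, hold out a test split, and train a forest.
val data = MLUtils.loadLibSVMFile(sc, "hdfs:///tmp/train.libsvm")   // hypothetical path
val Array(train, test) = data.randomSplit(Array(0.8, 0.2))

val model = RandomForest.trainClassifier(
  train,
  numClasses = 2,
  categoricalFeaturesInfo = Map[Int, Int](),
  numTrees = 50,
  featureSubsetStrategy = "auto",
  impurity = "gini",
  maxDepth = 8,
  maxBins = 32)

// Simple holdout accuracy.
val accuracy = test.filter(p => model.predict(p.features) == p.label).count().toDouble / test.count()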
On 07/21/2015 02:17 PM, Olivier Girardot wrote:
depends
Hi,
Question on using spark sql.
Can someone give an example for creating table from a directory
containing parquet files in HDFS instead of an actual parquet file?
Thanks,
Ron
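[Editor's note: a hedged sketch answering the parquet-directory question above. Assumes an existing sqlContext and Spark 1.3+; the HDFS path is hypothetical. Pointing the reader at the directory picks up all the part files inside it.]

// Read the directory (not a single file) and register it as a table.
val df = sqlContext.read.parquet("hdfs:///data/events/")
df.registerTempTable("events")
sqlContext.sql("SELECT count(*) FROM events").show()

// Or purely in SQL, as a table backed by that directory:
sqlContext.sql(
  "CREATE TEMPORARY TABLE events2 USING org.apache.spark.sql.parquet " +
  "OPTIONS (path 'hdfs:///data/events/')")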
On 07/21/2015 01:59 PM, Brandon White wrote:
A few questions about caching a table in Spark SQL.
1) Is there an
ing-the-thrift-jdbcodbc-server
>
>> On Mon, Jul 13, 2015 at 6:31 PM, Jerrick Hoang
>> wrote:
>> Well for adhoc queries you can use the CLI
>>
>>> On Mon, Jul 13, 2015 at 5:34 PM, Ron Gonzalez
>>> wrote:
>>> Hi,
>>> I have a q
Hi,
I have a question for Spark SQL. Is there a way to use
Spark SQL on YARN without having to submit a job?
Bottom line here is I want to be able to reduce the latency of
running queries as a job. I know that the spark sql default submission
is like a job, but was wondering if i
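[Editor's note: one hedged way to avoid paying job-submission latency per query, in line with the Thrift JDBC/ODBC server mentioned above, is to keep a single long-lived application up and expose it over JDBC. A sketch, assuming an existing SparkContext sc and a Spark build that includes the hive-thriftserver module.]

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Start the Thrift server inside an already-running (e.g. yarn-client) application,
// then point beeline or BI tools at it; queries reuse this one application.
val hiveContext = new HiveContext(sc)
HiveThriftServer2.startWithContext(hiveContext)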
If you're running on Ubuntu, run ulimit -n, which gives the max number of
allowed open files. You will have to raise the value in
/etc/security/limits.conf, then log out and log back in.
Thanks,
Ron
Sent from my iPad
> On Aug 10, 2014, at 10:19 PM, Davies Liu wrote:
>
>> On
Hi Vida,
It's possible to save an RDD as a hadoop file using hadoop output formats. It
might be worthwhile to investigate using DBOutputFormat and see if this will
work for you.
I haven't personally written to a db, but I'd imagine this would be one way
to do it.
Thanks,
Ron
Sent from my i
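[Editor's note: a rough, untested sketch of the DBOutputFormat idea above. Assumes an existing SparkContext sc; the JDBC driver, URL, table and columns are all hypothetical.]

import java.sql.{PreparedStatement, ResultSet}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapred.lib.db.{DBConfiguration, DBOutputFormat, DBWritable}

// The key class handed to DBOutputFormat must implement DBWritable.
class PersonRecord(var name: String, var age: Int) extends DBWritable {
  def this() = this("", 0)
  override def write(st: PreparedStatement): Unit = { st.setString(1, name); st.setInt(2, age) }
  override def readFields(rs: ResultSet): Unit = { name = rs.getString(1); age = rs.getInt(2) }
}

val jobConf = new JobConf(sc.hadoopConfiguration)
DBConfiguration.configureDB(jobConf, "com.mysql.jdbc.Driver",
  "jdbc:mysql://dbhost:3306/mydb", "user", "password")
// setOutput also sets DBOutputFormat as the job's output format.
DBOutputFormat.setOutput(jobConf, "people", "name", "age")
jobConf.setOutputKeyClass(classOf[PersonRecord])
jobConf.setOutputValueClass(classOf[NullWritable])

val people = sc.parallelize(Seq(("alice", 30), ("bob", 25)))
people
  .map { case (n, a) => (new PersonRecord(n, a), NullWritable.get()) }
  .saveAsHadoopDataset(jobConf)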
One key thing I forgot to mention is that I changed the avro version to 1.7.7
to get AVRO-1476.
I took a closer look at the jars, and what I noticed is that the assembly jars
that work do not have the org.apache.avro.mapreduce package packaged into the
assembly. For spark-1.0.1, org.apache.avro
Cool thanks!
On Monday, August 4, 2014 8:58 AM, kriskalish wrote:
Hey Ron,
It was pretty much exactly as Sean had depicted. I just needed to pass
count an anonymous function to tell it which elements to count. Since I
wanted to count them all, the function is simply "true".
va
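[Editor's note: a tiny illustration of the approach described above; the data is made up and an existing SparkContext sc is assumed.]

val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
// count takes a predicate; counting everything just means the predicate is always true
val perKeyCounts = pairs.groupByKey().mapValues(_.count(_ => true))   // ("a", 2), ("b", 1)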
Can you share the mapValues approach you did?
Thanks,
Ron
Sent from my iPhone
> On Aug 1, 2014, at 3:00 PM, kriskalish wrote:
>
> Thanks for the help everyone. I got the mapValues approach working. I will
> experiment with the reduceByKey approach later.
>
> <3
>
> -Kris
You have to import org.apache.spark.rdd._, which will automatically make
this method available.
Thanks,
Ron
Sent from my iPhone
> On Aug 1, 2014, at 3:26 PM, touchdown wrote:
>
> Hi, I am facing a similar dilemma. I am trying to aggregate a bunch of small
> avro files into one avro file. I re
Hi,
I took avro 1.7.7 and recompiled my distribution to be able to fix the issue
when dealing with avro GenericRecord. The issue I got was resolved. I'm
referring to AVRO-1476.
I also enabled kryo registration in SparkConf.
That said, I am still seeing a NotSerializableException for
Schema
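[Editor's note: a hedged workaround sketch for the NotSerializableException on Schema mentioned above. Avro 1.7.x's Schema class is not java-serializable, so one common pattern is to ship the schema as its JSON string and re-parse it on the executors rather than capturing the Schema object in a closure. The schema literal below is made up and an existing SparkContext sc is assumed.]

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}

val schemaJson = """{"type":"record","name":"Rec","fields":[{"name":"id","type":"long"}]}"""

val records = sc.parallelize(1L to 3L).mapPartitions { iter =>
  val schema = new Schema.Parser().parse(schemaJson)   // parsed per partition, never serialized
  iter.map { id =>
    val rec: GenericRecord = new GenericData.Record(schema)
    rec.put("id", id)
    rec
  }
}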
le is overwritten in
hdfs after it's been registered as a local resource. Node manager logs are your
friend!
Just sharing in case other folks run into the same problem.
Thanks,
Ron
Sent from my iPhone
> On Jul 25, 2014, at 9:36 AM, Ron Gonzalez wrote:
>
> Folks,
> I've
Folks,
I've been able to submit simple jobs to yarn thus far. However, when I did
something more complicated that added 194 dependency jars using --addJars, the
job failed in YARN with no logs. What ends up happening is that no container
logs get created (app master or executor). If I add just
s just by chance that this ends up changing your
> average to be rounded.
>
> Can you try with cloning the records in the map call? Also look at the
> contents and see if they're actually changed, or if the resulting RDD after a
> cache is just the last record "smeared"
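[Editor's note: a hedged sketch of the "clone the records in the map call" advice. Hadoop/avro input formats reuse the same record object, so deep-copy each record before cache(), otherwise the cached partition can end up full of references to the last record read. rawRecords is a hypothetical RDD[GenericRecord] coming from an avro input format.]

import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.spark.rdd.RDD

def cloneBeforeCache(rawRecords: RDD[GenericRecord]): RDD[GenericRecord] = {
  // deepCopy makes each element an independent object before it is cached
  val copied = rawRecords.map(rec => GenericData.get().deepCopy(rec.getSchema, rec))
  copied.cache()
  copied
}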
Hi,
I'm doing the following:
def main(args: Array[String]) = {
  val sparkConf = new SparkConf().setAppName("AvroTest").setMaster("local[2]")
  val sc = new SparkContext(sparkConf)
  val conf = new Configuration()
  val job = new Job(conf)
  val path = new Path("/tmp/a.avro")
  va
Hi,
I was doing programmatic submission of Spark yarn jobs and I saw code in
ClientBase.getDefaultYarnApplicationClasspath():
val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
MRJobConfig doesn't have this field so the created launch env is incomplete.
Workaround i
I am able to use Client.scala or LauncherExecutor.scala as my programmatic
entry point for Yarn.
Thanks,
Ron
Sent from my iPad
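[Editor's note: a hedged sketch of the Client.scala entry point mentioned above. It was an internal API in the 0.9/1.0 era, so the flags below are only illustrative and from memory of that era, and the jar/class names are placeholders.]

import org.apache.spark.deploy.yarn.Client

// Invoke the YARN client main directly with launch-style arguments.
Client.main(Array(
  "--jar", "/path/to/my-app.jar",
  "--class", "com.example.MyApp",
  "--num-workers", "2",
  "--queue", "default"))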
> On Jul 9, 2014, at 7:14 AM, Jerry Lam wrote:
>
> +1 as well for being able to submit jobs programmatically without using shell
> script.
>
> we also experience is
Koert,
Yeah I had the same problems trying to do programmatic submission of spark jobs
to my Yarn cluster. I was ultimately able to resolve it by reviewing the
classpath and debugging through all the different things that the Spark Yarn
client (Client.scala) did for submitting to Yarn (like env
The idea behind YARN is that you can run different application types like
MapReduce, Storm and Spark.
I would recommend that you build your spark jobs in the main method without
specifying how you deploy it. Then you can use spark-submit to tell Spark how
you would want to deploy to it using ya
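[Editor's note: a hedged illustration of the advice above: leave the master out of the code and let spark-submit decide the deployment, for example with --master yarn-cluster.]

import org.apache.spark.{SparkConf, SparkContext}

object MyJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyJob")   // note: no setMaster here
    val sc = new SparkContext(conf)
    try {
      println(sc.parallelize(1 to 100).count())
    } finally {
      sc.stop()
    }
  }
}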
Btw, I'm on 0.9.1. Will setting a queue programmatically be available in 1.0?
Thanks,
Ron
Sent from my iPad
> On May 20, 2014, at 6:27 PM, Ron Gonzalez wrote:
>
> Hi Sandy,
> Is there a programmatic way? We're building a platform as a service and
> need to assi
What version are you using? For 0.9, you need to set it outside your code
> with the SPARK_YARN_QUEUE environment variable.
>
> -Sandy
>
>
>> On Mon, May 19, 2014 at 9:29 PM, Ron Gonzalez wrote:
>> Hi,
>> How does one submit a spark job to yarn and specify a q
Hi,
How does one submit a spark job to yarn and specify a queue?
The code that successfully submits to yarn is:
val conf = new SparkConf()
val sc = new SparkContext("yarn-client", "Simple App", conf)
Where do I need to specify the queue?
Thanks in advance for any help on this...
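[Editor's note for later readers: on 0.9 the queue comes from the SPARK_YARN_QUEUE environment variable, as answered above; on 1.x releases it can also be set as a conf property. A hedged sketch, queue name hypothetical.]

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("Simple App")
  .set("spark.yarn.queue", "myqueue")
val sc = new SparkContext("yarn-client", "Simple App", conf)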
> doing an avro one for this you probably want one of :
>> https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/*ProtoBuf*
>>
>> or just whatever your using at the moment to open them in a MR job probably
>> co
Hi,
Can you explain a little more what's going on? Which one submits a job to the
yarn cluster that creates an application master and spawns containers for the
local jobs? I tried yarn-client and submitted to our yarn cluster and it seems
to work that way. Shouldn't Client.scala be running wi
to make sure it propagates everywhere. There are also places it calls
SparkHadoopUtil.get.newConfiguration() so not sure those would handle it
properly.
You can always file a jira to add support for it and see what people think.
Tom
On Thursday, April 3, 2014 8:46 AM, Ron Gonzalez wrote:
Rig
Hi,
I know that sources need to either be java serializable or use kryo
serialization.
Does anyone have sample code that reads, transforms and writes avro files in
spark?
Thanks,
Ron
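[Editor's note: a hedged read, transform, write sketch using the avro mapreduce input/output formats. Assumes avro 1.7.x on the classpath, as discussed earlier in the thread, and an existing SparkContext sc; paths are hypothetical and the schema literal is a stand-in for the real record schema, which the written records must match.]

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyInputFormat, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job

val schemaJson = """{"type":"record","name":"Rec","fields":[{"name":"id","type":"long"}]}"""
val outSchema = new Schema.Parser().parse(schemaJson)

// Read avro records; the file's own writer schema is used for reading.
val readConf = Job.getInstance(sc.hadoopConfiguration).getConfiguration
val input = sc.newAPIHadoopFile(
  "/tmp/in.avro",
  classOf[AvroKeyInputFormat[GenericRecord]],
  classOf[AvroKey[GenericRecord]],
  classOf[NullWritable],
  readConf)

// Trivial transform: drop records without an id, then re-wrap for writing.
val transformed = input
  .map { case (k, _) => k.datum() }
  .filter(_.get("id") != null)
  .map(rec => (new AvroKey[GenericRecord](rec), NullWritable.get()))

// Write with the output schema attached to the job configuration.
val writeJob = Job.getInstance(sc.hadoopConfiguration)
AvroJob.setOutputKeySchema(writeJob, outSchema)
transformed.saveAsNewAPIHadoopFile(
  "/tmp/out",
  classOf[AvroKey[GenericRecord]],
  classOf[NullWritable],
  classOf[AvroKeyOutputFormat[GenericRecord]],
  writeJob.getConfiguration)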
_DIR is getting put into your classpath. I would also make sure
HADOOP_PREFIX is being set.
Tom
On Wednesday, April 2, 2014 10:10 PM, Ron Gonzalez wrote:
Hi,
I have a small program but I cannot seem to make it connect to the right
properties of the cluster.
I have the SPARK_YARN_APP_JAR
Hi,
I have a small program but I cannot seem to make it connect to the right
properties of the cluster.
I have the SPARK_YARN_APP_JAR, SPARK_JAR and SPARK_HOME set properly.
If I run this scala file, I am seeing that this is never using the
yarn.resourcemanager.address property that I set o