Thanks! It's good to know
--- Original Message ---
From: "Eskilson,Aleksander"
Sent: June 25, 2015 5:57 AM
To: "Felix C" , user@spark.apache.org
Subject: Re: SparkR parallelize not found with 1.4.1?
Hi there,
parallelize is part of the RDD API, which was made private in SparkR 1.4
Hi,
It must be something very straightforward...
Not working:
parallelize(sc)
Error: could not find function "parallelize"
Working:
df <- createDataFrame(sqlContext, localDF)
What did I miss?
Thanks
Your Python job runs in a Python process that interacts with the JVM. You do need
a matching Python version and the other dependent packages on the driver and all
worker nodes if you run in YARN mode.
--- Original Message ---
From: "Bin Wang"
Sent: May 8, 2015 9:56 PM
To: "Apache.Spark.User"
Subject: Usin
Or you could build an uber jar (you could google that)
https://eradiating.wordpress.com/2015/02/15/getting-spark-streaming-on-kafka-to-work/
--- Original Message ---
From: "Akhil Das"
Sent: April 4, 2015 11:52 PM
To: "Priya Ch"
Cc: user@spark.apache.org, "dev"
Subject: Re: Spark streaming w
The spark-csv package can handle a header row, and the code is at the link below.
It can also use the header to infer the field names in the schema.
https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/CsvRelation.scala
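Something like this (an untested sketch; it assumes Spark 1.4's DataFrameReader, spark-csv on the classpath, and a made-up input path):

import org.apache.spark.sql.SQLContext

// assumes an existing SparkContext named sc; the input path is hypothetical
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")   // the first row supplies the field names in the schema
  .load("hdfs:///data/people.csv")
df.printSchema()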
--- Original Message ---
From: "Dean Wam
We have gotten it to work...
--- Original Message ---
From: "nitinkak001"
Sent: March 3, 2015 7:46 AM
To: user@spark.apache.org
Subject: Re: Running Spark jobs via oozie
I am also starting to work on this one. Did you get any solution to this
issue?
--
View this message in context:
http://a
It should work in CDH without having to recompile.
http://eradiating.wordpress.com/2015/02/22/getting-hivecontext-to-work-in-cdh/
--- Original Message ---
From: "Ted Yu"
Sent: March 2, 2015 1:35 PM
To: "nitinkak001"
Cc: "user"
Subject: Re: Executing hive query from Spark code
Here is snippet
We use Oozie as well, and it has worked well.
The catch is that each Oozie action is separate, so you cannot retain the
SparkContext or an RDD, or leverage caching or temp tables, across actions. You
could either save output to a file or put all Spark processing into one Oozie
action.
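A rough sketch of the save-to-file option (paths and logic are made up, and each block would live in its own Oozie action):

// Oozie action 1: persist the result to HDFS before the action ends
val processed = sc.textFile("hdfs:///input/events").map(_.toUpperCase)
processed.saveAsTextFile("hdfs:///staging/events_processed")

// Oozie action 2: a new SparkContext reads it back; the RDD, cache and
// temp tables from action 1 are gone at this point
val fromPreviousAction = sc.textFile("hdfs:///staging/events_processed")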
---
Kafka 0.8.2 has built-in offset management; how would that affect the direct
stream in Spark?
Please see KAFKA-1012
--- Original Message ---
From: "Tathagata Das"
Sent: February 23, 2015 9:53 PM
To: "V Dineshkumar"
Cc: "user"
Subject: Re: Write ahead Logs and checkpoint
Exactly, that is the reas
Your earlier call stack clearly states that it fails because the Derby
metastore has already been started by another instance, so I think that is
explained by your attempt to run this concurrently.
Are you running Spark standalone? Do you have a cluster? You should be able to
run Spark in yarn-
You would probably write to HDFS, or check out
https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
You might be able to retrofit it to your use case.
--- Original Message ---
From: "Su She"
Sent: February 11, 2015 10:55 PM
To: "Felix C&qu
: "Su She"
Sent: February 11, 2015 10:23 AM
To: "Felix C"
Cc: "Kelvin Chu" <2dot7kel...@gmail.com>, user@spark.apache.org
Subject: Re: Can spark job server be used to visualize streaming data?
Thank you Felix and Kelvin. I think I'll def be using the k-means
As far as I can tell from my tests, language-integrated query in Spark isn't type safe, e.g.
query.where('cost == "foo")
would compile and return nothing.
If you want type safety, perhaps you want to map the SchemaRDD to an RDD of
Product (your type, not scala.Product)
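A rough sketch of what that mapping could look like (the case class and field positions are made up; the Spark 1.2-era SchemaRDD/Row API is assumed):

import org.apache.spark.sql.Row

// hypothetical record type; a case class is a Product
case class LineItem(name: String, cost: Double)

// assuming query is the SchemaRDD from the example above
val typed = query.map(row => LineItem(row.getString(0), row.getDouble(1)))

// the compiler now checks field names and types: a misspelled field is a
// compile error, and typed.filter(_.cost > "foo") does not compile
val expensive = typed.filter(_.cost > 100.0)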
--- Original Message ---
From: "jay
Check out
https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
It has links showing how that is done.
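For reference, a minimal sketch of that pattern with MLlib's StreamingKMeans (Spark 1.2+); the directory and parameters are made up:

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.{Seconds, StreamingContext}

// assumes an existing SparkContext named sc; the HDFS path is hypothetical
val ssc = new StreamingContext(sc, Seconds(10))
val trainingData = ssc.textFileStream("hdfs:///streaming/train").map(Vectors.parse)

val model = new StreamingKMeans()
  .setK(3)                    // number of clusters
  .setDecayFactor(1.0)        // how quickly older data is forgotten
  .setRandomCenters(2, 0.0)   // 2-dimensional points, zero initial weight

model.trainOn(trainingData)   // centers update as new files land in the directory
ssc.start()
ssc.awaitTermination()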
--- Original Message ---
From: "Kelvin Chu" <2dot7kel...@gmail.com>
Sent: February 10, 2015 12:48 PM
To: "Su She"
Cc: user@spark.apache.org
Subject: Re: Can s
Alternatively, is there another way to do it?
groupByKey has been called out as expensive and should be avoided (it causes
shuffling of data).
I've generally found it possible to use reduceByKey instead.
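For example (a made-up pair RDD): a per-key sum with groupByKey moves every value across the network, while reduceByKey combines values on each partition before the shuffle.

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// expensive: ships every value for a key to one place before summing
val sumsViaGroup = pairs.groupByKey().mapValues(_.sum)

// cheaper: partial sums are computed map-side, so much less data is shuffled
val sumsViaReduce = pairs.reduceByKey(_ + _)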
--- Original Message ---
From: "Arun Luthra"
Sent: February 10, 2015 1:16 PM
To: user@spark.a
Agree. PySpark would call spark-submit. Check out the command line there.
--- Original Message ---
From: "Mohit Singh"
Sent: February 9, 2015 11:26 PM
To: "Ashish Kumar"
Cc: user@spark.apache.org
Subject: Re: ImportError: No module named pyspark, when running pi.py
I think you have to run that
Is YARN_CONF_DIR set?
--- Original Message ---
From: "Aniket Bhatnagar"
Sent: February 4, 2015 6:16 AM
To: "kundan kumar" , "spark users"
Subject: Re: Spark Job running on localhost on yarn cluster
Have you set master in SparkConf/SparkContext in your code? Driver logs
show in which mode the
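To illustrate that point (a hedged sketch, not from the thread): a master hard-coded on the SparkConf takes precedence over the --master flag passed to spark-submit, which is a common way to end up running locally on a YARN cluster.

import org.apache.spark.{SparkConf, SparkContext}

// this overrides --master yarn-cluster from spark-submit and keeps the job on localhost
val wrong = new SparkConf().setAppName("myApp").setMaster("local[*]")

// leave the master to spark-submit instead
val conf = new SparkConf().setAppName("myApp")
val sc = new SparkContext(conf)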
Try rdd.coalesce(1).saveAsParquetFile(...)
http://spark.apache.org/docs/1.2.0/programming-guide.html#transformations
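A quick sketch of that suggestion (the variable name and output paths are made up); coalesce(1) collapses the partitions so a single parquet part file is written instead of one per partition:

// schemaRDD is assumed to be the SchemaRDD from the original question
schemaRDD.coalesce(1).saveAsParquetFile("hdfs:///output/result.parquet")

// if one partition gives too little write parallelism, pick a small number instead
schemaRDD.coalesce(8).saveAsParquetFile("hdfs:///output/result_8files.parquet")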
--- Original Message ---
From: "Manoj Samel"
Sent: January 29, 2015 9:28 AM
To: user@spark.apache.org
Subject: schemaRDD.saveAsParquetFile creates large number of small parquet
Python couldn't find your module. Do you have it on each worker node? You will
need to have it on every one of them.
--- Original Message ---
From: "Davies Liu"
Sent: January 22, 2015 9:12 PM
To: "Mohit Singh"
Cc: user@spark.apache.org
Subject: Re: Using third party libraries in pyspark
You need to
+1. I can confirm this. It says collect fails in Py4J
--- Original Message ---
From: "Dave"
Sent: January 20, 2015 6:49 AM
To: user@spark.apache.org
Subject: Re: Error for first run from iPython Notebook
Not sure if anyone who can help has seen this. Any suggestions would be
appreciated, thanks