Hi,
Affects Version/s: 1.6.0
Component/s: PySpark
I faced the exception below when I tried to run the samples at:
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=filter#pyspark.sql.SQLContext.jsonRDD
Exception: Python in worker has different version 2.7 than that in driver
3.5, PySpark cannot run with different minor versions
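This error means the executors picked up a different Python interpreter than the driver. Below is a hedged sketch of one common fix, not necessarily the reporter's exact setup: point the workers at the same interpreter as the driver before the SparkContext is created. The "python3.5" binary name, the app name, and the sample JSON are assumptions for illustration.

    # Assumed fix: make workers use the same interpreter as the driver.
    # "python3.5" must exist on every node; adjust to your environment.
    # (PYSPARK_DRIVER_PYTHON can likewise be exported in the shell before
    # launching spark-submit or pyspark.)
    import os
    os.environ["PYSPARK_PYTHON"] = "python3.5"

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="jsonRDD-sample")      # app name is illustrative
    sqlContext = SQLContext(sc)

    # The jsonRDD sample from the linked docs should now run without the
    # version-mismatch error.
    df = sqlContext.jsonRDD(sc.parallelize(['{"name": "spark"}']))
    df.show()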
Have you built the Spark jars? Can you run the Spark Scala shell?
--Hossein
On Tuesday, October 6, 2015, Khandeshi, Ami wrote:
> > Sys.setenv(SPARKR_SUBMIT_ARGS="--verbose sparkr-shell")
> > Sys.setenv(SPARK_PRINT_LAUNCH_COMMAND=1)
> >
> > sc <- sparkR.init()
Why not let SparkSQL deal with parallelism? When using SparkSQL data
sources, you can control parallelism by specifying mapred.min.split.size
and mapred.max.split.size in your Hadoop configuration. You can then
repartition your data as you wish and save it as Parquet.
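A minimal PySpark sketch of this approach, assuming the spark.hadoop.* prefix to copy settings into the Hadoop configuration; the split sizes (in bytes), partition count, and HDFS paths are illustrative assumptions:

    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    # Assumed split sizes: min 64 MB, max 128 MB per input split.
    conf = (SparkConf()
            .setAppName("split-size-example")
            .set("spark.hadoop.mapred.min.split.size", str(64 * 1024 * 1024))
            .set("spark.hadoop.mapred.max.split.size", str(128 * 1024 * 1024)))
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)

    # Let the data source derive partitions from the split sizes, then
    # repartition as desired and save as Parquet. Paths are hypothetical.
    df = sqlContext.read.json("hdfs:///data/input.json")
    df.repartition(32).write.parquet("hdfs:///data/output.parquet")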
--Hossein
On Thu, May 28
You can use SparkContext.wholeTextFiles().
Please note that the documentation suggests: "Small files are preferred,
large file is also allowable, but may cause bad performance."
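For reference, a minimal sketch of using wholeTextFiles(), which yields one (path, content) pair per file; the input directory and the line-count transformation are illustrative assumptions:

    from pyspark import SparkContext

    sc = SparkContext(appName="whole-text-files-example")

    # Hypothetical directory of many small text files.
    pairs = sc.wholeTextFiles("hdfs:///data/small-files/")

    # Example use: count lines per file, keeping the file path as the key.
    line_counts = pairs.mapValues(lambda content: len(content.splitlines()))
    print(line_counts.take(5))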
--Hossein
On Tue, Jul 29, 2014 at 9:21 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote: