Re: createDataFrame causing a strange error.

2016-11-29 Thread Andrew Holway
Hi Marco, I was not able to find out what was causing the problem but a "git stash" seems to have fixed it :/ Thanks for your help... :) On Mon, Nov 28, 2016 at 10:50 PM, Marco Mistroni wrote: > Hi Andrew, > sorry but to me it seems s3 is the culprit > I have downloaded your json file and

Re: createDataFrame causing a strange error.

2016-11-28 Thread Marco Mistroni
Hi Andrew, sorry, but to me it seems S3 is the culprit. I downloaded your JSON file and stored it locally, then wrote this simple app (a subset of what you have in your GitHub; sorry, I'm a little bit rusty on how to create new columns out of existing ones), which basically reads the JSON file. It's in Sc

Re: createDataFrame causing a strange error.

2016-11-28 Thread Andrew Holway
I extracted out the boto bits and tested in vanilla python on the nodes. I am pretty sure that the data from S3 is ok. I've applied a public policy to the bucket s3://time-waits-for-no-man. There is a publicly available object here: https://s3-eu-west-1.amazonaws.com/time-waits-for-no-man/1973-01-1

Re: createDataFrame causing a strange error.

2016-11-27 Thread Marco Mistroni
Hi, pickle errors normally point to a serialisation issue. I suspect something is wrong with your S3 data, but that is just a wild guess... Is your S3 object publicly available? A few suggestions to nail down the problem: 1 - try to see if you can read your object from S3 using the boto3 library 'offline',
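The 'offline' check suggested above can be sketched outside Spark. This is a minimal illustration, not the code from the thread: it swaps boto3 for the Python 3 standard library (`urllib`), which only works if the object is publicly readable over HTTPS. The helper names `fetch_public_object` and `parse_json_records`, and the guess that the file holds either one JSON document or one JSON object per line, are assumptions.

```python
import json
from urllib.request import urlopen


def parse_json_records(text):
    """Parse text holding one JSON document or one JSON object per line."""
    try:
        parsed = json.loads(text)
        # A single document: wrap non-list values so callers always get a list.
        return parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one object per non-empty line.
        # A malformed line raises here, which is the signal we are looking for.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def fetch_public_object(url, timeout=30):
    """Download a publicly readable object over HTTPS and decode it as UTF-8."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `parse_json_records(fetch_public_object(object_url))` with the full object URL on one of the nodes would show whether the data parses cleanly before Spark and pickling are involved at all.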

Re: createDataFrame causing a strange error.

2016-11-27 Thread Andrew Holway
I get a slightly different error when not specifying a schema: Traceback (most recent call last): File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", line 61, in <module> df = sqlContext.createDataFrame(foo) File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/co

createDataFrame causing a strange error.

2016-11-27 Thread Andrew Holway
Hi, can anyone tell me what is causing this error? Spark 2.0.0, Python 2.7.5. df = sqlContext.createDataFrame(foo, schema) https://gist.github.com/mooperd/368e3453c29694c8b2c038d6b7b4413a Traceback (most recent call last): File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", li

Re: Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Saeed Shahrivari
Yes there is. But the RDD is more than 10 TB and compression does not help. On Wed, Jul 15, 2015 at 8:36 PM, Ted Yu wrote: > bq. serializeUncompressed() > > Is there a method which enables compression ? > > Just wondering if that would reduce the memory footprint. > > Cheers > > On Wed, Jul 15,

Re: Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Ted Yu
bq. serializeUncompressed() Is there a method which enables compression ? Just wondering if that would reduce the memory footprint. Cheers On Wed, Jul 15, 2015 at 8:06 AM, Saeed Shahrivari < saeed.shahriv...@gmail.com> wrote: > I use a simple map/reduce step in a Java/Spark program to remove >

Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Saeed Shahrivari
I use a simple map/reduce step in a Java/Spark program to remove duplicated documents from a large (10 TB compressed) sequence file containing some HTML pages. Here is the partial code: JavaPairRDD<BytesWritable, NullWritable> inputRecords = sc.sequenceFile(args[0], BytesWritable.class, NullWritable.class).coalesce(numMap
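The shape of such a map/reduce deduplication step can be sketched in plain Python. The code in the thread is Java and is cut off above, so this is only an illustration of the general technique, not the author's implementation. Keying each document by a SHA-256 digest of its bytes and the function name `dedupe_documents` are both assumptions.

```python
import hashlib


def dedupe_documents(documents):
    """Return the documents with exact byte-level duplicates removed."""
    # Map step: pair every document with a key derived from its content.
    keyed = ((hashlib.sha256(doc).digest(), doc) for doc in documents)

    # Reduce step: for each key, keep only the first document seen.
    unique = {}
    for key, doc in keyed:
        unique.setdefault(key, doc)

    # Dicts preserve insertion order, so first occurrences stay in input order.
    return list(unique.values())
```

In Spark's Java API the same two steps would roughly correspond to `mapToPair` followed by `reduceByKey`, with the reduce function returning one of its two arguments.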

Question - writing data to Cassandra to Spark gives a strange error message

2015-06-24 Thread Koen Vantomme
Hello, I am trying to write data from Spark to Cassandra. Reading data from Cassandra is OK, but writing seems to give a strange error. Exception in thread "main" scala.ScalaReflectionException: <none> is not a term at scala.reflect.api.Symbols$SymbolApi$class.asTerm(Symbols.scala:259) The

Re: Shuffle strange error

2015-06-05 Thread octavian.ganea
context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-strange-error-tp23179p23180.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org

Shuffle strange error

2015-06-05 Thread octavian.ganea
read.run(Thread.java:745) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-strange-error-tp23179.html

strange error

2014-04-25 Thread Joe L
0.034754522 s [info] Lines with a: 62, Lines with b: 35 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/strange-error-tp4830.html