Hi Marco,
I was not able to find out what was causing the problem, but a "git stash"
seems to have fixed it :/
Thanks for your help... :)
On Mon, Nov 28, 2016 at 10:50 PM, Marco Mistroni
wrote:
Hi Andrew,
sorry, but to me it seems S3 is the culprit.
I have downloaded your JSON file and stored it locally. Then I wrote this
simple app (a subset of what you have in your GitHub; sorry, I am a little
bit rusty on how to create a new column out of existing ones) which
basically reads the JSON file.
It's in Sc
I extracted out the boto bits and tested them in vanilla Python on the
nodes. I am pretty sure that the data from S3 is OK. I've applied a public
policy to the bucket s3://time-waits-for-no-man. There is a publicly available object
here: https://s3-eu-west-1.amazonaws.com/time-waits-for-no-man/1973-01-1
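Independent of Spark, a quick way to double-check that an object really is publicly readable is to fetch its public URL anonymously. A minimal sketch, assuming the path-style URL form used above; the region, bucket, and key below are placeholders, not the real object (the key in this thread looks truncated):

```python
def public_s3_url(region, bucket, key):
    """Build a path-style public S3 URL, as used in the thread."""
    return "https://s3-{}.amazonaws.com/{}/{}".format(region, bucket, key)

def fetch_public(url):
    """Fetch the object anonymously; an HTTP 403 here means it is not public."""
    try:
        from urllib.request import urlopen   # Python 3
    except ImportError:
        from urllib2 import urlopen          # Python 2.7, as in the thread
    return urlopen(url).read()

# Example (placeholders, not fetched here):
# body = fetch_public(public_s3_url("eu-west-1", "my-bucket", "my-key"))
```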
Hi
Pickle errors normally point to a serialisation issue. I am suspecting
something wrong with your S3 data, but it's just a wild guess...
Is your S3 object publicly available?
A few suggestions to nail down the problem:
1 - try to see if you can read your object from S3 using the boto3 library
'offline',
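Suggestion 1 could be sketched as below. The bucket name is taken from the thread, but the object key is a placeholder, and the newline-delimited-JSON check is an assumption about the file's layout:

```python
import json

def check_json_lines(text):
    """Parse each non-empty line as JSON; return (ok_count, bad_line_numbers).

    Assumes the file is newline-delimited JSON, which may not match the
    actual layout of the file discussed in the thread."""
    ok, bad = 0, []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue
        try:
            json.loads(line)
            ok += 1
        except ValueError:
            bad.append(lineno)
    return ok, bad

def read_offline(bucket, key):
    """Read the object with boto3 'offline', i.e. outside of Spark."""
    import boto3  # requires credentials, or a public object
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return check_json_lines(body.decode("utf-8"))

# e.g. read_offline("time-waits-for-no-man", "some-object-key")  # key is a placeholder
```

If every line parses cleanly here, the data itself is probably fine and the problem is on the Spark side.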
I get a slightly different error when not specifying a schema:
Traceback (most recent call last):
File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py",
line 61, in
df = sqlContext.createDataFrame(foo)
File
"/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/co
Hi,
Can anyone tell me what is causing this error?
Spark 2.0.0
Python 2.7.5
df = sqlContext.createDataFrame(foo, schema)
https://gist.github.com/mooperd/368e3453c29694c8b2c038d6b7b4413a
Traceback (most recent call last):
File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py",
li
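The gist itself is not quoted in the thread, so the sketch below is only an assumed shape of a "parallel read from S3" script: the field names, bucket layout, object keys, and the to_row helper are all hypothetical. The point it illustrates is the likely failure mode: when createDataFrame gets no schema, Spark samples the rows to infer one, so inconsistently shaped rows break inference and pickling; emitting fixed-shape tuples and passing an explicit StructType avoids that.

```python
import json

def to_row(record):
    """Coerce one parsed JSON object to a fixed-width tuple of strings.

    Field names are hypothetical -- the real schema in the gist is unknown.
    A consistent tuple shape is what lets createDataFrame with an explicit
    schema avoid sampling-based inference errors."""
    ts, price = record.get("timestamp"), record.get("price")
    return (None if ts is None else str(ts),
            None if price is None else str(price))

def build_dataframe(spark, keys, bucket):
    """Assumed pattern behind the script name: fetch each S3 key on the
    executors with boto3, then build a DataFrame with an explicit schema."""
    from pyspark.sql.types import StructType, StructField, StringType

    def fetch(key):
        import boto3  # imported on the executor, not pickled from the driver
        body = boto3.client("s3").get_object(Bucket=bucket,
                                             Key=key)["Body"].read()
        return [to_row(json.loads(line))
                for line in body.decode("utf-8").splitlines() if line.strip()]

    schema = StructType([
        StructField("timestamp", StringType(), True),  # hypothetical fields
        StructField("price", StringType(), True),
    ])
    rows = spark.sparkContext.parallelize(keys).flatMap(fetch)
    return spark.createDataFrame(rows, schema)

# e.g. build_dataframe(spark, ["some-key"], "time-waits-for-no-man")
```

Importing boto3 inside the function that runs on the executors also sidesteps pickling the client itself, which is one common source of the pickle errors mentioned earlier in the thread.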