Hi Marco,
I was not able to find out what was causing the problem but a "git stash"
seems to have fixed it :/
Thanks for your help... :)
On Mon, Nov 28, 2016 at 10:50 PM, Marco Mistroni wrote:
> Hi Andrew,
> sorry but to me it seems s3 is the culprit
> I have downloaded your json file and
Hi Andrew,
sorry, but to me it seems S3 is the culprit.
I have downloaded your json file and stored it locally. Then I wrote this simple
app (a subset of what you have in your GitHub; sorry, I'm a little bit rusty on
how to create a new column out of existing ones) which basically reads the
json file.
It's in Sc
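For reference, a minimal sketch of that kind of local check (the file path and app name are placeholders, not Marco's actual code):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-json-check").getOrCreate()

# Read the json file after downloading it from S3 to the local filesystem.
df = spark.read.json("file:///tmp/downloaded.json")
df.printSchema()
df.show(5)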
I extracted out the boto bits and tested in vanilla python on the nodes. I
am pretty sure that the data from S3 is ok. I've applied a public policy to
the bucket s3://time-waits-for-no-man. There is a publicly available object
here: https://s3-eu-west-1.amazonaws.com/time-waits-for-no-man/1973-01-1
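A quick way to confirm the object really is publicly readable (a sketch; the full key is truncated above, so the URL below is a stand-in, and the requests library is assumed to be installed):

import requests

# Stand-in URL; the real object key is truncated in the message above.
url = "https://s3-eu-west-1.amazonaws.com/time-waits-for-no-man/<object-key>"
resp = requests.get(url)
print(resp.status_code)   # 200 means the object is publicly readable
print(resp.text[:200])    # peek at the first few characters of the json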
Hi,
pickle errors normally point to a serialisation issue. I am suspecting
something wrong with your S3 data, but that is just a wild guess...
Is your S3 object publicly available?
A few suggestions to nail down the problem:
1 - try to see if you can read your object from S3 using the boto3 library
'offline' (a sketch of this check follows below),
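A rough sketch of that boto3 'offline' check; the bucket name is taken from the thread, the object key is a stand-in, and the file is assumed to be newline-delimited json as Spark's json reader expects:

import json
import boto3

s3 = boto3.client("s3")
# Bucket from the thread; the key below is a placeholder.
obj = s3.get_object(Bucket="time-waits-for-no-man", Key="<object-key>")
body = obj["Body"].read().decode("utf-8")

# Make sure plain Python can parse it before handing it to Spark.
records = [json.loads(line) for line in body.splitlines() if line.strip()]
print(len(records), "records parsed")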
I get a slightly different error when not specifying a schema:
Traceback (most recent call last):
File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py",
line 61, in
df = sqlContext.createDataFrame(foo)
File
"/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/co
Hi,
Can anyone tell me what is causing this error?
Spark 2.0.0
Python 2.7.5
df = sqlContext.createDataFrame(foo, schema)
https://gist.github.com/mooperd/368e3453c29694c8b2c038d6b7b4413a
Traceback (most recent call last):
File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py",
li
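The gist itself is not reproduced here, but the failing call presumably looks something like the sketch below; the field names and the way foo is built are illustrative assumptions, and sc / sqlContext are assumed to exist as in the script above:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Illustrative schema; the real field names live in the gist.
schema = StructType([
    StructField("timestamp", StringType(), True),
    StructField("price", DoubleType(), True),
])

# createDataFrame will raise a pickle error if the rows (or the closure that
# builds them) contain unpicklable objects such as boto3 clients.
foo = sc.parallelize([("1973-01-01T00:00:00", 1.23)])
df = sqlContext.createDataFrame(foo, schema)
df.show()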
Yes there is.
But the RDD is more than 10 TB and compression does not help.
On Wed, Jul 15, 2015 at 8:36 PM, Ted Yu wrote:
> bq. serializeUncompressed()
>
> Is there a method which enables compression ?
>
> Just wondering if that would reduce the memory footprint.
>
> Cheers
>
> On Wed, Jul 15,
bq. serializeUncompressed()
Is there a method which enables compression?
Just wondering if that would reduce the memory footprint.
Cheers
On Wed, Jul 15, 2015 at 8:06 AM, Saeed Shahrivari <
saeed.shahriv...@gmail.com> wrote:
> I use a simple map/reduce step in a Java/Spark program to remove
>
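Not the serializeUncompressed() hook being asked about, but for reference, the usual configuration route to compressing serialized RDD data looks roughly like this (shown in PySpark; the same properties apply to a Java/Scala job):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("dedup-with-compression")
        # Compress serialized RDD partitions (e.g. MEMORY_ONLY_SER, spills).
        .set("spark.rdd.compress", "true")
        # Kryo is usually more compact than Java serialization.
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))
sc = SparkContext(conf=conf)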
I use a simple map/reduce step in a Java/Spark program to remove duplicated
documents from a large (10 TB compressed) sequence file containing some
html pages. Here is the partial code:
JavaPairRDD<BytesWritable, NullWritable> inputRecords =
    sc.sequenceFile(args[0], BytesWritable.class,
                    NullWritable.class).coalesce(numMap
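The rest of that code is cut off above; purely as an illustration of the dedup step being described (sketched in PySpark rather than Java, and assuming the page bytes in the key are usable for a content hash), it might look like:

import hashlib

# Path is a placeholder; BytesWritable keys arrive in PySpark as bytearrays.
raw = sc.sequenceFile("hdfs:///path/to/pages.seq")

deduped = (raw
           .map(lambda kv: (hashlib.md5(bytes(kv[0])).hexdigest(), kv[0]))
           .reduceByKey(lambda a, b: a)   # keep one document per content hash
           .values())
print(deduped.count())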
Hello,
Trying to write data from Spark to Cassandra.
Reading data from Cassandra is ok, but writing seems to give a strange
error.
Exception in thread "main" scala.ScalaReflectionException: <none> is not a term
at scala.reflect.api.Symbols$SymbolApi$class.asTerm(Symbols.scala:259)
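For context, a rough sketch of how a DataFrame write to Cassandra typically looks with the DataStax spark-cassandra-connector's data source API; the keyspace, tables, and column layout here are made up and may not match the poster's setup or connector version:

# Placeholders throughout; adjust keyspace/table to the real schema.
df = sqlContext.read.format("org.apache.spark.sql.cassandra") \
    .options(keyspace="my_ks", table="source_table").load()

df.write.format("org.apache.spark.sql.cassandra") \
    .options(keyspace="my_ks", table="target_table") \
    .mode("append") \
    .save()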
0.034754522 s
[info] Lines with a: 62, Lines with b: 35