Java IO Stream Corrupted - Invalid Type AC?

2014-06-04 Thread Matt Kielo
Hi, I'm trying to run some Spark code on a cluster, but I keep running into a "java.io.StreamCorruptedException: invalid type code: AC" error. My task involves analyzing ~50 GB of data (some operations involve sorting) and then writing the results out to a JSON file. I'm running the analysis on each of the data's ~1

Re: ---cores option in spark-shell

2014-06-03 Thread Matt Kielo
I haven't been able to set the cores with that option in Spark 1.0.0 either. As a workaround, setting the environment variable SPARK_JAVA_OPTS="-Dspark.cores.max=" seems to do the trick. Matt Kielo, Data Scientist, Oculus Info Inc. On Tue, Jun 3, 2014 at 11:15 AM, Marek Wiewiorka wr

Sorting large data - "too many open files" exception

2014-05-26 Thread Matt Kielo
Hello, I currently have a task that always fails with "java.io.FileNotFoundException: [...]/shuffle_0_257_2155 (Too many open files)" when I run sorting operations such as distinct, sortByKey, or reduceByKey on a large number of partitions. I'm working with 365 GB of data which is being split into 59
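
A "Too many open files" message generally means a process has hit its limit on open file descriptors, so a reasonable first diagnostic is to check that limit on the worker machines. The following is a minimal sketch, not anything from the original thread: it uses Python's standard `resource` module (Unix only), and the helper names and the `files_needed` figure are hypothetical and chosen only for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_within_limit(files_needed, soft_limit):
    """Return True if files_needed descriptors fit under the soft limit.

    files_needed is a hypothetical estimate supplied by the caller; this
    function does not measure what a Spark job actually opens.
    """
    if soft_limit == resource.RLIM_INFINITY:
        # No limit is configured, so any count fits.
        return True
    return files_needed < soft_limit


soft, hard = open_file_limits()
print("soft limit:", soft, "hard limit:", hard)

# Hypothetical figure for illustration only.
files_needed = 5000
print("fits under soft limit:", fits_within_limit(files_needed, soft))
```

If the soft limit turns out to be low (1024 is a common default on Linux), raising it for the user that runs the workers, or reducing the number of files held open at once, are the usual directions to investigate.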