PySpark OOM when running PCA

2019-02-07, Riccardo Ferrari
Hi list, I am having trouble running a PCA with PySpark. I am trying to reduce the matrix size, since after one-hot encoding (OHE) my features are 40k wide. Setup: Spark 2.2.0 standalone (Oracle JVM); pyspark 2.2.0 from a Docker container (OpenJDK). I'm starting the Spark session from the notebook; however, I make sure to: - PYSPA
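A rough back-of-the-envelope sketch of why a 40k-wide feature vector can exhaust memory. It assumes (this is not stated in the post) that the PCA step materializes a dense n x n covariance matrix of 8-byte doubles inside a single JVM, which is our understanding of how Spark MLlib's PCA works before running a local decomposition on the driver. The helper names are hypothetical.

```python
# Back-of-the-envelope memory estimate for an n x n matrix of doubles.
# Assumption: PCA materializes the full covariance matrix densely in one JVM.

BYTES_PER_DOUBLE = 8  # a Java/Scala double is 64 bits


def dense_square_matrix_bytes(n: int) -> int:
    """Bytes needed to hold a dense n x n matrix of 64-bit doubles."""
    return n * n * BYTES_PER_DOUBLE


def packed_upper_triangle_bytes(n: int) -> int:
    """Bytes for only the upper triangle, i.e. n * (n + 1) / 2 entries."""
    return n * (n + 1) // 2 * BYTES_PER_DOUBLE


if __name__ == "__main__":
    n_features = 40_000  # width reported after one-hot encoding
    full = dense_square_matrix_bytes(n_features)
    packed = packed_upper_triangle_bytes(n_features)
    print(f"full matrix:    {full / 1e9:.1f} GB")
    print(f"upper triangle: {packed / 1e9:.1f} GB")
```

For 40,000 features this gives 12.8 GB for the full matrix and about 6.4 GB even for the packed upper triangle, before any workspace the decomposition itself needs, which on its own can exceed a typical driver heap.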

Re: Aws

2019-02-07, Noritaka Sekiyama
Hi Pedro, It seems that you disabled maximizeResourceAllocation in EMR 5.16 but enabled it in 5.20. This config can differ depending on how you start the EMR cluster (quick wizard, advanced wizard in the console, or CLI/API). You can check it in the Configuration tab of the EMR console. Please compare spark prope
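Since the setting can vary with how the cluster is launched, one way to keep two EMR releases comparable is to pin it explicitly. A minimal sketch, assuming you launch clusters with an EMR configuration list: it builds the `spark` classification with the `maximizeResourceAllocation` property. The function name `spark_classification` is hypothetical.

```python
import json


def spark_classification(maximize: bool) -> list:
    """EMR configuration list that pins maximizeResourceAllocation explicitly."""
    return [
        {
            "Classification": "spark",
            "Properties": {
                # EMR configuration property values are strings, not booleans
                "maximizeResourceAllocation": "true" if maximize else "false",
            },
        }
    ]


if __name__ == "__main__":
    # Write this to a file and pass it at cluster creation time
    print(json.dumps(spark_classification(True), indent=2))
```

Saved as `config.json`, the output can be passed to `aws emr create-cluster --configurations file://config.json`, so both the 5.16 and 5.20 clusters start with the same value regardless of which console wizard was used.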

Re: Aws

2019-02-07, Hiroyuki Nagata
Hi, thank you, Pedro. I tested the maximizeResourceAllocation option. When it's enabled, Spark seems to use all of its cores fully. However, the performance is not much different from the default setting. I am considering using s3-distcp for uploading files. And I think table (DataFrame) caching is also effectiven

Spark 2.4 partitions and tasks

2019-02-07, Pedro Tuero
Hi, I am running a job in Spark (using AWS EMR), and some stages take much longer on Spark 2.4 than on Spark 2.3.1 (screenshots of both runs not preserved). With Spark 2.4, the keyBy operation takes more than 10x as long as it did with Spark 2.3.1. It seems to be related

Re: java.lang.IllegalArgumentException: Unsupported class file major version 55

2019-02-07, Jungtaek Lim
ASM 6 doesn't support Java 11. In the master branch (for Spark 3.0) there's a dependency upgrade to ASM 7, and also some efforts (if my understanding is right) to support Java 11, so you may need to use a lower JDK version (8 is safest) for Spark 2.4.0, and try out the master branch to prepare for Java 11. Than
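Following the advice to stay on JDK 8 for Spark 2.4.0, a launcher script could check the `java.version` string before starting a job. A minimal sketch, assuming such a pre-flight check is wanted; the function names are hypothetical. It handles both the legacy `1.8.0_191` style and the newer `11.0.1` style of version strings.

```python
import re


def java_feature_version(version: str) -> int:
    """Return the Java feature release (8, 11, ...) from a java.version string.

    Handles the legacy "1.x" scheme (e.g. "1.8.0_191" -> 8) and the
    scheme used from Java 9 on (e.g. "11.0.1" -> 11).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy scheme: "1.8" means Java 8
    return first


def safe_for_spark_2_4(version: str) -> bool:
    """True only for JDK 8, the safest choice for Spark 2.4.0 per this thread."""
    return java_feature_version(version) == 8


if __name__ == "__main__":
    # The Zulu build from the original report would be rejected
    print(safe_for_spark_2_4("11.0.1"))
```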

Re: java.lang.IllegalArgumentException: Unsupported class file major version 55

2019-02-07, Gabor Somogyi
Hi Hande, "Unsupported class file major version 55" means a Java incompatibility. This error means you're trying to load a Java "class" file that was compiled with a newer version of Java than you have installed. For example, your .class file could have been compiled for JDK 8, and you're trying to
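To see which Java release a given .class file targets, you can read its header directly. Per the JVM specification, a class file starts with the magic number 0xCAFEBABE followed by a 2-byte minor and a 2-byte major version, both big-endian. For Java 5 and later, the major version equals the Java release plus 44, so 55 corresponds to Java 11 and 52 to Java 8. A minimal sketch; the helper names are hypothetical.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE


def class_file_versions(header: bytes) -> tuple:
    """Return (major, minor) from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class file header")
    # u4 magic, u2 minor_version, u2 major_version, all big-endian
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file (bad magic number)")
    return major, minor


def java_release_for_major(major: int) -> int:
    """Map a class file major version to a Java release (Java 5+, major >= 49)."""
    if major < 49:
        raise ValueError("mapping only holds for Java 5 (major 49) and later")
    return major - 44


def read_class_file_major(path: str) -> int:
    """Read the major version from a .class file on disk."""
    with open(path, "rb") as f:
        major, _ = class_file_versions(f.read(8))
    return major
```

For example, the header bytes `cafebabe00000037` decode to major version 0x37 = 55, which `java_release_for_major` maps to Java 11.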

java.lang.IllegalArgumentException: Unsupported class file major version 55

2019-02-07, Hande, Ranjit Dilip (Ranjit)
Hi, I am developing a Java process that will consume data from Kafka using Apache Spark Streaming. For this I am using the following: Java: openjdk version "11.0.1" 2018-10-16 LTS OpenJDK Runtime Environment Zulu11.2+3 (build 11.0.1+13-LTS) OpenJDK 64-Bit Server VM Zulu11.2+3 (build 11.0.1+13-LT