[ 
https://issues.apache.org/jira/browse/SPARK-17223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-17223.
-------------------------------
    Resolution: Duplicate

> "grows beyond 64 KB" with data frame with many columns
> ------------------------------------------------------
>
>                 Key: SPARK-17223
>                 URL: https://issues.apache.org/jira/browse/SPARK-17223
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: K
>
> Hi everyone,
> We have a dataset with ~500 columns. If I call a LabelIndexer on it and
> try to print out the first line, it fails with the "grows beyond 64 KB" error
> below. My original dataset had >20K rows; I cut it down to 100 rows, but that
> didn't help. Eventually, we want to feed LabelIndexer, VectorAssembler and
> Random Forest into a Pipeline, but we are not having much luck here :( We
> tried with 2.0.0 and 2.1.0 (snapshot as of 8/23). The problem is reproducible
> with the data file here:
> https://drive.google.com/file/d/0B2zl8xCBUVh6TFZDd3ZSUTNsam8/view?usp=sharing
> Thanks a lot!!
> Environment: Cluster with 2 nodes (CentOS, 64GB RAM and 8 cores each)
> Code is here (JIRA corrupted it, so I moved it to a Google doc):
> https://docs.google.com/document/d/19unfhSMMCjoXqhmFOA1omm4V2wHaraY0RxZesbQluZU/edit?usp=sharing
> ERROR:
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.sql.execution.python.EvaluatePython.takeAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 250.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
> 250.0 (TID 4666, ip): java.util.concurrent.ExecutionException: 
> java.lang.Exception: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB
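
For readers unfamiliar with the error above: the JVM caps the bytecode of a
single method at 64 KB, and the trace shows a generated compare() method
exceeding that cap. The sketch below is a toy illustration only, not Spark's
actual code generator. It emits Java-like source for a comparator that handles
every column inline, to show how the size of one method grows with the column
count. The names generate_compare_source, source_size and compareColumn are
hypothetical, and source size is only a rough proxy for bytecode size.

```python
# Toy illustration only: this is NOT Spark's code generator.
# It emits Java-like source for a row comparator that handles each
# column inline, the way a single generated compare() method would.


def generate_compare_source(num_columns: int) -> str:
    """Return Java-like source comparing two rows column by column."""
    lines = ["public int compare(InternalRow a, InternalRow b) {"]
    for i in range(num_columns):
        # compareColumn is a hypothetical helper, used only for illustration.
        lines.append(f"  int c{i} = compareColumn(a, b, {i});")
        lines.append(f"  if (c{i} != 0) return c{i};")
    lines.append("  return 0;")
    lines.append("}")
    return "\n".join(lines)


def source_size(num_columns: int) -> int:
    """Size in bytes of the generated source text."""
    return len(generate_compare_source(num_columns).encode("utf-8"))


if __name__ == "__main__":
    # Every extra column adds two more statements to the same method.
    for n in (10, 100, 500):
        print(n, source_size(n))
```

Because every column adds statements to the same method body, the size depends
on the number of columns rather than the number of rows, which would be
consistent with the report that cutting the data to 100 rows did not help.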



