[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592000#comment-15592000 ]
Aleksander Eskilson commented on SPARK-17131:
---------------------------------------------
Yeah, that makes sense. So far, the issue I documented and this one seem to be
the only JIRAs that exhibit the Constant Pool limit error specifically.
I'm trying to dig deeper to see whether it really marks its own class of
error, but given that SPARK-17702 didn't resolve the error case I posted (even
though it splits up sections of large generated code), I suspect they are
closely related but ultimately different issues. I think the splitExpressions
technique that was used in SPARK-17702, and that also appears to be
employed in SPARK-16845, could be useful for the range of different classes that
can generate too many lines of code. Seeing the issues linked together is
definitely useful.
To that end, I'll leave mine resolved as a duplicate of SPARK-16845 for now,
until I can make use of the patch it develops, so we can see more conclusively
whether they're related issues or true duplicates. I'll also link the two
"0xFFFF" issues together as related.
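
To make the splitting idea discussed above concrete, here is a minimal sketch in plain Scala. It is not Spark's actual splitExpressions code; the object name SplitSketch, the generated method name apply_N, and the InternalRow parameter in the emitted text are all hypothetical and chosen only for illustration. It groups generated statements into small helper methods so that no single method body grows too large.

```scala
// Hypothetical illustration only; this is NOT Spark's splitExpressions implementation.
object SplitSketch {
  /**
   * Groups generated Java statements into helper methods holding at most
   * `maxPerMethod` statements each. Returns the helper method sources and
   * the call statements that invoke them in order.
   */
  def splitIntoMethods(
      statements: Seq[String],
      maxPerMethod: Int): (Seq[String], Seq[String]) = {
    require(maxPerMethod > 0, "maxPerMethod must be positive")
    val methods = statements
      .grouped(maxPerMethod)
      .zipWithIndex
      .map { case (chunk, i) =>
        // Each chunk becomes the body of its own private helper method.
        val body = chunk.map(s => "  " + s).mkString("\n")
        s"private void apply_$i(InternalRow row) {\n$body\n}"
      }
      .toVector
    // One call per helper, in the original statement order.
    val calls = methods.indices.map(i => s"apply_$i(row);")
    (methods, calls)
  }
}
```

One caveat worth keeping in mind: splitting like this keeps each method under the JVM's per-method bytecode size limit, but every helper method still lives in the same class and so shares a single constant pool, whose entry count is stored in a 16-bit field. That may be why splitting alone did not clear the "0xFFFF" error in my case, though I have not confirmed it.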
> Code generation fails when running SQL expressions against a wide dataset
> (thousands of columns)
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-17131
> URL: https://issues.apache.org/jira/browse/SPARK-17131
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Iaroslav Zeigerman
> Attachments: _SPARK_17131__add_a_test_case_with_1000_column_DF_where_describe___fails.patch
>
>
> When reading a CSV file that contains 1776 columns, Spark and Janino fail to
> generate the code, with the message:
> {noformat}
> Constant pool has grown past JVM limit of 0xFFFF
> {noformat}
> When running a common select with all columns it's fine:
> {code}
> val allCols = df.columns.map(c => col(c).as(c + "_alias"))
> val newDf = df.select(allCols: _*)
> newDf.show()
> {code}
> But when I invoke the describe method:
> {code}
> newDf.describe(allCols: _*)
> {code}
> it fails with the following stack trace:
> {noformat}
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
> at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
> at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
> ... 30 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0xFFFF
> at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402)
> at org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300)
> at org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307)
> at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868)
> at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346)
> at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185)
> at org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265)
> at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321)
> at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
> at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605)
> at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362)
> at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
> at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
> at org.codehaus.janino.UnitCompiler$10.visitMethodInvocation(UnitCompiler.java:3263)
> at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
> at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
> at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
> at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2662)
> at org.codehaus.janino.UnitCompiler.access$4400(UnitCompiler.java:185)
> at org.codehaus.janino.UnitCompiler$7.visitMethodInvocation(UnitCompiler.java:2627)
> at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
> at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2654)
> at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1643)
> ....
> {noformat}