Here is the relevant generated code and the exception stack trace. The problem in the generated code is at line 35.
/* 001 */ public java.lang.Object generate(Object[] references) {
/* 002 */   return new SpecificSafeProjection(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificSafeProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
/* 006 */
/* 007 */   private Object[] references;
/* 008 */   private InternalRow mutableRow;
/* 009 */   private boolean resultIsNull_0;
/* 010 */   private boolean resultIsNull_1;
/* 011 */   private boolean globalIsNull_0;
/* 012 */   private boolean resultIsNull_2;
/* 013 */   private long argValue_0;
/* 014 */   private InternalRow value_MapObject_lambda_variable_2;
/* 015 */   private boolean isNull_MapObject_lambda_variable_2;
/* 016 */   private boolean resultIsNull_3;
/* 017 */   private int argValue_1;
/* 018 */   private boolean globalIsNull_1;
/* 019 */   private java.lang.String[] mutableStateArray_0 = new java.lang.String[2];
/* 020 */
/* 021 */   public SpecificSafeProjection(Object[] references) {
/* 022 */     this.references = references;
/* 023 */     mutableRow = (InternalRow) references[references.length - 1];
/* 024 */
/* 025 */
/* 026 */   }
/* 027 */
/* 028 */   public void initialize(int partitionIndex) {
/* 029 */
/* 030 */   }
/* 031 */
/* 032 */   public java.lang.Object apply(java.lang.Object _i) {
/* 033 */     InternalRow i = (InternalRow) _i;
/* 034 */     final MyPojo value_1 = false ?
/* 035 */     null : MyPojo$.MODULE$.apply();
/* 036 */     MyPojo javaBean_0 = value_1;
/* 037 */     if (!false) {
/* 038 */       initializeJavaBean_1_0(i, javaBean_0);
/* 039 */       initializeJavaBean_1_1(i, javaBean_0);
/* 040 */     }
/* 041 */     if (false) {
/* 042 */       mutableRow.setNullAt(0);
/* 043 */     } else {
/* 044 */
/* 045 */       mutableRow.update(0, value_1);
/* 046 */     }
/* 047 */
/* 048 */     return mutableRow;
/* 049 */   }
/* 050 */

13:59:55.114 [stream execution thread for [id = 8fe39e82-f94b-46c2-a54c-be7a69c76701, runId = 9fa63e2a-cca0-474a-b476-fc72f5af201a]] ERROR o.a.s.s.e.s.MicroBatchExecution - Query [id = 8fe39e82-f94b-46c2-a54c-be7a69c76701, runId = 9fa63e2a-cca0-474a-b476-fc72f5af201a] terminated with error
org.apache.spark.SparkException: Job aborted due to stage failure: Task 71 in stage 32.0 failed 1 times, most recent failure: Lost task 71.0 in stage 32.0 (TID 729, LT405218.asml.com, executor driver): java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 78: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 78: Unknown variable or type "MyPojo$.MODULE$"
	at org.sparkproject.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
	at org.sparkproject.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
	at org.sparkproject.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
	at org.sparkproject.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
	at org.sparkproject.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
	at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
	at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
	at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
	at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at
org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
	at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1318)
	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:205)
	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1261)
	at org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression.bufferRowToObject$lzycompute(TypedAggregateExpression.scala:267)
	at org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression.bufferRowToObject(TypedAggregateExpression.scala:267)
	at org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression.deserialize(TypedAggregateExpression.scala:271)
	at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.merge(interfaces.scala:559)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1.$anonfun$applyOrElse$3(AggregationIterator.scala:199)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1.$anonfun$applyOrElse$3$adapted(AggregationIterator.scala:199)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateProcessRow$7(AggregationIterator.scala:213)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateProcessRow$7$adapted(AggregationIterator.scala:207)
	at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.processInputs(ObjectAggregationIterator.scala:159)
	at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.<init>(ObjectAggregationIterator.scala:78)
	at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2(ObjectHashAggregateExec.scala:129) at
org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$2$adapted(ObjectHashAggregateExec.scala:107)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:859)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:859)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 78: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 78: Unknown variable or type "com.asml.sara.foundation.data.waferDomainModel.LotDataRecord$.MODULE$"
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1382)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1467)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1464) at
org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
	at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
	... 37 more

> On 29.04.2021 at 15:30, Sean Owen <sro...@gmail.com> wrote:
>
> I don't know this code well, but yes, it seems like something is looking for
> members of a companion object when there is none here. Can you show any more
> of the stack trace or generated code?
>
>> On Thu, Apr 29, 2021 at 7:40 AM Rico Bergmann <i...@ricobergmann.de> wrote:
>> Hi all!
>>
>> A simplified code snippet of what my Spark pipeline written in Java does:
>>
>> public class MyPojo implements Serializable {
>>
>>   ... // some fields with getters and setters
>>
>> }
>>
>>
>> a custom Aggregator (defined in the Driver class):
>>
>> public static class MyAggregator extends
>>     org.apache.spark.sql.expressions.Aggregator<Row, MyPojo, MyPojo> { ... }
>>
>>
>> in my Driver I do:
>>
>> Dataset<Row> inputDF = ... // some calculations before
>>
>> inputDF.groupBy("col1", "col2", "col3").agg(new
>>     MyAggregator().toColumn().name("aggregated"));
>>
>>
>> When executing this part I get a CompileException complaining about an
>> unknown variable or type "MyPojo$.MODULE$". For me it looks like the
>> CodeGenerator generates code for Scala (since as far as I know .MODULE$
>> is a Scala-specific variable). I tried it with Spark 3.1.1 and Spark 3.0.1.
>>
>> Does anyone have an idea what's going wrong here?
>>
>>
>> Best,
>>
>> Rico.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
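For reference, here is a minimal sketch of how the Java Aggregator API is normally wired up, with the buffer and output encoders supplied explicitly via bufferEncoder()/outputEncoder() returning Encoders.bean(...). A bean encoder deserializes through the POJO's public no-arg constructor and setters, rather than through a Scala companion object's apply() as the generated line 35 above attempts. This is only an illustration, not the code from the pipeline in question: the zero/reduce/merge/finish bodies are placeholders, and MyPojo is assumed to be a conforming Java bean (public class, public no-arg constructor, getter/setter pairs).

```java
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Aggregator;

public class MyAggregator extends Aggregator<Row, MyPojo, MyPojo> {

    @Override
    public MyPojo zero() {
        // Empty aggregation buffer (placeholder body).
        return new MyPojo();
    }

    @Override
    public MyPojo reduce(MyPojo buffer, Row input) {
        // Fold one input row into the buffer (placeholder body).
        return buffer;
    }

    @Override
    public MyPojo merge(MyPojo b1, MyPojo b2) {
        // Combine two partial buffers (placeholder body).
        return b1;
    }

    @Override
    public MyPojo finish(MyPojo reduction) {
        return reduction;
    }

    // Explicit bean encoders: deserialization goes through MyPojo's
    // public no-arg constructor and setters, not a companion object.
    @Override
    public Encoder<MyPojo> bufferEncoder() {
        return Encoders.bean(MyPojo.class);
    }

    @Override
    public Encoder<MyPojo> outputEncoder() {
        return Encoders.bean(MyPojo.class);
    }
}
```

If the encoders are wired up like this and codegen still emits MyPojo$.MODULE$, that would suggest Spark resolved a different (Scala-style) deserializer for the buffer type somewhere else in the plan.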