Unfortunately, I don't have a repro, and I'm only seeing this at scale. But I was able to get around the issue by fiddling with the distribution of my data before asking GraphFrames to process it. (I think that's where the error was being thrown from.)
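Concretely, the workaround amounted to something like the sketch below: spreading the vertex and edge DataFrames over more, smaller partitions before building the GraphFrame. The paths and partition count are made-up placeholders, and I'm not claiming this is a proper fix -- only that redistributing the data made the error go away for my workload.

    import org.apache.spark.sql.SparkSession
    import org.graphframes.GraphFrame

    val spark = SparkSession.builder().appName("graphframes-workaround-sketch").getOrCreate()

    // Placeholder inputs; GraphFrames expects an "id" column on vertices
    // and "src"/"dst" columns on edges.
    val vertices = spark.read.parquet("/path/to/vertices")
    val edges = spark.read.parquet("/path/to/edges")

    // Spread the data over more, smaller partitions before GraphFrames
    // touches it, so no single task has to build an outsized in-memory sort.
    val numPartitions = 2000 // made-up value; tune for the data volume
    val graph = GraphFrame(
      vertices.repartition(numPartitions, vertices("id")),
      edges.repartition(numPartitions, edges("src"))
    )

    // Whatever GraphFrames computation follows; connectedComponents is
    // just an example.
    val result = graph.connectedComponents.run()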
On Wed, Dec 7, 2016 at 7:32 AM Kazuaki Ishizaki <ishiz...@jp.ibm.com> wrote:

> I do not have a repro either. But when I took a quick browse through
> UnsafeInMemorySorter.java, I am concerned about a cast issue similar to
> https://issues.apache.org/jira/browse/SPARK-18458 at the following line:
>
> https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java#L156
>
> Regards,
> Kazuaki Ishizaki
>
>
> From: Reynold Xin <r...@databricks.com>
> To: Nicholas Chammas <nicholas.cham...@gmail.com>
> Cc: Spark dev list <dev@spark.apache.org>
> Date: 2016/12/07 14:27
> Subject: Re: Reduce memory usage of UnsafeInMemorySorter
> ------------------------------
>
> This is not supposed to happen. Do you have a repro?
>
> On Tue, Dec 6, 2016 at 6:11 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
>
> [Re-titling thread.]
>
> OK, I see that the exception from my original email is being triggered
> from this part of UnsafeInMemorySorter:
>
> https://github.com/apache/spark/blob/v2.0.2/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java#L209-L212
>
> So I can ask a more refined question now: how can I ensure that
> UnsafeInMemorySorter has room to insert new records? In other words, how
> can I ensure that hasSpaceForAnotherRecord() returns true?
>
> Do I need:
>
> - More, smaller partitions?
> - More memory per executor?
> - Some Java or Spark option enabled?
> - etc.
>
> I'm running Spark 2.0.2 on Java 7 and YARN. Would Java 8 help here?
> (Unfortunately, I cannot upgrade at this time, but it would be good to
> know regardless.)
>
> This is morphing into a user-list question, so please accept my
> apologies. Since I can't find any information anywhere else about this,
> and the question is about internals like UnsafeInMemorySorter, I hope it
> is OK here.
>
> Nick
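For concreteness, the knobs behind those bullet points would look roughly like the sketch below. The values are placeholders rather than recommendations, and on YARN the executor memory setting would normally be passed to spark-submit (--executor-memory) before the application starts rather than set in code.

    import org.apache.spark.sql.SparkSession

    // Placeholder values only -- illustrating the knobs, not recommending
    // specific numbers.
    val spark = SparkSession.builder()
      .appName("unsafe-sorter-tuning-sketch")
      // "More, smaller partitions": parallelism of shuffles and aggregations.
      .config("spark.sql.shuffle.partitions", "2000")
      // "More memory per executor": on YARN, usually set via spark-submit.
      .config("spark.executor.memory", "8g")
      .getOrCreate()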
On Mon, Dec 5, 2016 at 9:11 AM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> I was testing out a new project at scale on Spark 2.0.2 running on YARN,
> and my job failed with an interesting error message:
>
> TaskSetManager: Lost task 37.3 in stage 31.0 (TID 10684, server.host.name):
> java.lang.IllegalStateException: There is no space for new record
> 05:27:09.573 at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.insertRecord(UnsafeInMemorySorter.java:211)
> 05:27:09.574 at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:127)
> 05:27:09.574 at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:244)
> 05:27:09.575 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
> 05:27:09.575 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> 05:27:09.576 at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> 05:27:09.576 at org.apache.spark.sql.execution.WholeStageCodegenExec$anonfun$8$anon$1.hasNext(WholeStageCodegenExec.scala:370)
> 05:27:09.577 at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:408)
> 05:27:09.577 at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
> 05:27:09.577 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> 05:27:09.578 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
> 05:27:09.578 at org.apache.spark.scheduler.Task.run(Task.scala:86)
> 05:27:09.578 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> 05:27:09.579 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 05:27:09.579 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 05:27:09.579 at java.lang.Thread.run(Thread.java:745)
>
> I've never seen this before, and searching on Google/DDG/JIRA doesn't
> yield any results. There are no other errors coming from that executor,
> whether related to memory, storage space, or otherwise.
>
> Could this be a bug? If so, how would I narrow down the source?
> Otherwise, how might I work around the issue?
>
> Nick
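One plausible way to narrow down a failure like this is to check whether the data feeding the aggregation in that trace is heavily skewed, since a single outsized partition is a likely way for the in-memory sorter to run out of room. A rough sketch of such a check follows; the input path and the "key" column are placeholders for the real DataFrame and grouping key, and this is only a diagnostic idea, not a confirmed cause.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{count, desc}

    val spark = SparkSession.builder().appName("skew-check-sketch").getOrCreate()

    // Placeholders: point this at the real input and grouping key.
    val df = spark.read.parquet("/path/to/input")

    // Heaviest grouping keys: a handful of very large groups would point
    // to skew in the aggregation itself.
    df.groupBy("key")
      .agg(count("*").as("rows"))
      .orderBy(desc("rows"))
      .show(20)

    // Rows per partition: one or two outsized partitions would point to
    // skew in how the data is laid out before the shuffle.
    df.rdd
      .mapPartitionsWithIndex((i, it) => Iterator((i, it.size)))
      .collect()
      .sortBy(-_._2)
      .take(20)
      .foreach(println)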