Unfortunately this is a known issue:
https://issues.apache.org/jira/browse/SPARK-1476

As Sean suggested, you need to think of some other way of doing the same
thing, even if it's just breaking your one big broadcast variable into a
few smaller ones.
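As a rough illustration of that workaround, here is a sketch of the splitting side. The `chunk_dict` helper and `NUM_CHUNKS` are hypothetical names (not Spark API), and the value is assumed to be a dict-like lookup table; the commented lines show how each piece might then become its own sub-2GB broadcast.

```python
# Sketch: split one oversized lookup table into several smaller pieces,
# so each piece serializes to well under the ~2GB array limit.
# chunk_dict and NUM_CHUNKS are hypothetical, not Spark API.

NUM_CHUNKS = 4

def chunk_dict(big, n):
    """Partition a dict into n smaller dicts by hashing keys."""
    chunks = [{} for _ in range(n)]
    for k, v in big.items():
        chunks[hash(k) % n][k] = v
    return chunks

big_table = {i: i * i for i in range(1000)}   # stand-in for the 5GB value
pieces = chunk_dict(big_table, NUM_CHUNKS)

# With a SparkContext `sc`, each piece would be broadcast separately:
#   broadcasts = [sc.broadcast(p) for p in pieces]
# and a worker-side lookup picks the right piece the same way:
#   broadcasts[hash(k) % NUM_CHUNKS].value[k]
```

Hashing on the key keeps the lookup cheap: the reader only has to recompute the same `hash(k) % n` to find which broadcast holds the key.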

On Fri, Feb 13, 2015 at 12:30 PM, Sean Owen <so...@cloudera.com> wrote:

> I think you've hit the nail on the head. Since the serialization
> ultimately creates a byte array, and arrays can have at most ~2
> billion elements in the JVM, the broadcast can be at most ~2GB.
>
> At that scale, you might consider whether you really have to broadcast
> these values, or want to handle them as RDDs and join and so on.
>
> Or consider whether you can break it up into several broadcasts?
>
>
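To put a number on Sean's point: JVM arrays are indexed by a signed 32-bit int, so a serialized byte array tops out at `Integer.MAX_VALUE` bytes, just under 2 GiB.

```python
# A JVM byte[] can hold at most Integer.MAX_VALUE elements,
# because array indices are signed 32-bit ints.
max_bytes = 2**31 - 1
print(max_bytes)           # 2147483647
print(max_bytes / 2**30)   # just under 2 GiB
```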
> On Fri, Feb 13, 2015 at 6:24 PM, soila <skavu...@gmail.com> wrote:
> > I am trying to broadcast a large 5GB variable using Spark 1.2.0. I get
> > the following exception when the size of the broadcast variable exceeds
> > 2GB. Any ideas on how I can resolve this issue?
> >
> > java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
> >         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:829)
> >         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:123)
> >         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:132)
> >         at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:99)
> >         at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:147)
> >         at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
> >         at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
> >         at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
> >         at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
> >         at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
> >         at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
> >         at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> >         at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
> >         at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
> >         at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Size-exceeds-Integer-MAX-VALUE-exception-when-broadcasting-large-variable-tp21648.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
>
>
