Not sure if it can help, btw:
Checkpointing cuts the lineage. The checkpoint() method only sets a flag. To
actually perform the checkpoint you must NOT materialise the RDD before it
has been flagged; otherwise the flag is simply ignored.
rdd2 = rdd1.map(..)
rdd2.checkpoint()
rdd2.count
rdd2.isCheckpointed
Hi,
I have a non-serializable class and, as a workaround, I'm trying to
re-instantiate it at each deserialization. Thus, I created a wrapper class
and overrode the writeObject and readObject methods as follows:
> private def writeObject(oos: ObjectOutputStream) {
> oos.defaultWriteObject()
>
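For what it's worth, a minimal self-contained sketch of that wrapper pattern with plain Java serialization — the class and field names here are illustrative, not from the thread:

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}

// Hypothetical non-serializable resource (name assumed for illustration).
class HeavyResource(val config: String)

// Wrapper that re-creates the resource on deserialization instead of
// serializing it: only `config` travels over the wire.
class ResourceWrapper(val config: String) extends Serializable {
  @transient var resource: HeavyResource = new HeavyResource(config)

  private def writeObject(oos: ObjectOutputStream): Unit = {
    oos.defaultWriteObject() // writes `config`; `resource` is @transient
  }

  private def readObject(ois: ObjectInputStream): Unit = {
    ois.defaultReadObject()
    resource = new HeavyResource(config) // re-instantiate on the receiving side
  }
}
```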
Hi all,
I'd like to announce my MSc thesis, which concerns an abstraction developed
on top of Spark. More precisely, the computations are graph computations in
the style of Pregel/Bagel and the more recent GraphX.
Starting from the "graph as a network" view, a protocol-stack-based view is
conceived.
AFAIK cache() is just a shortcut for the persist method with MEMORY_ONLY as
the storage level.
from the source code of RDD:
> /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
> def persist(): RDD[T] = persist(StorageLevel.MEMORY_ONLY)
>
> /** Persist this RDD with the de
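So the two calls below are equivalent; persist is only needed when you want a different storage level. A minimal sketch (app name and data are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(
  new SparkConf().setAppName("persist-demo").setMaster("local[2]"))

val a = sc.parallelize(1 to 100)
a.cache() // identical to a.persist(StorageLevel.MEMORY_ONLY)

val b = sc.parallelize(1 to 100)
b.persist(StorageLevel.MEMORY_AND_DISK) // explicit level: spills to disk when memory is short
```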
Hi Sai,
I honestly can't figure out where you are using the RDDs (because the split
method isn't defined on them). In any case, you should use the map function
instead of foreach, because foreach is NOT idempotent: some partitions could
be recomputed, executing the function multiple times.
Wha
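To illustrate the map-vs-foreach point above with a minimal sketch (data and names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("map-vs-foreach").setMaster("local[2]"))
val lines = sc.parallelize(Seq("a,b", "c,d"))

// Risky: a side effect inside foreach may run more than once if a
// partition is recomputed (e.g. after a failure or cache eviction).
// lines.foreach(line => sideEffect(line.split(",")))

// Preferred: express the logic as a map transformation and act on the result.
val fields = lines.map(_.split(","))
println(fields.count()) // a single, well-defined action
```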
er observation is that in the 4th image some of your RDDs are massive
> and some are tiny.
>
>
> On Mon, Apr 14, 2014 at 11:45 AM, Andrea Esposito wrote:
>
>> Hi all,
>>
>> i'm developing an iterative computation over graphs but i'm struggling
>> with
Maybe your memory isn't enough to hold the current RDD together with all the
past ones?
RDDs that are cached or persisted have to be unpersisted explicitly; no
auto-unpersist exists (maybe that will change in version 1.0?).
Be careful: calling cache() or persist() doesn't imply the RDD will be
materialised immediately; that only happens at the first action.
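A minimal sketch of the explicit-unpersist pattern in an iterative job (names and data are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(
  new SparkConf().setAppName("unpersist-demo").setMaster("local[2]"))

var current = sc.parallelize(1 to 1000).persist(StorageLevel.MEMORY_ONLY)
current.count() // force materialisation before the loop

for (_ <- 1 to 10) {
  val next = current.map(_ + 1).persist(StorageLevel.MEMORY_ONLY)
  next.count()        // materialise `next` first...
  current.unpersist() // ...then explicitly release the previous RDD
  current = next
}
```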
Hi there,
I'm running an iterative algorithm and sometimes I end up with a
StackOverflowError, regardless of whether I do checkpoints or not.
While I still don't understand why this is happening, I've found that
increasing the stack size is a workaround.
Developing using "local[n]" so the local mode i
Update: the checkpointing is not actually performed. I checked via the
"isCheckpointed" method but it always returns false. ???
2014-05-05 23:14 GMT+02:00 Andrea Esposito :
> Checkpoint doesn't help it seems. I do it at each iteration/superstep.
>
> Looking deeply, the RDDs are
Thanks all for helping.
Following Earthson's tip, I resolved it. I have to report that if you
materialise the RDD and afterwards try to checkpoint it, the operation is
not performed.
newRdd = oldRdd.map(myFun).persist(myStorageLevel)
newRdd.foreach(x => myFunLogic(x)) // Here materialized for other
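To spell out the ordering constraint, a sketch using the placeholders from the snippet above (`oldRdd`, `myFun`, `myFunLogic`, `myStorageLevel`; `sc` is the active SparkContext, and the checkpoint directory is illustrative):

```scala
sc.setCheckpointDir("/tmp/checkpoints")

val newRdd = oldRdd.map(myFun).persist(myStorageLevel)

// Wrong order: materialising first means a later checkpoint() flag is ignored.
// newRdd.foreach(x => myFunLogic(x))
// newRdd.checkpoint() // too late: isCheckpointed stays false

// Right order: set the flag BEFORE the first action.
newRdd.checkpoint()
newRdd.foreach(x => myFunLogic(x)) // this action materialises and checkpoints
```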
Hi there,
sorry if I'm posting a lot lately.
I'm trying to add the KryoSerializer but I receive this exception:
2014-05-06 11:45:23 WARN TaskSetManager:62 - Loss was due to
java.io.EOFException
java.io.EOFException
at
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSer
Bump, does anyone know anything about it? ^^
2014-05-06 12:05 GMT+02:00 Andrea Esposito :
> Hi there,
>
> sorry if i'm posting a lot lately.
>
> i'm trying to add the KryoSerializer but i receive this exception:
> 2014-05-06 11:45:23 WARN Tas
Hi,
I just migrated to 1.0 and still have the same issue.
It happens either with or without the custom registrator; merely using the
KryoSerializer triggers the exception immediately.
I set the kryo settings through the property:
System.setProperty("spark.serializer", "org.apache.spark.serializer.
KryoSerializer")
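For reference, the same settings can also be passed through SparkConf rather than system properties — a sketch; the app name, master, and registrator class name are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kryo-example")
  .setMaster("local[2]")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Optional: register your classes via a custom registrator (name is illustrative).
  .set("spark.kryo.registrator", "myapp.MyKryoRegistrator")

val sc = new SparkContext(conf)
```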