Re: Best practices for removing lineage of a RDD or Graph object?

2014-06-18 Thread Andrea Esposito
Not sure if it helps, but: checkpointing cuts the lineage. The checkpoint method only sets a flag. To actually perform the checkpoint, you must NOT materialise the RDD before it has been flagged, otherwise the flag is simply ignored. rdd2 = rdd1.map(..) rdd2.checkpoint() rdd2.count rdd2.isCheckpoi
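A minimal sketch of the ordering the message describes, assuming a live SparkContext `sc` and an already-set checkpoint directory (the path is illustrative; this does not run outside a Spark application):

```scala
// Sketch only: assumes a SparkContext `sc` is available.
sc.setCheckpointDir("/tmp/spark-checkpoints") // illustrative path

val rdd1 = sc.parallelize(1 to 1000)
val rdd2 = rdd1.map(_ * 2)

rdd2.checkpoint()            // only sets the flag; nothing is written yet
rdd2.count()                 // first action after flagging -> checkpoint happens
println(rdd2.isCheckpointed) // expected true

// Wrong order, per the thread's observation:
val rdd3 = rdd1.map(_ + 1)
rdd3.count()                 // materialised before flagging
rdd3.checkpoint()            // flag set too late
println(rdd3.isCheckpointed) // may stay false
```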

Custom Serialization

2014-07-02 Thread Andrea Esposito
Hi, I have a non-serializable class, and as a workaround I'm trying to re-instantiate it on each deserialization. To that end, I created a wrapper class and overrode the writeObject and readObject methods as follows: > private def writeObject(oos: ObjectOutputStream) { > oos.defaultWriteObject() >
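A self-contained sketch of that wrapper pattern using plain JVM serialization from Scala (the names `HeavyResource` and `Wrapper` are made up for illustration): the non-serializable field is marked `@transient` so `defaultWriteObject()` skips it, and `readObject` rebuilds it after `defaultReadObject()`.

```scala
import java.io._

// Hypothetical non-serializable resource.
class HeavyResource { def greet: String = "hello" }

class Wrapper(val id: Int) extends Serializable {
  @transient var resource: HeavyResource = new HeavyResource

  private def writeObject(oos: ObjectOutputStream): Unit = {
    oos.defaultWriteObject() // writes `id`; the @transient field is skipped
  }

  private def readObject(ois: ObjectInputStream): Unit = {
    ois.defaultReadObject()
    resource = new HeavyResource // re-instantiate on each deserialization
  }
}

object Demo extends App {
  val bytes = new ByteArrayOutputStream
  new ObjectOutputStream(bytes).writeObject(new Wrapper(42))
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  val w = in.readObject().asInstanceOf[Wrapper]
  assert(w.id == 42)
  assert(w.resource.greet == "hello") // rebuilt, not deserialized
}
```

Java's serialization machinery finds the private `writeObject`/`readObject` methods by reflection, so the Scala `private def` idiom shown in the email works as-is.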

An abstraction over Spark

2014-07-17 Thread Andrea Esposito
Hi all, I'd like to announce my MSc thesis, which concerns an abstraction developed on top of Spark. More precisely, the computations are graph computations in the style of Pregel/Bagel and the more recent GraphX. Starting from a "graph as a network" view, a protocol stack-based view is conceived.

Re: what is the difference between persist() and cache()?

2014-04-13 Thread Andrea Esposito
AFAIK cache() is just a shortcut for the persist method with MEMORY_ONLY as the storage level. From the source code of RDD: > /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */ > def persist(): RDD[T] = persist(StorageLevel.MEMORY_ONLY) > > /** Persist this RDD with the de

Re: Efficient Aggregation over DB data

2014-05-01 Thread Andrea Esposito
Hi Sai, I honestly can't figure out where you are using the RDDs (the split method isn't defined on them). In any case, you should use the map function instead of foreach, because foreach is NOT idempotent: some partitions could be recomputed, executing the function multiple times. Wha
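A sketch of the point being made, assuming a SparkContext `sc` (not runnable standalone; `writeToDb` is a hypothetical side-effecting helper): side effects inside foreach can run more than once if a partition is recomputed, so it is safer to transform with map and act once at the driver.

```scala
// Sketch only: assumes a SparkContext `sc`.
val rows = sc.parallelize(Seq("a,1", "b,2"))

// Risky: the side effect may re-run if the partition is recomputed.
// rows.foreach(r => writeToDb(r))   // `writeToDb` is hypothetical

// Safer: pure transformation, then a single action at the driver.
val parsed = rows.map { r =>
  val Array(k, v) = r.split(",")
  (k, v.toInt)
}
val result = parsed.collect() // one action; side effects stay at the driver
```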

Re: Incredible slow iterative computation

2014-05-02 Thread Andrea Esposito
er observation is that in the 4th image some of your RDDs are massive > and some are tiny. > > > On Mon, Apr 14, 2014 at 11:45 AM, Andrea Esposito wrote: > >> Hi all, >> >> i'm developing an iterative computation over graphs but i'm struggling >> with

Re: cache not work as expected for iteration?

2014-05-04 Thread Andrea Esposito
Maybe your memory isn't enough to hold the current RDD together with all the past ones? RDDs that are cached or persisted have to be unpersisted explicitly; there is no auto-unpersist (maybe that will change in the 1.0 version?). Be careful: calling cache() or persist() doesn't imply the RDD will be ma
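The explicit-unpersist pattern for an iterative job can be sketched like this, assuming a SparkContext `sc` (illustrative only, not runnable standalone):

```scala
// Sketch only: assumes a SparkContext `sc`.
var current = sc.parallelize(1 to 1000).cache()
current.count()                       // materialise the initial RDD
for (i <- 1 to 10) {
  val next = current.map(_ + 1).cache()
  next.count()                        // materialise the new RDD first...
  current.unpersist()                 // ...then explicitly drop the old one
  current = next
}
```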

Increase Stack Size Workers

2014-05-05 Thread Andrea Esposito
Hi there, I'm running an iterative algorithm and sometimes end up with a StackOverflowError, no matter whether I checkpoint or not. While I still don't understand why this happens, I figured out that increasing the stack size is a workaround. I'm developing using "local[n]", so the local mode i
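One way to apply that workaround (values and mechanisms are illustrative; `spark.executor.extraJavaOptions` exists from Spark 1.0 onward, while in local mode the driver JVM's own `-Xss` is what matters):

```
# Illustrative only. Executors (Spark 1.0+), e.g. in spark-defaults.conf:
spark.executor.extraJavaOptions  -Xss16m

# Driver / local[n] mode: pass -Xss to the launching JVM, e.g.
#   java -Xss16m -cp ... MyApp
```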

Re: Incredible slow iterative computation

2014-05-05 Thread Andrea Esposito
Update: the checkpointing is not performed. I checked via the "isCheckpointed" method, but it always returns false. ??? 2014-05-05 23:14 GMT+02:00 Andrea Esposito : > Checkpointing doesn't seem to help. I do it at each iteration/superstep. > > Looking deeper, the RDDs are

Re: Incredible slow iterative computation

2014-05-06 Thread Andrea Esposito
Thanks all for helping. Following Earthson's tip, I resolved it. I have to report that if you materialise the RDD and afterwards try to checkpoint it, the operation is not performed. newRdd = oldRdd.map(myFun).persist(myStorageLevel) newRdd.foreach(x => myFunLogic(x)) // Here materialized for other

KryoSerializer Exception

2014-05-06 Thread Andrea Esposito
Hi there, sorry if I'm posting a lot lately. I'm trying to add the KryoSerializer but I receive this exception: 2014-05-06 11:45:23 WARN TaskSetManager:62 - Loss was due to java.io.EOFException java.io.EOFException at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSer

Re: KryoSerializer Exception

2014-05-16 Thread Andrea Esposito
UP, does anyone know anything about it? ^^ 2014-05-06 12:05 GMT+02:00 Andrea Esposito : > Hi there, > > sorry if I'm posting a lot lately. > > I'm trying to add the KryoSerializer but I receive this exception: > 2014-05-06 11:45:23 WARN Tas

Re: KryoSerializer Exception

2014-05-30 Thread Andrea Esposito
Hi, I just migrated to 1.0. I'm still having the same issue, with or without the custom registrator. Merely using the KryoSerializer triggers the exception immediately. I set the Kryo settings through the property: System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoS
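In Spark 1.0 the same settings can also be passed through SparkConf instead of system properties (sketch only; the registrator class name `my.pkg.MyRegistrator` is hypothetical):

```scala
// Sketch only: Spark 1.0-style configuration via SparkConf.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kryo-test")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "my.pkg.MyRegistrator") // hypothetical class
val sc = new SparkContext(conf)
```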