PySpark and py4j: NoClassDefFoundError when upgrading a jar

2018-12-31 Thread Alessandro Liparoti
We developed a Scala library called FV that runs on Spark. We also built Python wrappers for its public API using py4j, as Spark itself does. For example, the main object is instantiated like this: self._java_obj = self._new_java_obj("com.example.FV", self.uid), and the methods on the object are called in th…

Trigger full GC during executor idle time?

2018-12-31 Thread Sean Owen
https://github.com/apache/spark/pull/23401 Interesting PR; I thought it was not worthwhile until I saw a paper claiming this can speed things up to the tune of 2-6%. Has anyone considered this before? Sean
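For readers new to the idea under discussion, here is a minimal sketch of the general mechanism. It uses Python's own gc module rather than the JVM, so it is an analogue, not the PR's code. The sketch counts running tasks. Once the worker has been idle for a threshold, it runs one full collection for that idle period. The class name IdleGcTrigger, the threshold parameter and the heartbeat-style maybe_collect call are all hypothetical and are not taken from PR 23401. In a JVM executor the collection step would instead be a request such as System.gc().

```python
import gc
import threading
import time
from typing import Callable, Optional


class IdleGcTrigger:
    """Hypothetical sketch: run one full collection once no tasks have been
    running for at least `idle_threshold_s` seconds."""

    def __init__(self, idle_threshold_s: float,
                 collect: Callable[[], object] = gc.collect) -> None:
        self.idle_threshold_s = idle_threshold_s
        self._collect = collect  # gc.collect() with no args is a full collection
        self._lock = threading.Lock()
        self._active_tasks = 0
        self._idle_since = time.monotonic()
        self._collected_this_idle_period = False

    def task_started(self) -> None:
        with self._lock:
            self._active_tasks += 1

    def task_finished(self) -> None:
        with self._lock:
            self._active_tasks -= 1
            if self._active_tasks == 0:
                # A new idle period begins; allow one more collection.
                self._idle_since = time.monotonic()
                self._collected_this_idle_period = False

    def maybe_collect(self, now: Optional[float] = None) -> bool:
        """Call periodically (e.g. from a heartbeat). Returns True if it collected."""
        now = time.monotonic() if now is None else now
        with self._lock:
            idle = self._active_tasks == 0
            long_enough = now - self._idle_since >= self.idle_threshold_s
            if not (idle and long_enough) or self._collected_this_idle_period:
                return False
            self._collected_this_idle_period = True
        # Collect outside the lock so task bookkeeping is never blocked by GC.
        self._collect()
        return True
```

The per-idle-period flag stops a long idle stretch from triggering repeated full collections. Whether such a collection actually helps later tasks is the open question the thread goes on to debate.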

Re: Trigger full GC during executor idle time?

2018-12-31 Thread Reynold Xin
Not sure how reputable or representative that paper is...

Re: Trigger full GC during executor idle time?

2018-12-31 Thread Ryan Blue
After a quick look, I don't think that the paper's evaluation is very thorough. I don't see where it discusses what the PageRank implementation is doing in terms of object allocation or whether data is cached between iterations…

Re: Trigger full GC during executor idle time?

2018-12-31 Thread Holden Karau
Maybe it would make sense to loop in the paper authors? I imagine they might have more information than ended up in the paper.