Re: If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-19 Thread Xuefeng Wu
-- ~Yours, Xuefeng Wu/吴雪峰 敬上

Re: Spark stalls or hangs: is this a clue? remote fetches seem to never return?

2015-02-05 Thread Xuefeng Wu
What does the jstack thread dump show? Yours, Xuefeng Wu 吴雪峰 敬上 > On Feb 6, 2015, at 10:20 AM, Michael Albert wrote: > > My apologies for following up my own post, but I thought this might be of > interest. > > I terminated the java process corresponding to executor whic

Re: how to debug this kind of error, e.g. "lost executor"?

2015-02-05 Thread Xuefeng Wu
Could you find the shuffle files? Or were the files deleted by another process? Yours, Xuefeng Wu 吴雪峰 敬上 > On Feb 5, 2015, at 11:14 PM, Yifan LI wrote: > > Hi, > > I am running a heavy memory/cpu overhead graphx application, I think the > memory is sufficient and set RDDs’

Re: Shuffle write increases in spark 1.2

2015-02-10 Thread Xuefeng Wu
It looks like this is caused by a different snappy version; if you disable compression or switch to lz4, the size is no different. Yours, Xuefeng Wu 吴雪峰 敬上 > On Feb 10, 2015, at 6:13 PM, chris wrote: > > Hello, > > as the original message from Kevin Jung never got accepted to the > mailinglis

Re: how to use SPARK_PUBLIC_DNS

2014-08-10 Thread Xuefeng Wu
There is a Docker script for Spark 0.9 in the Spark git repository. Yours, Xuefeng Wu 吴雪峰 敬上 > On Aug 10, 2014, at 8:27 PM, 诺铁 wrote: > > hi, all, > > I am playing with docker, trying to create a spark cluster with docker > containers. > > since spark master, worker, driver all nee

Re: [scala-user] Why aggregate is inconsistent?

2014-10-30 Thread Xuefeng Wu
4 at 5:39 PM, Xuefeng Wu wrote: > >> scala> import scala.collection.GenSeq >> scala> val seq = GenSeq("This", "is", "an", "example") >> >> scala> seq.aggregate("0")(_ + _, _ + _) >> res0: String = 0Th
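The snippet above is cut off, so the following is only a sketch of one common explanation, assuming the question is about a start value ("0") that is not neutral for string concatenation. When a collection is processed in chunks, the start value is applied once per chunk, so the result depends on how the data is split. The helper `aggregateInChunks` and the object name are hypothetical and written with plain Scala collections; they are not part of the Scala library or of the original thread.

```scala
object AggregateDemo {
  val words: List[String] = List("This", "is", "an", "example")

  // Hypothetical helper: fold every chunk starting from `zero` (the seqop step),
  // then concatenate the per-chunk results (the combop step).
  def aggregateInChunks(chunks: List[List[String]], zero: String): String =
    chunks.map(chunk => chunk.foldLeft(zero)(_ + _)).reduce(_ + _)

  // One chunk: the start value appears exactly once.
  val oneChunk: String = aggregateInChunks(List(words), "0")
  // "0Thisisanexample"

  // Two chunks: the start value appears once per chunk.
  val twoChunks: String = aggregateInChunks(words.grouped(2).toList, "0")
  // "0Thisis0anexample"

  // With a neutral start value the chunking no longer matters.
  val neutralZero: String = aggregateInChunks(words.grouped(2).toList, "")
  // "Thisisanexample"
}
```

Under this assumption, the result is only stable when the start value is a neutral element for both operators, such as the empty string for concatenation.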

How take top N of top M from RDD as RDD

2014-12-01 Thread Xuefeng Wu
Scores = for { (_, ageScores) <- takeTop(scores, _.age) (_, numScores) <- takeTop(ageScores, _.num) } yield { numScores } topScores.size -- ~Yours, Xuefeng Wu/吴雪峰 敬上

Re: How take top N of top M from RDD as RDD

2014-12-01 Thread Xuefeng Wu
Hi Debasish, I found test code in map translate; would it collect all products too? + val sortedProducts = products.toArray.sorted(ord.reverse) Yours, Xuefeng Wu 吴雪峰 敬上 > On Dec 2, 2014, at 1:33 AM, Debasish Das wrote: > > rdd.top collects it on master... > > If you want top

Re: Alternatives to groupByKey

2014-12-03 Thread Xuefeng Wu
I have a similar requirement: take the top N by key. Right now I use groupByKey, but in some datasets one key groups more than half of the data. Yours, Xuefeng Wu 吴雪峰 敬上 > On Dec 4, 2014, at 7:26 AM, Nathan Kronenfeld wrote: > > I think it would depend on the type and amount of inform
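One possible way to take the top N per key without materializing every value of a key is to keep a bounded buffer per key and merge buffers pairwise. The sketch below uses plain Scala collections so that it runs without a cluster; `addValue`, `mergeTops`, and the sample records are hypothetical names and data for illustration. The same pair of functions has the shape of the `seqOp` and `combOp` arguments that Spark's `aggregateByKey` expects, but whether that fits the workload in this thread is an assumption.

```scala
object TopNByKeyDemo {
  // Add one value to a per-key buffer, keeping at most n largest values (descending).
  def addValue(n: Int)(top: List[Int], v: Int): List[Int] =
    (v :: top).sorted(Ordering[Int].reverse).take(n)

  // Merge two per-key buffers, again keeping at most n largest values (descending).
  def mergeTops(n: Int)(a: List[Int], b: List[Int]): List[Int] =
    (a ++ b).sorted(Ordering[Int].reverse).take(n)

  // Hypothetical sample data: (key, score) pairs with a skewed key "a".
  val records: List[(String, Int)] =
    List("a" -> 5, "b" -> 1, "a" -> 9, "a" -> 2, "b" -> 7, "a" -> 4)

  // Single pass over the records; each key never holds more than 2 values.
  val topTwoByKey: Map[String, List[Int]] =
    records.foldLeft(Map.empty[String, List[Int]]) { case (acc, (k, v)) =>
      acc.updated(k, addValue(2)(acc.getOrElse(k, Nil), v))
    }
  // Map("a" -> List(9, 5), "b" -> List(7, 1))
}
```

Because each buffer is capped at N entries, a key that owns most of the data costs no more memory than any other key, which is the concern raised about groupByKey above.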

Re: Alternatives to groupByKey

2014-12-03 Thread Xuefeng Wu
Looks good. I am concerned about foldLeftByKey, which seems to break consistency with foldLeft in RDD and aggregateByKey in PairRDD. Yours, Xuefeng Wu 吴雪峰 敬上 > On Dec 4, 2014, at 7:47 AM, Koert Kuipers wrote: > > fold

Re: Is it possible to store graph directly into HDFS?

2014-12-30 Thread Xuefeng Wu
How about saving it as an object? Yours, Xuefeng Wu 吴雪峰 敬上 > On Dec 30, 2014, at 9:27 PM, Jason Hong wrote: > > Dear all:) > > We're trying to make a graph using large input data and get a subgraph > applied some filter. > > Now, we wanna save this graph to HDFS so that

Re: State of spark docker script

2014-03-09 Thread Xuefeng Wu
Hi Aureliaono, First, Docker is not ready for production unless you know what you are doing and are prepared for some risk. Second, in my opinion, there is so much hard-coded configuration in the Spark Docker script that you will have to modify it for your purposes. Yours, Xuefeng Wu 吴雪峰 敬上 > On Mar 10, 2014, at 12 AM

Re: What is the difference between map and flatMap

2014-03-12 Thread Xuefeng Wu
atMap and what > is a good use case for each? > > -- > Eran | CTO > -- ~Yours, Xuefeng Wu/吴雪峰 敬上
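As a minimal illustration of the question above, the sketch below contrasts `map` and `flatMap` on a plain Scala `List`; the sample lines and the object name are made up for this example. `map` produces exactly one output element per input element, while `flatMap` lets each input produce zero or more elements and flattens them into one collection. Spark RDDs offer methods with the same names and the same basic behavior.

```scala
object MapVsFlatMapDemo {
  // Hypothetical sample input: two lines of text.
  val lines: List[String] = List("hello world", "spark")

  // map: one result per line, so the words stay nested per line.
  val wordsPerLine: List[List[String]] = lines.map(line => line.split(" ").toList)
  // List(List("hello", "world"), List("spark"))

  // flatMap: the per-line word lists are flattened into a single list.
  val allWords: List[String] = lines.flatMap(line => line.split(" ").toList)
  // List("hello", "world", "spark")
}
```

A typical use of `map` is a one-to-one transformation such as parsing each line into a record; `flatMap` suits one-to-many cases such as splitting lines into words.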