from:"Xuefeng Wu"

Re: If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-19 Thread Xuefeng Wu

--
~Yours, Xuefeng Wu/吴雪峰

Re: Spark stalls or hangs: is this a clue? remote fetches seem to never return?

2015-02-05 Thread Xuefeng Wu

What does the jstack dump show?

Yours, Xuefeng Wu (吴雪峰)

> On 2015-02-06, at 10:20 AM, Michael Albert wrote:
>
> My apologies for following up my own post, but I thought this might be of interest.
>
> I terminated the java process corresponding to executor whic

Re: how to debug this kind of error, e.g. "lost executor"?

2015-02-05 Thread Xuefeng Wu

Can you find the shuffle files? Or were the files deleted by another process?

Yours, Xuefeng Wu (吴雪峰)

> On 2015-02-05, at 11:14 PM, Yifan LI wrote:
>
> Hi,
>
> I am running a heavy memory/cpu overhead graphx application, I think the memory is sufficient and set RDDs’

Re: Shuffle write increases in spark 1.2

2015-02-10 Thread Xuefeng Wu

It looks like this is caused by a different snappy version. If you disable compression or switch to lz4, the size is no different.

Yours, Xuefeng Wu (吴雪峰)

> On 2015-02-10, at 6:13 PM, chris wrote:
>
> Hello,
>
> as the original message from Kevin Jung never got accepted to the mailinglis

Re: how to use SPARK_PUBLIC_DNS

2014-08-10 Thread Xuefeng Wu

There is a docker script for Spark 0.9 in the Spark git repository.

Yours, Xuefeng Wu (吴雪峰)

> On 2014-08-10, at 8:27 PM, 诺铁 wrote:
>
> hi, all,
>
> I am playing with docker, trying to create a spark cluster with docker containers.
>
> since spark master, worker, driver all nee

Re: [scala-user] Why aggregate is inconsistent?

2014-10-30 Thread Xuefeng Wu

4 at 5:39 PM, Xuefeng Wu wrote:
>
>> scala> import scala.collection.GenSeq
>> scala> val seq = GenSeq("This", "is", "an", "example")
>>
>> scala> seq.aggregate("0")(_ + _, _ + _)
>> res0: String = 0Th
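
The quoted `aggregate` call uses "0" as its start value, which is not a neutral element for string concatenation. The following is only a sketch of why that can give different results; it does not call `aggregate` itself. The helper `aggregateInChunks` and the chunk sizes are hypothetical: they imitate a collection that folds each chunk separately from the start value and then combines the partial results.

```scala
object AggregateSketch {
  // Hypothetical helper (not a library API): fold every chunk from `zero`
  // with string concatenation, then concatenate the partial results.
  def aggregateInChunks(words: Seq[String], chunkSize: Int, zero: String): String =
    words.grouped(chunkSize).map(_.foldLeft(zero)(_ + _)).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val words = Seq("This", "is", "an", "example")
    // A single chunk: the start value is used once.
    println(aggregateInChunks(words, 4, "0"))
    // Two chunks: the start value is used once per chunk.
    println(aggregateInChunks(words, 2, "0"))
    // A neutral start value ("") gives the same result for any chunking.
    println(aggregateInChunks(words, 2, ""))
  }
}
```

With a start value that is neutral for the combining operation, the result no longer depends on how the elements are split.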

How take top N of top M from RDD as RDD

2014-12-01 Thread Xuefeng Wu

Scores = for {
  (_, ageScores) <- takeTop(scores, _.age)
  (_, numScores) <- takeTop(ageScores, _.num)
} yield {
  numScores
}

topScores.size

--
~Yours, Xuefeng Wu/吴雪峰

Re: How take top N of top M from RDD as RDD

2014-12-01 Thread Xuefeng Wu

Hi Debasish,

I found test code in map translate; would it collect all products too?

+ val sortedProducts = products.toArray.sorted(ord.reverse)

Yours, Xuefeng Wu (吴雪峰)

> On 2014-12-02, at 1:33 AM, Debasish Das wrote:
>
> rdd.top collects it on master...
>
> If you want top

Re: Alternatives to groupByKey

2014-12-03 Thread Xuefeng Wu

I have a similar requirement: take the top N by key. Right now I use groupByKey, but in some datasets one key would group more than half of the data.

Yours, Xuefeng Wu (吴雪峰)

> On 2014-12-04, at 7:26 AM, Nathan Kronenfeld wrote:
>
> I think it would depend on the type and amount of inform
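
One way to take the top N by key without first grouping every value of a key is to keep only a bounded list per key while folding. The sketch below uses plain Scala collections rather than Spark, and every name in it (`insertBounded`, `mergeBounded`, `topNByKey`, the sample scores) is hypothetical. In Spark, two functions of this shape could in principle be passed as the `seqOp` and `combOp` of `aggregateByKey`; that is an assumption about how one might apply it, not something proposed in this thread.

```scala
object TopNByKeySketch {
  // Add one value to a per-key list, keeping at most n values, largest first.
  def insertBounded(n: Int)(top: List[Int], value: Int): List[Int] =
    (value :: top).sorted(Ordering[Int].reverse).take(n)

  // Merge two partial per-key lists, again keeping at most n values.
  def mergeBounded(n: Int)(a: List[Int], b: List[Int]): List[Int] =
    (a ++ b).sorted(Ordering[Int].reverse).take(n)

  // Local stand-in for a per-key aggregation: no key ever holds more than n values.
  def topNByKey(pairs: Seq[(String, Int)], n: Int): Map[String, List[Int]] =
    pairs.foldLeft(Map.empty[String, List[Int]]) { case (acc, (key, value)) =>
      acc.updated(key, insertBounded(n)(acc.getOrElse(key, Nil), value))
    }

  def main(args: Array[String]): Unit = {
    val scores = Seq("a" -> 3, "a" -> 9, "b" -> 1, "a" -> 5, "a" -> 7, "b" -> 4)
    println(topNByKey(scores, 2))
    println(mergeBounded(2)(List(9, 7), List(8, 1)))
  }
}
```

Because each per-key list never grows beyond N entries, a key that holds more than half of the data does not need all of its values in memory at once.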

Re: Alternatives to groupByKey

2014-12-03 Thread Xuefeng Wu

Looks good. My concern is that foldLeftByKey seems to break consistency with foldLeft in RDD and aggregateByKey in PairRDD.

Yours, Xuefeng Wu (吴雪峰)

> On 2014-12-04, at 7:47 AM, Koert Kuipers wrote:
>
> fold

Re: Is it possible to store graph directly into HDFS?

2014-12-30 Thread Xuefeng Wu

How about saving it as an object?

Yours, Xuefeng Wu (吴雪峰)

> On 2014-12-30, at 9:27 PM, Jason Hong wrote:
>
> Dear all:)
>
> We're trying to make a graph using large input data and get a subgraph applied some filter.
>
> Now, we wanna save this graph to HDFS so that

Re: State of spark docker script

2014-03-09 Thread Xuefeng Wu

Hi Aureliaono,

First, docker is not ready for production unless you know what you are doing and are prepared for some risk. Second, in my opinion, there are so many hard-coded values in the spark docker script that you have to modify it for your goal.

Yours, Xuefeng Wu (吴雪峰)

> On 2014-03-10, at AM 12

Re: What is the difference between map and flatMap

2014-03-12 Thread Xuefeng Wu

atMap and what
> is a good use case for each?
>
> --
> Eran | CTO
>

--
~Yours, Xuefeng Wu/吴雪峰
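
As a small illustration of the question, here is a sketch on plain Scala collections with made-up sample lines (the same distinction applies to RDDs). `map` produces exactly one output element per input element, while `flatMap` lets each input element produce zero or more outputs and flattens them into one collection. The function names are hypothetical.

```scala
object MapVsFlatMapSketch {
  // map: one result per line, so the outcome is a sequence of word lists.
  def wordsPerLine(lines: Seq[String]): Seq[Seq[String]] =
    lines.map(line => line.split(" ").toSeq)

  // flatMap: the per-line word lists are flattened into one sequence of words.
  def allWords(lines: Seq[String]): Seq[String] =
    lines.flatMap(line => line.split(" ").toSeq)

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello world", "hi")
    println(wordsPerLine(lines))
    println(allWords(lines))
  }
}
```

A typical use of `map` is a one-to-one transformation such as parsing each line into a record; a typical use of `flatMap` is splitting lines into words, where the number of outputs per input varies.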


13 matches
