>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Partition-sorting-by-Spark-framework-td18213.html
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Alternatives-to-groupByKey-td20293.html
>>>
>>> And this Jira seems relevant too:
>>> https://issues.apache.org/jira/browse/SPARK-3655
>>>
>>> The amount of memory I'm using is 2g per executor, and I can't go
>>> higher than that because each executor gets a YARN container from nodes
>>> that have 16 GB of RAM and allow 5 YARN containers per node.
>>>
>>> So I'd like to know if there's an easy way to execute my logic on
>>> my full dataset in Spark.
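The threads linked above (and SPARK-3655) all point at the same idea: instead of grouping all of a key's values in memory, sort by a composite (key, secondary) key so each key's records arrive contiguous and ordered, and process them in a single pass. A minimal plain-Scala sketch of that idea, with made-up data; in Spark itself this would typically be `repartitionAndSortWithinPartitions` with a `Partitioner` that hashes on the key alone:

```scala
// Sketch of the "secondary sort" idea behind SPARK-3655, using plain
// Scala collections (data and names are illustrative). Sorting on the
// composite (key, timestamp) key makes each key's records contiguous
// and time-ordered, so a single pass can handle each key's stream
// without a groupByKey holding all of a key's values in memory.
object SecondarySortSketch {
  // (key, timestamp, payload) records, unordered
  val records = Seq(
    ("user1", 3, "c"), ("user2", 1, "x"),
    ("user1", 1, "a"), ("user1", 2, "b"), ("user2", 2, "y")
  )

  val sorted: Seq[(String, Int, String)] =
    records.sortBy { case (k, t, _) => (k, t) }

  // After the composite sort, one key's payloads come out in time order.
  def payloadsFor(key: String): Seq[String] =
    sorted.collect { case (`key`, _, p) => p }
}
```

The same shape carries over to Spark: partition on the key only, sort within partitions on (key, secondary), then stream through each partition with `mapPartitions`.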
>>>
>>> Thanks!
>>>
>>> -- Elango
>>>
>>
>>
>
--
Alexis GILLAIN
> 15/09/25 19:07:12 ERROR yarn.ApplicationMaster: Failed to connect to
> driver at 127.0.0.1:35706, retrying ...
> 15/09/25 19:07:12 ERROR yarn.ApplicationMaster: Failed to connect to
> driver at 127.0.0.1:35706, retrying ...
> 15/09/25 19:07:12 ERROR yarn.ApplicationMaster: Failed to connect to
> driver at 127.0.0.1:35706, retrying ...
>
> I would sincerely appreciate your kind help!
> Zhiliang
>
>
>
>
>
>
>
ext.scala:1893)
>> org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
>> org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
>>
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>> org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>> org.apache.spark.rdd.RDD.filter(RDD.scala:310)
>> cmd6$$user$$anonfun$3.apply(Main.scala:134)
>> cmd6$$user$$anonfun$3.apply(Main.scala:133)
>>
>> Thanks,
>> Balaji
>>
>
ory space), GC does not run; therefore the finalize() methods for
> the intermediate RDDs are not triggered.
>
>
> 2. System.gc() is only executed on the driver, not on the workers (is that
> how it works?)
>
> Any suggestions?
>
> Kind regards
> Ali Hadian
>
>
e data from the previous iteration. Anyway, why does it keep
> the intermediate data for ALL previous iterations?
> How can we force Spark to clear this intermediate data *during* the
> execution of the job?
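Rather than relying on GC and `finalize()` to reclaim old intermediate RDDs, the usual pattern is to unpersist the previous iteration's RDD explicitly once the new one is materialized (in Spark: `next.cache()`, an action such as `next.count()`, then `prev.unpersist()`). A plain-Scala model of that bookkeeping, with illustrative names:

```scala
// Plain-Scala model of the "unpersist the previous iteration" pattern
// (names are illustrative; the mutable set stands in for the block
// manager's cache). In Spark the calls would be next.cache();
// next.count(); prev.unpersist() -- releasing old intermediate data
// eagerly instead of waiting for GC to drop old RDD references.
import scala.collection.mutable

object UnpersistLoop {
  val cached = mutable.Set.empty[Int]

  def run(iterations: Int): Int = {
    var prev = 0
    cached += prev
    for (i <- 1 to iterations) {
      val next = i
      cached += next   // cache() + an action to materialize the new RDD
      cached -= prev   // unpersist() the previous iteration's data
      prev = next
    }
    cached.size        // only the latest iteration remains cached
  }
}
```

With this pattern at most two iterations' worth of data is cached at any moment, regardless of how many iterations run.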
>
> Kind regards,
> Ali hadian
>
>
How do I set a reasonable number of partitions, if it need not be equal to
> the number of keys?
>
> On September 15, 2015, at 3:41 PM, Alexis Gillain wrote:
>
> Sorry, I made a typo in my previous message: you can't
> sortByKey(yourkey, date) and have all records of your keys in the same
Function
> do nothing!
>
> I tried to use the second “numPartitions” parameter and passed the number
> of keys to it. But the number of keys is so large that all the tasks were killed.
>
>
> What should I do with this case ?
>
> I'm asking for advice online...
>
> Thank you.
>
:00 Yanbo Liang :
> LogisticRegression in the MLlib (not ML) package supports both multiclass
> and multilabel classification.
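Whatever the library offers directly, multilabel classification is commonly reduced to one binary (one-vs-rest) problem per label. A toy plain-Scala sketch of that reduction; the per-label "model" here is just a threshold on a single feature, and all names and data are illustrative (in practice each binary problem would be trained with something like `LogisticRegressionWithLBFGS`):

```scala
// Sketch of reducing multilabel classification to one binary problem
// per label (one-vs-rest). The "classifier" per label is a trivial
// threshold: predict the label when the feature exceeds the mean
// feature value of that label's positive examples. Illustrative only.
object OneVsRestSketch {
  // training data: (feature, set of labels attached to the example)
  val data: Seq[(Double, Set[String])] = Seq(
    (0.1, Set("spam")), (0.9, Set("urgent")),
    (0.8, Set("urgent", "spam")), (0.2, Set.empty)
  )

  val labels: Set[String] = data.flatMap(_._2).toSet

  // "Train" one binary model per label.
  val thresholds: Map[String, Double] = labels.map { l =>
    val pos = data.collect { case (x, ls) if ls(l) => x }
    l -> pos.sum / pos.size
  }.toMap

  // Multilabel prediction: every label whose binary model fires.
  def predict(x: Double): Set[String] =
    thresholds.collect { case (l, t) if x >= t => l }.toSet
}
```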
>
>
> 2015-09-11 16:21 GMT+08:00 Alexis Gillain :
>
>> You can try these packages for adaboost.mh :
>>
>> https://github.com/BaiGang/spark_m
/docs/latest/mllib-classification-regression.html,
> it is not what I mean. Is there a way to use multilabel classification?
> Thanks a lot.
>
> Best,
> yasemin
>
> --
> hiç ender hiç
>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Filtering-an-rdd-depending-upon-a-list-of-values-in-Spark-tp24631.html
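On the topic of that thread (filtering an RDD by a list of values): the usual approach is to put the allowed values in a `Set`, since membership lookup is O(1) while `list.contains` rescans the list on every record. A plain-Scala sketch with illustrative data; in Spark the set would typically be broadcast (`sc.broadcast(allowed)`) and referenced inside `rdd.filter`:

```scala
// Filtering records by a list of allowed values: use a Set for O(1)
// membership checks instead of scanning a List per record. In Spark,
// broadcast the Set and use it inside rdd.filter. Illustrative names.
object FilterByValues {
  val allowed: Set[String] = Set("a", "c")

  def keep(records: Seq[(String, Int)]): Seq[(String, Int)] =
    records.filter { case (k, _) => allowed(k) }
}
```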
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>>
>
d
> line browser to look at the webui (I cannot access the server in graphical
> display mode), this should help me understand what's going on. I will also
> try the workarounds mentioned in the link. Keep you posted.
>
> Again, thanks a lot!
>
> Best,
>
> Aurelien
t; Cloudera Manager) *besides* the checkpoint files (which are regular HDFS
> files), and the application eventually runs out of disk space. The same is
> true even if I checkpoint at every iteration.
>
> What am I doing wrong? Maybe some garbage collector setting?
>
> Thanks a lot for
GMT+08:00 Feynman Liang :
> CCing the mailing list again.
>
> It's currently not on the radar. Do you have a use case for it? I can
> bring it up during 1.6 roadmap planning tomorrow.
>
> On Mon, Aug 24, 2015 at 8:28 PM, alexis GILLAIN
> wrote:
>
>> Hi,
>>
Hi Aurelien,
The first code should create a new RDD in memory at each iteration (check
the webui).
The second code will unpersist the RDD but that's not the main problem.
I think you have trouble due to long lineage, as .cache() keeps track of
lineage for recovery.
You should have a look at checkpointing.
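The reason checkpointing helps can be seen in a small plain-Scala model (names illustrative): every transformation keeps a reference to its parent RDD for recovery, so lineage depth grows with each iteration, while a checkpoint (in Spark: `sc.setCheckpointDir(...)` then `rdd.checkpoint()` plus an action) persists the data and cuts the parent chain:

```scala
// Plain-Scala model of lineage growth and checkpoint truncation
// (illustrative; a real RDD keeps a reference to its parent, and
// rdd.checkpoint() replaces the chain with data on reliable storage).
object LineageModel {
  final case class Rdd(depth: Int)            // depth = lineage length

  def iterate(r: Rdd): Rdd = Rdd(r.depth + 1) // one transformation
  def checkpoint(r: Rdd): Rdd = Rdd(0)        // chain truncated

  def run(iterations: Int, checkpointEvery: Int): Int = {
    var rdd = Rdd(0)
    for (i <- 1 to iterations) {
      rdd = iterate(rdd)
      if (i % checkpointEvery == 0) rdd = checkpoint(rdd)
    }
    rdd.depth
  }
}
```

Without the periodic checkpoint, the final depth would equal the iteration count, which is what makes long iterative jobs blow up on lineage.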
park-user-list.1001560.n3.nabble.com/Memory-efficient-successive-calls-to-repartition-tp24358.html
>
>
>
I want to use PrefixSpan, so I had a look at the code and the cited paper:
"Distributed PrefixSpan Algorithm Based on MapReduce".
There is a result in the paper I didn't really understand, and I couldn't
find where it is used in the code.
Suppose a sequence database S = {1,2...n}, a sequence
d Algorithm for Sequential Pattern Mining Based on PrefixSpan [J].
Computer Engineering, 2009, 35(23): 56-61" but it didn't help me to
understand it and how it can improve the algorithm.
I haven't registered my class with Kryo, but I don't think it would have such
an impact on the stack size.
I'm thinking of using GraphX, and I'm wondering how it serializes the graph
object, as it can use Kryo as the serializer.
2015-03-14 6:22 GMT+01:00 Ted Yu :
> Have you registered your class with kryo ?
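For reference, registering classes with Kryo is a configuration-time step. A hedged sketch of the usual SparkConf setup; `MyVertexProperty` and `MyEdgeProperty` are illustrative placeholders for your own classes:

```scala
// Configuration fragment: enable Kryo and register classes in Spark.
// MyVertexProperty / MyEdgeProperty stand in for your own types.
// Unregistered classes are serialized with their full class name
// (costing space), and spark.kryo.registrationRequired=true makes any
// missing registration fail fast instead of silently falling back.
class MyVertexProperty
class MyEdgeProperty

val conf = new org.apache.spark.SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  .registerKryoClasses(Array(classOf[MyVertexProperty], classOf[MyEdgeProperty]))
```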