>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Partition-sorting-by-Spark-framework-td18213.html
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Alternatives-to-groupByKey-td20293.html
>>>
>>> And this Jira seems relevant too:
>>> https://issues.apache.org/jira/browse/SPARK-3655
>>>
>>> The amount of memory I'm using is 2g per executor, and I can't go
>>> higher than that because each executor gets a YARN container from nodes
>>> that have 16 GB of RAM and allow 5 YARN containers per node.
>>>
>>> So I'd like to know if there's an easy way to execute my logic on
>>> my full dataset in Spark.
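The threads linked above (and SPARK-3655) all point at the same idea: instead of grouping all of a key's values in memory, sort by a composite (key, secondary) key so each key's records arrive contiguous and ordered, and process them in a single pass. A minimal plain-Scala sketch of that idea, with made-up data; in Spark itself this would typically be `repartitionAndSortWithinPartitions` with a `Partitioner` that hashes on the key alone:

```scala
// Sketch of the "secondary sort" idea behind SPARK-3655, using plain
// Scala collections (data and names are illustrative). Sorting on the
// composite (key, timestamp) key makes each key's records contiguous
// and time-ordered, so a single pass can handle each key's stream
// without a groupByKey holding all of a key's values in memory.
object SecondarySortSketch {
  // (key, timestamp, payload) records, unordered
  val records = Seq(
    ("user1", 3, "c"), ("user2", 1, "x"),
    ("user1", 1, "a"), ("user1", 2, "b"), ("user2", 2, "y")
  )

  val sorted: Seq[(String, Int, String)] =
    records.sortBy { case (k, t, _) => (k, t) }

  // After the composite sort, one key's payloads come out in time order.
  def payloadsFor(key: String): Seq[String] =
    sorted.collect { case (`key`, _, p) => p }
}
```

The same shape carries over to Spark: partition on the key only, sort within partitions on (key, secondary), then stream through each partition with `mapPartitions`.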
>>>
>>> Thanks!
>>>
>>> -- Elango
>>>
>>
>>
>
--
Alexis GILLAIN
> 15/09/25 19:07:12 ERROR yarn.ApplicationMaster: Failed to connect to
> driver at 127.0.0.1:35706, retrying ...
> 15/09/25 19:07:12 ERROR yarn.ApplicationMaster: Failed to connect to
> driver at 127.0.0.1:35706, retrying ...
> 15/09/25 19:07:12 ERROR yarn.ApplicationMaster: Failed to connect to
> driver at 127.0.0.1:35706, retrying ...
>
> I would sincerely appreciate your kind help!
> Zhiliang
>
>
>
>
>
>
>
ext.scala:1893)
>> org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
>> org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
>>
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>> org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>> org.apache.spark.rdd.RDD.filter(RDD.scala:310)
>> cmd6$$user$$anonfun$3.apply(Main.scala:134)
>> cmd6$$user$$anonfun$3.apply(Main.scala:133)
>>
>> Thanks,
>> Balaji
>>
>
ory space), GC does not run; therefore the finalize() methods for
> the intermediate RDDs are not triggered.
>
>
> 2. System.gc() is only executed on the driver, not on the workers (is that
> how it works?)
>
> Any suggestions?
>
> Kind regards
> Ali Hadian
>
>
e data from the previous iteration. Anyway, why does it keep
> the intermediate data for ALL previous iterations?
> How can we force Spark to clear this intermediate data *during* the
> execution of the job?
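Rather than relying on GC and `finalize()` to reclaim old intermediate RDDs, the usual pattern is to unpersist the previous iteration's RDD explicitly once the new one is materialized (in Spark: `next.cache()`, an action such as `next.count()`, then `prev.unpersist()`). A plain-Scala model of that bookkeeping, with illustrative names:

```scala
// Plain-Scala model of the "unpersist the previous iteration" pattern
// (names are illustrative; the mutable set stands in for the block
// manager's cache). In Spark the calls would be next.cache();
// next.count(); prev.unpersist() -- releasing old intermediate data
// eagerly instead of waiting for GC to drop old RDD references.
import scala.collection.mutable

object UnpersistLoop {
  val cached = mutable.Set.empty[Int]

  def run(iterations: Int): Int = {
    var prev = 0
    cached += prev
    for (i <- 1 to iterations) {
      val next = i
      cached += next   // cache() + an action to materialize the new RDD
      cached -= prev   // unpersist() the previous iteration's data
      prev = next
    }
    cached.size        // only the latest iteration remains cached
  }
}
```

With this pattern at most two iterations' worth of data is cached at any moment, regardless of how many iterations run.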
>
> Kind regards,
> Ali hadian
>
>
How do I set a reasonable number of partitions, if it need not be equal to
> the number of keys?
>
> On September 15, 2015, at 3:41 PM, Alexis Gillain wrote:
>
> Sorry, I made a typo in my previous message: you can't
> sortByKey(yourkey, date) and have all records of your keys in the same
Function
> do nothing!
>
> I tried to use the second “numPartitions” parameter and passed the number
> of keys to it. But the number of keys is so large that all the tasks were killed.
>
>
> What should I do with this case ?
>
> I'm asking for advice online...
>
> Thank you.
>
:00 Yanbo Liang :
> LogisticRegression in the MLlib (not ML) package supports both multiclass
> and multilabel classification.
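Whatever the library offers directly, multilabel classification is commonly reduced to one binary (one-vs-rest) problem per label. A toy plain-Scala sketch of that reduction; the per-label "model" here is just a threshold on a single feature, and all names and data are illustrative (in practice each binary problem would be trained with something like `LogisticRegressionWithLBFGS`):

```scala
// Sketch of reducing multilabel classification to one binary problem
// per label (one-vs-rest). The "classifier" per label is a trivial
// threshold: predict the label when the feature exceeds the mean
// feature value of that label's positive examples. Illustrative only.
object OneVsRestSketch {
  // training data: (feature, set of labels attached to the example)
  val data: Seq[(Double, Set[String])] = Seq(
    (0.1, Set("spam")), (0.9, Set("urgent")),
    (0.8, Set("urgent", "spam")), (0.2, Set.empty)
  )

  val labels: Set[String] = data.flatMap(_._2).toSet

  // "Train" one binary model per label.
  val thresholds: Map[String, Double] = labels.map { l =>
    val pos = data.collect { case (x, ls) if ls(l) => x }
    l -> pos.sum / pos.size
  }.toMap

  // Multilabel prediction: every label whose binary model fires.
  def predict(x: Double): Set[String] =
    thresholds.collect { case (l, t) if x >= t => l }.toSet
}
```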
>
>
> 2015-09-11 16:21 GMT+08:00 Alexis Gillain :
>
>> You can try these packages for adaboost.mh :
>>
>> https://github.com/BaiGang/spark_m
/docs/latest/mllib-classification-regression.html,
> it is not what I mean. Is there a way to use multilabel classification?
> Thanks a lot.
>
> Best,
> yasemin
>
> --
> hiç ender hiç
>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Filtering-an-rdd-depending-upon-a-list-of-values-in-Spark-tp24631.html
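On the topic of that thread (filtering an RDD by a list of values): the usual approach is to put the allowed values in a `Set`, since membership lookup is O(1) while `list.contains` rescans the list on every record. A plain-Scala sketch with illustrative data; in Spark the set would typically be broadcast (`sc.broadcast(allowed)`) and referenced inside `rdd.filter`:

```scala
// Filtering records by a list of allowed values: use a Set for O(1)
// membership checks instead of scanning a List per record. In Spark,
// broadcast the Set and use it inside rdd.filter. Illustrative names.
object FilterByValues {
  val allowed: Set[String] = Set("a", "c")

  def keep(records: Seq[(String, Int)]): Seq[(String, Int)] =
    records.filter { case (k, _) => allowed(k) }
}
```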
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>>
>
d
> line browser to look at the webui (I cannot access the server in graphical
> display mode), this should help me understand what's going on. I will also
> try the workarounds mentioned in the link. Keep you posted.
>
> Again, thanks a lot!
>
> Best,
>
> Aurelien
t; Cloudera Manager) *besides* the checkpoint files (which are regular HDFS
> files), and the application eventually runs out of disk space. The same is
> true even if I checkpoint at every iteration.
>
> What am I doing wrong? Maybe some garbage collector setting?
>
> Thanks a lot for
GMT+08:00 Feynman Liang :
> CCing the mailing list again.
>
> It's currently not on the radar. Do you have a use case for it? I can
> bring it up during 1.6 roadmap planning tomorrow.
>
> On Mon, Aug 24, 2015 at 8:28 PM, alexis GILLAIN
> wrote:
>
>> Hi,
>>
Hi Aurelien,
The first code should create a new RDD in memory at each iteration (check
the webui).
The second code will unpersist the RDD but that's not the main problem.
I think you have trouble due to long lineage, as .cache() keeps track of
lineage for recovery.
You should have a look at checkpointing.
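The reason checkpointing helps can be seen in a small plain-Scala model (names illustrative): every transformation keeps a reference to its parent RDD for recovery, so lineage depth grows with each iteration, while a checkpoint (in Spark: `sc.setCheckpointDir(...)` then `rdd.checkpoint()` plus an action) persists the data and cuts the parent chain:

```scala
// Plain-Scala model of lineage growth and checkpoint truncation
// (illustrative; a real RDD keeps a reference to its parent, and
// rdd.checkpoint() replaces the chain with data on reliable storage).
object LineageModel {
  final case class Rdd(depth: Int)            // depth = lineage length

  def iterate(r: Rdd): Rdd = Rdd(r.depth + 1) // one transformation
  def checkpoint(r: Rdd): Rdd = Rdd(0)        // chain truncated

  def run(iterations: Int, checkpointEvery: Int): Int = {
    var rdd = Rdd(0)
    for (i <- 1 to iterations) {
      rdd = iterate(rdd)
      if (i % checkpointEvery == 0) rdd = checkpoint(rdd)
    }
    rdd.depth
  }
}
```

Without the periodic checkpoint, the final depth would equal the iteration count, which is what makes long iterative jobs blow up on lineage.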
park-user-list.1001560.n3.nabble.com/Memory-efficient-successive-calls-to-repartition-tp24358.html
>
>
>
I want to use PrefixSpan, so I had a look at the code and the cited paper:
"Distributed PrefixSpan Algorithm Based on MapReduce".
There is a result in the paper I didn't really understand, and I couldn't
find where it is used in the code.
Suppose a sequence database S = {1,2...n}, a sequence
d Algorithm for Sequential Pattern Mining Based on PrefixSpan [J].
Computer Engineering, 2009, 35(23): 56-61" but it didn't help me to
understand it and how it can improve the algorithm.
I haven't registered my class with Kryo, but I don't think it would have such
an impact on the stack size.
I'm thinking of using GraphX, and I'm wondering how it serializes the graph
object, as it can use Kryo as the serializer.
2015-03-14 6:22 GMT+01:00 Ted Yu :
> Have you registered your class with kryo ?
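For reference, registering classes with Kryo is a configuration-time step. A hedged sketch of the usual SparkConf setup; `MyVertexProperty` and `MyEdgeProperty` are illustrative placeholders for your own classes:

```scala
// Configuration fragment: enable Kryo and register classes in Spark.
// MyVertexProperty / MyEdgeProperty stand in for your own types.
// Unregistered classes are serialized with their full class name
// (costing space), and spark.kryo.registrationRequired=true makes any
// missing registration fail fast instead of silently falling back.
class MyVertexProperty
class MyEdgeProperty

val conf = new org.apache.spark.SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  .registerKryoClasses(Array(classOf[MyVertexProperty], classOf[MyEdgeProperty]))
```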