hello, I'm using Spark 1.4.2-SNAPSHOT.
I'm running in YARN mode :-)
I wonder whether spark.shuffle.memoryFraction and spark.shuffle.manager actually take effect there,
and how to set these parameters…
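Both keys are ordinary Spark 1.x configuration entries, so they should be settable like any other conf value; a minimal sketch, assuming you build the SparkConf yourself (the same keys can also be passed to spark-submit with --conf):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: set the shuffle keys before the SparkContext is created;
// they are read at startup, so setting them afterwards has no effect.
val conf = new SparkConf()
  .setAppName("shuffle-conf-sketch")
  .set("spark.shuffle.manager", "sort")        // "sort" or "hash" in Spark 1.x
  .set("spark.shuffle.memoryFraction", "0.3")  // fraction of heap used for shuffle aggregation
val sc = new SparkContext(conf)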
> On Jul 1, 2015, at 1:32 AM, Ted Yu wrote:
>
> Which Spark release are you using ?
>
> Are you running in standalone mode ?
>
> Ch
for these data points. What then?
>
> Also, would you care to bring this to the user@ list? It's kind of interesting.
>
> On Thu, Feb 26, 2015 at 2:02 PM, lisendong wrote:
>> I set the score of the '0'-interaction user-item pairs to 0.0;
>> the code is as follows:
>
I'm using ALS with Spark 1.0.0; the code should be:
https://github.com/apache/spark/blob/branch-1.0/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
I think the following two methods should produce the same (or nearly the same) result:
MatrixFactorizationModel model = ALS.train(ratings.r
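A sketch of the two calls I have in mind (assuming the comparison is explicit ALS.train on the zero-filled ratings versus ALS.trainImplicit on the raw interactions; all hyperparameter values are placeholders):

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

def compare(ratings: RDD[Rating]): Unit = {
  // Explicit ALS: a 0.0 score is treated as a real rating of zero.
  val explicitModel = ALS.train(ratings, 10, 10, 0.01)
  // Implicit ALS: scores are confidence weights; unobserved pairs are
  // treated as zero preference with low confidence.
  val implicitModel = ALS.trainImplicit(ratings, 10, 10, 0.01, 1.0)
}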
I'm using Spark ALS.
I set the iteration number to 30,
and in each iteration the tasks produce nearly 1 TB of shuffle write.
To my surprise, this shuffle data is not cleaned up until the whole job
finishes, which means I need 30 TB of disk to store the shuffle data.
I think the shuffle data should be removable after each iteration,
but in ALS, I guess each iteration's RDDs are referenced by the next
iteration's RDD, so none of the shuffle data is deleted until the ALS job
finishes…
I guess checkpointing could solve my problem; do you know how checkpointing works?
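A minimal sketch of RDD checkpointing as I understand it (the checkpoint path is a placeholder, and the loop body is just a stand-in for one ALS iteration): checkpointing truncates the lineage, so the shuffle files behind a checkpointed RDD become eligible for cleanup.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))
sc.setCheckpointDir("hdfs:///user/root/checkpoints")  // placeholder path

var data = sc.parallelize(1 to 1000000).map(i => (i % 1000, i.toDouble))
for (iter <- 1 to 30) {
  data = data.reduceByKey(_ + _).mapValues(_ * 0.9)  // stand-in for one ALS iteration
  if (iter % 5 == 0) {
    data.checkpoint()  // cut the lineage every few iterations
    data.count()       // materialize so the checkpoint actually happens
  }
}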
> On Mar 3, 2015, at 4:18 PM, nitin [via Apache Spark User List]
> wrote:
>
> S
Why is the GC time so long?
I'm using ALS in MLlib, and the garbage collection time is too long
(about 1/3 of the total time).
I have tried some of the measures in the "Tuning Spark" guide, and tried to set the
new-generation memory size, but it still doesn't help…
[truncated Spark UI task table: Task Index / Task ID / Status …]
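One way to experiment with the new-generation size is through the executor JVM options; a sketch, where the -Xmn value is a placeholder rather than a recommendation:

import org.apache.spark.SparkConf

// Sketch: pass young-generation sizing and GC-logging flags to the executor JVMs.
val conf = new SparkConf()
  .setAppName("gc-tuning-sketch")
  .set("spark.executor.extraJavaOptions",
       "-Xmn2g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")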
As long as I set "spark.local.dir" to multiple disks, the job
fails, with the errors below
(if I set spark.local.dir to only one dir, the job succeeds…):
Exception in thread "main" org.apache.spark.SparkException: Job cancelled
because SparkContext was shut down
at
org.
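For reference, a sketch of the comma-separated form that spark.local.dir expects; the mount points are placeholders, and every directory must exist and be writable on every worker:

import org.apache.spark.SparkConf

// spark.local.dir takes a comma-separated list of directories.
// (Under YARN, the NodeManager's local dirs override this setting.)
val conf = new SparkConf()
  .set("spark.local.dir", "/data1/spark/tmp,/data2/spark/tmp,/data3/spark/tmp")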
15/03/04 09:26:36 INFO ClientCnxn: Client session timed out, have not heard
from server in 26679ms for sessionid 0x34bbf3313a8001b, closing socket
connection and attempting reconnect
15/03/04 09:26:36 INFO ConnectionStateManager: State change: SUSPENDED
15/03/04 09:26:36 INFO ZooKeeperLeaderElectio
I'm using Spark 1.0.0 with Cloudera,
but I want to use the new ALS code, which supports more features, such as the RDD
cache level (MEMORY_ONLY), checkpointing, and so on.
What is the easiest way to use the new ALS code?
I only need the MLlib ALS code, so maybe I don't need to update all of the
spark & mllib o
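For reference, a sketch of those features with the newer MLlib ALS builder API from later 1.x releases; the hyperparameters and path are placeholders, and setIntermediateRDDStorageLevel is my assumption of the relevant cache-level knob:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

def train(sc: SparkContext, ratings: RDD[Rating]) = {
  sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")  // placeholder path
  new ALS()
    .setRank(10)
    .setIterations(20)
    .setLambda(0.01)
    .setIntermediateRDDStorageLevel(StorageLevel.MEMORY_ONLY)  // RDD cache level
    .run(ratings)
}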
I'm sorry, but how do I look at the Mesos logs?
Where are they?
> On Mar 4, 2015, at 6:06 PM, Akhil Das wrote:
>
> You can check the Mesos logs and see what's really happening.
>
> Thanks
> Best Regards
>
> On Wed, Mar 4, 2015 at 3:10 PM, lisendong <mailto:lisend...@163.co
I found my tasks take a very long time in YoungGen GC. I set the young-gen size
to about 1.5 GB; I wonder why it takes so long.
Not all of the tasks take such a long time, only about 1% of them…
180.426: [GC [PSYoungGen: 9916105K->1676785K(14256640K)]
26201020K->18690057K(53403648K), 17.358150
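For reference, the fields of that PSYoungGen line decode as follows (sizes in KB):
young generation: 9916105K -> 1676785K (capacity 14256640K), i.e. roughly 9.5 GB collected down to 1.6 GB;
whole heap: 26201020K -> 18690057K (capacity 53403648K);
pause time: about 17.36 seconds.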
You see, the core of ALS 1.0.0 is the following code:
there should be flatMap and groupByKey tasks when running the ALS iterations, right?
But when I run the ALS iterations, there are ONLY flatMap tasks…
Do you know why?
private def updateFeatures(
products: RDD[(Int, Array[Arr
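A minimal sketch of the pattern in question: when a flatMap is followed by a groupByKey, the shuffle write for the groupByKey is performed by the tasks of the preceding stage, which the web UI typically labels by the flatMap call site — so only "flatMap" tasks appear.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

val sc = new SparkContext(new SparkConf().setAppName("stage-naming-sketch"))

val grouped = sc.parallelize(0 until 1000000)
  .flatMap(i => Seq((i % 100, i), ((i + 1) % 100, i)))  // map side of the shuffle
  .groupByKey()                                          // shuffle boundary

// The flatMap tasks also perform the shuffle write for groupByKey,
// so the stage before the boundary shows up as "flatMap" in the UI.
grouped.count()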
> On Mar 31, 2015, at 12:11 AM, Xiangrui Meng wrote:
>
> setCheckpointInterval was added in the current master and branch-1.3. Please
> help check whether it works. It will be included in the 1.3.1 and 1.4.0
> release. -Xiangrui
>
> On Mon, Mar 30, 2015 at 7:27 AM, lisendong <mailto:l
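A sketch of the setCheckpointInterval hook Xiangrui mentions (Spark 1.3.1+); it only takes effect when a checkpoint directory is set, and the values here are placeholders:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

def trainWithCheckpoints(sc: SparkContext, ratings: RDD[Rating]) = {
  sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")  // placeholder path
  new ALS()
    .setRank(10)
    .setIterations(30)
    .setCheckpointInterval(5)  // checkpoint factor RDDs every 5 iterations
    .run(ratings)
}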
> System.runFinalization()
> while (weakRef.get != null) {
>   System.gc()
>   System.runFinalization()
>   Thread.sleep(200)
>   if (System.currentTimeMillis - startTime > 1) {
>     throw new Exception("automatically cleanup error")
>   }
> }
> }
> checkpoint. Is it correct?
>
> Best,
> Xiangrui
>
> On Tue, Mar 31, 2015 at 8:58 AM, lisendong <mailto:lisend...@163.com>> wrote:
> Guoqiang's method works very well…
>
> It only takes 1 TB of disk now.
>
> Thank you very much!
>
>
>
>>
ferring to the initialization, not the result, right? It's possible
> that the resulting weight vectors are sparse although this looks surprising
> to me. But it is not related to the initial state, right?
>
> On Thu, Apr 2, 2015 at 10:43 AM, lisendong <mailto:lisend...@163.com>&g
yes!
Thank you very much :-)
> On Apr 2, 2015, at 7:13 PM, Sean Owen wrote:
>
> Right, I asked because in your original message, you were looking at
> the initialization to a random vector. But that is the initial state,
> not final state.
>
> On Thu, Apr 2, 2015 at 11:51 AM, lisendo
the pseudocode:
object myApp {
  var myStaticRDD: RDD[Int]
  def main() {
    ... // init the streaming context, and get two DStreams (streamA and streamB)
        // from two HDFS paths
    // complex transformation using the two DStreams
    val new_stream = streamA.transformWith(streamB, (a, b, t) => {
      a.join(
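A runnable sketch of the transformWith/join pattern above, assuming both streams can be keyed into pair RDDs; the paths, batch interval, and key extraction are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

object TransformWithJoin {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("join-sketch"), Seconds(60))

    // Two file-based DStreams keyed by their first tab-separated column.
    val streamA = ssc.textFileStream("hdfs:///user/root/pathA")
      .map(line => (line.split("\t")(0), line))
    val streamB = ssc.textFileStream("hdfs:///user/root/pathB")
      .map(line => (line.split("\t")(0), line))

    // transformWith exposes each batch's underlying RDDs, so the two
    // streams can be joined like ordinary pair RDDs.
    val joined = streamA.transformWith(streamB,
      (a: RDD[(String, String)], b: RDD[(String, String)], t: Time) => a.join(b))

    joined.print()
    ssc.start()
    ssc.awaitTermination()
  }
}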
I have one HDFS dir, which contains many files:
/user/root/1.txt
/user/root/2.txt
/user/root/3.txt
/user/root/4.txt
and there is a daemon process which adds one file per minute to this dir
(e.g., 5.txt, 6.txt, 7.txt…).
I want to start a Spark Streaming job which loads 3.txt, 4.txt and then
det
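If the goal is to also pick up files that already exist in the directory when the job starts, one option is the lower-level fileStream with newFilesOnly = false; a sketch, with placeholder paths and batch interval:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ExistingFilesStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("files-sketch"), Seconds(60))

    // newFilesOnly = false makes the first batch include files already
    // present in the directory, not just files that appear afterwards.
    val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "hdfs:///user/root", (p: Path) => true, newFilesOnly = false).map(_._2.toString)

    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}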
but in fact the directories are not ready when my task starts.
for example:
/user/root/2015/05/11/data.txt
/user/root/2015/05/12/data.txt
/user/root/2015/05/13/data.txt
like this,
with one new directory per day.
How do I create a new DStream for tomorrow's new
directory (/user/root/20
reduce/LzoTextInputFormat.java>
> the class. You can read more here
> https://github.com/twitter/hadoop-lzo#maven-repository
>
> Thanks
> Best Regards
>
> On Thu, May 14, 2015 at 1:22 PM, lisendong &