update resource when running spark

2015-05-07 Thread Hoai-Thu Vuong
Hi all, I use a function that creates or returns the context for a Spark application. In this function I load some resources from a text file into a list. My question is: how can I update this list?

Re: Pyspark where do third parties libraries need to be installed under Yarn-client mode

2015-04-24 Thread Hoai-Thu Vuong
I use sudo pip install ... on each machine in the cluster, and I haven't thought about how to submit the library.
On Fri, Apr 24, 2015 at 4:21 AM dusts66 wrote:
> I am trying to figure out python library management. So my question is:
> Where do third party Python libraries (ex. numpy, scipy, etc.) need to exist if

Re: Understanding Spark/MLlib failures

2015-04-24 Thread Hoai-Thu Vuong
Hi Andrew, do you mean that we should balance the time when GC runs against the batch time in which the RDD is processed?
On Fri, Apr 24, 2015 at 6:58 AM Reza Zadeh wrote:
> Hi Andrew,
>
> The .principalComponents feature of RowMatrix is currently constrained to tall and skinny matrices. Your matrix is b

Re: about window length and slide interval

2015-01-14 Thread Hoai-Thu Vuong
you want. You get non-overlapping RDDs.
>
> On Wed, Jan 14, 2015 at 9:26 AM, Hoai-Thu Vuong wrote:
> > I would like to count values in non-overlapping windows, so I think I can do it with the same value for window length and slide interval (note that this

Re: about window length and slide interval

2015-01-14 Thread Hoai-Thu Vuong
> --
> FG
>
> On Wed, Jan 14, 2015 at 9:58 AM, Hoai-Thu Vuong wrote:
> >> Could we run spark streaming and reducebywindow with same window length and slide interval?

about window length and slide interval

2015-01-14 Thread Hoai-Thu Vuong
Could we run Spark Streaming with reduceByWindow using the same window length and slide interval?
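
To make the question concrete, here is a minimal plain-Python sketch (not Spark code) of what a windowed reduce does when the window length equals the slide interval. The helper `reduce_by_window` and the per-batch counts are made up for illustration, and window sizes are counted in batches rather than durations, which is an assumption of this sketch.

```python
from functools import reduce
from operator import add


def reduce_by_window(batches, window_len, slide, func=add):
    """Plain-Python model of a windowed reduce over per-batch values.

    window_len and slide are counted in batches here (an assumption of
    this sketch; Spark Streaming itself takes durations).
    """
    results = []
    # A window closes every `slide` batches and covers the last
    # `window_len` batches (fewer at the very start of the stream).
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - window_len)
        results.append(reduce(func, batches[start:end]))
    return results


counts = [1, 2, 3, 4, 5, 6]  # made-up per-batch counts

# Window length == slide interval: every batch lands in exactly one window.
tumbling = reduce_by_window(counts, window_len=2, slide=2)

# Window length > slide interval: consecutive windows share batches.
overlapping = reduce_by_window(counts, window_len=4, slide=2)

print(tumbling)
print(overlapping)
```

With equal window length and slide, the window totals add up to the total of all batches, because no batch is counted twice. With a longer window, the same batch contributes to more than one result.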

word count aggregation

2014-12-29 Thread Hoai-Thu Vuong
Dear Spark users, I have a program that streams a folder: when a new file is created in this folder, I count the words that appear in the document and update the running counts (I used StatefulNetworkWordCount to do it). It works like a charm. However, I would like to know the difference between the top 10 words now a

Re: DStream cannot write to text file

2014-08-21 Thread Hoai-Thu Vuong
50075 is the default port for web access. The right port is 9000, or whatever is configured in core-site.xml under the property fs.default.name. Please check the documentation.
On Thu, Aug 21, 2014 at 3:01 PM, Mayur Rustagi wrote:
> is your hdfs running, can spark access it?
>
> Mayur Rustagi
> Ph: +1 (760)
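
One way to follow the advice above is to read the filesystem URI straight out of core-site.xml. The sketch below is only an illustration: the XML content, host name, and the helper `default_fs` are hypothetical. It checks `fs.defaultFS` first because that is the newer name for the deprecated `fs.default.name` property.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical core-site.xml content; the host name and port are made up.
CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
"""


def default_fs(xml_text, keys=("fs.defaultFS", "fs.default.name")):
    """Return the default filesystem URI from core-site.xml text, or None."""
    root = ET.fromstring(xml_text)
    # Collect every <property> as a name -> value mapping.
    props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
    for key in keys:
        if key in props:
            return props[key]
    return None


uri = default_fs(CORE_SITE)
port = urlparse(uri).port  # the port to use in hdfs:// paths, not the web UI port
print(uri, port)
```

The URI found this way is the prefix to use for output paths written from Spark, rather than the web access port.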

Re: How to save mllib model to hdfs and reload it

2014-08-14 Thread Hoai-Thu Vuong
Someone in this community gave me a video: https://www.youtube.com/watch?v=sPhyePwo7FA. I asked the same question here and others helped me solve the problem. I'm trying to load a MatrixFactorizationModel from an object file, but the compiler says that I cannot create the object because the

Re: how to use the method saveAsTextFile of a RDD like javaRDD

2014-08-14 Thread Hoai-Thu Vuong
I've found a method saveAsObjectFile in RDD (or JavaRDD). I think we can save this array to a file and load it back into an object when reading the file. However, as for how to load it back and cast the RDD to a specific object, I need time to try.
On Thu, Aug 14, 2014 at 3:48 PM, Gefei Li wrote:
> Thank you

Re: training recsys model

2014-08-13 Thread Hoai-Thu Vuong
movie-recommendation-with-mllib.html
> -Xiangrui
>
> On Tue, Aug 12, 2014 at 8:01 PM, Hoai-Thu Vuong wrote:
> > In MLlib, I found the method to train a matrix factorization model to predict a user's taste. In this function, there are some parameters such as lambd

training recsys model

2014-08-12 Thread Hoai-Thu Vuong
In MLlib, I found the method to train a matrix factorization model to predict a user's taste. This function has some parameters, such as lambda and rank. I cannot find the best values for these parameters or a way to optimize them. Could you please give me some recommendations? -- T
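
A common way to approach this kind of question is a grid search: try each combination of rank and lambda, score it on a held-out validation set, and keep the best. The sketch below shows only the search loop in plain Python. The names `grid_search` and `toy_evaluate` are hypothetical, and `toy_evaluate` is a placeholder with a known minimum, not a real model. In practice it would train a matrix factorization model and return its validation error.

```python
from itertools import product


def grid_search(evaluate, ranks, lambdas):
    """Return (best_rank, best_lambda, best_error) over the grid.

    `evaluate(rank, lam)` should train a model and return its error on a
    held-out validation set (lower is better).
    """
    best = None
    for rank, lam in product(ranks, lambdas):
        error = evaluate(rank, lam)
        if best is None or error < best[2]:
            best = (rank, lam, error)
    return best


def toy_evaluate(rank, lam):
    # Placeholder with a known minimum at rank=12, lam=0.1; NOT a real model.
    return (rank - 12) ** 2 + (lam - 0.1) ** 2


best_rank, best_lambda, best_error = grid_search(
    toy_evaluate, ranks=[8, 12, 16], lambdas=[0.01, 0.1, 1.0]
)
print(best_rank, best_lambda, best_error)
```

The candidate values for rank and lambda here are arbitrary. The useful part is scoring every combination on data the model was not trained on.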

Re: Mllib : Save SVM model to disk

2014-08-12 Thread Hoai-Thu Vuong
You should try watching this video: https://www.youtube.com/watch?v=sPhyePwo7FA. For more details, please search the archives. I had the same kind of question and others helped me solve the problem.
On Tue, Aug 12, 2014 at 12:36 PM, XiaoQinyu wrote:
> Have you solved this problem??

about spark and using machine learning model

2014-08-04 Thread Hoai-Thu Vuong
Hello everybody! I'm getting started with Spark and MLlib. I have successfully built a small cluster and followed the tutorial. However, I would like to ask how to use a model that has been trained by MLlib. I understand that, with data, we can train a model such as a classifier, then u