Re: IP to geo information in spark streaming application

2014-12-02 Thread qinwei
1) I think using a library-based solution is a better idea; we used that, and it works. 2) We used a broadcast variable, and it works. qinwei  From: Noam Kfir  Date: 2014-12-02 23:14  To: user@spark.apache.org  Subject: IP to geo information in spark streaming application  Hi I'm n
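A minimal sketch of the broadcast approach described above. GeoIpLookup is a hypothetical stand-in for whatever library-based geo-IP resolver is used, and ipStream is assumed to be a DStream[String] of client IPs on a StreamingContext ssc:

    // Hypothetical stand-in for a real geo-IP library; must be serializable
    // so it can be broadcast to the executors.
    class GeoIpLookup(dbPath: String) extends Serializable {
      def lookup(ip: String): String = "unknown" // e.g. a country code
    }

    val geoBc = ssc.sparkContext.broadcast(new GeoIpLookup("/path/to/geoip.dat"))
    val enriched = ipStream.map(ip => (ip, geoBc.value.lookup(ip))) // local lookup, no per-record I/O

The broadcast ships the database to each executor once, instead of serializing it into every task closure.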

Re: Joined RDD

2014-11-12 Thread qinwei
According to the log, it is stored. qinwei  From: ajay garg  Date: 2014-11-13 14:56  To: user  Subject: Joined RDD  Hi, I have two RDDs A and B which are created by reading files from HDFS. I have a third RDD C which is created by taking the join of A and B. All three RDDs (A, B and C) are not cached. Now if I per
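For context, a sketch of the setup described (paths and keying are placeholders): with none of the three RDDs cached, every action on C re-reads both files from HDFS and re-runs the join, so c.cache() before the first action avoids the repeated work.

    val a = sc.textFile("hdfs:///fileA").map(line => (line.split(",")(0), line))
    val b = sc.textFile("hdfs:///fileB").map(line => (line.split(",")(0), line))
    val c = a.join(b)   // RDD[(String, (String, String))]
    c.count()           // reads A and B, shuffles, joins
    c.count()           // repeats all of the above unless c was cached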

why flatmap has shuffle

2014-11-12 Thread qinwei
=> arg.asInstanceOf[(String, String, Long, String, String)]).filter(arg => arg._3 != 0)
val flatMappedRDD = rdd.flatMap(arg => List((arg._1, (arg._2, arg._3, 1)), (arg._2, (arg._1, arg._3, 0))))
Thanks for your help! qinwei
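flatMap itself is a narrow transformation and does not shuffle; what usually shows up is the map-side shuffle write of the wide operation that follows, which the UI attributes to the stage containing the flatMap. A sketch with illustrative data (reduceByKey stands in for whatever keyed operation comes next):

    val edges = sc.parallelize(Seq(("u1", "u2", 3L), ("u2", "u3", 5L)))
    val flatMapped = edges.flatMap { case (src, dst, w) =>
      List((src, (dst, w, 1)), (dst, (src, w, 0))) // emit both directions; still no shuffle
    }
    val reduced = flatMapped.reduceByKey((x, y) => x) // the shuffle happens at this stage boundary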

Re: Pass RDD to functions

2014-11-12 Thread qinwei
I think it's OK; feel free to treat an RDD like a common object. qinwei  From: Deep Pradhan  Date: 2014-11-12 18:24  To: user@spark.apache.org  Subject: Pass RDD to functions  Hi, Can we pass an RDD to functions? Like, can we do the following? def func(temp: RDD[String]): RDD[String] = { // body of the
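A minimal sketch of the pattern being asked about: an RDD is an ordinary immutable object on the driver, so passing it to a function and returning a transformed RDD works as expected.

    import org.apache.spark.rdd.RDD

    def func(temp: RDD[String]): RDD[String] =
      temp.filter(_.nonEmpty).map(_.toUpperCase)

    val out = func(sc.parallelize(Seq("a", "", "b")))
    out.collect() // Array("A", "B")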

Re: Re: about write mongodb in mapPartitions

2014-11-09 Thread qinwei
Thanks for your reply! As you mentioned, the insert clause is not executed because the results of args.map are never used anywhere; after I modified the code, it works. qinwei  From: Tobias Pfeiffer  Date: 2014-11-07 18:04  To: qinwei  CC: user  Subject: Re: about write mongodb in
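A sketch of the pitfall being described, with a hypothetical insertIntoMongo helper standing in for the real client call: Iterator.map is lazy, so if the mapped iterator is never consumed, the inserts never run.

    val rdd = sc.parallelize(Seq("r1", "r2"))
    def insertIntoMongo(record: String): Unit = () // hypothetical stand-in for the MongoDB write

    // Broken: Iterator.map is lazy and the mapped iterator is discarded,
    // so insertIntoMongo never executes.
    val broken = rdd.mapPartitions { args =>
      args.map { a => insertIntoMongo(a); a } // never consumed
      args                                    // the original iterator is returned instead
    }

    // Fixed: return the mapped iterator so downstream consumption runs the inserts
    // (or use foreachPartition when no result RDD is needed).
    val fixed = rdd.mapPartitions { args =>
      args.map { a => insertIntoMongo(a); a }
    }
    fixed.count() // consuming the RDD triggers the writes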

Re: Re: about write mongodb in mapPartitions

2014-11-09 Thread qinwei
Thanks for your reply! According to your hint, the code should be like this:
// i want to save data in rdd to mongodb and hdfs
rdd.saveAsNewAPIHadoopFile()
rdd.saveAsTextFile()
but will the application read HDFS twice? qinwei  From: Akhil Das  Date: 2014-11-07
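Each action re-runs the lineage, so without persisting, the input is read from HDFS once per save. A sketch of the fix (paths are placeholders; the real first save would use saveAsNewAPIHadoopFile with the MongoDB Hadoop OutputFormat, elided here):

    val rdd = sc.textFile("hdfs:///input/path")
    rdd.cache()                             // materialized during the first action
    rdd.saveAsTextFile("hdfs:///out/copy1") // reads HDFS and populates the cache
    rdd.saveAsTextFile("hdfs:///out/copy2") // served from the cache, no second read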

about write mongodb in mapPartitions

2014-11-07 Thread qinwei
is there something wrong? I know that collecting the newRDD to the driver and then saving it to mongodb will succeed, but will the following saveAsTextFile read the filesystem once again? Thanks  qinwei

about aggregateByKey and standard deviation

2014-10-31 Thread qinwei
At the moment, I have done that by using groupByKey and map. I notice that groupByKey is very expensive, but I cannot figure out how to do it using aggregateByKey, so I wonder: is there any better way to do this? Thanks! qinwei
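One single-pass approach, sketched under the assumption of an RDD[(String, Double)]: accumulate (count, sum, sum of squares) per key with aggregateByKey, then derive the standard deviation, avoiding groupByKey's shuffle of every raw value.

    val data = sc.parallelize(Seq(("a", 1.0), ("a", 3.0), ("b", 2.0)))
    val stats = data.aggregateByKey((0L, 0.0, 0.0))(
      { case ((n, s, q), x) => (n + 1, s + x, q + x * x) },                  // fold one value in
      { case ((n1, s1, q1), (n2, s2, q2)) => (n1 + n2, s1 + s2, q1 + q2) }   // merge partitions
    )
    val stddev = stats.mapValues { case (n, s, q) =>
      val mean = s / n
      math.sqrt(q / n - mean * mean) // population standard deviation
    }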

Re: Re: problem with partitioning

2014-09-28 Thread qinwei
Thank you for your reply; your tips on code refactoring are helpful. After a second look at the code, the casts and null checks are really unnecessary. qinwei  From: Sean Owen  Date: 2014-09-28 15:03  To: qinwei  CC: user  Subject: Re: problem with partitioning  (Most of this code is not relevant

Re: RE: problem with data locality api

2014-09-27 Thread qinwei
Thanks for your reply! qinwei  From: Shao, Saisai  Sent: 2014-09-28 14:42  To: qinwei  Cc: user  Subject: RE: problem with data locality api  Hi, The first conf is used for Hadoop to determine the locality distribution of the HDFS file. The second conf is used for Spark; though with the same name, actually they are two

problem with partitioning

2014-09-27 Thread qinwei
and path3, I want to change the number of partitions with the code below:
val rdd1 = sc.textFile(path1, 1920)
val rdd2 = sc.textFile(path2, 1920)
val rdd3 = sc.textFile(path3, 1920)
By doing this, I expect there to be 1920 tasks in total, but I found the number of tasks becomes 8920. Any idea what's going on here? Thanks! qinwei
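Worth noting as the likely explanation: the second argument to textFile is minPartitions, a lower bound rather than an exact count, so Hadoop's input splits can yield more partitions for large files. A sketch of pinning the count exactly:

    val exact = sc.textFile(path1).repartition(1920) // exactly 1920, at the cost of a shuffle
    val bound = sc.textFile(path2, 1920)             // at least 1920, possibly more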

problem with data locality api

2014-09-27 Thread qinwei
")))
val sc = new SparkContext(conf, locData)
but I found the two confs above are of different types: conf in the first line is of type org.apache.hadoop.conf.Configuration, and conf in the second line is of type SparkConf. Can anyone explain that to me or give me some example code? qinwei
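A short sketch of the distinction for anyone hitting the same confusion: the two "conf" values are unrelated classes that happen to share a name.

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.SparkConf

    val hadoopConf = new Configuration() // Hadoop settings: HDFS access, input formats, split/locality info
    val sparkConf = new SparkConf()      // Spark settings: master URL, app name, executor options
      .setAppName("locality-example")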

Re: How to use multi thread in RDD map function ?

2014-09-27 Thread qinwei
Among the options of spark-submit, there are two which may be helpful to your problem: "--total-executor-cores NUM" (standalone and Mesos only) and "--executor-cores" (YARN only). qinwei  From: myasuka  Date: 2014-09-28 11:44  To: user  Subject: How to use mult
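The same limits can also be set programmatically; a sketch with illustrative values (spark.cores.max is the config behind --total-executor-cores, spark.executor.cores the one behind --executor-cores):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("multi-thread-map")
      .set("spark.cores.max", "16")     // standalone/Mesos: cap on total cores across executors
      .set("spark.executor.cores", "4") // cores per executor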

Re: Re: Problem with the Item-Based Collaborative Filtering Recommendation Algorithms in spark

2014-04-27 Thread qinwei
Thanks a lot for your reply, it gave me much inspiration. qinwei  From: Sean Owen  Date: 2014-04-25 14:10  To: user  Subject: Re: Problem with the Item-Based Collaborative Filtering Recommendation Algorithms in spark  So you are computing all-pairs similarity over 20M users? This is going to take

Re: Re: what is the best way to do cartesian

2014-04-27 Thread qinwei
Thanks a lot for your reply, but I have tried the built-in RDD.cartesian() method before; it didn't make it faster. qinwei  From: Alex Boisvert  Date: 2014-04-26 00:32  To: user  Subject: Re: what is the best way to do cartesian  You might want to try the built-in RDD.cartesian() method.
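For reference, minimal usage of the built-in method mentioned (illustrative data; the result has |xs| × |ys| elements, so it is inherently expensive on large inputs):

    val xs = sc.parallelize(Seq(1, 2, 3))
    val ys = sc.parallelize(Seq("a", "b"))
    val pairs = xs.cartesian(ys) // RDD[(Int, String)] with 6 elements
    pairs.collect().foreach(println)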

Re: Re: how to set spark.executor.memory and heap size

2014-04-23 Thread qinwei
try the complete path qinwei  From: wxhsdp  Date: 2014-04-24 14:21  To: user  Subject: Re: how to set spark.executor.memory and heap size  thank you, I added setJars, but nothing changes:
val conf = new SparkConf()
  .setMaster("spark://127.0.0.1:7077")
  .setAppName("
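A sketch of the configuration under discussion, with a placeholder jar path and memory value; setJars needs the complete path to the application jar so the executors can load the user classes:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://127.0.0.1:7077")
      .setAppName("heap-size-example")
      .set("spark.executor.memory", "2g")                    // per-executor heap
      .setJars(Seq("/full/path/to/target/app-assembly.jar")) // complete path, as suggested above
    val sc = new SparkContext(conf)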