1) I think a library-based solution is a better idea; we used that, and it works.
2) We used a broadcast variable, and it works.
qinwei
From: Noam Kfir
Date: 2014-12-02 23:14
To: user@spark.apache.org
Subject: IP to geo information in spark streaming application
Hi
I'm n
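A minimal sketch of the broadcast-variable approach mentioned in the reply above, assuming the IP-to-geo data fits in driver memory as a plain Map; the lookup table, log format, and socket source are hypothetical:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("ip-to-geo")
val ssc = new StreamingContext(conf, Seconds(10))

// small in-memory lookup table; in practice this would be loaded via a geo library
val ipToGeo: Map[String, String] = Map("1.2.3.4" -> "US", "5.6.7.8" -> "DE")
val ipToGeoBc = ssc.sparkContext.broadcast(ipToGeo)

// each log line is assumed to start with an IP address
val lines = ssc.socketTextStream("localhost", 9999)
val enriched = lines.map { line =>
  val ip = line.split(" ")(0)
  (line, ipToGeoBc.value.getOrElse(ip, "unknown"))
}
enriched.print()

ssc.start()
ssc.awaitTermination()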
ng to the log, it is stored
qinwei
From: ajay garg
Date: 2014-11-13 14:56
To: user
Subject: Joined RDD

Hi,
I have two RDDs, A and B, which are created by reading files from HDFS.
I have a third RDD, C, which is created by joining A and B. None of the three
RDDs (A, B, and C) are cached.
Now if I per
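A minimal sketch of the setup described in this question, with an optional cache() call added to avoid re-reading the HDFS files on later actions; the paths and the key extraction are hypothetical:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

val sc = new SparkContext(new SparkConf().setAppName("joined-rdd"))

// parse each line into a (key, value) pair; the line format here is made up
val a = sc.textFile("hdfs:///data/a").map(line => (line.split(",")(0), line))
val b = sc.textFile("hdfs:///data/b").map(line => (line.split(",")(0), line))

val c = a.join(b)   // without caching, every action on c re-reads and re-joins a and b
// c.cache()        // uncomment to compute the join once and reuse it

println(c.count())
println(c.first())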
...(arg =>
arg.asInstanceOf[(String, String, Long, String, String)]).filter(arg => arg._3 != 0)
val flatMappedRDD = rdd.flatMap(arg => List((arg._1, (arg._2, arg._3, 1)),
  (arg._2, (arg._1, arg._3, 0
Thanks for your help!
qinwei
I think it's ok; feel free to treat an RDD like a common object.
qinwei
From: Deep Pradhan
Date: 2014-11-12 18:24
To: user@spark.apache.org
Subject: Pass RDD to functions

Hi,
Can we pass an RDD to functions? Like, can we do the following?
def func(temp: RDD[String]): RDD[String] = { // body of the
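A minimal sketch matching the reply above: an RDD reference can be passed to a function like any other object, as long as the function itself runs on the driver. The names and the body are hypothetical:

import org.apache.spark.rdd.RDD

// runs on the driver; only the transformations it defines execute on the cluster
def func(temp: RDD[String]): RDD[String] = {
  temp.filter(_.nonEmpty).map(_.trim)
}

// sc is assumed to be an existing SparkContext
val cleaned = func(sc.textFile("hdfs:///data/input"))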
Thanks for your reply! As you mentioned, the insert clause is not executed because the results of args.map are never used anywhere; after I modified the code, it works.
qinwei
From: Tobias Pfeiffer
Date: 2014-11-07 18:04
To: qinwei
CC: user
Subject: Re: about write mongodb in
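A small sketch of the point made above: transformations such as map are lazy, so side effects inside them never run unless the result feeds into an action. Using foreach (an action) is one way to force the writes; args is assumed to be an RDD, and insertIntoMongo is a stand-in for the write used in the thread:

// nothing is written here: map is lazy and its result is never used
val unused = args.map(arg => insertIntoMongo(arg))

// foreach is an action, so the inserts actually execute on the executors
args.foreach(arg => insertIntoMongo(arg))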
Thanks for your reply! According to your hint, the code should be like this:

// i want to save data in rdd to mongodb and hdfs
rdd.saveAsNewAPIHadoopFile()
rdd.saveAsTextFile()

but will the application read HDFS twice?
qinwei
From: Akhil Das
Date: 2014-11-07
re something wrong? I know that collecting the newRDD to the driver and then saving it to mongodb will succeed, but will the following saveAsTextFile read the filesystem once again?
Thanks
qinwei
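A sketch of one common way to avoid the double read asked about in the two messages above: cache the RDD before the first output action, so the second action reuses the computed partitions instead of going back to the source. The MongoDB write is left as a comment because its exact call depends on the connector configuration:

newRDD.cache()                              // computed once; later actions reuse the cached partitions
newRDD.saveAsTextFile("hdfs:///out/text")   // first action: triggers the computation and fills the cache
// the saveAsNewAPIHadoopFile call targeting MongoDB would go here as the second action;
// with the cache in place it reads the cached data rather than the original HDFS input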
the moment, I have done that by using groupByKey and map. I notice that groupByKey is very expensive, but I cannot figure out how to do it with aggregateByKey, so I wonder whether there is any better way to do this?
Thanks!
qinwei
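A minimal sketch of the aggregateByKey pattern asked about above, using a per-key sum and count as a stand-in for the real aggregation (the actual logic is not fully visible in the thread). pairs stands in for the flatMappedRDD shown earlier, i.e. RDD[(String, (String, Long, Int))]:

// groupByKey shuffles every value for a key to one place before aggregating
val viaGroup = pairs.groupByKey().map { case (k, vs) =>
  (k, (vs.map(_._2).sum, vs.size))
}

// aggregateByKey folds values into a per-partition accumulator and then merges accumulators,
// which avoids materializing the full per-key collections
val viaAggregate = pairs.aggregateByKey((0L, 0))(
  (acc, v) => (acc._1 + v._2, acc._2 + 1),   // fold one value into the partition-local accumulator
  (a, b)   => (a._1 + b._1, a._2 + b._2)     // merge accumulators from different partitions
)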
Thank you for your reply; your tips on code refactoring are helpful. After a second look at the code, the casts and null check are really unnecessary.
qinwei
From: Sean Owen
Date: 2014-09-28 15:03
To: qinwei
CC: user
Subject: Re: problem with partitioning

(Most of this code is not relevant
your reply!
qinwei
From: Shao, Saisai
Date: 2014-09-28 14:42
To: qinwei
CC: user
Subject: RE: problem with data locality api
Hi,

The first conf is used by Hadoop to determine the locality distribution of the HDFS file. The second conf is used by Spark; though they have the same name, they are actually two different things.
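A small sketch of the two conf objects being contrasted above; they are unrelated types that happen to share the informal name "conf":

import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

val hadoopConf = new Configuration()                          // Hadoop side: describes the HDFS input, used to work out locality
val sparkConf  = new SparkConf().setAppName("locality-demo")  // Spark side: settings for the Spark application itself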
nd path3, I want to change the number of partitions with the code below:
val rdd1 = sc.textFile(path1, 1920)
val rdd2 = sc.textFile(path2, 1920)
val rdd3 = sc.textFile(path3, 1920)
By doing this, I expect there to be 1920 tasks in total, but I found that the
number of tasks becomes 8920. Any idea what's going on here?
Thanks!
qinwei
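A sketch related to the question above: the second argument to textFile is only a minimum number of partitions, so the actual count (and hence the task count) can be higher; repartition or coalesce pin it down afterwards. The path is hypothetical:

val rdd1 = sc.textFile("hdfs:///path1", 1920)   // 1920 is a minimum, not an exact partition count
println(rdd1.partitions.length)                 // the number actually chosen from the Hadoop input splits

val exact = rdd1.repartition(1920)              // exactly 1920 partitions, at the cost of a shuffle
val fewer = rdd1.coalesce(1920)                 // at most 1920, merging splits without a full shuffle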
")))
val sc = new SparkContext(conf, locData)
but I found the two confs above are of different types: conf in the first line is of type org.apache.hadoop.conf.Configuration, and conf in the second line is of type SparkConf. Can anyone explain that to me or give me some example code?
qinwei
In the options of spark-submit, there are two options which may be helpful for your problem: "--total-executor-cores NUM" (standalone and Mesos only) and "--executor-cores" (YARN only).
qinwei
From: myasuka
Date: 2014-09-28 11:44
To: user
Subject: How to use mult
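A sketch of how the two flags mentioned in the reply above are passed on the command line; the master URLs, class name, and jar are hypothetical:

# standalone / Mesos: cap the total number of cores across all executors
spark-submit --master spark://master:7077 --total-executor-cores 16 \
  --class com.example.MyApp my-app.jar

# YARN: cores per executor, usually combined with --num-executors
spark-submit --master yarn --num-executors 4 --executor-cores 4 \
  --class com.example.MyApp my-app.jar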
Thanks a lot for your reply; it gave me much inspiration.
qinwei
From: Sean Owen
Date: 2014-04-25 14:10
To: user
Subject: Re: Problem with the Item-Based Collaborative Filtering Recommendation Algorithms in spark

So you are computing all-pairs similarity over 20M users?
This is going to take
Thanks a lot for your reply, but I have tried the built-in RDD.cartesian() method before; it didn't make it faster.
qinwei
From: Alex Boisvert
Date: 2014-04-26 00:32
To: user
Subject: Re: what is the best way to do cartesian

You might want to try the built-in RDD.cartesian() method.
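For reference, a tiny sketch of the built-in method suggested above; the contents are made up, and the comment notes why it stays slow at the scale discussed in this thread:

val users = sc.parallelize(Seq("u1", "u2", "u3"))
val pairs = users.cartesian(users)   // produces every pair: O(n^2) records, which is why 20M users is still expensive
println(pairs.count())               // 9 for this toy example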
try the complete path
qinwei
From: wxhsdp
Date: 2014-04-24 14:21
To: user
Subject: Re: how to set spark.executor.memory and heap size

Thank you, I added setJars, but nothing changes.
val conf = new SparkConf()
  .setMaster("spark://127.0.0.1:7077")
  .setAppName("
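A sketch of the kind of SparkConf setup being discussed above, with spark.executor.memory and setJars spelled out; the app name and jar path are hypothetical:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://127.0.0.1:7077")
  .setAppName("heap-size-example")
  .set("spark.executor.memory", "2g")                      // per-executor JVM heap
  .setJars(Seq("target/scala-2.10/my-app_2.10-0.1.jar"))   // ship the application jar to the executors
val sc = new SparkContext(conf)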