Re: Problem of RDD in calculation

2015-10-16 Thread Xiao Li
For most programmers, DataFrames are preferred thanks to their flexibility, but the SQL syntax is a great option for users who feel more comfortable with SQL. : )

Re: Problem of RDD in calculation

2015-10-16 Thread Ali Tajeldin EDU
Since DF2 only has the userID, I'm assuming you are using DF2 to filter for the desired userIDs. You can just use the join() and groupBy() operations on DataFrame to do what you desire. For example: scala> val df1 = app.createDF("id:String; v:Integer", "X,1;X,2;Y,3;Y,4;Z,10") df1: org.apache.spark.sql
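The createDF call in Ali's example is a helper not shown in the thread; a minimal sketch of the same join + groupBy approach using only the stock Spark 1.5-era shell API might look like this (the column names and sample data are illustrative assumptions, not from the thread):

```scala
// Assumes a Spark shell where sqlContext is in scope (Spark 1.5-era API).
import sqlContext.implicits._
import org.apache.spark.sql.functions._

// DF1: per-event rows; DF2: the userIDs of interest
val df1 = Seq(
  ("X", 1, 10), ("X", 2, 20),
  ("Y", 3, 30), ("Z", 4, 40)
).toDF("userID", "duration", "dataUsage")
val df2 = Seq("X", "Y").map(Tuple1(_)).toDF("userID")

// The inner join keeps only userIDs present in DF2,
// then groupBy/agg totals the columns per user.
val totals = df1.join(df2, "userID")
  .groupBy("userID")
  .agg(sum("duration").as("totalDuration"),
       sum("dataUsage").as("totalDataUsage"))
totals.show()
```

The join-on-column-name overload, `join(df2, "userID")`, avoids the duplicate userID column that a plain join expression would leave behind.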

Re: Problem of RDD in calculation

2015-10-16 Thread Xiao Li
Hi, Frank, After registering these DFs as temp tables (via the API registerTempTable), you can do it using SQL. I believe this should be much easier. Good luck, Xiao Li
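The SQL route Xiao Li suggests could be sketched as follows, assuming df1 and df2 are the DataFrames described in the question below (table aliases and output column names are assumptions):

```scala
// Register the DataFrames as temp tables, then query them with
// Spark SQL (Spark 1.5-era API, where registerTempTable is current).
df1.registerTempTable("df1")
df2.registerTempTable("df2")

// Join on userID and total the usage columns per user.
val totals = sqlContext.sql("""
  SELECT d1.userID,
         SUM(d1.duration)  AS totalDuration,
         SUM(d1.dataUsage) AS totalDataUsage
  FROM df1 d1
  JOIN df2 d2 ON d1.userID = d2.userID
  GROUP BY d1.userID
""")
totals.show()
```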

Problem of RDD in calculation

2015-10-16 Thread ChengBo
Hi all, I am new to Spark, and I have a question about dealing with RDDs. I've converted the RDDs to DataFrames, so there are two DFs: DF1 and DF2. DF1 contains: userID, time, dataUsage, duration. DF2 contains: userID. Each userID has multiple rows in DF1. DF2 has distinct userIDs, and I would like to compute t