For most programmers, DataFrames are preferred thanks to their flexibility,
but the SQL syntax is a great option for users who feel more comfortable
using SQL. : )
2015-10-16 18:22 GMT-07:00 Ali Tajeldin EDU :
Since DF2 only has the userID, I'm assuming you are using DF2 to filter for
desired userIDs.
You can just use the join() and groupBy() operations on DataFrame to do what
you desire. For example:
scala> val df1 = app.createDF("id:String; v:Integer", "X,1;X,2;Y,3;Y,4;Z,10")
df1: org.apache.spark.sql.DataFrame = [id: string, v: int]
(Note: createDF here is a local test helper, not a core Spark API.)
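In terms of the question's DF1/DF2 (reusing the names df1/df2 for those, not
the toy frame above), a minimal sketch with plain Spark 1.x APIs; the
aggregates, summing dataUsage and duration per user, are my assumption about
what is being computed:

import org.apache.spark.sql.functions.sum

// Inner join keeps only the rows of df1 whose userID appears in df2.
val filtered = df1.join(df2, "userID")

// Then aggregate per user; swap in whatever aggregates you actually need.
val totals = filtered.groupBy("userID")
  .agg(sum("dataUsage").as("totalDataUsage"),
       sum("duration").as("totalDuration"))

totals.show()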
Hi, Frank,
After registering these DFs as temp tables (via the API registerTempTable),
you can do it using SQL. I believe this should be much easier. For example:
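A minimal sketch of that route (the table names, aliases, and SUM aggregates
are my assumptions; registerTempTable is the Spark 1.x API):

// Expose both DataFrames to the SQL engine under hypothetical table names.
df1.registerTempTable("df1")
df2.registerTempTable("df2")

// Join on userID and aggregate per user, all in one SQL statement.
val totals = sqlContext.sql("""
  SELECT d1.userID,
         SUM(d1.dataUsage) AS totalDataUsage,
         SUM(d1.duration)  AS totalDuration
  FROM df1 d1
  JOIN df2 d2 ON d1.userID = d2.userID
  GROUP BY d1.userID
""")

totals.show()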
Good luck,
Xiao Li
2015-10-16 12:10 GMT-07:00 ChengBo :
Hi all,
I am new to Spark, and I have a question about dealing with RDDs.
I've converted the RDDs to DataFrames, so there are two DFs, DF1 and DF2:
DF1 contains: userID, time, dataUsage, duration
DF2 contains: userID
Each userID has multiple rows in DF1.
DF2 has distinct userIDs, and I would like to compute the totals (e.g. of
dataUsage and duration) in DF1 for each userID in DF2.
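For anyone who wants to try the suggestions above in a spark-shell, a
hypothetical reconstruction of the two DataFrames (all names and values are
made up):

import sqlContext.implicits._

// DF1: one row per usage record; a user can appear many times.
val df1 = Seq(
  ("A", "2015-10-16 10:00", 120, 30),
  ("A", "2015-10-16 11:00", 200, 45),
  ("B", "2015-10-16 10:30", 80, 15),
  ("C", "2015-10-16 12:00", 60, 10)
).toDF("userID", "time", "dataUsage", "duration")

// DF2: the distinct userIDs we care about.
val df2 = Seq(Tuple1("A"), Tuple1("B")).toDF("userID")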