It is OK, but what I actually want is to categorize the URLs by session.
*DATA:* (sorted by time)
*(userid1_time, url1)*
*(userid1_time2, url2)*
*(userid1_time3, url3)*
*(userid1_time4, url4)*
*RESULT:*
*url1* is already added to *session1*
*time2-time1 < 30 min*, so *url2* goes to *session1*
*time3-time2
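The rule described above (stay in the current session while the gap to the previous URL is under 30 minutes, otherwise open a new one) can be sketched in plain Scala. This is only a rough sketch under assumptions that are not in the data above: the time part of the key is taken to be epoch seconds, a gap of exactly 30 minutes is taken to start a new session, and the names `Sessionize`, `parse`, `sessions` and `sessionize` are made up for illustration.

```scala
// Hypothetical helper: assigns session numbers to each user's URLs.
// Assumption: keys look like "userid_time", with time in epoch seconds.
object Sessionize {
  // 30-minute threshold taken from the description above.
  val GapSeconds: Long = 30L * 60L

  // Split ("userid_time", url) into (userid, time, url).
  def parse(rec: (String, String)): (String, Long, String) = {
    val (key, url) = rec
    val idx = key.lastIndexOf('_')
    (key.substring(0, idx), key.substring(idx + 1).toLong, url)
  }

  // For one user's visits, return (sessionNumber, url) pairs in time order.
  // The first URL goes to session 1; a new session starts whenever the gap
  // to the previous visit is at least GapSeconds.
  def sessions(visits: Seq[(Long, String)]): Seq[(Int, String)] = {
    val sorted = visits.sortBy(_._1)
    val init = (Option.empty[Long], 0, Vector.empty[(Int, String)])
    val (_, _, out) = sorted.foldLeft(init) {
      case ((prev, sid, acc), (t, url)) =>
        val next = prev match {
          case Some(p) if t - p >= GapSeconds => sid + 1 // gap too large: new session
          case Some(_)                        => sid     // same session
          case None                           => 1       // first visit of this user
        }
        (Some(t), next, acc :+ (next -> url))
    }
    out
  }

  // Group all records by user and sessionize each user's visits.
  def sessionize(records: Seq[(String, String)]): Map[String, Seq[(Int, String)]] =
    records.map(parse).groupBy(_._1).map { case (user, rs) =>
      user -> sessions(rs.map(r => (r._2, r._3)))
    }
}
```

On an RDD of the same pairs, something similar should be possible by mapping each record to `(userid, (time, url))`, calling `groupByKey()`, and then applying `sessions` with `mapValues`; I have not tried that on real data.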
map { case (x, y) => val s = x.split("_"); (s(0), (s(1), y)) }.groupByKey().filter{case (_, (a, b)) => abs(a._1, a._1) < 30min}
Does it work for you?
2015-12-25 16:53 GMT+08:00 Yasemin Kaya :
> Hi,
>
> I have been struggling with this data for a couple of days and I can't find a
> solution. Could you help me?
>
> *DATA:*
>