Hi Junfeng,

You should be able to do this with the window aggregation functions lead or lag:
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-functions.html#lead
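As a rough sketch: with lag over a window ordered by timestamp, each logout row can see the previous row's (login) timestamp and subtract it to get a duration. The runnable part below emulates that lag logic in plain Python on the sample data from the thread; the commented lines at the end sketch the equivalent PySpark DataFrame version, where the column names `event` and `ts` are assumptions, not anything from your actual schema.

```python
from datetime import datetime

# Sample data from the thread: alternating login/logout events, ordered by time.
events = [
    ("login",  "2018/8/27 10:00"),
    ("logout", "2018/8/27 10:05"),
    ("login",  "2018/8/27 10:08"),
    ("logout", "2018/8/27 10:15"),
    ("login",  "2018/8/27 11:08"),
    ("logout", "2018/8/27 11:32"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y/%m/%d %H:%M")

# Emulate lag(ts, 1) over a window ordered by time: each row sees the
# previous row's timestamp; on logout rows, subtract to get the duration.
durations_min = []
prev_ts = None
for event, ts in events:
    t = parse(ts)
    if event == "logout" and prev_ts is not None:
        durations_min.append(int((t - prev_ts).total_seconds() // 60))
    prev_ts = t

print(durations_min)  # [5, 7, 24]

# Equivalent PySpark sketch (column names `event`, `ts` are assumptions):
#
# from pyspark.sql import functions as F, Window
# w = Window.orderBy("ts")  # in practice, add partitionBy(user) per user/session
# paired = df.withColumn("prev_ts", F.lag("ts", 1).over(w))
# durations = (paired
#     .filter(F.col("event") == "logout")
#     .withColumn("duration_min",
#                 (F.col("ts").cast("long") - F.col("prev_ts").cast("long")) / 60))
```

Because the computation is expressed as a window function on the DataFrame, it runs on the executors rather than collecting everything to the driver, which avoids the foreach problem you mentioned.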
Thanks,
Dev

On Mon, Aug 27, 2018 at 7:08 AM JF Chen <darou...@gmail.com> wrote:

> Thanks Sonal.
> For example, I have data as follows:
>
> login  2018/8/27 10:00
> logout 2018/8/27 10:05
> login  2018/8/27 10:08
> logout 2018/8/27 10:15
> login  2018/8/27 11:08
> logout 2018/8/27 11:32
>
> Now I want to calculate the time between each login and logout. For
> example, I should get 5 min, 7 min, and 24 min from the above sample data.
> I know I can calculate it with foreach, but then all the data seems to run
> on the Spark driver node rather than across multiple executors.
> So is there a good way to solve this problem? Thanks!
>
> Regards,
> Junfeng Chen
>
>
> On Thu, Aug 23, 2018 at 6:15 PM Sonal Goyal <sonalgoy...@gmail.com> wrote:
>
>> Hi Junfeng,
>>
>> Can you please show by means of an example what you are trying to
>> achieve?
>>
>> Thanks,
>> Sonal
>> Nube Technologies <http://www.nubetech.co>
>>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>>
>> On Thu, Aug 23, 2018 at 8:22 AM, JF Chen <darou...@gmail.com> wrote:
>>
>>> For example, I have some data with timestamps marked as category A and
>>> B, ordered by time. Now I want to calculate each duration from A to B. In
>>> a normal program, I can use a flag bit to record whether the previous row
>>> is A or B, and then calculate the duration. But how can I do this with a
>>> Spark DataFrame?
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Junfeng Chen
>>
>>

--
To achieve, you need thought. You have to know what you are doing and that's real power.