Hi Junfeng,

You should be able to do this with the window functions lead or lag:
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-functions.html#lead

Thanks,
Dev

On Mon, Aug 27, 2018 at 7:08 AM JF Chen <darou...@gmail.com> wrote:

> Thanks Sonal.
> For example, I have data like the following:
> login 2018/8/27 10:00
> logout 2018/8/27 10:05
> login 2018/8/27 10:08
> logout 2018/8/27 10:15
> login 2018/8/27 11:08
> logout 2018/8/27 11:32
>
> Now I want to calculate the time between each login and the following
> logout. For example, I should get 5 min, 7 min, 24 min from the above
> sample data.
> I know I can calculate it with foreach, but then it seems all the data is
> processed on the Spark driver node rather than across multiple executors.
> Is there a good way to solve this problem? Thanks!
>
> Regards,
> Junfeng Chen
>
>
> On Thu, Aug 23, 2018 at 6:15 PM Sonal Goyal <sonalgoy...@gmail.com> wrote:
>
>> Hi Junfeng,
>>
>> Can you please show by means of an example what you are trying to
>> achieve?
>>
>> Thanks,
>> Sonal
>> Nube Technologies <http://www.nubetech.co>
>>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>>
>>
>> On Thu, Aug 23, 2018 at 8:22 AM, JF Chen <darou...@gmail.com> wrote:
>>
>>> For example, I have some data with timestamps, marked as category A or B,
>>> and ordered by time. Now I want to calculate each duration from A to B. In
>>> a normal program, I can use a flag to record whether the previous record
>>> is A or B, and then calculate the duration. But how do I do this with a
>>> Spark DataFrame?
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Junfeng Chen
>>>
>>
>>

-- 
To achieve, you need thought. You have to know what you are doing and
that's real power.
