Re: mapGroupsWithState in Python

2018-01-31 Thread ayan guha
Thanks a lot TD, exactly what I was looking for. And I have seen most of your talks, really great stuff you guys are doing :)

On Thu, Feb 1, 2018 at 10:38 AM, Tathagata Das wrote:
> Hello Ayan,
>
> From what I understand, mapGroupsWithState (probably the more general
> flatMapGroupsWithState) is

Re: mapGroupsWithState in Python

2018-01-31 Thread Tathagata Das
Hello Ayan,

From what I understand, mapGroupsWithState (probably the more general flatMapGroupsWithState) is the best way forward (not available in Python). However, you need to figure out your desired semantics of when you want to output the deduplicated data from the streaming query. For exampl
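To make the "when do you output" question concrete, here is a minimal pure-Python model of the per-key state a flatMapGroupsWithState function would keep across micro-batches. It is not the Spark API: the class `LatestPerKeyState` and its `process_batch` method are hypothetical names, and the row layout (id, last_update_timestamp, attribute) is assumed from the original question. It picks one possible semantics: emit a row whenever a newer version of an id arrives, and stay silent otherwise.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (id, last_update_timestamp, attribute); layout assumed from the thread
Row = Tuple[str, int, str]


class LatestPerKeyState:
    """Hypothetical plain-Python model of per-key streaming state (not a Spark API)."""

    def __init__(self) -> None:
        # Latest row seen so far for each id, kept across micro-batches.
        self.state: Dict[str, Row] = {}

    def process_batch(self, batch: Iterable[Row]) -> List[Row]:
        # Group the micro-batch by id, mirroring how a grouped state function
        # is invoked once per key with all of that key's new rows.
        groups: Dict[str, List[Row]] = defaultdict(list)
        for row in batch:
            groups[row[0]].append(row)

        emitted: List[Row] = []
        for key, rows in groups.items():
            # Newest row for this key within the current micro-batch.
            newest = max(rows, key=lambda r: r[1])
            previous = self.state.get(key)
            # Emit-on-change: output only if strictly newer than the stored row.
            if previous is None or newest[1] > previous[1]:
                self.state[key] = newest
                emitted.append(newest)
        return emitted


if __name__ == "__main__":
    s = LatestPerKeyState()
    print(s.process_batch([("a", 1, "x"), ("a", 3, "y"), ("b", 2, "z")]))
    print(s.process_batch([("a", 2, "late"), ("b", 5, "new")]))
```

The late row for "a" in the second batch is dropped because a newer version is already in state. Other semantics (for example, emitting only once a key has been quiet for some time) would need timeouts, and in a real job the state would also need a timeout or watermark so it does not grow without bound.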

Re: mapGroupsWithState in Python

2018-01-30 Thread ayan guha
Any help would be much appreciated :)

On Mon, Jan 29, 2018 at 6:25 PM, ayan guha wrote:
> Hi
>
> I want to write something in Structured streaming:
>
> 1. I have a dataset which has 3 columns: id, last_update_timestamp, attribute
> 2. I am receiving the data through Kinesis
>
> I want to dedup

mapGroupsWithState in Python

2018-01-28 Thread ayan guha
Hi

I want to write something in Structured Streaming:

1. I have a dataset which has 3 columns: id, last_update_timestamp, attribute
2. I am receiving the data through Kinesis

I want to deduplicate records based on last_updated. In batch, it looks like:

spark.sql("select * from (Select *, row_nu
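The batch query above is cut off in the archive, but it appears to use the usual row_number() pattern. Assuming that pattern partitions by id, orders by last_update_timestamp descending, and keeps row 1, its effect can be sketched in plain Python as "keep the newest record per id". This is an illustration of the semantics only, not Spark code; `Record` and `dedup_latest` are hypothetical names, and the timestamp is assumed to be an integer.

```python
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    id: str
    last_update_timestamp: int  # assumed: epoch seconds as an int
    attribute: str


def dedup_latest(records: Iterable[Record]) -> List[Record]:
    """Keep the newest record per id, like filtering row_number() = 1 over a
    window partitioned by id and ordered by last_update_timestamp DESC."""
    latest: Dict[str, Record] = {}
    for r in records:
        current = latest.get(r.id)
        # Replace only on a strictly newer timestamp; on ties the first record
        # seen wins (row_number() would break ties arbitrarily).
        if current is None or r.last_update_timestamp > current.last_update_timestamp:
            latest[r.id] = r
    # Sort by id so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r.id)


if __name__ == "__main__":
    rows = [Record("a", 1, "x"), Record("a", 3, "y"), Record("b", 2, "z")]
    print(dedup_latest(rows))
```

In batch this works because all rows are visible at once; in a stream, newer versions of an id can arrive in later micro-batches, which is why the per-key state discussed in the replies is needed.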