Thanks a lot TD, exactly what I was looking for. And I have seen most of
your talks, really great stuff you guys are doing :)
On Thu, Feb 1, 2018 at 10:38 AM, Tathagata Das
wrote:
Hello Ayan,
From what I understand, mapGroupsWithState (probably the more general
flatMapGroupsWithState) is the best way forward (not available in Python).
However, you need to figure out your desired semantics of when you want to
output the deduplicated data from the streaming query. For exampl
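
To make the question of output semantics concrete, here is a minimal sketch in plain Python (not Spark code) of the per-key bookkeeping a mapGroupsWithState-style function would do: keep the newest record seen for each id, and emit a record only when it is newer than the stored one. The Record type, the dedup_latest helper, the integer timestamps and the sample data are all hypothetical, and "emit on every newer record" is only one of the possible semantics.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    id: str
    last_update_timestamp: int  # assumption: an orderable value such as epoch seconds
    attribute: str


def dedup_latest(events: Iterable[Record], state: Dict[str, Record]) -> Iterator[Record]:
    """Yield a record only if it is newer than the one stored for its id."""
    for rec in events:
        seen = state.get(rec.id)
        if seen is None or rec.last_update_timestamp > seen.last_update_timestamp:
            state[rec.id] = rec  # update the per-key state
            yield rec            # emit on every improvement (one possible semantic)


# The state dict outlives a single micro-batch, as streaming state would.
state: Dict[str, Record] = {}
batch1 = [Record("a", 10, "x"), Record("a", 5, "stale"), Record("b", 7, "y")]
batch2 = [Record("a", 10, "dup"), Record("a", 12, "z")]

out1 = list(dedup_latest(batch1, state))  # keeps a@10 and b@7, drops a@5
out2 = list(dedup_latest(batch2, state))  # drops the repeated a@10, keeps a@12
```

With this choice a late or repeated record for an id is dropped, but downstream consumers may still see several successively newer versions of the same id. Emitting exactly once per id would need a timeout or watermark, which this sketch does not model.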
Any help would be much appreciated :)
On Mon, Jan 29, 2018 at 6:25 PM, ayan guha wrote:
Hi
I want to write something in Structured streaming:
1. I have a dataset which has 3 columns: id, last_update_timestamp,
attribute
2. I am receiving the data through Kinesis
I want to deduplicate records based on last_updated. In batch, it looks
like:
spark.sql("select * from (Select *, row_nu
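
The query above is cut off in the archive, so the following is only a guess at its general shape, not a reconstruction: a minimal sketch that ranks the rows of each id with row_number() and keeps the newest one. It runs against SQLite from the Python standard library purely so that it works without a cluster; the table name records, the alias rn and the sample rows are hypothetical, and the same SELECT has not been verified against spark.sql here.

```python
import sqlite3

# In-memory stand-in for the batch dataset; needs SQLite 3.25+ for window functions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id TEXT, last_update_timestamp INTEGER, attribute TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [("a", 10, "x"), ("a", 12, "z"), ("b", 7, "y")],
)

# Rank the rows of each id from newest to oldest, then keep only rank 1.
query = """
SELECT id, last_update_timestamp, attribute
FROM (
    SELECT *,
           row_number() OVER (
               PARTITION BY id ORDER BY last_update_timestamp DESC
           ) AS rn
    FROM records
) t
WHERE rn = 1
ORDER BY id
"""
latest = conn.execute(query).fetchall()
```

In batch this is unambiguous because all rows for an id are visible at once; in a streaming query the newest row for an id may not have arrived yet, which is why the output semantics have to be chosen explicitly.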