In Spark, if I want to deduplicate rows by id, keeping the row with the
latest timestamp for each id, I would do the following:

    from pyspark.sql import Window
    from pyspark.sql import functions as F

    # df is the source DataFrame; keep only the newest row per id
    deduped = df.withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("id")
                  .orderBy(F.col("timestamp").desc())
        )
    ).where(F.col("rn") == 1)

I see that Flink has windowing functionality, but I don't see anything for
row enumeration. How would I best achieve the above in Flink?
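
If Flink SQL's ROW_NUMBER() can be used the same way, I would expect
something like the rough PyFlink sketch below. The table name `events`, its
columns, and the datagen source are placeholders I made up for illustration:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    # Streaming Table environment
    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Placeholder source table; `events`, `id`, `payload`, and `ts` are
    # made-up names, and datagen just stands in for a real source
    t_env.execute_sql("""
        CREATE TABLE events (
            id INT,
            payload STRING,
            ts TIMESTAMP(3),
            WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
        ) WITH ('connector' = 'datagen', 'rows-per-second' = '5')
    """)

    # Same shape as the Spark query: ROW_NUMBER over a partition by id,
    # ordered by timestamp descending, keeping only rn = 1 (latest per id)
    deduped = t_env.sql_query("""
        SELECT id, payload, ts
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY id ORDER BY ts DESC
                   ) AS rn
            FROM events
        )
        WHERE rn = 1
    """)

    # On a stream this produces a changelog: earlier "latest" rows are
    # retracted as newer rows for the same id arrive
    deduped.execute().print()

My understanding is that ordering on a time attribute is what lets Flink
treat this as a deduplication query rather than a general OVER aggregation,
but I may be off, so corrections welcome.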

Thanks,
James.