Have you tried to join both datasets, filter accordingly, and then write the full dataset to your filesystem? Alternatively, work with a NoSQL database that you update by key (e.g. it sounds like a key/value store could be useful for you).
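
A rough batch sketch of that first idea (the column names column1/columnX are taken from your mail; the paths, app name and object name are placeholders I made up, so treat this as an untested sketch rather than a drop-in solution):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object LookupMergeSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("lookup-merge-sketch")
          .getOrCreate()

        // dataframe1: the existing lookup data from HDFS (placeholder path)
        val df1 = spark.read.parquet("hdfs:///data/lookup")

        // dataframe2: for a first batch-style test, the new records that would
        // otherwise arrive via Kafka (placeholder path)
        val df2 = spark.read.parquet("hdfs:///data/updates")

        // Full outer join on the key: rows matched in df2 take the new value,
        // rows only in df1 keep the old one, rows only in df2 come in as new records.
        val merged = df1
          .join(df2, df1("column1") === df2("columnX"), "full_outer")
          .select(coalesce(df2("columnX"), df1("column1")).as("column1"))

        // Write the complete, updated dataset somewhere other than the path
        // you are still reading from.
        merged.write.mode("overwrite").parquet("hdfs:///data/lookup_updated")

        spark.stop()
      }
    }

The full outer join plus coalesce is what gives the "replace if matched, otherwise keep or add" behaviour in one pass; only the key column is carried through here, so you would extend the select with whatever other columns you need.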
However, it could also be that you need to do more depending on your use case (for the Kafka side specifically, see the foreachBatch sketch after your quoted mail).

> On 14.08.2019 at 05:08, Shyam P <shyamabigd...@gmail.com> wrote:
>
> Hi,
> Any advice on how to do this in Spark SQL?
>
> I have a scenario as below:
>
> dataframe1 = loaded from an HDFS Parquet file.
>
> dataframe2 = read from a Kafka stream.
>
> If the column1 value of dataframe1 is in the columnX values of dataframe2, then I need to replace the column1 value of dataframe1.
>
> Else, add the column1 value of dataframe1 to dataframe2 as a new record.
>
> In a sense, I need to implement a lookup dataframe which is refreshable.
>
> For more information please check
> https://stackoverflow.com/questions/57479581/how-to-do-this-scenario-in-spark-streaming?noredirect=1#comment101437596_57479581
>
> Let me know if you need more info.
>
> Thanks
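
If dataframe2 really has to stay a Kafka stream, the same merge can run per micro-batch via Structured Streaming's foreachBatch. Again only a sketch under assumptions: the broker address, topic name, JSON value schema and all paths are placeholders I invented, and the lookup Parquet is simply re-read every batch to keep it refreshable:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object StreamingLookupSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("streaming-lookup-sketch")
          .getOrCreate()
        import spark.implicits._

        // Assumed shape of the Kafka message value: JSON with a single columnX field.
        val valueSchema = new StructType().add("columnX", StringType)

        val updates = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
          .option("subscribe", "updates")                      // placeholder topic
          .load()
          .select(from_json($"value".cast("string"), valueSchema).as("v"))
          .select($"v.columnX")

        // The per-batch merge, defined as a typed value to keep foreachBatch unambiguous.
        val mergeBatch: (DataFrame, Long) => Unit = (batch, _) => {
          // Re-read the lookup data each micro-batch so it stays refreshable.
          val lookup = batch.sparkSession.read.parquet("hdfs:///data/lookup")
          val merged = lookup
            .join(batch, lookup("column1") === batch("columnX"), "full_outer")
            .select(coalesce(batch("columnX"), lookup("column1")).as("column1"))
          // Write to a staging path, not the path read a few lines above.
          merged.write.mode("overwrite").parquet("hdfs:///data/lookup_updated")
        }

        val query = updates.writeStream
          .foreachBatch(mergeBatch)
          .start()

        query.awaitTermination()
      }
    }

A keyed NoSQL store would replace the overwrite-the-parquet step: inside foreachBatch you would upsert each record by key instead of rewriting the whole file.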