Have you tried joining both datasets, filtering accordingly, and then writing the 
full dataset back to your filesystem?
Alternatively, work with a NoSQL database that you update by key (e.g. it sounds 
like a key/value store could be useful for you).

However, it could also be that you need to do more, depending on your use case.
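
For illustration, here is a minimal sketch of the first suggestion. It assumes plain
batch DataFrames and hypothetical column names ("key", "value"); with Structured
Streaming you would run the same logic inside foreachBatch on each micro-batch, and
the Parquet paths are placeholders for your actual locations.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.coalesce

object LookupRefresh {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("lookup-refresh")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // dataframe1: in practice spark.read.parquet("hdfs:///path/to/lookup")
    val lookup: DataFrame = Seq(("k1", "old"), ("k2", "old")).toDF("key", "value")

    // dataframe2: stand-in for one micro-batch parsed from the Kafka stream
    val updates: DataFrame = Seq(("k1", "new"), ("k3", "new")).toDF("key", "value")

    // Full outer join gives upsert semantics: take the update's value when the key
    // matches, keep the old value otherwise, and append keys only present in updates.
    val refreshed = lookup.as("l")
      .join(updates.as("u"), $"l.key" === $"u.key", "full_outer")
      .select(
        coalesce($"u.key", $"l.key").as("key"),
        coalesce($"u.value", $"l.value").as("value")
      )

    refreshed.show()
    // Then write the complete refreshed dataset back out, e.g.
    // refreshed.write.mode("overwrite").parquet("hdfs:///path/to/lookup_refreshed")
    spark.stop()
  }
}
```

One caveat: overwriting the same Parquet path you are reading from in the same job is
unsafe, so write to a new location (or use a table format with real upserts such as
Delta Lake or Hudi) and swap paths afterwards.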

> Am 14.08.2019 um 05:08 schrieb Shyam P <shyamabigd...@gmail.com>:
> 
> Hi,
> Any advice on how to do this in Spark SQL?
> 
> I have a scenario as below
> 
> dataframe1 = loaded from an HDFS Parquet file.
> 
> dataframe2 = read from a Kafka stream.
> 
> If the column1 value of dataframe1 appears in the columnX values of dataframe2, 
> then I need to replace the column1 value of dataframe1. 
> 
> Otherwise, add the column1 value of dataframe1 to dataframe2 as a new record.
> 
> 
> 
> In a sense, I need to implement a lookup dataframe which is refreshable.
> 
> For more information, please check:
> 
> https://stackoverflow.com/questions/57479581/how-to-do-this-scenario-in-spark-streaming?noredirect=1#comment101437596_57479581
>  
> 
> Let me know if you need more info.
> 
> Thanks
