So the data in the FCST DataFrame is like this:

Product  fcst_qty
A        100
B        50

The Sales DF has data like this:

Order#  Item#  Sales qty
101     A      10
101     B      5
102     A      5
102     B      10

I want to update the FCST DF data, based on Product = Item#.

So the resultant FCST DF should have this data:

Product  fcst_qty
A        85
B        35

Hope it helps.

If I join the data between the 2 DFs (based on Product# and Item#), I will get a Cartesian join and my result will not be what I want.

Thanks for your help

From: Mike Metzger [mailto:[email protected]]
Sent: Friday, August 26, 2016 2:12 PM
To: Subhajit Purkayastha <[email protected]>
Cc: user @spark <[email protected]>
Subject: Re: Spark 2.0 - Insert/Update to a DataFrame

Without seeing exactly what you were wanting to accomplish, it's hard to say. A join is still probably the method I'd suggest, using something like:

    select (FCST.quantity - SO.quantity) as quantity, <other needed columns>
    from FCST
    LEFT OUTER JOIN SO ON FCST.productid = SO.productid
    WHERE <conditions>

with specifics depending on the layout and what language you're using.

Thanks

Mike

On Fri, Aug 26, 2016 at 3:29 PM, Subhajit Purkayastha <[email protected]> wrote:

Mike,

The grains of the DataFrames are different. I need to reduce the forecast qty (which is in the FCST DF) based on the sales qty (coming from the sales order DF).

Hope it helps

Subhajit

From: Mike Metzger [mailto:[email protected]]
Sent: Friday, August 26, 2016 1:13 PM
To: Subhajit Purkayastha <[email protected]>
Cc: user @spark <[email protected]>
Subject: Re: Spark 2.0 - Insert/Update to a DataFrame

Without seeing the makeup of the DataFrames nor what your logic is for updating them, I'd suggest doing a join of the Forecast DF with the appropriate columns from the SalesOrder DF.

Mike

On Fri, Aug 26, 2016 at 11:53 AM, Subhajit Purkayastha <[email protected]> wrote:

I am using Spark 2.0 and have 2 DataFrames, SalesOrder and Forecast.
I need to update the Forecast DataFrame record(s) based on the SalesOrder DF records. What is the best way to achieve this functionality?
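
A possible way to reconcile the two grains discussed in this thread is to total the sales per item first and only then do the LEFT OUTER JOIN, so each forecast row matches at most one row and no fan-out occurs. The sketch below is not from the thread: it runs that query on the sample rows quoted above using Python's built-in sqlite3 module so it works without a cluster, and the table and column names (fcst, sales, order_no, item, sales_qty, sold_qty) are assumptions made for illustration.

```python
import sqlite3

# Hypothetical table and column names; the rows are the sample data quoted in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fcst (product TEXT, fcst_qty INTEGER);
    CREATE TABLE sales (order_no INTEGER, item TEXT, sales_qty INTEGER);
    INSERT INTO fcst VALUES ('A', 100), ('B', 50);
    INSERT INTO sales VALUES (101, 'A', 10), (101, 'B', 5),
                             (102, 'A', 5), (102, 'B', 10);
""")

# Step 1 (subquery): collapse the order lines to one total per item.
# Step 2 (outer query): left-join the totals to the forecast and subtract.
# COALESCE keeps the forecast unchanged for products with no sales.
query = """
    SELECT f.product,
           f.fcst_qty - COALESCE(s.sold_qty, 0) AS fcst_qty
    FROM fcst AS f
    LEFT OUTER JOIN (
        SELECT item, SUM(sales_qty) AS sold_qty
        FROM sales
        GROUP BY item
    ) AS s ON f.product = s.item
    ORDER BY f.product
"""

updated_fcst = conn.execute(query).fetchall()
print(updated_fcst)  # [('A', 85), ('B', 35)]
```

In Spark 2.0 the same SELECT could presumably be passed to spark.sql after registering both DataFrames as temporary views with createOrReplaceTempView. Since DataFrames are immutable, the result would be a new DataFrame that replaces the old forecast rather than an in-place update.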
