Broadcast Join and Inner Join giving different result on same DataFrame

2016-12-30 Thread titli batali
Hi, I have two DataFrames that share a common column, Product_Id, on which I have to perform a join operation. val transactionDF = readCSVToDataFrame(sqlCtx: SQLContext, pathToReadTransactions: String, transactionSchema: StructType) val productDF = readCSVToDataFrame(sqlCtx: SQLContext, pathTo
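Logically, a broadcast join and a shuffle-based inner join are two physical strategies for computing the same inner join, so on identical inputs they are expected to return the same rows. The plain-Scala sketch below (no Spark dependency; the `Transaction` and `Product` row types and the sample data are hypothetical, and only the Product_Id join key comes from the thread) mimics a broadcast hash join by building a hash map from the smaller side and probing it with each row of the larger side, then checks the result against a naive nested-loop inner join.

```scala
// Hypothetical row types; only the Product_Id join key comes from the thread.
final case class Transaction(txId: Int, productId: String)
final case class Product(productId: String, name: String)

// Reference semantics of an inner join: every pair of rows whose keys match.
def nestedLoopInnerJoin(ts: Seq[Transaction], ps: Seq[Product]): Seq[(Transaction, Product)] =
  for {
    t <- ts
    p <- ps
    if t.productId == p.productId
  } yield (t, p)

// Broadcast hash join: build a hash table from the small side once
// (the side Spark would ship to every executor), then probe it with
// each row of the large side. Unmatched rows produce nothing (inner join).
def broadcastHashJoin(ts: Seq[Transaction], ps: Seq[Product]): Seq[(Transaction, Product)] = {
  val built: Map[String, Seq[Product]] = ps.groupBy(_.productId)
  for {
    t <- ts
    p <- built.getOrElse(t.productId, Seq.empty)
  } yield (t, p)
}

val transactions = Seq(
  Transaction(1, "P1"), Transaction(2, "P2"), Transaction(3, "P9"), Transaction(4, "P1"))
val products = Seq(Product("P1", "apple"), Product("P2", "pear"), Product("P2", "pear-dup"))

val viaLoop = nestedLoopInnerJoin(transactions, products)
val viaBroadcast = broadcastHashJoin(transactions, products)

// Join output order is not guaranteed in general, so compare as multisets.
def asMultiset[A](xs: Seq[A]): Map[A, Int] =
  xs.groupBy(identity).map { case (k, v) => k -> v.size }

println(asMultiset(viaLoop) == asMultiset(viaBroadcast))
```

In Spark itself, the broadcast variant is usually requested with `transactionDF.join(broadcast(productDF), Seq("Product_Id"))`, where `broadcast` comes from `org.apache.spark.sql.functions`.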

Re: Spark Partitioning Strategy with Parquet

2016-12-30 Thread titli batali
function meets certain criteria, such as being associative and commutative (say, addition or multiplication), you can use reduceByKey; otherwise you may use groupByKey. HTH. On 18 Nov 2016 06:45, "titli batali" wrote: That would help but again in a part
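The advice about reduceByKey can be illustrated without Spark. The sketch below is a hypothetical plain-Scala model (`reduceByKeySim` and `groupByKeySim` are made-up names, not Spark APIs) of what reduceByKey does: combine values per key inside each partition first, then merge the partial results. With addition, the answer does not depend on how the records are split across partitions; with subtraction, which is neither associative nor commutative, it does. The groupByKey model simply collects every value for a key before aggregating.

```scala
// Model of reduceByKey: per-partition combine, then merge the partials.
def reduceByKeySim[K, V](partitions: Seq[Seq[(K, V)]])(f: (V, V) => V): Map[K, V] = {
  val partials: Seq[Map[K, V]] = partitions.map { part =>
    part.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
  }
  partials.flatten.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
}

// Model of groupByKey: gather all values per key, with no partial combining.
def groupByKeySim[K, V](partitions: Seq[Seq[(K, V)]]): Map[K, Seq[V]] =
  partitions.flatten.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

// The same four records, split across two partitions in two different ways.
val layoutA = Seq(Seq("a" -> 1, "a" -> 2), Seq("a" -> 3, "b" -> 4))
val layoutB = Seq(Seq("a" -> 3), Seq("a" -> 1, "a" -> 2, "b" -> 4))

// Addition is associative and commutative: the layout does not matter.
val sumsA = reduceByKeySim(layoutA)(_ + _)
val sumsB = reduceByKeySim(layoutB)(_ + _)

// Subtraction is neither: the result depends on the layout.
val diffsA = reduceByKeySim(layoutA)(_ - _)
val diffsB = reduceByKeySim(layoutB)(_ - _)

// groupByKey followed by an explicit sum gives the same totals as reduceByKey(+).
val grouped = groupByKeySim(layoutA).map { case (k, vs) => k -> vs.sum }

println(s"sums equal: ${sumsA == sumsB}, diffs equal: ${diffsA == diffsB}")
```

The extra cost of groupByKey in real Spark comes from shuffling every value rather than one partial result per key and partition, which this in-memory model does not capture.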

Re: Spark Partitioning Strategy with Parquet

2016-11-17 Thread titli batali
t n letters of userid. On 17 November 2016 at 08:25, titli batali wrote: Hi, I have a use case where we have 1000 CSV files with a column user_Id, covering 8 million unique users. The data contains: userid, date, transaction, on which we run some quer
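The reply above is cut off, so its exact suggestion is not recoverable. As a hedged illustration of the general idea of deriving a coarser partition key from `user_Id` (so that data written with one Parquet partition directory per distinct value does not produce millions of directories), here is a hypothetical helper that maps each ID to one of a fixed number of hash buckets. This is a swapped-in technique, not necessarily what the reply proposed, and the default of 256 buckets is an arbitrary assumption.

```scala
// Hypothetical helper: map a user ID to one of `numBuckets` buckets.
// String.hashCode is deterministic, so a given user always lands in the
// same bucket; floorMod keeps the result non-negative for negative hashes.
def bucketOf(userId: String, numBuckets: Int = 256): Int =
  java.lang.Math.floorMod(userId.hashCode, numBuckets)

// Hypothetical sample IDs standing in for the real user_Id values.
val userIds = (1 to 10000).map(i => s"user$i")

// Group users by bucket to see how many distinct partition values result.
val buckets: Map[Int, Seq[String]] = userIds.groupBy(id => bucketOf(id))

println(s"${userIds.size} users -> ${buckets.size} buckets")
```

All transactions for one user then share a single bucket value, so a per-user query only needs to read one bucket, at the cost of scanning the other users that hash into it.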

Fwd: Spark Partitioning Strategy with Parquet

2016-11-17 Thread titli batali
Hi, I have a use case where we have 1000 CSV files with a column user_Id, covering 8 million unique users. The data contains: userid, date, transaction, on which we run some queries. We have a case where we need to iterate over each transaction on a particular date for each user. There are three nesting
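The nested iteration described above (for each user, for each date, for each transaction) can often be expressed as a single grouping by `(userid, date)` followed by per-group work, which avoids repeatedly filtering the full data set. The plain-Scala sketch below assumes a hypothetical `Txn` record with a numeric `amount` standing in for the `transaction` column, and computes a per-user, per-date total as an example of that per-group work. In Spark, the analogous step would typically be a DataFrame `groupBy` on both columns, or `reduceByKey` on an RDD keyed by `(userid, date)`.

```scala
// Hypothetical record type mirroring the columns named in the thread
// (userid, date, transaction); treating a transaction as a Double amount
// is an assumption.
final case class Txn(userId: String, date: String, amount: Double)

val txns = Seq(
  Txn("u1", "2016-11-01", 10.0), Txn("u1", "2016-11-01", 5.0),
  Txn("u1", "2016-11-02", 7.5), Txn("u2", "2016-11-01", 3.0))

// Group once by (user, date) instead of re-scanning all rows for every
// user and every date; each group can then be processed independently.
val byUserAndDate: Map[(String, String), Seq[Txn]] =
  txns.groupBy(t => (t.userId, t.date))

// Example per-group work: total transaction amount per user per date.
val dailyTotals: Map[(String, String), Double] =
  byUserAndDate.map { case (key, ts) => key -> ts.map(_.amount).sum }

// Print the groups in a stable (user, date) order.
for (((user, date), total) <- dailyTotals.toSeq.sortBy(_._1))
  println(s"$user $date $total")
```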