Re: Spark Partitioning Strategy with Parquet

2016-12-30 Thread titli batali
Yeah, it works for me. Thanks. On Fri, Nov 18, 2016 at 3:08 AM, ayan guha wrote: > Hi > I think you can use the map-reduce paradigm here. …

Re: Spark Partitioning Strategy with Parquet

2016-11-17 Thread ayan guha
Hi, I think you can use the map-reduce paradigm here. Create a key using user ID and date, and the record as a value. Then express your "do something" operation as a function. If the function meets certain criteria, such as being associative and commutative (like, say, addition or multiplication), you can use …
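
A minimal sketch of the suggestion above in Scala, assuming the records are csv lines of userid,date,transaction and that the "do something" step is an associative, commutative operation such as summing transaction amounts (the input path and field layout here are illustrative, not from the thread):

import org.apache.spark.sql.SparkSession

object PerUserDailyTotals {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("PerUserDailyTotals").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input path; the thread does not give one.
    val lines = sc.textFile("hdfs:///data/transactions/*.csv")

    // Key each record by (userid, date); keep the transaction amount as the value.
    val keyed = lines.map { line =>
      val Array(userId, date, txn) = line.split(",", 3)
      ((userId, date), txn.toDouble)
    }

    // Addition is associative and commutative, so reduceByKey can combine
    // values map-side before the shuffle; no nested iteration over users.
    val totals = keyed.reduceByKey(_ + _)

    totals.take(10).foreach(println)
    spark.stop()
  }
}

With 8 million users the key space is large, but reduceByKey only shuffles one partial result per key per partition, which is exactly why the function needs to be associative and commutative.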

Re: Spark Partitioning Strategy with Parquet

2016-11-17 Thread titli batali
That would help, but within a particular partition I would still need to iterate over all the customers whose user IDs share those first n letters. I want to get rid of nested iterations. Thanks. On Thu, Nov 17, 2016 at 10:28 PM, Xiaomeng Wan wrote: > You can partition on the first n letters …

Re: Spark Partitioning Strategy with Parquet

2016-11-17 Thread Xiaomeng Wan
You can partition on the first n letters of userid. On 17 November 2016 at 08:25, titli batali wrote: > Hi, > I have a use case where we have 1000 csv files with a column user_Id, having 8 million unique users. The data contains: userid,date,transaction, and we run some queries over it. > We …
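
A sketch of the prefix-partitioning idea in Scala, assuming the csv files carry a header and are rewritten as Parquet partitioned by a derived prefix column; the prefix length of 2 and the paths are illustrative assumptions, not from the thread:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, substring}

object PartitionByUserPrefix {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("PartitionByUserPrefix").getOrCreate()

    // Hypothetical input path; schema is userid,date,transaction per the thread.
    val df = spark.read
      .option("header", "true")
      .csv("hdfs:///data/transactions/*.csv")

    // Derive a partition key from the first 2 characters of user_Id.
    val withPrefix = df.withColumn("uid_prefix", substring(col("user_Id"), 1, 2))

    // Partitioning the Parquet output by the prefix keeps each user's rows in
    // one directory, so per-user queries read only a small slice of the data.
    withPrefix.write
      .partitionBy("uid_prefix")
      .parquet("hdfs:///data/transactions.parquet")

    spark.stop()
  }
}

Each distinct 2-character prefix becomes a uid_prefix=XX directory in the output, and a filter on that prefix lets Spark prune every other directory at read time.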