Re: Batch aggregation by sliding window + join

2015-05-30 Thread Igor Berman
yes, I see now. In the case of 3 days it's indeed possible; however, if I want to hold a 30-day (or even bigger) block aggregation, it will be a bit slow. For the sake of history: I've found several directions in which I can improve shuffling (from the video https://www.youtube.com/watch?v=Wg2boMqLjCg), e.g.

Re: Batch aggregation by sliding window + join

2015-05-29 Thread ayan guha
My point is that if you keep the daily aggregates already computed, then you do not reprocess the raw data. But yeah, you may decide to recompute the last 3 days every day. On 29 May 2015 23:52, "Igor Berman" wrote: > Hi Ayan, > thanks for the response > I'm using 1.3.1. I'll check window queries(I dont use spark-sql
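The suggestion above can be sketched in plain Python (no Spark), just to make the idea concrete: materialize one aggregate per day, rebuild only the most recent days from raw events, and combine the daily blocks to answer a sliding-window query. All function and variable names here are illustrative, not anything from the thread.

```python
from collections import defaultdict
from datetime import date, timedelta

def aggregate_day(raw_events):
    """Sum raw (key, value) events into a per-key daily aggregate."""
    agg = defaultdict(int)
    for key, value in raw_events:
        agg[key] += value
    return dict(agg)

def sliding_window(daily_aggs, end_day, window_days):
    """Combine precomputed daily aggregates over the last `window_days` days."""
    total = defaultdict(int)
    for offset in range(window_days):
        day = end_day - timedelta(days=offset)
        for key, value in daily_aggs.get(day, {}).items():
            total[key] += value
    return dict(total)

# Historical daily aggregates are reused as-is...
daily_aggs = {date(2015, 5, d): {"k1": d} for d in range(1, 27)}
# ...and only the last 3 days are recomputed from raw events.
for d in (27, 28, 29):
    daily_aggs[date(2015, 5, d)] = aggregate_day([("k1", d), ("k2", 1)])

window = sliding_window(daily_aggs, date(2015, 5, 29), 3)
```

This works because a sum is associative: a 30-day window is just 30 small per-day dictionaries merged, which is far cheaper than rescanning 30 days of raw data.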

Re: Batch aggregation by sliding window + join

2015-05-29 Thread Igor Berman
Hi Ayan, thanks for the response. I'm using 1.3.1. I'll check window queries (I don't use spark-sql... only core; maybe I should?). What do you mean by materialized? I can repartitionAndSort the daily aggregation by key; however, I don't quite understand how it will help with yesterday's block, which needs

Re: Batch aggregation by sliding window + join

2015-05-28 Thread ayan guha
Which version of Spark? In 1.4, window queries will show up for these kinds of scenarios. One thing I can suggest is to keep the daily aggregates materialized, partitioned by key, and sorted by the key-day combination using the repartitionAndSortWithinPartitions method. It allows you to use a custom partitioner and custom sorter. Be
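To illustrate what that method gives you, here is a plain-Python sketch (not Spark) of the same idea: a custom partitioner routes every record for a key to one partition, and each partition is sorted by the (key, day) composite so a key's daily blocks end up contiguous and ordered. The names are hypothetical; on a Spark pair RDD the real call is `rdd.repartitionAndSortWithinPartitions(partitioner)`.

```python
def partition_and_sort(records, num_partitions):
    """records: iterable of ((key, day), value) pairs.

    Partitions on the key alone, then sorts each partition by (key, day),
    mimicking repartitionAndSortWithinPartitions with a key-only partitioner.
    """
    partitions = [[] for _ in range(num_partitions)]
    for (key, day), value in records:
        # Custom partitioner: hash the key only, so all days for a key
        # land in the same partition.
        partitions[hash(key) % num_partitions].append(((key, day), value))
    for part in partitions:
        # Custom sorter: order within the partition by the composite key.
        part.sort(key=lambda kv: kv[0])
    return partitions

records = [(("b", 2), 20), (("a", 1), 1), (("a", 2), 2), (("b", 1), 10)]
parts = partition_and_sort(records, 2)
```

With this layout, joining yesterday's block against the running aggregate becomes a cheap merge within each partition, with no extra shuffle.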