Yes, I see now. For a 3-day window it's indeed possible; however, if I want
to hold a 30-day (or even larger) block aggregation it will be a bit slow.
For the sake of history:
I've found several directions in which I can improve shuffling (from the
video https://www.youtube.com/watch?v=Wg2boMqLjCg), e.g.
My point is: if you keep the daily aggregates already computed, then you do
not reprocess the raw data. But yes, you may decide to recompute the last 3
days every day.
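The idea above can be sketched in plain Python (Spark-free; the names `raw_events`, `daily_agg`, `aggregate_day`, etc. are all illustrative, not from the thread): older daily aggregates are reused from a cache, and only the last few days are recomputed from raw data, so a 30-day block is assembled from 30 cheap lookups.

```python
from datetime import date, timedelta

# Illustrative data: raw_events maps a day to its raw records,
# daily_agg caches the materialised per-day aggregate (here: a sum).
raw_events = {date(2015, 5, d): list(range(d)) for d in range(1, 31)}
daily_agg = {}

def aggregate_day(day):
    """Recompute the aggregate for one day from raw data (the expensive part)."""
    return sum(raw_events.get(day, []))

def refresh(today, recompute_last=3):
    """Reuse cached daily aggregates; reprocess only the last N days."""
    for day in raw_events:
        age = (today - day).days
        if day not in daily_agg or 0 <= age < recompute_last:
            daily_agg[day] = aggregate_day(day)

def window_total(today, window_days=30):
    """The 30-day block aggregate is built from daily aggregates, not raw data."""
    return sum(daily_agg.get(today - timedelta(days=i), 0)
               for i in range(window_days))

refresh(date(2015, 5, 30))
print(window_total(date(2015, 5, 30)))  # → 4495
```

After the first full pass, each daily `refresh` touches raw data for only the last 3 days, which is the saving being discussed.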
On 29 May 2015 23:52, "Igor Berman" wrote:
Hi Ayan,
thanks for the response.
I'm using 1.3.1. I'll check window queries (I don't use Spark SQL, only
core; maybe I should?).
What do you mean by materialised? I can repartitionAndSort the daily
aggregation by key; however, I don't quite understand how it will help with
yesterday's block, which needs
Which version of Spark? In 1.4, window queries will show up for these kinds
of scenarios.
One thing I can suggest is to keep the daily aggregates materialised,
partitioned by key and sorted by the key-day combination using the
repartitionAndSortWithinPartitions method. It allows you to use a custom
partitioner and a custom sort ordering.
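What that buys you can be shown with a Spark-free Python sketch (everything here is illustrative; in Spark core the real call is `repartitionAndSortWithinPartitions` on a pair RDD): a custom partitioner routes every record of a key to one fixed partition, and each partition is then sorted by the (key, day) combination, so all days of one key end up contiguous and in order.

```python
# Illustrative simulation of partition-by-key + sort-within-partition.
# Records are (key, day) pairs; NUM_PARTITIONS is an arbitrary choice.
NUM_PARTITIONS = 4

def partitioner(key):
    # Custom partitioner: the same key always lands in the same partition.
    return hash(key) % NUM_PARTITIONS

def repartition_and_sort(records):
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for key, day in records:
        partitions[partitioner(key)].append((key, day))
    for part in partitions:
        part.sort()  # sort within each partition by the (key, day) combination
    return partitions

records = [("b", 2), ("a", 3), ("b", 1), ("a", 1), ("a", 2)]
parts = repartition_and_sort(records)
# Each key's daily aggregates now sit together, ordered by day, so a
# per-key time window can be computed with one sequential scan per partition.
```

Because all days of a key are adjacent and sorted, updating a rolling window for that key never requires a further shuffle.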
Be