RangePartitioner?
At least for joins, you could implement your own partitioner to take
advantage of the already-sorted data.
Just my 2 cents.
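To illustrate the suggestion: the core idea behind Spark's RangePartitioner is that sorted range bounds decide which partition a key lands in, so keys stay in order across partitions. Below is a minimal, Spark-free sketch of that idea (the function name and linear scan are my own; the real class samples the RDD to choose its bounds):

```scala
// Sketch of range partitioning: given ascending bounds, a key goes to
// the partition whose range contains it. bounds of length n define
// n + 1 partitions.
def rangePartition[K](bounds: IndexedSeq[K], key: K)
                     (implicit ord: Ordering[K]): Int = {
  var p = 0
  // advance past every bound the key exceeds (linear scan for brevity)
  while (p < bounds.length && ord.gt(key, bounds(p))) p += 1
  p
}

// 4 partitions: (-inf,10], (10,20], (20,30], (30,+inf)
val bounds = IndexedSeq(10, 20, 30)
println(rangePartition(bounds, 5))   // partition 0
println(rangePartition(bounds, 15))  // partition 1
println(rangePartition(bounds, 99))  // partition 3
```

A custom partitioner that mirrors the existing sort order of your data would let a join co-locate matching keys without re-sorting.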
Date: Wed, 11 Mar 2015 17:38:04 -0400
Subject: can spark take advantage of ordered data?
From: [email protected]
To: [email protected]
Hello all,
I am wondering if Spark already has support for optimizations on sorted data,
and/or if such support could be added (I am comfortable dropping to a lower
level to implement this if necessary, but I'm not sure whether it is possible
at all).
Context: we have a number of data sets which are essentially already sorted on
a key. With our current systems, we can take advantage of this to do a lot of
analysis very efficiently: merges and joins, for example, as well as folds on
a secondary key, and so on.
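The kind of efficiency being described can be made concrete with a sorted-merge join: when both inputs are already sorted on the key, a single linear pass over each side joins them with no hashing or re-sorting. A small sketch (names and the unique-keys-per-side simplification are my own):

```scala
import scala.collection.mutable.ArrayBuffer

// Inner join of two key-sorted sequences in O(n + m), assuming each
// side is sorted ascending by key and keys are unique per side.
def mergeJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)])
                      (implicit ord: Ordering[K]): Seq[(K, (A, B))] = {
  val out = ArrayBuffer.empty[(K, (A, B))]
  var i = 0
  var j = 0
  while (i < left.length && j < right.length) {
    val c = ord.compare(left(i)._1, right(j)._1)
    if (c < 0) i += 1           // left key too small: advance left
    else if (c > 0) j += 1      // right key too small: advance right
    else {                      // keys match: emit the joined pair
      out += ((left(i)._1, (left(i)._2, right(j)._2)))
      i += 1; j += 1
    }
  }
  out.toSeq
}

// Both sides already sorted by key; each is scanned exactly once.
val users  = Seq(1 -> "alice", 2 -> "bob", 4 -> "dora")
val scores = Seq(2 -> 90, 3 -> 75, 4 -> 60)
println(mergeJoin(users, scores))
```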
Would Spark be a good fit for implementing these sorts of optimizations?
Obviously it is something of a niche case, but would it be achievable? Any
pointers on where I should look?