On Wed, Nov 16, 2016 at 1:54 PM, Yakov Zhdanov <yzhda...@apache.org> wrote:
> > On Wed, Nov 16, 2016 at 11:22 AM, Yakov Zhdanov <yzhda...@apache.org>
> > wrote:
> >
> > > > Yakov, I agree that such scenario should be avoided. I also think
> > > > that loadCache(...) method, as it is right now, provides a way to
> > > > avoid it.
> > >
> > > No, it does not.
> >
> > Yes it does :)
>
> No it doesn't. Load cache should either send a query to DB that filters all
> the data on server side which, in turn, may result to full-scan of 2 Tb
> data set dozens of times (equal to node count) or send a query that brings
> the whole dataset to each node which is unacceptable as well.

Why not store the partition ID in the database and query only local
partitions? Whatever approach we design with a DataStreamer will be slower
than this.
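
A minimal sketch of the partition ID idea, under stated assumptions: the
table name PERSON, the column name PART_ID, and the helper names below are
all hypothetical, and the modulo partition function is only a stand-in. In
Ignite the partition of a key would more likely come from
ignite.affinity(cacheName).partition(key) at write time, and the list of
local partitions from
ignite.affinity(cacheName).primaryPartitions(ignite.cluster().localNode())
inside loadCache(...). The sketch only shows how each node could restrict
its query to the rows it owns.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class LocalPartitionQuery {
    /**
     * Hypothetical stand-in for the cache affinity function. Whatever is used
     * here must be the same function that populated PART_ID in the table.
     */
    static int partitionOf(Object key, int partitionCount) {
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Builds a query that selects only rows belonging to the given partitions. */
    static String buildLoadQuery(String table, int[] localPartitions) {
        if (localPartitions.length == 0)
            throw new IllegalArgumentException("Node owns no partitions");

        String ids = Arrays.stream(localPartitions)
            .mapToObj(Integer::toString)
            .collect(Collectors.joining(", "));

        // PART_ID is an assumed column name; it holds small integers only,
        // so plain concatenation is safe for this illustration.
        return "SELECT * FROM " + table + " WHERE PART_ID IN (" + ids + ")";
    }

    public static void main(String[] args) {
        // Assumed example: this node is primary for partitions 0, 7 and 42.
        int[] local = {0, 7, 42};
        System.out.println(buildLoadQuery("PERSON", local));
    }
}
```

With an index on PART_ID, each node should read roughly its own share of
the data rather than scanning or transferring the whole set, though that
depends on the database and on the stored partition IDs matching the
current affinity function.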