On Wed, Nov 16, 2016 at 1:54 PM, Yakov Zhdanov <yzhda...@apache.org> wrote:

> > On Wed, Nov 16, 2016 at 11:22 AM, Yakov Zhdanov <yzhda...@apache.org> wrote:
> >
> > > > Yakov, I agree that such a scenario should be avoided. I also think that
> > > > the loadCache(...) method, as it is right now, provides a way to avoid it.
> > >
> > > No, it does not.
> >
> > Yes it does :)
>
> No it doesn't. Load cache should either send a query to the DB that filters
> all the data on the server side, which in turn may result in a full scan of
> the 2 TB data set dozens of times (once per node), or send a query that
> brings the whole data set to each node, which is unacceptable as well.
>

Why not store the partition ID in the database and query only local
partitions? Whatever approach we design with a DataStreamer will be slower
than this.
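
To make the suggestion concrete, here is a rough sketch of the query each node
could issue from its loadCache(...) implementation. It is only an illustration
under assumptions: the table name, the `part_id` column, and the class and
method names are hypothetical, and the column is assumed to be filled at write
time with the key's partition (for example from Affinity.partition(key)). On a
node, the local partition IDs could come from
ignite.affinity(cacheName).primaryPartitions(ignite.cluster().localNode()).

```java
import java.util.StringJoiner;

/** Sketch only: builds a query restricted to the partitions a node owns. */
final class PartitionQuerySketch {
    /**
     * Returns a parameterized SELECT that matches only rows whose partition
     * column is one of the given local partition IDs. The caller binds the
     * IDs to the '?' placeholders in the same order.
     *
     * Table and column names are hypothetical and must come from trusted
     * configuration, since they are concatenated into the SQL text.
     */
    static String buildLocalPartitionQuery(String table, String partitionColumn, int[] localPartitions) {
        // A node that owns no partitions should load nothing at all.
        if (localPartitions.length == 0) {
            return "SELECT * FROM " + table + " WHERE 1 = 0";
        }

        // One placeholder per local partition: "(?, ?, ?)".
        StringJoiner placeholders = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < localPartitions.length; i++) {
            placeholders.add("?");
        }

        return "SELECT * FROM " + table + " WHERE " + partitionColumn + " IN " + placeholders;
    }
}
```

With an index on the partition column, each node would read only its own
slice of the data instead of scanning or transferring the whole data set.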
