Re: on shark, is tachyon less efficient than memory_only cache strategy ?

Aaron Davidson Tue, 08 Jul 2014 09:59:15 -0700

Shark's in-memory format is already serialized (it's compressed and
column-based).
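
To make the mmap point from my earlier message (quoted below) a bit more concrete, here is a rough sketch in Python. It is not Shark or Tachyon code. The file layout (a flat array of native ints), the helper names write_int_column and sum_column_mmap, and the temporary file standing in for a ramdisk file are all assumptions made up for illustration. It only shows the general idea: when the bytes are already in a serialized, column-like layout, a reader can map the file and consume values straight from the mapping, without a separate deserialization pass.

```python
import array
import mmap
import os
import tempfile


def write_int_column(path, values):
    """Write values as a flat array of native C ints.

    This is a hypothetical stand-in for one serialized column block.
    """
    with open(path, "wb") as f:
        array.array("i", values).tofile(f)


def sum_column_mmap(path):
    """Map the file read-only and sum the ints directly from the mapping.

    The values are read in place through a typed view of the mapped
    bytes; nothing is unpickled or rebuilt into an intermediate list.
    The file must be non-empty, since an empty file cannot be mapped.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Release both views before the mapping is closed.
            with memoryview(mapped) as raw, raw.cast("i") as ints:
                return sum(ints)


if __name__ == "__main__":
    # A temp file is used here; on a real system the file would sit
    # on a ramdisk mount (an assumption, any path works for the demo).
    path = os.path.join(tempfile.mkdtemp(), "column.bin")
    write_int_column(path, [1, 2, 3, 4])
    print(sum_column_mmap(path))
```

The real cost comparison depends on the actual storage format and on what the query does with the data, so treat this only as an illustration of reading in place versus copying and deserializing.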



On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan <[email protected]>
wrote:

> You are ignoring serde costs :-)
>
> - Mridul
>
> On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson <[email protected]> wrote:
> > Tachyon should only be marginally less performant than memory_only, because
> > we mmap the data from Tachyon's ramdisk. We do not have to, say, transfer
> > the data over a pipe from Tachyon; we can directly read from the buffers in
> > the same way that Shark reads from its in-memory columnar format.
> >
> >
> >
> > On Tue, Jul 8, 2014 at 1:18 AM, qingyang li <[email protected]>
> > wrote:
> >
> >> Hi, when I create a table, I can choose the cache strategy using
> >> shark.cache. I think "shark.cache=memory_only" means the data is managed
> >> by Spark and lives in the same JVM as the executor, while
> >> "shark.cache=tachyon" means the data is managed by Tachyon, which is off
> >> heap, so the data is not in the same JVM as the executor and Spark will
> >> load it from Tachyon for each SQL query. So, is Tachyon less efficient
> >> than the memory_only cache strategy?
> >> If yes, can we have Spark load all the data from Tachyon once for all SQL
> >> queries? I would like to use the Tachyon cache strategy because Tachyon
> >> is more HA than memory_only.
> >>
>
