Yes. For Shark, the two modes, "shark.cache=tachyon" and "shark.cache=memory", have the same ser/de overhead. In Tachyon mode, Shark loads data from outside of the process, which has the following benefits:
- In-memory data sharing across multiple Shark instances (i.e. stronger isolation)
- Instant recovery of in-memory tables
- Reduced heap size => faster GC in Shark
- If the table is larger than the memory size, only the hot columns will be cached in memory

(from http://tachyon-project.org/master/Running-Shark-on-Tachyon.html and https://github.com/amplab/shark/wiki/Running-Shark-with-Tachyon)

Haoyuan

On Tue, Jul 8, 2014 at 9:58 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> Shark's in-memory format is already serialized (it's compressed and
> column-based).
>
> On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>
> > You are ignoring serde costs :-)
> >
> > - Mridul
> >
> > On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson <ilike...@gmail.com> wrote:
> > > Tachyon should only be marginally less performant than memory_only, because
> > > we mmap the data from Tachyon's ramdisk. We do not have to, say, transfer
> > > the data over a pipe from Tachyon; we can directly read from the buffers in
> > > the same way that Shark reads from its in-memory columnar format.
> > >
> > > On Tue, Jul 8, 2014 at 1:18 AM, qingyang li <liqingyang1...@gmail.com> wrote:
> > >
> > >> Hi, when I create a table, I can specify the cache strategy using
> > >> shark.cache. I think "shark.cache=memory_only" means the data is managed
> > >> by Spark and lives in the same JVM as the executor, while
> > >> "shark.cache=tachyon" means the data is managed by Tachyon, which is off
> > >> heap, and the data is not in the same JVM as the executor, so Spark will
> > >> load data from Tachyon for each SQL query. So, is Tachyon less efficient
> > >> than the memory_only cache strategy?
> > >> If yes, can we let Spark load all the data from Tachyon once for all SQL
> > >> queries, if I want to use the Tachyon cache strategy since Tachyon is
> > >> more HA than memory_only?
--
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/