RE: AVRO File size when caching in-memory

2016-11-16 Thread Shreya Agarwal
.@microsoft.com>; user@spark.apache.org
Subject: Re: AVRO File size when caching in-memory
It's something like the schema shown below (with several additional levels/sublevels)
root
 |-- sentAt: long (nullable = true)
 |-- sharing: string (nullable

Re: AVRO File size when caching in-memory

2016-11-16 Thread Prithish
>>> The purpose of these formats is to store data to persistent storage in a way that's faster to read from, not to reduce cache-memory usage.
>>>
>>> Maybe others here have more info to share.
>>>

Re: AVRO File size when caching in-memory

2016-11-16 Thread Takeshi Yamamuro
sage.
>>
>> Maybe others here have more info to share.
>>
>> Regards,
>> Shreya
>>
>> Sent from my Windows 10 phone
>>
>> *From: *Prithish
>> *Sent: *Tuesday,

Re: AVRO File size when caching in-memory

2016-11-16 Thread Prithish
nfo to share.
>
> Regards,
> Shreya
>
> Sent from my Windows 10 phone
>
> *From: *Prithish
> *Sent: *Tuesday, November 15, 2016 11:04 PM
> *To: *Shreya Agarwal
> *Subject: *Re: AVRO File size when caching in-memory
>
> I d

RE: AVRO File size when caching in-memory

2016-11-15 Thread Shreya Agarwal
gards,
Shreya
Sent from my Windows 10 phone
From: Prithish <prith...@gmail.com>
Sent: Tuesday, November 15, 2016 11:04 PM
To: Shreya Agarwal <shrey...@microsoft.com>
Subject: Re: AVRO File size when caching in-memory
I did another test and am noting my observations here. These w

Re: AVRO File size when caching in-memory

2016-11-15 Thread Prithish
Anyone?
On Tue, Nov 15, 2016 at 10:45 AM, Prithish wrote:
> I am using Spark 2.0.1 and the Databricks Avro library 3.0.1. I am running this on the latest AWS EMR release.
>
> On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke wrote:
>
>> Spark version? Are you using Tungsten?
>>
>> > On 14 Nov 2016, at 10:05

Re: AVRO File size when caching in-memory

2016-11-14 Thread Prithish
I am using Spark 2.0.1 and the Databricks Avro library 3.0.1. I am running this on the latest AWS EMR release.
On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke wrote:
> Spark version? Are you using Tungsten?
>
> > On 14 Nov 2016, at 10:05, Prithish wrote:
> >
> > Can someone please explain why this happens?

Re: AVRO File size when caching in-memory

2016-11-14 Thread Jörn Franke
Spark version? Are you using Tungsten?
> On 14 Nov 2016, at 10:05, Prithish wrote:
>
> Can someone please explain why this happens?
>
> When I read a 600 KB Avro file and cache it in memory (using cacheTable), it shows up as 11 MB (Storage tab in the Spark UI). I have tried this with different