Re: Parquet-like partitioning support in spark SQL's in-memory columnar cache

2016-11-28 Thread Nitin Goyal
s what you were referring to originally? Thanks -Nitin On Fri, Nov 25, 2016 at 11:29 AM, Reynold Xin wrote: > It's already there, isn't it? The in-memory columnar cache format. > > > On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal > wrote: > >> Hi, >>

Parquet-like partitioning support in spark SQL's in-memory columnar cache

2016-11-24 Thread Nitin Goyal
Hi, Do we have any plan of supporting Parquet-like partitioning in Spark SQL's in-memory cache? Something like one RDD[CachedBatch] per in-memory cache partition. -Nitin
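
A minimal sketch (not from the thread; the path and partition values are illustrative) of the workaround such partitioning support would replace: caching one DataFrame per partition value so that a lookup by "dt" only touches that partition's CachedBatches.

    import org.apache.spark.sql.SparkSession

    // Illustrative only: one cached DataFrame per partition value, so a query
    // for a given "dt" scans only that partition's in-memory CachedBatches.
    object PerPartitionCacheSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("per-partition-cache").getOrCreate()
        val events = spark.read.parquet("/data/events")      // partitioned by "dt" on disk
        val cachedByDay = Seq("2016-11-23", "2016-11-24").map { day =>
          day -> events.filter(events("dt") === day).cache() // separate cache entry per day
        }.toMap
        println(cachedByDay("2016-11-24").count())           // materialises only that day's cache
        spark.stop()
      }
    }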

Continuous warning while consuming using new kafka-spark010 API

2016-09-19 Thread Nitin Goyal
ew API? Is this the expected behaviour or am I missing something here? -- Regards Nitin Goyal

Re: Ever increasing physical memory for a Spark Application in YARN

2016-05-03 Thread Nitin Goyal
Hi Daniel, I could indeed discover the problem in my case, and it turned out to be a bug on the Parquet side; I had raised and contributed to the following issue :- https://issues.apache.org/jira/browse/PARQUET-353 Hope this helps! Thanks -Nitin On Mon, May 2, 2016 at 9:15 PM, Daniel Darabos

Re: Secondary Indexing of RDDs?

2015-12-14 Thread Nitin Goyal
Spark SQL's in-memory cache stores statistics per column, which in turn are used to skip batches (default size 10000) within a partition: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala#L25 Hope this helps. Thanks -Niti
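
For reference, a minimal sketch (not from the thread; the table name and predicate are made up, and it uses the current API): the batch size controls how many rows go into each CachedBatch, and the per-column min/max statistics let a filter skip whole batches.

    import org.apache.spark.sql.SparkSession

    // Illustrative only: each CachedBatch keeps min/max stats per column, so
    // batches whose ranges cannot match the predicate are skipped at scan time.
    object BatchSkippingSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("batch-skipping").getOrCreate()
        // Rows per CachedBatch; 10000 is the documented default.
        spark.conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "10000")
        spark.range(0, 1000000).toDF("id").createOrReplaceTempView("events")
        spark.sql("CACHE TABLE events")
        // Only batches whose [min, max] range for `id` overlaps 42 are read.
        spark.sql("SELECT count(*) FROM events WHERE id = 42").show()
        spark.stop()
      }
    }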

Re: Running individual test classes

2015-11-03 Thread Nitin Goyal
t. I've tried to >> look this up in the mailing list archives but haven't had luck so far. >> >> How can I run a single test suite? Thanks in advance! >> >> -- >> BR, >> Stefano Baghino >> > > -- Regards Nitin Goyal

Re: want to contribute

2015-10-29 Thread Nitin Goyal
You both can check out the following links :- https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark http://spark.apache.org/docs/latest/building-spark.html Thanks -Nitin On Thu, Oct 29, 2015 at 4:13 PM, Aadi Thakar wrote: > Hello, my name is Aaditya Thakkar and I am a sec

Re: Operations with cached RDD

2015-10-11 Thread Nitin Goyal
"(memory)" written, which means the input data has been fetched from memory (your cached RDD). As far as the lineage/call site is concerned, I think there was a change in Spark 1.3 which excluded some classes from appearing in the call site (I know that some Spark SQL related classes were removed for sure). Thanks
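
A self-contained sketch (not from the thread) of the behaviour described: after the first action populates the cache, later actions read from memory, and toDebugString reports CachedPartitions for the cached RDD.

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative only: the second count() is served from the in-memory cache,
    // and toDebugString lists CachedPartitions once the cache is populated.
    object CachedRddSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("cached-rdd").setMaster("local[*]"))
        val rdd = sc.parallelize(1 to 1000000).map(_ * 2).cache()
        rdd.count()                 // first action computes and caches the partitions
        rdd.count()                 // second action reads the cached data from memory
        println(rdd.toDebugString)  // shows "CachedPartitions: ..." for the cached RDD
        sc.stop()
      }
    }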

Re: [ compress in-memory column storage used in sparksql cache table ]

2015-09-02 Thread Nitin Goyal
I think Spark SQL's in-memory columnar cache already does compression. Check out the classes in the following path :- https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/compression Although the compression ratio is not as good as Parquet's. Thanks -
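
A minimal sketch (not from the thread) of the setting that controls this compression; it defaults to true, so cached data is compressed unless it is switched off.

    import org.apache.spark.sql.SparkSession

    // Illustrative only: spark.sql.inMemoryColumnarStorage.compressed (default
    // true) toggles compression of the columnar cache built by cache()/CACHE TABLE.
    object CacheCompressionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("cache-compression").getOrCreate()
        spark.conf.set("spark.sql.inMemoryColumnarStorage.compressed", "true")
        val df = spark.range(0, 1000000).toDF("id")
        df.cache()
        df.count()   // materialises the compressed in-memory columnar cache
        spark.stop()
      }
    }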

Ever increasing physical memory for a Spark Application in YARN

2015-07-27 Thread Nitin Goyal
I am running a Spark application in YARN with 2 executors, Xms/Xmx of 32 GB each, and spark.yarn.executor.memoryOverhead of 6 GB. I am seeing that the app's physical memory is ever increasing, and it finally gets killed by the node manager: 2015-07-25 15:07:05,354 WARN org.apache.hadoop.yarn.server.nod
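
For context, a configuration sketch mirroring the numbers above (not a fix): the overhead setting, in MB, is the headroom YARN grants for off-heap memory (direct buffers, netty, thread stacks) that the 32g heap limit does not cap, and it is this total that the node manager's physical-memory check enforces.

    import org.apache.spark.SparkConf

    // Illustrative configuration only, mirroring the setup described above.
    object YarnMemorySketch {
      val conf = new SparkConf()
        .setAppName("yarn-memory-example")
        .set("spark.executor.instances", "2")
        .set("spark.executor.memory", "32g")                // executor heap (Xmx)
        .set("spark.yarn.executor.memoryOverhead", "6144")  // MB of off-heap headroom
    }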

Re: ClosureCleaner slowing down Spark SQL queries

2015-05-30 Thread Nitin Goyal
Thanks Josh and Yin. Created the following JIRA for the same :- https://issues.apache.org/jira/browse/SPARK-7970 Thanks -Nitin -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tp12466p12515.html Sent from

Re: ClosureCleaner slowing down Spark SQL queries

2015-05-27 Thread Nitin Goyal
for a single query. I also looked at the fix's code diff and it wasn't related to the problem, which seems to exist in the ClosureCleaner code. Thanks -Nitin -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-S

ClosureCleaner slowing down Spark SQL queries

2015-05-27 Thread Nitin Goyal
://pasteboard.co/MnQtB4o.png http://pasteboard.co/MnrzHwJ.png Any help/suggestion to fix this will be highly appreciated, since this needs to be fixed for production. Thanks in advance, Nitin -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner

Guidance for becoming Spark contributor

2015-04-10 Thread Nitin Mathur
Hi Spark Dev Team, I want to start contributing to Spark open source. This is the first time I will be doing any open source contributions. It would be great if I could get some guidance on where to start. Thanks, - Nitin

Does Spark delete shuffle files of lost executor in running system(on YARN)?

2015-02-24 Thread nitin
river? Thanks -Nitin -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Does-Spark-delete-shuffle-files-of-lost-executor-in-running-system-on-YARN-tp10755.html Sent from the Apache Spark Developers List mailing list archive at Nabbl

Re: Spark SQL - Long running job

2015-02-22 Thread nitin
I believe calling processedSchemaRdd.persist(DISK) and processedSchemaRdd.checkpoint() only persists the data; I will lose all the RDD metadata, and when I re-start my driver, that data is kind of useless for me (correct me if I am wrong). I thought of doing processedSchemaRdd.saveAsParquetFile (hdf
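
A sketch of that idea with the Spark 1.2-era API (not from the thread; the paths are made up): write the processed results to Parquet so a restarted driver can reload them instead of depending on the lost RDD lineage.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Illustrative only: hand the data off through Parquet across driver restarts.
    object ParquetHandoffSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("parquet-handoff"))
        val sqlContext = new SQLContext(sc)
        // Stand-in for the real processed SchemaRDD produced by the long-running job.
        val processedSchemaRdd = sqlContext.parquetFile("hdfs:///input/events")
        processedSchemaRdd.saveAsParquetFile("hdfs:///handoff/processed")
        // After a driver restart, reload the saved results and re-register them.
        val restored = sqlContext.parquetFile("hdfs:///handoff/processed")
        restored.registerTempTable("processed")
        sc.stop()
      }
    }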

Spark SQL - Long running job

2015-02-21 Thread nitin
ickly and going out of space as it's a long-running Spark job (running Spark in yarn-client mode, btw). Thanks -Nitin -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-Long-running-job-tp10717.html Sent from the Apache Spark Developers List ma