s what you were referring to originally?
Thanks
-Nitin
On Fri, Nov 25, 2016 at 11:29 AM, Reynold Xin wrote:
> It's already there, isn't it? The in-memory columnar cache format.
>
>
> On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal
> wrote:
>
>> Hi,
>>
Hi,
Do we have any plan to support Parquet-like partitioning in the Spark SQL
in-memory cache? Something like one RDD[CachedBatch] per in-memory cache
partition.
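For context, roughly what I mean, as a sketch only (Spark 2.x style API; the
paths and the "day" column are made up): today you either get Parquet-style
partition pruning on disk, or the flat in-memory columnar cache, but not both:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cache-partitioning-sketch").getOrCreate()
val df = spark.read.parquet("/data/events")          // made-up input path

// Option 1: Parquet-style partitioning on disk -- filters on "day" prune
// whole directories before anything is read.
df.write.partitionBy("day").parquet("/data/events_by_day")

// Option 2: the in-memory columnar cache -- a single InMemoryRelation whose
// partitions hold CachedBatch batches, with no partitioning by column value.
df.cache()
df.filter("day = '2016-11-01'").count()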
-Nitin
ew API? Is this the expected
behaviour or am I missing something here?
--
Regards
Nitin Goyal
Hi Daniel,
I did manage to track down the problem in my case. It turned out to be a bug
on the Parquet side, and I raised and contributed a fix to the following
issue:
https://issues.apache.org/jira/browse/PARQUET-353
Hope this helps!
Thanks
-Nitin
On Mon, May 2, 2016 at 9:15 PM, Daniel Darabos
Spark SQL's in-memory cache stores statistics per column, which in turn are
used to skip batches (default size 10000, configurable via
spark.sql.inMemoryColumnarStorage.batchSize) within a partition:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala#L25
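A small sketch of where those stats kick in (1.x-era API; the input path and
the "value" column are made up): once the DataFrame is cached, a filter can
skip any batch whose min/max for that column rules it out, and the batch size
is configurable:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("batch-pruning-sketch"))
val sqlContext = new SQLContext(sc)

// Smaller batches mean finer-grained ColumnStats and more chances to skip data.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "1000")

val df = sqlContext.read.parquet("/data/metrics")   // made-up input path
df.cache()
df.count()                                          // materialize the cache

// Batches whose min/max for "value" fall outside the predicate are skipped.
df.filter("value > 100").count()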
Hope this helps
Thanks
-Nitin
t. I've tried to
>> look this up in the mailing list archives but haven't had luck so far.
>>
>> How can I run a single test suite? Thanks in advance!
>>
>> --
>> BR,
>> Stefano Baghino
>>
>
>
--
Regards
Nitin Goyal
You can both check out the following links:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
http://spark.apache.org/docs/latest/building-spark.html
Thanks
-Nitin
On Thu, Oct 29, 2015 at 4:13 PM, Aadi Thakar
wrote:
> Hello, my name is Aaditya Thakkar and I am a sec
;(memory)" written which
means input data has been fetched from memory (your cached RDD).
As far as lineage/call site is concerned, I think there was a change in
spark 1.3 which excluded some classes from appearing in call site (I know
that some Spark SQL related were removed for sure).
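A tiny way to see the first point (just a sketch; the file path is made up):
cache an RDD and run an action twice; the second run's stage should report
its Input tagged "(memory)" rather than the external source:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("cache-input-metric-sketch"))
val rdd = sc.textFile("/data/big.txt").cache()   // made-up input file
rdd.count()   // first action reads from the file system and fills the cache
rdd.count()   // this stage's Input shows up tagged "(memory)" in the UI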
Thanks
I think Spark SQL's in-memory columnar cache already does compression. Check
out the classes under the following path:
https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/compression
The compression ratio is not as good as Parquet's, though.
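As a quick illustration (just a sketch; the input path is made up), it's on by
default and controlled by a SQL conf, so you can toggle it and compare the
cached size in the Storage tab:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("columnar-compression-sketch"))
val sqlContext = new SQLContext(sc)

// On by default; set to "false" to compare cached sizes with and without the
// dictionary/run-length/delta encodings from the compression package.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")

val df = sqlContext.read.parquet("/data/events")   // made-up input path
df.cache()
df.count()   // materialize; compare the in-memory size in the Storage tab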
Thanks
-
I am running a Spark application on YARN with 2 executors, Xms/Xmx of 32 GB,
and spark.yarn.executor.memoryOverhead of 6 GB.
I am seeing that the app's physical memory keeps increasing and it finally
gets killed by the node manager:
2015-07-25 15:07:05,354 WARN
org.apache.hadoop.yarn.server.nod
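For reference, a sketch of how those limits are typically wired up when
building the context (numbers taken from the description above; the overhead
can equally be passed to spark-submit via --conf):

import org.apache.spark.{SparkConf, SparkContext}

// 32g executor heap plus 6g off-heap overhead: YARN kills the container once
// heap + overhead (+ any native allocations) exceed the combined limit.
val conf = new SparkConf()
  .setAppName("yarn-memory-sketch")
  .set("spark.executor.instances", "2")
  .set("spark.executor.memory", "32g")
  .set("spark.yarn.executor.memoryOverhead", "6144")   // MB

val sc = new SparkContext(conf)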
Thanks Josh and Yin.
Created following JIRA for the same :-
https://issues.apache.org/jira/browse/SPARK-7970
Thanks
-Nitin
for a
single query. I also looked at the fix's code diff, and it wasn't related to
the problem, which seems to exist in the ClosureCleaner code.
Thanks
-Nitin
http://pasteboard.co/MnQtB4o.png
http://pasteboard.co/MnrzHwJ.png
Any help or suggestions to fix this would be highly appreciated, since this
needs to be fixed for production.
Thanks in Advance,
Nitin
Hi Spark Dev Team,
I want to start contributing to Spark open source. This is the first time I
will be doing any open-source contributions.
It would be great if I could get some guidance on where to start.
Thanks,
- Nitin
river?
Thanks
-Nitin
I believe calling processedSchemaRdd.persist(DISK) and
processedSchemaRdd.checkpoint() only persists the data; I will lose all the
RDD metadata, and when I restart my driver that data is kind of useless for
me (correct me if I am wrong).
I thought of doing processedSchemaRdd.saveAsParquetFile (hdf
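A sketch of that Parquet route (Spark 1.2-era API; the path and the stand-in
data are made up): write the processed data out as Parquet and read it back
in the new driver, so the schema travels with the files instead of depending
on cached RDD metadata:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Record(id: Long, value: String)

val sc = new SparkContext(new SparkConf().setAppName("parquet-restart-sketch"))
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD   // implicit RDD[Product] -> SchemaRDD

// Stand-in for the processedSchemaRdd produced by the job.
val processed = sqlContext.createSchemaRDD(
  sc.parallelize(Seq(Record(1L, "a"), Record(2L, "b"))))

// Old driver: persist to a durable, self-describing format (path is made up).
processed.saveAsParquetFile("hdfs:///checkpoints/processed.parquet")

// New driver after restart: the schema travels with the Parquet files, so the
// data is usable again without the original RDD lineage or in-memory metadata.
val restored = sqlContext.parquetFile("hdfs:///checkpoints/processed.parquet")
restored.registerTempTable("processed")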
ickly and running out of space, as it's a long-running Spark job (running
Spark in yarn-client mode, btw).
Thanks
-Nitin