ickly and running out of space, as it's a long-running Spark job (running Spark in yarn-client mode, btw).
Thanks
-Nitin
I believe calling processedSchemaRdd.persist(DISK) and
processedSchemaRdd.checkpoint() only persists the data; I will lose all the
RDD metadata, so when I restart my driver, that data is essentially useless to
me (correct me if I am wrong).
I thought of doing processedSchemaRdd.saveAsParquetFile (hdf
river?
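A minimal sketch of that Parquet approach, assuming the Spark 1.x SchemaRDD API, an existing SparkContext `sc`, and a hypothetical HDFS path. Unlike persist/checkpoint, the Parquet files carry the schema with the data, so a restarted driver can reload them directly:

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)          // sc: the existing SparkContext

  // In the running job: write the processed data out as Parquet (path is hypothetical).
  processedSchemaRdd.saveAsParquetFile("hdfs:///tmp/processed.parquet")

  // After a driver restart: read it back; the schema is recovered from the Parquet metadata.
  val restored = sqlContext.parquetFile("hdfs:///tmp/processed.parquet")
  restored.registerTempTable("processed")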
Thanks
-Nitin
ew API? Is this the expected
behaviour or am I missing something here?
--
Regards
Nitin Goyal
Hi,
Do we have any plan to support Parquet-like partitioning in the Spark SQL
in-memory cache? Something like one RDD[CachedBatch] per in-memory cache
partition.
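For reference, a rough sketch of how the existing in-memory columnar cache is driven today (the table name is hypothetical; the config key is the standard Spark SQL one):

  // Each cached partition becomes a partition of an RDD[CachedBatch]; rows are grouped
  // into batches of up to spark.sql.inMemoryColumnarStorage.batchSize rows.
  sqlContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "10000")

  sqlContext.cacheTable("events")                        // hypothetical table
  sqlContext.sql("SELECT COUNT(*) FROM events").show()   // first scan builds the cached batches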
-Nitin
s what you were referring to originally?
Thanks
-Nitin
On Fri, Nov 25, 2016 at 11:29 AM, Reynold Xin wrote:
> It's already there isn't it? The in-memory columnar cache format.
>
>
> On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal
> wrote:
>
>> Hi,
>>
Hi Spark Dev Team,
I want to start contributing to Spark open source. This is the first time I
will be making any open source contributions.
It would be great if I could get some guidance on where to start.
Thanks,
- Nitin
://pasteboard.co/MnQtB4o.png
http://pasteboard.co/MnrzHwJ.png
Any help/suggestions to fix this would be highly appreciated, since this needs
to be fixed for production.
Thanks in advance,
Nitin
for a
single query. I also looked at the fix's code diff, and it wasn't related to
the problem, which seems to be in the ClosureCleaner code.
Thanks
-Nitin
Thanks Josh and Yin.
Created the following JIRA for the same :-
https://issues.apache.org/jira/browse/SPARK-7970
Thanks
-Nitin
I am running a Spark application on YARN with 2 executors, with Xms/Xmx set to
32 GB and spark.yarn.executor.memoryOverhead set to 6 GB.
I am seeing that the app's physical memory keeps increasing and it finally
gets killed by the node manager:
2015-07-25 15:07:05,354 WARN
org.apache.hadoop.yarn.server.nod
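For reference, a minimal sketch (the app name is hypothetical; the values mirror the setup above) of how these two settings are passed when building the SparkConf; on YARN the container size is roughly the executor heap plus this overhead:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("long-running-yarn-app")                 // hypothetical name
    .set("spark.executor.memory", "32g")                 // sets the executor JVM heap (Xmx)
    .set("spark.yarn.executor.memoryOverhead", "6144")   // extra off-heap headroom for YARN, in MB
  val sc = new SparkContext(conf)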
I think Spark SQL's in-memory columnar cache already does compression. Check
out the classes under the following path :-
https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/compression
Although the compression ratio is not as good as Parquet's.
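A small sketch of exercising that compression, assuming an existing SQLContext and a hypothetical DataFrame `df`; the config key is the standard one controlling columnar-cache compression:

  // Column batches are compressed per column (run-length, dictionary and delta encodings
  // live in the sql/columnar/compression package linked above); the flag is on by default.
  sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")
  val cachedDf = df.cache()   // df: hypothetical DataFrame; cached in the columnar format
  cachedDf.count()            // first action builds the compressed column batches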
Thanks
-
"(memory)" written, which means the input data has been fetched from memory
(your cached RDD).
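A tiny sketch of what produces that, assuming a hypothetical input path:

  val events = sc.textFile("hdfs:///data/events").cache()   // hypothetical path
  events.count()   // first job reads from HDFS and fills the cache
  events.count()   // second job's stage reports its input as fetched from memory, per the note above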
As far as lineage/call site is concerned, I think there was a change in
Spark 1.3 which excluded some classes from appearing in the call site (I know
that some Spark SQL related classes were removed for sure).
Thanks
You both can check out the following links :-
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
http://spark.apache.org/docs/latest/building-spark.html
Thanks
-Nitin
On Thu, Oct 29, 2015 at 4:13 PM, Aadi Thakar
wrote:
> Hello, my name is Aaditya Thakkar and I am a sec
t. I've tried to
>> look this up in the mailing list archives but haven't had luck so far.
>>
>> How can I run a single test suite? Thanks in advance!
>>
>> --
>> BR,
>> Stefano Baghino
>>
>
>
--
Regards
Nitin Goyal
Spark SQL's in-memory cache stores statistics per column, which in turn are
used to skip batches (default size 10000 rows) within a partition:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala#L25
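A short sketch of how those stats come into play, assuming a hypothetical cached table with a numeric `age` column; the config key is the standard in-memory partition-pruning flag:

  // Per-batch ColumnStats (min/max, null count) let the scan drop whole CachedBatches
  // whose value range cannot satisfy the filter.
  sqlContext.setConf("spark.sql.inMemoryColumnarStorage.partitionPruning", "true")
  sqlContext.cacheTable("people")                                   // hypothetical table
  sqlContext.sql("SELECT name FROM people WHERE age > 90").show()   // batches with max(age) <= 90 are skipped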
Hope this helps
Thanks
-Nitin
Hi Daniel,
I did indeed discover the problem in my case; it turned out to be a bug on the
Parquet side, and I raised and contributed to the following issue :-
https://issues.apache.org/jira/browse/PARQUET-353
Hope this helps!
Thanks
-Nitin
On Mon, May 2, 2016 at 9:15 PM, Daniel Darabos