Hi Keith,
I don't think we keep such references.
However, we do see exceptions during job execution that we catch and
retry (timeouts and network issues from different data sources).
Can these affect RDD cleanup?
Thanks,
Alex
On Sun, Jul 21, 2019 at 10:49 PM Keith Chapman wrote:
> Hi Alex,
Hi,
I would like to add the applicationId to all logs produced by Spark through
Log4j. I have a cluster with several jobs running on it, so including the
applicationId would make it easy to tell their log lines apart.
I have found a partial solution: if I change the conversion pattern of the
PatternLayout, I can add extra context to every log line.
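Concretely, the idea is something like the following minimal sketch, assuming
Log4j 1.x (Spark's default) and an MDC key name of my own choosing ("appId"):

import org.apache.log4j.MDC
import org.apache.spark.sql.SparkSession

object AppIdLogging {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("appid-logging-demo").getOrCreate()

    // Expose the application id to the logging layer on the driver.
    MDC.put("appId", spark.sparkContext.applicationId)

    // log4j.properties can then reference the key in the ConversionPattern, e.g.:
    //   log4j.appender.console.layout.ConversionPattern=%d [%X{appId}] %p %c: %m%n

    spark.stop()
  }
}

Note that the MDC is per JVM thread, so the same MDC.put would have to run on
the executors as well if their log lines should carry the id too.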
Hi Bobby Evans,
I apologise for the delayed response. Yes, you are right: I had missed
pasting the complete stack trace. I have now attached the complete YARN log
for the job.
Thank you; it would be helpful if you could assist me with this error.
On Tue, Jul 23, 2019 at 05:10:19 PM, Mario Amatucci wrote:
> https://spark.apache.org/docs/2.2.0/configuration.html#memory-management
Thanks for the pointer. However, I have tried almost every configuration,
and the behaviour suggests that Spark keeps things in memory instead of
releasing them.
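As an illustration, the kind of explicit release I would expect to need looks
roughly like this (a minimal sketch; the paths and names are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheRelease {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cache-release-demo").getOrCreate()

    // Persist with a serialized level to reduce the on-heap footprint.
    val df = spark.read.parquet("/tmp/input")
      .persist(StorageLevel.MEMORY_AND_DISK_SER)

    df.count() // materialize the cache

    // Release the cached blocks as soon as they are no longer needed;
    // blocking = true waits until the executors have actually dropped them.
    df.unpersist(blocking = true)

    spark.stop()
  }
}

Even with this, a non-blocking unpersist drops blocks asynchronously, which
can look like memory never being released.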
MARIO AMATUCCI
Senior Software Engineer
Office: +48 12 881 10 05 x 31463 Email: mario_amatu...@epam.com
Gdansk, Poland epam.com
~do more with less~
Hi
I have Avro files with the schema id: Long, content: Binary;
the binary fields are large images of up to 2 GB each.
I'd like to get a subset of rows, "where id in (...)".
Sadly, I get memory errors even when the subset is empty. It looks like
the reader stores the binary information until
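Concretely, the query looks roughly like this (a sketch, assuming the
external spark-avro package for Spark 2.x; the paths and ids are
placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object AvroSubset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("avro-subset-demo").getOrCreate()

    val wanted = Seq(42L, 1337L) // placeholder ids

    val df = spark.read.format("com.databricks.spark.avro").load("/data/images")

    // Filter as early as possible. Avro is a row-based format, so the reader
    // still deserializes each full record (including the large binary) during
    // the scan; the filter cannot be pushed below that.
    val subset = df.filter(col("id").isin(wanted: _*))

    subset.write.parquet("/data/images-subset") // placeholder output path
  }
}

That row-based deserialization would be consistent with running out of memory
even when the filtered result is empty.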