I'm curious if someone could provide a bit deeper insight into how the
MEMORY_AND_DISK_SER persistence level works.
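For context, MEMORY_AND_DISK_SER stores partitions as serialized bytes in
memory and spills whatever does not fit to local disk instead of dropping it,
while MEMORY_ONLY_SER drops (and later recomputes) the overflow. A minimal
sketch of setting it, assuming a DataFrame `df` that is already loaded:

    import org.apache.spark.storage.StorageLevel

    // Sketch only: `df` is assumed to be an already-loaded DataFrame.
    // MEMORY_AND_DISK_SER keeps serialized partitions in memory and spills
    // the overflow to local disk; MEMORY_ONLY_SER drops the overflow instead.
    df.persist(StorageLevel.MEMORY_AND_DISK_SER)
    df.count() // run an action so the cache is actually materialized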
I've noticed that if my cluster has 2.2 TB of memory and I set the
persistence level to MEMORY_ONLY_SER, Spark will use about 2 TB and the
Storage tab shows a 97-99% cached fraction.
And it doesn't seem to be possible to report storage
evictions at the moment?
That would be a really nice feature, to be able to set up alerts on such an
event.
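For what it's worth, one way to approximate eviction alerts today is a
SparkListener that watches block status updates. Spark doesn't expose an
explicit eviction metric, so treating "a cached block no longer reports an
in-memory copy" as an eviction signal is a heuristic, not an official API
guarantee. A rough sketch (the class name and log message are made up):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockUpdated}

    // Heuristic sketch: flag block updates where a block no longer has an
    // in-memory copy, which usually means it was dropped or spilled to disk.
    class EvictionAlertListener extends SparkListener {
      override def onBlockUpdated(update: SparkListenerBlockUpdated): Unit = {
        val info = update.blockUpdatedInfo
        if (!info.storageLevel.useMemory) {
          println(s"Possible eviction: ${info.blockId} on executor " +
            s"${info.blockManagerId.executorId}, diskSize=${info.diskSize}")
        }
      }
    }

    // Register on an existing SparkContext `sc` (name assumed):
    // sc.addSparkListener(new EvictionAlertListener())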
On Wed, Jun 16, 2021 at 3:07 PM Zilvinas Saltys
<zilvinas.sal...@verizonmedia.com> wrote:
Hi,
I'm running Spark 3.0.1 on AWS. Dynamic allocation is disabled. I'm caching
a large dataset 100% in memory. Before caching it, I coalesce the dataset to
1792 partitions. There are 112 executors and 896 cores on the cluster.
The next stage reads those 1792 partitions as input.
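As a point of reference, here's a rough sketch of that setup; the input path
and the MEMORY_ONLY storage level are assumptions, since the message doesn't
spell them out:

    import org.apache.spark.storage.StorageLevel

    // Hypothetical reconstruction of the setup described above; the source
    // path and MEMORY_ONLY level are assumptions.
    val df = spark.read.parquet("s3://some-bucket/some-path")
    val cached = df.coalesce(1792).persist(StorageLevel.MEMORY_ONLY)
    cached.count() // materialize the cache before the next stage reads it

With 896 cores, 1792 partitions works out to two tasks per core per stage.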
The challenge I have is this: there are two streams of data where an event
might look like (time, hashkey, foo1) in stream1 and (time, hashkey, foo2)
in stream2.
The result after joining should be (time, hashkey, foo1, foo2). The join
happens on hashkey, and the time difference between matching events can be
~30 minutes.
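If this is Structured Streaming, one way to express that kind of join is an
interval condition with watermarks. The sketch below assumes streaming
DataFrames named `stream1` and `stream2` with exactly the columns described,
and a 30-minute bound on both sides:

    import org.apache.spark.sql.functions.expr

    // Hedged sketch: `stream1` and `stream2` are assumed streaming DataFrames
    // with columns (time, hashkey, foo1) and (time, hashkey, foo2).
    // Watermarks let Spark drop join state older than the ~30 minute gap.
    val s1 = stream1.withWatermark("time", "30 minutes").alias("s1")
    val s2 = stream2.withWatermark("time", "30 minutes").alias("s2")

    val joined = s1.join(
      s2,
      expr("""
        s1.hashkey = s2.hashkey AND
        s2.time BETWEEN s1.time - INTERVAL 30 MINUTES
                    AND s1.time + INTERVAL 30 MINUTES
      """)
    ).select("s1.time", "s1.hashkey", "s1.foo1", "s2.foo2")

The watermark plus the interval condition is what keeps the join state bounded
instead of growing without limit.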