Hi
Can you please mention the spark version, give us the code for setting up
spark session, and the operation you are talking about? It will be good to
know the amount of memory that your system has as well and number of
executors you are using per system
In general I have faced issues when doing g
Hi Filip,
Care to share the code behind "The only thing I found so far involves using
forEachBatch and manually updating my aggregates. "?
I'm not completely sure I understand your use case and hope the code could
shed more light on it. Thank you.
Pozdrawiam,
Jacek Laskowski
https://about.m
HI,
I have a piece of code in which an rdd is created from a main method.
It then does work on this rdd from 2 different threads running in parallel.
When running this code as part of a test with a local master it will
sometimes make spark hang ( 1 task will never get completed)
If i make a copy
Hi,
I don't have any code for the forEachBatch approach, I mentioned it due to
this response to my question on SO:
https://stackoverflow.com/a/65803718/1017130
I have added some very simple code below that I think shows what I'm trying
to do:
val schema = StructType(
Array(
StructFiel
RDDs are immutable, and Spark itself is thread-safe. This should be fine.
Something else is going on in your code.
On Fri, Jan 22, 2021 at 7:59 AM jelmer wrote:
> HI,
>
> I have a piece of code in which an rdd is created from a main method.
> It then does work on this rdd from 2 different thread
Hi,
I am running Airflow + Spark + AKS (Azure K8s). Sporadically, when I have a
spark job complete, my spark-submit process does not notice that the driver has
succeeded and continues to track the job as running. Does anyone know how
spark-submit process monitors driver processes on k8s? My exp