Hello,
In my organization, we have an accounting system for Spark jobs that uses
task execution time to determine how long a Spark job occupies the
executors, and we use it as a way to segregate cost. We sum all the task
times per job and apply proportions. Our clusters follow a 1 task per
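The summation itself looks roughly like the listener sketch below. This is
illustrative only, not our production code; in particular, keying tasks to
jobs through stage ids is an assumption about how the accounting is done.

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart, SparkListenerTaskEnd}
import scala.collection.mutable

// Sketch: sum task execution time per job by remembering which stages
// belong to which job. Listener events arrive on a single listener-bus
// thread, so plain mutable maps are fine here.
class JobTimeListener extends SparkListener {
  private val stageToJob = mutable.Map.empty[Int, Int]
  val jobTaskTimeMs = mutable.Map.empty[Int, Long]

  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    jobStart.stageIds.foreach(stageId => stageToJob(stageId) = jobStart.jobId)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    for (jobId <- stageToJob.get(taskEnd.stageId); metrics <- Option(taskEnd.taskMetrics))
      jobTaskTimeMs(jobId) = jobTaskTimeMs.getOrElse(jobId, 0L) + metrics.executorRunTime
}

The listener would be registered with sparkContext.addSparkListener.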
b.cern.ch/node/192
> Best,
> Luca
> *From:* Faiz Halde
> *Sent:* Thursday, December 7, 2023 23:25
> *To:* user@spark.apache.org
> *Subject:* Spark on Java 17
Hello,
We are planning to switch to Java 17 for Spark and were wondering if
there are any obvious learnings from anybody related to JVM tuning?
We've been running on Java 8 for a while now and used to use the parallel
GC, as that used to be a general recommendation for high-throughput systems.
How h
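One thing worth deciding up front: Parallel GC was the Java 8 default, but
G1 is the default on JDK 17, so an unpinned collector will silently change
across the upgrade. A minimal sketch of pinning it explicitly (the flags
here are illustrative, not a recommendation):

import org.apache.spark.SparkConf

// Sketch: keep the collector choice explicit across the Java 8 -> 17 move.
// Driver JVM flags must be supplied at submit time (spark-defaults.conf or
// --conf); setting them inside an already-running driver has no effect.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC -XX:MaxGCPauseMillis=200")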
Hello,
Is it possible to run SparkML using Spark Connect 3.5.0? So far I've had no
success setting up a Connect client that uses the ML package.
The ML package uses spark core/sql AFAIK, which seems to be shadowing the
Spark Connect client classes.
Do I have to exclude any dependencies from the mllib
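To frame the question, this is the kind of exclusion meant, in build.sbt
terms. Whether this combination works with Connect 3.5.0 at all is exactly
the open question; the exclusions are guesses, and the artifact names assume
Scala 2.12.

// build.sbt sketch: depend on the Connect client and pull in spark-mllib
// while excluding the classic core/sql jars that shadow the client classes.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0",
  ("org.apache.spark" %% "spark-mllib" % "3.5.0")
    .exclude("org.apache.spark", "spark-core_2.12")
    .exclude("org.apache.spark", "spark-sql_2.12")
)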
On Mon, Nov 27, 2023 at 12:47, Holden Karau wrote:

> So I don’t think we make any particular guarantees around class path
> isolation there, so even if it does work it’s something you’d need to pay
> attention to on upgrades. Class path isolation is tricky to get right.
Hello,
We are using Spark 3.5.0 and were wondering if the following is achievable
using spark-core.
Our use case involves spinning up a Spark cluster where the driver
application loads user jars containing Spark transformations at runtime. A
single Spark application can load multiple user jars ( s
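For concreteness, a bare-bones sketch of the runtime-loading part. The jar
path and entry-point class are placeholders, and this says nothing about the
isolation guarantees discussed above.

import java.net.{URL, URLClassLoader}

// Sketch: the driver loads a user jar at runtime and instantiates a known
// entry point reflectively.
val userJarUrl = new URL("file:///opt/jobs/user-transforms.jar")
val loader = new URLClassLoader(Array(userJarUrl), Thread.currentThread().getContextClassLoader)
val transformClass = loader.loadClass("com.example.UserTransform")
val transform = transformClass.getDeclaredConstructor().newInstance()
// Executors also need the jar, e.g. via sparkContext.addJar(...).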
Hello,
Due to the way Spark implements shuffle, the loss of an executor sometimes
results in the recomputation of partitions that were lost.
The definition of a *partition* is the tuple ( RDD-ids, partition id ),
where RDD-ids is a sequence of RDD ids.
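In code terms, that tuple is roughly the following. This is an illustrative
type, not a Spark API.

// The recomputation unit described above: the RDD lineage ids plus the
// partition index within that lineage.
case class PartitionKey(rddIds: Seq[Int], partitionId: Int)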
In our system, we define the unit of work performed fo
Hello,
We've been in touch with a few Spark specialists who suggested a potential
solution to improve the reliability of our shuffle-heavy jobs.
Here is what our setup looks like:
- Spark version: 3.3.1
- Java version: 1.8
- We do not use an external shuffle service
- We use s
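For reference, the knobs such discussions usually circle around look like
this in Spark 3.3.x. These values are assumptions for illustration, not
necessarily the suggestion in question.

import org.apache.spark.SparkConf

// Illustrative only: shuffle-reliability settings available in Spark 3.3.x.
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")                    // external shuffle service
  .set("spark.decommission.enabled", "true")                       // graceful executor decommission
  .set("spark.storage.decommission.shuffleBlocks.enabled", "true") // migrate shuffle blocks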
Hello,
We run our Spark workloads on spot instances and would like to quantify the
impact of spot interruptions on our workloads. We are proposing the
following metric but would like your opinions on it.
We are leveraging Spark's Event Listener and performing the following:
T = task
T1 = sum(T.execution
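Concretely, the listener side of such a metric could look like the sketch
below. Treating ExecutorLostFailure as a proxy for a spot interruption is an
assumption, and exactly the kind of attribution we would like opinions on.

import org.apache.spark.ExecutorLostFailure
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch: total task execution time, and the share of it spent on tasks
// that ended because their executor was lost.
class SpotImpactListener extends SparkListener {
  var totalTaskTimeMs = 0L
  var lostExecutorTaskTimeMs = 0L

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics can be null for some failed tasks, hence the Option guard.
    val runTime = Option(taskEnd.taskMetrics).map(_.executorRunTime).getOrElse(0L)
    totalTaskTimeMs += runTime
    taskEnd.reason match {
      case _: ExecutorLostFailure => lostExecutorTaskTimeMs += runTime
      case _ =>
    }
  }
}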