Further information: I have a kerberized cluster and am also doing the
kinit. The problem only occurs when the proxy user is being used.
On Fri, Apr 22, 2022 at 10:21 AM Pralabh Kumar wrote:
> Hi
>
> Running Spark 3.2 on K8s with --proxy-user, getting the below error, and
> then the job fails.
Hi
Running Spark 3.2 on K8s with --proxy-user and getting the below error,
after which the job fails. However, when running without a proxy user, the
job runs fine. Can anyone please help me with this?
22/04/21 17:50:30 WARN Client: Exception encountered while connecting to
the server : org.apach
Hi all,
The Apache Kyuubi (Incubating) community is pleased to announce that
Apache Kyuubi (Incubating) 1.5.1-incubating has been released!
Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for
large-scale data processing and analytics, built on top of Apache Spark
and designed
The line of code triggers a job, and the job triggers stages. You should
see that they are different operations, all supporting execution of the
action on that line.
On Thu, Apr 21, 2022 at 9:24 AM Joe wrote:
> Hi Sean,
> Thanks for replying, but my question was about multiple stages running
> the same li
There are a few things going on here.
1. Spark is lazy, so nothing happens until a result is collected back to the
driver or data is written to a sink. So the one line you see
is most likely just that trigger. Once triggered, all of the work required to
make that final result happen occurs. If th
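For illustration, a minimal PySpark sketch of the behavior described above
(the data and column names here are hypothetical): nothing executes until
the count() at the end, and that single action triggers a job whose stages
cover all of the upstream work, including a shuffle for the groupBy.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()

# Transformations only: nothing runs yet.
df = spark.range(1000000)
grouped = df.groupBy((df.id % 10).alias("bucket")).count()

# The action on this one line triggers the job; the UI attributes the job
# to this line, but its stages include the scan and the shuffle above.
print(grouped.count())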
Hi Sean,
Thanks for replying, but my question was about multiple stages running the
same line of code, not about multiple stages in general. Yes, a single job
can have multiple stages, but they should not be repeated, as far as I
know, if you're caching/persisting your intermediate outputs.
My questio
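As a sketch of the caching behavior Joe mentions (hypothetical names
again): once an intermediate dataframe is persisted and materialized,
later actions on it should show the upstream stages as skipped in the UI
rather than re-running the same line of code.

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("persist-demo").getOrCreate()

base = spark.range(1000000)
intermediate = base.groupBy((base.id % 10).alias("bucket")).count()
intermediate.persist(StorageLevel.MEMORY_AND_DISK)

intermediate.count()  # first action: computes the stages and caches the result
intermediate.count()  # second action: upstream stages appear as "skipped"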
A job can have multiple stages for sure. One action triggers a job. This
seems normal.
On Thu, Apr 21, 2022, 9:10 AM Joe wrote:
> Hi,
> When looking at the application UI (in Amazon EMR), I'm seeing one job
> for my particular line of code, for example:
> 64 Running count at MySparkJob.scala:540
>
>
Hi,
When looking at the application UI (in Amazon EMR), I'm seeing one job for
my particular line of code, for example:
64 Running count at MySparkJob.scala:540
When I click into the job and go to stages, I can see over 100 stages
running the same line of code (stages are active, pending, or
completed).
Hi Sean,
Persisting/caching is useful when you're going to reuse a dataframe, so in
your case no persisting/caching is required. That covers the "when". The
"where" usually belongs at the closest point of reuse of the
calculations/transformations.
Btw, I'm not sure if caching is useful when you h
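A minimal sketch of that placement rule, assuming an existing SparkSession
named spark (the SQL, output path, and write mode are placeholders):
persist at the closest point before the dataframe is reused by more than
one action, and unpersist when done.

shared = spark.sql("select ...").persist()  # reused twice below, so caching pays off
shared.write.mode("overwrite").parquet("/tmp/shared_out")  # 1st action: computes and caches
print(shared.count())                       # 2nd action: served from the cache
shared.unpersist()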
You persist before actions, not after, if you want the action's outputs to
be persistent.
If anything, swap lines 2 and 3. However, there's no point in the count()
here, and because only one action (the write) follows, no caching is useful
in that example.
On Thu, Apr 21, 2022 at 2:2
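As a sketch, the simplification Sean is suggesting for the snippet quoted
later in this thread (the repartition argument, write mode, and output
path are placeholders): with a single write action and no reuse of df,
neither count() nor persist() adds anything.

df = spark.sql("some sql on huge dataset")  # no persist(): df is only used once
df.repartition(200).write.mode("overwrite").parquet("/tmp/out")  # the only action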
Hello all, we are running into some issues while attempting graceful
decommissioning of executors. We are running spark-thriftserver (3.2.0) on
Kubernetes (GKE 1.20.15-gke.2500). We enabled:
- spark.decommission.enabled
- spark.storage.decommission.rddBlocks.enabled
- spark.storage.decomm
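For reference, a minimal sketch of turning on the two completely-named
flags above from PySpark (the third config key is truncated in the
original message, so it is omitted here; a thrift server deployment would
normally pass these as --conf options instead):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("decommission-demo")
    .config("spark.decommission.enabled", "true")
    .config("spark.storage.decommission.rddBlocks.enabled", "true")
    .getOrCreate()
)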
Not a max; all values are needed. pivot(), if anything, is much closer, but
see the rest of this thread.
On Thu, Apr 21, 2022 at 1:19 AM Sonal Goyal wrote:
> Seems like an interesting problem to solve!
>
> If I have understood it correctly, you have 10114 files each with the
> structure
>
> rowid
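To illustrate the pivot() suggestion, a hedged sketch (the "source" and
"value" column names are hypothetical, since the original message is
truncated): pivot keeps every value per rowid in its own column instead of
collapsing them with an aggregate like max.

from pyspark.sql import functions as F

# Hypothetical input df: one row per (rowid, source, value) triple.
wide = (
    df.groupBy("rowid")
      .pivot("source")        # one output column per distinct source value
      .agg(F.first("value"))  # each cell keeps its single value, not a max
)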
Hi Folks,
I am working with the Spark DataFrame API, where I am doing the following:
1) df = spark.sql("some sql on huge dataset").persist()
2) df1 = df.count()
3) df.repartition().write.mode().parquet("")
AFAIK, persist should be used after the count statement, if it needs to be
used at all, since sp