Further information: I have a kerberized cluster and am also doing the
kinit. The problem only occurs when the proxy user is being used.
On Fri, Apr 22, 2022 at 10:21 AM Pralabh Kumar wrote:
> Hi
>
> Running Spark 3.2 on K8s with --proxy-user, getting the below error, and
> then the job fails.
Hi
Running Spark 3.2 on K8s with --proxy-user and getting the below error,
after which the job fails. However, when running without a proxy user, the
job runs fine. Can anyone please help me with this?
22/04/21 17:50:30 WARN Client: Exception encountered while connecting to
the server : org.apach
Hi all,
The Apache Kyuubi (Incubating) community is pleased to announce that
Apache Kyuubi (Incubating) 1.5.1-incubating has been released!
Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for
large-scale data processing and analytics, built on top of Apache Spark
and designed
The line of code triggers a job, and the job triggers stages. You should
see that they are different operations, all supporting execution of the
action on that line.
On Thu, Apr 21, 2022 at 9:24 AM Joe wrote:
> Hi Sean,
> Thanks for replying, but my question was about multiple stages running
> the same li
There are a few things going on here.
1. Spark is lazy, so nothing happens until a result is collected back to the
driver or data is written to a sink. So the one line you see
is most likely just that trigger. Once triggered, all of the work required to
make that final result happen occurs. If th
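For illustration, a minimal PySpark sketch of the behavior described above
(the data and column names here are hypothetical): nothing executes until
the count() at the end, and that single action triggers a job whose stages
cover all of the upstream work, including a shuffle for the groupBy.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()

# Transformations only: nothing runs yet.
df = spark.range(1000000)
grouped = df.groupBy((df.id % 10).alias("bucket")).count()

# The action on this one line triggers the job; the UI attributes the job
# to this line, but its stages include the scan and the shuffle above.
print(grouped.count())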
Hi Sean,
Thanks for replying, but my question was about multiple stages running the
same line of code, not about multiple stages in general. Yes, a single job
can have multiple stages, but they should not be repeated, as far as I
know, if you're caching/persisting your intermediate outputs.
My questio
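As a sketch of the caching behavior Joe mentions (hypothetical names
again): once an intermediate dataframe is persisted and materialized,
later actions on it should show the upstream stages as skipped in the UI
rather than re-running the same line of code.

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("persist-demo").getOrCreate()

base = spark.range(1000000)
intermediate = base.groupBy((base.id % 10).alias("bucket")).count()
intermediate.persist(StorageLevel.MEMORY_AND_DISK)

intermediate.count()  # first action: computes the stages and caches the result
intermediate.count()  # second action: upstream stages appear as "skipped"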
A job can have multiple stages for sure. One action triggers a job. This
seems normal.
On Thu, Apr 21, 2022, 9:10 AM Joe wrote:
> Hi,
> When looking at the application UI (in Amazon EMR), I'm seeing one job
> for my particular line of code, for example:
> 64 Running count at MySparkJob.scala:540
>
>
Hi,
When looking at the application UI (in Amazon EMR), I'm seeing one job for
my particular line of code, for example:
64 Running count at MySparkJob.scala:540
When I click into the job and go to stages, I can see over 100 stages
running the same line of code (stages are active, pending, or
completed).
Hi Sean,
Persisting/caching is useful when you're going to reuse a dataframe, so in
your case no persisting/caching is required. That covers the "when". The
"where" usually belongs at the closest point of reuse of the
calculations/transformations.
Btw, I'm not sure if caching is useful when you h
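A minimal sketch of that placement rule, assuming an existing SparkSession
named spark (the SQL, output path, and write mode are placeholders):
persist at the closest point before the dataframe is reused by more than
one action, and unpersist when done.

shared = spark.sql("select ...").persist()  # reused twice below, so caching pays off
shared.write.mode("overwrite").parquet("/tmp/shared_out")  # 1st action: computes and caches
print(shared.count())                       # 2nd action: served from the cache
shared.unpersist()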
You persist before actions, not after, if you want the action's outputs to
be persistent.
If anything, swap lines 2 and 3. However, there's no point in the count()
here, and because only one action (the write) follows, no caching is useful
in that example.
On Thu, Apr 21, 2022 at 2:2
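As a sketch, the simplification Sean is suggesting for the snippet quoted
later in this thread (the repartition argument, write mode, and output
path are placeholders): with a single write action and no reuse of df,
neither count() nor persist() adds anything.

df = spark.sql("some sql on huge dataset")  # no persist(): df is only used once
df.repartition(200).write.mode("overwrite").parquet("/tmp/out")  # the only action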
Hello all, we are running into some issues while attempting graceful
decommissioning of executors. We are running spark-thriftserver (3.2.0) on
Kubernetes (GKE 1.20.15-gke.2500). We enabled:
- spark.decommission.enabled
- spark.storage.decommission.rddBlocks.enabled
- spark.storage.decomm
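For reference, a minimal sketch of turning on the two completely-named
flags above from PySpark (the third config key is truncated in the
original message, so it is omitted here; a thrift server deployment would
normally pass these as --conf options instead):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("decommission-demo")
    .config("spark.decommission.enabled", "true")
    .config("spark.storage.decommission.rddBlocks.enabled", "true")
    .getOrCreate()
)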
Not a max; all values are needed. pivot(), if anything, is much closer, but
see the rest of this thread.
On Thu, Apr 21, 2022 at 1:19 AM Sonal Goyal wrote:
> Seems like an interesting problem to solve!
>
> If I have understood it correctly, you have 10114 files each with the
> structure
>
> rowid
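To illustrate the pivot() suggestion, a hedged sketch (the "source" and
"value" column names are hypothetical, since the original message is
truncated): pivot keeps every value per rowid in its own column instead of
collapsing them with an aggregate like max.

from pyspark.sql import functions as F

# Hypothetical input df: one row per (rowid, source, value) triple.
wide = (
    df.groupBy("rowid")
      .pivot("source")        # one output column per distinct source value
      .agg(F.first("value"))  # each cell keeps its single value, not a max
)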
Hi Folks,
I am working with the Spark DataFrame API, where I am doing the following:
1) df = spark.sql("some sql on huge dataset").persist()
2) df1 = df.count()
3) df.repartition().write.mode().parquet("")
AFAIK, persist should be used after the count statement, if it needs to be
used at all, since sp