Hi everyone,
In *Spark SQL* there are several timestamp-related functions:
- unix_micros(timestamp)
Returns the number of microseconds since 1970-01-01 00:00:00 UTC.
- unix_millis(timestamp)
Returns the number of milliseconds since 1970-01-01 00:00:00 UTC.
Truncates higher levels of precision.
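As a quick sanity check of the difference (a minimal sketch, assuming a Spark 3.1+ session available as spark, where both functions exist):

    // Minimal sketch, assuming an existing Spark 3.1+ SparkSession named `spark`.
    // unix_millis truncates to millisecond precision; unix_micros keeps microseconds.
    spark.sql("""
      SELECT
        unix_millis(TIMESTAMP('1970-01-01 00:00:01.123456Z')) AS millis,  -- 1123
        unix_micros(TIMESTAMP('1970-01-01 00:00:01.123456Z')) AS micros   -- 1123456
    """).show()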
If you have the hardware resources, it isn't difficult to set up Spark
on a Kubernetes cluster. The online documentation describes everything you
would need (https://spark.apache.org/docs/latest/running-on-kubernetes.html).
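For anyone who hasn't tried it, the submission itself is essentially one spark-submit call once you have a container image and access to the API server; the angle-bracket values below (API server address, image, jar version) are placeholders you would replace with your own:

    spark-submit \
      --master k8s://https://<k8s-apiserver-host>:<port> \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.executor.instances=3 \
      --conf spark.kubernetes.namespace=spark \
      --conf spark.kubernetes.container.image=<your-registry>/spark:<tag> \
      local:///opt/spark/examples/jars/spark-examples_<version>.jar

The driver then runs as a pod and asks the API server for executor pods, so giving the driver's service account RBAC permission to create pods is the main extra piece, also covered on that page.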
You're right, neither AWS EMR nor Google's environment is particularly
flexible or cheap.
My company has been exploring the Google Spark Operator for running Spark
jobs on a Kubernetes cluster, but we've run into a number of limitations and
problems, and the project seems only weakly supported.
Is there any official Apache option, or plans for such an option, to run
Spark jobs on Kubernetes? Is th
Hello,
We run our Spark workloads on spot instances and would like to quantify the
impact of spot interruptions on them. We are proposing the following metric,
but would like your opinions on it.
We are leveraging Spark's Event Listener and performing the following:
T = task
T1 = sum(T.execution
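(The metric definition is cut off above, so the following is only a rough sketch of the general idea, not the proposal itself.) One way to do this kind of accounting with a SparkListener is to split task wall-clock time into time that produced a successful task and time that was thrown away because the executor was lost:

    import java.util.concurrent.atomic.AtomicLong
    import org.apache.spark.{ExecutorLostFailure, Success}
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Hypothetical sketch only -- not the metric from the quoted mail.
    class SpotImpactListener extends SparkListener {
      private val completedTaskMs = new AtomicLong(0L) // time spent in tasks that finished successfully
      private val lostTaskMs      = new AtomicLong(0L) // time spent in tasks killed by executor loss

      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val info = taskEnd.taskInfo
        // Wall-clock time the task ran before it finished or was lost.
        val elapsedMs = if (info.finishTime > 0) info.finishTime - info.launchTime else 0L
        taskEnd.reason match {
          case Success                => completedTaskMs.addAndGet(elapsedMs)
          case _: ExecutorLostFailure => lostTaskMs.addAndGet(elapsedMs)
          case _                      => // other failure reasons are ignored in this sketch
        }
      }

      /** Fraction of total task time spent on work thrown away by executor loss. */
      def wastedFraction: Double = {
        val total = completedTaskMs.get + lostTaskMs.get
        if (total == 0L) 0.0 else lostTaskMs.get.toDouble / total
      }
    }

You would register it with sparkContext.addSparkListener(new SpotImpactListener) and read wastedFraction at the end of the job. Note that ExecutorLostFailure covers any executor loss, not just spot reclaims, so you would still need to correlate with the cloud provider's interruption notices to isolate spot-related loss.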