Hi,
I was looking for metrics that report how many objects ("files") were read
or written when running Spark over S3.
The metrics documented at
[https://spark.apache.org/docs/3.5.1/monitoring.html#component-instance--executor]
do not include an objects-read / objects-written metric for S3.
I do see the Hadoop de
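For reference, this is the kind of check I have in mind: a rough sketch
(untested) that pulls Hadoop's per-scheme storage statistics on the driver
through PySpark's JVM gateway. Note that _jvm is a private PySpark handle,
and the specific counter names are whatever the S3A connector registers;
both are assumptions on my part.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3a-object-stats").getOrCreate()

    # ... run some S3 work first, e.g. spark.read.parquet("s3a://...")

    # Hadoop keeps per-scheme counters in GlobalStorageStatistics; the
    # "s3a" entry (present once the S3A connector has been used) holds
    # its request/byte counters.
    stats = (spark.sparkContext._jvm
             .org.apache.hadoop.fs.FileSystem
             .getGlobalStorageStatistics())

    schemes = stats.iterator()
    while schemes.hasNext():
        fs_stats = schemes.next()   # StorageStatistics, one per scheme
        if fs_stats.getScheme() == "s3a":
            longs = fs_stats.getLongStatistics()
            while longs.hasNext():
                s = longs.next()    # LongStatistic(name, value)
                print(s.getName(), s.getValue())

As far as I can tell this only reflects the driver's JVM, so executor-side
reads and writes would not show up here.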
Hi,
I'm using PySpark to construct a DataFrame and run it. This DataFrame uses
views created by other DataFrames in the same session. One of these views,
say view A, is defined by a SQL statement in which one expression combines
several map functions, among them map_concat (a minimal sketch of the setup
follows below). Another view, say
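Here is a minimal sketch of the setup, with hypothetical view and column
names; only map_concat is taken from the actual job:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("map-concat-views").getOrCreate()

    # Base data registered as a view by one DataFrame in the session.
    spark.range(3).createOrReplaceTempView("base")

    # View A: one expression combines map functions, including map_concat.
    spark.sql("""
        CREATE OR REPLACE TEMP VIEW view_a AS
        SELECT id,
               map_concat(map('k1', id), map('k2', id * 2)) AS m
        FROM base
    """)

    # A later DataFrame in the same session builds on view A.
    spark.sql("SELECT id, m['k1'] AS k1, m['k2'] AS k2 FROM view_a").show()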