Re: BUG :: UI Spark

2024-05-23 Thread Prem Sahoo
Does anyone have a clue?

On Thu, May 23, 2024 at 11:40 AM Prem Sahoo wrote:
> Hello Team,
> in spark DAG UI , we have Stages tab. Once you click on each stage you can
> view the tasks.
>
> In each task we have a column "ShuffleWrite Size/Records " that column
> prints wrong data when it gets th

Re: [s3a] Spark is not reading s3 object content

2024-05-23 Thread Mich Talebzadeh
Could be a number of reasons.

First, test reading the file with the AWS CLI:

    aws s3 cp s3://input/testfile.csv .
    cat testfile.csv

Then try this code with the debug option to diagnose the problem:

    from pyspark.sql import SparkSession
    from pyspark.sql.utils import AnalysisException

    try:
        # Initialize Spark se

BUG :: UI Spark

2024-05-23 Thread Prem Sahoo
Hello Team, in the Spark DAG UI we have a Stages tab. Once you click on a stage, you can view its tasks. Each task has a column "ShuffleWrite Size/Records"; that column prints wrong data when it gets the data from cache/persist. It typically will show the wrong record number though the data

[s3a] Spark is not reading s3 object content

2024-05-23 Thread Amin Mosayyebzadeh
I am trying to read an S3 object from a local, Ceph-based S3 store using Spark 3.5.1. I can see that Spark accesses the bucket and lists the files (I have verified this on the Ceph side by checking its logs), and it even returns the correct size of the object. But the content is not read. The object URL is: s3a://i
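
For readers hitting a similar setup, the sketch below shows the S3A settings that usually matter when Spark talks to a non-AWS, S3-compatible endpoint such as Ceph. It is only an illustration, not a confirmed fix for the problem reported above. The endpoint URL, the credentials, and the helper names `s3a_options` and `apply_options` are hypothetical placeholders; the configuration keys themselves are standard Hadoop S3A properties passed through Spark's `spark.hadoop.` prefix.

```python
def s3a_options(endpoint, access_key, secret_key, use_ssl=False):
    """Build Spark config entries for an S3-compatible endpoint (hypothetical helper)."""
    return {
        # Point the S3A connector at the custom endpoint instead of AWS.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Non-AWS stores commonly need path-style URLs (endpoint/bucket/key).
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Match the scheme the endpoint actually serves.
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


def apply_options(builder, options):
    """Apply each key/value pair to a SparkSession builder and return the builder."""
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder


# Placeholder values, assumed for illustration only.
opts = s3a_options("http://ceph.example.invalid:7480", "ACCESS_KEY", "SECRET_KEY")

# With PySpark and the hadoop-aws connector available, usage would look like:
#   from pyspark.sql import SparkSession
#   spark = apply_options(SparkSession.builder.appName("s3a-check"), opts).getOrCreate()
#   spark.read.text("s3a://<bucket>/<key>").show(truncate=False)
```

Reading the object as plain text first, before parsing it as CSV, helps separate connector problems from parsing problems.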