Does anyone have a clue?
On Thu, May 23, 2024 at 11:40 AM Prem Sahoo wrote:
> Hello Team,
> In the Spark DAG UI we have a Stages tab. Once you click on each stage
> you can view the tasks.
>
> Each task has a column "Shuffle Write Size/Records". That column prints
> wrong data when it gets the data from cache/persist: it typically shows
> the wrong record number even though the data […]
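For anyone trying to reproduce the symptom, here is a minimal sketch (the synthetic data, key count, and app name are my own assumptions, not from the report) that runs the same shuffle once uncached and once persisted, so the two stages' Shuffle Write Size/Records columns can be compared in the Stages tab:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("ShuffleWriteRepro").getOrCreate()

# Synthetic data: one million rows spread over 100 keys (arbitrary values)
df = spark.range(1_000_000).withColumn("key", col("id") % 100)

# Shuffle once without caching and note Shuffle Write Size/Records
df.groupBy("key").count().collect()

# Persist, materialize the cache, then run the same shuffle from cache
df.persist()
df.count()
df.groupBy("key").count().collect()

If the report is right, the second stage should show a record count that does not match the first, even though the input data is identical.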
It could be one of a number of reasons.

First, test reading the file with the CLI. Note that the AWS CLI expects
the s3:// scheme (s3a:// is the Hadoop connector's scheme), and against a
local Ceph gateway you will likely also need --endpoint-url:

aws s3 cp s3://input/testfile.csv .
cat testfile.csv

Then try this code with debug logging enabled to diagnose the problem:
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

try:
    # Initialize Spark session with verbose logging, then read the object
    spark = SparkSession.builder.appName("S3ReadTest").getOrCreate()
    spark.sparkContext.setLogLevel("DEBUG")
    spark.read.csv("s3a://input/testfile.csv", header=True).show(5)
except AnalysisException as e:
    print(f"Read failed: {e}")
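If the CLI copy works but Spark still cannot read the object, the S3A connector is probably not pointed at the local gateway. Here is a minimal configuration sketch, where the endpoint URL and credentials are placeholders rather than values from this thread (7480 is the default Ceph RGW port):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("S3ReadTest")
    # Point the S3A connector at the local Ceph RGW endpoint (placeholder)
    .config("spark.hadoop.fs.s3a.endpoint", "http://ceph-gateway:7480")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    # Ceph buckets are usually addressed by path rather than virtual host
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

If listing succeeds but reads fail, the DEBUG logs from the snippet above should show which S3A request is being rejected.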
I am trying to read an S3 object from a local S3 storage system (Ceph
based) using Spark 3.5.1. I can see that Spark accesses the bucket and
lists the files (I have verified this on the Ceph side by checking its
logs), and it even returns the correct size of the object, but the
content is never read.

The object URL is:
s3a://i
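One way to take Spark out of the picture is to fetch the object with boto3 against the same endpoint; if that succeeds, the problem is in the S3A configuration rather than in Ceph. A minimal sketch, where the endpoint, credentials, bucket, and key are placeholders (the real URL is truncated above):

import boto3

# Placeholder endpoint and credentials: substitute your Ceph RGW values
s3 = boto3.client(
    "s3",
    endpoint_url="http://ceph-gateway:7480",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Fetch the object directly; bucket/key mirror the CLI example above
obj = s3.get_object(Bucket="input", Key="testfile.csv")
print(obj["ContentLength"])      # should match the size Spark reports
print(obj["Body"].read()[:200])  # first bytes of the actual content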