I am reading from a single file: df = spark.read.text("s3a://test-bucket/testfile.csv")
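
For context, a minimal sketch of how a session can be wired up for a single s3a object read; the endpoint, credentials, and path-style setting below are placeholders for an S3-compatible store, not the actual configuration in use here:

    from pyspark.sql import SparkSession

    # Session with the S3A connector configured; the endpoint/credential
    # values are placeholders, not real settings.
    spark = (
        SparkSession.builder
        .appName("read-single-s3-object")
        .config("spark.hadoop.fs.s3a.endpoint", "http://object-store:9000")
        .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
        .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .getOrCreate()
    )

    # Point the reader at the single object; the path names one key, not a prefix.
    df = spark.read.text("s3a://test-bucket/testfile.csv")
    df.show(5, truncate=False)
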
On Fri, May 31, 2024 at 5:26 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Tell Spark to read from a single file
>
> data = spark.read.text("s3a://test-bucket/testfile.csv")
>
> This clarifies to Spark that you are dealing with a single file and avoids
> any bucket-like interpretation.
>
> HTH
>
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer | Generative AI | FinCrime
> PhD, Imperial College London
> London, United Kingdom
>
> https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one-thousand
> expert opinions" (Wernher von Braun).
>
>
> On Fri, 31 May 2024 at 09:53, Amin Mosayyebzadeh <mosayyebza...@gmail.com>
> wrote:
>
>> I will work on the first two possible causes.
>> For the third one, which I guess is the real problem: Spark treats the
>> testfile.csv object at the URL s3a://test-bucket/testfile.csv as a bucket
>> and tries to access _spark_metadata at the URL
>> s3a://test-bucket/testfile.csv/_spark_metadata.
>> testfile.csv is an object and should not be treated as a bucket, but I am
>> not sure how to prevent Spark from doing that.
>>
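
As a sanity check on the third cause above, given an existing SparkSession `spark`, one can ask the Hadoop FileSystem directly how s3a classifies the key and whether anything exists under the _spark_metadata path Spark is probing. This is only a debugging sketch: it goes through PySpark's private _jsc/_jvm handles rather than a supported API.

    # Query the Hadoop FileSystem behind s3a for the object in question.
    hadoop_conf = spark._jsc.hadoopConfiguration()
    Path = spark._jvm.org.apache.hadoop.fs.Path

    target = Path("s3a://test-bucket/testfile.csv")
    fs = target.getFileSystem(hadoop_conf)

    # Confirm the store reports the key as a file, not a directory/prefix.
    status = fs.getFileStatus(target)
    print("is file:", status.isFile(), "length:", status.getLen())

    # The path Spark probes under the object key; on a plain object this
    # should not exist.
    print("has _spark_metadata:",
          fs.exists(Path("s3a://test-bucket/testfile.csv/_spark_metadata")))
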