Re: Probable Spark Bug while inserting into flat GCS bucket?

2023-08-19 Thread Mich Talebzadeh
Under gs directory "gs://test_dd1/abc/" What do you see? gsutil ls gs://test_dd1/abc and the same gs://test_dd1/ gsutil ls gs://test_dd1 I suspect you need a folder for multiple ORC slices! Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin p

Probable Spark Bug while inserting into flat GCS bucket?

2023-08-19 Thread Dipayan Dev
Hi Everyone, I'm stuck with one problem, where I need to provide a custom GCS location for the Hive table from Spark. The code fails while doing an *'insert into'* whenever my Hive table has a flag GS location like gs://, but works for nested locations like gs://bucket_name/blob_name. Is anyone a

Re: [External Email] Re: [Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-19 Thread Nebi Aydin
Here's the executor logs ``` java.io.IOException: Connection from ip-172-31-16-143.ec2.internal/172.31.16.143:7337 closed at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146) at org.apache.spark.network.server.TransportCha

Re: [External Email] Re: [Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-19 Thread Mich Talebzadeh
That error message *FetchFailedException: Failed to connect to on port 7337 *happens when a task running on one executor node tries to fetch data from another executor node but fails to establish a connection to the specified port (7337 in this case). In a nutshell it is performing network IO amon