Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Riccardo Ferrari
Hi Aakash, Can you share how you are adding those jars? Are you using the packages method? I assume you're running on a cluster, and those dependencies might not have been properly distributed. How are you submitting your app? What kind of resource manager are you using: standalone, YARN, ...? Best, O

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Aakash Basu
Any help, anyone? On Fri, Dec 21, 2018 at 2:21 PM Aakash Basu wrote: > Hey Shuporno, > > With the updated config too, I am getting the same error. While trying to > figure that out, I found this link which says I need aws-java-sdk (which I > already have): > https://github.com/amazon-archives/ki

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Aakash Basu
Hey Shuporno, With the updated config too, I am getting the same error. While trying to figure that out, I found this link, which says I need aws-java-sdk (which I already have): https://github.com/amazon-archives/kinesis-storm-spout/issues/8 Now, these are my Java details: java version "1.8.0_181"

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Shuporno Choudhury
Hi, I don't know whether the following config properties (that you have tried) are correct: fs.s3a.awsAccessKeyId fs.s3a.awsSecretAccessKey The correct ones probably are: fs.s3a.access.key fs.s3a.secret.key On Fri, 21 Dec 2018 at 13:21, Aakash Basu-2 [via Apache Spark User List] < ml+s1001560n34217...@n3.na
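[Editor's note] A minimal sketch (plain Python, no Spark required) of the property-name fix suggested in this message. The mapping reflects the dotted property names that Hadoop's S3AFileSystem actually reads; the credential values are the placeholders used elsewhere in the thread.

```python
# The names on the left are what was tried; s3a reads the names on the
# right. (The older camel-case style belongs to the s3n connector,
# e.g. fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey.)
S3A_KEY_FIX = {
    "fs.s3a.awsAccessKeyId": "fs.s3a.access.key",
    "fs.s3a.awsSecretAccessKey": "fs.s3a.secret.key",
}

def corrected(conf: dict) -> dict:
    """Rename any incorrect s3a credential keys to the real ones."""
    return {S3A_KEY_FIX.get(k, k): v for k, v in conf.items()}

# The config from the earlier message, with placeholder credentials:
fixed = corrected({
    "fs.s3a.awsAccessKeyId": "abcd",
    "fs.s3a.awsSecretAccessKey": "123abc",
})
```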

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-20 Thread Aakash Basu
Hey Shuporno, Thanks for the prompt reply, and for noticing the silly mistake. I tried this out, but am still getting another error, which seems to be related to connectivity. >>> hadoop_conf.set("fs.s3a.awsAccessKeyId", "abcd") > >>> hadoop_conf.set("fs.s3a.awsSecretAccessKey", "123abc") > >>> a =
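[Editor's note] For illustration, the calls above with the corrected dotted s3a names, run against a tiny stand-in for Hadoop's Configuration object so the sketch works without a live SparkSession. The class and credentials are hypothetical placeholders; in a real session, `hadoop_conf` would come from `spark.sparkContext._jsc.hadoopConfiguration()`.

```python
# Tiny stand-in for Hadoop's Configuration object, to show the corrected
# calls without needing Spark. Credentials are the thread's placeholders.
class FakeHadoopConf:
    def __init__(self):
        self._props = {}

    def set(self, key, value):
        self._props[key] = value

hadoop_conf = FakeHadoopConf()
# Dotted s3a property names, not the camel-case ones tried above:
hadoop_conf.set("fs.s3a.access.key", "abcd")
hadoop_conf.set("fs.s3a.secret.key", "123abc")
```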

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-20 Thread Shuporno Choudhury
On Fri, 21 Dec 2018 at 12:47, Shuporno Choudhury < shuporno.choudh...@gmail.com> wrote: > Hi, > Your connection config uses 's3n' but your read command uses 's3a'. > The config for s3a are: > spark.hadoop.fs.s3a.access.key > spark.hadoop.fs.s3a.secret.key > > I feel this should solve the problem.
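[Editor's note] A plain-Python sketch of the mismatch this message points out: the scheme in the read URL ("s3a://...") must match the scheme embedded in the config property names. The bucket, path, and credentials below are placeholders; note also that the dotted `access.key`/`secret.key` names shown are specific to s3a (s3n used camel-case names).

```python
# Build the spark.hadoop.* credential entries for a given filesystem
# scheme, then check that they agree with the URL actually being read.
def fs_conf(scheme: str, access_key: str, secret_key: str) -> dict:
    # Dotted names are correct for s3a; s3n used fs.s3n.awsAccessKeyId.
    return {
        f"spark.hadoop.fs.{scheme}.access.key": access_key,
        f"spark.hadoop.fs.{scheme}.secret.key": secret_key,
    }

url = "s3a://some-bucket/some-file.csv"  # placeholder path
scheme = url.split("://", 1)[0]          # "s3a"
conf = fs_conf(scheme, "abcd", "123abc")

# Every credential key now matches the URL's scheme -- the bug in the
# original attempt was configuring s3n while reading via s3a.
assert all(k.startswith(f"spark.hadoop.fs.{scheme}.") for k in conf)
```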

Connection issue with AWS S3 from PySpark 2.3.1

2018-12-20 Thread Aakash Basu
Hi, I am trying to connect to AWS S3 and read a csv file (running a POC) from a bucket. I have s3cmd installed and am able to run ls and other operations from the CLI. *Present Configuration:* Python 3.7 Spark 2.3.1 *JARs added:* hadoop-aws-2.7.3.jar (in sync with the Hadoop version used with Spark) aws-
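[Editor's note] A hedged sketch of the dependency pairing for this setup. The SDK version comes from hadoop-aws 2.7.3's own POM (it was built against aws-java-sdk 1.7.4); pairing it with a much newer SDK jar is a common source of class-loading and connection errors in threads like this one.

```python
# Maven coordinates matching the jar named in this message, plus the SDK
# version hadoop-aws 2.7.3 was built against (per its POM).
HADOOP_AWS = "org.apache.hadoop:hadoop-aws:2.7.3"
AWS_SDK = "com.amazonaws:aws-java-sdk:1.7.4"

# Instead of copying jars by hand, these are usually supplied through the
# spark.jars.packages setting (or the spark-submit --packages flag),
# which also distributes them to executors on a cluster:
packages_value = ",".join([HADOOP_AWS, AWS_SDK])
```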