Ok, thanks a lot for the heads up.
Sent from my iPhone
> On Feb 25, 2017, at 10:58 AM, Steve Loughran wrote:
>
>
>> On 24 Feb 2017, at 07:47, Femi Anthony wrote:
>>
>> Have you tried reading using s3n, which is a slightly older protocol? I'm
>> not sure how compatible s3a is with older versions of Spark.
On 24 Feb 2017, at 07:47, Femi Anthony wrote:
Have you tried reading using s3n, which is a slightly older protocol? I'm not
sure how compatible s3a is with older versions of Spark.
I would absolutely not use s3n with a 1.2 GB file.
There is a WONTFIX JIRA on how it …
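For illustration, a minimal sketch of reading the same file through the s3a connector rather than s3n, assuming a Spark 1.6 spark-shell (sc and sqlContext predefined) with the hadoop-aws and AWS SDK jars on the classpath; the bucket name and path below are invented:

// Hypothetical bucket/path; credentials taken from environment variables here.
sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// Note the s3a:// scheme instead of s3n://
val df = sqlContext.read.parquet("s3a://my-bucket/path/to/file.parquet")
df.printSchema()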
Gourav,
I’ll start experimenting with Spark 2.1 to see if this works.
Cheers,
Ben
> On Feb 24, 2017, at 5:46 AM, Gourav Sengupta wrote:
>
> Hi Benjamin,
>
> First of all, fetching data from S3 while writing code on an on-premise system
> is a very bad idea. You might want to first copy the data into local HDFS
> before running your code. …
Hi Benjamin,
First of all, fetching data from S3 while writing code on an on-premise
system is a very bad idea. You might want to first copy the data into
local HDFS before running your code. Of course, this depends on the volume of
data and the internet speed that you have.
The platform which makes you …
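As a rough sketch of that suggestion (paths invented, assuming a Spark 1.6 spark-shell with sc and sqlContext predefined and s3a credentials already configured as elsewhere in this thread): pull the file from S3 once, persist it to HDFS, and run the heavier work against the HDFS copy. The same one-off copy could also be done with hadoop distcp.

val s3Path   = "s3a://my-bucket/path/to/file.parquet"   // hypothetical source
val hdfsPath = "hdfs:///data/staging/file.parquet"      // hypothetical staging location

// One-off copy: read from S3, write to local HDFS.
sqlContext.read.parquet(s3Path).write.parquet(hdfsPath)

// All subsequent work runs against the HDFS copy.
val df = sqlContext.read.parquet(hdfsPath)
df.count()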
Have you tried reading using s3n, which is a slightly older protocol? I'm
not sure how compatible s3a is with older versions of Spark.
Femi
On Fri, Feb 24, 2017 at 2:18 AM, Benjamin Kim wrote:
> Hi Gourav,
>
> My answers are below.
>
> Cheers,
> Ben
>
>
> On Feb 23, 2017, at 10:57 PM, Gourav Sengupta wrote: …
Hi Gourav,
My answers are below.
Cheers,
Ben
> On Feb 23, 2017, at 10:57 PM, Gourav Sengupta wrote:
>
> Can I ask where are you running your CDH? Is it on premise or have you
> created a cluster for yourself in AWS?

Our cluster is on premise in our data center.

> Also I have really …
Can I ask where you are running your CDH? Is it on premise, or have you
created a cluster for yourself in AWS?
Also, I have really never seen s3a used before; that was used way back
when writing S3 files took a long time, but I think that you are reading
it.
Any ideas why you are not migrating …
Aakash,
Here is a code snippet for the keys.
val accessKey = "---"   // key values redacted
val secretKey = "---"
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", accessKey)
hadoopConf.set("fs.s3a.secret.key", secretKey)
hadoopConf.set("spark.hadoop.fs.s3a.access.key", accessKey)
hadoopConf.set("spark.hadoop.fs.s3a.secret.key", secretKey)
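One note on the snippet above, offered as a sketch rather than a definitive fix: the spark.hadoop.* prefix is only interpreted by Spark when it appears in the SparkConf (or is passed via --conf) before the SparkContext is created, so setting those two prefixed keys on the Hadoop configuration object itself should have no effect. A sketch of the SparkConf route, reusing the accessKey/secretKey values from the snippet above:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("s3a-credentials")
  .set("spark.hadoop.fs.s3a.access.key", accessKey)   // copied into the Hadoop conf
  .set("spark.hadoop.fs.s3a.secret.key", secretKey)   // when the context is created

val sc = new SparkContext(conf)
// sc.hadoopConfiguration.get("fs.s3a.access.key") now returns accessKey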
Hey,
Please recheck the access key and secret key being used to fetch the
Parquet file. It seems to be a credential error, either a mismatch or a
loading problem. If it is a loading problem, first use the keys directly in
the code and see if the issue resolves; they can then be hidden and read
from input parameters.
Thanks,
Aakash.
On 23-Feb-2017, Benjamin Kim wrote: …
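To sketch the second half of that suggestion (the variable names below are the standard AWS environment variables; nothing here comes from the original code): once the hard-coded test works, the keys can be pulled from the environment instead of living in the source. Assumes a Spark 1.6 spark-shell where sc is predefined.

// Fail fast if the credentials are not provided.
val accessKey = sys.env.getOrElse("AWS_ACCESS_KEY_ID", sys.error("AWS_ACCESS_KEY_ID not set"))
val secretKey = sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", sys.error("AWS_SECRET_ACCESS_KEY not set"))

sc.hadoopConfiguration.set("fs.s3a.access.key", accessKey)
sc.hadoopConfiguration.set("fs.s3a.secret.key", secretKey)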
We are trying to use Spark 1.6 within CDH 5.7.1 to retrieve a 1.3 GB Parquet
file from AWS S3. We can read the schema and show some data when the file is
loaded into a DataFrame, but when we try to do some operations, such as count,
we get this error below.
com.cloudera.com.amazonaws.AmazonClientException: …
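For reference, a minimal sketch of the reported behaviour (path invented; Spark 1.6 spark-shell with sc and sqlContext predefined). Schema inference only reads the Parquet footers and show only pulls a handful of rows, while count forces a full scan on the executors, which is typically where a credential or connector problem first surfaces:

val df = sqlContext.read.parquet("s3a://my-bucket/path/to/file.parquet")  // hypothetical path

df.printSchema()   // succeeds: only Parquet footer metadata is needed
df.show(5)         // succeeds: reads just enough rows from one partition
df.count()         // full scan across all partitions; the AmazonClientException
                   // described above was reported at this step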