Re: AWS Glue PySpark Job

2025-01-04 Thread Perez
Hi Team, I would appreciate any help with this: https://stackoverflow.com/questions/79324390/aws-glue-pyspark-job-is-not-ending/79324917#79324917 On Fri, Jan 3, 2025 at 3:53 PM Perez wrote: > Hi Team, I would need your help in understanding the problem below.

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-29 Thread Gourav Sengupta
Hi Nicolas, thanks a ton for your kind response, I will surely try this out. Regards, Gourav Sengupta On Sun, Aug 29, 2021 at 11:01 PM Nicolas Paris wrote:

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-29 Thread Nicolas Paris
As a workaround, turn off pruning: spark.sql.hive.metastorePartitionPruning false spark.sql.hive.convertMetastoreParquet false See https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/issues/45 On Tue Aug 24, 2021 at 9:18 AM CEST, Gourav Sengupta wrote:
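The two settings above can be passed at submit time; a minimal sketch (the job script name is a placeholder):

```shell
# Workaround from this thread: disable Hive metastore partition pruning
# and the metastore Parquet conversion for the affected job.
spark-submit \
  --conf spark.sql.hive.metastorePartitionPruning=false \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  your_job.py
```

Note these can also be set in spark-defaults.conf or on the SparkSession builder; disabling pruning trades correctness on EMR 3.1.1 for scanning more partitions.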

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-24 Thread Gourav Sengupta
Hi, I received a response from AWS: this is an issue with EMR, and I believe they are working on resolving it. Thanks and Regards, Gourav Sengupta On Mon, Aug 23, 2021 at 1:35 PM Gourav Sengupta < gourav.sengupta.develo...@gmail.com> wrote:

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-23 Thread Gourav Sengupta
Hi, the query still gives the same error if we write "SELECT * FROM table_name WHERE data_partition > CURRENT_DATE() - INTERVAL 10 DAYS". The queries also work fine in Spark 3.0.x and in EMR 6.2.0. Thanks and Regards, Gourav Sengupta On Mon, Aug 23, 2021 at 1:16 PM Sean Owen wrote:

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-23 Thread Sean Owen
Date handling was tightened up in Spark 3. I think you need to compare to a date literal, not a string literal. On Mon, Aug 23, 2021 at 5:12 AM Gourav Sengupta < gourav.sengupta.develo...@gmail.com> wrote: > Hi, while I am running in EMR 6.3.0 (Spark 3.1.1) a simple query such as "SELECT * FROM
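Following Sean's suggestion, a sketch of the same query using a real DATE value instead of string arithmetic (table and column names are the ones from the thread; adjust for your schema):

```shell
# date_sub(current_date(), 10) yields a DATE, so the partition column is
# compared against a date value rather than a string. An explicit literal
# such as DATE '2021-08-13' would also work.
spark-sql -e "
  SELECT *
  FROM table_name
  WHERE data_partition > date_sub(current_date(), 10)
"
```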

Re: Aws

2019-02-08 Thread Pedro Tuero
Hi Noritaka, I start clusters from the Java API. Clusters running on 5.16 have no manual configurations in the EMR console Configuration tab, so I assume the value of this property is the default on 5.16. I enabled maximizeResourceAllocation because otherwise the number of cores automatical

Re: Aws

2019-02-07 Thread Noritaka Sekiyama
Hi Pedro, It seems that you disabled maximizeResourceAllocation in 5.16, but enabled it in 5.20. This config can differ based on how you start the EMR cluster (via the quick wizard, the advanced wizard in the console, or the CLI/API). You can see it in the EMR console Configuration tab. Please compare the Spark prope

Re: Aws

2019-02-07 Thread Hiroyuki Nagata
Hi, thank you Pedro. I tested the maximizeResourceAllocation option. When it's enabled, Spark seems to utilize the cores fully. However, the performance is not much different from the default setting. I am considering using s3-dist-cp for uploading files. And I think table (DataFrame) caching is also effectiven
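The s3-dist-cp approach mentioned above can be sketched as follows; it runs on the EMR cluster and copies job output from HDFS to S3 after the Spark job writes locally (both paths are placeholders):

```shell
# Write the Spark output to HDFS first, then bulk-copy it to S3 with
# s3-dist-cp, which parallelizes the upload as a MapReduce job.
s3-dist-cp \
  --src hdfs:///user/hadoop/output \
  --dest s3://my-bucket/job-output
```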

Re: Aws

2019-02-01 Thread Pedro Tuero
Hi Hiroyuki, thanks for the answer. I found a solution for the cores-per-executor configuration: I set maximizeResourceAllocation to true: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#emr-spark-maximizeresourceallocation It was probably true by default in version 5.16, but
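A sketch of enabling this setting at cluster-creation time with the AWS CLI, per the EMR docs linked above (release label, instance types, and counts are illustrative placeholders):

```shell
# maximizeResourceAllocation tells EMR to compute executor cores/memory
# from the instance type; it is set via the "spark" classification.
aws emr create-cluster \
  --release-label emr-5.20.0 \
  --applications Name=Spark \
  --instance-type m4.xlarge \
  --instance-count 3 \
  --configurations '[{"Classification":"spark","Properties":{"maximizeResourceAllocation":"true"}}]' \
  --use-default-roles
```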

Re: Aws

2019-01-31 Thread Hiroyuki Nagata
Hi, Pedro. I have also started using AWS EMR, with Spark 2.4.0, and I'm looking for performance-tuning methods. Do you configure dynamic allocation? FYI: https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation I've not tested it yet. I guess spark-submit needs to specify the numb
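A minimal sketch of turning on dynamic allocation via spark-submit, per the scheduling docs linked above (the executor bounds and script name are illustrative, not recommendations):

```shell
# Dynamic allocation grows/shrinks the executor count with the workload.
# The external shuffle service must be enabled so executors can be
# released without losing shuffle data.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  your_job.py
```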

Re: AWS credentials needed while trying to read a model from S3 in Spark

2018-05-09 Thread Srinath C
You could use IAM roles in AWS to access the data in S3 without credentials. See this link and this link for an
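One way to rely on an IAM role rather than keys, sketched under the assumption that the cluster nodes have an EC2 instance profile attached (the provider class is from the AWS SDK v1 and the s3a connector; adjust to your Hadoop/SDK versions):

```shell
# With an instance profile, credentials are fetched from instance
# metadata; no access keys appear in the job configuration.
spark-submit \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider \
  your_job.py
```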

Re: AWS CLI --jars comma problem

2015-12-07 Thread Akhil Das
Not a direct answer, but you can create one big fat jar combining all the classes in the three jars and pass that. Thanks Best Regards On Thu, Dec 3, 2015 at 10:21 PM, Yusuf Can Gürkan wrote: > Hello, I have a question about the AWS CLI for people who use it. I create a Spark cluster with the aws cli

Re: AWS-Credentials fails with org.apache.hadoop.fs.s3.S3Exception: FORBIDDEN

2015-05-08 Thread in4maniac
Hi guys, I realised that it was a bug in my code that caused it to break: I was running the filter on a SchemaRDD when I was supposed to be running it on an RDD. But I still don't understand why the stderr was about an S3 request rather than a type-checking error such as "No tuple position

Re: AWS-Credentials fails with org.apache.hadoop.fs.s3.S3Exception: FORBIDDEN

2015-05-08 Thread Akhil Das
Have a look at this SO question; it has a discussion of various ways of accessing S3. Thanks Best Regards On Fri, May 8, 2015 at 1:21 AM, in4maniac wrote:

Re: AWS SDK HttpClient version conflict (spark.files.userClassPathFirst not working)

2015-03-15 Thread Adam Lewandowski
Just following up on this issue. I discovered that when I ran the application on a YARN cluster (on AWS EMR), I was able to use the AWS SDK without issue (without the 'spark.files.userClassPathFirst' flag set). Also, I learned that the entire 'child-first' classloader setup was changed in Spark 1.3.0 (r
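For later Spark versions, the child-first behaviour is controlled by per-side flags rather than the old spark.files.userClassPathFirst; a sketch (the jar name and script are placeholders):

```shell
# Experimental flags in Spark 1.3+: prefer classes from user-supplied
# jars over Spark's bundled dependencies, to resolve e.g. an HttpClient
# version conflict with the AWS SDK.
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars aws-java-sdk-bundle.jar \
  your_job.py
```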

Re: AWS SDK HttpClient version conflict (spark.files.userClassPathFirst not working)

2015-03-12 Thread 浦野 裕也
Hi Adam, Could you try building Spark with the -Pkinesis-asl profile? mvn -Pkinesis-asl -DskipTests clean package See the 'Running the Example' section: https://spark.apache.org/docs/latest/streaming-kinesis-integration.html In fact, I've seen the same issue and have been able to use the AWS SDK by

Re: AWS Credentials for private S3 reads

2014-07-02 Thread Matei Zaharia
Hmm, yeah, that is weird but because it’s only on some files it might mean those didn’t get fully uploaded. Matei On Jul 2, 2014, at 4:50 PM, Brian Gawalt wrote:

Re: AWS Credentials for private S3 reads

2014-07-02 Thread Brian Gawalt
HUH; not-scrubbing the slashes fixed it. I would have sworn I tried it, got a 403 Forbidden, then remembered the slash prescription. Can confirm I was never scrubbing the actual URIs. It looks like it'd all be working now except it's smacking its head against: 14/07/02 23:37:38 INFO rdd.HadoopRDD:

Re: AWS Credentials for private S3 reads

2014-07-02 Thread Matei Zaharia
When you use hadoopConfiguration directly, I don’t think you have to replace the “/“ with “%2f”. Have you tried it without that? Also make sure you’re not replacing slashes in the URL itself. Matei On Jul 2, 2014, at 4:17 PM, Brian Gawalt wrote: > Hello everyone, > > I'm having some difficul
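Matei's suggestion of setting the credentials through the Hadoop configuration (instead of URL-encoding them into the S3 path) can be sketched like this; the spark.hadoop.* prefix forwards properties to hadoopConfiguration, and s3n was the connector current at the time of this 2014 thread:

```shell
# Credentials go into the Hadoop config, so the secret key's slashes
# never need %2f-escaping and the s3n:// URL stays clean.
# YOUR_ACCESS_KEY / YOUR_SECRET_KEY are placeholders.
spark-submit \
  --conf spark.hadoop.fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY \
  --conf spark.hadoop.fs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY \
  your_job.py
```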

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Marco Costantini
Perfect. Now I know what to do, thanks to your help! Many thanks, Marco. On Wed, Apr 9, 2014 at 12:27 PM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote:

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Shivaram Venkataraman
The AMI should automatically switch between PVM and HVM based on the instance type you specify on the command line. For reference (note you don't need to specify this on the command line), the PVM AMI ID is ami-5bb18832 in us-east-1. FWIW we maintain the list of AMI IDs (across regions and pvm, hv

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Marco Costantini
Ah, tried that. I believe this is an HVM AMI? We are exploring paravirtual AMIs. On Wed, Apr 9, 2014 at 11:17 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote:

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Nicholas Chammas
And for the record, that AMI is ami-35b1885c. Again, you don't need to specify it explicitly; spark-ec2 will default to it. On Wed, Apr 9, 2014 at 11:08 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote:

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Nicholas Chammas
Marco, If you call spark-ec2 launch without specifying an AMI, it will default to the Spark-provided AMI. Nick On Wed, Apr 9, 2014 at 9:43 AM, Marco Costantini < silvio.costant...@granatads.com> wrote:
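A minimal sketch of such a launch, run from Spark's ec2 directory; with no --ami flag, spark-ec2 falls back to the Spark-provided AMI (key-pair names, slave count, and cluster name are placeholders):

```shell
# Launch a small cluster with the default Spark AMI.
./spark-ec2 \
  -k my-keypair \
  -i ~/.ssh/my-keypair.pem \
  -s 2 \
  launch my-spark-cluster
```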

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Marco Costantini
Hi there, To answer your question: no, there is no reason not to use an AMI that Spark has prepared. The reason we haven't is that we were not aware such AMIs existed. Would you kindly point us to the documentation where we can read about this further? Many many thanks, Shivaram. Marco. On Tue, A

Re: AWS Spark-ec2 script with different user

2014-04-08 Thread Shivaram Venkataraman
Is there any reason why you want to start with a vanilla Amazon AMI rather than the ones we build and provide as part of the Spark EC2 scripts? The AMIs we provide are close to the vanilla AMI but have the root account set up properly and install packages like Java that are used by Spark. If you wis

Re: AWS Spark-ec2 script with different user

2014-04-08 Thread Marco Costantini
I was able to keep the "workaround" ...around... by overwriting the generated '/root/.ssh/authorized_keys' file with a known good one, in the '/etc/rc.local' file On Tue, Apr 8, 2014 at 10:12 AM, Marco Costantini < silvio.costant...@granatads.com> wrote:
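A sketch of that rc.local workaround, assuming a known-good key file has been baked into the AMI (the backup path is a hypothetical placeholder):

```shell
# Run once as root when building the AMI: make /etc/rc.local restore a
# known-good authorized_keys for root on every boot, undoing whatever
# cloud-init writes there.
cat >> /etc/rc.local <<'EOF'
cp /root/authorized_keys.good /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
EOF
```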

Re: AWS Spark-ec2 script with different user

2014-04-08 Thread Marco Costantini
Another thing I didn't mention: the AMI and user used. Naturally, I've created several of my own AMIs with the following characteristics, none of which worked. 1) Enabling SSH as root as per this guide ( http://blog.tiger-workshop.com/enable-root-access-on-amazon-ec2-instance/). When doing this, I

Re: AWS Spark-ec2 script with different user

2014-04-08 Thread Marco Costantini
As requested, here is the script I am running. It is a simple shell script which calls the spark-ec2 wrapper script. I execute it from the 'ec2' directory of Spark, as usual. The AMI used is the raw one from the AWS Quick Start section. It is the first option (an Amazon Linux paravirtual image). Any id

Re: AWS Spark-ec2 script with different user

2014-04-07 Thread Shivaram Venkataraman
Hmm -- That is strange. Can you paste the command you are using to launch the instances? The typical workflow is to use the spark-ec2 wrapper script following the guidelines at http://spark.apache.org/docs/latest/ec2-scripts.html Shivaram On Mon, Apr 7, 2014 at 1:53 PM, Marco Costantini < silvio.co

Re: AWS Spark-ec2 script with different user

2014-04-07 Thread Marco Costantini
Hi Shivaram, OK, so let's assume the script CANNOT take a different user and that it must be 'root'. The typical workaround is, as you said, to allow SSH with the root user. Now, don't laugh, but this worked last Friday, and today (Monday) it no longer works. :D Why? ...It seems that NOW, whe

Re: AWS Spark-ec2 script with different user

2014-04-07 Thread Shivaram Venkataraman
Right now the spark-ec2 scripts assume that you have root access, and a lot of internal scripts assume the user's home directory is hard-coded as /root. However, all the Spark AMIs we build should have root SSH access -- do you find this not to be the case? You can also enable root ssh access i