Re: OOM concern

2024-05-28 Thread Perez
Thanks Mich for the detailed explanation. On Tue, May 28, 2024 at 9:53 PM Mich Talebzadeh wrote: > Russell mentioned some of these issues before. So in short your mileage > varies. For a 100 GB data transfer, the speed difference between Glue and > EMR might not be significant, especially consid

Re: OOM concern

2024-05-28 Thread Russell Jurney
If Glue lets you take a configuration based approach, and you don't have to operate any servers as with EMR... use Glue. Try EMR if that is troublesome. Russ On Tue, May 28, 2024 at 9:23 AM Mich Talebzadeh wrote: > Russell mentioned some of these issues before. So in short your mileage > varies

Re: OOM concern

2024-05-28 Thread Mich Talebzadeh
Russell mentioned some of these issues before. So in short your mileage varies. For a 100 GB data transfer, the speed difference between Glue and EMR might not be significant, especially considering the benefits of Glue's managed service aspects. However, for much larger datasets or scenarios where

Re: OOM concern

2024-05-28 Thread Perez
Thanks Mich. Yes, I agree on the cost part, but how would the data transfer speed be impacted? Is it because Glue takes some time to initialize the underlying resources before it processes the data? On Tue, May 28, 2024 at 2:23 PM Mich Talebzadeh wrote: > Your mileage varies as usual > > Glue with D

Re: OOM concern

2024-05-28 Thread Mich Talebzadeh
Your mileage varies, as usual. Glue with DPUs seems like a strong contender for your data transfer needs based on the simplicity, scalability, and managed service aspects. However, if data transfer speed is critical or costs become a concern after testing, consider EMR as an alternative. HTH Mich

Re: OOM concern

2024-05-27 Thread Perez
Thank you everyone for your responses. I am not getting any errors as of now. I am just trying to choose the right tool for my task, which is loading data from an external source into S3 via Glue/EMR. I think a Glue job would be the best fit for me because I can calculate the DPUs needed (maybe keeping s

Re: OOM concern

2024-05-27 Thread Russell Jurney
If you’re using EMR and Spark, you need to choose nodes with enough RAM to accommodate any given partition in your data or you can get an OOM error. Not sure if this job involves a reduce, but I would choose a single 128GB+ memory optimized instance and then adjust parallelism as per the Spark docs
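
A minimal sketch of the sizing arithmetic behind this advice: each partition has to fit in the memory available to the task that processes it. The function name `min_partitions` and the `usable_fraction` default are hypothetical choices for illustration, not values from the Spark docs. The sketch also assumes the data is spread evenly across partitions and that its in-memory size is close to the quoted size, which often does not hold in practice.

```python
import math


def min_partitions(dataset_gb: float,
                   executor_memory_gb: float,
                   usable_fraction: float = 0.25) -> int:
    """Return the smallest partition count that keeps each partition
    within `usable_fraction` of one executor's memory.

    `usable_fraction` is an illustrative safety margin (an assumption),
    leaving headroom for Spark's own overhead and for skewed partitions.
    """
    if dataset_gb <= 0 or executor_memory_gb <= 0:
        raise ValueError("sizes must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    max_partition_gb = executor_memory_gb * usable_fraction
    return math.ceil(dataset_gb / max_partition_gb)


# ~100 GB historical load on a single 128 GB memory-optimized node
print(min_partitions(100, 128))
# The same load on executors with 16 GB each
print(min_partitions(100, 16))
```

With fewer partitions than this, a single partition may exceed what one task can hold, which is the OOM case described above. Using more partitions than the minimum is generally safe and only adds scheduling overhead.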

Re: OOM concern

2024-05-27 Thread Meena Rajani
What exactly is the error? Is it erroring out while reading the data from the DB? How are you partitioning the data? How much memory do you currently have? What is the network timeout? Regards, Meena On Mon, May 27, 2024 at 4:22 PM Perez wrote: > Hi Team, > > I want to extract the data from DB a

OOM concern

2024-05-27 Thread Perez
Hi Team, I want to extract the data from a DB and just dump it into S3. I don't have to perform any transformations on the data yet. My data size would be ~100 GB (historical load). Choosing the right DPUs (Glue jobs) should solve this problem, right? Or should I move to EMR? I don't feel the need t
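
For a plain extract-and-dump like this, one common way to keep any single task from pulling the whole table into memory is a partitioned JDBC read. The sketch below only builds the option map. The option keys (`url`, `dbtable`, `partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, `fetchsize`) are Spark JDBC data source options, while the helper name `jdbc_read_options`, the connection URL, the table, the column, and the bounds are all hypothetical placeholders, not details from this thread.

```python
def jdbc_read_options(url: str,
                      table: str,
                      partition_column: str,
                      lower_bound: int,
                      upper_bound: int,
                      num_partitions: int,
                      fetch_size: int = 10_000) -> dict:
    """Build the option map for a partitioned Spark JDBC read.

    Spark splits the range [lower_bound, upper_bound] of a numeric, date,
    or timestamp `partition_column` into `num_partitions` strides and
    issues one query per stride, so no single task reads the full table.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be below upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),  # rows fetched per round trip
    }


# Hypothetical source table with a numeric primary key `id`.
opts = jdbc_read_options(
    url="jdbc:postgresql://db.example.internal:5432/sales",
    table="public.orders",
    partition_column="id",
    lower_bound=1,
    upper_bound=500_000_000,
    num_partitions=25,
)

# In a Glue or EMR Spark job, the map would be passed along these lines
# (bucket name is a placeholder):
#   spark.read.format("jdbc").options(**opts).load() \
#        .write.mode("overwrite").parquet("s3://example-bucket/orders/")
print(opts["numPartitions"])
```

The bounds only decide how the range is split into strides. They do not filter rows, so values outside them still land in the first or last partition.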