Hi all, I have a large volume of data, nearly 500 TB (covering 2016 through 2018, up to the present), and I need to run some ETL on it.
The data is stored in AWS S3, so I am planning to use an AWS EMR cluster to process it, but I am not sure which configuration to select.

1. Do I need to process the data month by month, or can I process all of it at once?
2. How much memory (RAM) and storage should the master and slave (executor) nodes have?
3. What kind of processor (speed) do I need?
4. How many slave nodes do I need?

Based on the answers, I want to estimate the cost of AWS EMR and start processing the data.

Regards,
Indra
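
P.S. To make the cost question concrete, this is roughly the arithmetic I have in mind once I know the numbers. It is only a sketch: the function name `estimate_emr_cost`, the per-node throughput, the node count, and the hourly price are all placeholders I made up for illustration, not measured or quoted AWS figures, and it assumes the job scales linearly with the number of nodes.

```python
def estimate_emr_cost(total_tb, tb_per_node_hour, node_count, price_per_node_hour):
    """Rough estimate for one pass over the data.

    Returns (wall_clock_hours, total_cost). Assumes perfect linear scaling
    and ignores S3 request costs, the master node, and job overhead.
    """
    node_hours = total_tb / tb_per_node_hour    # total work, in node-hours
    wall_clock_hours = node_hours / node_count  # elapsed time with all nodes busy
    total_cost = node_hours * price_per_node_hour
    return wall_clock_hours, total_cost


# Placeholder inputs only: throughput and price must come from a real
# benchmark on a data sample and from the current AWS price list.
hours, cost = estimate_emr_cost(
    total_tb=500,             # data volume from the question
    tb_per_node_hour=0.5,     # hypothetical throughput per node
    node_count=50,            # hypothetical cluster size
    price_per_node_hour=1.0,  # hypothetical combined EC2 + EMR hourly rate
)
print(f"about {hours:.0f} hours of wall-clock time, total cost about {cost:.0f}")
```

My plan would be to run the ETL on a small sample (for example one month) to measure the real throughput per node, then plug that into the calculation above.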