Apache Spark Web UI on Amazon EMR not working

2015-12-10 Thread sonal sharma
Hi, We are using Spark on Amazon EMR 4.1. To access the Spark web UI, we are using the link in the YARN ResourceManager, but we are seeing a blank page. Further, using the Firefox debugging tools, we noticed an HTTP 500 error in the response. We have tried configuring proxy settings for AWS and also r

Parquet partitioning performance issue

2015-09-13 Thread sonal sharma
Hi Team, We have scheduled jobs that read new records from a MySQL database every hour and write (append) them to Parquet. For each append operation, Spark creates 10 new partitions in the Parquet file. Some of these partitions are fairly small (20-40 KB), leading to a high number of smaller parti
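
One common way to reduce the number of tiny files in the situation described above is to coalesce each hourly batch before appending it. The sketch below is only an illustration and is not taken from the thread: the helper names `target_partition_count` and `append_hourly_batch`, the 128 MB target file size, and the idea of passing in an estimated batch size are all assumptions. It relies on `DataFrame.coalesce` and `DataFrameWriter.mode("append").parquet(...)` from the PySpark DataFrame API.

```python
import math


def target_partition_count(estimated_bytes, target_file_bytes=128 * 1024 * 1024):
    """Return how many output files to write for one batch.

    Hypothetical helper: aims for files of roughly target_file_bytes each
    and never returns fewer than one partition.
    """
    if estimated_bytes <= 0:
        return 1
    return max(1, math.ceil(estimated_bytes / target_file_bytes))


def append_hourly_batch(df, output_path, estimated_bytes):
    """Append one hourly batch to Parquet with fewer, larger files.

    Hypothetical helper: df is assumed to be a Spark DataFrame holding the
    new MySQL records, and estimated_bytes is a rough size supplied by the
    caller (it is not measured here).
    """
    n = target_partition_count(estimated_bytes)
    # coalesce() lowers the partition count without a full shuffle, so each
    # append writes n part files instead of one file per input partition.
    df.coalesce(n).write.mode("append").parquet(output_path)
```

With ten partitions of 20-40 KB each, `target_partition_count` returns 1, so each hourly append would write a single part file. Small files still accumulate over time with this approach, just more slowly, so a periodic compaction job may still be needed.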