Hi, we keep hitting an issue when inserting into or creating a table with the Amazon EMR Spark build: when inserting a result set of about 1 GB, the Spark SQL query never finishes.
Inserting a smaller result set (around 500 MB) works fine. Leaving *spark.sql.shuffle.partitions* at its default of 200, or running *set spark.sql.shuffle.partitions=1*, does not help. The log stops at:

    15/04/01 15:48:13 INFO s3n.S3NativeFileSystem: rename s3://hive-db/tmp/hive-hadoop/hive_2015-04-01_15-47-43_036_1196347178448825102-15/-ext-10000 s3://hive-db/db_xxx/some_huge_table/

and after that only metrics.MetricsSaver messages appear. We set:

    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>s3://hive-db</value>
    </property>

but hive.exec.scratchdir is not set, and I have no idea why the tmp files were created under s3://hive-db/tmp/hive-hadoop/.

We also just tried the newest Spark 1.3.0 on AMI 3.5.x and AMI 3.6 (https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/VersionInformation.md), and it still does not work. Has anyone hit the same issue? Any idea how to fix it?

I believe Amazon EMR's Spark build uses com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem to access S3, rather than the stock Hadoop s3n implementation, right? (/home/hadoop/spark/classpath/emr/* and /home/hadoop/spark/classpath/emrfs/* are on the classpath, by the way.) Is there any plan to use the new Hadoop s3a implementation instead of s3n?

Thanks for any help.

Teng

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Issue-on-Spark-SQL-insert-or-create-table-with-Spark-running-on-AWS-EMR-s3n-S3NativeFileSystem-renamd-tp22340.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
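P.S. In case it helps narrow things down, here is a sketch of how hive.exec.scratchdir could be pinned explicitly in hive-site.xml, next to the warehouse setting we already have. The HDFS path below is only an illustrative assumption, not a verified fix; my guess is that the s3://hive-db/tmp/hive-hadoop/ prefix appears because Hive's default scratch path (/tmp/hive-<user>) is being resolved against the same filesystem as the S3 warehouse.

    <!-- existing setting: Hive warehouse on S3 -->
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>s3://hive-db</value>
    </property>

    <!-- hypothetical addition: keep intermediate/scratch files on cluster HDFS
         instead of S3, so the final step is an HDFS rename rather than an
         S3 copy-and-delete -->
    <property>
      <name>hive.exec.scratchdir</name>
      <value>hdfs:///tmp/hive-${user.name}</value>
    </property>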