The stack trace is below, but someone told me this is a known issue that will be patched in a couple of weeks (in EMR 4.1), so don't mind it; I'll wait until it's patched.
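For reference, the repro boils down to the following sketch (the bucket, prefix, and file names are placeholders for our real S3 layout):

// spark-shell on EMR (Spark 1.x with a Hive 0.13 metastore); paths are placeholders.
val path = "s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/" +
  "75e91844-2a87-4d8f-af9f-9268e34daef6-000000"

// Loading the ORC file succeeds and the schema is resolved correctly...
val ORCFile = sqlContext.read.format("orc").load(path)

// ...but the first action that forces ORC split generation dies with
// java.lang.ArrayIndexOutOfBoundsException: 3 in OrcInputFormat$SplitGenerator.
ORCFile.head

The full spark-shell session, with logs, follows: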
scala> val ORCFile = sqlContext.read.format("orc").load("s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000")
2015-09-13 07:33:29,228 INFO [main] fs.EmrFileSystem (EmrFileSystem.java:initialize(107)) - Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2015-09-13 07:33:29,314 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[CF49E1372BEF2E81], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[85.608], HttpRequestTime=[85.101], HttpClientReceiveResponseTime=[13.891], RequestSigningTime=[0.259], ResponseProcessingTime=[0.007], HttpClientSendRequestTime=[0.305],
2015-09-13 07:33:29,351 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[55B8C5E6009F0246], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[32.776], HttpRequestTime=[13.17], HttpClientReceiveResponseTime=[10.961], RequestSigningTime=[0.28], ResponseProcessingTime=[19.042], HttpClientSendRequestTime=[0.295],
2015-09-13 07:33:29,421 INFO [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(1159)) - Opening 's3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000' for reading
2015-09-13 07:33:29,477 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], ServiceName=[Amazon S3], AWSRequestID=[F698A6A43297754E], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[53.698], HttpRequestTime=[50.815], HttpClientReceiveResponseTime=[48.774], RequestSigningTime=[0.372], ResponseProcessingTime=[0.861], HttpClientSendRequestTime=[0.362],
2015-09-13 07:33:29,478 INFO [main] metrics.MetricsSaver (MetricsSaver.java:<init>(915)) - Thread 1 created MetricsLockFreeSaver 1
2015-09-13 07:33:29,479 INFO [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:retrievePair(292)) - Stream for key 'S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000' seeking to position '217260502'
2015-09-13 07:33:29,590 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], ServiceName=[Amazon S3], AWSRequestID=[AD631A8AE229AFE7], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[109.859], HttpRequestTime=[109.204], HttpClientReceiveResponseTime=[58.468], RequestSigningTime=[0.286], ResponseProcessingTime=[0.133], HttpClientSendRequestTime=[0.327],
2015-09-13 07:33:29,753 INFO [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:listStatus(896)) - listStatus s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000 with recursive false
2015-09-13 07:33:29,877 INFO [main] hive.HiveContext (Logging.scala:logInfo(59)) - Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
2015-09-13 07:33:30,593 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-09-13 07:33:30,622 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(493)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2015-09-13 07:33:30,641 INFO [main] metastore.ObjectStore (ObjectStore.java:initialize(246)) - ObjectStore, initialize called
2015-09-13 07:33:30,782 INFO [main] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property datanucleus.cache.level2 unknown - will be ignored
2015-09-13 07:33:30,782 INFO [main] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2015-09-13 07:33:31,208 INFO [main] metastore.ObjectStore (ObjectStore.java:getPMF(315)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2015-09-13 07:33:32,375 INFO [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-09-13 07:33:32,376 INFO [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-09-13 07:33:32,470 INFO [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-09-13 07:33:32,470 INFO [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-09-13 07:33:32,558 INFO [main] DataNucleus.Query (Log4JLogger.java:info(77)) - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
2015-09-13 07:33:32,561 INFO [main] metastore.ObjectStore (ObjectStore.java:setConf(229)) - Initialized ObjectStore
2015-09-13 07:33:32,816 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(551)) - Added admin role in metastore
2015-09-13 07:33:32,819 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added public role in metastore
2015-09-13 07:33:32,888 INFO [main] metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(588)) - No user is added in admin role, since config is empty
2015-09-13 07:33:33,343 INFO [main] session.SessionState (SessionState.java:start(360)) - No Tez session required at this point. hive.execution.engine=mr.
ORCFile: org.apache.spark.sql.DataFrame = [h_header1: string, h_header2: string, h_header3: string, h_header4: string, h_header5: string, h_header6: string, h_header7: string, h_header8: string, h_header9: string, body: map<string,string>, yymmdd: int, country: string]

scala> ORCFile.head
2015-09-13 07:33:41,080 INFO [main] sources.DataSourceStrategy (Logging.scala:logInfo(59)) - Selected 1 partitions out of 1, pruned 0.0% partitions.
2015-09-13 07:33:41,169 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(243112) called with curMem=0, maxMem=280248975
2015-09-13 07:33:41,171 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_0 stored as values in memory (estimated size 237.4 KB, free 267.0 MB)
2015-09-13 07:33:41,214 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(22100) called with curMem=243112, maxMem=280248975
2015-09-13 07:33:41,215 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_0_piece0 stored as bytes in memory (estimated size 21.6 KB, free 267.0 MB)
2015-09-13 07:33:41,216 INFO [sparkDriver-akka.actor.default-dispatcher-3] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_0_piece0 in memory on 10.0.0.112:48218 (size: 21.6 KB, free: 267.2 MB)
2015-09-13 07:33:41,221 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 0 from head at <console>:22
2015-09-13 07:33:41,396 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(244448) called with curMem=265212, maxMem=280248975
2015-09-13 07:33:41,396 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_1 stored as values in memory (estimated size 238.7 KB, free 266.8 MB)
2015-09-13 07:33:41,422 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(22567) called with curMem=509660, maxMem=280248975
2015-09-13 07:33:41,422 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_1_piece0 stored as bytes in memory (estimated size 22.0 KB, free 266.8 MB)
2015-09-13 07:33:41,423 INFO [sparkDriver-akka.actor.default-dispatcher-3] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_1_piece0 in memory on 10.0.0.112:48218 (size: 22.0 KB, free: 267.2 MB)
2015-09-13 07:33:41,426 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 1 from head at <console>:22
2015-09-13 07:33:41,495 INFO [main] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2015-09-13 07:33:41,497 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2015-09-13 07:33:41,501 INFO [ORC_GET_SPLITS #0] s3n.S3NativeFileSystem (S3NativeFileSystem.java:listStatus(896)) - listStatus s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000 with recursive false
2015-09-13 07:33:41,504 INFO [ORC_GET_SPLITS #1] s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(1159)) - Opening 's3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000' for reading
2015-09-13 07:33:41,593 INFO [ORC_GET_SPLITS #1] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], ServiceName=[Amazon S3], AWSRequestID=[8DFE404E45BFD9CD], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[88.129], HttpRequestTime=[86.932], HttpClientReceiveResponseTime=[42.613], RequestSigningTime=[0.539], ResponseProcessingTime=[0.142], HttpClientSendRequestTime=[0.337],
2015-09-13 07:33:41,594 INFO [ORC_GET_SPLITS #1] s3n.S3NativeFileSystem (S3NativeFileSystem.java:retrievePair(292)) - Stream for key 'S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000' seeking to position '217260502'
2015-09-13 07:33:41,674 INFO [ORC_GET_SPLITS #1] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], ServiceName=[Amazon S3], AWSRequestID=[040D77B7E7E76AA5], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[79.608], HttpRequestTime=[79.064], HttpClientReceiveResponseTime=[36.843], RequestSigningTime=[0.222], ResponseProcessingTime=[0.11], HttpClientSendRequestTime=[0.343],
2015-09-13 07:33:41,681 ERROR [ORC_GET_SPLITS #1] orc.OrcInputFormat (OrcInputFormat.java:run(826)) - Unexpected Exception
java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.createSplit(OrcInputFormat.java:694)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:822)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:466)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:919)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:944)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.getPartitions(HadoopRDD.scala:375)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
    at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:121)
    at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:125)
    at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1269)
    at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1203)
    at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1210)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
    at $iwC$$iwC$$iwC.<init>(<console>:35)
    at $iwC$$iwC.<init>(<console>:37)
    at $iwC.<init>(<console>:39)
    at <init>(<console>:41)
    at .<init>(<console>:45)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.createSplit(OrcInputFormat.java:694)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:822)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
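In case it helps anyone hitting the same thing before the patch lands: a quick way to check whether the failure is specific to the EMRFS/S3NativeFileSystem read path, rather than the ORC split generator itself, is to copy the file to HDFS and read it from there. This is only a sketch of an isolation test, not a confirmed workaround; all paths below are placeholders:

// From a shell on the master node, copy the ORC object from S3 into HDFS:
//   hadoop fs -mkdir -p hdfs:///tmp/orc-test/
//   hadoop fs -cp s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000 hdfs:///tmp/orc-test/

// Then, in the spark-shell, read the same file through HDFS instead of s3n:
val localORC = sqlContext.read.format("orc")
  .load("hdfs:///tmp/orc-test/75e91844-2a87-4d8f-af9f-9268e34daef6-000000")

// If head succeeds here, the problem is in the S3 read path; if it still
// throws the same ArrayIndexOutOfBoundsException, the ORC split code is at fault.
localORC.head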