The stack trace is below.

But someone told me that it's a known issue and will be patched in a
couple of weeks (EMR 4.1).

So, don't worry about it. I'll wait until it's patched.

scala> val ORCFile = sqlContext.read.format("orc").load("s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000")

2015-09-13 07:33:29,228 INFO  [main] fs.EmrFileSystem (EmrFileSystem.java:initialize(107)) - Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2015-09-13 07:33:29,314 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[CF49E1372BEF2E81], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[85.608], HttpRequestTime=[85.101], HttpClientReceiveResponseTime=[13.891], RequestSigningTime=[0.259], ResponseProcessingTime=[0.007], HttpClientSendRequestTime=[0.305],
2015-09-13 07:33:29,351 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[55B8C5E6009F0246], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[32.776], HttpRequestTime=[13.17], HttpClientReceiveResponseTime=[10.961], RequestSigningTime=[0.28], ResponseProcessingTime=[19.042], HttpClientSendRequestTime=[0.295],
2015-09-13 07:33:29,421 INFO  [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(1159)) - Opening 's3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000' for reading
2015-09-13 07:33:29,477 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], ServiceName=[Amazon S3], AWSRequestID=[F698A6A43297754E], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[53.698], HttpRequestTime=[50.815], HttpClientReceiveResponseTime=[48.774], RequestSigningTime=[0.372], ResponseProcessingTime=[0.861], HttpClientSendRequestTime=[0.362],
2015-09-13 07:33:29,478 INFO  [main] metrics.MetricsSaver (MetricsSaver.java:<init>(915)) - Thread 1 created MetricsLockFreeSaver 1
2015-09-13 07:33:29,479 INFO  [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:retrievePair(292)) - Stream for key 'S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000' seeking to position '217260502'
2015-09-13 07:33:29,590 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], ServiceName=[Amazon S3], AWSRequestID=[AD631A8AE229AFE7], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[109.859], HttpRequestTime=[109.204], HttpClientReceiveResponseTime=[58.468], RequestSigningTime=[0.286], ResponseProcessingTime=[0.133], HttpClientSendRequestTime=[0.327],
2015-09-13 07:33:29,753 INFO  [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:listStatus(896)) - listStatus s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000 with recursive false
2015-09-13 07:33:29,877 INFO  [main] hive.HiveContext (Logging.scala:logInfo(59)) - Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
2015-09-13 07:33:30,593 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-09-13 07:33:30,622 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(493)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2015-09-13 07:33:30,641 INFO  [main] metastore.ObjectStore (ObjectStore.java:initialize(246)) - ObjectStore, initialize called
2015-09-13 07:33:30,782 INFO  [main] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property datanucleus.cache.level2 unknown - will be ignored
2015-09-13 07:33:30,782 INFO  [main] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2015-09-13 07:33:31,208 INFO  [main] metastore.ObjectStore (ObjectStore.java:getPMF(315)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2015-09-13 07:33:32,375 INFO  [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-09-13 07:33:32,376 INFO  [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-09-13 07:33:32,470 INFO  [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-09-13 07:33:32,470 INFO  [main] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-09-13 07:33:32,558 INFO  [main] DataNucleus.Query (Log4JLogger.java:info(77)) - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
2015-09-13 07:33:32,561 INFO  [main] metastore.ObjectStore (ObjectStore.java:setConf(229)) - Initialized ObjectStore
2015-09-13 07:33:32,816 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(551)) - Added admin role in metastore
2015-09-13 07:33:32,819 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added public role in metastore
2015-09-13 07:33:32,888 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(588)) - No user is added in admin role, since config is empty
2015-09-13 07:33:33,343 INFO  [main] session.SessionState (SessionState.java:start(360)) - No Tez session required at this point. hive.execution.engine=mr.
ORCFile: org.apache.spark.sql.DataFrame = [h_header1: string, h_header2: string, h_header3: string, h_header4: string, h_header5: string, h_header6: string, h_header7: string, h_header8: string, h_header9: string, body: map<string,string>, yymmdd: int, country: string]
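
Note that the load itself only reads the ORC footer for schema inference (hence the single seek to position '217260502', near the end of the file, in the log above), so it succeeds; the crash only shows up once an action forces OrcInputFormat to compute splits on the ORC_GET_SPLITS threads. A minimal sketch of that split in the same spark-shell session:

    // Schema inference alone succeeds: only the ORC footer is read.
    ORCFile.printSchema()
    // Any action forces OrcInputFormat.getSplits, which is where the
    // ArrayIndexOutOfBoundsException below is thrown.
    ORCFile.head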

scala> ORCFile.head

2015-09-13 07:33:41,080 INFO  [main] sources.DataSourceStrategy (Logging.scala:logInfo(59)) - Selected 1 partitions out of 1, pruned 0.0% partitions.
2015-09-13 07:33:41,169 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(243112) called with curMem=0, maxMem=280248975
2015-09-13 07:33:41,171 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_0 stored as values in memory (estimated size 237.4 KB, free 267.0 MB)
2015-09-13 07:33:41,214 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(22100) called with curMem=243112, maxMem=280248975
2015-09-13 07:33:41,215 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_0_piece0 stored as bytes in memory (estimated size 21.6 KB, free 267.0 MB)
2015-09-13 07:33:41,216 INFO  [sparkDriver-akka.actor.default-dispatcher-3] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_0_piece0 in memory on 10.0.0.112:48218 (size: 21.6 KB, free: 267.2 MB)
2015-09-13 07:33:41,221 INFO  [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 0 from head at <console>:22
2015-09-13 07:33:41,396 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(244448) called with curMem=265212, maxMem=280248975
2015-09-13 07:33:41,396 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_1 stored as values in memory (estimated size 238.7 KB, free 266.8 MB)
2015-09-13 07:33:41,422 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(22567) called with curMem=509660, maxMem=280248975
2015-09-13 07:33:41,422 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_1_piece0 stored as bytes in memory (estimated size 22.0 KB, free 266.8 MB)
2015-09-13 07:33:41,423 INFO  [sparkDriver-akka.actor.default-dispatcher-3] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_1_piece0 in memory on 10.0.0.112:48218 (size: 22.0 KB, free: 267.2 MB)
2015-09-13 07:33:41,426 INFO  [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 1 from head at <console>:22
2015-09-13 07:33:41,495 INFO  [main] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2015-09-13 07:33:41,497 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2015-09-13 07:33:41,501 INFO  [ORC_GET_SPLITS #0] s3n.S3NativeFileSystem (S3NativeFileSystem.java:listStatus(896)) - listStatus s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000 with recursive false
2015-09-13 07:33:41,504 INFO  [ORC_GET_SPLITS #1] s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(1159)) - Opening 's3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000' for reading
2015-09-13 07:33:41,593 INFO  [ORC_GET_SPLITS #1] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], ServiceName=[Amazon S3], AWSRequestID=[8DFE404E45BFD9CD], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[88.129], HttpRequestTime=[86.932], HttpClientReceiveResponseTime=[42.613], RequestSigningTime=[0.539], ResponseProcessingTime=[0.142], HttpClientSendRequestTime=[0.337],
2015-09-13 07:33:41,594 INFO  [ORC_GET_SPLITS #1] s3n.S3NativeFileSystem (S3NativeFileSystem.java:retrievePair(292)) - Stream for key 'S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000' seeking to position '217260502'
2015-09-13 07:33:41,674 INFO  [ORC_GET_SPLITS #1] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], ServiceName=[Amazon S3], AWSRequestID=[040D77B7E7E76AA5], ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[79.608], HttpRequestTime=[79.064], HttpClientReceiveResponseTime=[36.843], RequestSigningTime=[0.222], ResponseProcessingTime=[0.11], HttpClientSendRequestTime=[0.343],
2015-09-13 07:33:41,681 ERROR [ORC_GET_SPLITS #1] orc.OrcInputFormat (OrcInputFormat.java:run(826)) - Unexpected Exception
java.lang.ArrayIndexOutOfBoundsException: 3
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.createSplit(OrcInputFormat.java:694)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:822)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException: serious problem
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:466)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:919)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:944)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.getPartitions(HadoopRDD.scala:375)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
        at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:121)
        at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:125)
        at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1269)
        at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1203)
        at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1210)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
        at $iwC$$iwC$$iwC.<init>(<console>:35)
        at $iwC$$iwC.<init>(<console>:37)
        at $iwC.<init>(<console>:39)
        at <init>(<console>:41)
        at .<init>(<console>:45)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.createSplit(OrcInputFormat.java:694)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:822)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
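
In case anyone else hits this before the EMR 4.1 fix lands: since the exception comes from OrcInputFormat$SplitGenerator.createSplit while reading through EMRFS/S3NativeFileSystem, one workaround I have not verified (the destination path is arbitrary, and it is only an assumption that HDFS block metadata sidesteps the split bug) is to copy the object to HDFS and read it from there:

    // Untested workaround sketch: copy the ORC file to HDFS so that
    // OrcInputFormat.getSplits runs against HDFS block locations instead
    // of the ones reported by EMRFS/S3NativeFileSystem.
    import org.apache.hadoop.fs.{FileUtil, Path}

    val conf = sc.hadoopConfiguration
    val src  = new Path("s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000")
    val dst  = new Path("hdfs:///tmp/orc-workaround/part-000000")  // arbitrary

    // FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, conf)
    FileUtil.copy(src.getFileSystem(conf), src, dst.getFileSystem(conf), dst, false, conf)

    val fromHdfs = sqlContext.read.format("orc").load(dst.toString)
    fromHdfs.head  // exercises the same split path, minus EMRFS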