Hi Owen, thank you for your reply.
I have heard some people say that ORC stands for "Owen's RC file", haha ;)
After I posted, some people told me that this is already a known issue with
AWS EMR 4.0.0.
They said it might be a compatibility issue between Hive 0.13.1 and Spark 1.4.1.
AWS will apparently launch EMR 4.1.0 in a couple of weeks, with Spark 1.5 and a
newer version of Hive.
I hope it works properly on 4.1.0.
The error log is below, but please don't worry about it.
Thank you very much.
scala> val ORCFile =
sqlContext.read.format("orc").load("s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000")
2015-09-13 07:33:29,228 INFO [main] fs.EmrFileSystem
(EmrFileSystem.java:initialize(107)) - Consistency disabled, using
com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2015-09-13 07:33:29,314 INFO [main] amazonaws.latency
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200],
ServiceName=[Amazon S3], AWSRequestID=[CF49E1372BEF2E81],
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0,
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0,
ClientExecuteTime=[85.608], HttpRequestTime=[85.101],
HttpClientReceiveResponseTime=[13.891], RequestSigningTime=[0.259],
ResponseProcessingTime=[0.007], HttpClientSendRequestTime=[0.305],
2015-09-13 07:33:29,351 INFO [main] amazonaws.latency
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200],
ServiceName=[Amazon S3], AWSRequestID=[55B8C5E6009F0246],
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0,
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1,
ClientExecuteTime=[32.776], HttpRequestTime=[13.17],
HttpClientReceiveResponseTime=[10.961], RequestSigningTime=[0.28],
ResponseProcessingTime=[19.042], HttpClientSendRequestTime=[0.295],
2015-09-13 07:33:29,421 INFO [main] s3n.S3NativeFileSystem
(S3NativeFileSystem.java:open(1159)) - Opening
's3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000'
for reading
2015-09-13 07:33:29,477 INFO [main] amazonaws.latency
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206],
ServiceName=[Amazon S3], AWSRequestID=[F698A6A43297754E],
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0,
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1,
ClientExecuteTime=[53.698], HttpRequestTime=[50.815],
HttpClientReceiveResponseTime=[48.774], RequestSigningTime=[0.372],
ResponseProcessingTime=[0.861], HttpClientSendRequestTime=[0.362],
2015-09-13 07:33:29,478 INFO [main] metrics.MetricsSaver
(MetricsSaver.java:<init>(915)) - Thread 1 created MetricsLockFreeSaver 1
2015-09-13 07:33:29,479 INFO [main] s3n.S3NativeFileSystem
(S3NativeFileSystem.java:retrievePair(292)) - Stream for key
'S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000'
seeking to position '217260502'
2015-09-13 07:33:29,590 INFO [main] amazonaws.latency
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206],
ServiceName=[Amazon S3], AWSRequestID=[AD631A8AE229AFE7],
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0,
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0,
ClientExecuteTime=[109.859], HttpRequestTime=[109.204],
HttpClientReceiveResponseTime=[58.468], RequestSigningTime=[0.286],
ResponseProcessingTime=[0.133], HttpClientSendRequestTime=[0.327],
2015-09-13 07:33:29,753 INFO [main] s3n.S3NativeFileSystem
(S3NativeFileSystem.java:listStatus(896)) - listStatus
s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000
with recursive false
2015-09-13 07:33:29,877 INFO [main] hive.HiveContext
(Logging.scala:logInfo(59)) - Initializing HiveMetastoreConnection version
0.13.1 using Spark classes.
2015-09-13 07:33:30,593 WARN [main] util.NativeCodeLoader
(NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for
your platform... using builtin-java classes where applicable
2015-09-13 07:33:30,622 INFO [main] metastore.HiveMetaStore
(HiveMetaStore.java:newRawStore(493)) - 0: Opening raw store with implemenation
class:org.apache.hadoop.hive.metastore.ObjectStore
2015-09-13 07:33:30,641 INFO [main] metastore.ObjectStore
(ObjectStore.java:initialize(246)) - ObjectStore, initialize called
2015-09-13 07:33:30,782 INFO [main] DataNucleus.Persistence
(Log4JLogger.java:info(77)) - Property datanucleus.cache.level2 unknown - will
be ignored
2015-09-13 07:33:30,782 INFO [main] DataNucleus.Persistence
(Log4JLogger.java:info(77)) - Property hive.metastore.integral.jdo.pushdown
unknown - will be ignored
2015-09-13 07:33:31,208 INFO [main] metastore.ObjectStore
(ObjectStore.java:getPMF(315)) - Setting MetaStore object pin classes with
hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2015-09-13 07:33:32,375 INFO [main] DataNucleus.Datastore
(Log4JLogger.java:info(77)) - The class
"org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
"embedded-only" so does not have its own datastore table.
2015-09-13 07:33:32,376 INFO [main] DataNucleus.Datastore
(Log4JLogger.java:info(77)) - The class
"org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so
does not have its own datastore table.
2015-09-13 07:33:32,470 INFO [main] DataNucleus.Datastore
(Log4JLogger.java:info(77)) - The class
"org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
"embedded-only" so does not have its own datastore table.
2015-09-13 07:33:32,470 INFO [main] DataNucleus.Datastore
(Log4JLogger.java:info(77)) - The class
"org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so
does not have its own datastore table.
2015-09-13 07:33:32,558 INFO [main] DataNucleus.Query
(Log4JLogger.java:info(77)) - Reading in results for query
"org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is
closing
2015-09-13 07:33:32,561 INFO [main] metastore.ObjectStore
(ObjectStore.java:setConf(229)) - Initialized ObjectStore
2015-09-13 07:33:32,816 INFO [main] metastore.HiveMetaStore
(HiveMetaStore.java:createDefaultRoles(551)) - Added admin role in metastore
2015-09-13 07:33:32,819 INFO [main] metastore.HiveMetaStore
(HiveMetaStore.java:createDefaultRoles(560)) - Added public role in metastore
2015-09-13 07:33:32,888 INFO [main] metastore.HiveMetaStore
(HiveMetaStore.java:addAdminUsers(588)) - No user is added in admin role, since
config is empty
2015-09-13 07:33:33,343 INFO [main] session.SessionState
(SessionState.java:start(360)) - No Tez session required at this point.
hive.execution.engine=mr.
ORCFile: org.apache.spark.sql.DataFrame = [h_header1: string, h_header2:
string, h_header3: string, h_header4: string, h_header5: string, h_header6:
string, h_header7: string, h_header8: string, h_header9: string, body:
map<string,string>, yymmdd: int, country: string]
scala> ORCFile.head
2015-09-13 07:33:41,080 INFO [main] sources.DataSourceStrategy
(Logging.scala:logInfo(59)) - Selected 1 partitions out of 1, pruned 0.0%
partitions.
2015-09-13 07:33:41,169 INFO [main] storage.MemoryStore
(Logging.scala:logInfo(59)) - ensureFreeSpace(243112) called with curMem=0,
maxMem=280248975
2015-09-13 07:33:41,171 INFO [main] storage.MemoryStore
(Logging.scala:logInfo(59)) - Block broadcast_0 stored as values in memory
(estimated size 237.4 KB, free 267.0 MB)
2015-09-13 07:33:41,214 INFO [main] storage.MemoryStore
(Logging.scala:logInfo(59)) - ensureFreeSpace(22100) called with curMem=243112,
maxMem=280248975
2015-09-13 07:33:41,215 INFO [main] storage.MemoryStore
(Logging.scala:logInfo(59)) - Block broadcast_0_piece0 stored as bytes in
memory (estimated size 21.6 KB, free 267.0 MB)
2015-09-13 07:33:41,216 INFO [sparkDriver-akka.actor.default-dispatcher-3]
storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_0_piece0
in memory on 10.0.0.112:48218 (size: 21.6 KB, free: 267.2 MB)
2015-09-13 07:33:41,221 INFO [main] spark.SparkContext
(Logging.scala:logInfo(59)) - Created broadcast 0 from head at <console>:22
2015-09-13 07:33:41,396 INFO [main] storage.MemoryStore
(Logging.scala:logInfo(59)) - ensureFreeSpace(244448) called with
curMem=265212, maxMem=280248975
2015-09-13 07:33:41,396 INFO [main] storage.MemoryStore
(Logging.scala:logInfo(59)) - Block broadcast_1 stored as values in memory
(estimated size 238.7 KB, free 266.8 MB)
2015-09-13 07:33:41,422 INFO [main] storage.MemoryStore
(Logging.scala:logInfo(59)) - ensureFreeSpace(22567) called with curMem=509660,
maxMem=280248975
2015-09-13 07:33:41,422 INFO [main] storage.MemoryStore
(Logging.scala:logInfo(59)) - Block broadcast_1_piece0 stored as bytes in
memory (estimated size 22.0 KB, free 266.8 MB)
2015-09-13 07:33:41,423 INFO [sparkDriver-akka.actor.default-dispatcher-3]
storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_1_piece0
in memory on 10.0.0.112:48218 (size: 22.0 KB, free: 267.2 MB)
2015-09-13 07:33:41,426 INFO [main] spark.SparkContext
(Logging.scala:logInfo(59)) - Created broadcast 1 from head at <console>:22
2015-09-13 07:33:41,495 INFO [main] log.PerfLogger
(PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=OrcGetSplits
from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2015-09-13 07:33:41,497 INFO [main] Configuration.deprecation
(Configuration.java:warnOnceIfDeprecated(1049)) - mapred.input.dir is
deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2015-09-13 07:33:41,501 INFO [ORC_GET_SPLITS #0] s3n.S3NativeFileSystem
(S3NativeFileSystem.java:listStatus(896)) - listStatus
s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000
with recursive false
2015-09-13 07:33:41,504 INFO [ORC_GET_SPLITS #1] s3n.S3NativeFileSystem
(S3NativeFileSystem.java:open(1159)) - Opening
's3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000'
for reading
2015-09-13 07:33:41,593 INFO [ORC_GET_SPLITS #1] amazonaws.latency
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206],
ServiceName=[Amazon S3], AWSRequestID=[8DFE404E45BFD9CD],
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0,
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0,
ClientExecuteTime=[88.129], HttpRequestTime=[86.932],
HttpClientReceiveResponseTime=[42.613], RequestSigningTime=[0.539],
ResponseProcessingTime=[0.142], HttpClientSendRequestTime=[0.337],
2015-09-13 07:33:41,594 INFO [ORC_GET_SPLITS #1] s3n.S3NativeFileSystem
(S3NativeFileSystem.java:retrievePair(292)) - Stream for key
'S3serviceCode/yymmdd=20150801/country=eu/75e91844-2a87-4d8f-af9f-9268e34daef6-000000'
seeking to position '217260502'
2015-09-13 07:33:41,674 INFO [ORC_GET_SPLITS #1] amazonaws.latency
(AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206],
ServiceName=[Amazon S3], AWSRequestID=[040D77B7E7E76AA5],
ServiceEndpoint=[https://S3bucketName.s3.amazonaws.com], HttpClientPoolLeasedCount=0,
RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0,
ClientExecuteTime=[79.608], HttpRequestTime=[79.064],
HttpClientReceiveResponseTime=[36.843], RequestSigningTime=[0.222],
ResponseProcessingTime=[0.11], HttpClientSendRequestTime=[0.343],
2015-09-13 07:33:41,681 ERROR [ORC_GET_SPLITS #1] orc.OrcInputFormat
(OrcInputFormat.java:run(826)) - Unexpected Exception
java.lang.ArrayIndexOutOfBoundsException: 3
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.createSplit(OrcInputFormat.java:694)
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:822)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException: serious problem
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:466)
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:919)
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:944)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at
org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.getPartitions(HadoopRDD.scala:375)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at
org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:121)
at
org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:125)
at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1269)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1203)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1210)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC.<init>(<console>:37)
at $iwC.<init>(<console>:39)
at <init>(<console>:41)
at .<init>(<console>:45)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.createSplit(OrcInputFormat.java:694)
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:822)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
--
[email protected]
[email protected]
http://www.Cazen.co.kr
> On Sep 13, 2015, at 3:00 PM, Owen O'Malley <[email protected]> wrote:
>
> Do you have a stack trace of the array out of bounds exception? I don't
> remember an array out of bounds problem off the top of my head. A stack trace
> will tell me a lot, obviously.
>
> If you are using Spark 1.4 that implies Hive 0.13, which is pretty old. It
> may be a problem that we fixed a while ago.
>
> Thanks,
> Owen
>
>
>
> On Sat, Sep 12, 2015 at 8:15 AM, Cazen Lee <[email protected]> wrote:
> Good Day!
>
> I think there are some problems between ORC and AWS EMRFS.
>
> When I tried to read "upper 150M" ORC files from S3, an
> ArrayIndexOutOfBoundsException occurred.
>
> I'm sure it is an AWS-side issue, because there was no exception when reading
> from HDFS or S3NativeFileSystem.
>
> Parquet works normally, but switching to it is inconvenient (almost all of
> our system is based on ORC).
>
> Does anybody know about this issue?
>
> I've tried Spark 1.4.1 (EMR 4.0.0), and the 1.5 release notes do not mention it.
>
> Thank You