The stacktrace is below.
But someone told me that it's a known issue and will be patched in a couple of
weeks (EMR 4.1), so don't mind it too much. I'll wait until it's patched.
scala> val ORCFile = sqlContext.read.format("orc").load("s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.createSplit(OrcInputFormat.java:694)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:822)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
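Since the same files read fine over HDFS and S3NativeFileSystem, one way to narrow
this down is to check what file lengths the filesystem layer reports for the
affected partition (the split generator in the stacktrace works from those
lengths). This is only an illustrative sketch, not something from this thread,
and the bucket/path are placeholders:

scala> // List the files and the lengths the filesystem reports for them
scala> val fs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("s3n://S3bucketName/"), sc.hadoopConfiguration)
scala> fs.listStatus(new org.apache.hadoop.fs.Path("s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/")).foreach(st => println(st.getPath + " " + st.getLen))

If the lengths reported through EMRFS differ from the actual object sizes on S3,
that would point at the AWS side, as suspected below.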
Good day junHyeok
Did you set HADOOP_CONF_DIR? It seems that Spark cannot find the AWS key
properties.
If it still doesn't work after setting that, how about exporting
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before running the pyspark shell?
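If you'd rather set the keys from inside the shell instead of via environment
variables, something along these lines should also work (a sketch from the Scala
shell using the standard Hadoop s3n credential properties; the values are
placeholders):

scala> // Hand the s3n credentials to Hadoop directly (placeholder values)
scala> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY_ID")
scala> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY")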
BR
--
Good Day!
I think there is a problem between ORC and AWS EMRFS.
When I was trying to read ORC files larger than about 150 MB from S3, an
ArrayIndexOutOfBounds exception occurred.
I'm fairly sure it's an issue on the AWS side, because there was no exception
when reading the same files from HDFS or S3NativeFileSystem.
Parquet runs fine, but that's inconvenient (almost all of our system is based
on ORC).
Does anybody know about this issue?
I've tried Spark 1.4.1 (EMR 4.0.0), and there is no 1.5 patch note about this.
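In the meantime, since HDFS reads work and Parquet reads work, one stop-gap is
to read the ORC data from HDFS and keep a Parquet copy on S3. This is just a
rough sketch of that idea; the paths are made up for illustration:

scala> // Read the partition from HDFS, where the ORC reader works fine
scala> val df = sqlContext.read.format("orc").load("hdfs:///staging/yymmdd=20150801/country=eu/")
scala> // Write a Parquet copy back to S3, since Parquet reads succeed there
scala> df.write.parquet("s3n://S3bucketName/parquet-copy/yymmdd=20150801/country=eu/")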
Thank You
--
ca...@korea.com
cazen@samsung.com
http://www.Cazen.co.kr