The stacktrace is below.
But someone told me that it's a known issue and will be patched in a couple of
weeks (EMR 4.1), so don't mind it too much. I'll wait until it's patched.
scala> val ORCFile = sqlContext.read.format("orc").load("s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/75e91
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.createSplit(OrcInputFormat.java:694)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:822)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
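Since the same files read fine over HDFS and S3NativeFileSystem, one way to narrow
this down is to check what file lengths the filesystem layer reports for the
affected partition (the split generator in the stacktrace works from those
lengths). This is only an illustrative sketch, not something from this thread,
and the bucket/path are placeholders:

scala> // List the files and the lengths the filesystem reports for them
scala> val fs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("s3n://S3bucketName/"), sc.hadoopConfiguration)
scala> fs.listStatus(new org.apache.hadoop.fs.Path("s3n://S3bucketName/S3serviceCode/yymmdd=20150801/country=eu/")).foreach(st => println(st.getPath + " " + st.getLen))

If the lengths reported through EMRFS differ from the actual object sizes on S3,
that would point at the AWS side, as suspected below.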
Good day junHyeok
Did you set HADOOP_CONF_DIR? It seems that Spark cannot find the AWS key
properties.
If it still doesn't work after setting that, how about exporting
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before running the pyspark shell?
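If you'd rather set the keys from inside the shell instead of via environment
variables, something along these lines should also work (a sketch from the Scala
shell using the standard Hadoop s3n credential properties; the values are
placeholders):

scala> // Hand the s3n credentials to Hadoop directly (placeholder values)
scala> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY_ID")
scala> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY")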
BR
--
Good Day!
I think there is a problem between ORC and AWS EMRFS.
When I was trying to read ORC files larger than about 150 MB from S3, an
ArrayIndexOutOfBounds exception occurred.
I'm fairly sure it's an issue on the AWS side, because there was no exception
when reading the same files from HDFS or S3NativeFileSystem.
Parquet runs fine, but that's inconvenient (almost all of our system is based
on ORC).
Does anybody know about this issue?
I've tried Spark 1.4.1 (EMR 4.0.0), and there is no 1.5 patch note about this.
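In the meantime, since HDFS reads work and Parquet reads work, one stop-gap is
to read the ORC data from HDFS and keep a Parquet copy on S3. This is just a
rough sketch of that idea; the paths are made up for illustration:

scala> // Read the partition from HDFS, where the ORC reader works fine
scala> val df = sqlContext.read.format("orc").load("hdfs:///staging/yymmdd=20150801/country=eu/")
scala> // Write a Parquet copy back to S3, since Parquet reads succeed there
scala> df.write.parquet("s3n://S3bucketName/parquet-copy/yymmdd=20150801/country=eu/")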
Thank You
--
ca...@korea.com
cazen@samsung.com
http://www.Cazen.co.kr