Re: HIVE ORC table returns NULLs ( EMR 5.9 Hive 2.3.0 )

2017-10-25 Thread Oleg Ruchovets
Thanks, Owen. I tried running from HDFS (not from S3); the problem is the same. Could you please share your hive-site.xml? Which environment variables and parameters should I check? I would gladly use structor, but I need to use EMR for this project. Thanks, Oleg On Thu, Oct 26, 2017 at 12:22 AM, Owen O

Re: HIVE ORC table returns NULLs ( EMR 5.9 Hive 2.3.0 )

2017-10-25 Thread Owen O'Malley
I'm not sure. Using a virtual environment with Hortonworks' version (2.6.1) and HDFS instead of S3, it works: hive> CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC > LOCATION 'hdfs://nn.example.com/user/vagrant/country/'; > OK > Time taken: 4.073 seconds > hive> Select * from Table

Re: HIVE ORC table returns NULLs ( EMR 5.9 Hive 2.3.0 )

2017-10-25 Thread Oleg Ruchovets
Yes, that is exactly my point. Since the file has the data (the ORC file is valid), why does Hive return NULLs? I tested it with S3, HDFS, hive, and beeline. The behavior is the same: select count(*) returns 10, select * returns NULLs ... What is the way to debug this problem? Any configuration, logging. I

Re: HIVE ORC table returns NULLs ( EMR 5.9 Hive 2.3.0 )

2017-10-24 Thread Owen O'Malley
The file has the data. I'm not sure what Hive is doing wrong. owen@laptop> java -jar ../tools/target/orc-tools-1.5.0-SNAPSHOT-uber.jar > data ~/Downloads/Country.orc > Processing data file /Users/owen/Downloads/Country.orc [length: 392] > {"Id":1,"Name":"Singapore"} > {"Id":2,"Name":"Malaysia"} >
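
One thing that can be checked from the orc-tools output above is whether the field names stored in the file match the table's column names exactly. Hive stores column names in lowercase, while the file uses "Id" and "Name". Whether this difference is the cause of the NULLs is only a hypothesis, not something the thread confirms. The helper below is a minimal sketch; `field_name_mismatches` is a hypothetical name, not part of Hive or orc-tools.

```python
import json


def field_name_mismatches(json_rows, table_columns):
    """Compare field names seen in JSON rows (as printed by orc-tools)
    with the column names of a table, and classify each column.

    Hypothetical helper for illustration only.
    """
    # Collect the distinct field names found in the file's rows.
    file_fields = []
    for line in json_rows:
        for key in json.loads(line):
            if key not in file_fields:
                file_fields.append(key)

    lowered = [f.lower() for f in file_fields]
    report = {}
    for col in table_columns:
        if col in file_fields:
            report[col] = "exact match"
        elif col.lower() in lowered:
            report[col] = "case differs"
        else:
            report[col] = "missing in file"
    return report


# Rows copied from the orc-tools output in the message above.
rows = ['{"Id":1,"Name":"Singapore"}', '{"Id":2,"Name":"Malaysia"}']

# Assumption: the table columns as Hive records them (lowercased).
print(field_name_mismatches(rows, ["id", "name"]))
```

If every column is reported as "case differs", it may be worth testing a table whose underlying file was written with lowercase field names before looking further into configuration.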

Re: Hive ORC Table

2017-01-22 Thread goun na
Please refer to the document below as well: Hive on Tez Performance Tuning - Determining Reducer Counts https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html I hope it gives you some clues for understanding Tez internals. 2017-01-21 23:35 GMT+09:00 M

Re: Hive ORC Table

2017-01-21 Thread Mahender Sarangam
Yes, I tried the option below, but I'm not sure about the workload (data ingestion). I can't go with a fixed hard-coded value; I would like to know the reason for getting 1009 reducer tasks. On 1/20/2017 7:45 PM, goun na wrote: Hi Mahender , 1st : Didn't work the following option in Tez? set mapreduce.job.
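
For context on the number 1009: it is the default of `hive.exec.reducers.max` in Hive 0.14 and later, and the default of `hive.exec.reducers.bytes.per.reducer` is 256,000,000 bytes. The sketch below shows the basic size-based estimate under those defaults. It is a simplification: Tez additionally applies its own scaling and auto reducer parallelism, so the real count can differ. The function name `estimate_reducers` is hypothetical.

```python
import math


def estimate_reducers(input_bytes,
                      bytes_per_reducer=256_000_000,  # hive.exec.reducers.bytes.per.reducer default
                      max_reducers=1009):             # hive.exec.reducers.max default
    """Simplified size-based reducer estimate (illustrative only).

    Divides the input size by the per-reducer byte target, then clamps
    the result to the range [1, max_reducers].
    """
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


# A large input hits the cap, which is one way to end up with 1009 reducers.
print(estimate_reducers(500 * 10**9))
# A small input stays well below the cap.
print(estimate_reducers(10**9))
```

Under this estimate, seeing exactly 1009 reducers suggests the job reached the configured cap rather than a value derived from the data size alone.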

Re: Hive ORC Table

2017-01-20 Thread goun na
Hi Mahender Sarangam, 1st: Did the following option not work in Tez? set mapreduce.job.reduces=100 or set mapred.reduce.tasks=100 (deprecated) 2nd: Possibility of data skew. It sometimes happens when handling nulls. Goun 2017-01-21 9:58 GMT+09:00 Mahender Sarangam : > Hi All, > > We have ORC