Do you have any timestamp fields in the table that might contain null values?
I faced a similar situation some time back - changing the data type to string made it work. But I was working on delimited text files. Not sure if it applies to JSON, but it's still worth a try!

Richa

On Wed, Jun 19, 2013 at 7:28 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:

> Having a column name that is the same as the table name is a problem, due
> to which I was not able to reference jobs.values.id from jobs. Changing
> the table name to jobs1 resolved the semantic error.
> However, the query still returns null:
>
> hive> select jobs.values.position.title from jobs1;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201306080116_0036, Tracking URL =
> http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036
> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306080116_0036
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2013-06-18 18:55:52,381 Stage-1 map = 0%, reduce = 0%
> 2013-06-18 18:55:56,394 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
> 2013-06-18 18:55:57,400 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
> 2013-06-18 18:55:58,407 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.88 sec
> MapReduce Total cumulative CPU time: 880 msec
> Ended Job = job_201306080116_0036
> MapReduce Jobs Launched:
> Job 0: Map: 1  Cumulative CPU: 0.88 sec  HDFS Read: 35374  HDFS Write: 3  SUCCESS
> Total MapReduce CPU Time Spent: 880 msec
> OK
> null
> Time taken: 9.591 seconds
>
> regards
> Sunita
>
> On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>
>> Ok.
>> The data files are quite small - around 35 KB and 1 KB each.
>>
>> [sunita@node01 tables]$ hadoop fs -ls /user/sunita/tables/jobs
>> Found 1 items
>> -rw-r--r--   3 sunita hdfs   35172 2013-06-18 18:31 /user/sunita/tables/jobs/jobs_noSite_parsed.json
>>
>> [sunita@node01 tables]$ hadoop fs -text /user/sunita/tables/jobs/jobs_noSite_parsed.json | more
>> {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values":
>> [{"company": {"id": 21836, "name": "CyberCoders"}, "postingDate":
>> {"year": 2013, "day": 10, "month": 6}, "descriptionSnippet": "Software
>> Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics Senior Hadoop
>> Engineer - Skills Required - Hadoop, HDFS, HBase, Pig, SQL, Industrial
>> Software Development, System Integration, Java, high performance,
>> multi-threading, VerticaWe are a well known consumer product development
>> company and we are looking to add a Hadoop Engineer to our Engineering
>> team. You will be working with the latest ", "expirationDate": {"year": 2013, "day"
>>
>> It's a single line, so I used 'more' rather than 'head'. But effectively,
>> the file exists and has the data.
>>
>> regards
>> Sunita
>>
>> On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>
>>> As Nitin alluded to, it's best to first confirm the data is definitely
>>> in HDFS using HDFS semantics rather than Hive.
>>>
>>> 1. How big is it?  hadoop fs -ls <your hdfs dir>
>>> 2. Cat a bit of it and see if anything is there.  hadoop fs -text <your hdfs dir>/<filename> | head -10
>>>
>>> Do you see any data from step #2?
>>>
>>> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>
>>>> I ran some complex queries. Something to the extent of
>>>> select jobs from jobs;
>>>> which triggers MapReduce jobs but does not show errors and produces the
>>>> same output, "null". If I try referencing the struct elements, I get an
>>>> error, which seems to be the root cause.
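For anyone debugging a similar setup: the hadoop fs -text check above can be complemented by parsing one record of the file offline with a plain JSON parser and walking the same path the failing Hive query uses (jobs.values.position.title). A minimal Python sketch, using a record abbreviated to the same shape as the sample output above (the title value here is illustrative, not taken from the actual file):

```python
import json

# One abbreviated record in the same shape as jobs_noSite_parsed.json
# (field values trimmed; the title is illustrative).
line = '''{"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values":
 [{"company": {"id": 21836, "name": "CyberCoders"},
   "postingDate": {"year": 2013, "day": 10, "month": 6},
   "position": {"title": "Senior Hadoop Engineer"}}]}}'''

doc = json.loads(line)

# Walk the same path as the Hive query: jobs.values.position.title
titles = [v["position"]["title"] for v in doc["jobs"]["values"]]
print(titles)  # → ['Senior Hadoop Engineer']
```

If this walk fails on real records (a KeyError, or a scalar where a dict is expected), the problem is a data/schema mismatch rather than the SerDe itself.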
>>>>
>>>> Attached are the select statement outputs with the corresponding Hive
>>>> logs.
>>>>
>>>> I have also attached my usage details of another table, try_parsed,
>>>> which has a subset of the same data and seems to work fine. Also
>>>> attached is the input file for this table, try_parsed.json.
>>>>
>>>> Thanks for your help
>>>>
>>>> Sunita
>>>>
>>>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>>>
>>>>> Can you run a little more complex query?
>>>>>
>>>>> Select uniq across columns or do some math, so we know when it fires
>>>>> up a MapReduce job.
>>>>>
>>>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for responding, Nitin. Yes, I am sure that the serde is working
>>>>>> fine and the JSON file is being picked up, based on all the errors
>>>>>> that showed up until this stage. What sort of error are you suspecting -
>>>>>> file not present, or serde not parsing it?
>>>>>>
>>>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>>>
>>>>>>> select * from table is as good as hdfs -cat
>>>>>>>
>>>>>>> Are you sure there is any data in the table?
>>>>>>>
>>>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am able to parse the input JSON file and load it into Hive. I do
>>>>>>>> not see any errors with create table, so I am assuming that part
>>>>>>>> works. But when I try to read the data, I get null:
>>>>>>>>
>>>>>>>> hive> select * from jobs;
>>>>>>>> OK
>>>>>>>> null
>>>>>>>>
>>>>>>>> I have validated the JSON with JSONLint and the Notepad++ JSON
>>>>>>>> plugin, and it is valid JSON. Here is my create table statement;
>>>>>>>> attached is the JSON input file.
>>>>>>>>
>>>>>>>> create external table jobs (
>>>>>>>>   jobs STRUCT<
>>>>>>>>     values : ARRAY<STRUCT<
>>>>>>>>       company : STRUCT<
>>>>>>>>         id : STRING,
>>>>>>>>         name : STRING>,
>>>>>>>>       postingDate : STRUCT<
>>>>>>>>         year : INT,
>>>>>>>>         day : INT,
>>>>>>>>         month : INT>,
>>>>>>>>       descriptionSnippet : STRING,
>>>>>>>>       expirationDate : STRUCT<
>>>>>>>>         year : INT,
>>>>>>>>         day : INT,
>>>>>>>>         month : INT>,
>>>>>>>>       position : STRUCT<
>>>>>>>>         title : STRING,
>>>>>>>>         jobFunctions : ARRAY<STRUCT<
>>>>>>>>           code : STRING,
>>>>>>>>           name : STRING>>,
>>>>>>>>         industries : ARRAY<STRUCT<
>>>>>>>>           code : STRING,
>>>>>>>>           id : STRING,
>>>>>>>>           name : STRING>>,
>>>>>>>>         jobType : STRUCT<
>>>>>>>>           code : STRING,
>>>>>>>>           name : STRING>,
>>>>>>>>         experienceLevel : STRUCT<
>>>>>>>>           code : STRING,
>>>>>>>>           name : STRING>>,
>>>>>>>>       id : STRING,
>>>>>>>>       customerJobCode : STRING,
>>>>>>>>       skillsAndExperience : STRING,
>>>>>>>>       salary : STRING,
>>>>>>>>       jobPoster : STRUCT<
>>>>>>>>         id : STRING,
>>>>>>>>         firstName : STRING,
>>>>>>>>         lastName : STRING,
>>>>>>>>         headline : STRING>,
>>>>>>>>       referralBonus : STRING,
>>>>>>>>       locationDescription : STRING>>>
>>>>>>>> )
>>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>>>
>>>>>>>> The table creation works fine, but when I attempt to query, I get
>>>>>>>> null as the result. I tried adding Input/Output formats and SerDe
>>>>>>>> properties; nothing seems to have any impact.
>>>>>>>>
>>>>>>>> I am of the opinion that the libraries cannot handle this level of
>>>>>>>> nesting, and I will probably have to write a custom serde or a
>>>>>>>> parser myself. I just wanted to seek guidance before I get into
>>>>>>>> that. I appreciate your help and guidance.
>>>>>>>>
>>>>>>>> regards
>>>>>>>> Sunita
>>>>>>>
>>>>>>> --
>>>>>>> Nitin Pawar
>>>>>
>>>>> --
>>>>> Nitin Pawar
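Before writing a custom serde, one quick way to rule out a schema/data mismatch is to check each record offline for the keys the DDL above expects at each nesting level. A minimal Python sketch (the key set is abbreviated to fields referenced in this thread, and the sample record is illustrative - it deliberately omits "id" to show what a reported gap looks like):

```python
import json

# Keys the DDL expects inside each element of jobs.values
# (abbreviated; extend with the remaining DDL fields as needed).
EXPECTED_VALUE_KEYS = {"company", "postingDate", "position", "id"}

def check_record(doc):
    """Return (index, missing-keys) pairs for elements of jobs.values
    that lack any of the expected keys."""
    missing = []
    for i, v in enumerate(doc.get("jobs", {}).get("values", [])):
        gap = EXPECTED_VALUE_KEYS - v.keys()
        if gap:
            missing.append((i, sorted(gap)))
    return missing

# Illustrative record matching the shape of the sample shown in the
# thread; "id" is intentionally absent from the value element.
sample = json.loads('''{"jobs": {"_total": 1812, "values":
 [{"company": {"id": 21836, "name": "CyberCoders"},
   "postingDate": {"year": 2013, "day": 10, "month": 6},
   "position": {"title": "Senior Hadoop Engineer"}}]}}''')

print(check_record(sample))  # → [(0, ['id'])]
```

If every record checks out, the data matches the schema and the null results point at the SerDe or at how the nested columns are referenced, rather than at the file.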