Thanks for sharing your experience, Richa. I do have timestamp fields, but in the format year : INT, day : INT, month : INT. As per your suggestion, I changed them all to STRING, but I still get null as the output.

regards
Sunita
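[For reference, a sketch of the change described above: just the two date structs from the CREATE TABLE quoted at the bottom of this thread, with INT swapped for STRING. The surrounding DDL is unchanged and omitted here.]

    postingDate : STRUCT<
        year : STRING,
        day : STRING,
        month : STRING>,
    ...
    expirationDate : STRUCT<
        year : STRING,
        day : STRING,
        month : STRING>,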
On Wed, Jun 19, 2013 at 2:17 AM, Richa Sharma <richasha...@gmail.com> wrote:

> Do you have any timestamp fields in the table that might contain a null
> value?
>
> I faced a similar situation some time back - changing the data type to
> string made it work.
>
> But I was working on delimited text files.
> Not sure if it applies to JSON, but it's still worth giving a try!
>
> Richa
>
>
> On Wed, Jun 19, 2013 at 7:28 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>
>> Having a column name that is the same as the table name is a problem,
>> due to which I was not able to reference jobs.values.id from jobs.
>> Changing the table name to jobs1 resolved the semantic error.
>> However, the query still returns null:
>>
>> hive> select jobs.values.position.title from jobs1;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_201306080116_0036, Tracking URL =
>> http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036
>> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306080116_0036
>> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
>> 2013-06-18 18:55:52,381 Stage-1 map = 0%, reduce = 0%
>> 2013-06-18 18:55:56,394 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
>> 2013-06-18 18:55:57,400 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
>> 2013-06-18 18:55:58,407 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.88 sec
>> MapReduce Total cumulative CPU time: 880 msec
>> Ended Job = job_201306080116_0036
>> MapReduce Jobs Launched:
>> Job 0: Map: 1  Cumulative CPU: 0.88 sec  HDFS Read: 35374  HDFS Write: 3  SUCCESS
>> Total MapReduce CPU Time Spent: 880 msec
>> OK
>> null
>> Time taken: 9.591 seconds
>>
>> regards
>> Sunita
>>
>>
>> On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>
>>> Ok.
>>> The data files are quite small - around 35 KB and 1 KB each.
>>>
>>> [sunita@node01 tables]$ hadoop fs -ls /user/sunita/tables/jobs
>>> Found 1 items
>>> -rw-r--r--  3 sunita hdfs  35172 2013-06-18 18:31 /user/sunita/tables/jobs/jobs_noSite_parsed.json
>>>
>>> [sunita@node01 tables]$ hadoop fs -text /user/sunita/tables/jobs/jobs_noSite_parsed.json | more
>>> {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values":
>>> [{"company": {"id": 21836, "name": "CyberCoders"}, "postingDate":
>>> {"year": 2013, "day": 10, "month": 6}, "descriptionSnippet": "Software
>>> Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics Senior Hadoop
>>> Engineer - Skills Required - Hadoop, HDFS, HBase, Pig, SQL, Industrial
>>> Software Development, System Integration, Java, high performance,
>>> multi-threading, VerticaWe are a well known consumer product
>>> development company and we are looking to add a Hadoop Engineer to our
>>> Engineering team. You will be working with the latest ",
>>> "expirationDate": {"year": 2013, "day"
>>>
>>> It's a single line, so I used 'more' rather than 'head'. But
>>> effectively, the file exists and has the data.
>>>
>>> regards
>>> Sunita
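[A sketch of a narrower probe than the queries tried in the thread - the table name jobs1 and the field names come from the messages above and the DDL quoted below; nothing else is from the thread. Selecting one scalar at a time can show whether the SerDe fails on the whole record or only on specific nested fields, such as the date structs:]

    hive> select jobs.values[0].company.name from jobs1;
    hive> select jobs.values[0].postingDate.year from jobs1;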
>>>
>>> On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>
>>>> As Nitin alluded to, it's best to confirm the data is definitely in
>>>> hdfs using hdfs semantics rather than hive for the first step.
>>>>
>>>> 1. how big is it?  hadoop fs -ls <your hdfs dir>
>>>> 2. cat a bit of it and see if anything is there.  hadoop fs -text <your hdfs dir>/<filename> | head -10
>>>>
>>>> do you see any data from step #2?
>>>>
>>>>
>>>> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>
>>>>> I ran some complex queries - something to the extent of
>>>>>
>>>>> select jobs from jobs;
>>>>>
>>>>> which triggers map reduce jobs but does not show errors, and produces
>>>>> the same output, "null". If I try referencing the struct elements, I
>>>>> get an error, which seems to be the root cause.
>>>>>
>>>>> Attached are the select statement outputs with the corresponding hive
>>>>> logs.
>>>>>
>>>>> I have also attached my usage details of another table - try_parsed -
>>>>> which has a subset of the same data and seems to work fine. Also
>>>>> attached is the input file for this table - try_parsed.json
>>>>>
>>>>> Thanks for your help
>>>>>
>>>>> Sunita
>>>>>
>>>>>
>>>>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>>>>
>>>>>> can you run a little more complex query?
>>>>>>
>>>>>> select uniq across columns or do some maths, so we know when it
>>>>>> fires up a mapreduce
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for responding, Nitin. Yes, I am sure the serde is working
>>>>>>> fine and the json file is being picked up, based on all the errors
>>>>>>> that showed up till this stage. What sort of error are you
>>>>>>> suspecting - file not present, or serde not parsing it?
>>>>>>>
>>>>>>>
>>>>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>>>>
>>>>>>>> select * from table is as good as hdfs -cat
>>>>>>>>
>>>>>>>> are you sure there is any data in the table?
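[Nitin's point, spelled out as commands; the path and table name are the ones used elsewhere in this thread. A simple select * is served by a fetch that just runs the SerDe over the file, so comparing it against the raw bytes separates an HDFS problem from a deserialization problem - if the first command shows data and the second returns null, the SerDe is dropping the record:]

    hadoop fs -cat /user/sunita/tables/jobs/jobs_noSite_parsed.json | head -c 300
    hive -e 'select * from jobs limit 1;'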
>>>>>>>>
>>>>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am able to parse the input JSON file and load it into hive. I do
>>>>>>>>> not see any errors with create table, so I am assuming that worked.
>>>>>>>>> But when I try to read the data, I get null:
>>>>>>>>>
>>>>>>>>> hive> select * from jobs;
>>>>>>>>> OK
>>>>>>>>> null
>>>>>>>>>
>>>>>>>>> I have validated the JSON with JSONLint and the Notepad++ JSON
>>>>>>>>> plugin, and it is valid JSON. Here is my create table statement,
>>>>>>>>> and attached is the json input file.
>>>>>>>>>
>>>>>>>>> create external table jobs (
>>>>>>>>>   jobs STRUCT<
>>>>>>>>>     values : ARRAY<STRUCT<
>>>>>>>>>       company : STRUCT<
>>>>>>>>>         id : STRING,
>>>>>>>>>         name : STRING>,
>>>>>>>>>       postingDate : STRUCT<
>>>>>>>>>         year : INT,
>>>>>>>>>         day : INT,
>>>>>>>>>         month : INT>,
>>>>>>>>>       descriptionSnippet : STRING,
>>>>>>>>>       expirationDate : STRUCT<
>>>>>>>>>         year : INT,
>>>>>>>>>         day : INT,
>>>>>>>>>         month : INT>,
>>>>>>>>>       position : STRUCT<
>>>>>>>>>         title : STRING,
>>>>>>>>>         jobFunctions : ARRAY<STRUCT<
>>>>>>>>>           code : STRING,
>>>>>>>>>           name : STRING>>,
>>>>>>>>>         industries : ARRAY<STRUCT<
>>>>>>>>>           code : STRING,
>>>>>>>>>           id : STRING,
>>>>>>>>>           name : STRING>>,
>>>>>>>>>         jobType : STRUCT<
>>>>>>>>>           code : STRING,
>>>>>>>>>           name : STRING>,
>>>>>>>>>         experienceLevel : STRUCT<
>>>>>>>>>           code : STRING,
>>>>>>>>>           name : STRING>>,
>>>>>>>>>       id : STRING,
>>>>>>>>>       customerJobCode : STRING,
>>>>>>>>>       skillsAndExperience : STRING,
>>>>>>>>>       salary : STRING,
>>>>>>>>>       jobPoster : STRUCT<
>>>>>>>>>         id : STRING,
>>>>>>>>>         firstName : STRING,
>>>>>>>>>         lastName : STRING,
>>>>>>>>>         headline : STRING>,
>>>>>>>>>       referralBonus : STRING,
>>>>>>>>>       locationDescription : STRING>>>
>>>>>>>>> )
>>>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>>>>
>>>>>>>>> The table creation works fine, but when I attempt to query, I get
>>>>>>>>> null as the result.
>>>>>>>>> I tried adding Input/Output formats and SerDe properties; nothing
>>>>>>>>> seems to have an impact.
>>>>>>>>>
>>>>>>>>> I am of the opinion that the libraries cannot handle this level of
>>>>>>>>> nesting, and I will probably have to write a custom serde or a
>>>>>>>>> parser myself. Just wanted to seek guidance before I get into
>>>>>>>>> that. Appreciate your help and guidance.
>>>>>>>>>
>>>>>>>>> regards
>>>>>>>>> Sunita
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Nitin Pawar
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Nitin Pawar
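[One common fallback before writing a custom SerDe - not something proposed in the thread, and the table name jobs_raw is illustrative: keep each record as a single STRING column and extract fields with Hive's built-in get_json_object, which follows arbitrarily deep JSON paths without any extra SerDe:]

    -- one JSON document per line; with no delimiter in the line,
    -- the whole record lands in the single string column
    create external table jobs_raw (json STRING)
    LOCATION '/user/sunita/tables/jobs';

    -- same field the thread queries through the struct DDL
    select get_json_object(json, '$.jobs.values[0].position.title') from jobs_raw;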