Having the a column name same as the table name, is a problem due to which I was not able to reference jobs.values.id from jobs. Changing the table name to jobs1 resolved the semantic error. However, the query still returns null
hive> select jobs.values.position.title from jobs1; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201306080116_0036, Tracking URL = http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036 Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306080116_0036 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2013-06-18 18:55:52,381 Stage-1 map = 0%, reduce = 0% 2013-06-18 18:55:56,394 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec 2013-06-18 18:55:57,400 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec 2013-06-18 18:55:58,407 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.88 sec MapReduce Total cumulative CPU time: 880 msec Ended Job = job_201306080116_0036 MapReduce Jobs Launched: Job 0: Map: 1 Cumulative CPU: 0.88 sec HDFS Read: 35374 HDFS Write: 3 SUCCESS Total MapReduce CPU Time Spent: 880 msec OK null Time taken: 9.591 seconds regards Sunita On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind <sunitarv...@gmail.com>wrote: > Ok. > The data files are quite small. Around 35 KB and 1 KB each. > > [sunita@node01 tables]$ hadoop fs -ls /user/sunita/tables/jobs > Found 1 items > -rw-r--r-- 3 sunita hdfs 35172 2013-06-18 18:31 > /user/sunita/tables/jobs/jobs_noSite_parsed.json > > > [sunita@node01 tables]$ hadoop fs -text > /user/sunita/tables/jobs/jobs_noSite_parsed.json |more > {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values": > [{"company": {"i > d": 21836, "name": "CyberCoders"}, "postingDate": {"year": 2013, "day": > 10, "mo > nth": 6}, "descriptionSnippet": "Software Engineer-Hadoop, HDFS, HBase, > Pig- Ve > rtica Analytics Senior Hadoop Engineer - Skills Required - Hadoop, HDFS, > HBase, > Pig, SQL, Industrial Software Development, System Integration, Java, high > perf > ormance, multi-threading, VerticaWe are a well known consumer product > developme > nt company and we are looking to add a Hadoop Engineer to our Engineering > team. > You will be working with the latest ", "expirationDate": {"year": 2013, > "day" > > Its a single line, so used 'more' rather than 'head'. But effectively, the > file exists and has the data. > > regards > Sunita > > > On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <sprag...@gmail.com>wrote: > >> As Nitin alluded to its best to confirm the data is definitely in hdfs >> using hdfs semantics rather than hive for the first step. >> >> 1. how big is it? hadoop fs -ls <your hdfs dir> >> 2. cat a bit of it and see if anything is there. hadoop fs -text <your >> hdfs dir>/<filename> | head -10 >> >> do you see any data from step #2? >> >> >> >> >> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <sunitarv...@gmail.com>wrote: >> >>> I ran some complex queries. Something to the extent of >>> select jobs from jobs; >>> which triggers map reduce jobs but does not show errors and produces >>> the same output "null". If I try referencing the struct elements, I get >>> error which seems to be the root cause. >>> >>> Attached are the select statement outputs with the corresponding hive >>> logs. >>> >>> I have also attached my usage details of another table - try_parsed >>> which has a subset of the same data which seems to work fine. Also attached >>> is the input file for this table - try_parsed.json >>> Thanks for your help >>> >>> Sunita >>> >>> >>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <nitinpawar...@gmail.com>wrote: >>> >>>> can you run a little more complex query >>>> >>>> select uniq across columns or do some maths. so we know when it fires >>>> up a mapreduce >>>> >>>> >>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind >>>> <sunitarv...@gmail.com>wrote: >>>> >>>>> Thanks for responding Nitin. Yes I am sure that serde is working fine >>>>> and json file is being picked based on all the errors that showed up till >>>>> this stage. What sort of error are you suspecting. File not present or >>>>> serde not parsing it ? >>>>> >>>>> >>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote: >>>>> >>>>>> select * from table is as good as hdfs -cat >>>>>> >>>>>> are you sure there is any data in the table? >>>>>> >>>>>> >>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind < >>>>>> sunitarv...@gmail.com> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am able to parse the input JSON file and load it into hive. I do >>>>>>> not see any errors with create table, so I am assuming that. But when I >>>>>>> try >>>>>>> to read the data, I get null >>>>>>> >>>>>>> hive> select * from jobs; >>>>>>> OK >>>>>>> null >>>>>>> >>>>>>> I have validated the JSON with JSONLint and Notepad++ JSON plugin >>>>>>> and it is a valid JSON. Here is my create table statement and >>>>>>> attached is the json input file. >>>>>>> >>>>>>> create external table jobs ( >>>>>>> jobs STRUCT< >>>>>>> values : ARRAY<STRUCT< >>>>>>> company : STRUCT< >>>>>>> id : STRING, >>>>>>> name : STRING>, >>>>>>> postingDate : STRUCT< >>>>>>> year : INT, >>>>>>> day : INT, >>>>>>> month : INT>, >>>>>>> descriptionSnippet : STRING, >>>>>>> expirationDate : STRUCT< >>>>>>> year : INT, >>>>>>> day : INT, >>>>>>> month : INT>, >>>>>>> position : STRUCT< >>>>>>> title : STRING, >>>>>>> jobFunctions : ARRAY<STRUCT< >>>>>>> code : STRING, >>>>>>> name : STRING>>, >>>>>>> industries : ARRAY<STRUCT< >>>>>>> code : STRING, >>>>>>> id : STRING, >>>>>>> name : STRING>>, >>>>>>> jobType : STRUCT< >>>>>>> code : STRING, >>>>>>> name : STRING>, >>>>>>> experienceLevel : STRUCT< >>>>>>> code : STRING, >>>>>>> name : STRING>>, >>>>>>> id : STRING, >>>>>>> customerJobCode : STRING, >>>>>>> skillsAndExperience : STRING, >>>>>>> salary : STRING, >>>>>>> jobPoster : STRUCT< >>>>>>> id : STRING, >>>>>>> firstName : STRING, >>>>>>> lastName : STRING, >>>>>>> headline : STRING>, >>>>>>> referralBonus : STRING, >>>>>>> locationDescription : STRING>>> >>>>>>> ) >>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' >>>>>>> LOCATION '/user/sunita/tables/jobs'; >>>>>>> >>>>>>> The table creation works fine, but when I attempt to query, I get >>>>>>> null as the result. >>>>>>> I tried adding Input/Output formats, Serde Properties, nothing seems >>>>>>> to impact. >>>>>>> >>>>>>> I am of the opinion that the libraries cannot handle this level of >>>>>>> nesting and I probably will have to write a custom serde or a parser >>>>>>> myself. Just wanted to seek guidance before I get into that. Appreciate >>>>>>> your help and guidance. >>>>>>> >>>>>>> regards >>>>>>> Sunita >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Nitin Pawar >>>>>> >>>>> >>>> >>>> >>>> -- >>>> Nitin Pawar >>>> >>> >>> >> >