Thanks for looking into it, Ramki. Yes, I had tried these options. Here is what I get (I renamed the table to have a meaningful name):
hive> select jobs.values[1].id from linkedinjobsearch;
......mapreduce task details....
OK
NULL
Time taken: 9.586 seconds

hive> select jobs.values[0].position.title from linkedinjobsearch;
Total MapReduce jobs = 1
Launching Job 1 out of 1
OK
NULL
Time taken: 9.617 seconds

I am trying to connect btrace to the process to be able to trace the code, but I can't get it to respond. Here is what I tried:

[sunita@node01 ~]$ hive --debug, recursive=y, port=7000,mainSuspend=y, childSuspend=y
ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options.
Error occurred during initialization of VM
agent library failed to init: jdwp

I tried changing the port as well. Any ideas regarding debuggers that can be used? I also tried EXPLAIN on the query, and that does not show any issues either.

regards
Sunita

On Wed, Jun 19, 2013 at 12:11 PM, Ramki Palle <ramki.pa...@gmail.com> wrote:

> Can you run some other queries from the jobs1 table and see if any query
> returns some data?
>
> I am guessing your query "select jobs.values.position.title from jobs1;"
> may have an issue. Maybe it should be:
>
> select jobs.values[0].position.title from jobs1;
>
> Regards,
> Ramki.
>
> On Wed, Jun 19, 2013 at 8:24 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>
>> Thanks Stephen,
>>
>> That's just what I tried with the try_parsed table. It is exactly the same
>> data with less nesting in the structure and a smaller number of entries.
>> Do you mean to say that highly nested JSONs can lead to issues? What are
>> typical solutions to such issues? Write UDFs in Hive or parse the JSON
>> into a delimited file?
>> I have heard of custom SerDes as well. I am not sure if UDFs and custom
>> SerDes are one and the same.
>>
>> regards
>> Sunita
>>
>> On Wed, Jun 19, 2013 at 10:38 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>>
>>> I think you might have to start small here instead of going for the home
>>> run on the first swing. When all else fails, start with a trivial JSON
>>> object, build up from there, and see which additional step breaks it.
>>> That way, if the trivial example fails, you know it is something
>>> fundamental and not the complexity of your JSON object that's throwing
>>> things off.
>>>
>>> On Wed, Jun 19, 2013 at 4:34 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>
>>>> Thanks for sharing your experience, Richa.
>>>> I do have timestamps, but in the format of year : INT, day : INT,
>>>> month : INT.
>>>> As per your suggestion, I changed them all to STRING, but I still get
>>>> null as the output.
>>>>
>>>> regards
>>>> Sunita
>>>>
>>>> On Wed, Jun 19, 2013 at 2:17 AM, Richa Sharma <mailtorichasha...@gmail.com> wrote:
>>>>
>>>>> Do you have any timestamp fields in the table that might contain null
>>>>> values?
>>>>>
>>>>> I faced a similar situation some time back - changing the data type to
>>>>> string made it work.
>>>>>
>>>>> But I was working on delimited text files.
>>>>> Not sure if it applies to JSON, but it's still worth giving it a try!
>>>>>
>>>>> Richa
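Once the SerDe does populate the jobs struct, indexing single array elements as suggested above is one option; a lateral view over the array returns the title of every posting in one pass. Below is a minimal sketch, assuming the renamed table linkedinjobsearch and the column layout from the create table statement at the bottom of this thread:

-- One output row per element of the nested array
-- (assumes the SerDe actually fills in jobs.values, which is still the open question here).
SELECT v.position.title,
       v.company.name
FROM linkedinjobsearch
LATERAL VIEW explode(jobs.values) jobs_values AS v;

If this also comes back empty while the raw file clearly contains data, that points at the SerDe or the declared schema rather than at the query syntax.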
>>>>> On Wed, Jun 19, 2013 at 7:28 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>
>>>>>> Having a column name that is the same as the table name is a problem;
>>>>>> because of it I was not able to reference jobs.values.id from jobs.
>>>>>> Changing the table name to jobs1 resolved the semantic error.
>>>>>> However, the query still returns null.
>>>>>>
>>>>>> hive> select jobs.values.position.title from jobs1;
>>>>>> Total MapReduce jobs = 1
>>>>>> Launching Job 1 out of 1
>>>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>>>> Starting Job = job_201306080116_0036, Tracking URL =
>>>>>> http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036
>>>>>> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306080116_0036
>>>>>> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
>>>>>> 2013-06-18 18:55:52,381 Stage-1 map = 0%, reduce = 0%
>>>>>> 2013-06-18 18:55:56,394 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
>>>>>> 2013-06-18 18:55:57,400 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
>>>>>> 2013-06-18 18:55:58,407 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.88 sec
>>>>>> MapReduce Total cumulative CPU time: 880 msec
>>>>>> Ended Job = job_201306080116_0036
>>>>>> MapReduce Jobs Launched:
>>>>>> Job 0: Map: 1   Cumulative CPU: 0.88 sec   HDFS Read: 35374   HDFS Write: 3   SUCCESS
>>>>>> Total MapReduce CPU Time Spent: 880 msec
>>>>>> OK
>>>>>> null
>>>>>> Time taken: 9.591 seconds
>>>>>>
>>>>>> regards
>>>>>> Sunita
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>
>>>>>>> Ok.
>>>>>>> The data files are quite small, around 35 KB and 1 KB each.
>>>>>>>
>>>>>>> [sunita@node01 tables]$ hadoop fs -ls /user/sunita/tables/jobs
>>>>>>> Found 1 items
>>>>>>> -rw-r--r--   3 sunita hdfs   35172 2013-06-18 18:31 /user/sunita/tables/jobs/jobs_noSite_parsed.json
>>>>>>>
>>>>>>> [sunita@node01 tables]$ hadoop fs -text /user/sunita/tables/jobs/jobs_noSite_parsed.json | more
>>>>>>> {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values": [{"company":
>>>>>>> {"id": 21836, "name": "CyberCoders"}, "postingDate": {"year": 2013, "day": 10,
>>>>>>> "month": 6}, "descriptionSnippet": "Software Engineer-Hadoop, HDFS, HBase, Pig-
>>>>>>> Vertica Analytics Senior Hadoop Engineer - Skills Required - Hadoop, HDFS,
>>>>>>> HBase, Pig, SQL, Industrial Software Development, System Integration, Java,
>>>>>>> high performance, multi-threading, VerticaWe are a well known consumer product
>>>>>>> development company and we are looking to add a Hadoop Engineer to our
>>>>>>> Engineering team. You will be working with the latest ", "expirationDate": {"year": 2013, "day"
>>>>>>>
>>>>>>> It's a single line, so I used 'more' rather than 'head'. But effectively,
>>>>>>> the file exists and has the data.
>>>>>>>
>>>>>>> regards
>>>>>>> Sunita
>>>>>>>
>>>>>>> On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>>>
>>>>>>>> As Nitin alluded to, it's best to confirm the data is definitely in
>>>>>>>> HDFS using HDFS semantics rather than Hive as the first step.
>>>>>>>>
>>>>>>>> 1. How big is it?  hadoop fs -ls <your hdfs dir>
>>>>>>>> 2. Cat a bit of it and see if anything is there:  hadoop fs -text <your hdfs dir>/<filename> | head -10
>>>>>>>>
>>>>>>>> Do you see any data from step #2?
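A quick way to narrow down where the null comes from is to check whether the array inside the struct gets populated at all. Here is a minimal sketch against the jobs1 table used above; size() is a Hive built-in that returns -1 when its argument is NULL, so the result distinguishes "the struct was never filled in" from "the field path is wrong". It may also be worth noting that the raw file shows _total, _count and _start inside the jobs object while the table definition later in this thread declares only values; whether the SerDe tolerates undeclared fields depends on the SerDe.

-- -1 means the array (and very likely the whole struct) is NULL;
-- 20 would match the "_count" visible in the raw file.
SELECT size(jobs.values) FROM jobs1;

-- How many records does the table scan actually see?
SELECT count(*) FROM jobs1;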
>>>>>>>> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I ran some complex queries. Something to the effect of
>>>>>>>>>
>>>>>>>>> select jobs from jobs;
>>>>>>>>>
>>>>>>>>> which triggers MapReduce jobs, does not show errors, and produces the
>>>>>>>>> same output, "null". If I try referencing the struct elements, I get
>>>>>>>>> an error, which seems to be the root cause.
>>>>>>>>>
>>>>>>>>> Attached are the select statement outputs with the corresponding
>>>>>>>>> hive logs.
>>>>>>>>>
>>>>>>>>> I have also attached my usage details of another table, try_parsed,
>>>>>>>>> which has a subset of the same data and seems to work fine.
>>>>>>>>> Also attached is the input file for this table, try_parsed.json.
>>>>>>>>> Thanks for your help
>>>>>>>>>
>>>>>>>>> Sunita
>>>>>>>>>
>>>>>>>>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Can you run a little more complex query?
>>>>>>>>>>
>>>>>>>>>> Select unique values across columns or do some math, so we know what
>>>>>>>>>> happens when it fires up a MapReduce job.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for responding, Nitin. Yes, I am sure that the SerDe is
>>>>>>>>>>> working fine and the JSON file is being picked up, based on all the
>>>>>>>>>>> errors that showed up until this stage. What sort of error are you
>>>>>>>>>>> suspecting - the file not being present, or the SerDe not parsing it?
>>>>>>>>>>>
>>>>>>>>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>>>>>>>>
>>>>>>>>>>>> select * from table is as good as hdfs -cat.
>>>>>>>>>>>>
>>>>>>>>>>>> Are you sure there is any data in the table?
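One way to test this point without involving the JSON SerDe at all is to lay a plain text table with a single string column over the same directory and pull a field out with Hive's built-in get_json_object. A minimal sketch; jobs_raw is a hypothetical helper table, and the location is the one shown earlier in the thread:

-- Default text format: each line of the file becomes one string value.
CREATE EXTERNAL TABLE jobs_raw (json STRING)
LOCATION '/user/sunita/tables/jobs';

-- If this returns a title, the data and the location are fine and the
-- problem sits in the SerDe or the declared schema; if it is NULL as well,
-- the problem is upstream of the SerDe.
SELECT get_json_object(json, '$.jobs.values[0].position.title') FROM jobs_raw;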
>>>>>>>>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am able to parse the input JSON file and load it into Hive.
>>>>>>>>>>>>> I do not see any errors with create table, so I am assuming that
>>>>>>>>>>>>> part worked. But when I try to read the data, I get null:
>>>>>>>>>>>>>
>>>>>>>>>>>>> hive> select * from jobs;
>>>>>>>>>>>>> OK
>>>>>>>>>>>>> null
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have validated the JSON with JSONLint and the Notepad++ JSON
>>>>>>>>>>>>> plugin, and it is valid JSON. Here is my create table statement;
>>>>>>>>>>>>> attached is the JSON input file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> create external table jobs (
>>>>>>>>>>>>>   jobs STRUCT<
>>>>>>>>>>>>>     values : ARRAY<STRUCT<
>>>>>>>>>>>>>       company : STRUCT<
>>>>>>>>>>>>>         id : STRING,
>>>>>>>>>>>>>         name : STRING>,
>>>>>>>>>>>>>       postingDate : STRUCT<
>>>>>>>>>>>>>         year : INT,
>>>>>>>>>>>>>         day : INT,
>>>>>>>>>>>>>         month : INT>,
>>>>>>>>>>>>>       descriptionSnippet : STRING,
>>>>>>>>>>>>>       expirationDate : STRUCT<
>>>>>>>>>>>>>         year : INT,
>>>>>>>>>>>>>         day : INT,
>>>>>>>>>>>>>         month : INT>,
>>>>>>>>>>>>>       position : STRUCT<
>>>>>>>>>>>>>         title : STRING,
>>>>>>>>>>>>>         jobFunctions : ARRAY<STRUCT<
>>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>>           name : STRING>>,
>>>>>>>>>>>>>         industries : ARRAY<STRUCT<
>>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>>           id : STRING,
>>>>>>>>>>>>>           name : STRING>>,
>>>>>>>>>>>>>         jobType : STRUCT<
>>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>>           name : STRING>,
>>>>>>>>>>>>>         experienceLevel : STRUCT<
>>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>>           name : STRING>>,
>>>>>>>>>>>>>       id : STRING,
>>>>>>>>>>>>>       customerJobCode : STRING,
>>>>>>>>>>>>>       skillsAndExperience : STRING,
>>>>>>>>>>>>>       salary : STRING,
>>>>>>>>>>>>>       jobPoster : STRUCT<
>>>>>>>>>>>>>         id : STRING,
>>>>>>>>>>>>>         firstName : STRING,
>>>>>>>>>>>>>         lastName : STRING,
>>>>>>>>>>>>>         headline : STRING>,
>>>>>>>>>>>>>       referralBonus : STRING,
>>>>>>>>>>>>>       locationDescription : STRING>>>
>>>>>>>>>>>>> )
>>>>>>>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>>>>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>>>>>>>>
>>>>>>>>>>>>> The table creation works fine, but when I attempt to query, I get
>>>>>>>>>>>>> null as the result.
>>>>>>>>>>>>> I tried adding input/output formats and SERDEPROPERTIES; nothing
>>>>>>>>>>>>> seems to make a difference.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am of the opinion that the libraries cannot handle this level of
>>>>>>>>>>>>> nesting and I will probably have to write a custom SerDe or a
>>>>>>>>>>>>> parser myself. I just wanted to seek guidance before I get into
>>>>>>>>>>>>> that. I appreciate your help and guidance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> regards
>>>>>>>>>>>>> Sunita
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Nitin Pawar
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Nitin Pawar
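Stephen's start-small suggestion earlier in the thread could look something like the sketch below: a one-record file with a single level of nesting in its own directory, a table covering just that much of the schema, and then one more nested level added at each step until the null reappears. The table name and path (jobs_tiny, /user/sunita/tables/jobs_tiny) are hypothetical; the SerDe is the same one used above.

-- Step 1: a trivial record such as {"jobs": {"note": "hello"}} placed as a
-- single line in /user/sunita/tables/jobs_tiny (hypothetical path).
create external table jobs_tiny (
  jobs STRUCT<note : STRING>
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/sunita/tables/jobs_tiny';

select jobs.note from jobs_tiny;

-- Step 2: if that comes back non-null, recreate the table and the record
-- with one more level of nesting (for example
-- values : ARRAY<STRUCT<company : STRUCT<id : STRING>>>) and query again,
-- repeating until the null shows up.

The step at which the null first appears identifies the construct the SerDe cannot handle, and that is the point at which a different SerDe, a custom one, or flattening the JSON before loading becomes worth the effort.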