Ok. The data files are quite small. Around 35 KB and 1 KB each.

[sunita@node01 tables]$ hadoop fs -ls /user/sunita/tables/jobs
Found 1 items
-rw-r--r--   3 sunita hdfs      35172 2013-06-18 18:31 /user/sunita/tables/jobs/jobs_noSite_parsed.json
[sunita@node01 tables]$ hadoop fs -text /user/sunita/tables/jobs/jobs_noSite_parsed.json | more
{"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values": [{"company": {"id": 21836, "name": "CyberCoders"}, "postingDate": {"year": 2013, "day": 10, "month": 6}, "descriptionSnippet": "Software Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics Senior Hadoop Engineer - Skills Required - Hadoop, HDFS, HBase, Pig, SQL, Industrial Software Development, System Integration, Java, high performance, multi-threading, VerticaWe are a well known consumer product development company and we are looking to add a Hadoop Engineer to our Engineering team. You will be working with the latest ", "expirationDate": {"year": 2013, "day"

It's a single line, so I used 'more' rather than 'head'. But effectively, the file exists and has the data.

regards
Sunita

On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <sprag...@gmail.com> wrote:

> As Nitin alluded to, it's best to confirm the data is definitely in hdfs
> using hdfs semantics rather than hive for the first step.
>
> 1. how big is it?  hadoop fs -ls <your hdfs dir>
> 2. cat a bit of it and see if anything is there.  hadoop fs -text <your
> hdfs dir>/<filename> | head -10
>
> do you see any data from step #2?
>
> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>
>> I ran some more complex queries, to the extent of
>>
>>   select jobs from jobs;
>>
>> which triggers map reduce jobs but does not show errors and produces the
>> same output "null". If I try referencing the struct elements, I get an
>> error, which seems to be the root cause.
>>
>> Attached are the select statement outputs with the corresponding hive
>> logs.
>>
>> I have also attached my usage details of another table - try_parsed -
>> which has a subset of the same data and seems to work fine.
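[A side note on the check above: since the whole document sits on one line, `head -10` and `more` show the same thing. If a copy of the file is available locally, a short script can confirm how many records Hive will actually see - most Hive JSON SerDes read one JSON document per line. This is only a sketch; the function name is my own, not part of the thread:]

```python
import json

def count_json_records(path):
    """Parse each non-empty line of the file as a JSON document and
    return how many there are. A 35 KB file written on a single line
    is one record to a line-oriented SerDe, not twenty."""
    with open(path) as f:
        lines = [ln for ln in f.read().splitlines() if ln.strip()]
    docs = [json.loads(ln) for ln in lines]  # raises ValueError on bad JSON
    return len(docs)
```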
>> Also attached is the input file for this table - try_parsed.json
>>
>> Thanks for your help
>>
>> Sunita
>>
>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>
>>> Can you run a little more complex query?
>>>
>>> Select uniq across columns or do some maths, so we know when it fires
>>> up a mapreduce.
>>>
>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>
>>>> Thanks for responding, Nitin. Yes, I am sure that the serde is working
>>>> fine and the json file is being picked up, based on all the errors that
>>>> showed up until this stage. What sort of error are you suspecting: file
>>>> not present, or serde not parsing it?
>>>>
>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>
>>>>> select * from table is as good as hdfs -cat
>>>>>
>>>>> are you sure there is any data in the table?
>>>>>
>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am able to parse the input JSON file and load it into hive. I do
>>>>>> not see any errors with create table, so I am assuming that part works.
>>>>>> But when I try to read the data, I get null:
>>>>>>
>>>>>> hive> select * from jobs;
>>>>>> OK
>>>>>> null
>>>>>>
>>>>>> I have validated the JSON with JSONLint and the Notepad++ JSON
>>>>>> plugin, and it is valid JSON. Here is my create table statement, and
>>>>>> attached is the json input file.
>>>>>> create external table jobs (
>>>>>>   jobs STRUCT<
>>>>>>     values : ARRAY<STRUCT<
>>>>>>       company : STRUCT<
>>>>>>         id : STRING,
>>>>>>         name : STRING>,
>>>>>>       postingDate : STRUCT<
>>>>>>         year : INT,
>>>>>>         day : INT,
>>>>>>         month : INT>,
>>>>>>       descriptionSnippet : STRING,
>>>>>>       expirationDate : STRUCT<
>>>>>>         year : INT,
>>>>>>         day : INT,
>>>>>>         month : INT>,
>>>>>>       position : STRUCT<
>>>>>>         title : STRING,
>>>>>>         jobFunctions : ARRAY<STRUCT<
>>>>>>           code : STRING,
>>>>>>           name : STRING>>,
>>>>>>         industries : ARRAY<STRUCT<
>>>>>>           code : STRING,
>>>>>>           id : STRING,
>>>>>>           name : STRING>>,
>>>>>>         jobType : STRUCT<
>>>>>>           code : STRING,
>>>>>>           name : STRING>,
>>>>>>         experienceLevel : STRUCT<
>>>>>>           code : STRING,
>>>>>>           name : STRING>>,
>>>>>>       id : STRING,
>>>>>>       customerJobCode : STRING,
>>>>>>       skillsAndExperience : STRING,
>>>>>>       salary : STRING,
>>>>>>       jobPoster : STRUCT<
>>>>>>         id : STRING,
>>>>>>         firstName : STRING,
>>>>>>         lastName : STRING,
>>>>>>         headline : STRING>,
>>>>>>       referralBonus : STRING,
>>>>>>       locationDescription : STRING>>>
>>>>>> )
>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>
>>>>>> The table creation works fine, but when I attempt to query, I get
>>>>>> null as the result.
>>>>>> I tried adding Input/Output formats and SerDe properties; nothing
>>>>>> seems to make a difference.
>>>>>>
>>>>>> I am of the opinion that the libraries cannot handle this level of
>>>>>> nesting, and I will probably have to write a custom serde or a parser
>>>>>> myself. I just wanted to seek guidance before I get into that.
>>>>>> Appreciate your help and guidance.
>>>>>>
>>>>>> regards
>>>>>> Sunita
>>>>>
>>>>> --
>>>>> Nitin Pawar
>>>
>>> --
>>> Nitin Pawar
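[One detail worth noting when comparing this DDL against the sample data shown earlier in the thread: the DDL declares `company.id` as STRING, but the JSON carries it as a bare number (`"id": 21836`). Whether that mismatch alone makes this particular SerDe return NULL for the whole row, I can't say, but it is cheap to check locally before writing a custom serde. A rough sketch - the function name and the two fields it checks are my own choices, not from the thread:]

```python
import json

def find_type_mismatches(doc):
    """Compare a parsed document against two representative fields from
    the Hive DDL above: company.id (declared STRING) and
    postingDate.year (declared INT). Returns a list of
    (path, declared_type, actual_python_type) tuples; extend as needed."""
    problems = []
    for rec in doc.get("jobs", {}).get("values", []):
        cid = rec.get("company", {}).get("id")
        if cid is not None and not isinstance(cid, str):
            problems.append(("jobs.values.company.id", "STRING",
                             type(cid).__name__))
        year = rec.get("postingDate", {}).get("year")
        # bool is a subclass of int in Python, so exclude it explicitly
        if year is not None and not (isinstance(year, int)
                                     and not isinstance(year, bool)):
            problems.append(("jobs.values.postingDate.year", "INT",
                             type(year).__name__))
    return problems

# Fragment of the sample record shown earlier in the thread
sample = json.loads(
    '{"jobs": {"values": [{"company": {"id": 21836, "name": "CyberCoders"},'
    ' "postingDate": {"year": 2013, "day": 10, "month": 6}}]}}')
```

[Running `find_type_mismatches(sample)` flags `jobs.values.company.id` as declared STRING but present as an int, which would be the first thing I'd try to reconcile, either by quoting the value in the data or loosening the declared type.]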