Ok.
The data files are quite small: around 35 KB and 1 KB.

[sunita@node01 tables]$ hadoop fs -ls /user/sunita/tables/jobs
Found 1 items
-rw-r--r--   3 sunita hdfs      35172 2013-06-18 18:31 /user/sunita/tables/jobs/jobs_noSite_parsed.json

[sunita@node01 tables]$ hadoop fs -text /user/sunita/tables/jobs/jobs_noSite_parsed.json | more
{"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values": [{"company": {"id": 21836, "name": "CyberCoders"}, "postingDate": {"year": 2013, "day": 10, "month": 6}, "descriptionSnippet": "Software Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics Senior Hadoop Engineer - Skills Required - Hadoop, HDFS, HBase, Pig, SQL, Industrial Software Development, System Integration, Java, high performance, multi-threading, VerticaWe are a well known consumer product development company and we are looking to add a Hadoop Engineer to our Engineering team.  You will be working with the latest ", "expirationDate": {"year": 2013, "day"

It's a single line, so I used 'more' rather than 'head'. But effectively, the
file exists and has the data.

regards
Sunita


On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <sprag...@gmail.com> wrote:

> As Nitin alluded to, it's best to confirm the data is definitely in hdfs
> using hdfs semantics rather than hive for the first step.
>
> 1. how big is it?   hadoop fs -ls <your hdfs dir>
> 2. cat a bit of it and see if anything is there.   hadoop fs -text <your hdfs dir>/<filename> | head -10
>
> do you see any data from step #2?
>
>
>
>
> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <sunitarv...@gmail.com>wrote:
>
>> I ran some complex queries, something to the effect of
>>                     select jobs from jobs;
>> which triggers map-reduce jobs but does not show errors and produces the
>> same output, "null". If I try referencing the struct elements, I get an
>> error, which seems to be the root cause.
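>>
>> For reference, the kind of struct reference that fails looks roughly like
>> this (illustrative only, not the exact statement I ran; the field names are
>> from the create table further down the thread):
>>
>>                     select j.jobs.values[0].company.name from jobs j;  -- example only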
>>
>> Attached are the select statement outputs with the corresponding hive
>> logs.
>>
>> I have also attached my usage details for another table, try_parsed, which
>> has a subset of the same data and seems to work fine. Also attached is the
>> input file for this table, try_parsed.json.
>> Thanks for your help
>>
>> Sunita
>>
>>
>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <nitinpawar...@gmail.com>wrote:
>>
>>> can you run a little more complex query?
>>>
>>> select unique values across columns or do some math, so we know when it
>>> fires up a mapreduce
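>>>
>>> for example, something along these lines (just a sketch, not tested):
>>>
>>> select count(*) from jobs;  -- any aggregation that forces a map-reduce job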
>>>
>>>
>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <sunitarv...@gmail.com>wrote:
>>>
>>>> Thanks for responding, Nitin. Yes, I am sure that the serde is working
>>>> fine and the json file is being picked up, based on all the errors that
>>>> showed up till this stage. What sort of error are you suspecting: the file
>>>> not being present, or the serde not parsing it?
>>>>
>>>>
>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>
>>>>> select * from table is as good as hdfs -cat
>>>>>
>>>>> are you sure there is any data in the table?
>>>>>
>>>>>
>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <sunitarv...@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am able to parse the input JSON file and load it into hive. I do not
>>>>>> see any errors with create table, so I am assuming that worked. But when
>>>>>> I try to read the data, I get null:
>>>>>>
>>>>>> hive> select * from jobs;
>>>>>> OK
>>>>>> null
>>>>>>
>>>>>> I have validated the JSON with JSONLint and the Notepad++ JSON plugin,
>>>>>> and it is valid JSON. Here is my create table statement; the json input
>>>>>> file is attached.
>>>>>>
>>>>>> create external table jobs (
>>>>>>   jobs STRUCT<
>>>>>>     values : ARRAY<STRUCT<
>>>>>>       company : STRUCT<
>>>>>>         id : STRING,
>>>>>>         name : STRING>,
>>>>>>       postingDate : STRUCT<
>>>>>>         year : INT,
>>>>>>         day : INT,
>>>>>>         month : INT>,
>>>>>>       descriptionSnippet : STRING,
>>>>>>       expirationDate : STRUCT<
>>>>>>         year : INT,
>>>>>>         day : INT,
>>>>>>         month : INT>,
>>>>>>       position : STRUCT<
>>>>>>         title : STRING,
>>>>>>         jobFunctions : ARRAY<STRUCT<
>>>>>>           code : STRING,
>>>>>>           name : STRING>>,
>>>>>>         industries : ARRAY<STRUCT<
>>>>>>           code : STRING,
>>>>>>           id : STRING,
>>>>>>           name : STRING>>,
>>>>>>         jobType : STRUCT<
>>>>>>           code : STRING,
>>>>>>           name : STRING>,
>>>>>>         experienceLevel : STRUCT<
>>>>>>           code : STRING,
>>>>>>           name : STRING>>,
>>>>>>       id : STRING,
>>>>>>       customerJobCode : STRING,
>>>>>>       skillsAndExperience : STRING,
>>>>>>       salary : STRING,
>>>>>>       jobPoster : STRUCT<
>>>>>>         id : STRING,
>>>>>>         firstName : STRING,
>>>>>>         lastName : STRING,
>>>>>>         headline : STRING>,
>>>>>>       referralBonus : STRING,
>>>>>>       locationDescription : STRING>>>
>>>>>> )
>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>
>>>>>> The table creation works fine, but when I attempt to query, I get null
>>>>>> as the result. I tried adding input/output formats and serde properties;
>>>>>> nothing seems to have an impact.
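>>>>>>
>>>>>> The variations I tried were roughly of this shape (not the exact DDL,
>>>>>> and the serde property key/value here is only a placeholder):
>>>>>>
>>>>>> create external table jobs ( ...same columns as above... )
>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>> WITH SERDEPROPERTIES ("some.property" = "some.value")
>>>>>> STORED AS TEXTFILE
>>>>>> LOCATION '/user/sunita/tables/jobs';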
>>>>>>
>>>>>> I am of the opinion that the libraries cannot handle this level of
>>>>>> nesting and I probably will have to write a custom serde or a parser
>>>>>> myself. Just wanted to seek guidance before I get into that. Appreciate
>>>>>> your help and guidance.
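>>>>>>
>>>>>> One thing I may try first, before a full custom serde, is loading each
>>>>>> raw JSON line into a single string column and pulling fields out with
>>>>>> Hive's built-in get_json_object. A rough, untested sketch (the table and
>>>>>> column names are made up):
>>>>>>
>>>>>> create external table jobs_raw (json string)
>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>
>>>>>> select get_json_object(json, '$.jobs._total') from jobs_raw;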
>>>>>>
>>>>>> regards
>>>>>> Sunita
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Nitin Pawar
>>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
