Thanks for sharing your experience, Richa. I do have timestamp fields, but in the format year : INT, day : INT, month : INT. As per your suggestion, I changed them all to STRING, but I still get null as the output.

regards
Sunita
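[For reference, a sketch of the change described above: just the two date structs from the CREATE TABLE quoted at the bottom of this thread, with INT swapped for STRING. The surrounding DDL is unchanged and omitted here.]

    postingDate : STRUCT<
        year : STRING,
        day : STRING,
        month : STRING>,
    ...
    expirationDate : STRUCT<
        year : STRING,
        day : STRING,
        month : STRING>,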
On Wed, Jun 19, 2013 at 2:17 AM, Richa Sharma <richasha...@gmail.com> wrote:

> Do you have any timestamp fields in the table that might contain a null
> value?
>
> I faced a similar situation some time back - changing the data type to
> string made it work.
>
> But I was working on delimited text files.
> Not sure if it applies to JSON, but it's still worth giving a try!
>
> Richa
>
>
> On Wed, Jun 19, 2013 at 7:28 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>
>> Having a column name that is the same as the table name is a problem,
>> due to which I was not able to reference jobs.values.id from jobs.
>> Changing the table name to jobs1 resolved the semantic error.
>> However, the query still returns null:
>>
>> hive> select jobs.values.position.title from jobs1;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_201306080116_0036, Tracking URL =
>> http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036
>> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306080116_0036
>> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
>> 2013-06-18 18:55:52,381 Stage-1 map = 0%, reduce = 0%
>> 2013-06-18 18:55:56,394 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
>> 2013-06-18 18:55:57,400 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
>> 2013-06-18 18:55:58,407 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.88 sec
>> MapReduce Total cumulative CPU time: 880 msec
>> Ended Job = job_201306080116_0036
>> MapReduce Jobs Launched:
>> Job 0: Map: 1  Cumulative CPU: 0.88 sec  HDFS Read: 35374  HDFS Write: 3  SUCCESS
>> Total MapReduce CPU Time Spent: 880 msec
>> OK
>> null
>> Time taken: 9.591 seconds
>>
>> regards
>> Sunita
>>
>>
>> On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>
>>> Ok.
>>> The data files are quite small - around 35 KB and 1 KB each.
>>>
>>> [sunita@node01 tables]$ hadoop fs -ls /user/sunita/tables/jobs
>>> Found 1 items
>>> -rw-r--r--  3 sunita hdfs  35172 2013-06-18 18:31 /user/sunita/tables/jobs/jobs_noSite_parsed.json
>>>
>>> [sunita@node01 tables]$ hadoop fs -text /user/sunita/tables/jobs/jobs_noSite_parsed.json | more
>>> {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values":
>>> [{"company": {"id": 21836, "name": "CyberCoders"}, "postingDate":
>>> {"year": 2013, "day": 10, "month": 6}, "descriptionSnippet": "Software
>>> Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics Senior Hadoop
>>> Engineer - Skills Required - Hadoop, HDFS, HBase, Pig, SQL, Industrial
>>> Software Development, System Integration, Java, high performance,
>>> multi-threading, VerticaWe are a well known consumer product
>>> development company and we are looking to add a Hadoop Engineer to our
>>> Engineering team. You will be working with the latest ",
>>> "expirationDate": {"year": 2013, "day"
>>>
>>> It's a single line, so I used 'more' rather than 'head'. But
>>> effectively, the file exists and has the data.
>>>
>>> regards
>>> Sunita
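[A sketch of a narrower probe than the queries tried in the thread - the table name jobs1 and the field names come from the messages above and the DDL quoted below; nothing else is from the thread. Selecting one scalar at a time can show whether the SerDe fails on the whole record or only on specific nested fields, such as the date structs:]

    hive> select jobs.values[0].company.name from jobs1;
    hive> select jobs.values[0].postingDate.year from jobs1;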
>>>
>>> On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>
>>>> As Nitin alluded to, it's best to confirm the data is definitely in
>>>> hdfs using hdfs semantics rather than hive for the first step.
>>>>
>>>> 1. how big is it?  hadoop fs -ls <your hdfs dir>
>>>> 2. cat a bit of it and see if anything is there.  hadoop fs -text <your hdfs dir>/<filename> | head -10
>>>>
>>>> do you see any data from step #2?
>>>>
>>>>
>>>> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>
>>>>> I ran some complex queries - something to the extent of
>>>>>
>>>>> select jobs from jobs;
>>>>>
>>>>> which triggers map reduce jobs but does not show errors, and produces
>>>>> the same output, "null". If I try referencing the struct elements, I
>>>>> get an error, which seems to be the root cause.
>>>>>
>>>>> Attached are the select statement outputs with the corresponding hive
>>>>> logs.
>>>>>
>>>>> I have also attached my usage details of another table - try_parsed -
>>>>> which has a subset of the same data and seems to work fine. Also
>>>>> attached is the input file for this table - try_parsed.json
>>>>>
>>>>> Thanks for your help
>>>>>
>>>>> Sunita
>>>>>
>>>>>
>>>>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>>>>
>>>>>> can you run a little more complex query?
>>>>>>
>>>>>> select uniq across columns or do some maths, so we know when it
>>>>>> fires up a mapreduce
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for responding, Nitin. Yes, I am sure the serde is working
>>>>>>> fine and the json file is being picked up, based on all the errors
>>>>>>> that showed up till this stage. What sort of error are you
>>>>>>> suspecting - file not present, or serde not parsing it?
>>>>>>>
>>>>>>>
>>>>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>>>>
>>>>>>>> select * from table is as good as hdfs -cat
>>>>>>>>
>>>>>>>> are you sure there is any data in the table?
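[Nitin's point, spelled out as commands; the path and table name are the ones used elsewhere in this thread. A simple select * is served by a fetch that just runs the SerDe over the file, so comparing it against the raw bytes separates an HDFS problem from a deserialization problem - if the first command shows data and the second returns null, the SerDe is dropping the record:]

    hadoop fs -cat /user/sunita/tables/jobs/jobs_noSite_parsed.json | head -c 300
    hive -e 'select * from jobs limit 1;'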
>>>>>>>>
>>>>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am able to parse the input JSON file and load it into hive. I do
>>>>>>>>> not see any errors with create table, so I am assuming that worked.
>>>>>>>>> But when I try to read the data, I get null:
>>>>>>>>>
>>>>>>>>> hive> select * from jobs;
>>>>>>>>> OK
>>>>>>>>> null
>>>>>>>>>
>>>>>>>>> I have validated the JSON with JSONLint and the Notepad++ JSON
>>>>>>>>> plugin, and it is valid JSON. Here is my create table statement,
>>>>>>>>> and attached is the json input file.
>>>>>>>>>
>>>>>>>>> create external table jobs (
>>>>>>>>>   jobs STRUCT<
>>>>>>>>>     values : ARRAY<STRUCT<
>>>>>>>>>       company : STRUCT<
>>>>>>>>>         id : STRING,
>>>>>>>>>         name : STRING>,
>>>>>>>>>       postingDate : STRUCT<
>>>>>>>>>         year : INT,
>>>>>>>>>         day : INT,
>>>>>>>>>         month : INT>,
>>>>>>>>>       descriptionSnippet : STRING,
>>>>>>>>>       expirationDate : STRUCT<
>>>>>>>>>         year : INT,
>>>>>>>>>         day : INT,
>>>>>>>>>         month : INT>,
>>>>>>>>>       position : STRUCT<
>>>>>>>>>         title : STRING,
>>>>>>>>>         jobFunctions : ARRAY<STRUCT<
>>>>>>>>>           code : STRING,
>>>>>>>>>           name : STRING>>,
>>>>>>>>>         industries : ARRAY<STRUCT<
>>>>>>>>>           code : STRING,
>>>>>>>>>           id : STRING,
>>>>>>>>>           name : STRING>>,
>>>>>>>>>         jobType : STRUCT<
>>>>>>>>>           code : STRING,
>>>>>>>>>           name : STRING>,
>>>>>>>>>         experienceLevel : STRUCT<
>>>>>>>>>           code : STRING,
>>>>>>>>>           name : STRING>>,
>>>>>>>>>       id : STRING,
>>>>>>>>>       customerJobCode : STRING,
>>>>>>>>>       skillsAndExperience : STRING,
>>>>>>>>>       salary : STRING,
>>>>>>>>>       jobPoster : STRUCT<
>>>>>>>>>         id : STRING,
>>>>>>>>>         firstName : STRING,
>>>>>>>>>         lastName : STRING,
>>>>>>>>>         headline : STRING>,
>>>>>>>>>       referralBonus : STRING,
>>>>>>>>>       locationDescription : STRING>>>
>>>>>>>>> )
>>>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>>>>
>>>>>>>>> The table creation works fine, but when I attempt to query, I get
>>>>>>>>> null as the result.
>>>>>>>>> I tried adding Input/Output formats and SerDe properties; nothing
>>>>>>>>> seems to have an impact.
>>>>>>>>>
>>>>>>>>> I am of the opinion that the libraries cannot handle this level of
>>>>>>>>> nesting, and I will probably have to write a custom serde or a
>>>>>>>>> parser myself. Just wanted to seek guidance before I get into
>>>>>>>>> that. Appreciate your help and guidance.
>>>>>>>>>
>>>>>>>>> regards
>>>>>>>>> Sunita
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Nitin Pawar
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Nitin Pawar
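[One common fallback before writing a custom SerDe - not something proposed in the thread, and the table name jobs_raw is illustrative: keep each record as a single STRING column and extract fields with Hive's built-in get_json_object, which follows arbitrarily deep JSON paths without any extra SerDe:]

    -- one JSON document per line; with no delimiter in the line,
    -- the whole record lands in the single string column
    create external table jobs_raw (json STRING)
    LOCATION '/user/sunita/tables/jobs';

    -- same field the thread queries through the struct DDL
    select get_json_object(json, '$.jobs.values[0].position.title') from jobs_raw;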