Do you have any timestamp fields in the table that might contain null values?
I faced a similar situation some time back - changing the data type to string made it work. But I was working on delimited text files. Not sure if it applies to JSON, but it's still worth a try!

Richa

On Wed, Jun 19, 2013 at 7:28 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:

> Having a column name that is the same as the table name is a problem, due
> to which I was not able to reference jobs.values.id from jobs. Changing
> the table name to jobs1 resolved the semantic error.
> However, the query still returns null:
>
> hive> select jobs.values.position.title from jobs1;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201306080116_0036, Tracking URL =
> http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036
> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306080116_0036
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2013-06-18 18:55:52,381 Stage-1 map = 0%, reduce = 0%
> 2013-06-18 18:55:56,394 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
> 2013-06-18 18:55:57,400 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
> 2013-06-18 18:55:58,407 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.88 sec
> MapReduce Total cumulative CPU time: 880 msec
> Ended Job = job_201306080116_0036
> MapReduce Jobs Launched:
> Job 0: Map: 1  Cumulative CPU: 0.88 sec  HDFS Read: 35374  HDFS Write: 3  SUCCESS
> Total MapReduce CPU Time Spent: 880 msec
> OK
> null
> Time taken: 9.591 seconds
>
> regards
> Sunita
>
> On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>
>> Ok.
>> The data files are quite small - around 35 KB and 1 KB each.
>>
>> [sunita@node01 tables]$ hadoop fs -ls /user/sunita/tables/jobs
>> Found 1 items
>> -rw-r--r--   3 sunita hdfs   35172 2013-06-18 18:31 /user/sunita/tables/jobs/jobs_noSite_parsed.json
>>
>> [sunita@node01 tables]$ hadoop fs -text /user/sunita/tables/jobs/jobs_noSite_parsed.json | more
>> {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values":
>> [{"company": {"id": 21836, "name": "CyberCoders"}, "postingDate":
>> {"year": 2013, "day": 10, "month": 6}, "descriptionSnippet": "Software
>> Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics Senior Hadoop
>> Engineer - Skills Required - Hadoop, HDFS, HBase, Pig, SQL, Industrial
>> Software Development, System Integration, Java, high performance,
>> multi-threading, VerticaWe are a well known consumer product development
>> company and we are looking to add a Hadoop Engineer to our Engineering
>> team. You will be working with the latest ", "expirationDate": {"year": 2013, "day"
>>
>> It's a single line, so I used 'more' rather than 'head'. But effectively,
>> the file exists and has the data.
>>
>> regards
>> Sunita
>>
>> On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>
>>> As Nitin alluded to, it's best to first confirm the data is definitely
>>> in HDFS using HDFS semantics rather than Hive.
>>>
>>> 1. How big is it?  hadoop fs -ls <your hdfs dir>
>>> 2. Cat a bit of it and see if anything is there.  hadoop fs -text <your hdfs dir>/<filename> | head -10
>>>
>>> Do you see any data from step #2?
>>>
>>> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>
>>>> I ran some complex queries. Something to the extent of
>>>> select jobs from jobs;
>>>> which triggers MapReduce jobs but does not show errors and produces the
>>>> same output, "null". If I try referencing the struct elements, I get an
>>>> error, which seems to be the root cause.
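For anyone debugging a similar setup: the hadoop fs -text check above can be complemented by parsing one record of the file offline with a plain JSON parser and walking the same path the failing Hive query uses (jobs.values.position.title). A minimal Python sketch, using a record abbreviated to the same shape as the sample output above (the title value here is illustrative, not taken from the actual file):

```python
import json

# One abbreviated record in the same shape as jobs_noSite_parsed.json
# (field values trimmed; the title is illustrative).
line = '''{"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values":
 [{"company": {"id": 21836, "name": "CyberCoders"},
   "postingDate": {"year": 2013, "day": 10, "month": 6},
   "position": {"title": "Senior Hadoop Engineer"}}]}}'''

doc = json.loads(line)

# Walk the same path as the Hive query: jobs.values.position.title
titles = [v["position"]["title"] for v in doc["jobs"]["values"]]
print(titles)  # → ['Senior Hadoop Engineer']
```

If this walk fails on real records (a KeyError, or a scalar where a dict is expected), the problem is a data/schema mismatch rather than the SerDe itself.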
>>>>
>>>> Attached are the select statement outputs with the corresponding Hive
>>>> logs.
>>>>
>>>> I have also attached my usage details of another table, try_parsed,
>>>> which has a subset of the same data and seems to work fine. Also
>>>> attached is the input file for this table, try_parsed.json.
>>>>
>>>> Thanks for your help
>>>>
>>>> Sunita
>>>>
>>>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>>>
>>>>> Can you run a little more complex query?
>>>>>
>>>>> Select uniq across columns or do some math, so we know when it fires
>>>>> up a MapReduce job.
>>>>>
>>>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for responding, Nitin. Yes, I am sure that the serde is working
>>>>>> fine and the JSON file is being picked up, based on all the errors
>>>>>> that showed up until this stage. What sort of error are you suspecting -
>>>>>> file not present, or serde not parsing it?
>>>>>>
>>>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>>>
>>>>>>> select * from table is as good as hdfs -cat
>>>>>>>
>>>>>>> Are you sure there is any data in the table?
>>>>>>>
>>>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am able to parse the input JSON file and load it into Hive. I do
>>>>>>>> not see any errors with create table, so I am assuming that part
>>>>>>>> works. But when I try to read the data, I get null:
>>>>>>>>
>>>>>>>> hive> select * from jobs;
>>>>>>>> OK
>>>>>>>> null
>>>>>>>>
>>>>>>>> I have validated the JSON with JSONLint and the Notepad++ JSON
>>>>>>>> plugin, and it is valid JSON. Here is my create table statement;
>>>>>>>> attached is the JSON input file.
>>>>>>>>
>>>>>>>> create external table jobs (
>>>>>>>>   jobs STRUCT<
>>>>>>>>     values : ARRAY<STRUCT<
>>>>>>>>       company : STRUCT<
>>>>>>>>         id : STRING,
>>>>>>>>         name : STRING>,
>>>>>>>>       postingDate : STRUCT<
>>>>>>>>         year : INT,
>>>>>>>>         day : INT,
>>>>>>>>         month : INT>,
>>>>>>>>       descriptionSnippet : STRING,
>>>>>>>>       expirationDate : STRUCT<
>>>>>>>>         year : INT,
>>>>>>>>         day : INT,
>>>>>>>>         month : INT>,
>>>>>>>>       position : STRUCT<
>>>>>>>>         title : STRING,
>>>>>>>>         jobFunctions : ARRAY<STRUCT<
>>>>>>>>           code : STRING,
>>>>>>>>           name : STRING>>,
>>>>>>>>         industries : ARRAY<STRUCT<
>>>>>>>>           code : STRING,
>>>>>>>>           id : STRING,
>>>>>>>>           name : STRING>>,
>>>>>>>>         jobType : STRUCT<
>>>>>>>>           code : STRING,
>>>>>>>>           name : STRING>,
>>>>>>>>         experienceLevel : STRUCT<
>>>>>>>>           code : STRING,
>>>>>>>>           name : STRING>>,
>>>>>>>>       id : STRING,
>>>>>>>>       customerJobCode : STRING,
>>>>>>>>       skillsAndExperience : STRING,
>>>>>>>>       salary : STRING,
>>>>>>>>       jobPoster : STRUCT<
>>>>>>>>         id : STRING,
>>>>>>>>         firstName : STRING,
>>>>>>>>         lastName : STRING,
>>>>>>>>         headline : STRING>,
>>>>>>>>       referralBonus : STRING,
>>>>>>>>       locationDescription : STRING>>>
>>>>>>>> )
>>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>>>
>>>>>>>> The table creation works fine, but when I attempt to query, I get
>>>>>>>> null as the result. I tried adding Input/Output formats and SerDe
>>>>>>>> properties; nothing seems to have any impact.
>>>>>>>>
>>>>>>>> I am of the opinion that the libraries cannot handle this level of
>>>>>>>> nesting, and I will probably have to write a custom serde or a
>>>>>>>> parser myself. I just wanted to seek guidance before I get into
>>>>>>>> that. I appreciate your help and guidance.
>>>>>>>>
>>>>>>>> regards
>>>>>>>> Sunita
>>>>>>>
>>>>>>> --
>>>>>>> Nitin Pawar
>>>>>
>>>>> --
>>>>> Nitin Pawar
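Before writing a custom serde, one quick way to rule out a schema/data mismatch is to check each record offline for the keys the DDL above expects at each nesting level. A minimal Python sketch (the key set is abbreviated to fields referenced in this thread, and the sample record is illustrative - it deliberately omits "id" to show what a reported gap looks like):

```python
import json

# Keys the DDL expects inside each element of jobs.values
# (abbreviated; extend with the remaining DDL fields as needed).
EXPECTED_VALUE_KEYS = {"company", "postingDate", "position", "id"}

def check_record(doc):
    """Return (index, missing-keys) pairs for elements of jobs.values
    that lack any of the expected keys."""
    missing = []
    for i, v in enumerate(doc.get("jobs", {}).get("values", [])):
        gap = EXPECTED_VALUE_KEYS - v.keys()
        if gap:
            missing.append((i, sorted(gap)))
    return missing

# Illustrative record matching the shape of the sample shown in the
# thread; "id" is intentionally absent from the value element.
sample = json.loads('''{"jobs": {"_total": 1812, "values":
 [{"company": {"id": 21836, "name": "CyberCoders"},
   "postingDate": {"year": 2013, "day": 10, "month": 6},
   "position": {"title": "Senior Hadoop Engineer"}}]}}''')

print(check_record(sample))  # → [(0, ['id'])]
```

If every record checks out, the data matches the schema and the null results point at the SerDe or at how the nested columns are referenced, rather than at the file.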