Thanks for looking into it, Ramki. Yes, I had tried these options. Here is what I get (I renamed the table to have a meaningful name):
hive> select jobs.values[1].id from linkedinjobsearch;
......mapreduce task details....
OK
NULL
Time taken: 9.586 seconds

hive> select jobs.values[0].position.title from linkedinjobsearch;
Total MapReduce jobs = 1
Launching Job 1 out of 1
OK
NULL
Time taken: 9.617 seconds

I am trying to connect btrace to the process to be able to trace the code, but I can't get it to respond. Here is what I tried:

[sunita@node01 ~]$ hive --debug, recursive=y, port=7000,mainSuspend=y, childSuspend=y
ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options.
Error occurred during initialization of VM
agent library failed to init: jdwp

I tried changing the port as well. Any ideas regarding debuggers that can be used? I also tried EXPLAIN on the query, and that does not show any issues either.

regards
Sunita

On Wed, Jun 19, 2013 at 12:11 PM, Ramki Palle <ramki.pa...@gmail.com> wrote:

> Can you run some other queries from the jobs1 table and see if any query
> returns some data?
>
> I am guessing your query "select jobs.values.position.title from jobs1;"
> may have an issue. Maybe it should be:
>
> select jobs.values[0].position.title from jobs1;
>
> Regards,
> Ramki.
>
> On Wed, Jun 19, 2013 at 8:24 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>
>> Thanks Stephen,
>>
>> That's just what I tried with the try_parsed table. It is exactly the same
>> data with less nesting in the structure and a smaller number of entries.
>> Do you mean to say that highly nested JSONs can lead to issues? What are
>> typical solutions to such issues? Write UDFs in Hive or parse the JSON
>> into a delimited file?
>> I have heard of custom SerDes as well. I am not sure if UDFs and custom
>> SerDes are one and the same.
>>
>> regards
>> Sunita
>>
>> On Wed, Jun 19, 2013 at 10:38 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>>
>>> I think you might have to start small here instead of going for the home
>>> run on the first swing. When all else fails, start with a trivial JSON
>>> object, build up from there, and see which additional step breaks it.
>>> That way, if the trivial example fails, you know it is something
>>> fundamental and not the complexity of your JSON object that's throwing
>>> things off.
>>>
>>> On Wed, Jun 19, 2013 at 4:34 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>
>>>> Thanks for sharing your experience, Richa.
>>>> I do have timestamps, but in the format of year : INT, day : INT,
>>>> month : INT.
>>>> As per your suggestion, I changed them all to STRING, but I still get
>>>> null as the output.
>>>>
>>>> regards
>>>> Sunita
>>>>
>>>> On Wed, Jun 19, 2013 at 2:17 AM, Richa Sharma <mailtorichasha...@gmail.com> wrote:
>>>>
>>>>> Do you have any timestamp fields in the table that might contain null
>>>>> values?
>>>>>
>>>>> I faced a similar situation some time back - changing the data type to
>>>>> string made it work.
>>>>>
>>>>> But I was working on delimited text files.
>>>>> Not sure if it applies to JSON, but it's still worth giving it a try!
>>>>>
>>>>> Richa
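Once the SerDe does populate the jobs struct, indexing single array elements as suggested above is one option; a lateral view over the array returns the title of every posting in one pass. Below is a minimal sketch, assuming the renamed table linkedinjobsearch and the column layout from the create table statement at the bottom of this thread:

-- One output row per element of the nested array
-- (assumes the SerDe actually fills in jobs.values, which is still the open question here).
SELECT v.position.title,
       v.company.name
FROM linkedinjobsearch
LATERAL VIEW explode(jobs.values) jobs_values AS v;

If this also comes back empty while the raw file clearly contains data, that points at the SerDe or the declared schema rather than at the query syntax.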
>>>>> On Wed, Jun 19, 2013 at 7:28 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>
>>>>>> Having a column name that is the same as the table name is a problem;
>>>>>> because of it I was not able to reference jobs.values.id from jobs.
>>>>>> Changing the table name to jobs1 resolved the semantic error.
>>>>>> However, the query still returns null.
>>>>>>
>>>>>> hive> select jobs.values.position.title from jobs1;
>>>>>> Total MapReduce jobs = 1
>>>>>> Launching Job 1 out of 1
>>>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>>>> Starting Job = job_201306080116_0036, Tracking URL =
>>>>>> http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036
>>>>>> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306080116_0036
>>>>>> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
>>>>>> 2013-06-18 18:55:52,381 Stage-1 map = 0%, reduce = 0%
>>>>>> 2013-06-18 18:55:56,394 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
>>>>>> 2013-06-18 18:55:57,400 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
>>>>>> 2013-06-18 18:55:58,407 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.88 sec
>>>>>> MapReduce Total cumulative CPU time: 880 msec
>>>>>> Ended Job = job_201306080116_0036
>>>>>> MapReduce Jobs Launched:
>>>>>> Job 0: Map: 1   Cumulative CPU: 0.88 sec   HDFS Read: 35374   HDFS Write: 3   SUCCESS
>>>>>> Total MapReduce CPU Time Spent: 880 msec
>>>>>> OK
>>>>>> null
>>>>>> Time taken: 9.591 seconds
>>>>>>
>>>>>> regards
>>>>>> Sunita
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>
>>>>>>> Ok.
>>>>>>> The data files are quite small, around 35 KB and 1 KB each.
>>>>>>>
>>>>>>> [sunita@node01 tables]$ hadoop fs -ls /user/sunita/tables/jobs
>>>>>>> Found 1 items
>>>>>>> -rw-r--r--   3 sunita hdfs   35172 2013-06-18 18:31 /user/sunita/tables/jobs/jobs_noSite_parsed.json
>>>>>>>
>>>>>>> [sunita@node01 tables]$ hadoop fs -text /user/sunita/tables/jobs/jobs_noSite_parsed.json | more
>>>>>>> {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values": [{"company":
>>>>>>> {"id": 21836, "name": "CyberCoders"}, "postingDate": {"year": 2013, "day": 10,
>>>>>>> "month": 6}, "descriptionSnippet": "Software Engineer-Hadoop, HDFS, HBase, Pig-
>>>>>>> Vertica Analytics Senior Hadoop Engineer - Skills Required - Hadoop, HDFS,
>>>>>>> HBase, Pig, SQL, Industrial Software Development, System Integration, Java,
>>>>>>> high performance, multi-threading, VerticaWe are a well known consumer product
>>>>>>> development company and we are looking to add a Hadoop Engineer to our
>>>>>>> Engineering team. You will be working with the latest ", "expirationDate": {"year": 2013, "day"
>>>>>>>
>>>>>>> It's a single line, so I used 'more' rather than 'head'. But effectively,
>>>>>>> the file exists and has the data.
>>>>>>>
>>>>>>> regards
>>>>>>> Sunita
>>>>>>>
>>>>>>> On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>>>
>>>>>>>> As Nitin alluded to, it's best to confirm the data is definitely in
>>>>>>>> HDFS using HDFS semantics rather than Hive as the first step.
>>>>>>>>
>>>>>>>> 1. How big is it?  hadoop fs -ls <your hdfs dir>
>>>>>>>> 2. Cat a bit of it and see if anything is there:  hadoop fs -text <your hdfs dir>/<filename> | head -10
>>>>>>>>
>>>>>>>> Do you see any data from step #2?
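A quick way to narrow down where the null comes from is to check whether the array inside the struct gets populated at all. Here is a minimal sketch against the jobs1 table used above; size() is a Hive built-in that returns -1 when its argument is NULL, so the result distinguishes "the struct was never filled in" from "the field path is wrong". It may also be worth noting that the raw file shows _total, _count and _start inside the jobs object while the table definition later in this thread declares only values; whether the SerDe tolerates undeclared fields depends on the SerDe.

-- -1 means the array (and very likely the whole struct) is NULL;
-- 20 would match the "_count" visible in the raw file.
SELECT size(jobs.values) FROM jobs1;

-- How many records does the table scan actually see?
SELECT count(*) FROM jobs1;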
>>>>>>>> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I ran some complex queries. Something to the effect of
>>>>>>>>>
>>>>>>>>> select jobs from jobs;
>>>>>>>>>
>>>>>>>>> which triggers MapReduce jobs, does not show errors, and produces the
>>>>>>>>> same output, "null". If I try referencing the struct elements, I get
>>>>>>>>> an error, which seems to be the root cause.
>>>>>>>>>
>>>>>>>>> Attached are the select statement outputs with the corresponding
>>>>>>>>> hive logs.
>>>>>>>>>
>>>>>>>>> I have also attached my usage details of another table, try_parsed,
>>>>>>>>> which has a subset of the same data and seems to work fine.
>>>>>>>>> Also attached is the input file for this table, try_parsed.json.
>>>>>>>>> Thanks for your help
>>>>>>>>>
>>>>>>>>> Sunita
>>>>>>>>>
>>>>>>>>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Can you run a little more complex query?
>>>>>>>>>>
>>>>>>>>>> Select unique values across columns or do some math, so we know what
>>>>>>>>>> happens when it fires up a MapReduce job.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for responding, Nitin. Yes, I am sure that the SerDe is
>>>>>>>>>>> working fine and the JSON file is being picked up, based on all the
>>>>>>>>>>> errors that showed up until this stage. What sort of error are you
>>>>>>>>>>> suspecting - the file not being present, or the SerDe not parsing it?
>>>>>>>>>>>
>>>>>>>>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>>>>>>>>
>>>>>>>>>>>> select * from table is as good as hdfs -cat.
>>>>>>>>>>>>
>>>>>>>>>>>> Are you sure there is any data in the table?
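One way to test this point without involving the JSON SerDe at all is to lay a plain text table with a single string column over the same directory and pull a field out with Hive's built-in get_json_object. A minimal sketch; jobs_raw is a hypothetical helper table, and the location is the one shown earlier in the thread:

-- Default text format: each line of the file becomes one string value.
CREATE EXTERNAL TABLE jobs_raw (json STRING)
LOCATION '/user/sunita/tables/jobs';

-- If this returns a title, the data and the location are fine and the
-- problem sits in the SerDe or the declared schema; if it is NULL as well,
-- the problem is upstream of the SerDe.
SELECT get_json_object(json, '$.jobs.values[0].position.title') FROM jobs_raw;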
>>>>>>>>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am able to parse the input JSON file and load it into Hive.
>>>>>>>>>>>>> I do not see any errors with create table, so I am assuming that
>>>>>>>>>>>>> part worked. But when I try to read the data, I get null:
>>>>>>>>>>>>>
>>>>>>>>>>>>> hive> select * from jobs;
>>>>>>>>>>>>> OK
>>>>>>>>>>>>> null
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have validated the JSON with JSONLint and the Notepad++ JSON
>>>>>>>>>>>>> plugin, and it is valid JSON. Here is my create table statement;
>>>>>>>>>>>>> attached is the JSON input file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> create external table jobs (
>>>>>>>>>>>>>   jobs STRUCT<
>>>>>>>>>>>>>     values : ARRAY<STRUCT<
>>>>>>>>>>>>>       company : STRUCT<
>>>>>>>>>>>>>         id : STRING,
>>>>>>>>>>>>>         name : STRING>,
>>>>>>>>>>>>>       postingDate : STRUCT<
>>>>>>>>>>>>>         year : INT,
>>>>>>>>>>>>>         day : INT,
>>>>>>>>>>>>>         month : INT>,
>>>>>>>>>>>>>       descriptionSnippet : STRING,
>>>>>>>>>>>>>       expirationDate : STRUCT<
>>>>>>>>>>>>>         year : INT,
>>>>>>>>>>>>>         day : INT,
>>>>>>>>>>>>>         month : INT>,
>>>>>>>>>>>>>       position : STRUCT<
>>>>>>>>>>>>>         title : STRING,
>>>>>>>>>>>>>         jobFunctions : ARRAY<STRUCT<
>>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>>           name : STRING>>,
>>>>>>>>>>>>>         industries : ARRAY<STRUCT<
>>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>>           id : STRING,
>>>>>>>>>>>>>           name : STRING>>,
>>>>>>>>>>>>>         jobType : STRUCT<
>>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>>           name : STRING>,
>>>>>>>>>>>>>         experienceLevel : STRUCT<
>>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>>           name : STRING>>,
>>>>>>>>>>>>>       id : STRING,
>>>>>>>>>>>>>       customerJobCode : STRING,
>>>>>>>>>>>>>       skillsAndExperience : STRING,
>>>>>>>>>>>>>       salary : STRING,
>>>>>>>>>>>>>       jobPoster : STRUCT<
>>>>>>>>>>>>>         id : STRING,
>>>>>>>>>>>>>         firstName : STRING,
>>>>>>>>>>>>>         lastName : STRING,
>>>>>>>>>>>>>         headline : STRING>,
>>>>>>>>>>>>>       referralBonus : STRING,
>>>>>>>>>>>>>       locationDescription : STRING>>>
>>>>>>>>>>>>> )
>>>>>>>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>>>>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>>>>>>>>
>>>>>>>>>>>>> The table creation works fine, but when I attempt to query, I get
>>>>>>>>>>>>> null as the result.
>>>>>>>>>>>>> I tried adding input/output formats and SERDEPROPERTIES; nothing
>>>>>>>>>>>>> seems to make a difference.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am of the opinion that the libraries cannot handle this level of
>>>>>>>>>>>>> nesting and I will probably have to write a custom SerDe or a
>>>>>>>>>>>>> parser myself. I just wanted to seek guidance before I get into
>>>>>>>>>>>>> that. I appreciate your help and guidance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> regards
>>>>>>>>>>>>> Sunita
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Nitin Pawar
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Nitin Pawar
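Stephen's start-small suggestion earlier in the thread could look something like the sketch below: a one-record file with a single level of nesting in its own directory, a table covering just that much of the schema, and then one more nested level added at each step until the null reappears. The table name and path (jobs_tiny, /user/sunita/tables/jobs_tiny) are hypothetical; the SerDe is the same one used above.

-- Step 1: a trivial record such as {"jobs": {"note": "hello"}} placed as a
-- single line in /user/sunita/tables/jobs_tiny (hypothetical path).
create external table jobs_tiny (
  jobs STRUCT<note : STRING>
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/sunita/tables/jobs_tiny';

select jobs.note from jobs_tiny;

-- Step 2: if that comes back non-null, recreate the table and the record
-- with one more level of nesting (for example
-- values : ARRAY<STRUCT<company : STRUCT<id : STRING>>>) and query again,
-- repeating until the null shows up.

The step at which the null first appears identifies the construct the SerDe cannot handle, and that is the point at which a different SerDe, a custom one, or flattening the JSON before loading becomes worth the effort.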