sorry for the spam. but one correction. json_pp appears to be part of the perl core so if you have a modern perl you may already have it installed.
On Mon, Jul 1, 2013 at 1:31 PM, Stephen Sprague <sprag...@gmail.com> wrote: > ok. so i just learned of a perl script called "json_pp" (for json > pretty_prrint) that is a binary included in the distro for the perl JSON > module. You gotta figure there are analogous tools in other languages as > well but given this is a binary it doesn't matter much what language its > written in - it just works. > > so doing this: $ hive -e 'select <your_json_column> from <your_table>' | > json_pp > > you'd actually be able to see what your json looks like in a human > readable fashion. > > get the binary (and the module) here: https://metacpan.org/module/JSON > > > On Wed, Jun 26, 2013 at 4:38 PM, Sunita Arvind <sunitarv...@gmail.com>wrote: > >> Ok. Thanks Stephen. I will try that out. >> Will update the group if I am able to get this to work. For now, I will >> continue with non-partitioned table. >> >> regards >> Sunita >> >> >> On Wed, Jun 26, 2013 at 7:11 PM, Stephen Sprague <sprag...@gmail.com>wrote: >> >>> it would appear to be that you may partition only by non-nested >>> columns. I would recommend transforming your original dataset into one >>> where the first column is YYYYMM and the rest is your json object. During >>> this transformation you may also wish to make further optimizations as well >>> since you'll be scanning every record. >>> >>> as always my 2 cents only. >>> >>> >>> On Wed, Jun 26, 2013 at 3:47 PM, Sunita Arvind <sunitarv...@gmail.com>wrote: >>> >>>> Hi, >>>> >>>> I am unable to create a partitioned table. >>>> The error I get is: >>>> FAILED: ParseException line 37:16 mismatched input >>>> '"jobs.values.postingDate.year"' expecting Identifier near '(' in column >>>> specification >>>> >>>> I tried referring to the columns in various ways, >>>> S.jobs.values.postingDate.year, with quotes, without quotes, get the same >>>> error. Also tried creating a partition by year alone. Still get the same >>>> error. >>>> >>>> Here is the create table statement: >>>> >>>> create external table linkedin_JobSearch ( >>>> jobs STRUCT< >>>> values : ARRAY<STRUCT< >>>> company : STRUCT< >>>> id : STRING, >>>> name : STRING>, >>>> postingDate : STRUCT< >>>> day : STRING>, >>>> descriptionSnippet : STRING, >>>> expirationDate : STRUCT< >>>> ...... >>>> ....... >>>> locationDescription : STRING>>> >>>> ) >>>> PARTITIONED BY ("jobs.values.postingDate.year" STRING, >>>> "jobs.values.postingDate.month" STRING) >>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' >>>> WITH SERDEPROPERTIES ( >>>> "company"="$.jobs.values.company.name", >>>> "position"="$.jobs.values.position.title", >>>> "customerJobCode"="$.jobs.values.customerJobCode", >>>> "locationDescription"="$.jobs.values.locationDescription", >>>> "jobPoster"="$.jobs.values.jobposter.headline" >>>> ) >>>> LOCATION '/user/sunita/Linkedin/JobSearch'; >>>> >>>> I need to be able to partition this information. Please help. >>>> >>>> regards >>>> Sunita >>>> >>> >>> >> >