It appears that you can partition only by non-nested columns: PARTITIONED BY
expects a plain identifier, which is why the parser rejects the quoted,
dotted name. I would recommend transforming your original dataset into one
where the first column is YYYYMM and the rest is your JSON object. Since
that transformation already scans every record, you may also want to make
further optimizations in the same pass.
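
A rough sketch of that transformation, in case it helps. It assumes each
input document has the jobs.values array from your schema, and that every
posting carries postingDate.year and postingDate.month (your PARTITIONED BY
attempt suggests they exist, but they are elided in the DDL you posted). The
function name and the tab-delimited output are my own choices, not anything
Hive requires.

```python
import json


def to_partitioned_rows(document):
    """Yield one 'YYYYMM<TAB>json' line per job posting in the document.

    Assumes document["jobs"]["values"] is a list of postings, each with a
    postingDate holding year and month (as strings or integers).
    """
    for job in document["jobs"]["values"]:
        posting = job["postingDate"]
        # Zero-pad so that the keys sort correctly, e.g. 2013 + 6 -> "201306".
        yyyymm = "{:04d}{:02d}".format(int(posting["year"]), int(posting["month"]))
        # json.dumps escapes tabs and newlines inside strings, so each
        # posting stays on a single line with exactly one real tab.
        yield yyyymm + "\t" + json.dumps(job, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical sample record, only to show the call.
    sample = {
        "jobs": {
            "values": [
                {
                    "postingDate": {"year": "2013", "month": "6", "day": "26"},
                    "descriptionSnippet": "example",
                }
            ]
        }
    }
    for line in to_partitioned_rows(sample):
        print(line)
```

A staging table with two string columns could read this output, and a
dynamic-partition insert could then populate a table partitioned by a plain
column such as yyyymm. I have not run this against your data.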

As always, my 2 cents only.


On Wed, Jun 26, 2013 at 3:47 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:

> Hi,
>
> I am unable to create a partitioned table.
> The error I get is:
> FAILED: ParseException line 37:16 mismatched input
> '"jobs.values.postingDate.year"' expecting Identifier near '(' in column
> specification
>
> I tried referring to the columns in various ways,
> S.jobs.values.postingDate.year, with quotes, without quotes, get the same
> error. Also tried creating a partition by year alone. Still get the same
> error.
>
> Here is the create table statement:
>
> create external table linkedin_JobSearch (
> jobs STRUCT<
> values : ARRAY<STRUCT<
> company : STRUCT<
> id : STRING,
> name : STRING>,
> postingDate : STRUCT<
> day : STRING>,
> descriptionSnippet : STRING,
> expirationDate : STRUCT<
> ......
> .......
> locationDescription : STRING>>>
> )
> PARTITIONED BY ("jobs.values.postingDate.year" STRING,
> "jobs.values.postingDate.month" STRING)
> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
> WITH SERDEPROPERTIES (
> "company"="$.jobs.values.company.name",
>  "position"="$.jobs.values.position.title",
> "customerJobCode"="$.jobs.values.customerJobCode",
> "locationDescription"="$.jobs.values.locationDescription",
> "jobPoster"="$.jobs.values.jobposter.headline"
> )
> LOCATION '/user/sunita/Linkedin/JobSearch';
>
> I need to be able to partition this information. Please help.
>
> regards
> Sunita
>
