Any one has insights on the following issue? Where would be the best place to make the changes? I have tried several approaches but it breaks some other queries.
Basically, the inner query produces some additional columns (from window function) not consumed by the outer query. The query plan says we won’t generate those columns since they are not used (which seems correct), while during the execution, we DO output the additional column to the temp file. That causes the issue when we read the temp file in later. Best regards, Aihua Xu Software Engineer, CCE a...@cloudera.com Begin forwarded message: > From: Aihua Xu <a...@cloudera.com> > Subject: Exception with the window function > Date: January 5, 2015 at 6:51:01 AM PST > To: user@hive.apache.org > > Hi folks, > > I’m working on the HIVE-9228 > (https://issues.apache.org/jira/browse/HIVE-9228) with the following window > query throwing ArrayOutOfBoundExcption. > > select st_fips_cd, zip_cd_5, hh_surr_key > from > ( > select st_fips_cd, zip_cd_5, hh_surr_key, > count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by > st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1, > row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key > asc) as analytic_row_number3 > from hh_agg > where analytic_row_number2 = 1 > ) t; > > > Here is the explain extend output for the query. At the File Sink Operator > of stage 1 below, seems like it should only output 4 columns while the temp > table in fact output one additional column (the value of > CNT_ADVTG_LEN_RSDNC_CD_1). I’m investigating toward such mismatch, but anyone > can confirm and provide additional info that will be helpful. > > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-2 depends on stages: Stage-1 > Stage-0 is a root stage > > STAGE PLANS: > Stage: Stage-1 > Map Reduce > Map Operator Tree: > TableScan > alias: hh_agg > Statistics: Num rows: 33208 Data size: 10361206 Basic stats: > COMPLETE Column stats: NONE > GatherStats: false > Filter Operator > isSamplingPred: false > predicate: (analytic_row_number2 = 1) (type: boolean) > Statistics: Num rows: 16604 Data size: 5180603 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: st_fips_cd (type: string), zip_cd_5 (type: > string), st_fips_cd (type: string), zip_cd_5 (type: string) > sort order: ++++ > Map-reduce partition columns: st_fips_cd (type: string), > zip_cd_5 (type: string) > Statistics: Num rows: 16604 Data size: 5180603 Basic stats: > COMPLETE Column stats: NONE > tag: -1 > value expressions: st_fips_cd (type: string), zip_cd_5 (type: > string), hh_surr_key (type: bigint), advtg_len_rsdnc_cd (type: string) > Path -> Alias: > file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg] > Path -> Partition: > file:/Users/axu/Documents/localDB/23982_debug > Partition > base file name: 23982_debug > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > properties: > COLUMN_STATS_ACCURATE true > EXTERNAL TRUE > bucket_count -1 > columns > st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2 > columns.comments > columns.types > string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int > field.delim , > file.inputformat org.apache.hadoop.mapred.TextInputFormat > file.outputformat > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > location file:/Users/axu/Documents/localDB/23982_debug > name default.hh_agg > numFiles 0 > numRows 0 > rawDataSize 0 > serialization.ddl struct hh_agg { string st_fips_cd, string > zip_cd_5, i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte > hh_pop, string advtg_len_rsdnc_cd, string advtg_home_ownr_cd, string > dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd, string > advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double > zip_mdn_age, i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 > cnt_gend_cd_3, i64 cnt_gend_cd_unk, i64 cnt_advtg_marital_stat_cd_1, i64 > cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk, i64 > cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64 > cnt_nbr_tradeline_3, i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64 > cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7, i64 cnt_nbr_tradeline_8, i64 > cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd, > string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double > zip_mdn_prprty_mkt_val, double zip_avg_home_eqty_amt, double > zip_mdn_home_eqty_amt, string trgt_inc_cd, double > zip_avg_trgt_inc_narrow_band, double zip_mdn_trgt_inc_narrow_band, double > zip_avg_inc_prodc_asset_cd, double zip_mdn_inc_prodc_asset_cd, double > zip_avg_net_wrth_cd, double zip_mdn_net_wrth_cd, string > rylty_trgt_mktg_val_scr_cd, i32 analytic_row_number2} > serialization.format , > serialization.lib > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > totalSize 0 > transient_lastDdlTime 1419367415 > serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > properties: > COLUMN_STATS_ACCURATE true > EXTERNAL TRUE > bucket_count -1 > columns > st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2 > columns.comments > columns.types > string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int > field.delim , > file.inputformat org.apache.hadoop.mapred.TextInputFormat > file.outputformat > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > location file:/Users/axu/Documents/localDB/23982_debug > name default.hh_agg > numFiles 0 > numRows 0 > rawDataSize 0 > serialization.ddl struct hh_agg { string st_fips_cd, string > zip_cd_5, i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte > hh_pop, string advtg_len_rsdnc_cd, string advtg_home_ownr_cd, string > dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd, string > advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double > zip_mdn_age, i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 > cnt_gend_cd_3, i64 cnt_gend_cd_unk, i64 cnt_advtg_marital_stat_cd_1, i64 > cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk, i64 > cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64 > cnt_nbr_tradeline_3, i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64 > cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7, i64 cnt_nbr_tradeline_8, i64 > cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd, > string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double > zip_mdn_prprty_mkt_val, double zip_avg_home_eqty_amt, double > zip_mdn_home_eqty_amt, string trgt_inc_cd, double > zip_avg_trgt_inc_narrow_band, double zip_mdn_trgt_inc_narrow_band, double > zip_avg_inc_prodc_asset_cd, double zip_mdn_inc_prodc_asset_cd, double > zip_avg_net_wrth_cd, double zip_mdn_net_wrth_cd, string > rylty_trgt_mktg_val_scr_cd, i32 analytic_row_number2} > serialization.format , > serialization.lib > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > totalSize 0 > transient_lastDdlTime 1419367415 > serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > name: default.hh_agg > name: default.hh_agg > Truncated Path -> Alias: > file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg] > Needs Tagging: false > Reduce Operator Tree: > Extract > Statistics: Num rows: 16604 Data size: 5180603 Basic stats: > COMPLETE Column stats: NONE > PTF Operator > Statistics: Num rows: 16604 Data size: 5180603 Basic stats: > COMPLETE Column stats: NONE > File Output Operator > compressed: false > GlobalTableId: 0 > directory: > file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003 > NumFilesPerFileSink: 1 > table: > input format: > org.apache.hadoop.mapred.SequenceFileInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > properties: > columns _col0,_col1,_col2,_col6 > columns.types string,string,bigint,string > escape.delim \ > serialization.lib > org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe > serde: > org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe > TotalFiles: 1 > GatherStats: false > MultiFileSpray: false > > Stage: Stage-2 > Map Reduce > Map Operator Tree: > TableScan > GatherStats: false > Reduce Output Operator > key expressions: _col0 (type: string), _col1 (type: string), > _col2 (type: bigint) > sort order: +++ > Map-reduce partition columns: _col0 (type: string), _col1 > (type: string) > Statistics: Num rows: 16604 Data size: 5180603 Basic stats: > COMPLETE Column stats: NONE > tag: -1 > value expressions: _col0 (type: string), _col1 (type: string), > _col2 (type: bigint) > Path -> Alias: > > file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003 > > [file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003] > Path -> Partition: > > file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003 > > Partition > base file name: -mr-10003 > input format: org.apache.hadoop.mapred.SequenceFileInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > properties: > columns _col0,_col1,_col2,_col6 > columns.types string,string,bigint,string > escape.delim \ > serialization.lib > org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe > serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe > > input format: org.apache.hadoop.mapred.SequenceFileInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > properties: > columns _col0,_col1,_col2,_col6 > columns.types string,string,bigint,string > escape.delim \ > serialization.lib > org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe > serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe > Truncated Path -> Alias: > > file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003 > > [file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003] > Needs Tagging: false > Reduce Operator Tree: > Extract > Statistics: Num rows: 16604 Data size: 5180603 Basic stats: > COMPLETE Column stats: NONE > PTF Operator > Statistics: Num rows: 16604 Data size: 5180603 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: string), _col2 (type: string), _col3 > (type: bigint) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 16604 Data size: 5180603 Basic stats: > COMPLETE Column stats: NONE > File Output Operator > compressed: false > GlobalTableId: 0 > directory: > file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001 > NumFilesPerFileSink: 1 > Statistics: Num rows: 16604 Data size: 5180603 Basic stats: > COMPLETE Column stats: NONE > Stats Publishing Key Prefix: > file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001/ > table: > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > properties: > columns _col0,_col1,_col2 > columns.types string:string:bigint > escape.delim \ > hive.serialization.extend.nesting.levels true > serialization.format 1 > serialization.lib > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > TotalFiles: 1 > GatherStats: false > MultiFileSpray: false > > Stage: Stage-0 > Fetch Operator > limit: -1