I am still running into an issue where a query that uses an index does not
return all of my data. This is with Hive 0.8.1. I'm not sure where to go from
here and am open to suggestions.

It almost looks as if my upgrade from 0.7.1 to 0.8.1 has some issue, since the
autoindex feature does not seem to work for me either.
For the purpose of this test I kept Hive 0.7.1 as is, installed Hive 0.8.1
into a separate directory, and used a different metastore (in MySQL) for it.
I did this in the hope that I can keep the existing installation unchanged
and still test the newer version. I set HIVE_HOME to the 0.8.1 directory and
put all the jars from its lib directory on the CLASSPATH before invoking hive
in the test.

After I run the test, I get 533 rows when the index is not used and zero rows
when the index is used. Both should return 533 rows.
The corresponding test on an uncompressed table returns 533 rows both with and
without the index.

Each of the four steps was run sequentially, but in a separate Hive session.


1.       Test table created this way:
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec = com.hadoop.compression.lzo.LzopCodec;

create table omnic as select * from omni;


2.       index created this way:
drop index omni_sess on omnic;
SET hive.exec.compress.output=false;
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create index omni_sess on table omnic(session_id) as 'COMPACT' with deferred rebuild in table omnic_sess;
alter index omni_sess on omnic rebuild;


3.       Sample table:

SET hive.exec.compress.output=false;

insert overwrite directory '/user/robert/bobc'
select `_bucketname`,`_offsets`
from omnic_sess a join sampled b on a.session_id=b.session_id
where a.session_id is not null;


4.       Finally the test itself:

SET hive.index.compact.file=/user/robert/bobc;
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite directory 'testnox'
select /*+ mapjoin(b) */ a.session_id,a.hit_epoc_sec
from omnic a join sampled b on a.session_id=b.session_id
where a.session_id is not null;

set hive.optimize.autoindex=true;
set hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
insert overwrite directory 'testidx'
select /*+ mapjoin(b) */ a.session_id,a.hit_epoc_sec
from omnic a join sampled b on a.session_id=b.session_id
where a.session_id is not null;



[hdfs@txn4pchad05 test]$ hadoop fs -text /user/robert/bobc/*|head
12/02/23 21:29:53 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
12/02/23 21:29:53 INFO lzo.LzoCodec: Successfully loaded & initialized 
native-lzo library [hadoop-lzo rev fatal: Not a git repository (or any of the 
parent directories): .git]
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000113_0.lzo122167654122183303122173507122165476122180645122170417122175969122178155
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000070_0.lzo217747089
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000142_0.lzo101758512101751307101723208101755283101737621101712562101734346101729300101717005101719070101726197101746274101740611101732296101710463101753412101743471101721157101748344101715031
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000070_0.lzo217784824
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000113_0.lzo122416609
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000125_0.lzo71150312
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000142_0.lzo101955949101965898101989168101863966101914106101831065101960915101859329101943971101842195101980791101870322101861642101918637102004882101837518101975824101875893101958430101898913101890798101839915101946990101834331101986822101866949101873585101910562101953630101924255101968381101894098101854877101846609101935603101932331101930145101973347101857217101901404101963399101883316102001496102007314101998177101937945101885813101927798101983279101994635101896429101888311101848988101921911101970864101852404101978301101941481101904948101950365101916452101844501101878364101991354101908221101828157101880840
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000104_0.lzo174169190
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000104_0.lzo174222731
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000100_0.lzo99993323
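
The file names and offsets in the listing above appear run together, most
likely because the delimiters Hive writes are non-printing control characters.
The following is a minimal Python sketch for splitting one raw line of that
output into a file name and a list of offsets. It assumes Hive's default text
delimiters (\x01 between columns, \x02 between array items), which is an
assumption and has not been verified against this installation; the function
name parse_index_line is hypothetical.

```python
# Sketch only: split one raw line of the index-lookup output into its parts.
# Assumption: Hive's default text delimiters are in effect, i.e.
#   \x01 (Ctrl-A) between columns and \x02 (Ctrl-B) between array items.
FIELD_SEP = "\x01"
ITEM_SEP = "\x02"


def parse_index_line(line):
    """Return (bucketname, [offset, ...]) for one line of the index file."""
    # Everything before the first \x01 is `_bucketname`; the rest is `_offsets`.
    bucketname, _, raw_offsets = line.rstrip("\n").partition(FIELD_SEP)
    # Offsets are integers separated by \x02; skip empty tokens.
    offsets = [int(tok) for tok in raw_offsets.split(ITEM_SEP) if tok]
    return bucketname, offsets
```

Applying this to each line of `hadoop fs -text /user/robert/bobc/*` would show
whether every file name is followed by distinct offsets, which is what
the index input format needs in order to locate the matching blocks.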

Robert Hamilton
HP.com IT
512.432.8445 office |  robert.hamll...@hp.com
14231 Tandem Blvd | Austin | TX 78728



