Indexes

Peter Marron Mon, 24 Feb 2014 07:00:32 -0800

Hi,

I am using Hadoop 1.0.4 and Hive 0.11.0, but I have tried Hive versions 0.10,
0.11, 0.12 and 0.13 and see the same problems with all of them.


I am trying to create my own indexes. As I've mentioned before (24/1/14), I
have created my own class derived from TableBasedIndexHandler. I copied all
the methods from CompactIndexHandler but added lots of System.out.println
calls so that I could check what was going on. So this is, effectively, an
instrumented copy of CompactIndexHandler. Of course it doesn't work. I have
now built Hive 0.13 from source and investigated, and I would like to discuss
a few points.


1)      SHOW INDEX and SHOW INDEX FORMATTED fail because on line 127 of
MetaDataFormatUtils.java we find this code:


    IndexType indexType = HiveIndex.getIndexTypeByClassName(indexHandlerClass);
    indexColumns.add(indexType.getName());



This code fails with an NPE because the HiveIndex class holds an enum that
includes compact and bitmap indexes only, so getIndexTypeByClassName returns
null for my handler class. This code can easily be fixed with something like:


    IndexType indexType = HiveIndex.getIndexTypeByClassName(indexHandlerClass);
    indexColumns.add((indexType == null) ? "" : indexType.getName());
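
As a minimal, self-contained sketch of that null-safe lookup: DemoIndexType
below is a hypothetical stand-in for Hive's index-type enum, not the real
class, and com.example.MyIndexHandler is a made-up custom handler name. It only
illustrates why an unknown handler class yields null and how the guard avoids
the NPE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hive's index-type enum (assumption, not Hive code).
enum DemoIndexType {
    COMPACT("compact", "org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler"),
    BITMAP("bitmap", "org.apache.hadoop.hive.ql.index.bitmap.BitmapIndexHandler");

    private final String name;
    private final String handlerClassName;

    DemoIndexType(String name, String handlerClassName) {
        this.name = name;
        this.handlerClassName = handlerClassName;
    }

    String getName() {
        return name;
    }

    // Returns null when the handler class is not one of the enum constants;
    // calling getName() on that result is what produces the NPE.
    static DemoIndexType getIndexTypeByClassName(String className) {
        for (DemoIndexType type : values()) {
            if (type.handlerClassName.equals(className)) {
                return type;
            }
        }
        return null;
    }
}

class NullSafeIndexTypeDemo {
    // Mirrors the suggested fix: an unknown handler maps to an empty type name.
    static String typeNameFor(String indexHandlerClass) {
        DemoIndexType indexType = DemoIndexType.getIndexTypeByClassName(indexHandlerClass);
        return (indexType == null) ? "" : indexType.getName();
    }

    public static void main(String[] args) {
        List<String> indexColumns = new ArrayList<>();
        indexColumns.add(typeNameFor("org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler"));
        indexColumns.add(typeNameFor("com.example.MyIndexHandler")); // hypothetical custom handler
        System.out.println(indexColumns);
    }
}
```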





2)      The next problem I run into is that the generateIndexQuery method of
my index class is not being invoked. It's not hard to track this down: in
IndexWhereTaskDispatcher, the method createOperatorRules checks that the index
class name is in a list of supported indexes. It builds that list and puts
only compact and bitmap in it.
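
The effect of that check can be sketched roughly as follows. This is not the
Hive source: SupportedIndexCheckDemo, wouldRewrite and
com.example.MyIndexHandler are hypothetical names, and the set simply models a
hard-coded list containing the compact and bitmap handler classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SupportedIndexCheckDemo {
    // Models a hard-coded list of supported handlers (assumption: only these two).
    static final Set<String> SUPPORTED = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler",
            "org.apache.hadoop.hive.ql.index.bitmap.BitmapIndexHandler")));

    // A handler outside the list is skipped, so its generateIndexQuery
    // method would never be reached.
    static boolean wouldRewrite(String indexHandlerClass) {
        return SUPPORTED.contains(indexHandlerClass);
    }

    public static void main(String[] args) {
        String[] handlers = {
            "org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler",
            "org.apache.hadoop.hive.ql.index.bitmap.BitmapIndexHandler",
            "com.example.MyIndexHandler" // hypothetical custom handler
        };
        for (String handler : handlers) {
            System.out.println(handler + " considered: " + wouldRewrite(handler));
        }
    }
}
```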

In other words, the code seems to be written quite explicitly so that it
supports only bitmap and compact indexes. It would seem that to add any more
indexes you have to build your own custom version of Hive. However, this page
https://cwiki.apache.org/confluence/display/Hive/IndexDev
has this text:
"This document explains the proposed design for adding index support to Hive
(HIVE-417<http://issues.apache.org/jira/browse/HIVE-417>). Indexing is a
standard database technique, but with many possible variations. Rather than
trying to provide a "one-size-fits-all" index implementation, the approach we
are taking is to define indexing in a pluggable manner (related to
StorageHandlers<https://cwiki.apache.org/confluence/display/Hive/StorageHandlers>)
and provide one concrete indexing implementation as a reference, leaving it
open for contributors to plug in other indexing schemes as time goes by."

Surely this implies that end users can plug in their own index
implementations. (Chapter 8 of the Programming Hive book gave me the same
impression.) Is it just me? Have I got the wrong end of the stick? Is the Hive
implementation of indexes supposed to be non-extensible, or is it
fundamentally broken?

I also have another fundamental problem. The reason that I'm doing all this
in the first place is that I want to be able to use my indexes without running
Map/Reduce. I know that I will have to modify Hive quite a lot to do this,
because it currently assumes that indexes can only be used when running
Map/Reduce jobs. The current compact and bitmap index implementations require
a Map/Reduce job, and so I will have to stop them from being used when there
is none. My inclination would be to extend the HiveIndexHandler interface with
another method, boolean requiresMapReduce(), which defaults to true in the
AbstractIndexHandler base class. Would this be viewed as a sensible start?
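
A minimal sketch of the proposed extension, using simplified stand-ins rather
than the real Hive types: DemoIndexHandler, DemoAbstractIndexHandler, the two
concrete handlers and the usable helper are all hypothetical names introduced
for illustration. Only the requiresMapReduce() method and its default of true
come from the proposal above.

```java
// Hypothetical, simplified stand-in for the HiveIndexHandler interface.
interface DemoIndexHandler {
    // Proposed new method: does using this index need a Map/Reduce job?
    boolean requiresMapReduce();
}

// Stand-in for the AbstractIndexHandler base class: defaults to true so that
// existing handlers keep their current behaviour without any changes.
abstract class DemoAbstractIndexHandler implements DemoIndexHandler {
    @Override
    public boolean requiresMapReduce() {
        return true;
    }
}

// Behaves like the existing compact/bitmap handlers: inherits the default.
class DemoCompactLikeHandler extends DemoAbstractIndexHandler {
}

// A custom handler that can be consulted without launching a job.
class DemoLocalLookupHandler extends DemoAbstractIndexHandler {
    @Override
    public boolean requiresMapReduce() {
        return false;
    }
}

class RequiresMapReduceDemo {
    // An index is usable if the query runs Map/Reduce anyway, or if the
    // handler does not need it.
    static boolean usable(DemoIndexHandler handler, boolean queryRunsMapReduce) {
        return queryRunsMapReduce || !handler.requiresMapReduce();
    }

    public static void main(String[] args) {
        System.out.println(usable(new DemoCompactLikeHandler(), false));
        System.out.println(usable(new DemoLocalLookupHandler(), false));
        System.out.println(usable(new DemoCompactLikeHandler(), true));
    }
}
```

Putting the default in the base class means existing handlers need no changes;
only a handler that can work without a job has to override the method.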

I'm only just starting, so I'm not really in a position to submit patches yet,
but I thought it would be sensible to see whether this sort of change is going
to be acceptable.

Regards,

Peter Marron
Senior Developer
Trillium Software, A Harte Hanks Company
Theale Court, 1st Floor, 11-13 High Street
Theale
RG7 5AH
+44 (0) 118 940 7609 office
+44 (0) 118 940 7699 fax
trilliumsoftware.com<http://www.trilliumsoftware.com/> / 
linkedin<http://www.linkedin.com/company/17710> / 
twitter<https://twitter.com/trilliumsw> / 
facebook<http://www.facebook.com/HarteHanks>

