Hi, I am using Hadoop 1.0.4 and Hive 0.11.0, but I have tried Hive versions 0.10, 0.11, 0.12 and 0.13 and see the same problems with all of them.

I am trying to create my own indexes. As I've mentioned before (24/1/14), I have created my own class derived from TableBasedIndexHandler. I copied all the methods from CompactIndexHandler but added lots of System.out.println calls so that I could see what was going on. So this is, effectively, an instrumented copy of CompactIndexHandler. Of course it doesn't work. I have now built Hive 0.13 from source and investigated, and I would like to discuss a few points.

1) SHOW INDEX and SHOW INDEX FORMATTED fail because on line 127 of MetaDataFormatUtils.java we find this code:

    IndexType indexType = HiveIndex.getIndexTypeByClassName(indexHandlerClass);
    indexColumns.add(indexType.getName());

This fails with an NPE because HiveIndex only enumerates the compact and bitmap index types, so the lookup returns null for any other handler class. It can easily be fixed with something like:

    IndexType indexType = HiveIndex.getIndexTypeByClassName(indexHandlerClass);
    indexColumns.add((indexType == null) ? "" : indexType.getName());

2) The next problem I run into is that the generateIndexQuery method of my index class is never invoked. This is not hard to track down. In IndexWhereTaskDispatcher, the createOperatorRules method checks that the index class name is in a list of supported indexes, and it builds that list with only compact and bitmap in it. In other words, the code seems to be written quite explicitly to support bitmap and compact indexes only. It would seem that to add any more index types you have to build your own custom version of Hive.

However, this page, https://cwiki.apache.org/confluence/display/Hive/IndexDev, has the following text:

"This document explains the proposed design for adding index support to Hive (HIVE-417, http://issues.apache.org/jira/browse/HIVE-417). Indexing is a standard database technique, but with many possible variations. Rather than trying to provide a "one-size-fits-all" index implementation, the approach we are taking is to define indexing in a pluggable manner (related to StorageHandlers, https://cwiki.apache.org/confluence/display/Hive/StorageHandlers) and provide one concrete indexing implementation as a reference, leaving it open for contributors to plug in other indexing schemes as time goes by."

Surely this implies that end users can plug in their own index implementations. (Chapter 8 of the Programming Hive book gave me the same impression.) Is it just me? Have I got the wrong end of the stick? Is the Hive implementation of indexes supposed to be non-extensible, or is it fundamentally broken?

I also have another fundamental problem. The reason I'm doing all this in the first place is that I want to be able to use my indexes without running map/reduce. I know that I will have to modify Hive quite a lot to do this, because it currently assumes that indexes can only be used when running map/reduce jobs. The current compact and bitmap index implementations require a map/reduce job, so I will have to stop them from being used when there is no map/reduce job. My inclination would be to extend the HiveIndexHandler interface with another method, boolean requiresMapReduce(), which defaults to true in the AbstractIndexHandler base class. Would this be viewed as a sensible start? I'm only just starting, so I'm not really in a position to submit patches yet, but I thought it would be sensible to find out whether this sort of change is going to be acceptable.

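
To make the requiresMapReduce() idea a bit more concrete, here is a minimal sketch of what I have in mind. It is deliberately self-contained and does not use the real Hive classes: IndexHandlerSketch and AbstractIndexHandlerSketch are hypothetical stand-ins for HiveIndexHandler and AbstractIndexHandler, the two example handlers and the usableHandlers helper are invented for illustration, and only the method name and its default of true come from the proposal above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for HiveIndexHandler; only the proposed method is shown.
interface IndexHandlerSketch {
    boolean requiresMapReduce();
}

// Hypothetical stand-in for AbstractIndexHandler: existing handlers inherit "true".
abstract class AbstractIndexHandlerSketch implements IndexHandlerSketch {
    @Override
    public boolean requiresMapReduce() {
        return true;
    }
}

// Behaves like the existing compact/bitmap handlers: keeps the default.
class CompactLikeHandler extends AbstractIndexHandlerSketch {
}

// An illustrative custom handler that can be consulted without a map/reduce job.
class LocalLookupHandler extends AbstractIndexHandlerSketch {
    @Override
    public boolean requiresMapReduce() {
        return false;
    }
}

class RequiresMapReduceSketch {
    // Keep a handler if the query runs map/reduce anyway,
    // or if the handler does not need map/reduce at all.
    static List<IndexHandlerSketch> usableHandlers(
            List<IndexHandlerSketch> candidates, boolean queryRunsMapReduce) {
        List<IndexHandlerSketch> usable = new ArrayList<IndexHandlerSketch>();
        for (IndexHandlerSketch handler : candidates) {
            if (queryRunsMapReduce || !handler.requiresMapReduce()) {
                usable.add(handler);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        List<IndexHandlerSketch> all = Arrays.<IndexHandlerSketch>asList(
                new CompactLikeHandler(), new LocalLookupHandler());
        System.out.println("with map/reduce: " + usableHandlers(all, true).size());
        System.out.println("without map/reduce: " + usableHandlers(all, false).size());
    }
}
```

The point of defaulting to true in the base class is that existing handlers would not need to change; only a handler that can genuinely work without a job would override it.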
Regards,

Peter Marron
Senior Developer
Trillium Software, A Harte Hanks Company
http://www.trilliumsoftware.com/