1) How can I create an index the old way, via intermediate HFiles?

I see a “direct” option for IndexTool, but its description says it no longer has any effect:

private static final Option DIRECT_API_OPTION = new Option("direct", "direct", false,
    "This parameter is deprecated. Direct mode will be used whether it is set or not. Keeping it for backwards compatibility.");


2) On phoenix-4.14.2 (old indexes), the WAL could be disabled for the index table with “ALTER TABLE main_table SET DISABLE_WAL=true”.
Could this feature be added to 4.16+?
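
For context, this is roughly how the statement quoted above can be issued through JDBC. It is only a sketch: the class name, the helper method and the command-line arguments are made up for illustration, the JDBC URL is a placeholder for a real ZooKeeper quorum, and the Phoenix client jar is assumed to be on the classpath when a URL is passed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

class DisableWalSketch {
    // Builds the ALTER TABLE statement quoted above; only the table name
    // and the flag value vary. (Hypothetical helper, not a Phoenix API.)
    static String alterWalSql(String table, boolean disableWal) {
        return "ALTER TABLE " + table + " SET DISABLE_WAL=" + disableWal;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            // No cluster given: only print the statement that would be run.
            System.out.println(alterWalSql("main_table", true));
            return;
        }
        // args[0]: Phoenix JDBC URL of the form jdbc:phoenix:<zookeeper quorum>
        // args[1]: name of the data table
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement()) {
            stmt.execute(alterWalSql(args[1], true));
        }
    }
}
```

Running it without arguments only prints the statement, so it can be checked without touching a cluster.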


3) My main table has VERSIONS=>1. I major-compacted it anyway before the next run and still got Delete mutations.

According to the table metrics, ~10% of mutations are Deletes.
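
To show why these Deletes surprise me, here is a toy model of the replay as I understand it from the explanation quoted below: one index put per data row version, plus one index delete whenever the indexed value changes between consecutive versions. This is plain Java, not Phoenix code; the class, the method and the sample rows are all made up, and the real rebuild logic may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndexReplaySketch {
    // Replays each row's versions (oldest first) of the indexed column and
    // returns {puts, deletes} under the simplified model described above.
    static int[] replay(Map<String, List<String>> versionsByRow) {
        int puts = 0;
        int deletes = 0;
        for (List<String> versions : versionsByRow.values()) {
            String previous = null;
            for (String value : versions) {
                if (previous != null && !previous.equals(value)) {
                    deletes++; // indexed value changed: the old index row is removed
                }
                puts++; // every row version produces an index row
                previous = value;
            }
        }
        return new int[] {puts, deletes};
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("row1", Arrays.asList("a"));           // single version
        table.put("row2", Arrays.asList("a", "b", "b")); // value changes once
        int[] counts = replay(table);
        System.out.println("puts=" + counts[0] + " deletes=" + counts[1]); // puts=4 deletes=1
    }
}
```

Under this model a table where every row has a single version yields zero deletes, which is why ~10% Deletes on a VERSIONS=>1, freshly major-compacted table looks unexpected to me.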


I checked my main table; it has IndexRegionObserver loaded:

coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
coprocessor$5 => '|org.apache.phoenix.hbase.index.IndexRegionObserver|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'


By the way, I split the index table into more regions and increased
hbase.hregion.memstore.flush.size and hbase.hstore.blockingStoreFiles, which
gave a ~30% speedup.
This is still very slow compared to the old index creation.

> On 31 Mar 2021, at 02:55, Kadir Ozdemir <ka...@gsuite.cloud.apache.org> wrote:
> 
> I assume that your base table has several versions for a given row. If so, 
> creating a consistent index on this base table can be slower than creating an 
> old design index. This is because the new design creates an index row for 
> every data table row version.  It simply replays the mutations on a row 
> without updating the data table but makes necessary mutations on the index 
> table. It does this to make sure that if you use SCN connections to do 
> point-in-time queries, the index will return correct results. During these 
> replays, index rows will be deleted if index columns are modified. This is 
> the reason I think you see delete mutations on the index table. 
> 
> 1) Yes
> 2) No
> 3) No
> 
> It will be a good improvement to have an option to support (3) by just 
> creating indexes using the last data row versions. Please feel free to create 
> an improvement Jira for this.
> 
> Did you create your base table using 4.16? If not, have you upgraded it to 
> the new index design using IndexUpgradeTool? I am asking this to make sure 
> that your index actually uses the new index design. You can verify this using 
> the HBase shell by describing the data table and checking if the 
> IndexRegionObserver coproc is loaded on your base table.
> 
> 
> On Tue, Mar 30, 2021 at 3:10 PM Alexander Batyrshin <0x62...@gmail.com> wrote:
> I tried on phoenix-4.16.0
> 
> > On 31 Mar 2021, at 00:54, Alexander Batyrshin <0x62...@gmail.com> wrote:
> > 
> > Hello,
> > I tried to create a new consistent index on a mutable table and found that 
> > IndexTool MapReduce runs 3-5 times slower than with the old indexes on 4.14.2.
> > So I have some questions:
> > 
> > 1) Is it possible to create the index the old way, via intermediate HFiles and 
> > bulk-loading?
> > 2) Is it possible to disable the WAL on the HBase index table during index creation?
> > 3) My main table has no updates, but I observe Delete mutations on the index 
> > table. Is it possible to disable this during initial index creation?
> > 
> 
