1) How do I create an index the old way, via intermediate HFiles? I see a "direct" option for IndexTool, but the description says it's disabled:
private static final Option DIRECT_API_OPTION = new Option("direct", "direct", false,
    "This parameter is deprecated. Direct mode will be used whether it is set or not. Keeping it for backwards compatibility.");

2) On phoenix-4.14.2 (old indexes), disabling the WAL for the index table was possible with "ALTER TABLE main_table SET DISABLE_WAL=true". Maybe we can add this feature to 4.16+?

3) My main table has VERSIONS=>1. I major-compacted it anyway before the next run and still got Delete mutations. According to the table metrics, ~10% of mutations are Deletes.

I checked my main table; it has IndexRegionObserver loaded:

coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
coprocessor$5 => '|org.apache.phoenix.hbase.index.IndexRegionObserver|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'

By the way, I split the index table into more regions, increased hbase.hregion.memstore.flush.size and hbase.hstore.blockingStoreFiles, and got a ~30% speedup. This is still very slow compared to old index creation.

> On 31 Mar 2021, at 02:55, Kadir Ozdemir <ka...@gsuite.cloud.apache.org> wrote:
>
> I assume that your base table has several versions for a given row. If so,
> creating a consistent index on this base table can be slower than creating an
> old design index. This is because the new design creates an index row for
> every data table row version. It simply replays the mutations on a row
> without updating the data table but makes necessary mutations on the index
> table. It does this to make sure that if you use SCN connections to do
> point-in-time queries, the index will return correct results.
> During these replays, index rows will be deleted if index columns are
> modified. This is the reason I think you see delete mutations on the index
> table.
>
> 1) Yes
> 2) No
> 3) No
>
> It will be a good improvement to have an option to support (3) by just
> creating indexes using the last data row versions. Please feel free to create
> an improvement Jira for this.
>
> Did you create your base table using 4.16? If not, have you upgraded it to
> the new index design using IndexUpgradeTool? I am asking this to make sure
> that your index actually uses the new index design. You can verify this using
> the HBase shell by describing the data table and checking if the
> IndexRegionObserver coproc is loaded on your base table.
>
>
> On Tue, Mar 30, 2021 at 3:10 PM Alexander Batyrshin <0x62...@gmail.com> wrote:
> I tried on phoenix-4.16.0
>
> > On 31 Mar 2021, at 00:54, Alexander Batyrshin <0x62...@gmail.com> wrote:
> >
> > Hello,
> > I tried to create a new consistent index on a mutable table and found out that
> > the IndexTool MapReduce job works 3-5 times slower compared to old indexes on 4.14.2.
> > So I have some questions:
> >
> > 1) Is it possible to create an index the old way, via intermediate HFiles and
> > bulk-loading?
> > 2) Is it possible to disable the WAL on the HBase index table during creation?
> > 3) My main table has no updates, but I observe Delete mutations on the index
> > table. Is it possible to disable this for the initial index creation?