1) I was thinking about the bulk load tool (https://phoenix.apache.org/bulk_dataload.html). However, in this case you are not interested in bulk loading into the data table and its index, only into the index table, so I now see that it would not work for you. You are supposed to build a strongly consistent index only once, when you create the index, so I am curious why you are so concerned about its performance.
2) I thought you wanted to disable WAL for the index table only during the index rebuild, not all the time. You should still be able to use the ALTER TABLE command with the new index design. Please note that in this case you would disable WAL for the main table too. Is that what you are looking for? If you are willing to disable WAL, there is no point in using strongly consistent indexes, because you would lose recently written data if region servers crash. By the way, you can use IndexUpgradeTool to downgrade your tables to the old design (replacing IndexRegionObserver with Indexer); see https://phoenix.apache.org/secondary_indexing.html.

3) Delete markers will be added each time you run the index create command whenever the data table rows have multiple versions and the versions of a row have different values for indexed columns.

On Thu, Apr 1, 2021 at 3:28 PM Alexander Batyrshin <0x62...@gmail.com> wrote:

> 1) How can I create an index the old way, via intermediate HFiles?
>
> I see the "direct" option for IndexTool, but its description says it is disabled:
>
>     private static final Option DIRECT_API_OPTION = new Option("direct", "direct", false,
>             "This parameter is deprecated. Direct mode will be used whether it is set or not. Keeping it for backwards compatibility.");
>
> 2) On phoenix-4.14.2 (old indexes), disabling WAL for the index table was
> possible with "ALTER TABLE main_table SET DISABLE_WAL=true".
> Maybe we can add this feature to 4.16+?
>
> 3) My main table has VERSIONS=>1.
> Anyway, I decided to major-compact it before the next run and still got Delete mutations.
>
> From the table metrics, ~10% of mutations are Deletes.
>
> I checked my main table; it has IndexRegionObserver loaded:
>
> coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
> coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
> coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
> coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
> coprocessor$5 => '|org.apache.phoenix.hbase.index.IndexRegionObserver|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'
>
> By the way, I split the index table into more regions and increased
> hbase.hregion.memstore.flush.size and hbase.hstore.blockingStoreFiles, and got
> a ~30% speedup. This is still very slow compared to the old index creation.
>
> On 31 Mar 2021, at 02:55, Kadir Ozdemir <ka...@gsuite.cloud.apache.org> wrote:
>
> I assume that your base table has several versions for a given row. If so,
> creating a consistent index on this base table can be slower than creating
> an old-design index. This is because the new design creates an index row
> for every data table row version. It simply replays the mutations on a row
> without updating the data table, but makes the necessary mutations on the
> index table. It does this to make sure that if you use SCN connections to do
> point-in-time queries, the index will return correct results. During these
> replays, index rows will be deleted if index columns are modified. I think
> this is the reason you see delete mutations on the index table.
>
> 1) Yes
> 2) No
> 3) No
>
> It would be a good improvement to have an option to support (3) by
> creating indexes using only the last data row versions.
> Please feel free to create an improvement Jira for this.
>
> Did you create your base table using 4.16? If not, have you upgraded it to
> the new index design using IndexUpgradeTool? I am asking this to make sure
> that your index actually uses the new index design. You can verify this
> in the HBase shell by describing the data table and checking whether the
> IndexRegionObserver coprocessor is loaded on your base table.
>
> On Tue, Mar 30, 2021 at 3:10 PM Alexander Batyrshin <0x62...@gmail.com> wrote:
>
>> I tried on phoenix-4.16.0
>>
>> > On 31 Mar 2021, at 00:54, Alexander Batyrshin <0x62...@gmail.com> wrote:
>> >
>> > Hello,
>> > I tried to create a new consistent index on a mutable table and found out
>> > that IndexTool MapReduce works 3-5 times slower compared to old indexes on
>> > 4.14.2. So I have some questions:
>> >
>> > 1) Is it possible to create an index the old way, via intermediate HFiles and bulk loading?
>> > 2) Is it possible to disable WAL on the HBase index table during creation?
>> > 3) My main table has no updates, but I observe Delete mutations on the
>> > index table. Is it possible to disable this for the initial index creation?
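To make the replay behavior Kadir describes more concrete, here is a minimal, self-contained Java sketch. It is a simplified model under stated assumptions, not Phoenix's IndexRegionObserver code: the Version and IndexMutation classes, the "value|rowKey" index key layout, and the one-PUT-per-version rule are hypothetical and only illustrate the idea. It replays one row's versions oldest first, emitting a DELETE for the previous index row whenever the indexed value changes, and compares that with building from the last version only, as suggested for (3).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class IndexReplaySketch {

    // One version of a data-table row: its timestamp and the value of its indexed column.
    static final class Version {
        final long ts;
        final String indexedValue;

        Version(long ts, String indexedValue) {
            this.ts = ts;
            this.indexedValue = indexedValue;
        }
    }

    // A simplified index mutation: a PUT or DELETE of one index row key at a timestamp.
    static final class IndexMutation {
        final String type;
        final String indexRowKey;
        final long ts;

        IndexMutation(String type, String indexRowKey, long ts) {
            this.type = type;
            this.indexRowKey = indexRowKey;
            this.ts = ts;
        }
    }

    // Hypothetical index row key layout: indexed value, a separator, then the data row key.
    static String indexKey(String indexedValue, String dataRowKey) {
        return indexedValue + "|" + dataRowKey;
    }

    // Replays all versions of one data row (assumed sorted oldest first): one PUT per
    // version, plus a DELETE of the previous index row whenever the indexed value changes.
    static List<IndexMutation> replayAllVersions(String dataRowKey, List<Version> versions) {
        List<IndexMutation> out = new ArrayList<>();
        String previousKey = null;
        for (Version v : versions) {
            String key = indexKey(v.indexedValue, dataRowKey);
            if (previousKey != null && !previousKey.equals(key)) {
                out.add(new IndexMutation("DELETE", previousKey, v.ts));
            }
            out.add(new IndexMutation("PUT", key, v.ts));
            previousKey = key;
        }
        return out;
    }

    // The "last version only" alternative: a single PUT for the newest version, no DELETEs.
    static List<IndexMutation> buildFromLatestOnly(String dataRowKey, List<Version> versions) {
        if (versions.isEmpty()) {
            return Collections.emptyList();
        }
        Version latest = versions.get(versions.size() - 1);
        return Collections.singletonList(
                new IndexMutation("PUT", indexKey(latest.indexedValue, dataRowKey), latest.ts));
    }

    // Counts mutations of the given type ("PUT" or "DELETE").
    static long count(List<IndexMutation> mutations, String type) {
        return mutations.stream().filter(m -> m.type.equals(type)).count();
    }

    // Three versions of one row; the indexed column changes once (a, a, b).
    static List<Version> sampleVersions() {
        return Arrays.asList(new Version(1, "a"), new Version(2, "a"), new Version(3, "b"));
    }

    public static void main(String[] args) {
        List<Version> versions = sampleVersions();
        List<IndexMutation> replayed = replayAllVersions("r1", versions);
        List<IndexMutation> latestOnly = buildFromLatestOnly("r1", versions);
        System.out.println("replay: " + count(replayed, "PUT") + " PUTs, "
                + count(replayed, "DELETE") + " DELETEs");
        System.out.println("latest only: " + count(latestOnly, "PUT") + " PUTs, "
                + count(latestOnly, "DELETE") + " DELETEs");
    }
}
```

With the sample versions (a, a, b) of row r1, the replay produces three PUTs and one DELETE, while the last-version-only build produces a single PUT. With a single version per row, the replay sketch emits no DELETE, which matches the condition given in point 3 of the reply at the top.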
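Kadir's check (describe the data table in the HBase shell and look for IndexRegionObserver) can also be scripted when there are many tables. The sketch below is a hypothetical helper, not part of Phoenix or HBase: it takes the values printed after `coprocessor$N =>` and assumes they follow the `jarPath|className|priority|args` layout seen in the describe output above. It reports the new design if IndexRegionObserver is present and the old design if Indexer is present; it compares only simple class names, so it does not verify package names.

```java
import java.util.Arrays;
import java.util.List;

public class IndexDesignCheck {

    enum Design { NEW_CONSISTENT, OLD, NONE }

    // Extracts the class name from one coprocessor value as printed by the HBase shell,
    // e.g. '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
    // assumed to be "jarPath|className|priority|args" wrapped in quotes. Returns "" if absent.
    static String coprocessorClass(String value) {
        String s = value.trim();
        if (s.endsWith(",")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        if (s.length() >= 2 && s.startsWith("'") && s.endsWith("'")) {
            s = s.substring(1, s.length() - 1);
        }
        String[] fields = s.split("\\|");
        return fields.length > 1 ? fields[1].trim() : "";
    }

    // Drops the package prefix: "a.b.Indexer" -> "Indexer".
    static String simpleName(String className) {
        int dot = className.lastIndexOf('.');
        return dot >= 0 ? className.substring(dot + 1) : className;
    }

    // Hypothetical helper: IndexRegionObserver => new (consistent) design,
    // Indexer => old design, neither => no Phoenix index coprocessor found.
    static Design indexDesign(List<String> coprocessorValues) {
        Design result = Design.NONE;
        for (String value : coprocessorValues) {
            String name = simpleName(coprocessorClass(value));
            if (name.equals("IndexRegionObserver")) {
                return Design.NEW_CONSISTENT;
            }
            if (name.equals("Indexer")) {
                result = Design.OLD;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two of the coprocessor values from the describe output quoted in the thread.
        List<String> values = Arrays.asList(
                "'|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',",
                "'|org.apache.phoenix.hbase.index.IndexRegionObserver|805306366|"
                        + "org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,"
                        + "index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'");
        System.out.println(indexDesign(values));
    }
}
```

Run against the two values in main, it prints NEW_CONSISTENT, consistent with Alexander's observation that his main table has IndexRegionObserver loaded.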