> On 2 Apr 2021, at 03:55, Kadir Ozdemir <ka...@apache.org> wrote:
>
> 1) I was thinking about the bulk load tool
> (https://phoenix.apache.org/bulk_dataload.html). However, in this case, you
> are not interested in bulk loading into the data table and its index but just
> the index table. Now, I see that it would not work for you. You are supposed
> to build a strongly consistent index once when you create the index. I am
> curious why you are so concerned about its performance.

I need a minimal maintenance window on our cluster.

> 2) I thought you wanted to disable WAL only during index rebuild for the
> index table, not all the time. You should be able to still use the ALTER
> TABLE command with the new index design. Please note that in this case you
> would disable WAL for the main table too. Is that what you are looking for?
> If you are willing to disable WAL, then there is no point in using strongly
> consistent indexes because you would lose recently written data if region
> servers crash. By the way, you can use IndexUpgradeTool to downgrade your
> tables to the old design (to replace IndexRegionObserver with Indexer), see
> https://phoenix.apache.org/secondary_indexing.html

I know about the possibility of data loss. But it is not a problem if the main table does not receive mutations during index creation (maintenance window). The old indexes become inconsistent too often, so that is not an option.

> 3) Delete markers will be added each time you run the index create command
> whenever the data table rows have multiple versions and the versions of a row
> have different values for indexed columns.

My table has 1 version per row after major compaction. Also, the main table receives no mutations during index creation.

>> On Thu, Apr 1, 2021 at 3:28 PM Alexander Batyrshin <0x62...@gmail.com> wrote:
>> 1) How can I create an index the old way, via intermediate HFiles?
>>
>> I see a “direct” option for IndexTool, but the description says it is disabled:
>>
>> private static final Option DIRECT_API_OPTION = new Option("direct",
>> "direct", false,
>> "This parameter is deprecated. Direct mode will be used whether it is
>> set or not. Keeping it for backwards compatibility.");
>>
>>
>> 2) On phoenix-4.14.2 (old indexes), disabling WAL for the index table was
>> possible with “ALTER TABLE main_table SET DISABLE_WAL=true”
>> Maybe we can add this feature to 4.16+?
>>
>>
>> 3) My main table has VERSIONS=>1.
>> Anyway, I decided to run a major compaction before the
>> next run and still got Delete mutations.
>>
>> From the table metrics, ~10% of mutations are Deletes
>> <PastedGraphic-1.png>
>>
>> I checked my main table; it has IndexRegionObserver loaded:
>>
>> coprocessor$1 =>
>> '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
>> coprocessor$2 =>
>> '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
>> coprocessor$3 =>
>> '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
>> coprocessor$4 =>
>> '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
>> coprocessor$5 =>
>> '|org.apache.phoenix.hbase.index.IndexRegionObserver|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'
>>
>>
>> By the way, I split the index table into more regions, increased
>> hbase.hregion.memstore.flush.size and hbase.hstore.blockingStoreFiles, and got
>> a ~30% speedup.
>> This is still very slow compared to old index creation.
>>
>>> On 31 Mar 2021, at 02:55, Kadir Ozdemir <ka...@gsuite.cloud.apache.org>
>>> wrote:
>>>
>>> I assume that your base table has several versions for a given row. If so,
>>> creating a consistent index on this base table can be slower than creating
>>> an old design index. This is because the new design creates an index row
>>> for every data table row version. It simply replays the mutations on a row
>>> without updating the data table but makes necessary mutations on the index
>>> table. It does this to make sure that if you use SCN connections to do
>>> point-in-time queries, the index will return correct results. During these
>>> replays, index rows will be deleted if index columns are modified. This is
>>> the reason I think you see delete mutations on the index table.
>>>
>>> 1) Yes
>>> 2) No
>>> 3) No
>>>
>>> It will be a good improvement to have an option to support (3) by just
>>> creating indexes using the last data row versions. Please feel free to
>>> create an improvement Jira for this.
>>>
>>> Did you create your base table using 4.16? If not, have you upgraded it to
>>> the new index design using IndexUpgradeTool? I am asking this to make sure
>>> that your index actually uses the new index design. You can verify this
>>> using the HBase shell by describing the data table and checking if the
>>> IndexRegionObserver coproc is loaded on your base table.
>>>
>>>
>>>> On Tue, Mar 30, 2021 at 3:10 PM Alexander Batyrshin <0x62...@gmail.com>
>>>> wrote:
>>>> I tried on phoenix-4.16.0
>>>>
>>>> > On 31 Mar 2021, at 00:54, Alexander Batyrshin <0x62...@gmail.com> wrote:
>>>> >
>>>> > Hello,
>>>> > I tried to create a new consistent index on a mutable table and found
>>>> > that the IndexTool MapReduce job runs 3-5 times slower than with old
>>>> > indexes on 4.14.2.
>>>> > So I have some questions:
>>>> >
>>>> > 1) Is it possible to create the index the old way, via intermediate
>>>> > HFiles and bulk loading?
>>>> > 2) Is it possible to disable WAL on the HBase index table during
>>>> > creation?
>>>> > 3) My main table has no updates, but I observe Delete mutations on the
>>>> > index table. Is it possible to disable this for the initial index
>>>> > creation?
>>>> >
>>>>
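
To make the replay behaviour described in the thread easier to follow, here is a small toy model in Python. It is only a sketch of the explanation given above (one index row written per data row version, and the previous index row deleted when the indexed value changes), not Phoenix code. The function name replay_index_mutations and the data layout are hypothetical and were chosen purely for illustration.

```python
def replay_index_mutations(row_versions):
    """Toy model of an index rebuild that replays every version of each row.

    row_versions maps a data row key to a list of (timestamp, indexed_value)
    pairs. This layout is an assumption made for illustration only.
    Returns a list of (operation, indexed_value, row_key, timestamp) tuples.
    """
    mutations = []
    for row_key, versions in row_versions.items():
        previous = None
        # Replay versions oldest first, as a rebuild that honours timestamps would.
        for timestamp, value in sorted(versions):
            if previous is not None and previous != value:
                # The indexed column changed, so the old index row is deleted.
                mutations.append(("DELETE", previous, row_key, timestamp))
            # An index row is written for every data row version.
            mutations.append(("PUT", value, row_key, timestamp))
            previous = value
    return mutations


def count_deletes(mutations):
    """Count the delete mutations produced by the toy replay."""
    return sum(1 for op, _, _, _ in mutations if op == "DELETE")
```

In this model a table whose rows each have a single version produces no deletes, while a row with two versions that differ in the indexed column produces one. The model does not attempt to explain deletes observed on a table that really holds one version per row.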