> On 2 Apr 2021, at 03:55, Kadir Ozdemir <ka...@apache.org> wrote:
> 
> 
> 1) I was thinking about the bulk load tool 
> (https://phoenix.apache.org/bulk_dataload.html). However, in this case, you 
> are not interested in bulk loading into the data table and its index but just 
> the index table. Now, I see that it would not work for you. You are supposed 
> to build a strongly consistent index once when you create the index. I am 
> curious why you are so concerned about its performance. 

I need to keep the maintenance window on our cluster as short as possible.

> 2)  I thought you wanted to disable WAL only during index rebuild for the 
> index table, not all the time. You should be able to still use the ALTER 
> TABLE command with the new index design. Please note that in this case you 
> would disable WAL for the main table too.  Is that what you are looking for? 
> If you are willing to disable WAL, then there is no point in using strongly 
> consistent indexes because you would lose recently written data if region 
> servers crash. By the way, you can use IndexUpgradeTool to downgrade your 
> tables to the old design (to replace IndexRegionObserver with Indexer), see 
> https://phoenix.apache.org/secondary_indexing.html

I know about the possibility of data loss. But that is not a problem if the 
main table does not receive mutations during index creation (the maintenance 
window).

Old indexes become inconsistent too often, so that is not an option.
> 
> 3) Delete markers will be added each time you run the index create command 
> whenever the data table rows have multiple versions and the versions of a row 
> have different values for indexed columns.

My table has 1 version per row after major compaction. Also, the main table 
receives no mutations during index creation.
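
To make the expectation concrete, here is a rough Java model of the replay 
described in this thread: every row version yields an index Put, and a Delete 
is issued when the indexed value changes between versions. It is only a sketch 
of that description, not Phoenix code; the class and method names are made up 
for illustration.

```java
import java.util.List;
import java.util.Objects;

/** Simplified model of replaying row versions against an index (hypothetical, not Phoenix code). */
final class IndexReplaySketch {

    /**
     * Replays the indexed-column value of each version of one data row, oldest first.
     * Returns {puts, deletes}: one Put per version, plus one Delete each time the
     * indexed value differs from the previous version's value.
     */
    static int[] replay(List<String> indexedValuesOldestFirst) {
        int puts = 0;
        int deletes = 0;
        String previous = null;
        boolean first = true;
        for (String value : indexedValuesOldestFirst) {
            if (!first && !Objects.equals(previous, value)) {
                deletes++; // the previous index row no longer matches the data row
            }
            puts++;
            previous = value;
            first = false;
        }
        return new int[] {puts, deletes};
    }

    public static void main(String[] args) {
        int[] single = replay(List.of("a"));
        int[] changed = replay(List.of("a", "b", "b"));
        System.out.println("one version:    puts=" + single[0] + " deletes=" + single[1]);
        System.out.println("three versions: puts=" + changed[0] + " deletes=" + changed[1]);
    }
}
```

Under this model a row with a single version never produces a Delete, which is 
why the ~10% Delete mutations on a major-compacted VERSIONS=>1 table look 
unexpected to me.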

>> On Thu, Apr 1, 2021 at 3:28 PM Alexander Batyrshin <0x62...@gmail.com> wrote:
>> 1) How can I create an index the old way, via intermediate HFiles?
>> 
>> I see a “direct” option for IndexTool, but the description says it's disabled:
>> 
>> private static final Option DIRECT_API_OPTION = new Option("direct", "direct", false,
>>     "This parameter is deprecated. Direct mode will be used whether it is "
>>     + "set or not. Keeping it for backwards compatibility.");
>> 
>> 
>> 2) On phoenix-4.14.2 (old indexes), disabling the WAL for the index table 
>> was possible with “ALTER TABLE main_table SET DISABLE_WAL=true”.
>> Maybe we can add this feature to 4.16+?
>> 
>> 
>> 3) My main table has VERSIONS=>1. Anyway, I major-compacted it before the 
>> next run and still got Delete mutations.
>> 
>> According to the table metrics, ~10% of mutations are Deletes:
>> <PastedGraphic-1.png>
>> 
>> I checked my main table, it has loaded IndexRegionObserver:
>> 
>> coprocessor$1 => 
>> '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
>> coprocessor$2 => 
>> '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
>> coprocessor$3 => 
>> '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
>> coprocessor$4 => 
>> '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
>> coprocessor$5 => 
>> '|org.apache.phoenix.hbase.index.IndexRegionObserver|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'
>> 
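
The check above can also be scripted. The sketch below parses coprocessor 
attribute values in the `path|class|priority|args` form shown in the listing 
and reports which index design a table appears to use. The class and method 
names are hypothetical, and the fully qualified name used for the old-design 
Indexer coprocessor is an assumption based on the package of 
IndexRegionObserver.

```java
import java.util.List;

/** Hypothetical helper: classifies a table by the coprocessors listed in its descriptor. */
final class CoprocessorCheck {
    static final String NEW_DESIGN = "org.apache.phoenix.hbase.index.IndexRegionObserver";
    // Assumed class name for the old design; verify against your Phoenix version.
    static final String OLD_DESIGN = "org.apache.phoenix.hbase.index.Indexer";

    /** Extracts the class name from a "path|class|priority|args" attribute value. */
    static String className(String spec) {
        String[] parts = spec.split("\\|", -1);
        return parts.length > 1 ? parts[1].trim() : "";
    }

    /** Returns "new", "old", or "none" depending on which index coprocessor is listed. */
    static String indexDesign(List<String> specs) {
        for (String spec : specs) {
            String name = className(spec);
            if (NEW_DESIGN.equals(name)) {
                return "new";
            }
            if (OLD_DESIGN.equals(name)) {
                return "old";
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        List<String> specs = List.of(
            "|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|",
            "|org.apache.phoenix.hbase.index.IndexRegionObserver|805306366|"
                + "index.builder=org.apache.phoenix.index.PhoenixIndexBuilder");
        System.out.println("index design: " + indexDesign(specs));
    }
}
```

The attribute values would be copied from the `describe` output in the HBase 
shell; this helper does not talk to the cluster itself.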
>> 
>> By the way, I split the index table into more regions and increased 
>> hbase.hregion.memstore.flush.size and hbase.hstore.blockingStoreFiles, and 
>> got a ~30% speedup.
>> This is still very slow compared to old index creation.
>> 
>>> On 31 Mar 2021, at 02:55, Kadir Ozdemir <ka...@gsuite.cloud.apache.org> 
>>> wrote:
>>> 
>>> I assume that your base table has several versions for a given row. If so, 
>>> creating a consistent index on this base table can be slower than creating 
>>> an old design index. This is because the new design creates an index row 
>>> for every data table row version.  It simply replays the mutations on a row 
>>> without updating the data table but makes necessary mutations on the index 
>>> table. It does this to make sure that if you use SCN connections to do 
>>> point-in-time queries, the index will return correct results. During these 
>>> replays, index rows will be deleted if index columns are modified. This is 
>>> the reason I think you see delete mutations on the index table. 
>>> 
>>> 1) Yes
>>> 2) No
>>> 3) No
>>> 
>>> It will be a good improvement to have an option to support (3) by just 
>>> creating indexes using the last data row versions. Please feel free to 
>>> create an improvement Jira for this.
>>> 
>>> Did you create your base table using 4.16? If not, have you upgraded it to 
>>> the new index design using IndexUpgradeTool? I am asking this to make sure 
>>> that your index actually uses the new index design. You can verify this 
>>> using the HBase shell by describing the data table and checking if the 
>>> IndexRegionObserver coproc is loaded on your  base table.
>>>   
>>> 
>>>> On Tue, Mar 30, 2021 at 3:10 PM Alexander Batyrshin <0x62...@gmail.com> 
>>>> wrote:
>>>> I tried on phoenix-4.16.0
>>>> 
>>>> > On 31 Mar 2021, at 00:54, Alexander Batyrshin <0x62...@gmail.com> wrote:
>>>> > 
>>>> > Hello,
>>>> > I tried to create a new consistent index on a mutable table and found 
>>>> > that the IndexTool MapReduce job runs 3-5 times slower than with old 
>>>> > indexes on 4.14.2.
>>>> > So I have some questions:
>>>> > 
>>>> > 1) Is it possible to create the index the old way, via intermediate 
>>>> > HFiles and bulk loading?
>>>> > 2) Is it possible to disable the WAL on the HBase index table during 
>>>> > index creation?
>>>> > 3) My main table has no updates, but I observe Delete mutations on the 
>>>> > index table. Is it possible to disable this during initial index creation?
>>>> > 
>>>> 
>> 
