Kirill.

«Count of segments» is a very internal thing for a regular user.
A regular user doesn't want to know about such things.

You suggest calculating the number (the space required to store the WAL) with 
a rough estimate, whereas with «Count of bytes written in WAL» we get the 
exact number without any assumptions or calculations.

Moreover, «Count of bytes written in WAL» is independent of the internal WAL 
implementation.

So, I think an exact number is always better to have than an approximation.
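
To make the comparison concrete, here is a minimal sketch in plain Java. It does not call any Ignite API. The segment indexes (0, 30, 55, 99) are the ones from the example quoted below; the 64 MB segment size, the daily «bytes written» samples, and the class and method names are assumptions made up for illustration only.

```java
public class WalCapacityEstimate {
    /** Largest increase between consecutive samples of a cumulative counter. */
    public static long maxDailyDelta(long[] samples) {
        long max = 0;
        for (int i = 1; i < samples.length; i++)
            max = Math.max(max, samples[i] - samples[i - 1]);
        return max;
    }

    /** Rough estimate: peak daily segment count multiplied by the segment size. */
    public static long estimateBySegments(long[] segmentIdx, long segmentSize) {
        return maxDailyDelta(segmentIdx) * segmentSize;
    }

    public static void main(String[] args) {
        // Assumption: 64 MB, standing in for DataStorageConfiguration#getWalSegmentSize.
        long segmentSize = 64L * 1024 * 1024;

        // Last archived segment index at startup and at the end of days 1..3.
        long[] segmentIdx = {0, 30, 55, 99};

        // Hypothetical cumulative «bytes written in WAL» samples at the same moments.
        long[] bytesWritten = {0L, 1_700_000_000L, 3_100_000_000L, 5_600_000_000L};

        // Segment-based number is an upper bound: segments may be rolled over before they are full.
        System.out.println("By segments: " + estimateBySegments(segmentIdx, segmentSize));

        // Byte-based number is the measured peak daily volume itself.
        System.out.println("By bytes:    " + maxDailyDelta(bytesWritten));
    }
}
```

With these sample values the segment-based figure comes out larger than the byte-based one, which is the gap between an approximation and a measured number.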

What do you think?


> On 15 Feb 2021, at 20:45, Kirill Tkalenko <tkalkir...@yandex.ru> wrote:
> 
> Hi, Nikolay!
> 
> We configure the number of segments in the working directory and we also 
> delete by segment, so this seems to be a matter of usability. I would prefer 
> to stay with my own version: it is a simple metric that does no harm, and 
> more can be added as needed.
> 
> 15.02.2021, 17:10, "Nikolay Izhikov" <nizhi...@apache.org>:
>> My point is that «count of files» is a meaningless number.
>> And «count of bytes written to the files» is a useful number to know and 
>> use for capacity planning.
>> 
>>>  On 15 Feb 2021, at 15:59, Kirill Tkalenko <tkalkir...@yandex.ru> wrote:
>>> 
>>>  Hi, Nikolay!
>>> 
>>>  There may be a number (count of segments * segment size) or there may be a 
>>> count of segments, whichever is more convenient for the user.
>>> 
>>>  15.02.2021, 13:14, "Nikolay Izhikov" <nizhi...@apache.org>:
>>>>  Hello, Kirill.
>>>> 
>>>>  Thanks for the answers.
>>>>  Now, I understand your intentions.
>>>> 
>>>>>   It also seems that it will be more natural to operate not just bytes but 
>>>>> multiples of a segment.
>>>> 
>>>>  Can’t agree here.
>>>>  From my point of view, it’s better to know the exact number, not just the 
>>>> «count of segments».
>>>> 
>>>>>   On 15 Feb 2021, at 13:00, Kirill Tkalenko <tkalkir...@yandex.ru> 
>>>>> wrote:
>>>>> 
>>>>>   Hello, Nikolay!
>>>>> 
>>>>>   The period of one day (24h) seems the most natural; you can take more 
>>>>> or less. I think one day may not be enough, and it is worth collecting 
>>>>> the metric over several days, for example a week. Yes, the total size of 
>>>>> the segments may not equal 
>>>>> DataStorageConfiguration#getMaxWalArchiveSize, but for capacity planning 
>>>>> accuracy is not so important to us, since the load can always change. It 
>>>>> will hurt users more if the archive overflows and the node cannot start. 
>>>>> So more is better than less. It also seems that it will be more natural 
>>>>> to operate not just bytes but multiples of a segment.
>>>>> 
>>>>>   The metrics you propose for page memory and index estimates can be 
>>>>> discussed in separate threads.
>>>>> 
>>>>>   14.02.2021, 11:54, "Nikolay Izhikov" <nizhi...@apache.org>:
>>>>>>   Hello, Kirill
>>>>>> 
>>>>>>   Your conclusions are still not clear to me.
>>>>>> 
>>>>>>>     It is not possible for us to estimate how much space a user will 
>>>>>>> need in the archive so as not to overflow it under its load
>>>>>>>     We take the maximum 44 and multiply it by a 
>>>>>>> DataStorageConfiguration#getWalSegmentSize
>>>>>> 
>>>>>>   Why do you take a single day (24h) as the standard period? Is there 
>>>>>> any rationale behind this?
>>>>>> 
>>>>>>   1. We have the `walAutoArchiveAfterInactivity` property, so a WAL 
>>>>>> segment can be smaller than the maximum size.
>>>>>>   2. For the CDC feature I want to introduce a «WAL force rollover 
>>>>>> timeout» to make data available to a consumer within a guaranteed 
>>>>>> period [1].
>>>>>> 
>>>>>>   Why does the user want to estimate those numbers in the first place?
>>>>>>   Are we talking about some kind of capacity planning?
>>>>>> 
>>>>>>   If yes, then maybe it would be better to have a metric for the count 
>>>>>> of bytes written to the WAL?
>>>>>>   With it, we will have the exact amount of space we need for the WAL.
>>>>>> 
>>>>>>   How should the user estimate capacity for page memory and indexes?
>>>>>> 
>>>>>>   [1] https://issues.apache.org/jira/browse/IGNITE-13582
>>>>>> 
>>>>>>>    On 14 Feb 2021, at 09:48, Kirill Tkalenko <tkalkir...@yandex.ru> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>    Hi, Nikolay!
>>>>>>> 
>>>>>>>    The user will be able to read getLastArchivedSegmentIndex every 
>>>>>>> day, record it, and repeat this for, say, several days.
>>>>>>> 
>>>>>>>    For example, when the application starts, 
>>>>>>> getLastArchivedSegmentIndex is 0; at the end of the first day the 
>>>>>>> value is 30, at the end of the second 55, and at the end of the 
>>>>>>> third 99.
>>>>>>>    It turns out that 30 segments were used on the first day, 25 on 
>>>>>>> the second and 44 on the third. We take the maximum, 44, and multiply 
>>>>>>> it by DataStorageConfiguration#getWalSegmentSize, and we get a 
>>>>>>> possible maximum at which archive overflow is least likely. If the 
>>>>>>> user uses compression, the result can be reduced accordingly 
>>>>>>> (result * getMaxSizeCompressedArchivedSegment).
>>>>>>> 
>>>>>>>    13.02.2021, 10:47, "Nikolay Izhikov" <nizhi...@apache.org>:
>>>>>>>>    Hello, Kirill.
>>>>>>>> 
>>>>>>>>>     It is not possible for us to estimate how much space a user will 
>>>>>>>>> need in the archive so as not to overflow it under its load
>>>>>>>> 
>>>>>>>>    It is still not clear to me why we need those metrics.
>>>>>>>>    Can you please write down a specific scenario: how will the user 
>>>>>>>> use these metrics to estimate the required WAL volume?
>>>>>>>> 
>>>>>>>>>     On 12 Feb 2021, at 19:35, Kirill Tkalenko <tkalkir...@yandex.ru> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>     Hi, Nikolay!
>>>>>>>>> 
>>>>>>>>>     It is not possible for us to estimate how much space a user will 
>>>>>>>>> need in the archive so as not to overflow it under its load. And the 
>>>>>>>>> proposed metrics will allow you to make a rough estimate.
>>>>>>>>> 
>>>>>>>>>     12.02.2021, 17:23, "Nikolay Izhikov" <nizhi...@apache.org>:
>>>>>>>>>>     Hello, Kirill.
>>>>>>>>>> 
>>>>>>>>>>     Can you please clarify: what question about the WAL does the 
>>>>>>>>>> user have in mind?
>>>>>>>>>>     And what answers does he (or she) get with these new metrics?
>>>>>>>>>> 
>>>>>>>>>>>      On 12 Feb 2021, at 14:26, Kirill Tkalenko 
>>>>>>>>>>> <tkalkir...@yandex.ru> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>      Hi everyone!
>>>>>>>>>>>      At the moment, I have not found a way to estimate how many 
>>>>>>>>>>> WAL segments go into the archive, say, per day.
>>>>>>>>>>>      So I created a ticket 
>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-14170 to add a couple 
>>>>>>>>>>> of new metrics.
