[
https://issues.apache.org/jira/browse/IGNITE-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Rakov updated IGNITE-6532:
-------------------------------
Description:
Modern databases (Oracle, MySQL) work with the storage drive at the physical
level, creating their own partition table and filesystem.
Ignite Persistent Store works with regular files. It appends new pages to the
partition file as they are allocated and written at checkpoint. These new
pages can form one or several fragments at the filesystem level.
As a result, after weeks of uptime, partition files can contain a huge number
of fragments. There have been reports of about 1,200,000 fragments in the
index.bin file on an XFS filesystem.
We can work around this by preallocating files in bigger chunks, e.g. 1000
pages at a time. On the other hand, early allocation increases LFS size
overhead, so we should choose a reasonable allocation heuristic.
Allocation should be performed at the native level. Simply writing a byte at
position (file_size + page_size * 1000) won't do it, because XFS (and other
filesystems as well) has an optimization for that case: the skipped range
becomes a hole in a sparse file, and no blocks are actually allocated.
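A sketch of what native-level allocation could look like from Java, using
posix_fallocate(2) via JNA (the binding, the reflection hack for the raw file
descriptor, and all class names are assumptions, not Ignite's actual code; a
JNI binding would work the same way). On XFS, fallocate reserves contiguous
unwritten extents without writing zeroes, which is exactly what is needed here.
{code:java}
import com.sun.jna.Library;
import com.sun.jna.Native;

import java.io.FileDescriptor;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.lang.reflect.Field;

/** Sketch only: assumes Linux/glibc, 64-bit off_t and JNA 5.x on classpath. */
public final class NativePreallocate {
    /** Minimal libc binding; posix_fallocate has been in POSIX since 2001. */
    public interface CLib extends Library {
        CLib INSTANCE = Native.load("c", CLib.class);

        int posix_fallocate(int fd, long offset, long len);
    }

    /** Extends the file to newLen bytes of physically allocated space. */
    public static void preallocate(RandomAccessFile raf, long newLen) throws Exception {
        long oldLen = raf.length();
        if (newLen <= oldLen)
            return;

        // No public accessor exposes the raw fd; reflection on
        // java.io.FileDescriptor is the usual OpenJDK workaround
        // (newer JDKs may need --add-opens java.base/java.io).
        Field fdField = FileDescriptor.class.getDeclaredField("fd");
        fdField.setAccessible(true);
        int fd = fdField.getInt(raf.getFD());

        // Unlike a write past EOF, this forces real block allocation.
        // posix_fallocate returns the error number directly on failure.
        int err = CLib.INSTANCE.posix_fallocate(fd, oldLen, newLen - oldLen);
        if (err != 0)
            throw new IOException("posix_fallocate failed, errno=" + err);
    }
}
{code}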
Related article about filesystem internals:
https://blog.codecentric.de/en/2017/04/xfs-possible-memory-allocation-deadlock-kmem_alloc/
> Introduce preallocation in LFS files to avoid high fragmentation on filesystem level
> ------------------------------------------------------------------------------------
>
> Key: IGNITE-6532
> URL: https://issues.apache.org/jira/browse/IGNITE-6532
> Project: Ignite
> Issue Type: Bug
> Components: persistence
> Affects Versions: 2.2
> Reporter: Ivan Rakov
> Fix For: 2.4