On 07/24/2015 05:41 AM, Gena Makhomed wrote:

To anyone reading this, there are a few things here worth noting.

Such overhead is caused by three things:
1. creating and then removing data (vzctl compact takes care of that)
2. filesystem fragmentation (we have some experimental patches to ext4
   plus an ext4 defragmenter to solve it, but currently it's still in
   the research stage)
3. initial filesystem layout (which depends on the initial ext4 fs
   size, including the inode requirement)

So, #1 is solved, #2 is solvable, and #3 is a limitation of the
underlying filesystem and can be mitigated by properly choosing the
initial size of a newly created ploop.
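
As a minimal sketch of what "properly choosing the initial size" means
in practice (the CTID, template name and size below are illustrative,
not taken from this thread): create the container with the DISKSPACE
you actually intend to run with, e.g.

# vzctl create 200 --layout ploop --diskspace 256G --ostemplate centos-7-x86_64
# vzctl set 200 --diskspace 256G:256G --save

rather than creating a much larger ploop and shrinking it later, which
leaves the larger filesystem's metadata layout behind.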

This container is compacted every night. During the working day only
new static files are added to the container; it does not see many
"creating then removing data" operations.

current state:

on hardware node:

# du -b /vz/private/155/root.hdd
203547480857    /vz/private/155/root.hdd

inside container:

# df -B1
Filesystem 1B-blocks Used Available Use% Mounted on
/dev/ploop55410p1     270426705920  163581190144  94476423168  64% /


used space, bytes: 163581190144

image size, bytes: 203547480857

overhead: ~ 37 GiB, ~ 19.6%

container was compacted at 03:00
by command /usr/sbin/vzctl compact 155

running container compaction right now:
9443 clusters have been relocated

result:

used space, bytes: 163604983808

image size, bytes: 193740149529

overhead: ~ 28 GiB, ~ 15.5%
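
For anyone who wants to reproduce these numbers: the overhead is just
the image size on the hardware node minus the used space reported
inside the container. A quick sketch on the hardware node (CTID 155
and the paths are the ones from this thread):

# IMAGE=$(du -sb /vz/private/155/root.hdd | awk '{print $1}')
# USED=$(vzctl exec 155 df -P -B1 / | awk 'NR==2 {print $3}')
# echo "overhead: $((IMAGE - USED)) bytes"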

I think it is not a good idea to run ploop compaction more frequently
than once per day at night - so to plan disk space on the hardware
node for all ploop images we need to take into account not the minimal
value of the overhead but the maximal one, reached after 24 hours of
the container working in normal mode.

So the real overhead of ploop can be accounted for only after at least
24 hours of the container being in the running state.
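
For reference, a nightly compaction like the one described above is
typically just a cron entry on the hardware node; the 03:00 time and
CTID 155 mirror this thread, and the file name is only an example:

# cat /etc/cron.d/vz-compact-155
0 3 * * * root /usr/sbin/vzctl compact 155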

An example of the #3 effect is this: if you create a very large
filesystem initially (say, 16 TB) and then downsize it (say, to 1 TB),
the filesystem metadata overhead will be quite big. The same thing
happens if you ask for lots of inodes (here "lots" means more than the
default value, which is 1 inode per 16K of disk space). This happens
because the ext4 filesystem is not designed to shrink. Therefore, to
have the lowest possible overhead you have to choose the initial
filesystem size carefully. Yes, this is not a solution but a workaround.
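
If you want to double-check what block/inode layout a given ploop
filesystem actually ended up with, tune2fs on the ploop partition
device shows the counts (device name as earlier in this thread, run on
the hardware node); bytes-per-inode is then
block count * block size / inode count:

# tune2fs -l /dev/ploop55410p1 | egrep 'Inode count|Block count|Block size'

The 1 inode per 16K default mentioned above is the mke2fs default
(typically inode_ratio = 16384 in /etc/mke2fs.conf).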

as you can see from the inode count:

# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/ploop55410p1 16777216 1198297 15578919 8% /

initial filesystem size was 256 GiB:

(16777216 * 16 * 1024) / 1024.0 / 1024.0 / 1024.0 == 256 GiB.

current filesystem size is also 256 GiB:

# cat /etc/vz/conf/155.conf | grep DISKSPACE
DISKSPACE="268435456:268435456"
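
(DISKSPACE is in 1 KiB units, so 268435456 * 1024 = 274877906944 bytes
= 256 GiB, matching the initial size computed from the inode count
above.)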

so there is no extra "filesystem metadata overhead".

Agreed, this looks correct.


What am I doing wrong, and how can I decrease the ploop overhead here?

Most probably it's because of filesystem fragmentation (my item #2 above).
We are currently working on that. For example, see this report:

 https://lwn.net/Articles/637428/


I found only one way: migrate to ZFS with lz4 compression turned on.

Also note that ploop was not designed with any specific filesystem in
mind; it is universal, so #3 can be solved by moving to a different fs
in the future.

XFS currently does not support filesystem shrinking at all:
http://xfs.org/index.php/Shrinking_Support

Actually, ext4 doesn't support shrinking either; in ploop we worked
around it using a hidden balloon file. It appears to work pretty well,
the only downside being that if you initially create a very large
ploop and then shrink it considerably, the ext4 metadata overhead will
be larger.


BTRFS is not production-ready, and no other options besides ext4 are
available for use with ploop in the near future.
