...and it is: remounting the OSDs with a fixed allocsize of 128M, e.g.:

$ mount
...
/dev/vdb1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,allocsize=128M)

prevents the previously observed transient 2x space utilization.
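
For anyone wanting to do the same, a minimal sketch (assuming the /dev/vdb1
device and mount point from the mount output above - adjust for your own OSD
layout; allocsize is typically not honoured on a plain '-o remount', so stop
the OSD daemon first and mount the partition afresh):

$ sudo umount /var/lib/ceph/osd/ceph-0
$ sudo mount -o allocsize=128M /dev/vdb1 /var/lib/ceph/osd/ceph-0

or persist it in /etc/fstab:

/dev/vdb1  /var/lib/ceph/osd/ceph-0  xfs  defaults,allocsize=128M  0  0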

Reading
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=055388a3188f56676c21e92962fc366ac8b5cb72 clears up my original concern - that you'd need 2x the space (or similar) when putting large objects. The preallocation is smart and is aware of how close to full the filesystem is, which is hopefully enough to prevent needless ENOSPC on smaller clusters.

On 14/04/14 13:30, Mark Kirkwood wrote:
Yeah, I was looking at preallocation as the likely cause, but your link
is way better than anything I'd found (especially with the likely commit
- speculative preallocation - mentioned
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=055388a3188f56676c21e92962fc366ac8b5cb72)!


I'll try retesting with allocsize specified to (hopefully) confirm that
this *is* the effect we are seeing.

Cheers

Mark

On 14/04/14 01:41, Wojciech Meler wrote:
XFS aggressively preallocates blocks to prevent fragmentation.
Try du --apparent-size. You can tune preallocation for an XFS filesystem
with the 'allocsize' mount option.

Check out this:
http://serverfault.com/questions/406069/why-are-my-xfs-filesystems-suddenly-consuming-more-space-and-full-of-sparse-file
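
For example (hypothetical output, shaped after the numbers reported further
down this thread for the 1G test object), the two sizes diverge while the
speculative preallocation is still attached:

$ du -m file__head_2E6FB49A__5
2048    file__head_2E6FB49A__5

$ du -m --apparent-size file__head_2E6FB49A__5
1024    file__head_2E6FB49A__5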



    On Thu, Apr 10, 2014 at 5:41 AM, Mark Kirkwood
    <mark.kirkw...@catalyst.net.nz> wrote:

    Redoing (output attached; the 1st file shows the 2x space case, the
    2nd the normal case). I'm seeing:

    $ diff osd-du.0.txt osd-du.1.txt
    924,925c924,925
    < 2048    /var/lib/ceph/osd/ceph-1/current/5.1a_head/file__head_2E6FB49A__5
    < 2048    /var/lib/ceph/osd/ceph-1/current/5.1a_head
    ---
    > 1024    /var/lib/ceph/osd/ceph-1/current/5.1a_head/file__head_2E6FB49A__5
    > 1024    /var/lib/ceph/osd/ceph-1/current/5.1a_head
    931c931
    < 2054    /var/lib/ceph/osd/ceph-1/current
    ---
    > 1030    /var/lib/ceph/osd/ceph-1/current
    936c936
    < 2054    /var/lib/ceph/osd/ceph-1/
    ---
    > 1030    /var/lib/ceph/osd/ceph-1/

    Looks like the actual object has twice the disk footprint.
    Interestingly, comparing du vs ls info for it at that point shows:

    $ ls -l
    total 2097088
    -rw-r--r-- 1 root root 1073741824 Apr 10 15:33 file__head_2E6FB49A__5

    $ du file__head_2E6FB49A__5
    2097088 file__head_2E6FB49A__5

    ...i.e. du reports 2097088 1K blocks (~2 GiB) allocated, while ls shows
    the file's logical size as 1073741824 bytes (1 GiB) - so roughly an extra
    gigabyte is allocated beyond the file's actual contents.
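
    For completeness, the same comparison in one line (a sketch using GNU
    stat's format sequences - %s is the logical size in bytes, %b the count
    of %B-byte blocks actually allocated; while the preallocation is in
    place the allocated figure is roughly double the logical one):

    $ stat -c '%n: logical=%s bytes, allocated=%b blocks of %B bytes' \
        file__head_2E6FB49A__5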

    Regards

    Mark


    On 10/04/14 15:11, Mark Kirkwood wrote:

        Ah right - sorry, I didn't realize that my 'du' was missing the
        files! I will retest and post updated output.

        Cheers

        Mark

        On 10/04/14 15:04, Gregory Farnum wrote:

            Right, but I'm interested in the space allocation within the
            PG. The best guess I can come up with without trawling through
            the code is that some layer in the stack is preallocating and
            then trimming the objects back down once writing stops, but I'd
            like some more data points before I dig.
            -Greg
            Software Engineer #42 @ http://inktank.com | http://ceph.com


            On Wed, Apr 9, 2014 at 7:59 PM, Mark Kirkwood
            <mark.kirkw...@catalyst.net.nz> wrote:

                It is only that single pg using the space (see attached)
                - but essentially:

                $ du -m /var/lib/ceph/osd/ceph-1
                ...
                2048    /var/lib/ceph/osd/ceph-1/current/5.1a_head
                2053    /var/lib/ceph/osd/ceph-1/current
                2053    /var/lib/ceph/osd/ceph-1/

                This drops back to 1025 soon after. Interestingly, I am
                not seeing this effect (same Ceph version) on a single-host
                setup with 2 OSDs using preexisting partitions... it's only
                on these multi-host configurations that have OSDs using
                whole devices (both setups were installed using ceph-deploy,
                so in theory there is nothing exotic about them, except that
                the multiple 'hosts' are actually VMs).

                Regards

                Mark

                On 10/04/14 02:27, Gregory Farnum wrote:

                    I don't think the backing store should be seeing any
                    effects like
                    that. What are the filenames which are using up that
                    space inside the
                    folders?
                    -Greg
                    Software Engineer #42 @ http://inktank.com |
                    http://ceph.com


                    On Wed, Apr 9, 2014 at 1:58 AM, Mark Kirkwood
                    <mark.kirkw...@catalyst.net.nz> wrote:

                        Hi all,

                        I've noticed that objects are using twice their
                        actual space for a few
                        minutes after they are 'put' via rados:

                        $ ceph -v
                        ceph version 0.79-42-g010dff1
                        (010dff12c38882238591bb042f8e497a1f7ba020)

                        $ ceph osd tree
                        # id    weight  type name       up/down reweight
                        -1      0.03998 root default
                        -2      0.009995                host ceph1
                        0       0.009995                        osd.0   up      1
                        -3      0.009995                host ceph2
                        1       0.009995                        osd.1   up      1
                        -4      0.009995                host ceph3
                        2       0.009995                        osd.2   up      1
                        -5      0.009995                host ceph4
                        3       0.009995                        osd.3   up      1

                        $ ceph osd dump|grep repool
                        pool 5 'repool' replicated size 3 min_size 2
                        crush_ruleset 0 object_hash
                        rjenkins pg_num 64 pgp_num 64 last_change 57
                        owner 0 flags hashpspool
                        stripe_width 0

                        $ du -m  file
                        1025    file

                        $ rados put -p repool file file

                        $ cd /var/lib/ceph/osd/ceph-1/current/
                        $ du -m 5.1a_head
                        2048          5.1a_head

                        [later]

                        $ du -m 5.1a_head
                        1024          5.1a_head

                        The above situation is repeated on the other two
                        OSDs where this pg is mapped. So after about 5
                        minutes we have (as expected) the 1G file using 1G
                        on each of the 3 OSDs it is mapped to; however, for
                        a short period of time it is using twice this! I'm
                        very interested to know what activity is happening
                        that causes the 2x space use - as this could be a
                        significant foot-gun when uploading large files if
                        we don't have 2x the space available on each OSD.

                        Regards

                        Mark

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
