[ceph-users] Interesting re-shuffling of pg's after adding new osd

2015-05-17 Thread Erik Logtenberg
Hi,

Two days ago I added a new OSD to one of my Ceph machines, because one
of the existing OSDs was getting rather full. There was quite a difference
in disk space usage between OSDs, but I understand this is just how Ceph
works: it spreads data over OSDs, but not perfectly evenly.

Now check out the graph of free disk space. You can clearly see the new
4TB OSD being added and starting to fill up. It's also quite visible that
some existing OSDs benefit more than others.
Not only is data being put onto the new OSD; data is also being exchanged
between existing OSDs. This is also why it takes so incredibly long to
fill up the new OSD: Ceph is spending most of its time shuffling data
around instead of moving it to the new OSD.

Anyway, what is especially troubling is that the OSD that was already
lowest on disk space is actually filling up even more during this
process (!)
What's causing that, and how can I get Ceph to do the reasonable thing?

All CRUSH weights are identical.
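
For reference, a few commands that help diagnose this kind of imbalance
(a sketch, assuming a Hammer-or-newer cluster; "ceph osd df" does not exist
in older releases, and the reweight threshold of 120 is just an example value):

~# ceph osd tree                          # confirm the CRUSH weights really are identical
~# ceph osd df                            # per-OSD utilisation and variance
~# ceph osd reweight-by-utilization 120   # optionally nudge overfull OSDs down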

Thanks,

Erik.


[ceph-users] new relic ceph plugin

2015-05-17 Thread German Anders
Hi all,

   I want to know if someone has deployed a New Relic (Python) plugin for
Ceph.

Thanks a lot,

Best regards,

*Ger*


Re: [ceph-users] Complete freeze of a cephfs client (unavoidable hard reboot)

2015-05-17 Thread Francois Lafont
Hi,

Sorry for my late answer.

Gregory Farnum wrote:

>> 1. Is this kind of freeze normal? Can I avoid these freezes with a
>> more recent version of the kernel in the client?
> 
> Yes, it's normal. Although you should have been able to do a lazy
> and/or force umount. :)

Ah, I hadn't tried that.
Maybe I'm wrong, but I think a "lazy" or a "force" umount wouldn't
succeed. I'll test it if I can reproduce the freeze.
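
If it happens again, the lazy/force umount that Greg mentions would look
something like this (just a sketch, assuming the CephFS is mounted on /mnt):

~# umount -f /mnt   # force: may still hang if the kernel client is stuck in I/O
~# umount -l /mnt   # lazy: detach the mount point now, clean up references later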

> You can't avoid the freeze with a newer client. :(
> 
> If you notice the problem quickly enough, you should be able to
> reconnect everything by rebooting the MDS — although if the MDS hasn't
> failed the client then things shouldn't be blocking, so actually that
> probably won't help you.

Yes, the MDS was completely fine, and after the hard reboot of the client,
the client had access to the CephFS again with exactly the same MDS service
on the cluster side (no restart, etc.).

>> 2. Can I avoid these freezes with ceph-fuse instead of the kernel
>> cephfs module? But in this case, the cephfs performance will be
>> worse. Am I wrong?
> 
> No, ceph-fuse will suffer the same blockage, although obviously in
> userspace it's a bit easier to clean up.

Yes, I suppose that after killing the ceph-fuse process, I would be able
to remount the CephFS without any reboot, wouldn't I?
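
My idea of the ceph-fuse recovery would be roughly this (a sketch, assuming
the mount point is /mnt, that 10.0.2.150 is one of the monitors, and that the
client keyring is configured in ceph.conf):

~# pkill -9 ceph-fuse                  # kill the stuck userspace client
~# fusermount -u /mnt                  # or: umount -l /mnt
~# ceph-fuse -m 10.0.2.150:6789 /mnt   # remount, no reboot needed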

> Depending on your workload it
> will be slightly faster to a lot slower. Though you'll also get
> updates faster/more easily. ;)

Yes, I imagine that with ceph-fuse I have a completely up-to-date
CephFS client (in user space), whereas with the kernel CephFS client
I only have the version available in the current kernel of my client
node (3.16 in my case).

>> 3. Is there a parameter in ceph.conf to tell mds to be more patient
>> before closing the "stale session" of a client?
> 
> Yes. You'll need to increase the "mds session timeout" value on the
> MDS; it currently defaults to 60 seconds. You can increase that to
> whatever values you like. The tradeoff here is that if you have a
> client die, anything it had "capabilities" on (for read/write access)
> will be unavailable for anybody who's doing something that might
> conflict with those capabilities.

Ok, thanks for the warning, it seems logical.
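
If I understand correctly, raising the timeout would be something like this
(a sketch; the option is mds_session_timeout, in seconds, and 300 is just an
example value):

# in ceph.conf on the MDS node, then restart the MDS:
[mds]
mds session timeout = 300

# or at runtime, without a restart:
~# ceph tell mds.$id injectargs '--mds-session-timeout 300'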

> If you've got a new enough MDS (Hammer, probably, but you can check)

Yes, I use Hammer.

> then you can use the admin socket to boot specific sessions, so it may
> suit you to set very large timeouts and manually zap any client which
> actually goes away badly (rather than getting disconnected by the
> network).

Ok, I see. According to the online documentation, the way to close
a cephfs client session is:

ceph daemon mds.$id session ls               # to get the $session_id and the $address
ceph osd blacklist add $address
ceph osd dump                                # to get the $epoch
ceph daemon mds.$id osdmap barrier $epoch
ceph daemon mds.$id session evict $session_id

Is it correct?

With the commands above, could I reproduce the client freeze in my testing
cluster?

I'll try, because it would be convenient to be able to reproduce the problem
just with command lines (without really having to cut the network on the
client, etc.). I would like to test whether, with ceph-fuse, I can easily
restore the situation on my client.

>> I'm in a testing period and a hard reboot of my cephfs clients would
>> be quite annoying for me. Thanks in advance for your help.
> 
> Yeah. Unfortunately there's a basic tradeoff in strictly-consistent
> (aka POSIX) network filesystems here: if the network goes away, you
> can't be consistent any more because the disconnected client can make
> conflicting changes. And you can't tell exactly when the network
> disappeared.

And could it be conceivable one day (for instance via an option) to be
able to change the behaviour of CephFS to be *not* strictly consistent,
like NFS for instance? It seems to me it could improve the performance of
CephFS and make it more tolerant of short network failures (not really
sure about this second point). OK, it's just a remark from a simple and
unqualified ceph-user ;) but it seems to me that NFS isn't strictly
consistent and generally this is not a problem in many use cases. Am I wrong?

> So while we hope to make this less painful in the future, the network
> dying that badly is a failure case that you need to be aware of
> meaning that the client might have conflicting information. If it
> *does* have conflicting info, the best we can do about it is be
> polite, return a bunch of error codes, and unmount gracefully. We'll
> get there eventually but it's a lot of work.

Yes, I can imagine the amount of work...
Thanks a lot, Greg, for your answer. ;)

-- 
François Lafont


Re: [ceph-users] Complete freeze of a cephfs client (unavoidable hard reboot)

2015-05-17 Thread Francois Lafont
John Spray wrote:

> Greg's response is pretty comprehensive, but for completeness I'll add that 
> the specific case of shutdown blocking is http://tracker.ceph.com/issues/9477

Yes indeed: "INFO: task sync:3132 blocked for more than 120 seconds..."
was exactly the message I saw during the freeze in the VNC console of the
client (it was an OpenStack VM).

-- 
François Lafont


Re: [ceph-users] How to backup hundreds or thousands of TB

2015-05-17 Thread Francois Lafont
Hi,

Wido den Hollander wrote:
 
> Aren't snapshots something that should protect you against removal? IF
> snapshots work properly in CephFS you could create a snapshot every hour.

Are you talking about the .snap/ directory inside a CephFS directory?
If so, does it work well? Because, with Hammer, if I want to enable
this feature:

~# ceph mds set allow_new_snaps true
Error EPERM: Snapshots are unstable and will probably break your FS!
Set to --yes-i-really-mean-it if you are sure you want to enable them

I have never tried with the --yes-i-really-mean-it option. The warning
is not very encouraging. ;)
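
For reference, if the feature is enabled, taking and removing a snapshot is
supposedly just a mkdir/rmdir in the hidden .snap directory (a sketch; the
snapshot name is arbitrary):

~# ceph mds set allow_new_snaps true --yes-i-really-mean-it
~# mkdir /mnt/dir1/.snap/snap-2015-05-17    # take a snapshot of /mnt/dir1
~# ls /mnt/dir1/.snap/                      # list existing snapshots
~# rmdir /mnt/dir1/.snap/snap-2015-05-17    # remove it again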

> With the recursive statistics [0] of CephFS you could "easily" backup
> all your data to a different Ceph system or anything not Ceph.

What is the link between this (very interesting) recursive statistics
feature and backups? I'm not sure I understand. Can you explain?
Maybe you check whether the size of a directory has changed?
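
If I understand the idea, the recursive statistics are exposed as virtual
xattrs, so a backup script could look at ceph.dir.rctime (the newest ctime
anywhere below a directory) and only descend into trees that actually
changed (a sketch, assuming getfattr from the attr package):

~# getfattr -n ceph.dir.rctime /mnt/dir1   # newest recursive ctime below dir1
~# getfattr -n ceph.dir.rbytes /mnt/dir1   # total bytes below dir1
~# getfattr -n ceph.dir.rfiles /mnt/dir1   # total number of files below dir1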

> I've done this with a ~700TB CephFS cluster and that is still working
> properly.
> 
> Wido
> 
> [0]:
> http://blog.widodh.nl/2015/04/playing-with-cephfs-recursive-statistics/

Thanks Wido for this very interesting (and very simple) feature.
But does it work well? I use Hammer on Ubuntu Trusty cluster nodes,
and on an Ubuntu Trusty client with a 3.16 kernel and CephFS mounted
via the kernel client, I see this:

~# mount | grep cephfs    # /mnt is my mounted cephfs
10.0.2.150,10.0.2.151,10.0.2.152:/ on /mnt type ceph (noacl,name=cephfs,key=client.cephfs)

~# ls -lah /mnt/dir1/
total 0
drwxr-xr-x 1 root root  96M May 12 21:06 .
drwxr-xr-x 1 root root 103M May 17 23:56 ..
drwxr-xr-x 1 root root  96M May 12 21:06 8
drwxr-xr-x 1 root root 4.0M May 17 23:57 test

As you can see:
  /mnt/dir1/8/  => 96M
  /mnt/dir1/test/   => 4.0M

But:
  /mnt/dir1/ (ie .) => 96M

I should have:

size("/mnt/dir1/") = size("/mnt/dir1/8/") + size("/mnt/dir1/test/")

and this is not the case. Is that normal?

-- 
François Lafont


[ceph-users] PG scrubbing taking a long time

2015-05-17 Thread Tu Holmes
Hello everyone. Something interesting is happening to me.

I have a PG that has been doing a deep scrub for 3 days.

Other PGs start scrubbing and finish within a minute or two, but this PG just 
will not finish scrubbing at all. Any ideas as to how I can kick the scrub or 
nudge it into finishing?
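
A few things that are sometimes suggested for a stuck scrub (a sketch; the
PG id 3.45 and OSD id 12 below are placeholders for your own values):

~# ceph pg dump | grep scrubbing   # find the stuck PG and its acting set
~# ceph osd set noscrub            # stop new scrubs from starting
~# ceph osd set nodeep-scrub
~# ceph osd down 12                # mark the PG's primary down so the PG re-peers
~# ceph osd unset noscrub
~# ceph osd unset nodeep-scrub
~# ceph pg deep-scrub 3.45         # then re-request the deep scrub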

Thanks.

===
Tu Holmes
tu.hol...@gmail.com
