[ceph-users] OSD removal is not cleaning entry from osd listing

2015-07-31 Thread Mallikarjun Biradar
Hi,

I had 27 OSDs in my cluster. I removed two of them: osd.20 from
host-3 and osd.22 from host-6.

user@host-1:~$ sudo ceph osd tree
ID WEIGHT    TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 184.67990 root default
-7  82.07996 chassis chassis2
-4  41.03998 host host-3
 8   6.84000 osd.8  up  1.0  1.0
 9   6.84000 osd.9  up  1.0  1.0
10   6.84000 osd.10 up  1.0  1.0
11   6.84000 osd.11 up  1.0  1.0
20   6.84000 osd.20 up  1.0  1.0
21   6.84000 osd.21 up  1.0  1.0
-5  41.03998 host host-6
12   6.84000 osd.12 up  1.0  1.0
13   6.84000 osd.13 up  1.0  1.0
14   6.84000 osd.14 up  1.0  1.0
15   6.84000 osd.15 up  1.0  1.0
22   6.84000 osd.22 up  1.0  1.0
23   6.84000 osd.23 up  1.0  1.0
-6 102.59995 chassis chassis1
-2  47.87997 host host-1
 0   6.84000 osd.0  up  1.0  1.0
 1   6.84000 osd.1  up  1.0  1.0
 2   6.84000 osd.2  up  1.0  1.0
 3   6.84000 osd.3  up  1.0  1.0
16   6.84000 osd.16 up  1.0  1.0
17   6.84000 osd.17 up  1.0  1.0
24   6.84000 osd.24 up  1.0  1.0
-3  54.71997 host host-2
 4   6.84000 osd.4  up  1.0  1.0
 5   6.84000 osd.5  up  1.0  1.0
 6   6.84000 osd.6  up  1.0  1.0
 7   6.84000 osd.7  up  1.0  1.0
18   6.84000 osd.18 up  1.0  1.0
19   6.84000 osd.19 up  1.0  1.0
25   6.84000 osd.25 up  1.0  1.0
26   6.84000 osd.26 up  1.0  1.0
user@host-1:~$

Steps used to remove OSD:
user@host-1:~$ ceph auth del osd.20; ceph osd crush rm osd.20; ceph
osd down osd.20; ceph osd rm osd.20
updated
removed item id 20 name 'osd.20' from crush map
marked down osd.22.
removed osd.22

I removed both OSDs, osd.20 and osd.22, using the same steps.

But even after removing them, ceph osd tree still lists the deleted OSDs
and ceph -s reports the total number of OSDs as 27.

user@host-1:~$ sudo ceph osd tree
ID WEIGHT    TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 184.67990 root default
-7  82.07996 chassis chassis2
-4  41.03998 host host-3
 8   6.84000 osd.8  up  1.0  1.0
 9   6.84000 osd.9  up  1.0  1.0
10   6.84000 osd.10 up  1.0  1.0
11   6.84000 osd.11 up  1.0  1.0
21   6.84000 osd.21 up  1.0  1.0
-5  41.03998 host host-6
12   6.84000 osd.12 up  1.0  1.0
13   6.84000 osd.13 up  1.0  1.0
14   6.84000 osd.14 up  1.0  1.0
15   6.84000 osd.15 up  1.0  1.0
23   6.84000 osd.23 up  1.0  1.0
-6 102.59995 chassis chassis1
-2  47.87997 host host-1
 0   6.84000 osd.0  up  1.0  1.0
 1   6.84000 osd.1  up  1.0  1.0
 2   6.84000 osd.2  up  1.0  1.0
 3   6.84000 osd.3  up  1.0  1.0
16   6.84000 osd.16 up  1.0  1.0
17   6.84000 osd.17 up  1.0  1.0
24   6.84000 osd.24 up  1.0  1.0
-3  54.71997 host host-2
 4   6.84000 osd.4  up  1.0  1.0
 5   6.84000 osd.5  up  1.0  1.0
 6   6.84000 osd.6  up  1.0  1.0
 7   6.84000 osd.7  up  1.0  1.0
18   6.84000 osd.18 up  1.0  1.0
19   6.84000 osd.19 up  1.0  1.0
25   6.84000 osd.25 up  1.0  1.0
26   6.84000 osd.26 up  1.0  1.0
20 0 osd.20
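
For reference, a couple of commands that show what the cluster still knows about the removed ids (a sketch, not part of the original report):

ceph osd dump | grep -E 'osd\.(20|22)'   # should print nothing once they are gone from the osdmap
ceph auth list | grep -E 'osd\.(20|22)'  # should print nothing once their keys are deleted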

Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?

2015-07-31 Thread Alexandre DERUMIER
>>As I still haven't heard or seen about any upstream distros for Debian 
>>Jessie (see also [1]),

Gitbuilder is already done for jessie

http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/

@Sage: I don't know if something is blocking an official package release?



- Original Message -
From: "Brian Kroth" 
To: "Sage Weil" 
Cc: "ceph-devel" , "ceph-users" 

Sent: Thursday, 30 July 2015 17:58:12
Subject: Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?

Sage Weil  2015-07-30 06:54: 
>As time marches on it becomes increasingly difficult to maintain proper 
>builds and packages for older distros. For example, as we make the 
>systemd transition, maintaining the kludgey sysvinit and udev support for 
>centos6/rhel6 is a pain in the butt and eats up time and energy to 
>maintain and test that we could be spending doing more useful work. 
> 
>"Dropping" them would mean: 
> 
> - Ongoing development on master (and future versions like infernalis and 
>jewel) would not be tested on these distros. 
> 
> - We would stop building upstream release packages on ceph.com for new 
>releases. 
> 
> - We would probably continue building hammer and firefly packages for 
>future bugfix point releases. 
> 
> - The downstream distros would probably continue to package them, but the 
>burden would be on them. For example, if Ubuntu wanted to ship Jewel on 
>precise 12.04, they could, but they'd probably need to futz with the 
>packaging and/or build environment to make it work. 
> 
>So... given that, I'd like to gauge user interest in these old distros. 
>Specifically, 
> 
> CentOS6 / RHEL6 
> Ubuntu precise 12.04 
> Debian wheezy 
> 
>Would anyone miss them? 
> 
>In particular, dropping these three would mean we could drop sysvinit 
>entirely and focus on systemd (and continue maintaining the existing 
>upstart files for just a bit longer). That would be a relief. (The 
>sysvinit files wouldn't go away in the source tree, but we wouldn't worry 
>about packaging and testing them properly.) 
> 
>Thanks! 
>sage 

As I still haven't heard or seen about any upstream distros for Debian 
Jessie (see also [1]), I am still running Debian Wheezy and as that is 
supposed to be supported for another ~4 years by Debian, it would be 
very nice if there were at least stability and security fixes backported 
for the upstream ceph package repositories for that platform. 

Additionally, I'll note that I'm personally likely to continue to use 
sysvinit so long as I still can, even when I am able to make the switch 
to Jessie. 

Thanks, 
Brian 

[1]  

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


Re: [ceph-users] OSD removal is not cleaning entry from osd listing

2015-07-31 Thread Mallikarjun Biradar
For a moment the removed OSDs disappear from the listing, but after some time
they come back in the ceph osd tree output.

On Fri, Jul 31, 2015 at 12:45 PM, Mallikarjun Biradar
 wrote:
> Hi,
>
> I had 27 OSD's in my cluster. I removed two of the OSD from (osd.20)
> host-3 & (osd.22) host-6.
>
> user@host-1:~$ sudo ceph osd tree
> ID WEIGHT    TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 184.67990 root default
> -7  82.07996 chassis chassis2
> -4  41.03998 host host-3
>  8   6.84000 osd.8  up  1.0  1.0
>  9   6.84000 osd.9  up  1.0  1.0
> 10   6.84000 osd.10 up  1.0  1.0
> 11   6.84000 osd.11 up  1.0  1.0
> 20   6.84000 osd.20 up  1.0  1.0
> 21   6.84000 osd.21 up  1.0  1.0
> -5  41.03998 host host-6
> 12   6.84000 osd.12 up  1.0  1.0
> 13   6.84000 osd.13 up  1.0  1.0
> 14   6.84000 osd.14 up  1.0  1.0
> 15   6.84000 osd.15 up  1.0  1.0
> 22   6.84000 osd.22 up  1.0  1.0
> 23   6.84000 osd.23 up  1.0  1.0
> -6 102.59995 chassis chassis1
> -2  47.87997 host host-1
>  0   6.84000 osd.0  up  1.0  1.0
>  1   6.84000 osd.1  up  1.0  1.0
>  2   6.84000 osd.2  up  1.0  1.0
>  3   6.84000 osd.3  up  1.0  1.0
> 16   6.84000 osd.16 up  1.0  1.0
> 17   6.84000 osd.17 up  1.0  1.0
> 24   6.84000 osd.24 up  1.0  1.0
> -3  54.71997 host host-2
>  4   6.84000 osd.4  up  1.0  1.0
>  5   6.84000 osd.5  up  1.0  1.0
>  6   6.84000 osd.6  up  1.0  1.0
>  7   6.84000 osd.7  up  1.0  1.0
> 18   6.84000 osd.18 up  1.0  1.0
> 19   6.84000 osd.19 up  1.0  1.0
> 25   6.84000 osd.25 up  1.0  1.0
> 26   6.84000 osd.26 up  1.0  1.0
> user@host-1:~$
>
> Steps used to remove OSD:
> user@host-1:~$ ceph auth del osd.20; ceph osd crush rm osd.20; ceph
> osd down osd.20; ceph osd rm osd.20
> updated
> removed item id 20 name 'osd.20' from crush map
> marked down osd.22.
> removed osd.22
>
> Removed both of OSD's osd.20 & osd.22
>
> But, even after removing them, ceph osd tree is listing deleted OSD's
> & ceph -s reporting total number of OSD's as 27.
>
> user@host-1:~$ sudo ceph osd tree
> ID WEIGHT    TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 184.67990 root default
> -7  82.07996 chassis chassis2
> -4  41.03998 host host-3
>  8   6.84000 osd.8  up  1.0  1.0
>  9   6.84000 osd.9  up  1.0  1.0
> 10   6.84000 osd.10 up  1.0  1.0
> 11   6.84000 osd.11 up  1.0  1.0
> 21   6.84000 osd.21 up  1.0  1.0
> -5  41.03998 host host-6
> 12   6.84000 osd.12 up  1.0  1.0
> 13   6.84000 osd.13 up  1.0  1.0
> 14   6.84000 osd.14 up  1.0  1.0
> 15   6.84000 osd.15 up  1.0  1.0
> 23   6.84000 osd.23 up  1.0  1.0
> -6 102.59995 chassis chassis1
> -2  47.87997 host host-1
>  0   6.84000 osd.0  up  1.0  1.0
>  1   6.84000 osd.1  up  1.0  1.0
>  2   6.84000 osd.2  up  1.0  1.0
>  3   6.84000 osd.3  up  1.0  1.0
> 16   6.84000 osd.16 up  1.0  1.0
> 17   6.84000 osd.17 up  1.0  1.0
> 24   6.84000 osd.24 up  1.0  1.0
> -3  54.71997 host host-2
>  4   6.84000 osd.4  up  1.0  1.0
>  5   6.84000 osd.5  up  1.0  1.0
>  6   6.84000 osd.6  up  1.0  1.0
>  7   6.84000 osd.7  up  1.0  1.0
>

Re: [ceph-users] RGW + civetweb + SSL

2015-07-31 Thread Bernhard Duebi





Hi,

I had the same problem. Apparently civetweb can talk https when run standalone, but I didn't find out how to pass the necessary options to civetweb through ceph. So I put haproxy in front of civetweb: haproxy terminates the https connection and forwards the requests in plain text to civetweb. For me this is OK as haproxy and civetweb are running on the same host. I'm not a haproxy specialist; I found the attached config on Google and it's working for me.

Regards
Bernhard

-Original Message-
From: okd...@gmail.com
Sent: Thu, 30 Jul 2015 21:16:19 -0300
To: ceph-users@lists.ceph.com
Subject: [ceph-users] RGW + civetweb + SSL

Hello,

I'd like to know if someone knows how to set up an SSL implementation of RGW with civetweb. The only "documentation" that I found about that is a "bug" - http://tracker.ceph.com/issues/11239 - and I'd like to know whether this kind of implementation really works.

Regards.
Italo Santos
http://italosantos.com.br/








haproxy.cfg
Description: Binary data
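
The attachment itself is not preserved in this archive. As a stand-in, here is a minimal sketch of the kind of haproxy setup described above (TLS termination in front of civetweb on the same host); the certificate path, ports and backend name are assumptions, not the actual attached config:

global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend rgw_https
    # terminate TLS here; cert.pem is the certificate and key concatenated into one file
    bind *:443 ssl crt /etc/haproxy/cert.pem
    default_backend rgw_civetweb

backend rgw_civetweb
    # forward plain http to civetweb listening locally (port 7480 assumed)
    server rgw1 127.0.0.1:7480 check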
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD startup causing slow requests - one tip from me

2015-07-31 Thread Jan Schermer
I know a few other people here were battling with the occasional issue of OSD 
being extremely slow when starting.

I personally run OSDs mixed with KVM guests on the same nodes, and was baffled 
by this issue occuring mostly on the most idle (empty) machines.
Thought it was some kind of race condition where OSD started too fast and disks 
couldn’t catch up, was investigating latency of CPUs and cards on a mostly idle 
hardware etc. - with no improvement.

But in the end, most of my issues were caused by page cache using too much 
memory. This doesn’t cause any problems when the OSDs have their memory 
allocated and are running, but when the OSD is (re)started, OS struggles to 
allocate contiguous blocks of memory for it and its buffers.
This could also be why I’m seeing such an improvement with my NUMA pinning 
script - cleaning memory on one node is probably easier and doesn’t block 
allocations on other nodes.

How can you tell if this is your case? When restarting an OSD that has this 
issue, look for CPU usage of “kswapd” processes. If it is >0 then you have this 
issue and would benefit from setting this:

for i in $(mount |grep "ceph/osd" |cut -d' ' -f1 |cut -d'/' -f3 |tr -d '[0-9]') 
; do echo 1 >/sys/block/$i/bdi/max_ratio ; done
(another option is echo 1 > drop_caches before starting the OSD, but that’s a 
bit brutal)

What this does is it limits the pagecache size for each block device to 1% of 
physical memory. I’d like to limit it even further but it doesn’t understand 
“0.3”...
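
A quick way to check for the kswapd symptom while restarting an OSD (just a sketch; paths and values may differ per kernel):

top -b -d 1 -n 30 | grep kswapd    # run during the OSD restart; nonzero %CPU for kswapd* means the page cache is being reclaimed under pressure
cat /sys/block/sd*/bdi/max_ratio   # verify the 1% limit took effect (the default is 100)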

Let me know if it helps, I’ve not been able to test if this cures the problem 
completely, but there was no regression after setting it.

Jan

P.S. This is for RHEL 6 / CentOS 6 ancient 2.6.32 kernel, newer kernels have 
tunables to limit the overall pagecache size. You can also set the limits in 
cgroups but I’m afraid that won’t help in this case as you can only set the 
whole memory footprint limit where it will battle for allocations anyway.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?

2015-07-31 Thread Mariusz Gronczewski
On Thu, 30 Jul 2015 06:54:13 -0700 (PDT), Sage Weil 
wrote:


> So... given that, I'd like to gauge user interest in these old distros.  
> Specifically,
> 
>  CentOS6 / RHEL6
>  Ubuntu precise 12.04
>  Debian wheezy
> 
> Would anyone miss them?
> 

Well, CentOS 6 will be supported until 2020, and CentOS 7 was released a
year ago, so I'd imagine a lot of people haven't migrated yet, and the
migration process is nontrivial if you already made some modifications
to C6 (read: fixed the broken-as-fuck init scripts for a few apps)
-- 
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczew...@efigence.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Check networking first?

2015-07-31 Thread John Spray

On 31/07/15 06:27, Stijn De Weirdt wrote:
wouldn't it be nice that ceph does something like this in background 
(some sort of network-scrub). debugging network like this is not that 
easy (can't expect admins to install e.g. perfsonar on all nodes 
and/or clients)


something like: every X min, each service X pick a service Y on 
another host (assuming X and Y will exchange some communication at 
some point; like osd with other osd), send 1MB of data, and make the 
timing data available so we can monitor it and detect underperforming 
links over time.


ideally clients also do this, but not sure where they should 
report/store the data.


interpreting the data can be a bit tricky, but extreme outliers will 
be spotted easily, and the main issue with this sort of debugging is 
collecting the data.


simply reporting / keeping track of ongoing communications is already 
a big step forward, but then we need to have the size of the exchanged 
data to allow interpretation (and the timing should be about the 
network part, not e.g. flush data to disk in case of an osd). (and 
obviously sampling is enough, no need to have details of every bit send).


Yes, it's a reasonable concept, although it's not clear that we'd 
necessarily want it built into existing ceph services.  For example, 
where there are several OSDs running on a host, we don't really want all 
the OSDs redundantly verifying that particular host's network 
functionality.  This use case is a pretty good argument for a ceph 
supervisor service of some kind that exists on a one-per-host basis.  
The trick is finding someone with time to write it :-)


The prior art here is Lustre's LNET self test (LST), which exists for 
exactly these reasons (Mark will have memories of this too, I'm sure).


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD removal is not cleaning entry from osd listing

2015-07-31 Thread John Spray



On 31/07/15 09:47, Mallikarjun Biradar wrote:

For a moment it de-list removed OSD's and after sometime it again
comes up in ceph osd tree listing.



Is the OSD service itself definitely stopped?  Are you using any 
orchestration systems (puppet, chef) that might be re-creating its auth 
key etc?


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD removal is not cleaning entry from osd listing

2015-07-31 Thread Mallikarjun Biradar
Yeah. OSD service stopped.
Nope, I am not using any orchestration system.

user@host-1:~$ ps -ef | grep ceph
root  2305 1  7 Jul27 ?06:52:36 /usr/bin/ceph-osd
--cluster=ceph -i 3 -f
root  2522 1  6 Jul27 ?06:19:42 /usr/bin/ceph-osd
--cluster=ceph -i 0 -f
root  2792 1  6 Jul27 ?06:07:49 /usr/bin/ceph-osd
--cluster=ceph -i 2 -f
root  2904 1  8 Jul27 ?07:48:19 /usr/bin/ceph-osd
--cluster=ceph -i 1 -f
root 13368 1  5 Jul28 ?04:15:31 /usr/bin/ceph-osd
--cluster=ceph -i 17 -f
root 16685 1  6 Jul28 ?04:36:54 /usr/bin/ceph-osd
--cluster=ceph -i 16 -f
root 26942 1  7 Jul29 ?03:54:45 /usr/bin/ceph-osd
--cluster=ceph -i 24 -f
user  42767 42749  0 15:58 pts/300:00:00 grep --color=auto ceph
use@host-1:~$ ceph osd tree
ID WEIGHT    TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 170.1 root default
-7  68.39996 chassis chassis2
-4  34.19998 host host-3
 8   6.84000 osd.8  up  1.0  1.0
 9   6.84000 osd.9  up  1.0  1.0
10   6.84000 osd.10 up  1.0  1.0
11   6.84000 osd.11 up  1.0  1.0
21   6.84000 osd.21 up  1.0  1.0
-5  34.19998 host host-6
12   6.84000 osd.12 up  1.0  1.0
13   6.84000 osd.13 up  1.0  1.0
14   6.84000 osd.14 up  1.0  1.0
15   6.84000 osd.15 up  1.0  1.0
23   6.84000 osd.23 up  1.0  1.0
-6 102.59995 chassis chassis1
-2  47.87997 host host-1
 0   6.84000 osd.0  up  1.0  1.0
 1   6.84000 osd.1  up  1.0  1.0
 2   6.84000 osd.2  up  1.0  1.0
 3   6.84000 osd.3  up  1.0  1.0
16   6.84000 osd.16 up  1.0  1.0
17   6.84000 osd.17 up  1.0  1.0
24   6.84000 osd.24 up  1.0  1.0
-3  54.71997 host host-2
 4   6.84000 osd.4  up  1.0  1.0
 5   6.84000 osd.5  up  1.0  1.0
 6   6.84000 osd.6  up  1.0  1.0
 7   6.84000 osd.7  up  1.0  1.0
18   6.84000 osd.18 up  1.0  1.0
19   6.84000 osd.19 up  1.0  1.0
25   6.84000 osd.25 up  1.0  1.0
26   6.84000 osd.26 up  1.0  1.0
20 0 osd.20 up  1.0  1.0
22 0 osd.22 up  1.0  1.0
user@host-1:~$
user@host-1:~$ df
Filesystem  1K-blocks   Used  Available Use% Mounted on
/dev/sdq1   414579696   11211248  382285928   3% /
none4  0  4   0% /sys/fs/cgroup
udev 65980912  4   65980908   1% /dev
tmpfs13198836   1124   13197712   1% /run
none 5120  0   5120   0% /run/lock
none 65994176 12   65994164   1% /run/shm
none   102400  0 102400   0% /run/user
/dev/sdl1  7345777988 3233438932 4112339056  45% /var/lib/ceph/osd/ceph-2
/dev/sda1  7345777988 4484766028 2861011960  62% /var/lib/ceph/osd/ceph-3
/dev/sdn1  7345777988 3344604424 4001173564  46% /var/lib/ceph/osd/ceph-1
/dev/sdp1  7345777988 3897260808 3448517180  54% /var/lib/ceph/osd/ceph-0
/dev/sdc1  7345777988 3029110220 4316667768  42% /var/lib/ceph/osd/ceph-16
/dev/sde1  7345777988 2673181020 4672596968  37% /var/lib/ceph/osd/ceph-17
/dev/sdg1  7345777988 3537932824 3807845164  49% /var/lib/ceph/osd/ceph-24
user@host-1:~$

On Fri, Jul 31, 2015 at 3:53 PM, John Spray  wrote:
>
>
> On 31/07/15 09:47, Mallikarjun Biradar wrote:
>>
>> For a moment it de-list removed OSD's and after sometime it again
>> comes up in ceph osd tree listing.
>>
>
> Is the OSD service itself definitely stopped?  Are you using any
> orchestration systems (puppet, chef) that might be re-creating its auth key
> etc?
>
> John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD removal is not cleaning entry from osd listing

2015-07-31 Thread Mallikarjun Biradar
I am using hammer 0.94

On Fri, Jul 31, 2015 at 4:01 PM, Mallikarjun Biradar
 wrote:
> Yeah. OSD service stopped.
> Nope, I am not using any orchestration system.
>
> user@host-1:~$ ps -ef | grep ceph
> root  2305 1  7 Jul27 ?06:52:36 /usr/bin/ceph-osd
> --cluster=ceph -i 3 -f
> root  2522 1  6 Jul27 ?06:19:42 /usr/bin/ceph-osd
> --cluster=ceph -i 0 -f
> root  2792 1  6 Jul27 ?06:07:49 /usr/bin/ceph-osd
> --cluster=ceph -i 2 -f
> root  2904 1  8 Jul27 ?07:48:19 /usr/bin/ceph-osd
> --cluster=ceph -i 1 -f
> root 13368 1  5 Jul28 ?04:15:31 /usr/bin/ceph-osd
> --cluster=ceph -i 17 -f
> root 16685 1  6 Jul28 ?04:36:54 /usr/bin/ceph-osd
> --cluster=ceph -i 16 -f
> root 26942 1  7 Jul29 ?03:54:45 /usr/bin/ceph-osd
> --cluster=ceph -i 24 -f
> user  42767 42749  0 15:58 pts/300:00:00 grep --color=auto ceph
> use@host-1:~$ ceph osd tree
> ID WEIGHT    TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 170.1 root default
> -7  68.39996 chassis chassis2
> -4  34.19998 host host-3
>  8   6.84000 osd.8  up  1.0  1.0
>  9   6.84000 osd.9  up  1.0  1.0
> 10   6.84000 osd.10 up  1.0  1.0
> 11   6.84000 osd.11 up  1.0  1.0
> 21   6.84000 osd.21 up  1.0  1.0
> -5  34.19998 host host-6
> 12   6.84000 osd.12 up  1.0  1.0
> 13   6.84000 osd.13 up  1.0  1.0
> 14   6.84000 osd.14 up  1.0  1.0
> 15   6.84000 osd.15 up  1.0  1.0
> 23   6.84000 osd.23 up  1.0  1.0
> -6 102.59995 chassis chassis1
> -2  47.87997 host host-1
>  0   6.84000 osd.0  up  1.0  1.0
>  1   6.84000 osd.1  up  1.0  1.0
>  2   6.84000 osd.2  up  1.0  1.0
>  3   6.84000 osd.3  up  1.0  1.0
> 16   6.84000 osd.16 up  1.0  1.0
> 17   6.84000 osd.17 up  1.0  1.0
> 24   6.84000 osd.24 up  1.0  1.0
> -3  54.71997 host host-2
>  4   6.84000 osd.4  up  1.0  1.0
>  5   6.84000 osd.5  up  1.0  1.0
>  6   6.84000 osd.6  up  1.0  1.0
>  7   6.84000 osd.7  up  1.0  1.0
> 18   6.84000 osd.18 up  1.0  1.0
> 19   6.84000 osd.19 up  1.0  1.0
> 25   6.84000 osd.25 up  1.0  1.0
> 26   6.84000 osd.26 up  1.0  1.0
> 20 0 osd.20 up  1.0  1.0
> 22 0 osd.22 up  1.0  1.0
> user@host-1:~$
> user@host-1:~$ df
> Filesystem  1K-blocks   Used  Available Use% Mounted on
> /dev/sdq1   414579696   11211248  382285928   3% /
> none4  0  4   0% /sys/fs/cgroup
> udev 65980912  4   65980908   1% /dev
> tmpfs13198836   1124   13197712   1% /run
> none 5120  0   5120   0% /run/lock
> none 65994176 12   65994164   1% /run/shm
> none   102400  0 102400   0% /run/user
> /dev/sdl1  7345777988 3233438932 4112339056  45% /var/lib/ceph/osd/ceph-2
> /dev/sda1  7345777988 4484766028 2861011960  62% /var/lib/ceph/osd/ceph-3
> /dev/sdn1  7345777988 3344604424 4001173564  46% /var/lib/ceph/osd/ceph-1
> /dev/sdp1  7345777988 3897260808 3448517180  54% /var/lib/ceph/osd/ceph-0
> /dev/sdc1  7345777988 3029110220 4316667768  42% /var/lib/ceph/osd/ceph-16
> /dev/sde1  7345777988 2673181020 4672596968  37% /var/lib/ceph/osd/ceph-17
> /dev/sdg1  7345777988 3537932824 3807845164  49% /var/lib/ceph/osd/ceph-24
> user@host-1:~$
>
> On Fri, Jul 31, 2015 at 3:53 PM, John Spray  wrote:
>>
>>
>> On 31/07/15 09:47, Mallikarjun Biradar wrote:
>>>
>>> For a moment it de-list removed OSD's and after sometime it again
>>> comes up in ceph osd tree listing.
>>>
>>
>> Is the OSD service itself definitely stopped?  Are you using any
>> orchestration systems (puppet, chef) that might be re-creating its auth key
>> etc?
>>
>> John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Check networking first?

2015-07-31 Thread Mark Nelson

On 07/31/2015 05:21 AM, John Spray wrote:

On 31/07/15 06:27, Stijn De Weirdt wrote:

wouldn't it be nice that ceph does something like this in background
(some sort of network-scrub). debugging network like this is not that
easy (can't expect admins to install e.g. perfsonar on all nodes
and/or clients)

something like: every X min, each service X pick a service Y on
another host (assuming X and Y will exchange some communication at
some point; like osd with other osd), send 1MB of data, and make the
timing data available so we can monitor it and detect underperforming
links over time.

ideally clients also do this, but not sure where they should
report/store the data.

interpreting the data can be a bit tricky, but extreme outliers will
be spotted easily, and the main issue with this sort of debugging is
collecting the data.

simply reporting / keeping track of ongoing communications is already
a big step forward, but then we need to have the size of the exchanged
data to allow interpretation (and the timing should be about the
network part, not e.g. flush data to disk in case of an osd). (and
obviously sampling is enough, no need to have details of every bit send).


Yes, it's a reasonable concept, although it's not clear that we'd
necessarily want it built into existing ceph services.  For example,
where there are several OSDs running on a host, we don't really want all
the OSDs redundantly verifying that particular host's network
functionality.  This use case is a pretty good argument for a ceph
supervisor service of some kind that exists on a one-per-host basis. The
trick is finding someone with time to write it :-)

The prior art here is Lustres LNET self test (LST) which exists for
exactly these reasons (Mark will have memories of this too I'm sure).


Haha, yes!  So I think something written to use the messenger might be 
the right way to go.  I'll probably build all-to-all and one-to-all 
tests using iperf into CBT at some point as a hold over since I've got a 
couple of simple scripts that do that already.  It'd probably be fairly 
easy to implement.
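
A rough sketch of such an all-to-all sweep (not the CBT implementation; hostnames are placeholders, and it assumes passwordless ssh and iperf installed on every node):

HOSTS="host-1 host-2 host-3 host-6"
for h in $HOSTS; do ssh "$h" "iperf -s -D"; done     # start an iperf server on every node
for src in $HOSTS; do
  for dst in $HOSTS; do
    [ "$src" = "$dst" ] && continue
    echo "== $src -> $dst =="
    ssh "$src" "iperf -c $dst -t 5 -f m" | tail -1   # keep just the bandwidth summary line
  done
done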




John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Elastic-sized RBD planned?

2015-07-31 Thread pixelfairy
RBD is already thin provisioned. When you set its size, you're setting the
maximum size. It's explained here:
http://ceph.com/docs/master/rbd/rados-rbd-cmds/
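
A quick illustration (the pool and image names are made up):

rbd create --size 102400 rbd/bigimage   # 100 GB virtual size, but almost nothing is allocated yet
rbd diff rbd/bigimage | awk '{ sum += $2 } END { print sum/1024/1024 " MB actually allocated" }'
rbd resize --size 204800 rbd/bigimage   # the virtual size can be grown later; the filesystem inside still has to be resized separately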

On Thu, Jul 30, 2015 at 12:04 PM Robert LeBlanc 
wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> I'll take a stab at this.
>
> I don't think it will be a feature that you will find in Ceph due to
> the fact that Ceph doesn't really understand what is going on inside
> the RBD. There are too many technologies that can use RBD that it is
> not feasible to try and support something like this.
>
> You can however have a service that runs in your VM which monitors
> free space. Then the free space gets too low, it can call to a service
> you write which will then expand the RBD on the fly and then the VM
> itself and resize the partitions and the file system after the RBD is
> expanded.
>
> I'm not sure how CephFS fits into all of this as it is different than
> RBD. CephFS is to NFS as RBD is to SAN block storage.
>
> If I misunderstood the question, please clarify.
> -BEGIN PGP SIGNATURE-
> Version: Mailvelope v0.13.1
> Comment: https://www.mailvelope.com
>
> wsFcBAEBCAAQBQJVunUCCRDmVDuy+mK58QAAwGcP/0Y5gz4flZ0XuaXdKlez
> iogX9QsAoPk9s8fd0vpc3Prlhx/YODgkqqm35iJh5PaDEqb6njZwe3CR0WBh
> mba6xAjSGe9D8dzkLP5cCvRSlkVexddAfj5K/M+JkjWyQhlq4TQcu3CSBo8Q
> 1pyI6dDwWNl8ScCu4/PN4Bl2OD9favEs8tXjNJ5/mhZWFjSN7t8/LLqsUNu6
> AlMp2hFFfb1f0ky7EI/Hg3uw0+BbGDn/N0oxDyZqH7Wqnp/5L4kRAkZkXsGh
> T7a4KxReNu/5eg2Tef83h7AAeRTSDTEw+38ToGnOIGpXYCxKdWF878Xs7tVa
> +jPUds7aSNazaRB9nSPYiIxXBcTRKN2VFqhFNyQ/6CrEvsjkZQKkQfjJgNtL
> eg0hmjS1X0QryapX7xhfz+Apx369Pkitm8UosyPIwEPnMuVqwVN5VwDTDkub
> FlGNHX+b1/NDgZDpWF+b5gOErHMW8kWRNt/+2i5pXj0ZjDADmrQn+Hd9G0Hx
> g1dot64vLogcvcyt0C+fLicF9xlddU/Zuz7VZLyIOH1KSVhABK1RaI8+Zws6
> ZWriDFal0ztd0BNEQlCqtlo4hyY/AVies9qB6V4sUL0UuEL/+iTj71/VNh09
> RJIURK6KySwxtW97pMGGafw5xPOjpxnm75D6AmovZ6WV68GSxlNpTY/V7CPH
> dW/U
> =h5n5
> -END PGP SIGNATURE-
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Jul 29, 2015 at 11:32 PM, Shneur Zalman Mattern
>  wrote:
> > Hi to all!
> >
> >
> > Perhaps, somebody already thought about, but my Googling had no results.
> >
> >
> > How can I do RBD that will grow on demand of VM/client disk space.
> >
> > Are there in Ceph some options for this?
> >
> > Is it planned to do?
> >
> > Is it utopic idea?
> >
> > Is this client need CephFS already?
> >
> >
> > Thanks,
> >
> > Shneur
> >
> >
> >
> >
> >
> >
> 
> > This footnote confirms that this email message has been scanned by
> > PineApp Mail-SeCure for the presence of malicious code, vandals &
> computer
> > viruses.
> >
> 
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Elastic-sized RBD planned?

2015-07-31 Thread pixelfairy
also, you probably want to reclaim unused space when you delete files.
http://ceph.com/docs/master/rbd/qemu-rbd/#enabling-discard-trim
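
As a sketch, once the virtual disk is exposed with discard support (e.g. discard=unmap on the qemu drive, as the link above describes), the guest can hand freed space back like this:

sudo fstrim -v /   # one-off trim; tells the block layer which blocks are free so rbd can drop the backing objects
# or mount the filesystem with -o discard for continuous (but slower) reclaim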

On Fri, Jul 31, 2015 at 3:54 AM pixelfairy  wrote:

> rbd is already thin provisioned. when you set its size, your setting the
> maximum size. its explained here,
> http://ceph.com/docs/master/rbd/rados-rbd-cmds/
>
> On Thu, Jul 30, 2015 at 12:04 PM Robert LeBlanc 
> wrote:
>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> I'll take a stab at this.
>>
>> I don't think it will be a feature that you will find in Ceph due to
>> the fact that Ceph doesn't really understand what is going on inside
>> the RBD. There are too many technologies that can use RBD that it is
>> not feasible to try and support something like this.
>>
>> You can however have a service that runs in your VM which monitors
>> free space. Then the free space gets too low, it can call to a service
>> you write which will then expand the RBD on the fly and then the VM
>> itself and resize the partitions and the file system after the RBD is
>> expanded.
>>
>> I'm not sure how CephFS fits into all of this as it is different than
>> RBD. CephFS is to NFS as RBD is to SAN block storage.
>>
>> If I misunderstood the question, please clarify.
>> -BEGIN PGP SIGNATURE-
>> Version: Mailvelope v0.13.1
>> Comment: https://www.mailvelope.com
>>
>> wsFcBAEBCAAQBQJVunUCCRDmVDuy+mK58QAAwGcP/0Y5gz4flZ0XuaXdKlez
>> iogX9QsAoPk9s8fd0vpc3Prlhx/YODgkqqm35iJh5PaDEqb6njZwe3CR0WBh
>> mba6xAjSGe9D8dzkLP5cCvRSlkVexddAfj5K/M+JkjWyQhlq4TQcu3CSBo8Q
>> 1pyI6dDwWNl8ScCu4/PN4Bl2OD9favEs8tXjNJ5/mhZWFjSN7t8/LLqsUNu6
>> AlMp2hFFfb1f0ky7EI/Hg3uw0+BbGDn/N0oxDyZqH7Wqnp/5L4kRAkZkXsGh
>> T7a4KxReNu/5eg2Tef83h7AAeRTSDTEw+38ToGnOIGpXYCxKdWF878Xs7tVa
>> +jPUds7aSNazaRB9nSPYiIxXBcTRKN2VFqhFNyQ/6CrEvsjkZQKkQfjJgNtL
>> eg0hmjS1X0QryapX7xhfz+Apx369Pkitm8UosyPIwEPnMuVqwVN5VwDTDkub
>> FlGNHX+b1/NDgZDpWF+b5gOErHMW8kWRNt/+2i5pXj0ZjDADmrQn+Hd9G0Hx
>> g1dot64vLogcvcyt0C+fLicF9xlddU/Zuz7VZLyIOH1KSVhABK1RaI8+Zws6
>> ZWriDFal0ztd0BNEQlCqtlo4hyY/AVies9qB6V4sUL0UuEL/+iTj71/VNh09
>> RJIURK6KySwxtW97pMGGafw5xPOjpxnm75D6AmovZ6WV68GSxlNpTY/V7CPH
>> dW/U
>> =h5n5
>> -END PGP SIGNATURE-
>> 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jul 29, 2015 at 11:32 PM, Shneur Zalman Mattern
>>  wrote:
>> > Hi to all!
>> >
>> >
>> > Perhaps, somebody already thought about, but my Googling had no results.
>> >
>> >
>> > How can I do RBD that will grow on demand of VM/client disk space.
>> >
>> > Are there in Ceph some options for this?
>> >
>> > Is it planned to do?
>> >
>> > Is it utopic idea?
>> >
>> > Is this client need CephFS already?
>> >
>> >
>> > Thanks,
>> >
>> > Shneur
>> >
>> >
>> >
>> >
>> >
>> >
>> 
>> > This footnote confirms that this email message has been scanned by
>> > PineApp Mail-SeCure for the presence of malicious code, vandals &
>> computer
>> > viruses.
>> >
>> 
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] problem with RGW

2015-07-31 Thread Butkeev Stas
Hello everybody

We have a ceph cluster that consists of 8 hosts with 12 OSDs per host. They are
2T SATA disks.

[13:23]:[root@se087  ~]# ceph osd tree
ID  WEIGHT    TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 182.99203 root default
 -2 182.99203     region RU
 -3  91.49487         datacenter ru-msk-comp1p
 -9  22.87500             host 1
 48   1.90599                 osd.48           up  1.0  1.0
 49   1.90599                 osd.49           up  1.0  1.0
 50   1.90599                 osd.50           up  1.0  1.0
 51   1.90599                 osd.51           up  1.0  1.0
 52   1.90599                 osd.52           up  1.0  1.0
 53   1.90599                 osd.53           up  1.0  1.0
 54   1.90599                 osd.54           up  1.0  1.0
 55   1.90599                 osd.55           up  1.0  1.0
 56   1.90599                 osd.56           up  1.0  1.0
 57   1.90599                 osd.57           up  1.0  1.0
 58   1.90599                 osd.58           up  1.0  1.0
 59   1.90599                 osd.59           up  1.0  1.0
-10  22.87216             host 2
 60   1.90599                 osd.60           up  1.0  1.0
 61   1.90599                 osd.61           up  1.0  1.0
 62   1.90599                 osd.62           up  1.0  1.0
 63   1.90599                 osd.63           up  1.0  1.0
 64   1.90599                 osd.64           up  1.0  1.0
 65   1.90599                 osd.65           up  1.0  1.0
 66   1.90599                 osd.66           up  1.0  1.0
 67   1.90599                 osd.67           up  1.0  1.0
 69   1.90599                 osd.69           up  1.0  1.0
 70   1.90599                 osd.70           up  1.0  1.0
 71   1.90599                 osd.71           up  1.0  1.0
 68   1.90627                 osd.68           up  1.0  1.0
-11  22.87500             host 3
 72   1.90599                 osd.72           up  1.0  1.0
 73   1.90599                 osd.73           up  1.0  1.0
 74   1.90599                 osd.74           up  1.0  1.0
 75   1.90599                 osd.75           up  1.0  1.0
 76   1.90599                 osd.76           up  1.0  1.0
 77   1.90599                 osd.77           up  1.0  1.0
 78   1.90599                 osd.78           up  1.0  1.0
 79   1.90599                 osd.79           up  1.0  1.0
 80   1.90599                 osd.80           up  1.0  1.0
 81   1.90599                 osd.81           up  1.0  1.0
 82   1.90599                 osd.82           up  1.0  1.0
 83   1.90599                 osd.83           up  1.0  1.0
-12  22.87271             host 4
 84   1.90599                 osd.84           up  1.0  1.0
 86   1.90599                 osd.86           up  1.0  1.0
 89   1.90599                 osd.89           up  1.0  1.0
 90   1.90599                 osd.90           up  1.0  1.0
 91   1.90599                 osd.91           up  1.0  1.0
 92   1.90599                 osd.92           up  1.0  1.0
 93   1.90599                 osd.93           up  1.0  1.0
 94   1.90599                 osd.94           up  1.0  1.0
 95   1.90599                 osd.95           up  1.0  1.0
 85   1.90627                 osd.85           up  1.0  1.0
 88   1.90627                 osd.88           up  1.0  1.0
 87   1.90627                 osd.87           up  1.0  1.0
 -4  91.49716         datacenter ru-msk-vol51
 -5  22.87216             host 5
  1   1.90599                 osd.1            up  1.000

[ceph-users] update docs? just mounted a format2 rbd image with client 0.80.8 server 0.87.2

2015-07-31 Thread pixelfairy
according to http://ceph.com/docs/master/rbd/rbd-snapshot/#layering,
you have two choices,

format 1: you can mount with rbd kernel module
format 2: you can clone

just mapped and mounted this image:
rbd image 'vm-101-disk-2':
        size 5120 MB in 1280 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.11e62ae8944a
        format: 2
        features: layering
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD startup causing slow requests - one tip from me

2015-07-31 Thread Georgios Dimitrakakis

Jan,

this is very handy to know! Thanks for sharing with us!

People, do you believe that it would be nice to have a place where we 
can gather either good practices or problem resolutions or tips from the 
community? We could have a voting system and those with the most votes 
(or above a threshold) could appear there.


Regards,

George


I know a few other people here were battling with the occasional
issue of OSD being extremely slow when starting.

I personally run OSDs mixed with KVM guests on the same nodes, and
was baffled by this issue occuring mostly on the most idle (empty)
machines.
Thought it was some kind of race condition where OSD started too fast
and disks couldn’t catch up, was investigating latency of CPUs and
cards on a mostly idle hardware etc. - with no improvement.

But in the end, most of my issues were caused by page cache using too
much memory. This doesn’t cause any problems when the OSDs have their
memory allocated and are running, but when the OSD is (re)started, OS
struggles to allocate contiguous blocks of memory for it and its
buffers.
This could also be why I’m seeing such an improvement with my NUMA
pinning script - cleaning memory on one node is probably easier and
doesn’t block allocations on other nodes.

How can you tell if this is your case? When restarting an OSD that
has this issue, look for CPU usage of “kswapd” processes. If it is >0
then you have this issue and would benefit from setting this:

for i in $(mount |grep "ceph/osd" |cut -d' ' -f1 |cut -d'/' -f3 |tr
-d '[0-9]') ; do echo 1 >/sys/block/$i/bdi/max_ratio ; done
(another option is echo 1 > drop_caches before starting the OSD, but
that’s a bit brutal)

What this does is it limits the pagecache size for each block device
to 1% of physical memory. I’d like to limit it even further but it
doesn’t understand “0.3”...

Let me know if it helps, I’ve not been able to test if this cures the
problem completely, but there was no regression after setting it.

Jan

P.S. This is for RHEL 6 / CentOS 6 ancient 2.6.32 kernel, newer
kernels have tunables to limit the overall pagecache size. You can
also set the limits in cgroups but I’m afraid that won’t help in this
case as you can only set the whole memory footprint limit where it
will battle for allocations anyway.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?

2015-07-31 Thread Sage Weil
On Fri, 31 Jul 2015, Alexandre DERUMIER wrote:
> >>As I still haven't heard or seen about any upstream distros for Debian 
> >>Jessie (see also [1]),
> 
> Gitbuilder is already done for jessie
> 
> http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/
> 
> @Sage: I don't know if something is blocking an official package release?

I don't think there are specific blockers except Alfredo's time.  We need 
to set up another release build target machine in Jenkins.  Alfredo's been 
rearchitecting the release build process and I've been hoping he can avoid 
touching the old ball of duct tape and add Jessie to the new hotness, but 
I'm not sure what the timeline looks like... Alfredo?

sage


> 
> 
> 
> - Original Message -
> From: "Brian Kroth" 
> To: "Sage Weil" 
> Cc: "ceph-devel" , "ceph-users" 
> 
> Sent: Thursday, 30 July 2015 17:58:12
> Subject: Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?
> 
> Sage Weil  2015-07-30 06:54: 
> >As time marches on it becomes increasingly difficult to maintain proper 
> >builds and packages for older distros. For example, as we make the 
> >systemd transition, maintaining the kludgey sysvinit and udev support for 
> >centos6/rhel6 is a pain in the butt and eats up time and energy to 
> >maintain and test that we could be spending doing more useful work. 
> > 
> >"Dropping" them would mean: 
> > 
> > - Ongoing development on master (and future versions like infernalis and 
> >jewel) would not be tested on these distros. 
> > 
> > - We would stop building upstream release packages on ceph.com for new 
> >releases. 
> > 
> > - We would probably continue building hammer and firefly packages for 
> >future bugfix point releases. 
> > 
> > - The downstream distros would probably continue to package them, but the 
> >burden would be on them. For example, if Ubuntu wanted to ship Jewel on 
> >precise 12.04, they could, but they'd probably need to futz with the 
> >packaging and/or build environment to make it work. 
> > 
> >So... given that, I'd like to gauge user interest in these old distros. 
> >Specifically, 
> > 
> > CentOS6 / RHEL6 
> > Ubuntu precise 12.04 
> > Debian wheezy 
> > 
> >Would anyone miss them? 
> > 
> >In particular, dropping these three would mean we could drop sysvinit 
> >entirely and focus on systemd (and continue maintaining the existing 
> >upstart files for just a bit longer). That would be a relief. (The 
> >sysvinit files wouldn't go away in the source tree, but we wouldn't worry 
> >about packaging and testing them properly.) 
> > 
> >Thanks! 
> >sage 
> 
> As I still haven't heard or seen about any upstream distros for Debian 
> Jessie (see also [1]), I am still running Debian Wheezy and as that is 
> supposed to be supported for another ~4 years by Debian, it would be 
> very nice if there were at least stability and security fixes backported 
> for the upstream ceph package repositories for that platform. 
> 
> Additionally, I'll note that I'm personally likely to continue to use 
> sysvinit so long as I still can, even when I am able to make the switch 
> to Jessie. 
> 
> Thanks, 
> Brian 
> 
> [1]  
> 
> ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] update docs? just mounted a format2 rbd image with client 0.80.8 server 0.87.2

2015-07-31 Thread Ilya Dryomov
On Fri, Jul 31, 2015 at 2:21 PM, pixelfairy  wrote:
> according to http://ceph.com/docs/master/rbd/rbd-snapshot/#layering,
> you have two choices,
>
> format 1: you can mount with rbd kernel module
> format 2: you can clone
>
> just mapped and mounted a this image,
> rbd image 'vm-101-disk-2': size 5120 MB in 1280 objects order 22 (4096 kB
> objects) block_name_prefix: rbd_data.11e62ae8944a format: 2 features:
> layering

Yeah, I'll fix it.  You can clone from format 2 images only, but you
can map both format 1 and format 2, provided the format 2 image was
created with the default striping pattern (i.e. stripe_unit ==
object_size, stripe_count == 1).
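
For example (the image name is reused from the report above, the rest is a sketch):

rbd create --size 5120 --image-format 2 rbd/vm-101-disk-2   # default striping: stripe_unit == object_size, stripe_count == 1
sudo rbd map rbd/vm-101-disk-2                              # maps with the kernel client as long as only the layering feature is set
rbd info rbd/vm-101-disk-2                                  # shows format: 2, features: layering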

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] update docs? just mounted a format2 rbd image with client 0.80.8 server 0.87.2

2015-07-31 Thread Shinobu Kinjo
Thanks for your quick action!!

 - Shinobu

On Fri, Jul 31, 2015 at 11:01 PM, Ilya Dryomov  wrote:

> On Fri, Jul 31, 2015 at 2:21 PM, pixelfairy  wrote:
> > according to http://ceph.com/docs/master/rbd/rbd-snapshot/#layering,
> > you have two choices,
> >
> > format 1: you can mount with rbd kernel module
> > format 2: you can clone
> >
> > just mapped and mounted a this image,
> > rbd image 'vm-101-disk-2': size 5120 MB in 1280 objects order 22 (4096 kB
> > objects) block_name_prefix: rbd_data.11e62ae8944a format: 2 features:
> > layering
>
> Yeah, I'll fix it.  You can clone from format 2 images only, but you
> can map both format 1 and format 2, provided the format 2 image was
> created with the default striping pattern (i.e. stripe_unit ==
> object_size, stripe_count == 1).
>
> Thanks,
>
> Ilya
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Email:
 shin...@linux.com
 ski...@redhat.com

 Life w/ Linux 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?

2015-07-31 Thread Brian Kroth
That's good to hear.  Thanks for the heads up.  We're going to be 
getting another pile of hardware in the next couple of weeks and I'd 
prefer to not have to start with Wheezy just to have to move to Jessie a 
little bit later on.  As someone said earlier, OS rollouts take some care 
to do in large environments.  In the meantime I'll just keep refreshing 
http://ceph.com/debian/dists/ every couple of days :)


Cheers,
Brian

Alexandre DERUMIER  2015-07-31 09:21:

As I still haven't heard or seen about any upstream distros for Debian
Jessie (see also [1]),


Gitbuilder is already done for jessie

http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/

@Sage: I don't know if something is blocking an official package release?



- Original Message -
From: "Brian Kroth" 
To: "Sage Weil" 
Cc: "ceph-devel" , "ceph-users" 

Sent: Thursday, 30 July 2015 17:58:12
Subject: Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?

Sage Weil  2015-07-30 06:54:

As time marches on it becomes increasingly difficult to maintain proper
builds and packages for older distros. For example, as we make the
systemd transition, maintaining the kludgey sysvinit and udev support for
centos6/rhel6 is a pain in the butt and eats up time and energy to
maintain and test that we could be spending doing more useful work.

"Dropping" them would mean:

- Ongoing development on master (and future versions like infernalis and
jewel) would not be tested on these distros.

- We would stop building upstream release packages on ceph.com for new
releases.

- We would probably continue building hammer and firefly packages for
future bugfix point releases.

- The downstream distros would probably continue to package them, but the
burden would be on them. For example, if Ubuntu wanted to ship Jewel on
precise 12.04, they could, but they'd probably need to futz with the
packaging and/or build environment to make it work.

So... given that, I'd like to gauge user interest in these old distros.
Specifically,

CentOS6 / RHEL6
Ubuntu precise 12.04
Debian wheezy

Would anyone miss them?

In particular, dropping these three would mean we could drop sysvinit
entirely and focus on systemd (and continue maintaining the existing
upstart files for just a bit longer). That would be a relief. (The
sysvinit files wouldn't go away in the source tree, but we wouldn't worry
about packaging and testing them properly.)

Thanks!
sage


As I still haven't heard or seen about any upstream distros for Debian
Jessie (see also [1]), I am still running Debian Wheezy and as that is
supposed to be supported for another ~4 years by Debian, it would be
very nice if there were at least stability and security fixes backported
for the upstream ceph package repositories for that platform.

Additionally, I'll note that I'm personally likely to continue to use
sysvinit so long as I still can, even when I am able to make the switch
to Jessie.

Thanks,
Brian

[1] 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] OSD startup causing slow requests - one tip from me

2015-07-31 Thread Haomai Wang
On Fri, Jul 31, 2015 at 5:47 PM, Jan Schermer  wrote:
> I know a few other people here were battling with the occasional issue of OSD 
> being extremely slow when starting.
>
> I personally run OSDs mixed with KVM guests on the same nodes, and was 
> baffled by this issue occuring mostly on the most idle (empty) machines.
> Thought it was some kind of race condition where OSD started too fast and 
> disks couldn’t catch up, was investigating latency of CPUs and cards on a 
> mostly idle hardware etc. - with no improvement.
>
> But in the end, most of my issues were caused by page cache using too much 
> memory. This doesn’t cause any problems when the OSDs have their memory 
> allocated and are running, but when the OSD is (re)started, OS struggles to 
> allocate contiguous blocks of memory for it and its buffers.
> This could also be why I’m seeing such an improvement with my NUMA pinning 
> script - cleaning memory on one node is probably easier and doesn’t block 
> allocations on other nodes.
>

Although this makes sense to me, I am still shocked by the fact that freeing
page cache or memory fragmentation can cause slow requests!

> How can you tell if this is your case? When restarting an OSD that has this 
> issue, look for CPU usage of “kswapd” processes. If it is >0 then you have 
> this issue and would benefit from setting this:
>
> for i in $(mount |grep "ceph/osd" |cut -d' ' -f1 |cut -d'/' -f3 |tr -d 
> '[0-9]') ; do echo 1 >/sys/block/$i/bdi/max_ratio ; done
> (another option is echo 1 > drop_caches before starting the OSD, but that’s a 
> bit brutal)
>
> What this does is it limits the pagecache size for each block device to 1% of 
> physical memory. I’d like to limit it even further but it doesn’t understand 
> “0.3”...
>
> Let me know if it helps, I’ve not been able to test if this cures the problem 
> completely, but there was no regression after setting it.
>
> Jan
>
> P.S. This is for RHEL 6 / CentOS 6 ancient 2.6.32 kernel, newer kernels have 
> tunables to limit the overall pagecache size. You can also set the limits in 
> cgroups but I’m afraid that won’t help in this case as you can only set the 
> whole memory footprint limit where it will battle for allocations anyway.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados bench multiple clients error

2015-07-31 Thread Kenneth Waegeman

Hi,

I was trying rados bench, and first wrote 250 objects from 14 hosts with 
 --no-cleanup. Then I ran the read tests from the same 14 hosts and ran 
into this:


[root@osd007 test]# /usr/bin/rados -p ectest bench 100 seq
2015-07-31 17:52:51.027872 7f6c40de17c0 -1 WARNING: the following 
dangerous and experimental features are enabled: keyvaluestore


   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
 0   0 0 0 0 0 - 0
read got -2
error during benchmark: -5
error 5: (5) Input/output error

The objects are there:
...
benchmark_data_osd011.gigalith.os_39338_object2820
benchmark_data_osd004.gigalith.os_142795_object3059
benchmark_data_osd001.gigalith.os_98375_object1182
benchmark_data_osd007.gigalith.os_20502_object2226
benchmark_data_osd008.gigalith.os_3059_object2183
benchmark_data_osd001.gigalith.os_94812_object1390
benchmark_data_osd010.gigalith.os_37614_object253
benchmark_data_osd011.gigalith.os_41998_object1093
benchmark_data_osd009.gigalith.os_90933_object1270
benchmark_data_osd010.gigalith.os_35614_object393
benchmark_data_osd009.gigalith.os_90933_object2611
benchmark_data_osd010.gigalith.os_35614_object2114
benchmark_data_osd013.gigalith.os_29915_object976
benchmark_data_osd014.gigalith.os_45604_object2497
benchmark_data_osd003.gigalith.os_147071_object1775
...


This works when only using 1 host..
Is there a way to run the benchmarks with multiple instances?

I'm looking to find what our performance problem is, and what the 
difference is between directly reading objects from the erasure coded 
pool and through the cache layer.


I tested reading large files that weren't in cache from 14 hosts through 
cephfs (cached files perform well enough) and got only 8 MB/s per stream, 
while our disks were hardly working (as seen in iostat).
So my next steps would be to run these tests through rados: first 
directly on the EC pool, and then on the cache pool. Does someone have an idea?
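
One thing I may try as a workaround (a sketch, pool names made up): give every client host its own pool so the runs cannot step on each other's benchmark objects and metadata:

for h in osd001 osd002 osd003; do ceph osd pool create bench_$h 128; done   # one pool per client host (pg count is just an example)
# then on each client host:
rados -p bench_$(hostname -s) bench 100 write --no-cleanup
rados -p bench_$(hostname -s) bench 100 seq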



Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD startup causing slow requests - one tip from me

2015-07-31 Thread Jan Schermer

> On 31 Jul 2015, at 17:28, Haomai Wang  wrote:
> 
> On Fri, Jul 31, 2015 at 5:47 PM, Jan Schermer  wrote:
>> I know a few other people here were battling with the occasional issue of 
>> OSD being extremely slow when starting.
>> 
>> I personally run OSDs mixed with KVM guests on the same nodes, and was 
>> baffled by this issue occuring mostly on the most idle (empty) machines.
>> Thought it was some kind of race condition where OSD started too fast and 
>> disks couldn’t catch up, was investigating latency of CPUs and cards on a 
>> mostly idle hardware etc. - with no improvement.
>> 
>> But in the end, most of my issues were caused by page cache using too much 
>> memory. This doesn’t cause any problems when the OSDs have their memory 
>> allocated and are running, but when the OSD is (re)started, OS struggles to 
>> allocate contiguous blocks of memory for it and its buffers.
>> This could also be why I’m seeing such an improvement with my NUMA pinning 
>> script - cleaning memory on one node is probably easier and doesn’t block 
>> allocations on other nodes.
>> 
> 
> Although this is make sense to me. It still let me shocked by the fact
> that pagecache free or memory fragmentation will cause slow request!

“Fragmentation” may be an inaccurate description. I know that it is an issue for 
atomic kernel allocations (DMA, driver buffers…) where it can lead to memory 
starvation even though “free” shows tens of gigabytes free. This manifests in 
the same way except there’s no “page allocation failure” in dmesg when this 
happens, probably because there are no strict deadlines for satisfying userland 
requests.
And although I’ve looked into it, I can’t say I am 100% sure my explanation is 
correct. 
Anyway if you start a process that needs ~2GB resident memory to work, you need 
to clean 2GB of pagecache - either by dropping the clean pages or by writing 
out dirty pages. And writing those pages under pressure while being bombarded 
by new allocations is not that fast.

RH kernels are quite twisted with backported features that shouldn’t be there, 
some features that are no longer anywhere else, and their interaction and 
function is often unclear so that might be another issue in my case.
As a side note, did you know that barriers don’t actually exist on RH (6) 
kernels? They replaced it with FUA backport… So does it actually behave the 
same way as newer kernels do? I can take a guess from what I’ve seen… :-)

Have a nice weekend

Jan

> 
>> How can you tell if this is your case? When restarting an OSD that has this 
>> issue, look for CPU usage of “kswapd” processes. If it is >0 then you have 
>> this issue and would benefit from setting this:
>> 
>> for i in $(mount |grep "ceph/osd" |cut -d' ' -f1 |cut -d'/' -f3 |tr -d 
>> '[0-9]') ; do echo 1 >/sys/block/$i/bdi/max_ratio ; done
>> (another option is echo 1 > drop_caches before starting the OSD, but that’s 
>> a bit brutal)
>> 
>> What this does is it limits the pagecache size for each block device to 1% 
>> of physical memory. I’d like to limit it even further but it doesn’t 
>> understand “0.3”...
>> 
>> Let me know if it helps, I’ve not been able to test if this cures the 
>> problem completely, but there was no regression after setting it.
>> 
>> Jan
>> 
>> P.S. This is for RHEL 6 / CentOS 6 ancient 2.6.32 kernel, newer kernels have 
>> tunables to limit the overall pagecache size. You can also set the limits in 
>> cgroups but I’m afraid that won’t help in this case as you can only set the 
>> whole memory footprint limit where it will battle for allocations anyway.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Best Regards,
> 
> Wheat

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] some basic concept questions

2015-07-31 Thread Charley Guan

dear Ceph experts;

I am pretty new to the Ceph project, and we are working on a management
infrastructure using Ceph / Calamari as our storage resource.

I have some basic questions:

1) what is the purpose of installing and configuring salt-master and
salt-minion in Ceph environment?

Is it true that salt-master is installed on the Calamari master machine and
calamari-minion is configured on the Ceph cluster nodes?


2) What is the functional difference between a Ceph admin node and the Calamari
master? Both do management and monitoring.


Thanks a lot for your help,

Charley
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD removal is not cleaning entry from osd listing

2015-07-31 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

I usually do the crush rm step second to last. I don't know if your
modifying the osd after removing it from the CRUSH is putting it back
in.
1. Stop the OSD process
2. ceph osd rm <id>
3. ceph osd crush rm osd.<id>
4. ceph auth del osd.<id>
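
Spelled out as commands (a sketch using osd.20 from this thread; the stop command depends on your init system):

sudo stop ceph-osd id=20        # 1. stop the OSD process (upstart syntax; sysvinit would be /etc/init.d/ceph stop osd.20)
ceph osd rm 20                  # 2. remove it from the osdmap
ceph osd crush rm osd.20        # 3. remove it from the CRUSH map
ceph auth del osd.20            # 4. delete its auth key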

Can you try the crush rm command again for kicks and giggles?

-BEGIN PGP SIGNATURE-
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJVu6u+CRDmVDuy+mK58QAAe8sP/2FCIN7Aufifp0BA8vGu
k9qJiCxwq59t/ucTWmb1iJo0wxyWtElImIs72b+f7bcZfuds2IU9jPys0AJJ
83AairDCcmTD8f8X+IuFF3jG2L3pt1SBB2I1fpxjvaDHCjZVsB8EHFixjadM
DtxY0UDocU8gfVFNA2OWguqvu1tphsZ6p2muZehZZ7AZIvFyi8Ls7IZD5kGf
wmXL3Omv0q/b9Es8NXXk3OwwThxp5lYLz2RkNoe6ThXd4R65uaNL/iZt9RvD
Xtsjgik9sT9L/jXieY6kPG0IumuiYivJkswy1SnWyPeRPF/yTTzSAKC2cx6D
KMfBNwqxYIx5BymVFu7k38clY64U9uIhqbaW7VujvQ/Bs0/1ERv1mltoajZb
1fS8s75xpWPf5W2B80rg361ukExzH5y+X+fZvVjbcKDBE8GECN9T0oy3YNM0
C7S/YRkJr5yr0/scaL7Z5nrq2/MLgJHF2bK1y25SGDkdjm5d2YpF0LMeT8Gp
MIpKDA0LJnznEs5YkIa7u6NkWhQ3netiNJkC8XOlr5NYrBfDQlVrkDtiJPHl
GGoIk/vPuDWNp2x0g2rAbRLS61zSi2Oo1D6PNFa6cFU9/QW8cGWZ8zGDOf+C
GepwY8UHA0uDJv31IOWvsTABPvI7D1I3rimkBZU72QYbrrS8/uu/hZEQwF5k
Ltce
=hyW9
-END PGP SIGNATURE-

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Jul 31, 2015 at 1:15 AM, Mallikarjun Biradar
 wrote:
> Hi,
>
> I had 27 OSD's in my cluster. I removed two of the OSD from (osd.20)
> host-3 & (osd.22) host-6.
>
> user@host-1:~$ sudo ceph osd tree
> ID WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 184.67990 root default
> -7  82.07996 chassis chassis2
> -4  41.03998 host host-3
>  8   6.84000 osd.8  up  1.0  1.0
>  9   6.84000 osd.9  up  1.0  1.0
> 10   6.84000 osd.10 up  1.0  1.0
> 11   6.84000 osd.11 up  1.0  1.0
> 20   6.84000 osd.20 up  1.0  1.0
> 21   6.84000 osd.21 up  1.0  1.0
> -5  41.03998 host host-6
> 12   6.84000 osd.12 up  1.0  1.0
> 13   6.84000 osd.13 up  1.0  1.0
> 14   6.84000 osd.14 up  1.0  1.0
> 15   6.84000 osd.15 up  1.0  1.0
> 22   6.84000 osd.22 up  1.0  1.0
> 23   6.84000 osd.23 up  1.0  1.0
> -6 102.59995 chassis chassis1
> -2  47.87997 host host-1
>  0   6.84000 osd.0  up  1.0  1.0
>  1   6.84000 osd.1  up  1.0  1.0
>  2   6.84000 osd.2  up  1.0  1.0
>  3   6.84000 osd.3  up  1.0  1.0
> 16   6.84000 osd.16 up  1.0  1.0
> 17   6.84000 osd.17 up  1.0  1.0
> 24   6.84000 osd.24 up  1.0  1.0
> -3  54.71997 host host-2
>  4   6.84000 osd.4  up  1.0  1.0
>  5   6.84000 osd.5  up  1.0  1.0
>  6   6.84000 osd.6  up  1.0  1.0
>  7   6.84000 osd.7  up  1.0  1.0
> 18   6.84000 osd.18 up  1.0  1.0
> 19   6.84000 osd.19 up  1.0  1.0
> 25   6.84000 osd.25 up  1.0  1.0
> 26   6.84000 osd.26 up  1.0  1.0
> user@host-1:~$
>
> Steps used to remove OSD:
> user@host-1:~$ ceph auth del osd.20; ceph osd crush rm osd.20; ceph
> osd down osd.20; ceph osd rm osd.20
> updated
> removed item id 20 name 'osd.20' from crush map
> marked down osd.22.
> removed osd.22
>
> Removed both of OSD's osd.20 & osd.22
>
> But, even after removing them, ceph osd tree is listing deleted OSD's
> & ceph -s reporting total number of OSD's as 27.
>
> user@host-1:~$ sudo ceph osd tree
> ID WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 184.67990 root default
> -7  82.07996 chassis chassis2
> -4  41.03998 host host-3
>  8   6.84000 osd.8  up  1.0  1.0
>  9   6.84000 osd.9  up  1.0  1.0
> 10   6.84000 osd.10 up  1.0  1.0
> 11   6.84000 osd.11 up  1.0  1.0
> 21   6.84000 osd.21 up  1.0  1.0
> -5  41.03998 host host-6
> 12   6.84000 osd.12 up  1.0  1.0
> 13   6.84000 osd

Re: [ceph-users] Check networking first?

2015-07-31 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Even just a ping at max MTU set with nodefrag could tell a lot about
connectivity issues and latency without a lot of traffic. Using Ceph
messenger would be even better to check firewall ports. I like the
idea of incorporating simple network checks into Ceph. The monitor can
correlate failures and help determine if the problem is related to one
host from the CRUSH map.
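
A rough example of such a check, assuming a 9000-byte MTU and the Linux ping utility (the
host name is a placeholder):

ping -M do -s 8972 -c 3 <peer-host>   # -M do forbids fragmentation; 8972 = 9000 minus 28 bytes of IP+ICMP header
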
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Jul 30, 2015 at 11:27 PM, Stijn De Weirdt  wrote:
> wouldn't it be nice that ceph does something like this in background (some
> sort of network-scrub). debugging network like this is not that easy (can't
> expect admins to install e.g. perfsonar on all nodes and/or clients)
>
> something like: every X min, each service X pick a service Y on another host
> (assuming X and Y will exchange some communication at some point; like osd
> with other osd), send 1MB of data, and make the timing data available so we
> can monitor it and detect underperforming links over time.
>
> ideally clients also do this, but not sure where they should report/store
> the data.
>
> interpreting the data can be a bit tricky, but extreme outliers will be
> spotted easily, and the main issue with this sort of debugging is collecting
> the data.
>
> simply reporting / keeping track of ongoing communications is already a big
> step forward, but then we need to have the size of the exchanged data to
> allow interpretation (and the timing should be about the network part, not
> e.g. flush data to disk in case of an osd). (and obviously sampling is
> enough, no need to have details of every bit send).
>
>
>
> stijn
>
>
> On 07/30/2015 08:04 PM, Mark Nelson wrote:
>>
>> Thanks for posting this!  We see issues like this more often than you'd
>> think.  It's really important too because if you don't figure it out the
>> natural inclination is to blame Ceph! :)
>>
>> Mark
>>
>> On 07/30/2015 12:50 PM, Quentin Hartman wrote:
>>>
>>> Just wanted to drop a note to the group that I had my cluster go
>>> sideways yesterday, and the root of the problem was networking again.
>>> Using iperf I discovered that one of my nodes was only moving data at
>>> 1.7Mb / s. Moving that node to a different switch port with a different
>>> cable has resolved the problem. It took awhile to track down because
>>> none of the server-side error metrics for disk or network showed
>>> anything was amiss, and I didn't think to test network performance (as
>>> suggested in another thread) until well into the process.
>>>
>>> Check networking first!
>>>
>>> QH
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-BEGIN PGP SIGNATURE-
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJVu7QoCRDmVDuy+mK58QAAcpAQAKbv6xPRxMMJ8NWrXym0
NAtZFIYywvStKfTG2pL1xjb2p/xDM+6Z5mnYJTBHb+0dkGIO6qe0jF9t4XEE
ppH+55eIpkCZrKMdfN1L0vUe9ldFnJS2jsAlGkvzyRLJale++q1evymIAaWb
JnEZgV3pGrPTCRaVKNrT3NaGZVDLm6ygnsT6PYJaiXM8Av3equ00Uls2/i6v
vZhlIBz5TbKsNag/W7cRJVvjj7YDsgU+dplDl62mmDJ6o+cWvILlf9WPINdV
MrmIeg+7fqUEp8nuEzTMm+BDHQ3c/5cxrYr8bksiVoBTXV7m9fO0Je9Exn6N
iWTa5eDUBtR6Ha8WaVUib/cvFj6j94QRNWYmXHl9lG50p+XZ0L5bZ1G8v9Nb
gGxRoYgAncp9M1J+7Pvm5z8wZgxXAs/veUtrf+6SkUbGyCRnUSn/VS7C8syJ
4WW2aWP/A0nxSDe1u+TGpkkPmhk7UDrJEfMQaZrFwS9FkFLfgLH7PxMcAZjJ
hlN129vldPh3QxLviLidlJmzUTvKtb+XrSkA0MjhFMJS2M79DR16j+XWe7Ub
wPnKpZcZ8WsQzOlTHtDEHQvhE3ilcm+4oALSiuqEAZKNKk8lUTtvfzJ2BKyu
Tv46c+Wf3LbwrdMnkGiMHLuIlqhQT2FzauM2Pi+Pt7QJ7L9xXfWW4vzdemxj
bBQD
=rPC0
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Check networking first?

2015-07-31 Thread Jan Schermer
I remember reading that ScaleIO (I think?) does something like this by 
regularly sending reports to a multicast group, thus any node with issues (or 
just overload) is reweighted or avoided automatically on the client. OSD map is 
the Ceph equivalent I guess. It makes sense to gather metrics and prioritize 
better performing OSDs over those with e.g. worse latencies, but it needs to 
update fast. But I believe that _network_ monitoring itself ought to be part 
of… a network monitoring system you should already have :-) and not a storage 
system that just happens to use network. I don’t remember seeing anything but a 
simple ping/traceroute/dns test in any SAN interface. If an OSD has issues it 
might be anything from a failing drive to a swapping OS and a number like 
“commit latency” (= response time average from the clients’ perspective) is 
maybe the ultimate metric of all for this purpose, irrespective of the root 
cause.
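
(Not exactly the client-side number described above, but a quick proxy for per-OSD latency is
available straight from the cluster; exact columns vary by release:)

ceph osd perf   # prints fs_commit_latency / fs_apply_latency in ms for every OSD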

Nice option would be to read data from all replicas at once - this would of 
course increase load and cause all sorts of issues if abused, but if you have 
an app that absolutely-always-without-fail-must-get-data-ASAP then you could 
enable this in the client (and I think that would be an easy option to add). 
This is actually used in some systems. Harder part is to fail nicely when 
writing (like waiting only for the remote network buffers on 2 nodes to get the 
data instead of waiting for commit on all 3 replicas…)

Jan

> On 31 Jul 2015, at 19:45, Robert LeBlanc  wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> Even just a ping at max MTU set with nodefrag could tell a lot about
> connectivity issues and latency without a lot of traffic. Using Ceph
> messenger would be even better to check firewall ports. I like the
> idea of incorporating simple network checks into Ceph. The monitor can
> correlate failures and help determine if the problem is related to one
> host from the CRUSH map.
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Thu, Jul 30, 2015 at 11:27 PM, Stijn De Weirdt  wrote:
>> wouldn't it be nice that ceph does something like this in background (some
>> sort of network-scrub). debugging network like this is not that easy (can't
>> expect admins to install e.g. perfsonar on all nodes and/or clients)
>> 
>> something like: every X min, each service X pick a service Y on another host
>> (assuming X and Y will exchange some communication at some point; like osd
>> with other osd), send 1MB of data, and make the timing data available so we
>> can monitor it and detect underperforming links over time.
>> 
>> ideally clients also do this, but not sure where they should report/store
>> the data.
>> 
>> interpreting the data can be a bit tricky, but extreme outliers will be
>> spotted easily, and the main issue with this sort of debugging is collecting
>> the data.
>> 
>> simply reporting / keeping track of ongoing communications is already a big
>> step forward, but then we need to have the size of the exchanged data to
>> allow interpretation (and the timing should be about the network part, not
>> e.g. flush data to disk in case of an osd). (and obviously sampling is
>> enough, no need to have details of every bit send).
>> 
>> 
>> 
>> stijn
>> 
>> 
>> On 07/30/2015 08:04 PM, Mark Nelson wrote:
>>> 
>>> Thanks for posting this!  We see issues like this more often than you'd
>>> think.  It's really important too because if you don't figure it out the
>>> natural inclination is to blame Ceph! :)
>>> 
>>> Mark
>>> 
>>> On 07/30/2015 12:50 PM, Quentin Hartman wrote:
 
 Just wanted to drop a note to the group that I had my cluster go
 sideways yesterday, and the root of the problem was networking again.
 Using iperf I discovered that one of my nodes was only moving data at
 1.7Mb / s. Moving that node to a different switch port with a different
 cable has resolved the problem. It took awhile to track down because
 none of the server-side error metrics for disk or network showed
 anything was amiss, and I didn't think to test network performance (as
 suggested in another thread) until well into the process.
 
 Check networking first!
 
 QH
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> -BEGIN PGP SIGNATURE-
> Version: Mailvelope v0.13.1
> Comment: https://www.mailvelope.com
> 
> wsFcBAEBCAAQBQJVu7QoCRDmVDuy+mK58QAAcpAQAKbv6xPRxMMJ8NWrXym0
> NAtZFIYywvStKf

[ceph-users] Happy SysAdmin Day!

2015-07-31 Thread Mark Nelson
Most folks have either probably already left or are on their way out the 
door late on a friday, but I just wanted to say Happy SysAdmin day to 
all of the excellent System Administrators out there running Ceph 
clusters. :)


Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Happy SysAdmin Day!

2015-07-31 Thread Michael Kuriger
Thanks Mark you too

 

 
Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235





On 7/31/15, 3:02 PM, "ceph-users on behalf of Mark Nelson"
 wrote:

>Most folks have either probably already left or are on their way out the
>door late on a friday, but I just wanted to say Happy SysAdmin day to
>all of the excellent System Administrators out there running Ceph
>clusters. :)
>
>Mark
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Happy SysAdmin Day!

2015-07-31 Thread Jan Schermer
May your bytes stay with you :)

Happy bofhday!

Jan

> On 01 Aug 2015, at 00:10, Michael Kuriger  wrote:
> 
> Thanks Mark you too
> 
> 
> 
> 
> Michael Kuriger
> Sr. Unix Systems Engineer
> mk7...@yp.com | 818-649-7235
> 
> 
> 
> 
> 
> On 7/31/15, 3:02 PM, "ceph-users on behalf of Mark Nelson"
>  wrote:
> 
>> Most folks have either probably already left or are on their way out the
>> door late on a friday, but I just wanted to say Happy SysAdmin day to
>> all of the excellent System Administrators out there running Ceph
>> clusters. :)
>> 
>> Mark
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem with RGW

2015-07-31 Thread Brad Hubbard


- Original Message -
From: "Butkeev Stas" 
To: ceph-us...@ceph.com, ceph-commun...@lists.ceph.com, supp...@ceph.com
Sent: Friday, 31 July, 2015 9:10:40 PM
Subject: [ceph-users] problem with RGW

>Hello everybody
>
>We have ceph cluster that consist of 8 host with 12 osd per each host. It's 2T 
>SATA disks.
>In log osd.0
>
>2015-07-31 14:03:24.490774 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 35 
>slow requests, 9 included below; oldest blocked for > 3003.952332 secs
>2015-07-31 14:03:24.490782 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 
>slow request 960.179599 seconds old, received at 2015-07-31 13:47:24.311080: 
>osd_op(client.67321.0:7856 
>default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [writefull 0~0] 
>26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag 
>points reached
>2015-07-31 14:03:24.490791 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 
>slow request 960.179357 seconds old, received at 2015-07-31 13:47:24.311323: 
>osd_op(client.67321.0:7857 
>default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [writefull 
>0~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no 
>flag points reached
>2015-07-31 14:03:24.490794 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 
>slow request 960.167539 seconds old, received at 2015-07-31 13:47:24.323141: 
>osd_op(client.67321.0:7858 
>default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 
>524288~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) 
>currently no flag points reached
>2015-07-31 14:03:24.490797 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 
>slow request 960.14 seconds old, received at 2015-07-31 13:47:24.335126: 
>osd_op(client.67321.0:7859 
>default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 
>1048576~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) 
>currently no flag points reached
>2015-07-31 14:03:24.490801 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 
>slow request 960.145867 seconds old, received at 2015-07-31 13:47:24.344813: 
>osd_op(client.67321.0:7860 
>default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 
>1572864~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) 
>currently no flag points reached
>2015-07-31 14:03:25.491062 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 35 
>slow requests, 4 included below; oldest blocked for > 3004.952621 secs
>2015-07-31 14:03:25.491078 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 
>slow request 961.140790 seconds old, received at 2015-07-31 13:47:24.350178: 
>osd_op(client.67321.0:7861 
>default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 
>2097152~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) 
>currently no flag points reached
>2015-07-31 14:03:25.491084 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 
>slow request 961.097870 seconds old, received at 2015-07-31 13:47:24.393098: 
>osd_op(client.67321.0:7862 
>default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 
>2621440~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) 
>currently no flag points reached
>2015-07-31 14:03:25.491089 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 
>slow request 961.093229 seconds old, received at 2015-07-31 13:47:24.397740: 
>osd_op(client.67321.0:7863 
>default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 
>3145728~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) 
>currently no flag points reached
>2015-07-31 14:03:25.491095 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 
>slow request 961.002957 seconds old, received at 2015-07-31 13:47:24.488012: 
>osd_op(client.67321.0:7864 
>default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 
>3670016~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) 
>currently no flag points reached
>
>How I can avoid these blocked requests? What is root cause of this problem?
>

Do a "ceph pg dump" and look for the pgs in this state
(ack+ondisk+write+known_if_redirected), then do a "ceph pg [pgid] query" and post
the output here (if there aren't too many, otherwise a representative sample).
Also look carefully at the acting OSDs for these pgs and check the output of
"ceph daemon /var/run/ceph/ceph-osd.NNN.asok dump_ops_in_flight". There could be
problems with these OSDs slowing down the requests, including hardware problems,
so check thoroughly.
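
Assembled into one rough sequence (the pg id and OSD number are placeholders; "ceph health
detail" is an extra shortcut for spotting which requests are currently blocked):

ceph health detail                                    # shows which pgs/osds currently have blocked requests
ceph pg dump | less                                   # locate the affected pgs and their acting OSDs
ceph pg <pgid> query                                  # run for one of the affected pgs
ceph daemon /var/run/ceph/ceph-osd.<N>.asok dump_ops_in_flight   # on the host of each acting OSD
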
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Check networking first?

2015-07-31 Thread Ben Hines
I encountered a similar problem. Incoming firewall ports were blocked
on one host, so the other OSDs kept marking that OSD as down. But it
could still talk out, so it kept saying 'hey, I'm up, mark me up', and
then the other OSDs started trying to send it data again, causing
backed-up requests. This goes on ad infinitum. I had to figure out the
connectivity problem myself by looking in the OSD logs.

After a while, the cluster should just say 'no, you're not reachable,
stop putting yourself back into the cluster'.

-Ben
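
(A quick way to spot this kind of one-way blockage from a peer node is a plain TCP port check;
a sketch assuming the default 6800-7300 OSD port range and the OpenBSD netcat, with the host
name and exact ports as placeholders:)

for p in 6789 6800 6801 6802 6803; do nc -zv -w 2 <osd-host> $p; done   # 6789 = mon; OSDs bind in the 6800-7300 range by default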

On Fri, Jul 31, 2015 at 11:21 AM, Jan Schermer  wrote:
> I remember reading that ScaleIO (I think?) does something like this by 
> regularly sending reports to a multicast group, thus any node with issues (or 
> just overload) is reweighted or avoided automatically on the client. OSD map 
> is the Ceph equivalent I guess. It makes sense to gather metrics and 
> prioritize better performing OSDs over those with e.g. worse latencies, but 
> it needs to update fast. But I believe that _network_ monitoring itself ought 
> to be part of… a network monitoring system you should already have :-) and 
> not a storage system that just happens to use network. I don’t remember 
> seeing anything but a simple ping/traceroute/dns test in any SAN interface. 
> If an OSD has issues it might be anything from a failing drive to a swapping 
> OS and a number like “commit latency” (= response time average from the 
> clients’ perspective) is maybe the ultimate metric of all for this purpose, 
> irrespective of the root cause.
>
> Nice option would be to read data from all replicas at once - this would of 
> course increase load and cause all sorts of issues if abused, but if you have 
> an app that absolutely-always-without-fail-must-get-data-ASAP then you could 
> enable this in the client (and I think that would be an easy option to add). 
> This is actually used in some systems. Harder part is to fail nicely when 
> writing (like waiting only for the remote network buffers on 2 nodes to get 
> the data instead of waiting for commit on all 3 replicas…)
>
> Jan
>
>> On 31 Jul 2015, at 19:45, Robert LeBlanc  wrote:
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> Even just a ping at max MTU set with nodefrag could tell a lot about
>> connectivity issues and latency without a lot of traffic. Using Ceph
>> messenger would be even better to check firewall ports. I like the
>> idea of incorporating simple network checks into Ceph. The monitor can
>> correlate failures and help determine if the problem is related to one
>> host from the CRUSH map.
>> - 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Thu, Jul 30, 2015 at 11:27 PM, Stijn De Weirdt  wrote:
>>> wouldn't it be nice that ceph does something like this in background (some
>>> sort of network-scrub). debugging network like this is not that easy (can't
>>> expect admins to install e.g. perfsonar on all nodes and/or clients)
>>>
>>> something like: every X min, each service X pick a service Y on another host
>>> (assuming X and Y will exchange some communication at some point; like osd
>>> with other osd), send 1MB of data, and make the timing data available so we
>>> can monitor it and detect underperforming links over time.
>>>
>>> ideally clients also do this, but not sure where they should report/store
>>> the data.
>>>
>>> interpreting the data can be a bit tricky, but extreme outliers will be
>>> spotted easily, and the main issue with this sort of debugging is collecting
>>> the data.
>>>
>>> simply reporting / keeping track of ongoing communications is already a big
>>> step forward, but then we need to have the size of the exchanged data to
>>> allow interpretation (and the timing should be about the network part, not
>>> e.g. flush data to disk in case of an osd). (and obviously sampling is
>>> enough, no need to have details of every bit send).
>>>
>>>
>>>
>>> stijn
>>>
>>>
>>> On 07/30/2015 08:04 PM, Mark Nelson wrote:

 Thanks for posting this!  We see issues like this more often than you'd
 think.  It's really important too because if you don't figure it out the
 natural inclination is to blame Ceph! :)

 Mark

 On 07/30/2015 12:50 PM, Quentin Hartman wrote:
>
> Just wanted to drop a note to the group that I had my cluster go
> sideways yesterday, and the root of the problem was networking again.
> Using iperf I discovered that one of my nodes was only moving data at
> 1.7Mb / s. Moving that node to a different switch port with a different
> cable has resolved the problem. It took awhile to track down because
> none of the server-side error metrics for disk or network showed
> anything was amiss, and I didn't think to test network performance (as
> suggested in another thread) until well into the process.
>
> Check networking first!
>
> QH
>
>
> ___