[ceph-users] cephfs: recovering from transport endpoint not connected?

2015-04-27 Thread Burkhard Linke

Hi,

I've deployed Ceph on a number of nodes in our compute cluster (Ubuntu
14.04, Ceph Firefly 0.80.9). /ceph is mounted via ceph-fuse.


From time to time some nodes lose their access to CephFS with the
following error message:


# ls /ceph
ls: cannot access /ceph: Transport endpoint is not connected

The ceph client log contains entries like:
2015-04-22 14:25:42.834607 7fcca6fa07c0  0 ceph version 0.80.9 
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-fuse, pid 156483


2015-04-26 17:23:15.430052 7f08570777c0  0 ceph version 0.80.9 
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-fuse, pid 140778

2015-04-26 17:23:15.625731 7f08570777c0 -1 fuse_parse_cmdline failed.
2015-04-26 17:23:18.921788 7f5bc299b7c0  0 ceph version 0.80.9 
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-fuse, pid 140807

2015-04-26 17:23:19.166199 7f5bc299b7c0 -1 fuse_parse_cmdline failed.

Re-mounting resolves the problem, but it may not always be possible because
processes still hold (now stale) references to the mount point. Is there a
better way to resolve this problem (especially without remounting)?
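
For completeness, the remount that works for me is roughly the following (a
sketch; it assumes the /ceph mount point and the default ceph.conf/keyring
locations):

# lazily detach the dead FUSE endpoint, then mount it again
fusermount -uz /ceph
ceph-fuse /ceph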


Best regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread tuomas . juntunen


I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer.

Then I created new pools and deleted some old ones. I also created one pool to
act as a tier, to be able to move data without an outage.

After these operations all but 10 OSDs are down and they keep writing this kind
of message to the logs; I get more than 100 GB of these in a night:

 -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
0'0 inactive NOTIFY] enter Started
   -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
0'0 inactive NOTIFY] enter Start
   -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882
pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
0'0 inactive NOTIFY] state: transitioning to Stray
   -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
0'0 inactive NOTIFY] exit Start 0.25 0 0.00
   -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
0'0 inactive NOTIFY] enter Started/Stray
   -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
Reset 0.119467 4 0.37
   -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
Started
   -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
Start
   -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY]
state: transitioning to Stray
   -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
Start 0.20 0 0.00
-9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
Started/Stray
-8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit
Reset 7.511623 45 0.000165
-7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
Started
-6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
Start
-5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive]
state: transitioning to Primary
-4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit
Start 0.23 0 0.00
-3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
Started/Primary
-2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
Started/Primary/Peering
-1> 2015-04-27 10:17:08.809513 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 peering] enter
Started/Primary/Peering/GetInfo
 0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1 ./include/interval_set.h: In
function 'void interval_set<T>::erase(T, T) [with T = snapid_t]' thread
7fd8e

Re: [ceph-users] Ceph recovery network?

2015-04-27 Thread Sebastien Han
Well yes, “pretty much” the same thing :).
I think some people would like to distinguish recovery from replication and
maybe apply some QoS between the two.
We have to replicate while recovering, so one can impact the other.

In the end, I just think it’s a doc issue; still waiting for a dev to answer :).
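
For context, the split we already have is just the usual two-network setup in
ceph.conf, something like this (addresses are only examples):

[global]
    public network  = 192.168.0.0/24    ; clients and MONs
    cluster network = 192.168.1.0/24    ; OSD-to-OSD replication/recovery/backfill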

> On 27 Apr 2015, at 00:50, Robert LeBlanc  wrote:
> 
> My understanding is that Monitors monitor the public address of the
> OSDs and other OSDs monitor the cluster address of the OSDs.
> Replication, recovery and backfill traffic all use the same network
> when you specify 'cluster network = ' in your ceph.conf.
> It is useful to remember that replication, recovery and backfill
> traffic are pretty much the same thing, just at different points in
> time.
> 
> On Sun, Apr 26, 2015 at 4:39 PM, Sebastien Han
>  wrote:
>> Hi list,
>> 
>> While reading this 
>> http://ceph.com/docs/master/rados/configuration/network-config-ref/#ceph-networks,
>>  I came across the following sentence:
>> 
>> "You can also establish a separate cluster network to handle OSD heartbeat, 
>> object replication and recovery traffic”
>> 
>> I didn’t know it was possible to perform such stretching, at least for 
>> recovery traffic.
>> Replication is generally handled by the cluster_network_addr and the 
>> heartbeat can be used with osd_heartbeat_addr.
>> Although I’m a bit confused by the osd_heartbeat_addr since I thought the 
>> heartbeat was binding on both public and cluster addresses.
>> 
>> So my question is: how to isolate the recovery traffic to specific network?
>> 
>> Thanks!
>> 
>> Cheers.
>> 
>> Sébastien Han
>> Cloud Architect
>> 
>> "Always give 100%. Unless you're giving blood."
>> 
>> Phone: +33 (0)1 49 70 99 72
>> Mail: sebastien@enovance.com
>> Address : 11 bis, rue Roquépine - 75008 Paris
>> Web : www.enovance.com - Twitter : @enovance
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 


Cheers.

Sébastien Han
Cloud Architect

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address : 11 bis, rue Roquépine - 75008 Paris
Web : www.enovance.com - Twitter : @enovance



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster not coming up after reboot

2015-04-27 Thread Kenneth Waegeman



On 04/23/2015 06:58 PM, Craig Lewis wrote:

Yes, unless you've adjusted:
[global]
   mon osd min down reporters = 9
   mon osd min down reports = 12

OSDs talk to the MONs on the public network.  The cluster network is
only used for OSD to OSD communication.

If one OSD node can't talk on that network, the other nodes will tell
the MONs that its OSDs are down.  And that node will also tell the MONs
that all the other OSDs are down.  Then the OSDs marked down will tell
the MONs that they're not down, and the cycle will repeat.


Thanks for the explanation, that makes sense now! Good to know I should
set those values :)


I'm somewhat surprised that your cluster eventually stabilized.
The OSDs of that one node were eventually set 'out' of the cluster. I
guess the OSDs were down long enough to get marked out? (Or the
monitors took some action after too many failures?) And then the other
OSDs could stay up, I guess :)



I have 8 OSDs per node.  I set my min down reporters high enough that no
single node can mark another node's OSDs down.
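
In other words, something like this (a sketch assuming 8 OSDs per node, as
here):

[global]
    ; more reporters than any single node has OSDs, so one isolated node
    ; cannot get another node's OSDs marked down on its own
    mon osd min down reporters = 9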

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] very different performance on two volumes in the same pool

2015-04-27 Thread Nikola Ciprich
Hello Somnath,
> Thanks for the perf data..It seems innocuous..I am not seeing single tcmalloc 
> trace, are you running with tcmalloc by the way ?

according to ldd, it seems I have it compiled in, yes:
[root@vfnphav1a ~]# ldd /usr/bin/ceph-osd
.
.
libtcmalloc.so.4 => /usr/lib64/libtcmalloc.so.4 (0x7f7a3756e000)
.
.


> What about my other question, is the performance of slow volume increasing if 
> you stop IO on the other volume ?
I don't have any other ceph users; actually the whole cluster is idle.

> Are you using default ceph.conf ? Probably, you want to try with different 
> osd_op_num_shards (may be = 10 , based on your osd server config) and 
> osd_op_num_threads_per_shard (may be = 1). Also, you may want to see the 
> effect by doing osd_enable_op_tracker = false

I guess I'm using pretty much default settings, with a few changes that are
probably not related:

[osd]
osd crush update on start = false

[client]
rbd cache = true
rbd cache writethrough until flush = true

[mon]
debug paxos = 0



I now tried setting
throttler perf counter = false
osd enable op tracker = false
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 10

and restarted all Ceph servers, but it seems to make no big difference.
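
In case it is useful: one way to double-check that such values really took
effect on a running OSD is the admin socket, e.g. (a sketch for osd.0):

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'osd_op_num|op_tracker'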


> 
> Are you seeing similar resource consumption on both the servers while IO is 
> going on ?
Yes, on all three nodes ceph-osd seems to be consuming lots of CPU during the
benchmark.

> 
> Need some information about your client, are the volumes exposed with krbd or 
> running with librbd environment ? If krbd and with same physical box, hope 
> you mapped the images with 'noshare' enabled.

I'm using fio with the rbd engine, so I guess no krbd-related stuff is in use
here?
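
A minimal sketch of the kind of fio job I mean (pool and image name are just
placeholders):

; fio job using the librbd engine, no kernel rbd mapping involved
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio-test
rw=randwrite
bs=4k
iodepth=32
runtime=60
[rbd-randwrite]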


> 
> Too many questions :-)  But, this may give some indication what is going on 
> there.
:-) Hopefully my answers are not too confusing; I'm still pretty new to ceph.

BR

nik


> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: Nikola Ciprich [mailto:nikola.cipr...@linuxbox.cz] 
> Sent: Sunday, April 26, 2015 7:32 AM
> To: Somnath Roy
> Cc: ceph-users@lists.ceph.com; n...@linuxbox.cz
> Subject: Re: [ceph-users] very different performance on two volumes in the 
> same pool
> 
> Hello Somnath,
> 
> On Fri, Apr 24, 2015 at 04:23:19PM +, Somnath Roy wrote:
> > This could be again because of tcmalloc issue I reported earlier.
> > 
> > Two things to observe.
> > 
> > 1. Is the performance improving if you stop IO on other volume ? If so, it 
> > could be different issue.
> there is no other IO.. only cephfs mounted, but no users of it.
> 
> > 
> > 2. Run perf top in the OSD node and see if tcmalloc traces are popping up.
> 
> don't see anything special:
> 
>   3.34%  libc-2.12.so  [.] _int_malloc
>   2.87%  libc-2.12.so  [.] _int_free
>   2.79%  [vdso][.] __vdso_gettimeofday
>   2.67%  libsoftokn3.so[.] 0x0001fad9
>   2.34%  libfreeblpriv3.so [.] 0x000355e6
>   2.33%  libpthread-2.12.so[.] pthread_mutex_unlock
>   2.19%  libpthread-2.12.so[.] pthread_mutex_lock
>   1.80%  libc-2.12.so  [.] malloc
>   1.43%  [kernel]  [k] do_raw_spin_lock
>   1.42%  libc-2.12.so  [.] memcpy
>   1.23%  [kernel]  [k] __switch_to
>   1.19%  [kernel]  [k] acpi_processor_ffh_cstate_enter
>   1.09%  libc-2.12.so  [.] malloc_consolidate
>   1.08%  [kernel]  [k] __schedule
>   1.05%  libtcmalloc.so.4.1.0  [.] 0x00017e6f
>   0.98%  libc-2.12.so  [.] vfprintf
>   0.83%  libstdc++.so.6.0.13   [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
>   0.76%  libstdc++.so.6.0.13   [.] 0x0008092a
>   0.73%  libc-2.12.so  [.] __memset_sse2
>   0.72%  libc-2.12.so  [.] __strlen_sse42
>   0.70%  libstdc++.so.6.0.13   [.] std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long)
>   0.68%  libpthread-2.12.so[.] pthread_mutex_trylock
>   0.67%  librados.so.2.0.0 [.] ceph_crc32c_sctp
>   0.63%  libpython2.6.so.1.0   [.] 0x0007d823
>   0.55%  libnss3.so[.] 0x00056d2a
>   0.52%  libc-2.12.so  [.] free
>   0.50%  libstdc++.so.6.0.13   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
> 
> should I check anything else?
> BR
> nik
> 
> 
> > 
> > Thanks & Regards
> > Somnath
> > 
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> > Nikola Ciprich
> > Sent: Friday, April 24, 2015 7:10 AM
> > To: ceph-users@lists.ceph.com
> > Cc: n...@linuxbox.cz
> > Subject: [ceph-users] very different performance on two volumes in the same 
> > pool
> > 
> > Hello,
> > 
> > I'm trying to solve a bit myst

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Ian Colle
It looks like you may have hit http://tracker.ceph.com/issues/7915 

Ian R. Colle
Global Director
of Software Engineering
Red Hat (Inktank is now part of Red Hat!)
http://www.linkedin.com/in/ircolle
http://www.twitter.com/ircolle
Cell: +1.303.601.7713
Email: ico...@redhat.com

- Original Message -
From: "tuomas juntunen" 
To: ceph-users@lists.ceph.com
Sent: Monday, April 27, 2015 1:56:29 PM
Subject: [ceph-users] Upgrade from Giant to Hammer and after some basic 
operations most of the OSD's went down



I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer

Then created new pools and deleted some old ones. Also I created one pool for
tier to be able to move data without outage.

After these operations all but 10 OSD's are down and creating this kind of
messages to logs, I get more than 100gb of these in a night:

 -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
0'0 inactive NOTIFY] enter Started
   -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
0'0 inactive NOTIFY] enter Start
   -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882
pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
0'0 inactive NOTIFY] state: transitioning to Stray
   -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
0'0 inactive NOTIFY] exit Start 0.25 0 0.00
   -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
0'0 inactive NOTIFY] enter Started/Stray
   -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
Reset 0.119467 4 0.37
   -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
Started
   -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
Start
   -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY]
state: transitioning to Stray
   -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
Start 0.20 0 0.00
-9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
Started/Stray
-8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit
Reset 7.511623 45 0.000165
-7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
Started
-6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
Start
-5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive]
state: transitioning to Primary
-4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit
Start 0.23 0 0.00
-3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
Started/Primary
-2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 osd.23 pg_epoch: 17882
pg[2.189( empty local-les=16127 n=

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread tuomas . juntunen
Hi

Any solution for this yet?

Br,
Tuomas

> It looks like you may have hit http://tracker.ceph.com/issues/7915
>
> Ian R. Colle
> Global Director
> of Software Engineering
> Red Hat (Inktank is now part of Red Hat!)
> http://www.linkedin.com/in/ircolle
> http://www.twitter.com/ircolle
> Cell: +1.303.601.7713
> Email: ico...@redhat.com
>
> - Original Message -
> From: "tuomas juntunen" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, April 27, 2015 1:56:29 PM
> Subject: [ceph-users] Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
>
>
>
> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>
> Then created new pools and deleted some old ones. Also I created one pool for
> tier to be able to move data without outage.
>
> After these operations all but 10 OSD's are down and creating this kind of
> messages to logs, I get more than 100gb of these in a night:
>
>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
> 0'0 inactive NOTIFY] enter Started
>-18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
> 0'0 inactive NOTIFY] enter Start
>-17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882
> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
> 0'0 inactive NOTIFY] state: transitioning to Stray
>-16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
> 0'0 inactive NOTIFY] exit Start 0.25 0 0.00
>-15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
> 0'0 inactive NOTIFY] enter Started/Stray
>-14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
> Reset 0.119467 4 0.37
>-13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
> Started
>-12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
> Start
>-11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY]
> state: transitioning to Stray
>-10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
> Start 0.20 0 0.00
> -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
> Started/Stray
> -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit
> Reset 7.511623 45 0.000165
> -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
> Started
> -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
> Start
> -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23 pg_epoch: 17882
> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive]
> state: transitioning to Primary
> -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit
> Start 0.23 0 0.00
> -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[2.189( empty local-les=16127 n=0

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Samuel Just
Can you explain exactly what you mean by:

"Also I created one pool for tier to be able to move data without outage."

-Sam
- Original Message -
From: "tuomas juntunen" 
To: "Ian Colle" 
Cc: ceph-users@lists.ceph.com
Sent: Monday, April 27, 2015 4:23:44 AM
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic 
operations most of the OSD's went down

Hi

Any solution for this yet?

Br,
Tuomas

> It looks like you may have hit http://tracker.ceph.com/issues/7915
>
> Ian R. Colle
> Global Director
> of Software Engineering
> Red Hat (Inktank is now part of Red Hat!)
> http://www.linkedin.com/in/ircolle
> http://www.twitter.com/ircolle
> Cell: +1.303.601.7713
> Email: ico...@redhat.com
>
> - Original Message -
> From: "tuomas juntunen" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, April 27, 2015 1:56:29 PM
> Subject: [ceph-users] Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
>
>
>
> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>
> Then created new pools and deleted some old ones. Also I created one pool for
> tier to be able to move data without outage.
>
> After these operations all but 10 OSD's are down and creating this kind of
> messages to logs, I get more than 100gb of these in a night:
>
>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
> 0'0 inactive NOTIFY] enter Started
>-18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
> 0'0 inactive NOTIFY] enter Start
>-17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882
> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
> 0'0 inactive NOTIFY] state: transitioning to Stray
>-16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
> 0'0 inactive NOTIFY] exit Start 0.25 0 0.00
>-15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
> 0'0 inactive NOTIFY] enter Started/Stray
>-14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
> Reset 0.119467 4 0.37
>-13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
> Started
>-12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
> Start
>-11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY]
> state: transitioning to Stray
>-10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
> Start 0.20 0 0.00
> -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
> Started/Stray
> -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit
> Reset 7.511623 45 0.000165
> -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
> Started
> -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23 pg_epoch: 17882
> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
> Start
> -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23 pg_epoch: 17882
> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive]
> sta

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread tuomas . juntunen


The following:

ceph osd tier add img images --force-nonempty
ceph osd tier cache-mode images forward
ceph osd tier set-overlay img images

The idea was to make images a cache tier of img, move the data to img, and then
change clients to use the new img pool.
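
For completeness, backing that out should be roughly the reverse (a sketch):

ceph osd tier remove-overlay img
ceph osd tier cache-mode images none
ceph osd tier remove img images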

Br,
Tuomas

> Can you explain exactly what you mean by:
>
> "Also I created one pool for tier to be able to move data without outage."
>
> -Sam
> - Original Message -
> From: "tuomas juntunen" 
> To: "Ian Colle" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, April 27, 2015 4:23:44 AM
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
>
> Hi
>
> Any solution for this yet?
>
> Br,
> Tuomas
>
>> It looks like you may have hit http://tracker.ceph.com/issues/7915
>>
>> Ian R. Colle
>> Global Director
>> of Software Engineering
>> Red Hat (Inktank is now part of Red Hat!)
>> http://www.linkedin.com/in/ircolle
>> http://www.twitter.com/ircolle
>> Cell: +1.303.601.7713
>> Email: ico...@redhat.com
>>
>> - Original Message -
>> From: "tuomas juntunen" 
>> To: ceph-users@lists.ceph.com
>> Sent: Monday, April 27, 2015 1:56:29 PM
>> Subject: [ceph-users] Upgrade from Giant to Hammer and after some basic
>> operations most of the OSD's went down
>>
>>
>>
>> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>>
>> Then created new pools and deleted some old ones. Also I created one pool for
>> tier to be able to move data without outage.
>>
>> After these operations all but 10 OSD's are down and creating this kind of
>> messages to logs, I get more than 100gb of these in a night:
>>
>>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Started
>>-18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Start
>>-17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882
>> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
>> 0'0 inactive NOTIFY] state: transitioning to Stray
>>-16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
>> 0'0 inactive NOTIFY] exit Start 0.25 0 0.00
>>-15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Started/Stray
>>-14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
>> Reset 0.119467 4 0.37
>>-13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
>> Started
>>-12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
>> Start
>>-11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY]
>> state: transitioning to Stray
>>-10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
>> Start 0.20 0 0.00
>> -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
>> Started/Stray
>> -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
>> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit
>> Reset 7.511623 45 0.000165
>> -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344
>> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter
>> Started
>> -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23 

Re: [ceph-users] cephfs: recovering from transport endpoint not connected?

2015-04-27 Thread Yan, Zheng
On Mon, Apr 27, 2015 at 3:42 PM, Burkhard Linke
 wrote:
> Hi,
>
> I've deployed ceph on a number of nodes in our compute cluster (Ubuntu 14.04
> Ceph Firefly 0.80.9). /ceph is mounted via ceph-fuse.
>
> From time to time some nodes loose their access to cephfs with the following
> error message:
>
> # ls /ceph
> ls: cannot access /ceph: Transport endpoint is not connected

It looks like ceph-fuse crashed. Please check whether there is any crash-related
information in the client log files.
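For example, something like this (assuming the default client log location):

# look for a backtrace or failed assertion from ceph-fuse
grep -B2 -A20 -E 'Caught signal|FAILED assert' /var/log/ceph/ceph-client.*.log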

>
> The ceph client log contains the entries like:
> 2015-04-22 14:25:42.834607 7fcca6fa07c0  0 ceph version 0.80.9
> (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-fuse, pid 156483
>
> 2015-04-26 17:23:15.430052 7f08570777c0  0 ceph version 0.80.9
> (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-fuse, pid 140778
> 2015-04-26 17:23:15.625731 7f08570777c0 -1 fuse_parse_cmdline failed.
> 2015-04-26 17:23:18.921788 7f5bc299b7c0  0 ceph version 0.80.9
> (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-fuse, pid 140807
> 2015-04-26 17:23:19.166199 7f5bc299b7c0 -1 fuse_parse_cmdline failed.
>
> Re-mounting resolves the problem, but it may not be possible due to
> processes with (now stale) access to the mount point. Is there a better way
> to resolve this problem (especially without remounting)?
>
> Best regards,
> Burkhard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Samuel Just
So, the base tier is what determines the snapshots for the cache/base pool 
amalgam.  You added a populated pool complete with snapshots on top of a base 
tier without snapshots.  Apparently, it caused an existential crisis for the 
snapshot code.  That's one of the reasons why there is a --force-nonempty flag 
for that operation, I think.  I think the immediate answer is probably to 
disallow pools with snapshots as a cache tier altogether until we think of a 
good way to make it work.
-Sam

- Original Message -
From: "tuomas juntunen" 
To: "Samuel Just" 
Cc: ceph-users@lists.ceph.com
Sent: Monday, April 27, 2015 4:56:58 AM
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic 
operations most of the OSD's went down



The following:

ceph osd tier add img images --force-nonempty
ceph osd tier cache-mode images forward
ceph osd tier set-overlay img images

Idea was to make images as a tier to img, move data to img then change clients
to use the new img pool.

Br,
Tuomas

> Can you explain exactly what you mean by:
>
> "Also I created one pool for tier to be able to move data without outage."
>
> -Sam
> - Original Message -
> From: "tuomas juntunen" 
> To: "Ian Colle" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, April 27, 2015 4:23:44 AM
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
>
> Hi
>
> Any solution for this yet?
>
> Br,
> Tuomas
>
>> It looks like you may have hit http://tracker.ceph.com/issues/7915
>>
>> Ian R. Colle
>> Global Director
>> of Software Engineering
>> Red Hat (Inktank is now part of Red Hat!)
>> http://www.linkedin.com/in/ircolle
>> http://www.twitter.com/ircolle
>> Cell: +1.303.601.7713
>> Email: ico...@redhat.com
>>
>> - Original Message -
>> From: "tuomas juntunen" 
>> To: ceph-users@lists.ceph.com
>> Sent: Monday, April 27, 2015 1:56:29 PM
>> Subject: [ceph-users] Upgrade from Giant to Hammer and after some basic
>> operations most of the OSD's went down
>>
>>
>>
>> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>>
>> Then created new pools and deleted some old ones. Also I created one pool for
>> tier to be able to move data without outage.
>>
>> After these operations all but 10 OSD's are down and creating this kind of
>> messages to logs, I get more than 100gb of these in a night:
>>
>>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Started
>>-18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Start
>>-17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882
>> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
>> 0'0 inactive NOTIFY] state: transitioning to Stray
>>-16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
>> 0'0 inactive NOTIFY] exit Start 0.25 0 0.00
>>-15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Started/Stray
>>-14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
>> Reset 0.119467 4 0.37
>>-13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
>> Started
>>-12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter
>> Start
>>-11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY]
>> state: transitioning to Stray
>>-10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882
>> pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit
>> Start 0.20 0 0.00
>> -9> 20

[ceph-users] Calamari server not working after upgrade 0.87-1 -> 0.94-1

2015-04-27 Thread Steffen W Sørensen
All,

After successfully upgrading from Giant to Hammer, at first our Calamari server
seemed fine, showing the new 'too many PGs' warning. Then, during/after
removing/consolidating various pools, it failed to get updated. As I haven't been
able to find any root cause, I decided to flush the Postgres DB (calamari-ctl clear
--yes-I-am-sure) and start all over (calamari-ctl initialize), restarting
salt-minion + diamond on all nodes. Only now I just see this on my dashboard:

This appears to be the first time you have started Calamari and there are no 
clusters currently configured.

4 Ceph servers are connected to Calamari, but no Ceph cluster has been created 
yet. Please use ceph-deploy to create a cluster; please see the Inktank Ceph 
Enterprise documentation for more details.


salt key are still accepted:
root@node1:/var/log/calamari# salt-key -L
Accepted Keys:
node1.
node2.
node3.
node4.
Unaccepted Keys:
Rejected Keys:

Our cluster is of course running fine:

root@node1:/var/log/calamari# ceph -s
cluster 16fe2dcf-2629-422f-a649-871deba78bcd
 health HEALTH_OK
 monmap e29: 3 mons at 
{0=10.0.3.4:6789/0,1=10.0.3.2:6789/0,2=10.0.3.1:6789/0}
election epoch 1382, quorum 0,1,2 2,1,0
 mdsmap e152: 1/1/1 up {0=2=up:active}, 1 up:standby
 osdmap e3579: 24 osds: 24 up, 24 in
  pgmap v4646340: 3072 pgs, 3 pools, 913 GB data, 229 kobjects
1824 GB used, 1334 GB / 3159 GB avail
3072 active+clean
  client io 32524 B/s wr, 11 op/s


Any hints appreciated…

TIA!

/Steffen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] very different performance on two volumes in the same pool

2015-04-27 Thread Irek Fasikhov
Hi, Nikola.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg19152.html

2015-04-27 14:17 GMT+03:00 Nikola Ciprich :

> Hello Somnath,
> > Thanks for the perf data..It seems innocuous..I am not seeing single
> tcmalloc trace, are you running with tcmalloc by the way ?
>
> according to ldd, it seems I have it compiled in, yes:
> [root@vfnphav1a ~]# ldd /usr/bin/ceph-osd
> .
> .
> libtcmalloc.so.4 => /usr/lib64/libtcmalloc.so.4 (0x7f7a3756e000)
> .
> .
>
>
> > What about my other question, is the performance of slow volume
> increasing if you stop IO on the other volume ?
> I don't have any other cpeh users, actually whole cluster is idle..
>
> > Are you using default ceph.conf ? Probably, you want to try with
> different osd_op_num_shards (may be = 10 , based on your osd server config)
> and osd_op_num_threads_per_shard (may be = 1). Also, you may want to see
> the effect by doing osd_enable_op_tracker = false
>
> I guess I'm using pretty default settings, few changes probably not much
> related:
>
> [osd]
> osd crush update on start = false
>
> [client]
> rbd cache = true
> rbd cache writethrough until flush = true
>
> [mon]
> debug paxos = 0
>
>
>
> I now tried setting
> throttler perf counter = false
> osd enable op tracker = false
> osd_op_num_threads_per_shard = 1
> osd_op_num_shards = 10
>
> and restarting all ceph servers.. but it seems to make no big difference..
>
>
> >
> > Are you seeing similar resource consumption on both the servers while IO
> is going on ?
> yes, on all three nodes, ceph-osd seems to be consuming lots of CPU during
> benchmark.
>
> >
> > Need some information about your client, are the volumes exposed with
> krbd or running with librbd environment ? If krbd and with same physical
> box, hope you mapped the images with 'noshare' enabled.
>
> I'm using fio with ceph engine, so I guess none rbd related stuff is in
> use here?
>
>
> >
> > Too many questions :-)  But, this may give some indication what is going
> on there.
> :-) hopefully my answers are not too confused, I'm still pretty new to
> ceph..
>
> BR
>
> nik
>
>
> >
> > Thanks & Regards
> > Somnath
> >
> > -Original Message-
> > From: Nikola Ciprich [mailto:nikola.cipr...@linuxbox.cz]
> > Sent: Sunday, April 26, 2015 7:32 AM
> > To: Somnath Roy
> > Cc: ceph-users@lists.ceph.com; n...@linuxbox.cz
> > Subject: Re: [ceph-users] very different performance on two volumes in
> the same pool
> >
> > Hello Somnath,
> >
> > On Fri, Apr 24, 2015 at 04:23:19PM +, Somnath Roy wrote:
> > > This could be again because of tcmalloc issue I reported earlier.
> > >
> > > Two things to observe.
> > >
> > > 1. Is the performance improving if you stop IO on other volume ? If
> so, it could be different issue.
> > there is no other IO.. only cephfs mounted, but no users of it.
> >
> > >
> > > 2. Run perf top in the OSD node and see if tcmalloc traces are popping
> up.
> >
> > don't see anything special:
> >
> >   3.34%  libc-2.12.so  [.] _int_malloc
> >   2.87%  libc-2.12.so  [.] _int_free
> >   2.79%  [vdso][.] __vdso_gettimeofday
> >   2.67%  libsoftokn3.so[.] 0x0001fad9
> >   2.34%  libfreeblpriv3.so [.] 0x000355e6
> >   2.33%  libpthread-2.12.so[.] pthread_mutex_unlock
> >   2.19%  libpthread-2.12.so[.] pthread_mutex_lock
> >   1.80%  libc-2.12.so  [.] malloc
> >   1.43%  [kernel]  [k] do_raw_spin_lock
> >   1.42%  libc-2.12.so  [.] memcpy
> >   1.23%  [kernel]  [k] __switch_to
> >   1.19%  [kernel]  [k]
> acpi_processor_ffh_cstate_enter
> >   1.09%  libc-2.12.so  [.] malloc_consolidate
> >   1.08%  [kernel]  [k] __schedule
> >   1.05%  libtcmalloc.so.4.1.0  [.] 0x00017e6f
> >   0.98%  libc-2.12.so  [.] vfprintf
> >   0.83%  libstdc++.so.6.0.13   [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
> >   0.76%  libstdc++.so.6.0.13   [.] 0x0008092a
> >   0.73%  libc-2.12.so  [.] __memset_sse2
> >   0.72%  libc-2.12.so  [.] __strlen_sse42
> >   0.70%  libstdc++.so.6.0.13   [.] std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long)
> >   0.68%  libpthread-2.12.so[.] pthread_mutex_trylock
> >   0.67%  librados.so.2.0.0 [.] ceph_crc32c_sctp
> >   0.63%  libpython2.6.so.1.0   [.] 0x0007d823
> >   0.55%  libnss3.so[.] 0x00056d2a
> >   0.52%  libc-2.12.so  [.] free
> >   0.50%  libstdc++.so.6.0.13   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
> >
> > should I check anything else?
> > BR
> > nik
> >
> >
> > >
> > > Thanks & Regards
> > > Somnath
> > >
> > > -Original Message-

[ceph-users] radosgw default.conf

2015-04-27 Thread alistair.whittle
I have been trying to configure my radosgw-agent on RHEL 6.5, but after a
recent reboot of the gateway node I discovered that the file needed by the
/etc/init.d/radosgw-agent script has disappeared
(/etc/ceph/radosgw-agent/default.conf). As a result, I can no longer start up
the radosgw-agent.
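
For reference, the file is just a small YAML config, roughly of this shape (all
values below are placeholders):

src_access_key: SOURCE_ACCESS_KEY
src_secret_key: SOURCE_SECRET_KEY
destination: https://dest-zone.example.com:443
dest_access_key: DEST_ACCESS_KEY
dest_secret_key: DEST_SECRET_KEY
log_file: /var/log/radosgw/radosgw-sync.log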

Has anybody seen this happen before?   Am I missing something?

Thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Tuomas Juntunen
Thanks for the info.

To my knowledge there were no snapshots on that pool, but I cannot verify that.
Is there any way to make this work again?
Removing the tier and the other settings didn't fix it; I tried that the second
this happened.
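
I assume checking for leftover snapshots would be something like the following,
though I'm not sure it is conclusive:

# snapshot state is part of the pool entries in the OSD map
ceph osd dump | grep -A1 "'images'"
# pool-level snapshots, if any
rados -p images lssnap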

Br,
Tuomas

-Original Message-
From: Samuel Just [mailto:sj...@redhat.com] 
Sent: 27. huhtikuuta 2015 15:50
To: tuomas juntunen
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic 
operations most of the OSD's went down

So, the base tier is what determines the snapshots for the cache/base pool 
amalgam.  You added a populated pool complete with snapshots on top of a base 
tier without snapshots.  Apparently, it caused an existential crisis for the 
snapshot code.  That's one of the reasons why there is a --force-nonempty flag 
for that operation, I think.  I think the immediate answer is probably to 
disallow pools with snapshots as a cache tier altogether until we think of a 
good way to make it work.
-Sam

- Original Message -
From: "tuomas juntunen" 
To: "Samuel Just" 
Cc: ceph-users@lists.ceph.com
Sent: Monday, April 27, 2015 4:56:58 AM
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic 
operations most of the OSD's went down



The following:

ceph osd tier add img images --force-nonempty
ceph osd tier cache-mode images forward
ceph osd tier set-overlay img images

Idea was to make images as a tier to img, move data to img then change clients 
to use the new img pool.

Br,
Tuomas

> Can you explain exactly what you mean by:
>
> "Also I created one pool for tier to be able to move data without outage."
>
> -Sam
> - Original Message -
> From: "tuomas juntunen" 
> To: "Ian Colle" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, April 27, 2015 4:23:44 AM
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> basic operations most of the OSD's went down
>
> Hi
>
> Any solution for this yet?
>
> Br,
> Tuomas
>
>> It looks like you may have hit http://tracker.ceph.com/issues/7915
>>
>> Ian R. Colle
>> Global Director
>> of Software Engineering
>> Red Hat (Inktank is now part of Red Hat!) 
>> http://www.linkedin.com/in/ircolle
>> http://www.twitter.com/ircolle
>> Cell: +1.303.601.7713
>> Email: ico...@redhat.com
>>
>> - Original Message -
>> From: "tuomas juntunen" 
>> To: ceph-users@lists.ceph.com
>> Sent: Monday, April 27, 2015 1:56:29 PM
>> Subject: [ceph-users] Upgrade from Giant to Hammer and after some 
>> basic operations most of the OSD's went down
>>
>>
>>
>> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>>
>> Then created new pools and deleted some old ones. Also I created one 
>> pool for tier to be able to move data without outage.
>>
>> After these operations all but 10 OSD's are down and creating this 
>> kind of messages to logs, I get more than 100gb of these in a night:
>>
>>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
>> 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
>> crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Started
>>-18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
>> 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
>> crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Start
>>-17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 
>> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
>> 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
>> crt=8480'7 lcod
>> 0'0 inactive NOTIFY] state: transitioning to Stray
>>-16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
>> 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
>> crt=8480'7 lcod
>> 0'0 inactive NOTIFY] exit Start 0.25 0 0.00
>>-15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
>> 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
>> crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Started/Stray
>>-14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] 
>> exit Reset 0.119467 4 0.37
>>-13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] 
>> enter Started
>>-12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23]

Re: [ceph-users] Calamari server not working after upgrade 0.87-1 -> 0.94-1

2015-04-27 Thread Alexandre DERUMIER
Hi, can you check /var/log/salt/minion on your ceph nodes?

I have had a similar problem; I needed to remove the old master key on the
minions and restart them:

rm /etc/salt/pki/minion/minion_master.pub
/etc/init.d/salt-minion restart


(I don't know if "calamari-ctl clear" changes the salt master key.)
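
If the master key did change, you may also need to delete and re-accept the
minion keys on the Calamari host, roughly:

salt-key -d 'node*'    # delete the stale minion keys (pattern is an example)
salt-key -A            # accept the newly submitted keys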


- Original Message -
From: "Steffen W Sørensen" 
To: "ceph-users" 
Sent: Monday, 27 April 2015 14:50:56
Subject: [ceph-users] Calamari server not working after upgrade 0.87-1 -> 0.94-1

All, 

After successfully upgrading from Giant to Hammer, at first our Calamari server 
seems fine, showing the new too many PGs. then during/after 
removing/consolidating various pool, it failed to get updated, Haven’t been 
able to find any RC, I decided to flush the Postgress DB (calamari-ctl clear 
—yes-I-am-sure) and try to start all over (calamari-ctl initialize) restarting 
all nodes’ salt-mimion + diamond, only now I just see this on my dashboard: 



This appears to be the first time you have started Calamari and there are no 
clusters currently configured. 


4 Ceph servers are connected to Calamari, but no Ceph cluster has been created 
yet. Please use ceph-deploy to create a cluster; please see the Inktank Ceph 
Enterprise documentation for more details. 

salt key are still accepted: 
root@node1:/var/log/calamari# salt-key -L 
Accepted Keys: 
node1. 
node2. 
node3. 
node4. 
Unaccepted Keys: 
Rejected Keys: 

Our cluster is of course running fine: 

root@node1:/var/log/calamari# ceph -s 
cluster 16fe2dcf-2629-422f-a649-871deba78bcd 
health HEALTH_OK 
monmap e29: 3 mons at {0=10.0.3.4:6789/0,1=10.0.3.2:6789/0,2=10.0.3.1:6789/0} 
election epoch 1382, quorum 0,1,2 2,1,0 
mdsmap e152: 1/1/1 up {0=2=up:active}, 1 up:standby 
osdmap e3579: 24 osds: 24 up, 24 in 
pgmap v4646340: 3072 pgs, 3 pools, 913 GB data, 229 kobjects 
1824 GB used, 1334 GB / 3159 GB avail 
3072 active+clean 
client io 32524 B/s wr, 11 op/s 


Any hints appreciated… 

TIA! 

/Steffen 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Tuomas Juntunen
I can sacrifice the images and img pools, if that is necessary.

Just need to get the thing going again 

Tuomas

-Original Message-
From: Samuel Just [mailto:sj...@redhat.com] 
Sent: 27. huhtikuuta 2015 15:50
To: tuomas juntunen
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic 
operations most of the OSD's went down

So, the base tier is what determines the snapshots for the cache/base pool 
amalgam.  You added a populated pool complete with snapshots on top of a base 
tier without snapshots.  Apparently, it caused an existential crisis for the 
snapshot code.  That's one of the reasons why there is a --force-nonempty flag 
for that operation, I think.  I think the immediate answer is probably to 
disallow pools with snapshots as a cache tier altogether until we think of a 
good way to make it work.
-Sam
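
(For reference, backing that tier configuration out again, with the same pool names as in the commands quoted below, would look roughly like this; as Tuomas notes further down the thread, doing so did not by itself bring the OSDs back:)

ceph osd tier cache-mode images none
ceph osd tier remove-overlay img
ceph osd tier remove img images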

- Original Message -
From: "tuomas juntunen" 
To: "Samuel Just" 
Cc: ceph-users@lists.ceph.com
Sent: Monday, April 27, 2015 4:56:58 AM
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic 
operations most of the OSD's went down



The following:

ceph osd tier add img images --force-nonempty
ceph osd tier cache-mode images forward
ceph osd tier set-overlay img images

Idea was to make images as a tier to img, move data to img then change clients 
to use the new img pool.

Br,
Tuomas

> Can you explain exactly what you mean by:
>
> "Also I created one pool for tier to be able to move data without outage."
>
> -Sam
> - Original Message -
> From: "tuomas juntunen" 
> To: "Ian Colle" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, April 27, 2015 4:23:44 AM
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> basic operations most of the OSD's went down
>
> Hi
>
> Any solution for this yet?
>
> Br,
> Tuomas
>
>> It looks like you may have hit http://tracker.ceph.com/issues/7915
>>
>> Ian R. Colle
>> Global Director
>> of Software Engineering
>> Red Hat (Inktank is now part of Red Hat!) 
>> http://www.linkedin.com/in/ircolle
>> http://www.twitter.com/ircolle
>> Cell: +1.303.601.7713
>> Email: ico...@redhat.com
>>
>> - Original Message -
>> From: "tuomas juntunen" 
>> To: ceph-users@lists.ceph.com
>> Sent: Monday, April 27, 2015 1:56:29 PM
>> Subject: [ceph-users] Upgrade from Giant to Hammer and after some 
>> basic operations most of the OSD's went down
>>
>>
>>
>> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>>
>> Then created new pools and deleted some old ones. Also I created one 
>> pool for tier to be able to move data without outage.
>>
>> After these operations all but 10 OSD's are down and creating this 
>> kind of messages to logs, I get more than 100gb of these in a night:
>>
>>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
>> 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
>> crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Started
>>-18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
>> 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
>> crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Start
>>-17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 
>> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
>> 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
>> crt=8480'7 lcod
>> 0'0 inactive NOTIFY] state: transitioning to Stray
>>-16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
>> 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
>> crt=8480'7 lcod
>> 0'0 inactive NOTIFY] exit Start 0.25 0 0.00
>>-15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
>> 16609/16659
>> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
>> crt=8480'7 lcod
>> 0'0 inactive NOTIFY] enter Started/Stray
>>-14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] 
>> exit Reset 0.119467 4 0.37
>>-13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] 
>> enter Started
>>-12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 
>> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
>> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] 
>> enter Start
>>-11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epo

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Mark Nelson

Hi Alex,

Is it possible that you were suffering from the bug during the first 
test but once reinstalled you hadn't hit it yet?  That's a pretty major 
performance swing.  I'm not sure if we can draw any conclusions about 
jemalloc vs tcmalloc until we can figure out what went wrong.


Mark

On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote:

I'll retest tcmalloc, because I was prety sure to have patched it correctly.


Ok, I really think I have patched tcmalloc wrongly.
I have repatched it, reinstalled it, and now I'm getting 195k iops with a 
single osd (10fio rbd jobs 4k randread).

So better than jemalloc.


- Mail original -
De: "aderumier" 
À: "Mark Nelson" 
Cc: "ceph-users" , "ceph-devel" , 
"Milosz Tanski" 
Envoyé: Lundi 27 Avril 2015 07:01:21
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

Hi,

also another big difference,

I can reach now 180k iops with a single jemalloc osd (data in buffer) vs 50k 
iops max with tcmalloc.

I'll retest tcmalloc, because I was prety sure to have patched it correctly.


- Mail original -
De: "aderumier" 
À: "Mark Nelson" 
Cc: "ceph-users" , "ceph-devel" , 
"Milosz Tanski" 
Envoyé: Samedi 25 Avril 2015 06:45:43
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops


We haven't done any kind of real testing on jemalloc, so use at your own
peril. Having said that, we've also been very interested in hearing
community feedback from folks trying it out, so please feel free to give
it a shot. :D


Some feedback, I have runned bench all the night, no speed regression.

And I have a speed increase with fio with more jobs. (with tcmalloc, it seem to 
be the reverse)

with tcmalloc :

10 fio-rbd jobs = 300k iops
15 fio-rbd jobs = 290k iops
20 fio-rbd jobs = 270k iops
40 fio-rbd jobs = 250k iops

(all with up and down values during the fio bench)


with jemalloc:

10 fio-rbd jobs = 300k iops
15 fio-rbd jobs = 320k iops
20 fio-rbd jobs = 330k iops
40 fio-rbd jobs = 370k iops (can get more currently, only 1 client machine with 
20cores 100%)

(all with contant values during the fio bench)

- Mail original -
De: "Mark Nelson" 
À: "Stefan Priebe" , "aderumier" 
Cc: "ceph-users" , "ceph-devel" , "Somnath 
Roy" , "Milosz Tanski" 
Envoyé: Vendredi 24 Avril 2015 20:02:15
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

We haven't done any kind of real testing on jemalloc, so use at your own
peril. Having said that, we've also been very interested in hearing
community feedback from folks trying it out, so please feel free to give
it a shot. :D

Mark

On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:

Is jemalloc recommanded in general? Does it also work for firefly?

Stefan

Excuse my typo sent from my mobile phone.

Am 24.04.2015 um 18:38 schrieb Alexandre DERUMIER mailto:aderum...@odiso.com>>:


Hi,

I have finished to rebuild ceph with jemalloc,

all seem to working fine.

I got a constant 300k iops for the moment, so no speed regression.

I'll do more long benchmark next week.

Regards,

Alexandre

- Mail original -
De: "Irek Fasikhov" mailto:malm...@gmail.com>>
À: "Somnath Roy" mailto:somnath@sandisk.com>>
Cc: "aderumier" mailto:aderum...@odiso.com>>,
"Mark Nelson" mailto:mnel...@redhat.com>>,
"ceph-users" mailto:ceph-users@lists.ceph.com>>, "ceph-devel"
mailto:ceph-de...@vger.kernel.org>>,
"Milosz Tanski" mailto:mil...@adfin.com>>
Envoyé: Vendredi 24 Avril 2015 13:37:52
Objet: Re: [ceph-users] strange benchmark problem : restarting osd
daemon improve performance from 100k iops to 300k iops

Hi,Alexandre!
Do not try to change the parameter vm.min_free_kbytes?

2015-04-23 19:24 GMT+03:00 Somnath Roy < somnath@sandisk.com
 > :


Alexandre,
You can configure with --with-jemalloc or ./do_autogen -J to build
ceph with jemalloc.

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto: ceph-users-boun...@lists.ceph.com
 ] On Behalf Of Alexandre
DERUMIER
Sent: Thursday, April 23, 2015 4:56 AM
To: Mark Nelson
Cc: ceph-users; ceph-devel; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd
daemon improve performance from 100k iops to 300k iops


If you have the means to compile the same version of ceph with
jemalloc, I would be very interested to see how it does.


Yes, sure. (I have around 3-4 weeks to do all the benchs)

But I don't know how to do it ?
I'm running the cluster on centos7.1, maybe it can be easy to patch
the srpms to rebuild the package with jemalloc.



- Mail original -
De: "Mark Nelson" < mnel...@redhat.com  >
À: "aderumier" < aderum...@odiso.com  >,
"Srinivasula Maram" < srinivasula.ma...@sandisk.com


Re: [ceph-users] Ceph Radosgw multi zone data replication failure

2015-04-27 Thread Vickey Singh
Hello Cephers

Still waiting for your help.

I tried several things but no luck.
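
(A quick, purely illustrative way to confirm what the traceback below complains about is to check whether the installed radosgw_agent.client module actually defines ClientException:)

python -c 'from radosgw_agent import client; print(hasattr(client, "ClientException"))'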



On Mon, Apr 27, 2015 at 9:07 AM, Vickey Singh 
wrote:

> Any help with related to this problem would be highly appreciated.
>
> -VS-
>
>
> On Sun, Apr 26, 2015 at 6:01 PM, Vickey Singh  > wrote:
>
>> Hello Geeks
>>
>>
>> I am trying to setup Ceph Radosgw multi site data replication using
>> official documentation
>> http://ceph.com/docs/master/radosgw/federated-config/#multi-site-data-replication
>>
>>
>> Everything seems to work except radosgw-agent sync , Request you to
>> please check the below outputs and help me in any possible way.
>>
>>
>> *Environment : *
>>
>>
>> CentOS 7.0.1406
>>
>> Ceph Versino 0.87.1
>>
>> Rados Gateway configured using Civetweb
>>
>>
>>
>> *Radosgw zone list : Works nicely *
>>
>>
>> [root@us-east-1 ceph]# radosgw-admin zone list --name
>> client.radosgw.us-east-1
>>
>> { "zones": [
>>
>> "us-west",
>>
>> "us-east"]}
>>
>> [root@us-east-1 ceph]#
>>
>>
>> *Curl request to master zone : Works nicely *
>>
>>
>> [root@us-east-1 ceph]# curl http://us-east-1.crosslogic.com:7480
>>
>> [S3 XML response from the master zone: ListAllMyBucketsResult, xmlns http://s3.amazonaws.com/doc/2006-03-01/, owner "anonymous"]
>>
>> [root@us-east-1 ceph]#
>>
>>
>> *Curl request to secondary zone : Works nicely *
>>
>>
>> [root@us-east-1 ceph]# curl http://us-west-1.crosslogic.com:7480
>>
>> http://s3.amazonaws.com/doc/2006-03-01/
>> ">anonymous
>>
>> [root@us-east-1 ceph]#
>>
>>
>> *Rados Gateway agent configuration file : Seems correct, no TYPO errors*
>>
>>
>> [root@us-east-1 ceph]# cat cluster-data-sync.conf
>>
>> src_access_key: M7QAKDH8CYGTK86CG93U
>>
>> src_secret_key: 0xQR6PINk23W\/GYrWJ14aF+1stG56M6xMkqkdloO
>>
>> destination: http://us-west-1.crosslogic.com:7480
>>
>> dest_access_key: ZQ32ES1WAWPG05YMZ7T7
>>
>> dest_secret_key: INvk8AkrZRsejLEL34yRpMLmOqydt8ncOXy4RHCM
>>
>> log_file: /var/log/radosgw/radosgw-sync-us-east-west.log
>>
>> [root@us-east-1 ceph]#
>>
>>
>> *Rados Gateway agent SYNC : Fails , however it can fetch region map so i
>> think src and dest KEYS are correct. But don't know why it fails on
>> AttributeError *
>>
>>
>>
>> [root@us-east-1 ceph]# radosgw-agent -c cluster-data-sync.conf
>>
>> region map is: {u'us': [u'us-west', u'us-east']}
>>
>> Traceback (most recent call last):
>>   File "/usr/bin/radosgw-agent", line 21, in <module>
>>     sys.exit(main())
>>   File "/usr/lib/python2.7/site-packages/radosgw_agent/cli.py", line 275, in main
>>     except client.ClientException as e:
>> AttributeError: 'module' object has no attribute 'ClientException'
>>
>> [root@us-east-1 ceph]#
>>
>>
>> *Can query to Ceph cluster using us-east-1 ID*
>>
>>
>> [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-east-1
>>
>> cluster 9609b429-eee2-4e23-af31-28a24fcf5cbc
>>
>>  health HEALTH_OK
>>
>>  monmap e3: 3 mons at {ceph-node1=
>> 192.168.1.101:6789/0,ceph-node2=192.168.1.102:6789/0,ceph-node3=192.168.1.103:6789/0},
>> election epoch 448, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
>>
>>  osdmap e1063: 9 osds: 9 up, 9 in
>>
>>   pgmap v8473: 1500 pgs, 43 pools, 374 MB data, 2852 objects
>>
>> 1193 MB used, 133 GB / 134 GB avail
>>
>> 1500 active+clean
>>
>> [root@us-east-1 ceph]#
>>
>>
>> *Can query to Ceph cluster using us-west-1 ID*
>>
>>
>> [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-west-1
>>
>> cluster 9609b429-eee2-4e23-af31-28a24fcf5cbc
>>
>>  health HEALTH_OK
>>
>>  monmap e3: 3 mons at {ceph-node1=
>> 192.168.1.101:6789/0,ceph-node2=192.168.1.102:6789/0,ceph-node3=192.168.1.103:6789/0},
>> election epoch 448, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
>>
>>  osdmap e1063: 9 osds: 9 up, 9 in
>>
>>   pgmap v8473: 1500 pgs, 43 pools, 374 MB data, 2852 objects
>>
>> 1193 MB used, 133 GB / 134 GB avail
>>
>> 1500 active+clean
>>
>> [root@us-east-1 ceph]#
>>
>>
>> *Hope these packages are correct*
>>
>>
>> [root@us-east-1 ceph]# rpm -qa | egrep -i "ceph|radosgw"
>>
>> libcephfs1-0.87.1-0.el7.centos.x86_64
>>
>> ceph-common-0.87.1-0.el7.centos.x86_64
>>
>> python-ceph-0.87.1-0.el7.centos.x86_64
>>
>> ceph-radosgw-0.87.1-0.el7.centos.x86_64
>>
>> ceph-release-1-0.el7.noarch
>>
>> ceph-0.87.1-0.el7.centos.x86_64
>>
>> radosgw-agent-1.2.1-0.el7.centos.noarch
>>
>> [root@us-east-1 ceph]#
>>
>>
>>
>> Regards
>>
>> VS
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Alexandre DERUMIER
>>Is it possible that you were suffering from the bug during the first 
>>test but once reinstalled you hadn't hit it yet?  

Yes, I'm pretty sure I've been hitting the tcmalloc bug since the beginning.
I had patched it, but I think it wasn't enough.
I always hit this bug at random, but mainly when I have a "lot" of concurrent 
clients (20-40): more clients, lower iops.


Today I tried starting the osd with TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M, 
and now it's working fine in all my benchmarks.
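
(A minimal way to test this on a single osd, assuming the daemon is started by hand in the foreground; the osd id is just an example:)

TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 ceph-osd -i 23 --cluster ceph -f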


>>That's a pretty major 
>>performance swing.  I'm not sure if we can draw any conclusions about 
>>jemalloc vs tcmalloc until we can figure out what went wrong.

From my bench, jemalloc uses a little more cpu than tcmalloc (maybe 1% or 
2%).
Tcmalloc seems to work better, with correct tuning of the thread cache bytes.


But I don't know how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES correctly.
Maybe Somnath can tell us?


- Mail original -
De: "Mark Nelson" 
À: "aderumier" 
Cc: "ceph-users" , "ceph-devel" 
, "Milosz Tanski" 
Envoyé: Lundi 27 Avril 2015 16:54:34
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

Hi Alex, 

Is it possible that you were suffering from the bug during the first 
test but once reinstalled you hadn't hit it yet? That's a pretty major 
performance swing. I'm not sure if we can draw any conclusions about 
jemalloc vs tcmalloc until we can figure out what went wrong. 

Mark 

On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote: 
>>> I'll retest tcmalloc, because I was prety sure to have patched it 
>>> correctly. 
> 
> Ok, I really think I have patched tcmalloc wrongly. 
> I have repatched it, reinstalled it, and now I'm getting 195k iops with a 
> single osd (10fio rbd jobs 4k randread). 
> 
> So better than jemalloc. 
> 
> 
> - Mail original - 
> De: "aderumier"  
> À: "Mark Nelson"  
> Cc: "ceph-users" , "ceph-devel" 
> , "Milosz Tanski"  
> Envoyé: Lundi 27 Avril 2015 07:01:21 
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
> improve performance from 100k iops to 300k iops 
> 
> Hi, 
> 
> also another big difference, 
> 
> I can reach now 180k iops with a single jemalloc osd (data in buffer) vs 50k 
> iops max with tcmalloc. 
> 
> I'll retest tcmalloc, because I was prety sure to have patched it correctly. 
> 
> 
> - Mail original - 
> De: "aderumier"  
> À: "Mark Nelson"  
> Cc: "ceph-users" , "ceph-devel" 
> , "Milosz Tanski"  
> Envoyé: Samedi 25 Avril 2015 06:45:43 
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
> improve performance from 100k iops to 300k iops 
> 
>>> We haven't done any kind of real testing on jemalloc, so use at your own 
>>> peril. Having said that, we've also been very interested in hearing 
>>> community feedback from folks trying it out, so please feel free to give 
>>> it a shot. :D 
> 
> Some feedback, I have runned bench all the night, no speed regression. 
> 
> And I have a speed increase with fio with more jobs. (with tcmalloc, it seem 
> to be the reverse) 
> 
> with tcmalloc : 
> 
> 10 fio-rbd jobs = 300k iops 
> 15 fio-rbd jobs = 290k iops 
> 20 fio-rbd jobs = 270k iops 
> 40 fio-rbd jobs = 250k iops 
> 
> (all with up and down values during the fio bench) 
> 
> 
> with jemalloc: 
> 
> 10 fio-rbd jobs = 300k iops 
> 15 fio-rbd jobs = 320k iops 
> 20 fio-rbd jobs = 330k iops 
> 40 fio-rbd jobs = 370k iops (can get more currently, only 1 client machine 
> with 20cores 100%) 
> 
> (all with contant values during the fio bench) 
> 
> - Mail original - 
> De: "Mark Nelson"  
> À: "Stefan Priebe" , "aderumier"  
> Cc: "ceph-users" , "ceph-devel" 
> , "Somnath Roy" , 
> "Milosz Tanski"  
> Envoyé: Vendredi 24 Avril 2015 20:02:15 
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
> improve performance from 100k iops to 300k iops 
> 
> We haven't done any kind of real testing on jemalloc, so use at your own 
> peril. Having said that, we've also been very interested in hearing 
> community feedback from folks trying it out, so please feel free to give 
> it a shot. :D 
> 
> Mark 
> 
> On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote: 
>> Is jemalloc recommanded in general? Does it also work for firefly? 
>> 
>> Stefan 
>> 
>> Excuse my typo sent from my mobile phone. 
>> 
>> Am 24.04.2015 um 18:38 schrieb Alexandre DERUMIER > >: 
>> 
>>> Hi, 
>>> 
>>> I have finished to rebuild ceph with jemalloc, 
>>> 
>>> all seem to working fine. 
>>> 
>>> I got a constant 300k iops for the moment, so no speed regression. 
>>> 
>>> I'll do more long benchmark next week. 
>>> 
>>> Regards, 
>>> 
>>> Alexandre 
>>> 
>>> - Mail original - 
>>> De: "Irek Fasikhov" mailto:malm...@gmail.com>> 
>>> À: "Somnath Roy" >> > 
>>> Cc: "aderumier" mailto:aderum...@odiso.com>>, 
>>> "Mark Nelson" mailto:mnel...@redhat.c

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Sage Weil
On Mon, 27 Apr 2015, Alexandre DERUMIER wrote:
> >>If I want to use librados API for performance testing, are there any 
> >>existing benchmark tools which directly accesses librados (not through 
> >>rbd or gateway) 
> 
> you can use "rados bench" from ceph packages
> 
> http://ceph.com/docs/master/man/8/rados/
> 
> "
> bench seconds mode [ -b objsize ] [ -t threads ]
> Benchmark for seconds. The mode can be write, seq, or rand. seq and rand are 
> read benchmarks, either sequential or random. Before running one of the 
> reading benchmarks, run a write benchmark with the --no-cleanup option. The 
> default object size is 4 MB, and the default number of simulated threads 
> (parallel writes) is 16.
> "

This one creates whole objects.  You might also look at ceph_smalliobench 
(in the ceph-tests package) which is a bit more featureful but less 
friendly to use.

Also, fio has an rbd driver.
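
For instance, a minimal job file along these lines (pool, image and client names are placeholders, and the image must already exist):

[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimg
bs=4k
rw=randread
time_based=1
runtime=60

[rbd-randread]
iodepth=32

and run it with "fio rbd-randread.fio".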

sage


> 
> 
> - Mail original -
> De: "Venkateswara Rao Jujjuri" 
> À: "aderumier" 
> Cc: "Mark Nelson" , "ceph-users" 
> , "ceph-devel" , 
> "Milosz Tanski" 
> Envoyé: Lundi 27 Avril 2015 08:12:49
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
> improve performance from 100k iops to 300k iops
> 
> If I want to use librados API for performance testing, are there any 
> existing benchmark tools which directly accesses librados (not through 
> rbd or gateway) 
> 
> Thanks in advance, 
> JV 
> 
> On Sun, Apr 26, 2015 at 10:46 PM, Alexandre DERUMIER 
>  wrote: 
> >>>I'll retest tcmalloc, because I was prety sure to have patched it 
> >>>correctly. 
> > 
> > Ok, I really think I have patched tcmalloc wrongly. 
> > I have repatched it, reinstalled it, and now I'm getting 195k iops with a 
> > single osd (10fio rbd jobs 4k randread). 
> > 
> > So better than jemalloc. 
> > 
> > 
> > - Mail original - 
> > De: "aderumier"  
> > À: "Mark Nelson"  
> > Cc: "ceph-users" , "ceph-devel" 
> > , "Milosz Tanski"  
> > Envoyé: Lundi 27 Avril 2015 07:01:21 
> > Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
> > improve performance from 100k iops to 300k iops 
> > 
> > Hi, 
> > 
> > also another big difference, 
> > 
> > I can reach now 180k iops with a single jemalloc osd (data in buffer) vs 
> > 50k iops max with tcmalloc. 
> > 
> > I'll retest tcmalloc, because I was prety sure to have patched it 
> > correctly. 
> > 
> > 
> > - Mail original - 
> > De: "aderumier"  
> > À: "Mark Nelson"  
> > Cc: "ceph-users" , "ceph-devel" 
> > , "Milosz Tanski"  
> > Envoyé: Samedi 25 Avril 2015 06:45:43 
> > Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
> > improve performance from 100k iops to 300k iops 
> > 
> >>>We haven't done any kind of real testing on jemalloc, so use at your own 
> >>>peril. Having said that, we've also been very interested in hearing 
> >>>community feedback from folks trying it out, so please feel free to give 
> >>>it a shot. :D 
> > 
> > Some feedback, I have runned bench all the night, no speed regression. 
> > 
> > And I have a speed increase with fio with more jobs. (with tcmalloc, it 
> > seem to be the reverse) 
> > 
> > with tcmalloc : 
> > 
> > 10 fio-rbd jobs = 300k iops 
> > 15 fio-rbd jobs = 290k iops 
> > 20 fio-rbd jobs = 270k iops 
> > 40 fio-rbd jobs = 250k iops 
> > 
> > (all with up and down values during the fio bench) 
> > 
> > 
> > with jemalloc: 
> > 
> > 10 fio-rbd jobs = 300k iops 
> > 15 fio-rbd jobs = 320k iops 
> > 20 fio-rbd jobs = 330k iops 
> > 40 fio-rbd jobs = 370k iops (can get more currently, only 1 client machine 
> > with 20cores 100%) 
> > 
> > (all with contant values during the fio bench) 
> > 
> > - Mail original - 
> > De: "Mark Nelson"  
> > À: "Stefan Priebe" , "aderumier" 
> >  
> > Cc: "ceph-users" , "ceph-devel" 
> > , "Somnath Roy" , 
> > "Milosz Tanski"  
> > Envoyé: Vendredi 24 Avril 2015 20:02:15 
> > Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
> > improve performance from 100k iops to 300k iops 
> > 
> > We haven't done any kind of real testing on jemalloc, so use at your own 
> > peril. Having said that, we've also been very interested in hearing 
> > community feedback from folks trying it out, so please feel free to give 
> > it a shot. :D 
> > 
> > Mark 
> > 
> > On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote: 
> >> Is jemalloc recommanded in general? Does it also work for firefly? 
> >> 
> >> Stefan 
> >> 
> >> Excuse my typo sent from my mobile phone. 
> >> 
> >> Am 24.04.2015 um 18:38 schrieb Alexandre DERUMIER  >> >: 
> >> 
> >>> Hi, 
> >>> 
> >>> I have finished to rebuild ceph with jemalloc, 
> >>> 
> >>> all seem to working fine. 
> >>> 
> >>> I got a constant 300k iops for the moment, so no speed regression. 
> >>> 
> >>> I'll do more long benchmark next week. 
> >>> 
> >>> Regards, 
> >>> 
> >>> Alexandre 
> >>> 
> >>> - Mail original - 
> >>> De: "Irek Fasikhov"

Re: [ceph-users] CephFs - Ceph-fuse Client Read Performance During Cache Tier Flushing

2015-04-27 Thread Mohamed Pakkeer
Hi all

The issue is resolved after upgrading Ceph from Giant to Hammer (0.94.1).

cheers
K.Mohamed Pakkeer
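
(For reference, if anyone hits similar read stalls while a cache tier is flushing on Giant, the flush/eviction thresholds are ordinary pool settings; a rough sketch, with "cachepool" and the target values as placeholders:)

ceph osd pool set cachepool target_max_bytes 1099511627776     # absolute size that triggers flushing/eviction
ceph osd pool set cachepool cache_target_dirty_ratio 0.4       # start flushing dirty objects earlier
ceph osd pool set cachepool cache_target_full_ratio 0.8        # start evicting clean objects before the tier fills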

On Sun, Apr 26, 2015 at 11:28 AM, Mohamed Pakkeer 
wrote:

> Hi
>
>  I was doing some testing on erasure coded based CephFS cluster. cluster
> is running with giant 0.87.1 release.
>
>
>
> Cluster info
>
> 15 * 36 drives node(journal on same osd)
>
> 3 * 4 drives SSD cache node( Intel DC3500)
>
> 3 * MON/MDS
>
> EC 10 +3
>
> 10G Ethernet for private and cluster network
>
>
>
> We got approx. 55MB/s read transfer speed using ceph-fuse client, when the
> data was available on cache tier( cold storage was empty). When I tried to
> add more data, ceph started the flushing the data from cache tier to cold
> storage. During flushing, cluster read speed became approx 100 KB/s. But I
> got 50 – 55MB/s write transfer speed during flushing from multiple
> simultaneous ceph-fuse client( 1G Ethernet). I think there is an issue on
> data migration from cold storage to cache tier during ceph-fuse client
> read. Am I hitting any known issue/bug or is there any issue with my
> cluster?
>
>
>
> I used big video files( approx 5 GB to 10 GB) for this testing .
>
>
>
> Any help ?
>
> Cheers
> K.Mohamed Pakkeer
>
>
>


-- 
Thanks & Regards
K.Mohamed Pakkeer
Mobile- 0091-8754410114
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shadow Files

2015-04-27 Thread Yehuda Sadeh-Weinraub
It will get to the ceph mainline eventually. We're still reviewing and testing 
the fix, and there's more work to be done on the cleanup tool.

Yehuda
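
(Until the cleanup tool lands, the manual check described in the quoted message below boils down to something like this; bucket, object and prefix are placeholders:)

radosgw-admin object stat --bucket=mybucket --object=myfile    # note the manifest prefix in the output
rados -p .rgw.buckets ls | grep '<prefix>' | wc -l             # count shadow objects still carrying that prefix
radosgw-admin gc list --include-all                            # see what garbage collection still has queued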

- Original Message -
> From: "Ben" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: "ceph-users" 
> Sent: Sunday, April 26, 2015 11:02:23 PM
> Subject: Re: [ceph-users] Shadow Files
> 
> Are these fixes going to make it into the repository versions of ceph,
> or will we be required to compile and install manually?
> 
> On 2015-04-26 02:29, Yehuda Sadeh-Weinraub wrote:
> > Yeah, that's definitely something that we'd address soon.
> > 
> > Yehuda
> > 
> > - Original Message -
> >> From: "Ben" 
> >> To: "Ben Hines" , "Yehuda Sadeh-Weinraub"
> >> 
> >> Cc: "ceph-users" 
> >> Sent: Friday, April 24, 2015 5:14:11 PM
> >> Subject: Re: [ceph-users] Shadow Files
> >> 
> >> Definitely need something to help clear out these old shadow files.
> >> 
> >> I'm sure our cluster has around 100TB of these shadow files.
> >> 
> >> I've written a script to go through known objects to get prefixes of
> >> objects
> >> that should exist to compare to ones that shouldn't, but the time it
> >> takes
> >> to do this over millions and millions of objects is just too long.
> >> 
> >> On 25/04/15 09:53, Ben Hines wrote:
> >> 
> >> 
> >> 
> >> When these are fixed it would be great to get good steps for listing /
> >> cleaning up any orphaned objects. I have suspicions this is affecting
> >> us.
> >> 
> >> thanks-
> >> 
> >> -Ben
> >> 
> >> On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub <
> >> yeh...@redhat.com >
> >> wrote:
> >> 
> >> 
> >> These ones:
> >> 
> >> http://tracker.ceph.com/issues/10295
> >> http://tracker.ceph.com/issues/11447
> >> 
> >> - Original Message -
> >> > From: "Ben Jackson" 
> >> > To: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> >> > Cc: "ceph-users" < ceph-us...@ceph.com >
> >> > Sent: Friday, April 24, 2015 3:06:02 PM
> >> > Subject: Re: [ceph-users] Shadow Files
> >> >
> >> > We were firefly, then we upgraded to giant, now we are on hammer.
> >> >
> >> > What issues?
> >> >
> >> > On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
> >> > wrote:
> >> > >
> >> > > What version are you running? There are two different issues that we
> >> > > were
> >> > > fixing this week, and we should have that upstream pretty soon.
> >> > >
> >> > > Yehuda
> >> > >
> >> > > - Original Message -
> >> > > > From: "Ben" 
> >> > > > To: "ceph-users" < ceph-us...@ceph.com >
> >> > > > Cc: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> >> > > > Sent: Thursday, April 23, 2015 7:42:06 PM
> >> > > > Subject: [ceph-users] Shadow Files
> >> > > >
> >> > > > We are still experiencing a problem with out gateway not properly
> >> > > > clearing out shadow files.
> >> > > >
> >> > > > I have done numerous tests where I have:
> >> > > > -Uploaded a file of 1.5GB in size using s3browser application
> >> > > > -Done an object stat on the file to get its prefix
> >> > > > -Done rados ls -p .rgw.buckets | grep  to count the number
> >> > > > of
> >> > > > shadow files associated (in this case it is around 290 shadow files)
> >> > > > -Deleted said file with s3browser
> >> > > > -Performed a gc list, which shows the ~290 files listed
> >> > > > -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep
> >> > > > 
> >> > > > to
> >> > > > recount the shadow files only to be left with 290 files still there
> >> > > >
> >> > > > From log output /var/log/ceph/radosgw.log, I can see the following
> >> > > > when
> >> > > > clicking DELETE (this appears 290 times)
> >> > > > 2015-04-24 10:43:29.996523 7f0b0afb5700 0
> >> > > > RGWObjManifest::operator++():
> >> > > > result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
> >> > > > 2015-04-24 10:43:29.996557 7f0b0afb5700 0
> >> > > > RGWObjManifest::operator++():
> >> > > > result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
> >> > > > 2015-04-24 10:43:29.996564 7f0b0afb5700 0
> >> > > > RGWObjManifest::operator++():
> >> > > > result: ofs=13107200 stripe_ofs=13107200 part_ofs=0
> >> > > > rule->part_size=0
> >> > > > 2015-04-24 10:43:29.996570 7f0b0afb5700 0
> >> > > > RGWObjManifest::operator++():
> >> > > > result: ofs=17301504 stripe_ofs=17301504 part_ofs=0
> >> > > > rule->part_size=0
> >> > > > 2015-04-24 10:43:29.996576 7f0b0afb5700 0
> >> > > > RGWObjManifest::operator++():
> >> > > > result: ofs=21495808 stripe_ofs=21495808 part_ofs=0
> >> > > > rule->part_size=0
> >> > > > 2015-04-24 10:43:29.996581 7f0b0afb5700 0
> >> > > > RGWObjManifest::operator++():
> >> > > > result: ofs=25690112 stripe_ofs=25690112 part_ofs=0
> >> > > > rule->part_size=0
> >> > > > 2015-04-24 10:43:29.996586 7f0b0afb5700 0
> >> > > > RGWObjManifest::operator++():
> >> > > > result: ofs=29884416 stripe_ofs=29884416 part_ofs=0
> >> > > > rule->part_size=0
> >> > > > 2015-04-24 10:43:29.996592 7f0b0afb5700 0
> >> > > > RGWObjManifest::operator++():
> >> > > > result: ofs=34078720 stripe_o

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Dan van der Ster
Hi Sage, Alexandre et al.

Here's another data point... we noticed something similar a while ago.

After we restart our OSDs the "4kB object write latency" [1]
temporarily drops from ~8-10ms down to around 3-4ms. Then slowly over
time the latency increases back to 8-10ms. The time that the OSDs stay
with low latency is a function of how much work those OSDs are doing
(i.e. on our idle test cluster, they stay with low latency for a
couple hours; on our production cluster the latency is high again
pretty much immediately).

We also attributed this to the
tcmalloc::ThreadCache::ReleaseToCentralCache issue, since that
function is always very high %-wise in perf top. And finally today we
managed to get the fixed tcmalloc [2] on our el6 servers and tried the
larger cache. And as we expected, with 128M cache size [3] the latency
is staying low (actually below 3ms on the test cluster vs 9ms earlier
today).

We should probably send a patch to make this configurable via the init script.

Cheers, Dan


[1] rados bench -p test -b 4096 -t 1
[2] rpmbuild --rebuild
https://kojipkgs.fedoraproject.org//packages/gperftools/2.4/1.fc23/src/gperftools-2.4-1.fc23.src.rpm
[3]

--- /tmp/ceph 2015-04-27 17:43:56.726216645 +0200
+++ /etc/init.d/ceph 2015-04-27 17:21:58.567859403 +0200
@@ -306,7 +306,7 @@
 if [ -n "$SYSTEMD_RUN" ]; then
  cmd="$SYSTEMD_RUN -r bash -c '$files $cmd --cluster $cluster -f'"
 else
- cmd="$files $wrap $cmd --cluster $cluster $runmode"
+ cmd="export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728; $files
$wrap $cmd --cluster $cluster $runmode"
 fi

 if [ $dofsmount -eq 1 ] && [ -n "$fs_devs" ]; then
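
(and after restarting, a quick way to check that the variable actually reached the daemons; pidof/awk just picks the first OSD process:)

/etc/init.d/ceph restart osd
cat /proc/$(pidof ceph-osd | awk '{print $1}')/environ | tr '\0' '\n' | grep TCMALLOC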

On Mon, Apr 27, 2015 at 5:27 PM, Sage Weil  wrote:
> On Mon, 27 Apr 2015, Alexandre DERUMIER wrote:
>> >>If I want to use librados API for performance testing, are there any
>> >>existing benchmark tools which directly accesses librados (not through
>> >>rbd or gateway)
>>
>> you can use "rados bench" from ceph packages
>>
>> http://ceph.com/docs/master/man/8/rados/
>>
>> "
>> bench seconds mode [ -b objsize ] [ -t threads ]
>> Benchmark for seconds. The mode can be write, seq, or rand. seq and rand are 
>> read benchmarks, either sequential or random. Before running one of the 
>> reading benchmarks, run a write benchmark with the --no-cleanup option. The 
>> default object size is 4 MB, and the default number of simulated threads 
>> (parallel writes) is 16.
>> "
>
> This one creates whole objects.  You might also look at ceph_smalliobench
> (in the ceph-tests package) which is a bit more featureful but less
> friendly to use.
>
> Also, fio has an rbd driver.
>
> sage
>
>
>>
>>
>> - Mail original -
>> De: "Venkateswara Rao Jujjuri" 
>> À: "aderumier" 
>> Cc: "Mark Nelson" , "ceph-users" 
>> , "ceph-devel" , 
>> "Milosz Tanski" 
>> Envoyé: Lundi 27 Avril 2015 08:12:49
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
>> improve performance from 100k iops to 300k iops
>>
>> If I want to use librados API for performance testing, are there any
>> existing benchmark tools which directly accesses librados (not through
>> rbd or gateway)
>>
>> Thanks in advance,
>> JV
>>
>> On Sun, Apr 26, 2015 at 10:46 PM, Alexandre DERUMIER
>>  wrote:
>> >>>I'll retest tcmalloc, because I was prety sure to have patched it 
>> >>>correctly.
>> >
>> > Ok, I really think I have patched tcmalloc wrongly.
>> > I have repatched it, reinstalled it, and now I'm getting 195k iops with a 
>> > single osd (10fio rbd jobs 4k randread).
>> >
>> > So better than jemalloc.
>> >
>> >
>> > - Mail original -
>> > De: "aderumier" 
>> > À: "Mark Nelson" 
>> > Cc: "ceph-users" , "ceph-devel" 
>> > , "Milosz Tanski" 
>> > Envoyé: Lundi 27 Avril 2015 07:01:21
>> > Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
>> > improve performance from 100k iops to 300k iops
>> >
>> > Hi,
>> >
>> > also another big difference,
>> >
>> > I can reach now 180k iops with a single jemalloc osd (data in buffer) vs 
>> > 50k iops max with tcmalloc.
>> >
>> > I'll retest tcmalloc, because I was prety sure to have patched it 
>> > correctly.
>> >
>> >
>> > - Mail original -
>> > De: "aderumier" 
>> > À: "Mark Nelson" 
>> > Cc: "ceph-users" , "ceph-devel" 
>> > , "Milosz Tanski" 
>> > Envoyé: Samedi 25 Avril 2015 06:45:43
>> > Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
>> > improve performance from 100k iops to 300k iops
>> >
>> >>>We haven't done any kind of real testing on jemalloc, so use at your own
>> >>>peril. Having said that, we've also been very interested in hearing
>> >>>community feedback from folks trying it out, so please feel free to give
>> >>>it a shot. :D
>> >
>> > Some feedback, I have runned bench all the night, no speed regression.
>> >
>> > And I have a speed increase with fio with more jobs. (with tcmalloc, it 
>> > seem to be the reverse)
>> >
>> > with tcmalloc :
>> >
>> > 10 fio-rbd jobs = 300k iops
>> > 15 fio-rbd jobs = 29

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Sage Weil
On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> Thanks for the info.
> 
> For my knowledge there was no snapshots on that pool, but cannot verify 
> that. 

Can you attach a 'ceph osd dump -f json-pretty'?  That will shed a 
bit more light on what happened (and the simplest way to fix it).

sage


> Any way to make this work again? Removing the tier and other settings 
> didn't fix it, I tried it the second this happened.
> 
> Br,
> Tuomas
> 
> -Original Message-
> From: Samuel Just [mailto:sj...@redhat.com] 
> Sent: 27. huhtikuuta 2015 15:50
> To: tuomas juntunen
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic 
> operations most of the OSD's went down
> 
> So, the base tier is what determines the snapshots for the cache/base pool 
> amalgam.  You added a populated pool complete with snapshots on top of a base 
> tier without snapshots.  Apparently, it caused an existential crisis for the 
> snapshot code.  That's one of the reasons why there is a --force-nonempty 
> flag for that operation, I think.  I think the immediate answer is probably 
> to disallow pools with snapshots as a cache tier altogether until we think of 
> a good way to make it work.
> -Sam
> 
> - Original Message -
> From: "tuomas juntunen" 
> To: "Samuel Just" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, April 27, 2015 4:56:58 AM
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic 
> operations most of the OSD's went down
> 
> 
> 
> The following:
> 
ceph osd tier add img images --force-nonempty
ceph osd tier cache-mode images forward
ceph osd tier set-overlay img images
> 
> Idea was to make images as a tier to img, move data to img then change 
> clients to use the new img pool.
> 
> Br,
> Tuomas
> 
> > Can you explain exactly what you mean by:
> >
> > "Also I created one pool for tier to be able to move data without outage."
> >
> > -Sam
> > - Original Message -
> > From: "tuomas juntunen" 
> > To: "Ian Colle" 
> > Cc: ceph-users@lists.ceph.com
> > Sent: Monday, April 27, 2015 4:23:44 AM
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> > basic operations most of the OSD's went down
> >
> > Hi
> >
> > Any solution for this yet?
> >
> > Br,
> > Tuomas
> >
> >> It looks like you may have hit http://tracker.ceph.com/issues/7915
> >>
> >> Ian R. Colle
> >> Global Director
> >> of Software Engineering
> >> Red Hat (Inktank is now part of Red Hat!) 
> >> http://www.linkedin.com/in/ircolle
> >> http://www.twitter.com/ircolle
> >> Cell: +1.303.601.7713
> >> Email: ico...@redhat.com
> >>
> >> - Original Message -
> >> From: "tuomas juntunen" 
> >> To: ceph-users@lists.ceph.com
> >> Sent: Monday, April 27, 2015 1:56:29 PM
> >> Subject: [ceph-users] Upgrade from Giant to Hammer and after some 
> >> basic operations most of the OSD's went down
> >>
> >>
> >>
> >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> >>
> >> Then created new pools and deleted some old ones. Also I created one 
> >> pool for tier to be able to move data without outage.
> >>
> >> After these operations all but 10 OSD's are down and creating this 
> >> kind of messages to logs, I get more than 100gb of these in a night:
> >>
> >>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 
> >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
> >> 16609/16659
> >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
> >> crt=8480'7 lcod
> >> 0'0 inactive NOTIFY] enter Started
> >>-18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 
> >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
> >> 16609/16659
> >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
> >> crt=8480'7 lcod
> >> 0'0 inactive NOTIFY] enter Start
> >>-17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 
> >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
> >> 16609/16659
> >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
> >> crt=8480'7 lcod
> >> 0'0 inactive NOTIFY] state: transitioning to Stray
> >>-16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 
> >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
> >> 16609/16659
> >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
> >> crt=8480'7 lcod
> >> 0'0 inactive NOTIFY] exit Start 0.25 0 0.00
> >>-15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 
> >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 
> >> 16609/16659
> >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 
> >> crt=8480'7 lcod
> >> 0'0 inactive NOTIFY] enter Started/Stray
> >>-14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 
> >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879
> >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY]

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Mark Nelson



On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote:

Is it possible that you were suffering from the bug during the first
test but once reinstalled you hadn't hit it yet?


yes, I'm pretty sure I'm hitting the tcmalloc bug since the beginning.
I had patched it, but I think it's not enough.
I had always this bug in random, but mainly when I have a "lot" of concurrent 
client (20 -40).
more client increase - lower iops .


Today,I had try to start osd with TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M ,
and now it's working fine in all my benchs.



That's a pretty major
performance swing.  I'm not sure if we can draw any conclusions about
jemalloc vs tcmalloc until we can figure out what went wrong.


 From my bench, jemalloc use a little bit more cpu than tcmalloc (maybe 1% or 
2%).
Tcmalloc seem to works better, with correct tuning of thread_cache_bytes.


But I don't known how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES correctly.
Maybe Sommath can tell us ?


Ok, just to make sure that I understand:

tcmalloc un-tuned: ~50k IOPS once bug sets in
tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
jemalloc un-tuned: ~150k IOPS

Is that correct?  Are there configurations/results I'm missing?

Mark




- Mail original -
De: "Mark Nelson" 
À: "aderumier" 
Cc: "ceph-users" , "ceph-devel" , 
"Milosz Tanski" 
Envoyé: Lundi 27 Avril 2015 16:54:34
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

Hi Alex,

Is it possible that you were suffering from the bug during the first
test but once reinstalled you hadn't hit it yet? That's a pretty major
performance swing. I'm not sure if we can draw any conclusions about
jemalloc vs tcmalloc until we can figure out what went wrong.

Mark

On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote:

I'll retest tcmalloc, because I was prety sure to have patched it correctly.


Ok, I really think I have patched tcmalloc wrongly.
I have repatched it, reinstalled it, and now I'm getting 195k iops with a 
single osd (10fio rbd jobs 4k randread).

So better than jemalloc.


- Mail original -
De: "aderumier" 
À: "Mark Nelson" 
Cc: "ceph-users" , "ceph-devel" , 
"Milosz Tanski" 
Envoyé: Lundi 27 Avril 2015 07:01:21
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

Hi,

also another big difference,

I can reach now 180k iops with a single jemalloc osd (data in buffer) vs 50k 
iops max with tcmalloc.

I'll retest tcmalloc, because I was prety sure to have patched it correctly.


- Mail original -
De: "aderumier" 
À: "Mark Nelson" 
Cc: "ceph-users" , "ceph-devel" , 
"Milosz Tanski" 
Envoyé: Samedi 25 Avril 2015 06:45:43
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops


We haven't done any kind of real testing on jemalloc, so use at your own
peril. Having said that, we've also been very interested in hearing
community feedback from folks trying it out, so please feel free to give
it a shot. :D


Some feedback, I have runned bench all the night, no speed regression.

And I have a speed increase with fio with more jobs. (with tcmalloc, it seem to 
be the reverse)

with tcmalloc :

10 fio-rbd jobs = 300k iops
15 fio-rbd jobs = 290k iops
20 fio-rbd jobs = 270k iops
40 fio-rbd jobs = 250k iops

(all with up and down values during the fio bench)


with jemalloc:

10 fio-rbd jobs = 300k iops
15 fio-rbd jobs = 320k iops
20 fio-rbd jobs = 330k iops
40 fio-rbd jobs = 370k iops (can get more currently, only 1 client machine with 
20cores 100%)

(all with contant values during the fio bench)

- Mail original -
De: "Mark Nelson" 
À: "Stefan Priebe" , "aderumier" 
Cc: "ceph-users" , "ceph-devel" , "Somnath 
Roy" , "Milosz Tanski" 
Envoyé: Vendredi 24 Avril 2015 20:02:15
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

We haven't done any kind of real testing on jemalloc, so use at your own
peril. Having said that, we've also been very interested in hearing
community feedback from folks trying it out, so please feel free to give
it a shot. :D

Mark

On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:

Is jemalloc recommanded in general? Does it also work for firefly?

Stefan

Excuse my typo sent from my mobile phone.

Am 24.04.2015 um 18:38 schrieb Alexandre DERUMIER mailto:aderum...@odiso.com>>:


Hi,

I have finished to rebuild ceph with jemalloc,

all seem to working fine.

I got a constant 300k iops for the moment, so no speed regression.

I'll do more long benchmark next week.

Regards,

Alexandre

- Mail original -
De: "Irek Fasikhov" mailto:malm...@gmail.com>>
À: "Somnath Roy" mailto:somnath@sandisk.com>>
Cc: "aderumier" mailto:aderum...@odiso.com>>,
"Mark Nelson" mailto:mnel...@redhat.com>>,
"ceph-users" mailto:ceph-users@lists.ceph.com>>, "ceph-de

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Tuomas Juntunen
Hi

Here you go

Br,
Tuomas



-Original Message-
From: Sage Weil [mailto:sw...@redhat.com] 
Sent: 27. huhtikuuta 2015 19:23
To: Tuomas Juntunen
Cc: 'Samuel Just'; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
operations most of the OSD's went down

On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> Thanks for the info.
> 
> For my knowledge there was no snapshots on that pool, but cannot 
> verify that.

Can you attach a 'ceph osd dump -f json-pretty'?  That will shed a bit more
light on what happened (and the simplest way to fix it).

sage


> Any way to make this work again? Removing the tier and other settings 
> didn't fix it, I tried it the second this happened.
> 
> Br,
> Tuomas
> 
> -Original Message-
> From: Samuel Just [mailto:sj...@redhat.com]
> Sent: 27. huhtikuuta 2015 15:50
> To: tuomas juntunen
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> basic operations most of the OSD's went down
> 
> So, the base tier is what determines the snapshots for the cache/base pool
amalgam.  You added a populated pool complete with snapshots on top of a
base tier without snapshots.  Apparently, it caused an existential crisis
for the snapshot code.  That's one of the reasons why there is a
--force-nonempty flag for that operation, I think.  I think the immediate
answer is probably to disallow pools with snapshots as a cache tier
altogether until we think of a good way to make it work.
> -Sam
> 
> - Original Message -
> From: "tuomas juntunen" 
> To: "Samuel Just" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, April 27, 2015 4:56:58 AM
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> basic operations most of the OSD's went down
> 
> 
> 
> The following:
> 
ceph osd tier add img images --force-nonempty
ceph osd tier cache-mode images forward
ceph osd tier set-overlay img images
> 
> Idea was to make images as a tier to img, move data to img then change
clients to use the new img pool.
> 
> Br,
> Tuomas
> 
> > Can you explain exactly what you mean by:
> >
> > "Also I created one pool for tier to be able to move data without
outage."
> >
> > -Sam
> > - Original Message -
> > From: "tuomas juntunen" 
> > To: "Ian Colle" 
> > Cc: ceph-users@lists.ceph.com
> > Sent: Monday, April 27, 2015 4:23:44 AM
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > some basic operations most of the OSD's went down
> >
> > Hi
> >
> > Any solution for this yet?
> >
> > Br,
> > Tuomas
> >
> >> It looks like you may have hit http://tracker.ceph.com/issues/7915
> >>
> >> Ian R. Colle
> >> Global Director
> >> of Software Engineering
> >> Red Hat (Inktank is now part of Red Hat!) 
> >> http://www.linkedin.com/in/ircolle
> >> http://www.twitter.com/ircolle
> >> Cell: +1.303.601.7713
> >> Email: ico...@redhat.com
> >>
> >> - Original Message -
> >> From: "tuomas juntunen" 
> >> To: ceph-users@lists.ceph.com
> >> Sent: Monday, April 27, 2015 1:56:29 PM
> >> Subject: [ceph-users] Upgrade from Giant to Hammer and after some 
> >> basic operations most of the OSD's went down
> >>
> >>
> >>
> >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> >>
> >> Then created new pools and deleted some old ones. Also I created 
> >> one pool for tier to be able to move data without outage.
> >>
> >> After these operations all but 10 OSD's are down and creating this 
> >> kind of messages to logs, I get more than 100gb of these in a night:
> >>
> >>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 
> >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 
> >> les/c
> >> 16609/16659
> >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42
> >> crt=8480'7 lcod
> >> 0'0 inactive NOTIFY] enter Started
> >>-18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 
> >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 
> >> les/c
> >> 16609/16659
> >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42
> >> crt=8480'7 lcod
> >> 0'0 inactive NOTIFY] enter Start
> >>-17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 
> >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 
> >> les/c
> >> 16609/16659
> >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42
> >> crt=8480'7 lcod
> >> 0'0 inactive NOTIFY] state: transitioning to Stray
> >>-16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 
> >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 
> >> les/c
> >> 16609/16659
> >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42
> >> crt=8480'7 lcod
> >> 0'0 inactive NOTIFY] exit Start 0.25 0 0.00
> >>-15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 
> >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 
> >> les/c
> >> 16609/16659
> >> 16590/16590/16590) [24,3,23] r=2 lpr=17

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Alexandre DERUMIER
Ok, just to make sure that I understand:

>>tcmalloc un-tuned: ~50k IOPS once bug sets in
Yes, it's really random, but when hitting the bug, this is the worst I 
have seen.


>>tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
yes
>>jemalloc un-tuned: ~150k IOPS
It's more around 185k iops  (a little bit less than tcmalloc, with a little bit 
more cpu usage)




- Mail original -
De: "Mark Nelson" 
À: "aderumier" 
Cc: "ceph-users" , "ceph-devel" 
, "Milosz Tanski" 
Envoyé: Lundi 27 Avril 2015 18:34:50
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote: 
>>> Is it possible that you were suffering from the bug during the first 
>>> test but once reinstalled you hadn't hit it yet? 
> 
> yes, I'm pretty sure I'm hitting the tcmalloc bug since the beginning. 
> I had patched it, but I think it's not enough. 
> I had always this bug in random, but mainly when I have a "lot" of concurrent 
> client (20 -40). 
> more client increase - lower iops . 
> 
> 
> Today,I had try to start osd with TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M 
> , 
> and now it's working fine in all my benchs. 
> 
> 
>>> That's a pretty major 
>>> performance swing. I'm not sure if we can draw any conclusions about 
>>> jemalloc vs tcmalloc until we can figure out what went wrong. 
> 
> From my bench, jemalloc use a little bit more cpu than tcmalloc (maybe 1% or 
> 2%). 
> Tcmalloc seem to works better, with correct tuning of thread_cache_bytes. 
> 
> 
> But I don't known how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES 
> correctly. 
> Maybe Sommath can tell us ? 

Ok, just to make sure that I understand: 

tcmalloc un-tuned: ~50k IOPS once bug sets in 
tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS 
jemalloc un-tuned: ~150k IOPS 

Is that correct? Are there configurations/results I'm missing? 

Mark 

> 
> 
> - Mail original - 
> De: "Mark Nelson"  
> À: "aderumier"  
> Cc: "ceph-users" , "ceph-devel" 
> , "Milosz Tanski"  
> Envoyé: Lundi 27 Avril 2015 16:54:34 
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
> improve performance from 100k iops to 300k iops 
> 
> Hi Alex, 
> 
> Is it possible that you were suffering from the bug during the first 
> test but once reinstalled you hadn't hit it yet? That's a pretty major 
> performance swing. I'm not sure if we can draw any conclusions about 
> jemalloc vs tcmalloc until we can figure out what went wrong. 
> 
> Mark 
> 
> On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote: 
 I'll retest tcmalloc, because I was prety sure to have patched it 
 correctly. 
>> 
>> Ok, I really think I have patched tcmalloc wrongly. 
>> I have repatched it, reinstalled it, and now I'm getting 195k iops with a 
>> single osd (10fio rbd jobs 4k randread). 
>> 
>> So better than jemalloc. 
>> 
>> 
>> - Mail original - 
>> De: "aderumier"  
>> À: "Mark Nelson"  
>> Cc: "ceph-users" , "ceph-devel" 
>> , "Milosz Tanski"  
>> Envoyé: Lundi 27 Avril 2015 07:01:21 
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
>> improve performance from 100k iops to 300k iops 
>> 
>> Hi, 
>> 
>> also another big difference, 
>> 
>> I can reach now 180k iops with a single jemalloc osd (data in buffer) vs 50k 
>> iops max with tcmalloc. 
>> 
>> I'll retest tcmalloc, because I was prety sure to have patched it correctly. 
>> 
>> 
>> - Mail original - 
>> De: "aderumier"  
>> À: "Mark Nelson"  
>> Cc: "ceph-users" , "ceph-devel" 
>> , "Milosz Tanski"  
>> Envoyé: Samedi 25 Avril 2015 06:45:43 
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
>> improve performance from 100k iops to 300k iops 
>> 
 We haven't done any kind of real testing on jemalloc, so use at your own 
 peril. Having said that, we've also been very interested in hearing 
 community feedback from folks trying it out, so please feel free to give 
 it a shot. :D 
>> 
>> Some feedback, I have runned bench all the night, no speed regression. 
>> 
>> And I have a speed increase with fio with more jobs. (with tcmalloc, it seem 
>> to be the reverse) 
>> 
>> with tcmalloc : 
>> 
>> 10 fio-rbd jobs = 300k iops 
>> 15 fio-rbd jobs = 290k iops 
>> 20 fio-rbd jobs = 270k iops 
>> 40 fio-rbd jobs = 250k iops 
>> 
>> (all with up and down values during the fio bench) 
>> 
>> 
>> with jemalloc: 
>> 
>> 10 fio-rbd jobs = 300k iops 
>> 15 fio-rbd jobs = 320k iops 
>> 20 fio-rbd jobs = 330k iops 
>> 40 fio-rbd jobs = 370k iops (can get more currently, only 1 client machine 
>> with 20cores 100%) 
>> 
>> (all with contant values during the fio bench) 
>> 
>> - Mail original - 
>> De: "Mark Nelson"  
>> À: "Stefan Priebe" , "aderumier" 
>>  
>> Cc: "ceph-users" , "ceph-devel" 
>> , "Somnath Roy" , 
>> "Milosz Tanski"  
>> Envoyé: Vendredi 24 Avril 2015 20:02:15 
>> Objet: Re: [ceph-users] stran

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Somnath Roy
Alexandre,
The moment you restart after hitting the tcmalloc trace, irrespective of what 
value you set as the thread cache, it will perform better, and I guess that's what is 
happening in your case.
Yes, setting this value is kind of tricky and very much dependent on your 
setup/workload etc.
I would suggest setting it to ~128M and running your test longer, say ~10 hours or so.

Thanks & Regards
Somnath
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Alexandre DERUMIER
Sent: Monday, April 27, 2015 9:46 AM
To: Mark Nelson
Cc: ceph-users; ceph-devel; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

Ok, just to make sure that I understand:

>>tcmalloc un-tuned: ~50k IOPS once bug sets in
yes, it's really random, but when hitting the bug, yes this is the worste I 
have seen.


>>tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
yes
>>jemalloc un-tuned: ~150k IOPS
It's more around 185k iops  (a little bit less than tcmalloc, with a little bit 
more cpu usage)




- Mail original -
De: "Mark Nelson" 
À: "aderumier" 
Cc: "ceph-users" , "ceph-devel" 
, "Milosz Tanski" 
Envoyé: Lundi 27 Avril 2015 18:34:50
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote:
>>> Is it possible that you were suffering from the bug during the first
>>> test but once reinstalled you hadn't hit it yet?
>
> yes, I'm pretty sure I'm hitting the tcmalloc bug since the beginning.
> I had patched it, but I think it's not enough.
> I had always this bug in random, but mainly when I have a "lot" of concurrent 
> client (20 -40).
> more client increase - lower iops .
>
>
> Today,I had try to start osd with
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M , and now it's working fine in all 
> my benchs.
>
>
>>> That's a pretty major
>>> performance swing. I'm not sure if we can draw any conclusions about
>>> jemalloc vs tcmalloc until we can figure out what went wrong.
>
> From my bench, jemalloc use a little bit more cpu than tcmalloc (maybe 1% or 
> 2%).
> Tcmalloc seem to works better, with correct tuning of thread_cache_bytes.
>
>
> But I don't known how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES correctly.
> Maybe Sommath can tell us ?

Ok, just to make sure that I understand:

tcmalloc un-tuned: ~50k IOPS once bug sets in
tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
jemalloc un-tuned: ~150k IOPS

Is that correct? Are there configurations/results I'm missing?

Mark

>
>
> - Mail original -
> De: "Mark Nelson" 
> À: "aderumier" 
> Cc: "ceph-users" , "ceph-devel"
> , "Milosz Tanski" 
> Envoyé: Lundi 27 Avril 2015 16:54:34
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
> daemon improve performance from 100k iops to 300k iops
>
> Hi Alex,
>
> Is it possible that you were suffering from the bug during the first
> test but once reinstalled you hadn't hit it yet? That's a pretty major
> performance swing. I'm not sure if we can draw any conclusions about
> jemalloc vs tcmalloc until we can figure out what went wrong.
>
> Mark
>
> On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote:
 I'll retest tcmalloc, because I was prety sure to have patched it 
 correctly.
>>
>> Ok, I really think I have patched tcmalloc wrongly.
>> I have repatched it, reinstalled it, and now I'm getting 195k iops with a 
>> single osd (10fio rbd jobs 4k randread).
>>
>> So better than jemalloc.
>>
>>
>> - Mail original -
>> De: "aderumier" 
>> À: "Mark Nelson" 
>> Cc: "ceph-users" , "ceph-devel"
>> , "Milosz Tanski" 
>> Envoyé: Lundi 27 Avril 2015 07:01:21
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>>
>> Hi,
>>
>> also another big difference,
>>
>> I can reach now 180k iops with a single jemalloc osd (data in buffer) vs 50k 
>> iops max with tcmalloc.
>>
>> I'll retest tcmalloc, because I was prety sure to have patched it correctly.
>>
>>
>> - Mail original -
>> De: "aderumier" 
>> À: "Mark Nelson" 
>> Cc: "ceph-users" , "ceph-devel"
>> , "Milosz Tanski" 
>> Envoyé: Samedi 25 Avril 2015 06:45:43
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>>
 We haven't done any kind of real testing on jemalloc, so use at
 your own peril. Having said that, we've also been very interested
 in hearing community feedback from folks trying it out, so please
 feel free to give it a shot. :D
>>
>> Some feedback, I have runned bench all the night, no speed regression.
>>
>> And I have a speed increase with fio with more jobs. (with tcmalloc,
>> it seem to be the reverse)
>>
>> with tcmalloc :
>>
>> 10 fio-rbd jobs = 300k iops
>> 15 fio-rbd jobs = 290k iops
>> 20 fio-rbd jobs = 270k iops

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Mark Nelson

Hi Somnath,

Forgive me as I think this was discussed earlier in the thread, but did 
we confirm that the patch/fix/etc does not 100% fix the problem?


Mark

On 04/27/2015 12:25 PM, Somnath Roy wrote:

Alexandre,
The moment you restarted after hitting the tcmalloc trace, irrespective of what 
value you set as thread cache, it will perform better and that's what happening 
in your case I guess.
Yes, setting this value kind of tricky and very much dependent on your 
setup/workload etc.
I would suggest to set it ~128M and run your test longer say ~10 hours or so.

Thanks & Regards
Somnath
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Alexandre DERUMIER
Sent: Monday, April 27, 2015 9:46 AM
To: Mark Nelson
Cc: ceph-users; ceph-devel; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

Ok, just to make sure that I understand:


tcmalloc un-tuned: ~50k IOPS once bug sets in

yes, it's really random, but when hitting the bug, yes this is the worste I 
have seen.



tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS

yes

jemalloc un-tuned: ~150k IOPS

It's more around 185k iops  (a little bit less than tcmalloc, with a little bit 
more cpu usage)




- Mail original -
De: "Mark Nelson" 
À: "aderumier" 
Cc: "ceph-users" , "ceph-devel" , 
"Milosz Tanski" 
Envoyé: Lundi 27 Avril 2015 18:34:50
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote:

Is it possible that you were suffering from the bug during the first
test but once reinstalled you hadn't hit it yet?


yes, I'm pretty sure I'm hitting the tcmalloc bug since the beginning.
I had patched it, but I think it's not enough.
I had always this bug in random, but mainly when I have a "lot" of concurrent 
client (20 -40).
more client increase - lower iops .


Today,I had try to start osd with
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M , and now it's working fine in all 
my benchs.



That's a pretty major
performance swing. I'm not sure if we can draw any conclusions about
jemalloc vs tcmalloc until we can figure out what went wrong.


 From my bench, jemalloc use a little bit more cpu than tcmalloc (maybe 1% or 
2%).
Tcmalloc seem to works better, with correct tuning of thread_cache_bytes.


But I don't known how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES correctly.
Maybe Sommath can tell us ?


Ok, just to make sure that I understand:

tcmalloc un-tuned: ~50k IOPS once bug sets in
tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
jemalloc un-tuned: ~150k IOPS

Is that correct? Are there configurations/results I'm missing?

Mark




- Mail original -
De: "Mark Nelson" 
À: "aderumier" 
Cc: "ceph-users" , "ceph-devel"
, "Milosz Tanski" 
Envoyé: Lundi 27 Avril 2015 16:54:34
Objet: Re: [ceph-users] strange benchmark problem : restarting osd
daemon improve performance from 100k iops to 300k iops

Hi Alex,

Is it possible that you were suffering from the bug during the first
test but once reinstalled you hadn't hit it yet? That's a pretty major
performance swing. I'm not sure if we can draw any conclusions about
jemalloc vs tcmalloc until we can figure out what went wrong.

Mark

On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote:

I'll retest tcmalloc, because I was prety sure to have patched it correctly.


Ok, I really think I have patched tcmalloc wrongly.
I have repatched it, reinstalled it, and now I'm getting 195k iops with a 
single osd (10fio rbd jobs 4k randread).

So better than jemalloc.


- Mail original -
De: "aderumier" 
À: "Mark Nelson" 
Cc: "ceph-users" , "ceph-devel"
, "Milosz Tanski" 
Envoyé: Lundi 27 Avril 2015 07:01:21
Objet: Re: [ceph-users] strange benchmark problem : restarting osd
daemon improve performance from 100k iops to 300k iops

Hi,

also another big difference,

I can reach now 180k iops with a single jemalloc osd (data in buffer) vs 50k 
iops max with tcmalloc.

I'll retest tcmalloc, because I was prety sure to have patched it correctly.


- Mail original -
De: "aderumier" 
À: "Mark Nelson" 
Cc: "ceph-users" , "ceph-devel"
, "Milosz Tanski" 
Envoyé: Samedi 25 Avril 2015 06:45:43
Objet: Re: [ceph-users] strange benchmark problem : restarting osd
daemon improve performance from 100k iops to 300k iops


We haven't done any kind of real testing on jemalloc, so use at
your own peril. Having said that, we've also been very interested
in hearing community feedback from folks trying it out, so please
feel free to give it a shot. :D


Some feedback, I have runned bench all the night, no speed regression.

And I have a speed increase with fio with more jobs. (with tcmalloc,
it seem to be the reverse)

with tcmalloc :

10 fio-rbd jobs = 300k iops
15 fio-rbd jobs = 290k iops
20 fio-rbd jobs = 270k iops
40 fio-rbd jobs = 250k iops

(al

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Sage Weil
Yeah, no snaps:

images:
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 17882,
"pool_snaps": [],
"removed_snaps": "[]",

img:
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",

...and actually the log shows this happens on pool 2 (rbd), which has

"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",

I'm guessing the offending code is 

pi->build_removed_snaps(newly_removed_snaps);
newly_removed_snaps.subtract(cached_removed_snaps);

so newly_removed_snaps should be empty, and apparently 
cached_removed_snaps is not?  Maybe one of your older osdmaps has snap 
info for rbd?  It doesn't make sense.  :/  Maybe

 ceph osd dump 18127 -f json-pretty

just to be certain?  I've pushed a branch 'wip-hammer-snaps' that will appear 
at gitbuilder.ceph.com in 20-30 minutes and will output some additional debug 
info.  It will be at


http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/wip-hammer-sanps

or similar, depending on your distro.  Can you install it on one node 
and start an osd with logging to reproduce the crash?

Thanks!
sage
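
(Side note for anyone following along: one way to check whether an older osdmap 
still carries snap info for the rbd pool is to dump a few epochs and compare the 
removed_snaps fields; the epoch numbers below are only examples.)

  # sketch only: adjust the epoch numbers to the cluster's actual range
  for e in 18120 18125 18127; do
      echo "== epoch $e =="
      ceph osd dump $e -f json-pretty | grep -E '"pool_name"|"removed_snaps"'
  done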


On Mon, 27 Apr 2015, Tuomas Juntunen wrote:

> Hi
> 
> Here you go
> 
> Br,
> Tuomas
> 
> 
> 
> -Original Message-
> From: Sage Weil [mailto:sw...@redhat.com] 
> Sent: 27. huhtikuuta 2015 19:23
> To: Tuomas Juntunen
> Cc: 'Samuel Just'; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
> 
> On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> > Thanks for the info.
> > 
> > For my knowledge there was no snapshots on that pool, but cannot 
> > verify that.
> 
> Can you attach a 'ceph osd dump -f json-pretty'?  That will shed a bit more
> light on what happened (and the simplest way to fix it).
> 
> sage
> 
> 
> > Any way to make this work again? Removing the tier and other settings 
> > didn't fix it, I tried it the second this happened.
> > 
> > Br,
> > Tuomas
> > 
> > -Original Message-
> > From: Samuel Just [mailto:sj...@redhat.com]
> > Sent: 27. huhtikuuta 2015 15:50
> > To: tuomas juntunen
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> > basic operations most of the OSD's went down
> > 
> > So, the base tier is what determines the snapshots for the cache/base pool
> amalgam.  You added a populated pool complete with snapshots on top of a
> base tier without snapshots.  Apparently, it caused an existential crisis
> for the snapshot code.  That's one of the reasons why there is a
> --force-nonempty flag for that operation, I think.  I think the immediate
> answer is probably to disallow pools with snapshots as a cache tier
> altogether until we think of a good way to make it work.
> > -Sam
> > 
> > - Original Message -
> > From: "tuomas juntunen" 
> > To: "Samuel Just" 
> > Cc: ceph-users@lists.ceph.com
> > Sent: Monday, April 27, 2015 4:56:58 AM
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> > basic operations most of the OSD's went down
> > 
> > 
> > 
> > The following:
> > 
> > ceph osd tier add img images --force-nonempty ceph osd tier cache-mode 
> > images forward ceph osd tier set-overlay img images
> > 
> > Idea was to make images as a tier to img, move data to img then change
> clients to use the new img pool.
> > 
> > Br,
> > Tuomas
> > 
> > > Can you explain exactly what you mean by:
> > >
> > > "Also I created one pool for tier to be able to move data without
> outage."
> > >
> > > -Sam
> > > - Original Message -
> > > From: "tuomas juntunen" 
> > > To: "Ian Colle" 
> > > Cc: ceph-users@lists.ceph.com
> > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > > some basic operations most of the OSD's went down
> > >
> > > Hi
> > >
> > > Any solution for this yet?
> > >
> > > Br,
> > > Tuomas
> > >
> > >> It looks like you may have hit http://tracker.ceph.com/issues/7915
> > >>
> > >> Ian R. Colle
> > >> Global Director
> > >> of Software Engineering
> > >> Red Hat (Inktank is now part of Red Hat!) 
> > >> http://www.linkedin.com/in/ircolle
> > >> http://www.twitter.com/ircolle
> > >> Cell: +1.303.601.7713
> > >> Email: ico...@redhat.com
> > >>
> > >> - Original Message -
> > >> From: "tuomas juntunen" 
> > >> To: ceph-users@lists.ceph.com
> > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > >> Subject: [ceph-users] Upgrade from Giant to Hammer and after some 
> > >> basic operations most of the OSD's went down
> > >>
> > >>
> > >>
> > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> > >>
> > >> Then created new pools and deleted some old ones. Also I created 
> > >> one pool fo

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Tuomas Juntunen
Hi

Thank you so much,

Here's the other json file; I'll check and install that and get the logs
ASAP too. There have not been any snaps on rbd; I haven't used it at all, it
has been just an empty pool.

Br,
Tuomas


-Original Message-
From: Sage Weil [mailto:sw...@redhat.com] 
Sent: 27. huhtikuuta 2015 20:45
To: Tuomas Juntunen
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
operations most of the OSD's went down

Yeah, no snaps:

images:
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 17882,
"pool_snaps": [],
"removed_snaps": "[]",

img:
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",

...and actually the log shows this happens on pool 2 (rbd), which has

"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",

I'm guessing the offending code is

pi->build_removed_snaps(newly_removed_snaps);
newly_removed_snaps.subtract(cached_removed_snaps);

so newly_removed_snaps should be empty, and apparently cached_removed_snaps
is not?  Maybe one of your older osdmaps has snap info for rbd?  It doesn't
make sense.  :/  Maybe

 ceph osd dump 18127 -f json-pretty

just to be certain?  I've pushed a branch 'wip-hammer-snaps' that will
appear at gitbuilder.ceph.com in 20-30 minutes that will output some
additional debug info.  It will be at


http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/wip-hammer-sanps

or similar, depending on your distro.  Can you install it one on node and
start and osd with logging to reproduce the crash?

Thanks!
sage


On Mon, 27 Apr 2015, Tuomas Juntunen wrote:

> Hi
> 
> Here you go
> 
> Br,
> Tuomas
> 
> 
> 
> -Original Message-
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: 27. huhtikuuta 2015 19:23
> To: Tuomas Juntunen
> Cc: 'Samuel Just'; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> basic operations most of the OSD's went down
> 
> On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> > Thanks for the info.
> > 
> > For my knowledge there was no snapshots on that pool, but cannot 
> > verify that.
> 
> Can you attach a 'ceph osd dump -f json-pretty'?  That will shed a bit 
> more light on what happened (and the simplest way to fix it).
> 
> sage
> 
> 
> > Any way to make this work again? Removing the tier and other 
> > settings didn't fix it, I tried it the second this happened.
> > 
> > Br,
> > Tuomas
> > 
> > -Original Message-
> > From: Samuel Just [mailto:sj...@redhat.com]
> > Sent: 27. huhtikuuta 2015 15:50
> > To: tuomas juntunen
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > some basic operations most of the OSD's went down
> > 
> > So, the base tier is what determines the snapshots for the 
> > cache/base pool
> amalgam.  You added a populated pool complete with snapshots on top of 
> a base tier without snapshots.  Apparently, it caused an existential 
> crisis for the snapshot code.  That's one of the reasons why there is 
> a --force-nonempty flag for that operation, I think.  I think the 
> immediate answer is probably to disallow pools with snapshots as a 
> cache tier altogether until we think of a good way to make it work.
> > -Sam
> > 
> > - Original Message -
> > From: "tuomas juntunen" 
> > To: "Samuel Just" 
> > Cc: ceph-users@lists.ceph.com
> > Sent: Monday, April 27, 2015 4:56:58 AM
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > some basic operations most of the OSD's went down
> > 
> > 
> > 
> > The following:
> > 
> > ceph osd tier add img images --force-nonempty ceph osd tier 
> > cache-mode images forward ceph osd tier set-overlay img images
> > 
> > Idea was to make images as a tier to img, move data to img then 
> > change
> clients to use the new img pool.
> > 
> > Br,
> > Tuomas
> > 
> > > Can you explain exactly what you mean by:
> > >
> > > "Also I created one pool for tier to be able to move data without
> outage."
> > >
> > > -Sam
> > > - Original Message -
> > > From: "tuomas juntunen" 
> > > To: "Ian Colle" 
> > > Cc: ceph-users@lists.ceph.com
> > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > > some basic operations most of the OSD's went down
> > >
> > > Hi
> > >
> > > Any solution for this yet?
> > >
> > > Br,
> > > Tuomas
> > >
> > >> It looks like you may have hit 
> > >> http://tracker.ceph.com/issues/7915
> > >>
> > >> Ian R. Colle
> > >> Global Director
> > >> of Software Engineering
> > >> Red Hat (Inktank is now part of Red Hat!) 
> > >> http://www.linkedin.com/in/ircolle
> > >> http://www.twitter.com/ircolle
> > >> Cell: +1.30

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-27 Thread Somnath Roy
Yes, the tcmalloc patch we applied is not meant to solve the trace we are seeing. The 
env variable code path was a no-op in the tcmalloc code base, and the patch has 
resolved that. Now setting the env variable actually takes effect within the tcmalloc 
code base.
This thread cache env variable is a performance workaround with tcmalloc: it will 
delay the tcmalloc perf trace we are hitting :-) or it may never come at all, 
depending on your workload (objects <=32K are vulnerable to hitting this).
Basically, with a bigger cache it will build a bigger free list, so that it doesn't 
have to garbage collect or go to the central free list as often. Here is a nice 
article explaining this.

http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html

So, it is very difficult to predict the optimal value for this thread cache and 
that's why it will not solve the issue completely.
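
(For anyone who wants to try the workaround by hand: a rough way to start one OSD 
with the variable set and confirm the process actually sees it is sketched below. 
The thread uses the 128M spelling; a plain byte value such as 134217728 is worth 
trying instead if your tcmalloc build does not accept the suffix. The OSD id is 
just an example.)

  # sketch only: start a single OSD with the thread cache variable set
  TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M ceph-osd -i 0

  # confirm the running ceph-osd process has the variable in its environment
  tr '\0' '\n' < /proc/$(pidof ceph-osd | awk '{print $1}')/environ | grep TCMALLOC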

Thanks & Regards
Somnath

-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com] 
Sent: Monday, April 27, 2015 10:42 AM
To: Somnath Roy; Alexandre DERUMIER
Cc: ceph-users; ceph-devel; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
improve performance from 100k iops to 300k iops

Hi Somnath,

Forgive me as I think this was discussed earlier in the thread, but did 
we confirm that the patch/fix/etc does not 100% fix the problem?

Mark

On 04/27/2015 12:25 PM, Somnath Roy wrote:
> Alexandre,
> The moment you restarted after hitting the tcmalloc trace, irrespective of 
> what value you set as thread cache, it will perform better and that's what 
> happening in your case I guess.
> Yes, setting this value kind of tricky and very much dependent on your 
> setup/workload etc.
> I would suggest to set it ~128M and run your test longer say ~10 hours or so.
>
> Thanks & Regards
> Somnath
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Alexandre DERUMIER
> Sent: Monday, April 27, 2015 9:46 AM
> To: Mark Nelson
> Cc: ceph-users; ceph-devel; Milosz Tanski
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
> improve performance from 100k iops to 300k iops
>
> Ok, just to make sure that I understand:
>
>>> tcmalloc un-tuned: ~50k IOPS once bug sets in
> yes, it's really random, but when hitting the bug, yes this is the worste I 
> have seen.
>
>
>>> tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
> yes
>>> jemalloc un-tuned: ~150k IOPS
> It's more around 185k iops  (a little bit less than tcmalloc, with a little 
> bit more cpu usage)
>
>
>
>
> - Mail original -
> De: "Mark Nelson" 
> À: "aderumier" 
> Cc: "ceph-users" , "ceph-devel" 
> , "Milosz Tanski" 
> Envoyé: Lundi 27 Avril 2015 18:34:50
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon 
> improve performance from 100k iops to 300k iops
>
> On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote:
 Is it possible that you were suffering from the bug during the first
 test but once reinstalled you hadn't hit it yet?
>>
>> yes, I'm pretty sure I'm hitting the tcmalloc bug since the beginning.
>> I had patched it, but I think it's not enough.
>> I had always this bug in random, but mainly when I have a "lot" of 
>> concurrent client (20 -40).
>> more client increase - lower iops .
>>
>>
>> Today,I had try to start osd with
>> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M , and now it's working fine in 
>> all my benchs.
>>
>>
 That's a pretty major
 performance swing. I'm not sure if we can draw any conclusions about
 jemalloc vs tcmalloc until we can figure out what went wrong.
>>
>>  From my bench, jemalloc use a little bit more cpu than tcmalloc (maybe 1% 
>> or 2%).
>> Tcmalloc seem to works better, with correct tuning of thread_cache_bytes.
>>
>>
>> But I don't known how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES 
>> correctly.
>> Maybe Sommath can tell us ?
>
> Ok, just to make sure that I understand:
>
> tcmalloc un-tuned: ~50k IOPS once bug sets in
> tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
> jemalloc un-tuned: ~150k IOPS
>
> Is that correct? Are there configurations/results I'm missing?
>
> Mark
>
>>
>>
>> - Mail original -
>> De: "Mark Nelson" 
>> À: "aderumier" 
>> Cc: "ceph-users" , "ceph-devel"
>> , "Milosz Tanski" 
>> Envoyé: Lundi 27 Avril 2015 16:54:34
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>>
>> Hi Alex,
>>
>> Is it possible that you were suffering from the bug during the first
>> test but once reinstalled you hadn't hit it yet? That's a pretty major
>> performance swing. I'm not sure if we can draw any conclusions about
>> jemalloc vs tcmalloc until we can figure out what went wrong.
>>
>> Mark
>>
>> On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote:
> I'll retest tcmalloc, because I was prety sure to have patched it 
> correctly.
>>>
>>> Ok, I really think I

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Tuomas Juntunen
Hey

Got the log, you can get it from
http://beta.xaasbox.com/ceph/ceph-osd.15.log 

Br,
Tuomas


-Original Message-
From: Sage Weil [mailto:sw...@redhat.com] 
Sent: 27. huhtikuuta 2015 20:45
To: Tuomas Juntunen
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
operations most of the OSD's went down

Yeah, no snaps:

images:
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 17882,
"pool_snaps": [],
"removed_snaps": "[]",

img:
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",

...and actually the log shows this happens on pool 2 (rbd), which has

"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",

I'm guessing the offending code is

pi->build_removed_snaps(newly_removed_snaps);
newly_removed_snaps.subtract(cached_removed_snaps);

so newly_removed_snaps should be empty, and apparently cached_removed_snaps
is not?  Maybe one of your older osdmaps has snap info for rbd?  It doesn't
make sense.  :/  Maybe

 ceph osd dump 18127 -f json-pretty

just to be certain?  I've pushed a branch 'wip-hammer-snaps' that will
appear at gitbuilder.ceph.com in 20-30 minutes that will output some
additional debug info.  It will be at


http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/wip-hammer-sanps

or similar, depending on your distro.  Can you install it one on node and
start and osd with logging to reproduce the crash?

Thanks!
sage


On Mon, 27 Apr 2015, Tuomas Juntunen wrote:

> Hi
> 
> Here you go
> 
> Br,
> Tuomas
> 
> 
> 
> -Original Message-
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: 27. huhtikuuta 2015 19:23
> To: Tuomas Juntunen
> Cc: 'Samuel Just'; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> basic operations most of the OSD's went down
> 
> On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> > Thanks for the info.
> > 
> > For my knowledge there was no snapshots on that pool, but cannot 
> > verify that.
> 
> Can you attach a 'ceph osd dump -f json-pretty'?  That will shed a bit 
> more light on what happened (and the simplest way to fix it).
> 
> sage
> 
> 
> > Any way to make this work again? Removing the tier and other 
> > settings didn't fix it, I tried it the second this happened.
> > 
> > Br,
> > Tuomas
> > 
> > -Original Message-
> > From: Samuel Just [mailto:sj...@redhat.com]
> > Sent: 27. huhtikuuta 2015 15:50
> > To: tuomas juntunen
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > some basic operations most of the OSD's went down
> > 
> > So, the base tier is what determines the snapshots for the 
> > cache/base pool
> amalgam.  You added a populated pool complete with snapshots on top of 
> a base tier without snapshots.  Apparently, it caused an existential 
> crisis for the snapshot code.  That's one of the reasons why there is 
> a --force-nonempty flag for that operation, I think.  I think the 
> immediate answer is probably to disallow pools with snapshots as a 
> cache tier altogether until we think of a good way to make it work.
> > -Sam
> > 
> > - Original Message -
> > From: "tuomas juntunen" 
> > To: "Samuel Just" 
> > Cc: ceph-users@lists.ceph.com
> > Sent: Monday, April 27, 2015 4:56:58 AM
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > some basic operations most of the OSD's went down
> > 
> > 
> > 
> > The following:
> > 
> > ceph osd tier add img images --force-nonempty ceph osd tier 
> > cache-mode images forward ceph osd tier set-overlay img images
> > 
> > Idea was to make images as a tier to img, move data to img then 
> > change
> clients to use the new img pool.
> > 
> > Br,
> > Tuomas
> > 
> > > Can you explain exactly what you mean by:
> > >
> > > "Also I created one pool for tier to be able to move data without
> outage."
> > >
> > > -Sam
> > > - Original Message -
> > > From: "tuomas juntunen" 
> > > To: "Ian Colle" 
> > > Cc: ceph-users@lists.ceph.com
> > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > > some basic operations most of the OSD's went down
> > >
> > > Hi
> > >
> > > Any solution for this yet?
> > >
> > > Br,
> > > Tuomas
> > >
> > >> It looks like you may have hit 
> > >> http://tracker.ceph.com/issues/7915
> > >>
> > >> Ian R. Colle
> > >> Global Director
> > >> of Software Engineering
> > >> Red Hat (Inktank is now part of Red Hat!) 
> > >> http://www.linkedin.com/in/ircolle
> > >> http://www.twitter.com/ircolle
> > >> Cell: +1.303.601.7713
> > >> Email: ico...@redhat.com
> > >>
> > >> - Original Message -
> > >> From: "tuomas juntunen" 

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Sage Weil
On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> Hey
> 
> Got the log, you can get it from
> http://beta.xaasbox.com/ceph/ceph-osd.15.log 

Can you repeat this with 'debug osd = 20'?  Thanks!

sage
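
(For anyone unfamiliar with bumping the log level: two common ways are sketched 
below; osd.15 is simply the OSD whose log is linked above.)

  # in ceph.conf on the affected node, then restart that osd
  [osd]
      debug osd = 20

  # or, while the daemon is still up, at runtime:
  ceph tell osd.15 injectargs '--debug-osd 20'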

> 
> Br,
> Tuomas
> 
> 
> -Original Message-
> From: Sage Weil [mailto:sw...@redhat.com] 
> Sent: 27. huhtikuuta 2015 20:45
> To: Tuomas Juntunen
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
> 
> Yeah, no snaps:
> 
> images:
> "snap_mode": "selfmanaged",
> "snap_seq": 0,
> "snap_epoch": 17882,
> "pool_snaps": [],
> "removed_snaps": "[]",
> 
> img:
> "snap_mode": "selfmanaged",
> "snap_seq": 0,
> "snap_epoch": 0,
> "pool_snaps": [],
> "removed_snaps": "[]",
> 
> ...and actually the log shows this happens on pool 2 (rbd), which has
> 
> "snap_mode": "selfmanaged",
> "snap_seq": 0,
> "snap_epoch": 0,
> "pool_snaps": [],
> "removed_snaps": "[]",
> 
> I'm guessin gthe offending code is
> 
> pi->build_removed_snaps(newly_removed_snaps);
> newly_removed_snaps.subtract(cached_removed_snaps);
> 
> so newly_removed_snaps should be empty, and apparently cached_removed_snaps
> is not?  Maybe one of your older osdmaps has snap info for rbd?  It doesn't
> make sense.  :/  Maybe
> 
>  ceph osd dump 18127 -f json-pretty
> 
> just to be certain?  I've pushed a branch 'wip-hammer-snaps' that will
> appear at gitbuilder.ceph.com in 20-30 minutes that will output some
> additional debug info.  It will be at
> 
>   
> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/wip-hammer-sanps
> 
> or similar, depending on your distro.  Can you install it one on node and
> start and osd with logging to reproduce the crash?
> 
> Thanks!
> sage
> 
> 
> On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> 
> > Hi
> > 
> > Here you go
> > 
> > Br,
> > Tuomas
> > 
> > 
> > 
> > -Original Message-
> > From: Sage Weil [mailto:sw...@redhat.com]
> > Sent: 27. huhtikuuta 2015 19:23
> > To: Tuomas Juntunen
> > Cc: 'Samuel Just'; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> > basic operations most of the OSD's went down
> > 
> > On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> > > Thanks for the info.
> > > 
> > > For my knowledge there was no snapshots on that pool, but cannot 
> > > verify that.
> > 
> > Can you attach a 'ceph osd dump -f json-pretty'?  That will shed a bit 
> > more light on what happened (and the simplest way to fix it).
> > 
> > sage
> > 
> > 
> > > Any way to make this work again? Removing the tier and other 
> > > settings didn't fix it, I tried it the second this happened.
> > > 
> > > Br,
> > > Tuomas
> > > 
> > > -Original Message-
> > > From: Samuel Just [mailto:sj...@redhat.com]
> > > Sent: 27. huhtikuuta 2015 15:50
> > > To: tuomas juntunen
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > > some basic operations most of the OSD's went down
> > > 
> > > So, the base tier is what determines the snapshots for the 
> > > cache/base pool
> > amalgam.  You added a populated pool complete with snapshots on top of 
> > a base tier without snapshots.  Apparently, it caused an existential 
> > crisis for the snapshot code.  That's one of the reasons why there is 
> > a --force-nonempty flag for that operation, I think.  I think the 
> > immediate answer is probably to disallow pools with snapshots as a 
> > cache tier altogether until we think of a good way to make it work.
> > > -Sam
> > > 
> > > - Original Message -
> > > From: "tuomas juntunen" 
> > > To: "Samuel Just" 
> > > Cc: ceph-users@lists.ceph.com
> > > Sent: Monday, April 27, 2015 4:56:58 AM
> > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > > some basic operations most of the OSD's went down
> > > 
> > > 
> > > 
> > > The following:
> > > 
> > > ceph osd tier add img images --force-nonempty ceph osd tier 
> > > cache-mode images forward ceph osd tier set-overlay img images
> > > 
> > > Idea was to make images as a tier to img, move data to img then 
> > > change
> > clients to use the new img pool.
> > > 
> > > Br,
> > > Tuomas
> > > 
> > > > Can you explain exactly what you mean by:
> > > >
> > > > "Also I created one pool for tier to be able to move data without
> > outage."
> > > >
> > > > -Sam
> > > > - Original Message -
> > > > From: "tuomas juntunen" 
> > > > To: "Ian Colle" 
> > > > Cc: ceph-users@lists.ceph.com
> > > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > > > some basic operations most of the OSD's went down
> > > >
> > > > Hi
> > > >
> > > > Any solution for this yet?
> > > >
> > > > Br,
> > > > Tuomas
> > > >

Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Tuomas Juntunen
Hi

Updated the logfile, same place http://beta.xaasbox.com/ceph/ceph-osd.15.log

Br,
Tuomas


-Original Message-
From: Sage Weil [mailto:sw...@redhat.com] 
Sent: 27. huhtikuuta 2015 22:22
To: Tuomas Juntunen
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Upgrade from Giant to Hammer and after some basic
operations most of the OSD's went down

On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> Hey
> 
> Got the log, you can get it from
> http://beta.xaasbox.com/ceph/ceph-osd.15.log

Can you repeat this with 'debug osd = 20'?  Thanks!

sage

> 
> Br,
> Tuomas
> 
> 
> -Original Message-
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: 27. huhtikuuta 2015 20:45
> To: Tuomas Juntunen
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> basic operations most of the OSD's went down
> 
> Yeah, no snaps:
> 
> images:
> "snap_mode": "selfmanaged",
> "snap_seq": 0,
> "snap_epoch": 17882,
> "pool_snaps": [],
> "removed_snaps": "[]",
> 
> img:
> "snap_mode": "selfmanaged",
> "snap_seq": 0,
> "snap_epoch": 0,
> "pool_snaps": [],
> "removed_snaps": "[]",
> 
> ...and actually the log shows this happens on pool 2 (rbd), which has
> 
> "snap_mode": "selfmanaged",
> "snap_seq": 0,
> "snap_epoch": 0,
> "pool_snaps": [],
> "removed_snaps": "[]",
> 
> I'm guessin gthe offending code is
> 
> pi->build_removed_snaps(newly_removed_snaps);
> newly_removed_snaps.subtract(cached_removed_snaps);
> 
> so newly_removed_snaps should be empty, and apparently 
> cached_removed_snaps is not?  Maybe one of your older osdmaps has snap 
> info for rbd?  It doesn't make sense.  :/  Maybe
> 
>  ceph osd dump 18127 -f json-pretty
> 
> just to be certain?  I've pushed a branch 'wip-hammer-snaps' that will 
> appear at gitbuilder.ceph.com in 20-30 minutes that will output some 
> additional debug info.  It will be at
> 
>   
> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/wip-hammer
> -sanps
> 
> or similar, depending on your distro.  Can you install it one on node 
> and start and osd with logging to reproduce the crash?
> 
> Thanks!
> sage
> 
> 
> On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> 
> > Hi
> > 
> > Here you go
> > 
> > Br,
> > Tuomas
> > 
> > 
> > 
> > -Original Message-
> > From: Sage Weil [mailto:sw...@redhat.com]
> > Sent: 27. huhtikuuta 2015 19:23
> > To: Tuomas Juntunen
> > Cc: 'Samuel Just'; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > some basic operations most of the OSD's went down
> > 
> > On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> > > Thanks for the info.
> > > 
> > > For my knowledge there was no snapshots on that pool, but cannot 
> > > verify that.
> > 
> > Can you attach a 'ceph osd dump -f json-pretty'?  That will shed a 
> > bit more light on what happened (and the simplest way to fix it).
> > 
> > sage
> > 
> > 
> > > Any way to make this work again? Removing the tier and other 
> > > settings didn't fix it, I tried it the second this happened.
> > > 
> > > Br,
> > > Tuomas
> > > 
> > > -Original Message-
> > > From: Samuel Just [mailto:sj...@redhat.com]
> > > Sent: 27. huhtikuuta 2015 15:50
> > > To: tuomas juntunen
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > > some basic operations most of the OSD's went down
> > > 
> > > So, the base tier is what determines the snapshots for the 
> > > cache/base pool
> > amalgam.  You added a populated pool complete with snapshots on top 
> > of a base tier without snapshots.  Apparently, it caused an 
> > existential crisis for the snapshot code.  That's one of the reasons 
> > why there is a --force-nonempty flag for that operation, I think.  I 
> > think the immediate answer is probably to disallow pools with 
> > snapshots as a cache tier altogether until we think of a good way to
make it work.
> > > -Sam
> > > 
> > > - Original Message -
> > > From: "tuomas juntunen" 
> > > To: "Samuel Just" 
> > > Cc: ceph-users@lists.ceph.com
> > > Sent: Monday, April 27, 2015 4:56:58 AM
> > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > > some basic operations most of the OSD's went down
> > > 
> > > 
> > > 
> > > The following:
> > > 
> > > ceph osd tier add img images --force-nonempty ceph osd tier 
> > > cache-mode images forward ceph osd tier set-overlay img images
> > > 
> > > Idea was to make images as a tier to img, move data to img then 
> > > change
> > clients to use the new img pool.
> > > 
> > > Br,
> > > Tuomas
> > > 
> > > > Can you explain exactly what you mean by:
> > > >
> > > > "Also I created one pool for tier to be able to move data 
> > > > without
> > outage."
> > > >
> > > > -Sam
> > > > - Original Message -
>

Re: [ceph-users] Ceph Radosgw multi zone data replication failure

2015-04-27 Thread Craig Lewis
> [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-east-1

> [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-west-1

Are you trying to set up two zones on one cluster?  That's possible, but
you'll also want to spend some time on your CRUSH map making sure that the
two zones are as independent as possible (no shared disks, etc.).

Are you using Civetweb or Apache + FastCGI?

Can you include the output (from both clusters):
radosgw-admin --name=client.radosgw.us-east-1 region get
radosgw-admin --name=client.radosgw.us-east-1 zone get

Double check that both system users exist in both clusters, with the same
secret.
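
(A quick way to do that check, sketched with a placeholder uid since the actual
sync user name is not shown in this thread:)

  radosgw-admin --name=client.radosgw.us-east-1 user info --uid=us-sync-user
  radosgw-admin --name=client.radosgw.us-west-1 user info --uid=us-sync-user
  # compare the access_key/secret_key pairs and confirm both users exist as
  # system users on both sides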




On Sun, Apr 26, 2015 at 8:01 AM, Vickey Singh 
wrote:

> Hello Geeks
>
>
> I am trying to setup Ceph Radosgw multi site data replication using
> official documentation
> http://ceph.com/docs/master/radosgw/federated-config/#multi-site-data-replication
>
>
> Everything seems to work except radosgw-agent sync , Request you to please
> check the below outputs and help me in any possible way.
>
>
> *Environment : *
>
>
> CentOS 7.0.1406
>
> Ceph Versino 0.87.1
>
> Rados Gateway configured using Civetweb
>
>
>
> *Radosgw zone list : Works nicely *
>
>
> [root@us-east-1 ceph]# radosgw-admin zone list --name
> client.radosgw.us-east-1
>
> { "zones": [
>
> "us-west",
>
> "us-east"]}
>
> [root@us-east-1 ceph]#
>
>
> *Curl request to master zone : Works nicely *
>
>
> [root@us-east-1 ceph]# curl http://us-east-1.crosslogic.com:7480
>
> http://s3.amazonaws.com/doc/2006-03-01/
> ">anonymous
>
> [root@us-east-1 ceph]#
>
>
> *Curl request to secondary zone : Works nicely *
>
>
> [root@us-east-1 ceph]# curl http://us-west-1.crosslogic.com:7480
>
> http://s3.amazonaws.com/doc/2006-03-01/
> ">anonymous
>
> [root@us-east-1 ceph]#
>
>
> *Rados Gateway agent configuration file : Seems correct, no TYPO errors*
>
>
> [root@us-east-1 ceph]# cat cluster-data-sync.conf
>
> src_access_key: M7QAKDH8CYGTK86CG93U
>
> src_secret_key: 0xQR6PINk23W\/GYrWJ14aF+1stG56M6xMkqkdloO
>
> destination: http://us-west-1.crosslogic.com:7480
>
> dest_access_key: ZQ32ES1WAWPG05YMZ7T7
>
> dest_secret_key: INvk8AkrZRsejLEL34yRpMLmOqydt8ncOXy4RHCM
>
> log_file: /var/log/radosgw/radosgw-sync-us-east-west.log
>
> [root@us-east-1 ceph]#
>
>
> *Rados Gateway agent SYNC : Fails , however it can fetch region map so i
> think src and dest KEYS are correct. But don't know why it fails on
> AttributeError *
>
>
>
> *[root@us-east-1 ceph]# radosgw-agent -c cluster-data-sync.conf*
>
> *region map is: {u'us': [u'us-west', u'us-east']}*
>
> *Traceback (most recent call last):*
>
> *  File "/usr/bin/radosgw-agent", line 21, in *
>
> *sys.exit(main())*
>
> *  File "/usr/lib/python2.7/site-packages/radosgw_agent/cli.py", line 275,
> in main*
>
> *except client.ClientException as e:*
>
> *AttributeError: 'module' object has no attribute 'ClientException'*
>
> *[root@us-east-1 ceph]#*
>
>
> *Can query to Ceph cluster using us-east-1 ID*
>
>
> [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-east-1
>
> cluster 9609b429-eee2-4e23-af31-28a24fcf5cbc
>
>  health HEALTH_OK
>
>  monmap e3: 3 mons at {ceph-node1=
> 192.168.1.101:6789/0,ceph-node2=192.168.1.102:6789/0,ceph-node3=192.168.1.103:6789/0},
> election epoch 448, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
>
>  osdmap e1063: 9 osds: 9 up, 9 in
>
>   pgmap v8473: 1500 pgs, 43 pools, 374 MB data, 2852 objects
>
> 1193 MB used, 133 GB / 134 GB avail
>
> 1500 active+clean
>
> [root@us-east-1 ceph]#
>
>
> *Can query to Ceph cluster using us-west-1 ID*
>
>
> [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-west-1
>
> cluster 9609b429-eee2-4e23-af31-28a24fcf5cbc
>
>  health HEALTH_OK
>
>  monmap e3: 3 mons at {ceph-node1=
> 192.168.1.101:6789/0,ceph-node2=192.168.1.102:6789/0,ceph-node3=192.168.1.103:6789/0},
> election epoch 448, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
>
>  osdmap e1063: 9 osds: 9 up, 9 in
>
>   pgmap v8473: 1500 pgs, 43 pools, 374 MB data, 2852 objects
>
> 1193 MB used, 133 GB / 134 GB avail
>
> 1500 active+clean
>
> [root@us-east-1 ceph]#
>
>
> *Hope these packages are correct*
>
>
> [root@us-east-1 ceph]# rpm -qa | egrep -i "ceph|radosgw"
>
> libcephfs1-0.87.1-0.el7.centos.x86_64
>
> ceph-common-0.87.1-0.el7.centos.x86_64
>
> python-ceph-0.87.1-0.el7.centos.x86_64
>
> ceph-radosgw-0.87.1-0.el7.centos.x86_64
>
> ceph-release-1-0.el7.noarch
>
> ceph-0.87.1-0.el7.centos.x86_64
>
> radosgw-agent-1.2.1-0.el7.centos.noarch
>
> [root@us-east-1 ceph]#
>
>
>
> Regards
>
> VS
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Radosgw multi zone data replication failure

2015-04-27 Thread Alfredo Deza
Hi Vickey (and all)

It looks like this issue was introduced as part of the 1.2.1 release.

I just finished getting 1.2.2 out (try upgrading please). You should no longer 
see that
error.

Hope that helps!

-Alfredo
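
(On the CentOS 7 setup shown below, and assuming the repo that provided 1.2.1 now 
carries the new build, the upgrade is roughly:)

  yum clean metadata
  yum update radosgw-agent
  rpm -q radosgw-agent    # should now report 1.2.2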

- Original Message -
From: "Craig Lewis" 
To: "Vickey Singh" 
Cc: ceph-users@lists.ceph.com
Sent: Monday, April 27, 2015 4:23:52 PM
Subject: Re: [ceph-users] Ceph Radosgw multi zone data replication failure

> [root@us-east-1 ceph] # ceph -s --name client.radosgw.us-east-1 


> [root@us-east-1 ceph]# ceph -s --name client.radosgw.us-west-1 

Are you trying to setup two zones on one cluster? That's possible, but you'll 
also want to spend some time on your CRUSH map making sure that the two zones 
are as independent as possible (no shared disks, etc). 

Are you using Civetweb or Apache + FastCGI? 

Can you include the output (from both clusters): 
radosgw-admin --name=client.radosgw.us-east-1 region get 
radosgw-admin --name=client.radosgw.us-east-1 zone get 

Double check that both system users exist in both clusters, with the same 
secret. 




On Sun, Apr 26, 2015 at 8:01 AM, Vickey Singh < vickey.singh22...@gmail.com > 
wrote: 





Hello Geeks 




I am trying to setup Ceph Radosgw multi site data replication using official 
documentation 
http://ceph.com/docs/master/radosgw/federated-config/#multi-site-data-replication
 




Everything seems to work except radosgw-agent sync , Request you to please 
check the below outputs and help me in any possible way. 




Environment : 




CentOS 7.0.1406 

Ceph Versino 0.87.1 

Rados Gateway configured using Civetweb 







Radosgw zone list : Works nicely 





[root@us-east-1 ceph]# radosgw-admin zone list --name client.radosgw.us-east-1 

{ "zones": [ 

"us-west", 

"us-east"]} 

[root@us-east-1 ceph]# 




Curl request to master zone : Works nicely 





[root@us-east-1 ceph]# curl http://us-east-1.crosslogic.com:7480 

http://s3.amazonaws.com/doc/2006-03-01/ 
">anonymous
 

[root@us-east-1 ceph]# 




Curl request to secondary zone : Works nicely 




[root@us-east-1 ceph]# curl http://us-west-1.crosslogic.com:7480 

http://s3.amazonaws.com/doc/2006-03-01/ 
">anonymous
 

[root@us-east-1 ceph]# 




Rados Gateway agent configuration file : Seems correct, no TYPO errors 





[root@us-east-1 ceph] # cat cluster-data-sync.conf 

src_access_key: M7QAKDH8CYGTK86CG93U 

src_secret_key: 0xQR6PINk23W\/GYrWJ14aF+1stG56M6xMkqkdloO 

destination: http://us-west-1.crosslogic.com:7480 

dest_access_key: ZQ32ES1WAWPG05YMZ7T7 

dest_secret_key: INvk8AkrZRsejLEL34yRpMLmOqydt8ncOXy4RHCM 

log_file: /var/log/radosgw/radosgw-sync-us-east-west.log 

[root@us-east-1 ceph]# 




Rados Gateway agent SYNC : Fails , however it can fetch region map so i think 
src and dest KEYS are correct. But don't know why it fails on AttributeError 





[root@us-east-1 ceph]# radosgw-agent -c cluster-data-sync.conf 


region map is: {u'us': [u'us-west', u'us-east']} 

Traceback (most recent call last): 

File "/usr/bin/radosgw-agent", line 21, in  

sys.exit(main()) 

File "/usr/lib/python2.7/site-packages/radosgw_agent/cli.py", line 275, in main 

except client.ClientException as e: 

AttributeError: 'module' object has no attribute 'ClientException' 

[root@us-east-1 ceph]# 




Can query to Ceph cluster using us-east-1 ID 




[root@us-east-1 ceph] # ceph -s --name client.radosgw.us-east-1 

cluster 9609b429-eee2-4e23-af31-28a24fcf5cbc 

health HEALTH_OK 

monmap e3: 3 mons at {ceph-node1= 
192.168.1.101:6789/0,ceph-node2=192.168.1.102:6789/0,ceph-node3=192.168.1.103:6789/0
 }, election epoch 448, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3 

osdmap e1063: 9 osds: 9 up, 9 in 

pgmap v8473: 1500 pgs, 43 pools, 374 MB data, 2852 objects 

1193 MB used, 133 GB / 134 GB avail 

1500 active+clean 

[root@us-east-1 ceph]# 




Can query to Ceph cluster using us-west-1 ID 





[root@us-east-1 ceph]# ceph -s --name client.radosgw.us-west-1 

cluster 9609b429-eee2-4e23-af31-28a24fcf5cbc 

health HEALTH_OK 

monmap e3: 3 mons at {ceph-node1= 
192.168.1.101:6789/0,ceph-node2=192.168.1.102:6789/0,ceph-node3=192.168.1.103:6789/0
 }, election epoch 448, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3 

osdmap e1063: 9 osds: 9 up, 9 in 

pgmap v8473: 1500 pgs, 43 pools, 374 MB data, 2852 objects 

1193 MB used, 133 GB / 134 GB avail 

1500 active+clean 

[root@us-east-1 ceph]# 




Hope these packages are correct 





[root@us-east-1 ceph]# rpm -qa | egrep -i "ceph|radosgw" 

libcephfs1-0.87.1-0.el7.centos.x86_64 

ceph-common-0.87.1-0.el7.centos.x86_64 

python-ceph-0.87.1-0.el7.centos.x86_64 

ceph-radosgw-0.87.1-0.el7.centos.x86_64 

ceph-release-1-0.el7.noarch 

ceph-0.87.1-0.el7.centos.x86_64 

radosgw-agent-1.2.1-0.el7.centos.noarch 

[root@us-east-1 ceph]# 







Regards 

VS 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Shadow Files

2015-04-27 Thread Ben

How long are you thinking here?

We added more storage to our cluster to overcome these issues, and we 
can't keep throwing storage at it until the issues are fixed.
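
(For reference, the manual check described further down this thread boils down to 
something like the following; bucket, object and prefix are placeholders:)

  # get the object's manifest prefix
  radosgw-admin object stat --bucket=mybucket --object=myobject | grep prefix

  # count the shadow/stripe objects in the data pool that carry that prefix
  rados ls -p .rgw.buckets | grep <prefix> | wc -l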


On 28/04/15 01:49, Yehuda Sadeh-Weinraub wrote:

It will get to the ceph mainline eventually. We're still reviewing and testing 
the fix, and there's more work to be done on the cleanup tool.

Yehuda

- Original Message -

From: "Ben" 
To: "Yehuda Sadeh-Weinraub" 
Cc: "ceph-users" 
Sent: Sunday, April 26, 2015 11:02:23 PM
Subject: Re: [ceph-users] Shadow Files

Are these fixes going to make it into the repository versions of ceph,
or will we be required to compile and install manually?

On 2015-04-26 02:29, Yehuda Sadeh-Weinraub wrote:

Yeah, that's definitely something that we'd address soon.

Yehuda

- Original Message -

From: "Ben" 
To: "Ben Hines" , "Yehuda Sadeh-Weinraub"

Cc: "ceph-users" 
Sent: Friday, April 24, 2015 5:14:11 PM
Subject: Re: [ceph-users] Shadow Files

Definitely need something to help clear out these old shadow files.

I'm sure our cluster has around 100TB of these shadow files.

I've written a script to go through known objects to get prefixes of
objects
that should exist to compare to ones that shouldn't, but the time it
takes
to do this over millions and millions of objects is just too long.

On 25/04/15 09:53, Ben Hines wrote:



When these are fixed it would be great to get good steps for listing /
cleaning up any orphaned objects. I have suspicions this is affecting
us.

thanks-

-Ben

On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub <
yeh...@redhat.com >
wrote:


These ones:

http://tracker.ceph.com/issues/10295
http://tracker.ceph.com/issues/11447

- Original Message -

From: "Ben Jackson" 
To: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
Cc: "ceph-users" < ceph-us...@ceph.com >
Sent: Friday, April 24, 2015 3:06:02 PM
Subject: Re: [ceph-users] Shadow Files

We were firefly, then we upgraded to giant, now we are on hammer.

What issues?

On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
wrote:

What version are you running? There are two different issues that we
were
fixing this week, and we should have that upstream pretty soon.

Yehuda

- Original Message -

From: "Ben" 
To: "ceph-users" < ceph-us...@ceph.com >
Cc: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
Sent: Thursday, April 23, 2015 7:42:06 PM
Subject: [ceph-users] Shadow Files

We are still experiencing a problem with out gateway not properly
clearing out shadow files.

I have done numerous tests where I have:
-Uploaded a file of 1.5GB in size using s3browser application
-Done an object stat on the file to get its prefix
-Done rados ls -p .rgw.buckets | grep  to count the number
of
shadow files associated (in this case it is around 290 shadow files)
-Deleted said file with s3browser
-Performed a gc list, which shows the ~290 files listed
-Waited 24 hours to redo the rados ls -p .rgw.buckets | grep

to
recount the shadow files only to be left with 290 files still there

 From log output /var/log/ceph/radosgw.log, I can see the following
when
clicking DELETE (this appears 290 times)
2015-04-24 10:43:29.996523 7f0b0afb5700 0
RGWObjManifest::operator++():
result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996557 7f0b0afb5700 0
RGWObjManifest::operator++():
result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996564 7f0b0afb5700 0
RGWObjManifest::operator++():
result: ofs=13107200 stripe_ofs=13107200 part_ofs=0
rule->part_size=0
2015-04-24 10:43:29.996570 7f0b0afb5700 0
RGWObjManifest::operator++():
result: ofs=17301504 stripe_ofs=17301504 part_ofs=0
rule->part_size=0
2015-04-24 10:43:29.996576 7f0b0afb5700 0
RGWObjManifest::operator++():
result: ofs=21495808 stripe_ofs=21495808 part_ofs=0
rule->part_size=0
2015-04-24 10:43:29.996581 7f0b0afb5700 0
RGWObjManifest::operator++():
result: ofs=25690112 stripe_ofs=25690112 part_ofs=0
rule->part_size=0
2015-04-24 10:43:29.996586 7f0b0afb5700 0
RGWObjManifest::operator++():
result: ofs=29884416 stripe_ofs=29884416 part_ofs=0
rule->part_size=0
2015-04-24 10:43:29.996592 7f0b0afb5700 0
RGWObjManifest::operator++():
result: ofs=34078720 stripe_ofs=34078720 part_ofs=0
rule->part_size=0

In this same log, I also see the gc process saying it is removing
said
file (these records appear 290 times too)
2015-04-23 14:16:27.926952 7f15be0ee700 0 gc::process: removing
.rgw.buckets:
2015-04-23 14:16:27.928572 7f15be0ee700 0 gc::process: removing
.rgw.buckets:
2015-04-23 14:16:27.929636 7f15be0ee700 0 gc::process: removing
.rgw.buckets:
2015-04-23 14:16:27.930448 7f15be0ee700 0 gc::process: removing
.rgw.buckets:
2015-04-23 14:16:27.931226 7f15be0ee700 0 gc::process: removing
.rgw.buckets:
2015-04-23 14:16:27.932103 7f15be0ee700 0 gc::process: removing
.rgw.buckets:
2015-04-23 14:16:27.933470 7f15be0ee700 0 gc::process: removing
.rgw.buckets:

So even though it appears that the GC is processing its

Re: [ceph-users] Ceph Radosgw multi zone data replication failure

2015-04-27 Thread Vickey Singh
Hello Alfredo / Craig

First of all, thank you so much for replying and giving your time to this
problem.

@Alfredo: I tried radosgw-agent version 1.2.2 and things have progressed a
lot (below are some of the logs).


I am now getting

*2015-04-28 00:35:14,781 5132 [radosgw_agent][INFO  ]
http://us-east-1.crosslogic.com:7480 
endpoint does not support versioning*

*2015-04-28 00:35:14,781 5132 [radosgw_agent][WARNIN] encountered issues
reaching to endpoint http://us-east-1.crosslogic.com:7480
*

*2015-04-28 00:35:14,782 5132 [radosgw_agent][WARNIN] HTTP Error 403:
Forbidden*

I am using Civetweb; any further help with this would be really appreciated.
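
(One thing worth cross-checking when the agent gets a 403 at this stage, using the 
commands Craig already suggested: make sure the endpoints in the region/zone maps 
match the URLs in cluster-data-sync.conf, and that the access/secret keys in that 
file belong to users created with --system in both zones.)

  radosgw-admin --name=client.radosgw.us-east-1 region get
  radosgw-admin --name=client.radosgw.us-east-1 zone get
  radosgw-admin --name=client.radosgw.us-west-1 zone get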


[root@us-east-1 ceph]#

[root@us-east-1 ceph]# radosgw-agent -c cluster-data-sync.conf

2015-04-28 00:35:14,750 5132 [radosgw_agent][INFO  ]  ____
  __   ___  ___

2015-04-28 00:35:14,750 5132 [radosgw_agent][INFO  ] /__` \ / |\ | /  `
/\  / _` |__  |\ |  |

2015-04-28 00:35:14,751 5132 [radosgw_agent][INFO  ] .__/  |  | \| \__,
/~~\ \__> |___ | \|  |

2015-04-28 00:35:14,751 5132 [radosgw_agent][INFO  ]
  v1.2.2

2015-04-28 00:35:14,751 5132 [radosgw_agent][INFO  ] agent options:

2015-04-28 00:35:14,752 5132 [radosgw_agent][INFO  ]  args:

2015-04-28 00:35:14,753 5132 [radosgw_agent][INFO  ]conf
  : None

2015-04-28 00:35:14,753 5132 [radosgw_agent][INFO  ]dest_access_key
  : 

2015-04-28 00:35:14,753 5132 [radosgw_agent][INFO  ]dest_secret_key
  : 

2015-04-28 00:35:14,753 5132 [radosgw_agent][INFO  ]destination
  : http://us-west-1.crosslogic.com:7480

2015-04-28 00:35:14,753 5132 [radosgw_agent][INFO  ]
incremental_sync_delay: 30

2015-04-28 00:35:14,754 5132 [radosgw_agent][INFO  ]lock_timeout
  : 60

2015-04-28 00:35:14,754 5132 [radosgw_agent][INFO  ]log_file
  : /var/log/radosgw/radosgw-sync-us-east-west.log

2015-04-28 00:35:14,756 5132 [radosgw_agent][INFO  ]log_lock_time
  : 20

2015-04-28 00:35:14,756 5132 [radosgw_agent][INFO  ]max_entries
  : 1000

2015-04-28 00:35:14,757 5132 [radosgw_agent][INFO  ]metadata_only
  : False

2015-04-28 00:35:14,757 5132 [radosgw_agent][INFO  ]num_workers
  : 1

2015-04-28 00:35:14,758 5132 [radosgw_agent][INFO  ]object_sync_timeout
  : 216000

2015-04-28 00:35:14,758 5132 [radosgw_agent][INFO  ]prepare_error_delay
  : 10

2015-04-28 00:35:14,758 5132 [radosgw_agent][INFO  ]quiet
  : False

2015-04-28 00:35:14,758 5132 [radosgw_agent][INFO  ]rgw_data_log_window
  : 30

2015-04-28 00:35:14,759 5132 [radosgw_agent][INFO  ]source
  : None

2015-04-28 00:35:14,759 5132 [radosgw_agent][INFO  ]src_access_key
  : 

2015-04-28 00:35:14,759 5132 [radosgw_agent][INFO  ]src_secret_key
  : 

2015-04-28 00:35:14,759 5132 [radosgw_agent][INFO  ]src_zone
  : None

2015-04-28 00:35:14,759 5132 [radosgw_agent][INFO  ]sync_scope
  : incremental

2015-04-28 00:35:14,760 5132 [radosgw_agent][INFO  ]test_server_host
  : None

2015-04-28 00:35:14,760 5132 [radosgw_agent][INFO  ]test_server_port
  : 8080

2015-04-28 00:35:14,761 5132 [radosgw_agent][INFO  ]verbose
  : False

2015-04-28 00:35:14,761 5132 [radosgw_agent][INFO  ]versioned
  : False

2015-04-28 00:35:14,761 5132 [radosgw_agent.client][INFO  ] creating
connection to endpoint: http://us-west-1.crosslogic.com:7480

region map is: {u'us': [u'us-west', u'us-east']}

*2015-04-28 00:35:14,781 5132 [radosgw_agent][INFO  ]
http://us-east-1.crosslogic.com:7480 
endpoint does not support versioning*

*2015-04-28 00:35:14,781 5132 [radosgw_agent][WARNIN] encountered issues
reaching to endpoint http://us-east-1.crosslogic.com:7480
*

*2015-04-28 00:35:14,782 5132 [radosgw_agent][WARNIN] HTTP Error 403:
Forbidden*

2015-04-28 00:35:14,782 5132 [radosgw_agent.client][INFO  ] creating
connection to endpoint: http://us-east-1.crosslogic.com:7480

2015-04-28 00:35:14,784 5132 [radosgw_agent.client][INFO  ] creating
connection to endpoint: http://us-west-1.crosslogic.com:7480

2015-04-28 00:35:14,785 5132 [radosgw_agent.client][INFO  ] creating
connection to endpoint: http://us-east-1.crosslogic.com:7480

2015-04-28 00:35:14,787 5132 [radosgw_agent.client][INFO  ] creating
connection to endpoint: http://us-west-1.crosslogic.com:7480

*2015-04-28 00:35:14,807 5132 [radosgw_agent.sync][ERROR ] finding number
of shards failed*

2015-04-28 00:35:14,807 5132 [radosgw_agent.sync][WARNIN] error preparing
for sync, will retry. Traceback:

Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/radosgw_agent/sync.py", line 30,
in prepare_sync


[ceph-users] v0.87.2 released

2015-04-27 Thread Sage Weil
This is the second (and possibly final) point release for Giant.

We recommend all v0.87.x Giant users upgrade to this release.

Notable Changes
---

* ceph-objectstore-tool: only output unsupported features when 
  incompatible (#11176 David Zafman)
* common: do not implicitly unlock rwlock on destruction (Federico 
  Simoncelli)
* common: make wait timeout on empty queue configurable (#10818 Samuel 
  Just)
* crush: pick ruleset id that matches and rule id (Xiaoxi Chen)
* crush: set_choose_tries = 100 for new erasure code rulesets (#10353 Loic 
  Dachary)
* librados: check initialized atomic safely (#9617 Josh Durgin)
* librados: fix failed tick_event assert (#11183 Zhiqiang Wang)
* librados: fix looping on skipped maps (#9986 Ding Dinghua)
* librados: fix op submit with timeout (#10340 Samuel Just)
* librados: pybind: fix memory leak (#10723 Billy Olsen)
* librados: pybind: keep reference to callbacks (#10775 Josh Durgin)
* librados: translate operation flags from C APIs (Matthew Richards)
* libradosstriper: fix write_full on ENOENT (#10758 Sebastien Ponce)
* libradosstriper: use strtoll instead of strtol (Dongmao Zhang)
* mds: fix assertion caused by system time moving backwards (#11053 Yan, 
  Zheng)
* mon: allow injection of random delays on writes (Joao Eduardo Luis)
* mon: do not trust small osd epoch cache values (#10787 Sage Weil)
* mon: fail non-blocking flush if object is being scrubbed (#8011 Samuel 
  Just)
* mon: fix division by zero in stats dump (Joao Eduardo Luis)
* mon: fix get_rule_avail when no osds (#10257 Joao Eduardo Luis)
* mon: fix timeout rounds period (#10546 Joao Eduardo Luis)
* mon: ignore osd failures before up_from (#10762 Dan van der Ster, Sage 
  Weil)
* mon: paxos: reset accept timeout before writing to store (#10220 Joao 
  Eduardo Luis)
* mon: return if fs exists on 'fs new' (Joao Eduardo Luis)
* mon: use EntityName when expanding profiles (#10844 Joao Eduardo Luis)
* mon: verify cross-service proposal preconditions (#10643 Joao Eduardo 
  Luis)
* mon: wait for osdmon to be writeable when requesting proposal (#9794 
  Joao Eduardo Luis)
* mount.ceph: avoid spurious error message about /etc/mtab (#10351 Yan, 
  Zheng)
* msg/simple: allow RESETSESSION when we forget an endpoint (#10080 Greg 
  Farnum)
* msg/simple: discard delay queue before incoming queue (#9910 Sage Weil)
* osd: clear_primary_state when leaving Primary (#10059 Samuel Just)
* osd: do not ignore deleted pgs on startup (#10617 Sage Weil)
* osd: fix FileJournal wrap to get header out first (#10883 David Zafman)
* osd: fix PG leak in SnapTrimWQ (#10421 Kefu Chai)
* osd: fix journalq population in do_read_entry (#6003 Samuel Just)
* osd: fix operator== for op_queue_age_hit and fs_perf_stat (#10259 Samuel 
  Just)
* osd: fix rare assert after split (#10430 David Zafman)
* osd: get pgid ancestor from last_map when building past intervals 
  (#10430 David Zafman)
* osd: include rollback_info_trimmed_to in {read,write}_log (#10157 Samuel 
  Just)
* osd: lock header_lock in DBObjectMap::sync (#9891 Samuel Just)
* osd: requeue blocked op before flush it was blocked on (#10512 Sage 
  Weil)
* osd: tolerate missing object between list and attr get on backfill 
  (#10150 Samuel Just)
* osd: use correct atime for eviction decision (Xinze Chi)
* rgw: flush XML header on get ACL request (#10106 Yehuda Sadeh)
* rgw: index swift keys appropriately (#10471 Hemant Bruman, Yehuda Sadeh)
* rgw: send cancel for bucket index pending ops (#10770 Baijiaruo, Yehuda 
  Sadeh)
* rgw: swift: support X_Remove_Container-Meta-{key} (#01475 Dmytro 
  Iurchenko)

For more detailed information, see

  http://ceph.com/docs/master/_downloads/v0.87.2.txt

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.87.2.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
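
As a rough sketch, upgrading a Debian/Ubuntu node that pulls packages from the
ceph.com repository looks something like this (adjust for your distro and init
system, and restart monitors before OSDs):

  sudo apt-get update && sudo apt-get install -y ceph ceph-common
  ceph --version               # should now report 0.87.2
  sudo restart ceph-mon-all    # upstart; once the mons are healthy again:
  sudo restart ceph-osd-all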

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IOWait on SATA-backed with SSD-journals

2015-04-27 Thread Gregory Farnum
On Sat, Apr 25, 2015 at 11:36 PM, Josef Johansson  wrote:
> Hi,
>
> With inspiration from all the other performance threads going on here, I 
> started to investigate on my own as well.
>
> I'm seeing a lot of iowait on the OSD, and the journal utilised at 2-7%, with 
> about 8-30MB/s (mostly around 8MB/s write). This is a dumpling cluster. The 
> goal here is to increase the utilisation to maybe 50%.

I'm confused. You've got regular hard drives backing your system, so
in the long run you aren't going to be able to do much better than
those hard drives can do. The SSDs are much faster, so of course
they're not getting a load that counts as heavy for them. None of the
tuning you discuss below is going to do much except perhaps give the
steady state a longer startup time.
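
If you want to double-check that it's the spinners and not the journals or the
network, watching per-device utilisation on an OSD host is usually enough; a
rough sketch, using osd.0 as an example:

  # %util pinned near 100 on the HGST drives while the S3700 partitions stay
  # low means the backing disks are the limit
  iostat -x 5
  # outstanding journal/filestore work for a single OSD via the admin socket
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump | less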
-Greg

>
> Journals: Intel DC S3700, OSD: HGST 4TB
>
> I did some initial testing to make the wbthrottle have more in the buffer, 
> and I think I managed to do it, didn’t affect the journal utilisation though.
>
> There are 12 cores for the 10 OSDs per machine to utilise, and they use about
> 20% of them, so I guess there is no bottleneck there.
>
> Well, that's the problem: I really can't see any bottleneck with the current
> layout; maybe it's our copper 10GbE that's giving us too much latency?
>
> It would be nice to have some kind of bottleneck troubleshooting guide in the ceph docs :)
> I'm guessing I'm not the only one on these kinds of specs, and it would be
> interesting to see if there's optimisation to be done.
>
> Hope you guys have a nice weekend :)
>
> Cheers,
> Josef
>
> Ping from a host to OSD:
>
> 6 packets transmitted, 6 received, 0% packet loss, time 4998ms
> rtt min/avg/max/mdev = 0.063/0.107/0.193/0.048 ms
>
> Settings on the OSD:
>
> { "filestore_wbthrottle_xfs_ios_start_flusher": "5000"}
> { "filestore_wbthrottle_xfs_inodes_start_flusher": "5000"}
> { "filestore_wbthrottle_xfs_ios_hard_limit": "1"}
> { "filestore_wbthrottle_xfs_inodes_hard_limit": "1"}
> { "filestore_max_sync_interval": "30"}
>
> Changed from the defaults:
>
> { "filestore_wbthrottle_xfs_ios_start_flusher": "500"}
> { "filestore_wbthrottle_xfs_inodes_start_flusher": "500"}
> { "filestore_wbthrottle_xfs_ios_hard_limit": "5000"}
> { "filestore_wbthrottle_xfs_inodes_hard_limit": "5000"}
> { "filestore_max_sync_interval": "5"}
>
>
> a single dump_historic_ops
>
> { "description": "osd_op(client.47765822.0:99270434 
> rbd_data.1da982c2eb141f2.5825 [stat,write 2093056~8192] 
> 3.8130048c e19290)",
>   "rmw_flags": 6,
>   "received_at": "2015-04-26 08:24:03.226255",
>   "age": "87.026653",
>   "duration": "0.801927",
>   "flag_point": "commit sent; apply or cleanup",
>   "client_info": { "client": "client.47765822",
>   "tid": 99270434},
>   "events": [
> { "time": "2015-04-26 08:24:03.226329",
>   "event": "waiting_for_osdmap"},
> { "time": "2015-04-26 08:24:03.230921",
>   "event": "reached_pg"},
> { "time": "2015-04-26 08:24:03.230928",
>   "event": "started"},
> { "time": "2015-04-26 08:24:03.230931",
>   "event": "started"},
> { "time": "2015-04-26 08:24:03.231791",
>   "event": "waiting for subops from [22,48]"},
> { "time": "2015-04-26 08:24:03.231813",
>   "event": "commit_queued_for_journal_write"},
> { "time": "2015-04-26 08:24:03.231849",
>   "event": "write_thread_in_journal_buffer"},
> { "time": "2015-04-26 08:24:03.232075",
>   "event": "journaled_completion_queued"},
> { "time": "2015-04-26 08:24:03.232492",
>   "event": "op_commit"},
> { "time": "2015-04-26 08:24:03.233134",
>   "event": "sub_op_commit_rec"},
> { "time": "2015-04-26 08:24:03.233183",
>   "event": "op_applied"},
> { "time": "2015-04-26 08:24:04.028167",
>   "event": "sub_op_commit_rec"},
> { "time": "2015-04-26 08:24:04.028174",
>   "event": "commit_sent"},
> { "time": "2015-04-26 08:24:04.028182",
>   "event": "done"}]},
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] about rgw region and zone

2015-04-27 Thread TERRY
Hi all,

When I was following the "Configuring Federated Gateways" guide, I got the
error below:
 sudo radosgw-agent  -c  /etc/ceph/ceph-data-sync.conf  
ERROR:root:Could not retrieve region map from destination
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/radosgw_agent/cli.py", line 269, in 
main
region_map = client.get_region_map(dest_conn)
  File "/usr/lib/python2.6/site-packages/radosgw_agent/client.py", line 391, in 
get_region_map
region_map = request(connection, 'get', 'admin/config')
  File "/usr/lib/python2.6/site-packages/radosgw_agent/client.py", line 155, in 
request
check_result_status(result)
  File "/usr/lib/python2.6/site-packages/radosgw_agent/client.py", line 116, in 
check_result_status
HttpError)(result.status_code, result.content)
NotFound: Http error code 404 content {"Code":"NoSuchKey"}
  
  
I have some questions about the commands I execute.

1) radosgw-admin zone set --rgw-zone=us-west --infile us-west.json --name 
client.radosgw.us-west-1
I have no idea about the --name option; what is the difference if I run it
without --name?
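
For context, my current understanding (please correct me if I am wrong) is
that --name only selects which [client.*] section of ceph.conf, and therefore
which keyring and rgw zone/region settings, radosgw-admin runs with, e.g.:

  # acts as client.radosgw.us-west-1: uses that section's keyring and its
  # rgw zone/region options
  radosgw-admin zone get --rgw-zone=us-west --name client.radosgw.us-west-1
  # without --name it runs as client.admin with the default rgw settings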
  
2) Create a Region
There is a note near the end of the doc:
"If you use different Ceph Storage Cluster instances for regions, you should 
repeat steps 2, 4 and 5 by executing them with --name client.radosgw-us-west-1. 
You may also export the region map from the initial gateway instance and import 
it, followed by updating the region map."
  
I have one cluster named ceph, one region named us, and two zones: us-east and
us-west; us-east is the master zone. I have two gateway instances:
client.radosgw.us-east-1 and client.radosgw.us-west-1. Do I need to repeat
steps 2, 4 and 5? Do I need to export the region map from the initial gateway
instance and import it?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

2015-04-27 Thread Tuomas Juntunen
Just to add some more interesting behavior to my problem: the monitors
are not updating the status of the OSDs.

Even when I stop all the remaining OSDs, 'ceph osd tree' still shows them as up.
The status of the mons and the MDS doesn't seem to update correctly either, in
my opinion.

Below is a copy of the status when one mon and one MDS are stopped and all of
the OSDs are also stopped.

 monmap e7: 3 mons at
{ceph1=10.20.0.11:6789/0,ceph2=10.20.0.12:6789/0,ceph3=10.20.0.13:6789/0}
election epoch 48, quorum 0,1 ceph1,ceph2
 mdsmap e1750: 1/1/1 up {0=ceph2=up:replay}, 1 up:standby
 osdmap e18132: 37 osds: 11 up, 11 in
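
(Is the right way to poke at this something like the following, i.e. marking an
OSD down by hand and checking how long the mons wait before doing it themselves?

  # mark a stopped osd down in the map by hand (id 0 just as an example)
  ceph osd down 0
  # on a mon host: mons only mark unreported osds down after this many seconds
  ceph daemon mon.ceph1 config get mon_osd_report_timeout
)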

Br,
Tuomas


-Original Message-
From: Sage Weil [mailto:sw...@redhat.com] 
Sent: 27. huhtikuuta 2015 22:22
To: Tuomas Juntunen
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Upgrade from Giant to Hammer and after some basic
operations most of the OSD's went down

On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> Hey
> 
> Got the log, you can get it from
> http://beta.xaasbox.com/ceph/ceph-osd.15.log

Can you repeat this with 'debug osd = 20'?  Thanks!

sage
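
(Something along these lines on the node with the crashing OSD should do it,
using osd.15 from the earlier log as the example:

  # either add to ceph.conf under [osd] before starting the daemon:
  #   debug osd = 20
  #   debug ms = 1
  # or pass the options directly for a one-off foreground run:
  ceph-osd -i 15 -f --debug-osd 20 --debug-ms 1
)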

> 
> Br,
> Tuomas
> 
> 
> -Original Message-
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: 27. huhtikuuta 2015 20:45
> To: Tuomas Juntunen
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some 
> basic operations most of the OSD's went down
> 
> Yeah, no snaps:
> 
> images:
> "snap_mode": "selfmanaged",
> "snap_seq": 0,
> "snap_epoch": 17882,
> "pool_snaps": [],
> "removed_snaps": "[]",
> 
> img:
> "snap_mode": "selfmanaged",
> "snap_seq": 0,
> "snap_epoch": 0,
> "pool_snaps": [],
> "removed_snaps": "[]",
> 
> ...and actually the log shows this happens on pool 2 (rbd), which has
> 
> "snap_mode": "selfmanaged",
> "snap_seq": 0,
> "snap_epoch": 0,
> "pool_snaps": [],
> "removed_snaps": "[]",
> 
> I'm guessing the offending code is
> 
> pi->build_removed_snaps(newly_removed_snaps);
> newly_removed_snaps.subtract(cached_removed_snaps);
> 
> so newly_removed_snaps should be empty, and apparently 
> cached_removed_snaps is not?  Maybe one of your older osdmaps has snap 
> info for rbd?  It doesn't make sense.  :/  Maybe
> 
>  ceph osd dump 18127 -f json-pretty
> 
> just to be certain?  I've pushed a branch 'wip-hammer-snaps' that will 
> appear at gitbuilder.ceph.com in 20-30 minutes that will output some 
> additional debug info.  It will be at
> 
>   
> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/wip-hammer
> -sanps
> 
> or similar, depending on your distro.  Can you install it on one node 
> and start an osd with logging to reproduce the crash?
> 
> Thanks!
> sage
> 
> 
> On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> 
> > Hi
> > 
> > Here you go
> > 
> > Br,
> > Tuomas
> > 
> > 
> > 
> > -Original Message-
> > From: Sage Weil [mailto:sw...@redhat.com]
> > Sent: 27. huhtikuuta 2015 19:23
> > To: Tuomas Juntunen
> > Cc: 'Samuel Just'; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > some basic operations most of the OSD's went down
> > 
> > On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> > > Thanks for the info.
> > > 
> > > To my knowledge there were no snapshots on that pool, but I cannot 
> > > verify that.
> > 
> > Can you attach a 'ceph osd dump -f json-pretty'?  That will shed a 
> > bit more light on what happened (and the simplest way to fix it).
> > 
> > sage
> > 
> > 
> > > Any way to make this work again? Removing the tier and other 
> > > settings didn't fix it, I tried it the second this happened.
> > > 
> > > Br,
> > > Tuomas
> > > 
> > > -Original Message-
> > > From: Samuel Just [mailto:sj...@redhat.com]
> > > Sent: 27. huhtikuuta 2015 15:50
> > > To: tuomas juntunen
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > > some basic operations most of the OSD's went down
> > > 
> > > So, the base tier is what determines the snapshots for the cache/base 
> > > pool amalgam.  You added a populated pool complete with snapshots on top 
> > > of a base tier without snapshots.  Apparently, it caused an existential 
> > > crisis for the snapshot code.  That's one of the reasons why there is a 
> > > --force-nonempty flag for that operation, I think.  I think the immediate 
> > > answer is probably to disallow pools with snapshots as a cache tier 
> > > altogether until we think of a good way to make it work.
> > > -Sam
> > > 
> > > - Original Message -
> > > From: "tuomas juntunen" 
> > > To: "Samuel Just" 
> > > Cc: ceph-users@lists.ceph.com
> > > Sent: Monday, April 27, 2015 4:56:58 AM
> > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after 
> > > some basic operations most of the OSD's went down
> > > 
> > >

[ceph-users] [cephfs][ceph-fuse] cache size or memory leak?

2015-04-27 Thread Dexter Xiong
Hi,
I've deployed a small Hammer cluster (0.94.1) and I mount it via
ceph-fuse on Ubuntu 14.04. After several hours I found that the ceph-fuse
process had crashed. At the end of this mail is the crash log from
/var/log/ceph/ceph-client.admin.log. The memory usage of the ceph-fuse process
was huge (more than 4GB) when it crashed.
Then I did some tests and found that these actions increase the memory usage
of ceph-fuse rapidly, and the memory usage never seems to decrease:

   - rsync command to sync small files (rsync -a /mnt/some_small /srv/ceph)
   - chown/chmod commands (chmod 775 /srv/ceph -R)

But running chown/chmod on already-accessed files does not increase the memory
usage. It seems that ceph-fuse caches the file inodes but never releases them.
I don't know if there is an option to control the cache size. I set the "mds
cache size = 2147483647" option to improve the performance of the MDS, and I
tried to set "mds cache size = 1000" on the client side, but it doesn't affect
the result.
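
(Could the client-side knob be "client cache size" rather than "mds cache
size"? I have not verified that it helps, but something like:

  # in ceph.conf on the client, under [client], before mounting:
  #   client cache size = 8192
  # or passed straight to ceph-fuse (mon address is just an example):
  ceph-fuse -m 192.168.1.210:6789 /srv/ceph --client_cache_size 8192
)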






Here is the crash log:
   -85> 2015-04-27 11:25:32.263743 7ff7c3fff700  3 client.74478 ll_forget
133ebe6 1
   -84> 2015-04-27 11:25:32.263748 7ff7f1ffb700  3 client.74478 ll_forget
133ebe6 1
   -83> 2015-04-27 11:25:32.263760 7ff7c3fff700  3 client.74478 ll_getattr
13436d6.head
   -82> 2015-04-27 11:25:32.263763 7ff7c3fff700  3 client.74478 ll_getattr
13436d6.head = 0
   -81> 2015-04-27 11:25:32.263770 7ff7c18f0700  3 client.74478 ll_getattr
1015146.head
   -80> 2015-04-27 11:25:32.263775 7ff7c18f0700  3 client.74478 ll_getattr
1015146.head = 0
   -79> 2015-04-27 11:25:32.263781 7ff7c18f0700  3 client.74478 ll_forget
1015146 1
   -78> 2015-04-27 11:25:32.263789 7ff7f17fa700  3 client.74478 ll_lookup
0x7ff6ed91fd00 2822
   -77> 2015-04-27 11:25:32.263794 7ff7f17fa700  3 client.74478 ll_lookup
0x7ff6ed91fd00 2822 -> 0 (13459d0)
   -76> 2015-04-27 11:25:32.263800 7ff7f17fa700  3 client.74478 ll_forget
13436d6 1
   -75> 2015-04-27 11:25:32.263807 7ff7c10ef700  3 client.74478 ll_lookup
0x7ff6e49b42d0 4519
   -74> 2015-04-27 11:25:32.263812 7ff7c10ef700  3 client.74478 ll_lookup
0x7ff6e49b42d0 4519 -> 0 (101a4d7)
   -73> 2015-04-27 11:25:32.263820 7ff7c10ef700  3 client.74478 ll_forget
1015146 1
   -72> 2015-04-27 11:25:32.263827 7ff7037fe700  3 client.74478 ll_getattr
13459d0.head
   -71> 2015-04-27 11:25:32.263832 7ff7037fe700  3 client.74478 ll_getattr
13459d0.head = 0
   -70> 2015-04-27 11:25:32.263840 7ff7c3fff700  3 client.74478 ll_forget
13436d6 1
   -69> 2015-04-27 11:25:32.263849 7ff7c3fff700  3 client.74478 ll_lookup
0x7ff6ed92e8c0 4_o_contour.jpg
   -68> 2015-04-27 11:25:32.263854 7ff7c3fff700  3 client.74478 ll_lookup
0x7ff6ed92e8c0 4_o_contour.jpg -> 0 (13464c2)
   -67> 2015-04-27 11:25:32.263863 7ff7c08ee700  3 client.74478 ll_getattr
101a4d7.head
   -66> 2015-04-27 11:25:32.263866 7ff7c08ee700  3 client.74478 ll_getattr
101a4d7.head = 0
   -65> 2015-04-27 11:25:32.263872 7ff7c08ee700  3 client.74478 ll_forget
101a4d7 1
   -64> 2015-04-27 11:25:32.263874 7ff7037fe700  3 client.74478 ll_forget
13459d0 1
   -63> 2015-04-27 11:25:32.263886 7ff7c08ee700  3 client.74478 ll_getattr
13464c2.head
   -62> 2015-04-27 11:25:32.263889 7ff7c08ee700  3 client.74478 ll_getattr
13464c2.head = 0
   -61> 2015-04-27 11:25:32.263891 7ff7c3fff700  3 client.74478 ll_forget
13459d0 1
   -60> 2015-04-27 11:25:32.263900 7ff7c08ee700  3 client.74478 ll_forget
13464c2 1
   -59> 2015-04-27 11:25:32.263911 7ff7f2ffd700  3 client.74478 ll_lookup
0x7ff6de277990 5_o_rectImg.png
   -58> 2015-04-27 11:25:32.263924 7ff7f2ffd700  1 -- 192.168.1.201:0/24527
--> 192.168.1.210:6800/1299 -- client_request(client.74478:1304984 lookup
#101a4d7/5_o_rectImg.png 2015-04-27 11:25:32.263921) v2 -- ?+0
0x7ff7d8010a50 con 0x2b43690
   -57> 2015-04-27 11:25:32.264026 7ff703fff700  3 client.74478 ll_getattr
100.head
   -56> 2015-04-27 11:25:32.264031 7ff703fff700  3 client.74478 ll_getattr
100.head = 0
   -55> 2015-04-27 11:25:32.264035 7ff703fff700  3 client.74478 ll_forget
100 1
   -54> 2015-04-27 11:25:32.264046 7ff7f27fc700  3 client.74478 ll_lookup
0x7ff8ad70 backup
   -53> 2015-04-27 11:25:32.264052 7ff7f27fc700  3 client.74478 ll_lookup
0x7ff8ad70 backup -> 0 (10003e9)
   -52> 2015-04-27 11:25:32.264057 7ff7f27fc700  3 client.74478 ll_forget
100 1
   -51> 2015-04-27 11:25:32.264071 7ff7f1ffb700  3 client.74478 ll_getattr
10003e9.head
   -50> 2015-04-27 11:25:32.264076 7ff7f1ffb700  3 client.74478 ll_getattr
10003e9.head = 0
   -49> 2015-04-27 11:25:32.264080 7ff7f1ffb700  3 client.74478 ll_forget
10003e9 1
   -48> 2015-04-27 11:25:32.264092 7ff7c18f0700  3 client.74478 ll_lookup
0x7ff8b6c0 11
   -47> 2015-04-27 11:25:32.264098 7ff7c18f0700  3 client.74478 ll_lookup
0x7ff8b6c0 11 -> 0 (10b883c)
   -46> 2015-04-27 11:25:32.264104 7ff7c18f0700  3 client.74478 ll_forget
10003e9 1
   -45> 2015-04-27 11:25:32.264118 7ff7f17fa700  3 clie

Re: [ceph-users] Calamari server not working after upgrade 0.87-1 -> 0.94-1

2015-04-27 Thread Steffen W Sørensen

> On 27/04/2015, at 15.51, Alexandre DERUMIER  wrote:
> 
> Hi, can you check on your ceph node
> /var/log/salt/minion ?
> 
> I have had a similar problem; I needed to remove
> 
> rm /etc/salt/pki/minion/minion_master.pub
> /etc/init.d/salt-minion restart
> 
> (I don't know if "calamari-ctl clear" changes the salt master key)
Apparently not; the master key is the same. Before clearing we still had various 
perf data being updated, like IOPS per pool, CPU etc.; only the cluster PG info 
seemed stuck on old cluster info, so we thought we would wipe the slate and start 
all over.

How do we make Calamari properly aware of an existing cluster?
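
Would something along these lines, run on the Calamari master, be the right
approach, or is there more to it?

  salt-key -L                  # are all cluster nodes' minions accepted?
  salt '*' test.ping           # and responding?
  salt '*' state.highstate     # re-push the calamari minion states
  calamari-ctl initialize      # then re-initialize calamari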

/Steffen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com