[ceph-users] librados API never kills threads

2016-09-13 Thread Stuart Byma
Hi,

Can anyone tell me why librados creates multiple threads per object, and never 
kills them, even when the ioctx is deleted? I am using the C++ API with a 
single connection and a single IO context. More threads and memory are used for 
each new object accessed. Is there a way to prevent this behaviour, like 
prevent the implementation from caching per-object connections? Or is there a 
way to “close” objects after reading/writing them that will kill these threads 
and release associated memory? Destroying and recreating the cluster connection 
is too expensive to do all the time. 

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-osd fail to be started

2016-09-13 Thread Ronny Aasen

On 13. sep. 2016 07:10, strony zhang wrote:

Hi,

My ceph cluster includes 5 OSDs. 3 OSDs are installed in the host
'strony-tc' and 2 in the host 'strony-pc'. Recently, both hosts
were rebooted due to power cycles. After all of the disks were mounted again,
the ceph-osd daemons are in the 'down' status. I tried the command "sudo start ceph-osd
id=x" to start the OSDs, but they do not start; the error
below is reported in the 'dmesg' output. Any suggestions on how to get
the OSDs started properly? Any comments are appreciated.

"
[6595400.895147] init: ceph-osd (ceph/1) main process ended, respawning
[6595400.969346] init: ceph-osd (ceph/1) main process (21990) terminated
with status 1
[6595400.969352] init: ceph-osd (ceph/1) respawning too fast, stopped
"

:~$ ceph osd tree
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 1.09477 root default
-2 0.61818 host strony-tc
 0 0.2 osd.0   down  0  1.0
 1 0.21819 osd.1   down  0  1.0
 4 0.2 osd.4   up  1.0  1.0
-3 0.47659 host strony-pc
 2 0.23830 osd.2   down  0  1.0
 3 0.23830 osd.3   down  0  1.0

:~$ cat /etc/ceph/ceph.conf
[global]
fsid = 60638bfd-1eea-46d5-900d-36224475d8aa
mon_initial_members = strony-tc
mon_host = 10.132.141.122
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_size = 2

Thanks,
Strony




Greetings,

I have somewhat of a similar problem:
OSDs that are just a single disk start on boot,

but OSDs that are on software RAID md devices do not start automatically
on boot.


In order to mount and start them I have to run
ceph-disk-activate /dev/md127p1

where /dev/md127p1 is the xfs partition for the OSD.
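
A boot-time hook along these lines can automate that (just a sketch; the device
name below is specific to this host, and ceph-disk must already know the partition):

    #!/bin/sh
    # Re-activate OSDs on md partitions that were skipped at boot.
    # "ceph-disk list" shows which partitions hold OSD data.
    for part in /dev/md127p1; do
        ceph-disk activate "$part"    # same operation as ceph-disk-activate
    done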

good luck
Ronny Aasen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem starting osd ; PGLog.cc: 984: FAILED assert hammer 0.94.9

2016-09-13 Thread Ronny Aasen
I suspect this must be a difficult question since there have been no
replies on IRC or the mailing list.


Assuming it's impossible to get these OSDs running again:

Is there a way to recover objects from the disks? They are mounted and
the data is readable. I have PGs down since they want to probe these OSDs
that do not want to start.


'pg query' claims it can continue if I mark the OSD as lost, but I would
prefer not to lose data, especially since the data is OK and readable
on the non-functioning OSD.


Also let me know if there is other debug information I can extract in order to
troubleshoot the non-starting OSDs.


kind regards
Ronny Aasen





On 12. sep. 2016 13:16, Ronny Aasen wrote:

After adding more OSDs, and with a big backfill running, 2 of my OSDs
keep stopping.

We also recently upgraded from 0.94.7 to 0.94.9, but I do not know if
that is related.

The log says:

 0> 2016-09-12 10:31:08.288858 7f8749125880 -1 osd/PGLog.cc: In
function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t,
ghobject_t, const pg_info_t&, std::map&,
PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&,
std::set >*)' thread 7f8749125880 time
2016-09-12 10:31:08.286337
osd/PGLog.cc: 984: FAILED assert(oi.version == i->first)

Googling led me to a bug that seems to be related to Infernalis only.
dmesg does not show anything wrong with the hardware.

This is Debian running Hammer 0.94.9,
and the OSD is a software RAID5 consisting of 5 3TB hard drives.
The journal is a partition on an Intel 3500 SSD.

Does anyone have a clue what could be wrong?

kind regards
Ronny Aasen





-- log debug_filestore=10 --
   -19> 2016-09-12 10:31:08.070947 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/1df4bfdd/rb.0.392c.238e1f29.002bd134/head '_' = 266
-18> 2016-09-12 10:31:08.083111 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/deb5bfdd/rb.0.392c.238e1f29.002bc596/head '_' = 266
-17> 2016-09-12 10:31:08.096718 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/9be5dfdd/rb.0.392c.238e1f29.002bc2bf/head '_' = 266
-16> 2016-09-12 10:31:08.110048 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/cbf8ffdd/rb.0.392c.238e1f29.002b9d89/head '_' = 266
-15> 2016-09-12 10:31:08.126263 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/e49d0fdd/rb.0.392c.238e1f29.002b078e/head '_' = 266
-14> 2016-09-12 10:31:08.150199 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/e49d0fdd/rb.0.392c.238e1f29.002b078e/22 '_' = 259
-13> 2016-09-12 10:31:08.173223 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/d0827fdd/rb.0.392c.238e1f29.002b0373/head '_' = 266
-12> 2016-09-12 10:31:08.199192 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/d0827fdd/rb.0.392c.238e1f29.002b0373/22 '_' = 259
-11> 2016-09-12 10:31:08.232712 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/bf4effdd/rb.0.392c.238e1f29.002ae882/head '_' = 266
-10> 2016-09-12 10:31:08.265331 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/bf4effdd/rb.0.392c.238e1f29.002ae882/22 '_' = 259
 -9> 2016-09-12 10:31:08.265456 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.00b381ae__head_DB220FDD__1
with flags=2: (2) No such file or directory
 -8> 2016-09-12 10:31:08.265475 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.00b381ae/head '_' = -2
 -7> 2016-09-12 10:31:08.265535 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.00b381ae__21_DB220FDD__1
with flags=2: (2) No such file or directory
 -6> 2016-09-12 10:31:08.265546 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.00b381ae/21 '_' = -2
 -5> 2016-09-12 10:31:08.265609 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.00cf4057__head_12020FDD__1
with flags=2: (2) No such file or directory
 -4> 2016-09-12 10:31:08.265628 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/12020fdd/rb.0.392c.238e1f29.00cf4057/head '_' = -2
 -3> 2016-09-12 10:31:08.265688 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.00cf4057__21_12020FDD__1
with flags=2: (2) No such file or directory
 -2> 2016-09-12 10:31:08.265700 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/12020fdd/rb.0.392c.238e1f29.00cf4057/21 '_' = 

[ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang
Hello everyone,


I have hit a ceph-fuse crash when I add an OSD to the OSD pool.


I am writing data through ceph-fuse; then I add one OSD to the OSD pool, and after less
than 30 s the ceph-fuse process crashes.


The ceph-fuse client is 10.2.2, and the ceph OSD is 0.94.3; details below:


[root@localhost ~]# rpm -qa | grep ceph
libcephfs1-10.2.2-0.el7.centos.x86_64
python-cephfs-10.2.2-0.el7.centos.x86_64
ceph-common-0.94.3-0.el7.x86_64
ceph-fuse-10.2.2-0.el7.centos.x86_64
ceph-0.94.3-0.el7.x86_64
ceph-mds-10.2.2-0.el7.centos.x86_64
[root@localhost ~]# 
[root@localhost ~]# 
[root@localhost ~]# rpm -qa | grep rados
librados2-devel-0.94.3-0.el7.x86_64
librados2-0.94.3-0.el7.x86_64
libradosstriper1-0.94.3-0.el7.x86_64
python-rados-0.94.3-0.el7.x86_64


ceph stat:


[root@localhost ~]# ceph status
cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
 health HEALTH_WARN
clock skew detected on mon.2, mon.0
19 pgs stale
19 pgs stuck stale
Monitor clock skew detected 
 monmap e3: 3 mons at 
{0=10.222.5.229:6789/0,1=10.222.5.156:6789/0,2=10.222.5.162:6789/0}
election epoch 26, quorum 0,1,2 1,2,0
 mdsmap e58: 1/1/1 up {0=0=up:active}, 1 up:standby
 osdmap e324: 9 osds: 9 up, 9 in
  pgmap v3505: 320 pgs, 3 pools, 4638 MB data, 1302 objects
23373 MB used, 68695 MB / 92069 MB avail
 301 active+clean
  19 stale+active+clean


ceph osd stat:
[root@localhost ~]# ceph osd dump
epoch 324
fsid a7f64266-0894-4f1e-a635-d0aeaca0e993
created 2016-09-13 11:08:34.629245
modified 2016-09-13 16:21:53.285729
flags 
pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 5 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 128 pgp_num 128 last_change 285 flags hashpspool 
crash_replay_interval 45 stripe_width 0
pool 6 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 128 pgp_num 128 last_change 282 flags hashpspool 
stripe_width 0
max_osd 9
osd.0 up   in  weight 1 up_from 271 up_thru 321 down_at 242 last_clean_interval 
[169,175) 10.222.5.229:6800/3780 10.222.5.229:6801/3780 10.222.5.229:6802/3780 
10.222.5.229:6803/3780 exists,up 1bf6cda4-bf1a-4f8a-836d-b6aec970d257
osd.1 up   in  weight 1 up_from 223 up_thru 320 down_at 186 last_clean_interval 
[20,183) 10.222.5.229:6804/2228 10.222.5.229:6805/2228 10.222.5.229:6806/2228 
10.222.5.229:6807/2228 exists,up 3f3ad2fa-52b1-46fd-af6c-05178b814e25
osd.2 up   in  weight 1 up_from 224 up_thru 320 down_at 186 last_clean_interval 
[22,183) 10.222.5.229:6808/2259 10.222.5.229:6809/2259 10.222.5.229:6810/2259 
10.222.5.229:6811/2259 exists,up 9199193e-9928-4c5d-8adc-2c32a4c8716b
osd.3 up   in  weight 1 up_from 312 up_thru 313 down_at 303 last_clean_interval 
[0,0) 10.222.5.156:6800/3592 10.222.5.156:6801/3592 10.222.5.156:6802/3592 
10.222.5.156:6803/3592 exists,up 9b8f1cb0-51df-42aa-8be4-8f6347235cc2
osd.4 up   in  weight 1 up_from 25 up_thru 322 down_at 0 last_clean_interval 
[0,0) 10.222.5.156:6804/25567 10.222.5.156:6805/25567 10.222.5.156:6806/25567 
10.222.5.156:6807/25567 exists,up 0c719e5e-f8fc-46e0-926d-426bf6881ee0
osd.5 up   in  weight 1 up_from 27 up_thru 310 down_at 0 last_clean_interval 
[0,0) 10.222.5.156:6808/25678 10.222.5.156:6809/25678 10.222.5.156:6810/25678 
10.222.5.156:6811/25678 exists,up 729e0749-2ce3-426a-a7f1-a3cbfa88ba0b
osd.6 up   in  weight 1 up_from 40 up_thru 313 down_at 0 last_clean_interval 
[0,0) 10.222.5.162:6807/15887 10.222.5.162:6808/15887 10.222.5.162:6809/15887 
10.222.5.162:6810/15887 exists,up dea24f0f-4666-40af-98af-5ab8d42c37c6
osd.7 up   in  weight 1 up_from 45 up_thru 313 down_at 0 last_clean_interval 
[0,0) 10.222.5.162:6811/16040 10.222.5.162:6812/16040 10.222.5.162:6813/16040 
10.222.5.162:6814/16040 exists,up 0e238745-0091-4790-9b39-c9d36f4ebbee
osd.8 up   in  weight 1 up_from 49 up_thru 314 down_at 0 last_clean_interval 
[0,0) 10.222.5.162:6815/16206 10.222.5.162:6816/16206 10.222.5.162:6817/16206 
10.222.5.162:6818/16206 exists,up 59637f86-f283-4397-a63b-474976ee8047
[root@localhost ~]# 
[root@localhost ~]# ceph osd tree
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 9.0 root default 
-5 3.0 host yxy02   
 1 1.0 osd.1   up  1.0  1.0 
 2 1.0 osd.2   up  1.0  1.0 
 0 1.0 osd.0   up  1.0  1.0 
-6 3.0 host yxy03   
 4 1.0 osd.4   up  1.0  1.0   -> OSD JUST ADDED!
 5 1.0 osd.5   up  1.0  1.0 
 3 1.0 osd.3   up  1.0  1.0 
-7 3.0 host zwr01   
 6 1.0 osd.6   up  1.0  1.0

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread John Spray
On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
> Hello everyone,
>
> I have met a ceph-fuse crash when i add osd to osd pool.
>
> I am writing data through ceph-fuse,then i add one osd to osd pool, after
> less than 30 s, the ceph-fuse process crash.

It looks like this could be an ObjectCacher bug that is only being
exposed because of unusual timing caused by the cluster slowing
down during PG creation.  Was this reproducible or a one-off
occurrence?

Please could you create a ticket on tracker.ceph.com with all this info.

Thanks,
John

> The ceph-fuse client is 10.2.2, and the ceph osd is 0.94.3, details beblow:
>
> [root@localhost ~]# rpm -qa | grep ceph
> libcephfs1-10.2.2-0.el7.centos.x86_64
> python-cephfs-10.2.2-0.el7.centos.x86_64
> ceph-common-0.94.3-0.el7.x86_64
> ceph-fuse-10.2.2-0.el7.centos.x86_64
> ceph-0.94.3-0.el7.x86_64
> ceph-mds-10.2.2-0.el7.centos.x86_64
> [root@localhost ~]#
> [root@localhost ~]#
> [root@localhost ~]# rpm -qa | grep rados
> librados2-devel-0.94.3-0.el7.x86_64
> librados2-0.94.3-0.el7.x86_64
> libradosstriper1-0.94.3-0.el7.x86_64
> python-rados-0.94.3-0.el7.x86_64
>
> ceph stat:
>
> [root@localhost ~]# ceph status
> cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
>  health HEALTH_WARN
> clock skew detected on mon.2, mon.0
> 19 pgs stale
> 19 pgs stuck stale
> Monitor clock skew detected
>  monmap e3: 3 mons at
> {0=10.222.5.229:6789/0,1=10.222.5.156:6789/0,2=10.222.5.162:6789/0}
> election epoch 26, quorum 0,1,2 1,2,0
>  mdsmap e58: 1/1/1 up {0=0=up:active}, 1 up:standby
>  osdmap e324: 9 osds: 9 up, 9 in
>   pgmap v3505: 320 pgs, 3 pools, 4638 MB data, 1302 objects
> 23373 MB used, 68695 MB / 92069 MB avail
>  301 active+clean
>   19 stale+active+clean
>
> ceph osd stat:
> [root@localhost ~]# ceph osd dump
> epoch 324
> fsid a7f64266-0894-4f1e-a635-d0aeaca0e993
> created 2016-09-13 11:08:34.629245
> modified 2016-09-13 16:21:53.285729
> flags
> pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> pool 5 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 285 flags hashpspool
> crash_replay_interval 45 stripe_width 0
> pool 6 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 282 flags hashpspool
> stripe_width 0
> max_osd 9
> osd.0 up   in  weight 1 up_from 271 up_thru 321 down_at 242
> last_clean_interval [169,175) 10.222.5.229:6800/3780 10.222.5.229:6801/3780
> 10.222.5.229:6802/3780 10.222.5.229:6803/3780 exists,up
> 1bf6cda4-bf1a-4f8a-836d-b6aec970d257
> osd.1 up   in  weight 1 up_from 223 up_thru 320 down_at 186
> last_clean_interval [20,183) 10.222.5.229:6804/2228 10.222.5.229:6805/2228
> 10.222.5.229:6806/2228 10.222.5.229:6807/2228 exists,up
> 3f3ad2fa-52b1-46fd-af6c-05178b814e25
> osd.2 up   in  weight 1 up_from 224 up_thru 320 down_at 186
> last_clean_interval [22,183) 10.222.5.229:6808/2259 10.222.5.229:6809/2259
> 10.222.5.229:6810/2259 10.222.5.229:6811/2259 exists,up
> 9199193e-9928-4c5d-8adc-2c32a4c8716b
> osd.3 up   in  weight 1 up_from 312 up_thru 313 down_at 303
> last_clean_interval [0,0) 10.222.5.156:6800/3592 10.222.5.156:6801/3592
> 10.222.5.156:6802/3592 10.222.5.156:6803/3592 exists,up
> 9b8f1cb0-51df-42aa-8be4-8f6347235cc2
> osd.4 up   in  weight 1 up_from 25 up_thru 322 down_at 0 last_clean_interval
> [0,0) 10.222.5.156:6804/25567 10.222.5.156:6805/25567
> 10.222.5.156:6806/25567 10.222.5.156:6807/25567 exists,up
> 0c719e5e-f8fc-46e0-926d-426bf6881ee0
> osd.5 up   in  weight 1 up_from 27 up_thru 310 down_at 0 last_clean_interval
> [0,0) 10.222.5.156:6808/25678 10.222.5.156:6809/25678
> 10.222.5.156:6810/25678 10.222.5.156:6811/25678 exists,up
> 729e0749-2ce3-426a-a7f1-a3cbfa88ba0b
> osd.6 up   in  weight 1 up_from 40 up_thru 313 down_at 0 last_clean_interval
> [0,0) 10.222.5.162:6807/15887 10.222.5.162:6808/15887
> 10.222.5.162:6809/15887 10.222.5.162:6810/15887 exists,up
> dea24f0f-4666-40af-98af-5ab8d42c37c6
> osd.7 up   in  weight 1 up_from 45 up_thru 313 down_at 0 last_clean_interval
> [0,0) 10.222.5.162:6811/16040 10.222.5.162:6812/16040
> 10.222.5.162:6813/16040 10.222.5.162:6814/16040 exists,up
> 0e238745-0091-4790-9b39-c9d36f4ebbee
> osd.8 up   in  weight 1 up_from 49 up_thru 314 down_at 0 last_clean_interval
> [0,0) 10.222.5.162:6815/16206 10.222.5.162:6816/16206
> 10.222.5.162:6817/16206 10.222.5.162:6818/16206 exists,up
> 59637f86-f283-4397-a63b-474976ee8047
> [root@localhost ~]#
> [root@localhost ~]# ceph osd tree
> ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 9.0 root default
> -5 3.0 host yxy02
>  1 1.0 osd.1   up  1.0  1.0
>  2 1.0 osd.2   up  1.0  1.0
>  0 1.0 o

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang


This problem was reproducible.


I remove one OSD from the OSD tree, and after one minute I add the same OSD back to
the OSD pool; then the fuse client crashes.


Ceph-fuse is writing data through smallfile, too, and the script is:


" python smallfile_cli.py --top /mnt/test --threads 8 --files 20 --file-size 
10240 --record-size 512 --operation create"


and my remove osd steps are:


1. kill -9 $pid_num
2. ceph osd out $id
3. ceph osd down $id
4. ceph osd crush remove osd.$id
5. ceph auth del osd.$id
6. ceph osd rm osd.$id


and my add osd steps are:


1. mkfs.xfs and remount the OSD I removed
2. ceph osd create
3. ceph-osd -i $id --mkfs --osd-data=/data/osd/osd.$id --mkkey
4. ceph auth add osd.$id osd 'allow *' mon 'allow rwx' -i /data/osd/osd.$id/keyring
5. ceph osd crush create-or-move osd.$id 1.0 host=
6. ceph-osd -i $id






At 2016-09-13 17:01:09, "John Spray"  wrote:
>On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
>> Hello everyone, 
>>
>> I have met a ceph-fuse crash when i add osd to osd pool.
>>
>> I am writing data through ceph-fuse,then i add one osd to osd pool, after
>> less than 30 s, the ceph-fuse process crash.
>
>It looks like this could be an ObjectCacher bug that is only being
>exposed because of an unusual timing caused by the cluster slowing
>down during PG creation.  Was this reproducible or a one-off
>occurence?
>
>Please could you create a ticket on tracker.ceph.com with all this info.
>
>Thanks,
>John
>
>> The ceph-fuse client is 10.2.2, and the ceph osd is 0.94.3, details beblow:
>>
>> [root@localhost ~]# rpm -qa | grep ceph
>> libcephfs1-10.2.2-0.el7.centos.x86_64
>> python-cephfs-10.2.2-0.el7.centos.x86_64
>> ceph-common-0.94.3-0.el7.x86_64
>> ceph-fuse-10.2.2-0.el7.centos.x86_64
>> ceph-0.94.3-0.el7.x86_64
>> ceph-mds-10.2.2-0.el7.centos.x86_64
>> [root@localhost ~]#
>> [root@localhost ~]#
>> [root@localhost ~]# rpm -qa | grep rados
>> librados2-devel-0.94.3-0.el7.x86_64
>> librados2-0.94.3-0.el7.x86_64
>> libradosstriper1-0.94.3-0.el7.x86_64
>> python-rados-0.94.3-0.el7.x86_64
>>
>> ceph stat:
>>
>> [root@localhost ~]# ceph status
>> cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
>>  health HEALTH_WARN
>> clock skew detected on mon.2, mon.0
>> 19 pgs stale
>> 19 pgs stuck stale
>> Monitor clock skew detected
>>  monmap e3: 3 mons at
>> {0=10.222.5.229:6789/0,1=10.222.5.156:6789/0,2=10.222.5.162:6789/0}
>> election epoch 26, quorum 0,1,2 1,2,0
>>  mdsmap e58: 1/1/1 up {0=0=up:active}, 1 up:standby
>>  osdmap e324: 9 osds: 9 up, 9 in
>>   pgmap v3505: 320 pgs, 3 pools, 4638 MB data, 1302 objects
>> 23373 MB used, 68695 MB / 92069 MB avail
>>  301 active+clean
>>   19 stale+active+clean
>>
>> ceph osd stat:
>> [root@localhost ~]# ceph osd dump
>> epoch 324
>> fsid a7f64266-0894-4f1e-a635-d0aeaca0e993
>> created 2016-09-13 11:08:34.629245
>> modified 2016-09-13 16:21:53.285729
>> flags
>> pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
>> pool 5 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0
>> object_hash rjenkins pg_num 128 pgp_num 128 last_change 285 flags hashpspool
>> crash_replay_interval 45 stripe_width 0
>> pool 6 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0
>> object_hash rjenkins pg_num 128 pgp_num 128 last_change 282 flags hashpspool
>> stripe_width 0
>> max_osd 9
>> osd.0 up   in  weight 1 up_from 271 up_thru 321 down_at 242
>> last_clean_interval [169,175) 10.222.5.229:6800/3780 10.222.5.229:6801/3780
>> 10.222.5.229:6802/3780 10.222.5.229:6803/3780 exists,up
>> 1bf6cda4-bf1a-4f8a-836d-b6aec970d257
>> osd.1 up   in  weight 1 up_from 223 up_thru 320 down_at 186
>> last_clean_interval [20,183) 10.222.5.229:6804/2228 10.222.5.229:6805/2228
>> 10.222.5.229:6806/2228 10.222.5.229:6807/2228 exists,up
>> 3f3ad2fa-52b1-46fd-af6c-05178b814e25
>> osd.2 up   in  weight 1 up_from 224 up_thru 320 down_at 186
>> last_clean_interval [22,183) 10.222.5.229:6808/2259 10.222.5.229:6809/2259
>> 10.222.5.229:6810/2259 10.222.5.229:6811/2259 exists,up
>> 9199193e-9928-4c5d-8adc-2c32a4c8716b
>> osd.3 up   in  weight 1 up_from 312 up_thru 313 down_at 303
>> last_clean_interval [0,0) 10.222.5.156:6800/3592 10.222.5.156:6801/3592
>> 10.222.5.156:6802/3592 10.222.5.156:6803/3592 exists,up
>> 9b8f1cb0-51df-42aa-8be4-8f6347235cc2
>> osd.4 up   in  weight 1 up_from 25 up_thru 322 down_at 0 last_clean_interval
>> [0,0) 10.222.5.156:6804/25567 10.222.5.156:6805/25567
>> 10.222.5.156:6806/25567 10.222.5.156:6807/25567 exists,up
>> 0c719e5e-f8fc-46e0-926d-426bf6881ee0
>> osd.5 up   in  weight 1 up_from 27 up_thru 310 down_at 0 last_clean_interval
>> [0,0) 10.222.5.156:6808/25678 10.222.5.156:6809/25678
>> 10.222.5.156:6810/25678 10.222.5.156:6811/25678 exists,up
>> 729e0749-2ce3-426a-a7f1-a3cbfa88ba0b
>> osd.6 up   in  weight 1 up_from 4

Re: [ceph-users] librados API never kills threads

2016-09-13 Thread Josh Durgin

On 09/13/2016 01:13 PM, Stuart Byma wrote:

Hi,

Can anyone tell me why librados creates multiple threads per object, and never 
kills them, even when the ioctx is deleted? I am using the C++ API with a 
single connection and a single IO context. More threads and memory are used for 
each new object accessed. Is there a way to prevent this behaviour, like 
prevent the implementation from caching per-object connections? Or is there a 
way to “close” objects after reading/writing them that will kill these threads 
and release associated memory? Destroying and recreating the cluster connection 
is too expensive to do all the time.


The threads you're seeing are most likely associated with the cluster
connection - with SimpleMessenger, the default, a connection to an OSD
uses 2 threads. If you access objects that happen to live on different
OSDs, you'll notice more threads being created for each new OSD. These
are encapsulated in the Rados object, which you likely don't want to
recreate all the time.

An IoCtx is really just a small set of in-memory state, e.g. pool id, 
snapshot, namespace, etc. and doesn't consume many resources itself.


Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang
I have submitted the issue at "http://tracker.ceph.com/issues/17270".


At 2016-09-13 17:01:09, "John Spray"  wrote:
>On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
>> Hello everyone,
>>
>> I have met a ceph-fuse crash when i add osd to osd pool.
>>
>> I am writing data through ceph-fuse,then i add one osd to osd pool, after
>> less than 30 s, the ceph-fuse process crash.
>
>It looks like this could be an ObjectCacher bug that is only being
>exposed because of an unusual timing caused by the cluster slowing
>down during PG creation.  Was this reproducible or a one-off
>occurence?
>
>Please could you create a ticket on tracker.ceph.com with all this info.
>
>Thanks,
>John
>
>> The ceph-fuse client is 10.2.2, and the ceph osd is 0.94.3, details beblow:
>>
>> [root@localhost ~]# rpm -qa | grep ceph
>> libcephfs1-10.2.2-0.el7.centos.x86_64
>> python-cephfs-10.2.2-0.el7.centos.x86_64
>> ceph-common-0.94.3-0.el7.x86_64
>> ceph-fuse-10.2.2-0.el7.centos.x86_64
>> ceph-0.94.3-0.el7.x86_64
>> ceph-mds-10.2.2-0.el7.centos.x86_64
>> [root@localhost ~]#
>> [root@localhost ~]#
>> [root@localhost ~]# rpm -qa | grep rados
>> librados2-devel-0.94.3-0.el7.x86_64
>> librados2-0.94.3-0.el7.x86_64
>> libradosstriper1-0.94.3-0.el7.x86_64
>> python-rados-0.94.3-0.el7.x86_64
>>
>> ceph stat:
>>
>> [root@localhost ~]# ceph status
>> cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
>>  health HEALTH_WARN
>> clock skew detected on mon.2, mon.0
>> 19 pgs stale
>> 19 pgs stuck stale
>> Monitor clock skew detected
>>  monmap e3: 3 mons at
>> {0=10.222.5.229:6789/0,1=10.222.5.156:6789/0,2=10.222.5.162:6789/0}
>> election epoch 26, quorum 0,1,2 1,2,0
>>  mdsmap e58: 1/1/1 up {0=0=up:active}, 1 up:standby
>>  osdmap e324: 9 osds: 9 up, 9 in
>>   pgmap v3505: 320 pgs, 3 pools, 4638 MB data, 1302 objects
>> 23373 MB used, 68695 MB / 92069 MB avail
>>  301 active+clean
>>   19 stale+active+clean
>>
>> ceph osd stat:
>> [root@localhost ~]# ceph osd dump
>> epoch 324
>> fsid a7f64266-0894-4f1e-a635-d0aeaca0e993
>> created 2016-09-13 11:08:34.629245
>> modified 2016-09-13 16:21:53.285729
>> flags
>> pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
>> pool 5 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0
>> object_hash rjenkins pg_num 128 pgp_num 128 last_change 285 flags hashpspool
>> crash_replay_interval 45 stripe_width 0
>> pool 6 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0
>> object_hash rjenkins pg_num 128 pgp_num 128 last_change 282 flags hashpspool
>> stripe_width 0
>> max_osd 9
>> osd.0 up   in  weight 1 up_from 271 up_thru 321 down_at 242
>> last_clean_interval [169,175) 10.222.5.229:6800/3780 10.222.5.229:6801/3780
>> 10.222.5.229:6802/3780 10.222.5.229:6803/3780 exists,up
>> 1bf6cda4-bf1a-4f8a-836d-b6aec970d257
>> osd.1 up   in  weight 1 up_from 223 up_thru 320 down_at 186
>> last_clean_interval [20,183) 10.222.5.229:6804/2228 10.222.5.229:6805/2228
>> 10.222.5.229:6806/2228 10.222.5.229:6807/2228 exists,up
>> 3f3ad2fa-52b1-46fd-af6c-05178b814e25
>> osd.2 up   in  weight 1 up_from 224 up_thru 320 down_at 186
>> last_clean_interval [22,183) 10.222.5.229:6808/2259 10.222.5.229:6809/2259
>> 10.222.5.229:6810/2259 10.222.5.229:6811/2259 exists,up
>> 9199193e-9928-4c5d-8adc-2c32a4c8716b
>> osd.3 up   in  weight 1 up_from 312 up_thru 313 down_at 303
>> last_clean_interval [0,0) 10.222.5.156:6800/3592 10.222.5.156:6801/3592
>> 10.222.5.156:6802/3592 10.222.5.156:6803/3592 exists,up
>> 9b8f1cb0-51df-42aa-8be4-8f6347235cc2
>> osd.4 up   in  weight 1 up_from 25 up_thru 322 down_at 0 last_clean_interval
>> [0,0) 10.222.5.156:6804/25567 10.222.5.156:6805/25567
>> 10.222.5.156:6806/25567 10.222.5.156:6807/25567 exists,up
>> 0c719e5e-f8fc-46e0-926d-426bf6881ee0
>> osd.5 up   in  weight 1 up_from 27 up_thru 310 down_at 0 last_clean_interval
>> [0,0) 10.222.5.156:6808/25678 10.222.5.156:6809/25678
>> 10.222.5.156:6810/25678 10.222.5.156:6811/25678 exists,up
>> 729e0749-2ce3-426a-a7f1-a3cbfa88ba0b
>> osd.6 up   in  weight 1 up_from 40 up_thru 313 down_at 0 last_clean_interval
>> [0,0) 10.222.5.162:6807/15887 10.222.5.162:6808/15887
>> 10.222.5.162:6809/15887 10.222.5.162:6810/15887 exists,up
>> dea24f0f-4666-40af-98af-5ab8d42c37c6
>> osd.7 up   in  weight 1 up_from 45 up_thru 313 down_at 0 last_clean_interval
>> [0,0) 10.222.5.162:6811/16040 10.222.5.162:6812/16040
>> 10.222.5.162:6813/16040 10.222.5.162:6814/16040 exists,up
>> 0e238745-0091-4790-9b39-c9d36f4ebbee
>> osd.8 up   in  weight 1 up_from 49 up_thru 314 down_at 0 last_clean_interval
>> [0,0) 10.222.5.162:6815/16206 10.222.5.162:6816/16206
>> 10.222.5.162:6817/16206 10.222.5.162:6818/16206 exists,up
>> 59637f86-f283-4397-a63b-474976ee8047
>> [root@localhost ~]#
>> [root@localhost ~]# ceph osd tree
>> ID WEIGHT  TYPE NAME  UP/DOW

Re: [ceph-users] swiftclient call radosgw, it always response 401 Unauthorized

2016-09-13 Thread Brian Chang-Chien
Hi naga.b,

I use Ceph Jewel 10.2.2;
my ceph.conf is as follows:
[global]
fsid = d056c174-2e3a-4c36-a067-cb774d176ce2
mon_initial_members = brianceph
mon_host = 10.62.9.140
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_crush_chooseleaf_type = 0
osd_pool_default_size = 1
osd_journal_size = 100
[client.radosgw.gateway]
host = brianceph
keyring = /etc/ceph/ceph.client.radosgw.keyring
log_file = /var/log/ceph/radosgw.log
rgw_dns_name = brianceph
rgw_keystone_url = http://10.62.13.253:35357
rgw_keystone_admin_token = 7bb8e26cbc714c47a26ffec3d96f246f
rgw_keystone_accepted_roles = admin, swiftuser
rgw_ketstone_token_cache_size = 200
rgw_keystone_revocation_interval = 30
rgw_s3_auth_use_keystone = true
nss_db_path = /var/ceph/nss

and my radosgw.log

2016-09-13 17:42:38.638462 7efd964619c0  0 starting handler: fastcgi
2016-09-13 17:42:38.638523 7efcadf9b700  0 ERROR: no socket server point
defined, cannot start fcgi frontend
2016-09-13 17:47:33.597070 7efcdeffd700  1 == starting new request
req=0x7efcdeff7710 =
2016-09-13 17:47:33.597329 7efcdeffd700  1 == req done
req=0x7efcdeff7710 op status=0 http_status=401 ==
2016-09-13 17:47:33.597379 7efcdeffd700  1 civetweb: 0x7efd2bb0:
10.62.9.34 - - [13/Sep/2016:17:47:33 +0800] "HEAD /swift/v1 HTTP/1.1" 401 0
- python-swiftclient-2.6.0
2016-09-13 17:47:34.755291 7efcd700  1 == starting new request
req=0x7efcdfff9710 =
2016-09-13 17:47:34.755443 7efcd700  1 == req done
req=0x7efcdfff9710 op status=0 http_status=401 ==
2016-09-13 17:47:34.755481 7efcd700  1 civetweb: 0x7efd48004020:
10.62.9.34 - - [13/Sep/2016:17:47:34 +0800] "HEAD /swift/v1 HTTP/1.1" 401 0
- python-swiftclient-2.6.0
2016-09-13 17:49:04.718249 7efcdf7fe700  1 == starting new request
req=0x7efcdf7f8710 =
2016-09-13 17:49:04.718438 7efcdf7fe700  1 == req done
req=0x7efcdf7f8710 op status=0 http_status=401 ==
2016-09-13 17:49:04.718483 7efcdf7fe700  1 civetweb: 0x7efd68001f60:
10.62.9.34 - - [13/Sep/2016:17:49:04 +0800] "HEAD /swift/v1 HTTP/1.1" 401 0
- python-swiftclient-2.6.0
2016-09-13 17:49:05.870115 7efcde7fc700  1 == starting new request
req=0x7efcde7f6710 =
2016-09-13 17:49:05.870280 7efcde7fc700  1 == req done
req=0x7efcde7f6710 op status=0 http_status=401 ==
2016-09-13 17:49:05.870324 7efcde7fc700  1 civetweb: 0x7efd28000bb0:
10.62.9.34 - - [13/Sep/2016:17:49:05 +0800] "HEAD /swift/v1 HTTP/1.1" 401 0
- python-swiftclient-2.6.0
2016-09-13 17:51:32.036065 7efd157fa700  1 handle_sigterm
2016-09-13 17:51:32.036099 7efd157fa700  1 handle_sigterm set alarm for 120
2016-09-13 17:51:32.036153 7efd964619c0 -1 shutting down
2016-09-13 17:51:32.037977 7efd78df9700  0 monclient: hunting for new mon
2016-09-13 17:51:32.038172 7efd783f6700  0 -- 10.62.9.140:0/1002906388 >>
10.62.9.140:6789/0 pipe(0x7efd60016670 sd=7 :0 s=1 pgs=0 cs=0 l=1
c=0x7efd60014d70).fault
2016-09-13 17:51:32.906553 7efd964619c0  1 final shutdown
2016-09-13 17:51:39.294948 7ff5175f29c0  0 deferred set uid:gid to 167:167
(ceph:ceph)
2016-09-13 17:51:39.295097 7ff5175f29c0  0 ceph version 10.2.2
(45107e21c568dd033c2f0a3107dec8f0b0e58374), process radosgw, pid 13251
2016-09-13 17:51:39.318311 7ff5175e8700  0 -- :/175783115 >>
10.62.9.140:6789/0 pipe(0x7ff51987b9b0 sd=7 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff519842430).fault
2016-09-13 17:51:39.596568 7ff4fc10d700  0 -- 10.62.9.140:0/175783115 >>
10.62.9.140:6800/11336 pipe(0x7ff519880080 sd=8 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff519881390).fault
2016-09-13 17:51:40.197109 7ff4fc10d700  0 -- 10.62.9.140:0/175783115 >>
10.62.9.140:6800/11336 pipe(0x7ff519880080 sd=8 :42233 s=1 pgs=0 cs=0 l=1
c=0x7ff519881390).connect claims to be 10.62.9.140:6800/13358 not
10.62.9.140:6800/11336 - wrong node!
2016-09-13 17:51:40.997618 7ff4fc10d700  0 -- 10.62.9.140:0/175783115 >>
10.62.9.140:6800/11336 pipe(0x7ff519880080 sd=8 :42234 s=1 pgs=0 cs=0 l=1
c=0x7ff519881390).connect claims to be 10.62.9.140:6800/13358 not
10.62.9.140:6800/11336 - wrong node!
2016-09-13 17:51:42.598080 7ff4fc10d700  0 -- 10.62.9.140:0/175783115 >>
10.62.9.140:6800/11336 pipe(0x7ff519880080 sd=8 :42235 s=1 pgs=0 cs=0 l=1
c=0x7ff519881390).connect claims to be 10.62.9.140:6800/13358 not
10.62.9.140:6800/11336 - wrong node!
2016-09-13 17:51:45.798587 7ff4fc10d700  0 -- 10.62.9.140:0/175783115 >>
10.62.9.140:6800/11336 pipe(0x7ff519880080 sd=8 :42236 s=1 pgs=0 cs=0 l=1
c=0x7ff519881390).connect claims to be 10.62.9.140:6800/13358 not
10.62.9.140:6800/11336 - wrong node!
2016-09-13 17:51:52.199050 7ff4fc10d700  0 -- 10.62.9.140:0/175783115 >>
10.62.9.140:6800/11336 pipe(0x7ff519880080 sd=8 :42237 s=1 pgs=0 cs=0 l=1
c=0x7ff519881390).connect claims to be 10.62.9.140:6800/13358 not
10.62.9.140:6800/11336 - wrong node!
2016-09-13 17:51:54.596518 7ff4fc10d700  0 -- 10.62.9.140:0/175783115 >>
10.62.9.140:6800/11336 pipe(0x7ff519880080 sd=8 :42238 s=1 pgs=0 cs=0 l=1
c=0x7ff519881390).connect claims to be 10.62.9.140:6800/13358 not
10.62.9.140:6

[ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Nikolay Borisov
Hello list, 


I have the following cluster: 

ceph status
cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
 health HEALTH_OK
 monmap e2: 5 mons at 
{alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0}
election epoch 196, quorum 0,1,2,3,4 alxc10,alxc5,alxc6,alxc7,alxc11
 mdsmap e797: 1/1/1 up {0=alxc11.=up:active}, 2 up:standby
 osdmap e11243: 50 osds: 50 up, 50 in
  pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
4323 GB used, 85071 GB / 89424 GB avail
8192 active+clean
  client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s

It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) and 
kernel 4.4.14

I have multiple rbd devices which are used as the root for lxc-based containers
and have ext4 on them. At some point I want
to create an rbd snapshot; for this, the sequence of operations I perform is:

1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted

2. rbd snap create "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}

3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted

<= At this point normal container operation continues =>

4. Mount the newly created snapshot to a 2nd location as read-only and rsync 
the files from it to a remote server.
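
Put together as a script, the cycle above looks roughly like this (a sketch only:
pool, image, mount points and the rsync target are placeholders, the freeze tool is
util-linux fsfreeze, and rbd map prints the mapped device on stdout):

    #!/bin/sh
    POOL=mypool
    IMG=container-root
    SNAP=backup-$(date +%Y%m%d)
    MNT=/path/to/where/ext4-ontop-of-rbd-is-mounted

    fsfreeze -f "$MNT"                      # 1. flush and block writes on the ext4 fs
    rbd snap create "$POOL/$IMG@$SNAP"      # 2. take the RADOS-side snapshot
    fsfreeze -u "$MNT"                      # 3. normal container operation continues

    # 4. map the snapshot read-only, mount it elsewhere and rsync it off-host
    DEV=$(rbd map "$POOL/$IMG@$SNAP" --read-only)
    mkdir -p /mnt/snap
    mount -o ro "$DEV" /mnt/snap
    rsync -a /mnt/snap/ backupserver:/backups/"$IMG"/"$SNAP"/
    umount /mnt/snap
    rbd unmap "$DEV"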

However, as I start rsyncing to the remote server, certain files in
the snapshot are reported as corrupted.

freezefs implies filesystem syncing; I also tested manually doing
sync/syncfs on the fs which is being snapshotted, both before
and after the freezefs, and the corruption is still present. So it's unlikely
there are dirty buffers in the page cache.
I'm using the kernel rbd driver for the clients. The current theory is that there
are some caches which are not being flushed,
other than the Linux page cache. Reading the docs implies that only librbd
uses separate caching, but I'm not using librbd.

Any ideas would be much appreciated. 

Regards, 
Nikolay 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread John Spray
On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
> Hello everyone,
>
> I have met a ceph-fuse crash when i add osd to osd pool.
>
> I am writing data through ceph-fuse,then i add one osd to osd pool, after
> less than 30 s, the ceph-fuse process crash.
>
> The ceph-fuse client is 10.2.2, and the ceph osd is 0.94.3, details beblow:

I missed this version mismatch until someone pointed it out (thanks Brad)

In theory the newer fuse client should still work with the older OSD,
but it would be very interesting to know if this issue is still
reproducible if you use all Jewel packages.

John

>
> [root@localhost ~]# rpm -qa | grep ceph
> libcephfs1-10.2.2-0.el7.centos.x86_64
> python-cephfs-10.2.2-0.el7.centos.x86_64
> ceph-common-0.94.3-0.el7.x86_64
> ceph-fuse-10.2.2-0.el7.centos.x86_64
> ceph-0.94.3-0.el7.x86_64
> ceph-mds-10.2.2-0.el7.centos.x86_64
> [root@localhost ~]#
> [root@localhost ~]#
> [root@localhost ~]# rpm -qa | grep rados
> librados2-devel-0.94.3-0.el7.x86_64
> librados2-0.94.3-0.el7.x86_64
> libradosstriper1-0.94.3-0.el7.x86_64
> python-rados-0.94.3-0.el7.x86_64
>
> ceph stat:
>
> [root@localhost ~]# ceph status
> cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
>  health HEALTH_WARN
> clock skew detected on mon.2, mon.0
> 19 pgs stale
> 19 pgs stuck stale
> Monitor clock skew detected
>  monmap e3: 3 mons at
> {0=10.222.5.229:6789/0,1=10.222.5.156:6789/0,2=10.222.5.162:6789/0}
> election epoch 26, quorum 0,1,2 1,2,0
>  mdsmap e58: 1/1/1 up {0=0=up:active}, 1 up:standby
>  osdmap e324: 9 osds: 9 up, 9 in
>   pgmap v3505: 320 pgs, 3 pools, 4638 MB data, 1302 objects
> 23373 MB used, 68695 MB / 92069 MB avail
>  301 active+clean
>   19 stale+active+clean
>
> ceph osd stat:
> [root@localhost ~]# ceph osd dump
> epoch 324
> fsid a7f64266-0894-4f1e-a635-d0aeaca0e993
> created 2016-09-13 11:08:34.629245
> modified 2016-09-13 16:21:53.285729
> flags
> pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> pool 5 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 285 flags hashpspool
> crash_replay_interval 45 stripe_width 0
> pool 6 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 282 flags hashpspool
> stripe_width 0
> max_osd 9
> osd.0 up   in  weight 1 up_from 271 up_thru 321 down_at 242
> last_clean_interval [169,175) 10.222.5.229:6800/3780 10.222.5.229:6801/3780
> 10.222.5.229:6802/3780 10.222.5.229:6803/3780 exists,up
> 1bf6cda4-bf1a-4f8a-836d-b6aec970d257
> osd.1 up   in  weight 1 up_from 223 up_thru 320 down_at 186
> last_clean_interval [20,183) 10.222.5.229:6804/2228 10.222.5.229:6805/2228
> 10.222.5.229:6806/2228 10.222.5.229:6807/2228 exists,up
> 3f3ad2fa-52b1-46fd-af6c-05178b814e25
> osd.2 up   in  weight 1 up_from 224 up_thru 320 down_at 186
> last_clean_interval [22,183) 10.222.5.229:6808/2259 10.222.5.229:6809/2259
> 10.222.5.229:6810/2259 10.222.5.229:6811/2259 exists,up
> 9199193e-9928-4c5d-8adc-2c32a4c8716b
> osd.3 up   in  weight 1 up_from 312 up_thru 313 down_at 303
> last_clean_interval [0,0) 10.222.5.156:6800/3592 10.222.5.156:6801/3592
> 10.222.5.156:6802/3592 10.222.5.156:6803/3592 exists,up
> 9b8f1cb0-51df-42aa-8be4-8f6347235cc2
> osd.4 up   in  weight 1 up_from 25 up_thru 322 down_at 0 last_clean_interval
> [0,0) 10.222.5.156:6804/25567 10.222.5.156:6805/25567
> 10.222.5.156:6806/25567 10.222.5.156:6807/25567 exists,up
> 0c719e5e-f8fc-46e0-926d-426bf6881ee0
> osd.5 up   in  weight 1 up_from 27 up_thru 310 down_at 0 last_clean_interval
> [0,0) 10.222.5.156:6808/25678 10.222.5.156:6809/25678
> 10.222.5.156:6810/25678 10.222.5.156:6811/25678 exists,up
> 729e0749-2ce3-426a-a7f1-a3cbfa88ba0b
> osd.6 up   in  weight 1 up_from 40 up_thru 313 down_at 0 last_clean_interval
> [0,0) 10.222.5.162:6807/15887 10.222.5.162:6808/15887
> 10.222.5.162:6809/15887 10.222.5.162:6810/15887 exists,up
> dea24f0f-4666-40af-98af-5ab8d42c37c6
> osd.7 up   in  weight 1 up_from 45 up_thru 313 down_at 0 last_clean_interval
> [0,0) 10.222.5.162:6811/16040 10.222.5.162:6812/16040
> 10.222.5.162:6813/16040 10.222.5.162:6814/16040 exists,up
> 0e238745-0091-4790-9b39-c9d36f4ebbee
> osd.8 up   in  weight 1 up_from 49 up_thru 314 down_at 0 last_clean_interval
> [0,0) 10.222.5.162:6815/16206 10.222.5.162:6816/16206
> 10.222.5.162:6817/16206 10.222.5.162:6818/16206 exists,up
> 59637f86-f283-4397-a63b-474976ee8047
> [root@localhost ~]#
> [root@localhost ~]# ceph osd tree
> ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 9.0 root default
> -5 3.0 host yxy02
>  1 1.0 osd.1   up  1.0  1.0
>  2 1.0 osd.2   up  1.0  1.0
>  0 1.0 osd.0   up  1.0  1

[ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Daznis
Hello,


I have encountered a strange I/O freeze while rebooting one OSD node
for maintenance purposes. It was one of the 3 nodes in the entire
cluster. Before this, rebooting or shutting down an entire node just
slowed Ceph down, but did not completely freeze it.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel blocked requests

2016-09-13 Thread Dennis Kramer (DBS)
I also have this problem. Is it perhaps possible to block clients
entirely if they are not using a specific version of Ceph?

BTW, I often stumble upon the cephfs problem:
"client failing to respond to capability release", which results in
blocked requests as well. But I'm not entirely sure if you run CephFS.

On 09/13/2016 02:44 AM, Christian Balzer wrote:
> 
> Hello,
> 
> On Mon, 12 Sep 2016 19:28:50 -0500 shiva rkreddy wrote:
> 
>> By saying "old clients"  did you mean, (a) Client VMs running old Operating
>> System (b)  Client VMs/Volumes that are in-use for a long time and across
>> ceph releases ? Was there any tuning done to fix it?
>>
> I'm pretty sure he means c) that is:
> Clients (hosts) running older versions of Ceph.
> 
> Which is a pretty common thing, both because the Ceph cluster operator may
> not be controlling all the clients, there are version dependencies from
> things like OpenStack, the Ceph version on the client was installed by
> using an OS distro package (usually well aged).
> 
> What I'd like to know from Wido is if he opened a tracker issue for that,
> since this kind of regression should not happen [tm].
> 
> Christian
>  
>> Thanks,
>>
>> On Mon, Sep 12, 2016 at 3:05 PM, Wido den Hollander  wrote:
>>
>>>
 Op 12 september 2016 om 18:47 schreef "WRIGHT, JON R (JON R)" <
>>> jonrodwri...@gmail.com>:


 Since upgrading to Jewel from Hammer, we're started to see HEALTH_WARN
 because of 'blocked requests > 32 sec'.   Seems to be related to writes.

 Has anyone else seen this?  Or can anyone suggest what the problem might
>>> be?

>>>
>>> Do you by any chance have old clients connecting? I saw this after a Jewel
>>> upgrade as well and it was because of very old clients still connecting to
>>> the cluster.
>>>
>>> Wido
>>>
 Thanks!
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>> On Mon, Sep 12, 2016 at 3:05 PM, Wido den Hollander  wrote:
>>
>>>
 Op 12 september 2016 om 18:47 schreef "WRIGHT, JON R (JON R)" <
>>> jonrodwri...@gmail.com>:


 Since upgrading to Jewel from Hammer, we're started to see HEALTH_WARN
 because of 'blocked requests > 32 sec'.   Seems to be related to writes.

 Has anyone else seen this?  Or can anyone suggest what the problem might
>>> be?

>>>
>>> Do you by any chance have old clients connecting? I saw this after a Jewel
>>> upgrade as well and it was because of very old clients still connecting to
>>> the cluster.
>>>
>>> Wido
>>>
 Thanks!
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread David
What froze? Kernel RBD? Librbd? CephFS?

Ceph version?

On Tue, Sep 13, 2016 at 11:24 AM, Daznis  wrote:

> Hello,
>
>
> I have encountered a strange I/O freeze while rebooting one OSD node
> for maintenance purpose. It was one of the 3 Nodes in the entire
> cluster. Before this rebooting or shutting down and entire node just
> slowed down the ceph, but not completely froze it.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Ilya Dryomov
On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov  wrote:
> Hello list,
>
>
> I have the following cluster:
>
> ceph status
> cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
>  health HEALTH_OK
>  monmap e2: 5 mons at 
> {alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0}
> election epoch 196, quorum 0,1,2,3,4 
> alxc10,alxc5,alxc6,alxc7,alxc11
>  mdsmap e797: 1/1/1 up {0=alxc11.=up:active}, 2 up:standby
>  osdmap e11243: 50 osds: 50 up, 50 in
>   pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
> 4323 GB used, 85071 GB / 89424 GB avail
> 8192 active+clean
>   client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s
>
> It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 
> and kernel 4.4.14
>
> I have multiple rbd devices which are used as the root for lxc-based 
> containers and have ext4. At some point I want
> to create a an rbd snapshot, for this the sequence of operations I do is thus:
>
> 1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted

fsfreeze?

>
> 2. rbd snap create "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}
>
> 3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted
>
> <= At this point normal container operation continues =>
>
> 4. Mount the newly created snapshot to a 2nd location as read-only and rsync 
> the files from it to a remote server.
>
> However as I start rsyncing stuff to the remote server then certain files in 
> the snapshot are reported as corrupted.

Can you share some dmesg snippets?  Is there a pattern - the same
file/set of files, etc?

>
> freezefs implies filesystem syncing I also tested with manually doing 
> sync/syncfs on the fs which is being snapshot. Before
> and after the freezefs and the corruption is still present. So it's unlikely 
> there are dirty buffers in the page cache.
> I'm using the kernel rbd driver for the clients. The theory currently is 
> there are some caches which are not being flushed,
> other than the linux page cache. Reading the doc implies that only librbd is 
> using separate caching but I'm not using librbd.

What happens if you run fsck -n on the snapshot (ro mapping)?

What happens if you run clone from the snapshot and run fsck (rw
mapping)?

What happens if you mount the clone without running fsck and run rsync?

Can you try taking more than one snapshot and then compare them?
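
Concretely, the first two checks could look something like this (names are
illustrative only; rbd map prints the mapped device, and rbd clone needs a
format 2 image with a protected snapshot):

    # read-only fsck of the snapshot, without mounting it
    DEV=$(rbd map mypool/container-root@backup-snap --read-only)
    fsck.ext4 -f -n "$DEV"      # -n: report problems, never modify
    rbd unmap "$DEV"

    # clone the snapshot and fsck the clone read-write
    rbd snap protect mypool/container-root@backup-snap
    rbd clone mypool/container-root@backup-snap mypool/container-root-check
    DEV=$(rbd map mypool/container-root-check)
    fsck.ext4 -f "$DEV"
    rbd unmap "$DEV"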

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread M Ranga Swami Reddy
Please check if any OSD is nearfull (ERR). Can you please share the ceph -s
output?

Thanks
Swami

On Tue, Sep 13, 2016 at 3:54 PM, Daznis  wrote:

> Hello,
>
>
> I have encountered a strange I/O freeze while rebooting one OSD node
> for maintenance purpose. It was one of the 3 Nodes in the entire
> cluster. Before this rebooting or shutting down and entire node just
> slowed down the ceph, but not completely froze it.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Daznis
No, no errors about that. I had set noout before it happened, but it
still started recovery. I added
nobackfill,norebalance,norecover,noscrub,nodeep-scrub once I noticed
it started doing crazy stuff. So recovery I/O stopped, but the cluster
can't read any data; only writes to the cache layer go through.

cluster cdca2074-4c91-4047-a607-faebcbc1ee17
 health HEALTH_WARN
2225 pgs degraded
18 pgs down
18 pgs peering
89 pgs stale
2225 pgs stuck degraded
18 pgs stuck inactive
89 pgs stuck stale
2257 pgs stuck unclean
2225 pgs stuck undersized
2225 pgs undersized
recovery 4180820/11837906 objects degraded (35.317%)
recovery 24016/11837906 objects misplaced (0.203%)
12/39 in osds are down
noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
flag(s) set
 monmap e9: 7 mons at {}
election epoch 170, quorum 0,1,2,3,4,5,6
 osdmap e40290: 40 osds: 27 up, 39 in; 14 remapped pgs
flags noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
  pgmap v39326300: 4096 pgs, 4 pools, 21455 GB data, 5780 kobjects
42407 GB used, 75772 GB / 115 TB avail
4180820/11837906 objects degraded (35.317%)
24016/11837906 objects misplaced (0.203%)
2136 active+undersized+degraded
1837 active+clean
  89 stale+active+undersized+degraded
  18 down+peering
  14 active+remapped
   2 active+clean+scrubbing+deep
  client io 0 B/s rd, 9509 kB/s wr, 3469 op/s

On Tue, Sep 13, 2016 at 1:34 PM, M Ranga Swami Reddy
 wrote:
> Please check if any osd is nearfull ERR. Can you please share the ceph -s
> o/p?
>
> Thanks
> Swami
>
> On Tue, Sep 13, 2016 at 3:54 PM, Daznis  wrote:
>>
>> Hello,
>>
>>
>> I have encountered a strange I/O freeze while rebooting one OSD node
>> for maintenance purpose. It was one of the 3 Nodes in the entire
>> cluster. Before this rebooting or shutting down and entire node just
>> slowed down the ceph, but not completely froze it.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Sean Redmond
Hi,

The host that was taken down has 12 disks in it?

Have a look at the down PGs ('18 pgs down') - I suspect this is what is
causing the I/O freeze.

Is your crush map set up correctly to split data over different hosts?

Thanks

On Tue, Sep 13, 2016 at 11:45 AM, Daznis  wrote:

> No, no errors about that. I have set noout before it happened, but it
> still started recovery. I have added
> nobackfill,norebalance,norecover,noscrub,nodeep-scrub once i noticed
> it started doing crazy stuff. So recovery I/O stopped but the cluster
> can't read any info. Only writes to cache layer.
>
> cluster cdca2074-4c91-4047-a607-faebcbc1ee17
>  health HEALTH_WARN
> 2225 pgs degraded
> 18 pgs down
> 18 pgs peering
> 89 pgs stale
> 2225 pgs stuck degraded
> 18 pgs stuck inactive
> 89 pgs stuck stale
> 2257 pgs stuck unclean
> 2225 pgs stuck undersized
> 2225 pgs undersized
> recovery 4180820/11837906 objects degraded (35.317%)
> recovery 24016/11837906 objects misplaced (0.203%)
> 12/39 in osds are down
> noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
> flag(s) set
>  monmap e9: 7 mons at {}
> election epoch 170, quorum 0,1,2,3,4,5,6
>  osdmap e40290: 40 osds: 27 up, 39 in; 14 remapped pgs
> flags noout,nobackfill,norebalance,
> norecover,noscrub,nodeep-scrub
>   pgmap v39326300: 4096 pgs, 4 pools, 21455 GB data, 5780 kobjects
> 42407 GB used, 75772 GB / 115 TB avail
> 4180820/11837906 objects degraded (35.317%)
> 24016/11837906 objects misplaced (0.203%)
> 2136 active+undersized+degraded
> 1837 active+clean
>   89 stale+active+undersized+degraded
>   18 down+peering
>   14 active+remapped
>2 active+clean+scrubbing+deep
>   client io 0 B/s rd, 9509 kB/s wr, 3469 op/s
>
> On Tue, Sep 13, 2016 at 1:34 PM, M Ranga Swami Reddy
>  wrote:
> > Please check if any osd is nearfull ERR. Can you please share the ceph -s
> > o/p?
> >
> > Thanks
> > Swami
> >
> > On Tue, Sep 13, 2016 at 3:54 PM, Daznis  wrote:
> >>
> >> Hello,
> >>
> >>
> >> I have encountered a strange I/O freeze while rebooting one OSD node
> >> for maintenance purpose. It was one of the 3 Nodes in the entire
> >> cluster. Before this rebooting or shutting down and entire node just
> >> slowed down the ceph, but not completely froze it.
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Daznis
Yes, that one has +2 OSDs on it.
root default {
id -1   # do not change unnecessarily
# weight 116.480
alg straw
hash 0  # rjenkins1
item OSD-1 weight 36.400
item OSD-2 weight 36.400
item OSD-3 weight 43.680
}

rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

On Tue, Sep 13, 2016 at 1:51 PM, Sean Redmond  wrote:
> Hi,
>
> The host that is taken down has 12 disks in it?
>
> Have a look at the down PG's '18 pgs down' - I suspect this will be what is
> causing the I/O freeze.
>
> Is your cursh map setup correctly to split data over different hosts?
>
> Thanks
>
> On Tue, Sep 13, 2016 at 11:45 AM, Daznis  wrote:
>>
>> No, no errors about that. I have set noout before it happened, but it
>> still started recovery. I have added
>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub once i noticed
>> it started doing crazy stuff. So recovery I/O stopped but the cluster
>> can't read any info. Only writes to cache layer.
>>
>> cluster cdca2074-4c91-4047-a607-faebcbc1ee17
>>  health HEALTH_WARN
>> 2225 pgs degraded
>> 18 pgs down
>> 18 pgs peering
>> 89 pgs stale
>> 2225 pgs stuck degraded
>> 18 pgs stuck inactive
>> 89 pgs stuck stale
>> 2257 pgs stuck unclean
>> 2225 pgs stuck undersized
>> 2225 pgs undersized
>> recovery 4180820/11837906 objects degraded (35.317%)
>> recovery 24016/11837906 objects misplaced (0.203%)
>> 12/39 in osds are down
>> noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
>> flag(s) set
>>  monmap e9: 7 mons at {}
>> election epoch 170, quorum 0,1,2,3,4,5,6
>>  osdmap e40290: 40 osds: 27 up, 39 in; 14 remapped pgs
>> flags
>> noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
>>   pgmap v39326300: 4096 pgs, 4 pools, 21455 GB data, 5780 kobjects
>> 42407 GB used, 75772 GB / 115 TB avail
>> 4180820/11837906 objects degraded (35.317%)
>> 24016/11837906 objects misplaced (0.203%)
>> 2136 active+undersized+degraded
>> 1837 active+clean
>>   89 stale+active+undersized+degraded
>>   18 down+peering
>>   14 active+remapped
>>2 active+clean+scrubbing+deep
>>   client io 0 B/s rd, 9509 kB/s wr, 3469 op/s
>>
>> On Tue, Sep 13, 2016 at 1:34 PM, M Ranga Swami Reddy
>>  wrote:
>> > Please check if any osd is nearfull ERR. Can you please share the ceph
>> > -s
>> > o/p?
>> >
>> > Thanks
>> > Swami
>> >
>> > On Tue, Sep 13, 2016 at 3:54 PM, Daznis  wrote:
>> >>
>> >> Hello,
>> >>
>> >>
>> >> I have encountered a strange I/O freeze while rebooting one OSD node
>> >> for maintenance purpose. It was one of the 3 Nodes in the entire
>> >> cluster. Before this rebooting or shutting down and entire node just
>> >> slowed down the ceph, but not completely froze it.
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I/O freeze while a single node is down.

2016-09-13 Thread Goncalo Borges
Hi Daznis...

Something is not quite right. You have pools with 2 replicas (right?). The fact
that you have 18 down PGs says that both OSDs acting on those PGs have
problems.

You should try to understand which PGs are down and which OSDs are acting on
them ('ceph pg dump_stuck' or 'ceph health detail' should give you that info).
Maybe from there you can find which other OSD (or OSDs) is problematic. Then try
'ceph pg  query' on them and see if you get any further info.
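
For example (illustrative commands; the exact output format differs a little
between releases):

    # which PGs are down/inactive, and which OSDs are (or were) acting on them
    ceph health detail | grep -E 'down|peering'
    ceph pg dump_stuck inactive

    # then dig into one of the reported PGs
    ceph pg <pgid> query | less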

Cheers
Goncalo  

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Daznis 
[daz...@gmail.com]
Sent: 13 September 2016 21:10
To: Sean Redmond
Cc: ceph-users
Subject: Re: [ceph-users] I/O freeze while a single node is down.

Yes that one has +2 OSD's on it.
root default {
id -1   # do not change unnecessarily
# weight 116.480
alg straw
hash 0  # rjenkins1
item OSD-1 weight 36.400
item OSD-2 weight 36.400
item OSD-3 weight 43.680
}

rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
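
(For completeness, one way to sanity-check that this rule really spreads
replicas across hosts; rule id 0 and 2 replicas are assumptions based on the
dump above:)

ceph osd getcrushmap -o /tmp/crushmap
crushtool -i /tmp/crushmap --test --rule 0 --num-rep 2 --show-mappings | head
crushtool -i /tmp/crushmap --test --rule 0 --num-rep 2 --show-bad-mappings   # should print nothing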

On Tue, Sep 13, 2016 at 1:51 PM, Sean Redmond  wrote:
> Hi,
>
> The host that is taken down has 12 disks in it?
>
> Have a look at the down PG's '18 pgs down' - I suspect this will be what is
> causing the I/O freeze.
>
> Is your crush map set up correctly to split data over different hosts?
>
> Thanks
>
> On Tue, Sep 13, 2016 at 11:45 AM, Daznis  wrote:
>>
>> No, no errors about that. I have set noout before it happened, but it
>> still started recovery. I have added
>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub once i noticed
>> it started doing crazy stuff. So recovery I/O stopped but the cluster
>> can't read any info. Only writes to cache layer.
>>
>> cluster cdca2074-4c91-4047-a607-faebcbc1ee17
>>  health HEALTH_WARN
>> 2225 pgs degraded
>> 18 pgs down
>> 18 pgs peering
>> 89 pgs stale
>> 2225 pgs stuck degraded
>> 18 pgs stuck inactive
>> 89 pgs stuck stale
>> 2257 pgs stuck unclean
>> 2225 pgs stuck undersized
>> 2225 pgs undersized
>> recovery 4180820/11837906 objects degraded (35.317%)
>> recovery 24016/11837906 objects misplaced (0.203%)
>> 12/39 in osds are down
>> noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
>> flag(s) set
>>  monmap e9: 7 mons at {}
>> election epoch 170, quorum 0,1,2,3,4,5,6
>>  osdmap e40290: 40 osds: 27 up, 39 in; 14 remapped pgs
>> flags
>> noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
>>   pgmap v39326300: 4096 pgs, 4 pools, 21455 GB data, 5780 kobjects
>> 42407 GB used, 75772 GB / 115 TB avail
>> 4180820/11837906 objects degraded (35.317%)
>> 24016/11837906 objects misplaced (0.203%)
>> 2136 active+undersized+degraded
>> 1837 active+clean
>>   89 stale+active+undersized+degraded
>>   18 down+peering
>>   14 active+remapped
>>2 active+clean+scrubbing+deep
>>   client io 0 B/s rd, 9509 kB/s wr, 3469 op/s
>>
>> On Tue, Sep 13, 2016 at 1:34 PM, M Ranga Swami Reddy
>>  wrote:
>> > Please check if any osd is nearfull ERR. Can you please share the ceph
>> > -s
>> > o/p?
>> >
>> > Thanks
>> > Swami
>> >
>> > On Tue, Sep 13, 2016 at 3:54 PM, Daznis  wrote:
>> >>
>> >> Hello,
>> >>
>> >>
>> >> I have encountered a strange I/O freeze while rebooting one OSD node
>> >> for maintenance purpose. It was one of the 3 Nodes in the entire
>> >> cluster. Before this rebooting or shutting down and entire node just
>> >> slowed down the ceph, but not completely froze it.
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Network testing tool.

2016-09-13 Thread Owen Synge
Dear all,

Often issues arise with badly configured network switches, VLANs, and the
like. Not knowing which nodes each node can actually reach is a major
deployment failure and can be difficult to diagnose.

The brief looks like this:

Description:

  * Diagnose network issues quickly for ceph.
  * Identify network issues before deploying ceph.

A typical deployment will have 2 networks and potentially 3.

  * External network for client access.
  * Internal network for data replication. (Strongly recommended)
  * Administration network. (Optional)

Typically we will have salt available on all nodes, but it would be the
same for ansible or any other config management solution.

  * This will make injection of IP addresses and hosts trivial.

Before I go any further with developing a solution, does anyone know of
a pre-made solution that would save me from writing much code, or ideally
any code?

So far I have only found tools for testing connectivity between 1 point
and another, not for testing 1:N or N:N.

If I must write such a tool myself, I imagine the roadmap would start with
just ping, and then expand from there with commands such as iperf, port
range tests, etc. (a rough sketch of the ping stage follows below).
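
A first cut of that ping stage could be as simple as the sketch below (pdsh
over ssh, plus nodes.txt / ips.txt inventory files, are assumptions):

#!/bin/bash
# N:N reachability sketch: from every node in nodes.txt, ping every address in ips.txt.
while read -r ip; do
  pdsh -R ssh -w "^nodes.txt" "ping -c1 -W1 $ip >/dev/null && echo OK $ip || echo FAIL $ip"
done < ips.txt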

All suggestions are welcome, particularly tools that save time. A
dependency on puppet has already ruled out one solution.

Best regards

Owen Synge
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] help on keystone v3 ceph.conf in Jewel

2016-09-13 Thread Robert Duncan
Thanks Jean-Charles,

It was the ceph client packages on the cinder node, as you suspected. I now have 
a working rbd driver with cinder. I am left with only one other problem since 
the upgrade, which has me stumped:
Apache can't seem to proxy to the rados gateway service.


  ServerName node-10.domain.local
  DocumentRoot /var/www/radosgw

  RewriteEngine On
  RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

  SetEnv proxy-nokeepalive 1
  ProxyPass / fcgi://127.0.0.1:9000/

  ## Logging
  ErrorLog "/var/log/apache2/radosgw_error.log"
  CustomLog "/var/log/apache2/radosgw_access.log" forwarded

  AllowEncodedSlashes On
  ServerSignature Off



The radosgw service is running and the client works

root@node-10:/etc/apache2/sites-enabled# service radosgw status 
/usr/bin/radosgw is running.
root@node-10:/etc/apache2/sites-enabled#  rados -p .rgw put myobject test.txt 
root@node-10:/etc/apache2/sites-enabled#

but the virtual host can't make a connection to fcgi

[client 193.1.202.3:42416] AH01079: failed to make connection to backend: 127.0.0.1
[Mon Sep 12 13:11:46.591957 2016] [proxy:error] [pid 8695:tid 139780608206592] AH00940: FCGI: disabled connection for (127.0.0.1)
[Mon Sep 12 13:11:48.626932 2016] [proxy:error] [pid 8700:tid 139780608206592] AH00940: FCGI: disabled connection for (127.0.0.1)
[Mon Sep 12 13:11:50.572243 2016] [proxy:error] [pid 8704:tid 139780616599296] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (127.0.0.1) failed
[Mon Sep 12 13:11:50.572300 2016] [proxy:error] [pid 8704:tid 139780616599296] AH00959: ap_proxy_connect_backend disabling worker for (127.0.0.1) for 60s
[Mon Sep 12 13:11:50.572312 2016] [proxy_fcgi:error] [pid 8704:tid 139780616599296] [client 192.168.10.2:42484] AH01079: failed to make connection to backend: 127.0.0.1

Apache has loaded the module
apache2ctl -M | grep fast
AH00316: WARNING: MaxRequestWorkers of 2406 is not an integer multiple of  
ThreadsPerChild of 25, decreasing to nearest multiple 2400,  for a maximum of 
96 servers.
 fastcgi_module (shared)

and radosgw seems to be listening on the port:

netstat -tulpn | grep 9000
tcp0  0 127.0.0.1:9000  0.0.0.0:*   LISTEN  
11045/radosgw


can telnet to the port:
root@node-10:/etc/apache2/sites-enabled# telnet 127.0.0.1 9000 Trying 
127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.

As per the ceph.conf

rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
rgw_socket_path = /tmp/radosgw.sock

I have noticed two things:
1. there is no problem with the rados client interacting and creating objects, and 
2. the S3 API seems to be up when I visit the rgw service in a browser:

<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner>
    <ID>anonymous</ID>
    <DisplayName></DisplayName>
  </Owner>
  <Buckets></Buckets>
</ListAllMyBucketsResult>

Thanks for looking!
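
PS: one alternative worth trying on Jewel is to skip fastcgi entirely and have
Apache reverse-proxy to the civetweb frontend; this is only a sketch and the
port is arbitrary:

# ceph.conf on the radosgw host
[client.radosgw.gateway]
rgw_frontends = civetweb port=7480

# Apache vhost: plain HTTP reverse proxy instead of the fcgi ProxyPass
ProxyPass / http://127.0.0.1:7480/
ProxyPassReverse / http://127.0.0.1:7480/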


Rob.


-Original Message-
From: LOPEZ Jean-Charles [mailto:jelo...@redhat.com] 
Sent: Friday, September 9, 2016 6:10 PM
To: Robert Duncan 
Cc: LOPEZ Jean-Charles ; ceph-users 
Subject: Re: [ceph-users] help on keystone v3 ceph.conf in Jewel

Hi,

from the log file it looks like librbd.so doesn’t contain a specific entry 
point that needs to be called. See my comment inline.

Have you upgraded the ceph client packages on the cinder node and on the nova 
compute node? Or you just did the upgrade on the ceph nodes?

JC

> On Sep 9, 2016, at 09:37, Robert Duncan  wrote:
> 
> Hi,
> 
> I have deployed the Mirantis distribution of OpenStack Mitaka which comes 
> with Ceph Hammer, since I want to use keystone v3 with radosgw I added the 
> Ubuntu cloud archive for Mitaka on Trusty.
> And then followed the upgrade instructions (had to remove the mos 
> sources from sources.list)
> 
> Anyway the upgrade looks to have gone okay and I am now on jewel, but rbd and 
> rgw have stopped working in the cloud - is this down to my ceph.conf?
> 
> There are no clues on keystone logs
> 
> 
> 
> [global]
> fsid = 5d587e15-5904-4fd2-84db-b4038c18e327
> mon_initial_members = node-10
> mon_host = 172.25.80.4
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> log_to_syslog_level = info
> log_to_syslog = True
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 64
> public_network = 172.25.80.0/24
> log_to_syslog_facility = LOG_LOCAL0
> osd_journal_size = 2048
> auth_supported = cephx
> osd_pool_default_pgp_num = 64
> osd_mkfs_type = xfs
> cluster_network = 172.25.80.0/24
> osd_recovery_max_active = 1
> osd_max_backfills = 1
> setuser match path = /var/lib/ceph/$type/$cluster-$id
> 
> [client]
> rbd_cache_writethrough_until_flush = True
> rbd_cache = True
> 
> [client.radosgw.gateway]
> rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator 
> keyring = /etc/ceph/keyring.radosgw.gateway rgw_frontends = fastcgi 
> socket_port=9000 socket_hos

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Nikolay Borisov


On 09/13/2016 01:33 PM, Ilya Dryomov wrote:
> On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov  wrote:
>> Hello list,
>>
>>
>> I have the following cluster:
>>
>> ceph status
>> cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
>>  health HEALTH_OK
>>  monmap e2: 5 mons at 
>> {alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0}
>> election epoch 196, quorum 0,1,2,3,4 
>> alxc10,alxc5,alxc6,alxc7,alxc11
>>  mdsmap e797: 1/1/1 up {0=alxc11.=up:active}, 2 up:standby
>>  osdmap e11243: 50 osds: 50 up, 50 in
>>   pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
>> 4323 GB used, 85071 GB / 89424 GB avail
>> 8192 active+clean
>>   client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s
>>
>> It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 
>> and kernel 4.4.14
>>
>> I have multiple rbd devices which are used as the root for lxc-based 
>> containers and have ext4. At some point I want
>> to create an rbd snapshot, and for this the sequence of operations I do is 
>> as follows:
>>
>> 1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted
> 
> fsfreeze?

Yes, indeed, my bad. 

> 
>>
>> 2. rbd snap create "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}
>>
>> 3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted
>>
>> <= At this point normal container operation continues =>
>>
>> 4. Mount the newly created snapshot to a 2nd location as read-only and rsync 
>> the files from it to a remote server.
>>
>> However, as I start rsyncing stuff to the remote server, certain files in 
>> the snapshot are reported as corrupted.
> 
> Can you share some dmesg snippets?  Is there a pattern - the same
> file/set of files, etc?

[1718059.910038] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718060.044540] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #52269: 
comm rsync: deleted inode referenced: 46393
[1718060.044978] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718060.045246] rbd: rbd143: write 1000 at 0 result -30
[1718060.045249] blk_update_request: I/O error, dev rbd143, sector 0
[1718060.045487] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718071.404057] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#385038: comm rsync: deleted inode referenced: 46581
[1718071.404466] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.404739] rbd: rbd143: write 1000 at 0 result -30
[1718071.404742] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.404999] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718071.419172] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#769039: comm rsync: deleted inode referenced: 410848
[1718071.419575] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.419844] rbd: rbd143: write 1000 at 0 result -30
[1718071.419847] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.420081] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718071.420758] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#769039: comm rsync: deleted inode referenced: 410848
[1718071.421196] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.421441] rbd: rbd143: write 1000 at 0 result -30
[1718071.421443] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.421671] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718071.543020] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #52269: 
comm rsync: deleted inode referenced: 46393
[1718071.543422] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.543680] rbd: rbd143: write 1000 at 0 result -30
[1718071.543682] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.543945] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718083.388635] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#385038: comm rsync: deleted inode referenced: 46581
[1718083.389060] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718083.389324] rbd: rbd143: write 1000 at 0 result -30
[1718083.389327] blk_update_request: I/O error, dev rbd143, sector 0
[1718083.389561] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718083.403910] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#769039: comm rsync: deleted inode referenced: 410848
[1718083.404319] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718083.404581] rbd: rbd143: write 1000 at 0 result -30
[1718083.404583] blk_update_request: I/O error, dev rbd143, sector 0
[1718083.404816] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718083.405484] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#769039: comm rsync: deleted inode referenced: 410848
[1718083.405893] EXT4-fs (rbd143): previous I/O error to superblock detect

Re: [ceph-users] Network testing tool.

2016-09-13 Thread Mark Nelson

On 09/13/2016 06:46 AM, Owen Synge wrote:

Dear all,

Often issues arise with badly configured network switches, vlans, and
such like. Knowing each node routes to is a major deployment fail and
can be difficult to diagnose.

The brief looks like this:

Description:

  * Diagnose network issues quickly for ceph.
  * Identify network issues before deploying ceph.

A typical deployment will have 2 networks and potentially 3.

  * External network for client access.
  * Internal network for data replication. (Strongly recommended)
  * Administration network. (Optional)

Typically we will have salt available on all nodes, but it would be the
same for ansible or an other config management solution.

  * This will make injection of IP addresses and hosts trivial.

Before I go any further with developing a solution, does anyone know of
a pre made solution, that will avoid me writing much code, or ideally
any code.

So far I have only found tools for testing connectivity between 1 point
and another, not for testing 1:N or N:N.


We simulated 1:N and N:N throughput tests by wrapping iperf in simple 
shell scripts and simply executing it in parallel on every host using 
pdsh.  These are really good tests to perform as it can show problems in 
the switch itself.  In one instance, a customer's bonding setup was 
causing extreme throughput degradation, but only on certain routes, but 
it only showed up in 1:N and N:N situations (primarily N:N).  These were 
just throw away scripts, but the idea works really well in practice so 
something a bit more substantial might work very well.  Way down on the 
todo list is to add something like this to cbt for a pre-benchmark 
network test.


1:N
---

Script to launch iperf servers:
#!/bin/bash
for i in 8 9 10 11 12 13
do
  val=$((62+$i))
  pdsh -R ssh -w osd[$i] iperf -s -B 192.168.1.$val &
done

Script to launch iperf clients:
#!/bin/bash
for i in 0 1 2 3 4 5 6 7
do
  for val in 70 71 72 73 74 75
  do
pdsh -R ssh -w client[$i] iperf -c 192.168.1.$val -f m -t 60 -P 1 > 
/tmp/iperf_client${i}_to_${val}.out &

  done
done


N:N
---

Script to launch iperf servers:
#!/bin/bash
for i in 8 9 10 11 12 13
do
  val=$((62+$i))
  pdsh -R ssh -w osd[$i] iperf -s -B 192.168.1.$val &
done

Script to launch iperf clients:
#!/bin/bash
for i in 8 9 10 11 12 13
do
  for val in 70 71 72 73 74 75
  do
pdsh -R ssh -w osd[$i] iperf -c 192.168.1.$val -P 1 -f m -t 60 -P 1 
> /tmp/iperf_${i}_to_${val}.out  &

  done
done
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW performance degradation on the 18 millions objects stored.

2016-09-13 Thread Stas Starikevich
Hi All,

Asking your assistance with the RadosGW performance degradation once 18M 
objects are stored (http://pasteboard.co/g781YI3J.png).
Upload rate drops from 620 uploads/s to 180-190 uploads/s.

I made a list of tests and see that upload performance degrades 3-4x when 
the number of objects reaches 18M.
The number of OSDs doesn't matter; the problem reproduces with 6/18/56 OSDs.
Increasing the number of index shards doesn't help. Originally I faced the 
problem when I had 8 shards per bucket; now it's 256, but it's the same picture.
The number of PGs on default.rgw.buckets.data also makes no difference, but the 
latest test with 2048 PGs (+nobarrier, +leveldb_compression = false) shows a 
bit higher upload rate.
The problem reproduces even with an erasure coding pool (tested 4-2). Erasure coding 
gives much higher inode usage (my first suspicion was a lack of RAM to cache 
inodes), but it doesn't matter - it drops at 18M too.

Moved the meta/index pools to SSD only. Increased the number of RGW threads to 
8192. That raised uploads/s from 250 to 600 (and no more bad gateway errors), but 
didn't help with the drop at the 18M objects threshold.
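
(For reference, one way to pin the index/metadata pools to SSD-only OSDs is a
dedicated crush ruleset; a rough sketch, assuming the crush map already has an
'ssd' root, and using the Jewel default bucket index pool name:)

ceph osd crush rule create-simple ssd-only ssd host
ceph osd crush rule dump ssd-only                        # note the rule_id it prints
ceph osd pool set default.rgw.buckets.index crush_ruleset <rule_id>
# repeat the pool set for the other metadata pools as needed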

Extra tunings (logbsize=256k, delaylog, allocsize=4M, nobarrier, 
leveldb_cache_size, leveldb_write_buffer_size, 
osd_pg_epoch_persisted_max_stale, osd_map_cache_size) were applied in the few 
latest tests. They didn't help much, but the upload rate became more stable with no drops.

From the HDD's stats I see that on the 18M threshold number of 'read' requests 
increases from 2-3 to

Any ideas?

Ceph cluster has 9 nodes:

- ceph-mon0{1..3} - 12G RAM, SSD
- ceph-node0{1..6} - 24G RAM, 9 OSD's with SSD journals

Mon servers have ceph-mons, haproxies (https-ecc) + civetweb services.

# ceph -v
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

# ceph -s
cluster 9cb0840a-bd73-499a-ae09-eaa75a80bddb
 health HEALTH_OK
 monmap e1: 3 mons at 
{ceph-mon01=10.10.10.21:6789/0,ceph-mon02=10.10.10.22:6789/0,ceph-mon03=10.10.10.23:6789/0}
election epoch 8, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
 osdmap e1476: 62 osds: 62 up, 62 in
flags sortbitwise
  pgmap v68348: 2752 pgs, 12 pools, 2437 GB data, 31208 kobjects
7713 GB used, 146 TB / 153 TB avail
2752 active+clean
  client io 1043 kB/s rd, 48307 kB/s wr, 1043 op/s rd, 8153 op/s wr

ceph osd tree: http://pastebin.com/scNuW0LN 
ceph df: http://pastebin.com/ZyQByHG4 
ceph.conf: http://pastebin.com/9AxVr1gm 
ceph osd dump: http://pastebin.com/4mesKGD0 

Screenshots from the grafana page:

Number of objects at the degradation moment: http://pasteboard.co/2B5OZ03d0.png 

IOPs drop: http://pasteboard.co/2B6vDyKEn.png 

Disk util raised to 80%: http://pasteboard.co/2B6YREzoC.png 

Disk operations: http://pasteboard.co/2B7uI5PWB.png 

Disk operations - reads: http://pasteboard.co/2B8U8E33d.png 


Thanks.

--
Kind regards,
Stas Starikevich, CISSP



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Ilya Dryomov
On Tue, Sep 13, 2016 at 1:59 PM, Nikolay Borisov  wrote:
>
>
> On 09/13/2016 01:33 PM, Ilya Dryomov wrote:
>> On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov  wrote:
>>> Hello list,
>>>
>>>
>>> I have the following cluster:
>>>
>>> ceph status
>>> cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
>>>  health HEALTH_OK
>>>  monmap e2: 5 mons at 
>>> {alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0}
>>> election epoch 196, quorum 0,1,2,3,4 
>>> alxc10,alxc5,alxc6,alxc7,alxc11
>>>  mdsmap e797: 1/1/1 up {0=alxc11.=up:active}, 2 up:standby
>>>  osdmap e11243: 50 osds: 50 up, 50 in
>>>   pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
>>> 4323 GB used, 85071 GB / 89424 GB avail
>>> 8192 active+clean
>>>   client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s
>>>
>>> It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 
>>> and kernel 4.4.14
>>>
>>> I have multiple rbd devices which are used as the root for lxc-based 
>>> containers and have ext4. At some point I want
>>> to create a an rbd snapshot, for this the sequence of operations I do is 
>>> thus:
>>>
>>> 1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted
>>
>> fsfreeze?
>
> Yes, indeed, my bad.
>
>>
>>>
>>> 2. rbd snap create 
>>> "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}
>>>
>>> 3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted
>>>
>>> <= At this point normal container operation continues =>
>>>
>>> 4. Mount the newly created snapshot to a 2nd location as read-only and 
>>> rsync the files from it to a remote server.
>>>
>>> However as I start rsyncing stuff to the remote server then certain files 
>>> in the snapshot are reported as corrupted.
>>
>> Can you share some dmesg snippets?  Is there a pattern - the same
>> file/set of files, etc?
>
> [1718059.910038] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718060.044540] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #52269: comm rsync: deleted inode referenced: 46393
> [1718060.044978] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718060.045246] rbd: rbd143: write 1000 at 0 result -30
> [1718060.045249] blk_update_request: I/O error, dev rbd143, sector 0
> [1718060.045487] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718071.404057] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #385038: comm rsync: deleted inode referenced: 46581
> [1718071.404466] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.404739] rbd: rbd143: write 1000 at 0 result -30
> [1718071.404742] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.404999] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718071.419172] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #769039: comm rsync: deleted inode referenced: 410848
> [1718071.419575] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.419844] rbd: rbd143: write 1000 at 0 result -30
> [1718071.419847] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.420081] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718071.420758] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #769039: comm rsync: deleted inode referenced: 410848
> [1718071.421196] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.421441] rbd: rbd143: write 1000 at 0 result -30
> [1718071.421443] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.421671] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718071.543020] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #52269: comm rsync: deleted inode referenced: 46393
> [1718071.543422] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.543680] rbd: rbd143: write 1000 at 0 result -30
> [1718071.543682] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.543945] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718083.388635] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #385038: comm rsync: deleted inode referenced: 46581
> [1718083.389060] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718083.389324] rbd: rbd143: write 1000 at 0 result -30
> [1718083.389327] blk_update_request: I/O error, dev rbd143, sector 0
> [1718083.389561] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718083.403910] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #769039: comm rsync: deleted inode referenced: 410848
> [1718083.404319] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718083.404581] rbd: rbd143: write 1000 at 0 result -30
> [1718083.404583] blk_update_request: I/O error, dev rbd143, sector 0
> [1718083.404816] Buffer I/O error on dev rbd143, logical block 0, lost sync

Re: [ceph-users] jewel blocked requests

2016-09-13 Thread WRIGHT, JON R (JON R)
Yes, I do have old clients running.  The clients are all vms.  Is it 
typical that vm clients have to be rebuilt after a ceph upgrade?


Thanks,

Jon


On 9/12/2016 4:05 PM, Wido den Hollander wrote:

On 12 September 2016 at 18:47, "WRIGHT, JON R (JON R)" wrote:


Since upgrading to Jewel from Hammer, we've started to see HEALTH_WARN
because of 'blocked requests > 32 sec'.   Seems to be related to writes.

Has anyone else seen this?  Or can anyone suggest what the problem might be?


Do you by any chance have old clients connecting? I saw this after a Jewel 
upgrade as well and it was because of very old clients still connecting to the 
cluster.

Wido


Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel blocked requests

2016-09-13 Thread WRIGHT, JON R (JON R)
Yes, vms and volumes existed across the ceph releases.  But the vms were 
rebooted and the volumes reattached following the upgrade.  The vms were 
all Ubuntu 14.04  before and after the upgrade.


Thanks,
Jon

On 9/12/2016 8:28 PM, shiva rkreddy wrote:
By saying "old clients"  did you mean, (a) Client VMs running old 
Operating System (b)  Client VMs/Volumes that are in-use for a long 
time and across ceph releases ? Was there any tuning done to fix it?


Thanks,

On Mon, Sep 12, 2016 at 3:05 PM, Wido den Hollander > wrote:



> On 12 September 2016 at 18:47, "WRIGHT, JON R (JON R)" <jonrodwri...@gmail.com> wrote:
>
>
> Since upgrading to Jewel from Hammer, we've started to see
HEALTH_WARN
> because of 'blocked requests > 32 sec'.   Seems to be related to
writes.
>
> Has anyone else seen this?  Or can anyone suggest what the
problem might be?
>

Do you by any chance have old clients connecting? I saw this after
a Jewel upgrade as well and it was because of very old clients
still connecting to the cluster.

Wido

> Thanks!
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com









___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Nikolay Borisov


On 09/13/2016 04:30 PM, Ilya Dryomov wrote:
[SNIP]
> 
> Hmm, it could be about whether it is able to do journal replay on
> mount.  When you mount a snapshot, you get a read-only block device;
> when you mount a clone image, you get a read-write block device.
> 
> Let's try this again, suppose image is foo and snapshot is snap:
> 
> # fsfreeze -f /mnt
> 
> # rbd snap create foo@snap
> # rbd map foo@snap
> /dev/rbd0
> # file -s /dev/rbd0
> # fsck.ext4 -n /dev/rbd0
> # mount /dev/rbd0 /foo
> # umount /foo
> 
> # file -s /dev/rbd0
> # fsck.ext4 -n /dev/rbd0
> 
> # rbd clone foo@snap bar
> $ rbd map bar
> /dev/rbd1
> # file -s /dev/rbd1
> # fsck.ext4 -n /dev/rbd1
> # mount /dev/rbd1 /bar
> # umount /bar
> 
> # file -s /dev/rbd1
> # fsck.ext4 -n /dev/rbd1
> 
> Could you please provide the output for the above?

Here you go : http://paste.ubuntu.com/23173721/


[SNIP]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW performance degradation on the 18 millions objects stored.

2016-09-13 Thread Mark Nelson



On 09/13/2016 08:17 AM, Stas Starikevich wrote:

Hi All,

Asking your assistance with the RadosGW performance degradation on the
18M objects placed (http://pasteboard.co/g781YI3J.png).
Drops from 620 uploads\s to 180-190 uploads\s.

I made list of tests and see that upload performance degrades in 3-4
times when its number of objects reaches 18M.
Number of OSD's doesn't matter, problem reproduces with 6\18\56 OSD's.
Increasing number of index shards doesn't help. Originally I faced with
the problem when I had 8 shards per bucket, now it's 256, but same picture.
Number of PG's on the default.rgw.buckets.data also makes no difference,
but latest test with 2048 PG's (+nobarrier, +leveldb_compression =
false) shows a bit higher upload rate.


Please do not use nobarrier!  In almost all situations you should 
absolutely not use it!



Problem reproduces even with erasure coding pool (tested 4-2). Erasure
coding gives much higher inodes usage (my first suspicion was in the
lack of cache RAM for inodes), but it doesn't matter - drops on the 18M too.

Moved meta\index pools to the SSD only. Increased number of RGW threads
to 8192. It raised upload\s from 250 to 600 (and no bad gateway errors),
but didn't help with drop at the 18M objects threshold.

Extra tunings
(logbsize=256k,delaylog,allocsize=4M,nobarrier, leveldb_cache_size, 
leveldb_write_buffer_size, osd_pg_epoch_persisted_max_stale, osd_map_cache_size)
I made on the few latest tests. Didn't help much, but upload rate became
more stable with no drops.

From the HDD's stats I see that on the 18M threshold number of 'read'
requests increases from 2-3 to

Any ideas?


With RGW writes, you are ultimately fighting seek behavior, and it's 
going to get worse the more objects you've written.  There are a variety 
of reasons for this.


1) If you are not using blind buckets, every write is going to result in 
multiple round trips to update the bucket indices (ie more seeks and 
more latency).


2) The more objects you have, the lower the chance that any given 
object's inode/dentry will be cached.  This was even worse in hammer as 
we didn't chunk xattrs at 255 bytes, so RGW metadata would push xattrs 
out of the inode causing yet another seek.  That was fixed for jewel, 
but old objects will still slow things down.


2b) You can increase the dentry/inode cache in the kernel, but this 
comes with a cost.  The more things you have cached, the longer it takes 
syncfs to complete as it has to iterate through all of that cached 
metadata.  This isn't so much a problem at small scale but has proven to 
be a problem on large clusters when there is a ton of memory in the node 
and lots of cached metadata.
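
If you do experiment with keeping more metadata cached, the usual knob is 
vm.vfs_cache_pressure; a sketch with purely illustrative values, keeping the 
trade-off above in mind:

# Lower values make the kernel prefer keeping dentries/inodes cached (default is 100).
sysctl vm.vfs_cache_pressure=50
# Watch how much memory the metadata caches actually consume:
grep -E 'dentry|xfs_inode' /proc/slabinfo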


3) filestore stores objects for a given PG in a nested directory 
hierarchy that becomes deeper as the number of objects grow.  The number 
of objects you can store before hitting these thresholds depends on the 
number of PGs, the distribution of objects to PGs, and the filestore 
split/merge thresholds.  A deeper directory hierarchy means that there 
are more dentries to keep in cache and a greater likelihood that memory 
pressure may push one of them out and an extra seek will need to be 
performed.


3a) when splitting is happening, ceph will definitely be looking up 
directory xattrs and also will perform a very large number of 
link/unlink operations.  Usually this shouldn't require xattr lookups on 
the objects, but it appears that if selinux is enabled, it may 
(depending on the setup) read security metadata in the xattrs for an 
object to determine if the link/unlink can proceed. This is extremely 
slow and seek intensive.  Even when selinux is not enabled, splitting a 
PG's directory hierarchy is going to involve a certain amount of overhead.


3b) On XFS, files in a directory are all created in the same AG with the 
idea that they will be physically close to each other on the disk.  The 
idea is to hopefully reduce the number of seeks should they all be 
accessed at roughly the same point in time.  When a split happens, new 
subdirectories are created and a portion of the objects in the parent 
directory are moved to the new subdirectories.  The problem is that 
those subdirectories will not necessarily be in the same AG as the 
parent.  As the directory hierarchy grows deeper, the leaf directories 
will become more fragmented until they have objects spread across every 
AG on the disk.


3c) Setting extremely high split/merge thresholds will likely mitigate a 
lot of what is happening here in point 3, but with the cost of making 
readdir potentially very expensive when the number of objects/pg grows 
high (say 100K objects/PG or more).  This is primarily a problem during 
operations that need to list the files in the PG like during recovery.
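
A sketch of the relevant filestore options; the values are illustrative, not a 
recommendation:

[osd]
# A PG subdirectory is split once it holds roughly
#   filestore_split_multiple * abs(filestore_merge_threshold) * 16 objects.
filestore merge threshold = 40
filestore split multiple = 8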


So what can be done?

1) Make sure the bucket index pool has enough PGs for reasonably good 
distribution.  If you have SSDs for journals, you may want to consider 
co-locating a set of OSDs for the bucket index poo

Re: [ceph-users] problem starting osd ; PGLog.cc: 984: FAILED assert hammer 0.94.9

2016-09-13 Thread Henrik Korkuc

On 16-09-13 11:13, Ronny Aasen wrote:
I suspect this must be a difficult question since there have been no 
replies on irc or mailinglist.


assuming it's impossible to get these osd's running again.

Is there a way to recover objects from the disks. ? they are mounted 
and data is readable. I have pg's down since they want to probe these 
osd's that do not want to start.


pg query claim it can continue if i mark the osd as lost. but i would 
prefer to not loose data. especially since the data is ok and readable 
on the nonfunctioning osd.


also let me know if there is other debug i can extract in order to 
troubleshoot the non starting osd's


kind regards
Ronny Aasen


I cannot help you with this, but you can try using 
http://ceph.com/community/incomplete-pgs-oh-my/ and 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000238.html 
(found this mail thread googling for the objectool post). ymmv
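
Roughly, the export/import path described in those posts looks like the sketch 
below (the pg id and source OSD path are taken from the log further down; the 
target OSD id is a placeholder, and both OSDs must be stopped while the tool runs):

# Export the pg from the broken (stopped) osd.8:
ceph-objectstore-tool --op export --pgid 1.fdd \
    --data-path /var/lib/ceph/osd/ceph-8 \
    --journal-path /var/lib/ceph/osd/ceph-8/journal \
    --file /root/1.fdd.export

# Import it into another (also stopped) OSD, then start that OSD:
ceph-objectstore-tool --op import \
    --data-path /var/lib/ceph/osd/ceph-NN \
    --journal-path /var/lib/ceph/osd/ceph-NN/journal \
    --file /root/1.fdd.export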






On 12. sep. 2016 13:16, Ronny Aasen wrote:

after adding more osd's and having a big backfill running 2 of my osd's
keep on stopping.

We also recently upgraded from 0.94.7 to 0.94.9 but i do not know if
that is related.

the log say.

 0> 2016-09-12 10:31:08.288858 7f8749125880 -1 osd/PGLog.cc: In
function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t,
ghobject_t, const pg_info_t&, std::map&,
PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&,
std::set >*)' thread 7f8749125880 time
2016-09-12 10:31:08.286337
osd/PGLog.cc: 984: FAILED assert(oi.version == i->first)

googeling led me to a bug that seems to be related to infernalis only.
dmesg does not show anything wrong with the hardware.

this is debian running hammer 0.94.9
and the osd is a software raid5 consisting of 5 3TB harddrives.
journal is a partition on ssd intel 3500

anyone have a clue to what can be wrong ?

kind regrads
Ronny Aasen





-- log debug_filestore=10 --
   -19> 2016-09-12 10:31:08.070947 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/1df4bfdd/rb.0.392c.238e1f29.002bd134/head '_' = 266
-18> 2016-09-12 10:31:08.083111 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/deb5bfdd/rb.0.392c.238e1f29.002bc596/head '_' = 266
-17> 2016-09-12 10:31:08.096718 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/9be5dfdd/rb.0.392c.238e1f29.002bc2bf/head '_' = 266
-16> 2016-09-12 10:31:08.110048 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/cbf8ffdd/rb.0.392c.238e1f29.002b9d89/head '_' = 266
-15> 2016-09-12 10:31:08.126263 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/e49d0fdd/rb.0.392c.238e1f29.002b078e/head '_' = 266
-14> 2016-09-12 10:31:08.150199 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/e49d0fdd/rb.0.392c.238e1f29.002b078e/22 '_' = 259
-13> 2016-09-12 10:31:08.173223 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/d0827fdd/rb.0.392c.238e1f29.002b0373/head '_' = 266
-12> 2016-09-12 10:31:08.199192 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/d0827fdd/rb.0.392c.238e1f29.002b0373/22 '_' = 259
-11> 2016-09-12 10:31:08.232712 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/bf4effdd/rb.0.392c.238e1f29.002ae882/head '_' = 266
-10> 2016-09-12 10:31:08.265331 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/bf4effdd/rb.0.392c.238e1f29.002ae882/22 '_' = 259
 -9> 2016-09-12 10:31:08.265456 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.00b381ae__head_DB220FDD__1 


with flags=2: (2) No such file or directory
 -8> 2016-09-12 10:31:08.265475 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.00b381ae/head '_' = -2
 -7> 2016-09-12 10:31:08.265535 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.00b381ae__21_DB220FDD__1 


with flags=2: (2) No such file or directory
 -6> 2016-09-12 10:31:08.265546 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.00b381ae/21 '_' = -2
 -5> 2016-09-12 10:31:08.265609 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.00cf4057__head_12020FDD__1 


with flags=2: (2) No such file or directory
 -4> 2016-09-12 10:31:08.265628 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/12020fdd/rb.0.392c.238e1f29.00cf4057/head '_' = -2
 -3> 2016-09-12 10:31:08.265688 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/curre

Re: [ceph-users] jewel blocked requests

2016-09-13 Thread Wido den Hollander

> On 13 September 2016 at 15:58, "WRIGHT, JON R (JON R)" wrote:
> 
> 
> Yes, I do have old clients running.  The clients are all vms.  Is it 
> typical that vm clients have to be rebuilt after a ceph upgrade?
> 

No, not always, but it is just that I saw this happening recently after a Jewel 
upgrade.

What version are the client(s) still running?

Wido

> Thanks,
> 
> Jon
> 
> 
> On 9/12/2016 4:05 PM, Wido den Hollander wrote:
> >> On 12 September 2016 at 18:47, "WRIGHT, JON R (JON R)" wrote:
> >>
> >>
> >> Since upgrading to Jewel from Hammer, we've started to see HEALTH_WARN
> >> because of 'blocked requests > 32 sec'.   Seems to be related to writes.
> >>
> >> Has anyone else seen this?  Or can anyone suggest what the problem might 
> >> be?
> >>
> > Do you by any chance have old clients connecting? I saw this after a Jewel 
> > upgrade as well and it was because of very old clients still connecting to 
> > the cluster.
> >
> > Wido
> >
> >> Thanks!
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Ilya Dryomov
On Tue, Sep 13, 2016 at 4:11 PM, Nikolay Borisov  wrote:
>
>
> On 09/13/2016 04:30 PM, Ilya Dryomov wrote:
> [SNIP]
>>
>> Hmm, it could be about whether it is able to do journal replay on
>> mount.  When you mount a snapshot, you get a read-only block device;
>> when you mount a clone image, you get a read-write block device.
>>
>> Let's try this again, suppose image is foo and snapshot is snap:
>>
>> # fsfreeze -f /mnt
>>
>> # rbd snap create foo@snap
>> # rbd map foo@snap
>> /dev/rbd0
>> # file -s /dev/rbd0
>> # fsck.ext4 -n /dev/rbd0
>> # mount /dev/rbd0 /foo
>> # umount /foo
>> 
>> # file -s /dev/rbd0
>> # fsck.ext4 -n /dev/rbd0
>>
>> # rbd clone foo@snap bar
>> $ rbd map bar
>> /dev/rbd1
>> # file -s /dev/rbd1
>> # fsck.ext4 -n /dev/rbd1
>> # mount /dev/rbd1 /bar
>> # umount /bar
>> 
>> # file -s /dev/rbd1
>> # fsck.ext4 -n /dev/rbd1
>>
>> Could you please provide the output for the above?
>
> Here you go : http://paste.ubuntu.com/23173721/

OK, so that explains it: the frozen filesystem is "needs journal
recovery", so mounting it off of a read-only block device leads to
errors.

root@alxc13:~# fsfreeze -f /var/lxc/c11579
root@alxc13:~# rbd snap create rbd/c11579@snap_test
root@alxc13:~# rbd map c11579@snap_test
/dev/rbd151
root@alxc13:~# fsfreeze -u /var/lxc/c11579
root@alxc13:~# file -s /dev/rbd151
/dev/rbd151: Linux rev 1.0 ext4 filesystem data (needs journal
recovery) (extents) (large files) (huge files)

Now, to isolate the problem, the easiest would probably be to try to
reproduce it with loop devices.  Can you try dding one of these images
to a file, make sure that the filesystem is clean, losetup + mount,
freeze, make a "snapshot" with cp and losetup -r + mount?
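
A minimal sketch of that reproduction (paths and sizes are illustrative; run as
root):

# Build a clean ext4 image and mount it via a loop device
dd if=/dev/zero of=/tmp/img bs=1M count=1024
mkfs.ext4 -F /tmp/img
LOOP=$(losetup -f --show /tmp/img)
mkdir -p /mnt/orig /mnt/snap
mount "$LOOP" /mnt/orig
# ... generate some I/O in /mnt/orig ...
fsfreeze -f /mnt/orig
cp --sparse=always /tmp/img /tmp/img.snap      # the "snapshot", taken while frozen
fsfreeze -u /mnt/orig
SNAP=$(losetup -f --show -r /tmp/img.snap)     # read-only loop device
file -s "$SNAP"
fsck.ext4 -n "$SNAP"
mount -o ro "$SNAP" /mnt/snap                  # does it hit the same journal errors?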

Try sticking file -s before unfreeze and also compare md5sums:

root@alxc13:~# fsfreeze -f /var/lxc/c11579

root@alxc13:~# rbd snap create rbd/c11579@snap_test
root@alxc13:~# rbd map c11579@snap_test


root@alxc13:~# file -s /dev/rbd151
root@alxc13:~# fsfreeze -u /var/lxc/c11579


root@alxc13:~# file -s /dev/rbd151

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel blocked requests

2016-09-13 Thread WRIGHT, JON R (JON R)

VM Client OS: ubuntu 14.04

Openstack: kilo

libvirt: 1.2.12

nova-compute-kvm: 1:2015.1.4-0ubuntu2
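
(A rough way to confirm what client code is actually in use on both sides; the
mon id and the use of dpkg are assumptions:)

# On the compute node: which ceph client libraries qemu/libvirt link against
dpkg -l | grep -E 'librbd|librados'

# On a monitor: list connected sessions (client addresses and feature bits)
ceph daemon mon.$(hostname -s) sessions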

Jon

On 9/13/2016 11:17 AM, Wido den Hollander wrote:


On 13 September 2016 at 15:58, "WRIGHT, JON R (JON R)" wrote:


Yes, I do have old clients running.  The clients are all vms.  Is it
typical that vm clients have to be rebuilt after a ceph upgrade?


No, not always, but it is just that I saw this happening recently after a Jewel 
upgrade.

What version are the client(s) still running?

Wido


Thanks,

Jon


On 9/12/2016 4:05 PM, Wido den Hollander wrote:

On 12 September 2016 at 18:47, "WRIGHT, JON R (JON R)" wrote:


Since upgrading to Jewel from Hammer, we've started to see HEALTH_WARN
because of 'blocked requests > 32 sec'.   Seems to be related to writes.

Has anyone else seen this?  Or can anyone suggest what the problem might be?


Do you by any chance have old clients connecting? I saw this after a Jewel 
upgrade as well and it was because of very old clients still connecting to the 
cluster.

Wido


Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Lots of "wrongly marked me down" messages

2016-09-13 Thread Oliver Francke
Hi,

I can only second this, revert all, but especially:

net.core.netdev_max_backlog = 5

this def. leads to bad behaviour, so back to 1000, or max 2500 and re-check

Regards,

Oliver.


> On 12.09.2016 at 22:06, Wido den Hollander wrote:
> 
>> net.core.netdev_max_backlog = 5

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-osd fail to be started

2016-09-13 Thread strony zhang
Hi Ronny,
After the disks are activated, the OSDs get recovered. Thanks for your info.
Thanks,
Strony

On Tuesday, September 13, 2016 1:00 AM, Ronny Aasen 
 wrote:
 

 On 13. sep. 2016 07:10, strony zhang wrote:
> Hi,
>
> My ceph cluster include 5 OSDs. 3 osds are installed in the host
> 'strony-tc' and 2 are in the host 'strony-pc'. Recently, both of hosts
> were rebooted due to power cycles. After all of disks are mounted again,
> the ceph-osd are in the 'down' status. I tried cmd, "sudo start ceph-osd
> id=x', to start the OSDs. But they are not started well with the error
> below reported in the 'dmesg' output. Any suggestions about how to make
> the OSDs started well? Any comments are appreciated.
>
> "
> [6595400.895147] init: ceph-osd (ceph/1) main process ended, respawning
> [6595400.969346] init: ceph-osd (ceph/1) main process (21990) terminated
> with status 1
> [6595400.969352] init: ceph-osd (ceph/1) respawning too fast, stopped
> "
>
> :~$ ceph osd tree
> ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 1.09477 root default
> -2 0.61818    host strony-tc
>  0 0.2        osd.0        down        0          1.0
>  1 0.21819        osd.1        down        0          1.0
>  4 0.2        osd.4          up  1.0          1.0
> -3 0.47659    host strony-pc
>  2 0.23830        osd.2        down        0          1.0
>  3 0.23830        osd.3        down        0          1.0
>
> :~$ cat /etc/ceph/ceph.conf
> [global]
> fsid = 60638bfd-1eea-46d5-900d-36224475d8aa
> mon_initial_members = strony-tc
> mon_host = 10.132.141.122
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> osd_pool_default_size = 2
>
> Thanks,
> Strony
>


greetings.

I have somewhat of a similar problem
osd's that are just a single disk start on boot.

but osd's that are software raid md devices does not start automatically 
on boot

in order to mount and start them i have to run
ceph-disk-activate /dev/md127p1

where /dev/md127p1 is the xfs partition for the osd.
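
One crude workaround (rather than fixing the udev/init rules) is to activate 
such devices from rc.local at boot; a sketch that assumes the same device name:

# /etc/rc.local (runs as root at the end of boot)
ceph-disk activate /dev/md127p1
exit 0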

good luck
Ronny Aasen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


   ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang

I have tried all-Jewel packages and it runs correctly, so I think the problem 
is in osdc at ceph 0.94.3.
There must be some commits since then which solved the problem.

At 2016-09-13 18:08:19, "John Spray"  wrote:
>On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
>> Hello everyone,
>>
>> I have met a ceph-fuse crash when i add osd to osd pool.
>>
>> I am writing data through ceph-fuse,then i add one osd to osd pool, after
>> less than 30 s, the ceph-fuse process crash.
>>
>> The ceph-fuse client is 10.2.2, and the ceph osd is 0.94.3, details beblow:
>
>I missed this version mismatch until someone pointed it out (thanks Brad)
>
>In theory the newer fuse client should still work with the older OSD,
>but it would be very interesting to know if this issue is still
>reproducible if you use all Jewel packages.
>
>John
>
>>
>> [root@localhost ~]# rpm -qa | grep ceph
>> libcephfs1-10.2.2-0.el7.centos.x86_64
>> python-cephfs-10.2.2-0.el7.centos.x86_64
>> ceph-common-0.94.3-0.el7.x86_64
>> ceph-fuse-10.2.2-0.el7.centos.x86_64
>> ceph-0.94.3-0.el7.x86_64
>> ceph-mds-10.2.2-0.el7.centos.x86_64
>> [root@localhost ~]#
>> [root@localhost ~]#
>> [root@localhost ~]# rpm -qa | grep rados
>> librados2-devel-0.94.3-0.el7.x86_64
>> librados2-0.94.3-0.el7.x86_64
>> libradosstriper1-0.94.3-0.el7.x86_64
>> python-rados-0.94.3-0.el7.x86_64
>>
>> ceph stat:
>>
>> [root@localhost ~]# ceph status
>> cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
>>  health HEALTH_WARN
>> clock skew detected on mon.2, mon.0
>> 19 pgs stale
>> 19 pgs stuck stale
>> Monitor clock skew detected
>>  monmap e3: 3 mons at
>> {0=10.222.5.229:6789/0,1=10.222.5.156:6789/0,2=10.222.5.162:6789/0}
>> election epoch 26, quorum 0,1,2 1,2,0
>>  mdsmap e58: 1/1/1 up {0=0=up:active}, 1 up:standby
>>  osdmap e324: 9 osds: 9 up, 9 in
>>   pgmap v3505: 320 pgs, 3 pools, 4638 MB data, 1302 objects
>> 23373 MB used, 68695 MB / 92069 MB avail
>>  301 active+clean
>>   19 stale+active+clean
>>
>> ceph osd stat:
>> [root@localhost ~]# ceph osd dump
>> epoch 324
>> fsid a7f64266-0894-4f1e-a635-d0aeaca0e993
>> created 2016-09-13 11:08:34.629245
>> modified 2016-09-13 16:21:53.285729
>> flags
>> pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
>> pool 5 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0
>> object_hash rjenkins pg_num 128 pgp_num 128 last_change 285 flags hashpspool
>> crash_replay_interval 45 stripe_width 0
>> pool 6 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0
>> object_hash rjenkins pg_num 128 pgp_num 128 last_change 282 flags hashpspool
>> stripe_width 0
>> max_osd 9
>> osd.0 up   in  weight 1 up_from 271 up_thru 321 down_at 242
>> last_clean_interval [169,175) 10.222.5.229:6800/3780 10.222.5.229:6801/3780
>> 10.222.5.229:6802/3780 10.222.5.229:6803/3780 exists,up
>> 1bf6cda4-bf1a-4f8a-836d-b6aec970d257
>> osd.1 up   in  weight 1 up_from 223 up_thru 320 down_at 186
>> last_clean_interval [20,183) 10.222.5.229:6804/2228 10.222.5.229:6805/2228
>> 10.222.5.229:6806/2228 10.222.5.229:6807/2228 exists,up
>> 3f3ad2fa-52b1-46fd-af6c-05178b814e25
>> osd.2 up   in  weight 1 up_from 224 up_thru 320 down_at 186
>> last_clean_interval [22,183) 10.222.5.229:6808/2259 10.222.5.229:6809/2259
>> 10.222.5.229:6810/2259 10.222.5.229:6811/2259 exists,up
>> 9199193e-9928-4c5d-8adc-2c32a4c8716b
>> osd.3 up   in  weight 1 up_from 312 up_thru 313 down_at 303
>> last_clean_interval [0,0) 10.222.5.156:6800/3592 10.222.5.156:6801/3592
>> 10.222.5.156:6802/3592 10.222.5.156:6803/3592 exists,up
>> 9b8f1cb0-51df-42aa-8be4-8f6347235cc2
>> osd.4 up   in  weight 1 up_from 25 up_thru 322 down_at 0 last_clean_interval
>> [0,0) 10.222.5.156:6804/25567 10.222.5.156:6805/25567
>> 10.222.5.156:6806/25567 10.222.5.156:6807/25567 exists,up
>> 0c719e5e-f8fc-46e0-926d-426bf6881ee0
>> osd.5 up   in  weight 1 up_from 27 up_thru 310 down_at 0 last_clean_interval
>> [0,0) 10.222.5.156:6808/25678 10.222.5.156:6809/25678
>> 10.222.5.156:6810/25678 10.222.5.156:6811/25678 exists,up
>> 729e0749-2ce3-426a-a7f1-a3cbfa88ba0b
>> osd.6 up   in  weight 1 up_from 40 up_thru 313 down_at 0 last_clean_interval
>> [0,0) 10.222.5.162:6807/15887 10.222.5.162:6808/15887
>> 10.222.5.162:6809/15887 10.222.5.162:6810/15887 exists,up
>> dea24f0f-4666-40af-98af-5ab8d42c37c6
>> osd.7 up   in  weight 1 up_from 45 up_thru 313 down_at 0 last_clean_interval
>> [0,0) 10.222.5.162:6811/16040 10.222.5.162:6812/16040
>> 10.222.5.162:6813/16040 10.222.5.162:6814/16040 exists,up
>> 0e238745-0091-4790-9b39-c9d36f4ebbee
>> osd.8 up   in  weight 1 up_from 49 up_thru 314 down_at 0 last_clean_interval
>> [0,0) 10.222.5.162:6815/16206 10.222.5.162:6816/16206
>> 10.222.5.162:6817/16206 10.222.5.162:6818/16206 exists,up
>> 59637f86-f283-4397-a63b-474976ee8047
>> [root@localhost ~]#
>> [root

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Adrian Saul

I found I could ignore the XFS issues and just mount it with the appropriate 
options (below from my backup scripts):

#
# Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
#
if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; then
echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS - 
cleaning up"
rbd unmap $SNAPDEV
rbd snap rm ${RBDPATH}@${DATESTAMP}
exit 3;
fi
echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"

Without using clones, it's impossible to mount it without norecovery.



> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Ilya Dryomov
> Sent: Wednesday, 14 September 2016 1:51 AM
> To: Nikolay Borisov
> Cc: ceph-users; SiteGround Operations
> Subject: Re: [ceph-users] Consistency problems when taking RBD snapshot
>
> On Tue, Sep 13, 2016 at 4:11 PM, Nikolay Borisov  wrote:
> >
> >
> > On 09/13/2016 04:30 PM, Ilya Dryomov wrote:
> > [SNIP]
> >>
> >> Hmm, it could be about whether it is able to do journal replay on
> >> mount.  When you mount a snapshot, you get a read-only block device;
> >> when you mount a clone image, you get a read-write block device.
> >>
> >> Let's try this again, suppose image is foo and snapshot is snap:
> >>
> >> # fsfreeze -f /mnt
> >>
> >> # rbd snap create foo@snap
> >> # rbd map foo@snap
> >> /dev/rbd0
> >> # file -s /dev/rbd0
> >> # fsck.ext4 -n /dev/rbd0
> >> # mount /dev/rbd0 /foo
> >> # umount /foo
> >> 
> >> # file -s /dev/rbd0
> >> # fsck.ext4 -n /dev/rbd0
> >>
> >> # rbd clone foo@snap bar
> >> $ rbd map bar
> >> /dev/rbd1
> >> # file -s /dev/rbd1
> >> # fsck.ext4 -n /dev/rbd1
> >> # mount /dev/rbd1 /bar
> >> # umount /bar
> >> 
> >> # file -s /dev/rbd1
> >> # fsck.ext4 -n /dev/rbd1
> >>
> >> Could you please provide the output for the above?
> >
> > Here you go : http://paste.ubuntu.com/23173721/
>
> OK, so that explains it: the frozen filesystem is "needs journal recovery", so
> mounting it off of read-only block device leads to errors.
>
> root@alxc13:~# fsfreeze -f /var/lxc/c11579
> root@alxc13:~# rbd snap create rbd/c11579@snap_test
> root@alxc13:~# rbd map c11579@snap_test
> /dev/rbd151
> root@alxc13:~# fsfreeze -u /var/lxc/c11579
> root@alxc13:~# file -s /dev/rbd151
> /dev/rbd151: Linux rev 1.0 ext4 filesystem data (needs journal
> recovery) (extents) (large files) (huge files)
>
> Now, to isolate the problem, the easiest would probably be to try to
> reproduce it with loop devices.  Can you try dding one of these images to a
> file, make sure that the filesystem is clean, losetup + mount, freeze, make a
> "snapshot" with cp and losetup -r + mount?
>
> Try sticking file -s before unfreeze and also compare md5sums:
>
> root@alxc13:~# fsfreeze -f /var/lxc/c11579
> root@alxc13:~# rbd snap create rbd/c11579@snap_test
> root@alxc13:~# rbd map c11579@snap_test
> root@alxc13:~# file -s /dev/rbd151
> root@alxc13:~# fsfreeze -u /var/lxc/c11579
> root@alxc13:~# file -s /dev/rbd151
>
> Thanks,
>
> Ilya
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Confidentiality: This email and any attachments are confidential and may be 
subject to copyright, legal or some other professional privilege. They are 
intended solely for the attention and use of the named addressee(s). They may 
only be copied, distributed or disclosed with the consent of the copyright 
owner. If you have received this email by mistake or by breach of the 
confidentiality clause, please notify the sender immediately by return email 
and delete or destroy all copies of the email. Any confidentiality, privilege 
or copyright is not waived or lost because this email has been sent to you by 
mistake.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com