[ceph-users] Delete default pools?
Is it safe to delete all default pools?

Stefan
[ceph-users] Calculate pgs
I know it is 100*num_of_osds/repl_factor. But I also read somewhere that it should be a value of 2^x. Is this still correct? So for 24 OSDs and repl 3: 100*24/3 => 800 => rounded up to 2^x => 1024?

Greets
Stefan
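A quick way to sanity-check that arithmetic is a throwaway shell calculation like the one below (the OSD and replica counts are the ones from the question):

    #!/bin/sh
    # 100 * OSDs / replicas, rounded up to the next power of two
    osds=24
    repl=3
    raw=$(( 100 * osds / repl ))   # 800
    pgs=1
    while [ "$pgs" -lt "$raw" ]; do
        pgs=$(( pgs * 2 ))
    done
    echo "raw=$raw rounded=$pgs"   # prints: raw=800 rounded=1024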
[ceph-users] "rbd snap rm" overload my cluster (during backups)
Hi,

I have a backup script which, every night:
* creates a snapshot of each RBD image
* then deletes all snapshots that are more than 15 days old

The problem is that "rbd snap rm XXX" will overload my cluster for hours (6 hours today...).

Here I see several problems:

#1 "rbd snap rm XXX" is not blocking. The erase is done in the background, and I know of no way to verify whether it has completed. So I add "sleeps" between the rm calls, but I have to estimate the time they will take.

#2 "rbd (snap) rm" is sometimes very, very slow. I don't know if it's because of XFS or not, but all my OSDs are at 100% IO usage (reported by iostat).

So:
* is there a way to reduce the priority of "snap rm", to avoid overloading the cluster?
* is there a way to have a blocking "snap rm" which will wait until it's completed?
* is there a way to speed up "snap rm"?

Note that I have too low a PG count on my cluster (200 PGs for 40 active OSDs; but I'm trying to progressively migrate data to a newer pool). Can that be the source of the problem?

Thanks,

Olivier
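For reference, a minimal sketch of the kind of nightly rotation described above (pool name, snapshot naming scheme and the pacing sleep are all assumptions, since there is no built-in way to wait for the background delete):

    #!/bin/bash
    # Nightly RBD snapshot rotation -- a rough sketch, not a drop-in script.
    POOL=rbd
    TODAY=$(date +%Y-%m-%d)
    CUTOFF=$(date -d '-15 days' +%Y-%m-%d)

    for img in $(rbd ls "$POOL"); do
        # take tonight's snapshot
        rbd snap create "${POOL}/${img}@backup-${TODAY}"

        # remove snapshots older than the cutoff; names embed the date, so a
        # plain string compare on YYYY-MM-DD is enough
        for snap in $(rbd snap ls "${POOL}/${img}" | awk '/^ *[0-9]/ {print $2}'); do
            case "$snap" in backup-*) ;; *) continue ;; esac
            if [[ "${snap#backup-}" < "$CUTOFF" ]]; then
                rbd snap rm "${POOL}/${img}@${snap}"
                sleep 300   # crude pacing: the delete itself runs in the background
            fi
        done
    done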
Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX
A little bit more. I have tried deploying RGW via http://ceph.com/docs/master/radosgw/ and then connecting the S3 Browser, CrossFTP and CloudBerry Explorer clients, but all unsuccessfully.

Again my question: does anybody use S3 desktop clients with RGW?

On Fri, Apr 19, 2013 at 10:54 PM, Igor Laskovy wrote:
> Hello!
>
> Does anybody use Rados Gateway via S3-compatible clients on desktop
> systems?
>
> --
> Igor Laskovy
> facebook.com/igor.laskovy
> studiogrizzly.com

--
Igor Laskovy
facebook.com/igor.laskovy
studiogrizzly.com
[ceph-users] Way to copy rbd Disk image incl snapshots?
Hi,

is there a way to copy an RBD disk image, including snapshots, from one pool to another?

Stefan
Re: [ceph-users] RBD performance test (write) problem
Hi all,

Have any comments about 1) or 2)? Thanks!!!

Hi Mark,

Sorry for the late reply -- I didn't receive this mail, so I missed this message for several days...
http://www.mail-archive.com/ceph-users@lists.ceph.com/msg00624.html

Your advice was very, very helpful to me!!! Thanks :)

I have done the following tests and have some questions.

1) I concurrently ran dd if=/dev/zero of=/dev/sd[b,c,d,e,f ...n] bs=4096k count=1 oflag=direct on each SATA disk.

collectl shows:
#<--Disks---><--Network-->
#cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
0 0 2935636 0 0 866560 2708 1 9 0 1
0 0 2939718 0 0 865620 2708 2 14 1 4
0 0 2872631 0 0 868480 2714 1 8 0 1
0 0 2937621 0 0 864640 2702 1 9 0 4

Total write throughput is about 860MB/s.

Using RADOS bench: rados -p rbd bench 300 write -t 256

collectl shows:
#<--Disks---><--Network-->
#cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
22 10 6991 17947 4 1 999K 3111 4 31 48 22
18 8 6151 16116 0 0 1003K 2858 8 40 23 37
19 9 6295 16031 8 2 1002K 2458 2 22 44 17

Total write throughput is about 1000MB/s.

The expander backplane runs at 3.0Gb/s and is attached via a 4-lane Mini-SAS port: 4 * 3Gb/s = 12Gb/s ~= 1GB/s, so I think write throughput is stuck at 1000MB/s because the expander backplane is the bottleneck for sequential writes. If the expander backplane could run at 6.0Gb/s, would total write throughput increase?

2) OSD & journal settings:

a. OSD filesystems are ext4, with no osd mkfs options:
osd mkfs type = ext4
osd mount options ext4 = rw,data=writeback,errors=remount-ro,noatime,nodiratime,user_xattr
filestore_xattr_use_omap = true

b. SSD journals are raw disks with no filesystem, divided into two partitions (aligned).

LSI MegaRAID SAS 9260-4i settings:
a. every HDD: RAID0, Write Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct, Disk cache: unchanged
b. every SSD: RAID0, Write Policy: Write Through, Read Policy: NoReadAhead, IO Policy: Direct, Disk cache: disabled

Because the last results were for pool size=576, I ran a new test with pool size=2048 and a 9 OSDs + 4 SSDs configuration!!

Read: rados -p testpool bench 300 seq -t 256
Write: rados -p testpool bench 300 write -t 256 --no-cleanup

Rados Bench TEST (Read):
2x replication & 12 OSDs: Bandwidth (MB/sec): 1373.013
1x replication & 12 OSDs: Bandwidth (MB/sec): 1478.694
2x replication & 8 OSDs + 4 SSDs (Journal): Bandwidth (MB/sec): 1442.543
1x replication & 8 OSDs + 4 SSDs (Journal): Bandwidth (MB/sec): 1448.407
2x replication & 9 OSDs + 3 SSDs (Journal): Bandwidth (MB/sec): 1485.175
1x replication & 9 OSDs + 3 SSDs (Journal): Bandwidth (MB/sec): 447.245

Rados Bench TEST (Write):
2x replication & 12 OSDs: Bandwidth (MB/sec): 228.064
1x replication & 12 OSDs: Bandwidth (MB/sec): 457.213
2x replication & 8 OSDs + 4 SSDs (Journal): Bandwidth (MB/sec): 224.430
1x replication & 8 OSDs + 4 SSDs (Journal): Bandwidth (MB/sec): 482.104
2x replication & 9 OSDs + 3 SSDs (Journal): Bandwidth (MB/sec): 239
1x replication & 9 OSDs + 3 SSDs (Journal): Bandwidth (MB/sec): 485

In the Rados Bench read tests, I originally expected more OSDs to increase read bandwidth, but the results show about 1400MB/s in most cases. Is this cache intervention? Because I didn't see any read operations on the disks...

In the Rados Bench write tests, I tested the 9 OSDs + 3 SSDs (Journal) configuration and observed it with collectl:

# DISK STATISTICS (/sec)
#          <-reads->            <-writes->                       Pct
#Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
sdc 0 000 47336382 335 141 141 0 2 07
sdd 0 000 65600 0 240 273 273 1 4 07
sde 0 000 56440 0 207 273 27264 342 4 99
sdg 0 000 43544450 326 134 13339 135 3 99
sdf 0 000 65600 0 240 273 273 0 2 07
sdh 0 000 57400 0 210 273 273 0 2 07
sdi 0 000 69012227 251 275 27490 560 3 99
sdj 0 000 66944424 308 217 217 1 5 07
sdb 0 000
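For reference, the raw-disk baseline and RADOS bench runs described above can be scripted roughly like this (device range, pool name and dd arguments are taken from the message; treat it as a sketch rather than a tuned benchmark):

    #!/bin/sh
    # per-disk direct-IO write baseline: one dd per SATA disk, run in parallel
    for dev in /dev/sd[b-n]; do
        dd if=/dev/zero of="$dev" bs=4096k count=1 oflag=direct &
    done
    wait   # watch aggregate throughput with collectl/iostat while this runs

    # cluster-level benchmarks used for the results above
    rados -p testpool bench 300 write -t 256 --no-cleanup
    rados -p testpool bench 300 seq -t 256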
Re: [ceph-users] Cephfs unaccessible
So, I've restarted the new OSDs as much as possible and the cluster started to move data to the 2 new nodes overnight.
This morning there was no network traffic and the health was:

HEALTH_ERR 1323 pgs backfill; 150 pgs backfill_toofull; 100 pgs backfilling; 114 pgs degraded; 3374 pgs peering; 36 pgs recovering; 949 pgs recovery_wait; 3374 pgs stuck inactive; 6289 pgs stuck unclean; recovery 2130652/20890113 degraded (10.199%); 58/8914654 unfound (0.001%); 1 full osd(s); 22 near full osd(s); full,noup,nodown flag(s) set

So I unset the noup and nodown flags and the data started moving again.
I've increased the full ratio to 97%, so now there is no "official" full OSD and the HEALTH_ERR became HEALTH_WARN.
However, there is still no access to the filesystem:

HEALTH_WARN 1906 pgs backfill; 21 pgs backfill_toofull; 52 pgs backfilling; 707 pgs degraded; 371 pgs down; 97 pgs incomplete; 3385 pgs peering; 35 pgs recovering; 1002 pgs recovery_wait; 4 pgs stale; 683 pgs stuck inactive; 5898 pgs stuck unclean; recovery 3081499/22208859 degraded (13.875%); 487/9433642 unfound (0.005%); recovering 11722 o/s, 57040MB/s; 17 near full osd(s)

The OSDs are flapping in/out again... I'm willing to start deleting some portion of the data.
What can I try to do now?

2013/4/21 Gregory Farnum :
> It's not entirely clear from your description and the output you've given us,
> but it looks like maybe you've managed to bring up all your OSDs correctly at
> this point? Or are they just not reporting down because you set the "no down"
> flag...
>
> In any case, CephFS isn't going to come up while the underlying RADOS cluster
> is this unhealthy, so you're going to need to get that going again. Since your
> OSDs have managed to get themselves so full it's going to be trickier than
> normal, but if all the rebalancing that's happening is only because you
> sort-of-didn't-really lose nodes, and you can bring them all back up, you
> should be able to sort it out by getting all the nodes back up, and then
> changing your full percentages (by a *very small* amount); since you haven't
> been doing any writes to the cluster it shouldn't take much data writes to get
> everything back where it was, although if this has been continuing to backfill
> in the meanwhile that will need to unwind.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Sat, Apr 20, 2013 at 12:21 PM, John Wilkins wrote:
>> I don't see anything related to lost objects in your output. I just see
>> waiting on backfill, backfill_toofull, remapped, and so forth. You can read
>> a bit about what is going on here:
>> http://ceph.com/docs/next/rados/operations/monitoring-osd-pg/
>>
>> Keep us posted as to the recovery, and let me know what I can do to improve
>> the docs for scenarios like this.
>>
>>
>> On Sat, Apr 20, 2013 at 10:52 AM, Marco Aroldi wrote:
>>>
>>> John,
>>> thanks for the quick reply.
>>> Below you can see my ceph osd tree.
>>> The problem is caused not by the failure itself, but by the "renamed"
>>> bunch of devices.
>>> It was like a deadly 15-puzzle.
>>> I think that the solution was to mount the devices in fstab using UUIDs
>>> (/dev/disk/by-uuid) instead of /dev/sdX.
>>>
>>> However, yes, I have an entry in my ceph.conf (devs = /dev/sdX1 --
>>> osd_journal = /dev/sdX2) *and* an entry in my fstab for each OSD.
>>>
>>> The node with the failed disk is s103 (osd.59).
>>>
>>> Now I have 5 OSDs from s203 up and in to try to let ceph rebalance
>>> data... but it is still a bloody mess.
>>> Look at the ceph -w output: it reports a total of 110TB, which is wrong...
>>> all drives are 2TB and I have 49 drives up and in -- 98TB in total.
>>> I think that 110TB (55 OSDs) was the size before the cluster became
>>> inaccessible.
>>>
>>> # id    weight  type name       up/down reweight
>>> -1      130     root default
>>> -9      65      room p1
>>> -3      44      rack r14
>>> -4      22      host s101
>>> 11      2       osd.11  up      1
>>> 12      2       osd.12  up      1
>>> 13      2       osd.13  up      1
>>> 14      2       osd.14  up      1
>>> 15      2       osd.15  up      1
>>> 16      2       osd.16  up      1
>>> 17      2       osd.17  up      1
>>> 18      2       osd.18  up      1
>>> 19      2       osd.19  up      1
>>> 20      2       osd.20  up      1
>>> 21      2       osd.21  up      1
>>> -6      22      host s102
>>> 33      2       osd.33  up      1
>>> 34      2       osd.34  up      1
>>> 35      2       osd.35  up      1
>>> 36      2       osd.36  up      1
>>> 37      2       osd.37  up      1
>>> 38      2       osd.38  up      1
>>> 39      2       osd.39  up      1
>>> 40      2       osd.40  up      1
>>> 41      2       osd.41  up      1
>>> 42      2       osd.42  up      1
>>> 43      2
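For readers following along, the flag and ratio changes described above map to commands along these lines (the 0.97 value is the 97% mentioned in the message; the exact full-ratio knob has changed between releases, so treat that line as illustrative):

    # allow OSDs to be marked up/down again
    ceph osd unset noup
    ceph osd unset nodown

    # nudge the full threshold up slightly (illustrative; on this era of Ceph
    # it could also be injected as mon_osd_full_ratio)
    ceph pg set_full_ratio 0.97

    # watch recovery progress
    ceph -w
    ceph osd tree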
Re: [ceph-users] Delete default pools?
On Sun, Apr 21, 2013 at 12:35 AM, Stefan Priebe - Profihost AG wrote:
> Is it safe to delete all default pools?

As long as you don't have any data you need in there; the system won't break without them or anything like that. They're favored only in that tools default to using them (eg the rbd tools default to using the rbd pool).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
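For reference, removing the three stock pools looks roughly like this (a sketch only; double-check they are empty first, and note that depending on the release the command may require the pool name twice plus a --yes-i-really-really-mean-it confirmation):

    # confirm the default pools really are empty
    rados df

    # drop them one by one
    for pool in data metadata rbd; do
        ceph osd pool delete "$pool"
    done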
Re: [ceph-users] "rbd snap rm" overload my cluster (during backups)
Which version of Ceph are you running right now and seeing this with (Sam reworked it a bit for Cuttlefish and it was in some of the dev releases)? Snapshot deletes are a little more expensive than we'd like, but I'm surprised they're doing this badly for you. :/
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Sun, Apr 21, 2013 at 2:16 AM, Olivier Bonvalet wrote:
> Hi,
>
> I have a backup script which, every night:
> * creates a snapshot of each RBD image
> * then deletes all snapshots that are more than 15 days old
>
> The problem is that "rbd snap rm XXX" will overload my cluster for hours
> (6 hours today...).
>
> Here I see several problems:
> #1 "rbd snap rm XXX" is not blocking. The erase is done in the background,
> and I know of no way to verify whether it has completed. So I add "sleeps"
> between the rm calls, but I have to estimate the time they will take.
>
> #2 "rbd (snap) rm" is sometimes very, very slow. I don't know if it's
> because of XFS or not, but all my OSDs are at 100% IO usage (reported by
> iostat).
>
> So:
> * is there a way to reduce the priority of "snap rm", to avoid overloading
>   the cluster?
> * is there a way to have a blocking "snap rm" which will wait until it's
>   completed?
> * is there a way to speed up "snap rm"?
>
> Note that I have too low a PG count on my cluster (200 PGs for 40 active
> OSDs; but I'm trying to progressively migrate data to a newer pool). Can
> that be the source of the problem?
>
> Thanks,
>
> Olivier
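For anyone unsure what their cluster is actually running, a quick check looks like this (run the second command on an OSD host):

    ceph -v              # version of the locally installed ceph CLI/binaries
    ceph-osd --version   # version of the OSD daemon binary itself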
Re: [ceph-users] Way to copy rbd Disk image incl snapshots?
On Sun, Apr 21, 2013 at 5:01 AM, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> is there a way to copy an RBD disk image, including snapshots, from one pool
> to another?

Not directly and not right now, sorry. What are you trying to do? Would it suffice, for instance, to manually create all the snapshots you care about by copying the first, snapshotting, copying the changes in the second, snapshotting, etc.?
I believe there's work going on to make this easier for the Dumpling release.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
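A minimal sketch of the manual approach described above, assuming an image rbd/vm1 with a single snapshot snap1 (all names are placeholders); since there is no incremental copy in current releases, the data changed between snapshots would have to be replayed by hand:

    # copy the image as it looked at its oldest snapshot, then re-create
    # that snapshot on the destination
    rbd cp rbd/vm1@snap1 newpool/vm1
    rbd snap create newpool/vm1@snap1

    # for each later snapshot, the blocks changed since the previous one
    # would have to be copied onto newpool/vm1 by hand before snapshotting
    # again -- this is the part the Dumpling work mentioned above targets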
Re: [ceph-users] Calculate pgs
On Sun, Apr 21, 2013 at 12:39 AM, Stefan Priebe - Profihost AG wrote:
> I know it is 100*num_of_osds/repl_factor. But I also read somewhere that it
> should be a value of 2^x. Is this still correct? So for 24 OSDs and repl 3:
> 100*24/3 => 800 => rounded up to 2^x => 1024?

PG counts of 2^x ensure that each PG covers exactly the same amount of hash space. This can improve your distribution, but at reasonable PG counts it is definitely not critical — I might use that if I were going to do fewer than 50 PGs/OSD like you are, but it shouldn't be necessary.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX
On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy wrote:
> A little bit more.
>
> I have tried deploying RGW via http://ceph.com/docs/master/radosgw/ and then
> connecting the S3 Browser, CrossFTP and CloudBerry Explorer clients, but all
> unsuccessfully.
>
> Again my question: does anybody use S3 desktop clients with RGW?

These applications should be compatible with rgw. Are you sure your setup works? What are you getting?

Yehuda
Re: [ceph-users] Delete default pools?
On 21.04.2013, at 17:41, Gregory Farnum wrote:
> On Sun, Apr 21, 2013 at 12:35 AM, Stefan Priebe - Profihost AG wrote:
>> Is it safe to delete all default pools?
>
> As long as you don't have any data you need in there; the system won't
> break without them or anything like that. They're favored only in that
> tools default to using them (eg the rbd tools default to using the rbd
> pool).

Thanks! No data in it.

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
Re: [ceph-users] Way to copy rbd Disk image incl snapshots?
On 21.04.2013, at 17:47, Gregory Farnum wrote:
> On Sun, Apr 21, 2013 at 5:01 AM, Stefan Priebe - Profihost AG wrote:
>> Hi,
>>
>> is there a way to copy an RBD disk image, including snapshots, from one pool
>> to another?
>
> Not directly and not right now, sorry. What are you trying to do?
> Would it suffice for instance to manually create all the snapshots you
> care about by copying the first, snapshotting, copying the changes in
> the second, snapshotting, etc?
> I believe there's work going on to make this easier for the Dumpling release.

I've got 8192 PGs for 24 OSDs and repl 3. I thought this was way too much. So my idea was to create a new pool and copy every disk...

Stefan
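A rough sketch of that plan, assuming the new pool should get the roughly 1024 PGs discussed in the "Calculate pgs" thread and that losing existing snapshots is acceptable (per Greg's caveat above); pool and image names are placeholders:

    # create the new, smaller pool and set its replication level
    ceph osd pool create rbd-new 1024
    ceph osd pool set rbd-new size 3

    # copy every image across; note that rbd cp does not carry snapshots over
    for img in $(rbd ls rbd); do
        rbd cp "rbd/${img}" "rbd-new/${img}"
    done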
Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX
Well, in each case something specific. CrossFTP, for example, says that when asking the server it receives text data instead of XML.
In the logs on the server side I haven't found anything interesting.

I did everything shown at http://ceph.com/docs/master/radosgw/ and only that, excluding the Swift-compatible preparation.
Maybe something additional is needed? Manual creation of a root bucket or something like that?

On Sun, Apr 21, 2013 at 6:53 PM, Yehuda Sadeh wrote:
> On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy wrote:
> > A little bit more.
> >
> > I have tried deploying RGW via http://ceph.com/docs/master/radosgw/ and then
> > connecting the S3 Browser, CrossFTP and CloudBerry Explorer clients, but all
> > unsuccessfully.
> >
> > Again my question: does anybody use S3 desktop clients with RGW?
>
> These applications should be compatible with rgw. Are you sure your
> setup works? What are you getting?
>
> Yehuda

--
Igor Laskovy
facebook.com/igor.laskovy
studiogrizzly.com
Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX
On Sun, Apr 21, 2013 at 9:39 AM, Igor Laskovy wrote:
> Well, in each case something specific. CrossFTP, for example, says that when
> asking the server it receives text data instead of XML.

When doing what? Are you able to do anything?

> In the logs on the server side I haven't found anything interesting.

What do the apache access and error logs show?

> I did everything shown at http://ceph.com/docs/master/radosgw/ and only that,
> excluding the Swift-compatible preparation.
> Maybe something additional is needed? Manual creation of a root bucket
> or something like that?
>
>
> On Sun, Apr 21, 2013 at 6:53 PM, Yehuda Sadeh wrote:
>>
>> On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy wrote:
>> > A little bit more.
>> >
>> > I have tried deploying RGW via http://ceph.com/docs/master/radosgw/ and
>> > then connecting the S3 Browser, CrossFTP and CloudBerry Explorer clients,
>> > but all unsuccessfully.
>> >
>> > Again my question, does anybody use S3 desktop clients with RGW?
>>
>> These applications should be compatible with rgw. Are you sure your
>> setup works? What are you getting?
>>
>> Yehuda
>
>
> --
> Igor Laskovy
> facebook.com/igor.laskovy
> studiogrizzly.com
Re: [ceph-users] Cephfs unaccessible
What can I try to do or delete to regain access?
Those OSDs are going crazy, flapping up and down. I think the situation is out of control.

HEALTH_WARN 2735 pgs backfill; 13 pgs backfill_toofull; 157 pgs backfilling; 188 pgs degraded; 251 pgs peering; 13 pgs recovering; 1159 pgs recovery_wait; 159 pgs stuck inactive; 4641 pgs stuck unclean; recovery 4007916/23007073 degraded (17.420%); recovering 4 o/s, 31927KB/s; 19 near full osd(s)

2013-04-21 18:56:46.839851 mon.0 [INF] pgmap v1399007: 17280 pgs: 276 active, 12791 active+clean, 2575 active+remapped+wait_backfill, 71 active+degraded+wait_backfill, 6 active+remapped+wait_backfill+backfill_toofull, 1121 active+recovery_wait, 90 peering, 3 remapped, 1 active+remapped, 127 active+remapped+backfilling, 1 active+degraded, 5 active+remapped+backfill_toofull, 19 active+degraded+backfilling, 1 active+clean+scrubbing, 79 active+degraded+remapped+wait_backfill, 36 active+recovery_wait+remapped, 1 active+degraded+remapped+wait_backfill+backfill_toofull, 46 remapped+peering, 16 active+degraded+remapped+backfilling, 1 active+recovery_wait+degraded+remapped, 14 active+recovering; 50435 GB data, 74790 GB used, 38642 GB / 110 TB avail; 4018849/23025448 degraded (17.454%); recovering 14 o/s, 54732KB/s

# id    weight  type name       up/down reweight
-1      130     root default
-9      65      room p1
-3      44      rack r14
-4      22      host s101
11      2       osd.11  up      1
12      2       osd.12  up      1
13      2       osd.13  up      1
14      2       osd.14  up      1
15      2       osd.15  up      1
16      2       osd.16  up      1
17      2       osd.17  up      1
18      2       osd.18  up      1
19      2       osd.19  up      1
20      2       osd.20  up      1
21      2       osd.21  up      1
-6      22      host s102
33      2       osd.33  up      1
34      2       osd.34  up      1
35      2       osd.35  up      1
36      2       osd.36  up      1
37      2       osd.37  up      1
38      2       osd.38  up      1
39      2       osd.39  up      1
40      2       osd.40  up      1
41      2       osd.41  up      1
42      2       osd.42  up      1
43      2       osd.43  up      1
-13     21      rack r10
-12     21      host s103
55      2       osd.55  up      1
56      2       osd.56  up      1
57      2       osd.57  up      1
58      2       osd.58  up      1
59      2       osd.59  down    0
60      2       osd.60  down    0
61      2       osd.61  down    0
62      2       osd.62  up      1
63      2       osd.63  up      1
64      1.5     osd.64  up      1
65      1.5     osd.65  down    0
-10     65      room p2
-7      22      rack r20
-5      22      host s202
22      2       osd.22  up      1
23      2       osd.23  up      1
24      2       osd.24  up      1
25      2       osd.25  up      1
26      2       osd.26  up      1
27      2       osd.27  up      1
28      2       osd.28  up      1
29      2       osd.29  up      1
30      2       osd.30  up      1
31      2       osd.31  up      1
32      2       osd.32  up      1
-8      22      rack r22
-2      22      host s201
0       2       osd.0   up      1
1       2       osd.1   up      1
2       2       osd.2   up      1
3       2       osd.3   up      1
4       2       osd.4   up      1
5       2       osd.5   up      1
6       2       osd.6   up      1
7       2       osd.7   up      1
8       2       osd.8   up      1
9       2       osd.9   up      1
10      2       osd.10  up      1
-14     21      rack r21
-11     21      host s203
44      2       osd.44  up      1
45      2       osd.45  up      1
46      2       osd.46  up      1
47      2       osd.47  up      1
48      2       osd.48  up      1
49      2       osd.49  up      1
50      2       osd.50  up      1
51      2       osd.51  up      1
52      1.5     osd.52  up      1
53      1.5     osd.53  up      1
54      2       osd.54  up      1

2013/4/21 Marco Aroldi :
> So, I've restarted the new OSDs as much as possible and the cluster
> started to move data to the 2 new nodes overnight.
> This morning there was no network traffic and the health was
>
> HEALTH_ERR 1323 pgs backfill; 150 pgs backfill_toofull; 100 pgs
> backfilling; 11
Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX
Just initial connection to the rgw server, nothing further.
Please see below the behavior for the CrossFTP and S3Browser cases.

On the CrossFTP side:
[R1] Connect to rgw.labspace
[R1] Current path: /
[R1] Current path: /
[R1] LIST /
[R1] Expected XML document response from S3 but received content type text/html
[R1] Disconnected

On the rgw side:
root@osd01:~# ps aux | grep rados
root 1785 0.4 0.1 2045404 6068 ? Ssl 19:47 0:00 /usr/bin/radosgw -n client.radosgw.a

root@osd01:~# tail -f /var/log/apache2/error.log
[Sun Apr 21 19:43:56 2013] [notice] FastCGI: process manager initialized (pid 1433)
[Sun Apr 21 19:43:56 2013] [notice] Apache/2.2.22 (Ubuntu) mod_fastcgi/mod_fastcgi-SNAP-0910052141 mod_ssl/2.2.22 OpenSSL/1.0.1 configured -- resuming normal operations
[Sun Apr 21 19:50:19 2013] [error] [client 192.168.1.51] File does not exist: /var/www/favicon.ico

tail -f /var/log/apache2/access.log
nothing

On the S3Browser side:
[image: Inline image 2]
[4/21/2013 7:56 PM] Getting buckets list... TaskID: 2
[4/21/2013 7:56 PM] System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. TaskID: 2 TaskID: 2
[4/21/2013 7:56 PM] Error occurred during Getting buckets list TaskID: 2

On the rgw side:
root@osd01:~# tail -f /var/log/apache2/error.log
[Sun Apr 21 19:56:19 2013] [error] [client 192.168.1.51] Invalid method in request \x16\x03\x01
[Sun Apr 21 19:56:22 2013] [error] [client 192.168.1.51] Invalid method in request \x16\x03\x01
[Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in request \x16\x03\x01
[Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in request \x16\x03\x01
[Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in request \x16\x03\x01
[Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in request \x16\x03\x01
[Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in request \x16\x03\x01
[Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in request \x16\x03\x01

tail -f /var/log/apache2/access.log
nothing

On Sun, Apr 21, 2013 at 7:43 PM, Yehuda Sadeh wrote:
> On Sun, Apr 21, 2013 at 9:39 AM, Igor Laskovy wrote:
> > Well, in each case something specific. CrossFTP, for example, says that
> > when asking the server it receives text data instead of XML.
>
> When doing what? Are you able to do anything?
>
> > In the logs on the server side I haven't found anything interesting.
>
> What do the apache access and error logs show?
>
> > I did everything shown at http://ceph.com/docs/master/radosgw/ and only
> > that, excluding the Swift-compatible preparation.
> > Maybe something additional is needed? Manual creation of a root bucket
> > or something like that?
> >
> > On Sun, Apr 21, 2013 at 6:53 PM, Yehuda Sadeh wrote:
> >>
> >> On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy wrote:
> >> > A little bit more.
> >> >
> >> > I have tried deploying RGW via http://ceph.com/docs/master/radosgw/ and
> >> > then connecting the S3 Browser, CrossFTP and CloudBerry Explorer
> >> > clients, but all unsuccessfully.
> >> >
> >> > Again my question, does anybody use S3 desktop clients with RGW?
> >>
> >> These applications should be compatible with rgw. Are you sure your
> >> setup works? What are you getting?
> >>
> >> Yehuda
> >
> > --
> > Igor Laskovy
> > facebook.com/igor.laskovy
> > studiogrizzly.com

--
Igor Laskovy
facebook.com/igor.laskovy
studiogrizzly.com
Re: [ceph-users] Cephfs unaccessible
Greg,
your supposition about the small amount of data to be written is right, but the rebalance is writing an insane amount of data to the new nodes and the mount is still not working.

This is node s203 (the OS is on /dev/sdl, not listed):

/dev/sda1 1.9T 467G 1.4T 26% /var/lib/ceph/osd/ceph-44
/dev/sdb1 1.9T 595G 1.3T 33% /var/lib/ceph/osd/ceph-45
/dev/sdc1 1.9T 396G 1.5T 22% /var/lib/ceph/osd/ceph-46
/dev/sdd1 1.9T 401G 1.5T 22% /var/lib/ceph/osd/ceph-47
/dev/sde1 1.9T 337G 1.5T 19% /var/lib/ceph/osd/ceph-48
/dev/sdf1 1.9T 441G 1.4T 24% /var/lib/ceph/osd/ceph-49
/dev/sdg1 1.9T 338G 1.5T 19% /var/lib/ceph/osd/ceph-50
/dev/sdh1 1.9T 359G 1.5T 20% /var/lib/ceph/osd/ceph-51
/dev/sdi1 1.4T 281G 1.1T 21% /var/lib/ceph/osd/ceph-52
/dev/sdj1 1.4T 423G 964G 31% /var/lib/ceph/osd/ceph-53
/dev/sdk1 1.9T 421G 1.4T 23% /var/lib/ceph/osd/ceph-54

2013/4/21 Marco Aroldi :
> What can I try to do or delete to regain access?
> Those OSDs are going crazy, flapping up and down. I think the situation
> is out of control.
>
> HEALTH_WARN 2735 pgs backfill; 13 pgs backfill_toofull; 157 pgs
> backfilling; 188 pgs degraded; 251 pgs peering; 13 pgs recovering;
> 1159 pgs recovery_wait; 159 pgs stuck inactive; 4641 pgs stuck
> unclean; recovery 4007916/23007073 degraded (17.420%); recovering 4
> o/s, 31927KB/s; 19 near full osd(s)
>
> 2013-04-21 18:56:46.839851 mon.0 [INF] pgmap v1399007: 17280 pgs: 276
> active, 12791 active+clean, 2575 active+remapped+wait_backfill, 71
> active+degraded+wait_backfill, 6
> active+remapped+wait_backfill+backfill_toofull, 1121
> active+recovery_wait, 90 peering, 3 remapped, 1 active+remapped, 127
> active+remapped+backfilling, 1 active+degraded, 5
> active+remapped+backfill_toofull, 19 active+degraded+backfilling, 1
> active+clean+scrubbing, 79 active+degraded+remapped+wait_backfill, 36
> active+recovery_wait+remapped, 1
> active+degraded+remapped+wait_backfill+backfill_toofull, 46
> remapped+peering, 16 active+degraded+remapped+backfilling, 1
> active+recovery_wait+degraded+remapped, 14 active+recovering; 50435 GB
> data, 74790 GB used, 38642 GB / 110 TB avail; 4018849/23025448
> degraded (17.454%); recovering 14 o/s, 54732KB/s
>
> # id    weight  type name       up/down reweight
> -1      130     root default
> -9      65      room p1
> -3      44      rack r14
> -4      22      host s101
> 11      2       osd.11  up      1
> 12      2       osd.12  up      1
> 13      2       osd.13  up      1
> 14      2       osd.14  up      1
> 15      2       osd.15  up      1
> 16      2       osd.16  up      1
> 17      2       osd.17  up      1
> 18      2       osd.18  up      1
> 19      2       osd.19  up      1
> 20      2       osd.20  up      1
> 21      2       osd.21  up      1
> -6      22      host s102
> 33      2       osd.33  up      1
> 34      2       osd.34  up      1
> 35      2       osd.35  up      1
> 36      2       osd.36  up      1
> 37      2       osd.37  up      1
> 38      2       osd.38  up      1
> 39      2       osd.39  up      1
> 40      2       osd.40  up      1
> 41      2       osd.41  up      1
> 42      2       osd.42  up      1
> 43      2       osd.43  up      1
> -13     21      rack r10
> -12     21      host s103
> 55      2       osd.55  up      1
> 56      2       osd.56  up      1
> 57      2       osd.57  up      1
> 58      2       osd.58  up      1
> 59      2       osd.59  down    0
> 60      2       osd.60  down    0
> 61      2       osd.61  down    0
> 62      2       osd.62  up      1
> 63      2       osd.63  up      1
> 64      1.5     osd.64  up      1
> 65      1.5     osd.65  down    0
> -10     65      room p2
> -7      22      rack r20
> -5      22      host s202
> 22      2       osd.22  up      1
> 23      2       osd.23  up      1
> 24      2       osd.24  up      1
> 25      2       osd.25  up      1
> 26      2       osd.26  up      1
> 27      2       osd.27  up      1
> 28      2       osd.28  up      1
> 29      2       osd.29  up      1
> 30      2       osd.30  up      1
> 31      2       osd.31  up      1
> 32      2       osd.32  up      1
> -8      22      rack r22
> -2      22      host s201
> 0       2       osd.0   up      1
> 1       2       osd.1   up      1
> 2       2       osd.2   up      1
> 3       2       osd.3   up      1
> 4       2
Re: [ceph-users] Calculate pgs
On 21.04.2013, at 17:50, Gregory Farnum wrote:
> On Sun, Apr 21, 2013 at 12:39 AM, Stefan Priebe - Profihost AG wrote:
>> I know it is 100*num_of_osds/repl_factor. But I also read somewhere that it
>> should be a value of 2^x. Is this still correct? So for 24 OSDs and repl 3:
>> 100*24/3 => 800 => rounded up to 2^x => 1024?
>
> PG counts of 2^x ensure that each PG covers exactly the same amount of
> hash space. This can improve your distribution, but at reasonable PG
> counts it is definitely not critical — I might use that if I were going
> to do fewer than 50 PGs/OSD like you are, but it shouldn't be
> necessary.

But going under 50 PGs per pool is what the ceph docs suggest, by dividing through the replication factor.

Stefan

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
Re: [ceph-users] "rbd snap rm" overload my cluster (during backups)
I use ceph 0.56.4; and to be fair, a lot of stuff is «doing badly» on my cluster, so maybe I have a general OSD problem.

On Sunday, 21 April 2013 at 08:44 -0700, Gregory Farnum wrote:
> Which version of Ceph are you running right now and seeing this with
> (Sam reworked it a bit for Cuttlefish and it was in some of the dev
> releases)? Snapshot deletes are a little more expensive than we'd
> like, but I'm surprised they're doing this badly for you. :/
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> On Sun, Apr 21, 2013 at 2:16 AM, Olivier Bonvalet wrote:
> > Hi,
> >
> > I have a backup script which, every night:
> > * creates a snapshot of each RBD image
> > * then deletes all snapshots that are more than 15 days old
> >
> > The problem is that "rbd snap rm XXX" will overload my cluster for hours
> > (6 hours today...).
> >
> > Here I see several problems:
> > #1 "rbd snap rm XXX" is not blocking. The erase is done in the background,
> > and I know of no way to verify whether it has completed. So I add "sleeps"
> > between the rm calls, but I have to estimate the time they will take.
> >
> > #2 "rbd (snap) rm" is sometimes very, very slow. I don't know if it's
> > because of XFS or not, but all my OSDs are at 100% IO usage (reported by
> > iostat).
> >
> > So:
> > * is there a way to reduce the priority of "snap rm", to avoid overloading
> >   the cluster?
> > * is there a way to have a blocking "snap rm" which will wait until it's
> >   completed?
> > * is there a way to speed up "snap rm"?
> >
> > Note that I have too low a PG count on my cluster (200 PGs for 40 active
> > OSDs; but I'm trying to progressively migrate data to a newer pool). Can
> > that be the source of the problem?
> >
> > Thanks,
> >
> > Olivier
Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX
I like s3cmd, but it only allows you to manipulate buckets with at least one capital letter.

On Sun, Apr 21, 2013 at 2:05 PM, Igor Laskovy wrote:
> Just initial connection to the rgw server, nothing further.
> Please see below the behavior for the CrossFTP and S3Browser cases.
>
> On the CrossFTP side:
> [R1] Connect to rgw.labspace
> [R1] Current path: /
> [R1] Current path: /
> [R1] LIST /
> [R1] Expected XML document response from S3 but received content type
> text/html
> [R1] Disconnected
>
> On the rgw side:
> root@osd01:~# ps aux | grep rados
> root 1785 0.4 0.1 2045404 6068 ? Ssl 19:47 0:00
> /usr/bin/radosgw -n client.radosgw.a
>
> root@osd01:~# tail -f /var/log/apache2/error.log
> [Sun Apr 21 19:43:56 2013] [notice] FastCGI: process manager initialized
> (pid 1433)
> [Sun Apr 21 19:43:56 2013] [notice] Apache/2.2.22 (Ubuntu)
> mod_fastcgi/mod_fastcgi-SNAP-0910052141 mod_ssl/2.2.22 OpenSSL/1.0.1
> configured -- resuming normal operations
> [Sun Apr 21 19:50:19 2013] [error] [client 192.168.1.51] File does not
> exist: /var/www/favicon.ico
>
> tail -f /var/log/apache2/access.log
> nothing
>
> On the S3Browser side:
> [image: Inline image 2]
> [4/21/2013 7:56 PM] Getting buckets list... TaskID: 2
> [4/21/2013 7:56 PM] System.Net.WebException: The underlying connection was
> closed: An unexpected error occurred on a send. TaskID: 2 TaskID: 2
> [4/21/2013 7:56 PM] Error occurred during Getting buckets list TaskID: 2
>
> On the rgw side:
>
> root@osd01:~# tail -f /var/log/apache2/error.log
> [Sun Apr 21 19:56:19 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:22 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
>
> tail -f /var/log/apache2/access.log
> nothing
>
>
> On Sun, Apr 21, 2013 at 7:43 PM, Yehuda Sadeh wrote:
>>
>> On Sun, Apr 21, 2013 at 9:39 AM, Igor Laskovy wrote:
>> > Well, in each case something specific. CrossFTP, for example, says that
>> > when asking the server it receives text data instead of XML.
>>
>> When doing what? Are you able to do anything?
>>
>> > In the logs on the server side I haven't found anything interesting.
>>
>> What do the apache access and error logs show?
>>
>> > I did everything shown at http://ceph.com/docs/master/radosgw/ and only
>> > that, excluding the Swift-compatible preparation.
>> > Maybe something additional is needed? Manual creation of a root bucket
>> > or something like that?
>> >
>> > On Sun, Apr 21, 2013 at 6:53 PM, Yehuda Sadeh wrote:
>> >>
>> >> On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy wrote:
>> >> > A little bit more.
>> >> >
>> >> > I have tried deploying RGW via http://ceph.com/docs/master/radosgw/ and
>> >> > then connecting the S3 Browser, CrossFTP and CloudBerry Explorer
>> >> > clients, but all unsuccessfully.
>> >> >
>> >> > Again my question, does anybody use S3 desktop clients with RGW?
>> >>
>> >> These applications should be compatible with rgw. Are you sure your
>> >> setup works? What are you getting?
>> >>
>> >> Yehuda
>> >
>> > --
>> > Igor Laskovy
>> > facebook.com/igor.laskovy
>> > studiogrizzly.com
>
> --
> Igor Laskovy
> facebook.com/igor.laskovy
> studiogrizzly.com
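For a command-line cross-check of the gateway, s3cmd can be pointed at it roughly like this (host name and bucket name are placeholders; as noted above, older s3cmd builds can be picky about bucket-name case):

    # create a config, then point it at the gateway instead of Amazon by
    # editing host_base / host_bucket in ~/.s3cfg (e.g. to rgw.labspace)
    s3cmd --configure

    # basic smoke test
    s3cmd mb s3://Testbucket
    s3cmd ls
    s3cmd put somefile s3://Testbucket/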
Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX
On Sun, Apr 21, 2013 at 10:05 AM, Igor Laskovy wrote:
> Just initial connection to the rgw server, nothing further.
> Please see below the behavior for the CrossFTP and S3Browser cases.
>
> On the CrossFTP side:
> [R1] Connect to rgw.labspace
> [R1] Current path: /
> [R1] Current path: /
> [R1] LIST /
> [R1] Expected XML document response from S3 but received content type
> text/html
> [R1] Disconnected
>
> On the rgw side:
> root@osd01:~# ps aux | grep rados
> root 1785 0.4 0.1 2045404 6068 ? Ssl 19:47 0:00
> /usr/bin/radosgw -n client.radosgw.a
>
> root@osd01:~# tail -f /var/log/apache2/error.log
> [Sun Apr 21 19:43:56 2013] [notice] FastCGI: process manager initialized
> (pid 1433)
> [Sun Apr 21 19:43:56 2013] [notice] Apache/2.2.22 (Ubuntu)
> mod_fastcgi/mod_fastcgi-SNAP-0910052141 mod_ssl/2.2.22 OpenSSL/1.0.1
> configured -- resuming normal operations
> [Sun Apr 21 19:50:19 2013] [error] [client 192.168.1.51] File does not
> exist: /var/www/favicon.ico

Doesn't seem like your apache is configured right. What does your site config file look like? Do you have any other sites configured (e.g., the default one)? Try listing whatever is under /etc/apache2/sites-enabled and see if there's anything else there.

> tail -f /var/log/apache2/access.log
> nothing
>
> On the S3Browser side:
>
> [4/21/2013 7:56 PM] Getting buckets list... TaskID: 2
> [4/21/2013 7:56 PM] System.Net.WebException: The underlying connection was
> closed: An unexpected error occurred on a send. TaskID: 2 TaskID: 2
> [4/21/2013 7:56 PM] Error occurred during Getting buckets list TaskID: 2
>
> On the rgw side:
>
> root@osd01:~# tail -f /var/log/apache2/error.log
> [Sun Apr 21 19:56:19 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:22 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
>
> tail -f /var/log/apache2/access.log
> nothing
>
>
> On Sun, Apr 21, 2013 at 7:43 PM, Yehuda Sadeh wrote:
>>
>> On Sun, Apr 21, 2013 at 9:39 AM, Igor Laskovy wrote:
>> > Well, in each case something specific. CrossFTP, for example, says that
>> > when asking the server it receives text data instead of XML.
>>
>> When doing what? Are you able to do anything?
>>
>> > In the logs on the server side I haven't found anything interesting.
>>
>> What do the apache access and error logs show?
>>
>> > I did everything shown at http://ceph.com/docs/master/radosgw/ and only
>> > that, excluding the Swift-compatible preparation.
>> > Maybe something additional is needed? Manual creation of a root bucket
>> > or something like that?
>> >
>> > On Sun, Apr 21, 2013 at 6:53 PM, Yehuda Sadeh wrote:
>> >>
>> >> On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy wrote:
>> >> > A little bit more.
>> >> >
>> >> > I have tried deploying RGW via http://ceph.com/docs/master/radosgw/ and
>> >> > then connecting the S3 Browser, CrossFTP and CloudBerry Explorer
>> >> > clients, but all unsuccessfully.
>> >> >
>> >> > Again my question, does anybody use S3 desktop clients with RGW?
>> >>
>> >> These applications should be compatible with rgw. Are you sure your
>> >> setup works? What are you getting?
>> >>
>> >> Yehuda
>
> --
> Igor Laskovy
> facebook.com/igor.laskovy
> studiogrizzly.com
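Following Yehuda's suggestion on a stock Ubuntu/Apache install would look something like this (site names are examples; "rgw.conf" stands for whatever the gateway vhost file is actually called):

    # see which sites are active; a leftover default vhost can swallow requests
    ls -l /etc/apache2/sites-enabled

    # if the default site is still enabled alongside the gateway vhost
    a2dissite default
    a2ensite rgw.conf
    service apache2 reload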