[ceph-users] Delete default pools?

2013-04-21 Thread Stefan Priebe - Profihost AG
Is it safe to delete all default pools?

Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Calculate pgs

2013-04-21 Thread Stefan Priebe - Profihost AG
I know the rule of thumb is 100 * num_osds / repl_factor. But I also read somewhere that it
should be a power of two (2^X). Is this still correct? So for 24 osds and repl 3: 100*24/3 =>
800 => rounded to 2^X => 1024?
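For what it's worth, the arithmetic above can be sketched in shell; the rule of thumb is the one quoted in this thread, and the rounding helper is just illustration:

```shell
# pg_target applies the 100 * OSDs / replicas rule of thumb;
# next_pow2 rounds the result up to the nearest power of two.
pg_target() {
  echo $(( 100 * $1 / $2 ))
}
next_pow2() {
  p=1
  while [ "$p" -lt "$1" ]; do p=$(( p * 2 )); done
  echo "$p"
}
pg_target 24 3                 # prints 800
next_pow2 "$(pg_target 24 3)"  # prints 1024
```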

Greets
Stefan


[ceph-users] "rbd snap rm" overload my cluster (during backups)

2013-04-21 Thread Olivier Bonvalet
Hi,

I have a backup script which, every night:
* creates a snapshot of each RBD image
* then deletes all snapshots older than 15 days

The problem is that "rbd snap rm XXX" will overload my cluster for hours
(6 hours today...).

Here I see several problems :
#1 "rbd snap rm XXX" is not blocking. The erase is done in background,
and I know no way to verify if it was completed. So I add "sleeps"
between rm, but I have to estimate the time it will take

#2 "rbd (snap) rm" are sometimes very very slow. I don't know if it's
because of XFS or not, but all my OSD are at 100% IO usage (reported by
iostat)



So:
* is there a way to reduce the priority of "snap rm", to avoid overloading
the cluster?
* is there a way to have a blocking "snap rm" that waits until it's
completed?
* is there a way to speed up "snap rm"?
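A throttled version of the nightly cleanup described in this post might look like the sketch below. The pool name, image names, and the convention that snapshots are named YYYY-MM-DD are assumptions, not details from the thread, and the sleep is crude pacing rather than a real completion check:

```shell
# Lexical comparison works for ISO-formatted dates (portable, no bashisms).
is_older_than() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort | head -n 1)" = "$1" ]
}

cleanup_image() {
  pool=$1 img=$2
  cutoff=$(date -d '15 days ago' +%Y-%m-%d)
  # Parsing "rbd snap ls" output (column 2 = snapshot name) is an assumption.
  rbd snap ls "$pool/$img" | awk 'NR > 1 {print $2}' | while read -r snap; do
    if is_older_than "$snap" "$cutoff"; then
      rbd snap rm "$pool/$img@$snap"
      sleep 300   # crude pacing; the actual trim still runs in the background
    fi
  done
}
# cleanup_image rbd vm-disk-1
```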


Note that I have a too-low PG count on my cluster (200 PGs for 40 active
OSDs; I'm trying to progressively migrate data to a newer pool). Could
that be the source of the problem?

Thanks,

Olivier



Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX

2013-04-21 Thread Igor Laskovy
A little bit more.

I have tried deploying RGW per http://ceph.com/docs/master/radosgw/ and then
connecting the S3 Browser, CrossFTP and CloudBerry Explorer clients, but all
unsuccessfully.

Again my question, does anybody use S3 desktop clients with RGW?


On Fri, Apr 19, 2013 at 10:54 PM, Igor Laskovy wrote:

> Hello!
>
> Does anybody use Rados Gateway via S3-compatible clients on desktop
> systems?
>
> --
> Igor Laskovy
> facebook.com/igor.laskovy
> studiogrizzly.com
>



-- 
Igor Laskovy
facebook.com/igor.laskovy
studiogrizzly.com


[ceph-users] Way to copy rbd Disk image incl snapshots?

2013-04-21 Thread Stefan Priebe - Profihost AG
Hi,

is there a way to copy an RBD disk image, including its snapshots, from one pool to another?

Stefan


Re: [ceph-users] RBD performance test (write) problem

2013-04-21 Thread Kelvin_Huang
Hi all,
Any comments about 1) or 2)?

Thanks!!!



Hi Mark,

Sorry for the late reply; I didn't receive this mail, so I missed the
message for several days...
http://www.mail-archive.com/ceph-users@lists.ceph.com/msg00624.html


Your advice was very, very helpful to me!!! thanks :)

I have done the following tests and have some questions.


1)  I concurrently ran dd if=/dev/zero of=/dev/sd[b,c,d,e,f ...n] bs=4096k
count=1 oflag=direct on each SATA disk.

collectl show:
#<--Disks---><--Network-->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
   0   0  2935636  0  0 866560   2708  1  9  0   1
   0   0  2939718  0  0 865620   2708  2 14  1   4
   0   0  2872631  0  0 868480   2714  1  8  0   1
   0   0  2937621  0  0 864640   2702  1  9  0   4

total write throughput about 860MB/s
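The concurrent dd run above can be scripted along these lines; the device list is a placeholder, and writing to raw devices is of course destructive:

```shell
# Launch one direct-I/O writer per data disk in parallel, then wait for all.
# WARNING: this overwrites the start of each device passed in.
raw_write_test() {
  for d in "$@"; do
    dd if=/dev/zero of="$d" bs=4096k count=1 oflag=direct &
  done
  wait
}
# raw_write_test /dev/sdb /dev/sdc /dev/sdd
```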

use RADOS bench : rados -p rbd bench 300 write -t 256
collectl show:
#<--Disks---><--Network-->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
  22  10  6991  17947  4  1   999K   3111  4 31 48  22
  18   8  6151  16116  0  0  1003K   2858  8 40 23  37
  19   9  6295  16031  8  2  1002K   2458  2 22 44  17

total write throughput about 1000MB/s

The expander backplane runs at 3.0Gb/s with a 4-lane Mini-SAS connection:
4 * 3Gb/s = 12Gb/s ~= 1GB/s, so I think write throughput is stuck at
~1000MB/s because the expander backplane is the bottleneck for sequential
writes. If the expander backplane could run at 6.0Gb/s, total write
throughput would increase, right?




2)
OSDs & journal setting:
a. OSD filesystems are EXT4, with no osd mkfs options used:
osd mkfs type = ext4
osd mount options ext4 = rw,data=writeback,errors=remount-ro,noatime,nodiratime,user_xattr
filestore_xattr_use_omap = true

b. SSD journals are raw devices with no filesystem, divided into two
(aligned) partitions

LSI MegaRAID SAS 9260-4i setting:
a. every HDD : RAID0 , Write Policy: Write Back with BBU, Read Policy: 
ReadAhead, IO Policy: Direct, Disk cache: unchanged
b. every SSD  : RAID0 , Write Policy: Write Through, Read Policy: NoReadAhead, 
IO Policy: Direct, Disk cache: disabled

Because the last results were for pool size=576, I did a new test with pool
size=2048 and a 9 OSDs + 4 SSDs configuration!!

Read: rados -p testpool bench 300 seq -t 256
Write: rados  -p testpool bench 300 write -t 256 --no-cleanup


Rados Bench TEST (Read):
2x replication & 12 OSDs case: Bandwidth (MB/sec): 1373.013
1x replication & 12 OSDs case: Bandwidth (MB/sec): 1478.694

2x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec):1442.543
1x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec):1448.407

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):1485.175
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec):447.245



Rados Bench TEST (Write):
2x replication & 12 OSDs case: Bandwidth (MB/sec): 228.064
1x replication & 12 OSDs case: Bandwidth (MB/sec): 457.213

2x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec): 224.430
1x replication & 8 OSDs + 4 SSDs (Journal) case: Bandwidth (MB/sec): 482.104

2x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 239
1x replication & 9 OSDs + 3 SSDs (Journal) case: Bandwidth (MB/sec): 485

In the Rados Bench TEST (Read), I originally expected more OSDs to increase
the read bandwidth, but the results show about 1400MB/s in most cases. Is
this cache intervention? I didn't see any read operations on the disks...

In the Rados Bench TEST (Write), I tested the 9 OSDs + 3 SSDs (Journal)
configuration, observing with collectl:

# DISK STATISTICS (/sec)
#          <---------reads--------->  <--------writes-------->                        Pct
#Name   KBytes Merged  IOs Size   KBytes Merged  IOs Size   RWSize  QLen  Wait SvcTim Util
sdc          0      0    0    0    47336    382  335  141      141     0     2      0    7
sdd          0      0    0    0    65600      0  240  273      273     1     4      0    7
sde          0      0    0    0    56440      0  207  273      272    64   342      4   99
sdg          0      0    0    0    43544    450  326  134      133    39   135      3   99
sdf          0      0    0    0    65600      0  240  273      273     0     2      0    7
sdh          0      0    0    0    57400      0  210  273      273     0     2      0    7
sdi          0      0    0    0    69012    227  251  275      274    90   560      3   99
sdj          0      0    0    0    66944    424  308  217      217     1     5      0    7
sdb          0      0    0    0

Re: [ceph-users] Cephfs unaccessible

2013-04-21 Thread Marco Aroldi
So, I've restarted as many of the new OSDs as possible and the cluster
started moving data to the 2 new nodes overnight.
This morning there was no network traffic and the health was:
HEALTH_ERR 1323 pgs backfill; 150 pgs backfill_toofull; 100 pgs
backfilling; 114 pgs degraded; 3374 pgs peering; 36 pgs recovering;
949 pgs recovery_wait; 3374 pgs stuck inactive; 6289 pgs stuck
unclean; recovery 2130652/20890113 degraded (10.199%); 58/8914654
unfound (0.001%); 1 full osd(s); 22 near full osd(s); full,noup,nodown
flag(s) set

So I unset the noup and nodown flags and the data started moving again.
I've increased the full ratio to 97%, so now there's no "official" full
OSD and HEALTH_ERR became HEALTH_WARN.
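The steps just described correspond to commands along these lines; the syntax is the bobtail-era mon interface, and whether set_full_ratio is the right knob for your release is an assumption worth checking:

```shell
# Clear the flags that were pinning OSD state, then raise the full ratio
# slightly. Raise it only a little: a truly full OSD can crash on write.
unstick_cluster() {
  ceph osd unset noup
  ceph osd unset nodown
  ceph pg set_full_ratio 0.97
}
# unstick_cluster
```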

However, there is still no access to the filesystem:

HEALTH_WARN 1906 pgs backfill; 21 pgs backfill_toofull; 52 pgs
backfilling; 707 pgs degraded; 371 pgs down; 97 pgs incomplete; 3385
pgs peering; 35 pgs recovering; 1002 pgs recovery_wait; 4 pgs stale;
683 pgs stuck inactive; 5898 pgs stuck unclean; recovery
3081499/22208859 degraded (13.875%); 487/9433642 unfound (0.005%);
recovering 11722 o/s, 57040MB/s; 17 near full osd(s)

The OSDs are flapping in/out again...

I'm willing to start deleting some portion of the data.
What can I try to do now?

2013/4/21 Gregory Farnum :
> It's not entirely clear from your description and the output you've
> given us, but it looks like maybe you've managed to bring up all your
> OSDs correctly at this point? Or are they just not reporting down
> because you set the "no down" flag...
>
> In any case, CephFS isn't going to come up while the underlying RADOS
> cluster is this unhealthy, so you're going to need to get that going
> again. Since your OSDs have managed to get themselves so full it's
> going to be trickier than normal, but if all the rebalancing that's
> happening is only because you sort-of-didn't-really lose nodes, and
> you can bring them all back up, you should be able to sort it out by
> getting all the nodes back up, and then changing your full percentages
> (by a *very small* amount); since you haven't been doing any writes to
> the cluster it shouldn't take much data writes to get everything back
> where it was, although if this has been continuing to backfill in the
> meanwhile that will need to unwind.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Sat, Apr 20, 2013 at 12:21 PM, John Wilkins  
> wrote:
>> I don't see anything related to lost objects in your output. I just see
>> waiting on backfill, backfill_toofull, remapped, and so forth. You can read
>> a bit about what is going on here:
>> http://ceph.com/docs/next/rados/operations/monitoring-osd-pg/
>>
>> Keep us posted as to the recovery, and let me know what I can do to improve
>> the docs for scenarios like this.
>>
>>
>> On Sat, Apr 20, 2013 at 10:52 AM, Marco Aroldi 
>> wrote:
>>>
>>> John,
>>> thanks for the quick reply.
>>> Below you can see my ceph osd tree
>>> The problem was caused not by the failure itself, but by the "renamed"
>>> bunch of devices.
>>> It was like a deadly 15-puzzle.
>>> I think the solution is to mount the devices in fstab using UUIDs
>>> (/dev/disk/by-uuid) instead of /dev/sdX.
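The UUID-based fstab approach mentioned above can be sketched as follows; the mount-path pattern and mount options are assumptions, not details from this cluster:

```shell
# Emit UUID-based fstab lines for currently mounted ceph OSD partitions so
# that device renumbering (sdX shuffling) cannot break the mapping.
fstab_by_uuid() {
  awk '$2 ~ /ceph/ {print $1, $2, $3}' /proc/mounts | while read -r dev mnt fs; do
    uuid=$(blkid -s UUID -o value "$dev" 2>/dev/null)
    [ -n "$uuid" ] && echo "UUID=$uuid  $mnt  $fs  noatime  0 2"
  done
}
# fstab_by_uuid >> /etc/fstab   # review the output before appending!
```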
>>>
>>> However, yes I have an entry in my ceph.conf (devs = /dev/sdX1 --
>>> osd_journal = /dev/sdX2) *and* an entry in my fstab for each OSD
>>>
>>> The node with failed disk is s103 (osd.59)
>>>
>>> Now I have 5 OSDs from s203 up and in, to let Ceph rebalance the
>>> data... but it is still a bloody mess.
>>> Look at the ceph -w output: it reports a total of 110TB, which is
>>> wrong... all drives are 2TB and I have 49 drives up and in -- total 98TB.
>>> I think that 110TB (55 osds) was the size before the cluster became
>>> inaccessible.
>>>
>>> # id    weight  type name       up/down reweight
>>> -1      130     root default
>>> -9      65        room p1
>>> -3      44          rack r14
>>> -4      22            host s101
>>> 11      2               osd.11  up      1
>>> 12      2               osd.12  up      1
>>> 13      2               osd.13  up      1
>>> 14      2               osd.14  up      1
>>> 15      2               osd.15  up      1
>>> 16      2               osd.16  up      1
>>> 17      2               osd.17  up      1
>>> 18      2               osd.18  up      1
>>> 19      2               osd.19  up      1
>>> 20      2               osd.20  up      1
>>> 21      2               osd.21  up      1
>>> -6      22            host s102
>>> 33      2               osd.33  up      1
>>> 34      2               osd.34  up      1
>>> 35      2               osd.35  up      1
>>> 36      2               osd.36  up      1
>>> 37      2               osd.37  up      1
>>> 38      2               osd.38  up      1
>>> 39      2               osd.39  up      1
>>> 40      2               osd.40  up      1
>>> 41      2               osd.41  up      1
>>> 42      2               osd.42  up      1
>>> 43      2               osd.43  up      1

Re: [ceph-users] Delete default pools?

2013-04-21 Thread Gregory Farnum
On Sun, Apr 21, 2013 at 12:35 AM, Stefan Priebe - Profihost AG
 wrote:
> Is it safe to delete all default pools?

As long as you don't have any data you need in there; the system won't
break without them or anything like that. They're favored only in that
tools default to using them (eg the rbd tools default to using the rbd
pool).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] "rbd snap rm" overload my cluster (during backups)

2013-04-21 Thread Gregory Farnum
Which version of Ceph are you running right now and seeing this with
(Sam reworked it a bit for Cuttlefish and it was in some of the dev
releases)? Snapshot deletes are a little more expensive than we'd
like, but I'm surprised they're doing this badly for you. :/
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Sun, Apr 21, 2013 at 2:16 AM, Olivier Bonvalet
 wrote:
> Hi,
>
> I have a backup script, which every night :
> * create a snapshot of each RBD image
> * then delete all snapshot that have more than 15 days
>
> The problem is that "rbd snap rm XXX" will overload my cluster for hours
> (6 hours today...).
>
> Here I see several problems :
> #1 "rbd snap rm XXX" is not blocking. The erase is done in background,
> and I know no way to verify if it was completed. So I add "sleeps"
> between rm, but I have to estimate the time it will take
>
> #2 "rbd (snap) rm" are sometimes very very slow. I don't know if it's
> because of XFS or not, but all my OSD are at 100% IO usage (reported by
> iostat)
>
>
>
> So :
> * is there a way to reduce priority of "snap rm", to avoid overloading
> of the cluster ?
> * is there a way to have a blocking "snap rm" which will wait until it's
> completed
> * is there a way to speedup "snap rm" ?
>
>
> Note that I have a too low PG number on my cluster (200 PG for 40 active
> OSD ; but I'm trying to progressivly migrate data to a newer pool). Can
> it be the source of the problem ?
>
> Thanks,
>
> Olivier
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Way to copy rbd Disk image incl snapshots?

2013-04-21 Thread Gregory Farnum
On Sun, Apr 21, 2013 at 5:01 AM, Stefan Priebe - Profihost AG
 wrote:
> Hi,
>
> is there a way to copy a rbd disk image incl snapshots from one pool to 
> another?

Not directly and not right now, sorry. What are you trying to do?
Would it suffice for instance to manually create all the snapshots you
care about by copying the first, snapshotting, copying the changes in
the second, snapshotting, etc?
I believe there's work going on to make this easier for the Dumpling release.
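The manual approach described here became straightforward once the export-diff/import-diff commands landed in later Ceph releases (so this will not work on bobtail-era clusters). A hedged sketch, with hypothetical pool/image/snapshot names; import-diff recreates the end snapshot on the target by itself:

```shell
# Copy an image's base snapshot, then replay each subsequent delta.
# Usage: copy_with_snaps <src-image> <dst-image> <snap1> [snap2 ...]
copy_with_snaps() {
  src=$1 dst=$2; shift 2
  first=$1; shift
  rbd export "$src@$first" - | rbd import - "$dst"
  rbd snap create "$dst@$first"
  prev=$first
  for snap in "$@"; do
    # import-diff applies the delta and creates "$dst@$snap" automatically
    rbd export-diff --from-snap "$prev" "$src@$snap" - | rbd import-diff - "$dst"
    prev=$snap
  done
}
# copy_with_snaps oldpool/vm1 newpool/vm1 snap1 snap2 snap3
```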
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] Calculate pgs

2013-04-21 Thread Gregory Farnum
On Sun, Apr 21, 2013 at 12:39 AM, Stefan Priebe - Profihost AG
 wrote:
> I know it is 100*numofosds/replfactor. But I also read somewhere that it 
> should be a value of 2^X. It this still correct? So for 24 osds and repl 3 
> 100*24/3 => 800 => to be 2^X => 1024?

PG counts of 2^x ensure that each PG covers exactly the same amount of
hash space. This can improve your distribution, but at reasonable PG
counts is definitely not critical — I might use that if I were going
to do less than 50 PGs/OSD like you are, but it shouldn't be
necessary.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX

2013-04-21 Thread Yehuda Sadeh
On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy  wrote:
> A little bit more.
>
> I have tried deploy RGW via http://ceph.com/docs/master/radosgw/ and than
> connect S3 Browser, CrossFTP and CloudBerry Explorer clients, but all
> unsuccessfully.
>
> Again my question, does anybody use S3 desktop clients with RGW?

These applications should be compatible with rgw. Are you sure your
setup works? What are you getting?

Yehuda


Re: [ceph-users] Delete default pools?

2013-04-21 Thread Stefan Priebe - Profihost AG
Am 21.04.2013 um 17:41 schrieb Gregory Farnum :

> On Sun, Apr 21, 2013 at 12:35 AM, Stefan Priebe - Profihost AG
>  wrote:
>> Is it safe to delete all default pools?
> 
> As long as you don't have any data you need in there; the system won't
> break without them or anything like that. They're favored only in that
> tools default to using them (eg the rbd tools default to using the rbd
> pool).

Thanks! No data in it.

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] Way to copy rbd Disk image incl snapshots?

2013-04-21 Thread Stefan Priebe - Profihost AG
Am 21.04.2013 um 17:47 schrieb Gregory Farnum :

> On Sun, Apr 21, 2013 at 5:01 AM, Stefan Priebe - Profihost AG
>  wrote:
>> Hi,
>> 
>> is there a way to copy a rbd disk image incl snapshots from one pool to 
>> another?
> 
> Not directly and not right now, sorry. What are you trying to do?
> Would it suffice for instance to manually create all the snapshots you
> care about by copying the first, snapshotting, copying the changes in
> the second, snapshotting, etc?
> I believe there's work going on to make this easier for the Dumpling release.

I have 8192 PGs for 24 OSDs and repl 3; I thought this was way too much. So my
idea was to create a new pool and copy every disk...

Stefan


Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX

2013-04-21 Thread Igor Laskovy
Well, in each case it fails with something specific. CrossFTP, for example,
says that when querying the server it received text data instead of XML.
In the logs on the server side I didn't find anything interesting.

I did everything shown at http://ceph.com/docs/master/radosgw/ and only
that, excluding the Swift-compatible preparation.
Maybe something additional is needed? Manually creating a root bucket
or something like that?


On Sun, Apr 21, 2013 at 6:53 PM, Yehuda Sadeh  wrote:

> On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy 
> wrote:
> > A little bit more.
> >
> > I have tried deploy RGW via http://ceph.com/docs/master/radosgw/ and
> than
> > connect S3 Browser, CrossFTP and CloudBerry Explorer clients, but all
> > unsuccessfully.
> >
> > Again my question, does anybody use S3 desktop clients with RGW?
>
> These applications should be compatible with rgw. Are you sure your
> setup works? What are you getting?
>
> Yehuda
>



-- 
Igor Laskovy
facebook.com/igor.laskovy
studiogrizzly.com


Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX

2013-04-21 Thread Yehuda Sadeh
On Sun, Apr 21, 2013 at 9:39 AM, Igor Laskovy  wrote:
> Well, in each case something specific. For CrossFTP, for example, it says
> that asking the server it receive text data instead of XML.

When doing what? Are you able to do anything?

> In logs on servers side I don't found something interested.

What do the apache access and error logs show?

>
> I do everything shown at http://ceph.com/docs/master/radosgw/ and only that,
> excluding swift compatible preparation.
> May be there are needs something additional? Manual creating of root bucket
> or something like that?
>
>
> On Sun, Apr 21, 2013 at 6:53 PM, Yehuda Sadeh  wrote:
>>
>> On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy 
>> wrote:
>> > A little bit more.
>> >
>> > I have tried deploy RGW via http://ceph.com/docs/master/radosgw/ and
>> > than
>> > connect S3 Browser, CrossFTP and CloudBerry Explorer clients, but all
>> > unsuccessfully.
>> >
>> > Again my question, does anybody use S3 desktop clients with RGW?
>>
>> These applications should be compatible with rgw. Are you sure your
>> setup works? What are you getting?
>>
>> Yehuda
>
>
>
>
> --
> Igor Laskovy
> facebook.com/igor.laskovy
> studiogrizzly.com


Re: [ceph-users] Cephfs unaccessible

2013-04-21 Thread Marco Aroldi
What can I try to do or delete to regain access?
Those OSDs are going crazy, flapping up and down. I think the situation
is out of control.


HEALTH_WARN 2735 pgs backfill; 13 pgs backfill_toofull; 157 pgs
backfilling; 188 pgs degraded; 251 pgs peering; 13 pgs recovering;
1159 pgs recovery_wait; 159 pgs stuck inactive; 4641 pgs stuck
unclean; recovery 4007916/23007073 degraded (17.420%);  recovering 4
o/s, 31927KB/s; 19 near full osd(s)

2013-04-21 18:56:46.839851 mon.0 [INF] pgmap v1399007: 17280 pgs: 276
active, 12791 active+clean, 2575 active+remapped+wait_backfill, 71
active+degraded+wait_backfill, 6
active+remapped+wait_backfill+backfill_toofull, 1121
active+recovery_wait, 90 peering, 3 remapped, 1 active+remapped, 127
active+remapped+backfilling, 1 active+degraded, 5
active+remapped+backfill_toofull, 19 active+degraded+backfilling, 1
active+clean+scrubbing, 79 active+degraded+remapped+wait_backfill, 36
active+recovery_wait+remapped, 1
active+degraded+remapped+wait_backfill+backfill_toofull, 46
remapped+peering, 16 active+degraded+remapped+backfilling, 1
active+recovery_wait+degraded+remapped, 14 active+recovering; 50435 GB
data, 74790 GB used, 38642 GB / 110 TB avail; 4018849/23025448
degraded (17.454%);  recovering 14 o/s, 54732KB/s

# id    weight  type name       up/down reweight
-1      130     root default
-9      65        room p1
-3      44          rack r14
-4      22            host s101
11      2               osd.11  up      1
12      2               osd.12  up      1
13      2               osd.13  up      1
14      2               osd.14  up      1
15      2               osd.15  up      1
16      2               osd.16  up      1
17      2               osd.17  up      1
18      2               osd.18  up      1
19      2               osd.19  up      1
20      2               osd.20  up      1
21      2               osd.21  up      1
-6      22            host s102
33      2               osd.33  up      1
34      2               osd.34  up      1
35      2               osd.35  up      1
36      2               osd.36  up      1
37      2               osd.37  up      1
38      2               osd.38  up      1
39      2               osd.39  up      1
40      2               osd.40  up      1
41      2               osd.41  up      1
42      2               osd.42  up      1
43      2               osd.43  up      1
-13     21          rack r10
-12     21            host s103
55      2               osd.55  up      1
56      2               osd.56  up      1
57      2               osd.57  up      1
58      2               osd.58  up      1
59      2               osd.59  down    0
60      2               osd.60  down    0
61      2               osd.61  down    0
62      2               osd.62  up      1
63      2               osd.63  up      1
64      1.5             osd.64  up      1
65      1.5             osd.65  down    0
-10     65        room p2
-7      22          rack r20
-5      22            host s202
22      2               osd.22  up      1
23      2               osd.23  up      1
24      2               osd.24  up      1
25      2               osd.25  up      1
26      2               osd.26  up      1
27      2               osd.27  up      1
28      2               osd.28  up      1
29      2               osd.29  up      1
30      2               osd.30  up      1
31      2               osd.31  up      1
32      2               osd.32  up      1
-8      22          rack r22
-2      22            host s201
0       2               osd.0   up      1
1       2               osd.1   up      1
2       2               osd.2   up      1
3       2               osd.3   up      1
4       2               osd.4   up      1
5       2               osd.5   up      1
6       2               osd.6   up      1
7       2               osd.7   up      1
8       2               osd.8   up      1
9       2               osd.9   up      1
10      2               osd.10  up      1
-14     21          rack r21
-11     21            host s203
44      2               osd.44  up      1
45      2               osd.45  up      1
46      2               osd.46  up      1
47      2               osd.47  up      1
48      2               osd.48  up      1
49      2               osd.49  up      1
50      2               osd.50  up      1
51      2               osd.51  up      1
52      1.5             osd.52  up      1
53      1.5             osd.53  up      1
54      2               osd.54  up      1


2013/4/21 Marco Aroldi :
> So, I've restarted the new osds as many as possible and the cluster
> started to move data to the 2 new nodes overnight.
> This morning there was not netowrk traffic and the healt was
>
> HEALTH_ERR 1323 pgs backfill; 150 pgs backfill_toofull; 100 pgs
> backfilling; 11

Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX

2013-04-21 Thread Igor Laskovy
Just the initial connection to the RGW server, nothing further.
Please see below the behavior for the CrossFTP and S3Browser cases.

On CrossFTP side:
[R1] Connect to rgw.labspace
[R1] Current path: /
[R1] Current path: /
[R1] LIST /
[R1] Expected XML document response from S3 but received content type
text/html
[R1] Disconnected

On rgw side:
root@osd01:~# ps aux |grep rados
root  1785  0.4  0.1 2045404 6068 ?Ssl  19:47   0:00
/usr/bin/radosgw -n client.radosgw.a

root@osd01:~# tail -f /var/log/apache2/error.log
[Sun Apr 21 19:43:56 2013] [notice] FastCGI: process manager initialized
(pid 1433)
[Sun Apr 21 19:43:56 2013] [notice] Apache/2.2.22 (Ubuntu)
mod_fastcgi/mod_fastcgi-SNAP-0910052141 mod_ssl/2.2.22 OpenSSL/1.0.1
configured -- resuming normal operations
[Sun Apr 21 19:50:19 2013] [error] [client 192.168.1.51] File does not
exist: /var/www/favicon.ico

tail -f /var/log/apache2/access.log
nothing

On S3browser side:
[4/21/2013 7:56 PM] Getting buckets list... TaskID: 2
[4/21/2013 7:56 PM] System.Net.WebException:The underlying connection was
closed: An unexpected error occurred on a send. TaskID: 2 TaskID: 2
[4/21/2013 7:56 PM] Error occurred during Getting buckets list TaskID: 2

On rgw side:

root@osd01:~# tail -f /var/log/apache2/error.log
[Sun Apr 21 19:56:19 2013] [error] [client 192.168.1.51] Invalid method in
request \x16\x03\x01
[Sun Apr 21 19:56:22 2013] [error] [client 192.168.1.51] Invalid method in
request \x16\x03\x01
[Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in
request \x16\x03\x01
[Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in
request \x16\x03\x01
[Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in
request \x16\x03\x01
[Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in
request \x16\x03\x01
[Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in
request \x16\x03\x01
[Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in
request \x16\x03\x01

tail -f /var/log/apache2/access.log
nothing
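For debugging a setup like this outside any GUI client, a minimal check can be made with curl; the hostname is the one from the thread and assumes the same DNS/vhost setup:

```shell
# An anonymous GET on / should come back from radosgw as XML
# (a ListAllMyBucketsResult or an XML error document), not as
# Apache's default HTML page.
check_rgw() {
  curl -s "http://$1/" | head -c 200
}
# check_rgw rgw.labspace
# Side note: the "Invalid method in request \x16\x03\x01" lines in error.log
# look like a TLS ClientHello hitting a plain-HTTP vhost, i.e. the client is
# configured for SSL while Apache is not listening for it.
```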



On Sun, Apr 21, 2013 at 7:43 PM, Yehuda Sadeh  wrote:

> On Sun, Apr 21, 2013 at 9:39 AM, Igor Laskovy 
> wrote:
> > Well, in each case something specific. For CrossFTP, for example, it says
> > that asking the server it receive text data instead of XML.
>
> When doing what? Are you able to do anything?
>
> > In logs on servers side I don't found something interested.
>
> What do the apache access and error logs show?
>
> >
> > I do everything shown at http://ceph.com/docs/master/radosgw/ and only
> that,
> > excluding swift compatible preparation.
> > May be there are needs something additional? Manual creating of root
> bucket
> > or something like that?
> >
> >
> > On Sun, Apr 21, 2013 at 6:53 PM, Yehuda Sadeh 
> wrote:
> >>
> >> On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy 
> >> wrote:
> >> > A little bit more.
> >> >
> >> > I have tried deploy RGW via http://ceph.com/docs/master/radosgw/ and
> >> > than
> >> > connect S3 Browser, CrossFTP and CloudBerry Explorer clients, but all
> >> > unsuccessfully.
> >> >
> >> > Again my question, does anybody use S3 desktop clients with RGW?
> >>
> >> These applications should be compatible with rgw. Are you sure your
> >> setup works? What are you getting?
> >>
> >> Yehuda
> >
> >
> >
> >
> > --
> > Igor Laskovy
> > facebook.com/igor.laskovy
> > studiogrizzly.com
>



-- 
Igor Laskovy
facebook.com/igor.laskovy
studiogrizzly.com


Re: [ceph-users] Cephfs unaccessible

2013-04-21 Thread Marco Aroldi
Greg, your supposition about the small amount of data to be written is
right, but the rebalance is writing an insane amount of data to the new
nodes, and the mount is still not working.

This is node s203 (the OS is on /dev/sdl, which is not listed):

/dev/sda1   1.9T  467G  1.4T  26% /var/lib/ceph/osd/ceph-44
/dev/sdb1   1.9T  595G  1.3T  33% /var/lib/ceph/osd/ceph-45
/dev/sdc1   1.9T  396G  1.5T  22% /var/lib/ceph/osd/ceph-46
/dev/sdd1   1.9T  401G  1.5T  22% /var/lib/ceph/osd/ceph-47
/dev/sde1   1.9T  337G  1.5T  19% /var/lib/ceph/osd/ceph-48
/dev/sdf1   1.9T  441G  1.4T  24% /var/lib/ceph/osd/ceph-49
/dev/sdg1   1.9T  338G  1.5T  19% /var/lib/ceph/osd/ceph-50
/dev/sdh1   1.9T  359G  1.5T  20% /var/lib/ceph/osd/ceph-51
/dev/sdi1   1.4T  281G  1.1T  21% /var/lib/ceph/osd/ceph-52
/dev/sdj1   1.4T  423G  964G  31% /var/lib/ceph/osd/ceph-53
/dev/sdk1   1.9T  421G  1.4T  23% /var/lib/ceph/osd/ceph-54

2013/4/21 Marco Aroldi :
> What I can try to do/delete to regain access?
> Those osd are crazy, flapping up and down. I think that the situation
> is without control
>
>
> HEALTH_WARN 2735 pgs backfill; 13 pgs backfill_toofull; 157 pgs
> backfilling; 188 pgs degraded; 251 pgs peering; 13 pgs recovering;
> 1159 pgs recovery_wait; 159 pgs stuck inactive; 4641 pgs stuck
> unclean; recovery 4007916/23007073 degraded (17.420%);  recovering 4
> o/s, 31927KB/s; 19 near full osd(s)
>
> 2013-04-21 18:56:46.839851 mon.0 [INF] pgmap v1399007: 17280 pgs: 276
> active, 12791 active+clean, 2575 active+remapped+wait_backfill, 71
> active+degraded+wait_backfill, 6
> active+remapped+wait_backfill+backfill_toofull, 1121
> active+recovery_wait, 90 peering, 3 remapped, 1 active+remapped, 127
> active+remapped+backfilling, 1 active+degraded, 5
> active+remapped+backfill_toofull, 19 active+degraded+backfilling, 1
> active+clean+scrubbing, 79 active+degraded+remapped+wait_backfill, 36
> active+recovery_wait+remapped, 1
> active+degraded+remapped+wait_backfill+backfill_toofull, 46
> remapped+peering, 16 active+degraded+remapped+backfilling, 1
> active+recovery_wait+degraded+remapped, 14 active+recovering; 50435 GB
> data, 74790 GB used, 38642 GB / 110 TB avail; 4018849/23025448
> degraded (17.454%);  recovering 14 o/s, 54732KB/s
>
> # id    weight  type name       up/down reweight
> -1      130     root default
> -9      65        room p1
> -3      44          rack r14
> -4      22            host s101
> 11      2               osd.11  up      1
> 12      2               osd.12  up      1
> 13      2               osd.13  up      1
> 14      2               osd.14  up      1
> 15      2               osd.15  up      1
> 16      2               osd.16  up      1
> 17      2               osd.17  up      1
> 18      2               osd.18  up      1
> 19      2               osd.19  up      1
> 20      2               osd.20  up      1
> 21      2               osd.21  up      1
> -6      22            host s102
> 33      2               osd.33  up      1
> 34      2               osd.34  up      1
> 35      2               osd.35  up      1
> 36      2               osd.36  up      1
> 37      2               osd.37  up      1
> 38      2               osd.38  up      1
> 39      2               osd.39  up      1
> 40      2               osd.40  up      1
> 41      2               osd.41  up      1
> 42      2               osd.42  up      1
> 43      2               osd.43  up      1
> -13     21          rack r10
> -12     21            host s103
> 55      2               osd.55  up      1
> 56      2               osd.56  up      1
> 57      2               osd.57  up      1
> 58      2               osd.58  up      1
> 59      2               osd.59  down    0
> 60      2               osd.60  down    0
> 61      2               osd.61  down    0
> 62      2               osd.62  up      1
> 63      2               osd.63  up      1
> 64      1.5             osd.64  up      1
> 65      1.5             osd.65  down    0
> -10     65        room p2
> -7      22          rack r20
> -5      22            host s202
> 22      2               osd.22  up      1
> 23      2               osd.23  up      1
> 24      2               osd.24  up      1
> 25      2               osd.25  up      1
> 26      2               osd.26  up      1
> 27      2               osd.27  up      1
> 28      2               osd.28  up      1
> 29      2               osd.29  up      1
> 30      2               osd.30  up      1
> 31      2               osd.31  up      1
> 32      2               osd.32  up      1
> -8      22          rack r22
> -2      22            host s201
> 0       2               osd.0   up      1
> 1       2               osd.1   up      1
> 2       2               osd.2   up      1
> 3       2               osd.3   up      1
> 42 

Re: [ceph-users] Calculate pgs

2013-04-21 Thread Stefan Priebe - Profihost AG

Am 21.04.2013 um 17:50 schrieb Gregory Farnum :

> On Sun, Apr 21, 2013 at 12:39 AM, Stefan Priebe - Profihost AG
>  wrote:
>> I know it is 100*numofosds/replfactor. But I also read somewhere that it 
>> should be a value of 2^X. It this still correct? So for 24 osds and repl 3 
>> 100*24/3 => 800 => to be 2^X => 1024?
> 
> PG counts of 2^x ensure that each PG covers exactly the same amount of
> hash space. This can improve your distribution, but at reasonable PG
> counts is definitely not critical — I might use that if I were going
> to do less than 50 PGs/OSD like you are, but it shouldn't be
> necessary.

But going under 50 PGs per OSD is what the Ceph docs suggest, by dividing
through the replication factor.

Stefan

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] "rbd snap rm" overload my cluster (during backups)

2013-04-21 Thread Olivier Bonvalet
I use Ceph 0.56.4; and to be fair, a lot of things are «doing badly» on
my cluster, so maybe I have a general OSD problem.


Le dimanche 21 avril 2013 à 08:44 -0700, Gregory Farnum a écrit :
> Which version of Ceph are you running right now and seeing this with
> (Sam reworked it a bit for Cuttlefish and it was in some of the dev
> releases)? Snapshot deletes are a little more expensive than we'd
> like, but I'm surprised they're doing this badly for you. :/
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 
> On Sun, Apr 21, 2013 at 2:16 AM, Olivier Bonvalet
>  wrote:
> > Hi,
> >
> > I have a backup script, which every night :
> > * create a snapshot of each RBD image
> > * then delete all snapshot that have more than 15 days
> >
> > The problem is that "rbd snap rm XXX" will overload my cluster for hours
> > (6 hours today...).
> >
> > Here I see several problems :
> > #1 "rbd snap rm XXX" is not blocking. The erase is done in the background,
> > and I know of no way to verify whether it has completed. So I add "sleeps"
> > between removals, but I have to estimate how long each will take
> >
> > #2 "rbd (snap) rm" is sometimes very, very slow. I don't know if it's
> > because of XFS or not, but all my OSDs are at 100% I/O usage (reported by
> > iostat)
> >
> >
> >
> > So:
> > * is there a way to reduce the priority of "snap rm", to avoid overloading
> > the cluster?
> > * is there a way to have a blocking "snap rm" that waits until it has
> > completed?
> > * is there a way to speed up "snap rm"?
> >
> >
> > Note that I have too low a PG count on my cluster (200 PGs for 40 active
> > OSDs; but I'm trying to progressively migrate data to a newer pool). Can
> > that be the source of the problem?
> >
> > Thanks,
> >
> > Olivier
> >
> 
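
Until a blocking variant exists, one way to avoid hand-estimated sleeps between removals is a generic poll loop around some idleness signal. A sketch; the `ceph health` check is only a stand-in assumption for whatever load signal fits your setup:

```python
import subprocess
import time

def wait_until(check, interval=30.0, timeout=6 * 3600):
    """Poll check() until it returns True or the timeout expires.
    Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

def cluster_is_idle():
    """Hypothetical idleness signal: HEALTH_OK from `ceph health`.
    Substitute whatever load metric fits (iostat, `ceph -s`, ...)."""
    out = subprocess.check_output(["ceph", "health"])
    return out.strip().startswith(b"HEALTH_OK")

# Between removals, wait for the cluster to settle instead of a fixed sleep:
# for snap in snaps_to_delete:
#     subprocess.check_call(["rbd", "snap", "rm", snap])
#     wait_until(cluster_is_idle)
```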




Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX

2013-04-21 Thread Lorieri
I like s3cmd, but it only lets you manipulate buckets with at least one
capital letter in the name




On Sun, Apr 21, 2013 at 2:05 PM, Igor Laskovy wrote:

> Just initial connect to rgw server, nothing further.
> Please see below behavior for CrossFTP and S3Browser cases.
>
> On CrossFTP side:
> [R1] Connect to rgw.labspace
> [R1] Current path: /
> [R1] Current path: /
> [R1] LIST /
> [R1] Expected XML document response from S3 but received content type
> text/html
> [R1] Disconnected
>
> On rgw side:
> root@osd01:~# ps aux |grep rados
> root  1785  0.4  0.1 2045404 6068 ?Ssl  19:47   0:00
> /usr/bin/radosgw -n client.radosgw.a
>
> root@osd01:~# tail -f /var/log/apache2/error.log
> [Sun Apr 21 19:43:56 2013] [notice] FastCGI: process manager initialized
> (pid 1433)
> [Sun Apr 21 19:43:56 2013] [notice] Apache/2.2.22 (Ubuntu)
> mod_fastcgi/mod_fastcgi-SNAP-0910052141 mod_ssl/2.2.22 OpenSSL/1.0.1
> configured -- resuming normal operations
> [Sun Apr 21 19:50:19 2013] [error] [client 192.168.1.51] File does not
> exist: /var/www/favicon.ico
>
> tail -f /var/log/apache2/access.log
> nothing
>
> On S3browser side:
> [image: Inline image 2]
> [4/21/2013 7:56 PM] Getting buckets list... TaskID: 2
> [4/21/2013 7:56 PM] System.Net.WebException:The underlying connection was
> closed: An unexpected error occurred on a send. TaskID: 2 TaskID: 2
> [4/21/2013 7:56 PM] Error occurred during Getting buckets list TaskID: 2
>
> On rgw side:
>
> root@osd01:~# tail -f /var/log/apache2/error.log
> [Sun Apr 21 19:56:19 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:22 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
> [Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in
> request \x16\x03\x01
>
> tail -f /var/log/apache2/access.log
> nothing
>
>
>
> On Sun, Apr 21, 2013 at 7:43 PM, Yehuda Sadeh  wrote:
>
>> On Sun, Apr 21, 2013 at 9:39 AM, Igor Laskovy 
>> wrote:
>> > Well, in each case something specific. For CrossFTP, for example, it says
>> > that when asking the server it receives text data instead of XML.
>>
>> When doing what? Are you able to do anything?
>>
>> > In the logs on the server side I haven't found anything interesting.
>>
>> What do the apache access and error logs show?
>>
>> >
>> > I do everything shown at http://ceph.com/docs/master/radosgw/ and only
>> > that,
>> > excluding swift compatible preparation.
>> > Maybe something additional is needed? Manually creating a root bucket,
>> > or something like that?
>> >
>> >
>> > On Sun, Apr 21, 2013 at 6:53 PM, Yehuda Sadeh 
>> wrote:
>> >>
>> >> On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy 
>> >> wrote:
>> >> > A little bit more.
>> >> >
>> >> > I have tried deploying RGW via http://ceph.com/docs/master/radosgw/ and
>> >> > then connecting the S3 Browser, CrossFTP and CloudBerry Explorer clients,
>> >> > but all unsuccessfully.
>> >> >
>> >> > Again my question, does anybody use S3 desktop clients with RGW?
>> >>
>> >> These applications should be compatible with rgw. Are you sure your
>> >> setup works? What are you getting?
>> >>
>> >> Yehuda
>> >
>> >
>> >
>> >
>> > --
>> > Igor Laskovy
>> > facebook.com/igor.laskovy
>> > studiogrizzly.com
>>
>
>
>
> --
> Igor Laskovy
> facebook.com/igor.laskovy
> studiogrizzly.com
>


Re: [ceph-users] RadosGW and S3-compatible clients for PC and OSX

2013-04-21 Thread Yehuda Sadeh
On Sun, Apr 21, 2013 at 10:05 AM, Igor Laskovy  wrote:
>
> Just initial connect to rgw server, nothing further.
> Please see below behavior for CrossFTP and S3Browser cases.
>
> On CrossFTP side:
> [R1] Connect to rgw.labspace
> [R1] Current path: /
> [R1] Current path: /
> [R1] LIST /
> [R1] Expected XML document response from S3 but received content type 
> text/html
> [R1] Disconnected
>
> On rgw side:
> root@osd01:~# ps aux |grep rados
> root  1785  0.4  0.1 2045404 6068 ?Ssl  19:47   0:00 
> /usr/bin/radosgw -n client.radosgw.a
>
> root@osd01:~# tail -f /var/log/apache2/error.log
> [Sun Apr 21 19:43:56 2013] [notice] FastCGI: process manager initialized (pid 
> 1433)
> [Sun Apr 21 19:43:56 2013] [notice] Apache/2.2.22 (Ubuntu) 
> mod_fastcgi/mod_fastcgi-SNAP-0910052141 mod_ssl/2.2.22 OpenSSL/1.0.1 
> configured -- resuming normal operations
> [Sun Apr 21 19:50:19 2013] [error] [client 192.168.1.51] File does not exist: 
> /var/www/favicon.ico

It doesn't seem that your Apache is configured right. What does your site
config file look like? Do you have any other sites configured (e.g.,
the default one)? Try listing whatever is under
/etc/apache2/sites-enabled and see if there's anything else there.
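
For comparison, a gateway vhost from the Ceph docs of this era looks roughly like the sketch below; the ServerName, wrapper script path, and socket path are placeholders to check against your own setup. Note also that the `Invalid method in request \x16\x03\x01` errors further down are the first bytes of a TLS handshake, i.e. the S3Browser client is speaking HTTPS to a vhost that only serves plain HTTP.

```apache
# Hypothetical minimal radosgw site config (e.g. /etc/apache2/sites-available/rgw)
FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock

<VirtualHost *:80>
    ServerName rgw.labspace
    DocumentRoot /var/www

    RewriteEngine On
    # Hand every request to the FastCGI wrapper, preserving the Authorization header
    RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

    <Directory /var/www>
        Options +ExecCGI
        AllowOverride All
        SetHandler fastcgi-script
        Order allow,deny
        Allow from all
    </Directory>

    AllowEncodedSlashes On
    ServerSignature Off
</VirtualHost>
```

If the stock default site is still enabled it may answer first and serve the static /var/www content, which would explain both the favicon.ico error and the text/html response CrossFTP saw instead of XML.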
>
> tail -f /var/log/apache2/access.log
> nothing
>
> On S3browser side:
>
> [4/21/2013 7:56 PM] Getting buckets list... TaskID: 2
> [4/21/2013 7:56 PM] System.Net.WebException:The underlying connection was 
> closed: An unexpected error occurred on a send. TaskID: 2 TaskID: 2
> [4/21/2013 7:56 PM] Error occurred during Getting buckets list TaskID: 2
>
> On rgw side:
>
> root@osd01:~# tail -f /var/log/apache2/error.log
> [Sun Apr 21 19:56:19 2013] [error] [client 192.168.1.51] Invalid method in 
> request \x16\x03\x01
> [Sun Apr 21 19:56:22 2013] [error] [client 192.168.1.51] Invalid method in 
> request \x16\x03\x01
> [Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in 
> request \x16\x03\x01
> [Sun Apr 21 19:56:23 2013] [error] [client 192.168.1.51] Invalid method in 
> request \x16\x03\x01
> [Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in 
> request \x16\x03\x01
> [Sun Apr 21 19:56:24 2013] [error] [client 192.168.1.51] Invalid method in 
> request \x16\x03\x01
> [Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in 
> request \x16\x03\x01
> [Sun Apr 21 19:56:25 2013] [error] [client 192.168.1.51] Invalid method in 
> request \x16\x03\x01
>
> tail -f /var/log/apache2/access.log
> nothing
>
>
>
> On Sun, Apr 21, 2013 at 7:43 PM, Yehuda Sadeh  wrote:
>>
>> On Sun, Apr 21, 2013 at 9:39 AM, Igor Laskovy  wrote:
>> > Well, in each case something specific. For CrossFTP, for example, it says
>> > that when asking the server it receives text data instead of XML.
>>
>> When doing what? Are you able to do anything?
>>
>> > In the logs on the server side I haven't found anything interesting.
>>
>> What do the apache access and error logs show?
>>
>> >
>> > I do everything shown at http://ceph.com/docs/master/radosgw/ and only 
>> > that,
>> > excluding swift compatible preparation.
>> > Maybe something additional is needed? Manually creating a root bucket,
>> > or something like that?
>> >
>> >
>> > On Sun, Apr 21, 2013 at 6:53 PM, Yehuda Sadeh  wrote:
>> >>
>> >> On Sun, Apr 21, 2013 at 3:02 AM, Igor Laskovy 
>> >> wrote:
>> >> > A little bit more.
>> >> >
>> >> > I have tried deploying RGW via http://ceph.com/docs/master/radosgw/ and
>> >> > then connecting the S3 Browser, CrossFTP and CloudBerry Explorer clients,
>> >> > but all unsuccessfully.
>> >> >
>> >> > Again my question, does anybody use S3 desktop clients with RGW?
>> >>
>> >> These applications should be compatible with rgw. Are you sure your
>> >> setup works? What are you getting?
>> >>
>> >> Yehuda
>> >
>> >
>> >
>> >
>> > --
>> > Igor Laskovy
>> > facebook.com/igor.laskovy
>> > studiogrizzly.com
>
>
>
>
> --
> Igor Laskovy
> facebook.com/igor.laskovy
> studiogrizzly.com