Re: [ceph-users] OOM-Killer for ceph-osd

2014-04-28 Thread Gandalf Corvotempesta
2014-04-27 23:58 GMT+02:00 Andrey Korolyov :
> Nothing looks wrong, except heartbeat interval which probably should
> be smaller due to recovery considerations. Try ``ceph osd tell X heap
> release'' and if it will not change memory consumption, file a bug.

What should I look for when running this?
It seems to do nothing.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD journal overload?

2014-04-28 Thread Udo Lembke
Hi,
perhaps due to IOs from the journal?
You can test with iostat (e.g. "iostat -dm 5 sdg").

On Debian, iostat is in the sysstat package.

Udo
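
(A small extension of that, not from Udo's mail: the extended statistics make journal load easier to judge, since they include per-device latency and utilisation.)

iostat -dxm 5 sdf sdg    # -x adds await (average I/O latency in ms) and %util per device
# A journal SSD pinned at ~100 %util with a high await is saturated.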

On 28.04.2014 07:38, Indra Pramana wrote:
> Hi Craig,
> 
> Good day to you, and thank you for your enquiry.
> 
> As per your suggestion, I have created a 3rd partition on the SSDs and did
> the dd test directly into the device, and the result is very slow.
> 
> 
> root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdg3
> conv=fdatasync oflag=direct
> 128+0 records in
> 128+0 records out
> 134217728 bytes (134 MB) copied, 19.5223 s, 6.9 MB/s
> 
> root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdf3
> conv=fdatasync oflag=direct
> 128+0 records in
> 128+0 records out
> 134217728 bytes (134 MB) copied, 5.34405 s, 25.1 MB/s
> 
> 
> I did a test onto another server with exactly similar specification and
> similar SSD drive (Seagate SSD 100 GB) but not added into the cluster yet
> (thus no load), and the result is fast:
> 
> 
> root@ceph-osd-09:/home/indra# dd bs=1M count=128 if=/dev/zero of=/dev/sdf1
> conv=fdatasync oflag=direct
> 128+0 records in
> 128+0 records out
> 134217728 bytes (134 MB) copied, 0.742077 s, 181 MB/s
> 
> 
> Does the Ceph journal load really take up that much of the SSD's resources? I
> don't understand how the performance can drop so significantly,
> especially since the two Ceph journals only take up the first 20 GB out
> of the SSD's 100 GB total capacity.
> 
> Any advice is greatly appreciated.
> 
> Looking forward to your reply, thank you.
> 
> Cheers.
> 
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD journal overload?

2014-04-28 Thread Irek Fasikhov
What model of SSD do you have?
Which version of the kernel?



2014-04-28 12:35 GMT+04:00 Udo Lembke :

> Hi,
> perhaps due IOs from the journal?
> You can test with iostat (like "iostat -dm 5 sdg").
>
> on debian iostat is in the package sysstat.
>
> Udo
>
> Am 28.04.2014 07:38, schrieb Indra Pramana:
> > Hi Craig,
> >
> > Good day to you, and thank you for your enquiry.
> >
> > As per your suggestion, I have created a 3rd partition on the SSDs and
> did
> > the dd test directly into the device, and the result is very slow.
> >
> > 
> > root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdg3
> > conv=fdatasync oflag=direct
> > 128+0 records in
> > 128+0 records out
> > 134217728 bytes (134 MB) copied, 19.5223 s, 6.9 MB/s
> >
> > root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdf3
> > conv=fdatasync oflag=direct
> > 128+0 records in
> > 128+0 records out
> > 134217728 bytes (134 MB) copied, 5.34405 s, 25.1 MB/s
> > 
> >
> > I did a test onto another server with exactly similar specification and
> > similar SSD drive (Seagate SSD 100 GB) but not added into the cluster yet
> > (thus no load), and the result is fast:
> >
> > 
> > root@ceph-osd-09:/home/indra# dd bs=1M count=128 if=/dev/zero
> of=/dev/sdf1
> > conv=fdatasync oflag=direct
> > 128+0 records in
> > 128+0 records out
> > 134217728 bytes (134 MB) copied, 0.742077 s, 181 MB/s
> > 
> >
> > Is the Ceph journal load really takes up a lot of the SSD resources? I
> > don't understand how come the performance can drop significantly.
> > Especially since the two Ceph journals are only taking the first 20 GB
> out
> > of the 100 GB of the SSD total capacity.
> >
> > Any advice is greatly appreciated.
> >
> > Looking forward to your reply, thank you.
> >
> > Cheers.
> >
> >
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
With respect, Irek Nurgayazovich Fasikhov
Mobile: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster_network ignored

2014-04-28 Thread Gandalf Corvotempesta
2014-04-26 12:06 GMT+02:00 Gandalf Corvotempesta
:
> I've not defined cluster IPs for each OSD server, only the whole subnet.
> Should I define an IP for each OSD? This is not written in the docs and
> could be tricky to do in big environments with hundreds of nodes.

I've added "cluster addr" and "public addr" to each OSD configuration
but nothing has changed.
I see all OSDs down except the ones from one server, but I'm able to
ping all the other nodes on both interfaces.

How can I detect what Ceph is doing? I see tons of debug logs but they
are not very easy to understand.
With "ceph health" I can see that the "pgs down" value is slowly
decreasing, so I suppose that Ceph is recovering. Is that right?

Isn't it possible to add a simplified output like the one coming from
"mdadm"? (cat /proc/mdstat)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Access denied error

2014-04-28 Thread Punit Dambiwal
Hi Yehuda,

I am using the same method as above to call the API, and used the approach
described in
http://ceph.com/docs/master/radosgw/s3/authentication/#access-control-lists-acls
for the connection. The method in
http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html is
for generating the hash of the header string and secret keys; since these
keys are already created, I think we don't need this method, right?

I also tried a call to list out the bucket data, like this:

curl -i 'http://gateway.3linux.com/test?format=json' -X GET -H
'Authorization: AWS
KGXJJGKDM5G7G4CNKC7R:LC7S0twZdhtXA1XxthfMDsj5TgJpeKhZrloWa9WN' -H 'Host:
gateway.3linux.com' -H 'Date: Mon, 28 April 2014 07:25:00 GMT ' -H
'Content-Length: 0'

but it also gets the access denied error. However, I can view the bucket
details by directly entering http://gateway.3linux.com/test?format=json in
the browser. What do you think? What may be the reason? I am able to
connect and list buckets etc. using the Cyberduck FTP client with these access
keys, but unable to do so with these calls.




On Sat, Apr 26, 2014 at 12:22 AM, Yehuda Sadeh  wrote:

> On Fri, Apr 25, 2014 at 1:03 AM, Punit Dambiwal  wrote:
> > Hi Yehuda,
> >
> > Thanks for your help...that missing date error gone but still i am
> getting
> > the access denied error :-
> >
> > -
> > 2014-04-25 15:52:56.988025 7f00d37c6700  1 == starting new request
> > req=0x237a090 =
> > 2014-04-25 15:52:56.988072 7f00d37c6700  2 req 24:0.46::GET
> > /admin/usage::initializing
> > 2014-04-25 15:52:56.988077 7f00d37c6700 10 host=gateway.3linux.com
> > rgw_dns_name=gateway.3linux.com
> > 2014-04-25 15:52:56.988102 7f00d37c6700 20 FCGI_ROLE=RESPONDER
> > 2014-04-25 15:52:56.988103 7f00d37c6700 20 SCRIPT_URL=/admin/usage
> > 2014-04-25 15:52:56.988104 7f00d37c6700 20
> > SCRIPT_URI=http://gateway.3linux.com/admin/usage
> > 2014-04-25 15:52:56.988105 7f00d37c6700 20 HTTP_AUTHORIZATION=AWS
> > KGXJJGKDM5G7G4CNKC7R:LC7S0twZdhtXA1XxthfMDsj5TgJpeKhZrloWa9WN
> > 2014-04-25 15:52:56.988107 7f00d37c6700 20 HTTP_USER_AGENT=curl/7.22.0
> > (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4libidn/1.23
> > librtmp/2.3
> > 2014-04-25 15:52:56.988108 7f00d37c6700 20 HTTP_ACCEPT=*/*
> > 2014-04-25 15:52:56.988109 7f00d37c6700 20 HTTP_HOST=gateway.3linux.com
> > 2014-04-25 15:52:56.988110 7f00d37c6700 20 HTTP_DATE=Fri, 25 April 2014
> > 07:50:00 GMT
> > 2014-04-25 15:52:56.988111 7f00d37c6700 20 CONTENT_LENGTH=0
> > 2014-04-25 15:52:56.988112 7f00d37c6700 20
> PATH=/usr/local/bin:/usr/bin:/bin
> > 2014-04-25 15:52:56.988113 7f00d37c6700 20 SERVER_SIGNATURE=
> > 2014-04-25 15:52:56.988114 7f00d37c6700 20 SERVER_SOFTWARE=Apache/2.2.22
> > (Ubuntu)
> > 2014-04-25 15:52:56.988115 7f00d37c6700 20 SERVER_NAME=
> gateway.3linux.com
> > 2014-04-25 15:52:56.988116 7f00d37c6700 20 SERVER_ADDR=117.18.79.110
> > 2014-04-25 15:52:56.988117 7f00d37c6700 20 SERVER_PORT=80
> > 2014-04-25 15:52:56.988117 7f00d37c6700 20 REMOTE_ADDR=122.166.115.191
> > 2014-04-25 15:52:56.988118 7f00d37c6700 20 DOCUMENT_ROOT=/var/www
> > 2014-04-25 15:52:56.988119 7f00d37c6700 20 SERVER_ADMIN=c...@3linux.com
> > 2014-04-25 15:52:56.988120 7f00d37c6700 20
> > SCRIPT_FILENAME=/var/www/s3gw.fcgi
> > 2014-04-25 15:52:56.988120 7f00d37c6700 20 REMOTE_PORT=28840
> > 2014-04-25 15:52:56.988121 7f00d37c6700 20 GATEWAY_INTERFACE=CGI/1.1
> > 2014-04-25 15:52:56.988122 7f00d37c6700 20 SERVER_PROTOCOL=HTTP/1.1
> > 2014-04-25 15:52:56.988123 7f00d37c6700 20 REQUEST_METHOD=GET
> > 2014-04-25 15:52:56.988123 7f00d37c6700 20
> > QUERY_STRING=page=admin&params=/usage&format=json
> > 2014-04-25 15:52:56.988124 7f00d37c6700 20
> > REQUEST_URI=/admin/usage?format=json
> > 2014-04-25 15:52:56.988125 7f00d37c6700 20 SCRIPT_NAME=/admin/usage
> > 2014-04-25 15:52:56.988126 7f00d37c6700  2 req 24:0.000101::GET
> > /admin/usage::getting op
> > 2014-04-25 15:52:56.988129 7f00d37c6700  2 req 24:0.000104::GET
> > /admin/usage:get_usage:authorizing
> > 2014-04-25 15:52:56.988141 7f00d37c6700 20 get_obj_state:
> > rctx=0x7effbc004aa0 obj=.users:KGXJJGKDM5G7G4CNKC7R state=0x7effbc00e718
> > s->prefetch_data=0
> > 2014-04-25 15:52:56.988148 7f00d37c6700 10 moving
> > .users+KGXJJGKDM5G7G4CNKC7R to cache LRU end
> > 2014-04-25 15:52:56.988150 7f00d37c6700 10 cache get:
> > name=.users+KGXJJGKDM5G7G4CNKC7R : hit
> > 2014-04-25 15:52:56.988155 7f00d37c6700 20 get_obj_state: s->obj_tag was
> set
> > empty
> > 2014-04-25 15:52:56.988160 7f00d37c6700 10 moving
> > .users+KGXJJGKDM5G7G4CNKC7R to cache LRU end
> > 2014-04-25 15:52:56.988161 7f00d37c6700 10 cache get:
> > name=.users+KGXJJGKDM5G7G4CNKC7R : hit
> > 2014-04-25 15:52:56.988179 7f00d37c6700 20 get_obj_state:
> > rctx=0x7effbc001ce0 obj=.users.uid:admin state=0x7effbc00ec58
> > s->prefetch_data=0
> > 2014-04-25 15:52:56.988185 7f00d37c6700 10 moving .users.uid+admin to
> cache
> > LRU end
> > 2014-04-25 15:52:56.988186 7f00d37c6700 10 cache get:
> name=.users.uid+admin
>

[ceph-users] Please provide me rados gateway configuration (rgw.conf) for lighttpd

2014-04-28 Thread Srinivasa Rao Ragolu
Hi All,

I would like to use lighttpd instead of Apache for the rados gateway
configuration, but I am facing issues with the syntax for rgw.conf.

Could you please share the details of how I can prepare rgw.conf for lighttpd?

Please also suggest a version of mod_fastcgi for Apache version 2.4.3.

Thanks,
Srinivas.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Hi,

I'm trying to add CEPH as Primary Storage, but my libvirt 0.10.2 (CentOS
6.5) complains:
-  internal error missing backend for pool type 8

Is it possible that libvirt 0.10.2 (shipped with CentOS 6.5) was not
compiled with RBD support?
I can't find how to check this...

I'm able to use qemu-img to create rbd images etc...

Here is cloudstack-agent DEBUG output, all seems fine...


1e119e4c-20d1-3fbc-a525-a5771944046d
1e119e4c-20d1-3fbc-a525-a5771944046d


cloudstack






-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OOM-Killer for ceph-osd

2014-04-28 Thread Andrey Korolyov
On 04/28/2014 12:33 PM, Gandalf Corvotempesta wrote:
> 2014-04-27 23:58 GMT+02:00 Andrey Korolyov :
>> Nothing looks wrong, except heartbeat interval which probably should
>> be smaller due to recovery considerations. Try ``ceph osd tell X heap
>> release'' and if it will not change memory consumption, file a bug.
> 
> What should I look for running this ?
> Seems to does nothing
> 

The OSD process should shrink, if it has room to. Anyway, you'll probably
want to gcore the target process and file a bug. I have seen OSDs with such
memory commit, but only with extremely large relative PG and object
counts, and while they were involved in recovery procedures.
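
(A sketch of that sequence, added for reference; osd.12 is only a placeholder, and "ceph tell osd.12 ..." is the newer spelling of the command quoted above:)

ps -o rss,cmd -C ceph-osd        # resident memory of the OSD processes, before
ceph tell osd.12 heap stats      # tcmalloc's view of that OSD's heap
ceph tell osd.12 heap release    # ask tcmalloc to return unused pages to the OS
ps -o rss,cmd -C ceph-osd        # ...and after

# If RSS stays high, capture a core to attach to the bug report
# (use the PID of the ceph-osd process in question)
gcore -o /tmp/ceph-osd.12.core <pid>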
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD journal overload?

2014-04-28 Thread Indra Pramana
Hi Udo and Irek,

Good day to you, and thank you for your emails.

>perhaps due IOs from the journal?
>You can test with iostat (like "iostat -dm 5 sdg").

Yes, I have shared the iostat result earlier on this same thread. At times
the utilisation of the 2 journal drives will hit 100%, especially when I
simulate writing data using rados bench command. Any suggestions what could
be the cause of the I/O issue?


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.85    0.00    1.65    3.14    0.00   93.36

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdg               0.00     0.00    0.00   55.00     0.00  25365.33   922.38    34.22  568.90    0.00  568.90  17.82  98.00
sdf               0.00     0.00    0.00   55.67     0.00  25022.67   899.02    29.76  500.57    0.00  500.57  17.60  98.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.10    0.00    1.37    2.07    0.00   94.46

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdg               0.00     0.00    0.00   56.67     0.00  25220.00   890.12    23.60  412.14    0.00  412.14  17.62  99.87
sdf               0.00     0.00    0.00   52.00     0.00  24637.33   947.59    33.65  587.41    0.00  587.41  19.23 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.21    0.00    1.77    6.75    0.00   89.27

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdg               0.00     0.00    0.00   54.33     0.00  24802.67   912.98    25.75  486.36    0.00  486.36  18.40 100.00
sdf               0.00     0.00    0.00   53.00     0.00  24716.00   932.68    35.26  669.89    0.00  669.89  18.87 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.87    0.00    1.67    5.25    0.00   91.21

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdg               0.00     0.00    0.00   94.33     0.00  26257.33   556.69    18.29  208.44    0.00  208.44  10.50  99.07
sdf               0.00     0.00    0.00   51.33     0.00  24470.67   953.40    32.75  684.62    0.00  684.62  19.51 100.13

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.51    0.00    1.34    7.25    0.00   89.89

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdg               0.00     0.00    0.00   52.00     0.00  22565.33   867.90    24.73  446.51    0.00  446.51  19.10  99.33
sdf               0.00     0.00    0.00   64.67     0.00  24892.00   769.86    19.50  330.02    0.00  330.02  15.32  99.07
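
(One extra check, not from the thread: the OSD journal writes with O_DIRECT and O_DSYNC, so a large-block dd can look fine while synchronous writes are still slow on an SSD whose cache flushes are expensive. A rough comparison on the spare third partition, reusing the layout from the dd tests above:)

dd if=/dev/zero of=/dev/sdg3 bs=4k count=10000 oflag=direct,dsync         # journal-like small sync writes
dd if=/dev/zero of=/dev/sdg3 bs=1M count=128 oflag=direct conv=fdatasync  # the large sequential test above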


>You what model SSD?

For this one, I am using Seagate 100GB SSD, model: HDS-2TM-ST100FM0012

>Which version of the kernel?

Ubuntu 13.04, Linux kernel version: 3.8.0-19-generic #30-Ubuntu SMP Wed May
1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Looking forward to your reply, thank you.

Cheers.



On Mon, Apr 28, 2014 at 4:45 PM, Irek Fasikhov  wrote:

> You what model SSD?
> Which version of the kernel?
>
>
>
> 2014-04-28 12:35 GMT+04:00 Udo Lembke :
>
>> Hi,
>> perhaps due IOs from the journal?
>> You can test with iostat (like "iostat -dm 5 sdg").
>>
>> on debian iostat is in the package sysstat.
>>
>> Udo
>>
>> Am 28.04.2014 07:38, schrieb Indra Pramana:
>> > Hi Craig,
>> >
>> > Good day to you, and thank you for your enquiry.
>> >
>> > As per your suggestion, I have created a 3rd partition on the SSDs and
>> did
>> > the dd test directly into the device, and the result is very slow.
>> >
>> > 
>> > root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdg3
>> > conv=fdatasync oflag=direct
>> > 128+0 records in
>> > 128+0 records out
>> > 134217728 bytes (134 MB) copied, 19.5223 s, 6.9 MB/s
>> >
>> > root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdf3
>> > conv=fdatasync oflag=direct
>> > 128+0 records in
>> > 128+0 records out
>> > 134217728 bytes (134 MB) copied, 5.34405 s, 25.1 MB/s
>> > 
>> >
>> > I did a test onto another server with exactly similar specification and
>> > similar SSD drive (Seagate SSD 100 GB) but not added into the cluster
>> yet
>> > (thus no load), and the result is fast:
>> >
>> > 
>> > root@ceph-osd-09:/home/indra# dd bs=1M count=128 if=/dev/zero
>> of=/dev/sdf1
>> > conv=fdatasync oflag=direct
>> > 128+0 records in
>> > 128+0 records out
>> > 134217728 bytes (134 MB) copied, 0.742077 s, 181 MB/s
>> > 
>> >
>> > Is the Ceph journal load really takes up a lot of the SSD resources? I
>> > don't understand how come the performance can drop significantly.
>> > Especially since the two Ceph journals are only taking the first 20 GB
>> out
>> > of the 100 GB of the SSD total capacity.
>> >
>> > Any advice is greatly appreciated.
>> >
>> > Looking forward to your reply, thank you.
>> >
>>

Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Wido den Hollander

On 04/28/2014 12:49 PM, Andrija Panic wrote:

Hi,

I'm trying to add CEPH as Primary Storage, but my libvirt 0.10.2 (CentOS
6.5) does some complaints:
-  internal error missing backend for pool type 8

Is it possible that the libvirt 0.10.2 (shipped with CentOS 6.5) was not
compiled with RBD support ?
Can't find how to check this...



No, it's probably not compiled with RBD storage pool support.

As far as I know CentOS doesn't compile libvirt with that support yet.
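
(Two rough ways to check, assuming the stock CentOS 6.5 paths; this is not from the original mail:)

ldd /usr/sbin/libvirtd | grep -i librbd    # RBD storage pool support links libvirtd against librbd
qemu-img --help 2>&1 | grep -ow rbd        # separately confirms that qemu-img knows the rbd format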


I'm able to use qemu-img to create rbd images etc...

Here is cloudstack-agent DEBUG output, all seems fine...


1e119e4c-20d1-3fbc-a525-a5771944046d
1e119e4c-20d1-3fbc-a525-a5771944046d




I recommend creating a Round Robin DNS record which points to all your 
monitors.



cloudstack






--

Andrija Panić


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD journal overload?

2014-04-28 Thread Irek Fasikhov
Most likely you need to apply a patch to the kernel.

http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov


2014-04-28 15:20 GMT+04:00 Indra Pramana :

> Hi Udo and Irek,
>
> Good day to you, and thank you for your emails.
>
>
> >perhaps due IOs from the journal?
> >You can test with iostat (like "iostat -dm 5 sdg").
>
> Yes, I have shared the iostat result earlier on this same thread. At times
> the utilisation of the 2 journal drives will hit 100%, especially when I
> simulate writing data using rados bench command. Any suggestions what could
> be the cause of the I/O issue?
>
>
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>1.850.001.653.140.00   93.36
>
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sdg   0.00 0.000.00   55.00 0.00 25365.33
> 922.3834.22  568.900.00  568.90  17.82  98.00
> sdf   0.00 0.000.00   55.67 0.00 25022.67
> 899.0229.76  500.570.00  500.57  17.60  98.00
>
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>2.100.001.372.070.00   94.46
>
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sdg   0.00 0.000.00   56.67 0.00 25220.00
> 890.1223.60  412.140.00  412.14  17.62  99.87
> sdf   0.00 0.000.00   52.00 0.00 24637.33
> 947.5933.65  587.410.00  587.41  19.23 100.00
>
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>2.210.001.776.750.00   89.27
>
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sdg   0.00 0.000.00   54.33 0.00 24802.67
> 912.9825.75  486.360.00  486.36  18.40 100.00
> sdf   0.00 0.000.00   53.00 0.00 24716.00
> 932.6835.26  669.890.00  669.89  18.87 100.00
>
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>1.870.001.675.250.00   91.21
>
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sdg   0.00 0.000.00   94.33 0.00 26257.33
> 556.6918.29  208.440.00  208.44  10.50  99.07
> sdf   0.00 0.000.00   51.33 0.00 24470.67
> 953.4032.75  684.620.00  684.62  19.51 100.13
>
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>1.510.001.347.250.00   89.89
>
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sdg   0.00 0.000.00   52.00 0.00 22565.33
> 867.9024.73  446.510.00  446.51  19.10  99.33
> sdf   0.00 0.000.00   64.67 0.00 24892.00
> 769.8619.50  330.020.00  330.02  15.32  99.07
> 
>
> >You what model SSD?
>
> For this one, I am using Seagate 100GB SSD, model: HDS-2TM-ST100FM0012
>
> >Which version of the kernel?
>
> Ubuntu 13.04, Linux kernel version: 3.8.0-19-generic #30-Ubuntu SMP Wed
> May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
>
>
> On Mon, Apr 28, 2014 at 4:45 PM, Irek Fasikhov  wrote:
>
>> You what model SSD?
>> Which version of the kernel?
>>
>>
>>
>> 2014-04-28 12:35 GMT+04:00 Udo Lembke :
>>
>>> Hi,
>>> perhaps due IOs from the journal?
>>> You can test with iostat (like "iostat -dm 5 sdg").
>>>
>>> on debian iostat is in the package sysstat.
>>>
>>> Udo
>>>
>>> Am 28.04.2014 07:38, schrieb Indra Pramana:
>>> > Hi Craig,
>>> >
>>> > Good day to you, and thank you for your enquiry.
>>> >
>>> > As per your suggestion, I have created a 3rd partition on the SSDs and
>>> did
>>> > the dd test directly into the device, and the result is very slow.
>>> >
>>> > 
>>> > root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdg3
>>> > conv=fdatasync oflag=direct
>>> > 128+0 records in
>>> > 128+0 records out
>>> > 134217728 bytes (134 MB) copied, 19.5223 s, 6.9 MB/s
>>> >
>>> > root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdf3
>>> > conv=fdatasync oflag=direct
>>> > 128+0 records in
>>> > 128+0 records out
>>> > 134217728 bytes (134 MB) copied, 5.34405 s, 25.1 MB/s
>>> > 
>>> >
>>> > I did a test onto another server with exactly similar specification and
>>> > similar SSD drive (Seagate SSD 100 GB) but not added into the cluster
>>> yet
>>> > (thus no load), and the result is fast:
>>> >
>>> > 
>>> > root@ceph-osd-09:/home/indra# dd bs=1M count=128 if=/dev/zero
>>> of=/dev/sdf1
>>> > conv=fdatasync oflag=direct
>>> > 128+0 records in
>>> > 128+0 records out
>>> > 134217728 bytes (134 MB) copied, 0.7420

Re: [ceph-users] SSD journal overload?

2014-04-28 Thread Indra Pramana
Hi Irek,

Thanks for the article. Do you have any other web sources on the
same issue that are in English?

Looking forward to your reply, thank you.

Cheers.


On Mon, Apr 28, 2014 at 7:40 PM, Irek Fasikhov  wrote:

> Most likely you need to apply a patch to the kernel.
>
>
> http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
>
>
> 2014-04-28 15:20 GMT+04:00 Indra Pramana :
>
> Hi Udo and Irek,
>>
>> Good day to you, and thank you for your emails.
>>
>>
>> >perhaps due IOs from the journal?
>> >You can test with iostat (like "iostat -dm 5 sdg").
>>
>> Yes, I have shared the iostat result earlier on this same thread. At
>> times the utilisation of the 2 journal drives will hit 100%, especially
>> when I simulate writing data using rados bench command. Any suggestions
>> what could be the cause of the I/O issue?
>>
>>
>> 
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>1.850.001.653.140.00   93.36
>>
>>
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>  sdg   0.00 0.000.00   55.00 0.00 25365.33
>> 922.3834.22  568.900.00  568.90  17.82  98.00
>> sdf   0.00 0.000.00   55.67 0.00 25022.67
>> 899.0229.76  500.570.00  500.57  17.60  98.00
>>
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>2.100.001.372.070.00   94.46
>>
>>
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>  sdg   0.00 0.000.00   56.67 0.00 25220.00
>> 890.1223.60  412.140.00  412.14  17.62  99.87
>> sdf   0.00 0.000.00   52.00 0.00 24637.33
>> 947.5933.65  587.410.00  587.41  19.23 100.00
>>
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>2.210.001.776.750.00   89.27
>>
>>
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>  sdg   0.00 0.000.00   54.33 0.00 24802.67
>> 912.9825.75  486.360.00  486.36  18.40 100.00
>> sdf   0.00 0.000.00   53.00 0.00 24716.00
>> 932.6835.26  669.890.00  669.89  18.87 100.00
>>
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>1.870.001.675.250.00   91.21
>>
>>
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>  sdg   0.00 0.000.00   94.33 0.00 26257.33
>> 556.6918.29  208.440.00  208.44  10.50  99.07
>> sdf   0.00 0.000.00   51.33 0.00 24470.67
>> 953.4032.75  684.620.00  684.62  19.51 100.13
>>
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>1.510.001.347.250.00   89.89
>>
>>
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>  sdg   0.00 0.000.00   52.00 0.00 22565.33
>> 867.9024.73  446.510.00  446.51  19.10  99.33
>> sdf   0.00 0.000.00   64.67 0.00 24892.00
>> 769.8619.50  330.020.00  330.02  15.32  99.07
>> 
>>
>> >You what model SSD?
>>
>> For this one, I am using Seagate 100GB SSD, model: HDS-2TM-ST100FM0012
>>
>> >Which version of the kernel?
>>
>> Ubuntu 13.04, Linux kernel version: 3.8.0-19-generic #30-Ubuntu SMP Wed
>> May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Looking forward to your reply, thank you.
>>
>> Cheers.
>>
>>
>>
>> On Mon, Apr 28, 2014 at 4:45 PM, Irek Fasikhov  wrote:
>>
>>> You what model SSD?
>>> Which version of the kernel?
>>>
>>>
>>>
>>> 2014-04-28 12:35 GMT+04:00 Udo Lembke :
>>>
 Hi,
 perhaps due IOs from the journal?
 You can test with iostat (like "iostat -dm 5 sdg").

 on debian iostat is in the package sysstat.

 Udo

 Am 28.04.2014 07:38, schrieb Indra Pramana:
 > Hi Craig,
 >
 > Good day to you, and thank you for your enquiry.
 >
 > As per your suggestion, I have created a 3rd partition on the SSDs
 and did
 > the dd test directly into the device, and the result is very slow.
 >
 > 
 > root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdg3
 > conv=fdatasync oflag=direct
 > 128+0 records in
 > 128+0 records out
 > 134217728 bytes (134 MB) copied, 19.5223 s, 6.9 MB/s
 >
 > root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdf3
 > conv=fdatasync oflag=direct
 > 128+0 records in
 > 128+0 records out
 > 134217728 bytes (134 MB) copied, 5.34405 s, 25.1 MB/s
 > 
 >
 > I did a test onto another server with exactly similar 

Re: [ceph-users] SSD journal overload?

2014-04-28 Thread Irek Fasikhov
This is my article :).
Apply the patch to the kernel (http://www.theirek.com/downloads/code/CMD_FLUSH.diff).
After rebooting, run the following command:
echo temporary write through > /sys/class/scsi_disk//cache_type
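
(Spelled out for one device, with 0:0:0:0 as a placeholder for the SSD's SCSI address -- the right address should appear as the directory name under /sys/block/sdg/device/scsi_disk/. The "temporary write through" value is only accepted by a kernel carrying the patch above or an equivalent mainline change:)

cat /sys/class/scsi_disk/0:0:0:0/cache_type              # e.g. "write back"
echo temporary write through > /sys/class/scsi_disk/0:0:0:0/cache_type
cat /sys/class/scsi_disk/0:0:0:0/cache_type              # should now read "write through"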


2014-04-28 15:44 GMT+04:00 Indra Pramana :

> Hi Irek,
>
> Thanks for the article. Do you have any other web sources pertaining to
> the same issue, which is in English?
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
>
> On Mon, Apr 28, 2014 at 7:40 PM, Irek Fasikhov  wrote:
>
>> Most likely you need to apply a patch to the kernel.
>>
>>
>> http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
>>
>>
>> 2014-04-28 15:20 GMT+04:00 Indra Pramana :
>>
>> Hi Udo and Irek,
>>>
>>> Good day to you, and thank you for your emails.
>>>
>>>
>>> >perhaps due IOs from the journal?
>>> >You can test with iostat (like "iostat -dm 5 sdg").
>>>
>>> Yes, I have shared the iostat result earlier on this same thread. At
>>> times the utilisation of the 2 journal drives will hit 100%, especially
>>> when I simulate writing data using rados bench command. Any suggestions
>>> what could be the cause of the I/O issue?
>>>
>>>
>>> 
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>1.850.001.653.140.00   93.36
>>>
>>>
>>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>>  sdg   0.00 0.000.00   55.00 0.00 25365.33
>>> 922.3834.22  568.900.00  568.90  17.82  98.00
>>> sdf   0.00 0.000.00   55.67 0.00 25022.67
>>> 899.0229.76  500.570.00  500.57  17.60  98.00
>>>
>>>
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>2.100.001.372.070.00   94.46
>>>
>>>
>>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>>  sdg   0.00 0.000.00   56.67 0.00 25220.00
>>> 890.1223.60  412.140.00  412.14  17.62  99.87
>>> sdf   0.00 0.000.00   52.00 0.00 24637.33
>>> 947.5933.65  587.410.00  587.41  19.23 100.00
>>>
>>>
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>2.210.001.776.750.00   89.27
>>>
>>>
>>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>>  sdg   0.00 0.000.00   54.33 0.00 24802.67
>>> 912.9825.75  486.360.00  486.36  18.40 100.00
>>> sdf   0.00 0.000.00   53.00 0.00 24716.00
>>> 932.6835.26  669.890.00  669.89  18.87 100.00
>>>
>>>
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>1.870.001.675.250.00   91.21
>>>
>>>
>>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>>  sdg   0.00 0.000.00   94.33 0.00 26257.33
>>> 556.6918.29  208.440.00  208.44  10.50  99.07
>>> sdf   0.00 0.000.00   51.33 0.00 24470.67
>>> 953.4032.75  684.620.00  684.62  19.51 100.13
>>>
>>>
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>1.510.001.347.250.00   89.89
>>>
>>>
>>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
>>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>>  sdg   0.00 0.000.00   52.00 0.00 22565.33
>>> 867.9024.73  446.510.00  446.51  19.10  99.33
>>> sdf   0.00 0.000.00   64.67 0.00 24892.00
>>> 769.8619.50  330.020.00  330.02  15.32  99.07
>>> 
>>>
>>> >You what model SSD?
>>>
>>> For this one, I am using Seagate 100GB SSD, model: HDS-2TM-ST100FM0012
>>>
>>> >Which version of the kernel?
>>>
>>> Ubuntu 13.04, Linux kernel version: 3.8.0-19-generic #30-Ubuntu SMP Wed
>>> May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Looking forward to your reply, thank you.
>>>
>>> Cheers.
>>>
>>>
>>>
>>> On Mon, Apr 28, 2014 at 4:45 PM, Irek Fasikhov wrote:
>>>
 You what model SSD?
 Which version of the kernel?



 2014-04-28 12:35 GMT+04:00 Udo Lembke :

> Hi,
> perhaps due IOs from the journal?
> You can test with iostat (like "iostat -dm 5 sdg").
>
> on debian iostat is in the package sysstat.
>
> Udo
>
> Am 28.04.2014 07:38, schrieb Indra Pramana:
> > Hi Craig,
> >
> > Good day to you, and thank you for your enquiry.
> >
> > As per your suggestion, I have created a 3rd partition on the SSDs
> and did
> > the dd test directly into the device, and the result is very slow.
> >
> > 
> > root@ceph-osd-08:/mnt# dd bs=1M count=128 if=/dev/zero of=/dev/sdg3
> > conv=fdatasync oflag=direct
> > 128+0 

Re: [ceph-users] Access denied error

2014-04-28 Thread Cedric Lemarchand
Hi Punit,

On 28 Apr 2014, at 11:55, Punit Dambiwal <hypu...@gmail.com> wrote:

> Hi Yehuda,
>
> I am using the same above method to call the api and used the way
> which described in the
> http://ceph.com/docs/master/radosgw/s3/authentication/#access-control-lists-acls
> for connection. The method in the
> http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html
> is for generating the hash of the header string and secret keys, since
> these keys are created already and i think we don't need this method,
> right ?
No, there is a difference between the aws_access_id and aws_secret_key
(static, generated by radosgw at user creation) and the AWS
Authorization header, which is dynamic. As I understand it, the AWS
signature header needs to be regenerated regularly because of the parts it
embeds, plus the time expiration period. I think you can safely
regenerate the AWS Auth signature for each request.
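
(For illustration only -- a minimal sketch of building such a signature by hand with openssl, using the bucket and keys from the curl command quoted below; everything else is plain S3 v2 signing, and the Date header has to be regenerated for every request:)

access_key='KGXJJGKDM5G7G4CNKC7R'
secret_key='LC7S0twZdhtXA1XxthfMDsj5TgJpeKhZrloWa9WN'
resource='/test'                                    # bucket path only; ?format=json is not part of the signature
date_hdr="$(date -u '+%a, %d %b %Y %H:%M:%S GMT')"  # RFC 1123 date, e.g. "Mon, 28 Apr 2014 07:25:00 GMT"

string_to_sign="$(printf 'GET\n\n\n%s\n%s' "$date_hdr" "$resource")"
signature="$(printf '%s' "$string_to_sign" | openssl dgst -sha1 -hmac "$secret_key" -binary | base64)"

curl -i -H "Date: $date_hdr" \
     -H "Authorization: AWS $access_key:$signature" \
     "http://gateway.3linux.com/test?format=json"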

Cheers

> I also tried one function to list out the bucket data as like
>
> curl -i 'http://gateway.3linux.com/test?format=json' -X GET -H
> 'Authorization: AWS
> KGXJJGKDM5G7G4CNKC7R:LC7S0twZdhtXA1XxthfMDsj5TgJpeKhZrloWa9WN' -H
> 'Host: gateway.3linux.com ' -H 'Date: Mon,
> 28 April 2014 07:25:00 GMT ' -H 'Content-Length: 0'
>
> but its also getting the access denied error. But i can view the
> bucket details by directly entering
> http://gateway.3linux.com/test?format=json in the browser. What do you
> think ? what may be the reason ? I am able to connect and list buckets
> etc using cyberduck ftp clients these access keys but unable to do
> with the function calls.
>
>
>
>
> On Sat, Apr 26, 2014 at 12:22 AM, Yehuda Sadeh  > wrote:
>
> On Fri, Apr 25, 2014 at 1:03 AM, Punit Dambiwal  > wrote:
> > Hi Yehuda,
> >
> > Thanks for your help...that missing date error gone but still i
> am getting
> > the access denied error :-
> >
> > -
> > 2014-04-25 15:52:56.988025 7f00d37c6700  1 == starting new
> request
> > req=0x237a090 =
> > 2014-04-25 15:52:56.988072 7f00d37c6700  2 req 24:0.46::GET
> > /admin/usage::initializing
> > 2014-04-25 15:52:56.988077 7f00d37c6700 10
> host=gateway.3linux.com 
> > rgw_dns_name=gateway.3linux.com 
> > 2014-04-25 15:52:56.988102 7f00d37c6700 20 FCGI_ROLE=RESPONDER
> > 2014-04-25 15:52:56.988103 7f00d37c6700 20 SCRIPT_URL=/admin/usage
> > 2014-04-25 15:52:56.988104 7f00d37c6700 20
> > SCRIPT_URI=http://gateway.3linux.com/admin/usage
> > 2014-04-25 15:52:56.988105 7f00d37c6700 20 HTTP_AUTHORIZATION=AWS
> > KGXJJGKDM5G7G4CNKC7R:LC7S0twZdhtXA1XxthfMDsj5TgJpeKhZrloWa9WN
> > 2014-04-25 15:52:56.988107 7f00d37c6700 20
> HTTP_USER_AGENT=curl/7.22.0
> > (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4
>  libidn/1.23
> > librtmp/2.3
> > 2014-04-25 15:52:56.988108 7f00d37c6700 20 HTTP_ACCEPT=*/*
> > 2014-04-25 15:52:56.988109 7f00d37c6700 20
> HTTP_HOST=gateway.3linux.com 
> > 2014-04-25 15:52:56.988110 7f00d37c6700 20 HTTP_DATE=Fri, 25
> April 2014
> > 07:50:00 GMT
> > 2014-04-25 15:52:56.988111 7f00d37c6700 20 CONTENT_LENGTH=0
> > 2014-04-25 15:52:56.988112 7f00d37c6700 20
> PATH=/usr/local/bin:/usr/bin:/bin
> > 2014-04-25 15:52:56.988113 7f00d37c6700 20 SERVER_SIGNATURE=
> > 2014-04-25 15:52:56.988114 7f00d37c6700 20
> SERVER_SOFTWARE=Apache/2.2.22
> > (Ubuntu)
> > 2014-04-25 15:52:56.988115 7f00d37c6700 20
> SERVER_NAME=gateway.3linux.com 
> > 2014-04-25 15:52:56.988116 7f00d37c6700 20 SERVER_ADDR=117.18.79.110
> > 2014-04-25 15:52:56.988117 7f00d37c6700 20 SERVER_PORT=80
> > 2014-04-25 15:52:56.988117 7f00d37c6700 20
> REMOTE_ADDR=122.166.115.191
> > 2014-04-25 15:52:56.988118 7f00d37c6700 20 DOCUMENT_ROOT=/var/www
> > 2014-04-25 15:52:56.988119 7f00d37c6700 20
> SERVER_ADMIN=c...@3linux.com 
> > 2014-04-25 15:52:56.988120 7f00d37c6700 20
> > SCRIPT_FILENAME=/var/www/s3gw.fcgi
> > 2014-04-25 15:52:56.988120 7f00d37c6700 20 REMOTE_PORT=28840
> > 2014-04-25 15:52:56.988121 7f00d37c6700 20 GATEWAY_INTERFACE=CGI/1.1
> > 2014-04-25 15:52:56.988122 7f00d37c6700 20 SERVER_PROTOCOL=HTTP/1.1
> > 2014-04-25 15:52:56.988123 7f00d37c6700 20 REQUEST_METHOD=GET
> > 2014-04-25 15:52:56.988123 7f00d37c6700 20
> > QUERY_STRING=page=admin&params=/usage&format=json
> > 2014-04-25 15:52:56.988124 7f00d37c6700 20
> > REQUEST_URI=/admin/usage?format=json
> > 2014-04-25 15:52:56.988125 7f00d37c6700 20 SCRIPT_NAME=/admin/usage
> > 2014-04-25 15:52:56.988126 7f00d37c6700  2 req 24:0.000101::GET
> > /admin/usage::getting op
> > 2014-04-25 15:52:56.988129 7f00d37c67

Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Thank you very much Wido,
any suggestion on compiling libvirt with RBD support (I already found a way), or
perhaps some prebuilt packages that you would recommend?

Best


On 28 April 2014 13:25, Wido den Hollander  wrote:

> On 04/28/2014 12:49 PM, Andrija Panic wrote:
>
>> Hi,
>>
>> I'm trying to add CEPH as Primary Storage, but my libvirt 0.10.2 (CentOS
>> 6.5) does some complaints:
>> -  internal error missing backend for pool type 8
>>
>> Is it possible that the libvirt 0.10.2 (shipped with CentOS 6.5) was not
>> compiled with RBD support ?
>> Can't find how to check this...
>>
>>
> No, it's probably not compiled with RBD storage pool support.
>
> As far as I know CentOS doesn't compile libvirt with that support yet.
>
>
>  I'm able to use qemu-img to create rbd images etc...
>>
>> Here is cloudstack-agent DEBUG output, all seems fine...
>>
>> 
>> 1e119e4c-20d1-3fbc-a525-a5771944046d
>> 1e119e4c-20d1-3fbc-a525-a5771944046d
>> 
>> 
>>
>
> I recommend creating a Round Robin DNS record which points to all your
> monitors.
>
>  cloudstack
>> 
>> 
>> 
>> 
>> 
>>
>> --
>>
>> Andrija Panić
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] What happened if rbd lose a block?

2014-04-28 Thread Timofey Koolin
What will happen if RBD loses all copies of a data block and I read that block?

Context:
I want to use RBD as the main storage with replication factor 1, and DRBD for
replication onto non-RBD storage on the client side.

For example:
Computer1:
1. connect rbd as /dev/rbd15
2. use rbd as the disk for drbd

Computer2:
Use an HDD for the drbd replication.


I want to protect against a breakage of the Ceph system (for example while
upgrading Ceph) and to have long-distance replication.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What happened if rbd lose a block?

2014-04-28 Thread Wido den Hollander

On 04/28/2014 02:35 PM, Timofey Koolin wrote:

What will happened if RBD lose all copied of data-block and I read the block?



The read of the object will block until a replica comes online to serve it.

Remember this with Ceph: "consistency takes precedence over availability".


Context:
I want use RDB as main storage with replication factor 1 and drbd for 
replication on non rbd storage by client side.

For example:
Computer1:
1. connect rbd as /dev/rbd15
2. use rbd as disk for drbd

Computer2:
Use HDD for drbd-replication.


I want protect from break of ceph system (for example while upgrade ceph) and 
long-distance replication.


Ceph wants to be consistent at all times. So copying over long distances 
with high latency will be very slow.





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Wido den Hollander

On 04/28/2014 02:15 PM, Andrija Panic wrote:

Thank you very much Wido,
any suggestion on compiling libvirt with support (I already found a way)
or perhaps use some prebuilt , that you would recommend ?



No special suggestions, just make sure you use at least Ceph 0.67.7

I'm not aware of any pre-build packages for CentOS.


Best


On 28 April 2014 13:25, Wido den Hollander <w...@42on.com> wrote:

On 04/28/2014 12:49 PM, Andrija Panic wrote:

Hi,

I'm trying to add CEPH as Primary Storage, but my libvirt 0.10.2
(CentOS
6.5) does some complaints:
-  internal error missing backend for pool type 8

Is it possible that the libvirt 0.10.2 (shipped with CentOS 6.5)
was not
compiled with RBD support ?
Can't find how to check this...


No, it's probably not compiled with RBD storage pool support.

As far as I know CentOS doesn't compile libvirt with that support yet.


I'm able to use qemu-img to create rbd images etc...

Here is cloudstack-agent DEBUG output, all seems fine...


1e119e4c-20d1-3fbc-a525-a5771944046d
1e119e4c-20d1-3fbc-a525-a5771944046d




I recommend creating a Round Robin DNS record which points to all
your monitors.

cloudstack






--

Andrija Panić






--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902 
Skype: contact42on





--

Andrija Panić
--
http://admintweets.com
--



--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Dan van der Ster


On 28/04/14 14:54, Wido den Hollander wrote:

On 04/28/2014 02:15 PM, Andrija Panic wrote:

Thank you very much Wido,
any suggestion on compiling libvirt with support (I already found a way)
or perhaps use some prebuilt , that you would recommend ?



No special suggestions, just make sure you use at least Ceph 0.67.7

I'm not aware of any pre-build packages for CentOS.


Look for qemu-kvm-rhev ... el6 ...
That's the Redhat built version of kvm which supports RBD.

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] packages for Trusty

2014-04-28 Thread Alphe Salas Michels

Hello all,
to begin with, there is no Emperor package for Saucy. Emperor for Saucy
is only rolled out through git, and in my experience having the test
builds rolling in constantly can break a Ceph cluster.


I don't know why there is this gap in the ceph.com/download section, but
the fact that Inktank considers Dumpling to be the stable production
version of Ceph should explain much of it (that is what they sell). Why care
about today's Ubuntu and today's "stable" when the real product sold is
last year's Ceph, which works great on last year's Ubuntu?


Alphe Salas.

On 04/25/2014 06:03 PM, Craig Lewis wrote:
Using the Emperor builds for Precise seems to work on Trusty.  I just 
put a hold on all of the ceph, rados, and apache packages before the 
release upgrade.


It makes me nervous though.  I haven't stressed it much, and I don't 
really want to roll it out to production.


I would like to see Emperor builds for Trusty, so I can get started 
rolling out Trusty independently of Firefly.  Changing one thing at a 
time is invaluable when bad things start happening.





*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*



On 4/25/14 12:10 , Sebastien wrote:


Well as far as I know trusty has 0.79 and will get firefly as soon as 
it's ready so I'm not sure if it's that urgent. Precise repo should 
work fine.


My 2 cents


Sébastien Han
Cloud Engineer

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address : 11 bis, rue Roquépine 75008 Paris
Web : www.enovance.com - Twitter : @enovance


On Fri, Apr 25, 2014 at 9:05 PM, Travis Rhoden > wrote:


Are there packages for Trusty being built yet?

I don't see it listed at http://ceph.com/debian-emperor/dists/

Thanks,

- Travis



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Thanks Dan :)


On 28 April 2014 15:02, Dan van der Ster  wrote:

>
> On 28/04/14 14:54, Wido den Hollander wrote:
>
>> On 04/28/2014 02:15 PM, Andrija Panic wrote:
>>
>>> Thank you very much Wido,
>>> any suggestion on compiling libvirt with support (I already found a way)
>>> or perhaps use some prebuilt , that you would recommend ?
>>>
>>>
>> No special suggestions, just make sure you use at least Ceph 0.67.7
>>
>> I'm not aware of any pre-build packages for CentOS.
>>
>
> Look for qemu-kvm-rhev ... el6 ...
> That's the Redhat built version of kvm which supports RBD.
>
> Cheers, Dan
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Dan, is that maybe just RBD support for the KVM package? (I already have
RBD-enabled qemu, qemu-img etc. from the ceph.com site.)
I need just libvirt with RBD support?

Thanks


On 28 April 2014 15:05, Andrija Panic  wrote:

> Thanks Dan :)
>
>
> On 28 April 2014 15:02, Dan van der Ster wrote:
>
>>
>> On 28/04/14 14:54, Wido den Hollander wrote:
>>
>>> On 04/28/2014 02:15 PM, Andrija Panic wrote:
>>>
 Thank you very much Wido,
 any suggestion on compiling libvirt with support (I already found a way)
 or perhaps use some prebuilt , that you would recommend ?


>>> No special suggestions, just make sure you use at least Ceph 0.67.7
>>>
>>> I'm not aware of any pre-build packages for CentOS.
>>>
>>
>> Look for qemu-kvm-rhev ... el6 ...
>> That's the Redhat built version of kvm which supports RBD.
>>
>> Cheers, Dan
>>
>
>
>
> --
>
> Andrija Panić
> --
>   http://admintweets.com
> --
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Dan van der Ster
Indeed. Actually, we didn't need a special libvirt, only the special 
qemu-kvm-rhev. But we use it via openstack, so I don't know if there is 
other redhat magic involved.


sorry, dan



On 28/04/14 15:08, Andrija Panic wrote:
Dan, is this maybe just rbd support for kvm package (I already have 
rbd enabled qemu, qemu-img etc from ceph.com  site)

I need just libvirt with rbd support ?

Thanks


On 28 April 2014 15:05, Andrija Panic > wrote:


Thanks Dan :)


On 28 April 2014 15:02, Dan van der Ster
mailto:daniel.vanders...@cern.ch>> wrote:


On 28/04/14 14:54, Wido den Hollander wrote:

On 04/28/2014 02:15 PM, Andrija Panic wrote:

Thank you very much Wido,
any suggestion on compiling libvirt with support (I
already found a way)
or perhaps use some prebuilt , that you would recommend ?


No special suggestions, just make sure you use at least
Ceph 0.67.7

I'm not aware of any pre-build packages for CentOS.


Look for qemu-kvm-rhev ... el6 ...
That's the Redhat built version of kvm which supports RBD.

Cheers, Dan




-- 


Andrija Panic'
--
http://admintweets.com
--




--

Andrija Panic'
--
http://admintweets.com
--


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Installing ceph without access to the internet

2014-04-28 Thread Alfredo Deza
On Sun, Apr 27, 2014 at 7:24 AM, Cedric Lemarchand  wrote:
> Hi rAn,
>
> On 27/04/2014 13:13, rAn rAnn wrote:
>
> Thanks all,
> I'm trying to deploy from node1 (the admin node) to the new node via the
> command "ceph-deploy install node2".
> I have copied the two main repositories (noarch and x86_64) to my secure
> site and I have encountered the following warnings and errors:
> [node2][warnin] http://ceph.com/rmp-emperor/e16/x86_64/repodata/repomd.xml:
> Errno 14 PYCURL ERROR 6 -"couldn't resolve host 'ceph.com'"
> [node2][warnin] ERROR: Cannot retrive repository metadata [repomd.xml] for
> repository: ceph. please verify its path and try again
> [node2][ERROR] RuntimeError: Failed to execute command yum -y -q install
> ceph
> Anyone have an idea?

For the past few versions, ceph-deploy has a few options for
installation without internet access.

If you have a ceph repo mirror you could tell ceph-deploy to install
from that url (you will need to pass
in the gpg key url too). For example:

ceph-deploy install --repo-url {http mirror} --gpg-url {http gpg url} {host}
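
For example, with a hypothetical internal mirror (the host and paths below are placeholders for wherever the copied repo is served from):

ceph-deploy install \
    --repo-url http://mirror.internal.example/ceph/rpm-emperor/el6 \
    --gpg-url  http://mirror.internal.example/ceph/release.asc \
    node2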



>
> ceph-deploy tools remotely 'pilot' the ceph installation and configuration
> on the specified node, including package installation, thus you still need
> an internet access for the installation parts, which is why pycurl (and then
> yum) complains.
>
> Possible solutions could be :
>
> - as Eric state it, create a local mirror of the remote package repository
> (don't know if it's an easy task ...), and configure your OS to use it.
> - download and install all the necessary packages and dependency on nodes
> before using ceph-deploy, thus you will profit of the local packages cache
> for every operations.
>
> Cheers
>
> Cédric
>
> On 27 Apr 2014 at 14:02, "xan.peng" wrote:
>>
>> On Sat, Apr 26, 2014 at 7:16 PM, rAn rAnn  wrote:
>> > hi, Im on a site with no access to the internet and Im trying to install
>> > ceph
>> > during the installation it tries to download files from the internet and
>> > then I get an error
>> > I tried to download the files and make my own repository, also i have
>> > changed the installation code to point to a different path but it still
>> > keeps trying to access the internet
>> > has anyone managed to install ceph on a secure site or maybe someone has
>> > an
>> > idea or a way to install it
>> > thanks in advance.
>> >
>>
>> You can always install ceph by compiling the source code. However this
>> half-done
>> project (https://github.com/xanpeng/ceph-tools/tree/master/deploy-ceph)
>> may help.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Cédric
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd Problem with disk full

2014-04-28 Thread Alphe Salas

Hello,
I need the rbd kernel module to really delete data on the OSD-backed disks.
Having ever-growing "hidden data" is not a great solution.


So, first of all, we should at least be able to manually strip out the
"hidden" data, a.k.a. the replicas.


I use an rbd image, let's say 10 TB, on an overall available space of
25 TB. What real-world experience shows me is that if I write 8 TB of my
10 TB in a row, the overall used data is around 18 TB. Then if I delete
4 TB from the rbd image and write another 4 TB, the overall used data grows
by 4 TB more; of course the PGs used by the rbd image are reused and
overwritten, but the corresponding replicas are not.

In the end, after round 2 of writing, the overall used space is 22 TB,
and at that moment I get stuff like this:

2034 active+clean
   7 active+remapped+wait_backfill+backfill_toofull
   7 active+remapped+backfilling

I tried to use ceph osd reweight-by-utilization, but that didn't solve
the problem. And even if the problem were solved, it would only be
momentarily, because after deleting another 4 TB and writing 4 TB I will
reach the full ratio and get my OSDs stuck, until I spend 12,000 dollars to
expand my Ceph cluster. And when you are dealing with a 40 TB Ceph cluster,
adding 4 TB doesn't make much of a difference.

In the end, for 40 TB of raw space (20 disks of 2 TB), after the initial
formatting I get a 37 TB cluster of available space. Then I create an 18 TB
rbd image, and can't use much more than 16 TB before my OSDs start showing
stuck PGs.

In the end, 37 TB for 16 TB of usable disk space is not a great
solution at all, because I lose 60% of my storage.


On how to delete the data: really, I don't know. The easiest way I can
see is at least to be able to manually tell the rbd kernel module to
clean "released" data from the OSDs when we see fit, at "maintenance time".

If doing it automatically has too bad an impact on overall performance,
I would still be glad to be able to choose an appropriate moment to force
a cleaning task; that would be better than nothing and better than the
ever-growing "hidden" data situation.
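
(Added note: a quick way to watch the gap between what the image holds and what the cluster actually stores, using standard commands -- the pool and image names are placeholders:)

rbd info rbd/myimage    # provisioned size of the image
rados df                # objects and bytes actually stored, per pool
ceph df                 # cluster-wide raw space used vs. available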


Regards,

--
Alphe Salas
IT engineer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OpenStack Icehouse and ephemeral disks created from image

2014-04-28 Thread Sebastien Han
FYI It’s fixed here: https://review.openstack.org/#/c/90644/1

 
Sébastien Han 
Cloud Engineer 

"Always give 100%. Unless you're giving blood.” 

Phone: +33 (0)1 49 70 99 72 
Mail: sebastien@enovance.com 
Address : 11 bis, rue Roquépine - 75008 Paris
Web : www.enovance.com - Twitter : @enovance 

On 25 Apr 2014, at 18:16, Sebastien Han  wrote:

> I just tried, I have the same problem, it looks like a regression…
> It’s weird because the code didn’t change that much during the Icehouse cycle.
> 
> I just reported the bug here: https://bugs.launchpad.net/cinder/+bug/1312819
> 
>  
> Sébastien Han 
> Cloud Engineer 
> 
> "Always give 100%. Unless you're giving blood.” 
> 
> Phone: +33 (0)1 49 70 99 72 
> Mail: sebastien@enovance.com 
> Address : 11 bis, rue Roquépine - 75008 Paris
> Web : www.enovance.com - Twitter : @enovance 
> 
> On 25 Apr 2014, at 16:37, Sebastien Han  wrote:
> 
>> g
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd always expending data space problem.

2014-04-28 Thread Alphe Salas Michels

Hello all,
recently I came to the conclusion that out of 40 TB of physical space I
could use only 16 TB before seeing PGs get stuck because an OSD was too full.

The used data space seems to keep growing forever.

Using ceph osd reweight-by-utilization 103 seems at first to rebalance
the OSD PG usage, and the problem is solved for a while. But then the
problem appears again with more PGs stuck too full, and it keeps
growing. Sure, the solution would be to add more disk space, but for
that expansion to be significant and actually solve the problem it should be
at least 25%, which means growing the Ceph cluster by 10 TB (5 disks
of 2 TB or 3 disks of 4 TB). That has a cost, and the problem would only be
solved for a moment, until the replicas that are never freed fill up
the added space again.



In the end I can really only count on using an rbd image of 16 TB out of
37 TB of global Ceph cluster disk, which means I can really use about 40%,
and over time that ratio will drop constantly.


So what is required is that the replicas and data can be overwritten, so
that the hidden data will not keep growing, or that I can clean them up
when I need to.


Alphe Salas.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] apple support automated mail ? WTF!?

2014-04-28 Thread Alphe Salas Michels

Hello,
each time I send a mail to the ceph-users mailing list I receive an email
from Apple support?!

Is that a joke?


Alphe Salas

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OpenStack Icehouse and ephemeral disks created from image

2014-04-28 Thread Maciej Gałkiewicz
On 28 April 2014 15:58, Sebastien Han  wrote:

> FYI It’s fixed here: https://review.openstack.org/#/c/90644/1


I already have this patch and it didn't help. Has it fixed the problem in
your cluster?

-- 
Maciej Gałkiewicz
Shelly Cloud Sp. z o. o., Co-founder, Sysadmin
http://shellycloud.com/, mac...@shellycloud.com
KRS: 440358 REGON: 101504426
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OpenStack Icehouse and ephemeral disks created from image

2014-04-28 Thread Sebastien Han
Yes yes, just restart cinder-api and cinder-volume.
It worked for me.

 
Sébastien Han 
Cloud Engineer 

"Always give 100%. Unless you're giving blood.” 

Phone: +33 (0)1 49 70 99 72 
Mail: sebastien@enovance.com 
Address : 11 bis, rue Roquépine - 75008 Paris
Web : www.enovance.com - Twitter : @enovance 

On 28 Apr 2014, at 16:10, Maciej Gałkiewicz  wrote:

> On 28 April 2014 15:58, Sebastien Han  wrote:
> FYI It’s fixed here: https://review.openstack.org/#/c/90644/1
> 
> I already have this patch and it didn't help. Have it fixed the problem in 
> your cluster? 
> 
> -- 
> Maciej Gałkiewicz
> Shelly Cloud Sp. z o. o., Co-founder, Sysadmin
> http://shellycloud.com/, mac...@shellycloud.com
> KRS: 440358 REGON: 101504426



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Data placement algorithm

2014-04-28 Thread Séguin Cyril

Hi all,

I'm currently interested in comparing Ceph's block replica placement
policy with other placement algorithms.


Is it possible to access the source code of Ceph's placement policy,
and where can I find it?


Thanks a lot.

Best regards.

CS

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Access denied error

2014-04-28 Thread shanil

Hi Yehuda,

We are calling the API with the same method as above, and we followed the
connection details described in
http://ceph.com/docs/master/radosgw/s3/authentication/#access-control-lists-acls.
The method in
http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html
is for generating a hash from the header string and the secret key; since
these keys are already created, I think we don't need that method,
right?


I also tried a call to list the bucket data, like this:

curl -i 'http://gateway.3linux.com/test?format=json' -X GET -H
'Authorization: AWS
KGXJJGKDM5G7G4CNKC7R:LC7S0twZdhtXA1XxthfMDsj5TgJpeKhZrloWa9WN' -H 'Host:
gateway.3linux.com' -H 'Date: Mon, 28 April 2014 07:25:00 GMT ' -H
'Content-Length: 0'


but it also gets the access denied error. However, I can view the bucket
details by entering http://gateway.3linux.com/test?format=json directly
in the browser. What do you think? What may be the reason? I am able to
connect and list buckets etc. using the Cyberduck client with these access
keys, but unable to do so with these hand-made calls.
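
For reference, here is a minimal sketch of how the Authorization header is
meant to be built for such a request (AWS V2 signing with HMAC-SHA1, done
here with openssl and base64; the access/secret keys are the ones from the
log above, and the bucket path /test is just an example):

access_key='KGXJJGKDM5G7G4CNKC7R'
secret_key='LC7S0twZdhtXA1XxthfMDsj5TgJpeKhZrloWa9WN'
date_hdr=$(date -u +'%a, %d %b %Y %H:%M:%S GMT')
resource='/test'

# string to sign: METHOD \n Content-MD5 \n Content-Type \n Date \n resource
string_to_sign=$(printf 'GET\n\n\n%s\n%s' "$date_hdr" "$resource")
signature=$(printf '%s' "$string_to_sign" | openssl dgst -sha1 -hmac "$secret_key" -binary | base64)

curl -i "http://gateway.3linux.com${resource}" \
     -H "Date: $date_hdr" \
     -H "Authorization: AWS ${access_key}:${signature}"

The value after the colon must be this base64 HMAC-SHA1 signature, not the
secret key itself, and the Date header must match the date that was signed.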



On Saturday 26 April 2014 10:17 AM, Punit Dambiwal wrote:

Hi Shanil,

I got the following reply from community :-

Still signing issues. If you're manually constructing the auth header
you need to make it look like the above (copy pasted here):

> 2014-04-25 15:52:56.988239 7f00d37c6700 10 auth_hdr:
> GET
>
>
> Fri, 25 April 2014 07:50:00 GMT
> /admin/usage

Then you need to run hmac-sha1 on it, as described here:

http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html

If you have any backslash in the key then you need to remove it, it's
just an escape character for representing slashes in json.


-- Forwarded message --
From: *Yehuda Sadeh* mailto:yeh...@inktank.com>>
Date: Sat, Apr 26, 2014 at 12:22 AM
Subject: Re: [ceph-users] Access denied error
To: Punit Dambiwal mailto:hypu...@gmail.com>>
Cc: "ceph-users@lists.ceph.com " 
mailto:ceph-users@lists.ceph.com>>



On Fri, Apr 25, 2014 at 1:03 AM, Punit Dambiwal > wrote:

> Hi Yehuda,
>
> Thanks for your help...that missing date error gone but still i am 
getting

> the access denied error :-
>
> -
> 2014-04-25 15:52:56.988025 7f00d37c6700  1 == starting new request
> req=0x237a090 =
> 2014-04-25 15:52:56.988072 7f00d37c6700  2 req 24:0.46::GET
> /admin/usage::initializing
> 2014-04-25 15:52:56.988077 7f00d37c6700 10 host=gateway.3linux.com 


> rgw_dns_name=gateway.3linux.com 
> 2014-04-25 15:52:56.988102 7f00d37c6700 20 FCGI_ROLE=RESPONDER
> 2014-04-25 15:52:56.988103 7f00d37c6700 20 SCRIPT_URL=/admin/usage
> 2014-04-25 15:52:56.988104 7f00d37c6700 20
> SCRIPT_URI=http://gateway.3linux.com/admin/usage
> 2014-04-25 15:52:56.988105 7f00d37c6700 20 HTTP_AUTHORIZATION=AWS
> KGXJJGKDM5G7G4CNKC7R:LC7S0twZdhtXA1XxthfMDsj5TgJpeKhZrloWa9WN
> 2014-04-25 15:52:56.988107 7f00d37c6700 20 HTTP_USER_AGENT=curl/7.22.0
> (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 
 libidn/1.23

> librtmp/2.3
> 2014-04-25 15:52:56.988108 7f00d37c6700 20 HTTP_ACCEPT=*/*
> 2014-04-25 15:52:56.988109 7f00d37c6700 20 
HTTP_HOST=gateway.3linux.com 

> 2014-04-25 15:52:56.988110 7f00d37c6700 20 HTTP_DATE=Fri, 25 April 2014
> 07:50:00 GMT
> 2014-04-25 15:52:56.988111 7f00d37c6700 20 CONTENT_LENGTH=0
> 2014-04-25 15:52:56.988112 7f00d37c6700 20 
PATH=/usr/local/bin:/usr/bin:/bin

> 2014-04-25 15:52:56.988113 7f00d37c6700 20 SERVER_SIGNATURE=
> 2014-04-25 15:52:56.988114 7f00d37c6700 20 SERVER_SOFTWARE=Apache/2.2.22
> (Ubuntu)
> 2014-04-25 15:52:56.988115 7f00d37c6700 20 
SERVER_NAME=gateway.3linux.com 

> 2014-04-25 15:52:56.988116 7f00d37c6700 20 SERVER_ADDR=117.18.79.110
> 2014-04-25 15:52:56.988117 7f00d37c6700 20 SERVER_PORT=80
> 2014-04-25 15:52:56.988117 7f00d37c6700 20 REMOTE_ADDR=122.166.115.191
> 2014-04-25 15:52:56.988118 7f00d37c6700 20 DOCUMENT_ROOT=/var/www
> 2014-04-25 15:52:56.988119 7f00d37c6700 20 
SERVER_ADMIN=c...@3linux.com 

> 2014-04-25 15:52:56.988120 7f00d37c6700 20
> SCRIPT_FILENAME=/var/www/s3gw.fcgi
> 2014-04-25 15:52:56.988120 7f00d37c6700 20 REMOTE_PORT=28840
> 2014-04-25 15:52:56.988121 7f00d37c6700 20 GATEWAY_INTERFACE=CGI/1.1
> 2014-04-25 15:52:56.988122 7f00d37c6700 20 SERVER_PROTOCOL=HTTP/1.1
> 2014-04-25 15:52:56.988123 7f00d37c6700 20 REQUEST_METHOD=GET
> 2014-04-25 15:52:56.988123 7f00d37c6700 20
> QUERY_STRING=page=admin&params=/usage&format=json
> 2014-04-25 15:52:56.988124 7f00d37c6700 20
> REQUEST_URI=/admin/usage?format=json
> 2014-04-25 15:52:56.988125 7f00d37c6700 20 SCRIPT_NAME=/admin/usage
> 2014-04-25 15:52:56.988126 7f00d37c6700  2 req 24:0.000101::GET
> /admin/usage::getting op
> 2014-04-25 15:52:56.988129 7f00d37c6700  2 req 24:0.000104::GET
> /admin/usage:get_usage:authori

Re: [ceph-users] cluster_network ignored

2014-04-28 Thread Kurt Bauer

To see where your OSDs and Mon are listening, you have various cmds in
Linux, e.g:

'lsof -ni | grep ceph' - you should see one LISTEN line for the monitor,
two LISTEN lines per OSD and a lot of ESTABLISHED lines, which
indicate communication between OSDs, and between OSDs and clients
'netstat -atn | grep LIST' - you should see a lot of lines with
port numbers 6800 and upwards (OSDs) and port 6789 (MON)
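
To double-check which public and cluster addresses each OSD registered with
the monitors, something like this should also work (the exact output format
varies a bit between versions):

ceph osd dump | grep '^osd\.'

Each osd.N line should show the OSD's public address and, if a cluster
network is in use, its cluster (back-side) address as well.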
More comments inline.

HTH,
Kurt
> Gandalf Corvotempesta 
> 28. April 2014 11:05
> 2014-04-26 12:06 GMT+02:00 Gandalf Corvotempesta
>
> I've added "cluster addr" and "public addr" to each OSD configuration
> but nothing is changed.
> I see all OSDs down except the ones from one server but I'm able to
> ping each other nodes on both interfaces.

What do you mean by "I see all OSDs down"? What does a 'ceph osd stat' say?
>
> How can I detect what ceph is doing?
'ceph -w'

> I see tons of debug logs but they
> are not very easy to understand
> with "ceph health" i can see that "pgs down" value is slowly
> decreasing so I can suppose that caph is recovering. Is that right?
What's the output of 'ceph -s'

>
> Isn't possible to add a semplified output like the one coming from
> "mdadm"? (cat /proc/mdstat)
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> Gandalf Corvotempesta 
> 26. April 2014 12:06
> I've not defined cluster IPs for each OSD server but only the whole
> subnet.
> Should I define each IP for each OSD ? This is not wrote on docs and
> could be tricky to do this in big environments with hundreds of nodes
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> McNamara, Bradley 
> 24. April 2014 20:04
> Do you have all of the cluster IP's defined in the host file on each
> OSD server? As I understand it, the mon's do not use a cluster
> network, only the OSD servers.
>
> -Original Message-
> From: ceph-users-boun...@lists.ceph.com
> [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gandalf
> Corvotempesta
> Sent: Thursday, April 24, 2014 8:54 AM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] cluster_network ignored
>
> I'm trying to configure a small ceph cluster with both public and
> cluster networks.
> This is my conf:
>
> [global]
> public_network = 192.168.0/24
> cluster_network = 10.0.0.0/24
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> fsid = 004baba0-74dc-4429-84ec-1e376fb7bcad
> osd pool default pg num = 8192
> osd pool default pgp num = 8192
> osd pool default size = 3
>
> [mon]
> mon osd down out interval = 600
> mon osd mon down reporters = 7
> [mon.osd1]
> host = osd1
> mon addr = 192.168.0.1
> [mon.osd2]
> host = osd2
> mon addr = 192.168.0.2
> [mon.osd3]
> host = osd3
> mon addr = 192.168.0.3
>
> [osd]
> osd mkfs type = xfs
> osd journal size = 16384
> osd mon heartbeat interval = 30
> filestore merge threshold = 40
> filestore split multiple = 8
> osd op threads = 8
> osd recovery max active = 5
> osd max backfills = 2
> osd recovery op priority = 2
>
>
> on each node I have bond0 bound to 192.168.0.x and bond1 bound to
> 10.0.0.x When ceph is doing recovery, I can see replication through
> bond0 (public interface) and nothing via bond1 (cluster interface)
>
> Should I configure anything else ?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> Gandalf Corvotempesta 
> 24. April 2014 17:53
> I'm trying to configure a small ceph cluster with both public and
> cluster networks.
> This is my conf:
>
> [global]
> public_network = 192.168.0/24
> cluster_network = 10.0.0.0/24
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> fsid = 004baba0-74dc-4429-84ec-1e376fb7bcad
> osd pool default pg num = 8192
> osd pool default pgp num = 8192
> osd pool default size = 3
>
> [mon]
> mon osd down out interval = 600
> mon osd mon down reporters = 7
> [mon.osd1]
> host = osd1
> mon addr = 192.168.0.1
> [mon.osd2]
> host = osd2
> mon addr = 192.168.0.2
> [mon.osd3]
> host = osd3
> mon addr = 192.168.0.3
>
> [osd]
> osd mkfs type = xfs
> osd journal size = 16384
> osd mon heartbeat interval = 30
> filestore merge threshold = 40
> filestore split multiple = 8
> osd op threads = 8
> osd recovery max active = 5
> osd max backfills = 2
> osd recovery op priority = 2
>
>
> on each node I have bond0 bound to 192.168.0.x and bond1 bound to 10.0.0.x
> When cep

Re: [ceph-users] Data placement algorithm

2014-04-28 Thread xan.peng
I think what you want is Ceph's CRUSH algorithm.
Source code: https://github.com/ceph/ceph/tree/master/src/crush
Paper: http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf
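
If you want to poke at the algorithm without reading all the code first,
a couple of handy commands (pool and object names below are arbitrary):

ceph osd map rbd some-object                # where CRUSH maps a given object
ceph osd getcrushmap -o crushmap.bin        # extract the compiled crush map
crushtool -d crushmap.bin -o crushmap.txt   # decompile it to plain text

crushtool can also recompile and test modified maps offline before you
inject them into a cluster.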

On Mon, Apr 28, 2014 at 5:27 PM, Séguin Cyril
 wrote:
> Hy all,
>
> I'm currently interesting in comparing Ceph's block replica placement policy
> with other placements algorithm.
>
> Is it possible to access to the source code of Ceph's placement policy and
> where can I find it?
>
> Thanks a lot.
>
> Best regards.
>
> CS
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Only one OSD log available per node?

2014-04-28 Thread Gregory Farnum
It is not. My guess from looking at the time stamps is that maybe you have
a log rotation system set up that isn't working properly?
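If it is the packaged logrotate setup that is misbehaving, something along
these lines is worth checking on one of the OSD nodes (paths assume the
Debian/Ubuntu packages):

cat /etc/logrotate.d/ceph               # is the config there, what does postrotate do?
logrotate -f -v /etc/logrotate.d/ceph   # force a rotation and watch for errors
ls -l /var/log/ceph/                    # do the new files start growing afterwards?

Daemons keep writing to the old (possibly deleted) file until they are told
to reopen their logs, which is what the postrotate step is supposed to do.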
-Greg

On Sunday, April 27, 2014, Indra Pramana  wrote:

> Dear all,
>
> I have multiple OSDs per node (normally 4) and I realised that for all the
> nodes that I have, only one OSD will contain logs under /var/log/ceph, the
> rest of the logs are empty.
>
> root@ceph-osd-07:/var/log/ceph# ls -la *.log
> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-client.admin.log
> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.0.log
> -rw-r--r-- 1 root root 386857 Apr 28 14:02 ceph-osd.12.log
> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.13.log
> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.14.log
> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.15.log
> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd..log
>
> The ceph-osd.12.log only contains the logs for osd.12 only, while the
> other logs for osd.13, 14 and 15 are not available and empty.
>
> Is this normal?
>
> Looking forward to your reply, thank you.
>
> Cheers.
>


-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Data placement algorithm

2014-04-28 Thread Séguin Cyril

Yes it is!

Thanks a lot.

On 28/04/2014 17:18, xan.peng wrote:

I think what you want is Ceph's CRUSH algorithm.
Source code: https://github.com/ceph/ceph/tree/master/src/crush
Paper: http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

On Mon, Apr 28, 2014 at 5:27 PM, Séguin Cyril
 wrote:

Hy all,

I'm currently interesting in comparing Ceph's block replica placement policy
with other placements algorithm.

Is it possible to access to the source code of Ceph's placement policy and
where can I find it?

Thanks a lot.

Best regards.

CS

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What happened if rbd lose a block?

2014-04-28 Thread Timofey Koolin
Is there a setting to change the behavior so that a read error is returned
instead of blocking the read?

I think that is more reasonable behavior, because it is similar to a bad
block on an HDD: it simply can't be read.

Or maybe a timeout of some seconds, after which a read error is returned for
that block and for the other missing blocks in the same image/PG.


Or is there any method to safely upgrade the cluster without downtime?

Right now, if I upgrade the monitors and the upgrade fails on the second (of
three) monitors, the cluster will go down, because it will have
1 new monitor
1 down monitor
1 old monitor

The old and new monitors won't have quorum.

Same for 5 monitors:
2 new monitors
1 down monitor
2 old monitors.

> On 04/28/2014 02:35 PM, Timofey Koolin wrote:
>> What will happened if RBD lose all copied of data-block and I read the block?
>> 
> 
> The read to the object will block until a replica comes online to serve it.
> 
> Remember this with Ceph: "Consistency goes over availability"
> 
>> Context:
>> I want use RDB as main storage with replication factor 1 and drbd for 
>> replication on non rbd storage by client side.
>> 
>> For example:
>> Computer1:
>> 1. connect rbd as /dev/rbd15
>> 2. use rbd as disk for drbd
>> 
>> Computer2:
>> Use HDD for drbd-replication.
>> 
>> 
>> I want protect from break of ceph system (for example while upgrade ceph) 
>> and long-distance replication.
> 
> Ceph wants to be consistent at all times. So copying over long distances with 
> high latency will be very slow.
> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> 
> -- 
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CentOS 6 Yum repository broken / tampered with?

2014-04-28 Thread Brian Rak
Were there any changes to the EL6 yum packages at 
http://ceph.com/rpm/el6/x86_64/ ?


There are a number of files showing a modification date of 
'25-Apr-2014', but it seems that no one regenerated the repository metadata.


This breaks installations using the repository, you'll get errors like this:

 http://ceph.com/rpm/el6/x86_64/libcephfs1-0.72.2-0.el6.x86_64.rpm: 
[Errno -1] Package does not match intended download. Suggestion: run yum 
--enablerepo=ceph clean metadata.


This is definitely an issue with the repository.  The metadata shows:


<package type="rpm">
  <name>libcephfs1</name>
  <arch>x86_64</arch>
  <checksum type="sha1" pkgid="YES">4bdb7c99a120bb3e0de2b642e00c6e28fa75dbbe</checksum>
  ...
</package>


But, if we download that package and check the checksum, we get:

$ wget -q http://ceph.com/rpm/el6/x86_64/libcephfs1-0.72.2-0.el6.x86_64.rpm
$ sha1sum libcephfs1-0.72.2-0.el6.x86_64.rpm
4d9730c9dd6dad6cc1b08abc6a4ef5ae0e497aec libcephfs1-0.72.2-0.el6.x86_64.rpm


It's my understanding that you never want to make changes to an existing 
package, because any machine that's already installed it will not have 
the updates applied.  You'd generally just increase the iteration number.
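
Assuming it really is just stale metadata (and setting aside that changed
packages should get a new release number instead), the fix would be to
regenerate the metadata on the repository side and refresh it on the
clients, roughly:

createrepo --update /path/to/rpm/el6/x86_64   # on the repository host
yum --enablerepo=ceph clean metadata          # on affected clients
yum makecache

The path above is illustrative; it should point at the directory that holds
the RPMs and the repodata/ subdirectory.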

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] apple support automated mail ? WTF!?

2014-04-28 Thread Brian Rak

I thought that was just me..  I guess someone from apple is subscribed?

On 4/28/2014 10:06 AM, Alphe Salas Michels wrote:

Hello,
each time I send a mail to the ceph user mailing list I receive an 
email from apple support?!

Is that a joke?


Alphe Salas



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] apple support automated mail ? WTF!?

2014-04-28 Thread Patrick McGarry
Yeah, someone subscribed supp...@apple.com.  I just unsubscribed them
from the list.  Let me know if it shows back up and I'll ban it.
Thanks.



Best Regards,

Patrick McGarry
Director, Community || Inktank
http://ceph.com  ||  http://inktank.com
@scuttlemonkey || @ceph || @inktank


On Mon, Apr 28, 2014 at 12:37 PM, Brian Rak  wrote:
> I thought that was just me..  I guess someone from apple is subscribed?
>
> On 4/28/2014 10:06 AM, Alphe Salas Michels wrote:
>
> Hello,
> each time I send a mail to the ceph user mailing list I receive an email
> from apple support?!
> Is that a joke?
>
>
> Alphe Salas
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [ANN] ceph-deploy 1.5.0 released!

2014-04-28 Thread Alfredo Deza
Hi All,

There is a new release of ceph-deploy, the easy deployment tool for Ceph.

This release comes with a few bug fixes and a few features:

* implement `osd list`
* add a status check on OSDs when deploying
* sync local mirrors to remote hosts when installing
* support flags and options set in cephdeploy.conf

The full list of changes and fixes is documented at:

http://ceph.com/ceph-deploy/docs/changelog.html#id1

Make sure you update!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster_network ignored

2014-04-28 Thread Gandalf Corvotempesta
2014-04-28 17:17 GMT+02:00 Kurt Bauer :
> What do you mean by "I see all OSDs down"?

I mean that my OSDs are detected as down:

$ sudo ceph osd tree
# id weight type name up/down reweight
-1 12.74 root default
-2 3.64 host osd13
0 1.82 osd.0 down 0
2 1.82 osd.2 down 0
-3 5.46 host osd12
1 1.82 osd.1 up 1
3 1.82 osd.3 down 0
4 1.82 osd.4 down 0
-4 3.64 host osd14
5 1.82 osd.5 down 0
6 1.82 osd.6 up 1



> What does a 'ceph osd stat' say?

 osdmap e1640: 7 osds: 2 up, 2 in

> How can I detect what ceph is doing?
>
> 'ceph -w'

Ok, but there I can't see something like "recovering, 57% complete" or
something similar.

> What's the output of 'ceph -s'

$ sudo ceph -s
cluster 6b9916f9-c209-4f53-98c6-581adcdf0955
 health HEALTH_WARN 3383 pgs degraded; 59223 pgs down; 12986 pgs
incomplete; 81691 pgs peering; 25071 pgs stale; 95049 pgs stuck
inactive; 25071 pgs stuck stale; 98432 pgs stuck unclean; 16 requests
are blocked > 32 sec; recovery 1/189 objects degraded (0.529%)
 monmap e3: 3 mons at
{osd12=192.168.0.112:6789/0,osd13=192.168.0.113:6789/0,osd14=192.168.0.114:6789/0},
election epoch 326, quorum 0,1,2 osd12,osd13,osd14
 osdmap e1640: 7 osds: 2 up, 2 in
  pgmap v1046855: 98432 pgs, 14 pools, 65979 bytes data, 63 objects
969 MB used, 3721 GB / 3722 GB avail
1/189 objects degraded (0.529%)
  24 stale
   12396 peering
 348 remapped
   44014 down+peering
3214 active+degraded
3949 stale+peering
   11613 stale+down+peering
  24 stale+active+degraded
 145 active+replay+degraded
6123 remapped+peering
3159 down+remapped+peering
3962 incomplete
 437 stale+down+remapped+peering
9024 stale+incomplete
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Creating a bucket failed

2014-04-28 Thread Seowon Jung
Hello,

I've installed Ceph Emperor on my Ubuntu 12.04 server to test many things.
 Everything was pretty good so far, but now I got a problem (403,
 AccessDenied) when I try to create a bucket through S3-compatible API.
 Please read the following information.

*Client Information*
Computer: Ubuntu 12.04 64bit Desktop
S3 Client: Dragon Disk 1.05


*Server Information*
Server Hardware: 2 servers, 2 storage array (12 OSDs each, total 24 OSDs)
OS: Ubuntu 12.04 64bit
Ceph: Emperor, Health OK, all OSDs UP


*Configurations:*

ceph.conf
[global]
fsid = 2606e43d-6ca3-4aeb-b760-507a97e06190
mon_initial_members = lab0, lab1
mon_host = 172.17.1.250,172.17.1.251
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_max_attr_size = 655360
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 800
osd pool default pgp num = 800

[client.radosgw.gateway]
host = lab0
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log
rgw data = /var/lib/ceph/radosgw
rgw dns name = lab0.coe.hawaii.edu
rgw print continue = false


Apache
/etc/apache2/sites-enabled/rgw

FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock
ServerName  lab0.coe.hawaii.edu
ServerAdmin webmaster@localhost
DocumentRoot /var/www

RewriteEngine On
RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*)
/s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING}
[E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]



Options +ExecCGI
AllowOverride All
SetHandler fastcgi-script
Order allow,deny
allow from all
AuthBasicAuthoritative Off



AllowEncodedSlashes On
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined
ServerSignature Off



User Info:
# radosgw-admin user info --uid=admin
{ "user_id": "admin",
  "display_name": "Admin",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [],
  "keys": [
{ "user": "admin",
  "access_key": "A3R0CEF3140MLIZIXN4X",
  "secret_key": "K8TRyfK8ArRjGRnSRvd4N5gY4TdeK1wK\/1iGCcGO"}],
  "swift_keys": [],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
  "max_size_kb": -1,
  "max_objects": -1}}


/var/log/ceph/radosgw.log:
2014-04-28 10:44:42.206681 7fc9b9feb700 15 calculated
digest=6JGkEimcy2pBN3Ty6mfYh6SudcA=
2014-04-28 10:44:42.206685 7fc9b9feb700 15
auth_sign=6JGkEimcy2pBN3Ty6mfYh6SudcA=
2014-04-28 10:44:42.206686 7fc9b9feb700 15 compare=0
2014-04-28 10:44:42.206691 7fc9b9feb700  2 req 20:0.000456:s3:PUT
/:create_bucket:reading permissions
2014-04-28 10:44:42.206697 7fc9b9feb700  2 req 20:0.000463:s3:PUT
/:create_bucket:init op
2014-04-28 10:44:42.206701 7fc9b9feb700  2 req 20:0.000467:s3:PUT
/:create_bucket:verifying op mask
2014-04-28 10:44:42.206704 7fc9b9feb700 20 required_mask= 2 user.op_mask=7
2014-04-28 10:44:42.206706 7fc9b9feb700  2 req 20:0.000472:s3:PUT
/:create_bucket:verifying op permissions
2014-04-28 10:44:42.209718 7fc9b9feb700  2 req 20:0.003483:s3:PUT
/:create_bucket:verifying op params
2014-04-28 10:44:42.209742 7fc9b9feb700  2 req 20:0.003508:s3:PUT
/:create_bucket:executing
2014-04-28 10:44:42.209776 7fc9b9feb700 20 get_obj_state:
rctx=0x7fc928009bd0 obj=.rgw:test state=0x7fc92800cfd8 s->prefetch_data=0
2014-04-28 10:44:42.209790 7fc9b9feb700 10 moving .rgw+test to cache LRU end
2014-04-28 10:44:42.209793 7fc9b9feb700 10 cache get: name=.rgw+test : type
miss (requested=22, cached=0)
2014-04-28 10:44:42.211397 7fc9b9feb700 10 cache put: name=.rgw+test
2014-04-28 10:44:42.211417 7fc9b9feb700 10 moving .rgw+test to cache LRU end
2014-04-28 10:44:42.212563 7fc9b9feb700 20 rgw_create_bucket returned
ret=-1 bucket=test(@{i=.rgw.buckets.index}.rgw.buckets[default.5154.9])
2014-04-28 10:44:42.212629 7fc9b9feb700  2 req 20:0.006394:s3:PUT
/:create_bucket:http status=403
2014-04-28 10:44:42.212749 7fc9b9feb700  1 == req done req=0x1f20f30
http_status=403 ==


I tried to use the secret key both K8TRyfK8ArRjGRnSRvd4N5gY4TdeK1wK\/1iGCcGO
and K8TRyfK8ArRjGRnSRvd4N5gY4TdeK1wK/1iGCcGO

Thank you for your help!
Seowon

--
Seowon Jung
Systems Administrator

College of Education
University of Hawaii at Manoa
(808) 956-7939
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Creating a bucket failed

2014-04-28 Thread Yehuda Sadeh
This could happen if your client uses the bucket through the subdomain
scheme, but rgw is not resolving it correctly (either rgw_dns_name is
misconfigured, or you were accessing it through a different host name).
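
A quick way to tell which of the two it is: compare a path-style request
with a virtual-host-style one, and check that the wildcard DNS record the
subdomain scheme relies on actually resolves (host and bucket names below
are just the ones from this thread):

curl -v http://lab0.coe.hawaii.edu/test      # path-style, only needs the gateway name
dig +short test.lab0.coe.hawaii.edu          # should resolve if *.lab0.coe.hawaii.edu exists
curl -v http://test.lab0.coe.hawaii.edu/     # virtual-host style, needs the wildcard + rgw dns name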

Yehuda


On Mon, Apr 28, 2014 at 2:02 PM, Seowon Jung  wrote:

> Hello,
>
> I've installed Ceph Emperor on my Ubuntu 12.04 server to test many things.
>  Everything was pretty good so far, but now I got a problem (403,
>  AccessDenied) when I try to create a bucket through S3-compatible API.
>  Please read the following information.
>
> *Client Information*
> Computer: Ubuntu 12.04 64bit Desktop
> S3 Client: Dragon Disk 1.05
>
>
> *Server Information*
> Server Hardware: 2 servers, 2 storage array (12 OSDs each, total 24 OSDs)
> OS: Ubuntu 12.04 64bit
> Ceph: Emperor, Health OK, all OSDs UP
>
>
> *Configurations:*
>
> ceph.conf
> [global]
> fsid = 2606e43d-6ca3-4aeb-b760-507a97e06190
> mon_initial_members = lab0, lab1
> mon_host = 172.17.1.250,172.17.1.251
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_max_attr_size = 655360
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 800
> osd pool default pgp num = 800
>
> [client.radosgw.gateway]
> host = lab0
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw socket path = /tmp/radosgw.sock
> log file = /var/log/ceph/radosgw.log
> rgw data = /var/lib/ceph/radosgw
> rgw dns name = lab0.coe.hawaii.edu
> rgw print continue = false
>
>
> Apache
> /etc/apache2/sites-enabled/rgw
> 
> FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock
> ServerName  lab0.coe.hawaii.edu
> ServerAdmin webmaster@localhost
> DocumentRoot /var/www
>
> RewriteEngine On
> RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*)
> /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING}
> [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
>
> 
> 
> Options +ExecCGI
> AllowOverride All
> SetHandler fastcgi-script
> Order allow,deny
> allow from all
> AuthBasicAuthoritative Off
> 
> 
>
> AllowEncodedSlashes On
> ErrorLog ${APACHE_LOG_DIR}/error.log
> CustomLog ${APACHE_LOG_DIR}/access.log combined
> ServerSignature Off
> 
>
>
> User Info:
> # radosgw-admin user info --uid=admin
> { "user_id": "admin",
>   "display_name": "Admin",
>   "email": "",
>   "suspended": 0,
>   "max_buckets": 1000,
>   "auid": 0,
>   "subusers": [],
>   "keys": [
> { "user": "admin",
>   "access_key": "A3R0CEF3140MLIZIXN4X",
>   "secret_key": "K8TRyfK8ArRjGRnSRvd4N5gY4TdeK1wK\/1iGCcGO"}],
>   "swift_keys": [],
>   "caps": [],
>   "op_mask": "read, write, delete",
>   "default_placement": "",
>   "placement_tags": [],
>   "bucket_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1}}
>
>
> /var/log/ceph/radosgw.log:
> 2014-04-28 10:44:42.206681 7fc9b9feb700 15 calculated
> digest=6JGkEimcy2pBN3Ty6mfYh6SudcA=
> 2014-04-28 10:44:42.206685 7fc9b9feb700 15
> auth_sign=6JGkEimcy2pBN3Ty6mfYh6SudcA=
> 2014-04-28 10:44:42.206686 7fc9b9feb700 15 compare=0
> 2014-04-28 10:44:42.206691 7fc9b9feb700  2 req 20:0.000456:s3:PUT
> /:create_bucket:reading permissions
> 2014-04-28 10:44:42.206697 7fc9b9feb700  2 req 20:0.000463:s3:PUT
> /:create_bucket:init op
> 2014-04-28 10:44:42.206701 7fc9b9feb700  2 req 20:0.000467:s3:PUT
> /:create_bucket:verifying op mask
> 2014-04-28 10:44:42.206704 7fc9b9feb700 20 required_mask= 2 user.op_mask=7
> 2014-04-28 10:44:42.206706 7fc9b9feb700  2 req 20:0.000472:s3:PUT
> /:create_bucket:verifying op permissions
> 2014-04-28 10:44:42.209718 7fc9b9feb700  2 req 20:0.003483:s3:PUT
> /:create_bucket:verifying op params
> 2014-04-28 10:44:42.209742 7fc9b9feb700  2 req 20:0.003508:s3:PUT
> /:create_bucket:executing
> 2014-04-28 10:44:42.209776 7fc9b9feb700 20 get_obj_state:
> rctx=0x7fc928009bd0 obj=.rgw:test state=0x7fc92800cfd8 s->prefetch_data=0
> 2014-04-28 10:44:42.209790 7fc9b9feb700 10 moving .rgw+test to cache LRU
> end
> 2014-04-28 10:44:42.209793 7fc9b9feb700 10 cache get: name=.rgw+test :
> type miss (requested=22, cached=0)
> 2014-04-28 10:44:42.211397 7fc9b9feb700 10 cache put: name=.rgw+test
> 2014-04-28 10:44:42.211417 7fc9b9feb700 10 moving .rgw+test to cache LRU
> end
> 2014-04-28 10:44:42.212563 7fc9b9feb700 20 rgw_create_bucket returned
> ret=-1 bucket=test(@{i=.rgw.buckets.index}.rgw.buckets[default.5154.9])
> 2014-04-28 10:44:42.212629 7fc9b9feb700  2 req 20:0.006394:s3:PUT
> /:create_bucket:http status=403
> 2014-04-28 10:44:42.212749 7fc9b9feb700  1 == req done req=0x1f20f30
> http_status=403 ==
>
>
> I tried to use the secret key both K8TRyfK8ArRjGRnSRvd4N5gY4TdeK1wK\/1iGCcGO
> and K8TRyfK8ArRjGRnSRvd4N5gY4TdeK1wK/1iGCcGO
>
> Thank you for your help!
> Seowon
>
> --
> Seowon Jung
> Systems Administrator
>
> College of Education
> University of Hawaii at Manoa
> (808) 956-7939
>
> _

Re: [ceph-users] Creating a bucket failed

2014-04-28 Thread Seowon Jung
Thank you so much for your quick reply.  I created a subuser for Swift, but
it got the authorization error.  Is it related to the same problem?

$ swift --verbose  -V 1.0 -A http://lab0.coe.hawaii.edu/auth -U admin:swift
-K RnelTPTJGc4rt6LlRjF4AnxfJhrLvu4J6+PTUl+s post test
Container PUT failed: http://lab0.coe.hawaii.edu:80/swift/v1/test 401
Authorization Required   AccessDenied

Thank you!

--
Seowon Jung
Systems Administrator

College of Education
University of Hawaii at Manoa
(808) 956-7939


On Mon, Apr 28, 2014 at 11:10 AM, Yehuda Sadeh  wrote:

> This could happen if your client is uses the bucket through subdomain
> scheme, but the rgw is not resolving it correctly (either rgw_dns_name is
> misconfigured, or you were accessing it through different host name).
>
> Yehuda
>
>
> On Mon, Apr 28, 2014 at 2:02 PM, Seowon Jung  wrote:
>
>> Hello,
>>
>> I've installed Ceph Emperor on my Ubuntu 12.04 server to test many
>> things.  Everything was pretty good so far, but now I got a problem (403,
>>  AccessDenied) when I try to create a bucket through S3-compatible API.
>>  Please read the following information.
>>
>> *Client Information*
>> Computer: Ubuntu 12.04 64bit Desktop
>> S3 Client: Dragon Disk 1.05
>>
>>
>> *Server Information*
>> Server Hardware: 2 servers, 2 storage array (12 OSDs each, total 24 OSDs)
>> OS: Ubuntu 12.04 64bit
>> Ceph: Emperor, Health OK, all OSDs UP
>>
>>
>> *Configurations:*
>>
>> ceph.conf
>> [global]
>> fsid = 2606e43d-6ca3-4aeb-b760-507a97e06190
>> mon_initial_members = lab0, lab1
>> mon_host = 172.17.1.250,172.17.1.251
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>> osd_max_attr_size = 655360
>> osd pool default size = 3
>> osd pool default min size = 1
>> osd pool default pg num = 800
>> osd pool default pgp num = 800
>>
>> [client.radosgw.gateway]
>> host = lab0
>> keyring = /etc/ceph/keyring.radosgw.gateway
>> rgw socket path = /tmp/radosgw.sock
>> log file = /var/log/ceph/radosgw.log
>> rgw data = /var/lib/ceph/radosgw
>> rgw dns name = lab0.coe.hawaii.edu
>> rgw print continue = false
>>
>>
>> Apache
>> /etc/apache2/sites-enabled/rgw
>> 
>> FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock
>> ServerName  lab0.coe.hawaii.edu
>> ServerAdmin webmaster@localhost
>>  DocumentRoot /var/www
>>
>> RewriteEngine On
>> RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*)
>> /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING}
>> [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
>>
>> 
>> 
>> Options +ExecCGI
>> AllowOverride All
>> SetHandler fastcgi-script
>> Order allow,deny
>> allow from all
>> AuthBasicAuthoritative Off
>> 
>> 
>>
>> AllowEncodedSlashes On
>> ErrorLog ${APACHE_LOG_DIR}/error.log
>> CustomLog ${APACHE_LOG_DIR}/access.log combined
>> ServerSignature Off
>> 
>>
>>
>> User Info:
>> # radosgw-admin user info --uid=admin
>> { "user_id": "admin",
>>   "display_name": "Admin",
>>   "email": "",
>>   "suspended": 0,
>>   "max_buckets": 1000,
>>   "auid": 0,
>>   "subusers": [],
>>   "keys": [
>> { "user": "admin",
>>   "access_key": "A3R0CEF3140MLIZIXN4X",
>>   "secret_key": "K8TRyfK8ArRjGRnSRvd4N5gY4TdeK1wK\/1iGCcGO"}],
>>   "swift_keys": [],
>>   "caps": [],
>>   "op_mask": "read, write, delete",
>>   "default_placement": "",
>>   "placement_tags": [],
>>   "bucket_quota": { "enabled": false,
>>   "max_size_kb": -1,
>>   "max_objects": -1}}
>>
>>
>> /var/log/ceph/radosgw.log:
>> 2014-04-28 10:44:42.206681 7fc9b9feb700 15 calculated
>> digest=6JGkEimcy2pBN3Ty6mfYh6SudcA=
>> 2014-04-28 10:44:42.206685 7fc9b9feb700 15
>> auth_sign=6JGkEimcy2pBN3Ty6mfYh6SudcA=
>> 2014-04-28 10:44:42.206686 7fc9b9feb700 15 compare=0
>> 2014-04-28 10:44:42.206691 7fc9b9feb700  2 req 20:0.000456:s3:PUT
>> /:create_bucket:reading permissions
>> 2014-04-28 10:44:42.206697 7fc9b9feb700  2 req 20:0.000463:s3:PUT
>> /:create_bucket:init op
>> 2014-04-28 10:44:42.206701 7fc9b9feb700  2 req 20:0.000467:s3:PUT
>> /:create_bucket:verifying op mask
>> 2014-04-28 10:44:42.206704 7fc9b9feb700 20 required_mask= 2 user.op_mask=7
>> 2014-04-28 10:44:42.206706 7fc9b9feb700  2 req 20:0.000472:s3:PUT
>> /:create_bucket:verifying op permissions
>> 2014-04-28 10:44:42.209718 7fc9b9feb700  2 req 20:0.003483:s3:PUT
>> /:create_bucket:verifying op params
>> 2014-04-28 10:44:42.209742 7fc9b9feb700  2 req 20:0.003508:s3:PUT
>> /:create_bucket:executing
>> 2014-04-28 10:44:42.209776 7fc9b9feb700 20 get_obj_state:
>> rctx=0x7fc928009bd0 obj=.rgw:test state=0x7fc92800cfd8 s->prefetch_data=0
>> 2014-04-28 10:44:42.209790 7fc9b9feb700 10 moving .rgw+test to cache LRU
>> end
>> 2014-04-28 10:44:42.209793 7fc9b9feb700 10 cache get: name=.rgw+test :
>> type miss (requested=22, cached=0)
>> 2014-04-28 10:44:42.211397 7fc9b9feb700 10 cache put: name=.rgw+test
>> 2014-04-28 10:44:42.

Re: [ceph-users] Creating a bucket failed

2014-04-28 Thread Cedric Lemarchand
Hello,

On 28/04/2014 23:29, Seowon Jung wrote:
> Thank you so much for your quick reply.  I created a subuser for
> Swift, but it got the authorization error.  Is it related to the same
> problem?
Since bucket access via subdomain is specific to S3 and you are now
using Swift, I don't think so.
> $ swift --verbose  -V 1.0 -A http://lab0.coe.hawaii.edu/auth -U
> admin:swift -K RnelTPTJGc4rt6LlRjF4AnxfJhrLvu4J6+PTUl+s post test
> Container PUT failed: http://lab0.coe.hawaii.edu:80/swift/v1/test 401
> Authorization Required   AccessDenied
I would first try to check if the subuser has rights to create a bucket.
("permissions" field)

Cheers

> Thank you!
>
> --
> Seowon Jung
> Systems Administrator
>
> College of Education
> University of Hawaii at Manoa
> (808) 956-7939
>
>
> On Mon, Apr 28, 2014 at 11:10 AM, Yehuda Sadeh  > wrote:
>
> This could happen if your client is uses the bucket through
> subdomain scheme, but the rgw is not resolving it correctly
> (either rgw_dns_name is misconfigured, or you were accessing it
> through different host name).
>
> Yehuda
>
>
> On Mon, Apr 28, 2014 at 2:02 PM, Seowon Jung  > wrote:
>
> Hello,
>
> I've installed Ceph Emperor on my Ubuntu 12.04 server to test
> many things.  Everything was pretty good so far, but now I got
> a problem (403,  AccessDenied) when I try to create a bucket
> through S3-compatible API.  Please read the following information.
>
> *Client Information*
> Computer: Ubuntu 12.04 64bit Desktop
> S3 Client: Dragon Disk 1.05
>
>
> *Server Information*
> Server Hardware: 2 servers, 2 storage array (12 OSDs each,
> total 24 OSDs)
> OS: Ubuntu 12.04 64bit
> Ceph: Emperor, Health OK, all OSDs UP
>
>
> *Configurations:*
>
> ceph.conf
> [global]
> fsid = 2606e43d-6ca3-4aeb-b760-507a97e06190
> mon_initial_members = lab0, lab1
> mon_host = 172.17.1.250,172.17.1.251
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_max_attr_size = 655360
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 800
> osd pool default pgp num = 800
>
> [client.radosgw.gateway]
> host = lab0
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw socket path = /tmp/radosgw.sock
> log file = /var/log/ceph/radosgw.log
> rgw data = /var/lib/ceph/radosgw
> rgw dns name = lab0.coe.hawaii.edu 
> rgw print continue = false
>
>
> Apache
> /etc/apache2/sites-enabled/rgw
> 
> FastCgiExternalServer /var/www/s3gw.fcgi -socket
> /tmp/radosgw.sock
> ServerName  lab0.coe.hawaii.edu 
> ServerAdmin webmaster@localhost
> DocumentRoot /var/www
>
> RewriteEngine On
> RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*)
> /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING}
> [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
>
> 
> 
> Options +ExecCGI
> AllowOverride All
> SetHandler fastcgi-script
> Order allow,deny
> allow from all
> AuthBasicAuthoritative Off
> 
> 
>
> AllowEncodedSlashes On
> ErrorLog ${APACHE_LOG_DIR}/error.log
> CustomLog ${APACHE_LOG_DIR}/access.log combined
> ServerSignature Off
> 
>
>
> User Info:
> # radosgw-admin user info --uid=admin
> { "user_id": "admin",
>   "display_name": "Admin",
>   "email": "",
>   "suspended": 0,
>   "max_buckets": 1000,
>   "auid": 0,
>   "subusers": [],
>   "keys": [
> { "user": "admin",
>   "access_key": "A3R0CEF3140MLIZIXN4X",
>   "secret_key":
> "K8TRyfK8ArRjGRnSRvd4N5gY4TdeK1wK\/1iGCcGO"}],
>   "swift_keys": [],
>   "caps": [],
>   "op_mask": "read, write, delete",
>   "default_placement": "",
>   "placement_tags": [],
>   "bucket_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1}}
>
>
> /var/log/ceph/radosgw.log:
> 2014-04-28 10:44:42.206681 7fc9b9feb700 15 calculated
> digest=6JGkEimcy2pBN3Ty6mfYh6SudcA=
> 2014-04-28 10:44:42.206685 7fc9b9feb700 15
> auth_sign=6JGkEimcy2pBN3Ty6mfYh6SudcA=
> 2014-04-28 10:44:42.206686 7fc9b9feb700 15 compare=0
> 2014-04-28 10:44:42.206691 7fc9b9feb700  2 req
> 20:

Re: [ceph-users] osd_recovery_max_single_start

2014-04-28 Thread David Zafman


On Apr 24, 2014, at 10:09 AM, Chad Seys  wrote:

> Hi David,
>  Thanks for the reply.
>  I'm a little confused by OSD versus PGs in the description of the two 
> options osd_recovery_max_single_start and osd_recovery_max_active .

An OSD manages all the PGs in its object store (a subset of all PGs in the 
cluster).  An OSD only needs to manage recovery of the PGs for which it is 
primary and which need recovery.   

> 
> The ceph webpage describes osd_recovery_max_active as "The number of active 
> recovery requests per OSD at one time." It does not mention PGs. ?
> 
> Assuming you meant OSD instead of PG, is this a rephrase of your message:
> 
> "osd_recovery_max_active (default 15)" recovery operations will run total and 
> will be started in groups of "osd_recovery_max_single_start (default 5)”

Yes, but PGs are how the newly started recovery ops are grouped.

The osd_recovery_max_active is the number of recovery operations which can be 
active at any given time for an OSD for all the PGs it is simultaneously 
recovering.

The osd_recovery_max_single_start is the maximum number of recovery operations 
that will be newly started per PG that the OSD is recovering.

> 
> So if I set osd_recovery_max_active = 1 then osd_recovery_max_single_start 
> will effectively = 1 ?


Yes; if osd_recovery_max_active <= osd_recovery_max_single_start, then even
when no ops are currently active we could only start osd_recovery_max_active
new ops anyway.
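
For completeness, both values can be set in ceph.conf under [osd], or
injected into running OSDs without a restart, e.g. (the values here are
just an example, and older releases may need injectargs run per OSD rather
than with the osd.* wildcard):

[osd]
    osd recovery max active = 1
    osd recovery max single start = 1

ceph tell osd.* injectargs '--osd-recovery-max-active 1 --osd-recovery-max-single-start 1'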

> 
> Thanks!
> Chad.
> 
> On Thursday, April 24, 2014 11:43:47 you wrote:
>> The value of osd_recovery_max_single_start (default 5) is used in
>> conjunction with osd_recovery_max_active (default 15).   This means that a
>> given PG will start up to 5 recovery operations at time of a total of 15
>> operations active at a time.  This allows recovery to spread operations
>> across more or less PGs at any given time.
>> 
>> David Zafman
>> Senior Developer
>> http://www.inktank.com
>> 
>> On Apr 24, 2014, at 8:09 AM, Chad Seys  wrote:
>>> Hi All,
>>> 
>>>  What does osd_recovery_max_single_start do?  I could not find a
>>>  description
>>> 
>>> of it.
>>> 
>>> Thanks!
>>> Chad.
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



David Zafman
Senior Developer
http://www.inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Creating a bucket failed

2014-04-28 Thread Seowon Jung
Cedric,

I created this user as described on the official document
$ sudo radosgw-admin subuser create --uid=johndoe --subuser=johndoe:swift
--access=full

The subuser seems to have full permissions.
$ radosgw-admin user info --uid=admin

  "swift_keys": [
{ "user": "admin:swift",
  "secret_key": "RnelTPTJGc4rt6LlRjF4AnxfJhrLvu4J6+PTUl+s"}],
  ps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
  "max_size_kb": -1,
  "max_objects": -1}}

Thank you for your help anyway,
Seowon


--
Seowon Jung
Systems Administrator

College of Education
University of Hawaii at Manoa
(808) 956-7939


On Mon, Apr 28, 2014 at 12:12 PM, Cedric Lemarchand wrote:

>  Hello,
>
> Le 28/04/2014 23:29, Seowon Jung a écrit :
>
>  Thank you so much for your quick reply.  I created a subuser for Swift,
> but it got the authorization error.  Is it related to the same problem?
>
> In the way bucket access via subdomain is specific to S3 and you are now
> using Swift, I don't think so.
>
>   $ swift --verbose  -V 1.0 -A http://lab0.coe.hawaii.edu/auth -U
> admin:swift -K RnelTPTJGc4rt6LlRjF4AnxfJhrLvu4J6+PTUl+s post test
> Container PUT failed: http://lab0.coe.hawaii.edu:80/swift/v1/test 401
> Authorization Required   AccessDenied
>
> I would first try to check if the subuser has rights to create a bucket.
> ("permissions" field)
>
> Cheers
>
>
>  Thank you!
>
>  --
> Seowon Jung
> Systems Administrator
>
> College of Education
> University of Hawaii at Manoa
> (808) 956-7939
>
>
> On Mon, Apr 28, 2014 at 11:10 AM, Yehuda Sadeh  wrote:
>
>> This could happen if your client is uses the bucket through subdomain
>> scheme, but the rgw is not resolving it correctly (either rgw_dns_name is
>> misconfigured, or you were accessing it through different host name).
>>
>>  Yehuda
>>
>>
>>  On Mon, Apr 28, 2014 at 2:02 PM, Seowon Jung  wrote:
>>
>>>   Hello,
>>>
>>>  I've installed Ceph Emperor on my Ubuntu 12.04 server to test many
>>> things.  Everything was pretty good so far, but now I got a problem (403,
>>>  AccessDenied) when I try to create a bucket through S3-compatible API.
>>>  Please read the following information.
>>>
>>>  *Client Information*
>>> Computer: Ubuntu 12.04 64bit Desktop
>>> S3 Client: Dragon Disk 1.05
>>>
>>>
>>>  *Server Information*
>>> Server Hardware: 2 servers, 2 storage array (12 OSDs each, total 24 OSDs)
>>> OS: Ubuntu 12.04 64bit
>>> Ceph: Emperor, Health OK, all OSDs UP
>>>
>>>
>>>  *Configurations:*
>>>
>>>  ceph.conf
>>>  [global]
>>> fsid = 2606e43d-6ca3-4aeb-b760-507a97e06190
>>> mon_initial_members = lab0, lab1
>>> mon_host = 172.17.1.250,172.17.1.251
>>> auth_cluster_required = cephx
>>> auth_service_required = cephx
>>> auth_client_required = cephx
>>> filestore_xattr_use_omap = true
>>> osd_max_attr_size = 655360
>>> osd pool default size = 3
>>> osd pool default min size = 1
>>> osd pool default pg num = 800
>>> osd pool default pgp num = 800
>>>
>>>  [client.radosgw.gateway]
>>> host = lab0
>>> keyring = /etc/ceph/keyring.radosgw.gateway
>>> rgw socket path = /tmp/radosgw.sock
>>> log file = /var/log/ceph/radosgw.log
>>> rgw data = /var/lib/ceph/radosgw
>>> rgw dns name = lab0.coe.hawaii.edu
>>> rgw print continue = false
>>>
>>>
>>>  Apache
>>> /etc/apache2/sites-enabled/rgw
>>> 
>>>  FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock
>>> ServerName  lab0.coe.hawaii.edu
>>> ServerAdmin webmaster@localhost
>>>  DocumentRoot /var/www
>>>
>>>  RewriteEngine On
>>> RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*)
>>> /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{
>>> HTTP:Authorization},L]
>>>
>>>  
>>> 
>>> Options +ExecCGI
>>> AllowOverride All
>>> SetHandler fastcgi-script
>>> Order allow,deny
>>> allow from all
>>> AuthBasicAuthoritative Off
>>> 
>>> 
>>>
>>>  AllowEncodedSlashes On
>>> ErrorLog ${APACHE_LOG_DIR}/error.log
>>> CustomLog ${APACHE_LOG_DIR}/access.log combined
>>> ServerSignature Off
>>> 
>>>
>>>
>>>  User Info:
>>>  # radosgw-admin user info --uid=admin
>>> { "user_id": "admin",
>>>   "display_name": "Admin",
>>>   "email": "",
>>>   "suspended": 0,
>>>   "max_buckets": 1000,
>>>   "auid": 0,
>>>   "subusers": [],
>>>   "keys": [
>>> { "user": "admin",
>>>   "access_key": "A3R0CEF3140MLIZIXN4X",
>>>   "secret_key": "K8TRyfK8ArRjGRnSRvd4N5gY4TdeK1wK\/1iGCcGO"}],
>>>   "swift_keys": [],
>>>   "caps": [],
>>>   "op_mask": "read, write, delete",
>>>   "default_placement": "",
>>>   "placement_tags": [],
>>>   "bucket_quota": { "enabled": false,
>>>   "max_size_kb": -1,
>>>   "max_objects": -1}}
>>>
>>>
>>>  /var/log/ceph/radosgw.log:
>>>  2014-04-28 10:44:42.206681 7fc9b9feb700 15 calculated
>>> digest=6JGkEimcy2pBN3Ty6mfYh6SudcA=
>>> 2014-04-28 10:44:42.206685 7fc9b9feb700 15
>>> auth_sign=6JG

Re: [ceph-users] Slow RBD Benchmark Compared To Direct I/O Test

2014-04-28 Thread Indra Pramana
Dear Christian and all,

Can anyone advise?

Looking forward to your reply, thank you.

Cheers.



On Thu, Apr 24, 2014 at 1:51 PM, Indra Pramana  wrote:

> Hi Christian,
>
> Good day to you, and thank you for your reply.
>
> On Wed, Apr 23, 2014 at 11:41 PM, Christian Balzer  wrote:
>
>> > > > Using 32 concurrent writes, result is below. The speed really
>> > > > fluctuates.
>> > > >
>> > > >  Total time run: 64.31704964.317049
>> > > > Total writes made:  1095
>> > > > Write size: 4194304
>> > > > Bandwidth (MB/sec): 68.100
>> > > >
>> > > > Stddev Bandwidth:   44.6773
>> > > > Max bandwidth (MB/sec): 184
>> > > > Min bandwidth (MB/sec): 0
>> > > > Average Latency:1.87761
>> > > > Stddev Latency: 1.90906
>> > > > Max latency:9.99347
>> > > > Min latency:0.075849
>> > > >
>> > > That is really weird, it should get faster, not slower. ^o^
>> > > I assume you've run this a number of times?
>> > >
>> > > Also my apologies, the default is 16 threads, not 1, but that still
>> > > isn't enough to get my cluster to full speed:
>> > > ---
>> > > Bandwidth (MB/sec): 349.044
>> > >
>> > > Stddev Bandwidth:   107.582
>> > > Max bandwidth (MB/sec): 408
>> > > ---
>> > > at 64 threads it will ramp up from a slow start to:
>> > > ---
>> > > Bandwidth (MB/sec): 406.967
>> > >
>> > > Stddev Bandwidth:   114.015
>> > > Max bandwidth (MB/sec): 452
>> > > ---
>> > >
>> > > But what stands out is your latency. I don't have a 10GBE network to
>> > > compare, but my Infiniband based cluster (going through at least one
>> > > switch) gives me values like this:
>> > > ---
>> > > Average Latency:0.335519
>> > > Stddev Latency: 0.177663
>> > > Max latency:1.37517
>> > > Min latency:0.1017
>> > > ---
>> > >
>> > > Of course that latency is not just the network.
>> > >
>> >
>> > What else can contribute to this latency? Storage node load, disk speed,
>> > anything else?
>> >
>> That and the network itself are pretty much it, you should know once
>> you've run those test with atop or iostat on the storage nodes.
>>
>> >
>> > > I would suggest running atop (gives you more information at one
>> > > glance) or "iostat -x 3" on all your storage nodes during these tests
>> > > to identify any node or OSD that is overloaded in some way.
>> > >
>> >
>> > Will try.
>> >
>> Do that and let us know about the results.
>>
>
> I have done some tests using iostat and noted some OSDs on a particular
> storage node going up to the 100% limit when I run the rados bench test.
>
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>1.090.000.92   21.740.00   76.25
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sda   0.00 0.004.33   42.0073.33  6980.00
> 304.46 0.296.220.006.86   1.50   6.93
> sdb   0.00 0.000.00   17.67 0.00  6344.00
> 718.1959.64  854.260.00  854.26  56.60 *100.00*
> sdc   0.00 0.00   12.33   59.3370.67 18882.33
> 528.9236.54  509.80   64.76  602.31  10.51  75.33
> sdd   0.00 0.003.33   54.3324.00 15249.17
> 529.71 1.29   22.453.20   23.63   1.64   9.47
> sde   0.00 0.330.000.67 0.00 4.00
> 12.00 0.30  450.000.00  450.00 450.00  30.00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>1.380.001.137.750.00   89.74
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sda   0.00 0.005.00   69.0030.67 19408.50
> 525.38 4.29   58.020.53   62.18   2.00  14.80
> sdb   0.00 0.007.00   63.3341.33 20911.50
> 595.8213.09  826.96   88.57  908.57   5.48  38.53
> sdc   0.00 0.002.67   30.0017.33  6945.33
> 426.29 0.216.530.507.07   1.59   5.20
> sdd   0.00 0.002.67   58.6716.00 20661.33
> 674.26 4.89   79.54   41.00   81.30   2.70  16.53
> sde   0.00 0.000.001.67 0.00 6.67
> 8.00 0.013.200.003.20   1.60   0.27
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>0.970.000.556.730.00   91.75
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sda   0.00 0.001.67   15.3321.33   120.00
> 16.63 0.021.180.001.30   0.63   1.07
> sdb   0.00 0.004.33   62.3324.00 13299.17
> 399.69 2.68   11.181.23   11.87   1.94  12.93
> sdc   0.00 0.000.67   38.3370.67  7881.33
> 407.7937.66  202.150.00  205.67  13.61  53.07
> sdd   0.00 0.003.00

Re: [ceph-users] RBD clone for OpenStack Nova ephemeral volumes

2014-04-28 Thread Dmitry Borodaenko
I have decoupled the Nova rbd-ephemeral-clone branch from the
multiple-image-location patch, the result can be found at the same
location on GitHub as before:
https://github.com/angdraug/nova/tree/rbd-ephemeral-clone

I will keep rebasing this over Nova master, I also plan to update the
rbd-clone-image-handler blueprint and publish it to nova-specs so that
the patch series could be proposed for Juno.

Icehouse backport of this branch is here:
https://github.com/angdraug/nova/tree/rbd-ephemeral-clone-stable-icehouse

I am not going to track every stable/icehouse commit with this branch,
instead, I will rebase it over stable release tags as they appear.
Right now it's based on tag:2014.1.

For posterity, I'm leaving the multiple-image-location patch rebased
over current Nova master here:
https://github.com/angdraug/nova/tree/multiple-image-location

I don't plan on maintaining multiple-image-location, just leaving it
out there to save some rebasing effort for whoever decides to pick it
up.
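
In case it saves someone a rebase or two, checking out the Icehouse
backport on top of a nova tree looks roughly like this (the remote name is
arbitrary):

git clone https://github.com/openstack/nova.git
cd nova
git remote add angdraug https://github.com/angdraug/nova.git
git fetch angdraug
git checkout -b rbd-ephemeral-clone-stable-icehouse \
    angdraug/rbd-ephemeral-clone-stable-icehouse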

-DmitryB

On Fri, Mar 21, 2014 at 1:12 PM, Josh Durgin  wrote:
> On 03/20/2014 07:03 PM, Dmitry Borodaenko wrote:
>>
>> On Thu, Mar 20, 2014 at 3:43 PM, Josh Durgin 
>> wrote:
>>>
>>> On 03/20/2014 02:07 PM, Dmitry Borodaenko wrote:

 The patch series that implemented clone operation for RBD backed
 ephemeral volumes in Nova did not make it into Icehouse. We have tried
 our best to help it land, but it was ultimately rejected. Furthermore,
 an additional requirement was imposed to make this patch series
 dependent on full support of Glance API v2 across Nova (due to its
 dependency on direct_url that was introduced in v2).

 You can find the most recent discussion of this patch series in the
 FFE (feature freeze exception) thread on openstack-dev ML:

 http://lists.openstack.org/pipermail/openstack-dev/2014-March/029127.html

 As I explained in that thread, I believe this feature is essential for
 using Ceph as a storage backend for Nova, so I'm going to try and keep
 it alive outside of OpenStack mainline until it is allowed to land.

 I have created rbd-ephemeral-clone branch in my nova repo fork on
 GitHub:
 https://github.com/angdraug/nova/tree/rbd-ephemeral-clone

 I will keep it rebased over nova master, and will create an
 rbd-ephemeral-clone-stable-icehouse to track the same patch series
 over nova stable/icehouse once it's branched. I also plan to make sure
 that this patch series is included in Mirantis OpenStack 5.0 which
 will be based on Icehouse.

 If you're interested in this feature, please review and test. Bug
 reports and patches are welcome, as long as their scope is limited to
 this patch series and is not applicable for mainline OpenStack.
>>>
>>>
>>> Thanks for taking this on Dmitry! Having rebased those patches many
>>> times during icehouse, I can tell you it's often not trivial.
>>
>>
>> Indeed, I get conflicts every day lately, even in the current
>> bugfixing stage of the OpenStack release cycle. I have a feeling it
>> will not get easier when Icehouse is out and Juno is in full swing.
>>
>>> Do you think the imagehandler-based approach is best for Juno? I'm
>>> leaning towards the older way [1] for simplicity of review, and to
>>> avoid using glance's v2 api by default.
>>> [1] https://review.openstack.org/#/c/46879/
>>
>>
>> Excellent question, I have thought long and hard about this. In
>> retrospect, requiring this change to depend on the imagehandler patch
>> back in December 2013 proven to have been a poor decision.
>> Unfortunately, now that it's done, porting your original patch from
>> Havana to Icehouse is more work than keeping the new patch series up
>> to date with Icehouse, at least short term. Especially if we decide to
>> keep the rbd_utils refactoring, which I've grown to like.
>>
>> As far as I understand, your original code made use of the same v2 api
>> call even before it was rebased over imagehandler patch:
>>
>> https://github.com/jdurgin/nova/blob/8e4594123b65ddf47e682876373bca6171f4a6f5/nova/image/glance.py#L304
>>
>> If I read this right, imagehandler doesn't create the dependency on v2
>> api, the only reason it caused a problem was because it exposed the
>> output of the same Glance API call to a code path that assumed a v1
>> data structure. If so, decoupling rbd clone patch from imagehandler
>> will not help lift the full Glance API v2 support requirement, that v2
>> api call will still be there.
>>
>> Also, there's always a chance that imagehandler lands in Juno. If it
>> does, we'd be forced to dust off the imagehandler based patch series
>> again, and the effort spent on maintaining the old patch would be
>> wasted.
>>
>> Given all that, and without making any assumptions about stability of
>> the imagehandler patch in its current state, I'm leaning towards
>> keeping it. If you think it's likely that it will cause us more
>> problems than the G

Re: [ceph-users] Ceph mds laggy and failed assert in function replay mds/journal.cc

2014-04-28 Thread Jingyuan Luke
Hi,

We applied the patch and recompiled Ceph, and updated ceph.conf as
suggested. When we re-ran ceph-mds we noticed the following:


2014-04-29 10:45:22.260798 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51366457,12681393 no session for client.324186
2014-04-29 10:45:22.262419 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51366475,12681393 no session for client.324186
2014-04-29 10:45:22.267699 7f90b971d700  0 log [WRN] :  replayed op
client.324186:5135,12681393 no session for client.324186
2014-04-29 10:45:22.271664 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51366724,12681393 no session for client.324186
2014-04-29 10:45:22.281050 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51366945,12681393 no session for client.324186
2014-04-29 10:45:22.283196 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51366996,12681393 no session for client.324186
2014-04-29 10:45:22.287801 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51367043,12681393 no session for client.324186
2014-04-29 10:45:22.289967 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51367082,12681393 no session for client.324186
2014-04-29 10:45:22.291026 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51367110,12681393 no session for client.324186
2014-04-29 10:45:22.294459 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51367192,12681393 no session for client.324186
2014-04-29 10:45:22.297228 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51367257,12681393 no session for client.324186
2014-04-29 10:45:22.297477 7f90b971d700  0 log [WRN] :  replayed op
client.324186:51367264,12681393 no session for client.324186

tcmalloc: large alloc 1136660480 bytes == 0xb2019000 @  0x7f90c2564da7
0x5bb9cb 0x5ac8eb 0x5b32f7 0x79ecd8 0x58cbed 0x7f90c231de9a
0x7f90c0cca3fd
tcmalloc: large alloc 2273316864 bytes == 0x15d73d000 @
0x7f90c2564da7 0x5bb9cb 0x5ac8eb 0x5b32f7 0x79ecd8 0x58cbed
0x7f90c231de9a 0x7f90c0cca3fd

ceph -s shows that the MDS is in up:replay.

Also, the messages above seem to repeat after a while, but with a
different session number. Is there a way for us to determine
that we are on the right track? Thanks.
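
For reference, this is all we are using to keep an eye on it while it
replays -- just the standard commands, nothing specific to the patch:

ceph -s        # overall cluster state, shows the MDS in up:replay
ceph mds stat  # compact one-line MDS state
tail -f /var/log/ceph/ceph-mds.*.log   # watch the replayed-op warnings go by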

Regards,
Luke

On Sun, Apr 27, 2014 at 12:04 PM, Yan, Zheng  wrote:
> On Sat, Apr 26, 2014 at 9:56 AM, Jingyuan Luke  wrote:
>> Hi Greg,
>>
>> Actually our cluster is pretty empty, but we suspect we had a temporary
>> network disconnection to one of our OSDs; not sure if this caused the
>> problem.
>>
>> Anyway we don't mind try the method you mentioned, how can we do that?
>>
>
> compile ceph-mds with the attached patch. add a line "mds
> wipe_sessions = 1" to the ceph.conf,
>
> Yan, Zheng
>
>> Regards,
>> Luke
>>
>>
>> On Saturday, April 26, 2014, Gregory Farnum  wrote:
>>>
>>> Hmm, it looks like your on-disk SessionMap is horrendously out of
>>> date. Did your cluster get full at some point?
>>>
>>> In any case, we're working on tools to repair this now but they aren't
>>> ready for use yet. Probably the only thing you could do is create an
>>> empty sessionmap with a higher version than the ones the journal
>>> refers to, but that might have other fallout effects...
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>> On Fri, Apr 25, 2014 at 2:57 AM, Mohd Bazli Ab Karim
>>>  wrote:
>>> > More logs. I ran ceph-mds  with debug-mds=20.
>>> >
>>> > -2> 2014-04-25 17:47:54.839672 7f0d6f3f0700 10 mds.0.journal
>>> > EMetaBlob.replay inotable tablev 4316124 <= table 4317932
>>> > -1> 2014-04-25 17:47:54.839674 7f0d6f3f0700 10 mds.0.journal
>>> > EMetaBlob.replay sessionmap v8632368 -(1|2) == table 7239603 prealloc
>>> > [141df86~1] used 141db9e
>>> >   0> 2014-04-25 17:47:54.840733 7f0d6f3f0700 -1 mds/journal.cc: In
>>> > function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' 
>>> > thread
>>> > 7f0d6f3f0700 time 2014-04-25 17:47:54.839688 mds/journal.cc: 1303: FAILED
>>> > assert(session)
>>> >
>>> > Please look at the attachment for more details.
>>> >
>>> > Regards,
>>> > Bazli
>>> >
>>> > From: Mohd Bazli Ab Karim
>>> > Sent: Friday, April 25, 2014 12:26 PM
>>> > To: 'ceph-de...@vger.kernel.org'; ceph-users@lists.ceph.com
>>> > Subject: Ceph mds laggy and failed assert in function replay
>>> > mds/journal.cc
>>> >
>>> > Dear Ceph-devel, ceph-users,
>>> >
>>> > I am currently facing an issue with my ceph mds server. The ceph-mds
>>> > daemon does not want to come back up.
>>> > I tried running it manually with ceph-mds -i mon01 -d, but it gets stuck
>>> > at the failed assert(session) at line 1303 in mds/journal.cc and aborts.
>>> >
>>> > Can someone shed some light on this issue?
>>> > ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>> >
>>> > Let me know if I need to send log with debug enabled.
>>> >
>>> > Regards,
>>> > Bazli
>>> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/

Re: [ceph-users] Only one OSD log available per node?

2014-04-28 Thread Indra Pramana
Hi Greg,

The log rotation works fine; it rotates the logs every day at around
6:50am. However, there are no writes to the files (except for one OSD log
file), so it ends up rotating empty files for most of them.

-rw-r--r--  1 root root 313884 Apr 29 12:07 ceph-osd.12.log
-rw-r--r--  1 root root 198319 Apr 29 06:36 ceph-osd.12.log.1.gz
-rw-r--r--  1 root root 181675 Apr 28 06:50 ceph-osd.12.log.2.gz
-rw-r--r--  1 root root  44012 Apr 27 06:53 ceph-osd.12.log.3.gz
-rw-r--r--  1 root root  0 Apr 29 06:36 ceph-osd.13.log
-rw-r--r--  1 root root 20 Apr 28 06:50 ceph-osd.13.log.1.gz
-rw-r--r--  1 root root506 Apr 27 06:53 ceph-osd.13.log.2.gz
-rw-r--r--  1 root root  44605 Apr 27 06:53 ceph-osd.13.log.3.gz
-rw-r--r--  1 root root  0 Apr 29 06:36 ceph-osd.14.log
-rw-r--r--  1 root root 20 Apr 28 06:50 ceph-osd.14.log.1.gz
-rw-r--r--  1 root root502 Apr 27 06:53 ceph-osd.14.log.2.gz
-rw-r--r--  1 root root  55570 Apr 27 06:53 ceph-osd.14.log.3.gz
-rw-r--r--  1 root root  0 Apr 29 06:36 ceph-osd.15.log
-rw-r--r--  1 root root 20 Apr 28 06:50 ceph-osd.15.log.1.gz
-rw-r--r--  1 root root500 Apr 27 06:53 ceph-osd.15.log.2.gz
-rw-r--r--  1 root root  49090 Apr 27 06:53 ceph-osd.15.log.3.gz
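
For what it's worth, logrotate itself does seem to run: its state file
lists the ceph logs as rotated daily (the path below is the Ubuntu
default, if I am not mistaken):

grep ceph /var/lib/logrotate/status   # last rotation time recorded per log file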

Any advice?

Thank you.


On Mon, Apr 28, 2014 at 11:26 PM, Gregory Farnum  wrote:

> It is not. My guess from looking at the time stamps is that maybe you have
> a log rotation system set up that isn't working properly?
> -Greg
>
>
> On Sunday, April 27, 2014, Indra Pramana  wrote:
>
>> Dear all,
>>
>> I have multiple OSDs per node (normally 4) and I realised that for all
>> the nodes that I have, only one OSD will contain logs under /var/log/ceph,
>> the rest of the logs are empty.
>>
>> root@ceph-osd-07:/var/log/ceph# ls -la *.log
>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-client.admin.log
>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.0.log
>> -rw-r--r-- 1 root root 386857 Apr 28 14:02 ceph-osd.12.log
>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.13.log
>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.14.log
>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.15.log
>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd..log
>>
>> The ceph-osd.12.log only contains the logs for osd.12 only, while the
>> other logs for osd.13, 14 and 15 are not available and empty.
>>
>> Is this normal?
>>
>> Looking forward to your reply, thank you.
>>
>> Cheers.
>>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Only one OSD log available per node?

2014-04-28 Thread Gregory Farnum
Are your OSDs actually running? I see that your older logs have more data
in them; did you change log rotation from the defaults?
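
If you haven't touched it, the config installed by the packages should
still be at /etc/logrotate.d/ceph; a dry run will show whether logrotate
thinks those files need rotating:

cat /etc/logrotate.d/ceph
logrotate -d /etc/logrotate.d/ceph   # debug mode: prints what it would do, changes nothing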

On Monday, April 28, 2014, Indra Pramana  wrote:

> Hi Greg,
>
> The log rotation works fine; it rotates the logs every day at around
> 6:50am. However, there are no writes to the files (except for one OSD log
> file), so it ends up rotating empty files for most of them.
>
> -rw-r--r--  1 root root 313884 Apr 29 12:07 ceph-osd.12.log
> -rw-r--r--  1 root root 198319 Apr 29 06:36 ceph-osd.12.log.1.gz
> -rw-r--r--  1 root root 181675 Apr 28 06:50 ceph-osd.12.log.2.gz
> -rw-r--r--  1 root root  44012 Apr 27 06:53 ceph-osd.12.log.3.gz
> -rw-r--r--  1 root root  0 Apr 29 06:36 ceph-osd.13.log
> -rw-r--r--  1 root root 20 Apr 28 06:50 ceph-osd.13.log.1.gz
> -rw-r--r--  1 root root506 Apr 27 06:53 ceph-osd.13.log.2.gz
> -rw-r--r--  1 root root  44605 Apr 27 06:53 ceph-osd.13.log.3.gz
> -rw-r--r--  1 root root  0 Apr 29 06:36 ceph-osd.14.log
> -rw-r--r--  1 root root 20 Apr 28 06:50 ceph-osd.14.log.1.gz
> -rw-r--r--  1 root root502 Apr 27 06:53 ceph-osd.14.log.2.gz
> -rw-r--r--  1 root root  55570 Apr 27 06:53 ceph-osd.14.log.3.gz
> -rw-r--r--  1 root root  0 Apr 29 06:36 ceph-osd.15.log
> -rw-r--r--  1 root root 20 Apr 28 06:50 ceph-osd.15.log.1.gz
> -rw-r--r--  1 root root500 Apr 27 06:53 ceph-osd.15.log.2.gz
> -rw-r--r--  1 root root  49090 Apr 27 06:53 ceph-osd.15.log.3.gz
>
> Any advice?
>
> Thank you.
>
>
> On Mon, Apr 28, 2014 at 11:26 PM, Gregory Farnum wrote:
>
>> It is not. My guess from looking at the time stamps is that maybe you
>> have a log rotation system set up that isn't working properly?
>> -Greg
>>
>>
>> On Sunday, April 27, 2014, Indra Pramana wrote:
>>
>>> Dear all,
>>>
>>> I have multiple OSDs per node (normally 4) and I realised that for all
>>> the nodes that I have, only one OSD will contain logs under /var/log/ceph,
>>> the rest of the logs are empty.
>>>
>>> root@ceph-osd-07:/var/log/ceph# ls -la *.log
>>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-client.admin.log
>>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.0.log
>>> -rw-r--r-- 1 root root 386857 Apr 28 14:02 ceph-osd.12.log
>>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.13.log
>>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.14.log
>>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.15.log
>>> -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd..log
>>>
>>> The ceph-osd.12.log only contains the logs for osd.12 only, while the
>>> other logs for osd.13, 14 and 15 are not available and empty.
>>>
>>> Is this normal?
>>>
>>> Looking forward to your reply, thank you.
>>>
>>> Cheers.
>>>
>>
>>
>> --
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>
>

-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD journal overload?

2014-04-28 Thread Indra Pramana
Hi Irek,

Good day to you, and thank you for your e-mail.

Is there a better way other than patching the kernel? I would like to avoid
having to compile a custom kernel for my OS. I read that I can disable
write-caching on the drive using hdparm:

hdparm -W0 /dev/sdf
hdparm -W0 /dev/sdg

I tested on one of my test servers and it seems I can disable it using the
command.

Current setup, write-caching is on:


root@ceph-osd-09:/home/indra# hdparm -W /dev/sdg

/dev/sdg:
 write-caching =  1 (on)


I tried to disable write-caching and it's successful:


root@ceph-osd-09:/home/indra# hdparm -W0 /dev/sdg

/dev/sdg:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)


I check again, and now write-caching is disabled.


root@ceph-osd-09:/home/indra# hdparm -W /dev/sdg

/dev/sdg:
 write-caching =  0 (off)


Would the above give the same result? If yes, I will try to do that on our
running cluster tonight.
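
I understand hdparm -W0 does not survive a power cycle, so if it works I
will probably also persist the setting via /etc/hdparm.conf on Ubuntu --
assuming I am reading the comments in that file correctly:

/dev/sdf {
    write_cache = off
}
/dev/sdg {
    write_cache = off
}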

May I also know how I can confirm whether my SSD comes with a "volatile
cache" as mentioned in your article? I checked my SSD's data sheet and
there is no information on whether it has a volatile cache or not. I have
also read that disabling write-caching increases the risk of data loss. Can
you comment on that?
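
In the meantime, these are the only checks I have found to see what the
drive and the kernel report about the cache -- I am not sure how
conclusive they are:

hdparm -I /dev/sdg | grep -i 'write cache'   # drive feature list
cat /sys/class/scsi_disk/*/cache_type        # what the kernel currently assumes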

Looking forward to your reply, thank you.

Cheers.



On Mon, Apr 28, 2014 at 7:49 PM, Irek Fasikhov  wrote:

> This is my article :).
> Apply the patch to the kernel
> (http://www.theirek.com/downloads/code/CMD_FLUSH.diff).
> After rebooting, run the following commands:
> echo temporary write through > /sys/class/scsi_disk//cache_type
>
>
> 2014-04-28 15:44 GMT+04:00 Indra Pramana :
>
> Hi Irek,
>>
>> Thanks for the article. Do you have any other web sources pertaining to
>> the same issue, which is in English?
>>
>> Looking forward to your reply, thank you.
>>
>> Cheers.
>>
>>
>> On Mon, Apr 28, 2014 at 7:40 PM, Irek Fasikhov  wrote:
>>
>>> Most likely you need to apply a patch to the kernel.
>>>
>>>
>>> http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
>>>
>>>
>>> 2014-04-28 15:20 GMT+04:00 Indra Pramana :
>>>
>>> Hi Udo and Irek,

 Good day to you, and thank you for your emails.


 >perhaps due IOs from the journal?
 >You can test with iostat (like "iostat -dm 5 sdg").

 Yes, I have shared the iostat result earlier on this same thread. At
 times the utilisation of the 2 journal drives will hit 100%, especially
 when I simulate writing data using rados bench command. Any suggestions
 what could be the cause of the I/O issue?


 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            1.85    0.00    1.65    3.14    0.00   93.36

 Device:  rrqm/s  wrqm/s    r/s     w/s    rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm   %util
 sdg        0.00    0.00   0.00   55.00     0.00  25365.33    922.38     34.22  568.90     0.00   568.90   17.82   98.00
 sdf        0.00    0.00   0.00   55.67     0.00  25022.67    899.02     29.76  500.57     0.00   500.57   17.60   98.00

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            2.10    0.00    1.37    2.07    0.00   94.46

 Device:  rrqm/s  wrqm/s    r/s     w/s    rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm   %util
 sdg        0.00    0.00   0.00   56.67     0.00  25220.00    890.12     23.60  412.14     0.00   412.14   17.62   99.87
 sdf        0.00    0.00   0.00   52.00     0.00  24637.33    947.59     33.65  587.41     0.00   587.41   19.23  100.00

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            2.21    0.00    1.77    6.75    0.00   89.27

 Device:  rrqm/s  wrqm/s    r/s     w/s    rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm   %util
 sdg        0.00    0.00   0.00   54.33     0.00  24802.67    912.98     25.75  486.36     0.00   486.36   18.40  100.00
 sdf        0.00    0.00   0.00   53.00     0.00  24716.00    932.68     35.26  669.89     0.00   669.89   18.87  100.00

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            1.87    0.00    1.67    5.25    0.00   91.21

 Device:  rrqm/s  wrqm/s    r/s     w/s    rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm   %util
 sdg        0.00    0.00   0.00   94.33     0.00  26257.33    556.69     18.29  208.44     0.00   208.44   10.50   99.07
 sdf        0.00    0.00   0.00   51.33     0.00  24470.67    953.40     32.75  684.62     0.00   684.62   19.51  100.13

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            1.51    0.00    1.34    7.25    0.00   89.89

 Device:  rrqm/s  wrqm/s    r/s     w/s    rkB/s     wkB/s  avgrq-sz  avgqu-sz   awai

Re: [ceph-users] Only one OSD log available per node?

2014-04-28 Thread Indra Pramana
Hi Greg,

Yes, all my OSDs are running.

 1945 ?Ssl  215:36 /usr/bin/ceph-osd --cluster=ceph -i 12 -f
 2090 ?Sl   165:07 /usr/bin/ceph-osd --cluster=ceph -i 15 -f
 2100 ?Sl   205:29 /usr/bin/ceph-osd --cluster=ceph -i 13 -f
 2102 ?Sl   196:01 /usr/bin/ceph-osd --cluster=ceph -i 14 -f

I didn't change log rotation settings from the default. This happens to all
my OSD nodes, not only this one.

Is there a way I can verify if the logs are actually being written by the
ceph-osd processes?
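
Would something like the following be a sane way to check it (assuming the
admin sockets are in the default /var/run/ceph location)?

# which log file each running ceph-osd actually holds open
for p in $(pidof ceph-osd); do ls -l /proc/$p/fd | grep '/var/log/ceph'; done

# where osd.13 thinks it should be logging
ceph --admin-daemon /var/run/ceph/ceph-osd.13.asok config show | grep log_file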

Looking forward to your reply, thank you.

Cheers.



On Tue, Apr 29, 2014 at 12:28 PM, Gregory Farnum  wrote:

> Are your OSDs actually running? I see that your older logs have more data
> in them; did you change log rotation from the defaults?
>
>
> On Monday, April 28, 2014, Indra Pramana  wrote:
>
>> Hi Greg,
>>
>> The log rotation works fine; it rotates the logs every day at around
>> 6:50am. However, there are no writes to the files (except for one OSD log
>> file), so it ends up rotating empty files for most of them.
>>
>> -rw-r--r--  1 root root 313884 Apr 29 12:07 ceph-osd.12.log
>> -rw-r--r--  1 root root 198319 Apr 29 06:36 ceph-osd.12.log.1.gz
>> -rw-r--r--  1 root root 181675 Apr 28 06:50 ceph-osd.12.log.2.gz
>> -rw-r--r--  1 root root  44012 Apr 27 06:53 ceph-osd.12.log.3.gz
>> -rw-r--r--  1 root root  0 Apr 29 06:36 ceph-osd.13.log
>> -rw-r--r--  1 root root 20 Apr 28 06:50 ceph-osd.13.log.1.gz
>> -rw-r--r--  1 root root506 Apr 27 06:53 ceph-osd.13.log.2.gz
>> -rw-r--r--  1 root root  44605 Apr 27 06:53 ceph-osd.13.log.3.gz
>> -rw-r--r--  1 root root  0 Apr 29 06:36 ceph-osd.14.log
>> -rw-r--r--  1 root root 20 Apr 28 06:50 ceph-osd.14.log.1.gz
>> -rw-r--r--  1 root root502 Apr 27 06:53 ceph-osd.14.log.2.gz
>> -rw-r--r--  1 root root  55570 Apr 27 06:53 ceph-osd.14.log.3.gz
>> -rw-r--r--  1 root root  0 Apr 29 06:36 ceph-osd.15.log
>> -rw-r--r--  1 root root 20 Apr 28 06:50 ceph-osd.15.log.1.gz
>> -rw-r--r--  1 root root500 Apr 27 06:53 ceph-osd.15.log.2.gz
>> -rw-r--r--  1 root root  49090 Apr 27 06:53 ceph-osd.15.log.3.gz
>>
>> Any advice?
>>
>> Thank you.
>>
>>
>> On Mon, Apr 28, 2014 at 11:26 PM, Gregory Farnum wrote:
>>
>>> It is not. My guess from looking at the time stamps is that maybe you
>>> have a log rotation system set up that isn't working properly?
>>> -Greg
>>>
>>>
>>> On Sunday, April 27, 2014, Indra Pramana  wrote:
>>>
 Dear all,

 I have multiple OSDs per node (normally 4) and I realised that for all
 the nodes that I have, only one OSD will contain logs under /var/log/ceph,
 the rest of the logs are empty.

 root@ceph-osd-07:/var/log/ceph# ls -la *.log
 -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-client.admin.log
 -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.0.log
 -rw-r--r-- 1 root root 386857 Apr 28 14:02 ceph-osd.12.log
 -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.13.log
 -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.14.log
 -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd.15.log
 -rw-r--r-- 1 root root  0 Apr 28 06:50 ceph-osd..log

 The ceph-osd.12.log only contains the logs for osd.12 only, while the
 other logs for osd.13, 14 and 15 are not available and empty.

 Is this normal?

 Looking forward to your reply, thank you.

 Cheers.

>>>
>>>
>>> --
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>
>>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mds laggy and failed assert in function replay mds/journal.cc

2014-04-28 Thread Yan, Zheng
On Tue, Apr 29, 2014 at 11:24 AM, Jingyuan Luke  wrote:
> Hi,
>
> We applied the patch and recompiled ceph, and updated ceph.conf as
> suggested. When we re-ran ceph-mds we noticed the following:
>
>
> 2014-04-29 10:45:22.260798 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51366457,12681393 no session for client.324186
> 2014-04-29 10:45:22.262419 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51366475,12681393 no session for client.324186
> 2014-04-29 10:45:22.267699 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:5135,12681393 no session for client.324186
> 2014-04-29 10:45:22.271664 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51366724,12681393 no session for client.324186
> 2014-04-29 10:45:22.281050 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51366945,12681393 no session for client.324186
> 2014-04-29 10:45:22.283196 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51366996,12681393 no session for client.324186
> 2014-04-29 10:45:22.287801 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51367043,12681393 no session for client.324186
> 2014-04-29 10:45:22.289967 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51367082,12681393 no session for client.324186
> 2014-04-29 10:45:22.291026 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51367110,12681393 no session for client.324186
> 2014-04-29 10:45:22.294459 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51367192,12681393 no session for client.324186
> 2014-04-29 10:45:22.297228 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51367257,12681393 no session for client.324186
> 2014-04-29 10:45:22.297477 7f90b971d700  0 log [WRN] :  replayed op
> client.324186:51367264,12681393 no session for client.324186
>
> tcmalloc: large alloc 1136660480 bytes == 0xb2019000 @  0x7f90c2564da7
> 0x5bb9cb 0x5ac8eb 0x5b32f7 0x79ecd8 0x58cbed 0x7f90c231de9a
> 0x7f90c0cca3fd
> tcmalloc: large alloc 2273316864 bytes == 0x15d73d000 @
> 0x7f90c2564da7 0x5bb9cb 0x5ac8eb 0x5b32f7 0x79ecd8 0x58cbed
> 0x7f90c231de9a 0x7f90c0cca3fd
>
> ceph -s shows that the MDS is in up:replay.
>
> Also, the messages above seem to repeat after a while, but with a
> different session number. Is there a way for us to determine
> that we are on the right track? Thanks.
>

It's on the right track as long as the MDS doesn't crash.
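
Replay can allocate a lot of memory (hence the tcmalloc large-alloc
messages). If you want to keep an eye on it, something simple like this is
enough:

watch -n 5 'ps -C ceph-mds -o pid,rss,vsz,cmd'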

> Regards,
> Luke
>
> On Sun, Apr 27, 2014 at 12:04 PM, Yan, Zheng  wrote:
>> On Sat, Apr 26, 2014 at 9:56 AM, Jingyuan Luke  wrote:
>>> Hi Greg,
>>>
>>> Actually our cluster is pretty empty, but we suspect we had a temporary
>>> network disconnection to one of our OSDs; not sure if this caused the
>>> problem.
>>>
>>> Anyway we don't mind try the method you mentioned, how can we do that?
>>>
>>
>> compile ceph-mds with the attached patch. add a line "mds
>> wipe_sessions = 1" to the ceph.conf,
>>
>> Yan, Zheng
>>
>>> Regards,
>>> Luke
>>>
>>>
>>> On Saturday, April 26, 2014, Gregory Farnum  wrote:

 Hmm, it looks like your on-disk SessionMap is horrendously out of
 date. Did your cluster get full at some point?

 In any case, we're working on tools to repair this now but they aren't
 ready for use yet. Probably the only thing you could do is create an
 empty sessionmap with a higher version than the ones the journal
 refers to, but that might have other fallout effects...
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Fri, Apr 25, 2014 at 2:57 AM, Mohd Bazli Ab Karim
  wrote:
 > More logs. I ran ceph-mds  with debug-mds=20.
 >
 > -2> 2014-04-25 17:47:54.839672 7f0d6f3f0700 10 mds.0.journal
 > EMetaBlob.replay inotable tablev 4316124 <= table 4317932
 > -1> 2014-04-25 17:47:54.839674 7f0d6f3f0700 10 mds.0.journal
 > EMetaBlob.replay sessionmap v8632368 -(1|2) == table 7239603 prealloc
 > [141df86~1] used 141db9e
 >   0> 2014-04-25 17:47:54.840733 7f0d6f3f0700 -1 mds/journal.cc: In
 > function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' 
 > thread
 > 7f0d6f3f0700 time 2014-04-25 17:47:54.839688 mds/journal.cc: 1303: FAILED
 > assert(session)
 >
 > Please look at the attachment for more details.
 >
 > Regards,
 > Bazli
 >
 > From: Mohd Bazli Ab Karim
 > Sent: Friday, April 25, 2014 12:26 PM
 > To: 'ceph-de...@vger.kernel.org'; ceph-users@lists.ceph.com
 > Subject: Ceph mds laggy and failed assert in function replay
 > mds/journal.cc
 >
 > Dear Ceph-devel, ceph-users,
 >
 > I am currently facing an issue with my ceph mds server. The ceph-mds
 > daemon does not want to come back up.
 > I tried running it manually with ceph-mds -i mon01 -d, but it gets stuck
 > at the failed assert(session) at line 1303 in mds/journal.cc and aborts.
 >
 > Can someone shed some light on this issue?
 > ceph version 0.72.2 (a913de