Re: [ceph-users] collectd / graphite / grafana .. calamari?

2014-05-23 Thread Alexandre DERUMIER
>>https://github.com/rochaporto/collectd-ceph 
>>
>>It has a set of collectd plugins pushing metrics which mostly map what 
>>the ceph commands return. In the setup we have it pushes them to 
>>graphite and the displays rely on grafana (check for a screenshot in 
>>the link above). 


Thanks for sharing, Ricardo!

I was looking into creating a dashboard for grafana too; yours seems very good :)



- Original Message - 

From: "Ricardo Rocha"  
To: "'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)" 
, ceph-de...@vger.kernel.org 
Sent: Friday, 23 May 2014 02:58:04 
Subject: collectd / graphite / grafana .. calamari? 

Hi. 

I saw the thread a couple days ago on ceph-users regarding collectd... 
and yes, i've been working on something similar for the last few days 
:) 

https://github.com/rochaporto/collectd-ceph 

It has a set of collectd plugins pushing metrics which mostly map what 
the ceph commands return. In the setup we have it pushes them to 
graphite and the displays rely on grafana (check for a screenshot in 
the link above). 

As it relies on common building blocks, it's easily extensible and 
we'll come up with new dashboards soon - things like plotting osd data 
against the metrics from the collectd disk plugin, which we also 
deploy. 

This email is mostly to share the work, but also to check on Calamari. 
I asked Patrick after the Red Hat/Inktank news and have no idea what it 
provides, but I'm sure it comes with lots of extra sauce - he 
suggested asking on the list. 

What's the timeline to have it open sourced? It would be great to have 
a look at it, and as there's work from different people in this area 
maybe start working together on some fancier monitoring tools. 

Regards, 
Ricardo 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majord...@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread jan.zeller
Dear ceph,

I am trying to setup ceph 0.80.1 with the following components :

1 x mon - Debian Wheezy (i386)
3 x osds - Debian Wheezy (i386)

(all are kvm powered)

Status after the standard setup procedure :

root@ceph-node2:~# ceph -s
cluster d079dd72-8454-4b4a-af92-ef4c424d96d8
 health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs 
stuck unclean
 monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 2, 
quorum 0 ceph-node1
 osdmap e11: 3 osds: 3 up, 3 in
  pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
103 MB used, 15223 MB / 15326 MB avail
 192 incomplete

root@ceph-node2:~# ceph health
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean

root@ceph-node2:~# ceph osd tree
# id    weight  type name       up/down reweight
-1  0   root default
-2  0   host ceph-node2
0   0   osd.0   up  1
-3  0   host ceph-node3
1   0   osd.1   up  1
-4  0   host ceph-node4
2   0   osd.2   up  1


root@ceph-node2:~# ceph osd dump
epoch 11
fsid d079dd72-8454-4b4a-af92-ef4c424d96d8
created 2014-05-23 09:00:08.780211
modified 2014-05-23 09:01:33.438001
flags 

pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
crash_replay_interval 45 stripe_width 0

pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
stripe_width 0

pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0

max_osd 3

osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 
192.168.123.49:6800/11373 192.168.123.49:6801/11373 192.168.123.49:6802/11373 
192.168.123.49:6803/11373 exists,up 21a7d2a8-b709-4a28-bc3b-850913fe4c6b

osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 
192.168.123.50:6800/10542 192.168.123.50:6801/10542 192.168.123.50:6802/10542 
192.168.123.50:6803/10542 exists,up c1cd3ad1-b086-438f-a22d-9034b383a1be

osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval 
[0,0) 192.168.123.53:6800/6962 192.168.123.53:6801/6962 
192.168.123.53:6802/6962 192.168.123.53:6803/6962 exists,up 
aa06d7e4-181c-4d70-bb8e-018b088c5053


What am I doing wrong here?
Or what kind of additional information should I provide to help troubleshoot this?

thanks,

---

Jan

P.S. with emperor 0.72.2 I had no such problems
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Screencast/tutorial on setting up Ceph

2014-05-23 Thread Sankar P
Hi,

I have four old machines lying around. I would like to setup ceph on
these machines.

Are there any screencasts or tutorials with commands on how to obtain,
install and configure Ceph on these machines?

The official documentation page "OS Recommendations" seems to list only
old distros and not the new versions of distros (openSUSE and Ubuntu).

So I wanted to ask if there is a screencast, tutorial or tech talk on
how to set up Ceph for a total newbie?

-- 
Sankar P
http://psankar.blogspot.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph deploy on rhel6.5 installs ceph from el6 and fails

2014-05-23 Thread Lukac, Erik
Hi Simon,

Thanks for your reply.
I already installed the OS for my ceph nodes via Kickstart (over the network) from 
Red Hat Satellite, and I don't want to do that again because some other config has 
also been done. xfsprogs is not part of the RHEL base repository but of an extra 
add-on with costs per node/CPU/whatever, called "Scalable File System". For some 
other nodes I installed xfsprogs from the CentOS 6 base repo, but now I want to try 
a clean RHEL-based-only install, so I'll add ceph to my nodes from 
/etc/yum.repos.d/ceph, install manually with yum, then do a ceph-deploy and 
see what happens ;)

Greetz from Munich

Erik


From: Simon Ironside [sirons...@caffetine.org]
Sent: Friday, 23 May 2014 01:07
To: Lukac, Erik
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph deploy on rhel6.5 installs ceph from el6 and 
fails

On 22/05/14 23:56, Lukac, Erik wrote:
> But: this fails because of the dependencies. xfsprogs is in rhel6 repo,
> but not in el6

I hadn't noticed that xfsprogs is included in the ceph repos. I'm using
the package from the RHEL 6.5 DVD, which is the same version; you'll
find it in the ScalableFileSystem repo on the install DVD.

HTH,
Simon.

--
Bayerischer Rundfunk; Rundfunkplatz 1; 80335 München
Telefon: +49 89 590001; E-Mail: i...@br.de; Website: http://www.BR.de
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about "osd objectstore = keyvaluestore-dev" setting

2014-05-23 Thread GMail


Sent from my iPhone

> On 22 May 2014, at 22:26, Gregory Farnum  wrote:
> 
>> On Thu, May 22, 2014 at 5:04 AM, Geert Lindemulder  
>> wrote:
>> Hello All
>> 
>> Trying to implement the osd leveldb backend at an existing ceph test
>> cluster.
>> The test cluster was updated from 0.72.1 to 0.80.1. The update was ok.
>> After the update, the "osd objectstore = keyvaluestore-dev" setting was
>> added to ceph.conf.
> 
> Does that mean you tried to switch to the KeyValueStore on one of your
> existing OSDs? That isn't going to work; you'll need to create new
> ones (or knock out old ones and recreate them with it).
> 
>> After restarting an osd it gives the following error:
>> 2014-05-22 12:28:06.805290 7f2e7d9de800 -1 KeyValueStore::mount : stale
>> version stamp 3. Please run the KeyValueStore update script before starting
>> the OSD, or set keyvaluestore_update_to to 1
>> 
>> How can the "keyvaluestore_update_to" parameter be set or where can i find
>> the "KeyValueStore update script"
> 
> Hmm, it looks like that config value isn't actually plugged in to the
> KeyValueStore, so you can't set it with the stock binaries. Maybe
> Haomai has an idea?

Yes, the error is that KeyValueStore reads the version stamp from the existing OSD 
data. The version is incorrect, and maybe there should be a clearer error message.

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Screencast/tutorial on setting up Ceph

2014-05-23 Thread jan.zeller
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Sankar P
> Sent: Friday, 23 May 2014 11:14
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Screencast/tutorial on setting up Ceph
> 
> Hi,
> 
> I have four old machines lying around. I would like to setup ceph on these
> machines.
> 
> Are there any screencast or tutorial with commands, on how to obtain,
> install and configure on ceph on these machines ?
> 
> The official documentation page "OS Recommendations" seem to list only
> old distros and not the new version of distros (openSUSE and Ubuntu).
> 
> So I wanted to ask if there is a screencast or tutorial or techtalk on how to
> setup Ceph for a total newbie ?
> 
> --
> Sankar P
> http://psankar.blogspot.com

Hi,

I am a rookie too and only used this: 
http://ceph.com/docs/master/start/

It's a very nice doc.

---

jan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Georg Höllrigl

On 22.05.2014 15:36, Yehuda Sadeh wrote:

On Thu, May 22, 2014 at 6:16 AM, Georg Höllrigl
 wrote:

Hello List,

Using the radosgw works fine, as long as the amount of data doesn't get too
big.

I have created one bucket that holds many small files, separated into
different "directories". But whenever I try to access the bucket, I only run
into some timeout. The timeout is at around 30 - 100 seconds. This is
smaller than the Apache timeout of 300 seconds.

I've tried to access the bucket with different clients - one of them is s3cmd,
which is still able to upload things but takes a rather long time when
listing the contents.
Then I've tried s3fs-fuse, which throws
ls: reading directory .: Input/output error

Also Cyberduck and S3Browser show similar behavior.

Is there an option to only send back maybe 1000 list entries, like Amazon
does? So that the client can decide whether it wants to list all the contents?



That's how it works; it doesn't return more than 1000 entries at once.


OK. I found that in the requests. So it's the client that states how 
many objects should be in the listing, by sending the max-keys=1000 
parameter:


- - - [23/May/2014:08:49:33 +] "GET 
/test/?delimiter=%2F&max-keys=1000&prefix HTTP/1.1" 200 715 "-" 
"Cyberduck/4.4.4 (14505) (Windows NT (unknown)/6.2) (x86)" 
"xidrasservice.com:443"



Are there any timeout values in radosgw?


Are you sure the timeout is in the gateway itself? Could be apache
that is timing out. Will need to see the apache access logs for these
operations, radosgw debug and messenger logs (debug rgw = 20, debug ms
= 1), to give a better answer.


No, I'm not sure where the timeout comes from. As far as I can tell, 
Apache times out after 300 seconds - so that should not be the problem.


I think I found something in the apache logs:
[Fri May 23 08:59:39.385548 2014] [fastcgi:error] [pid 3035:tid 
140723006891776] [client 10.0.1.66:46049] FastCGI: comm with server 
"/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[Fri May 23 08:59:39.385604 2014] [fastcgi:error] [pid 3035:tid 
140723006891776] [client 10.0.1.66:46049] FastCGI: incomplete headers (0 
bytes) received from server "/var/www/s3gw.fcgi"


I've increased the timeout to 900 in the apache vhosts config:
FastCgiExternalServer /var/www/s3gw.fcgi -socket 
/var/run/ceph/radosgw.vvx-ceph-m-02 -idle-timeout 900

Now it's not working, and I don't get a log entry any more.

Most interesting when watching the debug output: rados reports that it 
successfully finished the request, but at the same time the client tells me 
it failed.


I've shortened the log file; as far as I can see, the info repeats 
itself...


2014-05-23 09:38:43.051395 7f1b427fc700  1 == starting new request 
req=0x7f1b3400f1c0 =
2014-05-23 09:38:43.051597 7f1b427fc700  1 -- 10.0.1.107:0/1005898 --> 
10.0.1.199:6800/14453 -- osd_op(client.72942.0:120 UHXW458EH1RVULE1BCEH 
[getxattrs,stat] 11.10193f7e ack+read e279) v4 -- ?+0 0x7f1b4640 con 
0x2455930
2014-05-23 09:38:43.053180 7f1b96d80700  1 -- 10.0.1.107:0/1005898 <== 
osd.0 10.0.1.199:6800/14453 23  osd_op_reply(120 
UHXW458EH1RVULE1BCEH [getxattrs,stat] v0'0 uv1 ondisk = 0) v6  
229+0+20 (1060030390 0 1010060712) 0x7f1b58002540 con 0x2455930
2014-05-23 09:38:43.053380 7f1b427fc700  1 -- 10.0.1.107:0/1005898 --> 
10.0.1.199:6800/14453 -- osd_op(client.72942.0:121 UHXW458EH1RVULE1BCEH 
[read 0~524288] 11.10193f7e ack+read e279) v4 -- ?+0 0x7f1b45d0 con 
0x2455930
2014-05-23 09:38:43.054359 7f1b96d80700  1 -- 10.0.1.107:0/1005898 <== 
osd.0 10.0.1.199:6800/14453 24  osd_op_reply(121 
UHXW458EH1RVULE1BCEH [read 0~8] v0'0 uv1 ondisk = 0) v6  187+0+8 
(3510944971 0 3829959217) 0x7f1b580057b0 con 0x2455930
2014-05-23 09:38:43.054490 7f1b427fc700  1 -- 10.0.1.107:0/1005898 --> 
10.0.1.199:6806/15018 -- osd_op(client.72942.0:122 macm [getxattrs,stat] 
7.1069f101 ack+read e279) v4 -- ?+0 0x7f1b6010 con 0x2457de0
2014-05-23 09:38:43.055871 7f1b96d80700  1 -- 10.0.1.107:0/1005898 <== 
osd.2 10.0.1.199:6806/15018 3  osd_op_reply(122 macm 
[getxattrs,stat] v0'0 uv46 ondisk = 0) v6  213+0+91 (22324782 0 
2022698800) 0x7f1b500025a0 con 0x2457de0
2014-05-23 09:38:43.055963 7f1b427fc700  1 -- 10.0.1.107:0/1005898 --> 
10.0.1.199:6806/15018 -- osd_op(client.72942.0:123 macm [read 0~524288] 
7.1069f101 ack+read e279) v4 -- ?+0 0x7f1b3950 con 0x2457de0
2014-05-23 09:38:43.057087 7f1b96d80700  1 -- 10.0.1.107:0/1005898 <== 
osd.2 10.0.1.199:6806/15018 4  osd_op_reply(123 macm [read 0~310] 
v0'0 uv46 ondisk = 0) v6  171+0+310 (3762965810 0 1648184722) 
0x7f1b500026e0 con 0x2457de0
2014-05-23 09:38:43.057364 7f1b427fc700  1 -- 10.0.1.107:0/1005898 --> 
10.0.0.26:6809/4834 -- osd_op(client.72942.0:124 store [call 
version.read,getxattrs,stat] 5.c5755cee ack+read e279) v4 -- ?+0 
0x7f1b66b0 con 0x7f1b440022e0
2014-05-23 09:38:43.059223 7f1b96d80700  1 -- 10.0.1.107:0/1005898 <== 
osd.7 10.0.0.26:6809/4834 37  osd_op_reply(12

Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread Karan Singh
Try increasing the placement groups for pools

ceph osd pool set data pg_num 128  
ceph osd pool set data pgp_num 128

similarly for other 2 pools as well.
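
For example, the same pattern for the other two pools shown in the osd dump
(metadata and rbd) would be:

ceph osd pool set metadata pg_num 128
ceph osd pool set metadata pgp_num 128
ceph osd pool set rbd pg_num 128
ceph osd pool set rbd pgp_num 128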

- karan -


On 23 May 2014, at 11:50, jan.zel...@id.unibe.ch wrote:

> Dear ceph,
> 
> I am trying to setup ceph 0.80.1 with the following components :
> 
> 1 x mon - Debian Wheezy (i386)
> 3 x osds - Debian Wheezy (i386)
> 
> (all are kvm powered)
> 
> Status after the standard setup procedure :
> 
> root@ceph-node2:~# ceph -s
>cluster d079dd72-8454-4b4a-af92-ef4c424d96d8
> health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs 
> stuck unclean
> monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 
> 2, quorum 0 ceph-node1
> osdmap e11: 3 osds: 3 up, 3 in
>  pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
>103 MB used, 15223 MB / 15326 MB avail
> 192 incomplete
> 
> root@ceph-node2:~# ceph health
> HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
> 
> root@ceph-node2:~# ceph osd tree
> # idweight  type name   up/down reweight
> -1  0   root default
> -2  0   host ceph-node2
> 0   0   osd.0   up  1
> -3  0   host ceph-node3
> 1   0   osd.1   up  1
> -4  0   host ceph-node4
> 2   0   osd.2   up  1
> 
> 
> root@ceph-node2:~# ceph osd dump
> epoch 11
> fsid d079dd72-8454-4b4a-af92-ef4c424d96d8
> created 2014-05-23 09:00:08.780211
> modified 2014-05-23 09:01:33.438001
> flags 
> 
> pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
> rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
> crash_replay_interval 45 stripe_width 0
> 
> pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
> rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
> stripe_width 0
> 
> pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
> rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
> stripe_width 0 max_osd 3
> 
> osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval 
> [0,0) 192.168.123.49:6800/11373 192.168.123.49:6801/11373 
> 192.168.123.49:6802/11373 192.168.123.49:6803/11373 exists,up 
> 21a7d2a8-b709-4a28-bc3b-850913fe4c6b
> 
> osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval 
> [0,0) 192.168.123.50:6800/10542 192.168.123.50:6801/10542 
> 192.168.123.50:6802/10542 192.168.123.50:6803/10542 exists,up 
> c1cd3ad1-b086-438f-a22d-9034b383a1be
> 
> osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval 
> [0,0) 192.168.123.53:6800/6962 192.168.123.53:6801/6962 
> 192.168.123.53:6802/6962 192.168.123.53:6803/6962 exists,up 
> aa06d7e4-181c-4d70-bb8e-018b088c5053
> 
> 
> What am I doing wrong here ?
> Or what kind of additional information should be provided to get 
> troubleshooted.
> 
> thanks,
> 
> ---
> 
> Jan
> 
> P.S. with emperor 0.72.2 I had no such problems
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Screencast/tutorial on setting up Ceph

2014-05-23 Thread Karan Singh
Use my blog posts if you like:
http://karan-mj.blogspot.fi/2013/12/ceph-storage-part-2.html 

- Karan Singh -

On 23 May 2014, at 12:30,   
wrote:

>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>> Of Sankar P
>> Sent: Friday, 23 May 2014 11:14
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] Screencast/tutorial on setting up Ceph
>> 
>> Hi,
>> 
>> I have four old machines lying around. I would like to setup ceph on these
>> machines.
>> 
>> Are there any screencast or tutorial with commands, on how to obtain,
>> install and configure on ceph on these machines ?
>> 
>> The official documentation page "OS Recommendations" seem to list only
>> old distros and not the new version of distros (openSUSE and Ubuntu).
>> 
>> So I wanted to ask if there is a screencast or tutorial or techtalk on how to
>> setup Ceph for a total newbie ?
>> 
>> --
>> Sankar P
>> http://psankar.blogspot.com
> 
> Hi,
> 
> I am rookie too and only used just this : 
> http://ceph.com/docs/master/start/
> 
> it's a very nice doc
> 
> ---
> 
> jan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Georg Höllrigl
Thank you very much - I think I've solved the whole thing. It wasn't in 
radosgw.


The solution was (a rough sketch of both settings follows below):
- increase the timeout in the Apache conf
- when using haproxy, also increase the timeouts there!
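
For reference, a minimal sketch of the two settings involved (values and file
locations are only examples and will differ per setup). The Apache side is the
mod_fastcgi idle timeout already quoted earlier in the thread; on the haproxy
side the client/server timeouts are the ones that matter here:

# Apache vhost (mod_fastcgi)
FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/radosgw.vvx-ceph-m-02 -idle-timeout 900

# haproxy.cfg (defaults or per-backend section)
defaults
    timeout connect 10s
    timeout client  900s
    timeout server  900s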


Georg

On 22.05.2014 15:36, Yehuda Sadeh wrote:

On Thu, May 22, 2014 at 6:16 AM, Georg Höllrigl
 wrote:

Hello List,

Using the radosgw works fine, as long as the amount of data doesn't get too
big.

I have created one bucket that holds many small files, separated into
different "directories". But whenever I try to access the bucket, I only run
into some timeout. The timeout is at around 30 - 100 seconds. This is
smaller than the Apache timeout of 300 seconds.

I've tried to access the bucket with different clients - one of them is s3cmd,
which is still able to upload things but takes a rather long time when
listing the contents.
Then I've tried s3fs-fuse, which throws
ls: reading directory .: Input/output error

Also Cyberduck and S3Browser show similar behavior.

Is there an option to only send back maybe 1000 list entries, like Amazon
does? So that the client can decide whether it wants to list all the contents?



That's how it works; it doesn't return more than 1000 entries at once.



Are there any timeout values in radosgw?


Are you sure the timeout is in the gateway itself? Could be apache
that is timing out. Will need to see the apache access logs for these
operations, radosgw debug and messenger logs (debug rgw = 20, debug ms
= 1), to give a better answer.

Yehuda


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread Michael
64 PGs per pool /shouldn't/ cause any issues while there are only 3 
OSDs. It'll be something to pay attention to if a lot more get added, 
though.


Your replication setup is probably set to something other than host.
You'll want to extract your crush map, then decompile it and see if your 
"step" type is set to osd or rack.

If it's not host, then change it to host and load the map back in.

Check the docs on crush maps 
http://ceph.com/docs/master/rados/operations/crush-map/ for more info.
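
A quick sketch of that round trip with the stock tools (the file names are
just examples):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt so the rule contains:
#   step chooseleaf firstn 0 type host
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new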


-Michael

On 23/05/2014 10:53, Karan Singh wrote:

Try increasing the placement groups for pools

ceph osd pool set data pg_num 128
ceph osd pool set data pgp_num 128

similarly for other 2 pools as well.

- karan -


On 23 May 2014, at 11:50, jan.zel...@id.unibe.ch 
 wrote:



Dear ceph,

I am trying to setup ceph 0.80.1 with the following components :

1 x mon - Debian Wheezy (i386)
3 x osds - Debian Wheezy (i386)

(all are kvm powered)

Status after the standard setup procedure :

root@ceph-node2:~# ceph -s
   cluster d079dd72-8454-4b4a-af92-ef4c424d96d8
health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 
192 pgs stuck unclean
monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election 
epoch 2, quorum 0 ceph-node1

osdmap e11: 3 osds: 3 up, 3 in
 pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
   103 MB used, 15223 MB / 15326 MB avail
192 incomplete

root@ceph-node2:~# ceph health
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck 
unclean


root@ceph-node2:~# ceph osd tree
# idweight  type name   up/down reweight
-1  0   root default
-2  0   host ceph-node2
0   0   osd.0   up  1
-3  0   host ceph-node3
1   0   osd.1   up  1
-4  0   host ceph-node4
2   0   osd.2   up  1


root@ceph-node2:~# ceph osd dump
epoch 11
fsid d079dd72-8454-4b4a-af92-ef4c424d96d8
created 2014-05-23 09:00:08.780211
modified 2014-05-23 09:01:33.438001
flags

pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags 
hashpspool crash_replay_interval 45 stripe_width 0


pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags 
hashpspool stripe_width 0


pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
stripe_width 0 max_osd 3


osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 
last_clean_interval [0,0) 192.168.123.49:6800/11373 
192.168.123.49:6801/11373 192.168.123.49:6802/11373 
192.168.123.49:6803/11373 exists,up 21a7d2a8-b709-4a28-bc3b-850913fe4c6b


osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 
last_clean_interval [0,0) 192.168.123.50:6800/10542 
192.168.123.50:6801/10542 192.168.123.50:6802/10542 
192.168.123.50:6803/10542 exists,up c1cd3ad1-b086-438f-a22d-9034b383a1be


osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 
last_clean_interval [0,0) 192.168.123.53:6800/6962 
192.168.123.53:6801/6962 192.168.123.53:6802/6962 
192.168.123.53:6803/6962 exists,up aa06d7e4-181c-4d70-bb8e-018b088c5053



What am I doing wrong here ?
Or what kind of additional information should be provided to get 
troubleshooted.


thanks,

---

Jan

P.S. with emperor 0.72.2 I had no such problems
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] collectd / graphite / grafana .. calamari?

2014-05-23 Thread John Spray
Hi Ricardo,

Let me share a few notes on metrics in calamari:
 * We're bundling graphite, and using diamond to send home metrics.
The diamond collector used in calamari has always been open source
[1].
 * The Calamari UI has its own graphs page that talks directly to the
graphite API (the calamari REST API does not duplicate any of the
graphing interface)
 * We also bundle the default graphite dashboard, so that folks can go
to /graphite/dashboard/ on the calamari server to plot anything custom
they want to.

It could be quite interesting to hook Grafana in there in the same way
that we currently hook in the default graphite dashboard, as Grafana is
definitely nicer and would give us a roadmap to InfluxDB (a project I am
quite excited about).

Cheers,
John

1. https://github.com/ceph/Diamond/commits/calamari

On Fri, May 23, 2014 at 1:58 AM, Ricardo Rocha  wrote:
> Hi.
>
> I saw the thread a couple days ago on ceph-users regarding collectd...
> and yes, i've been working on something similar for the last few days
> :)
>
> https://github.com/rochaporto/collectd-ceph
>
> It has a set of collectd plugins pushing metrics which mostly map what
> the ceph commands return. In the setup we have it pushes them to
> graphite and the displays rely on grafana (check for a screenshot in
> the link above).
>
> As it relies on common building blocks, it's easily extensible and
> we'll come up with new dashboards soon - things like plotting osd data
> against the metrics from the collectd disk plugin, which we also
> deploy.
>
> This email is mostly to share the work, but also to check on Calamari?
> I asked Patrick after the RedHat/Inktank news and have no idea what it
> provides, but i'm sure it comes with lots of extra sauce - he
> suggested to ask in the list.
>
> What's the timeline to have it open sourced? It would be great to have
> a look at it, and as there's work from different people in this area
> maybe start working together on some fancier monitoring tools.
>
> Regards,
>   Ricardo
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Georg Höllrigl



On 22.05.2014 17:30, Craig Lewis wrote:

On 5/22/14 06:16 , Georg Höllrigl wrote:


I have created one bucket that holds many small files, separated into
different "directories". But whenever I try to acess the bucket, I
only run into some timeout. The timeout is at around 30 - 100 seconds.
This is smaller then the Apache timeout of 300 seconds.


Just so we're all talking about the same things, what does "many small
files" mean to you?  Also, how are you separating them into
"directories"?  Are you just giving files in the same "directory" the
same leading string, like "dir1_subdir1_filename"?


I can only estimate how many files. At the moment I have 25M files on the 
origin, but only 1/10th has been synced to radosgw. These are distributed 
through 20 folders, each containing about 2k directories with ~100 - 500 
files each.


Do you think that's too much for that use case?


I'm putting about 1M objects, random sizes, in each bucket.  I'm not
having problems getting individual files, or uploading new ones.  It
does take a long time for s3cmd to list the contents of the bucket. The
only time I get timeouts is when my cluster is very unhealthy.

If you're doing a lot more than that, say 10M or 100M objects, then that
could cause a hot spot on disk.  You might be better off taking your
"directories", and putting them in their own bucket.


--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Occasional Missing Admin Sockets

2014-05-23 Thread Loic Dachary
Hi Mike,

Sorry I missed this message. Are you able to reproduce the problem? Does it 
always happen when you logrotate --force, or only sometimes?

Cheers

On 13/05/2014 21:23, Gregory Farnum wrote:
> Yeah, I just did so. :(
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 
> 
> On Tue, May 13, 2014 at 11:41 AM, Mike Dawson  
> wrote:
>> Greg/Loic,
>>
>> I can confirm that "logrotate --force /etc/logrotate.d/ceph" removes the
>> monitor admin socket on my boxes running 0.80.1 just like the description in
>> Issue 7188 [0].
>>
>> 0: http://tracker.ceph.com/issues/7188
>>
>> Should that bug be reopened?
>>
>> Thanks,
>> Mike Dawson
>>
>>
>>
>> On 5/13/2014 2:10 PM, Gregory Farnum wrote:
>>>
>>> On Tue, May 13, 2014 at 9:06 AM, Mike Dawson 
>>> wrote:

 All,

 I have a recurring issue where the admin sockets
 (/var/run/ceph/ceph-*.*.asok) may vanish on a running cluster while the
 daemons keep running
>>>
>>>
>>> Hmm.
>>>
 (or restart without my knowledge).
>>>
>>>
>>> I'm guessing this might be involved:
>>>
 I see this issue on
 a dev cluster running Ubuntu and Ceph Emperor/Firefly, deployed with
 ceph-deploy using Upstart to control daemons. I never see this issue on
 Ubuntu / Dumpling / sysvinit.
>>>
>>>
>>> *goes and greps the git log*
>>>
>>> I'm betting it was commit 45600789f1ca399dddc5870254e5db883fb29b38
>>> (which has, in fact, been backported to dumpling and emperor),
>>> intended so that turning on a new daemon wouldn't remove the admin
>>> socket of an existing one. But I think that means that if you activate
>>> the new daemon before the old one has finished shutting down and
>>> unlinking, you would end up with a daemon that had no admin socket.
>>> Perhaps it's an incomplete fix and we need a tracker ticket?
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Occasional Missing Admin Sockets

2014-05-23 Thread Loic Dachary


On 13/05/2014 20:10, Gregory Farnum wrote:
> On Tue, May 13, 2014 at 9:06 AM, Mike Dawson  wrote:
>> All,
>>
>> I have a recurring issue where the admin sockets
>> (/var/run/ceph/ceph-*.*.asok) may vanish on a running cluster while the
>> daemons keep running
> 
> Hmm.
> 
>> (or restart without my knowledge).
> 
> I'm guessing this might be involved:
> 
>> I see this issue on
>> a dev cluster running Ubuntu and Ceph Emperor/Firefly, deployed with
>> ceph-deploy using Upstart to control daemons. I never see this issue on
>> Ubuntu / Dumpling / sysvinit.
> 
> *goes and greps the git log*
> 
> I'm betting it was commit 45600789f1ca399dddc5870254e5db883fb29b38
> (which has, in fact, been backported to dumpling and emperor),
> intended so that turning on a new daemon wouldn't remove the admin
> socket of an existing one. But I think that means that if you activate
> the new daemon before the old one has finished shutting down and
> unlinking, you would end up with a daemon that had no admin socket.
> Perhaps it's an incomplete fix and we need a tracker ticket?

https://github.com/ceph/ceph/commit/45600789f1ca399dddc5870254e5db883fb29b38

I see the race condition now, missed it the first time around, thanks Greg :-) 
I'll work on it.

Cheers

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread Alexandre DERUMIER
Hi,

If you use Debian,

try to use a recent kernel from backports (>3.10).

Also check your libleveldb1 version; it should be 1.9.0-1~bpo70+1 (the Debian 
Wheezy version is too old).

I don't see it in the ceph repo:
http://ceph.com/debian-firefly/pool/main/l/leveldb/

(only for squeeze ~bpo60+1)

but you can take it from our proxmox repository
http://download.proxmox.com/debian/dists/wheezy/pve-no-subscription/binary-amd64/libleveldb1_1.9.0-1~bpo70+1_amd64.deb
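
For example, to check both points on a Wheezy node (the package name and the
backports suite below are the usual ones; adjust the kernel package for your
architecture - these boxes are i386):

dpkg -l libleveldb1        # should show 1.9.0-1~bpo70+1
uname -r                   # should be > 3.10
# pull a newer kernel from wheezy-backports:
echo "deb http://http.debian.net/debian wheezy-backports main" >> /etc/apt/sources.list
apt-get update
apt-get -t wheezy-backports install linux-image-686-pae   # or linux-image-amd64 on 64-bit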


- Original Message - 

From: "jan zeller"  
To: ceph-users@lists.ceph.com 
Sent: Friday, 23 May 2014 10:50:40 
Subject: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean 

Dear ceph, 

I am trying to setup ceph 0.80.1 with the following components : 

1 x mon - Debian Wheezy (i386) 
3 x osds - Debian Wheezy (i386) 

(all are kvm powered) 

Status after the standard setup procedure : 

root@ceph-node2:~# ceph -s 
cluster d079dd72-8454-4b4a-af92-ef4c424d96d8 
health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck 
unclean 
monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 2, 
quorum 0 ceph-node1 
osdmap e11: 3 osds: 3 up, 3 in 
pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects 
103 MB used, 15223 MB / 15326 MB avail 
192 incomplete 

root@ceph-node2:~# ceph health 
HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean 

root@ceph-node2:~# ceph osd tree 
# id weight type name up/down reweight 
-1 0 root default 
-2 0 host ceph-node2 
0 0 osd.0 up 1 
-3 0 host ceph-node3 
1 0 osd.1 up 1 
-4 0 host ceph-node4 
2 0 osd.2 up 1 


root@ceph-node2:~# ceph osd dump 
epoch 11 
fsid d079dd72-8454-4b4a-af92-ef4c424d96d8 
created 2014-05-23 09:00:08.780211 
modified 2014-05-23 09:01:33.438001 
flags 

pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
crash_replay_interval 45 stripe_width 0 

pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool 
stripe_width 0 

pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0 
max_osd 3 

osd.0 up in weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 
192.168.123.49:6800/11373 192.168.123.49:6801/11373 192.168.123.49:6802/11373 
192.168.123.49:6803/11373 exists,up 21a7d2a8-b709-4a28-bc3b-850913fe4c6b 

osd.1 up in weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 
192.168.123.50:6800/10542 192.168.123.50:6801/10542 192.168.123.50:6802/10542 
192.168.123.50:6803/10542 exists,up c1cd3ad1-b086-438f-a22d-9034b383a1be 

osd.2 up in weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 
192.168.123.53:6800/6962 192.168.123.53:6801/6962 192.168.123.53:6802/6962 
192.168.123.53:6803/6962 exists,up aa06d7e4-181c-4d70-bb8e-018b088c5053 


What am I doing wrong here ? 
Or what kind of additional information should be provided to get 
troubleshooted. 

thanks, 

--- 

Jan 

P.S. with emperor 0.72.2 I had no such problems 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about "osd objectstore = keyvaluestore-dev" setting

2014-05-23 Thread Geert Lindemulder
Hello Greg and Haomai,

Thanks for the answers.
I was trying to implement the OSD leveldb backend on an existing ceph
test cluster.

At the moment I am removing the OSDs one by one and recreating them with
the osd objectstore = keyvaluestore-dev option in place in ceph.conf.
This works fine, and the backend is now leveldb for the new OSDs.
The leveldb backend looks more efficient.
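
For anyone doing the same, a rough sketch of that per-OSD cycle (osd.0 as the
example; the recreation step depends on how the OSD was originally provisioned,
e.g. with ceph-deploy or ceph-disk):

# ceph.conf, [osd] or [global] section:
#   osd objectstore = keyvaluestore-dev

ceph osd out 0
/etc/init.d/ceph stop osd.0        # or: stop ceph-osd id=0 (upstart)
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0
# then recreate the OSD, e.g. with ceph-deploy osd create or ceph-disk prepare/activate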

The error gave me the idea that migrating an OSD from a non-leveldb
backend to the new leveldb type was possible.
Will online migration of existing OSDs be added in the future?

Thanks,
Geert

On 05/23/2014 11:31 AM, GMail wrote:
> implement the osd leveldb backend at an existing ceph test
> >> cluster.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread jan.zeller
> -Original Message-
> From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
> Sent: Friday, 23 May 2014 13:20
> To: Zeller, Jan (ID)
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck
> unclean
> 
> Hi,
> 
> if you use debian,
> 
> try to use a recent kernel from backport (>3.10)
> 
> also check your libleveldb1 version, it should be 1.9.0-1~bpo70+1  (debian
> wheezy version is too old)
> 
> I don't see it in ceph repo:
> http://ceph.com/debian-firefly/pool/main/l/leveldb/
> 
> (only for squeeze ~bpo60+1)
> 
> but you can take it from our proxmox repository
> http://download.proxmox.com/debian/dists/wheezy/pve-no-
> subscription/binary-amd64/libleveldb1_1.9.0-1~bpo70+1_amd64.deb
> 

Thanks Alexandre, given this I'll try the whole setup on Ubuntu 12.04.
Maybe it's going to be a bit easier...

---

jan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread Alexandre DERUMIER
>>thanks Alexandre, due to this I'll try the whole setup on Ubuntu 12.04. 
>>May be it's going to be a bit more easier... 

Yes, I think you can use the latest Ubuntu LTS. I think ceph 0.79 is officially 
supported on it, so it should not be a problem for firefly.


- Original Message - 

From: "jan zeller"  
To: aderum...@odiso.com 
Cc: ceph-users@lists.ceph.com 
Sent: Friday, 23 May 2014 13:36:04 
Subject: AW: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean 

> -Original Message- 
> From: Alexandre DERUMIER [mailto:aderum...@odiso.com] 
> Sent: Friday, 23 May 2014 13:20 
> To: Zeller, Jan (ID) 
> Cc: ceph-users@lists.ceph.com 
> Subject: Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck 
> unclean 
> 
> Hi, 
> 
> if you use debian, 
> 
> try to use a recent kernel from backport (>3.10) 
> 
> also check your libleveldb1 version, it should be 1.9.0-1~bpo70+1 (debian 
> wheezy version is too old) 
> 
> I don't see it in ceph repo: 
> http://ceph.com/debian-firefly/pool/main/l/leveldb/ 
> 
> (only for squeeze ~bpo60+1) 
> 
> but you can take it from our proxmox repository 
> http://download.proxmox.com/debian/dists/wheezy/pve-no- 
> subscription/binary-amd64/libleveldb1_1.9.0-1~bpo70+1_amd64.deb 
> 

thanks Alexandre, due to this I'll try the whole setup on Ubuntu 12.04. 
May be it's going to be a bit more easier... 

--- 

jan 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to create authentication signature for getting user details

2014-05-23 Thread Shanil S
Hi All,

I would like to create a function for getting the user details by passing a
user id ( id) using PHP and curl. I am planning to pass the user id as
'admin' (admin is a user which is already there) and get the details of
that user. Could you please tell me how we can create the authentication
signature for this? I tried the approach in
http://mashupguide.net/1.0/html/ch16s05.xhtml#ftn.d0e27318 but it's not
working and I get a "Failed to authenticate" error (this is because the
signature is not being generated properly).

If anyone knows a proper way to generate the authentication signature using
PHP, please help me solve this.
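
In case it helps to cross-check the signature outside PHP: the admin API uses
the same S3-style (AWS v2) HMAC-SHA1 signing as the rest of radosgw. A rough
shell/openssl sketch, with placeholder keys and host, and assuming the
canonical resource for this request is /admin/user:

access_key="YOUR_ACCESS_KEY"    # placeholder
secret_key="YOUR_SECRET_KEY"    # placeholder
date=$(date -u "+%a, %d %b %Y %H:%M:%S GMT")   # must match the Date header sent
string_to_sign=$(printf "GET\n\n\n%s\n/admin/user" "$date")
signature=$(printf "%s" "$string_to_sign" | openssl dgst -sha1 -hmac "$secret_key" -binary | openssl base64)
curl -H "Date: $date" -H "Authorization: AWS $access_key:$signature" \
  "http://your.rgw.host/admin/user?uid=admin&format=json"

If the shell version authenticates but the PHP version does not, the difference
is almost certainly in how the string-to-sign is built.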

-- 
Regards
Shanil
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about "osd objectstore = keyvaluestore-dev" setting

2014-05-23 Thread Wang Haomai


Best Wishes!

> On 23 May 2014, at 19:27, Geert Lindemulder  wrote:
> 
> Hello Greg and Haomai,
> 
> Thanks for the answers.
> I was trying to implement the osd leveldb backend at an existing ceph
> test cluster.
> 
> At the moment i am removing the osd's one by one and recreate them with
> the objectstore = keyvaluestore-dev option in place in ceph.conf.
> This works fine and the backend is leveldb now for the new osd's.
> The leveldb backend looks more efficient.

Happy to see it, although I'm still trying to improve performance for some 
workloads.

> 
> The error gave me the idea that migrating from non-leveldb backend osd
> to new type leveldb was possible.
> Will online migration of existings osd's be added in the future?

Not yet, but I think it's a good feature. We could implement it at the 
ObjectStore class level and simply convert one type to another.
> 
> Thanks,
> Geert
> 
>> On 05/23/2014 11:31 AM, GMail wrote:
>> implement the osd leveldb backend at an existing ceph test
 cluster.
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to get Object ID ?

2014-05-23 Thread Shashank Puntamkar
I want to know/read the object ID assigned by ceph to a file which I
transferred via CrossFTP.
How can I read the 64-bit object ID?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd pool default pg num problem

2014-05-23 Thread Cao, Buddy
In Firefly, I added the lines below to the [global] section in ceph.conf; 
however, after creating the cluster, the default pools (metadata/data/rbd) 
still have a pg num of over 900 rather than 375. Any suggestions?


osd pool default pg num = 375
osd pool default pgp num = 375


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Pool snaps

2014-05-23 Thread Thorwald Lundqvist
Hi!

I can't find any information about ceph osd pool snapshots, except for the
commands mksnap and rmsnap.

What features do snapshots enable? Can I do things such as
diff-export/import just like rbd can?


Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

2014-05-23 Thread jan.zeller
Thanks for your tips & tricks.



This setup is now based on ubuntu 12.04, ceph version 0.80.1



Still using



1 x mon

3 x osds





root@ceph-node2:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      0       root default
-2      0       host ceph-node2
0       0       osd.0   up      1
-3      0       host ceph-node3
1       0       osd.1   up      1
-4      0       host ceph-node1
2       0       osd.2   up      1



root@ceph-node2:~# ceph -s
    cluster c30e1410-fe1a-4924-9112-c7a5d789d273
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 2, quorum 0 ceph-node1
     osdmap e11: 3 osds: 3 up, 3 in
      pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
            102 MB used, 15224 MB / 15326 MB avail
                 192 incomplete







root@ceph-node2:~# cat mycrushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph-node2 {
        id -2           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0          # rjenkins1
        item osd.0 weight 0.000
}
host ceph-node3 {
        id -3           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0          # rjenkins1
        item osd.1 weight 0.000
}
host ceph-node1 {
        id -4           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0          # rjenkins1
        item osd.2 weight 0.000
}
root default {
        id -1           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0          # rjenkins1
        item ceph-node2 weight 0.000
        item ceph-node3 weight 0.000
        item ceph-node1 weight 0.000
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map





Is there anything wrong with it?







root@ceph-node2:~# ceph osd dump
epoch 11
fsid c30e1410-fe1a-4924-9112-c7a5d789d273
created 2014-05-23 15:16:57.772981
modified 2014-05-23 15:18:17.022152
flags

pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
max_osd 3
osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 192.168.123.49:6800/4714 192.168.123.49:6801/4714 192.168.123.49:6802/4714 192.168.123.49:6803/4714 exists,up bc991a4b-9e60-4759-b35a-7f58852aa804
osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.123.50:6800/4685 192.168.123.50:6801/4685 192.168.123.50:6802/4685 192.168.123.50:6803/4685 exists,up bd099d83-2483-42b9-9dbc-7f4e4043ca60
osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.123.53:6800/16807 192.168.123.53:6801/16807 192.168.123.53:6802/16807 192.168.123.53:6803/16807 exists,up 80a302d0-3493-4c39-b34b-5af233b32ba1





thanks

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Michael
Sent: Friday, 23 May 2014 12:36
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

64 PG's per pool shouldn't cause any issues while there's only 3 OSD's. It'll 
be something to pay attention to if a lot more get added through.

Your replication setup is probably anything other than host.
You'll want to extract your crush map then decompile it and see if your "step" 
is set to osd or rack.
If it's not host then change it to that and pull it in again.

Check the docs on c

Re: [ceph-users] Unable to update Swift ACL's on existing containers

2014-05-23 Thread James Page
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hi Yehuda

On 23/05/14 02:25, Yehuda Sadeh wrote:
> That looks like a bug; generally the permission checks there are 
> broken. I opened issue #8428, and pushed a fix on top of the
> firefly branch to wip-8428.

I cherry picked the fix and tested - LGTM.

Thanks for the quick fix.

Cheers

James

- -- 
James Page
Ubuntu and Debian Developer
james.p...@ubuntu.com
jamesp...@debian.org
-BEGIN PGP SIGNATURE-
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJTf06vAAoJEL/srsug59jDXokP/3FREIK0HPOl9ZvA3d+y+XUx
J6v5mR9BMzVpY4yE4VIB7iZB7FiOPk9McqUSacmDhYvBy1KEwA92NYcF8G79GMiI
eNOTYFh0hAg3Lw+y79X8jJ4eWlw2NJGyqsm84UfkOLYOTIPCBOzeqv8X9tVUhChv
k20rEmIb0HBJnLp6gScyTrNgX1csOu2MdK+3/GlLeV8MiQJscea8lkbehDhdIJDj
FzLfTxPi2tFM8vfR1O/zvcotsWSq1xq2HdXcM1KTIJukMF++mfH6pHMUGthSCUzF
/g7DETg+IkGL3crxoZSDODztFR/Q7tD7KCKbd5jH29za11fvhZy9ZamcfJp7gsem
G70NYm3gC2kGnFu9A06IBNlwjDDTCzr1cTpdk2xi+kzGBqfshbJ4ppGvnQIypb29
689xXvwLJpIPAR56EGRlxY4W88z7E5krX72XcBTNsrIZP/KvrpKxSMgEhj8N4xZu
o3PVZlkMUJ8sOfDG5tWQRF7Nas6AyFhHodBW3vWtykkmW/+aI5dBCMMpm6QoNlMu
8WTGReqs6Skv/kxrpwmhlNLtl9JYU6xrF42/MKKg5zy6pxvRIffSqWV+oy9MdISb
hmtTCHTA9Fuj0/n/nUOCi3ZAwroEzcFwknYTivHiTLDaFu7u2eSl28sAczCQ2vie
bWYkBOn4FLvFtlnJ2kPF
=m4xJ
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd pool default pg num problem

2014-05-23 Thread John Spray
Those settings are applied when creating new pools with "osd pool
create", but not to the pools that are created automatically during
cluster setup.

We've had the same question before
(http://comments.gmane.org/gmane.comp.file-systems.ceph.user/8150), so
maybe it's worth opening a ticket to do something about it.
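
As a concrete example using the numbers from the original mail: a pool created
explicitly honours the requested pg counts, while the auto-created pools have to
be adjusted by hand afterwards - bearing in mind that pg_num of an existing pool
can only be increased, never decreased:

ceph osd pool create mypool 375 375   # explicit pg_num / pgp_num
ceph osd pool set data pg_num 375     # only works if the current pg_num is lower
ceph osd pool set data pgp_num 375
# repeat for the metadata and rbd pools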

Cheers,
John

On Fri, May 23, 2014 at 2:01 PM, Cao, Buddy  wrote:
> In Firefly, I added below lines to [global] section in ceph.conf, however,
> after creating the cluster, the default pool “metadata/data/rbd”’s pg num is
> still over 900 but not 375.  Any suggestion?
>
>
>
>
>
> osd pool default pg num = 375
>
> osd pool default pgp num = 375
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to find the disk partitions attached to a OSD

2014-05-23 Thread Sharmila Govind
Thanks:-) That helped.

Thanks & Regards,
Sharmila


On Thu, May 22, 2014 at 6:41 PM, Alfredo Deza wrote:

> Hopefully I am not late to the party :)
>
> But ceph-deploy recently gained a `osd list` subcommand that does this
> plus a bunch of other interesting metadata:
>
> $ ceph-deploy osd list node1
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /Users/alfredo/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.2):
> /Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy osd list node1
> [node1][DEBUG ] connected to host: node1
> [node1][DEBUG ] detect platform information from remote host
> [node1][DEBUG ] detect machine type
> [node1][INFO  ] Running command: sudo ceph --cluster=ceph osd tree
> --format=json
> [node1][DEBUG ] connected to host: node1
> [node1][DEBUG ] detect platform information from remote host
> [node1][DEBUG ] detect machine type
> [node1][INFO  ] Running command: sudo ceph-disk list
> [node1][INFO  ] 
> [node1][INFO  ] ceph-0
> [node1][INFO  ] 
> [node1][INFO  ] Path   /var/lib/ceph/osd/ceph-0
> [node1][INFO  ] ID 0
> [node1][INFO  ] Name   osd.0
> [node1][INFO  ] Status up
> [node1][INFO  ] Reweight   1.00
> [node1][INFO  ] Magic  ceph osd volume v026
> [node1][INFO  ] Journal_uuid   214a6865-416b-4c09-b031-a354d4f8bdff
> [node1][INFO  ] Active ok
> [node1][INFO  ] Device /dev/sdb1
> [node1][INFO  ] Whoami 0
> [node1][INFO  ] Journal path   /dev/sdb2
> [node1][INFO  ] 
>
> On Thu, May 22, 2014 at 8:30 AM, John Spray 
> wrote:
> > On Thu, May 22, 2014 at 10:57 AM, Sharmila Govind
> >  wrote:
> >> root@cephnode4:/mnt/ceph/osd2# mount |grep ceph
> >> /dev/sdc on /mnt/ceph/osd3 type ext4 (rw)
> >> /dev/sdb on /mnt/ceph/osd2 type ext4 (rw)
> >>
> >> All the above commands just pointed out the mount
> points(/mnt/ceph/osd3),
> >> the folders were named by me as ceph/osd. But, if a new user has to get
> the
> >> osd mapping to the mounted devices, would be difficult if we named the
> osd
> >> disk folders differently. Any other command which could give the mapping
> >> would be useful.
> >
> > It really depends on how you have set up the OSDs.  If you're using
> > ceph-deploy or ceph-disk to partition and format the drives, they get
> > a special partition type set which marks them as a Ceph OSD.  On a
> > system set up that way, you get nice uniform output like this:
> >
> > # ceph-disk list
> > /dev/sda :
> >  /dev/sda1 other, ext4, mounted on /boot
> >  /dev/sda2 other, LVM2_member
> > /dev/sdb :
> >  /dev/sdb1 ceph data, active, cluster ceph, osd.0, journal /dev/sdb2
> >  /dev/sdb2 ceph journal, for /dev/sdb1
> > /dev/sdc :
> >  /dev/sdc1 ceph data, active, cluster ceph, osd.3, journal /dev/sdc2
> >  /dev/sdc2 ceph journal, for /dev/sdc1
> >
> > John
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to backup mon-data?

2014-05-23 Thread Fabian Zimmermann
Hello,

I’m running a 3-node cluster with 2 hdds/osds and one mon on each node.
Sadly the fsyncs done by the mon processes eat my hdds.

I was able to remove this impact by moving the mon data dir to ramfs.
This should work as long as at least 2 nodes are running, but I want to implement 
some kind of disaster recovery.

What’s the correct way to back up mon data - if there is any?

Thanks,

Fabian



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Dan Van Der Ster
Hi,
I think you’re rather brave (sorry, foolish) to store the mon data dir in 
ramfs. One power outage and your cluster is dead. Even with good backups of the 
data dir I wouldn't want to go through that exercise.

That said, we had a similar disk-IO-bound problem with the mon data dirs, and 
solved it by moving the mons to SSDs. Maybe in your case using the cfq IO 
scheduler would help, since at least then the OSD and MON processes would get 
fair shares of the disk IOs.
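
For example, switching a disk to cfq at runtime (sda is just a placeholder; this
does not persist across reboots):

cat /sys/block/sda/queue/scheduler        # e.g. noop deadline [cfq]
echo cfq > /sys/block/sda/queue/scheduler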

Anyway, to backup the data dirs, you need to stop the mon daemon to get a 
consistent leveldb before copying the data to a safe place.
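
A minimal sketch of that, assuming a mon id of "a", the default data dir layout
and sysvinit (the upstart equivalents are in the comments):

service ceph stop mon.a                                    # or: stop ceph-mon id=a
tar czf /backup/mon-a-$(date +%F).tar.gz /var/lib/ceph/mon/ceph-a
service ceph start mon.a                                   # or: start ceph-mon id=a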
Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 23 May 2014, at 15:45, Fabian Zimmermann  wrote:

> Hello,
> 
> I’m running a 3 node cluster with 2 hdd/osd and one mon on each node.
> Sadly the fsyncs done by mon-processes eat my hdd.
> 
> I was able to disable this impact by moving the mon-data-dir to ramfs.
> This should work until at least 2 nodes are running, but I want to implement 
> some kind of disaster recover.
> 
> What’s the correct way to backup mon-data - if there is any?
> 
> Thanks,
> 
> Fabian
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] network Ports Linked to each OSD process

2014-05-23 Thread Sharmila Govind
Hi,

I am trying to do some network control on the storage nodes. For this, I
need to know the ports opened for communication by each OSD process.

I learned from the link
http://ceph.com/docs/master/rados/configuration/network-config-ref/ that
each OSD process requires 3 ports, and that ports from 6800 onwards are
reserved for OSD processes.

However, when I do a ceph osd dump command, it lists 4 ports in use for
each of the OSDs:

root@cephnode2:~# ceph osd dump | grep osd
max_osd 4
osd.0 up   in  weight 1 up_from 71 up_thru 71 down_at 68 last_clean_interval [4,70) 10.223.169.166:6800/83380 10.223.169.166:6810/1083380 10.223.169.166:6811/1083380 10.223.169.166:6812/1083380 exists,up fdbbc6eb-7d9f-4ad8-a8c3-caf995422528
osd.1 up   in  weight 1 up_from 7 up_thru 71 down_at 0 last_clean_interval [0,0) 10.223.169.201:6800/83569 10.223.169.201:6801/83569 10.223.169.201:6802/83569 10.223.169.201:6803/83569 exists,up db545fd7-071f-4671-b1c4-c57221f894a3
osd.2 up   in  weight 1 up_from 64 up_thru 64 down_at 61 last_clean_interval [12,60) 10.223.169.166:6805/92402 10.223.169.166:6806/92402 10.223.169.166:6807/92402 10.223.169.166:6808/92402 exists,up 594b73b9-1908-4757-b914-d887d850b386
osd.3 up   in  weight 1 up_from 17 up_thru 71 down_at 0 last_clean_interval [0,0) 10.223.169.201:6805/84590 10.223.169.201:6806/84590 10.223.169.201:6807/84590 10.223.169.201:6808/84590 exists,up 37536050-ef92-4eba-95a7-e7a099c6d059
root@cephnode2:~#



I also listed the ports the first OSD process above (osd.0, pid 83380) is
listening on, using lsof:

root@cephnode2:~/nethogs# lsof -i | grep ceph | grep 83380
ntpd        1627   ntp   19u  IPv4   33890  0t0  UDP cephnode2.iind.intel.com:ntp
ceph-osd   83380  root    4u  IPv4 4881747  0t0  TCP *:6800 (LISTEN)
ceph-osd   83380  root    5u  IPv4 5045544  0t0  TCP cephnode2.iind.intel.com:6810 (LISTEN)
ceph-osd   83380  root    6u  IPv4 5045545  0t0  TCP cephnode2.iind.intel.com:6811 (LISTEN)
ceph-osd   83380  root    7u  IPv4 5045546  0t0  TCP cephnode2.iind.intel.com:6812 (LISTEN)
ceph-osd   83380  root    8u  IPv4 4881751  0t0  TCP *:6804 (LISTEN)
ceph-osd   83380 root   19u  IPv4 5101954  0t0  TCP
cephnode2.iind.intel.com:6800->computeich.iind.intel.com:60781 (ESTABLISHED)
ceph-osd   83380 root   23u  IPv4 5013387  0t0  TCP
cephnode2.iind.intel.com:41878->cephnode4.iind.intel.com:6803 (ESTABLISHED)
ceph-osd   83380 root   25u  IPv4 5037728  0t0  TCP
cephnode2.iind.intel.com:44251->cephnode4.iind.intel.com:6802 (ESTABLISHED)
ceph-osd   83380 root   83u  IPv4 5025954  0t0  TCP
cephnode2.iind.intel.com:47863->cephnode4.iind.intel.com:6808 (ESTABLISHED)
ceph-osd   83380 root  111u  IPv4 4850005  0t0  TCP
cephnode2.iind.intel.com:43189->cephnode2.iind.intel.com:6807 (ESTABLISHED)
ceph-osd   83380 root  112u  IPv4 4850839  0t0  TCP
cephnode2.iind.intel.com:59738->cephnode2.iind.intel.com:6808 (ESTABLISHED)
ceph-osd   83380 root  130u  IPv4 5037729  0t0  TCP
cephnode2.iind.intel.com:41902->cephnode4.iind.intel.com:6807 (ESTABLISHED)
ceph-osd   83380 root  152u  IPv4 5013621  0t0  TCP
cephnode2.iind.intel.com:34798->cephmon.iind.intel.com:6789 (ESTABLISHED)
ceph-osd   83380 root  159u  IPv4 5040569  0t0  TCP
cephnode2.iind.intel.com:6811->cephnode4.iind.intel.com:35321 (ESTABLISHED)
ceph-osd   83380 root  160u  IPv4 5040570  0t0  TCP
cephnode2.iind.intel.com:6812->cephnode4.iind.intel.com:42682 (ESTABLISHED)
ceph-osd   83380 root  161u  IPv4 5043767  0t0  TCP
cephnode2.iind.intel.com:6812->cephnode4.iind.intel.com:42683 (ESTABLISHED)
ceph-osd   83380 root  162u  IPv4 5038664  0t0  TCP
cephnode2.iind.intel.com:6811->cephnode4.iind.intel.com:35324 (ESTABLISHED)


In the above list, it looks like the OSD is listening on some additional
ports (6810-6812) beyond what is listed in the "ceph osd dump" command.

I would like to know if there is any straightforward way of listing the ports
used by each OSD process.
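For a single daemon, narrowing lsof down to its listening TCP sockets (using the
osd.0 PID 83380 from the listing above) at least gives a per-OSD view, though it
does not feel like an intended interface:

lsof -nP -iTCP -sTCP:LISTEN -a -p 83380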

I would also like to understand the networking architecture of Ceph
in more detail. Is there any link/doc for that?

Thanks in Advance,
Sharmila
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Wido den Hollander

On 05/23/2014 04:09 PM, Dan Van Der Ster wrote:

Hi,
I think you’re rather brave (sorry, foolish) to store the mon data dir in 
ramfs. One power outage and your cluster is dead. Even with good backups of the 
data dir I wouldn't want to go through that exercise.



Agreed. Foolish. I'd never do that.


Saying that, we had a similar disk-io-bound problem with the mon data dirs, and 
solved it by moving the mons to SSDs. Maybe in your case using the cfq io 
scheduler would help, since at least then the OSD and MON processes would get 
fair shares of the disk IOs.

Anyway, to backup the data dirs, you need to stop the mon daemon to get a 
consistent leveldb before copying the data to a safe place.


I wrote a blog about this: 
http://blog.widodh.nl/2014/03/safely-backing-up-your-ceph-monitors/


Wido


Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 23 May 2014, at 15:45, Fabian Zimmermann  wrote:


Hello,

I’m running a 3 node cluster with 2 hdd/osd and one mon on each node.
Sadly the fsyncs done by mon-processes eat my hdd.

I was able to disable this impact by moving the mon-data-dir to ramfs.
This should work as long as at least 2 nodes are running, but I want to implement 
some kind of disaster recovery.

What’s the correct way to backup mon-data - if there is any?

Thanks,

Fabian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow IOPS on RBD compared to journal and backing devices

2014-05-23 Thread Christian Balzer

For what it's worth (very little in my case)...

Since the cluster wasn't in production yet and Firefly (0.80.1) hit
Debian Jessie today, I upgraded it.

Big mistake...

I did the recommended upgrade song and dance, MONs first, OSDs after that.

Then I applied "ceph osd crush tunables default" as per the update
instructions, and because "ceph -s" was whining about it.

Lastly I did a "ceph osd pool set rbd hashpspool true", and after that had
finished (people with either a big cluster or a slow network should probably
avoid this like the plague) I re-ran the fio job below from a VM (old or new
client libraries made no difference).

The result: 2800 write IOPS instead of 3200 with Emperor.

So much for improved latency and whatnot...

Christian

On Wed, 14 May 2014 21:33:06 +0900 Christian Balzer wrote:

> 
> Hello!
> 
> On Wed, 14 May 2014 11:29:47 +0200 Josef Johansson wrote:
> 
> > Hi Christian,
> > 
> > I missed this thread, haven't been reading the list that well the last
> > weeks.
> > 
> > You already know my setup, since we discussed it in an earlier thread.
> > I don't have a fast backing store, but I see the slow IOPS when doing
> > randwrite inside the VM, with rbd cache. Still running dumpling here
> > though.
> > 
> Nods, I do recall that thread.
> 
> > A thought struck me that I could test with a pool that consists of OSDs
> > that have tempfs-based disks, think I have a bit more latency than your
> > IPoIB but I've pushed 100k IOPS with the same network devices before.
> > This would verify if the problem is with the journal disks. I'll also
> > try to run the journal devices in tempfs as well, as it would test
> > purely Ceph itself.
> >
> That would be interesting indeed.
> Given what I've seen (with the journal at 20% utilization and the actual
> filestore at around 5%) I'd expect Ceph to be the culprit. 
>  
> > I'll get back to you with the results, hopefully I'll manage to get
> > them done during this night.
> >
> Looking forward to that. ^^
> 
> 
> Christian 
> > Cheers,
> > Josef
> > 
> > On 13/05/14 11:03, Christian Balzer wrote:
> > > I'm clearly talking to myself, but whatever.
> > >
> > > For Greg, I've played with all the pertinent journal and filestore
> > > options and TCP nodelay, no changes at all.
> > >
> > > Is there anybody on this ML who's running a Ceph cluster with a fast
> > > network and FAST filestore, so like me with a big HW cache in front
> > > of a RAID/JBODs or using SSDs for final storage?
> > >
> > > If so, what results do you get out of the fio statement below per
> > > OSD? In my case with 4 OSDs and 3200 IOPS that's about 800 IOPS per
> > > OSD, which is of course vastly faster than the normal individual HDDs
> > > could do.
> > >
> > > So I'm wondering if I'm hitting some inherent limitation of how fast
> > > a single OSD (as in the software) can handle IOPS, given that
> > > everything else has been ruled out from where I stand.
> > >
> > > This would also explain why none of the option changes or the use of
> > > RBD caching has any measurable effect in the test case below. 
> > > As in, a slow OSD aka single HDD with journal on the same disk would
> > > clearly benefit from even the small 32MB standard RBD cache, while in
> > > my test case the only time the caching becomes noticeable is if I
> > > increase the cache size to something larger than the test data size.
> > > ^o^
> > >
> > > On the other hand if people here regularly get thousands or tens of
> > > thousands IOPS per OSD with the appropriate HW I'm stumped. 
> > >
> > > Christian
> > >
> > > On Fri, 9 May 2014 11:01:26 +0900 Christian Balzer wrote:
> > >
> > >> On Wed, 7 May 2014 22:13:53 -0700 Gregory Farnum wrote:
> > >>
> > >>> Oh, I didn't notice that. I bet you aren't getting the expected
> > >>> throughput on the RAID array with OSD access patterns, and that's
> > >>> applying back pressure on the journal.
> > >>>
> > >> In the a "picture" being worth a thousand words tradition, I give
> > >> you this iostat -x output taken during a fio run:
> > >>
> > >> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> > >>   50.82    0.00   19.43    0.17    0.00   29.58
> > >>
> > >> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > >> sda               0.00    51.50    0.00 1633.50     0.00  7460.00     9.13     0.18    0.11    0.00    0.11   0.01   1.40
> > >> sdb               0.00     0.00    0.00 1240.50     0.00  5244.00     8.45     0.30    0.25    0.00    0.25   0.02   2.00
> > >> sdc               0.00     5.00    0.00 2468.50     0.00 13419.00    10.87     0.24    0.10    0.00    0.10   0.09  22.00
> > >> sdd               0.00     6.50    0.00 1913.00     0.00 10313.00    10.78     0.20    0.10    0.00    0.10   0.09  16.60
> > >>
> > >> The %user CPU utilization is pretty much entirely the 2 OSD
> > >> processes, note the nearly complete absence of iowait.
> > >>
> > >> sda and sdb are the OSDs RA

Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Fabian Zimmermann
Hi,

Am 23.05.2014 um 16:09 schrieb Dan Van Der Ster :

> Hi,
> I think you’re rather brave (sorry, foolish) to store the mon data dir in 
> ramfs. One power outage and your cluster is dead. Even with good backups of 
> the data dir I wouldn't want to go through that exercise.
> 

I know - I’m still testing my env and I don’t really plan to use ramfs in prod, 
but technically it’s quite interesting ;)

> Saying that, we had a similar disk-io-bound problem with the mon data dirs, 
> and solved it by moving the mons to SSDs. Maybe in your case using the cfq io 
> scheduler would help, since at least then the OSD and MON processes would get 
> fair shares of the disk IOs.

Oh, when did they switch the default sched to deadline? Thanks for the hint, 
moved to cfq - tests are running.
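(For reference, this is roughly what I ran; the switch is per block device and does not survive a reboot without an elevator= boot parameter or a udev rule, and sda is just an example device:

cat /sys/block/sda/queue/scheduler        # active scheduler is shown in brackets
echo cfq > /sys/block/sda/queue/scheduler
)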

> Anyway, to backup the data dirs, you need to stop the mon daemon to get a 
> consistent leveldb before copying the data to a safe place.

Well, this wouldn’t be a real problem, but I’m wondering how effective 
this would actually be.

Is it enough to restore such a backup even if in the meantime (since the backup 
was done) data-objects have changed? I don’t think so :(

To conclude:

* ceph would stop/freeze as soon as amount of nodes is less than quorum
* ceph would continue to work as soon as node go up again
* I could create a fresh mon on every node directly on boot by importing 
current state " ceph-mon --force-sync --yes-i-really-mean-it ..."

So, as long as there are enough mon to build the quorum, it should work with 
ramfs. 
If nodes fail one by one, ceph would stop if quorum is lost and continue if 
nodes are back.
But if all nodes stop (e.g. a power outage) my ceph cluster is dead, and backups 
wouldn’t prevent this, would they?

Maybe snapshotting the pool could help?

Backup:
* create a snapshot
* shutdown one mon
* backup mon-dir

Restore:
* import mon-dir
* create further mons until quorum is restored
* restore snapshot

Possible?.. :D

Thanks,

Fabian


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Fabian Zimmermann

Hi,

> Am 23.05.2014 um 17:31 schrieb "Wido den Hollander" :
> 
> I wrote a blog about this: 
> http://blog.widodh.nl/2014/03/safely-backing-up-your-ceph-monitors/

so you assume restoring the old data works, or did you prove this?

Fabian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Day Boston Schedule Released

2014-05-23 Thread Patrick McGarry
Hey cephers,

Just wanted to let you know that the schedule has been posted for Ceph
Day Boston happening on 10 June at the Sheraton Boston, MA:

http://www.inktank.com/cephdays/boston/

There are still a couple of talk title tweaks that are pending, but I
wanted to get the info out as soon as possible.  We have some really
solid speakers, including a couple of highly technical talks from the
CohortFS guys and a demo of one of the hot new ethernet drives that is
poised to take the market by storm.

If you haven't signed up yet, please don't wait!  We want to make sure
we can adequately accommodate everyone that wishes to attend.  Thanks,
and see you there!


Best Regards,

Patrick McGarry
Director, Community || Inktank
http://ceph.com  ||  http://inktank.com
@scuttlemonkey || @ceph || @inktank
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] centos and 'print continue' support

2014-05-23 Thread Bryan Stillwell
Yesterday I went through manually configuring a ceph cluster with a
rados gateway on centos 6.5, and I have a question about the
documentation.  On this page:

https://ceph.com/docs/master/radosgw/config/

It mentions "On CentOS/RHEL distributions, turn off print continue. If
you have it set to true, you may encounter problems with PUT
operations."  However, when I had 'rgw print continue = false' in my
ceph.conf, adding objects with the python boto module would hang at:

key.set_contents_from_string('Hello World!')

After switching it to 'rgw print continue = true' things started working.
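For what it's worth, the setting sits in my ceph.conf under the rgw client section, roughly like this (the section name may differ in your setup):

[client.radosgw.gateway]
rgw print continue = true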

I'm wondering if this is because I installed the custom
apache/mod_fastcgi packages from the instructions on this page?:

http://ceph.com/docs/master/install/install-ceph-gateway/#id2

If that's the case, could the docs be updated to mention that setting
'rgw print continue = false' is only needed if you're using the distro
packages?

Thanks,
Bryan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd pool default pg num problem

2014-05-23 Thread McNamara, Bradley
The other thing to note, too, is that it appears you're trying to decrease the 
PG/PGP_num parameters, which is not supported.  In order to decrease those 
settings, you'll need to delete and recreate the pools.  All new pools created 
will use the settings defined in the ceph.conf file.
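For example, something along these lines; note this destroys all data in the pool, and the pool name here is only an illustration:

ceph osd pool delete data data --yes-i-really-really-mean-it
ceph osd pool create data 375 375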

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John 
Spray
Sent: Friday, May 23, 2014 6:38 AM
To: Cao, Buddy
Cc: ceph-users@lists.ceph.com; ceph-u...@ceph.com
Subject: Re: [ceph-users] osd pool default pg num problem

Those settings are applied when creating new pools with "osd pool create", but 
not to the pools that are created automatically during cluster setup.

We've had the same question before
(http://comments.gmane.org/gmane.comp.file-systems.ceph.user/8150), so maybe 
it's worth opening a ticket to do something about it.

Cheers,
John

On Fri, May 23, 2014 at 2:01 PM, Cao, Buddy  wrote:
> In Firefly, I added the lines below to the [global] section in ceph.conf, 
> however, after creating the cluster, the default pool 
> “metadata/data/rbd”’s pg num is still over 900, not 375.  Any suggestion?
>
>
>
>
>
> osd pool default pg num = 375
>
> osd pool default pgp num = 375
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Designing a cluster with ceph and benchmark (ceph vs ext4)

2014-05-23 Thread Listas@Adminlinux

Hi !

I have failover clusters for some applications, generally with 2 members 
configured with Ubuntu + DRBD + Ext4. For example, my IMAP cluster works 
fine with ~50k email accounts and my HTTP cluster hosts ~2k sites.


See design here: http://adminlinux.com.br/cluster_design.txt

I would like to provide load balancing instead of just failover. So I 
would like to use a distributed filesystem architecture. As we 
know, Ext4 isn't a distributed filesystem, so I wish to use Ceph in my 
clusters.


Any suggestions for design of the cluster with Ubuntu+Ceph?

I built a simple cluster of 2 servers to test simultaneous reading and 
writing with Ceph. My conf:  http://adminlinux.com.br/ceph_conf.txt


But in my simultaneous benchmarks I found errors in reading and writing. I 
ran "iozone -t 5 -r 4k -s 2m" simultaneously on both servers in the 
cluster. The performance was poor and there were errors like this:


Error in file: Found '0' Expecting '6d6d6d6d6d6d6d6d' addr b660
Error in file: Position 1060864
Record # 259 Record size 4 kb
where b660 loop 0

Performance graphs of benchmark: http://adminlinux.com.br/ceph_bench.html

Can you help me find what I did wrong?

Thanks !

--
Thiago Henrique
www.adminlinux.com.br
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests

2014-05-23 Thread Craig Lewis

On 5/22/14 11:51 , Győrvári Gábor wrote:

Hello,

Got this kind of log on two nodes of a 3 node cluster; both nodes have 2 
OSDs. Only 2 OSDs on two separate nodes are affected, that's why I don't 
understand the situation. There wasn't any extra IO on the system at 
the given time.


Using radosgw with the S3 API to store objects under ceph; average ops are 
around 20-150, with bandwidth usage of 100-2000 KB/s read and only 50-1000 KB/s 
written.


osd_op(client.7821.0:67251068 
default.4181.1_products/800x600/537e28022fdcc.jpg [cmpxattr 
user.rgw.idtag (22) op 1 mode 1,setxattr user.rgw.idtag (33),call 
refcount.put] 11.fe53a6fb e590) v4 *currently waiting for subops from 
[2] **

*


Are any of your PGs in recovery or backfill?

I've seen this happen two different ways.  The first time was because I 
had the recovery and backfill parameters set too high for my cluster.  
If your journals aren't SSDs, the default parameters are too high.  The 
recovery operation will use most of the IOps, and starve the clients.


The second time I saw this was when one disk was starting to fail. 
Sectors started failing, and the drive spent a lot of time reading and 
remapping bad sectors.  Consumer class SATA disks will retry bad sectors 
for 30+ seconds.  It happens in the drive firmware, so it's not something 
you can stop.  Enterprise class drives will give up quicker, since they 
know you have another copy of the data.  (Nobody uses enterprise class 
drives stand-alone; they're always in some sort of storage array).


I've had reports of 6+ OSDs blocking subops, and I traced it back to one 
disk that was blocking others.  I replaced that disk, and the warnings 
went away.



If your cluster is healthy, check the SMART attributes for osd.2. If 
osd.2 looks good, it might be another osd.  Check osd.2's logs, and check any 
osds that are blocking osd.2.  If your cluster is small, it might be 
faster to just check all disks instead of following the trail.
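A rough sketch of both checks, assuming osd.2 sits on /dev/sdc (adjust the device name) and the default cluster log location:

smartctl -a /dev/sdc | egrep -i 'reallocated|pending|uncorrect'
grep -o 'waiting for subops from \[[0-9,]*\]' /var/log/ceph/ceph.log | sort | uniq -c

The first command highlights the SMART attributes that usually announce a dying disk; the second counts which OSDs show up most often as the ones being waited on.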




--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd pool default pg num problem

2014-05-23 Thread Craig Lewis
If you're not using CephFS, you don't need metadata or data pools.  You 
can delete them.

If you're not using RBD, you don't need the rbd pool.

If you are using CephFS, and you do delete and recreate the 
metadata/data pools, you'll need to tell CephFS.  I think the command is 
ceph mds add_data_pool .  I'm not using CephFS, so I 
can't test that.  I don't see any commands to set the metadata pool 
for CephFS, but it seems strange that you have to tell it about the data 
pool, but not the metadata pool.




On 5/23/14 11:22 , McNamara, Bradley wrote:

The other thing to note, too, is that it appears you're trying to decrease the 
PG/PGP_num parameters, which is not supported.  In order to decrease those 
settings, you'll need to delete and recreate the pools.  All new pools created 
will use the settings defined in the ceph.conf file.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John 
Spray
Sent: Friday, May 23, 2014 6:38 AM
To: Cao, Buddy
Cc: ceph-users@lists.ceph.com; ceph-u...@ceph.com
Subject: Re: [ceph-users] osd pool default pg num problem

Those settings are applied when creating new pools with "osd pool create", but 
not to the pools that are created automatically during cluster setup.

We've had the same question before
(http://comments.gmane.org/gmane.comp.file-systems.ceph.user/8150), so maybe 
it's worth opening a ticket to do something about it.

Cheers,
John

On Fri, May 23, 2014 at 2:01 PM, Cao, Buddy  wrote:

In Firefly, I added below lines to [global] section in ceph.conf,
however, after creating the cluster, the default pool
“metadata/data/rbd”’s pg num is still over 900 but not 375.  Any suggestion?





osd pool default pg num = 375

osd pool default pgp num = 375






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Wido den Hollander

On 05/23/2014 06:30 PM, Fabian Zimmermann wrote:


Hi,


Am 23.05.2014 um 17:31 schrieb "Wido den Hollander" :

I wrote a blog about this: 
http://blog.widodh.nl/2014/03/safely-backing-up-your-ceph-monitors/


so you assume restoring the old data works, or did you prove this?



No, that won't work in ALL situations. But it's always better to have a 
backup of your mons instead of having none.



Fabian




--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Craig Lewis

On 5/23/14 09:30 , Fabian Zimmermann wrote:

Hi,


Am 23.05.2014 um 17:31 schrieb "Wido den Hollander" :

I wrote a blog about this: 
http://blog.widodh.nl/2014/03/safely-backing-up-your-ceph-monitors/

so you assume restoring the old data works, or did you prove this?


I did some of the same things, but never tested a restore 
(http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/3087). 
There is a discussion, but I can't figure out how to get gmane to show 
me the threaded version from a google search.



I stopped doing the backups, because they seemed rather useless.

The monitors have a snapshot of the cluster state right now.  If you 
ever need to restore a monitor backup, you're effectively rolling the 
whole cluster back to that point in time.


What happens if you've added disks after the backup?
What happens if a disk has failed after the backup?
What happens if you write data to the cluster after the backup?
What happens if you delete data after the backup, and it gets garbage 
collected?


All questions that can be tested and answered... with a lot of time and 
experimentation.  I decided to add more monitors and stop taking backups.



I'm still thinking about doing manual backups before a major ceph 
version upgrade.  In that case, I'd only need to test the write/delete 
cases, because I can control the add/remove disk cases.  The backups 
would only be useful between restarting the MON and the OSD processes 
though.  I can't really backup the OSD state[1], so once they're 
upgraded, there's no going back.



1: ZFS or Btrfs snapshots could do this, but neither one is recommended 
for production.  I do plan to make snapshots once either FS is 
production ready.  LVM snapshots could do it, but they're such a pain 
that I never bothered.  And I have the scripts I used to use to make LVM 
snapshots of MySQL data directories.
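The LVM flavour is only a handful of commands, assuming the mon data dir lives on its own logical volume (all names below are purely illustrative, and for XFS you may additionally need -o nouuid on the mount); whether a crash-consistent snapshot of leveldb is good enough is exactly the open question:

lvcreate --snapshot --size 1G --name mon-snap /dev/vg0/ceph-mon
mount -o ro /dev/vg0/mon-snap /mnt/mon-snap
tar czf /root/mon-backup.tar.gz -C /mnt/mon-snap .
umount /mnt/mon-snap && lvremove -f /dev/vg0/mon-snap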



--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Dimitri Maziuk
On 05/23/2014 03:06 PM, Craig Lewis wrote:

> 1: ZFS or Btrfs snapshots could do this, but neither one are recommended
> for production.

Out of curiosity, what's the current beef with zfs? I know what problems
are cited for btrfs, but I haven't heard much about zfs lately.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Craig Lewis

On 5/23/14 03:47 , Georg Höllrigl wrote:



On 22.05.2014 17:30, Craig Lewis wrote:

On 5/22/14 06:16 , Georg Höllrigl wrote:


I have created one bucket that holds many small files, separated into
different "directories". But whenever I try to access the bucket, I
only run into some timeout. The timeout is at around 30 - 100 seconds.
This is smaller then the Apache timeout of 300 seconds.


Just so we're all talking about the same things, what does "many small
files" mean to you?  Also, how are you separating them into
"directories"?  Are you just giving files in the same "directory" the
same leading string, like "dir1_subdir1_filename"?


I can only estimate how many files. ATM I've 25M files on the origin 
but only 1/10th has been synced to radosgw. These are distributed 
through 20 folders, each containing about 2k directories with ~ 100 - 
500 files each.


Do you think that's too much in that usecase?

The recommendations I've seen indicate that 25M objects per bucket is 
doable, but painful.  The bucket is itself an object stored in Ceph, 
which stores the list of objects in that bucket.   With a single bucket 
containing 25M objects, you're going to hotspot on the bucket.  Think of 
a bucket like a directory on a filesystem.  You wouldn't store 25M files 
in a single directory.


Buckets are a bit simpler than directories.  They don't have to track 
permissions, per file ACLs, and all the other things that POSIX 
filesystems do.  You can push them harder than a normal directory, but 
the same concepts still apply.  The more files you put in a 
bucket/directory, the slower it gets.  Most filesystems impose a hard 
limit on the number of files in a directory.  RadosGW doesn't have a 
limit, it just gets slower.


Even the list of buckets has this problem.  You wouldn't want to create 
25M buckets with one object each.  By default, there is a 1000 bucket 
limit per user, but you can increase that.
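If that limit gets in the way, it can be raised per user; something along these lines should do it (the uid is illustrative):

radosgw-admin user modify --uid=johndoe --max-buckets=5000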



If you can handle using 20 buckets, it would be worthwhile to put each 
one of your top 20 folders into its own bucket.  If you can break it 
apart even more, that would be even better.


I mentioned that I have a bunch of buckets with ~1M objects each. GET 
and PUT of objects is still fast, but listing the contents of the bucket 
takes a long time.  Each bucket takes 20-30 minutes to get a full 
listing.  If you're going to be doing a lot of bucket listing, you might 
want to keep each bucket below 1000 items.  Maybe each of your 2k 
directories gets its own bucket.



If using more than one bucket is difficult, then 25M objects in one 
bucket will work.



--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions about zone and disater recovery

2014-05-23 Thread Craig Lewis

On 5/21/14 19:49 , wsnote wrote:

Hi,everyone!
I have 2 ceph clusters, one master zone, another secondary zone.
Now I have some question.
1. Can ceph have two or more secondary zones?


It's supposed to work, but I haven't tested it.



2. Can the roles of the master zone and secondary zone be swapped?
I mean I can change the secondary zone to be master and the master 
zone to secondary.
Yes and no.  You can promote the slave to a master at any time by 
disabling replication, and writing to it.  You'll want to update your 
region and zone maps, but that's only required to make replication 
between zones work.


Converting the master to a secondary zone... I don't know. Everything 
will work if you delete the contents of the old master, set it up as a 
new secondary of the new master, and re-replicate everything.   Nobody 
wants to do that.  It would be nice if you could just point the old 
master (with it's existing data) at the new master, and it would start 
replicating.  I can't answer that.




3. How to deal with the situation when the master zone is down?
Currently the secondary zone forbids all file operations, such as 
creating or deleting objects.
When the master zone is down, users can't do anything with the files 
except read objects from the secondary zone.
It's a bad user experience. Additionally, it will have a bad influence 
on the users' confidence.
I know the restriction on the secondary zone is there for the sake of 
data consistency. However, is there another way to improve the 
experience?

I think:
There could be a config option that allows file operations on the secondary 
zone. If the master zone is down, the admin can enable it, and then 
users can do file operations as usual. The secondary zone records all the 
file operations. When the master zone is healthy again, the 
admin can sync the files back to the master zone manually.




The secondary zone tracks which metadata operations it has replayed 
from the master zone.  It does this per bucket.


In theory, there's no reason you can't have additional buckets in the 
slave zone that the master zone doesn't have.  Since these buckets 
aren't replicated, there shouldn't be a problem writing to them.  In 
theory, you should even be able to write objects to the existing buckets 
in the slave, as long as the master doesn't have those objects.  I don't 
know what would happen if you created one of those buckets or objects on 
the master.  Maybe replication breaks, or maybe it just overwrites the 
data in the slave.


That's a lot of "in theory" though.  I wouldn't attempt it without a lot 
of simulation in test clusters.


--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Cédric Lemarchand
Hello Dimitri,
> Le 23 mai 2014 à 22:33, Dimitri Maziuk  a écrit :
> 
>> On 05/23/2014 03:06 PM, Craig Lewis wrote:
>> 
>> 1: ZFS or Btrfs snapshots could do this, but neither one are recommended
>> for production.
> 
> Out of curiosity, what's the current beef with zfs? I know what problems
> are cited for btrfs, but I haven't heard much about zfs lately.

The Linux implementation (ZoL) is actually stable for production, but is quite 
memory hungry because of a spl/slab fragmentation issue ...

But I would ask a question: even with a snapshot-capable FS, is it sufficient 
to achieve a consistent backup of a running leveldb? Or did you plan to 
stop/snap/start the mon? (No knowledge at all about leveldb ...)

Cheers 

> 
> -- 
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] collectd / graphite / grafana .. calamari?

2014-05-23 Thread Ricardo Rocha
Hi John.

Thanks for the reply, sounds very good.

The extra visualizations from kibana (grafana only seems to pack a
small subset, but the codebase is basically the same) look cool; I will
put some more in soon - seems like they can still be useful later.

Looking forward to some calamari.

Cheers,
  Ricardo

On Fri, May 23, 2014 at 10:42 PM, John Spray  wrote:
> Hi Ricardo,
>
> Let me share a few notes on metrics in calamari:
>  * We're bundling graphite, and using diamond to send home metrics.
> The diamond collector used in calamari has always been open source
> [1].
>  * The Calamari UI has its own graphs page that talks directly to the
> graphite API (the calamari REST API does not duplicate any of the
> graphing interface)
>  * We also bundle the default graphite dashboard, so that folks can go
> to /graphite/dashboard/ on the calamari server to plot anything custom
> they want to.
>
> It could be quite interesting to hook in Grafana there in the same way
> that we currently hook in the default graphite dashboard, as grafana is
> definitely nicer and would give us a roadmap to influxdb (a
> project I am quite excited about).
>
> Cheers,
> John
>
> 1. https://github.com/ceph/Diamond/commits/calamari
>
> On Fri, May 23, 2014 at 1:58 AM, Ricardo Rocha  wrote:
>> Hi.
>>
>> I saw the thread a couple days ago on ceph-users regarding collectd...
>> and yes, i've been working on something similar for the last few days
>> :)
>>
>> https://github.com/rochaporto/collectd-ceph
>>
>> It has a set of collectd plugins pushing metrics which mostly map what
>> the ceph commands return. In the setup we have it pushes them to
>> graphite and the displays rely on grafana (check for a screenshot in
>> the link above).
>>
>> As it relies on common building blocks, it's easily extensible and
>> we'll come up with new dashboards soon - things like plotting osd data
>> against the metrics from the collectd disk plugin, which we also
>> deploy.
>>
>> This email is mostly to share the work, but also to check on Calamari?
>> I asked Patrick after the RedHat/Inktank news and have no idea what it
>> provides, but i'm sure it comes with lots of extra sauce - he
>> suggested to ask in the list.
>>
>> What's the timeline to have it open sourced? It would be great to have
>> a look at it, and as there's work from different people in this area
>> maybe start working together on some fancier monitoring tools.
>>
>> Regards,
>>   Ricardo
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests

2014-05-23 Thread Győrvári Gábor

Hello,

No, I don't see any backfill entries in ceph.log during that period. The drives 
are WD2000FYYZ-01UL1B1, but I did not find anything suspicious in SMART, and 
yes, I will check the other drives too.


Could I somehow determine which PG the file is placed in?
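Would something like this do it? The pool name is just my guess for a default radosgw setup, and the object name is taken from the log line above:

ceph osd map .rgw.buckets default.4181.1_products/800x600/537e28022fdcc.jpg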

Thanks

2014.05.23. 20:51 keltezéssel, Craig Lewis írta:

On 5/22/14 11:51 , Győrvári Gábor wrote:

Hello,

Got this kind of log on two nodes of a 3 node cluster; both nodes have 2 
OSDs. Only 2 OSDs on two separate nodes are affected, that's why I don't 
understand the situation. There wasn't any extra IO on the system at 
the given time.


Using radosgw with the S3 API to store objects under ceph; average ops are 
around 20-150, with bandwidth usage of 100-2000 KB/s read and only 50-1000 KB/s 
written.


osd_op(client.7821.0:67251068 
default.4181.1_products/800x600/537e28022fdcc.jpg [cmpxattr 
user.rgw.idtag (22) op 1 mode 1,setxattr user.rgw.idtag (33),call 
refcount.put] 11.fe53a6fb e590) v4 *currently waiting for subops from 
[2] **

*


Are any of your PGs in recovery or backfill?

I've seen this happen two different ways.  The first time was because 
I had the recovery and backfill parameters set too high for my 
cluster.  If your journals aren't SSDs, the default parameters are too 
high.  The recovery operation will use most of the IOps, and starve 
the clients.


The second time I saw this was when one disk was starting to fail. 
Sectors started failing, and the drive spent a lot of time reading 
and remapping bad sectors.  Consumer class SATA disks will retry bad 
sectors for 30+ seconds.  It happens in the drive firmware, so it's not 
something you can stop.  Enterprise class drives will give up quicker, 
since they know you have another copy of the data.  (Nobody uses 
enterprise class drives stand-alone; they're always in some sort of 
storage array).


I've had reports of 6+ OSDs blocking subops, and I traced it back to 
one disk that was blocking others.  I replaced that disk, and the 
warnings went away.



If your cluster is healthy, check the SMART attributes for osd.2. If 
osd.2 looks good, it might be another osd.  Check osd.2's logs, and check 
any osds that are blocking osd.2.  If your cluster is small, it might 
be faster to just check all disks instead of following the trail.




--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*




--
Győrvári Gábor - Scr34m
scr...@frontember.hu

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Designing a cluster with ceph and benchmark (ceph vs ext4)

2014-05-23 Thread Christian Balzer

Hello,

On Fri, 23 May 2014 15:41:23 -0300 Listas@Adminlinux wrote:

> Hi !
> 
> I have failover clusters for some aplications. Generally with 2 members 
> configured with Ubuntu + Drbd + Ext4. For example, my IMAP cluster works 
> fine with ~ 50k email accounts and my HTTP cluster hosts ~2k sites.
> 
My mailbox servers are also multiple DRBD based cluster pairs. 
For performance in fully redundant storage there isn't anything better
(in the OSS, generic hardware section at least).

> See design here: http://adminlinux.com.br/cluster_design.txt
> 
> I would like to provide load balancing instead of just failover. So, I 
> would like to use a distributed architecture of the filesystem. As we 
> know, Ext4 isn't a distributed filesystem. So wish to use Ceph in my 
> clusters.
>
You will find that all cluster/distributed filesystems have severe
performance shortcomings when compared to something like Ext4.

On top of that, CephFS isn't ready for production as the MDS isn't HA.

A potential middle way might be to use Ceph/RBD volumes formatted in Ext4.
That doesn't give you shared access, but it will allow you to separate
storage and compute nodes, so when one compute node becomes busy, mount
that volume from a more powerful compute node instead.
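A rough sketch of that middle way with the kernel RBD client (image name, size and
mount point are placeholders):

rbd create --size 102400 mailstore
rbd map mailstore
mkfs.ext4 /dev/rbd/rbd/mailstore
mount /dev/rbd/rbd/mailstore /srv/mail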

That all said, I can't see any way or reason to replace my mailbox DRBD
clusters with Ceph in the foreseeable future.
To get similar performance/reliability to DRBD I would have to spend 3-4
times the money.

Where Ceph/RBD works well is situations where you can't fit the compute
needs into a storage node (as required with DRBD) and where you want to
access things from multiple compute nodes, primarily for migration
purposes. 
In short, as a shared storage for VMs.

> Any suggestions for design of the cluster with Ubuntu+Ceph?
> 
> I built a simple cluster of 2 servers to test simultaneous reading and 
> writing with Ceph. My conf:  http://adminlinux.com.br/ceph_conf.txt
> 
Again, CephFS isn't ready for production, but other than that I know very
little about it as I don't use it.
However, your version of Ceph is severely outdated; you really should be
looking at something more recent to rule out that you're experiencing long-fixed
bugs. The same goes for your entire setup and kernel.

Also Ceph only starts to perform decently with many OSDs (disks) and
the journals on SSDs instead of being on the same disk.
Think DRBD AL metadata-internal, but with MUCH more impact.

Regards,

Christian
> But in my simultaneous benchmarks found errors in reading and writing. I 
> ran "iozone -t 5 -r 4k -s 2m" simultaneously on both servers in the 
> cluster. The performance was poor and had errors like this:
> 
> Error in file: Found ?0? Expecting ?6d6d6d6d6d6d6d6d? addr b660
> Error in file: Position 1060864
> Record # 259 Record size 4 kb
> where b660 loop 0
> 
> Performance graphs of benchmark: http://adminlinux.com.br/ceph_bench.html
> 
> Can you help me find what I did wrong?
> 
> Thanks !
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com