[ceph-users] help: a newbie question

2014-09-11 Thread brandon li
Hi, I am new to the Ceph file system and have a newbie question: for a sparse file, how can the Ceph file system tell whether a hole in the file was never written or a stripe was simply lost? Thanks, Brandon

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-11 Thread Alexandre DERUMIER
results of fio on rbd with the kernel patch: fio rbd, crucial m550, 1 osd, 0.85 (osd_enable_op_tracker true or false, same result): --- bw=12327KB/s, iops=3081. So not much better than before, but this time iostat shows only 15% util, and latencies are lower. Device: rrqm/s
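A run like this can be reproduced with fio's rbd engine directly against an image; a minimal sketch, assuming fio is built with rbd support and using an illustrative pool/image name:

    # create a throwaway test image first (size is in MB)
    rbd create fio-test --pool rbd --size 10240

    # 4k random writes through librbd
    fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio-test \
        --invalidate=0 --rw=randwrite --bs=4k --iodepth=32 \
        --name=rbdbench --runtime=60 --time_based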

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-11 Thread Alexandre DERUMIER
>>For crucial, I'll try to apply the patch from Stefan Priebe to ignore >>flushes (as the crucial m550 has supercaps) >>http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-November/035707.html Here are the results with cache flushes disabled on the crucial m550: #fio --filename=/dev/sdb --direct=1

Re: [ceph-users] Ceph object back up details

2014-09-11 Thread M Ranga Swami Reddy
Thank you. On Sep 8, 2014 9:21 PM, "Yehuda Sadeh" wrote: > Not sure I understand what you ask. Multiple zones within the same > region configuration is described here: > > > http://ceph.com/docs/master/radosgw/federated-config/#multi-site-data-replication > > Yehuda > > On Sun, Sep 7, 2014 at 10:

Re: [ceph-users] Regarding key/value interface

2014-09-11 Thread Somnath Roy
Hi Haomai, > Make perfect sense Sage.. >

Re: [ceph-users] Regarding key/value interface

2014-09-11 Thread Sage Weil
On Fri, 12 Sep 2014, Somnath Roy wrote: > Thanks Sage... > Basically, we are doing similar chunking in our current implementation which > is derived from objectstore. > Moving to Key/value will save us from that :-) > Also, I was thinking, we may want to do compression (later may be dedupe ?) >

Re: [ceph-users] Regarding key/value interface

2014-09-11 Thread Somnath Roy
Thanks Sage... Basically, we are doing similar chunking in our current implementation, which is derived from ObjectStore. Moving to Key/Value will save us from that :-) Also, I was thinking, we may want to do compression (later maybe dedupe?) on that Key/Value layer as well. Yes, partial read/

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-11 Thread Alexandre DERUMIER
Hi, it seems that the intel s3500 performs a lot better with O_DSYNC. crucial m550: #fio --filename=/dev/sdb --direct=1 --rw=write --bs=4k --numjobs=2 --group_reporting --invalidate=0 --name=ab --sync=1 bw=1249.9KB/s, iops=312 intel s3500: --- fio --filename=/dev/sdb --direct=1 --rw=wri
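For reference, the full invocation for this kind of O_DSYNC write test is typically along these lines (a sketch only; /dev/sdX is illustrative, and the test writes directly to the raw device, destroying its contents):

    # 4k synchronous sequential writes, the usual SSD journal suitability check
    # WARNING: this writes to the raw device and will destroy any data on it
    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=2 --group_reporting --invalidate=0 --name=journal-test \
        --runtime=60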

Re: [ceph-users] Regarding key/value interface

2014-09-11 Thread Haomai Wang
On Fri, Sep 12, 2014 at 9:46 AM, Somnath Roy wrote: > > Make perfect sense Sage.. > > Regarding striping of filedata, You are saying KeyValue interface will do the > following for me? > > 1. Say in case of rbd image of order 4 MB, a write request coming to > Key/Value interface, it will chunk t

Re: [ceph-users] Regarding key/value interface

2014-09-11 Thread Haomai Wang
On Fri, Sep 12, 2014 at 9:46 AM, Somnath Roy wrote: > Make perfect sense Sage.. > > Regarding striping of filedata, You are saying KeyValue interface will do > the following for me? > > 1. Say in case of rbd image of order 4 MB, a write request coming to > Key/Value interface, it will chunk the

Re: [ceph-users] Regarding key/value interface

2014-09-11 Thread Haomai Wang
On Fri, Sep 12, 2014 at 9:46 AM, Somnath Roy wrote: > Make perfect sense Sage.. > > Regarding striping of filedata, You are saying KeyValue interface will do > the following for me? > > 1. Say in case of rbd image of order 4 MB, a write request coming to > Key/Value interface, it will chunk the

Re: [ceph-users] Regarding key/value interface

2014-09-11 Thread Sage Weil
On Fri, 12 Sep 2014, Somnath Roy wrote: > Make perfect sense Sage.. > > Regarding striping of filedata, You are saying KeyValue interface will do the > following for me? > > 1. Say in case of rbd image of order 4 MB, a write request coming to > Key/Value interface, it will chunk the object (sa

Re: [ceph-users] Ceph object back up details

2014-09-11 Thread M Ranga Swami Reddy
Thanks for the details. Thanks, Swami. On Sep 8, 2014 9:21 PM, "Yehuda Sadeh" wrote: > Not sure I understand what you ask. Multiple zones within the same > region configuration is described here: > > > http://ceph.com/docs/master/radosgw/federated-config/#multi-site-data-replication > > Yehuda > > On S

Re: [ceph-users] Regarding key/value interface

2014-09-11 Thread Somnath Roy
Makes perfect sense, Sage.. Regarding striping of file data, you are saying the KeyValue interface will do the following for me? 1. Say in the case of an rbd image of order 4 MB, when a write request comes to the Key/Value interface, it will chunk the object (say the full 4MB) into smaller sizes (configurable?) and str
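If the strip size is configurable, one way to see what a running OSD uses is to dump its configuration over the admin socket; a minimal sketch, assuming the KeyValueStore options carry a 'keyvaluestore' prefix (the exact option name is an assumption on my part, hence the broad grep, and osd.0 is just an example id):

    # dump all KeyValueStore-related settings from a running OSD
    ceph daemon osd.0 config show | grep -i keyvaluestore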

Re: [ceph-users] Regarding key/value interface

2014-09-11 Thread Sage Weil
Hi Somnath, On Fri, 12 Sep 2014, Somnath Roy wrote: > > Hi Sage/Haomai, > > If I have a key/value backend that supports transactions and range queries (and I > don't need any explicit caching etc.) and I want to replace filestore (and > leveldb omap) with it, which interface would you recommend de

[ceph-users] Regarding key/value interface

2014-09-11 Thread Somnath Roy
Hi Sage/Haomai, If I have a key/value backend that supports transactions and range queries (and I don't need any explicit caching etc.) and I want to replace filestore (and leveldb omap) with it, which interface would you recommend deriving from: ObjectStore directly, or KeyValueDB? I have already

[ceph-users] Consistent hashing

2014-09-11 Thread Jakes John
Hi, I would like to know a few points regarding the consistent hashing in the CRUSH algorithm. When I read the algorithm, I noticed that if a selected bucket (device) has failed or is overloaded, it is skipped and a new bucket is selected. The same happens if a collision occurs. If such an event happens, how is
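One way to observe the placement behaviour is to run CRUSH offline against the cluster's own map with crushtool; a minimal sketch, with the rule number and replica count as illustrative values:

    # grab the cluster's CRUSH map and test it offline
    ceph osd getcrushmap -o crushmap.bin

    # show how inputs map to OSDs for a given rule and replica count
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-mappings

    # summary view, including inputs CRUSH could not map to enough OSDs
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-statistics

Decompiling the map with crushtool -d, zeroing a device's weight, recompiling with crushtool -c and re-running the same test shows how placements move when a device is effectively removed.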

Re: [ceph-users] Upgraded now MDS won't start

2014-09-11 Thread McNamara, Bradley
That portion of the log confused me, too. However, I had run the same upgrade process on the MDS as all the other cluster components. Firefly was actually installed on the MDS even though the log mentions 0.72.2. At any rate, I ended up stopping the MDS and using 'newfs' on the metadata and d
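For anyone else in the same spot, the 'newfs' step mentioned here is roughly the following on a Firefly-era cluster (a sketch; pool IDs are placeholders, and note this wipes the existing CephFS metadata, so it is a last resort):

    # find the numeric IDs of the metadata and data pools
    ceph osd lspools

    # recreate the filesystem on those pools -- DESTROYS existing cephfs metadata
    ceph mds newfs <metadata-pool-id> <data-pool-id> --yes-i-really-mean-it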

Re: [ceph-users] Cephfs upon Tiering

2014-09-11 Thread Sage Weil
On Thu, 11 Sep 2014, Gregory Farnum wrote: > On Thu, Sep 11, 2014 at 11:39 AM, Sage Weil wrote: > > On Thu, 11 Sep 2014, Gregory Farnum wrote: > >> On Thu, Sep 11, 2014 at 4:13 AM, Kenneth Waegeman > >> wrote: > >> > Hi all, > >> > > >> > I am testing the tiering functionality with cephfs. I used

Re: [ceph-users] Upgraded now MDS won't start

2014-09-11 Thread Gregory Farnum
On Wed, Sep 10, 2014 at 4:24 PM, McNamara, Bradley wrote: > Hello, > > This is my first real issue since running Ceph for several months. Here's > the situation: > > I've been running an Emperor cluster for several months. All was good. I > decided to upgrade since I'm running Ubuntu 13.10 an

Re: [ceph-users] Cephfs upon Tiering

2014-09-11 Thread Gregory Farnum
On Thu, Sep 11, 2014 at 11:39 AM, Sage Weil wrote: > On Thu, 11 Sep 2014, Gregory Farnum wrote: >> On Thu, Sep 11, 2014 at 4:13 AM, Kenneth Waegeman >> wrote: >> > Hi all, >> > >> > I am testing the tiering functionality with cephfs. I used a replicated >> > cache with an EC data pool, and a repl

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-11 Thread Cedric Lemarchand
On 11/09/2014 19:33, Cedric Lemarchand wrote: > On 11/09/2014 08:20, Alexandre DERUMIER wrote: >> Hi Sebastien, >> >> here are my first results with the crucial m550 (I'll send results with the intel s3500 >> later): >> >> - 3 nodes >> - dell r620 without expander backplane >> - sas controller : lsi LSI

Re: [ceph-users] Cephfs upon Tiering

2014-09-11 Thread Sage Weil
On Thu, 11 Sep 2014, Gregory Farnum wrote: > On Thu, Sep 11, 2014 at 4:13 AM, Kenneth Waegeman > wrote: > > Hi all, > > > > I am testing the tiering functionality with cephfs. I used a replicated > > cache with an EC data pool, and a replicated metadata pool like this: > > > > > > ceph osd pool cr

Re: [ceph-users] OpTracker optimization

2014-09-11 Thread Samuel Just
Just added it to wip-sam-testing. -Sam On Thu, Sep 11, 2014 at 11:30 AM, Somnath Roy wrote: > Sam/Sage, > I have addressed all of your comments and pushed the changes to the same pull > request. > > https://github.com/ceph/ceph/pull/2440 > > Thanks & Regards > Somnath > > -Original Message--

Re: [ceph-users] Cephfs upon Tiering

2014-09-11 Thread Gregory Farnum
On Thu, Sep 11, 2014 at 4:13 AM, Kenneth Waegeman wrote: > Hi all, > > I am testing the tiering functionality with cephfs. I used a replicated > cache with an EC data pool, and a replicated metadata pool like this: > > > ceph osd pool create cache 1024 1024 > ceph osd pool set cache size 2 > ceph

Re: [ceph-users] OpTracker optimization

2014-09-11 Thread Somnath Roy
Sam/Sage, I have addressed all of your comments and pushed the changes to the same pull request. https://github.com/ceph/ceph/pull/2440 Thanks & Regards Somnath -Original Message- From: Sage Weil [mailto:sw...@redhat.com] Sent: Wednesday, September 10, 2014 8:33 PM To: Somnath Roy Cc:

[ceph-users] radosgw user creation in secondary site error

2014-09-11 Thread Santhosh Fernandes
Hi All, When I try to create a user in the federated gateway secondary site, I get the following error:
radosgw-admin user create --uid="eu-east" --display-name="Region-EU Zone-East" --name client.radosgw.eu-east-1 --system
2014-09-11 22:34:50.234269 7f3da41327c0 -1 ERROR: region map does not specify maste
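A common cause of this error is that the region map on the secondary instance has not been set up or updated yet. Following the federated-config docs of that era, the steps are roughly (a sketch; eu.json and the region name 'eu' are illustrative, the instance name is taken from the command above):

    # load the region definition (with is_master / master_zone filled in)
    radosgw-admin region set --name client.radosgw.eu-east-1 < eu.json

    # make it the default region for this instance
    radosgw-admin region default --rgw-region=eu --name client.radosgw.eu-east-1

    # regenerate the region map so it knows about the master zone/region
    radosgw-admin regionmap update --name client.radosgw.eu-east-1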

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-11 Thread Cedric Lemarchand
On 11/09/2014 08:20, Alexandre DERUMIER wrote: > Hi Sebastien, > > here are my first results with the crucial m550 (I'll send results with the intel s3500 > later): > > - 3 nodes > - dell r620 without expander backplane > - sas controller : lsi LSI 9207 (no hardware raid or cache) > - 2 x E5-2603v2 1.8GHz

Re: [ceph-users] Is ceph osd reweight always safe to use?

2014-09-11 Thread JR
Greetings Just a follow up on the resolution of this issue. Restarting ceph-osd on one of the nodes solved the problem of the stuck unclean pgs. Thanks, JR On 9/9/2014 2:24 AM, Christian Balzer wrote: > > Hello, > > On Tue, 09 Sep 2014 01:25:17 -0400 JR wrote: > >> Greetings >> >> After run
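For anyone hitting the same symptom, a sequence along these lines identifies the stuck PGs and restarts the relevant OSD (a sketch; the osd id and init flavour are illustrative):

    # see which PGs are stuck and which OSDs they map to
    ceph health detail
    ceph pg dump_stuck unclean

    # restart the ceph-osd daemon on the affected node
    sudo restart ceph-osd id=12              # Ubuntu/upstart
    sudo /etc/init.d/ceph restart osd.12     # sysvinit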

Re: [ceph-users] why one osd-op from client can get two osd-op-reply?

2014-09-11 Thread Gregory Farnum
It's the recovery and backfill code. There's not one place; it's what most of the OSD code is for. On Thursday, September 11, 2014, yuelongguang wrote: > as for the second question, could you tell me where the code is. > how ceph makes size/min_szie copies? > > thanks > > > > > > > > At 2014-09-

Re: [ceph-users] osd cpu usage is bigger than 100%

2014-09-11 Thread Gregory Farnum
Presumably it's going faster when you have a deeper iodepth? So the reason it's using more CPU is because it's doing more work. That's all there is to it. (And the OSD uses a lot more CPU than some storage systems do, because it does a lot more work than them.) -Greg On Thursday, September 11, 201

Re: [ceph-users] Cache Pool writing too much on ssds, poor performance?

2014-09-11 Thread Mark Nelson
I'd take a look at: http://ceph.com/docs/master/rados/operations/pools/ and see if any of the options that govern cache flush behaviour may be affecting things. Specifically: cache_target_dirty_ratio cache_target_full_ratio target_max_bytes target_max_objects cache_min_flush_age cache_min_ev
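These are all per-pool settings applied with 'ceph osd pool set'; a minimal sketch, using the pool name from later in the thread and purely illustrative values:

    ceph osd pool set cache-pool-ssd cache_target_dirty_ratio 0.4
    ceph osd pool set cache-pool-ssd cache_target_full_ratio 0.8
    ceph osd pool set cache-pool-ssd target_max_bytes 500000000000    # ~500 GB
    ceph osd pool set cache-pool-ssd target_max_objects 1000000
    ceph osd pool set cache-pool-ssd cache_min_flush_age 600          # seconds
    ceph osd pool set cache-pool-ssd cache_min_evict_age 1800         # seconds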

Re: [ceph-users] Cache Pool writing too much on ssds, poor performance?

2014-09-11 Thread Andrei Mikhailovsky
Mark, Thanks for a very detailed email. Really appreciate your help on this. I now have a bit more understanding of how it works and understand why I am getting so many writes on the cache ssds. I am, however, having trouble understanding why the cache pool is not keeping the data and is flushing it instead. I'v

Re: [ceph-users] Cache Pool writing too much on ssds, poor performance?

2014-09-11 Thread Mark Nelson
Something that is very important to keep in mind with the way that the cache tier implementation currently works in Ceph is that cache misses are very expensive. It's really important that your workload have a really big hot/cold data skew otherwise it's not going to work well at all. In your

[ceph-users] Striping with cloned images

2014-09-11 Thread Gerhard Wolkerstorfer
Hi, I am running a Ceph cluster that contains the following RBD image:
root@ceph0:~# rbd info -p cephstorage debian_6_0_9_template_system
rbd image 'debian_6_0_9_template_system':
        size 30720 MB in 7680 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.1907c2a
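For comparison, an image with non-default ('fancy') striping can be created and inspected like this (a sketch; the image name, stripe unit and stripe count are illustrative, and image format 2 is required):

    # create a format-2 image striping 64k units across 16 objects
    rbd create -p cephstorage striped-test --size 30720 --image-format 2 \
        --stripe-unit 65536 --stripe-count 16

    # rbd info then also reports the stripe unit and stripe count
    rbd info -p cephstorage striped-test

Running the same rbd info check against a clone shows whether it picked up the parent's striping layout or the defaults.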

Re: [ceph-users] (no subject)

2014-09-11 Thread Alfredo Deza
We discourage users from using `root` to call ceph-deploy or to call it with `sudo` for this reason. We have a warning in the docs about it if you are getting started in the Ceph Node Setup section: http://ceph.com/docs/v0.80.5/start/quick-start-preflight/#ceph-deploy-setup The reason for this is
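The preflight docs referenced here boil down to creating a dedicated user with passwordless sudo on each node, roughly (a sketch; the username 'ceph' follows the docs of that era):

    # on each Ceph node: create the deployment user
    sudo useradd -d /home/ceph -m ceph
    sudo passwd ceph

    # give it passwordless sudo for ceph-deploy to use
    echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
    sudo chmod 0440 /etc/sudoers.d/ceph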

Re: [ceph-users] Rebalancing slow I/O.

2014-09-11 Thread Andrei Mikhailovsky
Irek, have you changed the ceph.conf file to adjust the recovery priority? Options like these might help with prioritising repair/rebuild IO against client IO: osd_recovery_max_chunk = 8388608 osd_recovery_op_priority = 2 osd_max_backfills = 1 osd_recovery_max_active = 1 osd_recovery_th
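These can also be applied to a running cluster without restarting the OSDs via injectargs, for example (values mirror the ones above; ceph.conf changes alone only take effect on restart):

    ceph tell osd.\* injectargs '--osd_max_backfills 1'
    ceph tell osd.\* injectargs '--osd_recovery_max_active 1'
    ceph tell osd.\* injectargs '--osd_recovery_op_priority 2'
    ceph tell osd.\* injectargs '--osd_recovery_max_chunk 8388608'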

Re: [ceph-users] error while installing ceph in cluster node

2014-09-11 Thread Subhadip Bagui
Hi, Please let me know what the issue could be. Regards, Subhadip --- On Thu, Sep 11, 2014 at 9:54 AM, Subhadip Bagui wrote: > Hi, > > I'm getting the below error while installing ceph

[ceph-users] Rebalancing slow I/O.

2014-09-11 Thread Irek Fasikhov
Hi, all. DELL R720X8, 96 OSDs, Network 2x10Gbit LACP. When one of the nodes crashes, I get very slow I/O operations on virtual machines. The cluster map is the default one.
[ceph@ceph08 ~]$ ceph osd tree
# id    weight  type name       up/down reweight
-1      262.1   root defaults
-2      32.76

Re: [ceph-users] Cache Pool writing too much on ssds, poor performance?

2014-09-11 Thread Andrei Mikhailovsky
Hi, I have created the cache tier using the following commands:
ceph osd pool create cache-pool-ssd 2048 2048
ceph osd pool set cache-pool-ssd crush_ruleset 4
ceph osd pool set cache-pool-ssd size 2
ceph osd pool set cache-pool-ssd min_size 1
ceph osd tier add Primary-ubunt

[ceph-users] Cephfs upon Tiering

2014-09-11 Thread Kenneth Waegeman
Hi all, I am testing the tiering functionality with cephfs. I used a replicated cache with an EC data pool, and a replicated metadata pool like this:
ceph osd pool create cache 1024 1024
ceph osd pool set cache size 2
ceph osd pool set cache min_size 1
ceph osd erasure-code-profile set pro
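A setup matching this description (replicated cache in front of an EC data pool, plus a replicated metadata pool) typically continues along these lines (a sketch; the profile parameters, PG counts and pool names are illustrative, not the poster's exact values):

    ceph osd erasure-code-profile set ecprofile k=2 m=1
    ceph osd pool create ecdata 128 128 erasure ecprofile

    # put the replicated cache pool in front of the EC pool
    ceph osd tier add ecdata cache
    ceph osd tier cache-mode cache writeback
    ceph osd tier set-overlay ecdata cache

    # replicated metadata pool and the filesystem itself
    ceph osd pool create metadata 128 128
    ceph fs new cephfs metadata ecdata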

Re: [ceph-users] question about librbd io(fio paramenters)

2014-09-11 Thread yuelongguang
fio parameters:
[global]
ioengine=libaio
direct=1
rw=randwrite
filename=/dev/vdb
time_based
runtime=300
stonewall
[iodepth32]
iodepth=32
bs=4k
At 2014-09-11 05:04:09, "yuelongguang" wrote: hi, josh durgin: please look at my test. inside vm using fio to tes

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-11 Thread Francois Deppierraz
Hi Greg, An attempt to recover pg 3.3ef by copying it from broken osd.6 to working osd.32 resulted in one more broken osd :( Here's what was actually done: root@storage1:~# ceph pg 3.3ef list_missing | head { "offset": { "oid": "", "key": "", "snapid": 0, "hash": 0, "max"

Re: [ceph-users] question about librbd io

2014-09-11 Thread yuelongguang
hi, josh durgin: please look at my test. Inside the vm I am using fio to test rbd performance. fio parameters: direct io, bs=4k, iodepth >> 4. From the information below, it does not match: avgrq-sz is not approximately 8, and avgqu-sz is small and erratic, less than 32. why? in ceph ,

Re: [ceph-users] why one osd-op from client can get two osd-op-reply?

2014-09-11 Thread yuelongguang
As for the second question, could you tell me where the code is? How does ceph make size/min_size copies? thanks At 2014-09-11 12:19:18, "Gregory Farnum" wrote: >On Wed, Sep 10, 2014 at 8:29 PM, yuelongguang wrote: >> >> >> >> >> as for ack and ondisk, ceph has size and min_size to decide

[ceph-users] osd cpu usage is bigger than 100%

2014-09-11 Thread yuelongguang
Hi all, I am testing rbd performance. There is only one vm, which uses rbd as its disk, and inside it fio is doing r/w. The big difference is that I set a large iodepth rather than iodepth=1. According to my test, the bigger the iodepth, the bigger the cpu usage. Analysing the output of top comm