Re: [ceph-users] Number of SSD for OSD journal
16.12.2014 10:53, Daniel Schwager wrote:
> Hello Mike,
>
>> There is also another way:
>> * for CONF 2,3 replace the 200GB SSD with an 800GB one and add another 1-2 SSDs to each node.
>> * make a tier1 read-write cache on the SSDs
>> * you can also put journal partitions on them if you wish - then data will move from SSD to SSD before settling on the HDDs
>> * on the HDDs you can make an erasure pool or a replica pool
>
> Do you have some experience (performance?) with SSDs as a tier1 cache? Maybe some small benchmarks? From the mailing list, I "feel" that SSD tiering is not much used in production.
>
> regards
> Danny

No. But I think it's better than using SSDs only for journals. Look at StorPool or Nutanix (in some way) - they use SSDs as storage / as a long-lived cache in front of storage.

Cache pool tiering is a new feature in Ceph, introduced in Firefly. That explains why cache tiering hasn't been used in production so far.
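For reference, attaching a writeback cache tier boils down to roughly the sequence below. The pool names and PG counts are only illustrative, and it assumes a CRUSH rule already restricts the hot pool to the SSDs:

    ceph osd pool create cold-pool 512             # backing pool (erasure coded or replicated)
    ceph osd pool create hot-pool 128              # SSD-backed cache pool
    ceph osd tier add cold-pool hot-pool           # attach the cache tier
    ceph osd tier cache-mode hot-pool writeback    # absorb writes in the cache
    ceph osd tier set-overlay cold-pool hot-pool   # route client I/O through the cache
    ceph osd pool set hot-pool hit_set_type bloom  # needed by the tiering agent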
Re: [ceph-users] Unable to download files from ceph radosgw node using openstack juno swift client.
Hi,

On Tue, Dec 16, 2014 at 12:54 PM, pushpesh sharma wrote:
> Vivek,
>
> The problem is that the swift client is only downloading a chunk of the object, not the whole object, so the etag mismatches. Could you paste the value of 'rgw_max_chunk_size'? Please be sure you set this to a sane value (< 4 MB; at least for the Giant release this works below that value).

Where can I find the rgw_max_chunk_size? I am using Ceph Firefly.

Regards,
--
Vivek Varghese Cherian
Re: [ceph-users] Unable to download files from ceph radosgw node using openstack juno swift client.
Hi,

root@ppm-c240-ceph3:/var/run/ceph# ceph --admin-daemon /var/run/ceph/ceph-osd.11.asok config show | less | grep rgw_max_chunk_size
  "rgw_max_chunk_size": "524288",
root@ppm-c240-ceph3:/var/run/ceph#

So the value is 524288 bytes (512 KB), i.e. below 4 MB.

Regards,
--
Vivek Varghese Cherian
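If the chunk size does need changing, it is an rgw option set in ceph.conf on the gateway host, followed by a restart of the radosgw daemon. A minimal sketch, assuming the gateway section is named [client.radosgw.gateway] as elsewhere in this digest; the value shown is just the same 512 KB expressed in bytes:

    [client.radosgw.gateway]
        rgw max chunk size = 524288    # bytes; keep this below 4 MB per the advice above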
Re: [ceph-users] Dual RADOSGW Network
Thanks Craig. I will try that!

I thought it was more complicated than that because of the entries for the "public_network" and "rgw dns name" in the config file... I will give it a try.

Best,

George

That shouldn't be a problem. Just have Apache bind to all interfaces instead of the external IP.

In my case, I only have Apache bound to the internal interface. My load balancer has an external and internal IP, and I'm able to talk to it on both interfaces.

On Mon, Dec 15, 2014 at 2:00 PM, Georgios Dimitrakakis wrote:

Hi all!

I have a single CEPH node which has two network interfaces.

One is configured to be accessed directly by the internet (153.*) and the other one is configured on an internal LAN (192.*)

For the moment radosgw is listening on the external (internet) interface.

Can I configure radosgw to be accessed by both interfaces? What I would like to do is to save bandwidth and time for the machines on the internal network and use the internal net for all rados communications.

Any ideas?

Best regards,

George
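On the Apache side, "bind to all interfaces" usually comes down to the Listen directive and a wildcard virtual host - a sketch only, with illustrative file path and server name:

    # /etc/apache2/ports.conf - listen on every interface rather than a single IP
    Listen 80

    # and the radosgw virtual host matches any address:
    # <VirtualHost *:80>
    #     ServerName rgw.example.com
    # </VirtualHost>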
[ceph-users] radosgw timeout
I have a 3 node Ceph 0.87 cluster. After a while I see an error in radosgw and I don't find references in the list archives:

heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7fc4eac2d700' had timed out after 600

The only solution is to restart radosgw, and then for a while it works just fine. Any idea?

Thanks

ceph.conf:

[global]
fsid = fc0e2e09-ade3-4ff6-b23e-f789775b2515
mon initial members = nodo-3
mon host = 192.168.2.200, 192.168.2.201, 192.168.2.202
mon addr = 192.168.2.200:6789, 192.168.2.201:6789, 192.168.2.202:6789
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min_size = 1
osd pool default pg_num = 128
osd pool default pgp_num = 128
osd recovery delay start = 15
log file = /dev/stdout
mon clock drift allowed = 1

[client.radosgw.gateway]
host = deis-store-gateway
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/radosgw.log

Full trace:

2014-12-15 21:59:27.976981 7fc70cb1c840 0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process radosgw, pid 127
2014-12-15 21:59:28.005388 7fc70cb1c840 0 framework: fastcgi
2014-12-15 21:59:28.005393 7fc70cb1c840 0 framework: civetweb
2014-12-15 21:59:28.005398 7fc70cb1c840 0 framework conf key: port, val: 7480
2014-12-15 21:59:28.005402 7fc70cb1c840 0 starting handler: civetweb
2014-12-15 21:59:28.010659 7fc70cb1c840 0 starting handler: fastcgi
2014-12-15 21:59:39.961503 7fc55cd11700 1 == starting new request req=0x7fc6ec1148e0 =
2014-12-15 21:59:39.965239 7fc55cd11700 1 == req done req=0x7fc6ec1148e0 http_status=200 ==
2014-12-15 21:59:40.033219 7fc554500700 1 == starting new request req=0x7fc6ec11c190 =
2014-12-15 21:59:40.038634 7fc554500700 0 WARNING: couldn't find acl header for object, generating default
2014-12-15 21:59:40.348267 7fc554500700 1 == req done req=0x7fc6ec11c190 http_status=200 ==
2014-12-15 22:00:42.522831 7fc554500700 1 == starting new request req=0x7fc6ec11c220 =
2014-12-15 22:00:42.786590 7fc554500700 1 == req done req=0x7fc6ec11c220 http_status=200 ==
2014-12-15 22:04:41.906676 7fc55cd11700 1 == starting new request req=0x7fc6ec11c4c0 =
2014-12-15 22:04:42.077969 7fc55cd11700 1 == req done req=0x7fc6ec11c4c0 http_status=200 ==
2014-12-15 22:09:42.270387 7fc554500700 1 == starting new request req=0x7fc6ec11bb90 =
2014-12-15 22:09:42.634896 7fc554500700 1 == req done req=0x7fc6ec11bb90 http_status=200 ==
2014-12-15 22:14:42.812094 7fc554500700 1 == starting new request req=0x7fc6ec11a2c0 =
2014-12-15 22:14:43.027164 7fc554500700 1 == req done req=0x7fc6ec11a2c0 http_status=200 ==
2014-12-15 22:19:43.330578 7fc5acdb1700 1 == starting new request req=0x7fc6ec11a560 =
2014-12-15 22:19:43.505847 7fc5acdb1700 1 == req done req=0x7fc6ec11a560 http_status=200 ==
2014-12-15 22:24:31.664914 7fc6fb7fe700 0 monclient: hunting for new mon
2014-12-15 22:24:31.691258 7fc70cb14700 0 -- 192.168.2.201:0/1000131 >> 192.168.2.202:6800/1 pipe(0x7fc6f0120610 sd=9 :0 s=1 pgs=0 cs=0 l=1 c=0x7fc6f01208a0).fault
2014-12-15 22:24:43.653020 7fc5acdb1700 1 == starting new request req=0x7fc6ec11a3b0 =
2014-12-15 22:24:49.093981 7fc55cd11700 1 == starting new request req=0x7fc6ec119d60 =
2014-12-15 22:24:55.165618 7fc51347e700 1 == starting new request req=0x7fc6ec121290 =
2014-12-15 22:25:04.181370 7fc57cd51700 1 == starting new request req=0x7fc6ec125fa0 =
2014-12-15 22:25:11.936946 7fc531cbb700 1 == starting new request req=0x7fc6ec12ad20 =
2014-12-15 22:25:12.401848 7fc5acdb1700 1 == req done req=0x7fc6ec11a3b0 http_status=200 ==
2014-12-15 22:25:12.402031 7fc57cd51700 1 == req done req=0x7fc6ec125fa0 http_status=200 =
2014-12-15 22:25:12.402164 7fc51347e700 1 == req done req=0x7fc6ec121290 http_status=200 ==
2014-12-15 22:25:12.402286 7fc531cbb700 1 == req done req=0x7fc6ec12ad20 http_status=200 ==
2014-12-15 22:25:12.574183 7fc55cd11700 1 == req done req=0x7fc6ec119d60 http_status=200 ==
2014-12-15 22:28:44.138277 7fc531cbb700 1 == starting new request req=0x7fc6ec12fa80 =
2014-12-15 22:28:44.277586 7fc531cbb700 1 == req done req=0x7fc6ec12fa80 http_status=200 ==
2014-12-15 22:29:44.023631 7fc531cbb700 1 == starting new request req=0x7fc6ec11c560 =
2014-12-15 22:29:44.233772 7fc531cbb700 1 == req done req=0x7fc6ec11c560 http_status=200 ==
2014-12-15 22:34:43.458371 7fc51347e700 1 == starting new request req=0x7fc6ec119dc0 =
2014-12-15 22:34:43.618785 7fc51347e700 1 == req done req=0x7fc6ec119dc0 http_status=200 ==
2014-12-15 22:39:43.772838 7fc531cbb700 1 == starting new request req=0x7fc6ec11c560 =
2014-12-15 22:39:43.954160 7fc5
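Without more detail in the log it is hard to say what the stuck thread was waiting on; one low-risk first step is to raise the gateway's log verbosity until the timeout reappears. A sketch only, and note these levels are very chatty:

    [client.radosgw.gateway]
        debug rgw = 20
        debug ms = 1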
[ceph-users] OSD Crash makes whole cluster unusable ?
Hi there,

today I had an OSD crash with ceph 0.87/giant which made my whole cluster unusable for 45 minutes.

First it began with a disk error:

sd 0:1:2:0: [sdc] CDB: Read(10)Read(10):: 28 28 00 00 0d 15 fe d0 fd 7b e8 f8 00 00 00 00 b0 08 00 00
XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.

Then most other OSDs found out that my osd.3 is down:

2014-12-16 08:45:15.873478 mon.0 10.67.1.11:6789/0 3361077 : cluster [INF] osd.3 10.67.1.11:6810/713621 failed (42 reports from 35 peers after 23.642482 >= grace 23.348982)

5 minutes later the osd is marked as out:

2014-12-16 08:50:21.095903 mon.0 10.67.1.11:6789/0 3361367 : cluster [INF] osd.3 out (down for 304.581079)

However, from 8:45 until 9:20 I have 1000 slow requests and 107 incomplete pgs. Many requests are not answered:

2014-12-16 08:46:03.029094 mon.0 10.67.1.11:6789/0 3361126 : cluster [INF] pgmap v6930583: 4224 pgs: 4117 active+clean, 107 incomplete; 7647 GB data, 19090 GB used, 67952 GB / 87042 GB avail; 2307 kB/s rd, 2293 kB/s wr, 407 op/s

Also, a recovery to another OSD was not starting.

It seems the osd thinks it is still up while all other OSDs think this osd is down? I found this in the log of osd.3:

ceph-osd.3.log:2014-12-16 08:45:19.319152 7faf81296700 0 log_channel(default) log [WRN] : map e61177 wrongly marked me down
ceph-osd.3.log: -440> 2014-12-16 08:45:19.319152 7faf81296700 0 log_channel(default) log [WRN] : map e61177 wrongly marked me down

Luckily I was able to restart osd.3 and everything was working again, but I do not understand what happened. The cluster was simply not usable for 45 minutes.

Any ideas?

Thanks
Christoph
Re: [ceph-users] rbd snapshot slow restore
On Tue, 16 Dec 2014 11:26:35 AM you wrote:
> Is this normal? Is ceph just really slow at restoring rbd snapshots, or have I really borked my setup?

I'm not looking for a fix or tuning suggestions, just feedback on whether this is normal.

--
Lindsay
Re: [ceph-users] can not add osd
Hi

Your logs do not provide much information. If you are following any documentation other than the official Ceph docs, I would recommend you follow the official ones:

http://ceph.com/docs/master/start/quick-start-preflight/

Karan Singh
Systems Specialist, Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/

On 16 Dec 2014, at 09:55, yang.bi...@zte.com.cn wrote:
> hi
>
> When I execute "ceph-deploy osd prepare node3:/dev/sdb", an error always comes out like this:
>
> [node3][WARNIN] INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.u2KXW3
> [node3][WARNIN] umount: /var/lib/ceph/tmp/mnt.u2KXW3: target is busy.
>
> Then I execute "/bin/umount -- /var/lib/ceph/tmp/mnt.u2KXW3" and the result is ok.
Re: [ceph-users] RESOLVED Re: Cluster with pgs in active (unclean) status
Hi Gregory,

Sorry for the delay getting back. There was no activity at all on those 3 pools. Activity on the fourth pool was under 1 Mbps of writes.

I think I waited several hours, but I can't recall exactly. One hour at least is for sure.

Thanks
Eneko

On 11/12/14 19:32, Gregory Farnum wrote:

Was there any activity against your cluster when you reduced the size from 3 -> 2? I think maybe it was just taking time to percolate through the system if nothing else was going on. When you reduced them to size 1 then data needed to be deleted so everything woke up and started processing.
-Greg

On Wed, Dec 10, 2014 at 5:27 AM, Eneko Lacunza wrote:

Hi all,

I fixed the issue with the following commands:

# ceph osd pool set data size 1
(wait some seconds for clean+active state of +64pgs)
# ceph osd pool set data size 2
# ceph osd pool set metadata size 1
(wait some seconds for clean+active state of +64pgs)
# ceph osd pool set metadata size 2
# ceph osd pool set rbd size 1
(wait some seconds for clean+active state of +64pgs)
# ceph osd pool set rbd size 2

This now gives me:

# ceph status
    cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7
     health HEALTH_OK
     monmap e3: 3 mons at {0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election epoch 32, quorum 0,1,2 1,2,0
     osdmap e275: 2 osds: 2 up, 2 in
      pgmap v395557: 256 pgs, 4 pools, 194 GB data, 49820 objects
            388 GB used, 116 GB / 505 GB avail
                 256 active+clean

I'm still curious whether this can be fixed without this trick?

Cheers
Eneko

On 10/12/14 13:14, Eneko Lacunza wrote:

Hi all,

I have a small ceph cluster with just 2 OSDs, latest firefly. Default data, metadata and rbd pools were created with size=3 and min_size=1. An additional pool rbd2 was created with size=2 and min_size=1.

This would give me a warning status, saying that 64 pgs were active+clean and 192 active+degraded (there are 64 pg per pool). I realized it was due to the size=3 in the three pools, so I changed that value to 2:

# ceph osd pool set data size 2
# ceph osd pool set metadata size 2
# ceph osd pool set rbd size 2

Those 3 pools are empty. After those commands status would report 64 pgs active+clean, and 192 pgs active, with a warning saying 192 pgs were unclean.
I have created a rbd block with: rbd create -p rbd --image test --size 1024 And now the status is: # ceph status cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7 health HEALTH_WARN 192 pgs stuck unclean; recovery 2/99640 objects degraded (0.002%) monmap e3: 3 mons at {0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election epoch 32, quorum 0,1,2 1,2,0 osdmap e263: 2 osds: 2 up, 2 in pgmap v393763: 256 pgs, 4 pools, 194 GB data, 49820 objects 388 GB used, 116 GB / 505 GB avail 2/99640 objects degraded (0.002%) 192 active 64 active+clean Looking to an unclean non-empty pg: # ceph pg 2.14 query { "state": "active", "epoch": 263, "up": [ 0, 1], "acting": [ 0, 1], "actingbackfill": [ "0", "1"], "info": { "pgid": "2.14", "last_update": "263'1", "last_complete": "263'1", "log_tail": "0'0", "last_user_version": 1, "last_backfill": "MAX", "purged_snaps": "[]", "history": { "epoch_created": 1, "last_epoch_started": 136, "last_epoch_clean": 136, "last_epoch_split": 0, "same_up_since": 135, "same_interval_since": 135, "same_primary_since": 11, "last_scrub": "0'0", "last_scrub_stamp": "2014-11-26 12:23:57.023493", "last_deep_scrub": "0'0", "last_deep_scrub_stamp": "2014-11-26 12:23:57.023493", "last_clean_scrub_stamp": "0.00"}, "stats": { "version": "263'1", "reported_seq": "306", "reported_epoch": "263", "state": "active", "last_fresh": "2014-12-10 12:53:37.766465", "last_change": "2014-12-10 10:32:24.189000", "last_active": "2014-12-10 12:53:37.766465", "last_clean": "0.00", "last_became_active": "0.00", "last_unstale": "2014-12-10 12:53:37.766465", "mapping_epoch": 128, "log_start": "0'0", "ondisk_log_start": "0'0", "created": 1, "last_epoch_clean": 136, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "0'0", "last_scrub_stamp": "2014-11-26 12:23:57.023493", "last_deep_scrub": "0'0", "last_deep_scrub_stamp": "2014-11-26 12:23:57.023493", "last_clean_scrub_stamp": "0.00", "log_size": 1, "ondisk_log_size": 1, "stats_invalid": "0", "stat_sum": { "num_bytes": 112, "num_objects": 1, "num_object_clones": 0,
Re: [ceph-users] Number of SSD for OSD journal
On Tue, 16 Dec 2014 12:10:42 +0300 Mike wrote:
> 16.12.2014 10:53, Daniel Schwager wrote:
> > Hello Mike,
> >
> >> There is also another way:
> >> * for CONF 2,3 replace the 200GB SSD with an 800GB one and add another 1-2 SSDs to each node.
> >> * make a tier1 read-write cache on the SSDs
> >> * you can also put journal partitions on them if you wish - then data will move from SSD to SSD before settling on the HDDs
> >> * on the HDDs you can make an erasure pool or a replica pool
> >
> > Do you have some experience (performance?) with SSDs as a tier1 cache? Maybe some small benchmarks? From the mailing list, I "feel" that SSD tiering is not much used in production.
> >
> > regards
> > Danny
>
> No. But I think it's better than using SSDs only for journals. Look at StorPool or Nutanix (in some way) - they use SSDs as storage / as a long-lived cache in front of storage.
>
Unfortunately a promising design doesn't make a well rounded, working solution.

> Cache pool tiering is a new feature in Ceph, introduced in Firefly. That explains why cache tiering hasn't been used in production so far.
>
If you'd followed the various discussions here, you'd know that SSD based cache tiers are pointless (from a performance perspective) in Firefly and still riddled with bugs in Giant, with only minor improvements.

They show great promise/potential and I'm looking forward to using them, but right now (and probably for the next 1-2 releases) the best bang for the buck in speeding up Ceph is classic SSD journals for writes and lots of RAM for reads.

Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
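For the classic SSD-journal layout Christian mentions, ceph-deploy can place the journal on a separate device at prepare time. A sketch only - host and device names are illustrative, and the SSD needs room for one journal partition per OSD it serves:

    # data on the HDD /dev/sdb, journal carved out of the SSD /dev/sdf
    ceph-deploy osd prepare node1:sdb:/dev/sdf
    ceph-deploy osd activate node1:/dev/sdb1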
Re: [ceph-users] rbd snapshot slow restore
On 2014-12-16 14:53, Lindsay Mathieson wrote:

>> Is this normal? Is ceph just really slow at restoring rbd snapshots, or have I really borked my setup?
>
> I'm not looking for a fix or tuning suggestions, just feedback on whether this is normal

That is my experience as well. I rolled back a 1.5 TB volume once, and had to leave it running overnight before it would complete.

--
Carl-Johan Schenström
Driftansvarig / System Administrator
Språkbanken & Svensk nationell datatjänst / The Swedish Language Bank & Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenst...@gu.se / +46 709 116769
Re: [ceph-users] rbd snapshot slow restore
On 12/16/2014 04:14 PM, Carl-Johan Schenström wrote:
> On 2014-12-16 14:53, Lindsay Mathieson wrote:
>
>>> Is this normal? Is ceph just really slow at restoring rbd snapshots, or have I really borked my setup?
>>
>> I'm not looking for a fix or tuning suggestions, just feedback on whether this is normal
>
> That is my experience as well. I rolled back a 1.5 TB volume once, and had to leave it running overnight before it would complete.
>

That is normal behavior. Snapshotting itself is a fast process, but restoring means merging and rolling back.

It's easier to protect a snapshot and clone it into a new image and use that one. Afterwards you can flatten the image to detach the clone from the parent. Never tried if this can be done live.

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
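The protect/clone/flatten sequence Wido describes looks roughly like this - image and snapshot names are only illustrative:

    rbd snap protect rbd/vm-disk@snap1              # parent snapshots must be protected before cloning
    rbd clone rbd/vm-disk@snap1 rbd/vm-disk-restore # use this clone as the new image
    rbd flatten rbd/vm-disk-restore                 # optional: copy parent data in and detach from the snapshot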
Re: [ceph-users] rbd snapshot slow restore
Hi,

>> That is normal behavior. Snapshotting itself is a fast process, but restoring means merging and rolling back.

Any future plan to add something similar to zfs or netapp, where you can instantly roll back a snapshot?

(Not sure it's technically possible to implement such a snapshot with distributed storage.)
[ceph-users] rbd read speed only 1/4 of write speed
Hello,

Read speed inside our VMs (most of them Windows) is only ¼ of the write speed. Write speed is about 450-500 MB/s and read is only about 100 MB/s.

Our network is 10Gbit for the OSDs and 10Gbit for the MONs. We have 3 servers with 15 OSDs each.
Re: [ceph-users] rbd snapshot slow restore
There are really only two ways to do snapshots that I know of, and they have trade-offs:

COW into the snapshot (like VMware, Ceph, etc.):

When a write is committed, the changes are committed to a diff file and the base file is left untouched. This only has a single write penalty; if you want to discard the child, it is fast as you just delete the diff file. The negative side effects are that reads may have to query each diff file before being satisfied, and if you want to delete the snapshot but keep the changes (merge the snapshot into the base), then you have to copy all the diff blocks into the base image.

COW into the base image (like most enterprise disk systems with snapshots for backups):

When a write is committed, the system reads the blocks to be changed out of the base disk and places those original blocks into a diff file, then writes the new blocks directly into the base image. The pros of this approach are that snapshots can be deleted quickly and the data is "merged" already. Read access for the current data is always fast as it only has to search one location. The cons are that each write is really a read and two writes, and recovering data from a snapshot can be slow as the reads have to search one or more snapshots.

My experience is that you can't have your cake and eat it too. If you have the choice, you choose the option that fits your use case best. Ceph doesn't have the ability to select which snapshot method it uses (most systems don't). I hope that helps explain why the request is not easily fulfilled.

On Tue, Dec 16, 2014 at 9:04 AM, Alexandre DERUMIER wrote:
> Hi,
>
>>> That is normal behavior. Snapshotting itself is a fast process, but restoring means merging and rolling back.
>
> Any future plan to add something similar to zfs or netapp, where you can instantly roll back a snapshot?
>
> (Not sure it's technically possible to implement such a snapshot with distributed storage.)
Re: [ceph-users] Dual RADOSGW Network
You may need split-horizon DNS. The internal machines' DNS should resolve to the internal IP, and the external machines' DNS should resolve to the external IP. There are various ways to do that. The RadosGW config has an example of setting up dnsmasq:
http://ceph.com/docs/master/radosgw/config/#enabling-subdomain-s3-calls

On Tue, Dec 16, 2014 at 3:05 AM, Georgios Dimitrakakis wrote:
>
> Thanks Craig.
>
> I will try that!
>
> I thought it was more complicated than that because of the entries for the "public_network" and "rgw dns name" in the config file... I will give it a try.
>
> Best,
>
> George
>
>> That shouldn't be a problem. Just have Apache bind to all interfaces instead of the external IP.
>>
>> In my case, I only have Apache bound to the internal interface. My load balancer has an external and internal IP, and I'm able to talk to it on both interfaces.
>>
>> On Mon, Dec 15, 2014 at 2:00 PM, Georgios Dimitrakakis wrote:
>>
>>> Hi all!
>>>
>>> I have a single CEPH node which has two network interfaces.
>>>
>>> One is configured to be accessed directly by the internet (153.*) and the other one is configured on an internal LAN (192.*)
>>>
>>> For the moment radosgw is listening on the external (internet) interface.
>>>
>>> Can I configure radosgw to be accessed by both interfaces? What I would like to do is to save bandwidth and time for the machines on the internal network and use the internal net for all rados communications.
>>>
>>> Any ideas?
>>>
>>> Best regards,
>>>
>>> George
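The dnsmasq side of that is essentially one line per view - a sketch with illustrative names and addresses, served only to the internal network while the public DNS keeps pointing at the external IP:

    # dnsmasq.conf on the internal resolver: send the gateway name (and any
    # S3-style bucket subdomains) to the internal interface
    address=/.rgw.example.com/192.168.1.10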
Re: [ceph-users] rbd read speed only 1/4 of write speed
On 17/12/14 05:26, VELARTIS Philipp Dürhammer wrote:
> Hello,
>
> Read speed inside our vms (most of them windows) is only ¼ of the write speed.
> Write speed is about 450MB/s – 500mb/s and read is only about 100MB/s
>
> Our network is 10Gbit for OSDs and 10Gbit for MONS. We have 3 Servers with 15 osds each

We saw similar things, until we started playing around with read ahead parameters inside the VMs. Our environment is almost 100% Ubuntu, but the same basic principles should hold.

I'm pretty sure that there have been previous posts to the lists about this, but the value(s) we tweaked are:

/sys/block/$device/queue/read_ahead_kb

It defaults to 128, but we had pretty drastic increases to read speed all of the way up to around 8192 (8 MB) with no obvious regressions to random read speed. I'm not sure what the equivalent option is in Windows, sorry.

Unfortunately this is a per VM (per disk per VM, even) setting, but it can be automated to some degree. We have a udev rule snippet pushed out to each VM in order to set the value(s).

It may also be worth investigating read ahead options on the storage nodes themselves, both at the OS and disk controller levels. This isn't something we've yet been able to test, however.

--
David Clarke
Systems Architect
Catalyst IT
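The sort of thing David describes can be done ad hoc or via udev - a sketch only; the device match, the 8192 value and the rules file name are illustrative and will differ per distro and VM:

    # one-off, per device (value in KB):
    echo 8192 > /sys/block/vda/queue/read_ahead_kb

    # persistent, e.g. in /etc/udev/rules.d/99-readahead.rules:
    # SUBSYSTEM=="block", KERNEL=="vd[a-z]", ACTION=="add|change", ATTR{queue/read_ahead_kb}="8192"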
[ceph-users] Erasure coded PGs incomplete
Hello, I'm trying to create an erasure pool following http://docs.ceph.com/docs/master/rados/operations/erasure-code/, but when I try create a pool with a specifc erasure-code-profile ("myprofile") the PGs became on incomplete state. Anyone can help me? Below the profile I created: root@ceph0001:~# ceph osd erasure-code-profile get myprofile directory=/usr/lib/ceph/erasure-code k=6 m=2 plugin=jerasure technique=reed_sol_van The status of cluster: root@ceph0001:~# ceph health HEALTH_WARN 12 pgs incomplete; 12 pgs stuck inactive; 12 pgs stuck unclean health detail: root@ceph0001:~# ceph health detail HEALTH_WARN 12 pgs incomplete; 12 pgs stuck inactive; 12 pgs stuck unclean pg 2.9 is stuck inactive since forever, current state incomplete, last acting [4,10,15,2147483647,3,2147483647,2147483647,2147483647] pg 2.8 is stuck inactive since forever, current state incomplete, last acting [0,2147483647,4,2147483647,10,2147483647,15,2147483647] pg 2.b is stuck inactive since forever, current state incomplete, last acting [8,3,14,2147483647,5,2147483647,2147483647,2147483647] pg 2.a is stuck inactive since forever, current state incomplete, last acting [11,7,2,2147483647,2147483647,2147483647,15,2147483647] pg 2.5 is stuck inactive since forever, current state incomplete, last acting [12,8,5,1,2147483647,2147483647,2147483647,2147483647] pg 2.4 is stuck inactive since forever, current state incomplete, last acting [5,2147483647,13,1,2147483647,2147483647,8,2147483647] pg 2.7 is stuck inactive since forever, current state incomplete, last acting [12,2,10,7,2147483647,2147483647,2147483647,2147483647] pg 2.6 is stuck inactive since forever, current state incomplete, last acting [9,15,2147483647,4,2,2147483647,2147483647,2147483647] pg 2.1 is stuck inactive since forever, current state incomplete, last acting [2,4,2147483647,13,2147483647,10,2147483647,2147483647] pg 2.0 is stuck inactive since forever, current state incomplete, last acting [14,1,2147483647,4,10,2147483647,2147483647,2147483647] pg 2.3 is stuck inactive since forever, current state incomplete, last acting [14,11,6,2147483647,2147483647,2147483647,2,2147483647] pg 2.2 is stuck inactive since forever, current state incomplete, last acting [13,5,11,2147483647,2147483647,3,2147483647,2147483647] pg 2.9 is stuck unclean since forever, current state incomplete, last acting [4,10,15,2147483647,3,2147483647,2147483647,2147483647] pg 2.8 is stuck unclean since forever, current state incomplete, last acting [0,2147483647,4,2147483647,10,2147483647,15,2147483647] pg 2.b is stuck unclean since forever, current state incomplete, last acting [8,3,14,2147483647,5,2147483647,2147483647,2147483647] pg 2.a is stuck unclean since forever, current state incomplete, last acting [11,7,2,2147483647,2147483647,2147483647,15,2147483647] pg 2.5 is stuck unclean since forever, current state incomplete, last acting [12,8,5,1,2147483647,2147483647,2147483647,2147483647] pg 2.4 is stuck unclean since forever, current state incomplete, last acting [5,2147483647,13,1,2147483647,2147483647,8,2147483647] pg 2.7 is stuck unclean since forever, current state incomplete, last acting [12,2,10,7,2147483647,2147483647,2147483647,2147483647] pg 2.6 is stuck unclean since forever, current state incomplete, last acting [9,15,2147483647,4,2,2147483647,2147483647,2147483647] pg 2.1 is stuck unclean since forever, current state incomplete, last acting [2,4,2147483647,13,2147483647,10,2147483647,2147483647] pg 2.0 is stuck unclean since forever, current state incomplete, last acting 
[14,1,2147483647,4,10,2147483647,2147483647,2147483647] pg 2.3 is stuck unclean since forever, current state incomplete, last acting [14,11,6,2147483647,2147483647,2147483647,2,2147483647] pg 2.2 is stuck unclean since forever, current state incomplete, last acting [13,5,11,2147483647,2147483647,3,2147483647,2147483647] pg 2.9 is incomplete, acting [4,10,15,2147483647,3,2147483647,2147483647,2147483647] (reducing pool ecpool min_size from 6 may help; search ceph.com/docs for 'incomplete') pg 2.8 is incomplete, acting [0,2147483647,4,2147483647,10,2147483647,15,2147483647] (reducing pool ecpool min_size from 6 may help; search ceph.com/docs for 'incomplete') pg 2.b is incomplete, acting [8,3,14,2147483647,5,2147483647,2147483647,2147483647] (reducing pool ecpool min_size from 6 may help; search ceph.com/docs for 'incomplete') pg 2.a is incomplete, acting [11,7,2,2147483647,2147483647,2147483647,15,2147483647] (reducing pool ecpool min_size from 6 may help; search ceph.com/docs for 'incomplete') pg 2.5 is incomplete, acting [12,8,5,1,2147483647,2147483647,2147483647,2147483647] (reducing pool ecpool min_size from 6 may help; search ceph.com/docs for 'incomplete') pg 2.4 is incomplete, acting [5,2147483647,13,1,2147483647,2147483647,8,2147483647] (reducing pool ecpool min_size from 6 may help; search ceph.com/docs for 'incomplete') pg 2.7 is incomplete, acting [12,2,10,7,2147483
Re: [ceph-users] Erasure coded PGs incomplete
Hi, The 2147483647 means that CRUSH did not find enough OSD for a given PG. If you check the crush rule associated with the erasure coded pool, you will most probably find why. Cheers On 16/12/2014 23:32, Italo Santos wrote: > Hello, > > I'm trying to create an erasure pool following > http://docs.ceph.com/docs/master/rados/operations/erasure-code/, but when I > try create a pool with a specifc erasure-code-profile ("myprofile") the PGs > became on incomplete state. > > Anyone can help me? > > Below the profile I created: > root@ceph0001:~# ceph osd erasure-code-profile get myprofile > directory=/usr/lib/ceph/erasure-code > k=6 > m=2 > plugin=jerasure > technique=reed_sol_van > > The status of cluster: > root@ceph0001:~# ceph health > HEALTH_WARN 12 pgs incomplete; 12 pgs stuck inactive; 12 pgs stuck unclean > > health detail: > root@ceph0001:~# ceph health detail > HEALTH_WARN 12 pgs incomplete; 12 pgs stuck inactive; 12 pgs stuck unclean > pg 2.9 is stuck inactive since forever, current state incomplete, last acting > [4,10,15,2147483647,3,2147483647,2147483647,2147483647] > pg 2.8 is stuck inactive since forever, current state incomplete, last acting > [0,2147483647,4,2147483647,10,2147483647,15,2147483647] > pg 2.b is stuck inactive since forever, current state incomplete, last acting > [8,3,14,2147483647,5,2147483647,2147483647,2147483647] > pg 2.a is stuck inactive since forever, current state incomplete, last acting > [11,7,2,2147483647,2147483647,2147483647,15,2147483647] > pg 2.5 is stuck inactive since forever, current state incomplete, last acting > [12,8,5,1,2147483647,2147483647,2147483647,2147483647] > pg 2.4 is stuck inactive since forever, current state incomplete, last acting > [5,2147483647,13,1,2147483647,2147483647,8,2147483647] > pg 2.7 is stuck inactive since forever, current state incomplete, last acting > [12,2,10,7,2147483647,2147483647,2147483647,2147483647] > pg 2.6 is stuck inactive since forever, current state incomplete, last acting > [9,15,2147483647,4,2,2147483647,2147483647,2147483647] > pg 2.1 is stuck inactive since forever, current state incomplete, last acting > [2,4,2147483647,13,2147483647,10,2147483647,2147483647] > pg 2.0 is stuck inactive since forever, current state incomplete, last acting > [14,1,2147483647,4,10,2147483647,2147483647,2147483647] > pg 2.3 is stuck inactive since forever, current state incomplete, last acting > [14,11,6,2147483647,2147483647,2147483647,2,2147483647] > pg 2.2 is stuck inactive since forever, current state incomplete, last acting > [13,5,11,2147483647,2147483647,3,2147483647,2147483647] > pg 2.9 is stuck unclean since forever, current state incomplete, last acting > [4,10,15,2147483647,3,2147483647,2147483647,2147483647] > pg 2.8 is stuck unclean since forever, current state incomplete, last acting > [0,2147483647,4,2147483647,10,2147483647,15,2147483647] > pg 2.b is stuck unclean since forever, current state incomplete, last acting > [8,3,14,2147483647,5,2147483647,2147483647,2147483647] > pg 2.a is stuck unclean since forever, current state incomplete, last acting > [11,7,2,2147483647,2147483647,2147483647,15,2147483647] > pg 2.5 is stuck unclean since forever, current state incomplete, last acting > [12,8,5,1,2147483647,2147483647,2147483647,2147483647] > pg 2.4 is stuck unclean since forever, current state incomplete, last acting > [5,2147483647,13,1,2147483647,2147483647,8,2147483647] > pg 2.7 is stuck unclean since forever, current state incomplete, last acting > 
[12,2,10,7,2147483647,2147483647,2147483647,2147483647] > pg 2.6 is stuck unclean since forever, current state incomplete, last acting > [9,15,2147483647,4,2,2147483647,2147483647,2147483647] > pg 2.1 is stuck unclean since forever, current state incomplete, last acting > [2,4,2147483647,13,2147483647,10,2147483647,2147483647] > pg 2.0 is stuck unclean since forever, current state incomplete, last acting > [14,1,2147483647,4,10,2147483647,2147483647,2147483647] > pg 2.3 is stuck unclean since forever, current state incomplete, last acting > [14,11,6,2147483647,2147483647,2147483647,2,2147483647] > pg 2.2 is stuck unclean since forever, current state incomplete, last acting > [13,5,11,2147483647,2147483647,3,2147483647,2147483647] > pg 2.9 is incomplete, acting > [4,10,15,2147483647,3,2147483647,2147483647,2147483647] (reducing pool ecpool > min_size from 6 may help; search ceph.com/docs for 'incomplete') > pg 2.8 is incomplete, acting > [0,2147483647,4,2147483647,10,2147483647,15,2147483647] (reducing pool ecpool > min_size from 6 may help; search ceph.com/docs for 'incomplete') > pg 2.b is incomplete, acting > [8,3,14,2147483647,5,2147483647,2147483647,2147483647] (reducing pool ecpool > min_size from 6 may help; search ceph.com/docs for 'incomplete') > pg 2.a is incomplete, acting > [11,7,2,2147483647,2147483647,2147483647,15,2147483647] (reducing pool ecpool > min_size from 6 may help; search ceph.com/docs for 'incomplet
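The 8 chunks here come from k=6 plus m=2, and the default erasure rule places each chunk on a different host, so a cluster with fewer than 8 hosts cannot map them all. One way out is to rebuild the profile with a failure domain of osd (or use a smaller k+m) - a sketch only; a profile cannot be changed underneath an existing pool, so the pool has to be recreated, and the PG counts are illustrative:

    ceph osd erasure-code-profile set myprofile k=6 m=2 ruleset-failure-domain=osd --force
    ceph osd pool create ecpool 128 128 erasure myprofile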
Re: [ceph-users] Test 6
On Tue, 16 Dec 2014 07:57:19 AM Leen de Braal wrote:
> If you are trying to see if your mails come through, don't check on the list. You have a gmail account, gmail removes mails that you have sent yourself.

Not the case, I am on a dozen other mailman lists via gmail, all of them show my posts. ceph-users is the only exception.

However ceph-us...@ceph.com seems to work reliably rather than using ceph-us...@lists.ceph.com

> You can check the archives to see.

A number of my posts are missing from there. Some are there, it seems very erratic.

--
Lindsay
Re: [ceph-users] rbd snapshot slow restore
On 17 December 2014 at 04:50, Robert LeBlanc wrote:
> There are really only two ways to do snapshots that I know of and they have trade-offs:
>
> COW into the snapshot (like VMware, Ceph, etc):
>
> When a write is committed, the changes are committed to a diff file and the base file is left untouched. This only has a single write penalty,

This is when you are accessing the snapshot image?

I suspect I'm probably looking at this differently - when I take a snapshot I never access it "live", I only ever restore it - would that be merging it back into the base?

> COW into the base image (like most Enterprise disk systems with snapshots for backups):
>
> When a write is committed, the system reads the blocks to be changed out of the base disk and places those original blocks into a diff file, then writes the new blocks directly into the base image. The pros to this approach is that snapshots can be deleted quickly and the data is "merged" already. Read access for the current data is always fast as it only has to search one location. The cons are that each write is really a read and two writes, recovering data from a snapshot can be slow as the reads have to search one or more snapshots.

Whereabouts does qcow2 fall on this spectrum?

Thanks,
--
Lindsay
Re: [ceph-users] rbd read speed only 1/4 of write speed
On Tue, 16 Dec 2014 16:26:17 + VELARTIS Philipp Dürhammer wrote:

> Hello,
>
> Read speed inside our vms (most of them windows) is only ¼ of the write speed. Write speed is about 450MB/s - 500mb/s and read is only about 100MB/s
>
> Our network is 10Gbit for OSDs and 10GB for MONS. We have 3 Servers with 15 osds each
>

Basically what David Clarke wrote, it has indeed been discussed several times.

Find my "The woes of sequential reads" thread, it has data and a link to a blueprint that is attempting to fix this on the Ceph side. Unfortunately I don't think there has been any progress with this.

Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
Re: [ceph-users] rbd read speed only 1/4 of write speed
On 12/16/2014 07:08 PM, Christian Balzer wrote:

On Tue, 16 Dec 2014 16:26:17 + VELARTIS Philipp Dürhammer wrote:

Hello,

Read speed inside our vms (most of them windows) is only ¼ of the write speed. Write speed is about 450MB/s - 500mb/s and read is only about 100MB/s

Our network is 10Gbit for OSDs and 10GB for MONS. We have 3 Servers with 15 osds each

Basically what David Clarke wrote, it has indeed been discussed several times. Find my "The woes of sequential reads" thread, it has data and a link to a blueprint that is attempting to fix this on the Ceph side. Unfortunately I don't think there has been any progress with this.

Christian

Yeah, read ahead definitely seems to help quite a bit. I've been wondering, with the work going into improving random read performance on SSDs, if we are going to pay for it sooner or later, but so far increasing readahead seems to typically be a win.

Mark
Re: [ceph-users] can not add osd
Following the official Ceph docs, I still get the same error:

[root@node3 ceph-cluster]# ceph-deploy osd activate node2:/dev/sdb1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.21): /usr/bin/ceph-deploy osd activate node2:/dev/sdb1
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks node2:/dev/sdb1:
[node2][DEBUG ] connected to host: node2
[node2][DEBUG ] detect platform information from remote host
[node2][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.0.1406 Core
[ceph_deploy.osd][DEBUG ] activating host node2 disk /dev/sdb1
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[node2][INFO ] Running command: ceph-disk -v activate --mark-init sysvinit --mount /dev/sdb1
[node2][WARNIN] INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/sdb1
[node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[node2][WARNIN] DEBUG:ceph-disk:Mounting /dev/sdb1 on /var/lib/ceph/tmp/mnt.NC9pdv with options noatime,inode64
[node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sdb1 /var/lib/ceph/tmp/mnt.NC9pdv
[node2][WARNIN] DEBUG:ceph-disk:Cluster uuid is cadb2f14-e2ea-41fb-8050-a2f0fe447475
[node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[node2][WARNIN] DEBUG:ceph-disk:Cluster name is ceph
[node2][WARNIN] DEBUG:ceph-disk:OSD uuid is 8bbf6631-8722-4e97-bf18-06253143acf6
[node2][WARNIN] DEBUG:ceph-disk:Allocating OSD id...
[node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise 8bbf6631-8722-4e97-bf18-06253143acf6
[node2][WARNIN] ERROR:ceph-disk:Failed to activate
[node2][WARNIN] DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.NC9pdv
[node2][WARNIN] Traceback (most recent call last):
[node2][WARNIN] File "/usr/sbin/ceph-disk", line 2784, in
[node2][WARNIN] main()
[node2][WARNIN] File "/usr/sbin/ceph-disk", line 2762, in main
[node2][WARNIN] args.func(args)
[node2][WARNIN] File "/usr/sbin/ceph-disk", line 1996, in main_activate
[node2][WARNIN] init=args.mark_init,
[node2][WARNIN] File "/usr/sbin/ceph-disk", line 1819, in mount_activate
[node2][WARNIN] os.rmdir(path)
[node2][WARNIN] OSError: [Errno 16] Device or resource busy: '/var/lib/ceph/tmp/mnt.NC9pdv'
[node2][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v activate --mark-init sysvinit --mount /dev/sdb1

From: Karan Singh
To: yang.bi...@zte.com.cn
Cc: ceph-users
Date: 2014/12/16 22:51
Subject: Re: [ceph-users] can not add osd

Hi

Your logs do not provide much information. If you are following any documentation other than the official Ceph docs, I would recommend you follow the official ones:

http://ceph.com/docs/master/start/quick-start-preflight/

Karan Singh
Systems Specialist, Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/

On 16 Dec 2014, at 09:55, yang.bi...@zte.com.cn wrote:

hi

When I execute "ceph-deploy osd prepare node3:/dev/sdb", an error always comes out like this:

[node3][WARNIN] INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.u2KXW3
[node3][WARNIN] umount: /var/lib/ceph/tmp/mnt.u2KXW3: target is busy.

Then I execute "/bin/umount -- /var/lib/ceph/tmp/mnt.u2KXW3" and the result is ok.
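A couple of things that are often worth trying when ceph-disk reports the temporary mount as busy - a sketch only; the mnt.* name changes on every run and the device names are taken from this thread:

    # see which process is holding the temporary mount open
    fuser -vm /var/lib/ceph/tmp/mnt.NC9pdv

    # or start over with a freshly zapped disk and retry the prepare step
    ceph-deploy disk zap node2:sdb
    ceph-deploy osd prepare node2:/dev/sdb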
Re: [ceph-users] File System stripping data
Hello,

I am trying to set an extended attribute on a newly created directory (call it "dir" here) using setfattr. I run the following command:

setfattr -n ceph.dir.layout.stripe_count -v 2 dir

And it returns:

setfattr: dir: Operation not supported

I am wondering if the underlying file system does not support xattrs. Has anyone ever run into a similar problem before? I deployed CephFS on Debian wheezy. And here is the mounting information:

ceph-fuse on /dfs type fuse.ceph-fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)

Many thanks,
Kevin

On Mon Dec 15 2014 at 1:49:15 AM PST John Spray wrote:
> Yes, setfattr is the preferred way. The docs are here:
> http://ceph.com/docs/master/cephfs/file-layouts/
>
> Cheers,
> John
>
> On Mon, Dec 15, 2014 at 8:12 AM, Ilya Dryomov wrote:
> > On Sun, Dec 14, 2014 at 10:38 AM, Kevin Shiah wrote:
> >> Hello All,
> >>
> >> Does anyone know how to configure data striping when using ceph as a file system? My understanding is that configuring striping with rbd is only for block devices.
> >
> > You should be able to set layout.* xattrs on directories and empty files (directory layout just sets the default layout for the newly created files within it). There are also a couple of ioctls which do essentially the same thing but I think their use is discouraged. John will correct me if I'm wrong.
> >
> > Thanks,
> >
> > Ilya
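A quick sanity check is whether the layout xattrs can at least be read through the same mount - a sketch only, path is illustrative. A read should either print the layout or fail with "No such attribute"; another "Operation not supported" would point at the client/mount rather than at the layout itself:

    getfattr -n ceph.dir.layout.stripe_count /dfs/dir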
Re: [ceph-users] rbd snapshot slow restore
On Tue, Dec 16, 2014 at 5:37 PM, Lindsay Mathieson wrote:
> On 17 December 2014 at 04:50, Robert LeBlanc wrote:
> > There are really only two ways to do snapshots that I know of and they have trade-offs:
> >
> > COW into the snapshot (like VMware, Ceph, etc):
> >
> > When a write is committed, the changes are committed to a diff file and the base file is left untouched. This only has a single write penalty,
>
> This is when you are accessing the snapshot image?
>
> I suspect I'm probably looking at this differently - when I take a snapshot I never access it "live", I only ever restore it - would that be merging it back into the base?

I'm not sure what you mean by this. If you take a snapshot then you technically only work on the snapshot. If in VMware (sorry, most of my experience comes from VMware, but I believe KVM is the same) you take a snapshot, then the VM immediately uses the snapshot for all the writes/reads. You then have three options: 1. keep the snapshot indefinitely, 2. revert back to the snapshot point, or 3. delete the snapshot and merge the changes into the base to make it permanent.

In case "2" the reverting of the snapshot is fast because it only deletes the diff file and points back to the original base disk, ready to make a new diff file.

In case "3", depending on how much write activity to "new" blocks has happened, it may take a long time to copy the blocks into the base disk.

Rereading your previous post, I understand that you are using rbd snapshots and then using the rbd rollback command. You are testing this performance vs. the rollback feature in QEMU/KVM when on local/NFS disk. Is that accurate?

I haven't used the rollback feature. If you want to go back to a snapshot, would it be faster to create a clone off the snapshot, then run your VM off that, then just delete and recreate the clone?

rbd snap create rbd/test-image@snap1
rbd snap protect rbd/test-image@snap1
rbd clone rbd/test-image@snap1 rbd/test-image-snap1

You can then run:

rbd rm rbd/test-image-snap1
rbd clone rbd/test-image@snap1 rbd/test-image-snap1

to revert back to the original snapshot.

> Whereabouts does qcow2 fall on this spectrum?

I think qcow2 falls into the same category as VMware, but I'm still cutting my teeth on QEMU/KVM.
Re: [ceph-users] Test 6
I always wondered why my posts didn't show up until somebody replied to them. I thought it was my filters. Thanks!

On Mon, Dec 15, 2014 at 10:57 PM, Leen de Braal wrote:
>
> If you are trying to see if your mails come through, don't check on the list. You have a gmail account, gmail removes mails that you have sent yourself.
> You can check the archives to see.
>
> And your mails did come on the list.
>
> --
> L. de Braal
> BraHa Systems
> NL - Terneuzen
> T +31 115 649333
Re: [ceph-users] OSD Crash makes whole cluster unusable ?
So the problem started once remapping+backfilling started, and lasted until the cluster was healthy again? Have you adjusted any of the recovery tunables? Are you using SSD journals?

I had a similar experience the first time my OSDs started backfilling. The average RadosGW operation latency went from 0.1 seconds to 10 seconds, which is longer than the default HAProxy timeout. Fun times. Since then, I've increased HAProxy's timeouts, de-prioritized Ceph's recovery, and I added SSD journals.

The relevant sections of ceph.conf are:

[global]
  mon osd down out interval = 900
  mon osd min down reporters = 9
  mon osd min down reports = 12
  mon warn on legacy crush tunables = false
  osd pool default flag hashpspool = true

[osd]
  osd max backfills = 3
  osd recovery max active = 3
  osd recovery op priority = 1
  osd scrub sleep = 1.0
  osd snap trim sleep = 1.0

Before the SSD journals, I had osd_max_backfills and osd_recovery_max_active set to 1. I watched my latency graphs, and used

ceph tell osd.\* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

to tweak the values until the latency was acceptable.

On Tue, Dec 16, 2014 at 5:37 AM, Christoph Adomeit wrote:
> Hi there,
>
> today I had an OSD crash with ceph 0.87/giant which made my whole cluster unusable for 45 minutes.
>
> First it began with a disk error:
>
> sd 0:1:2:0: [sdc] CDB: Read(10)Read(10):: 28 28 00 00 0d 15 fe d0 fd 7b e8 f8 00 00 00 00 b0 08 00 00
> XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
>
> Then most other OSDs found out that my osd.3 is down:
>
> 2014-12-16 08:45:15.873478 mon.0 10.67.1.11:6789/0 3361077 : cluster [INF] osd.3 10.67.1.11:6810/713621 failed (42 reports from 35 peers after 23.642482 >= grace 23.348982)
>
> 5 minutes later the osd is marked as out:
>
> 2014-12-16 08:50:21.095903 mon.0 10.67.1.11:6789/0 3361367 : cluster [INF] osd.3 out (down for 304.581079)
>
> However, from 8:45 until 9:20 I have 1000 slow requests and 107 incomplete pgs. Many requests are not answered:
>
> 2014-12-16 08:46:03.029094 mon.0 10.67.1.11:6789/0 3361126 : cluster [INF] pgmap v6930583: 4224 pgs: 4117 active+clean, 107 incomplete; 7647 GB data, 19090 GB used, 67952 GB / 87042 GB avail; 2307 kB/s rd, 2293 kB/s wr, 407 op/s
>
> Also, a recovery to another OSD was not starting.
>
> It seems the osd thinks it is still up while all other OSDs think this osd is down? I found this in the log of osd.3:
>
> ceph-osd.3.log:2014-12-16 08:45:19.319152 7faf81296700 0 log_channel(default) log [WRN] : map e61177 wrongly marked me down
> ceph-osd.3.log: -440> 2014-12-16 08:45:19.319152 7faf81296700 0 log_channel(default) log [WRN] : map e61177 wrongly marked me down
>
> Luckily I was able to restart osd.3 and everything was working again, but I do not understand what happened. The cluster was simply not usable for 45 minutes.
>
> Any ideas?
>
> Thanks
> Christoph
Re: [ceph-users] rbd snapshot slow restore
On 17 December 2014 at 11:50, Robert LeBlanc wrote:
> On Tue, Dec 16, 2014 at 5:37 PM, Lindsay Mathieson wrote:
>> On 17 December 2014 at 04:50, Robert LeBlanc wrote:
>> > There are really only two ways to do snapshots that I know of and they have trade-offs:
>> >
>> > COW into the snapshot (like VMware, Ceph, etc):
>> >
>> > When a write is committed, the changes are committed to a diff file and the base file is left untouched. This only has a single write penalty,
>>
>> This is when you are accessing the snapshot image?
>>
>> I suspect I'm probably looking at this differently - when I take a snapshot I never access it "live", I only ever restore it - would that be merging it back into the base?
>
> I'm not sure what you mean by this. If you take a snapshot then you technically only work on the snapshot. If in VMware (sorry, most of my experience comes from VMware, but I believe KVM is the same) you take a snapshot, then the VM immediately uses the snapshot for all the writes/reads. You then have three options: 1. keep the snapshot indefinitely, 2. revert back to the snapshot point, or 3. delete the snapshot and merge the changes into the base to make it permanent.

I suspect I'm using terms differently, probably because I don't know what is really happening underneath. To me a VM snapshot is a static thing you can roll back to, but all VM activity takes place on the "main" image.

> In case "2" the reverting of the snapshot is fast because it only deletes the diff file and points back to the original base disk ready to make a new diff file.

What happens if you have multiple snapshots? e.g. Snap 1, 2 & 3. Deleting Snap 2 won't be a simple rollback to the base.

> In case "3" depending on how much write activity to "new" blocks have happened, then it may take a long time to copy the blocks into the base disk.
>
> Rereading your previous post, I understand that you are using rbd snapshots and then using the rbd rollback command. You are testing this performance vs. the rollback feature in QEMU/KVM when on local/NFS disk. Is that accurate?

Yes, though the rollback feature is a function of the image format used (e.g. qcow2), not something specific to qemu. If you use RAW then snapshots are not supported.

> I haven't used the rollback feature. If you want to go back to a snapshot, would it be faster to create a clone off the snapshot, then run your VM off that, then just delete and recreate the clone?

I'll test that, but wouldn't it involve flattening the clone, which is also a very slow process?

I don't know if this is relevant, but with qcow2 and vmware, rolling back or deleting snapshots are both operations that only take a few tens of seconds.
Re: [ceph-users] Placing Different Pools on Different OSDS
I've found the problem. The command

ceph osd crush rule create-simple ssd_ruleset ssd root

should be

ceph osd crush rule create-simple ssd_ruleset ssd host
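For completeness, the rest of the sequence then looks roughly like this - the pool name, PG count and ruleset id are illustrative (the actual rule id can be checked with "ceph osd crush rule dump"):

    ceph osd crush rule create-simple ssd_ruleset ssd host
    ceph osd pool create ssd-pool 128 128
    ceph osd pool set ssd-pool crush_ruleset 1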
[ceph-users] Help with Integrating Ceph with various Cloud Storage
Hi All,

I am new to Ceph. Due to a shortage of physical machines I have installed a Ceph cluster with a single OSD and MON in a single virtual machine. I have a few queries:

1. Is it fine to run the Ceph setup in a VM, or does it need to be on a physical server?

2. Since Amazon S3, Azure Blob Storage and Swift are object-based storage, how feasible is it to attach these cloud storage services to Ceph and allocate disk space from them when creating a new VM from a local CloudStack or OpenStack?

3. When integrating CloudStack with Ceph, should libvirt be installed on the CloudStack management server or on the Ceph server? From the diagram given in the Ceph documentation it's a bit confusing.

Thank you in advance. Your help shall be really appreciated.

Best Regards,
Manoj Kumar
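Whatever the platform ends up being, the Ceph-side preparation is usually a dedicated RBD pool plus a cephx user for it - a minimal sketch, with the pool name, PG count and capability string purely illustrative:

    ceph osd pool create cloudstack 128
    ceph auth get-or-create client.cloudstack mon 'allow r' osd 'allow rwx pool=cloudstack'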