Re: [ceph-users] help with ceph radosgw configure
Thanks for your reply. I am sure that there is only one web server in CentOS. All my steps are as follows:

First of all, I have set DNS, so that I can:
    nslookup ceph65
    nslookup a.ceph65
    nslookup anyother.ceph65

Then:

1. yum install httpd mod_fastcgi mod_ssl
   rm /etc/httpd/conf.d/welcome.conf
   rm /var/www/error/noindex.html

2. vi /etc/httpd/conf/httpd.conf
   -
   Listen 65080
   ServerName ceph65
   -

3. make sure that
   LoadModule rewrite_module modules/mod_rewrite.so
   LoadModule fastcgi_module modules/mod_fastcgi.so
   LoadModule ssl_module modules/mod_ssl.so

4. Generate ssl
   cd /etc/pki/tls/private/
   openssl genrsa -des3 -out server.key 2048
   openssl req -new -key server.key -out server.csr
   cp server.key server.key.orig
   openssl rsa -in server.key.orig -out server.key
   openssl x509 -req -days 65535 -in server.csr -signkey server.key -out server.crt
   rm server.key.orig server.csr
   mv server.crt /etc/pki/tls/certs

5. vi /etc/httpd/conf.d/ssl.conf
   -
   Listen 65443
   SSLCertificateFile /etc/pki/tls/certs/server.crt
   SSLCertificateKeyFile /etc/pki/tls/private/server.key
   -

6. install ceph-radosgw and radosgw-agent
   yum install ceph-radosgw radosgw-agent

7. vi ceph.conf and copy to other ceph server
   -
   [client.radosgw.gateway]
   host = ceph65
   public_addr = 192.168.8.183
   rgw_dns_name = 127.0.0.1
   keyring = /etc/ceph/keyring.radosgw.gateway
   rgw_socket_path = /tmp/radosgw.sock
   log_file = /var/log/ceph/radosgw.log
   -

8. mkdir -p /var/lib/ceph/radosgw/ceph-radosgw.gateway

9. ceph-authtool --create-keyring /etc/ceph/keyring.radosgw.gateway
   chmod +r /etc/ceph/keyring.radosgw.gateway
   ceph-authtool /etc/ceph/keyring.radosgw.gateway -n client.radosgw.gateway --gen-key
   ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow rw' /etc/ceph/keyring.radosgw.gateway
   ceph auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway

10. vi /etc/httpd/conf.d/fastcgi.conf
    -
    FastCgiWrapper Off
    -

11. vi /etc/httpd/conf.d/rgw.conf
    -
    ServerName ceph65
    ServerAdmin ceph65
    DocumentRoot /var/www/html
    RewriteEngine On
    RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    Options +ExecCGI
    AllowOverride All
    SetHandler fastcgi-script
    Order allow,deny
    Allow from all
    AuthBasicAuthoritative Off
    AllowEncodedSlashes On
    ErrorLog /var/log/httpd/rgw_error_log
    CustomLog /var/log/httpd/rgw_access_log combined
    ServerSignature Off
    SetEnv SERVER_PORT_SECURE 65443

    ServerName ceph65
    ServerAdmin ceph65
    DocumentRoot /var/www/html
    #ErrorLog logs/ssl_error_log
    #TransferLog logs/ssl_access_log
    LogLevel warn
    SSLEngine on
    SSLProtocol all -SSLv2
    SSLCipherSuite ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
    SSLCertificateFile /etc/pki/tls/certs/server.crt
    SSLCertificateKeyFile /etc/pki/tls/private/server.key
    SSLOptions +StdEnvVars
    SSLOptions +StdEnvVars
    SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0
    CustomLog logs/ssl_request_log "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b"
    RewriteEngine On
    RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    Options +ExecCGI
    AllowOverride All
    SetHandler fastcgi-script
    Order allow,deny
    Allow from all
    AuthBasicAuthoritative Off
    AllowEncodedSlashes On
    ErrorLog /var/log/httpd/rgw_error_log
    CustomLog /var/log/httpd/rgw_access_log combined
    ServerSignature Off
    SetEnv SERVER_PORT_SECURE 65443

    FastCgiExternalServer /var/www/html/s3gw.fcgi -socket /tmp/radosgw.sock
    -

12. vi /var/www/html/s3gw.fcgi and chmod +x /var/www/html/s3gw.fcgi
    -
    #!/bin/sh
    exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
    -

13. rm -rf /tmp/radosgw.sock

14. start radosgw
    chkconfig --add ceph-radosgw
    chkconfig ceph-radosgw on
    service ceph -a restart
    service httpd restart
    service ceph-radosgw start
    servi
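For reference, once the gateway processes are up, one quick way to check the whole chain end to end is to create a test S3 user with radosgw-admin and hit the gateway over plain HTTP; the user id and display name below are only placeholders, and the port matches the non-SSL listener from step 2. This is just a sketch of a smoke test, not part of the original steps:

    # Create a test S3 user; the command prints an access_key/secret_key pair
    radosgw-admin user create --uid=testuser --display-name="Test User"

    # An anonymous request against the gateway should come back as an
    # S3-style XML response rather than an Apache error page if the
    # FastCGI wiring is correct
    curl -v http://ceph65:65080/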
[ceph-users] RBD as backend for iSCSI SAN Targets
Hi Everyone,

I am just wondering if any of you are running a ceph cluster with an iSCSI target front end? I know this isn’t available out of the box; unfortunately, in one particular use case we are looking at providing iSCSI access and it's a necessity. I like the idea of having rbd devices serving block-level storage to the iSCSI target servers while providing a unified backend for native rbd access by openstack and various application servers. On multiple levels this would reduce the complexity of our SAN environment and move us away from expensive proprietary solutions that don’t scale out.

If any of you have deployed any HA iSCSI targets backed by rbd I would really appreciate your feedback and any thoughts.

Karol

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
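One building block for a setup like this, regardless of which target software ends up in front, is simply a mapped RBD image on each target host; a minimal sketch, where the pool/image names and size are placeholders and the kernel rbd module is assumed to be available:

    # Create and map a test image with the in-kernel rbd client
    rbd create iscsi-pool/lun0 --size 102400   # size is in MB, i.e. 100 GB
    rbd map iscsi-pool/lun0
    rbd showmapped                             # shows the /dev/rbdX device that was created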
Re: [ceph-users] Replication lag in block storage
Which model of hard drives do you have?

2014-03-14 21:59 GMT+04:00 Greg Poirier :
> We are stressing these boxes pretty spectacularly at the moment.
>
> On every box I have one OSD that is pegged for IO almost constantly.
>
> ceph-1:
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> sdv        0.00    0.00 104.00 160.00  748.00  1000.00    13.24     1.15   4.36    9.46    1.05   3.70  97.60
>
> ceph-2:
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> sdq        0.00   25.00 109.00 218.00  844.00  1773.50    16.01     1.37   4.20    9.03    1.78   3.01  98.40
>
> ceph-3:
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> sdm        0.00    0.00 126.00  56.00  996.00   540.00    16.88     1.01   5.58    8.06    0.00   5.43  98.80
>
> These are all disks in my block storage pool.
>
>   osdmap e26698: 102 osds: 102 up, 102 in
>   pgmap v6752413: 4624 pgs, 3 pools, 14151 GB data, 21729 kobjects
>         28517 GB used, 65393 GB / 93911 GB avail
>         4624 active+clean
>   client io 1915 kB/s rd, 59690 kB/s wr, 1464 op/s
>
> I don't see any smart errors, but I'm slowly working my way through all of the disks on these machines with smartctl to see if anything stands out.
>
> On Fri, Mar 14, 2014 at 9:52 AM, Gregory Farnum wrote:
>> On Fri, Mar 14, 2014 at 9:37 AM, Greg Poirier wrote:
>> > So, on the cluster that I _expect_ to be slow, it appears that we are waiting on journal commits. I want to make sure that I am reading this correctly:
>> >
>> >     "received_at": "2014-03-14 12:14:22.659170",
>> >
>> >     { "time": "2014-03-14 12:14:22.660191",
>> >       "event": "write_thread_in_journal_buffer"},
>> >
>> > At this point we have received the write and are attempting to write the transaction to the OSD's journal, yes?
>> >
>> > Then:
>> >
>> >     { "time": "2014-03-14 12:14:22.900779",
>> >       "event": "journaled_completion_queued"},
>> >
>> > 240ms later we have successfully written to the journal?
>>
>> Correct. That seems an awfully long time for a 16K write, although I don't know how much data I have on co-located journals. (At least, I'm assuming it's in the 16K range based on the others, although I'm just now realizing that subops aren't providing that information... I've created a ticket to include that diagnostic info in future.)
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> > I expect this particular slowness is due to colocation of journal and data on the same disk (and it's a spinning disk, not an SSD). I expect some of this could be alleviated by migrating journals to SSDs, but I am looking to rebuild in the near future--so am willing to hobble in the meantime.
>> >
>> > I am surprised that our all-SSD cluster is also underperforming. I am trying colocating the journal on the same disk with all SSDs at the moment and will see if the performance degradation is of the same nature.
>> >
>> > On Thu, Mar 13, 2014 at 6:25 PM, Gregory Farnum wrote:
>> >>
>> >> Right. So which is the interval that's taking all the time? Probably it's waiting for the journal commit, but maybe there's something else blocking progress. If it is the journal commit, check out how busy the disk is (is it just saturated?) and what its normal performance characteristics are (is it breaking?).
>> >> -Greg
>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >>
>> >> On Thu, Mar 13, 2014 at 5:48 PM, Greg Poirier wrote:
>> >> > Many of the sub ops look like this, with significant lag between received_at and commit_sent:
>> >> >
>> >> >     { "description": "osd_op(client.6869831.0:1192491 rbd_data.67b14a2ae8944a.9105 [write 507904~3686400] 6.556a4db0 e660)",
>> >> >       "received_at": "2014-03-13 20:42:05.811936",
>> >> >       "age": "46.088198",
>> >> >       "duration": "0.038328",
>> >> >
>> >> >       { "time": "2014-03-13 20:42:05.850215",
>> >> >         "event": "commit_sent"},
>> >> >       { "time": "2014-03-13 20:42:05.850264",
>> >> >         "event": "done"}]]},
>> >> >
>> >> > In this case almost 39ms between received_at and commit_sent.
>> >> >
>> >> > A particularly egregious example of 80+ms lag between received_at and commit_sent:
>> >> >
>> >> >     { "description": "osd_op(client.6869831.0:1190526 rbd_data.67b14a2ae8944a.8fac [write 3325952~868352] 6.5255f5fd e660)",
>> >> >       "received_at": "201
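For anyone wanting to pull the same per-event timings, they come from the OSD admin socket; a minimal way to grab them and to correlate with how busy the backing disk is (the OSD id and device name are just examples):

    # Slowest recent ops on one OSD, with received_at / commit_sent event times
    ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok dump_historic_ops

    # Extended per-device stats for the backing disk, refreshed every second
    iostat -x sdv 1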
Re: [ceph-users] RBD as backend for iSCSI SAN Targets
On 03/15/2014 04:11 PM, Karol Kozubal wrote:
> Hi Everyone,
>
> I am just wondering if any of you are running a ceph cluster with an iSCSI target front end? I know this isn’t available out of the box, unfortunately in one particular use case we are looking at providing iSCSI access and it's a necessity. I am liking the idea of having rbd devices serving block level storage to the iSCSI Target servers while providing a unified backend for native rbd access by openstack and various application servers. On multiple levels this would reduce the complexity of our SAN environment and move us away from expensive proprietary solutions that don’t scale out.
>
> If any of you have deployed any HA iSCSI Targets backed by rbd I would really appreciate your feedback and any thoughts.

I haven't used it in production, but a couple of things which come to mind:

- Use TGT so you can run it all in userspace backed by librbd
- Do not use writeback caching on the targets

You could use multipathing if you don't use writeback caching. Using writeback would also cause data loss/corruption in case of multiple targets.

It will probably just work with TGT, but I don't know anything about the performance.

> Karol
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
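To make the TGT suggestion concrete, a rough sketch of the target-side setup, assuming a tgt build that includes the rbd backing store (bs_rbd) so the LUN points straight at an RBD image with no kernel mapping; the IQN and pool/image names are placeholders:

    # Create a target, attach an RBD-backed LUN, and allow all initiators
    tgtadm --lld iscsi --mode target --op new --tid 1 --targetname iqn.2014-03.com.example:rbd-lun0
    tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 --bstype rbd --backing-store iscsi-pool/lun0
    tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL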
Re: [ceph-users] RBD as backend for iSCSI SAN Targets
Hi Wido,

I will have some new hardware for running tests in the next two weeks or so and will report my findings once I get a chance to run some tests. I will disable writeback on the target side, as I will be attempting to configure an ssd caching pool of 24 ssd's (in writeback) in front of the main pool of 360 disks, with a ratio of 5 spinner osds to 1 ssd journal. I will be running everything through 10Gig SFP+ Ethernet interfaces with a dedicated cluster network interface, a dedicated public ceph interface and a separate iscsi network, also with 10 gig interfaces, for the target machines.

I am ideally looking for 20,000 to 60,000 IOPS from this system if I can get the caching pool configuration right. The application has a 30ms max latency requirement for the storage.

In my current tests I have only spinners: SAS 10K data disks with 4.2ms write latency, with separate journaling on SAS 15K disks with 3.3ms write latency. With 20 OSDs and 4 journals I am only concerned with the overall operation apply latency that I have been seeing (1-6ms when idle is normal, but up to 60-170ms for a moderate workload using rbd bench-write); however, I am on a network where I am bound to a 1500 MTU, and I will get to test jumbo frames with the next setup in addition to the SSDs. I suspect the overall performance will be good in the new test setup and I am curious to see what my tests will yield.

Thanks for the response!

Karol

On 2014-03-15, 12:18 PM, "Wido den Hollander" wrote:

> On 03/15/2014 04:11 PM, Karol Kozubal wrote:
>> Hi Everyone,
>>
>> I am just wondering if any of you are running a ceph cluster with an
>> iSCSI target front end? I know this isn't available out of the box,
>> unfortunately in one particular use case we are looking at providing
>> iSCSI access and it's a necessity. I am liking the idea of having rbd
>> devices serving block level storage to the iSCSI Target servers while
>> providing a unified backend for native rbd access by openstack and
>> various application servers. On multiple levels this would reduce the
>> complexity of our SAN environment and move us away from expensive
>> proprietary solutions that don't scale out.
>>
>> If any of you have deployed any HA iSCSI Targets backed by rbd I would
>> really appreciate your feedback and any thoughts.
>
> I haven't used it in production, but a couple of things which come to
> mind:
>
> - Use TGT so you can run it all in userspace backed by librbd
> - Do not use writeback caching on the targets
>
> You could use multipathing if you don't use writeback caching. Using
> writeback would also cause data loss/corruption in case of multiple
> targets.
>
> It will probably just work with TGT, but I don't know anything about the
> performance.
>
>> Karol
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
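As a reference point for the apply-latency numbers above, the same figures can be watched cluster-wide while a bench-write is running; the image name is a placeholder and this assumes a release recent enough to have ceph osd perf:

    # Drive 4 KB writes at the image from 16 threads, 1 GB in total
    rbd bench-write rbd/test-image --io-size 4096 --io-threads 16 --io-total 1073741824

    # In another terminal: per-OSD commit (journal) and apply latency
    ceph osd perf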
Re: [ceph-users] RBD as backend for iSCSI SAN Targets
On 03/15/2014 05:40 PM, Karol Kozubal wrote:
> Hi Wido,
>
> I will have some new hardware for running tests in the next two weeks or
> so and will report my findings once I get a chance to run some tests. I
> will disable writeback on the target side as I will be attempting to
> configure an ssd caching pool of 24 ssd's with writeback for the main pool
> with 360 disks with a 5 osd spinners to 1 ssd journal ratio. I will be

How are the SSDs going to be in writeback? Is that the new caching pool feature?

> running everything through 10Gig SFP+ Ethernet interfaces with a dedicated
> cluster network interface, dedicated public ceph interface and a separate
> iscsi network also with 10 gig interfaces for the target machines.

That seems like a good network.

> I am ideally looking for 20,000 to 60,000 IOPS from this system if I can
> get the caching pool configuration right. The application has a 30ms max
> latency requirement for the storage.

20.000 to 60.000 is a big difference. But the only way you are going to achieve that is by doing a lot of parallel I/O. Ceph doesn't excel in single threads doing a lot of I/O.

So if you have multiple RBD devices on which you are doing the I/O it shouldn't be that much of a problem.

Just spread out the I/O. Scale horizontally instead of vertically.

> In my current tests I have only spinners with SAS 10K disks, 4.2ms write
> latency on the disks, with separate journaling on SAS 15K disks with a
> 3.3ms write latency. With 20 OSDs and 4 journals I am only concerned with
> the overall operation apply latency that I have been seeing (1-6ms idle is
> normal, but up to 60-170ms for a moderate workload using rbd bench-write);
> however I am on a network where I am bound to 1500 MTU and I will get to
> test jumbo frames with the next setup in addition to the SSDs. I suspect
> the overall performance will be good in the new test setup and I am
> curious to see what my tests will yield.
>
> Thanks for the response!
>
> Karol
>
> On 2014-03-15, 12:18 PM, "Wido den Hollander" wrote:
>
>> On 03/15/2014 04:11 PM, Karol Kozubal wrote:
>>> Hi Everyone,
>>>
>>> I am just wondering if any of you are running a ceph cluster with an
>>> iSCSI target front end? I know this isn't available out of the box,
>>> unfortunately in one particular use case we are looking at providing
>>> iSCSI access and it's a necessity. I am liking the idea of having rbd
>>> devices serving block level storage to the iSCSI Target servers while
>>> providing a unified backend for native rbd access by openstack and
>>> various application servers. On multiple levels this would reduce the
>>> complexity of our SAN environment and move us away from expensive
>>> proprietary solutions that don't scale out.
>>>
>>> If any of you have deployed any HA iSCSI Targets backed by rbd I would
>>> really appreciate your feedback and any thoughts.
>>
>> I haven't used it in production, but a couple of things which come to
>> mind:
>>
>> - Use TGT so you can run it all in userspace backed by librbd
>> - Do not use writeback caching on the targets
>>
>> You could use multipathing if you don't use writeback caching. Using
>> writeback would also cause data loss/corruption in case of multiple
>> targets.
>>
>> It will probably just work with TGT, but I don't know anything about the
>> performance.
>>
>>> Karol
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
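To illustrate the point about spreading the I/O, aggregate IOPS only show up when several images are driven at once; a crude sketch with stock tools, where the images are placeholders and assumed to already exist:

    # Run bench-write against several images in parallel; watch 'ceph -w'
    # in another terminal for the aggregate client op rate
    for i in 1 2 3 4; do
        rbd bench-write rbd/bench-img-$i --io-size 4096 --io-threads 16 --io-total 1073741824 &
    done
    wait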
Re: [ceph-users] RBD as backend for iSCSI SAN Targets
How are the SSDs going to be in writeback? Is that the new caching pool Feature? I am not sure what version implemented this, but it is documented here (https://ceph.com/docs/master/dev/cache-pool/). I will be using the latest stable release for my next batch of testing, right now I am on 0.67.4 and I will be moving towards the 0.72.x branch. As for the IOPS, it would be a total cluster IO throughput estimate based on an application that would be reading/writing to more than 60 rbd volumes. On 2014-03-15, 1:11 PM, "Wido den Hollander" wrote: >On 03/15/2014 05:40 PM, Karol Kozubal wrote: >> Hi Wido, >> >> I will have some new hardware for running tests in the next two weeks or >> so and will report my findings once I get a chance to run some tests. I >> will disable writeback on the target side as I will be attempting to >> configure an ssd caching pool of 24 ssd's with writeback for the main >>pool >> with 360 disks with a 5 osd spinners to 1 ssd journal ratio. I will be > >How are the SSDs going to be in writeback? Is that the new caching pool >feature? > >> running everything through 10Gig SFP+ Ethernet interfaces with a >>dedicated >> cluster network interface, dedicated public ceph interface and a >>separate >> iscsi network also with 10 gig interfaces for the target machines. >> > >That seems like a good network. > >> I am ideally looking for a 20,000 to 60,000 IOPS from this system if I >>can >> get the caching pool configuration right. The application has a 30ms max >> latency requirement for the storage. >> > >20.000 to 60.000 is a big difference. But the only way you are going to >achieve that is by doing a lot of parellel I/O. Ceph doesn't excel in >single threads doing a lot of I/O. > >So if you have multiple RBD devices on which you are doing the I/O it >shouldn't be that much of a problem. > >Just spread out the I/O. Scale horizontal instead of vertical. > >> In my current tests I have only spinners with SAS 10K disks, 4.2ms write >> latency on the disks with separate journaling on SAS 15K disks with a >> 3.3ms write latency. With 20 OSDs and 4 Journals I am only concerned >>with >> the overall operation apply latency that I have been seeing (1-6ms idle >>is >> normal, but up to 60-170ms for a moderate workload using rbd >>bench-write) >> however I am on a network where I am bound to 1500 mtu and I will get to >> test jumbo frames with the next setup in addition to the ssd¹s. I >>suspect >> the overall performance will be good in the new test setup and I am >> curious to see what my tests will yield. >> >> Thanks for the response! >> >> Karol >> >> >> >> On 2014-03-15, 12:18 PM, "Wido den Hollander" wrote: >> >>> On 03/15/2014 04:11 PM, Karol Kozubal wrote: Hi Everyone, I am just wondering if any of you are running a ceph cluster with an iSCSI target front end? I know this isn¹t available out of the box, unfortunately in one particular use case we are looking at providing iSCSI access and it's a necessity. I am liking the idea of having rbd devices serving block level storage to the iSCSI Target servers while providing a unified backed for native rbd access by openstack and various application servers. On multiple levels this would reduce the complexity of our SAN environment and move us away from expensive proprietary solutions that don¹t scale out. If any of you have deployed any HA iSCSI Targets backed by rbd I would really appreciate your feedback and any thoughts. 
>>> >>> I haven't used it in production, but a couple of things which come to >>> mind: >>> >>> - Use TGT so you can run it all in userspace backed by librbd >>> - Do not use writeback caching on the targets >>> >>> You could use multipathing if you don't use writeback caching. Use >>> writeback would also cause data loss/corruption in case of multiple >>> targets. >>> >>> It will probably just work with TGT, but I don't know anything about >>>the >>> performance. >>> Karol ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >>> >>> -- >>> Wido den Hollander >>> 42on B.V. >>> >>> Phone: +31 (0)20 700 9902 >>> Skype: contact42on >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > > >-- >Wido den Hollander >42on B.V. > >Phone: +31 (0)20 700 9902 >Skype: contact42on >___ >ceph-users mailing list >ceph-users@lists.ceph.com >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RBD as backend for iSCSI SAN Targets
I just re-read the documentation… It looks like its a proposed feature that is in development. I will have to adjust my test in consequence in that case. Any one out there have any ideas when this will be implemented? Or what the plans look like as of right now? On 2014-03-15, 1:17 PM, "Karol Kozubal" wrote: >How are the SSDs going to be in writeback? Is that the new caching pool >Feature? > >I am not sure what version implemented this, but it is documented here >(https://ceph.com/docs/master/dev/cache-pool/). >I will be using the latest stable release for my next batch of testing, >right now I am on 0.67.4 and I will be moving towards the 0.72.x branch. > >As for the IOPS, it would be a total cluster IO throughput estimate based >on an application that would be reading/writing to more than 60 rbd >volumes. > > > > > >On 2014-03-15, 1:11 PM, "Wido den Hollander" wrote: > >>On 03/15/2014 05:40 PM, Karol Kozubal wrote: >>> Hi Wido, >>> >>> I will have some new hardware for running tests in the next two weeks >>>or >>> so and will report my findings once I get a chance to run some tests. I >>> will disable writeback on the target side as I will be attempting to >>> configure an ssd caching pool of 24 ssd's with writeback for the main >>>pool >>> with 360 disks with a 5 osd spinners to 1 ssd journal ratio. I will be >> >>How are the SSDs going to be in writeback? Is that the new caching pool >>feature? >> >>> running everything through 10Gig SFP+ Ethernet interfaces with a >>>dedicated >>> cluster network interface, dedicated public ceph interface and a >>>separate >>> iscsi network also with 10 gig interfaces for the target machines. >>> >> >>That seems like a good network. >> >>> I am ideally looking for a 20,000 to 60,000 IOPS from this system if I >>>can >>> get the caching pool configuration right. The application has a 30ms >>>max >>> latency requirement for the storage. >>> >> >>20.000 to 60.000 is a big difference. But the only way you are going to >>achieve that is by doing a lot of parellel I/O. Ceph doesn't excel in >>single threads doing a lot of I/O. >> >>So if you have multiple RBD devices on which you are doing the I/O it >>shouldn't be that much of a problem. >> >>Just spread out the I/O. Scale horizontal instead of vertical. >> >>> In my current tests I have only spinners with SAS 10K disks, 4.2ms >>>write >>> latency on the disks with separate journaling on SAS 15K disks with a >>> 3.3ms write latency. With 20 OSDs and 4 Journals I am only concerned >>>with >>> the overall operation apply latency that I have been seeing (1-6ms idle >>>is >>> normal, but up to 60-170ms for a moderate workload using rbd >>>bench-write) >>> however I am on a network where I am bound to 1500 mtu and I will get >>>to >>> test jumbo frames with the next setup in addition to the ssd¹s. I >>>suspect >>> the overall performance will be good in the new test setup and I am >>> curious to see what my tests will yield. >>> >>> Thanks for the response! >>> >>> Karol >>> >>> >>> >>> On 2014-03-15, 12:18 PM, "Wido den Hollander" wrote: >>> On 03/15/2014 04:11 PM, Karol Kozubal wrote: > Hi Everyone, > > I am just wondering if any of you are running a ceph cluster with an > iSCSI target front end? I know this isn¹t available out of the box, > unfortunately in one particular use case we are looking at providing > iSCSI access and it's a necessity. 
I am liking the idea of having rbd > devices serving block level storage to the iSCSI Target servers while > providing a unified backed for native rbd access by openstack and > various application servers. On multiple levels this would reduce the > complexity of our SAN environment and move us away from expensive > proprietary solutions that don¹t scale out. > > If any of you have deployed any HA iSCSI Targets backed by rbd I >would > really appreciate your feedback and any thoughts. > I haven't used it in production, but a couple of things which come to mind: - Use TGT so you can run it all in userspace backed by librbd - Do not use writeback caching on the targets You could use multipathing if you don't use writeback caching. Use writeback would also cause data loss/corruption in case of multiple targets. It will probably just work with TGT, but I don't know anything about the performance. > Karol > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >> >> >>-- >>Wido den Hollander >>42on B.V. >> >>Phon
Re: [ceph-users] RBD as backend for iSCSI SAN Targets
On Sat, 15 Mar 2014, Karol Kozubal wrote: > I just re-read the documentation… It looks like its a proposed feature > that is in development. I will have to adjust my test in consequence in > that case. > > Any one out there have any ideas when this will be implemented? Or what > the plans look like as of right now? This will appear in 0.78, which will be out in the next week. sage > > > > On 2014-03-15, 1:17 PM, "Karol Kozubal" wrote: > > >How are the SSDs going to be in writeback? Is that the new caching pool > >Feature? > > > >I am not sure what version implemented this, but it is documented here > >(https://ceph.com/docs/master/dev/cache-pool/). > >I will be using the latest stable release for my next batch of testing, > >right now I am on 0.67.4 and I will be moving towards the 0.72.x branch. > > > >As for the IOPS, it would be a total cluster IO throughput estimate based > >on an application that would be reading/writing to more than 60 rbd > >volumes. > > > > > > > > > > > >On 2014-03-15, 1:11 PM, "Wido den Hollander" wrote: > > > >>On 03/15/2014 05:40 PM, Karol Kozubal wrote: > >>> Hi Wido, > >>> > >>> I will have some new hardware for running tests in the next two weeks > >>>or > >>> so and will report my findings once I get a chance to run some tests. I > >>> will disable writeback on the target side as I will be attempting to > >>> configure an ssd caching pool of 24 ssd's with writeback for the main > >>>pool > >>> with 360 disks with a 5 osd spinners to 1 ssd journal ratio. I will be > >> > >>How are the SSDs going to be in writeback? Is that the new caching pool > >>feature? > >> > >>> running everything through 10Gig SFP+ Ethernet interfaces with a > >>>dedicated > >>> cluster network interface, dedicated public ceph interface and a > >>>separate > >>> iscsi network also with 10 gig interfaces for the target machines. > >>> > >> > >>That seems like a good network. > >> > >>> I am ideally looking for a 20,000 to 60,000 IOPS from this system if I > >>>can > >>> get the caching pool configuration right. The application has a 30ms > >>>max > >>> latency requirement for the storage. > >>> > >> > >>20.000 to 60.000 is a big difference. But the only way you are going to > >>achieve that is by doing a lot of parellel I/O. Ceph doesn't excel in > >>single threads doing a lot of I/O. > >> > >>So if you have multiple RBD devices on which you are doing the I/O it > >>shouldn't be that much of a problem. > >> > >>Just spread out the I/O. Scale horizontal instead of vertical. > >> > >>> In my current tests I have only spinners with SAS 10K disks, 4.2ms > >>>write > >>> latency on the disks with separate journaling on SAS 15K disks with a > >>> 3.3ms write latency. With 20 OSDs and 4 Journals I am only concerned > >>>with > >>> the overall operation apply latency that I have been seeing (1-6ms idle > >>>is > >>> normal, but up to 60-170ms for a moderate workload using rbd > >>>bench-write) > >>> however I am on a network where I am bound to 1500 mtu and I will get > >>>to > >>> test jumbo frames with the next setup in addition to the ssd¹s. I > >>>suspect > >>> the overall performance will be good in the new test setup and I am > >>> curious to see what my tests will yield. > >>> > >>> Thanks for the response! > >>> > >>> Karol > >>> > >>> > >>> > >>> On 2014-03-15, 12:18 PM, "Wido den Hollander" wrote: > >>> > On 03/15/2014 04:11 PM, Karol Kozubal wrote: > > Hi Everyone, > > > > I am just wondering if any of you are running a ceph cluster with an > > iSCSI target front end? 
I know this isn¹t available out of the box, > > unfortunately in one particular use case we are looking at providing > > iSCSI access and it's a necessity. I am liking the idea of having rbd > > devices serving block level storage to the iSCSI Target servers while > > providing a unified backed for native rbd access by openstack and > > various application servers. On multiple levels this would reduce the > > complexity of our SAN environment and move us away from expensive > > proprietary solutions that don¹t scale out. > > > > If any of you have deployed any HA iSCSI Targets backed by rbd I > >would > > really appreciate your feedback and any thoughts. > > > > I haven't used it in production, but a couple of things which come to > mind: > > - Use TGT so you can run it all in userspace backed by librbd > - Do not use writeback caching on the targets > > You could use multipathing if you don't use writeback caching. Use > writeback would also cause data loss/corruption in case of multiple > targets. > > It will probably just work with TGT, but I don't know anything about > the > performance. > > > Karol > > > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/
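For those planning to test it once 0.78 is out, the cache tiering setup described in the docs boils down to a handful of commands; the pool names below are placeholders and the exact options may still change before the final firefly release:

    # Put an SSD pool in front of a base pool as a writeback cache tier
    ceph osd tier add rbd-pool ssd-cache
    ceph osd tier cache-mode ssd-cache writeback
    ceph osd tier set-overlay rbd-pool ssd-cache

    # The cache pool needs a hit set so it can track which objects are hot
    ceph osd pool set ssd-cache hit_set_type bloom
    ceph osd pool set ssd-cache hit_set_count 1
    ceph osd pool set ssd-cache hit_set_period 3600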
Re: [ceph-users] No more Journals ?
Hello Everyone,

If you look at the Ceph Day presentation delivered by Sebastien (slide number 23), http://www.slideshare.net/Inktank_Ceph/ceph-performance , it looks like Firefly has dropped support for journals. How concrete is this news?

-Karan-

On 14 Mar 2014, at 15:35, Jake Young wrote:

> You should take a look at this blog post:
>
> http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/
>
> The test results show that using a RAID card with a write-back cache without journal disks can perform better or equivalent to using journal disks with XFS.
>
> As to whether or not it’s better to buy expensive controllers and use all of your drive bays for spinning disks, or cheap controllers and use some portion of your bays for SSDs/journals, there are trade-offs. If built right, systems with SSD journals provide higher large-block write throughput, while putting journals on the data disks provides higher storage density. Without any tuning both solutions currently provide similar IOP throughput.
>
> Jake
>
> On Friday, March 14, 2014, Markus Goldberg wrote:
> Sorry,
> I should have asked a little more clearly:
> Can ceph (or OSDs) be used without journals now ?
> The Journal-Parameter seems to be optional ( because of '[...]' )
>
> Markus
> On 14.03.2014 12:19, John Spray wrote:
> Journals have not gone anywhere, and ceph-deploy still supports specifying them with exactly the same syntax as before.
>
> The page you're looking at is the simplified "quick start", the detail on osd creation including journals is here:
> http://eu.ceph.com/docs/v0.77/rados/deployment/ceph-deploy-osd/
>
> Cheers,
> John
>
> On Fri, Mar 14, 2014 at 9:47 AM, Markus Goldberg wrote:
> Hi,
> I'm a little bit surprised. I read through the new manuals of 0.77
> (http://eu.ceph.com/docs/v0.77/start/quick-ceph-deploy/)
> In the section of creating the osd the manual says:
>
> Then, from your admin node, use ceph-deploy to prepare the OSDs.
>
>     ceph-deploy osd prepare {ceph-node}:/path/to/directory
>
> For example:
>
>     ceph-deploy osd prepare node2:/var/local/osd0 node3:/var/local/osd1
>
> Finally, activate the OSDs.
>
>     ceph-deploy osd activate {ceph-node}:/path/to/directory
>
> For example:
>
>     ceph-deploy osd activate node2:/var/local/osd0 node3:/var/local/osd1
>
> In former versions the osd was created like:
>
>     ceph-deploy -v --overwrite-conf osd --fs-type btrfs prepare bd-0:/dev/sdb:/dev/sda5
>                                                                           ^^ Journal
>
> As I remember, defining and creating a journal for each osd was a must.
>
> So the question is: Are Journals obsolete now ?
>
> --
> MfG,
>   Markus Goldberg
>
> --
> Markus Goldberg       Universität Hildesheim
>                       Rechenzentrum
> Tel +49 5121 88392822 Marienburger Platz 22, D-31141 Hildesheim, Germany
> Fax +49 5121 88392823 email goldb...@uni-hildesheim.de
> --
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> MfG,
> Markus Goldberg
>
> --
> Markus Goldberg       Universität Hildesheim
>                       Rechenzentrum
> Tel +49 5121 88392822 Marienburger Platz 22, D-31141 Hildesheim, Germany
> Fax +49 5121 88392823 email goldb...@uni-hildesheim.de
> --
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
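For clarity on the point John makes above: the journal argument to ceph-deploy is optional syntax, not a removed feature. Both forms below are valid (host and device names are placeholders); leaving the journal device out simply co-locates the journal on the OSD disk:

    # Journal co-located on the data disk
    ceph-deploy osd prepare node2:/dev/sdb

    # Journal explicitly placed on a separate (e.g. SSD) partition
    ceph-deploy osd prepare node2:/dev/sdb:/dev/sda5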
Re: [ceph-users] No more Journals ?
Hi, This is the new objectstore multi backend, instead of using a filesystem (xfs,btrfs) , you can use leveldb,rocksdb,... which don't need journal, because operations are atomic. I think it should be release with firefly, if I remember. About this, can somebody tell me if write are same speed, for osd xfs HDD+journal on ssd vs osd leveldb HDD ? (As journal on ssd should improve latencies) - Mail original - De: "Karan Singh" À: "Jake Young" , ceph-users@lists.ceph.com, "Sebastien Han" , "Markus Goldberg" Envoyé: Samedi 15 Mars 2014 19:07:56 Objet: Re: [ceph-users] No more Journals ? Hello Everyone If you see ceph day presentation delivered by Sebastien ( slide number 23 ) http://www.slideshare.net/Inktank_Ceph/ceph-performance It looks like Firefly has dropped support to Journals , How concrete is this news ??? -Karan- On 14 Mar 2014, at 15:35, Jake Young < jak3...@gmail.com > wrote: You should take a look at this blog post: http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/ The test results shows that using a RAID card with a write-back cache without journal disks can perform better or equivalent to using journal disks with XFS. As to whether or not it’s better to buy expensive controllers and use all of your drive bays for spinning disks or cheap controllers and use some portion of your bays for SSDs/Journals, there are trade-offs. If built right, systems with SSD journals provide higher large block write throughput, while putting journals on the data disks provides higher storage density. Without any tuning both solutions currently provide similar IOP throughput . Jake On Friday, March 14, 2014, Markus Goldberg < goldb...@uni-hildesheim.de > wrote: Sorry, i should have asked a little bit clearer: Can ceph (or OSDs) be used without journals now ? The Journal-Parameter seems to be optional ( because of '[...]' ) Markus Am 14.03.2014 12:19, schrieb John Spray: Journals have not gone anywhere, and ceph-deploy still supports specifying them with exactly the same syntax as before. The page you're looking at is the simplified "quick start", the detail on osd creation including journals is here: http://eu.ceph.com/docs/v0.77/ rados/deployment/ceph-deploy- osd/ Cheers, John On Fri, Mar 14, 2014 at 9:47 AM, Markus Goldberg < goldb...@uni-hildesheim.de > wrote: Hi, i'm a little bit surprised. I read through the new manuals of 0.77 ( http://eu.ceph.com/docs/v0. 77/start/quick-ceph-deploy/ ) In the section of creating the osd the manual says: Then, from your admin node, use ceph-deploy to prepare the OSDs. ceph-deploy osd prepare {ceph-node}:/path/to/directory For example: ceph-deploy osd prepare node2:/var/local/osd0 node3:/var/local/osd1 Finally, activate the OSDs. ceph-deploy osd activate {ceph-node}:/path/to/directory For example: ceph-deploy osd activate node2:/var/local/osd0 node3:/var/local/osd1 In former versions the osd was created like: ceph-deploy -v --overwrite-conf osd --fs-type btrfs prepare bd-0:/dev/sdb:/dev/sda5 ^^ Journal As i remember defining and creating a journal for each osd was a must. So the question is: Are Journals obsolet now ? -- MfG, Markus Goldberg -- -- -- Markus Goldberg Universität Hildesheim Rechenzentrum Tel +49 5121 88392822 Marienburger Platz 22, D-31141 Hildesheim, Germany Fax +49 5121 88392823 email goldb...@uni-hildesheim.de -- -- -- __ _ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/ listinfo.cgi/ceph-users-ceph. 
com -- MfG, Markus Goldberg -- -- -- Markus Goldberg Universität Hildesheim Rechenzentrum Tel +49 5121 88392822 Marienburger Platz 22, D-31141 Hildesheim, Germany Fax +49 5121 88392823 email goldb...@uni-hildesheim.de -- -- -- __ _ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/ listinfo.cgi/ceph-users-ceph. com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
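For completeness, with the default filestore backend the journal is still present and still configurable per OSD, independently of the experimental key/value backend discussed above; a minimal ceph.conf sketch, where the size and device path are only examples:

    [osd]
    osd journal size = 10240

    [osd.0]
    host = node2
    osd journal = /dev/sda5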
[ceph-users] AWS SDK and multipart upload
Just FYI for anyone who might be using the AWS Java SDK with rgw.

There is a bug in older versions of the AWS SDK in the CompleteMultipartUpload call. The ETag that is sent in the manifest is not formatted correctly. This will cause rgw to return a 400.

e.g.

T 192.168.1.16:46532 -> 192.168.1.51:80 [AP]
POST /ed19074e-ed9a-488e-96e0-0d29b3704717/5bc1961c-215a-449e-9df4-c80b50ecfe64-multi-1394917767?uploadId=huS2ydwnDmOJ0vqu_CDCRJIIxvQuXfe HTTP/1.1.
Host: 192.168.1.51.
Authorization: AWS foo:bar.
Date: Sat, 15 Mar 2014 21:09:28 GMT.
User-Agent: aws-sdk-java/1.5.0 Linux/2.6.32-431.5.1.el6.x86_64 OpenJDK_64-Bit_Server_VM/24.45-b08/1.7.0_51.
Content-Type: text/plain.
Content-Length: 147.
Connection: Keep-Alive.
.
1&quot;cd3573ccd5891f07fcd519881cc74738&quot;

The "&quot;" above should be a plain " character.

I moved to version 1.7.1 of the AWS SDK and multipart worked just fine.

If you already knew about it, please ignore :)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
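For comparison, this is the shape of a well-formed CompleteMultipartUpload body, with the part's ETag wrapped in literal double quotes rather than escaped entities; the part number and ETag are just the values from the capture above, and the heredoc is only a convenient way to show the body, not what the SDK literally emits byte for byte:

    # Sketch of a correctly formatted manifest for the single part above
    cat > complete-multipart.xml <<'EOF'
    <CompleteMultipartUpload>
      <Part>
        <PartNumber>1</PartNumber>
        <ETag>"cd3573ccd5891f07fcd519881cc74738"</ETag>
      </Part>
    </CompleteMultipartUpload>
    EOF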