[ceph-users] CephFS emptying files or silently failing to mount?
howdy, y'all. we are testing ceph and all of its features. we love RBD! however CephFS, though clearly stated not production ready, has been stonewalling us. in an attempt to get rolling quickly, we followed some guides on CephFS (http://goo.gl/BmVxG, http://goo.gl/1VtNk).

when i mount CephFS, i can:
1) create a file with contents on clientA;
2) list the folder and see the file on clientB and clientC;
3) pull up the exact same xattrs on that file across all clients.

however, as soon as i try to modify or view the contents of the file, ubuntu 12.04 says "operation not permitted". only clientA, which made the file, can try to view its contents, but when clientA does so the file appears empty. a remount of the share on any client makes no difference. for what it is worth, the xattrs don't change upon remount.

what in the world are we doing wrong?

cephx is enabled and works great for RBD. our ceph auth command for the CephFS user:

    ceph auth get-or-create client.cephfs mds 'allow *' mon 'allow r' osd 'allow rwx pool=cephfs'

we are allowing ceph to determine what pool to use, which i think defaults to 0/"data".

mount command:

    mount -t ceph 10.0.10.53:6789:/ /mnt/test/ -o name=cephfs,secretfile=/etc/ceph/client.cephfs,noatime

the mount path exists and permissions are consistent across all clients. "/etc/ceph/client.cephfs" is populated with the output of "ceph auth print-key client.cephfs".

all my clients are currently ubuntu 12.04 LTS virtual machines; our company has not dedicated physical servers to this project. kernel for all clients:

    Linux os-comp2 3.5.0-30-generic #51~precise1-Ubuntu SMP Wed May 15 08:48:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

xattrs for the test file (hex values removed):

    # xattr -l testfile
    ceph.file.layout: chunk_bytes=4194304.stripe_count=1.object_size=4194304.
    ceph.layout: chunk_bytes=4194304.stripe_count=1.object_size=4194304.

validating that ceph actually created the file (see http://tracker.ceph.com/issues/2753):

    # cephfs testfile show_location
    location.file_offset: 0
    location.object_offset: 0
    location.object_no: 0
    location.object_size: 4194304
    location.object_name: 100.0000
    location.block_offset: 0
    location.block_size: 4194304
    location.osd: 4

-bo
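A quick way to confirm whether the file data ever reached RADOS at all — a sketch, not from the original thread; the pool names are the two candidates discussed here, and the object name prefix comes from the show_location output above:

    # look for the test file's objects in each candidate pool
    rados -p data ls   | grep '^100\.'
    rados -p cephfs ls | grep '^100\.'

If neither pool contains the 100.* objects, the data writes never left the client.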
Re: [ceph-users] CephFS emptying files or silently failing to mount?
Holy cow. Thank you for pointing out what should have been obvious. So glad these emails are kept on the web for future searchers like me ;)

-bo

On Tue, Jun 11, 2013 at 11:46 AM, Gregory Farnum wrote:
> On Tue, Jun 11, 2013 at 9:39 AM, Bo wrote:
> > [snip]
> >
> > we are allowing ceph to determine what pool to use, which i think
> > defaults to 0/"data".
>
> Yep, that's why. Your cephx key only lets the client touch the
> "cephfs" pool, but the metadata server is telling the client to put
> its file data in the "data" pool. The client buffers up writes, sends
> all the metadata to the server (where it is successfully stored), but
> when it tries to flush anything out it fails to do so. You'll need to
> resolve that discrepancy, either by reconfiguring CephFS to store all
> the file data in the "cephfs" pool (look up the newfs command in the
> docs, or just set a new layout on the root dir), or adding the "data"
> pool to the list that the clients can access.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
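For future searchers, the two fixes Greg describes look roughly like this — a sketch only, using the pool and client names from this thread; the exact commands vary between releases:

    # Option 1: widen the client caps so it may also write to the "data"
    # pool that the MDS is actually handing out:
    ceph auth caps client.cephfs mds 'allow *' mon 'allow r' \
        osd 'allow rwx pool=cephfs, allow rwx pool=data'

    # Option 2: steer file data into the "cephfs" pool instead. On newer
    # releases the pool is added to the filesystem and the layout is set as
    # an xattr on the (root) directory, affecting newly created files;
    # older releases used the cephfs(1) tool or the newfs command instead.
    ceph mds add_data_pool cephfs
    setfattr -n ceph.dir.layout.pool -v cephfs /mnt/test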
Re: [ceph-users] Live Migrations with cephFS
If I am not mistaken, one would need to modify OpenStack source to force Nova to boot from RBD volumes. Is this no longer the case?

Modifying OpenStack's source is a wonderful idea, especially if you push your changes upstream for review. However, it does add to your work when you want to pull updated code from upstream into your deployment.

-bo

On 14.06.2013 12:55, Alvaro Izquierdo Jimeno wrote:
> By default, openstack uses NFS but… other options are available…. can we
> use cephFS instead of NFS?

Wouldn't you use qemu-rbd for your virtual guests in OpenStack?
AFAIK CephFS is not needed for KVM/qemu virtual machines.

Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
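To make the qemu-rbd suggestion concrete: a guest disk can be served straight out of RBD with no CephFS mount anywhere. A rough sketch — the pool/image names and the cephx user are placeholders, and the exact -drive syntax depends on the qemu version in use:

    # create an image in a pool, then hand it to qemu as a raw virtio disk
    qemu-img create -f raw rbd:volumes/test-disk 10G
    qemu-system-x86_64 -m 1024 \
        -drive format=raw,if=virtio,file=rbd:volumes/test-disk:id=cephfs:conf=/etc/ceph/ceph.conf

Libvirt and Nova wire up the same thing as a network disk using the 'rbd' protocol, which is what makes live migration straightforward: every hypervisor sees the same image.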
Re: [ceph-users] Live Migrations with cephFS
Thank you, Sebastien Han. I am sure many are thankful you've published your thoughts and experiences with Ceph and even OpenStack.

If I may, I would like to reword my question/statement with greater clarity:

To force all instances to always boot from RBD volumes, would a person have to make changes to something more than Horizon (the demonstration GUI)? If the changes need only be in Horizon, the provider would then likely need to restrict or deny their customers access to the unmodified APIs. If they do not, then the unchanged APIs would allow behavior the provider does not want.

Thoughts? Corrections? Feel free to teach.

-bo

On Jun 16, 2013 9:44 AM, "Sebastien Han" wrote:
> In OpenStack, a VM booted from a volume (where the disk is located on RBD)
> supports live-migration without any problems.
>
> Sébastien Han
> Cloud Engineer, eNovance — sebastien@enovance.com — www.enovance.com
>
> On Jun 14, 2013, at 11:36 PM, Bo wrote:
> > [snip]
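On the volume side, the plumbing that puts those volumes on RBD is ordinary Cinder configuration, and boot-from-volume is reachable through the plain Nova API regardless of what Horizon exposes — which is why hiding it in the GUI alone enforces nothing. Illustrative only; the option names and client syntax are from the Grizzly-era releases and may differ in yours:

    # /etc/cinder/cinder.conf (fragment)
    #   volume_driver   = cinder.volume.drivers.rbd.RBDDriver
    #   rbd_pool        = volumes
    #   rbd_user        = cinder
    #   rbd_secret_uuid = <uuid of the libvirt secret holding the cephx key>

    # boot an instance from an existing RBD-backed volume via the CLI/API:
    nova boot --flavor m1.small --block-device-mapping vda=<volume-id>:::0 test-vm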
[ceph-users] MON quorum a single point of failure?
Howdy! Loving working with ceph; learning a lot. :)

I am curious about the quorum process because I seem to get conflicting information from "experts". Those that I report to need a clear answer from me which I am currently unable to give.

Ceph needs an odd number of monitors in any given cluster (3, 5, 7) to avoid split-brain syndrome. So what happens whenever I have 3 monitors, 1 dies, and I have 2 left?

The information regarding this situation that I have gathered over the past few months all falls within these three categories:
A) commonly "stated"--nothing is said. period.
B) rarely stated--this is a bad situation (possibly split-brain).
C) rarely stated--each monitor has a "rank", so the highest-ranking monitor is the boss, thus quorum.

Does anyone know with absolute certainty what ceph's quorum logic will do with an even number of (specifically 2) monitors left?

You may say, "well, take down one of your monitors", to which I respectfully state that my testing is not an authoritative answer on what ceph is designed to do and what it does in production. My testing cannot cover the vast majority of cases covered by the hundreds/thousands who have had a monitor die.

Thank you for your time and brain juice,
-bo
Re: [ceph-users] MON quorum a single point of failure?
Thank you, Mike, Sage, and Greg. Completely different than everything I had heard or read. Clears it all up. :)

Gracias,
-bo

On Thu, Jun 20, 2013 at 11:15 AM, Gregory Farnum wrote:
> On Thursday, June 20, 2013, Bo wrote:
> > [snip]
>
> This is often misunderstood, but the answers to your questions are
> pretty simple. :)
>
> There is no risk of split brain in Ceph (so, not in the monitor either).
> The mantra to use an odd number of monitors is *not* a system
> requirement; it is a deployment recommendation. This is due to how the
> cluster avoids split brain — using a Paxos variant in which a strict
> majority of the monitors need to agree on everything. Using one
> monitor, you can make forward progress if it's running; using two
> monitors, you cannot afford for either of them to die (because then you
> only have 50% up); using three monitors you can lose one; using four
> you can lose one; using five you can lose two; etc. So using an even
> number of monitors increases your odds of failure without increasing
> your survivability (in availability terms) over the previous odd number.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
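Greg's strict-majority rule is easy to tabulate for yourself; a throwaway sketch (plain arithmetic, not a Ceph command):

    for n in 1 2 3 4 5 6 7; do
      quorum=$(( n / 2 + 1 ))        # strict majority of the monitor map
      echo "monitors=$n quorum=$quorum can_lose=$(( n - quorum ))"
    done

For the original question: with 3 monitors and 1 dead, the remaining 2 are still a majority of 3, so the cluster keeps quorum and keeps running.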
[ceph-users] HA and data recovery of CEPH
Hi all,

We are working on using Ceph to build our HA system; the purpose is that the system should always provide service even if a Ceph node is down or an OSD is lost.

Currently, as we have practiced, once a node/OSD is down the Ceph cluster needs about 40 seconds to sync data, and our system can't provide service during that time.

My questions:
- Is there any way we can reduce the data sync time?
- How can we keep Ceph available once a node/OSD is down?

BR
Re: [ceph-users] HA and data recovery of CEPH
Hi Nathan,

Thanks for the help. My colleague will provide more details.

BR

On Fri, Nov 29, 2019 at 12:57 PM Nathan Fish wrote:
> If correctly configured, your cluster should have zero downtime from a
> single OSD or node failure. What is your crush map? Are you using
> replica or EC? If your 'min_size' is not smaller than 'size', then you
> will lose availability.
>
> On Thu, Nov 28, 2019 at 10:50 PM Peng Bo wrote:
> > [snip]
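For anyone checking the same thing, the settings Nathan refers to are per pool and easy to inspect (the pool name "rbd" below is only an example):

    ceph osd pool get rbd size        # replica count
    ceph osd pool get rbd min_size    # I/O to a PG blocks once fewer copies than this are up
    ceph osd pool set rbd min_size 2  # with size=3, lets I/O continue with one host down
    ceph osd tree                     # confirm CRUSH spreads replicas across hosts, not just OSDs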
Re: [ceph-users] HA and data recovery of CEPH
Thanks to all. We can now get that duration down to around 25 seconds, which is the best result we can achieve.

BR

On Tue, Dec 3, 2019 at 10:30 PM Wido den Hollander wrote:
>
> On 12/3/19 3:07 PM, Aleksey Gutikov wrote:
>
> >> That is true. When an OSD goes down it will take a few seconds for its
> >> Placement Groups to re-peer with the other OSDs. During that period
> >> writes to those PGs will stall for a couple of seconds.
> >>
> >> I wouldn't say it's 40s, but it can take ~10s.
> >
> > Hello,
> >
> > According to my experience, in case of an OSD crash or kill -9 (any kind
> > of abnormal termination), OSD failure handling contains these steps:
> > 1) The failed OSD's peers detect that it does not respond - this can take up
> > to osd_heartbeat_grace + osd_heartbeat_interval seconds.
>
> If a 'Connection Refused' is detected the OSD will be marked as down
> right away.
>
> > 2) Peers send failure reports to the monitor.
> > 3) The monitor makes a decision according to (options from its own config)
> > mon_osd_adjust_heartbeat_grace, osd_heartbeat_grace,
> > mon_osd_laggy_halflife, mon_osd_min_down_reporters, ... and finally marks
> > the OSD down in the osdmap.
>
> True.
>
> > 4) The monitor sends the updated osdmap to OSDs and clients.
> > 5) OSDs start peering.
> > 5.1) Peering itself is a complicated process; for example, we have
> > experienced PGs stuck in the inactive state due to
> > osd_max_pg_per_osd_hard_ratio.
>
> I would say that 5.1 isn't relevant for most cases. Yes, it can happen,
> but it's rare.
>
> > 6) Peering finishes (PG data continues moving) - clients can normally
> > access the affected PGs. Clients also have their own timeouts that can
> > affect time to recover.
> >
> > Again, according to my experience, 40s with default settings is possible.
>
> 40s is possible in certain scenarios. But I wouldn't say that's the
> default for all cases.
>
> Wido
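The options Aleksey lists can be inspected and, carefully, tightened; a rough sketch only — the values are illustrative rather than recommendations, and the available options differ between releases:

    # read the current value via the admin socket on an OSD host
    ceph daemon osd.0 config get osd_heartbeat_grace

    # inject tighter failure-detection settings at runtime
    ceph tell 'osd.*' injectargs '--osd_heartbeat_grace 10 --osd_heartbeat_interval 3'
    ceph tell mon.a injectargs '--mon_osd_min_down_reporters 2'   # "a" is an example mon id

    # persist the same values in ceph.conf ([osd] / [mon]) so they survive restarts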