Re: [ceph-users] [cephfs] About feature 'snapshot'
Snapshots are disabled by default in Jewel as well. Depending on user feedback about what's most important, we hope to have them ready for Kraken or the L release (but we'll see). -Greg On Friday, March 18, 2016, 施柏安 wrote: > Hi John, > Really thank you for your help, and sorry about that I ask such a stupid > question of setting... > So isn't this feature ready in Jewel? I found something info says that the > features(snapshot, quota...) become stable in Jewel > > Thank you > > 2016-03-18 21:07 GMT+09:00 John Spray >: > >> On Fri, Mar 18, 2016 at 1:33 AM, 施柏安 > > wrote: >> > Hi John, >> > How to set this feature on? >> >> ceph mds set allow_new_snaps true --yes-i-really-mean-it >> >> John >> >> > Thank you >> > >> > 2016-03-17 21:41 GMT+08:00 Gregory Farnum > >: >> >> >> >> On Thu, Mar 17, 2016 at 3:49 AM, John Spray > > wrote: >> >> > Snapshots are disabled by default: >> >> > >> >> > >> http://docs.ceph.com/docs/hammer/cephfs/early-adopters/#most-stable-configuration >> >> >> >> Which makes me wonder if we ought to be hiding the .snaps directory >> >> entirely in that case. I haven't previously thought about that, but it >> >> *is* a bit weird. >> >> -Greg >> >> >> >> > >> >> > John >> >> > >> >> > On Thu, Mar 17, 2016 at 10:02 AM, 施柏安 > > wrote: >> >> >> Hi all, >> >> >> I encounter a trouble about cephfs sanpshot. It seems that the >> folder >> >> >> '.snap' is exist. >> >> >> But I use 'll -a' can't let it show up. And I enter that folder and >> >> >> create >> >> >> folder in it, it showed something wrong to use snapshot. >> >> >> >> >> >> Please check : http://imgur.com/elZhQvD >> >> >> >> >> >> >> >> >> ___ >> >> >> ceph-users mailing list >> >> >> ceph-users@lists.ceph.com >> >> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> >> >> >> > ___ >> >> > ceph-users mailing list >> >> > ceph-users@lists.ceph.com >> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > >> > >> > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
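For reference, the full sequence on a test cluster looks like this (the /mnt/cephfs path is just an example mount point; snapshots are still considered experimental here, hence the flag):

ceph mds set allow_new_snaps true --yes-i-really-mean-it
mkdir /mnt/cephfs/mydir/.snap/before-change      # create a snapshot of mydir
ls /mnt/cephfs/mydir/.snap                       # list snapshots (the .snap directory itself is hidden from readdir)
rmdir /mnt/cephfs/mydir/.snap/before-change      # remove the snapshot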
Re: [ceph-users] ZFS or BTRFS for performance?
Hi,

On 18/03/2016 20:58, Mark Nelson wrote:
> FWIW, from purely a performance perspective Ceph usually looks pretty
> fantastic on a fresh BTRFS filesystem. In fact it will probably
> continue to look great until you do small random writes to large
> objects (like say to blocks in an RBD volume). Then COW starts
> fragmenting the objects into oblivion. I've seen sequential read
> performance drop by 300% after 5 minutes of 4K random writes to the
> same RBD blocks.
>
> Autodefrag might help.

With 3.19 it wasn't enough for our workload and we had to develop our own
defragmentation scheduler, see https://github.com/jtek/ceph-utils. We tried
autodefrag again with a 4.0.5 kernel but it wasn't good enough yet (and based
on my reading of the linux-btrfs list I don't think there is any work being
done on it currently).

> A long time ago I recall Josef told me it was dangerous to use (I
> think it could run the node out of memory and corrupt the FS), but it
> may be that it's safer now.

No problem here (as long as we use our defragmentation scheduler, otherwise
performance degrades over time/amount of rewrites).

> In any event we don't really do a lot of testing with BTRFS these
> days as bluestore is indeed the next gen OSD backend.

Will bluestore provide the same protection against bitrot as BTRFS? I.e. with
BTRFS the deep-scrubs detect inconsistencies *and* the OSD(s) with invalid
data get I/O errors when trying to read the corrupted data, so they can't be
used as the source for repairs even if they are primary OSD(s). So with BTRFS
you get pretty good overall protection against bitrot in Ceph (it allowed us
to automate the repair process in the most common cases). With XFS, IIRC,
unless you override the default behavior the primary OSD is always the source
for repairs (even if all the secondaries agree on another version of the data).

Best regards,

Lionel
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
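For reference, checking and repairing an inconsistency found by deep-scrub looks like this (the PG id 3.7f is only a placeholder; on XFS keep in mind the primary-as-repair-source caveat discussed above):

ceph health detail | grep inconsistent     # list the inconsistent PGs
ceph pg deep-scrub 3.7f                    # re-verify a specific PG
ceph pg repair 3.7f                        # rewrite the bad copy from the authoritative one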
Re: [ceph-users] inconsistent PG -> unfound objects on an erasure coded system
Thanks Sam. Since I have prepared a script for this, I decided to go ahead with the checks (patience isn't one of my extended attributes). I've got a script that searches the full erasure-coded space and does your checklist below. I have operated on only one PG so far, the 70.459 one that we've been discussing. There was only the one file that I found to be out of place -- the one we already discussed/found -- and it has been removed. The pg is still marked as inconsistent. I've scrubbed it a couple of times now and what I've seen is:

2016-03-17 09:29:53.202818 7f2e816f8700 0 log_channel(cluster) log [INF] : 70.459 deep-scrub starts
2016-03-17 09:36:38.436821 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, 22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 68440088914/68445454633 bytes, 0/0 hit_set_archive bytes.
2016-03-17 09:36:38.436844 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459 deep-scrub 1 errors
2016-03-17 09:44:23.592302 7f2e816f8700 0 log_channel(cluster) log [INF] : 70.459 deep-scrub starts
2016-03-17 09:47:01.237846 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, 22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 68440088914/68445454633 bytes, 0/0 hit_set_archive bytes.
2016-03-17 09:47:01.237880 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459 deep-scrub 1 errors

Should the scrub be sufficient to remove the inconsistent flag? I took the OSD offline during the repairs. I've looked at the files in all of the OSDs in the placement group and I'm not finding any more problem files. The vast majority of files do not have the user.cephos.lfn3 attribute. There are 22321 objects that I've seen and only about 230 have the user.cephos.lfn3 file attribute. The files have other attributes, just not user.cephos.lfn3.

Regards,
Jeff

On Wed, Mar 16, 2016 at 3:53 PM, Samuel Just wrote:
> Ok, like I said, most files with _long at the end are *not orphaned*.
> The generation number also is *not* an indication of whether the file
> is orphaned -- some of the orphaned files will have
> as the generation number and others won't. For each long filename
> object in a pg you would have to:
> 1) Pull the long name out of the attr
> 2) Parse the hash out of the long name
> 3) Turn that into a directory path
> 4) Determine whether the file is at the right place in the path
> 5) If not, remove it (or echo it to be checked)
>
> You probably want to wait for someone to get around to writing a
> branch for ceph-objectstore-tool. Should happen in the next week or two.
> -Sam

--
Jeffrey McDonald, PhD
Assistant Director for HPC Operations
Minnesota Supercomputing Institute
University of Minnesota Twin Cities
599 Walter Library    email: jeffrey.mcdon...@msi.umn.edu
117 Pleasant St SE    phone: +1 612 625-6905
Minneapolis, MN 55455    fax: +1 612 624-8861
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
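A rough shell sketch of Sam's five steps, for anyone following along (the PG path, the position of the hash field in the lfn3 attribute, and the reversed-nibble directory layout are assumptions here; it only echoes candidates for manual review, it never deletes anything):

pgdir=/var/lib/ceph/osd/ceph-NN/current/70.459s0_head   # adjust per OSD and shard
cd "$pgdir"
find . -type f -name '*_long' | while read -r f; do
    # 1) pull the long name out of the attr
    longname=$(getfattr --only-values -n user.cephos.lfn3 "$f" 2>/dev/null) || continue
    # 2) parse the hash out of the long name (assumed: the first 8-hex-digit token;
    #    verify this against a known-good file before trusting the output)
    hash=$(echo "$longname" | grep -oE '[0-9A-F]{8}' | head -1)
    [ -n "$hash" ] || continue
    # 3) turn the hash into a directory path: filestore nests DIR_<nibble> directories
    #    using the hash digits in reverse order, as deep as the split has gone
    path="."
    for nibble in $(echo "$hash" | rev | fold -w1); do
        [ -d "$path/DIR_$nibble" ] || break
        path="$path/DIR_$nibble"
    done
    # 4) is the file where the hash says it should be?  5) if not, echo it to be checked
    [ "$(dirname "$f")" = "$path" ] || echo "possible orphan: $f (expected in $path)"
done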
[ceph-users] RGW quota
Hi,

We have a user with a 50GB quota who now has a single bucket with 20GB of files. They had previous buckets that were created and removed, but the quota usage has not decreased. I understand that we do garbage collection, but it has been significantly longer than the default GC intervals (which we have not overridden). They get 403 QuotaExceeded when trying to write additional data to a new bucket or to the existing bucket.

# radosgw-admin user info --uid=username
...
    "user_quota": {
        "enabled": true,
        "max_size_kb": 52428800,
        "max_objects": -1
    },

# radosgw-admin bucket stats --bucket=start
...
    "usage": {
        "rgw.main": {
            "size_kb": 21516505,
            "size_kb_actual": 21516992,
            "num_objects": 243
        }
    },

# radosgw-admin user stats --uid=username
...
{
    "stats": {
        "total_entries": 737,
        "total_bytes": 55060794604,
        "total_bytes_rounded": 55062102016
    },
    "last_stats_sync": "2016-03-16 14:16:25.205060Z",
    "last_stats_update": "2016-03-16 14:16:25.190605Z"
}

Thanks,
derek

--
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
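Two things worth trying here (a sketch, assuming default GC settings and the uid shown above): force a garbage-collection pass for the removed buckets, and force the per-user stats to resync so the quota accounting catches up.

radosgw-admin gc list --include-all | head              # see what is still pending garbage collection
radosgw-admin gc process                                # run a GC pass now instead of waiting
radosgw-admin user stats --uid=username --sync-stats    # recompute the user's usage stats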
Re: [ceph-users] data corruption with hammer
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Sage, You patch seems to have resolved the issue for us. We can't reproduce the problem with ceph_test_rados or our VM test. I also figured out that those are all backports that were cherry-picked so it was showing the original commit date. There was quite a bit of work on ReplicatedPG.cc since 0.94.6 so it probably only makes sense to wait for 0.94.7 for this fix. Thanks for looking into this so quick! As a work around for 0.94.6, our testing shows that min_read_recency_for_promote 1 does not have the corruption as it keeps the original behavior. Something for people to be aware of with 0.94.6 and using cache tiers. Hopefully there is a way to detect this in a unittest. -BEGIN PGP SIGNATURE- Version: Mailvelope v1.3.6 Comment: https://www.mailvelope.com wsFcBAEBCAAQBQJW6wILCRDmVDuy+mK58QAAcVQP/0t8jGZuwmwg2RIwkgjQ Kb3mIxvsmnA9BQ4dICJB3Wu6FPT1/V34t0ThASehWyVSJyiUkdf+pxhXbDaQ vOr4OOyTwCB2Ly6jaLEgAiyGTL45uOnMYcSttXPG95lilTb+oGUcqBdQzRbw yJHG18UiEgMvKnttFjTLbd1FjICIY7xkkP7lrdHvaqe200aqQmb+g8CHTVj/ HqzYm/gTs84c2vK+x/nV8OFxY9Yf5WAV+O7uozeWC3SAc2VMlQgi8rdng51N B+andt/SXgGq9VCDqdmEzcEpBN+2wK6usZQCZJmMXRmW4BXYVK4yAdfgKJOB MEUN2cDA1i7bMIUcDrh1hnqwEfizkbqOWXpgrgAkQYhtlbp/gvEucl5nYMUy kv9jNYg/KFQn9tzZqKWmvHj3sjl6DmOlN+A9XA2fGppOiiKk0s4dVKRDFwSJ LNxUIZm4CtAekaQ4KymE/hK6RhRU2REQl7qSMF+wtw73nhA9gzqP32Ag46yd WoeGpOngWRnMaejQfkuTSjiDSLvbCd7X5LM/WXH4dJHtHNSSA2qK3c4Nvvqp yDhvFLdvybtJvWj0+hHczpcP0VlFZH9s7uGWz0+cNabkRnm41EC2+XD6sJ5+ kinZO+CgjbC2AQPdoEKMuvRwBgnftH0YuZJFl0sQPkgBg23r+eCfIxfW/9v/ iLgk =6It+ -END PGP SIGNATURE- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Mar 17, 2016 at 11:55 AM, Robert LeBlanc wrote: > Cherry-picking that commit onto v0.94.6 wasn't clean so I'm just > building your branch. I'm not sure what the difference between your > branch and 0.94.6 is, I don't see any commits against > osd/ReplicatedPG.cc in the last 5 months other than the one you did > today. > > Robert LeBlanc > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 > > > On Thu, Mar 17, 2016 at 11:38 AM, Robert LeBlanc wrote: >> Yep, let me pull and build that branch. I tried installing the dbg >> packages and running it in gdb, but it didn't load the symbols. >> >> Robert LeBlanc >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >> >> >> On Thu, Mar 17, 2016 at 11:36 AM, Sage Weil wrote: >>> On Thu, 17 Mar 2016, Robert LeBlanc wrote: Also, is this ceph_test_rados rewriting objects quickly? I think that the issue is with rewriting objects so if we can tailor the ceph_test_rados to do that, it might be easier to reproduce. >>> >>> It's doing lots of overwrites, yeah. >>> >>> I was albe to reproduce--thanks! It looks like it's specific to >>> hammer. The code was rewritten for jewel so it doesn't affect the >>> latest. The problem is that maybe_handle_cache may proxy the read and >>> also still try to handle the same request locally (if it doesn't trigger a >>> promote). >>> >>> Here's my proposed fix: >>> >>> https://github.com/ceph/ceph/pull/8187 >>> >>> Do you mind testing this branch? >>> >>> It doesn't appear to be directly related to flipping between writeback and >>> forward, although it may be that we are seeing two unrelated issues. I >>> seemed to be able to trigger it more easily when I flipped modes, but the >>> bug itself was a simple issue in the writeback mode logic. :/ >>> >>> Anyway, please see if this fixes it for you (esp with the RBD workload). >>> >>> Thanks! 
>>> sage >>> >>> >>> >>> Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Mar 17, 2016 at 11:05 AM, Robert LeBlanc wrote: > I'll miss the Ceph community as well. There was a few things I really > wanted to work in with Ceph. > > I got this: > > update_object_version oid 13 v 1166 (ObjNum 1028 snap 0 seq_num 1028) > dirty exists > 1038: left oid 13 (ObjNum 1028 snap 0 seq_num 1028) > 1040: finishing write tid 1 to nodez23350-256 > 1040: finishing write tid 2 to nodez23350-256 > 1040: finishing write tid 3 to nodez23350-256 > 1040: finishing write tid 4 to nodez23350-256 > 1040: finishing write tid 6 to nodez23350-256 > 1035: done (4 left) > 1037: done (3 left) > 1038: done (2 left) > 1043: read oid 430 snap -1 > 1043: expect (ObjNum 429 snap 0 seq_num 429) > 1040: finishing write tid 7 to nodez23350-256 > update_object_version oid 256 v 661 (ObjNum 1029 snap 0 seq_num 1029) > dirty exists > 1040: left oid 256 (ObjNum 1029 snap 0 seq_num 1029) > 1042: expect (ObjNum 664 snap 0 seq_num 664) > 1043: Error: oid 430 read returned error code -2 > ./test/osd/RadosModel.h: In function 'virtual void > ReadOp::_fini
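For anyone running 0.94.6 cache tiers before 0.94.7 lands, the workaround mentioned above boils down to (the pool name "cache" is a placeholder for your cache-tier pool):

ceph osd pool set cache min_read_recency_for_promote 1
ceph osd pool get cache min_read_recency_for_promote     # verify it took effect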
[ceph-users] cephfs infernalis (ceph version 9.2.1) - bonnie++
Hi!

Trying to run bonnie++ on CephFS mounted via the kernel driver on a CentOS 7.2.1511 machine resulted in:

# bonnie++ -r 128 -u root -d /data/cephtest/bonnie2/
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

# ceph -w
    cluster
     health HEALTH_OK
     monmap e3: 3 mons at {cestor4=:6789/0,cestor5=:6789/0,cestor6=:6789/0}
            election epoch 62, quorum 0,1,2 cestor4,cestor5,cestor6
     mdsmap e30: 1/1/1 up {0=cestor2=up:active}, 1 up:standby
     osdmap e703: 60 osds: 60 up, 60 in
            flags sortbitwise
      pgmap v135437: 1344 pgs, 4 pools, 4315 GB data, 2315 kobjects
            7262 GB used, 320 TB / 327 TB avail
                1344 active+clean

Any ideas?

Regards
Michael
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Local SSD cache for ceph on each compute node.
I’d rather like to see this implemented at the hypervisor level, i.e.: QEMU, so we can have a common layer for all the storage backends. Although this is less portable... > On 17 Mar 2016, at 11:00, Nick Fisk wrote: > > > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Daniel Niasoff >> Sent: 16 March 2016 21:02 >> To: Nick Fisk ; 'Van Leeuwen, Robert' >> ; 'Jason Dillaman' >> Cc: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] Local SSD cache for ceph on each compute node. >> >> Hi Nick, >> >> Your solution requires manual configuration for each VM and cannot be >> setup as part of an automated OpenStack deployment. > > Absolutely, potentially flaky as well. > >> >> It would be really nice if it was a hypervisor based setting as opposed to > a VM >> based setting. > > Yes, I can't wait until we can just specify "rbd_cache_device=/dev/ssd" in > the ceph.conf and get it to write to that instead. Ideally ceph would also > provide some sort of lightweight replication for the cache devices, but > otherwise a iSCSI SSD farm or switched SAS could be used so that the caching > device is not tied to one physical host. > >> >> Thanks >> >> Daniel >> >> -Original Message- >> From: Nick Fisk [mailto:n...@fisk.me.uk] >> Sent: 16 March 2016 08:59 >> To: Daniel Niasoff ; 'Van Leeuwen, Robert' >> ; 'Jason Dillaman' >> Cc: ceph-users@lists.ceph.com >> Subject: RE: [ceph-users] Local SSD cache for ceph on each compute node. >> >> >> >>> -Original Message- >>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf >>> Of Daniel Niasoff >>> Sent: 16 March 2016 08:26 >>> To: Van Leeuwen, Robert ; Jason Dillaman >>> >>> Cc: ceph-users@lists.ceph.com >>> Subject: Re: [ceph-users] Local SSD cache for ceph on each compute node. >>> >>> Hi Robert, >>> Caching writes would be bad because a hypervisor failure would result in >>> loss of the cache which pretty much guarantees inconsistent data on >>> the ceph volume. Also live-migration will become problematic compared to running >>> everything from ceph since you will also need to migrate the >> local-storage. >> >> I tested a solution using iSCSI for the cache devices. Each VM was using >> flashcache with a combination of a iSCSI LUN from a SSD and a RBD. This > gets >> around the problem of moving things around or if the hypervisor goes down. >> It's not local caching but the write latency is at least 10x lower than > the RBD. >> Note I tested it, I didn't put it into production :-) >> >>> >>> My understanding of how a writeback cache should work is that it >>> should only take a few seconds for writes to be streamed onto the >>> network and is focussed on resolving the speed issue of small sync >>> writes. The writes >> would >>> be bundled into larger writes that are not time sensitive. >>> >>> So there is potential for a few seconds data loss but compared to the >> current >>> trend of using ephemeral storage to solve this issue, it's a major >>> improvement. >> >> Yeah, problem is a couple of seconds data loss mean different things to >> different people. >> >>> (considering the time required for setting up and maintaining the extra >>> caching layer on each vm, unless you work for free ;-) >>> >>> Couldn't agree more there. >>> >>> I am just so surprised how the openstack community haven't looked to >>> resolve this issue. 
Ephemeral storage is a HUGE compromise unless you >>> have built in failure into every aspect of your application but many >>> people use openstack as a general purpose devstack. >>> >>> (Jason pointed out his blueprint but I guess it's at least a year or 2 >> away - >>> http://tracker.ceph.com/projects/ceph/wiki/Rbd_-_ordered_crash- >>> consistent_write-back_caching_extension) >>> >>> I see articles discussing the idea such as this one >>> >>> http://www.sebastien-han.fr/blog/2014/06/10/ceph-cache-pool-tiering- >>> scalable-cache/ >>> >>> but no real straightforward validated setup instructions. >>> >>> Thanks >>> >>> Daniel >>> >>> >>> -Original Message- >>> From: Van Leeuwen, Robert [mailto:rovanleeu...@ebay.com] >>> Sent: 16 March 2016 08:11 >>> To: Jason Dillaman ; Daniel Niasoff >>> >>> Cc: ceph-users@lists.ceph.com >>> Subject: Re: [ceph-users] Local SSD cache for ceph on each compute node. >>> Indeed, well understood. As a shorter term workaround, if you have control over the VMs, you could >>> always just slice out an LVM volume from local SSD/NVMe and pass it >>> through to the guest. Within the guest, use dm-cache (or similar) to >>> add >> a >>> cache front-end to your RBD volume. >>> >>> If you do this you need to setup your cache as read-cache only. >>> Caching writes would be bad because a hypervisor failure would result >>> in >> loss >>> of the cache which pretty much guarantees inconsistent data on the >>> ceph volume. >>> Also live-migration w
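As a concrete illustration of the "slice out an LVM volume from local SSD and use dm-cache in the guest" idea mentioned above, something along these lines should work (device names and sizes are placeholders; per the discussion, writethrough only, so losing the local SSD cannot lose data):

pvcreate /dev/vdb /dev/vdc                    # vdb = passed-through local SSD slice, vdc = RBD-backed disk
vgcreate vgdata /dev/vdb /dev/vdc
lvcreate -n data -l 100%PVS vgdata /dev/vdc   # origin LV lives on the RBD disk
lvcreate -n cache -L 18G vgdata /dev/vdb      # cache data LV on the SSD slice
lvcreate -n cache_meta -L 2G vgdata /dev/vdb  # cache metadata LV on the SSD slice
lvconvert --type cache-pool --poolmetadata vgdata/cache_meta vgdata/cache
lvconvert --type cache --cachemode writethrough --cachepool vgdata/cache vgdata/data
mkfs.xfs /dev/vgdata/data && mount /dev/vgdata/data /mnt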
Re: [ceph-users] data corruption with hammer
This tracker ticket happened to go by my eyes today: http://tracker.ceph.com/issues/12814 . There isn't a lot of detail there but the headline matches. -Greg On Wed, Mar 16, 2016 at 2:02 AM, Nick Fisk wrote: > > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Christian Balzer >> Sent: 16 March 2016 07:08 >> To: Robert LeBlanc >> Cc: Robert LeBlanc ; ceph-users > us...@lists.ceph.com>; William Perkins >> Subject: Re: [ceph-users] data corruption with hammer >> >> >> Hello Robert, >> >> On Tue, 15 Mar 2016 10:54:20 -0600 Robert LeBlanc wrote: >> >> > -BEGIN PGP SIGNED MESSAGE- >> > Hash: SHA256 >> > >> > There are no monitors on the new node. >> > >> So one less possible source of confusion. >> >> > It doesn't look like there has been any new corruption since we >> > stopped changing the cache modes. Upon closer inspection, some files >> > have been changed such that binary files are now ASCII files and visa >> > versa. These are readable ASCII files and are things like PHP or >> > script files. Or C files where ASCII files should be. >> > >> What would be most interesting is if the objects containing those > corrupted >> files did reside on the new OSDs (primary PG) or the old ones, or both. >> >> Also, what cache mode was the cluster in before the first switch > (writeback I >> presume from the timeline) and which one is it in now? >> >> > I've seen this type of corruption before when a SAN node misbehaved >> > and both controllers were writing concurrently to the backend disks. >> > The volume was only mounted by one host, but the writes were split >> > between the controllers when it should have been active/passive. >> > >> > We have killed off the OSDs on the new node as a precaution and will >> > try to replicate this in our lab. >> > >> > I suspicion is that is has to do with the cache promotion code update, >> > but I'm not sure how it would have caused this. >> > >> While blissfully unaware of the code, I have a hard time imagining how it >> would cause that as well. >> Potentially a regression in the code that only triggers in one cache mode > and >> when wanting to promote something? >> >> Or if it is actually the switching action, not correctly promoting things > as it >> happens? >> And thus referencing a stale object? > > I can't think of any other reason why the recency would break things in any > other way. Can the OP confirm what recency setting is being used? > > When you switch to writeback, if you haven't reached the required recency > yet, all reads will be proxied, previous behaviour would have pretty much > promoted all the time regardless. So unless something is happening where > writes are getting sent to one tier in forward mode and then read from a > different tier in WB mode, I'm out of ideas. I'm pretty sure the code says > Proxy Read then check for promotion, so I'm not even convinced that there > should be any difference anyway. > > I note the documentation states that in forward mode, modified objects get > written to the backing tier, I'm not if that sounds correct to me. But if > that is what is happening, that could also be related to the problem??? > > I think this might be easyish to reproduce using the get/put commands with a > couple of objects on a test pool if anybody out there is running 94.6 on the > whole cluster. 
> >> >> Christian >> >> > -BEGIN PGP SIGNATURE- >> > Version: Mailvelope v1.3.6 >> > Comment: https://www.mailvelope.com >> > >> > >> wsFcBAEBCAAQBQJW6D4zCRDmVDuy+mK58QAAoW0QAKmaNnN78m/3/YLI >> IlAB >> > U+q9PKXgB4ptds1prEJrB/HJqtxIi021M2urk6iO2XRUgR4qSWZyVJWMmeE9 >> > 6EhM6IvLbweOePr2LJ5nAVEkL5Fns+ya/aOAvilqo2WJGr8jt9J1ABjQgodp >> > >> SAGwDywo3GbGUmdxWWy5CrhLsdc9WNhiXdBxREh/uqWFvw2D8/1Uq4/u8 >> tEv >> > fohrGD+SZfYLQwP9O/v8Rc1C3A0h7N4ytSMiN7Xg2CC9bJDynn0FTrP2LAr/ >> > >> edEYx+SWF2VtKuG7wVHrQqImTfDUoTLJXP5Q6B+Oxy852qvWzglfoRhaKwGf >> > >> fodaxFlTDQaeMnyhMlODRMMXadmiTmyM/WK44YBuMjM8tnlaxf7yKgh09A >> Dz >> > ay5oviRWnn7peXmq65TvaZzUfz6Mx5ZWYtqIevaXb0ieFgrxCTdVbdpnMNRt >> > >> bMwQ+yVQ8WB5AQmEqN6p6enBCxpvr42p8Eu484dO0xqjIiEOfsMANT/8V63 >> y >> > RzjPMOaFKFnl3JoYNm61RGAUYszNBeX/Plm/3mP0qiiGBAeHYoxh7DNYlrs/ >> > >> gUb/O9V0yNuHQIRTs8ZRyrzZKpmh9YMYo8hCsfIqWZjMwEyQaRFuysQB3NaR >> > lQCO/o12Khv2cygmTCQxS2L7vp2zrkPaS/KietqQ0gwkV1XbynK0XyLkAVDw >> > zTLa >> > =Wk/a >> > -END PGP SIGNATURE- >> > >> > Robert LeBlanc >> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >> > >> > >> > On Mon, Mar 14, 2016 at 9:35 PM, Christian Balzer wrote: >> > > >> > > Hello, >> > > >> > > On Mon, 14 Mar 2016 20:51:04 -0600 Mike Lovell wrote: >> > > >> > >> something weird happened on one of the ceph clusters that i >> > >> administer tonight which resulted in virtual machines using rbd >> > >> volumes seeing corruption in multiple forms. >> > >> >> > >> when everything was fine earlier in the day, the cluster was a >> > >> number of storage nodes spread a
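Along the lines of the get/put reproduction suggested above, a minimal sketch (assumes a test base pool named "base" with a cache tier "cache" already attached, and 0.94.6 running on the whole cluster):

rados -p base put obj1 /etc/services     # write while the tier is in writeback
ceph osd tier cache-mode cache forward   # flip to forward (newer releases may also ask for --yes-i-really-mean-it)
rados -p base put obj1 /etc/hosts        # overwrite while in forward
ceph osd tier cache-mode cache writeback # flip back
rados -p base get obj1 /tmp/obj1.out     # read it back
md5sum /etc/hosts /tmp/obj1.out          # the sums should match; a mismatch reproduces the stale read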
Re: [ceph-users] [cephfs] About feature 'snapshot'
On Thu, Mar 17, 2016 at 1:41 PM, Gregory Farnum wrote: > On Thu, Mar 17, 2016 at 3:49 AM, John Spray wrote: >> Snapshots are disabled by default: >> http://docs.ceph.com/docs/hammer/cephfs/early-adopters/#most-stable-configuration > > Which makes me wonder if we ought to be hiding the .snaps directory > entirely in that case. I haven't previously thought about that, but it > *is* a bit weird. Hmm, we could use the ever_allowed_snaps field to hide .snap. However, we would still want to prevent people creating a directory with that name, because if they ever enabled snapshots, we wouldn't have a way of resolving that. So it would be weird to omit .snap from the directory listing, but then give an error if someone tries to create a folder with that name. Perhaps showing the folder (even if snaps are disabled) is the lesser evil. John > -Greg > >> >> John >> >> On Thu, Mar 17, 2016 at 10:02 AM, 施柏安 wrote: >>> Hi all, >>> I encounter a trouble about cephfs sanpshot. It seems that the folder >>> '.snap' is exist. >>> But I use 'll -a' can't let it show up. And I enter that folder and create >>> folder in it, it showed something wrong to use snapshot. >>> >>> Please check : http://imgur.com/elZhQvD >>> >>> >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
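On the "hiding .snap" point: the snapshot directory name is already configurable per client, so sites worried about collisions or confused users can at least rename it. Option spellings below are from memory, so treat them as a sketch:

mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/secret,snapdirname=.snapshots   # kernel client
ceph-fuse /mnt/cephfs --client_snapdir=.snapshots                                                        # ceph-fuse equivalent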
[ceph-users] ZFS or BTRFS for performance?
Insofar as I've been able to tell, both BTRFS and ZFS provide similar capabilities back to Ceph, and both are sufficiently stable for the basic Ceph use case (single disk -> single mount point), so the question becomes: which actually provides better performance? Which has the more highly optimized single write path for Ceph? Does anybody have a handful of side-by-side benchmarks? I'm more interested in higher IOPS, since you can always scale out throughput, but throughput is also important. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
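For a rough apples-to-apples comparison on a single disk formatted with each filesystem, something like the fio run below (path, size, and runtime are placeholders) gives a quick read on small-random-write IOPS, which is where the backends tend to differ most:

fio --name=4k-randwrite --directory=/mnt/osd-test --size=4G --bs=4k --rw=randwrite \
    --ioengine=libaio --direct=1 --iodepth=32 --numjobs=1 --runtime=300 --time_based --group_reporting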
Re: [ceph-users] data corruption with hammer
On Thu, 17 Mar 2016, Robert LeBlanc wrote: > Also, is this ceph_test_rados rewriting objects quickly? I think that > the issue is with rewriting objects so if we can tailor the > ceph_test_rados to do that, it might be easier to reproduce. It's doing lots of overwrites, yeah. I was albe to reproduce--thanks! It looks like it's specific to hammer. The code was rewritten for jewel so it doesn't affect the latest. The problem is that maybe_handle_cache may proxy the read and also still try to handle the same request locally (if it doesn't trigger a promote). Here's my proposed fix: https://github.com/ceph/ceph/pull/8187 Do you mind testing this branch? It doesn't appear to be directly related to flipping between writeback and forward, although it may be that we are seeing two unrelated issues. I seemed to be able to trigger it more easily when I flipped modes, but the bug itself was a simple issue in the writeback mode logic. :/ Anyway, please see if this fixes it for you (esp with the RBD workload). Thanks! sage > > Robert LeBlanc > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 > > > On Thu, Mar 17, 2016 at 11:05 AM, Robert LeBlanc wrote: > > I'll miss the Ceph community as well. There was a few things I really > > wanted to work in with Ceph. > > > > I got this: > > > > update_object_version oid 13 v 1166 (ObjNum 1028 snap 0 seq_num 1028) > > dirty exists > > 1038: left oid 13 (ObjNum 1028 snap 0 seq_num 1028) > > 1040: finishing write tid 1 to nodez23350-256 > > 1040: finishing write tid 2 to nodez23350-256 > > 1040: finishing write tid 3 to nodez23350-256 > > 1040: finishing write tid 4 to nodez23350-256 > > 1040: finishing write tid 6 to nodez23350-256 > > 1035: done (4 left) > > 1037: done (3 left) > > 1038: done (2 left) > > 1043: read oid 430 snap -1 > > 1043: expect (ObjNum 429 snap 0 seq_num 429) > > 1040: finishing write tid 7 to nodez23350-256 > > update_object_version oid 256 v 661 (ObjNum 1029 snap 0 seq_num 1029) > > dirty exists > > 1040: left oid 256 (ObjNum 1029 snap 0 seq_num 1029) > > 1042: expect (ObjNum 664 snap 0 seq_num 664) > > 1043: Error: oid 430 read returned error code -2 > > ./test/osd/RadosModel.h: In function 'virtual void > > ReadOp::_finish(TestOp::CallbackInfo*)' thread 7fa1bf7fe700 time > > 2016-03-17 10:47:19.085414 > > ./test/osd/RadosModel.h: 1109: FAILED assert(0) > > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) > > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > > const*)+0x76) [0x4db956] > > 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xec) [0x4c959c] > > 3: (()+0x9791d) [0x7fa1d472191d] > > 4: (()+0x72519) [0x7fa1d46fc519] > > 5: (()+0x13c178) [0x7fa1d47c6178] > > 6: (()+0x80a4) [0x7fa1d425a0a4] > > 7: (clone()+0x6d) [0x7fa1d2bd504d] > > NOTE: a copy of the executable, or `objdump -rdS ` is > > needed to interpret this. > > terminate called after throwing an instance of 'ceph::FailedAssertion' > > Aborted > > > > I had to toggle writeback/forward and min_read_recency_for_promote a > > few times to get it, but I don't know if it is because I only have one > > job running. Even with six jobs running, it is not easy to trigger > > with ceph_test_rados, but it is very instant in the RBD VMs. 
> > > > Here are the six run crashes (I have about the last 2000 lines of each > > if needed): > > > > nodev: > > update_object_version oid 1015 v 1255 (ObjNum 1014 snap 0 seq_num > > 1014) dirty exists > > 1015: left oid 1015 (ObjNum 1014 snap 0 seq_num 1014) > > 1016: finishing write tid 1 to nodev21799-1016 > > 1016: finishing write tid 2 to nodev21799-1016 > > 1016: finishing write tid 3 to nodev21799-1016 > > 1016: finishing write tid 4 to nodev21799-1016 > > 1016: finishing write tid 6 to nodev21799-1016 > > 1016: finishing write tid 7 to nodev21799-1016 > > update_object_version oid 1016 v 1957 (ObjNum 1015 snap 0 seq_num > > 1015) dirty exists > > 1016: left oid 1016 (ObjNum 1015 snap 0 seq_num 1015) > > 1017: finishing write tid 1 to nodev21799-1017 > > 1017: finishing write tid 2 to nodev21799-1017 > > 1017: finishing write tid 3 to nodev21799-1017 > > 1017: finishing write tid 5 to nodev21799-1017 > > 1017: finishing write tid 6 to nodev21799-1017 > > update_object_version oid 1017 v 1010 (ObjNum 1016 snap 0 seq_num > > 1016) dirty exists > > 1017: left oid 1017 (ObjNum 1016 snap 0 seq_num 1016) > > 1018: finishing write tid 1 to nodev21799-1018 > > 1018: finishing write tid 2 to nodev21799-1018 > > 1018: finishing write tid 3 to nodev21799-1018 > > 1018: finishing write tid 4 to nodev21799-1018 > > 1018: finishing write tid 6 to nodev21799-1018 > > 1018: finishing write tid 7 to nodev21799-1018 > > update_object_version oid 1018 v 1093 (ObjNum 1017 snap 0 seq_num > > 1017) dirty exists > > 1018: left oid 1018 (ObjNum 1017 snap 0 seq_num 1017) > > 1019: finishing write tid 1 to nodev21799-1019 > > 1019: finishing write tid 2 to node
Re: [ceph-users] DONTNEED fadvise flag
On Wed, Mar 16, 2016 at 9:46 AM, Kenneth Waegeman wrote:
> Hi all,
>
> Quick question: does CephFS pass the fadvise DONTNEED flag and take it into
> account?
> I want to use the --drop-cache option of rsync 3.1.1 so the cache doesn't
> fill up when rsyncing to CephFS.

It looks like ceph-fuse unfortunately does not. I'm not sure about the kernel client though.
-Greg
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
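As far as I know --drop-cache comes from rsync's optional drop-cache/fadvise patch, so it isn't in every 3.1.1 build. A quick way to see whether DONTNEED has any effect on a given mount is GNU dd, whose nocache flags issue the same posix_fadvise(POSIX_FADV_DONTNEED) hint; compare the cached memory figure before and after (this exercises the kernel page cache, so it mainly tells you about the kernel client, not ceph-fuse's own object cacher):

free -m                                                            # note the cached figure
dd if=/dev/zero of=/mnt/cephfs/fadvise-test bs=4M count=256 oflag=nocache
dd if=/mnt/cephfs/fadvise-test of=/dev/null bs=4M iflag=nocache
free -m                                                            # cached should stay roughly flat if DONTNEED is honoured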
Re: [ceph-users] Is there an api to list all s3 user
Hi,

The PHP AWS SDK with a few personal updates can do that. First, you need a functional php-aws-sdk working against your radosgw and an account (access/secret key) with metadata caps. I use AWS SDK version 2.8.22:

$aws = Aws::factory('config.php');
$this->s3client = $aws->get('s3');

http://docs.aws.amazon.com/aws-sdk-php/v2/guide/configuration.html

Example of a config.php file:

return array (
    'includes' => array('_aws'),
    'services' => array(
        // All AWS clients extend from 'default_settings'. Here we are
        // overriding 'default_settings' with our default credentials and
        // providing a default region setting.
        'default_settings' => array(
            'params' => array(
                'version' => 'latest',
                'region' => 'us-west-1',
                'endpoint' => HOST,
                'signature_version' => 'v2',
                'credentials' => array(
                    'key'    => AWS_KEY,
                    'secret' => AWS_SECRET_KEY,
                ),
                "bucket_endpoint" => false,
                'debug' => true,
            )
        )
    )
);

And second, this is an example of my updates: I add a new ServiceDescription to Guzzle:

$aws = Aws::factory(YOUR_PHP_RGW_CONFIG_FILE);
$this->s3client = $aws->get('s3');
$cephCommand = include __DIR__.'/ceph-services.php';
$description = new \Guzzle\Service\Description\ServiceDescription($cephCommand);
$default = \Guzzle\Service\Command\Factory\CompositeFactory::getDefaultChain($this->s3client);
$default->add(
    new \Guzzle\Service\Command\Factory\ServiceDescriptionFactory($description),
    'Guzzle\Service\Command\Factory\ServiceDescriptionFactory');
$this->s3client->setCommandFactory($default);

The ceph-services.php file contains:

return array(
    'apiVersion' => '2015-12-08',
    'serviceFullName' => 'Ceph Gateway',
    'serviceAbbreviation' => 'CEPH ULR',
    'serviceType' => 'rest-xml',
    'operations' => array(
        'ListAllUsers' => array(
            'httpMethod' => 'GET',
            'uri' => '/admin/metadata/user',
            'class' => 'Aws\S3\Command\S3Command',
            'responseClass' => 'ListAllUsersOutput',
            'responseType' => 'model',
            'parameters' => array(
                'format' => array(
                    'type' => 'string',
                    'location' => 'query',
                    'sentAs' => 'format',
                    'require' => true,
                    'default' => 'xml',
                ),
            ),
        ),
    ),
    'models' => array(
        'ListAllUsersOutput' => array(
            'type' => 'object',
            'additionalProperties' => true,
            'properties' => array(
                'Keys' => array(
                    'type' => 'string',
                    'location' => 'xml',
                ),
            ),
        ),
    )
);

Just call it with:

$result = $this->s3client->listAllUsers();

Good luck ...

Mikaël

On 16/03/2016 07:51, Mika c wrote:
Hi all,
I am trying to find an API that can list all S3 users, like the command 'radosgw-admin metadata list user', but I cannot find any related documentation. Does anyone know how to get this information? Any comments will be much appreciated!

Best wishes,
Mika
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
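If you'd rather not patch the SDK, the same admin endpoint can be hit directly with a hand-rolled AWS v2 signature. A rough sketch (the host and keys are placeholders, the account still needs metadata=read caps, and the Date header format may need adjusting if radosgw rejects the signature):

access=ACCESS_KEY; secret=SECRET_KEY; host=rgw.example.com
date=$(date -Ru)
stringtosign=$(printf 'GET\n\n\n%s\n/admin/metadata/user' "$date")
sig=$(printf '%s' "$stringtosign" | openssl sha1 -hmac "$secret" -binary | base64)
curl -s -H "Date: $date" -H "Authorization: AWS $access:$sig" \
     "http://$host/admin/metadata/user?format=json"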
Re: [ceph-users] SSDs for journals vs SSDs for a cache tier, which is better?
Hello, On Wed, 16 Mar 2016 16:22:06 + Stephen Harker wrote: > On 2016-02-17 11:07, Christian Balzer wrote: > > > > On Wed, 17 Feb 2016 10:04:11 +0100 Piotr Wachowicz wrote: > > > >> > > Let's consider both cases: > >> > > Journals on SSDs - for writes, the write operation returns right > >> > > after data lands on the Journal's SSDs, but before it's written to > >> > > the backing HDD. So, for writes, SSD journal approach should be > >> > > comparable to having a SSD cache tier. > >> > Not quite, see below. > >> > > >> > > >> Could you elaborate a bit more? > >> > >> Are you saying that with a Journal on a SSD writes from clients, > >> before > >> they can return from the operation to the client, must end up on both > >> the > >> SSD (Journal) *and* HDD (actual data store behind that journal)? > > > > No, your initial statement is correct. > > > > However that burst of speed doesn't last indefinitely. > > > > Aside from the size of the journal (which is incidentally NOT the most > > limiting factor) there are various "filestore" parameters in Ceph, in > > particular the sync interval ones. > > There was a more in-depth explanation by a developer about this in > > this ML, > > try your google-foo. > > > > For short bursts of activity, the journal helps a LOT. > > If you send a huge number of for example 4KB writes to your cluster, > > the > > speed will eventually (after a few seconds) go down to what your > > backing > > storage (HDDs) are capable of sustaining. > > > >> > (Which SSDs do you plan to use anyway?) > >> > > >> > >> Intel DC S3700 > >> > > Good choice, with the 200GB model prefer the 3700 over the 3710 (higher > > sequential write speed). > > Hi All, > > I am looking at using PCI-E SSDs as journals in our (4) Ceph OSD nodes, > each of which has 6 4TB SATA drives within. I had my eye on these: > > 400GB Intel P3500 DC AIC SSD, HHHL PCIe 3.0 > > but reading through this thread, it might be better to go with the P3700 > given the improved iops. So a couple of questions. > The 3700's will also last significantly longer than the 3500's. IOPS (of the device) are mostly irrelevant, sequential write speed is where it's at. In the same vein, remember that journals are never ever read from unless there was a crash. > * Are the PCI-E versions of these drives different in any other way than > the interface? > > * Would one of these as a journal for 6 4TB OSDs be overkill > (connectivity is 10GE, or will be shortly anyway), would the SATA S3700 > be sufficient? > Overkill, but not insanely so. >From my (not insignificant) experience you want to match your journal(s) firstly towards your network speed and then the devices behind them. A SATA HDD can write indeed about 180MB/s sequentially, but that's firmly in the land of theory when it comes to Ceph. Ceph/RBD writes are 4MB objects at the largest, they are spread out all over the cluster and of course most likely interspersed with competing (seeking) reads and other writes to the same OSD. That is before all the IO and thus seeks needed for for file system operations, LevelDB updates, etc. I thus spec my journals to 100MB/s write speed per SATA based HDD and that's already generous. Concrete case in point, 4 node cluster, 4 DC S3700 100GB SSDs with 2 journals each, 8 7.2k 3TB SATA HDDs, Infiniband network. That cluster is very lightly loaded. 
Doing this fio from a client VM: --- fio --size=6G --ioengine=libaio --invalidate=1 --direct=1 --numjobs=1 --rw=randwrite --name=fiojob --blocksize=4M --iodepth=32 --- and watching all 4 nodes simultaneously with atop shows us that the HDDs are pushed up to around 80% utilization while writing only about 50MB/s. The journal SSDs (which can handle 200MB/s writes) are consequently semi-bored at about 45% utilization writing around 95MB/s. As others mentioned, the P series will give you significantly lower latencies if that's important in your use case (small writes that in their sum do not exceed the abilities of your backing storage and CPUs). Also a lot of this depends on your actual HW (cases), how many hot-swap bays do you have, how many free PCIe slots, etc. With entirely new HW you could go for something that has 1-2 NVMe hot-swap bays and get the best of both worlds. Summing things up, the 400GB P3700 matches your network speed and thus can deal with short bursts at full speed. However it is overkill for your 6 HDDs, especially once they get busy (like backfilling or tests as above). I'd be surprised to see them handle more than 400MB/s writes combined. If you're trying to economize, a single 200GB DC S3700 or 2 100GB ones (smaller failure domains) should do the trick, too. > Given they're not hot-swappable, it'd be good if they didn't wear out in > 6 months too. > See above. I haven't been able to make more than 1% impact in the media wearout of 200GB DC S3700s that receive a constant write stream of 3MB/s over 500 days of operation. Christian -- Christian
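To put numbers on the "journals absorb bursts until the sync interval catches up" point from the quoted thread, the usual rule of thumb from the docs is journal size >= 2 * (expected throughput * filestore max sync interval). Per OSD, assuming ~100 MB/s effective per SATA HDD and the default 5 s sync interval, that works out as follows (values here are illustrative, not a recommendation for any particular cluster):

# 2 * 100 MB/s * 5 s = 1000 MB, so ~1 GB of journal per OSD is the floor;
# the common 5-10 GB partitions simply add headroom for burst absorption
[osd]
osd journal size = 5120              # MB per journal partition
filestore max sync interval = 5      # seconds (the default)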
Re: [ceph-users] ceph-disk from jewel has issues on redhat 7
I can raise a tracker for this issue since it looks like an intermittent issue and mostly dependent on specific hardware or it would be better if you add all the hardware/os details in tracker.ceph.com, also from your logs it looks like you have Resource busy issue: Error: Failed to add partition 2 (Device or resource busy) From my test run logs on centos 7.2 , 10.0.5 ( http://qa-proxy.ceph.com/teuthology/vasu-2016-03-15_15:34:41-selinux-master---basic-mira/62626/teuthology.log ) 2016-03-15T18:49:56.305 INFO:teuthology.orchestra.run.mira041.stderr:[ceph_deploy.osd][DEBUG ] Preparing host mira041 disk /dev/sdb journal None activate True 2016-03-15T18:49:56.305 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][DEBUG ] find the location of an executable 2016-03-15T18:49:56.309 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][INFO ] Running command: sudo /usr/sbin/ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdb 2016-03-15T18:49:56.546 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid 2016-03-15T18:49:56.611 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --cluster ceph 2016-03-15T18:49:56.643 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --cluster ceph 2016-03-15T18:49:56.708 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --cluster ceph 2016-03-15T18:49:56.708 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid 2016-03-15T18:49:56.709 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] set_type: Will colocate journal with data on /dev/sdb 2016-03-15T18:49:56.709 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size 2016-03-15T18:49:56.774 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid 2016-03-15T18:49:56.774 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid 2016-03-15T18:49:56.775 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid 2016-03-15T18:49:56.775 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs 2016-03-15T18:49:56.777 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs 2016-03-15T18:49:56.809 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs 2016-03-15T18:49:56.841 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. 
--lookup osd_fs_mount_options_xfs 2016-03-15T18:49:56.857 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid 2016-03-15T18:49:56.858 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid 2016-03-15T18:49:56.858 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] ptype_tobe_for_name: name = journal 2016-03-15T18:49:56.859 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid 2016-03-15T18:49:56.859 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] create_partition: Creating journal partition num 2 size 5120 on /dev/sdb 2016-03-15T18:49:56.859 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command_check_call: Running command: /sbin/sgdisk --new=2:0:+5120M --change-name=2:ceph journal --partition-guid=2:d4b2fa8d-3f2a-4ce9-a2fe-2a3872d7e198 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdb 2016-03-15T18:49:57.927 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][DEBUG ] The operation has completed successfully. 2016-03-15T18:49:57.927 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] update_partition: Calling partprobe on created device /dev/sdb 2016-03-15T18:49:57.928 INFO:teuthology.orchestra.run.mira041.stderr:[mira041][WARNING] command_check_call: Running command: /usr/bin/udevadm settle --timeout=600 2016-03-15T18:49:58.393 INFO:teuthology.orchestra.run.mira041.stderr:[mira04
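Until the root cause is tracked down, a manual sequence like the one below (destructive to /dev/sdb, which is assumed to be the intended OSD disk) often gets past the "Device or resource busy" on a retry, by making sure nothing is holding the stale partition table before ceph-disk runs:

lsblk /dev/sdb                  # confirm no partitions are mounted or held by LVM/multipath
sgdisk --zap-all /dev/sdb       # wipe the old GPT/MBR (destroys all data on /dev/sdb)
partprobe /dev/sdb
udevadm settle --timeout=600
ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdb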