Re: [ceph-users] ceph OSD journal (with dmcrypt) replacement
AFAIK in the case of dm-crypt LUKS (the default), ceph-disk keeps each OSD partition's key in the ceph mon config-key store and uses the OSD partition UUID as the ID for that key. So you can list all your keys with:

  /usr/bin/ceph config-key ls

You'll get something like:

[
  ...
  "dm-crypt/osd/50250ade-500a-44c4-8a47-00224d76594a/luks",
  "dm-crypt/osd/940b5b1c-5926-4aa5-8cd7-ce2f22371d6a/luks",
  "dm-crypt/osd/dd28c6ba-c101-4874-bc1c-401b34cb2f9b/luks",
  ...
]

These UUIDs are partition UUIDs. You can look up your *OSD* partition UUID and fetch the corresponding key like this:

  # change the path to your OSD (*not journal*) partition path
  OSD_PATH=/dev/sdXN
  OSD_UUID=`blkid -s PARTUUID -o value $OSD_PATH`
  /usr/bin/ceph config-key get dm-crypt/osd/$OSD_UUID/luks

2017-09-08 18:18 GMT+05:00 M Ranga Swami Reddy:
> When I create a dmcrypted journal using the cryptsetup command, it asks
> for a passphrase. Can I use an empty passphrase?
>
> On Wed, Sep 6, 2017 at 11:23 PM, M Ranga Swami Reddy wrote:
> > Thank you. I am able to replace the dmcrypt journal successfully.
> >
> > On Sep 5, 2017 18:14, "David Turner" wrote:
> >>
> >> Did the journal drive fail during operation, or was it taken out during
> >> pre-failure? If it fully failed, then most likely you can't guarantee the
> >> consistency of the underlying OSDs. In that case, you just remove the
> >> affected OSDs and add them back in as new OSDs.
> >>
> >> In the case of having good data on the OSDs, you follow the standard
> >> process: flush the journal, create the new partition, set up all of the
> >> partition metadata so that the ceph udev rules will know what the journal
> >> is, and create a new dmcrypt volume on it. I would recommend using the
> >> same UUID as the old journal so that you don't need to update the symlinks
> >> and such on the OSD. After everything is done, run the journal create
> >> command for the OSD and start the OSD.
> >>
> >> On Tue, Sep 5, 2017, 2:47 AM M Ranga Swami Reddy wrote:
> >>>
> >>> Hello,
> >>> How to replace an OSD's journal created with dmcrypt, from one drive
> >>> to another drive, in case the current journal drive failed?
> >>>
> >>> Thanks
> >>> Swami

--
With best regards,
Vladimir Drobyshevskiy
"IT Gorod" company
+7 343 192
IT consulting
Turnkey project delivery
IT services outsourcing
IT infrastructure outsourcing
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
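For reference, a rough sketch of the replacement flow David describes, tied together with the key lookup above. This is an untested sketch, not a procedure: the OSD id, device paths and partition size are placeholders, the partition type code must be copied from a healthy dmcrypt journal (sgdisk --info), and I am not certain whether the journal's LUKS key is stored under the journal's or the data partition's UUID, so verify against a working OSD first.

  # assumptions: osd.12 with intact data, old journal PARTUUID saved in $JOURNAL_UUID,
  # new journal disk is /dev/sdY -- adjust everything to your environment
  OSD_ID=12
  OSD_UUID=`blkid -s PARTUUID -o value /dev/sdXN`   # data partition, as above

  ceph osd set noout                                # avoid rebalancing while we work
  systemctl stop ceph-osd@$OSD_ID

  # reuse the dead journal's partition GUID so the journal symlink and udev
  # rules keep working; also set --typecode to match a healthy dmcrypt journal
  sgdisk --new=1:0:+5G --partition-guid=1:$JOURNAL_UUID /dev/sdY

  # re-create the LUKS container with the key ceph stored for this OSD
  /usr/bin/ceph config-key get dm-crypt/osd/$OSD_UUID/luks > /tmp/osd.key
  cryptsetup --batch-mode luksFormat /dev/disk/by-partuuid/$JOURNAL_UUID /tmp/osd.key
  cryptsetup --key-file /tmp/osd.key luksOpen /dev/disk/by-partuuid/$JOURNAL_UUID $JOURNAL_UUID
  shred -u /tmp/osd.key                             # don't leave the key on disk

  ceph-osd -i $OSD_ID --mkjournal                   # write a fresh journal
  systemctl start ceph-osd@$OSD_ID
  ceph osd unset noout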
Re: [ceph-users] [PVE-User] OSD won't start, even created ??
Did a few more tests:

Older Ceph server, with a pveceph createosd command (pveceph createosd /dev/sdb, equivalent to ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid a5c0cfed-...4bf939ed70 /dev/sdb):

sgdisk --print /dev/sdd
Disk /dev/sdd: 2930277168 sectors, 1.4 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 638646CF-..-62296C871132
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 2930277134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)  End (sector)  Size       Code  Name
   1          10487808    2930277134  1.4 TiB    F800  ceph data
   2              2048      10487807  5.0 GiB    F802  ceph journal

On a newer Ceph server (dpkg -l: 12.2.0-pve1 version):

sgdisk --print /dev/sdb
Disk /dev/sdb: 1465149168 sectors, 698.6 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): D63886B6-0.26-BCBCD6FFCA3C
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1465149134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)  End (sector)  Size       Code  Name
   1              2048        206847  100.0 MiB  F800  ceph data
   2            206848    1465149134  698.5 GiB        ceph block

Regarding the ceph-osd.admin log, I think I used an OSD creation process leading to a bluestore OSD (instead of a filestore one). And it seems that afterwards the Ceph server is unable to use the new bluestore:

( bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 102: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding )

just before trying to use it as a filestore one:

( probe_block_device_fsid /dev/sdb2 is filestore )

I tried to use the --bluestore 0 flag when creating the OSD, but the flag is unknown.

Thanks in advance for any hint. I'm ready to do a few more tests.
Best regards.

On 08/09/2017 at 17:25, Phil Schwarz wrote:
Hi,
any help would be really useful. Does anyone have a clue about my issue?
Thanks in advance.
Best regards;

On 05/09/2017 at 20:25, Phil Schwarz wrote:
Hi,
I come back with the same issue as seen in a previous thread (link given), trying to add a 2TB SATA disk as an OSD:
Using the Proxmox GUI or CLI (command given) gives the same (bad) result.
I didn't want to use a direct 'ceph osd create', thus bypassing the pmxcfs redundant filesystem.
I tried to build an OSD with the same disk on another machine (a stronger one with an Opteron QuadCore), failing at the same time.
Sorry for crossposting, but I think I'm failing against the pveceph wrapper.
Any help or clue would be really useful.
Thanks
Best regards.
--
Link to previous thread (but same problem):
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg38897.html
--
commands:
fdisk /dev/sdc  (mklabel msdos, w, q)
ceph-disk zap /dev/sdc
pveceph createosd /dev/sdc
--
dpkg -l | grep ceph
ii ceph           12.1.2-pve1  amd64  distributed storage and file system
ii ceph-base      12.1.2-pve1  amd64  common ceph daemon libraries and management tools
ii ceph-common    12.1.2-pve1  amd64  common utilities to mount and interact with a ceph storage cluster
ii ceph-mgr       12.1.2-pve1  amd64  manager for the ceph distributed storage system
ii ceph-mon       12.1.2-pve1  amd64  monitor server for the ceph storage system
ii ceph-osd       12.1.2-pve1  amd64  OSD server for the ceph storage system
ii libcephfs1     10.2.5-7.2   amd64  Ceph distributed file system client library
ii libcephfs2     12.1.2-pve1  amd64  Ceph distributed file system client library
ii python-cephfs  12.1.2-pve1  amd64  Python 2 libraries for the Ceph libcephfs library
--
tail -f /var/log/ceph/ceph-osd.admin.log
2017-09-03 18:28:20.856641 7fad97e45e00  0 ceph version 12.1.2 (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process (unknown), pid 5493
2017-09-03 18:28:20.857104 7fad97e45e00 -1 bluestore(/dev/sdc2) _read_bdev_label unable to decode label at offset 102: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2017-09-03 18:28:20.857200 7fad97e45e00  1 journal _open /dev/sdc2 fd 4: 2000293007360 bytes, block size 4096 bytes, directio = 0, aio = 0
2017-09-03 18:28:20.857366 7fad97e45e00  1 journal close /dev/sdc2
2017-09-03 18:28:20.857431 7fad97e45e00  0 probe_block_device_fsid /dev/sdc2 is filestore,
----
2017-09-03 18:28:21.937285 7fa5766a5e00  0 ceph version 12.1.2 (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process (unknown)
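For what it's worth, one workaround sketch: Luminous ceph-disk defaults to bluestore, so if a filestore OSD is wanted the objectstore has to be requested explicitly. I don't know whether the pveceph wrapper in 12.1.x exposes these flags, so this calls ceph-disk directly; treat it as an untested sketch and adapt the device name to your setup.

  # start clean -- wipes partitions and any stale bluestore/filestore labels
  ceph-disk zap /dev/sdc

  # explicitly request a filestore OSD (bluestore is the Luminous default);
  # there is no "--bluestore 0" flag, the switches are --filestore / --bluestore
  ceph-disk prepare --filestore --fs-type xfs /dev/sdc
  ceph-disk activate /dev/sdc1

  # or, to deliberately test bluestore instead:
  # ceph-disk prepare --bluestore /dev/sdc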
[ceph-users] MAX AVAIL in ceph df
Hi,

How is MAX AVAIL calculated in 'ceph df'? I seem to be missing some space.

I have 26 OSDs, each 1484 GB (according to df), and 3 replicas. Shouldn't MAX AVAIL be (26 * 1484) / 3 = 12,861 GB? Instead, 'ceph df' is showing 7545G for the pool that is using the 26 OSDs.

What is wrong with my calculation?

Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
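A rough way to sanity-check the number: as far as I understand it, MAX AVAIL is not simply raw capacity divided by the replica count; it is projected from the OSD that would hit the full ratio first given the pool's CRUSH distribution, so an uneven %USE spread across OSDs pulls the reported value well below the naive figure. The commands below are standard, but the interpretation is my assumption, not the exact formula.

  # naive upper bound: total raw capacity / replica count
  echo "26 * 1484 / 3" | bc      # ~12861 GB

  # per-OSD utilization and weights -- look at the spread and the fullest OSD
  ceph osd df

  # the full ratio (0.95 by default) also caps usable space
  # (shown in 'ceph osd dump' on Luminous; older releases keep it in the mon config)
  ceph osd dump | grep -i ratio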
Re: [ceph-users] librados for MacOS
On Thu, Aug 3, 2017 at 4:41 PM, Willem Jan Withagen wrote:
> On 03/08/2017 09:36, Brad Hubbard wrote:
>> On Thu, Aug 3, 2017 at 5:21 PM, Martin Palma wrote:
>>> Hello,
>>>
>>> is there a way to get librados for MacOS? Has anybody tried to build
>>> librados for MacOS? Is this even possible?

Yes, once upon a time librados and even ceph-fuse compiled and ran[0]
fine on OSX. But since we have not worked on the port for a while, the
build is broken in master. With this patch[1], however, at least librados
should build now.

--
[0] https://github.com/ceph/ceph/pull/9371
[1] https://github.com/ceph/ceph/pull/17615

>>
>> Yes, it is eminently possible, but would require a dedicated effort.
>>
>> As far as I know there is no one working on this atm.
>
> Looking at the code I've come across a few #ifdef's for OSX and the like.
> So attempts have been made, but I think that code has rotted.
> Now FreeBSD and MacOS have a partially shared background, so ATM I would
> expect a MacOS port not to be all that complex, and it could build on some
> of the stuff I've done for FreeBSD. Not sure if the native compiler on Mac
> is Clang, but all Clang issues are already fixed (if Clang on Mac is at
> least at 3.8).
>
> Like Brad says: it does require persistence and testing. But most
> important, it will also require maintenance.
>
> --WjW

--
Regards
Kefu Chai
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Is the StupidAllocator supported in Luminous?
I am seeing OOM issues with some of my OSD nodes that I am testing with Bluestore on 12.2.0, so I decided to try the StupidAllocator to see if it has a smaller memory footprint, by setting the following in my ceph.conf:

bluefs_allocator = stupid
bluestore_cache_size_hdd = 1073741824
bluestore_cache_size_ssd = 1073741824

With these settings I am no longer seeing OOM errors, but on the node with these settings, overnight I have seen multiple Aborted messages in my log files:

grep Abort *log
ceph-osd.10.log:2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
ceph-osd.10.log:     0> 2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
ceph-osd.11.log:2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught signal (Aborted) **
ceph-osd.11.log:     0> 2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:     0> 2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 08:13:16.919887 7f82f315e700 -1 *** Caught signal (Aborted) **
ceph-osd.7.log:2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught signal (Aborted) **
ceph-osd.7.log:     0> 2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught signal (Aborted) **

Before I open a ticket, I just want to know if the StupidAllocator is supported in Luminous.

A couple of examples of the Aborts are:

2017-09-09 12:39:27.044074 7f27f5f20700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1504975167035909, "job": 86, "event": "flush_started", "num_memtables": 1, "num_entries": 1015543, "num_deletes": 345553, "memory_usage": 260049176}
2017-09-09 12:39:27.044088 7f27f5f20700  4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 86] Level-0 flush table #1825: started
2017-09-09 12:39:28.234651 7f27fff34700 -1 osd.10 pg_epoch: 3521 pg[1.3c7( v 3521'372186 (3456'369135,3521'372186] local-lis/les=3488/3490 n=2842 ec=578/66 lis/c 3488/3488 les/c/f 3490/3500/0 3488/3488/3477) [10,8,16] r=0 lpr=3488 crt=3521'372186 lcod 3521'372184 mlcod 3521'372184 active+clean+snaptrim snaptrimq=[111~2,115~2,13a~1,13c~3]] removing snap head
2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
in thread 7f2816f45700 thread_name:msgr-worker-2

ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
 1: (()+0xa562f4) [0x5634e14882f4]
 2: (()+0x11390) [0x7f281b2c5390]
 3: (gsignal()+0x38) [0x7f281a261428]
 4: (abort()+0x16a) [0x7f281a26302a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f281aba384d]
 6: (()+0x8d6b6) [0x7f281aba16b6]
 7: (()+0x8d701) [0x7f281aba1701]
 8: (()+0xb8d38) [0x7f281abccd38]
 9: (()+0x76ba) [0x7f281b2bb6ba]
 10: (clone()+0x6d) [0x7f281a33282d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
--- begin dump of recent events ---
    -1> 2017-09-09 12:39:05.878006 7f2817746700  1 -- 172.16.2.133:6804/1327479 <== osd.2 172.16.2.131:6800/1710 37506  osd_repop(mds.0.19:101159707 1.2f1 e3521/3477) v2  998+0+46 (52256346 0 1629833233) 0x56359eb29000 con 0x563510c02000
    -> 2017-09-09 12:39:05.878065 7f2816f45700  1 -- 10.15.2.133:6805/327479 <== mds.0 10.15.2.123:6800/2942775562 55580  osd_op(mds.0.19:101159714 1.ec 1.ffad68ec (undecoded) ondisk+write+known_if_redirected+full_force e3521) v8  305+0+366 (2883828331 0 2609552142) 0x56355d9eb0c0 con 0x56355f455000

Second example:

2017-09-09 07:10:58.135527 7fa2d56a0700  4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:264] [default] [JOB 10] Flushing memtable with next log file: 2773
2017-09-09 07:10:58.262058 7fa2d56a0700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1504955458135538, "job": 10, "event": "flush_started", "num_memtables": 1, "num_entries": 935059, "num_deletes": 175946, "memory_usage": 260049888}
2017-09-09 07:10:58.262077 7fa2d56a0700  4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 10] Level-0 flush table #2774: started
2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) **
in thread 7fa2e96c8700 thread_name:bstore_kv_sync

ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
 1: (()+0xa562f4) [0x5579585362f4]
 2: (()+0x11390) [0x7fa2faa45390]
 3: (gsignal()+0x38) [0x7fa2f99e1428]
 4: (abort()+0x16a) [0x7fa2f99e302a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fa2fa32384d]
 6: (()+0x8d6b6) [0x7fa2fa3216b6]
 7: (()+0x8d701) [0x7fa2fa321701]
 8: (()+0x8d919) [0x7fa2fa321919]
 9: (()+0x1230f) [0x7fa2fb60b30f]
 10: (operator new[](unsigned long)+0x4e7) [0x7fa2fb62f4b7]
 11: (rocksdb::Arena::AllocateNewBlock(unsigned long)+0x70) [0x557958939150]
 12: (rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x45) [0x5579589392d5]
 13: (rocksdb::Arena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logg
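As a side note on chasing the memory footprint itself: the per-OSD admin socket can break down where the memory is going, which helps distinguish a bluestore cache problem from a leak elsewhere. A quick sketch, assuming the Luminous admin socket commands are available on your build; osd.10 is just an example, adjust to your node:

  # per-pool memory accounting inside the OSD (bluestore cache, buffers, etc.)
  ceph daemon osd.10 dump_mempools

  # confirm the settings the running daemon actually picked up
  ceph daemon osd.10 config get bluestore_cache_size_hdd
  ceph daemon osd.10 config get bluefs_allocator

  # overall heap view (needs a tcmalloc build)
  ceph tell osd.10 heap stats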
Re: [ceph-users] PCIe journal benefit for SSD OSDs
Hi Alexandre,

On 07.09.2017 at 19:31, Alexandre DERUMIER wrote:
> Hi Stefan
>
>>> Have you already done tests on how the performance changes with bluestore
>>> while putting all 3 block devices on the same SSD?
>
> I'm going to test bluestore with 3 nodes, 18 x Intel S3610 1.6TB, in the
> coming weeks.
>
> I'll send the results to the mailing list.

Thanks!

Greets,
Stefan

> ----- Original Message -----
> From: "Stefan Priebe, Profihost AG"
> To: "Christian Balzer", "ceph-users"
> Sent: Thursday, 7 September 2017 08:03:31
> Subject: Re: [ceph-users] PCIe journal benefit for SSD OSDs
>
> Hello,
> On 07.09.2017 at 03:53, Christian Balzer wrote:
>>
>> Hello,
>>
>> On Wed, 6 Sep 2017 09:09:54 -0400 Alex Gorbachev wrote:
>>
>>> We are planning a Jewel filestore based cluster for a performance
>>> sensitive healthcare client, and the conservative OSD choice is
>>> Samsung SM863A.
>>>
>>
>> While I totally see where you're coming from, and me having stated that
>> I'll give Luminous and Bluestore some time to mature, I'd also be looking
>> into that if I were in the planning phase now, with like 3 months
>> before deployment.
>> The inherent performance increase with Bluestore (and having something
>> that hopefully won't need touching/upgrading for a while) shouldn't be
>> ignored.
>
> Yes, and that's the point where I am currently as well: thinking about how
> to design a new cluster based on bluestore.
>
>> The SSDs are fine, I've been starting to use those recently (though not
>> with Ceph yet) as Intel DC S36xx or 37xx are impossible to get.
>> They're a bit slower in the write IOPS department, but good enough for me.
>
> I've never used the Intel DC ones but always the Samsung. Are the Intel
> really faster? Have you disabled the FLUSH command for the Samsung ones?
> They don't skip the command automatically like the Intel do. Sadly the
> Samsung SM863 got more expensive over the last months. They were a lot
> cheaper in the first months of 2016. Maybe the 2.5" Optane Intel SSDs
> will change the game.
>
>>> but was wondering if anyone has seen a positive
>>> impact from also using PCIe journals (e.g. Intel P3700 or even the
>>> older 910 series) in front of such SSDs?
>>>
>> NVMe journals (or WAL and DB space for Bluestore) are nice and can
>> certainly help, especially if Ceph is tuned accordingly.
>> Avoid non-DC NVMes, and I doubt you can still get 910s, they are
>> officially EOL.
>> You want to match capabilities and endurance; a DC P3700 800GB would be
>> an OK match for 3-4 SM863a 960GB, for example.
>
> That's a good point, but it makes the cluster more expensive. Currently,
> while using filestore, I use one SSD for journal and data, which works fine.
>
> With bluestore we have block, db and wal, so we need 3 block devices per
> OSD. If we need one PCIe or NVMe device per 3-4 devices it gets much
> more expensive per host - currently running 10 OSDs / SSDs per node.
>
> Have you already done tests on how the performance changes with bluestore
> while putting all 3 block devices on the same SSD?
>
> Greets,
> Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
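One practical note on the "3 block devices per OSD" point: if you don't point ceph-disk at separate devices, it simply co-locates the DB and WAL on the block device, so splitting them out is optional, not required. A sketch of both layouts with Luminous ceph-disk; device names are placeholders, and the sizes of the db/wal partitions carved from the NVMe come from ceph.conf options such as bluestore_block_db_size / bluestore_block_wal_size.

  # everything (block + db + wal) on the same SSD -- one device per OSD
  ceph-disk prepare --bluestore /dev/sdf

  # block on the SSD, DB and WAL carved out of a shared NVMe
  ceph-disk prepare --bluestore /dev/sdf \
      --block.db /dev/nvme0n1 \
      --block.wal /dev/nvme0n1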
Re: [ceph-users] Ceph release cadence
Hi,

I have been using Ceph for multiple years now. It's unclear to me which of your options fits best, but here are my preferences:

* Updates are risky in a way that we tend to rather not do them every year. Also, having seen jewel, we've been well off to avoid two major issues that would have bitten us, and will upgrade from hammer in the next month or so.

* Non-production releases are of not much value to me, as I have to keep our dev/staging/prod clusters in sync to work on our stuff. As you can never downgrade, there's no value in it for me to try non-production releases (without frying dev for everyone).

* I'd prefer stability over new features. *Specifically* that new features can be properly recombined with existing features (and each other) without leading to surprises. (E.g. cache tiering breaking with snapshots, then no way going back, and a general notion of "that combination wasn't really well tested".)

* I'd prefer versions to be maintained for production-critical issues for maybe 2 years, so I can have some time after a new production release during which the old one still receives important bug fixes until I switch.

Maybe this is close to what your "Drop the odd releases, and aim for a ~9 month cadence" would mean. Waiting for a feature for a year is a pain, but my personal goal for Ceph is that it first has to work properly, meaning: not lose your data, not "stop the show", and not draw you into a corner you can't get out of.

That's my perspective as a user. As a fellow developer I feel your pain about wanting to release faster and reducing maintenance load, so thanks for asking!

Hope this helps,
Christian

> On Sep 6, 2017, at 5:23 PM, Sage Weil wrote:
>
> Hi everyone,
>
> Traditionally, we have done a major named "stable" release twice a year,
> and every other such release has been an "LTS" release, with fixes
> backported for 1-2 years.
>
> With kraken and luminous we missed our schedule by a lot: instead of
> releasing in October and April we released in January and August.
>
> A few observations:
>
> - Not a lot of people seem to run the "odd" releases (e.g., infernalis,
> kraken). This limits the value of actually making them. It also means
> that those who *do* run them are running riskier code (fewer users -> more
> bugs).
>
> - The more recent requirement that upgrading clusters must make a stop at
> each LTS (e.g., hammer -> luminous not supported, must go hammer -> jewel
> -> luminous) has been hugely helpful on the development side by reducing
> the amount of cross-version compatibility code to maintain and reducing
> the number of upgrade combinations to test.
>
> - When we try to do a time-based "train" release cadence, there always
> seems to be some "must-have" thing that delays the release a bit. This
> doesn't happen as much with the odd releases, but it definitely happens
> with the LTS releases. When the next LTS is a year away, it is hard to
> suck it up and wait that long.
>
> A couple of options:
>
> * Keep even/odd pattern, and continue being flexible with release dates
>
>   + flexible
>   - unpredictable
>   - odd releases of dubious value
>
> * Keep even/odd pattern, but force a 'train' model with a more regular
> cadence
>
>   + predictable schedule
>   - some features will miss the target and be delayed a year
>
> * Drop the odd releases but change nothing else (i.e., 12-month release
> cadence)
>
>   + eliminate the confusing odd releases with dubious value
>
> * Drop the odd releases, and aim for a ~9 month cadence.
> This splits the difference between the current even/odd pattern we've
> been doing.
>
>   + eliminate the confusing odd releases with dubious value
>   + waiting for the next release isn't quite as bad
>   - required upgrades every 9 months instead of every 12 months
>
> * Drop the odd releases, but relax the "must upgrade through every LTS" to
> allow upgrades across 2 versions (e.g., luminous -> mimic or luminous ->
> nautilus). Shorten release cycle (~6-9 months).
>
>   + more flexibility for users
>   + downstreams have greater choice in adopting an upstream release
>   - more LTS branches to maintain
>   - more upgrade paths to consider
>
> Other options we should consider? Other thoughts?
>
> Thanks!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Kind regards,
Christian Theune

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Germany
HR Stendal HRB 21169 · Managing directors: Christian Theune, Christian Zagrodnick
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] [Luminous] rgw not deleting object
Hi,

I'm facing a strange issue: I cannot remove an object from rgw (via the S3 API).

My steps:
s3cmd ls s3://bucket/object  -> it exists
s3cmd rm s3://bucket/object  -> success
s3cmd ls s3://bucket/object  -> it still exists

At this point, I can curl and get the object (thus, it does exist). Doing the same via boto leads to the same behavior.

Log sample:
2017-09-10 01:18:42.502486 7fd189e7d700  1 == starting new request req=0x7fd189e77300 =
2017-09-10 01:18:42.504028 7fd189e7d700  1 == req done req=0x7fd189e77300 op status=-2 http_status=204 ==
2017-09-10 01:18:42.504076 7fd189e7d700  1 civetweb: 0x560ebc275000: 10.42.43.6 - - [10/Sep/2017:01:18:38 +0200] "DELETE /bucket/object HTTP/1.1" 1 0 - Boto/2.44.0 Python/3.5.4 Linux/4.12.0-1-amd64

What can I do? What data shall I provide to debug this issue?

Regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
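A few things that usually help narrow this kind of problem down. The op status=-2 on the DELETE while the object is still listed looks like a bucket-index vs. RADOS-object mismatch, but that is only a guess; the commands below are standard radosgw-admin / admin-socket calls, and the bucket, object and daemon names are placeholders.

  # does rgw still think the object exists, and what does its manifest look like?
  radosgw-admin object stat --bucket=bucket --object=object

  # compare (and optionally repair) the bucket index against the actual objects
  radosgw-admin bucket check --bucket=bucket
  # radosgw-admin bucket check --bucket=bucket --fix

  # deleted objects sit in garbage collection for a while before they disappear
  radosgw-admin gc list --include-all

  # capture a verbose trace of one DELETE for a tracker ticket
  # (the daemon name depends on your rgw instance)
  ceph daemon client.rgw.$(hostname -s) config set debug_rgw 20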
Re: [ceph-users] Is the StupidAllocator supported in Luminous?
Yes. Please open a ticket! On Sat, Sep 9, 2017 at 11:16 AM Eric Eastman wrote: > I am seeing OOM issues with some of my OSD nodes that I am testing with > Bluestore on 12.2.0, so I decided to try the StupidAllocator to see if it > has a smaller memory footprint, by setting the following in my ceph.conf: > > bluefs_allocator = stupid > bluestore_cache_size_hdd = 1073741824 > bluestore_cache_size_ssd = 1073741824 > > With these settings I am no longer seeing OOM errors, but on the node with > these setting, overnight I have seen multiple Aborted messages in my log > files: > > grep Abort *log > ceph-osd.10.log:2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught > signal (Aborted) ** > ceph-osd.10.log: 0> 2017-09-09 12:39:28.573034 7f2816f45700 -1 *** > Caught signal (Aborted) ** > ceph-osd.11.log:2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught > signal (Aborted) ** > ceph-osd.11.log: 0> 2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** > Caught signal (Aborted) ** > ceph-osd.3.log:2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught > signal (Aborted) ** > ceph-osd.3.log:2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught > signal (Aborted) ** > ceph-osd.3.log: 0> 2017-09-09 07:49:56.256899 7f89edf90700 -1 *** > Caught signal (Aborted) ** > ceph-osd.3.log:2017-09-09 08:13:16.919887 7f82f315e700 -1 *** Caught > signal (Aborted) ** > ceph-osd.7.log:2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught > signal (Aborted) ** > ceph-osd.7.log: 0> 2017-09-09 09:19:17.281950 7f77824cf700 -1 *** > Caught signal (Aborted) ** > > Before I open a ticket, I just want to know if the StupidAllocator is > supported in Luminous. > > A couple of examples of the Aborts are: > > 2017-09-09 12:39:27.044074 7f27f5f20700 4 rocksdb: EVENT_LOG_v1 > {"time_micros": 1504975167035909, "job": 86, "event": "flush_started", > "num_memtables": 1, "num_entries": 1015543, "num_deletes": 345553, > "memory_usage": 260049176} > 2017-09-09 12:39:27.044088 7f27f5f20700 4 rocksdb: > [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 86] > Level-0 flush table #1825: started > 2017-09-09 12:39:28.234651 7f27fff34700 -1 osd.10 pg_epoch: 3521 pg[1.3c7( > v 3521'372186 (3456'369135,3521'372186] local-lis/les=3488/3490 n=2842 > ec=578/66 lis/c 3488/3488 les/c/f 3490/3500/0 3488/3488/3477) [10,8,16] r=0 > lpr=3488 crt=3521'372186 lcod 3521'372184 mlcod 3521'372184 > active+clean+snaptrim snaptrimq=[111~2,115~2,13a~1,13c~3]] removing snap > head > 2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) ** > in thread 7f2816f45700 thread_name:msgr-worker-2 > > ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous > (rc) > 1: (()+0xa562f4) [0x5634e14882f4] > 2: (()+0x11390) [0x7f281b2c5390] > 3: (gsignal()+0x38) [0x7f281a261428] > 4: (abort()+0x16a) [0x7f281a26302a] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f281aba384d] > 6: (()+0x8d6b6) [0x7f281aba16b6] > 7: (()+0x8d701) [0x7f281aba1701] > 8: (()+0xb8d38) [0x7f281abccd38] > 9: (()+0x76ba) [0x7f281b2bb6ba] > 10: (clone()+0x6d) [0x7f281a33282d] > NOTE: a copy of the executable, or `objdump -rdS ` is needed > to interpret this. 
> > --- begin dump of recent events --- > -1> 2017-09-09 12:39:05.878006 7f2817746700 1 -- > 172.16.2.133:6804/1327479 <== osd.2 172.16.2.131:6800/1710 37506 > osd_repop(mds.0.19:101159707 1.2f1 e3521/3477) v2 998+0+46 (52256346 0 > 1629833233) 0x56359eb29000 con 0x563510c02000 > -> 2017-09-09 12:39:05.878065 7f2816f45700 1 -- > 10.15.2.133:6805/327479 <== mds.0 10.15.2.123:6800/2942775562 55580 > osd_op(mds.0.19:101159714 1.ec 1.ffad68ec (undecoded) > ondisk+write+known_if_redirected+full_force e3521) v8 305+0+366 > (2883828331 0 2609552142) 0x56355d9eb0c0 con 0x56355f455000 > > > Second example: > 2017-09-09 07:10:58.135527 7fa2d56a0700 4 rocksdb: > [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:264] [default] [JOB 10] > Flushing memtable with next log file: 2773 > > 2017-09-09 07:10:58.262058 7fa2d56a0700 4 rocksdb: EVENT_LOG_v1 > {"time_micros": 1504955458135538, "job": 10, "event": "flush_started", > "num_memtables": 1, "num_entries": 935059, "num_deletes": 175946, > "memory_usage": 260049888} > 2017-09-09 07:10:58.262077 7fa2d56a0700 4 rocksdb: > [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 10] > Level-0 flush table #2774: started > 2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) ** > in thread 7fa2e96c8700 thread_name:bstore_kv_sync > > ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous > (rc) > 1: (()+0xa562f4) [0x5579585362f4] > 2: (()+0x11390) [0x7fa2faa45390] > 3: (gsignal()+0x38) [0x7fa2f99e1428] > 4: (abort()+0x16a) [0x7fa2f99e302a] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fa2fa32384d] > 6: (()+0x8d6b6) [0x7fa2fa3216b6] > 7: (()+0x8d701) [0x7fa2fa321701] > 8: (()+0x8d919) [0x7fa2fa321919] > 9: (()+0x1230f) [0x7fa2fb60b30f] > 10: (operato
Re: [ceph-users] Is the StupidAllocator supported in Luminous?
Opened: http://tracker.ceph.com/issues/21332 On Sat, Sep 9, 2017 at 10:03 PM, Gregory Farnum wrote: > Yes. Please open a ticket! > > >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph release cadence
As a user, I would like to add: I would like to see real 2-year support for LTS releases. Hammer releases were sketchy at best in 2017, and when luminous was released the outstanding bugs were auto-closed: good bye and good riddance. Also, the decision to drop certain OS support created a barrier to upgrade, and looking at the jewel and luminous upgrade path, where you cannot easily go back after the upgrade is completed, doesn't add confidence.

So making upgrades less radical may help production support be more consistent and the update process less dangerous. I would say 9 months is a good reference point, but for me it is ready when it is really ready and tested.

Keeping development releases may be better for devs and early adopters. I don't believe production admins would go for the intermediate ones as they are being released now.

This is only my humble opinion and may be wrong.

On Sep 9, 2017 15:32, "Christian Theune" wrote:

> Hi,
>
> have been using Ceph for multiple years now. It's unclear to me which of
> your options fits best, but here are my preferences:
>
> * Updates are risky in a way that we tend to rather not do them every
> year. Also, having seen jewel, we've been well off to avoid two
> major issues that would have bitten us and will upgrade from hammer in
> the next month or so.
>
> * Non-production releases are of not much value to me, as I have to keep
> our dev/staging/prod clusters in sync to work on our stuff.
> As you can never downgrade, there's no value in it for me to try
> non-production releases (without frying dev for everyone).
>
> * I'd prefer stability over new features. *Specifically* that new features
> can be properly recombined with existing features (and each
> other) without leading to surprises. (E.g. cache tiering breaking with
> snapshots and then no way going back, and a general notion of
> "that combination wasn't really well tested".)
>
> * I'd prefer versions that are maintained for production-critical
> issues for maybe 2 years, so I can have some time after a new
> production release that overlaps with the new production release
> receiving important bug fixes until I switch.
>
> Maybe this is close to what your "Drop the odd releases, and aim for a ~9
> month cadence" would say. Waiting for a feature for a year is a pain, but
> my personal goal for Ceph is that it first has to work properly, meaning:
> not lose your data, not "stop the show", and not draw you into a
> corner you can't get out of.
>
> That's my perspective as a user. As a fellow developer I feel your pain
> about wanting to release faster and reducing maintenance load, so thanks
> for asking!
>
> Hope this helps,
> Christian
>
> > On Sep 6, 2017, at 5:23 PM, Sage Weil wrote:
> >
> > Hi everyone,
> >
> > Traditionally, we have done a major named "stable" release twice a year,
> > and every other such release has been an "LTS" release, with fixes
> > backported for 1-2 years.
> >
> > With kraken and luminous we missed our schedule by a lot: instead of
> > releasing in October and April we released in January and August.
> >
> > A few observations:
> >
> > - Not a lot of people seem to run the "odd" releases (e.g., infernalis,
> > kraken). This limits the value of actually making them. It also means
> > that those who *do* run them are running riskier code (fewer users ->
> > more bugs).
> > > > - The more recent requirement that upgrading clusters must make a stop at > > each LTS (e.g., hammer -> luminous not supported, must go hammer -> jewel > > -> lumninous) has been hugely helpful on the development side by reducing > > the amount of cross-version compatibility code to maintain and reducing > > the number of upgrade combinations to test. > > > > - When we try to do a time-based "train" release cadence, there always > > seems to be some "must-have" thing that delays the release a bit. This > > doesn't happen as much with the odd releases, but it definitely happens > > with the LTS releases. When the next LTS is a year away, it is hard to > > suck it up and wait that long. > > > > A couple of options: > > > > * Keep even/odd pattern, and continue being flexible with release dates > > > > + flexible > > - unpredictable > > - odd releases of dubious value > > > > * Keep even/odd pattern, but force a 'train' model with a more regular > > cadence > > > > + predictable schedule > > - some features will miss the target and be delayed a year > > > > * Drop the odd releases but change nothing else (i.e., 12-month release > > cadence) > > > > + eliminate the confusing odd releases with dubious value > > > > * Drop the odd releases, and aim for a ~9 month cadence. This splits the > > difference between the current even/odd pattern we've been doing. > > > > + eliminate the confusing odd releases with dubious value > > + waiting for the next release isn't quite as bad > > - required upgrades every 9 months instead of ever 12 months > > > > * Drop the odd releases, but relax the "must upgr