[ceph-users] bluestore assertion happening mostly on cachetier SSDs with external WAL/DB nvme

2017-11-07 Thread Eric Nelson
Hi all, This list has been such a great resource for me over the past few years using ceph that first off I want to say thanks. This is the first time I've needed to post, but I've gained a ton of insight/experience reading responses from you helpful people. We've been running Luminous for about a month
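The preview above is cut off; for context on the layout the subject line describes (BlueStore data on the cache-tier SSD with the DB and WAL on a shared NVMe device), a minimal ceph-volume sketch looks like the following. The device paths are placeholders, not taken from the original message.

  # Sketch only: data on the SSD, BlueStore DB and WAL on NVMe partitions.
  ceph-volume lvm create --bluestore \
      --data /dev/sdb \
      --block.db /dev/nvme0n1p1 \
      --block.wal /dev/nvme0n1p2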

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-15 Thread Eric Nelson
I've been seeing these as well on our SSD cache tier that's been ravaged by disk failures as of late. Same tp_peering assert as above, even running the luminous branch from git. Let me know if you have a bug filed I can +1, or if you've found a workaround. E On Wed, Nov 15, 2017 at 10:25 AM, Ashley Merri

Re: [ceph-users] Is the 12.2.1 really stable? Anybody have production cluster with Luminous Bluestore?

2017-11-16 Thread Eric Nelson
We upgraded to it a few weeks ago in order to get some of the new indexing features, but have also hit a few nasty bugs in the process (including this one) as we have been upgrading OSDs from filestore to bluestore. Currently these are isolated to our SSD cache tier, so I've been evicting everything
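The preview stops at the eviction step; a minimal sketch of how a cache tier is typically drained (the pool name "hot-pool" is a placeholder, not from the original message):

  # Stop promoting new objects into the cache tier, then flush and evict.
  ceph osd tier cache-mode hot-pool forward --yes-i-really-mean-it
  rados -p hot-pool cache-flush-evict-all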

Re: [ceph-users] Is the 12.2.1 really stable? Anybody have production cluster with Luminous Bluestore?

2017-11-16 Thread Eric Nelson
The cluster having trouble here is primarily for object storage, at somewhere around 650M objects and 600T. The majority of objects are small JPGs; the large objects are big movie .ts and .mp4 files. This was upgraded from Jewel on Xenial last month. The majority of bugs are in ceph-osd on SSDs for us. We've

Re: [ceph-users] OSD killed by OOM when many cache available

2017-11-17 Thread Eric Nelson
One thing that doesn't show up is fs cache, which is likely the cause here. We went through this on our SSDs and had to add the following to stop the crashes. I believe vm.vfs_cache_pressure and min_free_kbytes were the settings that really helped. HTH! sysctl_param 'vm.vf
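The quoted snippet is cut off (it looks like a Chef sysctl_param resource), so here is a plain sysctl sketch of the two settings named in the message. The values are illustrative assumptions, not the poster's actual numbers.

  # /etc/sysctl.d/99-ceph-osd.conf -- illustrative values only
  vm.vfs_cache_pressure = 200     # reclaim dentry/inode (fs) cache more aggressively than the default of 100
  vm.min_free_kbytes = 4194304    # keep a free-memory reserve so allocations don't trip the OOM killer
  # apply with: sysctl --system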

Re: [ceph-users] OSD killed by OOM when many cache available

2017-11-17 Thread Eric Nelson
fight since then! If I find it I'll send it your way. Cheers, E On Fri, Nov 17, 2017 at 6:03 PM, Sam Huracan wrote: > @Eric: How can I check the status of fs cache? Why could it be the root cause? > > Thanks > > 2017-11-18 7:30 GMT+07:00 Eric Nelson : > >> One thing that do
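The reply answering Sam's question falls outside the preview; a sketch of the commands commonly used to inspect fs (page/slab) cache usage on an OSD host:

  free -m                                                      # the buff/cache column is the page cache
  grep -E '^(Cached|Buffers|Slab|SReclaimable)' /proc/meminfo  # cache and reclaimable slab, in kB
  slabtop -o                                                   # one-shot view of dentry/inode slab usage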

Re: [ceph-users] Ceph - SSD cluster

2017-11-21 Thread Eric Nelson
Plus one here, the EVOs are terrible. On Tue, Nov 21, 2017 at 6:10 AM Phil Schwarz wrote: > Hi, > not a real HCL, but keeping this list [1] in mind is mandatory. > > In my opinion, use roughly any kind of Intel SSD: 3750 in SATA or, better, > 3700 in NVMe. > Avoid any Samsung Pro or EVO of nearly a
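When evaluating SSD models like the ones debated above, the usual quick check is a single-threaded O_DSYNC 4k write test with fio, a test on which consumer drives like the EVOs tend to do poorly. A sketch only; /dev/sdX is a placeholder and the test overwrites the device.

  # Destructive: writes directly to the raw device.
  fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting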