count
> while scrubbing due to the missing object, but I don't think so.
>
> Anyway, I just wanted to thank you for your help!
>
> Best wishes,
>
> Lawrence
>
> On 10/13/2018 02:00 AM, Mike Lovell wrote:
>
> what was the object name that you marked lost? was it
what was the object name that you marked lost? was it one of the cache tier
hit_sets?
the trace you have does seem to show it failing when the OSD is trying to remove
a hit set that is no longer needed. i ran into a similar problem which
might have been why that bug you listed was created. maybe provid
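for anyone hitting the same crash: inspecting and marking a missing object lost is done per PG. a rough sketch from memory (the pg id 2.5 and the choice of revert are placeholders, not from this thread):

    # find the PGs reporting unfound objects and list what is missing
    ceph health detail
    ceph pg 2.5 list_missing
    # last resort: give up on the unfound copies for that PG
    ceph pg 2.5 mark_unfound_lost revert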
On Thu, Mar 29, 2018 at 1:17 AM, Jakub Jaszewski
wrote:
> Many thanks Mike, that explains the stopped IOs. I've just finished adding
> new disks to the cluster and am now trying to evenly reweight OSDs by PG.
>
> May I ask you two more questions?
> 1. As I was in a hurry I did not check if only write ops were
was the pg-upmap feature used to force a pg to get mapped to a particular
osd?
mike
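a minimal sketch of forcing a pg onto a specific osd with upmap, assuming a luminous cluster (the pg and osd ids here are made up):

    # upmap needs luminous-or-newer clients
    ceph osd set-require-min-compat-client luminous
    # move pg 1.7f's replica from osd.12 to osd.34
    ceph osd pg-upmap-items 1.7f 12 34
    # drop the exception again later
    ceph osd rm-pg-upmap-items 1.7f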
On Thu, Feb 22, 2018 at 10:28 AM, Wido den Hollander wrote:
> Hi,
>
> I have a situation with a cluster which was recently upgraded to Luminous
> and has a PG mapped to OSDs on the same host.
>
> root@man:~# cep
mike
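a quick way to confirm where a pg is mapped and which host each osd sits on (pg and osd ids are examples only):

    ceph pg map 1.7f            # up/acting osd sets for the pg
    ceph osd find 12            # host and crush location of osd.12
    ceph osd crush rule dump    # check the rule really separates replicas by host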
On Thu, Feb 22, 2018 at 3:58 PM, Hans Chris Jones <
chris.jo...@lambdastack.io> wrote:
> Interesting. This does not inspire confidence. What SSDs (2TB or 4TB) do
> people have good success with in high use production systems with bluestore?
>
> Thanks
>
> On Thu, F
> I returned the lot and am done with Intel SSDs, will advise as many
> customers and peers to do the same…
>
> Regards
>
> David Herselman
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Mike Lovell
> *Se
has anyone tried with the most recent firmwares from intel? i've had a
number of s4600 960gb drives that have been waiting for me to get around to
adding them to a ceph cluster. this as well as having 2 die almost
simultaneously in a different storage box is giving me pause. i noticed
that David li
On Tue, Jan 16, 2018 at 9:25 AM, Jens-U. Mozdzen wrote:
> Hello Mike,
>
> Zitat von Mike Lovell :
>
>> On Mon, Jan 8, 2018 at 6:08 AM, Jens-U. Mozdzen wrote:
>>
>>> Hi *,
>>> [...]
>>> 1. Does setting the cache mode to "forward" lead
On Mon, Jan 8, 2018 at 6:08 AM, Jens-U. Mozdzen wrote:
> Hi *,
>
> trying to remove a caching tier from a pool used for RBD / Openstack, we
> followed the procedure from
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache and ran
> into problems.
>
> The
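for reference, the documented removal sequence goes roughly as follows; the pool names are placeholders, and newer releases also want a --yes-i-really-mean-it on the cache-mode change:

    # stop new writes landing in the cache tier
    ceph osd tier cache-mode hot-pool forward --yes-i-really-mean-it
    # flush and evict whatever is still cached
    rados -p hot-pool cache-flush-evict-all
    # once the cache pool is empty, detach it from the base pool
    ceph osd tier remove-overlay cold-pool
    ceph osd tier remove cold-pool hot-pool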
On Mon, Mar 20, 2017 at 4:20 PM, Nick Fisk wrote:
> Just a few corrections, hope you don't mind
>
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Mike Lovell
> > Sent: 20 March 2017 20:30
>
i'm not an expert but here is my understanding of it. a hit_set keeps track
of whether or not an object was accessed during the timespan of the
hit_set. for example, if you have a hit_set_period of 600, then the hit_set
covers a period of 10 minutes. the hit_set_count defines how many of the
hit_se
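to make that arithmetic concrete, the knobs live on the cache pool; the pool name and values below are only illustrative:

    # each hit_set spans hit_set_period seconds and hit_set_count of them are kept,
    # so 12 x 600s = roughly the last two hours of access history
    ceph osd pool set hot-pool hit_set_type bloom
    ceph osd pool set hot-pool hit_set_period 600
    ceph osd pool set hot-pool hit_set_count 12
    # an object must show up in this many recent hit_sets to be promoted on read
    ceph osd pool set hot-pool min_read_recency_for_promote 2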
has anyone on the list done an upgrade from hammer (something later than
0.94.6) to jewel with a cache tier configured? i tried doing one last week
and had a hiccup with it. i'm curious if others have been able to
successfully do the upgrade and, if so, did they take any extra steps
related to the
i started an upgrade process to go from 0.94.7 to 10.2.5 on a production
cluster that is using cache tiering. this cluster has 3 monitors, 28
storage nodes, around 370 osds. the upgrade of the monitors completed
without issue. i then upgraded 2 of the storage nodes, and after the
restarts, the osds
i was just testing an upgrade of some monitors in a test cluster from
hammer (0.94.7) to jewel (10.2.5). after upgrading each of the first two
monitors, i stopped and restarted a single osd to cause changes in the
maps. the same error messages showed up in ceph -w. i haven't dug into it
much but just
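from memory, the usual hammer-to-jewel order is mons first, then osds with noout set, then the post-upgrade flags; check the jewel release notes for the exact steps rather than trusting this sketch:

    ceph osd set noout                  # avoid rebalancing while daemons restart
    # upgrade packages and restart mons, then osds, one failure domain at a time
    ceph tell osd.* version             # confirm everything is on jewel
    ceph osd set sortbitwise            # per the release notes, once all osds are jewel
    ceph osd set require_jewel_osds
    ceph osd unset noout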
On Wed, Jun 1, 2016 at 9:13 AM, Adam Tygart wrote:
> Hello all,
>
> I'm running into an issue with ceph osds crashing over the last 4
> days. I'm running Jewel (10.2.1) on CentOS 7.2.1511.
>
> A little setup information:
> 26 hosts
> 2x 400GB Intel DC P3700 SSDs
> 12x6TB spinning disks
> 4x4TB spi
On Fri, Apr 29, 2016 at 9:34 AM, Mike Lovell
wrote:
> On Fri, Apr 29, 2016 at 5:54 AM, Alexey Sheplyakov <
> asheplya...@mirantis.com> wrote:
>
>> Hi,
>>
>> > i also wonder if just taking 148 out of the cluster (probably just
>> marking it out) would
are the new osds running 0.94.5 or did they get the latest .6 packages? are
you also using cache tiering? we ran into a problem with individual rbd
objects getting corrupted when using 0.94.6 with a cache tier
and min_read_recency_for_promote > 1. our only solution to corruption
that happened
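the recency setting is just a pool property, so checking it or dropping it back to 1 looks like this (pool name is a placeholder):

    ceph osd pool get hot-pool min_read_recency_for_promote
    ceph osd pool set hot-pool min_read_recency_for_promote 1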
On Fri, Apr 29, 2016 at 5:54 AM, Alexey Sheplyakov wrote:
> Hi,
>
> > i also wonder if just taking 148 out of the cluster (probably just
> marking it out) would help
>
> As far as I understand this can only harm your data. The acting set of PG
> 17.73 is [41, 148],
> so after stopping/taking out
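before touching osd.148 it is worth confirming the acting set and recovery state for that pg, e.g.:

    ceph pg map 17.73       # quick view of the up and acting sets
    ceph pg 17.73 query     # full peering and recovery detail (json)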
progress we'll need debug ms = 20 on both
> sides of the connection when a message is lost.
> -Sam
>
> On Thu, Apr 28, 2016 at 2:38 PM, Mike Lovell
> wrote:
> > there was a problem on one of the clusters i manage a couple weeks ago
> > where
> > pairs of OSDs would w
there was a problem on one of the clusters i manage a couple weeks ago
where pairs of OSDs would wait indefinitely on subops from the other OSD in
the pair. we used a liberal dose of "ceph osd down ##" on the osds and
eventually things just sorted themselves out a couple days later.
it seems to have com
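for anyone reproducing this, both the debug setting sam asked for and the "kick" can be applied at runtime; the osd ids below are examples:

    # raise messenger debugging on both ends of the stuck connection
    ceph tell osd.12 injectargs '--debug_ms 20'
    ceph tell osd.34 injectargs '--debug_ms 20'
    # mark one side down so it re-peers; it rejoins on its own
    ceph osd down 12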
? anyone have a similar problem?
mike
On Mon, Mar 14, 2016 at 8:51 PM, Mike Lovell
wrote:
> something weird happened on one of the ceph clusters that i administer
> tonight which resulted in virtual machines using rbd volumes seeing
> corruption in multiple forms.
>
> when ever
set to greater than 1.
mike
On Wed, Mar 16, 2016 at 4:41 PM, Mike Lovell
wrote:
> robert and i have done some further investigation the past couple days on
> this. we have a test environment with a hard drive tier and an ssd tier as
> a cache. several vms were created with volumes from
close.
mike
On Mon, Mar 14, 2016 at 9:35 PM, Christian Balzer wrote:
>
> Hello,
>
> On Mon, 14 Mar 2016 20:51:04 -0600 Mike Lovell wrote:
>
> > something weird happened on one of the ceph clusters that i administer
> > tonight which resulted in virtual machines using rbd
something weird happened on one of the ceph clusters that i administer
tonight which resulted in virtual machines using rbd volumes seeing
corruption in multiple forms.
when everything was fine earlier in the day, the cluster was a number of
storage nodes spread across 3 different roots in the cru
12 osds running. it looks like they're creating over 2500
threads each. i don't know the internals of the code but that seems like a
lot. oh well. hopefully this fixes it.
mike
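for the curious, counting threads per osd and checking the kernel limits they can run into is roughly:

    # thread count for each ceph-osd process
    for p in $(pgrep ceph-osd); do awk '/^Threads/ {print $2}' /proc/$p/status; done
    # kernel-wide limits a dense osd node can hit
    sysctl kernel.pid_max kernel.threads-max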
On Mon, Mar 7, 2016 at 1:55 PM, Gregory Farnum wrote:
> On Mon, Mar 7, 2016 at 11:04 AM, Mike Lovell
first off, hello all. this is my first time posting to the list.
i have seen a recurring problem that started in the past week or so on
one of my ceph clusters. osds will crash and it seems to happen whenever
backfill or recovery is started. looking at the logs it appears that the
osd is
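when crashes line up with backfill or recovery starting, a common first step while investigating is to pause or throttle recovery, e.g.:

    ceph osd set nobackfill
    ceph osd set norecover
    # or just slow it down instead
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # and re-enable once done
    ceph osd unset nobackfill
    ceph osd unset norecover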