I attempted to grab some logs from the two OSDs in question with debug_ms
and debug_osd at 20. I have looked through them a little bit, but digging
through the logs at this verbosity is something I don't have much
experience with. Hopefully someone on the list can help make sense of it.
the logs ar
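For reference, a minimal way to raise those debug levels at runtime, with the OSD ID below as a placeholder for each of the two OSDs in question, is injectargs through the monitors:

ceph tell osd.<id> injectargs '--debug_osd 20 --debug_ms 20'
# and back down again once the logs are captured, to keep log volume sane
ceph tell osd.<id> injectargs '--debug_osd 1 --debug_ms 0'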
A quick update on the case.
I think I've isolated the problem. I've spent a while checking the OSD servers
for differences in configuration and noticed two distinctions. The first one
is the sysctl.conf tuning options for IPoIB, which were not present on the
new server. The second one is t
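The IPoIB tuning in sysctl.conf is typically along these lines; the keys and values below are only an illustration, not the exact settings from these servers:

# larger socket buffers / backlog, as often recommended for IPoIB interfaces
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 250000
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216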
Hi Karsten,
It works!
[root@cephmon03 ~]# systemctl enable ceph-mon@cephmon03
Created symlink from
/etc/systemd/system/ceph-mon.target.wants/ceph-mon@cephmon03.service to
/usr/lib/systemd/system/ceph-mon@.service.
ceph 731 1 0 12:12 ? 00:00:00 /usr/bin/ceph-mon -f
--cluster c
Hi,
> i also wonder if just taking 148 out of the cluster (probably just
marking it out) would help
As far as I understand this can only harm your data. The acting set of PG
17.73 is [41, 148],
so after stopping/taking out OSD 148, OSD 41 will store the only copy of
objects in PG 17.73
(so it wo
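For anyone following along, the up/acting set of that PG can be double-checked before doing anything to OSD 148:

ceph pg map 17.73
# full detail, including recovery state:
ceph pg 17.73 query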
On Thu, May 22, 2014 at 12:56 PM, Olivier Bonvalet wrote:
>
> Le mercredi 21 mai 2014 à 18:20 -0700, Josh Durgin a écrit :
>> On 05/21/2014 03:03 PM, Olivier Bonvalet wrote:
>> > Le mercredi 21 mai 2014 à 08:20 -0700, Sage Weil a écrit :
>> >> You're certain that that is the correct prefix for the
On Fri, Apr 29, 2016 at 5:54 AM, Alexey Sheplyakov wrote:
> Hi,
>
> > i also wonder if just taking 148 out of the cluster (probably just
> marking it out) would help
>
> As far as I understand this can only harm your data. The acting set of PG
> 17.73 is [41, 148],
> so after stopping/taking out
Hi,
Yesterday we ran into a strange bug / mysterious issue with a Hammer
0.94.5 storage cluster.
We added OSDs and the cluster started backfilling. Suddenly one of
the running VMs complained that it lost a partition in a 2TB RBD.
After resetting the VM it could not boot any more as the RBD h
Hi,
I'd like to watch and monitor Ceph's progress as it asynchronously
removes the objects belonging to a snapshot that I just deleted.
Is there a way to monitor the workqueue?
Is there a better way to do this? And to determine when the snapshot has
been completely deleted?
Thanks,
Dyweni
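There may not be a dedicated progress counter for this, but one rough approach is to poll the per-pool object counts and overall cluster stats and watch them fall as the snapshot objects are trimmed (pool names come from your own setup):

rados df
ceph df detail
ceph -s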
Are the new OSDs running 0.94.5, or did they get the latest .6 packages? Are
you also using cache tiering? We ran into a problem with individual RBD
objects getting corrupted when using 0.94.6 with a cache tier
and min_read_recency_for_promote was > 1. Our only solution to corruption
that happened
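If that matches your setup, the recency setting on the cache pool can be inspected and, as a workaround, lowered; the pool name here is a placeholder:

ceph osd pool get <cache-pool> min_read_recency_for_promote
ceph osd pool set <cache-pool> min_read_recency_for_promote 1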
Hi,
All OSDs are running 0.94.5, as the new ones were added to the existing servers.
No cache tiering is involved.
We observed many "slow request" warnings during the backfill.
As the backfilling with the full weight of the new OSDs would have run for more
than 28h and no VM was usable, we re-we
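For what it's worth, a common way to soften that impact is to bring new OSDs in at a low CRUSH weight and raise it in steps, and to throttle recovery while the data moves; the OSD id and weights below are placeholders:

ceph osd crush reweight osd.<id> 0.2
# let the cluster settle, then repeat with a higher weight, e.g.
ceph osd crush reweight osd.<id> 0.5
# throttle backfill/recovery while VMs need to stay responsive
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'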
This is more of a "why" than a "can I/should I" question.
The Ceph block device quickstart says (if I interpret it correctly) not to use
a physical machine as both a Ceph RBD client and a node for hosting OSDs or
other Ceph services.
Is this interpretation correct? If so, what is the reasoning?
Hello everyone,
Please excuse me if this topic has been covered already. I've not managed to
find a guide, checklist, or even a set of notes on optimising OS-level
settings/configuration/services for running Ceph. One of the main reasons for
asking is I've recently had to troubleshoot a bunch o
Hi,
I had a fully functional Ceph cluster with 3 x86 nodes and 3 ARM64 nodes, each
with 12 HDD drives and 2 SSD drives. All these were initially running Hammer,
and then were successfully updated to Infernalis (9.2.0).
I recently deleted all my OSDs and swapped my drives with new ones on the x86
Your fs is throwing an EIO on open.
-Sam
On Fri, Apr 29, 2016 at 8:54 AM, Garg, Pankaj
wrote:
> Hi,
>
> I had a fully functional Ceph cluster with 3 x86 Nodes and 3 ARM64 nodes,
> each with 12 HDD Drives and 2SSD Drives. All these were initially running
> Hammer, and then were successfully update
I can see that, but what would that be symptomatic of? How is it happening
on 6 different systems and on multiple OSDs?
-Original Message-
From: Samuel Just [mailto:sj...@redhat.com]
Sent: Friday, April 29, 2016 8:57 AM
To: Garg, Pankaj
Cc: ceph-users@lists.ceph.com
Subject: Re: [ce
Check the system log and search for the corresponding drive. It should have the
information on what is failing.
Thanks & Regards
Somnath
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Garg,
Pankaj
Sent: Friday, April 29, 2016 8:59 AM
To: Samuel Jus
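In practice that usually means grepping the kernel log for the device that backs the failing OSD; the device name below is only an example:

dmesg -T | grep -iE 'sdd|error'
journalctl -k --since "1 hour ago" | grep -i error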
You could strace the process to see precisely what ceph-osd is doing
to provoke the EIO.
-Sam
On Fri, Apr 29, 2016 at 9:03 AM, Somnath Roy wrote:
> Check system log and search for the corresponding drive. It should have the
> information what is failing..
>
> Thanks & Regards
> Somnath
>
> -
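Attaching to the running daemon would look roughly like this; the PID is a placeholder for the ceph-osd process that logs the error:

strace -f -p <ceph-osd-pid> -e trace=open,openat,read -o /tmp/ceph-osd.strace
grep EIO /tmp/ceph-osd.strace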
I think the issue is possibly coming from my journal drives after the upgrade to
Infernalis. I have 2 SSDs, which have 6 partitions each, for a total of 12
journals per server.
When I create OSDs, I pass the partition names as journals,
e.g. ceph-deploy osd prepare x86Ceph7:/dev/sdd:/dev/sdb1
Thi
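One thing that might be worth checking, purely as a guess tied to the Infernalis upgrade: since Infernalis the OSDs run as the ceph user rather than root, so raw journal partitions still owned by root:disk cannot be opened by the daemon. Ownership can be verified and fixed along these lines (device names as in the example above):

ls -l /var/lib/ceph/osd/ceph-*/journal
ls -l /dev/sdb1
# if needed; a udev rule is required to make this survive reboots
chown ceph:ceph /dev/sdb1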
On Fri, Apr 29, 2016 at 9:34 AM, Mike Lovell
wrote:
> On Fri, Apr 29, 2016 at 5:54 AM, Alexey Sheplyakov <
> asheplya...@mirantis.com> wrote:
>
>> Hi,
>>
>> > i also wonder if just taking 148 out of the cluster (probably just
>> marking it out) would help
>>
>> As far as I understand this can onl
On 04/29/2016 11:44 AM, Ming Lin wrote:
> On Tue, Jan 19, 2016 at 1:34 PM, Mike Christie wrote:
>> Everyone is right - sort of :)
>>
>> It is that target_core_rbd module that I made that was rejected
>> upstream, along with modifications from SUSE which added persistent
>> reservations support. I
Hi,
I have a little bit of additional information here that might help debug
this situation. From the OSD logs:
2016-04-29 14:32:46.886538 7fa4cd004800 0 osd.2 14422 done with init,
starting boot process
2016-04-29 14:32:46.886555 7fa4cd004800 1 -- 10.2.0.116:6808/32079 -->
10.2.0.117:6789/0 --
Hi,
When I compile Ceph Jewel 10.2.0 using 'make -j1', I get the following
LDAP undefined references:
./.libs/librgw.so: undefined reference to `ldap_get_dn'
./.libs/librgw.so: undefined reference to `ldap_search_s'
./.libs/librgw.so: undefined reference to `ldap_memfree'
./.libs/librgw.so: und
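A couple of generic checks that may help narrow this down: confirm the symbols really are unresolved in librgw and that an OpenLDAP library is visible to the build:

nm -D ./.libs/librgw.so | grep -i ldap
ldconfig -p | grep libldap
# on RPM-based systems the development package is typically
rpm -q openldap-devel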
Dear Joao, dear Ceph users,
Thanks for your fast reply.
I couldn't get to my Ceph cluster until now, since I was visiting the OpenStack
summit in Austin, TX, and really had no time.
I just fixed the monitor, it is up and running again, by removing and re-adding
it.
I still wonder though
It can be done.
However, a node hosting OSDs already has enough work to do, and you will
run into performance issues.
It has been done and can be done, but you are better off not doing so.
//Tu
_
From: Edward Huyer
Sent: Friday, April 29, 2016 11:30 AM
Subject
Actually, this guy is already a fan of Hadoop. I was just wondering
whether anyone has been playing around with it on top of CephFS lately.
It seems like the last round of papers was from around Cuttlefish.
On 04/28/2016 06:21 AM, Oliver Dzombic wrote:
Hi,
bad idea :-)
It's of course nice a
On Friday, April 29, 2016, Edward Huyer wrote:
> This is more of a "why" than a "can I/should I" question.
>
> The Ceph block device quickstart says (if I interpret it correctly) not to
> use a physical machine as both a Ceph RBD client and a node for hosting
> OSDs or other Ceph services.
>
> Is