Hi
I monitor dmesg on each of the 3 nodes; no hardware issue is reported. And
the problem happens with various different OSDs in different nodes, so to
me it is clear it's not a hardware problem.
Thanks for the reply
On 05/03/2018 21:45, Vladimir Prokofev wrote:
> always solved by ceph pg re
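For context, a minimal sketch of the repair workflow being discussed, with <pgid> as a placeholder taken from the health output:
# find PGs flagged inconsistent by scrub
ceph health detail | grep inconsistent
# inspect which replica differs before repairing (Jewel and later)
rados list-inconsistent-obj <pgid> --format=json-pretty
# ask the primary OSD to repair the PG
ceph pg repair <pgid>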
On Sun, Mar 4, 2018 at 12:02 AM Mayank Kumar wrote:
> Ceph Users,
>
> My question is: if all mons are down (I know it's a terrible situation to
> be in), does an existing RBD volume which is mapped to a host and being
> used (read/written to) continue to work?
>
> I understand that it won't get notifica
On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
>> Blocked requests and slow requests are synonyms in ceph. They are 2 names
>> for the exact same thing.
>>
>>
>> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev
>> wrote:
>>>
>>> On Thu,
On Mon, Mar 5, 2018 at 2:17 PM, Gregory Farnum wrote:
> On Thu, Mar 1, 2018 at 9:21 AM Max Cuttins wrote:
>>
>> I think this is a good question for everybody: how hard should it be to
>> delete a pool?
>>
>> We ask you to type the pool name twice.
>> We ask you to add "--yes-i-really-really-mean-it"
>> We ask to ad
Hello ceph-users, this is a really, really, REALLY tough problem for our team. We have investigated the problem for a long time and tried a lot of things, but we can't solve it; even the concrete cause of the problem is still unclear to us! So, anyone offering any solution/suggestion/opinion whatever wil
On 2018/02/28 3:32 pm, David Turner wrote:
You could probably write an SNMP module for the new ceph-mgr daemon.
What do you want to use to monitor Ceph that requires SNMP?
On Wed, Feb 28, 2018 at 1:13 PM Andre Goree wrote:
I've looked and haven't found much information besides custom
3rd-pa
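If SNMP itself is not a hard requirement, the mgr daemon already ships exporters that may save writing a custom module; a sketch, assuming Luminous or later:
# see which manager modules exist and which are enabled
ceph mgr module ls
# enable the built-in Prometheus exporter as an alternative to SNMP
ceph mgr module enable prometheus
# confirm the exporter's endpoint
ceph mgr services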
> always solved by ceph pg repair
That doesn't necessarily mean that there's no hardware issue. In my case
repair also worked fine and returned the cluster to the OK state every time,
but in time the faulty disk failed another scrub operation, and this repeated
multiple times before we replaced that disk.
One
On Mon, Mar 5, 2018 at 2:07 PM, Brady Deetz wrote:
> While preparing a risk assessment for a DR solution involving RBD, I'm
> increasingly unsure of a few things.
>
> 1) Does the failover from primary to secondary cluster occur automatically
> in the case that the primary backing rados pool becomes inaccessible?
Dear all,
I have some questions about cache tier in ceph:
1. Can someone share experiences with cache tiering? What are the sensitive
things to pay attention to regarding the cache tier? Can one use the same SSD for
both cache and
2. Is cache tiering supported with bluestore? Any advice for usin
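For reference, the basic wiring of a cache tier looks roughly like this; the pool names and the ~100GB cap are placeholders, not recommendations:
# put an SSD-backed pool ("cachepool") in front of a base pool ("basepool")
ceph osd tier add basepool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay basepool cachepool
# hit-set tracking is required, or flushing/eviction cannot work
ceph osd pool set cachepool hit_set_type bloom
# cap the cache so the tiering agent starts flushing/evicting
ceph osd pool set cachepool target_max_bytes 100000000000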
On Thu, Mar 1, 2018 at 9:21 AM Max Cuttins wrote:
> I think this is a good question for everybody: how hard should it be to
> delete a pool?
>
> We ask you to type the pool name twice.
> We ask you to add "--yes-i-really-really-mean-it"
> We ask to add the ability for mons to delete the pool (and remove this ability
> A
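For anyone following along, the guards under discussion look roughly like this on Luminous ("mypool" is a placeholder):
# mons refuse pool deletion until this flag is set
ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
# the pool name must be typed twice, plus the confirmation flag
ceph osd pool delete mypool mypool --yes-i-really-really-mean-it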
While preparing a risk assessment for a DR solution involving RBD, I'm
increasingly unsure of a few things.
1) Does the failover from primary to secondary cluster occur automatically
in the case that the primary backing rados pool becomes inaccessible?
1.a) If the primary backing rados pool is un
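As far as I know, rbd-mirror does not fail over automatically; promotion is an explicit administrative action on the secondary cluster. A sketch, assuming a mirrored image mypool/myimage (names are placeholders):
# on the secondary cluster, when the primary is unreachable
rbd mirror image promote mypool/myimage --force
# once the old primary cluster returns, demote its copy and resync it
rbd mirror image demote mypool/myimage
rbd mirror image resync mypool/myimage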
On Mon, Mar 5, 2018 at 9:56 AM Jonathan D. Proulx wrote:
> Hi All,
>
> I've recently noticed my deep scrubs are EXTREMELY poorly
> distributed. They are starting within the 18->06 local-time start/stop
> window but are not distributed over enough days or well distributed
> over the range of days t
Hi All,
I've recently noticed my deep scrubs are EXTREMELY poorly
distributed. They are starting within the 18->06 local-time start/stop
window but are not distributed over enough days or well distributed
over the range of days they have.
root@ceph-mon0:~# for date in `ceph pg dump | awk '/active/
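The command above is cut off in the digest; a rough equivalent for counting deep scrubs per day, plus the knob that spreads them out, is below. The DEEP_SCRUB_STAMP field position varies between releases, so treat the awk column as an assumption:
# count deep scrubs per calendar day; on Luminous the stamp is the last
# column, so its date portion is $(NF-1) - adjust for your release
ceph pg dump 2>/dev/null | grep active | awk '{print $(NF-1)}' | sort | uniq -c
# spread deep scrubs over a longer window, e.g. 14 days instead of the default 7
ceph tell osd.\* injectargs '--osd_deep_scrub_interval 1209600'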
I had a similar problem with some relatively underpowered servers (2x
E5-2603, 6 cores at 1.7GHz, no HT; 12-14 2TB OSDs per server; 32GB RAM).
There was a process on a couple of the servers that would hang and chew up
all available CPU. When that happened, I started getting scrub errors on
those servers.
Hi, and thanks for the reply.
The OSDs are all healthy; in fact, after a ceph pg repair the ceph
health is back to OK, and in the OSD log I see "repair ok, 0 fixed".
The SMART data of the 3 OSDs seems fine.
*OSD.5*
# ceph-disk list | grep osd.5
/dev/sdd1 ceph data, active, cluster ceph, osd.5, block
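For completeness, the SMART checks one might run against each backing device (using /dev/sdd from the listing above); note that a passing overall health status alone does not rule out a dying disk:
# overall self-assessment
smartctl -H /dev/sdd
# the attributes that usually betray a failing disk
smartctl -A /dev/sdd | egrep -i 'realloc|pending|uncorrect'
# start a long self-test; read the result later with: smartctl -l selftest /dev/sdd
smartctl -t long /dev/sdd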
On 5 March 2018 14:45, Jan Marquardt wrote:
On 05.03.18 at 13:13, Ronny Aasen wrote:
I had some similar issues when I started my proof of concept; especially
the snapshot deletion I remember well.
The rule of thumb for filestore, which I assume you are running, is 1GB of RAM
per TB of OSD. So with 8
Hi
I just posted to the Ceph tracker with my logs and my issue
Let's hope this will be fixed
Thanks
On 05/03/2018 13:36, Paul Emmerich wrote:
Hi,
yeah, the cluster that I'm seeing this on also has only one host that
reports that specific checksum. Two other hosts only report the same
On 05.03.18 at 13:13, Ronny Aasen wrote:
> I had some similar issues when I started my proof of concept; especially
> the snapshot deletion I remember well.
>
> The rule of thumb for filestore, which I assume you are running, is 1GB of RAM
> per TB of OSD. So with 8 x 4TB OSDs you are looking at 32GB o
> candidate had a read error
speaks for itself - while scrubbing, it couldn't read the data.
I had a similar issue, and it was just an OSD dying - errors and reallocated
sectors in SMART - so we just replaced the disk. But in your case it seems that
the errors are on different OSDs? Are your OSDs all healthy?
You can use
Hi,
yeah, the cluster that I'm seeing this on also has only one host that
reports that specific checksum. Two other hosts only report the same error
that you are seeing.
Could you post to the tracker issue that you are also seeing this?
Paul
2018-03-05 12:21 GMT+01:00 Marco Baldini - H.S. Amiat
On 5 March 2018 11:21, Jan Marquardt wrote:
Hi,
we are relatively new to Ceph and are observing some issues, and I'd
like to know how likely they are to happen when operating a
Ceph cluster.
Currently our setup consists of three servers which are acting as
OSDs and MONs. Each server has two
I'll pitch in my personal experience.
When a single OSD in a pool becomes full (95% used), all client IO writes
to this pool must stop, even if the other OSDs are almost empty. This is done
for the purpose of data integrity. [1]
To avoid this you need to balance your failure domains.
For example, ass
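A sketch of how one might spot and relieve the imbalance before any OSD hits the full ratio; the dry-run variant is worth running first:
# per-OSD fill levels, including the CRUSH hierarchy
ceph osd df tree
# preview, then apply, a gentle reweight of the most overfull OSDs
ceph osd test-reweight-by-utilization
ceph osd reweight-by-utilization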
After going through:
https://de.slideshare.net/sageweil1/bluestore-a-new-storage-backend-for-ceph-one-year-in
I can already answer some of my own questions - notably, compaction should
happen slowly,
and there is high write amplification for SSDs, which could explain why our
SSDs in our MDS reach
One full OSD has caused all pools to become full. Can anyone help me
understand this?
During ongoing PG backfilling I see that the MAX AVAIL values are changing
while the USED values stay constant.
GLOBAL:
    SIZE    AVAIL    RAW USED    %RAW USED
    425T    145T     279T        65.70
POOLS:
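This is expected: MAX AVAIL for a pool is derived from the most-full OSD that the pool's CRUSH rule can place data on, scaled by the replica count, so a single nearly-full OSD drags MAX AVAIL down for every pool that touches it, and backfill shifts it even while USED stays constant. One way to find the culprit (the %USE column number may differ by release):
# list OSDs sorted by utilisation; the fullest one caps MAX AVAIL
ceph osd df | sort -rnk8 | head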
On 05.03.2018 00:26, Adrian Saul wrote:
>
>
> We are using Ceph+RBD+NFS under pacemaker for VMware. We are doing
> iSCSI using SCST but have not used it against VMware, just Solaris and
> Hyper-V.
>
>
> It generally works and performs well enough – the biggest issues are the
> clustering for
Hi
After some days with debug_osd 5/5 I found [ERR] entries on different days,
in different PGs, on different OSDs, and on different hosts. This is what I
get in the OSD logs:
*OSD.5 (host 3)*
2018-03-01 20:30:02.702269 7fdf4d515700 2 osd.5 pg_epoch: 16486 pg[9.1c( v
16486'51798 (16431'50251,16486'51798] local-
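For anyone wanting to reproduce this search, a simple way to pull the scrub errors out of the OSD logs on each node:
# collect scrub/repair errors from all OSD logs on this host
grep -h '\[ERR\]' /var/log/ceph/ceph-osd.*.log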
Dear Cephalopodians,
in the benchmarks done with many files, I noted that our bottleneck was mainly
the MDS SSD performance, and notably, after deletion of the many files in
CephFS, the RocksDB stayed large and did not shrink.
Recreating an OSD from scratch and backfilling it, however,
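If the goal is only to shrink RocksDB without recreating the OSD, a manual compaction may be worth trying. Availability of these commands varies by release, so treat both as assumptions to verify first:
# online, via the admin socket, if your build lists a "compact" command
# (check with: ceph daemon osd.0 help)
ceph daemon osd.0 compact
# offline alternative, with the OSD stopped
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact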
There is a presentation by Sage, slide 16:
https://es.slideshare.net/sageweil1/bluestore-a-new-storage-backend-for-ceph-one-year-in
You can probably use that as an initial guide; hope it helps.
gus
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Frank
Ritchie
Sent: do
Hi,
we are relatively new to Ceph and are observing some issues, and I'd
like to know how likely they are to happen when operating a
Ceph cluster.
Currently our setup consists of three servers which are acting as
OSDs and MONs. Each server has two Intel Xeon L5420 (yes, I know,
it's not state o