You mean that you never see recovery without crush map removal? That
is strange. I see quick recovery in our two small clusters and even in
our production when a daemon is killed.
It's only when an OSD crashes that I don't see recovery in production.
Let me talk to the ceph-devel community to find wheth
Have you tried restarting the mons? Did you change the timezone or run hwclock or
something like that during their lifetime? And if you're running them in
containers, are you providing them with /etc/adjtime and such?
Jan
> On 19 May 2016, at 07:29, Stefan Eriksson wrote:
>
> I'm using hammer and
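For anyone running mons in containers as Jan mentions: the host's clock-related
files can be bind-mounted read-only. A rough sketch against the ceph/daemon image;
the image name, IPs and network below are assumptions, adjust to your deployment.
# Pass the host's timezone/adjtime files into the mon container
docker run -d --net=host \
  -v /etc/localtime:/etc/localtime:ro \
  -v /etc/adjtime:/etc/adjtime:ro \
  -v /etc/ceph:/etc/ceph \
  -v /var/lib/ceph:/var/lib/ceph \
  -e MON_IP=192.168.0.10 \
  -e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
  ceph/daemon mon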
I'm using hammer and CentOS 7, and this message won't go away:
health HEALTH_WARN
clock skew detected on mon.ceph01-osd02, mon.ceph01-osd03,
mon.ceph01-osd04, mon.ceph01-osd05
I have checked the time on all nodes and it is ok:
for i in ceph01-osd01 ceph01-osd02 ceph01-osd03
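The loop above is cut off; a sketch of that kind of check, assuming passwordless
SSH to the hosts named in the warning and chrony or ntpd running on them:
# Compare wall-clock time and NTP state on each mon host
for i in ceph01-osd01 ceph01-osd02 ceph01-osd03 ceph01-osd04 ceph01-osd05; do
  echo "== $i =="
  ssh "$i" 'date +%s.%N; chronyc tracking 2>/dev/null | grep "System time" || ntpq -p'
done
# The warning threshold is mon_clock_drift_allowed (0.05 s by default);
# restarting the mons after fixing NTP, as Jan suggests, can clear a stale warning.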
Hello Kris,
On Wed, 18 May 2016 19:31:49 -0700 Kris Jurka wrote:
>
>
> On 5/18/2016 7:15 PM, Christian Balzer wrote:
>
> >> We have hit the following issues:
> >>
> >> - Filestore merge splits occur at ~40 MObjects with default
> >> settings. This is a really, really bad couple of days whil
Hello Sage,
On Wed, 18 May 2016 17:23:00 -0400 (EDT) Sage Weil wrote:
> Currently, after an OSD has been down for 5 minutes, we mark the OSD
> "out", whic redistributes the data to other OSDs in the cluster. If the
> OSD comes back up, it marks the OSD back in (with the same reweight
> value,
Hi,
Our VM has been using ceph as block storage for both system and data
partitions.
This is what dd shows,
# dd if=/dev/zero of=test.file bs=4k count=1024k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 16.7969 s, 256 MB/s
When dd is run again with the fdatasync argument, th
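For comparison, the usual synced variants of that dd run are below; the 256 MB/s
above is largely buffered page-cache throughput, so the synced numbers will be lower.
# Flush data to the RBD-backed filesystem before dd reports throughput
dd if=/dev/zero of=test.file bs=4k count=1024k conv=fdatasync
# Or bypass the page cache entirely (expect much lower numbers with bs=4k)
dd if=/dev/zero of=test.file bs=4k count=1024k oflag=direct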
On 5/18/2016 7:15 PM, Christian Balzer wrote:
We have hit the following issues:
- Filestore merge splits occur at ~40 MObjects with default settings.
This is a really, really bad couple of days while things settle.
Could you elaborate on that?
As in which settings affect this and what hap
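For reference, the behaviour in question is controlled by two filestore options;
a sketch of checking and tuning them with illustrative values, noting that new
values only take effect as collections split or merge going forward:
# Current values on a running OSD, via the admin socket
ceph daemon osd.0 config show | grep -E 'filestore_(merge_threshold|split_multiple)'
# Example ceph.conf snippet; a collection splits at roughly
# abs(merge threshold) * split multiple * 16 objects
[osd]
filestore merge threshold = 40
filestore split multiple = 8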
Hello,
On Wed, 18 May 2016 08:14:51 -0500 Brian Felton wrote:
> At my current gig, we are running five (soon to be six) pure object
> storage clusters in production with the following specs:
>
> - 9 nodes
> - 32 cores, 256 GB RAM per node
> - 72 6 TB SAS spinners per node (648 total per clus
Hello,
On Wed, 18 May 2016 12:32:25 -0400 Benjeman Meekhof wrote:
> Hi Lionel,
>
> These are all very good points we should consider, thanks for the
> analysis. Just a couple clarifications:
>
> - NVMe in this system are actually slotted in hot-plug front bays so a
> failure can be swapped on
Hello again,
On Wed, 18 May 2016 15:32:50 +0200 Dietmar Rieder wrote:
> Hello Christian,
>
> > Hello,
> >
> > On Wed, 18 May 2016 13:57:59 +0200 Dietmar Rieder wrote:
> >
> >> Dear Ceph users,
> >>
> >> I've a question regarding the memory recommendations for an OSD node.
> >>
> >> The offici
I am running 10.2.0-0ubuntu0.16.04.1.
I've run into a problem w/ cephfs metadata pool. Specifically I have a pg
w/ an 'unfound' object.
But I can't figure out which object, since when I run:
ceph pg 12.94 list_unfound
it hangs (as does ceph pg 12.94 query). I know it's in the cephfs metadata
pool since I
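The standard sequence for chasing an unfound object is below; whether these
commands hang on an unhealthy PG, as reported above, is of course the open question.
# Which PGs report unfound objects
ceph health detail
# Which OSDs the PG still wants to query (see "might_have_unfound" in recovery_state)
ceph pg 12.94 query
ceph pg 12.94 list_unfound
# Last resort once the data is known to be unrecoverable
ceph pg 12.94 mark_unfound_lost revert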
Dear All...
Our infrastructure is the following:
- We use CEPH/CEPHFS (9.2.0)
- We have 3 mons and 8 storage servers supporting 8 OSDs each.
- We use SSDs for journals (2 SSDs per storage server, each serving
4 OSDs).
- We have one main mds and one standby-replay mds.
- We are
>> On 16-05-18 14:23, Sage Weil wrote:
>> Currently, after an OSD has been down for 5 minutes, we mark the OSD
>> "out", which redistributes the data to other OSDs in the cluster. If the
>> OSD comes back up, it marks the OSD back in (with the same reweight value,
>> usually 1.0).
>>
>> The good thi
On 16-05-18 14:23, Sage Weil wrote:
Currently, after an OSD has been down for 5 minutes, we mark the OSD
"out", whic redistributes the data to other OSDs in the cluster. If the
OSD comes back up, it marks the OSD back in (with the same reweight value,
usually 1.0).
The good thing about marking
Currently, after an OSD has been down for 5 minutes, we mark the OSD
"out", whic redistributes the data to other OSDs in the cluster. If the
OSD comes back up, it marks the OSD back in (with the same reweight value,
usually 1.0).
The good thing about marking OSDs out is that exactly the amount
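For context, the behaviour being discussed is already tunable from the CLI;
the option names below are real, the values only illustrative.
# Stop the mons from automatically marking down OSDs out, e.g. during maintenance
ceph osd set noout
ceph osd unset noout
# Adjust how long an OSD may stay down before it is marked out (seconds)
ceph tell mon.* injectargs '--mon_osd_down_out_interval 600'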
Hi Blair,
We use 36-OSD nodes with journals on HDD, running in a 90% object storage
cluster.
The servers have 128 GB RAM and 40 cores (HT) for the storage nodes with 4
TB SAS drives, and 256 GB and 48 cores for the storage nodes with 6 TB SAS
drives.
We use 2x10 Gb bonded for the client network, a
Hi Lionel,
These are all very good points we should consider, thanks for the
analysis. Just a couple clarifications:
- The NVMe devices in this system are actually slotted in hot-plug front bays, so a
failed one can be swapped online. However, I do see your point about this
otherwise being a non-optimal config.
Hi Gaurav,
It could be an issue, but I never see crush map removal without recovery.
Best regards,
On Wed, May 18, 2016 at 1:41 PM, Gaurav Bafna wrote:
> Is it a known issue and is it expected ?
>
> When an OSD is marked out, the reweight becomes 0 and the PGs should
> get remapped, right?
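The effect is easy to see from the CLI; the osd id below is arbitrary.
ceph osd out 12      # mark it out by hand
ceph osd tree        # the REWEIGHT column shows 0 for the out OSD
ceph -w              # watch PGs go remapped/backfilling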
Hi,
I'm not yet familiar with Jewel, so take this with a grain of salt.
On 18/05/2016 16:36, Benjeman Meekhof wrote:
> We're in process of tuning a cluster that currently consists of 3
> dense nodes with more to be added. The storage nodes have spec:
> - Dell R730xd 2 x Xeon E5-2650 v3 @ 2.30
We're in process of tuning a cluster that currently consists of 3
dense nodes with more to be added. The storage nodes have spec:
- Dell R730xd 2 x Xeon E5-2650 v3 @ 2.30GHz (20 phys cores)
- 384 GB RAM
- 60 x 8TB HGST HUH728080AL5204 in MD3060e enclosure attached via 2 x
LSI 9207-8e SAS 6Gbps
- X
On Wed, May 18, 2016 at 3:56 PM, Jürgen Ludyga wrote:
> Hi,
>
> I've a question: did you ever fix the error you mentioned in this post?
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000364.html
I did - http://tracker.ceph.com/issues/11449.
Thanks,
Il
Hello Christian,
> Hello,
>
> On Wed, 18 May 2016 13:57:59 +0200 Dietmar Rieder wrote:
>
>> Dear Ceph users,
>>
>> I've a question regarding the memory recommendations for an OSD node.
>>
>> The official Ceph hardware recommendations say that an OSD node should
>> have 1GB RAM / TB OSD [1]
>>
>>
At my current gig, we are running five (soon to be six) pure object storage
clusters in production with the following specs:
- 9 nodes
- 32 cores, 256 GB RAM per node
- 72 6 TB SAS spinners per node (648 total per cluster)
- 7+2 erasure coded pool for RGW buckets
- ZFS as the filesystem on th
Hi Ceph-users,
I am having some trouble finding the bottleneck in my CephFS Infernalis
setup.
I am running 5 OSD servers which all have 6 OSDs each (so I have 30 OSDs in
total). Each OSD is a physical disk (non-SSD) and each OSD has its journal
stored on the first partition of its own d
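One way to separate raw RADOS throughput from CephFS/MDS overhead; the pool
name below is an assumption, substitute your actual data pool.
# Raw RADOS write throughput against the data pool for 30 seconds
rados bench -p cephfs_data 30 write --no-cleanup
rados -p cephfs_data cleanup
# Per-OSD commit/apply latency, handy for spotting one slow disk
ceph osd perf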
On Wed, 18 May 2016 08:56:51 +0000 Van Leeuwen, Robert wrote:
> >We've hit issues (twice now) that seem (have not
> >figured out exactly how to confirm this yet) to be related to kernel
> >dentry slab cache exhaustion - symptoms were a major slow down in
> >performance and slow requests all over t
Hello,
On Wed, 18 May 2016 13:57:59 +0200 Dietmar Rieder wrote:
> Dear Ceph users,
>
> I've a question regarding the memory recommendations for an OSD node.
>
> The official Ceph hardware recommendations say that an OSD node should
> have 1GB RAM / TB OSD [1]
>
> The "Reference Architecture"
Dear Ceph users,
I've a question regarding the memory recommendations for an OSD node.
The official Ceph hardware recommendations say that an OSD node should
have 1GB RAM / TB OSD [1]
The "Reference Architecture" whitepaper from Red Hat & Supermicro says
that "typically" 2GB of memory per OSD on
Hello,
the following code snippet from rgw_rados.cc shows the problem.
RGWRados::create_bucket(...)
{
  ...
  ...
  ret = put_linked_bucket_info(info, exclusive, ceph::real_time(),
                               pep_objv, &attrs, true);
  if (ret == -EEXIST) {
    ...
    //* if the bucket already exists, the new bucket ins
>We've hit issues (twice now) that seem (have not
>figured out exactly how to confirm this yet) to be related to kernel
>dentry slab cache exhaustion - symptoms were a major slow down in
>performance and slow requests all over the place on writes, watching
>OSD iostat would show a single drive hitt
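If anyone wants to check for the same symptom, a quick way to watch dentry slab
growth and to bias the kernel towards reclaiming it; the sysctl value below is
illustrative, the default is 100.
# Watch the dentry slab
slabtop -o | grep -i dentry
grep dentry /proc/slabinfo
# Make the VM reclaim dentries/inodes more aggressively
sysctl vm.vfs_cache_pressure=200
# Blunt instrument: drop clean dentries and inodes right now
echo 2 > /proc/sys/vm/drop_caches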
Hello,
I am not sure I understood the problem.
Can you post the example steps to reproduce the problem ?
Also what version of Ceph RGW are you running ?
Saverio
2016-05-18 10:24 GMT+02:00 fangchen sun :
> Dear ALL,
>
> I found a problem where RGW creates a new bucket instance and then deletes
> th
Dear ALL,
I found a problem where RGW creates a new bucket instance and then deletes
the bucket instance on every create-bucket op with the same name.
http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
According to the error code "BucketAlreadyOwnedByYou" from the above
link, shouldn't th
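A reproduction along the lines Saverio asked for; the bucket name and client are
arbitrary, and the interesting part is whether a fresh bucket instance id shows
up in the metadata after the second create.
# Create the same bucket twice with the same credentials
s3cmd mb s3://testbucket
s3cmd mb s3://testbucket
# Inspect bucket instance metadata before/after the second create
radosgw-admin metadata list bucket.instance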
Hello,
On Wed, 18 May 2016 15:54:59 +1000 Blair Bethwaite wrote:
> Hi all,
>
> What are the densest node configs out there, and what are your
> experiences with them and tuning required to make them work? If we can
> gather enough info here then I'll volunteer to propose some upstream
> docs co
> On 18 May 2016 at 07:54, Blair Bethwaite wrote:
>
>
> Hi all,
>
> What are the densest node configs out there, and what are your
> experiences with them and tuning required to make them work? If we can
> gather enough info here then I'll volunteer to propose some upstream
> docs covering thi