I filed http://tracker.ceph.com/issues/17858 recently. I am seeing this
problem on 10.2.3 ceph-fuse, but maybe the kernel client is affected too.
It is easy to reproduce; just do a deep "mkdir -p", e.g. "mkdir -p
1/2/3/4/5/6/7/8/9/0/1/2/3/4/5/6/7/8/9"
On 16-11-11 10:46, Dan van der Ster wrote:
Hi
Hi,
I have two OSDs that are failing with an assert which looks related to
missing objects. This happened after a large RBD snapshot
was deleted, causing several OSDs to start flapping as they experienced high
load. The cluster is fully recovered and I don't need any
help from a recovery perspective
Hi All,
Just a slight note of caution. I had been running the 4.7 kernel (with Ubuntu
16.04) on the majority of my OSD nodes, as when I
installed the cluster there was that outstanding panic bug with the 4.4 kernel.
I have been experiencing a lot of flapping OSDs
every time the cluster was p
Hi,
On 15/11/16 01:27, Craig Chi wrote:
> What's your Ceph version?
> I am using Jewel 10.2.3 and systemd seems to work normally. I deployed
> Ceph by ansible, too.
The version in Ubuntu 16.04, which is 10.2.2-0ubuntu0.16.04.2
> You can check whether you have /lib/systemd/system/ceph-mon.target
Hi,
We observed the same behavior with kernel 4.7 and Ubuntu 14.04 under heavy
load. Kernel 4.2 is stable. We use only S3 gateway.
--
Jarek
--
Jarosław Owsiewski
2016-11-15 11:31 GMT+01:00 Nick Fisk :
> Hi All,
>
>
>
> Just a slight note of caution. I had been running the 4.7 kernel (With
>
Hi Chris,
We checked memory as well, and we have plenty of free memory (12 GB used / 125 GB
available) on each and every DN.
Actually, we activated some debug logs yesterday and found many messages
like:
1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7ff9bdb42700' had timed out
after 1
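For context (the values below are not taken from this thread, just the stock defaults): that warning is printed when an OSD work-queue thread exceeds its internal heartbeat timeout, which is tunable in ceph.conf along these lines:

[osd]
# threshold (seconds) for the "had timed out" warning above
osd op thread timeout = 15
# a thread stuck beyond this causes the OSD to abort
osd op thread suicide timeout = 150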
On Mon, Nov 14, 2016 at 11:35 PM, Goncalo Borges
wrote:
> Hi John...
>
> Thanks for replying.
>
> Some of the requested input is inline.
>
> Cheers
>
> Goncalo
>
>
>>>
>>>
>>> We are currently undergoing an infrastructure migration. One of the first
>>> machines to go through this migration process
Hi,
You can try to manually fix this by adding the
/lib/systemd/system/ceph-mon.target file, which contains:
===
[Unit]
Description=ceph target allowing to start/stop all ceph-mon@.service instances
at once
PartOf=ceph.target
[Install]
WantedBy
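The [Install] section is cut off above; a complete unit along those lines looks roughly like the sketch below (the exact WantedBy line is an assumption, so compare it against the file shipped by your ceph-mon package):

===
[Unit]
Description=ceph target allowing to start/stop all ceph-mon@.service instances at once
PartOf=ceph.target

[Install]
WantedBy=multi-user.target ceph.target
===

After creating the file, run "systemctl daemon-reload" and "systemctl enable ceph-mon.target" so the ceph-mon@.service instances are pulled in on boot.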
Hello!
I have a problem with slow requests on kernel 4.4.0-45; I rolled all
nodes back to 4.4.0-42.
Ubuntu 16.04.1 LTS (Xenial Xerus)
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
Оралов Алексей
Corporate network and technology department
Hi,
after running a cephfs on my ceph cluster I got stuck with the following
health status:
# ceph status
cluster ac482f5b-dce7-410d-bcc9-7b8584bd58f5
health HEALTH_WARN
128 pgs degraded
128 pgs stuck unclean
128 pgs undersized
recovery 24/4
Also, I instructed all unclean PGs to repair and nothing happened. I did it
like this:
~# for pg in `ceph pg dump_stuck unclean 2>&1 | grep -Po
'[0-9]+\.[A-Za-z0-9]+'`; do ceph pg repair $pg; done
On Tue, Nov 15, 2016 at 9:58 AM Webert de Souza Lima
wrote:
> Hi,
>
> after running a cephfs on my c
On Tue, Nov 15, 2016 at 11:58 AM, Webert de Souza Lima
wrote:
> Hi,
>
> after running a cephfs on my ceph cluster I got stuck with the following
> health status:
>
> # ceph status
> cluster ac482f5b-dce7-410d-bcc9-7b8584bd58f5
> health HEALTH_WARN
> 128 pgs degraded
>
Hey John.
Just to be sure; by "deleting the pools" you mean the *cephfs_metadata* and
*cephfs_metadata* pools, right?
Does it have any impact over radosgw? Thanks.
On Tue, Nov 15, 2016 at 10:10 AM John Spray wrote:
> On Tue, Nov 15, 2016 at 11:58 AM, Webert de Souza Lima
> wrote:
> > Hi,
> >
>
I'm sorry, I meant *cephfs_data* and *cephfs_metadata*
On Tue, Nov 15, 2016 at 10:15 AM Webert de Souza Lima
wrote:
> Hey John.
>
> Just to be sure; by "deleting the pools" you mean the *cephfs_metadata*
> and *cephfs_metadata* pools, right?
> Does it have any impact over radosgw? Thanks.
>
> O
On Tue, Nov 15, 2016 at 12:14 PM, Webert de Souza Lima
wrote:
> Hey John.
>
> Just to be sure; by "deleting the pools" you mean the cephfs_metadata and
> cephfs_metadata pools, right?
> Does it have any impact over radosgw? Thanks.
Yes, I meant the cephfs pools. It doesn't affect rgw (assuming y
Not that I know of. On 5 other clusters it works just fine, and the
configuration is the same for all.
On this cluster I was using only radosgw; cephfs was not in use, but it
had already been created following our procedures.
This happened right after mounting it.
On Tue, Nov 15, 2016 at 10:24 AM J
Hi,
On 11/15/2016 01:27 PM, Webert de Souza Lima wrote:
Not that I know of. On 5 other clusters it works just fine and
configuration is the same for all.
On this cluster I was using only radosgw, but cephfs was not in use
but it had been already created following our procedures.
This happene
On 11/15/16 12:58, Оралов Алексей wrote:
>
>
>
> Hello!
>
>
>
> I have problem with slow requests on kernel 4.4.0-45 , rolled back all
> nodes to 4.4.0-42
>
>
>
> Ubuntu 16.04.1 LTS (Xenial Xerus)
>
> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>
>
>
Can you describe yo
Which kernel version are you using?
I have a similar issue: Ubuntu 14.04, kernel 3.13.0-96-generic, and Ceph
Jewel 10.2.3.
I get logs like this:
2016-11-15 13:13:57.295067 osd.9 10.3.0.132:6817/24137 98 : cluster
[WRN] 16 slow requests, 5 included below; oldest blocked for > 7.957045 secs
I set o
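Not something mentioned in the thread, but a quick way to see what those slow requests are actually waiting on (osd.9 taken from the log line above) is the OSD admin socket on its host:

ceph health detail                    # lists the PGs/OSDs with blocked requests
ceph daemon osd.9 dump_ops_in_flight  # ops currently blocked on that OSD
ceph daemon osd.9 dump_historic_ops   # recent slow ops with per-step timings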
Sure, as requested:
*cephfs* was created using the following commands:
ceph osd pool create cephfs_metadata 128 128
ceph osd pool create cephfs_data 128 128
ceph fs new cephfs cephfs_metadata cephfs_data
*ceph.conf:*
https://paste.debian.net/895841/
*# ceph osd crush tree:* https://paste.debian.n
Hi Peter,
The Ceph cluster version is 0.94.5; we are running with Firefly tunables, and
we have 10K PGs instead of the 30K / 40K we should have.
The Linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2.
On our side we have the following settings:
mon_osd_adjust_heartbeat_grace = false
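Only the first setting above survives the truncation; for context, heartbeat-related overrides of that kind usually sit in ceph.conf roughly like this (the last two values are just the stock defaults, shown for comparison, not taken from this cluster):

[global]
mon osd adjust heartbeat grace = false
# stock defaults, for comparison
osd heartbeat grace = 20
osd heartbeat interval = 6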
Hi,
On 11/15/2016 01:55 PM, Webert de Souza Lima wrote:
sure, as requested:
*cephfs* was created using the following command:
ceph osd pool create cephfs_metadata 128 128
ceph osd pool create cephfs_data 128 128
ceph fs new cephfs cephfs_metadata cephfs_data
*ceph.conf:*
https://paste.debian
Hello,
we have set up a Ceph cluster with 10.0.2.3 under CentOS 7.
We have some directories with more than 100k entries.
Unfortunately, we cannot reduce the entry count of the 100k directories, and
we do not want to run a Ceph cluster with development features.
We installed the jewel release bec
Right, thank you.
On this particular cluster it would be OK to have everything on the HDD. No
big traffic here.
In order to do that, do I need to delete this cephfs, delete its pools and
create them again?
After that I assume I would run ceph osd pool set cephfs_metadata
crush_ruleset 0, as 0 is
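For the record, a rough sketch of that procedure on Jewel (pool and fs names taken from earlier in the thread; stop the MDS and any clients first, and double-check the removal flags before running them):

ceph fs rm cephfs --yes-i-really-mean-it
ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
# recreate on the default (HDD) ruleset
ceph osd pool create cephfs_metadata 128 128
ceph osd pool create cephfs_data 128 128
ceph osd pool set cephfs_metadata crush_ruleset 0
ceph osd pool set cephfs_data crush_ruleset 0
ceph fs new cephfs cephfs_metadata cephfs_data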
Hi,
has anyone ever tried to run Ceph monitors in containers?
Could it lead to performance issues?
Can I run monitor containers on the OSD nodes?
I don't want to buy 3 dedicated servers. Is there any other solution?
Thanks
Best regards
Matteo Dacrema
I've had lots of success running monitors in VMs. Never tried the
container route, but there is a ceph-docker project
(https://github.com/ceph/ceph-docker) if you want to give it a shot. I don't
know how highly recommended it is, though; I've got no personal experience
with it.
No matter what you w
In addition, Red Hat is shipping a containerized Ceph (all daemons, not
just mons) as a tech preview in RHCS, and the plan is to support it
going forward. We have not seen performance issues related to being
containerized. It's based on the ceph-docker and ceph-ansible projects.
Daniel
On 1
We have been running all Ceph services inside LXC containers with XFS bind
mounts for a few years and it works great. Additionally, we use macvlan
for networking, so each container has its own IP address without any NAT.
As for Docker (and specifically aufs/overlay), I would advise testing
for data in
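Not the poster's actual configuration, but for anyone curious, the macvlan part of an LXC container config from that era would look roughly like this (interface name and address are made-up placeholders):

# /var/lib/lxc/<container>/config, networking excerpt
lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.link = eth0            # host NIC the macvlan sits on
lxc.network.ipv4 = 192.0.2.10/24   # container gets its own address, no NAT
lxc.network.flags = up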
Hello,
We have 3 RGW servers set up with 5 OSDs. We have an application that is
doing pretty steady writes, as well as a bunch of reads from that and other
applications.
Over the last week or so we have been seeing the writing application's
connections getting blocked randomly, and in the RGW logs
I removed cephfs and its pools, created everything again using the default
crush ruleset, which is for the HDD, and now ceph health is OK.
I appreciate your help. Thank you very much.
On Tue, Nov 15, 2016 at 11:48 AM Webert de Souza Lima
wrote:
> Right, thank you.
>
> On this particular cluster
http://tracker.ceph.com/issues/17916
I just pushed a branch wip-17916-jewel based on v10.2.3 with some
additional debugging. Once it builds, would you be able to start the
afflicted osds with that version of ceph-osd and
debug osd = 20
debug ms = 1
debug filestore = 20
and get me the log?
-Sam
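For anyone else who needs to collect the same data: those options can go under [osd] in ceph.conf before restarting the daemon, or be injected at runtime if the OSD is still up (a sketch, with osd.N standing in for the afflicted OSD id):

[osd]
debug osd = 20
debug ms = 1
debug filestore = 20

# or at runtime:
ceph tell osd.N injectargs '--debug-osd 20 --debug-ms 1 --debug-filestore 20'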
On 11/15/16 14:05, Thomas Danan wrote:
> Hi Peter,
>
> Ceph cluster version is 0.94.5 and we are running with Firefly tunables and
> also we have 10KPGs instead of the 30K / 40K we should have.
> The linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2
>
> On our side we havethe follow
Very interesting ...
Any idea why optimal tunables would help here? On our cluster we have 500 TB of
data, and I am a bit concerned about changing them without taking a lot of
precautions.
I am curious to know how much time it took you to change the tunables, the size
of your cluster, and the observed impact.
I think you may need to re-evaluate your situation. If you aren't
willing to spend the money on 3 dedicated servers, is your platform big
enough to warrant the need for Ceph?
On 16/11/16 01:25, Matteo Dacrema wrote:
Hi,
has anyone ever tried to run Ceph monitors in containers?
Could it lead to
I forgot to mention that we are running 2 of our 3 monitors in VMs on our
OSD nodes. It's a small cluster with only two OSD nodes. The third monitor
is on a VM on a separate host. It works well, but we made sure the OSDs had
plenty of extra resources to accommodate the VMs, and the host OS is
runn
On 11/15/16 22:13, Thomas Danan wrote:
> Very interesting ...
>
> Any idea why optimal tunable would help here ?
I think there are some versions where it rebalances data a bunch to even
things out... I don't know why I think that, or where I read it.
Maybe it was only argonaut vs. newer. B
Hi everyone,
There was a regression in jewel that can trigger long OSD stalls during
scrub. How long the stalls are depends on how many objects are in your
PGs, how fast your storage device is, and what is cached, but in at least
one case they were long enough that the OSD internal heartbeat c
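The advisory is cut off above; a common stopgap while waiting for a fixed build (my suggestion, not quoted from the advisory) is to pause scrubbing cluster-wide and re-enable it after upgrading:

ceph osd set noscrub
ceph osd set nodeep-scrub
# ... upgrade / apply the fix ...
ceph osd unset noscrub
ceph osd unset nodeep-scrub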
Hello,
How does the rgw cache work ? Is there any situation in which it would be
better to disable it ?
Regards,
Martin
On Tue, Nov 15, 2016 at 8:40 AM, Hauke Homburg wrote:
> In the last weeks we enabled directory fragmentation for testing. The result
> is that we sometimes get error messages from rsync about unlink and no space
> left on device.
Enabling directory fragmentation would not cause the unlink and ENO
Dear All,
Any suggestions in this regard would be helpful.
Thanks,
Daleep Singh Bais
Forwarded Message
Subject: iSCSI LUN issue after MON Out Of Memory
Date: Tue, 15 Nov 2016 11:58:07 +0530
From: Daleep Singh Bais
To: ceph-users
Hello friends,
I had RBD imag
Hi All,
We have a Ceph storage cluster, and it has been integrated with our OpenStack
private cloud.
We have created a pool for volumes, which allows our OpenStack private cloud
users to create a volume from an image and boot from that volume.
Additionally, our images (both Ubuntu 14.04 and CentOS 7) are in a raw