On Mon, May 30, 2016 at 10:22 PM, qisy wrote:
> Hi,
> After Jewel was released with CephFS marked production-ready, I upgraded the
> old Hammer cluster, but IOPS dropped a lot
>
> I ran a test with 3 nodes, each one having 8c, 16G and 1 OSD; the OSD device
> got 15000 IOPS
>
> I found ceph-fuse client has be
Hello everybody,
we want to upgrade/fix our SAN switches. I kinda screwed up when I was
first planning our CEPH storage cluster.
Right now we have 2 x HP 2530-24G Switch (J9776A). We have 3 servers, each
outfitted with 2 x 4 gigabit cards. (Don't judge me, I also was on a budget)
Each card go
On Mon, May 30, 2016 at 10:29 PM, David wrote:
> Hi All
>
> I'm having an issue with slow writes over NFS (v3) when cephfs is mounted
> with the kernel driver. Writing a single 4K file from the NFS client is
> taking 3 - 4 seconds, however a 4K write (with sync) into the same folder on
> the serve
Also if for political reasons you need a “vendor” solution – ask Dell about
their DSS 7000 servers – 90 8TB disks and two compute nodes in 4RU would go a
long way to making up a multi-PB Ceph solution.
Supermicro also does a similar solution with 36-, 60- and 90-disk 4RU
models.
Cisco ha
Hello, I have a cluster running Jewel 10.2.0, 25 OSDs + 4 MONs.
Today my cluster suddenly went unhealthy with lots of stuck PGs due to
unfound objects; no disk failures nor node crashes, it just went bad.
I managed to put the cluster back into a healthy state by marking lost objects
to delete "ceph pg
Hello,
firstly, I'm not the main network guy here by a long shot, OTOH I do know
a thing or two, may they just be from trial and error.
On Wed, 1 Jun 2016 09:49:53 +0200 David Riedl wrote:
> Hello everybody,
>
> we want to upgrade/fix our SAN switches. I kinda screwed up when I was
> first pla
On Wed, Jun 1, 2016 at 6:15 AM, James Webb wrote:
> Dear ceph-users...
>
> My team runs an internal buildfarm using ceph as a backend storage platform.
> We’ve recently upgraded to Jewel and are having reliability issues that we
> need some help with.
>
> Our infrastructure is the following:
> -
So 3 servers are the entirety of your Ceph storage nodes, right?
Exactly. + 3 Openstack Compute Nodes
Have you been able to determine what causes the drops?
My first guess would be that this bonding is simply not compatible with
what the switches can do/expect.
Yeah, something like that.
Hello,
On Wed, 1 Jun 2016 11:03:16 +0200 David Riedl wrote:
>
> > So 3 servers are the entirety of your Ceph storage nodes, right?
> Exactly. + 3 Openstack Compute Nodes
>
>
> > Have you been able to determine what causes the drops?
> > My first guess would be that this bonding is simply not
Hello Jean-Charles,
Thanks for the tip. When I added my buckets as host aliases it worked with
s3cmd.
I just couldn't visualise how this is a thing, to add bucket names as
hostname aliases.
However, now, from DragonDisk and CrossFTP S3 clients on another machine,
when I try PUT requests I get
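FWIW, a minimal sketch of what "bucket names as host aliases" can look like
on the client side; the address and hostnames below are made up for
illustration:

# /etc/hosts on the S3 client (hypothetical address and names)
192.168.10.50   rgw.example.local
192.168.10.50   mybucket.rgw.example.local

S3 clients that default to virtual-hosted-style requests resolve
<bucket>.<endpoint>, so each bucket needs a resolvable name (or wildcard DNS)
unless the client is switched to path-style requests.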
Just a couple of points.
1. I know you said 10G was not an option, but I would really push for it.
You can pick up Dell 10G-T switches (N4032) for not a lot more than a 48
port 1G switch. They make a lot more difference than just 10x the bandwidth.
With Ceph, latency is critical. As it's 10G-T, you
I do…
In my case, I have colocated the MONs with some OSDs, and as recently as
Saturday, when I lost data again, I found out that one of the MON+OSD nodes ran
out of memory and started killing ceph-mon on that node…
At the same moment, all OSDs started to complain about not being able to see
oth
Hi All.
My name is John Haan.
I've been testing the cache pool feature with Jewel on Ubuntu 16.04.
I implemented 2 types of cache tiers:
the first one is cache pool + erasure pool and the other one is cache pool +
replicated pool.
I chose the writeback cache mode.
vdbench and rados bench are
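For reference, the writeback tier setup being tested is roughly the following
(pool names are placeholders, not the real ones):

# sketch of a writeback cache tier in front of a base pool
ceph osd tier add basepool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay basepool cachepool
ceph osd pool set cachepool hit_set_type bloom
ceph osd pool set cachepool target_max_bytes 100000000000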
> Am 01.06.2016 um 10:25 schrieb Diego Castro :
>
> Hello, i have a cluster running Jewel 10.2.0, 25 OSD's + 4 Mon.
> Today my cluster suddenly went unhealth with lots of stuck pg's due unfound
> objects, no disks failures nor node crashes, it just went bad.
>
> I managed to put the cluster on
> On Wed, Jun 1, 2016 at 6:15 AM, James Webb wrote:
>> Dear ceph-users...
>>
>> My team runs an internal buildfarm using ceph as a backend storage platform.
>> We’ve recently upgraded to Jewel and are having reliability issues that we
>> need some help with.
>>
>> Our infrastructure is the follo
my test fio
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite -size=1G
-filename=test.iso -name="CEPH 4KB randwrite test" -iodepth=32 -runtime=60
On 16/6/1 15:22, Yan, Zheng wrote:
On Mon, May 30, 2016 at 10:22 PM, qisy wrote:
Hi,
After jewel released fs product ready version,
Hi,
I'm beginning to look at the rbd mirror features.
How much space does it take? Is it only a journal with some kind of list of
block changes?
And how much I/O does it take?
Worst case, a 4k block write: how much gets written to the journal?
My OSDs are SSDs, 1 journal + data for each OSD/SSD, but I don't ov
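For reference, the journaling side of rbd mirroring is switched on per image
roughly like this (pool/image names are placeholders):

rbd feature enable rbd/myimage exclusive-lock journaling
rbd mirror pool enable rbd pool
rbd info rbd/myimage     # journaling should now be listed under "features"

The journal holds each write until the remote peer has replayed it, so a
sustained 4k random-write workload roughly doubles the writes on the primary
side until replay catches up.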
Hello Uwe, I also have the sortbitwise flag enabled and I see exactly
the same behavior as you.
Perhaps this is also the root of my issues; does anybody know if it is safe
to disable it?
---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade
2016-06-01 7:17 GMT-03:00 Uwe Mesecke :
>
On Wed, 1 Jun 2016, Yan, Zheng wrote:
> On Wed, Jun 1, 2016 at 6:15 AM, James Webb wrote:
> > Dear ceph-users...
> >
> > My team runs an internal buildfarm using ceph as a backend storage
> > platform. We’ve recently upgraded to Jewel and are having reliability
> > issues that we need some help
Hi All,
We are testing an erasure-coded pool fronted by the RADOS gateway.
Recently many OSDs have been going down due to out-of-memory conditions.
Here are the details.
Description of the cluster:
32 hosts, 6 disks (osds) per host so 32*6 = 192 osds
17024 pgs, 15 pools, 107 TB data, 57616 kobjects
167 TB used, 508
Hello, Everyone.
I'm trying to install a Calamari server in my organisation and I'm
encountering some problems.
I have a small dev environment, just 4 OSD nodes and 5 monitors (one of
them is also the RADOS GW). We chose to use Ubuntu 14.04 LTS for all our
servers. The Calamari server is provi
On Wed, Jun 1, 2016 at 2:49 PM, Sage Weil wrote:
> On Wed, 1 Jun 2016, Yan, Zheng wrote:
>> On Wed, Jun 1, 2016 at 6:15 AM, James Webb wrote:
>> > Dear ceph-users...
>> >
>> > My team runs an internal buildfarm using ceph as a backend storage
>> > platform. We’ve recently upgraded to Jewel and a
On Wed, Jun 1, 2016 at 6:52 PM, qisy wrote:
> my test fio
>
> fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite -size=1G
> -filename=test.iso -name="CEPH 4KB randwrite test" -iodepth=32 -runtime=60
>
You were testing direct-IO performance. Hammer does not handle
direct-IO correctly, da
On Wed, Jun 1, 2016 at 8:49 PM, Sage Weil wrote:
> On Wed, 1 Jun 2016, Yan, Zheng wrote:
>> On Wed, Jun 1, 2016 at 6:15 AM, James Webb wrote:
>> > Dear ceph-users...
>> >
>> > My team runs an internal buildfarm using ceph as a backend storage
>> > platform. We’ve recently upgraded to Jewel and a
On Wed, 1 Jun 2016, Yan, Zheng wrote:
> On Wed, Jun 1, 2016 at 8:49 PM, Sage Weil wrote:
> > On Wed, 1 Jun 2016, Yan, Zheng wrote:
> >> On Wed, Jun 1, 2016 at 6:15 AM, James Webb wrote:
> >> > Dear ceph-users...
> >> >
> >> > My team runs an internal buildfarm using ceph as a backend storage
> >
Hey cephers,
Just a reminder that our monthly Ceph developer call is today in just
under 2 hours. Come join us to talk about current work going into
Ceph. Thanks!
http://wiki.ceph.com/Planning
--
Best Regards,
Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com || http://com
On Wed, Jun 1, 2016 at 4:22 PM, Sage Weil wrote:
> On Wed, 1 Jun 2016, Yan, Zheng wrote:
>> On Wed, Jun 1, 2016 at 8:49 PM, Sage Weil wrote:
>> > On Wed, 1 Jun 2016, Yan, Zheng wrote:
>> >> On Wed, Jun 1, 2016 at 6:15 AM, James Webb wrote:
>> >> > Dear ceph-users...
>> >> >
>> >> > My team runs
Hello all,
I'm running into an issue with ceph osds crashing over the last 4
days. I'm running Jewel (10.2.1) on CentOS 7.2.1511.
A little setup information:
26 hosts
2x 400GB Intel DC P3700 SSDs
12x6TB spinning disks
4x4TB spinning disks.
The SSDs are used for both journals and as an OSD (for t
4. As Ceph has lots of connections on lots of IPs and ports, LACP or the
Linux ALB mode should work really well to balance connections.
Linux ALB mode looks promising. Does that work with two switches? Each
server has 4 ports which are split and connected to each switch.
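For reference, a balance-alb bond needs no switch-side configuration, which is
what makes it workable across two independent switches; a minimal
Debian/Ubuntu sketch (interface names and address are placeholders):

auto bond0
iface bond0 inet static
    address 192.168.0.11
    netmask 255.255.255.0
    bond-slaves enp3s0f0 enp3s0f1 enp4s0f0 enp4s0f1
    bond-mode balance-alb
    bond-miimon 100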
Can either of you reproduce with logs? That would make it a lot
easier to track down if it's a bug. I'd want
debug osd = 20
debug ms = 1
debug filestore = 20
on all of the OSDs for a particular PG from when it is clean until it
develops an unfound object.
-Sam
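If it helps, those levels can be injected at runtime without restarting
anything; a sketch with a placeholder OSD id:

ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1 --debug-filestore 20'
# drop the levels back to your normal values once the logs are captured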
On Wed, Jun 1, 2016 at 5:36 AM, D
Adam,
We ran into similar issues when we got too many objects in a bucket
(around 300 million). The .rgw.buckets.index pool became unable to
complete backfill operations. The only way we were able to get past it
was to export the offending placement group with the ceph-objectstore-tool
and
Hi,
I have a Jewel Ceph cluster in an OK state and I have a "ceph-fuse" Ubuntu
Trusty client with Ceph Infernalis. The cephfs is mounted automatically
and perfectly during boot via ceph-fuse and this line in /etc/fstab:
~# grep ceph /etc/fstab
id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyr
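For context, a generic fuse.ceph fstab entry of this shape looks roughly like
the following; the id, keyring path and mount point here are placeholders, not
the exact line above:

# hypothetical example only
id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyring,client_mountpoint=/  /mnt/cephfs  fuse.ceph  defaults,_netdev  0  0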
Hi,
I just performed a minor Ceph upgrade on my Ubuntu 14.04 cluster from Ceph
version 0.94.6-1trusty to 0.94.7-1trusty. Upon restarting the OSDs, I
receive the error message:
2016-06-01 12:17:49.219512 7f64a70ea8c0 0 monclient: wait_auth_rotating
timed out after 30
2016-06-01 12:17:49.219
On Wed, Jun 1, 2016 at 10:23 AM, Francois Lafont wrote:
> Hi,
>
> I have a Jewel Ceph cluster in OK state and I have a "ceph-fuse" Ubuntu
> Trusty client with ceph Infernalis. The cephfs is mounted automatically
> and perfectly during the boot via ceph-fuse and this line in /etc/fstab :
>
> ~# gre
Hello Samuel, I'm a bit afraid of restarting my OSDs again; I'll wait until
the weekend to push the config.
BTW, I just unset the sortbitwise flag.
---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade
2016-06-01 13:39 GMT-03:00 Samuel Just :
> Can either of you reproduce with l
Question:
I'm curious if there is anybody else out there running CephFS at the scale
I'm planning for. I'd like to know some of the issues you didn't expect
that I should be looking out for. I'd also like to simply see when CephFS
hasn't worked out and why. Basically, give me your war stories.
Pr
Hi all,
I'm trying to set up a Ceph cluster with an S3 gateway using the
ceph-ansible playbooks. I'm running into an issue where the radosgw-admin
client can't find the keyring. The path to the keyring is listed in the
ceph.conf file. I confirmed with strace that the client opens the conf
file
I've been attempting to work through this, finding the pgs that are
causing hangs, determining if they are "safe" to remove, and removing
them with ceph-objectstore-tool on osd 16.
I'm now getting hangs (followed by suicide timeouts) referencing pgs
that I've just removed, so this doesn't seem to
Hi,
radosgw-admin is not radosgw. It’s the RADOS Gateway CLI admin utility.
All ceph components by default use the client.admin user name to connect to the
Ceph cluster. If you deployed the radosgw, the gateway itself was properly
configured by Ansible and the files were placed where they have
If that pool is your metadata pool, it looks at a quick glance like
it's timing out somewhere while reading and building up the omap
contents (ie, the contents of a directory). Which might make sense if,
say, you have very fragmented leveldb stores combined with very large
CephFS directories. Tryin
I did use ceph-ansible to deploy the gateway -- using the default
settings. It should work out of the box but does not.
So... can the radosgw-admin CLI utility take a keyring path in the conf
file or does the path need to be manually specified?
And secondly, after copying the keyring to one of t
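For what it's worth, radosgw-admin reads its keyring path from ceph.conf like
any other Ceph client, under the section of the name it runs as; a sketch,
assuming a hypothetical instance id of client.radosgw.rgw0:

[client.radosgw.rgw0]
keyring = /var/lib/ceph/radosgw/ceph-radosgw.rgw0/keyring

Alternatively the path can be given on the command line with -k/--keyring
together with -n/--name (or --id).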
I tried to compact the leveldb on osd 16 and the osd is still hitting
the suicide timeout. I know I've got some users with more than 1
million files in single directories.
Now that I'm in this situation, can I get some pointers on how can I
use either of your options?
Thanks,
Adam
On Wed, Jun 1,
Was this cluster upgraded to jewel? If so, at what version did it start?
-Sam
On Wed, Jun 1, 2016 at 1:48 PM, Diego Castro
wrote:
> Hello Samuel, i'm bit afraid of restarting my osd's again, i'll wait until
> the weekend to push the config.
> BTW, i just unset sortbitwise flag.
>
>
> ---
> Diego
Yes, it was created as Hammer.
I haven't faced any issues during the upgrade (despite the well-known systemd
quirks),
and after that the cluster didn't show any suspicious behavior.
---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade
2016-06-01 18:57 GMT-03:00 Samuel Just :
> Was thi
Looks like I missed the paste:
http://docs.ceph.com/docs/master/man/8/ceph/#options
There you have the options available from the command line.
In your case the user id is radosgw-rgw0 so the command line should be
radosgw-admin --id radosgw.rgw0 usage show or radosgw-admin --name
client.rados
http://tracker.ceph.com/issues/16113
I think I found the bug. Thanks for the report! Turning off
sortbitwise should be an ok workaround for the moment.
-Sam
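For anyone following along, the workaround amounts to (sketch):

ceph osd unset sortbitwise
ceph -s    # the unfound objects should clear once peering settles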
On Wed, Jun 1, 2016 at 3:00 PM, Diego Castro
wrote:
> Yes, it was created as Hammer.
> I haven't faced any issues on the upgrade (despite
On Wed, Jun 1, 2016 at 9:13 AM, Adam Tygart wrote:
> Hello all,
>
> I'm running into an issue with ceph osds crashing over the last 4
> days. I'm running Jewel (10.2.1) on CentOS 7.2.1511.
>
> A little setup information:
> 26 hosts
> 2x 400GB Intel DC P3700 SSDs
> 12x6TB spinning disks
> 4x4TB spi
On Wed, Jun 1, 2016 at 2:47 PM, Adam Tygart wrote:
> I tried to compact the leveldb on osd 16 and the osd is still hitting
> the suicide timeout. I know I've got some users with more than 1
> million files in single directories.
>
> Now that I'm in this situation, can I get some pointers on how ca
Hey Ceph Community,
I'd like to show everyone a project I've been working on. It parses the
ceph/src/mon/MonCommands.h file and produces a Python file that allows
you to call every possible command Ceph exposes. It also has sub
modules for every release since firefly so you can import the module
Would you enable debug for osd.177?
debug osd = 20
debug filestore = 20
debug ms = 1
Cheers,
Shinobu
On Thu, Jun 2, 2016 at 2:31 AM, Jeffrey McDonald wrote:
> Hi,
>
> I just performed a minor ceph upgrade on my ubuntu 14.04 cluster from ceph
> version to0.94.6-1trusty to 0.94.7-1trusty. Upo
Hey Sam,
glad you found the bug. As another data point I just did the whole round of
"healthy -> set sortbitwise -> osd restarts -> unfound objects -> unset
sortbitwise -> healthy" with the debug settings as described by you earlier.
I uploaded the logfiles...
https://www.dropbox.com/s/f5hhptb
Hi,
On 01/06/2016 23:16, Florent B wrote:
> Don't have this problem on Debian migration from Infernalis to Jewel,
> check all permissions...
Ok, it's probably the reason (I hope) but so far I haven't found the right
Unix permissions. I have this (which doesn't work):
~# ll -d /etc/ceph
drwxr-xr-x 2 r
I concur with Greg.
The only way that I was able to get back to HEALTH_OK was to
export/import. * Please note, any time you use
ceph-objectstore-tool you risk data loss if not done carefully. Never
remove a PG until you have a known good export *
Here are the steps I used:
1. set
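Purely as an illustration of the export/import step (OSD and PG ids here are
placeholders, and the OSD must be stopped while the tool runs):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
    --journal-path /var/lib/ceph/osd/ceph-16/journal \
    --pgid 34.4a --op export --file /root/pg.34.4a.export
# and later, on the (stopped) target OSD:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
    --journal-path /var/lib/ceph/osd/ceph-21/journal \
    --op import --file /root/pg.34.4a.export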
Yep, looks like the same issue:
2016-06-02 00:45:27.977064 7fc11b4e9700 10 osd.17 pg_epoch: 11108
pg[34.4a( v 11104'1080336 lc 11104'1080335
(11069'1077294,11104'1080336] local-les=11108 n=50593 ec=2051 les/c/f
11104/11104/0 11106/11107/11107) [17,13] r=0 lpr=11107
pi=11101-11106/3 crt=11104'10803
Could this be the call in RotatingKeyRing::get_secret() failing?
Mathias, I'd suggest opening a tracker for this with the information in
your last post and let us know the number here.
Cheers,
Brad
On Wed, Jun 1, 2016 at 3:15 PM, Mathias Buresch <
mathias.bure...@de.clara.net> wrote:
> Hi,
>
>
Hello,
On Wed, 1 Jun 2016 12:31:41 -0500 Jeffrey McDonald wrote:
> Hi,
>
> I just performed a minor ceph upgrade on my ubuntu 14.04 cluster from
> ceph version to0.94.6-1trusty to 0.94.7-1trusty. Upon restarting the
> OSDs, I receive the error message:
>
Unfortunately (despite what common s
On Wed, 1 Jun 2016 18:11:54 +0200 David Riedl wrote:
>
> > 4. As Ceph has lots of connections on lots of IP's and port's, LACP or
> > the Linux ALB mode should work really well to balance connections.
> Linux ALB Mode looks promising. Does that work with two switches? Each
> server has 4 ports w
Hi Eric,
can the new release, version 1.4, be used with Ceph Jewel?
2016-05-31 15:05 GMT+08:00 eric mourgaya :
> hi guys,
>
> Inkscope 1.4 is released.
> You can find the rpms and debian packages at
> https://github.com/inkscope/inkscope-packaging.
> This release add a monitor panel using coll
First, please check that your Ceph cluster is HEALTH_OK and then check that you
have the caps to create users.
2016-05-31 16:11 GMT+08:00 Khang Nguyễn Nhật
:
> Thank, Wasserman!
> I followed the instructions here:
> http://docs.ceph.com/docs/master/radosgw/multisite/
> Step 1: radosgw-admin realm cre
I am currently running our Ceph POC environment using dual Nexus 9372TX 10G-T
switches; each OSD host has two connections to each switch, formed into a
single 4-link VPC (MC-LAG), which is bonded under LACP on the host side.
What I have noticed is that the various hashing policies f
Hello,
On Wed, 1 Jun 2016 15:50:19 -0500 Brady Deetz wrote:
> Question:
> I'm curious if there is anybody else out there running CephFS at the
> scale I'm planning for. I'd like to know some of the issues you didn't
> expect that I should be looking out for. I'd also like to simply see
> when Ce
Thanks Christian,
I did google the error and I actually found this link. (Of course, I
wouldn't want to waste others' time either.) It appears to me to be a
different issue than what I see because the OSDs actually fail to start.
Anyways, after a few minutes I restarted the OSDs and they s
Use HAProxy (minimal sketch below).
sudomakeinstall.com/uncategorized/ceph-radosgw-nginx-tengine-apache-and-now-civetweb
- Original Message -
From: c...@jack.fr.eu.org
To: ceph-users@lists.ceph.com
Sent: Tuesday, May 24, 2016 5:01:05 AM
Subject: Re: [ceph-users] civetweb vs Apache for rgw
I'm using mod_rewrit
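In case a starting point helps, a minimal HAProxy sketch in front of two
civetweb radosgw instances; addresses and names are made up, and 7480 is
civetweb's default port in Jewel:

frontend rgw_http
    bind *:80
    mode http
    default_backend rgw_backends

backend rgw_backends
    mode http
    balance roundrobin
    option httpchk GET /
    server rgw1 192.168.0.21:7480 check
    server rgw2 192.168.0.22:7480 check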
Hello,
On Wed, 1 Jun 2016 20:21:29 -0500 Jeffrey McDonald wrote:
> Thanks Christian,
> I did google the error and I actually found this link. (Of course, I
> wouldn't want to waste others' time as either.) It appears to me to be a
> different issue than what I see because the OSDs actually fa
Hello Adrian,
On Thu, 2 Jun 2016 00:53:41 + Adrian Saul wrote:
>
> I am currently running our Ceph POC environment using dual Nexus 9372TX
> 10G-T switches, each OSD host has two connections to each switch and
> they are formed into a single 4 link VPC (MC-LAG), which is bonded under
> LACP
Now, I have an explanation and it's _very_ strange, absolutely not related
to a problem of Unix permissions. For the record, my client node is an updated
Ubuntu Trusty and I use ceph-fuse. Here is my fstab line:
~# grep ceph /etc/fstab
id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyring,client_mountpoint
On Wed, Jun 1, 2016 at 10:22 PM, Sage Weil wrote:
> On Wed, 1 Jun 2016, Yan, Zheng wrote:
>> On Wed, Jun 1, 2016 at 8:49 PM, Sage Weil wrote:
>> > On Wed, 1 Jun 2016, Yan, Zheng wrote:
>> >> On Wed, Jun 1, 2016 at 6:15 AM, James Webb wrote:
>> >> > Dear ceph-users...
>> >> >
>> >> > My team runs
> > For two links it should be quite good - it seemed to balance across
> > that quite well, but with 4 links it seemed to really prefer 2 in my case.
> >
> Just for the record, did you also change the LACP policies on the switches?
>
> From what I gather, having fancy pants L3+4 hashing on the Li
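For completeness, the host-side knobs being discussed look like this on an
802.3ad bond (a sketch in Debian ifupdown terms; the switch side has its own
hashing setting that should be aligned):

    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-lacp-rate fast
    bond-miimon 100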
Thanks, Wang!
I will check it again.
2016-06-02 7:37 GMT+07:00 David Wang :
> First, please check your ceph cluster is HEALTH_OK and then check if you
> have the caps the create users.
>
> 2016-05-31 16:11 GMT+08:00 Khang Nguyễn Nhật <
> nguyennhatkhang2...@gmail.com>:
>
>> Thank, Wasserman!
>> I f
Hi,
I have 1 cluster as pictured below:
- OSD-host1 runs 2 ceph-osd daemons, mounted on /var/ceph/osd0 and
/var/ceph/osd1.
- OSD-host2 runs 2 ceph-osd daemons, mounted on /var/ceph/osd2 and
/var/ceph/osd3.
- OSD-host3 runs only 1 ceph-osd daemon, mounted on /var/ceph/osd4.
- This is my mypr
You need to either change the failure domain to osd or have at least 5 hosts to
satisfy the host failure domain (see the sketch below).
Since the failure domain is not satisfied, PGs are undersized and degraded.
Thanks & Regards
Somnath
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Khang
Nguyễn Nh
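A sketch of the first option, moving the pool to an osd-level failure domain
(rule and pool names are placeholders):

ceph osd crush rule create-simple replicated-osd default osd
ceph osd crush rule dump replicated-osd      # note the ruleset id it was given
ceph osd pool set mypool crush_ruleset <ruleset-id>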
Hi Everyone,
Can anyone tell me how the ceph pg x.x mark_unfound_lost revert|delete
command is meant to work?
Due to some not fully known strange circumstances I have 1 unfound
object in one of my pools.
I've read through
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#unf
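For reference, the usual sequence looks something like this (using the same
x.x placeholder); revert falls back to the previous version of the object
where one exists, while delete forgets the object entirely:

ceph health detail               # find the PG(s) reporting unfound objects
ceph pg x.x list_unfound         # list the unfound objects
ceph pg x.x query                # check which OSDs have been probed
ceph pg x.x mark_unfound_lost revert    # or: ... mark_unfound_lost delete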