On Mon, Aug 17, 2015 at 8:21 PM, Patrik Plank wrote:
> Hi,
>
>
> have a ceph cluster with three nodes and 32 osds.
>
> The three nodes have 16GB of memory but only 5GB is in use.
>
> Nodes are Dell Poweredge R510.
>
>
> my ceph.conf:
>
>
> [global]
> mon_initial_members = ceph01
> mon_host = 10.0.0.20
From a quick peek it looks like some of the OSDs are missing clones of
objects. I'm not sure how that could happen and I'd expect the pg
repair to handle that but if it's not there's probably something
wrong; what version of Ceph are you running? Sam, is this something
you've seen, a new bug, or s
On Thu, Aug 20, 2015 at 11:07 AM, Simon Hallam wrote:
> Hey all,
>
>
>
> We are currently testing CephFS on a small (3 node) cluster.
>
>
>
> The setup is currently:
>
>
>
> Each server has 12 OSDs, 1 Monitor and 1 MDS running on it:
>
> The servers are running: 0.94.2-0.el7
>
> The clients are r
On Fri, Aug 21, 2015 at 10:27 PM, Scottix wrote:
> I saw this article on Linux Today and immediately thought of Ceph.
>
> http://www.enterprisestorageforum.com/storage-management/object-storage-vs.-posix-storage-something-in-the-middle-please-1.html
>
> I was thinking would it theoretically be pos
mds.0 [INF] denied reconnect attempt (mds is
> up:active) from client.24149 10.10.10.68:0/22416725 after 648.029988 (allowed
> interval 45)
>
> I did just notice that none of the times match up. So may try again once I
> fix ntp/chrony and see if that makes a difference.
>
> Ch
On Wed, Aug 26, 2015 at 9:36 AM, Wido den Hollander wrote:
> Hi,
>
> It's something which has been 'bugging' me for some time now. Why are
> RGW pools prefixed with a period?
>
> I tried setting the root pool to 'rgw.root', but RGW (0.94.1) refuses to
> start:
>
> ERROR: region root pool name must
There is a cephfs-journal-tool that I believe is present in hammer and
ought to let you get your MDS through replay. Depending on which PGs
were lost you will have holes and/or missing files, in addition to not
being able to find parts of the directory hierarchy (and maybe getting
crashes if you ac
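A rough sequence with the hammer tooling, assuming you export a backup first (the backup path here is just a placeholder), would be:

cephfs-journal-tool journal export /root/mds-journal-backup.bin
cephfs-journal-tool journal inspect
cephfs-journal-tool event recover_dentries summary
cephfs-journal-tool journal reset

The reset at the end throws away whatever is still in the journal, so treat it as a last resort and double-check the subcommand names against the version you actually have installed.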
On Thu, Aug 27, 2015 at 2:54 AM, Goncalo Borges
wrote:
> Hey guys...
>
> 1./ I have a simple question regarding the appearance of degraded PGs.
> First, for reference:
>
> a. I am working with 0.94.2
>
> b. I have 32 OSDs distributed across 4 servers, meaning that I have 8 OSDs per
> server.
>
> c. Our
On Thu, Aug 27, 2015 at 11:11 AM, John Spray wrote:
> On Thu, Aug 27, 2015 at 9:33 AM, Andrzej Łukawski
> wrote:
>> Hi,
>>
>> I ran cephfs-journal-tool to inspect the journal 12 hours ago - it's still
>> running. Or... it didn't crash yet, although I don't see any output from it.
>> Is it normal beh
>> From: Yan, Zheng [mailto:z...@redhat.com]
>> Sent: 24 August 2015 12:28
>> To: Simon Hallam
>> Cc: ceph-users@lists.ceph.com; Gregory Farnum
>> Subject: Re: [ceph-users] Testing CephFS
>>
>>
>> > On Aug 24, 2015, at 18:38, Gregory Farnum wrote:
>> >
I haven't looked at the internals of the model, but the PL(site)
you've pointed out is definitely the crux of the issue here. In the
first grouping, it's just looking at the probability of data loss due
to failing disks, and as the copies increase that goes down. In the
second grouping, it's includ
On Mon, Aug 24, 2015 at 4:03 PM, Vickey Singh
wrote:
> Hello Ceph Geeks
>
> I am planning to develop a python plugin that pulls out cluster recovery IO
> and client IO operation metrics, which can then be used with collectd.
>
> For example, I need to take out these values
>
> recovery io 814
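One way to pull those numbers is to parse the cluster status as JSON instead of scraping the human-readable output. A minimal sketch, assuming the pgmap field names emitted by hammer-era releases (they vary between versions and are only present while there is actual client or recovery traffic):

import json
import subprocess

def cluster_io_stats():
    # 'ceph status --format json' returns the same data as 'ceph -s' in a
    # machine-readable form; the pgmap keys below are assumptions based on
    # hammer-era output and fall back to 0 when the cluster is idle.
    out = subprocess.check_output(['ceph', 'status', '--format', 'json'])
    pgmap = json.loads(out.decode('utf-8')).get('pgmap', {})
    return {
        'client_read_bytes_sec':  pgmap.get('read_bytes_sec', 0),
        'client_write_bytes_sec': pgmap.get('write_bytes_sec', 0),
        'client_ops_sec':         pgmap.get('op_per_sec', 0),
        'recovery_bytes_sec':     pgmap.get('recovering_bytes_per_sec', 0),
        'recovery_objects_sec':   pgmap.get('recovering_objects_per_sec', 0),
    }

if __name__ == '__main__':
    print(cluster_io_stats())

Feeding the resulting dictionary into collectd is then mostly a matter of registering a read callback with its python plugin.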
On Fri, Aug 28, 2015 at 1:42 PM, Wido den Hollander wrote:
>
>
> On 28-08-15 13:07, Gregory Farnum wrote:
>> On Mon, Aug 24, 2015 at 4:03 PM, Vickey Singh
>> wrote:
>>> Hello Ceph Geeks
>>>
>>> I am planning to develop a python plugin that
On Mon, Aug 31, 2015 at 5:07 AM, Christian Balzer wrote:
>
> Hello,
>
> I'm about to add another storage node to small firefly cluster here and
> refurbish 2 existing nodes (more RAM, different OSD disks).
>
> Insert rant about not going to start using ceph-deploy as I would have to
> set the clus
On Mon, Aug 31, 2015 at 8:30 AM, 10 minus wrote:
> Hi ,
>
> I'm in the process of upgrading my ceph cluster from Firefly to Hammer.
>
> The ceph cluster has 12 OSD spread across 4 nodes.
>
> Mons have been upgraded to hammer; since I have created pools with values
> 512 and 256, I am a bit confus
On Mon, Aug 31, 2015 at 9:33 AM, Eino Tuominen wrote:
> Hello,
>
> I'm getting a segmentation fault error from the monitor of our test cluster.
> The cluster was in a bad state because I have recently removed three hosts
> from it. Now I started cleaning it up and first marked the removed osd's
This generally shouldn't be a problem at your bucket sizes. Have you
checked that the cluster is actually in a healthy state? The sleeping
locks are normal but should be getting woken up; if they aren't it
means the object access isn't working for some reason. A down PG or
something would be the si
On Sat, Aug 29, 2015 at 11:50 AM, Gerd Jakobovitsch wrote:
> Dear all,
>
> During a cluster reconfiguration (change of crush tunables from legacy to
> TUNABLES2) with large data replacement, several OSDs got overloaded and had
> to be restarted; when the OSDs stabilized, I got a number of PGs marked st
On Sat, Aug 29, 2015 at 3:32 PM, Евгений Д. wrote:
> I'm running a 3-node cluster with Ceph (it's a Deis cluster, so Ceph daemons are
> containerized). There are 3 OSDs and 3 mons. After rebooting all nodes one
> by one all monitors are up, but only two OSDs of three are up. 'Down' OSD is
> really run
dalone_election() ()
> #12 0x005c42eb in Monitor::bootstrap() ()
> #13 0x005c4645 in Monitor::init() ()
> #14 0x005769c0 in main ()
>
> -Original Message-
> From: Gregory Farnum [mailto:gfar...@redhat.com]
> Sent: 31 August 2015 11:46
> To
On Mon, Aug 31, 2015 at 12:16 PM, Yan, Zheng wrote:
> On Mon, Aug 24, 2015 at 6:38 PM, Gregory Farnum wrote:
>> On Mon, Aug 24, 2015 at 11:35 AM, Simon Hallam wrote:
>>> Hi Greg,
>>>
>>> The MDS' detect that the other one went down and started the r
On Sep 1, 2015 4:41 PM, "Janusz Borkowski"
wrote:
>
> Hi!
>
> open( ... O_APPEND) works fine in a single system. If many processes
write to the same file, their output will never overwrite each other.
>
> On NFS overwriting is possible, as appending is only emulated - each
write is preceded by a s
That explanation makes quite a lot of sense — unfortunately the crush
parser isn't very intelligent right now.
Could you put a ticket in the tracker (ceph.com/tracker) describing
this issue? :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Mon, Jul 8, 2013 at 12:45 PM, Da
On Mon, Jul 8, 2013 at 6:13 PM, Mikaël Cluseau wrote:
> Hi dear list :)
>
> I have a small doubt about these two options, as the documentation states
> this :
>
> osd client op priority
>
> Description: The priority set for client operations. It is relative to
> osd recovery op priority.
> Defa
On Mon, Jul 8, 2013 at 8:08 PM, Mikaël Cluseau wrote:
>
> Hi Greg,
>
> thank you for your (fast) answer.
Please keep all messages on the list. :)
I just realized you were talking about increased latencies during
scrubbing; the options you reference are for data recovery, not
scrubbing. However,
On Mon, Jul 8, 2013 at 11:45 PM, Mihály Árva-Tóth
wrote:
> Hello,
>
> Is there any limit on, or recommendation for, the number of objects stored in one
> container (rados)? When I store one thousand or 100 million objects, will
> performance be affected?
Nope, no limit. RADOS doesn't index contents or anything, so the
On Tue, Jul 9, 2013 at 3:08 AM, Tom Verdaat wrote:
> Hi all,
>
> I've set up a new Ceph cluster for testing and it doesn't seem to be working
> out-of-the-box. If I check the status it tells me that of the 3 defined
> OSD's, only 1 is in:
>
>>health HEALTH_WARN 392 pgs degraded; 392 pgs stuck
On Tue, Jul 9, 2013 at 8:54 AM, huangjun wrote:
> I've tried to use a samba shared directory as an OSD in ceph,
> following these steps:
> 1) mount -t cifs //192.168.0.13/public /var/lib/ceph/osd/ceph-4 -o
> username=root,user_xattr
> 2) configure the osd.4 in ceph.conf
> [osd]
> journal dio
On Tue, Jul 9, 2013 at 2:37 PM, ker can wrote:
> In this slide deck on Slide #14, there is some stuff about being able to
> embed code in the ceph-osd daemon via plugin API. Are there links to some
> examples on how to do that ?
>
> http://indico.cern.ch/getFile.py/access?contribId=9&sessionId=1
On Wed, Jul 10, 2013 at 12:38 AM, Erwan Velu wrote:
> Hi,
>
> I've just subscribed to the mailing list. I'm maybe breaking the thread as I cannot
> "answer to all" ;o)
>
> I'd like to share my research on understanding this behavior.
>
> A rados put is showing the expected behavior while the rados bench
On Thu, Jul 11, 2013 at 6:06 AM, Sylvain Munaut
wrote:
> Hi,
>
>
> I'd like the pool_id to be included in the hash used for the PG, to
> try and improve the data distribution. (I have 10 pools).
>
> I see that there is a flag named FLAG_HASHPSPOOL. Is it possible to
> enable it on existing pool ?
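(For what it's worth, newer releases expose this as a per-pool flag you can flip on an existing pool, along the lines of

ceph osd pool set <pool-name> hashpspool true

but check that your release actually supports it first, and expect a large amount of data movement when the placement hash changes.)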
On Thu, Jul 11, 2013 at 4:38 PM, Mandell Degerness
wrote:
> I'm not certain what the correct behavior should be in this case, so
> maybe it is not a bug, but here is what is happening:
>
> When an OSD becomes full, a process fails and we unmount the rbd and
> attempt to remove the lock associated with
On Mon, Jul 15, 2013 at 2:03 AM, Sylvain Munaut
wrote:
> Hi,
>
>>> I'd like the pool_id to be included in the hash used for the PG, to
>>> try and improve the data distribution. (I have 10 pools).
>>>
>>> I see that there is a flag named FLAG_HASHPSPOOL. Is it possible to
>>> enable it on existing
On Mon, Jul 15, 2013 at 1:30 AM, Stefan Priebe - Profihost AG
wrote:
> On 15.07.2013 10:19, Sylvain Munaut wrote:
>> Hi,
>>
>> I'm curious what would be the official recommendation for when you
>> have multiple pools.
>> In total we have 21 pools and that leads to around 12000 PGs for only 24 OSD
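The rule of thumb from the docs is on the order of (number of OSDs * 100) / replica count placement groups for the whole cluster, shared across all pools. For 24 OSDs with 3x replication that works out to roughly 24 * 100 / 3 = 800 PGs in total, so 12000 PGs means each OSD is carrying about 12000 * 3 / 24 = 1500 PG copies, well above the usual target.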
It's probably not the same issue as that ticket, which was about the
OSD handling a lack of output incorrectly. (It might be handling the
output incorrectly in some other way, but hopefully not...)
Have you run this crush map through any test mappings yet?
-Greg
Software Engineer #42 @ http://inkt
Have you run this crush map through any test mappings yet?
>>> Yes, it worked on test cluster, and after apply map to main cluster.
>>> OSD servers went down after I tried to apply crush ruleset 3 (iscsi) to
>>> pool iscsi:
>>> ceph osd pool set data crush_rul
Just old kernels, as they didn't correctly provide all the barriers
and other ordering constraints necessary for the write cache to be
used safely.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Jul 16, 2013 at 9:20 AM, Da Chun wrote:
> In this doc,
> http://ceph.com/
You don't have mount.ceph installed, and there's some translation that
needs to be done in userspace before the kernel sees the mount which
isn't happening. On Debian it's in the ceph-fs-common package.
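On a Debian-based system that usually amounts to something like (monitor address and mount point are placeholders):

apt-get install ceph-fs-common
mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret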
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Jul 16, 2013 at 10:
put in dmesg? And what's the output of "ceph -s"?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> Thanks,
> Hari
>
>
>
> On Tue, Jul 16, 2013 at 10:07 AM, Gregory Farnum wrote:
>>
>> You don't have mount.ceph installed, and t
at i post crushmap to this mail list.
> Is there any method to extract the crush map from a downed osd server and inject
> it into the mon server? from the /var/lib/ceph/osd/ceph-2/current/omap
> folder?
>
> 2013/7/17 Gregory Farnum :
>> I notice that your first dump of the crush map didn't i
.
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Jul 16, 2013 at 4:00 PM, Vladislav Gorbunov wrote:
> output is in the attached files
>
> 2013/7/17 Gregory Farnum :
>> The maps in the OSDs only would have gotten there from the monitors.
>> If
n a performant fashion
(instead of just syncing all over the place), but if the consistency
mechanisms in the kernel are broken/disabled/whatever then it's all
for naught.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> -- Original --
On Wed, Jul 17, 2013 at 4:40 AM, Vladislav Gorbunov wrote:
> Sorry, not send to ceph-users later.
>
> I checked the mon.1 log and found that the cluster was not in HEALTH_OK when I set
> the ruleset to iscsi:
> 2013-07-14 15:52:15.715871 7fe8a852a700 0 log [INF] : pgmap
> v16861121: 19296 pgs: 19052 active+clean
eph-1=155.53.104.100:6789/0}, election epoch 2,
> quorum 0 ceph-1
>
>osdmap e15: 2 osds: 2 up, 2 in
>
>pgmap v1134: 192 pgs: 192 active+clean; 197 MB data, 14166 MB used, 1242
> GB / 1323 GB avail
>
> mdsmap e527: 1/1/1 up {0=ceph-1=up:active}
>
>
>
>
> On
In the monitor log you sent along, the monitor was crashing on a
setcrushmap command. Where in this sequence of events did that happen?
On Wed, Jul 17, 2013 at 5:07 PM, Vladislav Gorbunov wrote:
> That's what I did:
>
> cluster state HEALTH_OK
>
> 1. load crush map from cluster:
> https://dl.drop
On Thu, Jul 18, 2013 at 11:31 AM, Jens Kristian Søgaard
wrote:
> Hi,
>
>> service ceph stop mon
>>
>> doesn't work.
>> how can I stop some osds or mons?
>
>
> Try for example:
>
> service ceph stop mon.a
>
> or
>
> service ceph stop osd.1
>
> replacing "a" and "1" with the id, you want to sto
On Thu, Jul 18, 2013 at 3:53 AM, Ta Ba Tuan wrote:
> Hi all,
>
> I have 4 (stale+inactive) pgs; how can I delete those pgs?
>
> pgmap v59722: 21944 pgs: 4 stale, 12827 active+clean, 9113 active+degraded;
> 45689 MB data, 1006 GB used, 293 TB / 294 TB avail;
>
> I found on google a long time, still ca
On Thu, Jul 18, 2013 at 12:50 AM, Alvaro Izquierdo Jimeno
wrote:
> Hi,
>
>
>
> Reading the URL http://ceph.com/docs/next/radosgw/adminops/#create-user ,
> I’m trying to create a new user with:
>
>
>
> curl -v -X PUT -d '{"uid": "alvaro", "display-name": "alvaro"}
> http://myradosgw/admin/user?for
> 2.2c6  0 0 0 0 0 0 0  stale+creating  2013-07-17 16:35:28.445878  0'0  0'0  []  [68,5]  0'0  0.00  0'0  0.00
>
> How to delete above pgs, Greg?
>
> Thank Greg so muc
> On Friday, July 19, 2013, wrote:
>
> Hello,
>
> I’ve deployed a Ceph cluster consisting of 5 server nodes and a Ceph client
> that will hold the mounted CephFS.
>
> The cephclient serves as admin too, and from that node I want to deploy the
> 5 servers with the ceph-deploy tool.
>
> From the admi
Yeah, that's a known bug with the stats collection. I think I heard
Sam discussing fixing it earlier today or something.
Thanks for mentioning it. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Jul 17, 2013 at 4:53 PM, Mikaël Cluseau wrote:
> Hi list,
>
> not a rea
Did you do "ceph-deploy new" before you started?
On Friday, July 19, 2013, wrote:
> Hello,
>
> I’ve deployed a Ceph cluster consisting of 5 server nodes and a Ceph
> client that will hold the mounted CephFS.
>
> The cephclient serves as admin too, and from that node I want to deploy
> the 5 serv
On Mon, Jul 22, 2013 at 5:42 AM, w sun wrote:
> Does anyone know how to do this, or if this is not possible? We tried to modify
> the security scope for an existing cephx user but could not figure out how
> to add access to a new pool without recreating the user, e.g.,
>
> ceph auth get-or-create cli
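(The usual answer here is 'ceph auth caps', which replaces the capability list of an existing key in place, along the lines of

ceph auth caps client.myuser mon 'allow r' osd 'allow rwx pool=oldpool, allow rwx pool=newpool'

Note that it overwrites rather than appends, so the new cap string has to include the pools the user already had; the exact cap syntax may differ between releases.)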
On Fri, Jul 19, 2013 at 11:04 PM, Noah Watkins wrote:
> On Fri, Jul 19, 2013 at 8:09 AM, ker can wrote:
>>
>> With ceph is there any way to influence the data block placement for a
>> single file ?
>
> AFAIK, no... But, this is an interesting twist. New files written out
> to HDFS, IIRC, will by
On Tue, Jul 23, 2013 at 8:50 AM, Guido Winkelmann
wrote:
> Hi,
>
> How can I get a list of all defined monitors in a ceph cluster from a client
> when using the C API?
>
> I need to store the monitors for available ceph clusters in a database, and I
> would like to build this software so that a) t
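One route is to connect with whatever monitor addresses you already have and then ask the cluster for its current monmap via a mon command ('mon dump' with JSON output); in the C API the relevant call is rados_mon_command(). A minimal sketch of the same idea using the Python rados binding, with the monmap field names assumed from the JSON output of that era:

import json
import rados

# Connect using the local ceph.conf; any one reachable monitor is enough.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    cmd = json.dumps({'prefix': 'mon dump', 'format': 'json'})
    ret, outbuf, errs = cluster.mon_command(cmd, b'')
    if ret != 0:
        raise RuntimeError(errs)
    monmap = json.loads(outbuf.decode('utf-8'))
    for mon in monmap.get('mons', []):
        print(mon.get('name'), mon.get('addr'))
finally:
    cluster.shutdown()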
On Tue, Jul 23, 2013 at 8:54 AM, Matthew Walster wrote:
> I've got a relatively small Ceph cluster I'm playing with at the moment, and
> against advice, I'm running the MONs on the OSDs.
>
> Ideally, I'd like to have half the OSDs in a different facility and
> therefore have one MON in each facili
On Tue, Jul 23, 2013 at 9:04 AM, Matthew Walster wrote:
>
> That's fantastic, thanks. I'm assuming that 5ms is probably too much for the
> OSDs -- do we have any idea/data as to the effect of latency on OSDs if they
> were split over a similar distance? Or even a spread - 0.5ms, 1ms, 2ms etc.
> Ob
On Tue, Jul 23, 2013 at 9:12 AM, Matthew Walster wrote:
> On 23 July 2013 17:07, Gregory Farnum wrote:
>>
>> If you have three osds that are
>> separated by 5ms each and all hosting a PG, then your lower-bound
>> latency for a write op is 10ms — 5 ms to send from the
On Tue, Jul 23, 2013 at 1:28 PM, Wido den Hollander wrote:
> On 07/23/2013 09:09 PM, Gaylord Holder wrote:
>>
>> Is it possible to find out which machines are mapping an RBD?
>
>
> No, that is stateless. You can use locking, however: you can for example put
> the hostname of the machine in the loc
On Tue, Jul 23, 2013 at 2:55 PM, Sebastien Han
wrote:
>
> Hi Greg,
>
> Just tried the list watchers, on a rbd with the QEMU driver and I got:
>
> root@ceph:~# rados -p volumes listwatchers rbd_header.789c2ae8944a
> watcher=client.30882 cookie=1
>
> I also tried with the kernel module but didn't se
On Tue, Jul 23, 2013 at 2:50 PM, Studziński Krzysztof
wrote:
> Hi,
> We've got some problem with our cluster - it continuously reports one failed
> osd, and after auto-rebooting everything seems to work fine for some time (a few
> minutes). CPU util of this osd is max 8%, iostat is very low. We tri
Yeah, this is because right now when you mark an OSD out the weights
of the buckets above it aren't changing. I guess conceivably we could
set it up to do so, hrm...
In any case, if this is inconvenient you can do something like unlink
the OSD right after you mark it out; that should update the CRU
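In command terms that is roughly (the osd id is just an example):

ceph osd out osd.12
ceph osd crush unlink osd.12

or alternatively 'ceph osd crush reweight osd.12 0'; either way the weights the parent buckets see get updated so data stops being directed at it.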
On Tue, Jul 23, 2013 at 3:20 PM, Studziński Krzysztof
wrote:
>> On Tue, Jul 23, 2013 at 2:50 PM, Studziński Krzysztof
>> wrote:
>> > Hi,
>> > We've got some problem with our cluster - it continuously reports failed
>> one osd and after auto-rebooting everything seems to work fine for some
>> time
On Mon, Jul 22, 2013 at 2:53 AM, wrote:
> I am using RHEL6.
>
> From the ceph admin machine I executed:
>
> ceph-deploy install cephserverX
> ceph-deploy new cephserverX
> ceph-deploy mon create cephserverX
>
> is there a debug mode more verbose than -v that I can enable, in order to see
> more
On Fri, Jul 19, 2013 at 3:44 PM, Pawel Veselov wrote:
> Hi.
>
> I'm trying to understand the reason behind some of my unclean pages, after
> moving some OSDs around. Any help would be greatly appreciated. I'm sure we
> are missing something, but can't quite figure out what.
>
> [root@ip-10-16-43-12
Can you get the quorum and related dumps out of the admin socket for
each running monitor and see what they say?
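Something along the lines of

ceph --admin-daemon /var/run/ceph/ceph-mon.<id>.asok quorum_status
ceph --admin-daemon /var/run/ceph/ceph-mon.<id>.asok mon_status

(adjust the socket path to wherever your monitors actually put theirs) should show it.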
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Jul 23, 2013 at 4:51 PM, Mandell Degerness
wrote:
> One of our tests last night failed in a weird way. We s
> On Tue, Jul 23, 2013 at 4:57 PM, Gregory Farnum wrote:
>> Can you get the quorum and related dumps out of the admin socket for
>> each running monitor and see what they say?
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>
On Thursday, July 25, 2013, Dzianis Kahanovich wrote:
> I am thinking of making a pool per user (primarily for cephfs; for security,
> quota, etc),
> hundreds (or even more) of them. But I remember 2 facts:
> 1) info in manual about slowdown on many pools;
Yep, this is still a problem; pool-per-user isn't g
On Thu, Jul 25, 2013 at 12:47 AM, Mostowiec Dominik
wrote:
> Hi
> We found something else.
> After osd.72 flapped, one PG '3.54d' was recovering for a long time.
>
> --
> ceph health details
> HEALTH_WARN 1 pgs recovering; recovery 1/39821745 degraded (0.000%)
> pg 3.54d is active+recovering, acting [72,1
On Thu, Jul 25, 2013 at 7:41 PM, Rongze Zhu wrote:
> Hi folks,
>
> Recently, I have been using puppet to deploy Ceph and integrate it with OpenStack. We
> put compute and storage together in the same cluster. So nova-compute and
> OSDs will be in each server. We will create a local pool for each server,
> an
On Thu, Jul 25, 2013 at 7:42 PM, Greg Chavez wrote:
> Any idea how we tweak this? If I want to keep my ceph node root
> volume at 85% used, that's my business, man.
There are config options you can set. On the monitors they are "mon
osd full ratio" and "mon osd nearfull ratio"; on the OSDs you m
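Purely as an illustration (pick values that make sense for you), the monitor side of that looks like this in ceph.conf:

[mon]
    mon osd full ratio = .90
    mon osd nearfull ratio = .85

Keep in mind these config options only take effect when the OSD map is first created; on a running cluster you change the ratios with the 'ceph pg set_full_ratio' and 'ceph pg set_nearfull_ratio' commands instead.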
On Fri, Jul 26, 2013 at 9:17 AM, johnu wrote:
> Hi all,
> I need to know whether someone else also faced the same issue.
>
>
> I tried openstack + ceph integration. I have seen that I could create
> volumes from horizon and it is created in rados.
>
> When I check the created volumes in ad
On Fri, Jul 26, 2013 at 9:35 AM, johnu wrote:
> Greg,
> I verified on all cluster nodes that rbd_secret_uuid is the same as in
> virsh secret-list. And if I do virsh secret-get-value of this uuid, I
> get back the auth key for client.volumes. What did you mean by same
> configuration?. Did y
ow, if
> volume is attached to an instance lying on the same host, it works;
> otherwise, it doesn't. Might be a coincidence. And I am surprised that no
> one else has seen or reported this issue. Any idea?
>
> On Fri, Jul 26, 2013 at 9:45 AM, Gregory Farnum wrote:
>>
as the mistake in the configuration.
> virsh secret-define gave different secrets
>
> sudo virsh secret-define --file secret.xml
>
> sudo virsh secret-set-value --secret {uuid of secret} --base64 $(cat
> client.volumes.key)
>
>
>
> On Fri, Jul 26, 2013 at 10:16 AM, Greg
On Mon, Jul 29, 2013 at 11:36 AM, Don Talton (dotalton)
wrote:
> Hello,
>
> I have a small test cluster that I deploy using puppet-ceph. Both the MON and
> the OSDs deploy properly, and appear to have all of the correct
> configurations. However, the OSDs are never marked as up. Any input is
>
On Mon, Jul 29, 2013 at 9:38 PM, James Harper
wrote:
> My servers all have 4 x 1gb network adapters, and I'm presently using DRBD
> over a bonded rr link.
>
> Moving to ceph, I'm thinking for each server:
>
> eth0 - LAN traffic for server and VM's
> eth1 - "public" ceph traffic
> eth2+eth3 - LACP
You'll want to figure out why the cluster isn't healthy to begin with.
Is the incomplete/inactive PG staying constant? Track down which OSDs
it's on and make sure the acting set is the right size, or if you've
somehow lost data on it. I believe the docs have some content on doing
this but I don't h
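The usual starting points for that are

ceph health detail
ceph pg dump_stuck inactive
ceph pg <pgid> query

(the pgid being whatever the first two commands report); the query output shows the acting set and the reason the PG is stuck.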
On Wednesday, August 14, 2013, wrote:
>
> Hi Sage,
>
> I just upgraded and everything went quite smoothly with osds, mons and
> mds, good work guys! :)
>
> The only problem I have ran into is with radosgw. It is unable to start
> after the upgrade with the following message:
>
> 2013-08-14 11:57:25
On Thu, Aug 1, 2013 at 9:57 AM, Jeff Moskow wrote:
> Greg,
>
> Thanks for the hints. I looked through the logs and found OSD's with
> RETRY's. I marked those "out" (marked in orange) and let ceph rebalance.
> Then I ran the bench command.
> I now have many more errors than before :-(.
>
> he
Yep. I don't remember for sure but I think you may need to use the
ceph CLI to specify changes to these parameters, though — the config
file options will only apply to the initial creation of the OSD map.
("ceph pg set_nearfull_ratio 0.88" etc)
-Greg
Software Engineer #42 @ http://inktank.com | htt
Doing that could paper over some other issues, but you certainly
shouldn't need every node in the cluster to be a monitor. If you could
be a bit clearer about what steps you took in both cases maybe
somebody can figure it out, though. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://cep
On Monday, August 5, 2013, Kevin Weiler wrote:
> Thanks for looking Sage,
>
> I came to this conclusion myself as well and this seemed to work. I'm
> trying to replicate a ceph cluster that was made with ceph-deploy
> manually. I noted that these capabilities entries were not in the
> ceph-deploy
They're unclean because CRUSH isn't generating an acting set of
sufficient size so the OSDs/monitors are keeping them remapped in
order to maintain replication guarantees. Look in the docs for the
crush tunables options for a discussion on this.
-Greg
Software Engineer #42 @ http://inktank.com | ht
On Sunday, August 18, 2013, Guang Yang wrote:
> Hi ceph-users,
> This is Guang and I am pretty new to ceph, glad to meet you guys in the
> community!
>
> After walking through some documents of Ceph, I have a couple of questions:
> 1. Is there any comparison between Ceph and AWS S3, in terms of
Have you ever used the FS? It's missing an object which we're
intermittently seeing failures to create (on initial setup) when the
cluster is unstable.
If so, clear out the metadata pool and check the docs for "newfs".
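(Roughly "ceph mds newfs <metadata-pool-id> <data-pool-id> --yes-i-really-mean-it" in releases of this vintage; it wipes the filesystem metadata, so only run it on a cluster whose CephFS contents you are prepared to lose, and check the exact syntax against your version first.)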
-Greg
On Monday, August 19, 2013, Georg Höllrigl wrote:
> Hello List,
>
> The
On Mon, Aug 19, 2013 at 9:07 AM, Sage Weil wrote:
> On Mon, 19 Aug 2013, S?bastien Han wrote:
>> Hi guys,
>>
>> While reading a developer doc, I came across the following options:
>>
>> * osd balance reads = true
>> * osd shed reads = true
>> * osd shed reads min latency
>> * osd shed reads min la
On Fri, Aug 16, 2013 at 5:47 AM, Mostowiec Dominik
wrote:
> Hi,
> Thanks for your response.
>
>> It's possible, as deep scrub in particular will add a bit of load (it
>> goes through and compares the object contents).
>
> It is possible that the scrubbing blocks access (RW or only W) to the bucket inde
On Mon, Aug 19, 2013 at 3:09 PM, Mostowiec Dominik
wrote:
> Hi,
>> Yes, it definitely can as scrubbing takes locks on the PG, which will
>> prevent reads or writes while the message is being processed (which will
>> involve the rgw index being scanned).
> It is possible to tune scrubbing config
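(The knobs that usually matter there are the osd-side scrub settings, for example in ceph.conf:

[osd]
    osd max scrubs = 1
    osd scrub min interval = 86400
    osd scrub max interval = 604800
    osd deep scrub interval = 604800

the values above are only illustrative; check the option names and units against the documentation for your release.)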
On Monday, August 19, 2013, Guang Yang wrote:
> Thanks Greg.
>
> Some comments inline...
>
> On Sunday, August 18, 2013, Guang Yang wrote:
>
> Hi ceph-users,
> This is Guang and I am pretty new to ceph, glad to meet you guys in the
> community!
>
> After walking through some documents of Ceph, I h
On Tue, Aug 20, 2013 at 4:56 PM, Petr Soukup wrote:
> I am using ceph filesystem through ceph-fuse to store product photos and most
> of the time it works great. But if there is some problem on ceph server, my
> connected clients start acting crazy. Load on all servers with mounted ceph
> jumps
Do you have full logs from the beginning of replay? I believe you
should only see this when a client is reconnecting to the MDS with
files that the MDS doesn't know about already, which shouldn't happen
at all in a single-MDS system. Although that "pool -1" also looks
suspicious and makes me wonder
Sounds like the puppet scripts haven't put the client.admin keyring on that
node, or it's in the wrong place.
Alternatively, there's a different keyring they're supposed to be using but
it's not saying so in the command.
-Greg
On Wednesday, August 21, 2013, Stroppa Daniele (strp) wrote:
> Hi Al
bably
applicable in this context.
-Greg
>
> I will look into other ways of reading files from ceph. Most of the
> traffic is from webserver loading images - I could load these images with
> some script using some ceph library and implement simple timeout.
>
>
> -O
On Wed, Aug 21, 2013 at 11:33 AM, Guido Winkelmann
wrote:
> Hi,
>
> Is it possible to have more than one CephFS filesystem per Ceph cluster?
>
> In the default configuration, a ceph cluster has got only one filesystem, and
> you can mount that or nothing. Is it possible somehow to have several dis
On Wed, Aug 21, 2013 at 12:20 PM, Petr Soukup wrote:
> Files are subdivided into folders, 1000 files in each folder
> (/img/10/21/10213456.jpg etc.), to increase performance.
In that case the stability issues are probably with your OSDs being
overloaded by write requests that aren't being appropriately
On Thursday, August 22, 2013, Amit Vijairania wrote:
> Hello!
>
> We, in our environment, need a shared file system for
> /var/lib/nova/instances and Glance image cache (_base)..
>
> Is anyone using CephFS for this purpose?
> When folks say CephFS is not production ready, is the primary concern
>
On Thu, Aug 22, 2013 at 2:23 PM, Oliver Daudey wrote:
> Hey Greg,
>
> I encountered a similar problem and we're just in the process of
> tracking it down here on the list. Try downgrading your OSD-binaries to
> 0.61.8 Cuttlefish and re-test. If it's significantly faster on RBD,
> you're probably
On Thu, Aug 22, 2013 at 2:47 PM, Oliver Daudey wrote:
> Hey Greg,
>
> Thanks for the tip! I was assuming a clean shutdown of the OSD should
> flush the journal for you and have the OSD try to exit with its
> data-store in a clean state? Otherwise, I would first have to stop
> updates a that par
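(For reference, the explicit way to do that in this era is to stop the daemon and then run something like 'ceph-osd -i <id> --flush-journal' before swapping binaries; verify the exact flag against the version you have installed.)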
On Thu, Aug 22, 2013 at 5:23 PM, Greg Poirier wrote:
> On Thu, Aug 22, 2013 at 2:34 PM, Gregory Farnum wrote:
>>
>> You don't appear to have accounted for the 2x replication (where all
>> writes go to two OSDs) in these calculations. I assume your pools have
>
>