Hello,
We are currently running a single datacenter Ceph deployment. Our setup is
as follows:
- 4 HDD OSD nodes (primarily used for RadosGW/Object Storage)
- 2 SSD OSD nodes (used for RBD/VM block devices)
- 3 Monitor daemons running on 3 of the HDD OSD nodes
- The CRUSH rules are set to push all
On Mon, Jan 5, 2015 at 6:59 AM, Austin S Hemmelgarn
wrote:
> Secondly, I would highly recommend not using ANY non-cluster-aware FS on top
> of a clustered block device like RBD
For my use-case, this is just a single server using the RBD device. No
clustering involved on the BTRFS side of things.
Do monitors have any impact on read/write latencies? Everything I've read
says no, but since a client needs to talk to a monitor before reading or
writing to OSDs it would seem like that would introduce some overhead.
I ask for two reasons:
1) We are currently using SSD based OSD nodes for our RB
On 01/06/2015 04:45 PM, Robert LeBlanc wrote:
Seems like a message bus would be nice. Each opener of an RBD could
subscribe for messages on the bus for that RBD. Anytime the map is
modified a message could be put on the bus to update the others. That
opens up a whole other can of worms though.
On Wed, 7 Jan 2015 00:54:13 +0900 Christian Balzer wrote:
> On Tue, 6 Jan 2015 19:28:44 +0400 ivan babrou wrote:
>
> > Restarting OSD fixed PGs that were stuck:
> > http://i.imgur.com/qd5vuzV.png
> >
> Good to hear that.
>
> Funny (not really) how often restarting OSDs fixes stuff like that.
>
It is already done in parallel; the outstanding ops are limited to ~10 per
client (tunable), so enlarging this may help.
But please note that there is no no-op here: the OSD has no idea whether it has an
object until it fails to find it on disk, which means the op has almost
traveled the whole code path.
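If the tunable in question is librbd's rbd_concurrent_management_ops (an assumption on my part; it defaults to 10 and caps concurrent management ops per client), a minimal sketch of raising it looks like this:

    # Assumption: the ~10-per-client limit above is rbd_concurrent_management_ops.
    # It can be raised in ceph.conf under [client]:
    #   rbd concurrent management ops = 20
    # or, I believe, passed as a per-command override, e.g. when deleting a large image:
    rbd --rbd-concurrent-management-ops 20 rm big-image   # 'big-image' is a placeholder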
On 07/01/15 17:43, hemant burman wrote:
Hello Yehuda,
The issue seems to be with the user data file for the swift subuser not
getting synced properly.
FWIW, I'm seeing exactly the same thing as well (Hemant - that was well
spotted)!
On 06/01/15 06:45, hemant burman wrote:
One more thing Yehuda,
In radosgw log in Slave Zone:
2015-01-05 17:22:42.188108 7fe4b66d2780 20 enqueued request req=0xbc1f50
2015-01-05 17:22:42.188125 7fe4b66d2780 20 RGWWQ:
2015-01-05 17:22:42.188126 7fe4b66d2780 20 req: 0xbc1f50
2015-01-05 17:22:42
On 07/01/15 16:22, Mark Kirkwood wrote:
FWIW I can reproduce this too (ceph 0.90-663-ge1384af). The *user*
replicates ok (complete with its swift keys and secret). I can
authenticate to both zones ok using S3 api (boto version 2.29), but only
to the master using swift (swift client versions 2.3
Hi,
I have been experiencing issues with several PGs which remain in an
inconsistent state (I use BTRFS). "ceph pg repair" is not able to repair
them. The only way I have found is to delete the corresponding file, which is
causing the issue (see logs below), from the OSDs. This, however, means loss of data.
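For reference, a minimal sketch of the usual triage steps for inconsistent PGs (the PG id 2.37 is a placeholder, not taken from the logs above):

    # List inconsistent PGs and the OSDs that hold them
    ceph health detail | grep inconsistent
    # Inspect one PG in detail
    ceph pg 2.37 query
    # Ask the primary OSD to scrub and repair it; if this fails, the primary
    # OSD's log usually names the offending object
    ceph pg repair 2.37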
Hello,
I'm re-sending this message since I didn't see it picked up on the list
archives yesterday. My apologies if it was received previously.
We are currently running a single datacenter Ceph deployment. Our setup is
as follows:
- 4 HDD OSD nodes (primarily used for RadosGW/Object Storage)
- 2
Hi,
my osd folder "current" has a size of ~360MB but I do not have any
objects inside the corresponding pool; ceph status reports '8 bytes
data'. Even with 'rados -p mypool ls --all' I do not see any objects.
But there are a few current/12._head folders with files consuming
disk space.
How to
On Tue, 6 Jan 2015, Chen, Xiaoxi wrote:
> it is already in parallel, the outstanding ops are limited to ~10 per
> client (tunable), enlarge this may help.
>
> But please note that there is no no-op here, the OSD has no idea whether it has
> an object until it failed to find it in the disk, that means th
Hi all, apologies for the slow reply.
Been flat out lately and so any cluster work has been relegated to the
back-burner. I'm only just starting to get back to it now.
On 06/06/14 01:00, Sage Weil wrote:
> On Thu, 5 Jun 2014, Wido den Hollander wrote:
>> On 06/05/2014 08:59 AM, Stuart Longland w
The bitmap certainly sounds like it would help shortcut a lot of code
that Xiaoxi mentions. Is the idea that the client caches the bitmap
for the RBD so it knows which OSDs to contact (thus saving a round trip
to the OSD), or only for the OSD to know which objects exist on its
disk?
On Tue, Jan 6,
On 01/06/2015 04:19 PM, Robert LeBlanc wrote:
The bitmap certainly sounds like it would help shortcut a lot of code
that Xiaoxi mentions. Is the idea that the client caches the bitmap
for the RBD so it knows which OSDs to contact (thus saving a round trip
to the OSD), or only for the OSD to know w
Hello Yehuda,
The issue seems to be with the user data file for the swift subuser not getting
synced properly.
MasterZone:
root@ceph-all:/var/local# ceph osd map .us-1-east-1.users.uid johndoe2
osdmap e796 pool '.us-1-east-1.users.uid' (286) object 'johndoe2' -> pg
286.c384ed51 (286.51) -> up [2] acti
Hi all,
I'm wondering if it's possible to make the files in Ceph available via FTP
by just configuring Ceph. If this is not possible, what are the typical
steps on how to make the files available via FTP?
Thanks!
Hi,
We have the same setup including OpenNebula 4.10.1. We had some
backfilling due to node failures and node expansion. If we throttle
osd_max_backfills there is not a problem at all. If the value for
backfilling jobs is too high, we can see delayed reactions within the
shell, e.g. `ls -lh` needs
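For anyone following along, a minimal sketch of that throttling applied at runtime (the value 1 is just an example, and osd_recovery_max_active is an extra knob I am assuming you may also want to lower):

    # Throttle backfill and recovery on all OSDs without restarting them
    ceph tell osd.\* injectargs '--osd-max-backfills 1'
    ceph tell osd.\* injectargs '--osd-recovery-max-active 1'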
Hello Eneko;
Firstly, thanks for your comments!
You mentioned that machines see a QEMU IDE/SCSI disk and don't know
whether it's on Ceph, NFS, local, LVM, ... so it works OK for any VM guest
OS.
But what if I want the Ceph cluster to serve a whole range of clients in the
data center, ranging fro
On 2015-01-04 15:26, Jérôme Poulin wrote:
Happy holiday everyone,
TL;DR: Hardware corruption is really bad; if btrfs-restore works,
kernel Btrfs can too!
I'm cross-posting this message since the root cause for this problem
is the Ceph RBD device; however, my main concern is data loss from a
BTRFS fil
Hello
newbie on Ceph here. I have three lab servers with Ceph. Each server has
2 x 3TB SATA disks. Up to now I ran 2 OSDs per server: I partitioned the 2
disks into 4 partitions and split the 2 OSDs over them. 1 disk = 1
OSD = 2 partitions (data and journal).
Now I starte
On Thu, Dec 25, 2014 at 03:57:15PM +1100, Dmitry Smirnov wrote:
> Please don't withhold this improvement -- go ahead and submit pull request to
> let developers decide whether they want this or not. IMHO it is a very useful
> improvement. Thank you very much for implementing it.
Done. https://g
Hi Max,
Thanks for this info.
I am planning to use CephFS (ceph version 0.87) at home, because it's more
convenient than NFS over RBD. I don't have a large environment; about 20TB,
so hopefully it will hold.
I back up all important data just in case. :)
Thank you.
Jiri
On 29/12/2014 21:09, Thoma
Hi,
BTRFS crashed because the system ran out of memory...
I see these entries in your logs:
Jan 4 17:11:06 ceph1 kernel: [756636.535661] kworker/0:2: page
allocation failure: order:1, mode:0x204020
Jan 4 17:11:06 ceph1 kernel: [756636.536112] BTRFS: error (device
sdb1) in create_pendi
Good evening,
we also tried to rescue data *from* our old / broken pool by map'ing the
rbd devices, mounting them on a host and rsync'ing away as much as
possible.
However, after some time rsync got completely stuck and eventually the
host which mounted the rbd mapped devices decided to kernel pan
I created a ceph tracker issue:
http://tracker.ceph.com/issues/10471
Thanks,
Yehuda
On Tue, Jan 6, 2015 at 10:19 PM, Mark Kirkwood
wrote:
> On 07/01/15 17:43, hemant burman wrote:
>>
>> Hello Yehuda,
>>
>> The issue seems to be with the user data file for the swift subuser not
>> getting synced prope
This is probably more suited to the ceph-user list. Moving it there. Thanks.
Best Regards,
Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com || http://community.redhat.com
@scuttlemonkey || @ceph
On Wed, Jan 7, 2015 at 9:17 AM, Walter Valenti wrote:
> Scenario:
> Openstack
Monitors are in charge of the CRUSH map. Whenever there is a change
to the CRUSH map (an OSD goes down, a new OSD is added, PGs are
increased, etc.), the monitor(s) build a new CRUSH map and distribute
it to all clients and OSDs. Once the client has the CRUSH map, it does
not need to contact the monitors for reads and writes.
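You can see this client-side placement calculation directly: `ceph osd map` computes which PG and OSDs an object lands on purely from the map, with no per-object lookup (the pool and object names below are placeholders):

    # Compute placement for an object from the CRUSH/OSD map alone
    ceph osd map rbd myobject
    # -> osdmap eNNN pool 'rbd' (2) object 'myobject' -> pg 2.xxxxxxxx -> up [...] acting [...]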
On 12/30/14 16:36, Nico Schottelius wrote:
> Good evening,
>
> we also tried to rescue data *from* our old / broken pool by map'ing the
> rbd devices, mounting them on a host and rsync'ing away as much as
> possible.
>
> However, after some time rsync got completely stuck and eventually the
> host w
Hello.
Quick question RE: cache tiering vs. OSD journals.
As I understand it, SSD acceleration is possible at the pool or OSD level.
When considering cache tiering, should I still put OSD journals on SSDs, or
should they be disabled altogether?
Can a single SSD pool function as a cac
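On the last question: an SSD pool can indeed serve as a cache tier in front of a slower base pool. A rough sketch of the commands involved (pool names 'cold' and 'hot' are made up; check the documentation for your release before relying on this):

    # Attach the SSD pool 'hot' as a writeback cache tier for the HDD pool 'cold'
    ceph osd tier add cold hot
    ceph osd tier cache-mode hot writeback
    ceph osd tier set-overlay cold hot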
On Sat, Dec 20, 2014 at 1:15 AM, Anthony Alba wrote:
> Hi Sage,
>
> Has the repo metadata been regenerated?
>
> One of my reposync jobs can only see up to 0.89, using
> http://ceph.com/rpm-testing.
It was generated but we somehow missed out on properly syncing it. You
should now see 0.90 properly
Hello.
I wasn't able to obtain a clear answer from my googling and from reading the
official Ceph docs: are Erasure Coded pools possible/supported for RBD access?
The idea is to have block (cold) storage for archival purposes. I would
access an RBD device and format it as EXT or XFS for block use.
Hello,
On Tue, 6 Jan 2015 15:29:50 + Shain Miley wrote:
> Hello,
>
> We currently have a 12 node (3 monitor+9 OSD) ceph cluster, made up of
> 107 x 4TB drives formatted with xfs. The cluster is running ceph version
> 0.80.7:
>
I assume journals on the same HDD then.
How much memory per no
Seems like a message bus would be nice. Each opener of an RBD could
subscribe for messages on the bus for that RBD. Anytime the map is modified
a message could be put on the bus to update the others. That opens up a
whole other can of worms though.
Robert LeBlanc
Sent from a mobile device please
I think your free memory is just fine. If you have lots of data change
(read/write) then I think it is just aging out your directory cache.
If fast directory listing is important to you, you can always write a
script to periodically read the directory listing so it stays in cache
or use http://lime
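A trivial sketch of the "periodically read the directory listing" idea as a cron entry (the mount path is a placeholder):

    # /etc/cron.d/warm-dircache: re-read the listing every 5 minutes so the
    # directory metadata stays in the dentry/page cache
    */5 * * * * root ls -lR /mnt/rbd/data > /dev/null 2>&1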
Hi, I have a situation where I moved the interfaces over which my ceph-public
network is connected (only the interfaces, not the IPs, etc.). This was done to
increase available bandwidth, but it backfired catastrophically. My monitors
all failed and somehow became corrupted, but I was unable to r
On 2015-01-06 23:11, Jérôme Poulin wrote:
On Mon, Jan 5, 2015 at 6:59 AM, Austin S Hemmelgarn
wrote:
Secondly, I would highly recommend not using ANY non-cluster-aware FS on top
of a clustered block device like RBD
For my use-case, this is just a single server using the RBD device. No
cluste
Hello,
I'd like to know how I can calculate the overhead of an erasure-coded pool.
Regards.
Italo Santos
http://italosantos.com.br/
On 01/06/2015 10:24 AM, Robert LeBlanc wrote:
Can't this be done in parallel? If the OSD doesn't have an object then
it is a no-op and should be pretty quick. The number of outstanding
operations can be limited to 100 or 1000, which would provide a
balance between speed and performance impact if
Looks like there was (is) a technical issue at Dreamhost that is being
actively worked on. I put in a request to get mmarch run manually for
now until the issue is resolved. You can always browse the posts in
real time from the archive pages:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/
Hi Nico,
Yes Ceph is production ready. Yes people are using it in production for qemu.
Last time I heard, Ceph was surveyed as the most popular backend for OpenStack
Cinder in production.
When using RBD in production, it really is critically important to (a) use 3
replicas and (b) pay attention
Thank you for your assistance Craig. At the time, I hadn’t noted placement
group details, but I know to do that if I get inactive placement groups again.
I’m still getting familiar with the cluster, with 15 OSDs now across five
hosts, a mix of good and bad drives, XFS/BTRFS and with/without SSD
I'm trying to install Firefly on an up-to-date FC20 box. I'm getting
the following errors:
[nwatkins@kyoto cluster]$ ../ceph-deploy/ceph-deploy install --release
firefly kyoto
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/nwatkins/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked
Hi Italo,
usable fraction = k / (k + m)
where k is the number of data chunks and m is the number of coding chunks.
For example, k=8 and m=2 would give you
8 / (8 + 2) = 0.8
i.e. 80% usable storage and 20% used for coding. Please keep in mind however
that you can't fill up the storage completely.
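As a concrete sketch of the k=8, m=2 example (profile and pool names are arbitrary, not from the original mail):

    # 8 data + 2 coding chunks => 8/(8+2) = 80% of raw capacity is usable
    ceph osd erasure-code-profile set ec82 k=8 m=2
    ceph osd pool create ecpool 256 256 erasure ec82
    ceph df   # compare raw capacity against per-pool usage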
Nick
From: ceph-users [mailto:ceph-
Hello Achim,
good to hear someone else running this setup. We have changed the number
of backfills using
ceph tell osd.\* injectargs '--osd-max-backfills 1'
and it seems to work, for the most part, with regard to issues when rebalancing.
One unsolved problem we have is machines kernel panic'ing, when i/o
Just to follow up on this thread: the main reason the rbd directory
listing latency was an issue for us was that we were seeing a large amount of
IO delay in a PHP app that reads from that rbd image.
It occurred to me (based on Robert's cache_dir suggestion below) that maybe
doing a recur
Hello,
Can you give the link to the exact instructions you followed?
For CentOS7 (EL7), ceph-extras should not be necessary. The instructions at
[1] do not have you enable the ceph-extras repo. You will find that there
are EL7 packages at [2]. I recently found a README that was incorrectly
refere
Hi Steven,
Until the RBD/FS drivers are developed for those particular OSes, you are forced
to use a Linux server to “proxy” the storage into another format which those
OSes can understand.
However, if you take a look at the Dev mailing list, somebody has just posted a
link to a Windows Ce
Hello Dan,
it is good to know that there are actually people using ceph + qemu in
production!
Regarding replicas: I thought about using size = 2, but I see that
this resembles raid5, and that size = 3 is more or less equivalent to raid6
in terms of loss tolerance.
Regarding the kernel panics: I am still researching
Thanks Nick.
Regards,
Italo Santos
http://italosantos.com.br/
On Wednesday, January 7, 2015 at 18:44, Nick Fisk wrote:
> Hi Italo,
>
> =k/(k+m)
>
> Where k is data chunks and m is coding chunks.
>
> For example k=8 m=2 would give you
>
> =8/(8+2)
>
> .8 or 80% usable storage and 20
Hi Noah,
I'll try to recreate this on a fresh FC20 install as well. Looks to
me like there might be a repo priority issue. It's mixing packages
from Fedora downstream repos and the ceph.com upstream repos. That's
not supposed to happen.
- Travis
On Wed, Jan 7, 2015 at 2:15 PM, Noah Watkins
Hello all,
Just a quick heads up that we now have a PG calculator to help determine
the proper PG per pool numbers to achieve a target PG per OSD ratio.
http://ceph.com/pgcalc
Please check it out! Happy to answer any questions, and always welcome any
feedback on the tool / verbiage, etc...
As
Hi,
I"m playing with this with a modest sized ceph cluster (36x6TB disks).
Based on this it says that small pools (such as .users) would have just 16
PGs. Is this correct? I've historically always made even these small pools
have at least as many PGs as the next power of 2 over my number of OSDs (
Hello Christopher,
Keep in mind that the PGs per OSD (and per pool) calculations take into
account the replica count (the pool's size= parameter). So, for example, if
you're using a default of 3 replicas, 16 * 3 = 48 PGs, which allows for at
least one PG per OSD on that pool. Even with a size=2, 3
Ah, so I've been doing it wrong all this time (I thought we had to take the
size multiple into account ourselves).
Thanks!
On Wed, Jan 7, 2015 at 4:25 PM, Michael J. Kidd
wrote:
> Hello Christopher,
> Keep in mind that the PGs per OSD (and per pool) calculations take into
> account the replic
On 07/01/2015 23:08, Michael J. Kidd wrote:
> Hello all,
> Just a quick heads up that we now have a PG calculator to help determine
> the proper PG per pool numbers to achieve a target PG per OSD ratio.
>
> http://ceph.com/pgcalc
>
> Please check it out! Happy to answer any questions, and
> Where is the source ?
On the page.. :) It does link out to jquery and jquery-ui, but all the
custom bits are embedded in the HTML.
Glad it's helpful :)
Michael J. Kidd
Sr. Storage Consultant
Inktank Professional Services
- by Red Hat
On Wed, Jan 7, 2015 at 3:46 PM, Loic Dachary wrote:
>
>
Hi Michael,
Good job! It would be really useful to add in calculations to show the
expected distribution and max deviation from the mean.
I'm dredging this up from an old email I sent out a year ago, but if we
treat this as a "balls into bins" problem à la Raab & Steger:
http://www14.in.tum
On Mon, Dec 29, 2014 at 4:49 PM, Alexandre Oliva wrote:
> However, I suspect that temporarily setting min size to a lower number
> could be enough for the PGs to recover. If "ceph osd pool set
> min_size 1" doesn't get the PGs going, I suppose restarting at least one
> of the OSDs involved in t
Hello,
On Thu, 8 Jan 2015 00:17:11 + Sanders, Bill wrote:
> Thanks for your reply, Christian. Sorry for my delay in responding.
>
> The kernel logs are silent. Forgot to mention before that ntpd is
> running and the nodes are sync'd.
>
> I'm working on some folks for an updated kernel, b
This is interesting. Kudos to you guys for getting the calculator up, I think
this'll help some folks.
I have 1 pool, 40 OSDs, and replica of 3. I based my PG count on:
http://ceph.com/docs/master/rados/operations/placement-groups/
'''
Less than 5 OSDs set pg_num to 128
Between 5 and 10 OSDs
Hello Bill,
Either 2048 or 4096 should be acceptable. 4096 gives about a 300 PG per
OSD ratio, which would leave room for tripling the OSD count without
needing to increase the PG number, while 2048 gives about 150 PGs per OSD,
leaving room for only about a 50% OSD count expansion.
The high
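For anyone who wants to sanity-check those ratios, the arithmetic is simply pg_num * replicas / OSDs:

    # 40 OSDs, size=3
    echo $(( 2048 * 3 / 40 ))   # ~153 PGs per OSD
    echo $(( 4096 * 3 / 40 ))   # ~307 PGs per OSD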
With cephfs we have the two pools - data & metadata. Does that affect the
pg calculations? The metadata pool will have substantially less data than the
data pool.
--
Lindsay
Thanks for your reply, Christian. Sorry for my delay in responding.
The kernel logs are silent. Forgot to mention before that ntpd is running and
the nodes are sync'd.
I'm working on some folks for an updated kernel, but I'm not holding my breath.
That said, if I'm seeing this problem by run
Hi,
I am trying to get a very minimal Ceph cluster up and running (on ARM) and I'm
wondering what is the smallest unit that I can run rados-bench on?
Documentation at (http://ceph.com/docs/next/start/quick-ceph-deploy/) seems to
refer to 4 different nodes. Admin Node, Monitor Node and 2 OSD only
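For reference, rados bench itself only needs a reachable cluster and a pool it can write into; a minimal invocation looks roughly like this (the pool name is a placeholder):

    # 60-second write test, keeping the objects so a read test can follow
    rados bench -p testpool 60 write --no-cleanup
    # Sequential read test against the objects written above
    rados bench -p testpool 60 seq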
Excellent, thanks for the detailed breakdown.
Take care,
Bill
From: Michael J. Kidd [michael.k...@inktank.com]
Sent: Wednesday, January 07, 2015 4:50 PM
To: Sanders, Bill
Cc: Loic Dachary; ceph-us...@ceph.com
Subject: Re: [ceph-users] PG num calculator live on Ceph
On Wed, 7 Jan 2015 17:07:46 -0800 Craig Lewis wrote:
> On Mon, Dec 29, 2014 at 4:49 PM, Alexandre Oliva wrote:
>
> > However, I suspect that temporarily setting min size to a lower number
> > could be enough for the PGs to recover. If "ceph osd pool set
> > min_size 1" doesn't get the PGs goin