a single instance since the data would need to be
written spread across the cluster.
In my experience it is "good enough" for some low-write instances but not for
write-intensive applications like MySQL.
Cheers,
Robert van Leeuwen
> 6. Should I use RAID for the drives on OSD nodes? Or is it better to
> go without RAID?
Without RAID usually makes for better performance. Benchmark your specific
workload to be sure.
In general I would go for 3 replicas and no RAID.
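For example, the replica count is set per pool; something like this should do it
(pool name "mypool" is just a placeholder):
ceph osd pool set mypool size 3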
Cheers,
Robert van Leeuwen
instead of strongly consistent.
I think Ceph is working on something similar for the Rados gateway.
Cheers,
Robert van Leeuwen
h of an issue but latency certainly will be.
Although bandwidth during a rebalance of data might also be problematic...
Cheers,
Robert van Leeuwen
mes
more complex having to rule out more potential causes.
Not saying it cannot work perfectly fine.
I'd rather just not take any chances with the storage system...
Cheers,
Robert van Leeuwen
;s is still not very
comfortable. (especially if the disks come from the same batch)
Cheers,
Robert van Leeuwen
and
> generally have over 50% of free cpu power.
The number of cores does not really matter if they are all busy ;)
I honestly do not know how Ceph behaves when it is CPU starved but I guess it
might not be pretty.
Since your whole environment will come crumbling down if your storage becomes
un
hypervisors
get above a certain load threshold.
I would certainly test a lot with high loads before putting it in production...
Cheers,
Robert van Leeuwen
st IO for database-like loads I think you
should probably go with SSD-only (pools).
(I'd be happy to hear the numbers about running high random write loads on it :)
And another nice hardware scaling PDF from dreamhost...
https://objects.dreamhost.com/inktankweb/Inkta
nusable.
It is also possible to specifically not conntrack certain connections.
e.g.
iptables -t raw -A PREROUTING -p tcp --dport 6789 -j CT --notrack
Note that you will have to add rules for both traffic flows: since the
connections are no longer tracked, iptables does not automatically accept the return
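A sketch of what the return-direction rules could look like (untested, assuming the
monitor port 6789 as above; untracked packets also need an explicit ACCEPT because
they will never match an ESTABLISHED rule):
iptables -t raw -A OUTPUT -p tcp --sport 6789 -j CT --notrack
iptables -A INPUT -p tcp --dport 6789 -j ACCEPT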
ous incoming data and lose a 2-3 percent
of lifetime per week.
I would highly recommend monitoring this if you are not doing so already ;)
Buying bigger SSDs will help because the writes are spread across more cells,
so a 240GB drive should last roughly twice as long as a 120GB drive.
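For monitoring, smartmontools is one option; a rough example (the attribute name
differs per vendor, Media_Wearout_Indicator is what Intel drives expose):
smartctl -A /dev/sdX | grep -i wear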
Cheers,
Robert van Leeuwen
hat in the crush map bucket hierarchy indeed.
There is a nice explanation here:
http://ceph.com/docs/master/rados/operations/crush-map/
Note that your clients will write to both the local and the remote DC, so it will
impact write latency!
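As a rough illustration (not a drop-in config), a rule in the decompiled crush map
that places one replica per datacenter could look something like this, assuming a
bucket type "datacenter" exists in your hierarchy and "default" is the root:
rule replicated_per_dc {
    ruleset 1
    type replicated
    min_size 2
    max_size 3
    step take default
    step chooseleaf firstn 0 type datacenter
    step emit
}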
Cheers,
Robert van Leeuwen
> Which leveldb from where? 1.12.0-5 that tends to be in el6/7 repos is broken
> for Ceph.
> You need to remove the “basho fix” patch.
> 1.7.0 is the only readily available version that works, though it is so old
> that I suspect it is responsible for various
> issues we see.
Apparently at some
> I cannot add a new OSD to a current Ceph cluster.
> It just hangs, here is the debug log:
> This is ceph 0.72.1 on CentOS.
Found the issue:
Although I installed the specific Ceph version (0.72.1), the latest leveldb was
installed.
Apparently this breaks stuff...
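In case someone hits the same thing: a possible workaround on CentOS is to install
the known-good leveldb (1.7.0 was mentioned earlier in this thread) and pin it with
the versionlock plugin, roughly like this:
yum install yum-plugin-versionlock
yum downgrade leveldb        # back to the known-good version
yum versionlock add leveldb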
Cheers,
Robert va
Hi,
I cannot add a new OSD to a current Ceph cluster.
It just hangs, here is the debug log:
ceph-osd -d --debug-ms=20 --debug-osd=20 --debug-filestore=31 -i 10
--osd-journal=/mnt/ceph/journal_vg_sda/journal0 --mkfs --mkjournal --mkkey
2014-07-09 10:50:28.934959 7f80f6a737a
> Try to add --debug-osd=20 and --debug-filestore=20
> The logs might tell you more why it isn't going through.
Nothing of interest there :(
What I do notice is that when I run ceph-deploy it is referencing a keyring
that does not exist:
--keyring /var/lib/ceph/tmp/mnt.J7nSi0/keyring
If I look o
ngs on OSD start.
When I manually add the OSD the following process just hangs:
ceph-osd -i 10 --osd-journal=/mnt/ceph/journal_vg_sda/journal --mkfs --mkkey
Running ceph-0.72.1 on CentOS.
Any tips?
Thx,
Robert van Leeuwen
> All of which means that Mysql performance (looking at you binlog) may
> still suffer due to lots of small block size sync writes.
Which begs the question:
Is anyone running a reasonably busy MySQL server on Ceph-backed storage?
We tried and it did not perform well enough.
We have a small ceph clu
> this is a very good point that I totally overlooked. I concentrated more on
> the IOPS alignment plus write durability,
> and forgot to check the sequential write bandwidth.
Again, this totally depends on the expected load.
Running lots of VMs usually tends to end up being random IOPS on your
> We are at the end of the process of designing and purchasing storage to
> provide Ceph based backend for VM images, VM boot (ephemeral) disks,
> persistent volumes (and possibly object storage) for our future Openstack
> cloud.
> We considered many options and we chose to prefer commodity sto
replica counts.
When you write something with a replica count of 2 it will show up as using
twice the amount of space.
So 1 GB usage will result in:
1394 GB / 1396 GB avail
Cheers,
Robert van Leeuwen
> This is "similar" to ISCSI except that the data is distributed accross x ceph
> nodes.
> Just as ISCSI you should mount this on two locations unless you run a
> clustered filesystem (e.g. GFS / OCFS)
Oops I meant, should NOT mount this on two locations unles... :)
Cheers,
Robert
> So .. the idea was that ceph would provide the required clustered filesystem
> element,
> and it was the only FS that provided the required "resize on the fly and
> snapshotting" things that were needed.
> I can't see it working with one shared lun. In theory I can't see why it
> couldn't wor
something similar.
You can get it to work but it is a bit of a PITA.
There are also some performance considerations with those filesystems, so you
should really do some proper testing before any large-scale deployment.
Cheers,
Robert van Leeuwen
On 28.03.14, 12:12, Guang wrote:
> Hello ceph-users,
> We are trying to play with RBD and I would like to ask if RBD works for
> RedHat 6.4 (with kernel version 2.6.32),
Unless things have changed, I think there is no usable kernel client with the
Red Hat-supplied kernel.
You could build your own kern
across 2 failure domains by default.
My guess is that the default crush map treats a node as a single failure domain.
So, edit the crushmap to allow this or add a second node.
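For a quick single-node test the usual workaround (as far as I know) is to let
crush choose leaves at the OSD level instead of the host level, either up front
in ceph.conf:
[global]
osd crush chooseleaf type = 0
or afterwards by editing the crush map by hand:
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# change "step chooseleaf firstn 0 type host" into "type osd", then recompile:
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new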
Cheers,
Robert van Leeuwen
nyone have experience running MySQL or other real-life heavy workloads
from qemu and getting more than 300 IOPS?
Thx,
Robert van Leeuwen
> I'm hoping to get some feedback on the Dell H310 (LSI SAS2008 chipset).
> Based on searching I'd done previously I got the impression that people
> generally recommended avoiding it in favour of the higher specced H710
> (LSI SAS2208 chipset).
Purely based on the controller chip it should be OK.
Hi,
We experience something similar with our Openstack Swift setup.
You can change the sysctl "vm.vfs_cache_pressure" to make sure more inodes are
being kept in cache.
(Do not set this to 0 because you will trigger the OOM killer at some point ;)
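For example (the value here is only illustrative; lower values make the kernel keep
inodes and dentries in cache longer):
sysctl -w vm.vfs_cache_pressure=10
echo "vm.vfs_cache_pressure = 10" >> /etc/sysctl.conf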
We also decided to go for nodes with more memory
the m4 is.
Up to now only Intel seems to have done its homework.
In general they *seem* to be the most reliable SSD provider.
Cheers,
Robert van Leeuwen
ave 10Gb copper on board.
The above machines just have 2x 1Gb.
I think all brands have their own quirks; the question is which one you are
most comfortable living with.
(e.g. we have no support contracts with Supermicro and just keep parts in stock)
Cheers,
Robert van Leeuwen
o:
Result jobs=1:
iops=297
Result jobs=16:
iops=1200
I'm running the fio benchmark from a KVM virtual machine.
It seems that a single write thread is not able to go above 300 IOPS (latency?).
Ceph can handle more IOPS if you start more parallel write threads.
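For reference, roughly the kind of fio invocation I mean (parameters are only
illustrative):
fio --name=test --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --size=1G \
    --numjobs=16 --iodepth=16 --group_reporting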
Cheers,
Robe
full (above 85%).
Hi,
Could you check / show the weights of all 3 servers and disks?
eg. run "ceph osd tree"
Also if you use failure domains (e.g. by rack) and the first 2 are in the same
domain it will not spread the data according to just the weight but also the
failure domai
code base I *think* it should be pretty
trivial to change the code to support this and would be a very small change
compared to erasure code.
(I looked a bit at crush map bucket types but it *seems* that all bucket types
will still stripe the PGs across all nodes within a failure dom
use RAID 10 and do 2 instead of 3
replicas.
Cheers,
Robert van Leeuwen
From: ceph-users-boun...@lists.ceph.com [ceph-users-boun...@lists.ceph.com] on
behalf of nicolasc [nicolas.cance...@surfsara.nl]
Sent: Thursday, December 12, 2013 5:23 PM
To: Craig Lewis
licas for objects in the pool in order to
acknowledge a write operation to the client. If minimum is not met, Ceph will
not acknowledge the write to the client. This setting ensures a minimum number
of replicas when operating in degraded mode.
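If this refers to the pool min_size setting (I think it does), it can be adjusted
per pool, e.g. (pool name is a placeholder):
ceph osd pool set mypool min_size 2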
Cheers,
Robert va
making it fully
random.
I would expect a performance of 100 to 200 IOPS max.
Doing an iostat -x or atop should show this bottleneck immediately.
This is also the reason to go with SSDs: they have reasonable random IO
performance.
Cheers,
Robert van Leeuwen
Sent from my iPad
> On 6 dec. 2013,
we
use the SSD for flashcache instead of journals)
Cheers,
Robert van Leeuwen
sume you know this mailing list is a community effort.
If you want immediate and official 24x7 support, buy support @ www.inktank.com
Cheers,
Robert van Leeuwen
sequential
writes.
Cheers,
Robert van Leeuwen
Sent from my iPad
> On 3 dec. 2013, at 17:02, "Mike Dawson" wrote:
>
> Robert,
>
> Do you have rbd writeback cache enabled on these volumes? That could
> certainly explain the higher than expected write performance. Any
d for 100%.
Also the usage dropped to 0% pretty much immediately after the benchmark so it
looks like it's not lagging behind the journal.
Did not really test reads yet; since we have so much read cache (128 GB per
node) I assume we will mostly be write-limited.
Cheers,
Robert v
wntime.
What I did see is that IO grinds to a halt during PG creation (1000 PGs took
a few minutes).
Also expect reduced performance during the rebalance of the data.
The OSDs will be quite busy during that time.
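If you want to soften the impact you can raise pg_num in smaller steps, e.g.
(pool name and numbers are placeholders):
ceph osd pool set mypool pg_num 1100
# wait until the new PGs are active+clean, then:
ceph osd pool set mypool pgp_num 1100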
I would certainly pick a time with low traffic to do
y and CPU did not seem to be a problem.
Since I had the option to recreate the pool and I was not using the recommended
settings, I did not really dive into the issue.
I will not stray too far from the recommended settings in the future though :)
Cheers,
Robert van Leeuwen
would like to do a partition/format and some ceph commands to get
stuff working again...
Thx,
Robert van Leeuwen
Hi,
I'm playing with our new Ceph cluster and it seems that Ceph does not gracefully
handle a maxed-out cluster network.
I had some "flapping" nodes once every few minutes when pushing a lot of
traffic to the nodes so I decided to set the noup and nodown as described in
the docs.
http://ceph.c
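For reference, the flags are set and cleared like this:
ceph osd set noup
ceph osd set nodown
# and once things are stable again:
ceph osd unset noup
ceph osd unset nodown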
Ok, probably hitting this:
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
flapping OSD part...
Cheers,
Robert
From: ceph-users-boun...@lists.ceph.com [ceph-users-boun...@lists.ceph.com] on
behalf of Robert van Leeuwen
803 mon.0 [INF] osd.7 marked itself down
2013-11-19 13:48:40.694596 7f15a6192700 0 monclient: hunting for new mon
Thx,
Robert van Leeuwen
t using /dev/sdX for this instead of the /dev/disk/by-id
/by-path given by ceph-deploy.
So I am wondering how other people are setting up machines and how things work
:)
Thx,
Robert van Leeuwen
omewhat reduces any need to do this in
Ceph but I am curious what Ceph does)
I guess it is pretty tricky to handle since load can either be raw bandwidth or
number of IOPS.
Cheers,
Robert van Leeuwen
> I tried putting Flashcache on my spindle OSDs using an Intel SSD and it works
> great.
> This is getting me read and write SSD caching instead of just write
> performance on the journal.
> It should also allow me to protect the OSD journal on the same drive as the
> OSD data and still get bene
east, serves the Red Hat RPMs,
maybe the EPEL and Ceph repos can be added there?
If not, make sure you have a lot of time and patience to copy stuff around.
Cheers,
Robert van Leeuwen
> The behavior you both are seeing is fixed by making flush requests
> asynchronous in the qemu driver. This was fixed upstream in qemu 1.4.2
> and 1.5.0. If you've installed from ceph-extras, make sure you're using
> the .async rpms [1] (we should probably remove the non-async ones at
> this point
ffectively losing the
network.
This is what I set in the ceph client:
[client]
rbd cache = true
rbd cache writethrough until flush = true
Anyone else noticed this behaviour before or have some troubleshooting tips?
Thx,
Robert van Leeuwen