Hi Greg,
I’m not using it; I am just going through the code to try to come up with some
improvements to the promotion logic, and thought I would double-check whether
this also requires a bit of work.
Nick
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Gregor
Hi,
That seems very odd - what do the logs say for the osds with slow requests?
Thanks
On Tue, Nov 24, 2015 at 2:20 AM, Mika c wrote:
> Hi Sean,
> Yes, the cluster has been in scrubbing status (scrub + deep scrub) for almost two
> weeks.
> And the result of executing `ceph pg dump | grep scrub` is empty
After doing some more in-depth research and tuning some parameters, I've gained a
little bit more performance:
# fio --rw=randread --bs=1m --numjobs=4 --iodepth=32 --runtime=22
--time_based --size=16777216k --loops=1 --ioengine=libaio --direct=1
--invalidate=1 --fsync_on_close=1 --randrepeat=1 --nora
Hello.
Does someone have a list of verified/tested SSD drives for Ceph?
I am thinking about the Ultrastar SSD1600MM SAS SSD for our all-flash Ceph
cluster. Does anybody use it in production?
--
Mike, runs.
Each port should be able to do 40Gb/s or 56Gb/s minus overhead and any
PCIe or card-related bottlenecks. IPoIB will further limit that,
especially if you haven't done any kind of interrupt affinity tuning.
Assuming these are mellanox cards you'll want to read this guide:
http://www.mellanox.co
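(As a rough illustration of what that interrupt-affinity tuning looks like by
hand — the IRQ number and CPU mask below are made-up examples; on Mellanox
cards the mlnx_affinity script mentioned later in the thread does this for you:)
# service irqbalance stop               # stop the balancer so manual pinning sticks
# grep mlx4 /proc/interrupts            # find the HCA's interrupt numbers
# echo 2 > /proc/irq/123/smp_affinity   # pin hypothetical IRQ 123 to CPU 1 (mask 0x2)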
Are the journals on the same device? It might be better to use the SSDs for
journaling, since you are not getting better performance with the SSDs.
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Marek
Dohojda
Sent: Monday, November 23, 2015 10:24 PM
To: Haomai Wang
Cc: ceph
Thanks a lot for the response Mark, I will take a look at the guide that
you pointed me to. Regarding the iperf results, find them below:
*FDR-HOST -> to -> QDR-Blade-HOST*
*(client) (server)*
server:
--
# iperf -s
Server l
Another test, made between two HP blades with QDR (with bonding):
e60-host01# iperf -s
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
[ 5] local 172.23.
On 11/20/2015 09:31 AM, Loic Dachary wrote:
> Hi,
>
> On 20/11/2015 02:13, Yonghua Peng wrote:
>> I have been using the Firefly release. Is there official documentation for
>> upgrading? Thanks.
>
> Here it is : http://docs.ceph.com/docs/firefly/install/upgrading-ceph/
>
> Enjoy !
Also suggest
Yeah, they are; that is one thing I was planning on changing. What I am
really interested in at the moment is a rough idea of expected performance. I
mean, is 100MB/s around normal, very low, or "could be better"?
On Tue, Nov 24, 2015 at 8:02 AM, Alan Johnson wrote:
> Are the journals on the same device – it m
Dear Marek,
I would expect higher performance, but how did you measure this? With
rados bench? Please note that Ceph is built for parallel access, so the
combined speed increases with more threads; if this is a single-thread
measurement, I wonder how well it reflects the performance of the
platform. Wit
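(For example, a multi-threaded rados bench write/read run would look like the
following; the pool name and thread count are placeholders, not values from
this thread:)
# rados bench -p rbd 60 write -t 16 --no-cleanup   # 60s write test, 16 concurrent ops
# rados bench -p rbd 60 seq -t 16                  # sequential read-back at the same concurrency
# rados -p rbd cleanup                             # remove the benchmark objects afterwards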
Intel DC series (S3610 for journals, S3510 might be OK for data).
Samsung DC PRO series (if you can get them).
There are other drives that might be suitable but I strongly suggest you avoid
those that aren't tested by others - it's a PITA to deal with the problems poor
SSDs cause.
Jan
> On 24
Hi,
Last night I upgraded my cluster from Centos 6.5 -> Centos 7.1 and in the
process upgraded from Emperor -> Firefly -> Hammer
When I finished I changed the crush tunables from
ceph osd crush tunables legacy -> ceph osd crush tunables optimal
I knew this would cause data movement. But the IO
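(If the IO impact of that data movement is the concern, one common way to
throttle recovery at runtime looks like the following; the values are only
illustrative, not something Joe mentions:)
# ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'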
Hello
I used real-world type data, which is to say rsync, dd, scp, etc., in
addition to the simple performance of the VMs themselves (I am using this Ceph
cluster as a backend for KVM).
Regardless of which method I used, I averaged between 90 and 100MB/s.
On Tue, Nov 24, 2015 at 8:47 AM, Mart van Santen
Hard to know without more config details such as number of servers and network (GigE
or 10 GigE); also not sure how you are measuring (reads or writes). You could
try RADOS bench as a baseline. I would expect more performance with 7 x 10K
spinners journaled to SSDs. The fact that SSDs did not perform
7 total servers, 20 GIG pipe between servers, both reads and writes. The
network itself has plenty of pipe left, it is averaging 40Mbits/s
Rados Bench SAS 30 writes
Total time run: 30.591927
Total writes made: 386
Write size: 4194304
Bandwidth (MB/sec): 50.471
Stdde
You are talking about 20 “GIG” (what is that? GB/s? Gb/s? I assume the latter)
then talk about 40Mbit/s.
Am I the only one who cannot parse this? :-)
> On 24 Nov 2015, at 17:27, Marek Dohojda wrote:
>
> 7 total servers, 20 GIG pipe between servers, both reads and writes. The
> network itself
You haven’t stated what replication size you are running. Keep in mind that
with a replication factor of 3, you will be writing 6x the amount of data down
to the disks compared with what the benchmark says (3x replication, x2 for the
data+journal write).
You might actually be near the hardware maximums. What
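(Putting rough numbers on that with the rados bench figures above: 50.471 MB/s
of client writes x 3 replicas x 2 for data+journal is roughly 300 MB/s of raw
writes hitting the disks. If that lands on the 7 x 10K spinners mentioned
earlier, it works out to a bit over 40 MB/s per spindle, plus the seek overhead
of writing journal and data to the same platters. Back-of-the-envelope only,
assuming co-located journals.)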
Is anyone around the list using Ceph + IB FDR or QDR and getting around 3GB/s
with fio or any other tool? If so, could you share some config
variables so I can see where to tweak a little bit? I've already used
mlnx_tune and mlnx_affinity in order to improve and change parameters for
irq affin
this. With 3x replication, journals on disk, and large writes, you'll
essentially be writing the data out 6 times. 100MB/s for 7 disks might
be a little slow, but it's generally in the right ballpark. One of the
goals for newstore is to improve performance for large sequential writes
by avoi
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
I've had wildly different iperf results based on the version of the
kernel, OFED and whether you are using datagram or connected mode as
well as the MTU. You really have to just try all the different options
to figure out what works the best.
Please
On 11/24/2015 09:05 AM, German Anders wrote:
Thanks a lot for the response Mark, I will take a look at the guide that
you pointed me to. Regarding the iperf results, find them below:
*FDR-HOST -> to -> QDR-Blade-HOST*
*(client) (server)*
server:
--
# iperf -s
--
Thanks a lot Robert for the explanation. I understand what you are saying,
and I'm also excited to see more about IB with Ceph to get those
performance numbers up, and hopefully (soon) to see Accelio
working in production. Regarding the HP IB switch, we have 4 ports (uplinks)
connected to o
Yes, I'm wondering if this is my top performance threshold with this kind
of setup, although I'll assume that IB perf would be better.. :(
*German*
2015-11-24 14:24 GMT-03:00 Mark Nelson :
> On 11/24/2015 09:05 AM, German Anders wrote:
>
>> Thanks a lot for the response Mark, I will take a look
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
I've gotten about 3.2 GB/s with IPoIB on QDR, but it took a couple of
weeks of tuning to get that rate. If your switch is at 2048 MTU, it is
really hard to get it increased without an outage if I remember
correctly. Connected mode is much easier to g
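(If it helps anyone reading along, the IPoIB mode and MTU can be checked and
changed per interface; "ib0" is just an example name, and the switch-side MTU
change may still need an outage window:)
# cat /sys/class/net/ib0/mode            # prints "datagram" or "connected"
# echo connected > /sys/class/net/ib0/mode
# ip link set ib0 mtu 65520              # connected mode allows a much larger MTU than datagram's ~2044/4092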
Dear ceph users,
I am trying to write a crush ruleset that will, for a pool size of 3, put a
copy on another host in the local rack and a copy in another rack. I know
how to do the latter, but I do not understand how to match the current
rack. Here is my try:
rule replicate_three_times {
ruleset 1
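(For reference, a common way to express that placement — sketched here as a
guess, not taken from Emmanuel's actual rule, and assuming the root bucket is
named "default" — is to choose two racks and then up to two hosts within each,
so a size-3 pool gets two copies on different hosts in the first rack and one
copy in the second rack:)
rule replicate_three_times {
        ruleset 1
        type replicated
        min_size 2
        max_size 3
        step take default
        step choose firstn 2 type rack
        step chooseleaf firstn 2 type host
        step emit
}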
Crap, I think you are 100% correct:
rrqm/s   wrqm/s     r/s      w/s    rkB/s      wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
  0.00   369.00   33.00  1405.00   132.00  135656.00    188.86      5.61    4.02    21.94     3.60   0.70 100.00
I was kinda wondering if this might be the case, w
Or separate the journals, as this will bring the workload down on the spinners
to 3X rather than 6X.
From: Marek Dohojda [mailto:mdoho...@altitudedigital.com]
Sent: Tuesday, November 24, 2015 1:24 PM
To: Nick Fisk
Cc: Alan Johnson; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Performance ques
Thank you! I will do that. Would you suggest getting another SSD drive or
moving the journal to the SSD OSD?
(Sorry if that is a stupid question.)
On Tue, Nov 24, 2015 at 11:25 AM, Alan Johnson wrote:
> Or separate the journals as this will bring the workload down on the
> spinners to
Separate would be best, but as with many things in life, we are not all driving
around in sports cars!!
Moving the journals to the SSDs that are also OSDs themselves will be fine.
SSDs tend to be more bandwidth limited than IOPS limited, and the reverse is true
for disks, so you will get maybe 2x
I dunno, I think I'll just go sit in my Lotus and mull this over ;) (I wish)
This is storage for KVM, and we have quite a few boxes. While right
now none are suffering from IO load, I am seeing slowdowns personally and
know that sooner or later others will notice as well.
I think what I will do is
Ok, but it’s probably a bit of a waste. The journals for each disk will
probably require 200-300 IOPS from each SSD and maybe 5GB of space. Personally I
would keep the SSD pool, maybe use it for high-performance VMs?
Typically VMs will generate more random, smaller IOs, so a default rados bench
mig
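(A smaller-block rados bench write, closer to what VMs generate, would look
something like this; the pool name is again just a placeholder:)
# rados bench -p rbd 60 write -t 16 -b 4096 --no-cleanup   # 4KB writes instead of the 4MB default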
Oh, well in that case you made my life easier, I like that :)
I thought the journal needed to be on a physical device though, not within a raw
RBD pool. Was I mistaken?
On Tue, Nov 24, 2015 at 11:51 AM, Nick Fisk wrote:
> Ok, but it’s probably a bit of a waste. The journals for each disk will
> probably r
I think what Nick is suggesting is that you create Nx5GB partitions on the
SSDs (where N is the number of OSDs you want to have fast journals for),
and use the rest of the space for OSDs that would form the SSD pool.
Bill
On Tue, Nov 24, 2015 at 10:56 AM, Marek Dohojda <
mdoho...@altitudedigita
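(Purely as an illustration of that layout — the device names, partition count,
and use of ceph-disk are my assumptions, not from Bill's mail:)
# sgdisk --new=1:0:+5G --change-name=1:'ceph journal' /dev/sdf
# sgdisk --new=2:0:+5G --change-name=2:'ceph journal' /dev/sdf
# sgdisk --new=3:0:+5G --change-name=3:'ceph journal' /dev/sdf
# sgdisk --largest-new=4 --change-name=4:'ceph data' /dev/sdf   # remainder becomes an SSD-pool OSD
# ceph-disk prepare /dev/sdb /dev/sdf1                          # spinner sdb journals on sdf1, and so on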
On 11/24/2015 07:00 PM, Emmanuel Lacour wrote:
>
> Dear ceph users,
>
>
> I am trying to write a crush ruleset that will, for a pool size of 3, put a
> copy on another host in the local rack and a copy in another rack. I know
> how to do the latter, but I do not understand how to match the current
> rac
Hi there,
I'm currently following the Ceph QSGs and have finished the
Storage Cluster Quick Start; my current topology is
admin-node - node1 (mon, mds)
           - node2 (osd0)
           - node3 (osd1)
I am now looking to continue by creating a block device and th
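(For reference, the next steps in the Block Device Quick Start look roughly
like the following, run from a client node with the admin keyring; the image
name "foo" and the 4GB size are just the guide's examples:)
# rbd create foo --size 4096                          # create a 4GB image in the default rbd pool
# sudo rbd map foo --pool rbd                         # map it to a kernel block device
# sudo mkfs.ext4 -m0 /dev/rbd/rbd/foo                 # put a filesystem on it
# sudo mkdir /mnt/ceph-block-device
# sudo mount /dev/rbd/rbd/foo /mnt/ceph-block-device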
Hi Yehuda/RGW experts,
I have one cluster with RGW up and running at the customer site.
I did some heavy performance testing on it with CosBench and as a result
wrote a significant amount of data to showcase performance.
Over time, the customer also wrote a significant amount of data using S3
I'll try to set the ports on the HP IB QDR switch to 4K and then configure
the interfaces to MTU 4096, run the same tests again, and see what
the results are. However, is there any other parameter that I need to take
into account to tune for this? For example, this is the port configuration
o
Doh! Sorry I didn't get that. It does make sense.
Thank you everybody. I will try to set this up before the end of this week
and let everybody know the results, since that may help others in the
future.
On Nov 24, 2015 12:20, "Bill Sanders" wrote:
> I think what Nick is suggesting is that you
On Tue, Nov 24, 2015 at 1:37 PM, Wido den Hollander wrote:
> On 11/24/2015 07:00 PM, Emmanuel Lacour wrote:
>>
>> Dear ceph users,
>>
>>
>> I am trying to write a crush ruleset that will, for a pool size of 3, put a
>> copy on another host in the local rack and a copy in another rack. I know
>> how to do
On Tue, Nov 24, 2015 at 1:50 PM, James Gallagher
wrote:
> Hi there,
>
> I'm currently following the Ceph QSGs and have currently finished the
> Storage Cluster Quick Start and have the current topology of
>
> admin-node - node1 (mon, mds)
> - node2 (osd0)
> - no
You upgraded (and restarted as appropriate) all the clients first, right?
Warren Wang
On 11/24/15, 10:52 AM, "Joe Ryner" wrote:
>Hi,
>
>Last night I upgraded my cluster from Centos 6.5 -> Centos 7.1 and in the
>process upgraded from Emperor -> Firefly -> Hammer
>
>When I finished I changed
Hi Sean,
I cut some parts of the slow request log; please find the attachment.
I am thinking about setting the scrub interval to be longer (once a month maybe).
An unrelated question: do you think deep scrub will bring performance
down?
Best wishes,
Mika
2015-11-24 18:24 GMT+08:00 Sean Redmond :
> Hi,
>
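(If lengthening the intervals is the route taken, the relevant ceph.conf knobs
look something like this; the values are examples only, not recommendations
from the thread:)
[osd]
osd scrub min interval = 86400          ; don't start a scheduled scrub on a PG more than once a day
osd scrub max interval = 604800         ; but force one at least weekly
osd deep scrub interval = 2592000       ; deep scrub roughly once a month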
Hi,
in my cluster with 16 OSD daemons and more than 20 million files on
cephfs, the memory usage on MDS is around 16 GB. It seems that 'mds
cache size' has no real influence on the memory usage of the MDS.
Is there a formula that relates 'mds cache size' directly to memory
consumption on the
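(For context, 'mds cache size' is a count of inodes rather than bytes, so
raising it looks like the following; the value is purely illustrative:)
[mds]
mds cache size = 500000                 ; number of inodes the MDS tries to keep in cache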
On Tue, Nov 24, 2015 at 10:26 PM, Mike Miller wrote:
> Hi,
>
> in my cluster with 16 OSD daemons and more than 20 million files on cephfs,
> the memory usage on MDS is around 16 GB. It seems that 'mds cache size' has
> no real influence on the memory usage of the MDS.
>
> Is there a formula that r
Hi Greg,
thanks very much. This is clear to me now.
As for an 'MDS cluster', I thought that this was not recommended at this
stage? I would very much like to have more than one MDS in my cluster,
as this would probably help very much to balance the load. But I am
afraid of what everybody says about