Thanks for the reply!
Not great, but it seems acceptable. What do you think the possible reasons are?
Would the OSD perf counters be helpful for this?
sudo fio --filename=/dev/sda2 --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test
journal-te
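A minimal sketch of pulling those counters from the OSD admin socket (osd.0 is a
placeholder and jq is assumed to be installed):
# Dump the write-latency and journal-latency counters for one OSD
ceph daemon osd.0 perf dump | jq '{op_w_latency: .osd.op_w_latency, journal_latency: .filestore.journal_latency}'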
Hello Huan,
If you look at Sébastien's blog (
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/)
in the comments section, you can see that Samsung SSDs behave very
poorly in these tests:
Samsung SSD 850 PRO 256GB
409600000 bytes (410 MB) copied
> On 12 February 2016 at 6:55, Austin Johnson wrote:
>
>
> The Supermicro 5018A-AR12L is built for object storage. In our testing,
> they perform pretty well. You would have to invest in discrete 10G nics to
> meet all of your requirements.
>
Using these in an archiving cluster in the Ne
> On 12 February 2016 at 10:14, Ferhat Ozkasgarli wrote:
>
>
> Hello Huan,
>
> If you look at Sébastien's blog (
> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/)
> in the comments section, you can see that Samsung SSDs behave very
200 iops is about what you would expect given the sync write latency you will get
with either slow CPUs or 1GbE networking. What sort of hardware/networking are you
running? With top-of-the-range hardware and a replica count of 2-3, don't expect to
get much above 500-750 iops for a single direct write.
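For comparison, a queue-depth-1 sync write test can also be run against the cluster
itself rather than the raw SSD; a minimal sketch mirroring the journal test above,
run against a mapped RBD device (the pool/image names and /dev/rbd0 are placeholders,
and the test overwrites the device, so use a throwaway image):
# Map a test image and measure end-to-end single-threaded sync write iops
rbd map rbd/fio-test
fio --filename=/dev/rbd0 --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting \
    --name=rbd-sync-write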
> -Original Mes
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Wido den Hollander
> Sent: 12 February 2016 09:15
> To: Schlacta, Christ ; Austin Johnson
>
> Cc: ceph-users@lists.ceph.com; Nick Fisk
> Subject: Re: [ceph-users] Xeon-D 1540 Ceph Nodes
>
"op_w_latency":
"avgcount": 42991,
"sum": 402.804741329
402.0/42991
0.009350794352306296
~9ms latency, that means this ssd not suitable for journal device?
"osd": {
"op_wip": 0,
"op": 58683,
"op_in_bytes": 7309042294,
"op_out_bytes": 5071374
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Huan Zhang
> Sent: 12 February 2016 10:00
> To: Irek Fasikhov
> Cc: ceph-users
> Subject: Re: [ceph-users] ceph 9.2.0 SAMSUNG ssd performance issue?
>
> "op_w_latency":
> "avgcount":
Thanks Nick,
filestore -> journal_latency: ~1.1 ms
214.0 / 180611 = 0.0011848669239415096
The SSD write latency seems OK; any other ideas are highly appreciated!
"filestore": {
"journal_queue_max_ops": 300,
"journal_queue_ops": 0,
"journal_ops": 180611,
"journal_queue_ma
A write latency of 1.1 ms is OK, but not brilliant. What IO size are you testing
with?
Don't forget that if you have a journal latency of 1.1 ms, then excluding all other
latency introduced by networking, replication and processing in the OSD code,
you won't get more than about 900 iops. All the things I me
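Nick's ~900 iops figure is simply the reciprocal of the journal latency; a minimal
sketch of the same arithmetic against the counters above (osd.0 is a placeholder,
jq assumed):
# Average journal latency (s) and the implied queue-depth-1 ceiling: iops <= 1 / latency
ceph daemon osd.0 perf dump | jq '.filestore.journal_latency | {avg_latency_s: (.sum / .avgcount), max_qd1_iops: (.avgcount / .sum)}'
# e.g. a 1.1 ms journal latency caps a single queue-depth-1 stream at about 1/0.0011 ≈ 900 iops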
Hi, last night I had the same issue on Hammer LTS.
I think this is a Ceph bug.
My history:
Ceph version: 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
Distro: Debian 7 (Proxmox 3.4)
Kernel: 2.6.32-39-pve
We have 9x 6TB SAS drives in the main pool and 6x 128GB PCIe SSDs in the cache
pool on 3
Good news: while writing the previous email I found the solution to
recover my VMs:
ceph osd tier remove cold-storage
I've been thinking about how it could have affected what happened, but I still do
not understand why the overlay option behaves so strangely.
I know that the overlay option sets overlay
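For reference, the usual sequence for detaching a cache tier looks roughly like the
sketch below; the pool names are placeholders, since the command above is truncated:
# Stop new promotions, flush/evict the cache pool, then detach it from the base pool
ceph osd tier cache-mode cold-storage-cache forward   # may need --yes-i-really-mean-it on newer releases
rados -p cold-storage-cache cache-flush-evict-all
ceph osd tier remove-overlay cold-storage
ceph osd tier remove cold-storage cold-storage-cache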
My environment:
32 cores, Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
10GbE NICs
4 OSDs/host
My client is a database (MySQL) doing a direct/sync write per transaction, so it
is a little bit sensitive to IO latency (sync/direct).
I used SATA disks for the OSD backends and get ~100 iops at 4k/iodepth=1, ~10ms IO
latency, simila
On Thu, 11 Feb 2016, Robert LeBlanc wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Is this only a problem with EC base tiers or would replicated base
> tiers see this too?
In general, proxying to the base tier will work just fine if it's
replicated, so this is mostly an EC-only iss
Hello,
We are planning to build a 1PB Ceph cluster for RadosGW with Erasure
Code. It will be used for storing online videos.
We do not expect outstanding write performance; something like
200-300MB/s of sequential write will be quite enough, but data safety
is very important.
What are the most popular
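For context, a minimal sketch of how such an EC pool could be defined (the profile
name, k/m values, pg counts and pool name are only examples):
# Define an erasure-code profile (8 data + 3 coding chunks, failure domain = host)
ceph osd erasure-code-profile set ec-8-3 k=8 m=3 ruleset-failure-domain=host
# Create the EC pool that RGW data can then be placed on
ceph osd pool create rgw-videos-ec 2048 2048 erasure ec-8-3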
I will do my best to answer, but some of the questions are starting to stretch
the limit of my knowledge
> -Original Message-
> From: Huan Zhang [mailto:huan.zhang...@gmail.com]
> Sent: 12 February 2016 12:15
> To: Nick Fisk
> Cc: Irek Fasikhov ; ceph-users us...@ceph.com>
> Subject: Re
On Thu, 11 Feb 2016, Sage Weil wrote:
> On Thu, 11 Feb 2016, Nick Fisk wrote:
> > That’s a relief, I was sensing a major case of face palm occurring when I
> > read Jason's email!!!
>
> https://github.com/ceph/ceph/pull/7617
>
> The tangled logic in maybe_handle_cache wasn't respecting the force
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Sage Weil
> Sent: 12 February 2016 13:15
> To: Nick Fisk
> Cc: 'Jason Dillaman' ; 'Samuel Just'
> ; ceph-users@lists.ceph.com; ceph-
> de...@vger.kernel.org
> Subject: RE
Hi,
I'm seeing a lot of errors like the following. The root cause appears to be
the existence of a collection (garbage data) in the filestore. To clean
it up, I have to remove a set of empty directories. The directories are
old, created last August or September. I've had this happen a number
I have 7 servers, each containing 60 x 6TB drives in jbod mode. When I
first started, I only activated a couple drives on 3 nodes as Ceph OSDs.
Yesterday, I went to expand to the remaining nodes as well as prepare and
activate all the drives.
ceph-disk prepare worked just fine. However, ceph-disk
Can you check the value of kernel.pid_max? This may have to be increased for
larger OSD counts; it may have some bearing.
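A minimal sketch of checking and raising it (the value shown is just the 64-bit
maximum and the sysctl.d file name is only an example; pick what your OSD/thread
count actually needs):
# Check the current limit, raise it at runtime, and persist it across reboots
sysctl kernel.pid_max
sysctl -w kernel.pid_max=4194303
echo 'kernel.pid_max = 4194303' > /etc/sysctl.d/90-pid-max.conf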
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John
Hogenmiller (yt)
Sent: Friday, February 12, 2016 8:52 AM
To: ceph-users@lists.ceph.com
Subject:
Hello,
yesterday I upgraded our most busy (in other words lethally overloaded)
production cluster to the latest Firefly in preparation for a Hammer
upgrade and then phasing in of a cache tier.
When restarting the OSDs it took 3 minutes (1 minute in a consecutive
repeat to test the impact of prim
Hi,
On 02/12/2016 03:47 PM, Christian Balzer wrote:
Hello,
yesterday I upgraded our most busy (in other words lethally overloaded)
production cluster to the latest Firefly in preparation for a Hammer
upgrade and then phasing in of a cache tier.
When restarting the OSDs it took 3 minutes (1 min
John,
> 2016-02-12 12:53:43.340526 7f149bc71940 -1 journal FileJournal::_open: unable
> to setup io_context (0) Success
Try increasing aio-max-nr:
echo 131072 > /proc/sys/fs/aio-max-nr
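To see current usage and make the change persistent across reboots, something along
these lines should work (the sysctl.d file name is only an example):
# aio-nr shows how many AIO contexts are in use against the aio-max-nr limit
cat /proc/sys/fs/aio-nr /proc/sys/fs/aio-max-nr
echo 'fs.aio-max-nr = 131072' > /etc/sysctl.d/90-aio-max-nr.conf
sysctl -p /etc/sysctl.d/90-aio-max-nr.conf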
Best regards,
Alexey
On Fri, Feb 12, 2016 at 4:51 PM, John Hogenmiller (yt) wrote:
>
>
> I have 7 se
On Fri, 12 Feb 2016 15:56:31 +0100 Burkhard Linke wrote:
> Hi,
>
> On 02/12/2016 03:47 PM, Christian Balzer wrote:
> > Hello,
> >
> > yesterday I upgraded our most busy (in other words lethally overloaded)
> > production cluster to the latest Firefly in preparation for a Hammer
> > upgrade and th
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Christian Balzer
> Sent: 12 February 2016 15:38
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't
> uptosnuff)
>
> On Fri, 12 Feb 2
Nick is right. Setting noout is the right move in this scenario. Restarting an
OSD shouldn't block I/O unless nodown is also set, however. The exception to
this would be a case where min_size can't be achieved because of the down OSD,
i.e. min_size=3 and 1 of 3 OSDs is restarting. That would cer
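For reference, a rough sketch of the relevant commands around a planned restart
(the pool name is a placeholder):
# Prevent rebalancing during the restart window, and check how low the pool can go
ceph osd set noout
ceph osd pool get rbd min_size
# ... restart the OSD(s) ...
ceph osd unset noout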
I wonder if Christian is hitting some performance issue when the OSD or a
number of OSDs all start up at once? Or maybe the OSD is still doing some
internal startup procedure, and when the IO hits it on a very busy cluster,
it causes it to become overloaded for a few seconds?
I've seen similar thing
Hi Bart,
This email belongs in ceph-users (CC'ed), or maybe ceph-devel. You're
unlikely to get answers to this on ceph-community.
-Joao
On 09/17/2015 11:33 PM, bart.bar...@osnexus.com wrote:
> I'm running in a 3-node cluster and doing osd/rbd creation and deletion,
> and ran across this WARN
>
You're probably running into issues with sysvinit / upstart / whatever.
Try partitioning the DM and then mapping it directly in your ceph.conf under
the osd section.
It should work, ceph is just a process using the filesystem.
Tyler Bishop
Chief Technical Officer
513-299-7108 x
Great work as always, Sage!
Tyler Bishop
Chief Technical Officer
513-299-7108 x10
tyler.bis...@beyondhosting.net
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
What I've seen is that when an OSD starts up in a busy cluster, as
soon as it is "in" (it could be "out" before) it starts getting client
traffic. However, it has to be "in" to start catching up and peering with
the other OSDs in the cluster. The OSD is not
> -Original Message-
> From: Nick Fisk [mailto:n...@fisk.me.uk]
> Sent: 12 February 2016 13:31
> To: 'Sage Weil'
> Cc: 'Jason Dillaman' ; 'Samuel Just'
> ; ceph-users@lists.ceph.com; ceph-
> de...@vger.kernel.org
> Subject: RE: cls_rbd ops on rbd_id.$name objects in EC pool
>
> > -Ori
Does anyone know if there will be any representation of ceph at the Lustre
Users' Group in Portland this year?
If not, is there any event in the US that brings the ceph community together?
Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238
Hello fellow namesake.
Though I'm doubtful there will be representation in an official capacity at
the Lustre User's Group, you might want to check out Ceph Days...
http://ceph.com/cephdays/
On Fri, Feb 12, 2016 at 1:13 PM, Andrus, Brian Contractor
wrote:
> Does anyone know if there will be an
I could be wrong, but I didn't think a PG would have to peer when an OSD is
restarted with noout set. If I'm wrong, then this peering would definitely
block I/O. I just did a quick test on a non-busy cluster and didn't see any
peering when my OSD went down or up, but I'm not sure how good a test
I started a cluster with 9 OSD across 3 nodes. Then I expanded it to 419
OSDs across 7 nodes. Along the way, I increased the pg_num/pgp_num in the
rbd pool. Thanks to help earlier on this list, I was able to do that.
Tonight I started to do some perf testing and quickly realized that I never
upda
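One thing worth double-checking after an expansion like that is that pgp_num was
raised to match pg_num, since data only rebalances once pgp_num changes; a minimal
sketch (pool name as in the message):
# pg_num and pgp_num should normally match on the rbd pool
ceph osd pool get rbd pg_num
ceph osd pool get rbd pgp_num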
Hello,
for the record, what Robert writes below matches my experience best.
On Fri, 12 Feb 2016 22:17:01 + Steve Taylor wrote:
> I could be wrong, but I didn't think a PG would have to peer when an OSD
> is restarted with noout set. If I'm wrong, then this peering would
> definitely
Christian,
Yep, that describes what I see too. Good news is that I made a lot of
progress on optimizing the queue today: a 10-50% performance increase in my
microbenchmarks (that is only the improvement in enqueueing and dequeueing
ops, which is a small part of the whole IO path, but every little bit