My 0.02: there are two kinds of balance, one for space utilization and another for performance.
Now it seems you will be fine on space utilization, but you might suffer a bit on performance as the disk density increases. The new rack will hold 1/3 of the data on 1/5 of the disks, if we assume the work
Pre-allocate the volume by running "dd" across the entire RBD before you do any performance test :).
In this case, you may want to re-create the RBD, pre-allocate, and try again.
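A minimal sketch of what I mean by pre-allocating, using the kernel client (pool/image names are placeholders; an fio or librbd-based fill works just as well):
$ sudo rbd map rbd/testimage                               # exposes the image as e.g. /dev/rbd0
$ sudo dd if=/dev/zero of=/dev/rbd0 bs=4M oflag=direct     # touches every object once; dd stops with "no space left", which is expected
$ sudo rbd unmap /dev/rbd0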
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf O
Hi Mark,
The Async messenger result at 128K drops quickly after some point; is that because of the testing methodology?
The other conclusion, as it looks to me, is that simple messenger + jemalloc is the best practice so far, as it has the same performance as async but uses much less memory?
-Xiaoxi
Hi,
1. In short, the OSD needs to heartbeat with up to #PG x (#Replica - 1) peers, but in practice the number will be much lower since most of the peers are redundant.
For example, an OSD (say OSD 1) is holding 100 PGs; for some of these PGs, say PG 1, OSD 1 is the primary OSD of PG 1, so OSD 1 needs to peer
Hi Francois,
Actually you are discussing two separate questions here :)
1. With the 5 mons (2 in dc1, 2 in dc2, 1 across the WAN), can the monitors form a quorum? And how to offload the mon across the WAN?
Yes and no: in the case where you lose either of your DCs completely, that's fine, the remaining 3 monitors could
if there is data to be
> trimmed. I'm not a big fan of a "--skip-trimming" option as there is
> the potential to leave some orphan objects that may not be cleaned up
> correctly.
>
> On Tue, Jan 6, 2015 at 8:09 AM, Jake Young wrote:
> >
> >
> >
What do you think?
From: Jake Young [mailto:jak3...@gmail.com]
Sent: Monday, January 5, 2015 9:45 PM
To: Chen, Xiaoxi
Cc: Edwin Peer; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] rbd resize (shrink) taking forever and a day
On Sunday, January 4, 2015, Chen, Xiaoxi <xia
Some low-level caching might help: flashcache, dm-cache, etc.
But that may hurt reliability to some extent, and make things harder for the operator ☺
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Lindsay Mathieson
Sent: Monday, January 5, 2015 12:14 PM
To: Christian Balzer
You could use rbd info to see the block_name_prefix; the object name is formed as <block_name_prefix>.<object_number>, so for example rb.0.ff53.3d1b58ba.e6ad should be the 0xe6ad'th object of the volume with block_name_prefix rb.0.ff53.3d1b58ba.
$ rbd info huge
rbd image 'huge':
size 1024 TB in 26843
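If you want to double-check the mapping, you can list the backing objects by grepping for the prefix (pool name "rbd" here is just an example; note that objects only exist once something has been written to them, so a fresh image may list very little):
$ rbd info huge | grep block_name_prefix
$ rados -p rbd ls | grep '^rb\.0\.ff53\.3d1b58ba\.' | head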
Did you shut down the node with 2 mons?
I think it might be impossible to have redundancy with only 2 nodes; the Paxos quorum is the reason:
say you have N (N = 2K+1) monitors, then you always have one node (let's name it node A) with the majority of the MONs (>= K+1), and another node (node B) with the minority
Hi,
First of all, the data is safe since it is persistent in the journal; if an error occurs on the OSD data partition, replaying the journal will get the data back.
Also, there is a wbthrottle there; you can configure how much data (IOs, bytes, inodes) you want to remain in memory. A background thread will
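For reference, these are the knobs I mean; a minimal ceph.conf sketch assuming an XFS-backed filestore (the numbers are only illustrative, not a recommendation):
[osd]
    filestore wbthrottle enable = true
    filestore wbthrottle xfs bytes start flusher = 41943040     # start flushing at ~40 MB dirty
    filestore wbthrottle xfs bytes hard limit = 419430400       # block writers at ~400 MB dirty
    filestore wbthrottle xfs ios start flusher = 500
    filestore wbthrottle xfs ios hard limit = 5000
    filestore wbthrottle xfs inodes start flusher = 500
    filestore wbthrottle xfs inodes hard limit = 5000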
Hi Yang bin,
I am not sure you followed the right docs. I suspect you didn't, because you should use ceph-disk and specify an FS-Type in the command.
I think you might have been misled by the quick start (http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster), which uses a directory inst
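Something along these lines is what I had in mind (the device name is only an example, and ceph-disk will repartition it, so double-check before running):
$ sudo ceph-disk prepare --fs-type xfs /dev/sdb    # creates data + journal partitions and runs mkfs.xfs
$ sudo ceph-disk activate /dev/sdb1                # registers and starts the new OSD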
...@gmail.com]
Sent: Tuesday, December 2, 2014 1:27 PM
To: Chen, Xiaoxi
Cc: ceph-us...@ceph.com; Haomai Wang
Subject: Re: [ceph-users] LevelDB support status is still experimental on Giant?
Hi Xiaoxi,
Thanks for very useful information.
Can you share more details about the "terribly bad performance"?
had better optimize the key-value backend code to support the specific kind of load.
From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Monday, December 1, 2014 10:14 PM
To: Chen, Xiaoxi
Cc: Satoru Funai; ceph-us...@ceph.com
Subject: Re: [ceph-users] LevelDB support status is still
We have tested it for a while; basically it seems kind of stable, but it shows terribly bad performance.
This is not the fault of Ceph but of LevelDB, or more generally of all K-V storage with an LSM design (RocksDB, etc.); the LSM tree structure naturally introduces very large write amplification, 10X to
Hi Simon
Does your workload have lots of read-after-write (RAW)? Since Ceph has a RW lock on each object, if you have a write to the RBD and the following read happens to hit the same object, the latency will be higher.
Another possibility is the OSD op_wq; it's a priority queue, but read and write have the same pr
Hi Chris,
I am not an expert on LIO, but from your result it seems RBD/Ceph works well (RBD on the local system, no iSCSI) and LIO works well (ramdisk (no RBD) -> LIO target), and if you change LIO to use another interface (file, loopback) to play with RBD, it also works well.
So see
Hi Mark
It's client IOPS and we use replica = 2; the journal and OSD are hosted on the same SSDs, so the real IOPS is 23K * 2 * 2 = 92K, still far from the HW limit (30K+ for a single DCS3700).
CPU% is ~62% at peak (2 VMs), interrupts distributed.
As an additional piece of information, it seems the cluster is in a kind
Could you show your cache tiering configuration? Especially these three parameters:
ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
ceph osd pool set hot-storage cache_target_full_ratio 0.8
ceph osd pool set {cachepool} target_max_bytes {#bytes}
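You can read back what is actually set from the OSD map (pool name "hot-storage" as in the examples above; on newer releases "ceph osd pool get" also accepts the cache parameters directly):
$ ceph osd dump | grep hot-storage
$ ceph osd pool get hot-storage cache_target_dirty_ratio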
From: ceph-users [mailto:ceph-users-bo
Yes, but usually a system has several layers of error-detecting/recovering mechanisms at different granularities.
Disk CRC works at the sector level, Ceph CRC mostly works at the object level, and we also have replication/erasure coding at the system level.
The CRC in Ceph mainly handles the case where, imagining you have a
Sent from my iPhone
The "random" may come from ceph trunks. For RBD, Ceph trunk the image to
4M(default) objects, for Rados bench , it already 4M objects if you didn't set
the parameters. So from XFS's view, there are lots of 4M files, in default,
with ag!=1 (allocation group, specified during mkfs, default seems t
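For reference, the allocation group count is fixed at mkfs time; a hypothetical example (device, mount point and count are placeholders, not a recommendation):
$ sudo mkfs.xfs -f -d agcount=4 /dev/sdb1              # create the OSD filesystem with 4 allocation groups
$ xfs_info /var/lib/ceph/osd/ceph-0 | grep agcount     # check what an existing filesystem was created with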
From: zrz...@gmail.com [mailto:zrz...@gmail.com] On Behalf Of Rongze Zhu
Sent: Monday, July 29, 2013 2:18 PM
To: Chen, Xiaoxi
Cc: Gregory Farnum; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] add crush rule in one command
On Sat, Jul 27, 2013 at 4:25 PM, Chen, Xiaoxi <xiaoxi.c
My 0.02:
1. Why do you need to set the map simultaneously for your purpose? It is obviously very important for Ceph to have an atomic CLI, but that is because the map may be changed by the cluster itself (losing a node or the like), which is not your case. Since the map can be auto-distributed by ce
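If it helps, the usual way to apply a whole batch of CRUSH changes in one step is to edit the decompiled map offline and inject it back as a whole; a sketch (file names are arbitrary):
$ ceph osd getcrushmap -o crushmap.bin         # fetch the current map
$ crushtool -d crushmap.bin -o crushmap.txt    # decompile to text
$ vi crushmap.txt                              # add/adjust all rules in one go
$ crushtool -c crushmap.txt -o crushmap.new    # recompile
$ ceph osd setcrushmap -i crushmap.new         # inject; the whole change lands at once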
Sent from my iPhone
On Jul 23, 2013, at 4:35, "Charles 'Boyo" <charlesb...@gmail.com> wrote:
Hi,
On Mon, Jul 22, 2013 at 2:08 AM, Chen, Xiaoxi <xiaoxi.c...@intel.com> wrote:
Hi,
> Can you share any information on the SSD you are using, is it PCIe
connect
Sent from my iPhone
On Jul 22, 2013, at 23:16, "Gandalf Corvotempesta" wrote:
> 2013/7/22 Chen, Xiaoxi :
>> With “journal writeahead”, the data is first written to the journal, acked to the
>> client, and then written to the OSD; note that the data is always kept in memory before
>> it is written to both
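For completeness, the journal mode being described is selected in ceph.conf; a minimal sketch (writeahead is already the default on XFS, so this is mostly illustrative):
[osd]
    filestore journal writeahead = true    # journal first, ack, then write to the filestore
    # filestore journal parallel = true    # btrfs only: journal and filestore written in parallel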
Basically I think endurance is the most important thing for a Ceph journal, since the workload for the journal is pure writes; you can easily calculate how long your SSD will take to burn out. Even if we assume your SSD only runs at 100MB/s on average, you will burn through about 8TB/day and 240TB/month.
The DCS 3500 is definitely not use
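A quick back-of-envelope check of that figure (plain arithmetic, nothing Ceph-specific):
$ awk 'BEGIN { print 100 * 86400 / 1e6, "TB/day" }'    # 100 MB/s sustained for one day
8.64 TB/day
So roughly 8-9 TB written per day, or around 260 TB per month, which is in the same ballpark as the numbers above.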
Sent from my iPhone
On Jul 23, 2013, at 0:21, "Gandalf Corvotempesta" wrote:
> 2013/7/22 Chen, Xiaoxi :
>> Imagine you have several writes that have been flushed to the journal and acked, but
>> not yet written to disk. Now the system crashes from a kernel panic or power
>> failure; you will lose
Hi,
My 0.02 :
> Secondly, I'm unclear about how OSDs use the journal. It appears they write to the journal (in all cases, can't be turned
> off), ack to the client and then read the journal later to write to backing
> storage. Is that correct?
I would like to say no; the journal w
2:17 PM
To: Chen, Xiaoxi
Cc: ceph-de...@vger.kernel.org; ceph-us...@ceph.com
Subject: Re: Any concern about Ceph on CentOS
Hi Xiaoxi,
we are really running Ceph on CentOS-6.4
(6 server nodes, 3 client nodes, 160 OSDs).
We put a 3.8.13 Kernel on top and installed the ceph-0.61.4 cluster with
mkc
rstanding of the issue is that for the actual cluster itself, it should be OK.
I could be wrong here, but I thought the kernel module was only needed for mounting CephFS (and even then, there's a FUSE module that you *can* use anyway).
On 07/17/2013 11:18 AM, Chen, Xiaoxi
Hi list,
I would like to ask if anyone really runs Ceph on CentOS/RHEL. Since the kernel version for CentOS/RHEL is much older than that of Ubuntu, I am wondering whether there are any known performance or functionality issues.
Thanks to everyone who can share their insight on Ceph+CentOS.
threads. This is still too high for 8-core or 16-core CPUs and will waste a lot of cycles in context switching.
Sent from my iPhone
On Jun 7, 2013, at 0:21, "Gregory Farnum" wrote:
> On Thu, Jun 6, 2013 at 12:25 AM, Chen, Xiaoxi wrote:
>>
>> Hi,
>> From the code, each pi
Hi,
From the code, each pipe (which contains a TCP socket) forks 2 threads, a reader and a writer. We really do observe 100+ threads per OSD daemon with 30 instances of rados bench as clients.
But this number seems a bit crazy: if I have a 40-disk node, I will thus have 40 OSDs; we
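If anyone wants to reproduce the observation, counting the threads of a single OSD daemon is enough (this assumes one ceph-osd process on the node; otherwise pick a specific PID):
$ ls /proc/$(pidof -s ceph-osd)/task | wc -l    # threads in one OSD daemon
$ ps -eLf | grep '[c]eph-osd' | wc -l           # or: threads across all OSD daemons on the node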
iaoxi
-Original Message-
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: June 4, 2013 0:37
To: Chen, Xiaoxi
Cc: ceph-de...@vger.kernel.org; Mark Nelson (mark.nel...@inktank.com);
ceph-us...@ceph.com
Subject: Re: [ceph-users] Ceph killed by OS because of OOM under high load
On Mon, Jun 3, 2013 at 8:
My 0.02: you really don't need to wait for HEALTH_OK between your recovery steps, just go ahead. Every time a new map is generated and broadcast, the old map and any in-progress recovery will be canceled.
Sent from my iPhone
On Jun 2, 2013, at 11:30, "Nigel Williams" wrote:
> Could I have a critique of this approach pl
Hi,
As my previous mail reported some weeks ago, we are suffering from OSD crashes / OSD flipping / system reboots and so on, and all these stability issues really stop us from digging further into Ceph characterization.
The good news is that we seem to have found the cause. Let me explain our experiment
Hi,
Can I assume I am safe without this patch if I don't use any RBD cache?
Sent from my iPhone
On May 29, 2013, at 16:00, "Alex Bligh" wrote:
>
> On 28 May 2013, at 06:50, Wolfgang Hennerbichler wrote:
>
>> for anybody who's interested, I've packaged the latest qemu-1.4.2 (not 1.5,
>> it didn't work nicel
I cannot agree more. When I try to promote Ceph to internal stakeholders, they always complain about the stability of Ceph, especially when they evaluate Ceph under high enough pressure and it cannot stay healthy during the test.
Sent from my iPhone
On May 29, 2013, at 19:13, "Wolfgang Hennerbichler" wrote:
> H
ormal.
Xiaoxi
-Original Message-
From: Chen, Xiaoxi
Sent: May 16, 2013 6:38
To: 'Sage Weil'
Subject: RE: [ceph-users] OSD state flipping when cluster-network in high
utilization
Uploaded to /home/cephdrop/xiaoxi_flip_osd/osdlog.tar.gz
Thanks
-Original Me
Thanks, but I am not quite sure how to determine whether the monitor is overloaded. And if it is, will starting several monitors help?
Sent from my iPhone
On May 15, 2013, at 23:07, "Jim Schutt" wrote:
> On 05/14/2013 09:23 PM, Chen, Xiaoxi wrote:
>>> How responsive generally is the machine
853'4329,4103'5330] local-les=4092 n=154 ec
=100 les/c 4092/4093 4091/4091/4034) [319,46] r=0 lpr=4091 mlcod 4103'5329
active+clean] do_op mode is idle(wr=0)
2013-05-15 15:29:22.513295 7f0253340700 10 osd.319 pg_epoch: 4113 pg[3.d7( v
4103'5330 (3853'4329,4103'5330]
d like to say it may be related to the CPU scheduler? The heartbeat thread (in a busy OSD) fails to get enough CPU cycles.
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
Sent: May 15, 2013 7:23
To: Chen, Xiaoxi
Cc: Mark
h >30% iowait). Enabling jumbo frames **seems** to make things worse (just a feeling, no data to support it).
Sent from my iPhone
On May 14, 2013, at 23:36, "Mark Nelson" wrote:
> On 05/14/2013 10:30 AM, Sage Weil wrote:
>> On Tue, 14 May 2013, Chen, Xiaoxi wrote:
>>>
>>> Hi
>>>
Hi
We are suffering from our OSDs flipping between up and down (OSD X gets voted down due to 3 missed pings, and after a while it tells the monitor "map xxx wrongly marked me down"). Because we are running sequential write performance tests on top of RBDs, and the cluster network NICs are really in h
Why would you want a pool with replication=0?
Sent from my iPhone
On Apr 10, 2013, at 18:59, "Witalij Poljatchek" <witalij.poljatc...@aixit.com> wrote:
Hello,
I need help to solve a segfault on all OSDs in my test cluster.
I set up Ceph from scratch.
service ceph -a start
ceph -w
health HEALTH_OK
m
Hi Mark,
I think you are the right man for these questions :) I really don't understand how osd_client_message_size_cap, objecter_inflight_op_bytes/ops, and ms_dispatch_throttle_bytes work, and how they affect performance.
Especially objecter_inflight_op_bytes, which seems to be used
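For reference, the options I am asking about live in ceph.conf; the values below are only what I believe the shipped defaults to be, quoted here so the question is concrete (not a tuning suggestion):
[client]
    objecter inflight ops = 1024                 # max client ops in flight
    objecter inflight op bytes = 104857600       # ~100 MB of op data in flight
[osd]
    osd client message size cap = 524288000      # ~500 MB of client messages held in memory
    ms dispatch throttle bytes = 104857600       # ~100 MB waiting to be dispatched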
Are you using a partition as the journal?
From: ceph-users-boun...@lists.ceph.com
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Aleksey Samarin
Sent: March 26, 2013 20:45
To: ceph-us...@ceph.com
Subject: [ceph-users] Journal size
Hello everyone!
I have a question about the journal. The Ceph cluster is
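For context, both a dedicated partition and a file-backed journal are configured in ceph.conf; a minimal sketch with placeholder paths (the size is only an example, in MB, and mostly matters for file-backed journals):
[osd]
    osd journal = /dev/disk/by-partlabel/osd0-journal    # raw partition; its size is taken from the partition itself
    # osd journal = /var/lib/ceph/osd/ceph-0/journal     # file-backed alternative
    # osd journal size = 10240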
-Original Message-
From: Sage Weil [mailto:s...@inktank.com]
Sent: March 25, 2013 23:35
To: Chen, Xiaoxi
Cc: 'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com);
ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] Ceph Crach at sync_thread_timeout after heavy random
writes.
Hi Xiaox
Let me rephrase it to make it clearer.
From: ceph-users-boun...@lists.ceph.com
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Chen, Xiaoxi
Sent: March 25, 2013 17:02
To: 'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)
Cc: ceph-de...@vger.kernel.org
Subject: [ceph-users] Cep
say the issue I hit is a different issue? (not #3737)
> Wolfgang
xiaoxi
>
> On 03/25/2013 10:15 AM, Chen, Xiaoxi wrote:
>>
>>
>> Hi Wolfgang,
>>
>> Thanks for the reply, but why is my problem related to issue #3737? I
is could be related to this issue here and has been reported multiple
> times:
>
> http://tracker.ceph.com/issues/3737
>
> In short: They're working on it, they know about it.
>
> Wolfgang
>
> On 03/25/2013 10:01 AM, Chen, Xiaoxi wrote:
>> Hi list,
>&g
Hi list,
We have hit and reproduced this issue several times: Ceph will suicide because of "FileStore: sync_entry timed out" after very heavy random IO on top of the RBD.
My test environment is:
a 4-node Ceph cluster with 20 HDDs for OSDs and 4 Intel
Hi List,
I cannot start my monitor after updating my cluster to v0.59. Please note that I am not trying to upgrade, but am reinstalling the Ceph software stack and rerunning mkcephfs. I have seen that the monitor changed a lot after 0.58; does mkcephfs still have bugs?
Below is the log:
Thanks Josh, the problem was solved by updating Ceph on the Glance node.
Sent from my iPhone
On Mar 20, 2013, at 14:59, "Josh Durgin" wrote:
> On 03/19/2013 11:03 PM, Chen, Xiaoxi wrote:
>> I think Josh may be the right man for this question ☺
>>
>> To be more precise, I would l
I think Josh may be the right man for this question ☺
To be more precise, I would like to add a few more words about the status:
1. We have configured “show_image_direct_url = True” in Glance, and from the Cinder-volume log we can make sure we have got a direct_url, for example
image_id 6565d775-
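For anyone following along, this is the Glance side of that setting; a minimal glance-api.conf sketch (only this option is relevant here):
[DEFAULT]
show_image_direct_url = True    # expose the rbd:// location in image metadata so Cinder/Nova can clone instead of copying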
For me, we have seen a Supermicro machine which is 2U with 2 CPUs and 24 2.5-inch SATA/SAS drives, together with 2 onboard 10Gb NICs. I think it's good enough for both density and computing power.
At the other end, we are also planning to evaluate small nodes for Ceph, say an Atom with 2/4 disks per