[ceph-users] 2 pgs stuck in undersized after cluster recovery

2018-06-29 Thread shadow_lin
uot;: "chooseleaf_firstn", "num": 0, "type": "host" }, { "op": "emit" } ] } 2018-06-30 shadow_lin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Uneven data distribution with even pg distribution after rebalancing

2018-06-25 Thread shadow_lin
This is the formatted pg dump result: https://pasteboard.co/HrBZv3s.png You can see the pg distribution of each pool on each osd is fine. 2018-06-26 shadow_lin From: David Turner Sent: 2018-06-26 10:32 Subject: Re: Re: Re: [ceph-users] Uneven data distribution with even pg distribution after

Re: [ceph-users] Uneven data distribution with even pg distribution after rebalancing

2018-06-25 Thread shadow_lin
ec_rbd_pool 3 219T 81.40 50172G 57441718 rbd_pool 4 144 0 37629G 19 2018-06-26 shadow_lin From: David Turner Sent: 2018-06-26 10:21 Subject: Re: Re: [ceph-users] Uneven data distribution with even pg distribution after rebalancing To

Re: [ceph-users] Uneven data distribution with even pg distribution after rebalancing

2018-06-25 Thread shadow_lin
hash rjenkins pg_num 128 pgp_num 128 last_change 3248 flags hashpspool,nearfull stripe_width 0 application rbd pg distribution across osds for all pools: https://pasteboard.co/HrBZv3s.png What I don't understand is why data distribution is uneven when pg distribution is even. 2018-06

[ceph-users] Uneven data distribution with even pg distribution after rebalancing

2018-06-24 Thread shadow_lin
ected, or did I misconfigure something, or is it some kind of bug? Thanks 2018-06-25 shadow_lin

Re: [ceph-users] Does jewel 10.2.10 support filestore_split_rand_factor?

2018-04-08 Thread shadow_lin
folder when clients are writing data into the cluster. 2018-04-08 shadow_lin From: David Turner Sent: 2018-04-07 03:33 Subject: Re: [ceph-users] Does jewel 10.2.10 support filestore_split_rand_factor? To: "shadow_lin" Cc: "Pavan Rallabhandi","ceph-users" You could randomi

Re: [ceph-users] Does jewel 10.2.10 support filestore_split_rand_factor?

2018-04-01 Thread shadow_lin
Thanks. Is there any workaround for 10.2.10 to avoid all osds starting to split at the same time? 2018-04-01 shadowlin From: Pavan Rallabhandi Sent: 2018-04-01 22:39 Subject: Re: [ceph-users] Does jewel 10.2.10 support filestore_split_rand_factor? To: "shadow_lin","ceph-users"
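Note: without filestore_split_rand_factor in 10.2.10, a commonly suggested workaround is to raise the split threshold so splitting is deferred and spread out. A hedged sketch with example values only (a split is triggered at roughly filestore_split_multiple * abs(filestore_merge_threshold) * 16 files per subfolder; some filestore options may need an OSD restart to take effect):

# Raise the split threshold at runtime; make it persistent in ceph.conf [osd] as well
ceph tell osd.* injectargs '--filestore_split_multiple 8 --filestore_merge_threshold 40'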

[ceph-users] Does jewel 10.2.10 support filestore_split_rand_factor?

2018-04-01 Thread shadow_lin
ax_split_count": "32", "journaler_allow_split_entries": "true", "mds_bal_split_size": "1", "mds_bal_split_rd": "25000", "mds_bal_split_wr": "1", "mds_bal_split

Re: [ceph-users] ceph mgr balancer bad distribution

2018-03-29 Thread shadow_lin
Hi Stefan, > On 28.02.2018 at 13:47, Stefan Priebe - Profihost AG wrote: >> Hello, >> >> with jewel we always used the python crush optimizer which gave us a >> pretty good distribution of the used space. >> You mentioned a python crush optimizer for jewel. Could you tell me where I can find it? Ca

Re: [ceph-users] remove big rbd image is very slow

2018-03-27 Thread shadow_lin
I have done that before, but most of the time I can't just delete the pool. Is there any other way to speed up the rbd image deletion? 2018-03-27 shadowlin From: Ilya Dryomov Sent: 2018-03-26 20:09 Subject: Re: [ceph-users] remove big rbd image is very slow To: "shadow_lin" Cc: "

[ceph-users] remove big rbd image is very slow

2018-03-17 Thread shadow_lin
Hi list, My ceph version is jewel 10.2.10. I tried to use rbd rm to remove a 50TB image (without object map because krbd doesn't support it). It takes about 30 mins to complete just about 3%. Is this expected? Is there a way to make it faster? I know there are scripts to delete rados objects of the r
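Note: the object-level deletion approach mentioned here usually looks roughly like the sketch below; the pool and image names (rbd_pool, big_image) and the object prefix are placeholders:

# Find the image's data-object prefix, e.g. rbd_data.1234567890ab
rbd info rbd_pool/big_image | grep block_name_prefix
# Delete the data objects in parallel, then remove the now-empty image
rados -p rbd_pool ls | grep rbd_data.1234567890ab | xargs -P 8 -n 100 rados -p rbd_pool rm
rbd rm rbd_pool/big_image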

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-11 Thread shadow_lin
the lock later when the lock is released. 2018-03-11 shadowlin From: Jason Dillaman Sent: 2018-03-11 07:46 Subject: Re: Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock To: "shadow_lin" Cc: "Mike Christie","Lazuardi Nasution","Ceph Users

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-10 Thread shadow_lin
and overwrite the new writes? PS: Petasan says they can do active/active iscsi with a patched SUSE kernel. 2018-03-10 shadowlin From: Jason Dillaman Sent: 2018-03-10 21:40 Subject: Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock To: "shadow_lin" Cc: "Mike Christie"

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-10 Thread shadow_lin
active/active? What mechanism should be implemented to avoid the problem with active/passive and active/active multipath? 2018-03-10 shadowlin From: Mike Christie Sent: 2018-03-09 00:54 Subject: Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock To: "shadow_lin","Laz

Re: [ceph-users] [jewel] High fs_apply_latency osds

2018-03-10 Thread shadow_lin
because I have some problems running xfs for now. I am trying to better balance the pg distribution now to see if it can ease the high latency problem. 2018-03-10 shadowlin From: Chris Hoy Poy Sent: 2018-03-10 09:44 Subject: Re: [ceph-users] [jewel] High fs_apply_latency osds To: "shadow_lin"

[ceph-users] [jewel] High fs_apply_latency osds

2018-03-09 Thread shadow_lin
Hi list, During my write test, I find there are always some osds with high fs_apply_latency (1k-5k ms, 2-8 times more than the others). At first I thought it was caused by unbalanced pg distribution, but after I reweighted the osds the problem hasn't gone away. I looked into the osds with high latency

Re: [ceph-users] Uneven pg distribution cause high fs_apply_latency on osds with more pgs

2018-03-08 Thread shadow_lin
exception table in the osdmap in luminous 12.2.x. It is said that with this it is possible to achieve perfect pg distribution among osds. 2018-03-09 shadow_lin From: David Turner Sent: 2018-03-09 06:45 Subject: Re: [ceph-users] Uneven pg distribution cause high fs_apply_latency on osds with more pgs To

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-07 Thread shadow_lin
Hi David, Thanks for the info. Could I assume that if I use active/passive multipath with rbd exclusive lock, then all targets which support rbd (via block) are safe? 2018-03-08 shadow_lin From: David Disseldorp Sent: 2018-03-08 08:47 Subject: Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-07 Thread shadow_lin
Hi Christie, Is it safe to use active/passive multipath with krbd with exclusive lock for lio/tgt/scst/tcmu? Is it safe to use active/active multipath if using the SUSE kernel with target_core_rbd? Thanks. 2018-03-07 shadowlin From: Mike Christie Sent: 2018-03-07 03:51 Subject: Re: [ceph-users] iSCSI M

[ceph-users] Uneven pg distribution cause high fs_apply_latency on osds with more pgs

2018-03-07 Thread shadow_lin
Hi list, Ceph version is jewel 10.2.10 and all osds are using filestore. The cluster has 96 osds and 1 pool with size=2 replication and 4096 pgs (based on the pg calculation method from the ceph doc, targeting 100 pgs per osd). The osd with the most pgs has 104 PGs and there are 6 osds with above 100 PGs M
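For reference, the pg count in this setup works out roughly as follows (using the common 100-PGs-per-OSD target; a sketch, not the exact sizing used):

# target_pg_num ~= (osd_count * pgs_per_osd) / pool_size, rounded to a power of two
echo $(( 96 * 100 / 2 ))   # 4800 -> nearest power of two used here is 4096
# Check the resulting per-OSD PG counts (PGS column)
ceph osd df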

Re: [ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?

2018-03-07 Thread shadow_lin
What you said makes sense. I have encountered a few hardware-related issues that caused one osd to work abnormally and block all io of the whole cluster (all osds in one pool), which makes me think about how to avoid this situation. 2018-03-07 shadow_lin From: David Turner Sent: 2018-03-07 13:51 Subject: Re

Re: [ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?

2018-03-06 Thread shadow_lin
Hi Turner, Thanks for your insight. I am wondering, if the mon can detect slow/blocked requests from a certain osd, why can't the mon mark an osd with blocked requests down if the requests are blocked for a certain time? 2018-03-07 shadow_lin From: David Turner Sent: 2018-03-06 23:56 Subject: Re: [ceph-

[ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?

2018-03-03 Thread shadow_lin
o let ceph mark the crippled osd down if the requests directed to that osd are blocked for more than a certain time, to avoid the whole cluster being blocked? 2018-03-04 shadow_lin

Re: [ceph-users] how is iops from ceph -s client io section calculated?

2018-03-03 Thread shadow_lin
David Turner Sent: 2018-03-03 22:35 Subject: Re: [ceph-users] how is iops from ceph -s client io section calculated? To: "shadow_lin" Cc: "ceph-users" I would guess that the higher iops in ceph status are from iops calculated from replication. fio isn't aware of the backend re

[ceph-users] how is iops from ceph -s client io section calculated?

2018-03-02 Thread shadow_lin
Hi list, There is a client io section in the result of ceph -s. I find its values kind of confusing. I am using fio to test rbd sequential write performance with a 4m block size. The throughput is about 2000MB/s and fio shows the iops is 500. But from the ceph -s client io section the throughput is abo
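A rough worked example of why the two numbers can differ, assuming a 2x replicated pool (the pool size is an assumption, following the guess about replication in the reply above):

# fio side: 2000 MB/s at 4 MB per IO  ->  2000 / 4 = 500 IOPS
# cluster side: each client write lands on size=2 OSDs, so ceph -s can show
# roughly 2 * 2000 MB/s and ~1000 op/s, plus journal/metadata traffic on top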

[ceph-users] [luminous 12.2.2] Cache tier doesn't work properly

2018-02-13 Thread shadow_lin
sd pool set hot-pool cache_target_full_ratio 0.8 set pool 39 cache_target_full_ratio to 0.8 # # ceph osd pool set hot-pool cache_min_flush_age 600 set pool 39 cache_min_flush_age to 600 # # ceph osd pool set hot-pool cache_min_evict_age 1800 set pool 39 cache_min_evict_age to 1800 2018-02-13 shadow_lin __
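For completeness, a typical writeback cache-tier setup around those settings looks roughly like this; hot-pool follows the snippet, while cold-pool and the target size are placeholders, not the exact configuration used here:

# Attach hot-pool as a writeback cache in front of cold-pool
ceph osd tier add cold-pool hot-pool
ceph osd tier cache-mode hot-pool writeback
ceph osd tier set-overlay cold-pool hot-pool
# A hit_set and target sizes are required for flushing/eviction to kick in
ceph osd pool set hot-pool hit_set_type bloom
ceph osd pool set hot-pool target_max_bytes 1099511627776   # 1 TiB, example value
ceph osd pool set hot-pool cache_target_dirty_ratio 0.4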

[ceph-users] How does cache tier work in writeback mode?

2018-02-08 Thread shadow_lin
Hi list, I am testing cache tier in writeback mode. The test result is confusing. The write performance is worse than without a cache tier. The hot storage pool is an all ssd pool and the cold storage pool is an all hdd pool. I also created an hddpool and an ssdpool with the same crush rule as the

Re: [ceph-users] How to clean data of osd with ssd journal(wal, db if it is bluestore) ?

2018-02-01 Thread shadow_lin
How to clean data of osd with ssd journal(wal, db if it is bluestore) ? To: "shadow_lin" Cc: "ceph-users" Hi Lin, We do the extra dd after zapping the disk. ceph-disk has a zap function that uses wipefs to wipe fs traces, dd to zero 10MB at partition starts, then sgdisk t
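A minimal sketch of doing the same by hand for a single journal/DB partition; /dev/sdb and partition 3 are placeholders, so double-check the device, since the same SSD holds journals for other OSDs:

# Wipe fs signatures and zero the first 10 MB of just that partition
wipefs -a /dev/sdb3
dd if=/dev/zero of=/dev/sdb3 bs=1M count=10 oflag=direct
# Optionally delete the partition entry and re-read the table
sgdisk --delete=3 /dev/sdb
partprobe /dev/sdb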

Re: [ceph-users] How to clean data of osd with ssd journal(wal, db if it is bluestore) ?

2018-02-01 Thread shadow_lin
l, db if it is bluestore) ? To: "David Turner" Cc: "shadow_lin","ceph-users" I would recommend, as Wido did, using the dd command. The block db device holds the metadata/allocation of objects stored in the data block; not cleaning this is asking for problems, besides it does

Re: [ceph-users] How to clean data of osd with ssd journal(wal, db if it is bluestore) ?

2018-01-31 Thread shadow_lin
:2018-01-31 17:24 Subject: Re: [ceph-users] How to clean data of osd with ssd journal(wal, db if it is bluestore) ? To: "shadow_lin" Cc: "ceph-users" I use gdisk to remove the partition and partprobe for the OS to see the new partition table. You can script it with sgdisk. On W

[ceph-users] How to clean data of osd with ssd journal(wal, db if it is bluestore) ?

2018-01-31 Thread shadow_lin
,db if it is bluestore) of the osd I want to remove? Especially when other osds are using other partitions of the same ssd as journals (wal, db if it is bluestore). 2018-01-31 shadow_lin

Re: [ceph-users] How ceph client read data from ceph cluster

2018-01-26 Thread shadow_lin
read data from ceph cluster To: "shadow_lin" Cc: "ceph-users" On 2018-01-26 09:09, shadow_lin wrote: Hi List, I read an old article about how a ceph client reads from a ceph cluster. It said the client only reads from the primary osd. Since a ceph cluster in replicate mode has se

[ceph-users] How ceph client read data from ceph cluster

2018-01-25 Thread shadow_lin
article is rather old so maybe ceph has improved to read from all the copies? But I haven't found any info about that. Any info about that would be appreciated. Thanks 2018-01-26 shadow_lin

Re: [ceph-users] Limit deep scrub

2018-01-14 Thread shadow_lin
hi, you can try adjusting osd_scrub_chunk_min, osd_scrub_chunk_max and osd_scrub_sleep. osd scrub sleep Description: Time to sleep before scrubbing the next group of chunks. Increasing this value will slow down the whole scrub operation while client operations will be less impacted. Type: Flo
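A hedged example of applying those knobs at runtime; the values are illustrative and trade a longer scrub for lower client impact:

# Smaller chunks plus a sleep between chunks reduce scrub impact on clients
ceph tell osd.* injectargs '--osd_scrub_chunk_min 1 --osd_scrub_chunk_max 5 --osd_scrub_sleep 0.1'
# Optionally confine (deep) scrubs to off-peak hours
ceph tell osd.* injectargs '--osd_scrub_begin_hour 1 --osd_scrub_end_hour 7'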

Re: [ceph-users] How to speed up backfill

2018-01-10 Thread shadow_lin
Hi, Mine is purely backfilling (removing an osd from the cluster) and it started at 600MB/s and ended at about 3MB/s. How is your recovery made up? Is it backfill or log-replay pg recovery or both? 2018-01-11 shadow_lin From: Josef Zelenka Sent: 2018-01-11 15:26 Subject: Re: [ceph-users] How

[ceph-users] How to speed up backfill

2018-01-10 Thread shadow_lin
Hi all, I am playing with settings for backfill to try to find out how to control the speed of backfill. So far I only find that "osd max backfills" has an effect on the backfill speed. But after all the pgs that need to be backfilled begin backfilling, I can't find any way to speed up backfills. Especially when it c
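The knobs usually mentioned for pushing backfill harder, as a hedged runtime example (values are illustrative; higher values cost client IO):

ceph tell osd.* injectargs '--osd_max_backfills 4 --osd_recovery_max_active 8 --osd_recovery_op_priority 10'
# On luminous, lowering osd_recovery_sleep_hdd / osd_recovery_sleep_ssd can also help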

Re: [ceph-users] Bad crc causing osd hang and block all request.

2018-01-10 Thread shadow_lin
Thanks for your advice. I rebuilt the osd and haven't had this happen again. So it could be corruption on the hdds. 2018-01-11 lin.yunfan From: Konstantin Shalygin Sent: 2018-01-09 12:11 Subject: Re: [ceph-users] Bad crc causing osd hang and block all request. To: "ceph-users" Cc: > What could ca

[ceph-users] Bad crc causing osd hang and block all request.

2018-01-08 Thread shadow_lin
Hi list, ceph version: luminous 12.2.2. The cluster was doing a write throughput test when this problem happened. The cluster health became error: Health check update: 27 stuck requests are blocked > 4096 sec (REQUEST_STUCK). Clients can't write any data into the cluster. osd22 and osd40 are the osds wh

[ceph-users] [luminous 12.2.2]bluestore cache uses much more memory than setting value

2018-01-06 Thread shadow_lin
Hi all, I already know that luminous uses more memory for bluestore cache than the config setting, but I was expecting 1.5x, not 7-8x. Below is my bluestore cache setting: [osd] osd max backfills = 4 bluestore_cache_size = 134217728 bluestore_cache_kv_max = 134217728 osd client messag
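To compare the configured cache size with what an OSD actually holds, the luminous admin socket can be queried roughly like this (osd.0 is a placeholder id):

# bluestore_cache_* mempools reflect the cache itself; pglog, osdmaps and buffers come on top
ceph daemon osd.0 dump_mempools
ceph daemon osd.0 config get bluestore_cache_size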

[ceph-users] How to monitor slow request?

2017-12-26 Thread shadow_lin
> 32 sec (REQUEST_SLOW) There is no osd id info about where the slow request happened. What would be a proper way to monitor which osd caused the slow requests and how many slow requests are on that osd? 2017-12-27 shadow_lin
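On luminous the slow requests can usually be pinned to an OSD roughly like this (osd.12 is a placeholder id):

# health detail names the OSDs implicated in REQUEST_SLOW
ceph health detail
# then inspect the ops on that OSD via its admin socket
ceph daemon osd.12 dump_ops_in_flight
ceph daemon osd.12 dump_historic_ops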

Re: [ceph-users] [luminous 12.2.2] Cluster write performance degradation problem(possibly tcmalloc related)

2017-12-26 Thread shadow_lin
I had disabled scrub before the test. 2017-12-27 shadow_lin From: Webert de Souza Lima Sent: 2017-12-22 20:37 Subject: Re: [ceph-users] [luminous 12.2.2] Cluster write performance degradation problem(possibly tcmalloc related) To: "ceph-users" Cc: On Thu, Dec 21, 2017 at 12:52 PM,

[ceph-users] [luminous 12.2.2] Cluster write performance degradation problem(possibly tcmalloc related)

2017-12-21 Thread shadow_lin
My testing cluster is an all hdd cluster with 12 osds (10T hdd each). I monitor luminous 12.2.2 write performance and osd memory usage with grafana graphs for statistics logging. The test is done by using fio on a mounted rbd with the following fio parameters: fio -directory=fiotest -direct=1 -thread -rw=w

Re: [ceph-users] [Luminous 12.2.2] Cluster performance drops after certain point of time

2017-12-21 Thread shadow_lin
, This is just a tip, I do not know if this actually applies to you, but some ssds are decreasing their write throughput on purpose so they do not wear out the cells before the warranty period is over. Denes. On 12/17/2017 06:45 PM, shadow_lin wrote: Hi All, I am testing luminous 12.2.

Re: [ceph-users] [Luminous 12.2.2] Cluster performance drops after certain point of time

2017-12-18 Thread shadow_lin
perf dump | jq '.bluefs' | grep -E '(db|slow)' "db_total_bytes": 400029646848, "db_used_bytes": 9347006464, "slow_total_bytes": 0, "slow_used_bytes": 0 2017-12-18 shadow_lin From: Konstantin Shalygin Sent: 2017-12-18 13:52 Subject

[ceph-users] [Luminous 12.2.2] Cluster performance drops after certain point of time

2017-12-17 Thread shadow_lin
Hi All, I am testing luminous 12.2.2 and found some strange behavior in my cluster. I was testing my cluster throughput by using fio on a mounted rbd with the following fio parameters: fio -directory=fiotest -direct=1 -thread -rw=write -ioengine=libaio -size=200G -group_reporting -bs=1m -i
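The fio command is truncated above; a comparable sequential-write job would look like the hedged reconstruction below (iodepth, numjobs and the job name are assumptions, not the exact values used):

fio -directory=fiotest -direct=1 -thread -rw=write -ioengine=libaio \
    -size=200G -group_reporting -bs=1m -iodepth=32 -numjobs=1 -name=seqwrite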

Re: [ceph-users] The way to minimize osd memory usage?

2017-12-10 Thread shadow_lin
to minimize osd memory usage? To: "David Turner" Cc: "shadow_lin","ceph-users","Konstantin Shalygin" I've had some success in this configuration by cutting the bluestore cache size down to 512mb and only one OSD on an 8tb drive. Still get occasional O

Re: [ceph-users] The way to minimize osd memory usage?

2017-12-09 Thread shadow_lin
: Re: [ceph-users] The way to minimize osd memory usage? To: "ceph-users" Cc: "shadow_lin" > I am testing running ceph luminous (12.2.1-249-g42172a4 > (42172a443183ffe6b36e85770e53fe678db293bf)) on an ARM server. Try the new 12.2.2 - this release should fix me

[ceph-users] The way to minimize osd memory usage?

2017-12-09 Thread shadow_lin
Hi All, I am testing running ceph luminous (12.2.1-249-g42172a4 (42172a443183ffe6b36e85770e53fe678db293bf)) on an ARM server. The ARM server has a two-core 1.4GHz cpu and 2GB ram, and I am running 2 osds per ARM server with 2x8TB (or 2x10TB) hdds. Now I am facing a constant oom problem. I have tried upgra

[ceph-users] Why degraded objects count keeps increasing as more data is written into the cluster?

2017-11-07 Thread shadow_lin
Hi all, I have a pool with 2 replicas (failure domain host) and I was testing it with fio writing to an rbd image (about 450MB/s) when one of my hosts crashed. I rebooted the crashed host and the mon said all osds and hosts were online, but there were some pgs in degraded status. I thought it would recover but

[ceph-users] [luminous][ERR] Error -2 reading object

2017-11-03 Thread shadow_lin
Hi all, I am testing luminous for ec pool backed rbd [k=8,m=2]. My luminous version is: ceph version 12.2.1-249-g42172a4 (42172a443183ffe6b36e85770e53fe678db293bf) luminous (stable) My cluster had some osd memory oom problems so some osds got oom killed. The cluster entered recovery state.

[ceph-users] How would ec profile affect performance?

2017-11-02 Thread shadow_lin
Hi all, I am wondering how the ec profile would affect ceph performance? Will ec profile k=10,m=2 perform better than k=8,m=2 since there would be more chunks to write and read concurrently? Will ec profile k=10,m=2 need more memory and cpu power than ec profile k=8,m=2? 2017-11-02 lin.yu
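For concreteness, the two profiles compared here would be created roughly like this on luminous (profile and pool names plus pg counts are placeholders):

ceph osd erasure-code-profile set ec_k8_m2 k=8 m=2 crush-failure-domain=host
ceph osd erasure-code-profile set ec_k10_m2 k=10 m=2 crush-failure-domain=host
# k=10,m=2 stripes each object across 12 OSDs instead of 10, so every read/write
# touches more disks; storage overhead drops from 1.25x (10/8) to 1.2x (12/10)
ceph osd pool create ecpool_k8 1024 1024 erasure ec_k8_m2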

[ceph-users] Re: Re: Re: [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-11-02 Thread shadow_lin
This problem was discussed before at http://tracker.ceph.com/issues/12681, is it a tcmalloc problem? 2017-11-02 lin.yunfan From: Sage Weil Sent: 2017-11-01 20:11 Subject: Re: Re: Re: [ceph-users] [luminous]OSD memory usage increase when writing a lot of data to cluster To: "shadow_lin" Cc

[ceph-users] Re: Re: Re: [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-11-01 Thread shadow_lin
16765 Spans in use MALLOC: 32 Thread heaps in use MALLOC: 8192 Tcmalloc page size I have run the test for about 10 hrs of writing; so far no oom has happened. The osd uses 9xxMB memory max and stays stable

[ceph-users] Re: Re: [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-10-31 Thread shadow_lin
ta to cluster To: "shadow_lin" Cc: "ceph-users" On Tue, 24 Oct 2017, shadow_lin wrote: > Hi All, > The cluster has 24 osd with 24 8TB h

[ceph-users] Re: Re: mkfs rbd image is very slow

2017-10-31 Thread shadow_lin
:07 Subject: Re: [ceph-users] mkfs rbd image is very slow To: "shadow_lin" Cc: "ceph-users" Try running "mkfs.xfs -K" which disables discarding to see if that improves the mkfs speed. The librbd-based implementation encountered a similar issue before when certain OSs sent v

[ceph-users] [Luminous] How to choose the proper ec profile?

2017-10-30 Thread shadow_lin
Hi all, I am wondering how to choose the proper ec profile for a new luminous ec rbd image. If I set k too high, what would the drawback be? Is it a good idea to set k=10 m=2? It sounds tempting: the overhead of storage capacity is low and the redundancy is good. What is the difference for s

[ceph-users] mkfs rbd image is very slow

2017-10-29 Thread shadow_lin
Hi all, I am testing ec pool backed rbd image performance and found that it takes a very long time to format the rbd image with mkfs. I created a 5TB image, mounted it on the client (ubuntu 16.04 with a 4.12 kernel) and used mkfs.ext4 and mkfs.xfs to format it. It takes hours to finish the format and
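The usual culprit here is mkfs issuing a discard/TRIM over the whole image (as the reply above notes); a hedged example of skipping it, with /dev/rbd0 as a placeholder device:

# Skip discard during format; run fstrim later if needed
mkfs.xfs -K /dev/rbd0
mkfs.ext4 -E nodiscard /dev/rbd0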

[ceph-users] Re: Re: [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-10-24 Thread shadow_lin
Hi Sage, When will 12.2.2 be released? 2017-10-24 lin.yunfan From: Sage Weil Sent: 2017-10-24 20:03 Subject: Re: [ceph-users] [luminous]OSD memory usage increase when writing a lot of data to cluster To: "shadow_lin" Cc: "ceph-users" On Tue, 24 Oct 2017, shadow_lin wrote: >

[ceph-users] [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-10-24 Thread shadow_lin
Hi All, The cluster has 24 osds with 24 8TB hdds. Each osd server has 2GB ram and runs 2 OSDs with 2 8TB HDDs. I know the memory is below the recommended value, but this osd server is an ARM server so I can't do anything to add more ram. I created a replicated (2 rep) pool and a 20TB image and mounted

[ceph-users] How does ceph pg repair work in jewel or later versions of ceph?

2017-05-04 Thread shadow_lin
I have read that pg repair simply copies the data from the primary osd to the other osds. Is that true? Or have later versions of ceph improved on that? 2017-05-05 lin.yunfan
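For reference, repair is triggered per PG; a minimal example with a placeholder pg id:

ceph pg repair 3.1f
# watch the result
ceph -w          # or: ceph pg 3.1f query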