[ceph-users] [Cephfs] Mounting a specific pool

2013-11-05 Thread NEVEU Stephane
Hi all,

I'm trying to test/figure out how cephfs works and my goal is to mount specific 
pools on different KVM hosts :
ceph osd pool create qcow2 1
ceph osd dump | grep qcow2
-> pool 9
ceph mds add_data_pool 9
I now want a 900Gb quota for my pool :
ceph osd pool set-quota qcow2 max_bytes 120795955200
Ok, now how can I verify the size in Gb of my pool (not the replication size
1, 2, 3, etc.)?
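Is something along these lines the expected way to check it? I'm guessing at
the commands here, so please correct me:

  ceph osd pool get-quota qcow2     # if my release supports it
  rados df | grep qcow2             # per-pool objects and KB used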

On my KVM host (client):
mount -t ceph ip1:6789,ip2:6789,ip3:6789:/ /disks/
OK
cephfs /disks/ show_layout
layout.data_pool: 0
etc...
cephfs /disks/ set_layout -p 9 -u 4194304 -c 1 -s 4194304
umount /disks/ && mount /disks/
cephfs /disks/ show_layout
layout.data_pool: 9

Great, my layout is now 9, i.e. my qcow2 pool, but :
df -h | grep disks shows the entire cluster size, not only 900Gb. Why? Is it
normal, or am I doing something wrong?

Thank you for your help :)


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Cephfs] Mounting a specific pool

2013-11-05 Thread Yan, Zheng
On Tue, Nov 5, 2013 at 4:05 PM, NEVEU Stephane
 wrote:
> Hi all,
>
>
>
> I’m trying to test/figure out how cephfs works and my goal is to mount
> specific pools on different KVM hosts :
>
> ceph osd pool create qcow2 1
>
> ceph osd dump | grep qcow2
>
> -> pool 9
>
> ceph mds add_data_pool 9
>
> I now want a 900Gb quota for my pool :
>
> ceph osd pool set-quota qcow2 max_bytes 120795955200
>
> Ok, now how can I verify the size in Gb of my pool (not the replication size
> 1,2,3 etc) ?
>
>
>
> On my KVM host (client):
>
> mount -t ceph ip1:6789,ip2:6789,ip3:6789:/ /disks/
>
> OK
>
> cephfs /disks/ show_layout
>
> layout.data_pool: 0
>
> etc...
>
> cephfs /disks/ set_layout -p 9 -u 4194304 -c 1 -s 4194304
>
> umount /disks/ && mount /disks/
>
> cephfs /disks/ show_layout
>
> layout.data_pool: 9
>
>
>
> Great, my layout is now 9, i.e. my qcow2 pool, but :
>
> df -h | grep disks shows the entire cluster size, not only 900Gb. Why? Is it
> normal, or am I doing something wrong?

cephfs does not support any type of quota; df always reports the entire cluster size.
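If you want to see per-pool usage rather than the filesystem-wide numbers that
df gives you, something like the following shows used space and object counts
for each pool (ceph df may not exist on older releases):

  rados df
  ceph df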

Yan, Zheng


>
>
>
> Thank you for your help :)
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-agent error

2013-11-05 Thread lixuehui
It works great! Thanks so much for your help!


lixuehui

From: Josh Durgin
Date: 2013-10-31 01:31
To: Mark Kirkwood; lixuehui; ceph-users
Subject: Re: [ceph-users] radosgw-agent error
On 10/30/2013 01:54 AM, Mark Kirkwood wrote:
> On 29/10/13 20:53, lixuehui wrote:
>> Hi,list
>>  From the document that a radosgw-agent's right info should like this
>>
>> INFO:radosgw_agent.sync:Starting incremental sync
>> INFO:radosgw_agent.worker:17910 is processing shard number 0
>> INFO:radosgw_agent.worker:shard 0 has 0 entries after ''
>> INFO:radosgw_agent.worker:finished processing shard 0
>> INFO:radosgw_agent.worker:17910 is processing shard number 1
>> INFO:radosgw_agent.sync:1/64 shards processed
>> INFO:radosgw_agent.worker:shard 1 has 0 entries after ''
>> INFO:radosgw_agent.worker:finished processing shard 1
>> INFO:radosgw_agent.sync:2/64 shards processed
>>
>> my radosgw-agent return error like
>>
>>out = request(connection, 'get', '/admin/log',
>> dict(type=shard_type))
>>File
>> "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line 76,
>> in request
>>  return result.json()
>> AttributeError: 'Response' object has no attribute 'json'
>> ERROR:root:error doing incremental sync, trying again later
>> Traceback (most recent call last):
>>File
>> "/usr/lib/python2.7/dist-packages/radosgw_agent/cli.py", line 247, in
>> main
>>  args.max_entries)
>>File
>> "/usr/lib/python2.7/dist-packages/radosgw_agent/sync.py", line 22, in
>> sync_incremental
>>  num_shards = client.num_log_shards(self.src_conn,
>> self._type)
>>File
>> "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line 142,
>> in num_log_shards
>>  out = request(connection, 'get', '/admin/log',
>> dict(type=shard_type))
>>File
>> "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line 76,
>> in request
>> Has anyone else encountered the same problem? Any help
>> is appreciated!
>>
>
> I received this error too - although I was attempting a 'full' sync at
> the time. I surmised that maybe the response object == None at that
> point? But otherwise I had no idea.

This particular error is coming from a too-old version of the 
python-requests package. We weren't setting a lower bound for that library
version before, but are now. If you install with the bootstrap script
you should get a new enough version in a virtualenv, and you can run
./radosgw-agent from your git checkout.
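Roughly (the paths are just an example, and you would pass the same arguments
you normally give the packaged radosgw-agent):

  git clone https://github.com/ceph/radosgw-agent.git
  cd radosgw-agent
  ./bootstrap            # builds a virtualenv with a recent enough python-requests
  ./radosgw-agent --help # then run it with your usual arguments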

> I was also confused about:
> - was this even supposed to work with ceph 0.71?

No, there ended up being a bug and an admin api change, so if you want
to try it early you can use the next branch. You'll need to restart the
osds and radosgw if you're upgrading. It'll be backported to dumpling
as well, but the backport hasn't been finished yet.

> - which radosgw-agent to use:
>* https://github.com/ceph/radosgw-agent

This one.

>* https://github.com/jdurgin/radosgw-agent
>
> Given that the newly updated docs:
> http://ceph.com/docs/wip-doc-radosgw/radosgw/federated-config/ suggest
> ceph 0.72, I'm wondering if we just need to be more patient?

Note that the wip in the url means it's a work-in-progress branch,
so it's not totally ready yet either. If anything is confusing or
missing, let us know.

> However - Inktank folks - there is a lot of interest in the feature, so
> forgive us if we are jumping the gun, but also the current state of play
> is murky and some clarification would not go amiss!

It's great people are interested in trying this early. It's very
helpful to find issues sooner (like the requests library version).

Thanks!
Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd on ubuntu 12.04 LTS

2013-11-05 Thread Fuchs, Andreas (SwissTXT)
The command you recommend doesn't work, and I cannot find anything in the
command reference on how to do it.

How can the settings be verified?
ceph osd dump does not show any flags:
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 
64 pgp_num 64 last_change 1 owner 0

I also cannot find anything relevant in the currently running crush map.
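(This is roughly how I pulled the map out; the file names are arbitrary:)

  ceph osd getcrushmap -o /tmp/crushmap
  crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
  grep tunable /tmp/crushmap.txt   # no tunable lines show up for me

The only rbd-related entry I can see in the decompiled map is the rule itself: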
rule rbd {
ruleset 2
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

Am I even looking in the right direction?



> -----Original Message-----
> From: Gregory Farnum [mailto:g...@inktank.com]
> Sent: Montag, 4. November 2013 19:17
> To: Fuchs, Andreas (SwissTXT)
> Cc: Karan Singh; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] rbd on ubuntu 12.04 LTS
> 
> On Mon, Nov 4, 2013 at 12:13 AM, Fuchs, Andreas (SwissTXT)
>  wrote:
> > I tryed with:
> > ceph osd crush tunables default
> > ceph osd crush tunables argonaut
> >
> > while the command runs without error, I still get the feature set
> > mismatch error when I try to mount. Do I have to restart some service?
> 
> Ah, looking more closely it seems the feature mismatch you're getting is
> actually the "HASHPSPOOL" feature bit. I don't think that should have been
> enabled on Dumpling, but you can unset it on a pool basis ("ceph osd pool
> unset <pool> hashpspool", I believe). I don't think you'll need to restart
> anything, but it's possible.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Kernel Panic / RBD Instability

2013-11-05 Thread James Wilkins
Hello,

Wondering if anyone else has come across an issue we're having with our POC Ceph
cluster at the moment.

Some details about its setup;

6 x Dell R720 (20 x 1TB Drives, 4 xSSD CacheCade), 4 x 10GB Nics
4 x Generic white label server (24 x 2 4TB Disk Raid-0 ), 4 x 10GB Nics
3 x Dell R620 - Acting as ISCSI Heads (targetcli / Linux kernel ISCSI) - 4 x 
10GB Nics.  An RBD device is mounted and exported via targetcli, this is then 
mounted on a client device to push backup data.

All machines are running Ubuntu 12.04.3 LTS and ceph 0.67.4

Machines are split over two racks (distinct layer 2 domains) using a leaf/spine 
model and we use ECMP/quagga on the ISCSI heads to reach the CEPH Cluster.

Crush map has racks defined to spread data over 2 racks -  I've attached the 
ceph.conf

The cluster performs great normally, and we only have issues when simulating 
rack failure.

The issue comes when the following steps are taken

o) Initiate load against the cluster (backups going via ISCSI)
o) ceph osd set noout
o) Reboot 2 x Generic Servers / 3 x Dell Servers (basically all the nodes in 1 
Rack)
o) Cluster goes degraded, as expected

  cluster 55dcf929-fca5-49fe-99d0-324a19afd5b4
   health HEALTH_WARN 7056 pgs degraded; 282 pgs stale; 2842 pgs stuck unclean; 
recovery 1286582/2700870 degraded (47.636%); 108/216 in osds are down; noout 
flag(s) set
   monmap e3: 5 mons at 
{fh-ceph01-mon-01=172.17.12.224:6789/0,fh-ceph01-mon-02=172.17.12.225:6789/0,fh-ceph01-mon-03=172.17.11.224:6789/0,fh-ceph01-mon-04=172.17.11.225:6789/0,fh-ceph01-mon-05=172.17.12.226:6789/0},
 election epoch 74, quorum 0,1,2,3,4 
fh-ceph01-mon-01,fh-ceph01-mon-02,fh-ceph01-mon-03,fh-ceph01-mon-04,fh-ceph01-mon-05
   osdmap e4237: 216 osds: 108 up, 216 in
pgmap v117686: 7328 pgs: 266 active+clean, 6 stale+active+clean, 6780 
active+degraded, 276 stale+active+degraded; 3511 GB data, 10546 GB used, 794 TB 
/ 805 TB avail; 1286582/2700870 degraded (47.636%)
   mdsmap e1: 0/0/1 up


2013-11-05 08:51:44.830393 mon.0 [INF] pgmap v117685: 7328 pgs: 1489 
active+clean, 1289 stale+active+clean, 3215 active+degraded, 1335 
stale+active+degraded; 3511 GB data, 10546 GB used, 794 TB / 805 TB avail; 
1048742/2700870 degraded (38.830%);  recovering 7 o/s, 28969KB/s

o) As OSDS start returning

2013-11-05 08:52:42.019295 mon.0 [INF] osd.165 172.17.11.9:6864/6074 boot
2013-11-05 08:52:42.023055 mon.0 [INF] osd.154 172.17.11.9:6828/5943 boot
2013-11-05 08:52:42.024226 mon.0 [INF] osd.159 172.17.11.9:6816/5820 boot
2013-11-05 08:52:42.031996 mon.0 [INF] osd.161 172.17.11.9:6856/6059 boot

o) We then see some slow requests;

2013-11-05 08:53:11.677044 osd.153 [WRN] 6 slow requests, 6 included below; 
oldest blocked for > 30.409992 secs
2013-11-05 08:53:11.677052 osd.153 [WRN] slow request 30.409992 seconds old, 
received at 2013-11-05 08:52:41.266994: osd_op(client.16010.1:13441679 
rb.0.21ec.238e1f29.0012fa28 [write 2854912~4096] 3.516ef071 RETRY=-1 e4240) 
currently reached pg
2013-11-05 08:53:11.677056 osd.153 [WRN] slow request 30.423024 seconds old, 
received at 2013-11-05 08:52:41.253962: osd_op(client.15755.1:13437999 
rb.0.21ec.238e1f29.0012fa28 [write 0~233472] 3.516ef071 RETRY=1 e4240) v4 
currently reached pg

o) A few minutes later, the iSCSI heads start panicking

Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664305] [ cut 
here ]
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664313] WARNING: at 
/build/buildd/linux-lts-raring-3.8.0/kernel/watchdog.c:246 wat
chdog_overflow_callback+0x9a/0xc0()
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664315] Hardware name: 
PowerEdge R620
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664317] Watchdog detected 
hard LOCKUP on cpu 6
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664318] Modules linked in: 
ib_srpt(F) tcm_qla2xxx(F) tcm_loop(F) tcm_fc(F) iscsi_t
arget_mod(F) target_core_pscsi(F) target_core_file(F) target_core_iblock(F) 
target_core_mod(F) rbd(F) libceph(F) ipmi_devintf(F) ipm
i_si(F) ipmi_msghandler(F) qla2xxx(F) libfc(F) scsi_transport_fc(F) scsi_tgt(F) 
configfs(F) dell_rbu(F) ib_iser(F) rdma_cm(F) ib_cm(
F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) ext2(F) iscsi_tcp(F) 
libiscsi_tcp(F) libiscsi(F) scsi_transport_iscsi(F) corete
mp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) 
cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) gpi
o_ich(F) dcdbas(F) microcode(F) joydev(F) shpchp(F) sb_edac(F) wmi(F) 
edac_core(F) acpi_power_meter(F) mei(F) lpc_ich(F) mac_hid(F) 
8021q(F) garp(F) stp(F) llc(F) lp(F) parport(F) hid_generic(F) usbhid(F) hid(F) 
ahci(F) libahci(F) ixgbe(F) dca(F) megaraid_sas(F) m
dio(F) tg3(F) ptp(F) pps_core(F) btrfs(F) zlib_deflate(F) libcrc32c(F) [last 
unloaded: target_core_mod]
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664387] Pid: 460, comm: 
kworker/u:5 Tainted: GF   W3.8.0-31-generic #46~pr
ecise1-Ubuntu
Nov  5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664389] Call Tra

Re: [ceph-users] [Cephfs] Mounting a specific pool

2013-11-05 Thread NEVEU Stephane
Ok thank you, so is there a way to unset my quota? Or should I create a new
pool and destroy the old one?
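Would simply setting it back to zero be enough to clear it, i.e. something like:

  ceph osd pool set-quota qcow2 max_bytes 0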

Another question by the way :) - does this syntax work, if I only want to mount
my "qcow2" pool?
mount -t ceph ip1:6789,ip2:6789,ip3:6789:/qcow2 /disks/


>On Tue, Nov 5, 2013 at 5:09 PM, NEVEU Stephane 
> wrote:
> Ok, so this command only works with rbd ?
> ceph osd pool set-quota poolname max_bytes 
>
> What happens then if I've already set a quota on my pool and then added my
> data_pool to the mds? Will this "quota" simply become ineffective in the cephfs
> context, and will I be able to write more data than my "quota"?
>
No. OSDs still enforce the quota. If the quota has been reached, sync writes will
return -ENOSPC and buffered writes will lose data.

Yan, Zheng




> -----Original Message-----
> From: Yan, Zheng [mailto:uker...@gmail.com] Sent: Tuesday, 5 November
> 2013 09:38 To: NEVEU Stephane Cc: ceph-users@lists.ceph.com Subject:
> Re: [ceph-users] [Cephfs] Mounting a specific pool
>
> On Tue, Nov 5, 2013 at 4:05 PM, NEVEU Stephane 
>  wrote:
>> Hi all,
>>
>>
>>
>> I'm trying to test/figure out how cephfs works and my goal is to 
>> mount specific pools on different KVM hosts :
>>
>> ceph osd pool create qcow2 1
>>
>> ceph osd dump | grep qcow2
>>
>> -> pool 9
>>
>> ceph mds add_data_pool 9
>>
>> I now want a 900Gb quota for my pool :
>>
>> ceph osd pool set-quota qcow2 max_bytes 120795955200
>>
>> Ok, now how can I verify the size in Gb of my pool (not the 
>> replication size
>> 1,2,3 etc) ?
>>
>>
>>
>> On my KVM host (client):
>>
>> mount -t ceph ip1:6789,ip2:6789,ip3:6789:/ /disks/
>>
>> OK
>>
>> cephfs /disks/ show_layout
>>
>> layout.data_pool: 0
>>
>> etc.
>>
>> cephfs /disks/ set_layout -p 9 -u 4194304 -c 1 -s 4194304
>>
>> umount /disks/ && mount /disks/
>>
>> cephfs /disks/ show_layout
>>
>> layout.data_pool: 9
>>
>>
>>
>> Great, my layout is now 9, i.e. my qcow2 pool, but :
>>
>> df -h | grep disks shows the entire cluster size, not only 900Gb. Why?
>> Is it normal, or am I doing something wrong?
>
> cephfs does not support any type of quota; df always reports the entire cluster
> size.
>
> Yan, Zheng
>
>
>>
>>
>>
>> Thank you for your help :)
>>
>>
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster performance

2013-11-05 Thread Dinu Vlad
Ok, so after tweaking the deadline scheduler and the filestore_wbthrottle* ceph 
settings I was able to get 440 MB/s from 8 rados bench instances, over a single 
osd node (pool pg_num = 1800, size = 1) 

This still looks awfully slow to me - fio throughput across all disks reaches 
2.8 GB/s!!

I'd appreciate any suggestion on where to look for the issue. Thanks!


On Oct 31, 2013, at 6:35 PM, Dinu Vlad  wrote:

> 
> I tested the osd performance from a single node. For this purpose I deployed 
> a new cluster (using ceph-deploy, as before) and on fresh/repartitioned 
> drives. I created a single pool, 1800 pgs. I ran the rados bench both on the 
> osd server and on a remote one. Cluster configuration stayed "default", with 
> the same additions about xfs mount & mkfs.xfs as before. 
> 
> With a single host, the pgs were "stuck unclean" (active only, not 
> active+clean):
> 
> # ceph -s
>  cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062
>   health HEALTH_WARN 1800 pgs stuck unclean
>   monmap e1: 3 mons at 
> {cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0},
>  election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>   osdmap e101: 18 osds: 18 up, 18 in
>pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 
> 16759 GB avail
>   mdsmap e1: 0/0/1 up
> 
> 
> Test results: 
> Local test, 1 process, 16 threads: 241.7 MB/s
> Local test, 8 processes, 128 threads: 374.8 MB/s
> Remote test, 1 process, 16 threads: 231.8 MB/s
> Remote test, 8 processes, 128 threads: 366.1 MB/s
> 
> Maybe it's just me, but it seems on the low side too. 
> 
> Thanks,
> Dinu
> 
> 
> On Oct 30, 2013, at 8:59 PM, Mark Nelson  wrote:
> 
>> On 10/30/2013 01:51 PM, Dinu Vlad wrote:
>>> Mark,
>>> 
>>> The SSDs are 
>>> http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021
>>>  and the HDDs are 
>>> http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS.
>>> 
>>> The chassis is a "SiliconMechanics C602" - but I don't have the exact model. 
>>> It's based on Supermicro, has 24 slots front and 2 in the back and a SAS 
>>> expander.
>>> 
>>> I did a fio test (raw partitions, 4M blocksize, ioqueue maxed out according 
>>> to what the driver reports in dmesg). here are the results (filtered):
>>> 
>>> Sequential:
>>> Run status group 0 (all jobs):
>>>  WRITE: io=176952MB, aggrb=2879.0MB/s, minb=106306KB/s, maxb=191165KB/s, 
>>> mint=60444msec, maxt=61463msec
>>> 
>>> Individually, the HDDs had best:worst 103:109 MB/s while the SSDs gave 
>>> 153:189 MB/s
>> 
>> Ok, that looks like what I'd expect to see given the controller being used.  
>> SSDs are probably limited by total aggregate throughput.
>> 
>>> 
>>> Random:
>>> Run status group 0 (all jobs):
>>>  WRITE: io=106868MB, aggrb=1727.2MB/s, minb=67674KB/s, maxb=106493KB/s, 
>>> mint=60404msec, maxt=61875msec
>>> 
>>> Individually (best:worst) HDD 71:73 MB/s, SSD 68:101 MB/s (with only one 
>>> out of 6 doing 101)
>>> 
>>> This is on just one of the osd servers.
>> 
>> Where the ceph tests to one OSD server or across all servers?  It might be 
>> worth trying tests against a single server with no replication using 
>> multiple rados bench instances and just seeing what happens.
>> 
>>> 
>>> Thanks,
>>> Dinu
>>> 
>>> 
>>> On Oct 30, 2013, at 6:38 PM, Mark Nelson  wrote:
>>> 
 On 10/30/2013 09:05 AM, Dinu Vlad wrote:
> Hello,
> 
> I've been doing some tests on a newly installed ceph cluster:
> 
> # ceph osd pool create bench1 2048 2048
> # ceph osd pool create bench2 2048 2048
> # rbd -p bench1 create test
> # rbd -p bench1 bench-write test --io-pattern rand
> elapsed:   483  ops:   396579  ops/sec:   820.23  bytes/sec: 2220781.36
> 
> # rados -p bench2 bench 300 write --show-time
> # (run 1)
> Total writes made:  20665
> Write size: 4194304
> Bandwidth (MB/sec): 274.923
> 
> Stddev Bandwidth:   96.3316
> Max bandwidth (MB/sec): 748
> Min bandwidth (MB/sec): 0
> Average Latency:0.23273
> Stddev Latency: 0.262043
> Max latency:1.69475
> Min latency:0.057293
> 
> These results seem to be quite poor for the configuration:
> 
> MON: dual-cpu Xeon E5-2407 2.2 GHz, 48 GB RAM, 2xSSD for OS
> OSD: dual-cpu Xeon E5-2620 2.0 GHz, 64 GB RAM, 2xSSD for OS (on-board 
> controller), 18 HDD 1TB 7.2K rpm SAS for OSD drives and 6 SSDs (SATA) for 
> journal, attached to a LSI 9207-8i controller.
> All servers have dual 10GE network cards, connected to a pair of 
> dedicated switches. Each SSD has 3 10 GB partitions for journals.
 
 Agreed, you should see much higher throughput with that kind of storage 
 setup.  What brand/model SSDs are these?  Also, what brand and model of 
 chassis?  With 24 drives and 8 SSDs I could push 2GB/s (no replication 
 though) with

[ceph-users] stopped backfilling process

2013-11-05 Thread Dominik Mostowiec
Hi,
After removing an osd (ceph osd out X) from one server (11 osds), ceph
started the data migration process.
It stopped at:
32424 pgs: 30635 active+clean, 191 active+remapped, 1596
active+degraded, 2 active+clean+scrubbing;
degraded (1.718%)

All osd with reweight==1 are UP.

ceph -v
ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)

health details:
https://www.dropbox.com/s/149zvee2ump1418/health_details.txt

pg active+degraded query:
https://www.dropbox.com/s/46emswxd7s8xce1/pg_11.39_query.txt
pg active+remapped query:
https://www.dropbox.com/s/wij4uqh8qoz60fd/pg_16.2172_query.txt

Please help - how can we fix it?

-- 
Pozdrawiam
Dominik
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster performance

2013-11-05 Thread Mark Nelson

Ok, some more thoughts:

1) What kernel are you using?

2) Mixing SATA and SAS on an expander backplane can sometimes have bad 
effects.  We don't really know how bad this is and in what 
circumstances, but the Nexenta folks have seen problems with ZFS on 
Solaris and it's not impossible Linux may suffer too:


http://gdamore.blogspot.com/2010/08/why-sas-sata-is-not-such-great-idea.html

3) If you are doing tests and look at disk throughput with something 
like "collectl -sD -oT", do the writes look balanced across the spinning 
disks?  Do any devices have really high service times or queue times?


4) Also, after the test is done, you can try:

find /var/run/ceph/*.asok -maxdepth 1 -exec sudo ceph --admin-daemon {} 
dump_historic_ops \; > foo


and then grep for "duration" in foo.  You'll get a list of the slowest 
operations over the last 10 minutes from every osd on the node.  Once 
you identify a slow duration, you can go back, search for it in an 
editor, and look at where in the OSD the op hung up.  That 
might tell us more about slow/latent operations.
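Something quick and dirty like this is usually enough to surface the worst
offenders (strip the quotes/commas so sort can treat the values as numbers):

  grep duration foo | tr -d '",' | sort -k 2 -rn | head -20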


5) Something interesting here is that I've heard from another party that 
in a 36 drive Supermicro SC847E16 chassis they had 30 7.2K RPM disks and 
6 SSDs on a SAS9207-8i controller and were pushing significantly faster 
throughput than you are seeing (even given the greater number of 
drives).  So it's very interesting to me that you are pushing so much 
less.  The 36 drive supermicro chassis I have with no expanders and 30 
drives with 6 SSDs can push about 2100MB/s with a bunch of 9207-8i 
controllers and XFS (no replication).


Mark

On 11/05/2013 05:15 AM, Dinu Vlad wrote:

Ok, so after tweaking the deadline scheduler and the filestore_wbthrottle* ceph 
settings I was able to get 440 MB/s from 8 rados bench instances, over a single 
osd node (pool pg_num = 1800, size = 1)

This still looks awfully slow to me - fio throughput across all disks reaches 
2.8 GB/s!!

I'd appreciate any suggestion, where to look for the issue. Thanks!


On Oct 31, 2013, at 6:35 PM, Dinu Vlad  wrote:



I tested the osd performance from a single node. For this purpose I deployed a new cluster 
(using ceph-deploy, as before) and on fresh/repartitioned drives. I created a single pool, 
1800 pgs. I ran the rados bench both on the osd server and on a remote one. Cluster 
configuration stayed "default", with the same additions about xfs mount & 
mkfs.xfs as before.

With a single host, the pgs were "stuck unclean" (active only, not 
active+clean):

# ceph -s
  cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062
   health HEALTH_WARN 1800 pgs stuck unclean
   monmap e1: 3 mons at 
{cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0},
 election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3
   osdmap e101: 18 osds: 18 up, 18 in
pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 
16759 GB avail
   mdsmap e1: 0/0/1 up


Test results:
Local test, 1 process, 16 threads: 241.7 MB/s
Local test, 8 processes, 128 threads: 374.8 MB/s
Remote test, 1 process, 16 threads: 231.8 MB/s
Remote test, 8 processes, 128 threads: 366.1 MB/s

Maybe it's just me, but it seems on the low side too.

Thanks,
Dinu


On Oct 30, 2013, at 8:59 PM, Mark Nelson  wrote:


On 10/30/2013 01:51 PM, Dinu Vlad wrote:

Mark,

The SSDs are 
http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021
 and the HDDs are 
http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS.

The chassis is a "SiliconMechanics C602" - but I don't have the exact model. 
It's based on Supermicro, has 24 slots front and 2 in the back and a SAS expander.

I did a fio test (raw partitions, 4M blocksize, ioqueue maxed out according to 
what the driver reports in dmesg). here are the results (filtered):

Sequential:
Run status group 0 (all jobs):
  WRITE: io=176952MB, aggrb=2879.0MB/s, minb=106306KB/s, maxb=191165KB/s, 
mint=60444msec, maxt=61463msec

Individually, the HDDs had best:worst 103:109 MB/s while the SSDs gave 153:189 
MB/s


Ok, that looks like what I'd expect to see given the controller being used.  
SSDs are probably limited by total aggregate throughput.



Random:
Run status group 0 (all jobs):
  WRITE: io=106868MB, aggrb=1727.2MB/s, minb=67674KB/s, maxb=106493KB/s, 
mint=60404msec, maxt=61875msec

Individually (best:worst) HDD 71:73 MB/s, SSD 68:101 MB/s (with only one out of 
6 doing 101)

This is on just one of the osd servers.


Where the ceph tests to one OSD server or across all servers?  It might be 
worth trying tests against a single server with no replication using multiple 
rados bench instances and just seeing what happens.



Thanks,
Dinu


On Oct 30, 2013, at 6:38 PM, Mark Nelson  wrote:


On 10/30/2013 09:05 AM, Dinu Vlad wrote:

Hello,

I've been doing some tests on a newly installed ceph cluster:

# ceph osd pool create bench1 2048 2048
# ceph osd 

[ceph-users] Pool without a name, how to remove it?

2013-11-05 Thread Wido den Hollander

Hi,

On a Ceph cluster I have a pool without a name. I have no idea how it 
got there, but how do I remove it?


pool 14 '' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 8 pgp_num 8 last_change 158 owner 18446744073709551615


Is there a way to remove a pool by its ID? I couldn't find anything in 
librados to do so.
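Would passing the empty string through the normal removal path be expected to
work, i.e. something like:

  rados rmpool "" "" --yes-i-really-really-mean-it

or the equivalent 'ceph osd pool delete' with an empty name?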


--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] About memory usage of ceph-mon on arm

2013-11-05 Thread Yu Changyuan
Finally, my tiny ceph cluster got 3 monitors; the newly added mon.b and mon.c
are both running on a Cubieboard2, which is cheap but still has enough cpu
power (dual-core ARM A7 cpu, 1.2 GHz) and memory (1 GB).

But compared to mon.a, which runs on an amd64 cpu, both mon.b and mon.c
easily consume too much memory, so I want to know whether this is caused by
a memory leak. Below is the output of 'ceph tell mon.a heap stats' and 'ceph
tell mon.c heap stats' (mon.c was only started 12 hours ago, while mon.a has
already been running for more than 10 days).

mon.atcmalloc heap stats:
MALLOC:5480160 (5.2 MiB) Bytes in use by application
MALLOC: + 28065792 (   26.8 MiB) Bytes in page heap freelist
MALLOC: + 15242312 (   14.5 MiB) Bytes in central cache freelist
MALLOC: + 10116608 (9.6 MiB) Bytes in transfer cache freelist
MALLOC: + 10432216 (9.9 MiB) Bytes in thread cache freelists
MALLOC: +  1667224 (1.6 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: = 71004312 (   67.7 MiB) Actual memory used (physical + swap)
MALLOC: + 57540608 (   54.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =128544920 (  122.6 MiB) Virtual address space used
MALLOC:
MALLOC:   4655  Spans in use
MALLOC: 34  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via
madvise()).
Bytes released to the


mon.ctcmalloc heap stats:
MALLOC:  175861640 (  167.7 MiB) Bytes in use by application
MALLOC: +  2220032 (2.1 MiB) Bytes in page heap freelist
MALLOC: +  1007560 (1.0 MiB) Bytes in central cache freelist
MALLOC: +  2871296 (2.7 MiB) Bytes in transfer cache freelist
MALLOC: +  4686000 (4.5 MiB) Bytes in thread cache freelists
MALLOC: +  2758880 (2.6 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =189405408 (  180.6 MiB) Actual memory used (physical + swap)
MALLOC: +0 (0.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =189405408 (  180.6 MiB) Virtual address space used
MALLOC:
MALLOC:   3445  Spans in use
MALLOC: 14  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via
madvise()).
Bytes released to the

The ceph version is 0.67.4, compiled with tcmalloc enabled,
gcc (armv7a-hardfloat-linux-gnueabi-gcc) version 4.7.3. I also tried to
dump the heap, but I cannot find anything useful; below is a recent dump,
output by the command "pprof --text /usr/bin/ceph-mon mon.c.profile.0021.heap".
What extra step should I take to make the dump more meaningful?

Using local file /usr/bin/ceph-mon.
Using local file mon.c.profile.0021.heap.
Total: 149.3 MB
   146.2  97.9%  97.9%146.2  97.9% b6a7ce7c
 1.4   0.9%  98.9%  1.4   0.9% std::basic_string::_Rep::_S_create
??:0
 1.4   0.9%  99.8%  1.4   0.9% 002dd794
 0.1   0.1%  99.9%  0.1   0.1% b6a81170
 0.1   0.1%  99.9%  0.1   0.1% b6a80894
 0.0   0.0% 100.0%  0.0   0.0% b6a7e2ac
 0.0   0.0% 100.0%  0.0   0.0% b6a81410
 0.0   0.0% 100.0%  0.0   0.0% 00367450
 0.0   0.0% 100.0%  0.0   0.0% 001d4474
 0.0   0.0% 100.0%  0.0   0.0% 0028847c
 0.0   0.0% 100.0%  0.0   0.0% b6a7e8d8
 0.0   0.0% 100.0%  0.0   0.0% 0020c80c
 0.0   0.0% 100.0%  0.0   0.0% 0028bd20
 0.0   0.0% 100.0%  0.0   0.0% b6a63248
 0.0   0.0% 100.0%  0.0   0.0% b6a83478
 0.0   0.0% 100.0%  0.0   0.0% b6a806f0
 0.0   0.0% 100.0%  0.0   0.0% 002eb8b8
 0.0   0.0% 100.0%  0.0   0.0% 0024efb4
 0.0   0.0% 100.0%  0.0   0.0% 0027e550
 0.0   0.0% 100.0%  0.0   0.0% b6a77104
 0.0   0.0% 100.0%  0.0   0.0% _dl_mcount ??:0
 0.0   0.0% 100.0%  0.0   0.0% 003673ec
 0.0   0.0% 100.0%  0.0   0.0% b6a7a91c
 0.0   0.0% 100.0%  0.0   0.0% 00295e44
 0.0   0.0% 100.0%  0.0   0.0% b6a7ee38
 0.0   0.0% 100.0%  0.0   0.0% 00283948
 0.0   0.0% 100.0%  0.0   0.0% 002a53c4
 0.0   0.0% 100.0%  0.0   0.0% b6a7665c
 0.0   0.0% 100.0%  0.0   0.0% 002c4590
 0.0   0.0% 100.0%  0.0   0.0% b6a7e88c
 0.0   0.0% 100.0%  0.0   0.0% b6a8456c
 0.0   0.0% 100.0%  0.0   0.0% b6a76ed4
 0.0   0.0% 100.0%  0.0   0.0% b6a842f0
 0.0   0.0% 100.0%  0.0   0.0% b6a72bd0
 0.0   0.

Re: [ceph-users] ceph recovery killing vms

2013-11-05 Thread Kevin Weiler
Thanks Kyle,

What's the unit for osd recovery max chunk?

Also, how do I find out what my current values are for these osd options?

--

Kevin Weiler

IT


IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
60606 | http://imc-chicago.com/

Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
kevin.wei...@imc-chicago.com







On 10/28/13 6:22 PM, "Kyle Bader"  wrote:

>You can change some OSD tunables to lower the priority of backfills:
>
>osd recovery max chunk:   8388608
>osd recovery op priority: 2
>
>In general a lower op priority means it will take longer for your
>placement groups to go from degraded to active+clean, the idea is to
>balance recovery time and not starving client requests. I've found 2
>to work well on our clusters, YMMV.
>
>On Mon, Oct 28, 2013 at 10:16 AM, Kevin Weiler
> wrote:
>> Hi all,
>>
>> We have a ceph cluster that being used as a backing store for several
>>VMs
>> (windows and linux). We notice that when we reboot a node, the cluster
>> enters a degraded state (which is expected), but when it begins to
>>recover,
>> it starts backfilling and it kills the performance of our VMs. The VMs
>>run
>> slow, or not at all, and also seem to switch it's ceph mounts to
>>read-only.
>> I was wondering 2 things:
>>
>> Shouldn't we be recovering instead of backfilling? It seems like
>>backfilling
>> is much more intensive operation
>> Can we improve the recovery/backfill performance so that our VMs don't
>>go
>> down when there is a problem with the cluster?
>>
>>
>> --
>>
>> Kevin Weiler
>>
>> IT
>>
>>
>>
>> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>>60606
>> | http://imc-chicago.com/
>>
>> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>> kevin.wei...@imc-chicago.com
>>
>>
>> 
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
>--
>
>Kyle




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] near full osd

2013-11-05 Thread Kevin Weiler
Hi guys,

I have an OSD in my cluster that is near full at 90%, but we're using a little 
less than half the available storage in the cluster. Shouldn't this be balanced 
out?

--
Kevin Weiler
IT

IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | 
http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
kevin.wei...@imc-chicago.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] near full osd

2013-11-05 Thread Greg Chavez
Kevin, in my experience that usually indicates a bad or underperforming
disk, or a too-high priority.  Try running "ceph osd crush reweight
osd.<##> 1.0".  If that doesn't do the trick, you may want to just out that
guy.

I don't think the crush algorithm guarantees balancing things out in the
way you're expecting.
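For example (the osd id and the threshold here are made up):

  ceph osd crush reweight osd.12 1.0
  ceph osd reweight-by-utilization 120   # if your release has it; picks the overloaded osds for you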


--Greg

On Tue, Nov 5, 2013 at 11:11 AM, Kevin Weiler
wrote:

>  Hi guys,
>
>  I have an OSD in my cluster that is near full at 90%, but we're using a
> little less than half the available storage in the cluster. Shouldn't this
> be balanced out?
>
>
>  --
>
> *Kevin Weiler*
>
> IT
>
>
>
> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
> 60606 | http://imc-chicago.com/
>
> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
> *kevin.wei...@imc-chicago.com
> *
>
> --
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] near full osd

2013-11-05 Thread Kevin Weiler
All of the disks in my cluster are identical and therefore all have the same 
weight (each drive is 2TB and the automatically generated weight is 1.82 for 
each one).

Would the procedure here be to reduce the weight, let it rebalance, and then put 
the weight back to where it was?

--
Kevin Weiler
IT

IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | 
http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
kevin.wei...@imc-chicago.com

From: Aronesty, Erik <earone...@expressionanalysis.com>
Date: Tuesday, November 5, 2013 10:27 AM
To: Greg Chavez <greg.cha...@gmail.com>, Kevin Weiler <kevin.wei...@imc-chicago.com>
Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: RE: [ceph-users] near full osd

If there’s an underperforming disk, why on earth would more data be put on it?
You’d think it would be less…  I would think an overperforming disk should
(desirably) cause that case, right?

From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Greg Chavez
Sent: Tuesday, November 05, 2013 11:20 AM
To: Kevin Weiler
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] near full osd

Kevin, in my experience that usually indicates a bad or underperforming disk, 
or a too-high priority.  Try running "ceph osd crush reweight osd.<##> 1.0.  If 
that doesn't do the trick, you may want to just out that guy.

I don't think the crush algorithm guarantees balancing things out in the way 
you're expecting.


--Greg
On Tue, Nov 5, 2013 at 11:11 AM, Kevin Weiler 
mailto:kevin.wei...@imc-chicago.com>> wrote:
Hi guys,

I have an OSD in my cluster that is near full at 90%, but we're using a little 
less than half the available storage in the cluster. Shouldn't this be balanced 
out?

--
Kevin Weiler
IT

IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | 
http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 
312-244-3301 | E-Mail: 
kevin.wei...@imc-chicago.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] near full osd

2013-11-05 Thread Aronesty, Erik
If there's an underperforming disk, why on earth would more data be put on it?
You'd think it would be less...  I would think an overperforming disk should
(desirably) cause that case, right?

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Greg Chavez
Sent: Tuesday, November 05, 2013 11:20 AM
To: Kevin Weiler
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] near full osd

Kevin, in my experience that usually indicates a bad or underperforming disk, 
or a too-high priority.  Try running "ceph osd crush reweight osd.<##> 1.0.  If 
that doesn't do the trick, you may want to just out that guy.

I don't think the crush algorithm guarantees balancing things out in the way 
you're expecting.


--Greg
On Tue, Nov 5, 2013 at 11:11 AM, Kevin Weiler 
mailto:kevin.wei...@imc-chicago.com>> wrote:
Hi guys,

I have an OSD in my cluster that is near full at 90%, but we're using a little 
less than half the available storage in the cluster. Shouldn't this be balanced 
out?

--
Kevin Weiler
IT

IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | 
http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 
312-244-3301 | E-Mail: 
kevin.wei...@imc-chicago.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] migrating to ceph-deploy

2013-11-05 Thread Brian Andrus
As long as default mon and osd paths are used, and you have the proper mon
caps set, you should be okay.

Here is a mention of it in the ceph docs:

http://ceph.com/docs/master/install/upgrading-ceph/#transitioning-to-ceph-deploy
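Roughly, once the cluster's ceph.conf (and keyring, if you later enable cephx)
is in your ceph-deploy working directory, adding the next OSD looks something
like this; the hostname and device are placeholders:

  ceph-deploy install newosdhost
  ceph-deploy gatherkeys mon1              # only matters once cephx is enabled
  ceph-deploy disk zap newosdhost:sdb
  ceph-deploy osd create newosdhost:sdb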

Brian Andrus
Storage Consultant, Inktank


On Fri, Nov 1, 2013 at 4:34 PM, James Harper
wrote:

> I have a cluster already set up, and I'd like to start using ceph-deploy
> to add my next OSD. The cluster currently doesn't have any authentication
> or anything.
>
> Should I start using ceph-deploy now, or just add the OSD manually? If the
> former, is there anything I need to do to make sure ceph-deploy won't break
> the already running cluster?
>
> Thanks
>
> James
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] near full osd

2013-11-05 Thread Greg Chavez
Erik, it's utterly non-intuitive and I'd love an explanation other than the
one I've provided.  Nevertheless, the OSDs on my slower PE2970 nodes fill
up much faster than those on HP585s or Dell R820s.  I've handled this by
dropping priorities and, in a couple of cases, outing or removing the OSD.

Kevin, generally speaking, the OSDs that fill up on me are the same ones.
 Once I lower the weights, they stay low or they fill back up again within
days or hours of re-raising the weight.  Please try to lift them up though,
maybe you'll have better luck than me.

--Greg


On Tue, Nov 5, 2013 at 11:30 AM, Kevin Weiler
wrote:

>   All of the disks in my cluster are identical and therefore all have the
> same weight (each drive is 2TB and the automatically generated weight is
> 1.82 for each one).
>
>  Would the procedure here be to reduce the weight, let it rebal, and then
> put the weight back to where it was?
>
>
>  --
>
> *Kevin Weiler*
>
> IT
>
>
>
> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
> 60606 | http://imc-chicago.com/
>
> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
> *kevin.wei...@imc-chicago.com
> *
>
>   From: , Erik 
> Date: Tuesday, November 5, 2013 10:27 AM
> To: Greg Chavez , Kevin Weiler <
> kevin.wei...@imc-chicago.com>
> Cc: "ceph-users@lists.ceph.com" 
> Subject: RE: [ceph-users] near full osd
>
>   If there’s an underperforming disk, why on earth would *more* data be
> put on it?  You’d think it would be less….   I would think an
> *overperforming* disk should (desirably) cause that case,right?
>
>
>
> *From:* ceph-users-boun...@lists.ceph.com [
> mailto:ceph-users-boun...@lists.ceph.com]
> *On Behalf Of *Greg Chavez
> *Sent:* Tuesday, November 05, 2013 11:20 AM
> *To:* Kevin Weiler
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] near full osd
>
>
>
> Kevin, in my experience that usually indicates a bad or underperforming
> disk, or a too-high priority.  Try running "ceph osd crush reweight
> osd.<##> 1.0.  If that doesn't do the trick, you may want to just out that
> guy.
>
>
>
> I don't think the crush algorithm guarantees balancing things out in the
> way you're expecting.
>
>
>
> --Greg
>
> On Tue, Nov 5, 2013 at 11:11 AM, Kevin Weiler <
> kevin.wei...@imc-chicago.com> wrote:
>
> Hi guys,
>
>
>
> I have an OSD in my cluster that is near full at 90%, but we're using a
> little less than half the available storage in the cluster. Shouldn't this
> be balanced out?
>
>
>
> --
>
> *Kevin Weiler*
>
> IT
>
>
>
> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
> 60606 | http://imc-chicago.com/
>
> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
> *kevin.wei...@imc-chicago.com
> *
>
>
>  --
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
>

Re: [ceph-users] About memory usage of ceph-mon on arm

2013-11-05 Thread james


We recently briefly discussed the Seagate Ethernet drives, which were 
basically dismissed as too limited.  But what about moving an ARM SBC to 
the drive tray, complete with an mSATA SSD slot?


A proper SBC could implement full Ubuntu single-drive failure domains 
that also solve the journal issue.


Using Seagate's dual-ethernet-over-sata-plug thingie, presumably any 
interested party could make a drive chassis to accept Ethernet hot-plug 
drives (i.e. basically a switch with a couple of 10GbE ports on the 
back).


There are a couple of low-cost SBCs with GbE and SATA - anyone know of 
one around $100 that provides 2x GbE and 2x SATA?


Apologise for the O/T - but you did raise the SBC idea :)


On 2013-11-05 15:21, Yu Changyuan wrote:

Finally, my tiny ceph cluster get 3 monitors, newly added mon.b and
mon.c both running on cubieboard2, which is cheap but still with
enough cpu power(dual-core arm A7 cpu, 1.2G) and memory(1G).

But compare to mon.a which running on an amd64 cpu, both mon.b and
mon.c easily consume too much memory, so I want to know whether this
is caused by memory leak. Below is the output of 'ceph tell mon.a 
heap

stats' and 'ceph tell mon.c heap stats'(mon.c only start 12hr ago,
while mon.a already running for more than 10 days)

mon.atcmalloc heap 
stats:

MALLOC:    5480160 (    5.2 MiB) Bytes in use by application
MALLOC: + 28065792 (   26.8 MiB) Bytes in page heap freelist
MALLOC: + 15242312 (   14.5 MiB) Bytes in central cache freelist
 MALLOC: + 10116608 (    9.6 MiB) Bytes in transfer cache 
freelist

MALLOC: + 10432216 (    9.9 MiB) Bytes in thread cache freelists
MALLOC: +  1667224 (    1.6 MiB) Bytes in malloc metadata
MALLOC:   
 MALLOC: = 71004312 (   67.7 MiB) Actual memory used (physical + 
swap)
MALLOC: + 57540608 (   54.9 MiB) Bytes released to OS (aka 
unmapped)

MALLOC:   
MALLOC: =    128544920 (  122.6 MiB) Virtual address space used
 MALLOC:
MALLOC:   4655  Spans in use
MALLOC: 34  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

 Call ReleaseFreeMemory() to release freelist memory to the OS (via 
madvise()).

Bytes released to the

mon.ctcmalloc heap 
stats:

MALLOC:  175861640 (  167.7 MiB) Bytes in use by application
 MALLOC: +  2220032 (    2.1 MiB) Bytes in page heap freelist
MALLOC: +  1007560 (    1.0 MiB) Bytes in central cache freelist
MALLOC: +  2871296 (    2.7 MiB) Bytes in transfer cache freelist
MALLOC: +  4686000 (    4.5 MiB) Bytes in thread cache freelists
 MALLOC: +  2758880 (    2.6 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =    189405408 (  180.6 MiB) Actual memory used (physical + 
swap)
MALLOC: +    0 (    0.0 MiB) Bytes released to OS (aka 
unmapped)

 MALLOC:   
MALLOC: =    189405408 (  180.6 MiB) Virtual address space used
MALLOC:
MALLOC:   3445  Spans in use
MALLOC: 14  Thread heaps in use
MALLOC:   8192  Tcmalloc page size
 
Call ReleaseFreeMemory() to release freelist memory to the OS (via 
madvise()).

Bytes released to the

The ceph versin is 0.67.4, compiled with tcmalloc enabled,
gcc(armv7a-hardfloat-linux-gnueabi-gcc) version 4.7.3 and I also try
to dump heap, but I can not find anything useful, below is a recent
dump, output by command "pprof --text /usr/bin/ceph-mon
mon.c.profile.0021.heap". What extra step should I  take to make the
dump more meaningful?

Using local file /usr/bin/ceph-mon.
Using local file mon.c.profile.0021.heap.
Total: 149.3 MB
   146.2  97.9%  97.9%    146.2  97.9% b6a7ce7c
 1.4   0.9%  98.9%  1.4   0.9% 
std::basic_string::_Rep::_S_create ??:0

  1.4   0.9%  99.8%  1.4   0.9% 002dd794
 0.1   0.1%  99.9%  0.1   0.1% b6a81170
 0.1   0.1%  99.9%  0.1   0.1% b6a80894
 0.0   0.0% 100.0%  0.0   0.0% b6a7e2ac
  0.0   0.0% 100.0%  0.0   0.0% b6a81410
 0.0   0.0% 100.0%  0.0   0.0% 00367450
 0.0   0.0% 100.0%  0.0   0.0% 001d4474
 0.0   0.0% 100.0%  0.0   0.0% 0028847c
  0.0   0.0% 100.0%  0.0   0.0% b6a7e8d8
 0.0   0.0% 100.0%  0.0   0.0% 0020c80c
 0.0   0.0% 100.0%  0.0   0.0% 0028bd20
 0.0   0.0% 100.0%  0.0   0.0% b6a63248
  0.0   0.0% 100.0%  0.0   0.0% b6a83478
 0.0   0.0% 100.0%  0.0   0.0% b6a806f0
 0.0   0.0% 100.0%  0.0   0.0% 002eb8b8
 0.0   0.0% 100.0%  0.0   0.0% 0024efb4
  0.0   0.0% 100.0%  0.0   0.0% 0027e550
 0.0   0.0% 100.0%  0.0   0.0% b6a77104
 0.0   

Re: [ceph-users] ceph recovery killing vms

2013-11-05 Thread Kurt Bauer


Kevin Weiler wrote:
> Thanks Kyle,
>
> What's the unit for osd recovery max chunk?
Have a look at
http://ceph.com/docs/master/rados/configuration/osd-config-ref/ where
all the possible OSD config options are described; in particular, see the
backfilling and recovery sections (osd recovery max chunk, for example,
is a number of bytes).
>
> Also, how do I find out what my current values are for these osd options?
Have a look at
http://ceph.com/docs/master/rados/configuration/ceph-conf/#viewing-a-configuration-at-runtime
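On an OSD node that boils down to something like the following (the socket
name varies with the osd id):

  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep recovery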

Best regards,
Kurt
>
> --
>
> Kevin Weiler
>
> IT
>
>
> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
> 60606 | http://imc-chicago.com/
>
> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
> kevin.wei...@imc-chicago.com
>
>
>
>
>
>
>
> On 10/28/13 6:22 PM, "Kyle Bader"  wrote:
>
>> You can change some OSD tunables to lower the priority of backfills:
>>
>>osd recovery max chunk:   8388608
>>osd recovery op priority: 2
>>
>> In general a lower op priority means it will take longer for your
>> placement groups to go from degraded to active+clean, the idea is to
>> balance recovery time and not starving client requests. I've found 2
>> to work well on our clusters, YMMV.
>>
>> On Mon, Oct 28, 2013 at 10:16 AM, Kevin Weiler
>>  wrote:
>>> Hi all,
>>>
>>> We have a ceph cluster that being used as a backing store for several
>>> VMs
>>> (windows and linux). We notice that when we reboot a node, the cluster
>>> enters a degraded state (which is expected), but when it begins to
>>> recover,
>>> it starts backfilling and it kills the performance of our VMs. The VMs
>>> run
>>> slow, or not at all, and also seem to switch it's ceph mounts to
>>> read-only.
>>> I was wondering 2 things:
>>>
>>> Shouldn't we be recovering instead of backfilling? It seems like
>>> backfilling
>>> is much more intensive operation
>>> Can we improve the recovery/backfill performance so that our VMs don't
>>> go
>>> down when there is a problem with the cluster?
>>>
>>>
>>> --
>>>
>>> Kevin Weiler
>>>
>>> IT
>>>
>>>
>>>
>>> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>>> 60606
>>> | http://imc-chicago.com/
>>>
>>> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>>> kevin.wei...@imc-chicago.com
>>>
>>>
>>> 
>>>
>>> The information in this e-mail is intended only for the person or
>>> entity to
>>> which it is addressed.
>>>
>>> It may contain confidential and /or privileged material. If someone
>>> other
>>> than the intended recipient should receive this e-mail, he / she shall
>>> not
>>> be entitled to read, disseminate, disclose or duplicate it.
>>>
>>> If you receive this e-mail unintentionally, please inform us
>>> immediately by
>>> "reply" and then delete it from your system. Although this information
>>> has
>>> been compiled with great care, neither IMC Financial Markets & Asset
>>> Management nor any of its related entities shall accept any
>>> responsibility
>>> for any errors, omissions or other inaccuracies in this information or
>>> for
>>> the consequences thereof, nor shall it be bound in any way by the
>>> contents
>>> of this e-mail or its attachments. In the event of incomplete or
>>> incorrect
>>> transmission, please return the e-mail to the sender and permanently
>>> delete
>>> this message and any attachments.
>>>
>>> Messages and attachments are scanned for all known viruses. Always scan
>>> attachments before opening them.
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>> --
>>
>> Kyle
>
>
> 
>
> The information in this e-mail is intended only for the person or entity to 
> which it is addressed.
>
> It may contain confidential and /or privileged material. If someone other 
> than the intended recipient should receive this e-mail, he / she shall not be 
> entitled to read, disseminate, disclose or duplicate it.
>
> If you receive this e-mail unintentionally, please inform us immediately by 
> "reply" and then delete it from your system. Although this information has 
> been compiled with great care, neither IMC Financial Markets & Asset 
> Management nor any of its related entities shall accept any responsibility 
> for any errors, omissions or other inaccuracies in this information or for 
> the consequences thereof, nor shall it be bound in any way by the contents of 
> this e-mail or its attachments. In the event of incomplete or incorrect 
> transmission, please return the e-mail to the sender and permanently delete 
> this message and any attachments.
>
> Messages and attachments are scanned for all known viruses. Always scan 
> attachments before opening them.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Kurt Bauer 
Vienna University Computer Center - 

Re: [ceph-users] ceph recovery killing vms

2013-11-05 Thread Kevin Weiler
Thanks Kurt,

I wasn't aware of the second page; it has been very helpful. However, the osd 
recovery max chunk option doesn't list a unit:


osd recovery max chunk

Description: The maximum size of a recovered chunk of data to push.
Type:        64-bit Integer Unsigned
Default:     1 << 20



I assume this is in bytes.
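
For what it's worth, if the value is indeed in bytes, the default works out to
1 MiB and Kyle's suggested 8388608 to 8 MiB; a quick check in a Python shell:

>>> 1 << 20        # default for "osd recovery max chunk"
1048576
>>> 8 * (1 << 20)  # Kyle's suggested value
8388608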


--
Kevin Weiler
IT

IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | 
http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
kevin.wei...@imc-chicago.com

From: Kurt Bauer <kurt.ba...@univie.ac.at>
Date: Tuesday, November 5, 2013 2:52 PM
To: Kevin Weiler <kevin.wei...@imc-chicago.com>
Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] ceph recovery killing vms



Kevin Weiler schrieb:

Thanks Kyle,

What's the unit for osd recovery max chunk?

Have a look at http://ceph.com/docs/master/rados/configuration/osd-config-ref/ 
where all the possible OSD config options are described, especially have a look 
at the backfilling and recovery sections.

Also, how do I find out what my current values are for these osd options?

Have a look at 
http://ceph.com/docs/master/rados/configuration/ceph-conf/#viewing-a-configuration-at-runtime

Best regards,
Kurt

--

Kevin Weiler

IT


IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
60606 | http://imc-chicago.com/

Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
kevin.wei...@imc-chicago.com







On 10/28/13 6:22 PM, "Kyle Bader" 
 wrote:



You can change some OSD tunables to lower the priority of backfills:

   osd recovery max chunk:   8388608
   osd recovery op priority: 2

In general a lower op priority means it will take longer for your
placement groups to go from degraded to active+clean, the idea is to
balance recovery time and not starving client requests. I've found 2
to work well on our clusters, YMMV.

On Mon, Oct 28, 2013 at 10:16 AM, Kevin Weiler
 wrote:


Hi all,

We have a ceph cluster that being used as a backing store for several
VMs
(windows and linux). We notice that when we reboot a node, the cluster
enters a degraded state (which is expected), but when it begins to
recover,
it starts backfilling and it kills the performance of our VMs. The VMs
run
slow, or not at all, and also seem to switch it's ceph mounts to
read-only.
I was wondering 2 things:

Shouldn't we be recovering instead of backfilling? It seems like
backfilling
is much more intensive operation
Can we improve the recovery/backfill performance so that our VMs don't
go
down when there is a problem with the cluster?


--

Kevin Weiler

IT



IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
60606
| http://imc-chicago.com/

Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
kevin.wei...@imc-chicago.com




The information in this e-mail is intended only for the person or
entity to
which it is addressed.

It may contain confidential and /or privileged material. If someone
other
than the intended recipient should receive this e-mail, he / she shall
not
be entitled to read, disseminate, disclose or duplicate it.

If you receive this e-mail unintentionally, please inform us
immediately by
"reply" and then delete it from your system. Although this information
has
been compiled with great care, neither IMC Financial Markets & Asset
Management nor any of its related entities shall accept any
responsibility
for any errors, omissions or other inaccuracies in this information or
for
the consequences thereof, nor shall it be bound in any way by the
contents
of this e-mail or its attachments. In the event of incomplete or
incorrect
transmission, please return the e-mail to the sender and permanently
delete
this message and any attachments.

Messages and attachments are scanned for all known viruses. Always scan
attachments before opening them.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--

Kyle






The information in this e-mail is intended only for the person or entity to 
which it is addressed.

It may contain confidential and /or privileged material. If someone other than 
the intended recipient should receive this e-mail, he / she shall not be 
entitled to read, disseminate, disclose or duplicate it.

If you receive this e-mail unintentionally, please inform us immediately by 
"reply" and then delete it from your system. Although this information has been 
compiled with great care, neither IMC Financial Markets & Asset Management nor 
any of its related entities shall accept any responsibility for any er

[ceph-users] USB pendrive as boot disk

2013-11-05 Thread Gandalf Corvotempesta
Hi,
what do you think about using a USB pendrive as the boot disk for OSD nodes?
Pendrives are cheap and big enough, and doing this will allow me to use
all spinning disks and SSDs for OSD storage/journals.

Moreover, in the future, I'll be able to boot from the network, replacing the
pendrive, without losing space on the spinning disks to store the operating
system.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] USB pendrive as boot disk

2013-11-05 Thread james
It has been reported that the system is heavy on the OS during recovery; I 
believe the current recommendation is 5:1 OSD disks to SSDs and a separate OS 
mirror.


On 2013-11-05 21:33, Gandalf Corvotempesta wrote:

Hi,
what do you think about using a USB pendrive as the boot disk for OSD nodes?
Pendrives are cheap and big enough, and doing this will allow me to use
all spinning disks and SSDs for OSD storage/journals.

Moreover, in the future, I'll be able to boot from the network, replacing the
pendrive, without losing space on the spinning disks to store the operating
system.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Aging off objects [SEC=UNOFFICIAL]

2013-11-05 Thread Dickson, Matt MR
UNOFFICIAL

Hi,

I'm new to Ceph and investigating how objects can be aged off, i.e. delete all 
objects older than 7 days.  Is there functionality to do this via the Ceph 
Swift API or, alternatively, using a Java rados library?
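
For illustration, one client-side fallback would be to sweep the bucket through
the S3-compatible radosgw API with boto and delete anything older than the
cutoff; in the sketch below the endpoint, credentials and bucket name are
placeholders, and it assumes boto's usual ISO 8601 strings for
key.last_modified. Something built into the gateway or the libraries would
obviously be preferable.

import boto
import boto.s3.connection
from datetime import datetime, timedelta

# Placeholders -- substitute your own radosgw endpoint and keys.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='radosgw.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

cutoff = datetime.utcnow() - timedelta(days=7)
bucket = conn.get_bucket('my-bucket')   # placeholder bucket name

for key in bucket.list():
    # Bucket listings report last_modified as e.g. '2013-11-05T12:34:56.000Z'
    modified = datetime.strptime(key.last_modified, '%Y-%m-%dT%H:%M:%S.%fZ')
    if modified < cutoff:
        bucket.delete_key(key.name)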

Thanks in advance,
Matt Dickson
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Very frustrated with Ceph!

2013-11-05 Thread Neil Levine
In the Debian world, purge both removes the package and cleans up its files, so
it might be good to keep semantic consistency here?


On Tue, Nov 5, 2013 at 1:11 AM, Sage Weil  wrote:

> Purgedata is only meant to be run *after* the package is uninstalled.  We
> should make it do a check to enforce that.  Otherwise we run into these
> problems...
>
>
> Mark Kirkwood  wrote:
>>
>> On 05/11/13 06:37, Alfredo Deza wrote:
>>
>>>  On Mon, Nov 4, 2013 at 12:25 PM, Gruher, Joseph R
>>>   wrote:
>>>
  Could these problems be caused by running a purgedata but not a purge?

>>>
>>>  It could be, I am not clear on what the expectation was for just doing
>>>  purgedata without a purge.
>>>
>>>  Purgedata removes /etc/ceph but without the purge ceph is still installed,
  then ceph-deploy install detects ceph as already installed and does not
  (re)create /etc/ceph?

>>>
>>>  ceph-deploy will not create directories for you, that is
>>> left to the
>>>  ceph install process, and just to be clear, the
>>>  latest ceph-deploy version (1.3) does not remove /etc/ceph, just the 
>>> contents.
>>>
>>
>> Yeah, however purgedata is removing /var/lib/ceph, which means after
>> running purgedata you need to either run purge then install or manually
>> recreate the various working directories under /var/lib/ceph before
>> attempting any mon. mds or osd creation.
>>
>> Maybe purgedata should actually leave those top level dirs under
>> /var/lib/ceph?
>>
>> regards
>>
>> Mark
>> --
>>
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Very frustrated with Ceph!

2013-11-05 Thread Dan Mick
Yeah; purge does remove packages and *package config files*; however, 
Ceph data is in a different class, hence the existence of purgedata.


A user might be furious if he did what he thought was "remove the 
packages" and the process also creamed his terabytes of stored data he 
was in the process of moving to a different OSD server, manually 
recovering, or whatever.


On 11/05/2013 03:03 PM, Neil Levine wrote:

In the Debian world, purge does both a removal of the package and a
clean up the files so might be good to keep semantic consistency here?


On Tue, Nov 5, 2013 at 1:11 AM, Sage Weil mailto:s...@newdream.net>> wrote:

Purgedata is only meant to be run *after* the package is
uninstalled.  We should make it do a check to enforce that.
Otherwise we run into these problems...



Mark Kirkwood mailto:mark.kirkw...@catalyst.net.nz>> wrote:

On 05/11/13 06:37, Alfredo Deza wrote:

On Mon, Nov 4, 2013 at 12:25 PM, Gruher, Joseph R
mailto:joseph.r.gru...@intel.com>> wrote:

Could these problems be caused by running a purgedata
but not a purge?


It could be, I am not clear on what the expectation was for
just doing
purgedata without a purge.

Purgedata removes /etc/ceph but without the purge ceph
is still installed,
then ceph-deploy install detects ceph as already
installed and does not
(re)create /etc/ceph?


ceph-deploy will not create directories for you, that is
left to the
ceph install process, and just to be clear, the
latest ceph-deploy version (1.3) does not remove /etc/ceph,
just the contents.


Yeah, however purgedata is removing /var/lib/ceph, which means
after
running purgedata you need to either run purge then install or
manually
recreate the various working directories under /var/lib/ceph before
attempting any mon. mds or osd creation.

Maybe purgedata should actually leave those top level dirs under
/var/lib/ceph?

regards

Mark


ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Running on disks that lose their head

2013-11-05 Thread Loic Dachary
Hi Ceph,

People from Western Digital suggested ways to better take advantage of the disk 
error reporting. They gave two examples that struck my imagination. First there 
are errors that look like the disk is dying ( read / write failures ) but it's 
only a transient problem and the driver should be able to make the difference 
by properly interpreting the available information. They said that the 
prolonged life you get if you don't decommission a disk that only has a 
transient error is significant. The second example is when one head out of ten 
fails : disks can keep working with the nine remaining heads. Losing 1/10 of 
the disk is likely to result in a full re-install of the Ceph osd. But, again, 
the disk could keep going after that, with 9/10 of its original capacity. And 
Ceph is good at handling osd failures.

All this is news to me and sounds really cool. But I'm sure there are people 
who already know about it and I'm eager to hear their opinion :-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Very frustrated with Ceph!

2013-11-05 Thread Mark Kirkwood
I think purge of several data containing packages will ask if you want 
to destroy that too (Mysql comes to mind - asks if you want to remove 
the databases under /var/lib/mysql). So this is possibly reasonable 
behaviour.


Cheers

Mark

On 06/11/13 13:25, Dan Mick wrote:
Yeah; purge does remove packages and *package config files*; however, 
Ceph data is in a different class, hence the existence of purgedata.


A user might be furious if he did what he thought was "remove the 
packages" and the process also creamed his terabytes of stored data he 
was in the process of moving to a different OSD server, manually 
recovering, or whatever.


On 11/05/2013 03:03 PM, Neil Levine wrote:

In the Debian world, purge does both a removal of the package and a
clean up the files so might be good to keep semantic consistency here?


On Tue, Nov 5, 2013 at 1:11 AM, Sage Weil mailto:s...@newdream.net>> wrote:

Purgedata is only meant to be run *after* the package is
uninstalled.  We should make it do a check to enforce that.
Otherwise we run into these problems...



Mark Kirkwood mailto:mark.kirkw...@catalyst.net.nz>> wrote:

On 05/11/13 06:37, Alfredo Deza wrote:

On Mon, Nov 4, 2013 at 12:25 PM, Gruher, Joseph R
mailto:joseph.r.gru...@intel.com>> wrote:

Could these problems be caused by running a purgedata
but not a purge?


It could be, I am not clear on what the expectation was for
just doing
purgedata without a purge.

Purgedata removes /etc/ceph but without the purge ceph
is still installed,
then ceph-deploy install detects ceph as already
installed and does not
(re)create /etc/ceph?


ceph-deploy will not create directories for you, that is
left to the
ceph install process, and just to be clear, the
latest ceph-deploy version (1.3) does not remove /etc/ceph,
just the contents.


Yeah, however purgedata is removing /var/lib/ceph, which means
after
running purgedata you need to either run purge then install or
manually
recreate the various working directories under /var/lib/ceph 
before

attempting any mon. mds or osd creation.

Maybe purgedata should actually leave those top level dirs under
/var/lib/ceph?

regards

Mark


ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Very frustrated with Ceph!

2013-11-05 Thread Mark Kirkwood
... forgot to add: maybe 'uninstall' should be a target for ceph-deploy 
that removes just the actual software daemons...


On 06/11/13 14:16, Mark Kirkwood wrote:
I think purge of several data containing packages will ask if you want 
to destroy that too (Mysql comes to mind - asks if you want to remove 
the databases under /var/lib/mysql). So this is possibly reasonable 
behaviour.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Very frustrated with Ceph!

2013-11-05 Thread Mark Nelson
We had a discussion about all of this a year ago (when package purge was 
removing mds data and thus destroying clusters).  I think we have to be 
really careful here as it's rather permanent if you make a bad choice. 
I'd much rather that users be annoyed with me that they have to go 
manually clean up old data vs users who can't get their data back 
without herculean efforts.


Mark

On 11/05/2013 07:19 PM, Mark Kirkwood wrote:

... forgot to add: maybe 'uninstall' should be target for ceph-deploy
that removes just the actual software daemons...

On 06/11/13 14:16, Mark Kirkwood wrote:

I think purge of several data containing packages will ask if you want
to destroy that too (Mysql comes to mind - asks if you want to remove
the databases under /var/lib/mysql). So this is possibly reasonable
behaviour.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Very frustrated with Ceph!

2013-11-05 Thread Mark Kirkwood

Yep - better to be overly cautious about that :-)

On 06/11/13 14:40, Mark Nelson wrote:
We had a discussion about all of this a year ago (when package purge 
was removing mds data and thus destroying clusters).  I think we have 
to be really careful here as it's rather permanent if you make a bad 
choice. I'd much rather that users be annoyed with me that they have 
to go manually clean up old data vs users who can't get their data 
back without herculean efforts.


Mark

On 11/05/2013 07:19 PM, Mark Kirkwood wrote:

... forgot to add: maybe 'uninstall' should be target for ceph-deploy
that removes just the actual software daemons...

On 06/11/13 14:16, Mark Kirkwood wrote:

I think purge of several data containing packages will ask if you want
to destroy that too (Mysql comes to mind - asks if you want to remove
the databases under /var/lib/mysql). So this is possibly reasonable
behaviour.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stopped backfilling process

2013-11-05 Thread Dominik Mostowiec
Hi,
This is an S3/Ceph cluster; .rgw.buckets has 3 copies of the data.
Many PGs are only on 2 OSDs and are marked as 'degraded'.
Can scrubbing fix these degraded objects?

I don't have tunables set in CRUSH; maybe that could help (is it safe?)?

--
Regards
Dominik



2013/11/5 Dominik Mostowiec :
> Hi,
> After remove ( ceph osd out X) osd from one server ( 11 osd ) ceph
> starts data migration process.
> It stopped on:
> 32424 pgs: 30635 active+clean, 191 active+remapped, 1596
> active+degraded, 2 active+clean+scrubbing;
> degraded (1.718%)
>
> All osd with reweight==1 are UP.
>
> ceph -v
> ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)
>
> health details:
> https://www.dropbox.com/s/149zvee2ump1418/health_details.txt
>
> pg active+degraded query:
> https://www.dropbox.com/s/46emswxd7s8xce1/pg_11.39_query.txt
> pg active+remapped query:
> https://www.dropbox.com/s/wij4uqh8qoz60fd/pg_16.2172_query.txt
>
> Please help - how can we fix it?
>
> --
> Pozdrawiam
> Dominik



-- 
Pozdrawiam
Dominik
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] s3 user can't create bucket

2013-11-05 Thread lixuehui
Hi all:

I failed to create a bucket with the S3 API; the error is 403 'Access Denied'. 
In fact, I've given the user write permission.
{ "user_id": "lxh",
  "display_name": "=lxh",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [],
  "keys": [
{ "user": "lxh",
  "access_key": "JZ9N42JQY636PTTZ76VZ",
  "secret_key": "2D37kjLXda7dPxGpjJ3ZhNCBHzd9wmxoJnf9FcQo"}],
  "swift_keys": [],
  "caps": [
{ "type": "usage",
  "perm": "*"},
{ "type": "user",
  "perm": "*"}],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": []}

At the same time, there is no '\' in the generated secret_key.
2013-11-06 15:20:31.787363 7f167df9b700  2 req 1:0.000522::PUT 
/my_bucket/::initializing
2013-11-06 15:20:31.787435 7f167df9b700 10 host=cephtest.com 
rgw_dns_name=ceph-osd26
2013-11-06 15:20:31.787929 7f167df9b700 10 s->object= s->bucket=my_bucket
2013-11-06 15:20:31.788085 7f167df9b700 20 FCGI_ROLE=RESPONDER
2013-11-06 15:20:31.788107 7f167df9b700 20 SCRIPT_URL=/my_bucket/
2013-11-06 15:20:31.788119 7f167df9b700 20 
SCRIPT_URI=http://cephtest.com/my_bucket/
2013-11-06 15:20:31.788130 7f167df9b700 20 HTTP_HOST=cephtest.com
2013-11-06 15:20:31.788140 7f167df9b700 20 HTTP_ACCEPT_ENCODING=identity
2013-11-06 15:20:31.788151 7f167df9b700 20 HTTP_DATE=Wed, 06 Nov 2013 07:20:31 
GMT
2013-11-06 15:20:31.788162 7f167df9b700 20 CONTENT_LENGTH=0
2013-11-06 15:20:31.788172 7f167df9b700 20 HTTP_USER_AGENT=Boto/2.15.0 
Python/2.7.3 Linux/3.5.0-23-generic
2013-11-06 15:20:31.788182 7f167df9b700 20 PATH=/usr/local/bin:/usr/bin:/bin
2013-11-06 15:20:31.788193 7f167df9b700 20 SERVER_SIGNATURE=
2013-11-06 15:20:31.788203 7f167df9b700 20 SERVER_SOFTWARE=Apache/2.2.22 
(Ubuntu)
2013-11-06 15:20:31.788213 7f167df9b700 20 SERVER_NAME=cephtest.com
2013-11-06 15:20:31.788223 7f167df9b700 20 SERVER_ADDR=192.168.50.116
2013-11-06 15:20:31.788234 7f167df9b700 20 SERVER_PORT=80
2013-11-06 15:20:31.788247 7f167df9b700 20 REMOTE_ADDR=192.168.50.116
2013-11-06 15:20:31.788260 7f167df9b700 20 DOCUMENT_ROOT=/var/www/
2013-11-06 15:20:31.788311 7f167df9b700 20 SERVER_ADMIN=[no address given]
2013-11-06 15:20:31.788324 7f167df9b700 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
2013-11-06 15:20:31.788336 7f167df9b700 20 REMOTE_PORT=45737
2013-11-06 15:20:31.788348 7f167df9b700 20 GATEWAY_INTERFACE=CGI/1.1
2013-11-06 15:20:31.788361 7f167df9b700 20 SERVER_PROTOCOL=HTTP/1.1
2013-11-06 15:20:31.788374 7f167df9b700 20 REQUEST_METHOD=PUT
2013-11-06 15:20:31.788389 7f167df9b700 20 
QUERY_STRING=[E=HTTP_AUTHORIZATION:AWS 
JZ9N42JQY636PTTZ76VZ:ttIro1R21j6GAjVsDITrz5DK66Y=,L]
2013-11-06 15:20:31.788471 7f167df9b700 20 REQUEST_URI=/my_bucket/
2013-11-06 15:20:31.788476 7f167df9b700 20 SCRIPT_NAME=/my_bucket/
2013-11-06 15:20:31.788483 7f167df9b700  2 req 1:0.001643:s3:PUT 
/my_bucket/::getting op
2013-11-06 15:20:31.788519 7f167df9b700  2 req 1:0.001679:s3:PUT 
/my_bucket/:create_bucket:authorizing
2013-11-06 15:20:31.788638 7f167df9b700  2 req 1:0.001798:s3:PUT 
/my_bucket/:create_bucket:reading permissions
2013-11-06 15:20:31.788688 7f167df9b700  2 req 1:0.001847:s3:PUT 
/my_bucket/:create_bucket:verifying op mask
2013-11-06 15:20:31.788719 7f167df9b700 20 required_mask= 2 user.op_mask=7
2013-11-06 15:20:31.788743 7f167df9b700  2 req 1:0.001903:s3:PUT 
/my_bucket/:create_bucket:verifying op permissions
2013-11-06 15:20:31.789225 7f167df9b700  2 req 1:0.002385:s3:PUT 
/my_bucket/:create_bucket:http status=403
2013-11-06 15:20:31.790319 7f167df9b700  1 == req done req=0x20d6eb0 
http_status=403 ==


the program is like this:
import boto
import boto.s3.connection 
access_key='JZ9N42JQY636PTTZ76VZ'
secret_key='2D37kjLXda7dPxGpjJ3ZhNCBHzd9wmxoJnf9FcQo'
conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host="cephtest.com",
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
print "hello world"
conn.create_bucket('my_bucket')
It seems like a permission problem, but I really cannot resolve it with this 
user information.
HELP!
HELP!!
Thanks for any help!
 



lixuehui
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] USB pendrive as boot disk

2013-11-05 Thread Gandalf Corvotempesta
2013/11/5  :
> It has been reported that the system is heavy on the OS during recovery;

Why? Recovery is done from the OSDs/SSDs, so why would Ceph be heavy on the OS
disks? There is nothing useful to read from those disks during a recovery.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running on disks that lose their head

2013-11-05 Thread james
It is cool - and it's interesting that more access to the inner workings of 
the drives would be useful, given that the ATA controller (an evolution of the 
WD1010 MFM controller) has historically hidden steadily more of them, to 
maintain compatibility with the old CHS addressing (later LBA).


The streaming command set could (I think) help, since the number of retries 
can be specified down to only 3 revolutions, after which the drive will return 
whatever it has but with the error flag set accordingly.  Especially with btrfs 
and its own checksumming, in a way this seems logical... there are another two 
copies elsewhere anyway, and you know which of those is good via the FS-level 
checksum.


The head failure is an interesting scenario.  But a drive in such a state 
could surely never complete its own POST with the firmware implemented today - 
if you take the lid off and scratch up the top surface, it kills the drive 
immediately, though they will run without a lid for a bit (minus the 
scratches).


IDE integrated the MFM controller onto the drive; now it seems there is an 
opportunity to integrate the rest of the system onto the drive too - IDS drives 
perhaps? This would give us the ability to run our own OS (i.e. Linux), and if 
each surface were presented as a separate block device (/sda, /sdb etc.) along 
with a separate SSD device or other NVRAM for journalling or caching, the disk 
could be used in whatever way the use case requires, and the drive would surely 
become much more useful?


Ultimately cheaper deployments should be possible: say such drives were 4TB, 
cost £400 and used 12W, with a 16A per-rack power budget; then with drives, 
rack, power, backplane and 10Gb switches all guesstimated... something like 
£460 per usable TB to operate for 3 years (1.3p/GB-month).


Would love to know where the info from WD came from ;)


On 2013-11-06 00:32, Loic Dachary wrote:

Hi Ceph,

People from Western Digital suggested ways to better take advantage
of the disk error reporting. They gave two examples that struck my
imagination. First there are errors that look like the disk is dying 
(

read / write failures ) but it's only a transient problem and the
driver should be able to make the difference by properly interpreting
the available information. They said that the prolonged life you get
if you don't decommission a disk that only has a transient error is
significant. The second example is when one head out of ten fails :
disks can keep working with the nine remaining heads. Losing 1/10 of
the disk is likely to result in a full re-install of the Ceph osd.
But, again, the disk could keep going after that, with 9/10 of its
original capacity. And Ceph is good at handling osd failures.

All this is news to me and sounds really cool. But I'm sure there are
people who already know about it and I'm eager to hear their opinion
:-)

Cheers

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com