Re: [ceph-users] which kernel version can help avoid kernel client deadlock

2015-07-28 Thread Ilya Dryomov
On Tue, Jul 28, 2015 at 9:17 AM, van  wrote:
> Hi, list,
>
>   I found in the Ceph FAQ that the Ceph kernel client should not run on
> machines belonging to the Ceph cluster.
>   As the Ceph FAQ mentions, “In older kernels, Ceph can deadlock if you try to
> mount CephFS or RBD client services on the same host that runs your test
> Ceph cluster. This is not a Ceph-related issue.”
>   Here it says that there will be a deadlock when using an old kernel version.
>   I wonder if anyone knows which newer kernel versions solve this loopback
> mount deadlock.
>   It would be a great help, since I do need to use the rbd kernel client on the
> ceph cluster.

Note that doing this is *not* recommended.  That said, if you don't
push your system to its knees too hard, it should work.  I'm not sure
what exactly constitutes an older kernel as per that FAQ (as you
haven't even linked it), but even if I knew, I'd still suggest 4.1.

>
>   As I searched for more information, I found two articles,
> https://lwn.net/Articles/595652/ and https://lwn.net/Articles/596618/, which talk
> about supporting NFS loopback mounts. It seems the effort was not only in memory
> management but also in NFS-related code. I wonder if Ceph has made a similar
> effort in the kernel client to solve this problem. If it has, could
> anyone help provide the kernel version with the patch?

There wasn't any specific effort on the ceph side, but we do try not to
break it: sometime around 3.18 a ceph patch was merged that made it
impossible to co-locate the kernel client with OSDs; once we realized
that, the culprit patch was reverted and the revert was backported.

So the bottom line is we don't recommend it, but we try not to break
your ability to do it ;)

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] State of nfs-ganesha CEPH fsal

2015-07-28 Thread Burkhard Linke

Hi,

On 07/27/2015 05:42 PM, Gregory Farnum wrote:
> On Mon, Jul 27, 2015 at 4:33 PM, Burkhard Linke
>  wrote:
>> Hi,
>>
>> the nfs-ganesha documentation states:
>>
>> "... This FSAL links to a modified version of the CEPH library that has been
>> extended to expose its distributed cluster and replication facilities to the
>> pNFS operations in the FSAL. ... The CEPH library modifications have not
>> been merged into the upstream yet. "
>>
>> (https://github.com/nfs-ganesha/nfs-ganesha/wiki/Fsalsupport#ceph)
>>
>> Is this still the case with the hammer release?
>
> The FSAL has been upstream for quite a while, but it's not part of our
> regular testing yet and I'm not sure what it gets from the Ganesha
> side. I'd encourage you to test it, but be wary — we had a recent
> report of some issues we haven't been able to set up to reproduce yet.

Can you give some details on those issues? I'm currently looking for a
way to provide NFS-based access to CephFS to our desktop machines.


The kernel NFS implementation in Ubuntu had some problems with CephFS in 
our setup, which I was not able to resolve yet. Ganesha seems to be more 
promising, since it uses libcephfs directly and does not need a 
mountpoint of its own.
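
For reference, the export definition I'm testing with is essentially just a minimal
FSAL CEPH block like the following (the export id, paths and squash setting are
placeholders from my test setup):

    EXPORT
    {
        Export_ID = 1;
        Path = "/";
        Pseudo = "/cephfs";
        Access_Type = RW;
        Squash = No_Root_Squash;

        FSAL {
            Name = CEPH;
        }
    }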


Best regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] wrong documentation in add or rm mons

2015-07-28 Thread Makkelie, R (ITCDCC) - KLM
I followed this documentation to add monitors to my already existing
cluster with 1 mon:
http://ceph.com/docs/master/rados/operations/add-or-rm-mons/

When I follow this documentation,
the new monitor assimilates the old monitor, so my monitor status is gone.

But when I skip the "ceph mon add <mon-id> <ip>[:<port>]" part,
it adds the monitor and everything works well.

This issue also happens with "ceph-deploy mon add".

So I think the documentation is not correct.
Can someone confirm this?
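
For reference, the sequence I followed from that page looks roughly like this
(the mon id "b" and the IP are placeholders for my actual values):

    # on a node with an admin keyring: grab the mon keyring and current monmap
    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon getmap -o /tmp/monmap

    # on the new monitor host: prepare its data directory
    ceph-mon -i b --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring

    # the step from the docs that seems to break things for me
    ceph mon add b 192.168.0.2:6789

    # start the new monitor
    ceph-mon -i b --public-addr 192.168.0.2:6789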

greetz
Ramonskie


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel version can help avoid kernel client deadlock

2015-07-28 Thread van
Hi, Ilya,

  Thanks for your quick reply.

  Here is the link: http://ceph.com/docs/cuttlefish/faq/ , under the “HOW CAN I GIVE CEPH A
TRY?” section, which talks about the old kernel issue.

  By the way, what’s the main reason for using kernel 4.1? Are there a lot of
critical bug fixes in that version besides the performance improvements?
  I am worried that kernel 4.1 is too new and may introduce other problems.
  And if I’m using the librbd API, does the kernel version matter?
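  (By the librbd API I mean a purely userspace path through librados/librbd,
roughly like the python-rbd sketch below; the pool and image names are just
examples.)

    import rados
    import rbd

    # userspace access via librados/librbd -- no krbd kernel module involved
    # assumes /etc/ceph/ceph.conf exists and a pool named "rbd" is available
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')
    try:
        rbd.RBD().create(ioctx, 'test-image', 4 * 1024 ** 3)  # 4 GiB image
        image = rbd.Image(ioctx, 'test-image')
        try:
            image.write(b'hello from librbd', 0)  # write at offset 0
            print(image.stat()['size'])
        finally:
            image.close()
    finally:
        ioctx.close()
        cluster.shutdown()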

  In my tests, I built a 2-node cluster, each node with only one OSD, running CentOS
7.1, kernel 3.10.0-229 and Ceph v0.94.2.
  I created several RBDs and ran mkfs.xfs on them to create filesystems
(the kernel client was running on the Ceph cluster nodes).
  I performed heavy IO tests on those filesystems and found that some fio processes
got hung and stayed in the D state forever (uninterruptible sleep).
  I suspect it’s the deadlock that makes the fio processes hang.
  However, the ceph-osd daemons are still responsive, and I can operate RBDs via the
librbd API.
  Does this mean it’s not the loopback mount deadlock that causes the fio
processes to hang?
  Or is it also a deadlock phenomenon, where only one thread is blocked in memory
allocation while other threads can still receive API requests, so the
ceph-osd daemons remain responsive?
 
  What is worth mentioning is that after I restarted the ceph-osd daemon, all
processes in the D state returned to their normal state.

  Below is related log in kernel:

Jul  7 02:25:39 node0 kernel: INFO: task xfsaild/rbd1:24795 blocked for more 
than 120 seconds.
Jul  7 02:25:39 node0 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  7 02:25:39 node0 kernel: xfsaild/rbd1D 880c2fc13680 0 24795
  2 0x0080
Jul  7 02:25:39 node0 kernel: 8801d6343d40 0046 
8801d6343fd8 00013680
Jul  7 02:25:39 node0 kernel: 8801d6343fd8 00013680 
880c0c0b 880c0c0b
Jul  7 02:25:39 node0 kernel: 880c2fc14340 0001 
 8805bace2528
Jul  7 02:25:39 node0 kernel: Call Trace:
Jul  7 02:25:39 node0 kernel: [] schedule+0x29/0x70
Jul  7 02:25:39 node0 kernel: [] _xfs_log_force+0x230/0x290 
[xfs]
Jul  7 02:25:39 node0 kernel: [] ? wake_up_state+0x20/0x20
Jul  7 02:25:39 node0 kernel: [] xfs_log_force+0x26/0x80 [xfs]
Jul  7 02:25:39 node0 kernel: [] ? 
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Jul  7 02:25:39 node0 kernel: [] xfsaild+0x151/0x5e0 [xfs]
Jul  7 02:25:39 node0 kernel: [] ? 
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Jul  7 02:25:39 node0 kernel: [] kthread+0xcf/0xe0
Jul  7 02:25:39 node0 kernel: [] ? 
kthread_create_on_node+0x140/0x140
Jul  7 02:25:39 node0 kernel: [] ret_from_fork+0x7c/0xb0
Jul  7 02:25:39 node0 kernel: [] ? 
kthread_create_on_node+0x140/0x140
Jul  7 02:25:39 node0 kernel: INFO: task xfsaild/rbd5:2914 blocked for more 
than 120 seconds.

  Has anyone encountered the same problem, or could anyone help with this?

  Thanks. 
  
> 
> On Jul 28, 2015, at 3:01 PM, Ilya Dryomov  wrote:
> 
> On Tue, Jul 28, 2015 at 9:17 AM, van  wrote:
>> Hi, list,
>> 
>>  I found in the Ceph FAQ that the Ceph kernel client should not run on
>> machines belonging to the Ceph cluster.
>>  As the Ceph FAQ mentions, “In older kernels, Ceph can deadlock if you try to
>> mount CephFS or RBD client services on the same host that runs your test
>> Ceph cluster. This is not a Ceph-related issue.”
>>  Here it says that there will be a deadlock when using an old kernel version.
>>  I wonder if anyone knows which newer kernel versions solve this loopback
>> mount deadlock.
>>  It would be a great help, since I do need to use the rbd kernel client on the
>> ceph cluster.
> 
> Note that doing this is *not* recommended.  That said, if you don't
> push your system to its knees too hard, it should work.  I'm not sure
> what exactly constitutes an older kernel as per that FAQ (as you
> haven't even linked it), but even if I knew, I'd still suggest 4.1.


> 
>> 
>>  As I searched for more information, I found two articles,
>> https://lwn.net/Articles/595652/ and https://lwn.net/Articles/596618/, which talk
>> about supporting NFS loopback mounts. It seems the effort was not only in memory
>> management but also in NFS-related code. I wonder if Ceph has made a similar
>> effort in the kernel client to solve this problem. If it has, could
>> anyone help provide the kernel version with the patch?
> 
> There wasn't any specific effort on the ceph side, but we do try not to
> break it: sometime around 3.18 a ceph patch was merged that made it
> impossible to co-locate the kernel client with OSDs; once we realized
> that, the culprit patch was reverted and the revert was backported.
> 
> So the bottom line is we don't recommend it, but we try not to break
> your ability to do it ;)
> 
> Thanks,
> 
>Ilya

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] State of nfs-ganesha CEPH fsal

2015-07-28 Thread Gregory Farnum
On Tue, Jul 28, 2015 at 8:01 AM, Burkhard Linke
 wrote:
> Hi,
>
> On 07/27/2015 05:42 PM, Gregory Farnum wrote:
>>
>> On Mon, Jul 27, 2015 at 4:33 PM, Burkhard Linke
>>  wrote:
>>>
>>> Hi,
>>>
>>> the nfs-ganesha documentation states:
>>>
>>> "... This FSAL links to a modified version of the CEPH library that has
>>> been
>>> extended to expose its distributed cluster and replication facilities to
>>> the
>>> pNFS operations in the FSAL. ... The CEPH library modifications have not
>>> been merged into the upstream yet. "
>>>
>>> (https://github.com/nfs-ganesha/nfs-ganesha/wiki/Fsalsupport#ceph)
>>>
>>> Is this still the case with the hammer release?
>>
>> The FSAL has been upstream for quite a while, but it's not part of our
>> regular testing yet and I'm not sure what it gets from the Ganesha
>> side. I'd encourage you to test it, but be wary — we had a recent
>> report of some issues we haven't been able to set up to reproduce yet.
>
> Can you give some details on those issues? I'm currently looking for a way to
> provide NFS-based access to CephFS to our desktop machines.

Ummm...sadly I can't; we don't appear to have any tracker tickets and
I'm not sure where the report went to. :( I think it was from
Haomai...
-Greg

>
> The kernel NFS implementation in Ubuntu had some problems with CephFS in our
> setup, which I was not able to resolve yet. Ganesha seems to be more
> promising, since it uses libcephfs directly and does not need a mountpoint
> of its own.
>
> Best regards,
> Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Did maximum performance reached?

2015-07-28 Thread Shneur Zalman Mattern
We've built a Ceph cluster:
3 mon nodes (one of them is combined with the mds)
3 osd nodes (each one has 10 OSDs + 2 SSDs for journaling)
switch 24 ports x 10G
10 gigabit - for public network
20 gigabit bonding - between osds
Ubuntu 12.04.05
Ceph 0.87.2
-
Clients have:
10 gigabit for ceph-connection
CentOS 6.6 with kernel 3.19.8, equipped with the cephfs kernel module
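
(The fio job corresponds roughly to an invocation like the following; the target
directory is a placeholder for the CephFS mount point:)

    fio --name=seqwrite --rw=write --bs=1M --size=10G --numjobs=16 \
        --directory=/mnt/cephfs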



== fio-2.0.13 seqwrite, bs=1M, filesize=10G, parallel-jobs=16 ===
Single client:


Starting 16 processes

.below is just 1 job info
trivial-readwrite-grid01: (groupid=0, jobs=1): err= 0: pid=10484: Tue Jul 28 
13:26:24 2015
  write: io=10240MB, bw=78656KB/s, iops=76 , runt=133312msec
slat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
clat (usec): min=1 , max=68 , avg= 3.61, stdev= 1.99
 lat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
clat percentiles (usec):
 |  1.00th=[1],  5.00th=[2], 10.00th=[2], 20.00th=[2],
 | 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
 | 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
 | 99.00th=[9], 99.50th=[   10], 99.90th=[   23], 99.95th=[   28],
 | 99.99th=[   62]
bw (KB/s)  : min=35790, max=318215, per=6.31%, avg=78816.91, stdev=26397.76
lat (usec) : 2=1.33%, 4=54.43%, 10=43.54%, 20=0.56%, 50=0.11%
lat (usec) : 100=0.03%
  cpu  : usr=0.89%, sys=12.85%, ctx=58248, majf=0, minf=9
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0

...what's above repeats 16 times...

Run status group 0 (all jobs):
  WRITE: io=163840MB, aggrb=1219.8MB/s, minb=78060KB/s, maxb=78655KB/s, 
mint=133312msec, maxt=134329msec

+
Two clients:
+
below is just 1 job info
trivial-readwrite-gridsrv: (groupid=0, jobs=1): err= 0: pid=10605: Tue Jul 28 
14:05:59 2015
  write: io=10240MB, bw=43154KB/s, iops=42 , runt=242984msec
slat (usec): min=991 , max=285653 , avg=23716.12, stdev=23960.60
clat (usec): min=1 , max=65 , avg= 3.67, stdev= 2.02
 lat (usec): min=994 , max=285664 , avg=23723.39, stdev=23962.22
clat percentiles (usec):
 |  1.00th=[2],  5.00th=[2], 10.00th=[2], 20.00th=[2],
 | 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
 | 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
 | 99.00th=[8], 99.50th=[   10], 99.90th=[   28], 99.95th=[   37],
 | 99.99th=[   56]
bw (KB/s)  : min=20630, max=276480, per=6.30%, avg=43328.34, stdev=21905.92
lat (usec) : 2=0.84%, 4=49.45%, 10=49.13%, 20=0.37%, 50=0.18%
lat (usec) : 100=0.03%
  cpu  : usr=0.49%, sys=5.68%, ctx=31428, majf=0, minf=9
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0

...what's above repeats 16 times...

Run status group 0 (all jobs):
  WRITE: io=163840MB, aggrb=687960KB/s, minb=42997KB/s, maxb=43270KB/s, 
mint=242331msec, maxt=243869msec

- And almost the same(?!) aggregated result from the second client: 
-

Run status group 0 (all jobs):
  WRITE: io=163840MB, aggrb=679401KB/s, minb=42462KB/s, maxb=42852KB/s, 
mint=244697msec, maxt=246941msec

- If I summarize: -
aggrb1 + aggrb2 = 687960KB/s + 679401KB/s = 1367MB/s

It looks like the same bandwidth as from just one client (aggrb=1219.8MB/s), and it
was simply divided between them? Why?
Question: if I connect 12 client nodes, will each one be able to write at just 100MB/s?
Perhaps I need to scale our Ceph out to 15 (how many?) OSD nodes, so that it will
serve 2 clients at 1.3GB/s each (the bandwidth of a 10GbE NIC), or not?



health HEALTH_OK
 monmap e1: 3 mons at 
{mon1=192.168.56.251:6789/0,mon2=192.168.56.252:6789/0,mon3=192.168.56.253:6789/0},
 election epoch 140, quorum 0,1,2 mon1,mon2,mon3
 mdsmap e12: 1/1/1 up {0=mon3=up:active}
 osdmap e832: 31 osds: 30 up, 30 in
  pgmap v106186: 6144 pgs, 3 pools, 2306 GB data, 1379 kobjects
4624 GB used, 104 TB / 109 TB avail
6144 active+clean


Perhaps I don't understand something in the Ceph architecture? I thought that:

Each spindle disk can write ~100MB/s, and we have 10 SAS disks on each node, so the
aggregated write speed is ~900MB/s (because of striping etc.).
And we have 3 OSD nodes, and objects are also striped across 30 OSDs - I thought
it's also agg

Re: [ceph-users] State of nfs-ganesha CEPH fsal

2015-07-28 Thread Haomai Wang
On Tue, Jul 28, 2015 at 4:47 PM, Gregory Farnum  wrote:
> On Tue, Jul 28, 2015 at 8:01 AM, Burkhard Linke
>  wrote:
>> Hi,
>>
>> On 07/27/2015 05:42 PM, Gregory Farnum wrote:
>>>
>>> On Mon, Jul 27, 2015 at 4:33 PM, Burkhard Linke
>>>  wrote:

 Hi,

 the nfs-ganesha documentation states:

 "... This FSAL links to a modified version of the CEPH library that has
 been
 extended to expose its distributed cluster and replication facilities to
 the
 pNFS operations in the FSAL. ... The CEPH library modifications have not
 been merged into the upstream yet. "

 (https://github.com/nfs-ganesha/nfs-ganesha/wiki/Fsalsupport#ceph)

 Is this still the case with the hammer release?
>>>
>>> The FSAL has been upstream for quite a while, but it's not part of our
>>> regular testing yet and I'm not sure what it gets from the Ganesha
>>> side. I'd encourage you to test it, but be wary — we had a recent
>>> report of some issues we haven't been able to set up to reproduce yet.
>>
>> Can you give some details on those issues? I'm currently looking for a way to
>> provide NFS-based access to CephFS to our desktop machines.
>
> Ummm...sadly I can't; we don't appear to have any tracker tickets and
> I'm not sure where the report went to. :( I think it was from
> Haomai...

My fault, I should have reported this in a ticket.

I have forgotten the details of the problem; I submitted the info to IRC :-(

It was related to the "ls" output: it will print the wrong user/group
owner as "-1", maybe related to root squash?

> -Greg
>
>>
>> The kernel NFS implementation in Ubuntu had some problems with CephFS in our
>> setup, which I was not able to resolve yet. Ganesha seems to be more
>> promising, since it uses libcephfs directly and does not need a mountpoint
>> of its own.
>>
>> Best regards,
>> Burkhard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Did maximum performance reached?

2015-07-28 Thread Johannes Formann
Hello,

what is the „size“ parameter of your pool?

Some math to show the impact:
size=3 means each write is written 6 times (3 copies, each going first to the journal and later to
disk). Calculating with 1300 MB/s of client bandwidth, that means:

3 (size) * 1300 MB/s / 6 (SSD) => 650MB/s per SSD
3 (size) * 1300 MB/s / 30 (HDD) => 130MB/s per HDD

If you use size=3, the results are as good as one can expect. (Even with size=2 
the results won’t be bad)
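
As a quick back-of-the-envelope script (same assumptions as above: for each of the
size copies, every client byte is written once to a journal SSD and once to a data HDD):

    # rough per-device write load, same assumptions as the numbers above
    def per_device_write(client_bw_mb, size, n_ssd, n_hdd):
        total = client_bw_mb * size          # each client byte is stored `size` times
        return total / n_ssd, total / n_hdd  # one journal write (SSD), one data write (HDD)

    print(per_device_write(1300, 3, 6, 30))  # -> (650.0, 130.0) MB/s per SSD / per HDD
    print(per_device_write(1300, 2, 6, 30))  # -> roughly (433.3, 86.7)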

greetings

Johannes

> Am 28.07.2015 um 10:53 schrieb Shneur Zalman Mattern :
> 
> We've built Ceph cluster:
> 3 mon nodes (one of them is combined with mds)
> 3 osd nodes (each one have 10 osd + 2 ssd for journaling)
> switch 24 ports x 10G
> 10 gigabit - for public network
> 20 gigabit bonding - between osds 
> Ubuntu 12.04.05
> Ceph 0.87.2
> -
> Clients has:
> 10 gigabit for ceph-connection
> CentOS 6.6 with kernel 3.19.8 equipped by cephfs-kmodule 
> 
> 
> 
> == fio-2.0.13 seqwrite, bs=1M, filesize=10G, parallel-jobs=16 ===
> Single client:
> 
> 
> Starting 16 processes
> 
> .below is just 1 job info
> trivial-readwrite-grid01: (groupid=0, jobs=1): err= 0: pid=10484: Tue Jul 28 
> 13:26:24 2015
>   write: io=10240MB, bw=78656KB/s, iops=76 , runt=133312msec
> slat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
> clat (usec): min=1 , max=68 , avg= 3.61, stdev= 1.99
>  lat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
> clat percentiles (usec):
>  |  1.00th=[1],  5.00th=[2], 10.00th=[2], 20.00th=[2],
>  | 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
>  | 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
>  | 99.00th=[9], 99.50th=[   10], 99.90th=[   23], 99.95th=[   28],
>  | 99.99th=[   62]
> bw (KB/s)  : min=35790, max=318215, per=6.31%, avg=78816.91, 
> stdev=26397.76
> lat (usec) : 2=1.33%, 4=54.43%, 10=43.54%, 20=0.56%, 50=0.11%
> lat (usec) : 100=0.03%
>   cpu  : usr=0.89%, sys=12.85%, ctx=58248, majf=0, minf=9
>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0
> 
> ...what's above repeats 16 times... 
> 
> Run status group 0 (all jobs):
>   WRITE: io=163840MB, aggrb=1219.8MB/s, minb=78060KB/s, maxb=78655KB/s, 
> mint=133312msec, maxt=134329msec
> 
> +
> Two clients:
> +
> below is just 1 job info
> trivial-readwrite-gridsrv: (groupid=0, jobs=1): err= 0: pid=10605: Tue Jul 28 
> 14:05:59 2015
>   write: io=10240MB, bw=43154KB/s, iops=42 , runt=242984msec
> slat (usec): min=991 , max=285653 , avg=23716.12, stdev=23960.60
> clat (usec): min=1 , max=65 , avg= 3.67, stdev= 2.02
>  lat (usec): min=994 , max=285664 , avg=23723.39, stdev=23962.22
> clat percentiles (usec):
>  |  1.00th=[2],  5.00th=[2], 10.00th=[2], 20.00th=[2],
>  | 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
>  | 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
>  | 99.00th=[8], 99.50th=[   10], 99.90th=[   28], 99.95th=[   37],
>  | 99.99th=[   56]
> bw (KB/s)  : min=20630, max=276480, per=6.30%, avg=43328.34, 
> stdev=21905.92
> lat (usec) : 2=0.84%, 4=49.45%, 10=49.13%, 20=0.37%, 50=0.18%
> lat (usec) : 100=0.03%
>   cpu  : usr=0.49%, sys=5.68%, ctx=31428, majf=0, minf=9
>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0
> 
> ...what's above repeats 16 times... 
> 
> Run status group 0 (all jobs):
>   WRITE: io=163840MB, aggrb=687960KB/s, minb=42997KB/s, maxb=43270KB/s, 
> mint=242331msec, maxt=243869msec
> 
> - And almost the same(?!) aggregated result from the second client: 
> -
> 
> Run status group 0 (all jobs):
>   WRITE: io=163840MB, aggrb=679401KB/s, minb=42462KB/s, maxb=42852KB/s, 
> mint=244697msec, maxt=246941msec
> 
> - If I'll summarize: -
> aggrb1 + aggrb2 = 687960KB/s + 679401KB/s = 1367MB/s 
> 
> it looks like the same bandwidth from just one client aggrb=1219.8MB/s and it 
> was divided? why?
> Question: If I'll connect 12 clients nodes - each one can write just on 
> 100MB/s?
> Perhaps, I need to scale out our ceph up to 15(how many?) OSD nodes - and 
> it'll serve 2 clients on the 1.3GB/s (bw of 10gig nic), or not? 
> 
> =

Re: [ceph-users] Did maximum performance reached?

2015-07-28 Thread Karan Singh
Hi

What type of clients do you have?

- Are they physical Linux machines or VMs mounting Ceph RBD or CephFS?
- Or are they simply OpenStack / cloud instances using Ceph as Cinder volumes
or something like that?


- Karan -

> On 28 Jul 2015, at 11:53, Shneur Zalman Mattern  wrote:
> 
> We've built Ceph cluster:
> 3 mon nodes (one of them is combined with mds)
> 3 osd nodes (each one have 10 osd + 2 ssd for journaling)
> switch 24 ports x 10G
> 10 gigabit - for public network
> 20 gigabit bonding - between osds 
> Ubuntu 12.04.05
> Ceph 0.87.2
> -
> Clients has:
> 10 gigabit for ceph-connection
> CentOS 6.6 with kernel 3.19.8 equipped by cephfs-kmodule 
> 
> 
> 
> == fio-2.0.13 seqwrite, bs=1M, filesize=10G, parallel-jobs=16 ===
> Single client:
> 
> 
> Starting 16 processes
> 
> .below is just 1 job info
> trivial-readwrite-grid01: (groupid=0, jobs=1): err= 0: pid=10484: Tue Jul 28 
> 13:26:24 2015
>   write: io=10240MB, bw=78656KB/s, iops=76 , runt=133312msec
> slat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
> clat (usec): min=1 , max=68 , avg= 3.61, stdev= 1.99
>  lat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
> clat percentiles (usec):
>  |  1.00th=[1],  5.00th=[2], 10.00th=[2], 20.00th=[2],
>  | 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
>  | 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
>  | 99.00th=[9], 99.50th=[   10], 99.90th=[   23], 99.95th=[   28],
>  | 99.99th=[   62]
> bw (KB/s)  : min=35790, max=318215, per=6.31%, avg=78816.91, 
> stdev=26397.76
> lat (usec) : 2=1.33%, 4=54.43%, 10=43.54%, 20=0.56%, 50=0.11%
> lat (usec) : 100=0.03%
>   cpu  : usr=0.89%, sys=12.85%, ctx=58248, majf=0, minf=9
>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0
> 
> ...what's above repeats 16 times... 
> 
> Run status group 0 (all jobs):
>   WRITE: io=163840MB, aggrb=1219.8MB/s, minb=78060KB/s, maxb=78655KB/s, 
> mint=133312msec, maxt=134329msec
> 
> +
> Two clients:
> +
> below is just 1 job info
> trivial-readwrite-gridsrv: (groupid=0, jobs=1): err= 0: pid=10605: Tue Jul 28 
> 14:05:59 2015
>   write: io=10240MB, bw=43154KB/s, iops=42 , runt=242984msec
> slat (usec): min=991 , max=285653 , avg=23716.12, stdev=23960.60
> clat (usec): min=1 , max=65 , avg= 3.67, stdev= 2.02
>  lat (usec): min=994 , max=285664 , avg=23723.39, stdev=23962.22
> clat percentiles (usec):
>  |  1.00th=[2],  5.00th=[2], 10.00th=[2], 20.00th=[2],
>  | 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
>  | 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
>  | 99.00th=[8], 99.50th=[   10], 99.90th=[   28], 99.95th=[   37],
>  | 99.99th=[   56]
> bw (KB/s)  : min=20630, max=276480, per=6.30%, avg=43328.34, 
> stdev=21905.92
> lat (usec) : 2=0.84%, 4=49.45%, 10=49.13%, 20=0.37%, 50=0.18%
> lat (usec) : 100=0.03%
>   cpu  : usr=0.49%, sys=5.68%, ctx=31428, majf=0, minf=9
>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0
> 
> ...what's above repeats 16 times... 
> 
> Run status group 0 (all jobs):
>   WRITE: io=163840MB, aggrb=687960KB/s, minb=42997KB/s, maxb=43270KB/s, 
> mint=242331msec, maxt=243869msec
> 
> - And almost the same(?!) aggregated result from the second client: 
> -
> 
> Run status group 0 (all jobs):
>   WRITE: io=163840MB, aggrb=679401KB/s, minb=42462KB/s, maxb=42852KB/s, 
> mint=244697msec, maxt=246941msec
> 
> - If I'll summarize: -
> aggrb1 + aggrb2 = 687960KB/s + 679401KB/s = 1367MB/s 
> 
> it looks like the same bandwidth from just one client aggrb=1219.8MB/s and it 
> was divided? why?
> Question: If I'll connect 12 clients nodes - each one can write just on 
> 100MB/s?
> Perhaps, I need to scale out our ceph up to 15(how many?) OSD nodes - and 
> it'll serve 2 clients on the 1.3GB/s (bw of 10gig nic), or not? 
> 
> 
> 
> health HEALTH_OK
>  monmap e1: 3 mons at 
> {mon1=192.168.56.251:6789/0,mon2=192.168.56.252:6789/0,mon3=192.168.56.253:6789/0},
>  election epoch 140, quorum 0,1,2 mon1,mon2,mon3
>  mdsmap e12: 1/1/1 up {0=mon3=up

Re: [ceph-users] State of nfs-ganesha CEPH fsal

2015-07-28 Thread Burkhard Linke

Hi,

On 07/28/2015 11:08 AM, Haomai Wang wrote:

On Tue, Jul 28, 2015 at 4:47 PM, Gregory Farnum  wrote:

On Tue, Jul 28, 2015 at 8:01 AM, Burkhard Linke
 wrote:


*snipsnap*
Can you give some details on those issues? I'm currently looking for
a way to provide NFS-based access to CephFS to our desktop machines.

Ummm...sadly I can't; we don't appear to have any tracker tickets and
I'm not sure where the report went to. :( I think it was from
Haomai...

My fault, I should report this to ticket.

I have forgotten the details about the problem, I submit the infos to IRC :-(

It related to the "ls" output. It will print the wrong user/group
owner as "-1", maybe related to root squash?
Are you sure this problem is related to the CephFS FSAL? I also had a 
hard time setting up ganesha correctly, especially with respect to user 
and group mappings, especially with a kerberized setup.


I'm currently running a small test setup with one server and one client 
to single out the last kerberos related problems (nfs-ganesha 2.2.0 / 
Ceph Hammer 0.94.2 / Ubuntu 14.04). User/group listings have been OK so 
far. Do you remember whether the problem occurs every time or just 
arbitrarily?


Best regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] State of nfs-ganesha CEPH fsal

2015-07-28 Thread Haomai Wang
On Tue, Jul 28, 2015 at 5:28 PM, Burkhard Linke
 wrote:
> Hi,
>
> On 07/28/2015 11:08 AM, Haomai Wang wrote:
>>
>> On Tue, Jul 28, 2015 at 4:47 PM, Gregory Farnum  wrote:
>>>
>>> On Tue, Jul 28, 2015 at 8:01 AM, Burkhard Linke
>>>  wrote:
>
>
> *snipsnap*

 Can you give some details on those issues? I'm currently looking for a
 way to provide NFS-based access to CephFS to our desktop machines.
>>>
>>> Ummm...sadly I can't; we don't appear to have any tracker tickets and
>>> I'm not sure where the report went to. :( I think it was from
>>> Haomai...
>>
>> My fault, I should report this to ticket.
>>
>> I have forgotten the details about the problem, I submit the infos to IRC
>> :-(
>>
>> It related to the "ls" output. It will print the wrong user/group
>> owner as "-1", maybe related to root squash?
>
> Are you sure this problem is related to the CephFS FSAL? I also had a hard
> time setting up ganesha correctly, especially with respect to user and group
> mappings, especially with a kerberized setup.
>
> I'm currently running a small test setup with one server and one client to
> single out the last kerberos related problems (nfs-ganesha 2.2.0 / Ceph
> Hammer 0.94.2 / Ubuntu 14.04). User/group listings have been OK so far. Do
> you remember whether the problem occurs every time or just arbitrarily?
>

Great!

I'm not sure of the reason. I guess it may be related to the nfs-ganesha version
or the client distro version.

> Best regards,
> Burkhard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Did maximum performance reached?

2015-07-28 Thread Shneur Zalman Mattern
Hi, Johannes (that's my grandpa's name)

The size is 2. Do you really think that the number of replicas can increase
performance?
On http://ceph.com/docs/master/architecture/ it is
written: "Note: Striping is independent of object replicas. Since CRUSH
replicates objects across OSDs, stripes get replicated automatically."

OK, I'll check it,
Regards, Shneur

From: Johannes Formann 
Sent: Tuesday, July 28, 2015 12:09 PM
To: Shneur Zalman Mattern
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Did maximum performance reached?

Hello,

what is the „size“ parameter of your pool?

Some math do show the impact:
size=3 means each write is written 6 times (3 copies, first journal, later 
disk). Calculating with 1.300MB/s „Client“ Bandwidth that means:

3 (size) * 1300 MB/s / 6 (SSD) => 650MB/s per SSD
3 (size) * 1300 MB/s / 30 (HDD) => 130MB/s per HDD

If you use size=3, the results are as good as one can expect. (Even with size=2 
the results won’t be bad)

greetings

Johannes

> Am 28.07.2015 um 10:53 schrieb Shneur Zalman Mattern :
>
> We've built Ceph cluster:
> 3 mon nodes (one of them is combined with mds)
> 3 osd nodes (each one have 10 osd + 2 ssd for journaling)
> switch 24 ports x 10G
> 10 gigabit - for public network
> 20 gigabit bonding - between osds
> Ubuntu 12.04.05
> Ceph 0.87.2
> -
> Clients has:
> 10 gigabit for ceph-connection
> CentOS 6.6 with kernel 3.19.8 equipped by cephfs-kmodule
>
>
>
> == fio-2.0.13 seqwrite, bs=1M, filesize=10G, parallel-jobs=16 ===
> Single client:
> 
>
> Starting 16 processes
>
> .below is just 1 job info
> trivial-readwrite-grid01: (groupid=0, jobs=1): err= 0: pid=10484: Tue Jul 28 
> 13:26:24 2015
>   write: io=10240MB, bw=78656KB/s, iops=76 , runt=133312msec
> slat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
> clat (usec): min=1 , max=68 , avg= 3.61, stdev= 1.99
>  lat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
> clat percentiles (usec):
>  |  1.00th=[1],  5.00th=[2], 10.00th=[2], 20.00th=[2],
>  | 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
>  | 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
>  | 99.00th=[9], 99.50th=[   10], 99.90th=[   23], 99.95th=[   28],
>  | 99.99th=[   62]
> bw (KB/s)  : min=35790, max=318215, per=6.31%, avg=78816.91, 
> stdev=26397.76
> lat (usec) : 2=1.33%, 4=54.43%, 10=43.54%, 20=0.56%, 50=0.11%
> lat (usec) : 100=0.03%
>   cpu  : usr=0.89%, sys=12.85%, ctx=58248, majf=0, minf=9
>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0
>
> ...what's above repeats 16 times...
>
> Run status group 0 (all jobs):
>   WRITE: io=163840MB, aggrb=1219.8MB/s, minb=78060KB/s, maxb=78655KB/s, 
> mint=133312msec, maxt=134329msec
>
> +
> Two clients:
> +
> below is just 1 job info
> trivial-readwrite-gridsrv: (groupid=0, jobs=1): err= 0: pid=10605: Tue Jul 28 
> 14:05:59 2015
>   write: io=10240MB, bw=43154KB/s, iops=42 , runt=242984msec
> slat (usec): min=991 , max=285653 , avg=23716.12, stdev=23960.60
> clat (usec): min=1 , max=65 , avg= 3.67, stdev= 2.02
>  lat (usec): min=994 , max=285664 , avg=23723.39, stdev=23962.22
> clat percentiles (usec):
>  |  1.00th=[2],  5.00th=[2], 10.00th=[2], 20.00th=[2],
>  | 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
>  | 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
>  | 99.00th=[8], 99.50th=[   10], 99.90th=[   28], 99.95th=[   37],
>  | 99.99th=[   56]
> bw (KB/s)  : min=20630, max=276480, per=6.30%, avg=43328.34, 
> stdev=21905.92
> lat (usec) : 2=0.84%, 4=49.45%, 10=49.13%, 20=0.37%, 50=0.18%
> lat (usec) : 100=0.03%
>   cpu  : usr=0.49%, sys=5.68%, ctx=31428, majf=0, minf=9
>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0
>
> ...what's above repeats 16 times...
>
> Run status group 0 (all jobs):
>   WRITE: io=163840MB, aggrb=687960KB/s, minb=42997KB/s, maxb=43270KB/s, 
> mint=242331msec, maxt=243869msec
>
> - And almost the same(?!) aggregated result from the second client: 
> -
>
> Run status group 0 (all jobs):
>   WRITE: io=163840MB, aggrb=679401KB/s, minb=42462KB/s, maxb=42852KB/s, 
> mi

[ceph-users] Did maximum performance reached?

2015-07-28 Thread Shneur Zalman Mattern
Hi, Karan!

Those are physical CentOS clients with CephFS mounted via the kernel module (kernel 4.1.3).

Thanks

>Hi
>
>What type of clients do you have.
>
>- Are they Linux physical OR VM mounting Ceph RBD or CephFS ??
>- Or they are simply openstack / cloud instances using Ceph as cinder volumes 
>or something like that ??
>
>
>- Karan -

>> On 28 Jul 2015, at 11:53, Shneur Zalman Mattern  wrote:
>>
>> We've built Ceph cluster:
>>3 mon nodes (one of them is combined with mds)
>> 3 osd nodes (each one have 10 osd + 2 ssd for journaling)
>> switch 24 ports x 10G
>> 10 gigabit - for public network
>> 20 gigabit bonding - between osds
>> Ubuntu 12.04.05
>> Ceph 0.87.2
>> -
>> Clients has:
>> 10 gigabit for ceph-connection
>> CentOS 6.6 with kernel 4.1.3 equipped by cephfs-kmodule
>>
>>
>>
 
 





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hadoop on ceph

2015-07-28 Thread Gregory Farnum
On Mon, Jul 27, 2015 at 6:34 PM, Patrick McGarry  wrote:
> Moving this to the ceph-user list where it has a better chance of
> being answered.
>
>
>
> On Mon, Jul 27, 2015 at 5:35 AM, jingxia@baifendian.com
>  wrote:
>> Dear ,
>> I have some questions to ask.
>> The docs say Hadoop on Ceph requires the Hadoop 1.1.X stable series.
>> I want to know if the CephFS Hadoop plugin can be used with Hadoop 2.6.0 now, or
>> whether it does not support Hadoop 2.6.0 and is still being developed.
>> If Ceph cannot be used with Hadoop 2.6.0, then I want to know when it will be
>> usable and whether there is a team developing it.
>> Using Hadoop 1.1.2 on Ceph is OK, but when Hadoop 2.6.0 uses Ceph, there is
>> something wrong and HDFS is still on.

The current Hadoop plugin we test with should run against Hadoop 2.
There are a couple of different versions floating around so maybe you
managed to grab the old one?
But in any case the Ceph plugin has very little to do with whether
HDFS gets started or not; that's all in your configuration steps and
scripts.
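
For reference, the Hadoop-side wiring is essentially just a few core-site.xml
properties along these lines (the monitor address is a placeholder and this is only
a sketch; see the CephFS Hadoop documentation for the authoritative list):

    <property>
      <name>fs.default.name</name>
      <value>ceph://192.168.0.1:6789/</value>
    </property>
    <property>
      <name>fs.ceph.impl</name>
      <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
    </property>
    <property>
      <name>ceph.conf.file</name>
      <value>/etc/ceph/ceph.conf</value>
    </property>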

Development on the Hadoop integration is pretty sporadic but it runs
in our nightlies so we notice if it breaks.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Did maximum performance reached?

2015-07-28 Thread Shneur Zalman Mattern
Hi,

But my question is: why is the speed divided between the clients?
And how many OSD nodes, OSD daemons and PGs do I have to add to (or remove from) Ceph
so that each CephFS client can write at its maximum network speed (10Gbit/s ~
1.2GB/s)?



From: Johannes Formann 
Sent: Tuesday, July 28, 2015 12:46 PM
To: Shneur Zalman Mattern
Subject: Re: [ceph-users] Did maximum performance reached?

Hi,

size=3 would decrease your performance. But with size=2 your results are not 
bad too:
Math:
size=2 means each write is written 4 times (2 copies, first journal, later 
disk). Calculating with 1.300MB/s „Client“ Bandwidth that means:

2 (size) * 1300 MB/s / 6 (SSD) => 433MB/s each SSD
2 (size) * 1300 MB/s / 30 (HDD) => 87MB/s each HDD


greetings

Johannes

> Am 28.07.2015 um 11:41 schrieb Shneur Zalman Mattern :
>
> Hi, Johannes (that's my grandpa's name)
>
> The size is 2, do you really think that number of replicas can increase 
> performance?
> on the  http://ceph.com/docs/master/architecture/
> written "Note: Striping is independent of object replicas. Since CRUSH 
> replicates objects across OSDs, stripes get replicated automatically. "
>
> OK, I'll check it,
> Regards, Shneur
> 
> From: Johannes Formann 
> Sent: Tuesday, July 28, 2015 12:09 PM
> To: Shneur Zalman Mattern
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Did maximum performance reached?
>
> Hello,
>
> what is the „size“ parameter of your pool?
>
> Some math do show the impact:
> size=3 means each write is written 6 times (3 copies, first journal, later 
> disk). Calculating with 1.300MB/s „Client“ Bandwidth that means:
>
> 3 (size) * 1300 MB/s / 6 (SSD) => 650MB/s per SSD
> 3 (size) * 1300 MB/s / 30 (HDD) => 130MB/s per HDD
>
> If you use size=3, the results are as good as one can expect. (Even with 
> size=2 the results won’t be bad)
>
> greetings
>
> Johannes
>
>> Am 28.07.2015 um 10:53 schrieb Shneur Zalman Mattern :
>>
>> We've built Ceph cluster:
>>3 mon nodes (one of them is combined with mds)
>>3 osd nodes (each one have 10 osd + 2 ssd for journaling)
>>switch 24 ports x 10G
>>10 gigabit - for public network
>>20 gigabit bonding - between osds
>>Ubuntu 12.04.05
>>Ceph 0.87.2
>> -
>> Clients has:
>>10 gigabit for ceph-connection
>>CentOS 6.6 with kernel 3.19.8 equipped by cephfs-kmodule
>>
>>
>>
>> == fio-2.0.13 seqwrite, bs=1M, filesize=10G, parallel-jobs=16 ===
>> Single client:
>> 
>>
>> Starting 16 processes
>>
>> .below is just 1 job info
>> trivial-readwrite-grid01: (groupid=0, jobs=1): err= 0: pid=10484: Tue Jul 28 
>> 13:26:24 2015
>>  write: io=10240MB, bw=78656KB/s, iops=76 , runt=133312msec
>>slat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
>>clat (usec): min=1 , max=68 , avg= 3.61, stdev= 1.99
>> lat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
>>clat percentiles (usec):
>> |  1.00th=[1],  5.00th=[2], 10.00th=[2], 20.00th=[2],
>> | 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
>> | 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
>> | 99.00th=[9], 99.50th=[   10], 99.90th=[   23], 99.95th=[   28],
>> | 99.99th=[   62]
>>bw (KB/s)  : min=35790, max=318215, per=6.31%, avg=78816.91, 
>> stdev=26397.76
>>lat (usec) : 2=1.33%, 4=54.43%, 10=43.54%, 20=0.56%, 50=0.11%
>>lat (usec) : 100=0.03%
>>  cpu  : usr=0.89%, sys=12.85%, ctx=58248, majf=0, minf=9
>>  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>> submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>> >=64=0.0%
>> complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>> >=64=0.0%
>> issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0
>>
>> ...what's above repeats 16 times...
>>
>> Run status group 0 (all jobs):
>>  WRITE: io=163840MB, aggrb=1219.8MB/s, minb=78060KB/s, maxb=78655KB/s, 
>> mint=133312msec, maxt=134329msec
>>
>> +
>> Two clients:
>> +
>> below is just 1 job info
>> trivial-readwrite-gridsrv: (groupid=0, jobs=1): err= 0: pid=10605: Tue Jul 
>> 28 14:05:59 2015
>>  write: io=10240MB, bw=43154KB/s, iops=42 , runt=242984msec
>>slat (usec): min=991 , max=285653 , avg=23716.12, stdev=23960.60
>>clat (usec): min=1 , max=65 , avg= 3.67, stdev= 2.02
>> lat (usec): min=994 , max=285664 , avg=23723.39, stdev=23962.22
>>clat percentiles (usec):
>> |  1.00th=[2],  5.00th=[2], 10.00th=[2], 20.00th=[2],
>> | 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
>> | 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
>> | 99.00th=[8], 99.50th=[   10], 99.90th=[   28], 99.95th=[   37],
>> | 99.99th=[   56]
>>bw (KB/s)  : min=206

Re: [ceph-users] OSD RAM usage values

2015-07-28 Thread Kenneth Waegeman



On 07/17/2015 02:50 PM, Gregory Farnum wrote:

On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman
 wrote:

Hi all,

I've read in the documentation that OSDs use around 512MB on a healthy
cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram)
Now, our OSD's are all using around 2GB of RAM memory while the cluster is
healthy.


   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
29784 root  20   0 6081276 2.535g   4740 S   0.7  8.1   1346:55 ceph-osd
32818 root  20   0 5417212 2.164g  24780 S  16.2  6.9   1238:55 ceph-osd
25053 root  20   0 5386604 2.159g  27864 S   0.7  6.9   1192:08 ceph-osd
33875 root  20   0 5345288 2.092g   3544 S   0.7  6.7   1188:53 ceph-osd
30779 root  20   0 5474832 2.090g  28892 S   1.0  6.7   1142:29 ceph-osd
22068 root  20   0 5191516 2.000g  28664 S   0.7  6.4  31:56.72 ceph-osd
34932 root  20   0 5242656 1.994g   4536 S   0.3  6.4   1144:48 ceph-osd
26883 root  20   0 5178164 1.938g   6164 S   0.3  6.2   1173:01 ceph-osd
31796 root  20   0 5193308 1.916g  27000 S  16.2  6.1 923:14.87 ceph-osd
25958 root  20   0 5193436 1.901g   2900 S   0.7  6.1   1039:53 ceph-osd
27826 root  20   0 5225764 1.845g   5576 S   1.0  5.9   1031:15 ceph-osd
36011 root  20   0 5111660 1.823g  20512 S  15.9  5.8   1093:01 ceph-osd
19736 root  20   0 2134680 0.994g  0 S   0.3  3.2  46:13.47 ceph-osd



[root@osd003 ~]# ceph status
2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following dangerous
and experimental features are enabled: keyvaluestore
2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following dangerous
and experimental features are enabled: keyvaluestore
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_OK
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
 election epoch 58, quorum 0,1,2 mds01,mds02,mds03
  mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
  osdmap e25542: 258 osds: 258 up, 258 in
   pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
 270 TB used, 549 TB / 819 TB avail
 4152 active+clean
8 active+clean+scrubbing+deep


We are using erasure code on most of our OSDs, so maybe that is a reason.
But also the cache-pool filestore OSDS on 200GB SSDs are using 2GB of RAM.
Our erasure code pool (16*14 osds) have a pg_num of 2048; our cache pool
(2*14 OSDS) has a pg_num of 1024.

Are these normal values for this configuration, and is the documentation a
bit outdated, or should we look into something else?


2GB of RSS is larger than I would have expected, but not unreasonable.
In particular I don't think we've gathered numbers on either EC pools
or on the effects of the caching processes.


Which data is actually in the memory of the OSDs?
Is it mostly cached data?
We are short on memory on these servers; can we influence this?

Thanks again!
Kenneth


-Greg


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Did maximum performance reached?

2015-07-28 Thread Shneur Zalman Mattern
Hi!

And so, by your math,
I need to build with size = number of OSDs, i.e. 30 replicas, for my cluster of 120TB to meet my
demands.
And 4TB of real storage capacity at a price of $3000 per 1TB? Is this a joke?

All the best,
Shneur

From: Johannes Formann 
Sent: Tuesday, July 28, 2015 12:46 PM
To: Shneur Zalman Mattern
Subject: Re: [ceph-users] Did maximum performance reached?

Hi,

size=3 would decrease your performance. But with size=2 your results are not 
bad too:
Math:
size=2 means each write is written 4 times (2 copies, first journal, later 
disk). Calculating with 1.300MB/s „Client“ Bandwidth that means:

2 (size) * 1300 MB/s / 6 (SSD) => 433MB/s each SSD
2 (size) * 1300 MB/s / 30 (HDD) => 87MB/s each HDD


greetings

Johannes

 
 





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird behaviour of cephfs with samba

2015-07-28 Thread Gregory Farnum
On Mon, Jul 27, 2015 at 6:25 PM, Jörg Henne  wrote:
> Gregory Farnum  writes:
>>
>> Yeah, I think there were some directory listing bugs in that version
>> that Samba is probably running into. They're fixed in a newer kernel
>> release (I'm not sure which one exactly, sorry).
>
> Ok, thanks, good to know!
>
>> > and then detaches itself but the mountpoint stays empty no matter what.
>> > /var/log/ceph/ceph-client.admin.log isn't enlighting as well. I've never
>> > used a FUSE before, though, so I might be overlooking something.
>>
>> Uh, that's odd. What do you mean it's empty no matter what? Is the
>> ceph-fuse process actually still running?
>
> Yes, e.g.
>
>  8525 pts/0Sl 0:00 ceph-fuse -m 10.208.66.1:6789 /mnt/regtest2
>
> But
>
> root@gru:/mnt# ls /mnt/regtest2 | wc -l
> 0
>
> With the kernel module I mount just a subpath of the cephfs space like in
>
> /etc/fstab:
> my_monhost:/regression-test /mnt/regtest ...
>
> which ceph-fuse doesn't seem to support, but then I would expect
> regression-test to simply be a sub-directory of /mnt/regtest2.

You can mount subtrees with the -r option to ceph-fuse.

Once you've started it up you should find a file like
"client.admin.[0-9]*.asok" in (I think?) /var/run/ceph. You can run
"ceph --admin-daemon /var/run/ceph/{client_asok} status" and provide
the output to see if it's doing anything useful. Or set "debug client
= 20" in the config and then upload the client log file either
publicly or with ceph-post-file and I'll take a quick look to see
what's going on.
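
Concretely, something along these lines (the monitor address, mount point and the
PID in the asok name are taken from your output above, so adjust to match):

    # mount only the /regression-test subtree via ceph-fuse
    ceph-fuse -m 10.208.66.1:6789 -r /regression-test /mnt/regtest2

    # query the running client through its admin socket
    ceph --admin-daemon /var/run/ceph/client.admin.8525.asok status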
-Greg

>
>> (You should also be able to talk to Ceph directly via the Samba
>> daemon; the bindings are in upstream Samba although you probably need
>> to install one of the Ceph packages to make it work. That's the way we
>> test in our "nightlies".)
>
> Indeed, it seems like something is missing:
>
> [2015/07/27 19:21:40.080572,  0] ../lib/util/modules.c:48(load_module)
>   Error loading module '/usr/lib/x86_64-linux-gnu/samba/vfs/ceph.so':
> /usr/lib/x86_64-linux-gnu/samba/vfs/ceph.so: cannot open shared object file:
> No such file or directory

Mmm, that looks like a Samba config issue which unfortunately I don't
know much about. Perhaps you need to install these modules
individually? It looks like our nightly tests are just getting the
Ceph VFS installed by default. :/
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Did maximum performance reached?

2015-07-28 Thread Johannes Formann
The speed is divided because it's fair :)
You have reached the limit your hardware (I guess the SSDs) can deliver.

For 2 clients each doing 1200 MB/s you'll basically have to double the number
of OSDs.

greetings

Johannes

> Am 28.07.2015 um 11:56 schrieb Shneur Zalman Mattern :
> 
> Hi,
> 
> But my question is why speed is divided between clients?
> And how much OSDnodes, OSDdaemos, PGs, I have to add/remove to ceph,
> that each cephfs-client could write with his max network speed (10Gbit/s ~ 
> 1.2GB/s)???
> 
> 
> 
> From: Johannes Formann 
> Sent: Tuesday, July 28, 2015 12:46 PM
> To: Shneur Zalman Mattern
> Subject: Re: [ceph-users] Did maximum performance reached?
> 
> Hi,
> 
> size=3 would decrease your performance. But with size=2 your results are not 
> bad too:
> Math:
> size=2 means each write is written 4 times (2 copies, first journal, later 
> disk). Calculating with 1.300MB/s „Client“ Bandwidth that means:
> 
> 2 (size) * 1300 MB/s / 6 (SSD) => 433MB/s each SSD
> 2 (size) * 1300 MB/s / 30 (HDD) => 87MB/s each HDD
> 
> 
> greetings
> 
> Johannes
> 
>> Am 28.07.2015 um 11:41 schrieb Shneur Zalman Mattern :
>> 
>> Hi, Johannes (that's my grandpa's name)
>> 
>> The size is 2, do you really think that number of replicas can increase 
>> performance?
>> on the  http://ceph.com/docs/master/architecture/
>> written "Note: Striping is independent of object replicas. Since CRUSH 
>> replicates objects across OSDs, stripes get replicated automatically. "
>> 
>> OK, I'll check it,
>> Regards, Shneur
>> 
>> From: Johannes Formann 
>> Sent: Tuesday, July 28, 2015 12:09 PM
>> To: Shneur Zalman Mattern
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Did maximum performance reached?
>> 
>> Hello,
>> 
>> what is the „size“ parameter of your pool?
>> 
>> Some math do show the impact:
>> size=3 means each write is written 6 times (3 copies, first journal, later 
>> disk). Calculating with 1.300MB/s „Client“ Bandwidth that means:
>> 
>> 3 (size) * 1300 MB/s / 6 (SSD) => 650MB/s per SSD
>> 3 (size) * 1300 MB/s / 30 (HDD) => 130MB/s per HDD
>> 
>> If you use size=3, the results are as good as one can expect. (Even with 
>> size=2 the results won’t be bad)
>> 
>> greetings
>> 
>> Johannes
>> 
>>> Am 28.07.2015 um 10:53 schrieb Shneur Zalman Mattern :
>>> 
>>> We've built Ceph cluster:
>>>   3 mon nodes (one of them is combined with mds)
>>>   3 osd nodes (each one have 10 osd + 2 ssd for journaling)
>>>   switch 24 ports x 10G
>>>   10 gigabit - for public network
>>>   20 gigabit bonding - between osds
>>>   Ubuntu 12.04.05
>>>   Ceph 0.87.2
>>> -
>>> Clients has:
>>>   10 gigabit for ceph-connection
>>>   CentOS 6.6 with kernel 3.19.8 equipped by cephfs-kmodule
>>> 
>>> 
>>> 
>>> == fio-2.0.13 seqwrite, bs=1M, filesize=10G, parallel-jobs=16 
>>> ===
>>> Single client:
>>> 
>>> 
>>> Starting 16 processes
>>> 
>>> .below is just 1 job info
>>> trivial-readwrite-grid01: (groupid=0, jobs=1): err= 0: pid=10484: Tue Jul 
>>> 28 13:26:24 2015
>>> write: io=10240MB, bw=78656KB/s, iops=76 , runt=133312msec
>>>   slat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
>>>   clat (usec): min=1 , max=68 , avg= 3.61, stdev= 1.99
>>>lat (msec): min=1 , max=117 , avg=13.01, stdev=12.57
>>>   clat percentiles (usec):
>>>|  1.00th=[1],  5.00th=[2], 10.00th=[2], 20.00th=[2],
>>>| 30.00th=[3], 40.00th=[3], 50.00th=[3], 60.00th=[4],
>>>| 70.00th=[4], 80.00th=[5], 90.00th=[5], 95.00th=[6],
>>>| 99.00th=[9], 99.50th=[   10], 99.90th=[   23], 99.95th=[   28],
>>>| 99.99th=[   62]
>>>   bw (KB/s)  : min=35790, max=318215, per=6.31%, avg=78816.91, 
>>> stdev=26397.76
>>>   lat (usec) : 2=1.33%, 4=54.43%, 10=43.54%, 20=0.56%, 50=0.11%
>>>   lat (usec) : 100=0.03%
>>> cpu  : usr=0.89%, sys=12.85%, ctx=58248, majf=0, minf=9
>>> IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>>submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>>> >=64=0.0%
>>>complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>>> >=64=0.0%
>>>issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0
>>> 
>>> ...what's above repeats 16 times...
>>> 
>>> Run status group 0 (all jobs):
>>> WRITE: io=163840MB, aggrb=1219.8MB/s, minb=78060KB/s, maxb=78655KB/s, 
>>> mint=133312msec, maxt=134329msec
>>> 
>>> +
>>> Two clients:
>>> +
>>> below is just 1 job info
>>> trivial-readwrite-gridsrv: (groupid=0, jobs=1): err= 0: pid=10605: Tue Jul 
>>> 28 14:05:59 2015
>>> write: io=10240MB, bw=43154KB/s, iops=42 , runt=242984msec
>>>   slat (usec): min=991 , max=285653 , avg=23716.12, stdev=23960.60
>>>   clat (usec): min=1 , max=65 , avg= 3.67, stdev= 2.02
>>>lat (usec): min=994 , 

Re: [ceph-users] OSD RAM usage values

2015-07-28 Thread Gregory Farnum
On Tue, Jul 28, 2015 at 11:00 AM, Kenneth Waegeman
 wrote:
>
>
> On 07/17/2015 02:50 PM, Gregory Farnum wrote:
>>
>> On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman
>>  wrote:
>>>
>>> Hi all,
>>>
>>> I've read in the documentation that OSDs use around 512MB on a healthy
>>> cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram)
>>> Now, our OSD's are all using around 2GB of RAM memory while the cluster
>>> is
>>> healthy.
>>>
>>>
>>>PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
>>> COMMAND
>>> 29784 root  20   0 6081276 2.535g   4740 S   0.7  8.1   1346:55
>>> ceph-osd
>>> 32818 root  20   0 5417212 2.164g  24780 S  16.2  6.9   1238:55
>>> ceph-osd
>>> 25053 root  20   0 5386604 2.159g  27864 S   0.7  6.9   1192:08
>>> ceph-osd
>>> 33875 root  20   0 5345288 2.092g   3544 S   0.7  6.7   1188:53
>>> ceph-osd
>>> 30779 root  20   0 5474832 2.090g  28892 S   1.0  6.7   1142:29
>>> ceph-osd
>>> 22068 root  20   0 5191516 2.000g  28664 S   0.7  6.4  31:56.72
>>> ceph-osd
>>> 34932 root  20   0 5242656 1.994g   4536 S   0.3  6.4   1144:48
>>> ceph-osd
>>> 26883 root  20   0 5178164 1.938g   6164 S   0.3  6.2   1173:01
>>> ceph-osd
>>> 31796 root  20   0 5193308 1.916g  27000 S  16.2  6.1 923:14.87
>>> ceph-osd
>>> 25958 root  20   0 5193436 1.901g   2900 S   0.7  6.1   1039:53
>>> ceph-osd
>>> 27826 root  20   0 5225764 1.845g   5576 S   1.0  5.9   1031:15
>>> ceph-osd
>>> 36011 root  20   0 5111660 1.823g  20512 S  15.9  5.8   1093:01
>>> ceph-osd
>>> 19736 root  20   0 2134680 0.994g  0 S   0.3  3.2  46:13.47
>>> ceph-osd
>>>
>>>
>>>
>>> [root@osd003 ~]# ceph status
>>> 2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following
>>> dangerous
>>> and experimental features are enabled: keyvaluestore
>>> 2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following
>>> dangerous
>>> and experimental features are enabled: keyvaluestore
>>>  cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>>>   health HEALTH_OK
>>>   monmap e1: 3 mons at
>>>
>>> {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
>>>  election epoch 58, quorum 0,1,2 mds01,mds02,mds03
>>>   mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
>>>   osdmap e25542: 258 osds: 258 up, 258 in
>>>pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
>>>  270 TB used, 549 TB / 819 TB avail
>>>  4152 active+clean
>>> 8 active+clean+scrubbing+deep
>>>
>>>
>>> We are using erasure code on most of our OSDs, so maybe that is a reason.
>>> But also the cache-pool filestore OSDS on 200GB SSDs are using 2GB of
>>> RAM.
>>> Our erasure code pool (16*14 osds) have a pg_num of 2048; our cache pool
>>> (2*14 OSDS) has a pg_num of 1024.
>>>
>>> Are these normal values for this configuration, and is the documentation
>>> a
>>> bit outdated, or should we look into something else?
>>
>>
>> 2GB of RSS is larger than I would have expected, but not unreasonable.
>> In particular I don't think we've gathered numbers on either EC pools
>> or on the effects of the caching processes.
>
>
> Which data is actually in memory of the OSDS?
> Is this mostly cached data?
> We are short on memory on these servers, can we have influence on this?

Mmm, we've discussed this a few times on the mailing list. The CERN
guys published a document on experimenting with a very large cluster
and not enough RAM, but there's nothing I would really recommend
changing for a production system, especially an EC one, if you aren't
intimately familiar with what's going on.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Did maximum performance reached?

2015-07-28 Thread Shneur Zalman Mattern
Oh, now I've to cry :-)
not because it's not SSDs... it's SAS2 HDDs

Because, I need to build something for 140 clients... 4200 OSDs

:-(

Looks like I can pick up performance with SSDs, but I need a huge capacity,
~2PB.
Perhaps a cache tiering pool can save my money, but I've read here that it's
slower than most people think...

:-(

Why is Lustre more performant? It uses the same HDDs, doesn't it?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Did maximum performance reached?

2015-07-28 Thread John Spray



On 28/07/15 11:17, Shneur Zalman Mattern wrote:

Oh, now I've to cry :-)
not because it's not SSDs... it's SAS2 HDDs

Because, I need to build something for 140 clients... 4200 OSDs

:-(

Looks like I can pick up performance with SSDs, but I need a huge capacity,
~2PB.
Perhaps a cache tiering pool can save my money, but I've read here that it's
slower than most people think...

:-(

Why is Lustre more performant? It uses the same HDDs, doesn't it?


Lustre isn't (A) creating two copies of your data, and it's (B) not 
executing disk writes as atomic transactions (i.e. no data writeahead log).


The A tradeoff is that while a Lustre system typically requires an 
expensive dual ported RAID controller, Ceph doesn't.  You take the money 
you saved on RAID controllers and spend it on a larger number of 
cheaper hosts and drives.  If you've already bought the Lustre-oriented 
hardware then my advice would be to run Lustre on it :-)


The efficient way of handling B is to use SSD journals for your OSDs.  
Typical Ceph servers have one SSD per approx 4 OSDs.
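
As a rough sketch of that layout (device names are just placeholders: four 
data HDDs sdb-sde sharing one journal SSD sdf):

  # one journal partition per OSD is created on the shared SSD
  ceph-disk prepare /dev/sdb /dev/sdf
  ceph-disk prepare /dev/sdc /dev/sdf
  ceph-disk prepare /dev/sdd /dev/sdf
  ceph-disk prepare /dev/sde /dev/sdf

  # journal partition size is taken from ceph.conf beforehand, e.g.
  # [osd]
  # osd journal size = 10240     (MB)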


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird behaviour of cephfs with samba

2015-07-28 Thread Dzianis Kahanovich
I use cephfs over samba vfs and have some issues.

1) If I use more than one stacked vfs (ceph & scannedonly) I have problems with
file ordering, but it is solved by the "dirsort" vfs ("vfs objects = scannedonly
dirsort ceph"). A single "ceph" vfs looks good too (and I use it on its own for
fast internal shares), but you can try adding "dirsort" ("vfs objects = dirsort
ceph"); a minimal share definition is sketched after point 2.

2) I use two of my own patches:
https://github.com/mahatma-kaganovich/raw/tree/master/app-portage/ppatch/files/extensions/net-fs/samba/compile
- to support "max disk size" and to make chown secure. About chown: I'm not sure
it strictly follows standard system behaviour, but it works for me; without it, a
user can chown() even to root. I submitted the first (max disk size) patch to the
samba bugzilla a while ago; the second I did not, since I'm unsure about its
correctness, but I am sure about the security hole.
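
For reference, a minimal smb.conf share along the lines of point 1 (share name,
path and cephx user id are assumptions, not taken from my setup):

  [cephshare]
      path = /
      vfs objects = dirsort ceph
      ceph:config_file = /etc/ceph/ceph.conf
      ceph:user_id = samba
      read only = no

The ceph:user_id line assumes a cephx key for client.samba that smbd can read.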

Jörg Henne writes:
> Hi all,
> 
> the faq at http://ceph.com/docs/cuttlefish/faq/ mentions the possibility to
> run export a mounted cephfs via samba. This combination exhibits a very
> weird behaviour, though.
> 
> We have a directory on cephfs with many small xml snippets. If I repeadtedly
> ls the directory on Unix, I get the same answer each and every time:
> 
> root@gru:/mnt/regtest/regressiontestdata2/assets# while true; do ls|wc -l;
> sleep 1; done
> 851
> 851
> 851
> ... and so on
> 
> If I do the same on the directory exported and mounted via SMB under Windows
> the result looks like that (output generated unter cygwin, but effect is
> present with Windows Explorer as well):
> 
> $ while true; do ls|wc -l; sleep 1; done
> 380
> 380
> 380
> 380
> 380
> 1451
> 362
> 851
> 851
> 851
> 851
> 851
> 851
> 851
> 851
> 1451
> 362
> 851
> 851
> 851
> ...
> 
> The problem does not seem to be related to Samba. If I copy the files to an
> XFS volume and export that, things look fine.
> 
> Thanks
> Joerg Henne
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel version can help avoid kernel client deadlock

2015-07-28 Thread Ilya Dryomov
On Tue, Jul 28, 2015 at 11:19 AM, van  wrote:
> Hi, Ilya,
>
>   Thanks for your quick reply.
>
>   Here is the link http://ceph.com/docs/cuttlefish/faq/  , under the "HOW
> CAN I GIVE CEPH A TRY?” section which talk about the old kernel stuff.
>
>   By the way, what’s the main reason of using kernel 4.1, is there a lot of
> critical bugs fixed in that version despite perf improvements?
>   I am worrying kernel 4.1 is too new that may introduce other problems.

Well, I'm not sure what exactly is in 3.10.0.229, so I can't tell you
off hand.  I can think of one important memory pressure related fix
that's probably not in there.

I'm suggesting the latest stable version of 4.1 (currently 4.1.3),
because if you hit a deadlock (remember, this is a configuration that
is neither recommended nor guaranteed to work), it'll be easier to
debug and fix if the fix turns out to be worth it.

If 4.1 is not acceptable for you, try the latest stable version of 3.18
(that is 3.18.19).  It's an LTS kernel, so that should mitigate some of
your concerns.

>   And if I’m using the librdb API, is the kernel version matters?

No, not so much.

>
>   In my tests, I built a 2-nodes cluster, each with only one OSD with os
> centos 7.1, kernel version 3.10.0.229 and ceph v0.94.2.
>   I created several rbds and mkfs.xfs on those rbds to create filesystems.
> (kernel client were running on the ceph cluster)
>   I performed heavy IO tests on those filesystems and found some fio got
> hung and turned into D state forever (uninterruptible sleep).
>   I suspect it’s the deadlock that make the fio process hung.
>   However the ceph-osd are stil responsive, and I can operate rbd via librbd
> API.
>   Does this mean it’s not the loopback mount deadlock that cause the fio
> process hung?
>   Or it is also a deadlock phnonmenon, only one thread is blocked in memory
> allocation and other threads are still possible to receive API requests, so
> the ceph-osd are still responsive?
>
>   What worth mentioning is that after I restart the ceph-osd daemon, all
> processes in D state come back into normal state.
>
>   Below is related log in kernel:
>
> Jul  7 02:25:39 node0 kernel: INFO: task xfsaild/rbd1:24795 blocked for more
> than 120 seconds.
> Jul  7 02:25:39 node0 kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jul  7 02:25:39 node0 kernel: xfsaild/rbd1D 880c2fc13680 0 24795
> 2 0x0080
> Jul  7 02:25:39 node0 kernel: 8801d6343d40 0046
> 8801d6343fd8 00013680
> Jul  7 02:25:39 node0 kernel: 8801d6343fd8 00013680
> 880c0c0b 880c0c0b
> Jul  7 02:25:39 node0 kernel: 880c2fc14340 0001
>  8805bace2528
> Jul  7 02:25:39 node0 kernel: Call Trace:
> Jul  7 02:25:39 node0 kernel: [] schedule+0x29/0x70
> Jul  7 02:25:39 node0 kernel: []
> _xfs_log_force+0x230/0x290 [xfs]
> Jul  7 02:25:39 node0 kernel: [] ? wake_up_state+0x20/0x20
> Jul  7 02:25:39 node0 kernel: [] xfs_log_force+0x26/0x80
> [xfs]
> Jul  7 02:25:39 node0 kernel: [] ?
> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> Jul  7 02:25:39 node0 kernel: [] xfsaild+0x151/0x5e0 [xfs]
> Jul  7 02:25:39 node0 kernel: [] ?
> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> Jul  7 02:25:39 node0 kernel: [] kthread+0xcf/0xe0
> Jul  7 02:25:39 node0 kernel: [] ?
> kthread_create_on_node+0x140/0x140
> Jul  7 02:25:39 node0 kernel: [] ret_from_fork+0x7c/0xb0
> Jul  7 02:25:39 node0 kernel: [] ?
> kthread_create_on_node+0x140/0x140
> Jul  7 02:25:39 node0 kernel: INFO: task xfsaild/rbd5:2914 blocked for more
> than 120 seconds.

Is that all there is in dmesg?  Can you paste the entire dmesg?

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird behaviour of cephfs with samba

2015-07-28 Thread Dzianis Kahanovich
PS: I started using these patches with samba 4.1. IMHO some of the problems may
(or must) be solved not inside the vfs code but outside, in samba itself, but I
still use both patches with samba 4.2.3 without re-verifying them.

Dzianis Kahanovich writes:
> I use cephfs over samba vfs and have some issues.
> 
> 1) If I use more than one stacked vfs (ceph & scannedonly) I have problems with
> file ordering, but it is solved by the "dirsort" vfs ("vfs objects = scannedonly
> dirsort ceph"). A single "ceph" vfs looks good too (and I use it on its own for
> fast internal shares), but you can try adding "dirsort" ("vfs objects = dirsort
> ceph").
> 
> 2) I use two of my own patches:
> https://github.com/mahatma-kaganovich/raw/tree/master/app-portage/ppatch/files/extensions/net-fs/samba/compile
> - to support "max disk size" and to make chown secure. About chown: I'm not sure
> it strictly follows standard system behaviour, but it works for me; without it, a
> user can chown() even to root. I submitted the first (max disk size) patch to the
> samba bugzilla a while ago; the second I did not, since I'm unsure about its
> correctness, but I am sure about the security hole.
> 
> Jörg Henne writes:
>> Hi all,
>>
>> the faq at http://ceph.com/docs/cuttlefish/faq/ mentions the possibility to
>> run export a mounted cephfs via samba. This combination exhibits a very
>> weird behaviour, though.
>>
>> We have a directory on cephfs with many small xml snippets. If I repeadtedly
>> ls the directory on Unix, I get the same answer each and every time:
>>
>> root@gru:/mnt/regtest/regressiontestdata2/assets# while true; do ls|wc -l;
>> sleep 1; done
>> 851
>> 851
>> 851
>> ... and so on
>>
>> If I do the same on the directory exported and mounted via SMB under Windows
>> the result looks like that (output generated unter cygwin, but effect is
>> present with Windows Explorer as well):
>>
>> $ while true; do ls|wc -l; sleep 1; done
>> 380
>> 380
>> 380
>> 380
>> 380
>> 1451
>> 362
>> 851
>> 851
>> 851
>> 851
>> 851
>> 851
>> 851
>> 851
>> 1451
>> 362
>> 851
>> 851
>> 851
>> ...
>>
>> The problem does not seem to be related to Samba. If I copy the files to an
>> XFS volume and export that, things look fine.
>>
>> Thanks
>> Joerg Henne
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 
> 


-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Did maximum performance reached?

2015-07-28 Thread John Spray



On 28/07/15 11:53, John Spray wrote:



On 28/07/15 11:17, Shneur Zalman Mattern wrote:

Oh, now I've to cry :-)
not because it's not SSDs... it's SAS2 HDDs

Because, I need to build something for 140 clients... 4200 OSDs

:-(

Looks like I can pick up performance with SSDs, but I need a huge 
capacity, ~2PB.
Perhaps a cache tiering pool can save my money, but I've read here 
that it's slower than most people think...


:-(

Why is Lustre more performant? It uses the same HDDs, doesn't it?


Lustre isn't (A) creating two copies of your data, and it's (B) not 
executing disk writes as atomic transactions (i.e. no data writeahead 
log).


The A tradeoff is that while a Lustre system typically requires an 
expensive dual ported RAID controller, Ceph doesn't.  You take the 
money you saved on RAID controllers and spend it on a larger 
number of cheaper hosts and drives.  If you've already bought the 
Lustre-oriented hardware then my advice would be to run Lustre on it :-)


The efficient way of handling B is to use SSD journals for your OSDs.  
Typical Ceph servers have one SSD per approx 4 OSDs.


Oh, I've just re-read the original message in this thread, and you're 
already using SSD journals.


So I think the only point of confusion was that you weren't dividing 
your expected bandwidth number by the number of replicas, right?


> Each spindle disk can write ~100MB/s, and we have 10 SAS disks on 
each node = aggregated write speed is ~900MB/s (because of striping etc.)
And we have 3 OSD nodes, and objects are also striped across 30 OSDs - I 
thought that would also aggregate and we'd get something around 2.5 GB/s, 
but no...


Your expected bandwidth (with size=2 replicas) will be (900MB/s * 3)/2 = 
1350MB/s -- so I think you're actually doing pretty well with your 
1367MB/s number.


John





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to create new pool in cluster

2015-07-28 Thread Daleep Bais
Dear Kefu,

Thanks..
It worked..

Appreciate your help..

TC

On Sun, Jul 26, 2015 at 8:06 AM, kefu chai  wrote:

> On Sat, Jul 25, 2015 at 9:43 PM, Daleep Bais  wrote:
> > Hi All,
> >
> > I am unable to create new pool in my cluster. I have some existing pools.
> >
> > I get error :
> >
> > ceph osd pool create fullpool 128 128
> > Error EINVAL: crushtool: exec failed: (2) No such file or directory
> >
> >
> > existing pools are :
> >
> > cluster# ceph osd lspools
> > 0 rbd,1 data,3 pspl,
> >
> > Please suggest..
>
>
> Daleep, seems your crushtool is not in $PATH when the monitor started.
> you might want to make sure you have crushtool installed somewhere,
> and:
>
> $ ceph --admin-daemon  config show | grep
> crushtool ## check the path to crushtool
> $ ceph tell mon.* injectargs --crushtool  ##
> point it to your crushtool
>
>
> HTH.
>
> --
> Regards
> Kefu Chai
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel version can help avoid kernel client deadlock

2015-07-28 Thread van
Hi, Ilya,
  
  In the dmesg, there is also a lot of libceph socket error, which I think may 
be caused by my stopping ceph service without unmap rbd.
  
  Here is a more than 1 lines log contains more info, http://jmp.sh/NcokrfT 
 
  
  Thanks for willing to help.

van
chaofa...@owtware.com



> On Jul 28, 2015, at 7:11 PM, Ilya Dryomov  wrote:
> 
> On Tue, Jul 28, 2015 at 11:19 AM, van  > wrote:
>> Hi, Ilya,
>> 
>>  Thanks for your quick reply.
>> 
>>  Here is the link http://ceph.com/docs/cuttlefish/faq/ 
>>   , under the "HOW
>> CAN I GIVE CEPH A TRY?” section which talk about the old kernel stuff.
>> 
>>  By the way, what’s the main reason of using kernel 4.1, is there a lot of
>> critical bugs fixed in that version despite perf improvements?
>>  I am worrying kernel 4.1 is too new that may introduce other problems.
> 
> Well, I'm not sure what exactly is in 3.10.0.229, so I can't tell you
> off hand.  I can think of one important memory pressure related fix
> that's probably not in there.
> 
> I'm suggesting the latest stable version of 4.1 (currently 4.1.3),
> because if you hit a deadlock (remember, this is a configuration that
> is neither recommended nor guaranteed to work), it'll be easier to
> debug and fix if the fix turns out to be worth it.
> 
> If 4.1 is not acceptable for you, try the latest stable version of 3.18
> (that is 3.18.19).  It's an LTS kernel, so that should mitigate some of
> your concerns.
> 
>>  And if I’m using the librdb API, is the kernel version matters?
> 
> No, not so much.
> 
>> 
>>  In my tests, I built a 2-nodes cluster, each with only one OSD with os
>> centos 7.1, kernel version 3.10.0.229 and ceph v0.94.2.
>>  I created several rbds and mkfs.xfs on those rbds to create filesystems.
>> (kernel client were running on the ceph cluster)
>>  I performed heavy IO tests on those filesystems and found some fio got
>> hung and turned into D state forever (uninterruptible sleep).
>>  I suspect it’s the deadlock that make the fio process hung.
>>  However the ceph-osd are stil responsive, and I can operate rbd via librbd
>> API.
>>  Does this mean it’s not the loopback mount deadlock that cause the fio
>> process hung?
>>  Or it is also a deadlock phnonmenon, only one thread is blocked in memory
>> allocation and other threads are still possible to receive API requests, so
>> the ceph-osd are still responsive?
>> 
>>  What worth mentioning is that after I restart the ceph-osd daemon, all
>> processes in D state come back into normal state.
>> 
>>  Below is related log in kernel:
>> 
>> Jul  7 02:25:39 node0 kernel: INFO: task xfsaild/rbd1:24795 blocked for more
>> than 120 seconds.
>> Jul  7 02:25:39 node0 kernel: "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Jul  7 02:25:39 node0 kernel: xfsaild/rbd1D 880c2fc13680 0 24795
>> 2 0x0080
>> Jul  7 02:25:39 node0 kernel: 8801d6343d40 0046
>> 8801d6343fd8 00013680
>> Jul  7 02:25:39 node0 kernel: 8801d6343fd8 00013680
>> 880c0c0b 880c0c0b
>> Jul  7 02:25:39 node0 kernel: 880c2fc14340 0001
>>  8805bace2528
>> Jul  7 02:25:39 node0 kernel: Call Trace:
>> Jul  7 02:25:39 node0 kernel: [] schedule+0x29/0x70
>> Jul  7 02:25:39 node0 kernel: []
>> _xfs_log_force+0x230/0x290 [xfs]
>> Jul  7 02:25:39 node0 kernel: [] ? wake_up_state+0x20/0x20
>> Jul  7 02:25:39 node0 kernel: [] xfs_log_force+0x26/0x80
>> [xfs]
>> Jul  7 02:25:39 node0 kernel: [] ?
>> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
>> Jul  7 02:25:39 node0 kernel: [] xfsaild+0x151/0x5e0 [xfs]
>> Jul  7 02:25:39 node0 kernel: [] ?
>> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
>> Jul  7 02:25:39 node0 kernel: [] kthread+0xcf/0xe0
>> Jul  7 02:25:39 node0 kernel: [] ?
>> kthread_create_on_node+0x140/0x140
>> Jul  7 02:25:39 node0 kernel: [] ret_from_fork+0x7c/0xb0
>> Jul  7 02:25:39 node0 kernel: [] ?
>> kthread_create_on_node+0x140/0x140
>> Jul  7 02:25:39 node0 kernel: INFO: task xfsaild/rbd5:2914 blocked for more
>> than 120 seconds.
> 
> Is that all there is in dmesg?  Can you paste the entire dmesg?
> 
> Thanks,
> 
>Ilya

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel version can help avoid kernel client deadlock

2015-07-28 Thread Ilya Dryomov
On Tue, Jul 28, 2015 at 2:46 PM, van  wrote:
> Hi, Ilya,
>
>   In the dmesg, there is also a lot of libceph socket error, which I think
> may be caused by my stopping ceph service without unmap rbd.

Well, sure enough, if you kill all OSDs, the filesystem mounted on top
of rbd device will get stuck.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Did maximum performance reached?

2015-07-28 Thread Shneur Zalman Mattern
As I understand it now, in this case (30 disks) the 10Gbit network is not the 
bottleneck!

With another HW config (+5 OSD nodes = +50 disks) I'd get ~3400 MB/s,
and 3 clients could work at full bandwidth, yes?

OK, let's try ! ! ! ! ! ! !

Perhaps somebody has more suggestions for increasing performance:
1. NVMe journals
2. btrfs on the OSDs
3. SSD-based OSDs
4. 15K HDDs
5. RAID 10 on each OSD node
...
everybody - brainstorm!!!

>John:
>Your expected bandwidth (with size=2 replicas) will be (900MB/s * 3)/2 =
>1350MB/s -- so I think you're actually doing pretty well with your
>1367MB/s number.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Did maximum performance reached?

2015-07-28 Thread Udo Lembke
Hi,

On 28.07.2015 12:02, Shneur Zalman Mattern wrote:
> Hi!
>
> And so, in your math
> I need to build size = osd, 30 replicas for my cluster of 120TB - to get my 
> demans 
30 replicas is the wrong math! Fewer replicas = more speed (because of
less writing); more replicas, less speed.
For data safety a replica count of 3 is recommended.


Udo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD RAM usage values

2015-07-28 Thread Dan van der Ster
On Tue, Jul 28, 2015 at 12:07 PM, Gregory Farnum  wrote:
> On Tue, Jul 28, 2015 at 11:00 AM, Kenneth Waegeman
>  wrote:
>>
>>
>> On 07/17/2015 02:50 PM, Gregory Farnum wrote:
>>>
>>> On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman
>>>  wrote:

 Hi all,

 I've read in the documentation that OSDs use around 512MB on a healthy
 cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram)
 Now, our OSD's are all using around 2GB of RAM memory while the cluster
 is
 healthy.


PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
 COMMAND
 29784 root  20   0 6081276 2.535g   4740 S   0.7  8.1   1346:55
 ceph-osd
 32818 root  20   0 5417212 2.164g  24780 S  16.2  6.9   1238:55
 ceph-osd
 25053 root  20   0 5386604 2.159g  27864 S   0.7  6.9   1192:08
 ceph-osd
 33875 root  20   0 5345288 2.092g   3544 S   0.7  6.7   1188:53
 ceph-osd
 30779 root  20   0 5474832 2.090g  28892 S   1.0  6.7   1142:29
 ceph-osd
 22068 root  20   0 5191516 2.000g  28664 S   0.7  6.4  31:56.72
 ceph-osd
 34932 root  20   0 5242656 1.994g   4536 S   0.3  6.4   1144:48
 ceph-osd
 26883 root  20   0 5178164 1.938g   6164 S   0.3  6.2   1173:01
 ceph-osd
 31796 root  20   0 5193308 1.916g  27000 S  16.2  6.1 923:14.87
 ceph-osd
 25958 root  20   0 5193436 1.901g   2900 S   0.7  6.1   1039:53
 ceph-osd
 27826 root  20   0 5225764 1.845g   5576 S   1.0  5.9   1031:15
 ceph-osd
 36011 root  20   0 5111660 1.823g  20512 S  15.9  5.8   1093:01
 ceph-osd
 19736 root  20   0 2134680 0.994g  0 S   0.3  3.2  46:13.47
 ceph-osd



 [root@osd003 ~]# ceph status
 2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following
 dangerous
 and experimental features are enabled: keyvaluestore
 2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following
 dangerous
 and experimental features are enabled: keyvaluestore
  cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
   health HEALTH_OK
   monmap e1: 3 mons at

 {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
  election epoch 58, quorum 0,1,2 mds01,mds02,mds03
   mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
   osdmap e25542: 258 osds: 258 up, 258 in
pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
  270 TB used, 549 TB / 819 TB avail
  4152 active+clean
 8 active+clean+scrubbing+deep


 We are using erasure code on most of our OSDs, so maybe that is a reason.
 But also the cache-pool filestore OSDS on 200GB SSDs are using 2GB of
 RAM.
 Our erasure code pool (16*14 osds) have a pg_num of 2048; our cache pool
 (2*14 OSDS) has a pg_num of 1024.

 Are these normal values for this configuration, and is the documentation
 a
 bit outdated, or should we look into something else?
>>>
>>>
>>> 2GB of RSS is larger than I would have expected, but not unreasonable.
>>> In particular I don't think we've gathered numbers on either EC pools
>>> or on the effects of the caching processes.
>>
>>
>> Which data is actually in memory of the OSDS?
>> Is this mostly cached data?
>> We are short on memory on these servers, can we have influence on this?
>
> Mmm, we've discussed this a few times on the mailing list. The CERN
> guys published a document on experimenting with a very large cluster
> and not enough RAM, but there's nothing I would really recommend
> changing for a production system, especially an EC one, if you aren't
> intimately familiar with what's going on.

In that CERN test the obvious large memory consumer was the osdmap
cache, which was so large because (a) the maps were getting quite
large (7200 OSDs creates a 4MB map, IIRC) and (b) so much osdmap churn
was leading each OSD to cache 500 of the maps. Once the cluster was
fully deployed and healthy, we could restart an OSD and it would then
only use ~300MB (because now the osdmap cache was ~empty).

Kenneth: does the memory usage shrink if you restart an osd? If so, it
could be a similar issue.
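
A quick way to check on one OSD host (the osd id and sysvinit script are just
examples, adjust for your init system):

  ps -o pid,rss,cmd -C ceph-osd        # note RSS before
  /etc/init.d/ceph restart osd.12
  # wait for it to rejoin, then compare
  ps -o pid,rss,cmd -C ceph-osd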

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD RAM usage values

2015-07-28 Thread Mark Nelson



On 07/17/2015 07:50 AM, Gregory Farnum wrote:

On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman
 wrote:

Hi all,

I've read in the documentation that OSDs use around 512MB on a healthy
cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram)
Now, our OSD's are all using around 2GB of RAM memory while the cluster is
healthy.


   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
29784 root  20   0 6081276 2.535g   4740 S   0.7  8.1   1346:55 ceph-osd
32818 root  20   0 5417212 2.164g  24780 S  16.2  6.9   1238:55 ceph-osd
25053 root  20   0 5386604 2.159g  27864 S   0.7  6.9   1192:08 ceph-osd
33875 root  20   0 5345288 2.092g   3544 S   0.7  6.7   1188:53 ceph-osd
30779 root  20   0 5474832 2.090g  28892 S   1.0  6.7   1142:29 ceph-osd
22068 root  20   0 5191516 2.000g  28664 S   0.7  6.4  31:56.72 ceph-osd
34932 root  20   0 5242656 1.994g   4536 S   0.3  6.4   1144:48 ceph-osd
26883 root  20   0 5178164 1.938g   6164 S   0.3  6.2   1173:01 ceph-osd
31796 root  20   0 5193308 1.916g  27000 S  16.2  6.1 923:14.87 ceph-osd
25958 root  20   0 5193436 1.901g   2900 S   0.7  6.1   1039:53 ceph-osd
27826 root  20   0 5225764 1.845g   5576 S   1.0  5.9   1031:15 ceph-osd
36011 root  20   0 5111660 1.823g  20512 S  15.9  5.8   1093:01 ceph-osd
19736 root  20   0 2134680 0.994g  0 S   0.3  3.2  46:13.47 ceph-osd



[root@osd003 ~]# ceph status
2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following dangerous
and experimental features are enabled: keyvaluestore
2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following dangerous
and experimental features are enabled: keyvaluestore
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_OK
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
 election epoch 58, quorum 0,1,2 mds01,mds02,mds03
  mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
  osdmap e25542: 258 osds: 258 up, 258 in
   pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
 270 TB used, 549 TB / 819 TB avail
 4152 active+clean
8 active+clean+scrubbing+deep


We are using erasure code on most of our OSDs, so maybe that is a reason.
But also the cache-pool filestore OSDS on 200GB SSDs are using 2GB of RAM.
Our erasure code pool (16*14 osds) have a pg_num of 2048; our cache pool
(2*14 OSDS) has a pg_num of 1024.

Are these normal values for this configuration, and is the documentation a
bit outdated, or should we look into something else?


2GB of RSS is larger than I would have expected, but not unreasonable.
In particular I don't think we've gathered numbers on either EC pools
or on the effects of the caching processes.


FWIW, here are statistics for ~36 ceph-osds on the wip-promote-prob branch 
after several hours of cache tiering tests (30 OSD base, 6 OSD cache 
tier) using an EC6+2 pool.  At the time of this test, 4K random 
read/writes were being performed.  The cache tier OSDs specifically use 
quite a bit more memory than the base tier.  Interestingly in this test 
major pagefaults are showing up for the cache tier OSDs which is 
annoying. I may need to tweak kernel VM settings on this box.



# PROCESS SUMMARY (counters are /sec)
#Time  PID  User PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  
AccuTime  RKB  WKB MajF MinF Command
09:58:48   715  root 20 1  424 S1G  271M  8  0.19  0.43   6  
30:12.64000 2502 /usr/local/bin/ceph-osd
09:58:48  1363  root 20 1  424 S1G  325M  8  0.14  0.33   4  
26:50.54000   68 /usr/local/bin/ceph-osd
09:58:48  2080  root 20 1  420 S1G  276M  1  0.21  0.49   7  
23:49.36000 2848 /usr/local/bin/ceph-osd
09:58:48  2747  root 20 1  424 S1G  283M  8  0.25  0.68   9  
25:16.63000 1391 /usr/local/bin/ceph-osd
09:58:48  3451  root 20 1  424 S1G  331M  6  0.13  0.14   2  
27:36.71000  148 /usr/local/bin/ceph-osd
09:58:48  4172  root 20 1  424 S1G  301M  6  0.19  0.43   6  
29:44.56000 2165 /usr/local/bin/ceph-osd
09:58:48  4935  root 20 1  420 S1G  310M  9  0.18  0.28   4  
29:09.78000 2042 /usr/local/bin/ceph-osd
09:58:48  5750  root 20 1  420 S1G  267M  2  0.11  0.14   2  
26:55.31000  866 /usr/local/bin/ceph-osd
09:58:48  6544  root 20 1  424 S1G  299M  7  0.22  0.62   8  
26:46.35000 3468 /usr/local/bin/ceph-osd
09:58:48  7379  root 20 1  424 S1G  283M  8  0.16  0.47   6  
25:47.86000  538 /usr/local/bin/ceph-osd
09:58:48  8183  root 20 1  424 S1G  269M  4  0.25  0.67   9  
35:09.85000 2968 /usr/local/bin/ceph-osd
09:58:48  9026  root 20 1  424 S1G  261M  1  0.19  0.46   6  
26:27.36000  539 /usr/local/bin/ceph-o

[ceph-users] Updating OSD Parameters

2015-07-28 Thread Noah Mehl
When we update the following in ceph.conf:

[osd]
  osd_recovery_max_active = 1
  osd_max_backfills = 1

How do we make sure it takes effect?  Do we have to restart all of the ceph 
osd’s and mon’s?

Thanks!

~Noah

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Updating OSD Parameters

2015-07-28 Thread Wido den Hollander


On 28-07-15 16:53, Noah Mehl wrote:
> When we update the following in ceph.conf:
> 
> [osd]
>   osd_recovery_max_active = 1
>   osd_max_backfills = 1
> 
> How do we make sure it takes effect?  Do we have to restart all of the
> ceph osd’s and mon’s?

On a client with client.admin keyring you execute:

ceph tell osd.* injectargs '--osd_recovery_max_active=1'

It will take effect immediately. Keep in mind though that PGs which are
currently recovering are not affected.

So if an OSD is currently doing 10 backfills, it will keep doing that. It
however won't accept any new backfills. So it slowly goes down to 9, 8,
7, etc, until you see only 1 backfill active.

Same goes for recovery.
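
To confirm that the new value is active on a given OSD you can also query its
admin socket on the OSD host (osd.3 is just an example id):

  ceph daemon osd.3 config get osd_max_backfills
  ceph daemon osd.3 config get osd_recovery_max_active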

Wido

> 
> Thanks!
> 
> ~Noah
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] why are there "degraded" PGs when adding OSDs?

2015-07-28 Thread Samuel Just
If it wouldn't be too much trouble, I'd actually like the binary osdmap as well 
(it contains the crushmap, but also a bunch of other stuff).  There is a 
command that lets you get old osdmaps from the mon by epoch as long as they 
haven't been trimmed.
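Something along these lines should work (the epoch number is just an example):

  ceph osd getmap 25500 -o osdmap.25500     # binary osdmap for that epoch
  osdmaptool --print osdmap.25500           # optional sanity check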
-Sam

- Original Message -
From: "Chad William Seys" 
To: "Samuel Just" 
Cc: "ceph-users" 
Sent: Tuesday, July 28, 2015 7:40:31 AM
Subject: Re: [ceph-users] why are there "degraded" PGs when adding OSDs?

Hi Sam,

Trying again today with crush tunables set to firefly.  Degraded peaked around 
46.8%.

I've attached the ceph pg dump and the crushmap (same as osdmap) from before 
and after the OSD additions. 3 osds were added on host osd03.  This added 5TB 
to about 17TB for a total of around 22TB.  5TB/22TB = 22.7%  Is it expected 
for 46.8% of PGs to be degraded after adding 22% of the storage?

Another weird thing is that the kernel RBD clients froze up after the OSDs 
were added, but worked fine after reboot.  (Debian kernel 3.16.7)

Thanks for checking!
C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Updating OSD Parameters

2015-07-28 Thread Noah Mehl
Wido,

That’s awesome, I will look at this right now.

Thanks!

~Noah

> On Jul 28, 2015, at 11:02 AM, Wido den Hollander  wrote:
> 
> 
> 
> On 28-07-15 16:53, Noah Mehl wrote:
>> When we update the following in ceph.conf:
>> 
>> [osd]
>>  osd_recovery_max_active = 1
>>  osd_max_backfills = 1
>> 
>> How do we make sure it takes effect?  Do we have to restart all of the
>> ceph osd’s and mon’s?
> 
> On a client with client.admin keyring you execute:
> 
> ceph tell osd.* injectargs '--osd_recovery_max_active=1'
> 
> It will take effect immediately. Keep in mind though that PGs which are
> currently recovering are not affected.
> 
> So if an OSD is currently doing 10 backfills, it will keep doing that. It
> however won't accept any new backfills. So it slowly goes down to 9, 8,
> 7, etc, until you see only 1 backfill active.
> 
> Same goes for recovery.
> 
> Wido
> 
>> 
>> Thanks!
>> 
>> ~Noah
>> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph 0.94 (and lower) performance on >1 hosts ??

2015-07-28 Thread SCHAER Frederic
Hi again,

So I have tried 
- changing the cpus frequency : either 1.6GHZ, or 2.4GHZ on all cores
- changing the memory configuration, from "advanced ecc mode" to "performance 
mode", boosting the memory bandwidth from 35GB/s to 40GB/s
- plugged a second 10GB/s link and setup a ceph internal network
- tried various "tuned-adm profile" such as "throughput-performance"

None of this changed anything noticeable.

If 
- the CPUs are not maxed out, and lowering the frequency doesn't change a thing
- the network is not maxed out
- the memory doesn't seem to have an impact
- network interrupts are spread across all 8 cpu cores and receive queues are OK
- disks are not used at their maximum potential (iostat shows my dd commands 
produce much more tps than the 4MB ceph transfers...)

Where can I possibly find a bottleneck ?

I'm /(almost) out of ideas/ ... :'(
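
In case it helps narrow things down, a rough way to separate client-side from
OSD-side time (pool name and osd id below are placeholders):

  rados bench -p testpool 30 write -t 16    # object-level write throughput, no filesystem involved
  ceph tell osd.0 bench                     # raw write speed of a single OSD's backend
  ceph daemon osd.0 perf dump | python -m json.tool | grep -A 3 journal_latency   # on the OSD host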

Regards

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of SCHAER 
Frederic
Sent: Friday, 24 July 2015 16:04
To: Christian Balzer; ceph-users@lists.ceph.com
Subject: [PROVENANCE INTERNET] Re: [ceph-users] Ceph 0.94 (and lower) 
performance on >1 hosts ??

Hi,

Thanks.
I did not know about atop, nice tool... and I don't seem to be IRQ overloaded - 
I can reach 100% cpu % for IRQs, but that's shared across all 8 physical cores.
I also discovered "turbostat" which showed me the R510s were not configured for 
"performance" in the bios (but dbpm - demand based power management), and were 
not bumping the CPUs frequency to 2.4GHz as they should... only apparently 
remaining at 1.6Ghz...

But changing that did not improve things unfortunately. I now have CPUs using 
their Xeon turbo frequency, but no throughput improvement.

Looking at RPS/ RSS, it looks like our Broadcom cards are configured correctly 
according to redhat, i.e : one receive queue per physical core, spreading the 
IRQ load everywhere.
One thing I noticed though is that the dell BIOS allows to change IRQs... but 
once you change the network card IRQ, it also changes the RAID card IRQ as well 
as many others, all sharing the same bios IRQ (that's therefore apparently a 
useless option). Weird.

Still attempting to determine the bottleneck ;)

Regards
Frederic

-Original Message-
From: Christian Balzer [mailto:ch...@gol.com] 
Sent: Thursday, 23 July 2015 14:18
To: ceph-users@lists.ceph.com
Cc: Gregory Farnum; SCHAER Frederic
Subject: Re: [ceph-users] Ceph 0.94 (and lower) performance on >1 hosts ??

On Thu, 23 Jul 2015 11:14:22 +0100 Gregory Farnum wrote:

> Your note that dd can do 2GB/s without networking makes me think that
> you should explore that. As you say, network interrupts can be
> problematic in some systems. The only thing I can think of that's been
> really bad in the past is that some systems process all network
> interrupts on cpu 0, and you probably want to make sure that it's
> splitting them across CPUs.
>

An IRQ overload would be very visible with atop.

Splitting the IRQs will help, but it is likely to need some smarts.

As in, irqbalance may spread things across NUMA nodes.

A card with just one IRQ line will need RPS (Receive Packet Steering),
irqbalance can't help it.

For example, I have a compute node with such a single line card and Quad
Opterons (64 cores, 8 NUMA nodes).

The default is all interrupt handling on CPU0 and that is very little,
except for eth2. So this gets a special treatment:
---
echo 4 >/proc/irq/106/smp_affinity_list
---
Pinning the IRQ for eth2 to CPU 4 by default

---
echo f0 > /sys/class/net/eth2/queues/rx-0/rps_cpus
---
giving RPS CPUs 4-7 to work with. At peak times it needs more than 2
cores, otherwise with this architecture just using 4 and 5 (same L2 cache)
would be better.
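
To see where things currently land (assuming eth2 as above):
---
grep eth2 /proc/interrupts                      # which CPU(s) service the NIC IRQ
cat /sys/class/net/eth2/queues/rx-0/rps_cpus    # current RPS CPU mask
---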

Regards,

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel version can help avoid kernel client deadlock

2015-07-28 Thread van

> On Jul 28, 2015, at 7:57 PM, Ilya Dryomov  wrote:
> 
> On Tue, Jul 28, 2015 at 2:46 PM, van  wrote:
>> Hi, Ilya,
>> 
>>  In the dmesg, there is also a lot of libceph socket error, which I think
>> may be caused by my stopping ceph service without unmap rbd.
> 
> Well, sure enough, if you kill all OSDs, the filesystem mounted on top
> of rbd device will get stuck.

Sure, it will get stuck if the OSDs are stopped. And since RADOS requests have a 
retry policy, the stuck requests will recover after I start the daemon again.

But in my case the OSDs are running in a normal state and the librbd API can 
read/write normally.
Meanwhile, a heavy fio test on the filesystem mounted on top of the rbd device 
gets stuck.

I wonder if this phenomenon is triggered by running the rbd kernel client on 
machines that also run ceph daemons, i.e. the annoying loopback mount deadlock issue.

In my opinion, if it were due to the loopback mount deadlock, the OSDs would become 
unresponsive, no matter whether the requests come from user space (like the API) or 
from the kernel client.
Am I right?

If so, my case seems to be triggered by another bug.

Anyway, it seems that I should separate client and daemons at least.
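
If it happens again, dumping the blocked task stacks should show whether they are
stuck in XFS/rbd or in memory reclaim (just a debugging sketch):

  echo w > /proc/sysrq-trigger    # logs stacks of all D-state tasks to the kernel log
  dmesg | tail -n 300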

Thanks.

> 
> Thanks,
> 
>Ilya

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel version can help avoid kernel client deadlock

2015-07-28 Thread Ilya Dryomov
On Tue, Jul 28, 2015 at 7:20 PM, van  wrote:
>
>> On Jul 28, 2015, at 7:57 PM, Ilya Dryomov  wrote:
>>
>> On Tue, Jul 28, 2015 at 2:46 PM, van  wrote:
>>> Hi, Ilya,
>>>
>>>  In the dmesg, there is also a lot of libceph socket error, which I think
>>> may be caused by my stopping ceph service without unmap rbd.
>>
>> Well, sure enough, if you kill all OSDs, the filesystem mounted on top
>> of rbd device will get stuck.
>
> Sure, it will get stuck if the OSDs are stopped. And since RADOS requests have a
> retry policy, the stuck requests will recover after I start the daemon again.
>
> But in my case the OSDs are running in a normal state and the librbd API can
> read/write normally.
> Meanwhile, a heavy fio test on the filesystem mounted on top of the rbd device
> gets stuck.
>
> I wonder if this phenomenon is triggered by running the rbd kernel client on
> machines that also run ceph daemons, i.e. the annoying loopback mount deadlock issue.
>
> In my opinion, if it were due to the loopback mount deadlock, the OSDs would become
> unresponsive, no matter whether the requests come from user space (like the API) or
> from the kernel client.
> Am I right?

Not necessarily.

>
> If so, my case seems to be triggered by another bug.
>
> Anyway, it seems that I should separate client and daemons at least.

Try 3.18.19 if you can.  I'd be interested in your results.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW - radosgw-agent start error

2015-07-28 Thread Italo Santos
Hello everyone,  

I’m setting up a federated configuration of radosgw, but when I start a 
radosgw-agent I hit the error below and I’d like to know if I’m doing 
something wrong…?

See the error:

root@cephgw0001:~# radosgw-agent -v -c /etc/ceph/radosgw-agent/default.conf
2015-07-28 17:02:03,103 3600 [radosgw_agent][INFO  ]  ____  
 __   ___  ___
2015-07-28 17:02:03,103 3600 [radosgw_agent][INFO  ] /__` \ / |\ | /  ` /\  
/ _` |__  |\ |  |
2015-07-28 17:02:03,104 3600 [radosgw_agent][INFO  ] .__/  |  | \| \__,/~~\ 
\__> |___ | \|  |
2015-07-28 17:02:03,104 3600 [radosgw_agent][INFO  ]
  v1.2.3
2015-07-28 17:02:03,105 3600 [radosgw_agent][INFO  ] agent options:
2015-07-28 17:02:03,105 3600 [radosgw_agent][INFO  ]  args:
2015-07-28 17:02:03,106 3600 [radosgw_agent][INFO  ]conf
  : None
2015-07-28 17:02:03,106 3600 [radosgw_agent][INFO  ]dest_access_key 
  : 
2015-07-28 17:02:03,107 3600 [radosgw_agent][INFO  ]dest_secret_key 
  : 
2015-07-28 17:02:03,108 3600 [radosgw_agent][INFO  ]destination 
  : http://tmk.object-storage.local:80
2015-07-28 17:02:03,108 3600 [radosgw_agent][INFO  ]incremental_sync_delay  
  : 30
2015-07-28 17:02:03,109 3600 [radosgw_agent][INFO  ]lock_timeout
  : 60
2015-07-28 17:02:03,109 3600 [radosgw_agent][INFO  ]log_file
  : /var/log/radosgw/radosgw-sync.log
2015-07-28 17:02:03,110 3600 [radosgw_agent][INFO  ]log_lock_time   
  : 20
2015-07-28 17:02:03,110 3600 [radosgw_agent][INFO  ]max_entries 
  : 1000
2015-07-28 17:02:03,111 3600 [radosgw_agent][INFO  ]metadata_only   
  : False
2015-07-28 17:02:03,111 3600 [radosgw_agent][INFO  ]num_workers 
  : 1
2015-07-28 17:02:03,112 3600 [radosgw_agent][INFO  ]object_sync_timeout 
  : 216000
2015-07-28 17:02:03,112 3600 [radosgw_agent][INFO  ]prepare_error_delay 
  : 10
2015-07-28 17:02:03,113 3600 [radosgw_agent][INFO  ]quiet   
  : False
2015-07-28 17:02:03,113 3600 [radosgw_agent][INFO  ]rgw_data_log_window 
  : 30
2015-07-28 17:02:03,114 3600 [radosgw_agent][INFO  ]source  
  : None
2015-07-28 17:02:03,114 3600 [radosgw_agent][INFO  ]src_access_key  
  : 
2015-07-28 17:02:03,115 3600 [radosgw_agent][INFO  ]src_secret_key  
  : 
2015-07-28 17:02:03,115 3600 [radosgw_agent][INFO  ]src_zone
  : None
2015-07-28 17:02:03,116 3600 [radosgw_agent][INFO  ]sync_scope  
  : incremental
2015-07-28 17:02:03,116 3600 [radosgw_agent][INFO  ]test_server_host
  : None
2015-07-28 17:02:03,117 3600 [radosgw_agent][INFO  ]test_server_port
  : 8080
2015-07-28 17:02:03,118 3600 [radosgw_agent][INFO  ]verbose 
  : True
2015-07-28 17:02:03,118 3600 [radosgw_agent][INFO  ]versioned   
  : False
2015-07-28 17:02:03,118 3600 [radosgw_agent.client][INFO  ] creating connection 
to endpoint: http://tmk.object-storage.local:80
2015-07-28 17:02:03,120 3600 [radosgw_agent][ERROR ] RegionMapError: Could not 
retrieve region map from destination: make_request() got an unexpected keyword 
argument 'params'


Regards.

Italo Santos
http://italosantos.com.br/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Configuring MemStore in Ceph

2015-07-28 Thread Aakanksha Pudipeddi-SSI
Hello,

I am trying to setup a ceph cluster with a memstore backend. The problem is, it 
is always created with a fixed size (1GB). I made changes to the ceph.conf file 
as follows:

osd_objectstore = memstore
memstore_device_bytes = 5*1024*1024*1024

The resultant cluster still has 1GB allocated to it. Could anybody point out 
what I am doing wrong here?

Thanks,
Aakanksha
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Updating OSD Parameters

2015-07-28 Thread Nikhil Mitra (nikmitra)
I believe you can use ceph tell to inject it in a running cluster.

From your admin node you should be able to run
ceph tell osd.* injectargs "--osd_recovery_max_active 1 --osd_max_backfills 1"


Regards,
Nikhil Mitra


From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Noah Mehl 
<noahm...@combinedpublic.com>
Date: Tuesday, July 28, 2015 at 7:53 AM
To: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: [ceph-users] Updating OSD Parameters

When we update the following in ceph.conf:

[osd]
  osd_recovery_max_active = 1
  osd_max_backfills = 1

How do we make sure it takes effect?  Do we have to restart all of the ceph 
osd’s and mon’s?

Thanks!

~Noah

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Configuring MemStore in Ceph

2015-07-28 Thread Haomai Wang
Which version do you use?

https://github.com/ceph/ceph/commit/c60f88ba8a6624099f576eaa5f1225c2fcaab41a
should fix your problem
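
Also worth double-checking (just a guess, not necessarily your problem): the
config parser may not evaluate an arithmetic expression like 5*1024*1024*1024,
so try spelling the size out in bytes:

[osd]
osd objectstore = memstore
# 5 GiB written out literally
memstore device bytes = 5368709120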

On Wed, Jul 29, 2015 at 5:44 AM, Aakanksha Pudipeddi-SSI
 wrote:
> Hello,
>
>
>
> I am trying to setup a ceph cluster with a memstore backend. The problem is,
> it is always created with a fixed size (1GB). I made changes to the
> ceph.conf file as follows:
>
>
>
> osd_objectstore = memstore
>
> memstore_device_bytes = 5*1024*1024*1024
>
>
>
> The resultant cluster still has 1GB allocated to it. Could anybody point out
> what I am doing wrong here?
>
>
>
> Thanks,
>
> Aakanksha
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Configuring MemStore in Ceph

2015-07-28 Thread Aakanksha Pudipeddi-SSI
Hello Haomai,

I am using v0.94.2.

Thanks,
Aakanksha

-Original Message-
From: Haomai Wang [mailto:haomaiw...@gmail.com] 
Sent: Tuesday, July 28, 2015 7:20 PM
To: Aakanksha Pudipeddi-SSI
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Configuring MemStore in Ceph

Which version do you use?

https://github.com/ceph/ceph/commit/c60f88ba8a6624099f576eaa5f1225c2fcaab41a
should fix your problem

On Wed, Jul 29, 2015 at 5:44 AM, Aakanksha Pudipeddi-SSI 
 wrote:
> Hello,
>
>
>
> I am trying to setup a ceph cluster with a memstore backend. The 
> problem is, it is always created with a fixed size (1GB). I made 
> changes to the ceph.conf file as follows:
>
>
>
> osd_objectstore = memstore
>
> memstore_device_bytes = 5*1024*1024*1024
>
>
>
> The resultant cluster still has 1GB allocated to it. Could anybody 
> point out what I am doing wrong here?
>
>
>
> Thanks,
>
> Aakanksha
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Configuring MemStore in Ceph

2015-07-28 Thread Haomai Wang
On Wed, Jul 29, 2015 at 10:21 AM, Aakanksha Pudipeddi-SSI
 wrote:
> Hello Haomai,
>
> I am using v0.94.2.
>
> Thanks,
> Aakanksha
>
> -Original Message-
> From: Haomai Wang [mailto:haomaiw...@gmail.com]
> Sent: Tuesday, July 28, 2015 7:20 PM
> To: Aakanksha Pudipeddi-SSI
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] Configuring MemStore in Ceph
>
> Which version do you use?
>
> https://github.com/ceph/ceph/commit/c60f88ba8a6624099f576eaa5f1225c2fcaab41a
> should fix your problem
>
> On Wed, Jul 29, 2015 at 5:44 AM, Aakanksha Pudipeddi-SSI 
>  wrote:
>> Hello,
>>
>>
>>
>> I am trying to setup a ceph cluster with a memstore backend. The
>> problem is, it is always created with a fixed size (1GB). I made
>> changes to the ceph.conf file as follows:
>>
>>
>>
>> osd_objectstore = memstore
>>
>> memstore_device_bytes = 5*1024*1024*1024
>>
>>
>>
>> The resultant cluster still has 1GB allocated to it. Could anybody
>> point out what I am doing wrong here?

What do you mean by "The resultant cluster still has 1GB allocated to it"?

Do you mean that you can't write more than 1GB of data?

>>
>>
>>
>> Thanks,
>>
>> Aakanksha
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Best Regards,
>
> Wheat



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com