Re: [ceph-users] ceph master build fails on src/gmock, workaround?

2016-07-10 Thread Brad Hubbard
On Sat, Jul 09, 2016 at 10:43:52AM +, Kevan Rehm wrote:
> Greetings,
> 
> I cloned the master branch of ceph at https://github.com/ceph/ceph.git
> onto a Centos 7 machine, then did
> 
> ./autogen.sh
> ./configure --enable-xio
> make

BTW, you should be defaulting to cmake if you don't have a specific need to
use the autotools build.
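A rough sketch of the cmake route (the exact invocation can differ per checkout, so treat this as an outline rather than the official build steps):

mkdir build && cd build
cmake ..
make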

-- 
Cheers,
Brad


[ceph-users] ceph admin socket protocol

2016-07-10 Thread Stefan Priebe - Profihost AG
Hi,

is the ceph admin socket protocol described anywhere? I want to talk directly 
to the socket instead of calling the ceph binary. I searched the doc but didn't 
find anything useful.

Thanks,
Stefan


Re: [ceph-users] Filestore merge and split

2016-07-10 Thread Nick Fisk
You need to set the option in ceph.conf and restart the OSD, I think. But it
will only take effect on future splits or merges; it won't adjust the current
folder layout.
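For reference, the behaviour commonly described for FileStore is that a PG
subdirectory gets split once it holds more than

    filestore_split_multiple * abs(filestore_merge_threshold) * 16

objects, so with the values quoted below that works out to 8 * 40 * 16 = 5120
objects per subdirectory before a split is triggered.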

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Paul 
> Renner
> Sent: 09 July 2016 22:18
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Filestore merge and split
> 
> Hello cephers
> we have many (millions) of small objects in our RadosGW system and are getting 
> not very good write performance, 100-200 PUTs/sec.
> 
> I have read on the mailinglist that one possible tuning option would be to 
> increase the max. number of files per directory on OSDs with
> eg.
> 
> filestore merge threshold = 40
> filestore split multiple = 8
> Now my question is, do we need to rebuild the OSDs to make this effective? Or 
> is it a runtime setting?
> I'm asking because when setting this with injectargs I get the message 
> "unchangeable" back.
> Thanks for any insight.




[ceph-users] Drive letters shuffled on reboot

2016-07-10 Thread William Josefsson
Hi everyone,

I have a problem with drive and partition names swapping on reboot. My Ceph is 
Hammer on CentOS7, Dell R730 6xSSD (2xSSD OS RAID1 PERC, 4xSSD=Journal drives), 
18x1.8T SAS for OSDs.

Whenever I reboot, drives randomly seem to change names. This is extremely 
dangerous and frustrating, since I initially set up Ceph with ceph-deploy zap, 
prepare and activate. It has also happened that I accidentally erased the wrong 
disk when e.g. /dev/sdX had become /dev/sdY.

Please see the output below of how this drive swapping appears: SDC is shifted, 
and indexes and drive names got shuffled. Ceph OSDs didn't come up properly.

Please advise on how to get this corrected, with no more drive name shuffling. 
Can this be due to the PERC HW raid? thx will



POST REBOOT 2 (expected outcome.. with sda,sdb,sdc,sdd as journal. sdw is a 
perc raid1)


[cephnode3][INFO  ] Running command: sudo /usr/sbin/ceph-disk list
[cephnode3][DEBUG ] /dev/sda :
[cephnode3][DEBUG ]  /dev/sda1 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sda2 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sda3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sda4 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sda5 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ] /dev/sdb :
[cephnode3][DEBUG ]  /dev/sdb1 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdb2 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdb3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdb4 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdb5 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ] /dev/sdc :
[cephnode3][DEBUG ]  /dev/sdc1 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdc2 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdc3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdc4 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdc5 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ] /dev/sdd :
[cephnode3][DEBUG ]  /dev/sdd1 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdd2 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdd3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdd4 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sdd5 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ] /dev/sde :
[cephnode3][DEBUG ]  /dev/sde1 ceph data, active, cluster ceph, osd.0
[cephnode3][DEBUG ] /dev/sdf :
[cephnode3][DEBUG ]  /dev/sdf1 ceph data, active, cluster ceph, osd.1
[cephnode3][DEBUG ] /dev/sdg :
[cephnode3][DEBUG ]  /dev/sdg1 ceph data, active, cluster ceph, osd.2
[cephnode3][DEBUG ] /dev/sdh :
[cephnode3][DEBUG ]  /dev/sdh1 ceph data, active, cluster ceph, osd.3
[cephnode3][DEBUG ] /dev/sdi :
[cephnode3][DEBUG ]  /dev/sdi1 ceph data, active, cluster ceph, osd.4
[cephnode3][DEBUG ] /dev/sdj :
[cephnode3][DEBUG ]  /dev/sdj1 ceph data, active, cluster ceph, osd.5
[cephnode3][DEBUG ] /dev/sdk :
[cephnode3][DEBUG ]  /dev/sdk1 ceph data, active, cluster ceph, osd.6
[cephnode3][DEBUG ] /dev/sdl :
[cephnode3][DEBUG ]  /dev/sdl1 ceph data, active, cluster ceph, osd.7
[cephnode3][DEBUG ] /dev/sdm :
[cephnode3][DEBUG ]  /dev/sdm1 other, xfs
[cephnode3][DEBUG ] /dev/sdn :
[cephnode3][DEBUG ]  /dev/sdn1 ceph data, active, cluster ceph, osd.9
[cephnode3][DEBUG ] /dev/sdo :
[cephnode3][DEBUG ]  /dev/sdo1 ceph data, active, cluster ceph, osd.10
[cephnode3][DEBUG ] /dev/sdp :
[cephnode3][DEBUG ]  /dev/sdp1 ceph data, active, cluster ceph, osd.11
[cephnode3][DEBUG ] /dev/sdq :
[cephnode3][DEBUG ]  /dev/sdq1 ceph data, active, cluster ceph, osd.12
[cephnode3][DEBUG ] /dev/sdr :
[cephnode3][DEBUG ]  /dev/sdr1 ceph data, active, cluster ceph, osd.13
[cephnode3][DEBUG ] /dev/sds :
[cephnode3][DEBUG ]  /dev/sds1 ceph data, active, cluster ceph, osd.14
[cephnode3][DEBUG ] /dev/sdt :
[cephnode3][DEBUG ]  /dev/sdt1 ceph data, active, cluster ceph, osd.15
[cephnode3][DEBUG ] /dev/sdu :
[cephnode3][DEBUG ]  /dev/sdu1 ceph data, active, cluster ceph, osd.16
[cephnode3][DEBUG ] /dev/sdv :
[cephnode3][DEBUG ]  /dev/sdv1 ceph data, active, cluster ceph, osd.17
[cephnode3][DEBUG ] /dev/sdw :
[cephnode3][DEBUG ]  /dev/sdw1 other, xfs, mounted on /
[cephnode3][DEBUG ]  /dev/sdw2 swap, swap


POST REBOOT 1:


[cephnode3][DEBUG ] /dev/sda :
[cephnode3][DEBUG ]  /dev/sda1 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sda2 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sda3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sda4 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ]  /dev/sda5 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[cephnode3][DEBUG ] /dev/sdb :
[cephnode3][DEBUG ]  /dev/sdb1 other, ebd0a0a2-b9e5-4433-87c0-68b6b

Re: [ceph-users] ceph admin socket protocol

2016-07-10 Thread Daniel Swarbrick
If you can read C code, there is a collectd plugin that talks directly 
to the admin socket:


https://github.com/collectd/collectd/blob/master/src/ceph.c

On 10/07/16 10:36, Stefan Priebe - Profihost AG wrote:

Hi,

is the ceph admin socket protocol described anywhere? I want to talk directly 
to the socket instead of calling the ceph binary. I searched the doc but didn't 
find anything useful.

Thanks,
Stefan






Re: [ceph-users] Ceph for online file storage

2016-07-10 Thread m.da...@bluewin.ch
Hello,

>Those 2 servers are running Ceph?
>If so, be more specific, what's the HW like, CPU, RAM. network, journal
>SSDs?

Yes, I was hesitating between GlusterFS and Ceph but the latter is much more 
scalable and is future-proof.

Both have the same configuration, namely E5 2628L (6c/12t @ 1.9GHz), 8x16G 
2133MHz, 2x10G bonded (we only use 10G and fiber links), multiple 120G SSDs 
available for journals and caching.

>Also, 2 servers indicate a replication of 2, something I'd avoid in
>production.

This is true. I was thinking about EC instead of replication.

>Your first and foremost way to improve IOPS is to have SSD journals,
>everybody who deployed Ceph w/o them in any serious production environment
>came to regret it.

I think it is clear that journals are a must, especially since many small files 
will be read and written to.

>Doubling the OSDs while halving the size will give you the same
>space but at a much better performance.

It's true, but then the $/TB or even $/PB ratio is much higher. It would be 
interesting to compare the outcome with more lower-density disks vs. fewer 
higher-density disks with more (aggressive) caching/journaling.

Your overview of the whole system definitely helps sorting things out. As you 
suggested, it's best I try some combinations to find what suits my use case 
best.

>If you were to use CephFS for storage, putting the metadata on SSDs will
>be beneficial, too.

All OS drives are SSDs, and considering the system will never use the SSD in 
full I think it would be safe to partition it for MDS, cache and journal data.

--
Sincères salutations,

Moïn Danai.
Original Message
From : ch...@gol.com
Date : 01/07/2016 - 04:26 (CEST)
To : ceph-users@lists.ceph.com
Cc : m.da...@bluewin.ch
Subject : Re: [ceph-users] Ceph for online file storage


Hello,

On Thu, 30 Jun 2016 08:34:12 + (GMT) m.da...@bluewin.ch wrote:

> Thank you all for your prompt answers.
> 
> >firstly, wall of text, makes things incredibly hard to read.
> >Use paragraphs/returns liberally.
> 
> I actually made sure to use paragraphs. For some reason, the formatting
> was removed.
> 
> >Is that your entire experience with Ceph, ML archives and docs?
> 
> Of course not, I have already been through the whole documentation many
> times. It's just that I couldn't really decide between the choices I was
> given.
> 
> >What's an "online storage"?
> >I assume you're talking about what is is commonly referred as "cloud
> storage".
> 
> I try not to use the term "cloud", but if you must, then yes that's the
> idea behind it. Basically an online hard disk.
> 
While I can certainly agree that "cloud" is overused and often mis-used as
well, it makes things clearer in this context.

> >10MB is not a small file in my book, 1-4KB (your typical mail) are small
> >files.
> >How much data (volume/space) are you looking at initially and within a
> >year of deployment?
> 
> 10MB is small compared to the larger files, but it is indeed bigger that
> smaller, IOPS-intensive files (like the emails you pointed out).
> 
> Right now there are two servers, each with 12x8TB. I expect a growth
> rate of about the same size every 2-3 months.
> 
Those 2 servers are running Ceph?
If so, be more specific, what's the HW like, CPU, RAM. network, journal
SSDs?

Also, 2 servers indicate a replication of 2, something I'd avoid in
production.


> >What usage patterns are you looking at, expecting?
> 
> Since my customers will put their files on this "cloud", it's generally
> write once, read many (or at least more reads than writes). As they most
> likely will store private documents, but some bigger files too, the
> smaller files are predominant.
>
Reads are helped by having plenty of RAM in your storage servers.
 
> >That's quite the blanket statement and sounds like from A sales
> >brochure. SSDs for OSD journals are always a good idea.
> >Ceph scales first and foremost by adding more storage nodes and OSDs.
> 
> What I meant by scaling is that as the number of customers grows, the
> more small files there will be, and so in order to have decent
> performance at that point, SSDs are a must. I can add many OSDs, but if
> they are all struggling with IOPS then it's no use (except having more
> space).
> 
You seem to grasp the fact that IOPS are likely to be your bottleneck, yet
are going for 8TB HDDs.
Which as Oliver mentioned and plenty of experience shared on this ML shows
is a poor choice unless it's for very low IOPS, large data use cases.

Now while I certainly understand the appeal of dense storage nodes from
cost/space perspective you will want to run several scenarios and
calculations to see what actually turns out to be the best fit.

Your HDDs can do about 150 IOPS, half of that if they have no SSD journals
and then some 30% more lost to FS journals, LevelDB updates, etc.
Let's call it 60 IOPS w/o SSD journals and 120 with.
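As a back-of-the-envelope example only (using the 2 x 12 x 8TB OSDs mentioned
above and replication 2): aggregate client write IOPS is roughly
OSDs * per-OSD IOPS / replicas, i.e. 24 * 120 / 2 = about 1440 with SSD journals
versus 24 * 60 / 2 = about 720 without, before any read load is added.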

Your first and foremost way to improve IOPS is to have SSD journals,
everybody who deployed Ceph

Re: [ceph-users] ceph admin socket protocol

2016-07-10 Thread John Spray
On Sun, Jul 10, 2016 at 9:36 AM, Stefan Priebe - Profihost AG
 wrote:
> Hi,
>
> is the ceph admin socket protocol described anywhere? I want to talk directly 
> to the socket instead of calling the ceph binary. I searched the doc but 
> didn't find anything useful.

There's no binary involved in sending commands to the admin socket;
the CLI uses the Python code here:
https://github.com/ceph/ceph/blob/master/src/pybind/ceph_daemon.py
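At the wire level the exchange is simple; here is a minimal Python sketch of
what ceph_daemon.py does (details are best checked against that file for your
release, and the socket path below is just an example):

import json
import socket
import struct

def admin_socket_command(asok_path, cmd):
    # cmd is a dict such as {"prefix": "perf dump"}
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(asok_path)
    try:
        # send the JSON command, NUL-terminated
        sock.sendall(json.dumps(cmd).encode('utf-8') + b'\0')
        # the reply is a 4-byte big-endian length followed by the payload
        length = struct.unpack('>I', sock.recv(4))[0]
        buf = b''
        while len(buf) < length:
            chunk = sock.recv(length - len(buf))
            if not chunk:
                break
            buf += chunk
    finally:
        sock.close()
    return buf.decode('utf-8')

# example usage: dump the perf counters of one daemon (path is an example)
print(admin_socket_command('/var/run/ceph/ceph-osd.0.asok', {'prefix': 'perf dump'}))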

Cheers,
John

> Thanks,
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Filestore merge and split

2016-07-10 Thread Paul Renner
Thanks...

Do you know when splitting or merging will happen? Is it enough that a
directory is read, eg. through scrub? If possible I would like to initiate
the process

Regards
Paul

On Sun, Jul 10, 2016 at 10:47 AM, Nick Fisk  wrote:

> You need to set the option in the ceph.conf and restart the OSD I think.
> But it will only take effect when splitting or merging in the future, it
> won't adjust the current folder layout.
>
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Paul Renner
> > Sent: 09 July 2016 22:18
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Filestore merge and split
> >
> > Hello cephers
> > we have many (millions) of small objects in our RadosGW system and are
> > getting not very good write performance, 100-200 PUTs/sec.
> >
> > I have read on the mailinglist that one possible tuning option would be
> to increase the max. number of files per directory on OSDs with
> > eg.
> >
> > filestore merge threshold = 40
> > filestore split multiple = 8
> > Now my question is, do we need to rebuild the OSDs to make this
> effective? Or is it a runtime setting?
> > I'm asking because when setting this with injectargs I get the message
> "unchangeable" back.
> > Thanks for any insight.
>
>
>


Re: [ceph-users] ceph admin socket protocol

2016-07-10 Thread Stefan Priebe - Profihost AG

Am 10.07.2016 um 16:33 schrieb Daniel Swarbrick:
> If you can read C code, there is a collectd plugin that talks directly
> to the admin socket:
> 
> https://github.com/collectd/collectd/blob/master/src/ceph.c

Thanks, I can read that.

Stefan

> 
> On 10/07/16 10:36, Stefan Priebe - Profihost AG wrote:
>> Hi,
>>
>> is the ceph admin socket protocol described anywhere? I want to talk
>> directly to the socket instead of calling the ceph binary. I searched
>> the doc but didn't find anything useful.
>>
>> Thanks,
>> Stefan
>>
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph admin socket protocol

2016-07-10 Thread Stefan Priebe - Profihost AG

Am 10.07.2016 um 20:08 schrieb John Spray:
> On Sun, Jul 10, 2016 at 9:36 AM, Stefan Priebe - Profihost AG
>  wrote:
>> Hi,
>>
>> is the ceph admin socket protocol described anywhere? I want to talk 
>> directly to the socket instead of calling the ceph binary. I searched the 
>> doc but didn't find anything useful.
> 
> There's no binary involved in sending commands to the admin socket,
> the CLI is using the python code here:
> https://github.com/ceph/ceph/blob/master/src/pybind/ceph_daemon.py

argh thanks ;-) never noticed the python code there.

> 
> Cheers,
> John
> 
>> Thanks,
>> Stefan
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph admin socket from non root

2016-07-10 Thread Stefan Priebe - Profihost AG
Hi,

is there a recommended way to connect to the ceph admin socket from a
non-root user, e.g. a monitoring system?

In the past they were created with 777 permissions, but now they're 755,
which prevents our monitoring daemon from connecting. I don't want to
set CAP_DAC_OVERRIDE for the monitoring agent.

Greets,
Stefan



Re: [ceph-users] ceph admin socket protocol

2016-07-10 Thread Brad Hubbard
On Sun, Jul 10, 2016 at 09:32:33PM +0200, Stefan Priebe - Profihost AG wrote:
> 
> Am 10.07.2016 um 16:33 schrieb Daniel Swarbrick:
> > If you can read C code, there is a collectd plugin that talks directly
> > to the admin socket:
> > 
> > https://github.com/collectd/collectd/blob/master/src/ceph.c
> 
> thanks can read that.

If you're interested in using AdminSocketClient, here's some example code.

#include "common/admin_socket_client.h"

#include 

int main(int argc, char** argv)
{
std::string response;
AdminSocketClient client(argv[1]);
//client.do_request("{\"prefix\":\"help\"}", &response);
//client.do_request("{\"prefix\":\"help\", \"format\": \"json\"}", 
&response);
client.do_request("{\"prefix\":\"perf dump\"}", &response);
//client.do_request("{\"prefix\":\"perf dump\", \"format\": \"json\"}", 
&response);
std::cout << response << '\n';

return 0;

}

// $ g++ -O2 -std=c++11 ceph-admin-socket-test.cpp -I../ceph/src/ 
-I../ceph/build/include/ ../ceph/build/lib/libcommon.a


-- 
Cheers,
Brad

> 
> Stefan
> 
> > 
> > On 10/07/16 10:36, Stefan Priebe - Profihost AG wrote:
> >> Hi,
> >>
> >> is the ceph admin socket protocol described anywhere? I want to talk
> >> directly to the socket instead of calling the ceph binary. I searched
> >> the doc but didn't find anything useful.
> >>
> >> Thanks,
> >> Stefan
> >>
> > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive letters shuffled on reboot

2016-07-10 Thread Christian Balzer

Hello,

On Sun, 10 Jul 2016 12:46:39 + (UTC) William Josefsson wrote:

> Hi everyone,
> 
> I have problem with swapping drive and partition names on reboot. My
> Ceph is Hammer on CentOS7, Dell R730 6xSSD (2xSSD OS RAID1 PERC,
> 4xSSD=Journal drives), 18x1.8T SAS for OSDs.
> 
> Whenever I reboot, drives randomly seem to change names. This is
> extremely dangerous and frustrating when I've initially setup CEPH with
> ceph-deploy, zap, prepare and activate. It has happened that I've
> accidentally erased wrong disk too when e.g. /dev/sdX had
> become /dev/sdY.
>
This isn't a Ceph specific question per se and you could probably keep
things from moving around by enforcing module loads in a particular order.

But that of course still wouldn't help if something else changed or a
drive totally failed. 

So in the context of Ceph, it doesn't (shouldn't) care if the OSD (HDD)
changes names, especially since you did set it up with ceph-deploy.

And to avoid the journals getting jumbled up, do what everybody does
(outside of Ceph as well), use /dev/disk/by-id or uuid.

Like:
---
# ls -la /var/lib/ceph/osd/ceph-28/

 journal -> /dev/disk/by-id/wwn-0x55cd2e404b73d569-part3
---
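If an existing OSD's journal symlink still points at a /dev/sdX name, it can be
repointed while the OSD is stopped; a sketch only, with the OSD id and by-id name
taken from the example above (adjust them to your own devices):

# stop the OSD first, via your init system of choice
ln -sfn /dev/disk/by-id/wwn-0x55cd2e404b73d569-part3 /var/lib/ceph/osd/ceph-28/journal
# then start the OSD again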

Christian
> Please see an output below of how this drive swapping below appears SDC
> is shifted, indexes and drive names got shuffled. Ceph OSDs didn't come
> up properly.
> 
> Please advice on how to get this corrected, with no more drive name
> shuffling. Can this be due to the PERC HW raid? thx will
> 
> 
> 
> POST REBOOT 2 (expected outcome.. with sda,sdb,sdc,sdd as journal. sdw
> is a perc raid1)
> 
> 
> [cephnode3][INFO  ] Running command: sudo /usr/sbin/ceph-disk list
> [cephnode3][DEBUG ] /dev/sda :
> [cephnode3][DEBUG ]  /dev/sda1 other,
> ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sda2
> other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> ]  /dev/sda3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
> [cephnode3][DEBUG ]  /dev/sda4 other,
> ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sda5
> other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> ] /dev/sdb : [cephnode3][DEBUG ]  /dev/sdb1 other,
> ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdb2
> other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> ]  /dev/sdb3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
> [cephnode3][DEBUG ]  /dev/sdb4 other,
> ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdb5
> other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> ] /dev/sdc : [cephnode3][DEBUG ]  /dev/sdc1 other,
> ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdc2
> other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> ]  /dev/sdc3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
> [cephnode3][DEBUG ]  /dev/sdc4 other,
> ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdc5
> other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> ] /dev/sdd : [cephnode3][DEBUG ]  /dev/sdd1 other,
> ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdd2
> other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> ]  /dev/sdd3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
> [cephnode3][DEBUG ]  /dev/sdd4 other,
> ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdd5
> other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> ] /dev/sde : [cephnode3][DEBUG ]  /dev/sde1 ceph data, active, cluster
> ceph, osd.0 [cephnode3][DEBUG ] /dev/sdf : [cephnode3][DEBUG
> ]  /dev/sdf1 ceph data, active, cluster ceph, osd.1 [cephnode3][DEBUG
> ] /dev/sdg : [cephnode3][DEBUG ]  /dev/sdg1 ceph data, active, cluster
> ceph, osd.2 [cephnode3][DEBUG ] /dev/sdh : [cephnode3][DEBUG
> ]  /dev/sdh1 ceph data, active, cluster ceph, osd.3 [cephnode3][DEBUG
> ] /dev/sdi : [cephnode3][DEBUG ]  /dev/sdi1 ceph data, active, cluster
> ceph, osd.4 [cephnode3][DEBUG ] /dev/sdj : [cephnode3][DEBUG
> ]  /dev/sdj1 ceph data, active, cluster ceph, osd.5 [cephnode3][DEBUG
> ] /dev/sdk : [cephnode3][DEBUG ]  /dev/sdk1 ceph data, active, cluster
> ceph, osd.6 [cephnode3][DEBUG ] /dev/sdl : [cephnode3][DEBUG
> ]  /dev/sdl1 ceph data, active, cluster ceph, osd.7 [cephnode3][DEBUG
> ] /dev/sdm : [cephnode3][DEBUG ]  /dev/sdm1 other, xfs
> [cephnode3][DEBUG ] /dev/sdn :
> [cephnode3][DEBUG ]  /dev/sdn1 ceph data, active, cluster ceph, osd.9
> [cephnode3][DEBUG ] /dev/sdo :
> [cephnode3][DEBUG ]  /dev/sdo1 ceph data, active, cluster ceph, osd.10
> [cephnode3][DEBUG ] /dev/sdp :
> [cephnode3][DEBUG ]  /dev/sdp1 ceph data, active, cluster ceph, osd.11
> [cephnode3][DEBUG ] /dev/sdq :
> [cephnode3][DEBUG ]  /dev/sdq1 ceph data, active, cluster ceph, osd.12
> [cephnode3][DEBUG ] /dev/sdr :
> [cephnode3][DEBUG ]  /dev/sdr1 ceph data, active, cluster ceph, osd.13
> [cephnode3][DEBUG ] /dev/sds :
> [cephnode3][DEBUG ]  /dev/sds1 ceph data, active, cluster ceph, osd.14
> [cephnode3][DEBUG ] /dev/sdt :
> [cephnode3][DEBUG ]  /dev/sdt1 ceph data, active, cluster ceph, osd.15

Re: [ceph-users] Ceph for online file storage

2016-07-10 Thread Christian Balzer

Hello,

On Sun, 10 Jul 2016 14:33:36 + (GMT) m.da...@bluewin.ch wrote:

> Hello,
> 
> >Those 2 servers are running Ceph?
> >If so, be more specific, what's the HW like, CPU, RAM. network, journal
> >SSDs?
> 
> Yes, I was hesitating between GlusterFS and Ceph but the latter is much
> more scalable and is future-proof.
> 
> Both have the same configuration, namely E5 2628L (6c/12t @ 1.9GHz),
> 8x16G 2133MHz, 2x10G bonded (we only use 10G and fiber links), multiple
> 120G SSDs avaailable for journals and caching.
> 
With two of these CPUs (and SSD journals) definitely not more than 24 OSDs
per node.
RAM is plentiful.

Which exact SSD models?
None of the 120GB ones I can think of would make good journal ones.

> >Also, 2 servers indicate a replication of 2, something I'd avoid in
> >production.
> 
> This is true. I was thinking about EC instead of replication.
> 
With EC you need to keep several things in mind:

1. Performance, especially IOPS, is worse than replicated.
2. More CPU power is needed.
3. A cache tier is mandatory. 
4. Most importantly, you can't start small. 
   With something akin to RAID6 levels of redundancy, you probably want
   nothing smaller than 8 nodes (K=6,M=2). 

> >Your first and foremost way to improve IOPS is to have SSD journals,
> >everybody who deployed Ceph w/o them in any serious production
> >environment came to regret it.
> 
> I think it is clear that journal are a must, especially since many small
> files will be read and written to.
> 
> >Doubling the OSDs while halving the size will give you the same
> >space but at a much better performance.
> 
> It's true, but then the $/TB or even $/PB ratio is much higher. It would
> be interesting to compare the outcome with more lower-density disks vs
> less higher-density disks but with more (agressive) caching/journaling.
> 
You may find that it's a zero-sum game, more or less.

Basically you have the costs for chassis/MB/network cards per node that
push you towards higher density nodes to save costs.
OTOH cache-tier nodes (SSDs, NVMEs, CPUs) don't come cheap either.


> Your overview of the whole system definitely helps sorting things out.
> As you suggested, it's best I try some combinations to find what suits
> my use case best.
> 
> >If you were to use CephFS for storage, putting the metadata on SSDs will
> >be beneficial, too.
> 
> All OS drives are SSDs, and considering the system will never use the
> SSD in full I think it would be safe to partition it for MDS, cache and
> journal data.
> 
Again, needs to be right kind of SSD for this to work, but in general,
yes.
I do share OS/journal SSDs all the time.
Note that MDS in and by itself doesn't hold any persistent (on-disk) data,
the metadata is all in the Ceph meta-data pool and that's the one you want
to put on SSDs.

Christian
> --
> Sincères salutations,
> 
> Moïn Danai.
> Original Message
> From : ch...@gol.com
> Date : 01/07/2016 - 04:26 (CEST)
> To : ceph-users@lists.ceph.com
> Cc : m.da...@bluewin.ch
> Subject : Re: [ceph-users] Ceph for online file storage
> 
> 
> Hello,
> 
> On Thu, 30 Jun 2016 08:34:12 + (GMT) m.da...@bluewin.ch wrote:
> 
> > Thank you all for your prompt answers.
> > 
> > >firstly, wall of text, makes things incredibly hard to read.
> > >Use paragraphs/returns liberally.
> > 
> > I actually made sure to use paragraphs. For some reason, the formatting
> > was removed.
> > 
> > >Is that your entire experience with Ceph, ML archives and docs?
> > 
> > Of course not, I have already been through the whole documentation many
> > times. It's just that I couldn't really decide between the choices I
> > was given.
> > 
> > >What's an "online storage"?
> > >I assume you're talking about what is is commonly referred as "cloud
> > storage".
> > 
> > I try not to use the term "cloud", but if you must, then yes that's the
> > idea behind it. Basically an online hard disk.
> > 
> While I can certainly agree that "cloud" is overused and often mis-used
> as well, it makes things clearer in this context.
> 
> > >10MB is not a small file in my book, 1-4KB (your typical mail) are
> > >small files.
> > >How much data (volume/space) are you looking at initially and within a
> > >year of deployment?
> > 
> > 10MB is small compared to the larger files, but it is indeed bigger
> > that smaller, IOPS-intensive files (like the emails you pointed out).
> > 
> > Right now there are two servers, each with 12x8TB. I expect a growth
> > rate of about the same size every 2-3 months.
> > 
> Those 2 servers are running Ceph?
> If so, be more specific, what's the HW like, CPU, RAM. network, journal
> SSDs?
> 
> Also, 2 servers indicate a replication of 2, something I'd avoid in
> production.
> 
> 
> > >What usage patterns are you looking at, expecting?
> > 
> > Since my customers will put their files on this "cloud", it's generally
> > write once, read many (or at least more reads than writes). As they
> > most likely will store private documents, but some bigger fil

Re: [ceph-users] Drive letters shuffled on reboot

2016-07-10 Thread Gaurav Goyal
Hello,

This is an interesting topic and I would like to know a solution to this
problem. Does that mean we should never use Dell storage as a Ceph storage
device? I have a similar setup with 4 Dell iSCSI LUNs attached to an OpenStack
controller and compute node in an active-active configuration.

As they were active-active, I selected the first 2 LUNs as OSDs on node 1 and
the last 2 as OSDs on node 2.

Is it OK to have this configuration, especially when a node goes down, or
considering live migration?

Regards
Gaurav Goyal
On 10-Jul-2016 9:02 pm, "Christian Balzer"  wrote:

>
> Hello,
>
> On Sun, 10 Jul 2016 12:46:39 + (UTC) William Josefsson wrote:
>
> > Hi everyone,
> >
> > I have problem with swapping drive and partition names on reboot. My
> > Ceph is Hammer on CentOS7, Dell R730 6xSSD (2xSSD OS RAID1 PERC,
> > 4xSSD=Journal drives), 18x1.8T SAS for OSDs.
> >
> > Whenever I reboot, drives randomly seem to change names. This is
> > extremely dangerous and frustrating when I've initially setup CEPH with
> > ceph-deploy, zap, prepare and activate. It has happened that I've
> > accidentally erased wrong disk too when e.g. /dev/sdX had
> > become /dev/sdY.
> >
> This isn't a Ceph specific question per se and you could probably keep
> things from moving around by enforcing module loads in a particular order.
>
> But that of course still wouldn't help if something else changed or a
> drive totally failed.
>
> So in the context of Ceph, it doesn't (shouldn't) care if the OSD (HDD)
> changes names, especially since you did set it up with ceph-deploy.
>
> And to avoid the journals getting jumbled up, do what everybody does
> (outside of Ceph as well), use /dev/disk/by-id or uuid.
>
> Like:
> ---
> # ls -la /var/lib/ceph/osd/ceph-28/
>
>  journal -> /dev/disk/by-id/wwn-0x55cd2e404b73d569-part3
> ---
>
> Christian
> > Please see an output below of how this drive swapping below appears SDC
> > is shifted, indexes and drive names got shuffled. Ceph OSDs didn't come
> > up properly.
> >
> > Please advice on how to get this corrected, with no more drive name
> > shuffling. Can this be due to the PERC HW raid? thx will
> >
> >
> >
> > POST REBOOT 2 (expected outcome.. with sda,sdb,sdc,sdd as journal. sdw
> > is a perc raid1)
> >
> >
> > [cephnode3][INFO  ] Running command: sudo /usr/sbin/ceph-disk list
> > [cephnode3][DEBUG ] /dev/sda :
> > [cephnode3][DEBUG ]  /dev/sda1 other,
> > ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sda2
> > other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> > ]  /dev/sda3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
> > [cephnode3][DEBUG ]  /dev/sda4 other,
> > ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sda5
> > other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> > ] /dev/sdb : [cephnode3][DEBUG ]  /dev/sdb1 other,
> > ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdb2
> > other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> > ]  /dev/sdb3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
> > [cephnode3][DEBUG ]  /dev/sdb4 other,
> > ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdb5
> > other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> > ] /dev/sdc : [cephnode3][DEBUG ]  /dev/sdc1 other,
> > ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdc2
> > other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> > ]  /dev/sdc3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
> > [cephnode3][DEBUG ]  /dev/sdc4 other,
> > ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdc5
> > other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> > ] /dev/sdd : [cephnode3][DEBUG ]  /dev/sdd1 other,
> > ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdd2
> > other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> > ]  /dev/sdd3 other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
> > [cephnode3][DEBUG ]  /dev/sdd4 other,
> > ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG ]  /dev/sdd5
> > other, ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 [cephnode3][DEBUG
> > ] /dev/sde : [cephnode3][DEBUG ]  /dev/sde1 ceph data, active, cluster
> > ceph, osd.0 [cephnode3][DEBUG ] /dev/sdf : [cephnode3][DEBUG
> > ]  /dev/sdf1 ceph data, active, cluster ceph, osd.1 [cephnode3][DEBUG
> > ] /dev/sdg : [cephnode3][DEBUG ]  /dev/sdg1 ceph data, active, cluster
> > ceph, osd.2 [cephnode3][DEBUG ] /dev/sdh : [cephnode3][DEBUG
> > ]  /dev/sdh1 ceph data, active, cluster ceph, osd.3 [cephnode3][DEBUG
> > ] /dev/sdi : [cephnode3][DEBUG ]  /dev/sdi1 ceph data, active, cluster
> > ceph, osd.4 [cephnode3][DEBUG ] /dev/sdj : [cephnode3][DEBUG
> > ]  /dev/sdj1 ceph data, active, cluster ceph, osd.5 [cephnode3][DEBUG
> > ] /dev/sdk : [cephnode3][DEBUG ]  /dev/sdk1 ceph data, active, cluster
> > ceph, osd.6 [cephnode3][DEBUG ] /dev/sdl : [cephnode3][DEBUG
> > ]  /dev/sdl1 ceph data, active, cluster ceph, osd.7 [cephnode3][DEBUG
> > ] /dev/sdm : [cephnode3][DEBUG ]  /dev/sdm1 other, xfs
> > [cephn

Re: [ceph-users] Backing up RBD snapshots to a different cloud service

2016-07-10 Thread Alex Gorbachev
Hi Brendan,

On Friday, July 8, 2016, Brendan Moloney  wrote:

> Hi,
>
> We have a smallish Ceph cluster for RBD images. We use snapshotting for
> local incremental backups.  I would like to start sending some of these
> snapshots to an external cloud service (likely Amazon) for disaster
> recovery purposes.
>
> Does anyone have advice on how to do this?  I suppose I could just use the
> rbd export/diff commands but some of our RBD images are quite large
> (multiple terabytes) so I can imagine this becoming quite inefficient. We
> would either need to keep all snapshots indefinitely and retrieve every
> single snapshot to recover or we would have to occasionally send a new full
> disk image.
>
> I guess doing the backups on the object level could potentially avoid
> these issues, but I am not sure how to go about that.
>

We are currently rolling out a solution that utilizes the rbd merge-diff command
to continuously create synthetic fulls at the remote site. The remote site
needs to be more than just storage, e.g. a Linux VM or such, but as long as
the continuity of snapshots is maintained, you should be able to recover
from just the one image.
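For reference, the basic shape of that workflow with the stock rbd tooling looks
roughly like this (pool, image and file names are placeholders):

rbd export-diff pool/image@snap1 full.to-snap1
rbd export-diff --from-snap snap1 pool/image@snap2 diff.1-2
rbd merge-diff full.to-snap1 diff.1-2 full.to-snap2

i.e. each new incremental gets merged into the previous synthetic full, so only
the latest merged file has to be kept at the remote end.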

Detecting the start and end snapshot of a diff export file is not hard; I asked
about the details earlier on this list, and would be happy to send you code stubs
in Perl if you are interested.

Another option, which we have not yet tried with RBD exports is the
borgbackup project, which offers excellent deduplication.

HTH,
Alex


>
>
> Any advice is greatly appreciated.
>
> Thanks,
> Brendan
>





-- 
--
Alex Gorbachev
Storcium


[ceph-users] Fwd: Ceph OSD suicide himself

2016-07-10 Thread 한승진
Hi cephers.

I need your help with some issues.

The ceph cluster version is Jewel (10.2.1), and the filesystem is btrfs.

I run 1 mon and 48 OSDs in 4 nodes (each node has 12 OSDs).

I've experienced one of the OSDs killing itself.

It always issued a suicide timeout message.

Below are the detailed logs.


==
0. ceph df detail
$ sudo ceph df detail
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED OBJECTS
42989G 24734G   18138G 42.19  23443k
POOLS:
NAME    ID CATEGORY QUOTA OBJECTS QUOTA BYTES USED   %USED MAX AVAIL OBJECTS  DIRTY  READ   WRITE  RAW USED
ha-pool 40 -        N/A           N/A         1405G  9.81  5270G     22986458 22447k 0      22447k 4217G
volumes 45 -        N/A           N/A         4093G  28.57 5270G     933401   911k   648M   649M   12280G
images  46 -        N/A           N/A         53745M 0.37  5270G     6746     6746   1278k  21046  157G
backups 47 -        N/A           N/A         0      0     5270G     0        0      0      0      0
vms     48 -        N/A           N/A         309G   2.16  5270G     79426    79426  92612k 46506k 928G

1. ceph no.15 log

*(20:02 first timed out message)*
2016-07-08 20:02:01.049483 7fcd3caa5700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
2016-07-08 20:02:01.050403 7fcd3b2a2700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
2016-07-08 20:02:01.086792 7fcd3b2a2700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
.
.
(sometimes this logs with..)
2016-07-08 20:02:11.379597 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
12 slow requests, 5 included below; oldest blocked for > 30.269577 secs
2016-07-08 20:02:11.379608 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
slow request 30.269577 seconds old, received at 2016-07-08 20:01:41.109937:
osd_op(client.895668.0:5302745 45.e2e779c2
rbd_data.cc460bc7fc8f.04d8 [stat,write 2596864~516096] snapc
0=[] ack+ondisk+write+known_if_redirected e30969) currently commit_sent
2016-07-08 20:02:11.379612 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
slow request 30.269108 seconds old, received at 2016-07-08 20:01:41.110406:
osd_op(client.895668.0:5302746 45.e2e779c2
rbd_data.cc460bc7fc8f.04d8 [stat,write 3112960~516096] snapc
0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
locks
2016-07-08 20:02:11.379630 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
slow request 30.268607 seconds old, received at 2016-07-08 20:01:41.110907:
osd_op(client.895668.0:5302747 45.e2e779c2
rbd_data.cc460bc7fc8f.04d8 [stat,write 3629056~516096] snapc
0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
locks
2016-07-08 20:02:11.379633 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
slow request 30.268143 seconds old, received at 2016-07-08 20:01:41.111371:
osd_op(client.895668.0:5302748 45.e2e779c2
rbd_data.cc460bc7fc8f.04d8 [stat,write 4145152~516096] snapc
0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
locks
2016-07-08 20:02:11.379636 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
slow request 30.267662 seconds old, received at 2016-07-08 20:01:41.111852:
osd_op(client.895668.0:5302749 45.e2e779c2
rbd_data.cc460bc7fc8f.04d8 [stat,write 4661248~516096] snapc
0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
locks
.
.
(after a lot of same messages)
2016-07-08 20:03:53.682828 7fcd3caa5700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fcd2d286700' had timed out after 15
2016-07-08 20:03:53.682828 7fcd3caa5700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fcd2da87700' had timed out after 15
2016-07-08 20:03:53.682829 7fcd3caa5700  1 heartbeat_map is_healthy
'FileStore::op_tp thread 0x7fcd48716700' had timed out after 60
2016-07-08 20:03:53.682830 7fcd3caa5700  1 heartbeat_map is_healthy
'FileStore::op_tp thread 0x7fcd47f15700' had timed out after 60
.
.
(fault with nothing to send, going to standby massages)
2016-07-08 20:03:53.708665 7fcd15787700  0 -- 10.200.10.145:6818/6462 >>
10.200.10.146:6806/4642 pipe(0x55818727e000 sd=276 :51916 s=2 pgs=2225 cs=1
l=0 c=0x558186f61d80).fault with nothing to send, going to standby
2016-07-08 20:03:53.724928 7fcd072c2700  0 -- 10.200.10.145:6818/6462 >>
10.200.10.146:6800/4336 pipe(0x55818a25b400 sd=109 :6818 s=2 pgs=2440 cs=13
l=0 c=0x55818730f080).fault with nothing to send, going to standby
2016-07-08 20:03:53.738216 7fcd0b7d3700  0 -- 10.200.10.145:6818/6462 >>
10.200.10.144:6814/5069 pipe(0x55816c6a4800 sd=334 :53850 s=2 pgs=43 cs=1
l=0 c=0x55818611f800).fa

Re: [ceph-users] Fwd: Ceph OSD suicide himself

2016-07-10 Thread Brad Hubbard
On Mon, Jul 11, 2016 at 11:48:57AM +0900, 한승진 wrote:
> Hi cephers.
> 
> I need your help for some issues.
> 
> The ceph cluster version is Jewel(10.2.1), and the filesytem is btrfs.
> 
> I run 1 Mon and 48 OSD in 4 Nodes(each node has 12 OSDs).
> 
> I've experienced one of OSDs was killed himself.
> 
> Always it issued suicide timeout message.
> 
> Below is detailed logs.
> 
> 
> ==
> 0. ceph df detail
> $ sudo ceph df detail
> GLOBAL:
> SIZE   AVAIL  RAW USED %RAW USED OBJECTS
> 42989G 24734G   18138G 42.19  23443k
> POOLS:
> NAMEID CATEGORY QUOTA OBJECTS QUOTA BYTES USED
>   %USED MAX AVAIL OBJECTS  DIRTY  READ   WRITE
>  RAW USED
> ha-pool 40 -N/A   N/A
>  1405G  9.81 5270G 22986458 22447k  0
> 22447k4217G
> volumes 45 -N/A   N/A
>  4093G 28.57 5270G   933401   911k   648M
> 649M   12280G
> images  46 -N/A   N/A
> 53745M  0.37 5270G 6746   6746  1278k
>  21046 157G
> backups 47 -N/A   N/A
>  0 0 5270G0  0  0  0
>  0
> vms 48 -N/A   N/A
> 309G  2.16 5270G79426  79426 92612k 46506k
> 928G
> 
> 1. ceph no.15 log
> 
> *(20:02 first timed out message)*
> 2016-07-08 20:02:01.049483 7fcd3caa5700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
> 2016-07-08 20:02:01.050403 7fcd3b2a2700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
> 2016-07-08 20:02:01.086792 7fcd3b2a2700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
> .
> .
> (sometimes this logs with..)
> 2016-07-08 20:02:11.379597 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
> 12 slow requests, 5 included below; oldest blocked for > 30.269577 secs
> 2016-07-08 20:02:11.379608 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
> slow request 30.269577 seconds old, received at 2016-07-08 20:01:41.109937:
> osd_op(client.895668.0:5302745 45.e2e779c2
> rbd_data.cc460bc7fc8f.04d8 [stat,write 2596864~516096] snapc
> 0=[] ack+ondisk+write+known_if_redirected e30969) currently commit_sent
> 2016-07-08 20:02:11.379612 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
> slow request 30.269108 seconds old, received at 2016-07-08 20:01:41.110406:
> osd_op(client.895668.0:5302746 45.e2e779c2
> rbd_data.cc460bc7fc8f.04d8 [stat,write 3112960~516096] snapc
> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
> locks
> 2016-07-08 20:02:11.379630 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
> slow request 30.268607 seconds old, received at 2016-07-08 20:01:41.110907:
> osd_op(client.895668.0:5302747 45.e2e779c2
> rbd_data.cc460bc7fc8f.04d8 [stat,write 3629056~516096] snapc
> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
> locks
> 2016-07-08 20:02:11.379633 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
> slow request 30.268143 seconds old, received at 2016-07-08 20:01:41.111371:
> osd_op(client.895668.0:5302748 45.e2e779c2
> rbd_data.cc460bc7fc8f.04d8 [stat,write 4145152~516096] snapc
> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
> locks
> 2016-07-08 20:02:11.379636 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
> slow request 30.267662 seconds old, received at 2016-07-08 20:01:41.111852:
> osd_op(client.895668.0:5302749 45.e2e779c2
> rbd_data.cc460bc7fc8f.04d8 [stat,write 4661248~516096] snapc
> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
> locks
> .
> .
> (after a lot of same messages)
> 2016-07-08 20:03:53.682828 7fcd3caa5700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fcd2d286700' had timed out after 15
> 2016-07-08 20:03:53.682828 7fcd3caa5700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fcd2da87700' had timed out after 15
> 2016-07-08 20:03:53.682829 7fcd3caa5700  1 heartbeat_map is_healthy
> 'FileStore::op_tp thread 0x7fcd48716700' had timed out after 60
> 2016-07-08 20:03:53.682830 7fcd3caa5700  1 heartbeat_map is_healthy
> 'FileStore::op_tp thread 0x7fcd47f15700' had timed out after 60
> .
> .
> (fault with nothing to send, going to standby massages)
> 2016-07-08 20:03:53.708665 7fcd15787700  0 -- 10.200.10.145:6818/6462 >>
> 10.200.10.146:6806/4642 pipe(0x55818727e000 sd=276 :51916 s=2 pgs=2225 cs=1
> l=0 c=0x558186f61d80).fault with nothing to send, going to standby
> 2016-07-08 20:03:53.724928 7fcd072c2700  0 -- 10.200.10.145:6818/6462 >>
> 10.200.10.146:6800/4336 pipe(0x55818a25b400 sd=109 :681

Re: [ceph-users] Fwd: Ceph OSD suicide himself

2016-07-10 Thread Brad Hubbard
On Mon, Jul 11, 2016 at 1:21 PM, Brad Hubbard  wrote:
> On Mon, Jul 11, 2016 at 11:48:57AM +0900, 한승진 wrote:
>> Hi cephers.
>>
>> I need your help for some issues.
>>
>> The ceph cluster version is Jewel(10.2.1), and the filesytem is btrfs.
>>
>> I run 1 Mon and 48 OSD in 4 Nodes(each node has 12 OSDs).
>>
>> I've experienced one of OSDs was killed himself.
>>
>> Always it issued suicide timeout message.
>>
>> Below is detailed logs.
>>
>>
>> ==
>> 0. ceph df detail
>> $ sudo ceph df detail
>> GLOBAL:
>> SIZE   AVAIL  RAW USED %RAW USED OBJECTS
>> 42989G 24734G   18138G 42.19  23443k
>> POOLS:
>> NAMEID CATEGORY QUOTA OBJECTS QUOTA BYTES USED
>>   %USED MAX AVAIL OBJECTS  DIRTY  READ   WRITE
>>  RAW USED
>> ha-pool 40 -N/A   N/A
>>  1405G  9.81 5270G 22986458 22447k  0
>> 22447k4217G
>> volumes 45 -N/A   N/A
>>  4093G 28.57 5270G   933401   911k   648M
>> 649M   12280G
>> images  46 -N/A   N/A
>> 53745M  0.37 5270G 6746   6746  1278k
>>  21046 157G
>> backups 47 -N/A   N/A
>>  0 0 5270G0  0  0  0
>>  0
>> vms 48 -N/A   N/A
>> 309G  2.16 5270G79426  79426 92612k 46506k
>> 928G
>>
>> 1. ceph no.15 log
>>
>> *(20:02 first timed out message)*
>> 2016-07-08 20:02:01.049483 7fcd3caa5700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
>> 2016-07-08 20:02:01.050403 7fcd3b2a2700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
>> 2016-07-08 20:02:01.086792 7fcd3b2a2700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2c284700' had timed out after 15
>> .
>> .
>> (sometimes this logs with..)
>> 2016-07-08 20:02:11.379597 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> 12 slow requests, 5 included below; oldest blocked for > 30.269577 secs
>> 2016-07-08 20:02:11.379608 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> slow request 30.269577 seconds old, received at 2016-07-08 20:01:41.109937:
>> osd_op(client.895668.0:5302745 45.e2e779c2
>> rbd_data.cc460bc7fc8f.04d8 [stat,write 2596864~516096] snapc
>> 0=[] ack+ondisk+write+known_if_redirected e30969) currently commit_sent
>> 2016-07-08 20:02:11.379612 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> slow request 30.269108 seconds old, received at 2016-07-08 20:01:41.110406:
>> osd_op(client.895668.0:5302746 45.e2e779c2
>> rbd_data.cc460bc7fc8f.04d8 [stat,write 3112960~516096] snapc
>> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
>> locks
>> 2016-07-08 20:02:11.379630 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> slow request 30.268607 seconds old, received at 2016-07-08 20:01:41.110907:
>> osd_op(client.895668.0:5302747 45.e2e779c2
>> rbd_data.cc460bc7fc8f.04d8 [stat,write 3629056~516096] snapc
>> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
>> locks
>> 2016-07-08 20:02:11.379633 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> slow request 30.268143 seconds old, received at 2016-07-08 20:01:41.111371:
>> osd_op(client.895668.0:5302748 45.e2e779c2
>> rbd_data.cc460bc7fc8f.04d8 [stat,write 4145152~516096] snapc
>> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
>> locks
>> 2016-07-08 20:02:11.379636 7fcd4d8f8700  0 log_channel(cluster) log [WRN] :
>> slow request 30.267662 seconds old, received at 2016-07-08 20:01:41.111852:
>> osd_op(client.895668.0:5302749 45.e2e779c2
>> rbd_data.cc460bc7fc8f.04d8 [stat,write 4661248~516096] snapc
>> 0=[] ack+ondisk+write+known_if_redirected e30969) currently waiting for rw
>> locks
>> .
>> .
>> (after a lot of same messages)
>> 2016-07-08 20:03:53.682828 7fcd3caa5700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2d286700' had timed out after 15
>> 2016-07-08 20:03:53.682828 7fcd3caa5700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7fcd2da87700' had timed out after 15
>> 2016-07-08 20:03:53.682829 7fcd3caa5700  1 heartbeat_map is_healthy
>> 'FileStore::op_tp thread 0x7fcd48716700' had timed out after 60
>> 2016-07-08 20:03:53.682830 7fcd3caa5700  1 heartbeat_map is_healthy
>> 'FileStore::op_tp thread 0x7fcd47f15700' had timed out after 60
>> .
>> .
>> (fault with nothing to send, going to standby massages)
>> 2016-07-08 20:03:53.708665 7fcd15787700  0 -- 10.200.10.145:6818/6462 >>
>> 10.200.10.146:6806/4642 pipe(0x55818727e000 sd=276 :51916 s=2 pgs=2225 cs=1
>> l=0 c=0x558186f61d80).fault with nothing to send, go

[ceph-users] Slow performance into windows VM

2016-07-10 Thread K K

Hello, guys

I'm facing poor performance in a Windows 2k12r2 instance running on RBD 
(OpenStack cluster). The RBD disk is 17Tb. My Ceph cluster consists of:
- 3 monitor nodes (Celeron G530/6Gb RAM, DualCore E6500/2Gb RAM, Core2Duo 
E7500/2Gb RAM). Each node has a 1Gbit link to the frontend subnet of the Ceph cluster.
- 2 block nodes (Xeon E5620/32Gb RAM/2*1Gbit NIC). Each node has 2*500Gb HDDs 
for the operating system and 9*3Tb SATA HDDs (WD SE). Total 18 OSD daemons on 2 
nodes. Journals are placed on the same HDDs as the RADOS data. I know it would 
be better to use separate SSDs for that purpose.
When I tested a new Windows instance, performance was good (read/write around 
100Mb/sec). But after I copied 16Tb of data to the Windows instance, read 
performance dropped to 10Mb/sec. The data on the VM is images and video.

ceph.conf on client side:
[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
filestore xattr use omap = true
filestore max sync interval = 10
filestore queue max ops = 3000
filestore queue commiting max bytes = 1048576000
filestore queue commiting max ops = 5000
filestore queue max bytes = 1048576000
filestore queue committing max ops = 4096
filestore queue committing max bytes = 16 MiB
filestore op threads = 20
filestore flusher = false
filestore journal parallel = false
filestore journal writeahead = true
journal dio = true
journal aio = true
journal force aio = true
journal block align = true
journal max write bytes = 1048576000
journal_discard = true
osd pool default size = 2 # Write an object n times.
osd pool default min size = 1
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 1

[client]
rbd cache = true
rbd cache size = 67108864
rbd cache max dirty = 50331648
rbd cache target dirty = 33554432
rbd cache max dirty age = 2
rbd cache writethrough until flush = true


rados bench show from block node show:
rados bench -p scbench 120 write --no-cleanup
Total time run: 120.399337
Total writes made: 3538
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 117.542
Stddev Bandwidth: 9.31244
Max bandwidth (MB/sec): 148
Min bandwidth (MB/sec): 92
Average IOPS: 29
Stddev IOPS: 2
Max IOPS: 37
Min IOPS: 23
Average Latency(s): 0.544365
Stddev Latency(s): 0.35825
Max latency(s): 5.42548
Min latency(s): 0.101533

rados bench -p scbench 120 seq
Total time run: 120.880920
Total reads made: 1932
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 63.9307
Average IOPS 15
Stddev IOPS: 3
Max IOPS: 25
Min IOPS: 5
Average Latency(s): 0.999095
Max latency(s): 8.50774
Min latency(s): 0.0391591

rados bench -p scbench 120 rand
Total time run: 121.059005
Total reads made: 1920
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 63.4401
Average IOPS: 15
Stddev IOPS: 4
Max IOPS: 26
Min IOPS: 1
Average Latency(s): 1.00785
Max latency(s): 6.48138
Min latency(s): 0.038925

On the XFS partitions, fragmentation is no more than 1%.
The disk is connected in libvirt like this:

[libvirt <disk> XML lost in the archive; the only surviving value is the UUID
4680524c-2c10-47a3-af59-2e1bd12a7ce4]


Does anybody have any ideas?





Konstantin


Re: [ceph-users] Slow performance into windows VM

2016-07-10 Thread Christian Balzer

Hello,

On Mon, 11 Jul 2016 07:35:02 +0300 K K wrote:

> 
> Hello, guys
> 
> I to face a task poor performance into windows 2k12r2 instance running
> on rbd (openstack cluster). RBD disk have a size 17Tb. My ceph cluster
> consist from:
> - 3 monitors nodes (Celeron G530/6Gb RAM, DualCore E6500/2Gb RAM,
> Core2Duo E7500/2Gb RAM). Each node have 1Gbit network to frontend subnet
> od Ceph cluster

I hope the fastest of these MONs (CPU and storage) has the lowest IP
number and thus is the leader.

Also what Ceph, OS, kernel version?

> - 2 block nodes (Xeon E5620/32Gb RAM/2*1Gbit NIC). Each node have
> 2*500Gb HDD for operation system and 9*3Tb SATA HDD (WD SE). Total 18
> OSD daemons on 2 nodes. 

Two GbE ports, given the "frontend" up there with the MON description I
assume that's 1 port per client (front) and cluster (back) network?

>Journals placed on same HDD as a rados data. I
> know that better using for those purpose separate SSD disk. 
Indeed...

>When I test
> new windows instance performance was good (read/write something about
> 100Mb/sec). But after I copied 16Tb data to windows instance read
> performance has down to 10Mb/sec. Type of data on VM - image and video.
> 
100MB/s would be absolutely perfect with the setup you have, assuming no
contention (other clients).

Is there any other client on than that Windows VM on your Ceph cluster?

> ceph.conf on client side:
> [global]
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> filestore xattr use omap = true
> filestore max sync interval = 10
> filestore queue max ops = 3000
> filestore queue commiting max bytes = 1048576000
> filestore queue commiting max ops = 5000
> filestore queue max bytes = 1048576000
> filestore queue committing max ops = 4096
> filestore queue committing max bytes = 16 MiB
^^^
Is Ceph understanding this now?
Other than that, the queue options aren't likely to do much good with pure
HDD OSDs.

> filestore op threads = 20
> filestore flusher = false
> filestore journal parallel = false
> filestore journal writeahead = true
> journal dio = true
> journal aio = true
> journal force aio = true
> journal block align = true
> journal max write bytes = 1048576000
> journal_discard = true
> osd pool default size = 2 # Write an object n times.
> osd pool default min size = 1
> osd pool default pg num = 333
> osd pool default pgp num = 333
That should be 512, 1024 really with one RBD pool.
http://ceph.com/pgcalc/
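As a rough worked example of the rule of thumb pgcalc applies (total PGs ~=
OSDs * 100 / replica count, rounded to a power of two): 18 * 100 / 2 = 900,
so about 1024 PGs across all pools, most of them on the main RBD pool.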

> osd crush chooseleaf type = 1
> 
> [client]
> rbd cache = true
> rbd cache size = 67108864
> rbd cache max dirty = 50331648
> rbd cache target dirty = 33554432
> rbd cache max dirty age = 2
> rbd cache writethrough until flush = true
> 
> 
> rados bench show from block node show:
Wrong way to test this; test it from a monitor node or another client node
(like your openstack nodes).
In your 2-node cluster half of the reads or writes will be local, very
much skewing your results.

> rados bench -p scbench 120 write --no-cleanup

Default tests use 4MB "blocks"; what are the writes or reads from your
client VM like?
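For example, something like "rados bench -p scbench 60 write -b 4096 -t 16"
(the -b value only applies to the write phase) would say more about small-block
behaviour than the 4MB default, if that is closer to what the VM is doing.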

> Total time run: 120.399337
> Total writes made: 3538
> Write size: 4194304
> Object size: 4194304
> Bandwidth (MB/sec): 117.542
> Stddev Bandwidth: 9.31244
> Max bandwidth (MB/sec): 148 
  ^^^
That wouldn't be possible from an external client.

> Min bandwidth (MB/sec): 92
> Average IOPS: 29
> Stddev IOPS: 2
> Max IOPS: 37
> Min IOPS: 23
> Average Latency(s): 0.544365
> Stddev Latency(s): 0.35825
> Max latency(s): 5.42548
Very high max latency, telling us that your cluster ran out of steam at
some point.

> Min latency(s): 0.101533
> 
> rados bench -p scbench 120 seq
> Total time run: 120.880920
> Total reads made: 1932
> Read size: 4194304
> Object size: 4194304
> Bandwidth (MB/sec): 63.9307
> Average IOPS 15
> Stddev IOPS: 3
> Max IOPS: 25
> Min IOPS: 5
> Average Latency(s): 0.999095
> Max latency(s): 8.50774
> Min latency(s): 0.0391591
> 
> rados bench -p scbench 120 rand
> Total time run: 121.059005
> Total reads made: 1920
> Read size: 4194304
> Object size: 4194304
> Bandwidth (MB/sec): 63.4401
> Average IOPS: 15
> Stddev IOPS: 4
> Max IOPS: 26
> Min IOPS: 1
> Average Latency(s): 1.00785
> Max latency(s): 6.48138
> Min latency(s): 0.038925
> 
> On XFS partitions fragmentation no more than 1%
I'd de-frag anyway, just to rule that out.
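For XFS that is typically something along the lines of (device and mount point
are placeholders):

xfs_db -c frag -r /dev/sdc1            # report the fragmentation factor
xfs_fsr -v /var/lib/ceph/osd/ceph-0    # reorganize the mounted filesystem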

When doing your tests or normal (busy) operations from the client VM, run
atop on your storage nodes and observe your OSD HDDs. 
Do they get busy, around 100%?

Check with iperf or NPtcp that your network to the clients from the
storage nodes is fully functional. 

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/

Re: [ceph-users] Slow performance into windows VM

2016-07-10 Thread K K

> I hope the fastest of these MONs (CPU and storage) has the lowest IP
> number and thus is the leader.
No, the lowest IP has the slowest CPU. But Zabbix didn't show any load on any of the mons.
> Also what Ceph, OS, kernel version?

ubuntu 16.04 kernel 4.4.0-22

> Two GbE ports, given the "frontend" up there with the MON description I
> assume that's 1 port per client (front) and cluster (back) network?
yes, one GbE for ceph client, one GbE for back network.
> Is there any other client on than that Windows VM on your Ceph cluster?
Yes, there is one other instance, but it has no load.
> Is Ceph understanding this now?
> Other than that, the queue options aren't likely to do much good with pure
>HDD OSDs.

I can't find those parameters in the running config:
ceph --admin-daemon /var/run/ceph/ceph-mon.block01.asok config show|grep 
"filestore_queue"
"filestore_queue_max_ops": "3000",
"filestore_queue_max_bytes": "1048576000",
"filestore_queue_max_delay_multiple": "0",
"filestore_queue_high_delay_multiple": "0",
"filestore_queue_low_threshhold": "0.3",
"filestore_queue_high_threshhold": "0.9",
> That should be 512, 1024 really with one RBD pool.

Yes, I know. Today, for testing, I added an scbench pool with 128 PGs.
Here are the status and osd tree output:
ceph status
cluster 830beb43-9898-4fa9-98c1-ee04c1cdf69c
health HEALTH_OK
monmap e6: 3 mons at 
{block01=10.30.9.21:6789/0,object01=10.30.9.129:6789/0,object02=10.30.9.130:6789/0}
election epoch 238, quorum 0,1,2 block01,object01,object02
osdmap e6887: 18 osds: 18 up, 18 in
pgmap v9738812: 1280 pgs, 3 pools, 17440 GB data, 4346 kobjects
35049 GB used, 15218 GB / 50267 GB avail
1275 active+clean
3 active+clean+scrubbing+deep
2 active+clean+scrubbing
client io 5030 kB/s rd, 1699 B/s wr, 19 op/s rd, 0 op/s wr

ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 54.0 root default 
-2 27.0 host cn802 
0 3.0 osd.0 up 1.0 1.0 
2 3.0 osd.2 up 1.0 1.0 
4 3.0 osd.4 up 1.0 1.0 
6 3.0 osd.6 up 0.89995 1.0 
8 3.0 osd.8 up 1.0 1.0 
10 3.0 osd.10 up 1.0 1.0 
12 3.0 osd.12 up 0.8 1.0 
16 3.0 osd.16 up 1.0 1.0 
18 3.0 osd.18 up 0.90002 1.0 
-3 27.0 host cn803 
1 3.0 osd.1 up 1.0 1.0 
3 3.0 osd.3 up 0.95316 1.0 
5 3.0 osd.5 up 1.0 1.0 
7 3.0 osd.7 up 1.0 1.0 
9 3.0 osd.9 up 1.0 1.0 
11 3.0 osd.11 up 0.95001 1.0 
13 3.0 osd.13 up 1.0 1.0 
17 3.0 osd.17 up 0.84999 1.0 
19 3.0 osd.19 up 1.0 1.0
> Wrong way to test this, test it from a monitor node, another client node
> (like your openstack nodes).
> In your 2 node cluster half of the reads or writes will be local, very
> much skewing your results.
I have also tested from a compute node and got the same result, 80-100Mb/sec.

> Very high max latency, telling us that your cluster ran out of steam at
some point.

I'm copying data from my Windows instance right now.
> I'd de-frag anyway, just to rule that out.


>When doing your tests or normal (busy) operations from the client VM, run
> atop on your storage nodes and observe your OSD HDDs. 
> Do they get busy, around 100%?

Yes, high IO load (600-800 IOs). But this is very strange for SATA HDDs. Each HDD 
has its own OSD daemon and is presented to the OS as a hardware RAID0 volume (each 
block node has a hardware RAID controller). Example:
avg-cpu: %user %nice %system %iowait %steal %idle
1.44 0.00 3.56 17.56 0.00 77.44
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await 
w_await svctm %util
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 649.00 0.00 82912.00 0.00 255.51 8.30 12.74 12.74 0.00 1.26 81.60
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 761.00 0.00 94308.00 0.00 247.85 8.66 11.26 11.26 0.00 1.18 90.00
sdg 0.00 0.00 761.00 0.00 97408.00 0.00 256.00 7.80 10.22 10.22 0.00 1.01 76.80
sdh 0.00 0.00 801.00 0.00 102344.00 0.00 255.54 8.05 10.05 10.05 0.00 0.96 76.80
sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdj 0.00 0.00 537.00 0.00 68736.00 0.00 256.00 5.54 10.26 10.26 0.00 0.98 52.80


> Check with iperf or NPtcp that your network to the clients from the
> storage nodes is fully functional. 
The network has been tested with iperf: 950-970Mbit between all nodes in the 
clusters (openstack and ceph).

Monday, 11 July 2016, 10:58 +05:00, from Christian Balzer:
>
>
>Hello,
>
>On Mon, 11 Jul 2016 07:35:02 +0300 K K wrote:
>
>> 
>> Hello, guys
>> 
>> I to face a task poor performance into windows 2k12r2 instance running
>> on rbd (openstack cluster). RBD disk have a size 17Tb. My ceph cluster
>> consist from:
>> - 3 monitors nodes (Celeron G530/6Gb RAM, DualCore E6500/2Gb RAM,
>> Core2Duo E7500/2Gb RAM). Each node have 1Gbit network to frontend subnet
>> od Ceph cluster
>
>I hope the fastest of these MONs (CPU and storage) has the lowest IP
>number and thus is th

[ceph-users] drop i386 support

2016-07-10 Thread kefu chai
Hi Cephers,

I am proposing to drop support for i386, as we don't compile Ceph with
any i386 gitbuilder now[1] and hence don't test the i386 builds on
sepia on a regular basis. Also, based on the assumption that people
don't use i386 in production, I think we can drop it from the minimum
hardware document[2]?

We won't explicitly disable the i386 build in code if we decide to
drop i386 support, as we always try to be portable if possible;
we just won't claim i386 as an officially supported arch anymore.

What do you think?

---
[1] http://ceph.com/gitbuilder.cgi
[2] 
http://docs.ceph.com/docs/master/start/hardware-recommendations/#minimum-hardware-recommendations


-- 
Regards
Kefu Chai