Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Wade Holler
Hi there,

What is the best way to "look at the rgw admin socket " to see what
operations are taking a long time ?

Best Regards
Wade
On Mon, Feb 8, 2016 at 12:16 PM Gregory Farnum  wrote:

> On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:
> >
> > I've been testing the performance of ceph by storing objects through RGW.
> > This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
> > instances.  Initially the storage time was holding reasonably steady,
> but it
> > has started to rise recently as shown in the attached chart.
> >
> > The test repeatedly saves 100k objects of 55 kB size using multiple
> threads
> > (50) against multiple RGW gateways (4).  It uses a sequential identifier
> as
> > the object key and shards the bucket name using id % 100.  The buckets
> have
> > index sharding enabled with 64 index shards per bucket.
> >
> > ceph status doesn't appear to show any issues.  Is there something I
> should
> > be looking at here?
> >
> >
> > # ceph status
> > cluster 3fc86d01-cf9c-4bed-b130-7a53d7997964
> >  health HEALTH_OK
> >  monmap e2: 5 mons at
> > {condor=
> 192.168.188.90:6789/0,duck=192.168.188.140:6789/0,eagle=192.168.188.100:6789/0,falcon=192.168.188.110:6789/0,shark=192.168.188.118:6789/0
> }
> > election epoch 18, quorum 0,1,2,3,4
> > condor,eagle,falcon,shark,duck
> >  osdmap e674: 40 osds: 40 up, 40 in
> >   pgmap v258756: 3128 pgs, 10 pools, 1392 GB data, 27282 kobjects
> > 4784 GB used, 69499 GB / 74284 GB avail
> > 3128 active+clean
> >   client io 268 kB/s rd, 1100 kB/s wr, 493 op/s
>
> It's probably a combination of your bucket indices getting larger and
> your PGs getting split into subfolders on the OSDs. If you keep
> running tests and things get slower it's the first; if they speed
> partway back up again it's the latter.
> Other things to check:
> * you can look at your OSD stores and how the object files are divvied up.
> * you can look at the rgw admin socket and/or logs to see what
> operations are the ones taking time
> * you can check the dump_historic_ops on the OSDs to see if there are
> any notably slow ops
> -Greg
>
> >
> >
> > Kris Jurka
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] K is for Kraken

2016-02-09 Thread Dan van der Ster
On Mon, Feb 8, 2016 at 8:10 PM, Sage Weil  wrote:
> On Mon, 8 Feb 2016, Karol Mroz wrote:
>> On Mon, Feb 08, 2016 at 01:36:57PM -0500, Sage Weil wrote:
>> > I didn't find any other good K names, but I'm not sure anything would top
>> > kraken anyway, so I didn't look too hard.  :)
>> >
>> > For L, the options I found were
>> >
>> > luminous (flying squid)
>> > longfin (squid)
>> > long barrel (squid)
>> > liliput (octopus)
>>
>> Kraken is awesome.
>>
>> Perhaps we can add 'Loligo' (https://en.wikipedia.org/wiki/Loligo) to the L 
>> list?
>
> Yep!
>
> http://pad.ceph.com/p/l

I took the liberty of adding L'Octopus, our 8 legged French friend.

.. Dan (re-sent due to Gmail suckiness)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Increasing time to save RGW objects

2016-02-09 Thread Jaroslaw Owsiewski
FYI
-- 
Jarek

-- Forwarded message --
From: Jaroslaw Owsiewski 
Date: 2016-02-09 12:00 GMT+01:00
Subject: Re: [ceph-users] Increasing time to save RGW objects
To: Wade Holler 


Hi,

For example:

# ceph --admin-daemon=ceph-osd.98.asok perf dump

generally:

ceph --admin-daemon=/path/to/osd.asok help
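
For the radosgw daemon itself the same approach applies; the socket
usually lives under /var/run/ceph, but the exact file name depends on
your rgw instance name, so treat the paths below as examples only:

# ceph --admin-daemon=/var/run/ceph/ceph-client.rgw.gateway-1.asok help
# ceph --admin-daemon=/var/run/ceph/ceph-client.rgw.gateway-1.asok perf dump

and for the slow-op check Greg mentioned, on the OSD hosts:

# ceph --admin-daemon=/var/run/ceph/ceph-osd.98.asok dump_historic_ops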

Best Regards

-- 
Jarek


2016-02-09 11:21 GMT+01:00 Wade Holler :

> Hi there,
>
> What is the best way to "look at the rgw admin socket " to see what
> operations are taking a long time ?
>
> Best Regards
> Wade
>
> On Mon, Feb 8, 2016 at 12:16 PM Gregory Farnum  wrote:
>
>> On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:
>> >
>> > I've been testing the performance of ceph by storing objects through
>> RGW.
>> > This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
>> > instances.  Initially the storage time was holding reasonably steady,
>> but it
>> > has started to rise recently as shown in the attached chart.
>> >
>> > The test repeatedly saves 100k objects of 55 kB size using multiple
>> threads
>> > (50) against multiple RGW gateways (4).  It uses a sequential
>> identifier as
>> > the object key and shards the bucket name using id % 100.  The buckets
>> have
>> > index sharding enabled with 64 index shards per bucket.
>> >
>> > ceph status doesn't appear to show any issues.  Is there something I
>> should
>> > be looking at here?
>> >
>> >
>> > # ceph status
>> > cluster 3fc86d01-cf9c-4bed-b130-7a53d7997964
>> >  health HEALTH_OK
>> >  monmap e2: 5 mons at
>> > {condor=
>> 192.168.188.90:6789/0,duck=192.168.188.140:6789/0,eagle=192.168.188.100:6789/0,falcon=192.168.188.110:6789/0,shark=192.168.188.118:6789/0
>> }
>> > election epoch 18, quorum 0,1,2,3,4
>> > condor,eagle,falcon,shark,duck
>> >  osdmap e674: 40 osds: 40 up, 40 in
>> >   pgmap v258756: 3128 pgs, 10 pools, 1392 GB data, 27282 kobjects
>> > 4784 GB used, 69499 GB / 74284 GB avail
>> > 3128 active+clean
>> >   client io 268 kB/s rd, 1100 kB/s wr, 493 op/s
>>
>> It's probably a combination of your bucket indices getting larger and
>> your PGs getting split into subfolders on the OSDs. If you keep
>> running tests and things get slower it's the first; if they speed
>> partway back up again it's the latter.
>> Other things to check:
>> * you can look at your OSD stores and how the object files are divvied up.
>> * you can look at the rgw admin socket and/or logs to see what
>> operations are the ones taking time
>> * you can check the dump_historic_ops on the OSDs to see if there are
>> any notably slow ops
>> -Greg
>>
>> >
>> >
>> > Kris Jurka
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] K is for Kraken

2016-02-09 Thread Ferhat Ozkasgarli
Release the Kraken! (Please...)
On Feb 9, 2016 1:05 PM, "Dan van der Ster"  wrote:

> On Mon, Feb 8, 2016 at 8:10 PM, Sage Weil  wrote:
> > On Mon, 8 Feb 2016, Karol Mroz wrote:
> >> On Mon, Feb 08, 2016 at 01:36:57PM -0500, Sage Weil wrote:
> >> > I didn't find any other good K names, but I'm not sure anything would
> top
> >> > kraken anyway, so I didn't look too hard.  :)
> >> >
> >> > For L, the options I found were
> >> >
> >> > luminous (flying squid)
> >> > longfin (squid)
> >> > long barrel (squid)
> >> > liliput (octopus)
> >>
> >> Kraken is awesome.
> >>
> >> Perhaps we can add 'Loligo' (https://en.wikipedia.org/wiki/Loligo) to
> the L list?
> >
> > Yep!
> >
> > http://pad.ceph.com/p/l
>
> I took the liberty of adding L'Octopus, our 8 legged French friend.
>
> .. Dan (re-sent due to Gmail suckiness)
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw anonymous write

2016-02-09 Thread Jacek Jarosiewicz

Hi list,

My setup is: ceph 0.94.5, ubuntu 14.04, tengine (patched nginx).

I'm trying to migrate from our old file storage (MogileFS) to the new 
ceph radosgw. The problem is that the old storage had no access control 
- no authorization, so the access to read and/or write was controlled by 
the web server (ie per IP/network).


I want to keep the clients using old storage, but get rid of the 
MogileFS so I don't have to maintain two different storage solutions.


Basically MogileFS http API is similar to S3, except for the 
authorization part - so the methods are the same (PUT, GET, DELETE..).


I've created a bucket with public-read-write access and tried to connect 
the MogileFS client to it - the uploads work fine, and the files get ACL 
public-read so they are readable, but they don't have an owner.


So after upload I can't manage them (ie modify acl) - I can only remove 
objects.


Is there a way to force files that are uploaded anonymously to have an 
owner? Is there a way maybe to have them inherit owner from the bucket?


Cheers,
J

--
Jacek Jarosiewicz
Administrator Systemów Informatycznych


SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie
ul. Senatorska 13/15, 00-075 Warszawa
Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego 
Rejestru Sądowego,

nr KRS 029537; kapitał zakładowy 42.756.000 zł
NIP: 957-05-49-503
Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa


SUPERMEDIA ->   http://www.supermedia.pl
dostep do internetu - hosting - kolokacja - lacza - telefonia
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] K is for Kraken

2016-02-09 Thread Götz Reinicke - IT Koordinator
On 08.02.16 at 20:09, Robert LeBlanc wrote:
> Too bad K isn't an LTS. It would be fun to release the Kraken many times.

+1

:) https://www.youtube.com/watch?v=_lN2auTVavw

cheers . Götz


-- 
Götz Reinicke
IT-Koordinator

Tel. +49 7141 969 82420
E-Mail goetz.reini...@filmakademie.de

Filmakademie Baden-Württemberg GmbH
Akademiehof 10
71638 Ludwigsburg
www.filmakademie.de

Eintragung Amtsgericht Stuttgart HRB 205016

Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
Staatssekretär im Ministerium für Wissenschaft,
Forschung und Kunst Baden-Württemberg

Geschäftsführer: Prof. Thomas Schadt



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?

2016-02-09 Thread Jason Dillaman
What release of Infernalis are you running?  When you encounter this error, is 
the partition table zeroed out or does it appear to be random corruption?  
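
As a side note, one way to check which version is actually running,
assuming admin access on a cluster node:

# ceph --version          # locally installed packages
# ceph tell osd.* version # version each OSD daemon reports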

-- 

Jason Dillaman 

- Original Message -
> From: "Udo Waechter" 
> To: "ceph-users" 
> Sent: Saturday, February 6, 2016 5:31:51 AM
> Subject: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
> 
> Hello,
> 
> I am experiencing totally weird filesystem corruptions with the
> following setup:
> 
> * Ceph infernalis on Debian8
> * 10 OSDs (5 hosts) with spinning disks
> * 4 OSDs (1 host, with SSDs)
> 
> The SSDs are new in my setup and I am trying to setup a Cache tier.
> 
> Now, with the spinning disks Ceph is running since about a year without
> any major issues. Replacing disks and all that went fine.
> 
> Ceph is used by rbd+libvirt+kvm with
> 
> rbd_cache = true
> rbd_cache_writethrough_until_flush = true
> rbd_cache_size = 128M
> rbd_cache_max_dirty = 96M
> 
> Also, in libvirt, I have
> 
> cachemode=writeback enabled.
> 
> So far so good.
> 
> Now, I've added the SSD-Cache tier to the picture with "cache-mode
> writeback"
> 
> The SSD-Machine also has "deadline" scheduler enabled.
> 
> Suddenly VMs start to corrupt their filesystems (all ext4) with "Journal
> failed".
> Trying to reboot the machines ends in "No bootable drive"
> Using parted and testdisk on the image mapped via rbd reveals that the
> partition table is gone.
> 
> testdisk finds the proper ones, e2fsck repairs the filesystem beyond
> usability afterwards.
> 
> This does not happen to all machines; it happens to those that actually
> do some or most of the IO:
> 
> elasticsearch, MariaDB+Galera, postgres, backup, GIT
> 
> Or so I thought: yesterday one of my ldap-servers died, and that one is not
> doing IO.
> 
> Could it be that rbd caching + qemu writeback cache + ceph cache tier
> writeback are not playing well together?
> 
> I've read through some older mails on the list, where people had similar
> problems and suspected something like that.
> 
> What are the proper/right settings for rbd/qemu/libvirt?
> 
> libvirt: cachemode=none (writeback?)
> rbd: cache_mode = none
> SSD-tier: cachemode: writeback
> 
> ?
> 
> Thanks for any help,
> udo.
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw anonymous write

2016-02-09 Thread Yehuda Sadeh-Weinraub
On Tue, Feb 9, 2016 at 5:15 AM, Jacek Jarosiewicz
 wrote:
> Hi list,
>
> My setup is: ceph 0.94.5, ubuntu 14.04, tengine (patched nginx).
>
> I'm trying to migrate from our old file storage (MogileFS) to the new ceph
> radosgw. The problem is that the old storage had no access control - no
> authorization, so the access to read and/or write was controlled by the web
> server (ie per IP/network).
>
> I want to keep the clients using old storage, but get rid of the MogileFS so
> I don't have to maintain two different storage solutions.
>
> Basically MogileFS http API is similar to S3, except for the authorization
> part - so the methods are the same (PUT, GET, DELETE..).
>
> I've created a bucket with public-read-write access and tried to connect
> MogileFS client to it - the uploads work fine, and the files get acl
> public-read so are readable, but they don't have an owner.
>
> So after upload I can't manage them (ie modify acl) - I can only remove
> objects.
>
> Is there a way to force files that are uploaded anonymously to have an
> owner? Is there a way maybe to have them inherit owner from the bucket?
>

Currently there's no way to change it. I'm not sure though that we're
doing the correct thing. Did you try it with Amazon S3 by any chance?

> Cheers,
> J
>
> --
> Jacek Jarosiewicz
> Administrator Systemów Informatycznych
>
> 
> SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie
> ul. Senatorska 13/15, 00-075 Warszawa
> Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego Rejestru
> Sądowego,
> nr KRS 029537; kapitał zakładowy 42.756.000 zł
> NIP: 957-05-49-503
> Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa
>
> 
> SUPERMEDIA ->   http://www.supermedia.pl
> dostep do internetu - hosting - kolokacja - lacza - telefonia
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw anonymous write

2016-02-09 Thread Jacek Jarosiewicz

On 02/09/2016 04:07 PM, Yehuda Sadeh-Weinraub wrote:

On Tue, Feb 9, 2016 at 5:15 AM, Jacek Jarosiewicz
 wrote:

Hi list,

My setup is: ceph 0.94.5, ubuntu 14.04, tengine (patched nginx).

I'm trying to migrate from our old file storage (MogileFS) to the new ceph
radosgw. The problem is that the old storage had no access control - no
authorization, so the access to read and/or write was controlled by the web
server (ie per IP/network).

I want to keep the clients using old storage, but get rid of the MogileFS so
I don't have to maintain two different storage solutions.

Basically MogileFS http API is similar to S3, except for the authorization
part - so the methods are the same (PUT, GET, DELETE..).

I've created a bucket with public-read-write access and tried to connect
MogileFS client to it - the uploads work fine, and the files get acl
public-read so are readable, but they don't have an owner.

So after upload I can't manage them (ie modify acl) - I can only remove
objects.

Is there a way to force files that are uploaded anonymously to have an
owner? Is there a way maybe to have them inherit owner from the bucket?



Currently there's no way to change it. I'm not sure though that we're
doing the correct thing. Did you try it with Amazon S3 by any chance?




Hi,

No, I haven't. I've only been testing this with radosgw. But I think I 
misspoke. I mean - the files upload OK, they have public-read-write 
permissions, but no owner and I'm getting status=404 from the radosgw 
when trying to access them. Nginx is set up to serve files either from 
one backend (rados) or the other (mogile) - I think I didn't look 
closely enough as to where the files were actually coming from, because 
now I get only 404 from rados. The file permission xml looks like this:


root@cfgate01:~# radosgw-admin policy --bucket  --object y

xmlns="http://s3.amazonaws.com/doc/2006-03-01/";>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
xsi:type="CanonicalUser">FULL_CONTROLxmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
xsi:type="Group">http://acs.amazonaws.com/groups/global/AllUsersREADxmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
xsi:type="Group">http://acs.amazonaws.com/groups/global/AllUsersWRITE


J

--
Jacek Jarosiewicz
Administrator Systemów Informatycznych


SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie
ul. Senatorska 13/15, 00-075 Warszawa
Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego 
Rejestru Sądowego,

nr KRS 029537; kapitał zakładowy 42.756.000 zł
NIP: 957-05-49-503
Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa


SUPERMEDIA ->   http://www.supermedia.pl
dostep do internetu - hosting - kolokacja - lacza - telefonia
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Kris Jurka



On 2/8/2016 9:16 AM, Gregory Farnum wrote:

On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:


I've been testing the performance of ceph by storing objects through RGW.
This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
instances.  Initially the storage time was holding reasonably steady, but it
has started to rise recently as shown in the attached chart.



It's probably a combination of your bucket indices getting larger and
your PGs getting split into subfolders on the OSDs. If you keep
running tests and things get slower it's the first; if they speed
partway back up again it's the latter.


Indeed, after running for another day, performance has leveled back out, 
as attached.  So tuning something like filestore_split_multiple would 
have moved around the time of this performance spike, but is there a way 
to eliminate it?  Some way of saying, start with N levels of directory 
structure because I'm going to have a ton of objects?  If this test 
continues, it's just going to hit another, worse spike later when it 
needs to split again.



Other things to check:
* you can look at your OSD stores and how the object files are divvied up.


Yes, checking the directory structure and times on the OSDs does show 
that things have been split recently.
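
On Hammer's filestore this is visible directly on disk: the hashed
subdirectories are named DIR_<hex digit>, and their timestamps show roughly
when a PG directory was last split. The OSD path below is just an example:

# find /var/lib/ceph/osd/ceph-12/current -maxdepth 3 -type d -name 'DIR_*' \
    -printf '%T+ %p\n' | sort | tail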


Kris Jurka
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tips for faster openstack instance boot

2016-02-09 Thread Vickey Singh
Guys Thanks a lot for your response.

We are running OpenStack Juno + Ceph 94.5

@Jason Dillaman Can you please explain what you mean by "Glance is
configured to cache your RBD image"? This might give me some clue.

Many Thanks.


On Mon, Feb 8, 2016 at 10:33 PM, Jason Dillaman  wrote:

> If Nova and Glance are properly configured, it should only require a quick
> clone of the Glance image to create your Nova ephemeral image.  Have you
> double-checked your configuration against the documentation [1]?  What
> version of OpenStack are you using?
>
> To answer your questions:
>
> > - From Ceph point of view. does COW works cross pool i.e. image from
> glance
> > pool ---> (cow) --> instance disk on nova pool
> Yes, cloning copy-on-write images works across pools
>
> > - Will a single pool for glance and nova instead of separate pool . will
> help
> > here ?
> Should be no change -- the creation of the clone is extremely lightweight
> (add the image to a directory, create a couple metadata objects)
>
> > - Is there any tunable parameter from Ceph or OpenStack side that should
> be
> > set ?
> I'd double-check your OpenStack configuration.  Perhaps Glance isn't
> configured with "show_image_direct_url = True", or Glance is configured to
> cache your RBD images, or you have an older OpenStack release that requires
> patches to fully support Nova+RBD.
>
> [1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/
>
> --
>
> Jason Dillaman
>
>
> - Original Message -
>
> > From: "Vickey Singh" 
> > To: ceph-users@lists.ceph.com, "ceph-users" 
> > Sent: Monday, February 8, 2016 9:10:59 AM
> > Subject: [ceph-users] Tips for faster openstack instance boot
>
> > Hello Community
>
> > I need some guidance how can i reduce openstack instance boot time using
> Ceph
>
> > We are using Ceph Storage with openstack ( cinder, glance and nova ). All
> > OpenStack images and instances are being stored on Ceph in different
> pools
> > glance and nova pool respectively.
>
> > I assume that Ceph by default uses COW rbd , so for example if an
> instance is
> > launched using glance image (which is stored on Ceph) , Ceph should take
> COW
> > snapshot of glance image and map it as RBD disk for instance. And this
> whole
> > process should be very quick.
>
> > In our case , the instance launch is taking 90 seconds. Is this normal ?
> ( i
> > know this really depends one's infra , but still )
>
> > Is there any way , i can utilize Ceph's power and can launch instances
> ever
> > faster.
>
> > - From Ceph point of view. does COW works cross pool i.e. image from
> glance
> > pool ---> (cow) --> instance disk on nova pool
> > - Will a single pool for glance and nova instead of separate pool . will
> help
> > here ?
> > - Is there any tunable parameter from Ceph or OpenStack side that should
> be
> > set ?
>
> > Regards
> > Vickey
>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tips for faster openstack instance boot

2016-02-09 Thread Jason Dillaman
If your glance configuration includes the following, RBD images will be cached 
to disk on the API server:

[paste_deploy]
flavor = keystone+cachemanagement

See [1] for the configuration steps for Glance.

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/#configuring-glance
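
For reference, a minimal glance-api.conf sketch along the lines of that
guide (Juno-era option names; the pool and user names are assumptions, so
adjust to your deployment):

[DEFAULT]
show_image_direct_url = True

[glance_store]
default_store = rbd
stores = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf

[paste_deploy]
# no "+cachemanagement" here, so images are not cached to disk on the API node
flavor = keystone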

-- 

Jason Dillaman 

- Original Message - 

> From: "Vickey Singh" 
> To: "Jason Dillaman" 
> Cc: ceph-users@lists.ceph.com, "ceph-users" 
> Sent: Tuesday, February 9, 2016 11:11:31 AM
> Subject: Re: [ceph-users] Tips for faster openstack instance boot

> Guys Thanks a lot for your response.

> We are running OpenStack Juno + Ceph 94.5

> @Jason Dillaman Can you please explain what you mean by "Glance is
> configured to cache your RBD image"? This might give me some clue.

> Many Thanks.

> On Mon, Feb 8, 2016 at 10:33 PM, Jason Dillaman < dilla...@redhat.com >
> wrote:

> > If Nova and Glance are properly configured, it should only require a quick
> > clone of the Glance image to create your Nova ephemeral image. Have you
> > double-checked your configuration against the documentation [1]? What
> > version of OpenStack are you using?
> 

> > To answer your questions:
> 

> > > - From Ceph point of view. does COW works cross pool i.e. image from
> > > glance
> 
> > > pool ---> (cow) --> instance disk on nova pool
> 
> > Yes, cloning copy-on-write images works across pools
> 

> > > - Will a single pool for glance and nova instead of separate pool . will
> > > help
> 
> > > here ?
> 
> > Should be no change -- the creation of the clone is extremely lightweight
> > (add the image to a directory, create a couple metadata objects)
> 

> > > - Is there any tunable parameter from Ceph or OpenStack side that should
> > > be
> 
> > > set ?
> 
> > I'd double-check your OpenStack configuration. Perhaps Glance isn't
> > configured with "show_image_direct_url = True", or Glance is configured to
> > cache your RBD images, or you have an older OpenStack release that requires
> > patches to fully support Nova+RBD.
> 

> > [1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/
> 

> > --
> 

> > Jason Dillaman
> 

> > - Original Message -
> 

> > > From: "Vickey Singh" < vickey.singh22...@gmail.com >
> 
> > > To: ceph-users@lists.ceph.com , "ceph-users" < ceph-us...@ceph.com >
> 
> > > Sent: Monday, February 8, 2016 9:10:59 AM
> 
> > > Subject: [ceph-users] Tips for faster openstack instance boot
> 

> > > Hello Community
> 

> > > I need some guidance how can i reduce openstack instance boot time using
> > > Ceph
> 

> > > We are using Ceph Storage with openstack ( cinder, glance and nova ). All
> 
> > > OpenStack images and instances are being stored on Ceph in different
> > > pools
> 
> > > glance and nova pool respectively.
> 

> > > I assume that Ceph by default uses COW rbd , so for example if an
> > > instance
> > > is
> 
> > > launched using glance image (which is stored on Ceph) , Ceph should take
> > > COW
> 
> > > snapshot of glance image and map it as RBD disk for instance. And this
> > > whole
> 
> > > process should be very quick.
> 

> > > In our case , the instance launch is taking 90 seconds. Is this normal ?
> > > (
> > > i
> 
> > > know this really depends one's infra , but still )
> 

> > > Is there any way , i can utilize Ceph's power and can launch instances
> > > ever
> 
> > > faster.
> 

> > > - From Ceph point of view. does COW works cross pool i.e. image from
> > > glance
> 
> > > pool ---> (cow) --> instance disk on nova pool
> 
> > > - Will a single pool for glance and nova instead of separate pool . will
> > > help
> 
> > > here ?
> 
> > > - Is there any tunable parameter from Ceph or OpenStack side that should
> > > be
> 
> > > set ?
> 

> > > Regards
> 
> > > Vickey
> 

> > > ___
> 
> > > ceph-users mailing list
> 
> > > ceph-users@lists.ceph.com
> 
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
Hi,

On 09/02/2016 17:07, Kris Jurka wrote:
>
>
> On 2/8/2016 9:16 AM, Gregory Farnum wrote:
>> On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:
>>>
>>> I've been testing the performance of ceph by storing objects through
>>> RGW.
>>> This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
>>> instances.  Initially the storage time was holding reasonably
>>> steady, but it
>>> has started to rise recently as shown in the attached chart.
>>>
>>
>> It's probably a combination of your bucket indices getting larger and
>> your PGs getting split into subfolders on the OSDs. If you keep
>> running tests and things get slower it's the first; if they speed
>> partway back up again it's the latter.
>
> Indeed, after running for another day, performance has leveled back
> out, as attached.  So tuning something like filestore_split_multiple
> would have moved around the time of this performance spike, but is
> there a way to eliminate it?  Some way of saying, start with N levels
> of directory structure because I'm going to have a ton of objects?  If
> this test continues, it's just going to hit another, worse spike later
> when it needs to split again.

Actually, if I understand correctly how PG splitting works, the next spike
should be N times smaller and spread over N times the period (where N
is the number of subdirectories created during each split, which
seems to be 15 according to OSDs' directory layout).

That said, the problem that could happen is that by the time you reach
the next split you might have reached N times the object creation
speed you have currently and get the very same spike.

Best regards,

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
On 09/02/2016 19:11, Lionel Bouton wrote:
> Actually if I understand correctly how PG splitting works the next spike
> should be N times smaller and spread over N times the period (where
> N is the number of subdirectories created during each split which
> seems to be 15

typo : 16
>  according to OSDs' directory layout).
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Gregory Farnum
On Tue, Feb 9, 2016 at 8:07 AM, Kris Jurka  wrote:
>
>
> On 2/8/2016 9:16 AM, Gregory Farnum wrote:
>>
>> On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:
>>>
>>>
>>> I've been testing the performance of ceph by storing objects through RGW.
>>> This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
>>> instances.  Initially the storage time was holding reasonably steady, but
>>> it
>>> has started to rise recently as shown in the attached chart.
>>>
>>
>> It's probably a combination of your bucket indices getting larger and
>> your PGs getting split into subfolders on the OSDs. If you keep
>> running tests and things get slower it's the first; if they speed
>> partway back up again it's the latter.
>
>
> Indeed, after running for another day, performance has leveled back out, as
> attached.  So tuning something like filestore_split_multiple would have
> moved around the time of this performance spike, but is there a way to
> eliminate it?  Some way of saying, start with N levels of directory
> structure because I'm going to have a ton of objects?  If this test
> continues, it's just going to hit another, worse spike later when it needs
> to split again.

This has been discussed before but I'm not sure of the outcome. Sam?
-Greg

>
>> Other things to check:
>> * you can look at your OSD stores and how the object files are divvied up.
>
>
> Yes, checking the directory structure and times on the OSDs does show that
> things have been split recently.
>
> Kris Jurka
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tips for faster openstack instance boot

2016-02-09 Thread Josef Johansson
The biggest question here is whether the OS is using systemd or not. CL7 boots
extremely quickly, but our CL6 instances take up to 90 seconds if the cluster
has work to do.

I know there is a lot to do in the init as well, with boot profiling etc.,
that could help.

/Josef

On Tue, 9 Feb 2016 17:11 Vickey Singh  wrote:

> Guys Thanks a lot for your response.
>
> We are running OpenStack Juno + Ceph 94.5
>
> @Jason Dillaman Can you please explain what you mean by "Glance is
> configured to cache your RBD image"? This might give me some clue.
>
> Many Thanks.
>
>
> On Mon, Feb 8, 2016 at 10:33 PM, Jason Dillaman 
> wrote:
>
>> If Nova and Glance are properly configured, it should only require a
>> quick clone of the Glance image to create your Nova ephemeral image.  Have
>> you double-checked your configuration against the documentation [1]?  What
>> version of OpenStack are you using?
>>
>> To answer your questions:
>>
>> > - From Ceph point of view. does COW works cross pool i.e. image from
>> glance
>> > pool ---> (cow) --> instance disk on nova pool
>> Yes, cloning copy-on-write images works across pools
>>
>> > - Will a single pool for glance and nova instead of separate pool .
>> will help
>> > here ?
>> Should be no change -- the creation of the clone is extremely lightweight
>> (add the image to a directory, create a couple metadata objects)
>>
>> > - Is there any tunable parameter from Ceph or OpenStack side that
>> should be
>> > set ?
>> I'd double-check your OpenStack configuration.  Perhaps Glance isn't
>> configured with "show_image_direct_url = True", or Glance is configured to
>> cache your RBD images, or you have an older OpenStack release that requires
>> patches to fully support Nova+RBD.
>>
>> [1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/
>>
>> --
>>
>> Jason Dillaman
>>
>>
>> - Original Message -
>>
>> > From: "Vickey Singh" 
>> > To: ceph-users@lists.ceph.com, "ceph-users" 
>> > Sent: Monday, February 8, 2016 9:10:59 AM
>> > Subject: [ceph-users] Tips for faster openstack instance boot
>>
>> > Hello Community
>>
>> > I need some guidance how can i reduce openstack instance boot time
>> using Ceph
>>
>> > We are using Ceph Storage with openstack ( cinder, glance and nova ).
>> All
>> > OpenStack images and instances are being stored on Ceph in different
>> pools
>> > glance and nova pool respectively.
>>
>> > I assume that Ceph by default uses COW rbd , so for example if an
>> instance is
>> > launched using glance image (which is stored on Ceph) , Ceph should
>> take COW
>> > snapshot of glance image and map it as RBD disk for instance. And this
>> whole
>> > process should be very quick.
>>
>> > In our case , the instance launch is taking 90 seconds. Is this normal
>> ? ( i
>> > know this really depends one's infra , but still )
>>
>> > Is there any way , i can utilize Ceph's power and can launch instances
>> ever
>> > faster.
>>
>> > - From Ceph point of view. does COW works cross pool i.e. image from
>> glance
>> > pool ---> (cow) --> instance disk on nova pool
>> > - Will a single pool for glance and nova instead of separate pool .
>> will help
>> > here ?
>> > - Is there any tunable parameter from Ceph or OpenStack side that
>> should be
>> > set ?
>>
>> > Regards
>> > Vickey
>>
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Kris Jurka



On 2/9/2016 10:11 AM, Lionel Bouton wrote:


Actually if I understand correctly how PG splitting works the next spike
should be N times smaller and spread over N times the period (where
N is the number of subdirectories created during each split which
seems to be 15 according to OSDs' directory layout).



I would expect that splitting one directory would take the same amount 
of time as it did this time, it's just that now there will be N times as 
many directories to split because of the previous splits.  So the 
duration of the spike would be quite a bit longer.



That said, the problem that could happen is that by the time you reach
the next split you might have reached N times the object creation
speed you have currently and get the very same spike.



This test runs as fast as possible, so in the best case scenario, object 
creation speed would stay the same, but is likely to gradually slow over 
time.


Kris Jurka
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
On 09/02/2016 20:07, Kris Jurka wrote:
>
>
> On 2/9/2016 10:11 AM, Lionel Bouton wrote:
>
>> Actually if I understand correctly how PG splitting works the next spike
>> should be N times smaller and spread over N times the period (where
>> N is the number of subdirectories created during each split which
>> seems to be 15 according to OSDs' directory layout).
>>
>
> I would expect that splitting one directory would take the same amount
> of time as it did this time, it's just that now there will be N times
> as many directories to split because of the previous splits.  So the
> duration of the spike would be quite a bit longer.

Oops I missed this bit, I believe you are right: the spike duration
should be ~16x longer but the slowdown roughly the same over this new
period :-(

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
On 09/02/2016 20:18, Lionel Bouton wrote:
> On 09/02/2016 20:07, Kris Jurka wrote:
>>
>> On 2/9/2016 10:11 AM, Lionel Bouton wrote:
>>
>>> Actually if I understand correctly how PG splitting works the next spike
>>> should be N times smaller and spread over N times the period (where
>>> N is the number of subdirectories created during each split which
>>> seems to be 15 according to OSDs' directory layout).
>>>
>> I would expect that splitting one directory would take the same amount
>> of time as it did this time, it's just that now there will be N times
>> as many directories to split because of the previous splits.  So the
>> duration of the spike would be quite a bit longer.
> Oops I missed this bit, I believe you are right: the spike duration
> should be ~16x longer but the slowdown roughly the same over this new
> period :-(

As I don't see any way around this, I'm thinking out of the box.

As splitting is costly for you, you might want to try to avoid it (or at
least limit it to the first occurrence if your use case can handle such
a slowdown).
You can test increasing the PG number of your pool before reaching the
point where the split starts.
This would generate movements but this might (or might not) slow down
your access less than what you see when splitting occurs (I'm not sure
about the exact constraints, but basically Ceph forces you to increase
the number of placement groups by small amounts, which should limit the
performance impact).
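
For example (a rough sketch; ".rgw.buckets" is just the default Hammer RGW
data pool name, and pg_num can only be raised in steps the monitors accept):

# ceph osd pool get .rgw.buckets pg_num
# ceph osd pool set .rgw.buckets pg_num 4096
# ceph osd pool set .rgw.buckets pgp_num 4096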

Another way to do this with no movement and slowdown is to add pools
(which basically create new placement groups without rebalancing data)
but this means modifying your application so that new objects are stored
on the new pool (which may or may not be possible depending on your
actual access patterns).

There are limits to these two suggestions: increasing the number of
placement groups has costs, so you might want to check with the devs how
high you can go and whether it fits your constraints.

Lionel.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] erasure code backing pool, replication cache, and openstack

2016-02-09 Thread WRIGHT, JON R (JON R)

New user.  :)

I'm interested in exploring how to use an erasure coded pool as block 
storage for Openstack.  Instructions are on this page.


http://docs.ceph.com/docs/master/rados/operations/erasure-code/

Of course, it says

"It is not possible to create an RBD image on an erasure coded pool 
because it requires partial writes. It is however possible to create an 
RBD image on an erasure coded pools when a replicated pool tier set a 
cache tier:"


So I have set up a erasure-coded backing pool with a replicated cache 
tier.  This seems to work.


Also, I've set up a cinder backend to use the erasure coded pool (with 
cache tier) as block storage for Openstack.  Because traffic is 
redirected from the backing pool to the cache tier, I set up the cinder 
backend to reference the backing pool.


Again, this seems to work.   But, I wonder whether the cinder backend 
configuration should reference the backing pool or the cache tier?   
Because of the redirected traffic, I'm not sure that it matters.
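
For reference, the command sequence for that kind of setup looks roughly
like this (profile, pool names and PG counts are placeholders, following
the erasure-code and cache-tiering docs):

# ceph osd erasure-code-profile set ecprofile k=2 m=1
# ceph osd pool create ecpool 128 128 erasure ecprofile
# ceph osd pool create cachepool 128 128 replicated
# ceph osd tier add ecpool cachepool
# ceph osd tier cache-mode cachepool writeback
# ceph osd tier set-overlay ecpool cachepool
# ceph osd pool set cachepool hit_set_type bloom

With the overlay set, clients that address the base pool (ecpool here) are
transparently redirected to the cache tier, which is why the cinder backend
above references the backing pool.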


Jon


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Samuel Just
There was a patch at some point to pre-split on pg creation (merged in
ad6a2be402665215a19708f55b719112096da3f4).  More generally, bluestore
is the answer to this.
-Sam
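
For anyone who wants to experiment in the meantime, the knobs involved look
roughly like this (values purely illustrative; as far as I recall the
pre-split at pool creation only kicks in when expected_num_objects is given
together with a negative filestore merge threshold):

[osd]
filestore merge threshold = -10
filestore split multiple = 8

and at pool creation time (pool name, pg counts, ruleset and expected
object count are placeholders):

# ceph osd pool create mydata 1024 1024 replicated replicated_ruleset 500000000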

On Tue, Feb 9, 2016 at 11:34 AM, Lionel Bouton
 wrote:
> On 09/02/2016 20:18, Lionel Bouton wrote:
>> On 09/02/2016 20:07, Kris Jurka wrote:
>>>
>>> On 2/9/2016 10:11 AM, Lionel Bouton wrote:
>>>
>>>> Actually if I understand correctly how PG splitting works the next spike
>>>> should be N times smaller and spread over N times the period (where
>>>> N is the number of subdirectories created during each split which
>>>> seems to be 15 according to OSDs' directory layout).

>>> I would expect that splitting one directory would take the same amount
>>> of time as it did this time, it's just that now there will be N times
>>> as many directories to split because of the previous splits.  So the
>>> duration of the spike would be quite a bit longer.
>> Oops I missed this bit, I believe you are right: the spike duration
>> should be ~16x longer but the slowdown roughly the same over this new
>> period :-(
>
> As I don't see any way around this, I'm thinking out of the box.
>
> As splitting is costly for you you might want to try to avoid it (or at
> least limit it to the first occurrence if your use case can handle such
> a slowdown).
> You can test increasing the PG number of your pool before reaching the
> point where the split starts.
> This would generate movements but this might (or might not) slow down
> your access less than what you see when splitting occurs (I'm not sure
> about the exact constraints but basically Ceph forces you to increase
> the number of placement PG by small amounts which should limit the
> performance impact).
>
> Another way to do this with no movement and slowdown is to add pools
> (which basically create new placement groups without rebalancing data)
> but this means modifying your application so that new objects are stored
> on the new pool (which may or may not be possible depending on your
> actual access patterns).
>
> There are limits to these 2 suggestions : increasing the number of
> placement groups have costs so you might want to check with devs how
> high you can go and if it fits your constraints.
>
> Lionel.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bucket listing requests get stuck

2016-02-09 Thread Alexey Kuntsevich
Hi!

I have a ver 0.94.5 debian-based cluster used mostly through rados.
I tried to delete objects with the same prefix from one of the buckets
(~1300 objects) using a python boto library. The process finished after
several minutes without any errors, but now I can list only a subset (~20)
of objects in this bucket and if I increase the number of objects to list
even to 25 the request hangs for hours. I can still access objects directly
in this bucket and list objects with any other prefix than the one that was
used for deletion.
I tried rebooting nodes and the gateway server, checked the radosgw logs
(nothing except messages with 200 and 499 return codes) and doing random
maintenance tasks that I was able to find in the documentation.
Is there a way to fix the issue without moving the rest of the data into
another bucket and dropping the old bucket with radosgw-admin?
Are there any monitoring means that can show locks/data consistency issues
for radosgw?
Is it possible to trace where the request gets stuck?
Is there any documentation on how radosgw stores its data inside ceph?

Any help is appreciated!
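
A few starting points that may help (bucket name is a placeholder, and the
index pool name depends on your zone configuration):

# radosgw-admin bucket stats --bucket=mybucket
# radosgw-admin bucket check --bucket=mybucket --check-objects
# radosgw-admin bucket check --bucket=mybucket --check-objects --fix
# rados -p .rgw.buckets.index ls | grep <bucket id from the stats output>

Raising "debug rgw = 20" (and "debug ms = 1") on the gateway and re-running a
small listing should also show which rados operation the request is stuck on.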

-- 
Best regards,
Alexey Kuntsevich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Dell Ceph Hardware recommendations

2016-02-09 Thread Michael
Hello,

I'm looking at purchasing Qty 3-4, Dell PowerEdge T630 or R730xd for my OSD 
nodes in a Ceph cluster.

Hardware:
Qty x 1, E5-2630v3 2.4Ghz 8C/16T
128 GB DDR4 Ram
QLogic 57810 DP 10Gb DA/SFP+ Converged Network Adapter

I'm trying to determine which RAID controller to use, since I've read JBOD 
is pretty suitable for software defined storage.

Raid Controllers options (series 9 controllers supposedly support JBOD, 
http://www.dell.com/learn/us/en/04/campaigns/dell-raid-controllers):
PERC H330 RAID Controller (support for non-RAID passthrough
configuration options)
PERC H730 RAID Controller (LSI SAS 3108), 1GB NV Cache  Selected  +$210.00
PERC H730P RAID Controller (LSI SAS 3108), 2GB NV Cache +$420.00
Dual PERC H730P RAID Controllers (LSI SAS 3108), 2GB NV Cache   +$1235.30

PERC HBA330 12GB Controller Minicard -($12.90)

I was told by a Dell Technical Associate today that the H330 could support 
12 disks and I'm currently leaning toward using it.  Any suggestions on the 
tradeoffs would be greatly appreciated.

As far as connected storage, I'm looking to use:
Intel S3700 128GB for every 4-6 Drives
HGST 7K4000 4 TB Drives, Qty x 12-16
-


This would essentially mean 4-5 HGST 7K4000 4 TB drives per OSD node and 
one SSD for each node as well.  We're looking for a basic Ceph cluster 
that will allow us to virtualize a large number (300-400) of thinly 
provisioned VMs.  Any feedback from similar use cases is much appreciated.  
The Dell computers appear to have some type of plate in-between the SAS 
connections and cables to the front of the hard drive hot swap caddy area 
on my existing servers. I'm not sure if this limits my ability to swap out 
the H330 RAID controller that is provided by Dell.  
We have bulk pricing discounts through dell, though I'm open to other 
options that work best for the Ceph deployment.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dell Ceph Hardware recommendations

2016-02-09 Thread Alexandre DERUMIER
Hi,

I'm using dell r630 (8 disks, no expander backplane) with PERC H330 RAID 
Controller (supports for non-RAID passthrough)

Full ssd nodes, 2x raid1 (s3700 100GB) for os + mon,  passthrough for 6x osd 
(intel s3610 1,6TB)

64GB ram, 2x intel E5-2687W v3 @ 3.10GHz (10C/20T)


H330 controller works very well with SSDs (no NV cache). (benched 600k iops 4k 
read with 3 nodes)

- Original Message -
From: "Michael" 
To: "ceph-users" 
Sent: Tuesday, February 9, 2016 22:49:14
Subject: [ceph-users] Dell Ceph Hardware recommendations

Hello, 

I'm looking at purchasing Qty 3-4, Dell PowerEdge T630 or R730xd for my OSD 
nodes in a Ceph cluster. 

Hardware: 
Qty x 1, E5-2630v3 2.4Ghz 8C/16T 
128 GB DDR4 Ram 
QLogic 57810 DP 10Gb DA/SFP+ Converged Network Adapter 

I'm trying to determine which RAID controller to use, since I've read JBOD 
is pretty suitable for software defined storage. 

Raid Controllers options (series 9 controllers supposedly support JBOD, 
http://www.dell.com/learn/us/en/04/campaigns/dell-raid-controllers): 
PERC H330 RAID Controller (supports for non-RAID passthrough 
configuration options) 
PERC H730 RAID Controller (LSI SAS 3108), 1GB NV Cache Selected +$210.00 
PERC H730P RAID Controller (LSI SAS 3108), 2GB NV Cache +$420.00 
Dual PERC H730P RAID Controllers (LSI SAS 3108), 2GB NV Cache +$1235.30 

PERC HBA330 12GB Controller Minicard -($12.90) 

I was told by a Dell Technical Associate today the the H330 could support 
12 disks and I'm currently leaning toward using it. Any suggestions on the 
tradeoffs would be greatly appreciated. 

As far as connected storage, I'm looking to use: 
Intel S3700 128GB for every 4-6 Drives 
HGST 7K4000 4 TB Drives, Qty x 12-16 
- 


This would essentially mean that 4-5 HGST 7K4000 4 TB Drives per each OSD 
node and one SSD for each node as well. We're looking for a basic Ceph 
cluster that will allow us to virtualize a large number of (300-400) VMs 
that are thinly provisioned. Any feedback from similar use cases, is much 
appreciated. The Dell computers appear to have some type of plate in- 
between the SAS connections and cables to the front of the hard drive hot 
swap caddy area on my existing servers.. I'm not sure if this limits me in 
my ability to swap out the H330 raid controller that is provided by Dell. 
We have bulk pricing discounts through dell, though I'm open to other 
options that work best for the Ceph deployment. 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dell Ceph Hardware recommendations

2016-02-09 Thread Matt Taylor
We are using Dell R730XD's with 2 x internal SAS in RAID 1 for the OS, and 
24 x 400GB SSDs.


PERC H730P Mini is being used with non-RAID passthrough for the SSD's.

CPU and RAM specs don't really matter much as you can do 
whatever you want; however, I would recommend a minimum of 2 x quad-cores 
and at least 48GB of RAM.


NICs are 4 x 10G (2 x 10G bonded for cluster, 2 x 10G bonded for public). 
Naturally, you have the 4 x 1G on-board too.


Performance is very good for us.

Coupled with CentOS 7, OMSA, Zabbix (custom scripts) and Ceph Dash, you 
can get some nice metrics and real-time alerting.


Cheers,
Matt.

On 10/02/2016 13:21, Alexandre DERUMIER wrote:

Hi,

I'm using dell r630 (8 disks, no expander backplane) with PERC H330 RAID 
Controller (supports for non-RAID passthrough)

Full ssd nodes, 2x raid1 (s3700 100GB) for os + mon,  passthrough for 6x osd 
(intel s3610 1,6TB)

64GB ram, 2x intel E5-2687W v3 @ 3.10GHz (10C/20T)


H330 controller work very will with ssd  (no NV cache). (benched 600k iops 4k 
read with 3 nodes)

- Original Message -
From: "Michael" 
To: "ceph-users" 
Sent: Tuesday, February 9, 2016 22:49:14
Subject: [ceph-users] Dell Ceph Hardware recommendations

Hello,

I'm looking at purchasing Qty 3-4, Dell PowerEdge T630 or R730xd for my OSD
nodes in a Ceph cluster.

Hardware:
Qty x 1, E5-2630v3 2.4Ghz 8C/16T
128 GB DDR4 Ram
QLogic 57810 DP 10Gb DA/SFP+ Converged Network Adapter

I'm trying to determine which RAID controller to use, since I've read JBOD
is pretty suitable for software defined storage.

Raid Controllers options (series 9 controllers supposedly support JBOD,
http://www.dell.com/learn/us/en/04/campaigns/dell-raid-controllers):
PERC H330 RAID Controller (supports for non-RAID passthrough
configuration options)
PERC H730 RAID Controller (LSI SAS 3108), 1GB NV Cache Selected +$210.00
PERC H730P RAID Controller (LSI SAS 3108), 2GB NV Cache +$420.00
Dual PERC H730P RAID Controllers (LSI SAS 3108), 2GB NV Cache +$1235.30

PERC HBA330 12GB Controller Minicard -($12.90)

I was told by a Dell Technical Associate today the the H330 could support
12 disks and I'm currently leaning toward using it. Any suggestions on the
tradeoffs would be greatly appreciated.

As far as connected storage, I'm looking to use:
Intel S3700 128GB for every 4-6 Drives
HGST 7K4000 4 TB Drives, Qty x 12-16
-


This would essentially mean that 4-5 HGST 7K4000 4 TB Drives per each OSD
node and one SSD for each node as well. We're looking for a basic Ceph
cluster that will allow us to virtualize a large number of (300-400) VMs
that are thinly provisioned. Any feedback from similar use cases, is much
appreciated. The Dell computers appear to have some type of plate in-
between the SAS connections and cables to the front of the hard drive hot
swap caddy area on my existing servers.. I'm not sure if this limits me in
my ability to swap out the H330 raid controller that is provided by Dell.
We have bulk pricing discounts through dell, though I'm open to other
options that work best for the Ceph deployment.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Max Replica Size

2016-02-09 Thread Swapnil Jain
Hi,


What is the maximum replica size we can have for a poll with Infernalis 



—
Swapnil Jain | swap...@linux.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can't fix down+incomplete PG

2016-02-09 Thread Scott Laird
I lost a few OSDs recently.  Now my cell is unhealthy and I can't figure
out how to get it healthy again.

OSD 3, 7, 10, and 40 died in a power outage.  Now I have 10 PGs that are
down+incomplete, but all of them seem like they should have surviving
replicas of all data.

I'm running 9.2.0.

$ ceph health detail | grep down
pg 18.c1 is down+incomplete, acting [11,18,9]
pg 18.47 is down+incomplete, acting [11,9,22]
pg 18.1d7 is down+incomplete, acting [5,31,24]
pg 18.1d6 is down+incomplete, acting [22,11,5]
pg 18.2af is down+incomplete, acting [19,24,18]
pg 18.2dd is down+incomplete, acting [15,11,22]
pg 18.2de is down+incomplete, acting [15,17,11]
pg 18.3e is down+incomplete, acting [25,8,18]
pg 18.3d6 is down+incomplete, acting [22,39,24]
pg 18.3e6 is down+incomplete, acting [9,23,8]

$ ceph pg 18.c1 query
{
"state": "down+incomplete",
"snap_trimq": "[]",
"epoch": 960905,
"up": [
11,
18,
9
],
"acting": [
11,
18,
9
],
"info": {
"pgid": "18.c1",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_user_version": 0,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": "[]",
"history": {
"epoch_created": 595523,
"last_epoch_started": 954170,
"last_epoch_clean": 954170,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 959988,
"same_interval_since": 959988,
"same_primary_since": 959988,
"last_scrub": "613947'7736",
"last_scrub_stamp": "2015-11-11 21:18:35.118057",
"last_deep_scrub": "613947'7736",
"last_deep_scrub_stamp": "2015-11-11 21:18:35.118057",
"last_clean_scrub_stamp": "2015-11-11 21:18:35.118057"
},
...
"probing_osds": [
"9",
"11",
"18",
"23",
"25"
],
"down_osds_we_would_probe": [
7,
10
],
"peering_blocked_by": []
},
{
"name": "Started",
"enter_time": "2016-02-09 20:35:57.627376"
}
],
"agent_state": {}
}

I tried replacing disks. I created new OSDs 3 and 7, but neither will start
up; the ceph-osd task starts but never actually makes it to 'up' with
nothing obvious in the logs.  I can post logs if that helps.  Since the
OSDs were removed a few days ago, 'ceph osd lost' doesn't seem to help.

Is there a way to fix these PGs and get my cluster healthy again?


Scott
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Max Replica Size

2016-02-09 Thread Shinobu Kinjo
What is poll?

Rgds,
Shinobu

- Original Message -
From: "Swapnil Jain" 
To: ceph-users@lists.ceph.com
Sent: Wednesday, February 10, 2016 2:20:08 PM
Subject: [ceph-users] Max Replica Size

Hi, 


What is the maximum replica size we can have for a poll with Infernalis 



— 
Swapnil Jain | swap...@linux.com 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Max Replica Size

2016-02-09 Thread Swapnil Jain
Sorry for the typo, its pool ;)

—

Swapnil Jain | swap...@linux.com 

> On 10-Feb-2016, at 11:13 AM, Shinobu Kinjo  wrote:
> 
> What is poll?
> 
> Rgds,
> Shinobu
> 
> - Original Message -
> From: "Swapnil Jain" 
> To: ceph-users@lists.ceph.com
> Sent: Wednesday, February 10, 2016 2:20:08 PM
> Subject: [ceph-users] Max Replica Size
> 
> Hi,
> 
> 
> What is the maximum replica size we can have for a poll with Infernalis
> 
> 
> 
> —
> Swapnil Jain | swap...@linux.com
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Max Replica Size

2016-02-09 Thread Lindsay Mathieson

On 10/02/16 15:43, Shinobu Kinjo wrote:

What is poll?



One suspects "Pool"

--
Lindsay Mathieson

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't fix down+incomplete PG

2016-02-09 Thread Arvydas Opulskis
Hi,

What is min_size for this pool? Maybe you need to decrease it for the cluster
to start recovering.
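
For example (the pool name is just a placeholder):

# ceph osd pool get rbd min_size
# ceph osd pool set rbd min_size 1

If the failed OSDs are gone for good, marking them lost is normally what lets
peering get past "down_osds_we_would_probe" (though, as noted below, that did
not seem to help here after the OSDs had already been removed):

# ceph osd lost 7 --yes-i-really-mean-it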

Arvydas

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Scott 
Laird
Sent: Wednesday, February 10, 2016 7:22 AM
To: 'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com) 

Subject: [ceph-users] Can't fix down+incomplete PG

I lost a few OSDs recently.  Now my cell is unhealthy and I can't figure out 
how to get it healthy again.

OSD 3, 7, 10, and 40 died in a power outage.  Now I have 10 PGs that are 
down+incomplete, but all of them seem like they should have surviving replicas 
of all data.

I'm running 9.2.0.

$ ceph health detail | grep down
pg 18.c1 is down+incomplete, acting [11,18,9]
pg 18.47 is down+incomplete, acting [11,9,22]
pg 18.1d7 is down+incomplete, acting [5,31,24]
pg 18.1d6 is down+incomplete, acting [22,11,5]
pg 18.2af is down+incomplete, acting [19,24,18]
pg 18.2dd is down+incomplete, acting [15,11,22]
pg 18.2de is down+incomplete, acting [15,17,11]
pg 18.3e is down+incomplete, acting [25,8,18]
pg 18.3d6 is down+incomplete, acting [22,39,24]
pg 18.3e6 is down+incomplete, acting [9,23,8]

$ ceph pg 18.c1 query
{
"state": "down+incomplete",
"snap_trimq": "[]",
"epoch": 960905,
"up": [
11,
18,
9
],
"acting": [
11,
18,
9
],
"info": {
"pgid": "18.c1",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_user_version": 0,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": "[]",
"history": {
"epoch_created": 595523,
"last_epoch_started": 954170,
"last_epoch_clean": 954170,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 959988,
"same_interval_since": 959988,
"same_primary_since": 959988,
"last_scrub": "613947'7736",
"last_scrub_stamp": "2015-11-11 21:18:35.118057",
"last_deep_scrub": "613947'7736",
"last_deep_scrub_stamp": "2015-11-11 21:18:35.118057",
"last_clean_scrub_stamp": "2015-11-11 21:18:35.118057"
},
...
"probing_osds": [
"9",
"11",
"18",
"23",
"25"
],
"down_osds_we_would_probe": [
7,
10
],
"peering_blocked_by": []
},
{
"name": "Started",
"enter_time": "2016-02-09 20:35:57.627376"
}
],
"agent_state": {}
}

I tried replacing disks. I created a new OSD 3 and 7 but neither will start up; 
the ceph-osd task starts but never actually makes it to 'up' with nothing 
obvious in the logs.  I can post logs if that helps.  Since the OSDs were 
removed a few days ago, 'ceph osd lost' doesn't seem to help.

Is there a way to fix these PGs and get my cluster healthy again?


Scott
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Max Replica Size

2016-02-09 Thread Loris Cuoghi

Pool ;)

On 10/02/2016 06:43, Shinobu Kinjo wrote:

What is poll?

Rgds,
Shinobu

- Original Message -
From: "Swapnil Jain" 
To: ceph-users@lists.ceph.com
Sent: Wednesday, February 10, 2016 2:20:08 PM
Subject: [ceph-users] Max Replica Size

Hi,


What is the maximum replica size we can have for a poll with Infernalis



—
Swapnil Jain | swap...@linux.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Max Replica Size

2016-02-09 Thread Wido den Hollander

> Op 10 februari 2016 om 6:20 schreef Swapnil Jain :
> 
> 
> Hi,
> 
> 
> What is the maximum replica size we can have for a poll with Infernalis 
> 

Depends on your CRUSH map, but if you have sufficient places for CRUSH, you can
go up to 10 replicas with the default min and max_size settings of a ruleset.
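
For example (assuming enough distinct failure domains for CRUSH to choose
from; "rbd" is just a placeholder pool name):

# ceph osd crush rule dump | grep -E 'min_size|max_size'
# ceph osd pool set rbd size 6
# ceph osd pool set rbd min_size 4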

Wido

> 
> 
> —
> Swapnil Jain | swap...@linux.com 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com