Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1

2015-04-16 Thread Steffen W Sørensen
> That later change would have _increased_ the number of recommended PG, not
> decreased it.
Weird as our Giant health status was ok before upgrading to Hammer…

> With your cluster 2048 PGs total (all pools combined!) would be the sweet
> spot, see:
> 
> http://ceph.com/pgcalc/ 
Had read this originally when creating the cluster

> It seems to me that you increased PG counts assuming that the formula is per 
> pool.
Well, yes, maybe. I believe we bumped PGs whenever the status complaints in Giant 
mentioned explicit pool names, e.g. "too few PGs in …",
so we naturally bumped the mentioned pools up to the next power of two until 
health stopped complaining,
and yes, we wondered about this relatively high number of PGs in total for the 
cluster, as we had initially read pgcalc and thought we understood it.

ceph.com not responding presently…

- Are you saying one needs to consider the number of pools in a cluster in advance and 
factor this in when calculating the number of PGs?

- If so, how does one decide which pool gets what PG count, as this is set per pool, 
especially if one can't pre-calculate the number of objects ending up in each pool?

But yes, I also understand that more pools means more PGs per OSD; does this imply that 
using different pools to segregate various data, e.g. per application, in the same 
cluster is a bad idea?

Using pools as a sort of namespace segregation makes it easy, for example, to 
remove/migrate data per application, and is thus a handy segregation tool IMHO.

- Is the best current practice to consolidate data into a few pools per cluster?

/Steffen


Re: [ceph-users] Ceph repo - RSYNC?

2015-04-16 Thread Wido den Hollander
On 15-04-15 18:17, Paul Mansfield wrote:
> 
> Sorry for starting a new thread, I've only just subscribed to the list
> and the archive on the mail listserv is far from complete at the moment.
> 

No problem!

It's on my radar to come up with a proper mirror system for Ceph: a
simple Bash script in the Git repo which you can use to sync all Ceph
packages and downloads.

Didn't get to it yet.

> on 8th March David Moreau Simard said
>   http://www.spinics.net/lists/ceph-users/msg16334.html
> that there was a rsync'able mirror of the ceph repo at
> http://ceph.mirror.iweb.ca/
> 
> 
> My problem is that the repo doesn't include Hammer. Is there someone who
> can get that added to the mirror?
> 
> thanks very much
> Paul
> 


-- 
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1

2015-04-16 Thread Steffen W Sørensen
On 16/04/2015, at 01.48, Steffen W Sørensen  wrote:
> 
> Also our calamari web UI won't authenticate anymore, can’t see any issues in 
> any log under /var/log/calamari, any hints on what to look for are 
> appreciated, TIA!
Well, this morning it will authenticate me, but it seems Calamari can't talk to the 
cluster anymore; wondering where to start digging… or will I need to build a 
newer version to talk to a Hammer cluster?

> # dpkg -l | egrep -i calamari\|ceph
> ii  calamari-clients   1.2.3.1-2-gc1f14b2all  
> Inktank Calamari user interface
> ii  calamari-server1.3-rc-16-g321cd58amd64
> Inktank package containing the Calamari management srever
Is this version of Calamari able to monitor a Hammer cluster like the one below?

> ii  ceph   0.94.1-1~bpo70+1  amd64
> distributed storage and file system
> ii  ceph-common0.94.1-1~bpo70+1  amd64
> common utilities to mount and interact with a ceph storage cluster
> ii  ceph-deploy1.5.23~bpo70+1all  
> Ceph-deploy is an easy to use configuration tool
> ii  ceph-fs-common 0.94.1-1~bpo70+1  amd64
> common utilities to mount and interact with a ceph file system
> ii  ceph-fuse  0.94.1-1~bpo70+1  amd64
> FUSE-based client for the Ceph distributed file system
> ii  ceph-mds   0.94.1-1~bpo70+1  amd64
> metadata server for the ceph distributed file system
> ii  curl   7.29.0-1~bpo70+1.ceph amd64
> command line tool for transferring data with URL syntax
> ii  libcephfs1 0.94.1-1~bpo70+1  amd64
> Ceph distributed file system client library
> ii  libcurl3:amd64 7.29.0-1~bpo70+1.ceph amd64
> easy-to-use client-side URL transfer library (OpenSSL flavour)
> ii  libcurl3-gnutls:amd64  7.29.0-1~bpo70+1.ceph amd64
> easy-to-use client-side URL transfer library (GnuTLS flavour)
> ii  libleveldb1:amd64  1.12.0-1~bpo70+1.ceph amd64
> fast key-value storage library
> ii  python-ceph0.94.1-1~bpo70+1  amd64
> Meta-package for python libraries for the Ceph libraries
> ii  python-cephfs  0.94.1-1~bpo70+1  amd64
> Python libraries for the Ceph libcephfs library
> ii  python-rados   0.94.1-1~bpo70+1  amd64
> Python libraries for the Ceph librados library
> ii  python-rbd 0.94.1-1~bpo70+1  amd64
> Python libraries for the Ceph librbd library


TIA

/Steffen


Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1

2015-04-16 Thread Christian Balzer
On Thu, 16 Apr 2015 10:46:35 +0200 Steffen W Sørensen wrote:

> > That later change would have _increased_ the number of recommended PG,
> > not decreased it.
> Weird as our Giant health status was ok before upgrading to Hammer…
> 
I'm pretty sure the "too many" check was added around then, and the
"too little" warning one earlier.

> > With your cluster 2048 PGs total (all pools combined!) would be the
> > sweet spot, see:
> > 
> > http://ceph.com/pgcalc/ 
> Had read this originally when creating the cluster
> 
> > It seems to me that you increased PG counts assuming that the formula
> > is per pool.
> Well yes maybe, believe we bumped PGs per status complain in Giant
> mentioned explicit different pool names, eg. too few PGs in …
Probably something like "less than 20 PGs" or some such, right?

> so we naturally bumped mentioned pools slightly up til next 2-power
> until health stop complaining and yes we wondered over this relative
> high number of PGs in total for the cluster, as we initially had read
> pgcalc and thought we understood this.
>

Your cluster (OSD count) needs (should really, it is not a hard failure
but a warning) to be large enough to satisfy the minimum number of PGs, so
(too) many pools with a small cluster will leave you between a rock and a
hard place.

> ceph.com  not responsding presently…
> 
It's being DoS'ed last I heard.

> - are you saying one needs to consider in advance #pools in a cluster
> and factor this in when calculating the number of PGs?
> 
Yes. Of course the idea is that pools consume space, so if you have many,
you also will have more OSDs to spread your PGs around.

> - If so, how to decide which pool gets what #PG, as this is set per
> pool, especially if one can’t precalc the amount objects ending up in
> each pool?
> 

Dead reckoning. 
As in, you should have some idea which pool is going to receive how much
data.
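
If it helps, here is a rough sketch of the pgcalc-style arithmetic. Treat it as
a sketch only: the OSD count, replica size and per-pool percentages below are
made-up example values, and the usual target is roughly 100 PGs per OSD.

    OSDS=12; SIZE=3; TARGET=100           # example values, not your cluster's
    TOTAL=$(( OSDS * TARGET / SIZE ))     # total PGs across *all* pools (here: 400)
    pool_pgs() {                          # usage: pool_pgs <expected % of data>
        local want=$(( TOTAL * $1 / 100 )) p=1
        while [ $p -lt $want ]; do p=$(( p * 2 )); done
        echo $p                           # pool's share rounded up to a power of two
    }
    pool_pgs 50                           # a pool holding ~50% of the data -> 256
    pool_pgs 10                           # a small pool                    -> 64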

> But yes understand also that more pools means more PGs per OSD, does
> this imply using different pools to segregate various data f.ex. per
> application in same cluster is a bad idea?
> 
It certainly can be.

> Using pools as sort of name space segregation makes it easy f.e. to
> remove/migration data per application and thus a handy segregation tool
> ImHO.
>
Certainly, but unless you have a large enough cluster and pools that have
predictable utilization, fewer pools are the answer.
 
> - Are the BCP to consolidate data in few pools per cluster?
>

It is for me, as I have clusters of similar small size and only one type
of usage, RBD images. So they have 1 or 2 pools and that's it.

This also results in the smoothest data distribution possible of course.

Christian

> /Steffen

-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] Ceph site is very slow

2015-04-16 Thread unixkeeper
Is it still under DDoS attack?
Is there a mirror site where we could get the docs & guides?
Thanks a lot



On Wed, Apr 15, 2015 at 11:32 PM, Gregory Farnum  wrote:

> People are working on it but I understand there was/is a DoS attack going
> on. :/
> -Greg
>
> On Wed, Apr 15, 2015 at 1:50 AM Ignazio Cassano 
> wrote:
>
>> Many thanks
>>
>> 2015-04-15 10:44 GMT+02:00 Wido den Hollander :
>>
>>> On 04/15/2015 10:20 AM, Ignazio Cassano wrote:
>>> > Hi all,
>>> > why ceph.com is very slow ?
>>>
>>> Not known right now. But you can try eu.ceph.com for your packages and
>>> downloads.
>>>
>>> > It is impossible download files for installing ceph.
>>> > Regards
>>> > Ignazio
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>> --
>>> Wido den Hollander
>>> 42on B.V.
>>> Ceph trainer and consultant
>>>
>>> Phone: +31 (0)20 700 9902
>>> Skype: contact42on
>>>
>>
>>
>
>
>


Re: [ceph-users] Ceph site is very slow

2015-04-16 Thread Vikhyat Umrao

I hope this will help you: http://docs.ceph.com/docs/master/

Regards,
Vikhyat

On 04/16/2015 02:39 PM, unixkeeper wrote:

it still on DDOS ATTACK?
is there have a mirror site could get doc&guide?
thx a  lot



On Wed, Apr 15, 2015 at 11:32 PM, Gregory Farnum wrote:


People are working on it but I understand there was/is a DoS
attack going on. :/
-Greg

On Wed, Apr 15, 2015 at 1:50 AM Ignazio Cassano wrote:

Many thanks

2015-04-15 10:44 GMT+02:00 Wido den Hollander <w...@42on.com>:

On 04/15/2015 10:20 AM, Ignazio Cassano wrote:
> Hi all,
> why ceph.com  is very slow ?

Not known right now. But you can try eu.ceph.com for your packages and
downloads.

> It is impossible download files for installing ceph.
> Regards
> Ignazio
>
>
>
>


--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902 
Skype: contact42on












Re: [ceph-users] Motherboard recommendation?

2015-04-16 Thread Mohamed Pakkeer
Hi Nick,

Thanks, Nick, for your reply. There is a clear picture of the hardware
requirement for replication (1 GHz per OSD), but we can't find any document
on hardware recommendations for erasure coding. I read the Mark
Nelson report, but some erasure-coding tests still show 100% CPU
utilization. So what would be the recommended CPU processing power for those
tests to avoid hitting 100% CPU utilization?

cheers
K.Mohamed Pakkeer

On Fri, Apr 10, 2015 at 1:40 PM, Nick Fisk  wrote:

> Hi Mohamed,
>
> There was an excellent document posted to the list by Mark Nelson a number
> of weeks back showing CPU utilisation for both replicated and erasure coded
> clusters under different operations (read/write/rebuild...etc)
>
> If you search for that it will probably answer quite a few of your
> questions. One thing that came from it that was important for Erasure
> Coding, is that the increase in the total number of shards, increases the
> CPU requirements, so it's not a simple black and white answer.
>
> Nick
>
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Mohamed Pakkeer
> > Sent: 10 April 2015 08:57
> > To: Christian Balzer
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Motherboard recommendation?
> >
> > Hi Blazer,
> >
> > Ceph recommends 1GHz CPU power per OSD. Is it applicable for both
> > replication and erasure coding based cluster? or will we require more CPU
> > power( more than 1Ghz per OSD) for erasure coding.
> >
> > We are running a test cluster with 15 * 4U servers and each server
> contains
> > 36 OSDs, dual Intel 2630 V3 processor and 96 GB RAM. We are getting
> > average CPU load 4 to 5% on cluster ideal condition. Is it normal
> behavior of
> > CPU load or  Could you advice the average CPU load requires in ideal
> > condition for good erasure coding cluster?
> >
> > ceph version 0.87.1
> > cluster : Erasure coding and CephFS
> >
> >
> >
> > Thanks in advance
> >
> > Cheers,
> > K.Mohamed Pakkeer
> >
> >
> > On Fri, Apr 10, 2015 at 12:20 PM, Christian Balzer 
> wrote:
> >
> > Hello,
> >
> > On Thu, 09 Apr 2015 10:00:37 +0200 Markus Goldberg wrote:
> >
> > > Hi,
> > > i have a backup-storage with ceph 0,93
> > Living on the edge...
> >
> > > As every backup-system it is only been written and hopefully never
> read.
> > >
> > What and how are you backing up?
> > As in, lots of small files copied like with rsync or a stream into a big
> > archive file like with bacula?
> > Is the Ceph target a RBD image or CephFS?
> >
> > > The hardware is 3 Supermicro SC847-cases with 30 SATA-HDDS each (2- and
> > > 4-TB-WD-disks) = 250TB
> > Uneven disk sizes can make for fun (not) later on.
> >
> > > I have realized, that the motherboards and CPUs are totally undersized,
> > > so i want to install new boards.
> > What's in there now?
> >
> > > I'm thinking of the following:
> > > 3 Supermicro X10DRH-CT or X10DRC-T4+ with 128GB memory each.
> > > What do you think about these boards? Will they fit into the SC847?
> > They should, but that question, like others, should best be asked to
> > Supermicro or your vendor.
> > As it will be their problem, not yours if they gave you a wrong answer.
> > Same goes for the question if the onboard controller can see the devices
> > behind the backplane expander (I would strongly expect the answer to be
> > "yes, of course")
> >
> > > They have SAS and 10G-Base-T onboard, so no extra controller seems to
> be
> > > necessary.
> > That's a LSI 3108 SAS controller.
> > No IT mode available for it AFAIK.
> > Thus not suitable/recommended to hook up individual JBOD disks.
> >
> > > What Xeon-v3 should i take, how many cores?
> > http://ark.intel.com/products/family/78583/Intel-Xeon-Processor-E5-v3-
> > Family
> >
> > Find the best (price, TDP) combination that gives you at least 30GHz of
> > total CPU power.
> > So the E5-2630 v3 comes to mind.
> >
> > > Does anyone know if M.2-SSDs are supported in their pci-e-slots?
> > >
> > One would think so, what SSDs where you thinking about?
> > How much data are you backing up per day (TDW, endurance of SSDs)?
> >
> > But realistically, with just 3 nodes and 30 HDDs per node, the best you
> > can hope for is probably 10 HDDs per journal SSD, so a single SSD failure
> > would impact your cluster significantly.
> >
> > However _if_ you plan on using journal SSDs and _if_ your backups consist
> > of a lot of small writes, get as much CPU power as you can afford.
> >
> > Christian
> > > Thank you very much,
> > >Markus
> > >
> > >
> --
> > > Markus Goldberg   Universität Hildesheim
> > >Rechenzentrum
> > > Tel +49 5121 88392822 Universitätsplatz 1, D-31141 Hildesheim, Germany
> > > Fax +49 5121 88392823 email goldb...@uni-hildesheim.de
> > >
> --
> > >

Re: [ceph-users] Motherboard recommendation?

2015-04-16 Thread Nick Fisk
Hi Mohamed,

I asked Mark the exact same question about his report; on his test hardware he 
had slightly less than 1 GHz per OSD, so he was fairly sure the guideline was 
still reasonably accurate.

However, it's hard to come up with an exact figure, as the CPU usage will change 
with the varying K/M values for the pool. It will also change depending on the 
speed of the underlying disks and whether SSD journals are being used. I would 
imagine the true value is probably somewhere between 1 and 1.5 GHz per OSD for 
erasure coding, so your best bet is to aim for the upper value if you are 
worried about maxing out the CPUs.
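
As a very rough sanity check against the hardware you mentioned (assuming I
have the E5-2630 v3 spec right at 8 cores / 2.4 GHz base; back-of-the-envelope
numbers only):

    2 x E5-2630 v3    ~= 2 * 8 * 2.4 GHz ~= 38 GHz per node
    36 OSDs * 1.0 GHz  = 36 GHz   (replication guideline: roughly OK)
    36 OSDs * 1.5 GHz  = 54 GHz   (upper erasure-coding estimate: likely undersized)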

Nick


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mohamed Pakkeer
> Sent: 16 April 2015 10:40
> To: Nick Fisk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Motherboard recommendation?
> 
> Hi Nick,
> 
> Thanks Nick for your reply. There is a clear picture on the hardware
> requirement for replication( 1Ghz per osd). But We cant find any document
> related to hardware recommendation  for erasure coding.I read the mark
> nelson report. But still some erasure coding testing shows 100% CPU
> utilization. So what would be recommended CPU processing power for those
> testing to avoid the 100% CPU utilization.
> 
> cheers
> K.Mohamed Pakkeer
> 
> On Fri, Apr 10, 2015 at 1:40 PM, Nick Fisk  wrote:
> Hi Mohamed,
> 
> There was an excellent document posted to the list by Mark Nelson a
> number of weeks back showing CPU utilisation for both replicated and
> erasure coded clusters under different operations (read/write/rebuild...etc)
> 
> If you search for that it will probably answer quite a few of your questions.
> One thing that came from it that was important for Erasure Coding, is that the
> increase in the total number of shards, increases the CPU requirements, so
> it's not a simple black and white answer.
> 
> Nick
> 
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of
> > Mohamed Pakkeer
> > Sent: 10 April 2015 08:57
> > To: Christian Balzer
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Motherboard recommendation?
> >
> > Hi Blazer,
> >
> > Ceph recommends 1GHz CPU power per OSD. Is it applicable for both
> > replication and erasure coding based cluster? or will we require more CPU
> > power( more than 1Ghz per OSD) for erasure coding.
> >
> > We are running a test cluster with 15 * 4U servers and each server contains
> > 36 OSDs, dual Intel 2630 V3 processor and 96 GB RAM. We are getting
> > average CPU load 4 to 5% on cluster ideal condition. Is it normal behavior 
> > of
> > CPU load or  Could you advice the average CPU load requires in ideal
> > condition for good erasure coding cluster?
> >
> > ceph version 0.87.1
> > cluster : Erasure coding and CephFS
> >
> >
> >
> > Thanks in advance
> >
> > Cheers,
> > K.Mohamed Pakkeer
> >
> >
> > On Fri, Apr 10, 2015 at 12:20 PM, Christian Balzer  wrote:
> >
> > Hello,
> >
> > On Thu, 09 Apr 2015 10:00:37 +0200 Markus Goldberg wrote:
> >
> > > Hi,
> > > i have a backup-storage with ceph 0,93
> > Living on the edge...
> >
> > > As every backup-system it is only been written and hopefully never read.
> > >
> > What and how are you backing up?
> > As in, lots of small files copied like with rsync or a stream into a big
> > archive file like with bacula?
> > Is the Ceph target a RBD image or CephFS?
> >
> > > The hardware is 3 Supermicro SC847-cases with 30 SATA-HDDS each (2-
> and
> > > 4-TB-WD-disks) = 250TB
> > Uneven disk sizes can make for fun (not) later on.
> >
> > > I have realized, that the motherboards and CPUs are totally undersized,
> > > so i want to install new boards.
> > What's in there now?
> >
> > > I'm thinking of the following:
> > > 3 Supermicro X10DRH-CT or X10DRC-T4+ with 128GB memory each.
> > > What do you think about these boards? Will they fit into the SC847?
> > They should, but that question, like others, should best be asked to
> > Supermicro or your vendor.
> > As it will be their problem, not yours if they gave you a wrong answer.
> > Same goes for the question if the onboard controller can see the devices
> > behind the backplane expander (I would strongly expect the answer to be
> > "yes, of course")
> >
> > > They have SAS and 10G-Base-T onboard, so no extra controller seems to
> be
> > > necessary.
> > That's a LSI 3108 SAS controller.
> > No IT mode available for it AFAIK.
> > Thus not suitable/recommended to hook up individual JBOD disks.
> >
> > > What Xeon-v3 should i take, how many cores?
> > http://ark.intel.com/products/family/78583/Intel-Xeon-Processor-E5-v3-
> > Family
> >
> > Find the best (price, TDP) combination that gives you at least 30GHz of
> > total CPU power.
> > So the E5-2630 v3 comes to mind.
> >
> > > Does anyone know if M.2-SSDs are supported in their pci-e-slots?
> > >
> > One would think so, what SSDs where you thinking about?
> > How much dat

Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1

2015-04-16 Thread Steffen W Sørensen

> On 16/04/2015, at 11.09, Christian Balzer  wrote:
> 
> On Thu, 16 Apr 2015 10:46:35 +0200 Steffen W Sørensen wrote:
> 
>>> That later change would have _increased_ the number of recommended PG,
>>> not decreased it.
>> Weird as our Giant health status was ok before upgrading to Hammer…
>> 
> I'm pretty sure the "too many" check was added around then, and the the
> "too little" warning one earlier.
Okay, that might explain why "too many" shows up now :)

>> It seems to me that you increased PG counts assuming that the formula
>>> is per pool.
>> Well yes maybe, believe we bumped PGs per status complain in Giant
>> mentioned explicit different pool names, eg. too few PGs in …
> Probably something like "less then 20 PGs" or some such, right?
Probably yes; at least fewer than what seemed good for proper distribution.

> Your cluster (OSD count) needs (should really, it is not a hard failure
> but a warning) to be high enough to satisfy the minimum amount of PGs, so
> (too) many pools with a small cluster will leave you between a rock and hard 
> place.
Right; maybe pgcalc should mention/explain a bit about considering the number of 
pools ahead of time as well...

>> - are you saying one needs to consider in advance #pools in a cluster
>> and factor this in when calculating the number of PGs?
>> 
> Yes. Of course the idea is that pools consume space, so if you have many,
> you also will have more OSDs to spread your PGs around.
In this case we wanted to test out radosgw & S3, and thus needed to create the 
required number of pools, which increased the PG count.
But so far there is no real data in the GW pools, as it failed to work with our 
S3-compatible app, and we have now removed those pools again.
We are back down to 4 pools, two for CephFS and two for RBD images, each with 
1024 PGs; still too many PGs, so we will try to consolidate the two RBD pools into 
one or two new pools with fewer PGs…
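For reference, the current per-pool pg_num can be listed like this (hammer-era
CLI; "rbd" is just an example pool name):

    ceph osd dump | grep '^pool'     # shows pg_num / pgp_num for every pool
    ceph osd pool get rbd pg_num     # or query a single pool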

>> - If so, how to decide which pool gets what #PG, as this is set per
>> pool, especially if one can’t precalc the amount objects ending up in
>> each pool?
> Dead reckoning. 
> As in, you should have some idea which pool is going to receive how much data.
> 
> Certainly, but unless you have a large enough cluster and pools that have
> predictable utilization, fewer pools are the answer.
Because this makes it easier to match PGs against the number of OSDs, I see.

It would be nice if the PG count could somehow be decoupled from pools, but then 
again, how would one figure out where each pool's objects are…
It is just convenient to have all data from a single app in a separate pool/namespace 
to easily see usage and perform management tasks :/

> It is for me, as I have clusters of similar small size and only one type
> of usage, RBD images. So they have 1 or 2 pools and that's it.
> 
> This also results in the smoothest data distribution possible of course.
Right, thanks for sharing!

/Steffen


Re: [ceph-users] Rados Gateway and keystone

2015-04-16 Thread ghislain.chevalier
Hi,

I finally configured a CloudBerry profile by setting what seems to be the right 
endpoint for object storage according to the OpenStack environment: 
myrgw:myport/swift/v1
I got a "204 No Content" error even though 2 containers, with objects in them, had 
previously been created by a Swift operation.

In the log, I saw a dialogue between the rgw and Keystone, but the right service 
does not seem to be selected and the id becomes anonymous.

Any idea?

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of 
ghislain.cheval...@orange.com
Sent: Wednesday, 15 April 2015 18:39
To: ceph-users
Subject: Re: [ceph-users] Rados Gateway and keystone

Hi,

Despite the creation of EC2 credentials, which provide an access key and a 
secret key for a user, it is still impossible to connect using S3 
(Forbidden/Access Denied).
Everything works using Swift (create container, list containers, get object, put 
object, delete object).
I use the CloudBerry client to do so.

Does anyone know how I can check whether the interoperability between Keystone and 
the rgw is correctly set up?
In the rgw pools? In the radosgw metadata?

Best regards

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of 
ghislain.cheval...@orange.com
Sent: Wednesday, 15 April 2015 13:16
To: Erik McCormick
Cc: ceph-users
Subject: Re: [ceph-users] Rados Gateway and keystone

Thanks a lot
That helps.

From: Erik McCormick [mailto:emccorm...@cirrusseven.com]
Sent: Monday, 13 April 2015 18:32
To: CHEVALIER Ghislain IMT/OLPS
Cc: ceph-users
Subject: Re: [ceph-users] Rados Gateway and keystone

I haven't really used the S3 stuff much, but the credentials should be in 
keystone already. If you're in horizon, you can download them under Access and 
Security->API Access. Using the CLI you can use the openstack client like 
"openstack credential " or with the 
keystone client like "keystone ec2-credentials-list", etc.  Then you should be 
able to feed those credentials to the rgw like a normal S3 API call.

Cheers,
Erik

On Mon, Apr 13, 2015 at 10:16 AM, 
mailto:ghislain.cheval...@orange.com>> wrote:
Hi all,

Coming back to that issue.

I successfully used Keystone users with the Rados Gateway and the Swift API, but I 
still don't understand how it can work with the S3 API, i.e. S3 users 
(AccessKey/SecretKey).

I found the swift3 initiative, but I think it's only applicable in a pure OpenStack 
Swift environment, by setting up a specific plug-in.
https://github.com/stackforge/swift3

An rgw can be, at the same time, under Keystone control and standard 
radosgw-admin control if:
- for Swift, you use the right authentication service (Keystone or internal)
- for S3, you use the internal authentication service
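
For reference, the keystone-related rgw settings I mean look roughly like this in
ceph.conf (the values are placeholders and the section name is just the usual
example; option names are as per the documentation):

    [client.radosgw.gateway]
    rgw keystone url = http://keystone-host:35357
    rgw keystone admin token = <admin token>
    rgw keystone accepted roles = Member, admin
    rgw s3 auth use keystone = true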

So, my questions are still valid.
How can an rgw work for S3 users if they are stored in Keystone? What are the 
access key and secret key?
What is the purpose of the "rgw s3 auth use keystone" parameter?

Best regards

--
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of 
ghislain.cheval...@orange.com
Sent: Monday, 23 March 2015 14:03
To: ceph-users
Subject: [ceph-users] Rados Gateway and keystone

Hi All,

I just want to be sure about the Keystone configuration for the Rados Gateway.

I read the documentation http://ceph.com/docs/master/radosgw/keystone/ and 
http://ceph.com/docs/master/radosgw/config-ref/?highlight=keystone
but I didn't catch whether, after having configured the Rados Gateway (ceph.conf) to 
use Keystone, it becomes mandatory to create all the users in it.

In other words, can an rgw be, at the same time, under Keystone control and 
standard radosgw-admin control?
How does it work for S3 users?
What is the purpose of the "rgw s3 auth use keystone" parameter?

Best regards

- - - - - - - - - - - - - - - - -
Ghislain Chevalier
+33299124432
+33788624370
ghislain.cheval...@orange.com

Re: [ceph-users] mds crashing

2015-04-16 Thread Adam Tygart
(Adding back to the list)

We've not seen any slow requests near that badly behind. Leading up to
the crash, the furthest behind I saw any request was ~90 seconds. Here
is the cluster log leading up to the mds crashes.
http://people.beocat.cis.ksu.edu/~mozes/ceph-mds-crashes-20150415.log

--
Adam

On Thu, Apr 16, 2015 at 1:35 AM, Yan, Zheng  wrote:
> On Thu, Apr 16, 2015 at 10:44 AM, Adam Tygart  wrote:
>> We did that just after Kyle responded to John Spray above. I am
>> rebuilding the kernel now to include dynamic printk support.
>>
>
> Maybe the first crash was caused by hang request in MDS. Is there
> warnings like "cluster [WRN] slow request [several thousands or more ]
> seconds old, received at ...: client_request(client.734537:23 getattr
> pAsLsXsFs ...) "  in your ceph cluster log.
>
> Regards
> Yan, Zheng
>
>> --
>> Adam
>>
>> On Wed, Apr 15, 2015 at 9:37 PM, Yan, Zheng  wrote:
>>> On Thu, Apr 16, 2015 at 10:24 AM, Adam Tygart  wrote:
 I don't have "dynamic_debug" enabled in the currently running kernel,
 so I can't bump the verbosity of the ceph functions. I can rebuild the
 kernel and reboot it to enable dynamic_debug, but then we'll have to
 wait for when we re-trigger the bug. Attached is the mdsc file.

 As for getting the mds back running, we put a route in the faulty
 client to redirect ceph traffic to the loopback device. Started the
 mds again, waited for the full startup sequence to finish for the mds
 and re-set the normal routing. That seemed to cleanup the existing
 session and allow the mds to live and the client to reconnect. With
 the above mds requests still pending/hung, of course.
>>>
>>> did you do the trick before? the trick leaves the client in ill state.
>>> MDS will crash again after the client sends another 3M requests to it.
>>>
>>> Regards
>>> Yan, Zheng
>>>

 --
 Adam

 On Wed, Apr 15, 2015 at 9:04 PM, Yan, Zheng  wrote:
> On Thu, Apr 16, 2015 at 9:48 AM, Adam Tygart  wrote:
>> What is significantly smaller? We have 67 requests in the 16,400,000
>> range and 250 in the 18,900,000 range.
>>
>
> that explains the crash. could you help me to debug this issue.
>
>  send /sys/kernel/debug/ceph/*/mdsc to me.
>
>  run "echo module ceph +p > /sys/kernel/debug/dynamic_debug/control"
> on the cephfs mount machine
>  restart the mds and wait until it crash again
>  run "echo module ceph -p > /sys/kernel/debug/dynamic_debug/control"
> on the cephfs mount machine
>  send kernel message of the cephfs mount machine to me (should in
> /var/log/kerne.log or /var/log/message)
>
> to recover from the crash. you can either force reset the machine
> contains cephfs mount or add "mds wipe sessions = 1" to mds section of
> ceph.conf
>
> Regards
> Yan, Zheng
>
>
>> Thanks,
>>
>> Adam
>>
>> On Wed, Apr 15, 2015 at 8:38 PM, Yan, Zheng  wrote:
>>> On Thu, Apr 16, 2015 at 9:07 AM, Adam Tygart  wrote:
 We are using 3.18.6-gentoo. Based on that, I was hoping that the
 kernel bug referred to in the bug report would have been fixed.

>>>
>>> The bug was supposed to be fixed, but you hit the bug again. could you
>>> check if the kernel client has any hang mds request. (check
>>> /sys/kernel/debug/ceph/*/mdsc on the machine that contain cephfs
>>> mount. If there is any request whose ID is significant smaller than
>>> other requests' IDs)
>>>
>>> Regards
>>> Yan, Zheng
>>>
 --
 Adam

 On Wed, Apr 15, 2015 at 8:02 PM, Yan, Zheng  wrote:
> On Thu, Apr 16, 2015 at 5:29 AM, Kyle Hutson  
> wrote:
>> Thank you, John!
>>
>> That was exactly the bug we were hitting. My Google-fu didn't lead 
>> me to
>> this one.
>
>
> here is the bug report http://tracker.ceph.com/issues/10449. It's a
> kernel client bug which causes the session map size increase
> infinitely. which version of linux kernel are using?
>
> Regards
> Yan, Zheng
>
>
>>
>> On Wed, Apr 15, 2015 at 4:16 PM, John Spray  
>> wrote:
>>>
>>> On 15/04/2015 20:02, Kyle Hutson wrote:

 I upgraded to 0.94.1 from 0.94 on Monday, and everything had been 
 going
 pretty well.

 Then, about noon today, we had an mds crash. And then the failover 
 mds
 crashed. And this cascaded through all 4 mds servers we have.

 If I try to start it ('service ceph start mds' on CentOS 7.1), it 
 appears
 to be OK for a little while. ceph -w goes through 'replay' 
 'reconnect'
 'rejoin' 'clientreplay' and 'active' but nearly immediately after 
 get

Re: [ceph-users] Ceph repo - RSYNC?

2015-04-16 Thread Paul Mansfield
On 16/04/15 09:55, Wido den Hollander wrote:
> It's on my radar to come up with a proper mirror system for Ceph. A
> simple Bash script which is in the Git repo which you can use to sync
> all Ceph packages and downloads.

I've now set up a mirror of ceph/rpm-hammer/rhel7 for our internal use
and a simple snapshotting script copies the mirror to a date-stamped
directory using hard links so as not to eat up lots of disk space.


the key bits of the script look somewhat like this (I'm copying/pasting
and editing without testing the results, and missing out various error
checks and information messages, so please don't just copy this into a
script blindly ;-)


#!/bin/bash

DDD=`date +%Y%m%d`

MIRRDIR=/fileserver/rhel/ceph
SNAPDIR=/fileserver/rhel/ceph-snapshots/ceph-$DDD
RSYNCSRC=rsync://eu.ceph.com/ceph


mkdir -p $SNAPDIR

# copy flags: a = archive, l = hard links, r = recursive,
# u = updated/newer files, v = verbose

# trailing slash style otherwise we end up with ceph-yymmdd/ceph/
nice cp -alruv $MIRRDIR/* $SNAPDIR/

if [ $? != 0 ]; then
echo "error"
exit
fi


# add other versions here:
for SRC in rpm-hammer/rhel7; do
    rsync --bwlimit=1024 -aiH --no-perms --numeric-ids \
        --delete --delete-after --delay-updates \
        --exclude="*.i686.rpm" \
        $RSYNCSRC/$SRC/ $MIRRDIR/$SRC/
done





Re: [ceph-users] Ceph repo - RSYNC?

2015-04-16 Thread Wido den Hollander


On 16-04-15 15:11, Paul Mansfield wrote:
> On 16/04/15 09:55, Wido den Hollander wrote:
>> It's on my radar to come up with a proper mirror system for Ceph. A
>> simple Bash script which is in the Git repo which you can use to sync
>> all Ceph packages and downloads.
> 
> I've now set up a mirror of ceph/rpm-hammer/rhel7 for our internal use
> and a simple snapshotting script copies the mirror to a date-stamped
> directory using hard links so as not to eat up lots of disk space.
> 

Yes, that works, but I also want to make sure all docs are copied.

Anyway, thanks for sharing!

Wido

> 
> the key bits of the script look somewhat like this (I'm copying/pasting
> and editing without testing the results, and missing out various error
> checks and information messages, so please don't just copy this into a
> script blindly ;-)
> 
> 
> #!/bin/bash
> 
> DDD=`date +%Y%m%d`
> 
> MIRRDIR=/fileserver/rhel/ceph
> SNAPDIR=/fileserver/rhel/ceph-snapshots/ceph-$DDD
> RSYNCSRC=rsync://eu.ceph.com/ceph
> 
> 
> mkdir -p $SNAPDIR
> 
> # copy flags: a = archive, l = hard links, r = recursive,
> # u = updated/newer files, v = verbose
> 
> # trailing slash style otherwise we end up with ceph-yymmdd/ceph/
> nice cp -alruv $MIRRDIR/* $SNAPDIR/
> 
> if [ $? != 0] ; then
>   echo "error"
>   exit
> fi
> 
> 
> # add other versions here:
> for SRC in rpm-hammer/rhel7
> rsync --bwlimit=1024 -aiH --no-perms --numeric-ids \
>   --delete --delete-after --delay-updates \
>   --exclude="*.i686.rpm" \
>$RSYNCSRC/$SRC/ $MIRRDIR/$SRC/
> 
> 
> 
> 


[ceph-users] Ceph.com

2015-04-16 Thread Patrick McGarry
Hey cephers,

As most of you have no doubt noticed, ceph.com has been having
some...er..."issues" lately. Unfortunately this is some of the
holdover infrastructure stuff from being a startup without a big-boy
ops plan.

The current setup has ceph.com sharing a host with some of the nightly
build stuff to make it easier for gitbuilder tasks (that also build
the website doc) to coexist. Was this smart? No, probably not. Was it
the quick-and-dirty way for us to get stuff rolling when we were tiny?
Yep.

So, now that things are continuing to grow (website traffic load,
ceph-deploy key requests, number of simultaneous builds) we are
hitting the end of what one hard-working box can handle. I am in the
process of moving ceph.com to a new host so that build explosions won't
slag things like Ceph Day pages and the blog, but the doc may lag
behind a bit.

Hopefully since I'm starting with the website it won't hose up too many
of the other tasks, but bear with us while we split routing for a bit.
If you have any questions please feel free to poke me. Thanks.

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


Re: [ceph-users] Ceph.com

2015-04-16 Thread Ferber, Dan

Thanks for working on this, Patrick. I have looked for a mirror that I can point 
all the ceph.com references to in 
/usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py, so I can 
get ceph-deploy to work.

I tried eu.ceph.com, but it does not work for this.
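
If your ceph-deploy version supports it, pointing it at a mirror on the command
line may be easier than patching install.py. A rough example (the mirror path is
a guess; it may be el7 rather than rhel7 depending on the repo layout):

    ceph-deploy install --repo-url http://eu.ceph.com/rpm-hammer/el7 \
        --gpg-url http://eu.ceph.com/keys/release.asc <hostname>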

Dan Ferber
Software  Defined Storage
+1 651-344-1846
dan.fer...@intel.com

From: Patrick McGarry <pmcga...@redhat.com>
Date: Thursday, April 16, 2015 at 10:28 AM
To: Ceph Devel <ceph-de...@vger.kernel.org>, Ceph-User <ceph-us...@ceph.com>
Subject: [ceph-users] Ceph.com

Hey cephers,

As most of you have no doubt noticed, ceph.com has been having
some...er..."issues" lately. Unfortunately this is some of the
holdover infrastructure stuff from being a startup without a big-boy
ops plan.

The current setup has ceph.com sharing a host with some of the nightly
build stuff to make it easier for gitbuilder tasks (that also build
the website doc) to coexist. Was this smart? No, probably not. Was is
the quick-and-dirty way for us to get stuff rolling when we were tiny?
Yep.

So, now that things are continuing to grow (website traffic load,
ceph-deploy key requests, number of simultaneous builds) we are
hitting the end of what one hard-working box can handle. I am in the
process of moving ceph.com to a new host so that build explosions wont
slag things like Ceph Day pages and the blog, but the doc may lag
behind a bit.

Hopefully since I'm starting with the website it wont hose up too many
of the other tasks, but bear with us while we split routing for a bit.
If you have any questions please feel free to poke me. Thanks.

--

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph



Re: [ceph-users] Ceph.com

2015-04-16 Thread Sage Weil
We've fixed it so that 404 handling isn't done by wordpress/php and 
things are muuuch happier.  We've also moved all of the git stuff to 
git.ceph.com.  There is a redirect from http://ceph.com/git to 
git.ceph.com (tho no https on the new site yet) and a proxy for 
git://ceph.com.

Please let us know if anything still appears to be broken or slow!

Thanks-
sage


Re: [ceph-users] Ceph.com

2015-04-16 Thread Chris Armstrong
Thanks for the update, Patrick. Our Docker builds were failing due to the
mirror being down. I appreciate being able to check the mailing list and
quickly see what's going on!

Chris

On Thu, Apr 16, 2015 at 11:28 AM, Patrick McGarry 
wrote:

> Hey cephers,
>
> As most of you have no doubt noticed, ceph.com has been having
> some...er..."issues" lately. Unfortunately this is some of the
> holdover infrastructure stuff from being a startup without a big-boy
> ops plan.
>
> The current setup has ceph.com sharing a host with some of the nightly
> build stuff to make it easier for gitbuilder tasks (that also build
> the website doc) to coexist. Was this smart? No, probably not. Was is
> the quick-and-dirty way for us to get stuff rolling when we were tiny?
> Yep.
>
> So, now that things are continuing to grow (website traffic load,
> ceph-deploy key requests, number of simultaneous builds) we are
> hitting the end of what one hard-working box can handle. I am in the
> process of moving ceph.com to a new host so that build explosions wont
> slag things like Ceph Day pages and the blog, but the doc may lag
> behind a bit.
>
> Hopefully since I'm starting with the website it wont hose up too many
> of the other tasks, but bear with us while we split routing for a bit.
> If you have any questions please feel free to poke me. Thanks.
>
> --
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
>



-- 
*Chris Armstrong* | Deis Team Lead | Engine Yard

Deis is now part of Engine Yard! http://deis.io/deis-meet-engine-yard/


[ceph-users] switching journal location

2015-04-16 Thread Deneau, Tom
If my cluster is quiet and on one node I want to switch the location of the 
journal from
the default location to a file on an SSD drive (or vice versa), what is the
quickest way to do that?  Can I make a soft link to the new location and
do it without restarting the OSDs?

-- Tom Deneau, AMD




Re: [ceph-users] switching journal location

2015-04-16 Thread LOPEZ Jean-Charles
Hi Tom,

You will have to stop the OSD, flush the existing journal to ensure data 
consistency at the OSD level, and then switch over to the new journal location 
(initialise the journal, then start the OSD).

See this link for the step-by-step from Sébastien: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-August/042085.html
It's an old ML post, actually.
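
Roughly, the sequence looks like this. This is only a sketch: OSD id 12 and the
SSD journal path are examples, and the start/stop commands depend on your distro
and init system.

    service ceph stop osd.12                    # stop the OSD
    ceph-osd -i 12 --flush-journal              # flush the old journal into the store
    ln -sf /ssd/journal-osd.12 /var/lib/ceph/osd/ceph-12/journal   # point at the new location
    ceph-osd -i 12 --mkjournal                  # initialise the new journal
    service ceph start osd.12                   # start the OSD again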

Cheers
JC

> On 16 Apr 2015, at 15:17, Deneau, Tom  wrote:
> 
> If my cluster is quiet and on one node I want to switch the location of the 
> journal from
> the default location to a file on an SSD drive (or vice versa), what is the
> quickest way to do that?  Can I make a soft link to the new location and
> do it without restarting the OSDs?
> 
> -- Tom Deneau, AMD
> 
> 



Re: [ceph-users] ceph on Debian Jessie stopped working

2015-04-16 Thread Gregory Farnum
On Wed, Apr 15, 2015 at 9:31 AM, Chad William Seys
 wrote:
> Hi All,
> Earlier ceph on Debian Jessie was working.  Jessie is running 3.16.7 .
>
> Now when I modprobe rbd , no /dev/rbd appear.
>
> # dmesg | grep -e rbd -e ceph
> [   15.814423] Key type ceph registered
> [   15.814461] libceph: loaded (mon/osd proto 15/24)
> [   15.831092] rbd: loaded
> [   22.084573] rbd: no image name provided
> [   22.230176] rbd: no image name provided
>
>
> Some files appear under /sys
> ls /sys/devices/rbd
> power  uevent
>
> ceph-fuse /mnt/cephfs just hangs.
>
> I haven't changed the ceph config, but possibly there were package updates.  I
> did install a earlier Jessie kernel from a machine which is still working and
> rebooted.  No luck.
>
> Any ideas of what to check next?

Well, with those symptoms (and I'm not familiar with the rbd dmesg
output, but just from reading what it says) it sure looks like you've
lost some of your config state. Either the ceph keyring or the whole
config file.
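A couple of quick sanity checks (the paths and the pool/image name are just the
usual defaults/placeholders, adjust to your setup):

    ls -l /etc/ceph/            # ceph.conf and the client keyring should still be there
    ceph -s                     # does the CLI reach the monitors at all?
    rbd map rbd/<image-name>    # "no image name provided" suggests the map call was malformed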
Did you ever have CephFS working, or was the attempt to mount it just
another check after rbd failed?
-Greg


Re: [ceph-users] OSDs not coming up on one host

2015-04-16 Thread Gregory Farnum
The monitor looks like it's not generating a new OSDMap including the
booting OSDs. I could say with more certainty what's going on if I had the
monitor log file, but I'm betting you've got one of the noin or noup
family of flags set. I *think* these will be output in "ceph -w" or in
"ceph osd dump", although I can't say for certain in Firefly.
-Greg

On Fri, Apr 10, 2015 at 1:57 AM, Jacob Reid  wrote:
> On Fri, Apr 10, 2015 at 09:55:20AM +0100, Jacob Reid wrote:
>> On Thu, Apr 09, 2015 at 05:21:47PM +0100, Jacob Reid wrote:
>> > On Thu, Apr 09, 2015 at 08:46:07AM -0700, Gregory Farnum wrote:
>> > > On Thu, Apr 9, 2015 at 8:14 AM, Jacob Reid  
>> > > wrote:
>> > > > On Thu, Apr 09, 2015 at 06:43:45AM -0700, Gregory Farnum wrote:
>> > > >> You can turn up debugging ("debug osd = 10" and "debug filestore = 10"
>> > > >> are probably enough, or maybe 20 each) and see what comes out to get
>> > > >> more information about why the threads are stuck.
>> > > >>
>> > > >> But just from the log my answer is the same as before, and now I don't
>> > > >> trust that controller (or maybe its disks), regardless of what it's
>> > > >> admitting to. ;)
>> > > >> -Greg
>> > > >>
>> > > >
>> > > > Ran with osd and filestore debug both at 20; still nothing jumping out 
>> > > > at me. Logfile attached as it got huge fairly quickly, but mostly 
>> > > > seems to be the same extra lines. I tried running some test I/O on the 
>> > > > drives in question to try and provoke some kind of problem, but they 
>> > > > seem fine now...
>> > >
>> > > Okay, this is strange. Something very wonky is happening with your
>> > > scheduler — it looks like these threads are all idle, and they're
>> > > scheduling wakeups that handle an appreciable amount of time after
>> > > they're supposed to. For instance:
>> > > 2015-04-09 15:56:55.953116 7f70a7963700 20
>> > > filestore(/var/lib/ceph/osd/osd.15) sync_entry woke after 5.416704
>> > > 2015-04-09 15:56:55.953153 7f70a7963700 20
>> > > filestore(/var/lib/ceph/osd/osd.15) sync_entry waiting for
>> > > max_interval 5.00
>> > >
>> > > This is the thread that syncs your backing store, and it always sets
>> > > itself to get woken up at 5-second intervals — but here it took >5.4
>> > > seconds, and later on in your log it takes more than 6 seconds.
>> > > It looks like all the threads which are getting timed out are also
>> > > idle, but are taking so much longer to wake up than they're set for
>> > > that they get a timeout warning.
>> > >
>> > > There might be some bugs in here where we're expecting wakeups to be
>> > > more precise than they can be, but these sorts of misses are
>> > > definitely not normal. Is this server overloaded on the CPU? Have you
>> > > done something to make the scheduler or wakeups wonky?
>> > > -Greg
>> >
>> > CPU load is minimal - the host does nothing but run OSDs and has 8 cores 
>> > that are all sitting idle with a load average of 0.1. I haven't done 
>> > anything to scheduling. That was with the debug logging on, if that could 
>> > be the cause of any delays. A scheduler issue seems possible - I haven't 
>> > done anything to it, but `time sleep 5` run a few times returns anything 
>> > spread randomly from 5.002 to 7.1(!) seconds but mostly in the 5.5-6.0 
>> > region where it managed fairly consistently <5.2 on the other servers in 
>> > the cluster and <5.02 on my desktop. I have disabled the CPU power saving 
>> > mode as the only thing I could think of that might be having an effect on 
>> > this, and running the same test again gives more sane results... we'll see 
>> > if this reflects in the OSD logs or not, I guess. If this is the cause, 
>> > it's probably something that the next version might want to make a 
>> > specific warning case of detecting. I will keep you updated as to their 
>> > behaviour now...
>>
>> Overnight, nothing changed - I am no longer seeing the timeout in the logs 
>> but all the OSDs in questions are still happily sitting at booting and 
>> showing as down in the tree. Debug 20 logfile attached again.
> ...and here actually *is* the logfile, which I managed to forget... must be 
> Friday, I guess.
>
>


Re: [ceph-users] ceph-osd failure following 0.92 -> 0.94 upgrade

2015-04-16 Thread Gregory Farnum
Yeah, you're going to have to modify one of the journal readers. I
don't remember exactly what the issue is so I don't know which
direction would be easiest — but you (unlike the Ceph upstream) can
have high confidence that the encoding bug is caused by v0.92 rather
than some other bug or hardware failure. You may be able to take
advantage of that in the decode path if you check through the bug
conditions carefully enough.
-Greg

On Fri, Apr 10, 2015 at 4:42 PM, Dirk Grunwald
 wrote:
> I've gone through the ceph-users mailing list and the only suggested fix (by
> Sage) was to roll back to V0.92, do ceph-osd -i NNN --flush-journal and then
> upgrade to V0.93 (which was the issue at the time).
>
> However, I've done that and the V0.92 code faults for a different reason,
> which I suspect is a transaction added when the V0.94 code started to run.
> Out of 60 OSD's, about 50-55 have this problem.
>
> My three solutions would seem to be:
> (1) rebuild the journals losing all the journal transactions (not ideal)
> (2) git clone the v0.92 code, modify the journal commit code to not barf on
> the V0.94 transactions
> (3) git cone the v0.94 code, modify the journal commit code to not barf on
> the V0.92 transactions
>
> Option #1 would lead to data loss but I think not OSD loss (which would be
> terrible).
>
> Option #3 would seem more sensible than Option #2, but I assume that if #3
> was easy to do,
> then it would have been included in the V0.94 codebase instead of the errata
> in the V0.80upgrade
> comments which got me into this fix.
>
> Suggestions of which is the better route or an alternate fix? Right now, I
> have ~55 useless OSD's
> and a lot of lost data.
>
> On Thu, Apr 9, 2015 at 7:13 PM, Dirk Grunwald 
> wrote:
>>
>> The solution that would have prevented this now hours-long fix on my part was
>> buried in material labeled as "upgrade from 0.80.x Giant".
>>
>> To prevent others from having the same issue, it may make sense to move
>> the 0.92
>> issue to the forefront, like the single 0.93 issue called out.
>>
>>
>>
>> On Thu, Apr 9, 2015 at 5:34 PM, Gregory Farnum  wrote:
>>>
>>> If you dig into the list archives I think somebody else went through
>>> this when the issue was discovered and recovered successfully. But I
>>> don't know the details. :)
>>> -Greg
>>>
>>> On Thu, Apr 9, 2015 at 3:38 PM, Dirk Grunwald
>>>  wrote:
>>> > Aha. That would have been useful to see -- I saw the notice about 0.93,
>>> > but
>>> > not that.
>>> >
>>> > when I roll back to v0.92, I get a different error (see below)
>>> >
>>> > This doesn't seem very happy - any suggestions?
>>> >
>>> >
>>> > root@zfs2:~/XYZZY/v92# ceph-osd -d -i 4 --flush-journal
>>> > 2015-04-09 16:31:44.756113 7f987f822900  0 ceph version 0.92
>>> > (00a3ac3b67d93860e7f0b6e07319f11b14d0fec0), process ceph-osd, pid 12605
>>> > 2015-04-09 16:31:44.758743 7f987f822900  0
>>> > filestore(/var/lib/ceph/osd/ceph-4) backend btrfs (magic 0x9123683e)
>>> > 2015-04-09 16:31:44.807613 7f987f822900  0
>>> > genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
>>> > FIEMAP
>>> > ioctl is supported and appears to work
>>> > 2015-04-09 16:31:44.807673 7f987f822900  0
>>> > genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
>>> > FIEMAP
>>> > ioctl is disabled via 'filestore fiemap' config opt\
>>> > ion
>>> > 2015-04-09 16:31:45.148028 7f987f822900  0
>>> > genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
>>> > syncfs(2)
>>> > syscall fully supported (by glibc and kernel)
>>> > 2015-04-09 16:31:45.148163 7f987f822900  0
>>> > btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature:
>>> > CLONE_RANGE
>>> > ioctl is supported
>>> > 2015-04-09 16:31:45.923009 7f987f822900  0
>>> > btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature:
>>> > SNAP_CREATE
>>> > is supported
>>> > 2015-04-09 16:31:45.923673 7f987f822900  0
>>> > btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature:
>>> > SNAP_DESTROY
>>> > is supported
>>> > 2015-04-09 16:31:45.923979 7f987f822900  0
>>> > btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature:
>>> > START_SYNC
>>> > is supported (transid 372081)
>>> > 2015-04-09 16:31:46.381367 7f987f822900  0
>>> > btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature:
>>> > WAIT_SYNC is
>>> > supported
>>> > 2015-04-09 16:31:46.724449 7f987f822900  0
>>> > btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature:
>>> > SNAP_CREATE_V2 is supported
>>> > 2015-04-09 16:31:47.473175 7f987f822900  0
>>> > filestore(/var/lib/ceph/osd/ceph-4) mount: enabling PARALLEL journal
>>> > mode:
>>> > fs, checkpoint is enabled
>>> >  HDIO_DRIVE_CMD(identify) failed: Invalid argument
>>> > 2015-04-09 16:31:47.495711 7f987f822900  1 journal _open
>>> > /var/lib/ceph/osd/ceph-4/journal fd 16: 1072693248 bytes, block size
>>> > 4096
>>> > bytes, directio = 1, aio = 1
>>> > terminate called after throwing an instance of
>>> > 'ceph::buffer::malformed_input'
>>> >   what():  buffer::mal

Re: [ceph-users] Getting placement groups to place evenly (again)

2015-04-16 Thread Gregory Farnum
On Sat, Apr 11, 2015 at 12:11 PM, J David  wrote:
> On Thu, Apr 9, 2015 at 7:20 PM, Gregory Farnum  wrote:
>> Okay, but 118/85 = 1.38. You say you're seeing variance from 53%
>> utilization to 96%, and 53%*1.38 = 73.5%, which is *way* off your
>> numbers.
>
> 53% to 96% is with all weights set to default (i.e. disk size) and all
> reweights set to 1.  (I.e. before reweight-by-utilization and many
> hours of hand-tuning).

Ah, I see.

>
>> But it might just be faster to look for
>> anomalies within the size of important bits on the OSD — leveldb
>> stores, etc that don't correspond to the PG count).
>
> That would only work if I understood what you said and knew how to do it. :)

The OSD backing store sits on a regular filesystem. There are
directories within it for each PG, as well as for things like the
LevelDB instance embedded in each OSD.
If you're just getting unlucky with the big PGs ending up on OSDs
which already have too many PGs, then there's a CRUSH balancing
problem and you may be out of luck. But if, say, the LevelDB store is
just bigger on some OSDs than others for no particular reason, you
could maybe do something about that.
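
For example, something like this on a FileStore OSD would show whether the
omap/LevelDB directory is unusually large compared to the PG directories
(default paths assumed; osd.4 is just an example):

    du -sh /var/lib/ceph/osd/ceph-*/current/omap                     # LevelDB store per OSD
    du -sh /var/lib/ceph/osd/ceph-4/current/*_head | sort -h | tail  # biggest PG dirs on one OSD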

Since I now realize you did a bunch of reweighting to try and make
data match up I don't think you'll find something like badly-sized
LevelDB instances, though.

Final possibility which I guess hasn't been called out here is to make
sure that your CRUSH map is good and actually expected to place things
evenly. Can you share it?
Since you've got 38 OSDs and 8 nodes some of the hosts are clearly
different sizes; is there any correlation between which size the node
is and how full its OSDs are?
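Roughly, to dump the map (standard commands, adjust paths as needed):

ceph osd getcrushmap -o /tmp/crushmap
crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
ceph osd tree    # hierarchy, weights and up/down state per host/OSD

Comparing ceph osd tree with df output from each node (or with ceph osd df,
if you are already on Hammer) should make any host-size/fullness
correlation obvious.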
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] metadata management in case of ceph object storage and ceph block storage

2015-04-16 Thread Josef Johansson
Hi,

Maybe your mail went into other people's junk folders as well; that is at least
why I did not see it.

To your question, which I’m not sure I understand completely.

In Ceph you have three distinct types of services,

Mon, Monitors
MDS, Metadata Servers
OSD, Object Storage Devices

And some other concepts

PG, placement group
Object
Pool

So a Pool contains PGs, which contain Objects, in that order.
The Monitors keep track of pools and PGs, and the objects are kept on the OSDs.

In case you’re running CephFS, the Ceph File System, you also have files, which 
the MDS keeps track of.

So yes, you don’t need the MDS if you only use block storage and
object storage (e.g. images for KVM).

So the Mon keeps track of the metadata for the Pools and PGs,
and the MDS keeps track of all the files; hence the MDS should have at least 10x
the memory of the Mon.

I’m no Ceph expert, especially not on CephFS, but this is my picture of it :)
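
One way to see that there is no per-object metadata lookup for block and
object storage: placement is computed with CRUSH, so you can ask the cluster
where any object name would land, for example (assuming a pool named rbd and
an arbitrary object name):

ceph osd map rbd some-object-name

This prints the PG and the OSDs that object maps to, even if the object does
not exist yet, because the mapping is calculated rather than stored.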

Maybe the architecture docs could help you out? 
http://docs.ceph.com/docs/master/architecture/#cluster-map 


Hope that resolves your question.

Cheers,
Josef

> On 06 Apr 2015, at 18:51, pragya jain  wrote:
> 
> Please somebody reply my queries.
> 
> Thank yuo
>  
> -
> Regards
> Pragya Jain
> Department of Computer Science
> University of Delhi
> Delhi, India
> 
> 
> 
> On Saturday, 4 April 2015 3:24 PM, pragya jain  wrote:
> 
> 
> hello all!
> 
> As the documentation said "One of the unique features of Ceph is that it 
> decouples data and metadata".
> for applying the mechanism of decoupling, Ceph uses Metadata Server (MDS) 
> cluster.
> MDS cluster manages metadata operations, like open or rename a file
> 
> On the other hand, Ceph implementation for object storage as a service and 
> block storage as a service does not require MDS implementation.
> 
> My question is:
> In case of object storage and block storage, how does Ceph manage the 
> metadata?
> 
> Please help me to understand this concept more clearly.
> 
> Thank you
>  
> -
> Regards
> Pragya Jain
> Department of Computer Science
> University of Delhi
> Delhi, India
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cache-tier problem when cache becomes full

2015-04-16 Thread Xavier Serrano
Hello all,

We are trying to run some tests on a cache-tier Ceph cluster, but
we are encountering serious problems, which eventually render the cluster
unusable.

We are apparently doing something wrong, but we have no idea what
it could be. We'd really appreciate it if someone could point out what
we should do.

We are running Ceph version:
ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)

All nodes are Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-34-generic x86_64)

Our test cluster is:
 * disk-host-1: monitor, with 128 GB RAM

 * disk-brick-3, disk-brick-4, disk-brick-5: each node has:
   - 32 GB RAM
   - /dev/sda and /dev/sdb: 2 TB spinning HDDs
   - /dev/sdu: 400 GB SSD

 * disk-host-5: client, with 128 GB RAM

Please, find the ceph.conf file and the decompiled CRUSH map at the end
of this e-mail.


This is what we do:

(1) Create a pool named "cache_pool":
sudo ceph osd pool create cache_pool 32 32
sudo ceph osd pool set cache_pool crush_ruleset 4

(2) Increase PGs of the default "rbd" pool before putting anything in it:
sudo ceph osd pool set rbd pg_num 256
sudo ceph osd pool set rbd pgp_num 256

(3) Create cache-tier having the new "cache_pool" over "rbd":
sudo ceph osd tier add rbd cache_pool
sudo ceph osd tier cache-mode cache_pool writeback
sudo ceph osd tier set-overlay rbd cache_pool

(4) Configure some parameters for "cache_pool":
sudo ceph osd pool set cache_pool hit_set_type bloom
sudo ceph osd pool set cache_pool hit_set_count 1
sudo ceph osd pool set cache_pool hit_set_period 300
sudo ceph osd pool set cache_pool cache_min_flush_age 300
sudo ceph osd pool set cache_pool cache_min_evict_age 300
sudo ceph osd pool set cache_pool target_max_bytes 0
sudo ceph osd pool set cache_pool target_max_objects 0
sudo ceph osd pool set cache_pool cache_target_dirty_ratio .4
sudo ceph osd pool set cache_pool cache_target_full_ratio .8

(5) Create a 2 TB RBD image to run our tests:
sudo rbd create fiobench --size 2048000

(6) In the client (disk-host-5), map and mount the image:
sudo rbd map --image fiobench ---> result is /dev/rbd0
mkfs.xfs /dev/rbd0
mkdir /mnt/fio
mount /dev/rbd0 /mnt/fio

(7) Run the fio tests (http://packages.ubuntu.com/trusty/fio)
in the client. Please, find the fiobench.sh script at the end of
this e-mail with all the details.

fio creates 64 files of 30 GB each on the /mnt/fio filesystem
(built on top of a RADOS image) prior to its measurements. Creating the
files works OK, and the benchmark begins.
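
A sketch of that kind of workload, for illustration only (this is not the
actual fiobench.sh):

fio --name=randread --directory=/mnt/fio --numjobs=64 --size=30g \
    --ioengine=libaio --direct=1 --rw=randread --bs=4k --group_reporting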

After a while, the benchmark stalls. The read and write tests
completed, but the random read tests just hung. Inspecting the cluster,
we see that one OSD in the cache_pool has become full, and Ceph has
marked it down.
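
By "inspecting" we mean roughly the usual commands, e.g.:

ceph health detail   # reports the full / near-full OSDs
ceph df              # per-pool and global usage
ceph osd tree        # up/down state and placement of each OSD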

From that point, it is not possible to resume the benchmark,
and we are not able to get the cluster back to healthy (HEALTH_OK) again.

Any ideas will be very much appreciated.

Thank you very much for your time and your help.

Best regards,
- Xavier Serrano
- LCAC, Laboratori de Càlcul
- Departament d'Arquitectura de Computadors
- UPC, Universitat Politècnica de Catalunya, BarcelonaTECH


The /etc/ceph/ceph.conf file is:

[global]
fsid = 726babd1-c7df-4fed-8b5f-c5a70d35c4a0
mon_initial_members = disk-host-1
mon_host = 192.168.31.65
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = 192.168.31.0/24


The CRUSH map looks like this:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host disk-brick-4 {
id -2   # do not change unnecessarily
# weight 3.980
alg straw
hash 0  # rjenkins1
item osd.0 weight 1.810
item osd.3 weight 1.810
item osd.6 weight 0.360
}
host disk-brick-5 {
id -3   # do not change unnecessarily
# weight 3.980
alg straw
hash 0  # rjenkins1
item osd.1 weight 1.810
item osd.4 weight 1.810
item osd.7 weight 0.360
}
host disk-brick-6 {
id -4   # do not change unnecessarily
# weight 3.980
alg straw
hash 0  # rjenkins1
item osd.2 weight 1.810
item osd.5 weight 1.810
item osd.8 weight 0.360
}
root default {
id -1   # do not change unnecessarily
# weight 11.940
alg straw
hash 0  # rjenkins1
item osd.0 weight 1.810
item osd.3 weight 1.810
item osd.1 weight 1.810
item osd.4 weight 1.810
item osd.2 weight 1.810
item osd.5 weight 1.810
}

root cache {
id -5
alg straw
hash 0
item 

Re: [ceph-users] Cache-tier problem when cache becomes full

2015-04-16 Thread LOPEZ Jean-Charles
Hi Xavier

see comments inline

JC

> On 16 Apr 2015, at 23:02, Xavier Serrano  wrote:
> 
> Hello all,
> 
> We are trying to run some tests on a cache-tier Ceph cluster, but
> we are encountering serious problems, which eventually lead the cluster
> unusable.
> 
> We are apparently doing something wrong, but we have no idea of
> what it could be. We'd really appreciate if someone could point us what
> to do.
> 
> We are running Ceph version:
> ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)
> 
> All nodes are Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-34-generic x86_64)
> 
> Our test cluster is:
> * disk-host-1: monitor, with 128 GB RAM
> 
> * disk-brick-3, disk-brick-4, disk-brick-5: each node has:
>   - 32 GB RAM
>   - /dev/sda and /dev/sdb: 2 TB spinning HDDs
>   - /dev/sdu: 400 GB SSD
> 
> * disk-host-5: client, with 128 GB RAM
> 
> Please, find the ceph.conf file and the decompiled CRUSH map at the end
> of this e-mail.
> 
> 
> This is what we do:
> 
> (1) Create a pool named "cache_pool":
> sudo ceph osd pool create cache_pool 32 32
> sudo ceph osd pool set cache_pool crush_ruleset 4
> 
> (2) Increase PGs of the default "rbd" pool before putting anything in it:
> sudo ceph osd pool set rbd pg_num 256
> sudo ceph osd pool set rbd pgp_num 256
> 
> (3) Create cache-tier having the new "cache_pool" over "rbd":
> sudo ceph osd tier add rbd cache_pool
> sudo ceph osd tier cache-mode cache_pool writeback
> sudo ceph osd tier set-overlay rbd cache_pool
> 
> (4) Configure some parameters for "cache_pool":
> sudo ceph osd pool set cache_pool hit_set_type bloom
> sudo ceph osd pool set cache_pool hit_set_count 1
> sudo ceph osd pool set cache_pool hit_set_period 300
> sudo ceph osd pool set cache_pool cache_min_flush_age 300
> sudo ceph osd pool set cache_pool cache_min_evict_age 300
I would lower this parameter. If the cache pool really gets full, you will not
be able to evict objects that are less than 5 minutes old.
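For example (the value is only an illustration):

sudo ceph osd pool set cache_pool cache_min_evict_age 60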
—-
> sudo ceph osd pool set cache_pool target_max_bytes 0
> sudo ceph osd pool set cache_pool target_max_objects 0
—-
One or both of the two parameters above, between the dashed lines, must be set;
with both at 0 the ratios below have nothing to apply to.
> sudo ceph osd pool set cache_pool cache_target_dirty_ratio .4 <—— The ratio
> here is expressed as a proportion of target_max_bytes and/or
> target_max_objects
> sudo ceph osd pool set cache_pool cache_target_full_ratio .8  <—— The ratio
> here is expressed as a proportion of target_max_bytes and/or
> target_max_objects
—-
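A minimal sketch of what setting them could look like (the values are only an
illustration and must be sized for your 3x 400 GB SSDs and the replication
level of the cache pool):

sudo ceph osd pool set cache_pool target_max_bytes 300000000000   # ~300 GB cap
sudo ceph osd pool set cache_pool target_max_objects 1000000      # and/or an object cap

With one of these in place, cache_target_dirty_ratio .4 means flushing starts
at about 40% of the cap, and cache_target_full_ratio .8 means the agent evicts
to keep the pool below about 80% of it.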
> 
> (5) Create a 2 TB object to run our tests:
> sudo rbd create fiobench --size 2048000
> 
> (6) In the client (disk-host-5), map and mount the object:
> sudo rbd map --image fiobench ---> result is /dev/rbd0
> mkfs.xfs /dev/rbd0
> mkdir /mnt/fio
> mount /dev/rbd0 /mnt/fio
> 
> (7) Run the fio tests (http://packages.ubuntu.com/trusty/fio)
> in the client. Please, find the fiobench.sh script at the end of
> this e-mail with all the details.
> 
> fio creates 64 files of 30 GB each on the /mnt/fio filesystem
> (built on top of a RADOS image) prior to its measurements. Creating the
> files works OK, and the benchmark begins.
> 
> After a while, the benchmark becomes stalled. Read and write tests
> were completed, but random read tests just hung. Inspecting the cluster,
> we see that one OSD in the cache_pool has become full, and ceph has
> marked it down.
> 
> From that point, it is not possible to resume the benchmark,
> and we are not able to get the cluster healthy (HEALTH_OK) back again.
> 
> Any ideas will be very much appreciated.
> 
> Thank you very much for your time and your help.
> 
> Best regards,
> - Xavier Serrano
> - LCAC, Laboratori de Càlcul
> - Departament d'Arquitectura de Computadors
> - UPC, Universitat Politècnica de Catalunya, BarcelonaTECH
> 
> 
> The /etc/ceph/ceph.conf file is:
> 
> [global]
> fsid = 726babd1-c7df-4fed-8b5f-c5a70d35c4a0
> mon_initial_members = disk-host-1
> mon_host = 192.168.31.65
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> public_network = 192.168.31.0/24
> 
> 
> The CRUSH map looks like this:
> 
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> 
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
> 
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
> 
> # buckets
> host disk-brick-4 {
>   id -2   # do not change unnecessarily
>   # weight 3.980
>   alg straw
>   hash 0  # rjenkins1
>   item osd.0 weight 1.810
>   item osd.3 weight 1.810
>   item osd.6 w