Re: [ceph-users] USB pendrive as boot disk

2013-11-06 Thread james

Why? Recovery is served from the OSDs/SSDs, so why would Ceph be heavy on the OS disks?
There is nothing useful to read from those disks during a recovery.


See this thread: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-October/005378.html

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] s3 user can't create bucket

2013-11-06 Thread Yehuda Sadeh
On Tue, Nov 5, 2013 at 11:28 PM, lixuehui  wrote:
> Hi all:
>
> I failed to create a bucket with the S3 API; the error is 403 'Access Denied'. In
> fact, I've given the user write permission.
> { "user_id": "lxh",
>   "display_name": "=lxh",
>   "email": "",
>   "suspended": 0,
>   "max_buckets": 1000,
>   "auid": 0,
>   "subusers": [],
>   "keys": [
> { "user": "lxh",
>   "access_key": "JZ9N42JQY636PTTZ76VZ",
>   "secret_key": "2D37kjLXda7dPxGpjJ3ZhNCBHzd9wmxoJnf9FcQo"}],
>   "swift_keys": [],
>   "caps": [
> { "type": "usage",
>   "perm": "*"},
> { "type": "user",
>   "perm": "*"}],
>   "op_mask": "read, write, delete",
>   "default_placement": "",
>   "placement_tags": []}
>
> At the same time, there is no '\' generated in the secret_key.
>
> 2013-11-06 15:20:31.787363 7f167df9b700  2 req 1:0.000522::PUT
> /my_bucket/::initializing
> 2013-11-06 15:20:31.787435 7f167df9b700 10 host=cephtest.com
> rgw_dns_name=ceph-osd26
> 2013-11-06 15:20:31.787929 7f167df9b700 10 s->object=
> s->bucket=my_bucket
> 2013-11-06 15:20:31.788085 7f167df9b700 20 FCGI_ROLE=RESPONDER
> 2013-11-06 15:20:31.788107 7f167df9b700 20 SCRIPT_URL=/my_bucket/
> 2013-11-06 15:20:31.788119 7f167df9b700 20
> SCRIPT_URI=http://cephtest.com/my_bucket/
> 2013-11-06 15:20:31.788130 7f167df9b700 20 HTTP_HOST=cephtest.com
> 2013-11-06 15:20:31.788140 7f167df9b700 20 HTTP_ACCEPT_ENCODING=identity
> 2013-11-06 15:20:31.788151 7f167df9b700 20 HTTP_DATE=Wed, 06 Nov 2013
> 07:20:31 GMT
> 2013-11-06 15:20:31.788162 7f167df9b700 20 CONTENT_LENGTH=0
> 2013-11-06 15:20:31.788172 7f167df9b700 20 HTTP_USER_AGENT=Boto/2.15.0
> Python/2.7.3 Linux/3.5.0-23-generic
> 2013-11-06 15:20:31.788182 7f167df9b700 20 PATH=/usr/local/bin:/usr/bin:/bin
> 2013-11-06 15:20:31.788193 7f167df9b700 20 SERVER_SIGNATURE=
> 2013-11-06 15:20:31.788203 7f167df9b700 20 SERVER_SOFTWARE=Apache/2.2.22
> (Ubuntu)
> 2013-11-06 15:20:31.788213 7f167df9b700 20 SERVER_NAME=cephtest.com
> 2013-11-06 15:20:31.788223 7f167df9b700 20 SERVER_ADDR=192.168.50.116
> 2013-11-06 15:20:31.788234 7f167df9b700 20 SERVER_PORT=80
> 2013-11-06 15:20:31.788247 7f167df9b700 20 REMOTE_ADDR=192.168.50.116
> 2013-11-06 15:20:31.788260 7f167df9b700 20 DOCUMENT_ROOT=/var/www/
> 2013-11-06 15:20:31.788311 7f167df9b700 20 SERVER_ADMIN=[no address given]
> 2013-11-06 15:20:31.788324 7f167df9b700 20
> SCRIPT_FILENAME=/var/www/s3gw.fcgi
> 2013-11-06 15:20:31.788336 7f167df9b700 20 REMOTE_PORT=45737
> 2013-11-06 15:20:31.788348 7f167df9b700 20 GATEWAY_INTERFACE=CGI/1.1
> 2013-11-06 15:20:31.788361 7f167df9b700 20 SERVER_PROTOCOL=HTTP/1.1
> 2013-11-06 15:20:31.788374 7f167df9b700 20 REQUEST_METHOD=PUT
> 2013-11-06 15:20:31.788389 7f167df9b700 20
> QUERY_STRING=[E=HTTP_AUTHORIZATION:AWS
> JZ9N42JQY636PTTZ76VZ:ttIro1R21j6GAjVsDITrz5DK66Y=,L]

Your rewrite rule is broken.

> 2013-11-06 15:20:31.788471 7f167df9b700 20 REQUEST_URI=/my_bucket/
> 2013-11-06 15:20:31.788476 7f167df9b700 20 SCRIPT_NAME=/my_bucket/
> 2013-11-06 15:20:31.788483 7f167df9b700  2 req 1:0.001643:s3:PUT
> /my_bucket/::getting op
> 2013-11-06 15:20:31.788519 7f167df9b700  2 req 1:0.001679:s3:PUT
> /my_bucket/:create_bucket:authorizing
> 2013-11-06 15:20:31.788638 7f167df9b700  2 req 1:0.001798:s3:PUT
> /my_bucket/:create_bucket:reading permissions
> 2013-11-06 15:20:31.788688 7f167df9b700  2 req 1:0.001847:s3:PUT
> /my_bucket/:create_bucket:verifying op mask
> 2013-11-06 15:20:31.788719 7f167df9b700 20 required_mask= 2 user.op_mask=7
> 2013-11-06 15:20:31.788743 7f167df9b700  2 req 1:0.001903:s3:PUT
> /my_bucket/:create_bucket:verifying op permissions
> 2013-11-06 15:20:31.789225 7f167df9b700  2 req 1:0.002385:s3:PUT
> /my_bucket/:create_bucket:http status=403
> 2013-11-06 15:20:31.790319 7f167df9b700  1 == req done req=0x20d6eb0
> http_status=403 ==
>
>
>
> the program is like this:
>
> import boto
> import boto.s3.connection
> access_key='JZ9N42JQY636PTTZ76VZ'
> secret_key='2D37kjLXda7dPxGpjJ3ZhNCBHzd9wmxoJnf9FcQo'
> conn=boto.connect_s3(
> aws_access_key_id=access_key,
> aws_secret_access_key=secret_key,
> host="cephtest.com",
> is_secure=False,
> calling_format=boto.s3.connection.OrdinaryCallingFormat(),
> )
> print "hello world"
> conn.create_bucket('my_bucket')
>
> It seems to be a permission problem, but I really cannot resolve it
> with the user information above.
> HELP!
> HELP!!
> Thanks for any help!
>
>

You need to fix your Apache rewrite rule. Basically, the Authorization
header is not being passed through correctly.
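
For reference (and this may not match your exact config), one commonly used
form of that rule from the radosgw install docs of this era is sketched below.
The [E=HTTP_AUTHORIZATION:...,L] part belongs in the rule's flags, after the
substitution string; the QUERY_STRING in your log shows the flag text leaking
into the query string, which is the usual sign the rule got mangled.

# Hypothetical Apache vhost fragment -- adapt ServerName and paths to your
# own setup; the docs carry a slightly more elaborate variant of the rule.
<VirtualHost *:80>
    ServerName cephtest.com
    DocumentRoot /var/www

    RewriteEngine On
    # Hand the S3 Authorization header to the FastCGI wrapper as
    # HTTP_AUTHORIZATION; note the [E=...,L] flags sit after the
    # substitution, not inside it.
    RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

    AllowEncodedSlashes On
    ServerSignature Off
</VirtualHost>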

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running on disks that lose their head

2013-11-06 Thread Sage Weil
On Wed, 6 Nov 2013, Loic Dachary wrote:
> Hi Ceph,
> 
> People from Western Digital suggested ways to better take advantage of 
> the disk error reporting. They gave two examples that struck my 
> imagination. First there are errors that look like the disk is dying ( 
> read / write failures ) but it's only a transient problem and the driver 
> should be able to tell the difference by properly interpreting the 
> available information. They said that the prolonged life you get if you 
> don't decommission a disk that only has a transient error is 

This makes me think we really need to build or integrate with some generic 
SMART reporting infrastructure so that we can identify disks that are 
failing or going to fail.  What to do with that information is another 
question; initially I would lean toward just marking the disk out, but 
there may be smarter alternatives to investigate.
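
Even something as simple as a periodic job that polls smartctl and marks the
matching OSD out would be a starting point.  A minimal sketch (the device ->
OSD id mapping is hypothetical and would have to be maintained per host):

#!/usr/bin/env python
# Minimal sketch: poll SMART health and mark the matching OSD out so CRUSH
# re-replicates away from the suspect disk.
import subprocess

DEV_TO_OSD = {"/dev/sdb": 0, "/dev/sdc": 1}   # hypothetical mapping

def smart_healthy(dev):
    try:
        # 'smartctl -H' prints the overall-health self-assessment result.
        out = subprocess.check_output(["smartctl", "-H", dev]).decode()
    except subprocess.CalledProcessError:
        # smartctl exits non-zero when the health check fails or the device
        # cannot be opened; treat both as unhealthy.
        return False
    return "PASSED" in out

for dev, osd_id in DEV_TO_OSD.items():
    if not smart_healthy(dev):
        subprocess.call(["ceph", "osd", "out", str(osd_id)])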

> significant. The second example is when one head out of ten fails : 
> disks can keep working with the nine remaining heads. Losing 1/10 of the 
> disk is likely to result in a full re-install of the Ceph osd. But, 
> again, the disk could keep going after that, with 9/10 of its original 
> capacity. And Ceph is good at handling osd failures.

Yeah...but if you lose 1/10 of a block device any existing local file 
system is going to blow up.  I suspect this is something that newfangled 
interfaces like Kinetic will be much better at.  Even then, though, it is 
challenging for anything sitting above to cope with losing some random 
subset of its data underneath.  To a first approximation, for this to be 
useful, the fs and disk would need to keep, say, all the data in a 
particular PG confined to a single platter, so that when a head goes the 
other PGs are still fully intact and usable.  It is probably a long way to 
get from here to there...

> All this is news to me and sounds really cool. But I'm sure there are 
> people who already know about it and I'm eager to hear their opinion :-)
> 
> Cheers
> 
> -- 
> Loïc Dachary, Artisan Logiciel Libre
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stopped backfilling process

2013-11-06 Thread Dominik Mostowiec
I hope it will help.

crush: https://www.dropbox.com/s/inrmq3t40om26vf/crush.txt
ceph osd dump: https://www.dropbox.com/s/jsbt7iypyfnnbqm/ceph_osd_dump.txt

--
Regards
Dominik

2013/11/6 yy-nm :
> On 2013/11/5 22:02, Dominik Mostowiec wrote:
>>
>> Hi,
>> After remove ( ceph osd out X) osd from one server ( 11 osd ) ceph
>> starts data migration process.
>> It stopped on:
>> 32424 pgs: 30635 active+clean, 191 active+remapped, 1596
>> active+degraded, 2 active+clean+scrubbing;
>> degraded (1.718%)
>>
>> All osd with reweight==1 are UP.
>>
>> ceph -v
>> ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)
>>
>> health details:
>> https://www.dropbox.com/s/149zvee2ump1418/health_details.txt
>>
>> pg active+degraded query:
>> https://www.dropbox.com/s/46emswxd7s8xce1/pg_11.39_query.txt
>> pg active+remapped query:
>> https://www.dropbox.com/s/wij4uqh8qoz60fd/pg_16.2172_query.txt
>>
>> Please help - how can we fix it?
>>
> Can you show your decoded crushmap, and the output of 'ceph osd dump'?
>
>



-- 
Pozdrawiam
Dominik
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] locking rbd device

2013-11-06 Thread Wolfgang Hennerbichler


On 08/26/2013 09:03 AM, Wolfgang Hennerbichler wrote:
> hi list,
> 
> I realize there's a command called "rbd lock" to lock an image. Can
> libvirt use this to prevent virtual machines from being started
> simultaneously on different virtualisation containers?

Answer to myself, only 2 months later ;)

libvirt can do this. There are hooks in libvirt that call actions when
VMs are stopped or started. I've written a script (not perfect, but
working) that will lock all your rbd images according to your libvirt
configuration once you start your VM. If you try to start the VM on
another hypervisor (that has the same hooks in place), it will refuse
to start the VM.

I've documented and published my script on my webpage:
http://www.wogri.at/en/linux/ceph-libvirt-locking/
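
For those who just want the shape of the idea without clicking through, a
stripped-down sketch (NOT the actual script from the page above): libvirt
invokes /etc/libvirt/hooks/qemu with the domain name and an operation, and
feeds the domain XML on stdin, so the hook can find the rbd disks and refuse
to start the guest if someone else already holds the lock.

#!/usr/bin/env python
# /etc/libvirt/hooks/qemu -- rough sketch only.  libvirt calls this as:
#   qemu <domain-name> <operation> <sub-operation> <extra>
# with the full domain XML on stdin; a non-zero exit during 'prepare'
# should prevent the guest from starting.
import socket
import subprocess
import sys
import xml.etree.ElementTree as ET

operation = sys.argv[2]
lock_id = socket.gethostname()          # one lock id per hypervisor

root = ET.parse(sys.stdin).getroot()
# rbd disks look like: <disk type='network'><source protocol='rbd' name='pool/image'/>
images = [src.get("name") for src in root.findall(".//disk/source")
          if src.get("protocol") == "rbd"]

if operation == "prepare":
    for image in images:
        # 'rbd lock add' fails if the image is already locked, which is
        # exactly what we want when another hypervisor holds the lock.
        if subprocess.call(["rbd", "lock", "add", image, lock_id]) != 0:
            sys.exit(1)
elif operation == "release":
    for image in images:
        # Find our own lock again and remove it.
        listing = subprocess.check_output(["rbd", "lock", "list", image]).decode()
        for line in listing.splitlines():
            fields = line.split()
            if len(fields) >= 2 and fields[1] == lock_id:
                subprocess.call(["rbd", "lock", "remove", image, lock_id, fields[0]])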

> wogri
> 



-- 
http://www.wogri.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw questions

2013-11-06 Thread Alessandro Brega
Good day ceph users,

I'm new to ceph but installation went well so far. Now I have a lot of
questions regarding radosgw. Hope you don't mind...

1. To build a high performance yet cheap radosgw storage, which pools
should be placed on ssd and which on hdd backed pools? Upon installation of
radosgw, it created the following pools: .rgw, .rgw.buckets,
.rgw.buckets.index, .rgw.control, .rgw.gc, .rgw.root, .usage, .users,
.users.email.

2. In order to have very high availability I'd like to set up two different
ceph clusters, each in its own datacenter. How do I configure radosgw to
make use of this layout? Can I have a multi-master setup, with a load
balancer (or using geo-dns) which distributes the load to radosgw instances
in both datacenters?

3. Is it possible to start with a simple setup now (only one ceph cluster)
and later add the multi-datacenter redundancy described above without
downtime? Do I have to respect any special pool-naming requirements?

4. Which number of replicas would you suggest? In other words, what
replication level is needed to achieve 99.9% durability, like DreamObjects states?

5. Is it possible to map a custom FQDN to a bucket, not only subdomains?

6. The command "radosgw-admin pool list" returns "could not list placement
set: (2) No such file or directory". But radosgw seems to work as expected
anyway?

Looking forward to your suggestions.

Alessandro Brega
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Puppet Modules for Ceph

2013-11-06 Thread Karan Singh
Dear Cephers 

I have a running ceph cluster that was deployed using ceph-deploy. Our next
objective is to build a Puppet setup that can be used for long-term scaling of
the ceph infrastructure.

It would be a great help if anyone can

1) Provide ceph Puppet modules for CentOS
2) Offer guidance on how to proceed

Many Thanks
Karan Singh


- Original Message -
From: "Karan Singh" 
To: "Loic Dachary" 
Cc: ceph-users@lists.ceph.com
Sent: Monday, 4 November, 2013 5:01:26 PM
Subject: Re: [ceph-users] Ceph deployment using puppet

Hello Loic

Thanks for your reply; ceph-deploy works well for me.

My next objective is to deploy ceph using Puppet. Can you guide me on how I can
proceed?

Regards
karan

- Original Message -
From: "Loic Dachary" 
To: ceph-users@lists.ceph.com
Sent: Monday, 4 November, 2013 4:45:06 PM
Subject: Re: [ceph-users] Ceph deployment using puppet

Hi,

Unless you're forced to use Puppet for some reason, I suggest you give 
ceph-deploy a try:

http://ceph.com/docs/master/start/quick-ceph-deploy/

Cheers

On 04/11/2013 19:00, Karan Singh wrote:
> Hello Everyone
> 
> Can someone guide me on how to start with "ceph deployment using puppet",
> and what I need to have in place for this?
> 
> I have no prior experience with Puppet, hence I need your help getting
> started with it.
> 
> 
> Regards
> Karan Singh
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Disk Density Considerations

2013-11-06 Thread Darren Birkett
Hi,

I understand from various reading and research that there are a number of
things to consider when deciding how many disks one wants to put into a
single chassis:

1. Higher density means higher failure domain (more data to re-replicate if
you lose a node)
2. More disks means more CPU/memory horsepower to handle the number of OSDs
3. Network becomes a bottleneck with too many OSDs per node
4. ...

We are looking at building high density nodes for small scale 'starter'
deployments for our customers (maybe 4 or 5 nodes).  High density in this
case could mean a 2u chassis with 2x external 45 disk JBOD containers
attached.  That's 90 3TB disks/OSDs to be managed by a single node.  That's
about 243TB of potential usable space, and so (assuming up to 75% fillage)
maybe 182TB of potential data 'loss' in the event of a node failure.  On an
uncongested, unused, 10Gbps network, my back-of-a-beer-mat calculations say
that would take about 45 hours to get the cluster back into an undegraded
state - that is the requisite number of copies of all objects.
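
Spelling out the beer-mat arithmetic (assuming a single saturated 10Gbps link
and ignoring protocol overhead, seeks and client I/O):

data_tb = 243 * 0.75            # ~182 TB to re-replicate
link_bytes_per_s = 10e9 / 8     # 10 Gbps ~= 1.25 GB/s
hours = data_tb * 1e12 / link_bytes_per_s / 3600
print(round(hours, 1))          # ~40.5 hours; overhead pushes it toward 45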

Assuming that you can shove in a pair of hex core hyperthreaded processors,
you're probably OK with number 2.  If you're already considering 10GbE
networking for the storage network, there's probably not much you can do
about 3 unless you want to spend a lot more money (and the reason we're
going so dense is to keep this as a cheap option).  So the main thing would
seem to be a real fear of 'losing' so much data in the event of a node
failure.  Who wants to wait 45 hours (probably much longer assuming the
cluster remains live and has production traffic traversing that network)
for the cluster to self-heal?

But surely this fear is based on an assumption that in that time, you've
not identified and replaced the failed chassis.  That you would sit for 2-3
days and just leave the cluster to catch up, and not actually address the
broken node.  Given good data centre processes, a good stock of spare
parts, isn't it more likely that you'd have replaced that node and got
things back up and running in a matter of hours?  In all likelihood, a node
crash/failure is not likely to have taken out all, or maybe any, of the
disks, and a new chassis can just have the JBODs plugged back in and away
you go?

I'm sure I'm missing some other pieces, but if you're comfortable with your
hardware replacement processes, doesn't number 1 become a non-fear really?
I understand that in some ways it goes against the concept of ceph being
self healing, and that in an ideal world you'd have lots of lower density
nodes to limit your failure domain, but when being driven by cost isn't
this an OK way to look at things?  What other glaringly obvious
considerations am I missing with this approach?

Darren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stopped backfilling process

2013-11-06 Thread Bohdan Sydor
On Tue, Nov 5, 2013 at 3:02 PM, Dominik Mostowiec
 wrote:
> After remove ( ceph osd out X) osd from one server ( 11 osd ) ceph
> starts data migration process.
> It stopped on:
> 32424 pgs: 30635 active+clean, 191 active+remapped, 1596
> active+degraded, 2 active+clean+scrubbing;
> degraded (1.718%)
>
> All osd with reweight==1 are UP.
>
> ceph -v
> ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)

Hi,

Below, I'm pasting some more information on this issue.

The cluster status hasn't been changed for more than 24 hours:
# ceph health
HEALTH_WARN 1596 pgs degraded; 1787 pgs stuck unclean; recovery
2142704/123949567 degraded (1.729%)

I parsed the output of ceph pg dump and can see three types of pg states there:

1. *Two* osd's are up and *two* acting:

16.11   [42, 92][42, 92]active+degraded
17.10   [42, 92][42, 92]active+degraded

2. *Three* osd's are up and *three* acting:

12.d[114, 138, 5]   [114, 138, 5]   active+clean
15.e[13, 130, 142]  [13, 130, 142]  active+clean

3. *Two* osd's up and *three* acting:

16.2256 [63, 109]   [63, 109, 40]   active+remapped
16.220b [129, 22]   [129, 22, 47]   active+remapped

A part of the crush map:

rack rack1 {
id -5   # do not change unnecessarily
# weight 60.000
alg straw
hash 0  # rjenkins1
item storinodfs1 weight 12.000
item storinodfs11 weight 12.000
item storinodfs6 weight 12.000
item storinodfs9 weight 12.000
item storinodfs8 weight 12.000
}

rack rack2 {
id -7   # do not change unnecessarily
# weight 48.000
alg straw
hash 0  # rjenkins1
item storinodfs3 weight 12.000
item storinodfs4 weight 12.000
item storinodfs2 weight 12.000
item storinodfs10 weight 12.000
}

rack rack3 {
id -10  # do not change unnecessarily
# weight 36.000
alg straw
hash 0  # rjenkins1
item storinodfs5 weight 12.000 <=== all osd's on this node
have been disabled by ceph osd out
item storinodfs7 weight 12.000
item storinodfs12 weight 12.000
}

rule data {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type rack
step emit
}

The command ceph osd out has been invoked on all osd's on storinodfs5,
and I can see them all down while listing with ceph osd tree:

-11 12  host storinodfs5
48  1   osd.48  down0
49  1   osd.49  down0
50  1   osd.50  down0
51  1   osd.51  down0
52  1   osd.52  down0
53  1   osd.53  down0
54  1   osd.54  down0
55  1   osd.55  down0
56  1   osd.56  down0
57  1   osd.57  down0
58  1   osd.58  down0
59  1   osd.59  down0



I wonder if the current cluster state might be related to the fact
that the crush map still lists storinodfs5 with weight 12?
We're unable to make ceph recover from this faulty state.
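
If the OSDs on storinodfs5 are being retired for good, one thing we could try
is taking their weight out of the crush map as well (sketch only, we have not
verified that this fixes it):

# 'ceph osd out' zeroes the reweight but leaves the CRUSH weight in place;
# zeroing or removing the CRUSH entries stops CRUSH from considering them.
ceph osd crush reweight osd.48 0     # repeat for osd.49 .. osd.59
# or drop them from the CRUSH map entirely:
ceph osd crush remove osd.48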

Any hints are very appreciated.

-- 
Regards,
Bohdan Sydor
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] deployment architecture practices / new ideas?

2013-11-06 Thread Gautam Saxena
We're looking to deploy CEPH on about 8 Dell servers to start, each of
which typically contains 6 to 8 hard disks with PERC RAID controllers which
support write-back cache (~512 MB usually). Most machines have between 32
and 128 GB RAM. Our questions are as follows. Please feel free to comment
on even just one of the questions below if that's the area of your
expertise/interest.


   1. Based on various "best practice" guides, they suggest putting the OS
   on a separate disk. But we thought that would not be good because we'd
   sacrifice a whole disk on each machine (~3 TB) or even two whole disks (~6
   TB) if we did a hardware RAID 1 on it. So, do people normally just
   sacrifice one whole disk? Specifically, we came up with this idea:
  1. We set up all hard disks as "pass-through" in the RAID controller,
  so that the RAID controller's cache is still in effect, but the OS sees
  just a bunch of disks (6 to 8 in our case)
  2. We then do a software-based RAID 1 (using CentOS 6.4) for the OS
  across all 6 to 8 hard disks
  3. We then do a software-based RAID 0 (using CentOS 6.4) for the
  SWAP space.
  4. *Does anyone see any flaws in our idea above? We think that RAID 1
  is not computationally expensive for the machines to compute, and most of
  the time, the OS should be in RAM. Similarly, we think RAID 0 should be
  easy for the CPU to compute, and hopefully, we won't hit much SWAP if we
  have enough RAM. And this way, we don't sacrifice 1 or 2 whole disks for
  just the OS.*
   2. Based on the performance benchmark blog of Mark Nelson (
   http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/),
   has anything substantially changed since then? Specifically, it suggests
   that SSDs may not really be necessary if one has RAID controllers with
   write-back cache. Is this still true even though the article was written
   with a version of CEPH that is over 1 year old? (Mark suggests that things
   may change with newer versions of CEPH.)
   3. Based on our understanding, it would seem that CEPH can deliver very
   high throughput performance (especially for reads) if dozens and dozens of
   hard disks are being accessed simultaneously across multiple machines. So,
   we could have several GB/s of throughput, right? (CEPH never advertises the
   advantage of read throughput with a distributed architecture, so I'm
   wondering if I'm missing something.) If so, then is it reasonable to assume
   that one common bottleneck is the ethernet? So if we only use 1 NIC at
   1Gbps, that'll be a major bottleneck? If so, we're thinking of trying to
   "bond" multiple 1GbE cards to make a "bonded" ethernet connection of
   4Gbps (4 * 1Gbps). But we didn't see anyone discuss this
   strategy? Are there any holes in it? Or does CEPH "automatically" take
   advantage of multiple NICs without us having to deal with the
   complexity (and expense of buying a new switch which supports bonding) of
   doing bonding? That is, is it possible and a good idea to have CEPH OSDs be
   set up to use specific NICs, so that we spread the load (see the ceph.conf
   sketch after this list)? (We read through
   the recommendation of having different NICs for front-end traffic vs
   back-end traffic, but we're not worried about network attacks -- so we're
   thinking that just creating a "big" fat ethernet pipe gives us the most
   flexibility.)
   4. I'm a little confused -- does CEPH support incremental snapshots of
   either VMs or the CEPH-FS? I saw in the release notes for "dumpling"
   release (http://ceph.com/docs/master/release-notes/#v0-67-dumpling) this
   statement: "The MDS now disallows snapshots by default as they are not
   considered stable. The command ‘ceph mds set allow_snaps’ will enable
   them." So, should I assume that we can't do incremental file-system
   snapshots in a stable fashion until further notice?
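
(Regarding question 3, as far as we can tell from the docs: the front-end /
back-end split is configured in ceph.conf rather than per NIC, and a bonded
pair of 1GbE links simply appears to Ceph as one interface. A sketch with
hypothetical subnets, assuming we've understood this correctly:)

# ceph.conf sketch (hypothetical subnets): client traffic uses the public
# network, OSD replication/recovery traffic uses the cluster network.
[global]
    public network  = 10.1.0.0/24
    cluster network = 10.2.0.0/24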

-Sidharta
-- 
*Gautam Saxena *
President & CEO
Integrated Analysis Inc.

Making Sense of Data.™
Biomarker Discovery Software | Bioinformatics Services | Data Warehouse
Consulting | Data Migration Consulting
www.i-a-inc.com  
gsax...@i-a-inc.com
(301) 760-3077  office
(240) 479-4272  direct
(301) 560-3463  fax
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disk Density Considerations

2013-11-06 Thread Andrey Korolyov
On Wed, Nov 6, 2013 at 4:15 PM, Darren Birkett  wrote:
> Hi,
>
> I understand from various reading and research that there are a number of
> things to consider when deciding how many disks one wants to put into a
> single chassis:
>
> 1. Higher density means higher failure domain (more data to re-replicate if
> you lose a node)
> 2. More disks means more CPU/memory horsepower to handle the number of OSDs
> 3. Network becomes a bottleneck with too many OSDs per node
> 4. ...
>
> We are looking at building high density nodes for small scale 'starter'
> deployments for our customers (maybe 4 or 5 nodes).  High density in this
> case could mean a 2u chassis with 2x external 45 disk JBOD containers
> attached.  That's 90 3TB disks/OSDs to be managed by a single node.  That's
> about 243TB of potential usable space, and so (assuming up to 75% fillage)
> maybe 182TB of potential data 'loss' in the event of a node failure.  On an
> uncongested, unused, 10Gbps network, my back-of-a-beer-mat calculations say
> that would take about 45 hours to get the cluster back into an undegraded
> state - that is the requisite number of copies of all objects.
>

For such a large number of disks you should consider that controller cache
won't really help even if you are using 1GB controller(s) - only a tiered
cache can be an option. Also, recovery will take much more time even if you
leave room for client I/O in the calculations, because raw disks have very
limited IOPS capacity, so recovery will either take much longer than such
back-of-the-envelope expectations suggest or it will affect regular
operations. For S3/Swift that may be acceptable, but for VM images it is not.

> Assuming that you can shove in a pair of hex core hyperthreaded processors,
> you're probably OK with number 2.  If you're already considering 10GbE
> networking for the storage network, there's probably not much you can do
> about 3 unless you want to spend a lot more money (and the reason we're
> going so dense is to keep this as a cheap option).  So the main thing would
> seem to be a real fear of 'losing' so much data in the event of a node
> failure.  Who wants to wait 45 hours (probably much longer assuming the
> cluster remains live and has production traffic traversing that networl) for
> the cluster to self-heal?
>
> But surely this fear is based on an assumption that in that time, you've not
> identified and replaced the failed chassis.  That you would sit for 2-3 days
> and just leave the cluster to catch up, and not actually address the broken
> node.  Given good data centre processes, a good stock of spare parts, isn't
> it more likely that you'd have replaced that node and got things back up and
> running in a mater of hours?  In all likelyhood, a node crash/failure is not
> likely to have taken out all, or maybe any, of the disks, and a new chassis
> can just have the JBODs plugged back in and away you go?
>
> I'm sure I'm missing some other pieces, but if you're comfortable with your
> hardware replacement processes, doesn't number 1 become a non-fear really? I
> understand that in some ways it goes against the concept of ceph being self
> healing, and that in an ideal world you'd have lots of lower density nodes
> to limit your failure domain, but when being driven by cost isn't this an OK
> way to look at things?  What other glaringly obvious considerations am I
> missing with this approach?
>
> Darren
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running on disks that lose their head

2013-11-06 Thread james

On 2013-11-06 09:33, Sage Weil wrote:


This make me think we really need to build or integrate with some 
generic

SMART reporting infrastructure so that we can identify disks that are
failing or going to fail.


It could be of use especially for SSD devices used for journals.

Unfortunately there seems to be no standard for reporting remaining SSD cell 
life, but if some subset of devices were supported, my thought is that the 
journals could be moved to another drive once the remaining write life 
reaches some level, like 5%.  Or some method to shut down the 
host (/affected OSDs), replace the SSD, then bring them back online with 
the new journal device.


Just thinking out loud, as usual :)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disk Density Considerations

2013-11-06 Thread Mark Nelson

On 11/06/2013 06:15 AM, Darren Birkett wrote:

Hi,

I understand from various reading and research that there are a number
of things to consider when deciding how many disks one wants to put into
a single chassis:

1. Higher density means higher failure domain (more data to re-replicate
if you lose a node)
2. More disks means more CPU/memory horsepower to handle the number of OSDs
3. Network becomes a bottleneck with too many OSDs per node
4. ...

We are looking at building high density nodes for small scale 'starter'
deployments for our customers (maybe 4 or 5 nodes).  High density in
this case could mean a 2u chassis with 2x external 45 disk JBOD
containers attached.  That's 90 3TB disks/OSDs to be managed by a single
node.  That's about 243TB of potential usable space, and so (assuming up
to 75% fillage) maybe 182TB of potential data 'loss' in the event of a
node failure.  On an uncongested, unused, 10Gbps network, my
back-of-a-beer-mat calculations say that would take about 45 hours to
get the cluster back into an undegraded state - that is the requisite
number of copies of all objects.


Basically the recommendation I give is that 5 is the absolute bare 
minimum number of nodes I'd put in production, but I'd feel a lot better 
with 10-20 nodes.  The setup you are looking at is 90 drives spread 
across 10U in 1 node, but you could instead use two 36-drive chassis (I'm 
assuming you are looking at supermicro) with the integrated motherboard 
and do 72 drives in 8U.  The same density, but over double the node 
count.  Further it requires no external SAS cables and you can now do 
4-5 lower bin processors instead of two very top bin processors which 
gives you more overall CPU power for the OSDs.  You can also use cheaper 
less dense memory, and you are buying 1 chassis per node instead of 3 
(though more nodes overall).  Between all of this, you may save enough 
money that the overall hardware costs may not be that much more.


Taking this even further, options like the hadoop fat twin nodes with 12 
drives in 1U potentially could be even denser, while spreading the 
drives out over even more nodes.  Now instead of 4-5 large dense nodes 
you have maybe 35-40 small dense nodes.  The downside here though is 
that the cost may be a bit higher and you have to slide out a whole node 
to swap drives, though Ceph is more tolerant of this than many 
distributed systems.




Assuming that you can shove in a pair of hex core hyperthreaded
processors, you're probably OK with number 2.  If you're already
considering 10GbE networking for the storage network, there's probably
not much you can do about 3 unless you want to spend a lot more money
(and the reason we're going so dense is to keep this as a cheap option).
  So the main thing would seem to be a real fear of 'losing' so much
data in the event of a node failure.  Who wants to wait 45 hours
(probably much longer assuming the cluster remains live and has
production traffic traversing that networl) for the cluster to self-heal?

But surely this fear is based on an assumption that in that time, you've
not identified and replaced the failed chassis.  That you would sit for
2-3 days and just leave the cluster to catch up, and not actually
address the broken node.  Given good data centre processes, a good stock
of spare parts, isn't it more likely that you'd have replaced that node
and got things back up and running in a mater of hours?  In all
likelyhood, a node crash/failure is not likely to have taken out all, or
maybe any, of the disks, and a new chassis can just have the JBODs
plugged back in and away you go?


You might be able to rig up something like this, but honestly hardware 
isn't really the expensive part of distributed systems.  One of the 
advantages that Ceph gives you is that it makes it easier to support 
very large deployments without a ton of maintenance overhead.  Paying an 
extra 10 percent to move away from complicated nodes with external JBODs 
to simpler nodes is worth it imho.




I'm sure I'm missing some other pieces, but if you're comfortable with
your hardware replacement processes, doesn't number 1 become a non-fear
really? I understand that in some ways it goes against the concept of
ceph being self healing, and that in an ideal world you'd have lots of
lower density nodes to limit your failure domain, but when being driven
by cost isn't this an OK way to look at things?  What other glaringly
obvious considerations am I missing with this approach?


When hardware cost is the #1 concern, the way I look at it is that there 
are often one or more sweet spots where it may no longer make sense to 
try to shove more drives in 1 node if it means having to buy denser 
memory, top bin CPUs, exotic controllers, or the very densest drives 
available.




Darren


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Disk Density Considerations

2013-11-06 Thread Darren Birkett
On 6 November 2013 14:08, Andrey Korolyov  wrote:

> > We are looking at building high density nodes for small scale 'starter'
> > deployments for our customers (maybe 4 or 5 nodes).  High density in this
> > case could mean a 2u chassis with 2x external 45 disk JBOD containers
> > attached.  That's 90 3TB disks/OSDs to be managed by a single node.
>  That's
> > about 243TB of potential usable space, and so (assuming up to 75%
> fillage)
> > maybe 182TB of potential data 'loss' in the event of a node failure.  On
> an
> > uncongested, unused, 10Gbps network, my back-of-a-beer-mat calculations
> say
> > that would take about 45 hours to get the cluster back into an undegraded
> > state - that is the requisite number of copies of all objects.
> >
>
> For such large number of disks you should consider that the cache
> amortization will not take any place even if you are using 1GB
> controller(s) - only tiered cache can be an option. Also recovery will
> take much more time even if you have a room for client I/O in the
> calculations because raw disks have very limited IOPS capacity and
> recovery will either take a much longer than such expectations at a
> glance or affect regular operations. For S3/Swift it may be acceptable
> but for VM images it does not.


Sure, but my argument was that you are never likely to actually let that
entire recovery operation complete - you're going to replace the hardware
and plug the disks back in and let them catch up by log replay/backfill.
 Assuming you don't ever actually expect to really lose all data on 90
disks in one go...

By tiered caching, do you mean using something like flashcache or bcache?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running on disks that lose their head

2013-11-06 Thread Mark Nelson

On 11/06/2013 03:33 AM, Sage Weil wrote:

On Wed, 6 Nov 2013, Loic Dachary wrote:

Hi Ceph,

People from Western Digital suggested ways to better take advantage of
the disk error reporting. They gave two examples that struck my
imagination. First there are errors that look like the disk is dying (
read / write failures ) but it's only a transient problem and the driver
should be able to tell the difference by properly interpreting the
available information. They said that the prolonged life you get if you
don't decommission a disk that only has a transient error is


This makes me think we really need to build or integrate with some generic
SMART reporting infrastructure so that we can identify disks that are
failing or going to fail.  What to do with that information is another
question; initially I would lean toward just marking the disk out, but
there may be smarter alternatives to investigate.


significant. The second example is when one head out of ten fails :
disks can keep working with the nine remaining heads. Losing 1/10 of the
disk is likely to result in a full re-install of the Ceph osd. But,
again, the disk could keep going after that, with 9/10 of its original
capacity. And Ceph is good at handling osd failures.


Yeah...but if you lose 1/10 of a block device any existing local file
system is going to blow up.  I suspect this is something that newfangled
interfaces like Kinetic will be much better at.  Even then, though, it is
challenging for anything sitting above to cope with losing some random
subset of its data underneath.  To a first approximation, for this to be
useful, the fs and disk would need to keep, say, all the data in a
particular PG confined to a single platter, so that when a head goes the
other PGs are still fully intact and usable.  It is probably a long way to
get from here to there...


Putting my sysadmin hat on:

Once I know a drive has had a head failure, do I trust that the rest of 
the drive isn't going to go at an inconvenient moment vs just fixing it 
right now when it's not 3AM on Christmas morning? (true story)  As good 
as Ceph is, do I trust that Ceph is smart enough to prevent spreading 
corrupt data all over the cluster if I leave bad disks in place and they 
start doing terrible things to the data?


Mark




All this is news to me and sounds really cool. But I'm sure there are
people who already know about it and I'm eager to hear their opinion :-)

Cheers

--
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disk Density Considerations

2013-11-06 Thread Andrey Korolyov
On Wed, Nov 6, 2013 at 6:42 PM, Darren Birkett  wrote:
>
> On 6 November 2013 14:08, Andrey Korolyov  wrote:
>>
>> > We are looking at building high density nodes for small scale 'starter'
>> > deployments for our customers (maybe 4 or 5 nodes).  High density in
>> > this
>> > case could mean a 2u chassis with 2x external 45 disk JBOD containers
>> > attached.  That's 90 3TB disks/OSDs to be managed by a single node.
>> > That's
>> > about 243TB of potential usable space, and so (assuming up to 75%
>> > fillage)
>> > maybe 182TB of potential data 'loss' in the event of a node failure.  On
>> > an
>> > uncongested, unused, 10Gbps network, my back-of-a-beer-mat calculations
>> > say
>> > that would take about 45 hours to get the cluster back into an
>> > undegraded
>> > state - that is the requisite number of copies of all objects.
>> >
>>
>> For such large number of disks you should consider that the cache
>> amortization will not take any place even if you are using 1GB
>> controller(s) - only tiered cache can be an option. Also recovery will
>> take much more time even if you have a room for client I/O in the
>> calculations because raw disks have very limited IOPS capacity and
>> recovery will either take a much longer than such expectations at a
>> glance or affect regular operations. For S3/Swift it may be acceptable
>> but for VM images it does not.
>
>
> Sure, but my argument was that you are never likely to actually let that
> entire recovery operation complete - you're going to replace the hardware
> and plug the disks back in and let them catch up by log replay/backfill.
> Assuming you don't ever actually expect to really lose all data on 90 disks
> in one go...
> By tiered caching, do you mean using something like flashcache or bcache?

Exactly, just another step to offload CPU from I/O time.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] USB pendrive as boot disk

2013-11-06 Thread Carl-Johan Schenström

> See this thread:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-October/005378.html

I can't find anything about Ceph being heavy on OS disks in that thread, only 
that one shouldn't combine OS and journal on the same disk, since *journals* 
are heavy on the disks and that might slow the node down to a crawl.

-- 
Carl-Johan Schenström
Driftansvarig / System Administrator
Språkbanken & Svensk nationell datatjänst /
The Swedish Language Bank & Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenst...@gu.se / +46 709 116769
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disk Density Considerations

2013-11-06 Thread Dimitri Maziuk

On 2013-11-06 08:37, Mark Nelson wrote:
...

Taking this even further, options like the hadoop fat twin nodes with 12
drives in 1U potentially could be even denser, while spreading the
drives out over even more nodes.  Now instead of 4-5 large dense nodes
you have maybe 35-40 small dense nodes.  The downside here though is
that the cost may be a bit higher and you have to slide out a whole node
to swap drives, though Ceph is more tolerant of this than many
distributed systems.


Another one is 35-40 switch ports vs 4-5. I hear "regular" 10G ports eat 
up over 10 watts of juice and Cat 6a cable offers a unique combination of 
poor design and high cost. It's probably ok to need 35-40 routable ip 
addresses: you can add another interface & subnet to your public-facing 
clients.


Dima

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running on disks that lose their head

2013-11-06 Thread Loic Dachary
An anonymous kernel developer sends this link:

http://en.wikipedia.org/wiki/Error_recovery_control


On 06/11/2013 08:32, Loic Dachary wrote:
> Hi Ceph,
> 
> People from Western Digital suggested ways to better take advantage of the 
> disk error reporting. They gave two examples that struck my imagination. 
> First there are errors that look like the disk is dying ( read / write 
> failures ) but it's only a transient problem and the driver should be able to 
> tell the difference by properly interpreting the available information. They 
> said that the prolonged life you get if you don't decommission a disk that 
> only has a transient error is significant. The second example is when one 
> head out of ten fails : disks can keep working with the nine remaining heads. 
> Losing 1/10 of the disk is likely to result in a full re-install of the Ceph 
> osd. But, again, the disk could keep going after that, with 9/10 of its 
> original capacity. And Ceph is good at handling osd failures.
> 
> All this is news to me and sounds really cool. But I'm sure there are people 
> who already know about it and I'm eager to hear their opinion :-)
> 
> Cheers
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Puppet Modules for Ceph

2013-11-06 Thread Don Talton (dotalton)
This will work https://github.com/dontalton/puppet-cephdeploy

Just change the unless statements (should only be two) from testing dpkg to 
testing rpm instead.
I'll add an OS check myself, or you can fork and send me a pull request.

> -Original Message-
> From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
> boun...@lists.ceph.com] On Behalf Of Karan Singh
> Sent: Wednesday, November 06, 2013 7:56 PM
> To: ceph-users@lists.ceph.com; ceph-users-j...@lists.ceph.com; ceph-
> us...@ceph.com
> Subject: Re: [ceph-users] Puppet Modules for Ceph
> 
> Dear Cephers
> 
> I have a running ceph cluster that was deployed using ceph-deploy , our next
> objective is to build a Puppet setup that can be used for long term scaling of
> ceph infrastructure.
> 
> It would be a great help if any one can
> 
> 1) Provide ceph modules for (centos OS)
> 2) Guidance on how to proceed
> 
> Many Thanks
> Karan Singh
> 
> 
> - Original Message -
> From: "Karan Singh" 
> To: "Loic Dachary" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, 4 November, 2013 5:01:26 PM
> Subject: Re: [ceph-users] Ceph deployment using puppet
> 
> Hello Loic
> 
> Thanks for your reply , Ceph-deploy works good to me.
> 
> My next objective is to deploy ceph using puppet. Can you guide me now i
> can proceed.
> 
> Regards
> karan
> 
> - Original Message -
> From: "Loic Dachary" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, 4 November, 2013 4:45:06 PM
> Subject: Re: [ceph-users] Ceph deployment using puppet
> 
> Hi,
> 
> Unless you're forced to use Puppet for some reason, I suggest you give ceph-
> deploy a try:
> 
> http://ceph.com/docs/master/start/quick-ceph-deploy/
> 
> Cheers
> 
> On 04/11/2013 19:00, Karan Singh wrote:
> > Hello Everyone
> >
> > Can  someone guide me how i can start for " ceph deployment using
> puppet " , what all things i need to have for this .
> >
> > I have no prior idea of using puppet , hence need your help to getting
> started with it.
> >
> >
> > Regards
> > Karan Singh
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
> --
> Loïc Dachary, Artisan Logiciel Libre
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running on disks that lose their head

2013-11-06 Thread Loic Dachary


> Putting my sysadmin hat on:
> 
> Once I know a drive has had a head failure, do I trust that the rest of the 
> drive isn't going to go at an inconvenient moment vs just fixing it right now 
> when it's not 3AM on Christmas morning? (true story)  As good as Ceph is, do 
> I trust that Ceph is smart enough to prevent spreading corrupt data all over 
> the cluster if I leave bad disks in place and they start doing terrible 
> things to the data?

I'm confident it won't spread corrupt data :-) I would be more worried about 
the propagation of bit errors from non-ECC memory than about an object being corrupted.

Am I over-optimistic?

> 
> Mark
>-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disk Density Considerations

2013-11-06 Thread Mark Nelson

On 11/06/2013 09:36 AM, Dimitri Maziuk wrote:

On 2013-11-06 08:37, Mark Nelson wrote:
...

Taking this even further, options like the hadoop fat twin nodes with 12
drives in 1U potentially could be even denser, while spreading the
drives out over even more nodes.  Now instead of 4-5 large dense nodes
you have maybe 35-40 small dense nodes.  The downside here though is
that the cost may be a bit higher and you have to slide out a whole node
to swap drives, though Ceph is more tolerant of this than many
distributed systems.


Another one is 35-40 switch ports vs 4-5. I hear "regular" 10G ports eat
up over 10 watts of juice and cat6e cable offers a unique combination of
poor design and high cost. It's probably ok to need 35-40 routable ip
addresses: you can add another interface & subnet to your public-facing
clients.


I figure it's about tradeoffs.  A single 10GbE link for 90 OSDs is 
pretty oversubscribed.  You'll probably be doing at least dual 10GbE 
(one for front and one for back), and for such heavy systems you may 
want redundant network links to reduce the chance of failure, as one of 
those nodes going down is going to have a huge impact on the cluster 
while it's down.


With 35-40 smaller nodes you might do single or dual 10GbE for each node 
if you are shooting for high performance, but if cost is the motivating 
factor you could potentially do a pair of 2 way bonded 1GbE links. 
Having redundant links is less important because the impact of a node 
failure is far less.


As for Cat6 vs SFP+, I tend to favor SFP+ with twinax cables.  The 
cables are more expensive up front, but the cards tend to be a bit 
cheaper and the per-port power consumption is low.  I've heard the 
newest generation of Cat6 products have improved dramatically though, so 
maybe it's a harder decision now.




Dima

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph User Committee

2013-11-06 Thread Loic Dachary
Hi Ceph,

I would like to open a discussion about organizing a Ceph User Committee. We 
briefly discussed the idea with Ross Turk, Patrick McGarry and Sage Weil today 
during the OpenStack summit. A pad was created and roughly summarizes the idea:

http://pad.ceph.com/p/user-committee

If there is enough interest, I'm willing to devote one day a week working for 
the Ceph User Committee. And yes, that includes sitting at the Ceph booth 
during the FOSDEM :-) And interviewing Ceph users and describing their use 
cases, which I enjoy very much. But also contribute to a user centric roadmap, 
which is what ultimately matters for the company I work for.

If you'd like to see this happen but don't have time to participate in this 
discussion, please add your name + email at the end of the pad. 

What do you think ?

-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster performance

2013-11-06 Thread Dinu Vlad
I'm using the latest 3.8.0 branch from raring. Is there a more recent/better 
kernel recommended? 

Meanwhile, I think I might have identified the culprit - my SSD drives are 
extremely slow on sync writes, doing 500-600 IOPS max with a 4k blocksize. By 
comparison, an Intel 530 in another server (also installed behind a SAS 
expander) is doing the same test with ~8k IOPS. I guess I'm due to replace 
them.
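
(For reference, this kind of sync-write behaviour can be reproduced with a 4k
O_SYNC fio run directly against the device - /dev/sdX below is a placeholder
for the SSD under test, and the run writes to the raw device:)

fio --name=sync-write --filename=/dev/sdX --rw=write --bs=4k \
    --direct=1 --sync=1 --iodepth=1 --numjobs=1 \
    --runtime=60 --time_based --group_reporting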

Removing the SSD drives from the setup and re-testing with ceph => 595 MB/s 
throughput under the same conditions (only mechanical drives, journal on a 
separate partition on each one, 8 rados bench processes, 16 threads each).  


On Nov 5, 2013, at 4:38 PM, Mark Nelson  wrote:

> Ok, some more thoughts:
> 
> 1) What kernel are you using?
> 
> 2) Mixing SATA and SAS on an expander backplane can some times have bad 
> effects.  We don't really know how bad this is and in what circumstances, but 
> the Nexenta folks have seen problems with ZFS on solaris and it's not 
> impossible linux may suffer too:
> 
> http://gdamore.blogspot.com/2010/08/why-sas-sata-is-not-such-great-idea.html
> 
> 3) If you are doing tests and look at disk throughput with something like 
> "collectl -sD -oT"  do the writes look balanced across the spinning disks?  
> Do any devices have much really high service times or queue times?
> 
> 4) Also, after the test is done, you can try:
> 
> find /var/run/ceph/*.asok -maxdepth 1 -exec sudo ceph --admin-daemon {} 
> dump_historic_ops \; > foo
> 
> and then grep for "duration" in foo.  You'll get a list of the slowest 
> operations over the last 10 minutes from every osd on the node.  Once you 
> identify a slow duration, you can go back and in an editor search for the 
> slow duration and look at where in the OSD it hung up.  That might tell us 
> more about slow/latent operations.
> 
> 5) Something interesting here is that I've heard from another party that in a 
> 36 drive Supermicro SC847E16 chassis they had 30 7.2K RPM disks and 6 SSDs on 
> a SAS9207-8i controller and were pushing significantly faster throughput than 
> you are seeing (even given the greater number of drives).  So it's very 
> interesting to me that you are pushing so much less.  The 36 drive supermicro 
> chassis I have with no expanders and 30 drives with 6 SSDs can push about 
> 2100MB/s with a bunch of 9207-8i controllers and XFS (no replication).
> 
> Mark
> 
> On 11/05/2013 05:15 AM, Dinu Vlad wrote:
>> Ok, so after tweaking the deadline scheduler and the filestore_wbthrottle* 
>> ceph settings I was able to get 440 MB/s from 8 rados bench instances, over 
>> a single osd node (pool pg_num = 1800, size = 1)
>> 
>> This still looks awfully slow to me - fio throughput across all disks 
>> reaches 2.8 GB/s!!
>> 
>> I'd appreciate any suggestion, where to look for the issue. Thanks!
>> 
>> 
>> On Oct 31, 2013, at 6:35 PM, Dinu Vlad  wrote:
>> 
>>> 
>>> I tested the osd performance from a single node. For this purpose I 
>>> deployed a new cluster (using ceph-deploy, as before) and on 
>>> fresh/repartitioned drives. I created a single pool, 1800 pgs. I ran the 
>>> rados bench both on the osd server and on a remote one. Cluster 
>>> configuration stayed "default", with the same additions about xfs mount & 
>>> mkfs.xfs as before.
>>> 
>>> With a single host, the pgs were "stuck unclean" (active only, not 
>>> active+clean):
>>> 
>>> # ceph -s
>>>  cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062
>>>   health HEALTH_WARN 1800 pgs stuck unclean
>>>   monmap e1: 3 mons at 
>>> {cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0},
>>>  election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>>   osdmap e101: 18 osds: 18 up, 18 in
>>>pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB 
>>> / 16759 GB avail
>>>   mdsmap e1: 0/0/1 up
>>> 
>>> 
>>> Test results:
>>> Local test, 1 process, 16 threads: 241.7 MB/s
>>> Local test, 8 processes, 128 threads: 374.8 MB/s
>>> Remote test, 1 process, 16 threads: 231.8 MB/s
>>> Remote test, 8 processes, 128 threads: 366.1 MB/s
>>> 
>>> Maybe it's just me, but it seems on the low side too.
>>> 
>>> Thanks,
>>> Dinu
>>> 
>>> 
>>> On Oct 30, 2013, at 8:59 PM, Mark Nelson  wrote:
>>> 
 On 10/30/2013 01:51 PM, Dinu Vlad wrote:
> Mark,
> 
> The SSDs are 
> http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021
>  and the HDDs are 
> http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS.
> 
> The chasis is a "SiliconMechanics C602" - but I don't have the exact 
> model. It's based on Supermicro, has 24 slots front and 2 in the back and 
> a SAS expander.
> 
> I did a fio test (raw partitions, 4M blocksize, ioqueue maxed out 
> according to what the driver reports in dmesg). here are the results 
> (filtered):
> 
> Sequential:
> Run status group 0 (all jobs):
>

[ceph-users] ceph 0.72 with zfs

2013-11-06 Thread Dinu Vlad
Hello,

I'm testing the 0.72 release and thought I'd give the zfs support a spin.

While I managed to setup a cluster on top of a number of zfs datasets, the 
ceph-osd logs show it's using the "genericfilestorebackend": 

2013-11-06 09:27:59.386392 7fdfee0ab7c0  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl 
is NOT supported
2013-11-06 09:27:59.386409 7fdfee0ab7c0  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl 
is disabled via 'filestore fiemap' config option
2013-11-06 09:27:59.391026 7fdfee0ab7c0  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) 
syscall fully supported (by glibc and kernel)

I noticed however that the ceph sources include some files related to zfs: 

# find . | grep -i zfs
./src/os/ZFS.cc
./src/os/ZFS.h
./src/os/ZFSFileStoreBackend.cc
./src/os/ZFSFileStoreBackend.h 

A couple of questions:

- is the 0.72-rc1 package currently in the raring repository compiled with zfs
support?
- if yes, how can I "inform" ceph-osd to use the ZFSFileStoreBackend?

Thanks,
Dinu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster performance

2013-11-06 Thread Mark Nelson

On 11/06/2013 11:39 AM, Dinu Vlad wrote:

I'm using the latest 3.8.0 branch from raring. Is there a more recent/better 
kernel recommended?


I've been using the 3.8 kernel in the precise repo effectively, so I 
suspect it should be ok.




Meanwhile, I think I might have identified the culprit - my SSD drives are 
extremely slow on sync writes, doing 5-600 iops max with 4k blocksize. By 
comparison, an Intel 530 in another server (also installed behind a SAS 
expander is doing the same test with ~ 8k iops. I guess I'm good for replacing 
them.


Very interesting!



Removing the SSD drives from the setup and re-testing with ceph => 595 MB/s 
throughput under the same conditions (only mechanical drives, journal on a 
separate partition on each one, 8 rados bench processes, 16 threads each).


Ok, so you went from like 300MB/s to ~600MB/s by removing the SSDs and 
just using spinners?  That's pretty crazy!  In any event, 600MB/s from 
18 disks with journal writes is like 66MB/s per disk.  That's not 
particularly great, but if it's on the 9207-8i with no cache it might be 
about right, since journal and fs writes will be in more contention.  I'd 
be curious what you'd see with DC S3700s for journals.





On Nov 5, 2013, at 4:38 PM, Mark Nelson  wrote:


Ok, some more thoughts:

1) What kernel are you using?

2) Mixing SATA and SAS on an expander backplane can sometimes have bad 
effects.  We don't really know how bad this is and in what circumstances, but 
the Nexenta folks have seen problems with ZFS on Solaris and it's not 
impossible that Linux may suffer too:

http://gdamore.blogspot.com/2010/08/why-sas-sata-is-not-such-great-idea.html

3) If you are doing tests and look at disk throughput with something like "collectl 
-sD -oT", do the writes look balanced across the spinning disks?  Do any devices 
have really high service times or queue times?

4) Also, after the test is done, you can try:

find /var/run/ceph/*.asok -maxdepth 1 -exec sudo ceph --admin-daemon {} 
dump_historic_ops \; > foo

and then grep for "duration" in foo.  You'll get a list of the slowest 
operations over the last 10 minutes from every osd on the node.  Once you identify a slow 
duration, you can go back and in an editor search for the slow duration and look at where 
in the OSD it hung up.  That might tell us more about slow/latent operations.
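
A rough way to rank those durations (assuming the dump landed in foo as above; 
adjust the grep if your version's JSON layout differs):

grep -o '"duration": [0-9.]*' foo | awk '{print $2}' | sort -rn | head -20

That prints the 20 largest durations in seconds; searching foo for one of those 
values then takes you straight to the offending op and its per-step event times.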

5) Something interesting here is that I've heard from another party that in a 
36 drive Supermicro SC847E16 chassis they had 30 7.2K RPM disks and 6 SSDs on a 
SAS9207-8i controller and were pushing significantly faster throughput than you 
are seeing (even given the greater number of drives).  So it's very interesting 
to me that you are pushing so much less.  The 36 drive supermicro chassis I 
have with no expanders and 30 drives with 6 SSDs can push about 2100MB/s with a 
bunch of 9207-8i controllers and XFS (no replication).

Mark

On 11/05/2013 05:15 AM, Dinu Vlad wrote:

Ok, so after tweaking the deadline scheduler and the filestore_wbthrottle* ceph 
settings I was able to get 440 MB/s from 8 rados bench instances, over a single 
osd node (pool pg_num = 1800, size = 1)
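
For reference, the scheduler half of that tweak is just something along these 
lines (the device names here are only examples; repeat for each OSD data disk, 
or set elevator=deadline on the kernel command line to make it persistent):

cat /sys/block/sdb/queue/scheduler             # shows e.g. "noop deadline [cfq]"
echo deadline | sudo tee /sys/block/sdb/queue/scheduler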

This still looks awfully slow to me - fio throughput across all disks reaches 
2.8 GB/s!!

I'd appreciate any suggestion, where to look for the issue. Thanks!


On Oct 31, 2013, at 6:35 PM, Dinu Vlad  wrote:



I tested the osd performance from a single node. For this purpose I deployed a new cluster 
(using ceph-deploy, as before) and on fresh/repartitioned drives. I created a single pool, 
1800 pgs. I ran the rados bench both on the osd server and on a remote one. Cluster 
configuration stayed "default", with the same additions about xfs mount & 
mkfs.xfs as before.
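
For anyone wanting to reproduce the test, a single bench instance against such 
a pool would look roughly like this (the pool name and runtime are arbitrary):

ceph osd pool create bench 1800 1800
ceph osd pool set bench size 1
rados bench -p bench 60 write -t 16

with 8 such rados bench processes launched in parallel for the multi-process 
runs.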

With a single host, the pgs were "stuck unclean" (active only, not 
active+clean):

# ceph -s
  cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062
   health HEALTH_WARN 1800 pgs stuck unclean
   monmap e1: 3 mons at 
{cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0},
 election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3
   osdmap e101: 18 osds: 18 up, 18 in
pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 
16759 GB avail
   mdsmap e1: 0/0/1 up


Test results:
Local test, 1 process, 16 threads: 241.7 MB/s
Local test, 8 processes, 128 threads: 374.8 MB/s
Remote test, 1 process, 16 threads: 231.8 MB/s
Remote test, 8 processes, 128 threads: 366.1 MB/s

Maybe it's just me, but it seems on the low side too.

Thanks,
Dinu


On Oct 30, 2013, at 8:59 PM, Mark Nelson  wrote:


On 10/30/2013 01:51 PM, Dinu Vlad wrote:

Mark,

The SSDs are 
http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021
 and the HDDs are 
http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS.

The chassis is a "SiliconMechanics C602" - but I don't have the exact model. 
It's based on Supermicro, has 24 slots in front and 2 in the back

Re: [ceph-users] Ceph User Committee

2013-11-06 Thread Loic Dachary


On 07/11/2013 01:53, ja...@peacon.co.uk wrote:
> It's a great idea... are there any requirements, to be considered?

Being a Ceph user seems to be the only requirement to me. Do you have something 
else in mind ?

Cheers

> 
> On 2013-11-06 17:35, Loic Dachary wrote:
>> Hi Ceph,
>>
>> I would like to open a discussion about organizing a Ceph User
>> Committee. We briefly discussed the idea with Ross Turk, Patrick
>> McGarry and Sage Weil today during the OpenStack summit. A pad was
>> created and roughly summarizes the idea:
>>
>> http://pad.ceph.com/p/user-committee
>>
>> If there is enough interest, I'm willing to devote one day a week
>> working for the Ceph User Committee. And yes, that includes sitting at
>> the Ceph booth during the FOSDEM :-) And interviewing Ceph users and
>> describing their use cases, which I enjoy very much. But also
>> contribute to a user centric roadmap, which is what ultimately matters
>> for the company I work for.
>>
>> If you'd like to see this happen but don't have time to participate
>> in this discussion, please add your name + email at the end of the
>> pad.
>>
>> What do you think ?
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Committee

2013-11-06 Thread Lincoln Bryant
Seems interesting to me. I've added my name to the pot :)

--Lincoln

On Nov 6, 2013, at 11:56 AM, Loic Dachary wrote:

> 
> 
> On 07/11/2013 01:53, ja...@peacon.co.uk wrote:
>> It's a great idea... are there any requirements, to be considered?
> 
> Being a Ceph user seems to be the only requirement to me. Do you have 
> something else in mind ?
> 
> Cheers
> 
>> 
>> On 2013-11-06 17:35, Loic Dachary wrote:
>>> Hi Ceph,
>>> 
>>> I would like to open a discussion about organizing a Ceph User
>>> Committee. We briefly discussed the idea with Ross Turk, Patrick
>>> McGarry and Sage Weil today during the OpenStack summit. A pad was
>>> created and roughly summarizes the idea:
>>> 
>>> http://pad.ceph.com/p/user-committee
>>> 
>>> If there is enough interest, I'm willing to devote one day a week
>>> working for the Ceph User Committee. And yes, that includes sitting at
>>> the Ceph booth during the FOSDEM :-) And interviewing Ceph users and
>>> describing their use cases, which I enjoy very much. But also
>>> contribute to a user centric roadmap, which is what ultimately matters
>>> for the company I work for.
>>> 
>>> If you'd like to see this happen but don't have time to participate
>>> in this discussion, please add your name + email at the end of the
>>> pad.
>>> 
>>> What do you think ?
>>> 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> -- 
> Loïc Dachary, Artisan Logiciel Libre
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Committee

2013-11-06 Thread Mike Dawson

I also have time I could spend. Thanks for getting this started Loic!

Thanks,
Mike Dawson


On 11/6/2013 12:35 PM, Loic Dachary wrote:

Hi Ceph,

I would like to open a discussion about organizing a Ceph User Committee. We 
briefly discussed the idea with Ross Turk, Patrick McGarry and Sage Weil today 
during the OpenStack summit. A pad was created and roughly summarizes the idea:

http://pad.ceph.com/p/user-committee

If there is enough interest, I'm willing to devote one day a week working for 
the Ceph User Committee. And yes, that includes sitting at the Ceph booth 
during the FOSDEM :-) And interviewing Ceph users and describing their use 
cases, which I enjoy very much. But also contribute to a user centric roadmap, 
which is what ultimately matters for the company I work for.

If you'd like to see this happen but don't have time to participate in this 
discussion, please add your name + email at the end of the pad.

What do you think ?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster performance

2013-11-06 Thread Mike Dawson
We just fixed a performance issue on our cluster related to spikes of 
high latency on some of our SSDs used for osd journals. In our case, the 
slow SSDs showed spikes of 100x higher latency than expected.


What SSDs were you using that were so slow?

Cheers,
Mike

On 11/6/2013 12:39 PM, Dinu Vlad wrote:

I'm using the latest 3.8.0 branch from raring. Is there a more recent/better 
kernel recommended?

Meanwhile, I think I might have identified the culprit - my SSD drives are 
extremely slow on sync writes, doing 5-600 iops max with 4k blocksize. By 
comparison, an Intel 530 in another server (also installed behind a SAS 
expander) is doing the same test with ~ 8k iops. I guess I'm good for replacing 
them.

Removing the SSD drives from the setup and re-testing with ceph => 595 MB/s 
throughput under the same conditions (only mechanical drives, journal on a 
separate partition on each one, 8 rados bench processes, 16 threads each).


On Nov 5, 2013, at 4:38 PM, Mark Nelson  wrote:


Ok, some more thoughts:

1) What kernel are you using?

2) Mixing SATA and SAS on an expander backplane can sometimes have bad 
effects.  We don't really know how bad this is and in what circumstances, but 
the Nexenta folks have seen problems with ZFS on Solaris and it's not 
impossible that Linux may suffer too:

http://gdamore.blogspot.com/2010/08/why-sas-sata-is-not-such-great-idea.html

3) If you are doing tests and look at disk throughput with something like "collectl 
-sD -oT", do the writes look balanced across the spinning disks?  Do any devices 
have really high service times or queue times?

4) Also, after the test is done, you can try:

find /var/run/ceph/*.asok -maxdepth 1 -exec sudo ceph --admin-daemon {} 
dump_historic_ops \; > foo

and then grep for "duration" in foo.  You'll get a list of the slowest 
operations over the last 10 minutes from every osd on the node.  Once you identify a slow 
duration, you can go back and in an editor search for the slow duration and look at where 
in the OSD it hung up.  That might tell us more about slow/latent operations.

5) Something interesting here is that I've heard from another party that in a 
36 drive Supermicro SC847E16 chassis they had 30 7.2K RPM disks and 6 SSDs on a 
SAS9207-8i controller and were pushing significantly faster throughput than you 
are seeing (even given the greater number of drives).  So it's very interesting 
to me that you are pushing so much less.  The 36 drive supermicro chassis I 
have with no expanders and 30 drives with 6 SSDs can push about 2100MB/s with a 
bunch of 9207-8i controllers and XFS (no replication).

Mark

On 11/05/2013 05:15 AM, Dinu Vlad wrote:

Ok, so after tweaking the deadline scheduler and the filestore_wbthrottle* ceph 
settings I was able to get 440 MB/s from 8 rados bench instances, over a single 
osd node (pool pg_num = 1800, size = 1)

This still looks awfully slow to me - fio throughput across all disks reaches 
2.8 GB/s!!

I'd appreciate any suggestion, where to look for the issue. Thanks!


On Oct 31, 2013, at 6:35 PM, Dinu Vlad  wrote:



I tested the osd performance from a single node. For this purpose I deployed a new cluster 
(using ceph-deploy, as before) and on fresh/repartitioned drives. I created a single pool, 
1800 pgs. I ran the rados bench both on the osd server and on a remote one. Cluster 
configuration stayed "default", with the same additions about xfs mount & 
mkfs.xfs as before.

With a single host, the pgs were "stuck unclean" (active only, not 
active+clean):

# ceph -s
  cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062
   health HEALTH_WARN 1800 pgs stuck unclean
   monmap e1: 3 mons at 
{cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0},
 election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3
   osdmap e101: 18 osds: 18 up, 18 in
pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 
16759 GB avail
   mdsmap e1: 0/0/1 up


Test results:
Local test, 1 process, 16 threads: 241.7 MB/s
Local test, 8 processes, 128 threads: 374.8 MB/s
Remote test, 1 process, 16 threads: 231.8 MB/s
Remote test, 8 processes, 128 threads: 366.1 MB/s

Maybe it's just me, but it seems on the low side too.

Thanks,
Dinu


On Oct 30, 2013, at 8:59 PM, Mark Nelson  wrote:


On 10/30/2013 01:51 PM, Dinu Vlad wrote:

Mark,

The SSDs are 
http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021
 and the HDDs are 
http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS.

The chassis is a "SiliconMechanics C602" - but I don't have the exact model. 
It's based on Supermicro, has 24 slots in front and 2 in the back and a SAS expander.

I did a fio test (raw partitions, 4M blocksize, ioqueue maxed out according to 
what the driver reports in dmesg). Here are the results (filtered):

Sequential:
Run status group 0 (all jobs):
  WRITE: io=176952MB, aggrb=2879.0MB/s, minb=106306KB/s, m

Re: [ceph-users] Puppet Modules for Ceph

2013-11-06 Thread Karan Singh
A big thanks, Don, for creating the puppet modules.

I need your guidance on:

1) Did you manage to run this on CentOS?
2) What can be installed using these modules (mon, osd, mds, or all)?
3) What do I need to change in this module?


Many Thanks
Karan Singh


- Original Message -
From: "Don Talton (dotalton)" 
To: "Karan Singh" , ceph-users@lists.ceph.com, 
ceph-users-j...@lists.ceph.com, ceph-us...@ceph.com
Sent: Wednesday, 6 November, 2013 6:49:16 PM
Subject: RE: [ceph-users] Puppet Modules for Ceph

This will work https://github.com/dontalton/puppet-cephdeploy

Just change the unless statements (should only be two) from testing dpkg to 
testing rpm instead.
I'll add an OS check myself, or you can fork and send me a pull request.

> -Original Message-
> From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
> boun...@lists.ceph.com] On Behalf Of Karan Singh
> Sent: Wednesday, November 06, 2013 7:56 PM
> To: ceph-users@lists.ceph.com; ceph-users-j...@lists.ceph.com; ceph-
> us...@ceph.com
> Subject: Re: [ceph-users] Puppet Modules for Ceph
> 
> Dear Cephers
> 
> I have a running ceph cluster that was deployed using ceph-deploy , our next
> objective is to build a Puppet setup that can be used for long term scaling of
> ceph infrastructure.
> 
> It would be a great help if any one can
> 
> 1) Provide ceph modules for (centos OS)
> 2) Guidance on how to proceed
> 
> Many Thanks
> Karan Singh
> 
> 
> - Original Message -
> From: "Karan Singh" 
> To: "Loic Dachary" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, 4 November, 2013 5:01:26 PM
> Subject: Re: [ceph-users] Ceph deployment using puppet
> 
> Hello Loic
> 
> Thanks for your reply, Ceph-deploy works well for me.
> 
> My next objective is to deploy ceph using puppet. Can you guide me on how I
> can proceed?
> 
> Regards
> karan
> 
> - Original Message -
> From: "Loic Dachary" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, 4 November, 2013 4:45:06 PM
> Subject: Re: [ceph-users] Ceph deployment using puppet
> 
> Hi,
> 
> Unless you're forced to use puppet for some reason, I suggest you give ceph-
> deploy a try:
> 
> http://ceph.com/docs/master/start/quick-ceph-deploy/
> 
> Cheers
> 
> On 04/11/2013 19:00, Karan Singh wrote:
> > Hello Everyone
> >
> > Can someone guide me on how I can start with "ceph deployment using
> puppet", and what I need to have for this.
> >
> > I have no prior idea of using puppet, hence I need your help getting
> started with it.
> >
> >
> > Regards
> > Karan Singh
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
> --
> Loïc Dachary, Artisan Logiciel Libre
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster performance

2013-11-06 Thread Dinu Vlad
ST240FN0021 connected via a SAS2x36 to a LSI 9207-8i. 

By "fixed" - you mean replaced the SSDs?  

Thanks,
Dinu

On Nov 6, 2013, at 10:25 PM, Mike Dawson  wrote:

> We just fixed a performance issue on our cluster related to spikes of high 
> latency on some of our SSDs used for osd journals. In our case, the slow SSDs 
> showed spikes of 100x higher latency than expected.
> 
> What SSDs were you using that were so slow?
> 
> Cheers,
> Mike
> 
> On 11/6/2013 12:39 PM, Dinu Vlad wrote:
>> I'm using the latest 3.8.0 branch from raring. Is there a more recent/better 
>> kernel recommended?
>> 
>> Meanwhile, I think I might have identified the culprit - my SSD drives are 
>> extremely slow on sync writes, doing 5-600 iops max with 4k blocksize. By 
>> comparison, an Intel 530 in another server (also installed behind a SAS 
>> expander) is doing the same test with ~ 8k iops. I guess I'm good for 
>> replacing them.
>> 
>> Removing the SSD drives from the setup and re-testing with ceph => 595 MB/s 
>> throughput under the same conditions (only mechanical drives, journal on a 
>> separate partition on each one, 8 rados bench processes, 16 threads each).
>> 
>> 
>> On Nov 5, 2013, at 4:38 PM, Mark Nelson  wrote:
>> 
>>> Ok, some more thoughts:
>>> 
>>> 1) What kernel are you using?
>>> 
>>> 2) Mixing SATA and SAS on an expander backplane can sometimes have bad 
>>> effects.  We don't really know how bad this is and in what circumstances, 
>>> but the Nexenta folks have seen problems with ZFS on Solaris and it's not 
>>> impossible that Linux may suffer too:
>>> 
>>> http://gdamore.blogspot.com/2010/08/why-sas-sata-is-not-such-great-idea.html
>>> 
>>> 3) If you are doing tests and look at disk throughput with something like 
>>> "collectl -sD -oT", do the writes look balanced across the spinning disks?  
>>> Do any devices have really high service times or queue times?
>>> 
>>> 4) Also, after the test is done, you can try:
>>> 
>>> find /var/run/ceph/*.asok -maxdepth 1 -exec sudo ceph --admin-daemon {} 
>>> dump_historic_ops \; > foo
>>> 
>>> and then grep for "duration" in foo.  You'll get a list of the slowest 
>>> operations over the last 10 minutes from every osd on the node.  Once you 
>>> identify a slow duration, you can go back and in an editor search for the 
>>> slow duration and look at where in the OSD it hung up.  That might tell us 
>>> more about slow/latent operations.
>>> 
>>> 5) Something interesting here is that I've heard from another party that in 
>>> a 36 drive Supermicro SC847E16 chassis they had 30 7.2K RPM disks and 6 
>>> SSDs on a SAS9207-8i controller and were pushing significantly faster 
>>> throughput than you are seeing (even given the greater number of drives).  
>>> So it's very interesting to me that you are pushing so much less.  The 36 
>>> drive supermicro chassis I have with no expanders and 30 drives with 6 SSDs 
>>> can push about 2100MB/s with a bunch of 9207-8i controllers and XFS (no 
>>> replication).
>>> 
>>> Mark
>>> 
>>> On 11/05/2013 05:15 AM, Dinu Vlad wrote:
 Ok, so after tweaking the deadline scheduler and the filestore_wbthrottle* 
 ceph settings I was able to get 440 MB/s from 8 rados bench instances, 
 over a single osd node (pool pg_num = 1800, size = 1)
 
 This still looks awfully slow to me - fio throughput across all disks 
 reaches 2.8 GB/s!!
 
 I'd appreciate any suggestion, where to look for the issue. Thanks!
 
 
 On Oct 31, 2013, at 6:35 PM, Dinu Vlad  wrote:
 
> 
> I tested the osd performance from a single node. For this purpose I 
> deployed a new cluster (using ceph-deploy, as before) and on 
> fresh/repartitioned drives. I created a single pool, 1800 pgs. I ran the 
> rados bench both on the osd server and on a remote one. Cluster 
> configuration stayed "default", with the same additions about xfs mount & 
> mkfs.xfs as before.
> 
> With a single host, the pgs were "stuck unclean" (active only, not 
> active+clean):
> 
> # ceph -s
>  cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062
>   health HEALTH_WARN 1800 pgs stuck unclean
>   monmap e1: 3 mons at 
> {cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0},
>  election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>   osdmap e101: 18 osds: 18 up, 18 in
>pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 
> GB / 16759 GB avail
>   mdsmap e1: 0/0/1 up
> 
> 
> Test results:
> Local test, 1 process, 16 threads: 241.7 MB/s
> Local test, 8 processes, 128 threads: 374.8 MB/s
> Remote test, 1 process, 16 threads: 231.8 MB/s
> Remote test, 8 processes, 128 threads: 366.1 MB/s
> 
> Maybe it's just me, but it seems on the low side too.
> 
> Thanks,
> Dinu
> 
> 
> On Oct 30, 2013, at 8:59 PM, Mark Nelson  wrote:
> 
>> On 10/30/2013 01:51 PM, Dinu Vlad wrote:
>

Re: [ceph-users] ceph cluster performance

2013-11-06 Thread Mike Dawson
No, in our case flashing the firmware to the latest release cured the 
problem.


If you build a new cluster with the slow SSDs, I'd be interested in the 
results of ioping[0] or fsync-tester[1]. I theorize that you may see 
spikes of high latency.


[0] https://code.google.com/p/ioping/
[1] https://github.com/gregsfortytwo/fsync-tester
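
A crude way to reproduce that kind of check with fio (the device path is a 
placeholder, and this writes to it, so point it at a scratch partition rather 
than a journal that is in use):

fio --name=synctest --filename=/dev/sdX --direct=1 --sync=1 --rw=write \
    --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based

Healthy journal SSDs sustain several thousand IOPS on a test like this; the 
slow drives discussed in this thread sat in the hundreds.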

Thanks,
Mike Dawson


On 11/6/2013 4:18 PM, Dinu Vlad wrote:

ST240FN0021 connected via a SAS2x36 to a LSI 9207-8i.

By "fixed" - you mean replaced the SSDs?

Thanks,
Dinu

On Nov 6, 2013, at 10:25 PM, Mike Dawson  wrote:


We just fixed a performance issue on our cluster related to spikes of high 
latency on some of our SSDs used for osd journals. In our case, the slow SSDs 
showed spikes of 100x higher latency than expected.

What SSDs were you using that were so slow?

Cheers,
Mike

On 11/6/2013 12:39 PM, Dinu Vlad wrote:

I'm using the latest 3.8.0 branch from raring. Is there a more recent/better 
kernel recommended?

Meanwhile, I think I might have identified the culprit - my SSD drives are 
extremely slow on sync writes, doing 5-600 iops max with 4k blocksize. By 
comparison, an Intel 530 in another server (also installed behind a SAS 
expander) is doing the same test with ~ 8k iops. I guess I'm good for replacing 
them.

Removing the SSD drives from the setup and re-testing with ceph => 595 MB/s 
throughput under the same conditions (only mechanical drives, journal on a 
separate partition on each one, 8 rados bench processes, 16 threads each).


On Nov 5, 2013, at 4:38 PM, Mark Nelson  wrote:


Ok, some more thoughts:

1) What kernel are you using?

2) Mixing SATA and SAS on an expander backplane can sometimes have bad 
effects.  We don't really know how bad this is and in what circumstances, but 
the Nexenta folks have seen problems with ZFS on Solaris and it's not 
impossible that Linux may suffer too:

http://gdamore.blogspot.com/2010/08/why-sas-sata-is-not-such-great-idea.html

3) If you are doing tests and look at disk throughput with something like "collectl 
-sD -oT", do the writes look balanced across the spinning disks?  Do any devices 
have really high service times or queue times?

4) Also, after the test is done, you can try:

find /var/run/ceph/*.asok -maxdepth 1 -exec sudo ceph --admin-daemon {} 
dump_historic_ops \; > foo

and then grep for "duration" in foo.  You'll get a list of the slowest 
operations over the last 10 minutes from every osd on the node.  Once you identify a slow 
duration, you can go back and in an editor search for the slow duration and look at where 
in the OSD it hung up.  That might tell us more about slow/latent operations.

5) Something interesting here is that I've heard from another party that in a 
36 drive Supermicro SC847E16 chassis they had 30 7.2K RPM disks and 6 SSDs on a 
SAS9207-8i controller and were pushing significantly faster throughput than you 
are seeing (even given the greater number of drives).  So it's very interesting 
to me that you are pushing so much less.  The 36 drive supermicro chassis I 
have with no expanders and 30 drives with 6 SSDs can push about 2100MB/s with a 
bunch of 9207-8i controllers and XFS (no replication).

Mark

On 11/05/2013 05:15 AM, Dinu Vlad wrote:

Ok, so after tweaking the deadline scheduler and the filestore_wbthrottle* ceph 
settings I was able to get 440 MB/s from 8 rados bench instances, over a single 
osd node (pool pg_num = 1800, size = 1)

This still looks awfully slow to me - fio throughput across all disks reaches 
2.8 GB/s!!

I'd appreciate any suggestion, where to look for the issue. Thanks!


On Oct 31, 2013, at 6:35 PM, Dinu Vlad  wrote:



I tested the osd performance from a single node. For this purpose I deployed a new cluster 
(using ceph-deploy, as before) and on fresh/repartitioned drives. I created a single pool, 
1800 pgs. I ran the rados bench both on the osd server and on a remote one. Cluster 
configuration stayed "default", with the same additions about xfs mount & 
mkfs.xfs as before.

With a single host, the pgs were "stuck unclean" (active only, not 
active+clean):

# ceph -s
  cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062
   health HEALTH_WARN 1800 pgs stuck unclean
   monmap e1: 3 mons at 
{cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0},
 election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3
   osdmap e101: 18 osds: 18 up, 18 in
pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 
16759 GB avail
   mdsmap e1: 0/0/1 up


Test results:
Local test, 1 process, 16 threads: 241.7 MB/s
Local test, 8 processes, 128 threads: 374.8 MB/s
Remote test, 1 process, 16 threads: 231.8 MB/s
Remote test, 8 processes, 128 threads: 366.1 MB/s

Maybe it's just me, but it seems on the low side too.

Thanks,
Dinu


On Oct 30, 2013, at 8:59 PM, Mark Nelson  wrote:


On 10/30/2013 01:51 PM, Dinu Vlad wrote:

Mark,

The SSDs are 
http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterpris

[ceph-users] Manual Installation steps without ceph-deploy

2013-11-06 Thread Trivedi, Narendra
Hi All,

I did a fresh install of Ceph (this might be like the 10th or 11th install) on 4 
new VMs (one admin, one MON and two OSDs) built from the CentOS 6.4 (x64) .iso, 
and did a yum update on all of them. They are all running on VMware ESXi 5.1.0. I 
did everything Sage et al suggested (i.e. creation of /ceph/osd* and making 
sure /etc/ceph is present on all nodes; /etc/ceph gets created by the 
ceph-deploy install and contains rbdmap, FYI). Once again, I ended up with the 
same problem while activating OSDs (the last 4 lines keep repeating 
forever):

2013-11-06 14:37:39,626 [ceph_deploy.cli][INFO  ] Invoked (1.3): 
/usr/bin/ceph-deploy osd activate ceph-node2-osd0-centos-6-4:/ceph/osd0 
ceph-node3-osd1-centos-6-4:/ceph/osd1
2013-11-06 14:37:39,627 [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks 
ceph-node2-osd0-centos-6-4:/ceph/osd0: ceph-node3-osd1-centos-6-4:/ceph/osd1:
2013-11-06 14:37:39,901 [ceph-node2-osd0-centos-6-4][DEBUG ] connected to host: 
ceph-node2-osd0-centos-6-4
2013-11-06 14:37:39,902 [ceph-node2-osd0-centos-6-4][DEBUG ] detect platform 
information from remote host
2013-11-06 14:37:39,917 [ceph-node2-osd0-centos-6-4][DEBUG ] detect machine type
2013-11-06 14:37:39,925 [ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final
2013-11-06 14:37:39,925 [ceph_deploy.osd][DEBUG ] activating host 
ceph-node2-osd0-centos-6-4 disk /ceph/osd0
2013-11-06 14:37:39,925 [ceph_deploy.osd][DEBUG ] will use init type: sysvinit
2013-11-06 14:37:39,925 [ceph-node2-osd0-centos-6-4][INFO  ] Running command: 
sudo ceph-disk-activate --mark-init sysvinit --mount /ceph/osd0
2013-11-06 14:37:40,145 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 
14:37:41.075310 7fac2414c700  0 -- :/1029546 >> 10.12.0.70:6789/0 
pipe(0x7fac20024480 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac200246e0).fault
2013-11-06 14:37:43,167 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 
14:37:44.071697 7fac1ebfd700  0 -- :/1029546 >> 10.12.0.70:6789/0 
pipe(0x7fac14000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac14000e60).fault
2013-11-06 14:37:46,140 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 
14:37:47.071938 7fac2414c700  0 -- :/1029546 >> 10.12.0.70:6789/0 
pipe(0x7fac14003010 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac14003270).fault
2013-11-06 14:37:50,165 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 
14:37:51.071245 7fac1ebfd700  0 -- :/1029546 >> 10.12.0.70:6789/0 
pipe(0x7fac14003a70 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac14003cd0).fault

It might be bad luck, but I want to try a manual installation without 
ceph-deploy because it seems I am jinxed with ceph-deploy. Could anyone please 
forward me the steps? I am happy to share the ceph.log with anyone who would 
like to look into this error, but I don't have a clue.


Thanks a lot!
Narendra Trivedi | savviscloud


This message contains information which may be confidential and/or privileged. 
Unless you are the intended recipient (or authorized to receive for the 
intended recipient), you may not read, use, copy or disclose to anyone the 
message or any information contained in the message. If you have received the 
message in error, please advise the sender by reply e-mail and delete the 
message and any attachment(s) thereto without retaining any copies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster performance

2013-11-06 Thread james

On 2013-11-06 20:25, Mike Dawson wrote:

   We just fixed a performance issue on our cluster related to spikes 
of high latency on some of our SSDs used for osd journals. In our case, 
the slow SSDs showed spikes of 100x higher latency than expected.



Many SSDs show this behaviour when 100% provisioned and/or never 
TRIM'd, since the pool of ready erased cells is quickly depleted under 
steady write workload, so it has to wait for cells to charge to 
accommodate the write.
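
Where the filesystem and controller support discard, an occasional fstrim (or 
simply leaving a slice of the SSD unpartitioned as extra over-provisioning) 
helps keep that pool of pre-erased cells topped up; for example (the mount 
point is only an example):

sudo fstrim -v /var/lib/ceph/osd/ceph-0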


The Intel 3700 SSDs look to have some of the best consistency ratings 
of any of the more reasonably priced drives at the moment, and good IOPS 
too:


http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html

Obviously the quoted IOPS numbers are dependent on quite a deep queue 
mind.


There is a big range of performance in the market currently; some 
Enterprise SSDs are quoted at just 4,000 IOPS yet cost as many pounds!



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster performance

2013-11-06 Thread Mark Nelson

On 11/06/2013 03:35 PM, ja...@peacon.co.uk wrote:

On 2013-11-06 20:25, Mike Dawson wrote:


   We just fixed a performance issue on our cluster related to spikes
of high latency on some of our SSDs used for osd journals. In our
case, the slow SSDs showed spikes of 100x higher latency than expected.



Many SSDs show this behaviour when 100% provisioned and/or never TRIM'd,
since the pool of ready erased cells is quickly depleted under steady
write workload, so it has to wait for cells to charge to accommodate the
write.

The Intel 3700 SSDs look to have some of the best consistency ratings of
any of the more reasonably priced drives at the moment, and good IOPS too:

http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html


Obviously the quoted IOPS numbers are dependent on quite a deep queue mind.

There is a big range of performance in the market currently; some
Enterprise SSDs are quoted at just 4,000 IOPS yet cost as many pounds!


Most vendors won't give you DC S3700s by default, but if you put your 
foot down most of them seem to have SKUs for them lurking around 
somewhere.  Right now they are the first drive I recommend for journals, 
though I believe some of the other vendors may have some interesting 
options in the future too.





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] USB pendrive as boot disk

2013-11-06 Thread Craig Lewis
I've done this for some NFS machines (the ones I'm currently migrating 
to Ceph).  It works... but I'm moving back to small SSDs for the OS.



I used a pair of USB thumbdrives, in a RAID1.  It worked fine for about 
a year.  Then I lost both mirrors in multiple machines, all within an 
hour.  I thought it was a bad batch, and made changes to make sure I 
used mirrors from different batches of drives.  Then it happened again 6 
months later.  My best guess is that the automated manufacturing for 
these devices is so tight that the RAID1 wears them both out at exactly 
the same time.


These drives have no controller, so they have no SMART data and no wear 
leveling.  If you hotspot any part of the drive, you'll wear that part 
out very quickly.  Make sure you don't use your swap at all, and make 
sure you mount those filesystems with noatime.  It should have occurred 
to me sooner, but I should have followed a guide for booting off compact 
flash.
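
Concretely, that boils down to an fstab entry along these lines (the device and 
filesystem are only examples):

/dev/sdg1  /  ext4  defaults,noatime  0  1

plus either no swap partition on the thumb drive at all, or swapoff -a with 
the swap line commented out of /etc/fstab.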


I eventually made it work by switching to drives from different 
manufacturers, and attempting to verify that the actual flash chips are 
also from different manufacturers.  It's a bit of work to figure out at 
the time, and it's something you need to re-verify every so often.




For my Ceph cluster, I'm going back to SSDs for the OS.  Instead of 
using two of my precious 3.5" bays, I'm buying some PCI 2.5" drive bays: 
http://www.amazon.com/Syba-Mount-Mobile-2-5-Inch-SY-MRA25023/dp/B0080V73RE 
, and plugging them into the motherboard SATA ports.  The next chassis I 
buy will have some dedicated 2.5" bays, like this:
http://www.supermicro.com/products/system/2U/6027/SSG-6027R-E1R12T.cfm 
(I see a lot of manufacturers starting to do this).



The SSDs support SMART and wear leveling, and cheap SSDs are just a bit 
more than the 32GB thumb drives I was buying.  Since it's just the OS, I 
don't need high performance SSDs, and I'm still following the compact 
flash booting guides to extend their lifetime.



I am worried that I'll have the same problem with SSD mirrors failing at 
the same time.  Since I can monitor wear leveling, I plan to retire the 
OS mirrors before the wear leveling runs out.
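
A simple way to watch that, assuming smartmontools is installed (the exact 
attribute name is vendor-specific; Intel drives expose a media wearout 
indicator, for example):

sudo smartctl -A /dev/sda | grep -i -e wear -e media

Graphing or alerting on that value should give plenty of warning before both 
members of a mirror reach end of life together.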




*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website   | Twitter 
  | Facebook 
  | LinkedIn 
  | Blog 



On 11/5/13 13:33 , Gandalf Corvotempesta wrote:

Hi,
what do you think about using a USB pendrive as the boot disk for OSD nodes?
Pendrives are cheaper and bigger, and doing this will allow me to use
all spinning disks and SSDs for OSD storage/journal.

Moreover, in the future, I'll be able to boot from the network, replacing the
pendrive without losing space on the spinning disks to store the operating
system.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Manual Installation steps without ceph-deploy

2013-11-06 Thread james

I also had some difficulty with ceph-deploy on CentOS.

I eventually moved to Ubuntu 13.04 - and haven't looked back.


On 2013-11-06 21:35, Trivedi, Narendra wrote:

Hi All,

I did a fresh install of Ceph (this might be like 10th or 11th
install) on 4 new VMs (one admin, one MON and two OSDs) built from
CentOS 6.4 (x64)... it seems I am jinxed with ceph-deploy.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Manual Installation steps without ceph-deploy

2013-11-06 Thread Trivedi, Narendra
Unfortunately, I don't have that luxury. 

Thanks!
Narendra

-Original Message-
From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of ja...@peacon.co.uk
Sent: Wednesday, November 06, 2013 4:43 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Manual Installation steps without ceph-deploy

I also had some difficulty with ceph-deploy on CentOS.

I eventually moved to Ubuntu 13.04 - and haven't looked back.


On 2013-11-06 21:35, Trivedi, Narendra wrote:
> Hi All,
>
> I did a fresh install of Ceph (this might be like 10th or 11th
> install) on 4 new VMs (one admin, one MON and two OSDs) built from 
> CentOS 6.4 (x64)... it seems I am jinxed with ceph-deploy.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

This message contains information which may be confidential and/or privileged. 
Unless you are the intended recipient (or authorized to receive for the 
intended recipient), you may not read, use, copy or disclose to anyone the 
message or any information contained in the message. If you have received the 
message in error, please advise the sender by reply e-mail and delete the 
message and any attachment(s) thereto without retaining any copies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] USB pendrive as boot disk

2013-11-06 Thread Gandalf Corvotempesta
On 06 Nov 2013 at 23:12, "Craig Lewis"  wrote:
>
> For my Ceph cluster, I'm going back to SSDs for the OS.  Instead of using
two of my precious 3.5" bays, I'm buying some PCI 2.5" drive bays:
http://www.amazon.com/Syba-Mount-Mobile-2-5-Inch-SY-MRA25023/dp/B0080V73RE,
and plugging them into the motherboard SATA ports.  The next chassis I
buy will have some dedicated 2.5" bays, like this:
> http://www.supermicro.com/products/system/2U/6027/SSG-6027R-E1R12T.cfm (I
see a lot of manufacturers starting to do this).
>
>
> The SSDs support SMART and wear leveling, and cheap SSDs are just a bit
more than the 32GB thumb drives I was buying.  Since it's just the OS, I
don't need high performance SSDs, and I'm still following the compact flash
booting guides to extend their lifetime.
>
>
> I am worried that I'll have the same problem with SSD mirrors failing at
the same time.  Since I can monitor wear leveling, I plan to retire the OS
mirrors before the wear leveling runs out

With the suggested adapter, why not use a standard 2.5'' SATA disk?
SATA for the OS should be enough, no need for an SSD.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel Panic / RBD Instability

2013-11-06 Thread Mikaël Cluseau

Hello,

if you use kernel RBD, maybe your issue is linked to this one : 
http://tracker.ceph.com/issues/5760


Best regards,
Mikael.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] USB pendrive as boot disk

2013-11-06 Thread Craig Lewis


On 11/6/13 15:41 , Gandalf Corvotempesta wrote:

With the suggested adapter, why not use a standard 2.5'' SATA disk?

SATA for the OS should be enough, no need for an SSD.

At the time, the smallest SSDs were about half the price of the smallest 
HDDs.  My Ceph nodes are only using ~4GB on /, so small and cheap is 
fine.  I would've continued using the USB pen drives if they hadn't 
caused so many problems.


It was only while composing my email that it occurred to me that the SSD 
mirrors might fail the same way as the USB drives.  I'm less worried 
about it, but I do plan to monitor their wear levels now that it has 
occurred to me.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] USB pendrive as boot disk

2013-11-06 Thread Mark Kirkwood

On 07/11/13 13:54, Craig Lewis wrote:


On 11/6/13 15:41 , Gandalf Corvotempesta wrote:

With the suggested adapter, why not use a standard 2.5'' SATA disk?

SATA for the OS should be enough, no need for an SSD.

At the time, the smallest SSDs were about half the price of the 
smallest HDDs.  My Ceph nodes are only using ~4GB on /, so small and 
cheap is fine.  I would've continued using the USB pen drives if they 
hadn't caused so many problems.


It was only while composing my email that it occurred to me that the SSD 
mirrors might fail the same way as the USB drives.  I'm less worried 
about it, but I do plan to monitor their wear levels now that it has 
occurred to me.





The SSD failures I've seen have all been firmware bugs rather than flash 
wearout. This has the effect that a RAID1 pair is likely to fail at the 
same time!


Regards

Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Committee

2013-11-06 Thread Alek Paunov

On 06.11.2013 19:35, Loic Dachary wrote:

Hi Ceph,

I would like to open a discussion about organizing a Ceph User Committee. We 
briefly discussed the idea with Ross Turk, Patrick McGarry and Sage Weil today 
during the OpenStack summit. A pad was created and roughly summarizes the idea:

What do you think ?



The core Ceph developers are entirely engaged with deep Ceph internals, 
optimizations and sophisticated new features.


On the other hand, I think the Ceph community is able to help further 
with wider and smoother Ceph adoption (beyond the current mailing 
list participation in support) with far simpler and more generic bits 
of code:


Web application - Ceph setups gallery. Could be organized like github: 
account/setup. Setup consists of (declarative - e.g JSON/XML exportable) 
Topology, Hardware, OSs, Ceph (setup method, command/script steps and/or 
chef/puppet artifacts, final running ceph configs on node classes).


When a Ceph architect/admin has a successful, tuned cluster, and she is 
willing to share (or just wants to keep it as documentation), she describes 
the setup under her account (with private bits obfuscated).


When a new Ceph user wants to ask a question or needs help, he can (as 
an alternative to the current setup descriptions in e-mail prose) create an 
account and add his draft there, pointing to the revision in the e-mail. 
Also, a new user can just clone an existing template, adjust it to his local 
context and try to generate scripts and/or management system artifacts 
on top of the revision export.


Once the basic data schema has been roughly determined, interactive SVG 
views (using/like d3js) could be added.


Regards,
Alek

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Error: Package: 1:python-flask-0.9-5.el6.noarch (epel), Requires: python-sphinx

2013-11-06 Thread Eyal Gutkind
Trying to install ceph on my machines.
Using RHEL6.3 I get the following error while invoking ceph-deploy.

Tried to install sphinx on ceph-node; it seems to have been successful and installed.
Still, it seems that during the installation there is an unresolved dependency.

[apollo006][INFO  ] Running command: sudo yum -y -q install ceph
[apollo006][ERROR ] Error: Package: 1:python-flask-0.9-5.el6.noarch (epel)
[apollo006][ERROR ]Requires: python-sphinx


Below is the deploying command line


$ ceph-deploy install apollo006
[ceph_deploy.cli][INFO  ] Invoked (1.3): /usr/bin/ceph-deploy install apollo006
[ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster 
ceph hosts apollo006
[ceph_deploy.install][DEBUG ] Detecting platform for host apollo006 ...
[apollo006][DEBUG ] connected to host: apollo006
[apollo006][DEBUG ] detect platform information from remote host
[apollo006][DEBUG ] detect machine type
[ceph_deploy.install][INFO  ] Distro info: Red Hat Enterprise Linux Server 6.3 
Santiago
[apollo006][INFO  ] installing ceph on apollo006
[apollo006][INFO  ] Running command: sudo rpm --import 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[apollo006][INFO  ] Running command: sudo rpm -Uvh --replacepkgs 
http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm
[apollo006][DEBUG ] Retrieving 
http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm
[apollo006][DEBUG ] Preparing...
##
[apollo006][DEBUG ] ceph-release
##
[apollo006][INFO  ] Running command: sudo yum -y -q install ceph
[apollo006][ERROR ] Error: Package: 1:python-flask-0.9-5.el6.noarch (epel)
[apollo006][ERROR ]Requires: python-sphinx
[apollo006][DEBUG ]  You could try using --skip-broken to work around the 
problem
[apollo006][DEBUG ]  You could try running: rpm -Va --nofiles --nodigest
[apollo006][ERROR ] Traceback (most recent call last):
[apollo006][ERROR ]   File 
"/usr/lib/python2.6/site-packages/ceph_deploy/lib/remoto/process.py", line 68, 
in run
[apollo006][ERROR ] reporting(conn, result, timeout)
[apollo006][ERROR ]   File 
"/usr/lib/python2.6/site-packages/ceph_deploy/lib/remoto/log.py", line 13, in 
reporting
[apollo006][ERROR ] received = result.receive(timeout)
[apollo006][ERROR ]   File 
"/usr/lib/python2.6/site-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py",
 line 455, in receive
[apollo006][ERROR ] raise self._getremoteerror() or EOFError()
[apollo006][ERROR ] RemoteError: Traceback (most recent call last):
[apollo006][ERROR ]   File "", line 806, in executetask
[apollo006][ERROR ]   File "", line 35, in _remote_run
[apollo006][ERROR ] RuntimeError: command returned non-zero exit status: 1
[apollo006][ERROR ]
[apollo006][ERROR ]
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y -q 
install ceph

Thank you  for your help,
EyalG
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw-agent failed to sync object

2013-11-06 Thread lixuehui
Hi all:
After we built a region with two zones distributed across two ceph clusters, we 
started the agent and it began working!
But what we find in the radosgw-agent stdout is that it fails to sync objects 
all the time. Here is the info:
 (env)root@ceph-rgw41:~/myproject# ./radosgw-agent -c cluster-data-sync.conf -q
region map is: {u'us': [u'us-west', u'us-east']}
ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: 
state is error
ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: 
state is error
ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: 
state is error
ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: 
state is error
ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: 
state is error
Metadata has already been copied from the master zone. I'd like to know the 
reason, and what 'state is error' means!




lixuehui
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Puppet Modules for Ceph

2013-11-06 Thread Don Talton (dotalton)
Hi Karan,

1. Not tested on CentOS at all. But since the work is done using ceph-deploy it 
*should* be the same.
2. Everything supported by ceph-deploy (mon, osd, mds).
3. Change the dpkg command to the equivalent rpm command to test whether or not 
a package is already installed.
  
https://github.com/dontalton/puppet-cephdeploy/blob/master/manifests/baseconfig.pp#L114
  
https://github.com/dontalton/puppet-cephdeploy/blob/master/manifests/init.pp#L122
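
For anyone making that change, the shell-level equivalence inside those unless 
checks is roughly (the package name is only an example):

dpkg -s ceph > /dev/null 2>&1    # Debian/Ubuntu: exits 0 if the package is installed
rpm -q ceph > /dev/null 2>&1     # CentOS/RHEL equivalent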
  


> -Original Message-
> From: Karan Singh [mailto:ksi...@csc.fi]
> Sent: Thursday, November 07, 2013 5:02 AM
> To: Don Talton (dotalton)
> Cc: ceph-users@lists.ceph.com; ceph-users-j...@lists.ceph.com; ceph-
> us...@ceph.com
> Subject: Re: [ceph-users] Puppet Modules for Ceph
> 
> A big thanks, Don, for creating the puppet modules.
> 
> I need your guidance on:
> 
> 1) Did you manage to run this on CentOS?
> 2) What can be installed using these modules (mon, osd, mds, or all)?
> 3) What do I need to change in this module?
> 
> 
> Many Thanks
> Karan Singh
> 
> 
> - Original Message -
> From: "Don Talton (dotalton)" 
> To: "Karan Singh" , ceph-users@lists.ceph.com, ceph-users-
> j...@lists.ceph.com, ceph-us...@ceph.com
> Sent: Wednesday, 6 November, 2013 6:49:16 PM
> Subject: RE: [ceph-users] Puppet Modules for Ceph
> 
> This will work https://github.com/dontalton/puppet-cephdeploy
> 
> Just change the unless statements (should only be two) from testing dpkg to
> testing rpm instead.
> I'll add an OS check myself, or you can fork and send me a pull request.
> 
> > -Original Message-
> > From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
> > boun...@lists.ceph.com] On Behalf Of Karan Singh
> > Sent: Wednesday, November 06, 2013 7:56 PM
> > To: ceph-users@lists.ceph.com; ceph-users-j...@lists.ceph.com; ceph-
> > us...@ceph.com
> > Subject: Re: [ceph-users] Puppet Modules for Ceph
> >
> > Dear Cephers
> >
> > I have a running ceph cluster that was deployed using ceph-deploy ,
> > our next objective is to build a Puppet setup that can be used for
> > long term scaling of ceph infrastructure.
> >
> > It would be a great help if any one can
> >
> > 1) Provide ceph modules for (centos OS)
> > 2) Guidance on how to proceed
> >
> > Many Thanks
> > Karan Singh
> >
> >
> > - Original Message -
> > From: "Karan Singh" 
> > To: "Loic Dachary" 
> > Cc: ceph-users@lists.ceph.com
> > Sent: Monday, 4 November, 2013 5:01:26 PM
> > Subject: Re: [ceph-users] Ceph deployment using puppet
> >
> > Hello Loic
> >
> > Thanks for your reply, Ceph-deploy works well for me.
> >
> > My next objective is to deploy ceph using puppet. Can you guide me on how
> > I can proceed?
> >
> > Regards
> > karan
> >
> > - Original Message -
> > From: "Loic Dachary" 
> > To: ceph-users@lists.ceph.com
> > Sent: Monday, 4 November, 2013 4:45:06 PM
> > Subject: Re: [ceph-users] Ceph deployment using puppet
> >
> > Hi,
> >
> > Unless you're forced to use puppet for some reason, I suggest you give
> > ceph-deploy a try:
> >
> > http://ceph.com/docs/master/start/quick-ceph-deploy/
> >
> > Cheers
> >
> > On 04/11/2013 19:00, Karan Singh wrote:
> > > Hello Everyone
> > >
> > > Can someone guide me on how I can start with "ceph deployment using
> > puppet", and what I need to have for this.
> > >
> > > I have no prior idea of using puppet, hence I need your help getting
> > started with it.
> > >
> > >
> > > Regards
> > > Karan Singh
> > >
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >
> > --
> > Loïc Dachary, Artisan Logiciel Libre
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 0.72 with zfs

2013-11-06 Thread Sage Weil
Hi Dinu,

You currently need to compile it yourself, and pass --with-zfs to 
./configure.

Once it is built in, ceph-osd will detect whether the underlying fs is zfs 
on its own.
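
For reference, a from-source build with that flag looks roughly like the 
following (the package providing the libzfs development headers varies by 
distro and is assumed to be installed already):

git clone --recursive https://github.com/ceph/ceph.git
cd ceph
./autogen.sh
./configure --with-zfs
make -j4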

sage



On Wed, 6 Nov 2013, Dinu Vlad wrote:

> Hello,
> 
> I'm testing the 0.72 release and thought to give a spin to the zfs support. 
> 
> While I managed to setup a cluster on top of a number of zfs datasets, the 
> ceph-osd logs show it's using the "genericfilestorebackend": 
> 
> 2013-11-06 09:27:59.386392 7fdfee0ab7c0  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP 
> ioctl is NOT supported
> 2013-11-06 09:27:59.386409 7fdfee0ab7c0  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP 
> ioctl is disabled via 'filestore fiemap' config option
> 2013-11-06 09:27:59.391026 7fdfee0ab7c0  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) 
> syscall fully supported (by glibc and kernel)
> 
> I noticed however that the ceph sources include some files related to zfs: 
> 
> # find . | grep -i zfs
> ./src/os/ZFS.cc
> ./src/os/ZFS.h
> ./src/os/ZFSFileStoreBackend.cc
> ./src/os/ZFSFileStoreBackend.h 
> 
> A couple of questions: 
> 
> - is 0.72-rc1 package currently in the raring repository compiled with zfs 
> support ? 
> - if yes - how can I "inform" ceph-osd to use the ZFSFileStoreBackend ? 
> 
> Thanks,
> Dinu
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Committee

2013-11-06 Thread Sage Weil
On Thu, 7 Nov 2013, Alek Paunov wrote:
> When a Ceph architect/admin have a successful, tuned cluster, if she is
> willing to share (or just keep as documentation), describes the setup under
> her account (with private bits obfuscated).

I think this is a great idea.  One of the big questions users have is 
"what kind of hardware should I buy."  An easy way for users to publish 
information about their setup (hardware, software versions, use-case, 
performance) when they have successful deployments would be very valuable.  
Maybe a section of wiki?

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Committee

2013-11-06 Thread Loic Dachary
Hi Alek,

On 07/11/2013 09:03, Alek Paunov wrote:
> On 06.11.2013 19:35, Loic Dachary wrote:
>> Hi Ceph,
>>
>> I would like to open a discussion about organizing a Ceph User Committee. We 
>> briefly discussed the idea with Ross Turk, Patrick McGarry and Sage Weil 
>> today during the OpenStack summit. A pad was created and roughly summarizes 
>> the idea:
>>
>> What do you think ?
>>
> 
> The core Ceph developers are entirely engaged with deep Ceph internals, 
> optimizations and sophisticated new features.
> 
> On the other side, I think, the Ceph community is able to help further with 
> the wider and smoother Ceph adoption (further than current mailing list 
> participation in the support) with far more simple and generic bits of code:
> 
> Web application - Ceph setups gallery. Could be organized like github: 
> account/setup. Setup consists of (declarative - e.g JSON/XML exportable) 
> Topology, Hardware, OSs, Ceph (setup method, command/script steps and/or 
> chef/puppet artifacts, final running ceph configs on node classes).

> 
> When a Ceph architect/admin have a successful, tuned cluster, if she is 
> willing to share (or just keep as documentation), describes the setup under 
> her account (with private bits obfuscated).

That makes sense. Do you see this as being in the scope of 
http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag ? I've added it 
to the "Work items" of http://pad.ceph.com/p/user-committee.

> 
> When a new Ceph user want to ask a question or need help, he can (as 
> alternative of current setup descriptions in e-mail prose) create an account 
> and add his draft there, pointing in the e-mail to the revision. Also a new 
> user can just clone existing template, adjust to his local context and try to 
> generate scripts and/or management system artifacts on top of the revision 
> export.
> 
> Once the basic data schema have been roughly determined, interactive SVG 
> views (using/like d3js) could be added.

It looks like this is a natural extension of 
http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag; would you like 
to expand on this idea there?
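
As a very rough illustration of what such a declarative, JSON-exportable setup
description could look like (every field name and value below is hypothetical,
not an existing Ceph-Brag or wiki schema):

# Hypothetical example only: one "account/setup" record written out as JSON.
cat > my-cluster-setup.json <<'EOF'
{
  "account": "example-admin",
  "setup": "3-node-lab",
  "topology": { "mons": 3, "osd_hosts": 3, "osds_per_host": 12, "network": "10GbE" },
  "hardware": { "cpu": "2x Xeon E5-2620", "ram_gb": 64, "osd_disks": "4TB SATA", "journals": "SSD" },
  "os": { "distro": "Ubuntu 12.04", "kernel": "3.8" },
  "ceph": {
    "version": "0.72",
    "setup_method": "ceph-deploy",
    "configs": ["ceph.conf", "crushmap.txt"]
  }
}
EOF

A gallery could then render, diff or clone records like this, which is
essentially the template-cloning workflow described above.
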

Cheers

> Regards,
> Alek
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Committee

2013-11-06 Thread Loic Dachary


On 07/11/2013 03:59, Mike Dawson wrote:
> I also have time I could spend.

Cool :-) Would you like to spend the time you have to advance 
http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag ?


> Thanks for getting this started, Loic!
> 
> Thanks,
> Mike Dawson
> 
> 
> On 11/6/2013 12:35 PM, Loic Dachary wrote:
>> Hi Ceph,
>>
>> I would like to open a discussion about organizing a Ceph User Committee. We 
>> briefly discussed the idea with Ross Turk, Patrick McGarry and Sage Weil 
>> today during the OpenStack summit. A pad was created and roughly summarizes 
>> the idea:
>>
>> http://pad.ceph.com/p/user-committee
>>
>> If there is enough interest, I'm willing to devote one day a week working 
>> for the Ceph User Committee. And yes, that includes sitting at the Ceph 
>> booth during the FOSDEM :-) And interviewing Ceph users and describing their 
>> use cases, which I enjoy very much. But also contribute to a user centric 
>> roadmap, which is what ultimately matters for the company I work for.
>>
>> If you'd like to see this happen but don't have time to participate in this 
>> discussion, please add your name + email at the end of the pad.
>>
>> What do you think ?
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd on ubuntu 12.04 LTS

2013-11-06 Thread Gregory Farnum
How interesting; it looks like that command was added post-dumpling
and not backported. It's probably suitable for backport; I've also
created a ticket to create docs for this
(http://tracker.ceph.com/issues/6731). Did you create this cluster on
an older development release? That should be the only way for the
option to have been enabled without you setting it explicitly.
(I'm just following what the release notes at
http://ceph.com/docs/master/release-notes say about hashpspool).
-Greg

On Tue, Nov 5, 2013 at 1:32 AM, Fuchs, Andreas (SwissTXT)
 wrote:
> The command you recommend doesn't work, and I cannot find anything in the 
> command reference about how to do it.
>
> How can the settings be verified?
> Ceph osd dump does not show any flags:
> pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins 
> pg_num 64 pgp_num 64 last_change 1 owner 0
>
> I also cannot find anything in the currently running crushmap:
> rule rbd {
> ruleset 2
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
>
> Am I really looking in the right direction?
>
>
>
>> -Original Message-
>> From: Gregory Farnum [mailto:g...@inktank.com]
>> Sent: Montag, 4. November 2013 19:17
>> To: Fuchs, Andreas (SwissTXT)
>> Cc: Karan Singh; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] rbd on ubuntu 12.04 LTS
>>
>> On Mon, Nov 4, 2013 at 12:13 AM, Fuchs, Andreas (SwissTXT)
>>  wrote:
>> > I tried with:
>> > ceph osd crush tunables default
>> > ceph osd crush tunables argonaut
>> >
>> > While the command runs without error, I still get the feature set
>> > mismatch error when I try to mount. Do I have to restart some service?
>>
>> Ah, looking more closely it seems the feature mismatch you're getting is
>> actually the "HASHPSPOOL" feature bit. I don't think that should have been
>> enabled on Dumpling, but you can unset it on a pool basis ("ceph osd pool
>> unset <pool> hashpspool", I believe). I don't think you'll need to restart
>> anything, but it's possible.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
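
Putting the thread together, a hedged sketch of the steps discussed above: the
pool name "rbd" comes from the osd dump quoted earlier, and since the command
Greg mentions was apparently added post-dumpling and not backported, the unset
step may need a newer release.

# Look at the pool line for feature flags; newer releases print e.g. "flags hashpspool".
ceph osd dump | grep '^pool'

# Clear the flag on the rbd pool so older (kernel) clients can map images from it.
ceph osd pool unset rbd hashpspool

# Re-check the pool line, then retry the rbd map on the client.
ceph osd dump | grep '^pool'
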
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Committee

2013-11-06 Thread james

On 2013-11-07 01:03, Alek Paunov wrote:


On the other hand, I think the Ceph community can help further
with wider and smoother Ceph adoption (beyond the current
mailing-list support)


This was my thinking behind a forum format - most sysadmins, and 
especially those crossing over from technologies like VMware & Microsoft, 
are more familiar with that context.  IMO it's way easier to search and far 
more scalable.  Stickies could be used to link to wiki pages detailing 
'community approved' hardware configs, a performance leaderboard and such, 
plus perhaps a separate moderated section for vendor-approved content... 
though that then becomes essentially a blog.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] USB pendrive as boot disk

2013-11-06 Thread james

On 2013-11-07 01:02, Mark Kirkwood wrote:

The SSD failures I've seen have all been firmware bugs rather than
flash wearout. This has the effect that a RAID1 pair is likely to
fail at the same time!


Very interesting... and a good reason to use two different drives, 
perhaps.


The SuperMicro 2U 12+2-bay chassis would be near perfect if it also had 
a pair of mSATA headers inside.  A couple of 30- or 60GB SSDs could be 
dropped in for almost nothing at all.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] USB pendrive as boot disk

2013-11-06 Thread Mark Kirkwood

On 07/11/13 20:22, ja...@peacon.co.uk wrote:

On 2013-11-07 01:02, Mark Kirkwood wrote:

The SSD failures I've seen have all been firmware bugs rather than
flash wearout. This has the effect that a RAID1 pair is likely to
fail at the same time!


Very interesting... and a good reason to use two different drives, perhaps.

The SuperMicro 2U 12+2-bay chassis would be near perfect if it also had
a pair of mSATA headers inside.  A couple of 30- or 60GB SSDs could be
dropped in for almost nothing at all.




Yeah, and I forgot to say (though I think others have mentioned this), 
keeping the firmware updated seems to be a good way to avoid the 
(obvious, anyway) manifestations of this phenomenon. I use two Crucial 
M4s in my current home workstation; by keeping the firmware current I 
have avoided a couple of data-destroying bugs, and performance has gone 
from merely OK initially to very good indeed!
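
For anyone wanting to compare against the vendor's changelog first, a quick way
to see which firmware revision a drive is currently running (assumes
smartmontools is installed; the device name below is just an example):

# Print the drive's identity info, including its firmware revision.
sudo smartctl -i /dev/sda | grep -i firmware
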


Cheers

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com