Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket

2015-04-13 Thread Karan Singh
Things you can check:

* Is the RGW node able to resolve bucket-2.ostore.athome.priv? Try: ping 
bucket-2.ostore.athome.priv
* Is # s3cmd ls working, or is it throwing errors?

Are you sure the entries below are correct? Generally host_base and 
host_bucket should point to the RGW FQDN, in your case the FQDN of 
ceph-radosgw1. ostore.athome.priv looks like a different host to me.

host_base->ostore.athome.priv
host_bucket->%(bucket)s.ostore.athome.priv
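
For reference, a minimal sketch of how those .s3cfg entries would look if they 
pointed at the gateway host itself (the FQDN below is only an assumption based 
on your hostname; adjust it to your real one):

    # ~/.s3cfg (relevant lines only)
    host_base = ceph-radosgw1.athome.priv
    host_bucket = %(bucket)s.ceph-radosgw1.athome.priv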



Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


> On 13 Apr 2015, at 06:47, Francois Lafont  wrote:
> 
> Hi,
> 
> On a testing cluster, I have a radosgw on Firefly and the other
> nodes, OSDs and monitors, are on Hammer. The nodes are installed
> with puppet in personal VMs, so I can reproduce the problem.
> Generally, I use s3cmd to check the radosgw. While the radosgw is on
> Firefly, I can create a bucket, no problem. Then, I upgrade the
> radosgw (it's an Ubuntu Trusty):
> 
>sed -i 's/firefly/hammer/g' /etc/apt/sources.list.d/ceph.list
>apt-get update && apt-get dist-upgrade -y
>service apache2 stop
>stop radosgw-all
>start radosgw-all 
>service apache2 start
> 
> After that, impossible to create a bucket with s3cmd:
> 
> --
> ~# s3cmd -d mb s3://bucket-2
> DEBUG: ConfigParser: Reading file '/root/.s3cfg'
> DEBUG: ConfigParser: bucket_location->US
> DEBUG: ConfigParser: cloudfront_host->cloudfront.amazonaws.com
> DEBUG: ConfigParser: default_mime_type->binary/octet-stream
> DEBUG: ConfigParser: delete_removed->False
> DEBUG: ConfigParser: dry_run->False
> DEBUG: ConfigParser: enable_multipart->True
> DEBUG: ConfigParser: encoding->UTF-8
> DEBUG: ConfigParser: encrypt->False
> DEBUG: ConfigParser: follow_symlinks->False
> DEBUG: ConfigParser: force->False
> DEBUG: ConfigParser: get_continue->False
> DEBUG: ConfigParser: gpg_command->/usr/bin/gpg
> DEBUG: ConfigParser: gpg_decrypt->%(gpg_command)s -d --verbose --no-use-agent 
> --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s 
> %(input_file)s
> DEBUG: ConfigParser: gpg_encrypt->%(gpg_command)s -c --verbose --no-use-agent 
> --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s 
> %(input_file)s
> DEBUG: ConfigParser: gpg_passphrase->...-3_chars...
> DEBUG: ConfigParser: guess_mime_type->True
> DEBUG: ConfigParser: host_base->ostore.athome.priv
> DEBUG: ConfigParser: access_key->5R...17_chars...Y
> DEBUG: ConfigParser: secret_key->Ij...37_chars...I
> DEBUG: ConfigParser: host_bucket->%(bucket)s.ostore.athome.priv
> DEBUG: ConfigParser: human_readable_sizes->False
> DEBUG: ConfigParser: invalidate_on_cf->False
> DEBUG: ConfigParser: list_md5->False
> DEBUG: ConfigParser: log_target_prefix->
> DEBUG: ConfigParser: mime_type->
> DEBUG: ConfigParser: multipart_chunk_size_mb->15
> DEBUG: ConfigParser: preserve_attrs->True
> DEBUG: ConfigParser: progress_meter->True
> DEBUG: ConfigParser: proxy_host->
> DEBUG: ConfigParser: proxy_port->0
> DEBUG: ConfigParser: recursive->False
> DEBUG: ConfigParser: recv_chunk->4096
> DEBUG: ConfigParser: reduced_redundancy->False
> DEBUG: ConfigParser: send_chunk->4096
> DEBUG: ConfigParser: simpledb_host->sdb.amazonaws.com
> DEBUG: ConfigParser: skip_existing->False
> DEBUG: ConfigParser: socket_timeout->300
> DEBUG: ConfigParser: urlencoding_mode->normal
> DEBUG: ConfigParser: use_https->False
> DEBUG: ConfigParser: verbosity->WARNING
> DEBUG: ConfigParser: 
> website_endpoint->http://%(bucket)s.s3-website-%(location)s.amazonaws.com/
> DEBUG: ConfigParser: website_error->
> DEBUG: ConfigParser: website_index->index.html
> DEBUG: Updating Config.Config encoding -> UTF-8
> DEBUG: Updating Config.Config follow_symlinks -> False
> DEBUG: Updating Config.Config verbosity -> 10
> DEBUG: Unicodising 'mb' using UTF-8
> DEBUG: Unicodising 's3://bucket-2' using UTF-8
> DEBUG: Command: mb
> DEBUG: SignHeaders: 'PUT\n\n\n\nx-amz-date:Mon, 13 Apr 2015 03:32:23 
> +0000\n/bucket-2/'
> DEBUG: CreateRequest: resource[uri]=/
> DEBUG: SignHeaders: 'PUT\n\n\n\nx-amz-date:Mon, 13 Apr 2015 03:32:23 
> +0000\n/bucket-2/'
> DEBUG: Processing request, please wait...
> DEBUG: get_hostname(bucket-2): bucket-2.ostore.athome.priv
> DEBUG: format_uri(): /
> DEBUG: Sending request method_string='PUT', uri='/', 
> headers={'content-length': '0', 'Authorization': 'AWS 
> 5RUS0Z3SBG6IK263PLFY:3V1MdXoCGFrJKrO2LSJaBpNMcK4=', 'x-amz-date': 'Mon, 13 
> Apr 2015 03:32:23 +0000'}, body=(0 bytes)
> DEBUG: Response: {'status': 405, 'headers': {'date': 'Mon, 13 Apr 2015 
> 03:32:23 GMT', 'accept-ranges': 'bytes', 'content-type': 'application/xml', 
> 'content-length': '82', 'server': 'Apache/2.4.7 (Ubuntu)'}, 'reason': 'Method 
> Not Allowed',

Re: [ceph-users] deep scrubbing causes osd down

2015-04-13 Thread 池信泽
Sorry, I am not sure whether it will look OK in your production environment.

Maybe you could use the command: ceph tell osd.0 injectargs
"--osd_scrub_sleep 0.5". This command affects only one OSD.

If it works fine for some days, you could set it for all OSDs.

This is just a suggestion.
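
As a sketch of both steps (the 0.5 value is only an example, and the wildcard 
form of "ceph tell" assumes a reasonably recent release):

    # try it on a single OSD first
    ceph tell osd.0 injectargs "--osd_scrub_sleep 0.5"
    # if it behaves well, roll it out to every OSD
    ceph tell osd.* injectargs "--osd_scrub_sleep 0.5"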

2015-04-13 14:34 GMT+08:00 Lindsay Mathieson :
>
> On 13 April 2015 at 16:00, Christian Balzer  wrote:
>>
>> However the vast majority of people with production clusters will be
>> running something "stable", mostly Firefly at this moment.
>>
>> > Sorry, 0.87 is giant.
>> >
> > BTW, you could also set osd_scrub_sleep on your cluster. Ceph would
> > then sleep for the time you define after it has scrubbed some objects.
> > But I am not sure whether it would work well for you.
>> >
>> Yeah, that bit is backported to Firefly and can definitely help; however,
>> the suggested initial value is too small for most people who have scrub
>> issues. Starting with 0.5 seconds and seeing how it goes seems to work
>> better.
>
>
>
> Thanks xinze, Christian.
>
> Yah, I'm on 0.87 in production - I can wait for the next release :)
>
> In the meantime, from the prior msgs I've set this:
>
> [osd]
> osd_scrub_chunk_min = 1
> osd_scrub_chunk_max = 5
> osd_scrub_sleep = 0.5
>
>
> Do the values look OK? Is the [osd] section the right spot?
>
> Thanks - Lindsay
>
>
>
> --
> Lindsay



-- 
Regards,
xinze


[ceph-users] ceph cache tier, delete rbd very slow.

2015-04-13 Thread Yujian Peng
Hi all,
I'm testing the ceph cache tier (0.80.9). The IOPS are really very good with a 
cache tier, but it's very slow to delete an rbd (even an empty one).
It seems as if the cache pool marks all the objects in the rbd for deletion, 
even objects that do not exist. 
Is this a problem with rbd?
How can I delete an rbd quickly with a cache tier?


Any help is going to be appreciated!

Thanks!



Re: [ceph-users] How to dispatch monitors in a multi-site cluster (ie in 2 datacenters)

2015-04-13 Thread Joao Eduardo Luis
On 04/13/2015 02:25 AM, Christian Balzer wrote:
> On Sun, 12 Apr 2015 14:37:56 -0700 Gregory Farnum wrote:
> 
>> On Sun, Apr 12, 2015 at 1:58 PM, Francois Lafont 
>> wrote:
>>> Somnath Roy wrote:
>>>
 Interesting scenario :-).. IMHO, I don't think the cluster will be in a
 healthy state here if the connection between dc1 and dc2 is cut. The
 reason is the following.

 1. Only osd.5 can talk to both data centers' OSDs; the other 2 mons
 will not be able to. So, they can't reach an agreement (and form a quorum)
 about the state of the OSDs.

 2. OSDs in dc1 and dc2 will not be able to talk to each other;
 considering replicas across data centers, the cluster will be broken.
>>>
>>> Yes, in fact, after thought, I have the first question below.
>>>
>>> If: (clearer with a schema in the head ;))
>>>
>>> 1. mon.1 and mon.2 can talk together (in dc1) and can talk with
>>> mon.5 (via the VPN) but can't talk with mon.3 and mon.4 (in dc2)
>>> 2. mon.3 and mon.4 can talk together (in dc2) and can talk with
>>> mon.5 (via the VPN) but can't talk with mon.1 and mon.2 (in dc1)
>>> 3. mon.5 can talk with mon.1, mon.2, mon.3, mon.4 and mon.5
>>>
>>> is the quorum reached? If yes, which is the quorum?
>>
>> Yes, you should get a quorum as mon.5 will vote for one datacenter or
>> the other. Which one it chooses will depend on which monitor has the
>> "lowest" IP address (I think, or maybe just the monitor IDs or
>> something? Anyway, it's a consistent ordering). 
> 
> Pet peeve alert. ^_-
> 
> It's the lowest IP.

To be more precise, it's the lowest IP:PORT combination:

10.0.1.2:6789 = rank 0
10.0.1.2:6790 = rank 1
10.0.1.3:6789 = rank 2

and so on.
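
For anyone who wants to verify this on a running cluster, the current ranks 
can be read back with the standard commands (exact output format varies by 
release):

    ceph mon dump        # lists each monitor with its rank, name and IP:PORT
    ceph quorum_status   # shows the current leader and quorum members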

> Which is something that really needs to be documented (better) so that
> people can plan things accordingly and have the leader monitor wind up on
> the best suited hardware (in case not everything is equal).
> 
> Other than that, the sequence of how (initial?) mons are listed in
> ceph.conf would of course be the most natural, expected way to sort
> monitors.

I don't agree.  I find it hard to rely on ceph.conf for sensitive
decisions like this, because we must ensure that ceph.conf is the same
in all the nodes;  and I've seen this not being the case more often than
not.

On the other hand, I do agree that we should make it easier for people
to specify which monitors they want in the line of succession to the
leader, so that they can plan their clusters accordingly.  I do believe
we can set this on the monmap, ideally once the first quorum is formed;
something like:

ceph mon rank set mon.a 0
ceph mon rank set mon.b 2
ceph mon rank set mon.c 1

ceph mon rank list

  MON   IP:PORT        RANK    POLICY         STATUS
  mon.a 10.0.1.2:6789  rank 0  [set-by-user]  leader
  mon.c 10.0.1.3:6789  rank 1  [set-by-user]  peon
  mon.b 10.0.1.2:6790  rank 2  [set-by-user]  down
  mon.d 10.0.1.4:6789  rank 3  [default]      peon


Thoughts?

  -Joao

> 
> Christian
> 
> 
>> Under no circumstances
>> whatsoever will mon.5 help each datacenter create their own quorums at
>> the same time. The other data center will just be out of luck and
>> unable to do anything.
>> Although it's possible that the formed quorum won't be very stable
>> since the out-of-quorum monitors will probably keep trying to form a
>> quorum and that might make mon.5 unhappy. You should test what happens
>> with that kind of net split. :)
>> -Greg
>>
> 
> 



[ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Götz Reinicke - IT Koordinator
Dear ceph users,

we are planning a ceph storage cluster from scratch. It might be up to 1 PB
within the next 3 years, multiple buildings, new network infrastructure
for the cluster etc.

I had some excellent trainings on ceph, so the essential fundamentals
are familiar to me, and I know our goals/dreams can be reached. :)

There is just "one tiny piece" in the design I'm currently unsure about :)

Ceph follows some sort of keep-it-small-and-simple approach, e.g. don't use RAID
controllers, use more boxes and disks, a fast network etc.

So from our current design we plan 40Gb Storage and Client LAN.

Would you suggest to connect the OSD nodes redundant to both networks?
That would end up with 4 * 40Gb ports in each box, two Switches to
connect to.

I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for "high" io
pools. (+ currently SSDs for journals, but maybe by the time we start, levelDB,
rocksDB backends will be ready ... ?)

Later some less io bound pools for data archiving/backup. (bigger and
more Disks per node)

We would also do some Cache tiering for some pools.

In the reference documentation from HP, Intel, Supermicro etc., they usually
use a non-redundant network connection. (single 10Gb)

I know: redundancy keeps some headaches small, but also adds some more
complexity and increases the budget. (add network adapters, other
server, more switches, etc)

So what would you suggest, what are your experiences?

Thanks for any suggestion and feedback . Regards . Götz
-- 
Götz Reinicke
IT-Koordinator

Tel. +49 7141 969 82 420
E-Mail goetz.reini...@filmakademie.de

Filmakademie Baden-Württemberg GmbH
Akademiehof 10
71638 Ludwigsburg
www.filmakademie.de

Eintragung Amtsgericht Stuttgart HRB 205016

Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
Staatssekretär im Ministerium für Wissenschaft,
Forschung und Kunst Baden-Württemberg

Geschäftsführer: Prof. Thomas Schadt





Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Alexandre DERUMIER
>>So what would you suggest, what are your experiences?

Hi, you can have a look at mellanox sx1012 for example
http://www.mellanox.com/page/products_dyn?product_family=163

12 ports of 40Gb for around 4000€

you can use breakout cables to get 4x12 = 48 10Gb ports.


They can be stacked with MLAG and LACP.


- Original Message -
From: "Götz Reinicke - IT Koordinator" 
To: "ceph-users" 
Sent: Monday, 13 April 2015 11:03:24
Subject: [ceph-users] Network redundancy pro and cons, best practice, 
suggestions?

Dear ceph users, 

we are planing a ceph storage cluster from scratch. Might be up to 1 PB 
within the next 3 years, multiple buildings, new network infrastructure 
for the cluster etc. 

I had some excellent trainings on ceph, so the essential fundamentals 
are familiar to me, and I know our goals/dreams can be reached. :) 

There is just "one tiny piece" in the design I'm currently unsure about :) 

Ceph follows some sort of keep it small and simple, e.g. dont use raid 
controllers, use more boxes and disks, fast network etc. 

So from our current design we plan 40Gb Storage and Client LAN. 

Would you suggest to connect the OSD nodes redundant to both networks? 
That would end up with 4 * 40Gb ports in each box, two Switches to 
connect to. 

I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for "high" io 
pools. (+ currently SSD for journal, but may be until we start, levelDB, 
rocksDB are ready ... ?) 

Later some less io bound pools for data archiving/backup. (bigger and 
more Disks per node) 

We would also do some Cache tiering for some pools. 

From HP, Intel, Supermicron etc reference documentations, they use 
usually non-redundant network connection. (single 10Gb) 

I know: redundancy keeps some headaches small, but also adds some more 
complexity and increases the budget. (add network adapters, other 
server, more switches, etc) 

So what would you suggest, what are your experiences? 

Thanks for any suggestion and feedback . Regards . Götz 
-- 
Götz Reinicke 
IT-Koordinator 

Tel. +49 7141 969 82 420 
E-Mail goetz.reini...@filmakademie.de 

Filmakademie Baden-Württemberg GmbH 
Akademiehof 10 
71638 Ludwigsburg 
www.filmakademie.de 

Eintragung Amtsgericht Stuttgart HRB 205016 

Vorsitzender des Aufsichtsrats: Jürgen Walter MdL 
Staatssekretär im Ministerium für Wissenschaft, 
Forschung und Kunst Baden-Württemberg 

Geschäftsführer: Prof. Thomas Schadt 




Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Götz Reinicke - IT Koordinator
Hi Alexandre,

thanks for that suggestion. Mellanox might be on our shopping list
already, but what about the redundancy design as a whole from your POV?

/Götz
Am 13.04.15 um 11:08 schrieb Alexandre DERUMIER:
>>> So what would you suggest, what are your experiences?
> 
> Hi, you can have a look at mellanox sx1012 for example
> http://www.mellanox.com/page/products_dyn?product_family=163
> 
> 12 ports 40GB for around 4000€
> 
> you can use breakout cables to have 4x12 10GB ports.
> 
> 
> They can be stacked with mlag and lacp
> 
> 
> - Original Message -
> From: "Götz Reinicke - IT Koordinator" 
> To: "ceph-users" 
> Sent: Monday, 13 April 2015 11:03:24
> Subject: [ceph-users] Network redundancy pro and cons, best practice,   
> suggestions?
> 
> Dear ceph users, 
> 
> we are planing a ceph storage cluster from scratch. Might be up to 1 PB 
> within the next 3 years, multiple buildings, new network infrastructure 
> for the cluster etc. 
> 
> I had some excellent trainings on ceph, so the essential fundamentals 
> are familiar to me, and I know our goals/dreams can be reached. :) 
> 
> There is just "one tiny piece" in the design I'm currently unsure about :) 
> 
> Ceph follows some sort of keep it small and simple, e.g. dont use raid 
> controllers, use more boxes and disks, fast network etc. 
> 
> So from our current design we plan 40Gb Storage and Client LAN. 
> 
> Would you suggest to connect the OSD nodes redundant to both networks? 
> That would end up with 4 * 40Gb ports in each box, two Switches to 
> connect to. 
> 
> I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for "high" io 
> pools. (+ currently SSD for journal, but may be until we start, levelDB, 
> rocksDB are ready ... ?) 
> 
> Later some less io bound pools for data archiving/backup. (bigger and 
> more Disks per node) 
> 
> We would also do some Cache tiering for some pools. 
> 
> From HP, Intel, Supermicron etc reference documentations, they use 
> usually non-redundant network connection. (single 10Gb) 
> 
> I know: redundancy keeps some headaches small, but also adds some more 
> complexity and increases the budget. (add network adapters, other 
> server, more switches, etc) 
> 
> So what would you suggest, what are your experiences? 
> 
> Thanks for any suggestion and feedback . Regards . Götz 
> 


-- 
Götz Reinicke
IT-Koordinator

Tel. +49 7141 969 82 420
E-Mail goetz.reini...@filmakademie.de

Filmakademie Baden-Württemberg GmbH
Akademiehof 10
71638 Ludwigsburg
www.filmakademie.de

Eintragung Amtsgericht Stuttgart HRB 205016

Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
Staatssekretär im Ministerium für Wissenschaft,
Forschung und Kunst Baden-Württemberg

Geschäftsführer: Prof. Thomas Schadt





Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Christian Balzer

Hello,

On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote:

> Dear ceph users,
> 
> we are planing a ceph storage cluster from scratch. Might be up to 1 PB
> within the next 3 years, multiple buildings, new network infrastructure
> for the cluster etc.
> 
> I had some excellent trainings on ceph, so the essential fundamentals
> are familiar to me, and I know our goals/dreams can be reached. :)
> 
> There is just "one tiny piece" in the design I'm currently unsure
> about :)
> 
> Ceph follows some sort of keep it small and simple, e.g. dont use raid
> controllers, use more boxes and disks, fast network etc.
> 
While small and plenty is definitely true, some people actually use RAID
for OSDs (like RAID1) to avoid ever having to deal with a failed OSD and
getting a 4x replication in the end. 
Your needs and budget may of course differ.

> So from our current design we plan 40Gb Storage and Client LAN.
> 
> Would you suggest to connect the OSD nodes redundant to both networks?
> That would end up with 4 * 40Gb ports in each box, two Switches to
> connect to.
> 
If you can afford it, fabric switches are quite nice, as they allow for
LACP over 2 switches, so if everything is working you get twice the speed,
if not still full redundancy. The Brocade VDX stuff comes to mind.
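
Purely as an illustrative sketch (Debian/Ubuntu ifupdown syntax; interface 
names and addresses are invented), an LACP bond across two such switches could 
look like this:

    # /etc/network/interfaces (requires the ifenslave package)
    auto bond0
    iface bond0 inet static
        address 10.0.0.10
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode 802.3ad
        bond-miimon 100
        bond-lacp-rate fast
        bond-xmit-hash-policy layer3+4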

However if you're not tied into an Ethernet network, you might do better
and cheaper with an Infiniband network on the storage side of things.
This will become even more attractive as RDMA support improves with Ceph.

Separating public (client) and private (storage, OSD interconnect)
networks with Ceph makes only sense if your storage node can actually
utilize all that bandwidth.

So at your storage node density of 12 HDDs (16 HDD chassis are not space
efficient), 40GbE is overkill with a single link/network, insanely so with
2 networks.

> I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for "high" io
> pools. (+ currently SSD for journal, but may be until we start, levelDB,
> rocksDB are ready ... ?)
> 
> Later some less io bound pools for data archiving/backup. (bigger and
> more Disks per node)
> 
> We would also do some Cache tiering for some pools.
> 
> From HP, Intel, Supermicron etc reference documentations, they use
> usually non-redundant network connection. (single 10Gb)
> 
> I know: redundancy keeps some headaches small, but also adds some more
> complexity and increases the budget. (add network adapters, other
> server, more switches, etc)
> 
Complexity not so much, cost yes.

> So what would you suggest, what are your experiences?
> 
It all depends on how small (large really) you can start.

I have only small clusters with few nodes, so for me redundancy is a big
deal.
Thus those cluster use Infiniband, 2 switches and dual-port HCAs on the
nodes in an active-standby mode.

If you however can start with something like 10 racks (ToR switches),
loosing one switch would mean a loss of 10% of your cluster, which is
something it should be able to cope with.
Especially if you configured Ceph to _not_ start re-balancing data
automatically if a rack goes down (so that you have a chance to put a
replacement switch in place, which you of course kept handy on-site for
such a case). ^.-
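
The knob I have in mind for that is the subtree limit; a minimal ceph.conf 
sketch (check the exact name and default for your release):

    [mon]
        # don't automatically mark OSDs "out" when a whole rack disappears
        mon osd down out subtree limit = rack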

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] low power single disk nodes

2015-04-13 Thread Jerker Nyberg


Hello,

Thanks for all the replies! The Banana Pi could work. The built-in SATA power 
on the Banana Pi can power a 2.5" SATA disk. Cool. (Not a 3.5" SATA disk, since 
those also require 12 V.)


I found this post from Vess Bakalov about the same subject:
http://millibit.blogspot.se/2015/01/ceph-pi-adding-osd-and-more-performance.html

For PoE I have only found Intel Galileo Gen 2 or RouterBOARD RB450G which 
are too slow and/or miss IO-expansion. (But good for signage/Xibo maybe!)


I found two boxes from Quanta and Supermicro with a single-socket Xeon or 
with an Intel Atom (Avoton) that might be quite OK. I was only aware of the 
dual-Xeons before.


http://www.quantaqct.com/Product/Servers/Rackmount-Servers/STRATOS-S100-L11SL-p151c77c70c83
http://www.supermicro.nl/products/system/1U/5018/SSG-5018A-AR12L.cfm

Kind regards,
Jerker Nyberg



On Thu, 9 Apr 2015, Quentin Hartman wrote:


I'm skeptical about how well this would work, but a Banana Pi might be a
place to start. Like a raspberry pi, but it has a SATA connector:
http://www.bananapi.org/

On Thu, Apr 9, 2015 at 3:18 AM, Jerker Nyberg  wrote:



Hello ceph users,

Is anyone running any low powered single disk nodes with Ceph now? Calxeda
seems to be no more according to Wikipedia. I do not think HP moonshot is
what I am looking for - I want stand-alone nodes, not server cartridges
integrated into server chassis. And I do not want to be locked to a single
vendor.

I was playing with Raspberry Pi 2 for signage when I thought of my old
experiments with Ceph.

I am thinking of for example Odroid-C1 or Odroid-XU3 Lite or maybe
something with a low-power Intel x64/x86 processor. Together with one SSD
or one low power HDD the node could get all power via PoE (via splitter or
integrated into board if such boards exist). PoE provide remote power-on
power-off even for consumer grade nodes.

The cost for a single low power node should be able to compete with
traditional PC-servers price per disk. Ceph take care of redundancy.

I think simple custom casing should be good enough - maybe just strap or
velcro everything on trays in the rack, at least for the nodes with SSD.

Kind regards,
--
Jerker Nyberg, Uppsala, Sweden.






Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket

2015-04-13 Thread Francois Lafont
Karan Singh wrote:

> Things you can check 
> 
> * Is RGW node able to resolve bucket-2.ostore.athome.priv  , try ping 
> bucket-2.ostore.athome.priv

Yes, my DNS configuration is ok. In fact, I test s3cmd directly
on my radosgw (its hostname is "ceph-radosgw1" but its fqdn is
"ostore.athome.priv"):

---
~# hostname
ceph-radosgw1

~# ip addr show dev eth0 | grep 'inet '
inet 172.31.10.6/16 brd 172.31.255.255 scope global eth0

~# dig +short  ostore.athome.priv \
  bucket-2.ostore.athome.priv \
   foo.ostore.athome.priv \
   bar.ostore.athome.priv \
   hjffhkj.ostore.athome.priv 
172.31.10.6
172.31.10.6
172.31.10.6
172.31.10.6
172.31.10.6

~# getent hosts ostore.athome.priv
172.31.10.6 ostore.athome.priv

~# getent hosts jfkjfl.ostore.athome.priv
172.31.10.6 jfkjfl.ostore.athome.priv
---

> * Is # s3cmd ls working or throwing errors ?

It doesn't work after upgrading to Hammer either. More
precisely, with the Firefly radosgw, it works:

---
~# s3cmd ls s3://bucket
2015-04-12 23:35 735985664   s3://bucket/les_evades.avi

~# s3cmd ls
2015-04-12 23:28  s3://bucket
---

But after the upgrade to Hammer, it doesn't work:

---
~# s3cmd ls s3://bucket
ERROR: S3 error: 403 (SignatureDoesNotMatch): 

~# s3cmd ls
2015-04-12 23:28  s3://bucket
---

As you can see, the second command works but not the first.
[1] At the end of this message, I put the output of the first
command with the debug option, just in case.

> Are you sure the below entries are correct ? Generally host_base and 
> host_bucket should point to RGW FQDN in your case ceph-radosgw1 FQDN . 
> ostore.athome.priv looks like a different host to me.
> 
> host_base->ostore.athome.priv
> host_bucket->%(bucket)s.ostore.athome.priv

For me it's ok:

---
~# grep 'host_' ~/.s3cfg 
host_base = ostore.athome.priv
host_bucket = %(bucket)s.ostore.athome.priv
---

And ostore.athome.priv is really my radosgw (see the dig
commands above). And when I try a s3cmd command, I can
see new lines in the apache logs of my radosgw.
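
For completeness, host-style bucket names also depend on the gateway knowing 
its own DNS name; a minimal sketch of that ceph.conf entry (the section name 
here is only illustrative):

    [client.radosgw.gateway]
        # must match host_base so %(bucket)s.ostore.athome.priv requests are accepted
        rgw dns name = ostore.athome.priv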

Thanks for your help Karan.

[1]

---
~# s3cmd -d ls s3://bucket
DEBUG: ConfigParser: Reading file '/root/.s3cfg'
DEBUG: ConfigParser: bucket_location->US
DEBUG: ConfigParser: cloudfront_host->ostore.athome.priv
DEBUG: ConfigParser: default_mime_type->binary/octet-stream
DEBUG: ConfigParser: delete_removed->False
DEBUG: ConfigParser: dry_run->False
DEBUG: ConfigParser: enable_multipart->True
DEBUG: ConfigParser: encoding->UTF-8
DEBUG: ConfigParser: encrypt->False
DEBUG: ConfigParser: follow_symlinks->False
DEBUG: ConfigParser: force->False
DEBUG: ConfigParser: get_continue->False
DEBUG: ConfigParser: gpg_command->/usr/bin/gpg
DEBUG: ConfigParser: gpg_decrypt->%(gpg_command)s -d --verbose --no-use-agent 
--batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s 
%(input_file)s
DEBUG: ConfigParser: gpg_encrypt->%(gpg_command)s -c --verbose --no-use-agent 
--batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s 
%(input_file)s
DEBUG: ConfigParser: gpg_passphrase->...-3_chars...
DEBUG: ConfigParser: guess_mime_type->True
DEBUG: ConfigParser: host_base->ostore.athome.priv
DEBUG: ConfigParser: access_key->1Q...17_chars...Y
DEBUG: ConfigParser: secret_key->92...37_chars...W
DEBUG: ConfigParser: host_bucket->%(bucket)s.ostore.athome.priv
DEBUG: ConfigParser: human_readable_sizes->False
DEBUG: ConfigParser: invalidate_on_cf->False
DEBUG: ConfigParser: list_md5->False
DEBUG: ConfigParser: log_target_prefix->
DEBUG: ConfigParser: mime_type->
DEBUG: ConfigParser: multipart_chunk_size_mb->15
DEBUG: ConfigParser: preserve_attrs->True
DEBUG: ConfigParser: progress_meter->True
DEBUG: ConfigParser: proxy_host->
DEBUG: ConfigParser: proxy_port->0
DEBUG: ConfigParser: recursive->False
DEBUG: ConfigParser: recv_chunk->4096
DEBUG: ConfigParser: reduced_redundancy->False
DEBUG: ConfigParser: send_chunk->4096
DEBUG: ConfigParser: simpledb_host->ostore.athome.priv
DEBUG: ConfigParser: skip_existing->False
DEBUG: ConfigParser: socket_timeout->300
DEBUG: ConfigParser: urlencoding_mode->normal
DEBUG: ConfigParser: use_https->False
DEBUG: ConfigParser: verbosity->WARNING
DEBUG: ConfigParser: website_endpoint->http://%(bucket)s.ostore.athome.priv
DEBUG: ConfigParser: website_error->
DEBUG: ConfigParser: website_index->index.html
DEBUG: Updating Config.Config encoding -> UTF-8
DEBUG: Updating Config.Config follow_symlinks -> False
DEBUG: Updating Config.Config verbosity -> 10
DEBUG: Unicodising 'ls' using UTF-8
DEBUG: Unicodising 's3://bucket' using UTF-8
DEBUG: Command: ls
DEBUG: Bucket 's3://bucket':
DEBUG: SignHeaders: 'GET\n\n\n\nx-amz-date:Mon, 13 Apr 2015 12:15:16 
+0000\n/bucket/'
DEBUG: CreateRequest: 

Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket

2015-04-13 Thread Karan Singh
You can give it a try with the swift API as well.
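
For example, a hedged sketch with the standard swift client (auth URL, user 
and key below are placeholders; the user needs a swift subuser and key created 
with radosgw-admin):

    # list containers, then create one, through the RGW swift API
    swift -A http://ostore.athome.priv/auth/1.0 -U testuser:swift -K SECRET_KEY list
    swift -A http://ostore.athome.priv/auth/1.0 -U testuser:swift -K SECRET_KEY post bucket-2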


Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


> On 13 Apr 2015, at 15:19, Francois Lafont  wrote:
> 
> Karan Singh wrote:
> 
>> Things you can check 
>> 
>> * Is RGW node able to resolve bucket-2.ostore.athome.priv  , try ping 
>> bucket-2.ostore.athome.priv
> 
> Yes, my DNS configuration is ok. In fact, I test s3cmd directly
> on my radosgw (its hostname is "ceph-radosgw1" but its fqdn is
> "ostore.athome.priv"):
> 
> ---
> ~# hostname
> ceph-radosgw1
> 
> ~# ip addr show dev eth0 | grep 'inet '
>inet 172.31.10.6/16 brd 172.31.255.255 scope global eth0
> 
> ~# dig +short  ostore.athome.priv \
>  bucket-2.ostore.athome.priv \
>   foo.ostore.athome.priv \
>   bar.ostore.athome.priv \
>   hjffhkj.ostore.athome.priv 
> 172.31.10.6
> 172.31.10.6
> 172.31.10.6
> 172.31.10.6
> 172.31.10.6
> 
> ~# getent hosts ostore.athome.priv
> 172.31.10.6 ostore.athome.priv
> 
> ~# getent hosts jfkjfl.ostore.athome.priv
> 172.31.10.6 jfkjfl.ostore.athome.priv
> ---
> 
>> * Is # s3cmd ls working or throwing errors ?
> 
> It doesn't work after upgrading with Hammer too. More
> precisely, in Firefly radosgw, It works:
> 
> ---
> ~# s3cmd ls s3://bucket
> 2015-04-12 23:35 735985664   s3://bucket/les_evades.avi
> 
> ~# s3cmd ls
> 2015-04-12 23:28  s3://bucket
> ---
> 
> But after the upgrade to Hammer, it doesn't work:
> 
> ---
> ~# s3cmd ls s3://bucket
> ERROR: S3 error: 403 (SignatureDoesNotMatch): 
> 
> ~# s3cmd ls
> 2015-04-12 23:28  s3://bucket
> ---
> 
> As you can see, the second command works but not the first.
> [1] At the end of this message, I put the output of the first
> command with the debug option, just in case.
> 
>> Are you sure the below entries are correct ? Generally host_base and 
>> host_bucket should point to RGW FQDN in your case ceph-radosgw1 FQDN . 
>> ostore.athome.priv looks like a different host to me.
>> 
>> host_base->ostore.athome.priv
>> host_bucket->%(bucket)s.ostore.athome.priv
> 
> For me it's ok:
> 
> ---
> ~# grep 'host_' ~/.s3cfg 
> host_base = ostore.athome.priv
> host_bucket = %(bucket)s.ostore.athome.priv
> ---
> 
> And ostore.athome.priv is really my radosgw (see the dig
> commands above). And when I try a s3cmd command, I can
> see new lines in the apache logs of my radosgw.
> 
> Thanks for your help Karan.
> 
> [1]
> 
> ---
> ~# s3cmd -d ls s3://bucket
> DEBUG: ConfigParser: Reading file '/root/.s3cfg'
> DEBUG: ConfigParser: bucket_location->US
> DEBUG: ConfigParser: cloudfront_host->ostore.athome.priv
> DEBUG: ConfigParser: default_mime_type->binary/octet-stream
> DEBUG: ConfigParser: delete_removed->False
> DEBUG: ConfigParser: dry_run->False
> DEBUG: ConfigParser: enable_multipart->True
> DEBUG: ConfigParser: encoding->UTF-8
> DEBUG: ConfigParser: encrypt->False
> DEBUG: ConfigParser: follow_symlinks->False
> DEBUG: ConfigParser: force->False
> DEBUG: ConfigParser: get_continue->False
> DEBUG: ConfigParser: gpg_command->/usr/bin/gpg
> DEBUG: ConfigParser: gpg_decrypt->%(gpg_command)s -d --verbose --no-use-agent 
> --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s 
> %(input_file)s
> DEBUG: ConfigParser: gpg_encrypt->%(gpg_command)s -c --verbose --no-use-agent 
> --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s 
> %(input_file)s
> DEBUG: ConfigParser: gpg_passphrase->...-3_chars...
> DEBUG: ConfigParser: guess_mime_type->True
> DEBUG: ConfigParser: host_base->ostore.athome.priv
> DEBUG: ConfigParser: access_key->1Q...17_chars...Y
> DEBUG: ConfigParser: secret_key->92...37_chars...W
> DEBUG: ConfigParser: host_bucket->%(bucket)s.ostore.athome.priv
> DEBUG: ConfigParser: human_readable_sizes->False
> DEBUG: ConfigParser: invalidate_on_cf->False
> DEBUG: ConfigParser: list_md5->False
> DEBUG: ConfigParser: log_target_prefix->
> DEBUG: ConfigParser: mime_type->
> DEBUG: ConfigParser: multipart_chunk_size_mb->15
> DEBUG: ConfigParser: preserve_attrs->True
> DEBUG: ConfigParser: progress_meter->True
> DEBUG: ConfigParser: proxy_host->
> DEBUG: ConfigParser: proxy_port->0
> DEBUG: ConfigParser: recursive->False
> DEBUG: ConfigParser: recv_chunk->4096
> DEBUG: ConfigParser: reduced_redundancy->False
> DEBUG: ConfigParser: send_chunk->4096
> DEBUG: ConfigParser: simpledb_host->ostore.athome.priv
> DEBUG: ConfigParser: skip_existing->False
> DEBUG: ConfigParser:

Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket

2015-04-13 Thread Karan Singh
Also, what version of s3cmd are you using?

To me the error “S3 error: 403 (SignatureDoesNotMatch)” seems to be from the 
s3cmd side rather than from RGW. 

But let's diagnose.
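
Two quick checks on that side (both are standard s3cmd options):

    s3cmd --version
    s3cmd --debug ls s3://bucket   # dumps the full request, including the string-to-sign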


Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


> On 13 Apr 2015, at 15:43, Karan Singh  wrote:
> 
> You can give a try with swift API as well.
> 
> 
> Karan Singh 
> Systems Specialist , Storage Platforms
> CSC - IT Center for Science,
> Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
> mobile: +358 503 812758
> tel. +358 9 4572001
> fax +358 9 4572302
> http://www.csc.fi/ 
> 
> 
>> On 13 Apr 2015, at 15:19, Francois Lafont wrote:
>> 
>> Karan Singh wrote:
>> 
>>> Things you can check 
>>> 
>>> * Is RGW node able to resolve bucket-2.ostore.athome.priv  , try ping 
>>> bucket-2.ostore.athome.priv
>> 
>> Yes, my DNS configuration is ok. In fact, I test s3cmd directly
>> on my radosgw (its hostname is "ceph-radosgw1" but its fqdn is
>> "ostore.athome.priv"):
>> 
>> ---
>> ~# hostname
>> ceph-radosgw1
>> 
>> ~# ip addr show dev eth0 | grep 'inet '
>>inet 172.31.10.6/16 brd 172.31.255.255 scope global eth0
>> 
>> ~# dig +short  ostore.athome.priv \
>>  bucket-2.ostore.athome.priv \
>>   foo.ostore.athome.priv \
>>   bar.ostore.athome.priv \
>>   hjffhkj.ostore.athome.priv 
>> 172.31.10.6
>> 172.31.10.6
>> 172.31.10.6
>> 172.31.10.6
>> 172.31.10.6
>> 
>> ~# getent hosts ostore.athome.priv
>> 172.31.10.6 ostore.athome.priv
>> 
>> ~# getent hosts jfkjfl.ostore.athome.priv
>> 172.31.10.6 jfkjfl.ostore.athome.priv
>> ---
>> 
>>> * Is # s3cmd ls working or throwing errors ?
>> 
>> It doesn't work after upgrading with Hammer too. More
>> precisely, in Firefly radosgw, It works:
>> 
>> ---
>> ~# s3cmd ls s3://bucket 
>> 2015-04-12 23:35 735985664   s3://bucket/les_evades.avi 
>> 
>> 
>> ~# s3cmd ls
>> 2015-04-12 23:28  s3://bucket 
>> ---
>> 
>> But after the upgrade to Hammer, it doesn't work:
>> 
>> ---
>> ~# s3cmd ls s3://bucket 
>> ERROR: S3 error: 403 (SignatureDoesNotMatch): 
>> 
>> ~# s3cmd ls
>> 2015-04-12 23:28  s3://bucket 
>> ---
>> 
>> As you can see, the second command works but not the first.
>> [1] At the end of this message, I put the output of the first
>> command with the debug option, just in case.
>> 
>>> Are you sure the below entries are correct ? Generally host_base and 
>>> host_bucket should point to RGW FQDN in your case ceph-radosgw1 FQDN . 
>>> ostore.athome.priv looks like a different host to me.
>>> 
>>> host_base->ostore.athome.priv
>>> host_bucket->%(bucket)s.ostore.athome.priv
>> 
>> For me it's ok:
>> 
>> ---
>> ~# grep 'host_' ~/.s3cfg 
>> host_base = ostore.athome.priv
>> host_bucket = %(bucket)s.ostore.athome.priv
>> ---
>> 
>> And ostore.athome.priv is really my radosgw (see the dig
>> commands above). And when I try a s3cmd command, I can
>> see new lines in the apache logs of my radosgw.
>> 
>> Thanks for your help Karan.
>> 
>> [1]
>> 
>> ---
>> ~# s3cmd -d ls s3://bucket 
>> DEBUG: ConfigParser: Reading file '/root/.s3cfg'
>> DEBUG: ConfigParser: bucket_location->US
>> DEBUG: ConfigParser: cloudfront_host->ostore.athome.priv
>> DEBUG: ConfigParser: default_mime_type->binary/octet-stream
>> DEBUG: ConfigParser: delete_removed->False
>> DEBUG: ConfigParser: dry_run->False
>> DEBUG: ConfigParser: enable_multipart->True
>> DEBUG: ConfigParser: encoding->UTF-8
>> DEBUG: ConfigParser: encrypt->False
>> DEBUG: ConfigParser: follow_symlinks->False
>> DEBUG: ConfigParser: force->False
>> DEBUG: ConfigParser: get_continue->False
>> DEBUG: ConfigParser: gpg_command->/usr/bin/gpg
>> DEBUG: ConfigParser: gpg_decrypt->%(gpg_command)s -d --verbose 
>> --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o 
>> %(output_file)s %(input_file)s
>> DEBUG: ConfigParser: gpg_encrypt->%(gpg_command)s -c --verbose 
>> --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o 
>> %(output_file)s %(input_file)s
>> DEBUG: ConfigParser: gpg_passphrase->...-3_chars...
>> DEBUG: ConfigParser: guess_mime_type->True
>> DEBUG: ConfigParser: host_base->ostore.athome.priv
>> DEBUG: ConfigParser: access_key->1Q...17_chars...Y
>> DEBUG: ConfigParser: secret_key->92...37_chars...W
>> DEBUG: ConfigPars

Re: [ceph-users] [radosgw] ceph daemon usage

2015-04-13 Thread ghislain.chevalier
HI all,

Works with ceph -admin-daemon 
/var/run/ceph/ceph-client.radosgw.fr-rennes-radosgw1.asok config set debug_rgw 
20
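
It can be verified through the same admin socket, e.g.:

    # show the running config of the gateway and confirm the new value
    ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.fr-rennes-radosgw1.asok \
        config show | grep debug_rgw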

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
ghislain.cheval...@orange.com
Sent: Wednesday, 25 February 2015 15:06
To: Ceph Users
Subject: [ceph-users] [radosgw] ceph daemon usage


Hi all



Context : Firefly 0.80.8, Ubuntu 14.04 LTS



I tried to change the debug level of a rados gateway "live" using: ceph daemon 
/var/run/ceph/ceph-client.radosgw.fr-rennes-radosgw1.asok config set debug_rgw 
20. The response is { "success": ""} but it has no effect.



Is there another parameter to change?



Best regards



- - - - - - - - - - - - - - - - -

Ghislain Chevalier

ORANGE FRANCE




Re: [ceph-users] rbd: incorrect metadata

2015-04-13 Thread Jason Dillaman
Can you add "debug rbd = 20" to your config, re-run the rbd command, and paste 
a link to the generated client log file?

The rbd_children and rbd_directory objects store state as omap key/values, not 
as actual binary data within the object.  You can use "rados -p rbd 
listomapvals rbd_directory/rbd_children" to see the data within the files.
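
Spelled out, that is one invocation per object, plus the client-side config 
line (pool and object names as above):

    # ceph.conf on the client running the rbd command
    [client]
        debug rbd = 20

    # dump the omap key/values of the two directory objects in the rbd pool
    rados -p rbd listomapvals rbd_directory
    rados -p rbd listomapvals rbd_children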

-- 

Jason Dillaman 
Red Hat 
dilla...@redhat.com 
http://www.redhat.com 


- Original Message -
From: "Matthew Monaco" 
To: ceph-users@lists.ceph.com
Sent: Sunday, April 12, 2015 10:57:46 PM
Subject: [ceph-users] rbd: incorrect metadata

I have a pool used for RBD in a bit of an inconsistent state. Somehow, through
OpenStack, the data associated with a child volume was deleted. If I try to
unprotect the snapshot, librbd complains there is at least one child. If I try
to list out the children, librbd errors out on looking up the image id. "listing
children failed: (2) No such file or directory".

I thought I might get brave and edit rbd_children / rbd_directory / rbd_header*
"by hand." However rbd_children and rbd_directory are empty. I've even tracked
down all 3 copies of each meta file in the filestore and they too are empty. So
where is the parent/child association coming from? Despite this, I don't have
issues listing children for other snapshots in the same pool.

Thanks!


[ceph-users] ceph-disk command raises partx error

2015-04-13 Thread HEWLETT, Paul (Paul)** CTR **
Hi Everyone

I am using the ceph-disk command to prepare disks for an OSD.
The command is:

ceph-disk prepare --zap-disk --cluster $CLUSTERNAME --cluster-uuid $CLUSTERUUID 
--fs-type xfs /dev/${1}

and this consistently raises the following error on RHEL7.1 and Ceph Hammer viz:

partx: specified range <1:0> does not make sense
partx: /dev/sdb: error adding partition 2
partx: /dev/sdb: error adding partitions 1-2
partx: /dev/sdb: error adding partitions 1-2

I have had similar errors on previous versions of Ceph and RHEL. We have 
decided to stick with Hammer/7.1 and I
am interested if anybody has any comment on this.

The error seems to do no harm, so it is probably cosmetic, but on principle I 
would at least like to know if
I can safely ignore this.

Many thanks.

Regards

Paul Hewlett
Senior Systems Engineer
Velocix, Cambridge
Alcatel-Lucent
t: +44 1223 435893 m: +44 7985327353




Re: [ceph-users] Rados Gateway and keystone

2015-04-13 Thread ghislain.chevalier
Hi all,

Coming back to that issue.

I successfully used keystone users for the rados gateway and the swift API, but 
I still don't understand how it can work with the S3 API, i.e. S3 users 
(AccessKey/SecretKey).

I found a swift3 initiative, but I think it's only usable in a pure OpenStack 
Swift environment, by setting up a specific plug-in. 
https://github.com/stackforge/swift3

An rgw can be, at the same time, under keystone control and under standard 
radosgw-admin control if
- for swift, you use the right authentication service (keystone or internal)
- for S3, you use the internal authentication service

So, my questions are still valid.
How can an rgw work for S3 users if they are stored in keystone? What are the 
access key and secret key?
What is the purpose of the "rgw s3 auth use keystone" parameter?
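
For context, here is a minimal sketch of the keystone-related keys being 
discussed, as they would sit in the gateway's ceph.conf (section name, URL and 
token are placeholders):

    [client.radosgw.gateway]
        rgw keystone url = http://keystone-host:35357
        rgw keystone admin token = ADMIN_TOKEN
        rgw keystone accepted roles = Member, admin
        rgw s3 auth use keystone = true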

Best regards

--
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
ghislain.cheval...@orange.com
Sent: Monday, 23 March 2015 14:03
To: ceph-users
Subject: [ceph-users] Rados Gateway and keystone

Hi All,

I just would like to be sure about the keystone configuration for the Rados Gateway.

I read the documentation http://ceph.com/docs/master/radosgw/keystone/ and 
http://ceph.com/docs/master/radosgw/config-ref/?highlight=keystone
but I didn't catch whether, after having configured the rados gateway (ceph.conf) 
to use keystone, it becomes mandatory to create all the users in it. 

In other words, can an rgw be, at the same time, under keystone control and under 
standard radosgw-admin control?
How does it work for S3 users?
What is the purpose of the "rgw s3 auth use keystone" parameter?

Best regards

- - - - - - - - - - - - - - - - - 
Ghislain Chevalier
+33299124432
+33788624370
ghislain.cheval...@orange.com 


Re: [ceph-users] question about OSD failure detection

2015-04-13 Thread Liu, Ming (HPIT-GADSC)
Thank you Xiaoxi very much!

Your answer is very clear. I need to read more about CRUSH first, but I think 
your answer basically helps me a lot. In practice, OSDs checking OSDs should be 
OK. There is a possibility of a 'cross product' of connections among OSDs for 
heartbeats, but by designing the rules, that can be avoided. So I think I first 
need to fully understand CRUSH and how PGs are mapped to OSDs.
My major concern is that if each OSD needs to check all the others, there will 
be too much heartbeat traffic as the cluster size grows. 10 OSDs need 10*10 
heartbeat messages every checking interval, which is OK, but 1000 OSDs need 
1000*1000 heartbeats, which seems like too many messages. But I think you just 
confirmed this will not happen.

And I notice all the other thread titles have a prefix such as [ceph-users]. If 
this is a rule for this mailing list, I am sorry; I will follow it next time.

Thanks,
Ming

From: Chen, Xiaoxi [mailto:xiaoxi.c...@intel.com]
Sent: Monday, April 13, 2015 2:32 PM
To: Liu, Ming (HPIT-GADSC); ceph-users@lists.ceph.com
Subject: RE: question about OSD failure detection

Hi,

1.   In short, an OSD needs to heartbeat with up to #PG x (#Replica - 1) peers, 
but in practice far fewer, since most of the peers are redundant.
For example, an OSD (say OSD 1) is holding 100 PGs, and for some of those PGs, 
say PG 1, OSD 1 is the primary OSD; OSD 1 then needs to peer with all other 
OSDs in PG 1's acting set and up set (basically you can think of these two sets 
as the other replicas of PG 1).

So if the cluster has a very simple (default) ruleset, it's possible that an 
OSD needs to peer with all other OSDs.


2.   An OSD will randomly select a mon when it boots up, and then keeps talking 
to that mon. It's the monitor quorum's job to reach an agreement about the OSD 
status. See Paxos if you want to know more details on how the agreement is 
reached.

3.   OSDs do ping the Mons, but in reality the network between the monitors and 
the OSDs is likely not the network between the OSDs. For instance, Mon <-> OSD 
may be on a management network while OSD <-> OSD is on a 10Gb data network. So 
pinging only the Mon is not enough.

Actually there are heartbeats on both the public and cluster networks, used 
just to ensure network connectivity.
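
For reference, the main heartbeat knobs look like this in ceph.conf (the 
defaults quoted in the comments are from memory, so treat them as approximate):

    [osd]
        osd heartbeat interval = 6    # seconds between pings sent to peer OSDs (default ~6)
        osd heartbeat grace = 20      # seconds without a reply before reporting the peer down (default ~20)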



Xiaoxi



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Liu, 
Ming (HPIT-GADSC)
Sent: Monday, April 13, 2015 12:08 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] question about OSD failure detection

Hi, all,

I am new to Ceph and am trying to understand how it works and how it is designed.
One basic question for me is how Ceph OSDs perform failure detection. I 
did some searching but could not get a satisfying answer, so I am asking here 
in the hope that someone can kindly help me.
The documentation says an OSD will send heartbeats to other OSDs and report a 
failure to a Monitor when it detects that some other OSD is down.
My questions are:
1.  How many other OSDs will one particular OSD heartbeat with? Is it possible 
for one OSD to heartbeat with most, or even all, of the other OSDs in the 
cluster? In other words, how does an OSD decide the list of OSDs it needs to 
check for health?
2.  If an OSD detects a failure, which Monitor does it report to? Is this 
selection random, or does it follow some rule? Or does it need to report to 
multiple Monitors?
3.  Why doesn't each OSD do its heartbeat directly with a monitor?
If the answer to question 1 is that one OSD may need to check all other OSDs 
in the cluster, and if this is true for many OSDs in the cluster, it looks like 
a lot of network traffic and redundant work. Say, OSD-1 checks OSD-2, OSD-3 and 
OSD-4, while OSD-2 also checks OSD-3, OSD-4 and OSD-5. Then both OSD-1 and 
OSD-2 do redundant health checking of OSD-4 and OSD-5.
If the answer to question 1 is that one OSD only needs to heartbeat with very 
few other OSDs and never has to check most of the other OSDs, then I am fine; 
that decentralizes the health checking away from the monitors. But I want to 
know the rule for how an OSD decides which other OSDs it needs to check, to 
further understand this.

I have read almost all the articles on the internet that I could find so far, 
but still cannot get a very satisfying answer. I don't want to dive into the 
source code yet; that may take a long time for me. I want to first understand 
the principles, and then decide whether it is really worth spending a lot of 
time reading the source code. So I really hope someone can help me here.

Any help will be very appreciated!!

Thanks,
Ming


Re: [ceph-users] ceph-disk command raises partx error

2015-04-13 Thread Loic Dachary

Hi,

On 13/04/2015 16:15, HEWLETT, Paul (Paul)** CTR ** wrote:
> Hi Everyone
> 
> I am using the ceph-disk command to prepare disks for an OSD.
> The command is:
> 
> *ceph-disk prepare --zap-disk --cluster $CLUSTERNAME --cluster-uuid 
> $CLUSTERUUID --fs-type xfs /dev/${1}*
> 
> and this consistently raises the following error on RHEL7.1 and Ceph Hammer 
> viz:
> 
> partx: specified range <1:0> does not make sense
> partx: /dev/sdb: error adding partition 2
> partx: /dev/sdb: error adding partitions 1-2
> partx: /dev/sdb: error adding partitions 1-2


> 
> I have had similar errors on previous versions of Ceph and RHEL. We have 
> decided to stick with Hammer/7.1 and I
> am interested if anybody has any comment on this.
> 
> The error seems to do no harm so is probably cosmetic but on principle at 
> least I would al least like to know if
> I can safely ignore this.

It is indeed harmless. It was not suppressed, so as to get more information 
when and if something goes wrong. I've created 
http://tracker.ceph.com/issues/11377 as a reminder that this confusion should be fixed.

Cheers

> 
> Many thanks.
> 
> Regards
> 
> Paul Hewlett
> Senior Systems Engineer
> Velocix, Cambridge
> Alcatel-Lucent
> t: +44 1223 435893 m: +44 7985327353
> 
> 
> 
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Scott Laird
Redundancy is a means to an end, not an end itself.

If you can afford to lose component X, manually replace it, and then return
everything impacted to service, then there's no point in making X redundant.

If you can afford to lose a single disk (which Ceph certainly can), then
there's no point in local RAID.

If you can afford to lose a single machine, then there's no point in
redundant power supplies (although they can make power maintenance work a
lot less complex).

If you can afford to lose everything attached to a switch, then there's no
point in making it redundant.


Doing redundant networking to the host adds a lot of complexity that isn't
really there with single-attached hosts.  For instance, what happens if one
of the switches loses its connection to the outside world?  With LACP,
you'll probably lose connectivity to half of your peers.  Doing something
like OSPF, possibly with ECMP, avoids that problem, but certainly doesn't
make things less complicated.

In most cases, I'd avoid switch redundancy.  If I had more than 10 racks,
there's really no point, because you should be able to lose a rack without
massive disruption.  If I only had a rack or two, then I quite likely
wouldn't bother, simply because it ends up being a bigger part of the cost
and the added complexity and cost isn't worth it in most cases.

It comes down to engineering tradeoffs and money, and the right balance is
different in just about every situation.  It's a function of money,
acceptance of risk, scale, performance, networking experience, and the cost
of outages.


Scott

On Mon, Apr 13, 2015 at 4:02 AM Christian Balzer  wrote:

>
> Hello,
>
> On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote:
>
> > Dear ceph users,
> >
> > we are planing a ceph storage cluster from scratch. Might be up to 1 PB
> > within the next 3 years, multiple buildings, new network infrastructure
> > for the cluster etc.
> >
> > I had some excellent trainings on ceph, so the essential fundamentals
> > are familiar to me, and I know our goals/dreams can be reached. :)
> >
> > There is just "one tiny piece" in the design I'm currently unsure
> > about :)
> >
> > Ceph follows some sort of keep it small and simple, e.g. dont use raid
> > controllers, use more boxes and disks, fast network etc.
> >
> While small and plenty is definitely true, some people actually use RAID
> for OSDs (like RAID1) to avoid ever having to deal with a failed OSD and
> getting a 4x replication in the end.
> Your needs and budget may of course differ.
>
> > So from our current design we plan 40Gb Storage and Client LAN.
> >
> > Would you suggest to connect the OSD nodes redundant to both networks?
> > That would end up with 4 * 40Gb ports in each box, two Switches to
> > connect to.
> >
> If you can afford it, fabric switches are quite nice, as they allow for
> LACP over 2 switches, so if everything is working you get twice the speed,
> if not still full redundancy. The Brocade VDX stuff comes to mind.
>
> However if you're not tied into an Ethernet network, you might do better
> and cheaper with an Infiniband network on the storage side of things.
> This will become even more attractive as RDMA support improves with Ceph.
>
> Separating public (client) and private (storage, OSD interconnect)
> networks with Ceph makes only sense if your storage node can actually
> utilize all that bandwidth.
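For reference, that public/cluster split is just two lines in ceph.conf (a
minimal sketch; the subnets are placeholders):

    [global]
      public network  = 192.168.10.0/24   # client-facing traffic
      cluster network = 192.168.20.0/24   # OSD replication and backfill traffic

Each OSD node then needs an interface in both subnets; the monitors only need
to be reachable on the public network.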
>
> So at your storage node density of 12 HDDs (16 HDD chassis are not space
> efficient), 40GbE is overkill with a single link/network, insanely so with
> 2 networks.
>
> > I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for "high" io
> > pools. (+ currently SSD for journal, but may be until we start, levelDB,
> > rocksDB are ready ... ?)
> >
> > Later some less io bound pools for data archiving/backup. (bigger and
> > more Disks per node)
> >
> > We would also do some Cache tiering for some pools.
> >
> > From HP, Intel, Supermicron etc reference documentations, they use
> > usually non-redundant network connection. (single 10Gb)
> >
> > I know: redundancy keeps some headaches small, but also adds some more
> > complexity and increases the budget. (add network adapters, other
> > server, more switches, etc)
> >
> Complexity not so much, cost yes.
>
> > So what would you suggest, what are your experiences?
> >
> It all depends on how small (large really) you can start.
>
> I have only small clusters with few nodes, so for me redundancy is a big
> deal.
> Thus those cluster use Infiniband, 2 switches and dual-port HCAs on the
> nodes in an active-standby mode.
>
> If you however can start with something like 10 racks (ToR switches),
> losing one switch would mean a loss of 10% of your cluster, which is
> something it should be able to cope with.
> Especially if you configured Ceph to _not_ start re-balancing data
> automatically if a rack goes down (so that you have a chance to put a
> replacement switch in place, which you of course kept handy on-site for
> su
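The setting being alluded to here, as a rough sketch (assuming a Hammer-era
release; verify the option name against your version's documentation), looks
like this in ceph.conf:

    [mon]
      # don't automatically mark OSDs "out" when a whole rack's worth of OSDs
      # goes down, so a failed ToR switch can be swapped without triggering a
      # full re-balance
      mon osd down out subtree limit = rack

    # alternatively, for a planned maintenance window:
    #   ceph osd set noout   ...do the work...   ceph osd unset noout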

[ceph-users] v0.94.1 Hammer released

2015-04-13 Thread Sage Weil
This bug fix release fixes a few critical issues with CRUSH.  The most 
important addresses a bug in feature bit enforcement that may prevent 
pre-hammer clients from communicating with the cluster during an upgrade.  
This only manifests in some cases (for example, when the 'rack' type is in 
use in the CRUSH map, and possibly other cases), but for safety we 
strongly recommend that all users use 0.94.1 instead of 0.94 when 
upgrading.

There is also a fix in the new straw2 buckets when OSD weights are 0.

We recommend that all v0.94 users upgrade.
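A node-by-node upgrade on Ubuntu Trusty with Upstart would look roughly like
this (a sketch; adjust the package and service commands to your distribution):

    apt-get update && apt-get dist-upgrade -y   # pulls in the 0.94.1 packages
    restart ceph-mon-all                        # on monitor nodes
    restart ceph-osd-all                        # on OSD nodes

    # confirm the daemons now report 0.94.1
    ceph --version
    ceph tell osd.* version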

Notable changes
---------------

* crush: fix divide-by-0 in straw2 (#11357 Sage Weil)
* crush: fix has_v4_buckets (#11364 Sage Weil)
* osd: fix negative degraded objects during backfilling (#7737 Guang Yang)

For more detailed information, see the complete changelog at

  http://docs.ceph.com/docs/master/_downloads/v0.94.1.txt

Getting Ceph
------------

* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.94.1.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket

2015-04-13 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Francois Lafont" 
> To: ceph-users@lists.ceph.com
> Sent: Sunday, April 12, 2015 8:47:40 PM
> Subject: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to 
> create bucket
> 
> Hi,
> 
> On a testing cluster, I have a radosgw on Firefly and the other
> nodes, OSDs and monitors, are on Hammer. The nodes are installed
> with puppet in personal VM, so I can reproduce the problem.
> Generally, I use s3cmd to check the radosgw. While radosgw is on
> Firefly, I can create bucket, no problem. Then, I upgrade the
> radosgw (it's a Ubuntu Trusty):
> 
> sed -i 's/firefly/hammer/g' /etc/apt/sources.list.d/ceph.list
> apt-get update && apt-get dist-upgrade -y
> service stop apache2
> stop radosgw-all
> start radosgw-all
> service apache2 start
> 
> After that, impossible to create a bucket with s3cmd:
> 
> --
> ~# s3cmd -d mb s3://bucket-2
> DEBUG: ConfigParser: Reading file '/root/.s3cfg'
> DEBUG: ConfigParser: bucket_location->US
> DEBUG: ConfigParser: cloudfront_host->cloudfront.amazonaws.com
> DEBUG: ConfigParser: default_mime_type->binary/octet-stream
> DEBUG: ConfigParser: delete_removed->False
> DEBUG: ConfigParser: dry_run->False
> DEBUG: ConfigParser: enable_multipart->True
> DEBUG: ConfigParser: encoding->UTF-8
> DEBUG: ConfigParser: encrypt->False
> DEBUG: ConfigParser: follow_symlinks->False
> DEBUG: ConfigParser: force->False
> DEBUG: ConfigParser: get_continue->False
> DEBUG: ConfigParser: gpg_command->/usr/bin/gpg
> DEBUG: ConfigParser: gpg_decrypt->%(gpg_command)s -d --verbose --no-use-agent
> --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s
> %(input_file)s
> DEBUG: ConfigParser: gpg_encrypt->%(gpg_command)s -c --verbose --no-use-agent
> --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s
> %(input_file)s
> DEBUG: ConfigParser: gpg_passphrase->...-3_chars...
> DEBUG: ConfigParser: guess_mime_type->True
> DEBUG: ConfigParser: host_base->ostore.athome.priv
> DEBUG: ConfigParser: access_key->5R...17_chars...Y
> DEBUG: ConfigParser: secret_key->Ij...37_chars...I
> DEBUG: ConfigParser: host_bucket->%(bucket)s.ostore.athome.priv
> DEBUG: ConfigParser: human_readable_sizes->False
> DEBUG: ConfigParser: invalidate_on_cf->False
> DEBUG: ConfigParser: list_md5->False
> DEBUG: ConfigParser: log_target_prefix->
> DEBUG: ConfigParser: mime_type->
> DEBUG: ConfigParser: multipart_chunk_size_mb->15
> DEBUG: ConfigParser: preserve_attrs->True
> DEBUG: ConfigParser: progress_meter->True
> DEBUG: ConfigParser: proxy_host->
> DEBUG: ConfigParser: proxy_port->0
> DEBUG: ConfigParser: recursive->False
> DEBUG: ConfigParser: recv_chunk->4096
> DEBUG: ConfigParser: reduced_redundancy->False
> DEBUG: ConfigParser: send_chunk->4096
> DEBUG: ConfigParser: simpledb_host->sdb.amazonaws.com
> DEBUG: ConfigParser: skip_existing->False
> DEBUG: ConfigParser: socket_timeout->300
> DEBUG: ConfigParser: urlencoding_mode->normal
> DEBUG: ConfigParser: use_https->False
> DEBUG: ConfigParser: verbosity->WARNING
> DEBUG: ConfigParser:
> website_endpoint->http://%(bucket)s.s3-website-%(location)s.amazonaws.com/
> DEBUG: ConfigParser: website_error->
> DEBUG: ConfigParser: website_index->index.html
> DEBUG: Updating Config.Config encoding -> UTF-8
> DEBUG: Updating Config.Config follow_symlinks -> False
> DEBUG: Updating Config.Config verbosity -> 10
> DEBUG: Unicodising 'mb' using UTF-8
> DEBUG: Unicodising 's3://bucket-2' using UTF-8
> DEBUG: Command: mb
> DEBUG: SignHeaders: 'PUT\n\n\n\nx-amz-date:Mon, 13 Apr 2015 03:32:23
> +0000\n/bucket-2/'
> DEBUG: CreateRequest: resource[uri]=/
> DEBUG: SignHeaders: 'PUT\n\n\n\nx-amz-date:Mon, 13 Apr 2015 03:32:23
> +0000\n/bucket-2/'
> DEBUG: Processing request, please wait...
> DEBUG: get_hostname(bucket-2): bucket-2.ostore.athome.priv
> DEBUG: format_uri(): /
> DEBUG: Sending request method_string='PUT', uri='/',
> headers={'content-length': '0', 'Authorization': 'AWS
> 5RUS0Z3SBG6IK263PLFY:3V1MdXoCGFrJKrO2LSJaBpNMcK4=', 'x-amz-date': 'Mon, 13
> Apr 2015 03:32:23 +0000'}, body=(0 bytes)
> DEBUG: Response: {'status': 405, 'headers': {'date': 'Mon, 13 Apr 2015
> 03:32:23 GMT', 'accept-ranges': 'bytes', 'content-type': 'application/xml',
> 'content-length': '82', 'server': 'Apache/2.4.7 (Ubuntu)'}, 'reason':
> 'Method Not Allowed', 'data': '<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code></Error>'}
> DEBUG: S3Error: 405 (Method Not Allowed)
> DEBUG: HttpHeader: date: Mon, 13 Apr 2015 03:32:23 GMT
> DEBUG: HttpHeader: accept-ranges: bytes
> DEBUG: HttpHeader: content-type: application/xml
> DEBUG: HttpHeader: content-length: 82
> DEBUG: HttpHeader: server: Apache/2.4.7 (Ubuntu)
> DEBUG: ErrorXML: Code: 'MethodNotAllowed'
> ERROR: S3 error: 405 (MethodNotAllowed):
> --
> 
> But before the upgrade, the same command worked fine.
> I see nothing in the log. Here is my ceph.conf:
> 
> --

Re: [ceph-users] How to dispatch monitors in a multi-site cluster (ie in 2 datacenters)

2015-04-13 Thread Robert LeBlanc
I really like this proposal.

On Mon, Apr 13, 2015 at 2:33 AM, Joao Eduardo Luis  wrote:
> On 04/13/2015 02:25 AM, Christian Balzer wrote:
>> On Sun, 12 Apr 2015 14:37:56 -0700 Gregory Farnum wrote:
>>
>>> On Sun, Apr 12, 2015 at 1:58 PM, Francois Lafont 
>>> wrote:
 Somnath Roy wrote:

> Interesting scenario :-).. IMHO, I don't think cluster will be in
> healthy state here if the connections between dc1 and dc2 is cut. The
> reason is the following.
>
> 1. only osd.5 can talk to both data center  OSDs and other 2 mons
> will not be. So, they can't reach to an agreement (and form quorum)
> about the state of OSDs.
>
> 2. OSDs on dc1 and dc2 will not be able to talk to each other,
> considering replicas across data centers, the cluster will be broken.

 Yes, in fact, after thought, I have the first question below.

 If: (clearer with the schema at the head of this thread ;))

 1. mon.1 and mon.2 can talk together (in dc1) and can talk with
 mon.5 (via the VPN) but can't talk with mon.3 and mon.4 (in dc2)
 2. mon.3 and mon.4 can talk together (in dc2) and can talk with
 mon.5 (via the VPN) but can't talk with mon.1 and mon.2 (in dc1)
 3. mon.5 can talk with mon.1, mon.2, mon.3, mon.4 and mon.5

 is the quorum reached? If yes, which is the quorum?
>>>
>>> Yes, you should get a quorum as mon.5 will vote for one datacenter or
>>> the other. Which one it chooses will depend on which monitor has the
>>> "lowest" IP address (I think, or maybe just the monitor IDs or
>>> something? Anyway, it's a consistent ordering).
>>
>> Pet peeve alert. ^_-
>>
>> It's the lowest IP.
>
> To be more precise, it's the lowest IP:PORT combination:
>
> 10.0.1.2:6789 = rank 0
> 10.0.1.2:6790 = rank 1
> 10.0.1.3:6789 = rank 2
>
> and so on.
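For reference, the ranks a cluster has actually assigned can be inspected at
any time (a quick sketch):

    ceph mon dump                            # monmap: each mon's rank and address
    ceph quorum_status --format json-pretty  # current quorum and leader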
>
>> Which is something that really needs to be documented (better) so that
>> people can plan things accordingly and have the leader monitor wind up on
>> the best suited hardware (in case not everything is being equal).
>>
>> Other than that, the sequence of how (initial?) mons are listed in
>> ceph.conf would of course be the most natural, expected way to sort
>> monitors.
>
> I don't agree.  I find it hard to rely on ceph.conf for sensitive
> decisions like this, because we must ensure that ceph.conf is the same
> in all the nodes;  and I've seen this not being the case more often than
> not.
>
> On the other hand, I do agree that we should make it easier for people
> to specify which monitors they want in the line of succession to the
> leader, so that they can plan their clusters accordingly.  I do believe
> we can set this on the monmap, ideally once the first quorum is formed;
> something like:
>
> ceph mon rank set mon.a 0
> ceph mon rank set mon.b 2
> ceph mon rank set mon.c 1
>
> ceph mon rank list
>
>   MON   IP:PORT        RANK    POLICY         STATUS
>   mon.a 10.0.1.2:6789 rank 0  [set-by-user]  leader
>   mon.c 10.0.1.3:6789 rank 1  [set-by-user]  peon
>   mon.b 10.0.1.2:6790 rank 2  [set-by-user]  down
>   mon.d 10.0.1.4:6789 rank 3  [default]  peon
>
>
> Thoughts?
>
>   -Joao
>
>>
>> Christian
>>
>>
>>> Under no circumstances
>>> whatsoever will mon.5 help each datacenter create their own quorums at
>>> the same time. The other data center will just be out of luck and
>>> unable to do anything.
>>> Although it's possible that the formed quorum won't be very stable
>>> since the out-of-quorum monitors will probably keep trying to form a
>>> quorum and that might make mon.5 unhappy. You should test what happens
>>> with that kind of net split. :)
>>> -Greg
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Binding a pool to certain OSDs

2015-04-13 Thread Giuseppe Civitella
Hi all,

I've got a Ceph cluster which serves volumes to a Cinder installation. It
runs Emperor.
I'd like to be able to replace some of the disks with OPAL disks and create
a new pool which uses exclusively the latter kind of disk. I'd like to have
a "traditional" pool and a "secure" one coexisting on the same ceph host.
I'd then use Cinder multi backend feature to serve them.
My question is: how is it possible to realize such a setup? How can I bind
a pool to certain OSDs?
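For reference, the usual way to express this is to give those OSDs their own
CRUSH root and rule and then point the new pool at that rule. A rough sketch
(bucket, OSD and pool names are placeholders; on Emperor you may have to
decompile, edit and recompile the CRUSH map with crushtool instead of using
the rule create-simple command):

    # 1) a separate root and host bucket for the OPAL-backed OSDs
    ceph osd crush add-bucket secure root
    ceph osd crush add-bucket node1-secure host
    ceph osd crush move node1-secure root=secure

    # 2) place the chosen OSDs there (1.0 is a placeholder weight)
    ceph osd crush set osd.10 1.0 root=secure host=node1-secure
    ceph osd crush set osd.11 1.0 root=secure host=node1-secure

    # 3) a rule drawing only from that root, and a pool bound to it
    ceph osd crush rule create-simple secure_rule secure host
    ceph osd crush rule dump                      # note secure_rule's ruleset id
    ceph osd pool create volumes-secure 128 128
    ceph osd pool set volumes-secure crush_ruleset <ruleset-id-from-above>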

Thanks
Giuseppe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] low power single disk nodes

2015-04-13 Thread Robert LeBlanc
We are getting ready to put the Quantas into production. We looked at
the Supermicro Atoms (we have 6 of them): the rails were crap (they
exploded the first time you pulled the server out, and they stick out of
the back of the cabinet about 8 inches; these boxes are already very
deep), and we also ran out of CPU on these boxes and had limited PCI I/O.
They may work fine for really cold data. They may also work fine with
XIO and Infiniband. The Atoms still had pretty decent performance
given these limitations.

The Quantas removed some of the issues with NUMA, have much better PCI
I/O bandwidth, and come with a 10 Gb NIC on board. The biggest drawback
is that 8 drives are on a SAS controller and 4 drives are on a SATA
controller, plus a SATADOM and a free port. So you have to manage two
different controller types and speeds (6Gb SAS and 3Gb SATA).

I'd say neither is perfect, but we decided on Quanta in the end.

On Mon, Apr 13, 2015 at 5:17 AM, Jerker Nyberg  wrote:
>
> Hello,
>
> Thanks for all replies! The Banana Pi could work. The built in SATA-power in
> Banana Pi can power a 2.5" SATA disk. Cool. (Not 3.5" SATA, since those seem
> to require 12 V too.)
>
> I found this post from Vess Bakalov about the same subject:
> http://millibit.blogspot.se/2015/01/ceph-pi-adding-osd-and-more-performance.html
>
> For PoE I have only found Intel Galileo Gen 2 or RouterBOARD RB450G which
> are too slow and/or miss IO-expansion. (But good for signage/Xibo maybe!)
>
> I found two boxes from Quanta and SuperMicro with single socket Xeon or with
> Intel Atom (Avaton) that might be quite ok. I was only aware of the
> dual-Xeons before.
>
> http://www.quantaqct.com/Product/Servers/Rackmount-Servers/STRATOS-S100-L11SL-p151c77c70c83
> http://www.supermicro.nl/products/system/1U/5018/SSG-5018A-AR12L.cfm
>
> Kind regards,
> Jerker Nyberg
>
>
>
>
> On Thu, 9 Apr 2015, Quentin Hartman wrote:
>
>> I'm skeptical about how well this would work, but a Banana Pi might be a
>> place to start. Like a raspberry pi, but it has a SATA connector:
>> http://www.bananapi.org/
>>
>> On Thu, Apr 9, 2015 at 3:18 AM, Jerker Nyberg  wrote:
>>
>>>
>>> Hello ceph users,
>>>
>>> Is anyone running any low powered single disk nodes with Ceph now?
>>> Calxeda
>>> seems to be no more according to Wikipedia. I do not think HP moonshot is
>>> what I am looking for - I want stand-alone nodes, not server cartridges
>>> integrated into server chassis. And I do not want to be locked to a
>>> single
>>> vendor.
>>>
>>> I was playing with Raspberry Pi 2 for signage when I thought of my old
>>> experiments with Ceph.
>>>
>>> I am thinking of for example Odroid-C1 or Odroid-XU3 Lite or maybe
>>> something with a low-power Intel x64/x86 processor. Together with one SSD
>>> or one low power HDD the node could get all power via PoE (via splitter
>>> or
>>> integrated into board if such boards exist). PoE provide remote power-on
>>> power-off even for consumer grade nodes.
>>>
>>> The cost for a single low power node should be able to compete with
>>> traditional PC-servers price per disk. Ceph take care of redundancy.
>>>
>>> I think simple custom casing should be good enough - maybe just strap or
>>> velcro everything on trays in the rack, at least for the nodes with SSD.
>>>
>>> Kind regards,
>>> --
>>> Jerker Nyberg, Uppsala, Sweden.
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados Gateway and keystone

2015-04-13 Thread Erik McCormick
I haven't really used the S3 stuff much, but the credentials should be in
keystone already. If you're in horizon, you can download them under Access
and Security->API Access. Using the CLI you can use the openstack client
like "openstack credential " or with
the keystone client like "keystone ec2-credentials-list", etc.  Then you
should be able to feed those credentials to the rgw like a normal S3 API
call.
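A rough sketch of that flow with the old keystone CLI (exact flag names vary
between client and s3cmd versions, so treat these as illustrative):

    keystone ec2-credentials-create   # creates an access/secret pair for the current user
    keystone ec2-credentials-list     # shows the access and secret keys

    # then point any S3 client at the radosgw endpoint with that pair, e.g.
    s3cmd --access_key=<access> --secret_key=<secret> --host=rgw.example.com ls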

Cheers,
Erik

On Mon, Apr 13, 2015 at 10:16 AM,  wrote:

> Hi all,
>
> Coming back to that issue.
>
> I successfully used keystone users for the rados gateway and the Swift API,
> but I still don't understand how it can work with the S3 API, i.e. with S3 users
> (AccessKey/SecretKey).
>
> I found the swift3 initiative, but I think it only applies to a pure
> OpenStack Swift environment, since it is set up as a Swift-specific plug-in.
> https://github.com/stackforge/swift3
>
> A rgw can be, at the same time, under keystone control and standard
> radosgw-admin if
> - for swift, you use the right authentication service (keystone or
> internal)
> - for S3, you use the internal authentication service
>
> So, my questions are still valid.
> How can a rgw work for S3 users if they are stored in keystone? Which are
> the access key and secret key?
> What is the purpose of the "rgw s3 auth use keystone" parameter?
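For context, the keystone-related settings on the gateway side look roughly
like this (a sketch of a Hammer-era ceph.conf section; the URL, token and path
are placeholders):

---
[client.radosgw.gw1]
  rgw keystone url = http://keystone.example.com:35357
  rgw keystone admin token = <admin-token>
  rgw keystone accepted roles = Member, admin
  rgw keystone token cache size = 500
  rgw keystone revocation interval = 600
  # only affects the S3 API: radosgw will try to validate S3
  # access/secret keys against keystone's EC2 credentials
  rgw s3 auth use keystone = true
  nss db path = /var/lib/ceph/nss
---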
>
> Best regards
>
> --
> De : ceph-users [mailto:ceph-users-boun...@lists.ceph.com] De la part de
> ghislain.cheval...@orange.com
> Envoyé : lundi 23 mars 2015 14:03
> À : ceph-users
> Objet : [ceph-users] Rados Gateway and keystone
>
> Hi All,
>
> I just would to be sure about keystone configuration for Rados Gateway.
>
> I read the documentation http://ceph.com/docs/master/radosgw/keystone/
> and http://ceph.com/docs/master/radosgw/config-ref/?highlight=keystone
> but I didn't catch if after having configured the rados gateway
> (ceph.conf) in order to use keystone, it becomes mandatory to create all
> the users in it.
>
> In other words, can a rgw be, at the same time, under keystone control
> and standard radosgw-admin?
> How does it work for S3 users ?
> What is the purpose of "rgw s3 auth use keystone" parameter ?
>
> Best regards
>
> - - - - - - - - - - - - - - - - -
> Ghislain Chevalier
> +33299124432
> +33788624370
> ghislain.cheval...@orange.com
>
> _
>
> Ce message et ses pieces jointes peuvent contenir des informations
> confidentielles ou privilegiees et ne doivent donc
> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez
> recu ce message par erreur, veuillez le signaler
> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
> electroniques etant susceptibles d'alteration,
> Orange decline toute responsabilite si ce message a ete altere, deforme ou
> falsifie. Merci.
>
> This message and its attachments may contain confidential or privileged
> information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and
> delete this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been
> modified, changed or falsified.
> Thank you.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] low power single disk nodes

2015-04-13 Thread Nick Fisk
I went for something similar to the Quantas boxes but 4 stacked in 1x 4U box

http://www.supermicro.nl/products/system/4U/F617/SYS-F617H6-FTPT_.cfm

When you do the maths, even something like a Banana Pi + disk starts costing
a similar amount, and you get so much more for your money in terms of
processing power, NIC bandwidth, etc.


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Robert LeBlanc
> Sent: 13 April 2015 17:27
> To: Jerker Nyberg
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] low power single disk nodes
> 
> We are getting ready to put the Quantas into production. We looked at the
> Supermico Atoms (we have 6 of them), the rails were crap (they exploded
> the first time you pull the server out, and they stick out of the back of
the
> cabinet about 8 inches, these boxes are already very deep), we also ran
out
> of CPU on these boxes and had limited PCI I/O).
> They may work fine for really cold data. It may also work fine with XIO
and
> Infiniband. The Atoms still had pretty decent performance given these
> limitations.
> 
> The Quantas removed some of the issues with NUMA, had much better PCI
> I/O bandwidth, comes with a 10 Gb NIC on board. The biggest drawback is
> that 8 drives is on a SAS controller and 4 drives are on a SATA
controller, plus
> SATADOM and a free port. So you have to manage two different controller
> types and speeds (6Gb SAS and 3Gb SATA).
> 
> I'd say neither is perfect, but we decided on Quanta in the end.
> 
> On Mon, Apr 13, 2015 at 5:17 AM, Jerker Nyberg 
> wrote:
> >
> > Hello,
> >
> > Thanks for all replies! The Banana Pi could work. The built in
> > SATA-power in Banana Pi can power a 2.5" SATA disk. Cool. (Not 3.5"
> > SATA since that seem to require 12 V too.)
> >
> > I found this post from Vess Bakalov about the same subject:
> > http://millibit.blogspot.se/2015/01/ceph-pi-adding-osd-and-more-perfor
> > mance.html
> >
> > For PoE I have only found Intel Galileo Gen 2 or RouterBOARD RB450G
> > which are too slow and/or miss IO-expansion. (But good for
> > signage/Xibo maybe!)
> >
> > I found two boxes from Quanta and SuperMicro with single socket Xeon
> > or with Intel Atom (Avaton) that might be quite ok. I was only aware
> > of the dual-Xeons before.
> >
> > http://www.quantaqct.com/Product/Servers/Rackmount-
> Servers/STRATOS-S10
> > 0-L11SL-p151c77c70c83
> > http://www.supermicro.nl/products/system/1U/5018/SSG-5018A-
> AR12L.cfm
> >
> > Kind regards,
> > Jerker Nyberg
> >
> >
> >
> >
> > On Thu, 9 Apr 2015, Quentin Hartman wrote:
> >
> >> I'm skeptical about how well this would work, but a Banana Pi might
> >> be a place to start. Like a raspberry pi, but it has a SATA connector:
> >> http://www.bananapi.org/
> >>
> >> On Thu, Apr 9, 2015 at 3:18 AM, Jerker Nyberg 
> wrote:
> >>
> >>>
> >>> Hello ceph users,
> >>>
> >>> Is anyone running any low powered single disk nodes with Ceph now?
> >>> Calxeda
> >>> seems to be no more according to Wikipedia. I do not think HP
> >>> moonshot is what I am looking for - I want stand-alone nodes, not
> >>> server cartridges integrated into server chassis. And I do not want
> >>> to be locked to a single vendor.
> >>>
> >>> I was playing with Raspberry Pi 2 for signage when I thought of my
> >>> old experiments with Ceph.
> >>>
> >>> I am thinking of for example Odroid-C1 or Odroid-XU3 Lite or maybe
> >>> something with a low-power Intel x64/x86 processor. Together with
> >>> one SSD or one low power HDD the node could get all power via PoE
> >>> (via splitter or integrated into board if such boards exist). PoE
> >>> provide remote power-on power-off even for consumer grade nodes.
> >>>
> >>> The cost for a single low power node should be able to compete with
> >>> traditional PC-servers price per disk. Ceph take care of redundancy.
> >>>
> >>> I think simple custom casing should be good enough - maybe just
> >>> strap or velcro everything on trays in the rack, at least for the
nodes with
> SSD.
> >>>
> >>> Kind regards,
> >>> --
> >>> Jerker Nyberg, Uppsala, Sweden.
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Robert LeBlanc
For us, using two 40Gb ports with VLANs is redundancy enough. We are
doing LACP over two different switches.
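As a sketch, that kind of bond looks like this on a Debian/Ubuntu node
(interface names are placeholders, and it assumes the two switches support
multi-chassis LACP/MLAG):

    auto bond0
    iface bond0 inet static
        address 192.168.10.11
        netmask 255.255.255.0
        bond-slaves ens1f0 ens1f1        # one 40Gb port to each switch
        bond-mode 802.3ad                # LACP
        bond-miimon 100
        bond-xmit-hash-policy layer3+4

VLAN sub-interfaces (bond0.X) can then carry the public and cluster networks
over the same bond.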

On Mon, Apr 13, 2015 at 3:03 AM, Götz Reinicke - IT Koordinator
 wrote:
> Dear ceph users,
>
> we are planing a ceph storage cluster from scratch. Might be up to 1 PB
> within the next 3 years, multiple buildings, new network infrastructure
> for the cluster etc.
>
> I had some excellent trainings on ceph, so the essential fundamentals
> are familiar to me, and I know our goals/dreams can be reached. :)
>
> There is just "one tiny piece" in the design I'm currently unsure about :)
>
> Ceph follows some sort of keep it small and simple, e.g. dont use raid
> controllers, use more boxes and disks, fast network etc.
>
> So from our current design we plan 40Gb Storage and Client LAN.
>
> Would you suggest to connect the OSD nodes redundant to both networks?
> That would end up with 4 * 40Gb ports in each box, two Switches to
> connect to.
>
> I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for "high" io
> pools. (+ currently SSD for journal, but may be until we start, levelDB,
> rocksDB are ready ... ?)
>
> Later some less io bound pools for data archiving/backup. (bigger and
> more Disks per node)
>
> We would also do some Cache tiering for some pools.
>
> From HP, Intel, Supermicron etc reference documentations, they use
> usually non-redundant network connection. (single 10Gb)
>
> I know: redundancy keeps some headaches small, but also adds some more
> complexity and increases the budget. (add network adapters, other
> server, more switches, etc)
>
> So what would you suggest, what are your experiences?
>
> Thanks for any suggestion and feedback . Regards . Götz
> --
> Götz Reinicke
> IT-Koordinator
>
> Tel. +49 7141 969 82 420
> E-Mail goetz.reini...@filmakademie.de
>
> Filmakademie Baden-Württemberg GmbH
> Akademiehof 10
> 71638 Ludwigsburg
> www.filmakademie.de
>
> Eintragung Amtsgericht Stuttgart HRB 205016
>
> Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
> Staatssekretär im Ministerium für Wissenschaft,
> Forschung und Kunst Baden-Württemberg
>
> Geschäftsführer: Prof. Thomas Schadt
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] low power single disk nodes

2015-04-13 Thread Robert LeBlanc
We got one of those too. I think the cabling on the front and the
limited I/O options deterred us; otherwise, I really liked that box
as well.

On Mon, Apr 13, 2015 at 10:34 AM, Nick Fisk  wrote:
> I went for something similar to the Quantas boxes but 4 stacked in 1x 4U box
>
> http://www.supermicro.nl/products/system/4U/F617/SYS-F617H6-FTPT_.cfm
>
> When you do the maths, even something like a banana pi + disk starts costing
> a similar amount and you get so much more for your money in temrs of
> processing power, NIC bandwidth...etc
>
>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Robert LeBlanc
>> Sent: 13 April 2015 17:27
>> To: Jerker Nyberg
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] low power single disk nodes
>>
>> We are getting ready to put the Quantas into production. We looked at the
>> Supermico Atoms (we have 6 of them), the rails were crap (they exploded
>> the first time you pull the server out, and they stick out of the back of
> the
>> cabinet about 8 inches, these boxes are already very deep), we also ran
> out
>> of CPU on these boxes and had limited PCI I/O).
>> They may work fine for really cold data. It may also work fine with XIO
> and
>> Infiniband. The Atoms still had pretty decent performance given these
>> limitations.
>>
>> The Quantas removed some of the issues with NUMA, had much better PCI
>> I/O bandwidth, comes with a 10 Gb NIC on board. The biggest drawback is
>> that 8 drives is on a SAS controller and 4 drives are on a SATA
> controller, plus
>> SATADOM and a free port. So you have to manage two different controller
>> types and speeds (6Gb SAS and 3Gb SATA).
>>
>> I'd say neither is perfect, but we decided on Quanta in the end.
>>
>> On Mon, Apr 13, 2015 at 5:17 AM, Jerker Nyberg 
>> wrote:
>> >
>> > Hello,
>> >
>> > Thanks for all replies! The Banana Pi could work. The built in
>> > SATA-power in Banana Pi can power a 2.5" SATA disk. Cool. (Not 3.5"
>> > SATA since that seem to require 12 V too.)
>> >
>> > I found this post from Vess Bakalov about the same subject:
>> > http://millibit.blogspot.se/2015/01/ceph-pi-adding-osd-and-more-perfor
>> > mance.html
>> >
>> > For PoE I have only found Intel Galileo Gen 2 or RouterBOARD RB450G
>> > which are too slow and/or miss IO-expansion. (But good for
>> > signage/Xibo maybe!)
>> >
>> > I found two boxes from Quanta and SuperMicro with single socket Xeon
>> > or with Intel Atom (Avaton) that might be quite ok. I was only aware
>> > of the dual-Xeons before.
>> >
>> > http://www.quantaqct.com/Product/Servers/Rackmount-
>> Servers/STRATOS-S10
>> > 0-L11SL-p151c77c70c83
>> > http://www.supermicro.nl/products/system/1U/5018/SSG-5018A-
>> AR12L.cfm
>> >
>> > Kind regards,
>> > Jerker Nyberg
>> >
>> >
>> >
>> >
>> > On Thu, 9 Apr 2015, Quentin Hartman wrote:
>> >
>> >> I'm skeptical about how well this would work, but a Banana Pi might
>> >> be a place to start. Like a raspberry pi, but it has a SATA connector:
>> >> http://www.bananapi.org/
>> >>
>> >> On Thu, Apr 9, 2015 at 3:18 AM, Jerker Nyberg 
>> wrote:
>> >>
>> >>>
>> >>> Hello ceph users,
>> >>>
>> >>> Is anyone running any low powered single disk nodes with Ceph now?
>> >>> Calxeda
>> >>> seems to be no more according to Wikipedia. I do not think HP
>> >>> moonshot is what I am looking for - I want stand-alone nodes, not
>> >>> server cartridges integrated into server chassis. And I do not want
>> >>> to be locked to a single vendor.
>> >>>
>> >>> I was playing with Raspberry Pi 2 for signage when I thought of my
>> >>> old experiments with Ceph.
>> >>>
>> >>> I am thinking of for example Odroid-C1 or Odroid-XU3 Lite or maybe
>> >>> something with a low-power Intel x64/x86 processor. Together with
>> >>> one SSD or one low power HDD the node could get all power via PoE
>> >>> (via splitter or integrated into board if such boards exist). PoE
>> >>> provide remote power-on power-off even for consumer grade nodes.
>> >>>
>> >>> The cost for a single low power node should be able to compete with
>> >>> traditional PC-servers price per disk. Ceph take care of redundancy.
>> >>>
>> >>> I think simple custom casing should be good enough - maybe just
>> >>> strap or velcro everything on trays in the rack, at least for the
> nodes with
>> SSD.
>> >>>
>> >>> Kind regards,
>> >>> --
>> >>> Jerker Nyberg, Uppsala, Sweden.
>> >>> ___
>> >>> ceph-users mailing list
>> >>> ceph-users@lists.ceph.com
>> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>>
>> >>
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
___
ceph-us

Re: [ceph-users] low power single disk nodes

2015-04-13 Thread Mark Nelson
We have the single-socket version of this chassis with 4 nodes in our 
test lab.  E3-1240v2 CPU with 10 spinners for OSDs, 2 DC S3700s, a 250GB 
spinner for OS, 10GbE, and a SAS2308 HBA + on-board SATA.  They work 
well but were oddly a little slow for sequential reads from what I 
remember.  Overall not bad though and I think a very reasonable 
solution, especially if you want smaller clusters while maintaining 
similar (actually slightly better) drive density vs the 36 drive 
chassis.  They weren't quite able to saturate a 10GbE link for writes 
(about 700MB/s including OSD->OSD replica writes if I recall). Close 
enough that you won't feel like you are wasting the 10GbE.  Gives them a 
bit of room to grow too as Ceph performance improves.


Mark

On 04/13/2015 11:34 AM, Nick Fisk wrote:

I went for something similar to the Quantas boxes but 4 stacked in 1x 4U box

http://www.supermicro.nl/products/system/4U/F617/SYS-F617H6-FTPT_.cfm

When you do the maths, even something like a banana pi + disk starts costing
a similar amount and you get so much more for your money in temrs of
processing power, NIC bandwidth...etc



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Robert LeBlanc
Sent: 13 April 2015 17:27
To: Jerker Nyberg
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] low power single disk nodes

We are getting ready to put the Quantas into production. We looked at the
Supermico Atoms (we have 6 of them), the rails were crap (they exploded
the first time you pull the server out, and they stick out of the back of

the

cabinet about 8 inches, these boxes are already very deep), we also ran

out

of CPU on these boxes and had limited PCI I/O).
They may work fine for really cold data. It may also work fine with XIO

and

Infiniband. The Atoms still had pretty decent performance given these
limitations.

The Quantas removed some of the issues with NUMA, had much better PCI
I/O bandwidth, comes with a 10 Gb NIC on board. The biggest drawback is
that 8 drives is on a SAS controller and 4 drives are on a SATA

controller, plus

SATADOM and a free port. So you have to manage two different controller
types and speeds (6Gb SAS and 3Gb SATA).

I'd say neither is perfect, but we decided on Quanta in the end.

On Mon, Apr 13, 2015 at 5:17 AM, Jerker Nyberg 
wrote:


Hello,

Thanks for all replies! The Banana Pi could work. The built in
SATA-power in Banana Pi can power a 2.5" SATA disk. Cool. (Not 3.5"
SATA since that seem to require 12 V too.)

I found this post from Vess Bakalov about the same subject:
http://millibit.blogspot.se/2015/01/ceph-pi-adding-osd-and-more-perfor
mance.html

For PoE I have only found Intel Galileo Gen 2 or RouterBOARD RB450G
which are too slow and/or miss IO-expansion. (But good for
signage/Xibo maybe!)

I found two boxes from Quanta and SuperMicro with single socket Xeon
or with Intel Atom (Avaton) that might be quite ok. I was only aware
of the dual-Xeons before.

http://www.quantaqct.com/Product/Servers/Rackmount-

Servers/STRATOS-S10

0-L11SL-p151c77c70c83
http://www.supermicro.nl/products/system/1U/5018/SSG-5018A-

AR12L.cfm


Kind regards,
Jerker Nyberg




On Thu, 9 Apr 2015, Quentin Hartman wrote:


I'm skeptical about how well this would work, but a Banana Pi might
be a place to start. Like a raspberry pi, but it has a SATA connector:
http://www.bananapi.org/

On Thu, Apr 9, 2015 at 3:18 AM, Jerker Nyberg 

wrote:




Hello ceph users,

Is anyone running any low powered single disk nodes with Ceph now?
Calxeda
seems to be no more according to Wikipedia. I do not think HP
moonshot is what I am looking for - I want stand-alone nodes, not
server cartridges integrated into server chassis. And I do not want
to be locked to a single vendor.

I was playing with Raspberry Pi 2 for signage when I thought of my
old experiments with Ceph.

I am thinking of for example Odroid-C1 or Odroid-XU3 Lite or maybe
something with a low-power Intel x64/x86 processor. Together with
one SSD or one low power HDD the node could get all power via PoE
(via splitter or integrated into board if such boards exist). PoE
provide remote power-on power-off even for consumer grade nodes.

The cost for a single low power node should be able to compete with
traditional PC-servers price per disk. Ceph take care of redundancy.

I think simple custom casing should be good enough - maybe just
strap or velcro everything on trays in the rack, at least for the

nodes with

SSD.


Kind regards,
--
Jerker Nyberg, Uppsala, Sweden.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/

Re: [ceph-users] low power single disk nodes

2015-04-13 Thread Nick Fisk
Hi Mark,

We added the 2x PCI drive slot converter, so we managed to squeeze 12 OSDs + 2
journals into each tray.

We did look at the E3-based nodes, but as this was our first adventure into Ceph
we were unsure whether the single CPU would have enough grunt. Going forward,
now that we have some performance data, we might rethink this.

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mark Nelson
> Sent: 13 April 2015 17:53
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] low power single disk nodes
> 
> We have the single-socket version of this chassis with 4 nodes in our test
lab.
> E3-1240v2 CPU with 10 spinners for OSDs, 2 DC S3700s, a 250GB spinner for
> OS, 10GbE, and a SAS2308 HBA + on-board SATA.  They work well but were
> oddly a little slow for sequential reads from what I remember.  Overall
not
> bad though and I think a very reasonable solution, especially if you want
> smaller clusters while maintaining similar (actually slightly better)
drive
> density vs the 36 drive chassis.  They weren't quite able to saturate a
10GbE
> link for writes (about 700MB/s including OSD->OSD replica writes if I
recall).
> Close enough that you won't feel like you are wasting the 10GbE.  Gives
> them a bit of room to grow too as Ceph performance improves.
> 
> Mark
> 
> On 04/13/2015 11:34 AM, Nick Fisk wrote:
> > I went for something similar to the Quantas boxes but 4 stacked in 1x
> > 4U box
> >
> > http://www.supermicro.nl/products/system/4U/F617/SYS-F617H6-
> FTPT_.cfm
> >
> > When you do the maths, even something like a banana pi + disk starts
> > costing a similar amount and you get so much more for your money in
> > temrs of processing power, NIC bandwidth...etc
> >
> >
> >> -Original Message-
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> >> Of Robert LeBlanc
> >> Sent: 13 April 2015 17:27
> >> To: Jerker Nyberg
> >> Cc: ceph-users@lists.ceph.com
> >> Subject: Re: [ceph-users] low power single disk nodes
> >>
> >> We are getting ready to put the Quantas into production. We looked at
> >> the Supermico Atoms (we have 6 of them), the rails were crap (they
> >> exploded the first time you pull the server out, and they stick out
> >> of the back of
> > the
> >> cabinet about 8 inches, these boxes are already very deep), we also
> >> ran
> > out
> >> of CPU on these boxes and had limited PCI I/O).
> >> They may work fine for really cold data. It may also work fine with
> >> XIO
> > and
> >> Infiniband. The Atoms still had pretty decent performance given these
> >> limitations.
> >>
> >> The Quantas removed some of the issues with NUMA, had much better
> PCI
> >> I/O bandwidth, comes with a 10 Gb NIC on board. The biggest drawback
> >> is that 8 drives is on a SAS controller and 4 drives are on a SATA
> > controller, plus
> >> SATADOM and a free port. So you have to manage two different
> >> controller types and speeds (6Gb SAS and 3Gb SATA).
> >>
> >> I'd say neither is perfect, but we decided on Quanta in the end.
> >>
> >> On Mon, Apr 13, 2015 at 5:17 AM, Jerker Nyberg 
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> Thanks for all replies! The Banana Pi could work. The built in
> >>> SATA-power in Banana Pi can power a 2.5" SATA disk. Cool. (Not 3.5"
> >>> SATA since that seem to require 12 V too.)
> >>>
> >>> I found this post from Vess Bakalov about the same subject:
> >>> http://millibit.blogspot.se/2015/01/ceph-pi-adding-osd-and-more-perf
> >>> or
> >>> mance.html
> >>>
> >>> For PoE I have only found Intel Galileo Gen 2 or RouterBOARD RB450G
> >>> which are too slow and/or miss IO-expansion. (But good for
> >>> signage/Xibo maybe!)
> >>>
> >>> I found two boxes from Quanta and SuperMicro with single socket Xeon
> >>> or with Intel Atom (Avaton) that might be quite ok. I was only aware
> >>> of the dual-Xeons before.
> >>>
> >>> http://www.quantaqct.com/Product/Servers/Rackmount-
> >> Servers/STRATOS-S10
> >>> 0-L11SL-p151c77c70c83
> >>> http://www.supermicro.nl/products/system/1U/5018/SSG-5018A-
> >> AR12L.cfm
> >>>
> >>> Kind regards,
> >>> Jerker Nyberg
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, 9 Apr 2015, Quentin Hartman wrote:
> >>>
>  I'm skeptical about how well this would work, but a Banana Pi might
>  be a place to start. Like a raspberry pi, but it has a SATA
connector:
>  http://www.bananapi.org/
> 
>  On Thu, Apr 9, 2015 at 3:18 AM, Jerker Nyberg 
> >> wrote:
> 
> >
> > Hello ceph users,
> >
> > Is anyone running any low powered single disk nodes with Ceph now?
> > Calxeda
> > seems to be no more according to Wikipedia. I do not think HP
> > moonshot is what I am looking for - I want stand-alone nodes, not
> > server cartridges integrated into server chassis. And I do not
> > want to be locked to a single vendor.
> >
> > I was playing with Raspberry Pi 2 for signage when I thought of my
> > old experiments with Ceph.
> >
> > I am think

Re: [ceph-users] rbd: incorrect metadata

2015-04-13 Thread Matthew Monaco
On 04/13/2015 07:51 AM, Jason Dillaman wrote:
> Can you add "debug rbd = 20" to your config, re-run the rbd command, and
> paste a link to the generated client log file?
> 

I set both rbd and rados log to 20

VOL=volume-61241645-e20d-4fe8-9ce3-c161c3d34d55
SNAP="$VOL"@snapshot-d535f359-503a-4eaf-9e71-48aa35b28d0c

rbd -p volumes children "$SNAP"   &> http://hastebin.com/lalajofoqu
rbd -p volumes snap unprotect "$SNAP" &> http://hastebin.com/vibonepeme

There are other errors in there about not being able to look up $SNAP's parent and the
name for pool id 4 (which used to be "images"). What happened was that "images" had way
too many PGs, so I flattened all of the rbds in "volumes", copied over to a new
"images" pool and then deleted the old one.

When I flattened the volumes, the snapshots remained associated with the
original parent (is this a bug/limitation/expected...?). I didn't mind just
deleting the snapshots because I assumed they were borked. However, with this
particular snapshot "$SNAP", the metadata broke somewhere along the way and I
can't unprotect it.

> The rbd_children and rbd_directory objects store state as omap key/values,
> not as actual binary data within the object.  You can use "rados -p rbd
> listomapvals rbd_directory/rbd_children" to see the data within the files.
> 

Ah, thanks for the info. I guess I need to use rmomapkey on rbd_children.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD hard crash on kernel 3.10

2015-04-13 Thread Shawn Edwards
Here's a vmcore, along with log files from Xen's crash dump utility.

https://drive.google.com/file/d/0Bz8b7ZiWX00AeHRhMjNvdVNLdDQ/view?usp=sharing

Let me know if we can help more.

On Fri, Apr 10, 2015 at 1:04 PM Ilya Dryomov  wrote:

> On Fri, Apr 10, 2015 at 8:03 PM, Shawn Edwards 
> wrote:
> > I took the rbd and ceph drivers out of the patched kernel above and
> merged
> > them into Xen's kernel.  Works as well as the old one; still crashes.
> But
> > now I get logs.  From the Xen logs:
> >
> > [   1128.217561]ERR:
> > Assertion failure in rbd_img_obj_callback() at line 2363:
> >
> > rbd_assert(more ^ (which == img_request->obj_request_count));
>
> Ah, that's a long standing bug which we know wasn't properly fixed -
> a tight race in rbd completion callback.  It looks like it doesn't take
> long for you to reproduce it.  Can you try grabbing a vmcore such that
> it can be inspected with crash utility?
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] norecover and nobackfill

2015-04-13 Thread Robert LeBlanc
I'm looking for documentation about what exactly each of these do and
I can't find it. Can someone point me in the right direction?

The names seem too ambiguous to come to any conclusion about what
exactly they do.

Thanks,
Robert
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd: incorrect metadata

2015-04-13 Thread Jason Dillaman
Yes, when you flatten an image, the snapshots will remain associated to the 
original parent.  This is a side-effect from how librbd handles CoW with 
clones.  There is an open RBD feature request to add support for flattening 
snapshots as well.  


-- 

Jason Dillaman 
Red Hat 
dilla...@redhat.com 
http://www.redhat.com 


- Original Message -
From: "Matthew Monaco" 
To: "Jason Dillaman" 
Cc: ceph-users@lists.ceph.com
Sent: Monday, April 13, 2015 2:50:17 PM
Subject: Re: [ceph-users] rbd: incorrect metadata

On 04/13/2015 07:51 AM, Jason Dillaman wrote:
> Can you add "debug rbd = 20" to your config, re-run the rbd command, and
> paste a link to the generated client log file?
> 

I set both rbd and rados log to 20

VOL=volume-61241645-e20d-4fe8-9ce3-c161c3d34d55
SNAP="$VOL"@snapshot-d535f359-503a-4eaf-9e71-48aa35b28d0c

rbd -p volumes children "$SNAP"   &> http://hastebin.com/lalajofoqu
rbd -p volumes snap unprotect "$SNAP" &> http://hastebin.com/vibonepeme

There are other errors in there about not looking up $SNAP's parent and name for
pool id 4 (which usage to be "images"). What happened was "images" had way too
many PGs, so I flattened all of the rbds in "volumes", copied over to a new
"images" pool and then deleted the old one.

When I flattened the volumes, the snapshots remained associated with the
original parent (is this a bug/limitation/expected...?). I didn't mind just
deleting the snapshots bc I assumed they were borked. However with this
particular snapshot "$SNAP", this thing happened where the metadata broke and I
can't unprotected it.

> The rbd_children and rbd_directory objects store state as omap key/values,
> not as actual binary data within the object.  You can use "rados -p rbd
> listomapvals rbd_directory/rbd_children" to see the data within the files.
> 

Ah, thanks for the info. I guess I need to use rmomapkey on rbd_children.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD hard crash on kernel 3.10

2015-04-13 Thread Ilya Dryomov
On Mon, Apr 13, 2015 at 10:18 PM, Shawn Edwards  wrote:
> Here's a vmcore, along with log files from Xen's crash dump utility.
>
> https://drive.google.com/file/d/0Bz8b7ZiWX00AeHRhMjNvdVNLdDQ/view?usp=sharing
>
> Let me know if we can help more.
>
> On Fri, Apr 10, 2015 at 1:04 PM Ilya Dryomov  wrote:
>>
>> On Fri, Apr 10, 2015 at 8:03 PM, Shawn Edwards 
>> wrote:
>> > I took the rbd and ceph drivers out of the patched kernel above and
>> > merged
>> > them into Xen's kernel.  Works as well as the old one; still crashes.
>> > But
>> > now I get logs.  From the Xen logs:
>> >
>> > [   1128.217561]ERR:
>> > Assertion failure in rbd_img_obj_callback() at line 2363:
>> >
>> > rbd_assert(more ^ (which == img_request->obj_request_count));
>>
>> Ah, that's a long standing bug which we know wasn't properly fixed -
>> a tight race in rbd completion callback.  It looks like it doesn't take
>> long for you to reproduce it.  Can you try grabbing a vmcore such that
>> it can be inspected with crash utility?

On a closer inspection, that looks like a simple error handling bug.  The out
of memory splat before the assert sets ->result to -ENOMEM and the logic in
rbd_img_obj_callback() just fails to handle it.  I'll fix it later this week.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] norecover and nobackfill

2015-04-13 Thread Robert LeBlanc
After doing some testing, I'm a bit confused even more.

What I'm trying to achieve is minimal data movement when I have to service
a node to replace a failed drive. Since these nodes don't have hot-swap
bays, I'll need to power down the box to replace the failed drive. I don't
want Ceph to shuffle data until the new drive comes up and is ready.

My thought was to set norecover nobackfill, take down the host, replace the
drive, start the host, remove the old OSD from the cluster, ceph-disk
prepare the new disk then unset norecover nobackfill.
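In command form, that plan is roughly (a sketch; the OSD id and device are
placeholders):

    ceph osd set norecover
    ceph osd set nobackfill
    # ...power the host down, swap the drive, boot it back up...
    ceph osd crush remove osd.<id>
    ceph auth del osd.<id>
    ceph osd rm <id>
    ceph-disk prepare /dev/<new-disk>
    ceph osd unset nobackfill
    ceph osd unset norecover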

However, in my testing with a 4-node cluster (v0.94.0, 10 OSDs each,
replication 3, min_size 2, chooseleaf firstn host), if I take down a host,
I/O becomes blocked even though only one copy should be taken down, which
still satisfies min_size. When I unset norecover, I/O proceeds and
some backfill activity happens. At some point the backfill stops and
everything seems to be "happy" in the degraded state.

I'm really interested to know what is going on with "norecover" as the
cluster seems to break if it is set. Unsetting the "norecover" flag causes
some degraded objects to recover, but not all. Writing to new blocks in an
RBD causes the number of degraded objects to increase, but works just fine
otherwise. Here is an example after taking down one host and removing the
OSDs from the CRUSH map (I'm reformatting all the drives in the host
currently).

# ceph status
cluster 146c4fe8-7c85-46dc-b8b3-69072d658287
 health HEALTH_WARN
1345 pgs backfill
10 pgs backfilling
2016 pgs degraded
661 pgs recovery_wait
2016 pgs stuck degraded
2016 pgs stuck unclean
1356 pgs stuck undersized
1356 pgs undersized
recovery 40642/167785 objects degraded (24.223%)
recovery 31481/167785 objects misplaced (18.763%)
too many PGs per OSD (665 > max 300)
nobackfill flag(s) set
 monmap e5: 3 mons at {nodea=
10.8.6.227:6789/0,nodeb=10.8.6.228:6789/0,nodec=10.8.6.229:6789/0}
election epoch 2576, quorum 0,1,2 nodea,nodeb,nodec
 osdmap e59031: 30 osds: 30 up, 30 in; 1356 remapped pgs
flags nobackfill
  pgmap v4723208: 6656 pgs, 4 pools, 330 GB data, 53235 objects
863 GB used, 55000 GB / 55863 GB avail
40642/167785 objects degraded (24.223%)
31481/167785 objects misplaced (18.763%)
4640 active+clean
1345 active+undersized+degraded+remapped+wait_backfill
 660 active+recovery_wait+degraded
  10 active+undersized+degraded+remapped+backfilling
   1 active+recovery_wait+undersized+degraded+remapped
  client io 1864 kB/s rd, 8853 kB/s wr, 65 op/s

Any help understanding these flags would be very helpful.

Thanks,
Robert

On Mon, Apr 13, 2015 at 1:40 PM, Robert LeBlanc 
wrote:

> I'm looking for documentation about what exactly each of these do and
> I can't find it. Can someone point me in the right direction?
>
> The names seem too ambiguous to come to any conclusion about what
> exactly they do.
>
> Thanks,
> Robert
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd: incorrect metadata

2015-04-13 Thread Matthew Monaco
On 04/13/2015 03:17 PM, Jason Dillaman wrote:
> Yes, when you flatten an image, the snapshots will remain associated to the 
> original parent.  This is a side-effect from how librbd handles CoW with 
> clones.  There is an open RBD feature request to add support for flattening 
> snapshots as well.  
> 
> 

So, I see which key/val pairs to remove. But I'm hesitant because I don't want
to make a mistake. The docs for rados_omap_get_next() say that key is
NULL-terminated. However looking at the hex for listomapvals rbd_children I see:

key: (34 bytes):
0000 : 03 00 00 00 00 00 00 00 0e 00 00 00 63 65 31 62 : ............ce1b
0010 : 33 33 34 35 64 65 66 64 32 30 74 00 00 00 00 00 : 3345defd20t.....
0020 : 00 00                                           : ..

value: (22 bytes) :
0000 : 01 00 00 00 0e 00 00 00 31 63 63 30 35 61 31 33 : ........1cc05a13
0010 : 62 31 61 65 66 32                               : b1aef2

What am I missing?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Purpose of the s3gw.fcgi script?

2015-04-13 Thread Francois Lafont
Hi,

Yehuda Sadeh-Weinraub wrote:

> You're not missing anything. The script was only needed when we used
> the process manager of the fastcgi module, but it has been very long
> since we stopped using it.

Just to be sure, so if I understand well, these parts of the documentation:

1. 
http://docs.ceph.com/docs/master/radosgw/config/#create-a-cgi-wrapper-script
2. 
http://docs.ceph.com/docs/master/radosgw/config/#adjust-cgi-wrapper-script-permission

can be completely skipped. Is it correct?

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Purpose of the s3gw.fcgi script?

2015-04-13 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Francois Lafont" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, April 13, 2015 5:17:47 PM
> Subject: Re: [ceph-users] Purpose of the s3gw.fcgi script?
> 
> Hi,
> 
> Yehuda Sadeh-Weinraub wrote:
> 
> > You're not missing anything. The script was only needed when we used
> > the process manager of the fastcgi module, but it has been very long
> > since we stopped using it.
> 
> Just to be sure, so if I understand well, these parts of the documentation:
> 
> 1.
> 
> http://docs.ceph.com/docs/master/radosgw/config/#create-a-cgi-wrapper-script
> 2.
> 
> http://docs.ceph.com/docs/master/radosgw/config/#adjust-cgi-wrapper-script-permission
> 
> can be completely skipped. Is it correct?
> 

Yes.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] norecover and nobackfill

2015-04-13 Thread Francois Lafont
Hi,

Robert LeBlanc wrote:

> What I'm trying to achieve is minimal data movement when I have to service
> a node to replace a failed drive. [...]

I will perhaps say something stupid but it seems to me that it's the
goal of the "noout" flag, isn't it?

1. ceph osd set noout
2. An old OSD disk fails; there is no rebalancing of data because noout is set,
the cluster is just degraded.
3. You remove from the cluster the OSD daemon which used the old disk.
4. You power off the host, replace the old disk with a new one, and
restart the host.
5. You create a new OSD on the new disk.

With these steps, there will be no data movement except during step 5,
when the data is recreated on the new disk (which is normal and desired).
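A minimal command-line sketch of that flow (osd.12 and /dev/sdd are
placeholders, and the service commands assume Upstart):

    ceph osd set noout
    stop ceph-osd id=12                # the OSD on the failed disk
    ceph osd crush remove osd.12       # note: removing the OSD from CRUSH
    ceph auth del osd.12               # re-weights the host and can itself
    ceph osd rm 12                     # cause some remapping
    # power off, swap the disk, power the host back on
    ceph-disk prepare /dev/sdd
    ceph osd unset noout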

Sorry in advance if there is something I'm missing in your problem.
Regards.


-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket

2015-04-13 Thread Francois Lafont
Hi,

Yehuda Sadeh-Weinraub wrote:

> The 405 in this case usually means that rgw failed to translate the http 
> hostname header into
> a bucket name. Do you have 'rgw dns name' set correctly? 

Ah, I have found it, and indeed it concerned "rgw dns name", as Karan also thought. ;)
But it's a little curious. Here is the explanation:

My s3cmd client uses hostnames of the form below (which all resolve correctly to
the IP address of the radosgw host):

<bucket>.ostore.athome.priv

And in the configuration of my radosgw, I had:

---
[client.radosgw.gw1]
  host= ceph-radosgw1
  rgw dns name= ostore
  ...
---

i.e. just the *short* part of the radosgw's FQDN (its FQDN is ostore.athome.priv).
With Firefly this worked well; I never had a problem with this configuration!
But with Hammer it no longer works (I don't know why). With Hammer, I notice
that I have to put the FQDN in "rgw dns name", not the short name:

---
[client.radosgw.gw1]
  host= ceph-radosgw1
  rgw dns name= ostore.athome.priv
  ...
---

And with this configuration, it works.

Is this normal? Maybe my configuration with the short name (instead of the FQDN)
was never valid and I was simply lucky that it worked so far. Is that the right
conclusion of the story?

In fact, I don't think I have ever fully understood the meaning of the "rgw dns name"
parameter. Can you confirm (or correct) the following:

This parameter is *only* used when an S3 client accesses a bucket with the
virtual-hosted method, i.e. http://<bucket>.<rgw dns name>/. If we don't set
this parameter, such access will not work and an S3 client can only access a
bucket with the path method, i.e. http://<radosgw host>/<bucket>/.

Is it correct?
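
For illustration, here is how the two access styles look from a client, with a
hypothetical bucket name and authentication omitted (these commands only show
the URL forms, they are not from the original post):

  # virtual-hosted style: needs "rgw dns name" plus DNS records for <bucket>.ostore.athome.priv
  curl http://bucket-2.ostore.athome.priv/

  # path style: only needs the gateway host itself to resolve
  curl http://ostore.athome.priv/bucket-2/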

Thanks Yehuda, and thanks to Karan (who, in fact, pointed at the real problem ;)).

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd performance problem on kernel 3.13.6 and 3.18.11

2015-04-13 Thread yangruifeng.09...@h3c.com
Hi all!

I am testing rbd performance with the kernel rbd driver, and when I compare the
results on kernel 3.13.6 with those on 3.18.11, I am thoroughly confused.

Look at the results: IOPS drop to roughly a third or less.


                  3.13.6 IOPS    3.18.11 IOPS
4KB seq read            97169           23714
4KB seq write           10110            3177
4KB rand read            7589            4565
4KB rand write          10497            2307



thanks for any help!
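
The original post does not say which benchmark tool produced these numbers; as a
point of reference, a comparable 4KB test against a mapped image could be run with
fio roughly like this (device path, queue depth and runtime are assumptions):

  # sequential 4KB reads against the kernel rbd device
  fio --name=seqread --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
      --rw=read --bs=4k --iodepth=32 --runtime=60 --time_based --group_reporting

  # random 4KB writes (destructive: only run against a scratch image)
  fio --name=randwrite --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
      --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based --group_reporting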

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to dispatch monitors in a multi-site cluster (ie in 2 datacenters)

2015-04-13 Thread Francois Lafont
Joao Eduardo wrote:

> To be more precise, it's the lowest IP:PORT combination:
> 
> 10.0.1.2:6789 = rank 0
> 10.0.1.2:6790 = rank 1
> 10.0.1.3:6789 = rank 2
> 
> and so on.

OK, so if there are two possible quorums, the quorum containing the
lowest IP:PORT will be chosen. But what happens if, of the two possible
quorums A and B, the monitor with the lowest IP:PORT belongs to both
A and B?
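
As an aside, the ranks actually assigned can be checked on a running cluster
(standard commands, nothing specific to this setup):

  ceph mon dump        # lists each monitor with its rank and address
  ceph quorum_status   # shows which ranks are currently in quorum and the elected leader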

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ERROR: missing keyring, cannot use cephx for authentication

2015-04-13 Thread oyym...@gmail.com
UUID strings in /etc/fstab refer to devices. You can retrieve a device's UUID
with the blkid command, for instance:
# blkid /dev/sda1
/dev/sda1: UUID="1bcb1cbb-abd2-4cfa-bf89-1726ea6cd2fa" TYPE="xfs" 
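
To turn that into the corresponding fstab entry, something along these lines works
(mount point and mount options here are just placeholders for your own values):

  echo "UUID=$(blkid -o value -s UUID /dev/sda1)  /var/lib/ceph/osd/ceph-0  xfs  rw,noatime,inode64  0  0" >> /etc/fstab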



oyym...@gmail.com
 
From: Jesus Chavez (jeschave)
Date: 2015-04-14 07:03
To: oyym...@gmail.com
CC: ceph-users
Subject: Re: [ceph-users] ERROR: missing keyring, cannot use cephx for 
authentication
How did you get the UUIDs without mounting the osds?
Thanks


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433

On Apr 10, 2015, at 11:47 PM, "oyym...@gmail.com"  wrote:

In my cluster, I mount the OSD devices manually via /etc/fstab, for instance:
vi /etc/fstab
# ...
UUID=98c87892-21cd-481b-9619-1a4a65620d79 /var/lib/ceph/osd/ceph-0 xfs 
rw,noatime,inode64,logbsize=256k,delaylog 0 3 
UUID=d56aeb7b-6160-4872-91dc-346b39019081 /var/lib/ceph/osd/ceph-1 xfs 
rw,noatime,inode64,logbsize=256k,delaylog 0 3 
UUID=db9782a6-b661-46b5-b993-a4d96bdc37d8 /var/lib/ceph/osd/ceph-2 xfs 
rw,noatime,inode64,logbsize=256k,delaylog 0 3 
UUID=ec95aa19-127d-410b-86a3-d68ee6baff2e /var/lib/ceph/osd/ceph-3 xfs 
rw,noatime,inode64,logbsize=256k,delaylog 0 3 
UUID=44694045-d9a3-4df4-948c-a77696d22406 /var/lib/ceph/osd/ceph-4 xfs 
rw,noatime,inode64,logbsize=256k,delaylog 0 3

mount -a
mount
/dev/sda12 on /var/lib/ceph/osd/ceph-0 type xfs 
(rw,noatime,seclabel,attr2,inode64,logbsize=256k,noquota) 
/dev/sda13 on /var/lib/ceph/osd/ceph-1 type xfs 
(rw,noatime,seclabel,attr2,inode64,logbsize=256k,noquota) 
/dev/sdb1 on /var/lib/ceph/osd/ceph-2 type xfs 
(rw,noatime,seclabel,attr2,inode64,logbsize=256k,noquota) 
/dev/sdb2 on /var/lib/ceph/osd/ceph-3 type xfs 
(rw,noatime,seclabel,attr2,inode64,logbsize=256k,noquota) 
/dev/sdb3 on /var/lib/ceph/osd/ceph-4 type xfs 
(rw,noatime,seclabel,attr2,inode64,logbsize=256k,noquota)

reboot

ceph osd tree 
# id weight type name up/down reweight 
-1 22.75 root default 
-2 4.55 host storage1 
0 0.91 osd.0 up 1 
1 0.91 osd.1 up 1 
2 0.91 osd.2 up 1 
3 0.91 osd.3 up 1 
4 0.91 osd.4 up 1 
-3 4.55 host storage2 
5 0.91 osd.5 up 1 
6 0.91 osd.6 up 1 
7 0.91 osd.7 up 1 
8 0.91 osd.8 up 1 
9 0.91 osd.9 up 1 
-4 4.55 host lkl-storage3 
10 0.91 osd.10 up 1 
11 0.91 osd.11 up 1 
12 0.91 osd.12 up 1 
13 0.91 osd.13 up 1 
14 0.91 osd.14 up 1 
-5 4.55 host storage4 
15 0.91 osd.15 up 1 
16 0.91 osd.16 up 1 
17 0.91 osd.17 up 1 
18 0.91 osd.18 up 1 
19 0.91 osd.19 up 1 
-6 4.55 host storage5 
20 0.91 osd.20 up 1 
21 0.91 osd.21 up 1 
22 0.91 osd.22 up 1 
23 0.91 osd.23 up 1 
24 0.91 osd.24 up 1



oyym...@gmail.com
 
From: Jesus Chavez (jeschave)
Date: 2015-04-10 22:17
To: oyym...@gmail.com
Subject: Re: [ceph-users] ERROR: missing keyring, cannot use cephx for 
authentication
Hi there, just checking in: were you able to figure out how to fix this problem?

Thanks


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433

On Mar 27, 2015, at 8:05 AM, Jesus Chavez (jeschave)  wrote:

Did you find out how to solve it? It is really weird…

Thanks!


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433

On Mar 25, 2015, at 1:31 AM, oyym...@gmail.com wrote:

Hi Jesus,
I encountered a similar problem.
1. I shut down one of the nodes, but none of the OSDs on that node came back up after the reboot.
2. I ran "service ceph restart" manually and got the same error message:
[root@storage4 ~]# /etc/init.d/ceph start 
=== osd.15 === 
2015-03-23 14:43:32.399811 7fed0fcf4700 -1 monclient(hunting): ERROR: missing 
keyring, cannot use cephx for authentication 
2015-03-23 14:43:32.399814 7fed0fcf4700 0 librados: osd.15 initialization error 
(2) No such file or directory 
Error connecting to cluster: ObjectNotFound 
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.15 
--keyring=/var/lib/ceph/osd/ceph-15/keyring osd crush create-or-move -- 15 0.19 
host=storage4 root=default' 
..
3.  ll /var/lib/ceph/osd/ceph-15/ 
total 0 

all files disappeared in the /va
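
An empty OSD data directory right after a reboot usually just means the data
partition is not mounted; a quick check for the OSD from the error above (osd.15)
might look like:

  mountpoint /var/lib/ceph/osd/ceph-15 || mount /var/lib/ceph/osd/ceph-15   # relies on the fstab entry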