[ceph-users] Question about "osd objectstore = keyvaluestore-dev" setting

2014-05-22 Thread Geert Lindemulder

Hello All

Trying to implement the osd leveldb backend at an existing ceph test 
cluster.

The test cluster was updated from 0.72.1 to 0.80.1. The update was ok.
After the update, the "osd objectstore = keyvaluestore-dev" setting was 
added to ceph.conf.

After restarting an osd it gives the following error:
2014-05-22 12:28:06.805290 7f2e7d9de800 -1 KeyValueStore::mount : stale 
version stamp 3. Please run the KeyValueStore update script before 
starting the OSD, or set keyvaluestore_update_to to 1


How can the "keyvaluestore_update_to" parameter be set or where can i 
find the "KeyValueStore update script"



Thanks,
Geert

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to find the disk partitions attached to a OSD

2014-05-22 Thread John Spray
On Thu, May 22, 2014 at 10:57 AM, Sharmila Govind
 wrote:
> root@cephnode4:/mnt/ceph/osd2# mount |grep ceph
> /dev/sdc on /mnt/ceph/osd3 type ext4 (rw)
> /dev/sdb on /mnt/ceph/osd2 type ext4 (rw)
>
> All the above commands just point out the mount points (/mnt/ceph/osd3);
> the folders were named ceph/osd by me. But if a new user has to get the
> OSD mapping to the mounted devices, it would be difficult if we had named
> the OSD disk folders differently. Any other command which could give the
> mapping would be useful.

It really depends on how you have set up the OSDs.  If you're using
ceph-deploy or ceph-disk to partition and format the drives, they get
a special partition type set which marks them as a Ceph OSD.  On a
system set up that way, you get nice uniform output like this:

# ceph-disk list
/dev/sda :
 /dev/sda1 other, ext4, mounted on /boot
 /dev/sda2 other, LVM2_member
/dev/sdb :
 /dev/sdb1 ceph data, active, cluster ceph, osd.0, journal /dev/sdb2
 /dev/sdb2 ceph journal, for /dev/sdb1
/dev/sdc :
 /dev/sdc1 ceph data, active, cluster ceph, osd.3, journal /dev/sdc2
 /dev/sdc2 ceph journal, for /dev/sdc1
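
If the OSDs were not prepared with ceph-disk and sit on arbitrary mount points, 
a rough way to print the OSD-to-device mapping is to walk the OSD data 
directories. This is only a sketch, assuming the default /var/lib/ceph/osd 
layout (adjust the path if your OSDs are mounted elsewhere, e.g. /mnt/ceph/osdN):

# print "osd directory -> backing block device" for every mounted OSD
for d in /var/lib/ceph/osd/ceph-*; do
    echo "$(basename $d) -> $(findmnt -n -o SOURCE --target $d)"
done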

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] recommendations for erasure coded pools and profile question

2014-05-22 Thread Kenneth Waegeman

Hi,

How can we apply the recommendations of the number of placement groups  
onto erasure-coded pools?


Total PGs = (OSDs * 100) / Replicas

Should we set replicas = 1, or should it be set based on some EC parameters?


Also a question about the EC profiles.
I know you can show them with 'ceph osd erasure-code-profile ls',
get or set parameters with 'ceph osd erasure-code-profile get/set',  
and create a pool with it with 'ceph osd pool create ecpool <pg> <pgp> erasure 
<profile>'. But can you also list which pool has which profile?
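
For reference, those commands look roughly like this (the profile name, 
parameters and PG counts are only illustrative):

ceph osd erasure-code-profile set myprofile k=2 m=1 ruleset-failure-domain=host
ceph osd erasure-code-profile get myprofile
ceph osd pool create ecpool 128 128 erasure myprofile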


Thanks!

Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy mon create-initial

2014-05-22 Thread Alfredo Deza
Why are you using "--overwrite-conf" ? Have you deployed the monitor
and attempted to re-deploy it again?

It could be that if that is the case you are getting into an
inconsistent state and monitors are having issues. For example, you
could have stale keyrings and authentication would not work.

Try with a brand new host or, if you don't care about the data, do
`ceph-deploy purge ceph-node1 && ceph-deploy purgedata ceph-node1 &&
ceph-deploy install ceph-node1`
and then try create-initial again.

create-initial is the best way to do this, and it already gives very
good/useful output!

On Thu, May 22, 2014 at 6:56 AM, Mārtiņš Jakubovičs  wrote:
> Unfortunately upgrade to 0.80 didn't help. Same error.
>
> In monitor node I checked file /etc/ceph/ceph.client.admin.keyring and it
> didn't exist. Should it exist?
> Maybe I can manually perform some actions in monitor node to generate key's?
>
>
> On 2014.05.22. 13:00, Wido den Hollander wrote:
>>
>> On 05/22/2014 11:54 AM, Mārtiņš Jakubovičs wrote:
>>>
>>> Hello,
>>>
>>> Thanks for such fast response.
>>>
>>> Warning still persist:
>>>
>>> http://pastebin.com/QnciHG6v
>>>
>>
>> Hmm, that's weird.
>>
>>> I didn't mention it, but admin and monitoring nodes are Ubuntu 14.04
>>> x64, ceph-deploy 1.4 and ceph 0.79.
>>>
>>
>> Why aren't you trying with Ceph 0.80 Firefly? I'd recommend you try that.
>>
>> The monitor should still have generated the client.admin keyring, but
>> that's something different.
>>
>> Wido
>>
>>> On 2014.05.22. 12:50, Wido den Hollander wrote:

 On 05/22/2014 11:46 AM, Mārtiņš Jakubovičs wrote:
>
> Hello,
>
> I follow this guide
> 
> and stuck in item 4.
>
> Add the initial monitor(s) and gather the keys (new
> inceph-deployv1.1.3).
>
> ceph-deploy mon create-initial
>
> For example:
>
> ceph-deploy mon create-initial
>
> If I perform this action I got warning messages and didn't receive any
> key's.
>
> http://pastebin.com/g21CNPyY
>
> How I can solve this issue?
>

 Sometimes it takes a couple of seconds for the keys to be generated.

 What if you run this afterwards:

 $ ceph-deploy gatherkeys ceph-node1

 Wido

> Thanks.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to find the disk partitions attached to a OSD

2014-05-22 Thread Alfredo Deza
Hopefully I am not late to the party :)

But ceph-deploy recently gained a `osd list` subcommand that does this
plus a bunch of other interesting metadata:

$ ceph-deploy osd list node1
[ceph_deploy.conf][DEBUG ] found configuration file at:
/Users/alfredo/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.2):
/Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy osd list node1
[node1][DEBUG ] connected to host: node1
[node1][DEBUG ] detect platform information from remote host
[node1][DEBUG ] detect machine type
[node1][INFO  ] Running command: sudo ceph --cluster=ceph osd tree --format=json
[node1][DEBUG ] connected to host: node1
[node1][DEBUG ] detect platform information from remote host
[node1][DEBUG ] detect machine type
[node1][INFO  ] Running command: sudo ceph-disk list
[node1][INFO  ] 
[node1][INFO  ] ceph-0
[node1][INFO  ] 
[node1][INFO  ] Path   /var/lib/ceph/osd/ceph-0
[node1][INFO  ] ID 0
[node1][INFO  ] Name   osd.0
[node1][INFO  ] Status up
[node1][INFO  ] Reweight   1.00
[node1][INFO  ] Magic  ceph osd volume v026
[node1][INFO  ] Journal_uuid   214a6865-416b-4c09-b031-a354d4f8bdff
[node1][INFO  ] Active ok
[node1][INFO  ] Device /dev/sdb1
[node1][INFO  ] Whoami 0
[node1][INFO  ] Journal path   /dev/sdb2
[node1][INFO  ] 

On Thu, May 22, 2014 at 8:30 AM, John Spray  wrote:
> On Thu, May 22, 2014 at 10:57 AM, Sharmila Govind
>  wrote:
>> root@cephnode4:/mnt/ceph/osd2# mount |grep ceph
>> /dev/sdc on /mnt/ceph/osd3 type ext4 (rw)
>> /dev/sdb on /mnt/ceph/osd2 type ext4 (rw)
>>
>> All the above commands just point out the mount points (/mnt/ceph/osd3);
>> the folders were named ceph/osd by me. But if a new user has to get the
>> OSD mapping to the mounted devices, it would be difficult if we had named
>> the OSD disk folders differently. Any other command which could give the
>> mapping would be useful.
>
> It really depends on how you have set up the OSDs.  If you're using
> ceph-deploy or ceph-disk to partition and format the drives, they get
> a special partition type set which marks them as a Ceph OSD.  On a
> system set up that way, you get nice uniform output like this:
>
> # ceph-disk list
> /dev/sda :
>  /dev/sda1 other, ext4, mounted on /boot
>  /dev/sda2 other, LVM2_member
> /dev/sdb :
>  /dev/sdb1 ceph data, active, cluster ceph, osd.0, journal /dev/sdb2
>  /dev/sdb2 ceph journal, for /dev/sdb1
> /dev/sdc :
>  /dev/sdc1 ceph data, active, cluster ceph, osd.3, journal /dev/sdc2
>  /dev/sdc2 ceph journal, for /dev/sdc1
>
> John
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw Timeout

2014-05-22 Thread Georg Höllrigl

Hello List,

Using the radosgw works fine, as long as the amount of data doesn't get 
too big.


I have created one bucket that holds many small files, separated into 
different "directories". But whenever I try to acess the bucket, I only 
run into some timeout. The timeout is at around 30 - 100 seconds. This 
is smaller then the Apache timeout of 300 seconds.


I've tried to access the bucket with different clients - one thing is 
s3cmd - which still is able to upload things, but takes rather long 
time, when listing the contents.

Then I've  tried with s3fs-fuse - which throws
ls: reading directory .: Input/output error

Also Cyberduck and S3Browser show similar behavior.

Is there an option to only send back maybe 1000 list entries, like 
Amazon does, so that the client can decide if it wants to list all 
the contents?


Are there any timeout values in radosgw?

Any further thoughts, how I would increase performance on these listings?


Kind Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy mon create-initial

2014-05-22 Thread Mārtiņš Jakubovičs

Yes, indeed, I created a new cluster on host ceph-node3 and everything works great.
I tested with a data purge on ceph-node1, but it didn't work anyway...

It's a pity that I couldn't find where the problem was.

Thanks!

On 2014.05.22. 16:05, Alfredo Deza wrote:

Why are you using "--overwrite-conf" ? Have you deployed the monitor
and attempted to re-deploy it again?

It could be that if that is the case you are getting into an
inconsistent state and monitors are having issues. For example, you
could have stale keyrings and authentication would not work.

Try with a brand new host or, if you don't care about the data, do
`ceph-deploy purge ceph-node1 && ceph-deploy purgedata ceph-node1 &&
ceph-deploy install ceph-node1`
and then try create-initial again.

create-initial is the best way to do this, and it already gives very
good/useful output!

On Thu, May 22, 2014 at 6:56 AM, Mārtiņš Jakubovičs  wrote:

Unfortunately upgrade to 0.80 didn't help. Same error.

In monitor node I checked file /etc/ceph/ceph.client.admin.keyring and it
didn't exist. Should it exist?
Maybe I can manually perform some actions in monitor node to generate key's?


On 2014.05.22. 13:00, Wido den Hollander wrote:

On 05/22/2014 11:54 AM, Mārtiņš Jakubovičs wrote:

Hello,

Thanks for such fast response.

Warning still persist:

http://pastebin.com/QnciHG6v


Hmm, that's weird.


I didn't mention it, but admin and monitoring nodes are Ubuntu 14.04
x64, ceph-deploy 1.4 and ceph 0.79.


Why aren't you trying with Ceph 0.80 Firefly? I'd recommend you try that.

The monitor should still have generated the client.admin keyring, but
that's something different.

Wido


On 2014.05.22. 12:50, Wido den Hollander wrote:

On 05/22/2014 11:46 AM, Mārtiņš Jakubovičs wrote:

Hello,

I follow this guide

and stuck in item 4.

 Add the initial monitor(s) and gather the keys (new
 inceph-deployv1.1.3).

 ceph-deploy mon create-initial

 For example:

 ceph-deploy mon create-initial

If I perform this action I got warning messages and didn't receive any
key's.

http://pastebin.com/g21CNPyY

How I can solve this issue?


Sometimes it takes a couple of seconds for the keys to be generated.

What if you run this afterwards:

$ ceph-deploy gatherkeys ceph-node1

Wido


Thanks.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy mon create-initial

2014-05-22 Thread Sergey Motovilovets
Hello there.

I had the same issue when mon_initial_members (in my case it was 1 node)
resolved to a different IP than mon_host was set to in ceph.conf.
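
A quick way to check for that mismatch (the hostname and config path below are 
just the usual defaults; adjust as needed):

# what ceph.conf declares
grep -E 'mon[ _]initial[ _]members|mon[ _]host' /etc/ceph/ceph.conf

# what the initial member's hostname actually resolves to on that node
getent hosts ceph-node1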


2014-05-22 16:05 GMT+03:00 Alfredo Deza :

> Why are you using "--overwrite-conf" ? Have you deployed the monitor
> and attempted to re-deploy it again?
>
> It could be that if that is the case you are getting into an
> inconsistent state and monitors are having issues. For example, you
> could have stale keyrings and authentication would not work.
>
> Try with a brand new host or, if you don't care about the data, do
> `ceph-deploy purge ceph-node1 && ceph-deploy purgedata ceph-node1 &&
> ceph-deploy install ceph-node1`
> and then try create-initial again.
>
> create-initial is the best way to do this, and it already gives very
> good/useful output!
>
> On Thu, May 22, 2014 at 6:56 AM, Mārtiņš Jakubovičs 
> wrote:
> > Unfortunately upgrade to 0.80 didn't help. Same error.
> >
> > In monitor node I checked file /etc/ceph/ceph.client.admin.keyring and it
> > didn't exist. Should it exist?
> > Maybe I can manually perform some actions in monitor node to generate
> key's?
> >
> >
> > On 2014.05.22. 13:00, Wido den Hollander wrote:
> >>
> >> On 05/22/2014 11:54 AM, Mārtiņš Jakubovičs wrote:
> >>>
> >>> Hello,
> >>>
> >>> Thanks for such fast response.
> >>>
> >>> Warning still persist:
> >>>
> >>> http://pastebin.com/QnciHG6v
> >>>
> >>
> >> Hmm, that's weird.
> >>
> >>> I didn't mention it, but admin and monitoring nodes are Ubuntu 14.04
> >>> x64, ceph-deploy 1.4 and ceph 0.79.
> >>>
> >>
> >> Why aren't you trying with Ceph 0.80 Firefly? I'd recommend you try
> that.
> >>
> >> The monitor should still have generated the client.admin keyring, but
> >> that's something different.
> >>
> >> Wido
> >>
> >>> On 2014.05.22. 12:50, Wido den Hollander wrote:
> 
>  On 05/22/2014 11:46 AM, Mārtiņš Jakubovičs wrote:
> >
> > Hello,
> >
> > I follow this guide
> > <
> http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster>
> > and stuck in item 4.
> >
> > Add the initial monitor(s) and gather the keys (new
> > inceph-deployv1.1.3).
> >
> > ceph-deploy mon create-initial
> >
> > For example:
> >
> > ceph-deploy mon create-initial
> >
> > If I perform this action I got warning messages and didn't receive
> any
> > key's.
> >
> > http://pastebin.com/g21CNPyY
> >
> > How I can solve this issue?
> >
> 
>  Sometimes it takes a couple of seconds for the keys to be generated.
> 
>  What if you run this afterwards:
> 
>  $ ceph-deploy gatherkeys ceph-node1
> 
>  Wido
> 
> > Thanks.
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
> 
> >>>
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> >>
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-22 Thread Yehuda Sadeh
On Thu, May 22, 2014 at 6:16 AM, Georg Höllrigl
 wrote:
> Hello List,
>
> Using the radosgw works fine, as long as the amount of data doesn't get too
> big.
>
> I have created one bucket that holds many small files, separated into
> different "directories". But whenever I try to acess the bucket, I only run
> into some timeout. The timeout is at around 30 - 100 seconds. This is
> smaller then the Apache timeout of 300 seconds.
>
> I've tried to access the bucket with different clients - one thing is s3cmd
> - which still is able to upload things, but takes rather long time, when
> listing the contents.
> Then I've  tried with s3fs-fuse - which throws
> ls: reading directory .: Input/output error
>
> Also Cyberduck and S3Browser show similar behavior.
>
> Is there an option to only send back maybe 1000 list entries, like Amazon
> does, so that the client can decide if it wants to list all the contents?


That's how it works; it doesn't return more than 1000 entries at once.

>
> Are there any timeout values in radosgw?

Are you sure the timeout is in the gateway itself? It could be Apache
that is timing out. We will need to see the Apache access logs for these
operations, plus the radosgw debug and messenger logs (debug rgw = 20, debug ms
= 1), to give a better answer.
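
A sketch of where those debug settings could go; the section name below is 
just the common convention for a radosgw instance, so adjust it to match your 
setup, and restart the gateway afterwards:

[client.radosgw.gateway]
    debug rgw = 20
    debug ms = 1
    log file = /var/log/ceph/radosgw.log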

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Journal SSD durability

2014-05-22 Thread Simon Ironside

Hi,

Just to revisit this one last time . . .

Is the issue only with the SandForce SF-2281 in the Kingston E50? Or are 
all SandForce controllers considered dodgy, including the SF-2582 in the 
Kingston E100 and a few other manufacturers' enterprise SSDs?


Thanks,
Simon.

On 16/05/14 22:30, Carlos M. Perez wrote:

Unfortunately, the Seagate Pro 600 has been discontinued, 
http://comms.seagate.com/servlet/servlet.FileDownload?file=00P300JHLCCEA5.  
The replacement is the 1200 series, which is more than 2x the price but has a SAS 
12gbps interface.  You can still find the 600's out there at around $300/drive. 
 Still a very good price based on specs and backed by the reviews.

The Kingston E100's have a DWPD rating of 11 at the 100/200GB capacity, and similar 
specs to the S3700's (400GB), but more expensive per GB & PBW than the intel 
S3700, so I'd probably stick with the S3700s.

Carlos M. Perez
CMP Consulting Services
305-669-1515


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Simon Ironside
Sent: Friday, May 16, 2014 4:08 PM
To: Christian Balzer
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Journal SSD durability

On 16/05/14 16:34, Christian Balzer wrote:

Thanks for bringing that to my attention.
It looks very good until one gets to the Sandforce controller in the specs.

As in, if you're OK with occasional massive spikes in latency, go for
it (same for the Intel 530).
If you prefer consistent perfomance, avoid.


Cool, that saves me from burning £100 unnecessarily. Thanks.
I've one more suggestion before I just buy an Intel DC S3500 . . .

Seagate 600 Pro 100GB
520/300 Sequential Read/Write
80k/20k Random 4k Read/Write IOPS
Power Loss Protection
280/650TB endurance (two figures, weird, but both high) 5yr warranty and
not a bad price

http://www.seagate.com/www-content/product-content/ssd-fam/600-pro-
ssd/en-gb/docs/600-pro-ssd-data-sheet-ds1790-3-1310gb.pdf

It's not a SandForce controller :) It's a LAMD LM87800.

Cheers,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about "osd objectstore = keyvaluestore-dev" setting

2014-05-22 Thread Gregory Farnum
On Thu, May 22, 2014 at 5:04 AM, Geert Lindemulder  wrote:
> Hello All
>
> Trying to implement the osd leveldb backend at an existing ceph test
> cluster.
> The test cluster was updated from 0.72.1 to 0.80.1. The update was ok.
> After the update, the "osd objectstore = keyvaluestore-dev" setting was
> added to ceph.conf.

Does that mean you tried to switch to the KeyValueStore on one of your
existing OSDs? That isn't going to work; you'll need to create new
ones (or knock out old ones and recreate them with it).

> After restarting an osd it gives the following error:
> 2014-05-22 12:28:06.805290 7f2e7d9de800 -1 KeyValueStore::mount : stale
> version stamp 3. Please run the KeyValueStore update script before starting
> the OSD, or set keyvaluestore_update_to to 1
>
> How can the "keyvaluestore_update_to" parameter be set or where can i find
> the "KeyValueStore update script"

Hmm, it looks like that config value isn't actually plugged in to the
KeyValueStore, so you can't set it with the stock binaries. Maybe
Haomai has an idea?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Feature request: stable naming for external journals

2014-05-22 Thread Scott Laird
I recently created a few OSDs with journals on a partitioned SSD.  Example:

$ ceph-deploy osd prepare v2:sde:sda8

It worked fine at first, but after rebooting, the new OSD failed to start.
 I discovered that the journal drive had been renamed from /dev/sda to
/dev/sdc, so the journal symlink in /var/lib/ceph/osd/ceph-XX no longer
pointed to the correct block device.

I have a couple requests/suggestions:

1.  Make this clearer in the logs.  I've seen at least a couple cases where
a simple "Unable to open journal" message would have saved me a bunch of
time.

2.  Consider some method of generating more stable journal names under the
hood.  I'm using /dev/disk/by-id/... under Ubuntu, but that's probably not
generally portable.  I've been tempted to put a filesystem on my journal
devices, mount it by UUID, and then symlink to a file on the mounted
device.  It's not as fast, but at least it'd have a stable name.
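
One workaround for existing OSDs is to repoint the journal symlink at the 
partition's /dev/disk/by-partuuid alias, which stays stable across reboots. 
A sketch with placeholder names (stop the OSD first; sda8 and ceph-XX are 
illustrative):

# find the stable alias of the journal partition
ls -l /dev/disk/by-partuuid/ | grep sda8

# repoint the OSD's journal symlink at that alias
ln -sf /dev/disk/by-partuuid/<journal-partuuid> /var/lib/ceph/osd/ceph-XX/journal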

(This was caused by adding an SSD and then moving / onto it; during the
reboots needed for migrating /, drive ordering changed several times.  It
probably wouldn't have happened if I'd started with hardware bought new and
dedicated to Ceph)


Scott
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-22 Thread Craig Lewis

On 5/22/14 06:16 , Georg Höllrigl wrote:


I have created one bucket that holds many small files, separated into 
different "directories". But whenever I try to acess the bucket, I 
only run into some timeout. The timeout is at around 30 - 100 seconds. 
This is smaller then the Apache timeout of 300 seconds.


Just so we're all talking about the same things, what does "many small 
files" mean to you?  Also, how are you separating them into 
"directories"?  Are you just giving files in the same "directory" the 
same leading string, like "dir1_subdir1_filename"?


I'm putting about 1M objects, random sizes, in each bucket.  I'm not 
having problems getting individual files, or uploading new ones.  It 
does take a long time for s3cmd to list the contents of the bucket. The 
only time I get timeouts is when my cluster is very unhealthy.


If you're doing a lot more than that, say 10M or 100M objects, then that 
could cause a hot spot on disk.  You might be better off taking your 
"directories", and putting them in their own bucket.



--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] full osd ssd cluster advise : replication 2x or 3x ?

2014-05-22 Thread Alexandre DERUMIER
Hi,

I'm looking to build a full osd ssd cluster, with this config:

6 nodes,

each node with 10 OSD/SSD drives (dual 10Gbit network), with 1 journal + data 
on each OSD.

The SSD drives will be enterprise grade, 

maybe the Intel SC3500 800GB (a well-known SSD)

or the new Samsung SSD PM853T 960GB (I don't have too much info about it for the 
moment, but the price seems a little bit lower than the Intel).


I would like to have some advise on replication level,


Maybe somebody has experience with the Intel SC3500 failure rate?
What are the chances of having 2 failing disks on 2 different nodes at the same 
time (Murphy's law ;)?


I think in case of a disk failure, PGs should replicate quickly over the 10Gbit links.


So the question is:

2x or 3x ?


Regards,

Alexandre
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Expanding pg's of an erasure coded pool

2014-05-22 Thread Gregory Farnum
On Thu, May 22, 2014 at 4:09 AM, Kenneth Waegeman
 wrote:
>
> - Message from Gregory Farnum  -
>Date: Wed, 21 May 2014 15:46:17 -0700
>
>From: Gregory Farnum 
> Subject: Re: [ceph-users] Expanding pg's of an erasure coded pool
>  To: Kenneth Waegeman 
>  Cc: ceph-users 
>
>
>> On Wed, May 21, 2014 at 3:52 AM, Kenneth Waegeman
>>  wrote:
>>>
>>> Thanks! I increased the max processes parameter for all daemons quite a
>>> lot
>>> (until ulimit -u 3802720)
>>>
>>> These are the limits for the daemons now..
>>> [root@ ~]# cat /proc/17006/limits
>>> Limit Soft Limit   Hard Limit   Units
>>> Max cpu time  unlimitedunlimited
>>> seconds
>>> Max file size unlimitedunlimitedbytes
>>> Max data size unlimitedunlimitedbytes
>>> Max stack size10485760 unlimitedbytes
>>> Max core file sizeunlimitedunlimitedbytes
>>> Max resident set  unlimitedunlimitedbytes
>>> Max processes 3802720  3802720
>>> processes
>>> Max open files3276832768files
>>> Max locked memory 6553665536bytes
>>> Max address space unlimitedunlimitedbytes
>>> Max file locksunlimitedunlimitedlocks
>>> Max pending signals   9506895068
>>> signals
>>> Max msgqueue size 819200   819200   bytes
>>> Max nice priority 00
>>> Max realtime priority 00
>>> Max realtime timeout  unlimitedunlimitedus
>>>
>>> But this didn't help. Are there other parameters I should change?
>>
>>
>> Hrm, is it exactly the same stack trace? You might need to bump the
>> open files limit as well, although I'd be surprised. :/
>
>
> I increased the open file limit as test to 128000, still the same results.
>
> Stack trace:



> But I see some things happening on the system while doing this too:
>
>
>
> [root@ ~]# ceph osd pool set ecdata15 pgp_num 4096
> set pool 16 pgp_num to 4096
> [root@ ~]# ceph status
> Traceback (most recent call last):
>   File "/usr/bin/ceph", line 830, in 
> sys.exit(main())
>   File "/usr/bin/ceph", line 590, in main
> conffile=conffile)
>   File "/usr/lib/python2.6/site-packages/rados.py", line 198, in __init__
> librados_path = find_library('rados')
>   File "/usr/lib64/python2.6/ctypes/util.py", line 209, in find_library
> return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
>   File "/usr/lib64/python2.6/ctypes/util.py", line 203, in
> _findSoname_ldconfig
> os.popen('LANG=C /sbin/ldconfig -p 2>/dev/null').read())
> OSError: [Errno 12] Cannot allocate memory
> [root@ ~]# lsof | wc
> -bash: fork: Cannot allocate memory
> [root@ ~]# lsof | wc
>   21801  211209 3230028
> [root@ ~]# ceph status
> ^CError connecting to cluster: InterruptedOrTimeoutError
> ^[[A[root@ ~]# lsof | wc
>2028   17476  190947
>
>
>
> And meanwhile the daemons has then been crashed.
>
> I verified the memory never ran out.

Is there anything in dmesg? It sure looks like the OS thinks it's run
out of memory one way or another.
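
A couple of quick, non-Ceph-specific checks for that (as a sketch):

# recent kernel messages about memory pressure or OOM kills
dmesg | grep -i -E 'out of memory|oom|page allocation failure'

# overall commit state; Committed_AS close to CommitLimit can explain ENOMEM on fork
grep -E 'MemFree|CommitLimit|Committed_AS' /proc/meminfo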
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Expanding pg's of an erasure coded pool

2014-05-22 Thread Henrik Korkuc
On 2014.05.22 19:55, Gregory Farnum wrote:
> On Thu, May 22, 2014 at 4:09 AM, Kenneth Waegeman
>  wrote:
>> - Message from Gregory Farnum  -
>>Date: Wed, 21 May 2014 15:46:17 -0700
>>
>>From: Gregory Farnum 
>> Subject: Re: [ceph-users] Expanding pg's of an erasure coded pool
>>  To: Kenneth Waegeman 
>>  Cc: ceph-users 
>>
>>
>>> On Wed, May 21, 2014 at 3:52 AM, Kenneth Waegeman
>>>  wrote:
 Thanks! I increased the max processes parameter for all daemons quite a
 lot
 (until ulimit -u 3802720)

 These are the limits for the daemons now..
 [root@ ~]# cat /proc/17006/limits
 Limit                     Soft Limit    Hard Limit    Units
 Max cpu time              unlimited     unlimited     seconds
 Max file size             unlimited     unlimited     bytes
 Max data size             unlimited     unlimited     bytes
 Max stack size            10485760      unlimited     bytes
 Max core file size        unlimited     unlimited     bytes
 Max resident set          unlimited     unlimited     bytes
 Max processes             3802720       3802720       processes
 Max open files            32768         32768         files
 Max locked memory         65536         65536         bytes
 Max address space         unlimited     unlimited     bytes
 Max file locks            unlimited     unlimited     locks
 Max pending signals       95068         95068         signals
 Max msgqueue size         819200        819200        bytes
 Max nice priority         0             0
 Max realtime priority     0             0
 Max realtime timeout      unlimited     unlimited     us

 But this didn't help. Are there other parameters I should change?
>>>
>>> Hrm, is it exactly the same stack trace? You might need to bump the
>>> open files limit as well, although I'd be surprised. :/
>>
>> I increased the open file limit as test to 128000, still the same results.
>>
>> Stack trace:
> 
>
>> But I see some things happening on the system while doing this too:
>>
>>
>>
>> [root@ ~]# ceph osd pool set ecdata15 pgp_num 4096
>> set pool 16 pgp_num to 4096
>> [root@ ~]# ceph status
>> Traceback (most recent call last):
>>   File "/usr/bin/ceph", line 830, in 
>> sys.exit(main())
>>   File "/usr/bin/ceph", line 590, in main
>> conffile=conffile)
>>   File "/usr/lib/python2.6/site-packages/rados.py", line 198, in __init__
>> librados_path = find_library('rados')
>>   File "/usr/lib64/python2.6/ctypes/util.py", line 209, in find_library
>> return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
>>   File "/usr/lib64/python2.6/ctypes/util.py", line 203, in
>> _findSoname_ldconfig
>> os.popen('LANG=C /sbin/ldconfig -p 2>/dev/null').read())
>> OSError: [Errno 12] Cannot allocate memory
>> [root@ ~]# lsof | wc
>> -bash: fork: Cannot allocate memory
>> [root@ ~]# lsof | wc
>>   21801  211209 3230028
>> [root@ ~]# ceph status
>> ^CError connecting to cluster: InterruptedOrTimeoutError
>> ^[[A[root@ ~]# lsof | wc
>>2028   17476  190947
>>
>>
>>
>> And meanwhile the daemons has then been crashed.
>>
>> I verified the memory never ran out.
> Is there anything in dmesg? It sure looks like the OS thinks it's run
> out of memory one way or another.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

May it be related to memory fragmentation?
http://dom.as/2014/01/17/on-swapping-and-kernels/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] slow requests

2014-05-22 Thread Győrvári Gábor

Hello,

I got these kinds of logs on two nodes of a 3-node cluster. Both nodes have 2 
OSDs, and only 2 OSDs on two separate nodes are affected, which is why I don't 
understand the situation. There wasn't any extra I/O on the system at the given 
time.


We are using radosgw with the S3 API to store objects in Ceph; average ops are 
around 20-150, with bandwidth usage of 100-2000 KB/sec read and only 
50-1000 KB/sec written.


A few lines from the log (this is OSD.5 and more log entries on OSD.2)
2014-05-22 18:41:03.725702 7ff85cc07700  0 log [WRN] : 1 slow requests, 
1 included below; oldest blocked for > 30.093011 secs
2014-05-22 18:41:03.725721 7ff85cc07700  0 log [WRN] : slow request 
30.093011 seconds old, received at 2014-05-22 18:40:33.628042: 
osd_op(client.7821.0:67251068 
default.4181.1_products/800x600/537e28022fdcc.jpg [cmpxattr 
user.rgw.idtag (22) op 1 mode 1,setxattr user.rgw.idtag (33),call 
refcount.put] 11.fe53a6fb e590) v4 currently waiting for subops from [2]
2014-05-22 18:41:33.730590 7ff85cc07700  0 log [WRN] : 1 slow requests, 
1 included below; oldest blocked for > 60.102500 secs
2014-05-22 18:41:33.730602 7ff85cc07700  0 log [WRN] : slow request 
60.102500 seconds old, received at 2014-05-22 18:40:33.628042: 
osd_op(client.7821.0:67251068 
default.4181.1_products/800x600/537e28022fdcc.jpg [cmpxattr 
user.rgw.idtag (22) op 1 mode 1,setxattr user.rgw.idtag (33),call 
refcount.put] 11.fe53a6fb e590) v4 currently waiting for subops from [2]
2014-05-22 18:41:34.730785 7ff85cc07700  0 log [WRN] : 2 slow requests, 
1 included below; oldest blocked for > 61.102703 secs
2014-05-22 18:41:34.730805 7ff85cc07700  0 log [WRN] : slow request 
30.113226 seconds old, received at 2014-05-22 18:41:04.617519: 
osd_op(client.7821.0:67251426 
default.4181.1_products/800x600/537e28022fdcc.jpg [getxattrs,stat] 
11.fe53a6fb e590) v4 currently waiting for rw locks
2014-05-22 18:42:04.735887 7ff85cc07700  0 log [WRN] : 2 slow requests, 
1 included below; oldest blocked for > 91.107830 secs
2014-05-22 18:42:04.735890 7ff85cc07700  0 log [WRN] : slow request 
60.118353 seconds old, received at 2014-05-22 18:41:04.617519: 
osd_op(client.7821.0:67251426 
default.4181.1_products/800x600/537e28022fdcc.jpg [getxattrs,stat] 
11.fe53a6fb e590) v4 currently waiting for rw locks
2014-05-22 18:42:06.736279 7ff85cc07700  0 log [WRN] : 3 slow requests, 
1 included below; oldest blocked for > 93.108188 secs
2014-05-22 18:42:06.736298 7ff85cc07700  0 log [WRN] : slow request 
30.085101 seconds old, received at 2014-05-22 18:41:36.651129: 
osd_op(client.7821.0:67251757 
default.4181.1_products/800x600/537e28022fdcc.jpg [getxattrs,stat] 
11.fe53a6fb e590) v4 currently waiting for rw locks

... some more on OSD.5
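
When requests sit in "waiting for subops from [2]" or "waiting for rw locks", it 
can help to look at what the implicated OSD is doing at that moment. A sketch, 
with the admin socket path and OSD id as illustrative values:

# in-flight operations on the OSD the subops are waiting for
ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_ops_in_flight

# per-OSD commit/apply latencies, if your version has it
ceph osd perf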

--
Győrvári Gábor - Scr34m
scr...@frontember.hu

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unable to update Swift ACL's on existing containers

2014-05-22 Thread James Page
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hi Folks

I'm seeing some odd behaviour with RADOS Gateway as part of an
OpenStack deployment:

Environment:

Ceph 0.80.1
Ubuntu 14.04
OpenStack Icehouse

Setting ACL's on initial container creation works just fine:

$ swift post -r '.r:*,.rlistings' 61853c5a-e1d4-11e3-b125-2c768a4f56ac

container is created with ".r:*" read acl

but if I try to update the ACL post creation, I get a 401 unauthorized:

$ swift post -r '.r:*,.rlistings' 61853c5a-e1d4-11e3-b125-2c768a4f56ac
Container POST failed:
http://10.98.191.31/swift/v1/61853c5a-e1d4-11e3-b125-2c768a4f56ac 401
Unauthorized   AccessDenied

Any ideas?
- -- 
James Page
Ubuntu and Debian Developer
james.p...@ubuntu.com
jamesp...@debian.org
-BEGIN PGP SIGNATURE-
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJTfnDUAAoJEL/srsug59jDzfsP+gJMS6MJkLiz20i332Rtb0Xi
ushFFVDPO2i/nYIQNJOWP5HZlDw8n8kZ3na7wcw900YRQsUCzmcItLsdIGUeyESN
/aoeViPmJxgAv6hwG284ps8Uu+ZFHFxMsSAjOX5jNYmaoPaHm3AB0laPDasrmCdH
2+e5Oe3tQeff4xqN1RNRY9o9BlSwqwXKqcnqTjaqzCBFN8sQTg8qJ1y7zs2m8VXd
lALyGSy4kfUO1CEGMfECx29Z20IRDdHV5Hao7+5/vlUKT3yqErHnBJPHF003n/se
hzwdnjAERc4OkbpFzhiwTWyQ+nZ/xhqEDcp2SN1y3XqNbmL4RUe5hGf6UbWaExVe
INhxRUelG59hXpBs91XqlpzReSEibFQFupmn+omegnYaMjTFrABzZ2m0VocYyacq
fqJgFcOmXhTsDr+aw7Xt3nYiNldu6yiKLAoqhGIq9m+lm5vhpbdugKgpKFb9vDCv
pf4j+4DQI08rKdvndIxenUvn9jBQDK5YsPUj1vvTdKTOLOCPecId9hg9+yll8cVK
W2hRLjQ0EjjA2WDYDN0CDq67P6horHl1WApyrN+Y8TkndligKTplspS3KTG0nOyd
AGqGT4Aw1Ng4tOjEC1iS6KeFZTntbOFnwZ4WyBE3B8dub06Gu4N0ujHJ7QAn08kX
pwCynLP5iO7W7RR0HbUG
=PZCP
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph deploy on rhel6.5 installs ceph from el6 and fails

2014-05-22 Thread Lukac, Erik
Hi there,

it seems like ceph-deploy (in firefly, but also in 0.72) on RHEL 6.5 wants to 
install packages from the el6 repo, even when the ceph admin server is 
configured to use rhel6.

This is what /etc/yum.repos.d/ceph.repo looks like on my admin node:
[ceph@ceph-mir-dmz-admin ceph-mir-dmz]$ cat /etc/yum.repos.d/ceph.repo
[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-firefly/rhel6/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc


And this is what it looks like when executing ceph-deploy against the ceph servers:

[ceph@ceph-mir-dmz-admin ceph-mir-dmz]$ ceph-deploy install 
ceph-mir-dmz-3-backup.TLD ceph-mir-dmz-1-backup.TLD
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.2): /usr/bin/ceph-deploy install 
ceph-mir-dmz-3-backup.TLD ceph-mir-dmz-1-backup.TLD
[ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster ceph 
hosts ceph-mir-dmz-3-backup.TLD ceph-mir-dmz-1-backup.TLD
[ceph_deploy.install][DEBUG ] Detecting platform for host 
ceph-mir-dmz-3-backup.TLD...
[ceph-mir-dmz-3-backup.TLD][DEBUG ] connected to host: ceph-mir-dmz-3-backup.TLD
[ceph-mir-dmz-3-backup.TLD][DEBUG ] detect platform information from remote host
[ceph-mir-dmz-3-backup.TLD][DEBUG ] detect machine type
[ceph_deploy.install][INFO  ] Distro info: Red Hat Enterprise Linux Server 6.5 
Santiago
[ceph-mir-dmz-3-backup.TLD][INFO  ] installing ceph on ceph-mir-dmz-3-p.TLD
[ceph-mir-dmz-3-backup.TLD][INFO  ] Running command: sudo yum clean all
[ceph-mir-dmz-3-backup.TLD][WARNIN] This system is not registered to Red Hat 
Subscription Management. You can use subscription-manager to register.
[ceph-mir-dmz-3-backup.TLD][DEBUG ] Loaded plugins: product-id, rhnplugin, security, subscription-manager
[ceph-mir-dmz-3-backup.TLD][DEBUG ] Cleaning repos: puppetlabs-x86_64 rhel-x86_64-server-6
[ceph-mir-dmz-3-backup.TLD][DEBUG ]   : rhel-x86_64-server-optional-6 rhel-x86_64-server-supplementary-6
[ceph-mir-dmz-3-backup.TLD][DEBUG ]   : rhn-tools-rhel-x86_64-server-6
[ceph-mir-dmz-3-backup.TLD][DEBUG ] Cleaning up Everything
[ceph-mir-dmz-3-backup.TLD][INFO  ] Running command: sudo yum -y -q install wget
[ceph-mir-dmz-3-backup.TLD][WARNIN] This system is not registered to Red Hat 
Subscription Management. You can use subscription-manager to register.
[ceph-mir-dmz-3-backup.TLD][DEBUG ] Package wget-1.12-1.11.el6_5.x86_64 already 
installed and latest version

[ceph-mir-dmz-3-backup.TLD][INFO  ] Running command: sudo rpm --import 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[ceph-mir-dmz-3-backup.TLD][INFO  ] Running command: sudo rpm -Uvh 
--replacepkgs 
http://ceph.com/rpm-firefly/el6/noarch/ceph-release-1-0.el6.noarch.rpm
[ceph-mir-dmz-3-backup.TLD][DEBUG ] Retrieving 
http://ceph.com/rpm-firefly/el6/noarch/ceph-release-1-0.el6.noarch.rpm
[ceph-mir-dmz-3-backup.TLD][DEBUG ] Preparing...
##
[ceph-mir-dmz-3-backup.TLD][DEBUG ] ceph-release
##
[ceph-mir-dmz-3-backup.TLD][INFO  ] Running command: sudo yum -y -q install ceph
[ceph-mir-dmz-3-backup.TLD][WARNIN] This system is not registered to Red Hat 
Subscription Management. You can use subscription-manager to register.
[ceph-mir-dmz-3-backup.TLD][WARNIN] Error: Package: ceph-0.80.1-0.el6.x86_64 
(Ceph)
[ceph-mir-dmz-3-backup.TLD][WARNIN]Requires: xfsprogs
[ceph-mir-dmz-3-backup.TLD][DEBUG ]  You could try using --skip-broken to work 
around the problem
[ceph-mir-dmz-3-backup.TLD][DEBUG ]  You could try running: rpm -Va --nofiles 
--nodigest
[ceph-mir-dmz-3-backup.TLD][ERROR ] RuntimeError: command returned non-zero 
exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y -q 
install ceph


But: this fails because of the dependencies. xfsprogs is in rhel6 repo, but not 
in el6 ☹

I haven't looked into the ceph-deploy Python code yet and also haven't checked 
whether installing manually would help (I'll do that tomorrow), but maybe 
somebody has ideas on how to use ceph-deploy cleanly.


Thanks in advance

Erik
--
Bayerischer Rundfunk; Rundfunkplatz 1; 80335 München
Telefon: +49 89 590001; E-Mail: i...@br.de; Website: http://www.BR.de
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph deploy on rhel6.5 installs ceph from el6 and fails

2014-05-22 Thread Simon Ironside

On 22/05/14 23:56, Lukac, Erik wrote:

But: this fails because of the dependencies. xfsprogs is in rhel6 repo,
but not in el6 ☹


I hadn't noticed that xfsprogs is included in the ceph repos. I'm using 
the package from the RHEL 6.5 DVD, which is the same version; you'll 
find it in the ScalableFileSystem repo on the install DVD.
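
If the node can't reach that channel via RHN, one hedged workaround is to point 
yum at the ScalableFileSystem directory of the install media and install 
xfsprogs before re-running ceph-deploy; the mount point and paths below are 
illustrative:

mount /dev/cdrom /mnt/rhel-dvd
cat > /etc/yum.repos.d/rhel-sfs-local.repo <<'EOF'
[rhel-sfs-local]
name=RHEL 6.5 ScalableFileSystem (local DVD)
baseurl=file:///mnt/rhel-dvd/ScalableFileSystem
enabled=1
gpgcheck=1
gpgkey=file:///mnt/rhel-dvd/RPM-GPG-KEY-redhat-release
EOF
yum install xfsprogs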


HTH,
Simon.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] collectd / graphite / grafana .. calamari?

2014-05-22 Thread Ricardo Rocha
Hi.

I saw the thread a couple days ago on ceph-users regarding collectd...
and yes, I've been working on something similar for the last few days
:)

https://github.com/rochaporto/collectd-ceph

It has a set of collectd plugins pushing metrics which mostly map what
the ceph commands return. In the setup we have it pushes them to
graphite and the displays rely on grafana (check for a screenshot in
the link above).

As it relies on common building blocks, it's easily extensible and
we'll come up with new dashboards soon - things like plotting osd data
against the metrics from the collectd disk plugin, which we also
deploy.

This email is mostly to share the work, but also to check on Calamari.
I asked Patrick after the Red Hat/Inktank news and have no idea what it
provides, but I'm sure it comes with lots of extra sauce - he
suggested asking on the list.

What's the timeline to have it open sourced? It would be great to have
a look at it, and as there's work from different people in this area
maybe start working together on some fancier monitoring tools.

Regards,
  Ricardo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to update Swift ACL's on existing containers

2014-05-22 Thread Yehuda Sadeh
That looks like a bug; generally the permission checks there are
broken. I opened issue #8428, and pushed a fix on top of the firefly
branch to wip-8428.

Thanks!
Yehuda

On Thu, May 22, 2014 at 2:49 PM, James Page  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Hi Folks
>
> I'm seeing some odd behaviour with RADOS Gateway as part of an
> OpenStack deployment:
>
> Environment:
>
> Ceph 0.80.1
> Ubuntu 14.04
> OpenStack Icehouse
>
> Setting ACL's on initial container creation works just fine:
>
> $ swift post -r '.r:*,.rlistings' 61853c5a-e1d4-11e3-b125-2c768a4f56ac
>
> container is created with ".r:*" read acl
>
> but if I try to update the ACL post creation, I get a 401 unauthorized:
>
> $ swift post -r '.r:*,.rlistings' 61853c5a-e1d4-11e3-b125-2c768a4f56ac
> Container POST failed:
> http://10.98.191.31/swift/v1/61853c5a-e1d4-11e3-b125-2c768a4f56ac 401
> Unauthorized   AccessDenied
>
> Any ideas?
> - --
> James Page
> Ubuntu and Debian Developer
> james.p...@ubuntu.com
> jamesp...@debian.org
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIcBAEBCAAGBQJTfnDUAAoJEL/srsug59jDzfsP+gJMS6MJkLiz20i332Rtb0Xi
> ushFFVDPO2i/nYIQNJOWP5HZlDw8n8kZ3na7wcw900YRQsUCzmcItLsdIGUeyESN
> /aoeViPmJxgAv6hwG284ps8Uu+ZFHFxMsSAjOX5jNYmaoPaHm3AB0laPDasrmCdH
> 2+e5Oe3tQeff4xqN1RNRY9o9BlSwqwXKqcnqTjaqzCBFN8sQTg8qJ1y7zs2m8VXd
> lALyGSy4kfUO1CEGMfECx29Z20IRDdHV5Hao7+5/vlUKT3yqErHnBJPHF003n/se
> hzwdnjAERc4OkbpFzhiwTWyQ+nZ/xhqEDcp2SN1y3XqNbmL4RUe5hGf6UbWaExVe
> INhxRUelG59hXpBs91XqlpzReSEibFQFupmn+omegnYaMjTFrABzZ2m0VocYyacq
> fqJgFcOmXhTsDr+aw7Xt3nYiNldu6yiKLAoqhGIq9m+lm5vhpbdugKgpKFb9vDCv
> pf4j+4DQI08rKdvndIxenUvn9jBQDK5YsPUj1vvTdKTOLOCPecId9hg9+yll8cVK
> W2hRLjQ0EjjA2WDYDN0CDq67P6horHl1WApyrN+Y8TkndligKTplspS3KTG0nOyd
> AGqGT4Aw1Ng4tOjEC1iS6KeFZTntbOFnwZ4WyBE3B8dub06Gu4N0ujHJ7QAn08kX
> pwCynLP5iO7W7RR0HbUG
> =PZCP
> -END PGP SIGNATURE-
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ?

2014-05-22 Thread Christian Balzer

Hello,

On Thu, 22 May 2014 18:00:56 +0200 (CEST) Alexandre DERUMIER wrote:

> Hi,
> 
> I'm looking to build a full osd ssd cluster, with this config:
> 
What is your main goal for that cluster, high IOPS, high sequential writes
or reads?

Remember my "Slow IOPS on RBD..." thread, you probably shouldn't expect
more than 800 write IOPS and 4000 read IOPS per OSD (replication 2).

> 6 nodes,
> 
> each node 10 osd/ ssd drives (dual 10gbit network).  (1journal + datas
> on each osd)
> 
Halving the write speed of the SSD, leaving you with about 2GB/s max write
speed per node.

If you're after good write speeds and with a replication factor of 2 I
would split the network into public and cluster ones.
If you're however after top read speeds, use bonding for the 2 links into
the public network, half of your SSDs per node are able to saturate that.

> ssd drive will be entreprise grade, 
> 
> maybe intel sc3500 800GB (well known ssd)
> 
How much write activity do you expect per OSD (remember that in your
case writes are doubled)? Those drives have a total write capacity of
about 450TB (within 5 years).

> or new Samsung SSD PM853T 960GB (don't have too much info about it for
> the moment, but price seem a little bit lower than intel)
> 

Looking at the specs it seems to have a better endurance (I used
500GB/day, a value that seemed realistic given the 2 numbers they gave),
at least double that of the Intel. 
Alas they only give a 3 year warranty, which makes me wonder.
Also the latencies are significantly higher than the 3500.

> 
> I would like to have some advise on replication level,
> 
> 
> Maybe somebody have experience with intel sc3500 failure rate ?

I doubt many people have managed to wear out SSDs of that vintage in
normal usage yet. And so far none of my dozens of Intel SSDs (including
some ancient X25-M ones) have died.

> How many chance to have 2 failing disks on 2 differents nodes at the
> same time (murphy's law ;).
> 
Indeed.

From my experience and looking at the technology I would postulate that:
1. SSD failures are very rare during their guaranteed endurance
period/data volume. 
2. Once the endurance level is exceeded the probability of SSDs failing
within short periods of each other becomes pretty high.

So if you're monitoring the SSDs (SMART) religiously and take measures to
avoid clustered failures (for example by replacing SSDs early or adding
new nodes gradually, like 1 every 6 months or so) you probably are OK.

Keep in mind however that the larger this cluster grows, the more likely a
double failure scenario becomes. 
Statistics and Murphy are out to get you.

With normal disks I would use a Ceph replication of 3 or when using RAID6
nothing larger than 12 disks per set.
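
For what it's worth, the replication level of an existing pool can be changed 
at any time; a sketch, with the pool name as a placeholder:

ceph osd pool set <pool> size 3
ceph osd pool set <pool> min_size 2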

> 
> I think in case of disk failure, pgs should replicate fast with 10gbits
> links.
> 
That very much also depends on your cluster load and replication settings.

Regards,

Christian

> 
> So the question is:
> 
> 2x or 3x ?
> 
> 
> Regards,
> 
> Alexandre


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Feature request: stable naming for external journals

2014-05-22 Thread Thomas Matysik
I made this mistake originally, too…

 

It’s not real clear in the documentation, but it turns out that if you just 
initialize your journal drives as GPT, but don’t create the partitions, and 
then prepare your OSDs with:

 

$ ceph-deploy osd prepare node1:sde:sda

 

(ie, specify the device, not an individual partition)

 

then it will create a new partition (sized according to the osd_journal_size 
setting under [osd] in ceph.conf), and will link to it by UUID.
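
For completeness, a minimal sketch of that setting (the value is illustrative 
and in MB):

[osd]
    osd journal size = 10240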

 

Regards,

Thomas.

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Scott 
Laird
Sent: Thursday, 22 May 2014 8:19 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Feature request: stable naming for external journals

 

I recently created a few OSDs with journals on a partitioned SSD.  Example:

$ ceph-deploy osd prepare v2:sde:sda8 

It worked fine at first, but after rebooting, the new OSD failed to start.  I 
discovered that the journal drive had been renamed from /dev/sda to /dev/sdc, 
so the journal symlink in /var/lib/ceph/osd/ceph-XX no longer pointed to the 
correct block device.

I have a couple requests/suggestions:

1.  Make this clearer in the logs.  I've seen at least a couple cases where a 
simple "Unable to open journal" message would have saved me a bunch of time.

2.  Consider some method of generating more stable journal names under the 
hood.  I'm using /dev/disk/by-id/... under Ubuntu, but that's probably not 
generally portable.  I've been tempted to put a filesystem on my journal 
devices, mount it by UUID, and then symlink to a file on the mounted device.  
It's not as fast, but at least it'd have a stable name.

(This was caused by adding an SSD and then moving / onto it; during the reboots 
needed for migrating /, drive ordering changed several times.  It probably 
wouldn't have happened if I'd started with hardware bought new and dedicated to 
Ceph)

 

Scott

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ?

2014-05-22 Thread Alexandre DERUMIER
>>What is your main goal for that cluster, high IOPS, high sequential writes
>>or reads?

High IOPS, mostly random (it's an RBD cluster with qemu-kvm guests, around 
1000 VMs, each doing small I/Os).

80% read | 20% write

I don't care about sequential workloads or bandwidth. 


>>Remember my "Slow IOPS on RBD..." thread, you probably shouldn't expect
>>more than 800 write IOPS and 4000 read IOPS per OSD (replication 2).

Yes, that's enough for me! I can't use spinning disks, because they're really 
too slow.
I need around 3iops for around 20TB of storage.

I could even go to cheaper consumer SSDs (like the Crucial M550); I think I 
could reach 2000-4000 IOPS from them.
But I'm afraid of durability/stability.

- Mail original - 

De: "Christian Balzer"  
À: ceph-users@lists.ceph.com 
Envoyé: Vendredi 23 Mai 2014 04:57:51 
Objet: Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ? 


Hello, 

On Thu, 22 May 2014 18:00:56 +0200 (CEST) Alexandre DERUMIER wrote: 

> Hi, 
> 
> I'm looking to build a full osd ssd cluster, with this config: 
> 
What is your main goal for that cluster, high IOPS, high sequential writes 
or reads? 

Remember my "Slow IOPS on RBD..." thread, you probably shouldn't expect 
more than 800 write IOPS and 4000 read IOPS per OSD (replication 2). 

> 6 nodes, 
> 
> each node 10 osd/ ssd drives (dual 10gbit network). (1journal + datas 
> on each osd) 
> 
Halving the write speed of the SSD, leaving you with about 2GB/s max write 
speed per node. 

If you're after good write speeds and with a replication factor of 2 I 
would split the network into public and cluster ones. 
If you're however after top read speeds, use bonding for the 2 links into 
the public network, half of your SSDs per node are able to saturate that. 

> ssd drive will be entreprise grade, 
> 
> maybe intel sc3500 800GB (well known ssd) 
> 
How much write activity do you expect per OSD (remember that you in your 
case writes are doubled)? Those drives have a total write capacity of 
about 450TB (within 5 years). 

> or new Samsung SSD PM853T 960GB (don't have too much info about it for 
> the moment, but price seem a little bit lower than intel) 
> 

Looking at the specs it seems to have a better endurance (I used 
500GB/day, a value that seemed realistic given the 2 numbers they gave), 
at least double that of the Intel. 
Alas they only give a 3 year warranty, which makes me wonder. 
Also the latencies are significantly higher than the 3500. 

> 
> I would like to have some advise on replication level, 
> 
> 
> Maybe somebody have experience with intel sc3500 failure rate ? 

I doubt many people have managed to wear out SSDs of that vintage in 
normal usage yet. And so far none of my dozens of Intel SSDs (including 
some ancient X25-M ones) have died. 

> How many chance to have 2 failing disks on 2 differents nodes at the 
> same time (murphy's law ;). 
> 
Indeed. 

From my experience and looking at the technology I would postulate that: 
1. SSD failures are very rare during their guaranteed endurance 
period/data volume. 
2. Once the endurance level is exceeded the probability of SSDs failing 
within short periods of each other becomes pretty high. 

So if you're monitoring the SSDs (SMART) religiously and take measure to 
avoid clustered failures (for example by replacing SSDs early or adding 
new nodes gradually, like 1 every 6 months or so) you probably are OK. 

Keep in mind however that the larger this cluster grows, the more likely a 
double failure scenario becomes. 
Statistics and Murphy are out to get you. 

With normal disks I would use a Ceph replication of 3 or when using RAID6 
nothing larger than 12 disks per set. 

> 
> I think in case of disk failure, pgs should replicate fast with 10gbits 
> links. 
> 
That very much also depends on your cluster load and replication settings. 

Regards, 

Christian 

> 
> So the question is: 
> 
> 2x or 3x ? 
> 
> 
> Regards, 
> 
> Alexandre 


-- 
Christian Balzer Network/Systems Engineer 
ch...@gol.com Global OnLine Japan/Fusion Communications 
http://www.gol.com/ 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ?

2014-05-22 Thread Christian Balzer

On Fri, 23 May 2014 07:02:15 +0200 (CEST) Alexandre DERUMIER wrote:

> >>What is your main goal for that cluster, high IOPS, high sequential
> >>writes or reads?
> 
> high iops, mostly random. (it's an rbd cluster, with qemu-kvm guest,
> around 1000vms, doing smalls ios each one).
> 
> 80%read|20% write
> 
> I don't care about sequential workload, or bandwith. 
> 
> 
> >>Remember my "Slow IOPS on RBD..." thread, you probably shouldn't expect
> >>more than 800 write IOPS and 4000 read IOPS per OSD (replication 2).
> 
> Yes, that's enough for me !  I can't use spinner disk, because it's
> really too slow. I need around 3iops for around 20TB of storage.
> 
> I could even go to cheaper consummer ssd (like crucial m550), I think I
> could reach 2000-4000 iops from it. But I'm afraid of
> durability|stability.
> 
That's not the only thing you should worry about.
Aside from the higher risk there's total cost of ownership, or cost per
terabyte written ($/TBW).
So while the DC S3700 800GB is about $1800 and the same-sized DC S3500 is
about $850, the 3700 can reliably store 7300TB while the 3500 is only
rated for 450TB.
You do the math. ^.^
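(Doing that math with the figures above: the S3700 comes to roughly
$1800 / 7300 TBW ≈ $0.25 per TB written, the S3500 to roughly
$850 / 450 TBW ≈ $1.89 per TB written, i.e. about 7-8x the cost per terabyte
written despite the lower purchase price.)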

Christian
> - Mail original - 
> 
> De: "Christian Balzer"  
> À: ceph-users@lists.ceph.com 
> Envoyé: Vendredi 23 Mai 2014 04:57:51 
> Objet: Re: [ceph-users] full osd ssd cluster advise : replication 2x or
> 3x ? 
> 
> 
> Hello, 
> 
> On Thu, 22 May 2014 18:00:56 +0200 (CEST) Alexandre DERUMIER wrote: 
> 
> > Hi, 
> > 
> > I'm looking to build a full osd ssd cluster, with this config: 
> > 
> What is your main goal for that cluster, high IOPS, high sequential
> writes or reads? 
> 
> Remember my "Slow IOPS on RBD..." thread, you probably shouldn't expect 
> more than 800 write IOPS and 4000 read IOPS per OSD (replication 2). 
> 
> > 6 nodes, 
> > 
> > each node 10 osd/ ssd drives (dual 10gbit network). (1journal + datas 
> > on each osd) 
> > 
> Halving the write speed of the SSD, leaving you with about 2GB/s max
> write speed per node. 
> 
> If you're after good write speeds and with a replication factor of 2 I 
> would split the network into public and cluster ones. 
> If you're however after top read speeds, use bonding for the 2 links
> into the public network, half of your SSDs per node are able to saturate
> that. 
> 
> > ssd drive will be entreprise grade, 
> > 
> > maybe intel sc3500 800GB (well known ssd) 
> > 
> How much write activity do you expect per OSD (remember that you in your 
> case writes are doubled)? Those drives have a total write capacity of 
> about 450TB (within 5 years). 
> 
> > or new Samsung SSD PM853T 960GB (don't have too much info about it for 
> > the moment, but price seem a little bit lower than intel) 
> > 
> 
> Looking at the specs it seems to have a better endurance (I used 
> 500GB/day, a value that seemed realistic given the 2 numbers they gave), 
> at least double that of the Intel. 
> Alas they only give a 3 year warranty, which makes me wonder. 
> Also the latencies are significantly higher than the 3500. 
> 
> > 
> > I would like to have some advise on replication level, 
> > 
> > 
> > Maybe somebody have experience with intel sc3500 failure rate ? 
> 
> I doubt many people have managed to wear out SSDs of that vintage in 
> normal usage yet. And so far none of my dozens of Intel SSDs (including 
> some ancient X25-M ones) have died. 
> 
> > How many chance to have 2 failing disks on 2 differents nodes at the 
> > same time (murphy's law ;). 
> > 
> Indeed. 
> 
> From my experience and looking at the technology I would postulate that: 
> 1. SSD failures are very rare during their guaranteed endurance 
> period/data volume. 
> 2. Once the endurance level is exceeded the probability of SSDs failing 
> within short periods of each other becomes pretty high. 
> 
> So if you're monitoring the SSDs (SMART) religiously and take measure to 
> avoid clustered failures (for example by replacing SSDs early or adding 
> new nodes gradually, like 1 every 6 months or so) you probably are OK. 
> 
> Keep in mind however that the larger this cluster grows, the more likely
> a double failure scenario becomes. 
> Statistics and Murphy are out to get you. 
> 
> With normal disks I would use a Ceph replication of 3 or when using
> RAID6 nothing larger than 12 disks per set. 
> 
> > 
> > I think in case of disk failure, pgs should replicate fast with
> > 10gbits links. 
> > 
> That very much also depends on your cluster load and replication
> settings. 
> 
> Regards, 
> 
> Christian 
> 
> > 
> > So the question is: 
> > 
> > 2x or 3x ? 
> > 
> > 
> > Regards, 
> > 
> > Alexandre 
> 
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] recommendations for erasure coded pools and profile question

2014-05-22 Thread Loic Dachary
Hi Kenneth,

In the case of erasure coded pools, the "Replicas" should be replaced by "K+M". 

$ ceph osd erasure-code-profile get myprofile
k=2
m=1
plugin=jerasure
technique=reed_sol_van
ruleset-failure-domain=osd

You have K+M=3

I proposed a fix to the documentation https://github.com/ceph/ceph/pull/1856 . 
If I understand correctly, this formula is more a helper than an exact count, 
otherwise it would be embedded into Ceph and the operator would not have to 
think about it.
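As a rough worked example (hypothetical numbers, not your cluster): with 60 OSDs and the k=2/m=1 profile above, the helper gives (60 * 100) / (2 + 1) = 2000 placement groups, which one would normally round up to the next power of two, i.e. pg_num = 2048.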

  ceph --format json osd dump 

will show you which pool is using which profile.
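If you want that without reading through the whole dump, something like this should print each pool next to its profile (untested sketch; it assumes the "pool_name" and "erasure_code_profile" field names that firefly emits, and python 2 on the admin host):

ceph --format json osd dump | python -c '
import json, sys
for p in json.load(sys.stdin)["pools"]:
    print p["pool_name"], p.get("erasure_code_profile") or "-"
'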

Cheers

On 22/05/2014 14:38, Kenneth Waegeman wrote:
> Hi,
> 
> How can we apply the recommendations of the number of placement groups onto 
> erasure-coded pools?
> 
> Total PGs = (OSDs * 100) / Replicas
> 
> Should we set replicas = 1, or should it be set against some EC parameters?
> 
> 
> Also a question about the EC profiles.
> I know you can show them with 'ceph osd erasure-code-profile ls',
> get or set parameters with 'ceph osd erasure-code-profile get/set', and 
> create a pool with it with 'ceph osd pool create ecpool <pg_num> <pgp_num> erasure <profile>'. 
> But can you also list which pool has which profile?
> 
> Thanks!
> 
> Kenneth
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ?

2014-05-22 Thread Alexandre DERUMIER
>>That's not the only thing you should worry about.
>>Aside from the higher risk there's total cost of ownership or Cost per
>>terabyte written ($/TBW).
>>So while the DC S3700 800GB is about $1800 and the same sized DC S3500 at
>>about $850, the 3700 can reliably store 7300TB while the 3500 is only
>>rated for 450TB. 
>>You do the math. ^.^

Yes, I know, I have already done the math. But I'm far from reaching that amount of 
writes. 

The workload is (really) random, so 20% writes of 3iops at 4k blocks = 25MB/s of 
writes, i.e. 2TB each day.
With replication 3x, that is 6TB of writes each day.
60 x 450TBW = 27000TBW / 6TB per day = 4500 days = 12.5 years ;)

With the journal writes it is of course less than that, but I think it should still be 
enough for 5 years.


I'll also test the key-value store backend: with no journal there are fewer writes. 
(Not sure it works fine with rbd for the moment.)

- Mail original - 

De: "Christian Balzer"  
À: ceph-users@lists.ceph.com 
Envoyé: Vendredi 23 Mai 2014 07:29:52 
Objet: Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ? 


On Fri, 23 May 2014 07:02:15 +0200 (CEST) Alexandre DERUMIER wrote: 

> >>What is your main goal for that cluster, high IOPS, high sequential 
> >>writes or reads? 
> 
> high iops, mostly random. (it's an rbd cluster, with qemu-kvm guest, 
> around 1000vms, doing smalls ios each one). 
> 
> 80%read|20% write 
> 
> I don't care about sequential workload, or bandwith. 
> 
> 
> >>Remember my "Slow IOPS on RBD..." thread, you probably shouldn't expect 
> >>more than 800 write IOPS and 4000 read IOPS per OSD (replication 2). 
> 
> Yes, that's enough for me ! I can't use spinner disk, because it's 
> really too slow. I need around 3iops for around 20TB of storage. 
> 
> I could even go to cheaper consummer ssd (like crucial m550), I think I 
> could reach 2000-4000 iops from it. But I'm afraid of 
> durability|stability. 
> 
That's not the only thing you should worry about. 
Aside from the higher risk there's total cost of ownership or Cost per 
terabyte written ($/TBW). 
So while the DC S3700 800GB is about $1800 and the same sized DC S3500 at 
about $850, the 3700 can reliably store 7300TB while the 3500 is only 
rated for 450TB. 
You do the math. ^.^ 

Christian 
> - Mail original - 
> 
> De: "Christian Balzer"  
> À: ceph-users@lists.ceph.com 
> Envoyé: Vendredi 23 Mai 2014 04:57:51 
> Objet: Re: [ceph-users] full osd ssd cluster advise : replication 2x or 
> 3x ? 
> 
> 
> Hello, 
> 
> On Thu, 22 May 2014 18:00:56 +0200 (CEST) Alexandre DERUMIER wrote: 
> 
> > Hi, 
> > 
> > I'm looking to build a full osd ssd cluster, with this config: 
> > 
> What is your main goal for that cluster, high IOPS, high sequential 
> writes or reads? 
> 
> Remember my "Slow IOPS on RBD..." thread, you probably shouldn't expect 
> more than 800 write IOPS and 4000 read IOPS per OSD (replication 2). 
> 
> > 6 nodes, 
> > 
> > each node 10 osd/ ssd drives (dual 10gbit network). (1journal + datas 
> > on each osd) 
> > 
> Halving the write speed of the SSD, leaving you with about 2GB/s max 
> write speed per node. 
> 
> If you're after good write speeds and with a replication factor of 2 I 
> would split the network into public and cluster ones. 
> If you're however after top read speeds, use bonding for the 2 links 
> into the public network, half of your SSDs per node are able to saturate 
> that. 
> 
> > ssd drive will be entreprise grade, 
> > 
> > maybe intel sc3500 800GB (well known ssd) 
> > 
> How much write activity do you expect per OSD (remember that you in your 
> case writes are doubled)? Those drives have a total write capacity of 
> about 450TB (within 5 years). 
> 
> > or new Samsung SSD PM853T 960GB (don't have too much info about it for 
> > the moment, but price seem a little bit lower than intel) 
> > 
> 
> Looking at the specs it seems to have a better endurance (I used 
> 500GB/day, a value that seemed realistic given the 2 numbers they gave), 
> at least double that of the Intel. 
> Alas they only give a 3 year warranty, which makes me wonder. 
> Also the latencies are significantly higher than the 3500. 
> 
> > 
> > I would like to have some advise on replication level, 
> > 
> > 
> > Maybe somebody have experience with intel sc3500 failure rate ? 
> 
> I doubt many people have managed to wear out SSDs of that vintage in 
> normal usage yet. And so far none of my dozens of Intel SSDs (including 
> some ancient X25-M ones) have died. 
> 
> > How many chance to have 2 failing disks on 2 differents nodes at the 
> > same time (murphy's law ;). 
> > 
> Indeed. 
> 
> From my experience and looking at the technology I would postulate that: 
> 1. SSD failures are very rare during their guaranteed endurance 
> period/data volume. 
> 2. Once the endurance level is exceeded the probability of SSDs failing 
> within short periods of each other becomes pretty high. 
> 
> So if you're monitoring the SSDs (SMART) religiously and take measure to 
> avoid clustered failures (for example by 

Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ?

2014-05-22 Thread Alexandre DERUMIER
BTW, the new Samsung PM853T SSD is announced at 665 TBW for 4K random writes:
http://www.tomsitpro.com/articles/samsung-3-bit-nand-enterprise-ssd,1-1922.html

and the price is lower than the Intel S3500 (around 450€ ex VAT).

(The cluster will be built next year, so I have some time to pick the right SSD.)


My main concern is to know whether replication 3x is really needed (mainly because of cost).
But I can wait for lower SSD prices next year and go to 3x if necessary.



- Mail original - 

De: "Alexandre DERUMIER"  
À: "Christian Balzer"  
Cc: ceph-users@lists.ceph.com 
Envoyé: Vendredi 23 Mai 2014 07:59:58 
Objet: Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ? 

>>That's not the only thing you should worry about. 
>>Aside from the higher risk there's total cost of ownership or Cost per 
>>terabyte written ($/TBW). 
>>So while the DC S3700 800GB is about $1800 and the same sized DC S3500 at 
>>about $850, the 3700 can reliably store 7300TB while the 3500 is only 
>>rated for 450TB. 
>>You do the math. ^.^ 

Yes, I known,I have already do the math. But I'm far from reach this amount of 
write. 

workload is (really) random, so 20% of write of 3iops, 4k block = 25MB/s of 
write, 2TB each day. 
with replication 3x, 6TB each day of write. 
60x450TBW = 27000TBW / 6TB = 4500 days = 12,5 years ;) 

so with journal write, it of course less, but I think it should be enough for 5 
years 


I'll also test key-value store, as no more journal, less write. 
(Not sure it works fine with rbd for the moment) 

- Mail original - 

De: "Christian Balzer"  
À: ceph-users@lists.ceph.com 
Envoyé: Vendredi 23 Mai 2014 07:29:52 
Objet: Re: [ceph-users] full osd ssd cluster advise : replication 2x or 3x ? 


On Fri, 23 May 2014 07:02:15 +0200 (CEST) Alexandre DERUMIER wrote: 

> >>What is your main goal for that cluster, high IOPS, high sequential 
> >>writes or reads? 
> 
> high iops, mostly random. (it's an rbd cluster, with qemu-kvm guest, 
> around 1000vms, doing smalls ios each one). 
> 
> 80%read|20% write 
> 
> I don't care about sequential workload, or bandwith. 
> 
> 
> >>Remember my "Slow IOPS on RBD..." thread, you probably shouldn't expect 
> >>more than 800 write IOPS and 4000 read IOPS per OSD (replication 2). 
> 
> Yes, that's enough for me ! I can't use spinner disk, because it's 
> really too slow. I need around 3iops for around 20TB of storage. 
> 
> I could even go to cheaper consummer ssd (like crucial m550), I think I 
> could reach 2000-4000 iops from it. But I'm afraid of 
> durability|stability. 
> 
That's not the only thing you should worry about. 
Aside from the higher risk there's total cost of ownership or Cost per 
terabyte written ($/TBW). 
So while the DC S3700 800GB is about $1800 and the same sized DC S3500 at 
about $850, the 3700 can reliably store 7300TB while the 3500 is only 
rated for 450TB. 
You do the math. ^.^ 

Christian 
> - Mail original - 
> 
> De: "Christian Balzer"  
> À: ceph-users@lists.ceph.com 
> Envoyé: Vendredi 23 Mai 2014 04:57:51 
> Objet: Re: [ceph-users] full osd ssd cluster advise : replication 2x or 
> 3x ? 
> 
> 
> Hello, 
> 
> On Thu, 22 May 2014 18:00:56 +0200 (CEST) Alexandre DERUMIER wrote: 
> 
> > Hi, 
> > 
> > I'm looking to build a full osd ssd cluster, with this config: 
> > 
> What is your main goal for that cluster, high IOPS, high sequential 
> writes or reads? 
> 
> Remember my "Slow IOPS on RBD..." thread, you probably shouldn't expect 
> more than 800 write IOPS and 4000 read IOPS per OSD (replication 2). 
> 
> > 6 nodes, 
> > 
> > each node 10 osd/ ssd drives (dual 10gbit network). (1journal + datas 
> > on each osd) 
> > 
> Halving the write speed of the SSD, leaving you with about 2GB/s max 
> write speed per node. 
> 
> If you're after good write speeds and with a replication factor of 2 I 
> would split the network into public and cluster ones. 
> If you're however after top read speeds, use bonding for the 2 links 
> into the public network, half of your SSDs per node are able to saturate 
> that. 
> 
> > ssd drive will be entreprise grade, 
> > 
> > maybe intel sc3500 800GB (well known ssd) 
> > 
> How much write activity do you expect per OSD (remember that you in your 
> case writes are doubled)? Those drives have a total write capacity of 
> about 450TB (within 5 years). 
> 
> > or new Samsung SSD PM853T 960GB (don't have too much info about it for 
> > the moment, but price seem a little bit lower than intel) 
> > 
> 
> Looking at the specs it seems to have a better endurance (I used 
> 500GB/day, a value that seemed realistic given the 2 numbers they gave), 
> at least double that of the Intel. 
> Alas they only give a 3 year warranty, which makes me wonder. 
> Also the latencies are significantly higher than the 3500. 
> 
> > 
> > I would like to have some advise on replication level, 
> > 
> > 
> > Maybe somebody have experience with intel sc3500 failure rate ? 
> 
> I doubt many pe

Re: [ceph-users] 70+ OSD are DOWN and not coming up

2014-05-22 Thread Craig Lewis

On 5/21/14 21:15 , Sage Weil wrote:

On Wed, 21 May 2014, Craig Lewis wrote:

If you do this over IRC, can you please post a summary to the mailling
list?

I believe I'm having this issue as well.

In the other case, we found that some of the OSDs were behind processing
maps (by several thousand epochs).  The trick here to give them a chance
to catch up is

  ceph osd set noup
  ceph osd set nodown
  ceph osd set noout

and wait for them to stop spinning on the CPU.  You can check which map
each OSD is on with

  ceph daemon osd.NNN status

to see which epoch they are on and compare that to

  ceph osd stat

Once they are within 100 or less epochs,

  ceph osd unset noup

and let them all start up.

We haven't determined whether the original problem was caused by this or
the other way around; we'll see once they are all caught up.

sage


I was seeing the CPU spinning too, so I think it is the same issue. 
Thanks for the explanation!  I've been pulling my hair out for weeks.



I can give you a data point for the "how".  My problems started with a 
kswapd problem on 12.04.04 (kernel 3.5.0-46-generic 
#70~precise1-Ubuntu).  kswapd was consuming 100% CPU, and it was 
blocking the ceph-osd processes.  Once I prevented kswapd from doing 
that, my OSDs couldn't recover.  noout and nodown didn't help; the OSDs 
would suicide and restart.



Upgrading to Ubuntu 14.04 seems to have helped.  The cluster isn't all 
clear yet, but it's getting better.  The cluster is finally healthy 
after 2 weeks of incomplete and stale.  It's still unresponsive, but 
it's making progress.  I am still seeing OSD's consuming 100% CPU, but 
only the OSDs that are actively deep-scrubbing.  Once the deep-scrub 
finishes, the OSD starts behaving again.  They seem to be slowly getting 
better, which matches up with your explanation.



I'll go ahead at set noup.  I don't think it's necessary at this point, 
but it's not going to hurt.


I'm running Emperor, and looks like osd status isn't supported.  Not a 
big deal though.  Deep-scrub has made it through half of the PGs in the 
last 36 hours, so I'll just watch for another day or two. This is a 
slave cluster, so I have that luxury.
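For anyone else following along: on Firefly and later (the admin socket "status" command is not there on Emperor, as noted above) the epoch comparison Sage describes can be scripted per host, assuming the default admin socket paths:

for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo "== $(basename "$sock" .asok)"
    ceph --admin-daemon "$sock" status | grep -E '"(oldest|newest)_map"'
done
ceph osd stat    # the eNNNN value here is the current cluster epoch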








--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd watchers

2014-05-22 Thread James Eckersall
Hi,

Thanks for the suggestion, but unfortunately there are no snapshots for
this image either.

Still confused :(
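For what it's worth, a blunt workaround in this situation is to blacklist the stale client address that listwatchers reports, which forces its watch to be dropped (only do this if you are certain nothing legitimate is still running on that client):

ceph osd blacklist add x.x.x.x:0/2329830975

and then retry the rbd rm after roughly 30 seconds. The address here is the one from the earlier listwatchers output; substitute your own.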


On 22 May 2014 02:54, Mandell Degerness  wrote:

> The times I have seen this message, it has always been because there
> are snapshots of the image that haven't been deleted yet. You can see
> the snapshots with "rbd snap list ".
>
> On Tue, May 20, 2014 at 4:26 AM, James Eckersall
>  wrote:
> > Hi,
> >
> >
> >
> > I'm having some trouble with an rbd image.  I want to rename the current
> rbd
> > and create a new rbd with the same name.
> >
> > I renamed the rbd with rbd mv, but it was still mapped on another node,
> so
> > rbd mv gave me an error that it was unable to remove the source.
> >
> >
> > I then unmapped the original rbd and tried to remove it.
> >
> >
> > Despite it being unmapped, the cluster still believes that there is a
> > watcher on the rbd:
> >
> >
> > root@ceph-admin:~# rados -p poolname listwatchers rbdname.rbd
> >
> > watcher=x.x.x.x:0/2329830975 client.26367 cookie=48
> >
> > root@ceph-admin:~# rbd rm -p poolname rbdname
> >
> > Removing image: 99% complete...failed.2014-05-20 11:50:15.023823
> > 7fa6372e4780 -1 librbd: error removing header: (16) Device or resource
> busy
> >
> >
> > rbd: error: image still has watchers
> >
> > This means the image is still open or the client using it crashed. Try
> again
> > after closing/unmapping it or waiting 30s for the crashed client to
> timeout.
> >
> >
> >
> > I've already rebooted the node that the cluster claims is a watcher and
> > confirmed it definitely is not mapped.
> >
> > I'm 99.9% sure that there are no nodes actually using this rbd.
> >
> >
> > Does anyone know how I can get rid of it?
> >
> >
> > Currently running ceph 0.73-1 on Ubuntu 12.04.
> >
> >
> > Thanks
> >
> >
> > J
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Access denied error for list users

2014-05-22 Thread alain.dechorgnat
GET /admin/metadata/user returns only user ids (no detail)
GET /admin/user  returns 403
GET /admin/user?uid=XXX  returns detail on user XXX

So, if you want the user list with details, you’ll have to call GET 
/admin/metadata/user once to fetch all the uids, and then call GET 
/admin/user?uid=XXX for every user.

I don’t know more for PHP.
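If it helps, the admin-ops request can also be signed by hand from a shell, which at least makes the required Authorization header explicit (untested sketch; ACCESS, SECRET and the gateway host are placeholders, and the signed resource string must match the request path exactly, without the query parameters):

DATE=$(date -u '+%a, %d %b %Y %H:%M:%S GMT')
RESOURCE="/admin/metadata/user"
STRING_TO_SIGN=$(printf 'GET\n\n\n%s\n%s' "$DATE" "$RESOURCE")
SIGNATURE=$(printf '%s' "$STRING_TO_SIGN" | openssl dgst -sha1 -hmac "SECRET" -binary | base64)
curl -s -H "Date: $DATE" -H "Authorization: AWS ACCESS:$SIGNATURE" \
     "http://rgw.example.com${RESOURCE}?format=json"

The same HMAC-SHA1 over "verb, md5, content-type, date, resource" is what the gateway computes in Shanil's log above, so it is also what a PHP implementation would have to reproduce (hash_hmac('sha1', ...) plus base64_encode).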

Alain


De : Shanil S [mailto:xielessha...@gmail.com] 
Envoyé : mercredi 21 mai 2014 13:25
À : DECHORGNAT Alain IMT/OLPS; ceph-users@lists.ceph.com; Yehuda Sadeh; Sage 
Weil; w...@42on.com
Objet : Re: [ceph-users] Access denied error for list users

Hi Alain,
Thanks for your reply.
Do you mean we can't list out all users with complete user details using GET 
/admin/metadata/user or using GET /admin/user?
Yes, i checked http://ceph.com/docs/master/radosgw/s3/php/ and it contains only 
the bucket operations and not any admin operations like list users,create users 
modify user etc. Is there any other php for this ? if so, i can use directly 
that api for admin operations

On Wed, May 21, 2014 at 1:33 PM,  wrote:
There is no detail with GET /admin/metadata/user, only ids.

For PHP, have a look at http://ceph.com/docs/master/radosgw/s3/php/

Alain

De : Shanil S [mailto:xielessha...@gmail.com]
Envoyé : mercredi 21 mai 2014 05:48
À : DECHORGNAT Alain IMT/OLPS
Objet : Re: [ceph-users] Access denied error for list users

Hi Alain,

Thanks..
I used the GET /admin/metadata/user to fetch the user list but it only shows 
the usernames in the list. I would like to show the other details too like 
bucket number,id etc. Can i use the same GET /admin/metadata/user to get all 
these details ? Also, is there any easy way to generate the access token 
authorization header using php ?

On Tue, May 20, 2014 at 7:36 PM,  wrote:
Hi,

GET /admin/user with no parameter doesn't work.

You must use GET /admin/metadata/user to fetch the user list (with metadata 
capabity).

Alain


De : ceph-users [mailto:ceph-users-boun...@lists.ceph.com] De la part de Shanil 
S
Envoyé : mardi 20 mai 2014 07:13
À : ceph-users@lists.ceph.com; w...@42on.com; s...@inktank.com; Yehuda Sadeh
Objet : [ceph-users] Access denied error for list users

Hi,

I am trying to create and list all users by using the functions 
http://ceph.com/docs/master/radosgw/adminops/ and i successfully created the 
access tokens but i am getting an access denied and 403 for listing users 
function. The GET /{admin}/user is used for getting the complete users list, 
but its not listing and getting the error. The user which called this function 
has the complete permission and i am adding the permission of this user

{ "type": "admin",
  "perm": "*"},
    { "type": "buckets",
  "perm": "*"},
    { "type": "caps",
  "perm": "*"},
    { "type": "metadata",
  "perm": "*"},
    { "type": "usage",
  "perm": "*"},
    { "type": "users",
  "perm": "*"}],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
  "max_size_kb": -1,
  "max_objects": -1}}


This is in the log file which executed the list user function

-

GET

application/x-www-form-urlencoded
Tue, 20 May 2014 05:06:57 GMT
/admin/user/
2014-05-20 13:06:59.506233 7f0497fa7700 15 calculated 
digest=Z8FgXRLk+ah5MUThpP9IBJrMnrA=
2014-05-20 13:06:59.506236 7f0497fa7700 15 
auth_sign=Z8FgXRLk+ah5MUThpP9IBJrMnrA=
2014-05-20 13:06:59.506237 7f0497fa7700 15 compare=0
2014-05-20 13:06:59.506240 7f0497fa7700  2 req 98:0.000308::GET 
/admin/user/:get_user_info:reading permissions
2014-05-20 13:06:59.506244 7f0497fa7700  2 req 98:0.000311::GET 
/admin/user/:get_user_info:init op
2014-05-20 13:06:59.506247 7f0497fa7700  2 req 98:0.000314::GET 
/admin/user/:get_user_info:verifying op mask
2014-05-20 13:06:59.506249 7f0497fa7700 20 required_mask= 0 user.op_mask=7
2014-05-20 13:06:59.506251 7f0497fa7700  2 req 98:0.000319::GET 
/admin/user/:get_user_info:verifying op permissions
2014-05-20 13:06:59.506254 7f0497fa7700  2 req 98:0.000322::GET 
/admin/user/:get_user_info:verifying op params
2014-05-20 13:06:59.506257 7f0497fa7700  2 req 98:0.000324::GET 
/admin/user/:get_user_info:executing
2014-05-20 13:06:59.506291 7f0497fa7700  2 req 98:0.000359::GET 
/admin/user/:get_user_info:http status=403
2014-05-20 13:06:59.506294 7f0497fa7700  1 == req done req=0x7f04c800d7f0 
http_status=403 ==
2014-05-20 13:06:59.506302 7f0497fa7700 20 process_request() returned -13

-

Could you please check what is the issue ?
I am using the ceph version : ceph version 0.80.1

_


Re: [ceph-users] Data still in OSD directories after removing

2014-05-22 Thread Olivier Bonvalet

Le mercredi 21 mai 2014 à 18:20 -0700, Josh Durgin a écrit :
> On 05/21/2014 03:03 PM, Olivier Bonvalet wrote:
> > Le mercredi 21 mai 2014 à 08:20 -0700, Sage Weil a écrit :
> >> You're certain that that is the correct prefix for the rbd image you
> >> removed?  Do you see the objects lists when you do 'rados -p rbd ls - |
> >> grep '?
> >
> > I'm pretty sure yes : since I didn't see a lot of space freed by the
> > "rbd snap purge" command, I looked at the RBD prefix before to do the
> > "rbd rm" (it's not the first time I see that problem, but previous time
> > without the RBD prefix I was not able to check).
> >
> > So :
> > - "rados -p sas3copies ls - | grep rb.0.14bfb5a.238e1f29" return nothing
> > at all
> > - # rados stat -p sas3copies rb.0.14bfb5a.238e1f29.0002f026
> >   error stat-ing sas3copies/rb.0.14bfb5a.238e1f29.0002f026: No such
> > file or directory
> > - # rados stat -p sas3copies rb.0.14bfb5a.238e1f29.
> >   error stat-ing sas3copies/rb.0.14bfb5a.238e1f29.: No such
> > file or directory
> > - # ls -al 
> > /var/lib/ceph/osd/ceph-67/current/9.1fe_head/DIR_E/DIR_F/DIR_1/DIR_7/rb.0.14bfb5a.238e1f29.0002f026__a252_E68871FE__9
> > -rw-r--r-- 1 root root 4194304 oct.   8  2013 
> > /var/lib/ceph/osd/ceph-67/current/9.1fe_head/DIR_E/DIR_F/DIR_1/DIR_7/rb.0.14bfb5a.238e1f29.0002f026__a252_E68871FE__9
> >
> >
> >> If the objects really are orphaned, teh way to clean them up is via 'rados
> >> -p rbd rm '.  I'd like to get to the bottom of how they ended
> >> up that way first, though!
> >
> > I suppose the problem came from me, by doing CTRL+C while "rbd snap
> > purge $IMG".
> > "rados rm -p sas3copies rb.0.14bfb5a.238e1f29.0002f026" don't remove
> > thoses files, and just answer with a "No such file or directory".
> 
> Those files are all for snapshots, which are removed by the osds
> asynchronously in a process called 'snap trimming'. There's no
> way to directly remove them via rados.
> 
> Since you stopped 'rbd snap purge' partway through, it may
> have removed the reference to the snapshot before removing
> the snapshot itself.
> 
> You can get a list of snapshot ids for the remaining objects
> via the 'rados listsnaps' command, and use
> rados_ioctx_selfmanaged_snap_remove() (no convenient wrapper
> unfortunately) on each of those snapshot ids to be sure they are all
> scheduled for asynchronous deletion.
> 
> Josh
> 

Great: "rados listsnaps" sees it:
# rados listsnaps -p sas3copies rb.0.14bfb5a.238e1f29.0002f026
rb.0.14bfb5a.238e1f29.0002f026:
cloneid  snaps   size      overlap
41554    35746   4194304   []

So I have to write & compile a wrapper around
rados_ioctx_selfmanaged_snap_remove(), and find a way to obtain a list
of all "orphan" objects?
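For reference, here is a minimal sketch of such a wrapper (untested; it assumes a readable /etc/ceph/ceph.conf plus the client.admin keyring, and takes the pool name and the numeric snap ids reported by "rados listsnaps" on the command line; build with something like gcc snap_rm.c -o snap_rm -lrados):

/* snap_rm.c: schedule selfmanaged snapshots of a pool for removal (trimming). */
#include <stdio.h>
#include <stdlib.h>
#include <rados/librados.h>

int main(int argc, char **argv)
{
    rados_t cluster;
    rados_ioctx_t io;
    int i, ret;

    if (argc < 3) {
        fprintf(stderr, "usage: %s <pool> <snapid> [<snapid> ...]\n", argv[0]);
        return 1;
    }

    ret = rados_create(&cluster, NULL);          /* connect as client.admin */
    if (ret < 0) { fprintf(stderr, "rados_create failed: %d\n", ret); return 1; }
    rados_conf_read_file(cluster, NULL);         /* default ceph.conf search path */
    ret = rados_connect(cluster);
    if (ret < 0) { fprintf(stderr, "rados_connect failed: %d\n", ret); return 1; }

    ret = rados_ioctx_create(cluster, argv[1], &io);
    if (ret < 0) {
        fprintf(stderr, "rados_ioctx_create failed: %d\n", ret);
        rados_shutdown(cluster);
        return 1;
    }

    for (i = 2; i < argc; i++) {
        rados_snap_t snapid = strtoull(argv[i], NULL, 10);
        ret = rados_ioctx_selfmanaged_snap_remove(io, snapid);
        printf("snap %llu: %s (%d)\n", (unsigned long long)snapid,
               ret < 0 ? "failed" : "scheduled for trimming", ret);
    }

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}

The snap ids to pass are the numbers in the "snaps" column of the listsnaps output (35746 in the example above); the space itself is then freed asynchronously by the snap trimmer, as Josh describes.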

I also tried to recreate the object (rados put) and then remove it (rados rm),
but the snapshots are still there.

Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy mon create-initial

2014-05-22 Thread Mārtiņš Jakubovičs

Hello,

I am following this guide 
and am stuck at step 4.


   Add the initial monitor(s) and gather the keys (new
   in ceph-deploy v1.1.3).

   ceph-deploy mon create-initial

   For example:

   ceph-deploy mon create-initial

If I perform this action I get warning messages and don't receive any 
keys.


http://pastebin.com/g21CNPyY

How I can solve this issue?

Thanks.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy mon create-initial

2014-05-22 Thread Wido den Hollander

On 05/22/2014 11:46 AM, Mārtiņš Jakubovičs wrote:

Hello,

I follow this guide

and stuck in item 4.

Add the initial monitor(s) and gather the keys (new
inceph-deployv1.1.3).

ceph-deploy mon create-initial

For example:

ceph-deploy mon create-initial

If I perform this action I got warning messages and didn't receive any
key's.

http://pastebin.com/g21CNPyY

How I can solve this issue?



Sometimes it takes a couple of seconds for the keys to be generated.

What if you run this afterwards:

$ ceph-deploy gatherkeys ceph-node1

Wido


Thanks.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy mon create-initial

2014-05-22 Thread Mārtiņš Jakubovičs

Hello,

Thanks for such a fast response.

The warning still persists:

http://pastebin.com/QnciHG6v

I didn't mention it, but the admin and monitor nodes are Ubuntu 14.04 
x64, with ceph-deploy 1.4 and ceph 0.79.


On 2014.05.22. 12:50, Wido den Hollander wrote:

On 05/22/2014 11:46 AM, Mārtiņš Jakubovičs wrote:

Hello,

I follow this guide

and stuck in item 4.

Add the initial monitor(s) and gather the keys (new
inceph-deployv1.1.3).

ceph-deploy mon create-initial

For example:

ceph-deploy mon create-initial

If I perform this action I got warning messages and didn't receive any
key's.

http://pastebin.com/g21CNPyY

How I can solve this issue?



Sometimes it takes a couple of seconds for the keys to be generated.

What if you run this afterwards:

$ ceph-deploy gatherkeys ceph-node1

Wido


Thanks.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to find the disk partitions attached to a OSD

2014-05-22 Thread Sharmila Govind
root@cephnode4:/mnt/ceph/osd2# ceph-disk list
/dev/sda :
 /dev/sda1 other, ext4, mounted on /
 /dev/sda2 other, ext4, mounted on /boot
 /dev/sda3 other
 /dev/sda4 swap, swap
 /dev/sda5 other, ext4, mounted on /home
 /dev/sda6 other, ext4
 /dev/sda7 other, ext4, mounted on /mnt/Storage
/dev/sdb other, ext4, mounted on /mnt/ceph/osd2
/dev/sdc other, ext4, mounted on /mnt/ceph/osd3




root@cephnode4:/mnt/ceph/osd2# ceph osd tree
# id    weight  type name       up/down reweight
-1  2.2 root default
-2  1.66host cephnode2
0   0.76osd.0   up  1
2   0.9 osd.2   up  1
-3  0.54host cephnode4
1   0.27osd.1   up  1
3   0.27osd.3   up  1


root@cephnode4:/mnt/ceph/osd2# mount | grep ceph
/dev/sdc on /mnt/ceph/osd3 type ext4 (rw)
/dev/sdb on /mnt/ceph/osd2 type ext4 (rw)




All the above commands just pointed out the mount points(/mnt/ceph/osd3),
the folders were named by me as ceph/osd. But, if a new user has to get the
osd mapping to the mounted devices, would be difficult if we named the osd
disk folders differently. Any other command which could give the mapping
would be useful.
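One trick that works regardless of how the directories are named, assuming the OSD data directories are mounted: every OSD data directory contains a small "whoami" file holding the OSD id, so something like this (adjust the grep to match your mount points) prints the id next to the device and mount point:

mount | grep -E '/mnt/ceph|osd' | while read dev _ mnt rest; do
    [ -f "$mnt/whoami" ] && echo "osd.$(cat "$mnt/whoami")  $dev  $mnt"
done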

Thanks for the Info,
Sharmila

On Wed, May 21, 2014 at 9:03 PM, Sage Weil  wrote:

> You might also try
>
>  ceph-disk list
>
> sage
>
>
> On Wed, 21 May 2014, Mike Dawson wrote:
>
> > Looks like you may not have any OSDs properly setup and mounted. It
> should
> > look more like:
> >
> > user@host:~# mount | grep ceph
> > /dev/sdb1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,noatime,inode64)
> > /dev/sdc1 on /var/lib/ceph/osd/ceph-1 type xfs (rw,noatime,inode64)
> > /dev/sdd1 on /var/lib/ceph/osd/ceph-2 type xfs (rw,noatime,inode64)
> >
> > Confirm the OSD in your ceph cluster with:
> >
> > user@host:~# ceph osd tree
> >
> > - Mike
> >
> >
> > On 5/21/2014 11:15 AM, Sharmila Govind wrote:
> > > Hi Mike,
> > > Thanks for your quick response. When I try mount on the storage node
> > > this is what I get:
> > >
> > > *root@cephnode4:~# mount*
> > > */dev/sda1 on / type ext4 (rw,errors=remount-ro)*
> > > *proc on /proc type proc (rw,noexec,nosuid,nodev)*
> > > *sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)*
> > > *none on /sys/fs/fuse/connections type fusectl (rw)*
> > > *none on /sys/kernel/debug type debugfs (rw)*
> > > *none on /sys/kernel/security type securityfs (rw)*
> > > *udev on /dev type devtmpfs (rw,mode=0755)*
> > > *devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)*
> > > *tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)*
> > > *none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)*
> > > *none on /run/shm type tmpfs (rw,nosuid,nodev)*
> > > */dev/sdb on /mnt/CephStorage1 type ext4 (rw)*
> > > */dev/sdc on /mnt/CephStorage2 type ext4 (rw)*
> > > */dev/sda7 on /mnt/Storage type ext4 (rw)*
> > > */dev/sda2 on /boot type ext4 (rw)*
> > > */dev/sda5 on /home type ext4 (rw)*
> > > */dev/sda6 on /mnt/CephStorage type ext4 (rw)*
> > >
> > >
> > >
> > > Is there anything wrong in the setup I have? I dont have any 'ceph'
> > > related mounts.
> > >
> > > Thanks,
> > > Sharmila
> > >
> > >
> > >
> > > On Wed, May 21, 2014 at 8:34 PM, Mike Dawson  > > > wrote:
> > >
> > > Perhaps:
> > >
> > > # mount | grep ceph
> > >
> > > - Mike Dawson
> > >
> > >
> > >
> > > On 5/21/2014 11:00 AM, Sharmila Govind wrote:
> > >
> > > Hi,
> > >I am new to Ceph. I have a storage node with 2 OSDs. Iam
> > > trying to
> > > figure out to which pyhsical device/partition each of the OSDs
> are
> > > attached to. Is there are command that can be executed in the
> > > storage
> > > node to find out the same.
> > >
> > > Thanks in Advance,
> > > Sharmila
> > >
> > >
> > > _
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com 
> > > http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com
> > > 
> > >
> > >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy mon create-initial

2014-05-22 Thread Wido den Hollander

On 05/22/2014 11:54 AM, Mārtiņš Jakubovičs wrote:

Hello,

Thanks for such fast response.

Warning still persist:

http://pastebin.com/QnciHG6v



Hmm, that's weird.


I didn't mention it, but admin and monitoring nodes are Ubuntu 14.04
x64, ceph-deploy 1.4 and ceph 0.79.



Why aren't you trying with Ceph 0.80 Firefly? I'd recommend you try that.

The monitor should still have generated the client.admin keyring, but 
that's something different.


Wido


On 2014.05.22. 12:50, Wido den Hollander wrote:

On 05/22/2014 11:46 AM, Mārtiņš Jakubovičs wrote:

Hello,

I follow this guide

and stuck in item 4.

Add the initial monitor(s) and gather the keys (new
inceph-deployv1.1.3).

ceph-deploy mon create-initial

For example:

ceph-deploy mon create-initial

If I perform this action I got warning messages and didn't receive any
key's.

http://pastebin.com/g21CNPyY

How I can solve this issue?



Sometimes it takes a couple of seconds for the keys to be generated.

What if you run this afterwards:

$ ceph-deploy gatherkeys ceph-node1

Wido


Thanks.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy mon create-initial

2014-05-22 Thread Mārtiņš Jakubovičs

Thanks,

I will try upgrading to 0.80.

On 2014.05.22. 13:00, Wido den Hollander wrote:

On 05/22/2014 11:54 AM, Mārtiņš Jakubovičs wrote:

Hello,

Thanks for such fast response.

Warning still persist:

http://pastebin.com/QnciHG6v



Hmm, that's weird.


I didn't mention it, but admin and monitoring nodes are Ubuntu 14.04
x64, ceph-deploy 1.4 and ceph 0.79.



Why aren't you trying with Ceph 0.80 Firefly? I'd recommend you try that.

The monitor should still have generated the client.admin keyring, but 
that's something different.


Wido


On 2014.05.22. 12:50, Wido den Hollander wrote:

On 05/22/2014 11:46 AM, Mārtiņš Jakubovičs wrote:

Hello,

I follow this guide
 


and stuck in item 4.

Add the initial monitor(s) and gather the keys (new
inceph-deployv1.1.3).

ceph-deploy mon create-initial

For example:

ceph-deploy mon create-initial

If I perform this action I got warning messages and didn't receive any
key's.

http://pastebin.com/g21CNPyY

How I can solve this issue?



Sometimes it takes a couple of seconds for the keys to be generated.

What if you run this afterwards:

$ ceph-deploy gatherkeys ceph-node1

Wido


Thanks.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy mon create-initial

2014-05-22 Thread Mārtiņš Jakubovičs

Unfortunately the upgrade to 0.80 didn't help. Same error.

On the monitor node I checked for the file /etc/ceph/ceph.client.admin.keyring and 
it does not exist. Should it?

Maybe I can manually perform some actions on the monitor node to generate the keys?
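For what it's worth, two things can be checked directly on the monitor node while this is being debugged (paths assume the default cluster name "ceph" and that the mon id is the short hostname):

# does the monitor actually have quorum?
ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status

# this is the helper the init scripts normally run to create client.admin and
# the bootstrap keys once quorum is reached; it blocks until that happens
ceph-create-keys -i $(hostname -s)

If mon_status never shows the monitor in quorum, gatherkeys will keep failing no matter how often it is retried.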

On 2014.05.22. 13:00, Wido den Hollander wrote:

On 05/22/2014 11:54 AM, Mārtiņš Jakubovičs wrote:

Hello,

Thanks for such fast response.

Warning still persist:

http://pastebin.com/QnciHG6v



Hmm, that's weird.


I didn't mention it, but admin and monitoring nodes are Ubuntu 14.04
x64, ceph-deploy 1.4 and ceph 0.79.



Why aren't you trying with Ceph 0.80 Firefly? I'd recommend you try that.

The monitor should still have generated the client.admin keyring, but 
that's something different.


Wido


On 2014.05.22. 12:50, Wido den Hollander wrote:

On 05/22/2014 11:46 AM, Mārtiņš Jakubovičs wrote:

Hello,

I follow this guide
 


and stuck in item 4.

Add the initial monitor(s) and gather the keys (new
inceph-deployv1.1.3).

ceph-deploy mon create-initial

For example:

ceph-deploy mon create-initial

If I perform this action I got warning messages and didn't receive any
key's.

http://pastebin.com/g21CNPyY

How I can solve this issue?



Sometimes it takes a couple of seconds for the keys to be generated.

What if you run this afterwards:

$ ceph-deploy gatherkeys ceph-node1

Wido


Thanks.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Expanding pg's of an erasure coded pool

2014-05-22 Thread Kenneth Waegeman


- Message from Gregory Farnum  -
   Date: Wed, 21 May 2014 15:46:17 -0700
   From: Gregory Farnum 
Subject: Re: [ceph-users] Expanding pg's of an erasure coded pool
 To: Kenneth Waegeman 
 Cc: ceph-users 



On Wed, May 21, 2014 at 3:52 AM, Kenneth Waegeman
 wrote:

Thanks! I increased the max processes parameter for all daemons quite a lot
(until ulimit -u 3802720)

These are the limits for the daemons now..
[root@ ~]# cat /proc/17006/limits
Limit                     Soft Limit  Hard Limit  Units
Max cpu time              unlimited   unlimited   seconds
Max file size             unlimited   unlimited   bytes
Max data size             unlimited   unlimited   bytes
Max stack size            10485760    unlimited   bytes
Max core file size        unlimited   unlimited   bytes
Max resident set          unlimited   unlimited   bytes
Max processes             3802720     3802720     processes
Max open files            32768       32768       files
Max locked memory         65536       65536       bytes
Max address space         unlimited   unlimited   bytes
Max file locks            unlimited   unlimited   locks
Max pending signals       95068       95068       signals
Max msgqueue size         819200      819200      bytes
Max nice priority         0           0
Max realtime priority     0           0
Max realtime timeout      unlimited   unlimited   us

But this didn't help. Are there other parameters I should change?


Hrm, is it exactly the same stack trace? You might need to bump the
open files limit as well, although I'd be surprised. :/


I increased the open file limit to 128000 as a test; still the same results.

Stack trace:

   -16> 2014-05-22 11:10:05.262456 7f3bfcaee700  5 osd.398 pg_epoch:  
6327 pg[16.8f5s14( empty local-les=6326 n=0 ec=6293 les/c 6326/6326  
6293/6310/6293)  
[255,52,147,15,402,280,129,321,125,180,301,85,22,340,398] r=14  
lpr=6310 pi=6293-6309/1 crt=0'0 active] exit  
Started/ReplicaActive/RepNotRecovering 52.314752 4 0.000408
   -15> 2014-05-22 11:10:05.262649 7f3bfcaee700  5 osd.398 pg_epoch:  
6327 pg[16.8f5s14( empty local-les=6326 n=0 ec=6293 les/c 6326/6326  
6293/6310/6293)  
[255,52,147,15,402,280,129,321,125,180,301,85,22,340,398] r=14  
lpr=6310 pi=6293-6309/1 crt=0'0 active] exit Started/ReplicaActive  
52.315020 0 0.00
   -14> 2014-05-22 11:10:05.262667 7f3bfcaee700  5 osd.398 pg_epoch:  
6327 pg[16.8f5s14( empty local-les=6326 n=0 ec=6293 les/c 6326/6326  
6293/6310/6293)  
[255,52,147,15,402,280,129,321,125,180,301,85,22,340,398] r=14  
lpr=6310 pi=6293-6309/1 crt=0'0 active] exit Started 55.181842 0  
0.00
   -13> 2014-05-22 11:10:05.262681 7f3bfcaee700  5 osd.398 pg_epoch:  
6327 pg[16.8f5s14( empty local-les=6326 n=0 ec=6293 les/c 6326/6326  
6293/6310/6293)  
[255,52,147,15,402,280,129,321,125,180,301,85,22,340,398] r=14  
lpr=6310 pi=6293-6309/1 crt=0'0 active] enter Reset
   -12> 2014-05-22 11:10:05.262797 7f3bfcaee700  5 osd.398 pg_epoch:  
6327 pg[16.8f5s14( empty local-les=6326 n=0 ec=6293 les/c 6326/6326  
6327/6327/6327)  
[200,176,57,135,107,426,234,409,264,280,338,381,317,220,79] r=-1  
lpr=6327 pi=6293-6326/2 crt=0'0 inactive NOTIFY] exit Reset 0.000117 1  
0.000338
   -11> 2014-05-22 11:10:05.262956 7f3bfcaee700  5 osd.398 pg_epoch:  
6327 pg[16.8f5s14( empty local-les=6326 n=0 ec=6293 les/c 6326/6326  
6327/6327/6327)  
[200,176,57,135,107,426,234,409,264,280,338,381,317,220,79] r=-1  
lpr=6327 pi=6293-6326/2 crt=0'0 inactive NOTIFY] enter Started
   -10> 2014-05-22 11:10:05.262983 7f3bfcaee700  5 osd.398 pg_epoch:  
6327 pg[16.8f5s14( empty local-les=6326 n=0 ec=6293 les/c 6326/6326  
6327/6327/6327)  
[200,176,57,135,107,426,234,409,264,280,338,381,317,220,79] r=-1  
lpr=6327 pi=6293-6326/2 crt=0'0 inactive NOTIFY] enter Start
-9> 2014-05-22 11:10:05.262994 7f3bfcaee700  1 osd.398 pg_epoch:  
6327 pg[16.8f5s14( empty local-les=6326 n=0 ec=6293 les/c 6326/6326  
6327/6327/6327)  
[200,176,57,135,107,426,234,409,264,280,338,381,317,220,79] r=-1  
lpr=6327 pi=6293-6326/2 crt=0'0 inactive NOTIFY] state:  
transitioning to Stray
-8> 2014-05-22 11:10:05.263151 7f3bfcaee700  5 osd.398 pg_epoch:  
6327 pg[16.8f5s14( empty local-les=6326 n=0 ec=6293 les/c 6326/6326  
6327/6327/6327)  
[200,176,57,135,107,426,234,409,264,280,338,381,317,220,79] r=-1  
lpr=6327 pi=6293-6326/2 crt=0'0 inactive NOTIFY] exit Start 0.000169 0  
0.00
-7> 2014-05-22 11:10:05.263385 7f3bfcaee700  5 osd.398 pg_epoch:  
6327 pg[16.8f5s14( empty local-les=6326 n=0 ec=6293 les/c 6326/6326  
6327/6327/6327)  
[200,176,57,135,107,426,234,409,264,280,338,381,317,220,79] r=-1  
lpr=6327 pi=6293-6326/2 crt=0'0 inactive NOTIFY] enter Started/Stray
-6> 2014-05-22 11:10:05.26