Re: [ceph-users] vm fs corrupt after pgs stuck

2014-01-03 Thread Josh Durgin

On 01/02/2014 01:40 PM, James Harper wrote:


I just had to restore an MS Exchange database after a Ceph hiccup (no actual
data lost - Exchange is very good like that with its no-loss restore!). The
order of events went something like:

. Loss of connection on osd to the cluster network (public network was okay)
. pgs reported stuck
. stopped osd on the bad server
. resolved network problem
. restarted osd on the bad server
. noticed that the vm running exchange had hung
. rebooted and vm did a chkdsk automatically
. exchange refused to mount the main mailbox store

I'm not using rbd caching or anything, so for NTFS to lose files like that,
something fairly nasty must have happened. My best guess is that the loss of
connectivity and function while ceph was figuring out what was going on
meant that Windows I/O was frozen and started timing out, but I still can't see
how that could result in corruption.


NTFS may have gotten confused if some I/Os completed fine but others
timed out. It looks like NTFS journals metadata but not data, so it
could lose data that hadn't been written out yet after this kind of
failure, assuming it stops doing I/O after some timeouts are hit; it's
similar to a sudden power loss. If the application was not doing the
Windows equivalent of O_SYNC it could still lose writes. I'm not too
familiar with Windows, but perhaps there's a way to configure disk
timeout behavior or NTFS writeback.
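
If memory serves (this is an assumption on my side, not something I've
verified against Ceph-backed disks), Windows keeps a global disk I/O timeout
in the registry that could be raised so the guest rides out longer stalls
instead of failing requests:

  reg add "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue /t REG_DWORD /d 180 /f
  :: TimeOutValue is in seconds; the default is commonly 60, and the guest
  :: needs a reboot for the change to take effect.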


Any suggestions on how I could avoid this situation in the future would be
greatly appreciated!



Forgot to mention. This has also happened once previously when the OOM killer 
targeted ceph-osd.


If this caused I/O timeouts, it would make sense. If you can't adjust
the guest timeouts, you might want to decrease the ceph timeouts for
noticing and marking out osds with network or other issues.
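
For reference, those knobs live in ceph.conf; a minimal sketch (the values
are only examples, and the defaults quoted are from memory):

  [global]
  osd heartbeat grace = 10          # peers report an OSD down after this many seconds of silence (default 20)
  mon osd down out interval = 120   # a down OSD gets marked out after this many seconds (default 300)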

Josh


Re: [ceph-users] snapshot atomicity

2014-01-03 Thread Josh Durgin

On 01/02/2014 10:51 PM, James Harper wrote:

I've not used ceph snapshots before. The documentation says that the rbd device 
should not be in use before creating a snapshot. Does this mean that creating a 
snapshot is not an atomic operation? I'm happy with a crash consistent 
filesystem if that's all the warning is about.


It's atomic, the warning is just that it's crash consistent, not
application-level consistent.


If it is atomic, can you create multiple snapshots as an atomic operation? The 
use case for this would be a database spread across multiple volumes, eg 
database on one rbd, logfiles on another.


No, but now that you mention it this would be technically pretty
simple to implement. If multiple rbds referred to the same place to get
their snapshot context, they could all be snapshotted atomically.
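
For a single volume today, the closest you can get is a crash-consistent
snapshot, optionally made application-consistent by quiescing the filesystem
first. A rough sketch (the mount point and image names are just placeholders):

  # inside the VM: flush and freeze the filesystem (optional, for app-level consistency)
  fsfreeze --freeze /var/lib/mysql
  # on a ceph client: take the snapshot
  rbd snap create rbd/dbvolume@before-upgrade
  # inside the VM: thaw the filesystem again
  fsfreeze --unfreeze /var/lib/mysql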

Josh


[ceph-users] ceph osd perf question

2014-01-03 Thread Andrei Mikhailovsky
Hi guys, 

Could someone explain what the new perf stats show and whether the numbers are
reasonable on my cluster?

I am concerned about the high fs_commit_latency, which seems to be above 150ms
for all osds. I've tried to find documentation on what this command
actually shows, but couldn't find anything.

I am using 3TB SAS drives with 4 osd journals on each SSD. Are the numbers
below reasonable for a fairly idle ceph cluster (osd utilisation below 10% on
average)?

# ceph osd perf 
osdid fs_commit_latency(ms) fs_apply_latency(ms) 
0 192 4 
1 265 4 
2 116 1 
3 125 2 
4 166 1 
5 209 3 
6 184 6 
7 142 2 
8 209 1 
9 166 1 
10 216 1 
11 308 3 
12 150 2 
13 125 1 
14 175 2 
15 142 2 
16 150 4 


When the cluster gets a bit busy (osd utilisation below 50% on average) I see: 

# ceph osd perf 
osdid fs_commit_latency(ms) fs_apply_latency(ms) 
0 551 11 
1 284 25 
2 517 41 
3 492 14 
4 625 13 
5 309 26 
6 650 9 
7 517 21 
8 634 25 
9 784 32 
10 392 7 
11 501 8 
12 602 12 
13 467 14 
14 476 36 
15 451 11 
16 383 21 



Thanks 

Andrei 


Re: [ceph-users] [Rados] How long will it take to fix a broken replica

2014-01-03 Thread Wido den Hollander

On 01/02/2014 04:00 PM, Kuo Hugo wrote:

Hi all,

I did a test to verify RADOS's recovery.

1. echoed a string into an object in a placement group's directory on an OSD.
2. After an osd scrub, ceph health shows "1 pgs inconsistent". Will
it be fixed later?



You have to manually instruct the OSD to repair the PG.

IIRC it's: ceph pg repair <pgid>
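
For example (the PG id below is made up; ceph health detail will print the
real one):

  # find which PG the scrub flagged
  ceph health detail | grep inconsistent
  # then tell the primary OSD to repair it
  ceph pg repair 2.37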


Thanks






--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] Is ceph feasible for storing large no. of small files with care of handling OSD failure (disk i/o error....) so it can complete pending replication independent from no. of files

2014-01-03 Thread Wido den Hollander

On 01/02/2014 05:42 PM, upendrayadav.u wrote:

Hi,

1. Is ceph feasible for storing a large number of small files in a ceph
cluster, with OSD failure and recovery taken care of?

2. If we have a 4TB OSD (almost 85% full) storing only small files
(500 KB to 1024 KB), and it fails (due to a disk i/o error),
how much time will it take to complete all pending replication?
What are the factors that affect this replication process? Is the
total time to complete pending replication independent of the number of
files to replicate - i.e., does failure recovery depend only on the size of
the OSD and not on the number of files to replicate?


Please forget the concept of files; we talk about objects inside Ceph / 
RADOS :)


It's hard to predict how long it will take, but it depends on the number 
of PGs and the number of objects inside the PGs.


The more objects you have, the longer recovery will take.

Btw, I wouldn't fill an OSD to 85%; that's a bit too high. I'd stay 
below 80%.
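
Ceph itself warns and eventually blocks writes as OSDs fill up; the
thresholds are configurable in ceph.conf (the values below are the stock
defaults as far as I recall):

  [global]
  mon osd nearfull ratio = 0.85   # HEALTH_WARN once an OSD passes this
  mon osd full ratio = 0.95       # writes are blocked once an OSD passes this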



3. We have 64 disks (in a JBOD configuration) in one machine. Is
it necessary to run one OSD per disk? Or is it possible to
combine 8 disks into one OSD?



Run one OSD per disk; that gives you the best fault tolerance. You can run 
one OSD on top of something like RAID across multiple drives, but that reduces 
your fault tolerance.


Wido


Thanks a lot for giving your precious time for me... hope this time I will
get a response.

:( The last 2 mails got no reply... :(

Regards,
Upendra Yadav
DFS







--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


[ceph-users] Rados Gateway problem

2014-01-03 Thread Julien Calvet
Hello,

I have a problem with the Rados GW.

When I do a wget http://p1.13h.com/swift/v1/test/test.mp3 on this object, there
is no problem getting it.

But when I open it in a browser or VLC, it stops playing after 32 seconds or less.

Can anyone help me?

Regards,

Julien


Re: [ceph-users] Is ceph feasible for storing large no. of small files with care of handling OSD failure (disk i/o error....) so it can complete pending replication independent from no. of files

2014-01-03 Thread upendrayadav.u
Thanks a lot... for your detailed and very clear answer :)

Regards,
Upendra Yadav
DFS

On Fri, 03 Jan 2014 15:52:09 +0530 Wido den Hollander wrote:

> [snip]


[ceph-users] [ANN] ceph-deploy 1.3.4 released!

2014-01-03 Thread Alfredo Deza
Hi All,

There is a new release of ceph-deploy, the easy deployment tool for Ceph.

This is mostly a bug-fix release, although one minor feature was added:
the ability to install/remove packages from remote hosts with a new
sub-command: `pkg`

As we continue to add features (or improve old ones) we are also making sure
proper documentation goes hand in hand with those changes too. For `pkg` this
is now documented in the ceph-deploy docs page:
http://ceph.com/ceph-deploy/docs/pkg.html
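
If I remember the syntax right (the docs page above is authoritative), usage
looks something like this, where htop and node1/node2 are just placeholder
package and host names:

  ceph-deploy pkg --install htop node1 node2
  ceph-deploy pkg --remove htop node1 node2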

The complete changelog, including 1.3.4 changes can be found here:
http://ceph.com/ceph-deploy/docs/changelog.html#id1


Make sure you update!


Thanks,


Alfredo


Re: [ceph-users] how to use the function ceph_open_layout

2014-01-03 Thread Noah Watkins
You'll need to register the new pool with the MDS:

ceph mds add_data_pool <pool-name>
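
Putting the whole sequence together (a sketch; the -22 you're getting back is
-EINVAL, presumably because the pool isn't registered as a data pool yet):

  # create the pool and register it with the MDS
  rados mkpool data1
  ceph mds add_data_pool data1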

On Thu, Jan 2, 2014 at 9:48 PM, 鹏  wrote:
>  Hi all;
> Today I want to use the function ceph_open_layout() in libcephfs.h.
>
> I created a new pool successfully:
> # rados mkpool data1
> and then I edit the code like this:
>
> int fd = ceph_open_layout( cmount, c_path, O_RDONLY|O_CREAT, 0666, (1<<22),
> 1, (1<<22) , "data1")
>
> and then the fd is -22!
>
> When I use the data pool, it succeeds:
> int fd = ceph_open_layout( cmount, c_path, O_RDONLY|O_CREAT, 0666, (1<<22),
> 1, (1<<22) , "data")
>
> Does ceph_open_layout support read/write to a new pool???
>
> Thanks for the help!
> Yours!


Re: [ceph-users] how to use the function ceph_open_layout

2014-01-03 Thread Sage Weil
On Fri, 3 Jan 2014, 鹏 wrote:
>  Hi all;
> Today I want to use the function ceph_open_layout() in libcephfs.h.
> 
> I created a new pool successfully:
> # rados mkpool data1

You also need to do

ceph mds add_data_pool data1

sage

> and then I edit the code like this:
> 
> int fd = ceph_open_layout( cmount, c_path, O_RDONLY|O_CREAT, 0666, (1<<22),
> 1, (1<<22) , "data1")
> 
> and then the fd is -22!
> 
> When I use the data pool, it succeeds:
> int fd = ceph_open_layout( cmount, c_path, O_RDONLY|O_CREAT, 0666, (1<<22),
> 1, (1<<22) , "data")
> 
> Does ceph_open_layout support read/write to a new pool???
> 
> Thanks for the help!
> Yours!


Re: [ceph-users] crush chooseleaf vs. choose

2014-01-03 Thread Sage Weil
Run

'ceph osd crush tunables optimal'

or adjust an offline map file via the crushtool command line (more 
annoying) and retest; I suspect that is the problem.

http://ceph.com/docs/master/rados/operations/crush-map/#tunables
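
For the offline route, something along these lines should work (a sketch from
memory; the file names are arbitrary, and the three values are the
bobtail/"optimal"-era tunables as far as I recall):

  # grab the current compiled map
  ceph osd getcrushmap -o crushmap.bin
  # flip it to the newer tunable values
  crushtool -i crushmap.bin --set-choose-local-tries 0 \
    --set-choose-local-fallback-tries 0 --set-choose-total-tries 50 \
    -o crushmap.new
  # sanity-check placement offline, then inject it
  crushtool --test -i crushmap.new --rule 0 --num-rep 3 --show-utilization
  ceph osd setcrushmap -i crushmap.new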

sage


On Fri, 3 Jan 2014, Dietmar Maurer wrote:

> > In both cases, you only get 2 replicas on the remaining 2 hosts.
> 
> OK, I was able to reproduce this with crushtool.
> 
> > The difference is if you have 4 hosts with 2 osds.  In the choose case, you 
> > have
> > some fraction of the data that chose the down host in the first step (most 
> > of the
> > attempts, actually!) and then couldn't find a usable osd, leaving you with 
> > only 2
> 
> This is also reproducible.
> 
> > replicas.  With chooseleaf that doesn't happen.
> > 
> > The other difference is if you have one of the two OSDs on the host marked 
> > out.
> > In the choose case, the remaining OSD will get allocated 2x the data; in the
> > chooseleaf case, usage will remain proportional with the rest of the 
> > cluster and
> > the data from the out OSD will be distributed across other OSDs (at least 
> > when
> > there are > 3 hosts!).
> 
> I see, but data distribution seems not optimal in that case.
> 
> For example using this crush map:
> 
> # types
> type 0 osd
> type 1 host
> type 2 rack
> type 3 row
> type 4 room
> type 5 datacenter
> type 6 root
> 
> # buckets
> host prox-ceph-1 {
>   id -2   # do not change unnecessarily
>   # weight 7.260
>   alg straw
>   hash 0  # rjenkins1
>   item osd.0 weight 3.630
>   item osd.1 weight 3.630
> }
> host prox-ceph-2 {
>   id -3   # do not change unnecessarily
>   # weight 7.260
>   alg straw
>   hash 0  # rjenkins1
>   item osd.2 weight 3.630
>   item osd.3 weight 3.630
> }
> host prox-ceph-3 {
>   id -4   # do not change unnecessarily
>   # weight 3.630
>   alg straw
>   hash 0  # rjenkins1
>   item osd.4 weight 3.630
> }
> 
> host prox-ceph-4 {
>   id -5   # do not change unnecessarily
>   # weight 3.630
>   alg straw
>   hash 0  # rjenkins1
>   item osd.5 weight 3.630
> }
> 
> root default {
>   id -1   # do not change unnecessarily
>   # weight 21.780
>   alg straw
>   hash 0  # rjenkins1
>   item prox-ceph-1 weight 7.260   # 2 OSDs
>   item prox-ceph-2 weight 7.260   # 2 OSDs
>   item prox-ceph-3 weight 3.630   # 1 OSD
>   item prox-ceph-4 weight 3.630   # 1 OSD
> }
> 
> # rules
> rule data {
>   ruleset 0
>   type replicated
>   min_size 1
>   max_size 10
>   step take default
>   step chooseleaf firstn 0 type host
>   step emit
> }
> # end crush map
> 
> crushtool shows the following utilization:
> 
> # crushtool --test -i my.map --rule 0 --num-rep 3 --show-utilization
>   device 0:   423
>   device 1:   452
>   device 2:   429
>   device 3:   452
>   device 4:   661
>   device 5:   655
> 
> Any explanation for that?  Maybe related to the small number of devices?
> 
> 


Re: [ceph-users] snapshot atomicity

2014-01-03 Thread LaSalle, Jurvis


On 1/3/14, 3:21 AM, "Josh Durgin"  wrote:

>On 01/02/2014 10:51 PM, James Harper wrote:
>> I've not used ceph snapshots before. The documentation says that the
>>rbd device should not be in use before creating a snapshot. Does this
>>mean that creating a snapshot is not an atomic operation? I'm happy with
>>a crash consistent filesystem if that's all the warning is about.
>
>It's atomic, the warning is just that it's crash consistent, not
>application-level consistent.
>
>> If it is atomic, can you create multiple snapshots as an atomic
>>operation? The use case for this would be a database spread across
>>multiple volumes, eg database on one rbd, logfiles on another.
>
>No, but now that you mention it this would be technically pretty
>simple to implement. If multiple rbds referred to the same place to get
>their snapshot context, they could all be snapshotted atomically.
>
>Josh

I had been trying to imagine a use for pool-level snapshotting after I'd
read about that feature.  Thanks for settling that!

JL



Re: [ceph-users] [Rados] How long will it take to fix a broken replica

2014-01-03 Thread Kuo Hugo
That's useful information.

Thanks.




2014/1/3 Wido den Hollander 

> On 01/02/2014 04:00 PM, Kuo Hugo wrote:
>
>> Hi all,
>>
>> I did a test to verify RADOS's recovery.
>>
>> 1. echoed a string into an object in a placement group's directory on an OSD.
>> 2. After an osd scrub, ceph health shows "1 pgs inconsistent". Will
>> it be fixed later?
>>
>>
> You have to manually instruct the OSD to repair the PG.
>
> IIRC it's: ceph pg repair <pgid>
>
>  Thanks
>>
>>
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on


[ceph-users] Strange things on Swift RADOS Gateway

2014-01-03 Thread Julien Calvet
Hi all


I have a problem with the gateway and swift.


When I try to get a file with wget, curl, or the swift command, I have no
problem getting it! But when I try to do it directly in my browser, it stops
after between 6 and 40 seconds.

Ceph.conf:


[client.radosgw.gateway]
host = p1
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log



Rados GW log level 20:

2014-01-03 20:17:58.575271 7fa20f35d780 20 enqueued request req=0x20e72e0
2014-01-03 20:17:58.575283 7fa20f35d780 20 RGWWQ:
2014-01-03 20:17:58.575285 7fa20f35d780 20 req: 0x20e72e0
2014-01-03 20:17:58.575291 7fa20f35d780 10 allocated request req=0x20dd580
2014-01-03 20:17:58.575331 7fa1c37fe700 20 dequeued request req=0x20e72e0
2014-01-03 20:17:58.575340 7fa1c37fe700 20 RGWWQ: empty
2014-01-03 20:17:58.575346 7fa1c37fe700  1 == starting new request 
req=0x20e72e0 =
2014-01-03 20:17:58.575439 7fa1c37fe700  2 req 4:0.93::GET 
/swift/v1/test/big_buck_bunny.mp4::initializing
2014-01-03 20:17:58.575485 7fa1c37fe700 10 ver=v1 first=test 
req=big_buck_bunny.mp4
2014-01-03 20:17:58.575494 7fa1c37fe700 10 s->object=big_buck_bunny.mp4 
s->bucket=test
2014-01-03 20:17:58.575501 7fa1c37fe700 20 FCGI_ROLE=RESPONDER
2014-01-03 20:17:58.575503 7fa1c37fe700 20 
SCRIPT_URL=/swift/v1/test/big_buck_bunny.mp4
2014-01-03 20:17:58.575505 7fa1c37fe700 20 
SCRIPT_URI=http://p1.13h.com/swift/v1/test/big_buck_bunny.mp4
2014-01-03 20:17:58.575507 7fa1c37fe700 20 RGW_LOG_LEVEL=20
2014-01-03 20:17:58.575509 7fa1c37fe700 20 RGW_PRINT_CONTINUE=yes
2014-01-03 20:17:58.575511 7fa1c37fe700 20 RGW_SHOULD_LOG=yes
2014-01-03 20:17:58.575513 7fa1c37fe700 20 HTTP_HOST=p1.13h.com
2014-01-03 20:17:58.575514 7fa1c37fe700 20 HTTP_CONNECTION=keep-alive
2014-01-03 20:17:58.575516 7fa1c37fe700 20 HTTP_CACHE_CONTROL=max-age=0
2014-01-03 20:17:58.575518 7fa1c37fe700 20 
HTTP_ACCEPT=text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
2014-01-03 20:17:58.575521 7fa1c37fe700 20 HTTP_USER_AGENT=Mozilla/5.0 
(Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/31.0.1650.63 Safari/537.36
2014-01-03 20:17:58.575523 7fa1c37fe700 20 
HTTP_ACCEPT_ENCODING=gzip,deflate,sdch
2014-01-03 20:17:58.575525 7fa1c37fe700 20 
HTTP_ACCEPT_LANGUAGE=fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
2014-01-03 20:17:58.575527 7fa1c37fe700 20 
HTTP_COOKIE=_ga=GA1.2.942382960.1369422161
2014-01-03 20:17:58.575529 7fa1c37fe700 20 HTTP_RANGE=bytes=34256-34256
2014-01-03 20:17:58.575531 7fa1c37fe700 20 
HTTP_IF_RANGE=f13004eed4251c602bbe15737e8a1ecb
2014-01-03 20:17:58.575532 7fa1c37fe700 20 PATH=/usr/local/bin:/usr/bin:/bin
2014-01-03 20:17:58.575534 7fa1c37fe700 20 SERVER_SIGNATURE=
2014-01-03 20:17:58.575536 7fa1c37fe700 20 SERVER_SOFTWARE=Apache/2.2.22 
(Ubuntu)
2014-01-03 20:17:58.575538 7fa1c37fe700 20 SERVER_NAME=p1.13h.com
2014-01-03 20:17:58.575540 7fa1c37fe700 20 SERVER_ADDR=62.210.177.137
2014-01-03 20:17:58.575542 7fa1c37fe700 20 SERVER_PORT=80
2014-01-03 20:17:58.575544 7fa1c37fe700 20 REMOTE_ADDR=213.245.29.151
2014-01-03 20:17:58.575545 7fa1c37fe700 20 DOCUMENT_ROOT=/var/www
2014-01-03 20:17:58.575547 7fa1c37fe700 20 SERVER_ADMIN=ad...@13h.com
2014-01-03 20:17:58.575549 7fa1c37fe700 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
2014-01-03 20:17:58.575551 7fa1c37fe700 20 REMOTE_PORT=51892
2014-01-03 20:17:58.575553 7fa1c37fe700 20 GATEWAY_INTERFACE=CGI/1.1
2014-01-03 20:17:58.57 7fa1c37fe700 20 SERVER_PROTOCOL=HTTP/1.1
2014-01-03 20:17:58.575556 7fa1c37fe700 20 REQUEST_METHOD=GET
2014-01-03 20:17:58.575558 7fa1c37fe700 20 
QUERY_STRING=page=swift&params=/v1/test/big_buck_bunny.mp4
2014-01-03 20:17:58.575560 7fa1c37fe700 20 
REQUEST_URI=/swift/v1/test/big_buck_bunny.mp4
2014-01-03 20:17:58.575562 7fa1c37fe700 20 
SCRIPT_NAME=/swift/v1/test/big_buck_bunny.mp4
2014-01-03 20:17:58.575564 7fa1c37fe700  2 req 4:0.000219:swift:GET 
/swift/v1/test/big_buck_bunny.mp4::getting op
2014-01-03 20:17:58.575571 7fa1c37fe700  2 req 4:0.000226:swift:GET 
/swift/v1/test/big_buck_bunny.mp4:get_obj:authorizing
2014-01-03 20:17:58.575578 7fa1c37fe700  2 req 4:0.000233:swift:GET 
/swift/v1/test/big_buck_bunny.mp4:get_obj:reading permissions
2014-01-03 20:17:58.575602 7fa1c37fe700 20 get_obj_state: rctx=0x7fa1840027c0 
obj=.rgw:test state=0x7fa184012918 s->prefetch_data=0
2014-01-03 20:17:58.575615 7fa1c37fe700 10 moving .rgw+test to cache LRU end
2014-01-03 20:17:58.575619 7fa1c37fe700 10 cache get: name=.rgw+test : hit
2014-01-03 20:17:58.575630 7fa1c37fe700 20 get_obj_state: s->obj_tag was set 
empty
2014-01-03 20:17:58.575634 7fa1c37fe700 20 Read xattr: user.rgw.idtag
2014-01-03 20:17:58.575637 7fa1c37fe700 20 Read xattr: user.rgw.manifest
2014-01-03 20:17:58.575644 7fa1c37fe700 10 moving .rgw+test to cache LRU end
2014-01-03 20:17:58.575648 7fa1c37fe700 10 cache get: name=.rgw+test : hit
2014-01-03 20:17:58.575673 7fa1c37fe700 20 rgw_get_bucket_info: bucket 
instance: test(@{i=.rgw.buckets.index}.rgw.buckets[default.6016.1])
2014-01-

Re: [ceph-users] Monitor configuration issue

2014-01-03 Thread Matt Rabbitt
I figured out why this was happening.  When I went through the quick start
guide, I created a directory on the admin node, /home/ceph/storage, and this
is where ceph.conf, ceph.log, keyrings, etc. ended up.  What I realized,
though, is that when I was running the ceph commands on the admin node, they
read the configuration file from /etc/ceph/ceph.conf.  I had failed to update
the config on the admin node itself when I ran ceph-deploy config push.
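
For anyone hitting the same thing, pushing the current config to the admin
node as well fixes it (the host name below is a placeholder for whatever your
admin node is called):

  ceph-deploy --overwrite-conf config push admin-node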


[ceph-users] Procedure for planned reboots?

2014-01-03 Thread Dane Elwell

Hi,

I was wondering if there are any procedures for rebooting a node? Presumably 
when a node is rebooted, Ceph will lose contact with the OSDs and begin moving 
data around. I’ve not actually had to reboot a node of my cluster yet, but may 
need to do so in the near future. Does Ceph handle reboots gracefully or will 
it immediately begin moving data around?

I ask because due to the hardware we’re running, we have a fair few OSDs per 
node (between 32-40, I know this isn’t ideal). We recently had a node die on us 
briefly and it took about 2 hours to get back to HEALTH_OK once the node was 
back online after being down for around 15 minutes. This is with about 16TB of 
data (of 588TB total), so I’m worried about how a reboot (or another node 
failure) will affect us when we have more data on there.

Thanks

Dane


Re: [ceph-users] Procedure for planned reboots?

2014-01-03 Thread John Nielsen
On Jan 3, 2014, at 4:43 PM, Dane Elwell  wrote:

> I was wondering if there are any procedures for rebooting a node? Presumably 
> when a node is rebooted, Ceph will lose contact with the OSDs and begin 
> moving data around. I’ve not actually had to reboot a node of my cluster yet, 
> but may need to do so in the near future. Does Ceph handle reboots gracefully 
> or will it immediately begin moving data around?

There's a delay. By default I think it is 5 minutes. You can also run "ceph osd 
set noout" beforehand to prevent OSDs from being marked 'out' no matter how 
long they may have been 'down'. After your maintenance don't forget to run 
"ceph osd unset noout" to put things back to normal.

> I ask because due to the hardware we’re running, we have a fair few OSDs per 
> node (between 32-40, I know this isn’t ideal). We recently had a node die on 
> us briefly and it took about 2 hours to get back to HEALTH_OK once the node 
> was back online after being down for around 15 minutes. This is with about 
> 16TB of data (of 588TB total), so I’m worried about how a reboot (or another 
> node failure) will affect us when we have more data on there.

I normally set the "noout" flag as above, then reboot a single node and wait 
for all the OSDs to come back online and for peering, etc. to finish. I like to 
run "ceph osd tree" and "ceph pg stat" while waiting to see how things are 
going. Only once the cluster is happy and stable after the first reboot will I 
start a second. This all presumes that your crush map has multiple replicas and 
stores them on different hosts, of course.
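
Roughly, per node, that works out to something like this (a sketch, not a
script to run blindly):

  ceph osd set noout      # keep the cluster from re-replicating while OSDs are down
  # reboot the node and wait for all of its OSDs to come back up
  ceph osd tree           # check every OSD on the node is 'up' again
  ceph pg stat            # wait for peering/recovery to settle (active+clean)
  ceph osd unset noout    # back to normal before touching the next node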

JN
