Re: [ceph-users] e release

2013-05-13 Thread Dan van der Ster
On Fri, May 10, 2013 at 8:31 PM, Sage Weil  wrote:
> So far I've found
> a few latin names, but the main problem is that I can't find a single
> large list of species with the common names listed.

Go here: http://www.marinespecies.org/aphia.php?p=search
Search for common name begins with e
Taxon rank lower or equal to Class
Limit to
taxa belonging to Cephalopoda.

Then you'll get:

East Pacific red octopus
elédone
elédone
elegant bobtail
elegant bobtail squid
elegant cuttlefish
elongate jewell squid
emperor nautilus
encornet
encornet de Forbes
encornet oiseau
encornet veinè
encornet volant
espírula
Europæisk loligo
European common squid
European common squid
European flying squid
European squid
eye-flash squid

Also found some latin not yet posted:

"Ellesmere" - http://en.wikipedia.org/wiki/Ellesmerocerida
"Endo" - http://en.wikipedia.org/wiki/Endocerida

--
Dan
CERN IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor upgrade from 0.56.6 to 0.61.1 on squeeze failed!

2013-05-13 Thread Joao Eduardo Luis

On 05/12/2013 03:57 AM, Smart Weblications GmbH - Florian Wiessner wrote:

Hi,

I upgraded from 0.56.6 to 0.61.1 and tried to restart one monitor:



Hello Florian,


We are aware and actively working on a fix for this.

Ticket: http://tracker.ceph.com/issues/4974

Thanks!

  -Joao



/etc/init.d/ceph start mon
=== mon.4 ===
Starting Ceph mon.4 on node05...
[16366]: (33) Numerical argument out of domain
failed: 'ulimit -n 8192;  /usr/bin/ceph-mon -i 4 --pid-file
/var/run/ceph/mon.4.pid -c /etc/ceph/ceph.conf '
root@node05:/data/ceph/mon# ps aux|grep ceph-mon
root 16568  0.0  0.0   7596   840 pts/1S+   04:16   0:00 grep ceph-mon


I read the log and found out that the conversion seemed to have failed, so I removed
store.db and tried again, without success, always saying:

0 _convert_machines mdsmap gv 31535406 already exists


Full log:

2013-05-12 04:04:54.266783 7fa95a3a6780  0 ceph version 0.61.1
(56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 11398
2013-05-12 04:04:54.339686 7fa95a3a6780  1 unable to open monitor store at
/data/ceph/mon
2013-05-12 04:04:54.339698 7fa95a3a6780  1 check for old monitor store format
2013-05-12 04:04:54.339700 7fa95a3a6780  1 store(/data/ceph/mon) mount
2013-05-12 04:04:54.347528 7fa95a3a6780  1 found old GV monitor store format --
should convert!
2013-05-12 04:04:54.353621 7fa95a3a6780  1 store(/data/ceph/mon) mount
2013-05-12 04:06:51.530091 7fa95a3a6780  0 _convert_machines mdsmap gv 31535406
already exists
2013-05-12 04:06:51.547889 7fa95a3a6780 -1 mon/Monitor.cc: In function 'void
Monitor::StoreConverter::_convert_machines(std::string)' thread 7fa95a3a6780
time 2013-05-12 04:06:51.530115
mon/Monitor.cc: 4413: FAILED assert(0 == "Duplicate GV -- something is wrong!")

  ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d)
  1: (Monitor::StoreConverter::_convert_machines(std::string)+0xcad) [0x49b0ed]
  2: (Monitor::StoreConverter::_convert_machines()+0xd7) [0x49be17]
  3: (Monitor::StoreConverter::convert()+0x2e0) [0x49c210]
  4: (main()+0x7ca) [0x47593a]
  5: (__libc_start_main()+0xfd) [0x7fa95b00dc8d]
  6: /usr/bin/ceph-mon() [0x473e39]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.

--- begin dump of recent events ---
-26> 2013-05-12 04:04:54.263903 7fa95a3a6780  5 asok(0x13f3000)
register_command perfcounters_dump hook 0x13e7010
-25> 2013-05-12 04:04:54.263925 7fa95a3a6780  5 asok(0x13f3000)
register_command 1 hook 0x13e7010
-24> 2013-05-12 04:04:54.263928 7fa95a3a6780  5 asok(0x13f3000)
register_command perf dump hook 0x13e7010
-23> 2013-05-12 04:04:54.263935 7fa95a3a6780  5 asok(0x13f3000)
register_command perfcounters_schema hook 0x13e7010
-22> 2013-05-12 04:04:54.263939 7fa95a3a6780  5 asok(0x13f3000)
register_command 2 hook 0x13e7010
-21> 2013-05-12 04:04:54.263941 7fa95a3a6780  5 asok(0x13f3000)
register_command perf schema hook 0x13e7010
-20> 2013-05-12 04:04:54.263946 7fa95a3a6780  5 asok(0x13f3000)
register_command config show hook 0x13e7010
-19> 2013-05-12 04:04:54.263951 7fa95a3a6780  5 asok(0x13f3000)
register_command config set hook 0x13e7010
-18> 2013-05-12 04:04:54.263956 7fa95a3a6780  5 asok(0x13f3000)
register_command log flush hook 0x13e7010
-17> 2013-05-12 04:04:54.263961 7fa95a3a6780  5 asok(0x13f3000)
register_command log dump hook 0x13e7010
-16> 2013-05-12 04:04:54.263966 7fa95a3a6780  5 asok(0x13f3000)
register_command log reopen hook 0x13e7010
-15> 2013-05-12 04:04:54.266783 7fa95a3a6780  0 ceph version 0.61.1
(56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 11398
-14> 2013-05-12 04:04:54.267348 7fa95a3a6780  1 finished 
global_init_daemonize
-13> 2013-05-12 04:04:54.322995 7fa95a3a6780  5 asok(0x13f3000) init
/var/run/ceph/ceph-mon.4.asok
-12> 2013-05-12 04:04:54.323018 7fa95a3a6780  5 asok(0x13f3000)
bind_and_listen /var/run/ceph/ceph-mon.4.asok
-11> 2013-05-12 04:04:54.323059 7fa95a3a6780  5 asok(0x13f3000)
register_command 0 hook 0x13e60b8
-10> 2013-05-12 04:04:54.323065 7fa95a3a6780  5 asok(0x13f3000)
register_command version hook 0x13e60b8
 -9> 2013-05-12 04:04:54.323070 7fa95a3a6780  5 asok(0x13f3000)
register_command git_version hook 0x13e60b8
 -8> 2013-05-12 04:04:54.323072 7fa95a3a6780  5 asok(0x13f3000)
register_command help hook 0x13e70b0
 -7> 2013-05-12 04:04:54.323105 7fa958c79700  5 asok(0x13f3000) entry start
 -6> 2013-05-12 04:04:54.339686 7fa95a3a6780  1 unable to open monitor store
at /data/ceph/mon
 -5> 2013-05-12 04:04:54.339698 7fa95a3a6780  1 check for old monitor store
format
 -4> 2013-05-12 04:04:54.339700 7fa95a3a6780  1 store(/data/ceph/mon) mount
 -3> 2013-05-12 04:04:54.347528 7fa95a3a6780  1 found old GV monitor store
format -- should convert!
 -2> 2013-05-12 04:04:54.353621 7fa95a3a6780  1 store(/data/ceph/mon) mount
 -1> 2013-05-12 04:06:51.530091 7fa95a3a6780  0 _convert_machines mdsmap gv
31535406 already exists
  0> 20

Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Greg

On 13/05/2013 07:38, Olivier Bonvalet wrote:

On Friday, 10 May 2013 at 19:16 +0200, Greg wrote:

Hello folks,

I'm in the process of testing Ceph and RBD. I have set up a small
cluster of hosts, each running a MON and an OSD with both journal and
data on the same SSD (OK, this is stupid, but it is simple enough to verify the
disks are not the bottleneck for one client). All nodes are connected on a
1Gb network (no dedicated network for OSDs, shame on me :).

Summary: the RBD performance is poor compared to the benchmark.

A 5-second sequential read benchmark shows something like this:

sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0       0         0         0         0         0         -         0
  1      16        39        23   91.9586        92  0.966117  0.431249
  2      16        64        48   95.9602       100  0.513435   0.53849
  3      16        90        74   98.6317       104   0.25631   0.55494
  4      11        95        84   83.9735        40   1.80038   0.58712
 Total time run:        4.165747
Total reads made:       95
Read size:              4194304
Bandwidth (MB/sec):     91.220

Average Latency:        0.678901
Max latency:            1.80038
Min latency:            0.104719

91MB read performance, quite good !
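
For reference, output like the above typically comes from an invocation along
these lines -- the pool name and the --no-cleanup flag are assumptions here,
and the exact options vary a bit between releases:

  # lay down 4MB objects first, then run a short sequential read pass
  rados bench -p rbd 60 write --no-cleanup
  rados bench -p rbd 5 seq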

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s

There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this ?

You should try to increase read_ahead to 512K instead of the default
128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
on reads with that.
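
For reference, a minimal sketch of that tuning on the client, assuming the
image shows up as /dev/rbd1 (the device name is an assumption; the value does
not survive a reboot or re-map, so reapply it from a boot script or udev rule):

  cat /sys/block/rbd1/queue/read_ahead_kb          # default is typically 128
  echo 512 > /sys/block/rbd1/queue/read_ahead_kb   # raise readahead to 512 KB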


Olivier,

thanks a lot for pointing this out, it indeed makes a *huge* difference !

# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s

(caches dropped before each test of course)
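
"Dropping caches" here presumably means something along these lines on the
client before each dd run (a common approach, not quoted from the poster):

  sync
  echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes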

Mark, this is probably something you will want to investigate and 
explain in a "tweaking" topic of the documentation.


Regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Kernel support syncfs for Centos6.3

2013-05-13 Thread Lenon Join
Hi all,

I am testing Ceph 0.56.6 with CentOS 6.3.
I have one server using RAID 6, divided into 2 partitions (2 OSDs).
With CentOS 6.3 (kernel 2.6.32-358), the OSDs on this server frequently
report errors like:

"  . osd.x [WRN] slow request x seconds old

... "

I think this is due to missing syncfs support.

I used the command: grep -r syncfs /usr/include

Nothing is returned.


With kernel 3.9.0 :

grep -r syncfs /usr/include


/usr/include/asm-generic/unistd.h:#define __NR_syncfs 267
/usr/include/asm-generic/unistd.h:__SYSCALL(__NR_syncfs, sys_syncfs)
/usr/include/asm/unistd_32.h:#define __NR_syncfs 344
/usr/include/asm/unistd_64.h:#define __NR_syncfs 306
/usr/include/asm/unistd_64.h:__SYSCALL(__NR_syncfs, sys_syncfs)
/usr/include/bits/syscall.h:#define SYS_syncfs __NR_syncfs
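
As a runtime cross-check, the OSD log records whether syncfs was detected when
the daemon started (Dan Mick mentions this further down). A hedged sketch --
the log path and exact wording are assumptions and differ between releases:

  grep -i syncfs /var/log/ceph/ceph-osd.*.log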



So, is this kernel suitable for deploying Ceph?

Do I need to install a newer glibc or not?


JoinLenon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] e release

2013-05-13 Thread Rick Richardson
This might be taking some artistic license, but "Elegant Eledone " has a
nice ring to it.
 On May 13, 2013 3:42 AM, "Dan van der Ster"  wrote:

> On Fri, May 10, 2013 at 8:31 PM, Sage Weil  wrote:
> > So far I've found
> > a few latin names, but the main problem is that I can't find a single
> > large list of species with the common names listed.
>
> Go here: http://www.marinespecies.org/aphia.php?p=search
> Search for common name begins with e
> Taxon rank lower or equal to Class
> Limit to
> taxa belonging to Cephalopoda.
>
> Then you'll get:
>
> East Pacific red octopus
> elédone
> elédone
> elegant bobtail
> elegant bobtail squid
> elegant cuttlefish
> elongate jewell squid
> emperor nautilus
> encornet
> encornet de Forbes
> encornet oiseau
> encornet veinè
> encornet volant
> espírula
> Europæisk loligo
> European common squid
> European common squid
> European flying squid
> European squid
> eye-flash squid
>
> Also found some latin not yet posted:
>
> "Ellesmere" - http://en.wikipedia.org/wiki/Ellesmerocerida
> "Endo" - http://en.wikipedia.org/wiki/Endocerida
>
> --
> Dan
> CERN IT
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Mark Nelson

On 05/13/2013 07:26 AM, Greg wrote:

On 13/05/2013 07:38, Olivier Bonvalet wrote:

On Friday, 10 May 2013 at 19:16 +0200, Greg wrote:

Hello folks,

I'm in the process of testing CEPH and RBD, I have set up a small
cluster of  hosts running each a MON and an OSD with both journal and
data on the same SSD (ok this is stupid but this is simple to verify the
disks are not the bottleneck for 1 client). All nodes are connected on a
1Gb network (no dedicated network for OSDs, shame on me :).

Summary : the RBD performance is poor compared to benchmark

A 5 seconds seq read benchmark shows something like this :

sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat
avg lat
  0   0 0 0 0 0 - 0
  1  163923   91.958692 0.966117
0.431249
  2  166448   95.9602   100 0.513435
0.53849
  3  169074   98.6317   104 0.25631
0.55494
  4  119584   83.973540 1.80038
0.58712
  Total time run:4.165747
Total reads made: 95
Read size:4194304
Bandwidth (MB/sec):91.220

Average Latency:   0.678901
Max latency:   1.80038
Min latency:   0.104719

91MB read performance, quite good !

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s

There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this ?

You should try to increase read_ahead to 512K instead of the defaults
128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
on reads with that.


Olivier,

thanks a lot for pointing this out, it indeed makes a *huge* difference !

# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s

(caches dropped before each test of course)

Mark, this is probably something you will want to investigate and
explain in a "tweaking" topic of the documentation.

Regards,


Out of curiosity, has your rados bench performance improved as well? 
We've also seen improvements for sequential read throughput when 
increasing read_ahead_kb. (it may decrease random iops in some cases 
though!)  The reason I didn't think to mention it here though is because 
I was just focused on the difference between rados bench and rbd.  It 
would be interesting to know if rbd has improved more dramatically than 
rados bench.


Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] e release

2013-05-13 Thread Loic Dachary


On 05/13/2013 03:32 PM, Rick Richardson wrote:
> This might be taking some artistic license, but "Elegant Eledone " has a nice 
> ring to it.

+1 :-)

> 
> On May 13, 2013 3:42 AM, "Dan van der Ster"  > wrote:
> 
> On Fri, May 10, 2013 at 8:31 PM, Sage Weil  > wrote:
> > So far I've found
> > a few latin names, but the main problem is that I can't find a single
> > large list of species with the common names listed.
> 
> Go here: http://www.marinespecies.org/aphia.php?p=search
> Search for common name begins with e
> Taxon rank lower or equal to Class
> Limit to
> taxa belonging to Cephalopoda.
> 
> Then you'll get:
> 
> East Pacific red octopus
> elédone
> elédone
> elegant bobtail
> elegant bobtail squid
> elegant cuttlefish
> elongate jewell squid
> emperor nautilus
> encornet
> encornet de Forbes
> encornet oiseau
> encornet veinè
> encornet volant
> espírula
> Europæisk loligo
> European common squid
> European common squid
> European flying squid
> European squid
> eye-flash squid
> 
> Also found some latin not yet posted:
> 
> "Ellesmere" - http://en.wikipedia.org/wiki/Ellesmerocerida
> "Endo" - http://en.wikipedia.org/wiki/Endocerida
> 
> --
> Dan
> CERN IT
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org 
> 
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] e release

2013-05-13 Thread Dave Spano
Personally, just naming the release Emperor after the emperor nautilus or 
Encornet sounds nice. Word-wise, it seems to fit with release names like 
Argonaut, whereas Elegant Eledone sounds more like an Ubuntu release. 

Dave Spano 
 



- Original Message - 

From: "Rick Richardson"  
Cc: ceph-de...@vger.kernel.org, ceph-us...@ceph.com 
Sent: Monday, May 13, 2013 9:32:23 AM 
Subject: Re: [ceph-users] e release 


This might be taking some artistic license, but "Elegant Eledone " has a nice 
ring to it. 

On May 13, 2013 3:42 AM, "Dan van der Ster" < d...@vanderster.com > wrote: 


On Fri, May 10, 2013 at 8:31 PM, Sage Weil < s...@inktank.com > wrote: 
> So far I've found 
> a few latin names, but the main problem is that I can't find a single 
> large list of species with the common names listed. 

Go here: http://www.marinespecies.org/aphia.php?p=search 
Search for common name begins with e 
Taxon rank lower or equal to Class 
Limit to 
taxa belonging to Cephalopoda. 

Then you'll get: 

East Pacific red octopus 
elédone 
elédone 
elegant bobtail 
elegant bobtail squid 
elegant cuttlefish 
elongate jewell squid 
emperor nautilus 
encornet 
encornet de Forbes 
encornet oiseau 
encornet veinè 
encornet volant 
espírula 
Europæisk loligo 
European common squid 
European common squid 
European flying squid 
European squid 
eye-flash squid 

Also found some latin not yet posted: 

"Ellesmere" - http://en.wikipedia.org/wiki/Ellesmerocerida 
"Endo" - http://en.wikipedia.org/wiki/Endocerida 

-- 
Dan 
CERN IT 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majord...@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 



___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] e release

2013-05-13 Thread Steven Presser

+1 for Encornet


On 05/13/2013 10:31 AM, Dave Spano wrote:

Personally, just naming the release Emperor after the emperor nautilus or 
Encornet sounds nice. Word-wise, it seems to fit with release names like 
Argonaut, whereas Elegant Eledone sounds more like an Ubuntu release.

Dave Spano
  




- Original Message -

From: "Rick Richardson" 
Cc: ceph-de...@vger.kernel.org, ceph-us...@ceph.com
Sent: Monday, May 13, 2013 9:32:23 AM
Subject: Re: [ceph-users] e release


This might be taking some artistic license, but "Elegant Eledone " has a nice 
ring to it.

On May 13, 2013 3:42 AM, "Dan van der Ster" < d...@vanderster.com > wrote:


On Fri, May 10, 2013 at 8:31 PM, Sage Weil < s...@inktank.com > wrote:

So far I've found
a few latin names, but the main problem is that I can't find a single
large list of species with the common names listed.

Go here: http://www.marinespecies.org/aphia.php?p=search
Search for common name begins with e
Taxon rank lower or equal to Class
Limit to
taxa belonging to Cephalopoda.

Then you'll get:

East Pacific red octopus
elédone
elédone
elegant bobtail
elegant bobtail squid
elegant cuttlefish
elongate jewell squid
emperor nautilus
encornet
encornet de Forbes
encornet oiseau
encornet veinè
encornet volant
espírula
Europæisk loligo
European common squid
European common squid
European flying squid
European squid
eye-flash squid

Also found some latin not yet posted:

"Ellesmere" - http://en.wikipedia.org/wiki/Ellesmerocerida
"Endo" - http://en.wikipedia.org/wiki/Endocerida



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] e release

2013-05-13 Thread John Wilkins
Here's a link to nautiloids beginning with E:
http://en.wikipedia.org/wiki/List_of_nautiloids#E

On Mon, May 13, 2013 at 7:37 AM, Steven Presser  wrote:
> +1 for Encornet
>
>
> On 05/13/2013 10:31 AM, Dave Spano wrote:
>
> Personally, just naming the release Emperor after the emperor nautilus or
> Encornet sounds nice. Word-wise, it seems to fit with release names like
> Argonaut, whereas Elegant Eledone sounds more like an Ubuntu release.
>
> Dave Spano
>
>
>
>
> - Original Message -
>
> From: "Rick Richardson" 
> Cc: ceph-de...@vger.kernel.org, ceph-us...@ceph.com
> Sent: Monday, May 13, 2013 9:32:23 AM
> Subject: Re: [ceph-users] e release
>
>
> This might be taking some artistic license, but "Elegant Eledone " has a
> nice ring to it.
>
> On May 13, 2013 3:42 AM, "Dan van der Ster" < d...@vanderster.com > wrote:
>
>
> On Fri, May 10, 2013 at 8:31 PM, Sage Weil < s...@inktank.com > wrote:
>
> So far I've found
> a few latin names, but the main problem is that I can't find a single
> large list of species with the common names listed.
>
> Go here: http://www.marinespecies.org/aphia.php?p=search
> Search for common name begins with e
> Taxon rank lower or equal to Class
> Limit to
> taxa belonging to Cephalopoda.
>
> Then you'll get:
>
> East Pacific red octopus
> elédone
> elédone
> elegant bobtail
> elegant bobtail squid
> elegant cuttlefish
> elongate jewell squid
> emperor nautilus
> encornet
> encornet de Forbes
> encornet oiseau
> encornet veinè
> encornet volant
> espírula
> Europæisk loligo
> European common squid
> European common squid
> European flying squid
> European squid
> eye-flash squid
>
> Also found some latin not yet posted:
>
> "Ellesmere" - http://en.wikipedia.org/wiki/Ellesmerocerida
> "Endo" - http://en.wikipedia.org/wiki/Endocerida
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Greg

On 13/05/2013 15:55, Mark Nelson wrote:

On 05/13/2013 07:26 AM, Greg wrote:

On 13/05/2013 07:38, Olivier Bonvalet wrote:

On Friday, 10 May 2013 at 19:16 +0200, Greg wrote:

Hello folks,

I'm in the process of testing CEPH and RBD, I have set up a small
cluster of  hosts running each a MON and an OSD with both journal and
data on the same SSD (ok this is stupid but this is simple to 
verify the
disks are not the bottleneck for 1 client). All nodes are connected 
on a

1Gb network (no dedicated network for OSDs, shame on me :).

Summary : the RBD performance is poor compared to benchmark

A 5 seconds seq read benchmark shows something like this :

sec Cur ops   started  finished avg MB/s  cur MB/s  last lat
avg lat
  0   0 0 0 0 0 - 0
  1  163923   91.958692 0.966117
0.431249
  2  166448   95.9602   100 0.513435
0.53849
  3  169074   98.6317   104 0.25631
0.55494
  4  119584   83.973540 1.80038
0.58712
  Total time run:4.165747
Total reads made: 95
Read size:4194304
Bandwidth (MB/sec):91.220

Average Latency:   0.678901
Max latency:   1.80038
Min latency:   0.104719

91MB read performance, quite good !

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s

There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this ?

You should try to increase read_ahead to 512K instead of the defaults
128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
on reads with that.


Olivier,

thanks a lot for pointing this out, it indeed makes a *huge* 
difference !

# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s

(caches dropped before each test of course)

Mark, this is probably something you will want to investigate and
explain in a "tweaking" topic of the documentation.

Regards,


Out of curiosity, has your rados bench performance improved as well? 
We've also seen improvements for sequential read throughput when 
increasing read_ahead_kb. (it may decrease random iops in some cases 
though!)  The reason I didn't think to mention it here though is 
because I was just focused on the difference between rados bench and 
rbd.  It would be interesting to know if rbd has improved more 
dramatically than rados bench.
Mark, the read ahead is set on the RBD block device (on the client), so 
it doesn't improve benchmark results as the benchmark doesn't use the 
block layer.


One question remains: why did I have poor performance with a single
writing thread?


Regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Gandalf Corvotempesta
2013/5/13 Greg :
> thanks a lot for pointing this out, it indeed makes a *huge* difference !
>>
>> # dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
>>
>> 100+0 records in
>> 100+0 records out
>> 419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s
>
> (caches dropped before each test of course)

What if you set 1024 or a greater value?
Is bandwidth proportional to the read-ahead size?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Mark Nelson

On 05/13/2013 10:01 AM, Gandalf Corvotempesta wrote:

2013/5/13 Greg :

thanks a lot for pointing this out, it indeed makes a *huge* difference !


# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100

100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s


(caches dropped before each test of course)


What if you set 1024 or greater value ?
Is bandwidth relative to the read ahead size?


It may help with sequential reads, but it may also slow down small 
random reads if you set it too big.  Probably a whole new article could 
be written on testing the effects of read_ahead at different levels in 
the storage stack.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Greg

On 13/05/2013 17:01, Gandalf Corvotempesta wrote:

2013/5/13 Greg :

thanks a lot for pointing this out, it indeed makes a *huge* difference !

# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100

100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s

(caches dropped before each test of course)

What if you set 1024 or greater value ?
Is bandwidth relative to the read ahead size?
Setting the value too high degrades performance, especially random IO 
performance.

You have to determine the right value for your workload.

Cheers,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CRUSH maps for multiple switches

2013-05-13 Thread Gregory Farnum
On Wednesday, May 8, 2013, Gandalf Corvotempesta wrote:

> Let's assume 20 OSD servers and 4x 12-port switches, 2 for the public
> network and 2 for the cluster network.
>
> No link between public switches and no link between cluster switches.
>
> first 10 OSD servers connected to public switch1 and the other 10 OSDs
> connected to public switch2. The same apply for cluster network.
>
> 1 HP c7000 chassis with 4x 10GbE connected to the public network (2x10 for
> each public switch)
>
> All mons will be connected (if needed) to both switches
>
> Will Ceph be able to load-share across both switches with no inter-switch link?
> What I would like to do is avoid a stackable switch (too expensive)
> and start with a smaller switch, then add ports when needed
> without losing redundancy or performance.
>
> Interconnecting the public switches would result in at least 2 lost ports
> and a bottleneck when traffic is routed across that link.
>

What's your goal here? If the switches are completely isolated from each
other, then Ceph is going to have trouble (it expects a fully connected
network), so I think the answer to your question is "no". But maybe you mean
something else and I'm just missing it. :)
-Greg

-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Mark Nelson

On 05/13/2013 09:52 AM, Greg wrote:

On 13/05/2013 15:55, Mark Nelson wrote:

On 05/13/2013 07:26 AM, Greg wrote:

On 13/05/2013 07:38, Olivier Bonvalet wrote:

On Friday, 10 May 2013 at 19:16 +0200, Greg wrote:

Hello folks,

I'm in the process of testing CEPH and RBD, I have set up a small
cluster of  hosts running each a MON and an OSD with both journal and
data on the same SSD (ok this is stupid but this is simple to
verify the
disks are not the bottleneck for 1 client). All nodes are connected
on a
1Gb network (no dedicated network for OSDs, shame on me :).

Summary : the RBD performance is poor compared to benchmark

A 5 seconds seq read benchmark shows something like this :

sec Cur ops   started  finished avg MB/s  cur MB/s  last lat
avg lat
  0   0 0 0 0 0 - 0
  1  163923   91.958692 0.966117
0.431249
  2  166448   95.9602   100 0.513435
0.53849
  3  169074   98.6317   104 0.25631
0.55494
  4  119584   83.973540 1.80038
0.58712
  Total time run:4.165747
Total reads made: 95
Read size:4194304
Bandwidth (MB/sec):91.220

Average Latency:   0.678901
Max latency:   1.80038
Min latency:   0.104719

91MB read performance, quite good !

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s

There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this ?

You should try to increase read_ahead to 512K instead of the defaults
128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
on reads with that.


Olivier,

thanks a lot for pointing this out, it indeed makes a *huge*
difference !

# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s

(caches dropped before each test of course)

Mark, this is probably something you will want to investigate and
explain in a "tweaking" topic of the documentation.

Regards,


Out of curiosity, has your rados bench performance improved as well?
We've also seen improvements for sequential read throughput when
increasing read_ahead_kb. (it may decrease random iops in some cases
though!)  The reason I didn't think to mention it here though is
because I was just focused on the difference between rados bench and
rbd.  It would be interesting to know if rbd has improved more
dramatically than rados bench.

Mark, the read ahead is set on the RBD block device (on the client), so
it doesn't improve benchmark results as the benchmark doesn't use the
block layer.


Ah, I was thinking you had increased it on the OSDs (which can also 
help).  On the OSD side, if you are targeting spinning disks, it can 
depend a lot on how much data is stored per track and the cost of head 
switches and track switches.




1 question remains : why did I have poor performance with 1 single
writing thread ?


In general, parallelism is really helpful because it hides latency and 
also helps you spread the load over all of your OSDs.  Even on a single 
disk, having concurrent requests lets the scheduler/controller do a 
better job of ordering requests.  Even on high performance distributed 
file systems like lustre you generally are going to do best with lots of 
IO nodes reading/writing multiple files.
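
A hedged way to see that effect with the same tool is to vary only the number
of concurrent operations (the pool name is an assumption; -t sets the concurrency):

  rados bench -p rbd 30 write -t 1    # a single outstanding op, latency-bound
  rados bench -p rbd 30 write -t 16   # 16 concurrent ops, spread across the OSDs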




Regards,


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Number of objects per pool?

2013-05-13 Thread Gregory Farnum
On Wednesday, May 8, 2013, Craig Lewis wrote:

>  Is there a practical limit to the number of objects I can store in a pool?
>

Nope!


> I'm planning to use RADOS Gateway, and I'm planning to start by adding
> about 1M objects to the gateway.  Once that initial migration is done and
> burns in, I want to migrate in another 20M objects.  I was planning to use
> a single S3 bucket, but I can work with many buckets if necessary.
>

This might cause you trouble -- each bucket maintains an index which is
located (for now) on a single OSD. You'll probably want one form or another
of sharding across buckets.
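
A hedged sketch of what such sharding could look like from the client side,
assuming s3cmd is already configured against the gateway (the bucket names,
bucket count and helper below are made up for illustration):

  NBUCKETS=16
  for i in $(seq 0 $((NBUCKETS - 1))); do
      s3cmd mb "s3://mydata-$i"        # create mydata-0 .. mydata-15 once
  done

  put_object() {
      key="$1"; file="$2"
      # pick a bucket from a stable hash of the key so lookups are repeatable
      h=$(printf '%s' "$key" | cksum | awk '{print $1}')
      s3cmd put "$file" "s3://mydata-$((h % NBUCKETS))/$key"
  }

  put_object "images/0001.jpg" /tmp/0001.jpg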



> I see that the RADOS Gateway stores the objects in the .rgw.buckets pool.
> So I did a quick test with the rados bench tool.  I see object creation
> slowing down as more objects are added to the pool, and latency increases.
>

That's fairly odd. Can you share more details of the results? My guess is
that you were measuring something else, like the impact of filling up the
OSD journals. :)
-Greg


-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 0.56.6 with kernel 2.6.32-358 (centos 6.3)

2013-05-13 Thread Gregory Farnum
On Friday, May 10, 2013, Lenon Join wrote:

> Hi all,
>
> I deploy ceph 0.56.6,
>
> I have 1 server running the OSD daemons (ext4 format), 1 server running Mon + MDS.
>
>
>
> I use RAID 6 with 44TB capacity, I divided into 2 partitions *(ext4)*,
> each corresponding to 1 OSD.
>
> Ceph -s:
>
>health HEALTH_OK
>monmap e1: 1 mons at {0=10.160.0.70:6789/0}, election epoch 1, quorum
> 0 0
>osdmap e34: 2 osds: 2 up, 2 in
> pgmap v1207: 576 pgs: 576 active+clean; 79200 MB data, 194 GB used,
> 38569 GB / 40811 GB avail; 60490KB/s wr, 14op/s
>mdsmap e18: 1/1/1 up {0=1=up:active}, 1 up:standby
>
> ceph osd tree
>
> # idweight  type name   up/down reweight
> -1  44  root default
> -3  44  rack unknownrack
> -2  44  host Ceph-store
> 0   22  osd.0   up  1
> 1   22  osd.1   up  1
>
>
> But when I upload the data to mount partition of CEPH, I see errors:
>
> 2013-05-10 21:59:00.500316 osd.1 [WRN] 3 slow requests, 1 included below;
> oldest blocked for > 80.457194 secs
> 2013-05-10 21:59:00.500326 osd.1 [WRN] slow request 80.457194 seconds old,
> received at 2013-05-10 21:57:40.043056: osd_op(mds.0.5:437 200.0001
> [write 1189391~25263] 1.6e5f474) v4 currently no flag points reached
> 2013-05-10 21:59:05.500955 osd.1 [WRN] 4 slow requests, 1 included below;
> oldest blocked for > 85.457859 secs
> 2013-05-10 21:59:05.500960 osd.1 [WRN] slow request 40.456829 seconds old,
> received at 2013-05-10 21:58:25.044086: osd_op(mds.0.5:441 200.0001
> [write 1226678~7515] 1.6e5f474) v4 currently no flag points reached
> 2013-05-10 21:59:05.045241 osd.0 [WRN] 1 slow requests, 1 included below;
> oldest blocked for > 40.001093 secs
> 2013-05-10 21:59:05.045246 osd.0 [WRN] slow request 40.001093 seconds old,
> received at 2013-05-10 21:58:25.044108: osd_op(mds.0.5:442 200.
> [writefull 0~84] 1.844f3494) v4 currently no flag points reached
> 2013-05-10 21:59:26.577860 mon.0 [INF] pgmap v1216: 576 pgs: 576
> active+clean; 84095 MB data, 203 GB used, 38559 GB / 40811 GB avail;
> 19867KB/s wr, 4op/s
> 2013-05-10 21:59:10.501512 osd.1 [WRN] 4 slow requests, 1 included below;
> oldest blocked for > 90.458411 secs
> 2013-05-10 21:59:10.501518 osd.1 [WRN] slow request 80.645454 seconds old,
> received at 2013-05-10 21:57:49.856013: osd_op(mds.0.5:439 200.0001
> [write 1214654~1503] 1.6e5f474) v4 currently no flag points reached
> 2013-05-10 21:59:32.040478 mon.0 [INF] pgmap v1217: 576 pgs: 576
> active+clean; 84667 MB data, 204 GB used, 38558 GB / 40811 GB avail;
> 104MB/s wr, 26op/s
> 2013-05-10 21:59:15.502405 osd.1 [WRN] 4 slow requests, 1 included below;
> oldest blocked for > 95.459295 secs
> 2013-05-10 21:59:15.502414 osd.1 [WRN] slow request 80.458998 seconds old,
> received at 2013-05-10 21:57:55.043353: osd_op(mds.0.5:440 200.0001
> [write 1216157~10521] 1.6e5f474) v4 currently no flag points reached
> 2013-05-10 22:00:11.662631 mon.0 [INF] pgmap v1218: 576 pgs: 576
> active+clean; 85451 MB data, 205 GB used, 38557 GB / 40811 GB avail;
> 20290KB/s wr, 5op/s
> 2013-05-10 22:00:17.109001 mon.0 [INF] pgmap v1219: 576 pgs: 576
> active+clean; 86007 MB data, 206 GB used, 38556 GB / 40811 GB avail;
> 101MB/s wr, 27op/s
>
> It takes place continuously, without a break.
>
> Does kernel 2.6.32-358 not support syncfs?
>
> I need help!
>
> Thanks
>

It sounds like your disks are unhappy with the Ceph-OSD workload (not too
surprising on top of a RAID-6 and ext4). Can you run "ceph osd tell
\* bench" and watch the results in a separate "ceph -w" window?

You can also try some more basic tests. See how each array does with
multiple synchronous write streams hitting it.
-Greg


-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Striped read

2013-05-13 Thread Gregory Farnum
On Friday, May 10, 2013, wrote:

> Hi,
>
>
> I’d like to know how a file that’s been striped across multiple
> objects/object sets (potentially multiple placement groups) is
> reconstituted and returned back to a client?
>
>
> For example, say I have a 100 MB file, foo, that's been striped across 16
> objects in 2 object sets.  What is the data flow in terms of CRUSH to get
> back all the "bits" and return the reconstructed original file?  Is there
> some metadata involved?
>
By default files are simply chunked into 4MB pieces, so the client computes
the location of each in turn (based on ino and offset in the FS, or RBD
volume and offset, etc) and fetches it. More complicated striping
strategies (which I believe you can learn about in the docs) might read part
of a few objects several times before moving on to another set.
If you're interested in more you'll need more detailed questions. :)
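
A hedged illustration of that mapping from the command line: CephFS data
objects are named from the file's inode number (in hex) plus the 4MB chunk
index, and "ceph osd map" shows which PG/OSDs each chunk lands on (the pool
name "data" and the inode value below are assumptions):

  # chunk 0 and chunk 1 of a file whose inode is 0x10000000000
  ceph osd map data 10000000000.00000000
  ceph osd map data 10000000000.00000001
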
-Greg


-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] shared images

2013-05-13 Thread Harald Rößler

Hi Together

Is there a description of how a shared image works in detail? Can such
an image be used as a shared file system mounted on two virtual machines
(KVM)? In my case, one machine writes and the other KVM only reads.
Are the changes visible on the read-only KVM?

Thanks
With Regards
Harry
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd image clone flattening @ client or cluster level?

2013-05-13 Thread w sun
While planning the use of fast clones from the OpenStack Glance image store to
Cinder volumes, I am a little concerned about the possible IO performance impact
on the Cinder volume service node if I have to flatten multiple images down the
road.
Am I right to assume that the copying of blocks incurred by the flattening task
is done on the backend nodes of the cluster, and that there is no extra IO
activity on the client side where the librbd operation is issued?
Thanks. --weiguo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CRUSH maps for multiple switches

2013-05-13 Thread Gregory Farnum
[Please keep conversations on the list.]
On Mon, May 13, 2013 at 9:15 AM, Gandalf Corvotempesta
 wrote:
> 2013/5/13 Gregory Farnum :
>> What's your goal here? If the switches are completely isolated from each
>> other, then Ceph is going to have trouble (it expects a fully connected
>> network), so I think the answer to your question is "no". But maybe you mean
>> something else and I'm just missing it. :)
>
> My goal is to start with a small 10GbE switch and then add more switches
> when needed, without
> losing network ports by interconnecting them (those ports would also be
> a bottleneck, because I'll have 11 10GbE ports that would need to reach
> the other 11x 10GbE ports on the second switch over a single 10GbE
> link)

So you want to shard up your network and have the only common points
be the computers attached to multiple NICs? There are systems that
work like this in supercomputers, but they require all kinds of
specialized work and don't use TCP/IP. Unless there's a big disconnect
in my understanding, Ceph doesn't support this and never will — it's a
standard TCP/IP user and this network topology would never work for
any of those systems.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD snapshot - time and consistent

2013-05-13 Thread Gregory Farnum
On Sat, May 11, 2013 at 1:34 AM, Timofey Koolin  wrote:
> Does snapshot time depend on the image size?

It shouldn't be.

> Does a snapshot create a consistent state of the image as of the moment the snapshot started?
>
> For example, if I have a file system and don't stop IO before starting the snapshot -
> is it worse than turning off the power during IO?

It is exactly the same as turning off power during IO.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Maximums for Ceph architectures

2013-05-13 Thread Gregory Farnum
On Sat, May 11, 2013 at 4:47 AM, Igor Laskovy  wrote:
> Hi all,
>
> Does anybody know where to learn about the maximums for Ceph architectures?
> For example, I'm trying to find out about the maximum size of an rbd image and
> a cephfs file. Additionally, I want to know the maximum size of a RADOS Gateway
> object (meaning a file for uploading).

The maximum size of a CephFS file is very large (a terabyte) and
configurable on MDSMap creation with the "mds max file size" config
option. I don't think RBD or RGW have max sizes, although somebody
might correct me.
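
For reference, a hedged sketch of how that option is usually set -- it is read
when the filesystem is first created, so it has to be in ceph.conf before that
point (the value below is just an example; the default is 1 TiB):

  [global]
      mds max file size = 17592186044416   # 16 TiB
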
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel support syncfs for Centos6.3

2013-05-13 Thread Gregory Farnum
On Mon, May 13, 2013 at 6:13 AM, Lenon Join  wrote:
> Hi all,
>
> I am test ceph 0.56.6 with CentOS 6.3
> I have one server, use raid 6, and then divided into 2 partitions (2 OSD)
> With CentOS 6.3 (kernel 2.6.32-358), OSDs on the same server frequently
> error:
>
> "  . osd.x [WRN] slow request x seconds old
>
> ... "
>
> I think due to syncfs error.
>
> I use command: grep -r syncfs /usr/include
>
> Nothing is out
>
>
> With kernel 3.9.0 :
>
> grep -r syncfs /usr/include
>
> /usr/include/asm-generic/unistd.h:#define __NR_syncfs 267
> /usr/include/asm-generic/unistd.h:__SYSCALL(__NR_syncfs, sys_syncfs)
> /usr/include/asm/unistd_32.h:#define __NR_syncfs 344
> /usr/include/asm/unistd_64.h:#define __NR_syncfs
> 306
> /usr/include/asm/unistd_64.h:__SYSCALL(__NR_syncfs, sys_syncfs)
> /usr/include/bits/syscall.h:#define SYS_syncfs __NR_syncfs
>
>
>
> Thus, kernel suitable for deployment CEPH ?
>
> I need to install glibc or not?

I was under the impression that the 6.3 kernel did include the syncfs
call but didn't export it via glibc or anything, but maybe I'm
mistaken. Dan did the detection work for this and might know?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD snapshot - time and consistent

2013-05-13 Thread Leen Besselink
On Mon, May 13, 2013 at 09:39:09AM -0700, Gregory Farnum wrote:
> On Sat, May 11, 2013 at 1:34 AM, Timofey Koolin  wrote:
> > Is snapshot time depend from image size?
> 
> It shouldn't be.
> 
> > Do snapshot create consistent state of image for moment at start snapshot?
> >
> > For example if I have file system on don't stop IO before start snapshot -
> > Is it worse than turn of power during IO?
> 
> It is exactly the same as turning off power during IO.
> -Greg

Timofey, I assume you are asking about this for VMs ?

If you want a really good copy, which prevents potential problems with
databases and so on, you will need to ask the OS running in the VM to quiesce
its filesystems before you snapshot.

You'll probably want to use an agent running on the OS so the hypervisor can 
tell the OS to do this.
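
A hedged sketch of that sequence done by hand, assuming the guest filesystem is
mounted at /data inside the VM and the image is rbd/vm-disk (both names made
up); an agent such as the qemu guest agent can drive the freeze/thaw instead:

  # inside the guest: flush and freeze the filesystem
  fsfreeze -f /data
  # on a ceph client: take the snapshot while the guest is quiesced
  rbd snap create rbd/vm-disk@consistent
  # inside the guest: thaw again
  fsfreeze -u /data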

Have a good day,
Leen.

> Software Engineer #42 @ http://inktank.com | http://ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel support syncfs for Centos6.3

2013-05-13 Thread Mark Nelson

On 05/13/2013 11:50 AM, Gregory Farnum wrote:

On Mon, May 13, 2013 at 6:13 AM, Lenon Join  wrote:

Hi all,

I am test ceph 0.56.6 with CentOS 6.3
I have one server, use raid 6, and then divided into 2 partitions (2 OSD)
With CentOS 6.3 (kernel 2.6.32-358), OSDs on the same server frequently
error:

"  . osd.x [WRN] slow request x seconds old

... "

I think due to syncfs error.

I use command: grep -r syncfs /usr/include

Nothing is out


With kernel 3.9.0 :

grep -r syncfs /usr/include

/usr/include/asm-generic/unistd.h:#define __NR_syncfs 267
/usr/include/asm-generic/unistd.h:__SYSCALL(__NR_syncfs, sys_syncfs)
/usr/include/asm/unistd_32.h:#define __NR_syncfs 344
/usr/include/asm/unistd_64.h:#define __NR_syncfs
306
/usr/include/asm/unistd_64.h:__SYSCALL(__NR_syncfs, sys_syncfs)
/usr/include/bits/syscall.h:#define SYS_syncfs __NR_syncfs



Thus, kernel suitable for deployment CEPH ?

I need to install glibc or not?


I was under the impression that the 6.3 kernel did include the syncfs
call but didn't export it via glibc or anything, but maybe I'm
mistaken. Dan did the detection work for this and might know?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


I believe you are correct Greg.  I was able to get bobtail to use syncfs 
on CentOS 5 so long as it had a modern kernel.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel support syncfs for Centos6.3

2013-05-13 Thread Dan Mick
On May 13, 2013 9:50 AM, "Gregory Farnum"  wrote:
>
> On Mon, May 13, 2013 at 6:13 AM, Lenon Join  wrote:
> > Hi all,
> >
> > I am test ceph 0.56.6 with CentOS 6.3
> > I have one server, use raid 6, and then divided into 2 partitions (2
OSD)
> > With CentOS 6.3 (kernel 2.6.32-358), OSDs on the same server frequently
> > error:
> >
> > "  . osd.x [WRN] slow request x seconds old
> >
> > ... "
> >
> > I think due to syncfs error.
> >
> > I use command: grep -r syncfs /usr/include
> >
> > Nothing is out
> >
> >
> > With kernel 3.9.0 :
> >
> > grep -r syncfs /usr/include
> >
> > /usr/include/asm-generic/unistd.h:#define __NR_syncfs 267
> > /usr/include/asm-generic/unistd.h:__SYSCALL(__NR_syncfs, sys_syncfs)
> > /usr/include/asm/unistd_32.h:#define __NR_syncfs 344
> > /usr/include/asm/unistd_64.h:#define __NR_syncfs
> > 306
> > /usr/include/asm/unistd_64.h:__SYSCALL(__NR_syncfs, sys_syncfs)
> > /usr/include/bits/syscall.h:#define SYS_syncfs __NR_syncfs
> >
> >
> >
> > Thus, kernel suitable for deployment CEPH ?
> >
> > I need to install glibc or not?
>
> I was under the impression that the 6.3 kernel did include the syncfs
> call but didn't export it via glibc or anything, but maybe I'm
> mistaken. Dan did the detection work for this and might know?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

The daemon log should have info about what was detected.  It might not have
the symbol in that case in the file, but it should definitely mention it in
the log.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] shared images

2013-05-13 Thread Gregory Farnum
On Mon, May 13, 2013 at 9:10 AM, Harald Rößler  wrote:
>
> Hi Together
>
> is there a description of how a shared image works in detail? Can such
> an image can be used for a shared file system on two virtual machine
> (KVM) to mount. In my case, write on one machine and read only on the
> other KVM.Are the changes are visible on the read only KVM?

The image is just striped across RADOS objects. In general you can
think of it behaving exactly like a hard drive connected to your
computer over iSCSI — a proper shared FS (eg, OCFS2) will work on top
of it, but there's no magic that makes running an ext4 mount on two
machines work...
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] shared images

2013-05-13 Thread Harald Rößler
On Mon, 2013-05-13 at 18:55 +0200, Gregory Farnum wrote:
> On Mon, May 13, 2013 at 9:10 AM, Harald Rößler  wrote:
> >
> > Hi Together
> >
> > is there a description of how a shared image works in detail? Can such
> > an image can be used for a shared file system on two virtual machine
> > (KVM) to mount. In my case, write on one machine and read only on the
> > other KVM.Are the changes are visible on the read only KVM?
> 
> The image is just striped across RADOS objects. In general you can
> think of it behaving exactly like a hard drive connected to your
> computer over iSCSI — a proper shared FS (eg, OCFS2) will work on top
> of it, but there's no magic that makes running an ext4 mount on two
> machines work...
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

Hi Greg

Thanks, and sorry, maybe I did not explain clearly what I mean. When
I'm mounting an rbd image on two KVM machines, if I write a file
on one system the other system does not recognize the change to the file
system. I thought there was some magic in librbd which gives the OS the
information that something has changed, like when I am mounting an NFS
share.

I saw in the documentation the "--shared tag" : 

Option for lock add that allows multiple clients to lock the same image
if they use the same tag. The tag is an arbitrary string. This is useful
for situations where an image must be open from more than one client at
once, like during live migration of a virtual machine, or for use
underneath a clustered file system.
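
For reference, a hedged example of that option's syntax (the image, tag and
lock id below are made up):

  rbd lock add --shared my-cluster-fs rbd/shared-disk node-a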

The use case is multiple KVM systems mounting the same data storage,
without the use of CephFS (MDS).

Thanks
Harry


   

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] shared images

2013-05-13 Thread Gregory Farnum
On Mon, May 13, 2013 at 11:35 AM, Harald Rößler  wrote:
> On Mon, 2013-05-13 at 18:55 +0200, Gregory Farnum wrote:
>> On Mon, May 13, 2013 at 9:10 AM, Harald Rößler  
>> wrote:
>> >
>> > Hi Together
>> >
>> > is there a description of how a shared image works in detail? Can such
>> > an image can be used for a shared file system on two virtual machine
>> > (KVM) to mount. In my case, write on one machine and read only on the
>> > other KVM.Are the changes are visible on the read only KVM?
>>
>> The image is just striped across RADOS objects. In general you can
>> think of it behaving exactly like a hard drive connected to your
>> computer over iSCSI — a proper shared FS (eg, OCFS2) will work on top
>> of it, but there's no magic that makes running an ext4 mount on two
>> machines work...
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> Hi Greg
>
> Thanks, and sorry maybe I did not explain clearly what I  mean or. When
> I' mounting a rbd image on two KVM machines, then if I am writing a file
> in one system the other system does not recognize the change of the file
> system. I thought there is some magic in librbd which give the OS the
> information that something have changed, like when I am mounting a NFS
> share.
>
> I saw in the documentation the "--shared tag" :
>
> Option for lock add that allows multiple clients to lock the same image
> if they use the same tag. The tag is an arbitrary string. This is useful
> for situations where an image must be open from more than one client at
> once, like during live migration of a virtual machine, or for use
> underneath a clustered file system.
>
> The use case is multiply KVM systems are mounting the same data storage,
> without use of cephfs (MDS).

Yeah, that absolutely will not work. RBD is a block device, not a
shared filesystem. ;)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] shared images

2013-05-13 Thread Jens Kristian Søgaard

Hi,


Thanks, and sorry maybe I did not explain clearly what I  mean or. When
I' mounting a rbd image on two KVM machines, then if I am writing a file
in one system the other system does not recognize the change of the file
system. I thought there is some magic in librbd which give the OS the


There is no magic. If you have formatted that rbd with a file system 
like ext4 or similar, it will never work.


I saw in the documentation the "--shared tag" : 


This can be used if you format the rbd with a clustered file system that
is meant to be mounted on more than one computer at the same time (like, for
example, the OCFS2 file system).
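
A hedged sketch of what that looks like with the kernel rbd client, assuming an
image rbd/shared and an OCFS2 cluster stack already configured on both nodes
(names are made up; the mapped device may also appear as /dev/rbd0 depending on
udev rules):

  # on every node that should share the data
  rbd map rbd/shared
  # once, from a single node
  mkfs.ocfs2 -L shared /dev/rbd/rbd/shared
  # then on every node
  mount -t ocfs2 /dev/rbd/rbd/shared /mnt/shared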


--
Jens Kristian Søgaard, Mermaid Consulting ApS,
j...@mermaidconsulting.dk,
http://www.mermaidconsulting.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] shared images

2013-05-13 Thread Dan Mick



On 05/13/2013 09:55 AM, Gregory Farnum wrote:

On Mon, May 13, 2013 at 9:10 AM, Harald Rößler  wrote:


Hi Together

is there a description of how a shared image works in detail? Can such
an image can be used for a shared file system on two virtual machine
(KVM) to mount. In my case, write on one machine and read only on the
other KVM.Are the changes are visible on the read only KVM?


The image is just striped across RADOS objects. In general you can
think of it behaving exactly like a hard drive connected to your
computer over iSCSI — a proper shared FS (eg, OCFS2) will work on top
of it, but there's no magic that makes running an ext4 mount on two
machines work...


Also, if you're talking about RBD images, there can be VM-side caching, 
and so there's no guarantee that writes from the writing VM will be seen 
by the readonly VM.  rbd isn't meant for sharing.  You want a filesystem 
for things like that.
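
For context, that client-side cache is the librbd one, controlled per client in
ceph.conf -- a minimal sketch (and note that turning it off still does not make
sharing a non-clustered filesystem safe):

  [client]
      rbd cache = false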

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hardware recommendation / calculation for large cluster

2013-05-13 Thread Tim Mohlmann
Hi,

Ok, thanks for al the info.

Just "fyi", I am a mechanical / electrical marine service engineer. So basicly 
I think in Pressure, Flow, Contents, Voltage, (mili)amps, power and torque. So 
I am just trying to relevate it to the same prinicple. Hence my questions. I 
am certainly not a noob in linux, opensource and that kind of stuff.

It is just that I got interested in online storage, and by some googling I came 
across certain "products" (most of them proprietary), some of them 
open source (but looking unmaintained / not very modern), and one of them was 
Ceph. After reading the docs I had some questions, and in my opinion they 
have been answered.

I now know how to spend the money, and now it's time to start finding out how to 
make it. I've got a whole bucket of ideas about public apps for my storage, and 
all of this needs to be researched for possibilities. (This was just the start of 
my quest.)

Again, thanks for the info. If this baby is going to fly, I will keep you 
posted about my findings. Maybe (and really really maybe) I will try to 
contribute to the source, for some features I already think I want to have ;).

Regards, Tim


On Monday 13 May 2013 00:25:19 Dmitri Maziuk wrote:
> On 2013-05-12 08:34, Tim Mohlmann wrote:
> > As for choking the backplane: That would just slow things down a bit, am I
> > right?
> 
> A bit, a lot, or not at all -- I think IRL you'll have to test it under
> your workload and see.
> 
> [ WD performance ]
> 
> > Did not know that. Do you have any references. Does this also apply for
> > the
> > enterprise disks?
> 
> Here's one write-up: https://wiki.archlinux.org/index.php/Advanced_Format
> 
> Have not tested "enterprise" disks.
> 
> > Another question: do you use desktop or enterprise disks in your cluster?
> > I am having trouble finding a MTBFs for desktop drives. And if I find
> > them, they are almost the same as enterprise drives. Is there a caveat in
> > there? Is the failure test done is different conditions? (Not that you
> > have to know that)
> > 
> > If the annual failure rate would be double, it would still be cheaper to
> > use desktop drives in a large cluster, but I just like to know to be
> > sure.
> I don't think anyone knows for sure how much of it is marketing bull.
> One rumour is the difference between "enterprise" and "desktop" drives
> is very often only the firmware and the price tag. So yeah, we use
> desktop versions because it's cheaper, but we use them in raids (usually
> 1/10 - and it's still cheaper), and we don't do super high performance
> i/o on them. (Our requirements are size rather than speed.)
> 
> Dima
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! 61.1 killed my monitors in prod

2013-05-13 Thread Stephen Street

On May 10, 2013, at 3:39 PM, Joao Eduardo Luis  wrote:

> We would certainly be interested in taking a look at logs from those 
> monitors, and would appreciate if you could set 'debug mon = 20', 'debug auth 
> = 10' and 'debug ms = 1', and give them a spin until you hit your issue.
> 

I'm seeing the same problem at Jeppesen.  I'm running 0.61.1 with 3 MON, 4 OSD and 
1 MDS, and a reboot of the cluster falls into the same state, with hung 
ceph-create-keys and the monitors not running.  I added the debug settings as 
indicated.  This is an excerpt from the output of "ceph status":

2013-05-13 12:37:21.249265 7f8b428a6780  1 -- :/0 messenger.start
2013-05-13 12:37:21.249500 7f8b428a6780  5 adding auth protocol: cephx
2013-05-13 12:37:21.249807 7f8b428a6780  2 auth: KeyRing::load: loaded key file 
/etc/ceph/ceph.client.admin.keyring
2013-05-13 12:37:21.250031 7f8b428a6780  1 -- :/12649 --> 192.168.139.4:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x2ae5b60 con 0x2ae57c0
2013-05-13 12:37:21.250219 7f8b428a4700  0 -- :/12649 >> 192.168.139.4:6789/0 
pipe(0x2ae5560 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:24.249964 7f8b3d918700  1 -- :/12649 mark_down 0x2ae57c0 -- 
0x2ae5560
2013-05-13 12:37:24.250150 7f8b3d918700  1 -- :/12649 --> 192.168.139.3:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34001350 con 0x7f8b34000e60
2013-05-13 12:37:24.250409 7f8b3c115700  0 -- :/12649 >> 192.168.139.3:6789/0 
pipe(0x7f8b34000c00 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:27.250277 7f8b3d918700  1 -- :/12649 mark_down 0x7f8b34000e60 
-- 0x7f8b34000c00
2013-05-13 12:37:27.250374 7f8b3d918700  1 -- :/12649 --> 192.168.139.4:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34003440 con 0x7f8b34003270
2013-05-13 12:37:27.250607 7f8b428a4700  0 -- :/12649 >> 192.168.139.4:6789/0 
pipe(0x7f8b34003010 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:30.250523 7f8b3d918700  1 -- :/12649 mark_down 0x7f8b34003270 
-- 0x7f8b34003010
2013-05-13 12:37:30.250619 7f8b3d918700  1 -- :/12649 --> 192.168.139.2:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34003dc0 con 0x7f8b34003b20
2013-05-13 12:37:30.251151 7f8b3c115700  1 -- 192.168.139.254:0/12649 learned 
my addr 192.168.139.254:0/12649
2013-05-13 12:37:33.250733 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b34003b20 -- 0x7f8b340038c0
2013-05-13 12:37:33.250885 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34002920 
con 0x7f8b340025c0
2013-05-13 12:37:33.251081 7f8b2700  0 -- 192.168.139.254:0/12649 >> 
192.168.139.3:6789/0 pipe(0x7f8b34002360 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:36.251046 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b340025c0 -- 0x7f8b34002360
2013-05-13 12:37:36.251133 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34005010 
con 0x7f8b340030d0
2013-05-13 12:37:36.251376 7f8b428a4700  0 -- 192.168.139.254:0/12649 >> 
192.168.139.4:6789/0 pipe(0x7f8b34002e70 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:39.251250 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b340030d0 -- 0x7f8b34002e70
2013-05-13 12:37:39.251347 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34005720 
con 0x7f8b34005480
2013-05-13 12:37:42.251493 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b34005480 -- 0x7f8b34005220
2013-05-13 12:37:42.251614 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340047c0 
con 0x7f8b34004520
2013-05-13 12:37:42.251800 7f8b3c115700  0 -- 192.168.139.254:0/12649 >> 
192.168.139.4:6789/0 pipe(0x7f8b340042c0 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:45.251683 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b34004520 -- 0x7f8b340042c0
2013-05-13 12:37:45.251777 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34004c40 
con 0x7f8b340049d0
2013-05-13 12:37:48.251928 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b340049d0 -- 0x7f8b34005d30
2013-05-13 12:37:48.252058 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340052e0 
con 0x7f8b34005040
2013-05-13 12:37:48.252252 7f8b2700  0 -- 192.168.139.254:0/12649 >> 
192.168.139.3:6789/0 pipe(0x7f8b34004de0 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:51.252149 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b34005040 -- 0x7f8b34004de0
2013-05-13 12:37:51.252236 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340075b0 
con 0x7f8b34004a70
2013-05-13 12:37:51.252466 7f8b3c115700  0 -- 192.168.139.254:0/12649 >> 
192.168.139.4:6789/0 pipe(0x7f8b34007280 sd=8 :0 s=1 pgs=0 c

Re: [ceph-users] Hardware recommendation / calculation for large cluster

2013-05-13 Thread Leen Besselink
On Mon, May 13, 2013 at 09:30:38PM +0200, Tim Mohlmann wrote:
> Hi,
> 
> Ok, thanks for al the info.
> 
> Just "fyi", I am a mechanical / electrical marine service engineer. So 
> basicly 
> I think in Pressure, Flow, Contents, Voltage, (mili)amps, power and torque. 
> So 
> I am just trying to relevate it to the same prinicple. Hence my questions. I 
> am certainly not a noob in linux, opensource and that kind of stuff.
> 
> It is just I got interested in on-line storage and by some googling I came 
> across certain "products" (most of them being propietary) and some of them 
> opensource (but looked unmaintained / not very modern) and one of them was 
> ceph. After the reading the docs I had some questions and in my opinion they 
> are answered.
> 
> I know now how to spend the money, and now it time to start finding out how 
> to 
> make it. I've got a whole bucket of ideas about public apps for my storage 
> and 
> all this needs to be researched for possibilities. (This was yet the start of 
> my quest).
> 

Every journey starts with the first step. :-)

> Again, thanks for the info. If this baby is going to fly, I will keep you 
> posted about my findings. Maybe (and really really maybe) I will try to 
> contribute to the source, for some features I already think I want to have ;).
> 

Don't be shy about sharing your ideas for new features; maybe some are already 
available, or you might be able to do it with some scripting. Maybe someone 
else thinks it is something they would want as well and will start to work on it.

Recently there was an online developer summit to try and compile such a list 
for the near term:

http://ceph.com/events/ceph-developer-summit-summary-and-session-videos/

> Regards, Tim
> 
> 
> On Monday 13 May 2013 00:25:19 Dmitri Maziuk wrote:
> > On 2013-05-12 08:34, Tim Mohlmann wrote:
> > > As for choking the backplane: That would just slow things down a bit, am I
> > > right?
> > 
> > A bit, a lot, or not at all -- I think IRL you'll have to test it under
> > your workload and see.
> > 
> > [ WD performance ]
> > 
> > > Did not know that. Do you have any references. Does this also apply for
> > > the
> > > enterprise disks?
> > 
> > Here's one write-up: https://wiki.archlinux.org/index.php/Advanced_Format
> > 
> > Have not tested "enterprise" disks.
> > 
> > > Another question: do you use desktop or enterprise disks in your cluster?
> > > I am having trouble finding a MTBFs for desktop drives. And if I find
> > > them, they are almost the same as enterprise drives. Is there a caveat in
> > > there? Is the failure test done is different conditions? (Not that you
> > > have to know that)
> > > 
> > > If the annual failure rate would be double, it would still be cheaper to
> > > use desktop drives in a large cluster, but I just like to know to be
> > > sure.
> > I don't think anyone knows for sure how much of it is marketing bull.
> > One rumour is the difference between "enterprise" and "desktop" drives
> > is very often only the firmware and the price tag. So yeah, we use
> > desktop versions because it's cheaper, but we use them in raids (usually
> > 1/10 - and it's still cheaper), and we don't do super high performance
> > i/o on them. (Our requirements are size rather than speed.)
> > 
> > Dima
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: On Developer Summit topic Ceph stats and monitoring tools

2013-05-13 Thread Leen Besselink
Hi folks,

As I didn't get a reply on the developer list at the time.

I thought I might try it again on the users-list.

So what do you think, good idea ? Bad idea ?

- Forwarded message from Leen Besselink  -

Date: Fri, 10 May 2013 00:35:28 +0200
From: Leen Besselink 
To: ceph-de...@vger.kernel.org
Cc: Kyle Bader 
Subject: On Developer Summit topic Ceph stats and monitoring tools
Reply-To: l...@consolejunkie.net
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi Folks,

Today I looked at the blueprint and video on Ceph stats and monitoring tools.

And I'm thinking: if you want to push data into Ganglia, did you consider using 
sFlow instead?

I've been looking at using sFlow and Ganglia for many different uses, just like 
they are blogging about:

http://blog.sflow.com/

The advantage of sFlow is that many applications/programs, including network 
switches and parts (like libvirt, openvswitch, kvm) that people would use to 
build OpenStack already support it.

And sFlow doesn't just send statistics, but also samples in real time. So you 
actually get much more information.

Supposedly it is easy to add sFlow to an application/daemon/program with very 
little overhead, but I haven't looked into it.

Have a nice day,
Leen.

PS Kyle: I added you to the CC, I thought you might not be on ceph-devel and 
you seem to be sort
of a lead of this project.

- End forwarded message -
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd image clone flattening @ client or cluster level?

2013-05-13 Thread Josh Durgin

On 05/13/2013 09:17 AM, w sun wrote:

While planning the use of fast clones from the openstack glance image store
to cinder volumes, I am a little concerned about the possible IO performance
impact on the cinder volume service node if I have to perform flattening
of multiple images down the road.

Am I right to assume that the copying of blocks incurred by the
flattening task is done on the backend nodes of the cluster, and there is
no extra IO activity on the client side where the librbd operation is
issued?


The client reads data from the parent and copies it to the clone. This
can't be done by the osds alone since rados does not support
multi-object transactions, or having an osd act like a client to
another osd right now.

There's no limitation on flatten being performed just on one node
though. If client I/O is a problem, you can run flattens on as many
nodes as you like.
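
As a rough illustration (pool, image and host names are made up), the flatten
jobs can simply be spread over several client nodes, since each flatten is an
independent librbd operation:

  # run each flatten from a different node to spread the client-side I/O
  ssh node1 'rbd flatten volumes/volume-aaaa' &
  ssh node2 'rbd flatten volumes/volume-bbbb' &
  ssh node3 'rbd flatten volumes/volume-cccc' &
  wait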

Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD Reference Counts for deletion

2013-05-13 Thread Mandell Degerness
I know that there was another report of the bad behavior when deleting
an RBD that is currently mounted on a host.  My problem is related,
but slightly different.

We are using openstack and Grizzly Cinder to create a bootable ceph
volume.  The instance was booted and all was well.  The server on
which the instance had been booted was unplugged.  The user deleted
the instance - which amounts to a database update on the Nova side.
They then tried to delete the volume, which failed with the following
error:

Traceback (most recent call last):
21262   File "/usr/lib64/python2.7/site-packages/cinder/volume/driver.py",
line 90, in _try_execute
21263 self._execute(*command, **kwargs)
21264   File "/usr/lib64/python2.7/site-packages/cinder/utils.py",
line 190, in execute
21265 cmd=' '.join(cmd))
21266 ProcessExecutionError: Unexpected error while running command.
21267 Command: rbd rm --pool rbd volume-66e11621-1c38-4e2d-9d90-cc511013c290
21268 Exit code: 16
21269 Stdout: '\rRemoving image: 1% complete...\rRemoving image: 2%
complete...\rRemoving image: 3% complete...\rRemoving image: 4%
complete...\rRemoving image: 5% complete  ...\rRemoving image: 6%
complete...\rRemoving image: 7% complete...\rRemoving image: 8%
complete...\rRemoving image: 9% complete...\rRemoving image: 10%
complete...\r  Removing image: 11% complete...\rRemoving image:
12% complete...\rRemoving image: 13% complete...\rRemoving image: 14%
complete...\rRemoving image: 15% complete...\rR  emoving image:
16% complete...\rRemoving image: 17% complete...\rRemoving image: 18%
complete...\rRemoving image: 19% complete...\rRemoving image: 20%
complete...\rRe  moving image: 21% complete...\rRemoving image:
22% complete...\rRemoving image: 23% complete...\rRemoving image: 24%
complete...\rRemoving image: 25% complete...\rRem  oving image:
26% complete...\rRemoving image: 27% complete...\rRemoving image: 28%
complete...\rRemoving image: 29% complete...\rRemoving image: 30%
complete...\rRemo  ving image: 31% complete...\rRemoving image:
32% complete...\rRemoving image: 33% complete...\rRemoving image: 34%
complete...\rRemoving image: 35% complete...\rRemov  ing image:
36% complete...\rRemoving image: 37% complete...\rRemoving image: 38%
complete...\rRemoving image: 39% complete...\rRemoving image: 40%
complete...\rRemovi  ng image: 41% complete...\rRemoving image:
42% complete...\rRemoving image: 43% complete...\rRemoving image: 44%
complete...\rRemoving image: 45% complete...\rRemovin  g image:
46% complete...\rRemoving image: 47% complete...\rRemoving image: 48%
complete...\rRemoving image: 49% complete...\rRemoving image: 50%
complete...\rRemoving   image: 51% complete...\rRemoving image:
52% complete...\rRemoving image: 53% complete...\rRemoving image: 54%
complete...\rRemoving image: 55% complete...\rRemoving   image:
56% complete...\rRemoving image: 57% complete...\rRemoving image: 58%
complete...\rRemoving image: 59% complete...\rRemoving image: 60%
complete...\rRemoving i  mage: 61% complete...\rRemoving image:
62% complete...\rRemoving image: 63% complete...\rRemoving image: 64%
complete...\rRemoving image: 65% complete...\rRemoving im  age:
66% complete...\rRemoving image: 67% complete...\rRemoving image: 68%
complete...\rRemoving image: 69% complete...\rRemoving image: 70%
complete...\rRemoving ima  ge: 71% complete...\rRemoving image:
72% complete...\rRemoving image: 73% complete...\rRemoving image: 74%
complete...\rRemoving image: 75% complete...\rRemoving imag  e:
76% complete...\rRemoving image: 77% complete...\rRemoving image: 78%
complete...\rRemoving image: 79% complete...\rRemoving image: 80%
complete...\rRemoving image  : 81% complete...\rRemoving image:
82% complete...\rRemoving image: 83% complete...\rRemoving image: 84%
complete...\rRemoving image: 85% complete...\rRemoving image:
86% complete...\rRemoving image: 87% complete...\rRemoving image: 88%
complete...\rRemoving image: 89% complete...\rRemoving image: 90%
complete...\rRemoving image:   91% complete...\rRemoving image:
92% complete...\rRemoving image: 93% complete...\rRemoving image: 94%
complete...\rRemoving image: 95% complete...\rRemoving image: 9
6% complete...\rRemoving image: 97% complete...\rRemoving image: 98%
complete...\rRemoving image: 99% complete...\rRemoving image: 99%
complete...failed.\n'
21270 Stderr: 'rbd: error: image still has watchers\nThis means the
image is still open or the client using it crashed. Try again after
closing/unmapping it or waiting 30s   for the crashed client to
timeout.\n2013-05-09 21:51:27.522986 7f8aca884780 -1 librbd: error
removing header: (16) Device or resource busy\n'

It appears to me that Ceph still believes the volume is mounted
somewhere.  Is there a way to tell Ceph to delete the RBD, in spite of
its belief that it is mounted?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] RBD Reference Counts for deletion

2013-05-13 Thread Mandell Degerness
Sorry.  I should have mentioned, this is using the bobtail version of ceph.

On Mon, May 13, 2013 at 1:13 PM, Mandell Degerness
 wrote:
> I know that there was another report of the bad behavior when deleting
> an RBD that is currently mounted on a host.  My problem is related,
> but slightly different.
>
> We are using openstack and Grizzly Cinder to create a bootable ceph
> volume.  The instance was booted and all was well.  The server on
> which the instance had been booted was unplugged.  The user deleted
> the instance - which amounts to a database update on the Nova side.
> They then tried to delete the volume, which failed with the following
> error:
>
> Traceback (most recent call last):
> 21262   File "/usr/lib64/python2.7/site-packages/cinder/volume/driver.py",
> line 90, in _try_execute
> 21263 self._execute(*command, **kwargs)
> 21264   File "/usr/lib64/python2.7/site-packages/cinder/utils.py",
> line 190, in execute
> 21265 cmd=' '.join(cmd))
> 21266 ProcessExecutionError: Unexpected error while running command.
> 21267 Command: rbd rm --pool rbd volume-66e11621-1c38-4e2d-9d90-cc511013c290
> 21268 Exit code: 16
> 21269 Stdout: '\rRemoving image: 1% complete...\rRemoving image: 2%
> complete...\rRemoving image: 3% complete...\rRemoving image: 4%
> complete...\rRemoving image: 5% complete  ...\rRemoving image: 6%
> complete...\rRemoving image: 7% complete...\rRemoving image: 8%
> complete...\rRemoving image: 9% complete...\rRemoving image: 10%
> complete...\r  Removing image: 11% complete...\rRemoving image:
> 12% complete...\rRemoving image: 13% complete...\rRemoving image: 14%
> complete...\rRemoving image: 15% complete...\rR  emoving image:
> 16% complete...\rRemoving image: 17% complete...\rRemoving image: 18%
> complete...\rRemoving image: 19% complete...\rRemoving image: 20%
> complete...\rRe  moving image: 21% complete...\rRemoving image:
> 22% complete...\rRemoving image: 23% complete...\rRemoving image: 24%
> complete...\rRemoving image: 25% complete...\rRem  oving image:
> 26% complete...\rRemoving image: 27% complete...\rRemoving image: 28%
> complete...\rRemoving image: 29% complete...\rRemoving image: 30%
> complete...\rRemo  ving image: 31% complete...\rRemoving image:
> 32% complete...\rRemoving image: 33% complete...\rRemoving image: 34%
> complete...\rRemoving image: 35% complete...\rRemov  ing image:
> 36% complete...\rRemoving image: 37% complete...\rRemoving image: 38%
> complete...\rRemoving image: 39% complete...\rRemoving image: 40%
> complete...\rRemovi  ng image: 41% complete...\rRemoving image:
> 42% complete...\rRemoving image: 43% complete...\rRemoving image: 44%
> complete...\rRemoving image: 45% complete...\rRemovin  g image:
> 46% complete...\rRemoving image: 47% complete...\rRemoving image: 48%
> complete...\rRemoving image: 49% complete...\rRemoving image: 50%
> complete...\rRemoving   image: 51% complete...\rRemoving image:
> 52% complete...\rRemoving image: 53% complete...\rRemoving image: 54%
> complete...\rRemoving image: 55% complete...\rRemoving   image:
> 56% complete...\rRemoving image: 57% complete...\rRemoving image: 58%
> complete...\rRemoving image: 59% complete...\rRemoving image: 60%
> complete...\rRemoving i  mage: 61% complete...\rRemoving image:
> 62% complete...\rRemoving image: 63% complete...\rRemoving image: 64%
> complete...\rRemoving image: 65% complete...\rRemoving im  age:
> 66% complete...\rRemoving image: 67% complete...\rRemoving image: 68%
> complete...\rRemoving image: 69% complete...\rRemoving image: 70%
> complete...\rRemoving ima  ge: 71% complete...\rRemoving image:
> 72% complete...\rRemoving image: 73% complete...\rRemoving image: 74%
> complete...\rRemoving image: 75% complete...\rRemoving imag  e:
> 76% complete...\rRemoving image: 77% complete...\rRemoving image: 78%
> complete...\rRemoving image: 79% complete...\rRemoving image: 80%
> complete...\rRemoving image  : 81% complete...\rRemoving image:
> 82% complete...\rRemoving image: 83% complete...\rRemoving image: 84%
> complete...\rRemoving image: 85% complete...\rRemoving image:
> 86% complete...\rRemoving image: 87% complete...\rRemoving image: 88%
> complete...\rRemoving image: 89% complete...\rRemoving image: 90%
> complete...\rRemoving image:   91% complete...\rRemoving image:
> 92% complete...\rRemoving image: 93% complete...\rRemoving image: 94%
> complete...\rRemoving image: 95% complete...\rRemoving image: 9
> 6% complete...\rRemoving image: 97% complete...\rRemoving image: 98%
> complete...\rRemoving image: 99% complete...\rRemoving image: 99%
> complete...failed.\n'
> 21270 Stderr: 'rbd: error: image still has watchers\nThis means the
> image is still open or the client using it crashed. Try again after
> closing/unmapping it or waiting 30s   for the crashed client to
> timeout.\n2013-05-09 21:51:27.522986 7f8aca884780 -1 librbd: error
> removing header: (16) Device or resource busy\

Re: [ceph-users] Help! 61.1 killed my monitors in prod

2013-05-13 Thread Joao Eduardo Luis

On 05/13/2013 08:40 PM, Stephen Street wrote:


On May 10, 2013, at 3:39 PM, Joao Eduardo Luis  wrote:


We would certainly be interested in taking a look at logs from
those  monitors, and would appreciate if you could set 'debug mon = 20', 'debug
auth = 10' and 'debug ms = 1', and give them a spin until you hit your
issue.


I seeing the same problem at Jeppesen.  I running 0.61.1 with 3 MON,
4 OSD and 1 MDS and a reboot of the cluster falls in the same state
with hung ceph-create-keys and the monitors not running.  I add the
debug setting as indicated.  This is a excerpt from of the output of
"ceph status


All this shows is that connections from 'ceph' to the monitors are being 
dropped/closed.


Assessing what's going on will require logs from the monitors with the 
same debug levels as stated before.


  -Joao



"2013-05-13 12:37:21.249265 7f8b428a6780  1 -- :/0 messenger.start
2013-05-13 12:37:21.249500 7f8b428a6780  5 adding auth protocol: cephx
2013-05-13 12:37:21.249807 7f8b428a6780  2 auth: KeyRing::load: loaded key file 
/etc/ceph/ceph.client.admin.keyring
2013-05-13 12:37:21.250031 7f8b428a6780  1 -- :/12649 --> 192.168.139.4:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x2ae5b60 con 0x2ae57c0
2013-05-13 12:37:21.250219 7f8b428a4700  0 -- :/12649 >> 192.168.139.4:6789/0 
pipe(0x2ae5560 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:24.249964 7f8b3d918700  1 -- :/12649 mark_down 0x2ae57c0 -- 
0x2ae5560
2013-05-13 12:37:24.250150 7f8b3d918700  1 -- :/12649 --> 192.168.139.3:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34001350 con 0x7f8b34000e60
2013-05-13 12:37:24.250409 7f8b3c115700  0 -- :/12649 >> 192.168.139.3:6789/0 
pipe(0x7f8b34000c00 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:27.250277 7f8b3d918700  1 -- :/12649 mark_down 0x7f8b34000e60 
-- 0x7f8b34000c00
2013-05-13 12:37:27.250374 7f8b3d918700  1 -- :/12649 --> 192.168.139.4:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34003440 con 0x7f8b34003270
2013-05-13 12:37:27.250607 7f8b428a4700  0 -- :/12649 >> 192.168.139.4:6789/0 
pipe(0x7f8b34003010 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:30.250523 7f8b3d918700  1 -- :/12649 mark_down 0x7f8b34003270 
-- 0x7f8b34003010
2013-05-13 12:37:30.250619 7f8b3d918700  1 -- :/12649 --> 192.168.139.2:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34003dc0 con 0x7f8b34003b20
2013-05-13 12:37:30.251151 7f8b3c115700  1 -- 192.168.139.254:0/12649 learned 
my addr 192.168.139.254:0/12649
2013-05-13 12:37:33.250733 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b34003b20 -- 0x7f8b340038c0
2013-05-13 12:37:33.250885 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34002920 
con 0x7f8b340025c0
2013-05-13 12:37:33.251081 7f8b2700  0 -- 192.168.139.254:0/12649 >> 
192.168.139.3:6789/0 pipe(0x7f8b34002360 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:36.251046 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b340025c0 -- 0x7f8b34002360
2013-05-13 12:37:36.251133 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34005010 
con 0x7f8b340030d0
2013-05-13 12:37:36.251376 7f8b428a4700  0 -- 192.168.139.254:0/12649 >> 
192.168.139.4:6789/0 pipe(0x7f8b34002e70 sd=8 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:39.251250 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b340030d0 -- 0x7f8b34002e70
2013-05-13 12:37:39.251347 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34005720 
con 0x7f8b34005480
2013-05-13 12:37:42.251493 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b34005480 -- 0x7f8b34005220
2013-05-13 12:37:42.251614 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340047c0 
con 0x7f8b34004520
2013-05-13 12:37:42.251800 7f8b3c115700  0 -- 192.168.139.254:0/12649 >> 
192.168.139.4:6789/0 pipe(0x7f8b340042c0 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:45.251683 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b34004520 -- 0x7f8b340042c0
2013-05-13 12:37:45.251777 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.2:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b34004c40 
con 0x7f8b340049d0
2013-05-13 12:37:48.251928 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b340049d0 -- 0x7f8b34005d30
2013-05-13 12:37:48.252058 7f8b3d918700  1 -- 192.168.139.254:0/12649 --> 
192.168.139.3:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f8b340052e0 
con 0x7f8b34005040
2013-05-13 12:37:48.252252 7f8b2700  0 -- 192.168.139.254:0/12649 >> 
192.168.139.3:6789/0 pipe(0x7f8b34004de0 sd=7 :0 s=1 pgs=0 cs=0 l=1).fault
2013-05-13 12:37:51.252149 7f8b3d918700  1 -- 192.168.139.254:0/12649 mark_down 
0x7f8b34005040 -- 0x7f8b34004de0
2013-05-13 12:37:51.252236 7f8b3d918700  1 -- 192.168.139.

Re: [ceph-users] RBD Reference Counts for deletion

2013-05-13 Thread Sage Weil
On Mon, 13 May 2013, Mandell Degerness wrote:
> Sorry.  I should have mentioned, this is using the bobtail version of ceph.
> 
> On Mon, May 13, 2013 at 1:13 PM, Mandell Degerness
>  wrote:
> > I know that there was another report of the bad behavior when deleting
> > an RBD that is currently mounted on a host.  My problem is related,
> > but slightly different.
> >
> > We are using openstack and Grizzly Cinder to create a bootable ceph
> > volume.  The instance was booted and all was well.  The server on
> > which the instance had been booted was unplugged.  The user deleted
> > the instance - which amounts to a database update on the Nova side.
> > They then tried to delete the volume, which failed with the following
> > error:
> >
> > Traceback (most recent call last):
> > 21262   File "/usr/lib64/python2.7/site-packages/cinder/volume/driver.py",
> > line 90, in _try_execute
> > 21263 self._execute(*command, **kwargs)
> > 21264   File "/usr/lib64/python2.7/site-packages/cinder/utils.py",
> > line 190, in execute
> > 21265 cmd=' '.join(cmd))
> > 21266 ProcessExecutionError: Unexpected error while running command.
> > 21267 Command: rbd rm --pool rbd volume-66e11621-1c38-4e2d-9d90-cc511013c290
> > 21268 Exit code: 16
> > 21269 Stdout: '\rRemoving image: 1% complete...\rRemoving image: 2%
> > complete...\rRemoving image: 3% complete...\rRemoving image: 4%
> > complete...\rRemoving image: 5% complete  ...\rRemoving image: 6%
> > complete...\rRemoving image: 7% complete...\rRemoving image: 8%
> > complete...\rRemoving image: 9% complete...\rRemoving image: 10%
> > complete...\r  Removing image: 11% complete...\rRemoving image:
> > 12% complete...\rRemoving image: 13% complete...\rRemoving image: 14%
> > complete...\rRemoving image: 15% complete...\rR  emoving image:
> > 16% complete...\rRemoving image: 17% complete...\rRemoving image: 18%
> > complete...\rRemoving image: 19% complete...\rRemoving image: 20%
> > complete...\rRe  moving image: 21% complete...\rRemoving image:
> > 22% complete...\rRemoving image: 23% complete...\rRemoving image: 24%
> > complete...\rRemoving image: 25% complete...\rRem  oving image:
> > 26% complete...\rRemoving image: 27% complete...\rRemoving image: 28%
> > complete...\rRemoving image: 29% complete...\rRemoving image: 30%
> > complete...\rRemo  ving image: 31% complete...\rRemoving image:
> > 32% complete...\rRemoving image: 33% complete...\rRemoving image: 34%
> > complete...\rRemoving image: 35% complete...\rRemov  ing image:
> > 36% complete...\rRemoving image: 37% complete...\rRemoving image: 38%
> > complete...\rRemoving image: 39% complete...\rRemoving image: 40%
> > complete...\rRemovi  ng image: 41% complete...\rRemoving image:
> > 42% complete...\rRemoving image: 43% complete...\rRemoving image: 44%
> > complete...\rRemoving image: 45% complete...\rRemovin  g image:
> > 46% complete...\rRemoving image: 47% complete...\rRemoving image: 48%
> > complete...\rRemoving image: 49% complete...\rRemoving image: 50%
> > complete...\rRemoving   image: 51% complete...\rRemoving image:
> > 52% complete...\rRemoving image: 53% complete...\rRemoving image: 54%
> > complete...\rRemoving image: 55% complete...\rRemoving   image:
> > 56% complete...\rRemoving image: 57% complete...\rRemoving image: 58%
> > complete...\rRemoving image: 59% complete...\rRemoving image: 60%
> > complete...\rRemoving i  mage: 61% complete...\rRemoving image:
> > 62% complete...\rRemoving image: 63% complete...\rRemoving image: 64%
> > complete...\rRemoving image: 65% complete...\rRemoving im  age:
> > 66% complete...\rRemoving image: 67% complete...\rRemoving image: 68%
> > complete...\rRemoving image: 69% complete...\rRemoving image: 70%
> > complete...\rRemoving ima  ge: 71% complete...\rRemoving image:
> > 72% complete...\rRemoving image: 73% complete...\rRemoving image: 74%
> > complete...\rRemoving image: 75% complete...\rRemoving imag  e:
> > 76% complete...\rRemoving image: 77% complete...\rRemoving image: 78%
> > complete...\rRemoving image: 79% complete...\rRemoving image: 80%
> > complete...\rRemoving image  : 81% complete...\rRemoving image:
> > 82% complete...\rRemoving image: 83% complete...\rRemoving image: 84%
> > complete...\rRemoving image: 85% complete...\rRemoving image:
> > 86% complete...\rRemoving image: 87% complete...\rRemoving image: 88%
> > complete...\rRemoving image: 89% complete...\rRemoving image: 90%
> > complete...\rRemoving image:   91% complete...\rRemoving image:
> > 92% complete...\rRemoving image: 93% complete...\rRemoving image: 94%
> > complete...\rRemoving image: 95% complete...\rRemoving image: 9
> > 6% complete...\rRemoving image: 97% complete...\rRemoving image: 98%
> > complete...\rRemoving image: 99% complete...\rRemoving image: 99%
> > complete...failed.\n'
> > 21270 Stderr: 'rbd: error: image still has watchers\nThis means the
> > image is still open or the client using it crashed. T

Re: [ceph-users] Trouble with bobtail->cuttlefish upgrade

2013-05-13 Thread Gregory Farnum
See http://tracker.ceph.com/issues/4974; we're testing the fix out for
a packaged release now.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sat, May 11, 2013 at 12:40 AM, Pawel Stefanski  wrote:
> hello!
>
> I'm trying to upgrade my test cluster to cuttlefish, but I'm stucked with
> mon upgrade.
>
> Bobtail version - 0.56.6 (previous rolling upgrades)
> cuttlefish version - 0.61.1
>
> While starting upgraded mon demon it's faulting on store conversion.
>
> [25622]: (33) Numerical argument out of domain
>
> in log:
>  0> 2013-05-11 09:28:35.073868 7f0687994780 -1 mon/Monitor.cc: In
> function 'void Monitor::StoreConverter::_convert_machines(std::string)'
> thread 7f0687994780 time 2013-05-11 09:28:35.072909
> mon/Monitor.cc: 4413: FAILED assert(0 == "Duplicate GV -- something is
> wrong!")
>
>
> Full stacktrace on:
> https://gist.github.com/anonymous/5559212
>
> What should I do ?
>
> best regards!
> --
> pawel
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD crash during script, 0.56.4

2013-05-13 Thread Gregory Farnum
On Tue, May 7, 2013 at 9:44 AM, Travis Rhoden  wrote:
> Hey folks,
>
> Saw this crash the other day:
>
>  ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
>  1: /usr/bin/ceph-osd() [0x788fba]
>  2: (()+0xfcb0) [0x7f19d1889cb0]
>  3: (gsignal()+0x35) [0x7f19d0248425]
>  4: (abort()+0x17b) [0x7f19d024bb8b]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f19d0b9a69d]
>  6: (()+0xb5846) [0x7f19d0b98846]
>  7: (()+0xb5873) [0x7f19d0b98873]
>  8: (()+0xb596e) [0x7f19d0b9896e]
>  9: (operator new[](unsigned long)+0x47e) [0x7f19d102db1e]
>  10: (ceph::buffer::create(unsigned int)+0x67) [0x834727]
>  11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x834a95]
>  12: (FileStore::read(coll_t, hobject_t const&, unsigned long,
> unsigned long, ceph::buffer::list&)+0x1ae) [0x6fbdde]
>  13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t,
> bool)+0x347) [0x69ac57]
>  14: (PG::chunky_scrub()+0x375) [0x69faf5]
>  15: (PG::scrub()+0x145) [0x6a0e95]
>  16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x6384ec]
>  17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8297e6]
>  18: (ThreadPool::WorkThread::entry()+0x10) [0x82b610]
>  19: (()+0x7e9a) [0x7f19d1881e9a]
>  20: (clone()+0x6d) [0x7f19d0305cbd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
>
> Appears to have gone down during a scrub?
>
> I don't see anything interesting in /var/log/syslog or anywhere else
> at the same time.  It's actually the second time I've seen this exact
> stack trace.  First time was reported here...  (was going to insert
> GMane link, but search.gmane.org appears to be down for me).  Well,
> for those inclined, the thread was titled "question about mon memory
> usage", and was also started by me.
>
> Any thoughts?  I do plan to upgrade to 0.56.6 when I can.  I'm a
> little leery of doing it on a production system without a maintenance
> window, though.  When I went from 0.56.3 --> 0.56.4 on a live system,
> a system using the RBD kernel module kpanic'd.  =)

Do you have a core from when this happened? It was indeed during a
scrub, but it didn't fail an assert or anything — looks like maybe it
tried to allocate too much memory or something... :/
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD crash during script, 0.56.4

2013-05-13 Thread Travis Rhoden
I'm afraid I don't.  I don't think I looked when it happened, and
searching for one just now came up empty.  :/  If it happens again,
I'll be sure to keep my eye out for one.

FWIW, this particular server (1 out of 5) has 8GB *less* RAM than the
others (one bad stick, it seems), and this has happened twice.  But it
still has 40GB for 12 OSDs, so I think it should be plenty.  Thanks
for responding.

 - Travis

On Mon, May 13, 2013 at 4:49 PM, Gregory Farnum  wrote:
> On Tue, May 7, 2013 at 9:44 AM, Travis Rhoden  wrote:
>> Hey folks,
>>
>> Saw this crash the other day:
>>
>>  ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
>>  1: /usr/bin/ceph-osd() [0x788fba]
>>  2: (()+0xfcb0) [0x7f19d1889cb0]
>>  3: (gsignal()+0x35) [0x7f19d0248425]
>>  4: (abort()+0x17b) [0x7f19d024bb8b]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f19d0b9a69d]
>>  6: (()+0xb5846) [0x7f19d0b98846]
>>  7: (()+0xb5873) [0x7f19d0b98873]
>>  8: (()+0xb596e) [0x7f19d0b9896e]
>>  9: (operator new[](unsigned long)+0x47e) [0x7f19d102db1e]
>>  10: (ceph::buffer::create(unsigned int)+0x67) [0x834727]
>>  11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x834a95]
>>  12: (FileStore::read(coll_t, hobject_t const&, unsigned long,
>> unsigned long, ceph::buffer::list&)+0x1ae) [0x6fbdde]
>>  13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t,
>> bool)+0x347) [0x69ac57]
>>  14: (PG::chunky_scrub()+0x375) [0x69faf5]
>>  15: (PG::scrub()+0x145) [0x6a0e95]
>>  16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x6384ec]
>>  17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8297e6]
>>  18: (ThreadPool::WorkThread::entry()+0x10) [0x82b610]
>>  19: (()+0x7e9a) [0x7f19d1881e9a]
>>  20: (clone()+0x6d) [0x7f19d0305cbd]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is
>> needed to interpret this.
>>
>> Appears to have gone down during a scrub?
>>
>> I don't see anything interesting in /var/log/syslog or anywhere else
>> at the same time.  It's actually the second time I've seen this exact
>> stack trace.  First time was reported here...  (was going to insert
>> GMane link, but search.gmane.org appears to be down for me).  Well,
>> for those inclined, the thread was titled "question about mon memory
>> usage", and was also started by me.
>>
>> Any thoughts?  I do plan to upgrade to 0.56.6 when I can.  I'm a
>> little leery of doing it on a production system without a maintenance
>> window, though.  When I went from 0.56.3 --> 0.56.4 on a live system,
>> a system using the RBD kernel module kpanic'd.  =)
>
> Do you have a core from when this happened? It was indeed during a
> scrub, but it didn't fail an assert or anything — looks like maybe it
> tried to allocate too much memory or something... :/
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Trouble with bobtail->cuttlefish upgrade

2013-05-13 Thread Smart Weblications GmbH - Florian Wiessner
On 13.05.2013 22:47, Gregory Farnum wrote:
> See http://tracker.ceph.com/issues/4974; we're testing the fix out for
> a packaged release now.


I see this has been resolved; when will there be a new package for Debian
squeeze?




-- 

Mit freundlichen Grüßen,

Florian Wiessner

Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila

fon.: +49 9282 9638 200
fax.: +49 9282 9638 205
24/7: +49 900 144 000 00 - 0,99 EUR/Min*
http://www.smart-weblications.de

--
Sitz der Gesellschaft: Naila
Geschäftsführer: Florian Wiessner
HRB-Nr.: HRB 3840 Amtsgericht Hof
*aus dem dt. Festnetz, ggf. abweichende Preise aus dem Mobilfunknetz
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] distinguish administratively down OSDs

2013-05-13 Thread Travis Rhoden
Hey folks,

This is either a feature request, or a request for guidance to handle
something that must be common...  =)

I have a cluster with dozens of OSDs, and one started having read
errors (media errors) from the hard disk.  Ceph complained, so I took it
out of service by marking it down and out.  "ceph osd tree" showed it
as down, with a weight of 0 (out).  Perfect.  In the meantime, I RMA'd
the disk.  The replacement is on-hand, but we haven't done the
swap-out yet.  Woohoo, rot in place.  =)

Fast forward a few days, and we had a server failure.  This took a
bunch of OSDs with it, but we were able to bring it back online, though
not before normal recovery operations had started.  The failed
server came back up, and things started to migrate *back*.  All this
is normal.  However, the load was pretty intense, and I actually saw
a few OSDs on *other* servers fail.  Seemingly randomly.  Only 3 or 4.
 Thankfully I was watching for that, and restarted them before hitting
the default 5 minute timeout and kicking off *more* recovery.
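
As an aside, the 5 minute timeout referred to here is presumably the
monitors' down-out interval; a hedged ceph.conf sketch for stretching it
during planned maintenance (assuming the usual option name) would be:

  [mon]
      mon osd down out interval = 600   # seconds a down OSD may stay "in" before being marked out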

On to my question...  During this time where I was watching for newly
down OSDs, I had no way of knowing which OSDs were newly down (and
potentially out), and which was the one I had set down on purpose.  At
least not from the CLI.  I figured it out from some notes I had taken
when I RMA'd the drive, but (sheepishly) not before I tried restarting
the OSD that had a bad hard drive behind it.

So, from the CLI, how could one distinguish OSDs that are down *on
purpose* and should be left that way?

My first thought would be to allow a "note" field to be attached
to an OSD, and have that displayed in the output of "ceph osd tree".
If anyone is familiar with HPC, and specifically with PBS (the pbsnodes
command in particular), this would be similar to "pbsnodes -ln", which
shows notes that an administrator has attached to compute nodes that are
down.  Examples I see from this on one of our current compute clusters
are "bad RAM", "bad scratch disk", "does not POST", etc.

Anyone else want to be able to track such a thing?  Is there an
existing method I could achieve such a goal with?  As things scale to
hundreds of OSDs or more, it seems like a useful thing to note which OSDs
have failed, and why.
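
In the meantime, one rough workaround (just a sketch; the notes file path and
format are invented) is to keep the annotations outside Ceph and join them
against the tree output by hand:

  # /etc/ceph/osd-notes holds one "ID  reason" line per annotated OSD, e.g.
  #   82  bad disk, RMA'd 2013-05-08
  ceph osd tree | grep down | while read -r line; do   # crude filter; adjust to your tree output
      id=$(echo "$line" | awk '{print $1}')
      note=$(awk -v id="$id" '$1 == id { $1 = ""; print }' /etc/ceph/osd-notes 2>/dev/null)
      echo "$line   ${note:-UNEXPECTED DOWN}"
  done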

Thanks,

 - Travis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Trouble with bobtail->cuttlefish upgrade

2013-05-13 Thread Ian Colle
Florian,

It's building now, should be out in a few hours.

Ian R. Colle
Ceph Program Manager
Inktank
Cell: +1.303.601.7713 
Email: i...@inktank.com


Delivering the Future of Storage


On 5/13/13 3:37 PM, "Smart Weblications GmbH - Florian Wiessner"
 wrote:

>Am 13.05.2013 22:47, schrieb Gregory Farnum:
>> See http://tracker.ceph.com/issues/4974; we're testing the fix out for
>> a packaged release now.
>
>
>I see this has been resolved, when will there be a new package for debian
>squeeze ready?
>
>
>
>
>-- 
>
>Mit freundlichen Grüßen,
>
>Florian Wiessner
>
>Smart Weblications GmbH
>Martinsberger Str. 1
>D-95119 Naila
>
>fon.: +49 9282 9638 200
>fax.: +49 9282 9638 205
>24/7: +49 900 144 000 00 - 0,99 EUR/Min*
>http://www.smart-weblications.de
>
>--
>Sitz der Gesellschaft: Naila
>Geschäftsführer: Florian Wiessner
>HRB-Nr.: HRB 3840 Amtsgericht Hof
>*aus dem dt. Festnetz, ggf. abweichende Preise aus dem Mobilfunknetz
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Trouble with bobtail->cuttlefish upgrade

2013-05-13 Thread Smart Weblications GmbH - Florian Wiessner
On 13.05.2013 23:49, Ian Colle wrote:
> Florian,
> 
> It's building now, should be out in a few hours.


thank you.


-- 

Mit freundlichen Grüßen,

Florian Wiessner

Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila

fon.: +49 9282 9638 200
fax.: +49 9282 9638 205
24/7: +49 900 144 000 00 - 0,99 EUR/Min*
http://www.smart-weblications.de

--
Sitz der Gesellschaft: Naila
Geschäftsführer: Florian Wiessner
HRB-Nr.: HRB 3840 Amtsgericht Hof
*aus dem dt. Festnetz, ggf. abweichende Preise aus dem Mobilfunknetz
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! 61.1 killed my monitors in prod

2013-05-13 Thread Stephen Street

Joao,

Thanks for your response.  Sorry for the marginal quality of the original 
e-mail.

Better log information in-line.

On May 13, 2013, at 1:19 PM, Joao Eduardo Luis  wrote:

> On 05/13/2013 08:40 PM, Stephen Street wrote:
>> 
>> On May 10, 2013, at 3:39 PM, Joao Eduardo Luis  wrote:
>> 
>>> We would certainly be interested in taking a look at logs from
>>> those  monitors, and would appreciate if you could set 'debug mon = 20', 
>>> 'debug
>>> auth = 10' and 'debug ms = 1', and give them a spin until you hit your
>>> issue.
>> 
>> I seeing the same problem at Jeppesen.  I running 0.61.1 with 3 MON,
>> 4 OSD and 1 MDS and a reboot of the cluster falls in the same state
>> with hung ceph-create-keys and the monitors not running.  I add the
>> debug setting as indicated.  This is a excerpt from of the output of
>> "ceph status
> 
> All this shows is that connections from 'ceph' to the monitors are being 
> dropped/closed.
> 
> Assessing what's going on will require logs from the monitors with the same 
> debug levels as stated before.
> 

From the logs, it appears that the monitors are struggling to bind to the 
network at system start. If I issue an "initctl restart ceph-mon-all" on all 
nodes running monitors, the system starts functioning correctly.
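
A crude interim workaround along those lines (just a sketch; the address and
service name are taken from the config and commands above) is to delay the
restart until the monitor's address is actually configured:

  # wait for the mon address to appear, then (re)start the monitors
  until ip addr show | grep -q '192\.168\.139\.2/'; do sleep 1; done
  initctl restart ceph-mon-all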

My ceph.conf generated by ceph-deploy (I added the debug settings):

[global]
fsid = f3aaf545-515c-4597-a0c2-2b08a309e944
mon_initial_members = cloud-2, cloud-3, cloud-4
mon_host = 192.168.139.2,192.168.139.3,192.168.139.4
auth_supported = cephx
osd_journal_size = 2048
filestore_xattr_use_omap = true
debug_mon = 20
debug_auth = 10
debug_ms = 1

First MON:

ps aux | grep ceph
root  1295  0.0  0.0  15584  1328 ?S22:11   0:00 initctl emit 
ceph-osd cluster=ceph id=1
root  1304  0.0  0.0  33676  7108 ?Ss   22:11   0:00 
/usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i cloud-2
root  1319  0.0  0.0 313364  4152 ?Sl   22:11   0:00 ceph 
--cluster=ceph --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd 
crush create-or-move -- 1 0.68 root=default host=cloud-2
root  2766  0.0  0.0   9440   956 pts/0S+   22:13   0:00 grep 
--color=auto ceph

root@cloud-2:~# cat /var/log/ceph/ceph-mon.cloud-2.log 
2013-05-13 22:11:16.776048 7f18ddca87c0  0 ceph version 0.61.1 
(56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1302
2013-05-13 22:11:16.783543 7f18ddca87c0 10 needs_conversion
2013-05-13 22:11:16.805741 7f18d9a9d700 -1 asok(0x2cf80e0) AdminSocket: request 
'mon_status' not defined
2013-05-13 22:11:16.814564 7f18ddca87c0 10 obtain_monmap
2013-05-13 22:11:16.814605 7f18ddca87c0 10 obtain_monmap read last committed 
monmap ver 1
2013-05-13 22:11:16.814683 7f18ddca87c0 -1 accepter.accepter.bind unable to 
bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.821721 7fc9c91757c0  0 ceph version 0.61.1 
(56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1338
2013-05-13 22:11:16.822949 7fc9c91757c0 10 needs_conversion
2013-05-13 22:11:16.827517 7fc9c91757c0 10 obtain_monmap
2013-05-13 22:11:16.827563 7fc9c91757c0 10 obtain_monmap read last committed 
monmap ver 1
2013-05-13 22:11:16.827636 7fc9c91757c0 -1 accepter.accepter.bind unable to 
bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.834545 7fb0c3fc27c0  0 ceph version 0.61.1 
(56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1347
2013-05-13 22:11:16.835990 7fb0c3fc27c0 10 needs_conversion
2013-05-13 22:11:16.841349 7fb0c3fc27c0 10 obtain_monmap
2013-05-13 22:11:16.841388 7fb0c3fc27c0 10 obtain_monmap read last committed 
monmap ver 1
2013-05-13 22:11:16.841470 7fb0c3fc27c0 -1 accepter.accepter.bind unable to 
bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.848379 7f593c5037c0  0 ceph version 0.61.1 
(56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1354
2013-05-13 22:11:16.849630 7f593c5037c0 10 needs_conversion
2013-05-13 22:11:16.854367 7f593c5037c0 10 obtain_monmap
2013-05-13 22:11:16.854400 7f593c5037c0 10 obtain_monmap read last committed 
monmap ver 1
2013-05-13 22:11:16.854471 7f593c5037c0 -1 accepter.accepter.bind unable to 
bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.861371 7fe46afba7c0  0 ceph version 0.61.1 
(56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1362
2013-05-13 22:11:16.862730 7fe46afba7c0 10 needs_conversion
2013-05-13 22:11:16.867683 7fe46afba7c0 10 obtain_monmap
2013-05-13 22:11:16.867722 7fe46afba7c0 10 obtain_monmap read last committed 
monmap ver 1
2013-05-13 22:11:16.867804 7fe46afba7c0 -1 accepter.accepter.bind unable to 
bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.874695 7faadb1b57c0  0 ceph version 0.61.1 
(56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1369
2013-05-13 22:11:16.875922 7faadb1b57c0 10 needs_conversion
2013-05-13 22:11:16.880680 7faadb1b57c0 10 obtain_monma

[ceph-users] HEALTH_ERR 14 pgs inconsistent; 18 scrub errors

2013-05-13 Thread James Harper
After replacing a failed hard disk, ceph health reports "HEALTH_ERR 14 pgs 
inconsistent; 18 scrub errors".

The disk was a total loss, so I replaced it, ran mkfs etc. and rebuilt the OSD, 
and while it has resynchronised everything, the above still remains.

What should I do to resolve this?

Thanks

James


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_ERR 14 pgs inconsistent; 18 scrub errors

2013-05-13 Thread Smart Weblications GmbH - Florian Wiessner
On 14.05.2013 01:46, James Harper wrote:
> After replacing a failed harddisk, ceph health reports "HEALTH_ERR 14 pgs 
> inconsistent; 18 scrub errors"
> 
> The disk was a total loss so I replaced it, ran mkfs etc and rebuilt the osd 
> and while it has resynchronised everything the above still remains.
> 
> What should I do to resolve this?
> 

Have you tried to repair the PGs?

You can run "ceph health detail" and then "ceph pg repair <pgid>" to repair the
inconsistent PGs.
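
A rough sketch of that loop (the pg ID below is only an example; take it from
the health output and repeat for each inconsistent pg):

  ceph health detail | grep inconsistent     # lists the affected pg IDs
  ceph pg repair 2.1f                        # repair one pg at a time
  ceph -w                                    # watch for the repair to complete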


-- 

Mit freundlichen Grüßen,

Florian Wiessner

Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila

fon.: +49 9282 9638 200
fax.: +49 9282 9638 205
24/7: +49 900 144 000 00 - 0,99 EUR/Min*
http://www.smart-weblications.de

--
Sitz der Gesellschaft: Naila
Geschäftsführer: Florian Wiessner
HRB-Nr.: HRB 3840 Amtsgericht Hof
*aus dem dt. Festnetz, ggf. abweichende Preise aus dem Mobilfunknetz
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_ERR 14 pgs inconsistent; 18 scrub errors

2013-05-13 Thread Smart Weblications GmbH - Florian Wiessner
On 14.05.2013 02:11, James Harper wrote:
>>
>> Am 14.05.2013 01:46, schrieb James Harper:
>>> After replacing a failed harddisk, ceph health reports "HEALTH_ERR 14 pgs
>> inconsistent; 18 scrub errors"
>>>
>>> The disk was a total loss so I replaced it, ran mkfs etc and rebuilt the osd
>> and while it has resynchronised everything the above still remains.
>>>
>>> What should I do to resolve this?
>>
>> have you tried to repair the pgs?
>>
>> you can do ceph health details and then run ceph pg repair  to repair the
>> inconsistent PGs
>>
> 
> I didn't know how to proceed so I hadn't done anything. I have issues the 
> repair commands now and they are coming good.
> 
> Does this mean I have lost data somewhere? If not, why isn't the repair 
> automatic?
> 


No, the data should be safe if the repair succeeds.




-- 

Mit freundlichen Grüßen,

Florian Wiessner

Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila

fon.: +49 9282 9638 200
fax.: +49 9282 9638 205
24/7: +49 900 144 000 00 - 0,99 EUR/Min*
http://www.smart-weblications.de

--
Sitz der Gesellschaft: Naila
Geschäftsführer: Florian Wiessner
HRB-Nr.: HRB 3840 Amtsgericht Hof
*aus dem dt. Festnetz, ggf. abweichende Preise aus dem Mobilfunknetz
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_ERR 14 pgs inconsistent; 18 scrub errors

2013-05-13 Thread James Harper
> 
> Am 14.05.2013 02:11, schrieb James Harper:
> >>
> >> Am 14.05.2013 01:46, schrieb James Harper:
> >>> After replacing a failed harddisk, ceph health reports "HEALTH_ERR 14
> pgs
> >> inconsistent; 18 scrub errors"
> >>>
> >>> The disk was a total loss so I replaced it, ran mkfs etc and rebuilt the 
> >>> osd
> >> and while it has resynchronised everything the above still remains.
> >>>
> >>> What should I do to resolve this?
> >>
> >> have you tried to repair the pgs?
> >>
> >> you can do ceph health details and then run ceph pg repair  to repair
> the
> >> inconsistent PGs
> >>
> >
> > I didn't know how to proceed so I hadn't done anything. I have issues the
> repair commands now and they are coming good.
> >
> > Does this mean I have lost data somewhere? If not, why isn't the repair
> automatic?
> >
> 
> No the data should be safe if the repair succeeds.
> 

Thanks. The repair has now completed, so I'm all healthy again (or at least I 
would be if one of my mons wasn't running low on disk!).  I'll run a bunch of 
tests to confirm the state of the data. I'm still in the testing phase with 
ceph, so I don't have any data of consequence on there, just a domain controller 
(one of several, so lots of redundancy) which had gone BSoD at some point while 
things were failing, but it has booted up again since and appears to be working.

I'm still seeing this on the client in the logs every 25 minutes or so though:

May 14 10:11:23 bitvs5 kernel: [947275.120247] libceph: osd82 
192.168.200.191:6882 socket closed
May 14 10:11:23 bitvs5 kernel: [947275.121654] libceph: osd82 
192.168.200.191:6882 connect authorization failure

Which appears to have been happening since I rebuilt the osd. Have I missed 
copying a key somewhere?

Client is Debian "wheezy" with 3.2.x kernel, using the kernel client. Could it 
just be that this is old?
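
One quick check (a sketch only; osd.82 and the keyring path come from the log
above plus the usual default layout) is to compare the key the monitors hold
for that OSD with the keyring the rebuilt OSD is actually presenting:

  ceph auth get osd.82                        # key the cluster expects
  cat /var/lib/ceph/osd/ceph-82/keyring       # key on the rebuilt OSD
  # if they differ, delete and re-add the on-disk key:
  ceph auth del osd.82
  ceph auth add osd.82 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-82/keyring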

Thanks again!

James

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unable to get RadosGW working on CentOS 6

2013-05-13 Thread Jeff Bachtel
Environment is CentOS 6.4, Apache, mod_fastcgi (from repoforge, so probably
without the 100-continue patches). I'm attempting to install radosgw on the
2nd mon host.

My setup consistently fails when running s3test.py from
http://wiki.debian.org/OpenStackCephHowto (with appropriate values filled
in, of course. I used /var/www/html instead of /var/www). The radosgw-admin
user and subuser commands on that Howto execute and give expected results.
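
For reference, the failing test is essentially the following boto snippet (a
sketch with placeholder credentials and hostname, mirroring what the linked
s3test.py does):

  import boto
  import boto.s3.connection

  conn = boto.connect_s3(
      aws_access_key_id='ACCESS_KEY',          # placeholder
      aws_secret_access_key='SECRET_KEY',      # placeholder
      host='gateway.example.com',              # placeholder radosgw vhost
      is_secure=False,
      calling_format=boto.s3.connection.OrdinaryCallingFormat(),
  )
  bucket = conn.create_bucket('my-new-bucket')  # this PUT is what returns 500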

Apache error.log throws (repeated):

[Mon May 13 21:49:02 2013] [warn] FastCGI: (dynamic) server
"/var/www/html/s3gw.fcgi" (uid 0, gid 0) restarted (pid 20102)
[Mon May 13 21:49:02 2013] [warn] FastCGI: (dynamic) server
"/var/www/html/s3gw.fcgi" (pid 20102) terminated by calling exit with
status '0'

virtualhost access log throws (repeated):
10.100.2.2 - - [13/May/2013:21:48:55 -0400] "PUT /my-new-bucket/ HTTP/1.1"
500 538 "-" "Boto/2.5.2 (linux2)"
10.100.2.2 - - [13/May/2013:21:49:30 -0400] "PUT /my-new-bucket/ HTTP/1.1"
500 538 "-" "Boto/2.5.2 (linux2)"

virtualhost error log throws (repeated):
[Mon May 13 21:49:51 2013] [error] [client 10.100.2.2] FastCGI: comm with
(dynamic) server "/var/www/html/s3gw.fcgi" aborted: (first read) idle
timeout (20 sec)
[Mon May 13 21:49:51 2013] [error] [client 10.100.2.2] FastCGI: incomplete
headers (0 bytes) received from server "/var/www/html/s3gw.fcgi"

The radosgw log is voluminous because I've got debug ms=1 and debug rgw=20 set,
but the most common error-looking message is about not being able to
obtain a lock on gc (garbage collection, I presume) objects. Excerpt at
http://pastebin.com/zyAXMLjF

radosgw is being spawned, perhaps to excess, by Apache:
apache   32566  0.5  0.0 5394300 13464 ?   Ssl  21:57   0:00
/usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
apache   32689  0.5  0.0 5394300 13472 ?   Ssl  21:57   0:00
/usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
[root@controller2 ceph]# ps auxww | grep ceph | wc -l
237

VirtualHost servername matches fqdn. ceph.conf uses short hostname (both
are in /etc/hosts pointing to same IP).

Any ideas what might be causing the FastCGI errors? I saw similar
problems originally with fcgid, which was what led me to install
mod_fastcgi.
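
(Side note for the archives: the dynamic spawning above means mod_fastcgi is
managing radosgw itself. One common way around that, sketched here with
assumed paths and socket names, is to run radosgw as its own daemon and
declare it as an external FastCGI server:)

# Apache virtual host (mod_fastcgi):
FastCgiExternalServer /var/www/html/s3gw.fcgi -socket /var/run/ceph/radosgw.sock

# /etc/ceph/ceph.conf on the gateway host, matching socket path:
[client.radosgw.gateway]
    rgw socket path = /var/run/ceph/radosgw.sock

# then start radosgw yourself (e.g. service ceph-radosgw start on CentOS)
# rather than letting Apache spawn it.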

Thanks,

Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph monitor crashes

2013-05-13 Thread Mr. NPP
Hello, I'm currently running 0.61 with about 44 OSDs and 4 monitors, one
as a spare.

They are spread across about 6 hosts.

I've been running into an issue where, when one ceph host goes down, the
entire system becomes unusable. Today we recovered from a crashed SSD that
held an OSD's journal, and it was a lot of work to get it back up; I
couldn't get the monitors to come up and establish quorum. I was going to
rebuild a monitor manually, but the documentation for manually (dirty)
removing a monitor using the monmap tool is outdated, and I couldn't find
the /mon-$id/monmap directory.
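
(For the archives: with the cuttlefish mon store there is no mon-$id/monmap
directory any more, so the "dirty" removal has to go through an exported map
and monmaptool. A rough sketch, assuming mon.a is a survivor, mon.d is the
dead one, and that your ceph-mon build has the extract/inject options -- check
ceph-mon --help first:)

service ceph stop mon                          # on the surviving monitor host
ceph-mon -i a --extract-monmap /tmp/monmap     # pull the latest map out of mon.a's store
monmaptool --print /tmp/monmap
monmaptool /tmp/monmap --rm d                  # drop the dead monitor
ceph-mon -i a --inject-monmap /tmp/monmap      # repeat the inject on each surviving mon
service ceph start mon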

Anyway, I recovered eventually and was able to run with 4 monitors. I then
updated the crush map, and that crashed the monitor I was pushing the crush
map to.

It now gives me

[976]: (33) Numerical argument out of domain

when I try to start it manually. I've seen this assert failure before; I'm
just not sure what's causing it.

Below is the log from the crash:
https://docs.google.com/a/nopatentpending.com/file/d/0BwQnRodV8ActNTVFUVpLVjdMSGc/edit

I'm not even really sure if my configs are right; I'm still pretty new at
this.

Below are the configs and the last crush map:

ceph.conf
https://docs.google.com/file/d/0BwQnRodV8Acta3ZfSnBrOU40MW8/edit?usp=sharing

crush.map.txt
https://docs.google.com/file/d/0BwQnRodV8Actbl9hY054Mm9UTXM/edit?usp=sharing

If you need additional dumps from the monitor, I can get them.

thanks
mr.npp
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.61.2 released

2013-05-13 Thread Sage Weil
This release has only two changes: it disables a debug log by default that 
consumes disk space on the monitor, and fixes a bug with upgrading bobtail 
monitor stores with duplicated GV values.  We urge all v0.61.1 users to 
upgrade to avoid filling the monitor data disks.

 * mon: fix conversion of stores with duplicated GV values
 * mon: disable 'mon debug dump transactions' by default

You can get v0.61.2 from the usual places:

 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.61.2.tar.gz
 * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
 * For RPMs, see http://ceph.com/docs/master/install/rpm
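
A typical in-place upgrade on Debian/Ubuntu looks roughly like the following
(a sketch, assuming the ceph.com apt repository from the install docs is
already configured; adjust for sysvinit vs. upstart):

sudo apt-get update && sudo apt-get install ceph ceph-common
sudo /etc/init.d/ceph restart mon      # sysvinit
sudo restart ceph-mon-all              # or, with upstart on Ubuntu
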
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help! 61.1 killed my monitors in prod

2013-05-13 Thread Stephen Street
Joao,

On May 13, 2013, at 3:24 PM, Stephen Street  wrote:

> 
> From the logs, it appears that the monitors are struggling to bind to the 
> network at system start. If I issue an initctl restart ceph-mon-all on all
> nodes running monitors, the system starts functioning correctly.
> 

I found the issue. My nodes have two Ethernet interfaces (eth0 and eth1), and
both are configured to use static DHCP leases. My cluster is configured to use
addresses on eth0 (192.168.139.0/24). The upstart job /etc/init/ceph-all.conf
contains the following line:

start on (local-filesystems and net-device-up IFACE!=lo)

It appears that eth1 emits a net-device-up before eth0, causing the ceph-all
upstart job to begin running before the desired network is available, leading 
to the address binding error seen in the log.  I changed the upstart job line 
to:

start on (local-filesystems and net-device-up IFACE=eth0)

and the cluster cold starts successfully.
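
(A quick way to confirm the changed condition is in effect, assuming the stock
upstart job names:)

initctl show-config ceph-all     # should now show "net-device-up IFACE=eth0"
initctl list | grep ceph         # after a cold boot, the mon/osd jobs should be running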

Thanks for your help
Stephen








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to get RadosGW working on CentOS 6

2013-05-13 Thread Yehuda Sadeh
On Mon, May 13, 2013 at 7:01 PM, Jeff Bachtel
 wrote:
> Environment is CentOS 6.4, Apache, mod_fastcgi (from repoforge, so probably
> without the 100-continue patches). I'm attempting to install radosgw on the
> 2nd mon host.
>
> My setup consistently fails when running s3test.py from
> http://wiki.debian.org/OpenStackCephHowto (with appropriate values filled
> in, of course. I used /var/www/html instead of /var/www). The radosgw-admin
> user and subuser commands on that Howto execute and give expected results.
>
> Apache error.log throws (repeated):
>
> [Mon May 13 21:49:02 2013] [warn] FastCGI: (dynamic) server
> "/var/www/html/s3gw.fcgi" (uid 0, gid 0) restarted (pid 20102)
> [Mon May 13 21:49:02 2013] [warn] FastCGI: (dynamic) server
> "/var/www/html/s3gw.fcgi" (pid 20102) terminated by calling exit with status
> '0'
>
> virtualhost access log throws (repeated):
> 10.100.2.2 - - [13/May/2013:21:48:55 -0400] "PUT /my-new-bucket/ HTTP/1.1"
> 500 538 "-" "Boto/2.5.2 (linux2)"
> 10.100.2.2 - - [13/May/2013:21:49:30 -0400] "PUT /my-new-bucket/ HTTP/1.1"
> 500 538 "-" "Boto/2.5.2 (linux2)"
>
> virtualhost error log throws (repeated):
> [Mon May 13 21:49:51 2013] [error] [client 10.100.2.2] FastCGI: comm with
> (dynamic) server "/var/www/html/s3gw.fcgi" aborted: (first read) idle
> timeout (20 sec)
> [Mon May 13 21:49:51 2013] [error] [client 10.100.2.2] FastCGI: incomplete
> headers (0 bytes) received from server "/var/www/html/s3gw.fcgi"
>
> radosgw log is voluminous because I've got debug ms=1 and debug rgw=20 set,
> but the most common error-looking message is about not being able to
> obtain a lock on gc (garbage collection, I presume) objects. Excerpt at
> http://pastebin.com/zyAXMLjF
>
> radosgw is being spawned, perhaps to excess, by Apache:
> apache   32566  0.5  0.0 5394300 13464 ?   Ssl  21:57   0:00
> /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
> apache   32689  0.5  0.0 5394300 13472 ?   Ssl  21:57   0:00
> /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
> [root@controller2 ceph]# ps auxww | grep ceph | wc -l
> 237
>
> VirtualHost servername matches fqdn. ceph.conf uses short hostname (both are
> in /etc/hosts pointing to same IP).
>
> Any ideas what might be causing the FastCGI errors? I saw similar
> problems originally with fcgid, which was what led me to install
> mod_fastcgi.
>

Try setting 'rgw print continue = false' on your gateway config (and
restart the gateway).
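
For reference, that lives in the gateway's section of ceph.conf; the section
name below matches the -n option visible in your ps output:

[client.radosgw.gateway]
    rgw print continue = false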

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Regd: Ceph-deploy

2013-05-13 Thread Sridhar Mahadevan
Hi,
I am trying to set up ceph using ceph-deploy, following the steps in the
object store quick start guide. When I execute ceph-deploy gatherkeys it
throws the following errors:

Unable to find /etc/ceph/ceph.client.admin.keyring
Unable to find /var/lib/ceph/bootstrap-osd/ceph.keyring
Unable to find /var/lib/ceph/bootstrap-mds/ceph.keyring
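
(Those keyrings are only written out once the monitor(s) are up and have
formed quorum, so gatherkeys has to come after mon create succeeds. A minimal
sequence, with the hostname as a placeholder:)

ceph-deploy new mon1
ceph-deploy install mon1
ceph-deploy mon create mon1
# wait until the monitor is in quorum (check ceph -s on mon1, or its log), then:
ceph-deploy gatherkeys mon1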

Kindly help

Thanks and Regards

-- 
--sridhar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com