[ceph-users] radosgw s3 API feature

2013-10-23 Thread lixuehui
Hi all,
We know that radosgw supports a subset of the Amazon S3 API. Does radosgw 
support server-side encryption when creating an object (the Amazon S3 API 
supports AES256 on the server side)? Or can we use a client-side encryption 
algorithm instead?




lixuehui
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] errors caused by updating ceph version (from 0.56.3 to 0.62)

2013-10-23 Thread

Hi all!

My ceph version is now 0.62; yesterday I updated it from 0.56.3.

I found that the init-ceph script differs a lot between ceph versions.

What confuses me is that I cannot start my cluster with the 0.62 version of 
init-ceph, but I can start it with the 0.56.3 version. What is more, the 0.62 
version's init-ceph can break my osd map, and then I have to rebuild the 
cluster with mkcephfs; when I run ceph osd tree, the output is broken.

By the way, I found that if I do not clean out /run/ceph/, something goes 
wrong and the cluster will not run well. So when I want to rebuild the ceph 
cluster I do:
   # umount /data/osd.id
   # rm -rf /data/*
   # rm -rf /run/ceph/*

Any pointers would be appreciated!

Thanks!
peng





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Crushmap / Bench

2013-10-23 Thread NEVEU Stephane
Hi list,

I'm trying to get the best from my 3 node "low cost" hardware for testing 
purposes :
3 Dell PowerEdge 2950.
Cluster/Public networks both with 2x1Gb LACP (layer3+4 hash)
No MDS running for now.
SAS disks (no ssd), both 1 & 15000 rpm.
Sda = system
Sdb, sdc, sdd, sde = OSDs
Sdf = journal (osd_journal_size = 1024)

My question is: how can I optimize my OSD weights (or something else) for this 
configuration? Is the hdparm -t test below useful for doing so? For instance, 
would it be a better choice to use sdd for the journals?

Some quick tests :

ceph1:
hdparm -t /dev/sdb ; hdparm -t /dev/sdc ; hdparm -t /dev/sdd ; hdparm -t 
/dev/sde ; hdparm -t /dev/sdf
/dev/sdb:
Timing buffered disk reads: 374 MB in  3.00 seconds = 124.62 MB/sec
/dev/sdc:
Timing buffered disk reads: 264 MB in  3.02 seconds =  87.51 MB/sec
/dev/sdd:
Timing buffered disk reads: 470 MB in  3.01 seconds = 156.19 MB/sec
/dev/sde:
Timing buffered disk reads: 264 MB in  3.01 seconds =  87.61 MB/sec
/dev/sdf:
Timing buffered disk reads: 268 MB in  3.01 seconds =  89.00 MB/sec

ceph2:
hdparm -t /dev/sdb ; hdparm -t /dev/sdc ; hdparm -t /dev/sdd ; hdparm -t 
/dev/sde ; hdparm -t /dev/sdf
/dev/sdb:
Timing buffered disk reads: 376 MB in  3.00 seconds = 125.15 MB/sec
/dev/sdc:
Timing buffered disk reads: 264 MB in  3.01 seconds =  87.68 MB/sec
/dev/sdd:
Timing buffered disk reads: 502 MB in  3.01 seconds = 166.71 MB/sec
/dev/sde:
Timing buffered disk reads: 264 MB in  3.02 seconds =  87.55 MB/sec
/dev/sdf:
Timing buffered disk reads: 268 MB in  3.01 seconds =  89.09 MB/sec

ceph3:
hdparm -t /dev/sdb ; hdparm -t /dev/sdc ; hdparm -t /dev/sdd ; hdparm -t 
/dev/sde ; hdparm -t /dev/sdf
/dev/sdb:
Timing buffered disk reads: 376 MB in  3.00 seconds = 125.14 MB/sec
/dev/sdc:
Timing buffered disk reads: 236 MB in  3.00 seconds =  78.64 MB/sec
/dev/sdd:
Timing buffered disk reads: 504 MB in  3.00 seconds = 167.77 MB/sec
/dev/sde:
Timing buffered disk reads: 258 MB in  3.01 seconds =  85.66 MB/sec
/dev/sdf:
Timing buffered disk reads: 274 MB in  3.00 seconds =  91.28 MB/sec

Some tests now with rados bench, and dd on an rbd client mountpoint:

rados bench -p logs 100 write

Total time run:     101.479343
Total writes made:  1626
Write size:         4194304
Bandwidth (MB/sec): 64.092

Average Latency:    0.996858
Max latency:        3.23692
Min latency:        0.106049


rados bench -p logs 100 seq

Total time run:     35.625185
Total reads made:   1626
Read size:          4194304
Bandwidth (MB/sec): 182.567

Average Latency:    0.350297
Max latency:        1.5221
Min latency:        0.048679


DD on a remote rbd mountpoint:
dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 16.8977 s, 63.5 MB/s

dd if=/dev/zero of=here bs=200M count=100 oflag=direct
100+0 records in
100+0 records out
20971520000 bytes (21 GB) copied, 341.549 s, 61.4 MB/s

And my crushmap :
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host ceph1 {
  id -2   # do not change unnecessarily
  # weight 0.880
  alg straw
  hash 0  # rjenkins1
  item osd.0 weight 0.070
  item osd.1 weight 0.270
  item osd.2 weight 0.270
  item osd.3 weight 0.270
}
host ceph2 {
  id -3   # do not change unnecessarily
  # weight 0.880
  alg straw
  hash 0  # rjenkins1
  item osd.4 weight 0.070
  item osd.5 weight 0.270
  item osd.6 weight 0.270
  item osd.7 weight 0.270
}
host ceph3 {
  id -4   # do not change unnecessarily
  # weight 0.880
  alg straw
  hash 0  # rjenkins1
  item osd.8 weight 0.070
  item osd.9 weight 0.270
  item osd.10 weight 0.270
  item osd.11 weight 0.270
}
root default {
  id -1   # do not change unnecessarily
  # weight 2.640
  alg straw
  hash 0  # rjenkins1
  item ceph1 weight 0.880
  item ceph2 weight 0.880
  item ceph3 weight 0.880
}

# rules
rule data {
  ruleset 0
  type replicated
  min_size 1
  max_size 10
  step take default
  step chooseleaf firstn 0 type host
  step emit
}
rule metadata {
  ruleset 1
  type replicated
  min_size 1
  max_size 10
  step take default
  step chooseleaf firstn 0 type host
  step emit
}
rule rbd {
  ruleset 2
  type replicated
  min_size 1
  max_size 10
  step take default
  step chooseleaf firstn 0 type host
  step emit
}

# end crush map
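
What I had in mind for tuning, roughly (the osd-to-disk mapping below is my 
assumption, e.g. osd.3 = sdd on ceph1), is to adjust weights at runtime and 
watch how the data moves:

   # favour the faster sdd-backed OSD
   ceph osd crush reweight osd.3 0.360
   # and lower a slower one accordingly
   ceph osd crush reweight osd.1 0.180
   # then check the result
   ceph osd tree
   ceph -s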





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RE : Balance data on near full osd warning or error

2013-10-23 Thread HURTEVENT VINCENT
Hi,

thank you it's rebalancing now :)




From: Eric Eastman [eri...@aol.com]
Sent: Wednesday, 23 October 2013 01:19
To: HURTEVENT VINCENT; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Balance data on near full osd warning or error

Hello,
What I have used to rebalance my cluster is:

ceph osd reweight-by-utilization
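
If one particular osd keeps sitting near full, you can also override it 
individually (the osd id and weight here are just an example):

   ceph osd reweight 7 0.8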


> we're using a small Ceph cluster with 8 nodes, each with 4 osds. People are
> using it through instances and volumes in an OpenStack platform.
>
> We're facing a HEALTH_ERR with full or near full osds :
>
>   cluster 5942e110-ea2f-4bac-80f7-243fe3e35732
>    health HEALTH_ERR 1 full osd(s); 13 near full osd(s)
>    monmap e1: 3 mons at {0=192.168.73.131:6789/0,1=192.168.73.135:6789/0,2=192.168.73.140:6789/0},
> election epoch 2974, quorum 0,1,2 0,1,2
>    osdmap e4127: 32 osds: 32 up, 32 in full
>    pgmap v6055899: 10304 pgs: 10304 active+clean; 12444 GB data, 24953 GB used,
> 4840 GB / 29793 GB avail
>    mdsmap e792: 1/1/1 up {0=2=up:active}, 2 up:standby

Eric
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw s3 API feature

2013-10-23 Thread Wido den Hollander

On 10/23/2013 09:27 AM, lixuehui wrote:

Hi all,
We know that radosgw supports a subset of the Amazon S3 API. Does radosgw
support server-side encryption when creating an object (the Amazon S3 API
supports AES256 on the server side)? Or can we use a client-side encryption
algorithm instead?


I would always recommend using client-side encryption; that way your 
private key stays local and can never be intercepted.


And the current implementation of the RGW doesn't support server-side 
encryption.
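
A minimal client-side sketch, for example with openssl and s3cmd pointed at 
the radosgw endpoint (the key file and bucket name are made up):

   # encrypt locally with a key that never leaves the client
   openssl enc -aes-256-cbc -salt -in report.pdf -out report.pdf.enc \
       -pass file:/root/rgw-object.key
   s3cmd put report.pdf.enc s3://mybucket/report.pdf.enc

   # later: download and decrypt
   s3cmd get s3://mybucket/report.pdf.enc
   openssl enc -d -aes-256-cbc -in report.pdf.enc -out report.pdf \
       -pass file:/root/rgw-object.key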




lixuehui


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD & Windows Failover Clustering

2013-10-23 Thread Gilles Mocellin

On 22/10/2013 14:38, Damien Churchill wrote:
Yeah, I'd thought of doing it that way, however it would be nice to 
avoid that if possible since the machines in the cluster will be 
running under QEMU using librbd, so it'd be additional overhead having 
to re-export the drives using iSCSI.


Hello,
So, if your cluster nodes are running virtualized with Qemu/KVM, you can 
present each of them with a virtual SCSI drive backed by the same RBD image.

It will behave like a shared LUN on an FC SCSI SAN.




On 22 October 2013 13:36, > wrote:



RBD can be re-published via iSCSI using a gateway host to sit in
between, for example using targetcli.



On 2013-10-22 13:15, Damien Churchill wrote:

Hi,

I was wondering if anyone has had any experience in attempting
to use
a RBD volume as a clustered drive in Windows Failover
Clustering? I'm
getting the impression that it won't work since it needs to be
either
an iSCSI LUN or a SCSI LUN.

Thanks,
Damien
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD & Windows Failover Clustering

2013-10-23 Thread James Harper
> Hello,
> So, if your cluster nodes are running virtualized with Qemu/KVM, you can
> present them a virtual SCSI drive, from the same RBD image.
> It will be like a shared FC SCSI SAN LUN.
> 

You would want to be absolutely sure that neither qemu nor rbd is doing any 
sort of caching, though, for this to work even in principle.

For CSV, qemu would need to support the SCSI commands used for persistent 
reservations, and these are not supported by qemu (or if they are, only as a 
stub). Even for plain failover you probably need this support. The cluster 
validation tool will tell you this pretty quickly.

Use iSCSI.
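
For what it's worth, a rough sketch of such a gateway with LIO/targetcli 
(the pool, image and IQN names are made up, and the exact targetcli syntax 
may differ between versions):

   # on the gateway host: map the RBD image with the kernel client
   rbd map rbd/wincluster-disk            # shows up as e.g. /dev/rbd0
   # export the mapped device as an iSCSI LUN
   targetcli /backstores/block create name=wincluster dev=/dev/rbd0
   targetcli /iscsi create iqn.2013-10.com.example:wincluster
   targetcli /iscsi/iqn.2013-10.com.example:wincluster/tpg1/luns \
       create /backstores/block/wincluster

LIO also implements SCSI-3 persistent reservations, which is what the cluster 
validation wizard checks for.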

James
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] num of placement groups created for default pools

2013-10-23 Thread Snider, Tim
I have a newly created cluster with 68 osds and the default of 2 replicas. The 
default pools are created with 64 placement groups. The documentation at 
http://ceph.com/docs/master/rados/operations/pools/ states, for osd pool 
creation:
"We recommend approximately 50-100 placement groups per OSD to balance out 
memory and CPU requirements and per-OSD load. For a single pool of objects, you 
can use the following formula: Total PGs = (osds * 100) / Replicas"

For this cluster, pools should therefore have about 3400 pgs [(68 * 100) / 2] 
according to the recommendation.
Why isn't the guideline followed for the default pools?
Maybe they're created before all the osds are activated?
Or maybe I'm reading the documentation incorrectly.

/home/ceph/bin# ceph osd getmaxosd
max_osd = 68 in epoch 219
/home/ceph/bin# ceph osd lspools
0 data,1 metadata,2 rbd,
/home/ceph/bin# ceph osd pool get data pg_num
pg_num: 64
/home/ceph/bin# ceph osd pool get data size  
size: 2
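
(For what it's worth, pg_num can also be raised on an existing pool after the 
fact; a sketch using the numbers above:

   ceph osd pool set data pg_num 3400
   ceph osd pool set data pgp_num 3400

pgp_num needs to be raised as well before the new placement groups are 
actually used for data placement.)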

Thanks,
Tim
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy hang on CentOS 6.4

2013-10-23 Thread Alfredo Deza
On Tue, Oct 22, 2013 at 8:31 PM, Gruher, Joseph R
 wrote:
> This was resolved by setting the curl proxy (which conveniently was
> identified as necessary in another email on this list just earlier today).
>
>
>
> Overall I had to directly configure the proxies for wget, rpm and curl
> before I could “ceph-deploy install” completely.  Setting global or user
> proxies (environment variable and/or bashrc) didn’t seem to work for me with
> CentOS, I had to set each proxy individually.
>

Have you tried the `--no-adjust-repos` flag in ceph-deploy? It tells
ceph-deploy to just go and install ceph without attempting to import keys or
do anything with your repos.

Provided that you have your mirrors properly set up, this would be the
recommended way to install ceph if you are in a restrictive environment.

The documentation for this can be found here:
https://github.com/ceph/ceph-deploy#proxy-or-firewall-installs

>
>
> -Joe
>
>
>
> From: Gruher, Joseph R
> Sent: Tuesday, October 22, 2013 1:20 PM
> To: ceph-users@lists.ceph.com
> Subject: ceph-deploy hang on CentOS 6.4
>
>
>
> Hi all-
>
>
>
> Ceph-Deploy 1.2.7 is hanging for me on CentOS 6.4 at this step:
>
>
>
> [joceph01][INFO  ] Running command: rpm -Uvh --replacepkgs
> http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm
>
>
>
> The command runs fine if I execute it myself via SSH with sudo to the target
> system:
>
>
>
> [ceph@joceph05 ceph]$ ssh joceph01 'sudo rpm -Uvh --replacepkgs
> http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm'
>
>
>
> A proxy issue seems like a reasonable suspect but I’ve tried configuring the
> proxy about 1000 different ways, plus the command completes fine when I run
> it myself, either over SSH or locally on the system.  Curious if anyone else
> has had a similar experience and might have hit upon a solution.
>
>
>
> Failing to resolve this issue, are the steps on this page a complete
> alternative to “ceph-deploy install” for CentOS?  I am able to complete
> these steps and run “yum install ceph” but I am wondering if ceph-deploy is
> also performing any other tasks in the “install” step.
>
>
>
> http://ceph.com/docs/master/install/rpm/
>
>
>
> Thanks,
>
> Joe
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Adding a cluster network to a ceph-deploy cluster

2013-10-23 Thread Stefan Schwarz

Hello to all,

I installed a ceph cluster using ceph-deploy and I am quite happy with it.

Now I want to add a cluster network to it. According to "ceph report", 
public_addr and cluster_addr are currently set to the same IP. How can I 
change this now?


Is it safe to add something like this to my ceph.conf:

#( for 1 to 11 )
[osd.1]
  host = ceph-s01
  cluster addr = 10.0.0.1
  public addr = 192.168.55.1

#( for 11 to 23 )
[osd.11]
  host = ceph-s02
  cluster addr = 10.0.0.2
  public addr = 192.168.55.2

I'm a bit scared about crashing my existing osds and losing all my data.

Thanks for your help
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS & Project Manila (OpenStack)

2013-10-23 Thread Dimitri Maziuk

On 2013-10-22 22:41, Gregory Farnum wrote:
...

Right now, unsurprisingly, the focus of the existing Manila developers
is on Option 1: it's less work than the others and supports the most
common storage protocols very well. But as mentioned, it would be a
pretty poor fit for CephFS


I must be missing something, I thought CephFS was supposed to be a 
distributed filesystem which to me means option 1 was the point.


Dima

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG repair failing when object missing

2013-10-23 Thread Harry Harrington
Hi,

I've been taking a look at the repair functionality in ceph. As I understand it, 
the osds should try to copy an object from another member of the pg if it is 
missing. I have been attempting to test this by manually removing a file from 
one of the osds; however, each time the repair completes, the file has not 
been restored. If I run another scrub on the pg it gets flagged as 
inconsistent again. See below for the output from my testing. I assume I'm 
missing something obvious; any insight into this process would be greatly 
appreciated.

Thanks,
Harry

# ceph --version
ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
# ceph status
  cluster a4e417fe-0386-46a5-4475-ca7e10294273
   health HEALTH_OK
   monmap e1: 1 mons at {ceph1=1.2.3.4:6789/0}, election epoch 2, quorum 0 ceph1
   osdmap e13: 3 osds: 3 up, 3 in
    pgmap v232: 192 pgs: 192 active+clean; 44 bytes data, 15465 MB used, 164 GB 
/ 179 GB avail
   mdsmap e1: 0/0/1 up

file removed from osd.2

# ceph pg scrub 0.b
instructing pg 0.b on osd.1 to scrub

# ceph status
  cluster a4e417fe-0386-46a5-4475-ca7e10294273
   health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
   monmap e1: 1 mons at {ceph1=1.2.3.4:6789/0}, election epoch 2, quorum 0 ceph1
   osdmap e13: 3 osds: 3 up, 3 in
    pgmap v233: 192 pgs: 191 active+clean, 1 active+clean+inconsistent; 44 
bytes data, 15465 MB used, 164 GB / 179 GB avail
   mdsmap e1: 0/0/1 up

# ceph pg repair 0.b
instructing pg 0.b on osd.1 to repair

# ceph status
  cluster a4e417fe-0386-46a5-4475-ca7e10294273
   health HEALTH_OK
   monmap e1: 1 mons at {ceph1=1.2.3.4:6789/0}, election epoch 2, quorum 0 ceph1
   osdmap e13: 3 osds: 3 up, 3 in
    pgmap v234: 192 pgs: 192 active+clean; 44 bytes data, 15465 MB used, 164 GB 
/ 179 GB avail
   mdsmap e1: 0/0/1 up

# ceph pg scrub 0.b
instructing pg 0.b on osd.1 to scrub

# ceph status
  cluster a4e417fe-0386-46a5-4475-ca7e10294273
   health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
   monmap e1: 1 mons at {ceph1=1.2.3.4:6789/0}, election epoch 2, quorum 0 ceph1
   osdmap e13: 3 osds: 3 up, 3 in
    pgmap v236: 192 pgs: 191 active+clean, 1 active+clean+inconsistent; 44 
bytes data, 15465 MB used, 164 GB / 179 GB avail
   mdsmap e1: 0/0/1 up



The logs from osd.1:
2013-10-23 14:12:31.188281 7f02a5161700  0 log [ERR] : 0.b osd.2 missing 
3a643fcb/testfile1/head//0
2013-10-23 14:12:31.188312 7f02a5161700  0 log [ERR] : 0.b scrub 1 missing, 0 
inconsistent objects
2013-10-23 14:12:31.188319 7f02a5161700  0 log [ERR] : 0.b scrub 1 errors
2013-10-23 14:13:03.197802 7f02a5161700  0 log [ERR] : 0.b osd.2 missing 
3a643fcb/testfile1/head//0
2013-10-23 14:13:03.197837 7f02a5161700  0 log [ERR] : 0.b repair 1 missing, 0 
inconsistent objects
2013-10-23 14:13:03.197850 7f02a5161700  0 log [ERR] : 0.b repair 1 errors, 1 
fixed
2013-10-23 14:14:47.232953 7f02a5161700  0 log [ERR] : 0.b osd.2 missing 
3a643fcb/testfile1/head//0
2013-10-23 14:14:47.232985 7f02a5161700  0 log [ERR] : 0.b scrub 1 missing, 0 
inconsistent objects
2013-10-23 14:14:47.232991 7f02a5161700  0 log [ERR] : 0.b scrub 1 errors   
  
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding a cluster network to a ceph-deploy cluster

2013-10-23 Thread Loic Dachary
Hi,

Why not just add "cluster network" and "public network" to the [global] section ( 
http://ceph.com/docs/master/rados/configuration/network-config-ref/#ceph-networks
 )? The OSDs will pick up their new addresses when restarted. I did it once ( just once 
;-) and it worked.
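
Something along these lines in ceph.conf, for example (the subnets are guessed 
from the addresses in your mail):

   [global]
       public network  = 192.168.55.0/24
       cluster network = 10.0.0.0/24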

My 2cts

On 23/10/2013 16:33, Stefan Schwarz wrote:
> Hello to all,
> 
> I installed a ceph cluster using ceph-deploy and i am quite happy.
> 
> Now i want to add a cluster network to it. According to "ceph report" 
> public_addr and cluster_addr are set to the same ip. How can i change this 
> now?
> 
> Is it safe to add something like this to my ceph.conf:
> 
> #( for 1 to 11 )
> [osd.1]
>   host = ceph-s01
>   cluster addr = 10.0.0.1
>   public addr = 192.168.55.1
> 
> #( for 11 to 23 )
> [osd.11]
>   host = ceph-s02
>   cluster addr = 10.0.0.2
>   public addr = 192.168.55.2
> 
> I'm a bit scared about crashing my existing osds and losing all my data.
> 
> Thanks for your help
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding a cluster network to a ceph-deploy cluster

2013-10-23 Thread Stefan Schwarz

Hi,

well that was way easier than expected.

Thanks a lot :)

On 10/23/2013 04:57 PM, Loic Dachary wrote:

Hi,

Why not just add cluster network and public network to the global section ( 
http://ceph.com/docs/master/rados/configuration/network-config-ref/#ceph-networks
 ) ? The OSDs will pick their address when restarted. I did it once ( just once 
;-) and it worked.

My 2cts

On 23/10/2013 16:33, Stefan Schwarz wrote:

Hello to all,

I installed a ceph cluster using ceph-deploy and i am quite happy.

Now i want to add a cluster network to it. According to "ceph report" 
public_addr and cluster_addr are set to the same ip. How can i change this now?

Is it safe to add something like this to my ceph.conf:

#( for 1 to 11 )
[osd.1]
   host = ceph-s01
   cluster addr = 10.0.0.1
   public addr = 192.168.55.1

#( for 11 to 23 )
[osd.11]
   host = ceph-s02
   cluster addr = 10.0.0.2
   public addr = 192.168.55.2

I'm a bit scared about crashing my existing osds and losing all my data.

Thanks for your help
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] saucy salamander support?

2013-10-23 Thread LaSalle, Jurvis
On 13/10/22 6:28 PM, "Dan Mick"  wrote:

>/etc/ceph should be installed by the package named 'ceph'.  Make sure
>you're using ceph-deploy install to install the Ceph packages before
>trying to use the machines for mon create.

I'll admit, I did skip that step a couple times in my testing since I did
a purgedata, not a purge. But I also went back and completed every step
verbatim and had the same problem.

Now that I've gone to saucy salamander, ceph-deploy install insists on
trying to grab saucy pkgs, but I was able to use ceph-deploy install
--no-adjust-repos to override that behavior and get the install command to
complete without errors.  The result: still no /etc/ceph directory.

It seems that ceph-deploy purgedata removes the /etc/ceph directory but
leaves the rest of the ceph package files in place, and subsequent runs of
ceph-deploy install detect that the ceph package is installed, so the
/etc/ceph directory is never recreated.  Can anyone who hasn't borked their test
env confirm this behavior?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy hang on CentOS 6.4

2013-10-23 Thread Gruher, Joseph R


>-Original Message-
>From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
>
>Did you tried working with the `--no-adjust-repos` flag in ceph-deploy ? It 
>will
>allow you to tell ceph-deploy to just go and install ceph without attempting to
>import keys or doing anything with your repos.

I have tried this in the past but it caused problems further down the install 
process, I believe due to old or mismatched versions being installed.  That was 
on Ubuntu 12.04.2.  It was discussed a bit on this list at the time.  I would 
not recommend --no-adjust-repos based on my experience.

>The documentation for this can be found here:
>https://github.com/ceph/ceph-deploy#proxy-or-firewall-installs

This doc only mentions setting the wget proxy; I would suggest it be updated to 
note that the curl and rpm proxies may need to be set as well.
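
For reference, the per-tool settings I mean are along these lines (the proxy 
host and port are placeholders):

   # ~/.wgetrc (or /etc/wgetrc)
   http_proxy = http://proxy.example.com:3128/

   # ~/.curlrc
   proxy = "http://proxy.example.com:3128"

   # ~/.rpmmacros
   %_httpproxy proxy.example.com
   %_httpport 3128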

Thanks,
Joe

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] saucy salamander support?

2013-10-23 Thread Gregory Farnum
http://tracker.ceph.com/issues/6485

I don't believe it's in a release yet, but yes, that's the problem and it's
fixed in the ceph-deploy source repo. :)
-Greg

On Wednesday, October 23, 2013, LaSalle, Jurvis wrote:

> On 13/10/22 6:28 PM, "Dan Mick" >
> wrote:
>
> >/etc/ceph should be installed by the package named 'ceph'.  Make sure
> >you're using ceph-deploy install to install the Ceph packages before
> >trying to use the machines for mon create.
>
> I'll admit, I did skip that step a couple times in my testing since I did
> a purgedata, not a purge. But I also went back and completed every step
> verbatim and had the same problem.
>
> Now that I've gone to saucy salamander, ceph-deploy install insists on
> trying to grab saucy pkgs, but I was able to use ceph-deploy install
> --no-adjust-repos to override that behavior and get the install command to
> complete without errors.  The result: still no /etc/ceph directory.
>
> It seems that ceph-deploy purgedata removes the /etc/ceph directory and
> leaves the rest of ceph package files in place, and subsequent runs of
> ceph-deploy install detects that the ceph package is installed, so no
> /etc/ceph directory is recreated.  Can anyone who hasn't borked their test
> env confirm this behavior?
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS & Project Manila (OpenStack)

2013-10-23 Thread Gregory Farnum
On Wed, Oct 23, 2013 at 7:43 AM, Dimitri Maziuk  wrote:
> On 2013-10-22 22:41, Gregory Farnum wrote:
> ...
>
>> Right now, unsurprisingly, the focus of the existing Manila developers
>> is on Option 1: it's less work than the others and supports the most
>> common storage protocols very well. But as mentioned, it would be a
>> pretty poor fit for CephFS
>
>
> I must be missing something, I thought CephFS was supposed to be a
> distributed filesystem which to me means option 1 was the point.

It's a question of infrastructure and security.
1) For a private cloud with flat networking this would probably be
fine, but if you're running per-tenant VLANs then you might not want
to plug all 1000 Ceph IPs into each tenant.
2) Each running VM would need to have a good native CephFS client.
That means either a working FUSE install with ceph-fuse and its
dependencies installed, or a very new Linux kernel — quite different
from NFS or CIFS, which every OS in the world has a good client for.
3) Even if we get our multi-tenancy as good as we can make it, a buggy
or malicious client will still be able to introduce service
interruptions for the tenant in question. (By disappearing without
notification, leaving leases to time out; by grabbing locks and
refusing to release them; etc.) Nobody wants their cloud to be blamed
for a tenant's software issues, and this would cause trouble.
4) As a development shortcut, providing multitenancy via CephFS would
be a lot easier if we can trust the client (that is, the hypervisor
host) to provide some of the security we need.

Make sense, or am I just crazy? :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] saucy salamander support?

2013-10-23 Thread James Page
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 22/10/13 08:51, Mike Lowe wrote:
> And a +1 from me as well.  It would appear that ubuntu has picked
> up the 0.67.4 source and included a build of it in their official
> repo, so you may be able to get by until the next point release
> with those.
> 
> http://packages.ubuntu.com/search?keywords=ceph

Ceph has been a fully supported part of every Ubuntu release since
12.04; Dumpling will be maintained for the life of Saucy and will
receive point release updates as and when they are released.

The packages are interchangeable with those from ceph.com (I try to
keep things in sync as much as possible) so should work OK with
ceph-deploy.

- -- 
James Page
Ubuntu and Debian Developer
james.p...@ubuntu.com
jamesp...@debian.org
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSaBliAAoJEL/srsug59jDv+gP/0jEZo+hndnmrewxc459FlvV
bucfEoCBocFi3lDWjroMX4zmtVC06poJ3poR8QAIIdspaA2PD3u4UcbvANK9Ft/b
TVaw+MYuVmq2wohX7okAI4sma4JVadsamIYvpO4qJlEJ3QYvWtl77s+N2P0qcG6/
RJtyhIGCEVggMmBp4vEyL4NP73q9ddXJS5a9usKpCKbWe/75vZUZ+GaWN4FDi6Ja
Q+6FPY35cJAk2kvhP1bXDvKutFH/0ZjuyDP1ReEiGDNb9eyQxEiSLGZnq97/7gOd
1ORgFbpFeficonsPhYanCHm6Ce6bmYKRLgoqIEpY17p52RbF6z+kfsmguATJ5IEI
V7/6dSYoYaNl5VuaGQ8+z+KvkaW6SDMSgYBWQEY0EFbFrND244xDCp3/0gmgc59h
i6yW+WJbj8Li4KY2Ah58NZLBpCyE+qKAzXorl6F3OhCgapBvqHjScMu0Brlsfskd
vQVPOOa4fhS0obxJEK8IcwbfTC0OF0E4tuErJ8ipP1jKY1khDY6dFM9dzvGnbL9H
76uBB+pbA2j0wPbciEjzVzl8qdafSwOBEf9OCDDAJ17ZuQvGEFGbf8f1uPL1LDYQ
2HRBIUKFv4nGFwzlAIuYMxL0z9dibNqtKgq68MtBJd8WmmAe4/93nQv1KyLlzt4M
fvjqETvQJW5NW6MkyyTu
=qObf
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS & Project Manila (OpenStack)

2013-10-23 Thread Dimitri Maziuk
On 10/23/2013 12:53 PM, Gregory Farnum wrote:
> On Wed, Oct 23, 2013 at 7:43 AM, Dimitri Maziuk  wrote:
>> On 2013-10-22 22:41, Gregory Farnum wrote:
>> ...
>>
>>> Right now, unsurprisingly, the focus of the existing Manila developers
>>> is on Option 1: it's less work than the others and supports the most
>>> common storage protocols very well. But as mentioned, it would be a
>>> pretty poor fit for CephFS
>>
>>
>> I must be missing something, I thought CephFS was supposed to be a
>> distributed filesystem which to me means option 1 was the point.
> 
> It's a question of infrastructure and security.
> 1) For a private cloud with flat networking this would probably be
> fine, but if you're running per-tenant VLANs then you might not want
> to plug all 1000 Ceph IPs into each tenant.

What's a "tenant"? How is he different from "share" in the context of a
filesystem?

Why plug 1000 IPs into it, I thought you only needed an MDS or three  to
mount the filesystem? Now, exporting different filesystems via different
MDSes on top of the same set of OSDs might be useful for spreading the
load, too.

> 2) Each running VM would need to have a good native CephFS client.
> That means either a working FUSE install with ceph-fuse and its
> dependencies installed, or a very new Linux kernel — quite different
> from NFS or CIFS, which every OS in the world has a good client for.

This is a problem you have to face anyway: right now cephfs is unusable
on the rhel family because elrepo's kernel-ml isn't fit for stable
deployments, and I've no idea whether their -lt kernel is, like the stock
el6 kernel, "too old".

I doubt rhel 7 will go any newer than 3.9, though with the number of
patches they normally use their version numbers don't mean much.

No idea what suse & ubuntu lts do, but with rhel family you're looking
at "maybe 3.9 by maybe next summer".

> 3) Even if we get our multi-tenancy as good as we can make it, a buggy
> or malicious client will still be able to introduce service
> interruptions for the tenant in question. (By disappearing without
> notification, leaving leases to time out; by grabbing locks and
> refusing to release them; etc.) Nobody wants their cloud to be blamed
> for a tenant's software issues, and this would cause trouble.
> 4) As a development shortcut, providing multitenancy via CephFS would
> be a lot easier if we can trust the client (that is, the hypervisor
> host) to provide some of the security we need.

There's a fix for that and it's called EULA: "your client breaks our
cloud, we sue the swift out of you". See e.g. http://aws.amazon.com/aup/

You can't trust the client. All you can do is make sure that when e.g.
they kill their MDSes, other tenants' MDSes are not killed. The rest is
essentially a non-technical problem, there's no software pill for those.

Again, I'm likely missing a lot of somethings, so take it with a pound
of salt.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Default PGs

2013-10-23 Thread Gruher, Joseph R
Should osd_pool_default_pg_num and osd_pool_default_pgp_num apply to the 
default pools?  I put them in ceph.conf before creating any OSDs but after 
bringing up the OSDs the default pools are using a value of 64.

Ceph.conf contains these lines in [global]:
osd_pool_default_pgp_num = 800
osd_pool_default_pg_num = 800

After creating and activating OSDs:

[ceph@joceph05 ceph]$ ceph osd pool get data pg_num
pg_num: 64
[ceph@joceph05 ceph]$ ceph osd pool get data pgp_num
pgp_num: 64

[ceph@joceph05 ceph]$ ceph osd dump

pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 
64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 
64 pgp_num 64 last_change 1 owner 0

I have ceph-deploy 1.2.7 and ceph 0.67.4 on CentOS 6.4 with 3.11.6 kernel.

Thanks,
Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS and clients [was: CephFS & Project Manila (OpenStack)]

2013-10-23 Thread Gregory Farnum
On Wed, Oct 23, 2013 at 11:47 AM, Dimitri Maziuk  wrote:
> On 10/23/2013 12:53 PM, Gregory Farnum wrote:
>> On Wed, Oct 23, 2013 at 7:43 AM, Dimitri Maziuk  
>> wrote:
>>> On 2013-10-22 22:41, Gregory Farnum wrote:
>>> ...
>>>
 Right now, unsurprisingly, the focus of the existing Manila developers
 is on Option 1: it's less work than the others and supports the most
 common storage protocols very well. But as mentioned, it would be a
 pretty poor fit for CephFS
>>>
>>>
>>> I must be missing something, I thought CephFS was supposed to be a
>>> distributed filesystem which to me means option 1 was the point.
>>
>> It's a question of infrastructure and security.
>> 1) For a private cloud with flat networking this would probably be
>> fine, but if you're running per-tenant VLANs then you might not want
>> to plug all 1000 Ceph IPs into each tenant.
>
> What's a "tenant"? How is he different from "share" in the context of a
> filesystem?
>
> Why plug 1000 IPs into it, I thought you only needed an MDS or three  to
> mount the filesystem? Now, exporting different filesystems via different
> MDSes on top of the same set of OSDs might be useful for spreading the
> load, too.

Ah, I see. No, each CephFS client needs to communicate with the whole
cluster. Only the POSIX metadata changes flow through the MDS.

>
>> 2) Each running VM would need to have a good native CephFS client.
>> That means either a working FUSE install with ceph-fuse and its
>> dependencies installed, or a very new Linux kernel — quite different
>> from NFS or CIFS, which every OS in the world has a good client for.
>
> This is a problem you have to face anyway: right now cephfs is unusable
> on rhel family because elrepo's kernel-ml isn't fit for stable
> deployments and I've no idea if their -lt is, like the stock el6 kernel,
> "too old".
>
> I doubt rhel 7 will go any newer than 3.9, though with the number of
> patches they normally use their version numbers don't mean much.
>
> No idea what suse & ubuntu lts do, but with rhel family you're looking
> at "maybe 3.9 by maybe next summer".

True. Still, the nature of the problem is different between supporting
one organization with one system, versus supporting the public with
whatever install media they bring in to your cloud.

>> 3) Even if we get our multi-tenancy as good as we can make it, a buggy
>> or malicious client will still be able to introduce service
>> interruptions for the tenant in question. (By disappearing without
>> notification, leaving leases to time out; by grabbing locks and
>> refusing to release them; etc.) Nobody wants their cloud to be blamed
>> for a tenant's software issues, and this would cause trouble.
>> 4) As a development shortcut, providing multitenancy via CephFS would
>> be a lot easier if we can trust the client (that is, the hypervisor
>> host) to provide some of the security we need.
>
> There's a fix for that and it's called EULA: "your client breaks our
> cloud, we sue the swift out of you". See e.g. http://aws.amazon.com/aup/
>
> You can't trust the client. All you can do is make sure that when e.g.
> they kill their MDSes, other tenants' MDSes are not killed. The rest is
> essentially a non-technical problem, there's no software pill for those.

It is better to make such issues technically difficult or impossible,
than to make them legal requirements — being able to sue the guy
running 3 VMs for his side project doesn't do much good if he's
managed to damage somebody else. We need to not *need* to trust the
clients; there are a lot of things we can do in CephFS to make the
attack surface smaller but it is never going to be as small as
something over the NFS protocol.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd client module in centos 6.4

2013-10-23 Thread Gruher, Joseph R
Hi all,

I have CentOS 6.4 with 3.11.6 kernel running (built from latest stable on 
kernel.org) and I cannot load the rbd client module.  Should I have to do 
anything to enable/install it?  Shouldn't it be present in this kernel?

[ceph@joceph05 /]$ cat /etc/centos-release
CentOS release 6.4 (Final)

[ceph@joceph05 /]$ uname -a
Linux joceph05.jf.intel.com 3.11.6 #1 SMP Mon Oct 21 17:23:07 PDT 2013 x86_64 
x86_64 x86_64 GNU/Linux

[ceph@joceph05 /]$ modprobe rbd
FATAL: Module rbd not found.
[ceph@joceph05 /]$
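
One thing I still need to rule out, in case it simply isn't in my build: for a 
hand-built kernel the rbd driver has to be enabled explicitly, so something 
like the following should show it (the path to the build tree is mine):

   grep CONFIG_BLK_DEV_RBD /path/to/kernel-source/.config
   # expect CONFIG_BLK_DEV_RBD=m (or =y); the option is under
   # Device Drivers -> Block devices -> Rados block device (RBD)

If it's unset, the module was never built at all.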

Thanks,
Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD journal size

2013-10-23 Thread Shain Miley
Alfredo,

Do you know which version of ceph-deploy has this updated functionality?

I just updated to 1.2.7 and it does not appear to include it.

Thanks,

Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649


From: ceph-users-boun...@lists.ceph.com [ceph-users-boun...@lists.ceph.com] on 
behalf of Shain Miley [smi...@npr.org]
Sent: Monday, October 21, 2013 6:13 PM
To: Alfredo Deza
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] OSD journal size

Alfredo,

Thanks a lot for the info.

I'll make sure I have an updated version of ceph-deploy and give it another  
shot.

Shain
Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649


From: Alfredo Deza [alfredo.d...@inktank.com]
Sent: Monday, October 21, 2013 2:03 PM
To: Shain Miley
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] OSD journal size

On Mon, Oct 21, 2013 at 1:21 PM, Shain Miley  wrote:
> Hi,
>
> We have been testing a ceph cluster with the following specs:
>
> 3 Mon's
> 72 OSD's spread across 6 Dell R-720xd servers
> 4 TB SAS drives
> 4 bonded 10 GigE NIC ports per server
> 64 GB of RAM
>
> Up until this point we have been running tests using the default journal
> size of '1024'.
> Before we start to place production data on the cluster I was want to clear
> up the following questions I have:
>
> 1)Is there a more appropriate journal size for my setup given the specs
> listed above?
>
> 2)According to this link:
>
> http://www.slideshare.net/Inktank_Ceph/cern-ceph-day-london-2013/11
>
> CERN is using  '/dev/disk/by-path' for their OSD's.
>
> Does ceph-deploy currently support setting up OSD's using this method?

Indeed it does!

`ceph-deploy osd --help` got updated recently to demonstrate how this
needs to be done (an extra step is involved):

For paths, first prepare and then activate:

ceph-deploy osd prepare {osd-node-name}:/path/to/osd
ceph-deploy osd activate {osd-node-name}:/path/to/osd



>
> Thanks,
>
> Shain
>
> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
> smi...@npr.org | 202.513.3649
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS and clients [was: CephFS & Project Manila (OpenStack)]

2013-10-23 Thread Dimitri Maziuk
On 10/23/2013 02:46 PM, Gregory Farnum wrote:

> Ah, I see. No, each CephFS client needs to communicate with the whole
> cluster. Only the POSIX metadata changes flow through the MDS.

Yeah, I thought you'd say that. Back in February I asked if I could get
a cephfs client to read from a specific osd, localhost in my case, and
was given to understand that the whole point of cephfs is that it won't.

> It is better to make such issues technically difficult or impossible,
> than to make them legal requirements — being able to sue the guy
> running 3 VMs for his side project doesn't do much good if he's
> managed to damage somebody else.

Well, you can't, can you? If every client is banging on every osd, the
amount of damage it can potentially do is non-deterministic with upper
bound of "the entire storage infrastructure". At which point suing
anybody won't help indeed.

All I need to do is subvert one "trusted" hypervisor, and then your "the
entire storage infrastructure" is just as dead.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS and clients [was: CephFS & Project Manila (OpenStack)]

2013-10-23 Thread Gregory Farnum
On Wed, Oct 23, 2013 at 1:28 PM, Dimitri Maziuk  wrote:
> On 10/23/2013 02:46 PM, Gregory Farnum wrote:
>
>> Ah, I see. No, each CephFS client needs to communicate with the whole
>> cluster. Only the POSIX metadata changes flow through the MDS.
>
> Yeah, I thought you'd say that. Back in February I asked if I could get
> a cephfs client to read from a specific osd, localhost in my case, and
> was given to understand that the whole point of cephfs is that it won't.
>
>> It is better to make such issues technically difficult or impossible,
>> than to make them legal requirements — being able to sue the guy
>> running 3 VMs for his side project doesn't do much good if he's
>> managed to damage somebody else.
>
> Well, you can't, can you? If every client is banging on every osd, the
> amount of damage it can potentially do is non-deterministic with upper
> bound of "the entire storage infrastructure". At which point suing
> anybody won't help indeed.
>
> All I need to do is subvert one "trusted" hypervisor, and then your "the
> entire storage infrastructure" is just as dead.

Actually, the OSDs are a pretty small attack vector. Buffer overflow
attacks or whatever aside, we have a rich enough capabilities system
to prevent anybody from accessing data not their own in the OSDs, and
although heavy users can increase the latency for everybody else, the
op processing is fair so they can't block access.
(If somebody manages to subvert an OpenStack hypervisor, I believe
they can do a lot worse than bang on the storage cluster!)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw usage

2013-10-23 Thread Derek Yarnell
Hi,

So the problem was that the '.usage' pool was not created.  I haven't
traversed the code well enough yet to know where this pool is supposed
to get created, but it wasn't created even though the option was on.  As soon
as I hand-created the pool, radosgw started logging usage.
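
For the record, "hand created" just means something like the following (the pg 
count is arbitrary, and the ceph.conf section name is the one from the docs; 
yours may differ):

   ceph osd pool create .usage 8

with the usage log already enabled in ceph.conf, e.g.:

   [client.radosgw.gateway]
   rgw enable usage log = true

after which radosgw-admin usage show started returning data.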

Thanks,
derek

-- 
---
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD journal size

2013-10-23 Thread Shain Miley
O.K...I found the help section in 1.2.7 that talks about using paths...however 
I still cannot get this to work:


root@hqceph1:/usr/local/ceph-install-1# ceph-deploy osd prepare 
hqosd1:/dev/disk/by-path/pci-:02:00.0-scsi-0:2:1:0

usage: ceph-deploy osd [-h] [--zap-disk] [--fs-type FS_TYPE] [--dmcrypt]
   [--dmcrypt-key-dir KEYDIR]
   SUBCOMMAND HOST:DISK[:JOURNAL] [HOST:DISK[:JOURNAL]
   ...]
ceph-deploy osd: error: argument HOST:DISK[:JOURNAL]: must be in form 
HOST:DISK[:JOURNAL]


Are '/dev/disk/by-path' names supported, or am I doing something wrong?

Thanks,

Shain



Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649


From: ceph-users-boun...@lists.ceph.com [ceph-users-boun...@lists.ceph.com] on 
behalf of Shain Miley [smi...@npr.org]
Sent: Wednesday, October 23, 2013 4:19 PM
To: Alfredo Deza
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] OSD journal size

Alfredo,

Do you know what version of ceph-deploy has this updated functionality

I just updated to 1.2.7 and it does not appear to include it.

Thanks,

Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649


From: ceph-users-boun...@lists.ceph.com [ceph-users-boun...@lists.ceph.com] on 
behalf of Shain Miley [smi...@npr.org]
Sent: Monday, October 21, 2013 6:13 PM
To: Alfredo Deza
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] OSD journal size

Alfredo,

Thanks a lot for the info.

I'll make sure I have an updated version of ceph-deploy and give it another  
shot.

Shain
Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649


From: Alfredo Deza [alfredo.d...@inktank.com]
Sent: Monday, October 21, 2013 2:03 PM
To: Shain Miley
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] OSD journal size

On Mon, Oct 21, 2013 at 1:21 PM, Shain Miley  wrote:
> Hi,
>
> We have been testing a ceph cluster with the following specs:
>
> 3 Mon's
> 72 OSD's spread across 6 Dell R-720xd servers
> 4 TB SAS drives
> 4 bonded 10 GigE NIC ports per server
> 64 GB of RAM
>
> Up until this point we have been running tests using the default journal
> size of '1024'.
> Before we start to place production data on the cluster I was want to clear
> up the following questions I have:
>
> 1)Is there a more appropriate journal size for my setup given the specs
> listed above?
>
> 2)According to this link:
>
> http://www.slideshare.net/Inktank_Ceph/cern-ceph-day-london-2013/11
>
> CERN is using  '/dev/disk/by-path' for their OSD's.
>
> Does ceph-deploy currently support setting up OSD's using this method?

Indeed it does!

`ceph-deploy osd --help` got updated recently to demonstrate how this
needs to be done (an extra step is involved):

For paths, first prepare and then activate:

ceph-deploy osd prepare {osd-node-name}:/path/to/osd
ceph-deploy osd activate {osd-node-name}:/path/to/osd



>
> Thanks,
>
> Shain
>
> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
> smi...@npr.org | 202.513.3649
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS & Project Manila (OpenStack)

2013-10-23 Thread Kyle Bader
> Option 1) The service plugs your filesystem's IP into the VM's network
> and provides direct IP access. For a shared box (like an NFS server)
> this is fairly straightforward and works well (*everything* has a
> working NFS client). It's more troublesome for CephFS, since we'd need
> to include access to many hosts, lots of operating systems don't
> include good CephFS clients by default, and the client is capable of
> forcing some service disruptions if they misbehave or disappear (most
> likely via lease timeouts), but it may not be impossible.
>

This is going to get horribly ugly when you add neutron into the mix, so
much so that I'd consider this option a non-starter. If someone is using
openvswitch to create network overlays to isolate each tenant, I can't
imagine this ever working.


> Option 2) The hypervisor mediates access to the FS via some
> pass-through filesystem (presumably P9 — Plan 9 FS, which QEMU/KVM is
> already prepared to work with). This works better for us; the
> hypervisor host can have a single CephFS mount that it shares
> selectively to client VMs or something.
>

This seems like the only sane way to do it IMO.


> Option 3) An agent communicates with the client via a well-understood
> protocol (probably NFS) on their VLAN, and to the the backing
> filesystem on a different VLAN in the native protocol. This would also
> work for CephFS, but of course having to use a gateway agent (either
> on a per-tenant or per-many-tenants basis) is a bit of a bummer in
> terms of latency, etc.
>

Again, this is still tricky with neutron and network overlays. You would need
one agent per tenant network and would have to encapsulate the agents' traffic
with openvswitch (STT/VxLAN/etc.) or a physical switch (only VxLAN is supported
in silicon).

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD journal size

2013-10-23 Thread Gruher, Joseph R
Speculating, but it seems possible that the ':' characters in the path are the 
problem, since ':' is also the separator between disk and journal 
(HOST:DISK:JOURNAL).

Perhaps it would work if you enclosed the path in quotes, or used /dev/disk/by-id?
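
i.e. something along the lines of (the device id below is made up):

   ceph-deploy osd prepare hqosd1:/dev/disk/by-id/wwn-0x5000c5004f2a1234

since the by-id names don't contain any ':' characters to confuse the
HOST:DISK[:JOURNAL] parsing.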

>-Original Message-
>From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
>boun...@lists.ceph.com] On Behalf Of Shain Miley
>Sent: Wednesday, October 23, 2013 1:55 PM
>To: Alfredo Deza
>Cc: ceph-us...@ceph.com
>Subject: Re: [ceph-users] OSD journal size
>
>O.K...I found the help section in 1.2.7 that talks about using paths...however 
>I
>still cannot get this to work:
>
>
>root@hqceph1:/usr/local/ceph-install-1# ceph-deploy osd prepare
>hqosd1:/dev/disk/by-path/pci-:02:00.0-scsi-0:2:1:0
>
>usage: ceph-deploy osd [-h] [--zap-disk] [--fs-type FS_TYPE] [--dmcrypt]
>   [--dmcrypt-key-dir KEYDIR]
>   SUBCOMMAND HOST:DISK[:JOURNAL] [HOST:DISK[:JOURNAL]
>   ...]
>ceph-deploy osd: error: argument HOST:DISK[:JOURNAL]: must be in form
>HOST:DISK[:JOURNAL]
>
>
>is '/dev/disk/by-path' names supported...or am I doing something wrong?
>
>Thanks,
>
>Shain
>
>
>
>Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>smi...@npr.org | 202.513.3649
>
>
>From: ceph-users-boun...@lists.ceph.com [ceph-users-
>boun...@lists.ceph.com] on behalf of Shain Miley [smi...@npr.org]
>Sent: Wednesday, October 23, 2013 4:19 PM
>To: Alfredo Deza
>Cc: ceph-us...@ceph.com
>Subject: Re: [ceph-users] OSD journal size
>
>Alfredo,
>
>Do you know what version of ceph-deploy has this updated functionality
>
>I just updated to 1.2.7 and it does not appear to include it.
>
>Thanks,
>
>Shain
>
>Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>smi...@npr.org | 202.513.3649
>
>
>From: ceph-users-boun...@lists.ceph.com [ceph-users-
>boun...@lists.ceph.com] on behalf of Shain Miley [smi...@npr.org]
>Sent: Monday, October 21, 2013 6:13 PM
>To: Alfredo Deza
>Cc: ceph-us...@ceph.com
>Subject: Re: [ceph-users] OSD journal size
>
>Alfredo,
>
>Thanks a lot for the info.
>
>I'll make sure I have an updated version of ceph-deploy and give it another
>shot.
>
>Shain
>Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>smi...@npr.org | 202.513.3649
>
>
>From: Alfredo Deza [alfredo.d...@inktank.com]
>Sent: Monday, October 21, 2013 2:03 PM
>To: Shain Miley
>Cc: ceph-us...@ceph.com
>Subject: Re: [ceph-users] OSD journal size
>
>On Mon, Oct 21, 2013 at 1:21 PM, Shain Miley  wrote:
>> Hi,
>>
>> We have been testing a ceph cluster with the following specs:
>>
>> 3 Mon's
>> 72 OSD's spread across 6 Dell R-720xd servers
>> 4 TB SAS drives
>> 4 bonded 10 GigE NIC ports per server
>> 64 GB of RAM
>>
>> Up until this point we have been running tests using the default
>> journal size of '1024'.
>> Before we start to place production data on the cluster I was want to
>> clear up the following questions I have:
>>
>> 1)Is there a more appropriate journal size for my setup given the
>> specs listed above?
>>
>> 2)According to this link:
>>
>> http://www.slideshare.net/Inktank_Ceph/cern-ceph-day-london-2013/11
>>
>> CERN is using  '/dev/disk/by-path' for their OSD's.
>>
>> Does ceph-deploy currently support setting up OSD's using this method?
>
>Indeed it does!
>
>`ceph-deploy osd --help` got updated recently to demonstrate how this needs
>to be done (an extra step is involved):
>
>For paths, first prepare and then activate:
>
>ceph-deploy osd prepare {osd-node-name}:/path/to/osd
>ceph-deploy osd activate {osd-node-name}:/path/to/osd
>
>
>
>>
>> Thanks,
>>
>> Shain
>>
>> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>> smi...@npr.org | 202.513.3649
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ls/file access hangs on a single ceph directory

2013-10-23 Thread Michael

Trying to gather some more info.

CentOS - hanging ls
[root@srv ~]# cat /proc/14614/stack
[] wait_answer_interruptible+0x81/0xc0 [fuse]
[] fuse_request_send+0x1cb/0x290 [fuse]
[] fuse_do_getattr+0x10c/0x2c0 [fuse]
[] fuse_update_attributes+0x75/0x80 [fuse]
[] fuse_getattr+0x53/0x60 [fuse]
[] vfs_getattr+0x51/0x80
[] vfs_fstatat+0x60/0x80
[] vfs_stat+0x1b/0x20
[] sys_newstat+0x24/0x50
[] system_call_fastpath+0x16/0x1b
[] 0x

Ubuntu - hanging ls
root@srv:~# cat /proc/30012/stack
[] ceph_mdsc_do_request+0xcb/0x1a0 [ceph]
[] ceph_do_getattr+0xe7/0x120 [ceph]
[] ceph_getattr+0x24/0x100 [ceph]
[] vfs_getattr+0x4e/0x80
[] vfs_fstatat+0x4e/0x70
[] vfs_lstat+0x1e/0x20
[] sys_newlstat+0x1a/0x40
[] system_call_fastpath+0x16/0x1b
[] 0x

Started occurring shortly (within an hour or so) after adding a pool, 
not sure if that's relevant yet.


-Michael

On 23/10/2013 21:10, Michael wrote:
I have a filesystem shared by several systems, mounted on 2 ceph nodes 
with a 3rd as a reference monitor.
It had been used for a couple of months, but suddenly the root 
directory of the mount has become inaccessible and requests to files 
in it just hang; there are no ceph errors reported before or after, and 
subdirectories of that directory can still be used (and are currently 
being used by VMs still running from it). It's being mounted in a 
mixed kernel-driver (Ubuntu) and ceph-fuse (CentOS) environment.


 cluster ab3f7bc0-4cf7-4489-9cde-1af11d68a834
   health HEALTH_OK
   monmap e1: 3 mons at 
{srv10=##:6789/0,srv11=##:6789/0,srv8=##:6789/0}, election epoch 96, 
quorum 0,1,2 srv10,srv11,srv8

   osdmap e2873: 6 osds: 6 up, 6 in
   pgmap v2451618: 728 pgs: 728 active+clean; 128 GB data, 260 GB 
used, 3929 GB / 4189 GB avail; 30365B/s wr, 5op/s

   mdsmap e51: 1/1/1 up {0=srv10=up:active}

Have done a full deep scrub/repair cycle on all of the osd which has 
come back fine so not really sure where to start looking to find out 
what's wrong with it.


Any ideas?

-Michael

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-users mail digest settings

2013-10-23 Thread Blair Bethwaite
Hi all,

Can I request that somebody with list admin rights please fixes the digest
settings for this list - I'm regularly receiving 8+ digest messages within
a 24 hour period, not really a "digest" :-).

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ls/file access hangs on a single ceph directory

2013-10-23 Thread Sage Weil
If you do

 ceph mds tell 0 dumpcache /tmp/foo

it will dump the mds cache, and 

 ceph-post-file /tmp/foo

will send the file to ceph.com so we can get some clue what happened.  I 
suspect that restarting the ceph-mds process will resolve the hang.
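
As a minimal sketch of the whole sequence, assuming the sysvinit script shipped 
with 0.6x and the MDS id of srv10 shown in the status output earlier in the 
thread (adjust the id and host to match your cluster):

 ceph mds tell 0 dumpcache /tmp/foo    # dump the active mds's cache to a file
 ceph-post-file /tmp/foo               # upload the dump for the developers
 service ceph restart mds.srv10        # then restart the mds daemon on its host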

Thanks!
sage


On Wed, 23 Oct 2013, Michael wrote:

> Tying to gather some more info.
> 
> CentOS - hanging ls
> [root@srv ~]# cat /proc/14614/stack
> [] wait_answer_interruptible+0x81/0xc0 [fuse]
> [] fuse_request_send+0x1cb/0x290 [fuse]
> [] fuse_do_getattr+0x10c/0x2c0 [fuse]
> [] fuse_update_attributes+0x75/0x80 [fuse]
> [] fuse_getattr+0x53/0x60 [fuse]
> [] vfs_getattr+0x51/0x80
> [] vfs_fstatat+0x60/0x80
> [] vfs_stat+0x1b/0x20
> [] sys_newstat+0x24/0x50
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> 
> Ubuntu - hanging ls
> root@srv:~# cat /proc/30012/stack
> [] ceph_mdsc_do_request+0xcb/0x1a0 [ceph]
> [] ceph_do_getattr+0xe7/0x120 [ceph]
> [] ceph_getattr+0x24/0x100 [ceph]
> [] vfs_getattr+0x4e/0x80
> [] vfs_fstatat+0x4e/0x70
> [] vfs_lstat+0x1e/0x20
> [] sys_newlstat+0x1a/0x40
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> 
> Started occurring shortly (within an hour or so) after adding a pool, not sure
> if that's relevant yet.
> 
> -Michael
> 
> On 23/10/2013 21:10, Michael wrote:
> > I have a filesystem shared by several systems mounted on 2 ceph nodes with a
> > 3rd as a reference monitor.
> > It's been used for a couple of months now but suddenly the root directory
> > for the mount has become inaccessible and requests to files in it just hang,
> > there's no ceph errors reported before/after and subdirectories of the
> > directory can be used (and still are currently being used by VM's still
> > running from it). It's being mounted in a mixed kernel driver (ubuntu) and
> > centos (ceph-fuse) environment.
> > 
> >  cluster ab3f7bc0-4cf7-4489-9cde-1af11d68a834
> >health HEALTH_OK
> >monmap e1: 3 mons at {srv10=##:6789/0,srv11=##:6789/0,srv8=##:6789/0},
> > election epoch 96, quorum 0,1,2 srv10,srv11,srv8
> >osdmap e2873: 6 osds: 6 up, 6 in
> >pgmap v2451618: 728 pgs: 728 active+clean; 128 GB data, 260 GB used, 3929
> > GB / 4189 GB avail; 30365B/s wr, 5op/s
> >mdsmap e51: 1/1/1 up {0=srv10=up:active}
> > 
> > Have done a full deep scrub/repair cycle on all of the osd which has come
> > back fine so not really sure where to start looking to find out what's wrong
> > with it.
> > 
> > Any ideas?
> > 
> > -Michael
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD journal size

2013-10-23 Thread Shain Miley
Joseph,

I suspect the same...I was just wondering if it was supposed to be supported 
using ceph-deploy since CERN had it in their setup.

I was able to use '/dev/disk/by-id', although when I list out the osd mount 
points it still shows sdb,sdc, etc:


root@hqosd1:/dev/disk/by-id# df -h

Filesystem  Size  Used Avail Use% Mounted on
/dev/sdb1   3.7T   36M  3.7T   1% /var/lib/ceph/osd/ceph-0
/dev/sdc1   3.7T   36M  3.7T   1% /var/lib/ceph/osd/ceph-1

I guess I was expecting the mount points to use those 'by-id' names 
instead...but maybe this is expected?
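
This is expected: the /dev/disk/by-id and /dev/disk/by-path entries are just 
udev-managed symlinks to the kernel device nodes, so df reports the resolved 
/dev/sdX name. A quick way to confirm which stable name belongs to each mounted 
device (a sketch; the by-id string below is a placeholder):

 ls -l /dev/disk/by-id | grep 'sdb$'        # which by-id symlinks point at sdb
 readlink -f /dev/disk/by-id/scsi-XXXXXXXX  # resolve a stable name to its sdX node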


Thanks,

Shain


Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649


From: Gruher, Joseph R [joseph.r.gru...@intel.com]
Sent: Wednesday, October 23, 2013 6:32 PM
To: Shain Miley
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] OSD journal size

Speculating, but it seems possible that the ':' in the path is problematic, 
since that is also the separator between disk and journal (HOST:DISK:JOURNAL)?

Perhaps if you enclose it in quotes, or use /dev/disk/by-id?
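
To illustrate the suspicion about the ':' separator (the device strings below 
are placeholders, not a tested recipe): by-path names embed several colons, 
which collide with the HOST:DISK[:JOURNAL] form that ceph-deploy splits on, 
while by-id names contain none.

 hqosd1:/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:1:0   # extra ':' -> ambiguous split
 hqosd1:/dev/disk/by-id/scsi-<serial>                     # no embedded ':', parses as HOST:DISK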

>-Original Message-
>From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
>boun...@lists.ceph.com] On Behalf Of Shain Miley
>Sent: Wednesday, October 23, 2013 1:55 PM
>To: Alfredo Deza
>Cc: ceph-us...@ceph.com
>Subject: Re: [ceph-users] OSD journal size
>
>O.K...I found the help section in 1.2.7 that talks about using paths...however 
>I
>still cannot get this to work:
>
>
>root@hqceph1:/usr/local/ceph-install-1# ceph-deploy osd prepare
>hqosd1:/dev/disk/by-path/pci-:02:00.0-scsi-0:2:1:0
>
>usage: ceph-deploy osd [-h] [--zap-disk] [--fs-type FS_TYPE] [--dmcrypt]
>   [--dmcrypt-key-dir KEYDIR]
>   SUBCOMMAND HOST:DISK[:JOURNAL] [HOST:DISK[:JOURNAL]
>   ...]
>ceph-deploy osd: error: argument HOST:DISK[:JOURNAL]: must be in form
>HOST:DISK[:JOURNAL]
>
>
>are '/dev/disk/by-path' names supported...or am I doing something wrong?
>
>Thanks,
>
>Shain
>
>
>
>Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>smi...@npr.org | 202.513.3649
>
>
>From: ceph-users-boun...@lists.ceph.com [ceph-users-
>boun...@lists.ceph.com] on behalf of Shain Miley [smi...@npr.org]
>Sent: Wednesday, October 23, 2013 4:19 PM
>To: Alfredo Deza
>Cc: ceph-us...@ceph.com
>Subject: Re: [ceph-users] OSD journal size
>
>Alfredo,
>
>Do you know what version of ceph-deploy has this updated functionality?
>
>I just updated to 1.2.7 and it does not appear to include it.
>
>Thanks,
>
>Shain
>
>Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>smi...@npr.org | 202.513.3649
>
>
>From: ceph-users-boun...@lists.ceph.com [ceph-users-
>boun...@lists.ceph.com] on behalf of Shain Miley [smi...@npr.org]
>Sent: Monday, October 21, 2013 6:13 PM
>To: Alfredo Deza
>Cc: ceph-us...@ceph.com
>Subject: Re: [ceph-users] OSD journal size
>
>Alfredo,
>
>Thanks a lot for the info.
>
>I'll make sure I have an updated version of ceph-deploy and give it another
>shot.
>
>Shain
>Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>smi...@npr.org | 202.513.3649
>
>
>From: Alfredo Deza [alfredo.d...@inktank.com]
>Sent: Monday, October 21, 2013 2:03 PM
>To: Shain Miley
>Cc: ceph-us...@ceph.com
>Subject: Re: [ceph-users] OSD journal size
>
>On Mon, Oct 21, 2013 at 1:21 PM, Shain Miley  wrote:
>> Hi,
>>
>> We have been testing a ceph cluster with the following specs:
>>
>> 3 Mon's
>> 72 OSD's spread across 6 Dell R-720xd servers
>> 4 TB SAS drives
>> 4 bonded 10 GigE NIC ports per server
>> 64 GB of RAM
>>
>> Up until this point we have been running tests using the default
>> journal size of '1024'.
>> Before we start to place production data on the cluster I want to
>> clear up the following questions I have:
>>
>> 1)Is there a more appropriate journal size for my setup given the
>> specs listed above?
>>
>> 2)According to this link:
>>
>> http://www.slideshare.net/Inktank_Ceph/cern-ceph-day-london-2013/11
>>
>> CERN is using  '/dev/disk/by-path' for their OSD's.
>>
>> Does ceph-deploy currently support setting up OSD's using this method?
>
>Indeed it does!
>
>`ceph-deploy osd --help` got updated recently to demonstrate how this needs
>to be done (an extra step is involved):
>
>For paths, first prepare and then activate:
>
>ceph-deploy osd prepare {osd-node-name}:/path/to/osd
>ceph-deploy osd activate {osd-node-name}:/path/to/osd
>
>
>
>>
>> Thanks,
>>
>> Shain
>>
>> Shain Miley | Manager of Systems and Infrastructure, Digital Media |
>> smi...@npr.org | 202.513.3649
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>___
>ceph-users mailing
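
For reference, the journal-size question above is not answered in these 
messages. The rule of thumb in the Ceph documentation of this era was 
osd journal size = 2 * (expected throughput * filestore max sync interval). 
A hypothetical worked example for hardware like the above, using an 
illustrative (not measured) per-disk throughput:

 ; assume ~150 MB/s sustained per 4 TB SAS drive and the default
 ; filestore max sync interval of 5 seconds:
 ;   2 * 150 MB/s * 5 s = 1500 MB, so the 1024 MB default is on the small side
 [osd]
 osd journal size = 5120    ; in MB; a few GB per journal is a common choice
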

Re: [ceph-users] CephFS & Project Manila (OpenStack)

2013-10-23 Thread Mark Nelson

On 10/23/2013 01:47 PM, Dimitri Maziuk wrote:

On 10/23/2013 12:53 PM, Gregory Farnum wrote:

On Wed, Oct 23, 2013 at 7:43 AM, Dimitri Maziuk  wrote:

On 2013-10-22 22:41, Gregory Farnum wrote:
...


Right now, unsurprisingly, the focus of the existing Manila developers
is on Option 1: it's less work than the others and supports the most
common storage protocols very well. But as mentioned, it would be a
pretty poor fit for CephFS



I must be missing something, I thought CephFS was supposed to be a
distributed filesystem which to me means option 1 was the point.


It's a question of infrastructure and security.
1) For a private cloud with flat networking this would probably be
fine, but if you're running per-tenant VLANs then you might not want
to plug all 1000 Ceph IPs into each tenant.


What's a "tenant"? How is he different from "share" in the context of a
filesystem?

Why plug 1000 IPs into it? I thought you only needed an MDS or three to
mount the filesystem. Now, exporting different filesystems via different
MDSes on top of the same set of OSDs might be useful for spreading the
load, too.


2) Each running VM would need to have a good native CephFS client.
That means either a working FUSE install with ceph-fuse and its
dependencies installed, or a very new Linux kernel — quite different
from NFS or CIFS, which every OS in the world has a good client for.


This is a problem you have to face anyway: right now cephfs is unusable
on rhel family because elrepo's kernel-ml isn't fit for stable
deployments and I've no idea if their -lt is, like the stock el6 kernel,
"too old".

I doubt rhel 7 will go any newer than 3.9, though with the number of
patches they normally use their version numbers don't mean much.

No idea what suse & ubuntu lts do, but with rhel family you're looking
at "maybe 3.9 by maybe next summer".


AFAIK trusty is going to be using 3.12.




3) Even if we get our multi-tenancy as good as we can make it, a buggy
or malicious client will still be able to introduce service
interruptions for the tenant in question. (By disappearing without
notification, leaving leases to time out; by grabbing locks and
refusing to release them; etc.) Nobody wants their cloud to be blamed
for a tenant's software issues, and this would cause trouble.
4) As a development shortcut, providing multitenancy via CephFS would
be a lot easier if we can trust the client (that is, the hypervisor
host) to provide some of the security we need.


There's a fix for that and it's called EULA: "your client breaks our
cloud, we sue the swift out of you". See e.g. http://aws.amazon.com/aup/

You can't trust the client. All you can do is make sure that when e.g.
they kill their MDSes, other tenants' MDSes are not killed. The rest is
essentially a non-technical problem, there's no software pill for those.

Again, I'm likely missing a lot of somethings, so take it with a pound
of salt.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds laggy or crashed

2013-10-23 Thread Gregory Farnum
Looks like your journal has some bad events in it, probably due to
bugs in the multi-MDS systems. Did you start out this cluster on 67.4,
or has it been upgraded at some point?
Why did you use two active MDS daemons?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Oct 21, 2013 at 7:05 PM, Gagandeep Arora  wrote:
> Hello,
>
> We are running ceph-0.67.4 with two mds and both of the mds daemons are
> crashing, see the logs below:
>
>
> [root@ceph1 ~]# ceph health detail
> HEALTH_ERR mds rank 1 has failed; mds cluster is degraded; mds a is laggy
> mds.1 has failed
> mds cluster is degraded
> mds.a at 192.168.6.101:6808/14609 rank 0 is replaying journal
> mds.a at 192.168.6.101:6808/14609 is laggy/unresponsive
>
>
> [root@ceph1 ~]# ceph mds dump
> dumped mdsmap epoch 19386
> epoch 19386
> flags 0
> created 2013-03-20 08:56:13.873024
> modified 2013-10-22 11:58:31.374700
> tableserver 0
> root 0
> session_timeout 60
> session_autoclose 300
> last_failure 19253
> last_failure_osd_epoch 6648
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding}
> max_mds 2
> in 0,1
> up {0=30}
> failed 1
> stopped
> data_pools 0,13,14
> metadata_pool 1
> 30: 192.168.6.101:6808/14609 'a' mds.0.19 up:replay seq 1 laggy since
> 2013-10-22 11:55:50.972032
>
>
> [root@ceph1 ~]# ceph-mds -i a -d
> 2013-10-22 11:55:28.093342 7f343195f7c0  0 ceph version 0.67.4
> (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-mds, pid 14609
> starting mds.a at :/0
> 2013-10-22 11:55:31.550871 7f342c593700  1 mds.-1.0 handle_mds_map standby
> 2013-10-22 11:55:32.151652 7f342c593700  1 mds.0.19 handle_mds_map i am now
> mds.0.19
> 2013-10-22 11:55:32.151658 7f342c593700  1 mds.0.19 handle_mds_map state
> change up:standby --> up:replay
> 2013-10-22 11:55:32.151661 7f342c593700  1 mds.0.19 replay_start
> 2013-10-22 11:55:32.151673 7f342c593700  1 mds.0.19  recovery set is 1
> 2013-10-22 11:55:32.151675 7f342c593700  1 mds.0.19  need osdmap epoch 6648,
> have 6647
> 2013-10-22 11:55:32.151677 7f342c593700  1 mds.0.19  waiting for osdmap 6648
> (which blacklists prior instance)
> 2013-10-22 11:55:32.275413 7f342c593700  0 mds.0.cache creating system inode
> with ino:100
> 2013-10-22 11:55:32.275720 7f342c593700  0 mds.0.cache creating system inode
> with ino:1
> mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*,
> MDSlaveUpdate*)' thread 7f3428078700 time 2013-10-22 11:55:37.562600
> mds/journal.cc: 1096: FAILED assert(in->first == p->dnfirst ||
> (in->is_multiversion() && in->first > p->dnfirst))
>  ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
>  1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x399d) [0x65b0ad]
>  2: (EUpdate::replay(MDS*)+0x3a) [0x663c0a]
>  3: (MDLog::_replay_thread()+0x5cf) [0x82e17f]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x6393ad]
>  5: (()+0x7d15) [0x7f3430fc2d15]
>  6: (clone()+0x6d) [0x7f342fa3948d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.
> 2013-10-22 11:55:37.563382 7f3428078700 -1 mds/journal.cc: In function 'void
> EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7f3428078700
> time 2013-10-22 11:55:37.562600
> mds/journal.cc: 1096: FAILED assert(in->first == p->dnfirst ||
> (in->is_multiversion() && in->first > p->dnfirst))
>
>  ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
>  1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x399d) [0x65b0ad]
>  2: (EUpdate::replay(MDS*)+0x3a) [0x663c0a]
>  3: (MDLog::_replay_thread()+0x5cf) [0x82e17f]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x6393ad]
>  5: (()+0x7d15) [0x7f3430fc2d15]
>  6: (clone()+0x6d) [0x7f342fa3948d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds laggy or crashed

2013-10-23 Thread Gregory Farnum
[ Adding back the list for archival and general edification. :) ]

On Wed, Oct 23, 2013 at 5:53 PM, Gagandeep Arora  wrote:
> Hello Greg,
>
> mds was running fine for more than a month and last week on Thursday, we
> created a snapshot to test the snapshot functionality of cephfs and the
> snapshot was removed the same day. After that, the mds crashed with the
> laggy status. The cluster was set up with 67.3 and I upgraded it to 67.4 to
> see if it fixes the mds problem, but it doesn't.

Oh dear. The multi-mds and snapshot capabilities are both less stable
than the single-mds, regular-filesystem use case. Combining them is
definitely likely to cause issues and you appear to have hit one. If
it's available to you the easiest course is probably to recreate the
filesystem and try to avoid using those features.
It's conceivable somebody could clean up your FS, but the assert
you're seeing is basically saying "we lost track of some updates at
some point in the past and now we're inconsistent". It's unlikely we
could find the root cause and fixing it is presently not a trivial
matter. :(
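
If recreating the filesystem is the route taken, a rough sketch for a cluster 
of this vintage might look like the following (pool ids 1 and 0 are the 
metadata and data pools from the mds dump earlier in the thread; this discards 
all CephFS metadata, so only do it if the contents are expendable or backed up, 
and the exact syntax should be checked against your ceph version):

 service ceph stop mds                       # stop the mds daemons on each mds host
 ceph mds newfs 1 0 --yes-i-really-mean-it   # re-initialise the fs on metadata pool 1, data pool 0
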
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS & Project Manila (OpenStack)

2013-10-23 Thread Sage Weil
On Wed, 23 Oct 2013, Kyle Bader wrote:
> 
>   Option 1) The service plugs your filesystem's IP into the VM's
>   network
>   and provides direct IP access. For a shared box (like an NFS
>   server)
>   this is fairly straightforward and works well (*everything* has
>   a
>   working NFS client). It's more troublesome for CephFS, since
>   we'd need
>   to include access to many hosts, lots of operating systems don't
>   include good CephFS clients by default, and the client is
>   capable of
>   forcing some service disruptions if they misbehave or disappear
>   (most
>   likely via lease timeouts), but it may not be impossible.
> 
> 
> This is going to get horribly ugly when you add neutron into the mix, so
> much so I'd consider this option a non-starter. If someone is using
> openvswitch to create network overlays to isolate each tenant I can't
> imagine this ever working.

I'm not following here.  Is this only needed if ceph shares the same 
subnet as the VM?  I don't know much about how these things work, but I 
would expect that it would be possible to route IP traffic from a guest 
network to the storage network (or anywhere else, for that matter)...

That aside, however, I think it would be a mistake to take the 
availability of cephfs vs nfs clients as a reason alone for a particular 
architectural approach.  One of the whole points of ceph is that we ignore 
legacy when it doesn't make sense.  (Hence rbd, not iscsi; cephfs, not 
[p]nfs.)

If using manila and cephfs requires that the guest support cephfs, that 
seems perfectly reasonable to me.  It will be awkward to use today, but 
will only get easier over time.
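
For reference, the two guest-side client options look roughly like this (a 
sketch; the monitor address, client name and paths are placeholders, and a 
keyring/secret is assumed to be provisioned already):

 # kernel client (needs a recent kernel):
 mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
 # FUSE client:
 ceph-fuse -m mon1:6789 /mnt/cephfs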

>   Option 2) The hypervisor mediates access to the FS via some
>   pass-through filesystem (presumably P9, i.e. Plan 9 FS, which QEMU/KVM
>   is
>   already prepared to work with). This works better for us; the
>   hypervisor host can have a single CephFS mount that it shares
>   selectively to client VMs or something.
> 
> This seems like the only sane way to do it IMO.

I also think that an fs pass-thru (like virtfs/9p) is worth considering 
(although I prefer option 1 if we only get to choose one).  I'm not sure 
the performance is great, but for many use cases it will be fine.

(Note, however, that this will also depend on recent Linux kernel support 
as I think the 9p/virtfs bits only landed in mainline in ~2011.)
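
A rough sketch of what the virtfs/9p pass-through could look like (hostnames, 
paths, tags and the security model are illustrative assumptions, not a tested 
recipe):

 # hypervisor: mount CephFS once, export a per-tenant subtree to the guest
 mount -t ceph mon1:6789:/tenant-a /srv/shares/tenant-a -o name=admin,secretfile=/etc/ceph/admin.secret
 qemu-system-x86_64 ... \
     -fsdev local,id=share0,path=/srv/shares/tenant-a,security_model=mapped \
     -device virtio-9p-pci,fsdev=share0,mount_tag=tenant_share
 # guest:
 mount -t 9p -o trans=virtio,version=9p2000.L tenant_share /mnt/share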

I'm not too keen on option 3 (nfs proxy on host).

One other thought about multitenancy and DoS from clients: the same thing 
is also possible using NFSv4 delegations.  Possibly even NFS3 locks (not 
sure).  Any stateful fs protocol will have the same issues.  In many 
cases, though, clients/tenants will be using isolated file sets and won't 
be able to interfere with each other even when they misbehave.

Also, in many (private) cloud deployments these security/DoS/etc issues 
aren't a concern.  The capability can still be useful even with a big * 
next to it.  :)

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ls/file access hangs on a single ceph directory

2013-10-23 Thread Yan, Zheng
On Thu, Oct 24, 2013 at 6:44 AM, Michael  wrote:
> Tying to gather some more info.
>
> CentOS - hanging ls
> [root@srv ~]# cat /proc/14614/stack
> [] wait_answer_interruptible+0x81/0xc0 [fuse]
> [] fuse_request_send+0x1cb/0x290 [fuse]
> [] fuse_do_getattr+0x10c/0x2c0 [fuse]
> [] fuse_update_attributes+0x75/0x80 [fuse]
> [] fuse_getattr+0x53/0x60 [fuse]
> [] vfs_getattr+0x51/0x80
> [] vfs_fstatat+0x60/0x80
> [] vfs_stat+0x1b/0x20
> [] sys_newstat+0x24/0x50
> [] system_call_fastpath+0x16/0x1b
> [] 0x
>
> Ubuntu - hanging ls
> root@srv:~# cat /proc/30012/stack
> [] ceph_mdsc_do_request+0xcb/0x1a0 [ceph]
> [] ceph_do_getattr+0xe7/0x120 [ceph]
> [] ceph_getattr+0x24/0x100 [ceph]
> [] vfs_getattr+0x4e/0x80
> [] vfs_fstatat+0x4e/0x70
> [] vfs_lstat+0x1e/0x20
> [] sys_newlstat+0x1a/0x40
> [] system_call_fastpath+0x16/0x1b
> [] 0x
>
> Started occurring shortly (within an hour or so) after adding a pool, not
> sure if that's relevant yet.
>
> -Michael
>
> On 23/10/2013 21:10, Michael wrote:
>>
>> I have a filesystem shared by several systems mounted on 2 ceph nodes with
>> a 3rd as a reference monitor.
>> It's been used for a couple of months now but suddenly the root directory
>> for the mount has become inaccessible and requests to files in it just hang,
>> there's no ceph errors reported before/after and subdirectories of the
>> directory can be used (and still are currently being used by VM's still
>> running from it). It's being mounted in a mixed kernel driver (ubuntu) and
>> centos (ceph-fuse) environment.

Kernel, ceph-fuse and ceph-mds versions? The hang was likely caused by a known
bug in kernel 3.10.
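
A quick way to collect those versions on each client and on the MDS host 
(a sketch; the clients and daemons accept --version):

 uname -r               # kernel version, for kernel-client mounts
 ceph-fuse --version    # fuse client version
 ceph-mds --version     # mds version, run on the mds host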

Regards
Yan, Zheng


>>
>>  cluster ab3f7bc0-4cf7-4489-9cde-1af11d68a834
>>health HEALTH_OK
>>monmap e1: 3 mons at {srv10=##:6789/0,srv11=##:6789/0,srv8=##:6789/0},
>> election epoch 96, quorum 0,1,2 srv10,srv11,srv8
>>osdmap e2873: 6 osds: 6 up, 6 in
>>pgmap v2451618: 728 pgs: 728 active+clean; 128 GB data, 260 GB used,
>> 3929 GB / 4189 GB avail; 30365B/s wr, 5op/s
>>mdsmap e51: 1/1/1 up {0=srv10=up:active}
>>
>> Have done a full deep scrub/repair cycle on all of the osd which has come
>> back fine so not really sure where to start looking to find out what's wrong
>> with it.
>>
>> Any ideas?
>>
>> -Michael
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS & Project Manila (OpenStack)

2013-10-23 Thread Kyle Bader
>> This is going to get horribly ugly when you add neutron into the mix, so
>> much so I'd consider this option a non-starter. If someone is using
>> openvswitch to create network overlays to isolate each tenant I can't
>> imagine this ever working.
>
> I'm not following here.  Are this only needed if ceph shares the same
> subnet as the VM?  I don't know much about how these things work, but I
> would expect that it would be possible to route IP traffic from a guest
> network to the storage network (or anywhere else, for that matter)...
>
> That aside, however, I think it would be a mistake to take the
> availability of cephfs vs nfs clients as a reason alone for a particular
> architectural approach.  One of the whole points of ceph is that we ignore
> legacy when it doesn't make sense.  (Hence rbd, not iscsi; cephfs, not
> [p]nfs.)

In an overlay world, physical VLANs have no relation to virtual
networks. An overlay is literally encapsulating layer 2 inside layer 3
and adding a VNI (virtual network identifier) and using tunnels
(VxLAN, STT, GRE, etc) to connect VMs on disparate hypervisors that
may not even have L2 connectivity to each other.  One of the core
tenants of virtual networks is providing tenants the ability to have
overlapping RFC1918 addressing, in this case you could have tenants
already utilizing the addresses used by the NFS storage at the
physical layer. Even if we could pretend that would never happen
(namespaces or jails maybe?) you would still need to provision a
distinct NFS IP per tenant and run a virtual switch that supports the
tunneling protocol used by the overlay and the southbound API used by
that overlays virtual switch to insert/remove flow information. The
only alternative to embedding a myriad of different virtual switch
protocols on the filer head would be to use a VTEP capable switch for
encapsulation. I think there are only 1-2 vendors that ship these,
Arista's 7150 and something in the Cumulus lineup.  Even if you could
get past all this I'm somewhat terrified by the proposition of
connecting the storage fabric to a tenant network, although this is
a much more acute concern for public clouds.

Here's a good RFC wrt overlays if anyone is in dire need of bedtime reading:

http://tools.ietf.org/html/draft-mity-nvo3-use-case-04

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com