Re: [ceph-users] Number of SSD for OSD journal

2014-12-16 Thread Mike
On 16.12.2014 10:53, Daniel Schwager wrote:
> Hello Mike,
> 
>> There is also another way:
>> * for CONF 2,3, replace the 200GB SSDs with 800GB ones and add another 1-2 SSDs to
>> each node.
>> * make a tier1 read-write cache on the SSDs
>> * you can also add a journal partition on them if you wish - then data
>> will move from SSD to SSD before settling down on the HDDs
>> * on the HDDs you can make an erasure pool or a replica pool
> 
> Do you have some experience (performance?) with SSDs as a caching tier1? Maybe 
> some small benchmarks? From the mailing list, I "feel" that SSD tiering is 
> not used much in production.
> 
> regards
> Danny
> 
> 

No. But I think it's better than using SSDs only for journals. Look at
StorPool or Nutanix (in some ways) - they use SSDs as storage / as a
long-lived cache in front of storage.

Cache pool tiering is a new feature in Ceph, introduced in Firefly.
That explains why cache tiering hasn't been used much in production yet.
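
For anyone who wants to experiment, a minimal sketch of attaching an SSD pool
as a writeback cache tier in Firefly (the pool names here are just examples):

  ceph osd tier add cold-pool hot-ssd-pool
  ceph osd tier cache-mode hot-ssd-pool writeback
  ceph osd tier set-overlay cold-pool hot-ssd-pool
  ceph osd pool set hot-ssd-pool hit_set_type bloom

The sizing knobs (target_max_bytes, cache_target_dirty_ratio, etc.) still need
to be tuned per cluster.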

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to download files from ceph radosgw node using openstack juno swift client.

2014-12-16 Thread Vivek Varghese Cherian
Hi,

On Tue, Dec 16, 2014 at 12:54 PM, pushpesh sharma 
wrote:
>
> Vivek,
>
> The problem is that the swift client is only downloading a chunk of the object, not
> the whole object, hence the etag mismatch. Could you paste the value of
> 'rgw_max_chunk_size'? Please be sure you set this to a sane
> value (<4 MB; at least for the Giant release this works below that value).
>
>
>
Where can I find rgw_max_chunk_size?

I am using ceph firefly.

Regards,
-- 
Vivek Varghese Cherian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to download files from ceph radosgw node using openstack juno swift client.

2014-12-16 Thread Vivek Varghese Cherian
Hi,

root@ppm-c240-ceph3:/var/run/ceph# ceph --admin-daemon
/var/run/ceph/ceph-osd.11.asok config show | less | grep rgw_max_chunk_size
"rgw_max_chunk_size": "524288",
root@ppm-c240-ceph3:/var/run/ceph#

And the value is 524288 bytes (512 KB), which is below 4 MB.
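
In case the chunk size ever does need to be changed, it is an rgw option that can
be set in ceph.conf on the gateway host (and radosgw restarted). A hedged sketch,
with the section name taken from a typical setup and the value purely an example:

  [client.radosgw.gateway]
      rgw max chunk size = 524288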


Regards,
-- 
Vivek Varghese Cherian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dual RADOSGW Network

2014-12-16 Thread Georgios Dimitrakakis

Thanks Craig.

I will try that!

I thought it was more complicated than that because of the entries for 
the "public_network" and "rgw dns name" in the config file...


I will give it a try.

Best,


George




That shouldn't be a problem.  Just have Apache bind to all interfaces
instead of the external IP.

In my case, I only have Apache bound to the internal interface.  My
load balancer has an external and internal IP, and I'm able to talk to
it on both interfaces.
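
For reference, a minimal sketch of the Apache side (the address is just an
example; adjust to your interfaces):

  # in ports.conf / httpd.conf
  Listen 80                 <- binds to all interfaces
  # instead of, say
  # Listen 153.1.2.3:80     <- binds only to the external interface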

On Mon, Dec 15, 2014 at 2:00 PM, Georgios Dimitrakakis  wrote:


Hi all!

I have a single CEPH node which has two network interfaces.

One is configured to be accessed directly by the internet (153.*)
and the other one is configured on an internal LAN (192.*)

For the moment radosgw is listening on the external (internet)
interface.

Can I configure radosgw to be accessed by both interfaces? What I
would like to do is to save bandwidth and time for the machines on
the internal network and use the internal net for all rados
communications.

Any ideas?

Best regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw timeout

2014-12-16 Thread Alejandro de Brito Fontes
I have a 3 node Ceph 0.87 cluster. After a while I see an error in radosgw
and I can't find any references to it in the list archives:


heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7fc4eac2d700' had timed
out after 600


The only solution is to restart radosgw, and then for a while it works just fine.

Any idea?

Thanks




ceph.conf:

[global]
fsid = fc0e2e09-ade3-4ff6-b23e-f789775b2515
mon initial members = nodo-3
mon host = 192.168.2.200, 192.168.2.201, 192.168.2.202
mon addr = 192.168.2.200:6789, 192.168.2.201:6789, 192.168.2.202:6789
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min_size = 1
osd pool default pg_num = 128
osd pool default pgp_num = 128
osd recovery delay start = 15
log file = /dev/stdout
mon clock drift allowed = 1

[client.radosgw.gateway]
host = deis-store-gateway
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/radosgw.log
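
The 600 in the heartbeat_map message is the default rgw op thread timeout for the
RGWProcess thread pool. The related knobs, for anyone who wants to experiment, are
(a hedged sketch, not a confirmed fix for this hang):

[client.radosgw.gateway]
    rgw thread pool size = 200
    rgw op thread timeout = 600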



Full trace:


2014-12-15 21:59:27.976981 7fc70cb1c840  0 ceph version 0.87
(c51c8f9d80fa4e0168aa52685b8de40e42758578), process radosgw, pid 127

2014-12-15 21:59:28.005388 7fc70cb1c840  0 framework: fastcgi

2014-12-15 21:59:28.005393 7fc70cb1c840  0 framework: civetweb

2014-12-15 21:59:28.005398 7fc70cb1c840  0 framework conf key: port, val:
7480

2014-12-15 21:59:28.005402 7fc70cb1c840  0 starting handler: civetweb

2014-12-15 21:59:28.010659 7fc70cb1c840  0 starting handler: fastcgi

2014-12-15 21:59:39.961503 7fc55cd11700  1 == starting new request
req=0x7fc6ec1148e0 =

2014-12-15 21:59:39.965239 7fc55cd11700  1 == req done
req=0x7fc6ec1148e0 http_status=200 ==

2014-12-15 21:59:40.033219 7fc554500700  1 == starting new request
req=0x7fc6ec11c190 =

2014-12-15 21:59:40.038634 7fc554500700  0 WARNING: couldn't find acl
header for object, generating default

2014-12-15 21:59:40.348267 7fc554500700  1 == req done
req=0x7fc6ec11c190 http_status=200 ==

2014-12-15 22:00:42.522831 7fc554500700  1 == starting new request
req=0x7fc6ec11c220 =

2014-12-15 22:00:42.786590 7fc554500700  1 == req done
req=0x7fc6ec11c220 http_status=200 ==

2014-12-15 22:04:41.906676 7fc55cd11700  1 == starting new request
req=0x7fc6ec11c4c0 =

2014-12-15 22:04:42.077969 7fc55cd11700  1 == req done
req=0x7fc6ec11c4c0 http_status=200 ==

2014-12-15 22:09:42.270387 7fc554500700  1 == starting new request
req=0x7fc6ec11bb90 =

2014-12-15 22:09:42.634896 7fc554500700  1 == req done
req=0x7fc6ec11bb90 http_status=200 ==

2014-12-15 22:14:42.812094 7fc554500700  1 == starting new request
req=0x7fc6ec11a2c0 =

2014-12-15 22:14:43.027164 7fc554500700  1 == req done
req=0x7fc6ec11a2c0 http_status=200 ==

2014-12-15 22:19:43.330578 7fc5acdb1700  1 == starting new request
req=0x7fc6ec11a560 =

2014-12-15 22:19:43.505847 7fc5acdb1700  1 == req done
req=0x7fc6ec11a560 http_status=200 ==

2014-12-15 22:24:31.664914 7fc6fb7fe700  0 monclient: hunting for new mon

2014-12-15 22:24:31.691258 7fc70cb14700  0 -- 192.168.2.201:0/1000131 >>
192.168.2.202:6800/1 pipe(0x7fc6f0120610 sd=9 :0 s=1 pgs=0 cs=0 l=1
c=0x7fc6f01208a0).fault

2014-12-15 22:24:43.653020 7fc5acdb1700  1 == starting new request
req=0x7fc6ec11a3b0 =

2014-12-15 22:24:49.093981 7fc55cd11700  1 == starting new request
req=0x7fc6ec119d60 =

2014-12-15 22:24:55.165618 7fc51347e700  1 == starting new request
req=0x7fc6ec121290 =

2014-12-15 22:25:04.181370 7fc57cd51700  1 == starting new request
req=0x7fc6ec125fa0 =

2014-12-15 22:25:11.936946 7fc531cbb700  1 == starting new request
req=0x7fc6ec12ad20 =

2014-12-15 22:25:12.401848 7fc5acdb1700  1 == req done
req=0x7fc6ec11a3b0 http_status=200 ==

2014-12-15 22:25:12.402031 7fc57cd51700  1 == req done
req=0x7fc6ec125fa0 http_status=200 =

2014-12-15 22:25:12.402164 7fc51347e700  1 == req done
req=0x7fc6ec121290 http_status=200 ==

2014-12-15 22:25:12.402286 7fc531cbb700  1 == req done
req=0x7fc6ec12ad20 http_status=200 ==

2014-12-15 22:25:12.574183 7fc55cd11700  1 == req done
req=0x7fc6ec119d60 http_status=200 ==

2014-12-15 22:28:44.138277 7fc531cbb700  1 == starting new request
req=0x7fc6ec12fa80 =

2014-12-15 22:28:44.277586 7fc531cbb700  1 == req done
req=0x7fc6ec12fa80 http_status=200 ==

2014-12-15 22:29:44.023631 7fc531cbb700  1 == starting new request
req=0x7fc6ec11c560 =

2014-12-15 22:29:44.233772 7fc531cbb700  1 == req done
req=0x7fc6ec11c560 http_status=200 ==

2014-12-15 22:34:43.458371 7fc51347e700  1 == starting new request
req=0x7fc6ec119dc0 =

2014-12-15 22:34:43.618785 7fc51347e700  1 == req done
req=0x7fc6ec119dc0 http_status=200 ==

2014-12-15 22:39:43.772838 7fc531cbb700  1 == starting new request
req=0x7fc6ec11c560 =

2014-12-15 22:39:43.954160 7fc5

[ceph-users] OSD Crash makes whole cluster unusable ?

2014-12-16 Thread Christoph Adomeit

Hi there,

today I had an osd crash with ceph 0.87/giant which made my whole cluster 
unusable for 45 minutes.

First it began with a disk error:

sd 0:1:2:0: [sdc] CDB: Read(10)Read(10):: 28 28 00 00 0d 15 fe d0 fd 7b e8 f8 
00 00 00 00 b0 08 00 00
XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5. 

Then most other osds found out that my osd.3 is down:

2014-12-16 08:45:15.873478 mon.0 10.67.1.11:6789/0 3361077 : cluster [INF] 
osd.3 10.67.1.11:6810/713621 failed (42 reports from 35 peers after 23.642482 
>= grace 23.348982) 

5 minutes later the osd is marked as out:
2014-12-16 08:50:21.095903 mon.0 10.67.1.11:6789/0 3361367 : cluster [INF] 
osd.3 out (down for 304.581079) 

However, from 8:45 until 9:20 I had 1000 slow requests and 107 incomplete 
pgs. Many requests were not answered:

2014-12-16 08:46:03.029094 mon.0 10.67.1.11:6789/0 3361126 : cluster [INF] 
pgmap v6930583: 4224 pgs: 4117 active+clean, 107 incomplete; 7647 GB data, 
19090 GB used, 67952 GB / 87042 GB avail; 2307 kB/s rd, 2293 kB/s wr, 407 op/s

Also, recovery to another osd was not starting.

It seems the osd thought it was still up while all other osds thought it was down?
I found this in the log of osd3:
ceph-osd.3.log:2014-12-16 08:45:19.319152 7faf81296700  0 log_channel(default) 
log [WRN] : map e61177 wrongly marked me down
ceph-osd.3.log:  -440> 2014-12-16 08:45:19.319152 7faf81296700  0 
log_channel(default) log [WRN] : map e61177 wrongly marked me down

Luckily I was able to restart osd.3 and everything was working again, but I do 
not understand what happened. The cluster was simply not usable for 45 
minutes.

Any ideas?

Thanks
  Christoph


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Lindsay Mathieson
On Tue, 16 Dec 2014 11:26:35 AM you wrote:
> Is this normal? is ceph just really slow at restoring rbd snapshots,
> or have I really borked my setup?


I'm not looking for a fix or tuning suggestions, just feedback on whether 
this is normal.
-- 
Lindsay

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can not add osd

2014-12-16 Thread Karan Singh
Hi

Your logs do not provide much information. If you are following any other 
documentation for Ceph, I would recommend you follow the official Ceph docs.

http://ceph.com/docs/master/start/quick-start-preflight/




Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


On 16 Dec 2014, at 09:55, yang.bi...@zte.com.cn wrote:

> hi 
> 
> When I execute "ceph-deploy osd prepare node3:/dev/sdb", an error always comes out 
> like this: 
> 
> [node3][WARNIN] INFO:ceph-disk:Running command: /bin/umount -- 
> /var/lib/ceph/tmp/mnt.u2KXW3 
> [node3][WARNIN] umount: /var/lib/ceph/tmp/mnt.u2KXW3: target is busy. 
> 
> Then I execute "/bin/umount -- /var/lib/ceph/tmp/mnt.u2KXW3", and the result is ok. 
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RESOLVED Re: Cluster with pgs in active (unclean) status

2014-12-16 Thread Eneko Lacunza

Hi Gregory,

Sorry for the delay getting back.

There was no activity at all on those 3 pools. Activity on the fourth 
pool was under 1 Mbps of writes.


I think I waited several hours, but I can't recall exactly. One hour at 
least is for sure.


Thanks
Eneko

On 11/12/14 19:32, Gregory Farnum wrote:

Was there any activity against your cluster when you reduced the size
from 3 -> 2? I think maybe it was just taking time to percolate
through the system if nothing else was going on. When you reduced them
to size 1 then data needed to be deleted so everything woke up and
started processing.
-Greg

On Wed, Dec 10, 2014 at 5:27 AM, Eneko Lacunza  wrote:

Hi all,

I fixed the issue with the following commands:
# ceph osd pool set data size 1
(wait some seconds for 64 more pgs to reach active+clean)
# ceph osd pool set data size 2
# ceph osd pool set metadata size 1
(wait some seconds for 64 more pgs to reach active+clean)
# ceph osd pool set metadata size 2
# ceph osd pool set rbd size 1
(wait some seconds for 64 more pgs to reach active+clean)
# ceph osd pool set rbd size 2

This now gives me:
# ceph status
 cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7
  health HEALTH_OK
  monmap e3: 3 mons at
{0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election epoch 32,
quorum 0,1,2 1,2,0
  osdmap e275: 2 osds: 2 up, 2 in
   pgmap v395557: 256 pgs, 4 pools, 194 GB data, 49820 objects
 388 GB used, 116 GB / 505 GB avail
  256 active+clean

I'm still curious whether this can be fixed without this trick?

Cheers
Eneko


On 10/12/14 13:14, Eneko Lacunza wrote:

Hi all,

I have a small ceph cluster with just 2 OSDs, latest firefly.

Default data, metadata and rbd pools were created with size=3 and
min_size=1
An additional pool rbd2 was created with size=2 and min_size=1

This would give me a warning status, saying that 64 pgs were active+clean
and 192 active+degraded (there are 64 pgs per pool).

I realized it was due to the size=3 in the three pools, so I changed that
value to 2:
# ceph osd pool set data size 2
# ceph osd pool set metadata size 2
# ceph osd pool set rbd size 2

Those 3 pools are empty. After those commands status would report 64 pgs
active+clean, and 192 pgs active, with a warning saying 192 pgs were
unclean.

I have created a rbd block with:
rbd create -p rbd --image test --size 1024

And now the status is:
# ceph status
 cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7
  health HEALTH_WARN 192 pgs stuck unclean; recovery 2/99640 objects
degraded (0.002%)
  monmap e3: 3 mons at
{0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election epoch 32,
quorum 0,1,2 1,2,0
  osdmap e263: 2 osds: 2 up, 2 in
   pgmap v393763: 256 pgs, 4 pools, 194 GB data, 49820 objects
 388 GB used, 116 GB / 505 GB avail
 2/99640 objects degraded (0.002%)
  192 active
   64 active+clean

Looking at an unclean non-empty pg:
# ceph pg 2.14 query
{ "state": "active",
   "epoch": 263,
   "up": [
 0,
 1],
   "acting": [
 0,
 1],
   "actingbackfill": [
 "0",
 "1"],
   "info": { "pgid": "2.14",
   "last_update": "263'1",
   "last_complete": "263'1",
   "log_tail": "0'0",
   "last_user_version": 1,
   "last_backfill": "MAX",
   "purged_snaps": "[]",
   "history": { "epoch_created": 1,
   "last_epoch_started": 136,
   "last_epoch_clean": 136,
   "last_epoch_split": 0,
   "same_up_since": 135,
   "same_interval_since": 135,
   "same_primary_since": 11,
   "last_scrub": "0'0",
   "last_scrub_stamp": "2014-11-26 12:23:57.023493",
   "last_deep_scrub": "0'0",
   "last_deep_scrub_stamp": "2014-11-26 12:23:57.023493",
   "last_clean_scrub_stamp": "0.00"},
   "stats": { "version": "263'1",
   "reported_seq": "306",
   "reported_epoch": "263",
   "state": "active",
   "last_fresh": "2014-12-10 12:53:37.766465",
   "last_change": "2014-12-10 10:32:24.189000",
   "last_active": "2014-12-10 12:53:37.766465",
   "last_clean": "0.00",
   "last_became_active": "0.00",
   "last_unstale": "2014-12-10 12:53:37.766465",
   "mapping_epoch": 128,
   "log_start": "0'0",
   "ondisk_log_start": "0'0",
   "created": 1,
   "last_epoch_clean": 136,
   "parent": "0.0",
   "parent_split_bits": 0,
   "last_scrub": "0'0",
   "last_scrub_stamp": "2014-11-26 12:23:57.023493",
   "last_deep_scrub": "0'0",
   "last_deep_scrub_stamp": "2014-11-26 12:23:57.023493",
   "last_clean_scrub_stamp": "0.00",
   "log_size": 1,
   "ondisk_log_size": 1,
   "stats_invalid": "0",
   "stat_sum": { "num_bytes": 112,
   "num_objects": 1,
   "num_object_clones": 0,

Re: [ceph-users] Number of SSD for OSD journal

2014-12-16 Thread Christian Balzer
On Tue, 16 Dec 2014 12:10:42 +0300 Mike wrote:

> On 16.12.2014 10:53, Daniel Schwager wrote:
> > Hello Mike,
> > 
> >> There is also another way:
> >> * for CONF 2,3, replace the 200GB SSDs with 800GB ones and add another 1-2 SSDs to
> >> each node.
> >> * make a tier1 read-write cache on the SSDs
> >> * you can also add a journal partition on them if you wish - then data
> >> will move from SSD to SSD before settling down on the HDDs
> >> * on the HDDs you can make an erasure pool or a replica pool
> > 
> > Do you have some experience (performance?) with SSDs as a caching
> > tier1? Maybe some small benchmarks? From the mailing list, I "feel"
> > that SSD tiering is not used much in production.
> > 
> > regards
> > Danny
> > 
> > 
> 
> No. But I think it's better than using SSDs only for journals. Look at
> StorPool or Nutanix (in some ways) - they use SSDs as storage / as a
> long-lived cache in front of storage.
> 
Unfortunately a promising design doesn't make a well rounded working
solution. 

> Cache pool tiering is a new feature in Ceph, introduced in Firefly.
> That explains why cache tiering hasn't been used much in production yet.
>
If you'd followed the various discussions here, you'd know that SSD based
cache tiers are pointless (from a performance perspective) in Firefly and
still riddled with bugs in Giant with only minor improvements. 

They show great promise/potential and I'm looking forward to use them, but
right now (and probably for the next 1-2 releases) the best bang for the
buck in speeding up Ceph is classic SSD journals for writes and lots of
RAM for reads.
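
For completeness, a minimal sketch of what "classic SSD journals" look like at
deployment time (host and device names are just examples; ceph-disk will carve a
journal partition out of the SSD):

  ceph-deploy osd create node1:/dev/sdb:/dev/sdf

where /dev/sdb is the data HDD and /dev/sdf the shared journal SSD.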
 
Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Carl-Johan Schenström

On 2014-12-16 14:53, Lindsay Mathieson wrote:


Is this normal? is ceph just really slow at restoring rbd snapshots,
or have I really borked my setup?


I'm not looking for a fix or a tuning suggestions, just feedback on whether
this is normal


That is my experience as well. I rolled back a 1.5 TB volume once, and 
had to leave it running overnight before it would complete.


--
Carl-Johan Schenström
Driftansvarig / System Administrator
Språkbanken & Svensk nationell datatjänst /
The Swedish Language Bank & Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenst...@gu.se / +46 709 116769
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Wido den Hollander
On 12/16/2014 04:14 PM, Carl-Johan Schenström wrote:
> On 2014-12-16 14:53, Lindsay Mathieson wrote:
> 
>>> Is this normal? is ceph just really slow at restoring rbd snapshots,
>>> or have I really borked my setup?
>>
>> I'm not looking for a fix or a tuning suggestions, just feedback on
>> whether
>> this is normal
> 
> That is my experience as well. I rolled back a 1,5 TB volume once, and
> had to leave it running overnight before it would complete.
> 

That is normal behavior. Snapshotting itself is a fast process, but
restoring means merging and rolling back.

It's easier to protect a snapshot and clone it into a new image and use
that one.

Afterwards you can flatten the image to detach the clone from the
parent. I've never tried whether this can be done live.
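
A minimal sketch of that workflow (image and snapshot names are just examples):

  rbd snap create rbd/myimage@snap1
  rbd snap protect rbd/myimage@snap1
  rbd clone rbd/myimage@snap1 rbd/myimage-restored
  rbd flatten rbd/myimage-restored    (optional, detaches the clone from its parent)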

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Alexandre DERUMIER






Alexandre Derumier 
Systems and storage engineer 

Phone: 03 20 68 90 88 
Fax: 03 20 68 90 81 

45 Bvd du Général Leclerc 59100 Roubaix 
12 rue Marivaux 75002 Paris 

MonSiteEstLent.com - Blog dedicated to web performance and handling traffic 
spikes 


De: "Wido den Hollander"  
À: "ceph-users"  
Envoyé: Mardi 16 Décembre 2014 16:18:09 
Objet: Re: [ceph-users] rbd snapshot slow restore 

On 12/16/2014 04:14 PM, Carl-Johan Schenström wrote: 
> On 2014-12-16 14:53, Lindsay Mathieson wrote: 
> 
>>> Is this normal? is ceph just really slow at restoring rbd snapshots, 
>>> or have I really borked my setup? 
>> 
>> I'm not looking for a fix or a tuning suggestions, just feedback on 
>> whether 
>> this is normal 
> 
> That is my experience as well. I rolled back a 1,5 TB volume once, and 
> had to leave it running overnight before it would complete. 
> 

That is normal behavior. Snapshotting itself is a fast process, but 
restoring means merging and rolling back. 

It's easier to protect a snapshot and clone it into a new image and use 
that one. 

Afterwards you can flatten the image to detach the clone from the 
parent. Never tried if this can be done live. 

-- 
Wido den Hollander 
42on B.V. 
Ceph trainer and consultant 

Phone: +31 (0)20 700 9902 
Skype: contact42on 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Alexandre DERUMIER
Hi,

>>That is normal behavior. Snapshotting itself is a fast process, but
>>restoring means merging and rolling back.


Are there any future plans to add something similar to ZFS or NetApp,
where you can instantly roll back a snapshot?

(Not sure it's technically possible to implement such snapshots with distributed 
storage.)



- Original message -
From: "aderumier" 
To: "Wido den Hollander" 
Cc: "ceph-users" 
Sent: Tuesday, December 16, 2014 17:02:12
Subject: Re: [ceph-users] rbd snapshot slow restore







Alexandre Derumier 
Systems and storage engineer 

Phone: 03 20 68 90 88 
Fax: 03 20 68 90 81 

45 Bvd du Général Leclerc 59100 Roubaix 
12 rue Marivaux 75002 Paris 

MonSiteEstLent.com - Blog dedicated to web performance and handling traffic 
spikes 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd read speed only 1/4 of write speed

2014-12-16 Thread VELARTIS Philipp Dürhammer
Hello,

Read speed inside our VMs (most of them Windows) is only ¼ of the write speed.
Write speed is about 450-500 MB/s and
read is only about 100 MB/s.

Our network is 10Gbit for the OSDs and 10Gbit for the MONs. We have 3 servers with 15 
osds each.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Robert LeBlanc
There are really only two ways to do snapshots that I know of and they have
trade-offs:

COW into the snapshot (like VMware, Ceph, etc):

When a write is committed, the changes are committed to a diff file and the
base file is left untouched. This has only a single write penalty; if you
want to discard the child, it is fast, as you just delete the diff file. The
negative side effects are that reads may have to query each diff file before
being satisfied, and if you want to delete the snapshot but keep the
changes (merge the snapshot into the base), then you have to copy all the
diff blocks into the base image.


COW into the base image (like most Enterprise disk systems with snapshots
for backups):

When a write is committed, the system reads the blocks to be changed out of
the base disk and places those original blocks into a diff file, then
writes the new blocks directly into the base image. The pros of this
approach are that snapshots can be deleted quickly and the data is "merged"
already. Read access for the current data is always fast as it only has to
search one location. The cons are that each write is really a read and two
writes, and recovering data from a snapshot can be slow as the reads have to
search one or more snapshots.


My experience is that you can't have your cake and eat it too. If you have
the choice, you choose the option that fits your use case best. Ceph
doesn't have the ability to select which snapshot method it uses (most
systems don't).

I hope that helps explain why the request is not easily fulfilled.

On Tue, Dec 16, 2014 at 9:04 AM, Alexandre DERUMIER 
wrote:

> Hi,
>
> >>That is normal behavior. Snapshotting itself is a fast process, but
> >>restoring means merging and rolling back.
>
>
> Any future plan to add something similar to zfs or netapp,
> where you can instant rollback a snapshot ?
>
> (Not sure it's technically possible to implement such snapshot with
> distributed storage)
>
>
>
> - Original message -
> From: "aderumier" 
> To: "Wido den Hollander" 
> Cc: "ceph-users" 
> Sent: Tuesday, December 16, 2014 17:02:12
> Subject: Re: [ceph-users] rbd snapshot slow restore
>
>
>
>
>
>
>
> Alexandre Derumier
> Systems and storage engineer
>
> Phone: 03 20 68 90 88
> Fax: 03 20 68 90 81
>
> 45 Bvd du Général Leclerc 59100 Roubaix
> 12 rue Marivaux 75002 Paris
>
> MonSiteEstLent.com - Blog dedicated to web performance and handling traffic
> spikes
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dual RADOSGW Network

2014-12-16 Thread Craig Lewis
You may need split horizon DNS.  The internal machines' DNS should resolve
to the internal IP, and the external machines' DNS should resolve to the
external IP.

There are various ways to do that.  The RadosGW config has an example of
setting up Dnsmasq:
http://ceph.com/docs/master/radosgw/config/#enabling-subdomain-s3-calls
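
As a rough illustration of the split-horizon idea with dnsmasq on the internal
resolver (hostname and address are assumptions):

  # /etc/dnsmasq.conf
  address=/rgw.example.com/192.168.1.10

External clients keep resolving rgw.example.com to the public 153.* address
through normal DNS.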

On Tue, Dec 16, 2014 at 3:05 AM, Georgios Dimitrakakis  wrote:
>
> Thanks Craig.
>
> I will try that!
>
> I thought it was more complicated than that because of the entries for the
> "public_network" and "rgw dns name" in the config file...
>
> I will give it a try.
>
> Best,
>
>
> George
>
>
>
>  That shouldn't be a problem.  Just have Apache bind to all interfaces
>> instead of the external IP.
>>
>> In my case, I only have Apache bound to the internal interface.  My
>> load balancer has an external and internal IP, and I'm able to talk to
>> it on both interfaces.
>>
>> On Mon, Dec 15, 2014 at 2:00 PM, Georgios Dimitrakakis  wrote:
>>
>>  Hi all!
>>>
>>> I have a single CEPH node which has two network interfaces.
>>>
>>> One is configured to be accessed directly by the internet (153.*)
>>> and the other one is configured on an internal LAN (192.*)
>>>
>>> For the moment radosgw is listening on the external (internet)
>>> interface.
>>>
>>> Can I configure radosgw to be accessed by both interfaces? What I
>>> would like to do is to save bandwidth and time for the machines on
>>> the internal network and use the internal net for all rados
>>> communications.
>>>
>>> Any ideas?
>>>
>>> Best regards,
>>>
>>> George
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com [1]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]
>>>
>>
>>
>> Links:
>> --
>> [1] mailto:ceph-users@lists.ceph.com
>> [2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> [3] mailto:gior...@acmac.uoc.gr
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd read speed only 1/4 of write speed

2014-12-16 Thread David Clarke

On 17/12/14 05:26, VELARTIS Philipp Dürhammer wrote:
> Hello,
> 
> 
> 
> Read speed inside our vms (most of them windows) is only ¼ of the
> write speed.
> 
> Write speed is about 450MB/s – 500mb/s and
> 
> Read is only about 100/MB/s
> 
> 
> 
> Our network is 10Gbit for OSDs and 10GB for MONS. We have 3 Servers
> with 15 osds each

We saw similar things, until we started playing around with read ahead
parameters inside the VMs.  Our environment is almost 100% Ubuntu, but
the same basic principles should hold.

I'm pretty sure that there have been previous posts to the lists about
this, but the value(s) we tweaked are:

/sys/block/$device/queue/read_ahead_kb

It defaults to 128, but we had pretty drastic increases to read speed
all of the way up to around 8192 (8 MB) with no obvious regressions to
random read speed.

I'm not sure what the equivalent option is in Windows, sorry.

Unfortunately this is a per VM (per disk per VM, even) setting, but it
can be automated to some degree.  We have a udev rule snippet pushed
out to each VM in order to set the value(s).
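
A minimal sketch of that kind of udev rule (device matching and value are
assumptions; adjust per distro and workload):

  # /etc/udev/rules.d/99-readahead.rules
  SUBSYSTEM=="block", KERNEL=="vd[a-z]|sd[a-z]", ACTION=="add|change", \
      ATTR{queue/read_ahead_kb}="8192"

For a quick one-off test the value can also be set by hand:

  echo 8192 > /sys/block/vda/queue/read_ahead_kb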

It may also be worth investigating read ahead options on the storage
nodes themselves, both at the OS and disk controller levels.  This
isn't something we've yet been able to test, however.



-- 
David Clarke
Systems Architect
Catalyst IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Erasure coded PGs incomplete

2014-12-16 Thread Italo Santos
Hello,

I'm trying to create an erasure pool following 
http://docs.ceph.com/docs/master/rados/operations/erasure-code/, but when I try 
to create a pool with a specific erasure-code-profile ("myprofile") the PGs end 
up in the incomplete state.

Can anyone help me?

Below the profile I created:
root@ceph0001:~# ceph osd erasure-code-profile get myprofile
directory=/usr/lib/ceph/erasure-code
k=6
m=2
plugin=jerasure
technique=reed_sol_van

The status of cluster:
root@ceph0001:~# ceph health
HEALTH_WARN 12 pgs incomplete; 12 pgs stuck inactive; 12 pgs stuck unclean

health detail:
root@ceph0001:~# ceph health detail
HEALTH_WARN 12 pgs incomplete; 12 pgs stuck inactive; 12 pgs stuck unclean
pg 2.9 is stuck inactive since forever, current state incomplete, last acting 
[4,10,15,2147483647,3,2147483647,2147483647,2147483647]
pg 2.8 is stuck inactive since forever, current state incomplete, last acting 
[0,2147483647,4,2147483647,10,2147483647,15,2147483647]
pg 2.b is stuck inactive since forever, current state incomplete, last acting 
[8,3,14,2147483647,5,2147483647,2147483647,2147483647]
pg 2.a is stuck inactive since forever, current state incomplete, last acting 
[11,7,2,2147483647,2147483647,2147483647,15,2147483647]
pg 2.5 is stuck inactive since forever, current state incomplete, last acting 
[12,8,5,1,2147483647,2147483647,2147483647,2147483647]
pg 2.4 is stuck inactive since forever, current state incomplete, last acting 
[5,2147483647,13,1,2147483647,2147483647,8,2147483647]
pg 2.7 is stuck inactive since forever, current state incomplete, last acting 
[12,2,10,7,2147483647,2147483647,2147483647,2147483647]
pg 2.6 is stuck inactive since forever, current state incomplete, last acting 
[9,15,2147483647,4,2,2147483647,2147483647,2147483647]
pg 2.1 is stuck inactive since forever, current state incomplete, last acting 
[2,4,2147483647,13,2147483647,10,2147483647,2147483647]
pg 2.0 is stuck inactive since forever, current state incomplete, last acting 
[14,1,2147483647,4,10,2147483647,2147483647,2147483647]
pg 2.3 is stuck inactive since forever, current state incomplete, last acting 
[14,11,6,2147483647,2147483647,2147483647,2,2147483647]
pg 2.2 is stuck inactive since forever, current state incomplete, last acting 
[13,5,11,2147483647,2147483647,3,2147483647,2147483647]
pg 2.9 is stuck unclean since forever, current state incomplete, last acting 
[4,10,15,2147483647,3,2147483647,2147483647,2147483647]
pg 2.8 is stuck unclean since forever, current state incomplete, last acting 
[0,2147483647,4,2147483647,10,2147483647,15,2147483647]
pg 2.b is stuck unclean since forever, current state incomplete, last acting 
[8,3,14,2147483647,5,2147483647,2147483647,2147483647]
pg 2.a is stuck unclean since forever, current state incomplete, last acting 
[11,7,2,2147483647,2147483647,2147483647,15,2147483647]
pg 2.5 is stuck unclean since forever, current state incomplete, last acting 
[12,8,5,1,2147483647,2147483647,2147483647,2147483647]
pg 2.4 is stuck unclean since forever, current state incomplete, last acting 
[5,2147483647,13,1,2147483647,2147483647,8,2147483647]
pg 2.7 is stuck unclean since forever, current state incomplete, last acting 
[12,2,10,7,2147483647,2147483647,2147483647,2147483647]
pg 2.6 is stuck unclean since forever, current state incomplete, last acting 
[9,15,2147483647,4,2,2147483647,2147483647,2147483647]
pg 2.1 is stuck unclean since forever, current state incomplete, last acting 
[2,4,2147483647,13,2147483647,10,2147483647,2147483647]
pg 2.0 is stuck unclean since forever, current state incomplete, last acting 
[14,1,2147483647,4,10,2147483647,2147483647,2147483647]
pg 2.3 is stuck unclean since forever, current state incomplete, last acting 
[14,11,6,2147483647,2147483647,2147483647,2,2147483647]
pg 2.2 is stuck unclean since forever, current state incomplete, last acting 
[13,5,11,2147483647,2147483647,3,2147483647,2147483647]
pg 2.9 is incomplete, acting 
[4,10,15,2147483647,3,2147483647,2147483647,2147483647] (reducing pool ecpool 
min_size from 6 may help; search ceph.com/docs for 'incomplete')
pg 2.8 is incomplete, acting 
[0,2147483647,4,2147483647,10,2147483647,15,2147483647] (reducing pool ecpool 
min_size from 6 may help; search ceph.com/docs for 'incomplete')
pg 2.b is incomplete, acting 
[8,3,14,2147483647,5,2147483647,2147483647,2147483647] (reducing pool ecpool 
min_size from 6 may help; search ceph.com/docs for 'incomplete')
pg 2.a is incomplete, acting 
[11,7,2,2147483647,2147483647,2147483647,15,2147483647] (reducing pool ecpool 
min_size from 6 may help; search ceph.com/docs for 'incomplete')
pg 2.5 is incomplete, acting 
[12,8,5,1,2147483647,2147483647,2147483647,2147483647] (reducing pool ecpool 
min_size from 6 may help; search ceph.com/docs for 'incomplete')
pg 2.4 is incomplete, acting 
[5,2147483647,13,1,2147483647,2147483647,8,2147483647] (reducing pool ecpool 
min_size from 6 may help; search ceph.com/docs for 'incomplete')
pg 2.7 is incomplete, acting 
[12,2,10,7,2147483

Re: [ceph-users] Erasure coded PGs incomplete

2014-12-16 Thread Loic Dachary
Hi,

The 2147483647 means that CRUSH did not find enough OSDs for a given PG. If you 
check the crush rule associated with the erasure coded pool, you will most 
probably find why.
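
For example, with k=6, m=2 the default ruleset wants 8 distinct hosts; on a
smaller cluster the failure domain can be dropped to osd. A hedged sketch of
checking and rebuilding the profile (the pool then has to be recreated, since an
existing pool keeps its old rule):

  ceph osd crush rule dump
  ceph osd erasure-code-profile set myprofile k=6 m=2 ruleset-failure-domain=osd --force
  ceph osd pool create ecpool 12 12 erasure myprofile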

Cheers

On 16/12/2014 23:32, Italo Santos wrote:
> Hello,
> 
> I'm trying to create an erasure pool following 
> http://docs.ceph.com/docs/master/rados/operations/erasure-code/, but when I 
> try to create a pool with a specific erasure-code-profile ("myprofile") the PGs 
> end up in the incomplete state.
> 
> Can anyone help me?
> 
> Below the profile I created:
> root@ceph0001:~# ceph osd erasure-code-profile get myprofile
> directory=/usr/lib/ceph/erasure-code
> k=6
> m=2
> plugin=jerasure
> technique=reed_sol_van
> 
> The status of cluster:
> root@ceph0001:~# ceph health
> HEALTH_WARN 12 pgs incomplete; 12 pgs stuck inactive; 12 pgs stuck unclean
> 
> health detail:
> root@ceph0001:~# ceph health detail
> HEALTH_WARN 12 pgs incomplete; 12 pgs stuck inactive; 12 pgs stuck unclean
> pg 2.9 is stuck inactive since forever, current state incomplete, last acting 
> [4,10,15,2147483647,3,2147483647,2147483647,2147483647]
> pg 2.8 is stuck inactive since forever, current state incomplete, last acting 
> [0,2147483647,4,2147483647,10,2147483647,15,2147483647]
> pg 2.b is stuck inactive since forever, current state incomplete, last acting 
> [8,3,14,2147483647,5,2147483647,2147483647,2147483647]
> pg 2.a is stuck inactive since forever, current state incomplete, last acting 
> [11,7,2,2147483647,2147483647,2147483647,15,2147483647]
> pg 2.5 is stuck inactive since forever, current state incomplete, last acting 
> [12,8,5,1,2147483647,2147483647,2147483647,2147483647]
> pg 2.4 is stuck inactive since forever, current state incomplete, last acting 
> [5,2147483647,13,1,2147483647,2147483647,8,2147483647]
> pg 2.7 is stuck inactive since forever, current state incomplete, last acting 
> [12,2,10,7,2147483647,2147483647,2147483647,2147483647]
> pg 2.6 is stuck inactive since forever, current state incomplete, last acting 
> [9,15,2147483647,4,2,2147483647,2147483647,2147483647]
> pg 2.1 is stuck inactive since forever, current state incomplete, last acting 
> [2,4,2147483647,13,2147483647,10,2147483647,2147483647]
> pg 2.0 is stuck inactive since forever, current state incomplete, last acting 
> [14,1,2147483647,4,10,2147483647,2147483647,2147483647]
> pg 2.3 is stuck inactive since forever, current state incomplete, last acting 
> [14,11,6,2147483647,2147483647,2147483647,2,2147483647]
> pg 2.2 is stuck inactive since forever, current state incomplete, last acting 
> [13,5,11,2147483647,2147483647,3,2147483647,2147483647]
> pg 2.9 is stuck unclean since forever, current state incomplete, last acting 
> [4,10,15,2147483647,3,2147483647,2147483647,2147483647]
> pg 2.8 is stuck unclean since forever, current state incomplete, last acting 
> [0,2147483647,4,2147483647,10,2147483647,15,2147483647]
> pg 2.b is stuck unclean since forever, current state incomplete, last acting 
> [8,3,14,2147483647,5,2147483647,2147483647,2147483647]
> pg 2.a is stuck unclean since forever, current state incomplete, last acting 
> [11,7,2,2147483647,2147483647,2147483647,15,2147483647]
> pg 2.5 is stuck unclean since forever, current state incomplete, last acting 
> [12,8,5,1,2147483647,2147483647,2147483647,2147483647]
> pg 2.4 is stuck unclean since forever, current state incomplete, last acting 
> [5,2147483647,13,1,2147483647,2147483647,8,2147483647]
> pg 2.7 is stuck unclean since forever, current state incomplete, last acting 
> [12,2,10,7,2147483647,2147483647,2147483647,2147483647]
> pg 2.6 is stuck unclean since forever, current state incomplete, last acting 
> [9,15,2147483647,4,2,2147483647,2147483647,2147483647]
> pg 2.1 is stuck unclean since forever, current state incomplete, last acting 
> [2,4,2147483647,13,2147483647,10,2147483647,2147483647]
> pg 2.0 is stuck unclean since forever, current state incomplete, last acting 
> [14,1,2147483647,4,10,2147483647,2147483647,2147483647]
> pg 2.3 is stuck unclean since forever, current state incomplete, last acting 
> [14,11,6,2147483647,2147483647,2147483647,2,2147483647]
> pg 2.2 is stuck unclean since forever, current state incomplete, last acting 
> [13,5,11,2147483647,2147483647,3,2147483647,2147483647]
> pg 2.9 is incomplete, acting 
> [4,10,15,2147483647,3,2147483647,2147483647,2147483647] (reducing pool ecpool 
> min_size from 6 may help; search ceph.com/docs for 'incomplete')
> pg 2.8 is incomplete, acting 
> [0,2147483647,4,2147483647,10,2147483647,15,2147483647] (reducing pool ecpool 
> min_size from 6 may help; search ceph.com/docs for 'incomplete')
> pg 2.b is incomplete, acting 
> [8,3,14,2147483647,5,2147483647,2147483647,2147483647] (reducing pool ecpool 
> min_size from 6 may help; search ceph.com/docs for 'incomplete')
> pg 2.a is incomplete, acting 
> [11,7,2,2147483647,2147483647,2147483647,15,2147483647] (reducing pool ecpool 
> min_size from 6 may help; search ceph.com/docs for 'incomplet

Re: [ceph-users] Test 6

2014-12-16 Thread Lindsay Mathieson
On Tue, 16 Dec 2014 07:57:19 AM Leen de Braal wrote:
> If you are trying to see if your mails come through, don't check on the
> list. You have a gmail account, gmail removes mails that you have sent
> yourself.

Not the case, I am on a dozen other mailman lists via gmail, all of them show 
my posts. ceph-users is the only exception.

However ceph-us...@ceph.com seems to work reliably rather than using ceph-
us...@lists.ceph.com

> You can check the archives to see.

A number of my posts are missing from there. Some are there, it seems very 
erratic.

-- 
Lindsay

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Lindsay Mathieson
On 17 December 2014 at 04:50, Robert LeBlanc  wrote:
> There are really only two ways to do snapshots that I know of and they have
> trade-offs:
>
> COW into the snapshot (like VMware, Ceph, etc):
>
> When a write is committed, the changes are committed to a diff file and the
> base file is left untouched. This only has a single write penalty,

This is when you are accessing the snapshot image?

I suspect I'm probably looking at this differently - when I take a snapshot
I never access it "live", I only ever restore it - would that be merging it
back into the base?

>
> COW into the base image (like most Enterprise disk systems with snapshots
> for backups):
>
> When a write is committed, the system reads the blocks to be changed out of
> the base disk and places those original blocks into a diff file, then writes
> the new blocks directly into the base image. The pros to this approach is
> that snapshots can be deleted quickly and the data is "merged" already. Read
> access for the current data is always fast as it only has to search one
> location. The cons are that each write is really a read and two writes,
> recovering data from a snapshot can be slow as the reads have to search one
> or more snapshots.


Whereabouts does qcow2 fall on this spectrum?

Thanks,




-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd read speed only 1/4 of write speed

2014-12-16 Thread Christian Balzer
On Tue, 16 Dec 2014 16:26:17 + VELARTIS Philipp Dürhammer wrote:

> Hello,
> 
> Read speed inside our vms (most of them windows) is only ¼ of the write
> speed. Write speed is about 450MB/s - 500mb/s and
> Read is only about 100/MB/s
> 
> Our network is 10Gbit for OSDs and 10GB for MONS. We have 3 Servers with
> 15 osds each
> 

Basically what David Clarke wrote; it has indeed been discussed several
times.
Find my "The woes of sequential reads" thread; it has data and a link to a
blueprint that is attempting to fix this on the Ceph side.
Unfortunately I don't think there has been any progress with this.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd read speed only 1/4 of write speed

2014-12-16 Thread Mark Nelson



On 12/16/2014 07:08 PM, Christian Balzer wrote:

On Tue, 16 Dec 2014 16:26:17 + VELARTIS Philipp Dürhammer wrote:


Hello,

Read speed inside our vms (most of them windows) is only ¼ of the write
speed. Write speed is about 450MB/s - 500mb/s and
Read is only about 100/MB/s

Our network is 10Gbit for OSDs and 10GB for MONS. We have 3 Servers with
15 osds each



Basically what David Clarke wrote, it has indeed been discussed several
times.
Find my "The woes of sequential reads" thread, it has data and a link to a
blueprint that is attempting to fix this on the Ceph side.
Unfortunately I don't think there has been any progress with this.

Christian



Yeah read ahead definitely seems to help quite a bit.  I've been 
wondering now with the work going into improving random read performance 
on SSDs if we are going to pay for it sooner or later, but so far 
increasing readahead seems to typically be a win.


Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can not add osd

2014-12-16 Thread yang . bin18
Following the official Ceph docs, I still get the same error:

[root@node3 ceph-cluster]# ceph-deploy osd activate node2:/dev/sdb1
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.21): /usr/bin/ceph-deploy osd 
activate node2:/dev/sdb1
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks node2:/dev/sdb1:
[node2][DEBUG ] connected to host: node2 
[node2][DEBUG ] detect platform information from remote host
[node2][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.0.1406 Core
[ceph_deploy.osd][DEBUG ] activating host node2 disk /dev/sdb1
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[node2][INFO  ] Running command: ceph-disk -v activate --mark-init 
sysvinit --mount /dev/sdb1
[node2][WARNIN] INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE 
-ovalue -- /dev/sdb1
[node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[node2][WARNIN] DEBUG:ceph-disk:Mounting /dev/sdb1 on 
/var/lib/ceph/tmp/mnt.NC9pdv with options noatime,inode64
[node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o 
noatime,inode64 -- /dev/sdb1 /var/lib/ceph/tmp/mnt.NC9pdv
[node2][WARNIN] DEBUG:ceph-disk:Cluster uuid is 
cadb2f14-e2ea-41fb-8050-a2f0fe447475
[node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
--cluster=ceph --show-config-value=fsid
[node2][WARNIN] DEBUG:ceph-disk:Cluster name is ceph
[node2][WARNIN] DEBUG:ceph-disk:OSD uuid is 
8bbf6631-8722-4e97-bf18-06253143acf6
[node2][WARNIN] DEBUG:ceph-disk:Allocating OSD id...
[node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster 
ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise 
8bbf6631-8722-4e97-bf18-06253143acf6
[node2][WARNIN] ERROR:ceph-disk:Failed to activate
[node2][WARNIN] DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.NC9pdv
[node2][WARNIN] Traceback (most recent call last):
[node2][WARNIN]   File "/usr/sbin/ceph-disk", line 2784, in 
[node2][WARNIN] main()
[node2][WARNIN]   File "/usr/sbin/ceph-disk", line 2762, in main
[node2][WARNIN] args.func(args)
[node2][WARNIN]   File "/usr/sbin/ceph-disk", line 1996, in main_activate
[node2][WARNIN] init=args.mark_init,
[node2][WARNIN]   File "/usr/sbin/ceph-disk", line 1819, in mount_activate
[node2][WARNIN] os.rmdir(path)
[node2][WARNIN] OSError: [Errno 16] Device or resource busy: 
'/var/lib/ceph/tmp/mnt.NC9pdv'
[node2][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk 
-v activate --mark-init sysvinit --mount /dev/sdb1



From: Karan Singh 
To: yang.bi...@zte.com.cn, 
Cc:   ceph-users 
Date:   2014/12/16 22:51
Subject:   Re: [ceph-users] can not add osd



Hi

Your logs do not provide much information. If you are following any 
other documentation for Ceph, I would recommend you follow the official 
Ceph docs.

http://ceph.com/docs/master/start/quick-start-preflight/




Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


On 16 Dec 2014, at 09:55, yang.bi...@zte.com.cn wrote:

hi 

When I execute "ceph-deploy osd prepare node3:/dev/sdb", an error always comes 
out like this: 

[node3][WARNIN] INFO:ceph-disk:Running command: /bin/umount -- 
/var/lib/ceph/tmp/mnt.u2KXW3 
[node3][WARNIN] umount: /var/lib/ceph/tmp/mnt.u2KXW3: target is busy. 

Then I execute "/bin/umount -- /var/lib/ceph/tmp/mnt.u2KXW3", and the result is ok. 






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] File System stripping data

2014-12-16 Thread Kevin Shiah
Hello,

I am trying to set an extended attribute on a newly created
directory (call it "dir" here) using setfattr. I run the following command:

setfattr -n ceph.dir.layout.stripe_count -v 2 dir
And it returns:

setfattr: dir: Operation not supported

I am wondering if the underlying file system does not support xattrs. Has
anyone ever run into a similar problem before?

I deployed CephFS on Debian wheezy.
And here is the mounting information:
ceph-fuse on /dfs type fuse.ceph-fuse
(rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)

Many thanks,
Kevin

On Mon Dec 15 2014 at 1:49:15 AM PST John Spray 
wrote:

> Yes, setfattr is the preferred way.  The docs are here:
> http://ceph.com/docs/master/cephfs/file-layouts/
>
> Cheers,
> John
>
> On Mon, Dec 15, 2014 at 8:12 AM, Ilya Dryomov 
> wrote:
> > On Sun, Dec 14, 2014 at 10:38 AM, Kevin Shiah  wrote:
> >> Hello All,
> >>
> >> Does anyone know how to configure data striping when using ceph as a file
> >> system? My understanding is that configuring striping with rbd is only
> >> for
> >> block devices.
> >
> > You should be able to set layout.* xattrs on directories and empty
> > files (directory layout just sets the default layout for the newly
> > created files within it).  There are also a couple of ioctls which do
> > essentially the same thing but I think their use is discouraged.
> > John will correct me if I'm wrong.
> >
> > Thanks,
> >
> > Ilya
>
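
For reference, a hedged sketch of reading and setting directory layouts once the
virtual xattrs are available (the ceph-fuse/kernel client needs to be recent
enough to expose the ceph.* xattrs):

  getfattr -n ceph.dir.layout dir
  setfattr -n ceph.dir.layout.stripe_count -v 2 dir
  setfattr -n ceph.dir.layout.object_size -v 4194304 dir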
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Robert LeBlanc
On Tue, Dec 16, 2014 at 5:37 PM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:
>
> On 17 December 2014 at 04:50, Robert LeBlanc  wrote:
> > There are really only two ways to do snapshots that I know of and they
> have
> > trade-offs:
> >
> > COW into the snapshot (like VMware, Ceph, etc):
> >
> > When a write is committed, the changes are committed to a diff file and
> the
> > base file is left untouched. This only has a single write penalty,
>
> This is when you are accessing the snapshot image?
>
> I suspect I'm probably looking at this differently - when I take a snapshot
> I never access it "live", I only ever restore it - would that be merging it
> back into the base?
>

I'm not sure what you mean by this. If you take a snapshot then you
technically only work on the snapshot. If you take a snapshot in VMware
(sorry, most of my experience comes from VMware, but I believe KVM is the
same), then the VM immediately uses the snapshot for all the
writes/reads. You then have three options: 1. keep the snapshot
indefinitely, 2. revert back to the snapshot point, or 3. delete the
snapshot and merge the changes into the base to make it permanent.

In case "2" the reverting of the snapshot is fast because it only deletes
the diff file and points back to the original base disk ready to make a new
diff file.

In case "3" depending on how much write activity to "new" blocks have
happened, then it may take a long time to copy the blocks into the base
disk.

Rereading your previous post, I understand that you are using rbd snapshots
and then using the rbd rollback command. You are testing this performance
vs. the rollback feature in QEMU/KVM when on local/NFS disk. Is that
accurate?

I haven't used the rollback feature. If you want to go back to a snapshot,
would it be faster to create a clone off the snapshot, then run your VM off
that, then just delete and recreate the clone?

rbd snap create rbd/test-image@snap1
rbd snap protect rbd/test-image@snap1
rbd clone rbd/test-image@snap1 rbd/test-image-snap1

You can then run:

rbd rm rbd/test-image-snap1
rbd clone rbd/test-image@snap1 rbd/test-image-snap1

to revert back to the original snapshot.


> Whereabouts does qcow2 fall on this spectrum?

I think qcow2 falls into the same category as VMware, but I'm still
cutting my teeth on QEMU/KVM.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Test 6

2014-12-16 Thread Craig Lewis
I always wondered why my posts didn't show up until somebody replied to
them.  I thought it was my filters.

Thanks!

On Mon, Dec 15, 2014 at 10:57 PM, Leen de Braal  wrote:
>
> If you are trying to see if your mails come through, don't check on the
> list. You have a gmail account, gmail removes mails that you have sent
> yourself.
> You can check the archives to see.
>
> And your mails did come on the list.
>
>
> > --
> > Lindsay
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> --
> L. de Braal
> BraHa Systems
> NL - Terneuzen
> T +31 115 649333
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Crash makes whole cluster unusable ?

2014-12-16 Thread Craig Lewis
So the problem started once remapping+backfilling started, and lasted until
the cluster was healthy again?  Have you adjusted any of the recovery
tunables?  Are you using SSD journals?

I had a similar experience the first time my OSDs started backfilling.  The
average RadosGW operation latency went from 0.1 seconds to 10 seconds,
which is longer than the default HAProxy timeout.  Fun times.

Since then, I've increased HAProxy's timeouts, de-prioritized Ceph's
recovery, and I added SSD journals.

The relevant sections of ceph.conf are:

[global]
  mon osd down out interval = 900
  mon osd min down reporters = 9
  mon osd min down reports = 12
  mon warn on legacy crush tunables = false
  osd pool default flag hashpspool = true

[osd]
  osd max backfills = 3
  osd recovery max active = 3
  osd recovery op priority = 1
  osd scrub sleep = 1.0
  osd snap trim sleep = 1.0


Before the SSD journals, I had osd_max_backfills and
osd_recovery_max_active set to 1.  I watched my latency graphs, and used
ceph tell osd.\* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
to tweak the values until the latency was acceptable.

On Tue, Dec 16, 2014 at 5:37 AM, Christoph Adomeit <
christoph.adom...@gatworks.de> wrote:
>
>
> Hi there,
>
> today I had an osd crash with ceph 0.87/giant which made my whole cluster
> unusable for 45 minutes.
>
> First it began with a disk error:
>
> sd 0:1:2:0: [sdc] CDB: Read(10)Read(10):: 28 28 00 00 0d 15 fe d0 fd 7b e8
> f8 00 00 00 00 b0 08 00 00
> XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
>
> Then most other osds found out that my osd.3 is down:
>
> 2014-12-16 08:45:15.873478 mon.0 10.67.1.11:6789/0 3361077 : cluster
> [INF] osd.3 10.67.1.11:6810/713621 failed (42 reports from 35 peers after
> 23.642482 >= grace 23.348982)
>
> 5 minutes later the osd is marked as out:
> 2014-12-16 08:50:21.095903 mon.0 10.67.1.11:6789/0 3361367 : cluster
> [INF] osd.3 out (down for 304.581079)
>
> However, since 8:45 until 9:20 I have 1000 slow requests and 107
> incomplete pgs. Many requests are not answered:
>
> 2014-12-16 08:46:03.029094 mon.0 10.67.1.11:6789/0 3361126 : cluster
> [INF] pgmap v6930583: 4224 pgs: 4117 active+clean, 107 incomplete; 7647 GB
> data, 19090 GB used, 67952 GB / 87042 GB avail; 2307 kB/s rd, 2293 kB/s wr,
> 407 op/s
>
> Also a recovery to another osd was not starting
>
> Seems the osd thinks it is still up and all other osds think this osd is
> down ?
> I found this in the log of osd3:
> ceph-osd.3.log:2014-12-16 08:45:19.319152 7faf81296700  0
> log_channel(default) log [WRN] : map e61177 wrongly marked me down
> ceph-osd.3.log:  -440> 2014-12-16 08:45:19.319152 7faf81296700  0
> log_channel(default) log [WRN] : map e61177 wrongly marked me down
>
> Luckily I was able to restart osd.3 and everything was working again, but I
> do not understand what happened. The cluster was simply not usable for
> 45 minutes.
>
> Any ideas
>
> Thanks
>   Christoph
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Lindsay Mathieson
On 17 December 2014 at 11:50, Robert LeBlanc  wrote:
>
>
> On Tue, Dec 16, 2014 at 5:37 PM, Lindsay Mathieson
>  wrote:
>>
>> On 17 December 2014 at 04:50, Robert LeBlanc  wrote:
>> > There are really only two ways to do snapshots that I know of and they
>> > have
>> > trade-offs:
>> >
>> > COW into the snapshot (like VMware, Ceph, etc):
>> >
>> > When a write is committed, the changes are committed to a diff file and
>> > the
>> > base file is left untouched. This only has a single write penalty,
>>
>> This is when you are accessing the snapshot image?
>>
>> I suspect I'm probably looking at this differently - when I take a
>> snapshot
>> I never access it "live", I only ever restore it - would that be merging
>> it
>> back into the base?
>
>
> I'm not sure what you mean by this. If you take a snapshot then you
> technically only work on the snapshot. If in VMware (sorry, most of my
> experience comes from VMware, but I believe KVM is the same) you take a
> snapshot, then the VM immediately uses the snapshot for all the
> writes/reads. You then have three options: 1. keep the snapshot
> indefinitely, 2. revert back to the snapshot point, or 3. delete the
> snapshot and merge the changes into the base to make it permanent.

I suspect I'm using terms differently, probably because I don't know
what is really happening underneath. To me a VM snapshot is a static
thing you can roll back to, but all VM activity takes place on the
"main" image.


>
> In case "2" the reverting of the snapshot is fast because it only deletes
> the diff file and points back to the original base disk ready to make a new
> diff file.

What happens if you have multiple snapshots? e.g. Snap 1, 2 & 3.
Deleting Snap 2 won't be a simple rollback to the base.

>
> In case "3" depending on how much write activity to "new" blocks have
> happened, then it may take a long time to copy the blocks into the base
> disk.
>
> Rereading your previous post, I understand that you are using rbd snapshots
> and then using the rbd rollback command. You are testing this performance
> vs. the rollback feature in QEMU/KVM when on local/NFS disk. Is that
> accurate?

Yes, though the rollback feature is a function of the image format
used (e.g. qcow2), not something specific to qemu. If you use RAW then
snapshots are not supported.

>
> I haven't used the rollback feature. If you want to go back to a snapshot,
> would it be faster to create a clone off the snapshot, then run your VM off
> that, then just delete and recreate the clone?

I'll test that, but wouldn't it involve flattening the clone, which is
also a very slow process?

I don't know if this is relevant, but with qcow2 and VMware, rolling
back or deleting snapshots are both operations that take only a few
tens of seconds.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Placing Different Pools on Different OSDS

2014-12-16 Thread Yujian Peng
I've found the problem.
The command "ceph osd crush rule create-simple ssd_ruleset ssd root" should
be "ceph osd crush rule create-simple ssd_ruleset ssd host"

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Help with Integrating Ceph with various Cloud Storage

2014-12-16 Thread Manoj Singh
Hi All,

I am new to Ceph. Due to a shortage of physical machines I have installed a Ceph
cluster with a single OSD and MON in a single Virtual Machine.

I have few queries as below:

1.  Whether having the Ceph setup on a VM is fine, or whether it needs to be on a
physical server.

2. Since Amazon S3, Azure Blob Storage and Swift are object-based storage,
how feasible is it to attach these cloud storage services to Ceph and to be
able to allocate disk space from them while creating a new VM from a local
CloudStack or OpenStack?

3. When integrating CloudStack with Ceph, whether libvirt should be
installed on the CloudStack management server or on the Ceph server. From the
diagram given in the Ceph documentation it's a bit confusing.

Thank you in advance, your help will be really appreciated.

Best Regards,
Manoj Kumar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com