[ceph-users] journal size suggestions

2013-07-09 Thread Gandalf Corvotempesta
Hi, I'm planning a new cluster on a 10GbE network. Each storage node will have a maximum of 12 SATA disks and 2 SSDs as journals. What do you suggest as the journal size for each OSD? Is 5GB enough? Should I consider only SATA write speed when calculating the journal size, or also network speed?
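For reference, the rule of thumb from the Ceph docs is journal size = 2 * (expected throughput * filestore max sync interval), where expected throughput is usually taken as the slower of the disk and the network path. A rough worked example, assuming a 5 s sync interval and the SATA disk (about 120 MB/s) being the bottleneck:
    osd journal size = 2 * (120 MB/s * 5 s) ~= 1200 MB per OSD
So 1-2 GB covers a single SATA disk; 5 GB per OSD leaves headroom for bursts where the 10GbE link, rather than the disk, briefly sets the pace.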

Re: [ceph-users] journal size suggestions

2013-07-09 Thread Gandalf Corvotempesta
frequency is 5 seconds. What do you mean by fine-tuning spinning storage media? Which tuning are you referring to? On 09 Jul 2013 at 23:45, "Andrey Korolyov" wrote: > On Wed, Jul 10, 2013 at 1:16 AM, Gandalf Corvotempesta > wrote: > > Hi, > > i'm

Re: [ceph-users] Num of PGs

2013-07-12 Thread Gandalf Corvotempesta
2013/7/12 Mark Nelson : > At large numbers of PGs it may not matter very much, but I don't think it > would hurt either! > > Basically this has to do with how ceph_stable_mod works. At > non-power-of-two values, the bucket counts aren't even, but that's only a > small part of the story and may ult

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-07-16 Thread Gandalf Corvotempesta
2013/6/20 Matthew Anderson : > Hi All, > > I've had a few conversations on IRC about getting RDMA support into Ceph and > thought I would give it a quick attempt to hopefully spur some interest. > What I would like to accomplish is an RSockets only implementation so I'm > able to use Ceph, RBD and

[ceph-users] SSD suggestions as journal

2013-07-22 Thread Gandalf Corvotempesta
I'm looking at some SSD drives to be used as journals. The Seagate 600 should be better for write-intensive operations (like a journal): http://www.storagereview.com/seagate_600_pro_enterprise_ssd_review what do you suggest? Is this good enough? Should I look for write-intensive operations when se

Re: [ceph-users] SSD suggestions as journal

2013-07-22 Thread Gandalf Corvotempesta
2013/7/22 Mark Nelson : >> http://www.storagereview.com/seagate_600_pro_enterprise_ssd_review "If you used this SSD for 100% sequential writes, you could theoretically kill it in a little more than a month." Very bad. Any other suggestions for an SSD device?

Re: [ceph-users] SSD recommendations for OSD journals

2013-07-22 Thread Gandalf Corvotempesta
2013/7/22 Chen, Xiaoxi : > With “journal writeahead”,the data first write to journal ,ack to the > client, and write to OSD, note that, the data always keep in memory before > it write to both OSD and journal,so the write is directly from memory to > OSDs. This mode suite for XFS and EXT4. What ha

Re: [ceph-users] SSD recommendations for OSD journals

2013-07-22 Thread Gandalf Corvotempesta
2013/7/22 Chen, Xiaoxi : > Imaging you have several writes have been flushed to journal and acked,but > not yet write to disk. Now the system crash by kernal panic or power > failure,you will lose your data in ram disk,thus lose data that assumed to be > successful written. The same apply in ca

Re: [ceph-users] SSD suggestions as journal

2013-07-22 Thread Gandalf Corvotempesta
2013/7/22 Mark Nelson : > I don't have any in my test lab, but the DC S3700 continues to look like a > good option and has a great reputation, but might be a bit pricey. From that > article it looks like the Micron P400m might be worth looking at too, but > seems to be a bit slower. DC S3500 shoul

[ceph-users] Multiple cluster addresses

2013-07-24 Thread Gandalf Corvotempesta
Hi all, I would like to achieve a fault-tolerant cluster with an InfiniBand network. Currently, one rsocket is bound to a single IB port. In the case of a dual-port HBA, I have to use multiple rsockets to use both ports. Is it possible to configure Ceph with multiple cluster addresses for each OSD?

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-28 Thread Gandalf Corvotempesta
2013/6/20 Matthew Anderson : > Hi All, > > I've had a few conversations on IRC about getting RDMA support into Ceph and > thought I would give it a quick attempt to hopefully spur some interest. > What I would like to accomplish is an RSockets only implementation so I'm > able to use Ceph, RBD and

[ceph-users] VM storage and OSD Ceph failures

2013-09-17 Thread Gandalf Corvotempesta
Hi all. Let's assume a Ceph cluster used to store VM disk images. VMs will be booted directly from RBD. What will happen in the case of an OSD failure, if the failed OSD is the primary the VM is reading from?

Re: [ceph-users] VM storage and OSD Ceph failures

2013-09-17 Thread Gandalf Corvotempesta
2013/9/17 Gregory Farnum : > The VM read will hang until a replica gets promoted and the VM resends the > read. In a healthy cluster with default settings this will take about 15 > seconds. Thank you.

[ceph-users] 10/100 network for Mons?

2013-09-18 Thread Gandalf Corvotempesta
Hi all. Currently I'm building a test cluster with 3 OSD servers connected with IPoIB for the cluster network and 10GbE for the public network. I have to connect these OSDs to some MON servers located in another rack with no gigabit or 10Gb connection. Could I use some 10/100 network ports? Which ki

[ceph-users] MONs numbers, hardware sizing and write ack

2013-09-19 Thread Gandalf Corvotempesta
Hi all, will increasing the total number of MONs available in a cluster, for example growing from 3 to 5, also decrease the hardware requirements (i.e. RAM and CPU) for each MON instance? I'm asking this because our cluster will be made of 5 OSD servers and I can easily put one MON on each O

Re: [ceph-users] MONs numbers, hardware sizing and write ack

2013-09-19 Thread Gandalf Corvotempesta
2013/9/19 Joao Eduardo Luis : > We have no benchmarks on that, that I am aware of. But the short and sweet > answer should be "not really, highly unlikely". > > If anything, increasing the number of mons should increase the response > time, although for such low numbers that should also be virtual

Re: [ceph-users] About use same SSD for OS and Journal

2013-10-26 Thread Gandalf Corvotempesta
2013/10/24 Wido den Hollander : > I have never seen one Intel SSD fail. I've been using them since the X25-M > 80GB SSDs and those are still in production without even one wearing out or > failing. Which kind of SSD are you using right now as a journal?

[ceph-users] USB pendrive as boot disk

2013-11-05 Thread Gandalf Corvotempesta
Hi, what do you think about using a USB pendrive as the boot disk for OSD nodes? Pendrives are cheap and big, and doing this will allow me to use all spinning disks and SSDs for OSD storage/journals. Moreover, in the future, I'll be able to boot from the network, replacing the pendrive without losing space on sp

Re: [ceph-users] USB pendrive as boot disk

2013-11-05 Thread Gandalf Corvotempesta
2013/11/5 : > It has been reported that the system is heavy on the OS during recovery; Why? Recovery is made from OSDs/SSDs, so why is Ceph heavy on the OS disk? There is nothing useful to read from that disk during a recovery.

Re: [ceph-users] USB pendrive as boot disk

2013-11-06 Thread Gandalf Corvotempesta
On 06 Nov 2013 at 23:12, "Craig Lewis" wrote: > > For my Ceph cluster, I'm going back to SSDs for the OS. Instead of using two of my precious 3.5" bays, I'm buying some PCI 2.5" drive bays: http://www.amazon.com/Syba-Mount-Mobile-2-5-Inch-SY-MRA25023/dp/B0080V73RE, and plugging them into the mot

[ceph-users] Docker

2013-11-28 Thread Gandalf Corvotempesta
Anybody using MONs and RGW inside Docker containers? I would like to use a server with two Docker containers, one for the MON and one for RGW. This is to achieve better isolation between services and some reusable components (the same container can be exported and used multiple times on multiple server

Re: [ceph-users] installing OS on software RAID

2013-11-30 Thread Gandalf Corvotempesta
2013/11/25 James Harper : > Is the OS doing anything apart from ceph? Would booting a ramdisk-only system > from USB or compact flash work? This is the same question I asked some time ago. Is it OK to use USB as the standard OS (OS, not OSD!) disk? OSDs and journals will be on dedicated disks. USB wi

[ceph-users] Journal, SSD and OS

2013-12-03 Thread Gandalf Corvotempesta
Hi, what do you think about using the same SSD for the journal and the root partition? For example: 1x 128GB SSD, 6 OSDs, 15GB for each journal, for each OSD, 5GB as root partition for the OS. This gives me 105GB of used space and 23GB of unused space (I've read somewhere that it is better not to use the whole SSD f

Re: [ceph-users] Journal, SSD and OS

2013-12-05 Thread Gandalf Corvotempesta
2013/12/4 Simon Leinen : > I think this is a fine configuration - you won't be writing to the root > partition too much, outside journals. We also put journals on the same > SSDs as root partitions (not that we're very ambitious about > performance...). Do you suggest a RAID1 for the OS partition

Re: [ceph-users] Journal, SSD and OS

2013-12-06 Thread Gandalf Corvotempesta
2013/12/6 Sebastien Han : > @James: I think that Gandalf’s main idea was to save some costs/space on the > servers so having dedicated disks is not an option. (that what I understand > from your comment “have the OS somewhere else” but I could be wrong) You are right. I don't have space for one

Re: [ceph-users] USB pendrive as boot disk

2013-12-16 Thread Gandalf Corvotempesta
2013/11/7 Kyle Bader : > Ceph handles it's own logs vs using syslog so I think your going to have to > write to tmpfs and have a logger ship it somewhere else quickly. I have a > feeling Ceph logs will eat a USB device alive, especially if you have to > crank up debugging. I wasn't aware of this.

Re: [ceph-users] USB pendrive as boot disk

2013-12-17 Thread Gandalf Corvotempesta
2013/12/16 Gregory Farnum : > There are log_to_syslog and err_to_syslog config options that will > send the ceph log output there. I don't remember all the config stuff > you need to set up properly and be aware of, but you should be able to > find it by searching the list archives or the docs. Th
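For reference, a minimal sketch of the syslog options Greg mentions (placing them under [global] and nulling the local log file are my assumptions, meant to spare the USB stick):
    [global]
        log to syslog = true
        err to syslog = true
        log file = /dev/null    # don't also write logs to the local disk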

[ceph-users] ceph-deploy: cluster network and admin node

2013-12-17 Thread Gandalf Corvotempesta
Hi all, I'm playing with ceph-deploy for the first time. Some questions: 1. how can I set a cluster network to be used by the OSDs? Should I set it manually? 2. does the admin node need to be reachable from every other server, or can I use a NATted workstation?

Re: [ceph-users] ceph-deploy: cluster network and admin node

2013-12-17 Thread Gandalf Corvotempesta
2013/12/17 Alfredo Deza : > The docs have a quick section to do this with ceph-deploy > (http://ceph.com/docs/master/start/quick-ceph-deploy/) > Have you seen that before? Or do you need something that covers a > cluster in more detail? There isn't anything about how to define a cluster network fo

Re: [ceph-users] ceph-deploy: cluster network and admin node

2013-12-19 Thread Gandalf Corvotempesta
2013/12/17 Gandalf Corvotempesta : > There isn't anything about how to define a cluster network for OSDs. > I don't know how to set a cluster address for each OSD. No help about this? I would like to set a cluster address for each OSD. Is this possible
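For reference, a minimal ceph.conf sketch of how this is usually handled (the addresses are made up; defining only the subnets in [global] is normally enough, and the per-OSD line is optional):
    [global]
        public network = 192.168.0.0/24
        cluster network = 10.0.0.0/24

    [osd.0]
        cluster addr = 10.0.0.11    # optional: pin a specific cluster-network IP for this OSD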

Re: [ceph-users] Sanity check of deploying Ceph very unconventionally (on top of RAID6, with very few nodes and OSDs)

2013-12-22 Thread Gandalf Corvotempesta
2013/12/17 Christian Balzer : > Network: > Infiniband QDR, 2x 18port switches (interconnected of course), redundant > paths everywhere, including to the clients (compute nodes). Are you using IPoIB? How do you interconnect both switches without making loops? AFAIK, IB switches don't support ST

[ceph-users] Chef cookbooks

2014-01-29 Thread Gandalf Corvotempesta
I'm looking at this: https://github.com/ceph/ceph-cookbooks which seems to support the whole Ceph stack (rgw, mons, osd, mds). Here: http://wiki.ceph.com/Guides/General_Guides/Deploying_Ceph_with_Chef#Configure_your_Ceph_Environment I can see that I need to configure the environment as in the example, and I

[ceph-users] ceph-deploy: update ceph.conf

2014-01-29 Thread Gandalf Corvotempesta
Hi, I would like to customize my ceph.conf generated by ceph-deploy. Should I customize the ceph.conf stored on the admin node and then sync it to each Ceph node? If yes: 1. can I sync directly from ceph-deploy or do I have to sync manually via scp? 2. I don't see any host definition in ceph.conf, what wi

[ceph-users] clock skew

2014-01-30 Thread Gandalf Corvotempesta
Hi. I'm using ntpd on each Ceph server and it syncs properly, but every time I reboot, Ceph starts in degraded mode with a "clock skew" warning. The only way I have to solve this is to manually restart Ceph on each node (without resyncing the clock). Any suggestions?

Re: [ceph-users] clock skew

2014-01-30 Thread Gandalf Corvotempesta
2014-01-30 Emmanuel Lacour : > here, I just wait until the skew is finished, without touching ceph. It > doesn't seems to do anything bad ... I've waited more than 1 hour with no success.

[ceph-users] Add RGW replication

2014-03-01 Thread Gandalf Corvotempesta
Hi, I have a working Ceph cluster. Is it possible to add RGW replication across two sites at a later time, or is it a feature that needs to be set up from the start?

Re: [ceph-users] clock skew

2014-03-12 Thread Gandalf Corvotempesta
2014-01-30 18:41 GMT+01:00 Eric Eastman : > I have this problem on some of my Ceph clusters, and I think it is due to > the older hardware the I am using does not have the best clocks. To fix the > problem, I setup one server in my lab to be my local NTP time server, and > then on each of my Ceph

[ceph-users] Wrong PG nums

2014-03-12 Thread Gandalf Corvotempesta
Hi all, I have this in my conf: # grep 'pg num' /etc/ceph/ceph.conf osd pool default pg num = 5600 But: # ceph osd pool get data pg_num pg_num: 64 Is this normal? Why were just 64 PGs created?
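For reference: the default pools are created when the cluster is first bootstrapped, so a value of "osd pool default pg num" added afterwards only affects pools created later. A minimal sketch for growing an existing pool (the pool name "data" is from the message; pgp_num has to follow pg_num or the data will not actually rebalance, and on a pool that already holds data the increase should be done gradually):
    ceph osd pool set data pg_num 5600
    ceph osd pool set data pgp_num 5600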

[ceph-users] OSD down after PG increase

2014-03-12 Thread Gandalf Corvotempesta
I've increased the PG number on a running cluster. After this operation, all OSDs from one node were marked down. Now, after a while, I'm seeing that the OSDs are slowly coming up again (sequentially) after rebalancing. Is this expected behaviour?

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 9:02 GMT+01:00 Andrey Korolyov : > Yes, if you have essentially high amount of commited data in the cluster > and/or large number of PG(tens of thousands). I've increased from 64 to 8192 PGs > If you have a room to > experiment with this transition from scratch you may want to play wit

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 10:53 GMT+01:00 Kasper Dieter : > After adding two new pools (each with 2 PGs) > 100 out of 140 OSDs are going down + out. > The cluster never recovers. In my case, cluster recovered after a couple of hours. How much time did you wait ? __

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 11:19 GMT+01:00 Dan Van Der Ster : > Do you mean you used PG splitting? > > You should split PGs by a factor of 2x at a time. So to get from 64 to 8192, > do 64->128, then 128->256, ..., 4096->8192. I've brutally increased, no further steps. 64 -> 8192 :-)
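A sketch of the stepwise approach Dan describes, assuming the pool is named "data" (an assumption) and waiting for the cluster to settle between doublings:
    for pg in 128 256 512 1024 2048 4096 8192; do
        ceph osd pool set data pg_num  $pg
        ceph osd pool set data pgp_num $pg
        # wait for HEALTH_OK before the next doubling
    done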

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 11:23 GMT+01:00 Gandalf Corvotempesta : > I've brutally increased, no further steps. > > 64 -> 8192 :-) I'm also unsure if 8192 PGs are correct for my cluster. At maximum i'll have 168 OSDs (14 servers, 12 disks each, 1 osd per disk), with replica set to
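For reference, the worked arithmetic with the usual rule of thumb (total PGs across all pools ~ OSDs * 100 / replicas, rounded up to a power of two; a replica count of 3 is an assumption since the message is cut off):
    168 OSDs * 100 / 3 replicas ~= 5600  ->  next power of two = 8192
So 8192 is in the right range as a cluster-wide total, but it has to be divided among all the pools that actually hold data, not given to a single pool.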

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 11:26 GMT+01:00 Dan Van Der Ster : > See http://tracker.ceph.com/issues/6922 > > This is explicity blocked in latest code (not sure if thats released yet). This seems to explain my behaviour.

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 11:32 GMT+01:00 Dan Van Der Ster : > Do you have any other pools? Remember that you need to include _all_ pools > in the PG calculation, not just a single pool. Currently I have only the standard pools (there should be 3). In production I'll also have RGW. So, which is the exact equation to d

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Gandalf Corvotempesta
So, under normal conditions with RGW enabled, only 2 pools have data in them: "data" and ".rgw.buckets"? In this case, I could use ReplicaNum*2 2014-03-13 11:48 GMT+01:00 Dan Van Der Ster : > On 13 Mar 2014 at 11:41:30, Gandalf Corvotempesta > (gandalf.corvotempe...@gmail.com

Re: [ceph-users] clock skew

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 12:59 GMT+01:00 Joao Eduardo Luis : > Anyway, most timeouts will hold for 5 seconds. Allowing clock drifts up to > 1 second may work, but we don't have hard data to support such claim. Over > a second of drift may be problematic if the monitors are under some workload > and message han
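For reference, the monitor options being discussed; a minimal sketch (the 0.5 s value is only an example, and raising it merely hides the warning, so NTP should still be fixed first):
    [mon]
        mon clock drift allowed = 0.5      # default is 0.05 s
        mon clock drift warn backoff = 30  # back-off factor for repeated warnings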

[ceph-users] RadosGW: bad request

2014-04-07 Thread Gandalf Corvotempesta
I'm getting these trying to upload any file: 2014-04-07 14:33:27.084369 7f5268f86700 5 Getting permissions id=testuser owner=testuser perm=2 2014-04-07 14:33:27.084372 7f5268f86700 10 uid=testuser requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2 2014-04-07 14:33:27.084377 7f5

Re: [ceph-users] RadosGW: bad request

2014-04-09 Thread Gandalf Corvotempesta
2014-04-07 20:24 GMT+02:00 Yehuda Sadeh : > Try bumping up logs (debug rgw = 20, debug ms = 1). Not enough info > here to say much, note that it takes exactly 30 seconds for the > gateway to send the error response, may be some timeout. I'd verify > that the correct fastcgi module is running. Sorr
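A minimal sketch of the logging bump Yehuda suggests (the section name is an assumption; use whatever your radosgw client section is called in ceph.conf, then restart the gateway):
    [client.radosgw.gateway]
        debug rgw = 20
        debug ms = 1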

[ceph-users] Fwd: RadosGW: bad request

2014-04-14 Thread Gandalf Corvotempesta
-- Forwarded message -- From: Gandalf Corvotempesta Date: 2014-04-09 14:31 GMT+02:00 Subject: Re: [ceph-users] RadosGW: bad request To: Yehuda Sadeh Cc: "ceph-users@lists.ceph.com" 2014-04-07 20:24 GMT+02:00 Yehuda Sadeh : > Try bumping up logs (debug rgw = 20,

[ceph-users] Fwd: RadosGW: bad request

2014-04-23 Thread Gandalf Corvotempesta
-- Forwarded message -- From: Gandalf Corvotempesta Date: 2014-04-14 16:06 GMT+02:00 Subject: Fwd: [ceph-users] RadosGW: bad request To: "ceph-users@lists.ceph.com" -- Forwarded message ------ From: Gandalf Corvotempesta Date: 2014-04-09 14:31 GMT+02:00 S

[ceph-users] cluster_network ignored

2014-04-24 Thread Gandalf Corvotempesta
I'm trying to configure a small ceph cluster with both public and cluster networks. This is my conf: [global] public_network = 192.168.0/24 cluster_network = 10.0.0.0/24 auth cluster required = cephx auth service required = cephx auth client required = cephx fsid = 004baba0-74dc-4429-8

[ceph-users] OOM-Killer for ceph-osd

2014-04-24 Thread Gandalf Corvotempesta
During a recovery, I'm hitting the OOM-killer for ceph-osd because it's using more than 90% of available RAM (8GB). How can I decrease the memory footprint during a recovery?

Re: [ceph-users] cluster_network ignored

2014-04-24 Thread Gandalf Corvotempesta
2014-04-24 18:09 GMT+02:00 Peter : > Do you have a typo? : > > public_network = 192.168.0/24 > > > should this read: > > public_network = 192.168.0.0/24 Sorry, it was a typo when posting to the list. ceph.conf is correct.

Re: [ceph-users] cluster_network ignored

2014-04-26 Thread Gandalf Corvotempesta
cluster IP's defined in the host file on each OSD > server? As I understand it, the mon's do not use a cluster network, only the > OSD servers. > > -Original Message- > From: ceph-users-boun...@lists.ceph.com > [mailto:ceph-users-boun...@lists.ceph.com] On Be

Re: [ceph-users] OOM-Killer for ceph-osd

2014-04-27 Thread Gandalf Corvotempesta
ount(>~1e8) or any extraordinary configuration parameter. > > On Mon, Apr 28, 2014 at 12:26 AM, Gandalf Corvotempesta > wrote: >> So, are you suggesting to lower the pg count ? >> Actually i'm using the suggested number of OSD*100/Replicas >> and I have just 2 OSD

Re: [ceph-users] OOM-Killer for ceph-osd

2014-04-27 Thread Gandalf Corvotempesta
2014-04-27 23:20 GMT+02:00 Andrey Korolyov : > For the record, ``rados df'' will give an object count. Would you mind > to send out your ceph.conf? I cannot imagine what parameter may raise > memory consumption so dramatically, so config at a glance may reveal > some detail. Also core dump should b

Re: [ceph-users] OOM-Killer for ceph-osd

2014-04-27 Thread Gandalf Corvotempesta
So, are you suggesting to lower the pg count ? Actually i'm using the suggested number of OSD*100/Replicas and I have just 2 OSDs per server. 2014-04-24 19:34 GMT+02:00 Andrey Korolyov : > On 04/24/2014 08:14 PM, Gandalf Corvotempesta wrote: >> During a recovery, I'm hitting

Re: [ceph-users] OOM-Killer for ceph-osd

2014-04-28 Thread Gandalf Corvotempesta
2014-04-27 23:58 GMT+02:00 Andrey Korolyov : > Nothing looks wrong, except heartbeat interval which probably should > be smaller due to recovery considerations. Try ``ceph osd tell X heap > release'' and if it will not change memory consumption, file a bug. What should I look for running this ? Se
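For reference, a sketch of the heap commands (osd.0 is just an example id; "heap stats" prints tcmalloc usage so you can compare the figures before and after a release):
    ceph tell osd.0 heap stats      # dump tcmalloc heap usage for this OSD
    ceph tell osd.0 heap release    # ask tcmalloc to return free pages to the OS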

Re: [ceph-users] cluster_network ignored

2014-04-28 Thread Gandalf Corvotempesta
2014-04-26 12:06 GMT+02:00 Gandalf Corvotempesta : > I've not defined cluster IPs for each OSD server but only the whole subnet. > Should I define each IP for each OSD? This is not written in the docs and > could be tricky to do in big environments with hundreds of nodes. I've
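A quick way to check which addresses the OSDs actually registered (a sketch; each osd line of the dump lists the public address followed by the cluster and heartbeat addresses):
    ceph osd dump | grep "^osd"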

Re: [ceph-users] cluster_network ignored

2014-04-28 Thread Gandalf Corvotempesta
2014-04-28 17:17 GMT+02:00 Kurt Bauer : > What do you mean by "I see all OSDs down"? I mean that my OSDs are detected as down:
$ sudo ceph osd tree
# id  weight  type name     up/down  reweight
-1    12.74   root default
-2    3.64    host osd13
0     1.82    osd.0         down     0
2     1.82    osd.2         down     0
-3    5.46    host osd12
1     1.82    osd

[ceph-users] Unable to bring cluster up

2014-04-29 Thread Gandalf Corvotempesta
After a simple "service ceph restart" on a server, i'm unable to get my cluster up again http://pastebin.com/raw.php?i=Wsmfik2M suddenly, some OSDs goes UP and DOWN randomly. I don't see any network traffic on cluster interface. How can I detect what ceph is doing ? From the posted output there i

[ceph-users] pgmap version increasing

2014-04-30 Thread Gandalf Corvotempesta
I'm testing an idle ceph cluster. my pgmap version is always increasing, is this normal ? 2014-04-30 17:20:41.934127 mon.0 [INF] pgmap v281: 640 pgs: 640 active+clean; 0 bytes data, 333 MB used, 14896 GB / 14896 GB avail 2014-04-30 17:20:42.962033 mon.0 [INF] pgmap v282: 640 pgs: 640 active+clean;

Re: [ceph-users] Unable to bring cluster up

2014-04-30 Thread Gandalf Corvotempesta
2014-04-30 22:11 GMT+02:00 Andrey Korolyov : > regarding this one and previous you told about memory consumption - > there are too much PGs, so memory consumption is so high as you are > observing. Dead loop of osd-never-goes-up is probably because of > suicide timeout of internal queues. It is may

Re: [ceph-users] Red Hat to acquire Inktank

2014-04-30 Thread Gandalf Corvotempesta
2014-04-30 14:18 GMT+02:00 Sage Weil : > Today we are announcing some very big news: Red Hat is acquiring Inktank. Great news. Any chance of getting native InfiniBand support in Ceph, like in GlusterFS?

Re: [ceph-users] Red Hat to acquire Inktank

2014-04-30 Thread Gandalf Corvotempesta
2014-04-30 22:27 GMT+02:00 Mark Nelson : > Check out the xio work that the linuxbox/mellanox folks are working on. > Matt Benjamin has posted quite a bit of info to the list recently! Is that usable ?

Re: [ceph-users] Red Hat to acquire Inktank

2014-04-30 Thread Gandalf Corvotempesta
2014-05-01 0:11 GMT+02:00 Mark Nelson : > Usable is such a vague word. I imagine it's testable after a fashion. :D OK, but I'd prefer "official" support, with IB integrated into the main Ceph repo.

Re: [ceph-users] Red Hat to acquire Inktank

2014-04-30 Thread Gandalf Corvotempesta
2014-05-01 0:20 GMT+02:00 Matt W. Benjamin : > Hi, > > Sure, that's planned for integration in Giant (see Blueprints). Great. Any ETA? Firefly was planned for February :)

[ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
Hi all, I would like to replace a disk used as a journal (one partition for each OSD). Which is the safest method to do so?

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
2014-05-06 12:39 GMT+02:00 Andrija Panic : > Good question - I'm also interested. Do you want to movejournal to dedicated > disk/partition i.e. on SSD or just replace (failed) disk with new/bigger one > ? I would like to replace the disk with a bigger one (in fact, my new disk is smaller, but this

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
2014-05-06 13:08 GMT+02:00 Dan Van Der Ster : > I've followed this recipe successfully in the past: > > http://wiki.skytech.dk/index.php/Ceph_-_howto,_rbd,_lvm,_cluster#Add.2Fmove_journal_in_running_cluster I'll try but my ceph.conf doesn't have any "osd journal" setting set (i'm using ceph-ansibl

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
2014-05-06 14:09 GMT+02:00 Fred Yang : > The journal location is not in ceph.conf, check > /var/lib/ceph/osd/ceph-X/journal, which is a symlink to the osd's journal > device. Symlinks point to the partition UUID; this prevents replacement without manual intervention: journal -> /dev/disk/by-pa

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
2014-05-06 16:33 GMT+02:00 Gandalf Corvotempesta : > Symlinks point to the partition UUID; this prevents replacement > without manual intervention: > > journal -> /dev/disk/by-partuuid/b234da10-dcad-40c7-aa97-92d35099e5a4 > > Is it not possible to create a symlink pointing t

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
2014-05-06 19:40 GMT+02:00 Craig Lewis : > I haven't tried this yet, but I imagine that the process is similar to > moving your journal from the spinning disk to an SSD. My journals are on an SSD. I have to replace that SSD.

[ceph-users] Cache tiering

2014-05-07 Thread Gandalf Corvotempesta
Very simple question: what happens if the server backing the cache pool goes down? For example, a read-only cache could be achieved by using a single server with no redundancy. Is Ceph smart enough to detect that the cache is unavailable and transparently redirect all requests to the main pool as usual? Th

Re: [ceph-users] Replace journals disk

2014-05-08 Thread Gandalf Corvotempesta
2014-05-08 18:43 GMT+02:00 Indra Pramana : > Since we don't use ceph.conf to indicate the data and journal paths, how can > I recreate the journal partitions? 1. Dump the partition scheme: sgdisk --backup=/tmp/journal_table /dev/sdd 2. Replace the journal disk device 3. Restore the old partition
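A sketch of the full replacement sequence around those three steps, assuming (my assumption) the journal SSD is /dev/sdd and hosts the journals of osd.0 and osd.1:
    ceph osd set noout                          # avoid rebalancing while the OSDs are down
    # stop the two OSD daemons (e.g. "stop ceph-osd id=0" or "service ceph stop osd.0", depending on init system)
    ceph-osd -i 0 --flush-journal
    ceph-osd -i 1 --flush-journal
    sgdisk --backup=/tmp/journal_table /dev/sdd
    # physically swap the SSD, then restore the same partition GUIDs so the by-partuuid symlinks keep working:
    sgdisk --load-backup=/tmp/journal_table /dev/sdd
    ceph-osd -i 0 --mkjournal
    ceph-osd -i 1 --mkjournal
    # start the OSD daemons again
    ceph osd unset noout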

[ceph-users] Migrate whole clusters

2014-05-09 Thread Gandalf Corvotempesta
Let's assume a test cluster up and running with real data on it. Which is the best way to migrate everything to a production (and larger) cluster? I'm thinking to add production MONs to the test cluster, after that, add productions OSDs to the test cluster, waiting for a full rebalance and then st

Re: [ceph-users] Replace journals disk

2014-05-09 Thread Gandalf Corvotempesta
2014-05-09 15:55 GMT+02:00 Sage Weil : > This looks correct to me! Some command to automate this in Ceph would be nice, for example skipping the "mkjournal" step: ceph-osd -i 30 --mkjournal ceph-osd -i 31 --mkjournal Ceph should be smart enough to automatically create journals if missing, so that

Re: [ceph-users] Migrate whole clusters

2014-05-13 Thread Gandalf Corvotempesta
2014-05-13 21:21 GMT+02:00 Gregory Farnum : > You misunderstand. Migrating between machines for incrementally > upgrading your hardware is normal behavior and well-tested (likewise > for swapping in all-new hardware, as long as you understand the IO > requirements involved). So is decommissioning o

[ceph-users] Disaster recovery and backups

2016-06-05 Thread Gandalf Corvotempesta
Let's assume that everything went very, very badly and I have to manually recover a cluster with an unconfigured Ceph. 1. How can I recover data directly from the raw disks? Is this possible? 2. How can I restore a Ceph cluster (and get the data back) by using the existing disks? 3. How do you manage backups

[ceph-users] Disk failures

2016-06-07 Thread Gandalf Corvotempesta
Hi, how does Ceph detect and manage disk failures? What happens if some data is written to a bad sector? Is there any chance of the bad data getting "distributed" across the cluster due to replication? Is Ceph able to remove the OSD bound to the failed disk automatically?

Re: [ceph-users] Disk failures

2016-06-08 Thread Gandalf Corvotempesta
2016-06-08 20:49 GMT+02:00 Krzysztof Nowicki : > From my own experience with failing HDDs I've seen cases where the drive was > failing silently initially. This manifested itself in repeated deep scrub > failures. Correct me if I'm wrong here, but Ceph keeps checksums of data > being written and in

Re: [ceph-users] Disk failures

2016-06-08 Thread Gandalf Corvotempesta
On 09 Jun 2016 at 02:09, "Christian Balzer" wrote: > Ceph currently doesn't do any (relevant) checksumming at all, so if a > PRIMARY PG suffers from bit-rot this will be undetected until the next > deep-scrub. > > This is one of the longest and gravest outstanding issues with Ceph and > supposed

Re: [ceph-users] Disk failures

2016-06-09 Thread Gandalf Corvotempesta
2016-06-09 9:16 GMT+02:00 Christian Balzer : > Neither, a journal failure is lethal for the OSD involved and unless you > have LOTS of money RAID1 SSDs are a waste. Ok, so if a journal failure is lethal, Ceph automatically removes the affected OSD and starts rebalancing, right? > Additionally your c

[ceph-users] RDMA/Infiniband status

2016-06-09 Thread Gandalf Corvotempesta
The last time I used Ceph (around 2014), RDMA/InfiniBand support was just a proof of concept and I was using IPoIB with low performance (about 8-10Gb/s on an InfiniBand DDR 20Gb/s link). That was 2 years ago. Any news about this? Is RDMA/InfiniBand supported like it is in GlusterFS?

Re: [ceph-users] RDMA/Infiniband status

2016-06-09 Thread Gandalf Corvotempesta
2016-06-09 10:18 GMT+02:00 Christian Balzer : > IPoIB is about half the speed of your IB layer, yes. Ok, so it's normal. I've seen benchmarks on net stating that IPoIB on DDR should reach about 16-17Gb/s I'll plan to move to QDR > And bandwidth is (usually) not the biggest issue, latency is. I'v

Re: [ceph-users] Disk failures

2016-06-09 Thread Gandalf Corvotempesta
2016-06-09 10:28 GMT+02:00 Christian Balzer : > Define "small" cluster. Max 14 OSD nodes with 12 disks each, replica 3. > Your smallest failure domain both in Ceph (CRUSH rules) and for > calculating how much over-provisioning you need should always be the > node/host. > This is the default CRUSH

Re: [ceph-users] RDMA/Infiniband status

2016-06-09 Thread Gandalf Corvotempesta
On 09 Jun 2016 at 15:41, "Adam Tygart" wrote: > > If you're > using pure DDR, you may need to tune the broadcast group in your > subnet manager to set the speed to DDR. Do you know how to set this with opensm? I would like to bring up my test cluster again in the next few days

Re: [ceph-users] Disk failures

2016-06-14 Thread Gandalf Corvotempesta
On 15 Jun 2016 at 03:27, "Christian Balzer" wrote: > And that makes deep-scrubbing something of quite limited value. This is not true. If you checksum *before* writing to disk (so when the data is still in RAM), then when reading back from disk you could do the checksum verification, and if it doesn't m

Re: [ceph-users] Disk failures

2016-06-15 Thread Gandalf Corvotempesta
On 15 Jun 2016 at 09:42, "Christian Balzer" wrote: > > This is why people are using BTRFS and ZFS for filestore (despite the > problems they in turn create) and why the roadmap for bluestore has > checksums for reads on it as well (or so we've been told). Does bitrot happen only on files? What abou

Re: [ceph-users] Disk failures

2016-06-15 Thread Gandalf Corvotempesta
On 15 Jun 2016 at 09:58, "Christian Balzer" wrote > You _do_ know how and where Ceph/RBD store their data? > > Right now that's on disks/SSDs, formated with a file system. > And XFS or EXT4 will not protect against bitrot, while BTRFS and ZFS will. > Wait, I'm new to ceph and some things are no

[ceph-users] Switches and latency

2016-06-15 Thread Gandalf Corvotempesta
Let's assume a fully redundant network. We need 4 switches: 2 for the public network, 2 for the cluster network. 10GBase-T has higher latency than SFP+, but it is also cheaper, as many new servers have 10GBase-T integrated onboard and there is no need for twinax cables or transceivers. I think that low

Re: [ceph-users] Switches and latency

2016-06-15 Thread Gandalf Corvotempesta
2016-06-15 22:13 GMT+02:00 Nick Fisk : > I would reconsider if you need separate switches for each network, vlans > would normally be sufficient. If bandwidth is not an issue, you could even > tag both vlans over the same uplinks. Then there is the discussion around > whether separate networks are

Re: [ceph-users] Switches and latency

2016-06-15 Thread Gandalf Corvotempesta
2016-06-15 22:59 GMT+02:00 Nick Fisk : > Possibly, but by how much? 20GB of bandwidth is a lot to feed 12x7.2k disks, > particularly if they start doing any sort of non-sequential IO. Assuming 100MB/s for each SATA disk, 12 disks are 1200MB/s = 9600mbit/s Why are you talking about 20Gb/s ? By usi

Re: [ceph-users] Switches and latency

2016-06-16 Thread Gandalf Corvotempesta
2016-06-16 3:53 GMT+02:00 Christian Balzer : > Gandalf, first read: > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg29546.html > > And this thread by Nick: > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg29708.html Interesting reading. Thanks. > Overly optimistic. > In an

Re: [ceph-users] Switches and latency

2016-06-16 Thread Gandalf Corvotempesta
2016-06-16 12:54 GMT+02:00 Oliver Dzombic : > aside from the question of the coolness factor of Infinitiband, > you should always also consider the question of replacing parts and > extending cluster. > > A 10G Network environment is up to date currently, and will be for some > more years. You can

[ceph-users] IOPS requirements

2016-06-17 Thread Gandalf Corvotempesta
As I'm planning a new cluster to move all my virtual machines to (currently on local storage on each hypervisor), I would like to evaluate the current IOPS on each server. Knowing the current IOPS, I'll be able to tell how many IOPS I need from Ceph. I'm not an expert; do you know how to get this in

Re: [ceph-users] IOPS requirements

2016-06-17 Thread Gandalf Corvotempesta
2016-06-17 10:03 GMT+02:00 Christian Balzer : > I'm unfamilar with Xen and Xenserver (the later doesn't support RBD, btw), > but if you can see all the combined activity of your VMs on your HW in the > dom0 like with KVM/qemu, a simple "iostat" or "iostat -x" will give you the > average IOPS of a d
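A minimal sketch of reading IOPS out of iostat, per Christian's suggestion (the 5-second interval is arbitrary, and the first sample shows averages since boot, so look at the later ones):
    iostat -x 5
    # r/s and w/s are the per-device read and write IOPS for each interval;
    # sum them across the data disks for a rough per-host figure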

Re: [ceph-users] IOPS requirements

2016-06-20 Thread Gandalf Corvotempesta
On 18 Jun 2016 at 07:10, "Christian Balzer" wrote: > That sounds extremely high, is that more or less consistent? > How many VMs is that for? > What are you looking at, as in are those individual disks/SSDs, a raid > (what kind)? 800-1000 was a peak over about 5 minutes. It was just a test to s

Re: [ceph-users] RadosGW Admin API

2013-02-27 Thread Gandalf Corvotempesta
2013/2/26 Yehuda Sadeh : > The admin endpoint is 'admin' by default. You set it through the 'rgw > admin entry' configurable. What do you mean by "endpoint"? Actually I'm able to get usage stats (after adding the usage caps in read-only mode to my users) from the "bucket" admin: GET /admin/usage Host:
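For comparison, the same usage data is also available from the command line, which is handy for checking what the REST call should return (the uid is taken from the message above):
    radosgw-admin usage show --uid=testuser --show-log-entries=false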
