Re: [ceph-users] Ceph migration to AWS

2015-05-04 Thread Kyle Bader
> To those interested in a tricky problem, > > We have a Ceph cluster running at one of our data centers. One of our > client's requirements is to have them hosted at AWS. My question is: How do > we effectively migrate our data on our internal Ceph cluster to an AWS Ceph > cluster? > > Ideas curre

Re: [ceph-users] xfs/nobarrier

2014-12-27 Thread Kyle Bader
> do people consider a UPS + Shutdown procedures a suitable substitute? I certainly wouldn't; I've seen utility power fail and the transfer switch fail to transition to UPS strings. Had this happened to me with nobarrier it would have been a very sad day.
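
A quick way to audit for this (a sketch, assuming util-linux's findmnt is available; not from the original post):

    findmnt -t xfs -o TARGET,OPTIONS | grep nobarrier

Any output means a filesystem is mounted with barriers disabled.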

Re: [ceph-users] private network - VLAN vs separate switch

2014-11-26 Thread Kyle Bader
> Thanks for all the help. Can the moving over from VLAN to separate > switches be done on a live cluster? Or does there need to be a down > time? You can do it on a live cluster. The more cavalier approach would be to quickly switch the link over one server at a time, which might cause a short io

Re: [ceph-users] private network - VLAN vs separate switch

2014-11-25 Thread Kyle Bader
> For a large network (say 100 servers and 2500 disks), are there any > strong advantages to using separate switch and physical network > instead of VLAN? Physical isolation will ensure that congestion on one does not affect the other. On the flip side, asymmetric network failures tend to be more

Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-08-06 Thread Kyle Bader
> Can you paste me the whole output of the install? I am curious why/how you > are getting el7 and el6 packages. priority=1 is required in the /etc/yum.repos.d/ceph.repo entries -- Kyle
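
A minimal sketch of what such a repo entry might look like (the baseurl is a placeholder, and the yum priorities plugin must be installed for priority= to take effect):

    [ceph]
    name=Ceph packages
    baseurl=http://ceph.com/rpm/el7/x86_64/   # placeholder; point at the release you actually want
    enabled=1
    gpgcheck=1
    priority=1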

Re: [ceph-users] Is OSDs based on VFS?

2014-07-21 Thread Kyle Bader
> I wonder whether OSDs use the system calls of the Virtual File System (i.e. open, read, > write, etc.) when they access disks. > > I mean ... Could I monitor the I/O commands requested by an OSD to its disks if I > monitor VFS? Ceph OSDs run on top of a traditional filesystem, so long as it supports xattrs - xfs by de

Re: [ceph-users] Bypass Cache-Tiering for special reads (Backups)

2014-07-02 Thread Kyle Bader
> I was wondering, having a cache pool in front of an RBD pool is all fine > and dandy, but imagine you want to pull backups of all your VMs (or one > of them, or multiple...). Going to the cache for all those reads isn't > only pointless, it'll also potentially fill up the cache and possibly > evi

Re: [ceph-users] Journal SSD durability

2014-05-13 Thread Kyle Bader
> TL;DR: Power outages are more common than your colo facility will admit. Seconded. I've seen power failures in at least 4 different facilities, all of which had the usual gamut of batteries/generators/etc. At some of those facilities I've seen problems multiple times in a single year. Even a data

Re: [ceph-users] Migrate whole clusters

2014-05-13 Thread Kyle Bader
> Anyway, replacing the set of monitors means downtime for every client, so > I'm in doubt whether 'no outage' still applies there. Taking the entire quorum down for migration would be bad. It's better to add one in the new location, remove one at the old, ad infinitum. -- Kyle
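
A hedged sketch of one round of that loop (monitor names and the address below are made up):

    ceph mon add mon-new1 192.0.2.10:6789   # bring a monitor up at the new location first
    ceph quorum_status                      # wait until it has joined quorum
    ceph mon remove mon-old1                # then retire one at the old location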

Re: [ceph-users] Migrate whole clusters

2014-05-09 Thread Kyle Bader
> Let's assume a test cluster up and running with real data on it. > What is the best way to migrate everything to a production (and > larger) cluster? > > I'm thinking of adding production MONs to the test cluster, after that, > adding production OSDs to the test cluster, waiting for a full rebalance

Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache

2014-04-17 Thread Kyle Bader
>> >> I think the timing should work that we'll be deploying with Firefly and >> >> so >> >> have Ceph cache pool tiering as an option, but I'm also evaluating >> >> Bcache >> >> versus Tier to act as node-local block cache device. Does anybody have >> >> real >> >> or anecdotal evidence about whic

Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache

2014-04-16 Thread Kyle Bader
>> Obviously the ssds could be used as journal devices, but I'm not really >> convinced whether this is worthwhile when all nodes have 1GB of hardware >> writeback cache (writes to journal and data areas on the same spindle have >> time to coalesce in the cache and minimise seek time hurt). Any adv

Re: [ceph-users] question on harvesting freed space

2014-04-15 Thread Kyle Bader
> I'm assuming Ceph/RBD doesn't have any direct awareness of this since > the file system doesn't traditionally have a "give back blocks" > operation to the block device. Is there anything special RBD does in > this case that communicates the release of the Ceph storage back to the > pool? VMs ru

Re: [ceph-users] Re: Re: why object can't be recovered when delete one replica

2014-03-24 Thread Kyle Bader
> I have run the repair command, and the warning info disappears in the output of "ceph health detail", but the replica isn't recovered in the "current" directory. > In all, the ceph cluster status can recover (the pg's status recovers from inconsistent to active and clean), but not the replica. I

Re: [ceph-users] Error initializing cluster client: Error

2014-03-22 Thread Kyle Bader
> I have two nodes with 8 OSDs on each. First node running 2 monitors on > different virtual machines (mon.1 and mon.2), second node runing mon.3 > After several reboots (I have tested power failure scenarios) "ceph -w" on > node 2 always fails with message: > > root@bes-mon3:~# ceph --verbose -w

Re: [ceph-users] why object can't be recovered when delete one replica

2014-03-22 Thread Kyle Bader
> I upload a file through swift API, then delete it in the “current” directory > in the secondary OSD manually, why the object can’t be recovered? > > If I delete it in the primary OSD, the object is deleted directly in the > pool .rgw.bucket and it can’t be recovered from the secondary OSD. > > Do

Re: [ceph-users] Mounting with dmcrypt still fails

2014-03-22 Thread Kyle Bader
> ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir > /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb > ceph-disk: Error: Device /dev/sdb2 is in use by a device-mapper mapping > (dm-crypt?): dm-0 It sounds like device-mapper still thinks it's using the volume; you might be able t
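
A hedged way to confirm and clear the stale mapping (the mapping name is hypothetical; make sure nothing is using it before removing):

    dmsetup ls                      # list mappings; find the one holding /dev/sdb2
    dmsetup info <mapping-name>     # hypothetical name; inspect the mapping ceph-disk complained about (dm-0)
    dmsetup remove <mapping-name>   # or cryptsetup remove <mapping-name> for dm-crypt volumes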

Re: [ceph-users] osd rebalance question

2014-03-22 Thread Kyle Bader
> I need to add an extra server, which hosts several OSDs, to a > running ceph cluster. While adding the OSDs, ceph would not automatically modify > the ceph.conf, so I manually modified the ceph.conf > > and restarted the whole ceph cluster with the command: 'service ceph -a restart'. > I just confuse

Re: [ceph-users] OSD + FlashCache vs. Cache Pool for RBD...

2014-03-22 Thread Kyle Bader
> One downside of the above arrangement: I read that support for mapping > newer-format RBDs is only present in fairly recent kernels. I'm running > Ubuntu 12.04 on the cluster at present with its stock 3.2 kernel. There > is a PPA for the 3.11 kernel used in Ubuntu 13.10, but if you're looking >

Re: [ceph-users] What's the difference between using /dev/sdb and /dev/sdb1 as osd?

2014-03-22 Thread Kyle Bader
> If I want to use a disk dedicated for osd, can I just use something like > /dev/sdb instead of /dev/sdb1? Is there any negative impact on performance? You can pass /dev/sdb to ceph-disk-prepare and it will create two partitions, one for the journal (raw partition) and one for the data volume (de
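
In practice that means handing ceph-disk-prepare the whole device, for example (drive letter is illustrative):

    ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdb
    # creates a journal partition and a data partition, labels them, and the
    # udev/upstart machinery mounts and activates the OSD from there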

Re: [ceph-users] if partition name changes, will ceph get corrupted?

2014-03-12 Thread Kyle Bader
> We use /dev/disk/by-path for this reason, but we confirmed that is stable > for our HBAs. Maybe /dev/disk/by-something is consistent with your > controller. The upstart/udev scripts will handle mounting and osd id detection, at least on Ubuntu. -- Kyle
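
For example, to see which persistent names your controller actually exposes before relying on them:

    ls -l /dev/disk/by-path/   # each symlink points at the sdX device currently behind that path
    ls -l /dev/disk/by-id/     # an alternative keyed off the drive serial/WWN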

Re: [ceph-users] Put Ceph Cluster Behind a Pair of LB

2014-03-12 Thread Kyle Bader
> This is in my lab. Plain passthrough setup with automap enabled on the F5. s3 > & curl work fine as far as queries go. But file transfer rate degrades badly > once I start file up/download. Maybe the difference can be attributed to LAN client traffic with jumbo frames vs F5 using a smaller WAN

Re: [ceph-users] Put Ceph Cluster Behind a Pair of LB

2014-03-12 Thread Kyle Bader
> You're right. Sorry didn't specify I was trying this for Radosgw. Even for > this I'm seeing performance degrade once my clients start to hit the LB VIP. Could you tell us more about your load balancer and configuration? -- Kyle

Re: [ceph-users] Put Ceph Cluster Behind a Pair of LB

2014-03-12 Thread Kyle Bader
> Does anybody have a good practice for setting up a ceph cluster behind a pair of > load balancers? The only place you would want to put a load balancer in the context of a Ceph cluster would be north of RGW nodes. You can do L3 transparent load balancing or balance with an L7 proxy, i.e. Linux Virtual
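
The post mentions LVS; purely as an illustration of the L7 option, an HAProxy fragment in front of two RGW nodes might look like this (names and addresses are made up):

    frontend rgw_front
        bind *:80
        default_backend rgw_back

    backend rgw_back
        balance roundrobin
        server rgw1 192.0.2.11:80 check
        server rgw2 192.0.2.12:80 check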

Re: [ceph-users] qemu-rbd

2014-03-11 Thread Kyle Bader
> I tried rbd-fuse and it's throughput using fio is approx. 1/4 that of the > kernel client. > > Can you please let me know how to setup RBD backend for FIO? I'm assuming > this RBD backend is also based on librbd? You will probably have to build fio from source since the rbd engine is new: htt
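
Once fio is built with rbd support, a job file along these lines exercises librbd directly (pool, image and user names are examples):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    rw=randwrite
    bs=4k
    runtime=60

    [rbd-iodepth32]
    iodepth=32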

Re: [ceph-users] Utilizing DAS on XEN or XCP hosts for Openstack Cinder

2014-03-11 Thread Kyle Bader
> 1. Is it possible to install Ceph and Ceph monitors on the XCP > (XEN) Dom0 or would we need to install it on the DomU containing the > Openstack components? I'm not a Xen guru but in the case of KVM I would run the OSDs on the hypervisor to avoid virtualization overhead. > 2. I

Re: [ceph-users] Encryption/Multi-tennancy

2014-03-11 Thread Kyle Bader
> There could be millions of tennants. Looking deeper at the docs, it looks > like Ceph prefers to have one OSD per disk. We're aiming at having > backblazes, so will be looking at 45 OSDs per machine, many machines. I want > to separate the tennants and separately encrypt their data. The enc

Re: [ceph-users] Recommended node size for Ceph

2014-03-10 Thread Kyle Bader
> Why the limit of 6 OSDs per SSD? SATA/SAS throughput generally. > I am doing testing with a PCI-e based SSD, and showing that even with 15 OSD disk drives per SSD that the SSD is keeping up. That will probably be fine performance wise but it's worth noting that all OSDs will fail if the flash
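
A rough back-of-the-envelope behind the 6:1 rule of thumb (throughput figures are approximate assumptions, not from the original post):

    6 journals x ~100 MB/s sustained writes  ~ 600 MB/s
    one 6 Gbps SATA/SAS SSD                  ~ 550 MB/s usable

so a single SATA/SAS SSD tops out around six spinning OSDs' worth of journal traffic, while a PCIe device has more headroom - at the cost of the wider failure domain noted above.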

Re: [ceph-users] Encryption/Multi-tennancy

2014-03-10 Thread Kyle Bader
> Ceph is seriously badass, but my requirements are to create a cluster in > which I can host my customer's data in separate areas which are independently > encrypted, with passphrases which we as cloud admins do not have access to. > > My current thoughts are: > 1. Create an OSD per machine stre

Re: [ceph-users] Running a mon on a USB stick

2014-03-08 Thread Kyle Bader
> Is there an issue with IO performance? Ceph monitors store cluster maps and various other things in leveldb, which persists to disk. I wouldn't recommend using sd/usb cards for the monitor store because they tend to be slow and have poor durability. -- Kyle

Re: [ceph-users] questions about ceph cluster in multi-dacenter

2014-02-20 Thread Kyle Bader
> What could be the best replication? Are you using two sites to increase availability, durability, or both? For availability you're really better off using three sites and using CRUSH to place each of three replicas in a different datacenter. In this setup you can survive losing 1 of 3 datacenters.
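
A hedged sketch of such a rule, assuming the CRUSH map already defines a datacenter bucket type with three datacenters under the default root:

    rule replicated_3dc {
        ruleset 1
        type replicated
        min_size 3
        max_size 3
        step take default
        step chooseleaf firstn 0 type datacenter
        step emit
    }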

Re: [ceph-users] How client choose among replications?

2014-02-11 Thread Kyle Bader
> Why would it help? Since it's not that ONE OSD will be primary for all objects. There will be 1 primary OSD per PG and you'll probably have a couple of thousand PGs. The primary may be across an oversubscribed/expensive link, in which case a local replica with a common ancestor to the client may

Re: [ceph-users] poor data distribution

2014-02-01 Thread Kyle Bader
> Changing pg_num for .rgw.buckets to a power of 2 and 'crush tunables > optimal' didn't help :( Did you bump pgp_num as well? The split pgs will stay in place until pgp_num is bumped as well; if you do this, be prepared for (potentially lots of) data movement.
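
Something along these lines, assuming the pool was already bumped to a power-of-two pg_num (2048 is purely illustrative):

    ceph osd pool set .rgw.buckets pg_num 2048
    ceph osd pool set .rgw.buckets pgp_num 2048   # without this the newly split PGs never remap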

Re: [ceph-users] RADOS Gateway Issues

2014-01-23 Thread Kyle Bader
> HEALTH_WARN 1 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck unclean; 7 requests are blocked > 32 sec; 3 osds have slow requests; pool cloudstack has too few pgs; pool .rgw.buckets has too few pgs > pg 14.0 is stuck inactive since forever, current state incomplete, last acting [5,0

Re: [ceph-users] Power Cycle Problems

2014-01-16 Thread Kyle Bader
> wants to use Ceph for VM storage in the future, we need to find a solution. That's a shame, but at least you will be better prepared if it happens again; hopefully your luck is not as unfortunate as mine! -- Kyle Bader

Re: [ceph-users] Networking questions

2013-12-26 Thread Kyle Bader
> Do monitors have to be on the cluster network as well or is it sufficient > for them to be on the public network as > http://ceph.com/docs/master/rados/configuration/network-config-ref/ > suggests? Monitors only need to be on the public network. > Also would the OSDs re-route their traffic over

Re: [ceph-users] Failure probability with largish deployments

2013-12-26 Thread Kyle Bader
> Yes, that also makes perfect sense, so the aforementioned 12500 objects > for a 50GB image, at a 60 TB cluster/pool with 72 disk/OSDs and 3 way > replication that makes 2400 PGs, following the recommended formula. > >> > What amount of disks (OSDs) did you punch in for the following run? >> >> Di

Re: [ceph-users] Failure probability with largish deployments

2013-12-23 Thread Kyle Bader
> Is an object a CephFS file or a RBD image or is it the 4MB blob on the > actual OSD FS? Objects are at the RADOS level, CephFS filesystems, RBD images and RGW objects are all composed by striping RADOS objects - default is 4MB. > In my case, I'm only looking at RBD images for KVM volume storage
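
You can see the effect on an existing image; for example (image name is made up):

    rbd info rbd/vm1   # the "order" field gives the object size: order 22 = 4 MB objects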

Re: [ceph-users] Failure probability with largish deployments

2013-12-20 Thread Kyle Bader
Using your data as inputs to the Ceph reliability calculator [1] results in the following:

    Disk Modeling Parameters
        size:          3TiB
        FIT rate:      826 (MTBF = 138.1 years)
        NRE rate:      1.0E-16
    RAID parameters
        replace:       6 hours
        recovery rate: 500MiB/s (100 mi

Re: [ceph-users] Ceph network topology with redundant switches

2013-12-20 Thread Kyle Bader
> The area I'm currently investigating is how to configure the > networking. To avoid a SPOF I'd like to have redundant switches for > both the public network and the internal network, most likely running > at 10Gb. I'm considering splitting the nodes in to two separate racks > and connecting each

Re: [ceph-users] radosgw daemon stalls on download of some files

2013-12-19 Thread Kyle Bader
> Do you have any further detail on this radosgw bug? https://github.com/ceph/ceph/commit/0f36eddbe7e745665a634a16bf3bf35a3d0ac424 https://github.com/ceph/ceph/commit/0b9dc0e5890237368ba3dc34cb029010cb0b67fd > Does it only apply to emperor? The bug is present in dumpling too.

Re: [ceph-users] Rbd image performance

2013-12-15 Thread Kyle Bader
>> Has anyone tried scaling a VM's io by adding additional disks and >> striping them in the guest os? I am curious what effect this would have >> on io performance? > Why would it? You can also change the stripe size of the RBD image. Depending on the workload you might change it from 4MB to some
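
For reference, a hedged example of creating an image with a non-default stripe layout (all values illustrative; fancy striping requires format 2 images):

    rbd create vm-scratch --size 102400 --image-format 2 \
        --stripe-unit 65536 --stripe-count 8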

[ceph-users] SysAdvent: Day 15 - Distributed Storage with Ceph

2013-12-15 Thread Kyle Bader
For your holiday pleasure I've prepared a SysAdvent article on Ceph: http://sysadvent.blogspot.com/2013/12/day-15-distributed-storage-with-ceph.html Check it out! -- Kyle

Re: [ceph-users] CEPH and Savanna Integration

2013-12-14 Thread Kyle Bader
> Introduction to Savanna for those who haven't heard of it: > > The Savanna project aims to provide users with simple means to provision a > Hadoop > > cluster at OpenStack by specifying several parameters like Hadoop version, > cluster > > topology, node hardware details and a few more. > > For now, Sav

[ceph-users] NUMA and ceph

2013-12-12 Thread Kyle Bader
It seems that NUMA can be problematic for ceph-osd daemons in certain circumstances. Namely it seems that if a NUMA zone is running out of memory due to uneven allocation it is possible for a NUMA zone to enter reclaim mode when threads/processes are scheduled on a core in that zone and those proce
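
Two hedged mitigations on a Linux host (commands shown for illustration only):

    cat /proc/sys/vm/zone_reclaim_mode          # non-zero means a starved zone reclaims locally
    echo 0 > /proc/sys/vm/zone_reclaim_mode     # disable zone reclaim
    numactl --interleave=all ceph-osd -i 0 -f   # or interleave the daemon's allocations across nodes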

Re: [ceph-users] ceph reliability in large RBD setups

2013-12-10 Thread Kyle Bader
> I've been running similar calculations recently. I've been using this > tool from Inktank to calculate RADOS reliabilities with different > assumptions: > https://github.com/ceph/ceph-tools/tree/master/models/reliability > > But I've also had similar questions about RBD (or any multi-part files

Re: [ceph-users] Anybody doing Ceph for OpenStack with OSDs across compute/hypervisor nodes?

2013-12-09 Thread Kyle Bader
> We're running OpenStack (KVM) with local disk for ephemeral storage. > Currently we use local RAID10 arrays of 10k SAS drives, so we're quite rich > for IOPS and have 20GE across the board. Some recent patches in OpenStack > Havana make it possible to use Ceph RBD as the source of ephemeral VM >

Re: [ceph-users] optimal setup with 4 x ethernet ports

2013-12-06 Thread Kyle Bader
> looking at tcpdump all the traffic is going exactly where it is supposed to > go, in particular an osd on the 192.168.228.x network appears to talk to an > osd on the 192.168.229.x network without anything strange happening. I was > just wondering if there was anything about ceph that could ma

Re: [ceph-users] optimal setup with 4 x ethernet ports

2013-12-04 Thread Kyle Bader
>> Is having two cluster networks like this a supported configuration? Every >> osd and mon can reach every other so I think it should be. > > Maybe. If your back end network is a supernet and each cluster network is a > subnet of that supernet. For example: > > Ceph.conf cluster network (supernet)

Re: [ceph-users] optimal setup with 4 x ethernet ports

2013-12-02 Thread Kyle Bader
> Is having two cluster networks like this a supported configuration? Every osd and mon can reach every other so I think it should be. Maybe. If your back end network is a supernet and each cluster network is a subnet of that supernet. For example: Ceph.conf cluster network (supernet): 10.0.0.0/8
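
A minimal ceph.conf sketch of the idea (addressing is made up):

    [global]
    public network  = 192.168.0.0/16
    cluster network = 10.0.0.0/8    # the supernet
    # rack A OSDs actually sit in 10.1.0.0/16 and rack B OSDs in 10.2.0.0/16; both are
    # subnets of the supernet, so every OSD binds an address the others can route to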

Re: [ceph-users] Impact of fancy striping

2013-11-30 Thread Kyle Bader
> This journal problem is a bit of wizardry to me; I even had weird intermittent issues with OSDs not starting because the journal was not found, so please do not hesitate to suggest a better journal setup. You mentioned using SAS for the journal; if your OSDs are SATA and an expander is in the data pa

Re: [ceph-users] installing OS on software RAID

2013-11-30 Thread Kyle Bader
> > Is the OS doing anything apart from ceph? Would booting a ramdisk-only system from USB or compact flash work? I haven't tested this kind of configuration myself but I can't think of anything that would preclude this type of setup. I'd probably use squashfs layered with a tmpfs via aufs to avoid

Re: [ceph-users] 回复:Re: testing ceph performance issue

2013-11-27 Thread Kyle Bader
> How much can performance be improved if we use SSDs to store journals? You will see roughly twice the throughput unless you are using btrfs (still improved but not as dramatic). You will also see lower latency because the disk head doesn't have to seek back and forth between journal and data par

Re: [ceph-users] OSD on an external, shared device

2013-11-26 Thread Kyle Bader
> Is there any way to manually configure which OSDs are started on which > machines? The osd configuration block includes the osd name and host, so is > there a way to say that, say, osd.0 should only be started on host vashti > and osd.1 should only be started on host zadok? I tried using thi

Re: [ceph-users] installing OS on software RAID

2013-11-25 Thread Kyle Bader
Several people have reported issues with combining OS and OSD journals on the same SSD drives/RAID due to contention. If you do something like this I would definitely test to make sure it meets your expectations. Ceph logs are going to compose the majority of the writes to the OS storage devices.

Re: [ceph-users] misc performance tuning queries (related to OpenStack in particular)

2013-11-19 Thread Kyle Bader
> So quick correction based on Michael's response. In question 4, I should > have not made any reference to Ceph objects, since objects are not striped > (per Michael's response). Instead, I should simply have used the words "Ceph > VM Image" instead of "Ceph objects". A Ceph VM image would constit

Re: [ceph-users] Ceph performance

2013-11-15 Thread Kyle Bader
> We have the plan to run ceph as block storage for openstack, but from test > we found the IOPS is slow. > > Our apps primarily use the block storage for saving logs (i.e, nginx's > access logs). > How to improve this? There are a number of things you can do, notably: 1. Tuning cache on the hype

Re: [ceph-users] Today I’ve encountered multiple OSD down and multiple OSD won’t start and OSD disk access “Input/Output” error”

2013-11-15 Thread Kyle Bader
> 3). Comment out (#) the bad OSD drives in “/etc/fstab”. This is unnecessary if you're using the provided upstart and udev scripts; OSD data devices will be identified by label and mounted. If you choose not to use the upstart and udev scripts then you should write init scripts that do si

Re: [ceph-users] Ceph User Committee

2013-11-07 Thread Kyle Bader
> Would this be something like > http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag ? Something very much like that :) -- Kyle

Re: [ceph-users] Ceph User Committee

2013-11-07 Thread Kyle Bader
> I think this is a great idea. One of the big questions users have is > "what kind of hardware should I buy." An easy way for users to publish > information about their setup (hardware, software versions, use-case, > performance) when they have successful deployments would be very valuable. > Ma

Re: [ceph-users] Running on disks that lose their head

2013-11-07 Thread Kyle Bader
> Zackc, Loicd, and I have been the main participants in a weekly Teuthology > call the past few weeks. We've talked mostly about methods to extend > Teuthology to capture performance metrics. Would you be willing to join us > during the Teuthology and Ceph-Brag sessions at the Firefly Developer >

Re: [ceph-users] ceph cluster performance

2013-11-07 Thread Kyle Bader
> ST240FN0021 connected via a SAS2x36 to an LSI 9207-8i. The problem might be SATA transport protocol overhead at the expander. Have you tried directly connecting the SSDs to SATA2/3 ports on the mainboard? -- Kyle

Re: [ceph-users] radosgw questions

2013-11-07 Thread Kyle Bader
> 1. To build a high performance yet cheap radosgw storage, which pools should > be placed on ssd and which on hdd backed pools? Upon installation of > radosgw, it created the following pools: .rgw, .rgw.buckets, > .rgw.buckets.index, .rgw.control, .rgw.gc, .rgw.root, .usage, .users, > .users.email

Re: [ceph-users] Running on disks that lose their head

2013-11-07 Thread Kyle Bader
>> Once I know a drive has had a head failure, do I trust that the rest of the >> drive isn't going to go at an inconvenient moment vs just fixing it right >> now when it's not 3AM on Christmas morning? (true story) As good as Ceph >> is, do I trust that Ceph is smart enough to prevent spreadin

Re: [ceph-users] Ceph node Info

2013-10-30 Thread Kyle Bader
The quick start guide is linked below; it should help you hit the ground running. http://ceph.com/docs/master/start/quick-ceph-deploy/ Let us know if you have questions or bump into trouble!

Re: [ceph-users] ceph recovery killing vms

2013-10-29 Thread Kyle Bader
Recovering from a degraded state by copying existing replicas to other OSDs is going to cause reads on existing replicas and writes to the new locations. If you have slow media then this is going to be felt more acutely. Tuning the backfill options I posted is one way to lessen the impact, another

Re: [ceph-users] changing journals post-bobcat?

2013-10-28 Thread Kyle Bader
The bobtail release added udev/upstart capabilities that allowed you to not have per-OSD entries in ceph.conf. Under the covers the new udev/upstart scripts look for a special label on OSD data volumes; matching volumes are mounted and then a few files are inspected: journal_uuid and whoami. The jour
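
On a mounted data volume those files are plain text; for example (default path shown):

    cat /var/lib/ceph/osd/ceph-0/whoami         # the OSD id
    cat /var/lib/ceph/osd/ceph-0/journal_uuid   # the partition UUID the journal symlink points at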

Re: [ceph-users] ceph recovery killing vms

2013-10-28 Thread Kyle Bader
You can change some OSD tunables to lower the priority of backfills:

    osd recovery max chunk: 8388608
    osd recovery op priority: 2

In general a lower op priority means it will take longer for your placement groups to go from degraded to active+clean; the idea is to balance recover
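
The same settings can also be applied to a running cluster without restarts; a hedged example:

    ceph tell osd.* injectargs '--osd-recovery-max-chunk 8388608 --osd-recovery-op-priority 2'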

Re: [ceph-users] Hardware: SFP+ or 10GBase-T

2013-10-24 Thread Kyle Bader
> I know that 10GBase-T has more delay than SFP+ with direct attached > cables (.3 usec vs 2.6 usec per link), but does that matter? Some > sites say it is a huge hit, but we are talking usec, not ms, so I > find it hard to believe that it causes that much of an issue. I like > the lower cost and

Re: [ceph-users] CephFS & Project Manila (OpenStack)

2013-10-23 Thread Kyle Bader
>> This is going to get horribly ugly when you add neutron into the mix, so >> much so I'd consider this option a non-starter. If someone is using >> openvswitch to create network overlays to isolate each tenant I can't >> imagine this ever working. > > I'm not following here. Are this only needed

Re: [ceph-users] CephFS & Project Manila (OpenStack)

2013-10-23 Thread Kyle Bader
> Option 1) The service plugs your filesystem's IP into the VM's network > and provides direct IP access. For a shared box (like an NFS server) > this is fairly straightforward and works well (*everything* has a > working NFS client). It's more troublesome for CephFS, since we'd need > to include a

Re: [ceph-users] Rados bench result when increasing OSDs

2013-10-21 Thread Kyle Bader
Besides what Mark and Greg said it could be due to additional hops through network devices. What network devices are you using, what is the network topology and does your CRUSH map reflect the network topology? On Oct 21, 2013 9:43 AM, "Gregory Farnum" wrote: > On Mon, Oct 21, 2013 at 7:13 AM, Gu

Re: [ceph-users] Ceph configuration data sharing requirements

2013-10-17 Thread Kyle Bader
> > * The IP address of at least one MON in the Ceph cluster > If you configure nodes with a single monitor in the "mon hosts" directive then I believe your nodes will have issues if that one monitor goes down. With Chef I've gone back and forth between using Chef search and having monitors be dec
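
In ceph.conf terms that means listing every monitor rather than just one (addresses are made up):

    [global]
    mon initial members = mon-a, mon-b, mon-c
    mon host = 192.0.2.1, 192.0.2.2, 192.0.2.3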

Re: [ceph-users] mounting RBD in linux containers

2013-10-17 Thread Kyle Bader
My first guess would be that it's due to LXC dropping capabilities, I'd investigate whether CAP_SYS_ADMIN is being dropped. You need CAP_SYS_ADMIN for mount and block ioctls, if the container doesn't have those privs a map will likely fail. Maybe try tracing the command with strace? On Thu, Oct 17
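
For example, something along these lines from inside the container (image name is made up):

    strace -f rbd map rbd/testimg
    # an EPERM/EACCES around the sysfs writes or ioctls points at dropped capabilities
    # rather than at Ceph itself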

Re: [ceph-users] Speed limit on RadosGW?

2013-10-14 Thread Kyle Bader
I've personally saturated 1Gbps links on multiple radosgw nodes on a large cluster; if I remember correctly, Yehuda has tested it up into the 7Gbps range with 10Gbps gear. Could you describe your cluster's hardware and connectivity? On Mon, Oct 14, 2013 at 3:34 AM, Chu Duc Minh wrote: > Hi sorry

Re: [ceph-users] Expanding ceph cluster by adding more OSDs

2013-10-10 Thread Kyle Bader
I've contracted and expanded clusters by up to a rack of 216 OSDs - 18 nodes, 12 drives each. New disks are configured with a CRUSH weight of 0 and I slowly add weight (0.1 to 0.01 increments), wait for the cluster to become active+clean and then add more weight. I was expanding after contraction
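
A hedged sketch of one round of that loop (OSD id, host and increments are examples):

    ceph osd crush add osd.216 0 host=node19   # the new OSD enters the map with zero weight
    ceph osd crush reweight osd.216 0.1        # bump in 0.1 (or 0.01) steps, waiting for
                                               # active+clean between bumps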

Re: [ceph-users] About Ceph SSD and HDD strategy

2013-10-10 Thread Kyle Bader
ges to implement a faster tier > via SSD. > > -- > Warren > > On Oct 9, 2013, at 5:52 PM, Kyle Bader wrote: > > Journal on SSD should effectively double your throughput because data will > not be written to the same device twice to ensure transactional integrity. > Additional

Re: [ceph-users] Same journal device for multiple OSDs?

2013-10-09 Thread Kyle Bader
You can certainly use a similarly named device to back an OSD journal if the OSDs are on separate hosts. If you want to take a single SSD device and utilize it as a journal for many OSDs on the same machine then you would want to partition the SSD device and use a different partition for each OSD j
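
A hedged sketch of that layout (devices and sizes are examples):

    parted -s /dev/sdk mklabel gpt
    parted -s /dev/sdk mkpart journal-0 1MiB 10GiB
    parted -s /dev/sdk mkpart journal-1 10GiB 20GiB
    ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdb /dev/sdk1   # data disk, journal partition
    ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdc /dev/sdk2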

Re: [ceph-users] About Ceph SSD and HDD strategy

2013-10-09 Thread Kyle Bader
Journal on SSD should effectively double your throughput because data will not be written to the same device twice to ensure transactional integrity. Additionally, by placing the OSD journal on an SSD you should see less latency, the disk head no longer has to seek back and forth between the journa