Re: [ceph-users] Radosgw Timeout

2014-05-22 Thread Craig Lewis
very unhealthy. If you're doing a lot more than that, say 10M or 100M objects, then that could cause a hot spot on disk. You might be better off taking your "directories", and putting them in their own bucket.

Re: [ceph-users] slow requests

2014-05-23 Thread Craig Lewis
Check osd.2's logs, and check any OSDs that are blocking osd.2. If your cluster is small, it might be faster to just check all the disks instead of following the trail.

Re: [ceph-users] osd pool default pg num problem

2014-05-23 Thread Craig Lewis

Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Craig Lewis
it, but they're such a pain that I never bothered. And I still have the scripts I once used to make LVM snapshots of MySQL data directories.

Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Craig Lewis
On 5/23/14 03:47, Georg Höllrigl wrote: On 22.05.2014 17:30, Craig Lewis wrote: On 5/22/14 06:16, Georg Höllrigl wrote: I have created one bucket that holds many small files, separated into different "directories". But whenever I try to access the bucket, I only run into some timeout.

Re: [ceph-users] Questions about zone and disater recovery

2014-05-23 Thread Craig Lewis
doesn't have those objects. I don't know what would happen if you created one of those buckets or objects on the master. Maybe replication breaks, or maybe it just overwrites the data in the slave. That's a lot of "in theory" though. I wouldn't attempt it without testing it first.

Re: [ceph-users] How to backup mon-data?

2014-05-27 Thread Craig Lewis
dirty shutdown. It would be better to stop the monitor, snapshot, and start the monitor. It shouldn't cause any problems if you don't, and I wouldn't bother.

Re: [ceph-users] ceph-deploy or manual?

2014-05-27 Thread Craig Lewis
support for that statement. I am a recent convert to Config Management. I might be a bit of a zealot, but I don't plan to manage nodes by hand ever again.

Re: [ceph-users] Is there a way to repair placement groups? [Offtopic - ZFS]

2014-05-27 Thread Craig Lewis
from when parity calculations needed dedicated hardware. I won't be building any more ZFS RAID10 arrays.

Re: [ceph-users] Ceph-deploy to deploy osds simultaneously

2014-05-27 Thread Craig Lewis

Re: [ceph-users] 70+ OSD are DOWN and not coming up

2014-05-27 Thread Craig Lewis
On 5/22/14 00:26, Craig Lewis wrote: On 5/21/14 21:15, Sage Weil wrote: On Wed, 21 May 2014, Craig Lewis wrote: If you do this over IRC, can you please post a summary to the mailing list? I believe I'm having this issue as well. In the other case, we found that some of the OSDs

Re: [ceph-users] How to implement a rados plugin to encode/decode data while r/w

2014-05-28 Thread Craig Lewis
security does. With cryptsetup, it looks like only AES-256 is compiled in on Ubuntu. If you need stronger crypto, I'm sure it's available with a bit more effort.

Re: [ceph-users] Is there a way to repair placement groups? [Offtopic - ZFS]

2014-05-28 Thread Craig Lewis
hubris: "I'm good enough that I don't need those things anymore." Hence my assertion to re-test things you think you know. :-)

Re: [ceph-users] someone using btrfs with ceph

2014-05-28 Thread Craig Lewis
while. The multi-million dollar storage array probably helped. I never used ReiserFS in production, so I can't comment. I haven't tried any other COW filesystems.

Re: [ceph-users] Inter-region data replication through radosgw

2014-05-28 Thread Craig Lewis
becomes the new master, and you need to delete the old master and replicate back. This is pretty common in replication scenarios; I have to do this when my PostgreSQL servers fail over from master to secondary. Because of the limit of file operations in the slave zone, I think there will be some cont

Re: [ceph-users] OSD not up

2014-05-30 Thread Craig Lewis
e osd" for more discussions. -- *Craig Lewis* Senior Systems Engineer Office +1.714.602.1309 Email cle...@centraldesktop.com <mailto:cle...@centraldesktop.com> *Central Desktop. Work together in ways you never thought possible.* Connect with us Website <http://www.centralde

Re: [ceph-users] OSD suffers problems after filesystem crashed and recovered.

2014-05-30 Thread Craig Lewis
parts involved, which leaves more room for bugs. If you see this problem come back on the same disk, I'd replace the disk. If you see this happen again on other disks, I would get your Fiber Channel vendor involved. It wouldn't hurt to make sure you have the latest firmware on the disks.

[ceph-users] How to avoid deep-scrubbing performance hit?

2014-06-09 Thread Craig Lewis
I've correlated a large deep scrubbing operation to cluster stability problems. My primary cluster does a small amount of deep scrubs all the time, spread out over the whole week. It has no stability problems. My secondary cluster doesn't spread them out. It saves them up, and tries to do all of them at once.
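For what it's worth, the trickle approach can be scripted against the standard CLI. A rough sketch (untested; cron it hourly, and adjust the awk to your ceph pg dump column layout — the PG selection here is illustrative, not oldest-scrub-first):
  # deep-scrub a handful of PGs per run instead of letting them burst
  ceph pg dump pgs_brief 2>/dev/null | awk '/active/ {print $1}' | head -5 | \
  while read pg; do
      ceph pg deep-scrub "$pg"
  done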

[ceph-users] I have PGs that I can't deep-scrub

2014-06-10 Thread Craig Lewis
Every time I deep-scrub one PG, all of the OSDs responsible get kicked out of the cluster. I've deep-scrubbed this PG 4 times now, and it fails the same way every time. OSD logs are linked at the bottom. What can I do to get this deep-scrub to complete cleanly? This is the first time I've deep-

Re: [ceph-users] How to avoid deep-scrubbing performance hit?

2014-06-10 Thread Craig Lewis
small regions of the key space, but the expensive part is that deep scrub actually has to read all the data off disk, and that's often a lot more disk seeks than simply examining the metadata is. -Greg 0: http://ceph.com/docs/master/dev/osd_internals/s

Re: [ceph-users] about rgw region and zone

2014-06-10 Thread Craig Lewis
The idea of regions and zones is to replicate Amazon's S3 storage. Here are some links from Amazon describing EC2 regions and zones (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html) and S3 regions (http://docs.aws.amazon.com/AmazonS3/latest/dev/LocationSelect

Re: [ceph-users] I have PGs that I can't deep-scrub

2014-06-11 Thread Craig Lewis
debug ms = 1 and debug osd = 20? There were a few things fixed in scrub between emperor and firefly. Are you planning on upgrading soon? sage > On Tue, 10 Jun 2014, Craig Lewis wrote: >> Every time I deep-scrub one PG, all of the OSDs responsible get k

Re: [ceph-users] Some easy questions

2014-06-17 Thread Craig Lewis
> 3. You must use MDS from the start, because it's a metadata structure/directory that only gets populated when writing files through cephfs / FUSE. Otherwise, it doesn't even know about other objects, and therefore they aren't visible on cephfs. > 4. MDS does not get updated when radosgw / S3 is used.

Re: [ceph-users] about rgw region and zone

2014-06-17 Thread Craig Lewis
Metadata replication is about keeping a global namespace in all zones. It will replicate all of your users and bucket names, but not the data itself. That way you don't end up with a bucket named "mybucket" in your US and EU zones that are owned by different people. It's up to you to decide if t

Re: [ceph-users] what is the Recommandation configure for a ceph cluster with 10 servers without memory leak?

2014-06-18 Thread Craig Lewis
I haven't seen behavior like that. I have seen my OSDs use a lot of RAM while they're doing a recovery, but it goes back down when they're done. Your OSD is doing something; it's using 126% CPU. What do `ceph osd tree` and `ceph health detail` say? When you say you're installing Ceph on 10 servers

Re: [ceph-users] java.net.UnknownHostException while creating a bucket

2014-06-18 Thread Craig Lewis
> java.net.UnknownHostException: my-new-ceph-bucket.svl-cephstack-05.cisco.com Amazon's S3 libraries generate the URL by prepending the bucket name to the hostname. See https://ceph.com/docs/master/radosgw/config/#enabling-subdomain-s3-calls Aside from the RadosGW configuration mentioned above,
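For the record, subdomain-style requests need two things: a wildcard DNS record pointing bucket subdomains at the gateway, and the rgw dns name setting. A minimal sketch (the hostname is taken from the error above; the section name is an assumption):
  # ceph.conf
  [client.radosgw.gateway]
  rgw dns name = svl-cephstack-05.cisco.com
  # and in DNS, something like:
  # *.svl-cephstack-05.cisco.com.  IN  CNAME  svl-cephstack-05.cisco.com.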

Re: [ceph-users] Using S3 REST API

2014-06-18 Thread Craig Lewis
You went through the RadosGW configuration at https://ceph.com/docs/master/radosgw/config/ ? Once you complete that, you can test it by going to http://cluster.hostname/. You should get back an empty bucket listing: a ListAllMyBucketsResult XML document with the owner "anonymous". If you get a 500 error

Re: [ceph-users] Using S3 REST API

2014-06-18 Thread Craig Lewis
I just replied to another user with a similar issue. Take a look at a recent post with the subject line "java.net.UnknownHostException while creating a bucket".

Re: [ceph-users] what is the Recommandation configure for a ceph cluster with 10 servers without memory leak?

2014-06-18 Thread Craig Lewis
inactive since forever, current state creating, last acting [905,805,204]
> pg 2.fb74 is stuck inactive since forever, current state creating, last acting [901,404]
> pg 0.fb75 is stuck inactive since forever, current state creating, last acting [903,403]
> pg 1.fb74 is stuc

Re: [ceph-users] Some easy questions

2014-06-19 Thread Craig Lewis
>> Just to clarify. Suppose you insert an object into rados directly, you won't be able to see that file in cephfs anywhere, since it won't be listed in MDS. Correct? > Meaning, you can start using CephFS+MDS at any point in time, but it will only ever list objects/files that were ins

Re: [ceph-users] RADOSGW + OpenStack basic question

2014-06-19 Thread Craig Lewis
Unfortunately, I can't help much. I'm just using the S3 interface for object storage. Looking back at the archives, this question does come up a lot, and there aren't a lot of replies. The best thread I see in the archive is http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-November/00628

Re: [ceph-users] RADOSGW + OpenStack basic question

2014-06-19 Thread Craig Lewis
pool name > or command for that, how you are accessing your object storage. > Thanks for writing back. > On Thu, Jun 19, 2014 at 8:43 PM, Craig Lewis wrote: >> Unfortunately, I can't help much. I'm just using the S3 interface for ob

Re: [ceph-users] How to improve performance of ceph objcect storage cluster

2014-06-26 Thread Craig Lewis
Cern noted that they needed to reformat to put the journal in a partition rather than on the OSD's filesystem like you did. See http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern, slide 24. When I saw that ceph-disk prepare created a journal partition, I thought it was stupid to force a se

Re: [ceph-users] Difference between "ceph osd reweight" and "ceph osd crush reweight"

2014-06-26 Thread Craig Lewis
Note that 'ceph osd reweight' is not a persistent setting. When an OSD gets marked out, the osd weight will be set to 0. When it gets marked in again, the weight will be changed to 1. Because of this, 'ceph osd reweight' is a temporary solution. You should only use it to keep your cluster running
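To illustrate the difference (the osd id and weights are examples):
  # temporary override, range 0.0 - 1.0; reset when the OSD is marked out and in again
  ceph osd reweight 12 0.8
  # persistent CRUSH weight, conventionally the disk size in TiB
  ceph osd crush reweight osd.12 3.64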

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-26 Thread Craig Lewis
Note that wget did URL-encode the space ("test file" became "test%20file"), because it knows that a space is never valid. It can't know if you meant an actual plus or an encoded space in "test+file", so it left it alone. I will say that I would prefer that the + be left alone. If I have a static

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-27 Thread Craig Lewis
Backslash characters (\) are known to cause problems for some clients (like s3cmd). Try removing them from your secret, and see if that works. If it doesn't, just remove the key and secret, and regenerate until the secret doesn't have any backslashes. On Fri, Jun 27, 2014 at 12:22 AM, Florent B

Re: [ceph-users] about rgw region and zone

2014-06-30 Thread Craig Lewis
Is there any parameter to > control the maximum amount of data or time window that the secondary zone can > be lagging behind? > Thanks, > Fred > On Jun 17, 2014 4:46 PM, "Craig Lewis" wrote: >> Metadata replication is about keeping a global namespace in all zones.

Re: [ceph-users] external monitoring tools for ceph

2014-06-30 Thread Craig Lewis
You should check out Calamari (https://github.com/ceph/calamari), Inktank's monitoring and administration tool. I started before Calamari was announced, so I rolled my own using Zabbix. It handles all the monitoring, graphing, and alerting in one tool. It's kind of a pain to set up, but worth it.

Re: [ceph-users] RadosGW & data striping

2014-06-30 Thread Craig Lewis
RadosGW stripes data by default. Objects larger than 4MiB are broken up into 4MiB chunks. On Wed, Jun 25, 2014 at 3:49 AM, Florent B wrote: > Hi, > Is it possible to get data striped with radosgw, as in RBD or CephFS? > Thank you
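If you want a different chunk size, there is a config for it. A sketch, assuming the usual gateway section name (4194304 bytes is the 4MiB default):
  # ceph.conf
  [client.radosgw.gateway]
  rgw obj stripe size = 4194304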

Re: [ceph-users] radosgw scalability questions

2014-07-08 Thread Craig Lewis
You can and should run multiple RadosGW and Apache instances per zone. The whole point of Ceph is eliminating as many points of failure as possible. You'll want to set up a load balancer just like you would for any website. You'll want your load balancer to recognize and forward both http://us-wes

Re: [ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-09 Thread Craig Lewis
FWIW, I'm beginning to think that SSD journals are a requirement. Even with minimal recovery/backfilling settings, it's very easy to kick off an operation that will bring a cluster to its knees: increasing PG/PGP, increasing replication, adding too many new OSDs, etc. These operations can cause
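For reference, the recovery knobs I mean (starting values are my suggestion, not gospel; injectargs applies them without a restart):
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'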

Re: [ceph-users] radosgw-agent failed to parse

2014-07-09 Thread Craig Lewis
Just to ask a couple obvious questions... You didn't accidentally put 'http://us-secondary.example.comhttp:// us-secondary.example.com/' in any of your region or zone configuration files? The fact that it's missing the :80 makes me think it's getting that URL from someplace that isn't the command

Re: [ceph-users] I have PGs that I can't deep-scrub

2014-07-10 Thread Craig Lewis
It's painful, but my cluster has been rock solid since I finished. On Wed, Jun 11, 2014 at 2:23 PM, Craig Lewis wrote: > New logs, with debug ms = 1, debug osd = 20. > In this timeline, I started the deep-scrub at 11:04:00. Ceph started deep-scrubbing at 11:04:03.

Re: [ceph-users] Creating a bucket on a non-master region in a multi-region configuration with unified namespace/replication

2014-07-14 Thread Craig Lewis
whether the RGW just > isn't working as expected… > Thank you for your reply – keep in touch if you end up doing some > multi-region replication, would love to hear your experience. > Kurt

Re: [ceph-users] HW recommendations for OSD journals?

2014-07-16 Thread Craig Lewis
The good SSDs will report how much of their estimated life has been used. It's not in the SMART spec though, so different manufacturers do it differently (or not at all). I've got Intel DC S3700s, and the SMART attributes include:
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age

Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-16 Thread Craig Lewis
One of the things I've learned is that many small changes to the cluster are better than one large change. Adding 20% more OSDs? Don't add them all at once, trickle them in over time. Increasing pg_num & pgp_num from 128 to 1024? Go in steps, not one leap. I try to avoid operations that will t
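Something like this is what I mean by steps (the pool name and step sizes are illustrative; wait for HEALTH_OK between bumps):
  for num in 256 512 1024; do
      ceph osd pool set rbd pg_num $num
      ceph osd pool set rbd pgp_num $num
      # let the cluster settle before the next step
      while ! ceph health | grep -q HEALTH_OK; do sleep 60; done
  done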

Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-16 Thread Craig Lewis
PM, Sage Weil wrote: > On Wed, 16 Jul 2014, Gregory Farnum wrote: >> On Wed, Jul 16, 2014 at 4:45 PM, Craig Lewis wrote: >>> One of the things I've learned is that many small changes to the cluster are better than one large change. Adding 20% mo

Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-17 Thread Craig Lewis
I'd like to see some way to cap recovery IOPS per OSD: don't allow backfill to do more than, say, 50 operations per second. It will slow backfill down, but reserve plenty of IOPS for normal operation. I know that implementing this well is not a simple task. I know I did some stupid things that ca

Re: [ceph-users] radosgw-agent failed to parse

2014-07-21 Thread Craig Lewis
> ".us-secondary.domain.rgw",
> "control_pool": ".us-secondary.rgw.control",
> "gc_pool": ".us-secondary.rgw.gc",
> "log_pool": ".us-secondary.log",
> "intent_log_pool": ".us-secondary.intent-log",

Re: [ceph-users] radosgw monitoring

2014-07-28 Thread Craig Lewis
(Sorry for the duplicate email, I forgot to CC the list.) Assuming you're using the default setup (RadosGW, FastCGI, and Apache), it's the same as monitoring a web site. On every node, verify that a request for / returns a 200. If the RadosGW agent is down, or FastCGI is mis-configured, the request

Re: [ceph-users] Deployment scenario with 2 hosts

2014-07-28 Thread Craig Lewis
That's expected. You need > 50% of the monitors up. If you only have 2 machines, rebooting one means that only 50% are up, so the cluster halts operations. That's done on purpose to avoid problems when the cluster is divided exactly in half, and both halves continue to run thinking the other half is

Re: [ceph-users] OSD daemon code in /var/lib/ceph/osd/ceph-2/ "dissapears" after creating pool/rbd -

2014-08-05 Thread Craig Lewis

Re: [ceph-users] [Ceph-community] Remote replication

2014-08-05 Thread Craig Lewis
That depends on which features of Ceph you're using. RadosGW supports replication. It's not real time, but it's near real time. Everything in my primary cluster is copied to my secondary within a few minutes. Take a look at http://ceph.com/docs/master/radosgw/federated-config/ . The details o

Re: [ceph-users] Using Crucial MX100 for journals or cache pool

2014-08-05 Thread Craig Lewis
You really do want power-loss protection on your journal SSDs. Data centers do have power outages, even with all the redundant grid connections, UPSes, and diesel generators. Losing an SSD means losing all of the OSDs that are using it as a journal. If the data center loses power, you're probabl

Re: [ceph-users] Ceph can't seem to forget

2014-08-07 Thread Craig Lewis
Have you re-formatted and re-added all of the lost OSDs? I've found that if you lose an OSD, you can tell Ceph the data is gone (ceph osd lost ), but it won't believe you until it can talk to that OSD ID again. If you have OSDs that are offline, you can verify that Ceph is waiting on them with cep

Re: [ceph-users] Ceph can't seem to forget

2014-08-07 Thread Craig Lewis
For your RBD volumes, you've lost random 4MiB chunks from your virtual disks. Think of it as unrecoverable bad sectors on the HDD. It was only a few unfound objects though (ceph status said 23 out of 5128982). You can probably recover from that. I'd fsck all of the volumes, and perform any app
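For anyone following along, the relevant commands are roughly these (the pg id is hypothetical; revert rolls back to the prior version, delete gives up on the object entirely):
  ceph health detail | grep unfound          # find PGs with unfound objects
  ceph pg 2.5 list_missing                   # list the unfound objects in that PG
  ceph pg 2.5 mark_unfound_lost revert       # stop waiting and roll back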

[ceph-users] Apache on Trusty

2014-08-08 Thread Craig Lewis
Is anybody running Ubuntu Trusty, but using Ceph's Apache 2.2 and fastcgi packages? I'm a bit of an Ubuntu noob. I can't figure out the correct /etc/apt/preferences.d/ configs to prioritize Ceph's version of the packages. I keep getting Ubuntu's Apache 2.4 packages. Can somebody that has this w
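This is the sort of pin I'd expect to need, sketched from memory (untested; the origin has to match whatever host apt-cache policy reports for the ceph.com repo):
  # /etc/apt/preferences.d/ceph-apache
  Package: apache2* libapache2-mod-fastcgi*
  Pin: origin "ceph.com"
  Pin-Priority: 1001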

Re: [ceph-users] ceph-disk: Error: ceph osd start failed: Command '['/sbin/service', 'ceph', 'start', 'osd.5']' returned non-zero exit status 1

2014-08-11 Thread Craig Lewis
Are the disks mounted? You should have a single mount for each OSD in /var/lib/ceph/osd/ceph-/. If they're not mounted, is there anything complicated about your disks? On Mon, Aug 11, 2014 at 6:32 AM, Yitao Jiang wrote: > Hi, > > I launched a ceph (ceph version 0.80.5) lab on my laptop with 7

Re: [ceph-users] CRUSH map advice

2014-08-11 Thread Craig Lewis
Your MON nodes are separate hardware from the OSD nodes, right? If so, with replication=2, you should be able to shut down one of the two OSD nodes, and everything will continue working. Since it's for experimentation, I wouldn't deal with the extra hassle of replication=4 and custom CRUSH rules

Re: [ceph-users] ceph network

2014-08-11 Thread Craig Lewis
Only the OSDs use the cluster network. OSD heartbeats use both networks, to verify connectivity. Check out the Network Configuration Reference: http://ceph.com/docs/master/rados/configuration/network-config-ref/ On Mon, Aug 11, 2014 at 6:30 PM, yuelongguang wrote: > hi,all > i know ceph diffe
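A minimal sketch of the two settings (the subnets are examples):
  # ceph.conf
  [global]
  public network = 192.168.1.0/24
  cluster network = 10.0.0.0/24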

Re: [ceph-users] best practice of installing ceph(large-scale deployment)

2014-08-11 Thread Craig Lewis
Take a look at Cern's "Scaling Ceph at Cern" slides, as well as Inktank's Hardware Configuration Guide. You need at least 3 MONs for production. You might want m

Re: [ceph-users] CRUSH map advice

2014-08-12 Thread Craig Lewis
On Mon, Aug 11, 2014 at 11:26 PM, John Morris wrote: > On 08/11/2014 08:26 PM, Craig Lewis wrote: >> Your MON nodes are separate hardware from the OSD nodes, right? > Two nodes are OSD + MON, plus a separate MON node. >> If so, with replica

Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

2014-08-12 Thread Craig Lewis
For the incomplete PGs, can you give me the output of ceph pg dump? I'm interested in the recovery_state key of that JSON data. On Tue, Aug 12, 2014 at 5:29 AM, Riederer, Michael wrote: > Sorry, but I think that does not help me. I forgot to mention something about > the operating system:

Re: [ceph-users] Power Outage

2014-08-12 Thread Craig Lewis
I can't really help with MDS. Hopefully somebody else will chime in here. (Resending, because my last reply was too large.) On Tue, Aug 12, 2014 at 12:44 PM, hjcho616 wrote: > Craig, > Thanks. It turns out one of my memory sticks went bad after that power > outage. While trying to fix the

Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

2014-08-13 Thread Craig Lewis
> # ceph health detail
> HEALTH_WARN crush map has legacy tunables
> crush map has legacy tunables; see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
> # ceph osd crush tunables optimal
> adjusted tunables profile to optimal
> Mike

Re: [ceph-users] can osd start up if journal is lost and it has not been replayed?

2014-08-13 Thread Craig Lewis
If the journal is lost, the OSD is lost. This can be a problem if you use 1 SSD for journals for many OSDs. There has been some discussion about making the OSDs able to recover from a lost journal, but I haven't heard anything else about it. I haven't been paying much attention to the developer

Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

2014-08-14 Thread Craig Lewis
the data is necessary. If the load is very high (over 30) I have seen exactly what you describe: osds go down and out, and come back up and in. > OK. I'll try removing the slow osd and then scrubbing / deep-scrubbing the pgs. > Many thanks for your help.

[ceph-users] Translating a RadosGW object name into a filename on disk

2014-08-14 Thread Craig Lewis
In my effort to learn more of the details of Ceph, I'm trying to figure out how to get from an object name in RadosGW, through the layers, down to the files on disk.
clewis@clewis-mac ~ $ s3cmd ls s3://cpltest/
2014-08-13 23:02  14M  28dde9db15fdcb5a342493bc81f91151  s3://cpltest/vmware-freeb
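The rough chain, sketched with the standard tools (the pool name is the default .rgw.buckets; the object placeholder is mine):
  # 1. the RGW manifest lists the rados objects (head + shadow chunks)
  radosgw-admin object stat --bucket=cpltest --object=$OBJ
  # 2. map a rados object to its PG and OSDs
  ceph osd map .rgw.buckets <rados-object-name>
  # 3. on the primary OSD, the file lives under that PG's directory
  ls /var/lib/ceph/osd/ceph-$ID/current/<pgid>_head/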

Re: [ceph-users] Performance really drops from 700MB/s to 10MB/s

2014-08-14 Thread Craig Lewis
I find graphs really help here. One screen that has all the disk I/O and latency for all OSDs makes it easy to pinpoint the bottleneck. If you don't have that, I'd go low-tech: watch the blinky lights. It's really easy to see which disk is the hotspot. On Thu, Aug 14, 2014 at 6:56 AM, Mariusz

Re: [ceph-users] CRUSH map advice

2014-08-14 Thread Craig Lewis
On Thu, Aug 14, 2014 at 12:47 AM, Christian Balzer wrote: > > Hello, > > On Tue, 12 Aug 2014 10:53:21 -0700 Craig Lewis wrote: > >> That's a low probability, given the number of disks you have. I would've >> taken that bet (with backups). As the number o

Re: [ceph-users] can osd start up if journal is lost and it has not been replayed?

2014-08-15 Thread Craig Lewis
> if an osd is down for some time, its journal is out of date (it has lost part of the journal), but it can still catch up with the other osds. Why? That example suggests that either an outdated osd can get the missing journal entries from the others, or 'catch up' works on a different principle than the journal. Could you explain

Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

2014-08-18 Thread Craig Lewis

Re: [ceph-users] [radosgw-admin] bilog list confusion

2014-08-18 Thread Craig Lewis
I have the same results. The primary zone (with log_meta and log_data true) has bilog data; the secondary zone (with log_meta and log_data false) does not. I'm just guessing here (I can't test it right now)... I would think that disabling log_meta and log_data will stop adding new

Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

2014-08-19 Thread Craig Lewis
On Tue, Aug 19, 2014 at 1:22 AM, Riederer, Michael wrote:
> root@ceph-admin-storage:~# ceph pg force_create_pg 2.587
> pg 2.587 now creating, ok
> root@ceph-admin-storage:~# ceph pg 2.587 query
> ...
> "probing_osds": [
> "5",
> "8",

Re: [ceph-users] some pgs active+remapped, Ceph can not recover itself.

2014-08-19 Thread Craig Lewis
I believe you need to remove the authorization for osd.4 and osd.6 before re-creating them. When I re-format disks, I migrate data off of the disk using:
ceph osd out $OSDID
Then wait for the remapping to finish. Once it does:
stop ceph-osd id=$OSDID
ceph osd out $OSDID
ceph auth del osd
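The snippet is cut off there; from memory, the full teardown sequence is something like this (double-check against the docs for your release):
  stop ceph-osd id=$OSDID
  ceph osd out $OSDID
  ceph auth del osd.$OSDID
  ceph osd crush remove osd.$OSDID
  ceph osd rm $OSDID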

Re: [ceph-users] how radosgw recycle bucket index object and bucket meta object

2014-08-19 Thread Craig Lewis
By default, Ceph will wait two hours to garbage collect those RGW objects. You can adjust that time by changing rgw gc obj min wait. See http://ceph.com/docs/master/radosgw/config-ref/ for the full list of configs. On Tue, Aug 19, 2014 at 7:18 PM, baijia...@126.com wrote: > I create a buck
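For example (the section name is assumed; the value is in seconds, 7200 being the two-hour default):
  # ceph.conf
  [client.radosgw.gateway]
  rgw gc obj min wait = 3600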

Re: [ceph-users] Translating a RadosGW object name into a filename on disk

2014-08-19 Thread Craig Lewis
since I worked on this, but let's see what I remember... > > On Thu, Aug 14, 2014 at 11:34 AM, Craig Lewis > wrote: > > In my effort to learn more of the details of Ceph, I'm trying to > > figure out how to get from an object name in RadosGW, through the > > laye

Re: [ceph-users] Translating a RadosGW object name into a filename on disk

2014-08-20 Thread Craig Lewis
I worked on this, but let's see what I remember... > > On Thu, Aug 14, 2014 at 11:34 AM, Craig Lewis > wrote: >> In my effort to learn more of the details of Ceph, I'm trying to >> figure out how to get from an object name in RadosGW, through the >> layers, d

Re: [ceph-users] Problem setting tunables for ceph firefly

2014-08-21 Thread Craig Lewis
There was a good discussion of this a month ago: https://www.mail-archive.com/ceph-users%40lists.ceph.com/msg11483.html That'll give you some things you can try, and information on how to undo it if it does cause problems. You can disable the warning by adding this to the [mon] section of ceph.conf
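If memory serves, the option in question is:
  # ceph.conf, [mon] section
  mon warn on legacy crush tunables = false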

Re: [ceph-users] Question on OSD node failure recovery

2014-08-21 Thread Craig Lewis
The default rules are sane for small clusters with few failure domains. Anything larger than a single rack should customize its rules. It's a good idea to figure this out early: changes to your CRUSH rules can result in a large percentage of data moving around, which will make your cluster unu

Re: [ceph-users] Best practice K/M-parameters EC pool

2014-08-26 Thread Craig Lewis
My OSD rebuild time is more like 48 hours (4TB disks, >60% full, osd max backfills = 1). I believe that increases my risk of failure by 48^2. Since your numbers are failure rate per hour per disk, I need to consider the risk for the whole time for each disk. So more formally, rebuild time to t
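To put illustrative numbers on it (mine, not from the earlier post): with a per-disk failure rate r per hour, N surviving disks, and a rebuild window of T hours, the chance of a second failure mid-rebuild is roughly N x r x T. Taking r = 1e-5 per disk-hour (about one failure per 11 disk-years) and N = 100 disks, a 48-hour rebuild gives 100 x 1e-5 x 48 ≈ 4.8%, where a 1-hour rebuild window would give ≈ 0.1%.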

Re: [ceph-users] Ceph monitor load, low performance

2014-08-26 Thread Craig Lewis
I had a similar problem once. I traced it to a failed battery on my RAID card, which disabled write caching. That's one of the many things I need to add to monitoring. On Tue, Aug 26, 2014 at 3:58 AM, wrote: > Hello Gentelmen:-) > Let me point one important aspect of this "low perform

Re: [ceph-users] Best practice K/M-parameters EC pool

2014-08-27 Thread Craig Lewis
> factors other than cost that prevent this ? > > Cheers > > On 26/08/2014 19:37, Craig Lewis wrote: > > My OSD rebuild time is more like 48 hours (4TB disks, >60% full, osd max > backfills = 1). I believe that increases my risk of failure by 48^2 . > Since your numbe

Re: [ceph-users] do RGW have billing feature? If have, how do we use it ?

2014-08-27 Thread Craig Lewis
Not directly, no. There is data recorded per bucket that could be used for billing. Take a look at radosgw-admin bucket stats --bucket=<name>. That only covers storage. If you're looking to bill the same way Amazon does, I believe that you'll need to query your web server logs to get number of uploa
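If you want request and transfer counts out of RGW itself instead of the web server logs, the usage log can provide them. A sketch (the uid and dates are examples; the log has to be enabled before it collects anything):
  # ceph.conf
  rgw enable usage log = true
  # then, per user:
  radosgw-admin usage show --uid=johndoe --start-date=2014-08-01 --end-date=2014-08-31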

Re: [ceph-users] Best practice K/M-parameters EC pool

2014-08-28 Thread Craig Lewis
My initial experience was similar to Mike's, causing a similar level of paranoia. :-) I'm dealing with RadosGW though, so I can tolerate higher latencies. I was running my cluster with noout and nodown set for weeks at a time. Recovery of a single OSD might cause other OSDs to crash. In the pr

Re: [ceph-users] Uneven OSD usage

2014-09-03 Thread Craig Lewis
ceph osd reweight-by-utilization is ok to use, as long as it's temporary. I've used it while waiting for new hardware to arrive. It adjusts the weight displayed in ceph osd tree, but not the weight used in the crushmap. Yeah, there are two different weights for an OSD. Leave the crushmap weight a
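For example (the threshold and osd id are illustrative; 120 means "more than 20% above average utilization"):
  ceph osd reweight-by-utilization 120
  # and to undo a single OSD's override later:
  ceph osd reweight 12 1.0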

Re: [ceph-users] ceph can not repair itself after accidental power down, half of pgs are peering

2014-09-03 Thread Craig Lewis
If you're running ntpd, then I believe your clocks were too skewed for the authentication to work. Once ntpd got the clocks syncing, authentication would start working again. You can query ntpd for how skewed the clock is relative to the NTP servers:
clewis@ceph2:~$ sudo ntpq -p
remote

Re: [ceph-users] I fail to add a monitor in a ceph cluster

2014-09-03 Thread Craig Lewis
"monclient: hunting for new mon" happens whenever the monmap changes. It will hang if there's no quorum. I haven't done this manually in a long time, so I'll refer to the Chef recipes. The recipe doesn't do the 'ceph-mon add', it just starts the daemon up. Try: sudo ceph-mon -i gail --mkfs --m

Re: [ceph-users] SSD journal deployment experiences

2014-09-04 Thread Craig Lewis
On Thu, Sep 4, 2014 at 9:21 AM, Dan Van Der Ster wrote: > > > 1) How often are DC S3700's failing in your deployments? > None of mine have failed yet. I am planning to monitor the wear level indicator, and preemptively replace any SSDs that go below 10%. Manually flushing the journal, replacin

Re: [ceph-users] SSD journal deployment experiences

2014-09-09 Thread Craig Lewis
On Sat, Sep 6, 2014 at 7:50 AM, Dan van der Ster wrote: > > BTW, do you happen to know, _if_ we re-use an OSD after the journal has > failed, are any object inconsistencies going to be found by a > scrub/deep-scrub? > I haven't tested this, but I did something I *think* is similar. I deleted an

Re: [ceph-users] SSD journal deployment experiences

2014-09-09 Thread Craig Lewis
On Sat, Sep 6, 2014 at 9:27 AM, Christian Balzer wrote: > On Sat, 06 Sep 2014 16:06:56 + Scott Laird wrote: > > > Backing up slightly, have you considered RAID 5 over your SSDs? > > Practically speaking, there's no performance downside to RAID 5 when > > your devices aren't IOPS-bound. > > >

Re: [ceph-users] osd going down every 15m blocking recovery from degraded state

2014-09-16 Thread Craig Lewis
Is it using any CPU or disk I/O during the 15 minutes? On Sun, Sep 14, 2014 at 11:34 AM, Christopher Thorjussen < christopher.thorjus...@onlinebackupcompany.com> wrote: > I'm waiting for my cluster to recover from a crashed disk and a second osd > that has been taken out (crushmap, rm, stopped).

Re: [ceph-users] full/near full ratio

2014-09-16 Thread Craig Lewis
On Fri, Sep 12, 2014 at 4:35 PM, JIten Shah wrote: > 1. If we need to modify those numbers, do we need to update the values in > ceph.conf and restart every OSD, or can we run a command on the MON that will > overwrite it? That will work. You can also update the values without a restart using:
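The snippet is cut off there; if memory serves, the firefly-era runtime commands were along these lines (values are examples, and should be mirrored in ceph.conf to survive restarts):
  ceph pg set_full_ratio 0.95
  ceph pg set_nearfull_ratio 0.85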

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-16 Thread Craig Lewis
On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz wrote:
> XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> All logs from before the disaster are still there, do you have any advice on what would be relevant?
This is a problem. It's not necessarily a deadlock.

Re: [ceph-users] osd going down every 15m blocking recovery from degraded state

2014-09-16 Thread Craig Lewis
wrote: > I've got several osds that are spinning at 100%. > I've retained some professional services to have a look. It's out of my > newbie reach. > /Christopher > On Tue, Sep 16, 2014 at 11:23 PM, Craig Lewis wrote: >> Is it using any CPU or Disk

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-18 Thread Craig Lewis
solved this problem. Since then, I ran into http://tracker.ceph.com/issues/5699. Snapshots are off until I've deployed Firefly. On Wed, Sep 17, 2014 at 8:09 AM, Florian Haas wrote: > Hi Craig, > just dug this up in the list archives. > On Fri, Mar 28, 2014 at 2:04 AM

Re: [ceph-users] osd going down every 15m blocking recovery from degraded state

2014-09-18 Thread Craig Lewis
been through your post many times (google likes it ;) > I've been trying all the noout/nodown/noup. > But I will look into the XFS issue you are talking about. And read all of > the post one more time. > /C > On Wed, Sep 17, 2014 at 12:01 AM, Craig Lewis wrote:

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-19 Thread Craig Lewis
On Fri, Sep 19, 2014 at 2:35 AM, Francois Deppierraz wrote: > Hi Craig, > I'm planning to completely re-install this cluster with firefly because > I started to see other OSD crashes with the same trim_object error... I did lose data because of this, but it was unrelated to the XFS issues.

Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-19 Thread Craig Lewis
Excellent find. On Fri, Sep 19, 2014 at 7:11 AM, Florian Haas wrote: > Hi Craig, > > On Fri, Sep 19, 2014 at 2:49 AM, Craig Lewis > wrote: > > No, removing the snapshots didn't solve my problem. I eventually traced > > this problem to XFS deadlocks caused by >

Re: [ceph-users] confusion when kill 3 osds that store the same pg

2014-09-19 Thread Craig Lewis
Comments inline. On Thu, Sep 18, 2014 at 8:33 PM, yuelongguang wrote:
> 1.
> [root@cephosd5-gw current]# ceph pg 2.30 query
> Error ENOENT: i don't have pgid 2.30
> Why can't I query information about this pg? How do I dump this pg?
I haven't actually tried this, but I expect something lik
