I forgot to register before posting so reposting.
I think I have a split issue or I can't seem to get rid of these objects.
How can I tell ceph to forget the objects and revert?
How this happened: due to the python 2.7.8/ceph bug, a whole rack
of ceph went down (it had ubuntu 14.10 and that seemed to have 2.7.8 before
14.04). I didn't kno
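For what it's worth, if the PGs are left reporting unfound objects, the knob for telling ceph to give up on them is mark_unfound_lost; something along these lines (the pg id here is only a placeholder)::
    ceph health detail | grep unfound          # find which pgs have unfound objects
    ceph pg 2.5 mark_unfound_lost revert       # roll back to the previous version where one exists
    ceph pg 2.5 mark_unfound_lost delete       # or forget the objects entirely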
So this was working a moment ago and I was running rados benchmarks as
well as swift benchmarks to try to see how my install was doing. Now
when I try to download an object I get this read_length error::
http://pastebin.com/R4CW8Cgj
To try to poke at this I wiped all of the .rgw pools, removed
I had this happen to me as well. Turned out to be a connlimit thing for me.
I would check dmesg/the kernel log for any "conntrack limit reached,
connection dropped" messages, then increase the connlimit. Odd, as I
connected over ssh for this, but I can't deny syslog.
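Something like this is what I had in mind, assuming the sysctl names on your kernel match::
    dmesg | grep -i conntrack                          # look for "table full, dropping packet"
    sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
    sysctl -w net.netfilter.nf_conntrack_max=262144    # then persist the value in /etc/sysctl.conf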
I have a single radosgw user with 2 s3 keys and 1 swift key. I have created a
few buckets and I can list all of the contents of bucket A and C but not B with
either S3 (boto) or python-swiftclient. I am able to list the first 1000
entries using radosgw-admin 'bucket list --bucket=bucketB' withou
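If it is the bucket index that is unhappy, something along these lines may show it; a sketch only, and the exact flags vary a bit by release::
    radosgw-admin bucket stats --bucket=bucketB
    radosgw-admin bucket check --bucket=bucketB                         # report index problems
    radosgw-admin bucket check --bucket=bucketB --check-objects --fix   # attempt a repair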
Will do. The reason for the partial request is that the total size of the
file is close to 1TB so attempting a download would take quite some time on
our 10Gb connection. What is odd is that if I request the range from the last
byte received to the end of the file, we get a 406 "cannot be satisfied" response
Sorry for the delay. It took me a while to figure out how to do a range request
and append the data to a single file. The good news is that the resulting file
seems to be 14G in size, which matches the file's manifest size. The bad news is that
the file is completely corrupt and the radosgw log has erro
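For anyone doing the same, a range-request loop of roughly this shape is enough to pull a large object down in pieces over the Swift API; the token, URL, and output name are all placeholders::
    # fetch the object size, then request 1 GiB ranges and append them in order
    size=$(curl -sI -H "X-Auth-Token: $token" "$url" | awk 'tolower($1)=="content-length:" {print $2}' | tr -d '\r')
    chunk=$((1024*1024*1024))
    for ((off=0; off<size; off+=chunk)); do
      end=$((off + chunk - 1)); [ "$end" -ge "$size" ] && end=$((size - 1))
      curl -s -H "X-Auth-Token: $token" -r "$off-$end" "$url" >> bigfile
    done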
fest, I'll need to take a look
> at the code. Could such a sequence happen with the client that you're using
> to upload?
>
> Yehuda
>
Some context. I have a small cluster running ubuntu 14.04 and giant (now
hammer). I ran some updates and everything was fine. Rebooted a node and a
drive must have failed as it no longer shows up.
I use --dmcrypt with ceph deploy and 5 osds per ssd journal. To do this I
created the ssd partit
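Roughly, the idea is to carve one partition per journal on the SSD and then point ceph-deploy at each data-disk:journal pair; a sketch only, with device names, sizes, and the host name as placeholders::
    for i in 1 2 3 4 5; do sgdisk --new=${i}:0:+20G /dev/sdX; done    # 5 journal partitions on the SSD
    ceph-deploy osd prepare --dmcrypt node1:/dev/sdb:/dev/sdX1        # data disk plus its journal partition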
Hello Yall!
I can't figure out why my gateways are performing so poorly and I am not
sure where to start looking. My RBD mounts seem to be performing fine
(over 300 MB/s) while uploading a 5G file to Swift/S3 takes 2m32s
(about 32 MB/s, I believe). If we try a 1G file it's closer to 8 MB/s. Testing
with nu
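For a raw-cluster baseline to compare the gateway numbers against, rados bench on a pool is a quick check (the pool name is just an example)::
    rados bench -p rbd 60 write -t 32 --no-cleanup   # 60s write test, 32 concurrent ops
    rados bench -p rbd 60 seq -t 32                  # sequential read of the objects just written
    rados -p rbd cleanup                             # remove the benchmark objects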
t? Examining the
> OSDs to see if they're behaving differently on the different requests
> is one angle of attack. The other is to look into whether the RGW daemons
> are hitting throttler limits or something that the RBD clients aren't.
> -Greg
> On Thu, Dec 18, 2014 at 7:35 PM Sea
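One way to look at the throttler angle is to pull the gateway's perf counters over its admin socket; the socket name below is only a guess at the usual naming on the gateway host::
    ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok perf dump | grep -i -A 5 throttle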
third gateway was made last minute to test and rule out the hardware.
On December 18, 2014 10:57:41 PM Christian Balzer wrote:
Hello,
Nice cluster, I wouldn't mind getting my hands on her ample nacelles, er,
wrong movie. ^o^
On Thu, 18 Dec 2014 21:35:36 -0600 Sean Sullivan wrote:
> Hell
Wow Christian,
Sorry I missed these inline replies. Give me a minute to gather some data.
Thanks a million for the in depth responses!
I thought about RAIDing it but I needed the space, unfortunately. I had a
3x60 osd node test cluster that we tried before this and it didn't have
this floppi
lzer wrote:
Hello,
On Thu, 18 Dec 2014 23:45:57 -0600 Sean Sullivan wrote:
Wow Christian,
Sorry I missed these inline replies. Give me a minute to gather some
data. Thanks a million for the in depth responses!
No worries.
I thought about RAIDing it but I needed the space, unfortunately. I had
/q5E6JjkG
On 12/19/2014 08:10 PM, Christian Balzer wrote
> Hello Sean,
>
> On Fri, 19 Dec 2014 02:47:41 -0600 Sean Sullivan wrote:
>
>> Hello Christian,
>>
>> Thanks again for all of your help! I started a bonnie test using the
>> following::
>> bonnie -
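A bonnie++ run of roughly this shape is what I'd expect here; the mount point and size are examples, and -s should be about twice the machine's RAM so the page cache doesn't hide the result::
    bonnie++ -d /mnt/cephtest -s 64g -n 0 -m $(hostname) -u root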
014 at 2:57 PM, Sean Sullivan
> wrote:
>
> Thanks Craig!
>
> I think that this may very well be my issue with osds dropping out
> but I am still not certain as I had the cluster up for a small
> period while running rado
I am trying to understand these drive throttle markers that were
mentioned, to get an idea of why these drives are marked as slow::
here is the iostat of the drive /dev/sdbm
http://paste.ubuntu.com/9607168/
an IO wait of 0.79 doesn't seem bad but a write wait of 21.52 seems
really high. Looking
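For reference, the per-device waits come from extended iostat output; something like this keeps printing them every 5 seconds::
    iostat -xmt 5 /dev/sdbm    # watch r_await / w_await / %util over time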
So we recently had a power outage and I seem to have lost 2 of 3 of my
monitors. I have since copied /var/lib/ceph/mon/ceph-$(hostname){,.BAK} and
then created a new cephfs and finally generated a new filesystem via
''' sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename}
--keyring {tmp}
So our datacenter lost power and 2/3 of our monitors died with FS
corruption. I tried fixing it but it looks like the store.db didn't make
it.
I copied the working journal via
1. sudo mv /var/lib/ceph/mon/ceph-$(hostname){,.BAK}
2. sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/
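Assuming the surviving monitor's store is still healthy, the usual shape of that recovery is to pull the monmap from the survivor and mkfs the dead ones against it; a sketch only, ids and paths are placeholders::
    ceph-mon -i <surviving-mon-id> --extract-monmap /tmp/monmap    # run on the healthy monitor while it is stopped
    mv /var/lib/ceph/mon/ceph-$(hostname){,.BAK}                   # on the dead monitor host
    ceph-mon -i $(hostname) --mkfs --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring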
I think it just got worse::
all three monitors on my other cluster say that ceph-mon can't open
/var/lib/ceph/mon/$(hostname). Is there any way to recover if you lose all
3 monitors? I saw a post by Sage saying that the data can be recovered as
all of the data is held on other servers. Is this pos
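If that post is the monitor-store-from-OSDs recovery, later releases can scrape the maps back out of the OSDs with ceph-objectstore-tool; roughly, with the OSDs stopped and all paths as placeholders::
    for osd in /var/lib/ceph/osd/ceph-*; do
      ceph-objectstore-tool --data-path "$osd" --op update-mon-db --mon-store-path /tmp/mon-store
    done
    # the accumulated /tmp/mon-store is then rebuilt into a monitor store
    # (e.g. with ceph-monstore-tool rebuild) before being copied into place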
previous osd maps are there.
I just don't understand what key/values I need inside.
On Aug 11, 2016 1:33 AM, "Wido den Hollander" wrote:
>
> > Op 11 augustus 2016 om 0:10 schreef Sean Sullivan <
> seapasu...@uchicago.edu>:
> >
> >
> > I think i
hat lost all 3 monitors in a power failure? If I am going
down the right path is there any advice on how I can assemble/repair the
database?
I see that there is an RBD recovery-from-a-dead-cluster tool. Is it possible
to do the same with S3 objects?
On Thu, Aug 11, 2016 at 11:15 AM, Wido den Hollander w
other?
All should have the same keys/values although constructed differently
right? I can't blindly copy /var/lib/ceph/mon/ceph-$(hostname)/store.db/
from one host to another right? But can I copy the keys/values from one to
another?
On Fri, Aug 12, 2016 at 12:45 PM, Sean Sullivan
wrote:
>
log_file
--- end dump of recent events ---
Aborted (core dumped)
---
---
I feel like I am so close yet so far. Can anyone give me a nudge as to what
I can do next? It looks like it is bombing out on trying to get an updated
p
We have a hammer cluster that experienced a similar power failure and ended
up corrupting our monitors' leveldb stores. I am still trying to repair ours
but I can give you a few tips that seem to help.
1.) I would copy the database off to somewhere safe right away. Just
opening it seems to change
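For that first tip, a plain copy with the monitor stopped is enough; something like::
    cp -a /var/lib/ceph/mon/ceph-$(hostname) /root/mon-backup-$(date +%F)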
I am trying to install Ceph luminous (ceph version 12.2.1) on 4 ubuntu
16.04 servers each with 74 disks, 60 of which are HGST 7200rpm sas drives::
HGST HUS724040AL sdbv sas
root@kg15-2:~# lsblk --output MODEL,KNAME,TRAN | grep HGST | wc -l
60
I am trying to deploy them all with ::
a line like th
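Something along these lines would loop over those drives; a sketch only, with the device selection and prepare flags as examples (the model name spans two lsblk columns, hence KNAME first)::
    for dev in $(lsblk --noheadings --output KNAME,MODEL | awk '/HGST/ {print $1}'); do
      ceph-disk -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys \
        --bluestore --cluster ceph -- "/dev/$dev"
    done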
I have tried using ceph-disk directly and I'm running into all sorts of
trouble but I'm trying my best. Currently I am using the following cobbled
script which seems to be working:
https://github.com/seapasulli/CephScripts/blob/master/provision_storage.sh
I'm at 11 right now. I hope this works.
I am trying to stand up ceph (luminous) on three 72-disk Supermicro servers
running ubuntu 16.04 with HWE enabled (for a 4.10 kernel for cephfs). I am
not sure how this is possible but even though I am running the following
line to wipe all disks of their partitions, once I run ceph-disk to
partition t
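A wipe along these lines is what I'd expect to clear old partitions; ceph-disk has a zap subcommand, or it can be done by hand (/dev/sdX is a placeholder)::
    ceph-disk zap /dev/sdX
    # or, manually:
    sgdisk --zap-all /dev/sdX
    wipefs --all /dev/sdX
    dd if=/dev/zero of=/dev/sdX bs=1M count=100 oflag=direct
    partprobe /dev/sdX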
On freshly installed ubuntu 16.04 servers with the HWE kernel selected
(4.10), I cannot use ceph-deploy or ceph-disk to provision OSDs.
Whenever I try I get the following::
ceph-disk -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys
--bluestore --cluster ceph --fs-type xfs -- /dev/s
I was curious if anyone has filled ceph storage beyond 75%. Admittedly we
lost a single host due to power failure and are down 1 host until the
replacement parts arrive, but outside of that I am seeing disparity between
the most and least full OSDs::
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
M
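For looking at the spread, ceph osd df plus reweight-by-utilization is the usual lever; the 110 threshold is only an example, and newer releases also have a dry-run variant::
    ceph osd df tree
    ceph osd test-reweight-by-utilization 110    # dry run, where available
    ceph osd reweight-by-utilization 110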
sure I've been hesitant. I'll give it another shot in a test instance
and see how it goes.
Thanks for your help as always Mr. Balzer.
On Aug 28, 2016 8:59 PM, "Christian Balzer" wrote:
>
> Hello,
>
> On Sun, 28 Aug 2016 14:34:25 -0500 Sean Sullivan wrote:
>
>
I was creating a new user and mount point. On another hardware node I
mounted CephFS as admin to mount as root. I created /aufstest and then
unmounted. From there it seems that both of my mds nodes crashed for some
reason and I can't start them any more.
https://pastebin.com/1ZgkL9fa -- my mds log
4:32 PM, Sean Sullivan wrote:
> I was creating a new user and mount point. On another hardware node I
> mounted CephFS as admin to mount as root. I created /aufstest and then
> unmounted. From there it seems that both of my mds nodes crashed for some
> reason and I can't st
can't seem to get them to start again.
On Mon, Apr 30, 2018 at 5:06 PM, Sean Sullivan wrote:
> I had 2 MDS servers (one active one standby) and both were down. I took a
> dumb chance and marked the active as down (it said it was up but laggy).
> Then started the primary again and now
r 30, 2018 at 7:24 PM, Sean Sullivan wrote:
> So I think I can reliably reproduce this crash from a ceph client.
>
> ```
> root@kh08-8:~# ceph -s
> cluster:
> id: 9f58ee5a-7c5d-4d68-81ee-debe16322544
> health: HEALTH_OK
>
> services:
> mon: 3 dae
, May 1, 2018 at 12:09 AM, Patrick Donnelly
wrote:
> Hello Sean,
>
> On Mon, Apr 30, 2018 at 2:32 PM, Sean Sullivan
> wrote:
> > I was creating a new user and mount point. On another hardware node I
> > mounted CephFS as admin to mount as root. I created /aufstest and then
>
arge (one 4.1G and the other 200MB)
>
> kh10-8 (200MB) mds log -- https://griffin-objstore.opensciencedatacloud.org/logs/ceph-mds.kh10-8.log
> kh09-8 (4.1GB) mds log -- https://griffin-objstore.opensciencedatacloud.org/logs/ceph-mds.kh09-8.log
>
> On Tue, May 1, 2018 at 12:0
iable is a different type or a missing delimiter. womp. I am
definitely out of my depth but now is a great time to learn! Can anyone
shed some more light as to what may be wrong?
On Fri, May 4, 2018 at 7:49 PM, Yan, Zheng wrote:
> On Wed, May 2, 2018 at 7:19 AM, Sean Sullivan wrote:
> >
I am sorry for posting this if this has been addressed already. I am not
sure how to search through old ceph-users mailing list posts. I used to
use gmane.org but that seems to be down.
My setup::
I have a moderate ceph cluster (ceph hammer 0.94.9
- fe6d859066244b97b24f09d46552afc2071e6f90). Th
reshold)
max_recent 500
max_new 1000
log_file
--- end dump of recent events ---
Segmentation fault (core dumped)
--
I have tried copying my monitor and admin keyring into the admin.keyring
used to try to r
I have a hammer cluster that died a bit ago (hammer 0.94.9) consisting of 3
monitors and 630 osds spread across 21 storage hosts. The cluster's monitors
all died due to leveldb corruption and the cluster was shut down. I was
finally given word that I could try to revive the cluster this week!
https:/
Hi Ben!
I'm using ubuntu 14.04
I have restarted the gateways with the numthreads line you suggested. I
hope this helps. I would think I would get some kind of throttle log or
something.
500 seems really strange as well. Do you have a thread for this? RGW still
has a weird race condition w
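For the archives, the thread setting lives on the civetweb frontend line in ceph.conf; something like the following, with the section name and value only as examples::
    [client.radosgw.gateway]
    rgw frontends = civetweb port=7480 num_threads=512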