Hi Peter, yes just restart the OSD for the setting to take effect.
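For reference, a minimal sketch of the two usual ways to apply an OSD config change; the option name below is only a placeholder, not necessarily the one discussed in this thread:
# persist the option in ceph.conf, then restart the OSD so it is picked up
sudo systemctl restart ceph-osd@12        # systemd-based installs
sudo /etc/init.d/ceph restart osd.12      # sysvinit-based installs
# many (but not all) options can also be injected at runtime without a restart
ceph tell osd.12 injectargs '--osd_max_backfills 1'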
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Peter
Kerdisle
Sent: 10 May 2016 19:06
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Erasure pool performance expectations
Th
Hi,
We plan to create an image with this format; what do you think about it? Thanks.
rbd create myimage --size 102400 --order 25 --stripe-unit 4K
--stripe-count 32 --image-feature layering --image-feature striping
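For reference, my reading of those striping flags (not advice from this thread), with the stripe unit spelled out in bytes in case your rbd version does not accept the K suffix:
# --order 25        -> object size = 2^25 bytes = 32 MiB
# --stripe-unit 4K  -> 4 KiB is written to one object before moving to the next
# --stripe-count 32 -> writes rotate across 32 objects
# => one full stripe is 4 KiB * 32 = 128 KiB spread over 32 objects
rbd create myimage --size 102400 --order 25 --stripe-unit 4096 \
    --stripe-count 32 --image-feature layering --image-feature striping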
2016-05-10 21:19 GMT+08:00 :
> Hi,
>
>
> Am 2016-05-10 05:48, schrieb Geocast:
>
>>
Hello,
not sure if the Cc: to the users ML was intentional or not, but either way.
The issue seen in the tracker:
http://tracker.ceph.com/issues/15763
and what you have seen (and I have as well) feels a lot like a lack of
parallelism towards the end of rebuilds.
This becomes even more obvious whe
Hello again,
I was looking at the patches sent to the repository and I found one
that makes the OSD check for cluster health before starting up.
Could this patch be the source of all my problems?
Best regards,
On Tue, May 10, 2016 at 6:07 PM, Gonzalo Aguilar Delgado <
gaguilar.delg...@gmai
> Op 11 mei 2016 om 8:38 schreef M Ranga Swami Reddy :
>
>
> Hello,
> I wanted to resize an image using the 'rbd' resize option, but it should
> be done without data loss.
> For example: I have an image with 100 GB size (thin provisioned), and this
> image has data of 10GB only. Here I wanted to resize this image t
Thanks, I tried that earlier but so far I am still getting slow requests.
Although I also found I didn't have writeback enabled on my hardware
controller. It seems that after changing that and setting the max bytes
things are a bit more stable, with fewer slow requests popping up. The fact that
the write
Hello,
On Wed, 11 May 2016 15:16:14 +0800 Geocast Networks wrote:
> Hi,
>
> We plan to create an image with this format; what do you think about it?
> Thanks.
>
Not really related to OSD formatting.
> rbd create myimage --size 102400 --order 25 --stripe-unit 4K
> --stripe-count 32 --image-feature
> -Original Message-
> From: Eric Eastman [mailto:eric.east...@keepertech.com]
> Sent: 10 May 2016 18:29
> To: Nick Fisk
> Cc: Ceph Users
> Subject: Re: [ceph-users] CephFS + CTDB/Samba - MDS session timeout on
> lockfile
>
> On Tue, May 10, 2016 at 6:48 AM, Nick Fisk wrote:
> >
> >
> >
Thank you.
but fstrim can only be used within a mounted partition... But I wanted to do this as the
cloud admin...
I have a few users with a high volume size (i.e. capacity) allotted, but
only 5% of the capacity used. So I wanted to reduce the size to 10% of
that using the rbd resize command. But in this process, if a
cus
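A hedged sketch of shrinking a thin-provisioned image; the pool, image, device and sizes below are placeholders, and whatever sits on top of the image (filesystem, LVM, ...) must be shrunk first, otherwise everything beyond the new size is silently discarded:
# inside the guest: unmount, check and shrink the filesystem first (ext4 example)
umount /mnt/data
e2fsck -f /dev/vdb
resize2fs /dev/vdb 10G
# then shrink the RBD image itself; rbd refuses to shrink without --allow-shrink
rbd resize --size 10240 --allow-shrink mypool/myimage    # --size is in MB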
Hello,
I have a production Ceph cluster running the latest Hammer Release.
We are not planning to upgrade to Jewel soon.
However, I would like to upgrade just the Rados Gateway to Jewel,
because I want to test the new Swift compatibility improvements.
Is it supported to run the system with thi
>>but fstrim can only be used within a mounted partition... But I wanted to do this as the
>>cloud admin...
if you use qemu, you can trigger fstrim through the guest agent:
http://dustymabe.com/2013/06/26/enabling-qemu-guest-agent-and-fstrim-again/
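A minimal sketch of that guest-agent route, assuming a libvirt-managed guest with the qemu-guest-agent package installed inside it, an agent channel defined in the domain XML, and a disk attached with discard support (e.g. virtio-scsi with discard='unmap'); the domain name is a placeholder:
# trim all mounted filesystems in the guest from the hypervisor
virsh domfstrim vm01
# or talk to the agent directly
virsh qemu-agent-command vm01 '{"execute": "guest-fstrim"}'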
- Mail original -
De: "M Ranga Swami Reddy"
À: "Wido den
Hello,
On Wed, 11 May 2016 13:33:44 +0200 (CEST) Alexandre DERUMIER wrote:
> >>but fstrim can only be used within a mounted partition... But I wanted to do this as the
> >>cloud admin...
>
> if you use qemu, you can launch fstrim through guest-agent
>
This of course assumes that qemu/kvm is using a disk method
Hi,
We are using Ceph RBD with a CephFS mounted file system. When we run an Ant copy
task within the Ceph shared directory, the file is copied properly, but after a few
seconds the content becomes empty. Is there any solution for this issue?
Regards
Prabu GJ
Hi,
I’m looking for some help in figuring out why there are 2 PGs in our cluster
in 'active+undersized+degraded' state. They don’t seem to get assigned a third
OSD to place data on. I’m not sure why; everything looks ‘ok’ to me. Our Ceph
cluster consists of 3 nodes and has been upgraded from fir
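Not advice from this thread, but the usual first diagnostics for PGs stuck like that (the PG id is a placeholder):
ceph health detail                # lists the stuck PGs and their acting sets
ceph pg 2.5 query | less          # look at "up", "acting" and "recovery_state"
ceph osd tree                     # check weights of the OSDs CRUSH could pick
ceph pg dump_stuck unclean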
1. First scenario: it is only a 4-node setup, and since it is chassis-level
replication, the single node remaining on the chassis takes all the traffic.
That seems to be the bottleneck, as with host-level replication on a
similar setup the recovery time is much less (data is not in this table).
2. In the s
> -Original Message-
> From: Mark Nelson [mailto:mnel...@redhat.com]
> Sent: 11 May 2016 13:16
> To: Somnath Roy ; Nick Fisk
> ; Ben England ; Kyle Bader
>
> Cc: Sage Weil ; Samuel Just ; ceph-
> us...@lists.ceph.com
> Subject: Re: Weighted Priority Queue testing
>
> > 1. First scenario,
Hi Guys,
we spent some time over the past week looking at hammer vs jewel RBD
performance in HDD only, HDD+NVMe journal, and NVMe cases with the
default filestore backend. We ran into a number of issues during
testing and I don't want to get into everything, but we were eventually
able to ge
Hi Max,
I encountered the same error with my 3-node cluster a few days ago. When I
added a fourth node to the cluster, the PGs came back to a healthy
state. It seems to be a corner case of the CRUSH algorithm which hits only
in a small cluster.
Quoting from another Ceph user: "yes the pg should get remap
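A hedged aside on that corner case: in very small clusters CRUSH can run out of retries before finding enough distinct hosts. Two knobs people commonly try, both of which trigger data movement and require client support for the chosen tunables profile, so check compatibility first:
ceph osd crush tunables optimal        # switch to the newer tunables profile
# or raise the retry budget by hand
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt    # raise (or add) "tunable choose_total_tries 50" to a higher value
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new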
Hello there,
Our setup is with Ceph Hammer (latest release).
We want to publish in our Object Storage some Scientific Datasets.
These are collections of around 100K objects and total size of about
200 TB.
For Object Storage we use the RadosGW with S3 API.
For the initial testing we are using a
Hello everyone,
We experienced a strange scenario last week of unfound objects and
inconsistent reports from ceph tools. We solved it with the help from
Sage, and we wanted to share our experience and to see if it can be of
any use for developers too.
After OSDs segfaulting randomly, our cluster
Awesome work Mark! Comments / questions inline below:
On Wed, May 11, 2016 at 9:21 AM, Mark Nelson wrote:
> There are several commits of interest that have a noticeable effect on 128K
> sequential read performance:
>
>
> 1) https://github.com/ceph/ceph/commit/3a7b5e3
>
> This commit was the firs
> Op 11 mei 2016 om 15:51 schreef Simon Engelsman :
>
>
> Hello everyone,
>
> We experienced a strange scenario last week of unfound objects and
> inconsistent reports from ceph tools. We solved it with the help from
> Sage, and we wanted to share our experience and to see if it can be of
> any
Thank you.
It is indeed a problem with multipart.
I tried two clients (s3cmd and rclone). When you upload a file to
S3 using multipart, you are no longer able to read this object with
the Swift API because the md5 check fails.
Saverio
2016-05-09 12:00 GMT+02:00 Xusangdi :
> Hi,
>
> I'm
On 05/11/2016 08:52 AM, Jason Dillaman wrote:
Awesome work Mark! Comments / questions inline below:
On Wed, May 11, 2016 at 9:21 AM, Mark Nelson wrote:
There are several commits of interest that have a noticeable effect on 128K
sequential read performance:
1) https://github.com/ceph/ceph/
It does not work the other way around either:
if I upload a file with the swift client using the -S option to force
swift to do a multipart upload:
swift upload -S 100 multipart 180.mp4
Then I am not able to read the file with S3
s3cmd get s3://multipart/180.mp4
download: 's3://multipart/180.mp4' -> './18
On Wed, May 11, 2016 at 10:07 AM, Mark Nelson wrote:
> Perhaps 0024677 or 3ad19ae introduced another regression that was being
> masked by c474e4 and when 66e7464 improved the situation, the other
> regression appeared?
0024677 is in Hammer as 7004149 and 3ad19ae is in Hammer as b38da480.
I opene
Can anybody help shed some light on this error I’m getting from radosgw?
2016-05-11 10:09:03.471649 7f1b957fa700 1 -- 172.16.129.49:0/3896104243 -->
172.16.128.128:6814/121075 -- osd_op(client.111957498.0:726 27.4742be4b
97c56252-6103-4ef4-b37a-42739393f0f1.113770300.1_interfaces [create 0~0
[
On 05/11/2016 09:19 AM, Jason Dillaman wrote:
On Wed, May 11, 2016 at 10:07 AM, Mark Nelson wrote:
Perhaps 0024677 or 3ad19ae introduced another regression that was being
masked by c474e4 and when 66e7464 improved the situation, the other
regression appeared?
0024677 is in Hammer as 7004149
On Wed, May 11, 2016 at 2:04 AM, Nick Fisk wrote:
>> -Original Message-
>> From: Eric Eastman [mailto:eric.east...@keepertech.com]
>> Sent: 10 May 2016 18:29
>> To: Nick Fisk
>> Cc: Ceph Users
>> Subject: Re: [ceph-users] CephFS + CTDB/Samba - MDS session timeout on
>> lockfile
>>
>> On
Hi,
For your information, and for everyone in the same situation as me:
I found that the release notes explain very well the case where the
server is down but the controller doesn't know about it. It can be caused by
the upgrades done in Ceph across several releases. In this case, Firefly,
there are some ins
Hi,
How can I make Ceph still work if the cluster loses all OSD hosts except one in a
disaster, where the capacity of the single host is bigger than the total used data?
I need to minimize downtime while recovering/reinstalling the lost hosts.
More generally, how can I make Ceph choose the highest number of types afte
On Tue, May 10, 2016 at 10:40:08AM +0200, Yoann Moulin wrote:
:RadosGW (S3 and maybe Swift for hadoop/spark) will be the main usage. Most of
:the access will be in read only mode. Write access will only be done by the
:admin to update the datasets.
No one seems to have pointed this out, but if yo
On 05/11/2016 04:00 PM, Wido den Hollander wrote:
>
>> Op 11 mei 2016 om 15:51 schreef Simon Engelsman :
>>
>>
>> Hello everyone,
>>
>> We experienced a strange scenario last week of unfound objects and
>> inconsistent reports from ceph tools. We solved it with the help from
>> Sage, and we want
I bumped up the backfill/recovery settings to match Hammer. It is
unlikely that the long tail latency is a parallelism issue; if it were, the entire recovery
would be suffering, not just the tail. It's probably a prioritization issue.
I will start looking and update my findings.
I can't add devl be
Hi George,
Which version of Ceph is this?
I've never had incomplete PGs stuck like this before. AFAIK it means
that osd.52 would need to be brought up before you can restore those
PGs.
Perhaps you'll need ceph-objectstore-tool to help dump osd.52 and
bring up its data elsewhere. A quick check on t
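A hedged sketch of that ceph-objectstore-tool route (PG id, OSD numbers and paths are placeholders, and both OSDs must be stopped while the tool runs):
# export the PG from the problem OSD's data directory
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-52 \
    --journal-path /var/lib/ceph/osd/ceph-52/journal \
    --pgid 1.2f --op export --file /tmp/pg1.2f.export
# import it into another (stopped) OSD that should host the PG
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
    --journal-path /var/lib/ceph/osd/ceph-10/journal \
    --op import --file /tmp/pg1.2f.export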
Yes Mark, I have the following io profile going on during recovery.
[recover-test]
ioengine=rbd
clientname=admin
pool=mypool
rbdname=<>
direct=1
invalidate=0
rw=randrw
norandommap
randrepeat=0
rwmixread=40
rwmixwrite=60
iodepth=256
numjobs=6
end_fsync=0
bssplit=512/4:1024/1:1536/1:2048/1:2560/1:30
Hey Dan,
This is on Hammer 0.94.5. osd.52 was always on a problematic machine, and when
this happened it had less data on its local disk than the other OSDs. I've tried
adapting that blog post's solution to this situation to no avail.
I've tried things like looking at all probing OSDs in the query
Hey cephers,
Just a reminder, the Ceph Developer Monthly for May was rescheduled
from last week and will be happening tonight at 9p EST.
http://wiki.ceph.com/Planning
If you have ongoing work in Ceph, please join us for a review and
discussion. Thanks!
--
Best Regards,
Patrick McGarry
Direct
Hi all,
I tried to upgrade from Infernalis to the master branch. I see the following
error:
ssd@OptiPlex-9020-1:~/src/jewel-master$ ceph -s
Traceback (most recent call last):
File "/home/ssd/src/jewel-master/ceph-install/bin/ceph", line 118, in
import rados
ImportError:
/home/ssd/src/jew
While I'm usually not fond of blaming the client application, this is
really a swift command-line tool issue. It tries to be smart by
comparing the md5sum of the object's content with the object's etag,
and that breaks with multipart objects. A multipart object's etag is calculated
differently (md5sum of t
In S3 multipart uploads, the checksum algorithm is different from the one in
Swift.
I'm including the following page for your convenience.
http://stackoverflow.com/questions/12186993/what-is-the-algorithm-to-compute-the-amazon-s3-etag-for-a-file-larger-than-5gb
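For illustration, a small sketch that recomputes an S3-style multipart ETag for a local file, following the algorithm described on that page; it assumes you know the part size the client used for the upload (s3cmd defaults to 15 MB chunks):
FILE="$1"
PART_SIZE=$((15 * 1024 * 1024))    # adjust to the multipart chunk size actually used
TMPDIR=$(mktemp -d)
split -b "$PART_SIZE" "$FILE" "$TMPDIR/part_"
# md5 of the concatenated binary md5 digests of every part...
ETAG=$(for p in "$TMPDIR"/part_*; do md5sum "$p" | awk '{print $1}'; done \
       | tr -d '\n' | xxd -r -p | md5sum | awk '{print $1}')
# ...followed by a dash and the number of parts
NPARTS=$(ls "$TMPDIR"/part_* | wc -l)
echo "${ETAG}-${NPARTS}"
rm -rf "$TMPDIR"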
Original Message
From: Saverio protoziopr...@gmail.co
Hello,
On Wed, 11 May 2016 11:27:28 -0400 Jonathan D. Proulx wrote:
> On Tue, May 10, 2016 at 10:40:08AM +0200, Yoann Moulin wrote:
>
> :RadosGW (S3 and maybe Swift for hadoop/spark) will be the main usage.
> Most of :the access will be in read only mode. Write access will only be
> done by the
On Wed, 11 May 2016 16:10:06 + Somnath Roy wrote:
> I bumped up the backfill/recovery settings to match Hammer. It is
> unlikely that the long tail latency is a parallelism issue; if it were,
> the entire recovery would be suffering, not just the tail. It's probably a
> prioritization issue. Wi
It looks like the ETag computation for Swift multipart doesn't add the
dash character, which makes
s3cmd regard the file as a regular one (otherwise it would just skip the ETag
checking step). You may also take
a look at this:
https://github.com/s3tools/s3cmd/blob/master/S3/S3.py#L1509
Hi,
my ceph df output is as follows:
# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
911T 911T 121G 0.01
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
block7 0 0 303T 0
image
> Op 11 mei 2016 om 15:42 schreef Saverio Proto :
>
>
> Hello there,
>
> Our setup is with Ceph Hammer (latest release).
>
> We want to publish in our Object Storage some Scientific Datasets.
> These are collections of around 100K objects and total size of about
> 200 TB.
>
> For Object Stora
Sadly not. RGW generally requires updates to the OSD-side object class code
for a lot of its functionality and isn't expected to work against older
clusters. :(
On Wednesday, May 11, 2016, Saverio Proto wrote:
> Hello,
>
> I have a production Ceph cluster running the latest Hammer Release.
>
> We
Yes, it's intentional. All ceph CLI operations are idempotent.
On Tuesday, May 10, 2016, Swapnil Jain wrote:
> Hi
>
> I am using Infernalis 9.2.1. While creating a bucket, if the bucket already
> exists, it still returns 0 as the exit status. Is that intentional for some
> reason, or a bug?
>
>
>
> r