Hi,
On 17 Sep 2014, at 06:11, shiva rkreddy <shiva.rkre...@gmail.com> wrote:
Thanks Dan. Is there any preferred filesystem for the leveldb files?
I understand that the filesystem should be of the same type on both the /var and SSD
partitions.
Should it be ext4, xfs, something else or
Hello Sebastien,
Thanks for your reply. I fixed the error. It was a configuration mistake on my
end.
Regards,
Malleshi CN
-----Original Message-----
From: Sebastien Han [mailto:sebastien@enovance.com]
Sent: Tuesday, September 16, 2014 7:43 PM
To: Channappa Negalur, M.
Cc: ceph-users@lists.ce
Thanks Dan. Is there any preferred filesystem for the leveldb
files? I understand that the filesystem should be of the same type on both the /var
and SSD partitions.
Should it be ext4, xfs, something else, or doesn't it matter?
On Tue, Sep 16, 2014 at 10:15 AM, Dan Van Der Ster <daniel.vanders...@c
Hi
I'm getting the error below while installing Ceph on the admin node. Please let
me know how to resolve it.
[ceph@ceph-admin ceph-cluster]$ ceph-deploy mon create-initial ceph-admin
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INF
Hi Kenneth,
This problem is much like your last reported problem. The fix hasn't been
backported to 0.85, so only the master branch is free of the existing bug.
On Tue, Sep 16, 2014 at 9:58 PM, Gregory Farnum wrote:
> Heh, you'll have to talk to Haomai about issues with the
> KeyValueStore, but I know he's found a
Yeah, so generally those will be correlated with some failure domain,
and if you spread your replicas across failure domains you won't hit
any issues. And if hosts are down for any length of time the OSDs will
re-replicate data to keep it at proper redundancy.
-Greg
Software Engineer #42 @ http://i
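As an aside, when hosts are only down briefly (planned maintenance, for example), the
re-replication described above can be held off with the standard cluster flags; a minimal sketch:

    # prevent down OSDs from being marked out, which is what triggers re-replication
    ceph osd set noout
    # ...perform the maintenance, bring the host back up...
    ceph osd unset noout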
Hi Mark/Alexandre,
Are the results with the journal and data configured on the same SSD?
Also, how are you configuring your journal device; is it a block device?
If the journal and data are not on the same device, the results may change.
BTW, there are SSDs, like the SanDisk Optimus drives, that use capacitor
On Tue, Sep 16, 2014 at 5:10 PM, JIten Shah wrote:
> Hi Guys,
>
> We have a cluster with 1000 OSD nodes and 5 MON nodes and 1 MDS node. In
> order to be able to lose quite a few OSDs and still survive the load, we
> were thinking of setting the replication factor to 50.
>
> Is that too big of a
Hi Guys,
We have a cluster with 1000 OSD nodes and 5 MON nodes and 1 MDS node. In order
to be able to lose quite a few OSDs and still survive the load, we were
thinking of setting the replication factor to 50.
Is that too big of a number? What are the performance implications, and any other
iss
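For context, the replication factor being discussed is a per-pool setting; a minimal sketch of
how it is usually inspected and changed (pool name "rbd" is only an example):

    # show the current replica count for a pool
    ceph osd pool get rbd size
    # raise it (50, as proposed above, means 50 copies of every object)
    ceph osd pool set rbd size 50
    # min_size controls how many replicas must be available for I/O to continue
    ceph osd pool set rbd min_size 2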
On 17/09/14 08:39, Alexandre DERUMIER wrote:
Hi,
I’m just surprised that you’re only getting 5299 with 0.85 since I’ve been able
to get 6.4K; well, I was using the 200GB model.
Your model is the DC S3700, mine is the DC S3500, with lower writes, so that
could explain the difference.
Interesting - I
I've been through your post many times (Google likes it ;).
I've been trying all the noout/nodown/noup flags.
But I will look into the XFS issue you are talking about, and read all of
the post one more time...
/C
On Wed, Sep 17, 2014 at 12:01 AM, Craig Lewis
wrote:
> I ran into a similar issue before
Thanks Craig. That’s exactly what I was looking for.
—Jiten
On Sep 16, 2014, at 2:42 PM, Craig Lewis wrote:
>
>
> On Fri, Sep 12, 2014 at 4:35 PM, JIten Shah wrote:
>
> 1. If we need to modify those numbers, do we need to update the values in
> ceph.conf and restart every OSD, or can we run
I ran into a similar issue before. I was having a lot of OSD crashes
caused by XFS memory allocation deadlocks. My OSDs crashed so many times
that they couldn't replay the OSD Map before they would be marked
unresponsive.
See if this sounds familiar:
http://lists.ceph.com/pipermail/ceph-users-ce
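One mitigation that has come up for this class of XFS allocation deadlock is giving the kernel a
larger reserve of free pages; a sketch only, with the value purely illustrative and to be tuned
per host:

    # enlarge the kernel's emergency free-memory pool (example value)
    sysctl -w vm.min_free_kbytes=262144
    # make it persistent across reboots
    echo "vm.min_free_kbytes = 262144" >> /etc/sysctl.conf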
On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz
wrote:
>
>
> XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
>
> All logs from before the disaster are still there, do you have any
> advice on what would be relevant?
>
>
This is a problem. It's not necessarily a deadlock.
I've got several OSDs that are spinning at 100%.
I've retained some professional services to have a look. It's out of my
newbie reach...
/Christopher
On Tue, Sep 16, 2014 at 11:23 PM, Craig Lewis
wrote:
> Is it using any CPU or Disk I/O during the 15 minutes?
>
> On Sun, Sep 14, 2014 at 11:34 AM
On Fri, Sep 12, 2014 at 4:35 PM, JIten Shah wrote:
>
> 1. If we need to modify those numbers, do we need to update the values in
> ceph.conf and restart every OSD, or can we run a command on the MON that will
> overwrite it?
>
That will work. You can also update the values without a restart using:
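The command itself is cut off above; for illustration only, runtime injection of OSD settings
generally looks like this (the options shown are just examples):

    # push a new value to every OSD without restarting them
    ceph tell osd.* injectargs '--osd_max_backfills 1'
    # or target a single daemon
    ceph tell osd.12 injectargs '--osd_recovery_max_active 1'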
Is it using any CPU or Disk I/O during the 15 minutes?
On Sun, Sep 14, 2014 at 11:34 AM, Christopher Thorjussen <
christopher.thorjus...@onlinebackupcompany.com> wrote:
> I'm waiting for my cluster to recover from a crashed disk and a second osd
> that has been taken out (crushmap, rm, stopped).
Hi,
>> I’m just surprised that you’re only getting 5299 with 0.85 since I’ve been
>> able to get 6.4K, well I was using the 200GB model
Your model is the DC S3700, mine is the DC S3500, with lower writes, so that
could explain the difference.
BTW, I'll be at the Ceph Days in Paris on Thursday, could be g
On Tue, Sep 16, 2014 at 6:15 PM, Joao Eduardo Luis
wrote:
> Forcing the monitor to compact on start and restarting the mon is the
> current workaround for overgrown ssts. This happens on a regular basis with
> some clusters and I've not been able to track down the source. It seems
> that leveldb
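For reference, the workaround Joao mentions is usually applied roughly like this (monitor id "a"
is just an example):

    # in ceph.conf, make monitors compact their leveldb store on startup
    [mon]
        mon compact on start = true

    # or trigger a one-off compaction on a running monitor
    ceph tell mon.a compact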
We ran a for-loop to tell all the OSDs to deep scrub (since * still
doesn't work) after the upgrade. The deep scrub this week that produced
these errors is the weekly scheduled one though. I shall go investigate
the mentioned thread...
On 16/09/2014 20:36, Gregory Farnum wrote:
> Ah, you're right
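The for-loop mentioned above was presumably something along these lines; a sketch, not the exact
command used:

    # ask every OSD to deep-scrub all of the PGs it holds
    for osd in $(ceph osd ls); do
        ceph osd deep-scrub $osd
    done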
No noise. I ran into the /var/local/osd0/journal issue myself. I will
add notes shortly.
On Fri, Apr 4, 2014 at 6:18 AM, Brian Candler wrote:
> On 04/04/2014 14:11, Alfredo Deza wrote:
>
> Have you set passwordless sudo on the remote host?
>
> No. Ah... I missed this bit:
>
> echo "ceph ALL = (r
Thanks for the poke; looks like something went wrong during the
release build last week. We're investigating now.
-Greg
On Tue, Sep 16, 2014 at 11:08 AM, Daniel Swarbrick
wrote:
> Hi,
>
> I saw that the development snapshot 0.85 was released last week, and
> have been patiently waiting for packag
Ah, you're right — it wasn't popping up in the same searches and I'd
forgotten that was so recent.
In that case, did you actually deep scrub *everything* in the cluster,
Marc? You'll need to run and fix every PG in the cluster, and the
background deep scrubbing doesn't move through the data very q
Hi Greg,
I believe Marc is referring to the corruption triggered by set_extsize on xfs.
That option was disabled by default in 0.80.4... See the thread "firefly scrub
error".
Cheers,
Dan
From: Gregory Farnum
Sent: Sep 16, 2014 8:15 PM
To: Marc
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-u
Hi,
I saw that the development snapshot 0.85 was released last week, and
have been patiently waiting for packages to appear, so that I can
upgrade a test cluster here.
Can we still expect packages (wheezy, in my case) of 0.85 to be published?
Thanks!
On Tue, Sep 16, 2014 at 12:03 AM, Marc wrote:
> Hello fellow cephalopods,
>
> every deep scrub seems to dig up inconsistencies (i.e. scrub errors)
> that we could use some help with diagnosing.
>
> I understand there used to be a data corruption issue before .80.3 so we
> made sure that all the no
Assuming you're using the kernel client?
In any case, Ceph generally doesn't do anything to select between
different NICs; it just asks for a connection to a given IP. So you
should just be able to set up a route for that IP.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Se
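A minimal sketch of the routing approach Greg describes, with the address and interface name
purely illustrative:

    # send traffic for the Ceph public network out of a specific NIC
    ip route add 10.0.1.0/24 dev eth1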
Hi Daniel,
I see the core dump now, thank you. http://tracker.ceph.com/issues/9490
Cheers
On 16/09/2014 18:39, Daniel Swarbrick wrote:
> Hi Loic,
>
> Thanks for providing a detailed example. I'm able to run the example
> that you provide, and also got my own live crushmap to produce some
> resu
Hi Daniel,
Can you provide your exact crush map and exact crushtool command
that results in segfaults?
Johnu
On 9/16/14, 10:23 AM, "Daniel Swarbrick"
wrote:
>Replying to myself, and for the benefit of other caffeine-starved people:
>
>Setting the last rule to "chooseleaf firstn 0" do
Replying to myself, and for the benefit of other caffeine-starved people:
Setting the last rule to "chooseleaf firstn 0" does not generate the
desired results, and ends up sometimes putting all replicas in the same
zone.
I'm slowly getting the hang of customised crushmaps ;-)
On 16/09/14 18:39,
Hi Loic,
Thanks for providing a detailed example. I'm able to run the example
that you provide, and also got my own live crushmap to produce some
results, when I appended the "--num-rep 3" option to the command.
Without that option, even your example is throwing segfaults - maybe a
bug in crushtoo
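For anyone following along, the invocation being discussed looks roughly like this (rule number
and replica count are examples):

    # compile an edited map, then check where a rule places replicas
    crushtool -c crushmap.txt -o crushmap.new
    crushtool -i crushmap.new --test --rule 1 --num-rep 3 --show-utilization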
Dear Karan and rest of the followers,
since I haven't received anything from Mellanox regarding this webinar,
I've decided to look for it myself.
You can find the webinar here:
http://www.mellanox.com/webinars/2014/inktank_ceph/
Best,
G.
On Mon, 14 Jul 2014 15:47:39 +0300, Karan Singh wro
On 09/16/2014 04:35 PM, Gregory Farnum wrote:
I don't really know; Joao has handled all these cases. I *think* they've
been tied to a few bad versions of LevelDB, but I'm not certain. (There
were a number of discussions about it on the public mailing lists.)
-Greg
On Tuesday, September 16, 2014,
I don't really know; Joao has handled all these cases. I *think* they've
been tied to a few bad versions of LevelDB, but I'm not certain. (There
were a number of discussions about it on the public mailing lists.)
-Greg
On Tuesday, September 16, 2014, Florian Haas wrote:
> Hi Greg,
>
> just picke
Hi,
On 16 Sep 2014, at 16:46, shiva rkreddy <shiva.rkre...@gmail.com> wrote:
2. Has any one used SSD devices for Monitors. If so, can you please share the
details ? Any specific changes to the configuration files?
We use SSDs on our monitors — a spinning disk was not fast enough for lev
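On the configuration question: the monitor store location is an ordinary ceph.conf option, so it
is enough to put that path on the SSD; a sketch with a hypothetical mount point:

    [mon]
        # default is /var/lib/ceph/mon/$cluster-$id; point it at (or mount) an SSD-backed path
        mon data = /ssd/mon/$cluster-$id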
Hi.
I'm new to Ceph and have been going through the setup phase. I was able to
set up a couple of proof-of-concept clusters. I have some general questions that I
thought the community would be able to clarify.
1. I've been using ceph-deploy for deployment. In a 3 Monitor and 3 OSD
configuration, one of the VM
Hi Greg,
just picked up this one from the archive while researching a different
issue and thought I'd follow up.
On Tue, Aug 19, 2014 at 6:24 PM, Gregory Farnum wrote:
> The sst files are files used by leveldb to store its data; you cannot
> remove them. Are you running on a very small VM? How m
Hi Daniel,
When I run
crushtool --outfn crushmap --build --num_osds 100 host straw 2 rack straw 10 default straw 0
crushtool -d crushmap -o crushmap.txt
cat >> crushmap.txt <
On 15/09/14 17:28, Sage Weil wrote:
>>
>> rule myrule {
>> ruleset 1
>> type replicated
>> min_size 1
>>
Did you follow this ceph.com/docs/master/rbd/rbd-openstack/ to configure your
env?
On 12 Sep 2014, at 14:38, m.channappa.nega...@accenture.com wrote:
> Hello Team,
>
> I have configured ceph as a multibackend for openstack.
>
> I have created 2 pools .
> 1. Volumes (replication size =3
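A heavily condensed sketch of the kind of cinder.conf multi-backend configuration involved;
the option names are the standard Cinder RBD driver ones, while the backend names and the
secret UUID are placeholders:

    [DEFAULT]
    enabled_backends = rbd-volumes,rbd-backup

    [rbd-volumes]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = <libvirt secret uuid>
    volume_backend_name = rbd-volumes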
http://tracker.ceph.com/issues/4137 contains links to all the tasks we
have so far. You can also search any of the ceph-devel list archives
for "forward scrub".
On Mon, Sep 15, 2014 at 10:16 PM, brandon li wrote:
> Great to know you are working on it!
>
> I am new to the mailing list. Is there a
Heh, you'll have to talk to Haomai about issues with the
KeyValueStore, but I know he's found a number of issues in the version
of it that went to 0.85.
In future please flag when you're running with experimental stuff; it
helps direct attention to the right places! ;)
-Greg
Software Engineer #42
Hi,
Thanks for keeping us updated on this subject.
dsync is definitely killing the SSD…
I don’t have much to add; I’m just surprised that you’re only getting 5299 with
0.85 since I’ve been able to get 6.4K. Well, I was using the 200GB model, that
might explain this.
On 12 Sep 2014, at 16:32, A
----- Message from Gregory Farnum -----
Date: Mon, 15 Sep 2014 10:37:07 -0700
From: Gregory Farnum
Subject: Re: [ceph-users] OSD troubles on FS+Tiering
To: Kenneth Waegeman
Cc: ceph-users
The pidfile bug is already fixed in master/giant branches.
As for the crashing, I
Hello,
We have a machine that mounts an RBD image as a block device, then rsyncs files
from another server to this mount.
As this rsync traffic will have to share bandwidth with the writes to the RBD, I
wonder if it is possible to specify which NIC to mount the RBD through?
We are using 0.85.5
Hi All,
I am trying to configure multi-site data replication. I am getting this error
continuously.
INFO:urllib3.connectionpool:Starting new HTTP connection (1): cephog1
ERROR:radosgw_agent.sync:finding number of shards failed
WARNING:radosgw_agent.sync:error preparing for sync, will retry. Trace
On 15/09/14 17:28, Sage Weil wrote:
>
> rule myrule {
> ruleset 1
> type replicated
> min_size 1
> max_size 10
> step take default
> step choose firstn 2 type rack
> step chooseleaf firstn 2 type host
> step emit
> }
>
> That will give you 4 osds, spr
Hello fellow cephalopods,
every deep scrub seems to dig up inconsistencies (i.e. scrub errors)
that we could use some help with diagnosing.
I understand there used to be a data corruption issue before .80.3 so we
made sure that all the nodes were upgraded to .80.5 and all the daemons
were restart