Hi,
On 17 Sep 2014, at 06:11, shiva rkreddy <shiva.rkre...@gmail.com> wrote:
Thanks Dan. Is there any preferred filesystem for the leveldb files?
I understand that the filesystem should be of the same type on both the /var and SSD
partitions.
Should it be ext4, xfs, something else or
Hello Sebastien,
Thanks for your reply. I fixed the error. It was a configuration mistake on my
end.
Regards,
Malleshi CN
-----Original Message-----
From: Sebastien Han [mailto:sebastien@enovance.com]
Sent: Tuesday, September 16, 2014 7:43 PM
To: Channappa Negalur, M.
Cc: ceph-users@lists.ce
Thanks Dan. Is there any preferred filesystem for the leveldb
files? I understand that the filesystem should be of the same type on both the /var
and SSD partitions.
Should it be ext4, xfs, something else, or doesn't it matter?
On Tue, Sep 16, 2014 at 10:15 AM, Dan Van Der Ster <daniel.vanders...@c
Hi
I'm getting the error below while installing Ceph on the admin node. Please let
me know how to resolve it.
[ceph@ceph-admin ceph-cluster]$ ceph-deploy mon create-initial ceph-admin
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INF
Hi Kenneth,
This problem is much like your last reported problem. The fix hasn't been
backported to 0.85, so only the master branch is free of the existing bug.
On Tue, Sep 16, 2014 at 9:58 PM, Gregory Farnum wrote:
> Heh, you'll have to talk to Haomai about issues with the
> KeyValueStore, but I know he's found a
Yeah, so generally those will be correlated with some failure domain,
and if you spread your replicas across failure domains you won't hit
any issues. And if hosts are down for any length of time the OSDs will
re-replicate data to keep it at proper redundancy.
-Greg
Software Engineer #42 @ http://i
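As an aside, when hosts are only down briefly (planned maintenance, for example), the
re-replication described above can be held off with the standard cluster flags; a minimal sketch:

    # prevent down OSDs from being marked out, which is what triggers re-replication
    ceph osd set noout
    # ...perform the maintenance, bring the host back up...
    ceph osd unset noout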
Hi Mark/Alexandre,
Are the results with the journal and data configured on the same SSD?
Also, how are you configuring your journal device; is it a block device?
If the journal and data are not on the same device, the results may change.
BTW, there are SSDs, like the SanDisk Optimus drives, that use capacitor
On Tue, Sep 16, 2014 at 5:10 PM, JIten Shah wrote:
> Hi Guys,
>
> We have a cluster with 1000 OSD nodes and 5 MON nodes and 1 MDS node. In
> order to be able to lose quite a few OSDs and still survive the load, we
> were thinking of setting the replication factor to 50.
>
> Is that too big of a
Hi Guys,
We have a cluster with 1000 OSD nodes and 5 MON nodes and 1 MDS node. In order
to be able to lose quite a few OSDs and still survive the load, we were
thinking of setting the replication factor to 50.
Is that too big of a number? What are the performance implications, and any other
iss
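For context, the replication factor being discussed is a per-pool setting; a minimal sketch of
how it is usually inspected and changed (pool name "rbd" is only an example):

    # show the current replica count for a pool
    ceph osd pool get rbd size
    # raise it (50, as proposed above, means 50 copies of every object)
    ceph osd pool set rbd size 50
    # min_size controls how many replicas must be available for I/O to continue
    ceph osd pool set rbd min_size 2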
On 17/09/14 08:39, Alexandre DERUMIER wrote:
Hi,
I’m just surprised that you’re only getting 5299 with 0.85 since I’ve been able
to get 6.4K; well, I was using the 200GB model.
Your model is the DC S3700, mine is the DC S3500, with lower writes, so that
could explain the difference.
Interesting - I
I've been through your post many times (Google likes it ;).
I've been trying all the noout/nodown/noup flags.
But I will look into the XFS issue you are talking about, and read all of
the post one more time...
/C
On Wed, Sep 17, 2014 at 12:01 AM, Craig Lewis
wrote:
> I ran into a similar issue before
Thanks Craig. That’s exactly what I was looking for.
—Jiten
On Sep 16, 2014, at 2:42 PM, Craig Lewis wrote:
>
>
> On Fri, Sep 12, 2014 at 4:35 PM, JIten Shah wrote:
>
> 1. If we need to modify those numbers, do we need to update the values in
> ceph.conf and restart every OSD, or can we run
I ran into a similar issue before. I was having a lot of OSD crashes
caused by XFS memory allocation deadlocks. My OSDs crashed so many times
that they couldn't replay the OSD Map before they would be marked
unresponsive.
See if this sounds familiar:
http://lists.ceph.com/pipermail/ceph-users-ce
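One mitigation that has come up for this class of XFS allocation deadlock is giving the kernel a
larger reserve of free pages; a sketch only, with the value purely illustrative and to be tuned
per host:

    # enlarge the kernel's emergency free-memory pool (example value)
    sysctl -w vm.min_free_kbytes=262144
    # make it persistent across reboots
    echo "vm.min_free_kbytes = 262144" >> /etc/sysctl.conf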
On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz
wrote:
>
>
> XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
>
> All logs from before the disaster are still there, do you have any
> advice on what would be relevant?
>
>
This is a problem. It's not necessarily a deadlock.
I've got several OSDs that are spinning at 100%.
I've retained some professional services to have a look. It's out of my
newbie reach...
/Christopher
On Tue, Sep 16, 2014 at 11:23 PM, Craig Lewis
wrote:
> Is it using any CPU or Disk I/O during the 15 minutes?
>
> On Sun, Sep 14, 2014 at 11:34 AM
On Fri, Sep 12, 2014 at 4:35 PM, JIten Shah wrote:
>
> 1. If we need to modify those numbers, do we need to update the values in
> ceph.conf and restart every OSD, or can we run a command on the MON that will
> overwrite it?
>
That will work. You can also update the values without a restart using:
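The command itself is cut off above; for illustration only, runtime injection of OSD settings
generally looks like this (the options shown are just examples):

    # push a new value to every OSD without restarting them
    ceph tell osd.* injectargs '--osd_max_backfills 1'
    # or target a single daemon
    ceph tell osd.12 injectargs '--osd_recovery_max_active 1'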
Is it using any CPU or Disk I/O during the 15 minutes?
On Sun, Sep 14, 2014 at 11:34 AM, Christopher Thorjussen <
christopher.thorjus...@onlinebackupcompany.com> wrote:
> I'm waiting for my cluster to recover from a crashed disk and a second osd
> that has been taken out (crushmap, rm, stopped).
Hi,
>> I’m just surprised that you’re only getting 5299 with 0.85 since I’ve been
>> able to get 6.4K, well I was using the 200GB model
Your model is the DC S3700, mine is the DC S3500, with lower writes, so that
could explain the difference.
BTW, I'll be at the Ceph Days in Paris on Thursday, could be g
On Tue, Sep 16, 2014 at 6:15 PM, Joao Eduardo Luis
wrote:
> Forcing the monitor to compact on start and restarting the mon is the
> current workaround for overgrown ssts. This happens on a regular basis with
> some clusters and I've not been able to track down the source. It seems
> that leveldb
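For reference, the workaround Joao mentions is usually applied roughly like this (monitor id "a"
is just an example):

    # in ceph.conf, make monitors compact their leveldb store on startup
    [mon]
        mon compact on start = true

    # or trigger a one-off compaction on a running monitor
    ceph tell mon.a compact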
We ran a for-loop to tell all the OSDs to deep scrub (since * still
doesn't work) after the upgrade. The deep scrub this week that produced
these errors is the weekly scheduled one though. I shall go investigate
the mentioned thread...
On 16/09/2014 20:36, Gregory Farnum wrote:
> Ah, you're right
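The for-loop mentioned above was presumably something along these lines; a sketch, not the exact
command used:

    # ask every OSD to deep-scrub all of the PGs it holds
    for osd in $(ceph osd ls); do
        ceph osd deep-scrub $osd
    done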
No noise. I ran into the /var/local/osd0/journal issue myself. I will
add notes shortly.
On Fri, Apr 4, 2014 at 6:18 AM, Brian Candler wrote:
> On 04/04/2014 14:11, Alfredo Deza wrote:
>
> Have you set passwordless sudo on the remote host?
>
> No. Ah... I missed this bit:
>
> echo "ceph ALL = (r
Thanks for the poke; looks like something went wrong during the
release build last week. We're investigating now.
-Greg
On Tue, Sep 16, 2014 at 11:08 AM, Daniel Swarbrick
wrote:
> Hi,
>
> I saw that the development snapshot 0.85 was released last week, and
> have been patiently waiting for packag
Ah, you're right — it wasn't popping up in the same searches and I'd
forgotten that was so recent.
In that case, did you actually deep scrub *everything* in the cluster,
Marc? You'll need to run and fix every PG in the cluster, and the
background deep scrubbing doesn't move through the data very q
Hi Greg,
I believe Marc is referring to the corruption triggered by set_extsize on xfs.
That option was disabled by default in 0.80.4... See the thread "firefly scrub
error".
Cheers,
Dan
From: Gregory Farnum
Sent: Sep 16, 2014 8:15 PM
To: Marc
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-u
Hi,
I saw that the development snapshot 0.85 was released last week, and
have been patiently waiting for packages to appear, so that I can
upgrade a test cluster here.
Can we still expect packages (wheezy, in my case) of 0.85 to be published?
Thanks!
On Tue, Sep 16, 2014 at 12:03 AM, Marc wrote:
> Hello fellow cephalopods,
>
> every deep scrub seems to dig up inconsistencies (i.e. scrub errors)
> that we could use some help with diagnosing.
>
> I understand there used to be a data corruption issue before .80.3 so we
> made sure that all the no
Assuming you're using the kernel client?
In any case, Ceph generally doesn't do anything to select between
different NICs; it just asks for a connection to a given IP. So you
should just be able to set up a route for that IP.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Se
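A minimal sketch of the routing approach Greg describes, with the address and interface name
purely illustrative:

    # send traffic for the Ceph public network out of a specific NIC
    ip route add 10.0.1.0/24 dev eth1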
Hi Daniel,
I see the core dump now, thank you. http://tracker.ceph.com/issues/9490
Cheers
On 16/09/2014 18:39, Daniel Swarbrick wrote:
> Hi Loic,
>
> Thanks for providing a detailed example. I'm able to run the example
> that you provide, and also got my own live crushmap to produce some
> resu
Hi Daniel,
Can you provide your exact crush map and exact crushtool command
that results in segfaults?
Johnu
On 9/16/14, 10:23 AM, "Daniel Swarbrick"
wrote:
>Replying to myself, and for the benefit of other caffeine-starved people:
>
>Setting the last rule to "chooseleaf firstn 0" do
Replying to myself, and for the benefit of other caffeine-starved people:
Setting the last rule to "chooseleaf firstn 0" does not generate the
desired results, and ends up sometimes putting all replicas in the same
zone.
I'm slowly getting the hang of customised crushmaps ;-)
On 16/09/14 18:39,
Hi Loic,
Thanks for providing a detailed example. I'm able to run the example
that you provide, and also got my own live crushmap to produce some
results, when I appended the "--num-rep 3" option to the command.
Without that option, even your example is throwing segfaults - maybe a
bug in crushtoo
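For anyone following along, the invocation being discussed looks roughly like this (rule number
and replica count are examples):

    # compile an edited map, then check where a rule places replicas
    crushtool -c crushmap.txt -o crushmap.new
    crushtool -i crushmap.new --test --rule 1 --num-rep 3 --show-utilization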
Dear Karan and rest of the followers,
since I haven't received anything from Mellanox regarding this webinar,
I've decided to look for it myself.
You can find the webinar here:
http://www.mellanox.com/webinars/2014/inktank_ceph/
Best,
G.
On Mon, 14 Jul 2014 15:47:39 +0300, Karan Singh wro
On 09/16/2014 04:35 PM, Gregory Farnum wrote:
I don't really know; Joao has handled all these cases. I *think* they've
been tied to a few bad versions of LevelDB, but I'm not certain. (There
were a number of discussions about it on the public mailing lists.)
-Greg
On Tuesday, September 16, 2014,
I don't really know; Joao has handled all these cases. I *think* they've
been tied to a few bad versions of LevelDB, but I'm not certain. (There
were a number of discussions about it on the public mailing lists.)
-Greg
On Tuesday, September 16, 2014, Florian Haas wrote:
> Hi Greg,
>
> just picke
Hi,
On 16 Sep 2014, at 16:46, shiva rkreddy <shiva.rkre...@gmail.com> wrote:
2. Has any one used SSD devices for Monitors. If so, can you please share the
details ? Any specific changes to the configuration files?
We use SSDs on our monitors — a spinning disk was not fast enough for lev
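On the configuration question: the monitor store location is an ordinary ceph.conf option, so it
is enough to put that path on the SSD; a sketch with a hypothetical mount point:

    [mon]
        # default is /var/lib/ceph/mon/$cluster-$id; point it at (or mount) an SSD-backed path
        mon data = /ssd/mon/$cluster-$id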
Hi.
I'm new to Ceph and have been going through the setup phase. I was able to
set up a couple of proof-of-concept clusters. I have some general questions that I
thought the community would be able to clarify.
1. I've been using ceph-deploy for deployment. In a 3 Monitor and 3 OSD
configuration, one of the VM
Hi Greg,
just picked up this one from the archive while researching a different
issue and thought I'd follow up.
On Tue, Aug 19, 2014 at 6:24 PM, Gregory Farnum wrote:
> The sst files are files used by leveldb to store its data; you cannot
> remove them. Are you running on a very small VM? How m
Hi Daniel,
When I run
crushtool --outfn crushmap --build --num_osds 100 host straw 2 rack straw 10 default straw 0
crushtool -d crushmap -o crushmap.txt
cat >> crushmap.txt <
On 15/09/14 17:28, Sage Weil wrote:
>>
>> rule myrule {
>> ruleset 1
>> type replicated
>> min_size 1
>>
Did you follow this ceph.com/docs/master/rbd/rbd-openstack/ to configure your
env?
On 12 Sep 2014, at 14:38, m.channappa.nega...@accenture.com wrote:
> Hello Team,
>
> I have configured ceph as a multibackend for openstack.
>
> I have created 2 pools .
> 1. Volumes (replication size =3
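A heavily condensed sketch of the kind of cinder.conf multi-backend configuration involved;
the option names are the standard Cinder RBD driver ones, while the backend names and the
secret UUID are placeholders:

    [DEFAULT]
    enabled_backends = rbd-volumes,rbd-backup

    [rbd-volumes]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = <libvirt secret uuid>
    volume_backend_name = rbd-volumes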
http://tracker.ceph.com/issues/4137 contains links to all the tasks we
have so far. You can also search any of the ceph-devel list archives
for "forward scrub".
On Mon, Sep 15, 2014 at 10:16 PM, brandon li wrote:
> Great to know you are working on it!
>
> I am new to the mailing list. Is there a
Heh, you'll have to talk to Haomai about issues with the
KeyValueStore, but I know he's found a number of issues in the version
of it that went to 0.85.
In future please flag when you're running with experimental stuff; it
helps direct attention to the right places! ;)
-Greg
Software Engineer #42
Hi,
Thanks for keeping us updated on this subject.
dsync is definitely killing the SSD…
I don’t have much to add; I’m just surprised that you’re only getting 5299 with
0.85 since I’ve been able to get 6.4K. Well, I was using the 200GB model, that
might explain this.
On 12 Sep 2014, at 16:32, A
----- Message from Gregory Farnum -----
Date: Mon, 15 Sep 2014 10:37:07 -0700
From: Gregory Farnum
Subject: Re: [ceph-users] OSD troubles on FS+Tiering
To: Kenneth Waegeman
Cc: ceph-users
The pidfile bug is already fixed in master/giant branches.
As for the crashing, I
Hello,
We have a machine that mounts an RBD image as a block device, then rsyncs files
from another server to this mount.
As this rsync traffic will have to share bandwidth with the writes to the RBD, I
wonder if it is possible to specify which NIC to mount the RBD through?
We are using 0.85.5
Hi All,
I am trying to configure multi-site data replication. I am getting this error
continuously.
INFO:urllib3.connectionpool:Starting new HTTP connection (1): cephog1
ERROR:radosgw_agent.sync:finding number of shards failed
WARNING:radosgw_agent.sync:error preparing for sync, will retry. Trace
On 15/09/14 17:28, Sage Weil wrote:
>
> rule myrule {
> ruleset 1
> type replicated
> min_size 1
> max_size 10
> step take default
> step choose firstn 2 type rack
> step chooseleaf firstn 2 type host
> step emit
> }
>
> That will give you 4 osds, spr
Hello fellow cephalopods,
every deep scrub seems to dig up inconsistencies (i.e. scrub errors)
that we could use some help with diagnosing.
I understand there used to be a data corruption issue before .80.3 so we
made sure that all the nodes were upgraded to .80.5 and all the daemons
were restart