On 01/09/2015 04:31 PM, Jeff wrote:
We had a power failure last night and our five node cluster has
two nodes with mon's that fail to start. Here's what we see:
# /usr/bin/ceph-mon --cluster=ceph -i ceph2 -f
2015-01-09 11:28:45.579267 b6c10740 -1 ERROR: on disk data includes unsupported
features: compat={},rocompat={},incompat={6
It is not too difficult to get going, once you add various patches so it
works:
- missing __init__.py
- Allow setting ceph.conf
- Fix write issue: ioctx.write() does not return the written length
- Add param to async_update call (for swift in Juno)
There are a number of forks/pull requests etc. for these.
The first 3 stat commands show blocks and size changing, but not the times.
After a touch it changes and tail works.
I saw some cephfs freezes related to it; it came back after touching the files.
coreos2 logs # stat deis-router.log
File: 'deis-router.log'
Size: 148564  Blocks: 291  IO Blo
Hi,
I have a program that tails a file, and this file is created on another machine.
Some tail programs do not work because the modification time is not
updated on the remote machines.
I've found this old thread
http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/11001
it mentions the pr
Hi John,
For the last part, there being two different versions of packages in
Giant, I don't think that's the actual problem.
What's really happening there is that python-ceph has been obsoleted
by other packages that are getting picked up by Yum. See the line
that says "Package python-ceph is o
Ken,
I had a number of issues installing Ceph on RHEL 7, which I think are
mostly due to dependencies. I followed the quick start guide, which
gets the latest major release--e.g., Firefly, Giant.
ceph.conf is here: http://goo.gl/LNjFp3
ceph.log common errors included: http://goo.gl/yL8UsM
To res
Hi,
I had a similar effect two weeks ago - 1 PG backfill_toofull, and although
reweighting and deleting freed enough space, the rebuild process stopped
after a while.
After stopping and starting ceph on the second node, the rebuild process ran
without trouble and the backfill_toofull was gone.
Thi
In this case the root cause was half denied reservations.
http://tracker.ceph.com/issues/9626
This stopped backfills, since those listed as backfilling were
actually half denied and doing nothing. The toofull status is not
checked until a free backfill slot opens up, so everything was just
Hi,
We've set up ceph and openstack on a fairly peculiar network
configuration (or at least I think it is) and I'm looking for
information on how to make it work properly.
Basically, we have 3 networks, a management network, a storage network
and a cluster network. The management network is o
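In case it is useful, the usual way to map that onto ceph is the
public/cluster network split in ceph.conf; a minimal sketch, with the
subnets below being placeholders for your storage and cluster networks:

    [global]
        # client <-> OSD traffic (your "storage" network)
        public network = 10.0.1.0/24
        # OSD replication and recovery traffic (your "cluster" network)
        cluster network = 10.0.2.0/24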
On Fri, Jan 9, 2015 at 2:00 AM, Nico Schottelius
wrote:
> Lionel, Christian,
>
> we do have exactly the same trouble as Christian,
> namely
>
> Christian Eichelmann [Fri, Jan 09, 2015 at 10:43:20AM +0100]:
>> We still don't know what caused this specific error...
>
> and
>
>> ...there is currently
On Fri, Jan 9, 2015 at 3:00 AM, Nico Schottelius
wrote:
> Even though I do not like the fact that we lost a pg for
> an unknown reason, I would prefer ceph to handle that case to recover to
> the best possible situation.
>
> Namely I wonder if we can integrate a tool that shows
> which (parts of)
What was the osd_backfill_full_ratio? That's the config that controls
backfill_toofull. By default, it's 85%. The mon_osd_*_ratio settings affect
the ceph status.
I've noticed that it takes a while for backfilling to restart after
changing osd_backfill_full_ratio. Backfilling usually restarts for me in
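For reference, a sketch of how to inspect and raise it at runtime (the OSD
id and the 0.90 value are just examples):

    # check the current value on one OSD via its admin socket
    ceph daemon osd.0 config get osd_backfill_full_ratio
    # raise it temporarily on all OSDs, e.g. to 90%
    ceph tell osd.* injectargs '--osd_backfill_full_ratio 0.90'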
On Thu, Jan 8, 2015 at 5:46 AM, Zeeshan Ali Shah wrote:
> I just finished configuring ceph up to 100 TB with openstack ... Since we
> are also using Lustre in our HPC machines , just wondering what is the
> bottle neck in ceph going on Peta Scale like Lustre .
>
> any idea ? or someone tried it
I
On Fri, Jan 9, 2015 at 1:24 AM, Christian Eichelmann
wrote:
> Hi all,
>
> as mentioned last year, our ceph cluster is still broken and unusable.
> We are still investigating what has happened and I am taking deeper
> looks into the output of ceph pg query.
>
> The problem is that I can find so
100GB objects (or ~40 on a hard drive!) are way too large for you to
get an effective random distribution.
-Greg
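To put a rough number on the "~40 on a hard drive" point: with pseudo-random
placement the per-OSD spread scales roughly with 1/sqrt(objects per OSD), so
as a back-of-the-envelope sketch:

    # ~40 objects per OSD -> expected per-OSD usage spread of roughly 16%
    awk 'BEGIN { n = 40; printf "imbalance ~ %.0f%%\n", 100 / sqrt(n) }'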
On Thu, Jan 8, 2015 at 5:25 PM, Mark Nelson wrote:
> On 01/08/2015 03:35 PM, Michael J Brewer wrote:
>>
>> Hi all,
>>
>> I'm working on filling a cluster to near capacity for testing p
Have you looked at
http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/
http://ceph.com/docs/master/rados/operations/pg-states/
http://ceph.com/docs/master/rados/operations/pg-concepts/
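A few of the commands those pages walk through, in case it saves some
digging (the pg id below is just an example):

    ceph health detail
    ceph pg dump_stuck unclean
    ceph pg 2.5 query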
On Fri, Jan 9, 2015 at 1:24 AM, Christian Eichelmann
wrote:
> Hi all,
>
> as mentioned last year, o
Hi Nico,
I would probably recommend upgrading to 0.87 (giant). I have been running
this version for some time now and it works very well. I also upgraded from
firefly and it was easy.
The issue you are experiencing seems quite complex and it would require
debug logs to troubleshoot.
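If you do want to gather them, a sketch of turning logging up at runtime
(the levels shown are just common choices):

    # raise OSD and messenger debug levels cluster-wide without a restart
    ceph tell osd.* injectargs '--debug_osd 20 --debug_ms 1'
    # or target a single OSD, e.g. osd.12
    ceph tell osd.12 injectargs '--debug_osd 20 --debug_ms 1'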
Apology that
Although it seems like having a regularly scheduled cron job do a
recursive directory listing may be ok for us as a bit of a
workaround... I am still in the process of trying to improve performance.
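(For what it's worth, that workaround can be as small as a single crontab
line; the mount point and interval below are placeholders.)

    # walk the cephfs tree every 10 minutes; /mnt/cephfs is just an example path
    */10 * * * * ls -lR /mnt/cephfs > /dev/null 2>&1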
A few other questions have come up as a result.
a) I am in the process of looking at specs
It doesn't seem like the problem here, but I've noticed that slow OSDs have
a large fan-out. I have less than 100 OSDs, so every OSD talks to every
other OSD in my cluster.
I was getting slow notices from all of my OSDs. Nothing jumped out, so I
started looking at disk write latency graphs. I no
I didn't actually calculate the per-OSD object density but yes, I agree
that will hurt.
On 01/09/2015 12:09 PM, Gregory Farnum wrote:
100GB objects (or ~40 on a hard drive!) are way too large for you to
get an effective random distribution.
-Greg
On Thu, Jan 8, 2015 at 5:25 PM, Mark Nelson wr
We had a power failure last night and our five node cluster has
two nodes with mon's that fail to start. Here's what we see:
# /usr/bin/ceph-mon --cluster=ceph -i ceph2 -f
2015-01-09 11:28:45.579267 b6c10740 -1 ERROR: on disk data includes unsupported
features: compat={},rocompat={},incompat={6
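One thing worth checking with this error is whether the ceph packages still
match across nodes, since the mon store may have been written by a newer
binary than the one now starting; a quick check might be:

    # run on each node and compare versions
    ceph-mon --version
    ceph --version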
On Fri, Jan 9, 2015 at 7:17 AM, Robert LeBlanc wrote:
> Protect against bit rot. Checked on read and on deep scrub.
There are still issues (at least in firefly) with FDCache and scrub
completion having corrupted on-disk data, so thorough checksumming
will not cover every possible corruption cas
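For completeness, deep scrubs can also be kicked off by hand (the ids below
are placeholders):

    # deep-scrub a single placement group
    ceph pg deep-scrub 2.1f
    # or everything on one OSD
    ceph osd deep-scrub 3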
Hi all,
as mentioned last year, our ceph cluster is still broken and unusable.
We are still investigating what has happened and I am taking deeper
looks into the output of ceph pg query.
The problem is that I can find some information about what some of the
sections mean, but mostly I can on
On Thu, 8 Jan 2015 21:17:12 -0700 Robert LeBlanc wrote:
> On Thu, Jan 8, 2015 at 8:31 PM, Christian Balzer wrote:
> > On Thu, 8 Jan 2015 11:41:37 -0700 Robert LeBlanc wrote:
> > Which of course currently means a strongly consistent lockup in these
> > scenarios. ^o^
>
> That is one way of puttin
On Thu, Jan 8, 2015 at 8:31 PM, Christian Balzer wrote:
> On Thu, 8 Jan 2015 11:41:37 -0700 Robert LeBlanc wrote:
> Which of course currently means a strongly consistent lockup in these
> scenarios. ^o^
That is one way of putting it
> Slightly off-topic and snarky, that strong consistency is of
On Thu, 8 Jan 2015 01:35:03 + Garg, Pankaj wrote:
> Hi,
> I am trying to get a very minimal Ceph cluster up and running (on ARM)
> and I'm wondering what is the smallest unit that I can run rados-bench
> on ? Documentation at
> (http://ceph.com/docs/next/start/quick-ceph-deploy/) seems to refe
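In case it helps, rados bench runs against a pool, so the smallest setup is
a single pool on however many OSDs you have; a sketch (pool name and runtime
are arbitrary):

    # 60-second write test; keep the objects so they can be read back
    rados bench -p testpool 60 write --no-cleanup
    # sequential read test over those objects, then clean up
    rados bench -p testpool 60 seq
    rados -p testpool cleanup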
Good morning Jiri,
sure, let me catch up on this:
- Kernel 3.16
- ceph: 0.80.7
- fs: xfs
- os: debian (backports) (1x)/ubuntu (2x)
Cheers,
Nico
Jiri Kanicky [Fri, Jan 09, 2015 at 10:44:33AM +1100]:
> Hi Nico.
>
> If you are experiencing such issues it would be good if you provide more info
>
Lionel, Christian,
we do have exactly the same trouble as Christian,
namely
Christian Eichelmann [Fri, Jan 09, 2015 at 10:43:20AM +0100]:
> We still don't know what caused this specific error...
and
> ...there is currently no way to make ceph forget about the data of this pg
> and create it as
I applied the patch from http://tracker.ceph.com/issues/8452,
ran the s3 test suite, and there is still an error;
err log: ERROR: failed to get obj attrs,
obj=test-client.0-31zepqoawd8dxfa-212:_multipart_mymultipart.2/0IQGoJ7hG8ZtTyfAnglChBO79HUsjeC.meta
ret=-2
I found code that may have a problem:
when function exec "re
Very very good :)
Fri, 9 Jan 2015, 2:17, William Bloom (wibloom):
> Awesome, thanks Michael.
>
> Regards
>
> William
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Michael J. Kidd
> Sent: Wednesday, January 07, 2015 2:09 PM
> To: ceph-us...@ceph.c
Hi Lionel,
we have a ceph cluster with about 1 PB in total, 12 OSDs with 60 disks,
divided into 4 racks in 2 rooms, all connected with a dedicated 10G
cluster network. Of course with a replication level of 3.
We did about 9 months of intensive testing. Just like you, we never
experienced that kind
Hi Italo,
If you check for a post from me from a couple of days back, I have done exactly
this.
I created a k=5 m=3 over 4 hosts. This ensured that I could lose a whole host
and then an OSD on another host and the cluster was still fully operational.
I’m not sure if my method I used i
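Not necessarily how it was done above, but the general shape of such a setup
looks like the sketch below (profile and pool names and PG counts are
placeholders); note that with only 4 hosts an 8-chunk (k+m) profile needs
either an osd-level failure domain or a custom CRUSH rule that places two
chunks per host:

    # create a k=5, m=3 erasure-code profile (failure domain osd, not host)
    ceph osd erasure-code-profile set ec53 k=5 m=3 ruleset-failure-domain=osd
    # create a pool that uses it
    ceph osd pool create ecpool 256 256 erasure ec53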