(Adding devel list to the CC)
Hi Eric,
To add more context to the problem:
min_size was set to 1 and the replication size is 2.
There was a flaky power connection to one of the enclosures. With min_size 1,
we were able to continue the I/O, and recovery kicked in once the power came
back. But i
On 07/23/2015 06:31 AM, Jan Schermer wrote:
Hi all,
I have been looking for a way to alleviate the overhead of RBD snapshots/clones
for some time.
In our scenario there are a few “master” volumes that contain production data,
and are frequently snapshotted and cloned for dev/qa use. Those
snapshots/
Sorry for the broken post previously. I have looked into this more and
it looks like ceph-deploy is not seeing that it is a partition and is
attempting to create an additional partition in the journal's place. I
read in the documentation that if I set osd journal size = 0, it
will assume that
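For reference, the setting under discussion is just a ceph.conf option; a minimal sketch only (the journal path here is hypothetical, and whether size 0 is appropriate depends on the journal really being a whole partition or device):

[osd]
osd journal size = 0
# osd journal = /dev/sdb1   (hypothetical journal partition)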
Hey cephers,
Since Ceph Days for both Chicago and Raleigh are fast approaching, I
wanted to put another call out on the mailing lists for anyone who
might be interested in sharing their Ceph experiences with the
community at either location. If you have something to share
(integration, use case, p
On 07/23/2015 06:47 PM, Ilya Dryomov wrote:
>
> To me this looks like a writev() interrupted by a SIGALRM. I think
> nginx guys read your original email the same way I did, which is "write
> syscall *returned* ERESTARTSYS", but I'm pretty sure that is not the
> case here.
>
> ERESTARTSYS shows u
Greetings,
I am working on standing up a fresh Ceph object storage cluster and have some
questions about what I should be seeing as far as inter-OSD connectivity. I
have spun up my monitor and radosgw nodes as VMs, all running on a
192.168.10.0/24 network (all IP ranges have been changed to pr
Ah, I made the same mistake... Sorry for the noise.
--
Eino Tuominen
> Ilya Dryomov wrote on 23.7.2015 at 19.47:
>
>> On Thu, Jul 23, 2015 at 6:28 PM, Vedran Furač wrote:
>>> On 07/23/2015 05:25 PM, Ilya Dryomov wrote:
>>>> On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač wrote:
On Thu, Jul 23, 2015 at 6:28 PM, Vedran Furač wrote:
> On 07/23/2015 05:25 PM, Ilya Dryomov wrote:
>> On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač wrote:
>>> 4118 writev(377, [{"\5\356\307l\361"..., 4096}, {"\337\261\17<\257"...,
>>> 4096}, {"\211&;s\310"..., 4096}, {"\370N\372:\252"..., 4096},
Hi, guys.
These days we are testing ceph cache tiering, and it seems that the cache
tiering agent does not honor the quota setting on the cache
pool, which means that if we have set a smaller quota size on the cache pool
than "target_max_bytes * cache_target_dirty_ratio" or so,
the cache tieri
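For context, the thresholds the tiering agent does act on are the per-pool tiering settings rather than a pool quota; a sketch of those settings, with the pool name and values as placeholders only:

# ceph osd pool set cachepool target_max_bytes 107374182400
# ceph osd pool set cachepool cache_target_dirty_ratio 0.4
# ceph osd pool set cachepool cache_target_full_ratio 0.8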
Hi Greg,
I've been looking at the tcmalloc issues, but those did seem to affect OSDs, and
I do notice it in heavy read workloads (even after the patch and
increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728). This is
affecting the mon process though.
looking at perf top I'm getting most of the
Oh, if you were running dev releases, it's not super surprising that the stat
tracking was at some point buggy.
-Sam
- Original Message -
From: "Dan van der Ster"
To: "Samuel Just"
Cc: ceph-users@lists.ceph.com
Sent: Thursday, July 23, 2015 8:21:07 AM
Subject: Re: [ceph-users] PGs going
Hi,
That looks like a bug; ERESTARTSYS is not a valid error condition for write().
http://pubs.opengroup.org/onlinepubs/9699919799/
--
Eino Tuominen
> Vedran Furač wrote on 23.7.2015 at 15.18:
>
> Hello,
>
> I'm having an issue with nginx writing to cephfs. Often I'm getting:
>
> wr
On 07/23/2015 05:25 PM, Ilya Dryomov wrote:
> On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač wrote:
>> 4118 writev(377, [{"\5\356\307l\361"..., 4096}, {"\337\261\17<\257"...,
>> 4096}, {"\211&;s\310"..., 4096}, {"\370N\372:\252"..., 4096},
>> {"\202\311/\347\260"..., 4096}, ...], 33) = ? ERESTARTS
On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač wrote:
> 4118 writev(377, [{"\5\356\307l\361"..., 4096}, {"\337\261\17<\257"...,
> 4096}, {"\211&;s\310"..., 4096}, {"\370N\372:\252"..., 4096},
> {"\202\311/\347\260"..., 4096}, ...], 33) = ? ERESTARTSYS (To be restarted)
> 4118 --- SIGALRM (Alarm c
Those pools were a few things: rgw.buckets plus a couple pools we use
for developing new librados clients. But the source of this issue is
likely related to the few pre-hammer development releases (and
crashes) we upgraded through whilst running a large scale test.
Anyway, now I'll know how to bett
From: Aaron
Sent: Jul 23, 2015 6:39 AM
To: dan.m...@inktank.com
Subject: Ceph problem
hello,
I am a user of Ceph, from China.
I have two problems with Ceph and need your help.
>>> import boto
>>> import boto.s3.connection
>>> access_key = '2EOCDA99UCZQFA1CQRCM'
>>> secret_key =
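For what it's worth, a working boto 2 connection against radosgw usually looks something like the sketch below; the host and credentials here are placeholders, not the reporter's actual values:

import boto
import boto.s3.connection

access_key = 'PLACEHOLDER_ACCESS_KEY'   # placeholder
secret_key = 'PLACEHOLDER_SECRET_KEY'   # placeholder
conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host='gateway.example.com',          # hypothetical radosgw host
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
# Listing buckets is a quick check that the credentials and endpoint work
for bucket in conn.get_all_buckets():
    print(bucket.name)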
You may want to check your min_size value for your pools. If it is
set to the pool size value, then the cluster will not do I/O if you
lose a chassis.
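A quick way to check, and lower it if appropriate (the pool name here is just a placeholder):

# ceph osd pool get rbd size
# ceph osd pool get rbd min_size
# ceph osd pool set rbd min_size 1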
On Sun, Jul 5, 2015 at 11:04 PM, Mallikarjun Biradar
wrote:
> Hi all,
>
> Setup details:
> Two storage enclosures each connected to 4 OSD nodes
On 07/23/2015 04:45 PM, Ilya Dryomov wrote:
>
> Can you provide the full strace output?
This is pretty much the all the relevant part:
4118 open("/home/ceph/temp/45/45/5/154545", O_RDWR|O_CREAT|O_EXCL,
0600) = 377
4118 writev(377, [{"\3\0\0\0\0"..., 4096}, {"\247\0\0\3\23"..., 4096},
{"\22
I did some (non-ceph) work on these, and concluded that bcache was the best
supported, most stable, and fastest. This was ~1 year ago, so take it with a
grain of salt, but that's what I would recommend.
Daniel
- Original Message -
From: "Dominik Zalewski"
To: "German Anders"
Cc:
correct.
Best Regards,
Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com || http://community.redhat.com
@scuttlemonkey || @ceph
On Tue, Jul 21, 2015 at 6:03 PM, Gregory Farnum wrote:
> On Tue, Jul 21, 2015 at 6:09 PM, Patrick McGarry wrote:
>> Hey cephers,
>>
>> Just a rem
On 07/23/2015 03:20 PM, Gregory Farnum wrote:
> On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač wrote:
>> Hello,
>>
>> I'm having an issue with nginx writing to cephfs. Often I'm getting:
>>
>> writev() "/home/ceph/temp/44/94/1/119444" failed (4: Interrupted
>> system call) while reading upstrea
On Thu, Jul 23, 2015 at 5:37 PM, Vedran Furač wrote:
> On 07/23/2015 04:19 PM, Ilya Dryomov wrote:
>> On Thu, Jul 23, 2015 at 4:23 PM, Vedran Furač wrote:
>>> On 07/23/2015 03:20 PM, Gregory Farnum wrote:
That's...odd. Are you using the kernel client or ceph-fuse, and on
which vers
On 07/23/2015 04:19 PM, Ilya Dryomov wrote:
> On Thu, Jul 23, 2015 at 4:23 PM, Vedran Furač wrote:
>> On 07/23/2015 03:20 PM, Gregory Farnum wrote:
>>>
>>> That's...odd. Are you using the kernel client or ceph-fuse, and on
>>> which version?
>>
>> Sorry, forgot to mention, it's kernel client, trie
On Thu, Jul 23, 2015 at 4:23 PM, Vedran Furač wrote:
> On 07/23/2015 03:20 PM, Gregory Farnum wrote:
>> On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač wrote:
>>> Hello,
>>>
>>> I'm having an issue with nginx writing to cephfs. Often I'm getting:
>>>
>>> writev() "/home/ceph/temp/44/94/1/119444
The packages were probably rebuilt without changing their name/version (bad
idea btw), and the metadata wasn't regenerated, either because of that or because of
some other problem.
You can mirror it and generate your own metadata or install the packages by
hand until it gets fixed.
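If you go the mirroring route, a rough sketch with the usual yum tooling (the repo id and local path are placeholders):

# reposync --repoid=ceph --download_path=/srv/mirror
# createrepo /srv/mirror/ceph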
Jan
P.S. In my ex
On 07/23/2015 03:20 PM, Gregory Farnum wrote:
> On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač wrote:
>>
>> Is it possible Ceph cannot find the destination PG fast enough and
>> returns ERESTARTSYS? Is there any way to fix this behavior or reduce it?
>
> That's...odd. Are you using the kernel clie
- Original Message -
> From: "Ken Dreyer"
> To: ceph-users@lists.ceph.com
> Sent: Tuesday, July 14, 2015 9:06:01 PM
> Subject: Re: [ceph-users] Ruby bindings for Librados
>
> On 07/13/2015 02:11 PM, Wido den Hollander wrote:
> > On 07/13/2015 09:43 PM, Corin Langosch wrote:
> >> Hi Wido
Hi all,
I have been looking for a way to alleviate the overhead of RBD snapshots/clones for
some time.
In our scenario there are a few “master” volumes that contain production data,
and are frequently snapshotted and cloned for dev/qa use. Those
snapshots/clones live for a few days to a few weeks befo
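For readers following the thread, the snapshot/clone workflow being described is roughly the standard rbd one below (pool and image names are placeholders); flatten is the optional step that breaks a clone's dependency on the parent at the cost of a full copy:

# rbd snap create pool/master@snap1
# rbd snap protect pool/master@snap1
# rbd clone pool/master@snap1 pool/dev-copy
# rbd flatten pool/dev-copy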
On 07/23/2015 03:20 PM, Gregory Farnum wrote:
> On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač wrote:
>> Hello,
>>
>> I'm having an issue with nginx writing to cephfs. Often I'm getting:
>>
>> writev() "/home/ceph/temp/44/94/1/119444" failed (4: Interrupted
>> system call) while reading upstrea
On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač wrote:
> Hello,
>
> I'm having an issue with nginx writing to cephfs. Often I'm getting:
>
> writev() "/home/ceph/temp/44/94/1/119444" failed (4: Interrupted
> system call) while reading upstream
>
> looking with strace, this happens:
>
> ...
> wri
I am having the same issue and haven't figured out a resolution yet. The repo
is pointing to a valid URL, and I can wget the packages from that URL, but yum
complains about them. My initial thought is that something is screwy with the
md5sum, either on package versions in the repo, or in my rpm
You should add the required capabilities to your user:
# radosgw-admin caps add --uid=testuser --caps="users=*"
# radosgw-admin caps add --uid=testuser --caps="buckets=*"
# radosgw-admin caps add --uid=testuser --caps="metadata=*"
# radosgw-admin caps add --uid=testuser --caps="zone=*"
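You can then confirm the caps took effect with:

# radosgw-admin user info --uid=testuser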
On 3 J
Image metadata isn't supported by hammer; infernalis supports it.
On Mon, Jul 13, 2015 at 11:29 PM, Maged Mokhtar wrote:
> Hello
>
> I am trying to use rbd image-meta set.
> I get an error from rbd saying this command is not recognized,
> yet it is documented in the rbd documentation:
> http://ceph.com/
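Once on a release that includes the command, the documented invocation is of the form below (the image name, key, and value here are placeholders):

# rbd image-meta set mypool/myimage conf_rbd_cache true
# rbd image-meta list mypool/myimage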
The docker/distribution project runs a continuous integration VM using
CircleCI, and part of the VM setup installs Ceph packages using
ceph-deploy. This has been working well for quite a while, but we are
seeing a failure running `ceph-deploy install --release hammer`. The
snippet is here where it
Hello
I am trying to use rbd image-meta set.
I get an error from rbd saying this command is not recognized,
yet it is documented in the rbd documentation:
http://ceph.com/docs/next/man/8/rbd/
I am using the Hammer release, deployed using ceph-deploy on Ubuntu 14.04.
Is image-meta set supported in rbd in H
Hi,
Please Respond
Regards,
Bindu
On Fri, Jul 3, 2015 at 11:52 AM, Bindu Kharb wrote:
> Hi,
>
> I am trying to use swift as a frontend with ceph storage. I have a small
> cluster (1 MON, 2 OSDs). My cluster is working fine. I have installed radosgw
> on one of my machines and radosgw (gateway1) is als
Hi all,
Setup details:
Two storage enclosures each connected to 4 OSD nodes (Shared storage).
Failure domain is Chassis (enclosure) level. Replication count is 2.
Each host is allotted 4 drives.
I have active client IO running on the cluster. (Random write profile with
4M block size & 64 Queue
Hi,
Please reply...
Regards,
Bindu
Hi,
I’ve asked the same question in the last week or so (just search the mailing list
archives for EnhanceIO :) and got some interesting answers.
Looks like the project is pretty much dead since it was bought out by HGST.
Even their website has some broken links regarding EnhanceIO.
I’m keen to try f
Hi, I'm trying to use curl for rados admin ops requests.
I have problems with the keys; you use this authorization: Authorization: AWS
{access-key}:{hash-of-header-and-secret}.
Where can I get the hash-of-header-and-secret?
Info of user:
radosgw-admin user info --uid=usuario1
{
"use
Hi all,
Setup details:
Two storage enclosures each connected to 4 OSD nodes (Shared storage).
Failure domain is Chassis (enclosure) level. Replication count is 2.
Each host is allotted 4 drives.
I have active client IO running on the cluster. (Random write profile with 4M
block size & 64 Queue
Hello
I am trying to use rbd image-meta set.
I get an error from rbd saying this command is not recognized,
yet it is documented in the rbd documentation:
http://ceph.com/docs/next/man/8/rbd/
I am using the Hammer release, deployed using ceph-deploy on Ubuntu 14.04.
Is image-meta set supported in rbd in H
Hi all,
Setup details:
Two storage enclosures each connected to 4 OSD nodes (Shared storage).
Failure domain is Chassis (enclosure) level. Replication count is 2.
Each host is allotted 4 drives.
I have active client IO running on the cluster. (Random write profile with
4M block size & 64 Queue
Hi all,
Setup details:
Two storage enclosures each connected to 4 OSD nodes (Shared storage).
Failure domain is Chassis (enclosure) level. Replication count is 2.
Each host is allotted 4 drives.
I have active client IO running on the cluster. (Random write profile with
4M block size & 64 Queue
Hi,
I am trying to use swift as a frontend with ceph storage. I have a small
cluster (1 MON, 2 OSDs). My cluster is working fine. I have installed radosgw
on one of my machines and radosgw (gateway1) is also up and communicating
with the cluster.
Now I have installed the swift client and created a user and subuser
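For reference, the subuser and Swift key creation being described is usually done along these lines (the uid and subuser name are placeholders):

# radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full
# radosgw-admin key create --subuser=testuser:swift --key-type=swift --gen-secret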
All IO drops to ZERO IOPS for 1-15 minutes during deep-scrub on my cluster.
There is clearly a locking bug!
I have VMs - every day, several times, sometimes on all of them, disk IO
_completely_ stops. The disk queue grows, 0 IOPS are performed, services are
dying with timeouts... At the sam
Hi all,
Setup details:
Two storage enclosures each connected to 4 OSD nodes (Shared storage).
Failure domain is Chassis (enclosure) level. Replication count is 2.
Each host is allotted 4 drives.
I have active client IO running on the cluster. (Random write profile with
4M block size & 64 Queue
Hi,
We are trying to implement Ceph and have a really huge issue with replication
between DCs.
The issue we have is related to the replication setup in our infrastructure: a single
region, with 2 zones in different datacenters. While trying to configure replication
we receive the below message. We wonder if this
I'm trying to use the ceph el6 yum repo. Yesterday afternoon, I found
yum complaining about 8 packages when trying to install or update ceph,
such as this:
(4/46): ceph-0.94.2-0.el6.x86_64.rpm
| 21 MB 00:01
http://ceph.com/rpm-hammer/el6/x86_64/ceph-0.94.2-0.el6.x86_64.rpm:
[Errno -1]
Hi,
I am trying to use swift as a frontend with ceph storage. I have a small
cluster (1 MON, 2 OSDs). My cluster is working fine. I have installed radosgw
on one of my machines and radosgw (gateway1) is also up and communicating
with the cluster.
Now I have installed the swift client and created a user and subuser
Hi everyone,
This is announcing a new release of ceph-deploy that focuses on usability
improvements.
- Most of the help menus for ceph-deploy subcommands (e.g. “ceph-deploy mon”
and “ceph-deploy osd”) have been improved to be more context aware, such that
help for “ceph-deploy osd create --h
Hi ceph users,
Please respond to my query..
Regards,
Bindu
On Fri, Jul 3, 2015 at 11:52 AM, Bindu Kharb wrote:
> Hi,
>
> I am trying to use swift as a frontend with ceph storage. I have a small
> cluster (1 MON, 2 OSDs). My cluster is working fine. I have installed radosgw
> on one of my machines and
Nevermind. I see that `ceph-deploy mon create-initial` has stopped
accepting the trailing hostname which was causing the failure. I don't
know if those problems above I showed are actually anything to worry
about :)
On Tue, Jul 21, 2015 at 3:17 PM, Noah Watkins wrote:
> The docker/distribution pr
Hi all,
Setup details:
Two storage enclosures each connected to 4 OSD nodes (Shared storage).
Failure domain is Chassis (enclosure) level. Replication count is 2.
Each host is allotted 4 drives.
I have active client IO running on the cluster. (Random write profile with 4M
block size & 64 Queue
Hello,
I'm having an issue with nginx writing to cephfs. Often I'm getting:
writev() "/home/ceph/temp/44/94/1/119444" failed (4: Interrupted
system call) while reading upstream
looking with strace, this happens:
...
write(65, "e\314\366\36\302"..., 65536) = ? ERESTARTSYS (To be restarted)
On Thu, 23 Jul 2015 11:14:22 +0100 Gregory Farnum wrote:
> Your note that dd can do 2GB/s without networking makes me think that
> you should explore that. As you say, network interrupts can be
> problematic in some systems. The only thing I can think of that's been
> really bad in the past is tha
Hi,
I use ceph 0.94 from the wheezy repo (deb http://eu.ceph.com/debian-hammer wheezy
main) inside jessie.
0.94.1 is installable without trouble, but an upgrade to 0.94.2 doesn't work
correctly:
dpkg -l | grep ceph
ii ceph 0.94.1-1~bpo70+1 amd64
dis
Did you use an upstream ceph version previously? Or did you shut down the
running ceph-osd when upgrading the osd?
How many osds hit this problem?
This assert failure means that the osd detected an upgraded pg meta object
but failed to read the meta keys (or is missing one key) from the object.
On Thu, Jul 23, 2015 at 7:03 PM,
On 21.07.2015 at 12:06, Udo Lembke wrote:
> Hi all,
> ...
>
> Normally I would say, if one OSD node dies, I simply reinstall the OS and ceph
> and I'm back again... but this looks bad
> for me.
> Unfortunately the system also doesn't start 9 OSDs after I switched back to the
> old system-disk... (only t
Hi,
Well, I think the journaling would still appear in the dstat output, as those are
still IOs: even if the user-side bandwidth is indeed cut in half, that should
not be the case for disk IO.
For instance, I just tried a replicated pool for the test, and got around
1300MiB/s in dstat for about 600
I'm not sure. It looks like Ceph and your disk controllers are doing
basically the right thing since you're going from 1GB/s to 420MB/s
when moving from dd to Ceph (the full data journaling cuts it in
half), but just fyi that dd task is not doing nearly the same thing as
Ceph does — you'd need to u
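As a rough illustration only, a dd run that forces synchronous, direct writes is somewhat closer to what the OSD write path has to do than a plain buffered dd; the path and sizes below are placeholders:

# dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/ddtest bs=4M count=1000 oflag=direct,dsync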
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Gregory Farnum
> Sent: 22 July 2015 15:05
> To: Nick Fisk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] osd_agent_max_ops relating to number of OSDs in
> the cache pool
>
>
On 22/07/15 20:39, Shneur Zalman Mattern wrote:
Third test:
We wanted to try CephFS, because our client is familiar with
Lustre, which is very close to CephFS in capabilities:
1. I used my Ceph nodes in the client's role. I
mounted CephFS on one of the nodes, and ran dd with bs=1M
On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito wrote:
> The ceph-mon is already taking a lot of memory, and I ran a heap stats
>
> MALLOC: 32391696 ( 30.9 MiB) Bytes in use by application
> MALLOC: + 27597135872 (26318.7 MiB) Bytes in page
On Wed, Jul 22, 2015 at 8:39 PM, Shneur Zalman Mattern
wrote:
> Workaround... We're now building a huge computing cluster: 140 diskless
> compute nodes, and they are pulling a lot of computing data from storage
> concurrently.
> The user that submits a job to the cluster also needs access to the same sto
The ceph-mon is already taking a lot of memory, and I ran a heap stats
MALLOC: 32391696 ( 30.9 MiB) Bytes in use by application
MALLOC: + 27597135872 (26318.7 MiB) Bytes in page heap freelist
MALLOC: + 16598552 ( 15.8 MiB) Bytes in cen
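If most of that is sitting in tcmalloc's page heap freelist, it can often be handed back with the heap admin commands (the mon id here is a placeholder):

# ceph tell mon.0 heap stats
# ceph tell mon.0 heap release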