I removed the latest OSD that was respawning (osd.23) and now I'm having
the same problem with osd.30. It looks like they both have pg 3.f9 in
common. I tried "ceph pg repair 3.f9" but the OSD is still respawning.
Does anyone have any ideas?
Thanks,
Chris
ceph-osd-03:ceph-osd.30.log
-29> 20
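For context, the pg-to-OSD mapping referenced above can be double-checked with the standard CLI (a quick sketch, reusing the pg id from the message):
ceph pg map 3.f9        # shows the up/acting OSD set for that pg
ceph pg 3.f9 query      # detailed peering and scrub state for that pg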
We have experienced a repeatable issue when performing the following:
Ceph backend with no issues; we can repeat it at will, any time, in lab and
production. Cloning an ESXi VM to another VM on the same datastore on
which the original VM resides. Practically instantly, the LIO machine
becomes unrespon
Hi Jan,
Thanks for the advice, you hit the nail on the head.
I checked the limits and watched the number of fds, and as it reached the soft
limit (1024) that's when the transfer came to a grinding halt and the VM
started locking up.
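In case it helps anyone else, these are the usual knobs for the open-file limit (a sketch only; which one applies depends on how the process is started):
ulimit -n                      # show the current soft limit in this shell
ulimit -n 65536                # raise it for processes started from this shell
# persistent options: a "* soft nofile 65536" line in /etc/security/limits.conf
# for PAM logins, or LimitNOFILE=65536 in the service's systemd unit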
After your reply I also did some more googling and found another old th
Hello,
On Wed, 2 Sep 2015 22:38:12 + Deneau, Tom wrote:
> In a small cluster I have 2 OSD nodes with identical hardware, each with
> 6 osds.
>
> * Configuration 1: I shut down the osds on one node so I am using 6
> OSDS on a single node
>
Shut down how?
Just a "service blah stop" or actual
In a small cluster I have 2 OSD nodes with identical hardware, each with 6 osds.
* Configuration 1: I shut down the osds on one node so I am using 6 OSDS on a
single node
* Configuration 2: I shut down 3 osds on each node so now I have 6 total OSDS
but 3 on each node.
I measure read performa
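(For what it's worth, the usual way to stop OSDs for a test like this without triggering a rebalance is roughly the following sketch, assuming sysvinit-style init scripts:)
ceph osd set noout           # stop the cluster from marking the stopped OSDs out
service ceph stop osd.0      # repeat for each osd id being shut down
ceph osd unset noout         # once the test is finished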
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
Changing to the acpi_idle driver dropped the performance by about 50%.
That was an unexpected result.
I'm having issues with powertop and the userspace governor; it always
shows 100% idle. I downloaded the latest version with the same result.
Still
Hi Noah,
What is the ownership on /var/lib/ceph ?
ceph-deploy should only be trying to use --setgroup if /var/lib/ceph is
owned by non-root.
On a fresh install of Hammer, this should be root:root.
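A quick way to check (a minimal sketch):
ls -ld /var/lib/ceph         # the owner/group shown here is what ceph-deploy keys off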
The --setgroup flag was added to ceph-deploy in 1.5.26.
- Travis
On Wed, Sep 2, 2015 at 1:59 PM
When I lose a disk or replace an OSD in my POC Ceph cluster, it takes a very
long time to rebalance. I should note that my cluster is slightly unique
in that I am using CephFS (shouldn't matter?) and it currently contains
about 310 million objects.
The last time I replaced a disk/OSD was 2.5 days a
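For reference, the recovery/backfill throttles can be raised at runtime to speed this up at the cost of client I/O (a sketch; sensible values depend on your hardware):
ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'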
Thank you Mark, please see my response below.
On Wed, Sep 2, 2015 at 5:23 PM, Mark Nelson wrote:
> On 09/02/2015 08:51 AM, Vickey Singh wrote:
>
>> Hello Ceph Experts
>>
>> I have a strange problem: when I am reading or writing to a Ceph pool,
>> it's not writing properly. Please notice Cur MB/s
On Wed, Sep 2, 2015 at 7:23 PM, Sage Weil wrote:
> On Wed, 2 Sep 2015, Dan van der Ster wrote:
>> On Wed, Sep 2, 2015 at 4:23 PM, Sage Weil wrote:
>> > On Wed, 2 Sep 2015, Dan van der Ster wrote:
>> >> On Wed, Sep 2, 2015 at 4:11 PM, Sage Weil wrote:
>> >> > On Wed, 2 Sep 2015, Dan van der Ster
Le 02/09/2015 18:16, Mathieu GAUTHIER-LAFAYE a écrit :
> Hi Lionel,
>
> - Original Message -
>> From: "Lionel Bouton"
>> To: "Mathieu GAUTHIER-LAFAYE" ,
>> ceph-us...@ceph.com
>> Sent: Wednesday, 2 September, 2015 4:40:26 PM
>> Subject: Re: [ceph-users] Corruption of file systems on RBD i
On Wed, 2 Sep 2015, Dan van der Ster wrote:
> On Wed, Sep 2, 2015 at 4:23 PM, Sage Weil wrote:
> > On Wed, 2 Sep 2015, Dan van der Ster wrote:
> >> On Wed, Sep 2, 2015 at 4:11 PM, Sage Weil wrote:
> >> > On Wed, 2 Sep 2015, Dan van der Ster wrote:
> >> >> ...
> >> >> Normally I use crushtool --te
Hey cephers,
While I'm sure that most of you probably get your Ceph-related
questions answered here on the mailing lists, Sage is doing an "Ask me
anything" on Reddit in about an hour:
https://www.reddit.com/r/IAmA/comments/3jdnnd/i_am_sage_weil_lead_architect_and_cocreator_of/
You can ask him q
Hi Lionel,
- Original Message -
> From: "Lionel Bouton"
> To: "Mathieu GAUTHIER-LAFAYE" ,
> ceph-us...@ceph.com
> Sent: Wednesday, 2 September, 2015 4:40:26 PM
> Subject: Re: [ceph-users] Corruption of file systems on RBD images
>
> Hi Mathieu,
>
> Le 02/09/2015 14:10, Mathieu GAUTHIER
Hi,
We're using Ceph Hammer 0.94.1 on CentOS 7. On the monitor, when we set
log_to_syslog = true
Ceph starts writing logs to stdout as well. I thought at first it might be
rsyslog that was wrongly configured, but I did not find a rule that could
explain this behavior.
Can anybody else replicate this? If
On 9/2/15, 9:31 AM, Gregory Farnum wrote:
[ Re-adding the list. ]
On Wed, Sep 2, 2015 at 4:29 PM, Erming Pei wrote:
Hi Gregory,
Thanks very much for the confirmation and explanation.
And I presume you have an MDS cap in there as well?
Is there a difference between setting this cap and w
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
Thanks for the responses.
I forgot to include the fio test for completeness:
8 job QD=8
[ext4-test]
runtime=150
name=ext4-test
readwrite=randrw
size=15G
blocksize=4k
ioengine=sync
iodepth=8
numjobs=8
thread
group_reporting
time_based
direct=1
1 j
Hi cephers, I'm trying to deploy a new Ceph cluster with the master release
(v9.0.3), and when trying to create the initial mons an error appears
saying "admin_socket: exception getting command descriptions: [Errno
2] No such file or directory". Here is the log:
...
[ceph_deploy.mon][INFO ] distro
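For anyone hitting the same error, the admin socket the message refers to can be checked by hand (a sketch, assuming the default socket path):
ls /var/run/ceph/                                              # the mon's .asok should appear here
ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status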
On Wed, 2 Sep 2015, Dan van der Ster wrote:
> On Wed, Sep 2, 2015 at 4:23 PM, Sage Weil wrote:
> > On Wed, 2 Sep 2015, Dan van der Ster wrote:
> >> On Wed, Sep 2, 2015 at 4:11 PM, Sage Weil wrote:
> >> > On Wed, 2 Sep 2015, Dan van der Ster wrote:
> >> >> ...
> >> >> Normally I use crushtool --te
[ Re-adding the list. ]
On Wed, Sep 2, 2015 at 4:29 PM, Erming Pei wrote:
> Hi Gregory,
>
>Thanks very much for the confirmation and explanation.
>
>>And I presume you have an MDS cap in there as well?
> Is there a difference between setting this cap and not setting it?
Well, I don't think yo
On 09/02/2015 02:31 PM, Janusz Borkowski wrote:
> Hi!
>
> Do you have replication factor 2?
yes.
On Wed, Sep 2, 2015 at 4:23 PM, Sage Weil wrote:
> On Wed, 2 Sep 2015, Dan van der Ster wrote:
>> On Wed, Sep 2, 2015 at 4:11 PM, Sage Weil wrote:
>> > On Wed, 2 Sep 2015, Dan van der Ster wrote:
>> >> ...
>> >> Normally I use crushtool --test --show-mappings to test rules, but
>> >> AFAICT it do
Hi Mathieu,
Le 02/09/2015 14:10, Mathieu GAUTHIER-LAFAYE a écrit :
> Hi All,
>
> We regularly have trouble with virtual machines using RBD storage. When
> we restart some virtual machines, they start to run filesystem checks.
> Sometimes it can be rescued, sometimes the virtual machine d
On 09/02/2015 08:51 AM, Vickey Singh wrote:
Hello Ceph Experts
I have a strange problem: when I am reading or writing to a Ceph pool,
it's not writing properly. Please notice Cur MB/s, which is going up and down
--- Ceph Hammer 0.94.2
-- CentOS 6, 2.6
-- Ceph cluster is healthy
You might find t
On Wed, 2 Sep 2015, Dan van der Ster wrote:
> On Wed, Sep 2, 2015 at 4:11 PM, Sage Weil wrote:
> > On Wed, 2 Sep 2015, Dan van der Ster wrote:
> >> ...
> >> Normally I use crushtool --test --show-mappings to test rules, but
> >> AFAICT it doesn't let you simulate an out osd, i.e. with reweight = 0
On Wed, Sep 2, 2015 at 4:11 PM, Sage Weil wrote:
> On Wed, 2 Sep 2015, Dan van der Ster wrote:
>> ...
>> Normally I use crushtool --test --show-mappings to test rules, but
>> AFAICT it doesn't let you simulate an out osd, i.e. with reweight = 0.
>> Any ideas how to test this situation without uplo
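One possible approach (a sketch, assuming your crushtool build supports the --weight test option; rule and replica numbers are placeholders):
ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 4 --num-rep 3 --show-mappings --weight 12 0   # treat osd.12 as out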
On Wed, 2 Sep 2015, Dan van der Ster wrote:
> Hi all,
>
> We just ran into a small problem where some PGs wouldn't backfill
> after an OSD was marked out. Here's the relevant crush rule; being a
> non-trivial example I'd like to test different permutations of the
> crush map (e.g. increasing choos
As of yesterday we are ready to start providing Debian Jessie packages.
They will be present by default for the upcoming Ceph release (Infernalis).
For the other releases (e.g. Firefly, Hammer, Giant) there will be Jessie
packages for new versions only.
Let me know if you
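For anyone who wants to try them, the repository line should follow the usual pattern (a sketch; substitute the release you want):
deb http://download.ceph.com/debian-hammer/ jessie main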
Hi all,
We just ran into a small problem where some PGs wouldn't backfill
after an OSD was marked out. Here's the relevant crush rule; being a
non-trivial example I'd like to test different permutations of the
crush map (e.g. increasing choose_total_tries):
rule critical {
ruleset 4
Hello Ceph Experts
I have a strange problem: when I am reading or writing to a Ceph pool, it's
not writing properly. Please notice Cur MB/s, which is going up and down
--- Ceph Hammer 0.94.2
-- CentOS 6, 2.6
-- Ceph cluster is healthy
One interesting thing is whenever I start the rados bench comman
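For reference, the bench invocations in question look roughly like this (a sketch; pool name, runtime and thread count are placeholders):
rados bench -p testpool 60 write --no-cleanup -t 16   # write phase, keep objects for the read test
rados bench -p testpool 60 seq -t 16                  # sequential read phase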
Hi John and Zheng,
Thanks for the quick replies!
I'm using kernel 4.2. I'll test out that fix.
Arthur
On Wed, Sep 2, 2015 at 10:29 PM, Yan, Zheng wrote:
> probably caused by http://tracker.ceph.com/issues/12551
>
> On Wed, Sep 2, 2015 at 7:57 PM, Arthur Liu wrote:
> > Hi,
> >
> > I am experie
probably caused by http://tracker.ceph.com/issues/12551
On Wed, Sep 2, 2015 at 7:57 PM, Arthur Liu wrote:
> Hi,
>
> I am experiencing an issue with CephFS with cache tiering where the kernel
> clients are reading files filled entirely with 0s.
>
> The setup:
> ceph 0.94.3
> create cephfs_metadata
Hi!
Do you have replication factor 2?
To test recovery, e.g. kill one OSD process and observe when Ceph notices it and
starts moving data. Reformat the OSD partition, remove the killed OSD from the
cluster, then add a new OSD using the freshly formatted partition. When you
again have 3 OSDs, observe w
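The usual command sequence for that replacement step looks like this (a sketch; <id> is the OSD being replaced):
ceph osd out <id>             # let the cluster start remapping
# once the disk is reformatted and the old OSD is to be removed for good:
ceph osd crush remove osd.<id>
ceph auth del osd.<id>
ceph osd rm <id>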
Hi!
Thanks for the explanation. The behaviour (overwriting) was puzzling and
suggested serious filesystem corruption. Now that we have identified the
scenario, we can try workarounds.
Regards,
J.
On 02.09.2015 11:50, Yan, Zheng wrote:
>> On Sep 2, 2015, at 17:11, Gregory Farnum wrote:
>>
>> Whoops, f
Hi All,
We regularly have trouble with virtual machines using RBD storage. When
we restart some virtual machines, they start to run filesystem checks.
Sometimes the filesystem can be rescued, sometimes the virtual machine dies (Linux or Windows).
We moved from Firefly to Hammer last month. I
Hi,
I am experiencing an issue with CephFS with cache tiering where the kernel
clients are reading files filled entirely with 0s.
The setup:
ceph 0.94.3
create cephfs_metadata replicated pool
create cephfs_data replicated pool
cephfs was created on the above two pools, populated with files, then:
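For context, a generic cache-tier setup on Hammer looks roughly like the following sketch (not necessarily the exact commands used here):
ceph osd tier add cephfs_data cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay cephfs_data cachepool
ceph osd pool set cachepool hit_set_type bloom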
Thanks!
Playing around with max_keys in bucket listing retrieval sometimes gives
me results and sometimes not; this gives me a way to list the contents until the bug
is fixed.
Is it possible somehow to copy the objects to a new bucket (with
versioning disabled), and rename the current one? I don't think the
Hi,
When can we expect a Jessie repo for Ceph Hammer to
be available?
Thanks!
Kind regards
Jonas Rottmann
Systems Engineer
FIS-ASP Application Service Providing und
IT-Outsourcing GmbH
Röthleiner Weg 4
D-97506 Grafenrheinfeld
Phone: +49 (9723) 9188-568
On 08/31/2015 09:39 AM, Wido den Hollander wrote:
> True, but your performance is greatly impacted during recovery. So a
> three node cluster might work well when the skies are clear and the sun
> is shining, but it has a hard time dealing with a complete node failure.
The question of "how tiny a
> On Sep 2, 2015, at 17:11, Gregory Farnum wrote:
>
> Whoops, forgot to add Zheng.
>
> On Wed, Sep 2, 2015 at 10:11 AM, Gregory Farnum wrote:
>> On Wed, Sep 2, 2015 at 10:00 AM, Janusz Borkowski
>> wrote:
>>> Hi!
>>>
>>> I mount cephfs using kernel client (3.10.0-229.11.1.el7.x86_64).
>>>
>
> On Sep 2, 2015, at 16:44, Gregory Farnum wrote:
>
> On Tue, Sep 1, 2015 at 9:20 PM, Erming Pei wrote:
>> Hi,
>>
>> I tried to set up a read-only permission for a client but it looks always
>> writable.
>>
>> I did the following:
>>
>> ==Server end==
>>
>> [client.cephfs_data_ro]
>>
After searching the source code, I found the ceph_psim tool, which can
simulate object distribution,
but it seems a little simplistic.
2015-09-01 22:58 GMT+08:00 huang jun :
> hi all,
>
> Recently, I did some experiments on OSD data distribution.
> We set up a cluster with 72 OSDs, all 2TB SATA disks,
> and c
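For measuring distribution on a live cluster, a couple of handy starting points (a sketch; 'ceph osd df' needs Hammer or newer):
ceph osd df                                                        # per-OSD utilisation and PG counts
crushtool -i crushmap.bin --test --show-utilization --num-rep 2    # simulate placement offline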
I think this may be related to what I had to do, it rings a bell at least.
http://unix.stackexchange.com/questions/153693/cant-use-userspace-cpufreq-governor-and-set-cpu-frequency
The P-state driver doesn't support userspace, so you need to disable it and make
Linux use the old acpi driver instead
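Concretely, the switch is along these lines (a sketch; exact steps vary by distro):
# add intel_pstate=disable to the kernel command line and reboot
modprobe acpi-cpufreq
cpupower frequency-set -g userspace        # the userspace governor is now available
cpupower frequency-set -f 2400MHz          # pin a fixed frequency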
On Wed, Sep 2, 2015 at 10:00 AM, Janusz Borkowski
wrote:
> Hi!
>
> I mount cephfs using kernel client (3.10.0-229.11.1.el7.x86_64).
>
> The effect is the same when doing "echo >>" from another machine and from a
> machine keeping the file open.
>
> The file is opened with open( ..,
> O_WRONLY|O_LA
Whoops, forgot to add Zheng.
On Wed, Sep 2, 2015 at 10:11 AM, Gregory Farnum wrote:
> On Wed, Sep 2, 2015 at 10:00 AM, Janusz Borkowski
> wrote:
>> Hi!
>>
>> I mount cephfs using kernel client (3.10.0-229.11.1.el7.x86_64).
>>
>> The effect is the same when doing "echo >>" from another machine an
Hi!
I mount cephfs using kernel client (3.10.0-229.11.1.el7.x86_64).
The effect is the same when doing "echo >>" from another machine and from a
machine keeping the file open.
The file is opened with open( ..,
O_WRONLY|O_LARGEFILE|O_APPEND|O_BINARY|O_CREAT)
Shell ">>" is implemented as (from
On Tue, Sep 1, 2015 at 9:20 PM, Erming Pei wrote:
> Hi,
>
> I tried to set up a read-only permission for a client but it looks always
> writable.
>
> I did the following:
>
> ==Server end==
>
> [client.cephfs_data_ro]
> key = AQxx==
> caps mon = "allow r"
> caps
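For comparison, the usual shape of such a client entry when created via the CLI (a sketch; as this thread discusses, whether the result is truly read-only depends on the OSD caps being enforced):
ceph auth get-or-create client.cephfs_ro mon 'allow r' mds 'allow' osd 'allow r pool=cephfs_data'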
1) Take a look at the number of file descriptors the QEMU process is using; I
think you are over the limits:
pid=<pid of the qemu process>
cat /proc/$pid/limits          # check the "Max open files" soft/hard limits
echo /proc/$pid/fd/* | wc -w   # count the fds currently in use
2) Jumbo frames may be the cause; are they enabled on the rest of the network?
In any case, get rid of NetworkM
What "idle" driver are you using?
/dev/cpu_dma_latency might not be sufficient if the OS uses certain idle
instructions; IMO mwait is still issued and its latency might not be 1 on Atoms.
What is in /sys/devices/system/cpu/cpu0/cpuidle/state*/latency on Atoms?
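For example, to dump the idle driver and per-state exit latencies (a sketch):
cat /sys/devices/system/cpu/cpuidle/current_driver
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/latency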
Btw disabling all power management i
I somehow missed the original question, but if you run a database on Ceph you
will be limited not by throughput but by latency.
Even if you run the OSDs on a ramdisk, the latency will still be 1-2 ms at best
(depending strictly on OSD CPU and memory speed), and that limits the number of
database trans
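As a rough back-of-the-envelope figure: at ~1 ms per synchronous write round trip, a single serialized client stream tops out around 1000 commits per second, and at ~2 ms around 500, no matter how much raw throughput the cluster has.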
Hi,
comments below
> On 01 Sep 2015, at 18:08, Jelle de Jong wrote:
>
> Hi Jan,
>
> I am building two new clusters for testing. I've been reading your messages
> on the mailing list for a while now and want to thank you for your support.
>
> I can redo all the numbers, but is your question to run