Re: [ceph-users] Ceph with Clos IP fabric

2017-05-08 Thread Jan Marquardt
Hi,

sorry for the delay, but in the meantime we were able to find a
workaround. Inspired by this:

> Side note: Configuring the loopback IP on the physical interfaces is
> workable if you set it on **all** parallel links. Example with server1:
> 
>  
> 
> “iface enp3s0f0 inet static
> 
>   address 10.10.100.21/32
> 
> iface enp3s0f1 inet static
> 
>   address 10.10.100.21/32
> 
> iface enp4s0f0 inet static
> 
>   address 10.10.100.21/32
> 
> iface enp4s0f1 inet static
> 
>   address 10.10.100.21/32”
> 
>  
> 
> This should guarantee that the loopback ip is advertised if one of the 4
> links to switch1 and switch2 is up, but I am not sure if that’s workable
> for ceph’s listening address.

We added the loopback IP on both lo and dummy0. This solves the
issue for us and the Ceph cluster works as intended.
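
For reference, this is roughly what the workaround looks like on one node (a
sketch - the interface names and the dummy module handling are assumptions;
the address is the example loopback IP quoted above):

# load the dummy module and create dummy0 if it does not exist yet
modprobe dummy
ip link add dummy0 type dummy 2>/dev/null || true
ip link set dummy0 up
# configure the same loopback /32 on both lo and dummy0
ip addr add 10.10.100.21/32 dev lo
ip addr add 10.10.100.21/32 dev dummy0

The matching ifupdown stanzas for lo and dummy0 achieve the same thing; the
point is simply that the same /32 is present on both interfaces.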

Regards

Jan


-- 
Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
E-Mail: supp...@artfiles.de | Web: http://www.artfiles.de
Geschäftsführer: Harald Oltmanns | Tim Evers
Eingetragen im Handelsregister Hamburg - HRB 81478



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-08 Thread Jason Dillaman
You are saying that you had v2 RBD images created against Hammer OSDs
and client libraries where exclusive lock, object map, etc were never
enabled. You then upgraded the OSDs and clients to Jewel and at some
point enabled exclusive lock (and I'd assume object map) on these
images -- or were the exclusive lock and object map features already
enabled under Hammer?

The fact that you encountered an object map error on an export
operation is surprising to me.  Does that error re-occur if you
perform the export again? If you can repeat it, it would be very
helpful if you could run the export with "--debug-rbd=20" and capture
the generated logs.
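
For example, something along these lines should produce a usable log (the
image spec and log path are just placeholders):

rbd export cephstor5/vm-100-disk-1 - --debug-rbd=20 --log-file=/tmp/rbd-export.log > /dev/null

The "-" writes the image data to stdout, which is simply discarded here; the
interesting part is the log file.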

On Sat, May 6, 2017 at 2:38 PM, Stefan Priebe - Profihost AG
 wrote:
> Hi,
>
> also i'm getting these errors only for pre jewel images:
>
> 2017-05-06 03:20:50.830626 7f7876a64700 -1
> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
> object map in-memory
> 2017-05-06 03:20:50.830634 7f7876a64700 -1
> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
> object map on-disk
> 2017-05-06 03:20:50.831250 7f7877265700 -1
> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>
> while running export-diff.
>
> Stefan
>
> Am 06.05.2017 um 07:37 schrieb Stefan Priebe - Profihost AG:
>> Hello Jason,
>>
>> while doing further testing it happens only with images created with
>> hammer and that got upgraded to jewel AND got enabled exclusive lock.
>>
>> Greets,
>> Stefan
>>
>> Am 04.05.2017 um 14:20 schrieb Jason Dillaman:
>>> Odd. Can you re-run "rbd rm" with "--debug-rbd=20" added to the
>>> command and post the resulting log to a new ticket at [1]? I'd also be
>>> interested if you could re-create that
>>> "librbd::object_map::InvalidateRequest" issue repeatably.
>>>
>>> [1] http://tracker.ceph.com/projects/rbd/issues
>>>
>>> On Thu, May 4, 2017 at 3:45 AM, Stefan Priebe - Profihost AG
>>>  wrote:
 Example:
 # rbd rm cephstor2/vm-136-disk-1
 Removing image: 99% complete...

 Stuck at 99% and never completes. This is an image which got corrupted
 for an unknown reason.

 Greets,
 Stefan

 Am 04.05.2017 um 08:32 schrieb Stefan Priebe - Profihost AG:
> I'm not sure whether this is related but our backup system uses rbd
> snapshots and reports sometimes messages like these:
> 2017-05-04 02:42:47.661263 7f3316ffd700 -1
> librbd::object_map::InvalidateRequest: 0x7f3310002570 should_complete: r=0
>
> Stefan
>
>
> Am 04.05.2017 um 07:49 schrieb Stefan Priebe - Profihost AG:
>> Hello,
>>
>> since we've upgraded from hammer to jewel 10.2.7 and enabled
>> exclusive-lock,object-map,fast-diff we've problems with corrupting VM
>> filesystems.
>>
>> Sometimes the VMs are just crashing with FS errors and a restart can
>> solve the problem. Sometimes the whole VM is not even bootable and we
>> need to import a backup.
>>
>> All of them have the same problem that you can't revert to an older
>> snapshot. The rbd command just hangs at 99% forever.
>>
>> Is this a known issue - anything we can check?
>>
>> Greets,
>> Stefan
>>
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-08 Thread Stefan Priebe - Profihost AG
Hi,
Am 08.05.2017 um 14:40 schrieb Jason Dillaman:
> You are saying that you had v2 RBD images created against Hammer OSDs
> and client libraries where exclusive lock, object map, etc were never
> enabled. You then upgraded the OSDs and clients to Jewel and at some
> point enabled exclusive lock (and I'd assume object map) on these
> images

Yes, I did:
for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }');
do rbd -p cephstor5 feature enable $img
exclusive-lock,object-map,fast-diff || echo $img; done

> -- or were the exclusive lock and object map features already
> enabled under Hammer?

No, as they were not the rbd defaults.

> The fact that you encountered an object map error on an export
> operation is surprising to me.  Does that error re-occur if you
> perform the export again? If you can repeat it, it would be very
> helpful if you could run the export with "--debug-rbd=20" and capture
> the generated logs.

No, I can't repeat it. It happens every night, but for different images -
I have never seen it for the same VM twice. If I do the export again it works fine.

I'm doing either an rbd export or an rbd export-diff --from-snap, depending on
the VM and the number of days since the last snapshot.
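
Roughly like this (image and snapshot names are placeholders):

# full export
rbd export cephstor5/vm-100-disk-1@daily-2017-05-08 /backup/vm-100-disk-1.img
# incremental export relative to the previous day's snapshot
rbd export-diff --from-snap daily-2017-05-07 cephstor5/vm-100-disk-1@daily-2017-05-08 /backup/vm-100-disk-1.diff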

Greets,
Stefan

> 
> On Sat, May 6, 2017 at 2:38 PM, Stefan Priebe - Profihost AG
>  wrote:
>> Hi,
>>
>> also i'm getting these errors only for pre jewel images:
>>
>> 2017-05-06 03:20:50.830626 7f7876a64700 -1
>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>> object map in-memory
>> 2017-05-06 03:20:50.830634 7f7876a64700 -1
>> librbd::object_map::InvalidateRequest: 0x7f7860004410 invalidating
>> object map on-disk
>> 2017-05-06 03:20:50.831250 7f7877265700 -1
>> librbd::object_map::InvalidateRequest: 0x7f7860004410 should_complete: r=0
>>
>> while running export-diff.
>>
>> Stefan
>>
>> Am 06.05.2017 um 07:37 schrieb Stefan Priebe - Profihost AG:
>>> Hello Jason,
>>>
>>> while doing further testing it happens only with images created with
>>> hammer and that got upgraded to jewel AND got enabled exclusive lock.
>>>
>>> Greets,
>>> Stefan
>>>
>>> Am 04.05.2017 um 14:20 schrieb Jason Dillaman:
 Odd. Can you re-run "rbd rm" with "--debug-rbd=20" added to the
 command and post the resulting log to a new ticket at [1]? I'd also be
 interested if you could re-create that
 "librbd::object_map::InvalidateRequest" issue repeatably.
 [1] http://tracker.ceph.com/projects/rbd/issues

 On Thu, May 4, 2017 at 3:45 AM, Stefan Priebe - Profihost AG
  wrote:
> Example:
> # rbd rm cephstor2/vm-136-disk-1
> Removing image: 99% complete...
>
> Stuck at 99% and never completes. This is an image which got corrupted
> for an unknown reason.
>
> Greets,
> Stefan
>
> Am 04.05.2017 um 08:32 schrieb Stefan Priebe - Profihost AG:
>> I'm not sure whether this is related but our backup system uses rbd
>> snapshots and reports sometimes messages like these:
>> 2017-05-04 02:42:47.661263 7f3316ffd700 -1
>> librbd::object_map::InvalidateRequest: 0x7f3310002570 should_complete: 
>> r=0
>>
>> Stefan
>>
>>
>> Am 04.05.2017 um 07:49 schrieb Stefan Priebe - Profihost AG:
>>> Hello,
>>>
>>> since we've upgraded from hammer to jewel 10.2.7 and enabled
>>> exclusive-lock,object-map,fast-diff we've problems with corrupting VM
>>> filesystems.
>>>
>>> Sometimes the VMs are just crashing with FS errors and a restart can
>>> solve the problem. Sometimes the whole VM is not even bootable and we
>>> need to import a backup.
>>>
>>> All of them have the same problem that you can't revert to an older
>>> snapshot. The rbd command just hangs at 99% forever.
>>>
>>> Is this a known issue - anything we can check?
>>>
>>> Greets,
>>> Stefan
>>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



> 
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph memory overhead when used with KVM

2017-05-08 Thread Jason Dillaman
Thanks. One more question: was the image a clone or a stand-alone image?

On Fri, May 5, 2017 at 2:42 AM, nick  wrote:
> Hi,
> I used one of the fio example files and changed it a bit:
>
> """
> # This job file tries to mimic the Intel IOMeter File Server Access Pattern
> [global]
> description=Emulation of Intel IOmeter File Server Access Pattern
> randrepeat=0
> filename=/root/test.dat
> # IOMeter defines the server loads as the following:
> # iodepth=1 Linear
> # iodepth=4 Very Light
> # iodepth=8 Light
> # iodepth=64    Moderate
> # iodepth=256   Heavy
> iodepth=8
> size=80g
> direct=0
> ioengine=libaio
>
> [iometer]
> stonewall
> bs=4M
> rw=randrw
>
> [iometer_just_write]
> stonewall
> bs=4M
> rw=write
>
> [iometer_just_read]
> stonewall
> bs=4M
> rw=read
> """
>
> Then let it run:
> $> while true; do fio stress.fio; rm /root/test.dat; done
>
> I had this running over a weekend.
>
> Cheers
> Sebastian
>
> On Tuesday, May 02, 2017 02:51:06 PM Jason Dillaman wrote:
>> Can you share the fio job file that you utilized so I can attempt to
>> repeat locally?
>>
>> On Tue, May 2, 2017 at 2:51 AM, nick  wrote:
>> > Hi Jason,
>> > thanks for your feedback. I did now some tests over the weekend to verify
>> > the memory overhead.
>> > I was using qemu 2.8 (taken from the Ubuntu Cloud Archive) with librbd
>> > 10.2.7 on Ubuntu 16.04 hosts. I suspected the ceph rbd cache to be the
>> > cause of the overhead so I just generated a lot of IO with the help of
>> > fio in the VMs (with a datasize of 80GB) . All VMs had 3GB of memory. I
>> > had to run fio multiple times, before reaching high RSS values.
>> > I also noticed that when using larger blocksizes during writes (like 4M)
>> > the memory overhead in the KVM process increased faster.
>> > I ran several fio tests (one after another) and the results are:
>> >
>> > KVM with writeback RBD cache: max. 85% memory overhead (2.5 GB overhead)
>> > KVM with writethrough RBD cache: max. 50% memory overhead
>> > KVM without RBD caching: less than 10% overhead all the time
>> > KVM with local storage (logical volume used): 8% overhead all the time
>> >
>> > I did not reach those >200% memory overhead results that we see on our
>> > live
>> > cluster, but those virtual machines have a way longer uptime as well.
>> >
>> > I also tried to reduce the RSS memory value with cache dropping on the
>> > physical host and in the VM. Both did not lead to any change. A reboot of
>> > the VM also does not change anything (reboot in the VM, not a new KVM
>> > process). The only way to reduce the RSS memory value is a live migration
>> > so far. Might this be a bug? The memory overhead sounds a bit too much
>> > for me.
>> >
>> > Best Regards
>> > Sebastian
>> >
>> > On Thursday, April 27, 2017 10:08:36 AM you wrote:
>> >> I know we noticed high memory usage due to librados in the Ceph
>> >> multipathd checker [1] -- the order of hundreds of megabytes. That
>> >> client was probably nearly as trivial as an application can get and I
>> >> just assumed it was due to large monitor maps being sent to the client
>> >> for whatever reason. Since we changed course on our RBD iSCSI
>> >> implementation, unfortunately the investigation into this high memory
>> >> usage fell by the wayside.
>> >>
>> >> [1]
>> >> http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libmult
>> >> ip
>> >> ath/checkers/rbd.c;h=9ea0572f2b5bd41b80bf2601137b74f92bdc7278;hb=HEAD
>> >>
>> >> On Thu, Apr 27, 2017 at 5:26 AM, nick  wrote:
>> >> > Hi Christian,
>> >> > thanks for your answer.
>> >> > The highest value I can see for a local storage VM in our
>> >> > infrastructure
>> >> > is a memory overhead of 39%. This is big, but the majority (>90%) of
>> >> > our
>> >> > local storage VMs are using less than 10% memory overhead.
>> >> > For ceph storage based VMs this looks quite different. The highest
>> >> > value I
>> >> > can see currently is 244% memory overhead. So that specific allocated
>> >> > 3GB
>> >> > memory VM is using now 10.3 GB RSS memory on the physical host. This is
>> >> > a
>> >> > really huge value. In general I can see that the majority of the ceph
>> >> > based VMs has more than 60% memory overhead.
>> >> >
>> >> > Maybe this is also a bug related to qemu+librbd. It would be just nice
>> >> > to
>> >> > know if other people are seeing those high values as well.
>> >> >
>> >> > Cheers
>> >> > Sebastian
>> >> >
>> >> > On Thursday, April 27, 2017 06:10:48 PM you wrote:
>> >> >> Hello,
>> >> >>
>> >> >> Definitely seeing about 20% overhead with Hammer as well, so not
>> >> >> version
>> >> >> specific from where I'm standing.
>> >> >>
>> >> >> While non-RBD storage VMs by and large tend to be closer the specified
>> >> >> size, I've seen them exceed things by few % at times, too.
>> >> >> For example a 4317968KB RSS one that ought to be 4GB.
>> >> >>
>> >> >> Regards,
>> >> >>
>> >> >> Christian
>> >> >>
>> >> >> On Thu, 27 Apr 2017 09:56:48 +0200 nick wrote:
>> >> >> > Hi,
>>

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-08 Thread Peter Maloney
On 05/08/17 14:50, Stefan Priebe - Profihost AG wrote:
> Hi,
> Am 08.05.2017 um 14:40 schrieb Jason Dillaman:
>> You are saying that you had v2 RBD images created against Hammer OSDs
>> and client libraries where exclusive lock, object map, etc were never
>> enabled. You then upgraded the OSDs and clients to Jewel and at some
>> point enabled exclusive lock (and I'd assume object map) on these
>> images
> Yes i did:
> for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }');
> do rbd -p cephstor5 feature enable $img
> exclusive-lock,object-map,fast-diff || echo $img; done
If that's all you did, and you did not also run

rbd object-map rebuild $img

then the image won't have a proper object map, which is probably closer to
the feature being disabled than to it being buggy.

And in my testing, this randomly doesn't work, often saying the object
map is invalid after some point (in rbd info output), unless you do
something... maybe disconnect all clients, then rebuild it, then clients
can connect again, or just repeat a few times until it stays. I haven't
tested that well.
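
Something like this rough sketch (pool name is just an example) rebuilds the
object maps and shows whether the invalid flag is still set afterwards:

for img in $(rbd -p cephstor5 ls); do
    rbd -p cephstor5 object-map rebuild $img                 # rebuild the on-disk object map
    rbd -p cephstor5 info $img | grep -E 'features|flags'    # "object map invalid" shows up under flags
done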
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-08 Thread Stefan Priebe - Profihost AG
Hi Peter,



Am 08.05.2017 um 15:23 schrieb Peter Maloney:
> On 05/08/17 14:50, Stefan Priebe - Profihost AG wrote:
>> Hi,
>> Am 08.05.2017 um 14:40 schrieb Jason Dillaman:
>>> You are saying that you had v2 RBD images created against Hammer OSDs
>>> and client libraries where exclusive lock, object map, etc were never
>>> enabled. You then upgraded the OSDs and clients to Jewel and at some
>>> point enabled exclusive lock (and I'd assume object map) on these
>>> images
>> Yes i did:
>> for img in $(rbd -p cephstor5 ls -l | grep -v "@" | awk '{ print $1 }');
>> do rbd -p cephstor5 feature enable $img
>> exclusive-lock,object-map,fast-diff || echo $img; done
> If that's all you did, and not also
> rbd object-map rebuild $img

Sorry, we did run rbd object-map rebuild as well, in a second pass - I just
missed that command in my previous mail.

> And in my testing, this randomly doesn't work, often saying the object
> map is invalid after some point (in rbd info output), unless you do
> something... maybe disconnect all clients, then rebuild it, then clients
> can connect again, or just repeat a few times until it stays. I haven't
> tested that well.

No, I have not disconnected the clients. I've seen no information saying that
the clients need to reconnect, and I also have no idea how to force a
reconnect while qemu is running.

Greets,
Stefan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EXT: Re: Intel power tuning - 30% throughput performance increase

2017-05-08 Thread Warren Wang - ISD
We also noticed a tremendous gain in latency performance by limiting C-states
with processor.max_cstate=1 intel_idle.max_cstate=0. We went from over 1ms
latency for 4KB writes to well under that (around 0.7ms, going from memory). I
will note that we did not have as much of a problem on Intel v3 procs, but on
v4 procs our low-QD, single-threaded write performance dropped tremendously. I
don't recall the exact numbers now, but it was much worse than just a 30% loss
in performance compared to a v3 proc that had default C-states set. We only
saw a small bump in power usage as well.

Bumping the CPU frequency up offered a small additional performance gain as well.
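
Those two options are kernel boot parameters; a minimal sketch of setting them,
assuming a grub-based system and keeping whatever options are already there:

# /etc/default/grub
GRUB_CMDLINE_LINUX="processor.max_cstate=1 intel_idle.max_cstate=0"
# then regenerate the grub config (e.g. grub2-mkconfig -o /boot/grub2/grub.cfg) and reboot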

Warren Wang
Walmart ✻

On 5/3/17, 3:43 AM, "ceph-users on behalf of Dan van der Ster" 
 wrote:

Hi Blair,

We use cpu_dma_latency=1, because it was in the latency-performance profile.
And indeed by setting cpu_dma_latency=0 on one of our OSD servers,
powertop now shows the package as 100% in turbo mode.
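
For reference, the pmqos interface can also be driven without the script - a
minimal sketch (the request only stays active while the file descriptor stays
open):

exec 3> /dev/cpu_dma_latency          # open a PM QoS request
printf '\x00\x00\x00\x00' >&3         # request 0 microseconds, i.e. cpu_dma_latency=0
# closing fd 3 drops the request and the deeper C-states come back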

So I suppose we'll pay for this performance boost in energy.
But more importantly, can the CPU survive being in turbo 100% of the time?

-- Dan



On Wed, May 3, 2017 at 9:13 AM, Blair Bethwaite
 wrote:
> Hi all,
>
> We recently noticed that despite having BIOS power profiles set to
> performance on our RHEL7 Dell R720 Ceph OSD nodes, that CPU frequencies
> never seemed to be getting into the top of the range, and in fact spent a
> lot of time in low C-states despite that BIOS option supposedly disabling
> C-states.
>
> After some investigation this C-state issue seems to be relatively common,
> apparently the BIOS setting is more of a config option that the OS can
> choose to ignore. You can check this by examining
> /sys/module/intel_idle/parameters/max_cstate - if this is >1 and you *think*
> C-states are disabled then your system is messing with you.
>
> Because the contemporary Intel power management driver
> (https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt) now
> limits the proliferation of OS level CPU power profiles/governors, the only
> way to force top frequencies is to either set kernel boot command line
> options or use the /dev/cpu_dma_latency, aka pmqos, interface.
>
> We did the latter using the pmqos_static.py, which was previously part of
> the RHEL6 tuned latency-performance profile, but seems to have been dropped
> in RHEL7 (don't yet know why), and in any case the default tuned profile is
> throughput-performance (which does not change cpu_dma_latency). You can find
> the pmqos-static.py script here
> https://github.com/NetSys/NetBricks/blob/master/scripts/tuning/pmqos-static.py.
>
> After setting `./pmqos-static.py cpu_dma_latency=0` across our OSD nodes we
> saw a conservative 30% increase in backfill and recovery throughput - now
> when our main RBD pool of 900+ OSDs is backfilling we expect to see ~22GB/s,
> previously that was ~15GB/s.
>
> We have just got around to opening a case with Red Hat regarding this as at
> minimum Ceph should probably be actively using the pmqos interface and tuned
> should be setting this with recommendations for the latency-performance
> profile in the RHCS install guide. We have done no characterisation of it on
> Ubuntu yet, however anecdotally it looks like it has similar issues on the
> same hardware.
>
> Merry xmas.
>
> Cheers,
> Blair
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reg: Ceph-deploy install - failing

2017-05-08 Thread Curt
Hello,

I don't use CentOS, but I've seen the same thing with Ubuntu, so I'm going to
assume it's the same problem.  The repo URL should be download.ceph.com
instead of just ceph.com, which is what gets used when the repo is added.  My
usual solution is to correct the repo URL to point to download.ceph.com and to
add the --no-adjust-repos flag to the ceph-deploy install command.
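
A rough sketch of that workaround (host name and repo file path are just
examples):

# on the target node, point the existing repo at download.ceph.com
sed -i 's#http://ceph.com#http://download.ceph.com#g' /etc/yum.repos.d/ceph.repo
# then tell ceph-deploy not to rewrite the repos
ceph-deploy install --no-adjust-repos lceph-mon2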

Cheers

On Sun, May 7, 2017 at 3:51 PM, psuresh  wrote:

> Hi,
>
> When i run "ceph-deploy install lceph-mon2" from admin node i'm getting
> following error.   Any clue!
>
> [cide-lceph-mon2][DEBUG ] connected to host: cide-lceph-mon2
> [cide-lceph-mon2][DEBUG ] detect platform information from remote host
> [cide-lceph-mon2][DEBUG ] detect machine type
> [ceph_deploy.install][INFO  ] Distro info: CentOS Linux 7.3.1611 Core
> [cide-lceph-mon2][INFO  ] installing ceph on cide-lceph-mon2
> [cide-lceph-mon2][INFO  ] Running command: yum clean all
> [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks,
> priorities
> [cide-lceph-mon2][DEBUG ] Cleaning repos: Ceph Ceph-noarch base
> ceph-source epel extras updates
> [cide-lceph-mon2][DEBUG ] Cleaning up everything
> [cide-lceph-mon2][INFO  ] adding EPEL repository
> [cide-lceph-mon2][INFO  ] Running command: yum -y install epel-release
> [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks,
> priorities
> [cide-lceph-mon2][DEBUG ] Determining fastest mirrors
> [cide-lceph-mon2][DEBUG ]  * base: centos.excellmedia.net
> [cide-lceph-mon2][DEBUG ]  * epel: ftp.cuhk.edu.hk
> [cide-lceph-mon2][DEBUG ]  * extras: centos.excellmedia.net
> [cide-lceph-mon2][DEBUG ]  * updates: centos.excellmedia.net
> [cide-lceph-mon2][DEBUG ] Package epel-release-7-9.noarch already
> installed and latest version
> [cide-lceph-mon2][DEBUG ] Nothing to do
> [cide-lceph-mon2][INFO  ] Running command: yum -y install yum-priorities
> [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks,
> priorities
> [cide-lceph-mon2][DEBUG ] Loading mirror speeds from cached hostfile
> [cide-lceph-mon2][DEBUG ]  * base: centos.excellmedia.net
> [cide-lceph-mon2][DEBUG ]  * epel: ftp.cuhk.edu.hk
> [cide-lceph-mon2][DEBUG ]  * extras: centos.excellmedia.net
> [cide-lceph-mon2][DEBUG ]  * updates: centos.excellmedia.net
> [cide-lceph-mon2][DEBUG ] Package yum-plugin-priorities-1.1.31-40.el7.noarch
> already installed and latest version
> [cide-lceph-mon2][DEBUG ] Nothing to do
> [cide-lceph-mon2][DEBUG ] Configure Yum priorities to include obsoletes
> [cide-lceph-mon2][WARNIN] check_obsoletes has been enabled for Yum
> priorities plugin
> [cide-lceph-mon2][INFO  ] Running command: rpm --import
> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
> [cide-lceph-mon2][INFO  ] Running command: rpm -Uvh --replacepkgs
> http://ceph.com/rpm-hammer/el7/noarch/ceph-release-1-0.el7.noarch.rpm
> [cide-lceph-mon2][WARNIN] error: open of  failed: No such file or
> directory
> [cide-lceph-mon2][WARNIN] error: open of Index failed: No
> such file or directory
> [cide-lceph-mon2][WARNIN] error: open of of failed: No such file or
> directory
> [cide-lceph-mon2][WARNIN] error: open of /rpm-hammer/
> failed: No such file or directory
> [cide-lceph-mon2][WARNIN] error: open of  directory
> [cide-lceph-mon2][WARNIN] error: open of bgcolor=white> failed: No such
> file or directory
> [cide-lceph-mon2][WARNIN] error: open of Index failed: No such file or
> directory
> [cide-lceph-mon2][WARNIN] error: open of of failed: No such file or
> directory
> [cide-lceph-mon2][WARNIN] error: open of /rpm-hammer/ failed: No such file or directory
> [cide-lceph-mon2][WARNIN] error: open of href=../>../ failed: No such
> file or directory
> [cide-lceph-mon2][WARNIN] error: open of  directory
> [cide-lceph-mon2][WARNIN] error: open of href=el6/>el6/ failed: No
> such file or directory
> [cide-lceph-mon2][WARNIN] error: open of 24-Apr-2016 failed: No such file
> or directory
> [cide-lceph-mon2][WARNIN] error: open of 00:05 failed: No such file or
> directory
> [cide-lceph-mon2][WARNIN] error: -: not an rpm package (or package
> manifest):
> [cide-lceph-mon2][WARNIN] error: open of  directory
> [cide-lceph-mon2][WARNIN] error: open of href=el7/>el7/ failed: No
> such file or directory
> [cide-lceph-mon2][WARNIN] error: open of 29-Aug-2016 failed: No such file
> or directory
> [cide-lceph-mon2][WARNIN] error: open of 11:53 failed: No such file or
> directory
> [cide-lceph-mon2][WARNIN] error: -: not an rpm package (or package
> manifest):
> [cide-lceph-mon2][WARNIN] error: open of  directory
> [cide-lceph-mon2][WARNIN] error: open of href=fc20/>fc20/ failed: No
> such file or directory
> [cide-lceph-mon2][WARNIN] error: open of 07-Apr-2015 failed: No such file
> or directory
> [cide-lceph-mon2][WARNIN] error: open of 19:21 failed: No such file or
> directory
> [cide-lceph-mon2][WARNIN] error: -: not an rpm package (or package
> manifest):
> [cide-lceph-mon2][WARNIN] error: open of  directory
> [cide-lceph-mon2][WARNIN] error: open of href=rhel6/

[ceph-users] Read from Replica Osds?

2017-05-08 Thread Mehmet
Hi,

I thought that clients also read from Ceph replicas. Sometimes I read on the
web that reads only happen from the primary PG, just like Ceph handles
writes... so what is true?

Greetz
Mehmet
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read from Replica Osds?

2017-05-08 Thread David Turner
Reads will always happen on the Primary OSD for the PG.  Writes are
initially written to the primary OSD, but the write is not ack'd until the
write is completed on ALL secondaries.  I make that distinction because if
you have size 3 and min_size 2, the write will not come back unless all 3
OSDs have written the data.  The min_size is there for security so that
writes do not happen in your cluster if you do not have at least min_size
OSDs active in your PG.  It is not a setting to dictate how many copies
need to be written before a write is successful.
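
For example, you can check or change this per pool (the pool name is just an
example):

ceph osd pool get rbd size
ceph osd pool get rbd min_size
ceph osd pool set rbd min_size 2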

On Mon, May 8, 2017 at 12:04 PM Mehmet  wrote:

> Hi,
>
> I thought that Clients do also reads from ceph replicas. Sometimes i Read
> in the web that this does only happens from the primary pg like how ceph
> handle writes... so what is True?
>
> Greetz
> Mehmet
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read from Replica Osds?

2017-05-08 Thread Jason Dillaman
librbd can optionally read from replicas for snapshots and parent
images (i.e. known read-only data). This is controlled via the
following configuration options:

rbd_balance_snap_reads
rbd_localize_snap_reads
rbd_balance_parent_reads
rbd_localize_parent_reads

Direct users of the librados API can also utilize the
LIBRADOS_OPERATION_BALANCE_READS and LIBRADOS_OPERATION_LOCALIZE_READS
flags to control this behavior.
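
For example, a librbd client could enable them in the [client] section of its
ceph.conf - a sketch; only turn on the behavior you actually need:

[client]
    rbd balance snap reads = true
    rbd localize parent reads = true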

On Mon, May 8, 2017 at 12:04 PM, Mehmet  wrote:
> Hi,
>
> I thought that Clients do also reads from ceph replicas. Sometimes i Read in
> the web that this does only happens from the primary pg like how ceph handle
> writes... so what is True?
>
> Greetz
> Mehmet
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Performance after adding a node

2017-05-08 Thread Daniel Davidson
Our ceph system performs very poorly, or not at all, while the
remapping procedure is underway.  We are using replica 2 and the
following ceph tweaks while it is in progress:


 1013  ceph tell osd.* injectargs '--osd-recovery-max-active 20'
 1014  ceph tell osd.* injectargs '--osd-recovery-threads 20'
 1015  ceph tell osd.* injectargs '--osd-max-backfills 20'
 1016  ceph -w
 1017  ceph osd set noscrub
 1018  ceph osd set nodeep-scrub

After the remapping finishes, we set these back to default.

Are any of these causing our problems or is there another way to limit 
the impact of the remapping so that users do not think the system is 
down while we add more storage?



thanks,

Dan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Performance after adding a node

2017-05-08 Thread David Turner
WOW!!!  Those are some awfully high backfilling settings you have there.
They are 100% the reason that your customers think your system is down.
You're telling each OSD to be able to have 20 backfill operations running
at the exact same time.  I bet if you watch iostat -x 1 on one of your
nodes before and after you inject those settings, the disk utilization will
jump from a reasonable 40-70% all the way up to 100% as soon as those
settings are injected.

When you are backfilling, you are copying data from one drive to another.
Each increment of osd-max-backfills is another file the OSD tries to copy at
the same time.  These can be receiving data (writing to the disk) or moving
data off (reading from the disk followed by a delete).  So by having 20
backfills happening at a time, you are telling each disk to allow 20 files
to be written and/or read from it at the same time.  What happens to a disk
when you are copying 20 large files to it at a time?  All of them move
slower (a lot of it due to disk thrashing, with 20 threads all reading and
writing to different parts of the disk).

What you want to find is the point where your disks are usually around
80-90% utilized while backfilling, but not consistently 100%.  The easy way
to do that is to increase your osd-max-backfills by 1 or 2 at a time until
you see it go too high, and then back off.  I don't know many people that
go above 5 max backfills in a production cluster on spinning disks.
Usually the ones that do, do it temporarily while they know their cluster
isn't being utilized by customers much.

Personally I have never needed osd-recovery-threads or
osd-recovery-max-active; I've been able to tune my clusters using only
osd-max-backfills.  The lower you leave these the longer the backfill will
take, but the less impact your customers will notice.  I've found 3 to be a
generally safe number if customer IO is your priority, and 5 works well if
your customers can be OK with it being slow (but still usable)... but all of
this depends on your hardware and software use-cases.  Test it while
watching your disk utilizations, and test your application, while finding
the right number for your environment.
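
For example, a far more conservative starting point than 20/20/20, to be
adjusted while watching iostat -x 1:

ceph tell osd.* injectargs '--osd-max-backfills 3'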

Good Luck :)

On Mon, May 8, 2017 at 5:43 PM Daniel Davidson 
wrote:

> Our ceph system performs very poorly or not even at all while the
> remapping procedure is underway.  We are using replica 2 and the
> following ceph tweaks while it is in process:
>
>   1013  ceph tell osd.* injectargs '--osd-recovery-max-active 20'
>   1014  ceph tell osd.* injectargs '--osd-recovery-threads 20'
>   1015  ceph tell osd.* injectargs '--osd-max-backfills 20'
>   1016  ceph -w
>   1017  ceph osd set noscrub
>   1018  ceph osd set nodeep-scrub
>
> After the remapping finishes, we set these back to default.
>
> Are any of these causing our problems or is there another way to limit
> the impact of the remapping so that users do not think the system is
> down while we add more storage?
>
>
> thanks,
>
> Dan
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reg: Ceph-deploy install - failing

2017-05-08 Thread Vasu Kulkarni
The latest stable versions are jewel (LTS) and kraken:
http://docs.ceph.com/docs/master/releases/

If you want to install a stable version, use the --stable=jewel flag with the
ceph-deploy install command and it will get the packages from
download.ceph.com. It is well tested on the latest CentOS and Ubuntu.
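
For example (using the flag described above; the host name is just an example):

ceph-deploy install --stable=jewel lceph-mon2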

On Mon, May 8, 2017 at 8:14 AM, Curt  wrote:

> Hello,
>
> I don't use Cent, but I've seen the same thing with Ubuntu, I'm going to
> assume it's the same problem.  The repo url should be download.ceph.com,
> instead of just ceph.com, which is uses when it add's it to the repo.  My
> solution usually is, correct the repo URL to point to download.ceph.com
> and in the ceph-deploy add the tag --no-adjust-repo.
>
> Cheers
>
> On Sun, May 7, 2017 at 3:51 PM, psuresh  wrote:
>
>> Hi,
>>
>> When i run "ceph-deploy install lceph-mon2" from admin node i'm getting
>> following error.   Any clue!
>>
>> [cide-lceph-mon2][DEBUG ] connected to host: cide-lceph-mon2
>> [cide-lceph-mon2][DEBUG ] detect platform information from remote host
>> [cide-lceph-mon2][DEBUG ] detect machine type
>> [ceph_deploy.install][INFO  ] Distro info: CentOS Linux 7.3.1611 Core
>> [cide-lceph-mon2][INFO  ] installing ceph on cide-lceph-mon2
>> [cide-lceph-mon2][INFO  ] Running command: yum clean all
>> [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks,
>> priorities
>> [cide-lceph-mon2][DEBUG ] Cleaning repos: Ceph Ceph-noarch base
>> ceph-source epel extras updates
>> [cide-lceph-mon2][DEBUG ] Cleaning up everything
>> [cide-lceph-mon2][INFO  ] adding EPEL repository
>> [cide-lceph-mon2][INFO  ] Running command: yum -y install epel-release
>> [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks,
>> priorities
>> [cide-lceph-mon2][DEBUG ] Determining fastest mirrors
>> [cide-lceph-mon2][DEBUG ]  * base: centos.excellmedia.net
>> [cide-lceph-mon2][DEBUG ]  * epel: ftp.cuhk.edu.hk
>> [cide-lceph-mon2][DEBUG ]  * extras: centos.excellmedia.net
>> [cide-lceph-mon2][DEBUG ]  * updates: centos.excellmedia.net
>> [cide-lceph-mon2][DEBUG ] Package epel-release-7-9.noarch already
>> installed and latest version
>> [cide-lceph-mon2][DEBUG ] Nothing to do
>> [cide-lceph-mon2][INFO  ] Running command: yum -y install yum-priorities
>> [cide-lceph-mon2][DEBUG ] Loaded plugins: fastestmirror, langpacks,
>> priorities
>> [cide-lceph-mon2][DEBUG ] Loading mirror speeds from cached hostfile
>> [cide-lceph-mon2][DEBUG ]  * base: centos.excellmedia.net
>> [cide-lceph-mon2][DEBUG ]  * epel: ftp.cuhk.edu.hk
>> [cide-lceph-mon2][DEBUG ]  * extras: centos.excellmedia.net
>> [cide-lceph-mon2][DEBUG ]  * updates: centos.excellmedia.net
>> [cide-lceph-mon2][DEBUG ] Package yum-plugin-priorities-1.1.31-40.el7.noarch
>> already installed and latest version
>> [cide-lceph-mon2][DEBUG ] Nothing to do
>> [cide-lceph-mon2][DEBUG ] Configure Yum priorities to include obsoletes
>> [cide-lceph-mon2][WARNIN] check_obsoletes has been enabled for Yum
>> priorities plugin
>> [cide-lceph-mon2][INFO  ] Running command: rpm --import
>> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>> [cide-lceph-mon2][INFO  ] Running command: rpm -Uvh --replacepkgs
>> http://ceph.com/rpm-hammer/el7/noarch/ceph-release-1-0.el7.noarch.rpm
>> [cide-lceph-mon2][WARNIN] error: open of  failed: No such file or
>> directory
>> [cide-lceph-mon2][WARNIN] error: open of Index failed: No
>> such file or directory
>> [cide-lceph-mon2][WARNIN] error: open of of failed: No such file or
>> directory
>> [cide-lceph-mon2][WARNIN] error: open of /rpm-hammer/
>> failed: No such file or directory
>> [cide-lceph-mon2][WARNIN] error: open of > directory
>> [cide-lceph-mon2][WARNIN] error: open of bgcolor=white> failed: No such
>> file or directory
>> [cide-lceph-mon2][WARNIN] error: open of Index failed: No such file
>> or directory
>> [cide-lceph-mon2][WARNIN] error: open of of failed: No such file or
>> directory
>> [cide-lceph-mon2][WARNIN] error: open of /rpm-hammer/> failed: No such file or directory
>> [cide-lceph-mon2][WARNIN] error: open of href=../>../ failed: No such
>> file or directory
>> [cide-lceph-mon2][WARNIN] error: open of > directory
>> [cide-lceph-mon2][WARNIN] error: open of href=el6/>el6/ failed: No
>> such file or directory
>> [cide-lceph-mon2][WARNIN] error: open of 24-Apr-2016 failed: No such file
>> or directory
>> [cide-lceph-mon2][WARNIN] error: open of 00:05 failed: No such file or
>> directory
>> [cide-lceph-mon2][WARNIN] error: -: not an rpm package (or package
>> manifest):
>> [cide-lceph-mon2][WARNIN] error: open of > directory
>> [cide-lceph-mon2][WARNIN] error: open of href=el7/>el7/ failed: No
>> such file or directory
>> [cide-lceph-mon2][WARNIN] error: open of 29-Aug-2016 failed: No such file
>> or directory
>> [cide-lceph-mon2][WARNIN] error: open of 11:53 failed: No such file or
>> directory
>> [cide-lceph-mon2][WARNIN] error: -: not an rpm package (or package
>> manifest):
>> [cide-lceph-mon2][WARNIN] error: open of > directory
>> [cide-lceph-m

Re: [ceph-users] Read from Replica Osds?

2017-05-08 Thread Ben Hines
We write many millions of keys into RGW which will never be changed (until
they are deleted) -- it would be interesting if we could somehow indicate
this to RGW and enable reading those from the replicas as well.

-Ben

On Mon, May 8, 2017 at 10:18 AM, Jason Dillaman  wrote:

> librbd can optionally read from replicas for snapshots and parent
> images (i.e. known read-only data). This is controlled via the
> following configuration options:
>
> rbd_balance_snap_reads
> rbd_localize_snap_reads
> rbd_balance_parent_reads
> rbd_localize_parent_reads
>
> Direct users of the librados API can also utilize the
> LIBRADOS_OPERATION_BALANCE_READS and LIBRADOS_OPERATION_LOCALIZE_READS
> flags to control this behavior.
>
> On Mon, May 8, 2017 at 12:04 PM, Mehmet  wrote:
> > Hi,
> >
> > I thought that Clients do also reads from ceph replicas. Sometimes i
> Read in
> > the web that this does only happens from the primary pg like how ceph
> handle
> > writes... so what is True?
> >
> > Greetz
> > Mehmet
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Jason
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph memory overhead when used with KVM

2017-05-08 Thread nick
Hi,
I was using a standalone rbd image.
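
For reference, one way to read the RSS value of a single KVM process on the
physical host (a sketch; the pgrep pattern is just an example):

pid=$(pgrep -f 'qemu.*vm-100')        # find the qemu process of one VM
grep VmRSS /proc/$pid/status          # resident set size as seen by the kernel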

Cheers
Nick

On Monday, May 08, 2017 08:55:55 AM Jason Dillaman wrote:
> Thanks. One more question: was the image a clone or a stand-alone image?
> 
> On Fri, May 5, 2017 at 2:42 AM, nick  wrote:
> > Hi,
> > I used one of the fio example files and changed it a bit:
> > 
> > """
> > # This job file tries to mimic the Intel IOMeter File Server Access
> > Pattern
> > [global]
> > description=Emulation of Intel IOmeter File Server Access Pattern
> > randrepeat=0
> > filename=/root/test.dat
> > # IOMeter defines the server loads as the following:
> > # iodepth=1 Linear
> > # iodepth=4 Very Light
> > # iodepth=8 Light
> > # iodepth=64    Moderate
> > # iodepth=256   Heavy
> > iodepth=8
> > size=80g
> > direct=0
> > ioengine=libaio
> > 
> > [iometer]
> > stonewall
> > bs=4M
> > rw=randrw
> > 
> > [iometer_just_write]
> > stonewall
> > bs=4M
> > rw=write
> > 
> > [iometer_just_read]
> > stonewall
> > bs=4M
> > rw=read
> > """
> > 
> > Then let it run:
> > $> while true; do fio stress.fio; rm /root/test.dat; done
> > 
> > I had this running over a weekend.
> > 
> > Cheers
> > Sebastian
> > 
> > On Tuesday, May 02, 2017 02:51:06 PM Jason Dillaman wrote:
> >> Can you share the fio job file that you utilized so I can attempt to
> >> repeat locally?
> >> 
> >> On Tue, May 2, 2017 at 2:51 AM, nick  wrote:
> >> > Hi Jason,
> >> > thanks for your feedback. I did now some tests over the weekend to
> >> > verify
> >> > the memory overhead.
> >> > I was using qemu 2.8 (taken from the Ubuntu Cloud Archive) with librbd
> >> > 10.2.7 on Ubuntu 16.04 hosts. I suspected the ceph rbd cache to be the
> >> > cause of the overhead so I just generated a lot of IO with the help of
> >> > fio in the VMs (with a datasize of 80GB) . All VMs had 3GB of memory. I
> >> > had to run fio multiple times, before reaching high RSS values.
> >> > I also noticed that when using larger blocksizes during writes (like
> >> > 4M)
> >> > the memory overhead in the KVM process increased faster.
> >> > I ran several fio tests (one after another) and the results are:
> >> > 
> >> > KVM with writeback RBD cache: max. 85% memory overhead (2.5 GB
> >> > overhead)
> >> > KVM with writethrough RBD cache: max. 50% memory overhead
> >> > KVM without RBD caching: less than 10% overhead all the time
> >> > KVM with local storage (logical volume used): 8% overhead all the time
> >> > 
> >> > I did not reach those >200% memory overhead results that we see on our
> >> > live
> >> > cluster, but those virtual machines have a way longer uptime as well.
> >> > 
> >> > I also tried to reduce the RSS memory value with cache dropping on the
> >> > physical host and in the VM. Both did not lead to any change. A reboot
> >> > of
> >> > the VM also does not change anything (reboot in the VM, not a new KVM
> >> > process). The only way to reduce the RSS memory value is a live
> >> > migration
> >> > so far. Might this be a bug? The memory overhead sounds a bit too much
> >> > for me.
> >> > 
> >> > Best Regards
> >> > Sebastian
> >> > 
> >> > On Thursday, April 27, 2017 10:08:36 AM you wrote:
> >> >> I know we noticed high memory usage due to librados in the Ceph
> >> >> multipathd checker [1] -- the order of hundreds of megabytes. That
> >> >> client was probably nearly as trivial as an application can get and I
> >> >> just assumed it was due to large monitor maps being sent to the client
> >> >> for whatever reason. Since we changed course on our RBD iSCSI
> >> >> implementation, unfortunately the investigation into this high memory
> >> >> usage fell by the wayside.
> >> >> 
> >> >> [1]
> >> >> http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libm
> >> >> ult
> >> >> ip
> >> >> ath/checkers/rbd.c;h=9ea0572f2b5bd41b80bf2601137b74f92bdc7278;hb=HEAD
> >> >> 
> >> >> On Thu, Apr 27, 2017 at 5:26 AM, nick  wrote:
> >> >> > Hi Christian,
> >> >> > thanks for your answer.
> >> >> > The highest value I can see for a local storage VM in our
> >> >> > infrastructure
> >> >> > is a memory overhead of 39%. This is big, but the majority (>90%) of
> >> >> > our
> >> >> > local storage VMs are using less than 10% memory overhead.
> >> >> > For ceph storage based VMs this looks quite different. The highest
> >> >> > value I
> >> >> > can see currently is 244% memory overhead. So that specific
> >> >> > allocated
> >> >> > 3GB
> >> >> > memory VM is using now 10.3 GB RSS memory on the physical host. This
> >> >> > is
> >> >> > a
> >> >> > really huge value. In general I can see that the majority of the
> >> >> > ceph
> >> >> > based VMs has more than 60% memory overhead.
> >> >> > 
> >> >> > Maybe this is also a bug related to qemu+librbd. It would be just
> >> >> > nice
> >> >> > to
> >> >> > know if other people are seeing those high values as well.
> >> >> > 
> >> >> > Cheers
> >> >> > Sebastian
> >> >> > 
> >> >> > On Thursday, April 27, 2017 06:10:48 PM you wrote:
> >> >> >> Hello,
> >> >> >> 
> >> >> >> Definitely s