[ceph-users] seqwrite gets good performance but random rw gets worse

2016-05-25 Thread Ken Peng
Hello, We have a cluster with 20+ hosts and 200+ OSDs, each OSD on a 4 TB SATA disk, with no SSD cache. The OS is Ubuntu 16.04 LTS, ceph version 10.2.0. Both the data network and the cluster network are 10 Gbps. We run ceph as a block storage service only (rbd client within a VM). For testing within a VM with sysbench

Re: [ceph-users] seqwrite gets good performance but random rw gets worse

2016-05-25 Thread Adrian Saul
Are you using image-format 2 RBD images? We found a major performance hit using format 2 images under 10.2.0 today in some testing. When we switched to using format 1 images we literally got 10x random write IOPS performance (1600 IOPs up to 3 IOPS for the same test). From: ceph-users [

Re: [ceph-users] seqwrite gets good performance but random rw gets worse

2016-05-25 Thread Ken Peng
Yes, we run format 2 only. We will run a data disk with format 1 for testing, for comparison. I will tell you the results. Thanks. 2016-05-25 15:31 GMT+08:00 Adrian Saul : > > > Are you using image-format 2 RBD images? > > > > We found a major performance hit using format 2 images under 10.2.0 to
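A minimal A/B comparison like the one Ken proposes could be set up as follows (a hedged sketch only; the pool and image names are hypothetical and this needs a live cluster):

```shell
# Create one RBD image of each format in a test pool (names are examples)
rbd create testpool/fmt1 --size 10240 --image-format 1
rbd create testpool/fmt2 --size 10240 --image-format 2
# Confirm the formats before attaching the images to the benchmark VM
rbd info testpool/fmt1 | grep format
rbd info testpool/fmt2 | grep format
```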

Re: [ceph-users] Ceph crash, how to analyse and recover

2016-05-25 Thread Christian Balzer
Hello, On Wed, 25 May 2016 06:43:05 + Ammerlaan, A.J.G. wrote: > > Hello Ceph Users, > > We have a Ceph test cluster that we want to bring into production and > that will grow rapidly in the future. Ceph version: > ceph 0.80.7-2+deb8u1 > amd64distribut

Re: [ceph-users] restore OSD node After SO failure

2016-05-25 Thread Iban Cabrillo
Hi, could someone please give me some advice? Regards, I 2016-05-20 10:22 GMT+02:00 Iban Cabrillo : > Hi cephers, > Could someone tell me the right steps to bring an OSD > server back to life? The data and journal disks seem to be OK, but the dual SD slot for the SO > has failed. > > Regards, I > >

Re: [ceph-users] restore OSD node After SO failure

2016-05-25 Thread Oliver Dzombic
Hi Iban, the problem is that you leave us with zero useful information. If your dual SD slot for the SO (whatever that is supposed to be) failed, you should correct that. It does not sound like a ceph-related problem. Usually, to get ceph started, the official documentation is a good way to star

Re: [ceph-users] seqwrite gets good performance but random rw gets worse

2016-05-25 Thread Ken Peng
Hi, After comparing we found there is not much difference between format 1 and format 2; format 1 is even worse for randrw. format 1 result: # sysbench --test=fileio --file-total-size=5G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run sysbench 0.4.12: multi-thread
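For reference, the truncated sysbench 0.4 invocation above presumably expands to a prepare/run/cleanup cycle like this (a sketch; run inside the VM on the RBD-backed filesystem):

```shell
# Prepare the 5 GB working set, run the 300 s random read/write test, clean up
sysbench --test=fileio --file-total-size=5G prepare
sysbench --test=fileio --file-total-size=5G --file-test-mode=rndrw \
         --init-rng=on --max-time=300 --max-requests=0 run
sysbench --test=fileio --file-total-size=5G cleanup
```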

Re: [ceph-users] SSD randwrite performance

2016-05-25 Thread Max A. Krasilnikov
Hello! On Wed, May 25, 2016 at 11:45:29AM +0900, chibi wrote: > Hello, > On Tue, 24 May 2016 21:20:49 +0300 Max A. Krasilnikov wrote: >> Hello! >> >> I have cluster with 5 SSD drives as OSD backed by SSD journals, one per >> osd. One osd per node. >> > More details will help identify other

Re: [ceph-users] restore OSD node After SO failure

2016-05-25 Thread Christian Balzer
Hello, On Wed, 25 May 2016 10:20:35 +0200 Oliver Dzombic wrote: > Hi Iban, > > the problem is that you leave us with zero useful information. > > If your dual SD slot for the SO (whatever that is supposed to be) failed, > you should correct that. It does not sound like a ceph-related problem.

Re: [ceph-users] seqwrite gets good performance but random rw gets worse

2016-05-25 Thread Ken Peng
Hi again, with file-fsync-freq=1 (fsync after every write) versus file-fsync-freq=0 (sysbench never fsyncs), the results differ hugely (one is 382.94 Kb/sec, the other is 25.921 Mb/sec). What do you make of it? Thanks. file-fsync-freq=1, # sysbench --test=fileio --file-total-size=5G
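The effect of forcing a flush on every write can be reproduced outside sysbench and ceph with plain dd (a local illustration only; the file paths are arbitrary):

```shell
# Write 50 MiB without any explicit flushing
dd if=/dev/zero of=/tmp/nofsync.dat bs=1M count=50 2>/dev/null
# Same write, but with a synchronous flush per 1 MiB block (oflag=dsync),
# which serializes the I/O much like sysbench's file-fsync-freq=1
dd if=/dev/zero of=/tmp/dsync.dat bs=1M count=50 oflag=dsync 2>/dev/null
ls -l /tmp/nofsync.dat /tmp/dsync.dat
```

Timing the two runs (e.g. with `time`) typically shows a large gap on any storage with nontrivial flush latency.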

Re: [ceph-users] seqwrite gets good performance but random rw gets worse

2016-05-25 Thread Adrian Saul
Sync will always be lower: it makes each write wait for previous writes to complete before issuing more, so it effectively throttles writes to a queue depth of 1. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ken Peng Sent: Wednesday, 25 May 2016 6:36 PM To: ce
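A back-of-envelope illustration of the queue-depth-1 ceiling Adrian describes (the latency and request size here are assumed numbers, not measurements from Ken's cluster):

```shell
# At QD=1, IOPS = 1 / per-op latency; throughput = IOPS * request size.
# Assume 4 ms per fsync'd write and 16 KiB requests:
awk 'BEGIN { lat_ms = 4; bs_kib = 16;
             iops = 1000 / lat_ms;
             printf "%.0f IOPS, %.2f MiB/s\n", iops, iops * bs_kib / 1024 }'
# -> 250 IOPS, 3.91 MiB/s
```

With fsync only every 100 writes (the sysbench default), many writes can be in flight between flushes, which is why the throughput gap is so large.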

Re: [ceph-users] seqwrite gets good performance but random rw gets worse

2016-05-25 Thread Ken Peng
You are right. Anyway, our sysbench result for random R/W is much worse; sysbench by default sets file-fsync-freq=100. Do you have any ideas for debugging and tuning a ceph cluster for better random IO performance? Thanks.

[ceph-users] Q on calamari

2016-05-25 Thread Andrey Shevel
I tried to install calamari in ceph [ceph@ceph-admin ~]$ ceph -v ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269) and got [ceph@ceph-admin ~]$ ceph-deploy calamari connect osd1 [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf [ceph_deploy.cli][INF

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-05-25 Thread John Spray
On Mon, May 23, 2016 at 12:41 PM, Mathias Buresch wrote: > Please found the logs with higher debug level attached to this email. You've attached the log from your mon, but it's not your mon that's segfaulting, right? You can use normal ceph command line flags to crank up the verbosity on the CLI

Re: [ceph-users] restore OSD node After SO failure

2016-05-25 Thread Iban Cabrillo
Thanks Christian. The cluster is UP for now ;), so I will remove the OSD server (osd.x disk) from ceph and reinstall it from scratch. Regards, I 2016-05-25 10:29 GMT+02:00 Christian Balzer : > > Hello, > > On Wed, 25 May 2016 10:20:35 +0200 Oliver Dzombic wrote: > > > Hi Iban, > > > > the pro

Re: [ceph-users] radosgw hammer -> jewel upgrade (default zone & region config)

2016-05-25 Thread nick
I finally managed to fix the problem with the radosgw upgrade to jewel with the help of the script from Yehuda (https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/). I tested the whole upgrade on our staging cluster first. As already mentioned I removed the '_' in the "fil

[ceph-users] Replacing Initial-Mon

2016-05-25 Thread c...@dolphin-it.de
Hi! Our monitors need a hardware replacement and we would like to reinstall them. Can I shut them down one by one, add new hardware with the same IPv4 and IPv6 addresses, and redeploy them from our admin machine? Is there anything I missed? Something I need to pay attention to because they are the initi
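One-by-one replacement is the usual approach. A hedged outline only (it assumes ceph-deploy-managed monitors and systemd; verify quorum between every swap):

```shell
ceph mon stat                  # confirm full quorum before touching anything
systemctl stop ceph-mon@a      # take one monitor (mon.a here) offline
# ...replace hardware, reinstall the OS, keep the same IP addresses...
ceph-deploy mon create <host>  # redeploy the monitor from the admin machine
ceph quorum_status             # wait until it rejoins quorum, then do the next
```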

Re: [ceph-users] Removing objects and bucket after issues with placement group

2016-05-25 Thread Romero Junior
After a couple of days querying Google I still haven’t found a solution for this. I decided to add more outputs: http://pastebin.com/raw/mk9gthxT Basically the objects are listed and linked to that bucket, but I cannot remove or access them (see pastebin). Anyone? Kind regards, Romero Jun

Re: [ceph-users] using jemalloc in trusty

2016-05-25 Thread Andrei Mikhailovsky
Interesting. I switched to jemalloc about a month ago while running Hammer; after installing the library and using /etc/ld.so.preload I am seeing that all ceph-osd processes are indeed using the library. I upgraded to Jewel a few days ago and see the same picture: # time lsof |grep c

[ceph-users] Missing OSD daemons while they are in UP state.

2016-05-25 Thread Albert Archer
Hello All, Unfortunately my virtual ceph cluster (virtual machines on VMware ESXi) fell into a strange state. When I rebooted one of the ceph OSD machines (hostname=osd3), all of the OSD daemons related to that host went down (that's normal). But when that OSD host booted up, I couldn't bring the OSDs UP, because

Re: [ceph-users] using jemalloc in trusty

2016-05-25 Thread Alexandre DERUMIER
>>To me it looks like the library is being used, but please advise if it is >>otherwise. Can you do a "perf top" to see if mallocs go through tcmalloc or jemalloc? - Original message - From: "Andrei Mikhailovsky" To: "Joshua M. Boniface" Cc: "ceph-users" Sent: Wednesday 25 May 20
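Besides perf top, a quick non-sampling check is to look at a process's memory mappings; a sketch, using the current shell's PID as a stand-in for a real ceph-osd PID:

```shell
# Substitute a real ceph-osd PID for $$ on an OSD node (Linux only)
pid=$$
grep -E 'jemalloc|tcmalloc' /proc/$pid/maps || echo "neither allocator mapped"
```

If jemalloc appears in the maps but perf top still shows tcmalloc symbols, the library is loaded without actually servicing the allocations.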

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-05-25 Thread Mathias Buresch
I don't know what exactly is segfaulting. Here is the output with command line flags and gdb (I can't really spot errors in that output): # ceph -s --debug-monc=20 --debug-ms=20 2016-05-25 14:51:02.406135 7f188300a700 10 monclient(hunting): build_initial_monmap 2016-05-25 14:51:02.406444 7f1883

Re: [ceph-users] CEPH/CEPHFS upgrade questions (9.2.0 ---> 10.2.1)

2016-05-25 Thread Gregory Farnum
On Tue, May 24, 2016 at 9:54 PM, Goncalo Borges wrote: > Thank you Greg... > > There is one further thing which is not explained in the release notes and > that may be worthwhile to say. > > The rpm structure (for redhat compatible releases) changed in Jewel where > now there is a ( ceph + ceph-co

Re: [ceph-users] jewel 10.2.1 lttng & rbdmap.service

2016-05-25 Thread kefu chai
On Tue, May 24, 2016 at 5:23 AM, Max Vernimmen wrote: > Hi, > > I upgraded to 10.2.1 and noticed that lttng is a dependency for the RHEL > packages in that version. Since I have no intention of doing traces on ceph > I find myself wondering why ceph is now requiring these libraries to be > instal

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-05-25 Thread John Spray
On Wed, May 25, 2016 at 3:00 PM, Mathias Buresch wrote: > I don't know what exactly is segfaulting. > > Here is the output with command line flags and gdb (I can't really > spot errors in that output): > > # ceph -s --debug-monc=20 --debug-ms=20 > 2016-05-25 14:51:02.406135 7f188300a700 10 moncl

Re: [ceph-users] Blocked ops, OSD consuming memory, hammer

2016-05-25 Thread Gregory Farnum
On Tue, May 24, 2016 at 11:19 PM, Heath Albritton wrote: > Not going to attempt threading and apologies for the two messages on > the same topic. Christian is right, though. 3 nodes per tier, 8 SSDs > per node in the cache tier, 12 spinning disks in the cold tier. 10GE > client network with a s

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-05-25 Thread Mathias Buresch
There wasn't a package ceph-debuginfo available (maybe because I am running Ubuntu). I have installed these: * ceph-dbg * librados2-dbg There are also ceph-mds-dbg, ceph-fs-common-dbg and so on. But now more information is provided by the gdb output :) (gdb) run /usr/bin/ceph status --de

Re: [ceph-users] Error 400 Bad Request when accessing Ceph

2016-05-25 Thread Andrey Ptashnik
Hi Team, I'm re-raising the question below in case someone has come across this error. Regards, Andrey Ptashnik On 5/24/16, 11:14 AM, "ceph-users on behalf of Andrey Ptashnik" wrote: >Hello Team, > >I set up a Ceph cluster version 0.94.5 for testing and am trying to connect

Re: [ceph-users] Missing OSD daemons while they are in UP state.

2016-05-25 Thread Albert Archer
Is this not strange enough??? On May 25, 2016 3:38 PM, "Albert Archer" wrote: > Hello All, > Unfortunately my virtual ceph cluster (virtual machines on VMware ESXi) > fell into a strange state. > When I rebooted one of the ceph OSD machines (hostname=osd3), all of the OSD daemons > related to that host wer

Re: [ceph-users] jewel 10.2.1 lttng & rbdmap.service

2016-05-25 Thread Ken Dreyer
On Wed, May 25, 2016 at 8:00 AM, kefu chai wrote: > On Tue, May 24, 2016 at 5:23 AM, Max Vernimmen > wrote: >> Hi, >> >> I upgraded to 10.2.1 and noticed that lttng is a dependency for the RHEL >> packages in that version. Since I have no intention of doing traces on ceph >> I find myself wonder

Re: [ceph-users] CEPH/CEPHFS upgrade questions (9.2.0 ---> 10.2.1)

2016-05-25 Thread Ken Dreyer
On Wed, May 25, 2016 at 8:05 AM, Gregory Farnum wrote: > On Tue, May 24, 2016 at 9:54 PM, Goncalo Borges > wrote: >> Thank you Greg... >> >> There is one further thing which is not explained in the release notes and >> that may be worthwhile to say. >> >> The rpm structure (for redhat compatible

Re: [ceph-users] Blocked ops, OSD consuming memory, hammer

2016-05-25 Thread Heath Albritton
I fear I've hit a bug as well. Considering an upgrade to the latest release of hammer. Somewhat concerned that I may lose those PGs. -H > On May 25, 2016, at 07:42, Gregory Farnum wrote: > >> On Tue, May 24, 2016 at 11:19 PM, Heath Albritton wrote: >> Not going to attempt threading and apo

[ceph-users] Ceph Tech Talk Tomorrow

2016-05-25 Thread Patrick McGarry
Hey cephers, Just a reminder that this month's Ceph Tech Talk is tomorrow at 1p EST. This month we will be having a tutorial on the Ceph Benchmarking Tool (CBT). http://ceph.com/ceph-tech-talks Join us for a rundown of the current status of Ceph performance profiling.

[ceph-users] Best CLI or GUI client for Ceph and S3 protocol

2016-05-25 Thread Andrey Ptashnik
Team, I wanted to ask if some of you are using CLI or GUI based S3 browsers/clients with Ceph, and which are the best ones? Regards, Andrey Ptashnik

Re: [ceph-users] Best CLI or GUI client for Ceph and S3 protocol

2016-05-25 Thread Brian Haymore
I'd give a solid plug for 'rclone' (http://rclone.org). We have been using it for some time and the inherent parallelism has shown great performance. It is a data-moving tool, and currently does not fill the need for setting ACLs on content, so to a degree s3cmd is complementary (or your ot
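For anyone trying rclone against a Ceph radosgw endpoint, the remote definition looks roughly like this (all values are hypothetical placeholders; the exact keys vary by rclone version, so check its S3 backend docs):

```ini
# ~/.config/rclone/rclone.conf
[ceph]
type = s3
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
endpoint = http://rgw.example.com:7480
```

`rclone lsd ceph:` would then list the buckets reachable through that endpoint.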

[ceph-users] ceph-disk: Error: No cluster conf found in /etc/ceph with fsid

2016-05-25 Thread Albert.K.Chong (git.usca07.Newegg) 22201
Hi, I have followed the storage cluster quick start instructions on my CentOS 7 machine more than 10 times, including complete cleaning and reinstallation. I fail at the same step every time: "ceph-deploy osd activate ...". On the last try I just created the disk on the local drive to avoid some permission warning and r

Re: [ceph-users] Blocked ops, OSD consuming memory, hammer

2016-05-25 Thread Shinobu Kinjo
What do the following show you? ceph pg 12.258 list_unfound // maybe hung... ceph pg dump_stuck and enable debugging on osd.4: debug osd = 20 debug filestore = 20 debug ms = 1 But honestly my best bet is to upgrade to the latest release; it could save you a lot of trouble. - Shinobu On Thu, May 26, 20
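The suggested debug levels go into ceph.conf on the affected node, scoped to the single OSD so the logs stay manageable (then restart the daemon, or inject the settings at runtime):

```ini
[osd.4]
debug osd = 20
debug filestore = 20
debug ms = 1
```

Alternatively, `ceph tell osd.4 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'` applies the same levels without a restart.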

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-05-25 Thread Brad Hubbard
Hi John, This looks a lot like http://tracker.ceph.com/issues/12417 which is, of course, fixed. Worth gathering debug-auth=20 ? Maybe on the MON end as well? Cheers, Brad - Original Message - > From: "Mathias Buresch" > To: jsp...@redhat.com > Cc: ceph-us...@ceph.com > Sent: Thursday,

Re: [ceph-users] Best CLI or GUI client for Ceph and S3 protocol

2016-05-25 Thread David Wang
Hi Andrey, I usually use s3cmd as a CLI and Sree as a GUI. 2016-05-26 5:11 GMT+08:00 Andrey Ptashnik : > Team, > > I wanted to ask if some of you are using CLI or GUI based S3 > browsers/clients with Ceph and what are the best ones? > > Regards, > > Andrey Ptashnik > >

Re: [ceph-users] Blocked ops, OSD consuming memory, hammer

2016-05-25 Thread Christian Balzer
Hello, On Thu, 26 May 2016 07:26:19 +0900 Shinobu Kinjo wrote: > What will the followings show you? > > ceph pg 12.258 list_unfound // maybe hung... > ceph pg dump_stuck > > and enable debug to osd.4 > > debug osd = 20 > debug filestore = 20 > debug ms = 1 > > But honestly my best bet is to

Re: [ceph-users] Best CLI or GUI client for Ceph and S3 protocol

2016-05-25 Thread George Mihaiescu
We use the AWS CLI. > On May 25, 2016, at 5:11 PM, Andrey Ptashnik wrote: > > Team, > > I wanted to ask if some of you are using CLI or GUI based S3 browsers/clients > with Ceph and what are the best ones? > > Regards, > > Andrey Ptashnik > >

Re: [ceph-users] Falls cluster then one node switch off

2016-05-25 Thread Christian Balzer
Hello, I've expanded the cache-tier in my test cluster from a single node to 2, increased the pool size from 1 to 2, then waited until all the data was rebalanced/duplicated and the cluster was healthy again. Then I stopped all OSDs on one of the 2 nodes and nothing other than degraded/undersiz

Re: [ceph-users] NVRAM cards as OSD journals

2016-05-25 Thread Christian Balzer
Hello, On Tue, 24 May 2016 14:30:41 + Somnath Roy wrote: > If you are not tweaking ceph.conf settings when using NVRAM as the journal, > I would highly recommend trying the following. > > 1. Since you have a very small journal, try to reduce > filestore_max_sync_interval/min_sync_interval signi
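A hedged example of the kind of change Somnath suggests; the values below are illustrative starting points, not recommendations, and should be benchmarked against your own workload:

```ini
[osd]
# Flush the (small) NVRAM journal to the filestore more aggressively,
# so the journal never fills and stalls writes
filestore_min_sync_interval = 0.1
filestore_max_sync_interval = 1
```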

Re: [ceph-users] Missing OSD daemons while they are in UP state.

2016-05-25 Thread Albert Archer
Can anybody help me??? On Wed, May 25, 2016 at 7:45 PM, Albert Archer wrote: > Is this not strange enough??? > On May 25, 2016 3:38 PM, "Albert Archer" wrote: > >> Hello All, >> Unfortunately my virtual ceph cluster (virtual machines on VMware ESXi) >> fell into a strange state. >> When I rebo

Re: [ceph-users] Missing OSD daemons while they are in UP state.

2016-05-25 Thread Christian Balzer
Hello, On Thu, 26 May 2016 10:12:54 +0430 Albert Archer wrote: > Can anybody give me help ??? > Being proactive would likely be all the help you need. If you're searching for systemd problems in this ML, the Ceph changelogs and google in general you will find many problems, some of them very mu

Re: [ceph-users] Missing OSD daemons while they are in UP state.

2016-05-25 Thread Albert Archer
Thanks, I use the jewel (10.2.1) release on ubuntu 16.04 (kernel 4.4). On Thu, May 26, 2016 at 10:25 AM, Christian Balzer wrote: > > Hello, > > On Thu, 26 May 2016 10:12:54 +0430 Albert Archer wrote: > > > Can anybody give me help ??? > > > Being proactive would likely be all the help you need. > >

Re: [ceph-users] Replacing Initial-Mon

2016-05-25 Thread Christian Balzer
Hello, On Wed, 25 May 2016 10:32:05 + c...@dolphin-it.de wrote: > > > Hi! > > Our monitors need a hardware replacement and we would like to reinstall > them. Can I shut them down one-by-one, add new hardware with same IPv4 > and IPv6 and redeploy dem from our admin-machine? > > Is there a

Re: [ceph-users] ceph-disk: Error: No cluster conf found in /etc/ceph with fsid

2016-05-25 Thread Christian Balzer
Hello, if you google the EXACT subject of your mail you will find several threads about this, the first one most likely exactly what you're seeing (having a not fully cleaned/purged install leftover). http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-May/040128.html Christian On Wed, 25