Re: [ceph-users] SSD OSDs crashing after upgrade to 12.2.7

2018-09-06 Thread Caspar Smit
Hi,

These reports are kind of worrying, since we have a 12.2.5 cluster waiting
to upgrade too. Did you have any luck with upgrading to 12.2.8, or is the
behavior still the same?
Is there a bug tracker issue for this?

Kind regards,
Caspar

On Tue, 4 Sep 2018 at 09:59, Wolfgang Lendl <
wolfgang.le...@meduniwien.ac.at> wrote:

> is downgrading from 12.2.7 to 12.2.5 an option? - I'm still suffering
> from frequent OSD crashes.
> my hopes are with 12.2.9 - but hope hasn't always been my best strategy
>
> br
> wolfgang
>
> On 2018-08-30 19:18, Alfredo Deza wrote:
> > On Thu, Aug 30, 2018 at 5:24 AM, Wolfgang Lendl
> >  wrote:
> >> Hi Alfredo,
> >>
> >>
> >> caught some logs:
> >> https://pastebin.com/b3URiA7p
> > That looks like there is an issue with bluestore. Maybe Radoslaw or
> > Adam might know a bit more.
> >
> >
> >> br
> >> wolfgang
> >>
> >> On 2018-08-29 15:51, Alfredo Deza wrote:
> >>> On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl
> >>>  wrote:
>  Hi,
> 
>  after upgrading my ceph clusters from 12.2.5 to 12.2.7  I'm
> experiencing random crashes from SSD OSDs (bluestore) - it seems that HDD
> OSDs are not affected.
>  I destroyed and recreated some of the SSD OSDs which seemed to help.
> 
>  this happens on centos 7.5 (different kernels tested)
> 
>  /var/log/messages:
>  Aug 29 10:24:08  ceph-osd: *** Caught signal (Segmentation fault) **
>  Aug 29 10:24:08  ceph-osd: in thread 7f8a8e69e700
> thread_name:bstore_kv_final
>  Aug 29 10:24:08  kernel: traps: bstore_kv_final[187470] general
> protection ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in
> libtcmalloc.so.4.4.5[7f8a997a8000+46000]
>  Aug 29 10:24:08  systemd: ceph-osd@2.service: main process exited,
> code=killed, status=11/SEGV
>  Aug 29 10:24:08  systemd: Unit ceph-osd@2.service entered failed
> state.
>  Aug 29 10:24:08  systemd: ceph-osd@2.service failed.
>  Aug 29 10:24:28  systemd: ceph-osd@2.service holdoff time over,
> scheduling restart.
>  Aug 29 10:24:28  systemd: Starting Ceph object storage daemon osd.2...
>  Aug 29 10:24:28  systemd: Started Ceph object storage daemon osd.2.
>  Aug 29 10:24:28  ceph-osd: starting osd.2 at - osd_data
> /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
>  Aug 29 10:24:35  ceph-osd: *** Caught signal (Segmentation fault) **
>  Aug 29 10:24:35  ceph-osd: in thread 7f5f1e790700
> thread_name:tp_osd_tp
>  Aug 29 10:24:35  kernel: traps: tp_osd_tp[186933] general protection
> ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in
> libtcmalloc.so.4.4.5[7f5f430cd000+46000]
>  Aug 29 10:24:35  systemd: ceph-osd@0.service: main process exited,
> code=killed, status=11/SEGV
>  Aug 29 10:24:35  systemd: Unit ceph-osd@0.service entered failed
> state.
>  Aug 29 10:24:35  systemd: ceph-osd@0.service failed
> >>> These systemd messages aren't usually helpful, try poking around
> >>> /var/log/ceph/ for the output on that one OSD.
> >>>
> >>> If those logs aren't useful either, try bumping up the verbosity (see
> >>>
> http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time
> >>> )
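For reference, a minimal sketch of what bumping the verbosity could look like
for one OSD (the subsystems and levels here are a judgment call, not something
taken from this thread). Either set it in ceph.conf before restarting the OSD:

[osd]
debug osd = 20
debug bluestore = 20
debug bdev = 20

or inject it into the running daemon:

ceph tell osd.2 injectargs '--debug_osd 20 --debug_bluestore 20 --debug_bdev 20'

then reproduce the crash and check /var/log/ceph/ceph-osd.2.log.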
>  did I hit a known issue?
>  any suggestions are highly appreciated
> 
> 
>  br
>  wolfgang
> 
> 
> 
>  ___
>  ceph-users mailing list
>  ceph-users@lists.ceph.com
>  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> >> --
> >> Wolfgang Lendl
> >> IT Systems & Communications
> >> Medizinische Universität Wien
> >> Spitalgasse 23 / BT 88 /Ebene 00
> >> A-1090 Wien
> >> Tel: +43 1 40160-21231
> >> Fax: +43 1 40160-921200
> >>
> >>
>
> --
> Wolfgang Lendl
> IT Systems & Communications
> Medizinische Universität Wien
> Spitalgasse 23 / BT 88 /Ebene 00
> A-1090 Wien
> Tel: +43 1 40160-21231
> Fax: +43 1 40160-921200
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-community] How to setup Ceph OSD auto boot up on node reboot

2018-09-06 Thread Mateusz Skala (UST, POL)
Hi,

If it’s a problem with the UUID (type code) of the partition, you can use these commands:



sgdisk --change-name={journal_partition_number}:'ceph journal'
--typecode={journal_partition_number}:45b0969e-9b03-4f30-b4c6-b4b80ceff106
--mbrtogpt -- /dev/{journal_device}

sgdisk --change-name={data_partition_number}:'ceph data'
--typecode={data_partition_number}:4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D
--mbrtogpt -- /dev/{data_device}



Just adjust {journal_partition_number}, {journal_device}, 
{data_partition_number} and {data_device}.



After this, ‘ceph-disk list’ should show the proper list of your devices.
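If the OSDs still do not start automatically after fixing the type codes,
re-triggering udev (or activating everything once by hand) usually brings them
up; a minimal sketch, assuming ceph-disk-based OSDs:

udevadm trigger --subsystem-match=block --action=add
ceph-disk activate-all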



Regards

Mateusz



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David 
Turner
Sent: Wednesday, September 5, 2018 8:12 PM
To: Pardhiv Karri 
Cc: ceph-users ; ceph-commun...@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-community] How to setup Ceph OSD auto boot up 
on node reboot



The magic sauce to get Filestore OSDs to start on a node reboot is to make sure
that all of your udev magic is correct.  In particular you need to have the
correct partition type GUID (UUID) set for all partitions.  I haven't dealt with
it in a long time, but I've written up a few good ML responses about it.
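A quick way to check this (a sketch; the device and partition number are
placeholders, and the rules path is the usual Debian/Ubuntu location):

sgdisk --info=1 /dev/sdb | grep 'Partition GUID code'
# should show the 'ceph data' / 'ceph journal' type GUID that the
# ceph-disk udev rules match on:
cat /lib/udev/rules.d/95-ceph-osd.rules
# on systemd hosts you can also pin individual OSD units explicitly:
systemctl enable ceph-osd@1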



On Tue, Sep 4, 2018 at 12:38 PM Pardhiv Karri
<meher4in...@gmail.com> wrote:

   Hi,



   I created a ceph cluster manually (not using ceph-deploy). When I reboot
the node, the OSDs don't come back up because the OS doesn't know that it needs
to bring up the OSDs. I am running this on Ubuntu 16.04. Is there a standardized
way to initiate a ceph osd start on node reboot?

   "sudo start ceph-osd-all" isn't working well, and I don't like the idea of
running "sudo start ceph-osd id=1" for each OSD in an rc file. I need to do this
for both Hammer (Ubuntu 14.04) and Luminous (Ubuntu 16.04).

   --

   Thanks,

   Pardhiv Karri
   "Rise and Rise again until LAMBS become LIONS"



   ___
   Ceph-community mailing list
   ceph-commun...@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow requests from bluestore osds

2018-09-06 Thread Marc Schöchlin
Hello Uwe,

as described in my mail we are running 4.13.0-39.

In conjunction with some later mails in this thread, it seems that this problem
might be related to OS/microcode (Spectre) updates.
I am planning a Ceph/Ubuntu upgrade next week for various reasons;
let's see what happens.
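For reference, testing Uwe's suggestion on Ubuntu would look roughly like this
(a sketch; disabling the mitigations is a security trade-off and only meant for
a controlled test):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet noibrs noibpb nopti nospectre_v2"
update-grub && reboot
cat /proc/cmdline   # confirm the options are active after boot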

Regards Marc


On 05.09.2018 at 20:24, Uwe Sauter wrote:
> I'm also experiencing slow requests though I cannot point it to scrubbing.
>
> Which kernel do you run? Would you be able to test against the same kernel 
> with Spectre/Meltdown mitigations disabled ("noibrs noibpb nopti 
> nospectre_v2" as boot option)?
>
> Uwe
>
> On 05.09.18 at 19:30, Brett Chancellor wrote:
>> Marc,
>>    As with you, this problem manifests itself only when the bluestore OSD is 
>> involved in some form of deep scrub.  Anybody have any insight on what might 
>> be causing this?
>>
>> -Brett
>>
>> On Mon, Sep 3, 2018 at 4:13 AM, Marc Schöchlin wrote:
>>
>>     Hi,
>>
>>     we are also experiencing this type of behavior for some weeks on our not
>>     so performance critical hdd pools.
>>     We haven't spent so much time on this problem, because there are
>>     currently more important tasks - but here are a few details:
>>
>>     Running the following loop results in the following output:
>>
>>     while true; do ceph health|grep -q HEALTH_OK || (date;  ceph health
>>     detail); sleep 2; done
>>
>>     Sun Sep  2 20:59:47 CEST 2018
>>     HEALTH_WARN 4 slow requests are blocked > 32 sec
>>     REQUEST_SLOW 4 slow requests are blocked > 32 sec
>>      4 ops are blocked > 32.768 sec
>>      osd.43 has blocked requests > 32.768 sec
>>     Sun Sep  2 20:59:50 CEST 2018
>>     HEALTH_WARN 4 slow requests are blocked > 32 sec
>>     REQUEST_SLOW 4 slow requests are blocked > 32 sec
>>      4 ops are blocked > 32.768 sec
>>      osd.43 has blocked requests > 32.768 sec
>>     Sun Sep  2 20:59:52 CEST 2018
>>     HEALTH_OK
>>     Sun Sep  2 21:00:28 CEST 2018
>>     HEALTH_WARN 1 slow requests are blocked > 32 sec
>>     REQUEST_SLOW 1 slow requests are blocked > 32 sec
>>      1 ops are blocked > 32.768 sec
>>      osd.41 has blocked requests > 32.768 sec
>>     Sun Sep  2 21:00:31 CEST 2018
>>     HEALTH_WARN 7 slow requests are blocked > 32 sec
>>     REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>      7 ops are blocked > 32.768 sec
>>      osds 35,41 have blocked requests > 32.768 sec
>>     Sun Sep  2 21:00:33 CEST 2018
>>     HEALTH_WARN 7 slow requests are blocked > 32 sec
>>     REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>      7 ops are blocked > 32.768 sec
>>      osds 35,51 have blocked requests > 32.768 sec
>>     Sun Sep  2 21:00:35 CEST 2018
>>     HEALTH_WARN 7 slow requests are blocked > 32 sec
>>     REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>      7 ops are blocked > 32.768 sec
>>      osds 35,51 have blocked requests > 32.768 sec
>>
>>     Our details:
>>
>>    * system details:
>>      * Ubuntu 16.04
>>       * Kernel 4.13.0-39
>>       * 30 * 8 TB Disk (SEAGATE/ST8000NM0075)
>>       * 3* Dell Power Edge R730xd (Firmware 2.50.50.50)
>>         * Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>>         * 2*10GBITS SFP+ Network Adapters
>>         * 192GB RAM
>>       * Pools are using replication factor 3, 2MB object size,
>>         85% write load, 1700 write IOPS/sec
>>         (ops mainly between 4k and 16k size), 300 read IOPS/sec
>>    * we have the impression that this appears on deep-scrub/scrub
>>      activity.
>>    * Ceph 12.2.5; we already played with the following OSD settings
>>      (our assumption was that the problem is related to rocksdb
>>      compaction)
>>      bluestore cache kv max = 2147483648
>>      bluestore cache kv ratio = 0.9
>>      bluestore cache meta ratio = 0.1
>>      bluestore cache size hdd = 10737418240
>>    * this type problem only appears on hdd/bluestore osds, ssd/bluestore
>>      osds did never experienced that problem
>>    * the system is healthy, no swapping, no high load, no errors in dmesg
>>
>>     I attached a log excerpt of osd.35 - probably this is useful for
>>     investigating the problem if someone has deeper bluestore knowledge.
>>     (slow requests appeared on Sun Sep  2 21:00:35)
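A sketch of how the blocked ops on a flagged OSD can be inspected from the node
that hosts it (assuming the default admin socket setup):

ceph daemon osd.35 dump_ops_in_flight
ceph daemon osd.35 dump_historic_ops   # slowest recent ops, with per-step timestamps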
>>
>>     Regards
>>     Marc
>>
>>
>>     On 02.09.2018 at 15:50, Brett Chancellor wrote:
>>     > The warnings look like this:
>>     > 6 ops are blocked > 32.768 sec on osd.219
>>     > 1 osds have slow requests
>>     >
>>     > On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza wrote:
>>     >
>>     >     On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
>>     >     <bchancel...@salesforce.com> wrote:
>> 

Re: [ceph-users] Ceph Luminous - journal setting

2018-09-06 Thread M Ranga Swami Reddy
Thank you.
Yep, I am using the bluestore backend with the Luminous version.
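For what it's worth: with bluestore there is no separate journal any more; the
rough equivalent of "journal on SSD" is putting block.db (and optionally
block.wal) on the SSD. A minimal sketch with ceph-volume, device names being
placeholders:

ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/sdb1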

Thanks
Swami
On Tue, Sep 4, 2018 at 9:04 PM David Turner  wrote:
>
> Are you planning on using bluestore or filestore?  The settings for filestore 
> haven't changed.  If you're planning to use bluestore there is a lot of 
> documentation in the ceph docs as well as a wide history of questions like 
> this on the ML.
>
> On Mon, Sep 3, 2018 at 5:24 AM M Ranga Swami Reddy  
> wrote:
>>
>> Hi - I am using the Ceph Luminous release. What OSD journal settings
>> are needed here?
>> NOTE: I used SSDs for journal till Jewel release.
>>
>> Thanks
>> Swami
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.8 Luminous released

2018-09-06 Thread Abhishek Lekshmanan
Adrian Saul  writes:

> Can I confirm if this bluestore compression assert issue is resolved in 
> 12.2.8?
>
> https://tracker.ceph.com/issues/23540

The PR from the backport issue is in the release notes, i.e.
pr#22909, which references two tracker issues. Unfortunately, the script
that generates the release notes follows only one tracker issue back to its
original non-backport tracker, which is why only #21480 was mentioned.

Thanks for noticing; I'll fix the script to follow multiple issues.
>
> I notice that it has a backport that is listed against 12.2.8 but there is no 
> mention of that issue or backport listed in the release notes.
>
>
>> -Original Message-
>> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
>> ow...@vger.kernel.org] On Behalf Of Abhishek Lekshmanan
>> Sent: Wednesday, 5 September 2018 2:30 AM
>> To: ceph-de...@vger.kernel.org; ceph-us...@ceph.com; ceph-
>> maintain...@ceph.com; ceph-annou...@ceph.com
>> Subject: v12.2.8 Luminous released
>>
>>
>> We're glad to announce the next point release in the Luminous v12.2.X stable
>> release series. This release contains a range of bugfixes and stability
>> improvements across all the components of ceph. For detailed release notes
>> with links to tracker issues and pull requests, refer to the blog post at
>> http://ceph.com/releases/v12-2-8-released/
>>
>> Upgrade Notes from previous luminous releases
>> -
>>
>> When upgrading from v12.2.5 or v12.2.6 please note that upgrade caveats
>> from
>> 12.2.5 will apply to any _newer_ luminous version including 12.2.8. Please
>> read the notes at https://ceph.com/releases/12-2-7-luminous-
>> released/#upgrading-from-v12-2-6
>>
>> For the cluster that installed the broken 12.2.6 release, 12.2.7 fixed the
>> regression and introduced a workaround option `osd distrust data digest =
>> true`, but 12.2.7 clusters still generated health warnings like ::
>>
>>   [ERR] 11.288 shard 207: soid
>>   11:1155c332:::rbd_data.207dce238e1f29.0527:head
>> data_digest
>>   0xc8997a5b != data_digest 0x2ca15853
>>
>>
>> 12.2.8 improves the deep scrub code to automatically repair these
>> inconsistencies. Once the entire cluster has been upgraded and then fully
>> deep scrubbed, and all such inconsistencies are resolved; it will be safe to
>> disable the `osd distrust data digest = true` workaround option.
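(A sketch of what disabling the workaround could look like once the deep
scrubs are done, assuming it was set in ceph.conf:)

# remove 'osd distrust data digest = true' from the [osd] section, then:
ceph tell osd.* injectargs '--osd_distrust_data_digest=false'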
>>
>> Changelog
>> -
>> * bluestore: set correctly shard for existed Collection (issue#24761, 
>> pr#22860,
>> Jianpeng Ma)
>> * build/ops: Boost system library is no longer required to compile and link
>> example librados program (issue#25054, pr#23202, Nathan Cutler)
>> * build/ops: Bring back diff -y for non-FreeBSD (issue#24396, issue#21664,
>> pr#22848, Sage Weil, David Zafman)
>> * build/ops: install-deps.sh fails on newest openSUSE Leap (issue#25064,
>> pr#23179, Kyr Shatskyy)
>> * build/ops: Mimic build fails with -DWITH_RADOSGW=0 (issue#24437,
>> pr#22864, Dan Mick)
>> * build/ops: order rbdmap.service before remote-fs-pre.target
>> (issue#24713, pr#22844, Ilya Dryomov)
>> * build/ops: rpm: silence osd block chown (issue#25152, pr#23313, Dan van
>> der Ster)
>> * cephfs-journal-tool: Fix purging when importing a zero-length journal
>> (issue#24239, pr#22980, yupeng chen, zhongyan gu)
>> * cephfs: MDSMonitor: uncommitted state exposed to clients/mdss
>> (issue#23768, pr#23013, Patrick Donnelly)
>> * ceph-fuse mount failed because no mds (issue#22205, pr#22895, liyan)
>> * ceph-volume add a __release__ string, to help version-conditional calls
>> (issue#25170, pr#23331, Alfredo Deza)
>> * ceph-volume: adds test for `ceph-volume lvm list /dev/sda` (issue#24784,
>> issue#24957, pr#23350, Andrew Schoen)
>> * ceph-volume: do not use stdin in luminous (issue#25173, issue#23260,
>> pr#23367, Alfredo Deza)
>> * ceph-volume enable the ceph-osd during lvm activation (issue#24152,
>> pr#23394, Dan van der Ster, Alfredo Deza)
>> * ceph-volume expand on the LVM API to create multiple LVs at different
>> sizes (issue#24020, pr#23395, Alfredo Deza)
>> * ceph-volume lvm.activate conditional mon-config on prime-osd-dir
>> (issue#25216, pr#23397, Alfredo Deza)
>> * ceph-volume lvm.batch remove non-existent sys_api property
>> (issue#34310, pr#23811, Alfredo Deza)
>> * ceph-volume lvm.listing only include devices if they exist (issue#24952,
>> pr#23150, Alfredo Deza)
>> * ceph-volume: process.call with stdin in Python 3 fix (issue#24993, 
>> pr#23238,
>> Alfredo Deza)
>> * ceph-volume: PVolumes.get() should return one PV when using name or
>> uuid (issue#24784, pr#23329, Andrew Schoen)
>> * ceph-volume: refuse to zap mapper devices (issue#24504, pr#23374,
>> Andrew Schoen)
>> * ceph-volume: tests.functional inherit SSH_ARGS from ansible (issue#34311,
>> pr#23813, Alfredo Deza)
>> * ceph-volume tests/functional run lvm list after OSD provisioning
>> (issue#24961, pr#23147, Alfredo Deza)
>> * ceph-volume: unmount lvs correctly before zapping (issue#2

Re: [ceph-users] ceph-fuse using excessive memory

2018-09-06 Thread Andras Pataki
It looks like I have a process that can reproduce the problem at will.  
Attached is a quick plot of the RSS memory usage of ceph-fuse over a 
period of 13-14 hours or so (the x axis is minutes, the y axis is 
bytes).  It looks like the process steadily grows up to about 200GB and 
then its memory usage stabilizes.  So something comes to equilibrium at 
the 200GB size.  What would be a good way to further understand where 
the memory goes?  I could even run some instrumented binary if needed to 
further pin down what is happening.  As I mentioned below, we are 
running with somewhat increased memory related settings in 
/etc/ceph.conf, but based on my understanding of the parameters, they 
shouldn't amount to such high memory usage.
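A possible next step, assuming the ceph-fuse admin socket exposes the same
mempool/heap commands as the daemons do (which depends on the build and on
tcmalloc profiling support):

ceph daemon /var/run/ceph/ceph-client.admin.asok dump_mempools
ceph daemon /var/run/ceph/ceph-client.admin.asok heap stats
ceph daemon /var/run/ceph/ceph-client.admin.asok heap start_profiler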


Andras


On 09/05/2018 10:15 AM, Andras Pataki wrote:
Below are the performance counters.  Some scientific workflows trigger 
this - some parts of them are quite data intensive - they process 
thousands of files over many hours to days.  The 200GB ceph-fuse got 
there in about 3 days.  I'm keeping the node alive for now in case we 
can extract some more definitive info on what is happening there.


Andras


# ceph daemon /var/run/ceph/ceph-client.admin.asok perf dump
{
    "AsyncMessenger::Worker-0": {
    "msgr_recv_messages": 37730,
    "msgr_send_messages": 37731,
    "msgr_recv_bytes": 1121379127,
    "msgr_send_bytes": 11913693154,
    "msgr_created_connections": 75333,
    "msgr_active_connections": 730,
    "msgr_running_total_time": 642.152166956,
    "msgr_running_send_time": 536.723862752,
    "msgr_running_recv_time": 25.429112242,
    "msgr_running_fast_dispatch_time": 63.814291954
    },
    "AsyncMessenger::Worker-1": {
    "msgr_recv_messages": 38507,
    "msgr_send_messages": 38467,
    "msgr_recv_bytes": 1240174043,
    "msgr_send_bytes": 11673685736,
    "msgr_created_connections": 75479,
    "msgr_active_connections": 729,
    "msgr_running_total_time": 628.670562086,
    "msgr_running_send_time": 523.772820969,
    "msgr_running_recv_time": 25.902871268,
    "msgr_running_fast_dispatch_time": 62.375965165
    },
    "AsyncMessenger::Worker-2": {
    "msgr_recv_messages": 597697,
    "msgr_send_messages": 504640,
    "msgr_recv_bytes": 1314713236,
    "msgr_send_bytes": 11880445442,
    "msgr_created_connections": 75338,
    "msgr_active_connections": 728,
    "msgr_running_total_time": 711.909282325,
    "msgr_running_send_time": 556.195748166,
    "msgr_running_recv_time": 127.267332682,
    "msgr_running_fast_dispatch_time": 62.209721085
    },
    "client": {
    "reply": {
    "avgcount": 236795,
    "sum": 6177.205536940,
    "avgtime": 0.026086722
    },
    "lat": {
    "avgcount": 236795,
    "sum": 6177.205536940,
    "avgtime": 0.026086722
    },
    "wrlat": {
    "avgcount": 857828153,
    "sum": 8413.835066735,
    "avgtime": 0.09808
    }
    },
    "objectcacher-libcephfs": {
    "cache_ops_hit": 4160412,
    "cache_ops_miss": 4887,
    "cache_bytes_hit": 3247294145494,
    "cache_bytes_miss": 12914144260,
    "data_read": 48923557765,
    "data_written": 35292875783,
    "data_flushed": 35292681606,
    "data_overwritten_while_flushing": 0,
    "write_ops_blocked": 0,
    "write_bytes_blocked": 0,
    "write_time_blocked": 0.0
    },
    "objecter": {
    "op_active": 0,
    "op_laggy": 0,
    "op_send": 111268,
    "op_send_bytes": 35292681606,
    "op_resend": 0,
    "op_reply": 111268,
    "op": 111268,
    "op_r": 2193,
    "op_w": 109075,
    "op_rmw": 0,
    "op_pg": 0,
    "osdop_stat": 2,
    "osdop_create": 2,
    "osdop_read": 2193,
    "osdop_write": 109071,
    "osdop_writefull": 0,
    "osdop_writesame": 0,
    "osdop_append": 0,
    "osdop_zero": 0,
    "osdop_truncate": 0,
    "osdop_delete": 0,
    "osdop_mapext": 0,
    "osdop_sparse_read": 0,
    "osdop_clonerange": 0,
    "osdop_getxattr": 0,
    "osdop_setxattr": 0,
    "osdop_cmpxattr": 0,
    "osdop_rmxattr": 0,
    "osdop_resetxattrs": 0,
    "osdop_tmap_up": 0,
    "osdop_tmap_put": 0,
    "osdop_tmap_get": 0,
    "osdop_call": 0,
    "osdop_watch": 0,
    "osdop_notify": 0,
    "osdop_src_cmpxattr": 0,
    "osdop_pgls": 0,
    "osdop_pgls_filter": 0,
    "osdop_other": 0,
    "linger_active": 0,
    "linger_send": 0,
    "linger_resend": 0,
    "linger_ping": 0,
    "poolop_active": 0,
    "poolop_send": 0,
    "poolop_resend": 0,
    "poolstat_active": 0,
    "poolstat_send": 0,
    "poolstat_resend": 0,
    "statfs_active": 0,
    "statfs_send": 1348,
    "statfs_resend": 0,
    "command_a

[ceph-users] help needed

2018-09-06 Thread Muhammad Junaid
Hi there

I hope everyone is doing well. I need urgent help with a Ceph cluster design.
We are planning a 3-OSD-node cluster to begin with. Details are as follows:

Servers: 3 * DELL R720xd
OS Drives: 2 2.5" SSD
OSD Drives: 10  3.5" SAS 7200rpm 3/4 TB
Journal Drives: 2 SSD's Samsung 850 PRO 256GB each
Raid controller: PERC H710 (512MB Cache)
OSD Drives: On raid0 mode
Journal Drives: JBOD Mode
Rocks db: On same Journal drives

My question is: is this setup good for a start? And the critical question is:
should we enable write-back caching on the controller for the journal drives?
Please suggest. Thanks in advance. Regards.

Muhammad Junaid
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fixing a 12.2.5 reshard

2018-09-06 Thread Sean Purdy
Hi,


We were on 12.2.5 when a bucket with versioning and 100k objects got stuck when
autoreshard kicked in.  We could download but not upload files.  After upgrading
to 12.2.7 and then running a bucket check, the bucket now shows twice as many
objects according to 'bucket limit check'.  How do I fix this?


Sequence:

12.2.5 autoshard happened, "radosgw-admin reshard list" showed a reshard 
happening but no action.
12.2.7 upgrade went fine, didn't fix anything straightaway. "radosgw-admin 
reshard list" same.  Still no file uploads.  bucket limit check showed 100k 
files in the bucket as expected, and no shards.
Ran "radosgw-admin bucket check --fix"

Now "reshard list" shows no reshards in progress, but bucket limit check shows 
200k files in two shards, 100k per shard.  It should be half this.


The output of "bucket check --fix" has 
existing_header: "num_objects": 203344 for "rgw.main"
calculated_header: "num_objects": 101621

Shouldn't it install the calculated_header?
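One way to cross-check which count reflects reality (a sketch; for a versioned
bucket the index can carry extra entries, so the per-category numbers in the
stats output are worth comparing):

radosgw-admin bucket stats --bucket=static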



Before:

$ sudo radosgw-admin reshard list
[
  {

"tenant": "",
"bucket_name": "static",
"bucket_id": "a5501bce-1360-43e3-af08-8f3d1e102a79.3475308.1",
"new_instance_id": "static:a5501bce-1360-43e3-af08-8f3d1e102a79.3620665.1",
"old_num_shards": 1,
"new_num_shards": 2
  }
]

$ sudo radosgw-admin bucket limit check
{
"user_id": "static",
"buckets": [
{
"bucket": "static",
"tenant": "",
"num_objects": 101621,
"num_shards": 0,
"objects_per_shard": 101621,
"fill_status": "OK"
}
]
}

Output from bucket check --fix

{
"existing_header": {
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 101621
},
"rgw.main": {
"size": 37615290807,
"size_actual": 38017675264,
"size_utilized": 0,
"size_kb": 36733683,
"size_kb_actual": 37126636,
"size_kb_utilized": 0,
"num_objects": 203344
}
}
},
"calculated_header": {
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 101621
},
"rgw.main": {
"size": 18796589005,
"size_actual": 18997686272,
"size_utilized": 18796589005,
"size_kb": 18356044,
"size_kb_actual": 18552428,
"size_kb_utilized": 18356044,
"num_objects": 101621
}
}
}
}

After:

{
"user_id": "static",
"buckets": [
{
"bucket": "static",
"tenant": "",
"num_objects": 203242,
"num_shards": 2,
"objects_per_shard": 101621,
"fill_status": "OK"
}
]
}


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] help needed

2018-09-06 Thread Marc Roos
 
 

Do not use the Samsung 850 PRO for the journal.
Just use an LSI Logic HBA (e.g. SAS2308) instead of the RAID controller.


-Original Message-
From: Muhammad Junaid [mailto:junaid.fsd...@gmail.com] 
Sent: donderdag 6 september 2018 13:18
To: ceph-users@lists.ceph.com
Subject: [ceph-users] help needed

Hi there

Hope, every one will be fine. I need an urgent help in ceph cluster 
design. We are planning 3 OSD node cluster in the beginning. Details are 
as under:

Servers: 3 * DELL R720xd
OS Drives: 2 2.5" SSD
OSD Drives: 10  3.5" SAS 7200rpm 3/4 TB
Journal Drives: 2 SSD's Samsung 850 PRO 256GB each Raid controller: PERC 
H710 (512MB Cache) OSD Drives: On raid0 mode Journal Drives: JBOD Mode 
Rocks db: On same Journal drives

My question is: is this setup good for a start? And critical question 
is: should we enable write back caching on controller for Journal 
drives? Pls suggest. Thanks in advance. Regards.

Muhammad Junaid 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading ceph with HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent

2018-09-06 Thread Marc Roos
Thanks, interesting to read. So in Luminous it is not really a problem. I
was expecting to get into trouble with the monitors/MDS, because my
failover takes quite long, and I thought that was related to the damaged PG.

Luminous: "When the past intervals tracking structure was rebuilt around 
exactly the information required, it became extremely compact and 
relatively insensitive to extended periods of cluster unhealthiness" 

> >
> >
> > The adviced solution is to upgrade ceph only in HEALTH_OK state. And 
I
> > also read somewhere that is bad to have your cluster for a long time 
in
> > an HEALTH_ERR state.
> >
> > But why is this bad?

See https://ceph.com/community/new-luminous-pg-overdose-protection
under "Problems with past intervals"

"if the cluster becomes unhealthy, and especially if it remains 
unhealthy for an extended period of time, a combination of effects can 
cause problems."

"If a cluster is unhealthy for an extended period of time (e.g., days or 
even weeks), the past interval set can become large enough to require a 
significant amount of memory."



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading ceph with HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent

2018-09-06 Thread Marc Roos


> 
> >
> >
> > The adviced solution is to upgrade ceph only in HEALTH_OK state. And 
I
> > also read somewhere that is bad to have your cluster for a long time 
in
> > an HEALTH_ERR state.
> >
> > But why is this bad?
> 
> Aside from the obvious (errors are bad things!), many people have
> external monitoring systems that will alert them on the transitions
> between OK/WARN/ERR.  If the system is stuck in ERR for a long time,
> they are unlikely to notice new errors or warnings.  These systems can
> accumulate faults without the operator noticing.

All obvious; I would expect such an answer on a psychology mailing list ;)

I am mostly testing with Ceph, and trying to educate myself a bit.
I am asking because I had this error in Sep. 2017; when I changed the
crush reweight it disappeared. In Jan. 2018, after scrubbing, it appeared
again, and now, after adding the 4th node, it has disappeared again.



> > Why is this bad during upgrading?
> 
> It depends what's gone wrong.  For example:

>  - If your cluster is degraded (fewer than desired number of replicas
> of data) then taking more services offline (even briefly) to do an
> upgrade will create greater risk to the data by reducing the number of
> copies available.
> - If your system is in an error state because something has gone bad
> on disk, then recovering it with the same software that wrote the data
> is a more tested code path than running some newer code against a
> system left in a strange state by an older version.
> 
> There will always be exceptions to this (e.g. where the upgrade is the
> fix for whatever caused the error), but the general purpose advice is
> to get a system nice and clean before starting the upgrade.
> 
> John
> 
> > Can I quantify how bad it is? (like with large log/journal file?)
> >
> >


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] help needed

2018-09-06 Thread Muhammad Junaid
Thanks. Can you please clarify: if we use any other enterprise-class SSD
for the journal, should we enable the write-back caching available on the RAID
controller for the journal device, or connect it as write-through? Regards.

On Thu, Sep 6, 2018 at 4:50 PM Marc Roos  wrote:

>
>
>
> Do not use Samsung 850 PRO for journal
> Just use LSI logic HBA (eg. SAS2308)
>
>
> -Original Message-
> From: Muhammad Junaid [mailto:junaid.fsd...@gmail.com]
> Sent: donderdag 6 september 2018 13:18
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] help needed
>
> Hi there
>
> Hope, every one will be fine. I need an urgent help in ceph cluster
> design. We are planning 3 OSD node cluster in the beginning. Details are
> as under:
>
> Servers: 3 * DELL R720xd
> OS Drives: 2 2.5" SSD
> OSD Drives: 10  3.5" SAS 7200rpm 3/4 TB
> Journal Drives: 2 SSD's Samsung 850 PRO 256GB each Raid controller: PERC
> H710 (512MB Cache) OSD Drives: On raid0 mode Journal Drives: JBOD Mode
> Rocks db: On same Journal drives
>
> My question is: is this setup good for a start? And critical question
> is: should we enable write back caching on controller for Journal
> drives? Pls suggest. Thanks in advance. Regards.
>
> Muhammad Junaid
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] help needed

2018-09-06 Thread David Turner
The official Ceph documentation recommendation for a DB partition for a
4TB bluestore OSD would be 160GB each (4% of the data device).

The Samsung 850 PRO is not an enterprise-class SSD. A quick search of the ML
will show which SSDs people are using.

As was already suggested, the better option is an HBA as opposed to a RAID
controller. If you are set on your controllers, write-back is fine as long
as you have a BBU. Otherwise you should be using write-through.

On Thu, Sep 6, 2018, 8:54 AM Muhammad Junaid 
wrote:

> Thanks. Can you please clarify, if we use any other enterprise class SSD
> for journal, should we enable write-back caching available on raid
> controller for journal device or connect it as write through. Regards.
>
> On Thu, Sep 6, 2018 at 4:50 PM Marc Roos  wrote:
>
>>
>>
>>
>> Do not use Samsung 850 PRO for journal
>> Just use LSI logic HBA (eg. SAS2308)
>>
>>
>> -Original Message-
>> From: Muhammad Junaid [mailto:junaid.fsd...@gmail.com]
>> Sent: donderdag 6 september 2018 13:18
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] help needed
>>
>> Hi there
>>
>> Hope, every one will be fine. I need an urgent help in ceph cluster
>> design. We are planning 3 OSD node cluster in the beginning. Details are
>> as under:
>>
>> Servers: 3 * DELL R720xd
>> OS Drives: 2 2.5" SSD
>> OSD Drives: 10  3.5" SAS 7200rpm 3/4 TB
>> Journal Drives: 2 SSD's Samsung 850 PRO 256GB each Raid controller: PERC
>> H710 (512MB Cache) OSD Drives: On raid0 mode Journal Drives: JBOD Mode
>> Rocks db: On same Journal drives
>>
>> My question is: is this setup good for a start? And critical question
>> is: should we enable write back caching on controller for Journal
>> drives? Pls suggest. Thanks in advance. Regards.
>>
>> Muhammad Junaid
>>
>>
>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] help needed

2018-09-06 Thread Nick Fisk
If it helps, I’m seeing about 3GB of DB usage for a 3TB OSD that is about 60%
full. This is with a pure RBD workload; I believe this can vary depending on
what your Ceph use case is.
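For anyone who wants to check their own DB usage, the bluefs perf counters on
the OSD host show it (a sketch, for osd.0):

ceph daemon osd.0 perf dump | egrep 'db_used_bytes|db_total_bytes'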

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David 
Turner
Sent: 06 September 2018 14:09
To: Muhammad Junaid 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] help needed

 

The official ceph documentation recommendations for a db partition for a 4TB 
bluestore osd would be 160GB each.

 

Samsung Evo Pro is not an Enterprise class SSD. A quick search of the ML will 
allow which SSDs people are using.

 

As was already suggested, the better option is an HBA as opposed to a raid 
controller. If you are set on your controllers, write-back is fine as long as 
you have BBU. Otherwise you should be using write-through.

 

On Thu, Sep 6, 2018, 8:54 AM Muhammad Junaid <junaid.fsd...@gmail.com> wrote:

Thanks. Can you please clarify, if we use any other enterprise class SSD for 
journal, should we enable write-back caching available on raid controller for 
journal device or connect it as write through. Regards.

 

On Thu, Sep 6, 2018 at 4:50 PM Marc Roos <m.r...@f1-outsourcing.eu> wrote:

 


Do not use Samsung 850 PRO for journal
Just use LSI logic HBA (eg. SAS2308)


-Original Message-
From: Muhammad Junaid [mailto:junaid.fsd...@gmail.com]
Sent: donderdag 6 september 2018 13:18
To: ceph-users@lists.ceph.com  
Subject: [ceph-users] help needed

Hi there

Hope, every one will be fine. I need an urgent help in ceph cluster 
design. We are planning 3 OSD node cluster in the beginning. Details are 
as under:

Servers: 3 * DELL R720xd
OS Drives: 2 2.5" SSD
OSD Drives: 10  3.5" SAS 7200rpm 3/4 TB
Journal Drives: 2 SSD's Samsung 850 PRO 256GB each Raid controller: PERC 
H710 (512MB Cache) OSD Drives: On raid0 mode Journal Drives: JBOD Mode 
Rocks db: On same Journal drives

My question is: is this setup good for a start? And critical question 
is: should we enable write back caching on controller for Journal 
drives? Pls suggest. Thanks in advance. Regards.

Muhammad Junaid 




___
ceph-users mailing list
ceph-users@lists.ceph.com  
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Menno Zonneveld
I've set up a Ceph cluster to test things before going into production, but I've
run into some performance issues that I cannot resolve or explain.

Hardware in use in each storage machine (x3)
- dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
- dual 10Gbit EdgeSwitch 16-Port XG
- LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
- 3x Intel S4500 480GB SSD as OSD's
- 2x SSD raid-1 boot/OS disks
- 2x Intel(R) Xeon(R) CPU E5-2630
- 128GB memory

Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 on all 
nodes.

Running the rados benchmark resulted in somewhat lower than expected performance
unless Ceph enters the 'near-full' state. When the cluster is mostly empty,
rados bench (180 write -b 4M -t 16) results in about 330MB/s with 0.18s average
latency, but when hitting the near-full state this goes up to a more expected
550MB/s and 0.11s latency.

iostat on the storage machines shows the disks are hardly utilized unless the
cluster hits near-full; CPU and network also aren't maxed out. I’ve also tried
NIC bonding and just one switch, and without jumbo frames, but nothing seems to
matter in this case.

Is this expected behavior, or what can I try to do to pinpoint the bottleneck?
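One way to narrow it down is to take the network and replication out of the
picture and benchmark the OSDs one by one (a sketch; by default this writes
1GB in 4MB chunks directly on the OSD):

ceph tell osd.0 bench
# repeat for each osd id and compare the MB/s figures between hosts/disks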

The expected performance is based on the benchmark results Proxmox released this
year: they have 4 OSDs per server and hit almost 800MB/s with 0.08s latency
using 10Gbit and 3 nodes. Though they have more OSDs and somewhat different
hardware, I understand I won't hit the 800MB/s mark, but the difference between
an empty and an almost-full cluster makes no sense to me; I'd expect it to be
the other way around.

Thanks,
Menno
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Marc Roos
 

Test pool is 3x replicated?


-Original Message-
From: Menno Zonneveld [mailto:me...@1afa.com] 
Sent: donderdag 6 september 2018 15:29
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Rados performance inconsistencies, lower than 
expected performance

I've setup a CEPH cluster to test things before going into production 
but I've run into some performance issues that I cannot resolve or 
explain.

Hardware in use in each storage machine (x3)
- dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
- dual 10Gbit EdgeSwitch 16-Port XG
- LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
- 3x Intel S4500 480GB SSD as OSD's
- 2x SSD raid-1 boot/OS disks
- 2x Intel(R) Xeon(R) CPU E5-2630
- 128GB memory

Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 on 
all nodes.

Running rados benchmark resulted in somewhat lower than expected 
performance unless ceph enters the 'near-full' state. When the cluster 
is mostly empty rados bench (180 write -b 4M -t 16) results in about 
330MB/s with 0.18ms latency but when hitting near-full state this goes 
up to a more expected 550MB/s and 0.11ms latency.

iostat on the storage machines shows the disks are hardly utilized 
unless the cluster hits near-full, CPU and network also aren't maxed 
out. I’ve also tried with NIC bonding and just one switch, without 
jumbo frames but nothing seem to matter in this case.

Is this expected behavior or what can I try to do to pinpoint the 
bottleneck ?

The expected performance is per Proxmox's benchmark results they 
released this year, they have 4 OSD's per server and hit almost 800MB/s 
with 0.08ms latency using 10Gbit and 3 nodes, though they have more 
OSD's and somewhat different hardware I understand I won't hit the 
800MB/s mark but the difference between empty and almost full cluster 
makes no sense to me, I'd expect it to be the other way around.

Thanks,
Menno
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Menno Zonneveld
ah yes, 3x replicated with minimal 2.


my ceph.conf is pretty bare, just in case it might be relevant

[global]
 auth client required = cephx
 auth cluster required = cephx
 auth service required = cephx

 cluster network = 172.25.42.0/24

 fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e

 keyring = /etc/pve/priv/$cluster.$name.keyring

 mon allow pool delete = true
 mon osd allow primary affinity = true

 osd journal size = 5120
 osd pool default min size = 2
 osd pool default size = 3


-Original message-
> From:Marc Roos 
> Sent: Thursday 6th September 2018 15:43
> To: ceph-users ; Menno Zonneveld 
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
>  
> 
> Test pool is 3x replicated?
> 
> 
> -Original Message-
> From: Menno Zonneveld [mailto:me...@1afa.com] 
> Sent: donderdag 6 september 2018 15:29
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> I've setup a CEPH cluster to test things before going into production 
> but I've run into some performance issues that I cannot resolve or 
> explain.
> 
> Hardware in use in each storage machine (x3)
> - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> - dual 10Gbit EdgeSwitch 16-Port XG
> - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> - 3x Intel S4500 480GB SSD as OSD's
> - 2x SSD raid-1 boot/OS disks
> - 2x Intel(R) Xeon(R) CPU E5-2630
> - 128GB memory
> 
> Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 on 
> all nodes.
> 
> Running rados benchmark resulted in somewhat lower than expected 
> performance unless ceph enters the 'near-full' state. When the cluster 
> is mostly empty rados bench (180 write -b 4M -t 16) results in about 
> 330MB/s with 0.18ms latency but when hitting near-full state this goes 
> up to a more expected 550MB/s and 0.11ms latency.
> 
> iostat on the storage machines shows the disks are hardly utilized 
> unless the cluster hits near-full, CPU and network also aren't maxed 
> out. I’ve also tried with NIC bonding and just one switch, without 
> jumbo frames but nothing seem to matter in this case.
> 
> Is this expected behavior or what can I try to do to pinpoint the 
> bottleneck ?
> 
> The expected performance is per Proxmox's benchmark results they 
> released this year, they have 4 OSD's per server and hit almost 800MB/s 
> with 0.08ms latency using 10Gbit and 3 nodes, though they have more 
> OSD's and somewhat different hardware I understand I won't hit the 
> 800MB/s mark but the difference between empty and almost full cluster 
> makes no sense to me, I'd expect it to be the other way around.
> 
> Thanks,
> Menno
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] help needed

2018-09-06 Thread Darius Kasparavičius
Hello,

I'm currently running a similar setup. It's running bluestore OSDs
with 1 NVMe device for the db/wal devices. That NVMe device is not large
enough to support a 160GB db partition per OSD, so I'm stuck with 50GB
each. So far I haven't had any issues with slowdowns or crashes.

The cluster is relatively idle: up to 10k IOPS at peaks with a 50/50
read/write IO distribution. Throughput is a different matter, though;
it's more like 10:1, with 1GBps/100MBps.

I have noticed that the best latencies I can get from using raid0 on
SAS devices come from running them in write-back mode with readahead
disabled on the controller. It might be that you will have different
results. I wish you luck in testing it.


On Thu, Sep 6, 2018 at 4:14 PM David Turner  wrote:
>
> The official ceph documentation recommendations for a db partition for a 4TB 
> bluestore osd would be 160GB each.
>
> Samsung Evo Pro is not an Enterprise class SSD. A quick search of the ML will 
> allow which SSDs people are using.
>
> As was already suggested, the better option is an HBA as opposed to a raid 
> controller. If you are set on your controllers, write-back is fine as long as 
> you have BBU. Otherwise you should be using write-through.
>
> On Thu, Sep 6, 2018, 8:54 AM Muhammad Junaid  wrote:
>>
>> Thanks. Can you please clarify, if we use any other enterprise class SSD for 
>> journal, should we enable write-back caching available on raid controller 
>> for journal device or connect it as write through. Regards.
>>
>> On Thu, Sep 6, 2018 at 4:50 PM Marc Roos  wrote:
>>>
>>>
>>>
>>>
>>> Do not use Samsung 850 PRO for journal
>>> Just use LSI logic HBA (eg. SAS2308)
>>>
>>>
>>> -Original Message-
>>> From: Muhammad Junaid [mailto:junaid.fsd...@gmail.com]
>>> Sent: donderdag 6 september 2018 13:18
>>> To: ceph-users@lists.ceph.com
>>> Subject: [ceph-users] help needed
>>>
>>> Hi there
>>>
>>> Hope, every one will be fine. I need an urgent help in ceph cluster
>>> design. We are planning 3 OSD node cluster in the beginning. Details are
>>> as under:
>>>
>>> Servers: 3 * DELL R720xd
>>> OS Drives: 2 2.5" SSD
>>> OSD Drives: 10  3.5" SAS 7200rpm 3/4 TB
>>> Journal Drives: 2 SSD's Samsung 850 PRO 256GB each Raid controller: PERC
>>> H710 (512MB Cache) OSD Drives: On raid0 mode Journal Drives: JBOD Mode
>>> Rocks db: On same Journal drives
>>>
>>> My question is: is this setup good for a start? And critical question
>>> is: should we enable write back caching on controller for Journal
>>> drives? Pls suggest. Thanks in advance. Regards.
>>>
>>> Muhammad Junaid
>>>
>>>
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Marc Roos


I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB
2x E5-2660
2x LSI SAS2308 
1x dual port 10Gbit (one used, and shared between cluster/client vlans)

I have 5 PGs scrubbing, but I am not sure whether any are on the ssd
pool. I am noticing a drop in performance at the end of the test.
Maybe some caching on the ssd?

rados bench -p rbd.ssd 60 write -b 4M -t 16
Bandwidth (MB/sec): 448.465
Average Latency(s): 0.142671

rados bench -p rbd.ssd 180 write -b 4M -t 16
Bandwidth (MB/sec): 381.998
Average Latency(s): 0.167524


-Original Message-
From: Menno Zonneveld [mailto:me...@1afa.com] 
Sent: donderdag 6 september 2018 15:52
To: Marc Roos; ceph-users
Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
expected performance

ah yes, 3x replicated with minimal 2.


my ceph.conf is pretty bare, just in case it might be relevant

[global]
 auth client required = cephx
 auth cluster required = cephx
 auth service required = cephx

 cluster network = 172.25.42.0/24

 fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e

 keyring = /etc/pve/priv/$cluster.$name.keyring

 mon allow pool delete = true
 mon osd allow primary affinity = true

 osd journal size = 5120
 osd pool default min size = 2
 osd pool default size = 3


-Original message-
> From:Marc Roos 
> Sent: Thursday 6th September 2018 15:43
> To: ceph-users ; Menno Zonneveld 
> 
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> than expected performance
> 
>  
> 
> Test pool is 3x replicated?
> 
> 
> -Original Message-
> From: Menno Zonneveld [mailto:me...@1afa.com]
> Sent: donderdag 6 september 2018 15:29
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> I've setup a CEPH cluster to test things before going into production 
> but I've run into some performance issues that I cannot resolve or 
> explain.
> 
> Hardware in use in each storage machine (x3)
> - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> - dual 10Gbit EdgeSwitch 16-Port XG
> - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> - 3x Intel S4500 480GB SSD as OSD's
> - 2x SSD raid-1 boot/OS disks
> - 2x Intel(R) Xeon(R) CPU E5-2630
> - 128GB memory
> 
> Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 
> on all nodes.
> 
> Running rados benchmark resulted in somewhat lower than expected 
> performance unless ceph enters the 'near-full' state. When the cluster 

> is mostly empty rados bench (180 write -b 4M -t 16) results in about 
> 330MB/s with 0.18ms latency but when hitting near-full state this goes 

> up to a more expected 550MB/s and 0.11ms latency.
> 
> iostat on the storage machines shows the disks are hardly utilized 
> unless the cluster hits near-full, CPU and network also aren't maxed 
> out. I’ve also tried with NIC bonding and just one switch, without 
> jumbo frames but nothing seem to matter in this case.
> 
> Is this expected behavior or what can I try to do to pinpoint the 
> bottleneck ?
> 
> The expected performance is per Proxmox's benchmark results they 
> released this year, they have 4 OSD's per server and hit almost 
> 800MB/s with 0.08ms latency using 10Gbit and 3 nodes, though they have 

> more OSD's and somewhat different hardware I understand I won't hit 
> the 800MB/s mark but the difference between empty and almost full 
> cluster makes no sense to me, I'd expect it to be the other way 
around.
> 
> Thanks,
> Menno
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Alwin Antreich
Hi,

On Thu, Sep 06, 2018 at 03:52:21PM +0200, Menno Zonneveld wrote:
> ah yes, 3x replicated with minimal 2.
> 
> 
> my ceph.conf is pretty bare, just in case it might be relevant
> 
> [global]
>auth client required = cephx
>auth cluster required = cephx
>auth service required = cephx
> 
>cluster network = 172.25.42.0/24
> 
>fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
> 
>keyring = /etc/pve/priv/$cluster.$name.keyring
> 
>mon allow pool delete = true
>mon osd allow primary affinity = true
On our test cluster, we didn't set the primary affinity, as all OSDs were
SSDs of the same model. Did you make any settings other than this? What
does your crush map look like?

> 
>osd journal size = 5120
>osd pool default min size = 2
>osd pool default size = 3
> 
> 
> -Original message-
> > From:Marc Roos 
> > Sent: Thursday 6th September 2018 15:43
> > To: ceph-users ; Menno Zonneveld 
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> > expected performance
> > 
> >  
> > 
> > Test pool is 3x replicated?
> > 
> > 
> > -Original Message-
> > From: Menno Zonneveld [mailto:me...@1afa.com] 
> > Sent: donderdag 6 september 2018 15:29
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Rados performance inconsistencies, lower than 
> > expected performance
> > 
> > I've setup a CEPH cluster to test things before going into production 
> > but I've run into some performance issues that I cannot resolve or 
> > explain.
> > 
> > Hardware in use in each storage machine (x3)
> > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> > - dual 10Gbit EdgeSwitch 16-Port XG
> > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > - 3x Intel S4500 480GB SSD as OSD's
> > - 2x SSD raid-1 boot/OS disks
> > - 2x Intel(R) Xeon(R) CPU E5-2630
> > - 128GB memory
> > 
> > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 on 
> > all nodes.
> > 
> > Running rados benchmark resulted in somewhat lower than expected 
> > performance unless ceph enters the 'near-full' state. When the cluster 
> > is mostly empty rados bench (180 write -b 4M -t 16) results in about 
> > 330MB/s with 0.18ms latency but when hitting near-full state this goes 
> > up to a more expected 550MB/s and 0.11ms latency.
> > 
> > iostat on the storage machines shows the disks are hardly utilized 
> > unless the cluster hits near-full, CPU and network also aren't maxed 
> > out. I’ve also tried with NIC bonding and just one switch, without 
> > jumbo frames but nothing seem to matter in this case.
> > 
> > Is this expected behavior or what can I try to do to pinpoint the 
> > bottleneck ?
> > 
> > The expected performance is per Proxmox's benchmark results they 
> > released this year, they have 4 OSD's per server and hit almost 800MB/s 
> > with 0.08ms latency using 10Gbit and 3 nodes, though they have more 
> > OSD's and somewhat different hardware I understand I won't hit the 
> > 800MB/s mark but the difference between empty and almost full cluster 
> > makes no sense to me, I'd expect it to be the other way around.
> > 
> > Thanks,
> > Menno

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Menno Zonneveld
-Original message-
> From:Alwin Antreich 
> Sent: Thursday 6th September 2018 16:27
> To: ceph-users 
> Cc: Menno Zonneveld 
> Subject: Re: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> Hi,

Hi!

> On Thu, Sep 06, 2018 at 03:52:21PM +0200, Menno Zonneveld wrote:
> > ah yes, 3x replicated with minimal 2.
> > 
> > 
> > my ceph.conf is pretty bare, just in case it might be relevant
> > 
> > [global]
> > auth client required = cephx
> > auth cluster required = cephx
> > auth service required = cephx
> > 
> > cluster network = 172.25.42.0/24
> > 
> > fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
> > 
> > keyring = /etc/pve/priv/$cluster.$name.keyring
> > 
> > mon allow pool delete = true
> > mon osd allow primary affinity = true
> On our test cluster, we didn't set the primary affinity as all OSDs were
> SSDs of the same model. Did you do any settings other than this? How
> does your crush map look like?

I only used this option when testing with mixing HDD and SSD (1 replica on SSD 
and 2 on HDD); right now affinity for all disks is 1.

The weight of one OSD in each server is lower because I have partitioned the 
drive to be able to test with SSD journal for HDDs but this isn't active at the 
moment.

If I understand correctly, setting the weight like this should be fine. I
also tested with weight 1 for all OSDs and I still get the same performance
('slow' when empty, fast when full).

Current ceph osd tree

ID  CLASS WEIGHT  TYPE NAME    STATUS REWEIGHT PRI-AFF 
 -1   3.71997 root ssd 
 -5   1.23999 host ceph01-test 
  2   ssd 0.36600 osd.2    up  1.0 1.0 
  3   ssd 0.43700 osd.3    up  1.0 1.0 
  6   ssd 0.43700 osd.6    up  1.0 1.0 
 -7   1.23999 host ceph02-test 
  4   ssd 0.36600 osd.4    up  1.0 1.0 
  5   ssd 0.43700 osd.5    up  1.0 1.0 
  7   ssd 0.43700 osd.7    up  1.0 1.0 
 -3   1.23999 host ceph03-test 
  0   ssd 0.36600 osd.0    up  1.0 1.0 
  1   ssd 0.43700 osd.1    up  1.0 1.0 
  8   ssd 0.43700 osd.8    up  1.0 1.0 

My current crush map looks like this:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph03-test {
 id -3 # do not change unnecessarily
 id -4 class ssd # do not change unnecessarily
 # weight 1.240
 alg straw2
 hash 0 # rjenkins1
 item osd.1 weight 0.437
 item osd.0 weight 0.366
 item osd.8 weight 0.437
}
host ceph01-test {
 id -5 # do not change unnecessarily
 id -6 class ssd # do not change unnecessarily
 # weight 1.240
 alg straw2
 hash 0 # rjenkins1
 item osd.3 weight 0.437
 item osd.2 weight 0.366
 item osd.6 weight 0.437
}
host ceph02-test {
 id -7 # do not change unnecessarily
 id -8 class ssd # do not change unnecessarily
 # weight 1.240
 alg straw2
 hash 0 # rjenkins1
 item osd.5 weight 0.437
 item osd.4 weight 0.366
 item osd.7 weight 0.437
}
root ssd {
 id -1 # do not change unnecessarily
 id -2 class ssd # do not change unnecessarily
 # weight 3.720
 alg straw2
 hash 0 # rjenkins1
 item ceph03-test weight 1.240
 item ceph01-test weight 1.240
 item ceph02-test weight 1.240
}

# rules
rule ssd {
 id 0
 type replicated
 min_size 1
 max_size 10
 step take ssd
 step chooseleaf firstn 0 type host
 step emit
}

# end crush map

> > 
> > osd journal size = 5120
> > osd pool default min size = 2
> > osd pool default size = 3
> > 
> >

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Menno Zonneveld
The benchmark does fluctuate quite a bit; that's why I now run it for 180 seconds, 
as then I do get consistent results.

Your performance seems on par with what I'm getting with 3 nodes and 9 OSDs; 
I'm not sure what to make of that.

Are your machines actively used perhaps? Mine are mostly idle as it's still a 
test setup.

-Original message-
> From:Marc Roos 
> Sent: Thursday 6th September 2018 16:23
> To: ceph-users ; Menno Zonneveld 
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> 
> 
> I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB
> 2x E5-2660
> 2x LSI SAS2308 
> 1x dual port 10Gbit (one used, and shared between cluster/client vlans)
> 
> I have 5 pg's scrubbing, but I am not sure if there is any on the ssd 
> pool. I am noticing a drop in the performance at the end of the test. 
> Maybe some caching on the ssd?
> 
> rados bench -p rbd.ssd 60 write -b 4M -t 16
> Bandwidth (MB/sec): 448.465
> Average Latency(s): 0.142671
> 
> rados bench -p rbd.ssd 180 write -b 4M -t 16
> Bandwidth (MB/sec): 381.998
> Average Latency(s): 0.167524
> 
> 
> -Original Message-
> From: Menno Zonneveld [mailto:me...@1afa.com] 
> Sent: donderdag 6 september 2018 15:52
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> ah yes, 3x replicated with minimal 2.
> 
> 
> my ceph.conf is pretty bare, just in case it might be relevant
> 
> [global]
>auth client required = cephx
>auth cluster required = cephx
>auth service required = cephx
> 
>cluster network = 172.25.42.0/24
> 
>fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
> 
>keyring = /etc/pve/priv/$cluster.$name.keyring
> 
>mon allow pool delete = true
>mon osd allow primary affinity = true
> 
>osd journal size = 5120
>osd pool default min size = 2
>osd pool default size = 3
> 
> 
> -Original message-
> > From:Marc Roos 
> > Sent: Thursday 6th September 2018 15:43
> > To: ceph-users ; Menno Zonneveld 
> > 
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> > than expected performance
> > 
> >  
> > 
> > Test pool is 3x replicated?
> > 
> > 
> > -Original Message-
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: donderdag 6 september 2018 15:29
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Rados performance inconsistencies, lower than 
> > expected performance
> > 
> > I've setup a CEPH cluster to test things before going into production 
> > but I've run into some performance issues that I cannot resolve or 
> > explain.
> > 
> > Hardware in use in each storage machine (x3)
> > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> > - dual 10Gbit EdgeSwitch 16-Port XG
> > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > - 3x Intel S4500 480GB SSD as OSD's
> > - 2x SSD raid-1 boot/OS disks
> > - 2x Intel(R) Xeon(R) CPU E5-2630
> > - 128GB memory
> > 
> > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 
> > on all nodes.
> > 
> > Running rados benchmark resulted in somewhat lower than expected 
> > performance unless ceph enters the 'near-full' state. When the cluster 
> 
> > is mostly empty rados bench (180 write -b 4M -t 16) results in about 
> > 330MB/s with 0.18ms latency but when hitting near-full state this goes 
> 
> > up to a more expected 550MB/s and 0.11ms latency.
> > 
> > iostat on the storage machines shows the disks are hardly utilized 
> > unless the cluster hits near-full, CPU and network also aren't maxed 
> > out. I’ve also tried with NIC bonding and just one switch, without 
> > jumbo frames but nothing seem to matter in this case.
> > 
> > Is this expected behavior or what can I try to do to pinpoint the 
> > bottleneck ?
> > 
> > The expected performance is per Proxmox's benchmark results they 
> > released this year, they have 4 OSD's per server and hit almost 
> > 800MB/s with 0.08ms latency using 10Gbit and 3 nodes, though they have 
> 
> > more OSD's and somewhat different hardware I understand I won't hit 
> > the 800MB/s mark but the difference between empty and almost full 
> > cluster makes no sense to me, I'd expect it to be the other way 
> around.
> > 
> > Thanks,
> > Menno
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> > 
> > 
> 
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] failing to respond to cache pressure

2018-09-06 Thread Eugen Block

Hi,

I would like to update this thread for others struggling with cache pressure.

The last time we hit that message was more than three weeks ago  
(workload has not changed), so it seems our current configuration  
fits our workload.
Reducing client_oc_size to 100 MB (from the default 200 MB) seems to  
be the trick here; just increasing the cache size was not enough, at  
least not if you are limited in memory. Currently we have set  
mds_cache_memory_limit to 4 GB.
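
To persist these values, a minimal ceph.conf sketch would be (sizes in bytes,  
matching the values above; everything else left at defaults):

[mds]
    mds_cache_memory_limit = 4294967296

[client]
    client_oc_size = 104857600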


Another note on MDS cache size:
I had configured the mds_cache_memory_limit (4 GB) and client_oc_size  
(100 MB) in version 12.2.5. Comparing the real usage with "ceph daemon  
mds. cache status" and the reserved memory with "top", I noticed a  
huge difference: the reserved memory was almost 8 GB while "cache  
status" was at nearly 4 GB.
After upgrading to 12.2.7 the reserved memory size in top is still  
only about 5 GB after one week. Obviously there have been improvements  
regarding memory consumption of MDS, which is nice. :-)


Regards,
Eugen


Zitat von Eugen Block :


Hi,


I think it does have positive effect on the messages. Cause I get fewer
messages than before.


that's nice. I also receive definitely fewer cache pressure messages  
than before.
I also started to play around with the client side cache  
configuration. I halved the client object cache size from 200 MB to  
100 MB:


ceph@host1:~ $ ceph daemon mds.host1 config set client_oc_size 104857600

Although I still encountered one pressure message recently, the total  
number of these messages has decreased significantly.


Regards,
Eugen


Zitat von Zhenshi Zhou :


Hi Eugen,
I think it does have positive effect on the messages. Cause I get fewer
messages than before.

Eugen Block  wrote on Monday, 20 August 2018 at 21:29:


Update: we are getting these messages again.

So the search continues...


Zitat von Eugen Block :


Hi,

Depending on your kernel (memory leaks with CephFS) increasing the
mds_cache_memory_limit could be of help. What is your current
setting now?

ceph:~ # ceph daemon mds. config show | grep mds_cache_memory_limit

We had these messages for months, almost every day.
It would occur when hourly backup jobs ran and the MDS had to serve
an additional client (searching the whole CephFS for changes)
besides the existing CephFS clients. First we updated all clients to
a more recent kernel version, but the warnings didn't stop. Then we
doubled the cache size from 2 GB to 4 GB last week and since then I
haven't seen this warning again (for now).

Try playing with the cache size to find a setting fitting your
needs, but don't forget to monitor your MDS in case something goes
wrong.

Regards,
Eugen


Zitat von Wido den Hollander :


On 08/13/2018 01:22 PM, Zhenshi Zhou wrote:

Hi,
Recently, the cluster runs healthy, but I get warning messages

every day:




Which version of Ceph? Which version of clients?

Can you post:

$ ceph versions
$ ceph features
$ ceph fs status

Wido


2018-08-13 17:39:23.682213 [INF]  Cluster is now healthy
2018-08-13 17:39:23.682144 [INF]  Health check cleared:
MDS_CLIENT_RECALL (was: 6 clients failing to respond to cache pressure)
2018-08-13 17:39:23.052022 [INF]  MDS health message cleared (mds.0):
Client docker38:docker failing to respond to cache pressure
2018-08-13 17:39:23.051979 [INF]  MDS health message cleared (mds.0):
Client docker73:docker failing to respond to cache pressure
2018-08-13 17:39:23.051934 [INF]  MDS health message cleared (mds.0):
Client docker74:docker failing to respond to cache pressure
2018-08-13 17:39:23.051853 [INF]  MDS health message cleared (mds.0):
Client docker75:docker failing to respond to cache pressure
2018-08-13 17:39:23.051815 [INF]  MDS health message cleared (mds.0):
Client docker27:docker failing to respond to cache pressure
2018-08-13 17:39:23.051753 [INF]  MDS health message cleared (mds.0):
Client docker27 failing to respond to cache pressure
2018-08-13 17:38:11.100331 [WRN]  Health check update: 6 clients

failing

to respond to cache pressure (MDS_CLIENT_RECALL)
2018-08-13 17:37:39.570014 [WRN]  Health check update: 5 clients

failing

to respond to cache pressure (MDS_CLIENT_RECALL)
2018-08-13 17:37:31.099418 [WRN]  Health check update: 3 clients

failing

to respond to cache pressure (MDS_CLIENT_RECALL)
2018-08-13 17:36:34.564345 [WRN]  Health check update: 1 clients

failing

to respond to cache pressure (MDS_CLIENT_RECALL)
2018-08-13 17:36:27.121891 [WRN]  Health check update: 3 clients

failing

to respond to cache pressure (MDS_CLIENT_RECALL)
2018-08-13 17:36:11.967531 [WRN]  Health check update: 5 clients

failing

to respond to cache pressure (MDS_CLIENT_RECALL)
2018-08-13 17:35:59.870055 [WRN]  Health check update: 6 clients

failing

to respond to cache pressure (MDS_CLIENT_RECALL)
2018-08-13 17:35:47.787323 [WRN]  Health check update: 3 clients

failing

to respond to cache pressure (MDS_CLIENT_RECALL)
2018-08-13 17:34:59.435933 [WRN]  Health check failed: 1 clients

failing

to respo

Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Marc Roos

It is idle, still testing; I run backups on it at night.
How do you fill up the cluster so you can test between empty and full? 
Do you have a "ceph df" from empty and full? 

I have done another test disabling new scrubs on the rbd.ssd pool (but 
still 3 on hdd) with:
ceph tell osd.* injectargs --osd_max_backfills=0
Again getting slower towards the end.
Bandwidth (MB/sec): 395.749
Average Latency(s): 0.161713


-Original Message-
From: Menno Zonneveld [mailto:me...@1afa.com] 
Sent: donderdag 6 september 2018 16:56
To: Marc Roos; ceph-users
Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
expected performance

The benchmark does fluctuate quite a bit that's why I run it for 180 
seconds now as then I do get consistent results.

Your performance seems on par with what I'm getting with 3 nodes and 9 
OSD's, not sure what to make of that.

Are your machines actively used perhaps? Mine are mostly idle as it's 
still a test setup.

-Original message-
> From:Marc Roos 
> Sent: Thursday 6th September 2018 16:23
> To: ceph-users ; Menno Zonneveld 
> 
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> than expected performance
> 
> 
> 
> I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB 2x E5-2660 2x 

> LSI SAS2308 1x dual port 10Gbit (one used, and shared between 
> cluster/client vlans)
> 
> I have 5 pg's scrubbing, but I am not sure if there is any on the ssd 
> pool. I am noticing a drop in the performance at the end of the test.
> Maybe some caching on the ssd?
> 
> rados bench -p rbd.ssd 60 write -b 4M -t 16
> Bandwidth (MB/sec): 448.465
> Average Latency(s): 0.142671
> 
> rados bench -p rbd.ssd 180 write -b 4M -t 16
> Bandwidth (MB/sec): 381.998
> Average Latency(s): 0.167524
> 
> 
> -Original Message-
> From: Menno Zonneveld [mailto:me...@1afa.com]
> Sent: donderdag 6 september 2018 15:52
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> than expected performance
> 
> ah yes, 3x replicated with minimal 2.
> 
> 
> my ceph.conf is pretty bare, just in case it might be relevant
> 
> [global]
>auth client required = cephx
>auth cluster required = cephx
>auth service required = cephx
> 
>cluster network = 172.25.42.0/24
> 
>fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
> 
>keyring = /etc/pve/priv/$cluster.$name.keyring
> 
>mon allow pool delete = true
>mon osd allow primary affinity = true
> 
>osd journal size = 5120
>osd pool default min size = 2
>osd pool default size = 3
> 
> 
> -Original message-
> > From:Marc Roos 
> > Sent: Thursday 6th September 2018 15:43
> > To: ceph-users ; Menno Zonneveld 
> > 
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> > than expected performance
> > 
> >  
> > 
> > Test pool is 3x replicated?
> > 
> > 
> > -Original Message-
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: donderdag 6 september 2018 15:29
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Rados performance inconsistencies, lower than 
> > expected performance
> > 
> > I've setup a CEPH cluster to test things before going into 
> > production but I've run into some performance issues that I cannot 
> > resolve or explain.
> > 
> > Hardware in use in each storage machine (x3)
> > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 
> > 9000)
> > - dual 10Gbit EdgeSwitch 16-Port XG
> > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > - 3x Intel S4500 480GB SSD as OSD's
> > - 2x SSD raid-1 boot/OS disks
> > - 2x Intel(R) Xeon(R) CPU E5-2630
> > - 128GB memory
> > 
> > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 

> > on all nodes.
> > 
> > Running rados benchmark resulted in somewhat lower than expected 
> > performance unless ceph enters the 'near-full' state. When the 
> > cluster
> 
> > is mostly empty rados bench (180 write -b 4M -t 16) results in about 

> > 330MB/s with 0.18ms latency but when hitting near-full state this 
> > goes
> 
> > up to a more expected 550MB/s and 0.11ms latency.
> > 
> > iostat on the storage machines shows the disks are hardly utilized 
> > unless the cluster hits near-full, CPU and network also aren't maxed 

> > out. I’ve also tried with NIC bonding and just one switch, without 
> > jumbo frames but nothing seem to matter in this case.
> > 
> > Is this expected behavior or what can I try to do to pinpoint the 
> > bottleneck ?
> > 
> > The expected performance is per Proxmox's benchmark results they 
> > released this year, they have 4 OSD's per server and hit almost 
> > 800MB/s with 0.08ms latency using 10Gbit and 3 nodes, though they 
> > have
> 
> > more OSD's and somewhat different hardware I understand I won't hit 
> > the 800MB/s mark but the difference between empty and almost full 
> > cluster makes no sense to 

Re: [ceph-users] mgr/dashboard: Community branding & styling

2018-09-06 Thread Ernesto Puerta
Thanks for the feedback, Erwan, John! We may follow up on the tracker issues.

Kind Regards,
Ernesto


On Thu, Sep 6, 2018 at 3:54 PM Erwan Velu  wrote:
>
> Cool stuff.
> I've added my comments to the tickets.
>
> Cheers,
>
> - Original Message -
> From: "Ernesto Puerta" 
> To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com
> Cc: "Michael Celedonia" , "Ju Lim" 
> Sent: Wednesday, 5 September 2018 13:49:41
> Subject: mgr/dashboard: Community branding & styling
>
> Hi dashboard devels & users,
>
> You may find below a link to a PDF with the recommendations from
> Michael Celedonia & Ju Lim (in CC) on top of the current community
> branding, but just to summarize changes
> (http://tracker.ceph.com/issues/35688):
>
> - Login screen (http://tracker.ceph.com/issues/35689).
> - Masthead (http://tracker.ceph.com/issues/35690).
> - Landing screen (http://tracker.ceph.com/issues/35691).
> - About modal window (http://tracker.ceph.com/issues/35693).
> - Ceph brand gray (#47545C), and switch all shades of gray (sorry,
> npi) according to that hue (e.g: a dark bluish gray - #333E46)
> (http://tracker.ceph.com/issues/35692).
>
> Please, feel free to provide your feedback and suggestions on these
> proposals, either here or, even better, in those tickets.
>
> [ PDF: 
> https://gist.github.com/epuertat/7aa01770e47dbb6d99ba35b8cd3f9391/raw/aa26f4d0a3f351be0bcf3ed87e1acfd45a702145/ceph-dashboard-community-branding-styling.pdf
> ]
>
> Kind Regards,
> Ernesto
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Alwin Antreich
On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc Roos wrote:
> 
> It is idle, testing still, running a backup's at night on it.
> How do you fill up the cluster so you can test between empty and full? 
> Do you have a "ceph df" from empty and full? 
> 
> I have done another test disabling new scrubs on the rbd.ssd pool (but 
> still 3 on hdd) with:
> ceph tell osd.* injectargs --osd_max_backfills=0
> Again getting slower towards the end.
> Bandwidth (MB/sec): 395.749
> Average Latency(s): 0.161713
In the results you both had, the latency is twice as high as in our
tests [1]. That can already make quite a difference. Depending on the
actual hardware used, there may or may not be room for good
optimisation.

As a start, you could test the disks with fio, as shown in our benchmark
paper, to get some results for comparison. The forum thread [1] has
some benchmarks from other users for comparison.

[1] https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
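
For reference, the single-threaded 4k sync write test from that paper is along 
these lines (the device path is only an example, and running it against a raw 
device destroys the data on it):

fio --ioengine=libaio --filename=/dev/sdX --direct=1 --sync=1 --rw=write \
    --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test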

--
Cheers,
Alwin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.8 Luminous released

2018-09-06 Thread Igor Fedotov

Hi Adrian,

yes, this issue has been fixed by

https://github.com/ceph/ceph/pull/22909


Thanks,

Igor


On 9/6/2018 8:10 AM, Adrian Saul wrote:

Can I confirm if this bluestore compression assert issue is resolved in 12.2.8?

https://tracker.ceph.com/issues/23540

I notice that it has a backport listed against 12.2.8, but there is no 
mention of that issue or backport in the release notes.



-Original Message-
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
ow...@vger.kernel.org] On Behalf Of Abhishek Lekshmanan
Sent: Wednesday, 5 September 2018 2:30 AM
To: ceph-de...@vger.kernel.org; ceph-us...@ceph.com; ceph-
maintain...@ceph.com; ceph-annou...@ceph.com
Subject: v12.2.8 Luminous released


We're glad to announce the next point release in the Luminous v12.2.X stable
release series. This release contains a range of bugfixes and stability
improvements across all the components of ceph. For detailed release notes
with links to tracker issues and pull requests, refer to the blog post at
http://ceph.com/releases/v12-2-8-released/

Upgrade Notes from previous luminous releases
-

When upgrading from v12.2.5 or v12.2.6 please note that upgrade caveats
from
12.2.5 will apply to any _newer_ luminous version including 12.2.8. Please
read the notes at https://ceph.com/releases/12-2-7-luminous-
released/#upgrading-from-v12-2-6

For the cluster that installed the broken 12.2.6 release, 12.2.7 fixed the
regression and introduced a workaround option `osd distrust data digest =
true`, but 12.2.7 clusters still generated health warnings like ::

   [ERR] 11.288 shard 207: soid
   11:1155c332:::rbd_data.207dce238e1f29.0527:head
data_digest
   0xc8997a5b != data_digest 0x2ca15853


12.2.8 improves the deep scrub code to automatically repair these
inconsistencies. Once the entire cluster has been upgraded and then fully
deep scrubbed, and all such inconsistencies are resolved; it will be safe to
disable the `osd distrust data digest = true` workaround option.

Changelog
-
* bluestore: set correctly shard for existed Collection (issue#24761, pr#22860,
Jianpeng Ma)
* build/ops: Boost system library is no longer required to compile and link
example librados program (issue#25054, pr#23202, Nathan Cutler)
* build/ops: Bring back diff -y for non-FreeBSD (issue#24396, issue#21664,
pr#22848, Sage Weil, David Zafman)
* build/ops: install-deps.sh fails on newest openSUSE Leap (issue#25064,
pr#23179, Kyr Shatskyy)
* build/ops: Mimic build fails with -DWITH_RADOSGW=0 (issue#24437,
pr#22864, Dan Mick)
* build/ops: order rbdmap.service before remote-fs-pre.target
(issue#24713, pr#22844, Ilya Dryomov)
* build/ops: rpm: silence osd block chown (issue#25152, pr#23313, Dan van
der Ster)
* cephfs-journal-tool: Fix purging when importing an zero-length journal
(issue#24239, pr#22980, yupeng chen, zhongyan gu)
* cephfs: MDSMonitor: uncommitted state exposed to clients/mdss
(issue#23768, pr#23013, Patrick Donnelly)
* ceph-fuse mount failed because no mds (issue#22205, pr#22895, liyan)
* ceph-volume add a __release__ string, to help version-conditional calls
(issue#25170, pr#23331, Alfredo Deza)
* ceph-volume: adds test for `ceph-volume lvm list /dev/sda` (issue#24784,
issue#24957, pr#23350, Andrew Schoen)
* ceph-volume: do not use stdin in luminous (issue#25173, issue#23260,
pr#23367, Alfredo Deza)
* ceph-volume enable the ceph-osd during lvm activation (issue#24152,
pr#23394, Dan van der Ster, Alfredo Deza)
* ceph-volume expand on the LVM API to create multiple LVs at different
sizes (issue#24020, pr#23395, Alfredo Deza)
* ceph-volume lvm.activate conditional mon-config on prime-osd-dir
(issue#25216, pr#23397, Alfredo Deza)
* ceph-volume lvm.batch remove non-existent sys_api property
(issue#34310, pr#23811, Alfredo Deza)
* ceph-volume lvm.listing only include devices if they exist (issue#24952,
pr#23150, Alfredo Deza)
* ceph-volume: process.call with stdin in Python 3 fix (issue#24993, pr#23238,
Alfredo Deza)
* ceph-volume: PVolumes.get() should return one PV when using name or
uuid (issue#24784, pr#23329, Andrew Schoen)
* ceph-volume: refuse to zap mapper devices (issue#24504, pr#23374,
Andrew Schoen)
* ceph-volume: tests.functional inherit SSH_ARGS from ansible (issue#34311,
pr#23813, Alfredo Deza)
* ceph-volume tests/functional run lvm list after OSD provisioning
(issue#24961, pr#23147, Alfredo Deza)
* ceph-volume: unmount lvs correctly before zapping (issue#24796,
pr#23128, Andrew Schoen)
* ceph-volume: update batch documentation to explain filestore strategies
(issue#34309, pr#23825, Alfredo Deza)
* change default filestore_merge_threshold to -10 (issue#24686, pr#22814,
Douglas Fuller)
* client: add inst to asok status output (issue#24724, pr#23107, Patrick
Donnelly)
* client: fixup parallel calls to ceph_ll_lookup_inode() in NFS FASL
(issue#22683, pr#23012, huanwen ren)
* client: increase verbosity level for log messages in helper methods
(i

[ceph-users] CephFS on a mixture of SSDs and HDDs

2018-09-06 Thread Vladimir Brik
Hello

I am setting up a new ceph cluster (probably Mimic) made up of servers
that have a mixture of solid state and spinning disks. I'd like CephFS
to store data of some of our applications only on SSDs, and store data
of other applications only on HDDs.

Is there a way of doing this without running multiple filesystems within
the same cluster? (E.g. something like configuring CephFS to store data
of some directory trees in an SSD pool, and storing others in an HDD pool)

If not, can anybody comment on their experience running multiple file
systems in a single cluster? Are there any known issues (I am only aware
of some issues related to security)?

Does anybody know if support/testing of multiple filesystems in a
cluster is something actively being worked on and if it might stop being
"experimental" in near future?


Thanks very much,

Vlad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph talks from Mounpoint.io

2018-09-06 Thread Gregory Farnum
Unfortunately I don't believe anybody collected the slide files, so they
aren't available for public access. :(

On Wed, Sep 5, 2018 at 8:16 PM xiangyang yu  wrote:

> Hi  Greg,
> Where can we download the talk ppt at mountpoint.io?
>
> Best  wishes,
> brandy
>
> Gregory Farnum  wrote on Thursday, 6 September 2018 at 07:05:
>
>> Hey all,
>> Just wanted to let you know that all the talks from Mountpoint.io are
>> now available on YouTube. These are reasonably high-quality videos and
>> include Ceph talks such as:
>> "Bringing smart device failure prediction to Ceph"
>> "Pains & Pleasures Testing the Ceph Distributed Storage Stack"
>> "Ceph cloud object storage: the right way"
>> "Lessons Learned Scaling Ceph for Public Clouds"
>> "Making Ceph fast in the face of failure"
>> "Anatomy of a librados client application"
>> "Self-aware Ceph: enabling ceph-mgr to control Ceph services via
>> Kubernetes"
>> "Doctor! I need Ceph: a journey of open source storage in healthcare‍"
>> "Rook: Storage Orchestration for a Cloud-Native World"
>> "What’s new in Ceph"
>> and possibly some others I've missed (sorry!).
>>
>> https://www.youtube.com/playlist?list=PL3P__0CcDTTHn7_QtNauTqpYxLCczR431
>>
>> Enjoy!
>> -Greg
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS on a mixture of SSDs and HDDs

2018-09-06 Thread Serkan Çoban
>Is there a way of doing this without running multiple filesystems within the 
>same cluster?
yes, have a look at following link:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_file_system_guide/index#working-with-file-and-directory-layouts
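
For example, once an SSD-backed data pool has been added to the file system, a
directory tree can be pinned to it with a layout attribute (pool name and path
are only examples):

setfattr -n ceph.dir.layout.pool -v cephfs_ssd_data /mnt/cephfs/ssd-apps
getfattr -n ceph.dir.layout /mnt/cephfs/ssd-apps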
On Thu, Sep 6, 2018 at 8:19 PM Vladimir Brik
 wrote:
>
> Hello
>
> I am setting up a new ceph cluster (probably Mimic) made up of servers
> that have a mixture of solid state and spinning disks. I'd like CephFS
> to store data of some of our applications only on SSDs, and store data
> of other applications only on HDDs.
>
> Is there a way of doing this without running multiple filesystems within
> the same cluster? (E.g. something like configuring CephFS to store data
> of some directory trees in an SSD pool, and storing others in an HDD pool)
>
> If not, can anybody comment on their experience running multiple file
> systems in a single cluster? Are there any known issues (I am only aware
> of some issues related to security)?
>
> Does anybody know if support/testing of multiple filesystems in a
> cluster is something actively being worked on and if it might stop being
> "experimental" in near future?
>
>
> Thanks very much,
>
> Vlad
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS on a mixture of SSDs and HDDs

2018-09-06 Thread Marc Roos
 
To add a data pool to an existing cephfs 

ceph osd pool set fs_data.ec21 allow_ec_overwrites true
ceph osd pool application enable fs_data.ec21 cephfs
ceph fs add_data_pool cephfs fs_data.ec21

Then link the pool to the directory (ec21)
setfattr -n ceph.dir.layout.pool -v fs_data.ec21 ec21
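
If the EC pool does not exist yet, it can be created beforehand along these 
lines (profile name, k/m values and PG count are only examples):

ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=host
ceph osd pool create fs_data.ec21 64 64 erasure ec21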


-Original Message-
From: Vladimir Brik [mailto:vladimir.b...@icecube.wisc.edu] 
Sent: donderdag 6 september 2018 19:01
To: ceph-users@lists.ceph.com
Subject: [ceph-users] CephFS on a mixture of SSDs and HDDs

Hello

I am setting up a new ceph cluster (probably Mimic) made up of servers 
that have a mixture of solid state and spinning disks. I'd like CephFS 
to store data of some of our applications only on SSDs, and store data 
of other applications only on HDDs.

Is there a way of doing this without running multiple filesystems within 
the same cluster? (E.g. something like configuring CephFS to store data 
of some directory trees in an SSD pool, and storing others in an HDD 
pool)

If not, can anybody comment on their experience running multiple file 
systems in a single cluster? Are there any known issues (I am only aware 
of some issues related to security)?

Does anybody know if support/testing of multiple filesystems in a 
cluster is something actively being worked on and if it might stop being 
"experimental" in near future?


Thanks very much,

Vlad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph and NVMe

2018-09-06 Thread Stefan Priebe - Profihost AG
Hello list,

has anybody tested current NVMe performance with luminous and bluestore?
Is this something which makes sense or just a waste of money?

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and NVMe

2018-09-06 Thread Steven Vacaroaia
Hi ,
Just to add to this question, is anyone using Intel Optane DC P4800X on
DELL R630 ...or any other server ?
Any gotchas / feedback/ knowledge sharing will be greatly appreciated

Steven

On Thu, 6 Sep 2018 at 14:59, Stefan Priebe - Profihost AG <
s.pri...@profihost.ag> wrote:

> Hello list,
>
> has anybody tested current NVMe performance with luminous and bluestore?
> Is this something which makes sense or just a waste of money?
>
> Greets,
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph talks from Mounpoint.io

2018-09-06 Thread David Turner
They mentioned that they were going to send the slides to everyone that had
their badges scanned at the conference.  I haven't seen that email come out
yet, though.

On Thu, Sep 6, 2018 at 4:14 PM Gregory Farnum  wrote:

> Unfortunately I don't believe anybody collected the slide files, so they
> aren't available for public access. :(
>
> On Wed, Sep 5, 2018 at 8:16 PM xiangyang yu  wrote:
>
>> Hi  Greg,
>> Where can we download the talk ppt at mountpoint.io?
>>
>> Best  wishes,
>> brandy
>>
>> Gregory Farnum  wrote on Thursday, 6 September 2018 at 07:05:
>>
> Hey all,
>>> Just wanted to let you know that all the talks from Mountpoint.io are
>>> now available on YouTube. These are reasonably high-quality videos and
>>> include Ceph talks such as:
>>> "Bringing smart device failure prediction to Ceph"
>>> "Pains & Pleasures Testing the Ceph Distributed Storage Stack"
>>> "Ceph cloud object storage: the right way"
>>> "Lessons Learned Scaling Ceph for Public Clouds"
>>> "Making Ceph fast in the face of failure"
>>> "Anatomy of a librados client application"
>>> "Self-aware Ceph: enabling ceph-mgr to control Ceph services via
>>> Kubernetes"
>>> "Doctor! I need Ceph: a journey of open source storage in healthcare‍"
>>> "Rook: Storage Orchestration for a Cloud-Native World"
>>> "What’s new in Ceph"
>>> and possibly some others I've missed (sorry!).
>>>
>>> https://www.youtube.com/playlist?list=PL3P__0CcDTTHn7_QtNauTqpYxLCczR431
>>>
>>> Enjoy!
>>> -Greg
>>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph talks from Mounpoint.io

2018-09-06 Thread Amye Scavarda
Still working on all of that! Never fear!
-- amye
On Thu, Sep 6, 2018 at 1:16 PM David Turner  wrote:

> They mentioned that they were going to send the slides to everyone that
> had their badges scanned at the conference.  I haven't seen that email come
> out yet, though.
>
> On Thu, Sep 6, 2018 at 4:14 PM Gregory Farnum  wrote:
>
>> Unfortunately I don't believe anybody collected the slide files, so they
>> aren't available for public access. :(
>>
>> On Wed, Sep 5, 2018 at 8:16 PM xiangyang yu  wrote:
>>
>>> Hi  Greg,
>>> Where can we download the talk ppt at mountpoint.io?
>>>
>>> Best  wishes,
>>> brandy
>>>
>>> Gregory Farnum  wrote on Thursday, 6 September 2018 at 07:05:
>>>
>> Hey all,
 Just wanted to let you know that all the talks from Mountpoint.io are
 now available on YouTube. These are reasonably high-quality videos and
 include Ceph talks such as:
 "Bringing smart device failure prediction to Ceph"
 "Pains & Pleasures Testing the Ceph Distributed Storage Stack"
 "Ceph cloud object storage: the right way"
 "Lessons Learned Scaling Ceph for Public Clouds"
 "Making Ceph fast in the face of failure"
 "Anatomy of a librados client application"
 "Self-aware Ceph: enabling ceph-mgr to control Ceph services via
 Kubernetes"
 "Doctor! I need Ceph: a journey of open source storage in healthcare‍"
 "Rook: Storage Orchestration for a Cloud-Native World"
 "What’s new in Ceph"
 and possibly some others I've missed (sorry!).

 https://www.youtube.com/playlist?list=PL3P__0CcDTTHn7_QtNauTqpYxLCczR431

 Enjoy!
 -Greg

>>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and NVMe

2018-09-06 Thread Jeff Bailey
I haven't had any problems using 375GB P4800X's in R730 and R740xd 
machines for DB+WAL.  The iDRAC whines a bit on the R740 but everything 
works fine.
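
For reference, placing DB+WAL on such a device when creating an OSD can be 
done with ceph-volume along these lines (device paths are examples only):

ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1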


On 9/6/2018 3:09 PM, Steven Vacaroaia wrote:

Hi ,
Just to add to this question, is anyone using Intel Optane DC P4800X on 
DELL R630 ...or any other server ?

Any gotchas / feedback/ knowledge sharing will be greatly appreciated
Steven

On Thu, 6 Sep 2018 at 14:59, Stefan Priebe - Profihost AG 
mailto:s.pri...@profihost.ag>> wrote:


Hello list,

has anybody tested current NVMe performance with luminous and bluestore?
Is this something which makes sense or just a waste of money?

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Safe to use RBD mounts for Docker volumes on containerized Ceph nodes

2018-09-06 Thread Jacob DeGlopper
I've seen the requirement not to mount RBD devices or CephFS filesystems 
on OSD nodes.  Does this still apply when the OSDs and clients using the 
RBD volumes are all in Docker containers?


That is, is it possible to run a 3-server setup in production with both 
Ceph daemons (mon, mgr, and OSD) in containers, along with applications 
in containers using Ceph as shared storage (Elasticsearch, gitlab, etc)?


    -- jacob

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and NVMe

2018-09-06 Thread Linh Vu
We have P3700s and Optane 900P (similar to P4800 but the workstation version 
and a lot cheaper) on R730xds, for WAL, DB and metadata pools for cephfs and 
radosgw. They perform great!
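
For reference, pinning a metadata pool to NVMe-class OSDs can be done with a 
device-class CRUSH rule along these lines (rule and pool names are examples 
only):

ceph osd crush rule create-replicated nvme-only default host nvme
ceph osd pool set cephfs_metadata crush_rule nvme-only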


From: ceph-users  on behalf of Jeff Bailey 

Sent: Friday, 7 September 2018 7:36:19 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph and NVMe

I haven't had any problems using 375GB P4800X's in R730 and R740xd
machines for DB+WAL.  The iDRAC whines a bit on the R740 but everything
works fine.

On 9/6/2018 3:09 PM, Steven Vacaroaia wrote:
> Hi ,
> Just to add to this question, is anyone using Intel Optane DC P4800X on
> DELL R630 ...or any other server ?
> Any gotchas / feedback/ knowledge sharing will be greatly appreciated
> Steven
>
> On Thu, 6 Sep 2018 at 14:59, Stefan Priebe - Profihost AG
> mailto:s.pri...@profihost.ag>> wrote:
>
> Hello list,
>
> has anybody tested current NVMe performance with luminous and bluestore?
> Is this something which makes sense or just a waste of money?
>
> Greets,
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse using excessive memory

2018-09-06 Thread Yan, Zheng
Could you please try making ceph-fuse use the simple messenger (add "ms type
= simple" to the client section of ceph.conf)?

Regards
Yan, Zheng



On Wed, Sep 5, 2018 at 10:09 PM Sage Weil  wrote:
>
> On Wed, 5 Sep 2018, Andras Pataki wrote:
> > Hi cephers,
> >
> > Every so often we have a ceph-fuse process that grows to rather large size 
> > (up
> > to eating up the whole memory of the machine).  Here is an example of a 
> > 200GB
> > RSS size ceph-fuse instance:
> >
> > # ceph daemon /var/run/ceph/ceph-client.admin.asok dump_mempools
> > {
> > "bloom_filter": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluestore_alloc": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluestore_cache_data": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluestore_cache_onode": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluestore_cache_other": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluestore_fsck": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluestore_txc": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluestore_writing_deferred": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluestore_writing": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluefs": {
> > "items": 0,
> > "bytes": 0
> > },
> > "buffer_anon": {
> > "items": 51534897,
> > "bytes": 207321872398
> > },
> > "buffer_meta": {
> > "items": 64,
> > "bytes": 5632
> > },
> > "osd": {
> > "items": 0,
> > "bytes": 0
> > },
> > "osd_mapbl": {
> > "items": 0,
> > "bytes": 0
> > },
> > "osd_pglog": {
> > "items": 0,
> > "bytes": 0
> > },
> > "osdmap": {
> > "items": 28593,
> > "bytes": 431872
> > },
> > "osdmap_mapping": {
> > "items": 0,
> > "bytes": 0
> > },
> > "pgmap": {
> > "items": 0,
> > "bytes": 0
> > },
> > "mds_co": {
> > "items": 0,
> > "bytes": 0
> > },
> > "unittest_1": {
> > "items": 0,
> > "bytes": 0
> > },
> > "unittest_2": {
> > "items": 0,
> > "bytes": 0
> > },
> > "total": {
> > "items": 51563554,
> > "bytes": 207322309902
> > }
> > }
> >
> > The general cache size looks like this (if it is helpful I can put a whole
> > cache dump somewhere):
> >
> > # ceph daemon /var/run/ceph/ceph-client.admin.asok dump_cache | grep path | 
> > wc
> > -l
> > 84085
> > # ceph daemon /var/run/ceph/ceph-client.admin.asok dump_cache | grep name | 
> > wc
> > -l
> > 168186
> >
> > Any ideas what 'buffer_anon' is and what could be eating up the 200GB of
> > RAM?
>
> buffer_anon is memory consumed by the bufferlist class that hasn't been
> explicitly put into a separate mempool category.  The question is
> where/why are buffers getting pinned in memory.  Can you dump the
> perfcounters?  That might give some hint.
>
> My guess is a leak, or a problem with the ObjectCacher code that is
> preventing it from timming older buffers.
>
> How reproducible is the situation?  Any idea what workloads trigger it?
>
> Thanks!
> sage
>
> >
> > We are running with a few ceph-fuse specific parameters increased in
> > ceph.conf:
> >
> ># Description:  Set the number of inodes that the client keeps in
> >the metadata cache.
> ># Default:  16384
> >client_cache_size = 262144
> >
> ># Description:  Set the maximum number of dirty bytes in the object
> >cache.
> ># Default:  104857600 (100MB)
> >client_oc_max_dirty = 536870912
> >
> ># Description:  Set the maximum number of objects in the object cache.
> ># Default:  1000
> >client_oc_max_objects = 8192
> >
> ># Description:  Set how many bytes of data will the client cache.
> ># Default:  209715200 (200 MB)
> >client_oc_size = 2147483640
> >
> ># Description:  Set the maximum number of bytes that the kernel
> >reads ahead for future read operations. Overridden by the
> >client_readahead_max_periods setting.
> ># Default:  0 (unlimited)
> >#client_readahead_max_bytes = 67108864
> >
> ># Description:  Set the number of file layout periods (object size *
> >number of stripes) that the kernel reads ahead. Overrides the
> >client_readahead_max_bytes setting.
> ># Default:  4
> >client_readahead_max_periods = 64
> >
> ># Description:  Set the minimum number bytes that the kernel reads
> >ahead.
> ># Default:  131072 (128KB)
> >client_readahead_min = 4194304
> >
> >
> > We are running a 12.2.7 ceph cluster, and the cluster is otherwise healthy.
> >
> > Any hints would be appreciated.  Thanks,
> >
> > Andras
> >
> > ___
> ceph-users mailing list
> ceph-users@l