On Wed, Feb 8, 2017 at 3:05 PM, Ahmed Khuraidah <abushi...@gmail.com> wrote:
> Hi Shinobu,
>
> I am using SUSE packages from their latest SUSE Enterprise Storage 4 and
> am following their documentation (method of deployment: ceph-deploy).
> But I was able to reproduce this issue on Ubuntu 14.04 with the Ceph
> community repositories (also latest Jewel and ceph-deploy) as well.

Community Ceph packages are running on the ubuntu box, right? If so, please
run `ceph -v` on the ubuntu box. And please also send us the same details
for the issue you hit on the suse box.

> On Wed, Feb 8, 2017 at 3:03 AM, Shinobu Kinjo <ski...@redhat.com> wrote:
>
>> Are you using opensource Ceph packages or suse ones?
>>
>> On Sat, Feb 4, 2017 at 3:54 PM, Ahmed Khuraidah <abushi...@gmail.com> wrote:
>>
>>> I have opened a ticket on http://tracker.ceph.com/:
>>>
>>> http://tracker.ceph.com/issues/18816
>>>
>>> My client and server kernels are the same, here is the info:
>>> # lsb_release -a
>>> LSB Version:    n/a
>>> Distributor ID: SUSE
>>> Description:    SUSE Linux Enterprise Server 12 SP2
>>> Release:        12.2
>>> Codename:       n/a
>>> # uname -a
>>> Linux cephnode 4.4.38-93-default #1 SMP Wed Dec 14 12:59:43 UTC 2016
>>> (2d3e9d4) x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Thanks
>>>
>>> On Fri, Feb 3, 2017 at 1:59 PM, John Spray <jsp...@redhat.com> wrote:
>>>
>>>> On Fri, Feb 3, 2017 at 8:07 AM, Ahmed Khuraidah <abushi...@gmail.com> wrote:
>>>> > Thank you guys,
>>>> >
>>>> > I tried to add the option "exec_prerun=echo 3 > /proc/sys/vm/drop_caches"
>>>> > as well as "exec_prerun=echo 3 | sudo tee /proc/sys/vm/drop_caches", but
>>>> > although FIO reports that the command was executed, there is no change.
>>>> >
>>>> > But I also hit another, very strange behaviour. If I run my FIO test
>>>> > (the 3G file case) twice, the first run creates the file and reports a
>>>> > lot of IOPS as described already, but if I drop the cache before the
>>>> > second run (as root: echo 3 > /proc/sys/vm/drop_caches), I end up with
>>>> > a broken MDS:
>>>> >
>>>> > --- begin dump of recent events ---
>>>> >      0> 2017-02-03 02:34:41.974639 7f7e8ec5e700 -1 *** Caught signal
>>>> > (Aborted) **
>>>> >  in thread 7f7e8ec5e700 thread_name:ms_dispatch
>>>> >
>>>> >  ceph version 10.2.4-211-g12b091b (12b091b4a40947aa43919e71a318ed0dcedc8734)
>>>> >  1: (()+0x5142a2) [0x557c51e092a2]
>>>> >  2: (()+0x10b00) [0x7f7e95df2b00]
>>>> >  3: (gsignal()+0x37) [0x7f7e93ccb8d7]
>>>> >  4: (abort()+0x13a) [0x7f7e93ccccaa]
>>>> >  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> > const*)+0x265) [0x557c51f133d5]
>>>> >  6: (MutationImpl::~MutationImpl()+0x28e) [0x557c51bb9e1e]
>>>> >  7: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x39)
>>>> > [0x557c51b2ccf9]
>>>> >  8: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long, bool,
>>>> > unsigned long, utime_t)+0x9a7) [0x557c51ca2757]
>>>> >  9: (Locker::remove_client_cap(CInode*, client_t)+0xb1) [0x557c51ca38f1]
>>>> >  10: (Locker::_do_cap_release(client_t, inodeno_t, unsigned long, unsigned
>>>> > int, unsigned int)+0x90d) [0x557c51ca424d]
>>>> >  11: (Locker::handle_client_cap_release(MClientCapRelease*)+0x1cc)
>>>> > [0x557c51ca449c]
>>>> >  12: (MDSRank::handle_deferrable_message(Message*)+0xc1c) [0x557c51b33d3c]
>>>> >  13: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x557c51b3c991]
>>>> >  14: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x557c51b3dae5]
>>>> >  15: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x557c51b25703]
>>>> >  16: (DispatchQueue::entry()+0x78b) [0x557c5200d06b]
>>>> >  17: (DispatchQueue::DispatchThread::entry()+0xd) [0x557c51ee5dcd]
>>>> >  18: (()+0x8734) [0x7f7e95dea734]
>>>> >  19: (clone()+0x6d) [0x7f7e93d80d3d]
>>>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>>>> > to interpret this.
>>>>
>>>> Oops! Please could you open a ticket on tracker.ceph.com, with this
>>>> backtrace, the client versions, any non-default config settings, and
>>>> the series of operations that led up to it.
>>>>
>>>> Thanks,
>>>> John
>>>>
>>>> > On Thu, Feb 2, 2017 at 9:30 PM, Shinobu Kinjo <ski...@redhat.com> wrote:
>>>> >>
>>>> >> You may want to add this to your FIO recipe:
>>>> >>
>>>> >> * exec_prerun=echo 3 > /proc/sys/vm/drop_caches
>>>> >>
>>>> >> Regards,
>>>> >>
>>>> >> On Fri, Feb 3, 2017 at 12:36 AM, Wido den Hollander <w...@42on.com> wrote:
>>>> >> >
>>>> >> >> On 2 February 2017 at 15:35, Ahmed Khuraidah <abushi...@gmail.com> wrote:
>>>> >> >>
>>>> >> >> Hi all,
>>>> >> >>
>>>> >> >> I am still confused about my CephFS sandbox.
>>>> >> >>
>>>> >> >> When I run a simple FIO test against a single file of 3G, I get far
>>>> >> >> too many IOPS:
>>>> >> >>
>>>> >> >> cephnode:~ # fio payloadrandread64k3G
>>>> >> >> test: (g=0): rw=randread, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio,
>>>> >> >> iodepth=2
>>>> >> >> fio-2.13
>>>> >> >> Starting 1 process
>>>> >> >> test: Laying out IO file(s) (1 file(s) / 3072MB)
>>>> >> >> Jobs: 1 (f=1): [r(1)] [100.0% done] [277.8MB/0KB/0KB /s] [4444/0/0
>>>> >> >> iops] [eta 00m:00s]
>>>> >> >> test: (groupid=0, jobs=1): err= 0: pid=3714: Thu Feb 2 07:07:01 2017
>>>> >> >>   read : io=3072.0MB, bw=181101KB/s, iops=2829, runt= 17370msec
>>>> >> >>     slat (usec): min=4, max=386, avg=12.49, stdev= 6.90
>>>> >> >>     clat (usec): min=202, max=5673.5K, avg=690.81, stdev=361
>>>> >> >>
>>>> >> >> But if I change the file size to 320G, it looks like I bypass the cache:
>>>> >> >>
>>>> >> >> cephnode:~ # fio payloadrandread64k320G
>>>> >> >> test: (g=0): rw=randread, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio,
>>>> >> >> iodepth=2
>>>> >> >> fio-2.13
>>>> >> >> Starting 1 process
>>>> >> >> Jobs: 1 (f=1): [r(1)] [100.0% done] [4740KB/0KB/0KB /s] [74/0/0 iops]
>>>> >> >> [eta 00m:00s]
>>>> >> >> test: (groupid=0, jobs=1): err= 0: pid=3624: Thu Feb 2 06:51:09 2017
>>>> >> >>   read : io=3410.9MB, bw=11641KB/s, iops=181, runt=300033msec
>>>> >> >>     slat (usec): min=4, max=442, avg=14.43, stdev=10.07
>>>> >> >>     clat (usec): min=98, max=286265, avg=10976.32, stdev=14904.82
>>>> >> >>
>>>> >> >> For the random write test this behaviour does not show up; both sizes
>>>> >> >> give almost the same results - around 100 IOPS.
>>>> >> >>
>>>> >> >> So my question: could somebody please clarify where this caching
>>>> >> >> likely happens and how to manage it?
>>>> >> >
>>>> >> > The page cache of your kernel. The kernel will cache the file in memory
>>>> >> > and perform read operations from there.
>>>> >> >
>>>> >> > Best way is to reboot your client between test runs. Although you can
>>>> >> > drop kernel caches, I always reboot to make sure nothing is cached locally.
>>>> >> >
>>>> >> > Wido
>>>> >> >
>>>> >> >> P.S.
>>>> >> >> This is the latest SLES/Jewel based one-node setup, which has:
>>>> >> >> 1 MON, 1 MDS (both data and metadata pools on a SATA drive) and 1 OSD
>>>> >> >> (XFS on SATA and journal on SSD).
>>>> >> >> My FIO config file:
>>>> >> >> direct=1
>>>> >> >> buffered=0
>>>> >> >> ioengine=libaio
>>>> >> >> iodepth=2
>>>> >> >> runtime=300
>>>> >> >>
>>>> >> >> Thanks
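For anyone trying to reproduce this, a minimal fio job built from the
settings quoted above might look like the following. This is only a sketch:
the job name, the /mnt/cephfs directory and size=3G are assumptions, not
taken from Ahmed's actual payloadrandread64k3G file, and the exec_prerun
line (Shinobu's suggestion) only takes effect if fio itself runs as root.

[randread64k3G]
; assumed location of the test file: a CephFS kernel mount at /mnt/cephfs
directory=/mnt/cephfs
size=3G
rw=randread
bs=64k
direct=1
ioengine=libaio
iodepth=2
runtime=300
; drop the page cache right before the job starts (requires running fio as root)
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

As Wido notes, the more reliable check is still to drop caches by hand
(sync; echo 3 > /proc/sys/vm/drop_caches as root) or to reboot the client
between runs, and then compare the 3G and 320G numbers again.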
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com