[ceph-users] Behavior of ceph-fuse when network is down

2017-11-24 Thread Zhang Qiang
Hi all,

To observe what happens to a ceph-fuse mount when the network is down,
we blocked network connections to all three monitors with iptables. If
we restore the network immediately (within minutes), the blocked I/O
requests complete and everything goes back to normal.
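
For reference, the blocking can be done with rules roughly along these
lines (the addresses are placeholders and the default monitor port 6789
is assumed):

  # drop client traffic to each of the three monitors
  iptables -A OUTPUT -p tcp -d <mon1-ip> --dport 6789 -j DROP
  iptables -A OUTPUT -p tcp -d <mon2-ip> --dport 6789 -j DROP
  iptables -A OUTPUT -p tcp -d <mon3-ip> --dport 6789 -j DROP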

But if we keep the block in place long enough, say twenty minutes,
ceph-fuse does not recover. The ceph-fuse process is still there, but
it can no longer handle I/O; df or ls hangs indefinitely.

What is the retry policy of ceph-fuse? Is it normal for ceph-fuse to
hang after the network has been blocked for that long? If so, how can I
make it return to normal once the network is back? If it is not normal,
what might be the cause, and how can I help debug this?

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Behavior of ceph-fuse when network is down

2017-11-24 Thread Yan, Zheng
On Fri, Nov 24, 2017 at 4:59 PM, Zhang Qiang  wrote:
> Hi all,
>
> To observe what happens to a ceph-fuse mount when the network is down,
> we blocked network connections to all three monitors with iptables. If
> we restore the network immediately (within minutes), the blocked I/O
> requests complete and everything goes back to normal.
>
> But if we keep the block in place long enough, say twenty minutes,
> ceph-fuse does not recover. The ceph-fuse process is still there, but
> it can no longer handle I/O; df or ls hangs indefinitely.
>
> What is the retry policy of ceph-fuse? Is it normal for ceph-fuse to
> hang after the network has been blocked for that long? If so, how can I
> make it return to normal once the network is back? If it is not normal,
> what might be the cause, and how can I help debug this?

You can use the 'kick_stale_sessions' ASOK command to make ceph-fuse
reconnect, or set the 'client_reconnect_stale' config option to true.
In addition, you need to set the MDS config option
'mds_session_blacklist_on_timeout' to false.
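
For example, something along these lines (the asok path is illustrative;
use the actual ceph-fuse client socket on your host):

  # tell the running ceph-fuse client to kick its stale sessions
  ceph --admin-daemon /var/run/ceph/ceph-client.admin.<pid>.asok kick_stale_sessions

  # or let the client reconnect on its own (client's ceph.conf)
  [client]
      client_reconnect_stale = true

  # and on the MDS side, so that timed-out sessions are not blacklisted
  [mds]
      mds_session_blacklist_on_timeout = false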

>
> Thanks.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Sharing Bluestore WAL

2017-11-24 Thread Richard Hesketh
On 23/11/17 17:19, meike.talb...@women-at-work.org wrote:
> Hello,
> 
> in our present Ceph cluster we have 12 HDD OSDs per host.
> All OSDs share a common SSD for journaling.
> The SSD is also used as the root device, and the 12 journals are files
> in the /usr/share directory, like this:
> 
> OSD 1 - data /dev/sda - journal /usr/share/sda
> OSD 2 - data /dev/sdb - journal /usr/share/sdb
> ...
> 
> We now want to migrate to Bluestore and continue to use this approach.
> I tried "ceph-deploy osd prepare test04:sdc --bluestore --block-db
> /var/local/sdc-block --block-wal /var/local/sdc-wal" to set up an OSD,
> which essentially works.
> 
> However, I'm wondering whether this is correct at all.
> And how can I make sure that sdc-block and sdc-wal do not fill up the
> SSD disk?
> Is there any option to limit the file size, and what is the recommended
> value for such an option?
> 
> Thank you
> 
> Meike

The maximum size of the WAL is dependent on cluster configuration values, but 
it will always be relatively small. There is no maximum DB size, nor, as it 
stands, are there good estimates for how large a DB may realistically grow. 
The expected behaviour is that if the DB outgrows its device, it will spill 
over onto the data device. I don't believe there is any option that would let 
you effectively limit the size of the files if you're using flat files to back 
your devices.

Using files for your DB/WAL is not recommended practice - you have the space 
problems that you mention and you'll also be suffering a performance hit by 
sticking a filesystem layer in the middle of things. Realistically, you should 
partition your SSD and provide entire partitions as the devices on which to 
store your OSD DBs. There is no point in specifying the WAL as a separate 
device unless you're doing something advanced; it will be stored alongside the 
DB on the DB device if not otherwise specified, and since you're putting them 
on the same device anyway you gain no advantage from splitting them. With 
everything partitioned off correctly, you don't have to worry about Ceph data 
encroaching on your root FS space.
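
For example, something like this (device names and sizes are only
placeholders):

  # carve the SSD into one DB partition per OSD
  sgdisk --new=1:0:+40G /dev/sdX
  # then hand the partition (not a file) to ceph-deploy
  ceph-deploy osd prepare test04:sdc --bluestore --block-db /dev/sdX1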

I would also worry that unless that one SSD is very large, 12 HDDs : 1 SSD 
could be overdoing it. Filestore journals sustained a lot of writing but didn't 
need to be very large, comparatively; Bluestore database w/ WAL is a lot 
lighter on the I/O but does need considerably more standing space since it's 
actually permanently storing metadata rather than just write journalling. If 
it's the case that you've only got a few GB of space you can spare for each DB, 
you're probably going to outgrow that very quickly and you won't see much 
actual benefit from using the SSD.
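
Just as an illustration: sharing, say, a 240 GB SSD between 12 OSDs leaves 
only about 20 GB of DB space per OSD before the DBs start spilling over onto 
the HDDs.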

Rich



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] "failed to open ino"

2017-11-24 Thread Jens-U. Mozdzen

Hi all,

with our Ceph Luminous CephFS, we're plagued with "failed to open ino"  
messages. These don't seem to affect daily business (in terms of file  
access). (There's a backup performance issue that may eventually be  
related, but I'll report on that in a separate thread.)


Our Ceph currently is at v12.2.1 (git.1507910930.aea79b8b7a), on  
OpenSUSE Leap 42.3. Three Ceph nodes, 12 HDD OSDs, two SSD OSDs,  
status "HEALTH_OK".


We have a single CephFS and two MDSes (active/standby); the metadata pool  
is on the SSD OSDs, the content pool on the HDD OSDs (all Filestore). That  
CephFS is mounted by several clients (via kernel CephFS support, mostly  
kernel version 4.4.76) and via NFS (kernel nfsd on a kernel-mounted CephFS).


In the log of the active MDS, we currently see the following two  
inodes reported over and over again, about every 30 seconds:


--- cut here ---
2017-11-24 18:24:16.496397 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e45e1d err -22/0
2017-11-24 18:24:16.497037 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e4d6a1 err -22/-22
2017-11-24 18:24:16.500645 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e45e1d err -22/0
2017-11-24 18:24:16.501218 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e4d6a1 err -22/-22
2017-11-24 18:24:46.506210 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e45e1d err -22/0
2017-11-24 18:24:46.506926 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e4d6a1 err -22/-22
2017-11-24 18:24:46.510354 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e45e1d err -22/0
2017-11-24 18:24:46.510891 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e4d6a1 err -22/-22

--- cut here ---

There were other reported inodes with other errors, too ("err -5/0",  
for instance); the root cause seems to be the same (see below).


Inode 0x10001e4d6a1 was first mentioned in the MDS log as follows, and  
then every 30 seconds as "failed to open". 0x10001e45e1d just appeared  
at the same time, "out of the blue":


--- cut here ---
2017-11-23 19:12:28.440107 7f3586cbd700  0 mds.0.bal replicating dir  
[dir 0x10001e4d6a1 /some/path/in/our/cephfs/ [2,head] auth pv=3473  
v=3471 cv=2963/2963 ap=1+6+6 state=1610612738|complete f(v0  
m2017-11-23 19:12:25.005299 15=4+11) n(v74 rc2017-11-23  
19:12:28.429258 b137317605 5935=4875+1060)/n(v74 rc2017-11-23  
19:12:28.337259 b139723963 5969=4898+1071) hs=15+23,ss=0+0 dirty=30 |  
child=1 dirty=1 authpin=1 0x5649e811e9c0] pop 10223.1 .. rdp 373 adj 0
2017-11-23 19:15:55.015347 7f3580cb1700  0 mds.0.cache  failed to open  
ino 0x10001e45e1d err -22/0
2017-11-23 19:15:55.016056 7f3580cb1700  0 mds.0.cache  failed to open  
ino 0x10001e4d6a1 err -22/-22

--- cut here ---

According to a message from this ML, "replicating dir" is supposed to  
indicate that the directory is hot and is being replicated to another  
active MDS to spread the load. But there is no other active MDS, as we  
have only two in total (and Ceph properly reports "mds: cephfs-1/1/1  
up {0=server1=up:active}, 1 up:standby-replay").


From what I can tell from the logs, all error messages seem to relate  
to that same path "/some/path/in/our/cephfs/", which gets deleted and  
recreated every evening, thus changing ino numbers. But what can it be  
that makes that same path, buried far down in the cephfs hierarchy,  
cause these "failed to open ino" messages? It's just one of a lot of  
directories, and not a particularly populated one (about 40 files and  
directories).
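
One thing I could still check (a sketch, assuming the inode is a directory  
and a metadata pool named cephfs_metadata; adjust to your setup) is whether  
the stale inode still has a backtrace object at all:

  # directory inodes keep their backtrace ("parent" xattr) on an object
  # in the metadata pool
  rados -p cephfs_metadata getxattr 10001e4d6a1.00000000 parent > /tmp/parent
  ceph-dencoder type inode_backtrace_t import /tmp/parent decode dump_json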


Another possibly interesting side fact: the previous inodes' messages  
sometimes stop when our evening/night job ends (no more active accesses  
to that directory), and sometimes they continue throughout the day. I've  
even noticed them crossing the directory recreation time (messages for  
the old inode don't stop when the old directory is deleted, except after  
a new "replicating dir" message appears). Today I deleted the directory  
(including its parent) during the day, but the messages wouldn't stop.  
The messages also survive switching to the other MDS. By now the next  
daily instance of our job has started, recreating the directories (that  
I had deleted during the day). The "failed to open ino" messages for the  
old ino numbers still appear every 30 seconds; I've not yet seen any  
"replicating dir" message for that area of the cephfs tree. I have seen  
a few for other areas of the tree, but no other ino numbers were  
reported as "failed to open" - only the two from above:


--- cut here ---
2017-11-24 19:22:13.50 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e45e1d err -22/0
2017-11-24 19:22:13.000770 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e4d6a1 err -22/-22
2017-11-24 19:22:13.003918 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e45e1d err -22/0
2017-11-24 19:22:13.004469 7fa308cf0700  0 mds.0.cache  failed to open  
ino 0x10001e4d6a1 err -22/-22
2017-11-24

Re: [ceph-users] Behavior of ceph-fuse when network is down

2017-11-24 Thread Zhang Qiang
Thanks! I'll check it out.

On 24 Nov 2017 17:58, "Yan, Zheng" wrote:

> On Fri, Nov 24, 2017 at 4:59 PM, Zhang Qiang 
> wrote:
> > Hi all,
> >
> > To observe what happens to a ceph-fuse mount when the network is down,
> > we blocked network connections to all three monitors with iptables. If
> > we restore the network immediately (within minutes), the blocked I/O
> > requests complete and everything goes back to normal.
> >
> > But if we keep the block in place long enough, say twenty minutes,
> > ceph-fuse does not recover. The ceph-fuse process is still there, but
> > it can no longer handle I/O; df or ls hangs indefinitely.
> >
> > What is the retry policy of ceph-fuse? Is it normal for ceph-fuse to
> > hang after the network has been blocked for that long? If so, how can I
> > make it return to normal once the network is back? If it is not normal,
> > what might be the cause, and how can I help debug this?
>
> You can use the 'kick_stale_sessions' ASOK command to make ceph-fuse
> reconnect, or set the 'client_reconnect_stale' config option to true.
> In addition, you need to set the MDS config option
> 'mds_session_blacklist_on_timeout' to false.
>
> >
> > Thanks.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com