On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
>> Hi,
>>
>> ping ...
>
> I will get to look at this again; but not until after next week.
>
>> The main blocked bugs for COLO have been solved,
>
> I've got the v3 set running, but the biggest problem I hit are problems
> with the packet comparison module; I've seen a panic which I think is
> in colo_send_checkpoint_req that I think is due to the use of
> GFP_KERNEL to allocate the netlink message and I think it can schedule
> there. I tried making that a GFP_ATOMIC but I'm hitting other
> problems with :

Thanks for testing. I guess the backtrace should look like:
1. colo_send_checkpoint_req()
2. colo_setup_checkpoint_by_id()

Because we hold the RCU read lock there, we cannot use GFP_KERNEL to
allocate memory: a GFP_KERNEL allocation may sleep, and sleeping is not
allowed while the RCU read lock is held.

>
> kcolo_thread, no conn, schedule out

Hmm, how can I reproduce it? In my tests I only focus on block replication,
and I don't use the network.

>
> that I've not had time to look into yet.
>
> So I only get about a 50% success rate of starting COLO.
> I see there are stuff in the TODO of the colo-proxy that
> seem to say the netlink stuff should change, maybe you're already fixing
> that?

Do you mean you get about a 50% success rate when you use the network?

Thanks
Wen Congyang

>
>> we also have finished some new features and optimization on COLO. (If you
>> are interested in this,
>> we can send them to you in private ;))
>
>> For easy of review, it is better to keep it simple now, so we will not add
>> too much new codes into this frame
>> patch set before it been totally reviewed.
>
> I'd like to see those; but I don't want to take code privately.
> It's OK to post extra stuff as a separate set.
>
>> COLO is a totally new feature which is still in early stage, we hope to
>> speed up the development,
>> so your comments and feedback are warmly welcomed. :)
>
> Yes, it's getting there though; I don't think anyone else has
> got this close to getting a full FT set working with disk and networking.
>
> Dave
>
>>