Hi! Wow! I'm impressed by your analysis, even though when I read your "takes longer and longer" I was guessing that everything existing is compared with everything new, making it O(n^2), which in fact it seems to be. However, that's just a guess at what might be going on ;-)
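For what it's worth, here is a rough way to check that guess against the numbers quoted below. It is only a sketch: the names (TIMINGS_MS, regression_ms, quadratic_ratios, total_hours) are made up for illustration, while the timings and the polynomial are copied from your mail. If the time per batch really grows like x^2, then ms / x^2 should settle to a roughly constant value.

```python
# Per-batch timings copied from the table quoted below:
# (total sessions after the batch, milliseconds to log in that batch of 100).
TIMINGS_MS = [
    (100, 2390), (200, 10270), (300, 26380), (400, 51520), (500, 86380),
    (600, 123230), (700, 171070), (800, 228540), (900, 293650),
    (1000, 377840), (1100, 451320), (1200, 564190), (1300, 635140),
    (1400, 740620), (1500, 868140), (1600, 989730), (1700, 1115170),
    (1800, 1253960), (1900, 1401440), (2000, 1555110),
]


def regression_ms(x):
    """Quadratic fit quoted in the mail (coefficients copied, not re-derived)."""
    return 0.4058 * x**2 - 33.14 * x - 255.9


def quadratic_ratios(timings):
    """Return (sessions, ms / sessions^2); roughly constant if growth is ~x^2."""
    return [(x, ms / x**2) for x, ms in timings]


def total_hours(timings):
    """Sum of all batch times, converted from milliseconds to hours."""
    return sum(ms for _, ms in timings) / 3_600_000


if __name__ == "__main__":
    for x, ratio in quadratic_ratios(TIMINGS_MS):
        print(f"{x:5d} sessions: ms/x^2 = {ratio:.4f}")
    print(f"fit at x=4000: {regression_ms(4000):.0f} ms")
    print(f"sum of the 20 batches: {total_hours(TIMINGS_MS):.2f} h")
```

From 1000 sessions upwards the ratio stays in a narrow band, which is what quadratic growth per batch would look like. It says nothing about where in iscsiadm or iscsid the time actually goes.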
Regards,
Ulrich

>>> Aaron Jones <[email protected]> wrote on 04.09.2013 at 01:28 in message
<[email protected]>:
> Unfortunately there's only one LU per session, and there's presently no way
> to present multiple LUs over a single session. The storage target is a
> cluster that uses iSCSI redirection to direct sessions to the correct
> endpoint on a per-volume basis. We're looking into what it will take to
> expose multiple volumes over a single session, but that may be a while.
>
> I managed to get to 10,000 sessions, and didn't try any higher. My ENOMEM
> errors were being caused by my kernel.pid_max setting being too low. More
> than 32000 pids are being used up during the creation of 10000 sessions. I
> was not actually out of memory.
>
> As for the time it takes to make sessions in various ways, I ran two
> experiments on a setup with 20 ifaces each exposed to 100 targets, for a
> total of 2000 targets. When I run:
>
>     iscsiadm -m node -L all
>
> It takes about 2 minutes 37 seconds for all 2000 logins to complete.
> However, if I run:
>
>     iscsiadm -m node -l -I testiface.0
>     iscsiadm -m node -l -I testiface.1
>     iscsiadm -m node -l -I testiface.2
>     ...
>     iscsiadm -m node -l -I testiface.19
>
> It takes substantially longer, around 3 hours in total. It takes longer and
> longer for each subsequent execution.
>
> Cumulative Sessions  Milliseconds
> -------------------  ------------
>                 100          2390
>                 200         10270
>                 300         26380
>                 400         51520
>                 500         86380
>                 600        123230
>                 700        171070
>                 800        228540
>                 900        293650
>                1000        377840
>                1100        451320
>                1200        564190
>                1300        635140
>                1400        740620
>                1500        868140
>                1600        989730
>                1700       1115170
>                1800       1253960
>                1900       1401440
>                2000       1555110
>
> I've attached some logs that give more detail (but not much) on the
> experiments.
>
> send_targets - output from performing discovery on each iface
> login_2000_log - iscsiadm stdout while logging in all 2000 sessions.
> login_2000_time - /usr/bin/time statistics for the command.
> login_100_X_log - iscsiadm stdout while logging in 100 sessions for the
>   Xth time.
> login_100_X_time - /usr/bin/time statistics for the command.
>
> All of the *_log files have had timestamps added to each line of stdout.
>
> I used some python scripts / bash scripts to automate the logins, and I can
> attach those as well if needed.
>
> Doing a regression on the times for logging in 100 sessions, I get the
> following polynomial approximating the number of milliseconds it will take
> to log in 100 sessions such that the total number of sessions will become x.
>
>     0.4058x^2 - 33.14x - 255.9
>
> Here's a plot of the data points versus the regression (sessions vs
> milliseconds).
>
> <https://lh6.googleusercontent.com/-CnnQpW7KFJs/UiZvTZmmwMI/AAAAAAAAACM/j0c5DuU891I/s1600/graph.png>
>
> That is to say, if there are 3900 existing sessions, and we want to log in
> 100 more, the regression shows it should take
>
>     0.4058*4000^2 - 33.14*4000 - 255.9 = 6359984 ms = 1.77 hours
>
> When I log in one interface at a time it looks like iscsiadm is rather busy
> whereas normally iscsiadm is only busy for a short amount of time and then
> iscsid becomes busy. Is this likely an issue with scanning the database
> repeatedly?
>
> I feel like I'm missing an obvious command line parameter when I do these
> logins that would improve things, but I don't see anything.
>
> Absent a quick response I suppose I need to try to profile iscsiadm/iscsid
> to see where the nested loop is that appears to be giving me the n^2
> behavior or turning up tracing on both iscsiadm and iscsid.
>
> On Monday, August 26, 2013 11:14:11 PM UTC-6, Mike Christie wrote:
>>
>> On 08/22/2013 06:15 PM, Aaron Jones wrote:
>> > I have a rather odd use case where I would like to have several thousand
>> > iSCSI sessions (one connection per session) on a single initiator, but
>> > I'm having trouble reaching beyond 6000-8000 before things start
>> > breaking.
>> >
>>
>> I do not think people have tried that many. I think at Red Hat we only
>> tested with 1K targets/sessions.
>>
>> How many LUs per session do you have?
>>
>> There are some limits due to how many file handles we can have open.
>> That is currently hard coded to
>>
>> #define ISCSI_MAX_FILES 16384
>>
>> I think though it does not line up 1 session == 1 file. There are other
>> limits like the host number and session number are 32 bits, but you are
>> not close to them.
>>
>>
>> > My hardware has 72GB memory, 12 Xeon Cores at 2.5GHz, and 2 10GB
>> > ethernet ports. For software I'm running on Ubuntu 12.04 (With Linux
>> > kernel 3.8) with open-iscsi 2.0-873.
>> >
>> > First, I make several ifaces using the same hardware so I can present
>> > different iqns to the target.
>> >
>> >     iscsiadm -m iface -I testiface.0 -o new
>> >     iscsiadm -m iface -I testiface.0 -o update -n iface.initiatorname -v iqn.something.0000
>> >     iscsiadm -m iface -I testiface.0 -o update -n iface.iface_num -v 0
>> >     iscsiadm -m iface -I testiface.1 -o new
>> >     iscsiadm -m iface -I testiface.1 -o update -n iface.initiatorname -v iqn.something.0001
>> >     iscsiadm -m iface -I testiface.1 -o update -n iface.iface_num -v 1
>> >
>> > Then I perform discovery for each iface. The discovery may report up to
>> > 2000 targets per initiator iface that I'm using.
>> >
>> >     iscsiadm -m discovery -t sendtargets -p some.ip.address -I testiface.0 -o new
>> >     iscsiadm -m discovery -t sendtargets -p some.ip.address -I testiface.1 -o new
>> >
>> > Finally I try to log in to everything:
>> >
>> >     iscsiadm -m node -L all
>> >
>> >
>> > After iscsid crunches on it for a while (seemingly single threaded from
>> > watching top), eventually all of the iSCSI sessions are created (I
>>
>> It is single threaded but each operation does not block. At least it
>> should not block for long.
>> It does not wait for login to complete, for example, so if you see
>> iscsid blocking for a long time on an operation let me know.
>>
>>
>> > can see the sessions are established on the targets). After the
>> > sessions are created, iscsid then iscsiadm begins printing:
>> >
>> >     Login to [iface: testiface.1, target: iqn.something.else, portal: some.ip.address,3260] successful.
>> >
>> >
>> > However, after quite a few successes, I finally start getting:
>> >
>> >     iscsiadm: Could not login to [iface: testiface.5, target: iqn.something.else, portal: some.ip.address,3260].
>> >     iscsiadm: initiator reported error (9 - internal error)
>> >
>> >
>> > I did an strace of iscsid during this, and it seems to be getting ENOMEM
>> > back on clone system calls. Looking at ps after I start getting
>> > failures, there are quite a few defunct iscsid processes in the list
>> > that I guess the parent process has not attended to yet. What seems
>> > strange to me is that I seem to have quite a lot of free memory looking
>> > at vmstat/free (at least right up to the point where things start going
>> > crazy). I added a 300GB SSD as swap space, but that did not seem to
>> > affect the problem, nor did tweaking Linux's overcommit settings.
>> > Eventually the issues get so bad that bash complains about not being
>> > able to fork processes due to a lack of memory. Would 8000 sessions
>> > really take 72GB of memory?
>> >
>> > strace snippet:
>> >
>> >     clone(child_stack=0,
>> >     flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
>> >     child_tidptr=0x7fc24aa7a9d0) = -1 ENOMEM (Cannot allocate memory)
>> >
>> >
>> > If I try to log in for each iface individually it is FAR slower. That
>> > is to say the first 2000 sessions are established rather quickly, but
>> > logging in things after that is extremely slow, i.e.
>> > using the commands
>> >
>> >     iscsiadm -m node -l -I testiface.0
>> >
>> >     iscsiadm -m node -l -I testiface.1
>> >
>> > The first command goes fairly quickly, but the second command takes an
>> > extreme amount of time.
>>
>> I am not sure what you mean here. Did you mean that after 2K sessions
>> when you run those commands they are all slow or is one command fast and
>> one is slow?
>>
>> Do you know what part of the login process is slow? Is it the iscsi
>> login, the scsi scanning, or some other operation?
>>
>> There is not really enough info to go by in the bug report. It would
>> take some major debugging. When I get time I will try it here.
>>
>>
>> >
>
> --
> You received this message because you are subscribed to the Google Groups
> "open-iscsi" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/open-iscsi.
> For more options, visit https://groups.google.com/groups/opt_out.
