This is very helpful info, Jewell. Thanks for sharing the update. -Amar
On Tue, Jan 7, 2020 at 7:51 PM Jewell, Paul <[email protected]> wrote:

> Hi All,
>
> Just as a follow-up on this: I changed the fstab mount options line to
> look like:
>
> server:/gv0 /data glusterfs defaults,_netdev,direct-io-mode=disable 0 0
>
> The change was adding the part "direct-io-mode=disable".
>
> After this and a complete reboot of the cluster, it appears the problem
> has been solved.
>
> So I would recommend using "direct-io-mode=disable" when working with
> numpy memory-mapped files.
>
> Thanks,
> -Paul
>
> ------------------------------
> *From:* [email protected] <[email protected]>
> on behalf of Jewell, Paul <[email protected]>
> *Sent:* Thursday, December 12, 2019 10:52 AM
> *To:* [email protected] <[email protected]>
> *Subject:* [External] Re: [Gluster-users] Problems with gluster
> distributed mode and numpy memory mapped files
>
> Hi All,
>
> I am using gluster in order to share data between four development
> servers. It is just four machines, used primarily for compute, attached
> to a switch. Each one has a 10TB hardware RAID5 volume, for a total of
> about 40TB. Each machine is both a gluster client and a gluster server,
> mounted to "itself" (for example, machine1 mounts the gluster volume as
> machine1:gvfs). The volume is a standard "distributed" volume (just a
> JBOD of all the space, no replication).
>
> This works well in general. Recently I have been developing an
> application for our research that makes use of the numpy "mmap" mode.
> Basically, a loop is iterated, and at the end of each iteration every
> file in a list is opened and an array of data is dumped into one
> contiguous slice of that file, roughly like this:
>
> for n, data in enumerate(chunks):   # one outer iteration per ~50 MB slice
>     for i in range(num_files):      # write that slice into each file in turn
>         new_coverage = np.load("coverage_file_%d" % i, mmap_mode='r+')
>         new_coverage[(n * 10000):((n + 1) * 10000), :] = data
>
> That is to say, a slice of about ~50MB of data is written to each file,
> one at a time. On the first iteration the first ~50MB is written, on the
> second iteration the second ~50MB, etc.
>
> I have timed these iterations, and something very strange is happening.
> Below is example output from one iteration of the outer loop. The small
> numbers represent a reasonable amount of time to write this small amount
> of data; for some reason, on some files the writing time is absolutely
> huge:
>
> Writing new coverage for majiq file 1 took 0.591965
> Writing new coverage for majiq file 2 took 0.601540
> Writing new coverage for majiq file 3 took 0.989093
> Writing new coverage for majiq file 4 took 612.667724
> Writing new coverage for majiq file 5 took 612.630379
> Writing new coverage for majiq file 6 took 612.099524
> Writing new coverage for majiq file 7 took 612.666857
> Writing new coverage for majiq file 8 took 612.633965
> Writing new coverage for majiq file 9 took 612.634483
> Writing new coverage for majiq file 10 took 612.096660
> Writing new coverage for majiq file 11 took 612.634655
> Writing new coverage for majiq file 12 took 612.632943
> Writing new coverage for majiq file 13 took 612.631088
> Writing new coverage for majiq file 14 took 612.675435
> Writing new coverage for majiq file 15 took 341.445473
> Writing new coverage for majiq file 16 took 0.550073
> Writing new coverage for majiq file 17 took 0.655051
> Writing new coverage for majiq file 18 took 0.641738
> Writing new coverage for majiq file 19 took 0.808618
> Writing new coverage for majiq file 20 took 0.583891
> Writing new coverage for majiq file 21 took 0.617392
> Writing new coverage for majiq file 22 took 0.749011
> Writing new coverage for majiq file 23 took 0.609862
>
> At first I thought the fast files must be on the same disk as the server
> where the program is running, and the slow files distributed to other
> machines, but it turns out that the slow group of files actually changes
> on every iteration of the outer loop. In top, the python process is
> stuck in D state throughout these writes. I have tested my algorithm on
> my development machine, as well as on a scratch disk (a single SSD) on
> the same servers where I ran the slow test above, and all of the writes
> take under 1 second.
>
> It is an extremely strange problem I have never seen before.
>
> Because all of the high values are almost exactly the same amount of
> time, it seems like some kind of timeout is happening. What would cause
> it to happen that often?
>
> Thanks for reading!
>
> ------------------------------
> *From:* Jewell, Paul <[email protected]>
> *Sent:* Monday, December 9, 2019 1:40 PM
> *To:* [email protected] <[email protected]>
> *Subject:* Re: Problems with gluster distributed mode and numpy memory
> mapped files
>
> [...]
>
> ------------------------------
> *From:* Jewell, Paul
> *Sent:* Thursday, December 5, 2019 4:18 PM
> *To:* [email protected] <[email protected]>
> *Subject:* Problems with gluster distributed mode and numpy memory
> mapped files
>
> [...]
>
> ________
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/441850968
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> [email protected]
> https://lists.gluster.org/mailman/listinfo/gluster-users
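The write pattern described in the thread can be sketched as a small self-contained script for anyone wanting to reproduce the timing measurements. This is a minimal sketch, not Paul's actual code: the file names, the toy dimensions, and the constants (`NUM_FILES`, `ROWS_PER_SLICE`, etc.) are assumptions, and it writes to a temporary directory; pointing `workdir` at a gluster mount would be needed to observe the slow writes.

```python
# Sketch of the workload: each outer iteration writes one contiguous
# row-slice into every memory-mapped .npy file, timing each write.
# Toy sizes here; the real workload wrote ~10000-row (~50 MB) slices.
import os
import tempfile
import time

import numpy as np

ROWS_PER_SLICE = 100   # stand-in for the ~10000 rows per slice
COLS = 50
NUM_SLICES = 3
NUM_FILES = 4

workdir = tempfile.mkdtemp()  # replace with a gluster path to reproduce
paths = [os.path.join(workdir, "coverage_file_%d.npy" % i)
         for i in range(NUM_FILES)]

# Pre-create the full-size files once, so later passes only fill slices.
for p in paths:
    np.save(p, np.zeros((ROWS_PER_SLICE * NUM_SLICES, COLS)))

for n in range(NUM_SLICES):
    data = np.full((ROWS_PER_SLICE, COLS), float(n + 1))
    for i, p in enumerate(paths):
        start = time.time()
        # Memory-map the file instead of reading it all into RAM.
        cov = np.load(p, mmap_mode="r+")
        cov[n * ROWS_PER_SLICE:(n + 1) * ROWS_PER_SLICE, :] = data
        cov.flush()   # push dirty pages out to the file
        del cov
        print("Writing slice %d to file %d took %f"
              % (n, i, time.time() - start))

# Sanity check: the first slice holds 1.0, the last holds NUM_SLICES.
check = np.load(paths[0])
assert (check[:ROWS_PER_SLICE] == 1.0).all()
assert (check[-ROWS_PER_SLICE:] == float(NUM_SLICES)).all()
```

Timing the `np.load(..., mmap_mode='r+')` open and the slice assignment separately can also help show whether the stall happens at open time or when dirty pages are flushed back through the FUSE mount.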
