Where are userland read/write requests that are larger than MAXPHYS split?
Hello, Freebsd-geom.

I'm digging through the GEOM/IO code and cannot find the place where a
userland request to read more than MAXPHYS bytes is split into several
"struct bio"s.

It also seems that these child requests are issued one by one, not in
parallel. Am I right? Why? That defeats parallelism when the underlying
GEOM can process several requests simultaneously.

--
// Black Lion AKA Lev Serebryakov
Re: Where are userland read/write requests that are larger than MAXPHYS split?
Lev Serebryakov wrote:
> I'm digging through the GEOM/IO code and cannot find the place where a
> userland request to read more than MAXPHYS bytes is split into several
> "struct bio"s.
>
> It also seems that these child requests are issued one by one, not in
> parallel. Am I right? Why? That defeats parallelism when the underlying
> GEOM can process several requests simultaneously.

AFAIK the first split happens in physio(), before the request enters GEOM:
requests from userland are broken into MAXPHYS-sized pieces. The requests
are indeed serialized there, I suppose to limit the KVA a single thread
can harvest, but IMHO that could be reconsidered.

One more split happens (when needed) in the geom_disk module, to honor the
disk driver's maximum I/O size. There is no serialization there. Most
ATA/SATA drivers in 8-STABLE support I/O of at least min(512K, MAXPHYS),
which is 128K by default. Many SCSI drivers are still limited to DFLTPHYS,
64K.

--
Alexander Motin
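To make the serialized split concrete, here is a minimal userland model of
the physio() loop described above: carve a large transfer into
MAXPHYS-sized pieces and wait for each piece before issuing the next.
issue_bio() and await_bio() are hypothetical stand-ins for building a
struct bio, calling the device strategy routine, and sleeping until
completion; the real code lives in sys/kern/kern_physio.c.

#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>

#define MAXPHYS (128 * 1024)            /* the 8-STABLE default */

/* Hypothetical stand-in: build a struct bio and hand it down. */
static void
issue_bio(off_t offset, size_t length)
{
        printf("issue: offset=%jd length=%zu\n", (intmax_t)offset, length);
}

/* Hypothetical stand-in for sleeping until the bio completes. */
static void
await_bio(void)
{
        printf("  ...completed, now the next piece may go\n");
}

int
main(void)
{
        off_t offset = 0;
        size_t resid = 1024 * 1024;     /* a 1MB read(2) from userland */

        while (resid > 0) {
                size_t chunk = resid < MAXPHYS ? resid : MAXPHYS;

                issue_bio(offset, chunk);
                await_bio();            /* this wait is the serialization */
                offset += chunk;
                resid -= chunk;
        }
        return (0);
}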
Re: Where are userland read/write requests that are larger than MAXPHYS split?
on 10/12/2010 15:22 Lev Serebryakov said the following:
> Hello, Freebsd-geom.
>
> I'm digging through the GEOM/IO code and cannot find the place where a
> userland request to read more than MAXPHYS bytes is split into several
> "struct bio"s.

Check out g_disk_start(). The split is done based on the disk-specific
d_maxsize, not the hardcoded MAXPHYS, of course.

> It also seems that these child requests are issued one by one, not in
> parallel. Am I right? Why? That defeats parallelism when the underlying
> GEOM can process several requests simultaneously.

How would you *issue* the child requests in parallel? Of course they can
*run* in parallel, if the system configuration permits it and the request
run time is long enough for an overlap to happen. Besides, there are no
geoms under the disk geom; it works on the peripheral drivers. But maybe I
misunderstood your question and you are talking about a different I/O
layer or a different I/O path.

--
Andriy Gapon
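For contrast with the serialized physio() loop, here is the same kind of
userland model for the g_disk_start() split: the parent bio is carved into
d_maxsize-sized children that are all handed to the driver back to back,
with no waiting in between. The real code clones the parent with
g_clone_bio() and adjusts bio_offset/bio_length/bio_data; D_MAXSIZE and
d_strategy() below are illustrative stand-ins, not kernel API.

#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>

#define D_MAXSIZE (64 * 1024)           /* stand-in for the disk's d_maxsize */

/* Stand-in for the driver's strategy routine: it only queues the
 * request; nothing waits for the previous child before the next one. */
static void
d_strategy(off_t offset, size_t length)
{
        printf("child queued: offset=%jd length=%zu\n",
            (intmax_t)offset, length);
}

int
main(void)
{
        off_t offset = 0;
        size_t resid = 128 * 1024;      /* one MAXPHYS-sized parent bio */

        while (resid > 0) {
                size_t chunk = resid < D_MAXSIZE ? resid : D_MAXSIZE;

                d_strategy(offset, chunk);      /* no wait here */
                offset += chunk;
                resid -= chunk;
        }
        return (0);
}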
Re: Where are userland read/write requests that are larger than MAXPHYS split?
on 10/12/2010 16:48 Andriy Gapon said the following:
> But maybe I misunderstood your question and you are talking about a
> different I/O layer or a different I/O path.

Oh, you are probably talking about physread/physwrite == physio. Indeed,
it issues bios with a maximum size of si_iosize_max and runs them
sequentially. Besides, if the uio is really "vectored", each uio
sub-buffer is processed sequentially too. This is probably slower than
running the requests in parallel; the plus side could be that less KVA is
required for mapping a user-space buffer (the UIO_USERSPACE case) into the
kernel. Not sure the latter is much of a concern, though. The sequential
code is simpler, too :-)

--
Andriy Gapon
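A sketch of that sequential order for a vectored request, with
SI_IOSIZE_MAX as a stand-in for the cdev's si_iosize_max: the outer loop
walks the iovecs one at a time, and the inner loop carves each of them
into pieces that are issued and awaited in turn, so nothing ever overlaps.

#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

#define SI_IOSIZE_MAX (128 * 1024)      /* stand-in for si_iosize_max */

/* Stand-in for issuing one bio and sleeping until it completes. */
static void
issue_and_wait(off_t offset, size_t length)
{
        printf("bio: offset=%jd length=%zu\n", (intmax_t)offset, length);
}

int
main(void)
{
        static char a[300 * 1024], b[200 * 1024];
        struct iovec iov[2] = {
                { .iov_base = a, .iov_len = sizeof(a) },
                { .iov_base = b, .iov_len = sizeof(b) },
        };
        off_t offset = 0;

        for (int i = 0; i < 2; i++) {   /* one iovec at a time... */
                size_t resid = iov[i].iov_len;

                while (resid > 0) {     /* ...one piece at a time */
                        size_t chunk =
                            resid < SI_IOSIZE_MAX ? resid : SI_IOSIZE_MAX;

                        issue_and_wait(offset, chunk);
                        offset += chunk;
                        resid -= chunk;
                }
        }
        return (0);
}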
Re: Where are userland read/write requests that are larger than MAXPHYS split?
Hello, Alexander. You wrote on 10 December 2010 at 17:45:20:

>> I'm digging through the GEOM/IO code and cannot find the place where a
>> userland request to read more than MAXPHYS bytes is split into several
>> "struct bio"s.
>>
>> It also seems that these child requests are issued one by one, not in
>> parallel. Am I right? Why? That defeats parallelism when the underlying
>> GEOM can process several requests simultaneously.
> AFAIK the first split happens in physio(), before the request enters
> GEOM: requests from userland are broken into MAXPHYS-sized pieces. The
> requests are indeed serialized there, I suppose to limit the KVA a
> single thread can harvest, but IMHO that could be reconsidered.

That is a good idea; maybe there should be a GEOM flag for it? For
example, any stripe/raid3/raid5 GEOM class could process a series of reads
much faster than a sequential stream of them when userland wants to read
blocks bigger than the stripe size. And a small stripe size is a bad idea
due to the high fixed cost of every transaction.

Right now, when an application reads files on RAID5 with big blocks (say,
read() is called with a 1MB buffer), the RAID5 geom sees 128KB read
requests, one by one. With a stripe size of 128KB it performs like a
single disk :( I could add a pre-read for full-sized reads, but that is
not a generic solution; sending the BIOs from one (logical/userland)
read/write request without awaiting their completion is the generic
solution.

> One more split happens (when needed) in the geom_disk module, to honor
> the disk driver's maximum I/O size. There is no serialization there.
> Most ATA/SATA drivers in 8-STABLE support I/O of at least min(512K,
> MAXPHYS), which is 128K by default. Many SCSI drivers are still limited
> to DFLTPHYS, 64K.

Yep, that is what I saw in my investigation.

--
// Black Lion AKA Lev Serebryakov
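A userland sketch of the generic solution proposed above, with POSIX
threads standing in for child bios that are in flight simultaneously:
every piece is issued before any completion is awaited. In the kernel this
would mean sending the child struct bio's down back to back and collecting
the completions afterwards, rather than sleeping after each one; all names
here are illustrative.

#include <pthread.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>

#define MAXPHYS (128 * 1024)
#define NCHUNKS 8               /* a 1MB request in MAXPHYS pieces */

struct chunk {
        off_t   offset;
        size_t  length;
};

/* One thread stands in for one in-flight child bio. */
static void *
issue_bio(void *arg)
{
        struct chunk *c = arg;

        printf("in flight: offset=%jd length=%zu\n",
            (intmax_t)c->offset, c->length);
        return (NULL);
}

int
main(void)
{
        pthread_t tid[NCHUNKS];
        struct chunk chunks[NCHUNKS];
        int i;

        /* Issue every child without waiting... */
        for (i = 0; i < NCHUNKS; i++) {
                chunks[i].offset = (off_t)i * MAXPHYS;
                chunks[i].length = MAXPHYS;
                pthread_create(&tid[i], NULL, issue_bio, &chunks[i]);
        }
        /* ...then collect all the completions. */
        for (i = 0; i < NCHUNKS; i++)
                pthread_join(tid[i], NULL);
        return (0);
}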
Re: Where are userland read/write requests that are larger than MAXPHYS split?
Hello, Andriy. You wrote on 10 December 2010 at 18:03:27:

> on 10/12/2010 16:48 Andriy Gapon said the following:
>> But maybe I misunderstood your question and you are talking about a
>> different I/O layer or a different I/O path.
> Oh, you are probably talking about physread/physwrite == physio. Indeed,
> it issues bios with a maximum size of si_iosize_max and runs them
> sequentially.

Yep, I'm talking about this case. See my message to Alexander Motin for an
explanation of why I think sequential processing here is not a good idea.

--
// Black Lion AKA Lev Serebryakov
Re: Where are userland read/write requests that are larger than MAXPHYS split?
on 10/12/2010 16:45 Alexander Motin said the following:
> Many SCSI drivers are still limited to DFLTPHYS, 64K.

Including the cases where MAXBSIZE is abused because it historically has
the same value.

--
Andriy Gapon
Re: Where are userland read/write requests that are larger than MAXPHYS split?
Andriy Gapon wrote:
> on 10/12/2010 16:45 Alexander Motin said the following:
>> Many SCSI drivers are still limited to DFLTPHYS, 64K.
>
> Including the cases where MAXBSIZE is abused because it historically has
> the same value.

DFLTPHYS is automatically assumed by CAM for all SIMs that do not report
their maximum I/O size. All drivers using MAXBSIZE will most likely fall
into this category, because this functionality was added only in 8.0.

--
Alexander Motin
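A sketch of that fallback logic, assuming the maximum reported in CAM's
path inquiry (cpi.maxio) is zero for a SIM that never sets the field;
periph_maxio() is a hypothetical helper for illustration, not a real CAM
function.

#include <stdio.h>

#define DFLTPHYS (64 * 1024)

/* A SIM that predates the 8.0 field reports 0; DFLTPHYS is assumed. */
static unsigned long
periph_maxio(unsigned long cpi_maxio)
{
        return (cpi_maxio != 0 ? cpi_maxio : DFLTPHYS);
}

int
main(void)
{
        printf("old SIM: maxio=%lu\n", periph_maxio(0));            /* 65536 */
        printf("new SIM: maxio=%lu\n", periph_maxio(512 * 1024));   /* 524288 */
        return (0);
}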
USENIX ATC '11 Submission Deadline Approaching
We're writing to remind you that the submissions deadline for the 2011
USENIX Annual Technical Conference (USENIX ATC '11) is just over a month
away. Please submit your work by 11:59 p.m. EST on Wednesday, January 12,
2011.

http://www.usenix.org/atc11/cfpb/

The USENIX ATC '11 Program Committee seeks high-quality submissions that
further the knowledge and understanding of modern computing systems, with
an emphasis on implementations and experimental results. We encourage
papers that break new ground or present insightful results based on
practical experience with computer systems. USENIX ATC has a broad scope,
and specific topics of interest include but are not limited to:

* Architectural interaction
* Cloud computing
* Deployment experience
* Distributed and parallel systems
* Embedded systems
* Energy/power management
* File and storage systems
* Mobile, wireless, and sensor systems
* Networking and network services
* Operating systems
* Reliability, availability, and scalability
* Security, privacy, and trust
* System and network management and troubleshooting
* Usage studies and workload characterization
* Virtualization

For more details on the submission process, please see the complete Call
for Papers at http://www.usenix.org/atc11/cfpb/

We look forward to your submissions.

Jason Nieh, Columbia University
Carl Waldspurger, VMware
USENIX ATC '11 Program Chairs
atc11cha...@usenix.org

---

Call for Papers
2011 USENIX Annual Technical Conference
June 15-17, 2011, in Portland, OR
http://www.usenix.org/atc11/cfpb/
Submissions Deadline: January 12, 2011, 11:59 p.m. EST