Where are userland read/write requests larger than MAXPHYS split?

2010-12-10 Thread Lev Serebryakov
Hello, Freebsd-geom.

   I'm digging through the GEOM/IO code and cannot find the place where
 requests from userland to read more than MAXPHYS bytes are split into
 several "struct bio"s.

   It seems that these child requests are issued one by one, not in
 parallel. Am I right? Why? Doesn't that break parallelism when the
 underlying GEOM can process several requests simultaneously?

-- 
// Black Lion AKA Lev Serebryakov 

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Where are userland read/write requests larger than MAXPHYS split?

2010-12-10 Thread Alexander Motin
Lev Serebryakov wrote:
>   I'm digging through the GEOM/IO code and cannot find the place where
> requests from userland to read more than MAXPHYS bytes are split into
> several "struct bio"s.
>
>   It seems that these child requests are issued one by one, not in
> parallel. Am I right? Why? Doesn't that break parallelism when the
> underlying GEOM can process several requests simultaneously?

AFAIK, requests from userland are first broken into MAXPHYS-sized pieces
by physio() before entering GEOM. Requests are indeed serialized there, I
suppose to limit the KVA that a thread can harvest, but IMHO that could
be reconsidered.

One more split happens (when needed) in the geom_disk module to honor the
disk driver's maximum I/O size. There is no serialization there. Most
ATA/SATA drivers in 8-STABLE support I/O up to at least
min(512K, MAXPHYS), which is 128K by default. Many SCSI drivers are still
limited by DFLTPHYS, which is 64K.

-- 
Alexander Motin


Re: Where are userland read/write requests larger than MAXPHYS split?

2010-12-10 Thread Andriy Gapon
on 10/12/2010 15:22 Lev Serebryakov said the following:
> Hello, Freebsd-geom.
> 
>   I'm digging through the GEOM/IO code and cannot find the place where
> requests from userland to read more than MAXPHYS bytes are split into
> several "struct bio"s.

Check out g_disk_start().
The split is done based on the disk-specific d_maxsize, not the hardcoded
MAXPHYS, of course.
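
To make the idea concrete, here is a minimal userland sketch of cutting
one request into child ranges no larger than a per-disk limit. It is not
the g_disk_start() code; struct sketch_child and sketch_split() are
hypothetical names invented for illustration, and the 64K limit in the
usage note is just the DFLTPHYS value mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a child request: a byte range on the disk. */
struct sketch_child {
	size_t offset;
	size_t length;
};

/*
 * Cut the range [offset, offset + length) into pieces of at most
 * `maxsize` bytes and store them in `out`. Returns the number of
 * pieces written, never more than `out_cap`.
 */
static size_t
sketch_split(size_t offset, size_t length, size_t maxsize,
    struct sketch_child *out, size_t out_cap)
{
	size_t n = 0;

	assert(maxsize > 0);
	while (length > 0 && n < out_cap) {
		size_t chunk = length < maxsize ? length : maxsize;

		out[n].offset = offset;
		out[n].length = chunk;
		offset += chunk;
		length -= chunk;
		n++;
	}
	return (n);
}
```

With a 160K request at offset 4096 and a 64K limit, this yields three
children: two of 64K and one of 32K. Nothing in the sketch waits for a
child to finish before producing the next one.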

>   It seems that these child requests are issued one by one, not in
> parallel. Am I right? Why? Doesn't that break parallelism when the
> underlying GEOM can process several requests simultaneously?

How do you *issue* the child requests in parallel?
Of course, they can *run* in parallel if system configuration permits that and
request run time is sufficient for an overlap to happen.
Besides, there are no geoms under the disk geom; it works directly on top of
the peripheral drivers.

But maybe I misunderstood your question and you were talking about a different
I/O layer or a different I/O path.

-- 
Andriy Gapon


Re: Where are userland read/write requests larger than MAXPHYS split?

2010-12-10 Thread Andriy Gapon
on 10/12/2010 16:48 Andriy Gapon said the following:
> But maybe I misunderstood your question and you were talking about a
> different I/O layer or a different I/O path.
> 

Oh, you are probably talking about physread/physwrite == physio.
Indeed, it issues bios with a maximum size of si_iosize_max and runs them
sequentially.
Besides, if the uio is really "vectored", then each uio sub-buffer is processed
sequentially too.
This is probably slower than running the requests in parallel; the plus side
could be that less KVA is required for mapping the user-space buffer
(UIO_USERSPACE case) into the kernel.  Not sure if the latter is much of a
concern, though.  The sequential code is simpler too :-)
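
A rough userland sketch of that behaviour, not the actual physio() code:
it only counts how many pieces result when every sub-buffer of a vectored
request is cut separately. sketch_count_bios() is a hypothetical helper,
and iosize_max stands in for si_iosize_max.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec */

/*
 * Count the pieces produced when each iovec is split on its own into
 * chunks of at most `iosize_max` bytes. In the scheme described above,
 * each piece would be issued and waited for before the next one.
 */
static size_t
sketch_count_bios(const struct iovec *iov, int iovcnt, size_t iosize_max)
{
	size_t total = 0;

	assert(iosize_max > 0);
	for (int i = 0; i < iovcnt; i++) {
		size_t left = iov[i].iov_len;

		while (left > 0) {
			size_t chunk = left < iosize_max ? left : iosize_max;

			left -= chunk;
			total++;
		}
	}
	return (total);
}
```

Because the split restarts at every sub-buffer, two 130K sub-buffers with
a 128K limit give four pieces (128K + 2K, twice), while one contiguous
260K buffer gives only three.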

-- 
Andriy Gapon


Re: Where are userland read/write requests larger than MAXPHYS split?

2010-12-10 Thread Lev Serebryakov
Hello, Alexander.
You wrote on 10 December 2010, 17:45:20:

>>   I'm digging through the GEOM/IO code and cannot find the place where
>> requests from userland to read more than MAXPHYS bytes are split into
>> several "struct bio"s.
>>   It seems that these child requests are issued one by one, not in
>> parallel. Am I right? Why? Doesn't that break parallelism when the
>> underlying GEOM can process several requests simultaneously?
> AFAIK, requests from userland are first broken into MAXPHYS-sized pieces
> by physio() before entering GEOM. Requests are indeed serialized there, I
> suppose to limit the KVA that a thread can harvest, but IMHO that could
> be reconsidered.
  That is a good idea; maybe there should be a GEOM flag for this? For
  example, any stripe/geom3/geom5 code can process a series of reads
  issued together much faster than the same reads issued sequentially,
  if userland wants to read big blocks, bigger than the stripe size. And
  a small stripe size is a bad idea because of the high fixed cost of each
  transaction. Now, when an application reads files on RAID5 in big
  blocks (say, read() is called with a 1 MB buffer), the RAID5 geom sees
  read requests of 128 KB in size, one by one. And with a stripe size of
  128 KB, it performs like a single disk :( I can add pre-read for
  full-sized reads, but that is not a generic solution, whereas sending
  the BIOs from one (logical/userland) read/write request without
  waiting for their completion is a generic solution.
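
  The arithmetic behind that, as a small userland sketch. It assumes a
  deliberately simplified striping model (consecutive stripe units on
  consecutive disks, parity placement ignored), and
  sketch_disks_touched() is a hypothetical helper, not part of any geom
  class.

```c
#include <assert.h>

/*
 * Number of disks a single request can keep busy at once under the
 * simplified model: one disk per stripe unit the request overlaps,
 * capped at the number of data disks.
 */
static unsigned
sketch_disks_touched(unsigned long long offset, unsigned long long length,
    unsigned long long stripe, unsigned ndisks)
{
	unsigned long long first, last, units;

	assert(stripe > 0 && ndisks > 0);
	if (length == 0)
		return (0);
	first = offset / stripe;
	last = (offset + length - 1) / stripe;
	units = last - first + 1;
	return (units < ndisks ? (unsigned)units : ndisks);
}
```

  With a 128 KB stripe and four data disks, an aligned 128 KB request
  touches one disk, while the whole 1 MB request seen at once would touch
  all four.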

> One more split happens (when needed) in the geom_disk module to honor the
> disk driver's maximum I/O size. There is no serialization there. Most
> ATA/SATA drivers in 8-STABLE support I/O up to at least
> min(512K, MAXPHYS), which is 128K by default. Many SCSI drivers are still
> limited by DFLTPHYS, which is 64K.
  Yep, that is what I saw in my investigations.

-- 
// Black Lion AKA Lev Serebryakov 



Re: Where are userland read/write requests larger than MAXPHYS split?

2010-12-10 Thread Lev Serebryakov
Hello, Andriy.
You wrote on 10 December 2010, 18:03:27:

> on 10/12/2010 16:48 Andriy Gapon said the following:
>> But maybe I misunderstood your question and you were talking about a
>> different I/O layer or a different I/O path.
> Oh, you are probably talking about physread/physwrite == physio.
> Indeed, it issues bios with a maximum size of si_iosize_max and runs them
> sequentially.
   Yep, I'm talking about this case. See my message to Alexander Motin
for an explanation of why I think sequential processing here is not a
good idea.

-- 
// Black Lion AKA Lev Serebryakov 



Re: Where are userland read/write requests larger than MAXPHYS split?

2010-12-10 Thread Andriy Gapon
on 10/12/2010 16:45 Alexander Motin said the following:
> by default. Many SCSI drivers still limited by DFLTPHYS - 64K.

Including the cases where MAXBSIZE is abused because it historically has the
same value.

-- 
Andriy Gapon


Re: Where are userland read/write requests larger than MAXPHYS split?

2010-12-10 Thread Alexander Motin
Andriy Gapon wrote:
> on 10/12/2010 16:45 Alexander Motin said the following:
>> by default. Many SCSI drivers still limited by DFLTPHYS - 64K.
> 
> Including the cases where MAXBSIZE is abused because it historically has the
> same value.

DFLTPHYS is automatically assumed by CAM for all SIMs that do not report
their maximum I/O size. All drivers using MAXBSIZE will most likely fall
into this category, because this functionality was only added in 8.0.

-- 
Alexander Motin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


USENIX ATC '11 Submission Deadline Approaching

2010-12-10 Thread Lionel Garth Jones
We're writing to remind you that the submissions deadline for the 2011
USENIX Annual Technical Conference (USENIX ATC '11) is just over a month
away. Please submit your work by 11:59 p.m. EST on Wednesday, January
12, 2011.
http://www.usenix.org/atc11/cfpb/

The USENIX ATC '11 Program Committee seeks high-quality submissions that
further the knowledge and understanding of modern computing systems,
with an emphasis on implementations and experimental results. We
encourage papers that break new ground or present insightful results
based on practical experience with computer systems.

USENIX ATC has a broad scope, and specific topics of interest include
but are not limited to:

* Architectural interaction
* Cloud computing
* Deployment experience
* Distributed and parallel systems
* Embedded systems
* Energy/power management
* File and storage systems
* Mobile, wireless, and sensor systems
* Networking and network services
* Operating systems
* Reliability, availability, and scalability
* Security, privacy, and trust
* System and network management and troubleshooting
* Usage studies and workload characterization
* Virtualization

For more details on the submission process, please see the complete
Call for Papers at
http://www.usenix.org/atc11/cfpb/

We look forward to your submissions.

Jason Nieh, Columbia University
Carl Waldspurger, VMware
USENIX ATC '11 Program Chairs
atc11cha...@usenix.org

---
Call for Papers
2011 USENIX Annual Technical Conference
June 15-17, 2011, in Portland, OR
http://www.usenix.org/atc11/cfpb/
Submissions Deadline: January 12, 2011, 11:59 p.m. EST