[ 
https://issues.apache.org/jira/browse/ARROW-11583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282218#comment-17282218
 ] 

Weston Pace edited comment on ARROW-11583 at 2/10/21, 4:36 AM:
---------------------------------------------------------------

I ran some experiments to see how effective the OS disk scheduling policies 
were.  My test was run on an Ubuntu 20.04 machine against an HDD and an SSD.

Methodology:
  * Theoretical max performance was measured using the GNOME Disks benchmark; this 
was verified with `pv` of a large file.
  * % of sequential reads was measured using blktrace/blkparse
  * Reads were never cached by the O/S thanks to judicious use of 
posix_fadvise
  * On this system the only OS scheduler available was mq-deadline, with a 
default read_expire of 500ms
  * "Read serially" with more than 1 thread used pipeline parallelism, but only 
ever one file at a time
  * "NOT read serially" read all files at once, pipeline parallelism was 
theoretically possible but likely unused
  * 10 files, 10MB per file, 10 iterations averaged together
  * All reads were done using block stream readers with background 
readahead
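Not the actual benchmark code, but a minimal sketch of how posix_fadvise can be used to keep reads out of the page cache, so the benchmark measures disk throughput rather than cache hits (the helper name and block size are assumptions):

```python
import os

def read_uncached(path, block_size=1 << 20):
    """Read a file while advising the kernel not to cache its pages."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Drop any pages already cached for this file before reading, so the
        # timing reflects actual disk I/O.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        data = bytearray()
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:
                break
            data += chunk
        # Advise again so this read does not pollute the cache for later runs.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return bytes(data)
    finally:
        os.close(fd)
```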

Parameters:
  * use_ssd - Which disk to use, 0 means the HDD
  * serial - If true, read one file at a time
  * io_threads - How many threads to use for I/O

Results
||SSD/HDD||Serial/Parallel||Num. I/O Threads||% Max Performance||
|SSD|Serial|1|89%|
|SSD|Parallel|1|76%|
|SSD|Serial|8|87%|
|SSD|Parallel|8|97%|
|HDD|Serial|1|78%|
|HDD|Parallel|1|55%|
|HDD|Serial|8|99.8%|
|HDD|Parallel|8|83%|

Conclusion:

If Arrow simply tries to read everything in parallel and allocates multiple 
threads for I/O, then we are probably fine in most cases, but we will 
underperform by about 15% when run on an HDD.

I need to run my experiments on AWS to see how S3 and EBS perform under this 
load.  Some EBS volumes are backed by HDDs and may underperform.  For S3 we 
would likely need even more extensive experiments to see how many files we can 
truly read at once, and it may depend on how S3 actually does replication.
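One way the filesystem-aware scheduling this suggests could look, as a hedged sketch (the table of caps and the function names are illustrative assumptions, not an Arrow API): cap the number of in-flight file reads based on the filesystem type, so an HDD sees effectively serial access (matching the 99.8% result above) while an SSD or object store is allowed to fan out.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-filesystem concurrency caps (illustrative numbers only).
MAX_CONCURRENT_READS = {"hdd": 1, "ssd": 8, "s3": 64}

def read_dataset(paths, fs_kind, read_one):
    """Read every file, capping in-flight reads by filesystem type.

    `read_one` is whatever callable actually reads a single file; with a
    cap of 1 this degenerates to the serial strategy that won on the HDD.
    """
    cap = MAX_CONCURRENT_READS[fs_kind]
    with ThreadPoolExecutor(max_workers=cap) as pool:
        return list(pool.map(read_one, paths))
```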



> [C++] Filesystem aware disk scheduling
> --------------------------------------
>
>                 Key: ARROW-11583
>                 URL: https://issues.apache.org/jira/browse/ARROW-11583
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> Different filesystems have different ideal access strategies.  For example:
> AWS: Unlimited parallelism? No penalty for random access?
> AWS EBS: Depends
> SSD: Bounded parallelism (# of hw contexts), penalty for random within 
> context.
> HDD: Very limited parallelism (1 usually), penalty for random access
> Currently, Arrow does not factor these access strategies into I/O scheduling. 
>  For example, when reading a dataset of 100 files then it will start reading 
> X files at once (where X is the parallelism of the thread pool).  For AWS 
> this is ideal.  For an HDD this is not.
> The OS does have a scheduler which attempts to mitigate this, but it does not 
> know the scope of the I/O or the dependencies among the I/O (e.g. in the 
> above dataset read example it's better to read X quickly and then Y quickly 
> than it is to read X and Y slowly at the same time).  I've run some 
> experiments (see comment) which show the OS scheduler fails to achieve ideal 
> performance in fairly typical cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)