> From: edmud...@mail.bounceswoosh.org
> [mailto:edmud...@mail.bounceswoosh.org] On Behalf Of Eric D. Mudama
>
> >Unless your drive is able to queue up a request to read every single used
> >part of the drive... Which is larger than the command queue for any
> >reasonable drive in the world... The point is, in order to be "optimal" you
> >have to eliminate all those seeks, and perform sequential reads only. The
> >only seeks you should do are to skip over unused space.
>
> I don't think you read my whole post. I was saying this seek
> calculation pre-processing would have to be done by the host server,
> and while not impossible, is not trivial. Present the next 32 seeks
> to each device while the pre-processor works on the complete list of
> future seeks, and the drive will do as well as possible.
I did read that, but now I think either I misunderstood it or you misunderstood me. If you're only queueing up a few reads at a time (anything short of the whole pool), I would not assume those 32 seeks are even remotely sequential. 32 blocks out of a pool of presumably millions of blocks... I would assume they are essentially random, are they not?

In my mind, which is likely wrong or at least oversimplified: if you want to order the list of blocks to read according to on-disk order (which should at least be theoretically possible on mirrors, but perhaps not even physically possible on raidz), you would first have to generate the complete list of blocks to be read and then sort it. As a rough estimate, for any pool of reasonable size, that sounds like several GB of RAM to me.

Maybe there's a less-than-perfect sort with a much lower memory footprint? Something like a simple bucketing/hashing scheme that guarantees the next few thousand seeks are in disk order, even though it skips over many blocks that have to be picked up on a later pass. Not a perfect sort, but given some repetition and multiple passes over the disk, it might strike an acceptable balance between performance and memory footprint.
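To make that last idea concrete, here is a rough sketch of what I have in mind. This is not ZFS code, just Python with made-up names (banded_read_order, QUEUE_DEPTH, BAND_WIDTH are all mine): walk the disk in fixed-width LBA bands, re-enumerate the outstanding blocks on each pass, sort only the ones that fall inside the current band, and hand them to the device a queue-depth at a time.

from typing import Callable, Iterable, Iterator, List

QUEUE_DEPTH = 32        # requests handed to the device at once, per the thread
BAND_WIDTH = 1 << 30    # bytes of LBA space covered per pass (made-up tunable)

def banded_read_order(
    enumerate_blocks: Callable[[], Iterable[int]],
    disk_size: int,
) -> Iterator[List[int]]:
    """Yield batches of block offsets in rough disk order.

    enumerate_blocks re-walks the metadata and yields every block offset
    that still needs reading; it is called once per pass, so the only
    thing held in memory at a time is the subset falling in the current
    band.  Blocks outside the band are skipped and picked up on a later
    pass -- an imperfect sort, traded for a small memory footprint.
    """
    for band_start in range(0, disk_size, BAND_WIDTH):
        band_end = band_start + BAND_WIDTH
        in_band = sorted(off for off in enumerate_blocks()
                         if band_start <= off < band_end)
        for i in range(0, len(in_band), QUEUE_DEPTH):
            yield in_band[i:i + QUEUE_DEPTH]

# Toy usage: a 1 TiB "disk" with randomly scattered block offsets.
if __name__ == "__main__":
    import random
    blocks = [random.randrange(0, 1 << 40) for _ in range(10_000)]
    batches = banded_read_order(lambda: blocks, disk_size=1 << 40)
    first = next(batches)
    print(len(first), first[:4])   # up to 32 offsets, ascending within the band

The memory cost per pass is just one band's worth of offsets, and the price you pay is re-walking the metadata (or re-reading a saved block list) once per band. Whether that trade is a net win obviously depends on how dense the bands are, which is exactly the performance-versus-footprint question above.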