Group, et al.,

        I don't understand: if the problem is systemic, driven by the
        number of continually dirty pages and the pressure to clean
        those pages, then why .....

        The problem is FS-independent, because any number of
        different installed FSs can equally consume pages.
        Thus, solving the problem on a per-FS basis seems to me a
        band-aid approach.

        Then why doesn't the OS determine that a dangerous
        high-watermark number of pages is continually being paged out
        (we have swapped, and based on recent history a large
        percentage of available pages is always dirty) and thus,

         * force the writes to a set of predetermined pages (limiting
           the number of pages eligible for I/O),
         * get I/O scheduled for these pages immediately, rather than
           waiting until one of them is needed and found dirty
           (hopefully a percentage of these pages will be cleaned and
            be immediately available if needed in the near future);
           a rough sketch of this watermark check follows the list,

         Yes, the OS could redirect the I/O to be direct, bypassing
         the page cache, but the assumption is that these procs are
         behaving as multiple readers and will need the cached page
         data in the near future. Thus, disabling caching of these
         pages just because they CAN totally consume the cache removes
         the multiple readers' reason to cache the data in the first
         place, thus...


        *  guarantee that heartbeats are always regular by preserving
           5 to 20% of pages for exec / text,
        *  limit the number of interrupts being generated by the
           network so low-level SCSI interrupts can page and not be
           starved (something the white paper did not mention);
           yes, this will cause the loss of UDP-based data, but we
           need to generate some form of backpressure / explicit
           congestion event,
        * if the files coming in from the network were TCP-based,
          hopefully a segment would be dropped and act as backpressure
          to the originator of the data,
        * if the files are being read from the FS, then a max I/O rate
          should be determined based on the number of pages that are
          clean and ready to accept FS data,
        *  etc.
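
        To make the first two bullets concrete, here is a minimal
        userland sketch of that policy: watch the fraction of dirty
        pages against a high watermark and, once it is crossed,
        schedule writeback for a bounded batch of pages immediately
        instead of waiting until a dirty page is needed for reclaim.
        The names (vm_stats, schedule_writeback) and the thresholds
        are invented for illustration; this is not the Solaris VM
        interface.

/*
 * Hypothetical illustration only: the structures and thresholds are
 * made up, but the control flow mirrors the proposal above.
 */
#include <stdio.h>

struct vm_stats {
    unsigned long total_pages;   /* pages managed by the page cache */
    unsigned long dirty_pages;   /* pages still waiting to be cleaned */
};

#define DIRTY_HIGH_WATERMARK_PCT  40   /* start forced cleaning here */
#define WRITEBACK_BATCH_PAGES     256  /* cap on pages queued per decision */

/* Stand-in for a hook that asks the I/O scheduler to clean n pages now. */
static void schedule_writeback(unsigned long n)
{
    printf("issuing immediate writeback for %lu pages\n", n);
}

static void check_dirty_pressure(const struct vm_stats *vm)
{
    unsigned long dirty_pct = vm->dirty_pages * 100 / vm->total_pages;

    if (dirty_pct >= DIRTY_HIGH_WATERMARK_PCT) {
        /* Limit the pages eligible for I/O so one workload cannot
         * monopolize the device queue, then clean them proactively. */
        unsigned long batch = vm->dirty_pages < WRITEBACK_BATCH_PAGES
                                  ? vm->dirty_pages
                                  : WRITEBACK_BATCH_PAGES;
        schedule_writeback(batch);
    }
}

int main(void)
{
    struct vm_stats vm = { .total_pages = 100000, .dirty_pages = 65000 };
    check_dirty_pressure(&vm);   /* 65% dirty -> forces a bounded flush */
    return 0;
}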

        Thus, tuning to determine whether the page cache should be
        used for writes or for reads should allow one set of processes
        not to adversely affect the operation of other processes.
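
        As a minimal sketch of that tuning, keeping in mind the
        multiple-reader caveat above: bypass the page cache (direct
        I/O) only for writers whose data is not expected to be
        re-read, so shared-reader workloads keep their cached copies
        even under dirty-page pressure. The function name and the
        threshold below are assumptions, not an existing interface.

#include <stdbool.h>
#include <stdio.h>

#define CACHE_PRESSURE_PCT 75   /* above this, push lone writers to direct I/O */

/* Hypothetical policy: cache only when readers need it or pressure is low. */
static bool should_use_page_cache(bool readers_expected, unsigned dirty_pct)
{
    if (readers_expected)
        return true;                         /* keep data cached for readers */
    return dirty_pct < CACHE_PRESSURE_PCT;   /* otherwise cache only when cheap */
}

int main(void)
{
    printf("streaming writer, 80%% dirty: use cache? %d\n",
           should_use_page_cache(false, 80));   /* -> 0, go direct */
    printf("shared data set,  80%% dirty: use cache? %d\n",
           should_use_page_cache(true, 80));    /* -> 1, stay cached */
    return 0;
}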

        And any OS should slow down dirty-page I/O only for those
        specific processes, with other processes carrying on their
        work unaware of the I/O issues.
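
        A sketch of that per-process throttling, with invented
        structures (proc_info, the per-process limit) standing in for
        whatever accounting the VM system would actually keep: the
        process that owns the dirty pages is paused briefly before its
        next write, while processes that are not dirtying pages are
        never touched.

#include <stdio.h>
#include <unistd.h>

struct proc_info {
    const char   *name;
    unsigned long dirty_pages;   /* dirty pages attributable to this proc */
};

#define PER_PROC_DIRTY_LIMIT 1024   /* pages before a writer is slowed */

/* Hypothetical hook: slow only the offending writer, never its neighbours. */
static void throttle_writer(const struct proc_info *p)
{
    if (p->dirty_pages > PER_PROC_DIRTY_LIMIT) {
        printf("throttling %s: %lu dirty pages\n", p->name, p->dirty_pages);
        usleep(10000);           /* brief pause lets writeback catch up */
    }
}

int main(void)
{
    struct proc_info bulk_loader = { "bulk_loader", 5000 };
    struct proc_info heartbeat   = { "heartbeat",      2 };

    throttle_writer(&bulk_loader);  /* slowed: it owns the dirty pages */
    throttle_writer(&heartbeat);    /* untouched: unaware of any I/O issue */
    return 0;
}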

        Mitchell Erblich
        ---------------------

Richard Elling - PAE wrote:
> 
> Roch wrote:
> > Oracle will typically create it's files with 128K writes
> > not recordsize ones.
> 
> Blast from the past...
>         http://www.sun.com/blueprints/0400/ram-vxfs.pdf
> 
>   -- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
