Andrew Ford writes:
 > Tim Bunce writes:
 >  > On Fri, Dec 17, 1999 at 02:56:49PM +0000, Andrew Ford wrote:
 >  > > Tim Bunce writes:
 >  > >  > On Fri, Dec 17, 1999 at 12:55:12PM +0000, Andrew Ford wrote:
 >  > >  > > I have an urgent need for a module to tie an array of integers to an
 >  > >  > > mmap'ed file (a sparse array of several hundred million integers).  I
 >  > >  > > have looked at Mmap.pm by Malcolm Beattie and seen the idea for
 >  > >  > > Array::Virtual registered by Larry Wall and have started implementing
 >  > >  > > the module I need.  But should it be called Array::Virtual (taking
 >  > >  > > Larry's slot) or Mmap::Array?
 >  > >  > > 
 >  > >  > > The interface I have in mind is:
 >  > >  > > 
 >  > >  > >     use Array::Virtual;            # or Mmap::Array
 >  > >  > > 
 >  > >  > >     my @array;
 >  > >  > >     open(FH, "...");
 >  > >  > >     my $nel  = 42;
 >  > >  > >     my $prot = "rw";       # or "ro", or "wo" or should it be PROT_READ?
 >  > >  > >     my $shared = 1;        # or should it be MAP_SHARED?
 >  > >  > >     my $offset = 0;     # this would be the default anyway
 >  > >  > >     my $type   = "int4"    # some set of (string) literals for 1, 2, 4, 8 
 >  > >  > >                    # byte integers in native or network byte
 >  > >  > >                    # order, plus floating point (default probably int4)
 >  > >  > > 
 >  > >  > >     tie @array, $nel, $prot, $shared, FH, $offset, $type;
 >  > >  > > 
 >  > >  > >     $array[0] = 42; 
 >  > >  > >     #etc
 >  > >  > > 
 >  > >  > >     undef @array;
 >  > >  > > 
 >  > >  > > Any thoughts?
 >  > >  > 
 >  > >  > The word 'Virtual' doesn't carry much meaning here. Maybe:
 >  > >  > 
 >  > >  >      Tie::MmapArray
 >  > >  > 
 >  > >  > I'd switch to using a hash of named parameters to the tie.
 >  > >  > 
 >  > >  > I'd also use pack() letters to describe the element type (which would
 >  > >  > neatly expand to a string of letters for arrays of structures).
 >  > >  > 
 >  > > 
 >  > > Thanks for the prompt feedback.  I agree about the name and using a
 >  > > hash for parameters, so the call will now look like:
 >  > > 
 >  > >    use Tie::MmapArray;
 >  > >    use Fcntl;
 >  > > 
 >  > >    tie @array, { fh     => $fh,
 >  > >                  eltype => "l",
 >  > >                  nels   => 42,
 >  > >                  mode   => O_RDWR,  # or "rw"
 >  > >                  shared => 1,
 >  > >                  offset => 0 };
 >  > > 
 >  > > This raises a couple of issues:
 >  > > 
 >  > > 1. If the fh parameter is not specified then this becomes an anonymous
 >  > >    mmap call, which is probably not a sensible default.  Should I have 
 >  > >    the filehandle as a separate parameter (e.g. tie @array, FH, $href)
 >  > >    and require an explicit undef for anonymous mmap'ing (or require an 
 >  > >    explicit undef for the value of the "fh" element of the hash)?
 >  > 
 >  > Umm, the former seems better, I think. But I imagine many people would
 >  > want to just pass in a file name and let the module look after opening it.
 >  > 
 > 
 > Ah!  I didn't think of something that simple.  As Mmap.pm wants a
 > filehandle I thought I had to present the same interface, but I
 > suppose it ain't necessarily so.
 > 
 > OK, so now we have:
 > 
 >     use Tie::MmapArray;
 >  
 >     tie @array, $filename, { eltype => "l",
 >                              nels   => 42,
 >                              mode   => "rw",    # or "ro" or "wo"
 >                              shared => 1,
 >                              offset => 0 };
 > 
 > and if $filename is actually a filehandle we just do The Right Thing.
 > 
 > 
 >  > > 2. The mode could be a numeric parameter with values:
 >  > >    O_RDONLY, O_WRONLY or O_RDWR (from Fcntl).  I know that mmap uses
 >  > >    PROT_READ, PROT_WRITE, but not many people are that familiar with
 >  > >    mmap compared to open(2).
 >  > 
 >  > True. I was confusing mode (mmap prot arg) with seperate mmap flags arg.
 >  > 
 >  > >    I could allow both the Fcntl constants 
 >  > >    and the strings "ro", "wo" and "rw".
 >  > 
 >  > Given the above I'd say just go with the strings. Keep it simple.
 >  > 
 >  > > 3. using the pack letters does not allow all (sensible) possibilities
 >  > >    (signed/unsigned, network/native, 1/2/4/8 integers or float/double).
 >  > >    specifically it looks like unsigned network order [is not covered]
 >  > 
 >  > Then submit a perl patch that adds them, if it is really needed.
 >  > See Porting/patching.pod in the perl source directory.
 >  > [CC'd to perl5-porters for comment].
 > 
 > My client actually uses 3 byte integers to save diskspace in some
 > circumstances ;-)
 > 
 >  > 
 >  > >    and 8 byte integers are not covered.
 >  > 
 >  > >From the 5.005_03 manual:
 >  > 
 >  >     q   A signed quad (64-bit) value.
 >  >     Q   An unsigned quad value.
 >  >       (Available only if your system supports 64-bit integer values
 >  >        _and_ if Perl has been compiled to support those.
 >  >            Causes a fatal error otherwise.)
 >  > 
 > 
 > Johan's Perl5 Pocket Reference doesn't have q or Q so I missed that.
 > 
 > 
 >  > >    I could allow the letters to be
 >  > >    qualified, e.g. i8 would be an eight-byte integer, and c22 would be 
 >  > >    an array of 22-character, fixed-length strings.
 >  > 
 >  > I'd suggest you just follow the well defined pack() syntax. If you need
 >  > something outside that then (try to) either patch perl's pack to add
 >  > it, or agree with perl5-porters what a safe 'escape' syntax would be so
 >  > you'll be safe from future additions to pack().
 > 
 > Actually if the pack letters cover everything I'm interested in then
 > something like i8 could create a two dimensional array.  One could
 > even extend the metaphor such that something like
 > 
 >     tie @array, $file, { eltype => "ia24i", ... };

that should of course be

   tie @array, 'Tie::MmapArray', $file, { eltype => "ia24i", ... };

 > 
 > would tie @array to a file with the record structure given, so that
 > $array[$n] returns a reference to a three-element array, the elements
 > of which are an integer, a 24-character string and an integer
 > repectively, and hey presto we've got access to the fields of a
 > record-oriented file.  (Mmm, I wonder what eltype => "perl" should do.)
 > 
 > I might leave this level of complexity from the first version of the
 > module though ;-)
 > 


The initial version is now on http://www.ford-mason.co.uk/resources/

I'll upload to CPAN once I've had a chance to test it some more, flesh
out some of the missing functionality and integrate any changes in
response to feedback.

Andrew
-- 
Andrew Ford,  Director       Ford & Mason Ltd           +44 1531 829900 (tel)
[EMAIL PROTECTED]      South Wing, Compton House  +44 1531 829901 (fax)
http://www.ford-mason.co.uk  Compton Green, Redmarley   +44 385 258278 (mobile)
http://www.refcards.com      Gloucester, GL19 3JB, UK   

Reply via email to