Ideally (or at least, what I would like), managing a file on a remote
resource should be the same as managing one locally, e.g.:

    my Amazon $fn = open("$path-to-input-file-location/$file-name", :r)
        or die $!;
    for $fn.readlines { };
    $fn.close;

    my Google $fn = open("$path-to-output-file-location/$file-name", :w)
        or die $!;
    for @lots-of-data -> $item { $fn.say: $item };
    $fn.close;

If it is possible to make a file on a cloud system as easy to manipulate
as a local file, that will aid the acceptability of Perl 6.
Hence the use of 'open', 'close', 'say' and 'readlines' rather than
'connect', 'put' and 'get'.
The issue, it seems to me, is that the connection between a computer
and its local filesystem has such a low probability of failure (viz.,
disk failure or disk full) that it can normally be ignored, but there is
a relatively high probability of a connection failure between the computer
and a remote site, and so it would be normal to have exception handling
in the code.
That means the designer of the API should indicate how these exceptions
should be handled, so that a program doesn't hang in a loop that is
expecting data from a remote resource whose server has disconnected.
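
To make that concrete, here is a minimal sketch of a read loop that
cannot hang: each read gets a bounded wait, and a disconnect ends the
loop cleanly. Everything named here (X::Remote::Disconnected,
X::Remote::Timeout, FakeRemote, bounded) is my own invention for
illustration and is not part of S32 or of the proposal below.

```raku
# Hypothetical exception types; nothing in the proposal defines these.
class X::Remote::Disconnected is Exception {
    method message() { "remote server disconnected" }
}
class X::Remote::Timeout is Exception {
    method message() { "remote read timed out" }
}

# Stand-in for a remote handle: hands out its pending lines, then
# simulates the server dropping the connection.
class FakeRemote {
    has @.pending;
    method get() {
        X::Remote::Disconnected.new.throw unless @!pending;
        @!pending.shift;
    }
}

# Run &fetch on another thread, but stop waiting after $timeout seconds
# instead of blocking forever.
sub bounded(&fetch, Real :$timeout = 2) {
    my $work = start { fetch() };
    await Promise.anyof($work, Promise.in($timeout));
    X::Remote::Timeout.new.throw if $work.status ~~ Planned;
    $work.result;    # rethrows whatever the fetch died with
}

my $remote = FakeRemote.new(pending => <alpha beta>);
my @got;
my $done = False;
until $done {
    @got.push: bounded({ $remote.get });
    CATCH {
        # Either failure ends the loop; any other exception propagates.
        when X::Remote::Disconnected | X::Remote::Timeout {
            note "giving up: " ~ .message;
            $done = True;
        }
    }
}
say @got;
```

Whether the timeout belongs inside the handle or in the caller, as
here, is exactly the kind of decision I think the API designer needs to
document.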
On 06/10/2010 03:24 AM, Aaron Sherman wrote:
On Wed, Jun 9, 2010 at 10:04 AM, Aaron Sherman <a...@ajs.com> wrote:
Has anyone begun to consider what kind of filesystem interface we want
for things like sftp, Amazon S3, Google Storage and other remote
storage possibilities? Is there any extant work out there, or should I
just start spit-balling?
In the absence of anything forthcoming and a totally arbitrary sense
of urgency ;-) here's what I think I should do:
IO::FileSystems (S32) gives us some basics and the Path role also
provides some useful features.
I will start there and build an IO::FileSystems::VFS roughly like:

    class IO::VFS is IO::FileSystems {
        ...

        # Session data if applicable
        has IO::VFS::Session $.session;

        # Many methods take a $context which, if supplied,
        # will contain back-end specific data such as restart markers
        # or payment model information. I'll probably define
        # a role for the context parameter, but otherwise
        # leave it pretty loose as a back-end specific structure.

        # A simple operation that guarantees a round-trip to the filesystem
        method nop($context?) { ... }

        # List of sub-IO::VFS partitions/buckets/etc.
        method targets($context?) { ... }
        method find_target($locator, $context?) { ... }

        # Means of acquiring file-level access through a VFS
        method find($locator, $enc = $.session.encoding, $context?) { ... }
        method glob(Str $matcher, $enc = $.session.encoding, $context?) { ... }

        # Like opening and writing to a filehandle, but the operation is
        # totally opaque and might be a single call, sendfile or anything
        # else. Note that this doesn't replace $obj.find($path).write(...)
        method put($locator, $data, $enc = $.session.encoding, $context?) { ... }

        # Atomic copy/rename, etc. are logically filesystem operations, even
        # though they might have counterparts at the file level. The
        # distinction being that at the filesystem level I never know nor
        # care what the contents of the file are, I just ask for an
        # operation to be performed on a given path.
        method copy($from, $to, $enc = $.session.encoding, $context?) { ... }
        method rename($from, $to, $enc = $.session.encoding, $context?) { ... }
        method delete($locator, $enc = $.session.encoding, $context?) { ... }

        # Service-level ACLs if any
        method acl($locator, $context?) { ... }
    }

The general model I imagine would be something like:

    my IO::VFS::S3 $s3 .= new();
    $s3.session.connect($amazonlogininfo);
    my $bucket = $s3.find_target($bucket_name);
    $bucket.put("quote.txt", "Now is the time for all good men...\n");
    say "URI: ", $bucket.find("quote.txt").uri;

or

    my IO::VFS::GoogleStorage $goog .= new();
    $goog.session.connect($googlelogininfo);
    my $bucket = $goog.find_target($bucket_name);
    $bucket.put("quote.txt", "Now is the time for all good men...\n");
    say "URI: ", $bucket.find("quote.txt").uri;

or

    my IO::VFS::SFTP $sftp .= new();
    $sftp.session.connect(:host<storage>, :user<ajs>, :password<iforgotit>);
    my $filesystem = $sftp.find_target("/tmp");
    $filesystem.put("quote.txt", "Now is the time for all good men...\n");
    say "URI: ", $filesystem.find("quote.txt").uri; # using sftp:...

Notice that everything after $obj.session.connect is identical except
for my choice of variable names. In fact, you shouldn't have to worry
about what storage back-end you're using as long as you have a valid
VFS handle. Really, path names are the only thing that might trip you
up.
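
As a rough illustration of that back-end independence, the sketch below
writes the calling code once against a cut-down interface and runs it
on an in-memory stand-in. The names VFS-ish, MemoryStore and publish
are hypothetical, and find here returns a plain URI string rather than
an object with a .uri method, purely to keep the example short.

```raku
# Hypothetical, cut-down stand-in for the proposed interface.
role VFS-ish {
    method put($locator, $data) { ... }
    method find($locator)       { ... }
}

# An in-memory back-end, standing in for S3, Google Storage or SFTP.
class MemoryStore does VFS-ish {
    has Str $.scheme = 'mem';
    has %!files;

    method put($locator, $data) {
        %!files{$locator} = $data;
    }

    method find($locator) {
        die "no such file: $locator" unless %!files{$locator}:exists;
        $!scheme ~ '://' ~ $locator;
    }
}

# The caller only needs something that does VFS-ish; it never asks
# which back-end it was given.
sub publish(VFS-ish $store, Str $name, Str $text) {
    $store.put($name, $text);
    $store.find($name);
}

my $uri = publish(MemoryStore.new, "quote.txt",
                  "Now is the time for all good men...\n");
say "URI: $uri";
```

Swapping in a different back-end would mean passing a different object
to publish; the body of publish would not change.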
Thoughts?
I think that in order to do this, I'll need the following support
libraries which may or may not exist (I'll be looking into these):
IO::FileSystems
Path
HTTP (requires socket IO, MIME, base64, etc.)
Various crypto libs
I don't intend to provide a finished implementation of any of these
where they don't already exist (I may not even end up with a final
implementation of the VFS layer), but at least I'll get far enough
along that others who want to work on this will have a starting point,
and I'll want to at least have a test that fakes its way all the way
down to creating a remote file on all three services, even if most of
the operations involve passing on blobs of data generated by
equivalent calls in other languages.