Steve, you mentioned:

>> but to test YARN it has to be visible across processes.

What do you mean by "test YARN"? I think for unit testing the FileSystem APIs we don't care about YARN, do we?

On Thu, Mar 6, 2014 at 6:02 AM, Steve Loughran <ste...@hortonworks.com> wrote:

> On 5 March 2014 19:07, Jay Vyas <jayunit...@gmail.com> wrote:
>
> > Hi HCFS Community :)
> >
> > This is Jay... Some of you know me.... I hack on a broad range of file
> > system and hadoop ecosystem interoperability stuff. I just wanted to
> > introduce myself and let you folks know I'm going to be working to help
> > clean up the existing unit testing frameworks for the FileSystem and
> > FileContext APIs. I've listed some bullets below.
> >
> > - byte code inspection based code coverage for file system APIs with a
> > tool such as Cobertura.
> >
> > - HADOOP-9361 points out that there are many different types of file
> > systems.
> >
>
> It adds a lot more structure to the tests with an XML declaration of each
> FS (in the -test) JAR.
>
> It's pretty much complete except for some discrepancies between file:// and
> hdfs that I need to fix in file:
> - handling of mkdirs if the destination exists and is a file (currently:
> returns 0)
> - seek() on a closed stream. Currently appears to work, at least on OS/X.
>
> > - Creating mock file systems which can be used to validate API tests,
> > which emulate different FS semantics (atomic directory creation, eventual
> > consistency, strict consistency, POSIX compliance, append support,
> > etc...)
> >
>
> That's an interesting thought, adding some inconsistency semantics on top
> of an existing FS to emulate blobstore behaviour. How would you do this?
> An in-memory RAM FS could do some of this, but to test YARN it has to be
> visible across processes.
> We'd really need an in-RAM simulation of semantics that also offered an RPC
> API of some form.
>
> > Is anyone interested in the above issues or have any opinions on how /
> > where I should get started?
> >
> > Our end goal is to have a more transparent and portable set of test APIs
> > for the hadoop file system implementors, across the board: so that we
> > can all test our individual implementations confidently.
> >
> > So, anywhere I can lend a hand - let me know. I think this effort will
> > require all of us in the file system community to join forces, and it
> > will benefit us all immensely in the long run as well.
> >
>
> I should do another '9361 patch, once I get those final quirks in file://
> sorted out so that it is consistent with HDFS.
>
> 1. HDFS is, and continues to be, the definition of the semantics of all
> filesystem interfaces.
> 2. It'd be good if we understood more about which accidental features of
> the FS existing code depends on. e.g. does anything rely on mkdirs() being
> atomic? On 0x00 being a valid char in a filename? How do programs fail when
> blocksize is too small (try setting it to 1 and see how pig reacts)? How
> much code depends on close() being near-instantaneous and never failing?
> Blobstores do their write then, and can break both these requirements,
> which is something a mock FS could add atop file:

--
Jay Vyas
http://jayunit100.blogspot.com
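
To make the mock-FS idea from the thread a bit more concrete, here is a minimal, single-process sketch in plain Java. It is only an illustration under stated assumptions: `MockBlobStore`, `injectCloseFailure()`, `tick()` and the logical clock are hypothetical names invented for this example, not part of Hadoop or of the HADOOP-9361 patch, and it does not address the cross-process/RPC requirement raised for YARN. It emulates two of the blobstore behaviours mentioned above: the write happens in close() (and can fail there), and a newly written object only becomes visible after a delay.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store mimicking two blobstore quirks. Not a Hadoop API. */
class MockBlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();
    private final Map<String, Long> visibleAt = new HashMap<>();
    private final long visibilityDelay; // logical ticks before a new blob is visible
    private long clock = 0;             // logical clock, advanced only by tick()
    private boolean failNextClose = false;

    MockBlobStore(long visibilityDelay) {
        this.visibilityDelay = visibilityDelay;
    }

    /** Advance the logical clock; keeps tests deterministic (no sleeping). */
    void tick(long n) {
        clock += n;
    }

    /** Make the next close() throw, simulating a failed upload. */
    void injectCloseFailure() {
        failNextClose = true;
    }

    /** Data is buffered in memory and "uploaded" only when close() is called. */
    OutputStream create(final String path) {
        return new ByteArrayOutputStream() {
            private boolean closed = false;

            @Override
            public void close() throws IOException {
                if (closed) {
                    return;
                }
                closed = true;
                if (failNextClose) {
                    failNextClose = false;
                    throw new IOException("simulated upload failure for " + path);
                }
                blobs.put(path, toByteArray());
                visibleAt.put(path, clock + visibilityDelay);
            }
        };
    }

    /** Eventually consistent: a blob exists only once its visibility time has passed. */
    boolean exists(String path) {
        Long t = visibleAt.get(path);
        return t != null && clock >= t;
    }

    byte[] read(String path) throws IOException {
        if (!exists(path)) {
            throw new FileNotFoundException(path);
        }
        return blobs.get(path).clone();
    }
}

class MockBlobStoreDemo {
    public static void main(String[] args) throws IOException {
        MockBlobStore store = new MockBlobStore(2);
        OutputStream out = store.create("/tmp/part-0000");
        out.write("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.exists("/tmp/part-0000")); // false: nothing uploaded yet
        out.close();
        System.out.println(store.exists("/tmp/part-0000")); // false: uploaded, not yet visible
        store.tick(2);
        System.out.println(store.exists("/tmp/part-0000")); // true: visibility delay elapsed
    }
}
```

A test that assumes close() never fails, or that a file is visible immediately after close(), would break against a store like this, which is the kind of hidden dependency the questions above are trying to surface. A real version would presumably wrap an existing FileSystem implementation rather than a HashMap.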