On 10/11/07, Keith R. Bennett <[EMAIL PROTECTED]> wrote:
>
> Hello, all. I am working with the Apache Tika project. We found the need to
> get a newly opened input stream from the user, and possibly read it multiple
> times. I am aware of the mark and release methods, but we needed to support
> streams of arbitrary length, so I thought we'd have to figure something else
> out.

I don't see anything in the javadocs for the mark/reset methods in
InputStream that prevents them from being used for streams of arbitrary
length. Is this an assumption based on the fact that the mark method
specifies a "readLimit" parameter? The reset method javadocs only say that
an IOException "might" be thrown if the readLimit has been exceeded - so my
take is that it would not be inconsistent to create an implementation that
ignores that parameter. Better IMO to use these than invent a new "rewind"
method.

> I created a class, and I'd like your feedback on it. If you'd like to
> include it, or something based on it, in a future version of your project,
> feel free. Or, if it's a bad idea, or you can suggest modifications or a
> totally different approach that would fulfill the need more wisely, please
> let me know.

I have a couple of comments.

Firstly, although the byte array read methods in InputStream itself
delegate to the single byte read method, not all implementations do
(FileInputStream doesn't appear to). Because RereadableInputStream funnels
all reads (and writes) through the single byte read method, it can't take
advantage of any performance benefits that the streams it delegates to
might have when processing an array of bytes. So my suggestion would be to
also implement read(byte[] b, int off, int len).

Secondly, from a Commons IO perspective, we already have some of the
functionality for parts of what you're trying to achieve:

1) DeferredFileOutputStream (see http://tinyurl.com/yoj4kd) - writes to a
   byte array until a threshold is reached and then switches to a file.
   Doesn't currently support temporary files, but could be easily (IMO)
   enhanced to do so (see http://tinyurl.com/ysvrh9)

2) TeeInputStream (see http://tinyurl.com/yszejy) - as it reads an
   InputStream it also writes to an OutputStream

So to achieve something like the functionality in your "first pass", you
could do something like

    File tempFile = new File("tikka.tmp");
    DeferredFileOutputStream deferred = new DeferredFileOutputStream(1024, tempFile);
    InputStream currentInput = new TeeInputStream(origInput, deferred);

After the stream's been processed through once, you could switch the
current input stream:

    if (deferred.isInMemory()) {
        currentInput = new ByteArrayInputStream(deferred.getData());
    } else {
        currentInput = new RereadableFileInputStream(deferred.getFile());
    }

RereadableFileInputStream doesn't yet exist, but it would be a proxy that
supports mark/reset and closes/re-creates an underlying FileInputStream on
reset. AIUI ByteArrayInputStream already supports mark/reset - so wherever
the stream is cached it could use the standard mark/reset to re-position.

The main advantage of this is that if the different pieces of functionality
that make up your RereadableInputStream are broken down into
smaller/simpler components, it becomes much easier to test those
individually and then compose them together to create the more complex
behaviour you require.

Niall

> It's called RereadableInputStream. It saves the bytes read from the
> original stream in a byte [], until a user-specified threshold is reached,
> then it moves the buffer to a temporary file.
>
> I'm attaching the file and a basic unit test class to this message. This
> version is newer than the one currently in Tika's subversion repository.
> For reasons that I won't bore you with, this version is not yet committed.
>
> Thanks for any help you can offer.
>
> Regards,
> Keith Bennett
>
> http://www.nabble.com/file/p13164204/RereadableInputStream.java
> RereadableInputStream.java
> http://www.nabble.com/file/p13164204/RereadableInputStreamTest.java
> RereadableInputStreamTest.java
> --
> View this message in context:
> http://www.nabble.com/RereadableInputStream-tf4609782.html#a13164204
> Sent from the Commons - Dev mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
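
For reference, here is a rough sketch of what the RereadableFileInputStream mentioned in the reply might look like. This is only one possible shape, using nothing but the JDK: the class does not exist in Commons IO, and the field names and the skip-based repositioning are assumptions made for illustration. It ignores the readLimit argument to mark (as discussed for mark/reset above) and overrides read(byte[], int, int) so that bulk reads go straight to the underlying FileInputStream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch (not part of Commons IO): a file-backed stream whose
 * reset() closes and re-creates the underlying FileInputStream, then skips
 * forward to the marked position.
 */
class RereadableFileInputStream extends InputStream {
    private final File file;
    private InputStream in;     // current underlying stream
    private long position;      // bytes consumed so far
    private long markPosition;  // position restored by reset(); 0 until mark() is called

    RereadableFileInputStream(File file) throws IOException {
        this.file = file;
        this.in = new FileInputStream(file);
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) {
            position++;
        }
        return b;
    }

    // Delegate bulk reads directly instead of looping over read().
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            position += n;
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    // readLimit is deliberately ignored: the file can always be re-read.
    @Override
    public void mark(int readLimit) {
        markPosition = position;
    }

    @Override
    public void reset() throws IOException {
        in.close();
        in = new FileInputStream(file);
        long remaining = markPosition;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                throw new IOException("Could not skip back to marked position");
            }
            remaining -= skipped;
        }
        position = markPosition;
    }

    @Override
    public int available() throws IOException {
        return in.available();
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

With something like this in place, the file branch of the switch shown earlier could hand back a stream that callers rewind with the standard mark/reset calls, the same way they would with ByteArrayInputStream. Calling reset() without a prior mark() returns to the start of the file in this sketch; a stricter version might throw instead.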