Hello,

I need to search and collect very large sets of files using a variety
of criteria. All my research into libraries that support this keeps coming
back to
Ant and its powerful FileSet and DirectoryScanner. So I'm hoping to make use 
of them, especially since Ant works on several platforms.

Given that the number of files will be very large, I'm concerned that
DirectoryScanner blocks until it has collected all the results in memory.
I ran across one case where a DirectoryScanner search took longer than
the desired time frame (https://bz.apache.org/bugzilla/show_bug.cgi?id=57253),
so I know this isn't just a theoretical concern.

Now I can't be the first person to consider accessing DirectoryScanner
results before the scan is complete, or storing the results persistently,
for example in a database. So I wanted to run the idea by this mailing list
in case it is a terrible idea, there's a better way, or someone has
already done it.

Implementation-wise, it doesn't seem very difficult. Looking at the design of
DirectoryScanner, I see all results are ultimately stored in Vectors.
So I could extend Vector and override its behavior with database queries.
And since DirectoryScanner exposes the ability to set the Vectors
(via protected methods), I would extend DirectoryScanner to use this new
database-backed Vector.
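
To make the idea concrete, here is a rough sketch of the kind of Vector
subclass I have in mind. It uses only the JDK, so it does not touch Ant at
all. ResultStore, InMemoryStore and StoreBackedVector are hypothetical names
I made up for illustration, and the in-memory store is just a stand-in for a
real database. I am also assuming that DirectoryScanner only needs add(),
addElement(), contains() and size() on these Vectors, which I have not
verified.

```java
import java.util.Set;
import java.util.Vector;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage abstraction; a real one might wrap JDBC queries. */
interface ResultStore {
    void put(String path);
    boolean has(String path);
    int count();
}

/** In-memory stand-in so the sketch runs without a database. */
class InMemoryStore implements ResultStore {
    private final Set<String> paths = ConcurrentHashMap.newKeySet();

    public void put(String path) { paths.add(path); }
    public boolean has(String path) { return paths.contains(path); }
    public int count() { return paths.size(); }
}

/** Vector that forwards writes and lookups to a store instead of its own array. */
class StoreBackedVector extends Vector<String> {
    private static final long serialVersionUID = 1L;
    private final transient ResultStore store;

    StoreBackedVector(ResultStore store) {
        this.store = store;
    }

    // Both add paths are overridden because Vector.addElement() does not
    // delegate to Vector.add(); nothing is kept in the Vector's own array.
    @Override
    public synchronized void addElement(String path) {
        store.put(path);
    }

    @Override
    public synchronized boolean add(String path) {
        store.put(path);
        return true;
    }

    // Lookups go to the store, so other readers of the store see the
    // same answer as the scanner does.
    @Override
    public boolean contains(Object o) {
        return o instanceof String && store.has((String) o);
    }

    @Override
    public synchronized int size() {
        return store.count();
    }
}
```

Anything that holds the same ResultStore could then read results while the
scan is still adding to it. Methods I did not override (iteration, get(),
toArray(), and so on) would still see an empty Vector, so a real version
would need to cover whatever else the scanner and its callers use. Note also
that this store behaves like a set, so duplicate paths are collapsed.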

Doing this would offer these benefits:
* Avoiding memory limits
* Access to the results from other threads or processes while the search
  is in progress
* Continued support for the Vector.contains() lookups used during the search
* Real-time searching, aggregating, reporting, etc. through the database

Again, I can't be the first one to consider this, but I can't find any mention
of such an idea anywhere. So if anyone has any thoughts on the matter, I would 
definitely appreciate any feedback.

Thank you for your time and consideration.

Regards,
Mike
