Hi Uwe,

Can you please share some details about the design decision "Whenever Lucene updates something in the index, it creates a new file"? Is my understanding right that while an IndexOutput is open, Lucene continues to use the same output/file, but after it is closed (for instance, because the application was restarted), Lucene creates a new output file? Does that mean that after many restarts Lucene will have created many small files, or does it read the previous file and write its contents into the newly created one (like a merge)?

I am asking because with Lucene 4.1 we used Lucene's own interfaces to work with our data files. We implemented a RepositoryDirectory (FS and RAM variants) that implements the Directory interface and provides the IndexInput and IndexOutput we use to work with files. We write index and repository data into a bucket directory and create a new bucket directory when index + repository reaches 1 GB. That is why our raw data file is usually about 300 MB, and we appended to it after a close/restart. Now, to upgrade to Lucene 5 and higher, we have to make a decision: either use our own interface for the repository (data files), or understand Lucene's internals and motivation and continue to use its interfaces. I believe Lucene works with Directory in an efficient way, and maybe we could continue to use it for the "raw data directory" too, but as a result we may either produce many small files (one per restart) or need to merge files that are too big. Can you point me to some details of the internals?

Thanks!
Vladimir Kuzmin

On Wed, Sep 2, 2015 at 12:47 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> Lucene never appends to files, so this is not something that is used
> anywhere. Whenever Lucene updates something in the index, it creates a new
> file. In earlier Lucene versions seeking was supported, but it has been
> removed since Lucene 4.7 (I think). It was just a hack around some
> problems (the requirement to modify the header after writing the file),
> but this is now solved, so seek() was removed completely. And it won't
> come back.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Vlad K [mailto:kuzmi...@gmail.com]
> > Sent: Wednesday, September 02, 2015 8:07 AM
> > To: java-user@lucene.apache.org
> > Subject: Lucene 5.2.1: FSDirectory, is it possible to open existing
> > output for append?
> >
> > FSDirectory createOutput re-creates the file because it opens the
> > stream with TRUNCATE_EXISTING. What is the way to open an existing file
> > and append data? I used this with Lucene 4.1 to create a store of raw
> > messages. I could use Files.newOutputStream directly to do that, but I
> > want to understand the idea behind a design that prohibits appending to
> > existing data. I can't keep the IndexOutput open forever; at the very
> > least, after a restart of the application I have to re-open the existing
> > data and continue to append. What does Lucene suggest for that now?
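
For reference, the difference between the two open modes mentioned in the original question can be sketched with plain java.nio, outside Lucene. This is only a minimal sketch under stated assumptions, not Lucene code and not a recommendation from the thread: the class RawStoreAppendDemo and its methods write and demo are hypothetical names made up for illustration. It shows that opening with StandardOpenOption.TRUNCATE_EXISTING discards what was in the file, while StandardOpenOption.APPEND continues an existing file after it has been closed and reopened.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; it does not use or mirror any Lucene API.
public class RawStoreAppendDemo {

    // Opens the file, writes the data, closes the stream,
    // and returns the resulting file size in bytes.
    public static long write(Path file, String data, boolean append) {
        StandardOpenOption mode = append
                ? StandardOpenOption.APPEND              // keep existing bytes, write at the end
                : StandardOpenOption.TRUNCATE_EXISTING;  // discard existing bytes first
        try {
            try (OutputStream out = Files.newOutputStream(
                    file, StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode)) {
                out.write(data.getBytes(StandardCharsets.UTF_8));
            }
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs three open/write/close cycles against one temporary file
    // and returns the file size observed after each cycle.
    public static long[] demo() {
        try {
            Path file = Files.createTempFile("raw-messages", ".dat");
            long afterCreate = write(file, "first", false);   // file holds "first"
            long afterAppend = write(file, "second", true);   // file holds "firstsecond"
            long afterTruncate = write(file, "third", false); // file holds "third"
            Files.deleteIfExists(file);
            return new long[] {afterCreate, afterAppend, afterTruncate};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = demo();
        System.out.println("after create:   " + sizes[0] + " bytes");
        System.out.println("after append:   " + sizes[1] + " bytes");
        System.out.println("after truncate: " + sizes[2] + " bytes");
    }
}
```

Each call to write stands in for one application run that opens the file, writes, and closes it again. Whether a raw message store should live behind such a stream or behind a Lucene Directory is exactly the open design question in this thread; the sketch only illustrates the file-system behavior being discussed.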