I think my questions wasn't clear.. Let's say I'm doing something like that (c# code, but that's not the issue..)
TextReader reader=new StreamReader("C:\FileToIndex.txt"); Int lineCount=CountLines("C:\FileToIndex.txt"); //This ones reads the entire file and count the number of lines Document doc=new Document(); doc.add(Field.Text("body"),reader); doc.add(Field.Keyword("lineCount"),lineCount); In the above example, I'm reading the entire file twice. This could be a 100Mb file. Now, Let's say I have a class LineCountingTextReader that counts the lines as the file is being read. If I do the following doc.add(Field.Text("body"),lcReader); Then only after I call IndexWriter.AddDocument I will actually have the line count (since only then the file will be read entirely). I don't want to read the entire file into memory and use it for both line counting and analyzing since it may be a very big file. So I'm wondering what other are doing? This is also a problem when you need to get several pieces of information from 1 file to different fields (i.e. analyze an html file and also get the links from it and add them to other a different field). Thanks in advance, Eyal > -----Original Message----- > From: Eyal Post [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 01, 2006 8:24 AM > To: java-user@lucene.apache.org > Subject: Adding line count to a document > > I'd like to add a line count field to my indexed document. > The obvious way is to read my file twice, once to tokenize it > and add it's content to a field in the document and once to > count the number of lines in it and add it to another field. > Any idea how can I optimize this and read the file once? > > Regards, > Eyal > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]