Many thanks Erick, Your points are valid, i was thinking entire Log file as a lucene document, im wrong trying to chop the log file might be the way to go
my bad expressions , yes you got that right timestamp must be added as a "FIELD" that is what i meant really appreciate your detailed reply, regards, Abdul --- Erick Erickson <[EMAIL PROTECTED]> wrote: > Let me take a crack at it. See below... > > On 12/13/06, abdul aleem <[EMAIL PROTECTED]> > wrote: > > > > Hello All, > > > > Apolgies if it is a naive question > > > > a) Indexing large file ( more than 4MB ) > > Do i need to read the entire file as string > using > > java.io and create a Document object ? > > > Essentially yes. IF you must index the whole > document as a single Lucene > document. my reply later for why you may not want to > do this. > > The following are equivalent. > Document doc = new Document(); > doc.add("textfield", "data1 data2 data3"); > doc.add("textfield", "data4 data5 data6"); > > > and > Document doc = new Document(); > doc.add("textfield", "data1 data2 data3 data4 data5 > data6"); > > So you could read chunks of your file and index them > into the *same* field > before writing the document to the index, or you > could read the file as a > single chunk and index it all at once. > > HOWEVER: note that the default in Lucene is to index > only the first 10,000 > (?) tokens for a field in a single document. See > IndexWriter.SetMaxFieldLength. > > > > The file contains timestamp, if i need to index > on > > timestamp is parsing the entire file manually > > (tokenizing) and store the timestamp as document > > object is the only way ? or is there any > alternatives > > ? > > > Perhaps I don't understand the problem. You can > store the timestamp as a > *field* in a document. Is that what you mean by > storing it as a "document > object"? But there's no way I know of to have Lucene > automatically do > something with arbitrary text in the input stream. > You could write a custom > analyzer (see Lucene In Action for the Synonym > Analyzer for a model). That > Analyzer would be responsible for recognizing > timestamps in the input stream > and doing something special with them. > > > b) I need to search the contents of a log file which > > is changing rapidly, from initial testing i see if > any > > changes in the file is not reflected unless it is > > *Indexed* again > > > > Do we need to index the files always before > search > > if the content of the file is dynamically changed > > > Yes. That is, you can't search data you haven't > indexed. > > > ( log file has a pattern and always logs in a > > similar fashion, each time i need to index takes > lot > > of time as the file is large (approach a ) is > there > > any work arounds for this ? ) > > > I think you need to re-think your approach. A > document in Lucene is whatever > you want to think of it as. For instance, you could > index your log file such > that each Lucene "document" was all the data added > to the log file over some > specified time interval. So, say you have a log that > starts at midnight. > Each Lucene "document" could be all the data added > to the index for each > minute. So you'd have a document for the data added > to the log between 12:00 > and 12:00:59. Another "document" for all data > between 12:01:00 and 12:01:59 > etc. That way, you don't have to re-index the entire > log, just everything > since the last interval you already indexed. > > I'm not necessarily recommending this approach, but > using it to illustrate > that you don't need to think of a Lucene Document as > your entire log file. > You may be much better off slicing the data up > somehow and having a > one-to-many relationship between your log and the > Lucene documents...... > > Perhaps you could index every message as an > individual document (which would > deal with your timestamp issue), or..... > > Hope this helps > Erick > > I would greatly appreciate if any inputs on the > > above, > > > > Many thanks, > > Abdul > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > Need a quick answer? Get one in minutes from > people who know. > > Ask your question on www.Answers.yahoo.com > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: > [EMAIL PROTECTED] > > For additional commands, e-mail: > [EMAIL PROTECTED] > > > > > ____________________________________________________________________________________ Do you Yahoo!? Everyone is raving about the all-new Yahoo! Mail beta. http://new.mail.yahoo.com --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]