Yes, read and index each line. If that's a performance problem, I suggest you upgrade your hardware. Try it; never worry about performance in advance. Bottlenecks are generally not where you expect them to be.
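To make "index each line" concrete, here is a minimal plain-Java sketch of the data model. It is not Lucene code and not the Lucene API. Every line becomes its own record carrying the file name and 1-based line number, an inverted index maps each token to the lines containing it, and `context` mimics the `logfilename:logabc.txt AND lineno:[12340 TO 12350]` follow-up search quoted below. The class and method names (`LineIndex`, `addFile`, `search`, `context`) and the naive lowercase/split tokenizer are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Plain-Java model of "one document per log line" (NOT the Lucene API).
 * Hypothetical names throughout; only illustrates the per-line data model.
 */
final class LineIndex {

    /** One "document": a single line of a single log file. */
    static final class LineDoc {
        final String logfilename;
        final int lineno;   // 1-based line number within the file
        final String text;

        LineDoc(String logfilename, int lineno, String text) {
            this.logfilename = logfilename;
            this.lineno = lineno;
            this.text = text;
        }
    }

    private final List<LineDoc> docs = new ArrayList<>();
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Reads the source line by line and indexes every line as its own doc. */
    void addFile(String logfilename, Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineno = 0;
            while ((line = reader.readLine()) != null) {
                lineno++;
                int docId = docs.size();
                docs.add(new LineDoc(logfilename, lineno, line));
                // Naive tokenizer (assumption): lowercase, split on non-alphanumerics.
                for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
                    if (token.isEmpty()) continue;
                    List<Integer> ids = postings.computeIfAbsent(token, k -> new ArrayList<>());
                    // Record each line at most once per token.
                    if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                        ids.add(docId);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience overload for in-memory content. */
    void addFile(String logfilename, String content) {
        addFile(logfilename, new StringReader(content));
    }

    /** All lines containing the term, in indexing order. */
    List<LineDoc> search(String term) {
        List<LineDoc> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList())) {
            hits.add(docs.get(id));
        }
        return hits;
    }

    /** Same idea as logfilename:<file> AND lineno:[from TO to], bounds inclusive. */
    List<LineDoc> context(String logfilename, int from, int to) {
        List<LineDoc> out = new ArrayList<>();
        for (LineDoc d : docs) {   // linear scan, for illustration only
            if (d.logfilename.equals(logfilename) && d.lineno >= from && d.lineno <= to) {
                out.add(d);
            }
        }
        return out;
    }
}
```

In Lucene itself each `LineDoc` would correspond to a `Document` holding the file name, line number and line text as fields, and the linear scan in `context` would be replaced by the numeric range query suggested in the thread.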
--
Ian.


On Thu, Jul 4, 2013 at 10:53 AM, Ankit Murarka
<ankit.mura...@rancoretech.com> wrote:
> Thanks. Indeed I am indexing each file. But how do I index each line of a
> file?
> This will essentially mean: first I need to index each file to know whether
> the word exists or not, then I need to index each line of the file to know
> its location. This does not seem to be a problem.
>
> The problem is: if I specify a file name, that file is indexed; if I
> specify a directory name, all the files inside that directory are indexed.
> But how do I go about indexing each line of a file?
>
> Does this mean I should read each line of the file and feed it to Lucene
> so that the index can be generated? That would be very resource-intensive
> and would severely hurt performance.
>
> On 7/4/2013 2:04 PM, Ian Lea wrote:
>>
>> Sounds like you're indexing each log file as one Lucene document.
>> The obvious answer is to index each line in each log file as a separate
>> doc. Searches would then match lines in files and you can display
>> those lines, summarizing counts per file if you want that.
>>
>> If you wanted to be able to show surrounding lines, index the line
>> number and the file name. So if you got a hit on line 12345 of file
>> logabc.txt you could execute a second search with
>> logfilename:logabc.txt AND lineno:[12340 TO 12350] to get 5 lines
>> either side. Use a NumericField and NumericRangeQuery for lineno if
>> you are concerned about performance. See the recent thread on this
>> list for more on that.
>>
>>
>> --
>> Ian.
>>
>>
>> On Thu, Jul 4, 2013 at 8:10 AM, Ankit Murarka
>> <ankit.mura...@rancoretech.com> wrote:
>>
>>>
>>> Dear Team,
>>> I have a potential use case. I have a large number of log
>>> files which are archived in a particular directory. Now the administrator
>>> would like to view certain information which may or may not be present
>>> in any of the files inside the directory.
>>>
>>> Using Lucene, I was able to find out whether the specific word he is
>>> searching for is present in the files, and in which files it is present.
>>>
>>> BUT, is it possible to find the location of that word inside the file?
>>> Each file is about 5 MB, and it does not really make sense to parse the
>>> file just to find the location of a word that is present.
>>>
>>> Can Lucene help in this regard, or at least give a close approximation
>>> of its location in the file? I would like to show at least 256KB of data
>>> from the point where that word occurs in the file.
>>>
>>> Googled a lot but to no avail.
>>>
>>> --
>>> Regards
>>>
>>> Ankit
>>>
>>> "Peace is found not in what surrounds us, but in what we hold within."
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>
>
> --
> Regards
>
> Ankit Murarka
>
> "Peace is found not in what surrounds us, but in what we hold within."
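The original question also asks to show roughly 256KB of text starting where the word occurs. One complementary idea, not something proposed in the thread, is to record the byte offset at which each line starts, so a hit on a given line number can be turned into a seek into the file. The sketch below uses only the JDK and assumes `\n` line terminators (any `\r` is simply left as part of the line); `LineOffsets`, `lineStartOffsets` and `readFromLine` are hypothetical names. In a Lucene setup the offset could presumably be stored with each line's document rather than recomputed.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (hypothetical helper): map 1-based line numbers to byte offsets,
 * then read a fixed-size chunk of the file starting at a given line.
 */
final class LineOffsets {

    /** offsets.get(i) is the byte offset where line i+1 starts. */
    static List<Long> lineStartOffsets(Path file) {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {          // first byte of a new line
                    offsets.add(pos);
                    atLineStart = false;
                }
                pos++;
                if (b == '\n') {            // next byte (if any) starts a new line
                    atLineStart = true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }

    /**
     * Reads up to maxBytes bytes starting at the given 1-based line.
     * The chunk may end in the middle of a multi-byte character.
     */
    static byte[] readFromLine(Path file, List<Long> offsets, int lineno, int maxBytes) {
        long start = offsets.get(lineno - 1);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            int len = (int) Math.min(maxBytes, raf.length() - start);
            byte[] buf = new byte[len];
            raf.seek(start);
            raf.readFully(buf);
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the 256KB case the call would be `readFromLine(file, offsets, lineno, 256 * 1024)`, where `lineno` comes from the per-line search hit.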