Re: Possible location of word inside the file.

Ian Lea Thu, 04 Jul 2013 03:34:14 -0700

Yes, read and index each line.  If that turns out to be a performance
problem, I suggest you upgrade your hardware.  Try it; never worry about
performance in advance.  Bottlenecks are generally not where you
expect.
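Here is a minimal sketch of "index each line" in plain Java. It does not use Lucene, so it runs on its own.

Each line becomes its own record, carrying the file name and line number. A HashMap of postings stands in for the index Lucene would build if you added one Document per line. The class LineIndex and its methods are hypothetical. The field names logfilename and lineno are borrowed from the query suggested further down the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-JDK stand-in for a Lucene index in which every log line
// is a separate document with "logfilename" and "lineno" fields. Not Lucene API.
class LineIndex {

    // One indexed unit: a single line plus the fields needed to locate it.
    record LineDoc(String logfilename, int lineno, String text) {}

    // Inverted index: lower-cased term -> lines containing that term.
    private final Map<String, List<LineDoc>> postings = new HashMap<>();

    // Split a file's contents into lines and index each line separately.
    void addFile(String logfilename, String contents) {
        String[] lines = contents.split("\\R"); // \R matches \n, \r\n, \r, ...
        for (int i = 0; i < lines.length; i++) {
            LineDoc doc = new LineDoc(logfilename, i + 1, lines[i]); // 1-based line numbers
            // Crude tokenizer (assumption): split on anything that is not a letter or digit.
            for (String token : lines[i].toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (token.isEmpty()) {
                    continue;
                }
                List<LineDoc> list = postings.computeIfAbsent(token, k -> new ArrayList<>());
                // Avoid listing the same line twice when a term repeats on it.
                if (list.isEmpty() || list.get(list.size() - 1) != doc) {
                    list.add(doc);
                }
            }
        }
    }

    // All lines containing the single term `word`, in indexing order.
    List<LineDoc> search(String word) {
        return List.copyOf(postings.getOrDefault(word.toLowerCase(Locale.ROOT), List.of()));
    }

    // Number of matching lines per file, for a per-file summary.
    Map<String, Integer> countsPerFile(String word) {
        Map<String, Integer> counts = new TreeMap<>();
        for (LineDoc d : search(word)) {
            counts.merge(d.logfilename(), 1, Integer::sum);
        }
        return counts;
    }
}
```

A search for "error" returns each matching line together with its file name and line number. countsPerFile("error") gives the per-file summary. A real Lucene analyzer tokenizes far more carefully than the regex split used here.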



--
Ian.

On Thu, Jul 4, 2013 at 10:53 AM, Ankit Murarka
<ankit.mura...@rancoretech.com> wrote:
> Thanks. Indeed, I am indexing each file, but how do I index each line of a
> file?
> This would essentially mean: first I need to index each file to know
> whether the word exists or not, then I need to index each line of the file
> to know its location. This does not seem to be a problem.
>
> The problem is this: if I specify a file name, that file is indexed. If I
> specify a directory name, all the files inside that directory are indexed.
> But how do I go about indexing each line of a file?
>
> Does this mean I should read each line of the file and feed it to Lucene so
> that index entries are generated for it? That would be very
> resource-intensive and would severely hurt performance.
>
> On 7/4/2013 2:04 PM, Ian Lea wrote:
>>
>> Sounds like you're indexing each log file as one Lucene document.
>> The obvious answer is to index each line in each log file as a separate
>> doc.  Searches would then match lines in files, and you can display
>> those lines, summarizing counts per file if you want that.
>>
>> If you want to be able to show surrounding lines, index the line
>> number and the file name.  So if you got a hit on line 12345 of file
>> logabc.txt, you could execute a second search with
>> logfilename:logabc.txt AND lineno:[12340 TO 12350] to get 5 lines
>> either side.  Use a NumericField and NumericRangeQuery for lineno if
>> you are concerned about performance.  See the recent thread on this
>> list for more on that.
>>
>>
>> --
>> Ian.
>>
>>
>> On Thu, Jul 4, 2013 at 8:10 AM, Ankit Murarka
>> <ankit.mura...@rancoretech.com>  wrote:
>>
>>>
>>> Dear Team,
>>> I have a potential use case. I have a large number of log files
>>> archived in a particular directory. The administrator would like to
>>> view certain information that may or may not be present in any of the
>>> files inside the directory.
>>>
>>> Using Lucene, I was able to find out whether the specific word he is
>>> searching for is present in the files, and in which files it occurs.
>>>
>>> BUT, is it possible to find the location of that word inside the file?
>>> Each file is about 5 MB, and it does not really make sense to parse the
>>> whole file just to find the location of a word that is known to be
>>> present.
>>>
>>> Can Lucene help in this regard, or at least give a close approximation
>>> of its location in the file? I would like to show at least 256 KB of
>>> data starting from the point where the word occurs in the file.
>>>
>>> I have googled a lot, but to no avail.
>>>
>>> --
>>> Regards
>>>
>>> Ankit
>>>
>>> "Peace is found not in what surrounds us, but in what we hold within."
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>>
>
>
>
> --
> Regards
>
> Ankit Murarka
>
>
>
>
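The second search Ian describes in the quoted reply (the same log file, lineno between 12340 and 12350) is a range lookup on the line number. In Lucene it would be a term query on the file-name field combined with a numeric range query on lineno. In the 4.x API, for example, that would be NumericRangeQuery.newIntRange("lineno", 12340, 12350, true, true) inside a BooleanQuery.

The sketch below is plain Java rather than Lucene, and ContextLookup is a hypothetical name. It keeps each file's lines in a TreeMap keyed by line number, so the context window is a sorted sub-map lookup instead of a scan of the whole file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical plain-JDK model of the second search
//   logfilename:<file> AND lineno:[lo TO hi]
// Not Lucene API: lines are kept per file in a TreeMap keyed by line number.
class ContextLookup {

    private final Map<String, TreeMap<Integer, String>> linesByFile = new HashMap<>();

    // Store one line under its file name and 1-based line number.
    void addLine(String logfilename, int lineno, String text) {
        linesByFile.computeIfAbsent(logfilename, k -> new TreeMap<>()).put(lineno, text);
    }

    // Lines within `radius` of `hitLine` in the given file, in line order.
    // radius = 5 reproduces lineno:[12340 TO 12350] for a hit on line 12345.
    NavigableMap<Integer, String> context(String logfilename, int hitLine, int radius) {
        TreeMap<Integer, String> lines = linesByFile.get(logfilename);
        if (lines == null) {
            return new TreeMap<>(); // unknown file: no context lines
        }
        // Both ends inclusive, like [lo TO hi] in the query syntax.
        return new TreeMap<>(lines.subMap(hitLine - radius, true, hitLine + radius, true));
    }
}
```

For a hit on line 12345 of logabc.txt, context("logabc.txt", 12345, 5) returns lines 12340 through 12350 of that file only. Lines from other files with the same numbers are never mixed in, which is what the logfilename clause does in the query.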
