I have a requirement involving a directory full of XML files.
I wrote a parser that reads these files and indexes all of the XML
attributes and properties into Lucene documents. An example of one of
these documents is below. I'm parsing sentences into words and tagging
the sentences based on certain criteria.

My issue is finding out whether Lucene can handle cross-document
searching. The example below is indexed as a single document, and there
will be multiple sentences before, after, and throughout an entire
transcript. Is it possible to say, "I want a result where one line marked
as Symptom is 5 lines away from another line marked as Brand"? In
essence, I'm trying to search across multiple Lucene documents.

 

Any thoughts or literature out there?

 

<transcript>
  <line id="1">
    <tag id="10" type="Symptom" />
    <tag id="12" type="Brand" />
    <word>
      <token>Coughing</token>
      <part-of-speech>SBJ</part-of-speech>
    </word>
    <word>
      <token>is</token>
      <part-of-speech>VB</part-of-speech>
    </word>
    <word>
      <token>caused</token>
      <part-of-speech>NP</part-of-speech>
    </word>
    <word>
      <token>by</token>
      <part-of-speech>PP</part-of-speech>
    </word>
    <word>
      <token>Mucinex</token>
      <part-of-speech>PDC</part-of-speech>
    </word>
  </line>
</transcript>
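
To make the requirement concrete, here is a minimal sketch in plain Python (not Lucene) of the kind of match I'm after. It assumes each line's id and tag types have already been pulled out of the XML into a dictionary, and it reads "5 lines away" as "at most 5 lines apart"; the function name, the `max_gap` parameter, and the sample data are all made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def tagged_pairs_within(
    lines: Dict[int, Set[str]], first: str, second: str, max_gap: int
) -> List[Tuple[int, int]]:
    """Return (first_line_id, second_line_id) pairs at most max_gap lines apart."""
    # Collect the ids of lines carrying each tag type.
    first_ids = sorted(i for i, tags in lines.items() if first in tags)
    second_ids = sorted(i for i, tags in lines.items() if second in tags)
    # Keep pairs of *different* lines whose ids are close enough.
    return [
        (a, b)
        for a in first_ids
        for b in second_ids
        if a != b and abs(a - b) <= max_gap
    ]


# Hypothetical transcript: line id -> tag types found on that line.
transcript = {1: {"Symptom"}, 3: set(), 6: {"Brand"}, 20: {"Brand"}}
print(tagged_pairs_within(transcript, "Symptom", "Brand", 5))
```

With this sample data, line 1 (Symptom) pairs with line 6 (Brand) but not with line 20. What I would like to know is whether Lucene can express this kind of condition at query time instead of my doing it in a post-processing step like this.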

 

 

Thanks so much!
