Hi Dan,

My guess, though you didn't say so directly, is that you're representing each sentence ("line") as a separate Lucene document. To answer your question directly: as far as I know, Lucene cannot query inter-document relations the way a database join does. The workaround is to run multiple searches and feed the results of one into the next. For example: first query for all lines with tag X and retrieve their line-ID and transcript-ID field values; then query for tag Y, requiring the same transcript-ID field value and any one of the line-ID values that fall within the window you want.
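The second step of that multi-search approach can also be done in your own code once both result sets are in hand. Here is a minimal sketch in plain Java; it makes no Lucene calls. The names TagWindowJoin, LineHit and join are hypothetical, and it assumes each hit carries the stored transcript-ID and line-ID values, with line IDs being consecutive integers within a transcript.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Client-side pairing of two result sets; hypothetical helper, not part of Lucene. */
final class TagWindowJoin {

    /** One search hit: the transcript-ID and line-ID field values of a matching line. */
    static final class LineHit {
        final String transcriptId;
        final int lineId;

        LineHit(String transcriptId, int lineId) {
            this.transcriptId = transcriptId;
            this.lineId = lineId;
        }
    }

    /**
     * Returns {x, y} pairs that share a transcript and whose line IDs
     * differ by at most `window` (a difference of 0 also counts).
     */
    static List<LineHit[]> join(List<LineHit> hitsX, List<LineHit> hitsY, int window) {
        // Group the tag-Y hits by transcript so each tag-X hit only scans its own transcript.
        Map<String, List<LineHit>> yByTranscript = new HashMap<>();
        for (LineHit y : hitsY) {
            List<LineHit> bucket = yByTranscript.get(y.transcriptId);
            if (bucket == null) {
                bucket = new ArrayList<>();
                yByTranscript.put(y.transcriptId, bucket);
            }
            bucket.add(y);
        }

        List<LineHit[]> pairs = new ArrayList<>();
        for (LineHit x : hitsX) {
            List<LineHit> candidates = yByTranscript.get(x.transcriptId);
            if (candidates == null) {
                continue; // no tag-Y line in this transcript
            }
            for (LineHit y : candidates) {
                if (Math.abs(x.lineId - y.lineId) <= window) {
                    pairs.add(new LineHit[] {x, y});
                }
            }
        }
        return pairs;
    }
}
```

Calling join(symptomHits, brandHits, 5) would keep only the Symptom/Brand pairs from the same transcript that are at most 5 lines apart. The cost is that both result sets have to be pulled back from the index first, which is why the single-document approach may scale better.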
If instead (or perhaps in addition, depending on your other needs) each full transcript is a Lucene document, you can perform the kinds of searches you're talking about with tools already available in Lucene. I'm thinking of a Lucene document with a "line-tags" field, populated with the tags you've associated with each line, where the position of each tag is adjusted so that two tags assigned to the same line get the same position. (Lucene users sometimes call terms that share a position "synonyms", because that's the most common use of this capability.) You can then run a SpanNearQuery over the line-tags field to return matches where tag X is within N lines of tag Y. (See <http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/spans/package-summary.html> for info on the Lucene Span Query family.)

Steve

On 4/8/2009 at 9:33 AM, Dan Scrima wrote:
> So I have a requirement where I have a directory filled with xml files.
> I wrote a parser to parse these files, and index all of the xml
> attributes and properties into documents. An example of one of these
> documents is below. I'm parsing sentences into words, and tagging the
> sentences based on certain criteria.
>
> My issue is trying to find out if lucene can handle cross-document
> searching. So below is indexed as a single document... and there will
> be multiple sentences before, after, and throughout an entire
> transcript. Is it possible somehow to say, "I want a result where one
> line marked as Symptom is 5 lines away from another line marked as
> Brand." So in essence, I'm trying to search across multiple lucene
> documents.
>
> Any thoughts or literature out there?
>
> <transcript>
>   <line id="1">
>     <tag id="10" type="Symptom" />
>     <tag id="12" type="Brand" />
>     <word>
>       <token>Coughing</token>
>       <part-of-speech>SBJ</part-of-speech>
>     </word>
>     <word>
>       <token>is</token>
>       <part-of-speech>VB</part-of-speech>
>     </word>
>     <word>
>       <token>caused</token>
>       <part-of-speech>NP</part-of-speech>
>     </word>
>     <word>
>       <token>by</token>
>       <part-of-speech>PP</part-of-speech>
>     </word>
>     <word>
>       <token>Mucinex</token>
>       <part-of-speech>PDC</part-of-speech>
>     </word>
>   </line>
> </transcript>
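
P.S. To make the position bookkeeping for the "line-tags" field in my reply more concrete, here is a rough sketch in plain Java with no Lucene calls. It only works out which position increment each tag token would need so that tags on the same line share a position and untagged lines still leave a gap. The names LineTagPositions, PositionedTag, assign and withinLines are hypothetical; in a real index you would set these increments on the tokens your analyzer emits. The withinLines check only models what I'd expect an unordered SpanNearQuery to match. I believe the slop counts the unmatched positions between the two terms, so "within N lines" should correspond to a slop of roughly N - 1, but please verify that against your Lucene version.

```java
import java.util.ArrayList;
import java.util.List;

/** Works out token positions for a per-transcript "line-tags" field; hypothetical helper. */
final class LineTagPositions {

    /** A tag term, the position increment it would carry, and the resulting position. */
    static final class PositionedTag {
        final String term;
        final int positionIncrement;
        final int position;

        PositionedTag(String term, int positionIncrement, int position) {
            this.term = term;
            this.positionIncrement = positionIncrement;
            this.position = position;
        }
    }

    /** tagsPerLine.get(i) holds the tags of line i (0-based); the list may be empty for a line. */
    static List<PositionedTag> assign(List<List<String>> tagsPerLine) {
        List<PositionedTag> out = new ArrayList<>();
        int lastPosition = -1; // position of the most recently emitted token
        for (int line = 0; line < tagsPerLine.size(); line++) {
            for (String tag : tagsPerLine.get(line)) {
                // First tag on a line jumps forward to the line's position;
                // further tags on the same line get increment 0 and stack on it.
                int increment = line - lastPosition;
                out.add(new PositionedTag(tag, increment, line));
                lastPosition = line;
            }
        }
        return out;
    }

    /** True if some occurrence of x is at most n positions (lines) away from some occurrence of y. */
    static boolean withinLines(List<PositionedTag> tags, String x, String y, int n) {
        for (PositionedTag a : tags) {
            if (!a.term.equals(x)) {
                continue;
            }
            for (PositionedTag b : tags) {
                if (b.term.equals(y) && Math.abs(a.position - b.position) <= n) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With your example, "symptom" and "brand" on line 1 would both land on the first position (the second one with an increment of 0), and a "brand" tag five lines later would carry an increment of 5.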