: I'm about to embark on implementing the full-text search feature of XQuery:

Good luck with that.

Here are some quick suggestions on how I'd try to tackle the things you
asked about, without putting much thought into them...

: title ftcontains "usability" occurs at least 2 times

Assuming this is just term based (and not complex subclauses), I would
write a custom subclass of TermQuery that enforces a minimum term
frequency.

: title ftcontains "improve" with stemming

Index two versions of every field: one with stemming and one without.

: This allows you to specify -- at query-time -- one of "case
: insensitive", "case sensitive", "lowercase", "uppercase".

I have no idea what it would mean to match something "uppercase" or
"lowercase" (unless that's just syntactic sugar for "uppercase the input,
and then look for a case sensitive match"), but again: two fields, one
case sensitive and one case insensitive.

: This is similar to the Case Option except it's "diacritics insensitive"
: or "diacritics sensitive". How about implementing this?

Two fields, again.

...at this point, if you need to support all permutations of these
options you are looking at 2*2*2 index fields per source field ... so you
start getting into the realm where I might consider keeping them all in
one field, using Payloads to note the various attributes that each Term
has.

: abstract ftcontains "propagating of errors"
:   with stop words ("a", "the", "of")
:
: would match a document with an abstract that contains "propagating few
: errors". It seems odd, I know. It's as if the stop words become
: wildcards, i.e.:

Are you serious? ... so if I query for "A of the B" with stop words
("of", "the") then that has to match "A totally ridiculous B"? ... that
makes no sense whatsoever. Why require so much verbosity just to get a
"gap" that matches anything?

That seems like a straight query parsing problem ... if you see one of
the terms in the stop word list, strip it out, and increase the phrase
slop on the PhraseQuery you are building.
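
To make the stop word suggestion concrete, here is a rough sketch of the parsing step in plain Java, with no Lucene dependency. StopWordPhrase, Plan and plan() are hypothetical names invented for this example, not Lucene classes. It assumes whitespace tokenization, a case-insensitive stop word lookup, and that only stop words sitting between two kept terms should add to the slop (leading and trailing ones leave no gap to bridge).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class StopWordPhrase {

    /** The terms to put in the phrase query and the slop to allow. */
    static final class Plan {
        final List<String> terms;
        final int slop;

        Plan(List<String> terms, int slop) {
            this.terms = terms;
            this.slop = slop;
        }
    }

    /**
     * Strips stop words from a phrase and counts one unit of slop for
     * every stop word that sat between two kept terms.
     */
    static Plan plan(String phrase, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        int slop = 0;
        int pending = 0; // stop words seen since the last kept term

        for (String token : phrase.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for an empty phrase
            }
            // Assumption: stop word lookup ignores case.
            if (stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                if (!kept.isEmpty()) {
                    pending++; // leading stop words are ignored
                }
            } else {
                slop += pending; // the gap is now known to be interior
                pending = 0;
                kept.add(token);
            }
        }
        // Trailing stop words stay in 'pending' and are dropped.
        return new Plan(kept, slop);
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("a", "the", "of"));
        Plan p = plan("propagating of errors", stop);
        System.out.println(p.terms + " slop=" + p.slop);
        // prints "[propagating, errors] slop=1"
    }
}
```

The kept terms and the slop would then be handed to whatever builds the PhraseQuery. Note that slop is an upper bound on how far the terms may move, so this is looser than "exactly one arbitrary word in the gap": the same query would also match the two terms sitting right next to each other.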

: body ftcontains "Mexico" not in "New Mexico"

SpanNotQuery

: title ftcontains ("web site" ftand "usability") ordered

SpanNearQuery

: abstract ftcontains "usability" ftand "web site" same sentence
:
: You can also do any combination of {same|different}
: {sentence|paragraph}. My guess for this would also be to keep track of
: sentence/paragraph data in a payload. Yes?

Sounds right.

: book ftcontains "Web Usability" without content $x//annotation

Depends on how you plan on indexing all of the context stuff ... if the
tags are Terms then a SpanNotQuery would work ... if they are Payloads
you just need some sort of SpanTermNotMatchingPayload query.

-Hoss