Hello, everyone! Could anyone please explain how to get the offsets of hits? I have a big text file and want to find a string in it. As the result of this operation, I need an array of offsets (in characters) from the beginning of the file, one for each occurrence of the string.
As an example, suppose the file is "The quick brown fox jumps over the lazy dog" and the search string is "quick brown". I expect the result of the search to be 4.

I spent a while trying to achieve this, but failed. I tried creating a document with a single field ("content") and using TermPositionVector to get term offsets. This works when the query consists of a single term: I just get all occurrences of that term in the "content" field, and that's it. But what about more complex queries? I think I could do it by iterating over the query terms, getting their offsets, then sorting them and linking particular occurrences of different terms together. But this looks like a lot of work for such a simple task, and I feel there should be a better way.

I understand that for some more complex queries it may not be clear how to define what an "offset" is. But I don't really need sophisticated queries; I just need simple substring search. Maybe Lucene is not supposed to be used that way. But I also need to manage a number of big files, search in multiple files at once, and produce results quickly, which are things Lucene does well (as far as I know).

Best regards, Dmitry.
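To make the expected result concrete, here is a minimal sketch in plain Java of the output described above. It does not use Lucene at all; the class name `SubstringOffsets` and the method `findOffsets` are hypothetical names chosen for illustration. It assumes the text fits in memory as a `String` and that "offset in characters" means a Java `char` index (a UTF-16 code unit), which may differ from a code point count for text outside the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: collects the character offsets
// of every occurrence of a substring within a text held in memory.
class SubstringOffsets {

    // Returns the zero-based char offsets of all occurrences of needle in text,
    // including overlapping ones. An empty needle yields an empty list.
    static List<Integer> findOffsets(String text, String needle) {
        List<Integer> offsets = new ArrayList<>();
        if (needle.isEmpty()) {
            return offsets; // avoid matching the empty string at every position
        }
        int from = 0;
        while (true) {
            int hit = text.indexOf(needle, from);
            if (hit < 0) {
                break; // no further occurrences
            }
            offsets.add(hit);
            from = hit + 1; // step by one so overlapping matches are found too
        }
        return offsets;
    }
}
```

Calling `SubstringOffsets.findOffsets("The quick brown fox jumps over the lazy dog", "quick brown")` returns a list containing the single offset 4. One possible design, offered only as an assumption, would be to let an index narrow the search to candidate files and then run a scan like this over each candidate to get exact offsets.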