Hi, I've been wondering about the best way to index a pre-analyzed field. By pre-analyzed, I mean essentially a field that I'd like to initialize with the constructor Field(String name, TokenStream tokenStream).
I loop over a set of documents, each with a pre-defined tokenization stored in the variable tokenizations. One by one, the Lucene documents are added to the writer, an IndexWriter object that has been initialized and configured beforehand. I have implemented a custom TokenStream class for this purpose, so I have approached the problem as follows:

    // One stream instance, reused for every document
    CustomTokenStream ts = new CustomTokenStream();
    for (tokenization : tokenizations) {
        idField = new Field("id", doc.getDocid(), Field.Store.YES, Field.Index.NOT_ANALYZED);
        // Hand the pre-defined tokens for this document to the stream
        ts.setTokenization(tokenization);
        textField = new Field("text", ts);
        luceneDocument.add(idField);
        luceneDocument.add(textField);
        try {
            writer.addDocument(luceneDocument);
        } catch (IOException e) {
            System.err.println("Error adding document:\n" + e.getLocalizedMessage());
        }
    }

The problem, apparently, is that I cannot query the text field, or can I?

I have also tried other approaches, such as initializing the text field with

    textField = new Field(String name, String value, Field.Store.YES, Field.Index.ANALYZED)

and then calling textField.setTokenStream(ts). However, this does not seem to make sense, since I do not want to use a built-in Lucene analyzer, and it is not clear to me what I should pass as the value in that approach.

Any help is very welcome! Thank you very much!

Best regards,
Carsten

--
Carsten Schnober
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP -- Korpusanalyseplattform der nächsten Generation
http://korap.ids-mannheim.de/ | Tel.: +49-(0)621-1581-238
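
Note on the question above: according to the Lucene 3.x Javadoc, a field created with Field(String name, TokenStream tokenStream) is tokenized and indexed but not stored, so it should be searchable by exact term even though its text cannot be retrieved. The following is only a plain-Java model of that behaviour, not Lucene code: ReplayTokenStream, ToyIndex and PreTokenizedDemo are hypothetical names invented for this sketch, and the token lists are made-up sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a reusable custom token stream: it replays a
// token list that was produced outside of any analyzer.
class ReplayTokenStream {
    private List<String> tokens = new ArrayList<>();
    private int position = 0;

    // Swap in the next document's tokens and rewind to the start.
    void setTokenization(List<String> tokenization) {
        this.tokens = tokenization;
        this.position = 0;
    }

    // Returns the next token, or null once the stream is exhausted.
    String next() {
        return position < tokens.size() ? tokens.get(position++) : null;
    }
}

// Toy inverted index: maps each token, exactly as supplied, to the ids of
// the documents containing it. The token text itself is not stored per document.
class ToyIndex {
    private final Map<String, List<String>> postings = new HashMap<>();

    void addDocument(String docId, ReplayTokenStream stream) {
        String token;
        while ((token = stream.next()) != null) {
            List<String> ids = postings.computeIfAbsent(token, k -> new ArrayList<>());
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
    }

    // Exact-term lookup; no analysis is applied to the query term.
    List<String> search(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }
}

class PreTokenizedDemo {
    public static void main(String[] args) {
        ReplayTokenStream ts = new ReplayTokenStream();
        ToyIndex index = new ToyIndex();

        ts.setTokenization(Arrays.asList("Der", "Hund", "bellt"));
        index.addDocument("doc1", ts);
        ts.setTokenization(Arrays.asList("Der", "Vogel", "singt"));
        index.addDocument("doc2", ts);

        System.out.println(index.search("Der"));   // [doc1, doc2]
        System.out.println(index.search("Hund"));  // [doc1]
        System.out.println(index.search("hund"));  // [] (no lowercasing was applied)
    }
}
```

In this model a query only matches if the term is identical to an indexed token. If the same holds in Lucene, a query built through a parser with its own analyzer could miss pre-tokenized terms, whereas a query constructed directly from the exact term would not depend on any analyzer.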