understanding the norm encode and decode

2015-03-03 Thread wangdong
I read the article about the scoring section in Lucene, which says: Encoding and decoding of the resulting float norm in a single byte are done by the static methods of the class Similarity:encodeNorm()
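
For readers following this thread, here is a minimal sketch of how a float norm can be squeezed into one byte and expanded again. It is modeled on a 3-bit mantissa, 5-bit exponent layout, which is what Lucene's SmallFloat helper used as far as I recall; the class name NormByteSketch and the exact constants are assumptions for illustration, not a copy of Similarity.encodeNorm().

```java
final class NormByteSketch {
    // Assumed layout: 3 mantissa bits, exponent zero point 15.
    private static final int MANTISSA_BITS = 3;
    private static final int ZERO_EXP = 15;
    private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS;

    // Keep only the top exponent and mantissa bits of the float.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - MANTISSA_BITS);
        if (small <= FZERO) {
            // Zero and negatives map to 0; tiny positives to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FZERO + 0x100) {
            return (byte) -1; // overflow: clamp to the largest code (0xFF)
        }
        return (byte) (small - FZERO);
    }

    // Rebuild a float from the byte; the dropped mantissa bits come back as zeros.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float f : new float[] {0.0f, 0.3f, 0.5f, 1.0f}) {
            byte b = encode(f);
            System.out.println(f + " -> " + b + " -> " + decode(b));
        }
    }
}
```

The round trip is lossy: under this layout 1.0 and 0.5 survive unchanged, while 0.3 comes back as 0.25, because only three mantissa bits are kept.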

Re: Part of speech search with lucene

2015-03-03 Thread Michael Sokolov
I believe you can accomplish what you are talking about using PhraseQuery, say: note that it has public void add(Term term, int position) which does enable searching for multiple terms at the same position and you should be able to encode different kinds of attributes using text tricks like I
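
To make the "multiple terms at the same position" idea concrete, here is a small sketch in plain Java with no Lucene dependency. It models a document as one set of tokens per position and a query as one set of required tokens per consecutive position. The class StackedPhraseSketch and the pos:/lemma: prefixes are hypothetical names for illustration; this mimics the matching idea only and is not how PhraseQuery is implemented.

```java
import java.util.List;
import java.util.Set;

final class StackedPhraseSketch {
    // Returns true if some run of consecutive document positions contains,
    // at each step, every token the query requires for that step.
    static boolean matches(List<Set<String>> doc, List<Set<String>> query) {
        for (int start = 0; start + query.size() <= doc.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < query.size() && ok; i++) {
                ok = doc.get(start + i).containsAll(query.get(i));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "the quick brown fox" with hypothetical attribute tokens stacked per position.
        List<Set<String>> doc = List.of(
            Set.of("the", "pos:article"),
            Set.of("quick", "pos:adj", "lemma:quick"),
            Set.of("brown", "pos:adj"),
            Set.of("fox", "pos:noun"));

        // An adjective whose lemma is "quick", followed by "brown".
        List<Set<String>> query = List.of(
            Set.of("pos:adj", "lemma:quick"),
            Set.of("brown"));

        System.out.println(matches(doc, query));
    }
}
```

In Lucene terms, each token in a query set would presumably become one add(Term, position) call sharing the same position value.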

Re: Part of speech search with lucene

2015-03-03 Thread David Villarejo
What you propose is good if you want to index only the pos of a token. But I want to index some extra info, such as the "lemma" of a token, phonetic encoding, etc. Sorry, I was not general enough in my previous post. Imagine you want to ask this: an adj whose lemma is "quick" followed by "brown" followed

Re: Part of speech search with lucene

2015-03-03 Thread Michael Sokolov
What if you indexed every word with two synonyms: the plain unadorned word and a token formed by concatenating the pos and the word with some unusual separator character? For example, "the quick brown fox" would be: { the | article:the } { quick | adj:quick } { brown | adj:brown } { fox | noun
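
The two-synonym scheme described above can be sketched without any Lucene classes. The helper below takes parallel arrays of words and part-of-speech tags and emits, for each position, the plain word plus the tag-prefixed token. The class name PosSynonymSketch, the method stack, and the choice of ':' as separator are assumptions for illustration; in a real analyzer the second token would be emitted at the same position as the first.

```java
import java.util.ArrayList;
import java.util.List;

final class PosSynonymSketch {
    // For each word, produce the pair { word | tag<sep>word } sharing one position.
    static List<List<String>> stack(String[] words, String[] tags, char sep) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<List<String>> positions = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            positions.add(List.of(words[i], tags[i] + sep + words[i]));
        }
        return positions;
    }

    public static void main(String[] args) {
        String[] words = {"the", "quick", "brown", "fox"};
        String[] tags = {"article", "adj", "adj", "noun"};
        for (List<String> position : stack(words, tags, ':')) {
            System.out.println("{ " + String.join(" | ", position) + " }");
        }
    }
}
```

A query for the bare word then hits the first token of each pair, while a query restricted by part of speech targets the prefixed one.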

Part of speech search with lucene

2015-03-03 Thread David Villarejo
After many Google searches I decided to post my problem here, hoping that someone can help me. What I want to achieve is to perform queries as follows (don't worry about the query format): q1: (adjective) "jumps" (preposition) // any adj followed by "jumps" followed by any prep. q2: (adjective:brown) "j