what retrieval method is more useful for short text retrieval?

2011-08-17 Thread Lei Pang
Hi everyone, I'm trying to retrieve videos through their metadata, such as title, description, tags and comments. Because these texts are always very short, I wonder whether BM25 and language models (LM) are useful for this. Or can anyone recommend another retrieval method? Thanks in advance. Best wishes
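For very short fields, the two BM25 knobs that matter most are k1 (term-frequency saturation) and b (length normalization). Below is a minimal, self-contained Okapi BM25 scorer over pre-tokenized documents, written as a sketch that assumes the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)). The class name and the toy video titles are hypothetical. A query-likelihood language model with Dirichlet smoothing would be the natural LM baseline to compare against.

```java
import java.util.Collections;
import java.util.List;

/** Minimal Okapi BM25 scorer over pre-tokenized documents (illustrative sketch). */
final class Bm25Sketch {
    private final List<List<String>> docs;
    private final double k1;
    private final double b;
    private final double avgLen;

    Bm25Sketch(List<List<String>> docs, double k1, double b) {
        this.docs = docs;
        this.k1 = k1;
        this.b = b;
        double total = 0;
        for (List<String> d : docs) total += d.size();
        this.avgLen = total / docs.size();
    }

    /** Number of documents containing the term at least once. */
    private int docFreq(String term) {
        int n = 0;
        for (List<String> d : docs) if (d.contains(term)) n++;
        return n;
    }

    /** Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)). */
    double idf(String term) {
        int n = docFreq(term);
        int bigN = docs.size();
        return Math.log(1.0 + (bigN - n + 0.5) / (n + 0.5));
    }

    /** BM25 score of document docId for the given query terms. */
    double score(List<String> query, int docId) {
        List<String> doc = docs.get(docId);
        double s = 0;
        for (String t : query) {
            int tf = Collections.frequency(doc, t);
            if (tf == 0) continue;
            // Length normalization: longer-than-average fields are penalized in proportion to b.
            double norm = k1 * (1 - b + b * doc.size() / avgLen);
            s += idf(t) * tf * (k1 + 1) / (tf + norm);
        }
        return s;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("funny", "cat", "video"),
            List.of("cat", "compilation"),
            List.of("cooking", "pasta", "tutorial", "video"));
        Bm25Sketch bm25 = new Bm25Sketch(docs, 1.2, 0.75);
        List<String> q = List.of("cat", "video");
        for (int i = 0; i < docs.size(); i++) {
            System.out.printf("doc %d: %.4f%n", i, bm25.score(q, i));
        }
    }
}
```

With titles of only a few words, term frequencies are almost always 1, so the ranking is driven mostly by IDF and length normalization; lowering b reduces the penalty on the occasional longer field.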

Strange change to query parser behaviour in recent versions

2011-08-17 Thread Trejkaz
Hi all. Suppose I am searching for 限定. In 3.0, QueryParser would parse this as a phrase query. In 3.3, it parses it as a boolean query, but offers an option to treat it as a phrase. Why would the default not be to do this? Surely you would always want it to become a phrase query. The new p
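The behaviour in question is what happens when the analyzer splits one whitespace-free chunk such as 限定 into several tokens. The toy sketch below is plain Java, not Lucene code: the hypothetical `analyze` emits every code point as its own token (roughly how an ideograph-unigram tokenizer behaves), and the hypothetical `buildQuery` renders either a phrase or independent clauses in a Lucene-like text syntax. As far as I know, the corresponding switch in Lucene 3.1+ is `QueryParser.setAutoGeneratePhraseQueries(true)`, and it defaults to false when the parser's matchVersion is 3.1 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the "auto-generate phrase queries" decision (not Lucene's implementation). */
final class PhraseOrBoolean {

    /** Hypothetical analyzer: every code point becomes its own token (like ideograph unigrams). */
    static List<String> analyze(String chunk) {
        List<String> tokens = new ArrayList<>();
        chunk.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** Renders the query for one whitespace-free chunk in a Lucene-like text syntax. */
    static String buildQuery(String field, String chunk, boolean autoGeneratePhrase) {
        List<String> tokens = analyze(chunk);
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        if (autoGeneratePhrase) {
            // All tokens must appear adjacent and in order.
            return field + ":\"" + String.join(" ", tokens) + "\"";
        }
        // Each token becomes an independent optional clause.
        List<String> clauses = new ArrayList<>();
        for (String t : tokens) {
            clauses.add(field + ":" + t);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("body", "限定", true));  // phrase form
        System.out.println(buildQuery("body", "限定", false)); // boolean form
    }
}
```

The boolean form matches any document containing either character anywhere, which is why it tends to hurt precision for CJK text.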

Re: Can I use Lucene to solve this problem?

2011-08-17 Thread Alexander Aristov
Hi, look at the Apache Mahout project (based on Hadoop). It seems you need machine learning algorithms. Best regards, Alexander Aristov. On 17 August 2011 20:39, Ian Lea wrote: > Certainly sounds doable in lucene. Is it basically working apart from > false positives? Can you give some example

Re: Regarding multiple index creation and Searching

2011-08-17 Thread Mihai Caraman
heard that ~80 million docs per index (varying with average document size). @Uwe Schindler: Is hashed distribution really necessary when using MultiReader? I did hear that Solr uses a continuous hashing algorithm with shards of indexes, but nothing about MultiReader mentions hashing.
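MultiReader simply presents several sub-indexes as one logical index, so it does not itself require any particular document distribution; a hash on a stable document ID mainly helps keep shards balanced and tells you which shard to update or delete a document in. The sketch below uses plain modulo hashing, which is an assumption for illustration and not Solr's algorithm; `ShardRouter` and the `video-N` IDs are hypothetical.

```java
import java.util.Arrays;

/** Simple modulo hash routing of document IDs to a fixed number of index shards (sketch). */
final class ShardRouter {
    private final int numShards;

    ShardRouter(int numShards) {
        if (numShards <= 0) throw new IllegalArgumentException("numShards must be positive");
        this.numShards = numShards;
    }

    /** Stable shard index for a document ID; floorMod keeps it non-negative for negative hashes. */
    int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        int[] counts = new int[4];
        for (int i = 0; i < 10_000; i++) {
            counts[router.shardFor("video-" + i)]++;
        }
        System.out.println(Arrays.toString(counts)); // documents per shard
    }
}
```

The known drawback of modulo routing is that changing the shard count remaps most documents; consistent hashing is designed to avoid exactly that.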

Re: Overriding default handling of '/' and '-'

2011-08-17 Thread Mihai Caraman
QueryParser is to blame, so avoid using it. As you said, just filtering is good enough. That's how I did it: when the query came in, it was split in two, the part that needed to be (full-text) analyzed and the part I filtered on as an exact match (I suppose that applies to you too). 2011/8/17
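The split-query approach can be sketched in plain Java without QueryParser: the free-text part goes through analysis (here a hypothetical lowercase-and-split function), while the code part is compared verbatim, as a filter would. The `Doc` record, its fields and the sample part codes are made up for illustration; records need Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy search: analyzed free-text part plus an exact-match filter on an unanalyzed field. */
final class SplitQuerySearch {
    record Doc(String title, String code) {}

    /** Hypothetical analysis: lowercase, then split on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static List<Doc> search(List<Doc> docs, String freeText, String exactCode) {
        List<String> terms = analyze(freeText);
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (!d.code().equals(exactCode)) continue;                // exact-match filter, no analysis
            if (analyze(d.title()).containsAll(terms)) hits.add(d);  // full-text part: all terms required
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("Steel bolt, zinc plated", "AB-12/3"),
            new Doc("Steel bolt, zinc plated", "AB-12/4"),
            new Doc("Brass nut", "AB-12/3"));
        System.out.println(search(docs, "steel BOLT", "AB-12/3"));
        System.out.println(analyze("AB-12/3")); // what analysis would do to the code
    }
}
```

Running the code through the analyzer shows why it must not be analyzed: "AB-12/3" falls apart into three unrelated tokens.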

Re: Can I use Lucene to solve this problem?

2011-08-17 Thread Federico Fissore
On 17/08/2011 05:03, Josh Rehman wrote: My organization is looking to solve a difficult problem, and I believe that Lucene is a close fit (although perhaps it is not). However I'm not sure exactly how to approach this problem. [...] Maybe using semantic vectors? [0] We've played around

Re: Overriding default handling of '/' and '-'

2011-08-17 Thread Ian Lea
What analyzer are you using? You could build your own, including a MappingCharFilter to replace / and - with something that doesn't cause splits. You could also get clever and insert the translated value in the token stream as well as the original, which might give you the best of both worlds. If the
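MappingCharFilter rewrites characters before the tokenizer sees them. The plain-Java sketch below emulates both ideas without Lucene: map '/' and '-' to '_' before tokenizing, and index the mapped tokens alongside the original ones. The '_' target and the toy tokenizer (which keeps underscores inside words) are assumptions; with a real analyzer you would pick a replacement your tokenizer does not split on, and apply the same chain at query time.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Plain-Java emulation: char mapping before tokenizing, plus indexing both forms. */
final class SlashDashMapping {
    // Hypothetical mapping: '/' and '-' become '_', which the toy tokenizer treats as a word character.
    static final Map<Character, Character> CHAR_MAP = Map.of('/', '_', '-', '_');

    /** Applies the character mapping, leaving all other characters unchanged. */
    static String mapChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            sb.append(CHAR_MAP.getOrDefault(c, c));
        }
        return sb.toString();
    }

    /** Toy tokenizer: lowercases and splits on anything that is not a letter, digit or underscore. */
    static Set<String> tokenize(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}_]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Tokens from the mapped text plus tokens from the original text ("best of both worlds"). */
    static Set<String> indexTokens(String text) {
        Set<String> out = new LinkedHashSet<>(tokenize(mapChars(text)));
        out.addAll(tokenize(text));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("TCP/IP x-ray"));    // split at '/' and '-'
        System.out.println(indexTokens("TCP/IP x-ray")); // joined forms plus the split forms
    }
}
```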

Re: Can I use Lucene to solve this problem?

2011-08-17 Thread Ian Lea
Certainly sounds doable in Lucene. Is it basically working apart from false positives? Can you give some examples of the false positives? I'd be tempted to look at span queries which will let you say that "Yesterday I put on my green plaid shirt" is a better match against "Green plaid shirt with
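Span queries work on term positions. As a rough, hypothetical illustration of an ordered "near" constraint (in the spirit of SpanNearQuery with in-order matching, but not its exact slop semantics or scoring), the sketch below checks whether the query terms occur in order with at most `slop` extra positions between the first and last term. The class, the tokenizer and the second sample sentence are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java ordered proximity check, loosely modelled on an in-order span-near constraint. */
final class OrderedProximity {

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /**
     * True if the (non-empty) terms occur in order in doc with at most `slop` positions
     * between them in total, beyond the minimum span of terms.size() positions.
     */
    static boolean nearInOrder(List<String> doc, List<String> terms, int slop) {
        for (int start = 0; start < doc.size(); start++) {
            if (!doc.get(start).equals(terms.get(0))) continue;
            int pos = start;
            boolean found = true;
            for (int k = 1; k < terms.size(); k++) {
                // Greedily take the earliest next occurrence; this minimizes the span for this start.
                int next = doc.subList(pos + 1, doc.size()).indexOf(terms.get(k));
                if (next < 0) { found = false; break; }
                pos = pos + 1 + next;
            }
            int gaps = pos - start - (terms.size() - 1);
            if (found && gaps <= slop) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> q = List.of("green", "plaid", "shirt");
        System.out.println(nearInOrder(tokens("Yesterday I put on my green plaid shirt"), q, 0));
        System.out.println(nearInOrder(tokens("A green scarf and a plaid shirt"), q, 0));
        System.out.println(nearInOrder(tokens("A green scarf and a plaid shirt"), q, 3));
    }
}
```

A plain term query matches both sentences equally; the positional constraint is what separates the exact "green plaid shirt" from "green scarf ... plaid shirt".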

Re: [SPATIAL] Spatial search runs forever

2011-08-17 Thread drazen.nis
In the end I found the problem: it is the use of non-thread-safe Map implementations in DistanceFilter. If you execute the searches using the same DistanceFilter instance from a single thread, everything works as expected. But executing them with multiple threads in parallel, th
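One way to make such a per-filter distance cache safe to share is to back it with ConcurrentHashMap and fill it through computeIfAbsent. The class below is a hypothetical stand-in, not Lucene's DistanceFilter; the haversine formula, the 6371 km Earth radius and the integer document-ID keys are assumptions made for the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Distance cache that is safe to share across searcher threads (hypothetical helper). */
final class SharedDistanceCache {
    private static final double EARTH_RADIUS_KM = 6371.0;
    private final double originLat;
    private final double originLon;
    // ConcurrentHashMap instead of HashMap: concurrent reads and inserts cannot corrupt the map.
    private final Map<Integer, Double> distances = new ConcurrentHashMap<>();

    SharedDistanceCache(double originLat, double originLon) {
        this.originLat = originLat;
        this.originLon = originLon;
    }

    /** Cached great-circle distance from the origin to a document's location. */
    double distanceKm(int docId, double lat, double lon) {
        return distances.computeIfAbsent(docId, id -> haversineKm(originLat, originLon, lat, lon));
    }

    int size() {
        return distances.size();
    }

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Fills the cache for docs 0..numDocs-1 from several threads at once; returns the cache size. */
    static int fillConcurrently(SharedDistanceCache cache, int threads, int numDocs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int doc = 0; doc < numDocs; doc++) {
                    // Every thread asks for the same documents, so they race on the same keys.
                    cache.distanceKm(doc, 45.0 + doc * 0.001, 16.0);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(new SharedDistanceCache(45.0, 16.0), 8, 1000)); // 1000
    }
}
```

The other fix implied by the report is simply not to share one DistanceFilter instance between concurrent searches.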

RE: Strange behavior of the StandardAnalyzer

2011-08-17 Thread Uwe Schindler
Hi, do you use the same Analyzer for both searching and indexing? These types of issues only happen if you use different analyzers. This type of query should always work with StandardAnalyzer. Which Lucene version and which analysis configuration do you have (including matchVersion parameters)? U
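The underlying rule is that the terms produced at index time and at query time must be identical for a match. The toy sketch below makes that concrete in plain Java: both "analyzers" and the sample name are hypothetical stand-ins (neither is StandardAnalyzer). It indexes a name with a lowercasing tokenizer, then queries it once with the same analysis and once with a keyword-style analysis that keeps the whole string as a single case-sensitive token.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

/** Toy demonstration that index-time and query-time analysis must agree. */
final class AnalyzerMismatch {
    // Hypothetical "standard-like" analysis: lowercase, split on anything not a letter or digit.
    static final Function<String, List<String>> LOWERCASING = text -> {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    };

    // Hypothetical "keyword-like" analysis: the whole input is one token, case preserved.
    static final Function<String, List<String>> KEYWORD = text -> List.of(text);

    /** A document matches if every query token is among the document's indexed tokens. */
    static boolean matches(String docText, Function<String, List<String>> indexAnalyzer,
                           String queryText, Function<String, List<String>> queryAnalyzer) {
        Set<String> indexed = new HashSet<>(indexAnalyzer.apply(docText));
        return indexed.containsAll(queryAnalyzer.apply(queryText));
    }

    public static void main(String[] args) {
        String name = "Jean-Pierre Dupont";
        System.out.println(matches(name, LOWERCASING, "jean-pierre dupont", LOWERCASING)); // same analysis
        System.out.println(matches(name, LOWERCASING, "Jean-Pierre Dupont", KEYWORD));     // mismatched
    }
}
```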

Strange behavior of the StandardAnalyzer

2011-08-17 Thread Alain Sahli
Hello, I set a field which contains the name of a person to Field.Index.ANALYZED. I use the StandardAnalyzer for the searching part and in general it works very well. But I found one strange case which I have to change to meet the customer's expectations. If I search for an exact name which is