Re: Grouping on multiple shards possible in lucene?

2012-11-20 Thread Ravikumar Govindarajan
Hi Shai, I only want to sort based on doc additions. Ex: d1,d2,d3. Then true sort order means d3,d2,d1. A doc-timestamp-based solution is much more involved, like you said. It's nice to know that you are already working on it and there will be a solution in the near future. In the meantime, I

Re: Performance of IndexSearcher.explain(Query)

2012-11-20 Thread Trejkaz
On Wed, Nov 21, 2012 at 10:40 AM, Robert Muir wrote: > Explain is not performant... but the comment is fair I think? It's more of a > worst-case, depends on the query. > Explain is going to rewrite the query/create the weight and so on just to > advance() the scorer to that single doc > So if this

Re: Using Lucene 2.3 indices with Lucene 4.0

2012-11-20 Thread Trejkaz
On Wed, Nov 21, 2012 at 12:33 AM, Ramprakash Ramamoorthy wrote: > On Tue, Nov 20, 2012 at 5:42 PM, Danil ŢORIN wrote: > >> Ironically most of the changes are in unicode handling and standard >> analyzer ;) >> > > Ouch! It hurts then ;) What we did going from 2 -> 3 (and in some cases where passi

Re: Performance of IndexSearcher.explain(Query)

2012-11-20 Thread Robert Muir
On Tue, Nov 20, 2012 at 6:18 PM, Trejkaz wrote: > I have a feature I wanted to implement which required a quick way to > check whether an individual document matched a query or not. > > IndexSearcher.explain seemed to be a good fit for this. > > The query I tested was just a BooleanQuery with two

Performance of IndexSearcher.explain(Query)

2012-11-20 Thread Trejkaz
I have a feature I wanted to implement which required a quick way to check whether an individual document matched a query or not. IndexSearcher.explain seemed to be a good fit for this. The query I tested was just a BooleanQuery with two TermQuery inside it, both with MUST. I ran an empty query t

Re: Line feed on windows

2012-11-20 Thread Jack Krupansky
This doesn't sound like a Lucene issue. It's up to you to read a file and pass it as a string to Lucene. Maybe you're trying to read the file one line at a time, in which case it is up to you to supply line delimiters when combining the lines into a single string. Try reading the full file into
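
The two options mentioned in this reply can be sketched in plain Java, with no Lucene involved. This is only an illustration: the class and method names (ReadWholeFile, readAll, joinLines) are hypothetical, and UTF-8 is an assumed encoding that the original message does not state.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of Lucene.
final class ReadWholeFile {

    // Option 1: read the whole file in one call. The bytes are decoded as-is,
    // so "\r\n" or "\n" line endings in the file survive unchanged.
    // Assumption: the file is UTF-8 encoded.
    static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Option 2: if the file was read line by line, the delimiters were stripped,
    // so the caller has to put one back after every line when joining.
    static String joinLines(Iterable<String> lines, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append(delimiter);
        }
        return sb.toString();
    }
}
```

Either result is an ordinary String that can then be handed to whatever field or analyzer the application uses.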

Re: Grouping on multiple shards possible in lucene?

2012-11-20 Thread Shai Erera
Hi Ravi, I've been dealing with reverse indexing lately, so let me share with you a bit of my experience thus far. First, you need to define what reverse indexing means for you. If it means that docs that were indexed in the following order: d1, d2, d3 should be traversed during search in tha

Re: Grouping on multiple shards possible in lucene?

2012-11-20 Thread Ravikumar Govindarajan
But, I think it should be possible with some fun codec & merge policy & MultiReader magic, to have docIDs assigned in "reverse chronological order" Can you explain it a bit more? I was thinking perhaps we store absolute doc-ids instead of delta to do reverse traversal. But this could waste a lot o

Re: Using Lucene 2.3 indices with Lucene 4.0

2012-11-20 Thread Ramprakash Ramamoorthy
On Tue, Nov 20, 2012 at 5:42 PM, Danil ŢORIN wrote: > Ironically most of the changes are in unicode handling and standard > analyzer ;) > Ouch! It hurts then ;) > > On Tue, Nov 20, 2012 at 12:31 PM, Ramprakash Ramamoorthy < > youngestachie...@gmail.com> wrote: > > > On Tue, Nov 20, 2012 at 3:54

Re: Grouping on multiple shards possible in lucene?

2012-11-20 Thread Michael Sokolov
On 11/20/2012 6:49 AM, Michael McCandless wrote: On Tue, Nov 20, 2012 at 1:49 AM, Ravikumar Govindarajan wrote: Also, for a TopN query sorted by doc-id will the query terminate early? Actually, it won't! But it really should ... you could make a Collector that throws an exception once the N
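
The idea quoted here, a collector that throws an exception once enough hits have been collected, can be sketched without the Lucene API. Everything below is hypothetical: EarlyTermination, EnoughHitsException and FirstNCollector are illustration names, and the int array stands in for the doc ids a real search would deliver in increasing order. It is not Lucene's Collector interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "stop collecting by throwing"; not the Lucene Collector API.
final class EarlyTermination {

    // Control-flow signal only, so suppression and stack trace are disabled to keep it cheap.
    static final class EnoughHitsException extends RuntimeException {
        EnoughHitsException() {
            super(null, null, false, false);
        }
    }

    // Keeps the first `limit` doc ids it sees, then aborts the traversal.
    static final class FirstNCollector {
        private final int limit;
        private final List<Integer> docIds = new ArrayList<>();

        FirstNCollector(int limit) {
            this.limit = limit;
        }

        void collect(int docId) {
            docIds.add(docId);
            if (docIds.size() >= limit) {
                throw new EnoughHitsException();
            }
        }

        List<Integer> docIds() {
            return docIds;
        }
    }

    // Stand-in for a search loop: feeds matching doc ids to the collector in order
    // and treats the exception as a normal "done" signal.
    static List<Integer> firstN(int[] matchingDocIds, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        FirstNCollector collector = new FirstNCollector(n);
        try {
            for (int docId : matchingDocIds) {
                collector.collect(docId);
            }
        } catch (EnoughHitsException expected) {
            // Early exit: the collector already holds the first n hits.
        }
        return collector.docIds();
    }
}
```

The caller that starts the traversal is the one that catches the exception; the hits gathered before the throw remain available on the collector.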

Re: Using Lucene 2.3 indices with Lucene 4.0

2012-11-20 Thread Danil ŢORIN
Ironically most of the changes are in unicode handling and standard analyzer ;) On Tue, Nov 20, 2012 at 12:31 PM, Ramprakash Ramamoorthy < youngestachie...@gmail.com> wrote: > On Tue, Nov 20, 2012 at 3:54 PM, Danil ŢORIN wrote: > > > However behavior of some analyzers changed. > > > > So even af

Re: Grouping on multiple shards possible in lucene?

2012-11-20 Thread Michael McCandless
On Tue, Nov 20, 2012 at 1:49 AM, Ravikumar Govindarajan wrote: > Thanks Mike. Actually, I think I can eliminate sort-by-time, if I am able > to iterate postings in reverse doc-id order. Is this possible in lucene? Alas that is not easy to do in Lucene: the posting lists are encoded in forward doc

Re: TokenStreamComponents in Lucene 4.0

2012-11-20 Thread Robert Muir
On Tue, Nov 20, 2012 at 6:26 AM, Carsten Schnober wrote: > > Thanks, Uwe! > I think what changed in comparison to Lucene 3.6 is that reset() is > called upon initialization, too, instead of after processing the first > document only, right? There is no such change: this step was always mandator

Re: TokenStreamComponents in Lucene 4.0

2012-11-20 Thread Carsten Schnober
Am 20.11.2012 10:22, schrieb Uwe Schindler: Hi, > The createComponents() method of Analyzers is only called *once* for each > thread and the Tokenstream is *reused* for later documents. The Analyzer will > call the final method Tokenizer#setReader() to notify the Tokenizer of a new > Reader (t

Re: Using Lucene 2.3 indices with Lucene 4.0

2012-11-20 Thread Ramprakash Ramamoorthy
On Tue, Nov 20, 2012 at 3:54 PM, Danil ŢORIN wrote: > However behavior of some analyzers changed. > > So even after upgrade the old index is readable with 4.0, it doesn't mean > everything still works as before. > Thank you Torin, I am using the standard analyzer only and both the systems use Un

Re: Using Lucene 2.3 indices with Lucene 4.0

2012-11-20 Thread Ian Lea
Sure - read all the release notes, migration guides, everything, test and test again. -- Ian. On Tue, Nov 20, 2012 at 10:24 AM, Danil ŢORIN wrote: > However behavior of some analyzers changed. > > So even after upgrade the old index is readable with 4.0, it doesn't mean > everything still wo

Re: Using Lucene 2.3 indices with Lucene 4.0

2012-11-20 Thread Danil ŢORIN
However behavior of some analyzers changed. So even after upgrade the old index is readable with 4.0, it doesn't mean everything still works as before. On Tue, Nov 20, 2012 at 12:20 PM, Ian Lea wrote: > You can upgrade the indexes with org.apache.lucene.index.IndexUpgrader. > You'll need to do

Re: Using Lucene 2.3 indices with Lucene 4.0

2012-11-20 Thread Ian Lea
You can upgrade the indexes with org.apache.lucene.index.IndexUpgrader. You'll need to do it in steps, from 2.x to 3.x to 4.x, but should work fine as far as I know. -- Ian. On Tue, Nov 20, 2012 at 10:16 AM, Ramprakash Ramamoorthy < youngestachie...@gmail.com> wrote: > I understand lucene 2.

RE: TokenStreamComponents in Lucene 4.0

2012-11-20 Thread Uwe Schindler
Hi, all the components of your TokenStream in Lucene 4.0 are *required* to be reusable, see the documentation: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/Analyzer.html All your components must implement reset() according to the TokenStream contract: http://lucene.apach

Re: TokenStreamComponents in Lucene 4.0

2012-11-20 Thread Carsten Schnober
Am 19.11.2012 17:44, schrieb Carsten Schnober: Hi, > However, after switching to Lucene 4 and TokenStreamComponents, I'm > getting a strange behaviour: only the first document in the collection > is tokenized properly. The others do appear in the index, but > un-tokenized, although I have tried n

Re: ANN: UweSays Query Operator

2012-11-20 Thread Tommaso Teofili
that's nice! Tommaso 2012/11/19 Uwe Schindler > Lol! > > Many thanks for this support! > > Uwes > > > > Otis Gospodnetic schrieb: > > >Hi, > > > >Quick announcement for Uwe & Friends. > > > >UweSays is now a super-duper-special query operator over on > >http://search-lucene.com/ . Now whenev