Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
As I mentioned in my original post, I am calling it as shown below: MyCollector collector; TopScoreDocCollector topScore = TopScoreDocCollector.create(firstIndex+numHits, true); IndexSearcher searcher = new IndexSearcher(reader); try {

Re: Extending StandardTokenizer Jflex to not split on '/'

2014-02-17 Thread Steve Rowe
Sorry, Diego, the generated scanner diff doesn't tell me anything. Since I was able to successfully make changes to the open source and get the desired behavior, I'm guessing you're: a) not using the same (versions of) tools as me; b) not using the same (version of the) source as me; or c) not tes

Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
Could you please elaborate on the above? I am not sure whether the collector is already doing it or whether I need to call some other API. Thanks, Sai.

Re: Collector is collecting more than the specified hits

2014-02-17 Thread Michael McCandless
This is exactly what searchAfter is for ("deep paging"). Mike McCandless http://blog.mikemccandless.com On Mon, Feb 17, 2014 at 3:12 PM, saisantoshi wrote: > The collector is collecting all the documents. Let's say I have 50k documents > and I want the collector to give me the results taking t
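
The idea behind searchAfter-style deep paging can be sketched in plain Java: instead of re-collecting everything up to the requested offset, each request passes in the last hit of the previous page and keeps only hits that rank after it. This is an illustration under stated assumptions, not Lucene code. Hit, BEST_FIRST and nextPage are hypothetical names, and the tie-break on document id is an assumption of the sketch. In Lucene itself the corresponding call is IndexSearcher.searchAfter(ScoreDoc after, Query query, int n).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java illustration only; none of these names are Lucene classes.
class SearchAfterSketch {

    /** A scored document, standing in for a search hit (hypothetical type). */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Ranking order: higher score first, ties broken by lower doc id.
    static final Comparator<Hit> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    /** Returns up to n best hits ranking strictly after 'after' (null means first page). */
    static List<Hit> nextPage(Iterable<Hit> allHits, Hit after, int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0");
        // Worst-first heap: the head is the weakest hit currently kept.
        PriorityQueue<Hit> queue = new PriorityQueue<>(BEST_FIRST.reversed());
        for (Hit h : allHits) {
            // Skip anything already returned on an earlier page.
            if (after != null && BEST_FIRST.compare(h, after) <= 0) continue;
            queue.add(h);
            if (queue.size() > n) queue.poll(); // never hold more than n hits
        }
        List<Hit> page = new ArrayList<>(queue);
        page.sort(BEST_FIRST);
        return page;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < 10; i++) hits.add(new Hit(i, i)); // score equals doc id
        Hit last = null;
        List<Hit> page;
        while (!(page = nextPage(hits, last, 3)).isEmpty()) {
            StringBuilder sb = new StringBuilder();
            for (Hit h : page) sb.append(h.doc).append(' ');
            System.out.println(sb.toString().trim());
            last = page.get(page.size() - 1); // cursor for the next request
        }
    }
}
```

The memory held per request stays at n hits regardless of how deep the page is, which is the main difference from collecting start + maxHits hits and discarding the first start.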

Re: Extending StandardTokenizer Jflex to not split on '/'

2014-02-17 Thread Diego Fernandez
Hey Steve, thanks for the quick reply. I didn't have a chance to test again until today. In our Lucene build, we had already made some customization to the JFlex file and it re-generates the java file whenever we build our project. Unfortunately, it is still not working for me. I diffed the

Re: Reverse Matching

2014-02-17 Thread Alan Woodward
Hi Siraj, At the moment luwak is based on a fork of lucene (https://github.com/flaxsearch/lucene-solr-intervals, itself based on work done in LUCENE-2878), which we use to report exact match positions. I'm hoping to get it working with the main lucene classes soon, though. Alan Woodward www.f

RE: Reverse Matching

2014-02-17 Thread Siraj Haider
Thanks for your great advice Ahmet. Do you know if I could use luwak libraries in my Lucene project directly? Or do I have to use Solr? Currently, we use core lucene libraries in our system and have built our own framework around it. regards -Siraj -Original Message- From: Ahmet Arslan

Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
The collector is collecting all the documents. Let's say I have 50k documents and I want the collector to give me the results taking the start and maxHits. Can we get this functionality from Lucene? For example, the very first time, I want to collect from 0 -100 & the next time I want to collect from 1
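
The start/maxHits behaviour asked about here can be sketched in plain Java: keep only the best start + maxHits hits while scanning, then return the slice beginning at start. This is an illustration of the arithmetic only, not Lucene code. Hit, BEST_FIRST and page are hypothetical names, and the tie-break on document id is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java illustration only; none of these names are Lucene classes.
class OffsetPagingSketch {

    /** A scored document, standing in for a search hit (hypothetical type). */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Ranking order: higher score first, ties broken by lower doc id.
    static final Comparator<Hit> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    /** Returns hits ranked start .. start + maxHits - 1, best first. */
    static List<Hit> page(Iterable<Hit> allHits, int start, int maxHits) {
        if (start < 0 || maxHits <= 0) {
            throw new IllegalArgumentException("start must be >= 0 and maxHits > 0");
        }
        int keep = start + maxHits;
        // Worst-first heap: the head is the weakest hit currently kept.
        PriorityQueue<Hit> queue = new PriorityQueue<>(BEST_FIRST.reversed());
        for (Hit h : allHits) {
            queue.add(h);
            if (queue.size() > keep) queue.poll(); // drop the weakest hit
        }
        List<Hit> ranked = new ArrayList<>(queue);
        ranked.sort(BEST_FIRST);
        if (start >= ranked.size()) return new ArrayList<>(); // page is past the end
        return new ArrayList<>(ranked.subList(start, Math.min(keep, ranked.size())));
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < 10; i++) hits.add(new Hit(i, i)); // score equals doc id
        for (Hit h : page(hits, 2, 3)) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

Every hit is still visited during the scan, but only start + maxHits of them are retained, so the cost of a page grows with how deep into the results it sits.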

Re: Actual min and max-value of NumericField during codec flush

2014-02-17 Thread Michael McCandless
On Mon, Feb 17, 2014 at 8:33 AM, Ravikumar Govindarajan wrote: >> >> Well, this will change your scores? MultiReader will sum up all term >> statistics across all SegmentReaders "up front", and then scoring per >> segment will use those top-level weights. > > > Our app needs to do only matching a

Re: Lucene doubt

2014-02-17 Thread Adrien Grand
Hi Pedro, Lucene indeed supports indexing data from several threads into a single IndexWriter instance, and it will make use of all your I/O and CPU. You can learn more about how it works at http://blog.trifork.com/2011/05/03/lucene-indexing-gains-concurrency/ On Mon, Feb 17, 2014 at 3:54 PM, Ped

Re: Lucene doubt

2014-02-17 Thread Michael McCandless
In general, both indexing and searching are highly concurrent in Lucene. Mike McCandless http://blog.mikemccandless.com On Mon, Feb 17, 2014 at 9:54 AM, Pedro Cardoso wrote: > Good afternoon, > > I am using Lucene in developing a project, however I was faced with a > doubt. > > I wonder if a

Lucene doubt

2014-02-17 Thread Pedro Cardoso
Good afternoon, I am using Lucene in developing a project, however I was faced with a doubt. I wonder if, in a multi-threaded system, it is possible to write concurrently? Cumprimentos/ Best Regards *Pedro Cardoso* http://www.linkedin.com/pub/pedro-cardoso/54/243/60

Re: codec mismatch

2014-02-17 Thread Jack Krupansky
Are you using or aware of Solandra? See: https://github.com/tjake/Solandra Solandra has been superseded by a commercial product, DataStax Enterprise, that combines Solr/Lucene and Cassandra. Solr/Lucene indexing of Cassandra data is supported, but the actual Lucene indexes are stored in the nat

Re: Actual min and max-value of NumericField during codec flush

2014-02-17 Thread Ravikumar Govindarajan
> > Well, this will change your scores? MultiReader will sum up all term > statistics across all SegmentReaders "up front", and then scoring per > segment will use those top-level weights. Our app needs to do only matching and sorting. In fact, it would be fully OK to bypass scoring. But I feel

Re: codec mismatch

2014-02-17 Thread Michael McCandless
That NPE is happening inside Cassandra's sources; I think you need to trace what's happening there and how its FileBlock can be null. It looks like it's a bug in how CassandraDirectory handles compound files (e.g. _0.cfs), which are somewhat tricky because it's a file that acts itself like a Direc

Re: codec mismatch

2014-02-17 Thread Jason Wee
Hi Mike, Thank you. The exception makes it pretty clear that Lucene encountered an NPE while executing readInternal(...) on _0.cfs. The root cause is that the object being read, FileBlock, is null. As far as I can tell, it happens only when reading _0.cfs but not on the index files that were re