Re: Serializing RAMDirectory in 4.6.0

2014-01-18 Thread Konstantyn Smirnov
Yeah, already done that (after some experimenting):

    static void serializeRAMDirectory( RAMDirectory dir, output ){
        if( null == dir ) return
        output?.withObjectOutputStream{ out ->
            out.writeLong dir.sizeInBytes()
            out.writeInt dir.fileMap.size()
            dir.fileMap.each{ String na
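
For reference, here is the same idea as a self-contained Java sketch (my reconstruction, not the poster's code; it walks the directory through the public listAll()/openInput() API instead of the internal fileMap, and the class/method names are invented):

    import java.io.DataOutputStream;
    import java.io.IOException;
    import org.apache.lucene.store.IOContext;
    import org.apache.lucene.store.IndexInput;
    import org.apache.lucene.store.RAMDirectory;

    class RAMDirectorySerializer {
        // Writes a file count, then (name, length, bytes) for every file.
        static void serialize(RAMDirectory dir, DataOutputStream out) throws IOException {
            String[] files = dir.listAll();
            out.writeInt(files.length);
            for (String name : files) {
                long len = dir.fileLength(name);
                out.writeUTF(name);
                out.writeLong(len);
                byte[] buf = new byte[(int) len];
                IndexInput in = dir.openInput(name, IOContext.DEFAULT);
                try {
                    in.readBytes(buf, 0, buf.length); // pull the whole file into the buffer
                } finally {
                    in.close();
                }
                out.write(buf);
            }
        }
    }

Deserializing would be the mirror image via dir.createOutput( name, IOContext.DEFAULT ).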

Serializing RAMDirectory in 4.6.0

2014-01-17 Thread Konstantyn Smirnov
Hi all, in Lucene 3.x the RAMDirectory was Serializable. In 4.x it no longer is... what's the best / most performant / easiest way to serialize the RAMDir in 4.6.0? TIA

Re: RAMDirectory and expungeDeletes()/optimize()

2013-05-21 Thread Konstantyn Smirnov
I want to refresh the topic a bit. Using Lucene 4.3.0, I couldn't find a method like expungeDeletes() in the IW anymore. I rely on Lucene's MergePolicies to do the optimization, but I need to keep the metadata up to date, docFreqs and termFreqs to name a few. The only way to accomplish that w
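
For anyone hitting the same wall: expungeDeletes() was renamed in 4.x, and forceMergeDeletes() is the direct replacement. A minimal sketch (the writer setup is assumed):

    import java.io.IOException;
    import org.apache.lucene.index.IndexWriter;

    class SlimIndex {
        static void dropDeletes(IndexWriter writer) throws IOException {
            writer.forceMergeDeletes(); // merges segments so deleted docs disappear
            writer.commit();            // makes the slimmed segments visible
        }
    }

Like the old expungeDeletes(), it can be very I/O-expensive, so it shouldn't run on every update.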

Re: Lucene 4.1: IntField cannot be found by a NumericRangeFilter/NumericRangeQuery

2013-03-04 Thread Konstantyn Smirnov
Ah yes, my bad! I indeed used my own fieldTypes for my numeric fields.

RE: Lucene 4.1: IntField cannot be found by a NumericRangeFilter/NumericRangeQuery

2013-03-04 Thread Konstantyn Smirnov
changing the FieldType to indexed=true did the trick, thanks. Shouldn't it be enabled by default? If I invert a field using one of the numeric classes, I'd expect it to be indexed. Otherwise I would use a StringField or StoredField...
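
A sketch of the fix being described (my reconstruction; the field name comes from the original post below): a custom FieldType for an IntField needs both the indexed flag and the numeric type, otherwise NumericRangeQuery can't see the field.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.document.IntField;
    import org.apache.lucene.search.NumericRangeQuery;

    class NumericFieldFix {
        static Document freeSeatsDoc() {
            FieldType ft = new FieldType();
            ft.setIndexed(true);                          // the missing bit
            ft.setStored(true);
            ft.setNumericType(FieldType.NumericType.INT); // trie-encode as INT
            ft.freeze();

            Document doc = new Document();
            doc.add(new IntField("freeSeats", 5, ft));
            return doc;
        }

        // this range query can now match the document:
        static NumericRangeQuery<Integer> query() {
            return NumericRangeQuery.newIntRange("freeSeats", 1, 10, true, true);
        }
    }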

Lucene 4.1: IntField cannot be found by a NumericRangeFilter/NumericRangeQuery

2013-03-04 Thread Konstantyn Smirnov
Hi guys, on my path of migrating from 3.6.x to 4.1, I'm facing the following problem: I create a document with an IntField in it: doc.add new IntField( 'freeSeats', 5, Store.YES ) After adding it to the doc and writing to the index, the field looks like this (copied from the Eclipse debugger): [20]Int

RE: Confusion with Analyzer.tokenStream() re-use in 4.1

2013-02-27 Thread Konstantyn Smirnov
Thanks for the answer, Uwe! So the behavior has changed since 3.6, hasn't it? Now I need to instantiate the analyzer each time I feed the field with the tokenStream, or it happens behind the scenes if I use new (String name, String value, Field.Store store). Another question then... Now I tr

Confusion with Analyzer.tokenStream() re-use in 4.1

2013-02-27 Thread Konstantyn Smirnov
Dear all, I'm using the following test code:

    Document doc = new Document()
    Analyzer a = new SimpleAnalyzer( Version.LUCENE_41 )
    TokenStream inputTS = a.tokenStream( 'name1', new StringReader( 'aaa bbb ccc' ) )
    Field f = new TextField( 'name1', inputTS )
    doc.add f
    TokenStream ts = doc.getField(
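
For context, the 4.x contract for consuming a TokenStream is stricter than in 3.x: reset() must be called before the first incrementToken(), then end() and close(). A minimal sketch of the pattern (analyzer and field name taken from the snippet above):

    import java.io.IOException;
    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.SimpleAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    class TokenStreamDemo {
        static void dumpTokens() throws IOException {
            Analyzer a = new SimpleAnalyzer(Version.LUCENE_41);
            TokenStream ts = a.tokenStream("name1", new StringReader("aaa bbb ccc"));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // mandatory before incrementToken()
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();                        // finalize offsets
            ts.close();
        }
    }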

Re: Lucene vs SQL.

2012-08-01 Thread Konstantyn Smirnov
If you tokenize AND store fields in your document, you can always pull them and re-invert using another analyzer, so you don't need to store the "original data" somewhere else. The point is rather the performance. I started a discussion on that topic http://lucene.472066.n3.nabble.com/Performance
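
A sketch of that re-inversion idea under stated assumptions (the field name 'body', the analyzer choice, and the 3.x-era API are all mine; it only works because the field was stored):

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.Version;

    class Reinvert {
        static void withNewAnalyzer(IndexReader reader, Directory newDir) throws IOException {
            IndexWriter writer = new IndexWriter(newDir, new IndexWriterConfig(
                    Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (reader.isDeleted(i)) continue;     // skip deleted docs
                Document stored = reader.document(i);  // stored fields only
                Document fresh = new Document();
                fresh.add(new Field("body", stored.get("body"), // re-analyzed on add
                        Field.Store.YES, Field.Index.ANALYZED));
                writer.addDocument(fresh);
            }
            writer.close();
        }
    }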

Re: is there a way to control when merges happen?

2012-08-01 Thread Konstantyn Smirnov
Hi Mike. I have a LogDocMergePolicy + ConcurrentMergeScheduler in my setup. I tried adding new segments with 800-5000 documents in each of them in a row, but the scheduler seemed to ignore them at first... only after some time it managed to merge some of them. I have an option to use a quartz-sch
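
One way to make the timing deterministic (my sketch, not something from the thread): replace ConcurrentMergeScheduler with SerialMergeScheduler, so merges run synchronously in whichever thread triggers them, e.g. the quartz job, rather than whenever the background scheduler gets around to it.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.LogDocMergePolicy;
    import org.apache.lucene.index.SerialMergeScheduler;
    import org.apache.lucene.util.Version;

    class ControlledMerging {
        static IndexWriterConfig config() {
            LogDocMergePolicy mp = new LogDocMergePolicy();
            mp.setMergeFactor(10); // segments per level before a merge is due
            return new IndexWriterConfig(Version.LUCENE_36,
                        new StandardAnalyzer(Version.LUCENE_36))
                    .setMergePolicy(mp)
                    .setMergeScheduler(new SerialMergeScheduler()); // merges run in-line
        }
    }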

Re: RAMDirectory and expungeDeletes()/optimize()

2012-07-11 Thread Konstantyn Smirnov
The JavaDoc comes from here: http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexWriter.html#expungeDeletes() The other blanks are there because it's Groovy :) Or what did you mean exactly?

RAMDirectory and expungeDeletes()/optimize()

2012-07-11 Thread Konstantyn Smirnov
Hi all, in my app (powered by Lucene 3.5.0) I index the documents (not too many, say up to 100k) using the RAMDirectory. Then I need to send the segment over the network to be merged with the existing index over there. The segment needs to be as "slim" as possible, e.g. without any pending deleted do
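
In 3.5, the combination that produces such a "slim" segment is roughly this (a sketch; expungeDeletes(true) blocks until the merges actually finish):

    import java.io.IOException;
    import org.apache.lucene.index.IndexWriter;

    class SlimSegment {
        static void prepareForShipping(IndexWriter writer) throws IOException {
            writer.expungeDeletes(true); // merge away pending deletes, wait for completion
            writer.commit();             // flush, so the files on disk are final
            writer.close();
        }
    }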

Re: Lucene Document Uniqueness question

2012-06-06 Thread Konstantyn Smirnov
you can use aggregation for that: dump a collection of prices as a field with multiple values into a document.

    // pseudo-code
    def doc = new Document(...)
    doc.add new Field( 'id', id )
    doc.add new Field( 'price', price1 )
    doc.add new Field( 'price', price2 )
    doc.add new Field( 'price', price3 )
    inde
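
Fleshed out into compilable Java (my sketch of the pseudo-code above, against the 3.x Field API; names are illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    class MultiValuedPrices {
        static Document offer(String id, double... prices) {
            Document doc = new Document();
            doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            for (double price : prices) {
                // adding the same field name repeatedly makes it multi-valued
                doc.add(new Field("price", String.valueOf(price),
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
            }
            return doc;
        }
    }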

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-22 Thread Konstantyn Smirnov
simple: what is the indexing speed for documents with stored fields? What is the retrieval rate? How well can it scale? How well do MongoDB and the others perform in the same discipline? Has anyone conducted such comparison tests? To dump like 1 mio documents into the index (with the single inde

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-21 Thread Konstantyn Smirnov
That's ok, but what is the real difference? Are there any performance tests? I can assume that up to 1 GB of index size there will be no noticeable difference with stored fields in comparison with some MongoDB, but what if the index size grows?

Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-18 Thread Konstantyn Smirnov
Hi all, apologies if this question was already asked before. If I need to store a lot of data (say, millions of documents), what would perform better (in terms of reads/writes/scalability etc.): Lucene with stored fields (Field.Store.YES) or a NoSQL DB like Mongo or Couch? Does it make se

RE: Reusing a CachingWrapperFilter

2011-07-29 Thread Konstantyn Smirnov
If I define a query and filter like this:

    Query q = new BooleanQuery()
    // populating q
    Filter filter = new CachingWrapperFilter( new QueryWrapperFilter( q ) )

given that I don't need scores and I do need a cached filter, to reuse it immediately for other calculations, which way of searching would

RE: Reusing a CachingWrapperFilter

2011-07-25 Thread Konstantyn Smirnov
Uwe Schindler wrote:
> To just count the results use TotalHitCountCollector (since Lucene Core 3.1) with IndexSearcher.search().

ok, thanks for that! so the code should look like:

    CachingWrapperFilter cwf = new CachingWrapperFilter( filter )
    searcher.search( query, cwf ... ) // search
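
Filling in the elided part, the counting search would look roughly like this (a sketch; TotalHitCountCollector is the 3.1+ class Uwe names):

    import java.io.IOException;
    import org.apache.lucene.search.CachingWrapperFilter;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TotalHitCountCollector;

    class CountOnly {
        static int count(IndexSearcher searcher, Query query, Filter filter) throws IOException {
            CachingWrapperFilter cwf = new CachingWrapperFilter(filter);
            TotalHitCountCollector collector = new TotalHitCountCollector();
            searcher.search(query, cwf, collector); // no scoring, no hit collection
            return collector.getTotalHits();
        }
    }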

Re: Index one huge text file

2011-07-25 Thread Konstantyn Smirnov
If you read your file as a stream, i.e. line by line without buffering it in RAM, you should have no problems with performance, as 60k lines is a piece of cake :). You can try using LineNumberReader:

    Reader lnr = new LineNumberReader( new FileReader( new File( '/path/to/your/file' ) ) )
    String lin
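
Completed into a runnable loop (my sketch; the field names and the one-document-per-line choice are assumptions):

    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.LineNumberReader;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    class LineIndexer {
        static void indexByLine(IndexWriter writer, File file) throws IOException {
            LineNumberReader lnr = new LineNumberReader(new FileReader(file));
            try {
                String line;
                while ((line = lnr.readLine()) != null) {
                    Document doc = new Document();
                    doc.add(new Field("line", line, Field.Store.YES, Field.Index.ANALYZED));
                    doc.add(new Field("lineNo", String.valueOf(lnr.getLineNumber()),
                            Field.Store.YES, Field.Index.NOT_ANALYZED));
                    writer.addDocument(doc); // one document per line
                }
            } finally {
                lnr.close();
            }
        }
    }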

Reusing a CachingWrapperFilter

2011-07-25 Thread Konstantyn Smirnov
Hi all! Are there any limitations or implications on reusing a CWF? In my app I'm doing the following:

    Filter filter = new BooleanFilter(...) // initialized with a couple of Term-, Range-, Boolean- and PrefixFilters
    CachingWrapperFilter cwf = new CachingWrapperFilter( filter )
    searcher.search(

Re: Can I run Lucene in google app engine?

2010-12-10 Thread Konstantyn Smirnov
Thanks Mike, I found it. It's a really elegant way to serialize the object. No special serialize() methods, just dump it to a stream, that's it :)

Re: Can I run Lucene in google app engine?

2010-12-08 Thread Konstantyn Smirnov
Hi Mike, can you please elaborate? Where can I find the test? TIA

Michael McCandless-2 wrote:
> Yes, I believe so (we have a unit test asserting this).
> But, there's no guarantee of cross-version compatibility of the serialized form.
> Mike

Keeping the IndexWriter open?

2010-03-23 Thread Konstantyn Smirnov
Hi all, are there any potential dangers in keeping the IndexWriter (which is a singleton in my app) open throughout the whole application life? I have tested it briefly, and it seems to be working fine... Am I missing some pitfalls and caveats? Thanks

Re: NullPointerExc in CloseableThreadLocal... (Lucene 3.0.0)

2010-01-13 Thread Konstantyn Smirnov
probability is really low, I think, because the update also takes about 100 ms... Anyway, it would be worth trying some IR reopen lock. Do you have any idea on that?
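
One shape such a lock could take (purely my sketch; note that it still does not protect searches already in flight on the old reader, for which proper reference counting would be needed):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;

    class ReaderHolder {
        private final Object reopenLock = new Object();
        private volatile IndexReader reader;

        ReaderHolder(IndexReader initial) { this.reader = initial; }

        IndexReader refresh() throws IOException {
            synchronized (reopenLock) {
                IndexReader newReader = reader.reopen(); // Lucene 3.0 API
                if (newReader != reader) {
                    reader.close(); // unsafe if a search still holds the old reader!
                    reader = newReader;
                }
                return reader;
            }
        }
    }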

NullPointerExc in CloseableThreadLocal... (Lucene 3.0.0)

2010-01-13 Thread Konstantyn Smirnov
Hi all, consider the following piece of code:

    Searcher s = this.getSearcher()
    def hits = s.search( query, filter, params.offset + params.max, sort )
    for( hit in hits.scoreDocs[ lower..

Re: TermEnum.skipTo in 3.0.0 replacement

2009-12-10 Thread Konstantyn Smirnov
thanks guys! :) Another question: what is faster, indexReader.terms( t ) or 10 times termEnum.next()?

TermEnum.skipTo in 3.0.0 replacement

2009-12-10 Thread Konstantyn Smirnov
Hi all, in Lucene 2.3.2 TermEnum had a method skipTo( term ). In 3.0.0 it's missing... Is there any other way to skip terms?
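
The usual 3.0 replacement is IndexReader.terms(Term), which returns a TermEnum already positioned at the first term >= the given one, effectively a skipTo. A sketch:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    class TermWalker {
        static void walkFrom(IndexReader reader, String field, String text) throws IOException {
            TermEnum te = reader.terms(new Term(field, text)); // positioned at first term >= text
            try {
                do {
                    Term t = te.term();
                    if (t == null || !t.field().equals(field)) break; // ran past the field
                    System.out.println(t.text());
                } while (te.next());
            } finally {
                te.close();
            }
        }
    }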

Re: Performance diffs between filter.bits() and searcher.docFreq()

2009-09-02 Thread Konstantyn Smirnov
without the need to optimize()?

Performance diffs between filter.bits() and searcher.docFreq()

2009-08-28 Thread Konstantyn Smirnov
are the 'delayed' deletes, so it doesn't give the exact numbers, while the 1st is satisfied with indexReader.reopen(). Which one is faster? Can I replace the 2nd one with the 1st and still get the same performance? Thanks in advance

Re: Suggestive Search

2009-04-09 Thread Konstantyn Smirnov
I implemented the suggestions feature for a couple of web sites. An example can be seen at http://www.genios.de/r_firmen/webcgi?START=016&SEITE=firmenk_d.ein&DBN=&WID=01852-8850939-00904_3. Type something into the Firma and Person fields. The Firma index has 3+ mio records, Person ~ 1.

Incremental search, CachingWrapperFilter and BooleanFilter

2009-02-19 Thread Konstantyn Smirnov
Hi all, I implemented an autocomplete functionality, which is pretty classical: a user types some words into an input field and sees a list of matches in a drop-down. I've done it using filters (BooleanFilter, and TermsFilter + PrefixFilter), and it's working against an index (loaded in RAM) w
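
A sketch of that filter combination (my reconstruction; BooleanFilter, FilterClause and TermsFilter come from the contrib queries module, and the field/term values are invented):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanFilter; // contrib/queries
    import org.apache.lucene.search.FilterClause;  // contrib/queries
    import org.apache.lucene.search.PrefixFilter;
    import org.apache.lucene.search.TermsFilter;   // contrib/queries

    class AutocompleteFilter {
        static BooleanFilter build(String completedWord, String typedPrefix) {
            BooleanFilter bf = new BooleanFilter();
            // fully typed words must match exactly
            TermsFilter tf = new TermsFilter();
            tf.addTerm(new Term("name", completedWord));
            bf.add(new FilterClause(tf, BooleanClause.Occur.MUST));
            // the word still being typed matches as a prefix
            bf.add(new FilterClause(new PrefixFilter(new Term("name", typedPrefix)),
                    BooleanClause.Occur.MUST));
            return bf;
        }
    }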

Best Practice for Lucene Search

2009-02-11 Thread Konstantyn Smirnov
In the beginning of the development, I was also facing a choice of whether to mirror the documents in DB/index. But when the number of rows reached the mark of 7 mio, a query like "select count(id) from documentz" (using PostgreSQL) would take ages (ok, about 10 minutes!!!), it became clear t

Re: WildCardQuery and TooManyClauses

2008-09-19 Thread Konstantyn Smirnov
Konstantyn Smirnov wrote:
> So, how can I plug the WildcardFilter in, to prevent TooManyClauses? Are there other ways, than using the trunk?

Now I ended up also overriding the QueryParser.getPrefixQuery() method, using ConstantScoreQuery and PrefixFilter. MaxClauseCountExc
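
The override being described, sketched out (my reconstruction; a ConstantScoreQuery wrapping a PrefixFilter never expands terms into BooleanClauses, so the clause limit can't trip):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.ConstantScoreQuery;
    import org.apache.lucene.search.PrefixFilter;
    import org.apache.lucene.search.Query;

    class FilterBackedQueryParser extends QueryParser {
        FilterBackedQueryParser(String field, Analyzer a) {
            super(field, a);
        }

        @Override
        protected Query getPrefixQuery(String field, String termStr) throws ParseException {
            // filter-backed, so no BooleanQuery expansion -> no TooManyClauses
            return new ConstantScoreQuery(
                    new PrefixFilter(new Term(field, termStr.toLowerCase())));
        }
    }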

Re: WildCardQuery and TooManyClauses

2008-09-18 Thread Konstantyn Smirnov
Michael McCandless-2 wrote:
> It's only with the trunk version of Lucene that QueryParser calls getWildcardQuery on parsing a wildcard string from the user's query.

I see... So, how can I plug the WildcardFilter in, to prevent TooManyClauses? Are there other ways, than using the tru

RE: WildCardQuery and TooManyClauses

2008-09-18 Thread Konstantyn Smirnov
Beard, Brian wrote:
> 1) Extend QueryParser to override the getWildcardQuery method.

Kinda late :), but I still have another question: who calls that getWildcardQuery() method? I subclassed the QueryParser, but that method never gets invoked, even if the query contains *. Shall I

Re: TermsFilter and MUST

2008-09-12 Thread Konstantyn Smirnov
Hi Mark, I ended up implementing a MandatoryTermsFilter, which looks like:

    class MandatoryTermsFilter extends Filter {
        List terms

        BitSet bits( IndexReader reader ){
            int size = reader.maxDoc()
            BitSet result = new BitSet( size )
            BitSet andMask = new BitSet( size )
            andMas
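
The same filter completed in plain Java (my reconstruction of the truncated Groovy above, against the 2.x-era Filter.bits() API):

    import java.io.IOException;
    import java.util.BitSet;
    import java.util.List;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.search.Filter;

    class MandatoryTermsFilter extends Filter {
        private final List<Term> terms;

        MandatoryTermsFilter(List<Term> terms) {
            this.terms = terms;
        }

        public BitSet bits(IndexReader reader) throws IOException {
            int size = reader.maxDoc();
            BitSet result = new BitSet(size);
            result.set(0, size);                    // start from "all docs", then AND down
            for (Term term : terms) {
                BitSet termBits = new BitSet(size);
                TermDocs td = reader.termDocs(term);
                try {
                    while (td.next()) termBits.set(td.doc());
                } finally {
                    td.close();
                }
                result.and(termBits);               // every term MUST match
            }
            return result;
        }
    }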

TermsFilter and MUST

2008-09-12 Thread Konstantyn Smirnov
Hi gents, is it possible to use TermsFilter with the 'MUST' occurrence rule, instead of the 'SHOULD'? In the code:

    def tf = new TermsFilter()
    for( some terms ){
        tf.addTerm( new Term( ) )
    }

I want all terms to MUST-limit the hit list. Thanks in advance

Re: Parametric/faceted Searching

2008-07-24 Thread Konstantyn Smirnov
I solved that using a single field in the document. Its content is based on a simple convention. Say I have 2 docs with the values BirthsMarriagesDeath_Deaths_Females and BirthsMarriagesDeath_Divorces. Now when I need to get the total count for the BirthsMarriagesDeath category, I run "BirthsMarriages

Re: Requesting MultipleIndeces

2008-06-25 Thread Konstantyn Smirnov
if you have good hardware with tons of RAM, you can use ParallelMultiSearcher, which looks up all indices simultaneously. If you are short on that, you must search one index at a time, using MultiSearcher.
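
For illustration (a sketch; both classes take the same Searchable[], and ParallelMultiSearcher extends MultiSearcher, so they are easy to swap):

    import java.io.IOException;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Searchable;

    class ShardSearch {
        static MultiSearcher combine(Searchable[] shards, boolean plentyOfRam) throws IOException {
            return plentyOfRam
                    ? new ParallelMultiSearcher(shards) // all shards concurrently
                    : new MultiSearcher(shards);        // one shard at a time
        }
    }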

Re: HitCollector and sorting

2008-06-17 Thread Konstantyn Smirnov
hossman wrote:
> Take a look at TopFieldDocCollector. It's a HitCollector provided out of the box that does sorting.

will it work against a ParallelMultiSearcher?

Re: Compass - Reloading Domain Object Defintiion Files

2008-06-10 Thread Konstantyn Smirnov
I was having a similar problem. See here: http://www.nabble.com/Alternative-to-Compass-Searchable-plugin-tp17248352p17248352.html

Re: Typical Indexing performance

2008-06-06 Thread Konstantyn Smirnov
my 2 cents: my indexing module handles documents with ~15 fields, most of which must be indexed and stored. Using the GermanAnalyzer I saw the following times: 10 MB ~ 3400 docs --> 6-8 sec; 70 MB ~ 5 docs --> 65 sec; so it gives me 500-760 docs/s.

HitCollector and sorting

2008-06-02 Thread Konstantyn Smirnov
Hi all, currently I'm using the search method returning the Hits object. According to http://wiki.apache.org/lucene-java/ImproveSearchingSpeed one should use a HitCollector-oriented search method instead. But I need another aspect of the "Hits search(...)" method: its sorting ability. Now my c
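
For the archive, the HitCollector-based equivalent that keeps the sorting looks roughly like this (a sketch against the 2.x API current for this thread; the numHits value of 100 is arbitrary):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TopFieldDocCollector;

    class SortedCollectorSearch {
        static TopDocs search(IndexSearcher searcher, IndexReader reader,
                              Query query, Sort sort) throws IOException {
            TopFieldDocCollector collector = new TopFieldDocCollector(reader, sort, 100);
            searcher.search(query, collector); // HitCollector-based, no Hits object
            return collector.topDocs();        // still sorted by the given Sort
        }
    }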