Yeah, already done that (after some experimenting):
static void serializeRAMDirectory( RAMDirectory dir, output ){
if( null == dir ) return
output?.withObjectOutputStream{ out ->
out.writeLong dir.sizeInBytes()
out.writeInt dir.fileMap.size()
dir.fileMap.each{ String na
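For the archives, here's a complete sketch of the same idea going through the public Directory API (listAll/openInput) instead of the protected fileMap; the method names are my own, not Lucene's:

import org.apache.lucene.store.*

static void serializeRAMDirectory( RAMDirectory dir, File output ){
    if( !dir ) return
    output.withObjectOutputStream{ out ->
        String[] names = dir.listAll()
        out.writeInt names.length
        names.each{ String name ->
            byte[] bytes = new byte[ (int) dir.fileLength( name ) ]
            IndexInput input = dir.openInput( name, IOContext.DEFAULT )
            try{ input.readBytes bytes, 0, bytes.length } finally{ input.close() }
            out.writeUTF name
            out.writeInt bytes.length
            out.write bytes
        }
    }
}

static RAMDirectory deserializeRAMDirectory( File input ){
    RAMDirectory dir = new RAMDirectory()
    input.withObjectInputStream{ inp ->
        inp.readInt().times{
            String name = inp.readUTF()
            byte[] bytes = new byte[ inp.readInt() ]
            inp.readFully bytes
            IndexOutput out = dir.createOutput( name, IOContext.DEFAULT )
            try{ out.writeBytes bytes, bytes.length } finally{ out.close() }
        }
    }
    dir
}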
Hi all,
In Lucene 3.x the RAMDirectory was Serializable.
In 4.x it isn't any more...
What's the best/most performant/easiest way to serialize the RAMDir in 4.6.0?
TIA
I want to refresh the topic a bit.
Using Lucene 4.3.0, I couldn't find a method like expungeDeletes() in the
IndexWriter anymore. I rely on Lucene's MergePolicies to do the optimization, but I
need to keep the metadata up to date, docFreqs and termFreqs to name a few.
The only way to accomplish that w
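For anyone landing here later: expungeDeletes() was renamed in the 3.5 API cleanup, so in 4.x the equivalent call is IndexWriter.forceMergeDeletes(). A minimal sketch (writer being your IndexWriter):

writer.forceMergeDeletes()   // asks the MergePolicy to merge away segments with deletions
writer.commit()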
Ah yes, my bad!
I indeed used my own fieldTypes for my numeric fields.
Changing the FieldType to indexed=true did the trick, thanks.
Shouldn't it be enabled by default?
If I invert a field using one of the numeric classes, I'd expect it to be
indexed.
Otherwise I would use a StringField or StoredField...
Hi guys,
On my path of migrating from 3.6.x to 4.1, I'm facing the following problem:
I create a document with an IntField in it:
doc.add new IntField( 'freeSeats', 5, Store.YES )
After adding to the doc and writing to the index, the field looks like
(copied from eclipse debugger):
[20]Int
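For reference, a minimal sketch of the pairing that works once the field type is indexed (writer/searcher setup assumed; the stock IntField type is indexed by default):

import org.apache.lucene.document.*
import org.apache.lucene.search.NumericRangeQuery

Document doc = new Document()
doc.add new IntField( 'freeSeats', 5, Field.Store.YES )   // stock IntField type: indexed numerically
writer.addDocument doc
writer.commit()

def q = NumericRangeQuery.newIntRange( 'freeSeats', 1, 10, true, true )
def top = searcher.search( q, 10 )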
Thanks for the answer Uwe!
So the behavior has changed since 3.6, hasn't it?
Now I need to instantiate the analyzer each time I feed the field with the
tokenStream, or does it happen behind the scenes if I use the (String name,
String value, Field.Store store) constructor?
Another question then... Now I tr
Dear all,
I'm using the following test-code:
Document doc = new Document()
Analyzer a = new SimpleAnalyzer( Version.LUCENE_41 )
TokenStream inputTS = a.tokenStream( 'name1', new StringReader( 'aaa bbb ccc' ) )
Field f = new TextField( 'name1', inputTS )
doc.add f
TokenStream ts = doc.getField(
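A self-contained sketch of reading the tokens back, assuming that was the intent; Field.tokenStream( analyzer ) hands back the pre-set stream here:

import org.apache.lucene.analysis.*
import org.apache.lucene.analysis.core.SimpleAnalyzer
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute
import org.apache.lucene.document.*
import org.apache.lucene.util.Version

Analyzer a = new SimpleAnalyzer( Version.LUCENE_41 )
TokenStream inputTS = a.tokenStream( 'name1', new StringReader( 'aaa bbb ccc' ) )
Field f = new TextField( 'name1', inputTS )

TokenStream ts = f.tokenStream( a )             // returns the stream set above
def term = ts.addAttribute( CharTermAttribute )
ts.reset()
while( ts.incrementToken() ) println term       // aaa, bbb, ccc
ts.end()
ts.close()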
If you tokenize AND store fields in your document, you can always pull them
and re-invert using another analyzer, so you don't need to store the
"original data" somewhere else.
The point is rather the performance. I started a discussion on that topic
http://lucene.472066.n3.nabble.com/Performance
Hi Mike.
I have a LogDocMergePolicy + ConcurrentMergeScheduler in my setup.
I tried adding new segments with 800-5000 documents in each of them in a
row, but the scheduler seemed to ignore them at first... only after some
time it managed to merge some of them.
I have an option to use a quartz-sch
JavaDoc comes from here
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexWriter.html#expungeDeletes()
The other blanks are there because it's Groovy :) Or what did you mean exactly?
Hi all
in my app (Lucene 3.5.0 powered) I index the documents (not too many, say up
to 100k) using the RAMDirectory.
Then I need to send the segment over the network to be merged with the
existing index over there.
The segment needs to be as "slim" as possible, e.g. without any pending
deleted docs
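Something like this sketch is what I have in mind (writer being the IndexWriter on that RAMDirectory):

writer.expungeDeletes()   // merges the pending deletes away; from 3.5 on it's called forceMergeDeletes()
writer.commit()
// after this, dir.listAll() should contain only the "slim" files to ship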
you can use aggregation for that.
dump a collection of prices as a field with multiple values into a document
//pseudo-code
def doc = new Document(...)
doc.add new Field( 'id', id )
doc.add new Field( 'price', price1 )
doc.add new Field( 'price', price2 )
doc.add new Field( 'price', price3 )
inde
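Fleshing the pseudo-code out with a real 3.x constructor (still a sketch; id, price1..3 and indexWriter are placeholders):

import org.apache.lucene.document.*

def doc = new Document()
doc.add new Field( 'id', id as String, Field.Store.YES, Field.Index.NOT_ANALYZED )
[ price1, price2, price3 ].each{ p ->
    // adding the same field name repeatedly makes it multi-valued
    doc.add new Field( 'price', p as String, Field.Store.YES, Field.Index.NOT_ANALYZED )
}
indexWriter.addDocument doc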
simple
What is the indexing speed for documents with stored fields? What is the
retrieval rate? How well does it scale? How well do MongoDB and the others
perform in the same discipline?
Has anyone conducted such comparison tests? To dump like 1 million documents
into the index (with the single inde
That's ok, but what is the real difference? Are there any performance tests?
I can assume that up to 1 GB index size there will be no noticeable
difference between stored fields and some MongoDB, but what if the index
size grows?
Hi all,
apologies if this question has already been asked before.
If I need to store a lot of data (say, millions of documents), what would
perform better (in terms of reads/writes/scalability etc.): Lucene with
stored fields (Field.Store.YES) or another NoSql DB like Mongo or Couch?
Does it make se
If I define a query and filter like this:
Query q = new BooleanQuery()
// populating q
Filter filter = new CachingWrapperFilter( new QueryWrapperFilter( q ) )
given that I don't need scores and I do need a cached filter to reuse it
immediately for other calculations, which way of searching would
Uwe Schindler wrote:
>
> To just count the results use TotalHitCountCollector (since Lucene Core 3.1)
> with IndexSearcher.search().
>
ok, thanks for that!
so the code should look like:
CachingWrapperFilter cwf = new CachingWrapperFilter( filter )
searcher.search( query, cwf ... ) // search
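Spelled out as a sketch (q being the populated BooleanQuery from above):

import org.apache.lucene.search.*

def cwf = new CachingWrapperFilter( new QueryWrapperFilter( q ) )
def collector = new TotalHitCountCollector()
searcher.search( new MatchAllDocsQuery(), cwf, collector )   // no scoring work, the cached filter does the matching
int count = collector.totalHits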
If you read your file as a stream, i.e. line by line without buffering it in
RAM, you should have no problems with performance, as 60k lines is a piece of
cake :).
You can try using LineNumberReader:
Reader lnr = new LineNumberReader( new FileReader( new File( '/path/to/your/file' ) ) )
String lin
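Spelling the loop out as a sketch (or, even simpler in Groovy, use File.eachLine):

def lnr = new LineNumberReader( new FileReader( '/path/to/your/file' ) )
try{
    String line
    while( ( line = lnr.readLine() ) != null ){
        // process the line; lnr.lineNumber tells you where you are
    }
} finally{
    lnr.close()
}

// or simply:
new File( '/path/to/your/file' ).eachLine{ String line, int number ->
    // process the line
}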
Hi all!
are there any limitations or implications on reusing a CWF?
In my app I'm doing the following:
Filter filter = new BooleanFilter(...)
// initialized with a couple of Term-, Range-, Boolean- and PrefixFilter
CachingWrapperFilter cwf = new CachingWrapperFilter( filter )
searcher.search(
Thanks Mike, I found it.
It's a really elegant way to serialize the object. No special serialize()
methods, just dump it into the stream - that's it :)
Hi Mike,
can you please elaborate?
Where can I find the test?
TIA
Michael McCandless-2 wrote:
>
> Yes, I believe so (we have a unit test asserting this).
>
> But, there's no guarantee of cross-version compatibility of the serialized
> form.
>
> Mike
>
Hi all,
are there any potential dangers in keeping the IndexWriter (which is a
singleton in my app) open for the whole application lifetime?
I have tested it shortly, and it seems to be working fine...
Am I missing some pitfalls and caveats?
Thanks
The probability is really low, I think, because the update also takes about
100 ms...
Anyway, it would be worth trying some IndexReader-reopen lock. Do you have
any idea on that?
Hi all
Consider the following piece of code:
Searcher s = this.getSearcher()
def hits = s.search( query, filter, params.offset + params.max, sort )
for( hit in hits.scoreDocs[ lower..
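Presumably the loop continues like this sketch (lower = params.offset; the upper bound needs clamping to what actually came back):

int lower = params.offset
int upper = Math.min( lower + params.max, hits.scoreDocs.length ) - 1
for( hit in hits.scoreDocs[ lower..upper ] ){   // assumes lower <= upper, i.e. the offset is within the hits
    def doc = s.doc( hit.doc )
    // render doc...
}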
thanks guys! :)
Another question: what is faster, indexReader.terms( t ) or 10 calls to
termEnum.next()?
Hi all
In Lucene 2.3.2, TermEnum had a skipTo( term ) method.
In 3.0.0 it's missing...
Is there any other way to skip terms?
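For the record, the skip seems to have moved into IndexReader.terms(Term): it returns a TermEnum already positioned at the first term >= the given one. A sketch:

TermEnum te = reader.terms( new Term( 'field', 'someValue' ) )   // positioned at first term >= the given one
try{
    while( te.term() != null && te.term().field() == 'field' ){
        // use te.term()...
        if( !te.next() ) break
    }
} finally{
    te.close()
}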
without the need to optimize()?
are the 'delayed' deletes, so it doesn't
give the exact numbers, while the 1st is satisfied with indexReader.reopen()
Which one is faster? Can I replace the 2nd one with the 1st and still get
the same performance?
Thanks in advance
I implemented the suggestions feature for a couple of web sites.
An example can be seen at
http://www.genios.de/r_firmen/webcgi?START=016&SEITE=firmenk_d.ein&DBN=&WID=01852-8850939-00904_3
Type something into the Firma and Person fields.
The Firma index has 3+ million records, the Person index about 1 million.
Hi all
I implemented an autocomplete feature, which is pretty classical: a user
types some words into an input field and sees a list of matches in a
drop-down.
I've done it using filters (BooleanFilter, and TermsFilter + PrefixFilter),
and it's working against an index (loaded in RAM) w
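The combination looks roughly like this sketch (the field name 'name' and the two input variables are made up for illustration):

import org.apache.lucene.index.Term
import org.apache.lucene.search.*   // BooleanFilter, TermsFilter, FilterClause come from the contrib queries jar

def tf = new TermsFilter()
completedWords.each{ String w -> tf.addTerm new Term( 'name', w ) }

def bf = new BooleanFilter()
bf.add new FilterClause( tf, BooleanClause.Occur.MUST )
bf.add new FilterClause( new PrefixFilter( new Term( 'name', typedPrefix ) ), BooleanClause.Occur.MUST )

def top = searcher.search( new MatchAllDocsQuery(), bf, 10 )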
At the beginning of the development, I was also facing the choice of
mirroring the documents in DB/index.
But when the number of rows reached the 7 million mark, a query like
"select count(id) from documentz"
(using PostgreSQL) would take ages (ok, about 10 minutes!!!), and it became
clear t
Konstantyn Smirnov wrote:
>
> So, how can I plug the WildcardFilter in, to prevent TooManyClauses? Are
> there other ways, than using the trunk?
>
I ended up also overriding the QueryParser.getPrefixQuery() method, using
ConstantScoreQuery and PrefixFilter. MaxClauseCountExc
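i.e. roughly this sketch (the class name is mine):

import org.apache.lucene.analysis.Analyzer
import org.apache.lucene.index.Term
import org.apache.lucene.queryParser.QueryParser
import org.apache.lucene.search.*

class NoExpansionQueryParser extends QueryParser {
    NoExpansionQueryParser( String field, Analyzer a ){ super( field, a ) }

    protected Query getPrefixQuery( String field, String termStr ){
        // constant score: no BooleanQuery expansion, so no TooManyClauses
        new ConstantScoreQuery( new PrefixFilter( new Term( field, termStr ) ) )
    }
}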
Michael McCandless-2 wrote:
>
>
> It's only with the trunk version of Lucene that QueryParser calls
> getWildcardQuery on parsing a wildcard string from the user's query.
>
I see..
So, how can I plug the WildcardFilter in, to prevent TooManyClauses? Are
there other ways than using the trunk?
Beard, Brian wrote:
>
> 1) Extend QueryParser to override the getWildcardQuery method.
>
Kinda late :), but I still have another question:
Who calls that getWildcardQuery() method?
I subclassed the QueryParser, but that method never gets invoked, even
if the query contains *.
Shall I
Hi Mark,
I ended up implementing a MandatoryTermsFilter, which looks like:
class MandatoryTermsFilter extends Filter {
List terms
BitSet bits( IndexReader reader ){
int size = reader.maxDoc()
BitSet result = new BitSet( size )
BitSet andMask = new BitSet( size )
andMas
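A possible completion of the idea, for the archives (same names as above, plain 2.x Filter API; BitSet is java.util):

import org.apache.lucene.index.*
import org.apache.lucene.search.Filter

class MandatoryTermsFilter extends Filter {
    List<Term> terms

    BitSet bits( IndexReader reader ){
        int size = reader.maxDoc()
        BitSet result = new BitSet( size )
        result.set 0, size                    // start with every doc set...
        terms.each{ Term t ->
            BitSet andMask = new BitSet( size )
            TermDocs td = reader.termDocs( t )
            while( td.next() ) andMask.set( td.doc() )
            td.close()
            result.and andMask                // ...and keep only docs that match every term
        }
        result
    }
}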
Hi gents,
is it possible to use TermsFilter with the 'MUST' occurrence rule, instead of
the 'SHOULD'?
In the code:
def tf = new TermsFilter()
for( some terms ){
tf.addTerm( new Term( ) )
}
I want that all terms MUST limit the hit list.
Thanks in advance
I solved that using a single field in the document.
Its content is based on a simple convention.
Say I have 2 docs with values BirthsMarriagesDeath_Deaths_Females and
BirthsMarriagesDeath_Divorces.
Now when I need to get the total count for the BirthsMarriagesDeath category, I
run "BirthsMarriages
If you have good hardware with tons of RAM, you can use
ParallelMultiSearcher, which looks up in all indices simultaneously.
If you are short on that, you must search one index at a time, using
MultiSearcher.
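In code the two options look roughly like this (indexDirs standing in for your list of Directory instances):

import org.apache.lucene.search.*

def searchables = indexDirs.collect{ new IndexSearcher( it ) } as Searchable[]

def parallel = new ParallelMultiSearcher( searchables )   // fires at all indices simultaneously
def serial   = new MultiSearcher( searchables )           // walks them one at a time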
hossman wrote:
>
> Take a look at TopFieldDocCollector. It's a HitCollector provided out of
> the box that does sorting.
>
will it work against a ParallelMultiSearcher?
I was having a similar problem.
See here:
http://www.nabble.com/Alternative-to-Compass-Searchable-plugin-tp17248352p17248352.html
my 2 cents
My indexing module handles documents with ~15 fields, most of which must
be indexed and stored. Using the GermanAnalyzer I saw the following times:
10 MB ~ 3400 docs --> 6-8 sec
70 MB ~ 5 docs --> 65 sec
so it gives me 500 - 760 doc/s
Hi all
Currently I'm using the search method returning the Hits object. According
to http://wiki.apache.org/lucene-java/ImproveSearchingSpeed one should use a
HitCollector-oriented search method instead.
But I need another aspect of the "Hits search(...)" method: its sorting
ability.
Now my c
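For the record, the collector that keeps the sorting looks roughly like this sketch (reader, sort and query assumed to be set up already; 2.x API):

def collector = new TopFieldDocCollector( reader, sort, 100 )   // top 100, pre-sorted
searcher.search( query, collector )
def scoreDocs = collector.topDocs().scoreDocs                   // sorted, without the Hits object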