Hi Igor,
About your performance problem with SpanQueries and Payloads:
Try filtering with the corresponding BooleanQuery and use a profiler.
You have an I/O bottleneck because position and payload information is
read per document.
Possibly it would help if you first filter out the "obviously"
Hi Nariman,
In my understanding of ComplexPhraseQueryParser, this class is no longer
supported.
http://issues.apache.org/jira/browse/LUCENE-1486#action_12782254
Instead, with lucene 3.1, the new
org.apache.lucene.queryParser.standard.parser.StandardSyntaxParser will do
this job.
https://issues.apa
Hi,
I have a problem with the checkedRepeats in SloppyPhraseScorer.
This feature is for phrases like "1st word 2nd word".
Without this feature the result would be the same as for "1st word 2nd".
OK.
But I have an index with more than one token at the same position.
The German sentence "Die käuflich
Hi Dave,
facets:
in your case a solution with one
int[IndexReader.maxDoc()]
fits. For each document number you can store an integer which represents the
facet value.
This is what org.apache.solr.request.UnInvertedField will store in your
case.
(*John* : is there something similar in com.browseeng
Hi David,
correct: you should avoid reading the content of a document inside a
HitCollector.
Normally that means caching everything you need in main memory. Very simple
and fast is a facet with only 255 possible values and exactly one value per
document. In this case you need only a byte[IndexReader.ma
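The byte-per-document facet cache described above can be sketched with plain Java arrays; everything here (class name, facet labels, example values) is illustrative and not lucene API:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-document facet cache: valid when there are at most
// 255 facet values and exactly one value per document.
class FacetCacheSketch {
    private final byte[] docToFacet;       // one byte per document number
    private final String[] ordinalToValue; // facet ordinal -> facet label

    FacetCacheSketch(byte[] docToFacet, String[] ordinalToValue) {
        this.docToFacet = docToFacet;
        this.ordinalToValue = ordinalToValue;
    }

    // Count facet values over the hits without reading any stored fields:
    // one array lookup per hit.
    Map<String, Integer> countFacets(BitSet hits) {
        int[] counts = new int[ordinalToValue.length];
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            counts[docToFacet[doc] & 0xFF]++;
        }
        Map<String, Integer> result = new HashMap<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                result.put(ordinalToValue[i], counts[i]);
            }
        }
        return result;
    }
}
```

In real code the byte[] would be filled once per IndexReader, which is exactly the "cache in main memory" step.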
Hi John,
I intended to compare xtf with hierarchical facet browsing in browseengine
(selection expansion).
I found PathFacetCountCollector/PathFacetHandler#getFacetsForPath, and I
think that the implementation in xtf has a lot of advantages.
So I suggest you reuse the xtf source for that (Gro
Hi Dave,
searching and sorting in lucene are two separate functions (if you do not
want to sort by relevance).
You will not lose performance if you first search with a BitSet as
HitCollector and then sort the result by DateField.
But it is easier to extend TopFieldDocCollector/TopFieldCollector to a
C
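A minimal sketch of this search-then-sort idea, assuming the per-document date values are already cached in a long[] (in real code that cache would come from lucene, e.g. via FieldCache); the class and names are illustrative:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Collect matching doc ids first (e.g. from a HitCollector into a BitSet),
// then order them afterwards by a cached per-document date value.
class CollectThenSort {
    static List<Integer> sortHitsByDate(BitSet hits, long[] docToDate) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            docs.add(doc);
        }
        // Sorting happens once, after the search, on the collected ids only.
        docs.sort((a, b) -> Long.compare(docToDate[a], docToDate[b]));
        return docs;
    }
}
```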
Hi ilwes,
Did you notice the thread
http://www.nabble.com/Lucene-vs.-Database-td19755932.html
?
I think it is useful for the question of using lucene stored fields
even if you already have the information in the DB.
Best regards
Karsten
ilwes wrote:
>
> Hello,
>
> I googled, searched t
Hi Murali,
I think a search with 4 * 5 = 20 Boolean clauses will not be a performance
problem
(at least if you have only one optimized index folder).
You could also use one field which contains the content of all other fields,
with a boost factor for each term (different boost for content from diffe
Hi John,
I will take a look at the bobo-browse source code at the weekend.
Do you know the xtf implementation of faceted browsing?
The starting point is
org.cdlib.xtf.textEngine.facet.GroupCounts#addDoc
(It works with millions of facet values on millions of hits.)
What is the starting point in browsee
hi glen,
you might find this thread interesting:
http://groups.google.com/group/xtf-user/browse_thread/thread/beb62f5ff9a16a3a/16044d1009511cda
It was about a taxonomy like the one in your example.
Also take a look at the faceted browsing on dates in
http://www.marktwainproject.org/xtf/search?categor
Hi Dipak,
Which kind of "Taxonomy"?
What is the difference from "faceted browsing" in your case?
best regards
Karsten
Kesarkar, Dipak wrote:
>
> Hi
>
> I want to include Taxonomy feature in my search.
>
> Does Lucene support Taxonomy? How?
>
> If not, is there in different way to add Tax
Hi buFka,
take a look at
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
e.g. your example does not set mergeFactor or RAMBufferSizeMB.
I also like the last tip: "Run a Java profiler".
Because in my case, the performance problem vanished after I switched from
jdom to saxon.
(we are indexi
Hi Zender,
please take a look at
http://www.nabble.com/Lucene-vs.-Database-td19755932.html#a19757274
You shouldn't use a lucene field to store such huge data. At least not a
lucene field in your main search index.
You can use lucene as a repository, but I would advise you to use an extra
index for
Hi csantos,
most probably this is not about lucene:
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/AbstractMethodError.html
GermanAnalyser is not part of the normal lucene jar (it is part of
lucene-analyzers).
In an application server the location of jar files can be important.
Please try your cod
Hi Blured,
sorry, I don't know anything about eclipse birt.
I recommend starting a new thread "eclipse birt with lucene" where you
describe your problem again in detail.
Be aware that lucene doesn't know numerical values; lucene only knows strings.
best regards
Karsten
blured blured wrote:
>
Hi Blured,
if you are asking about integration of lucene and a DBMS, Compass might be
something for you:
http://www.nabble.com/Lucene-vs.-Database-tp19755932p19758736.html
If you are thinking about using hibernate: I think there already exists a
lucene connector, so you don't have to use jdbc.
if you
Hi Ohsang,
are you looking for
http://lucene.apache.org/java/2_4_0/fileformats.html
?
Best regards
Karsten
Kwon, Ohsang wrote:
>
> I want to know how the lucene stored the data in the index internally.
>
> (Lucene`s index format changed very often.)
>
>
>
> I can not find this informati
Hi Chris,
most likely this is not a lucene problem.
Did you look with luke at the stored fields of your document?
Please take a second look with luke at the terms of your field 'unique_id'
(with "Show top terms"):
What do you see?
Best regards
Karsten
btw: why do you use the prefix search? Thi
Hi Brian,
I don't know the internals of highlighting ("explanation") in lucene.
But I know that XTF (
http://xtf.wiki.sourceforge.net/underHood_Documents#tocunderHood_Documents5
) can handle very large documents (above 100 Mbyte) with highlighting very
fast. The difference from your approach is th
Hi spring,
The unit of retrieval in lucene is a document.
There are no joins between document sets like in SQL.
What you can do is collect all hits for each term query at the folder level
and then implement the logical "and" or "or" on your own.
For this you could reuse the existing implementation
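The per-folder combination can be sketched with java.util.BitSet, one bit per lucene document number; the helper class below is illustrative, not an existing lucene class:

```java
import java.util.BitSet;
import java.util.List;

// Combine the hit sets of several term queries by hand, since lucene
// has no SQL-style joins between document sets.
class FolderHitCombiner {
    // Logical "and": intersect all hit sets.
    static BitSet andAll(List<BitSet> hitSets) {
        BitSet result = (BitSet) hitSets.get(0).clone();
        for (int i = 1; i < hitSets.size(); i++) {
            result.and(hitSets.get(i));
        }
        return result;
    }

    // Logical "or": union of all hit sets.
    static BitSet orAll(List<BitSet> hitSets) {
        BitSet result = new BitSet();
        for (BitSet s : hitSets) {
            result.or(s);
        }
        return result;
    }
}
```

Each per-folder BitSet would be filled by a HitCollector; the combination afterwards is a cheap bitwise operation.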
Hi agatone,
I agree with markharw00 that highlighting is the main reason to store fields
in lucene.
I want to remind Sascha Fahl that the stored fields in lucene are not inside
the inverted index structure.
The implementation of stored fields is very simple:
a (.fdt) file with the pairs "field-name
Hi Luther,
your question:
"Is there a way to ask Lucene to search starting from a fixed position?"
The answer: no, not with a standard search.
But you don't want to use your field for scoring, so this is a field to
filter results.
You could easily change RangeFilter for this purpose, but the new filt
be some other toolkit would
> more useful(possibly on top of the Lucene)
>
> Thanks in advance for any suggestions and comments. I would appreciate any
> ideas and directions to look into.
>
>
> On Tue, Sep 2, 2008 at 11:46 AM, Karsten F.
<[EMAIL PROTECTED]> wrote:
>
Hi Antony,
I decided to first delete all duplicates from the master (iW) and then to
insert all temporary indices (other).
Any other opinions?
Best regards
Karsten
public static synchronized void merge(IndexWriter iW, Directory[] other,
final String uniqueID_FieldName) throws IOException{
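The strategy can be illustrated with plain maps standing in for the indices (the real code would call IndexWriter.deleteDocuments(Term) on the master and then IndexWriter.addIndexes on the temporary directories); this is a sketch of the idea, not the actual merge method:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Maps keyed by the unique-id field stand in for the lucene indices.
class DedupMergeSketch {
    static Map<String, String> merge(Map<String, String> master,
                                     List<Map<String, String>> batches) {
        Map<String, String> result = new LinkedHashMap<>(master);
        // Step 1: delete from the master every record whose unique id
        // also occurs in one of the incoming temporary batches.
        for (Map<String, String> batch : batches) {
            result.keySet().removeAll(batch.keySet());
        }
        // Step 2: insert all temporary batches.
        for (Map<String, String> batch : batches) {
            result.putAll(batch);
        }
        return result;
    }
}
```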
Hi Leonid,
what kind of query is your use case?
Complex scenario:
You need all the hierarchical structure information in one query. This means
you want to search with XPath in a real XML database. (Like: all documents
with a subtitle XY which contains directly after this subtitle a table with
the
Hi Markus,
hopefully someone will tell you the predefined filter for this.
I only want to agree that a filter is the correct place for this, and that
you should be aware of the token positions (after your filter you must have
two tokens at the same position).
I think "WordDelimiterFilter" is a
Hi John,
I am not sure about the way Solr implements range queries.
But it looks like Solr is using
org.apache.lucene.search.ConstantScoreRangeQuery
which itself is using
org.apache.lucene.search.RangeFilter
So Solr does not rewrite the query to a large Boolean SHOULD; instead it is
reading all t
Hi John,
about "integration other index implementation":
Sounds like you need a DBMS with some lucene features.
There was a post about using lucene in Oracle:
http://www.nabble.com/Using-lucene-as-a-database...-good-idea-or-bad-idea--to18703473.html#a18741137
and
http://www.nabble.com/Oracle-and-
Hi David,
this is not true; please take a look at
IndexWriter#setRAMBufferSizeMB
and
IndexWriter#setMaxBufferedDocs
But you can produce 9 segments (each with only one document), if you call
IndexWriter#flush
or
IndexWriter#commit
after each addDocument.
So from my knowledge of lucene there is
Hi Bill,
you should not use a prefix query (*), because in a first step lucene would
generate a list of all terms in this field, and then search for all of these
terms, which is senseless.
I would suggest inserting a new field "myFields" which contains as values
the names of all fields for this docum
Hi A.
the starting point of xtf was the TEI format. I am very curious whether you
find anything missing for your needs.
(I already used it with cocoon.)
I never saw a better implementation of XML-aware searching: each hit knows
its exact position inside the indexed (=source) XML file :-)
If you dive into
hi Martin,
I think you are searching for
DuplicateFilter
http://www.nabble.com/how-to-get--all-unique--documents-based-on-keyword-feild-to18807014.html
best regards
Karsten
wysiecki wrote:
>
> Hello,
>
> thanks for help in advance.
>
> my example docs:
>
> two fileds company_id and co
Hi,
I want to agree with the advice of using only one index.
And I want to add two reasons:
1. Sorting and caching work with the lucene document numbers.
In the case of lucene, "warming up" means that a lot of int arrays and
bitsets are stored in main memory.
If you are using different MultiReader
Hi Nico Krijnen,
I think it is OK to store a filter for each user session in memory.
And I think that a cached filter is the correct approach for permissions.
(Extra memory usage = one bit for each user and each document.)
Hopefully someone with more experience will also answer your question.
B
Hi Ganesh,
in this thread nobody said that lucene is a good storage server.
Only that it could be used as a storage server (Grant: "Connect data storage
with simple, fast lookup and Lucene..").
I don't know about automatic retention.
But for the rest of your feature list I suggest taking a deep lo
Hi Grant,
you mentioned jackrabbit as an example of storing data in lucene.
I did not find anything like that in the source code. I found
"LocalFileSystem" and "DatabaseFileSystem".
(I found lucene used for indexing and searching.)
Have I overlooked something?
Best regards
Karsten
Grant Inge
Hi Fayyaz,
again, this is about the SAX handler, not about lucene.
My understanding of what you want:
1. one lucene document for each SPEECH element (already implemented)
2. one lucene document for each SCENE-COMMENTARY element (not implemented
yet).
Correct?
If yes, you can write
i
Hi,
just to be sure:
Do you know IndexModifier.deleteDocument(int)?
It is deprecated because you should use
IndexWriter.deleteDocuments(Term[]).
What do you mean by "index is committed"?
If you mean "optimize()", the document numbers will change (so there is a
side-effect ;-)
best regards
Karste
Hi Fayyaz,
From my point of view, this is not a lucene question.
If I understand your SAX handler correctly, you start a document with each
"speech" start tag and you end this document with each "lines" close tag.
So if you know that the SCENE-COMMENTARY elements and the speech elements
are dis
Hi,
my question: how did eBay solve this problem?
Take a look at the faceted browsing in the mark twain project:
http://www.marktwainproject.org/xtf/search?keyword=Berlin&style=mtp
http://tinyurl.com/5cvb3c
This solution is open source and from the xtf project (they use lucene).
http://xtf.wiki