Re: Strange Error while deleting Documents from index while indexing.

2007-07-26 Thread miztaken
Where should I post this issue? I am new to Lucene. As for closing the IndexWriter, I am now trying this: 1. Open a new IndexReader. 2. Delete documents. 3. Close the IndexReader. 4. Open a new IndexWriter. 5. Write documents. 6. Close the IndexWriter. 7. Repeat the process for n times the in nth time o

Re: Lucene equivalent of SQL DISTINCT for a specific field's "stored values"

2007-07-26 Thread Daniel Noll
On Friday 27 July 2007 12:50:12 TimF wrote:
> However, obviously this returns the list of distinct terms,
> Hello, World, Goodbye, Foo, Bar, Mad
>
> not the list of distinct stored values,
> Hello World, Goodbye World, Foo Bar, Mad Mad Mad Mad World
>
> I could add another field to th
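To make the distinction in the quoted question concrete, here is a minimal plain-Java sketch. It does not use any Lucene classes: whitespace splitting stands in for a real analyzer, and the class and method names (`DistinctValuesSketch`, `distinctTerms`, `distinctValues`) are hypothetical. It only shows why enumerating the terms of a tokenized field yields single words, while the "DISTINCT" result being asked for is the set of whole, untokenized values.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; no Lucene classes are used here.
class DistinctValuesSketch {

    // Distinct whole values: what an untokenized view of the field would contain.
    static Set<String> distinctValues(List<String> storedValues) {
        return new LinkedHashSet<>(storedValues);
    }

    // Distinct terms after naive whitespace tokenization
    // (an assumed stand-in for whatever analyzer the index really uses).
    static Set<String> distinctTerms(List<String> storedValues) {
        Set<String> terms = new LinkedHashSet<>();
        for (String value : storedValues) {
            terms.addAll(Arrays.asList(value.trim().split("\\s+")));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> categories = Arrays.asList(
                "Hello World", "Goodbye World", "Foo Bar",
                "Mad Mad Mad Mad World", "Foo Bar");
        System.out.println(distinctTerms(categories));
        System.out.println(distinctValues(categories));
    }
}
```

With the sample categories from the thread, the term view collapses to six words, while the value view keeps the four full category strings.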

Lucene equivalent of SQL DISTINCT for a specific field's "stored values"

2007-07-26 Thread TimF
I have a field called "category". Sample data for "category":
Hello World
Goodbye World
Foo Bar
Mad Mad Mad Mad World
It is tokenized and stored in the index. I tokenize the field because I may want to search on specific word(s) in a category but not necessarily the entire category.

Re: Strange Error while deleting Documents from index while indexing.

2007-07-26 Thread Doron Cohen
This seems like a Lucene.Net issue; better to post it there. One comment: you must close the writer between iterations, otherwise the next attempt to delete a document with a reader will fail to obtain the write lock.
miztaken <[EMAIL PROTECTED]> wrote on 25/07/2007 23:27:59:
> Hi,
> I am dumping the
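The point about closing the writer can be pictured with a toy model of a single write lock. This is only a sketch under stated assumptions: `WriteLockSketch`, `acquire`, `release` and `runIteration` are hypothetical names, and the model is not how Lucene or Lucene.Net implement locking. It assumes only what the message says, namely that deleting through a reader and adding through a writer compete for the same write lock.

```java
// Toy model of a single index write lock; all names are hypothetical and
// this is not Lucene's or Lucene.Net's actual locking code.
class WriteLockSketch {
    private String holder; // null while the lock is free

    // Try to take the lock on behalf of 'who'; fails if it is already held.
    boolean acquire(String who) {
        if (holder != null) {
            return false;
        }
        holder = who;
        return true;
    }

    // Release the lock only if 'who' is the current holder (models close()).
    void release(String who) {
        if (who.equals(holder)) {
            holder = null;
        }
    }

    // One delete-then-write iteration. Returns false if a step cannot get the lock.
    static boolean runIteration(WriteLockSketch lock, boolean closeWriter) {
        if (!lock.acquire("reader")) {
            return false;               // deleting via the reader needs the write lock
        }
        lock.release("reader");         // close the reader
        if (!lock.acquire("writer")) {
            return false;               // adding documents needs it as well
        }
        if (closeWriter) {
            lock.release("writer");     // close the writer before the next iteration
        }
        return true;
    }
}
```

In this model, an iteration that leaves the writer open succeeds, but the following iteration fails at the delete step because the lock is still held.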

Re: Linear Hashing in Lucene?

2007-07-26 Thread Dmitry
Karl, thanks for the info; it's very difficult to find anything about the Orion algorithm and linear hashing. I will check the thread. DT, www.ejinz.com Search Engine Advertisement - Original Message - From: "karl wettin" <[EMAIL PROTECTED]> To: Sent: Thursday, July 26, 2007 3:49 PM Subject

Re: Linear Hashing in Lucene?

2007-07-26 Thread karl wettin
On 26 Jul 2007 at 05:56, Dmitry wrote:
> 1. Does an ontology wrapper exist in the Lucene implementation?
Not publicly available as far as I know. There has been some discussion on the forums though; you could try to search for OWL, RDF or something using Nabble and get in touch with the authors of

Re: Displaying results in the order

2007-07-26 Thread karl wettin
On 26 Jul 2007 at 05:38, Dmitry wrote:
> Is there a way to update a document in the index without causing any change to the order in which it comes up in searches?
I would say no; the score is calculated based on the matching terms, content length, etc. For details see

Re: Delete corrupted doc

2007-07-26 Thread Rafael Rossini
I see, thanks.
On 7/26/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 26-Jul-07, at 10:18 AM, Rafael Rossini wrote:
> > Yes, I optimized, but with SOLR. I don't know why, but when I optimize
> > an index with SOLR, it leaves you with about 15 files, instead of
> > the 3...
> You are probably no

Search terms on a single "instance" of field

2007-07-26 Thread Rafael Rossini
Hi guys, I have a problem that is kind of tricky: I have a set of documents that I enrich with dynamic metadata. The metadata name is the field name in Lucene and the value is the text. For example: "Rio de Janeiro is a beautiful city." would be indexed in one field called text, and on ano

Re: Delete corrupted doc

2007-07-26 Thread Mike Klaas
On 26-Jul-07, at 10:18 AM, Rafael Rossini wrote:
> Yes, I optimized, but with SOLR. I don't know why, but when I optimize an index with SOLR, it leaves you with about 15 files, instead of the 3...
You are probably not using the compound file format. Try setting: <useCompoundFile>true</useCompoundFile> in solrconfig

Re: Delete corrupted doc

2007-07-26 Thread Yonik Seeley
On 7/26/07, Rafael Rossini <[EMAIL PROTECTED]> wrote:
> Well... thanks for the help, this was really my last solution (rebuild) but
> I think I have no other choice... I really can't tell exactly if this
> corruption was caused by bad hardware or not, but do you guys have any
> idea about what mig

Re: Delete corrupted doc

2007-07-26 Thread Rafael Rossini
Well... thanks for the help, this was really my last solution (rebuild) but I think I have no other choice... I really can't tell exactly if this corruption was caused by bad hardware or not, but do you guys have any idea about what might have happened here? Could I have generated this corruption

Re: Delete corrupted doc

2007-07-26 Thread Yonik Seeley
On 7/26/07, Mark Miller <[EMAIL PROTECTED]> wrote:
> Anyway, what this says to me (and I should have realized this before) is
> that there is no document with your corrupt id, rather there is a term that
> thinks it is in that invalid doc id. The corruption must be in the
> term:docids inverted ind

Re: Delete corrupted doc

2007-07-26 Thread Mark Miller
From what I can tell, you shouldn't need to even try my first suggestion (what happened to the experts on this question by the way?). Returning true from isDeleted for the corrupt id should not matter. It appears to me that deletes are handled by keeping a simple list of the id's that are delet

Re: Delete corrupted doc

2007-07-26 Thread Rafael Rossini
Yes, I optimized, but with SOLR. I don't know why, but when I optimize an index with SOLR, it leaves you with about 15 files, instead of the 3... I'll try to optimize directly in Lucene and see what happens; if nothing happens I'll try your suggestion. Thanks a lot Mark!!
On 7/26/07, Mark M

RE: How to show category count with results?

2007-07-26 Thread Ramana Jelda
Hi,
Of course this statement is very expensive:
--> document.get("CAMPCATID")==null?"":document.get("CAMPCATID");
Use StringIndex/FieldCache/something similar to implement category counting. :)
Jelda
> -Original Message-
> From: Bhavin Pandya [mailto:[EMAIL PROTECTED]
> Sent: Thursday,
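The suggestion above can be sketched roughly as follows: instead of loading a stored document for every hit, look the category up in an array indexed by document number that was filled once up front. This is a plain-Java sketch, not Lucene code; `CategoryCountSketch` and its members are hypothetical names, and the `String[]` is only an assumed stand-in for a FieldCache-style per-document array.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of counting categories from a per-document array.
// The array stands in for a FieldCache-style lookup; no Lucene classes are used.
class CategoryCountSketch {
    private final String[] categoryByDoc;                 // categoryByDoc[docId] = category, loaded once
    private final Map<String, Integer> counts = new HashMap<>();

    CategoryCountSketch(String[] categoryByDoc) {
        this.categoryByDoc = categoryByDoc;
    }

    // Called once per matching document, in the spirit of collect(int doc, float score).
    void collect(int doc, float score) {
        String category = categoryByDoc[doc];
        if (category == null) {
            category = "";                                // same null-to-"" fallback as the quoted statement
        }
        counts.merge(category, 1, Integer::sum);          // array lookup plus map update, no document fetch
    }

    Map<String, Integer> counts() {
        return counts;
    }
}
```

The per-hit work here is an array read and a map update, which is the reason this style tends to scale better than fetching a stored document inside the collect method.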

Re: MoreLikeThis for multiple documents

2007-07-26 Thread Grant Ingersoll
I have some sample code for doing relevance feedback across multiple documents at http://www.cnlp.org/apachecon2005
It could be modified to provide more of the MoreLikeThis functionality (i.e. determining important terms via tf/idf); for now it just takes the top X terms.
-Grant
On Jul 25,

How to show category count with results?

2007-07-26 Thread Bhavin Pandya
Hi, I want to show each category name and its count alongside the results. I achieved this using a DocCollector, but it is very slow when the number of results is in the lakhs (hundreds of thousands), because fetching documents from the reader in the collect method is expensive...
public void collect(int doc, float score) { Document document = mread

Re: multi-field and wildcard query highlighter questions

2007-07-26 Thread Lukas Vlcek
Hi!
On 7/20/07, Mark Miller <[EMAIL PROTECTED]> wrote:
> 1) Perhaps the query you tried does not match anything in your
> index? What release are you using? [prefix*] works fine for me.
I realized that this was caused by Compass' own implementation of QueryParser (i.e. CompassQueryParser)

Re: StandardTokenizer is slowing down highlighting a lot

2007-07-26 Thread Stanislaw Osinski
> If anyone is interested, I could prepare a JFlex based Analyzer > equivalent > (to the extent possible) to current StandardAnalyzer, which might > offer nice > indexing and highlighting speed-ups. +1. I think a lot of people would be interested in a faster StandardAnalyzer. I've attached a

Re: MoreLikeThis for multiple documents

2007-07-26 Thread Mathieu Lecarme
Jens Grivolla wrote:
> Hello,
>
> I'm looking to extract significant terms characterizing a set of
> documents (which in turn relate to a topic).
>
> This basically comes down to functionality similar to determining the
> terms with the greatest offer weight (as used for blind relevance
> feedba

Re: Assembling a query from multiple fields

2007-07-26 Thread Askar Zaidi
I did this yesterday. Manually appended an extra field to the query. It works fine.
On 7/26/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> On Jul 25, 2007, at 5:05 PM, Joe Attardi wrote:
> > As far as I can tell, I basically have two options:
> > (1) Manually prepend the field identifier to the

Re: Assembling a query from multiple fields

2007-07-26 Thread Erik Hatcher
On Jul 25, 2007, at 5:05 PM, Joe Attardi wrote:
> As far as I can tell, I basically have two options:
> (1) Manually prepend the field identifier to the query text, for example:
> String fullQuery = field + ":" + queryText;
> then parse this query normally with QueryParser, OR
> (2) Since
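Option (1) from the quoted message can be sketched as a small string helper. This covers only the concatenation step; the resulting string would still be handed to a query parser, which is not shown. `FieldQuerySketch` and its method names are hypothetical. The grouped variant is an addition of this sketch, not something from the thread: it relies on the field-grouping form of the Lucene query syntax, `field:(...)`, since with plain concatenation only the first word of a multi-word query is scoped to the field. It also assumes the query text contains no characters that need escaping.

```java
// String-building sketch for option (1); parsing the result is out of scope here.
class FieldQuerySketch {

    // Plain concatenation, as in the quoted example.
    static String prefixField(String field, String queryText) {
        return field + ":" + queryText;
    }

    // Variant that wraps the text in parentheses so every word is scoped to the field.
    // Assumes queryText needs no escaping; a real caller would have to check that.
    static String prefixFieldGrouped(String field, String queryText) {
        return field + ":(" + queryText + ")";
    }

    public static void main(String[] args) {
        System.out.println(prefixField("title", "lucene"));
        System.out.println(prefixFieldGrouped("title", "lucene in action"));
    }
}
```

For a single-word query the two forms behave the same; the difference only matters once the user types more than one word.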

Re: Delete corrupted doc

2007-07-26 Thread Mark Miller
You know, on second thought, a merge shouldn't even try to access a doc > maxdoc (I think). Have you just tried an optimize?
On 7/25/07, Rafael Rossini <[EMAIL PROTECTED]> wrote:
> Hi guys, Is there a way of deleting a document that, because of some corruption, got a docID larger than the m

Re: Delete corrupted doc

2007-07-26 Thread Mark Miller
This may not be very elegant, but if you are really in a jam, here is what I would try: Check out a copy of Lucene. Modify the isDeleted method on both MultiReader and SegmentReader so that it returns true if the docid passed in is the id in question (if it is not the id, then just have the metho

RE: Solr newbe

2007-07-26 Thread Darren Hartford
One side-note is various content management tools already handle a lot of data extraction (POI/PDFBox/etc). In the case of Jakarta Slide and Apache Jackrabbit, both use Lucene under the covers to index this data. Not sure if you want to take the approach of putting your documents as 'managed' und

Re: Highlighter strategy in Lucene

2007-07-26 Thread Mark Miller
There has been a lot of Highlighter discussion on the list. Search the list at nabble or gossamer-threads and you will find a lot of info.
- Mark
On 7/25/07, Dmitry <[EMAIL PROTECTED]> wrote:
> What kind of Highlighter strategy is Lucene using? thanks, Dt www.ejinz.com Search Engine for News -

Solr newbe

2007-07-26 Thread Arne Muller
Hello, I've just started with Lucene to index a file server, and I am aiming to index Lotus Notes and some tables from relational databases. After some research, I came (so far) to the conclusion that I'm re-inventing the wheel, and that it may be better to use Solr or Nutch as Lucene front-ends. I w

RE: Multiple Languages with Lucene (Arabic & English)

2007-07-26 Thread Elie Choueiri
Thanks for the clarification, I'll play around with it and head back if things don't work according to plan. Thanks again, e
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 5:53 PM
To: java-user@lucene.apache.org
Subject: Re: Multiple Langu