Re: problem found with DiskDocValuesFormat

2013-08-22 Thread Sean Bridges
Thanks for the answers, and thanks for the changes to load doc values to disk, it will be nice to use a supported codec. Upgrading our indexes is not an option, as they are very large. Sean On Wed, Aug 21, 2013 at 11:15 PM, Robert Muir wrote: > On Thu, Aug 22, 2013 at 1:48 AM, Sean Brid

Re: problem found with DiskDocValuesFormat

2013-08-21 Thread Sean Bridges
code from DiskDocValuesFormat and call it CustomDiskDocValuesFormat, and give CustomDiskDocValuesFormat a new name so that when we upgrade lucene, we won't use an incompatible version of DiskDocValuesFormat? Thanks, Sean On Wed, Aug 21, 2013 at 8:44 AM, Robert Muir wrote: > On Wed, Aug 21

Re: problem found with DiskDocValuesFormat

2013-08-21 Thread Sean Bridges
hanks, Sean On Tue, Aug 13, 2013 at 4:34 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > DiskDVFormat does not have index back compatibility between minor > releases; maybe that's what you are seeing? So, you must fully > re-index after any DiskDVFormat field after u

Re: Pulsing40PostingsFormat in lucene 4.1

2013-01-29 Thread Sean Bridges
Thanks, we will try the class path trickery. How do we avoid similar situations in the future? Is Pulsing41PostingsFormat going to be maintained in future versions of Lucene? What are the safe PostingFormat/Codecs to use? Every PostingFormat/Codec is @deprecated or @experimental. Sean On

Re: Pulsing40PostingsFormat in lucene 4.1

2013-01-29 Thread Sean Bridges
type org.apache.lucene.codecs.PostingsFormat with name 'Pulsing40' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40, Lucene41, Pulsing41, SimpleText, Memory, BloomFilter, Direct] Thanks, Sean On Tu

Re: delete by docid in lucene 4

2012-07-12 Thread Sean Bridges
Thanks for the advice everyone, I'll try updateDocument() for now. Sean On Thu, Jul 12, 2012 at 3:25 PM, Michael McCandless wrote: > On Thu, Jul 12, 2012 at 6:17 PM, Simon Willnauer > wrote: >> Sean seriously a couple of hundred docs a second, don't bother just >

Re: delete by docid in lucene 4

2012-07-12 Thread Sean Bridges
I don't know if the difference is significant. It would be nice to have a deleteDocument(int docId) in IndexWriter. It seems like it would be easy to add as DocumentsWriter already has a deletedDocID. I can file a jira and submit a patch if this is something that you guys would accept. Sea

Re: delete by docid in lucene 4

2012-07-12 Thread Sean Bridges
Thanks for the tip. Does using updateDocument instead of addDocument affect indexing/search performance? Sean On Thu, Jul 12, 2012 at 9:27 AM, Uwe Schindler wrote: > The trick is to index not with addDocument(Document) but instead with > updateDocument(Term, Document). Lucene then ad

Re: delete by docid in lucene 4

2012-07-12 Thread Sean Bridges
Does that return a Term which matches the lucene docId? What is the value of Constants.DEFAULT_ID_FIELD ? Thanks, Sean On Thu, Jul 12, 2012 at 6:54 AM, Edward W. Rouse wrote: > I get around this by creating an id based term like: > > new Term(Constants.DEFAULT_ID_FIELD, id) > >

Re: delete by docid in lucene 4

2012-07-12 Thread Sean Bridges
. While calculating max and min serial id, if we see a duplicate serial id, we call IndexReader.deleteByDocId(...) . We could check for duplicate serial ids while indexing, but that is racy, and not as efficient. Thanks, Sean On Thu, Jul 12, 2012 at 12:42 AM, Simon Willnauer wrote: > On Thu, Jul

delete by docid in lucene 4

2012-07-11 Thread Sean Bridges
are the same. Thanks, Sean - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Memory question

2012-05-15 Thread Sean Bridges
/apache/cassandra/utils/CLibrary.java Sean On Tue, May 15, 2012 at 1:12 PM, Nader, John P wrote: > We've encountered this issue and came up with a fairly good approach to > address it. > > We are on Lucene 3.0.2 with Java 1.6.0_29.  Our indices are about 35GB in > size.  Our

RE: Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia data?

2011-12-13 Thread Sean Tong
the benchmarks not comparable. Thanks, Sean 3.5.0 Index Stats with modified DocMaker: Number of fields: 4 Number of documents: 200,000 Number of terms: 3,694,904 Has deletions?/Optimized? No/No Index format: -11 (Lucene 3.1) Index functionality: lock-less, single norms, shared doc store, check

RE: Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia data?

2011-12-13 Thread Sean Tong
is at least as good as 2.4.1 or 2.9.4? Do you have any recommendations on indexing configurations/settings? Through my experiments, I found that large flush memory settings (e.g., 64m or 128m) help with the indexing performance for the Wikipedia data in 3.5.0 but not so much in 2.4.1. Thanks,

RE: Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia data?

2011-12-12 Thread Sean Tong
000 2 16.00 101 20 761.95 262.49 63,139,256 91,881,472 The performance is slightly better than the one using StandardAnalyzer, but this is still much worse than the performance with 2.4.1. Sean -Original Message- From: Simon Willna

RE: Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia data?

2011-12-12 Thread Sean Tong
ystemErase { "Populate" CreateIndex { "MAddDocs" AddDoc > : 20 CloseIndex } NewRound } : 3 RepSumByName RepSumByPrefRound MAddDocs #End of wikipedia-default.alg file Thanks, Sean From: Sean Tong [mailto:st...@jamasoftware.com] Sent: Sund

Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia data?

2011-12-11 Thread Sean Tong
indexing speed using 2.4.1 is 2.3x of the speed using 3.5.0. Did I miss any settings or configurations? Thanks, Sean

lucene across many clients

2011-01-13 Thread Sean Joyce
ially problematic. Perhaps a better way would be to create one large index (or several large indices) and use a BitSet to limit the results to only the relevant client. Has anyone worked on a system similar to this one who can provide some architecture advice?
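
A minimal sketch of the BitSet idea in plain Java, using only java.util.BitSet and no Lucene classes. The names ClientFilter, docsForClient, restrict, and the ownerOfDoc array are hypothetical and exist only for illustration; how the per-client bits would be built from a real index is not covered by the message above.

```java
import java.util.BitSet;

public class ClientFilter {

    // ownerOfDoc[docId] holds the client id that owns that document.
    // This array is an assumption made for the sketch.
    public static BitSet docsForClient(int[] ownerOfDoc, int clientId) {
        BitSet bits = new BitSet(ownerOfDoc.length);
        for (int docId = 0; docId < ownerOfDoc.length; docId++) {
            if (ownerOfDoc[docId] == clientId) {
                bits.set(docId);
            }
        }
        return bits;
    }

    // Keeps only the hits whose bit is also set in the client's BitSet.
    // The input BitSet is cloned so the caller's hits are not modified.
    public static BitSet restrict(BitSet hits, BitSet allowed) {
        BitSet result = (BitSet) hits.clone();
        result.and(allowed);
        return result;
    }

    public static void main(String[] args) {
        int[] ownerOfDoc = {1, 2, 1, 3, 2};
        BitSet hits = new BitSet();
        hits.set(0);
        hits.set(1);
        hits.set(2);
        hits.set(4);
        BitSet allowed = docsForClient(ownerOfDoc, 2);
        System.out.println(restrict(hits, allowed)); // prints {1, 4}
    }
}
```

The per-client BitSet can be computed once and reused across searches, which is the main attraction of one shared index over one index per client.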

Re:Using Lucene to search live, being-edited documents

2010-12-28 Thread Sean
Does it make any sense? Every time a search result is shown, the original document could have been changed, no matter how fast the indexing speed is. If you can accept this inconsistency, you do not need to index so frequently at all. -- Original -- From: "s

Re: Analyzer

2010-12-02 Thread Sean
By the way, is there an analyzer which splits each letter of a word? e.g. hello world => h/e/l/l/o/w/o/r/l/d Regards, Sean -- Original -- From: "Erick Erickson"; Date: Tue, Nov 30, 2010 09:07 PM To: "java-user";
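
The requested output can be illustrated in plain Java. This is only a sketch of the splitting behaviour, not a Lucene Analyzer; the class name LetterSplit and the choice to drop whitespace and punctuation are assumptions made to match the example "hello world => h/e/l/l/o/w/o/r/l/d".

```java
import java.util.ArrayList;
import java.util.List;

public class LetterSplit {

    // Returns one token per letter or digit; whitespace and punctuation are skipped.
    // Iterates by code point so supplementary characters stay in one token.
    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                tokens.add(new String(Character.toChars(cp)));
            }
            i += Character.charCount(cp);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", split("hello world"))); // prints h/e/l/l/o/w/o/r/l/d
    }
}
```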

Sample SynonymAnalyzer vs. Lucene 2.2

2007-10-19 Thread Sean Dague
0)' line, it works, but now that throws off the token positions. This probably doesn't matter, but I'm curious what the new preferred approach is here? Thanks in advance, -Sean -- ______ Sean Dague

search timeout

2007-03-15 Thread Sean Timm
e sense to add the feature at the Lucene level rather than implement the feature in each derivative. Thanks, Sean

Re: luceneweb example returning null hrefs

2006-07-28 Thread SEAN MCELROY
Thank you. The example application is now working as expected. Sean Chen Wu <[EMAIL PROTECTED]> wrote: Hi, Please change the "url" to "path" in the result JSP file. coz the field name that is indexed is called "path" rather than "url". Chee

luceneweb example returning null hrefs

2006-07-28 Thread SEAN MCELROY
Hello, I am trying to use the luceneweb application that is shipped with the lucene installation. I have followed the installation instructions and the luceneweb application has been successfully deployed using Tomcat 5.5.9. However all the results returned point to http://localhost:8080/l

Re: SpanQuery parser? Update (ugly hack inside...)

2005-11-07 Thread Sean O'Connor
? If so, you'll also want to account for BooleanQuery, recursively. The surround parser can create both boolean queries and span queries. Sean, as you seem to prefer not to use the surround syntax, do you think this syntax could be improved somehow? I recall trying to make it simpler

Re: SpanQuery parser? Update (ugly hack inside...)

2005-11-07 Thread Sean O'Connor
Erik Hatcher wrote: On 4 Nov 2005, at 18:32, Sean O'Connor wrote: I'm posting this primarily hoping to give back a tiny bit to a very helpful community. More likely however, someone else will open my eyes to an easier approach than what I outline below... I've come up w

Re: SpanQuery parser? Update (ugly hack inside...)

2005-11-04 Thread Sean O'Connor
ct hit found. This is really only useful for "termA near 'some phrase'" at the moment, but might become more advanced in the next 2-3 months. Sean Paul Elschot wrote: On Thursday 20 October 2005 00:40, Sean O'Connor wrote: Hello, I have user entered search

SpanQuery parser?

2005-10-19 Thread Sean O'Connor
er help an existing effort, or just continue with my own hacking. Thanks, Sean ps: some of this message is repeated from previous postings just as background for my goal.

Location of code which determines a Hit for PhraseQuery

2005-09-07 Thread Sean O'Connor
'proper' hit for something like an exact phrase? Apologies in advance for the poor sample text above, and the repetition in question matter. Hopefully I am getting closer to getting my head wrapped around the query/hit process (and then work on extending the hits to

Re: Hits document offset information? Span query or Surround? - thanks

2005-09-06 Thread Sean O'Connor
Thanks for the input. I am looking at the suggested links now. If I make any progress I will return to see if any of my work would be appropriate to contribute back. Sean Paul Elschot wrote: On Tuesday 06 September 2005 08:52, markharw00d wrote: >>I believe I have heard tha

Hits document offset information? Span query or Surround?

2005-09-05 Thread Sean O'Connor
available. It is something I need, even at the cost of search efficiency. Thanks Sean

Re: Can Span Queries contain boolean, prefix and other component queries?

2005-09-04 Thread Sean O'Connor
help, Sean Paul Elschot wrote: Sean, On Sunday 04 September 2005 20:43, Sean O'Connor wrote: Hello, I am trying to do some complex queries such as: [Field contents] The movie Napoleon Dynamite is a movie about a kid named Napoleon who has no Dynamite. [Query] "Napol* Dynam

Re: Lucene contrib (surround), Subversion, and Eclipse

2005-09-04 Thread Sean O'Connor
he benefits of using ant. I'll take a few hours and play with eclipse and its ant integration on my next foray into the sandbox, er, I mean contribs :-). Thanks for the feedback, Sean Chris Hostetter wrote: I don't use Eclipse, (and in fact I've never actually built

Lucene contrib (surround), Subversion, and Eclipse

2005-09-04 Thread Sean O'Connor
ne else might benefit from this information. I assume though that anyone (else) wanting to play with Lucene development code would already be familiar with these steps, so it's probably not an issue. Thanks, Sean

Re: Phrase frequency

2005-09-04 Thread Sean O'Connor
. If I do, I would be happy to share. Good luck, and feel free to post anything you think might be helpful if you implement something. Sean Fabio Cristiano dos Anjos wrote: Hi, How can I get phrase frequency in an index? Thanks in advance

Can Span Queries contain boolean, prefix and other component queries?

2005-09-04 Thread Sean O'Connor
ry? Something like a PhrasePrefixQuery joined to a BooleanQuery by a SpanNearQuery? If not, does anyone have a suggestion on how to do this? I am assuming I will need to do two queries, and determine the 'nearness' of the resulti

Example of Field.TermVector.WITH_POSITIONS_OFFSETS usage?

2005-08-23 Thread Sean O'Connor
of Term, just field name and field contents?) The weight also seems to have an array of TermPositions, which have SegmentTermPositions. I thought this was what I wanted, but I don't see the proper start/end fields, or anything which seems to be on the right track. Can anyone point

Re: Search Hit frequency and location

2005-06-16 Thread Sean O'Connor
to educate myself would be welcome as well. Cheers, Sean Erik Hatcher wrote: On Jun 16, 2005, at 12:03 PM, Sean O'Connor wrote: Yes, see the Javadoc for IndexReader.termPositions(). I'm probably missing the obvious here, but I assume this refers to the analyzed ter

Search Hit frequency and location

2005-06-16 Thread Sean O'Connor
. individual words, possibly transmogrified by the analyzer). I further assume that this does not directly relate to the results of a search for "Lucene in Action". Where do I find information about the search hits? Have I