Thanks for the answers, and thanks for the changes to load doc values to
disk, it will be nice to use a supported codec.
Upgrading our indexes is not an option, as they are very large.
Sean
On Wed, Aug 21, 2013 at 11:15 PM, Robert Muir wrote:
> On Thu, Aug 22, 2013 at 1:48 AM, Sean Brid
code from DiskDocValuesFormat and call it
CustomDiskDocValuesFormat, and give CustomDiskDocValuesFormat a new name so
that when we upgrade lucene, we won't use an incompatible version of
DiskDocValuesFormat?
Thanks,
Sean
On Wed, Aug 21, 2013 at 8:44 AM, Robert Muir wrote:
> On Wed, Aug 21
Thanks,
Sean
On Tue, Aug 13, 2013 at 4:34 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> DiskDVFormat does not have index back compatibility between minor
> releases; maybe that's what you are seeing? So, you must fully
> re-index after any DiskDVFormat field after u
Thanks, we will try the class path trickery.
How do we avoid similar situations in the future? Is Pulsing41PostingsFormat
going to be maintained in future versions of Lucene? Which
PostingsFormats/Codecs are safe to use? Every PostingsFormat/Codec is marked
@deprecated or @experimental.
Sean
On
type
org.apache.lucene.codecs.PostingsFormat with name 'Pulsing40' does not
exist. You need to add the corresponding JAR file supporting this SPI to
your classpath.The current classpath supports the following names:
[Lucene40, Lucene41, Pulsing41, SimpleText, Memory, BloomFilter, Direct]
Thanks,
Sean
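The error above comes from a lookup by name: the index records the name of the format that wrote it, and the reader has to resolve that name on the current classpath. A minimal sketch of that idea follows, using only the JDK. FormatRegistry and its methods are hypothetical illustrations and are not Lucene's actual SPI loader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for a name-based format registry; NOT Lucene's real loader.
final class FormatRegistry {
    // Maps a format name (as recorded in the index) to an implementation label.
    private final Map<String, String> formats = new LinkedHashMap<>();

    void register(String name, String implementation) {
        formats.put(name, implementation);
    }

    // Resolves a recorded name, failing with the list of known names if absent.
    String lookup(String name) {
        String implementation = formats.get(name);
        if (implementation == null) {
            throw new IllegalArgumentException("A format with name '" + name
                    + "' does not exist. Available names: " + formats.keySet());
        }
        return implementation;
    }
}
```

In this model, an index written under the name 'Pulsing40' stays readable only while something is registered under exactly that name, which is presumably what the classpath workaround restores.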
On Tu
Thanks for the advice everyone, I'll try updateDocument() for now.
Sean
On Thu, Jul 12, 2012 at 3:25 PM, Michael McCandless
wrote:
> On Thu, Jul 12, 2012 at 6:17 PM, Simon Willnauer
> wrote:
>> Sean seriously a couple of hundred docs a second, don't bother just
>
I don't know if the
difference is significant.
It would be nice to have a deleteDocument(int docId) in IndexWriter.
It seems like it would be easy to add as DocumentsWriter already has a
deletedDocID. I can file a jira and submit a patch if this is
something that you guys would accept.
Sean
Thanks for the tip.
Does using updateDocument instead of addDocument affect
indexing/search performance?
Sean
On Thu, Jul 12, 2012 at 9:27 AM, Uwe Schindler wrote:
> The trick is to index not with addDocument(Document) but instead with
> updateDocument(Term, Document). Lucene then ad
Does that return a Term which matches the lucene docId? What is the
value of Constants.DEFAULT_ID_FIELD ?
Thanks,
Sean
On Thu, Jul 12, 2012 at 6:54 AM, Edward W. Rouse wrote:
> I get around this by creating an id based term like:
>
> new Term(Constants.DEFAULT_ID_FIELD, id)
>
>
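The id-term approach quoted above relies on update meaning "delete everything matching the id, then add". A toy in-memory model of that behavior is sketched below. ToyIndex is a hypothetical class written for illustration; it does not use Lucene's IndexWriter and says nothing about its performance.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of add vs. update-by-id; NOT Lucene's IndexWriter.
final class ToyIndex {
    // Each entry is {id, body}.
    private final List<String[]> docs = new ArrayList<>();

    // Plain add: never checks for an existing document with the same id.
    void addDocument(String id, String body) {
        docs.add(new String[] {id, body});
    }

    // Update: remove every document carrying this id, then add the new one.
    void updateDocument(String id, String body) {
        docs.removeIf(doc -> doc[0].equals(id));
        docs.add(new String[] {id, body});
    }

    // Number of stored documents with the given id.
    int count(String id) {
        int n = 0;
        for (String[] doc : docs) {
            if (doc[0].equals(id)) {
                n++;
            }
        }
        return n;
    }
}
```

With addDocument alone, indexing the same id twice leaves two copies; with updateDocument, at most one copy per id survives.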
. While calculating max and min serial id, if we see a
duplicate serial id, we call IndexReader.deleteByDocId(...) .
We could check for duplicate serial ids while indexing, but that is
racy, and not as efficient.
Thanks,
Sean
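The single pass described above (track the min and max serial id, and flag duplicates along the way) could look roughly like the sketch below. It assumes the serial ids are available as an array indexed by doc id; SerialIdScan is a hypothetical name, and the actual delete call is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Result of one pass over the serial ids, indexed by (toy) doc id.
final class SerialIdScan {
    final long min;
    final long max;
    final List<Integer> duplicateDocIds;

    private SerialIdScan(long min, long max, List<Integer> duplicateDocIds) {
        this.min = min;
        this.max = max;
        this.duplicateDocIds = duplicateDocIds;
    }

    // Computes min and max, and collects the doc ids whose serial id was
    // already seen earlier in the scan (candidates for deletion).
    static SerialIdScan scan(long[] serialIdByDocId) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        Set<Long> seen = new HashSet<>();
        List<Integer> duplicates = new ArrayList<>();
        for (int docId = 0; docId < serialIdByDocId.length; docId++) {
            long serialId = serialIdByDocId[docId];
            min = Math.min(min, serialId);
            max = Math.max(max, serialId);
            if (!seen.add(serialId)) {
                duplicates.add(docId);
            }
        }
        return new SerialIdScan(min, max, duplicates);
    }
}
```

This sketch keeps the first occurrence of each serial id and flags the later ones; which copy to keep is a design choice the original message does not specify.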
On Thu, Jul 12, 2012 at 12:42 AM, Simon Willnauer
wrote:
> On Thu, Jul
are the same.
Thanks,
Sean
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
/apache/cassandra/utils/CLibrary.java
Sean
On Tue, May 15, 2012 at 1:12 PM, Nader, John P wrote:
> We've encountered this issue and came up with a fairly good approach to
> address it.
>
> We are on Lucene 3.0.2 with Java 1.6.0_29. Our indices are about 35GB in
> size. Our
the benchmarks not comparable.
Thanks,
Sean
3.5.0 Index Stats with modified DocMaker:
Number of fields: 4
Number of documents: 200,000
Number of terms: 3,694,904
Has deletions?/Optimized? No/No
Index format: -11 (Lucene 3.1)
Index functionality: lock-less, single norms, shared doc store, check
is at least as
good as 2.4.1 or 2.9.4? Do you have any recommendations on indexing
configurations/settings? Through my experiments, I found that large flush memory
settings (e.g. 64m or 128m) help indexing performance for the Wikipedia
data in 3.5.0 but not so much in 2.4.1.
Thanks,
000 2 16.00 101 20 761.95 262.49 63,139,256 91,881,472
The performance is slightly better than the one using StandardAnalyzer, but
this is still much worse than the performance with 2.4.1.
Sean
-----Original Message-----
From: Simon Willna
    ystemErase
    { "Populate"
        CreateIndex
        { "MAddDocs" AddDoc > : 20
        CloseIndex
    }
    NewRound
} : 3
RepSumByName
RepSumByPrefRound MAddDocs
#End of wikipedia-default.alg file
Thanks,
Sean
From: Sean Tong [mailto:st...@jamasoftware.com]
Sent: Sund
indexing speed using 2.4.1 is 2.3x the speed using 3.5.0. Did I miss
any settings or configurations?
Thanks,
Sean
ially problematic. Perhaps a better way would be to
create one large index (or several large indices) and use a BitSet to
limit the results to only the relevant client. Has anyone worked
on a system similar to this one who can provide some architecture
advice?
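The single-index idea above can be sketched with java.util.BitSet: build one bit set per client that marks the doc ids it owns, then keep only the hits whose bit is set. The sketch assumes each document's owning client is known by doc id; ClientFilter and its methods are hypothetical names and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy per-client filter over a shared index; NOT a Lucene Filter.
final class ClientFilter {
    // Sets bit i when document i belongs to the given client.
    static BitSet bitsForClient(String[] ownerByDocId, String client) {
        BitSet bits = new BitSet(ownerByDocId.length);
        for (int docId = 0; docId < ownerByDocId.length; docId++) {
            if (client.equals(ownerByDocId[docId])) {
                bits.set(docId);
            }
        }
        return bits;
    }

    // Keeps only the hits the client is allowed to see, preserving hit order.
    static List<Integer> restrict(int[] hits, BitSet allowed) {
        List<Integer> visible = new ArrayList<>();
        for (int docId : hits) {
            if (allowed.get(docId)) {
                visible.add(docId);
            }
        }
        return visible;
    }
}
```

The bit set costs one bit per document per client, so it can be built once and reused across searches until the index changes.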
Does it make any sense?
Every time a search result is shown, the original document could have been
changed, no matter how fast the indexing speed is.
If you can accept this inconsistency, you do not need to index so frequently at
all.
-- Original --
From: "s
By the way, is there an analyzer that splits a word into its individual letters?
e.g.
hello world => h/e/l/l/o/w/o/r/l/d
Regards,
Sean
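For reference, the splitting asked about above can be written in a few lines of plain Java. This is only a sketch of the desired output, assuming whitespace is dropped and every remaining character becomes its own token; LetterSplitter is a hypothetical helper and not a Lucene Analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Splits text into one token per non-whitespace character.
final class LetterSplitter {
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        // Iterate by code point so characters outside the BMP stay intact.
        text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }
}
```

Joining the result of `LetterSplitter.split("hello world")` with "/" gives the form shown in the question.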
-- Original --
From: "Erick Erickson";
Date: Tue, Nov 30, 2010 09:07 PM
To: "java-user";
0)' line, it works, but
now that throws off the token positions. This probably doesn't matter,
but I'm curious what the new preferred approach is here?
Thanks in advance,
-Sean
--
______
Sean Dague
e sense to add the feature at the Lucene level rather
than implement the feature in each derivative.
Thanks,
Sean
Thank you. The example application is now working as expected.
Sean
Chen Wu <[EMAIL PROTECTED]> wrote:
Hi,
Please change "url" to "path" in the result JSP file, because the field
name that is indexed is called "path" rather than "url".
Chee
Hello,
I am trying to use the luceneweb application that is shipped with the lucene
installation. I have followed the installation instructions and the luceneweb
application has been successfully deployed using Tomcat 5.5.9. However all the
results returned point to http://localhost:8080/l
? If so, you'll also want to account for
BooleanQuery, recursively.
The surround parser can create both boolean queries and span queries.
Sean, as you seem to prefer not to use the surround syntax, do you think
this syntax could be improved somehow? I recall trying to make it simpler
Erik Hatcher wrote:
On 4 Nov 2005, at 18:32, Sean O'Connor wrote:
I'm posting this primarily hoping to give back a tiny bit to a very
helpful community. More likely however, someone else will open my
eyes to an easier approach than what I outline below...
I've come up w
ct hit found. This is really only useful for
"termA near 'some phrase'" at the moment, but might become more advanced
in the next 2-3 months.
Sean
Paul Elschot wrote:
On Thursday 20 October 2005 00:40, Sean O'Connor wrote:
Hello,
I have user entered search
er help an existing effort, or just continue with my
own hacking.
Thanks,
Sean
ps: some of this message is repeated from previous postings just as
background for my goal.
'proper' hit for something like an
exact phrase?
Apologies in advance for the poor sample text above, and the
repetition in question matter. Hopefully I am getting closer to getting
my head wrapped around the query/hit process (and then work on extending
the hits to
Thanks for the input. I am looking at the suggested links now. If I make
any progress I will return to see if any of my work would be appropriate
to contribute back.
Sean
Paul Elschot wrote:
On Tuesday 06 September 2005 08:52, markharw00d wrote:
>>I believe I have heard tha
available. It is something I need, even at the cost of search efficiency.
Thanks
Sean
help,
Sean
Paul Elschot wrote:
Sean,
On Sunday 04 September 2005 20:43, Sean O'Connor wrote:
Hello,
I am trying to do some complex queries such as:
[Field contents]
The movie Napoleon Dynamite is a movie about a kid named Napoleon who
has no Dynamite.
[Query]
"Napol* Dynam
he benefits
of using ant. I'll take a few hours and play with Eclipse and its ant
integration on my next foray into the sandbox, er, I mean contribs :-).
Thanks for the feedback,
Sean
Chris Hostetter wrote:
I don't use Eclipse (and in fact I've never actually built
ne else might
benefit from this information. I assume though that anyone (else)
wanting to play with Lucene development code would already be familiar
with these steps, so it's probably not an issue.
Thanks,
Sean
.
If I do, I would be happy to share.
Good luck, and feel free to post anything you think might be helpful if
you implement something.
Sean
Fabio Cristiano dos Anjos wrote:
Hi,
How can I get phrase frequency in an index?
Thanks in advance
ry? Something like a
PhrasePrefixQuery joined to a BooleanQuery by a SpanNearQuery?
If not, does anyone have a suggestion on how to do this? I am
assuming I will need to do two queries, and determine the 'nearness' of
the resulti
of Term, just field name and field contents?)
The weight also seems to have an array of TermPositions, which have
SegmentTermPositions. I thought this was what I wanted, but I don't see
the proper start/end fields, or anything which seems to be on the right
track.
Can anyone point
to
educate myself would be welcome as well.
Cheers,
Sean
Erik Hatcher wrote:
On Jun 16, 2005, at 12:03 PM, Sean O'Connor wrote:
Yes, see the Javadoc for IndexReader.termPositions().
I'm probably missing the obvious here, but I assume this refers to
the analyzed ter
. individual words, possibly transmogrified by
the analyzer).
I further assume that this does not directly relate to the results of
a search for "Lucene in Action". Where do I find information about the
search hits? Have I