Has anyone experienced a latency increase between the above versions?
Mainly in conjunction queries.
Thanks
-John
Hi folks:
Sorry about the cross-post.
Luke is awesome, but sometimes you only have command-line access to
your production boxes. So I wrote CLue, a command-line Lucene interface along
the lines of Luke:
Please take a look and collaborators wanted :)
https://github.com/javasoze/clu
Hi:
I see TermDocs.close not being called when the TermDocs is created by TermQuery:
TermQuery creates it and passes it to TermScorer, and it is never closed.
I see TermDocs.close actually closes the input stream.
Is it safe not closing TermDocs?
Thanks
-John
Any comments?
Did we just unintentionally remove getFieldComparatorSource in 3.0.0?
-John
-- Forwarded message --
From: John Wang
Date: Mon, Dec 21, 2009 at 11:21 AM
Subject: 3.0 api change
To: Lucene Users List ,
lucene-...@jakarta.apache.org
Hi guys:
I noticed
Hi:
I did some performance analysis for different ways of doing numeric
ranging with lucene. Thought I'd share:
http://invertedindex.blogspot.com/2009/11/numeric-range-queries-comparison.html
-John
If you run the zoie test turned to nrt, you can replicate it rather easily:
While the test is running, do lsof on your process, e.g.
lsof -p <pid> | wc
-John
On Thu, Nov 12, 2009 at 8:24 AM, John Wang wrote:
> Well, I have code in the finally block to call IndexReader.close for every
>
> reader you get back from getReader?
>
> Mike
>
> On Sun, Nov 8, 2009 at 10:41 PM, John Wang wrote:
> > I am seeing the same thing, but only when IndexWriter.getReader is called
> > at a high rate.
> >
> > from lsof, I see file handles growing.
> >
I am seeing the same thing, but only when IndexWriter.getReader is called at
a high rate.
from lsof, I see file handles growing.
-John
On Sun, Nov 8, 2009 at 7:29 PM, Daniel Noll wrote:
> Hi all.
>
> We updated to Lucene 2.9, and now we find that after closing our text
> index, it is not possib
Hi guys:
Running into a strange problem:
I am indexing a numeric string into a field:
int n = Math.abs(rand.nextInt(100));
Field myField = new Field(MY_FIELD, String.valueOf(n), Store.NO,
Index.NOT_ANALYZED_NO_NORMS);
myField.setOmitTermFreqAndPositions(true);
doc.add(myField);
n cost -
> in some cases it does not.
>
> But we are talking degradation as you add more segments, not pure speed.
> Degradation is worse now in the sort case.
>
> John Wang wrote:
> > With many other coding that happened in 2.9, e.g. the PQ api etc.,
> sorting
With many other code changes that happened in 2.9, e.g. the PQ api etc., sorting
is actually faster than in 2.4.
-John
On Thu, Oct 22, 2009 at 5:07 AM, Mark Miller wrote:
> Bill Au wrote:
> > Since Lucene 2.9 has per segment searching/caching, does query
> performance
> > degrade less than before (2.9) a
Hi Glen:
I think it is in your application code:
The indexReader returned is not closed if the underlying index has changed.
If your update rate is high, you will run into this issue because GC may not
have caught up with the FH leak.
The code should instead be:
if (indexReader!=null){
I
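The fix described above (if reopen hands back a different reader instance, close the old one rather than leaking its file handles) can be sketched with a toy reader class. ToyReader, its version numbers, and ReaderRefresh are invented for this sketch; they are not Lucene API:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy stand-in for an index reader; only models open/close state.
class ToyReader {
    private final int version;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    ToyReader(int version) { this.version = version; }

    boolean isClosed() { return closed.get(); }
    void close() { closed.set(true); }

    // Mimics reopen(): returns a new instance only if the underlying
    // index has changed, otherwise returns itself.
    ToyReader reopen(int latestVersion) {
        return latestVersion == version ? this : new ToyReader(latestVersion);
    }
}

class ReaderRefresh {
    // The point of the mail: when reopen returns a different instance,
    // close the stale reader instead of waiting for GC to collect it.
    static ToyReader refresh(ToyReader reader, int latestVersion) {
        if (reader != null) {
            ToyReader newReader = reader.reopen(latestVersion);
            if (newReader != reader) {
                reader.close();   // without this, file handles leak
            }
            return newReader;
        }
        return new ToyReader(latestVersion);
    }
}
```

If the update rate is high, relying on GC to reclaim the stale readers is exactly what lets the handle count climb.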
I think it was my email Yonik responded to, and he is right; I was being lazy
and didn't read the javadoc very carefully. My bad.
Thanks for the javadoc change.
-John
On Mon, Oct 12, 2009 at 1:57 PM, Yonik Seeley wrote:
> On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix
> wrote:
> > It may be surpri
Oh, that is really good to know!
Is this deterministic? E.g. as long as writer.addDocument() is called, the next
getReader reflects the change? Does it work with deletes, e.g.
writer.deleteDocuments()?
Thanks Mike for clarifying!
-John
On Mon, Oct 12, 2009 at 12:11 PM, Michael McCandless <
luc...@mik
Given you have 1M docs and about 1M terms, do you see very few docs per
term?
If your DocSet per term is very sparse, a BitSet is probably not a good
representation. A simple int array may be better for memory, and faster to
iterate.
-John
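The trade-off above can be illustrated in plain Java (the doc ids, index size, and class names here are made up for the sketch; Lucene's own DocIdSet classes are not used):

```java
import java.util.Arrays;
import java.util.BitSet;

class SparseDocSetDemo {
    // A sparse doc set: 10 documents scattered over a 1M-doc index.
    static final int MAX_DOC = 1_000_000;
    static final int[] DOCS = {3, 1_024, 50_000, 99_999, 250_000,
                               400_001, 612_345, 700_000, 812_000, 999_999};

    // BitSet representation: one bit per document in the index, so the
    // memory cost is O(maxDoc) no matter how few docs actually match.
    static BitSet asBitSet() {
        BitSet bits = new BitSet(MAX_DOC);
        for (int d : DOCS) bits.set(d);
        return bits;
    }

    // Sorted int[] representation: memory is O(matching docs), and
    // iteration touches only the matching docs.
    static int[] asIntArray() {
        int[] sorted = DOCS.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    static int countViaBitSet(BitSet bits) {
        int n = 0;
        // nextSetBit has to scan; a very sparse set wastes work
        // skipping over long runs of zero bits between hits.
        for (int d = bits.nextSetBit(0); d >= 0; d = bits.nextSetBit(d + 1)) n++;
        return n;
    }
}
```

With 10 matches out of 1M docs, the BitSet burns roughly 125 KB while the int array needs 40 bytes.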
On Mon, Oct 12, 2009 at 8:45 AM, Paul Elschot wrote:
> On
Hi guys:
The new FieldComparator api looks really scary :)
But after some perf testing with numbers I'd like to share, I guess it
is worth it:
HW: Mac Pro with 16G memory
jvm: 1.6.0_13
jvm arg: -Xms1g -Xmx1g -server
setup
index:
1M docs even split into 8 segments (to make sure the test
Eric:
For more specific Zoie questions, let's move it to the zoie discussion
group instead.
Thanks
-John
On Sun, Oct 11, 2009 at 2:31 PM, John Wang wrote:
> Hi Eric:
>
> I regret the direction the thread has taken and partly take responsibility
> for it...
>
> As t
globe. Sometimes there are differences of
> opinion, however those are easily ironed out over time (and quite
> frankly in this case benchmarks).
>
> However I am very concerned about your ignorant disregard of some of the
> most basic human rights in existence.
>
> -J
I can provide some preliminary numbers (we will need to do some detailed
analysis and post it somewhere):
Dataset: medline
starting index: empty.
add only, no update, for 30 min.
maximum indexing load, 1000 docs/sec
Under stress, we take indexing events (add only) and stream into both
systems: Z
Jason:
I would really appreciate it if you would stop spreading false
statements and misinformation. Everyone is entitled to his/her opinions on
technologies, but deliberately spreading misleading and false information on
such a distribution is just unethical, and you'll end up just discrediting
Looking at the code, there seems to be a disconnect between how/when the field
cache is loaded when IndexWriter.getReader() is called.
Is the FieldCache updated? Otherwise, are we reloading the FieldCache for each
reader instance?
It seems that for operations that lazy-load the field cache, e.g. sorting, this
has a signif
If you escape the character + or #, then for the sentence
"I know java + c++", the + would not be skipped. Furthermore, it breaks query
parsing, where + is reserved.
-John
On Thu, Jul 16, 2009 at 9:04 AM, John Wang wrote:
> This runs into problems when you have such following sentence:
> "I
This runs into problems with a sentence like the following:
"I dislike c++."
If you use WSA, then the last token is "c++.", not "c++", and the query would
not find this document.
-John
On Thu, Jul 16, 2009 at 8:29 AM, Chris Salem wrote:
> That seems to be working. you don't have to escape the plus
Hi guys:
Running into a question with IndexWriter.addIndexesNoOptimize:
I am trying to expand a smaller index by replicating it into a larger
index. So I am adding the same directory N times.
I get an exception because noDupDirs(dirs) fails. For this call, is this
check necessary?
You are right, Grant. Michael, Anmol, let's move this to the kamikaze mailing
list:
http://groups.google.com/group/kamikaze-users
Michael, I have added you by default.
-John
On Thu, Apr 30, 2009 at 4:37 PM, Grant Ingersoll wrote:
> Does Kamikaze have a mailing list? It seems like, to me anyway,
What analyzers are you using for query and indexing? Can you also post
some code showing how you indexed?
-John
On Fri, Apr 24, 2009 at 8:02 PM, blazingwolf7 wrote:
>
> Hi,
>
> I created a query that will find a match inside documents. Example of text
> match "terror india"
> And documents with this
Hi Michael:
We are using it internally here at LinkedIn for both our search engine
as well as our social graph engine. And we have a team developing actively
on it. Let us know how we can help you.
-John
On Fri, Apr 24, 2009 at 1:56 PM, Michael Mastroianni <
mmastroia...@glgroup.com> wrote:
Karsten:
Yes, you kinda need that for faceting to work. Take a look at
FacetDataCache class.
-John
On Wed, Apr 22, 2009 at 3:06 AM, Karsten F.
wrote:
>
> Hi Dave,
>
> facets:
> in you case a solution with one
> int[IndexReader.maxDoc()]
> fits. For each document number you can store an inte
Hi David:
We built bobo-browse specifically for these types of use cases:
http://code.google.com/p/bobo-browse
Let me know if you need any help getting it going.
-John
On Mon, Apr 20, 2009 at 12:59 PM, Karsten F.
wrote:
>
> Hi David,
>
> correct: you should avoid reading the content o
Is there a reason the Query is built from a bitset via a ConstantScoreQuery
instead of a RangeQuery? Seems we would be paying a penalty for loading the
bitset, especially since the bitset would be rather sparse.
Furthermore, is TrieRangeQuery planned to be used somehow in the spatial
package?
Thanks
-John
On
From what little I know about GSA, there isn't a distributed solution (old
information, not sure if it is still the case), so it is not very easy to
scale your search system. That is something you can achieve rather easily with
a Lucene/Solr implementation.
There are other benefits of using an open source solution s
> John mentions.
>
> -Grant
>
>
> On Apr 3, 2009, at 7:24 PM, John Wang wrote:
>
> Not quite. For example, # of fields is static throughout the corpus. # zones
>> is per document. E.g. let's say you have 1 million docs, some docs have 2
>> paragraphs,
rch, but came up empty handed.
>
> Thanks for your time!
>
> Matthew Runo
> Software Engineer, Zappos.com
> mr...@zappos.com - 702-943-7833
>
> On Apr 3, 2009, at 10:08 AM, John Wang wrote:
>
> > Verity VDK, which was bought by autonomy, has zone search. S
> Thanks for your time!
>
> Matthew Runo
> Software Engineer, Zappos.com
> mr...@zappos.com - 702-943-7833
>
>
> On Apr 3, 2009, at 10:08 AM, John Wang wrote:
>
> Verity VDK, which was bought by autonomy, has zone search. Something
>> lucene
>> currently does not
Verity VDK, which was bought by autonomy, has zone search. Something lucene
currently does not support.
We have implemented it on top of lucene and are thinking about contributing.
-John
On Fri, Apr 3, 2009 at 8:56 AM, Lukáš Vlček wrote:
> Hi,
> anybody has experience with Automony search technolog
m doing?
BTW, can you shine some light on why IndexWriter would move docids around
when it is opened and no docs have been added to it?
Thanks
-John
On Thu, Apr 2, 2009 at 2:20 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Wed, Apr 1, 2009 at 6:37 PM, John Wang wrot
PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Wed, Apr 1, 2009 at 5:22 PM, John Wang wrote:
> > Hi Michael:
> >
> >1) Yes, we use TermDocs, exactly what
> IndexWriter.deleteDocuments(Term)
> > is doing under the cover.
>
> This
ess <
luc...@mikemccandless.com> wrote:
> On Wed, Apr 1, 2009 at 2:04 PM, John Wang wrote:
>
> > My test is essentially this. I took out the reader.deleteDocuments call from
> > both scenarios. I took an index of 5m docs, a batch of 1 randomly
> > generated uids.
> >
Thanks Michael for the info.
I do guarantee there are no modifications between when
"MySpecialIndexReader" is loaded and when I iterate and find the deleted
docids. I was, however, not aware that docids move when IndexWriter is opened.
I thought they moved only when docs are added and when it is committed.
how
> would you produce that docIdSet?
>
> We could consider delete by Filter instead, since that exposes the
> necessary getDocIdSet(IndexReader) method.
>
> Or, with near real-time search, we could enhance it to allow deletions
> via the obtained reader (the first approach doesn&
So do you think it is a good addition/change to the current api now?
-John
On Tue, Mar 31, 2009 at 2:18 PM, Yonik Seeley wrote:
> On Tue, Mar 31, 2009 at 4:58 PM, John Wang wrote:
> > I fail to see the difference of exposing the api to allow for a Query
> > instance to be
Excellent!
Thanks
-John
On Tue, Mar 31, 2009 at 2:25 PM, Yonik Seeley wrote:
> On Tue, Mar 31, 2009 at 4:55 PM, John Wang wrote:
> > Maybe I am missing something. I don't see any calls that would gimme the
> > number of segments. Are you suggesting:
> IndexCom
eeley
wrote:
> On Tue, Mar 31, 2009 at 3:41 PM, John Wang wrote:
> > Also, can we expose IndexWriter.deleteDocuments(int[] docids)?
>
> Exposing internal ids from the IndexWriter may not be a good idea
> given that they are transient.
>
>
> -Yonik
Maybe I am missing something. I don't see any calls that would gimme the
number of segments. Are you suggesting: IndexCommit.getFileNames().size()?
Thanks
-John
On Tue, Mar 31, 2009 at 1:04 PM, Yonik Seeley wrote:
> On Tue, Mar 31, 2009 at 3:43 PM, John Wang wrote:
> > Can we ha
Can we have an API that exposes index information, e.g. number of segments
etc.? (or simply make SegmentInfo(s) public classes)
We currently do this by working around the package-level protection by sneaking
a subclass into the org.apache.lucene.index package. We are moving towards OSGi,
and split-packages
Hi guys:
The IndexWriter.deleteDocuments(Query query) api is not really making sense
to me. Wouldn't IndexWriter.deleteDocuments(DocIdSet set) be better, since
we don't really care about scoring for this call?
Also, can we expose IndexWriter.deleteDocuments(int[] docids)? Using the
current api is
Elschot wrote:
> John,
>
> On Sunday 08 February 2009 00:35:10 John Wang wrote:
> > Our implementation of facet search can handle this. Using bitsets for
> > intersection is not scalable performance wise when index is large.
> >
> > We are using a compact forwarded i
Our implementation of facet search can handle this. Using bitsets for
intersection is not scalable performance-wise when the index is large.
We are using a compact forward index representation in memory for the
counting. Similar to the FieldCache idea but more compact.
Check it out at: http://sourcefor
Luke is great, but sometimes you don't have a windowing system installed on
the target machine. A webapp like LIMO is very useful. It is unfortunate
that it is not being maintained.
-John
On Mon, Jan 26, 2009 at 3:44 PM, Chris Hostetter
wrote:
>
> : I need to monitor my searches and index. i kn
Mike:
"We are considering replacing the current random-access
IndexReader.isDeleted(int docID) method with an iterator & skipTo
(DocIdSet) access that would let you iterate through the deleted
docIDs, instead."
This is exactly what we are doing. We do have to however, build the
intern
> > On Wednesday 07 January 2009 07:25:17 John Wang wrote:
> > > Hi:
> > >
> > >The default buffer size (for docid,score etc) is 32 in TermScorer.
> > >
> > > We have a large index with some terms to have very dense doc sets.
> By
Hi:
The default buffer size (for docid, score, etc.) is 32 in TermScorer.
We have a large index with some terms that have very dense doc sets. By
increasing the buffer size we see very dramatic performance improvements.
With our index (may not be typical), here are some numbers with buffer
ieldable you'll find:
>
> /** Expert:
> *
> * If set, omit term freq, positions and payloads from postings for this
> field.
> */
> void setOmitTf(boolean omitTf);
>
> - Mark
>
>
> John Wang wrote:
>
>> Thanks Mark!I don't think it is documented (a
Thanks Mark! I don't think it is documented (at least in the ones I've read);
should this be considered a bug or ... ?
Thanks
-John
On Thu, Dec 18, 2008 at 2:05 PM, Mark Miller wrote:
> Drops positions as well.
>
> - Mark
>
>
>
> On Dec 18, 2008, at 4:57 PM, &quo
Hi:
In lucene 2.4, when Field.omitTF() is called, payloads are disabled as
well. Is this intentional? My understanding is that payloads are independent of
term frequencies.
Thanks
-John
between solr and browseengine ?
>
> Thanks for mention browseengine. I really like the car demo!
>
> Best regards
> Karsten
>
>
> John Wang wrote:
> >
> > We are doing lotsa internal changes for performance. Also upgrading the
> > api
> > to support
wsing:
> starting point is
> org.cdlib.xtf.textEngine.facet.GroupCounts#addDoc
> ?
> (It works with millions of facet values on millions of hits)
>
> What is the starting point in browseengine?
>
> How is the connection between solr and browseengine ?
>
> Thanks for mention browseengine. I really like t
We are doing lotsa internal changes for performance. Also upgrading the api
to support more features. So my suggestion is to wait for 2.0 (should
release this month, at the latest mid-Jan). We can take this offline if
you want to have a deeper discussion on the browse engine.
Thanks
-John
On Thu
We are doing a release shortly which contains an API change. Let us know if you
need help.
-John
On Wed, Dec 10, 2008 at 11:27 AM, John Wang <[EMAIL PROTECTED]> wrote:
> www.browseengine.com
> -John
>
>
> On Wed, Dec 10, 2008 at 10:55 AM, Glen Newton <[EMAIL PROTECTED]
www.browseengine.com
-John
On Wed, Dec 10, 2008 at 10:55 AM, Glen Newton <[EMAIL PROTECTED]> wrote:
> From what I understand:
> faceted browse is a taxonomy of depth =1
>
> A taxonomy in general has an arbitrary depth:
>
> Example: Biological taxonomy:
>
> Kingdom Animalia
> Phylum Acanthocepha
Hi Cooper:
Where are these classes?
Thanks
-John
On Tue, Dec 9, 2008 at 2:27 AM, Cooper Geng <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> My application will provide Chinese search engine. I got some analyzer on
> Chinese language.
> Any suggestion about these:
>
> IK_CAnalyzer
> IKAnalyzer
>
>
The obvious way is to use MatchAllDocsQuery with Sort parameters on the
searcher, e.g.
searcher.search(new MatchAllDocsQuery(), sort);
If you only care about 1 sort spec (e.g. no secondary sort to break ties), it
may be faster to just traverse the term table, since that is already sorted.
-John
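The second approach works because the term dictionary is already in sorted order. A toy model, with a TreeMap standing in for the term table (field value → doc ids; the terms and docs are invented for the sketch), shows why no extra sort step is needed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class TermOrderDemo {
    // Walk the sorted "term table": because terms are visited in sorted
    // order, the collected docs come out sorted by field value with no
    // explicit sort.
    static List<Integer> docsInTermOrder(TreeMap<String, List<Integer>> termTable) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> postings : termTable.values()) {
            out.addAll(postings);
        }
        return out;
    }

    // Tiny made-up index: term -> docs containing it.
    static List<Integer> demo() {
        TreeMap<String, List<Integer>> termTable = new TreeMap<>();
        termTable.put("banana", List.of(2));
        termTable.put("apple", List.of(0, 3));
        termTable.put("cherry", List.of(1));
        return docsInTermOrder(termTable);
    }
}
```

The TreeMap plays the role Lucene's sorted term dictionary plays; iterating it is the "traversal" the mail refers to.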
searcher.doc(scoreDoc.doc);
On Thu, Dec 4, 2008 at 6:59 PM, Ian Vink <[EMAIL PROTECTED]> wrote:
> I have this search which returns TopDocs
> TopDocs topDocs = searcher.Search(query, bookFilter, maxDocsToFind);
>
>
> How do I get the document object for the ScoreDoc?
>
> foreach (ScoreDoc scoreDo
On Thu, Dec 4, 2008 at 5:46 PM, Muralidharan V <[EMAIL PROTECTED]>wrote:
> John,
>
> Using the FieldCache worked well. Thanks!
>
> -Murali
>
> On Thu, Dec 4, 2008 at 3:10 PM, John Wang <[EMAIL PROTECTED]> wrote:
>
> > Easiest way to do thi
Easiest way to do this is using the FieldCache. It constructs a StringIndex
object which gives you very fast lookup of the field value (index) given a
docid. Create a count array parallel to the lookup array of the
StringIndex. Running your HitCollector through it should be fast.
Loading the FieldCache may be ex
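The counting idea can be modeled in plain Java: a per-doc ordinal array (the role FieldCache's StringIndex plays) plus a parallel count array. The field values and doc count below are made up for the sketch:

```java
class FacetCountDemo {
    // Per-doc ordinal into the values table, as a FieldCache StringIndex
    // would provide: ORDER[doc] = index of that doc's field value.
    static final String[] VALUES = {"red", "green", "blue"};
    static final int[] ORDER = {0, 2, 2, 1, 0, 2};   // 6 docs

    // Parallel count array: one slot per distinct field value.
    static int[] countHits(int[] hitDocs) {
        int[] counts = new int[VALUES.length];
        // For each collected doc, a single array lookup bumps its
        // value's counter -- no term lookups at collect time.
        for (int doc : hitDocs) {
            counts[ORDER[doc]]++;
        }
        return counts;
    }
}
```

The per-hit cost is two array reads and an increment, which is why running a collector over the cache is fast once the cache is loaded.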
, could someone explain?
>
> thanks,
> -glen
>
>
> 2008/12/4 John Wang <[EMAIL PROTECTED]>:
> > Thanks!
> > -John
> >
> > On Thu, Dec 4, 2008 at 2:16 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> >> Details in the bug:
ockFactory);
> }
>
> -Yonik
>
>
> On Thu, Dec 4, 2008 at 5:08 PM, John Wang <[EMAIL PROTECTED]> wrote:
> > That does not help. The File/path is not stored with the instance. It is
> in
> > a map FSDirectory keeps statically. Should subclasses of FSDirectory
Tim:
How about implementing your own HitCollector and stopping when you have
collected 100 docs with a score above a certain threshold?
BTW, are there lotsa concurrent searches?
-John
On Thu, Dec 4, 2008 at 12:52 PM, Tim Sturge <[EMAIL PROTECTED]> wrote:
> That makes sense. I should be more p
..what version are we talking about? :-)
>
> The current development version of Lucene allows you to directly
> instantiate FSDirectory subclasses.
>
> -Yonik
>
>
> > thanks,
> >
> > Glen
> >
> > 2008/12/4 Yonik Seeley <[EMAIL PROTECTED]>:
>
Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Thu, Dec 4, 2008 at 4:11 PM, John Wang <[EMAIL PROTECTED]> wrote:
> > Hi guys:
> >We did some profiling and benchmarking:
> >
> >The thread contention on FSDirectory is gone, and for the set of
> queries
>
Hi guys:
We did some profiling and benchmarking:
The thread contention on FSDirectory is gone, and for the set of queries
we are running, performance improved by a factor of 5 (to be conservative).
Great job, this is awesome: a simple change that made a huge difference.
To get NIO
eader (call
> its incRef()) and then decRef() it when you're done. That would probably be
> cleanest...
>
> Mike
>
>
> On Jun 29, 2008, at 11:51 AM, John Wang wrote:
>
> Hi:
>> I had some code to do indexReader pooling to avoid open and close on a
>> large
Hi:
I had some code to do indexReader pooling to avoid open and close on a
large index when doing lotsa searches. So I had a FilteredIndexReader proxy
that overrides the doClose method to do nothing, and when I really want to
close it, I call super.doClose(). This pattern worked well for me prior
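Mike's suggestion in the quote above (reference counting instead of a no-op doClose) can be sketched with an AtomicInteger. RefCountedReader is an invented class for illustration, not Lucene's API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of reference-counted reader pooling: each borrower calls
// incRef() before use and decRef() when done; the underlying resource
// is released only when the count drops to zero.
class RefCountedReader {
    private final AtomicInteger refCount = new AtomicInteger(1); // pool's ref
    private volatile boolean closed = false;

    void incRef() {
        if (refCount.getAndIncrement() <= 0) {
            throw new IllegalStateException("reader already closed");
        }
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true;   // stands in for the real doClose()
        }
    }

    boolean isClosed() { return closed; }
}
```

The pool holds one reference; a searcher borrows with incRef() and returns with decRef(), so the real close happens exactly once, when the last user is done.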
Maybe building a Lucene gateway to hook in with VSpider.
Are you using VSpider or K2Spider?
-John
On Tue, Jun 24, 2008 at 8:35 PM, yugana <[EMAIL PROTECTED]> wrote:
>
> Hi Otis,
>
> Thanks for the reply. So you mean it is not possible to use Lucene to index
> the fetched (Verity Spider Content)
Hi:
I am trying to add a couple more values to the TermInfo file and want to
keep the index backward compatible. But I see values such as docFreq etc.
are stored as VInts, so I couldn't do things like use the sign bit to
determine whether to read/write the extra values.
Any suggestions?
(
How big is your index?
Thanks
-John
On Thu, May 29, 2008 at 10:29 AM, Michael Busch <[EMAIL PROTECTED]> wrote:
> Does your FilteredIndexReader.reopen() return a new instance of
> FilteredIndexReader in case the inner reader was updated (i. e.
> in!=newInner)?
>
>
> -
fig);
}
fixes my leak.
-John
On Thu, May 29, 2008 at 12:35 AM, Michael Busch <[EMAIL PROTECTED]> wrote:
> Could you share some details about how you implemented reopen() in your
> reader?
>
> -Michael
>
>
> John Wang wrote:
>
>> Yes, I do close the old reader.
>
with the reference
>> counting. Are you doing anything special? E. g. do you have own reader
>> implementations that you call reopen() on? What kinds of readers are you
>> using?
>>
>> Are you maybe able to provide a heapdump?
>>
>> -Michael
Hi:
We are experiencing a memory leak when calling IndexReader.reopen().
From eyeballing the lucene source code, I see that normCache is not
cleared.
Anyone else experiencing this?
Thanks
-John
I see. So is it then the bailey project?
-John
On Tue, May 20, 2008 at 9:04 PM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:
> Oh, it very much did. Check Hadoop Wiki's "Recent Changes", it's there.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> - Original
Hi:
What is the current status on the distributed lucene project proposed at:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg00338.html
Thanks
-John
If your indexed field is not used for further filtering of docs or for
further scoring, you should use some sort of priority-queue mechanism to
gather the top N documents. You can then call reader.document() on those
docs if necessary.
-John
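The priority-queue idea can be sketched with java.util.PriorityQueue used as a size-bounded min-heap (the scores and N below are arbitrary, chosen just for the sketch):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopNDemo {
    // Keep only the N best scores seen so far: a min-heap of size N
    // whose head is the weakest of the current top N.
    static List<Double> topN(double[] scores, int n) {
        PriorityQueue<Double> heap = new PriorityQueue<>(n);
        for (double s : scores) {
            if (heap.size() < n) {
                heap.offer(s);
            } else if (s > heap.peek()) {
                heap.poll();     // evict the current weakest
                heap.offer(s);
            }
        }
        List<Double> top = new ArrayList<>(heap);
        top.sort(Collections.reverseOrder());  // best first
        return top;
    }
}
```

Each hit costs at most O(log N), independent of the total hit count, which is why collecting the top N this way beats materializing and sorting all hits.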
On Sat, May 10, 2008 at 6:35 AM, Stephane Nicoll <[EMAIL
TED]> wrote:
> On Thursday 01 May 2008 00:01:48 John Wang wrote:
> > I am not sure how well lucene would perform with > 2 Billion docs in a
> > single index anyway.
>
> Even if they're in multiple indexes, the doc IDs being ints will still
> prevent
> it going pa
;
> That said, Lucene needs to support >2B, so docids (and all associated
> internals) need to become 'long' fairly soon
>
> -Glen
>
> 2008/4/30 John Wang <[EMAIL PROTECTED]>:
> > lucene docids are represented in a java int, so max signed int would be
> the
lucene docids are represented in a java int, so max signed int would be the
limit, a little over 2 billion.
-John
On Wed, Apr 30, 2008 at 11:54 AM, Sebastin <[EMAIL PROTECTED]> wrote:
>
> Hi All,
> Does Lucene supports Billions of data in a single index store of size 14
> GB
> for every search.I
Another use is for custom Query objects to reboost or expand the user query
using information gathered from the indexReader at search time.
-John
On Mon, Apr 7, 2008 at 2:56 PM, Paul Elschot <[EMAIL PROTECTED]> wrote:
> Itamar,
>
> Query rewrite replaces wildcards with terms available from
> the ind
check out http://www.browseengine.com
tag cloud impl on lucene is avail.
-John
On Wed, Apr 2, 2008 at 4:12 PM, Daniel Noll <[EMAIL PROTECTED]> wrote:
> On Thursday 03 April 2008 08:08:09 Dominique Béjean wrote:
> > Hum, it looks like it is not true.
> > Use a do-while loop make the first terms.t
Apparently tp.nextPosition() is needed :(
Any ideas?
-John
On Thu, Apr 3, 2008 at 8:20 AM, John Wang <[EMAIL PROTECTED]> wrote:
> I am loading both from disk.
> But I found the culprit:
>
> My code:
>
> while (tp.next())
>
> {
>
>
>
>
> On Thu, Apr 3, 2008 at 7:36 AM, John Wang <[EMAIL PROTECTED]> wrote:
> > Sorry, gmail was screwy and accidentally sent the msg.
> &g
d
cache load, and it took much longer than when it had 1000.
I did some profiling, and the profiler points to TermPositions.next,
TermPositions.nextPosition, and TermPositions.getPayload as the culprits.
I would think payload would always be faster. Any ideas?
Thanks
-John
On Thu, Apr 3, 2008 a
Hi:
Hi Grant:
I don't see FunctionQuery in the javadocs. Can you post a link?
Thanks
-john
On Mon, Mar 24, 2008 at 3:32 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:
> See the FunctionQuery and the org.apache.lucene.search.function
> package. You can also implement your own query, as it's n
Tue, Mar 25, 2008 at 11:16 AM, Erik Hatcher <[EMAIL PROTECTED]>
wrote:
>
> On Mar 25, 2008, at 1:32 PM, John Wang wrote:
> >Is there a way to random accessing term value in a field? e.g.
> >
> >in my field, content, the terms are: lucene, is, cool
>
Hi:
Is there a way to randomly access a term value in a field? e.g.
in my field, content, the terms are: lucene, is, cool
Is there a way to access content[2] -> cool?
Thanks
-John
We are running on one box in prod with 20 million docs in one index.
-John
On Fri, Mar 14, 2008 at 8:01 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:
> How big is your machine and how big are your docs? (unique terms,
> etc.) Even if it would fit, it sounds like you are going to have to
> go d
ene.document.Document,%20org.apache.lucene.analysis.Analyzer%29>
> .
>
>
>
> On Mar 13, 2008, at 4:12 PM, John Wang wrote:
>
> > Hi Grant:
> >
> >For our corpus, we don't rely on idf in scoring calculation that
> > much,
> > so I don't see that being
8 at 11:37 AM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:
>
> On Mar 13, 2008, at 11:03 AM, John Wang wrote:
>
> > Yes, but usually it's a good idea to add documents in batch and not
> > having
> > to reinstantiate the writer for every document and then closing
,
> thus your application can identify the language, choose the analyzer
> for the given doc, and then add the document
>
> See
> public void addDocument(Document doc, Analyzer analyzer)
>
>
> On Mar 12, 2008, at 8:40 PM, John Wang wrote:
>
> > Hi all:
> >
Hi all:
Maybe this has been asked before:
I am building an index consisting of multiple languages (stored as a
field), and I have different analyzers depending on the language of the
document to be indexed. But the IndexWriter takes only an Analyzer.
I was hoping to have IndexWriter t
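The per-document analyzer choice suggested in the quote above (pick the analyzer by the doc's language, then call addDocument(Document, Analyzer)) can be modeled as a simple dispatch map. The language codes and the tokenizing functions below are invented stand-ins for real Analyzer instances:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class AnalyzerDispatchDemo {
    // Stand-ins for per-language analyzers: each is just a tokenizing
    // function here; in Lucene these would be Analyzer instances.
    static final Map<String, Function<String, List<String>>> ANALYZERS = Map.of(
        "en", text -> List.of(text.toLowerCase().split("\\s+")),
        "de", text -> List.of(text.toLowerCase().split("\\s+"))  // placeholder
    );

    // Pick the analyzer by the document's language field, then "index"
    // the text with it, defaulting to English for unknown languages.
    static List<String> tokenize(String lang, String text) {
        Function<String, List<String>> analyzer =
            ANALYZERS.getOrDefault(lang, ANALYZERS.get("en"));
        return analyzer.apply(text);
    }
}
```

The application identifies the language, looks up the matching analyzer, and hands both the document and that analyzer to the writer.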
You can always modify the raw Lucene score in the HitCollector.
-John
On Wed, Mar 5, 2008 at 1:16 PM, sumittyagi <[EMAIL PROTECTED]> wrote:
>
> is there any way to change the score of the documents.
> Actually i want to modify the scores of the documents dynamically,
> everytime
> for a given que