Re: Using setIndexSort on a binary field

2021-10-18 Thread Alex K
. The IndexRearranger is also good to know about. Cheers, Alex On Sun, Oct 17, 2021 at 9:32 AM Michael Sokolov wrote: > Yeah, index sorting doesn't do that -- it sorts *within* each segment > so that when documents are iterated (within that segment) by any of > the many DocIdSetIterators that

Re: Using setIndexSort on a binary field

2021-10-15 Thread Alex K
like what I described possible? Any clarification would be great. Thanks, Alex On Fri, Oct 15, 2021 at 12:43 PM Adrien Grand wrote: > Hi Alex, > > You need to use a BinaryDocValuesField so that the field is indexed with > doc values. > > `Field` is not going to work because it

Using setIndexSort on a binary field

2021-10-15 Thread Alex K
g the java.util.Arrays.compareUnsigned method to sort the fields. Thanks, Alex
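For reference, here is a minimal JDK-only sketch (Java 9+, no Lucene classes) of the unsigned lexicographic byte order that java.util.Arrays.compareUnsigned provides. The class and method names are hypothetical. The sketch only shows that a 0xFF byte sorts last under unsigned comparison, whereas a signed comparison would treat it as -1 and sort it first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnsignedByteSort {
    // Returns a copy of the input sorted in unsigned lexicographic order:
    // bytes compare as 0..255, and a shorter array sorts before a longer one
    // that starts with it.
    public static List<byte[]> sortUnsigned(List<byte[]> values) {
        List<byte[]> copy = new ArrayList<>(values);
        copy.sort(Arrays::compareUnsigned);
        return copy;
    }

    public static void main(String[] args) {
        List<byte[]> values = List.of(
                new byte[] {(byte) 0xFF},
                new byte[] {0x01, 0x02},
                new byte[] {0x01});
        // Prints [1], [1, 2], [-1]; Arrays.toString shows bytes as signed.
        for (byte[] v : sortUnsigned(values)) {
            System.out.println(Arrays.toString(v));
        }
    }
}
```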

Re: Control the number of segments without using forceMerge.

2021-07-05 Thread Alex K
.de/sites/berlinbuzzwords.de/files/2021-06/The%20future%20of%20Lucene%27s%20MMapDirectory.pdf>, and his great post about MMapDirectory from a few years ago <https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html>. Definitely recommended for others. Thanks, Alex On Mon, Jul 5,

Re: Control the number of segments without using forceMerge.

2021-07-05 Thread Alex K
ed on some experimenting and reading the code, it seems to be quite complicated, especially once you start calling addDocument from several threads in parallel. It's good to learn about the MultiReader. I'll look into that some more. Thanks, Alex On Mon, Jul 5, 2021 at 9:14 AM Uwe Schindler wrot

Re: Does Lucene have anything like a covering index as an alternative to DocValues?

2021-07-05 Thread Alex K
Hi Uwe, Thanks for clarifying. That makes sense. Thanks, Alex Klibisz On Mon, Jul 5, 2021 at 9:22 AM Uwe Schindler wrote: > Hi, > > Sorry I misunderstood you question, you want to lookup the UUID in another > system! > Then the approach you are doing is correct. Either store

Control the number of segments without using forceMerge.

2021-07-04 Thread Alex K
ds without force-merging after adding all of the documents? Thanks in advance for any tips Alex Klibisz

Does Lucene have anything like a covering index as an alternative to DocValues?

2021-07-04 Thread Alex K
n the actual index, rather than using DocValues? Thanks in advance for any tips Alex Klibisz

Re: Lucene/Solr and BERT

2021-05-26 Thread Alex K
Thanks Michael. IIRC, the thing that was taking so long was merging into a single segment. Is there already benchmarking code for HNSW available somewhere? I feel like I remember someone posting benchmarking results on one of the Jira tickets. Thanks, Alex On Wed, May 26, 2021 at 3:41 PM Michael

Re: Lucene/Solr and BERT

2021-05-25 Thread Alex K
klibisz.elastiknn / lucene <https://search.maven.org/artifact/com.klibisz.elastiknn/lucene/7.12.1.0/jar> and com.klibisz.elastiknn / models <https://search.maven.org/artifact/com.klibisz.elastiknn/models/7.12.1.0/jar>. The tests are Scala but all of the implementation is in Java. Thanks, Alex On

Re: Lucene/Solr and BERT

2021-04-21 Thread Alex K
There were a couple additions recently merged into lucene but not yet released: - A first-class vector codec - An implementation of HNSW for approximate nearest neighbor search They are however available in the snapshot releases. I started on a small project to get the HNSW implementation into the

Re: How to access block-max metadata?

2020-10-12 Thread Alex K
ected > that advance doesn't help much over nextDoc. advanceShallow is rarely a > drop-in replacement for advance since it's unable to tell whether a > document matches or not, it can only be used to reason about maximum scores > for a range of doc IDs when combined with Impact
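To make the block-max idea concrete, here is a simplified JDK-only sketch. It is not Lucene's impacts/advanceShallow machinery, and the Block and Result types are hypothetical. Each block of postings carries its maximum score. Once k hits have been collected, any block whose maximum cannot beat the current k-th best score is skipped without reading its per-document scores.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class BlockMaxTopK {
    // Hypothetical postings block: per-document scores plus the block's max score.
    public static final class Block {
        final float[] scores;
        final float maxScore;

        public Block(float... scores) {
            this.scores = scores;
            float m = Float.NEGATIVE_INFINITY;
            for (float s : scores) m = Math.max(m, s);
            this.maxScore = m;
        }
    }

    public static final class Result {
        public final List<Float> topScores; // best k scores, descending
        public final int blocksSkipped;     // blocks pruned using maxScore alone

        Result(List<Float> topScores, int blocksSkipped) {
            this.topScores = topScores;
            this.blocksSkipped = blocksSkipped;
        }
    }

    public static Result topK(List<Block> blocks, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap holding the best k
        int skipped = 0;
        for (Block b : blocks) {
            // With k hits collected, a block whose max can't beat the k-th best is skipped.
            if (heap.size() == k && b.maxScore <= heap.peek()) {
                skipped++;
                continue;
            }
            for (float s : b.scores) {
                if (heap.size() < k) {
                    heap.add(s);
                } else if (s > heap.peek()) {
                    heap.poll();
                    heap.add(s);
                }
            }
        }
        List<Float> top = new ArrayList<>(heap);
        top.sort(Collections.reverseOrder());
        return new Result(top, skipped);
    }
}
```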

Re: How to access block-max metadata?

2020-10-12 Thread Alex K
s. I tried this using .advance() instead of .nextDoc(), but found the improvement was negligible. I'm thinking maybe advanceShallow() would let me get that speedup. - AK On Mon, Oct 12, 2020 at 2:59 AM Adrien Grand wrote: > Hi Alex, > > The entry point for block-max metadata is Terms

How to access block-max metadata?

2020-10-11 Thread Alex K
mple! I appreciate any tips or examples! Thanks, Alex

Re: Optimizing term-occurrence counting (code included)

2020-09-20 Thread Alex K
ching 10s to 100s of terms? It seems the bottleneck is in the PostingsFormat implementation. Perhaps there is a PostingsFormat better suited for this use case? Thanks, Alex On Fri, Jul 24, 2020 at 7:59 AM Alex K wrote: > Thanks Ali. I don't think that will work in this case, since

Re: Simultaneous Indexing and searching

2020-09-02 Thread Alex K
FWIW, I agree with Michael: this is not a simple problem and there's been a lot of effort in Elasticsearch and Solr to solve it in a robust way. If you can't use ES/solr, I believe there are some posts on the ES blog about how they write/delete/merge shards (Lucene indices). On Tue, Sep 1, 2020 at

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-26 Thread Alex K
Hi, Also have a look here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-9378 Seems it might be related. - Alex On Sun, Jul 26, 2020, 23:31 Trejkaz wrote: > Hi all. > > I've been tracking down slow seeking performance in TermsEnum after > updating to Luc

Re: Optimizing term-occurrence counting (code included)

2020-07-24 Thread Alex K
Thanks Ali. I don't think that will work in this case, since the data I'm counting is managed by lucene, but that looks like an interesting project. -Alex On Fri, Jul 24, 2020, 00:15 Ali Akhtar wrote: > I'm new to lucene so I'm not sure what the best way of speeding this

Optimizing term-occurrence counting (code included)

2020-07-23 Thread Alex K
alexklibisz/elastiknn/blob/c75b23f/plugin/src/main/java/org/apache/lucene/search/MatchHashesAndScoreQuery.java#L54-L73 I appreciate any suggestions you might have. - Alex

Re: ANN search current state

2020-07-15 Thread Alex K
de so that it's elasticsearch-agnostic and can be used directly in Lucene apps. However I'm much more familiar with Elasticsearch's APIs and usage/testing patterns than I am with raw Lucene, so I'd likely need to get some help from the Lucene community. Please LMK if that sounds inter

Re: Optimizing a boolean query for 100s of term clauses

2020-06-25 Thread Alex K
post the diff when I do. - AK On Thu, Jun 25, 2020 at 5:07 AM Tommaso Teofili wrote: > hi Alex, > > I had worked on a similar problem directly on Lucene (within Anserini > toolkit) using LSH fingerprints of tokenized feature vector values. > You can find code at [1] and some inf

Re: Optimizing a boolean query for 100s of term clauses

2020-06-24 Thread Alex K
On Wed, Jun 24, 2020 at 8:44 AM Toke Eskildsen wrote: > On Tue, 2020-06-23 at 09:50 -0400, Alex K wrote: > > I'm working on an Elasticsearch plugin (using Lucene internally) that > > allows users to index numerical vectors and run exact and approximate > > k-nearest

Re: Optimizing a boolean query for 100s of term clauses

2020-06-24 Thread Alex K
of the speed... > > On Tue, Jun 23, 2020 at 8:52 PM Alex K wrote: > > > > The TermsInSetQuery is definitely faster. Unfortunately it doesn't seem > to > > return the number of terms that matched in a given document. Rather it > just > > returns the boost v
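Here is a plain-Java sketch of what the thread is after: for each document, count how many of the query terms it contains. The postings map is a hypothetical in-memory stand-in for Lucene's term dictionary. Query terms are assumed to be distinct, because a repeated term would be counted twice.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermMatchCounter {
    // Hypothetical in-memory postings: term -> doc IDs containing that term.
    private final Map<String, int[]> postings;

    public TermMatchCounter(Map<String, int[]> postings) {
        this.postings = postings;
    }

    // For every document containing at least one query term, returns how many
    // of the (assumed distinct) query terms it contains.
    public Map<Integer, Integer> countMatches(List<String> queryTerms) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String term : queryTerms) {
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in any document
            }
            for (int doc : docs) {
                counts.merge(doc, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Selecting the top-k documents from these counts is a separate step, for example with a bounded min-heap.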

Re: Optimizing a boolean query for 100s of term clauses

2020-06-23 Thread Alex K
n 23, 2020 at 3:17 PM Alex K wrote: > Hi Michael, > Thanks for the quick response! > > I will look into the TermInSetQuery. > > My usage of "heap" might've been confusing. > I'm using a FunctionScoreQuery from Elasticsearch. > This gets instantiated with

Re: Optimizing a boolean query for 100s of term clauses

2020-06-23 Thread Alex K
e there really two heaps? Do you override the standard > collector? > > On Tue, Jun 23, 2020 at 9:51 AM Alex K wrote: > > > > Hello all, > > > > I'm working on an Elasticsearch plugin (using Lucene internally) that > > allows users to index numerical vectors

Optimizing a boolean query for 100s of term clauses

2020-06-23 Thread Alex K
the Java query classes should look familiar. Maybe there are some settings that I'm not aware of? Maybe I could optimize this by implementing a custom query or scorer? Maybe there's just no way to speed this up? I appreciate any input, examples, links, etc.. :) Also, let me know if I can provide any additional details. Thanks, Alex Klibisz

Re: Lucene coreClosedListeners memory issues

2019-06-03 Thread alex stark
. Why did coreClosedListeners grow to such a high number in a single day? On Mon, 03 Jun 2019 18:21:34 +0800 Adrien Grand wrote: And do you call release on every searcher that you acquire? On Mon, Jun 3, 2019 at 11:47 AM alex stark <mailto:alex.st...@zoho.com> wrote:

Re: Lucene coreClosedListeners memory issues

2019-06-03 Thread alex stark
Hi Adrien, I don't open readers directly; they are managed by a SearcherManager. On Mon, 03 Jun 2019 16:32:06 +0800 Adrien Grand wrote: It looks like you are leaking readers. On Mon, Jun 3, 2019 at 9:46 AM alex stark <mailto:alex.st...@zoho.com.invalid> wrot

Lucene coreClosedListeners memory issues

2019-06-03 Thread alex stark
Hi experts, I have recently run into memory issues with Lucene. According to a heap dump, most of the memory is held by SegmentCoreReaders.coreClosedListeners, which accounts for nearly half of the total. Dominator Tree num retain size(bytes) percent percent(live) class Name --

Re: Any way to improve document fetching performance?

2018-08-28 Thread alex stark
seriously look at putting the fields you want in docValues=true fields and pulling from there. The entire Streaming functionality is built on this and is quite fast. Best, Erick On Mon, Aug 27, 2018 at 7:35 AM wrote: > > can you post your query string? > > Best > > > On 8/

Re: Any way to improve document fetching performance?

2018-08-27 Thread alex stark
machine? no net latency in between? Best On 8/27/18 10:14 AM, alex stark wrote: > quite small, just several simple short text stored fields. The total index size is around 1 GB (2M docs). On Mon, 27 Aug 2018 22:12:07 +0800 wrote: Alex, how big are those docs? Best regards On 8/27/18 10

Re: Any way to improve document fetching performance?

2018-08-27 Thread alex stark
Quite small, just several simple, short stored text fields. The total index size is around 1 GB (2M docs). On Mon, 27 Aug 2018 22:12:07 +0800 wrote: Alex, how big are those docs? Best regards On 8/27/18 10:09 AM, alex stark wrote: > Hello experts, I am wondering is there any way

Any way to improve document fetching performance?

2018-08-27 Thread alex stark
Hello experts, I am wondering whether there is any way to improve document fetching performance; reading from stored fields appears to be quite slow. In a simple test, using IndexSearcher.doc() to fetch 2000 documents takes 50 ms. Are there any ideas for improving that?

Replacement of CollapsingTopDocsCollector

2018-08-20 Thread alex stark
In Lucene 7.x, CollapsingTopDocsCollector is removed, is there any replacement of it?

RE: Legacy filter strategy in Lucene 6.0

2018-08-09 Thread alex stark
Schindler Achterdiek 19, D-28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: alex stark > Sent: Thursday, August 9, 2018 9:12 AM > To: java-user > Cc: java-user@lucene.apache.org > Subject: Re: Legacy filter strategy in Lucene 6

Re: Legacy filter strategy in Lucene 6.0

2018-08-09 Thread alex stark
there is a two-phase iterator, but I could not find out how to use it. Is this an appropriate scenario for a two-phase iterator, or is it better to do it in a collector? Is there any guide to two-phase iterators? Best Regards On Wed, 08 Aug 2018 16:08:39 +0800 Adrien Grand wrote: Hi Alex, These

Legacy filter strategy in Lucene 6.0

2018-08-08 Thread alex stark
Since FilteredQuery was removed in Lucene 6.0, we should use a BooleanQuery to do the filtering. What about the legacy filter strategies such as LEAP_FROG_FILTER_FIRST_STRATEGY or QUERY_FIRST_FILTER_STRATEGY? What is the current filter strategy? Thanks,
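Roughly, a leap-frog strategy alternates between the query and filter iterators, advancing whichever is behind to the other's current doc ID. Below is a JDK-only sketch over sorted, duplicate-free int arrays, where a binary search stands in for DocIdSetIterator.advance. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LeapFrog {
    // Smallest index i >= from with docs[i] >= target, or docs.length if none
    // (plays the role of an iterator's advance(target)).
    static int advance(int[] docs, int from, int target) {
        int lo = from, hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Intersects two sorted, duplicate-free doc ID lists by leap-frogging.
    public static List<Integer> intersect(int[] query, int[] filter) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < query.length && j < filter.length) {
            if (query[i] == filter[j]) {
                out.add(query[i]); // both sides agree: a match
                i++;
                j++;
            } else if (query[i] < filter[j]) {
                i = advance(query, i, filter[j]); // query is behind: jump it forward
            } else {
                j = advance(filter, j, query[i]); // filter is behind: jump it forward
            }
        }
        return out;
    }
}
```

In current Lucene, the usual replacement for FilteredQuery is a BooleanQuery with the filter added as a FILTER clause. This sketch does not show how Lucene executes that conjunction internally.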

LUCENE-8396 performance result?

2018-07-17 Thread alex stark
LUCENE-8396 looks pretty good for LBS use cases; do we have performance results for this approach? It appears to me that it would greatly reduce the number of terms needed to index a polygon, but how about search performance? Does it also perform well for complex polygons with hundreds of coordinates or more?

Re: Lucene 5.2.0 global ordinal based query time join on multiple indexes

2015-07-21 Thread Alex Pang
Seems if I create a MultiReader from my index searchers and create the ordinal map from that MultiReader (and use an IndexSearcher created from the MultiReader in the createJoinQuery), then the correct results are found. On Mon, Jul 20, 2015 at 5:48 PM, Alex Pang wrote: > Hi, > > >

Lucene 5.2.0 global ordinal based query time join on multiple indexes

2015-07-20 Thread Alex Pang
y: joinQuery = JoinUtil.createJoinQuery("join_field", fromQuery, new TermQuery(new Term("type", "to")), searcher2, ScoreMode.Max, ordinalMap); Thanks, Alex

Re: Performance issues with the default field compression

2014-04-10 Thread Alex Parvulescu
Hi Adrien, Thanks for clarifying! We're going to go the custom codec & custom visitor route. best, alex On Wed, Apr 9, 2014 at 10:38 PM, Adrien Grand wrote: > Hi Alex, > > Indeed, one or several (the number depends on the size of your > documents) documents need to be

Performance issues with the default field compression

2014-04-09 Thread Alex Parvulescu
ks like it will #skip through a bunch of other stuff before finishing a document. [1] thanks in advance, alex [0] https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressingStoredFieldsReader.java?view=markup#l364 [1] https://svn.apache

Question about the CompoundWordTokenFilterBase

2013-09-18 Thread Alex Parvulescu
propose a small change to skip the original token (controlled by a flag)? If there's interest I can put this in a JIRA issue and we can continue the discussion there. The patch is not too complicated, but I haven't run any of the tests yet :) thanks, alex

which parser to use?

2011-09-22 Thread alex
hi all, I need to create an analyzer and have to choose which parser to use. Can anyone recommend one: JFlex, javacc, or antlr? thanks.

Similarity class and searchPayloads

2011-06-08 Thread Alex vB
? Regards Alex

Lucene query processing

2011-04-26 Thread Alex vB
a value contains tf if I set setOmitTermFreqAndPositions(true)? Always 1? 4) How are term freqs and payloads read from disk? In bulk for all remaining docs at once, or every time a document gets scored? Regards Alex

Re: New codecs keep Freq skip/omit Pos

2011-04-23 Thread Alex vB
s. The term frequency is then used for ranking purposes. At the moment, for ranking I pick the highest value from the freq vector, which corresponds to the best-matching version. Regards Alex

Re: New codecs keep Freq skip/omit Pos

2011-04-23 Thread Alex vB
esponding to a query get fetched, right? If this structure were possible, there are several more implementations with promising results (Two-Level Diff/MSA in this paper http://cis.poly.edu/suel/papers/version.pdf). Regards Alex

Re: New codecs keep Freq skip/omit Pos

2011-04-22 Thread Alex vB
Wow, cool, I will give that a try! Thank you!! Alex

Re: New codecs keep Freq skip/omit Pos

2011-04-22 Thread Alex vB
I also indexed once with Lucene 3.0. Are those sizes really exactly the same? Standard 4.0 with freqs, with positions: 28.1 GB; Standard 4.0 without freqs, without positions: 6.2 GB; Standard 3.0 with freqs, with positions: 28.1 GB; Standard 3.0 without freqs, without positions: 6.2 GB. Regards Alex

Re: New codecs keep Freq skip/omit Pos

2011-04-22 Thread Alex vB
is there just "grubbing" through the code? My own implementation needs 2.8 GB of space including FREQ but not POS. I am asking because I want to compare the result somehow. Compared to 20 GB it is very nice, and compared to 1.6 GB it is very bad ;). Regards Alex

New codecs keep Freq skip/omit Pos

2011-04-21 Thread Alex vB
just skip writing positions/payloads? Would it mess up the index? The written files have different extensions like pyl, skp, pos, doc, etc. If I leave the pos file out of the count, do I get a correct index size estimate for an index with freqs but without positions? Or where exactly are term positions written? Regards Alex

lucene-snowball 3.1.0 packages are missing?

2011-04-03 Thread Alex Ott
Hello, I'm trying to upgrade Lucene in my project to the 3.1.0 release, but there is no lucene-snowball 3.1.0 package on Maven Central. Is this intended? Should I continue to use 3.0.3 for the snowball package? With best wishes, Alex Ott

Lucene 4.0 Payloads

2011-03-17 Thread Alex vB
} As far as I know there are two possibilities to use payloads 1) During similarity scoring 2) During search Is there a better/faster way to receive payloads during search? Is it possible to run a normal query and read the payloads from hits? Is 1 or 2 the faster way to use payloads? Can I fin

Early Termination

2011-03-15 Thread Alex vB
Hi, is Lucene capable of any early termination techniques during query processing? On the forum I only found some information about TimeLimitedCollector. Are there more implementations? Regards Alex

How are stored Fields/Payloads loaded

2011-02-28 Thread Alex vB
Payload, CSD)? Are there other ways to retrieve payloads during search than SpanQuery (I would like to use a normal query here)? Regards Alex

Re: Storing payloads without term-position and frequency

2011-02-03 Thread Alex
Best regards Alex PS: I am currently looking for a bedroom in New York, Brooklyn (Park Slope or near NYU Poly). Maybe somebody is renting out a room from 15 Feb until 15 April. :) On Thursday, 03.02.2011, at 12:38 -0500, Grant Ingersoll wrote: > Payloads only make sense in terms of specific position

Storing payloads without term-position and frequency

2011-02-02 Thread Alex vB
I am not able to retrieve payloads. Would it be hard to "hack" Lucene for my requirements? Also, I only store one payload per term, if that information makes it easier. Best regards Alex

RE: Could not find implementing class

2011-01-25 Thread Alex vB
Hello Uwe, I manually recompiled some classes in the Lucene sources. Now it's running fine! Something went wrong there. Thank you very much! Best regards Alex

Re: Could not find implementing class

2011-01-25 Thread Alex vB
- sudo update-java-alternatives -java-6-sun Greetings Alex

Could not find implementing class

2011-01-25 Thread Alex vB
th analyzer etc. Line 86 in Demo.java is writer.addDocument(doc);. Greetings Alex

Indexing large XML dumps

2011-01-03 Thread Alex vB
files without completely loading them into a field? 3) How can I avoid parsing an article twice? Best regards Alex

Re: Implementing indexing of Versioned Document Collections

2010-11-16 Thread Alex vB
alyoad = new Payload(toByteArray(value)); payloadAttr.setPayload(bitvectorPalyoad); } 3) Can I use payloads without term positions? If my questions are unclear please tell me! :) Best regards Alex

Re: Implementing indexing of Versioned Document Collections

2010-11-16 Thread Alex vB
version as its own field). Therefore I need to add the payload after the tokenizing step. Is this possible? What happens if I add a payload for the current term and later add another payload for the same term? Is it overwritten or appended? Greetings Alex

Implementing indexing of Versioned Document Collections

2010-11-09 Thread Alex vB
ng on storing and try to extend Lucene's search after the former step. THX in advance & best regards Alex [1] http://cis.poly.edu/suel/

Detailed file handling on hard disk

2010-09-03 Thread Alex vB
t position/location for my PostingList/Document? Do I need information/metadata about the blocks from the underlying file system? Or where can I find further information about this? :) Best regards Alex

Re: Document category identification in query

2009-12-20 Thread Alex
ed approach using search queries that would have been tagged with the relevant location categories. What do you guys think ? Would this be a viable approach ? Thanks for all ! Cheers Alex

Re: Document category identification in query

2009-12-15 Thread Alex
Can anybody help me or maybe point me to relevant resources I could learn from ? Thanks.

Document category identification in query

2009-12-14 Thread Alex
rant" since bistro is a very similar concept to "Restaurant" ... Once the category is identified, I can then query the index for the documents that best match that category. What is the proper way to identify the most relevant category in a user query based on the above? Should I consider a better approach? Any help appreciated. Many thanks Alex.

Re: Search with whitespaces

2009-09-29 Thread Alex Bredariol Grilo
To use ShingleFilter, I'd like to change its TOKEN_SEPARATOR, but it's final. Furthermore, I tried to compile its source code, but the compiler can't find some methods like addAttribute. Does anyone know how I could do that? Alex On Fri, Sep 25, 2009 at 2:42 PM, Robert Muir wr

Re: Filtering query results based on relevance/acuracy

2009-09-29 Thread Alex
Can anybody help? On Sat, Sep 26, 2009 at 11:22 PM, Alex wrote: > Hi Otis, and thank you for helping me out. > > Sorry for the late reply. > > > > Although a PhraseQuery or TermQuery would be perfectly suited in some > cases, this will not work in my case. >

Re: Filtering query results based on relevance/acuracy

2009-09-26 Thread Alex
e what feature of Lucene I should use here in the first step of the algo to only find the most relevant LocationTypes and filter out the ones that are not relevant enough. Any help and any thoughts on my approach greatly appreciated. Thanks in advance. Cheers, Alex.

Re: Search with whitespaces

2009-09-25 Thread Alex Bredariol Grilo
tion (assuming field is f1): > > f1:notebook f1:"note book" > > which means (notebook OR "note book"), 2nd condition is phrase > search. > > Best regards, Lisheng > > -Original Message- > From: Alex Bredariol Grilo [mailto:abgr...@gmail.com]
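Here is a small JDK-only sketch of the joining idea behind Lisheng's suggestion. It concatenates adjacent tokens with no separator, like a shingle whose separator is the empty string, and builds a query string of the form f1:notebook f1:"note book". Both helpers are hypothetical, and the field name is up to the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinedBigrams {
    // For tokens [note, book, case] returns [notebook, bookcase]: each pair of
    // adjacent tokens concatenated with no separator.
    public static List<String> joinAdjacent(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + tokens.get(i + 1));
        }
        return out;
    }

    // Builds  field:firstsecond field:"first second"  i.e. the joined word OR the phrase.
    public static String expandQuery(String field, String first, String second) {
        return field + ":" + first + second
                + " " + field + ":\"" + first + " " + second + "\"";
    }

    public static void main(String[] args) {
        System.out.println(joinAdjacent(List.of("note", "book", "case"))); // [notebook, bookcase]
        System.out.println(expandQuery("f1", "note", "book"));            // f1:notebook f1:"note book"
    }
}
```

Applying joinAdjacent at index time as well, and storing the joined forms as extra terms, would also cover the reverse case: the document says "note book" and the user types "notebook".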

Search with whitespaces

2009-09-25 Thread Alex Bredariol Grilo
these documents could be found and I'd like to do it for a general case, like trying to search the words joining the word before or after. How could I do that? Is there an analyzer which tokenize like that? Thank you Alex

Filtering query results based on relevance/acuracy

2009-09-21 Thread Alex
} catch (ParseException e) { throw new RuntimeException("Unable to parse query: " + queryString, e); } I guess that there is a way to filter out results that have a score below a given threshold or a way to filter out results based on score gap or anything similar. But I have no idea on how to do this... What is the best way to achieve what I want? Thank you for your help ! Cheers, Alex
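One simple way to drop weak hits is to keep only those scoring at least some fraction of the top hit's score. Here is a plain-Java sketch under stated assumptions. Lucene scores are not normalized across queries, so a cutoff relative to the best hit is usually easier to reason about than a fixed absolute threshold. The Hit type and the keepRelative method are hypothetical, and hits are assumed to be sorted by descending score.

```java
import java.util.ArrayList;
import java.util.List;

public class RelativeScoreCutoff {
    // Hypothetical hit type: a document ID and its score.
    public static final class Hit {
        public final int doc;
        public final float score;

        public Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Keeps hits whose score is at least `fraction` of the top score.
    // Assumes hits are already sorted by descending score.
    public static List<Hit> keepRelative(List<Hit> hits, float fraction) {
        List<Hit> out = new ArrayList<>();
        if (hits.isEmpty()) {
            return out;
        }
        float cutoff = hits.get(0).score * fraction;
        for (Hit h : hits) {
            if (h.score < cutoff) {
                break; // sorted input: every later hit is below the cutoff too
            }
            out.add(h);
        }
        return out;
    }
}
```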

Re: Query and language conversion

2009-09-01 Thread Alex
Many thanks, Steve, for all that information. I understand from your answer that cross-lingual search doesn't come "out of the box" in Lucene. Cheers. Alex On Tue, Sep 1, 2009 at 6:46 PM, Steven A Rowe wrote: > Hi Alex, > > What you want to do is commonly referr

Query and language conversion

2009-09-01 Thread Alex
Hi, I am new to Lucene, so excuse me if this is a trivial question. I have data that I index in a given language (English). My users will come from different countries, and my search screen will be internationalized. My users will then probably query things in their own language. Is it possible t

cannot retrieve the values of a field is not stored in the index

2009-06-04 Thread Alex Steward
Hi, Is there a way I can retrieve the value of a field that is not stored in the index? private static void indexFile(IndexWriter writer, File f) throws IOException { if (f.isHidden() || !f.exists() || !f.canRead()) { return; } System.out.println("Indexing " + f.getC

Re: lucene code changes

2009-05-19 Thread Alex Steward
I need to implement a custom inverted index in Lucene. I have files like the ones I have attached here. The files contain words and the scores assigned to those words. There will be hundreds of such files, and each file will have at least 5 such name/value pairs. Note: Currently the file only shows

lucene source code changes

2009-05-19 Thread Alex Steward
Hello, I need to implement a custom inverted index in Lucene. I have files like the ones I have attached here. The files contain words and the scores assigned to those words. There will be hundreds of such files, and each file will have at least 5 such name/value pairs. Note: Currently the file onl

RE: Lucene Concurrency Issue

2008-08-07 Thread Alex Wang
Thanks Mark and Jason for your responses and your contrib to Lucene. I will try to dig into them and incorporate the ideas into my app. Thanks again! Alex >-Original Message- >From: Jason Rutherglen [mailto:[EMAIL PROTECTED] >Sent: Thursday, August 07, 2008 10:07 AM >T

Lucene Concurrency Issue

2008-08-06 Thread Alex Wang
concurrent add/delete/search operations happen. Are there any general guidelines that you can share? Thanks in advance! Alex

RE: Urgent Help Please: "Resource temporarily unavailable"

2008-08-06 Thread Alex Wang
IndexReader and IndexWriter are thread safe. Besides, there is no explicit multi-threading in our own code. Thanks again! Alex Wang CrossView Inc.

Urgent Help Please: "Resource temporarily unavailable"

2008-08-06 Thread Alex Wang
failed. I have no clue what could have caused such error. Unfortunately there is no further info in the logs. Can someone please shed some light on this? Thanks. Alex

Re: How Lucene Search

2008-06-26 Thread Alex Cheng
the debugger that came with eclipse is pretty good for this purpose. You can create a small project and then attach Lucene source for the purpose of debugging.

IndexDeletionPolicy to delete commits after N minutes

2008-06-25 Thread Alex Cheng
hi, what is the correct way to instruct the indexwriter (or other classes?) to delete old commit points after N minutes ? I tried to write a customized IndexDeletionPolicy that uses the parameters to schedule future jobs to perform file deletion. However, I am only getting the filenames through the

IndexDeletionPolicy to delete after N minutes

2008-06-25 Thread Alex Cheng
hi, what is the correct way to instruct the indexwriter to delete old commit points after N minutes ? I tried to write a customized IndexDeletionPolicy that uses the parameters to schedule future jobs to do file deletion. However, I am only getting the filenames, and not absolute file names. thank

Re: searching for C++

2008-06-24 Thread Alex Soto
Thanks everyone. I appreciate the help. I think I will write my own tokenizer, because I do not have a predefined list of words with symbols. I will modify the grammar by defining a SYMBOL token as John suggested and redefine ALPHANUM to include it. Regards, Alex Soto On Tue, Jun 24, 2008 at
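This is not the JFlex grammar change described above. It is only a plain java.util.regex sketch of the same rule: a run of letters or digits may carry trailing '+' or '#' characters, so "C++" and "C#" stay single tokens. The pattern and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SymbolTokenizer {
    // Letters or digits, optionally followed by '+' or '#' characters.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{Nd}]+[+#]*");

    // Splits text into lowercased tokens; other punctuation and whitespace are dropped.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [experience, with, c++, c#, and, java]
        System.out.println(tokenize("Experience with C++, C# and Java"));
    }
}
```

Note that "a+b" would still split into "a+" and "b". A real grammar needs to decide how to handle such cases.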

searching for words with symbols

2008-06-24 Thread Alex Soto
t provides. I think I need to write a specialized tokenizer (and accompanying analyzer) that lets the "+" characters pass. I would take the JFlex-based one, modify it, and add it to my project. My question is: Is there any simpler way to accomplish the same? Alex Soto

searching for C++

2008-06-24 Thread Alex Soto
t provides. I think I need to write a specialized tokenizer (and accompanying analyzer) that lets the "+" characters pass. I would take the JFlex-based one, modify it, and add it to my project. My question is: Is there any simpler way to accomplish the same? Best regards, Alex Sot

RE: huge tii files

2008-06-17 Thread Alex
You can invoke IndexReader.setTermInfosIndexDivisor before any search to control the fraction of the .tii file that is read into memory.
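Here is a simplified JDK-only sketch of the trade-off behind a term-index divisor. Only every divisor-th term is kept in the in-memory sample, so memory shrinks roughly by that factor. In exchange, each lookup binary-searches the sample and then scans up to divisor - 1 entries of the full sorted dictionary. This is not Lucene's .tii/.tis code: the List stands in for the on-disk term dictionary, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class SampledTermIndex {
    private final List<String> allTerms; // stands in for the sorted on-disk term dictionary
    private final String[] sample;       // every divisor-th term, held in memory
    private final int divisor;

    public SampledTermIndex(List<String> sortedTerms, int divisor) {
        if (divisor < 1) throw new IllegalArgumentException("divisor must be >= 1");
        this.allTerms = sortedTerms;
        this.divisor = divisor;
        int n = (sortedTerms.size() + divisor - 1) / divisor;
        this.sample = new String[n];
        for (int i = 0; i < n; i++) {
            sample[i] = sortedTerms.get(i * divisor);
        }
    }

    public int sampleSize() {
        return sample.length;
    }

    // Returns the term's position in the full dictionary, or -1 if absent.
    public int find(String term) {
        int s = Arrays.binarySearch(sample, term);
        if (s >= 0) {
            return s * divisor; // exact hit on a sampled term
        }
        int block = -s - 2; // index of the last sampled term smaller than `term`
        if (block < 0) {
            return -1; // smaller than every term
        }
        int start = block * divisor;
        int end = Math.min(start + divisor, allTerms.size());
        for (int i = start + 1; i < end; i++) { // scan at most divisor - 1 entries
            int c = allTerms.get(i).compareTo(term);
            if (c == 0) return i;
            if (c > 0) break;
        }
        return -1;
    }
}
```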

RE: Is it possible to get only one Field from a Document?

2008-06-11 Thread Alex
If you have many terms across the fields, you might want to invoke IndexReader's setTermInfosIndexDivisor() method, which reduces the in-memory term infos used to look up idf at the cost of (slightly) slower searches. > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Is it po

RE: lucene memory consumption

2008-05-29 Thread Alex
I believe we have around 346 million documents Alex > Date: Thu, 29 May 2008 18:39:31 -0400 > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: lucene memory consumption > > Alex wrote: >> Currently

RE: lucene memory consumption

2008-05-29 Thread Alex
Currently, searching on our index consumes around 2.5GB of ram. This is just a single term query, nothing that requires the in memory cache like in the FieldScoreQuery. Alex > Date: Thu, 29 May 2008 15:25:43 -0700 > From: [EMAIL PROTECTED] >

lucene memory consumption

2008-05-29 Thread Alex
Hi, other than the in-memory terms (.tii) and the few kilobytes of open file buffers, what are some other sources of significant memory consumption when searching a large index (> 100 GB)? The queries are just normal term queries.

RE: slow FieldCacheImpl.createValue

2008-05-19 Thread Alex
TECTED] > To: java-user@lucene.apache.org > Subject: Re: slow FieldCacheImpl.createValue > > Hey Alex, > I guess you haven't tried warming up the engine before putting it to use. > Though one of the simpler implementation, you could try warming up the > engine first by sendin

slow FieldCacheImpl.createValue

2008-05-19 Thread Alex
hi, I have a ValueSourceQuery that makes use of a stored field. The field contains roughly 27.27 million untokenized terms. The average length of each term is 8 digits. The first search always takes around 5 minutes, and it is due to the createValue function in the FieldCacheImpl. The search is e

Can I using HFS in lucene 2.3.1?

2008-04-25 Thread Alex Chew
Hi, does anybody have experience building a distributed application with Lucene and Hadoop/HFS? Lucene 2.3.1 does not seem to expose an HFSDirectory. Any advice will be appreciated. Regards, Alex
