Re: Does Lucene Vector Search support int8 and / or even binary?

2025-04-14 Thread John Dale (DB2DOM)
unsubscribe On Tue, Mar 19, 2024 at 2:59 PM Shubham Chaudhary wrote: > Hi Michael, > > Lucene already had int8 vector support since 9.5 (#1054 > ) but it was left to the user > to get those quantized vectors and index using KnnByteVectorField > < > htt

Scoring Across Multiple Fields

2020-01-27 Thread John Brown
ery. Am I going about this in the correct way? Any clarification would be greatly appreciated. Thank you, John B

Using Lucene as a Document Comparison Tool

2019-12-12 Thread John Brown
re is an easy way to do this that I'm missing, after all, I essentially just want to remove a step from the process. Any help would be much appreciated. Thank you, -John B

Re: docValues & facets

2019-06-11 Thread John Davis
Solr folks mentioned this is dependent on lucene's behavior of merging segments. I am not sure where the right answer lies.. On Tue, Jun 11, 2019 at 12:01 AM Adrien Grand wrote: > Hi John, > > You probably meant to send this to the solr-user@lucene.a.o list, this > java-user

docValues & facets

2019-06-10 Thread John Davis
Hi there, We recently changed a field from TextField no docValues to SortableTextField which has docValues enabled by default. Once we did this we do not see any facet values for the field. I know that once all the docs are re-indexed facets should work again, however can someone clarify the curren

Question about Lucene in my project ..

2019-05-27 Thread John Dale
d paginate? Is Lucene transactional when adding to the index? Sincerely, John - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

field:* vs field:[* TO *]

2019-04-17 Thread John Davis
value for a given field, and I've noticed the queries to be quite slow especially for fields that have a large number of distinct values. John

Re: moderator

2017-02-27 Thread betty john
Send a mail to java-user-unsubscr...@lucene.apache.org On Tue, Feb 28, 2017 at 9:13 AM, Arvind Gupta wrote: > How can I unsubscribe from this mailing list > > -arvind >

All in one query

2016-10-18 Thread betty john
Hi Is there any function that performs exact match, fuzzy search and prefix search?

PointValues wildcard search in 6.0?

2016-04-29 Thread John Doe
Wildcard queries don't seem to be working for PointValues in Lucene 6.0. For example, "new WildcardQuery(new Term(some_LongPoint_field_name, "*"))" does not find anything. A similar query worked fine with LongFields though. While PointValues javadocs say "are indexed differently than ordinary text

Retrieve found terms

2014-11-25 Thread John Cecere
y is 'arch*' and I got 3 hits on it. I want to know what terms in the indexed matched (e.g. archery, architecture, archenemy). More specifically, I'd like to do this without having to use the highlighter jar. I already have my index set up with term vectors enabled. Thanks, John

Term vectors

2014-09-30 Thread John Cecere
l comprehensive documentation. The javadocs are more reference material than anything else. Can someone point me to some documentation on term vectors, how they work, and how to use them? Thanks, John -- John Cecere Principal Engineer - Oracle Corporation 732-987-4317 / john.ce

Re: Case sensitivity

2014-09-19 Thread John Cecere
dexWriterConfig and the IndexWriter. Thanks, John On 9/19/14 9:36 AM, Paul Libbrecht wrote: two fields? paul On 19 sept. 2014, at 15:07, John Cecere wrote: Is there a way to set up Lucene so that both case-sensitive and case-insensitive searches can be done without having to generate two

Case sensitivity

2014-09-19 Thread John Cecere
Is there a way to set up Lucene so that both case-sensitive and case-insensitive searches can be done without having to generate two indexes? -- John Cecere Principal Engineer - Oracle Corporation 732-987-4317 / john.cec...@oracle.com

Re: IndexWriter croaks on large file

2014-02-19 Thread John Cecere
erent TextFields (the content). I've tried doing this and haven't found any problems with it, but I'm just wondering if there's anything I should be aware of. Regards, John On 2/14/14 4:37 PM, Tri Cao wrote: As docIDs are ints too, it's most likely he'll hit the limit of

Re: IndexWriter croaks on large file

2014-02-14 Thread John Cecere
these things. Wouldn't just changing from int to long for the offsets solve the problem ? I'm sure it would probably have to be changed in a lot of places, but why impose such a limitation ? Especially since it's using an InputStream and only dealing with a block of data at a

IndexWriter croaks on large file

2014-02-14 Thread John Cecere
.lucene.index.IndexWriter.addDocument(IndexWriter.java:1202) Thanks, John -- John Cecere Principal Engineer - Oracle Corporation 732-987-4317 / john.cec...@oracle.com

possible latency increase from Lucene versions 4.1 to 4.4?

2013-09-13 Thread John Wang
Has anyone experienced a latency increase between the above versions? Mainly in conjunction queries. Thanks -John

Re: lucene and ejb applications

2013-08-09 Thread John C
Hi, can I get some advice on writing an EJB app that reads and writes Lucene indexes, please? Thank you

command line lucene

2013-05-17 Thread John Wang
/clue Thanks -John

Re: Upgrading Lucene 2.0.0 TermQuery to 4.2 QueryParser

2013-04-03 Thread Lewis John Mcgibbney
taphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] >> Sent: Thursday, April 04, 2013 3:39 AM >> To: java-user@lucene.apache.org >> Subject: Upgrading Lucene 2.0.0 TermQuery to 4.2 Qu

Re: Necessary to close() IndexSearcher in 4.X?

2013-04-03 Thread Lewis John Mcgibbney
ndexReader open as long as possible as it is very expensive to open/close them all the time. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Lewis

Necessary to close() IndexSearcher in 4.X?

2013-04-03 Thread Lewis John Mcgibbney
Hi, I am encountering many situations where searcher.close() is present in finally blocks such as } finally { if (searcher != null) { try { searcher.close(); } catch (Exception ignore) { } searc

Upgrading Lucene 2.0.0 TermQuery to 4.2 QueryParser

2013-04-03 Thread Lewis John Mcgibbney
Hi, I'm currently embarking upon a non trivial upgrade of some legacy 2.0.0 code and encounter the following IndexSearcher searcher = null; try { searcher = new IndexSearcher(indexFilePath); Term productIdTerm = new Term("product_id", productId);

Re: Lucene index on NFS

2012-10-02 Thread Nader, John P
s. We are moving to this architecture with a new product, so I am just now starting to understand the trade-offs. Hope that helps. -John On 10/2/12 8:01 AM, "Jong Kim" wrote: >Thank you all for reply. > >So it sounds like it is a known fact that the performance would suffe

Re: Memory question

2012-05-16 Thread Nader, John P
Lutz > >-Original Message- >From: Chris Bamford [mailto:chris.bamf...@talktalk.net] >Sent: Tuesday, 15 May 2012 16:38 >To: java-user@lucene.apache.org >Subject: Re: Memory question > > > Hi John, > >Very interesting, thanks for the detailed explanation. It certainl

Re: Memory question

2012-05-15 Thread Nader, John P
file size and how much is resident. The java heap shows up in pmap as well on linux, so you can determine how much of that is in memory as well. John On 5/15/12 3:38 PM, "Chris Bamford" wrote: >Thanks Uwe. > >What I'd like to understand is the implications of this on

RE: Immutable OpenBitSet?

2011-04-28 Thread Nader, John P
I agree that Trejkaz's example is correct and consistent with both the JLS spec and "Java Concurrency in Practice", by Goetz. Without synchronization, the final keyword is necessary to ensure all values set on a long[] in a constructor are seen by other threads in the state they were when the
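The point about final fields can be sketched as follows. This is a minimal illustration, not the code discussed in the thread: the class name ImmutableBits and its methods are hypothetical, and it assumes non-negative bit indexes. It relies on the final-field semantics of the Java Language Specification, under which a thread that obtains a reference to a fully constructed object sees the values reachable through its final fields as they were when the constructor finished.

```java
// Hypothetical immutable bit set, written only to illustrate safe publication
// through a final field. It is not Lucene's OpenBitSet.
final class ImmutableBits {
    // final: contents written in the constructor are visible to other threads
    // without extra synchronization, provided "this" does not escape early.
    private final long[] words;

    ImmutableBits(long[] source) {
        // Defensive copy, so later changes to the caller's array cannot leak in.
        this.words = source.clone();
    }

    // Returns true if the bit at the given non-negative index is set.
    boolean get(int index) {
        int word = index >> 6; // 64 bits per long
        return word < words.length && (words[word] & (1L << index)) != 0;
    }

    public static void main(String[] args) {
        long[] raw = {5L}; // binary 101: bits 0 and 2 set
        ImmutableBits bits = new ImmutableBits(raw);
        raw[0] = 0L; // mutating the source does not affect the copy
        System.out.println(bits.get(0)); // true
        System.out.println(bits.get(1)); // false
        System.out.println(bits.get(2)); // true
    }
}
```

The design choice here is to copy the array and never expose it, so the only writes to it happen before the constructor returns.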

RE: Immutable OpenBitSet?

2011-04-27 Thread Nader, John P
--Original Message- From: Federico Fissore [mailto:feder...@fissore.org] Sent: Wednesday, April 27, 2011 5:12 PM To: java-user@lucene.apache.org Subject: Re: Immutable OpenBitSet? Nader, John P wrote on 27/04/2011 20:28: > Hello, > > We have an application that relies heavily on cach

Immutable OpenBitSet?

2011-04-27 Thread Nader, John P
ide the methods, but the fields cannot be overridden as final. Are there any suggestions of the forum? Possibly other Lucene classes to solve this problem? Or other solutions? I'm just looking for ideas. Thanks. -John

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Christopher St John
On Tue, Jan 18, 2011 at 3:04 PM, Grant Ingersoll wrote: > > Where do you get your Lucene/Solr downloads from? > > [] ASF Mirrors (linked in our release announcements or via the Lucene website) > > [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) > > [] I/we build them from sourc

3.0.3 Contrib Query Parser : Custom Field Name Builder

2011-01-07 Thread Christopher St John
't have an equals or hash method. Suggestions? I've worked around it by registering a class based builder, checking for the field name and either delegating to the original builder or doing my custom processing, but it's a little awkward. -cks -- Chr

RE: PDF text extracted without spaces

2010-12-03 Thread McGibbney, Lewis John
Hi Ganesh I encountered this same problem last week. I was thinking if it was possible to include at minimum a WhitespaceAnalyzer somewhere within Tika which would solve the problem. I am not sure of how this would be done as I am not familiar with Tika codebase. Unfortunately I don't think th

RE: tokensFromAnalysis

2010-12-02 Thread McGibbney, Lewis John
rsion, we can't help, because we don't know what you're coding against. Steve > -Original Message----- > From: McGibbney, Lewis John [mailto:lewis.mcgibb...@gcu.ac.uk] > Sent: Thursday, December 02, 2010 10:46 AM > To: 'java-user@lucene.apache.

RE: tokensFromAnalysis

2010-12-02 Thread McGibbney, Lewis John
[mailto:simon.willna...@googlemail.com] Sent: 02 December 2010 15:35 To: java-user@lucene.apache.org Subject: Re: tokensFromAnalysis man what version of lucene are you using? simon On Thu, Dec 2, 2010 at 4:27 PM, McGibbney, Lewis John wrote: > Hello List, > > Having posted a couple of days ag

tokensFromAnalysis

2010-12-02 Thread McGibbney, Lewis John
Hello List, Having posted a couple of days ago, I have one last question regarding the following code fragment public static Token[] tokensFromAnalysis(Analyzer analyzer, String text) throws IOException { TokenStream stream = analyzer.tokenStream("contents", new StringReader(t

Keyword extraction from pdf to text

2010-11-30 Thread McGibbney, Lewis John
Hello list, I am currently attempting to extract keywords from pdf documents, my aim is then to begin constructing a domain ontology using the words which are extracted. I do not need to index anything at this stage, but wish to extract and push the output as plain text into a text file. An exa

IndexWriter Class

2010-11-25 Thread McGibbney, Lewis John
Hello List, Lucene 3.0.1 Windows Vista Premium Home Edition I am currently attempting to configure my IndexFiles.java file. My intention is to add the following functionality to the code as I require input text to be further analyzed than what the default analyzer does. IndexWriter writer = n

RE: Filters do not work with MultiSearcher?

2010-09-10 Thread Nader, John P
't obvious) and our assumption was that a MultiReader was being passed in. Thank you for your quick response and useful insights. -John -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Friday, September 10, 2010 12:32 PM To: java

Filters do not work with MultiSearcher?

2010-09-10 Thread Nader, John P
We are attempting to perform a filtered search on two indices joined by a MultiSearcher. Unfortunately, it appears there is an issue in the lucene code that is causing the filter to be simply reused at the starting ordinal for each individual index instead of being augmented by the starting doc

Problems with Lucene 3.0.2 and Java 1.6.0_12

2010-08-18 Thread Nader, John P
interested. -John -Original Message- From: Nader, John P [mailto:john.na...@cengage.com] Sent: Friday, July 30, 2010 3:17 PM To: java-user@lucene.apache.org Subject: RE: Term browsing much slower in Lucene 3.x.x Mike, We took your suggestion and refactored like this: TermEnum termEnum

RE: Term browsing much slower in Lucene 3.x.x

2010-07-30 Thread Nader, John P
sary in other API calls. BTW, that environment is Java 1.6.0_12 on 64-bit SUSE Linux with 32G of RAM and using MMapDirectory. Thanks. -John -Original Message- From: Nader, John P [mailto:john.na...@cengage.com] Sent: Thursday, July 29, 2010 5:49 PM To: java-user@lucene.apache.org Sub

RE: Term browsing much slower in Lucene 3.x.x

2010-07-29 Thread Nader, John P
the added synchronization. I don't think it is waiting on locks, but rather the memory flush and loading that goes on. -John -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Thursday, July 29, 2010 5:55 AM To: java-user@lucene.apache.org Subject: Re:

Term browsing much slower in Lucene 3.x.x

2010-07-28 Thread Nader, John P
ld be more efficient? The synchronization appears to be on aspects of these classes that the next() operation is not concerned with. My other question is whether there are planned performance enhancements to address this loss of performance? Thanks. John

Memory use and Lucene

2010-04-01 Thread John Viviano
OS and JVM versions are as follows: Linux version 2.6.18-028stab066.10 (r...@rhel5-64-build) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Fri Dec 4 15:49:04 MSK 2009 java version "1.6.0_17" Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) 64-Bit Server

TooManyClauses and maxClauseCount question

2010-01-20 Thread john smith
Hi, I'm getting a TooManyClauses exception while performing a wildcard query. I'm thinking about changing the max clause count limit (BooleanQuery.setMaxClauseCount() method). My question refers to memory consumption in case of increasing the maxClauseCount parameter. Does Lucene do it in a smart way (

TermDocs.close

2009-12-27 Thread John Wang
Hi: I see TermDocs.close not being called when created with TermQuery: TermQuery creates it and passes to TermScorer, and is never closed. I see TermDocs.close actually closes the input stream. Is it safe not closing TermDocs? Thanks -John

Fwd: 3.0 api change

2009-12-21 Thread John Wang
Any comments? Did we just unintentionally remove getFieldComparatorSource in 3.0.0? -John -- Forwarded message -- From: John Wang Date: Mon, Dec 21, 2009 at 11:21 AM Subject: 3.0 api change To: Lucene Users List , lucene-...@jakarta.apache.org Hi guys: I noticed

share some numbers for range queries

2009-11-15 Thread John Wang
Hi: I did some performance analysis for different ways of doing numeric ranging with lucene. Thought I'd share: http://invertedindex.blogspot.com/2009/11/numeric-range-queries-comparison.html -John

Re: IndexWriter.close() no longer seems to close everything

2009-11-12 Thread John Wang
If you run the zoie test turned to nrt, you can replicate it rather easily: While the test is running, do lsof on your process, e.g. lsof -p | | wc -John On Thu, Nov 12, 2009 at 8:24 AM, John Wang wrote: > Well, I have code in the finally block to call IndexReader.close for every >

Re: IndexWriter.close() no longer seems to close everything

2009-11-12 Thread John Wang
Well, I have code in the finally block to call IndexReader.close for every reader I get from IndexWriter.getReader. On Mon, Nov 9, 2009 at 2:43 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Does this look like a real leak John? You're definitely closing every

Re: IndexWriter.close() no longer seems to close everything

2009-11-08 Thread John Wang
I am seeing the same thing, but only when IndexWriter.getReader is called at a high rate. From lsof, I see file handles growing. -John On Sun, Nov 8, 2009 at 7:29 PM, Daniel Noll wrote: > Hi all. > > We updated to Lucene 2.9, and now we find that after closing our text > inde

lucene 2.9+ numeric indexing

2009-11-08 Thread John Wang
to Luke, the number displayed was 77886: i.e. searching for: MY_FIELD:77886$ does yield a doc, and using reconstructing the doc functionality, I see the value is 77886. Ideas? Thanks -John

Re: 2.9 per segment searching/caching

2009-10-22 Thread John Wang
first set of numbers I gave you, which was run on jdk1.6, with small queue sizes, and there was very very slight difference with multiQ faster by a tiny bit. Thanks -John On Thu, Oct 22, 2009 at 7:09 PM, Mark Miller wrote: > Yes - in many cases, the other wins outweigh the queue transitio

Re: 2.9 per segment searching/caching

2009-10-22 Thread John Wang
With many other coding that happened in 2.9, e.g. the PQ api etc., sorting is actually faster than 2.4. -John On Thu, Oct 22, 2009 at 5:07 AM, Mark Miller wrote: > Bill Au wrote: > > Since Lucene 2.9 has per segment searching/caching, does query > performance > > degrade les

Re: Lucene 2.9.0 leaves too many .cfs files open, causing too many files open java error.

2009-10-18 Thread John Wang
done with them however. -John On Sun, Oct 18, 2009 at 10:47 AM, GlenAbbeyDrive wrote: > > I commit the IndexWriter every 200 documents in a batch as follows and you > can see that I reopened the reader after the commit. > > private void commit(IndexWriter writer) throws Corrup

Re: Realtime search best practices

2009-10-12 Thread John Wang
I think it was my email Yonik responded to and he is right, I was being lazy and didn't read the javadoc very carefully. My bad. Thanks for the javadoc change. -John On Mon, Oct 12, 2009 at 1:57 PM, Yonik Seeley wrote: > On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix > wrote: >

Re: Realtime search best practices

2009-10-12 Thread John Wang
Oh, that is really good to know! Is this deterministic? e.g. as long as writer.addDocument() is called, next getReader reflects the change? Does it work with deletes? e.g. writer.deleteDocuments()? Thanks Mike for clarifying! -John On Mon, Oct 12, 2009 at 12:11 PM, Michael McCandless <

Re: faceted search performance

2009-10-12 Thread John Wang
Given you have 1M docs and about 1M terms, do you see very few docs per term? If your DocSet per term is very sparse, BitSet is probably not a good representation. A simple int array may be better for memory, and faster for iterating. -John On Mon, Oct 12, 2009 at 8:45 AM, Paul Elschot wrote: >
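The memory argument above can be made concrete with some back-of-the-envelope arithmetic. The sketch below is an assumption-laden illustration, not code from the thread: the class and method names are hypothetical, and it counts only the raw payload (one bit per document for a bit set, four bytes per matching document for an int array), ignoring JVM object headers.

```java
// Hypothetical helper comparing raw payload sizes of two doc-set layouts.
class DocSetSizeDemo {
    // A fixed-length bit set needs one bit per document in the index,
    // rounded up to whole 64-bit words, no matter how few documents match.
    static long bitSetBytes(int maxDoc) {
        return ((long) maxDoc + 63) / 64 * 8;
    }

    // A sorted int[] of doc IDs needs four bytes per matching document only.
    static long intArrayBytes(int matchingDocs) {
        return 4L * matchingDocs;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        System.out.println(bitSetBytes(maxDoc));   // 125000 bytes per term
        System.out.println(intArrayBytes(1));      // 4 bytes for a term with one doc
        System.out.println(intArrayBytes(31_250)); // 125000 bytes: the break-even point
    }
}
```

Under these assumptions the int array is smaller whenever a term matches fewer than maxDoc / 32 documents, which is the case for most terms when 1M terms are spread over 1M documents.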

new sorting api and some perf numbers

2009-10-11 Thread John Wang
e same query for each test. First query, includes loading lucene 2.4.1: 4858ms, lucene 2.9.0: 816ms, gain of 595% avg of the rest 19 queries: lucene 2.4.1: 32ms, lucene 2.9.0: 17ms , gain of 188% I ran this test about 5 times, the findings are similar. The performance gain is significant! Great job! -John

Re: Realtime & distributed

2009-10-11 Thread John Wang
Eric: For more specific Zoie questions, let's move it to the zoie discussion group instead. Thanks -John On Sun, Oct 11, 2009 at 2:31 PM, John Wang wrote: > Hi Eric: > > I regret the direction the thread has taken and partly take responsibility > for it... > > As t

Re: Realtime & distributed

2009-10-11 Thread John Wang
. Hope this helps. -John On Sun, Oct 11, 2009 at 1:51 PM, Angel, Eric wrote: > Man, this thread really went south. Anyhow, I have a few questions about > Zoie: > > * How many nodes are you using to support the speeds you desire at LI? > * Am I wrong to assume that the RAMDir h

Re: Realtime & distributed

2009-10-09 Thread John Wang
ormance becomes less significant. To be truly realtime, IMHO, you need some sort of memory helper to handle transient indexing requests. Doing that is where the actual challenge is. -John On Fri, Oct 9, 2009 at 1:06 PM, Jason Rutherglen wrote: > The dimensions sound good. It's unclear if yo

Re: Realtime & distributed

2009-10-08 Thread John Wang
g/search with Zoie here at linkedin in a production environment powering a real internet company with real traffic. -John On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen wrote: > Eric, > > Katta doesn't require HDFS which would be slow to search on, > though Katta can be used t

Results of setting LogMergePolicy "calibrateSizeByDeletes=true"

2009-09-30 Thread Jibo John
Hello, I am in the process of trying out the lucene patch LUCENE-1634, however I'm not getting the expected behavior. I see that the segments are not getting merged even after all the documents are deleted from it. Because of this, the index size really grows to a huge number. The expec

PrefixQuery vs wildcardquery

2009-09-28 Thread John Seer
Hello, Is there any benefit of using one or other for "start with query"? Regards -- View this message in context: http://www.nabble.com/PrefixQuery-vs-wildcardquery-tp25649045p25649045.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --

2.9 NRT w.r.t. sorting and field cache

2009-09-21 Thread John Wang
significant performance issue. Please advise. Thanks -John

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread John Byrne
Valery wrote: Hi John, (aren't you the same John Byrne who is a key contributor to the great OpenSSI project?) Nope, never heard of him! But with a great name like that I'm sure he'll go a long way :) John Byrne-3 wrote: I'm inclined to disagree with the idea tha

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread John Byrne
the job of the original tokenizer anyway. Do the simplest thing that could possibly work! Regards, -John Valery wrote: Hi Robert, so, would you expect a Tokenizer to consider '/' in both cases as a separate Token? Personally, I see no problem if Tokenizer would do the following job:

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-11 Thread Jibo John
ng because the thread pool was already told to shut down. Larger queues made it much more likely to happen. Can you try the new version (attached)? Also, make sure you add 'doc.reuse.fields=false' to your alg (on trunk). Mike On Tue, Aug 11, 2009 at 12:39 PM, Jibo John wrote: Mik

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-03 Thread Jibo John
ng from earlier releases, which could explain what you're seeing). If you are missing that, can you download the current code from http://www.manning.com/hatcher3 and try again? If that's not the problem... can you post the benchmark alg you are using in each case? Mike On Fri, Jul 31, 200

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Jibo John
Hi Phil, It's 5 threads for IndexWriter. For ThreadedIndexWriter, I used: writer.num.threads=16 writer.max.thread.queue.size=80 Thanks, -Jibo On Jul 31, 2009, at 5:01 PM, Phil Whelan wrote: Hi Jibo, Your mergeFactor is different, and the resulting numFiles (segment files) is different. May

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Jibo John
On Jul 31, 2009, at 2:52 PM, Michael McCandless wrote: Hmmm... can you run CheckIndex on both indexes and post the results? java org.apache.lucene.index.CheckIndex /path/to/index Mike On Fri, Jul 31, 2009 at 2:38 PM, Jibo John wrote: Number of docs are the same in the index for both the

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Jibo John
or were they different? If different, how so (e.g., missing terms, etc.)? Later, Jim On Fri, Jul 31, 2009 at 2:38 PM , Jibo John wrote: Number of docs are the same in the index for both the cases (200,000). I haven't altered the benchmark/ code, but, used a profiler to verify

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Jibo John
a smaller index. Can you sanity check the index? Eg is numDocs() the same for both? You definitely called close() on the writer, right? That method waits for all threads to finish their work before actually closing. Mike On Thu, Jul 30, 2009 at 8:01 PM, Jibo John wrote: While trying out a few tuni

ThreadedIndexWriter vs. IndexWriter

2009-07-30 Thread Jibo John
While trying out a few tuning options using contrib/benchmark as described in LIA (2nd edition) book, I had an interesting observation. If I use a ThreadedIndexWriter (picked the example from lia2e, page 356) instead of IndexWriter, the index size got reduced by 40% compared to using IndexWr

Re: Tokenizer queston: how can I force ? and ! to be separate tokens?

2009-07-17 Thread John Byrne
[?] that you stored previously. If you're already using something that's grammar-based (such as StandardTokenizer) then you could add the "?" to the grammar as a separate token. If you can figure out how to do this from looking at the grammar file, then it's probably

Re: searching for c++, c#, etc...

2009-07-16 Thread John Wang
If you escape the character + or #, the sentence: "I know java + c++" would not skip +, furthermore, it breaks query parsing, where + is reserved. -John On Thu, Jul 16, 2009 at 9:04 AM, John Wang wrote: > This runs into problems when you have such following sentence: > "I

Re: searching for c++, c#, etc...

2009-07-16 Thread John Wang
This runs into problems when you have a sentence such as the following: "I dislike c++." If you use WSA, then the last token is "c++.", not "c++", so the query would not find this document. -John On Thu, Jul 16, 2009 at 8:29 AM, Chris Salem wrote: > That seems to be worki
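A minimal sketch of the tokenization problem described above. It uses plain Java string splitting as a stand-in for a whitespace-only analyzer; it is not Lucene's WhitespaceAnalyzer, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for whitespace-only tokenization.
class WhitespaceTokenDemo {
    // Splits on runs of whitespace only, so punctuation stays attached to tokens.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("I dislike c++.");
        System.out.println(tokens);                 // [I, dislike, c++.]
        System.out.println(tokens.contains("c++")); // false: the indexed term is "c++."
    }
}
```

An exact-term lookup for "c++" misses the document because the trailing period is part of the indexed token.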

Re: strange issues with IRISH

2009-07-13 Thread John Byrne
nce much, storage space is generally not much of an issue, and it makes phrase searching more accurate if you keep them. -John Hi All, I've came across very strange issue with Irish language. I have the following set of strings in Irish: ag an gcrosbhealach seo, Lean ar an mórb

Re: Multi Value field

2009-07-07 Thread John Seer
tom > Similarity at query time to ignore the length normalisation factor. > > Cheers > Mark > > > > On 7 Jul 2009, at 19:31, John Seer wrote: > >> >> Hello, >> >> I have 100k index with documents with one searchable field in it. >> That &g

Multi Value field

2009-07-07 Thread John Seer
Hello, I have 100k index with documents with one searchable field in it. That field has multiple values for example doc( search: X search : X Y search: X Y Z id:1) doc( search: X Y K id:2) I am using Standard Analyzer for building and searching, and having problem with scores if the term is "

addIndexesNoOptimize

2009-07-03 Thread John Wang
? I temporarily commented it out and the resulting index seems to be fine. Thanks -John

Re: KeywordAnalyzer

2009-07-01 Thread John Seer
about - and if my term contains & or \ no results. I tried to use QueryParser.escape() before passing into the parser. I am not getting an error during querying, but no result is found. Simon Willnauer wrote: > > On Wed, Jul 1, 2009 at 7:27 PM, John Seer wrote: >> >> Hello,

KeywordAnalyzer

2009-07-01 Thread John Seer
Hello, I am using KeywordAnalyzer for one of the fields and have a problem with it when my original term has non-English characters as well as - & \ /. Is there any alternative for this, or how can I solve the issue with these characters? Thanks

Re: Lucene Term Encoder

2009-06-29 Thread John Seer
place. Basically, my main problem at the moment is the dash: I don't know how to search for a term which has a dash in it. Thanks Simon Willnauer wrote: > > Hi John, > > what do you mean by encoding? If you can be more clear about what you > are looking for you might get hel

Lucene Term Encoder

2009-06-26 Thread John Seer
Hello, Is there any class in lucene which will do encoding for term? Thanks

Re: How to create a new index

2009-05-20 Thread John Byrne
our problem and simplify the code a little - I think you could just use that constructor every time, because it will only create the index if it does not already exist. -John KK wrote: Thanks a lot @John. That solved the problem and the other advice is really helpful. I'd have bumped

Re: How to create a new index

2009-05-20 Thread John Byrne
do this in your main method, right after you create an instance of SimpleIndexer, but before you call createIndex. -John KK wrote: Thank you very much. I'm using the one mentioned by @Anshum ..but the problem is that after indexing some no of docs what I see is only the last one indexe

Re: How to create a new index

2009-05-20 Thread John Byrne
tories. If you want to automate the generation of the path itself, then there are several ways to do it, but the best way really depends on *why* you're generating a new index. For instance, you could just create a timestamped name, but that name might not be very meaningful. Hope tha

Re: kamikaze

2009-04-30 Thread John Wang
You are right, Grant. Michael, Anmol, let's move this to the kamikaze mailing list: http://groups.google.com/group/kamikaze-users Michael, I have added you by default. -John On Thu, Apr 30, 2009 at 4:37 PM, Grant Ingersoll wrote: > Does Kamikaze have a mailing list? It seems like, to m

Re: Query did not return results

2009-04-24 Thread John Wang
What analyzers are you using for both query and indexing? Can you also post some code showing how you indexed? -John On Fri, Apr 24, 2009 at 8:02 PM, blazingwolf7 wrote: > > Hi, > > I created a query that will find a match inside documents. Example of text > match "terror india"

Re: kamikaze

2009-04-24 Thread John Wang
Hi Michael: We are using it internally here at LinkedIn for both our search engine as well as our social graph engine. And we have a team developing actively on it. Let us know how we can help you. -John On Fri, Apr 24, 2009 at 1:56 PM, Michael Mastroianni < mmastroia...@glgroup.com>

Re: Faceting, Sort and DocIDSet

2009-04-22 Thread John Wang
Karsten: Yes, you kinda need that for faceting to work. Take a look at FacetDataCache class. -John On Wed, Apr 22, 2009 at 3:06 AM, Karsten F. wrote: > > Hi Dave, > > facets: > in your case a solution with one > int[IndexReader.maxDoc()] > fits. For each document nu

SpellChecker locks folder

2009-04-22 Thread John Cherouvim
Hello, After I call the SpellChecker.indexDictionary method the directory which contained the lucene index is locked. I cannot rename or delete the folder (windows). In the source of SpellChecker lines 352-353 I see that after the indexing is done the index is reopened: searcher.close(); sea

Re: Faceting, Sort and DocIDSet

2009-04-20 Thread John Wang
Hi David: We built bobo-browse specifically for these types of usecases: http://code.google.com/p/bobo-browse Let me know if you need any help getting it going. -John On Mon, Apr 20, 2009 at 12:59 PM, Karsten F. wrote: > > Hi David, > > correct: you should avoid reading

Re: LocalLucene/Lucene Spatial

2009-04-19 Thread John Wang
Is there a reason the Query is built from a bitset via a ConstantScoreQuery instead of a RangeQuery? Seems we would be paying a penalty for loading the bitset, esp. as the bitset would be rather sparse. Furthermore, is TrieRangeQuery planned to be somehow used in the spatial package? Thanks -John On

Re: Google's search Appliance relevance ranking

2009-04-17 Thread John Wang
r box (you may have to pay support on top of the "pizza box") my two cents. -John On Fri, Apr 17, 2009 at 9:09 AM, Grant Ingersoll wrote: > > On Apr 16, 2009, at 10:22 AM, Vasudevan Comandur wrote: > > Hi, >> >> The question that I am posting in this grou

Re: Sequential match query

2009-04-13 Thread John Seer
If I understand you guys correctly, if I have a term which has n tokens, I will need to create n SpanTermQuery objects, put them in an array, and pass it to SpanNearQuery? Erik Hatcher wrote: > > > On Apr 12, 2009, at 8:15 AM, Tim Williams wrote: > >> On Sun, Apr 12, 2009 at 5:56 AM, Erik Hatcher >> wrote: >>

Re: Different Analyzer for different fields in the same document

2009-04-10 Thread John Seer
Thanks, this is a useful class for the future... Koji Sekiguchi-2 wrote: > > John Seer wrote: >> Hello, >> Is there any way that a single document's fields can have different >> analyzers >> for different fields? >> >> I think one way of doing it to create cust
