) {
this.lowercaseExpandedTerms = lowercaseExpandedTerms;
}
The query parser lowercases terms only for wildcard, prefix, fuzzy, and
range queries, and it can be turned off with
parser.setLowercaseExpandedTerms(false);
which solved my problem.
Best regards,
C.
On Thu, Sep 22, 2016 at 5:01 PM, Cam Bazz
Hello,
I am indexing userAgent fields found in Apache logs, indexing and querying
everything with
KeywordAnalyzer - but I found something strange:
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer q_analyzer = new KeywordAnalyzer();
QueryParser pars
Hello,
FacetResult.getTopChildren returns the top N facets; however, I need to
return facets whose count is above a certain threshold, for example all
facets with counts > 10.
Is there a way to accomplish this? I have been looking over the API docs
and could not find it. I could maybe g
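One workaround (a sketch of the idea, not a Lucene API): since the top children come back sorted by count descending, you can request a generous N and cut the list at the threshold. The class and method names below are illustrative:

```java
import java.util.*;

// Sketch: filter (label, count) pairs, assumed sorted by count descending as
// a top-children facet result would be, down to those above a threshold.
public class FacetThreshold {
    public static List<String> aboveThreshold(List<Map.Entry<String, Integer>> facets, int threshold) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : facets) {
            if (e.getValue() <= threshold) break; // sorted descending: stop early
            out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> facets = List.of(
            Map.entry("chrome", 40), Map.entry("firefox", 12), Map.entry("safari", 7));
        System.out.println(aboveThreshold(facets, 10)); // [chrome, firefox]
    }
}
```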
Hello,
I have a field called timeSlot in my documents, basically representing an
hour.
When a query is made, I would like to graph how many doc hits
correspond to each timeSlot, sort it, and display a chart of it.
I am simply using term queries, to query StringFields, and here is my
re
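The tallying step the question describes can be sketched with a plain map, independent of Lucene (in practice the timeSlot value of each hit would be read inside a collector; the values here are illustrative):

```java
import java.util.*;
import java.util.stream.*;

// Sketch: accumulate hit counts per timeSlot value, then sort entries by
// count descending for charting.
public class TimeSlotHistogram {
    public static List<Map.Entry<String, Long>> histogram(List<String> slotPerHit) {
        Map<String, Long> counts = slotPerHit.stream()
            .collect(Collectors.groupingBy(s -> s, Collectors.counting()));
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(histogram(List.of("09", "10", "10", "09", "10")));
    }
}
```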
ents are otherwise
> tiny, to add one if you don't really need it.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Sep 12, 2016 at 5:42 AM, Cam Bazz wrote:
> > Hello,
> >
> > Do I need to add a key, if I will not be
> >
> >
Hello,
Do I need to add a key if I will not be
a. updating the document, and
b. fetching the document by key?
What could be the possible downside of not using a key that uniquely
identifies the document?
I am building a log processor, and all I will do is sort and iterate.
Best regards,
C.
Hello,
I need to index arrays of long, usually long[20], 20 elements in length.
It's been a while since I worked with Lucene; the last time was probably <
version 3.
I read
https://lucene.apache.org/core/6_2_0/core/org/apache/lucene/document/Field.html
There are SortedDocValuesField and SortedSetDocValuesF
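One option (an assumption on my part, not the only route): a binary doc-values field stores raw bytes, so a long[20] can be packed into a byte[] and unpacked losslessly. The packing itself is plain java.nio:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch: pack a long[] into a byte[] (suitable for a binary doc-values
// field, which takes raw bytes) and unpack it again. The round trip is
// lossless.
public class LongArrayCodec {
    public static byte[] encode(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) buf.putLong(v);
        return buf.array();
    }

    public static long[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long[] out = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < out.length; i++) out[i] = buf.getLong();
        return out;
    }

    public static void main(String[] args) {
        long[] in = {1L, -2L, 3_000_000_000L};
        System.out.println(Arrays.equals(in, decode(encode(in)))); // true
    }
}
```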
How can we get on to that list?
Best,
On Mon, Oct 20, 2008 at 1:58 AM, Hasan Diwan <[EMAIL PROTECTED]> wrote:
> 2008/10/19 Mark Miller <[EMAIL PROTECTED]>:
>> You might instead limit your email to those that have agreed to be contacted
>> at http://wiki.apache.org/lucene-java/Support
>
> FWIW, th
for instance one described in:
http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/rusher.html
On Mon, Sep 29, 2008 at 4:04 PM, Jason Rutherglen
<[EMAIL PROTECTED]> wrote:
> What is that?
>
> On Mon, Sep 29, 2008 at 8:51 AM, Cam Bazz <[EMAIL PROTECTED]> wrote
Has anyone tried to implement a triplet store with lucene?
Best,
-C.B.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
one moment:
The top doc collector is based on some sort of queue, I assume. What
kind of queue is that? Does it sort based on score, or on whichever doc
comes first?
best.
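The usual structure behind a top-docs collector is a bounded min-heap ordered by score: the lowest-scoring retained hit sits at the head, and each new hit either displaces it or is dropped, so docs are not kept in arrival order. A plain-Java sketch of that idea (not Lucene's actual implementation):

```java
import java.util.*;

// Sketch: keep the top N scores with a size-bounded min-heap.
public class TopNByScore {
    public static List<Float> topN(List<Float> scores, int n) {
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap: smallest at head
        for (float s : scores) {
            if (heap.size() < n) heap.add(s);
            else if (s > heap.peek()) { heap.poll(); heap.add(s); } // displace weakest
        }
        List<Float> out = new ArrayList<>(heap);
        out.sort(Comparator.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(topN(List.of(0.2f, 0.9f, 0.5f, 0.7f), 2)); // [0.9, 0.7]
    }
}
```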
On Wed, Sep 17, 2008 at 9:43 PM, Chris Hostetter
<[EMAIL PROTECTED]> wrote:
>
> : Well, it turns out the theoretical maximum f
fusionio.com has the SSD killer, and not that expensive either - just
twice or triple the price of an SSD.
Best.
On Tue, Sep 16, 2008 at 2:16 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:
> Related, I've been considering filesystem based filters on SSD. That ought
> to be rather fast, consume no memory and be as si
CTED]> wrote:
> Are the terms stopwords?
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message
>> From: Cam Bazz <[EMAIL PROTECTED]>
>> To: java-user@lucene.apache.org
>> Sent: Tuesday
And how about queries that need a starting position, like hits between
100 and 200?
Could we pass something to the collector that will count from 0 to
100 and then get the next 100 records?
Best.
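The slicing idea can be sketched without Lucene: a collector that skips the first `start` hits and keeps the next `count`. (In practice one typically collects the top `start + count` docs and slices; this just illustrates the skip-and-take step. The class is hypothetical, not a Lucene type.)

```java
import java.util.*;

// Sketch: a collector that ignores the first `start` hits and keeps the
// next `count` doc ids.
public class PageCollector {
    private final int start, count;
    private int seen = 0;
    private final List<Integer> page = new ArrayList<>();

    public PageCollector(int start, int count) { this.start = start; this.count = count; }

    public void collect(int docId) {
        if (seen++ >= start && page.size() < count) page.add(docId);
    }

    public List<Integer> page() { return page; }

    public static void main(String[] args) {
        PageCollector c = new PageCollector(2, 3);
        for (int d = 0; d < 10; d++) c.collect(d);
        System.out.println(c.page()); // [2, 3, 4]
    }
}
```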
On Wed, Sep 17, 2008 at 5:16 PM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> Doesn't TopDocCollecto
Hello,
Let's say I have two documents, both containing field F.
document 0 has the string "a b" as F
document 1 has the string "b a" as F
I am trying to make a phrasequery like:
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("F", "a"));
pq.add(new Term("F", "b"));
I noticed this was because I was using a KeywordAnalyzer.
Is it possible to write a document with different analyzers in different fields?
Best.
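The phrase-matching behavior above comes down to positions: a phrase query requires the terms at consecutive positions in the analyzed token stream, while KeywordAnalyzer indexes the whole value "a b" as a single token, so term-level phrase queries find nothing. A plain-Java sketch of the positional idea (for mixing analyzers per field, Lucene's PerFieldAnalyzerWrapper is the usual route):

```java
import java.util.*;

// Sketch: a phrase [a, b] matches only if the tokens appear in that order at
// consecutive positions, so "a b" matches and "b a" does not.
public class PhraseMatch {
    public static boolean matchesPhrase(List<String> tokens, List<String> phrase) {
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> phrase = List.of("a", "b");
        System.out.println(matchesPhrase(List.of("a", "b"), phrase)); // true
        System.out.println(matchesPhrase(List.of("b", "a"), phrase)); // false
    }
}
```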
On Tue, Sep 16, 2008 at 8:33 AM, Cam Bazz <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Lets say I have two documents, both containing field
Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message
>> From: Cam Bazz <[EMAIL PROTECTED]>
>> To: java-user@lucene.apache.org
>> Sent: Monday, September 15, 2008 11:25:39 PM
>> Subject: Re: TopDocs
In cases where we don't know the possible number of hits - and wanting
to test the new 2.4 way of doing things -
could I use custom hit collectors for everything? Is there any
performance penalty for this?
from what I understand both TopDocCollector and TopDocs will try to
allocate an array of Integer.MAX_V
Yes, I looked into implementing a custom collector that would return
the number of hits, but I could not:
collect() can only access local variables that are final, and a final
variable cannot be incremented.
Any ideas?
Best.
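The standard workaround: an anonymous inner class may only capture final locals, but a final *reference* to a mutable holder (a one-element array, or an AtomicInteger) can still be updated from collect(). A sketch with a stand-in Collector interface (not Lucene's):

```java
// Sketch: increment a counter from an anonymous inner class by capturing a
// final reference to a mutable one-element array.
public class CountingCollector {
    interface Collector { void collect(int doc); }

    public static int countHits(int[] docs) {
        final int[] count = {0};                 // final reference, mutable cell
        Collector c = new Collector() {
            public void collect(int doc) { count[0]++; }
        };
        for (int d : docs) c.collect(d);
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countHits(new int[]{4, 8, 15})); // 3
    }
}
```

An instance field on a named collector subclass works just as well; the restriction only applies to captured local variables.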
On Tue, Sep 16, 2008 at 6:05 AM, Daniel Noll <[EMAIL PROTECTED]> wrote:
> Cam B
Hello,
Could it hurt if I make a
searcher.search(query, Integer.MAX_VALUE) call?
I just need to run a query to get the number of hits in this case,
but I don't know what the maximum number of hits will be.
Also, is topDocs.totalHits the same as topDocs.scoreDocs.length?
Best.
-C.A.
---
Hello,
What kind of query is best to warm up a searcher? How many searches should I do?
Are we supposed to search for things we know exist, or is it better
to make queries for things we know don't exist?
Best.
-C.B.
Hello Karl;
This is good good good news. It works.
However, I added a document like
doc.add(new Field("f", "a", Field.Store.YES,
Field.Index.NOT_ANALYZED_NO_NORMS));
and then searched. The score is ~0.3 for the found document. Shouldn't
it be 1.0?
also it will find when searched for "f","b" o
n Mon, Sep 15, 2008 at 11:09 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> It will return true if the provided docID was deleted, by term or query or
> docID (due to exception, privately) prior to when you asked IndexWriter to
> give you a "realtime" IndexReader.
>
buffered deletes down to docID. Those deletes
> that are against existing segments in the index will be flushed at that
> point to those segments; the deletes that apply only to buffered docs will
> be held in RAM and used by the RAMIndexSearcher that searches IndexWriter's
> buff
Out of curiosity, and somewhat unrelated to this thread: when can we
expect to see 2.4?
It seems much has changed, so people will want to port their code.
Best.
On Mon, Sep 15, 2008 at 10:56 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Cam Bazz wrote:
>
>
t;> still in the OS's write cache when it crashed.
>>
>> But the guarantee only holds if the underlying storage system is "honest"
>> about fsync(), ie, it truly flushes all written bytes for that file to disk
>> before returning.
>>
>> Mike
>>
Well, I did not understand this.
So there is no way of using the new constructor and specifying
autoCommit = false?
Best
On Mon, Sep 15, 2008 at 10:30 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Cam Bazz wrote:
>
>> However the documentation states that autoCom
.
On Mon, Sep 15, 2008 at 10:20 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> You'll have to open a new IndexReader after the delete is committed.
>
> An IndexReader (or IndexSearcher) only searches the point-in-time snapshot
> of the index as of when it was ope
certain criteria.
Best.
On Mon, Sep 15, 2008 at 10:05 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Cam Bazz wrote:
>
>> Hello,
>>
>> I see that IndexWriter.flush() is deprecated in 2.4. What do we use?
>
> Looks like you already found it, but the j
Hello,
Here is what I am trying to do:
dir = FSDirectory.getDirectory("/test");
writer = new IndexWriter(dir, analyzer, true, new
IndexWriter.MaxFieldLength(2));
writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
Document da = new Document();
da.ad
g) is a
> private method that should never have been in the javadocs. Thanks for
> raising this!
>
> Mike
>
> Cam Bazz wrote:
>
>> Hello,
>>
>> What is the difference between flush in <2.4 and commit?
>>
>> Also I have been looking over docs, a
Hello,
I would like to take advantage of isDeleted. If I delete a document
from the index, do not commit, and the index searcher is not reinstantiated,
how can I check whether a document is marked for deletion? I tried it
both with commit() and without committing; isDeleted(mydeleteddocid)
always returns f
Hello,
What is the new preferred way of running a query?
I understand Hits will be deprecated. So how do we do it the new way?
With a hit collector?
Best.
Hello,
What is the difference between flush in <2.4 and commit?
Also, I have been looking over the docs, and they mention commit(long),
but there is no commit(long) method, only commit().
Best.
Hello,
I see that IndexWriter.flush() is deprecated in 2.4. What do we use?
Also I used to make a:
try {
nodeWriter = new IndexWriter(nodeDir, true, analyzer, false);
} catch(FileNotFoundException e) {
nodeWriter = new IndexWriter(nodeDir, true, analyzer,
Hello,
I have been looking at instantiated index in the trunk. Does this come
with a searcher? Are the adds reflected directly to the index?
Or is it just an experimental thing only with reader and writer?
Best.
[EMAIL PROTECTED]> wrote:
> I usually do:
> cd
> patch -p 0 -i
>
> See also the HowToContribute page on the wiki.
>
>
> On Sep 15, 2008, at 7:38 AM, Cam Bazz wrote:
>
>> Hello,
>>
>> To patch for lucene-1314 what must I do?
>>
>>
Hello,
Let's say we have different document types, and one type of document
contains only field A.
How can I make a query so that I get all the documents that have only field A?
There is a get all documents query, but that would get all the
documents whether they contain field A or not.
Is there
> On Thu, 2008-09-04 at 17:58 +0200, Cam Bazz wrote:
> > anyone using ramdisks for storage? there is ramsam and there is also
> fusion
> > io. but they are kinda expensive. any other alternatives I wonder?
>
> We've done some comparisons of RAM (Lucene RAM
hello,
Anyone using RAM disks for storage? There is RamSan and there is also Fusion-io,
but they are kind of expensive. Any other alternatives, I wonder?
Best.
hello,
I was reading the performance optimization guides and found:
writer.setRAMBufferSizeMB()
combined with writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
this can be used to flush automatically, so that if the RAM buffer size
exceeds a certain limit it will flush.
now the question:
at 5:02 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:
>
> 4 sep 2008 kl. 15.54 skrev Cam Bazz:
>
> yes, I already have a system for users reporting words. they fall on an
>> operator screen and if operator approves, or if 3 other people marked it
>> as
>> curse, t
ver shingles?
Best,
On Thu, Sep 4, 2008 at 4:12 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:
>
> 4 sep 2008 kl. 14.38 skrev Cam Bazz:
>
>
> Hello,
>> This came up before but - if we were to make a swear word filter, string
>> edit distances are no good. for exampl
Hello Jason,
I have been trying to do this for a long time on my own; keep up the good
work.
What I tried was a document cache using Apache Commons Collections, and before an
index write/delete I would sync the cache with the index.
I am waiting for Lucene 2.4 to proceed (delete by query).
Best.
On Wed, Sep
Hello,
This came up before, but if we were to make a swear word filter, string
edit distances are no good: for example, a word like `shot` is confused with
`shit`. There is also a problem with words like hitchcock. Apparently I need
something like Soundex or Double Metaphone. The thing is - these are
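For reference, the classic American Soundex mentioned above fits in a few lines (a sketch of the standard algorithm, not a Lucene API). Note one honest caveat: Soundex drops vowels, so `shot` and `shit` still collide (both code to S300); phonetic codes fix some edit-distance confusions but not this one.

```java
// Minimal American Soundex sketch: keep the first letter, map consonants to
// digit classes, skip vowels, collapse adjacent duplicates, pad to 4 chars.
public class Soundex {
    // digit class for each letter a..z ('0' = vowel/h/w/y, not coded)
    private static final String CODES = "01230120022455012623010202";

    public static String encode(String word) {
        String w = word.toLowerCase();
        StringBuilder sb = new StringBuilder();
        sb.append(Character.toUpperCase(w.charAt(0)));
        char prev = CODES.charAt(w.charAt(0) - 'a');
        for (int i = 1; i < w.length() && sb.length() < 4; i++) {
            char ch = w.charAt(i);
            if (ch < 'a' || ch > 'z') continue;
            char code = CODES.charAt(ch - 'a');
            if (code != '0' && code != prev) sb.append(code);
            if (ch != 'h' && ch != 'w') prev = code; // h/w do not break a run
        }
        while (sb.length() < 4) sb.append('0');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert")); // R163
        System.out.println(encode("Rupert")); // R163
    }
}
```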
Hello,
Recently I developed an interest in making a Lucene-based structure for
tagging. As we all know, Lucene's updates are not real-time, and one has to
delete a document prior to updating it.
I have been googling for different approaches to a lucene based tagging
structure, and I stumbled upon
ht
hello,
how could I select a random document out of a document
collection inside a Lucene index?
best regards,
-C.B.
hello,
how do we get the terms with the highest frequency for a given field?
I know one can do TermEnum terms = searcher.getIndexReader().terms(), then
iterate over it, filter the fields required, and count them,
but is there a way to get, say, the top 50 terms for a given field without
iterating?
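If no direct API call is available, the tally-and-sort step of the fallback looks like this in plain Java (the term values are illustrative; in Lucene they would come from the TermEnum walk described above):

```java
import java.util.*;
import java.util.stream.*;

// Sketch: count term frequencies and keep the K most frequent.
public class TopTerms {
    public static List<String> topK(List<String> terms, int k) {
        Map<String, Long> freq = terms.stream()
            .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        return freq.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topK(List.of("x", "y", "x", "z", "x", "y"), 2)); // [x, y]
    }
}
```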
in
> the past, i.e. DT Search, Verity, etc.
>
> Am I missing something? For us it has been a "best practice" to treat
> Lucene as described.
>
> //andy
>
>
> On Fri, Aug 8, 2008 at 2:39 PM, Cam Bazz <[EMAIL PROTECTED]> wrote:
> > hello,
> >
>
ones to delete? A separate reader
> running on the side?
>
> The problem is, as IndexWriter merges segments, the IDs shift. Any reader
> you have already open won't see this shift (until you reopen it), so you
> could end up deleting the wrong IDs.
>
> Mike
>
>
hello,
what would happen if I modified the IndexWriter class and made the
delete-by-id method public?
I have two fields in my documents and I have to be able to delete by those
two fields (by query, in other words), and I do not wish to go to the trunk version.
I am getting quite desperate, and if not f
TED]> wrote:
>
> Cam Bazz wrote:
>
> I am still in trouble deleting documents.
>>
>
> OK but 2.3.3 isn't going to help you here -- it doesn't change anything
> about deletion of docs.
>
> Appearently - when an indexwriter and searcher is open at the sa
exception.
Best,
-C.B.
On Tue, Aug 5, 2008 at 2:33 AM, Michael McCandless <
[EMAIL PROTECTED]> wrote:
>
> Alas, not yet -- at least it hasn't been discussed yet.
>
> Mike
>
>
> Cam Bazz wrote:
>
> hello,
>>
>> is there a
hello,
is there any date for the 2.3.3 release?
best,
-C.B.
Well, I just checked the API; the deleteDocuments(Term[]) method deletes any
document containing any of the terms.
I think I will go to the trunk version.
best.
-c.a.
On Sat, Aug 2, 2008 at 12:14 AM, Cam Bazz <[EMAIL PROTECTED]> wrote:
>
> from what I understand:
> there is a delet
From what I understand:
there is a deleteDocuments-by-term-array method?
I was asking if there is a side effect of deleting from an IndexReader that I
get from an IndexSearcher rather than from the writer.
Best.
On Sun, Jul 27, 2008 at 9:44 PM, Karsten F.
<[EMAIL PROTECTED]>wrote:
>
> Hi,
>
> only to b
2:13 AM, Karl Wettin <[EMAIL PROTECTED]> wrote:
>
> 23 jul 2008 kl. 22.08 skrev Cam Bazz:
>
>
> hello -
>>
>> if I make a query and get the document ids and delete with the document id
>> -
>> could there be a side effect?
>>
>> my inde
hello -
if I make a query, get the document ids, and delete with the document id,
could there be a side effect?
My index is committed periodically, but I cannot say when it is committed.
best regards,
-c.b.
how reliable is the version in the trunk? is it ok for production?
On Wed, Jul 23, 2008 at 5:25 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> It's in the lucene trunk (current development version).
> IndexWriter.deleteDocuments(Query query)
>
> -Yonik
>
> On Wed,
hello,
wasn't there a Lucene delete-by-query feature coming up? I remember
something like that, but I could not find any references.
best regards,
-c.b.
Hello,
Is it possible to make a boolean query where a word matches fieldA or
fieldB?
In other words, I'd like to search for a word in two fields; if the word
appears in fieldA or fieldB, then it is a hit.
Best,
-C.B.
Hello,
I need to be able to select a random word out of all the words in my index.
How can I do this through termDocs()?
Also, I need to get a list of unique words. Is there a way to ask
Lucene for this?
Best Regards,
-C.B.
ote:
> I assume you want all of your queries to function in this way?
>
> If so, you could just translate the * character into a ? at search time,
> which should give you the functionality you are asking for.
>
> Unless I'm missing something.
>
> Matt
>
>
Hello,
Imagine I have the following documents having keys
A
A>B
A>B>C
A>B>D
A>B>C>D
Now imagine a query with the keyword analyzer and a wildcard: A>B>*
which will bring me A>B>C, A>B>D, and A>B>C>D,
but I just want to get A>B>C and A>B>D,
so can I make a query like A>B>* but does not have the > cha
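The "only one level deep" condition can be expressed as: starts with the prefix, and no further `>` separator after it. Sketched here as plain string logic (in Lucene this could be a post-filter over the wildcard hits, or a custom query; the helper below is illustrative):

```java
import java.util.*;
import java.util.stream.*;

// Sketch: keep only direct children of a hierarchical key like "A>B".
public class DirectChildren {
    public static List<String> directChildren(List<String> keys, String parent) {
        String prefix = parent + ">";
        return keys.stream()
            .filter(k -> k.startsWith(prefix) && !k.substring(prefix.length()).contains(">"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of("A", "A>B", "A>B>C", "A>B>D", "A>B>C>D");
        System.out.println(directChildren(keys, "A>B")); // [A>B>C, A>B>D]
    }
}
```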
yes, figured it out. thanks.
how about checking for uniqueness?
Best.
On Wed, Jun 11, 2008 at 5:39 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:
>
> 11 jun 2008 kl. 16.04 skrev Cam Bazz:
>
>>
>> When you look at the fields of a document with Luke, there is a norm
&g
Hello,
When you look at the fields of a document with Luke, there is a norm column.
I have not been able to figure out what that is.
The reason I am asking is that I am trying to build a uniqueness model. My
Index is structured as follows:
classID, textID, K, V
classID is a given class. textID
Hello,
a little off topic, but how did you obtain the PageRank for each page? Did
you calculate it, or did you obtain it some other way while fetching a
specific site?
Best.
On Thu, May 29, 2008 at 3:28 PM, 过佳 <[EMAIL PROTECTED]> wrote:
> thanks Glen , we have tried it , but the bottleneck
Hello Bill,
The problem I am having is that some of them have multiple columns and
multiple word boxes. Does the xpdf patch extract different columns and word boxes?
Best,
-C.B.
On Wed, May 14, 2008 at 6:35 PM, Bill Janssen <[EMAIL PROTECTED]> wrote:
> > > the unix program pdf2text can convert keep
Hello All,
Any suggestions for extracting text from PDF? I have tried PDFBox; it
works nicely, but if the PDF is structured it won't provide good results.
For example consider the pdf:
P1 Lorem Ipsum Bla bla P3 Lorem2 Ipsum2
P1 bla bla
P2 bla bla bla
P
Hello,
I am querying an index using custom boost factors for each field. A query
usually looks like:
fieldA:"term1"^0.2 fieldB:"term2"^4
when I get scores from HitCollector, they are not necessarily between 0 and
1.
How can I normalize these scores?
Best.
-C.A.
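One common (if crude) normalization is dividing every score by the maximum score of the result set, so the top hit maps to 1.0. Note this only makes scores comparable within one result set, not across queries. A minimal sketch:

```java
// Sketch: normalize a result set's scores into [0, 1] by dividing by the max.
public class ScoreNormalizer {
    public static float[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) max = Math.max(max, s);
        float[] out = new float[scores.length];
        if (max > 0f) {
            for (int i = 0; i < scores.length; i++) out[i] = scores[i] / max;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] n = normalize(new float[]{4.0f, 2.0f, 1.0f});
        System.out.println(java.util.Arrays.toString(n)); // [1.0, 0.5, 0.25]
    }
}
```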
Hello,
I recently changed my query logic. Before, I was getting a Hits object; now
I am using a BitSet with a HitCollector.
The reason for using a BitSet is document caching, and being able to count how
many hits belong to which categories.
Although my new logic works, I have noticed that now t
IndexReader. IndexReader
> still searches only a point in time.
>
> Mike
>
> Cam Bazz wrote:
>
> > yes, I meant the same index.
> >
> > I thought with the new changes - the index reader would see the
> > changes
> > without re-opening.
> > It wo
what you mean by "same thread". Maybe you meant "same
> index"?
>
> Yes, if the IndexReader reopens.
>
> IndexWriter.commit() makes the changes visible to readers, and makes
> the changes durable to os/computer crash or power outage.
>
> Mike
>
> Cam
wrote:
>
> It's a hard drive issue. When you call fsync, the OS asks the hard
> drive to sync.
>
> Mike
>
> Cam Bazz wrote:
>
> > Hello,
> >
> > I understand the issue. But I have not understood - is this
> > hardware related
> > issue - i.e a
bytes are not
> actually written to stable storage. If you have such a device that
> lies then Lucene 2.4 won't be able to guarantee index consistency on
> crash/power outage.
>
> Mike
>
> Cam Bazz wrote:
>
> > Hello,
> >
> > What do you mean by I
> files in the index to stable storage (assuming your IO system doesn't
> "lie" on fsync).
>
> Mike
>
> On Mar 17, 2008, at 4:33 AM, Cam Bazz wrote:
>
> > Nice. Thanks.
> >
> > will the 2.4 have commit improvements that we previously talked about?
riter.
>
> Mike
>
> Cam Bazz wrote:
>
> > Hello Erick,
> >
> > Has anyone found a way for deleting a document with a query? I
> > understand it
> > can be deleted via terms, but I need to delete a document with two
> > terms,
> > that is the only w
Hello Erick,
Has anyone found a way to delete a document with a query? I understand it
can be deleted via terms, but I need to delete a document by two terms;
the only way I can identify my document is by looking at two terms,
not one.
best.
On Fri, Mar 14, 2008 at 4:58 PM, Erick Eri
Hello,
I have a tokenized field where I store some info.
Let's say I have "abc 1234" and "abc 678".
When the user searches for "abc1234", how can I find "abc 1234"?
Best.
-C.B.
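One way (an assumption, not the only approach): index an extra field with whitespace removed and normalize the query string the same way, so "abc1234" and "abc 1234" meet on the same term. The helper is illustrative, not a Lucene API:

```java
// Sketch: a shared normalization for both index-time and query-time values,
// stripping whitespace so "abc 1234" and "abc1234" compare equal.
public class SpacelessMatch {
    public static String normalize(String s) {
        return s.toLowerCase().replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("abc 1234").equals(normalize("abc1234"))); // true
    }
}
```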
de, then for the first it
> will suggest "abcde" but for the second it won't suggest it because the
> ngrams produced are "abc" and "bce" .. and "bce" does not appear in
> "abcde".
>
> Am I right? If not, can you elaborate more on t
e you
> add more terms than what exists, it won't find anything.
>
> On Feb 13, 2008 6:54 PM, Cam Bazz <[EMAIL PROTECTED]> wrote:
>
> > Hello;
> >
> > I am trying to make a product matcher based on lucene's ngram based
> > suggest.
> > I did s
Hello;
I am trying to make a product matcher based on Lucene's ngram-based suggest.
I did some changes so that instead of giving the speller a dictionary, I feed
it with a List.
For example lets say I have "HP NC4400 EY605EA CORE 2 DUO T5600
1.83GHz/512MB/80GB/12.1''
NOTEBOOK"
and I index it with
t; a finally block. Batch load multiple docs, but if your just randomly
> adding
> a doc, get the Writer, add it, and then release the Writer in a finally
> block. If you are batch loading a million docs and you want to be able
> to see them
> as they are added: get the writer and add
ent, and an app that adds docs will be a bit more responsiveeg it
> wont hang as Readers are being reopened.
>
> I also have to bring the AccessProvider classes back. No easy way to use
> your own custom Readers without it...I shouldn't have stripped it out.
>
> - Mark
&
Hello,
Regarding https://issues.apache.org/jira/browse/LUCENE-1026 , this seems
very interesting. I have read the discussion on the page, but I could not
figure out which set of files is the latest.
Is it the IndexAccessor-1.26.2008.zip file?
I will read through the code, make my own tests, and s
Hello,
When using a parallel reader with, let's say, two indexes: when we request a
document by id,
is it the combined fields of the document from the two indexes that are returned?
The documentation was not clear on that one, apart from the document(int n,
FieldSelector fs) method.
Best,
-C.B.
Hello,
I have read the ParallelReader doc. It says an index must have the same
number of documents as the other index.
When we are using a writer-searcher combination, how can we integrate this
parallel reader into the game?
Simply, I have some documents, and I would just like to mark them, in an efficient
wa
Hello;
If no document is ever deleted or updated in an index, will the document
ids change? Under which circumstances will the document ids change, apart
from deletes?
Best Regards,
-C.B.
Hello,
How can I use a hit collector and a Sort object in a query? I looked at the
API, and Sort is only usable with Hits. Is it even possible? Since a
HitCollector returns a BitSet, how do we do the ordering?
Best,
-C.B.
Hello,
Is IndexSearcher thread-safe? I made a simple HTTP server using Grizzly, as
described in
http://jlorenzen.blogspot.com/2007/06/using-grizzly-to-create-simple-http.html
which submits queries to a single instance of IndexSearcher, and I get some
errors (when I query with more than one thread) suc
5 PM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> Can you show us what you've tried?
>
> Erick
>
> On Jan 25, 2008 10:49 AM, Cam Bazz <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > How about getting which documents have the that term as a bitset?
}
> list.add(term.text());
> } while (theTerms.next());
>
>
> On Jan 25, 2008 10:24 AM, Cam Bazz <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > How do we get the TermEnum trick? I could not figure it out. basically,
>
Hello,
How do we do the TermEnum trick? I could not figure it out. Basically, I
have a field called category, and I would like to learn what distinct values
the category field takes (sort of like DISTINCT in SQL).
Best Regards,
-C.B.
<[EMAIL PROTECTED]>
wrote:
>
> That means that one of the merges, which run in the background by
> default with 2.3, hit an unhandled exception.
>
> Did you see another exception logged / printed to stderr before this
> one?
>
> Mike
>
> Cam Bazz wrote:
>
> >
Does anyone have any idea about the error I got while indexing?
Best Regards,
-C.B.
Exception in thread "main" java.io.IOException: background merge hit
exception: _kq:C962870 _kr:C2591 into _ks [optimize]
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1749)
at org.apach
Hello,
Could someone show me a concrete example of how to use HitCollector?
I have documents which have a field category. When I run a query, I need to
sort results by category as well as count how many hits there are for a
given category.
I understand:
searcher.search(Query, new HitCollector()
t;
> Do you have a specific use case in mind here? I think we'd like to
> make this option available someday in IndexWriter, but doing so now
> (when there is no way to get a "reliable" docID) seems too dangerous...
>
> Mike
>
> Cam Bazz wrote:
>
> > Hel
using a reader, it will acquire the write.lock,
> which will fail if you have another writer open on that index).
>
> Mike
>
> Cam Bazz wrote:
>
> > Hello Michael;
> >
> > how can I construct a chain where both reader and writer at the
> > same state?
> &g
the source to Lucene makes me think of extensions. Nice
code.
Best,
On Jan 21, 2008 4:47 PM, Michael McCandless <[EMAIL PROTECTED]>
wrote:
>
> Cam Bazz wrote:
>
> > Hello,
> >
> > When we delete documents from index - will it autoflush when count of
>
gt; You can also use Solr, which provides "delete by query".
>
> Mike
>
> Cam Bazz wrote:
>
> > Hello Mike;
> >
> > How about deleting by a compount term?
> >
> > for example if I have a document with two fields srcId and dstId
> > and I wa