Re: file open handles?

2010-01-26 Thread Jason Rutherglen
Jamie, How often are you calling getReader? Is it only these files? Jason On Tue, Jan 26, 2010 at 12:58 PM, Jamie wrote: > Ok. I spoke too soon. The problem is not solved. I am still seeing these > file handles lying around. Is this something I should be worried about? > We are no

Analyzer for stripping non alpha-numeric characters?

2010-02-04 Thread Jason Rutherglen
Is there an analyzer that easily strips non alpha-numeric from the end of a token? - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Analyzer for stripping non alpha-numeric characters?

2010-02-04 Thread Jason Rutherglen
wrote: > Hi Jason, > > Solr's PatternReplaceFilter(ts, "\\P{Alnum}+$", "", false) should work, > chained after an appropriate tokenizer. > > Steve > > On 02/04/2010 at 12:18 PM, Jason Rutherglen wrote: >> Is there an anal
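
For anyone trying the suggested pattern outside of Solr, here is a minimal sketch using only java.util.regex. It applies the same regular expression quoted above to a single token string; the class and method names are made up for illustration and this is not the Solr filter itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
final class TrailingNonAlnumStripper {
    // Same expression as in the reply: a run of non-alphanumeric
    // characters anchored at the end of the input.
    private static final Pattern TRAILING = Pattern.compile("\\P{Alnum}+$");

    static String strip(String token) {
        // The pattern is anchored, so replacing the first match is enough.
        return TRAILING.matcher(token).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(strip("lucene!!"));
        System.out.println(strip("foo-bar."));
    }
}
```

Note that \P{Alnum} is ASCII-only by default in Java, so accented letters at the end of a token would be stripped too.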

Re: Analyzer for stripping non alpha-numeric characters?

2010-02-04 Thread Jason Rutherglen
Answering my own question... PatternReplaceFilter doesn't output multiple tokens... Which means messing with capture state... On Thu, Feb 4, 2010 at 2:16 PM, Jason Rutherglen wrote: > Transferred partially to solr-user... > > Steven, thanks for the reply! > > I wonder if

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Jason Rutherglen
Peter, Perhaps other concurrent operations? Jason On Tue, Feb 23, 2010 at 10:43 AM, Peter Keegan wrote: > Using Lucene 2.9.1, I have the following pseudocode which gets repeated at > regular intervals: > > 1. FSDirectory dir = FSDirectory.open(java.io.File); > 2. dir.set

Re: If you could have one feature in Lucene...

2010-02-25 Thread Jason Rutherglen
long - whatever > happened to CSF? That feature is so 2006, and we still > don't have it? I'm completely disturbed about the whole situation myself. > > Who the heck is in charge here? > > On 02/25/2010 12:51 PM, Jason Rutherglen wrote: >> >> It'd be great to

Is it safe to use reopen on IndexReader

2010-03-31 Thread Jason Tesser
= new IndexSearcher(ir.reopen(true)); if(ir != indexSearcher.getIndexReader()){ ir.close(); } Is the if(ir != indexSearcher.getIndexReader()){ check needed? Thanks, Jason Tesser dotCMS Lead Development Manager 1-305-858-1422

fastest way to gather simple terms that match documents?

2010-03-31 Thread Jason Eacott
keen to avoid that option if possible. Is there a quick way to discover this information? All I need is a list of terms (as simple strings would be fine), I don't care how many were found or what position or anything else. just which ones matched. thoug

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-01 Thread Jason Eacott
Thanks for the ref - didn't know about Pig before. the language and approach looks useful, so now I'm wondering if it couldn't be used across lucene over hadoop too. If data was indexed in lucene and Pig knew that, then it could make for an interesting alternate lucene query language. could this w

round robin search results with same score

2007-11-11 Thread Jason Bradfield
. BTW. I am using Hibernate Search.. But have the ability to do pure Lucene... Thanks. Jason.

Re: Find last term

2008-05-13 Thread Jason Rutherglen
Last term, field, TermEnum On Tue, May 13, 2008 at 12:34 PM, Erick Erickson <[EMAIL PROTECTED]> wrote: > Find the last term of what? Document? Field in an index? Query? > > Best > Erick > > On Tue, May 13, 2008 at 12:28 PM, Jason Rutherglen < > [EMAIL PROTECTED]>

Find last term

2008-05-13 Thread Jason Rutherglen
It is easy to find the first term using TermEnum. Is there a way to find the last term without using StringIndex and binarysearch? Are there plans to offer this functionality?

Re: slow FieldCacheImpl.createValue

2008-05-20 Thread Jason Rutherglen
https://issues.apache.org/jira/browse/LUCENE-1278 solves this problem On Tue, May 20, 2008 at 1:32 AM, Anshum <[EMAIL PROTECTED]> wrote: > Hey Alex, > I guess you haven't tried warming up the engine before putting it to use. > Though one of the simpler implementation, you could try warming up the

Re: Handeling when a field does not exist in the document

2008-05-22 Thread Jason Rutherglen
That is an interesting problem. https://issues.apache.org/jira/browse/LUCENE-1292 will build a tag index that uses a ParallelReader to allow tag fields to be searchable. The tag index does not use the usual IndexWriter but uses a specialized realtime updateable index built for tags. Depending on

Re: Improving search performance

2008-05-22 Thread Jason Rutherglen
It would be interesting to see the results of using a custom IndexReader that implements http://dsiutils.dsi.unimi.it/docs/it/unimi/dsi/util/ImmutableExternalPrefixMap.html or something like it. The only problem right now would be hooking into the Lucene SegmentMerger to merge other indices such as

Re: Improving search performance

2008-05-22 Thread Jason Rutherglen
Query time boosting has no bottlenecks. Storing will not affect performance. You will probably want to use PrefixFilter and ConstantScoreRangeQuery. Solr has ConstantScorePrefixQuery. Simply means if the document contains the term, the result will show, the scoring will not be quite the same be

Re: Improving search performance

2008-05-24 Thread Jason Rutherglen
There needs to be a solution to that problem. I noticed it several years ago, which is why I have designed systems using MultiSearcher concepts ever since. There should only be one instance of deleted docs per IndexReader now that there is reopen. Editing the live deleted docs does not seem like so

RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Jason Rutherglen
Seeing strange behavior with RAMDirectory. Is a file designed to support an IndexOutput being open concurrently with an IndexInput? I open an IndexInput with the IndexOutput still open, with data written to the file previously, and the IndexInput is reporting a file length of 0, while Directory.fileLength() rep

Re: RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Jason Rutherglen
ROTECTED]> wrote: > Did you try calling flush() on the IndexOutput before opening the > IndexInput? > > -Yonik > > On Thu, Jun 19, 2008 at 12:13 PM, Jason Rutherglen > <[EMAIL PROTECTED]> wrote: > > Seeing strange behavior with RAMDirectory. Is a file design

Re: RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Jason Rutherglen
tput.writeBytes(bytes, bytes.length); output.flush(); System.out.println("fileLength: "+ramDirectory.fileLength("test")); output = ramDirectory.createOutput("test"); IndexInput input = ramDirectory.openInput("test"); System.out.println("input l

Re: RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Jason Rutherglen
oblem here). > > -Yonik > > On Thu, Jun 19, 2008 at 3:10 PM, Jason Rutherglen > <[EMAIL PROTECTED]> wrote: > > public void testMain() throws IOException { > >RAMDirectory ramDirectory = new RAMDirectory(); > >IndexOutput output = ramDirectory.

Re: RAMDirectory IndexInput and IndexOutput

2008-06-19 Thread Jason Rutherglen
Created a RAMDirectory like directory class that uses ByteArrayRandomAccessIO from http://reader.imagero.com/uio/ to allow concurrent random file access. On Thu, Jun 19, 2008 at 3:33 PM, Jason Rutherglen < [EMAIL PROTECTED]> wrote: > Looks like it cannot be used for a log system t

Re: yet again: getting the minimum and maximum value of a field

2008-06-25 Thread Jason Rutherglen
I looked heavily at this. It requires a customization of TermInfosReader whereby the tii (term dictionary) SegmentTermEnum is traversed looking for the last term with a particular field. Once found, from that position in the tis SegmentTermEnum would need to be traversed again for the last term w

Class for serializing TokenStream to IndexOutput

2008-06-26 Thread Jason Rutherglen
Is there a class to do this?

Re: Scaling

2008-07-17 Thread Jason Rutherglen
The scaling per machine should be linear. The overhead from the network is minimal because the Lucene object sizes are not impacting. Google mentions in one of their early white papers on scaling http://labs.google.com/papers/googlecluster-ieee.pdf that they have sub indexes which are now popular

Re: Scaling

2008-07-18 Thread Jason Rutherglen
could be done with indexes that are updated often however it would seem to require a lot of work with possibly little to gain, unless you want to offer the user 0.05 second response times. On Fri, Jul 18, 2008 at 3:49 AM, Eric Bowman <[EMAIL PROTECTED]> wrote: > Jason Rutherglen wrote:

Re: Using lucene as a database... good idea or bad idea?

2008-07-30 Thread Jason Rutherglen
A possible open source solution using a page based database would be to store the documents in http://jdbm.sourceforge.net/ which offers BTree, Hash, and raw page based access. One would use a primary key type of persistent ID to lookup the document data from JDBM. Would be a good Lucene project

Re: Lucene Concurrency Issue

2008-08-07 Thread Jason Rutherglen
The contrib realtime search patch enables the functionality you described. https://issues.apache.org/jira/browse/LUCENE-1313 On Wed, Aug 6, 2008 at 7:45 PM, Alex Wang <[EMAIL PROTECTED]> wrote: > > Hi all, > > To allow multiple users concurrently add, delete docs and at the same time > search the

Re: Text storing design and performance question

2007-01-10 Thread Jason Pump
Renaud, one optimization you can do on this is to try the first 10kb, see if it finds text worth highlighting, if not, with a slight overlap try the next 9.9kb - 19.9kb or just 9.9kb -> end if you're feeling lazy. This assumes that most good matches are at the start of the document, and that th
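
The windowing idea above can be sketched roughly as follows. This only computes the overlapping [start, end) ranges to hand to a highlighter one at a time; the class name is hypothetical, and the sizes are counted in characters here rather than bytes, which is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes overlapping windows over a text.
final class HighlightWindows {
    /** Returns [start, end) offsets of overlapping windows covering the text. */
    static List<int[]> windows(int textLength, int windowSize, int overlap) {
        if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
            throw new IllegalArgumentException("need 0 <= overlap < windowSize");
        }
        List<int[]> result = new ArrayList<>();
        int step = windowSize - overlap; // each window starts 'overlap' before the previous end
        for (int start = 0; start < textLength; start += step) {
            int end = Math.min(start + windowSize, textLength);
            result.add(new int[] {start, end});
            if (end == textLength) {
                break; // the last window already reaches the end of the text
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] w : windows(25_000, 10_000, 100)) {
            System.out.println(w[0] + " - " + w[1]);
        }
    }
}
```

A caller would try each window in order and stop at the first one that yields a fragment worth showing, so documents with a good match near the start never pay for the rest.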

Re: Text storing design and performance question

2007-01-11 Thread Jason Pump
the start of the document. Most queries should have results that meet that criteria. Renaud Waldura wrote: Jason: Interesting idea, thanks. But how do you know whether the highlighting is any good? I thought highlighter implemented some kind of strategy to find the best fragment. Say my q

Re: Index a source, but not store it... can it be done?

2007-03-08 Thread Jason Pump
If you store a hash code of the word rather than the actual word you should be able to search for stuff but not be able to actually retrieve it; you can trade precision for "security" based on the number of bits in the hash code ( e.g. 32 or 64 bits). I'd think a 64 bit hash would be a reasonab
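
A rough sketch of the hashing idea, assuming a 64-bit FNV-1a hash over the UTF-8 bytes of each term (my choice for illustration; the message does not name a hash function). The same function would have to be applied to query terms at search time. The class name is hypothetical, and this is not a vetted security design: an unkeyed hash of dictionary words can be reversed by hashing a word list.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: turns a term into an opaque 64-bit token.
final class TermHasher {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    /** 64-bit FNV-1a over the UTF-8 bytes of the term. */
    static long hash64(String term) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    /** Hex form, usable as the indexed term text in place of the word. */
    static String hashedTerm(String term) {
        return Long.toHexString(hash64(term));
    }

    public static void main(String[] args) {
        System.out.println(hashedTerm("lucene"));
    }
}
```

Truncating the value to fewer bits would increase collisions, which is the precision-for-"security" trade mentioned above.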

Re: Index a source, but not store it... can it be done?

2007-03-09 Thread Jason Pump
documents places on them and how much effort he thinks that a hacker might be prepared to put into recovering the text. The best you're ever going to do is to protect the index as well as you do the original documents. jch ----

OT re Emulating Pages Search

2007-04-03 Thread Jason Pump
't possibly score higher than my #10 result right now. In this situation the idea of supplying a page start/end does become valuable in reducing load and does not require maintaining state inside the engine. Jason Erick Erickson wrote: Efficient in your situation, maybe. Good for everybody? Pro

Re: Language detection library

2007-05-03 Thread Jason Pump
commands, e-mail: [EMAIL PROTECTED] -- Jason Pump Technical Architect Healthline 660 Third Street, Ste. 100 San Francisco, CA 94107 direct dial 415.281.3133 cell 510.812.1784 www.healthline.com 09 F9 11 02 9D 74 E3 5B D8 41 5

Re: product based term combination for BooleanQuery?

2007-07-03 Thread Jason Pump
You're not using any type of phrase search. Try -> ( (title:"John Bush"^4.0) OR (body:"John Bush") ) AND ( (title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush) ) or maybe ( (title:"John Bush"~4^4.0) OR (body:"John Bush"~4) ) AND ( (title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush

De-duping MultiSearcher results

2005-11-14 Thread Jason Calabrese
returned docs to remove dups using the guid field in the index. This works fine when the results are under about 5,000 documents, but when there is a large number of results a search takes way too long. Does anyone know of a better and more efficient way t

Re: De-duping MultiSearcher results

2005-11-14 Thread Jason Calabrese
when I search multiple indexes. --Jason > You probably want to build a Filter. > > I've been planning to do exactly this on our own system, only our > duplicates are indicated by documents having the same value in an MD5 > digest field, instead of a GUID field. > > For

Re: Inappropriate content detection

2006-02-06 Thread Jason Polites
There is also an open source Java anti-spam API which does a Bayesian scan of email content (plus other stuff). You could retro-fit it to work with raw text. www.jasen.org (get the latest HEAD from CVS as the current release is a bit old... new version imminent) - Original Message - From:

Re: Getting count of documents matching a query?

2006-04-07 Thread Jason Calabrese
I just wrote some simple code to test this. For my test I used 3 queries: - A 3 term boolean - A single term query with over 5000 hits - A single term query with 0 hits For each query I ran 4 tests of 10,000 searches: 1) using hits.length to get the counts and the standard si

Re: Stemming terms in SpanQuery

2006-05-02 Thread Jason Calabrese
I think the best way to tokenize/stem is to use the analyzer directly. For example: TokenStream ts = analyzer.tokenStream(field, new StringReader(text)); Token token = null; while ((token = ts.next()) != null) { Term newTerm = new Term(field, token.termTe

Re: Adding stem AND original term

2006-06-28 Thread Jason Pump
I would think what you want to do is index on the stem, and rank on the stem and the original form. After all, if you match exactly, then you better match for the stem. Robert Haycock wrote: Hi, I started using the EnglishStemmer and noticed that only the stem gets added to the index. I woul

Re: search with RangeFilter.Less

2006-06-28 Thread Jason Pump
It's a string comparison. Making the "5" a "05" would be a simple workaround. Jason Peter W. wrote: Hello, I'm trying to do a numerical search for a property in Lucene using RangeFilter.Less without using both RangeQuery and test cases. Here's the code
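
To illustrate why the padding helps: term ranges compare strings character by character, so "5" sorts after "10" unless both have the same width. A minimal sketch in plain Java, with a hypothetical helper name and non-negative integers assumed:

```java
import java.util.Locale;

// Hypothetical helper: fixed-width encoding so string order matches numeric order.
final class PaddedNumbers {
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    public static void main(String[] args) {
        System.out.println("5".compareTo("10") > 0);            // unpadded: "5" sorts after "10"
        System.out.println(pad(5, 2).compareTo(pad(10, 2)) < 0); // padded: "05" sorts before "10"
    }
}
```

The width has to be chosen for the largest value you expect, and the same padding has to be applied both when indexing and when building the range bounds.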

Inserting a document into an index at a specified position

2006-07-05 Thread Jason Calabrese
e any standard way to do this? --Jason

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Jason Calabrese
All, I sent this the other day, but didn't get any responses. I'm hoping that it was just missed, so I'm trying again. There has to be a better way to insert a document into an index than reindexing everything. --Jason On Wednesday 05 July 2006 5:06 pm, Jason Calabre

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Jason Calabrese
> When you say you keep your documents ordered alphabetically, it's confusing > to me. Are you saying that you pre-sort all your documents then insert them > one after another so that automatically-generated internal Lucene ID maps > exactly to the alphabetical ordering? That is, for any document I

Re: Inserting a document into an index at a specified position

2006-07-07 Thread Jason Calabrese
We only display 10 hits at a time, so we don't need to iterate through all the hits. It feels like there should be a way to pull a document out of one index and stick it into another and bring all the unstored fields along with it. On Friday 07 July 2006 12:52, Erick Erickson wrote: > Did you

Re: Sorting

2006-07-29 Thread Jason Calabrese
One way to make an alphabetic sort very fast is to presort your docs before adding them to the index. If you do this you can then just sort by index order. We are using this for a large index (1 million+ docs) and it works very well, and seems even slightly faster than relevance sorting.
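
A small sketch of the presorting step, in plain Java with no Lucene calls. It assumes documents are added in the returned order so that position in the list stands in for index order; the class name and the case-insensitive ordering are my assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper: decides the order in which documents get added.
final class PresortedIndexOrder {
    /** Returns the titles in the order they should be added to the index. */
    static List<String> presort(Collection<String> titles) {
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>();
        titles.add("pear");
        titles.add("Apple");
        titles.add("banana");
        // Position in the printed list plays the role of index order.
        System.out.println(presort(titles));
    }
}
```

The catch, as the "Inserting a document into an index at a specified position" thread points out, is that a document added later lands at the end of the index order, so the trick only holds while the index is rebuilt in sorted order.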

Field compression too slow

2006-08-10 Thread Jason Polites
Hello all, I am experiencing some performance problems indexing large(ish) amounts of text using the IndexField.Store.COMPRESS option when creating a Field in Lucene. I have a sample document which has about 4.5MB of text to be stored as compressed data within the field, and the indexing of this

Re: Field compression too slow

2006-08-10 Thread Jason Polites
Thanks for the Jira issue... one question on your synchronization comment... I have "assumed" I can't have two threads writing to the index concurrently, so have implemented my own read/write locking system. Are you saying I don't need to bother with this? My reading of the doco suggests that y

Re: updating document

2006-08-10 Thread Jason Polites
Are you storing the contents of the fields in the index? That is, specifying Field.Store.YES when creating the field? In my experience fields which are not stored are not recoverable from the index (well.. they can be reconstructed but it's a lossy process). So when you retrieve the document,

Re: Field compression too slow

2006-08-10 Thread Jason Polites
I can share the data.. but it would be quicker for you to just pull out some random text from anywhere you like. The issue is that the text was in an email, which was one of about 2,000 and I don't know which one. I got the 4.5MB figure from the number of bytes in the byte array reported in the

Re: updating document

2006-08-10 Thread Jason Polites
the index. Lucene works best when the index is light-weight. My recommendation is to think carefully about the "role" of the index, vs the role of your data storage approach. On 8/11/06, Deepan Chakravarthy <[EMAIL PROTECTED]> wrote: On Fri, 2006-08-11 at 01:58 +1000, Jason Po

Re: search document for keywords and keyphrases

2006-08-11 Thread Jason Polites
Yes you could use lucene for this, but it may be overkill for your requirement. If I understand you correctly, all you need to do is find documents which match "any" of the words in your list? Do you need to rank the results? If not, it's probably easier just to create your own inverted index of
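
A minimal sketch of such a home-grown inverted index, assuming unranked "match any word" lookups are all that is needed. Everything here (class name, tokenizing on non-word characters, lowercasing) is an illustrative assumption, and it handles single words only, not key phrases.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index: word -> ids of documents containing it.
final class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Ids of documents containing at least one of the given words, unranked. */
    Set<Integer> matchAny(Collection<String> words) {
        Set<Integer> result = new TreeSet<>();
        for (String word : words) {
            result.addAll(postings.getOrDefault(
                    word.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet()));
        }
        return result;
    }
}
```

Once ranking, phrases, or stemming come into the picture, this stops being simpler than Lucene.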

Re: Indexing Documents which has Attachments and are Refered many times!!

2006-08-12 Thread Jason Polites
Maybe I'm not understanding your requirement, but this should be fairly simple in Lucene. Each document in your document management system would be represented by a single Lucene document in the index. Each lucene document will then have several fields, each field representing the values of the

Re: WIll storing docs affect lucene's search performance ?

2006-08-12 Thread Jason Polites
IMO you should avoid storing any data in the index that you don't need for display. Lucene is an index (and a damn good one), not a database. If you find yourself storing large amounts of data in the index, this could be an indication that you may need to re-think your architecture. In its simp

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Jason Polites
Sounds like you're a bit frustrated. Cheer up, the simple fact is that engineering and business rarely see eye-to-eye. Just focus on the fact that what you have learnt from the process will help you, and they paid for it ;) On the issue at hand...Lucene should scale to this level, but you need

Re: updating document

2006-08-12 Thread Jason Polites
ync you should be ok. On 8/11/06, Karel Tejnora <[EMAIL PROTECTED]> wrote: Jason is right. I think, even though I'm no Lucene expert either, your newly added document can't recreate terms for the field with the analyzer, because the field text is empty. There is a very hairy solution - hack a IndexRead

Re: Index not recreated

2006-08-14 Thread Jason Polites
My advice would be the "back-to-basics" approach. Create a test case which creates a simple index with a few documents, verify the index is as you expect, then re-create the index and verify again. Run this test case on your production environment (if you are able). This will determine once and

Re: Index not recreated

2006-08-14 Thread Jason Polites
fferent threads accessing the index. This would also explain why you see the problem in production and not testing. On 8/15/06, Jason Polites <[EMAIL PROTECTED]> wrote: My advice would be the "back-to-basics" approach. Create a test case which creates a simple index with a few do

Re: Indexing Documents which has Attachments and are Refered many times!!

2006-08-19 Thread Jason Polites
, Shaghayegh Sahebie <[EMAIL PROTECTED]> wrote: thanks Jason and Steve; maybe i didn't understand your solution well, but in this system a document is referred many times (we have a refer description which we should index also) and each time a document is referred i should update

Re: index update with database insertion

2006-08-21 Thread Jason Polites
I'm not sure about the solution in the referenced thread. It will work, but doesn't it run the risk of breaching the transaction isolation of the database write? The issue is when the index is notified of a database update. If it is notified prior to the transaction commit, and the commit fails

java.io.IOException: Access is denied on java.io.WinNTFileSystem.createFileExclusively

2006-08-26 Thread Jason Polites
Hi all, When indexing with multiple threads, and under heavy load, I get the following exception: java.io.IOException: Access is denied at java.io.WinNTFileSystem.createFileExclusively(Native Method) at java.io.File.createNewFile(File.java:850) at org.apache.lucene.store.FSDirectory$1.o

Re: java.io.IOException: Access is denied on java.io.WinNTFileSystem.createFileExclusively

2006-08-26 Thread Jason Polites
On 8/26/06, Michael McCandless <[EMAIL PROTECTED]> wrote: Are you also running searchers against this index? Are they re-init'ing frequently or being opened and then held open? No searches running in my initial test, although I can't be certain what is happening under the Compass hood. This

Re: java.io.IOException: Access is denied on java.io.WinNTFileSystem.createFileExclusively

2006-08-27 Thread Jason Polites
due to any reason can be thought of as the same thing, regardless of the reason (so long as its logged). Seems like the simplest solution too. On 8/28/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 8/26/06, Jason Polites <[EMAIL PROTECTED]> wrote: > Synchronization at this

Re: java.io.IOException: Access is denied on java.io.WinNTFileSystem.createFileExclusively

2006-08-27 Thread Jason Polites
]> wrote: Doron Cohen wrote: > "Jason Polites" <[EMAIL PROTECTED]> wrote on 27/08/2006 09:36:07: > >> I would have thought that simultaneous cross-JVM access to an index was >> outside of scope of the core Lucene API (although it would be great), but &

Re: Lucene displaying results in the order they were added

2006-08-27 Thread Jason Polites
Not sure what the desired end result is here, but you shouldn't need to update the document just to give it a boost factor. This can be done in the query string used to search the index. As for updating affecting search order, I don't think you can assume any guarantees in this regard. You're pr

Re: java.io.IOException: Access is denied on java.io.WinNTFileSystem.createFileExclusively

2006-08-28 Thread Jason Polites
Yeah.. I had a think about this, and I now remember why I originally came to the conclusion about cross-JVM access. When I was adding documents to the index, and searching at the same time (from a different JVM) I would get the occasional (but regular) FileNotFoundException. I don't recall the

Re: java.io.IOException: Access is denied on java.io.WinNTFileSystem.createFileExclusively

2006-08-28 Thread Jason Polites
ound.. if that helps. On 8/28/06, Michael McCandless <[EMAIL PROTECTED]> wrote: Jason Polites wrote: > Yeah.. I had a think about this, and I now remember why I originally > came to > the conclusion about cross-JVM access. > > When I was adding documents to the index, and searc

Re: Straight TF-IDF cosine similarity?

2006-08-29 Thread Jason Polites
Have you looked at the MoreLikeThis class in the similarity package? On 8/30/06, Winton Davies <[EMAIL PROTECTED]> wrote: Hi All, I'm scratching my head - can someone tell me which class implements an efficient multiple term TF.IDF Cosine similarity scoring mechanism? There is clearly the sin

Re-created fields consistently indexed?

2006-08-30 Thread Jason Polites
Hi all, I understand that it is possible to "re-create" fields which are indexed but not stored (as is done by Luke), and that this is a lossy process, however I am wondering whether the indexed version of this remains consistent. That is, if I re-create a non-stored field, then re-index this fi

word frequency list?

2006-08-30 Thread Jason Pump
Is there a large list of words and their frequency in the English language? Obviously it would differ by corpus but I would like to see what's already available.

Re: word frequency list?

2006-08-31 Thread Jason Pump
Thanks Boris, Jason Boris Aleksandrovsky wrote: Jason, You can look here: http://www.cs.ualberta.ca/~lindek/downloads.htm for Word frequency counts from a 1.5B word corpus (TREC disks 1-5 and the Reuters corpus <http://about.reuters.com/researchandstandards/corpus/>). The word

Stop words in index

2006-09-02 Thread Jason Polites
Hey all, I am using the StandardAnalyzer with my own list of stop words (which is more comprehensive than the default list), and my expectation was that this would omit these stop words from the index when data is indexed using this analyzer. However, I am seeing stop words in the term vector fo

Re: Stop words in index

2006-09-02 Thread Jason Polites
Original Message From: Jason Polites <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Saturday, September 2, 2006 9:05:27 AM Subject: Stop words in index Hey all, I am using the StandardAnalyzer with my own list of stop words (which is more comprehensive than the default list), and m

Re: Stop words in index

2006-09-03 Thread Jason Polites
", but not "on". This is fine, and if the user searches for: Disney on Ice They will get a match. But, it seems that a search for: "Disney on Ice" With the quotations indicating the desire for an "exact match", the absence of stop words in the index means this

Re: Stop words in index

2006-09-04 Thread Jason Polites
e" which is what should have gone in your doc when it was indexed using that analyzer. : : On 9/3/06, Jason Polites <[EMAIL PROTECTED]> wrote: : > : > Roger that. I'll double check my code. : > : > Thanks. : > : > : > On 9/3/06, Otis Gospodnetic <[EMAIL PROT

Re: Possible exceptions using IndexReader & IndexWriter

2006-09-18 Thread Jason Polites
I've also seen FileNotFound exceptions when attempting a search on an index while it's being updated, and the searcher is in a different JVM. This is supposed to be supported, but on Windows seems to regularly fail (for me anyway). The simplest solution to this would be a service oriented approa

lucene 1.4 + needs spaces problem

2005-04-06 Thread Jason Eacott
ED]:false + [EMAIL PROTECTED]:"2004" + [EMAIL PROTECTED]:"February" + [EMAIL PROTECTED]:"Council" can anyone tell me if this has been fixed somewhere or whether this was by design? (I cannot imagine that it is) I know I have and set by default but this should stil

RE: FileNotFoundException segments

2005-07-07 Thread Jason Polites
if ((indexFile = new File(indexDir)).exists() && indexFile.isDirectory()) { exists = false; Isn't this backwards? Couldn't you just do: indexFile = new File(indexDir); exists = (indexFile.exists() && indexFile.isDirectory()); -Original Message- From: bib_lucene bib [mailto:

RE: Search Timeout - abort a search

2005-07-07 Thread Jason Polites
You could do it asynchronously. That is, split the actual Lucene search off into a different thread, then the calling thread simply waits for a maximum time for the search thread to complete, then queries the status of the search thread to get the results obtained
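
The approach described above could look something like this with java.util.concurrent. It is only a sketch: the search is passed in as a generic Callable rather than any real Lucene call, the names are hypothetical, and it returns a caller-supplied fallback on timeout instead of collecting partial results.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper: runs a search on another thread and gives up after a deadline.
final class TimedSearch {
    static <T> T runWithTimeout(Callable<T> search, long timeoutMillis, T fallback)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(search);
        try {
            // The calling thread waits here for at most timeoutMillis.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; this only stops work that checks for interruption.
            future.cancel(true);
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Cancelling does not forcibly stop a search that ignores interrupts, so a long-running query may keep burning CPU in the background after the caller has moved on.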

Re: Did you mean?

2005-08-29 Thread Jason Haruska
To add to other comments: This functionality should also look at how common a term is in the corpus. Using the corpus as "correct" set of terms to search on isn't always what you want if the corpus is unclean (misspellings, etc.) I believe this is why if you search on an uncommon term, Google w

Re: custom sort

2005-08-31 Thread Jason Haruska
I had to do something similar, but I plan on re-writing it into something more elegant. I hope this helps give you some ideas. 1. Create a QueryFilter on only those items that matched the criteria (have a required clause in your boolean query) 2. Create a BitFilter which takes a BitSet from step

Re: Question: force a field must be matched?

2005-09-15 Thread Jason Haruska
On 9/15/05, James Huang <[EMAIL PROTECTED]> wrote: > > Suppose I have a book index with field="publisher", field="title", etc. > I want to search for books only from "Manning", do I have to do anything > special? how? > add new BooleanClause(new TermQuery(new Term("publisher","Manning")), true,

Searching for Empty Field

2011-07-14 Thread Trieu, Jason T
the latest postings on this topic were a few years old, I am wondering if there have been any changes in Lucene query syntax to support searching for empty fields. Has anyone successfully searched for empty fields with recent Lucene releases? Thanks Jason

How to determine memory required for searching

2011-08-04 Thread Trieu, Jason T
ts of resources. Perhaps 8 GB of memory is just simply not enough to handle an index of 600 million documents. But before telling management that they must get more memory, I'd like to see if there might be other ways to accomplish this. Thanks in advance. Jason

Assistance for Unified Index Proces

2013-08-14 Thread Mark Jason B. Nacional
ified Index". In this implementation, we have only one index file to manage. I just want to get information on how to implement it in an optimal way. Any suggestion would be perfect! :) Thanks! Mark Jason Nacional Junior Software Engineer

RE: How to search

2008-08-26 Thread Jiao, Jason (NSN - CN/Cheng Du)
y to look for those that match the pattern. Br. Jason Jiao >-Original Message- >From: ext Daniel Noll [mailto:[EMAIL PROTECTED] >Sent: Tuesday, August 26, 2008 10:50 AM >To: java-user@lucene.apache.org >Subject: Re: How to search > >Venkata Subbarayudu wrote: >

Luke issues "Unknown format version: -6"

2008-08-26 Thread Jiao, Jason (NSN - CN/Cheng Du)
not contain any new features, API or file format changes, which makes it fully compatible to 2.3.0 and 2.3.1". Any hints? Thanks in advance. Jason Jiao
