Using alternative scoring mechanism.

2012-12-01 Thread Eyal Ben Meir
Can one replace the basic scoring algorithm (TF/IDF) for a specific field, to use a different one? I need to compute similarity for NAME field. The regular TF/IDF is not good enough, and I want to use a Name Recognition Engine as a scorer. How can it be done? Thanks, Eyal.

Re: IndexNotFoundException

2011-06-06 Thread Ben Hood
Ian, Thanks a lot for the heads up. I didn't even know those detailed release notes existed - I just looked on the main release page and searched a bit through JIRA. Cheers, Ben On Mon, Jun 6, 2011 at 2:57 PM, Ian Lea wrote: > In the release notes for 3.1.0 under Changes in b

Re: IndexNotFoundException

2011-06-06 Thread Ben Hood
r(dir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED); IndexSearcher searcher = new IndexSearcher(dir, true); } } On Mon, Jun 6, 2011 at 10:58 AM, Ben Hood <0x6e6...@gmail.com> wrote: > Hi, > > I'm trying to upgrade from 3.0.2 to 3.2.0 and am runn

IndexNotFoundException

2011-06-06 Thread Ben Hood
ous? Thanks for any help, Cheers, Ben -- package net.lshift.diffa.kernel.util; import com.eaio.uuid.UUID; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.IndexWriterConfig; import org.apache.lucene.sea

boosting on a sint results in high cpu spikes and ultimately hangs solr

2011-02-08 Thread Ben VandenBos
Here's our cache config: Any advice or ideas would be greatly appreciated. thanks! ben

Question to the writer of MultiPassIndexSplitter

2010-07-22 Thread Yatir Ben Shlomo
Hi, I heard work is being done on re-writing MultiPassIndexSplitter so it will be a single pass and work quicker. I was wondering if this is already done or when is it due ? Thanks

Matching term document character offsets (PyLucene 3.0.1)

2010-05-16 Thread Ben Phelan
crementToken() # do something here with the token (which should match the search term but isn't even close...) p = pos positions.close() searcher.close() Any help with either of these problems would be greatly appreciated! Cheers, Ben ---

Re: Implementing filtering based on multiple fields

2010-01-08 Thread Yaniv Ben Yosef
is under 1.0. > > Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > - Original Message > > From: Yaniv Ben Yosef > > To: java-user@lucene.apache.org > > Sent: Thu, January 7, 2010 6:55:18 PM > > Subject: Re: Implementing f

Re: Implementing filtering based on multiple fields

2010-01-07 Thread Yaniv Ben Yosef
gospodne...@yahoo.com> wrote: > For something like CSE, I think you want to isolate users and their > data/indices. > > I'd look at Bixo or Nutch or Droids ==> Lucene or Solr > > Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > >

Implementing filtering based on multiple fields

2010-01-07 Thread Yaniv Ben Yosef
Hi, I'm very new to Lucene. In fact, I'm at the beginning of an evaluation phase, trying to figure whether Lucene is the right fit for my needs. The project I'm involved in requires something similar to the Google Custom Search Engine (CSE). In CSE, each user can defin

Re: Solr/Lucene on OpenVMS, filesystem-specific issues

2010-01-05 Thread Ben Armstrong
Ben Armstrong wrote: I am trying to get Solr 1.4.0 to work on OpenVMS V8.3 Alpha with Java 1.5.0-6.p1. ... If Lucene would consider the segment number to end at a final period instead of scanning to the end of the string, then I could get past this error. I looked at the other possible JAVA

Solr/Lucene on OpenVMS, filesystem-specific issues

2010-01-05 Thread Ben Armstrong
file, keeping last" and "keeping first" controls just govern where the mandatory (unquoted, unmangled) dot goes. And in any event, these only affect multi-dot filenames, e.g. "extra.dot.dat", not extensionless files, e.g. "segment_1.", so for Lucene t

SpanScorer example with lucene 3.0.0

2009-12-27 Thread Ben Jiang
? This seems to be quite some changes since 2.9.0. The old SpanScorer from highlighter package has been moved to the lucene core. I would appreciate if someone can help out here. Thanks in advance Ben

Review and questions about Lucene Java 2.9.0

2009-10-08 Thread Mehdi Ben Hamida
true? All security aspects (data's, access, communication...) should be developed and held by the application and that the API does not provide any security classes. Thanks a lot for your help. *Mehdi Ben Hamida*

Re: extracting non-english text from word, pdf, etc....??

2007-08-02 Thread Ben Litchfield
it comes out as gibberish. As a simple test if Acrobat can extract the text then PDFBox should be able to as well. Ben Quoting Grant Ingersoll <[EMAIL PROTECTED]>: Hey Michael, Have you given it a try? I would think they would work, but haven't actually done it. Setup a smal

Re: Indexing PDF document

2007-06-06 Thread Ben Litchfield
you need to include both the bouncy castle jars and FontBox jar. Both are included with the PDFBox distribution. Ben Quoting jim shirreffs <[EMAIL PROTECTED]>: Thanks I rebuilt PDFbox and got past that problem but now I am getting Exception in thread

Re: decrypting a PDF to read the content

2007-02-12 Thread Ben Litchfield
PDFBox comes with a version of BouncyCastle that will work. It is likely that other versions will also work as well. Is there a specific version that you have tried and didn't work? Ben Quoting Alixandre Santana <[EMAIL PROTECTED]>: Hi All, I got this error when i tried to de

Re: Full disk space during indexing process with 120 gb of free disk space

2006-12-04 Thread Ben Litchfield
u are most likely not calling close() on the PDDocument object. How are you adding the documents to the index? There is a simple helper class called org.pdfbox.searchengine.lucene.LucenePDFDocument that you may find useful. Ben Ariel Isaac Romero Cartaya wrote: Hi every body: I am getting
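The advice about calling close() can be sketched without PDFBox on the classpath. In the minimal example below, `ScratchDocument` and `CloseDemo` are hypothetical stand-ins (not PDFBox classes); the sketch only assumes that a document object holds some resource until `close()` is called, and shows one way to guarantee the call with a `finally` block even when a document fails to parse.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a document object that holds scratch resources
// until close() is called; it is not the PDFBox PDDocument class.
final class ScratchDocument {
    static int openCount = 0;   // number of documents not yet closed
    final String name;

    ScratchDocument(String name) {
        this.name = name;
        openCount++;
    }

    // Pretend text extraction; an empty name simulates an unreadable file.
    String extractText() {
        if (name.isEmpty()) {
            throw new IllegalStateException("unreadable document");
        }
        return "text of " + name;
    }

    void close() {
        openCount--;
    }
}

class CloseDemo {
    // Processes every document and closes each one in a finally block, so a
    // failure on one document does not leave it (or its resources) open.
    static int indexAll(List<String> names) {
        int indexed = 0;
        for (String name : names) {
            ScratchDocument doc = new ScratchDocument(name);
            try {
                doc.extractText();   // stand-in for adding the text to the index
                indexed++;
            } catch (IllegalStateException e) {
                // skip documents that cannot be read
            } finally {
                doc.close();
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        int indexed = indexAll(Arrays.asList("a.pdf", "", "b.pdf"));
        System.out.println(indexed + " indexed, "
                + ScratchDocument.openCount + " still open");
    }
}
```

The same shape applies when the loop body builds a Lucene document from the extracted text: the close call stays in `finally`, after the text has been handed to the index.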

Re: Intermittent search performance problem

2006-11-07 Thread Ben Dotte
y get to the bottom of this sooner or later. Thanks again, Ben On 11/6/06, Vladimir Olenin <[EMAIL PROTECTED]> wrote: Any profiler can add its own overhead. You might try the "-verbose:gc" JVM flag (if you haven't tried it yet). The fastest way to check if your problems

Re: Intermittent search performance problem

2006-11-03 Thread Ben Dotte
. On 11/3/06, Ben Dotte <[EMAIL PROTECTED]> wrote: I'm trying to figure out a way to troubleshoot a performance problem we're seeing when searching against a memory-based index. What happens is we will run a search against the index and it generally returns in 1 second or less. Bu

Intermittent search performance problem

2006-11-03 Thread Ben Dotte
indexes aren't optimized. We're currently on Lucene 2.0 but I had the same problem with 1.9.1. Thanks, Ben - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Out of memory error

2006-07-13 Thread Ben Litchfield
is still giving you an out of memory error then it is possibly an issue with PDFBox, if that is the case then please create an issue and attach/upload the PDF on the PDFBox site. Ben > Thanks. > > I am using the getText(PDDocument) method of the PDFTextStripper. I will > t

Re: What is a good book on Lucene?

2006-06-28 Thread Ben Knear
etc. Thanks! Vlad Lucene in Action was very helpful for this beginner! Ben

Re: Flushing RAMDir into FSDir

2006-06-28 Thread Ben Knear
eed of the indexing, but minimize damage in case of a crash. Ben Knear

Re: weird error with SVN of Lucene

2006-06-27 Thread Ben Knear
m trying to CO or update it for 2 hours... can you perform updates or COs? -- Yura Smolsky, http://altervisionmedia.com/

A Special SpanQuery

2006-06-22 Thread Ben Knear
within 1 slop of each other and not have C after B within the slop left over from A to B. With this rule, A B E M F would return true, while A B E C J would be false. Thanks! Ben

Re: Suggesting refine searches with Lucene

2006-02-13 Thread Ben
I may be wrong but isn't this what Carrot2 does? -Ben On 2/13/06, Chun Wei Ho <[EMAIL PROTECTED]> wrote: > > Thanks. But I am actually looking for approaches/libraries which will > help me to come up with the suggested "refine searches". > > For example

Re: Can PDFBox or POI handle multi-byte characters with different encodings?

2006-02-10 Thread Ben Litchfield
PDFBox can handle multi-byte encodings. There are a couple recent fixes for CJK languages that are not part of 0.7.2 but are part of the nightly build. Ben On Fri, 10 Feb 2006, Zhang, Lisheng wrote: > Hi, > > Currently we are using PDFBox to process PDF files and > POI to pro

Re: Java heap space ...after index process

2005-10-26 Thread Ben Litchfield
? Ben Patricio Galeas wrote: Hello All, I try to index some PDF documents using PDFBox. It works apparent normally, but when the index process ends, I get the following message: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space Do you have some idea? Thank

RE: Lucene in Action : example code -> document-parsing framework ...

2005-10-17 Thread Ben Litchfield
In addition, the latest version(0.7.2) of PDFBox does not require log4j, so you could also upgrade to that version. Ben On Mon, 17 Oct 2005 [EMAIL PROTECTED] wrote: > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/log4j/Logger > at org.pdfbox.p

Stopping Duplicates

2005-09-17 Thread Ben Gill
name What is the best way for me to achieve this? Thanks Ben

RE: PDFBox PDFExtractor

2005-09-12 Thread Ben Litchfield
lucene fairly easily. I highly suggest you do some tests against your own set of PDF documents. A new version of PDFBox was released this weekend and does have some improvements in terms of speed and memory. Ben Litchfield PDFBox http://www.pdfbox.org/ On Mon, 12 Sep 2005 [EMAIL PROTECTED] wrote

Date boosts implementation

2005-09-04 Thread Ben
Hi Could someone please give me some suggestions on how to implement date boosts? I would like to boost the document when it is new and lower the boost when it's old. Thanks, Ben

Re: IO bandwidth throttling

2005-09-01 Thread Ben Gollmer
utilities, assuming you are on Linux). Cheers, -- Ben signature.asc Description: OpenPGP digital signature

Re: IO bandwidth throttling

2005-09-01 Thread Ben Gollmer
a Unix box I would just renice the process or set some ulimits. Adding code to each application that might possibly need bandwidth or memory restrictions seems redundant, not to mention a chore :) Cheers, -- Ben

Re: Integrating lucene search with adobe search

2005-08-15 Thread Ben Litchfield
"; It is also possible to pass it in when opening from the command line. Ben Litchfield On Mon, 15 Aug 2005, Andrew Boyd wrote: > Hello all, > After I do my search and display the hits I get back I would like to pass > the search string that I used with lucene to acrobat reader when it

Count the total # of docs in the index?

2005-08-07 Thread Ben
Hi Is it possible to count the total number of documents in the index without requesting a search? I would like to count the total documents in the index within a date range. Thanks, Ben

Re: StackOverflowError when index pdf files

2005-07-20 Thread Ben Litchfield
Yes, this sounds like an issue with PDFBox, can you determine if it is a single PDF document and post an issue on the PDFBox sourceforge site. Thanks, Ben Litchfield On Wed, 20 Jul 2005, Otis Gospodnetic wrote: > It sounds like the problem may stem from your PDF parser >

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Ben
Wouldn't it defeat the purpose of clustering if you have a single server to manage a single index? What would happen if this server failed? Cheers, Ben On 6/8/05, Ben <[EMAIL PROTECTED]> wrote: > How about using JavaGroups to notify other nodes in the cluster about

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Ben
this method, I don't have to modify my Lucene code, I just need to add additional code to notify other nodes. I believe this method also scales better. Cheers, Ben On 6/7/05, Nader Henein <[EMAIL PROTECTED]> wrote: > I realize I've already asked you this question, but do you ne

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Ben
z, it's rock solid and really versatile. I am using Quartz, it is really great and supports cluster. Thanks, Ben On 6/7/05, Nader Henein <[EMAIL PROTECTED]> wrote: > When you say your cluster is on a single machine, do you mean that you > have multiple webservers on the same machi

Re: Lucene in clustered environment (Tomcat)

2005-06-07 Thread Ben
My cluster is on a single machine and I am using FS index. I have already integrated Lucene into my web application for use in a non-clustered environment. I don't know what I need to do to make it work in a clustered environment. Thanks, Ben On 6/7/05, Nader Henein <[EMAIL PROTECTED

Lucene in clustered environment (Tomcat)

2005-06-06 Thread Ben
Hi I would like to use Lucene in a clustered environment, what are the things that I should consider and do? I would like to use the same ordinary index storage for all the nodes in the cluster, possible? Thanks, Ben

Re: Lucene - PDFBox

2005-05-25 Thread Ben Litchfield
There were some fixes around extra spaces in the 0.7.1 version of PDFBox, if you are not using that version please try it, otherwise post an issue on the PDFBox sourceforge site. http://sourceforge.net/tracker/?group_id=78314&atid=552832 Thanks, Ben On Wed, 25 May 2005, Thomas X Hoban w

Re: Lucene - PDFBox

2005-05-25 Thread Ben Litchfield
Can you run the following command line application on the PDF to verify that the extracted text is correct java org.pdfbox.ExtractText Ben On Wed, 25 May 2005, Thomas X Hoban wrote: > > > First, I am new to Lucene. > > Is there anyone out there who has had trouble getting hi

Delete documents based on more than one condition?

2005-04-25 Thread Ben
Hi Is it possible to delete a set of documents where they match certain conditions? I would like to delete a set of articles that belong to a given user within a category. Thanks, Ben

Re: batch delete

2005-03-28 Thread Ben
Cheers, I didn't know iff = if and only if. :) My question was mainly about the polarity. Thanks, Ben On Mon, 28 Mar 2005 08:19:19 -0800, Chuck Williams <[EMAIL PROTECTED]> wrote: > Otis Gospodnetic writes (3/28/2005 7:34 AM): > > >iff = if and only if. Not a typo,

Re: batch delete

2005-03-28 Thread Ben
ero iff this term belongs after the argument. Shouldn't it be: Compares two terms, returning an integer which is less than zero if this term belongs before the argument, equal zero if this term is equal to the argument, and greater than zero if this term belongs after the argument Thanks,
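The comparison contract quoted in this message can be illustrated with a small stand-alone sketch. `SimpleTerm` is a hypothetical stand-in, not Lucene's own `Term` class, and the ordering it uses (field first, then text) is an assumption made for the example. The point is only the sign convention: negative when this term belongs before the argument, zero when the two are equal, positive when it belongs after.

```java
// Hypothetical stand-in for illustration only; this is not Lucene's Term class.
final class SimpleTerm implements Comparable<SimpleTerm> {
    final String field;
    final String text;

    SimpleTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    // Returns a value < 0 if this term belongs before other, 0 if the two
    // terms are equal, and a value > 0 if this term belongs after other.
    // Assumption: terms are ordered by field first, then by text.
    @Override
    public int compareTo(SimpleTerm other) {
        int byField = field.compareTo(other.field);
        return byField != 0 ? byField : text.compareTo(other.text);
    }
}

class TermOrderDemo {
    public static void main(String[] args) {
        SimpleTerm apple = new SimpleTerm("title", "apple");
        SimpleTerm banana = new SimpleTerm("title", "banana");
        // Each line prints whether the stated sign convention holds.
        System.out.println(apple.compareTo(banana) < 0);   // apple before banana
        System.out.println(banana.compareTo(apple) > 0);   // banana after apple
        System.out.println(apple.compareTo(new SimpleTerm("title", "apple")) == 0);
    }
}
```

Read this way, "iff" and "if" describe the same three cases, since the cases are mutually exclusive and cover every pair of terms.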

Re: batch delete

2005-03-28 Thread Ben
OK, so I have to query for a list of old documents (from a given date) and delete each document individually? Can I use DateFilter.Before() with Term? Thanks, Ben On Mon, 28 Mar 2005 02:13:48 -0600, Chris Lamprecht <[EMAIL PROTECTED]> wrote: > Ben, > > If you know the exact te

Re: batch delete

2005-03-27 Thread Ben
BTW is it possible to do what I am trying to achieve without querying the database or the index? Thanks, Ben On Mon, 28 Mar 2005 10:38:52 +1000, Ben <[EMAIL PROTECTED]> wrote: > Hi > > I need to delete a number of documents that are older than a > particular time from a Luc

batch delete

2005-03-27 Thread Ben
Hi I need to delete a number of documents that are older than a particular time from a Lucene index. What is the best way to do this? Thanks, Ben

Re: Alerts for search results

2005-03-06 Thread Ben
Thanks for your help, something for me to start with. PS: Sorry about the double posts -Ben On Sun, 06 Mar 2005 16:35:56 +0400, Nader Henein <[EMAIL PROTECTED]> wrote: > Well since you're doing it by keyword, it's a little tricky coz if you > want to batch like searches

Alerts for search results

2005-03-06 Thread Ben
feature? I would imagine it's going to take a lot of resources to do a search for each keyword. Any guidance is greatly appreciated. Thanks! -Ben

Alerts for search results

2005-03-06 Thread Ben
feature? I would imagine it's going to take a lot of resources to do a search for each keyword. Any guidance is greatly appreciated. Thanks! Ben