date:20070503

Re: Language detection library

2007-05-03 Thread karl wettin

On 4 May 2007, at 02:20, Chris Lu wrote: I suppose if a document is indexed as English or French, when users are searching the document, we need to parse the query as English or French also? If you do some language-specific token analysis such as stemming, yes. Detecting the language on such small t

Re: customizing index file name

2007-05-03 Thread Erick Erickson

Oh, fix as in make constant, not fix as in broken ... No, I don't know of any way to do this. Can your installer just pack up everything in a directory? Erick On 5/3/07, Shaw, James <[EMAIL PROTECTED]> wrote: I mean specifying the name of the .cfs file, rather than letting Lucene come up with

RE: customizing index file name

2007-05-03 Thread Shaw, James

I mean specifying the name of the .cfs file, rather than letting Lucene come up with a name by itself. I'm actually using Lucene.Net, and we pre-index during our build and want to include the index in the installer, but the installer can only reference named files, and it wouldn't work if the .cfs

Re: Language detection library

2007-05-03 Thread Chris Lu

I suppose if a document is indexed as English or French, when users are searching the document, we need to parse the query as English or French also? -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.db

Re: customizing index file name

2007-05-03 Thread Erick Erickson

Uh, what do you mean "fix"? You shouldn't have to do anything with it at all. What behavior are you observing that you want to change and why? Erick On 5/3/07, Shaw, James <[EMAIL PROTECTED]> wrote: Does anyone know how to fix the .cfs file name in an index directory? The deletable and segment

Re: Implementing lagre secure Lucene search system questions.

2007-05-03 Thread Daniel Noll

jim shirreffs wrote: Hi, I'm a relative Lucene newbie and would appreciate some expert advice. Sounds like you might want to start a new thread, otherwise people who know the answer to your problem might not see your post. Daniel -- Daniel Noll Nuix Pty Ltd Suite 79, 89 Jones St, Ultimo N

customizing index file name

2007-05-03 Thread Shaw, James

Does anyone know how to fix the .cfs file name in an index directory? The deletable and segments file names are always the same, but we have observed that the .cfs file name changes each time you index a content directory with some changes to the directory (some deleted files, added files, etc). H

Re: Language detection library

2007-05-03 Thread karl wettin

On 3 May 2007, at 22:06, Mordo, Aviran (EXP N-NANNATEK) wrote: Anyone knows of a good language detection library that can detect what language a document (text) is ? I posted this some time back: https://issues.apache.org/jira/browse/LUCENE-826 A bit of proof-of-concept:ish, but it does the job

Re: Language detection library

2007-05-03 Thread Andrzej Bialecki

Jason Pump wrote: http://software.wise-guys.nl/libtextcat/ ... which is what Nutch implements in its language-identifier plugin. -- Best regards, Andrzej Bialecki
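For readers unfamiliar with the approach behind libtextcat, here is a rough sketch of character n-gram language guessing in the style of Cavnar and Trenkle's text categorization. It is not code from libtextcat, Nutch, or LUCENE-826; the class name, the method names, and the profile size are assumptions made for illustration, and a real detector would be trained on far larger samples.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy class, not part of any library mentioned in the thread.
public class NGramLanguageGuesser {

    // Rank the character n-grams (n = 1..3) of a text by descending frequency.
    static List<String> profile(String text, int maxSize) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]+", " ").trim();
        String padded = " " + cleaned + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= 3; n++) {
            for (int i = 0; i + n <= padded.length(); i++) {
                counts.merge(padded.substring(i, i + n), 1, Integer::sum);
            }
        }
        List<String> grams = new ArrayList<>(counts.keySet());
        // Most frequent first; ties broken alphabetically so the ranking is deterministic.
        grams.sort((a, b) -> {
            int byCount = counts.get(b) - counts.get(a);
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(grams.subList(0, Math.min(maxSize, grams.size())));
    }

    // "Out-of-place" distance: sum of rank differences, with a fixed penalty for missing n-grams.
    static int distance(List<String> doc, List<String> lang) {
        int total = 0;
        for (int i = 0; i < doc.size(); i++) {
            int j = lang.indexOf(doc.get(i));
            total += (j < 0) ? lang.size() : Math.abs(i - j);
        }
        return total;
    }

    // Return the key of the training sample whose profile is closest to the text.
    static String guess(String text, Map<String, String> trainingSamples) {
        List<String> doc = profile(text, 300);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, String> sample : trainingSamples.entrySet()) {
            int d = distance(doc, profile(sample.getValue(), 300));
            if (d < bestDistance) {
                bestDistance = d;
                best = sample.getKey();
            }
        }
        return best;
    }
}
```

As noted elsewhere in the thread, very short inputs such as queries give the n-gram statistics little to work with, so a guess like this is more trustworthy on whole documents.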

Re: For indexing: how to estimate needed memory?

2007-05-03 Thread Erick Erickson

Coincidentally, I'm hacking at this very problem. First, are you sure your free memory calculation is OK? Why not just use freeMemory? Perhaps also calling the gc if the available memory isn't enough. Although I confess I don't know the innards of the interplay of getting the various memory amounts
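A minimal sketch of the kind of check being discussed, using only java.lang.Runtime. The class and method names are hypothetical, and the per-document estimate is something the caller would have to supply. One caveat: freeMemory() only reports free space inside the heap the JVM has already claimed, so the sketch counts the room the heap can still grow into as well.

```java
// Hypothetical helper, not taken from the poster's application.
public class MemoryHeadroom {

    // Bytes the JVM could still allocate before reaching its maximum heap size.
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently in use
        return rt.maxMemory() - used;                   // headroom up to the heap limit
    }

    // Returns true if the estimated cost fits, asking for a collection once if it does not.
    static boolean canIndex(long estimatedBytes) {
        if (availableBytes() >= estimatedBytes) {
            return true;
        }
        System.gc(); // only a hint; the JVM is free to ignore it
        return availableBytes() >= estimatedBytes;
    }
}
```

With several indexing threads running in parallel, a check like this is inherently racy: another thread can consume the headroom between the check and the actual indexing, so it reduces the chance of an OutOfMemoryError rather than ruling it out.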

Re: Language detection library

2007-05-03 Thread Jason Pump

http://software.wise-guys.nl/libtextcat/ Otis Gospodnetic wrote: LingPipe - commercial unless your data/product/service is free. Nutch language id plugin. Otis -- Simpy -- http://www.simpy.com/ - Tag - Search - Share - Origin

Re: Language detection library

2007-05-03 Thread Otis Gospodnetic

LingPipe - commercial unless your data/product/service is free. Nutch language id plugin. Otis -- Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: "Mordo, Aviran (EXP N-NANNATEK)" <[EMAIL PROTEC

Language detection library

2007-05-03 Thread Mordo, Aviran (EXP N-NANNATEK)

Does anyone know of a good language detection library that can detect what language a document (text) is in?

For indexing: how to estimate needed memory?

2007-05-03 Thread david m

Our application includes an indexing server that writes to multiple indexes in parallel (each thread writes to a single index). In order to avoid an OutOfMemoryError, each request to index a document is checked to see if the JVM has enough memory available to index the document. I know that Index

Re: Doubt in FuzzyQuery

2007-05-03 Thread Stefan Will

It seems to me like a French stemmer is what you need instead of a fuzzy query. What analyzer are you using for your documents and queries? -- Stefan [EMAIL PROTECTED] wrote: Hi! I have a problem in dealing with a fuzzy query in Lucene 2.1.0. In order to explain my problem, I illustrate it
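Some background that may help with this thread: FuzzyQuery compares the query term against individual indexed terms by edit distance, so a two-word string is normally not matched as a unit once an analyzer has split the text into tokens. The sketch below is a plain Levenshtein distance, written for illustration and not taken from Lucene's source; the class name is hypothetical.

```java
// Hypothetical illustration; Lucene's own fuzzy matching code is not reproduced here.
public class TermEditDistance {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = cur;
            cur = swap;
        }
        return prev[b.length()];
    }
}
```

Each singular/plural pair in the example differs by a single added "s", which is the kind of variation a stemmer normalizes at index and query time without needing fuzzy matching at all.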

Implementing lagre secure Lucene search system questions.

2007-05-03 Thread jim shirreffs

Hi, I'm a relative Lucene newbie and would appreciate some expert advice. I would like to make files distributed on various local hosts in the intranet full-text searchable. My startup plan is to index these files locally and then merge all the little indexes into a master index on a search

Re: MergeFactor advice wanted

2007-05-03 Thread Erick Erickson

I don't think you're doing yourself any good by explicitly using a RAMDirectory in the first place. If you use a simple FSDirectory, a number of documents are added in RAM before being flushed to the FS. Why do you add this complexity to your code with no proof that it does you any good? Or do yo

Re: MergeFactor advice wanted

2007-05-03 Thread Erick Erickson

I don't think (but don't know for sure) that optimizing before the end of the run buys you anything. And you're right, it takes a while. I've assumed that it was best done at the end of the entire run, but that's only an assumption. Search the archives for the thread titled MergeFactor and Ma

Re: Doubt in FuzzyQuery

2007-05-03 Thread Erick Erickson

It would help a lot if you can either post a snippet of code showing how you construct the fuzzy query or create a small, self-contained program illustrating the problem. With the latter approach, I've often found that in the middle of creating the program, what I'm doing wrong surfaces ... Best

Re: Email Definition in StandardTokenizer.jj

2007-05-03 Thread Erick Erickson

I can just see Hatcher's reply now... Would you be willing to submit the correct code? Erick On 5/2/07, Winton Davies <[EMAIL PROTECTED]> wrote: Hey guys, Does someone who makes commits want to fix the EMAIL definition in StandardTokenizer.jj It's a not very well known exception to the n

RE: drawback addindexes method

2007-05-03 Thread Steven Parkes

See IndexWriter#addIndexesNoOptimize, released with 2.1. Note that it doesn't optimize before or after, so if you want an optimize at the end, you need to ask for it manually. -Original Message- From: Chandan Tamrakar [mailto:[EMAIL PROTECTED] Sent: Thursday, May 03, 2007 12:46 AM To: jav

RE: MergeFactor advice wanted

2007-05-03 Thread Chandan Tamrakar

What if we are using the addIndexes method with a RAMDirectory? It calls optimize inside the method itself. Is there any solution to this? -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Thursday, May 03, 2007 4:03 PM To: java-user@lucene.apache.org Subject: Re: MergeFac

Re: MergeFactor advice wanted

2007-05-03 Thread Aleksander M. Stensby

OK, but then you would not optimize at all? Not even at the end of the indexing run? On Thu, 03 May 2007 12:17:40 +0200, Mark Miller <[EMAIL PROTECTED]> wrote: I think it is worth your time to do some benchmarking. I think mergeFactor is not very helpful in the end...if you set it high, y

Re: MergeFactor advice wanted

2007-05-03 Thread Mark Miller

I think it is worth your time to do some benchmarking. I think mergeFactor is not very helpful in the end...if you set it high, you'll index faster but then your searches will be slower prompting you to optimize...after which you'll find that you paid all your gains back. Test things out for yo
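The trade-off described here can be illustrated with an idealized model, which is an assumption for this sketch and not Lucene's actual merge code: every flush of buffered documents writes one small segment, and whenever mergeFactor segments of the same level exist they are merged into one segment at the next level. Under that model the number of live segments after a given number of flushes is the digit sum of the flush count written in base mergeFactor. The class and method names are hypothetical.

```java
// Idealized model only; real merge behaviour depends on the Lucene version and settings.
public class SegmentCountSketch {

    // Segments left after `flushes` flushes when every `mergeFactor` same-level segments merge.
    static int segmentsAfter(long flushes, int mergeFactor) {
        if (flushes < 0 || mergeFactor < 2) {
            throw new IllegalArgumentException("flushes must be >= 0 and mergeFactor >= 2");
        }
        int segments = 0;
        while (flushes > 0) {
            segments += (int) (flushes % mergeFactor); // unmerged segments at this level
            flushes /= mergeFactor;                    // carry the merged ones up a level
        }
        return segments;
    }
}
```

In this model, 999 flushes leave 27 segments with a mergeFactor of 10 but 108 segments with a mergeFactor of 100: less merge work while indexing, more segments for every search to visit until an optimize collapses them. Benchmarking on real data, as suggested above, remains the only reliable guide.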

MergeFactor advice wanted

2007-05-03 Thread Aleksander M. Stensby

Hello everyone! I'm wondering if any of you have any helpful advice on what mergeFactor I should use... The indexing process is handling a large number of documents and I would like to index as fast as possible. Initial thought was to increase the mergeFactor to make the indexer work more in

Doubt in FuzzyQuery

2007-05-03 Thread sccarrera

Hi! I have a problem in dealing with a fuzzy query in Lucene 2.1.0. In order to explain my problem, I illustrate it by a simple example: I would like to recover files including the set of strings "société américaine" and "sociétés américaines" from a fuzzy query relating the string "société

drawback addindexes method

2007-05-03 Thread Chandan Tamrakar

I found that IndexWriter.addIndexes(Directory[]) always calls the optimize method twice. I am indexing documents in batches, i.e. I call this method when X documents are buffered in RAM using a RAMDirectory. So as the index size grows, the optimize method will only increase my indexing time C

27 matches
