On Fri, Mar 27, 2009 at 4:22 PM, Marvin Humphrey wrote:
> I think the difference here is that Lucene gets to use multiple threads within
> one process, while Lucy has to at least be capable of using a multiple-process
> concurrency model in order to support real-time search for non-threaded hosts
On Fri, Mar 27, 2009 at 03:59:09PM -0400, Michael McCandless wrote:
> >> Why must merge policy be made public for realtime search? [In Lucy]
> >
> > Because real-time search under Lucy needs to be able to operate using
> > multiple
> > write processes, since threads will not always be available.
On Fri, Mar 27, 2009 at 1:12 PM, Marvin Humphrey wrote:
>> Why must merge policy be made public for realtime search? [In Lucy]
>
> Because real-time search under Lucy needs to be able to operate using multiple
> write processes, since threads will not always be available.
>
> You need to be able
i've been using the one in icu for some time...
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/CharsetDetector.html
On Fri, Mar 27, 2009 at 2:57 PM, Zhang, Lisheng <
lisheng.zh...@broadvision.com> wrote:
> Hi,
>
> What's the best free tool for encoding detection? For example we have
> a AS
Hi,
What's the best free tool for encoding detection? For example, we have
an ASCII file README.txt, which needs to be indexed, but we need to
know its encoding before we can convert it to a Java String.
I saw some free tools on the market, but have no experience with any
of them yet? What is the be
Lisheng,
You might want to look at the Nutch LanguageID plugin
(http://wiki.apache.org/nutch/LanguageIdentifier) too.
Cheers,
Boris
On Fri, Mar 27, 2009 at 10:22 AM, Zhang, Lisheng
wrote:
> Thanks very much!
>
> -Original Message-
> From: jochen.sc...@gmail.com [mailto:jochen.sc...@gmai
Thanks very much!
-----Original Message-----
From: jochen.sc...@gmail.com [mailto:jochen.sc...@gmail.com] On Behalf Of
Jochen Frey
Sent: Friday, March 27, 2009 10:04 AM
To: java-user@lucene.apache.org
Subject: Re: Free software for language detection
Lisheng,
Here's a package you could take a lo
On Fri, Mar 27, 2009 at 12:39:05PM -0400, Michael McCandless wrote:
> Why must merge policy be made public for realtime search? [In Lucy]
Because real-time search under Lucy needs to be able to operate using multiple
write processes, since threads will not always be available.
You need to be abl
Lisheng,
Here's a package you could take a look at. I have used it in the past and it
worked reasonably well. Let me know what else you find and how it works for
you.
http://www.olivo.net/software/lc4j/
Good luck!
Jochen Frey
On Fri, Mar 27, 2009 at 9:54 AM, Zhang, Lisheng <
lisheng.zh...@broad
Hi,
Are you aware of any free software for language detection (given certain
text, see if it is French, or Japanese)? I saw Bob Carpenter's previous
mail, which explained the principle nicely, but I could not locate any free tools.
Thanks very much for your help, Lisheng
--
On Fri, Mar 27, 2009 at 12:13 PM, Marvin Humphrey
wrote:
>> Whereas in Lucene neither MultiSegmentReader nor SegmentReader is public.
>
> I had thought making SegmentReader public was at least under consideration as
> part of the implementation for segment-centric sorted search, but I guess it
>
Thanks a million. You really pointed me in the right direction. I'm
going to start building some test cases this afternoon so I can really
begin to get my hands dirty and see how everything works. It's good to
know that there isn't one "right" way for my purposes, which is mostly
what I wanted
On Fri, Mar 27, 2009 at 11:09:09AM -0400, Michael McCandless wrote:
> Whereas in Lucene neither MultiSegmentReader nor SegmentReader is public.
I had thought making SegmentReader public was at least under consideration as
part of the implementation for segment-centric sorted search, but I guess i
Michael McCandless wrote:
Are you opening your IndexReader with readOnly=true? If not, you're
likely hitting contention on the "isDeleted" method.
When you run with a "normal" directory, either on a traditional hard
drive or SSD device, do you use NIOFSDirectory? That removes
contention, but,
The readOnly option was introduced in 2.4.
from javadoc: "...as of 2.4, it's possible to open a read-only
IndexReader using one of the static open methods that accepts the
boolean readOnly parameter."
http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/index/IndexReader.html#open(org.apache
Alas, it's new as of 2.4. Can you upgrade?
Mike
On Fri, Mar 27, 2009 at 11:55 AM, wrote:
>> > How can I open it "readonly"?
>>
>> See the javadocs for IndexReader.
>
> I did it already for 2.3 - cannot find readonly
>
>
> -
>
> > How can I open it "readonly"?
>
> See the javadocs for IndexReader.
I did it already for 2.3 - cannot find readonly
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-use
>> Are you opening your IndexReader with readOnly=true? If not, you're
>> likely hitting contention on the "isDeleted" method.
>
> How can I open it "readonly"?
See the javadocs for IndexReader.
--
Ian.
> Are you opening your IndexReader with readOnly=true? If not, you're
> likely hitting contention on the "isDeleted" method.
How can I open it "readonly"?
Yes, updating a document in Lucene is "expensive" for two
reasons:
1> deleting and adding a document does mean there's internal
work being done. But it's not all *that* expensive. So this really
comes down to how many records you expect to update
every 15 minutes. You've gotta try it.
2
On Fri, Mar 27, 2009 at 9:48 AM, Marvin Humphrey wrote:
> Every indexer opens a PolyReader [analogous to MultiSegmentReader], even when
> there's no data in the index, or a single segment. (I modified PolyReader so
> that 1-segment and 0-segment states were officially valid for this purpose.)
OK
I'm going to try to cover all replies so far, but mostly this first one,
since it has been the most helpful. Thanks to everyone who
replied so far, you've given me a lot of great ideas to think about and
look into. I'm going to begin some small test indexes with our data so
we have somet
this is really no problem at all... use RBBI to identify runs of numbers in
your query string, and then replace them with the normalized version. you
will need icu jar for this.
String userQuery = "Potter 19,99";
Locale locale = new Locale("nl");
RuleBasedBreakIterator bi = (RuleBa
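
The idea described above (find runs of numbers in the query string and replace them with a normalized form) can be sketched with the JDK alone. This is only an illustration under stated assumptions: it swaps in a regular expression and java.text.DecimalFormat for ICU's RuleBasedBreakIterator, and the class name NumberNormalizer and its normalize method are hypothetical, not part of Lucene or ICU.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene or ICU.
class NumberNormalizer {
    // A run of digits, optionally joined by '.' or ',' separators.
    private static final Pattern NUMBER_RUN = Pattern.compile("\\d+(?:[.,]\\d+)*");

    /** Rewrites locale-formatted numbers in the query to a plain '.'-decimal form. */
    static String normalize(String query, Locale locale) {
        // Parser that uses the locale's decimal and grouping separators.
        DecimalFormat parser =
                new DecimalFormat("#,##0.###", DecimalFormatSymbols.getInstance(locale));
        parser.setParseBigDecimal(true); // avoid binary floating-point rounding

        StringBuilder out = new StringBuilder();
        Matcher m = NUMBER_RUN.matcher(query);
        int last = 0;
        while (m.find()) {
            String token = m.group();
            ParsePosition pos = new ParsePosition(0);
            Number parsed = parser.parse(token, pos);
            out.append(query, last, m.start());
            if (parsed instanceof BigDecimal && pos.getIndex() == token.length()) {
                // The whole token parsed as a number: emit the normalized form.
                out.append(((BigDecimal) parsed).toPlainString());
            } else {
                // Not cleanly parseable in this locale: leave the token untouched.
                out.append(token);
            }
            last = m.end();
        }
        out.append(query, last, query.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Potter 19.99"
        System.out.println(normalize("Potter 19,99", Locale.forLanguageTag("nl")));
    }
}
```

The same normalization would have to be applied at index time as well, so that the indexed token and the query token agree.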
On Fri, Mar 27, 2009 at 08:08:20AM -0400, Michael McCandless wrote:
> Actually, how will Lucy do this?
Every indexer opens a PolyReader [analogous to MultiSegmentReader], even when
there's no data in the index, or a single segment. (I modified PolyReader so
that 1-segment and 0-segment states wer
Actually, how will Lucy do this?
Even though Lucy's SegmentReader is lighter weight, it still seems
like you shouldn't be opening them in the writer (except for realtime
search)? What's your plan? Are you going to simply make the segment
metadata public?
Mike
On Thu, Mar 26, 2009 at 9:51 PM, M
Also, see here for other ideas that may help:
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
I just updated that page with readOnly IndexReader & NIOFSDirectory.
Mike
On Fri, Mar 27, 2009 at 7:07 AM, Paul Taylor wrote:
> Hi
>
> I am trying to run the performance tests against luc
Are you opening your IndexReader with readOnly=true? If not, you're
likely hitting contention on the "isDeleted" method.
When you run with a "normal" directory, either on a traditional hard
drive or SSD device, do you use NIOFSDirectory? That removes
contention, but, it only works on non-Windows
Hi
I am trying to run the performance tests against Lucene, and am surprised
by the results.
I have a test that creates a queue of queries, and a number of threads.
The threads run concurrently getting the next query available, performing
a query on the index and taking the top hits. The in
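
A test harness of the kind described above (a shared queue of queries drained by several worker threads) might look roughly like the following. This is a sketch, not the poster's actual code: the class name QueryLoadTest is hypothetical, and the search function is a stand-in parameter where a real test would run the query against an IndexSearcher and collect the top hits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical harness for illustration only.
class QueryLoadTest {
    /** Runs every query once across threadCount workers; returns how many were executed. */
    static int run(List<String> queries, int threadCount, Function<String, Integer> search) {
        Queue<String> pending = new ConcurrentLinkedQueue<>(queries);
        AtomicInteger executed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            pool.execute(() -> {
                String q;
                // Each worker takes the next available query until the queue is empty.
                while ((q = pending.poll()) != null) {
                    search.apply(q); // stand-in for the real index search
                    executed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return executed.get();
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList("potter", "lucene", "index", "reader");
        long start = System.nanoTime();
        int done = run(queries, 2, q -> q.length()); // dummy "search"
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(done + " queries in " + micros + " us");
    }
}
```

Running the same query list with different thread counts and comparing the elapsed times is one way to see whether throughput scales or whether the workers are contending on a shared lock.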
Yes, I have scoring, I will start with the 3 that are the closest.
By the way, I saw that Lucene also has Synonyms. Any reason you suggested
Solr?
Thanks,
Liat
2009/3/24 Grant Ingersoll
> Do you have any info that helps you narrow down how many to choose, like
> some type of ranking of the syno
On Thu, Mar 26, 2009 at 9:51 PM, Marvin Humphrey wrote:
>> eg querying whether
>> compound file format is in use, whether separate norms are stored,
>> "get me total size in bytes of all files" (or maybe just "get me all
>> files", plus utility method somewhere to add up the sizes), so this
>> ap
That's an interesting idea yes.
Thanks!
Daan de Wit wrote:
>
> Maybe you can create a filter that parses numeric tokens to their
> locale-specific counterpart, and then search for both the converted and
> the unconverted token.
>
> Daan
>
>> -Original Message-
>> From: Marcel Overdij
Hi
I was going to suggest looking at Hibernate Search. It comes with
event listeners that modify your indexes when the persistent entity
changes. It uses Lucene under the hood, so if you need to access Lucene
then you can.
Indexing can be done sync or async and the documentation shows how to
Maybe you can create a filter that parses numeric tokens to their
locale-specific counterpart, and then search for both the converted and
the unconverted token.
Daan
> -----Original Message-----
> From: Marcel Overdijk [mailto:marceloverd...@gmail.com]
> Sent: vrijdag 27 maart 2009 7:55
> To: jav