Hi all,
I am using the following simple code, which leads to a NoClassDefFoundError for
EarlyTerminatingSortingCollector. Can anyone help?
Thanks.
RAMDirectory index_dir = new RAMDirectory();
Analyzer analyzer = new StandardAnalyzer();
AnalyzingInfixSuggester suggester = new
AnalyzingInfixSuggester
Hi,
I am using the Lucene 4.10 suggester, which I thought could return similar phrases.
But it turned out differently.
My code is as follows:
public static void main(String[] args) throws IOException {
String path = "c:/data/suggest/dic.txt";
Dictionary dic;
dic = new FileDictionary(new FileInpu
Hi,
I have a multi-gigabyte index which serves 5-10 threads and needs
refreshing very often. I wonder whether RAMDirectory is a good candidate for
this purpose. If not, what kind of directory is better?
Thanks,
Cheng
Hi,
I build a query using
QueryBuilder.createBooleanQuery("title","【微信活动】6500盒“健康瘦身减肥”梅免费送").
When I check the query, the toString() of this query looks like:
Query: title:而 title:不用 title:下载 title:2. title:目前 title:来说 title:已经
title:完美越狱 title:的人 title:没有 title:任何 title:必要 title:再用 title:红 titl
;
try {
writer.addDocument(d);
writer.commit();
} catch (Exception e) {
}
Unfortunately, when I search the index, all I get is:
{号=202, 栋=6}, which doesn't contain double quotes. Therefore I can't
rebuild the map object from the returned value.
Please help.
On Wed, Feb 13, 2013 at 10:
e String representation of a Map, the same way you
> do any other String: use StringField or an analyzer that keeps the
> characters you want it to. Maybe WhitespaceAnalyzer.
>
>
> --
> Ian.
>
>
> On Wed, Feb 13, 2013 at 1:34 AM, Cheng wrote:
> > Hi,
> >
>
> >>> SEVERE: Socket accept failed
> >>> org.apache.tomcat.jni.Error: 24: Too many open files
> >>> at org.apache.tomcat.jni.Socket.accept(Native Method)
> >>> at
> >>>
> org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.java:
ndless
>
> http://blog.mikemccandless.com
>
> On Tue, Jan 22, 2013 at 8:20 AM, Cheng wrote:
> > Hi,
> >
> > I run a Lucene application on Tomcat. The app tries to open a Linux
> > directory, and sometimes returns a CorruptIndexException error.
> >
> > Shortly after I r
What version of
> lucene?
>
> --
> Ian.
>
>
> On Fri, Sep 28, 2012 at 1:56 AM, Cheng wrote:
> > Hi,
> >
> > I have a RAM-based index which occasionally needs to be persisted to a
> > disk-based index. Every time, the size doubles, which eats up my di
, qibaoyuan wrote:
> check out http://code.google.com/p/ik-analyzer/ it's quite
> straightforward.
>
>
>
> At 2012-09-06 22:22:45,Cheng wrote:
> >I use 3.5 now, and plan to try 3.6. How can I use IKAnalyzer and make the
> >analyzer to use my own dicti
cn does not seem able to import your own dictionary; it can only import
> >> stop word dict;You can try IKAnalyzer instead.
> >>
> >>
> >> At 2012-09-06 22:10:15,Cheng wrote:
> >> >Thanks. I will try that.
> >> >
> >> >Another questi
yzer instead.
>
>
> At 2012-09-06 22:10:15,Cheng wrote:
> >Thanks. I will try that.
> >
> >Another question. How to use my own dictionary instead of the default one
> >either in FatJAR or smartcn.jar?
> >
> >On Thu, Sep 6, 2012 at 10:07 AM wrote:
> >
>
Also, I checked and couldn't find the smartcn.jar in the originally shipped
Lucene jar. Should I build it myself? and how?
Thanks.
On Thu, Sep 6, 2012 at 10:10 AM, Cheng wrote:
> Thanks. I will try that.
>
> Another question. How to use my own dictionary instead of the default o
Thanks. I will try that.
Another question. How to use my own dictionary instead of the default one
either in FatJAR or smartcn.jar?
On Thu, Sep 6, 2012 at 10:07 AM, 齐保元 wrote:
>
>
> Importing contrib/smartcn.jar is not complicated, or you can try FatJAR.
>
>
> At 2012-09-06 22:
the past.
>
> Can you explain how you are using Lucene?
>
> You may also want to try the CachingRAMDirectory patch on
> https://issues.apache.org/jira/browse/LUCENE-4123
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sat, Jun 16, 2012 at 7:18 AM, Cheng
ues.apache.org/jira/browse/LUCENE-4123
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sat, Jun 16, 2012 at 7:18 AM, Cheng wrote:
> > After a number of tests, the performance of MMapDirectory is not even close
> > to that of RAMDirectory, in terms of sp
hetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> > Sent: Monday, June 04, 2012 6:10 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: RAMDirectory unexpectedly slows
> >
n e.g.
> Wikipedia.
>
> Uwe
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de
>
>
>
> Cheng wrote:
>
> Please shed more insight into the difference between JVM heap size and the
> memory size used by Lucene.
>
> What
ill be cached in RAM regardless in the OS system IO
> cache.
>
> 1.
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/apache/lucene/store/bytebuffer/ByteBufferDirectory.java
>
> On Mon, Jun 4, 2012 at 10:55 AM, Cheng wrote:
> > My indexes are 500MB
the file system cache of the operating system, so copying data
> to Java heap space is not useful."
>
> -- Jack Krupansky
>
> -Original Message- From: Cheng
> Sent: Monday, June 04, 2012 10:08 AM
> To: java-user@lucene.apache.org
> Subject: RAMDirectory unexpecte
Thank you. The alternative sounds reasonable.
On Thu, Feb 23, 2012 at 12:54 PM, Shai Erera wrote:
> Hi Cheng,
>
> You will need to use the exact path labels in order to get to the category
> 'Mark Twain', unless you index multiple paths from start, e.g.:
> /author/Amer
Hi,
I am using Taxonomy Search to build a facet comprising things such as
“/author/American/Mark Twain”.
Since the word "author" has a synonym of "writer", can I use "writer"
instead of "author" to get the path?
Currently I can only use exactly the word "author" to do it.
Thanks
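One possible workaround, sketched in plain Java without any Lucene dependency, is to map synonym labels onto the label that was actually indexed before building the category path. The class `PathLabelSketch` and its methods are hypothetical and not part of the facet module; the sketch assumes a '/'-separated path like the one in the question.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class PathLabelSketch {
    // Maps a lower-cased synonym to the label that was indexed.
    private final Map<String, String> canonical = new HashMap<>();

    void addSynonym(String synonym, String label) {
        canonical.put(synonym.toLowerCase(Locale.ROOT), label);
    }

    /** Rewrites each component of a '/'-separated path to its indexed label, if one is registered. */
    String normalize(String path) {
        StringBuilder sb = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty piece before the leading '/'
            }
            String mapped = canonical.get(part.toLowerCase(Locale.ROOT));
            sb.append('/').append(mapped != null ? mapped : part);
        }
        return sb.toString();
    }
}
```

The normalized path can then be used for the lookup in place of the user-supplied one, so the index itself still holds only the exact labels.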
great idea!
On Sun, Feb 19, 2012 at 9:43 PM, Li Li wrote:
> you can delete by query like -category:category1
>
> On Sun, Feb 19, 2012 at 9:41 PM, Li Li wrote:
>
> > I think you could do as follows. taking splitting it to 3 indexes for
> > example.
> > you can copy the index 3 times.
> > for co
only have one writer against one index at a time. Lucene's
> > locking will prevent anything else.
> >
> >
> > --
> > Ian.
> >
> >
> > On Tue, Feb 14, 2012 at 4:49 PM, Cheng wrote:
> > > Hi,
> > >
> > > I need to mana
mit() when you want changes to be durable (survive
> OS/JVM crash, power loss, etc.).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Feb 13, 2012 at 1:17 PM, Cheng wrote:
> > Hi,
> >
> > My application will go on for ever. When is good t
thanks a lot
On Wed, Feb 8, 2012 at 9:48 PM, Ian Lea wrote:
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>
> (the 3rd item is Use a local filesystem!)
>
> --
> Ian.
>
>
> On Wed, Feb 8, 2012 at 12:44 PM, Cheng wrote:
> > Hi,
> >
> >
; // Do not use s after this!
> s = null;
>
> --
> Ian.
>
>
> On Wed, Feb 8, 2012 at 12:09 PM, Cheng wrote:
> > You are right. There is a method by which I do searching. At the end of the
> > method, I release the index searcher (not the SearcherManager).
> >
Calling release() multiple times?
>
> From the exception message the first sounds most likely.
>
>
> --
> Ian.
>
>
> On Wed, Feb 8, 2012 at 5:20 AM, Cheng wrote:
> > Hi,
> >
> > I am using NRTManager and NRTManagerReopenThread. Though I don'
rformance w/o it (after removing the
> commit calls). NRT is very fast...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Feb 6, 2012 at 11:46 AM, Cheng wrote:
> > Good point. I should remove the commits.
> >
> > Any difference between NRTCas
all flushed segments.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Feb 6, 2012 at 10:45 AM, Cheng wrote:
> > Uwe, when I meant speed is slow, I didn't refer to instant visibility of
> > changes, but that the changes may be synchronized with FSDirecto
Agree.
On Mon, Feb 6, 2012 at 11:53 PM, Uwe Schindler wrote:
> Hi Cheng,
>
> all pros and cons are explained in those articles written by Mike! As soon
> as there are harddisks in the game, there is a slowdown, what do you
> expect?
> If you need it faster, buy SSDs! :-)
>
necessary.
>
> If you are using NRTManager why do you care how long this takes? How
> often are you calling it? Why?
>
>
> --
> Ian.
>
>
> On Mon, Feb 6, 2012 at 3:45 PM, Cheng wrote:
> > Uwe, when I meant speed is slow, I didn't refer to instant visibi
date the index? Time taken for updates to become visible in search
> results? Time taken for searches to run on the IndexSearcher returned
> from SearcherManager? Something else?
>
>
> --
> Ian.
>
>
> On Mon, Feb 6, 2012 at 3:27 PM, Cheng wrote:
> > Ian,
> >
/goo.gl/mzAHt
> http://goo.gl/5RoPx
> http://goo.gl/vSJ7x
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Cheng [mailto:zhoucheng2...@gmail.c
Ian,
I have run into an issue: I need to update the index frequently. The
NRTManager does not seem very helpful on this front, as it is slower than
when RAMDirectory is used.
Any advice on improving this?
On Mon, Feb 6, 2012 at 10:24 PM, Cheng wrote:
> That really helps! I will try it
ith
> >> nrtm.updateDocument(...), and to search use
> >>
> >> IndexSearcher searcher = srchm.acquire();
> >> try {
> >> search ...
> >> } finally {
> >> srchm.release(searcher);
> >> }
> >>
> >> All thread s
e. And I bet it'll be blindingly fast.
>
> Don't forget to close() things down at the end.
>
>
> --
> Ian.
>
>
>
> On Mon, Feb 6, 2012 at 12:15 AM, Cheng wrote:
> > I was trying to, but I don't know how, even though I have read some of your blog posts.
>
//blog.mikemccandless.com
>
> On Sun, Feb 5, 2012 at 9:03 AM, Cheng wrote:
> > Hi Uwe,
> >
> > My challenge is that I need to update/modify the indexes frequently while
> > providing the search capability. I was trying to use FSDirectory, but found
> >
know of MMapDirectory, and wonder if it is as fast as RAMDirectory.
On Sun, Feb 5, 2012 at 4:14 PM, Uwe Schindler wrote:
> Hi Cheng,
>
> It seems that you use a RAMDirectory for *caching*, otherwise it makes no
> sense to write changes back. In recent Lucene versions, this is
> > >
> > > -Original Message-
> > > From: Pedro Lacerda [mailto:pslace...@gmail.com]
> > > Sent: Saturday, January 28, 2012 12:49 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: How to avoid filtering stop words like "
Hi,
I don't want StandardAnalyzer to filter certain stop words. Can
I do so?
Ideally, I would like to have a customized StandardAnalyzer.
Thanks.
simon.willna...@googlemail.com> wrote:
> Hey,
>
>
> On Wed, Jan 25, 2012 at 11:01 PM, Cheng wrote:
> > Hi,
> >
> > I am using multiple writer instances in a web service. Some instances are
> > busy all the time, while some aren't. I wonder how to configure the
>
Hi,
I am using multiple writer instances in a web service. Some instances are
busy all the time, while some aren't. I wonder how to configure a writer
to dissolve itself after it has been idle for a certain time, say 30 seconds.
If that is possible, can I do more during the dissolving, such as writing the
ch
Hi, can any of you provide a working code example that uses
NRTManager, NRTManagerReopenThread and ExecutorService instances?
The limited availability of information regarding these classes really
drives me nuts.
Thanks
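This is not a working NRTManager example, only a minimal plain-Java sketch of the acquire/release discipline such managers expect when searches run on an ExecutorService. `Manager` is a hypothetical stand-in that merely counts outstanding acquisitions; it is an assumption for illustration and does not reproduce the NRTManager or SearcherManager API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AcquireReleaseSketch {
    /** Hypothetical stand-in for a searcher manager; it only counts acquisitions. */
    static final class Manager {
        final AtomicInteger outstanding = new AtomicInteger();
        final AtomicInteger served = new AtomicInteger();

        Object acquire() {
            outstanding.incrementAndGet();
            return new Object(); // stands in for a searcher
        }

        void release(Object searcher) {
            outstanding.decrementAndGet();
        }
    }

    /** Runs the given number of search tasks on a pool and returns how many were never released. */
    static int runTasks(Manager manager, int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                Object searcher = manager.acquire();
                try {
                    manager.served.incrementAndGet(); // the search would happen here
                } finally {
                    manager.release(searcher); // exactly once per acquire
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return manager.outstanding.get();
    }
}
```

The point of the try/finally is that every acquire is paired with exactly one release, even if the search throws; the searcher reference should not be used after release.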
Great, thanks
On Mon, Jan 16, 2012 at 5:56 AM, findbestopensource <
findbestopensou...@gmail.com> wrote:
> Check out the presentation.
> http://java.dzone.com/videos/archive-it-scaling-beyond
>
> Web archive uses Lucene to index billions of pages.
>
> Regards
> Aditya
> www.findbestopensource.com
I saw the link,
https://builds.apache.org/job/Lucene-3.x/javadoc/contrib-misc/org/apache/lucene/index/NRTManagerReopenThread.html,
which talks about how to use the NRTManagerReopenThread.
I am currently using the Java ExecutorService framework to utilize a
multiple threading scenario. Pls see belo
I just found some interesting stuff here:
https://builds.apache.org/job/Lucene-3.x/javadoc/contrib-misc/org/apache/lucene/index/NRTManagerReopenThread.html
How can the NRTManager be plugged into my ExecutorService framework?
On Sun, Jan 15, 2012 at 1:04 AM, Cheng wrote:
> That sounds like wha
ose readers will be reopened.
> > So in general, a reopen after a small number of updates may well be
> > quicker than a reopen after a large number of updates. How important
> > is it that your searches get up to date data? If vital, you'll have to
> > reopen. If not
I have 10MM entities, for each of which I will index 10-20 fields. I will
also have to index 100MM pieces of information related to those entities, and
each piece will have to go through some Analyzer.
I have a few questions:
1) Can I use just one index folder for all the data?
2) If I h
The reason is I have indexes on hard drive but want to load them into ram
for faster searching, adding, deleting, etc.
Using RAMDirectory can help achieve this goal.
On Thu, Jan 12, 2012 at 6:36 PM, Sanne Grinovero
wrote:
> Maybe you could explain why you are doing this? Someone could suggest
>
I am currently using the following statement at the end of each index
writing, although I don't know if the writing modifies the indexes or not:
is = new IndexSearcher(IndexReader.openIfChanged(ir));
# is -> IndexSearcher, ir-> IndexReader
My question is how expensive to create a searcher insta
; Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jan 11, 2012 at 3:29 PM, Cheng wrote:
> > Will do if I see a perf gain.
> >
> > The other issue is that in each thread my apps will not only do indexing
> > but searching. That means I will have to pass
to an FSDir).
>
> If you see a perf gain then please report back!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jan 11, 2012 at 3:09 PM, Cheng wrote:
> > Can I create a RAMDirectory based writer and have it work cross all
> > threads? In the sen
Can I create a RAMDirectory-based writer and have it work across all
threads? That is, I would like to use RAMDirectory everywhere and
have the RAMDirectory written to an FSDirectory in the end.
I suppose that should work, right?
On Wed, Jan 11, 2012 at 2:31 PM, Michael McCandless <
luc...@mik
I have read a lot about IndexWriter and multi-threading over the Internet.
It seems to me that the normal practice is:
1) use the same IndexWriter instance for multiple threads;
2) create an individual RAMDirectory per thread;
3) use addIndexes(Directory[]) methods to add to a local drive folder al
Hi,
I use the same writer instance for multiple threads. It turns out that the
jobs take longer to finish than when I create a new writer instance in each
thread. What could be the reasons?
Thanks
I tried IndexWriterConfig.OpenMode CREATE, and the size is doubled.
The only way that is effective is the writer's deleteAll() method.
On Mon, Jan 9, 2012 at 5:23 AM, Ian Lea wrote:
> If you load an existing disk index into a RAMDirectory, make some
> changes in RAM and call addIndexes to add
Hi,
I create a RAMDirectory from an FSDirectory. After a few modifications, I
would like to synchronize the two.
Someone on the mailing list suggested using the addIndexes() function.
However, the FSDirectory is simply combined with the RAMDirectory, and the
size doubles.
How can I do a rea
Hi, my servlet application is running a large index of 20 GB. I don't think
it can be loaded into RAM all at once.
What are the general strategies to improve the search and write performance?
Thanks
Hi,
I am trying to use a shared IndexWriter instance in a multi-threaded
application. Surprisingly, this performs worse than creating a writer
instance within each thread.
My code is as follows. Can someone help explain why? Thanks.
Scenario 1: shared IndexWriter instance
RAMDirectory ramDir = new RA
Hi,
I was trying to use QueryParser for some Chinese text, but encountered the
following issues:
(1) org.apache.lucene.queryParser.ParseException: Cannot parse '大众UP!':
Encountered "<EOF>" at line 1, column 5.
The error seems to be caused by the Chinese exclamation mark.
(2) org.apache.lucene.queryParser.ParseExce
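For what it is worth, `!` is an operator character in the classic query syntax, so raw user input usually has to be escaped before it is parsed. Lucene ships a static `QueryParser.escape(String)` helper for this. The plain-Java sketch below only mirrors the idea; the character list is an assumption based on the documented special characters of the query syntax, and `QueryEscapeSketch` is a hypothetical name.

```java
final class QueryEscapeSketch {
    // Characters treated as operators by the classic query syntax (assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Prefixes every special character with a backslash so it is parsed as literal text. */
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

With such escaping, a trailing `!` in the input is passed to the parser as a backslash followed by `!` instead of as a dangling NOT operator. Whether the analyzer then keeps or drops the punctuation is a separate question.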
xWriter( fs, ... );
>   try {
>     writer.addIndexes( ram );
>   } finally {
>     writer.close();
>   }
> }
>
>
> http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/index/IndexWriter.html#addIndexes(org.apache.lucene.store.Directory
> ..
>
Hi,
Suppose that we have a huge number of indices on hard drives, but working in
RAMDirectory is a must. How can we decide which part of the indices to
load into RAM, how to modify the indices, and when and how to synchronize
them with those on the hard drives?
Any thoughts?
Thanks!
Hi,
I am creating a RAMDirectory based upon a folder on disk. After doing a lot
of adding, deleting, or updating, I want to flush the changes to the disk.
However, the flush() function is not available for 3.5. How can I save the
changes to disk?
Thanks!
Hi,
I need to save a list of records into an index on hard drive. I keep a
writer and a reader open till the end of the operation.
My issue is that I need to compare each of the new records with each of the
records that have been saved into the index. There are plenty of duplicate
records in the
> index2.close();
> index3.close();
> ...
>
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> > Sent: Thu
Hi,
I have multiple index folders (or directories), each holding index
files for a specific purpose. I want to search over these folders (or
directories) in a single query.
Is it possible?
Thanks
Hi, I notice that there are a few run() methods in Fetcher.java and that the
following statement in Crawler.java calls the JobClient.runJob(job) in
Fetcher.java.
fetcher.fetch(segs[0], threads,
org.apache.nutch.fetcher.Fetcher.isParsing(conf));
I would like to know which run() in Fetcher.java has
Thanks, Ian.
On Wed, May 25, 2011 at 11:44 PM, Ian Lea wrote:
> Sure. See the javadocs for IndexWriter.setMaxFieldLength or
> LimitTokenCountAnalyzer if you are using 3.1.0.
>
>
> --
> Ian.
>
>
> On Wed, May 25, 2011 at 4:24 PM, Cheng Zhou
> wrote:
> > Hi
Hi, I wonder if I can associate a text string of over 5MB with a single
field.
Thanks.
elds and that of the field boost?
Cheng
On Wed, May 25, 2011 at 6:20 PM, Ian Lea wrote:
> > Quite a few Lucene examples online show how to insert multiple fields
> > into a Document and how to query the indexed file with certain fields and
> > queried text. I would like to kn
Hi,
I have a large number of XML files to be indexed by Lucene. All the files
share a similar structure, as below:
..
Things to be noted are:
The root element of Group has 30 or so attributes, and it usually has over
2000 Subgroup elements, which in turn also have more than 20
the list - didn't notice that my reply went to Cheng directly)
There is an Ant target "get-db-jar" that can do the downloading for you - you
can see the URL it uses here:
<http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/contrib/db/bdb/build.xml?view=markup#l49>
tream class is not available. Please see this, "public final class
FastCharStream implements CharStream"
What is it? Do you know where to download it?
3) The QueryParser class can't be resolved. Please see this, SrndQuery lq =
QueryParser.parse(queryText);
Thanks,
Cheng
-
package, which are
under contrib/db/bdb/src/java folder.
Do you know where I can find the proper jar file?
Cheng
-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu]
Sent: Sunday, May 15, 2011 10:08 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene 3.3 in Eclipse
Hi Cheng
Hi, I created a Java project for Lucene 3.3 in Eclipse, and found that in
the DbHandleExtractor.java file, the package com.sleepycat.db.internal.Db
is not resolved. How can I overcome this?
I have tried to download a .jar for this, but don't know which one to
download or where to find it.
Thanks
>
> http://java.sun.com/javase/technologies/hotspot/gc/index.jsp
>
> -Original Message-
> From: Peter Cheng [mailto:[EMAIL PROTECTED]
> Sent: Sunday, October 05, 2008 7:55 AM
> To: java-user@lucene.apache.org
> Subject: RE: Memory eaten up by String, Term and Te
//search.dbsight.com
> > Lucene Database Search in 3 minutes:
> >
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database
> _Search_in_3_minutes
> > DBSight customer, a shopping comparison site, (anonymous per
> > request) got
> > 2.6 Million Euro funding!
> > Instant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> >
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database
> _Search_in_3_minutes
> &
Hi there,
I use Luke v0.8.1, which is built on Lucene 2.3.0. First, I ran
lucene/demo/IndexFiles to build an index successfully. Then I used Luke to
open the index, but Luke reported "Unknown format version: -6". I checked the
documentation of Lucene, which says "lucene 2.3.2 does not contain any
new
The lucene FAQ says:
What wildcard search support is available from Lucene?
Lucene supports wild card queries which allow you to perform searches
such as book*, which will find documents containing terms such as book,
bookstore, booklet, etc. Lucene refers to this type of a query as a
'prefix quer
the debugger that came with eclipse is pretty good for this purpose.
You can create a small project and then attach Lucene source for the
purpose of debugging.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e
hi,
what is the correct way to instruct the indexwriter (or other
classes?) to delete old
commit points after N minutes ?
I tried to write a customized IndexDeletionPolicy that uses the
parameters to schedule future
jobs to perform file deletion. However, I am only getting the
filenames through the
hi,
what is the correct way to instruct the indexwriter to delete old
commit points after N minutes ?
I tried to write a customized IndexDeletionPolicy that uses the
parameters to schedule future
jobs to do file deletion. However, I am only getting the filenames,
and not absolute file names.
thank
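A deletion policy is handed the list of commit points in `onInit` and `onCommit`, and it removes a commit by calling `delete()` on the commit object rather than deleting files itself. Under that assumption, the age check can be kept separate from Lucene, as in the plain-Java sketch below. `CommitExpirySketch` and its parameters are hypothetical; where the timestamps come from (for example a value recorded at commit time) is left open.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitExpirySketch {
    /**
     * Returns the positions of commits older than maxAgeMillis.
     * timestamps must be ordered oldest to newest; the newest commit is never selected,
     * so the index always keeps at least one commit point.
     */
    static List<Integer> expired(List<Long> timestamps, long nowMillis, long maxAgeMillis) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < timestamps.size() - 1; i++) {
            if (nowMillis - timestamps.get(i) > maxAgeMillis) {
                result.add(i);
            }
        }
        return result;
    }
}
```

Inside a policy, the returned positions would be the commits on which to call `delete()`. Note that the check only runs when the policy is invoked, so an expired commit lingers until the next commit or writer open.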
Hi all,
I found that instead of storing a term ID for a term in the index, Lucene
stores the actual term string value. I am wondering whether there is such a
"term ID" for each distinct term indexed in Lucene, similar to the "doc ID"
for each distinct document indexed in Lucene.
In other words
I've encountered a few discrepancies between the javadoc of Lucene and the
source code.
I use:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/ as
the most up-to-date javadoc reference.
For instance, the SegmentTermDocs class implements the TermDocs interface.
However, there is