Re: SynonymGraphFilter can't consume an incoming graph

2019-02-14 Thread lambda.coder lucene
rowse/LUCENE-6664?focusedCommentId=16386294&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16386294 > On 11 Feb 2019, at 05:46, Erick Erickson wrote: > > It's, well, undefined. As in nobody knows except that it'll be wrong. > And exactly

SynonymGraphFilter can't consume an incoming graph

2019-02-10 Thread lambda.coder lucene
Hello, The Javadocs of SynonymGraphFilter say that it can’t consume an incoming graph and that the result will be undefined. Is there any example that exhibits the limitations and shows what is meant by undefined? Regards, Patrick

Re: A key value field storing

2012-03-21 Thread Deb Lucene
eanQuery bq = new BooleanQuery(); > Query nameq = parser.parse(...) or whatever > Query confq = NumericRangeQuery.newXxx(...); > bq.add(nameq, ...); > bq.add(confq, ...); > > and search using bq. > > > -- > Ian. > > > On Wed, Mar 21, 2012 at 2:20 PM, Deb Lucene w

A key value field storing

2012-03-21 Thread Deb Lucene
Hi Group, Sorry for cross posting! We need to index a document corpus (news articles) with some meta data features. The meta data are actually company names with some scoring (a double, between 0 and 1). For example, two documents can be - document 1 (some text - say a technical article from NY t

Re: Multi field search with values

2012-03-20 Thread Deb Lucene
Hi group, Is there any way to index a document based on a key value (key = text, value = double) pair? For example, we have a situation where - document 1 IBM - 0.5 Google - 0.9 Apple - 0.3 document 2 IBM - 0.6 Google - 0.1 Apple - 0.4 now we need to search using two fields, the name (e.g. "I

Re: Multi field search with values

2012-03-14 Thread Deb Lucene
dd(qthresh, ...) > > and use bq in the search call. > > > -- > Ian. > > > On Wed, Mar 14, 2012 at 3:32 PM, Deb Lucene wrote: > > Hi Group, > > > > I am working on a Lucene search solution for multiple fields. So far, if > > the fields are of string type I

Multi field search with values

2012-03-14 Thread Deb Lucene
Hi Group, I am working on a Lucene search solution for multiple fields. So far, if the fields are of string type I am having no difficulties in retrieving using the MultiFieldQueryParser. For example, my indexing and searching logic look like this - indexing - I am indexing a corpus on the

SearchBlox is now FREE. No limitations!

2010-12-06 Thread Lucene
number of new paid support packages and free forum-based support. SearchBlox is an Enterprise Search Server built on top of Apache Lucene and includes: - Integrated crawlers for HTTP/HTTPS, filesystems and feeds - Web based Admin Console to configure and manage upto 250 indexes - REST API

Next Word - Any Suggestions?

2010-10-26 Thread Lucene
Am about to implement a custom query that is sort of mash-up of Facets, Highlighting, and SpanQuery - but thought I'd see if anyone has done anything similar. In simple words, I need facet on the next word given a target word. For example, if my index only had the following 5 documents (co

[ANN] SearchBlox Version 6.0 incorporating Apache Lucene 3.0.2 released

2010-08-11 Thread Lucene
The SearchBlox Team is pleased to announce the availability of SearchBlox Version 6.0. The new version upgrades to Apache Lucene 3.0.2. SearchBlox is an integrated Enterprise Search Server incorporating Lucene and includes: - Web/RSS/FileSystem Crawlers - REST API - Web Admin Console for

TermDocs

2010-05-12 Thread roy-lucene-user
Hi guys, I've had this code for some time but am just now questioning if it works. I have a custom filter that i've been using since Lucene 1.4 to Lucene 2.2.0 and it essentially builds up a BitSet like so: for ( int x = 0; x < fields.length; x++ ) { for ( int y = 0; y <

Result of query not what I expect

2009-12-29 Thread lucene-newbie123
ut.println("Here is the note: " + doctitle); -- View this message in context: http://old.nabble.com/Result-of-query-not-what-I-expect-tp26958815p26958815.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. -

[ANN] SearchBlox announces UNLIMITED Edition

2009-08-17 Thread Lucene
The SearchBlox Team is pleased to announce the availability of SearchBlox UNLIMITED Edition of its Lucene-based Search Software. The UNLIMITED Edition allows indexing of unlimited number of documents and provides for unlimited number of development and deployment licenses to the licensee for

Re: Spell check of a large text

2008-12-12 Thread Lucene User no 1981
content is not modified, it's just tagged accordingly. That said, I kind of like your idea, I mean token filter looks like the good candidate. As of Lazzy, is it any different than Lucene SpellChecker (ngram based)? what really matters here is not the accuracy (decent but not exceptional

Spell check of a large text

2008-12-11 Thread Lucene User no 1981
essed with Lucene? Regards, Mac

2.4 Performance

2008-11-18 Thread lucene
On an index of around 20 gigs I've been seeing a performance drop of around 35% after upgrading to 2.4 (measured on ~1 requests identical requests, executed in parallel against a threaded lucene / apache setup, after a roughly 1 query warmup). The principal changes I've made

Re: getting started

2008-08-01 Thread roy-lucene-user
That certainly works if the intent is to grab the entire file. If all you want is that particular line to be returned in the search then that's not going to work. Let's say the files was made up of a million lines and the text was stored in the index (I know, absurd). When grabbing the Document

Re: getting started

2008-08-01 Thread roy-lucene-user
Hello Brittany, I think the easiest thing for you to do is make each line a Document. You might want a FileName and LineNumber field on top of a "Text" field, this way if you need to gather all the lines of your File back together again you can do a search on the FileName. So in your case: Docu

Lucene search time in real production use?

2008-05-31 Thread lucene user
Hi, Folks: What are some average search and retrieval times for Lucene queries in real production use? Would people include relevant stuff like the number of documents in your index, etc.? Thanks for your help!

Handeling when a field does not exist in the document

2008-05-22 Thread lucene user
We have a requirement to inform users on a regular basis of new material on which they have expressed interest. How are we to know what is "new" from the point of view of a particular user? Our idea is to tag each new item in some way (perhaps a date/time stamp in the lucene index indic

Re: Indexing/Querying Annotations and Fields for a document

2008-03-18 Thread lucene-seme1 s
RSON" at the same position as "fred" is to set > the "position increment" of this token to zero. > > Now you can construct a Lucene query that uses this position info in > queries. > i.e. instead of searching for the specific: > >"Fred works for M

Re: Indexing/Querying Annotations and Fields for a document

2008-03-17 Thread lucene-seme1 s
rocessing IMO. Essentially, > you set up a TeeTokenFilter that recognizes your Person and then set > that token aside in the Sink. Then, when you construct the Person > field, you use the SinkTokenizer. > > HTH, > Grant > > On Mar 17, 2008, at 8:54 AM, lucene-seme1 s wro

Indexing/Querying Annotations and Fields for a document

2008-03-17 Thread lucene-seme1 s
Hello, I am a newbie here and still experimenting with Lucene. I have annotations and features generated by GATE for many documents and would like to index the original content of the documents in addition to the generated annotations. The annotations are in the form of [ John loves fishing]. I

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread lucene user
having to reindex the article. > However, the trade-off is that by having the article and annotation in > separate documents, you'll lose the relevance boost you would otherwise > get when the search terms appear both in the annotation and in the > article. > > Pete

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread lucene user
? > > Do you want to do things like phrase-search e.g. > "PERSON_ANNOTATION works for Google" > > Or is your idea of an annotation more simply a del.ici.ous-style tag > associated with the whole document? > > Cheers > Mark > > > - Original Messag

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread lucene user
I'd be VERY grateful for your help, folks! Thanks! I really need some insight on this. THANKS!! On Nov 26, 2007 6:43 PM, lucene user <[EMAIL PROTECTED]> wrote: > Here are the three options that seem practical to us right now. > > (1) Do The annotation search in postgr

Re: Searching user-private annotations associated with indexed documents

2007-11-26 Thread lucene user
Here are the three options that seem practical to us right now. (1) Do The annotation search in postgres using LIKE or the postgres native full text search. Take the resulting list of file ids and use it to build a filter for the lucene query, the way we currently do for folders. (2

Searching user-private annotations associated with indexed documents

2007-11-26 Thread lucene user
hould handle this? The annotations are changeable by users at any time so we have to be ready to delete them and add others at any time when the user does edit an annotation. Do I need a second Lucene index? Can I do a query against two indexes at the same time? If so, how? The annotations will be

Re: Optimizing index takes too long

2007-11-12 Thread Lucene User
what type of documents are indexing regards gaurav On 11/11/07, Barry Forrest <[EMAIL PROTECTED]> wrote: > > Hi, > > Optimizing my index of 1.5 million documents takes days and days. > > I have a collection of 10 million documents that I am trying to index > wit

Comparing Two Indexes

2007-11-09 Thread Lucene User
Hi, I wanted to compare two indexes. Please recommend an algorithm which takes all the factors into account, such as the versions of software being used by Lucene and the application, which have an effect on the index being created. We can also compare on certain fields and the text. Regards

Re: Lucene Queries Over User-Editable Dynamic Categories of Documents

2007-10-25 Thread lucene user
been super helpful! Very grateful! Thanks! On 10/24/07, markharw00d <[EMAIL PROTECTED]> wrote: > > lucene user wrote: > > Thanks for all your help! > > > > We are using Lucene 2.1.0 and TermsFilter seems to be new in Lucene > 2.2.0. > > I have not been a

Re: Lucene Queries Over User-Editable Dynamic Categories of Documents

2007-10-24 Thread lucene user
Thanks for all your help! We are using Lucene 2.1.0 and TermsFilter seems to be new in Lucene 2.2.0. I have not been able to find SortedVIntList in the javadocs at all. Because both SortedVIntList and a regular BitSet are based on Lucene Document Numbers, which are not permanent, It seems we

Re: Lucene Queries Over User-Editable Dynamic Categories of Documents

2007-10-24 Thread lucene user
other idea that works for even larger numbers? Frankly, we don't yet understand how our users will use the system in the long run. When you have done stuff like this, how large have the term filters grown? Would it EVER make sense to maintain the end user's catigories in some sort

Lucene Queries Over User-Editable Dynamic Categories of Documents

2007-10-23 Thread lucene user
might have perhaps a few hundred documents in each. These categories might be highly dynamic, with users adding and deleting documents from these categories many times a day. How might we use Lucene to perform searches limited to these very dynamic and end-user editable categories? Any ideas for how

Re: Amount of RAM needed to support a growing lucene index?

2007-08-13 Thread lucene user
<[EMAIL PROTECTED]> wrote: > > > 12 aug 2007 kl. 14.01 skrev lucene user: > > > Do you know if 290k articles and 234 million words is a large > > lucene index > > or a medium one? Do people build them this big all the time? > > If the calculator in my hea

Re: Amount of RAM needed to support a growing lucene index?

2007-08-12 Thread lucene user
Thanks, Karl. Do you know if 290k articles and 234 million words is a large lucene index or a medium one? Do people build them this big all the time? Thanks! On 8/12/07, karl wettin <[EMAIL PROTECTED]> wrote: > > > 12 aug 2007 kl. 09.03 skrev lucene user: > > > If I

Amount of RAM needed to support a growing lucene index?

2007-08-12 Thread lucene user
Hi, Folks - Two quick questions - need to size a server to run our new index. If I have an index with 111k articles and 90 million words indexed, how much RAM should I have to get really fast access speeds? If I have an index with 290k articles and 234 million words indexed, how much RAM should

RE: ERROR opening the Index - contact sysadmin!

2007-06-13 Thread Lucene Help
I tried doing what you did. At step 3, I got the following: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/demo/IndexFiles Thank you in advance for your help. - Lucene User> From: [EMAIL PROTECTED]> Subject: Re: ERROR opening the Index - contact

Cannot save index to 'index' directory, please delete it first

2007-06-13 Thread Lucene Help
Hello, I tried uninstalling and installing Lucene 2.1.0 and Tomcat 5.5. I followed the Lucene demo instructions and set the CLASSPATH as follows: .;C:\lucene-2.1.0\lucene-core-2.1.0.jar;C:\lucene-2.1.0\lucene-demos-2.1.0.jar I then typed java org.apache.lucene.demo.IndexFiles C:\lucene-2.1.0

RE: ERROR opening the Index - contact sysadmin!

2007-06-12 Thread Lucene Help
I tried uninstalling lucene and installing it again. This time, after setting the CLASSPATH C:\lucene-2.1.0\lucene-core-2.1.0.jar;C:\lucene-2.1.0\lucene-demos-2.1.0.jar, I typed java org.apache.lucene.demo.IndexFiles C:\lucene-2.1.0\src into the commandline prompt. I got the following

RE: ERROR opening the Index - contact sysadmin!

2007-06-12 Thread Lucene Help
I am using apache-tomcat 5.5.23 and lucene-2.1.0. At the command prompt, I typed, java org.apache.lucene.demo.IndexHTML - create -index C:\Program Files\Apache Software Foundation\Tomcat 5.5\webapps\opt\lucene\index ..I then got the following: Usage: IndexHTML [-create] [-index ] I then

ERROR opening the Index - contact sysadmin!

2007-06-12 Thread Lucene Help
Hello, I just downloaded Lucene and tried running the demo. I seem to be okay up until I type in a query into the "Search Criteria" page and click on the "Search" button at the URL: http://localhost:8080/luceneweb/ At this point I am at the URL http://localhost:8080/lucenew

Re: next() not called in FilterIndexReader.FilterTermDocs

2007-06-10 Thread lucene user
et it working... I'd also like too know if my concept of how the index works at the low level is correct, and if this is my bug or a bug in lucene. On 6/10/07, Erick Erickson <[EMAIL PROTECTED]> wrote: Wouldn't it be easier to just make a Filter? That's what they were intend

next() not called in FilterIndexReader.FilterTermDocs

2007-06-10 Thread lucene user
I am trying to use a Filter Index Reader to provide access only to the subset of my archive for which a certain field contains one of a given list of values. The idea is to create a special term for this field that means 'any term in the list', and add an 'AND' clause to every query, to enforce t

Whats the best way to filter based on a function of an indexed term or field value

2007-06-05 Thread lucene user
take up quite a bit of memory. Use FilterIndexReader A Class FilterIndexReader is provided in Lucene to make it easy to override the index reader. One would think that this would allow you to simply skip documents you don't want whenever they are encountered in the index. For my purposes the Fi

Re: built index doesn't contain a segments file

2007-04-08 Thread lucene
hould throw some exception in case they haven't closed. > Also, what version of Lucene are you using? I ask because > there has been some work in that area for Lucene 2.1, so it > could point towards different issues if you're using an older > version. I actually build the index

built index doesn't contain a segments file

2007-04-08 Thread lucene
segments.gen -rw-r--r-- 1 root root 41 Apr 8 18:37 segments_4 However if I try to IndexReader.open() the index lucene complains about a missing segments file. Well, right, it's not there. But why not? - To unsubscribe, e

Re: reading indice

2006-10-16 Thread heritrix . lucene
Read org.apache.lucene.index.IndexReader and org.apache.lucene.search.IndexSearcher. There are descriptions available in these docs. On 10/17/06, EDMOND KEMOKAI <[EMAIL PROTECTED]> wrote: Can someone tell me how to read an index into memory, or how to open an existing index for reading?

Re: IOException and index corruption

2006-10-12 Thread Apache Lucene
m); ? On 10/12/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: On Oct 12, 2006, at 10:17 AM, Apache Lucene wrote: > When I am adding a document to the lucene index if the method > throws an > IOException and if I continue with adding other documents ignoring the > exception, will

IOException and index corruption

2006-10-12 Thread Apache Lucene
When I am adding a document to the lucene index if the method throws an IOException and if I continue with adding other documents ignoring the exception, will the index be corrupted? What happens to the fields which are already written to the index?

Re: searching for the part of a term.

2006-09-27 Thread heritrix . lucene
d in http://www.gossamer-threads.com/lists/lucene/java-user/13345?search_string=Starts%20With%20x%20and%20Ends%20With%20x%20Queries;#13345 was to index rotated token of a word, and then search by the prefix query. But i think here also i'll face the speed issue because of the prefix query..(If i am right
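The rotated-token idea mentioned in this post can be sketched in plain Java. This is only an illustration under stated assumptions: it uses the common "permuterm" construction, which appends an end marker (`$` here, assumed never to occur in a token) before rotating, and the truncated post may have meant something slightly different. `RotationSketch`, `rotations`, and `matchesSubstring` are hypothetical names, not Lucene classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: illustrates the rotated-token idea.
class RotationSketch {

    // Assumption: '$' never occurs in a token, so it can mark the end of the word.
    static final char END = '$';

    // Returns every rotation of token + '$', e.g. "ab" -> ["ab$", "b$a", "$ab"].
    static List<String> rotations(String token) {
        String marked = token + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    // True if some rotation starts with `part`, which is what a prefix query
    // over the indexed rotations would test.
    static boolean matchesSubstring(String token, String part) {
        for (String rotation : rotations(token)) {
            if (rotation.startsWith(part)) {
                return true;
            }
        }
        return false;
    }
}
```

With this construction, a token of length n produces n + 1 index terms, so the index grows accordingly; whether the resulting prefix queries are fast enough is the concern the poster raises and is not settled by this sketch.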

Re: searching for the part of a term.

2006-09-26 Thread heritrix . lucene
Hi, While i was searching forum for my problem for searching a substring, i got few very good links. http://www.gossamer-threads.com/lists/lucene/java-user/39753?search_string=Bitset%20filter;#39753 http://www.gossamer-threads.com/lists/lucene/java-user/7813?search_string=substring;#7813 http

searching for the part of a term.

2006-09-23 Thread heritrix . lucene
Hi All, How can I set up my search so that if I am looking for the term "counting", documents containing "accounting" also come up? Similarly, if I am looking for the term "workload", documents containing "work" should also come up as search results. Wildcard query seems to work in the first case, bu

Re: is there any n-gram analyzer available??

2006-09-22 Thread heritrix . lucene
https://issues.apache.org/jira/browse/LUCENE-400 -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

is there any n-gram analyzer available??

2006-09-22 Thread heritrix . lucene
Hi, I am looking for an analyzer that chops a given string into its n-grams. Basically, I want to index 3-grams and longer, up to the length of a word. Can anybody tell me if there is any analyzer available for this? Regards..
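To make the request above concrete, here is a minimal sketch in plain Java rather than a Lucene analyzer. It assumes the poster wants every contiguous substring of a word from length 3 up to the full word length; `NGramSketch` and `ngrams` are hypothetical names and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: lists the character n-grams of one word.
class NGramSketch {

    // Returns all contiguous substrings of `word` with length minN..word.length(),
    // shortest first, left to right within each length.
    static List<String> ngrams(String word, int minN) {
        if (minN < 1) {
            throw new IllegalArgumentException("minN must be >= 1");
        }
        List<String> grams = new ArrayList<>();
        for (int n = minN; n <= word.length(); n++) {
            for (int start = 0; start + n <= word.length(); start++) {
                grams.add(word.substring(start, start + n));
            }
        }
        return grams;
    }
}
```

For example, `NGramSketch.ngrams("count", 3)` yields `cou`, `oun`, `unt`, `coun`, `ount`, `count`, and a word shorter than 3 characters yields an empty list.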

Re: Searching is taking a lot...

2006-06-29 Thread heritrix . lucene
Ya you are correct. My idea will not work when there are lots of documents in the index and also there are lots of hits for that page. I am going with you :-) Thanx... On 6/29/06, James Pine <[EMAIL PROTECTED]> wrote: Hey, I'm not a performance guru, but it seems to me that if you've got

Re: Searching is taking a lot...

2006-06-29 Thread heritrix . lucene
perhaps that's not what you ment, perhaps you aren't iterating over any results, in which case using a HitCOllector instead isn't neccessary going to bring that 17sec down. As i told earlier that for the same query minimum time is 2-3 sec and this time is after several attempt(so i think upto th

Re: Searching is taking a lot...

2006-06-29 Thread heritrix . lucene
This will break performance. It is better to first collect all the document numbers (code without the proper declarations): public void collect(int id, float score) { if(docCount >= startDoc && docCount < endDoc) { docNrs.add(id); // or use int[] docNrs when possible. Why
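The page-window check quoted in this post can be sketched as a small stand-alone class. This is an assumption-laden illustration: the quoted code is truncated, so incrementing `docCount` after each hit is a guess at the missing part, and `PageWindowCollector` is a hypothetical name that does not extend any Lucene class. Hits are paged in whatever order the callback delivers them, which is not necessarily score order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a hit-collecting callback; it does not extend any Lucene class.
class PageWindowCollector {
    private final int startDoc;   // index of the first hit on the page (inclusive)
    private final int endDoc;     // index one past the last hit on the page (exclusive)
    private int docCount = 0;     // number of hits seen so far
    private final List<Integer> docNrs = new ArrayList<>();

    // `page` is zero-based.
    PageWindowCollector(int page, int pageSize) {
        this.startDoc = page * pageSize;
        this.endDoc = startDoc + pageSize;
    }

    // Called once per matching document; only remembers ids inside the page window.
    // The score is ignored in this sketch.
    void collect(int id, float score) {
        if (docCount >= startDoc && docCount < endDoc) {
            docNrs.add(id);
        }
        docCount++; // assumption: the truncated original advances the counter here
    }

    List<Integer> pageDocNrs() {
        return docNrs;
    }

    int totalHits() {
        return docCount;
    }
}
```

Only the document numbers for the requested page are kept, so the stored documents can be loaded afterwards for that page alone rather than for every hit.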

Re: Searching is taking a lot...

2006-06-28 Thread heritrix . lucene
I am using Hits object to collect all documents. Let me tell you my problem. I am creating a web application. Every time when a user looks for something it goes and search the index and return the results. Results may be in millions. So for displaying results, i am doing pagination. Here the probl

Re: Searching is taking a lot...

2006-06-28 Thread heritrix . lucene
. I am using Hits for getiing the results searching the result using Searcher.search(). Is there anyother way of improving its speed. Thanks and regards, On 6/27/06, heritrix. lucene <[EMAIL PROTECTED]> wrote: No. I am not sorting the data... On 6/27/06, Martin Braun <[EMAIL

Re: IndexSearcher in Servlet

2006-06-28 Thread heritrix . lucene
t; >> > It seems to be OK... am I wrong ? >> >> You may be "ok" given your query patterns, but you won't benefit from >> Lucene internal caching unless you use a single IndexSearcher (or >> IndexReader, as just pointed out). >> >>

Re: IndexSearcher in Servlet

2006-06-28 Thread heritrix . lucene
with a static and > instanciated Directory. > > New IndexSearcher(myDirectory) > > It seems to be OK... am I wrong ? You may be "ok" given your query patterns, but you won't benefit from Lucene internal caching unless you use a single IndexSearcher (or I

Re: search performance benchmarks

2006-06-27 Thread heritrix . lucene
Hi, Or Lucene is more like Google in this sense, meaning that the time doesn't depend on the size of the matched result i found that it takes long time if the result set is bigger(upto 25 sec for 29 M results). But for smaller resultset of size approx 10,000 it takes approx. 200 ms.

Re: IndexSearcher in Servlet

2006-06-27 Thread heritrix . lucene
Hi All, I am sorry on my mistake. Now i am agree with you. I had some mistake in my code, I was sharing the hits object in servlet and that was my foolish mistake. Now since i changed it and when i again ran the testcase, there was no problem. i am using single static IndexSearcher now :) Thanks

Re: IndexSearcher in Servlet

2006-06-27 Thread heritrix . lucene
Hi, I also had the same confusion. But today when i did the testing i found that it will merge your results. Therefore i believe that indexSearcher is not thread safe. I tried this on 10,000 requests per second. With Regards On 6/27/06, Ramana Jelda <[EMAIL PROTECTED]> wrote: Hi, You are wrong

Re: IndexSearcher in Servlet

2006-06-27 Thread heritrix . lucene
Hi, The same question i asked yesterday. :-) And now i know the answer :0 Creating a new searcher for each query will make your application very very slow... (leave this idea) U can not have a static indexsearcher object. It will merge all results and the user will get the result of their que

Re: Searching is taking a lot...

2006-06-27 Thread heritrix . lucene
6/26/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: >> >> >> : Can you provide some information on your setup? How are you indexing >> : and searching? Do you have a lot of terms in your query, etc? Have >> you >> : done any profiling of your setup to det

Understanding Boolean Queries..

2006-06-27 Thread heritrix . lucene
Hi i am using lucene 1.9.1. My query is : (subject:cs OR author:ritchie) I am creating one Boolean query for two TermQueries. t1 = new Term("subject", "cs") t2 = new Term("author","ritchie") for this the BooleanQuery i created is: BooleanQuery mergedQue

Re: Searching is taking a lot...

2006-06-27 Thread heritrix . lucene
: done any profiling of your setup to determine where the bottlenecks : are? Are you sure they are in Lucene? what methods are you using for doing the search? (Hits, HitCollector, TopDocs) are you sorting? are you opening a new IndexSearcher for each searcher? what exactly are you timing (a sin

Searching is taking a lot...

2006-06-26 Thread heritrix . lucene
Hi, I have created an index of 47 Million documents. I have 1.28GB RAM. When i am doing a search over this index it is taking on average 25 sec. Is there a way so that i can get results in part of a second... I hope there must be some ways.. Thanks and regards..

Re: addIndexes() is taking infinite time ...

2006-06-22 Thread heritrix . lucene
so how it can be ignored ?? On 6/22/06, Mike Streeton <[EMAIL PROTECTED]> wrote: From memory addIndexes() also does and optimization before hand, this might be what is taking the time. Mike www.ardentia.com the home of NetSearch -Original Message- From: heritrix.lucene [mailto:[EMAIL

Re: addIndexes() is taking infinite time ...

2006-06-21 Thread heritrix . lucene
No. I haven't tried. Today i can try it. One thing that i m thinking is that what role does the file system plays here. I mean is there any difference on if i am doing indexing on FAT32 or i am on EXT3??? i'll have to find it out Can anybody put some light on this?? With regards On 6/22/06,

What is a "Lazy Field"...

2006-06-21 Thread heritrix . lucene
Hi, Can anybody please tell me what a "Lazy Field" is ??? I noticed several time this term has come in discussion... With Regards,

Re: addIndexes() is taking infinite time ...

2006-06-21 Thread heritrix . lucene
hi Otis, Now this time it took 10 Hr 34 Min. to merge the indexes. During merging i noticed it was not completey using the CPU. I have 512MB RAM. and here i found it used upto the 256 MB. Are there some more possibilities to make it more fast ... With Regards, On 6/21/06, heritrix. lucene

Re: addIndexes() is taking infinite time ...

2006-06-20 Thread heritrix . lucene
hi, thanks for your reply. Now i restarted my application with maxBufferedDocs=10,000. And i am sorry to say that i was adding those indexes one by one. :-) Anyway Can you please explain me the addIndex ? I want to know what exactly happens while adding these.. With Regards, On 6/20/06, Otis G

addIndexes() is taking infinite time ...

2006-06-20 Thread heritrix . lucene
Hi all, I had five different indexes: 1 having 15469008 documents 2 having 7734504 documents 3 having 7734504 documents 4 having 7734504 documents 5 having 7734504 documents Which sums to 46407024. The constant values are maxMergeFactor = 1000 maxBufferedDocs = 1000 I wrote a simple program which

Re: How to do pagination on fethed result using lucene...

2006-06-20 Thread heritrix . lucene
Hi, Actually i forgot to write that my application is web based and i am running this on tomcat server. assuming your application is web based, the general concesus is to start by implimening your app so that each page reexecutes the search, reexecuting the search is not feasible as every time

How to do pagination on fethed result using lucene...

2006-06-19 Thread heritrix . lucene
Hi all, I have built a small application that gives some thousand results. I want to display results the way Google does, using pagination. My question is how to maintain the sequence of displayed results. Should I associate the "Hits" object with the session? Assume I want to display

Re: Getting count on distinct values of a field.

2006-06-13 Thread heritrix . lucene
I am sorry for my stupid question. Thanks. :-) Regards, On 6/13/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : But what if that word is present in other fields also. : does "docFreq " only look into that particular field ?? docFreq tells you the frequency of a term, a term is a field a

Re: Getting count on distinct values of a field.

2006-06-13 Thread heritrix . lucene
But what if that word is present in other fields also. does "docFreq " only look into that particular field ?? On 6/13/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: Look at the TermEnum class... iterate over the terms in your field, and docFreq is the number of docs with that term. : Date:

Re: IndexWriter.addIndexes & optimizatio

2006-06-12 Thread heritrix . lucene
Hi, I have processed approx. 50 million documents up to now. I kept the values of maxMergeFactor and maxBufferedDocs at 1000. I arrived at this value after several rounds of test runs. The indexing rate over the 50 M documents is 1 document per 4.85 ms. I am only using FSDirectory. Is there any other way to reduce this time?

Re: IndexWriter.addIndexes & optimizatio

2006-06-12 Thread heritrix . lucene
nd pick a mergeFactor that is high, but doesn't get you in trouble > with open files. > > can you please explaing this in brief?? > > regards and thanks, > > On 6/9/06, Otis Gospodnetic < [EMAIL PROTECTED]> wrote: > > > > When writing a unit test that comapr

Re: indexing problems

2006-03-07 Thread Apache Lucene
BTW, I could access that index using Luke. It works fine. On 3/7/06, Apache Lucene <[EMAIL PROTECTED]> wrote: > > This line is throwing a null pointer exception for the index I created as > I mentioned in my previous emails. > > searcher = new IndexSearcher(Index

Re: indexing problems

2006-03-07 Thread Apache Lucene
wrote: > > > On Mar 7, 2006, at 10:41 AM, Apache Lucene wrote: > > > Is it advisable to use compound file format? or should I revert it > > back to > > simple file format? How do I revert it back? > > There is a setter on IndexWriter to set it back if you like.

Re: indexing problems

2006-03-07 Thread Apache Lucene
le contains all those individual parts. > > -Yonik > > On 3/7/06, Apache Lucene <[EMAIL PROTECTED]> wrote: > > Hi, > >I am using Lucene 1.9.1 to index the files. The index writer > created > > the following files > > (1) segment file "segmen

indexing problems

2006-03-07 Thread Apache Lucene
Hi, I am using Lucene 1.9.1 to index the files. The index writer created the following files (1) segment file "segments" (2) deletable file "deletable" (3) compound file "cfs" None of the other files like term info, frequency..etc were created. Is there somet

Re: Searching Special Characters

2005-11-16 Thread Lucene User
void index() throws IOException { String indexName = "c:\\lucene\\test"; Analyzer analyzer = new StandardAnalyzer(); IndexWriter writer = new IndexWriter(indexName, analyzer, true); Document d = new Document(); d.add(new Field("headline", &

Searching Special Characters

2005-11-15 Thread Lucene User
Hi, Our index contains articles with special characters. For instance, the string P&O is indexed as P&amp;O. The correct entity codes are indexed for all the special characters we use. My question is that a typical user searching for the above will enter P&O but that will not match P&amp;O. I know I coul

Re: Lucene faster on JDK 1.5?

2005-07-08 Thread roy-lucene-user
This might be a good time to ask another question. Are there any advantages to lucene using the java.nio package? Roy On 7/8/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > > > Nothing significant, but I've been using 1.5 on > Simpy.com<http://Simpy.com>(lo

Re[4]: md5 keyword field issue

2005-06-20 Thread catalin-lucene
Monday, June 20, 2005, 5:48:30 PM, Erik Hatcher wrote: > Now you've just said the same conflicting thing a different way. You > want to cluster but only return one. :) i think i missunderstood here the Term: cluster. so yes, i just want one image returned. > If you only want one image returned,

Re[2]: md5 keyword field issue

2005-06-20 Thread catalin-lucene
Monday, June 20, 2005, 3:55:36 PM, Erik Hatcher wrote: > Filters reduce the search space to a subset of the documents in the > index. Which document would you want returned when there are > multiple documents in the index with the same MD5? Or do you want to > cluster them by MD5? i think clus

md5 keyword field issue

2005-06-19 Thread catalin-lucene
fferent, so in general all the properties put together (md5, url, alt) compose a different "entity". i bought "Lucene in Action" book, which is a GREAT book. i was looking into "filters". i quote: "If all the information needed to perform filtering is in the index,

Re: SpanTermQuery issue?

2005-06-01 Thread yahootintin-lucene
Reece --- Erik Hatcher <[EMAIL PROTECTED]> wrote: > > On May 31, 2005, at 8:38 PM, Reece Wilton wrote: > > > Hi, > > > > Using a BooleanQuery to combine two SpanTermQuery objects > causes > > unexpected results on Lucene 1.9 RC1. Is this a problem > that is

Changing default "OR" to "AND" for QueryParser

2005-05-05 Thread ivj2324234-lucene
Hello all, For us the "OR" default causes confusion among the users because they expect narrower results as they add to the query, but the opposite happens because the terms are OR-ed. Is it possible to request QueryParser to use "AND" as a default instead of "OR"? What API call(s) is responsible for

Re: new added documents not showing

2005-03-23 Thread roy-lucene-user
Pasha, in short, that is all I'm trying to do. Wasn't an issue really before. Otis, not sure what Luke is. But the documents appear after we optimize. Roy.

Re: new added documents not showing

2005-03-22 Thread roy-lucene-user
Pasha, in short, that is all I'm trying to do. Wasn't an issue really before. Otis, not sure what Luke is. But the documents appear after we optimize. Roy. On Mon, 21 Mar 2005 18:20:32 -0800 (PST), Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > * Replies will be sent through Spamex to java-use

Re: new added documents not showing

2005-03-21 Thread roy-lucene-user
correct, we also can't see the new documents when we open an IndexReader to the main index. Roy.

Re: new added documents not showing

2005-03-21 Thread roy-lucene-user
> When do you open the index writer? Where is the code? Ah, sorry. That last section is in a method that gets called in a loop. IndexWriter writer = null; try { writer = new IndexWriter( mainindex, new StandardAnalyzer(), false ); for ( int i = 0; i < dir

Re: new added documents not showing

2005-03-21 Thread roy-lucene-user
On Sat, 19 Mar 2005 22:43:44 +0300, Pasha Bizhan <[EMAIL PROTECTED]> wrote: > Could you provide the code snippets for your process? > Sure (thanx for helping, btw) I just realized that the way I described our process was off a little bit. Here's the process again: 1. grab all index Directorys
