Re: looking for some help

2009-06-08 Thread Ganesh
Lucene Java is the base for all. http://lucene.apache.org/java/docs/index.html Regards Ganesh - Original Message - From: "Mohit Verma" To: Sent: Sunday, June 07, 2009 11:32 AM Subject: looking for some help > Hi, > I am currently working on a project where I need a desktop search API

How to make WordDelimiterFilter [pulled from Solr nightly] not break non-English words in the wrong way in Lucene indexing/searching?

2009-06-08 Thread KK
Hi All, I'm trying to index some Indian web page content, which is basically a mix of Indian-language text and, say, 5% English content on the same page. For this I cannot use the standard or simple analyzer, as they break the non-English words in the wrong places, say [because the isLetter(ch) happens to be

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-08 Thread Robert Muir
KK can you give me an example of some indian text for which it is doing this? Thanks! On Mon, Jun 8, 2009 at 1:03 AM, KK wrote: > Hi Robert, > The problem is that worddelimiterfilter is doing its job for english content > but for non-english indian content which are unicoded it highlights the > s

indexing performance problems

2009-06-08 Thread Mateusz Berezecki
Hi list, I'm having trouble achieving good performance when indexing an XML Wikipedia dump. The indexing process works as follows: 1. set up FSDirectory 2. set up IndexWriter 3. set up a custom analyzer chaining wikipediatokenizer, lowercasefilter, porterstemmer, stopfilter and lengthfilter 4. crea

Re: Most frequently indexed term

2009-06-08 Thread Ganesh
Thanks. This works well. The logic is: 1. Do the search. For every document, get the list of terms and their frequencies. 2. Use SortedTermVectorMapper to generate a list of unique terms and their frequencies. 3. Sort them to get the list of the top N most frequently indexed terms in a given date range (
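
The aggregation described in this message can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the poster's code: it assumes the per-document term frequencies have already been read (in Lucene they would come from term vectors), and the names TopTerms, merge and topN are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopTerms {
    // Hypothetical helper: sums the per-document term counts into one total per term.
    static Map<String, Integer> merge(List<Map<String, Integer>> perDocument) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> doc : perDocument) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Hypothetical helper: returns up to n terms, highest total first;
    // ties are broken alphabetically so the order is deterministic.
    static List<String> topN(Map<String, Integer> totals, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in data: two matching documents with their term counts.
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("lucene", 3);
        doc1.put("index", 1);
        Map<String, Integer> doc2 = new HashMap<>();
        doc2.put("lucene", 2);
        doc2.put("index", 1);
        doc2.put("search", 1);

        List<Map<String, Integer>> docs = new ArrayList<>();
        docs.add(doc1);
        docs.add(doc2);
        System.out.println(topN(merge(docs), 2));
    }
}
```

Summing into a single map before sorting keeps memory proportional to the number of unique terms rather than to the number of matching documents.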

AW: Most frequently indexed term

2009-06-08 Thread Uwe Goetzke
Hello Ganesh, What about making a separate index for each day, getting your analysis, and then merging that index afterwards? I am not sure, but I think this might work. Use MultiSearcher for the search. Regards Uwe Goetzke -Original Message- From: Ganesh [mailto:emailg...@yahoo.co.in] Ge

problem with A cap circum followed by space(Â )

2009-06-08 Thread Rupesh1mb
Hi, I have a problem: when I enumerate wildcard terms in an index, I get two terms 'the'. The cause was that one was 'the' and the other was 'the ', i.e. 'the' followed by Â followed by a space. What is this Â followed by a space, and why is it being treated as a single character? How am I
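
One common cause of an "Â followed by space" pair is a no-break space (U+00A0) whose UTF-8 bytes (0xC2 0xA0) were decoded as ISO-8859-1. Whether that is what happened in the poster's index is an assumption; the sketch below only reproduces the effect in plain Java. The names NbspMojibake, misdecode and normalize are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class NbspMojibake {
    // Encodes as UTF-8, then decodes the bytes with the wrong charset (ISO-8859-1).
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Replaces no-break spaces with ordinary spaces, then trims.
    // String.trim() on its own does not remove U+00A0.
    static String normalize(String s) {
        return s.replace('\u00A0', ' ').trim();
    }

    public static void main(String[] args) {
        String original = "the\u00A0";      // 'the' followed by a no-break space
        String seen = misdecode(original);  // 'the', then U+00C2, then U+00A0

        for (char c : seen.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
        System.out.println("[" + normalize(original) + "]");
    }
}
```

If the text really does contain no-break spaces, normalizing them before analysis (or reading the input with the correct charset) would keep 'the' from being indexed as two distinct terms.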

Re: indexing performance problems

2009-06-08 Thread Michael McCandless
This isn't normal. A mergeFactor of 150 is way too high; I'd put that back to 10 and see if the problem persists. Also make sure you're using autoCommit=false, and try the suggestions here: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed You're sure the JRE's heap size is big enough

Re: indexing performance problems

2009-06-08 Thread Mateusz Berezecki
Hi Michael, Thanks for a prompt response. On Mon, Jun 8, 2009 at 1:27 PM, Michael McCandless wrote: > This isn't normal. > > A mergeFactor of 150 is way too high; I'd put that back to 10 and see > if the problem persists. Also make sure you're using > autoCommit=false, and try the suggestions her

Re: indexing performance problems

2009-06-08 Thread Michael McCandless
On Mon, Jun 8, 2009 at 7:54 AM, Mateusz Berezecki wrote: > Thanks for a prompt response. You're welcome! >> A mergeFactor of 150 is way too high; I'd put that back to 10 and see >> if the problem persists. Also make sure you're using >> autoCommit=false, and try the suggestions here: >> >> h

Re: Retrieving the term vectors of a document in Nutch

2009-06-08 Thread Grant Ingersoll
I'd ask on the nutch-u...@lucene.apache.org mailing list. While Lucene can do all of these things, it is not clear how Nutch exposes, if at all, any of this information. You should be able to get results there. Note, however, that Term Vecs must be created during indexing by creating th

Re: indexing performance problems

2009-06-08 Thread Mateusz Berezecki
Hi Michael, Thanks a lot for a hint. I'll test it out in a few hours and get back to you and/or the list. best, Mateusz On Mon, Jun 8, 2009 at 2:13 PM, Michael McCandless wrote: > On Mon, Jun 8, 2009 at 7:54 AM, Mateusz Berezecki wrote: > >> Thanks for a prompt response. > > You're welcome! > >>

How to know whether the documents returned by a query had a match in a specific field

2009-06-08 Thread Jamil Marques Figueira Junior
Hello, I would like to know if any document returned from my search query had a match in a specific field. Example: Documents: Field 1 - Company Name Field 2 - Street Document1 (CompanyName = "metalpack corp", Street = "Route 66") Document2 (CompanyName = "ibi Bank", Street = "metalpack") If I

Re: Retrieving the term vectors of a document in Nutch

2009-06-08 Thread House Less
Hello Grant, > I'd ask on the nutch-u...@lucene.apache.org mailing list. While Lucene can > do > all of these things, it is not clear how Nutch exposes, if at all, any of > this > information. You should be able to get results there. Thanks, I'll be sure to ask them. > Note, however, t

Lucene for the Mac

2009-06-08 Thread Ian Vink
Is there a Mac port of the Lucene engine?

RE: Lucene for the Mac

2009-06-08 Thread Uwe Schindler
Lucene is written in Java and Java is cross-platform. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Ian Vink [mailto:ianv...@gmail.com] > Sent: Monday, June 08, 2009 11:55 PM > To: java-user@lucene.apac

Re: Lucene for the Mac

2009-06-08 Thread Paul Libbrecht
On 08-Jun-09 at 23:55, Ian Vink wrote: Is there a Mac port of the Lucene engine? I don't get it, are you asking whether Lucene Java works on Mac OS X? The answer is yes. Are you asking for a Cocoa and ObjC port? (don't know) paul

Re: Lucene for the Mac

2009-06-08 Thread Ian Vink
Yes, if there is an Objective-C version. Ian On Mon, Jun 8, 2009 at 6:57 PM, Paul Libbrecht wrote: > > On 08-Jun-09 at 23:55, Ian Vink wrote: > >> Is there a Mac port of the Lucene engine? >> > > I don't get it, are you asking whether Lucene java works on MacOSX? answer > is yes. > Are you aski

Re: How to distribute lucene using rsync

2009-06-08 Thread pof
Is there a way to address this without Solr? I was looking at Solr today and it looks like a lot of piss-farting around just to back up an index. I also found the explanation at http://wiki.apache.org/solr/CollectionDistribution poor, so I am a bit confused by the whole process.

Re: Lucene for the Mac

2009-06-08 Thread Grant Ingersoll
http://www.lucidimagination.com/search/?q=Objective+C+port+of+Lucene suggests there is, although I don't know the state of it. On Jun 8, 2009, at 6:05 PM, Ian Vink wrote: Yes, if there is an Objective-C version. Ian On Mon, Jun 8, 2009 at 6:57 PM, Paul Libbrecht wrote: On 08-Jun-09 at 2

Re: Lucene for the Mac

2009-06-08 Thread Ian Vink
Found it: http://github.com/tcurdt/lucenekit/tree/master Mac Lucene project. iPhone, Mac Desktop/Laptop. Great! On Mon, Jun 8, 2009 at 10:00 PM, Grant Ingersoll wrote: > http://www.lucidimagination.com/search/?q=Objective+C+port+of+Lucene suggests

HitCollectorWrapper

2009-06-08 Thread Koji Sekiguchi
CHANGES.txt said that we can use HitCollectorWrapper: 12. LUCENE-1575: HitCollector is now deprecated in favor of a new Collector abstract class. For easy migration, people can use HitCollectorWrapper which translates (wraps) HitCollector into Collector. But it looks package private? Thank you,

2.9 javadoc

2009-06-08 Thread Artyom Sokolov
Good day. If I understand correctly, the next release will be 2.9. Where can one find the javadocs for it? I've searched in Hudson a bit but didn't find anything. Thanks.