Re: Need Lucene Compression help -- can pay nominal fee

2007-06-18 Thread Sebastin
Hi Hossman, Thanks for your reply. When I index the search fields in my Lucene document, it occupies 20% of the original size. How can I reduce the index size? hossman_lucene wrote: > > > : I need to store all the attributes of the document i index as part of > the > : inde

Re: Phrase Search

2007-06-18 Thread Laxmilal Menaria
Ok.. thanks, I have tried to index the address field as UN_TOKENIZED and search using the above query, but it returns nothing. How can I specify "NOT tokenize" in the query? --Thanks, On 6/18/07, Erick Erickson <[EMAIL PROTECTED]> wrote: Phrase queries won't help you here Your particular issue can be

RE: FW: Lucene indexing vs RDBMS insertion.

2007-06-18 Thread Chew Yee Chuang
Thanks for the sharing and suggestions. Yes Chris, the index is to be partitioned by date and time, and old indexes will not be accessed so frequently. Erick, I did also consider indexing in parallel to different indexes. But I can only put all indexes on ONE machine and there is only ONE machine to pro

Re: negative queries

2007-06-18 Thread Chris Hostetter
: > And (most) spammers, which is really the point of requiring a : > profile. : : I believe this is called "throwing the baby out with the bath water." you obviously haven't seen the amount of spam that the apache wikis use to get ... The account creation form currently asks you a simple qu

Re: negative queries

2007-06-18 Thread Daniel Noll
On Tuesday 19 June 2007 11:03:25 Erik Hatcher wrote: > > Good way to discourage potential contributors I suppose. > > And (most) spammers, which is really the point of requiring a > profile. I believe this is called "throwing the baby out with the bath water." Daniel -- Daniel Noll Nuix Pt

Re: negative queries

2007-06-18 Thread Erik Hatcher
On Jun 18, 2007, at 8:59 PM, Daniel Noll wrote: On Tuesday 19 June 2007 00:24:39 Steven Rowe wrote: In order to edit wiki pages, you must create a profile and be logged in. Click on the "Login" link in the upper right hand of the front page, to the left of the Search box. Fill out the f

Re: negative queries

2007-06-18 Thread Daniel Noll
On Tuesday 19 June 2007 00:24:39 Steven Rowe wrote: > In order to edit wiki pages, you must create a profile and be logged in. > > Click on the "Login" link in the upper right hand of the front page, to > the left of the Search box. > > Fill out the form that comes up, and click on the "Create Prof

Re: Content Summarization

2007-06-18 Thread Omar Alonso
Take a look at LingPipe (http://alias-i.com/lingpipe/). --- "Mordo, Aviran (EXP N-NANNATEK)" <[EMAIL PROTECTED]> wrote: > Any one knows of a content summarization library. I > need to display a > summarized version of the document, not snippets of > text like the > highlighter, but actually a s

Re: Lucene Search result (scoring )

2007-06-18 Thread Chris Hostetter
: I had tried with Explanation but didn't get the desired results. Can you : give me the brief demo code based on the result order by the no of matching : terms . The Explanation class will not change your scores to give you results in any particular way you might want -- it just explains what fa
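The ordering asked for here (results ranked by the number of matching query terms) can be illustrated outside Lucene. The following is only a sketch in plain Java, not Lucene code and not the approach described in the reply; the class and method names (`MatchCountOrderSketch`, `matchCount`, `rank`) are made up for illustration, and whitespace splitting stands in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class MatchCountOrderSketch {
    // Hypothetical stand-in for an analyzer: lowercase, split on whitespace.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        result.remove("");
        return result;
    }

    // Number of distinct query terms that also occur in the document.
    static int matchCount(String document, String query) {
        Set<String> shared = terms(query);
        shared.retainAll(terms(document));
        return shared.size();
    }

    // Ids of documents with at least one matching term, most matches first.
    // List.sort is stable, so ties keep the iteration order of the map.
    static List<String> rank(Map<String, String> docsById, String query) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : docsById.entrySet()) {
            if (matchCount(entry.getValue(), query) > 0) {
                ids.add(entry.getKey());
            }
        }
        ids.sort((a, b) -> Integer.compare(
                matchCount(docsById.get(b), query),
                matchCount(docsById.get(a), query)));
        return ids;
    }
}
```

With documents "lucene", "apache lucene search" and "solr" and the query "lucene search engine", the second document ranks first (two matching terms), the first comes next (one), and "solr" is dropped.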

Re: Phrase Search

2007-06-18 Thread Chris Hostetter
: Another good old trick is to index field values (tokenized) with : appended special starting and ending tokens, e.g. instead of "Hiran : Magri" use "_start_ Hiran Magri _end_". Then you can query for fields : that are exactly equal to a phrase, while still retaining the : possibility to search b
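The boundary-token trick quoted above can be sketched in plain Java. This is an illustration under assumptions, not Lucene code: whitespace splitting stands in for the analyzer, a sub-list search stands in for a phrase query, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class BoundaryTokenSketch {
    static final String START = "_start_";
    static final String END = "_end_";

    // Whitespace tokenizer standing in for a real analyzer.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Tokens as they would be indexed: the field value wrapped in markers.
    static List<String> indexedTokens(String fieldValue) {
        List<String> tokens = new ArrayList<>();
        tokens.add(START);
        tokens.addAll(tokenize(fieldValue));
        tokens.add(END);
        return tokens;
    }

    // Phrase match: the query tokens appear consecutively in the field.
    static boolean phraseMatch(String fieldValue, String phrase) {
        return Collections.indexOfSubList(
                indexedTokens(fieldValue), tokenize(phrase)) >= 0;
    }

    // Exact match: the same phrase search with the markers added to the query.
    static boolean exactMatch(String fieldValue, String phrase) {
        return Collections.indexOfSubList(
                indexedTokens(fieldValue), indexedTokens(phrase)) >= 0;
    }
}
```

Here `exactMatch("Hiran Magri", "Hiran Magri")` holds, `exactMatch("Hiran Magri Sec 10", "Hiran Magri")` does not, and `phraseMatch("Hiran Magri Sec 10", "Hiran Magri")` still does, which is the behaviour the trick is meant to give.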

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-18 Thread Chris Hostetter
: > for the "recentness" aspect a : > ValueSourceQuery composed on a ReverseOrdFieldSource should take : I have a problem with this solution : Document ordering is different : from Recentness : : If i upload 1000 images now, they should have the same "recentness", : even if their order is very di

Re: Lucene for chinese search

2007-06-18 Thread karl wettin
Don't they differ in tokenization? One of them uses grams, the other does not. Or? That would be another thing that might mess it up. But then I never looked at the highlighter, so I can only guess. -- karl On 18 Jun 2007 at 22:37, Chris Lu wrote: Hi, Karl, Thanks for sharing this experience

Re: Lucene for chinese search

2007-06-18 Thread Chris Lu
Hi, Karl, Thanks for sharing this experience. I did find CJKAnalyzer somehow behaves differently than ChineseAnalyzer. When trying to highlight the matched term, ChineseAnalyzer didn't work somehow. But I didn't investigate into it. This is a useful clue for it. -- Chris Lu ---

Re: Lucene for chinese search

2007-06-18 Thread karl wettin
A year or two ago I hacked Lucene to use UTF-16 instead of UTF-8, as CJK characters are represented by 3 bytes in UTF-8 and 2 bytes in UTF-16. It is a simple hack. It did not, however, save me that much, as I had a mixed Latin and CJK corpus, and I reverted. Still think it is something worth c
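The byte counts behind this trade-off are easy to check with the standard Java charsets. The sketch below only measures encoded sizes; it says nothing about how Lucene stores terms internally, and the class name and sample strings are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class EncodingSizeSketch {
    // Bytes needed to encode the text as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed as UTF-16. The big-endian variant is used so that
    // no byte-order mark is added to the count.
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String cjk = "\u4e2d\u6587\u641c\u7d22"; // four CJK characters
        String latin = "search";                 // six Latin characters
        String mixed = cjk + " " + latin;
        for (String sample : new String[] {cjk, latin, mixed}) {
            System.out.println(utf8Bytes(sample) + " bytes as UTF-8, "
                    + utf16Bytes(sample) + " bytes as UTF-16");
        }
    }
}
```

The four CJK characters take 12 bytes in UTF-8 and 8 in UTF-16, while "search" takes 6 and 12, so on a mixed corpus the saving on one side is largely cancelled by the cost on the other.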

Re: Content Summarization

2007-06-18 Thread Mathieu Lecarme
It's not so far from Lucene! http://en.wikipedia.org/wiki/Sentence_extraction Have a look at WordNet (http://wordnet.princeton.edu/). Get some lists of articles, verbs, nouns, and affix rules (like aspell, myspell ...). You will use more cooking rules than code. M. On 18 June 07 at 20:29, Mordo,

Content Summarization

2007-06-18 Thread Mordo, Aviran (EXP N-NANNATEK)
Does anyone know of a content summarization library? I need to display a summarized version of the document: not snippets of text like the highlighter, but an actual summary of the document. Thanks, Aviran

Re: FW: Lucene indexing vs RDBMS insertion.

2007-06-18 Thread Chris Lu
Definitely very aggressive. In my current experience, DBSight, together with database access, can do 3 million in 2 hours on a Pentium D 3.4GHz. It seems you definitely need some good hardware and a fast hard drive for this. I feel the hard drive is actually the bottleneck for large indexes.
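To compare the figures in this thread it helps to put them on the same scale. A small arithmetic sketch, with a hypothetical class name, using only the numbers quoted here (3 million documents in 2 hours) and the rate of around 1000 records per second mentioned elsewhere in the thread:

```java
final class ThroughputSketch {
    // Average documents per second for a batch that took the given hours.
    static double docsPerSecond(long documents, double hours) {
        return documents / (hours * 3600.0);
    }

    // Documents per hour at a sustained per-second rate.
    static double docsPerHour(double docsPerSecond) {
        return docsPerSecond * 3600.0;
    }

    public static void main(String[] args) {
        // 3 million documents in 2 hours.
        System.out.printf("%.1f docs/sec%n", docsPerSecond(3_000_000L, 2.0));
        // 1000 records per second, sustained for an hour.
        System.out.printf("%.0f docs/hour%n", docsPerHour(1000.0));
    }
}
```

3 million in 2 hours works out to roughly 417 documents per second, so a sustained 1000 per second (3.6 million per hour) is more than twice that rate.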

Re: Lucene for chinese search

2007-06-18 Thread Chris Lu
Basically, wherever you see an encoding setting, it should be UTF-8. The servlet also has an encoding setting. For your case, change the Tomcat setting. When rendering the JSP page, the encoding also matters. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application s
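Why a single wrong encoding setting breaks Chinese search can be shown with the standard Java charsets alone. The sketch below is not servlet code; it only assumes that the query arrives as UTF-8 bytes and that some layer decodes them as ISO-8859-1, and the class and method names are made up. In a real webapp the usual remedies are settings such as `request.setCharacterEncoding("UTF-8")` in the servlet and `URIEncoding="UTF-8"` on the Tomcat connector, which should be checked against the versions in use.

```java
import java.nio.charset.StandardCharsets;

final class RequestEncodingSketch {
    // The bytes a browser would send for the text when the page is UTF-8.
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // What a container produces when it decodes those bytes with the
    // wrong charset: one garbage character per byte.
    static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Recovery: ISO-8859-1 maps every byte to one character and back,
    // so re-encoding restores the original bytes for a UTF-8 decode.
    static String repair(String garbled) {
        return new String(
                garbled.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = "\u4e2d\u6587"; // two Chinese characters
        String garbled = decodeAsLatin1(utf8(query));
        System.out.println(garbled.length() + " chars after wrong decode");
        System.out.println(repair(garbled).equals(query)); // true
    }
}
```

A garbled query string like this will not match any term in an index that was built from correctly decoded text, which would be consistent with getting no results for Chinese searches.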

Re: Lucene Search result (scoring )

2007-06-18 Thread Yatin Soni
Hi Hoss, I tried Explanation but didn't get the desired results. Can you give me brief demo code that orders the results by the number of matching terms? Thanks, Yatin - Original Message - From: "Chris Hostetter" <[EMAIL PROTECTED]> To: Sent: 16, 06, 2007 8:47 AM Subject: Re

Re: negative queries

2007-06-18 Thread Steven Rowe
Hi Daniel, Daniel Noll wrote: > On Saturday 16 June 2007 11:39:35 Chris Hostetter wrote: >> : The mailing list has already answered this question dozens of times. >> : I've been wondering lately, does this list have a FAQ? If so, is this >> : question on it? >> >> The wiki is open to editing by

Re: Phrase Search

2007-06-18 Thread Andrzej Bialecki
Erick Erickson wrote: Phrase queries won't help you here Your particular issue can be addressed, but I'm not sure it's a reasonable long-term solution If you indexed your address field as UN_TOKENIZED, and did NOT tokenize your query, it should give you what you want. What's happening i

Re: Phrase Search

2007-06-18 Thread Erick Erickson
Phrase queries won't help you here Your particular issue can be addressed, but I'm not sure it's a reasonable long-term solution If you indexed your address field as UN_TOKENIZED, and did NOT tokenize your query, it should give you what you want. What's happening is that StandardAnalyzer
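The mismatch described here can be sketched in plain Java. This is a simplified model, not Lucene code: an un-tokenized field is modelled as a single term holding the whole value, and lowercasing plus whitespace splitting stands in for what an analyzer does to query text. The class and method names are hypothetical. In Lucene itself, a query that bypasses analysis is typically built directly as a `TermQuery` on the exact field value rather than going through the query parser.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class UntokenizedFieldSketch {
    // Un-tokenized indexing: the whole field value becomes a single term.
    static Set<String> indexUntokenized(String value) {
        return Collections.singleton(value);
    }

    // Rough stand-in for an analyzer: lowercase and split on whitespace.
    static Set<String> analyze(String text) {
        return new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // A query matches only if one of its terms is in the index as-is.
    static boolean matches(Set<String> indexedTerms, Set<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> index = indexUntokenized("Hiran Magri");
        // Analyzed query: terms "hiran" and "magri", neither is in the index.
        System.out.println(matches(index, analyze("Hiran Magri")));
        // Un-analyzed query: the single term "Hiran Magri" is in the index.
        System.out.println(matches(index, indexUntokenized("Hiran Magri")));
    }
}
```

The first lookup fails and the second succeeds, which mirrors why searching an un-tokenized field with an analyzed query returns nothing.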

Phrase Search

2007-06-18 Thread Laxmilal Menaria
Hello everyone, I am a Lucene user and have tried to implement a phrase query, but I am now running into some logical problems in searching. My index has 4 fields: Name, Address & City and 6 docs, i.e. 1. "Laxmilal Menaria", "Hiran Magri", "Udaipur", 2. "Mohan Sharma", "Hiran Magri Sec 10", "Udaipur"

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-18 Thread Erick Erickson
Good point. You could also think about just storing the date with the appropriate resolution (e.g. day or something like that). Erick On 6/18/07, Antoine Baudoux <[EMAIL PROTECTED]> wrote: > > : Thats what i discovered. The question is : Is the ValueSourceQuery > : strong and fast enough
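Storing the date at day resolution means every image uploaded on the same day gets the same "recentness" value. A minimal sketch using `java.time`, assuming UTC and a `yyyyMMdd` key; the class and method names are made up, and Lucene's own `DateTools` class offers a comparable truncation that may be the better fit inside an index.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class DayResolutionSketch {
    // Day-level formatter; UTC is an assumption made for this sketch.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Truncate an upload timestamp to a day string. The strings sort
    // lexicographically in date order.
    static String dayKey(Instant uploadTime) {
        return DAY.format(uploadTime);
    }

    public static void main(String[] args) {
        System.out.println(dayKey(Instant.parse("2007-06-18T08:15:00Z")));
        System.out.println(dayKey(Instant.parse("2007-06-18T23:59:59Z")));
        System.out.println(dayKey(Instant.parse("2007-06-19T00:00:00Z")));
    }
}
```

The first two timestamps share one key and the third gets the next one, so a batch of uploads made together is treated as equally recent.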

RE: Lucene for chinese search

2007-06-18 Thread Lee Li Bin
Hi, For indexing, there is no problem: there is Chinese text matching my datasource (XML) in the index file when I open it in Notepad. I tried using UTF-8 in the JSP and getting the byte array as 'utf-8', ISO8859_1 or Cp1252 in the Java servlet, but we are still getting a search problem, the search result doe

Re: FW: Lucene indexing vs RDBMS insertion.

2007-06-18 Thread Erick Erickson
I'll certainly be interested to see whether you can hit that number; it's pretty aggressive. That said, you can also consider indexing in parallel and combining the results. That is, you can have N machines running on N subsets of the data. At the end, you can combine those indexes with IndexW
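The split-then-combine idea can be sketched without Lucene. The code below is a toy model under assumptions: worker threads stand in for the N machines, a map from term to document ids stands in for an index, and all names are hypothetical. It does not show the Lucene call used for the final merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelIndexSketch {
    // Build a tiny inverted index (term -> document ids) for one subset.
    static Map<String, TreeSet<Integer>> buildPartial(Map<Integer, String> docs) {
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String term : doc.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    // Combine partial indexes by unioning the postings of each term.
    static Map<String, TreeSet<Integer>> merge(List<Map<String, TreeSet<Integer>>> partials) {
        Map<String, TreeSet<Integer>> merged = new TreeMap<>();
        for (Map<String, TreeSet<Integer>> partial : partials) {
            for (Map.Entry<String, TreeSet<Integer>> e : partial.entrySet()) {
                merged.computeIfAbsent(e.getKey(), t -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    // Index each subset on its own worker thread, then merge the results.
    static Map<String, TreeSet<Integer>> indexInParallel(List<Map<Integer, String>> subsets) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subsets.size()));
        try {
            List<Future<Map<String, TreeSet<Integer>>>> futures = new ArrayList<>();
            for (Map<Integer, String> subset : subsets) {
                futures.add(pool.submit(() -> buildPartial(subset)));
            }
            List<Map<String, TreeSet<Integer>>> partials = new ArrayList<>();
            for (Future<Map<String, TreeSet<Integer>>> future : futures) {
                partials.add(future.get());
            }
            return merge(partials);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel indexing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

On a single machine the same split only pays off while CPU is the limit; once the disk is the bottleneck, extra workers writing to the same drive may not help much.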

Re: Lucene Query

2007-06-18 Thread Erick Erickson
The problem with your code snippets is that they aren't plain Lucene API calls. I'm assuming that you've got your own classes that actually compile. There's nothing I can say about "what's going on" without knowing what your custom classes are doing. We need to know what analyzers you are

Re: Lucene for chinese search

2007-06-18 Thread Mathieu Lecarme
Lee Li Bin wrote: > Hi, > > I still met problem for searching of Chinese words. > XMl file which is the datasource and analyzer has already been encoded. > Have testing on StandardAnalyzer, CJKAnalyzer, and ChineseAnalyzer, but it > still can't get any results. > > 1.do we need any encoding

RE: Lucene for chinese search

2007-06-18 Thread Lee Li Bin
Hi, I still have a problem searching for Chinese words. The XML file which is the datasource and analyzer has already been encoded. I have tested with StandardAnalyzer, CJKAnalyzer, and ChineseAnalyzer, but still can't get any results. 1. Do we need any encoding configuration in Apache Tomcat fo

Re: Using Lucene to search Multiple Databases

2007-06-18 Thread rajat mahajan
Hi, At present I am using Nutch. I'll try Solr once this is done and will get back to you. Anyway, thanks a lot. Bye, Rajat Mahajan

RE: Using Lucene to search Multiple Databases

2007-06-18 Thread Ard Schrijvers
A search server based on lucene which is very easy to use and implement. I think you can use it to achieve what you want, Regards > > @Ard Schrijvers > > > What is this Solr > i didnt get you. will you please explain it.?? > ---

Re: Using Lucene to search Multiple Databases

2007-06-18 Thread rajat mahajan
@Ard Schrijvers What is this Solr? I didn't get you. Will you please explain it?

RE: Using Lucene to search Multiple Databases

2007-06-18 Thread Ard Schrijvers
Hello Rajat, this sounds to me like something very suitable for Solr, Regards Ard > > > Rajat, > > I don't know about the Web Interface you are mentioning but > the task can be > done with a little bit coding from your side. > > I would suggest indexing each database in its own index which

Re: Several questions about scoring/sorting + random sorting in an image/related application

2007-06-18 Thread Antoine Baudoux
: Thats what i discovered. The question is : Is the ValueSourceQuery : strong and fast enough to be : used confidently in a production environment? I looked at the source as i mentioned, i'm not intimately familiar with the new ValueSourceQuery, but the FunctionQuery it's based on is certain

RE: FW: Lucene indexing vs RDBMS insertion.

2007-06-18 Thread Chew Yee Chuang
Thanks for your suggestion Erick. I'm planning to test the indexing soon. For your information, the system currently inserts into the RDBMS at around 1000 records per second. Thus, with Lucene in place, I would expect it to index that many documents per second as well (Our target is 3.6

Lucene Query

2007-06-18 Thread Lee Li Bin
Hi, With the following query, I am getting only the file path results. I have a field named 'text' in the index. May I know how I can display the text file data? Is the problem with the indexing or with the query string? Creating Index: Document doc5 = new Document(); doc5.add(Field.UnIndexed("path