Hi Hossman,
Thanks for your reply. When I index the search fields in my
Lucene document, the index occupies 20% of the original size. How can I
reduce the index size?
hossman_lucene wrote:
>
>
> : I need to store all the attributes of the document i index as part of
> the
> : inde
OK, thanks. I have tried indexing the address field as UN_TOKENIZED and searching
with the above query, but it returns nothing. How can I specify "NOT tokenized"
in the query?
--Thanks,
On 6/18/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Phrase queries won't help you here
Your particular issue can be
Thanks for sharing, and for the suggestion.
Yes Chris, the index is to be partitioned by date/time, and the old indexes will
not be accessed so frequently.
I also considered indexing in parallel to different indexes, as Erick suggested.
But I can only put all the indexes on ONE machine, and there is only ONE machine to
pro
: > And (most) spammers, which is really the point of requiring a
: > profile.
:
: I believe this is called "throwing the baby out with the bath water."
you obviously haven't seen the amount of spam that the Apache wikis used to
get ... The account creation form currently asks you a simple qu
On Tuesday 19 June 2007 11:03:25 Erik Hatcher wrote:
> > Good way to discourage potential contributors I suppose.
>
> And (most) spammers, which is really the point of requiring a
> profile.
I believe this is called "throwing the baby out with the bath water."
Daniel
--
Daniel Noll
Nuix Pt
On Jun 18, 2007, at 8:59 PM, Daniel Noll wrote:
On Tuesday 19 June 2007 00:24:39 Steven Rowe wrote:
In order to edit wiki pages, you must create a profile and be logged in.
Click on the "Login" link in the upper right hand of the front page, to
the left of the Search box.
Fill out the f
On Tuesday 19 June 2007 00:24:39 Steven Rowe wrote:
> In order to edit wiki pages, you must create a profile and be logged in.
>
> Click on the "Login" link in the upper right hand of the front page, to
> the left of the Search box.
>
> Fill out the form that comes up, and click on the "Create Prof
Take a look at LingPipe
(http://alias-i.com/lingpipe/).
--- "Mordo, Aviran (EXP N-NANNATEK)"
<[EMAIL PROTECTED]> wrote:
> Does anyone know of a content summarization library? I need to display a
> summarized version of the document, not snippets of text like the
> highlighter, but actually a s
: I had tried with Explanation but didn't get the desired results. Can you
: give me brief demo code for ordering results by the number of matching
: terms?
the Explanation class will not change your scores to give you results in
any particular way you might want -- it just explains what fa
: Another good old trick is to index field values (tokenized) with
: appended special starting and ending tokens, e.g. instead of "Hiran
: Magri" use "_start_ Hiran Magri _end_". Then you can query for fields
: that are exactly equal to a phrase, while still retaining the
: possibility to search b
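The boundary-token trick above can be sketched roughly as follows, against the Lucene API of the time (Lucene 2.x). The field name and values are illustrative, and the sketch assumes an analyzer that preserves the `_start_`/`_end_` sentinels intact, such as WhitespaceAnalyzer; StandardAnalyzer may not keep them:

```java
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.store.RAMDirectory;

public class BoundaryTokenSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();

        // Index time: wrap the field value in sentinel tokens.
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("name", "_start_ Hiran Magri _end_",
                          Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        // Query time: a phrase that includes both sentinels can only match
        // a field whose entire value is exactly the phrase between them,
        // while ordinary phrase/term queries on the same field still work.
        PhraseQuery exact = new PhraseQuery();
        exact.add(new Term("name", "_start_"));
        exact.add(new Term("name", "Hiran"));
        exact.add(new Term("name", "Magri"));
        exact.add(new Term("name", "_end_"));

        IndexSearcher searcher = new IndexSearcher(dir);
        Hits hits = searcher.search(exact);
        System.out.println(hits.length());
    }
}
```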
: > for the "recentness" aspect a
: > ValueSourceQuery composed on a ReverseOrdFieldSource should take
: I have a problem with this solution : Document ordering is different
: from Recentness :
: If i upload 1000 images now, they should have the same "recentness",
: even if their order is very di
Don't they differ in tokenization? One of them uses grams, the other
does not. Or? That would be another thing that might mess it up. But
then I never looked at the highlighter, so I can only guess.
--
karl
On 18 Jun 2007, at 22:37, Chris Lu wrote:
Hi, Karl,
Thanks for sharing this experience
Hi, Karl,
Thanks for sharing this experience.
I did find that CJKAnalyzer somehow behaves differently from
ChineseAnalyzer. When trying to highlight the matched term,
ChineseAnalyzer didn't work, but I didn't investigate why.
This is a useful clue for it.
--
Chris Lu
---
A year or two ago I hacked Lucene to use UTF-16 instead of UTF-8, as CJK
characters are represented by 3 bytes in UTF-8 but only 2 bytes in
UTF-16. It is a simple hack.
It did not, however, save me that much, as I had a mixed Latin and CJK
corpus, and I reverted. Still think it is something worth
c
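The byte counts Karl mentions can be checked with plain JDK calls; this small sketch (U+6F22 chosen as an arbitrary CJK character) shows why a mixed Latin/CJK corpus sees little benefit, since what UTF-16 saves on CJK it loses on Latin:

```java
import java.nio.charset.Charset;

public class CjkBytes {
    public static void main(String[] args) {
        String cjk = "\u6F22";  // a single CJK character
        // Most CJK code points need 3 bytes in UTF-8, only 2 in UTF-16.
        System.out.println(cjk.getBytes(Charset.forName("UTF-8")).length);    // 3
        System.out.println(cjk.getBytes(Charset.forName("UTF-16BE")).length); // 2
        // ...but Latin characters grow from 1 byte (UTF-8) to 2 (UTF-16).
        System.out.println("a".getBytes(Charset.forName("UTF-8")).length);    // 1
        System.out.println("a".getBytes(Charset.forName("UTF-16BE")).length); // 2
    }
}
```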
It's not so far from Lucene!
http://en.wikipedia.org/wiki/Sentence_extraction
have a look at wordnet (http://wordnet.princeton.edu/).
Get some lists of articles, verbs, nouns, and affix rules (like aspell,
myspell, ...).
You will use more cooking rules than code.
M.
On 18 Jun 07, at 20:29, Mordo,
Does anyone know of a content summarization library? I need to display a
summarized version of the document, not snippets of text like the
highlighter, but an actual summary of the document.
Thanks
Aviran
Definitely very aggressive.
Currently my experience is that, together with database access,
DBSight can do 3 million in 2 hours with a Pentium D 3.4GHz. It seems you
definitely need some good hardware, and a fast hard drive for this. I
feel the hard drive is actually the bottleneck for large indexes.
Basically, wherever you see an encoding setting, it should be UTF-8.
The servlet also has an encoding setting; for your case, change the
Tomcat setting.
When rendering the JSP page, the encoding also matters.
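Concretely, the three places Chris lists could look like this (a sketch; the port and paths are the common defaults, so verify against your Tomcat version). For GET query parameters, set URIEncoding on the connector in conf/server.xml:

```xml
<!-- conf/server.xml: decode URI (GET) parameters as UTF-8 -->
<Connector port="8080" URIEncoding="UTF-8" />
```

For POST bodies, call `request.setCharacterEncoding("UTF-8")` in the servlet before reading any parameter; and for the JSP rendering side, declare `<%@ page contentType="text/html; charset=UTF-8" %>` at the top of the page.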
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
s
Hi Hoss,
I had tried with Explanation but didn't get the desired results. Can you
give me brief demo code for ordering results by the number of matching
terms?
Thanks,
Yatin
- Original Message -
From: "Chris Hostetter" <[EMAIL PROTECTED]>
To:
Sent: 16, 06, 2007 8:47 AM
Subject: Re
Hi Daniel,
Daniel Noll wrote:
> On Saturday 16 June 2007 11:39:35 Chris Hostetter wrote:
>> : The mailing list has already answered this question dozens of times.
>> : I've been wondering lately, does this list have a FAQ? If so, is this
>> : question on it?
>>
>> The wiki is open to editing by
Erick Erickson wrote:
Phrase queries won't help you here
Your particular issue can be addressed, but I'm not sure it's a
reasonable long-term solution
If you indexed your address field as UN_TOKENIZED, and
did NOT tokenize your query, it should give you what you want.
What's happening i
Phrase queries won't help you here
Your particular issue can be addressed, but I'm not sure it's a
reasonable long-term solution
If you indexed your address field as UN_TOKENIZED, and
did NOT tokenize your query, it should give you what you want.
What's happening is that StandardAnalyzer
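A minimal sketch of Erick's suggestion, against the Lucene 2.x API of the time (the field name and value are taken from the thread's example; treat this as illustrative). The key point is that the query is built by hand as a TermQuery, so no analyzer touches it and the term must match the indexed value exactly, including case:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class ExactAddressSearch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

        Document doc = new Document();
        // UN_TOKENIZED: the whole value is indexed as one term, verbatim.
        doc.add(new Field("address", "Hiran Magri",
                          Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        // TermQuery bypasses analysis entirely -- unlike QueryParser,
        // which would lowercase and split "Hiran Magri" and then miss.
        IndexSearcher searcher = new IndexSearcher(dir);
        Hits hits = searcher.search(new TermQuery(new Term("address", "Hiran Magri")));
        System.out.println(hits.length());
    }
}
```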
Hello everyone,
I am a Lucene user and tried to implement a phrase query, but am now getting some
logical problems in searching.
My index has three fields (Name, Address, and City) and 6 docs,
i.e. 1. "Laxmilal Menaria", "Hiran Magri", "Udaipur",
2. "Mohan Sharma", "Hiran Magri Sec 10", "Udaipur"
Good point.
You could also think about just storing the date with the appropriate
resolution (e.g. day or something like that).
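Erick's day-resolution idea can be sketched with plain JDK date formatting (field name and pattern are illustrative; Lucene's own DateTools offers the same thing via Resolution.DAY). Every document added on the same day gets the identical term, so they are treated as equally "recent", and lexicographic term order still equals chronological order:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class DayResolution {
    public static void main(String[] args) {
        // Truncate the timestamp to day resolution before indexing it.
        SimpleDateFormat day = new SimpleDateFormat("yyyyMMdd");
        String term = day.format(new Date());
        System.out.println(term); // e.g. "20070619" -- same for all docs that day

        // String comparison on these terms matches date order.
        System.out.println("20070618".compareTo("20070619") < 0); // true
    }
}
```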
Erick
On 6/18/07, Antoine Baudoux <[EMAIL PROTECTED]> wrote:
>
> : Thats what i discovered. The question is : Is the ValueSourceQuery
> : strong and fast enough
Hi,
For indexing there is no problem: there is Chinese text similar to my
datasource (XML) in the index file when opened in Notepad.
When I try to use UTF-8 in the JSP, and a getBytes array of 'UTF-8',
ISO8859_1, or Cp1252 in the Java servlet, we get a search problem; the
search result doe
I'll certainly be interested to see whether you can hit that number; it's
pretty aggressive.
That said, you can also consider indexing in parallel and combining the
results. That is, you can have N machines running on N subsets of
the data. At the end, you can combine those indexes with
IndexW
The problem with your code snippets is that they aren't plain Lucene
API calls. I'm assuming that you've got your own classes that
actually compile. There's nothing I can say about "what's
going on" without knowing what your custom classes are doing.
We need to know what analyzers you are
Lee Li Bin wrote:
> Hi,
>
> I still have a problem with searching for Chinese words.
> The XML file which is the datasource, and the analyzer, have already been encoded.
> I have tested StandardAnalyzer, CJKAnalyzer, and ChineseAnalyzer, but I
> still can't get any results.
>
> 1. Do we need any encoding
Hi,
I still have a problem with searching for Chinese words.
The XML file which is the datasource, and the analyzer, have already been encoded.
I have tested StandardAnalyzer, CJKAnalyzer, and ChineseAnalyzer, but I
still can't get any results.
1. Do we need any encoding configuration in Apache Tomcat fo
Hi,
At present I am using Nutch. I'll try Solr once this is done
and will get back to you.
Anyway, thanks a lot.
Bye,
Rajat Mahajan
Solr is a search server based on Lucene which is very easy to use and implement. I
think you can use it to achieve what you want.
Regards
>
> @Ard Schrijvers
>
>
> What is this Solr?
> I didn't get you. Will you please explain it?
>
---
@Ard Schrijvers
What is this Solr?
I didn't get you. Will you please explain it?
Hello Rajat,
this sounds to me like something very suitable for Solr,
Regards Ard
>
>
> Rajat,
>
> I don't know about the Web Interface you are mentioning but
> the task can be
> done with a little bit coding from your side.
>
> I would suggest indexing each database in its own index which
: Thats what i discovered. The question is : Is the ValueSourceQuery
: strong and fast enough to be
: used confidently in a production environment? I looked at the source
as i mentioned, i'm not intimately familiar with the new
ValueSourceQuery,
but the FunctionQuery it's based on is certain
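Putting the two class names from this thread together, the combination might be sketched like this. This is untested and hypothetical: the org.apache.lucene.search.function package was very new at the time, `userQuery` and the "uploaded" field are placeholders, and you should check the exact signatures against your Lucene version:

```java
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.function.ReverseOrdFieldSource;
import org.apache.lucene.search.function.ValueSourceQuery;

// Recency as a score component: ReverseOrdFieldSource scores documents
// higher the later their "uploaded" term sorts (newer dates, assuming a
// lexicographically ordered date field such as yyyyMMdd).
Query recency = new ValueSourceQuery(new ReverseOrdFieldSource("uploaded"));

// Combine with the user's query: relevance is required, recency only
// contributes to the score of documents that already match.
BooleanQuery combined = new BooleanQuery();
combined.add(userQuery, BooleanClause.Occur.MUST);
combined.add(recency, BooleanClause.Occur.SHOULD);
```

Note the caveat raised earlier in the thread: an ordinal-based source reflects index order, not true recency, so documents added in one batch will not share identical values.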
Thanks for your suggestion Erick. I'm planning to test the indexing soon.
For your information, currently the system is inserting into an RDBMS at
around 1000 records per second. Thus, with Lucene in place, I would expect it
to index that many documents per second as well (our target is 3.6
Hi,
With the following query, I am getting only the file path in the results. I have a field
named 'text' in the index. How do I display the text file data?
Is this a problem with the indexing or with the query string?
Creating Index:
Document doc5 = new Document();
doc5.add(Field.UnIndexed("path