Multi-field search without MultiFieldQueryParser

2008-09-21 Thread Anshul jain
Hi!

I have a Lucene document structured like this:
Field: Text
name: George Bush
Sex: Male
Occupation: President of USA

Now I can have two types of queries:
Structured query:
name: George Bush AND Occupation: President

Unstructured Query:
George Bush AND President.

After parsing it will become: value: George Bush AND President.
"value" is some default field that has to be defined during parsing.

But as you can see, this unstructured query would not work because
of the structure of the Lucene document. What I want is that when a
user gives an unstructured query, Lucene should search in all
fields. (MultiFieldQueryParser is an option, but we have to define
all the fields first, and that can be expensive as the query can get
really big.)
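One common workaround is a catch-all field: at index time, concatenate every field's value into one extra field, and parse unstructured queries against that field alone, so no field list is needed at query time. A rough sketch against the Lucene 2.3-era Field API; the "contents" field name and the helper class are made up for illustration:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Hypothetical helper: index each named field normally, plus one
// catch-all "contents" field holding every value. Unstructured queries
// are then parsed with "contents" as the default field, while structured
// queries still target the named fields.
public class CatchAllDoc {
    public static Document build(String name, String sex, String occupation) {
        Document doc = new Document();
        doc.add(new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("sex", sex, Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("occupation", occupation, Field.Store.YES, Field.Index.TOKENIZED));
        // Not stored: this field only exists to be searched.
        doc.add(new Field("contents", name + " " + sex + " " + occupation,
                          Field.Store.NO, Field.Index.TOKENIZED));
        return doc;
    }
}
```

The cost is a slightly larger index, but query-time parsing stays cheap no matter how many fields exist.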

I would really appreciate if you can help me out with this.

Regards,
Anshul Jain

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: lucene Front-end match

2008-09-21 Thread 叶双明
Thanks Matthew Hall for two helpful responses!
I have used Luke, but I hadn't used this feature before. Thanks.

I want to parse something like: "a b*".

I think I have to use WildcardQuery and BooleanQuery.
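For what it's worth, combining the two could look roughly like this (a sketch against the Lucene 2.x search API; "field" and the class name are placeholders):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.WildcardQuery;

// Build the equivalent of the query string "a AND b*" by hand,
// bypassing QueryParser entirely.
public class PrefixAndTerm {
    public static BooleanQuery build(String field) {
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term(field, "a")), BooleanClause.Occur.MUST);
        query.add(new WildcardQuery(new Term(field, "b*")), BooleanClause.Occur.MUST);
        return query;
    }
}
```

Building queries programmatically like this also sidesteps the phrase-query wildcard issue discussed below, since the terms never go through the parser.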

2008/9/19 Matthew Hall <[EMAIL PROTECTED]>

> To be more specific (just in case you are new to Lucene):
>
> Your Query:
>
> Query query = qp.parse("bbb:\"b*\" AND ccc:\"cc*\"");
>
> What I think you actually want here:
>
> Query query = qp.parse("bbb:b* AND ccc:cc*");
>
> Give it a shot, and then like I said, go get Luke, it will help you
> tremendously ^^
>
> Matthew Hall wrote:
>
>> The reason the wildcard is being dropped is that you have wrapped it in
>> a phrase query.  Wildcards are not supported in phrase queries, at least
>> not in any analyzers that I'm aware of.
>>
>> A really good tool to see the transformations that happen to a query is
>> Luke, open it up against your index, go into the search section, choose the
>> analyzer you use and start playing around.
>>
>> This has helped me countless times when creating my own queries and
>> not getting the results that I expect.
>>
>> -Matt
>>
>> 叶双明 wrote:
>>
>>> I am sorry, I just put the string to QueryParser.
>>> But what confuses me is that the code:
>>>
>>> Query query = qp.parse("bbb:\"b*\" AND ccc:\"cc*\"");
>>>
>>> doesn't work as I expected. It drops the wildcard *.
>>>
>>>
>>> 2008/9/19, 叶双明 <[EMAIL PROTECTED]>:
>>>
>>>
 Thanks!

 Now, I just use Query query = qp.parse("a*"); and it meets my
 requirements.

 Another question: how to parse a query string like:   title:"The
 Right Way" AND text:go
 Please show me in Java code. Thanks.

 2008/9/19 Karl Wettin <[EMAIL PROTECTED]>



> On 19 Sep 2008, at 11.05, 叶双明 wrote:
>
>
>
>> How can I get the first Document by some query string like "a", "ab" or
>> "abc", but not "b" and "bc"?
>>
>>
>>
> You would create an ngram filter that creates grams from the first
> position
> only. Take a look at EdgeNGramTokenFilter in contrib/analyzers.
>
>
>karl
>
>
>
>
>
>
>
 --
 Sorry for my English!! 明
 Please help me correct my English expression and error in syntax



>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
> --
> Matthew Hall
> Software Engineer
> Mouse Genome Informatics
> [EMAIL PROTECTED]
> (207) 288-6012
>
>
>
>
>
>


-- 
Sorry for my English!! 明
Please help me correct my English expression and error in syntax
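Karl's EdgeNGramTokenFilter suggestion from earlier in the thread could be wired up roughly like this (a sketch against the contrib/analyzers API of that era; the class name and gram sizes are arbitrary, and the exact constructor may differ between Lucene versions):

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;

// Index-time analyzer that emits front-anchored grams ("a", "ab", "abc"
// for the input "abc"), so a plain TermQuery on "ab" matches documents
// whose field value starts with "ab", but not ones containing "b" or "bc".
public class FrontEdgeNGramAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new EdgeNGramTokenFilter(
            new WhitespaceTokenizer(reader),
            EdgeNGramTokenFilter.Side.FRONT,
            1, 20); // min/max gram sizes chosen arbitrarily here
    }
}
```

With this analyzer used only at index time, prefix matching becomes an ordinary term lookup, avoiding WildcardQuery's term-enumeration cost at search time.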


StandardTokenizer and Korean grouping with alphanum

2008-09-21 Thread Daniel Noll

Hi all.

I have a question about Korean tokenisation.  Currently there is a rule 
in StandardTokenizerImpl.jflex which looks like this:


ALPHANUM   = ({LETTER}|{DIGIT}|{KOREAN})+

I'm wondering if there was some good reason why it isn't:

ALPHANUM   = (({LETTER}|{DIGIT})+|{KOREAN}+)

Basically I'm seeing some tokens come back with mixed digits and Hangul, 
and I'm questioning the correctness of that.


Disclaimer: we're not currently performing any further processing of
Korean in subsequent filters, and I don't know the language either.
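To illustrate the difference between the two rules with plain java.util.regex (using ASCII letters/digits and the Hangul Syllables block as rough stand-ins for JFlex's {LETTER}, {DIGIT} and {KOREAN} classes, which are of course much broader):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KoreanTokenDemo {
    // Stand-in for the current rule: letters, digits and Hangul may mix in one token.
    public static final Pattern MIXED = Pattern.compile("([A-Za-z0-9]|[\\uAC00-\\uD7A3])+");
    // Stand-in for the proposed rule: a token is either letters/digits OR Hangul, not both.
    public static final Pattern SPLIT = Pattern.compile("[A-Za-z0-9]+|[\\uAC00-\\uD7A3]+");

    public static List<String> tokens(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        // "3차" is a digit directly followed by Hangul, with no separator.
        System.out.println(tokens(MIXED, "3차 test123")); // [3차, test123]
        System.out.println(tokens(SPLIT, "3차 test123")); // [3, 차, test123]
    }
}
```

So the current grammar emits the mixed token "3차", while the proposed alternative would split the digit run and the Hangul run into separate tokens.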


Daniel


--
Daniel Noll
Senior Developer, Nuix
http://nuix.com/
Forensic and eDiscovery software: the world's most advanced email data
analysis and eDiscovery software
