Re: Can Lucene be used as Rules Engine?

2020-01-23 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
> Now, I have another requirement which is reverse of above requirement.

I might be wrong, but that smells like Luwak (available in 8.2.0).

Check out:

https://issues.apache.org/jira/browse/LUCENE-8766

It should allow you to index the rules and then use the document as a query to
retrieve the rules that match your document.

If you want a rule to match the document only if at least one of its conditions
is matched, you can just combine the conditions of the rule with OR, and it
might work.
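
To make the "reverse" matching concrete, here is a minimal plain-Java sketch of the semantics from the quoted question: each rule is stored once, and an incoming document's terms are checked against every stored rule. This is only an illustration. The class and method names (ReverseRuleMatch, Rule, matchingRules, sampleRules) are hypothetical and are not Lucene or Luwak API, and a real stored-query engine would avoid the linear scan over all rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; nothing here is Lucene or Luwak API. */
class ReverseRuleMatch {

    /** A rule matches when every WITH condition has at least one term in the
     *  document and no NOT WITH term occurs in the document. */
    static final class Rule {
        final String name;
        final List<Set<String>> with;
        final Set<String> notWith;

        Rule(String name, List<Set<String>> with, Set<String> notWith) {
            this.name = name;
            this.with = with;
            this.notWith = notWith;
        }

        boolean matches(Set<String> docTerms) {
            for (String banned : notWith) {
                if (docTerms.contains(banned)) {
                    return false;
                }
            }
            for (Set<String> condition : with) {
                // True when the condition and the document share no term.
                if (Collections.disjoint(condition, docTerms)) {
                    return false;
                }
            }
            return true;
        }
    }

    static Set<String> terms(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    /** The two example rules from the question. */
    static List<Rule> sampleRules() {
        Rule rule1 = new Rule("Rule 1",
                Arrays.asList(terms("0001AAA", "0001AAC"), terms("PSBP06", "PSBP07"), terms("MFBP05")),
                terms());
        Rule rule2 = new Rule("Rule 2",
                Arrays.asList(terms("0001AAN", "0001AAC"), terms("PSBP06"), terms("PSBP08")),
                terms("MFBP05"));
        return Arrays.asList(rule1, rule2);
    }

    /** Returns the names of all rules that match the given document terms. */
    static List<String> matchingRules(List<Rule> rules, Set<String> docTerms) {
        List<String> names = new ArrayList<>();
        for (Rule rule : rules) {
            if (rule.matches(docTerms)) {
                names.add(rule.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> textFile1 = terms("0001AAA", "0001AAB", "0001AAC", "0061000", "PSBP06", "MFBP05");
        System.out.println(matchingRules(sampleRules(), textFile1)); // prints [Rule 1]
    }
}
```

With millions of rules, the point of a stored-query engine is to select only the candidate rules that share terms with the document before evaluating them, instead of looping over all of them as this sketch does.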

Please note that I'm not sure about this (I have never used Luwak :)); maybe
somebody on the list can comment more :)

cheers
Diego



- Original Message -
From: Mikhail Khludnev 
At: 23-Jan-2020 07:42:47


Hello, Kart.
I still don't fully get the problem. But implementing a rule engine usually
requires
https://lucene.apache.org/core/7_3_1/sandbox/org/apache/lucene/search/CoveringQuery.html
which checks the number of matching rule clauses against a count kept in a
dedicated field.
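
The counting idea can be sketched in plain Java as follows. This is not the CoveringQuery API, only an illustration of the principle under some assumptions: each rule is indexed with a required count (here, its number of WITH conditions), the document's terms are looked up in a term-to-condition map, and a rule matches when the number of distinct conditions hit reaches the required count. All names (CoveringCountSketch, addRule, match) are hypothetical, and NOT WITH conditions are deliberately left out of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration of the "required clause count" idea; not Lucene API. */
class CoveringCountSketch {
    // term -> postings of {ruleId, conditionIndex}
    private final Map<String, List<int[]>> postings = new HashMap<>();
    // per rule: how many conditions must be satisfied (the "dedicated field")
    private final List<Integer> requiredMatches = new ArrayList<>();
    private final List<String> ruleNames = new ArrayList<>();

    void addRule(String name, List<List<String>> conditions) {
        int ruleId = ruleNames.size();
        ruleNames.add(name);
        requiredMatches.add(conditions.size());
        for (int c = 0; c < conditions.size(); c++) {
            for (String term : conditions.get(c)) {
                postings.computeIfAbsent(term, k -> new ArrayList<>()).add(new int[] {ruleId, c});
            }
        }
    }

    List<String> match(Collection<String> docTerms) {
        // ruleId -> distinct condition indexes satisfied by the document
        Map<Integer, Set<Integer>> satisfied = new TreeMap<>();
        for (String term : new HashSet<>(docTerms)) {
            List<int[]> hits = postings.getOrDefault(term, Collections.<int[]>emptyList());
            for (int[] posting : hits) {
                satisfied.computeIfAbsent(posting[0], k -> new HashSet<>()).add(posting[1]);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> entry : satisfied.entrySet()) {
            if (entry.getValue().size() >= requiredMatches.get(entry.getKey())) {
                result.add(ruleNames.get(entry.getKey()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CoveringCountSketch index = new CoveringCountSketch();
        index.addRule("Rule 1", Arrays.asList(
                Arrays.asList("0001AAA", "0001AAC"),
                Arrays.asList("PSBP06", "PSBP07"),
                Arrays.asList("MFBP05")));
        index.addRule("Rule 2", Arrays.asList(
                Arrays.asList("0001AAN", "0001AAC"),
                Arrays.asList("PSBP06"),
                Arrays.asList("PSBP08")));
        List<String> textFile1 =
                Arrays.asList("0001AAA", "0001AAB", "0001AAC", "0061000", "PSBP06", "MFBP05");
        System.out.println(index.match(textFile1)); // prints [Rule 1]
    }
}
```

Counting distinct conditions rather than raw term hits matters here: text file 1 contains both 0001AAA and 0001AAC, and those two hits must count only once towards condition 1 of Rule 1.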

On Thu, Jan 23, 2020 at 12:12 AM Karthick Sundaram wrote:

> Gentlemen:
>
>
>
> I am using Lucene as a search engine for the requirement below:
>
>
>
> There are millions of documents (text files).
>
> Each text file has thousands of words (plain strings separated by spaces).
>
> Example content of text file 1 (showing just a few words): 0001AAA 0001AAB
> 0001AAC 0061000 PSBP06 MFBP05 ...
>
> Example content of text file 2 (showing just a few words): 0001AAX 0001AAB
> 0001AAN 0061002 PSBP07 MFBP06 ...
>
>
>
> Then there are millions of rules stored in the database. For illustration,
> here are a couple of rules:
>
>
>
> Rule 1:
>
> CONDITION 1: WITH: 0001AAA OR 0001AAC
>
> CONDITION 2: WITH: PSBP06 OR PSBP07
>
> CONDITION 3: WITH: MFBP05
>
>
>
> Rule 2:
>
> CONDITION 1: WITH: 0001AAN OR 0001AAC
>
> CONDITION 2: WITH: PSBP06
>
> CONDITION 3: WITH: PSBP08
>
> CONDITION 4: NOT WITH: MFBP05
>
>
>
> The requirement is: for a given rule, find the text files that match at
> least one word in each condition of the rule.
>
> I indexed the contents of each text file as a Lucene document with a field
> "FileContents" and another field that just stores the file name.
>
> So, for Rule 1, I constructed the query as (0001AAA OR 0001AAC) AND (PSBP06
> OR PSBP07) AND (MFBP05)
>
> And for Rule 2, the query is (0001AAN OR 0001AAC) AND (PSBP06) AND (PSBP08)
> AND NOT (MFBP05).
>
>
>
> The queries work and find the appropriate text files.
>
>
>
> Now, I have another requirement, which is the reverse of the above
> requirement.
>
> That is, for a given text file, I need to find the list of rules that
> match.
>
> Example: For text file 1, "Rule 1" should match, because text file 1 has
> 0001AAA, which satisfies condition 1; PSBP06, which satisfies condition 2;
> and MFBP05, which satisfies condition 3.
>
> Rule 1 has 3 conditions, and at least one word in each condition matches
> text file 1. So Rule 1 is good for text file 1.
>
> Rule 2 should not match text file 1 because PSBP08 is not in it.
>
>
>
> I don't know whether I can index the "Rule" information in Lucene. A rule
> can have one or more conditions, so I can't use a fixed number of fields to
> query on. Even with a fixed number of fields, the query would have to check
> that each field matches at least one word.
>
> Is it possible to handle this requirement using Lucene, or should I go for
> other options?
>
> I am new to Lucene; any help would be appreciated.
>
>
>
> Thanks,
>
> Kart
>
>

--
Sincerely yours
Mikhail Khludnev


RE: Can Lucene be used as Rules Engine?

2020-01-23 Thread Karthick Sundaram
Luwak (stored query engine, allowing users to efficiently match a stream of 
documents against a large set of queries) seems to be the right candidate for 
my requirement.

Thanks for pointing this out to me. I will dig into it further.

Thanks,
Kart




Lucene 8 early termination

2020-01-23 Thread Wei
Hi,

I am excited to see that Lucene 8 introduced Block-Max WAND as a major speed
improvement (https://issues.apache.org/jira/browse/LUCENE-8135). My question
is: how does it integrate with facet requests, given that numFound won't be
exact? I did some searching but haven't found any documentation on this. Any
pointers are greatly appreciated.

Best,
Wei


Re: Lucene 8 early termination

2020-01-23 Thread Uwe Schindler
Hi,

There is no support for this when calculating facets, because the counts can't
be optimized with WAND or block-max.

The general recommendation is to execute facets/aggregations in separate
Elasticsearch or Solr requests (e.g. using AJAX on your website). The search
results would then be displayed instantly, with the facets arriving later.
Doing that in the same request or separately does not really matter for
performance. So I'd always recommend doing it separately if your user interface
allows it.
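
As a rough sketch of the "separate requests" idea on the client side: the two requests are started independently, the hits are rendered as soon as they arrive, and the facets are filled in whenever they are ready. The names here (SeparateFacetRequest, searchThenFacets) are hypothetical, and the suppliers stand in for whatever calls your application makes to Solr or Elasticsearch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch: run the top-hits request and the facet request independently. */
class SeparateFacetRequest {

    static <H, F> CompletableFuture<Void> searchThenFacets(
            Supplier<H> topHitsRequest, Consumer<H> showHits,
            Supplier<F> facetRequest, Consumer<F> showFacets) {
        // Each request runs on its own; neither waits for the other.
        CompletableFuture<Void> hits =
                CompletableFuture.supplyAsync(topHitsRequest).thenAccept(showHits);
        CompletableFuture<Void> facets =
                CompletableFuture.supplyAsync(facetRequest).thenAccept(showFacets);
        // Completes once both parts have been rendered.
        return CompletableFuture.allOf(hits, facets);
    }

    public static void main(String[] args) {
        searchThenFacets(
                () -> "top 10 hits",                              // placeholder for the fast, early-terminating search
                h -> System.out.println("render results: " + h),
                () -> "facet counts",                             // placeholder for the slower, exact facet request
                f -> System.out.println("render facets: " + f))
            .join();
    }
}
```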

Uwe

On January 23, 2020 6:13:29 PM UTC, Wei wrote:
>Hi,
>
>I am excited to see Lucene 8 introduced BlockMax WAND as a major speed
>improvement https://issues.apache.org/jira/browse/LUCENE-8135.  My question
>is, how does it integrate with facet request,  when the numFound won't be
>exact? I did some search but haven't found any documentation on this. Any
>pointer is greatly appreciated.
>
>Best,
>Wei

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de