Re: mk_meta_rule_scores - does it work correctly?:)

Kris Deugau Wed, 15 Mar 2017 14:40:53 -0700

Marcin Mirosław wrote:

On 2017-03-14 16:23, Kris Deugau wrote:

If I read the information flow correctly, this is actually decided by
seek-phrases-in-log, which spits out subrules that reached a certain
hit rate in blocks, followed by the "# passed hit-rate threshold nnn"
line. mk_meta_rule_scores just takes that in, collects the rule names
in each block, and spits out the meta.



I ran some tests and watched the output to understand how some of the
parameters work. It seems that "--reqhitrate" works this way:
a) if --reqhitrate contains only one value, then the output of
seek-phrases-in-log contains only rules that hit more than the value passed
to --reqhitrate. So this cuts off rules that hit rarely.

b) if --reqhitrate contains more than one value then:
<high cut-off level, equal to the highest value passed to --reqhitrate> rules
<second value> other rules <low cut-off>
example:
--reqhitrate "70 10 1" gives:
<100%> - no rules here - <70%> - rules that match less than 70% of
spam - <10%> - rules that match less than 10% of spam and more than 1% -
<1%> - cut off, no rules here

You got me curious about exactly what this means, and we're both right, just describing it differently.

For this usage, seek-phrases-in-log does roughly the following (for multiple values to --reqhitrate):


for each pattern
  determine the percentage of spam it hits
  discard the pattern if it's less than the lowest threshold
sort patterns by hit percentage, highest first
for each pattern
  if the hit rate has passed the next threshold
    print flag line "# passed hit-rate threshold <threshold>"
    advance to next threshold
  print a line "# 1.0 <percentage> 0"
  print the subrule
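
The outline above can be sketched in Python. This is only an illustration of the flow as I described it, not seek-phrases-in-log's own code: the function name and the (subrule line, percentage) input format are made up for the example, and I'm assuming that "passed" means "dropped below" and that a single pattern can pass several thresholds at once.

```python
def flag_thresholds(patterns, thresholds):
    """Return intermediate-file lines for (subrule_line, hit_percentage) pairs.

    `thresholds` holds the --reqhitrate values, in any order.
    Hypothetical helper: name and input format are illustrative only.
    """
    cutoffs = sorted(thresholds, reverse=True)
    lowest = cutoffs[-1]
    # Discard anything that hits less than the lowest threshold.
    kept = [(rule, pct) for rule, pct in patterns if pct >= lowest]
    # Highest hit rate first.
    kept.sort(key=lambda item: item[1], reverse=True)

    lines = []
    next_cutoff = 0  # index of the next threshold still to be passed
    for rule, pct in kept:
        # Assumption: "passed" means "dropped below". Using a loop rather
        # than a single "if" lets one pattern pass several thresholds at
        # once, which gives back-to-back flag lines. The lowest threshold
        # is never flagged, since everything below it was discarded.
        while next_cutoff < len(cutoffs) - 1 and pct < cutoffs[next_cutoff]:
            lines.append(f"# passed hit-rate threshold: {cutoffs[next_cutoff]:g}")
            next_cutoff += 1
        lines.append(f"# 1.0 {pct:g} 0")
        lines.append(rule)
    return lines


if __name__ == "__main__":
    sample = [("body __FOO1 /foo1/", 62), ("body __FOO2 /foo2/", 40),
              ("body __FOO3 /foo3/", 0.05)]
    print("\n".join(flag_thresholds(sample, [50, 0.1, 5, 20, 1])))
```

How a hit rate exactly equal to a threshold is treated is a guess on my part; the sketch keeps it in the higher group.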

So if you call it with --reqhitrate "50 0.1 5 20 1" on the right set of spam, you might get an intermediate file containing:


body __FOO1 /foo1/
# passed hit-rate threshold: 50
body __FOO2 /foo2/
body __FOO3 /foo3/
# passed hit-rate threshold: 20
body __FOO4 /foo4/
body __FOO5 /foo5/
body __FOO6 /foo6/
# passed hit-rate threshold: 5
body __FOO7 /foo7/
# passed hit-rate threshold: 1
body __FOO8 /foo8/

(Plus the additional "# 1.0 <percentage> 0" comment lines that are just noise at this point.)

mk_meta_rule_scores just takes those groups of rules, separated by the "# passed hit-rate..." lines, and builds FOO_1, FOO_2, FOO_3 etc. meta rules - 5 of them in this case. mk_meta_rule_scores itself is a pretty simple-minded script; most of the heavy lifting is done in seek-phrases-in-log.

However, if the message data you feed in doesn't separate out to produce at least one rule in each range, the combination of seek-phrases-in-log and mk_meta_rule_scores will happily create an empty meta rule:


# passed hit-rate threshold: 50
body __FOO1 /foo1/
body __FOO2 /foo2/
body __FOO3 /foo3/
# passed hit-rate threshold: 20
body __FOO4 /foo4/
body __FOO5 /foo5/
body __FOO6 /foo6/
# passed hit-rate threshold: 5
# passed hit-rate threshold: 1
body __FOO7 /foo7/
body __FOO8 /foo8/

With the above output from seek-phrases-in-log, mk_meta_rule_scores will create "meta FOO_1 ()" and "meta FOO_4 ()", since there are no patterns in the first or fourth groups (100% to 50% and 5% to 1%, by your description). It also scores these empty metas at 0 for tidiness - after all, they'll never fire.
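
Here is a rough Python sketch of that grouping step, to show where the empty metas come from. It is not the script's real code: the function names are invented, and both the "||" join inside the non-empty metas and the exact "score" line are my assumptions about what the output could look like.

```python
FLAG_PREFIX = "# passed hit-rate threshold"


def group_subrules(lines):
    """Split intermediate-file lines into groups of subrule names.

    A new group starts at every flag line, so two flag lines in a row
    (or a flag line at the very top) leave an empty group behind.
    """
    groups = [[]]
    for raw in lines:
        line = raw.strip()
        if line.startswith(FLAG_PREFIX):
            groups.append([])
        elif line and not line.startswith("#"):
            # "body __FOO1 /foo1/" -> "__FOO1"
            groups[-1].append(line.split()[1])
    return groups


def build_metas(lines, prefix="FOO"):
    """Emit one meta per group; empty groups get an explicit zero score."""
    out = []
    for number, names in enumerate(group_subrules(lines), start=1):
        # Assumption: subrules are OR-ed together inside the meta.
        out.append(f"meta {prefix}_{number} ({' || '.join(names)})")
        if not names:
            out.append(f"score {prefix}_{number} 0")
    return out


if __name__ == "__main__":
    sample = """\
# passed hit-rate threshold: 50
body __FOO1 /foo1/
# passed hit-rate threshold: 20
# passed hit-rate threshold: 5
body __FOO2 /foo2/
""".splitlines()
    print("\n".join(build_metas(sample)))
```

The number of metas is always the number of flag lines plus one, whether or not any subrules landed in a given group.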

In your case, to look at your original question, none of the derived patterns matched more than 70% of the spam set you fed in, so the _1 rule was empty.

Have you tried using mk_meta_rule_scores, and did you get more than two
score values? The default and the value in the medium range. I suspect
that mk_meta_rule_scores doesn't play well with ranges. It is something
I can live with, but if there is a bug somewhere I would try to report it.
Even if it isn't fixed, that could save some time for other users trying to
use this script.

Try a set of numbers closer together (eg, "10 7 4 1"), and I'd suggest not using high percentages, as it's very unlikely you'll see results in the highest group even with a narrowly targeted set of spam, unless you only have a very small number of nearly identical spams.
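
If you already have the per-pattern hit percentages to hand, a quick way to compare candidate threshold lists is to count how many patterns would land in each group. The helper below is a hypothetical Python sketch along the same lines as my description above (same assumption that a pattern sits in the group whose upper threshold it has dropped below); it is not part of either script.

```python
def band_counts(rates, thresholds):
    """Count patterns per group for a candidate --reqhitrate list.

    Returns one count per threshold: index 0 is the group at or above the
    highest threshold, the last index is the group just above the lowest.
    Hypothetical helper for illustration only.
    """
    cutoffs = sorted(thresholds, reverse=True)
    counts = [0] * len(cutoffs)
    for rate in rates:
        if rate < cutoffs[-1]:
            continue  # below the lowest threshold: discarded entirely
        # The group index is the number of thresholds (other than the
        # lowest) that this hit rate has dropped below.
        band = sum(1 for cutoff in cutoffs[:-1] if rate < cutoff)
        counts[band] += 1
    return counts


if __name__ == "__main__":
    hit_rates = [8, 6, 5, 3, 2, 0.5]  # made-up percentages
    print(band_counts(hit_rates, [70, 10, 1]))
    print(band_counts(hit_rates, [10, 7, 4, 1]))
```

Any zero in the result corresponds to a group that would come out as an empty meta.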


-kgd
