Discussion:
Per user SA-Bayes tokens in SQL
Nick Rosier
2012-08-20 14:49:24 UTC
Hi,

is there a way to configure per-user Bayes tokens for SA?

In my bayes config I've removed the bayes_sql_override_username. Initial
training when mail is scanned by Amavis/SpamAssassin still seems to
store the tokens as the amavis-user rather than the recipient.
Re-training is done with a custom script that configures the user
(sa-learn -u). Is it possible to configure Amavis so it passes the
recipient to SpamAssassin so it can store per-user Bayes information?

Rgds,
N.
n***@gmail.com
2012-08-24 09:38:17 UTC
I have found that this can probably be done with @sa_username_maps, but the documentation on it is nearly non-existent. Can anybody give some information on how to configure this so that every recipient has their own SA database?

N.
Mark.Martinec+ (Mark Martinec)
2012-08-30 16:57:58 UTC
Nick,
Post by Nick Rosier
is there a way to configure per-user Bayes tokens for SA?
Since amavisd-new-2.7.0.

Note that this implies that for a multi-recipient message
SpamAssassin will need to be called more than once per message.
Post by Nick Rosier
In my bayes config I've removed the bayes_sql_override_username. Initial
training when mail is scanned by Amavis/SpamAssassin still seems to
store the tokens as the amavis-user rather than the recipient.
Re-training is done with a custom script that configures the user
(sa-learn -u). Is it possible to configure Amavis so it passes the
recipient to SpamAssassin so it can store per-user Bayes information?
amavisd-new-2.7.0 release notes:

- per-recipient (or per-policy-bank) SpamAssassin SQL database usernames
are supported (setting @sa_username_maps, a policy.sa_username SQL field).
This makes it possible to implement per-user or per-user-group or
per-domain Bayes databases when SpamAssassin is configured to keep
its Bayes database on an SQL server. It also makes it possible to load
per-recipient SpamAssassin preferences (configurations) from an SQL
database (as described in a previous section).

Switching between Bayes usernames is cheap compared to switching between
SpamAssassin configuration files. A multi-recipient message whose
recipients map to different usernames will be checked by SpamAssassin
multiple times, once for each unique username.

Example:
@sa_username_maps = (
  { '***@example.com' => 'user1',
    '***@example.com' => 'user2',
    '.example.com'    => 'user_ex',
  }
);
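To make the "once for each unique username" point concrete, here is a small plain-Perl sketch (not amavisd code) of how the recipients of one message collapse into SpamAssassin runs: recipients are grouped by the username they map to, and each group stands for one SpamAssassin check. The hash %sa_username_for and the addresses in it are hypothetical stand-ins for whatever @sa_username_maps would return, and the '(default)' key is just a placeholder for unmapped recipients.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what @sa_username_maps might return per recipient.
my %sa_username_for = (
    'alice@example.com' => 'user1',
    'bob@example.com'   => 'user2',
    'carol@example.com' => 'user_ex',
    'dave@example.com'  => 'user_ex',
);

# Group recipients by SA username; each group corresponds to one SA check.
# Unmapped recipients are collected under the placeholder key '(default)'.
sub group_by_sa_username {
    my @recipients = @_;
    my %groups;
    for my $rcpt (@recipients) {
        my $name = $sa_username_for{ lc $rcpt } // '(default)';
        push @{ $groups{$name} }, $rcpt;
    }
    return \%groups;
}

# Three recipients, but only two distinct usernames, so two checks.
my $groups = group_by_sa_username(
    'alice@example.com', 'carol@example.com', 'dave@example.com');
for my $name (sort keys %$groups) {
    printf "%s: %s\n", $name, join(', ', @{ $groups->{$name} });
}
```

This is also why per-domain or per-group names (like 'user_ex' above) are cheaper than strictly per-user names on messages with many local recipients.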


Mark
Nick Rosier
2012-09-03 12:30:51 UTC
Thanks Mark,
Post by Mark.Martinec+ (Mark Martinec)
Nick,
Post by Nick Rosier
is there a way to configure per-user Bayes tokens for SA?
Since amavisd-new-2.7.0.
Note that this implies that for a multi-recipient message
SpamAssassin will need to be called more than once per message.
Post by Nick Rosier
In my bayes config I've removed the bayes_sql_override_username. Initial
training when mail is scanned by Amavis/SpamAssassin still seems to
store the tokens as the amavis-user rather than the recipient.
Re-training is done with a custom script that configures the user
(sa-learn -u). Is it possible to configure Amavis so it passes the
recipient to SpamAssassin so it can store per-user Bayes information?
- per-recipient (or per-policy-bank) SpamAssassin SQL database usernames
are supported (setting @sa_username_maps, a policy.sa_username SQL field).
This makes it possible to implement per-user or per-user-group or
per-domain Bayes databases when SpamAssassin is configured to keep
its Bayes database on an SQL server. It also makes it possible to load
per-recipient SpamAssassin preferences (configurations) from an SQL
database (as described in a previous section).
Switching between Bayes usernames is cheap compared to switching between
SpamAssassin configuration files. A multi-recipient message whose
recipients map to different usernames will be checked by SpamAssassin
multiple times, once for each unique username.
@sa_username_maps = (
  { # ...
    '.example.com' => 'user_ex',
  }
);
This is the information I've found so far. It's just not very useful if
you would like to consult an SQL database. I've got all local users defined in
my DB, so I'd just like to do a one-to-one translation:
***@domain.com => ***@domain.com
***@domain.com => ***@domain.com
unknown-***@domain.com => NULL
***@external-domain.com => NULL

without having to define them all. Just a simple "select user from
mailbox" would suffice but I cannot find any documentation on how to do
that.

N.
Rob Sterenborg (lists)
2012-09-03 13:18:24 UTC
I'm by no means an expert on the subject, and I only recently worked out
the setup below. I may have something wrong, but... hope this helps.


I set up amavisd to use SQL for its configuration lookups:
(Sorry, you'll have to watch line wrapping!)

@lookup_sql_dsn =
  ( ['DBI:mysql:database={amavis_db_name};host={mysql_ip};port=3306',
     '{amavis_mysql_user}', '{amavis_mysql_pass}'] );

@storage_sql_dsn = @lookup_sql_dsn;

- In the policy table, the last field is called "sa_username". This is
the username SA will be called with, and it is used as the user in SA's
per-user SQL data (Bayes tokens, user preferences).
- Don't use the sa_userconf field. Leave it empty, set it to NULL,
whatever.
- In the users table you put the lookup key (an email address or an email
domain) together with a policy_id (and a priority), so that the correct
policy is selected when an email is received.
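For reference, the users/policy row Rob describes is fetched by amavisd's $sql_select_policy query. The fragment below is, as far as I recall, the stock query from amavisd.conf-default; treat it as a sketch and compare it with the amavisd.conf-default shipped with your version. amavisd expands %k into the list of lookup keys derived from the recipient address.

```perl
# amavisd.conf fragment (sketch of the stock policy query; verify against
# amavisd.conf-default for your amavisd-new version).
# %k is replaced by amavisd with the lookup keys for the recipient.
$sql_select_policy =
  'SELECT *, users.id'.
  ' FROM users LEFT JOIN policy ON users.policy_id=policy.id'.
  ' WHERE users.email IN (%k) ORDER BY users.priority DESC';
```

As I understand it, the sa_username column of the highest-priority matching row is what feeds @sa_username_maps, which is how this setup ends up with per-user Bayes names.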



For SA, I put the following in /etc/mail/spamassassin/{sql_filename}.conf:

bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:{sa_db_name}:{mysql_ip};mysql_client_found_rows=0
bayes_sql_username {sa_mysql_user}
bayes_sql_password {sa_mysql_pass}

When SA is called, it creates the user it is called with in the
bayes_vars table if that user doesn't exist yet. The user's id is used in the
bayes_token and bayes_seen tables when tokens or message IDs are added. This
all works automagically AFAICS.

If you want to manually expire tokens, you have to call "sa-learn
--force-expire" with -u {username} to specify the user you want to
expire tokens for.


I think this is as complete as I can be.


--
Rob
Mark.Martinec+ (Mark Martinec)
2012-09-03 14:50:08 UTC
Nick,
Post by Nick Rosier
Post by Mark.Martinec+ (Mark Martinec)
@sa_username_maps = (
  { # ...
    '.example.com' => 'user_ex',
  }
);
This is the information I've found so far. It's just not very useful if
you would like to consult an SQL database. I've got all local users defined in
my DB [...] without having to define them all. Just a simple "select user from
mailbox" would suffice but I cannot find any documentation on how to do
that.
In any @*_maps config setting you can use any lookup mechanism
you want: hash, list, regexp, SQL, LDAP.

If you have records for each user in the amavis 'users' table
you can use the 'sa_username' field as Rob is suggesting.
These records can even be synthesised on the fly by a SELECT
clause and need not exist in a database.
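A minimal sketch of such a synthesised record, assuming a hypothetical table mailbox in the lookup database whose username column holds full recipient addresses (roughly what Nick describes). Every local mailbox then gets its own address as sa_username, while any other recipient returns no row, so no per-user name is set for it. Note that this replaces the stock users/policy query, so other per-recipient policy settings (and users.id, which I believe the white/blacklist lookup relies on) would no longer be returned; a real setup would probably need to merge this into the existing query.

```perl
# amavisd.conf fragment (sketch). ASSUMPTION: the database named in
# @lookup_sql_dsn has a hypothetical table "mailbox" whose "username"
# column holds full recipient addresses.
# amavisd expands %k into the lookup keys for a recipient (the full
# address among them), so only existing mailboxes produce a row, and the
# address itself is returned as sa_username (a one-to-one mapping).
$sql_select_policy =
  'SELECT mailbox.username AS sa_username'.
  ' FROM mailbox WHERE mailbox.username IN (%k)';
```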

For simple mappings like the one you describe, you can use the
right-hand-side substitutions offered by hash/list/regexp lookups,

e.g. using a hash-type lookup:

@sa_username_maps = (
{ '.example.com' => '$***@example.com' },
);

or using a regexp lookup:

@sa_username_maps = (
new_RE( [ qr'^(.*)@example\.com$'i => '$***@example.com' ] ),
);




README.lookups:
REGULAR EXPRESSION LOOKUPS

The pattern allows for capturing of parenthesized substrings, which can
then be referenced from the result string using the $1, $2, ... notation,
as with the Perl m// operator. The number after a $ may be a multi-digit
decimal number. To avoid possible ambiguity the ${n} or $(n) form may be used.
Substring numbering starts with 1. Nonexistent references evaluate to empty
strings. If any substitution is done, the result inherits the taintedness
of the key. Keep in mind that $ and @ characters need to be backslash-quoted
in qq() strings. Example:
$virus_quarantine_to = new_RE(
[ qr'^(.*)@example\.com$'i => 'virus-${1}@example.com' ],
[ qr'^(.*)(@[^@]*)?$'i => 'virus-${1}${2}' ] );
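The substitution rules above can be illustrated with a plain-Perl sketch. It is an emulation written for this thread, not amavisd's own lookup_re(): the table mirrors the $virus_quarantine_to example, the first matching pattern wins, and ${n}, $(n) and $n in the result are replaced by the corresponding capture, with nonexistent captures becoming empty strings. Taint propagation is not modelled, and the name emulated_lookup_re is hypothetical.

```perl
use strict;
use warnings;

# Emulation only: each entry is [ pattern => result template ].
my @virus_quarantine_table = (
    [ qr'^(.*)@example\.com$'i => 'virus-${1}@example.com' ],
    [ qr'^(.*)(@[^@]*)?$'i     => 'virus-${1}${2}' ],
);

# Return the result of the first matching entry, or undef if none matches.
sub emulated_lookup_re {
    my ($key) = @_;
    for my $entry (@virus_quarantine_table) {
        my ($re, $template) = @$entry;
        my @cap = $key =~ $re or next;    # captures, or try the next entry
        # Replace ${n}, $(n) or $n by capture n; missing captures give ''.
        (my $result = $template) =~ s{\$(?:\{(\d+)\}|\((\d+)\)|(\d+))}
            { my $n = $1 // $2 // $3;
              ($n >= 1 && defined $cap[$n - 1]) ? $cap[$n - 1] : '' }ge;
        return $result;
    }
    return undef;
}

print emulated_lookup_re('Joe@Example.COM'), "\n";
print emulated_lookup_re('ann@other.org'), "\n";
```

Note that in the second pattern the greedy (.*) swallows the whole key, so ${2} is always empty there; the result is still the expected 'virus-' prefix on the full address.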


Similar to the $1, $2, ... rhs replacements in regexp-based lookups,
a couple of these ($1 .. $5) are also simulated and provided by
a hash-type lookup:

# the rhs replacement strings are similar to what would be obtained
# by lookup_re() given the following regular expression:
# /^( ( ( [^\@]*? ) ( \Q$delim\E [^\@]* )? ) (?: \@ (.*) ) )$/xs
my $rhs = [ # a list of right-hand side replacement strings
$addr, # $1 = User+***@Sub.Example.COM
$saved_full_localpart, # $2 = User+Foo
$localpart, # $3 = user (lc if localpart_is_case_sensitive)
$extension, # $4 = +foo (lc if localpart_is_case_sensitive)
$domain, # $5 = sub.example.com (lowercased unconditionally)
];
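Applying the regular expression from the comment above to a sample address shows where each of $1 .. $5 comes from. This is a standalone sketch, not the amavisd code; it assumes '+' as the extension delimiter and case-insensitive local parts, so $3 and $4 are lowercased to match the example values in those comments. The function name rhs_replacements is hypothetical.

```perl
use strict;
use warnings;

my $delim = '+';    # assumed address extension delimiter

# Returns [ $1, $2, $3, $4, $5 ] for an address containing '@', else undef.
sub rhs_replacements {
    my ($addr) = @_;
    return undef
        unless $addr =~ /^( ( ( [^\@]*? ) ( \Q$delim\E [^\@]* )? ) (?: \@ (.*) ) )$/xs;
    my ($full, $full_localpart, $localpart, $extension, $domain) =
        ($1, $2, $3, $4 // '', $5);
    return [
        $full,              # $1: the whole address, as given
        $full_localpart,    # $2: local part incl. extension, as given
        lc $localpart,      # $3: local part without extension (assumed lc)
        lc $extension,      # $4: extension incl. delimiter, '' if none
        lc $domain,         # $5: domain, lowercased
    ];
}

my $rhs = rhs_replacements('User+Foo@Sub.Example.COM');
print join(' | ', @$rhs), "\n";
```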


Mark
