micah anderson wrote:
> Kris Deugau <kdeu...@vianet.ca> writes:
>> There will only be one database and set of tables, but one of the fields
>> in each table is the user identifier. Fair warning - if you go full
>> per-user on a large system, this will MASSIVELY balloon the size of your
>> Bayes database, and most users will idle below the learning thresholds
>> for quite a long time.
>
> Can you give an idea of the size calculation? I'm wanting to do this,
> but I need to figure out how much space I need to allocate per user!

The SA docs estimate 5-10M per user for file-based per-user Bayes with
the default token expiry settings. I'd expect about the same in SQL,
with anywhere up to 3x bloat over time due to token churn. (Checking my
personal mailbox, I have just over 5M in bayes_tokens, but bayes_seen
has grown over time to 83M. However, the message-ids stored there
aren't being expired.)
Sitewide, with ~1.7 million active tokens (expiry currently set at 2.1
million), the database occupies about 342M on disk here, with a 156M SQL
dump. This comes out to about 200 bytes of storage per token. A single
user with default settings (and plenty of learning) will probably settle
down to somewhere between ~110K and ~140K tokens, so you can probably
expect their data to occupy anywhere from the minimal 5M up to nearly 30M.
Multiply by the number of users and that's what you would have to look
at provisioning for storage. Even at a minimal steady-state you're
likely looking at 100G for 20K users.
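
As a rough check of the arithmetic above, here is a small Python sketch.
The ~200 bytes per token and the token counts are the figures quoted
above; the function names are made up for illustration, and decimal
units (1000 MB per GB) are assumed.

```python
# Rough sizing sketch based on the figures quoted in this message.
# Function and constant names are illustrative only.

BYTES_PER_TOKEN = 200  # ~342M on disk / ~1.7 million tokens, rounded


def per_user_bytes(tokens, bytes_per_token=BYTES_PER_TOKEN):
    """Estimated storage for one user's tokens, in bytes."""
    return tokens * bytes_per_token


def total_gb(users, per_user_mb):
    """Total storage in GB for `users` users at `per_user_mb` MB each."""
    return users * per_user_mb / 1000  # decimal units assumed


if __name__ == "__main__":
    for tokens in (110_000, 140_000):
        mb = per_user_bytes(tokens) / 1e6
        print(f"{tokens} tokens: ~{mb:.0f} MB per user")
    print(f"20000 users at 5 MB each: ~{total_gb(20_000, 5):.0f} GB")
    print(f"20000 users at 28 MB each: ~{total_gb(20_000, 28):.0f} GB")
```

This ignores bayes_seen growth and any index or churn overhead beyond
what is already folded into the per-token figure, so treat the results
as a floor rather than a ceiling.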
If you have more than a handful of users, you're probably better off
looking for ways to group your users with a small number of Bayes
datasets rather than full-on per-user. I haven't tried, but you might
be able to use bayes_sql_override_username in userprefs (also storable
in SQL) to assign users to a particular dataset, with a fallback to a
global default. The documentation reads to me like this should work
(note the last sentence):

    bayes_sql_override_username
        Used by BayesStore::SQL storage implementation.
        If this options is set the BayesStore::SQL module will
        override the set username with the value given. This could
        be useful for implementing global or group bayes databases.

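
To make the grouping idea concrete, here is an untested Python sketch of
the lookup logic only: a per-user override if one exists, otherwise a
global default, otherwise the user's own name. It does not call
SpamAssassin at all. The userpref table layout (username, preference,
value), the '@GLOBAL' fallback row, and the dataset names are
assumptions for the sake of the example, so check them against your own
schema.

```python
import sqlite3

GLOBAL_USER = "@GLOBAL"  # assumed name of the fallback row
PREF = "bayes_sql_override_username"


def bayes_dataset_for(conn, username):
    """Return the Bayes dataset name that would apply to `username`.

    Order of preference: the user's own override, then the global
    override, then the username itself (plain per-user Bayes).
    """
    rows = dict(conn.execute(
        "SELECT username, value FROM userpref "
        "WHERE preference = ? AND username IN (?, ?)",
        (PREF, username, GLOBAL_USER),
    ).fetchall())
    return rows.get(username) or rows.get(GLOBAL_USER) or username


def demo_connection():
    """Build an in-memory table with hypothetical group assignments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userpref (username TEXT, preference TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO userpref VALUES (?, ?, ?)", [
        (GLOBAL_USER, PREF, "bayes_default"),
        ("alice", PREF, "bayes_sales"),
        ("bob", PREF, "bayes_sales"),
    ])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for user in ("alice", "bob", "carol"):
        print(user, "->", bayes_dataset_for(conn, user))
```

Here alice and bob share one dataset while carol, who has no row of her
own, falls through to the global default. Whether SpamAssassin resolves
the preference the same way in practice is exactly the part I haven't
tried.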
-kgd