micah anderson wrote:
Kris Deugau <kdeu...@vianet.ca> writes:

>> There will only be one database and set of tables, but one of the fields
>> in each table is the user identifier.  Fair warning - if you go full
>> per-user on a large system, this will MASSIVELY balloon the size of your
>> Bayes database, and most users will idle below the learning thresholds
>> for quite a long time.
>
> Can you give an idea of the size calculation? I'm wanting to do this,
> but I need to figure out how much space I need to allocate per user!

The SA docs estimate 5-10M per user for file-based per-user Bayes with the default token expiry settings. I'd expect about the same in SQL, with up to 3x bloat over time due to token churn. (Checking my personal mailbox, I have just over 5M in bayes_tokens, but bayes_seen has grown over time to 83M, since the message-ids stored there aren't being expired.)

Sitewide, with ~1.7M active tokens (expiry currently set at 2.1M), the database occupies about 342M on disk here, with a 156M SQL dump. That works out to about 200 bytes of on-disk storage per token. A single user with default settings (and plenty of learning) will probably settle somewhere between ~110K and ~140K tokens, so you can expect their data to occupy anywhere from the minimal 5M up to close to 30M. Multiply that by the number of users to get the storage you'd need to provision. Even at the minimal 5M steady state, you're likely looking at 100G for 20K users.
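To make that arithmetic easy to redo with your own numbers, here's a rough Python sketch. It assumes decimal units (1M = 10^6 bytes, 1G = 10^9 bytes) and that the sitewide bytes-per-token ratio carries over unchanged to per-user datasets; the function names are purely illustrative.

```python
"""Back-of-envelope per-user Bayes storage estimate.

Assumptions: decimal units (1M = 10**6 bytes, 1G = 10**9 bytes), and the
sitewide bytes-per-token ratio applies unchanged to per-user datasets.
"""

# Sitewide figures quoted above.
SITEWIDE_TOKENS = 1_700_000
SITEWIDE_DISK_BYTES = 342 * 10**6


def bytes_per_token(disk_bytes: int, tokens: int) -> float:
    """Average on-disk bytes per token."""
    return disk_bytes / tokens


def per_user_bytes(tokens: int, bpt: float) -> float:
    """Estimated storage for one user's token set."""
    return tokens * bpt


def fleet_bytes(users: int, per_user: float) -> float:
    """Total storage to provision for all users."""
    return users * per_user


if __name__ == "__main__":
    bpt = bytes_per_token(SITEWIDE_DISK_BYTES, SITEWIDE_TOKENS)
    low = per_user_bytes(110_000, bpt)
    high = per_user_bytes(140_000, bpt)
    print(f"bytes/token: {bpt:.0f}")                       # bytes/token: 201
    print(f"per user: {low / 1e6:.0f}M to {high / 1e6:.0f}M")  # per user: 22M to 28M
    # Minimal steady state of 5M per user across 20K users.
    print(f"20K users: {fleet_bytes(20_000, 5 * 10**6) / 1e9:.0f}G")  # 20K users: 100G
```

Swap in your own user count and token expiry settings; the per-token ratio is the number most likely to differ between installs.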

If you have more than a handful of users, you're probably better off looking for ways to group your users with a small number of Bayes datasets rather than full-on per-user. I haven't tried, but you might be able to use bayes_sql_override_username in userprefs (also storable in SQL) to assign users to a particular dataset, with a fallback to a global default. The documentation reads to me like this should work (note the last sentence):

       bayes_sql_override_username
           Used by BayesStore::SQL storage implementation.

           If this option is set the BayesStore::SQL module will
           override the set username with the value given.  This could
           be useful for implementing global or group bayes databases.
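A minimal sketch of the grouping idea, untested against a real SpamAssassin install. It uses an in-memory SQLite table to mimic an SQL userpref table with username/preference/value columns, and assumes (not confirmed here) that '@GLOBAL' rows supply the site-wide default and that bayes_sql_override_username is honoured when read from userprefs. The users and group names are hypothetical.

```python
"""Sketch: map users onto a few shared Bayes datasets via SQL userprefs.

Assumptions: a userpref table with username/preference/value columns,
'@GLOBAL' rows acting as site-wide defaults, and
bayes_sql_override_username being honoured from userprefs.
All users and group names below are hypothetical.
"""
import sqlite3
from typing import Optional

# Hypothetical user -> Bayes dataset assignments.
GROUPS = {"alice": "sales", "bob": "sales", "carol": "engineering"}
DEFAULT_DATASET = "global"
PREF = "bayes_sql_override_username"


def build_userprefs(conn: sqlite3.Connection) -> None:
    """Create a toy userpref table with one global and several per-user rows."""
    conn.execute(
        "CREATE TABLE userpref (username TEXT NOT NULL, "
        "preference TEXT NOT NULL, value TEXT NOT NULL)"
    )
    rows = [("@GLOBAL", PREF, DEFAULT_DATASET)]
    rows += [(user, PREF, group) for user, group in GROUPS.items()]
    conn.executemany("INSERT INTO userpref VALUES (?, ?, ?)", rows)


def bayes_dataset_for(conn: sqlite3.Connection, username: str) -> Optional[str]:
    """Return the dataset a user would share, falling back to @GLOBAL."""
    rows = conn.execute(
        "SELECT username, value FROM userpref "
        "WHERE preference = ? AND (username = ? OR username = '@GLOBAL')",
        (PREF, username),
    ).fetchall()
    by_user = dict(rows)
    # Prefer a user-specific row; otherwise use the global default.
    return by_user.get(username, by_user.get("@GLOBAL"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_userprefs(conn)
    for user in ("alice", "carol", "dave"):
        print(user, "->", bayes_dataset_for(conn, user))
```

Users without their own row (like "dave" here) land in the global dataset, so only the users you deliberately group need entries.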

-kgd
