Thanks for the suggestion: I had already considered that solution (first link), but my concern is having to JOIN large tables with hundreds of millions of records.
From my understanding, **using JOIN when dealing with big data is bad and a nightmare for performance**: can you confirm? Or am I missing something?

Those tables would be frequently read and updated and are the core of the application. That also means that every update on a user would produce **many dead rows** - not just 1 user row, as in the case of JSON, but many rows in the user metadata table.

On Fri, Dec 4, 2020 at 11:40 PM Nick Cleaton <n...@cleaton.net> wrote:

> On Fri, 4 Dec 2020 at 15:39, Marco Colli <collimarc...@gmail.com> wrote:
>
>> Hello!
>>
>> We have a multi-tenant service where each customer has millions of users
>> (total: ~150M rows). Now we would like to let each customer define some
>> custom columns for his users and then let the customer search his users
>> efficiently based on these columns.
>>
>> This problem seems really hard to solve with PostgreSQL:
>>
>> https://stackoverflow.com/questions/5106335/how-to-design-a-database-for-user-defined-fields
>>
>> In particular the easiest way would be to add a JSON field on the users
>> table (e.g. user metadata). However the PostgreSQL GIN index only supports
>> exact matches and not range queries. This means that a query on a range
>> (e.g. age > 30) would be extremely inefficient and would result in a table
>> scan.
>
> You could have a table of (tenant, customer, setting_name, setting_value)
> so that a btree index on (tenant, setting_name, setting_value) would work
> for "select customer from my_table where tenant=$1 and setting_name='age'
> and setting_value > 30"
>
> That doesn't deal with setting values having a variety of types, but you
> could have a distinct user defined settings table for each setting value
> type that you want to support.
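For reference, a minimal sketch of the per-type settings table Nick describes. The column roles and the query shape come from his mail; the table name, primary key layout and the bigint value type are my assumptions for illustration only:

```sql
-- Hypothetical per-type table for integer-valued custom settings
CREATE TABLE user_settings_int (
    tenant        bigint NOT NULL,
    customer      bigint NOT NULL,
    setting_name  text   NOT NULL,
    setting_value bigint NOT NULL,
    PRIMARY KEY (tenant, customer, setting_name)
);

-- btree index: equality on tenant and setting_name, range on setting_value
CREATE INDEX user_settings_int_lookup
    ON user_settings_int (tenant, setting_name, setting_value);

-- Range query that can use the index instead of a table scan
SELECT customer
FROM   user_settings_int
WHERE  tenant = $1
  AND  setting_name = 'age'
  AND  setting_value > 30;
```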