Re: Need help in database design

Divyansh Gupta JNsThMAudy Mon, 23 Dec 2024 11:12:01 -0800

Thank you everyone for giving your valuable responses, I am glad that
everyone understands my concern. I got some good ideas about the database
design that I am following after going through some stress testing I will
implement the same.


Thank you so much Everyone

On Tue, 24 Dec 2024, 12:09 am Ron Johnson, <ronljohnso...@gmail.com> wrote:

> Are these columns really unique for all 20M rows that a userid can have in
> the table?  I'm dubious.
>
> Split a LOT of those columns out into a separate table named "user" with
> PK userid.  It'll save a huge amount of disk space, and speed up queries by
> not having to fetch it all every time.
>
> useremail varchar(600) NOT NULL,
> title public.citext NULL,
> authorname varchar(600) NULL,
> authoremail varchar(600) NULL,
> updated varchar(300) NOT NULL,
> entryid varchar(2000) NOT NULL,
> lastmodifiedby varchar(600) NULL,
> lastmodifiedbyemail varchar(600) NULL,
> "size" varchar(300) NULL,
> contenttype varchar(250) NULL,
> fileextension varchar(50) NULL,
> docfoldername public.citext NULL,
> folderresourceid public.citext NULL,
> filesize int8 DEFAULT 0 NOT NULL,
> retentionstatus int2 DEFAULT 0 NOT NULL,
> docfileref int8 NULL,
> usid int4 NULL,
> archivepath varchar(500) NULL,
> createddate timestamp(6) DEFAULT NULL::timestamp without time zone NULL,
> zipfilename varchar(100) NULL,
> oncreatedat timestamp(6) DEFAULT clock_timestamp() NOT NULL,
> onupdateat timestamp(6) DEFAULT clock_timestamp() NOT NULL,
> startsnapshot int4 DEFAULT 0 NOT NULL,
> currentsnapshot int4 DEFAULT 0 NOT NULL,
> dismiss int2 DEFAULT 0 NOT NULL,
> checksum varchar NULL,
> typeoffile int2 GENERATED ALWAYS AS (
>
>
>
> On Mon, Dec 23, 2024 at 1:32 PM Divyansh Gupta JNsThMAudy <
> ag1567...@gmail.com> wrote:
>
>> Currently I haven't created those columns , I have created addons_json
>> column which is a JSONB column yet in a discussion weather I should create
>> or consider only one JSONB column.
>>
>> On Tue, 24 Dec 2024, 12:00 am Divyansh Gupta JNsThMAudy, <
>> ag1567...@gmail.com> wrote:
>>
>>> Range partition can help when you applies filter for a specific range
>>> but in my case I need to apply filter on userid always, however I have date
>>> columns but there is less variation in timestamp which I have that's why
>>> didn't go for range partition.
>>>
>>> On Mon, 23 Dec 2024, 11:57 pm Ron Johnson, <ronljohnso...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> 1. I bet you'd get better performance using RANGE partitioning.
>>>> 2. Twenty million rows per userid is a *LOT*.  No subdivisions (like
>>>> date range)?
>>>>
>>>> On Mon, Dec 23, 2024 at 1:23 PM Divyansh Gupta JNsThMAudy <
>>>> ag1567...@gmail.com> wrote:
>>>>
>>>>> Adrian, Please check this out;
>>>>>
>>>>> PARTITION BY HASH (userid); CREATE TABLE dbo.
>>>>> googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR
>>>>> VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.
>>>>> googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR
>>>>> VALUES WITH (modulus 84, remainder 83);
>>>>>
>>>>> On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <
>>>>> ag1567...@gmail.com> wrote:
>>>>>
>>>>>> Adrian, the partition is on userid using hash partition with 84
>>>>>> partitions
>>>>>>
>>>>>> Ron, there could be more than 20 Million records possible for a
>>>>>> single userid in that case if I create index on userid only not on other
>>>>>> column the query is taking more than 30 seconds to return the results.
>>>>>>
>>>>>> On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnso...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If your queries all reference userid, then you only need indices on
>>>>>>> gdid and userid.
>>>>>>>
>>>>>>> On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <
>>>>>>> ag1567...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I have one confusion with this design if I opt to create 50 columns
>>>>>>>> I need to create 50 index which will work with userid index in Bitmap 
>>>>>>>> on
>>>>>>>> the other hand if I create a JSONB column I need to create a single 
>>>>>>>> index ?
>>>>>>>>
>>>>>>>> On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnso...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Given what you just wrote, I'd stick with 50 separate t* columns.
>>>>>>>>> Simplifies queries, simplifies updates, and eliminates JSONB 
>>>>>>>>> conversions.
>>>>>>>>>
>>>>>>>>> On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <
>>>>>>>>> ag1567...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Values can be updated based on customer actions
>>>>>>>>>>
>>>>>>>>>> All rows won't have all 50 key value pairs always if I make those
>>>>>>>>>> keys into columns the rows might have null value on the other hand 
>>>>>>>>>> if it is
>>>>>>>>>> JSONB then the key value pair will not be there
>>>>>>>>>>
>>>>>>>>>> Yes in UI customers can search for the key value pairs
>>>>>>>>>>
>>>>>>>>>> During data population the key value pair will be empty array in
>>>>>>>>>> case of JSONB column or NULL in case of table columns, later when 
>>>>>>>>>> customer
>>>>>>>>>> performs some actions that time the key value pairs will populate and
>>>>>>>>>> update, based on what action customer performs.
>>>>>>>>>>
>>>>>>>>>> On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <
>>>>>>>>>> ag1567...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Let's make it more understandable, here is the table schema with
>>>>>>>>>>> 50 columns in it
>>>>>>>>>>>
>>>>>>>>>>> CREATE TABLE dbo.googledocs_tbl (
>>>>>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1
>>>>>>>>>>> MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) 
>>>>>>>>>>> NOT NULL,
>>>>>>>>>>> userid int8 NOT NULL,
>>>>>>>>>>> t1 int4 NULL,
>>>>>>>>>>> t2 int4 NULL,
>>>>>>>>>>> t3 int4 NULL,
>>>>>>>>>>> t4 int4 NULL,
>>>>>>>>>>> t5 int4 NULL,
>>>>>>>>>>> t6 int4 NULL,
>>>>>>>>>>> t7 int4 NULL,
>>>>>>>>>>> t8 int4 NULL,
>>>>>>>>>>> t9 int4 NULL,
>>>>>>>>>>> t10 int4 NULL,
>>>>>>>>>>> t11 int4 NULL,
>>>>>>>>>>> t12 int4 NULL,
>>>>>>>>>>> t13 int4 NULL,
>>>>>>>>>>> t14 int4 NULL,
>>>>>>>>>>> t15 int4 NULL,
>>>>>>>>>>> t16 int4 NULL,
>>>>>>>>>>> t17 int4 NULL,
>>>>>>>>>>> t18 int4 NULL,
>>>>>>>>>>> t19 int4 NULL,
>>>>>>>>>>> t20 int4 NULL,
>>>>>>>>>>> t21 int4 NULL,
>>>>>>>>>>> t22 int4 NULL,
>>>>>>>>>>> t23 int4 NULL,
>>>>>>>>>>> t24 int4 NULL,
>>>>>>>>>>> t25 int4 NULL,
>>>>>>>>>>> t26 int4 NULL,
>>>>>>>>>>> t27 int4 NULL,
>>>>>>>>>>> t28 int4 NULL,
>>>>>>>>>>> t29 int4 NULL,
>>>>>>>>>>> t30 int4 NULL,
>>>>>>>>>>> t31 int4 NULL,
>>>>>>>>>>> t32 int4 NULL,
>>>>>>>>>>> t33 int4 NULL,
>>>>>>>>>>> t34 int4 NULL,
>>>>>>>>>>> t35 int4 NULL,
>>>>>>>>>>> t36 int4 NULL,
>>>>>>>>>>> t37 int4 NULL,
>>>>>>>>>>> t38 int4 NULL,
>>>>>>>>>>> t39 int4 NULL,
>>>>>>>>>>> t40 int4 NULL,
>>>>>>>>>>> t41 int4 NULL,
>>>>>>>>>>> t42 int4 NULL,
>>>>>>>>>>> t43 int4 NULL,
>>>>>>>>>>> t44 int4 NULL,
>>>>>>>>>>> t45 int4 NULL,
>>>>>>>>>>> t46 int4 NULL,
>>>>>>>>>>> t47 int4 NULL,
>>>>>>>>>>> t48 int4 NULL,
>>>>>>>>>>> t49 int4 NULL,
>>>>>>>>>>> t50 int4 NULL,
>>>>>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
>>>>>>>>>>> );
>>>>>>>>>>>
>>>>>>>>>>> Every time when i query I will query it along with userid
>>>>>>>>>>> Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
>>>>>>>>>>> more key filters if customer applies
>>>>>>>>>>>
>>>>>>>>>>> On the other hand if I create a single jsonb column the schema
>>>>>>>>>>> will look like :
>>>>>>>>>>>
>>>>>>>>>>> CREATE TABLE dbo.googledocs_tbl (
>>>>>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1
>>>>>>>>>>> MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) 
>>>>>>>>>>> NOT NULL,
>>>>>>>>>>> userid int8 NOT NULL,
>>>>>>>>>>> addons_json jsonb default '{}'::jsonb
>>>>>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
>>>>>>>>>>> );
>>>>>>>>>>>
>>>>>>>>>>> and the query would be like
>>>>>>>>>>> where userid = 12345678 and ((addons_json @> {t1:1}) or
>>>>>>>>>>> (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
>>>>>>>>>>> more key filters if customer applies
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <
>>>>>>>>>>> david.g.johns...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <
>>>>>>>>>>>> ag1567...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So here my question is considering one JSONB column is perfect
>>>>>>>>>>>>> or considering 50 columns will be more optimised.
>>>>>>>>>>>>>
>>>>>>>>>>>> The relational database engine is designed around the
>>>>>>>>>>>> column-based approach.  Especially if the columns are generally 
>>>>>>>>>>>> unchanging,
>>>>>>>>>>>> combined with using fixed-width data types.
>>>>>>>>>>>>
>>>>>>>>>>>> David J.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Death to <Redacted>, and butter sauce.
>>>>>>>>> Don't boil me, I'm still alive.
>>>>>>>>> <Redacted> lobster!
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Death to <Redacted>, and butter sauce.
>>>>>>> Don't boil me, I'm still alive.
>>>>>>> <Redacted> lobster!
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Death to <Redacted>, and butter sauce.
>>>> Don't boil me, I'm still alive.
>>>> <Redacted> lobster!
>>>>
>>>
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>

Re: Need help in database design

Reply via email to