Thank you everyone for giving your valuable responses, I am glad that everyone understands my concern. I got some good ideas about the database design that I am following after going through some stress testing I will implement the same.
Thank you so much Everyone On Tue, 24 Dec 2024, 12:09 am Ron Johnson, <ronljohnso...@gmail.com> wrote: > Are these columns really unique for all 20M rows that a userid can have in > the table? I'm dubious. > > Split a LOT of those columns out into a separate table named "user" with > PK userid. It'll save a huge amount of disk space, and speed up queries by > not having to fetch it all every time. > > useremail varchar(600) NOT NULL, > title public.citext NULL, > authorname varchar(600) NULL, > authoremail varchar(600) NULL, > updated varchar(300) NOT NULL, > entryid varchar(2000) NOT NULL, > lastmodifiedby varchar(600) NULL, > lastmodifiedbyemail varchar(600) NULL, > "size" varchar(300) NULL, > contenttype varchar(250) NULL, > fileextension varchar(50) NULL, > docfoldername public.citext NULL, > folderresourceid public.citext NULL, > filesize int8 DEFAULT 0 NOT NULL, > retentionstatus int2 DEFAULT 0 NOT NULL, > docfileref int8 NULL, > usid int4 NULL, > archivepath varchar(500) NULL, > createddate timestamp(6) DEFAULT NULL::timestamp without time zone NULL, > zipfilename varchar(100) NULL, > oncreatedat timestamp(6) DEFAULT clock_timestamp() NOT NULL, > onupdateat timestamp(6) DEFAULT clock_timestamp() NOT NULL, > startsnapshot int4 DEFAULT 0 NOT NULL, > currentsnapshot int4 DEFAULT 0 NOT NULL, > dismiss int2 DEFAULT 0 NOT NULL, > checksum varchar NULL, > typeoffile int2 GENERATED ALWAYS AS ( > > > > On Mon, Dec 23, 2024 at 1:32 PM Divyansh Gupta JNsThMAudy < > ag1567...@gmail.com> wrote: > >> Currently I haven't created those columns , I have created addons_json >> column which is a JSONB column yet in a discussion weather I should create >> or consider only one JSONB column. >> >> On Tue, 24 Dec 2024, 12:00 am Divyansh Gupta JNsThMAudy, < >> ag1567...@gmail.com> wrote: >> >>> Range partition can help when you applies filter for a specific range >>> but in my case I need to apply filter on userid always, however I have date >>> columns but there is less variation in timestamp which I have that's why >>> didn't go for range partition. >>> >>> On Mon, 23 Dec 2024, 11:57 pm Ron Johnson, <ronljohnso...@gmail.com> >>> wrote: >>> >>>> >>>> 1. I bet you'd get better performance using RANGE partitioning. >>>> 2. Twenty million rows per userid is a *LOT*. No subdivisions (like >>>> date range)? >>>> >>>> On Mon, Dec 23, 2024 at 1:23 PM Divyansh Gupta JNsThMAudy < >>>> ag1567...@gmail.com> wrote: >>>> >>>>> Adrian, Please check this out; >>>>> >>>>> PARTITION BY HASH (userid); CREATE TABLE dbo. >>>>> googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR >>>>> VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo. >>>>> googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR >>>>> VALUES WITH (modulus 84, remainder 83); >>>>> >>>>> On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy < >>>>> ag1567...@gmail.com> wrote: >>>>> >>>>>> Adrian, the partition is on userid using hash partition with 84 >>>>>> partitions >>>>>> >>>>>> Ron, there could be more than 20 Million records possible for a >>>>>> single userid in that case if I create index on userid only not on other >>>>>> column the query is taking more than 30 seconds to return the results. >>>>>> >>>>>> On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnso...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> If your queries all reference userid, then you only need indices on >>>>>>> gdid and userid. >>>>>>> >>>>>>> On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy < >>>>>>> ag1567...@gmail.com> wrote: >>>>>>> >>>>>>>> I have one confusion with this design if I opt to create 50 columns >>>>>>>> I need to create 50 index which will work with userid index in Bitmap >>>>>>>> on >>>>>>>> the other hand if I create a JSONB column I need to create a single >>>>>>>> index ? >>>>>>>> >>>>>>>> On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnso...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Given what you just wrote, I'd stick with 50 separate t* columns. >>>>>>>>> Simplifies queries, simplifies updates, and eliminates JSONB >>>>>>>>> conversions. >>>>>>>>> >>>>>>>>> On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy < >>>>>>>>> ag1567...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Values can be updated based on customer actions >>>>>>>>>> >>>>>>>>>> All rows won't have all 50 key value pairs always if I make those >>>>>>>>>> keys into columns the rows might have null value on the other hand >>>>>>>>>> if it is >>>>>>>>>> JSONB then the key value pair will not be there >>>>>>>>>> >>>>>>>>>> Yes in UI customers can search for the key value pairs >>>>>>>>>> >>>>>>>>>> During data population the key value pair will be empty array in >>>>>>>>>> case of JSONB column or NULL in case of table columns, later when >>>>>>>>>> customer >>>>>>>>>> performs some actions that time the key value pairs will populate and >>>>>>>>>> update, based on what action customer performs. >>>>>>>>>> >>>>>>>>>> On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, < >>>>>>>>>> ag1567...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Let's make it more understandable, here is the table schema with >>>>>>>>>>> 50 columns in it >>>>>>>>>>> >>>>>>>>>>> CREATE TABLE dbo.googledocs_tbl ( >>>>>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 >>>>>>>>>>> MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) >>>>>>>>>>> NOT NULL, >>>>>>>>>>> userid int8 NOT NULL, >>>>>>>>>>> t1 int4 NULL, >>>>>>>>>>> t2 int4 NULL, >>>>>>>>>>> t3 int4 NULL, >>>>>>>>>>> t4 int4 NULL, >>>>>>>>>>> t5 int4 NULL, >>>>>>>>>>> t6 int4 NULL, >>>>>>>>>>> t7 int4 NULL, >>>>>>>>>>> t8 int4 NULL, >>>>>>>>>>> t9 int4 NULL, >>>>>>>>>>> t10 int4 NULL, >>>>>>>>>>> t11 int4 NULL, >>>>>>>>>>> t12 int4 NULL, >>>>>>>>>>> t13 int4 NULL, >>>>>>>>>>> t14 int4 NULL, >>>>>>>>>>> t15 int4 NULL, >>>>>>>>>>> t16 int4 NULL, >>>>>>>>>>> t17 int4 NULL, >>>>>>>>>>> t18 int4 NULL, >>>>>>>>>>> t19 int4 NULL, >>>>>>>>>>> t20 int4 NULL, >>>>>>>>>>> t21 int4 NULL, >>>>>>>>>>> t22 int4 NULL, >>>>>>>>>>> t23 int4 NULL, >>>>>>>>>>> t24 int4 NULL, >>>>>>>>>>> t25 int4 NULL, >>>>>>>>>>> t26 int4 NULL, >>>>>>>>>>> t27 int4 NULL, >>>>>>>>>>> t28 int4 NULL, >>>>>>>>>>> t29 int4 NULL, >>>>>>>>>>> t30 int4 NULL, >>>>>>>>>>> t31 int4 NULL, >>>>>>>>>>> t32 int4 NULL, >>>>>>>>>>> t33 int4 NULL, >>>>>>>>>>> t34 int4 NULL, >>>>>>>>>>> t35 int4 NULL, >>>>>>>>>>> t36 int4 NULL, >>>>>>>>>>> t37 int4 NULL, >>>>>>>>>>> t38 int4 NULL, >>>>>>>>>>> t39 int4 NULL, >>>>>>>>>>> t40 int4 NULL, >>>>>>>>>>> t41 int4 NULL, >>>>>>>>>>> t42 int4 NULL, >>>>>>>>>>> t43 int4 NULL, >>>>>>>>>>> t44 int4 NULL, >>>>>>>>>>> t45 int4 NULL, >>>>>>>>>>> t46 int4 NULL, >>>>>>>>>>> t47 int4 NULL, >>>>>>>>>>> t48 int4 NULL, >>>>>>>>>>> t49 int4 NULL, >>>>>>>>>>> t50 int4 NULL, >>>>>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid), >>>>>>>>>>> ); >>>>>>>>>>> >>>>>>>>>>> Every time when i query I will query it along with userid >>>>>>>>>>> Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2) >>>>>>>>>>> more key filters if customer applies >>>>>>>>>>> >>>>>>>>>>> On the other hand if I create a single jsonb column the schema >>>>>>>>>>> will look like : >>>>>>>>>>> >>>>>>>>>>> CREATE TABLE dbo.googledocs_tbl ( >>>>>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 >>>>>>>>>>> MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) >>>>>>>>>>> NOT NULL, >>>>>>>>>>> userid int8 NOT NULL, >>>>>>>>>>> addons_json jsonb default '{}'::jsonb >>>>>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid), >>>>>>>>>>> ); >>>>>>>>>>> >>>>>>>>>>> and the query would be like >>>>>>>>>>> where userid = 12345678 and ((addons_json @> {t1:1}) or >>>>>>>>>>> (addons_json @> {t1:2}) or (addons_json @> {t1:3}) >>>>>>>>>>> more key filters if customer applies >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston < >>>>>>>>>>> david.g.johns...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy < >>>>>>>>>>>> ag1567...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> So here my question is considering one JSONB column is perfect >>>>>>>>>>>>> or considering 50 columns will be more optimised. >>>>>>>>>>>>> >>>>>>>>>>>> The relational database engine is designed around the >>>>>>>>>>>> column-based approach. Especially if the columns are generally >>>>>>>>>>>> unchanging, >>>>>>>>>>>> combined with using fixed-width data types. >>>>>>>>>>>> >>>>>>>>>>>> David J. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Death to <Redacted>, and butter sauce. >>>>>>>>> Don't boil me, I'm still alive. >>>>>>>>> <Redacted> lobster! >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Death to <Redacted>, and butter sauce. >>>>>>> Don't boil me, I'm still alive. >>>>>>> <Redacted> lobster! >>>>>>> >>>>>> >>>> >>>> -- >>>> Death to <Redacted>, and butter sauce. >>>> Don't boil me, I'm still alive. >>>> <Redacted> lobster! >>>> >>> > > -- > Death to <Redacted>, and butter sauce. > Don't boil me, I'm still alive. > <Redacted> lobster! >