Re: The tragedy of SQL

2021-09-16 Thread Arnold Morein
My $5.00 on this discussion (quite fascinating and nostalgic, actually). This
is a little long, granted, but it was fun.

As someone who cut their IT teeth in general, and their programmer teeth in
particular, on the HP 3000 platform, in a time (and a land) BEFORE SQL, where
we had only a single database engine to choose from (IMAGE, a network
database), where you accessed data by its hashed key via a strict
read/insert/lock-before-(update/delete) API, or via the KSAM file engine for
non-database work that didn’t have high transaction volume (again, only via a
strict API), SQL at the time was nothing short of a miraculous boon!

None of the above were relational. Nor was there anything like SQL in the
business market; at the time, it was mostly found in academia and at some
fringe/bleeding-edge companies.

Though I understand the intent of the request that started the thread, what
irks me is when people make statements like the “tragedy of SQL” or anything
else that equates SQL with the concept of an RDBMS. The two are not the same
thing.

Let us not forget that SQL is an acronym for Structured Query Language, and it
is indeed just that: a language and a tool to be used in retrieving and
manipulating data organized in the form of a tabular data stream; how that
data is managed by the underlying engine is irrelevant. By the time I was able
to study it in college (after several years of only being able to access data
via a program I had to write against engine APIs), I found it a great tool for
a programmer and, yes, even for the odd accountant with some technical
aptitude. At the time, a basic SQL interpreter exceeded some of the best
“report writer” tools of its day, and unlike those tools, you could use SQL to
actually change data in the underlying data store. (It’s Dark Majik,
Moriarty!) In fact, in many ways it wasn’t marketed as an API abstraction
layer (as it has turned into of late) but as a separate add-on tool for a
given flavor of file/database engine.
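
To make that last distinction concrete, the difference looks roughly like this
(the table and column names are purely made up, not from any system mentioned
above):

    -- the kind of ad hoc retrieval a report writer of the day could also manage
    select customer_name, balance
    from accounts
    where balance < 0;

    -- actually changing the stored data, which those report tools could not do
    update accounts
    set balance = 0
    where customer_id = 1042;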

Back on the HP 3000 (a 16-bit machine at that point), the only languages
available for use (that I had access to) were COBOL, Fortran, and (new at the
time) Pascal. Having to write a quick-and-dirty application just to list data
on the screen was both required and overly tedious. When SQL became available
as a layer between the user and the data store, it actually reduced the
programmer workload, since simple inquiries could be done by technical, if not
programming, staff. (Also, when migrating from one proprietary data source to
another, SQL could be exported from one system and executed in another for
import.)

On the other hand, from a programmer’s standpoint back then, the introduction
of the RELATIONAL database, and the cool stuff the engine would do for me with
a one-time declaration and no coding, was amazing and, again, a time saver.
That there was a SQL interpreter option for the engine was icing on the cake!

HP would later release a SQL interpreter for its then TurboIMAGE and later
ALLBASE products. The current CONNX product (now owned by SoftwareAG) provides
SQL-based access to Adabas data. IBM offers a product that provides access to
VSAM data via SQL. Again: an add-on module to a product that wasn’t originally
built to be an RDBMS.

I think the IT industry did a bit of a disservice to itself, SQL, and the
RDBMS concept by coupling them so tightly. (Or maybe they just grew together
organically?) The concept of a QL, whether verbose or terse, that allows a
human with only a little training to write statements that give them access to
the data they need was a great idea. It still is. But the plethora of other
languages (for data access or other arcane activities) that have appeared in
the field over the last 10-15 years is, well, boggling; personally, they at
times seem like reworks only because the author didn’t like the language’s
syntax or simply wasn’t a touch typist. (Man, am I glad I took typing in high
school!)

As stated, SQL is a language/tool that is simply one way to interact with a
file/database engine. These days, though, I suspect many vendors prefer to
offer SQL as the only language because SQL has such broad use and most people
have had to learn it to some degree or other in their collegiate and
professional careers. What amazes me is that so many companies and products
have gone to such lengths to allow the “simple” language of SQL to become the
de facto language for interacting with their file/database engines, sometimes
not even publishing a non-SQL API at all. (All you get is a C/JDBC driver.
Enjoy!)

So, back to the thread: given the extensibility of PostgreSQL, I think the
concept of a different QL to access the data stored there (or anywhere else
for that matter) may be of great use to a specific audience, depending on the
environment (read: tool) where it is used. (The biology reference was
brilliant: one of my COBOL teachers digressed once that one of the features of
the language at its outset was its abi

Re: Need help in database design

2024-12-23 Thread Arnold Morein
I would like to make a suggestion, if I may. Granted, I do not understand the 
underlying task at hand, but:


A table with multiple columns of the same type smacks of designs that harken 
back to the days of mainframes. (STOP THAT!) The data described is a 
non-normalized array of integers that is meaningless outside of code. Table 
structures should be at least a LITTLE self-descriptive.

It is also not flexible (what if you suddenly need t51? how long would that 
table space adjustment take in production?) and space is wasted if not all 50 
columns are populated.

Use a design that is basically a storage area for name/value pairs:

create table dbo.googledocs_tbl (
    id bigint generated always as identity primary key, -- easy way to access a single record
    owner_id bigint not null,        -- fk to the owning parent record in another table, such as user
    owner_type char(2),              -- optional; identifies the owning table, making this table even more generic
    property_name varchar not null,  -- required unique name for the property, not an array reference (t1, t4, t50);
                                     -- names are controlled by the developer but should be human
                                     -- interpretable, so they can then be used in queries
    property_value int4 not null     -- the important value in question
);
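
For illustration only (the owner id, the owner_type code, and the property
name below are made up), fetching what would have been column t7 in the wide
design then looks like:

    select property_value
    from dbo.googledocs_tbl
    where owner_id = 12345                    -- the owning user
      and owner_type = 'US'                   -- hypothetical code for the user table
      and property_name = 'doc_share_count';  -- human-readable name instead of "t7"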

The fields owner_id, owner_type, and property_name together form a
three-column natural key that never changes, is unique, and is easily reached
via an index lookup.

Add a timestamp column if need be.
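
A minimal sketch of both of those points against the table above (the
constraint and column names here are mine, purely illustrative):

    -- enforce the three-column natural key
    alter table dbo.googledocs_tbl
        add constraint googledocs_owner_property_uq
        unique (owner_id, owner_type, property_name);

    -- optional audit column
    alter table dbo.googledocs_tbl
        add column updated_at timestamptz not null default now();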

You could then partition the table by owner_type or owner_id, or whatever else
comes to mind.
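
A rough sketch of the hash-on-owner_id variant, declared in place of the plain
CREATE TABLE above (PARTITION BY cannot be bolted onto an existing
non-partitioned table); the partition count of four is purely illustrative:

    create table dbo.googledocs_tbl (
        id bigserial,                    -- sequence-backed id; identity columns on
                                         -- partitioned tables need a recent PostgreSQL
        owner_id bigint not null,
        owner_type char(2),
        property_name varchar not null,
        property_value int4 not null,
        unique (owner_id, owner_type, property_name)  -- must include the partition key
    ) partition by hash (owner_id);

    create table dbo.googledocs_tbl_p0 partition of dbo.googledocs_tbl
        for values with (modulus 4, remainder 0);
    create table dbo.googledocs_tbl_p1 partition of dbo.googledocs_tbl
        for values with (modulus 4, remainder 1);
    create table dbo.googledocs_tbl_p2 partition of dbo.googledocs_tbl
        for values with (modulus 4, remainder 2);
    create table dbo.googledocs_tbl_p3 partition of dbo.googledocs_tbl
        for values with (modulus 4, remainder 3);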

Then you just have to figure out the best way to index this monster for 
optimized queries.
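
If the dominant query pattern really is "fetch properties for one owner", the
unique key sketched above already provides the index that serves those
lookups, so a reasonable starting point is a single extra index only if
queries also filter on a property's value (the index name and the example
predicate are made up):

    -- only worth adding if queries filter on a value across owners, e.g.
    -- "which owners have doc_share_count above 10?"
    create index googledocs_name_value_idx
        on dbo.googledocs_tbl (property_name, property_value);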



> On Dec 23, 2024, at 12:31 PM, Divyansh Gupta JNsThMAudy wrote:
> 
> Currently I haven't created those columns; I have created an addons_json column, 
> which is a JSONB column, and I am still in a discussion whether I should create 
> them or consider only the one JSONB column.
> 
> 
> On Tue, 24 Dec 2024, 12:00 am Divyansh Gupta JNsThMAudy wrote:
>> Range partitioning can help when you apply a filter for a specific range, but 
>> in my case I always need to filter on userid. I do have date columns, but there 
>> is little variation in the timestamps I have, which is why I didn't go for 
>> range partitioning.
>> 
>> 
>> On Mon, 23 Dec 2024, 11:57 pm Ron Johnson wrote:
>>> 
>>> 1. I bet you'd get better performance using RANGE partitioning.
>>> 2. Twenty million rows per userid is a LOT.  No subdivisions (like date 
>>> range)?
>>> 
>>> On Mon, Dec 23, 2024 at 1:23 PM Divyansh Gupta JNsThMAudy <ag1567...@gmail.com> wrote:
 Adrian, Please check this out;
 
 PARTITION BY HASH (userid);
 CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF 
 dbo.googledocs_tbl_clone  FOR VALUES WITH (modulus 84, remainder 0);
 ...
 CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF 
 dbo.googledocs_tbl_clone  FOR VALUES WITH (modulus 84, remainder 83);
 
 
 
 On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <ag1567...@gmail.com> wrote:
> Adrian, the partition is on userid, using hash partitioning with 84 partitions.
> 
> Ron, there could be more than 20 million records for a single userid; in that 
> case, if I create an index on userid only and not on the other columns, the 
> query takes more than 30 seconds to return the results.
> 
> 
> On Mon, 23 Dec 2024, 11:40 pm Ron Johnson wrote:
>> If your queries all reference userid, then you only need indices on gdid 
>> and userid.
>> 
>> On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <ag1567...@gmail.com> wrote:
>>> I have one confusion with this design: if I opt to create 50 columns, I 
>>> need to create 50 indexes, which will work with the userid index in a bitmap 
>>> scan; on the other hand, if I create a JSONB column, I need to create only a 
>>> single index?
>>> 
>>> 
>>> On Mon, 23 Dec 2024, 11:10 pm Ron Johnson wrote:
 Given what you just wrote, I'd stick with 50 separate t* columns.  
 Simplifies queries, simplifies updates, and eliminates JSONB 
 conversions.
 
 On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <ag1567...@gmail.com> wrote:
> Values can be updated based on customer actions.
> 
> All rows won't always have all 50 key value pairs; if I make those 
> keys into columns, the rows might have null values; on the other hand, if 
> it is JSONB, the key value pair will simply not be there.
> 
> Yes, in the UI customers can search for the key value pairs.
> 
> During data population the key value pair will be an empty array in the case 
> of the JSONB column, or NULL in the case of table columns; later, when the 
> customer performs some actions, that time the key val