Re: Quesion about querying distributed databases

Igor Korot Wed, 05 Mar 2025 19:03:43 -0800

Hi,

On Wed, Mar 5, 2025, 8:44 PM me nefcanto <sn.1...@gmail.com> wrote:


> I once worked with a monolithic SQL Server database with more than 10
> billion records and about 8 Terabytes of data. A single backup took us more
> than 21 days. It was a nightmare. Almost everybody knows that scaling up
> has a ceiling, but scaling out has no boundaries.
>

But then you did the backup incrementally correct?

That should not take the same amount of time...



> Therefore I will never choose a monolithic database design unless it's a
> small project. But my examples are just examples. We predict 100 million
> records per year. So we have to design accordingly. And it's not just sales
> records. Many applications have requirements that are cheap data but vast
> in multitude. Consider a language-learning app that wants to store the
> known words of any learner. 10 thousand learners each knowing 2 thousand
> words means 20 million records. Convert that to 100 thousand learners each
> knowing 7 thousand words and now you almost have a billion records. Cheap,
> but necessary. Let's not dive into telemetry or time-series data.
>

Can you try and see if 1 server with 3 different databases will do?

Having 1 table per database per server is too ugly.

Also please understand - every databae is different. And so it works and
operates differently. What work good in one may not work good in another...

Thank you.


> We initially chose to break the database into smaller databases, because
> it seemed natural for our modularized monolith architecture. And it worked
> great for SQL Server. If you're small, we host them all on one server. If
> you get bigger, we can put heavy databases on separate machines.
>
> However, I don't have experience working with other types of database
> scaling. I have used table partitioning, but I have never used sharding.
>
> Anyway, that's why I asked you guys. However, encouraging me to go back to
> monolith without giving solutions on how to scale, is not helping. To be
> honest, I'm somehow disappointed by how the most advanced open source
> database does not support cross-database querying just like how SQL Server
> does. But if it doesn't, it doesn't. Our team should either drop it as a
> choice or find a way (by asking the experts who built it or use it) how to
> design based on its features. That's why I'm asking.
>
> One thing that comes to my mind, is to use custom types. Instead of
> storing data in ItemCategories and ItemAttributes, store them as arrays in
> the relevant tables in the same database. But then it seems to me that in
> this case, Mongo would become a better choice because I lose the relational
> nature and normalization somehow. What drawbacks have you experienced in
> that sense?
>
> Regards
> Saeed
>
> On Wed, Mar 5, 2025 at 7:38 PM Adrian Klaver <adrian.kla...@aklaver.com>
> wrote:
>
>> On 3/5/25 04:15, me nefcanto wrote:
>> > Dear Laurenz, the point is that I think if we put all databases into
>> one
>> > database, then we have blocked our growth in the future.
>>
>> How?
>>
>> > A monolith database can be scaled only vertically. We have had huge
>> > headaches in the past with SQL Server on Windows and a single database.
>> > But when you divide bounded contexts into different databases, then you
>> > have the chance to deploy each database on a separate physical machine.
>> > That means a lot in terms of performance. Please correct me if I am
>> wrong.
>>
>> And you add the complexity of talking across machines, as well as
>> maintaining separate machines.
>>
>> >
>> > Let's put this physical restriction on ourselves that we have different
>> > databases. What options do we have? One option that comes to my mind,
>> is
>> > to store the ID of the categories in the Products table. This means
>> that
>> > I don't need FDW anymore. And databases can be on separate machines. I
>> > first query the categories database first, get the category IDs, and
>> > then add a where clause to limit the product search. That could be an
>> > option. Array data type in Postgres is something that I think other
>> > RDBMSs do not have. Will that work? And how about attributes? Because
>> > attributes are more than a single ID. I should store the attribute key,
>> > alongside its value. It's a key-value pair. What can I do for that?
>>
>> You seem to be going out of the way to make your life more complicated.
>>
>> The only way you are going to find an answer is set up test cases and
>> experiment. My bet is a single server with a single database and
>> multiple schemas is where you end up, after all that is where you are
>> starting from.
>>
>>
>> >
>> > Thank you for sharing your time. I really appreciate it.
>> > Saeed
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Mar 5, 2025 at 3:18 PM Laurenz Albe <laurenz.a...@cybertec.at
>> > <mailto:laurenz.a...@cybertec.at>> wrote:
>> >
>> >     On Wed, 2025-03-05 at 14:18 +0330, me nefcanto wrote:
>> >      > That means a solid monolith database. We lose many goodies with
>> that.
>> >      > As a real-world example, right now we can import a single
>> database
>> >      > from the production to the development to test and troubleshoot
>> data.
>> >
>> >     Well, can't you import a single schema then?
>> >
>> >      > What if we host all databases on the same server and use FDW.
>> What
>> >      > happens in that case? Does it return 100 thousand records and
>> join
>> >      > in the memory?
>> >
>> >     It will do just the same thing.  The performance could be better
>> >     because of the reduced latency.
>> >
>> >      > Because in SQL Server, when you perform a cross-database query
>> >      > (not cross-server) the performance is extremely good, proving
>> that
>> >      > it does not return 100 thousand ItemId from
>> Taxonomy.ItemCategories
>> >      > to join with ProductId.
>> >      >
>> >      > Is that the same case with Postgres too, If databases are located
>> >      > on one server?
>> >
>> >     No, you cannot perform cross-database queries without a foreign
>> >     data wrapper.  I don't see a reason why the statement shouldn't
>> >     perform as well as in SQL Server if you use schemas instead of
>> >     databases.
>> >
>> >     Yours,
>> >     Laurenz Albe
>> >
>>
>> --
>> Adrian Klaver
>> adrian.kla...@aklaver.com
>>
>>

Re: Quesion about querying distributed databases

Reply via email to