Re: refactoring basebackup.c

2022-03-08 Thread Jeevan Ladhe
Hi Robert,

> My proposed changes are largely cosmetic, but one thing that isn't is
> revising the size - pos <= bound tests to instead check size - pos <
> bound. My reasoning for that change is: if the number of bytes
> remaining in the buffer is exactly equal to the maximum number we can
> write, we don't need to flush it yet. If that sounds correct, we
> should fix the LZ4 code the same way.
>

I agree with your reasoning, and the patch looks good to me.
Yes, the LZ4 flush check should also be fixed; please find attached a
patch that fixes the LZ4 code.
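
For reference, the change is of this shape (just a sketch with names
borrowed from the surrounding sink code, not the exact hunk in the
attached patch): flush only when the space remaining in the output
buffer is strictly smaller than the worst-case output of the next
compression call.

if (mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written <
    mysink->output_buffer_bound)
{
    bbsink_archive_contents(mysink->base.bbs_next, mysink->bytes_written);
    mysink->bytes_written = 0;
}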

Regards,
Jeevan Ladhe


fix_lz4_flush_logic.patch
Description: Binary data


Re: refactoring basebackup.c

2022-03-08 Thread Jeevan Ladhe
>
> OK, committed all that stuff.
>

Thanks for the commit Robert.


> I think we also need to fix one other thing. Right now, for LZ4
> support we test HAVE_LIBLZ4, but TOAST and XLOG compression are
> testing USE_LZ4, so I think we should be doing the same here. And
> similarly I think we should be testing USE_ZSTD not HAVE_LIBZSTD.
>

I reviewed the patch, and it seems to be capturing and replacing all the
places of HAVE_LIB* with USE_* correctly.
Just curious, apart from consistency, do you see other problems as well
when testing one vs the other?

Regards,
Jeevan Ladhe


Re: refactoring basebackup.c

2022-03-08 Thread Jeevan Ladhe
ok got it. Thanks for your insights.

Regards,
Jeevan Ladhe

On Tue, 8 Mar 2022 at 22:23, Robert Haas  wrote:

> On Tue, Mar 8, 2022 at 11:32 AM Jeevan Ladhe 
> wrote:
> > I reviewed the patch, and it seems to be capturing and replacing all the
> > places of HAVE_LIB* with USE_* correctly.
> > Just curious, apart from consistency, do you see other problems as well
> > when testing one vs the other?
>
> So, the kind of problem you would worry about in a case like this is:
> suppose that configure detects LIBLZ4, but the user specifies
> --without-lz4. Then maybe there is some way for HAVE_LIBLZ4 to be
> true, while USE_LIBLZ4 is false, and therefore we should not be
> compiling code that uses LZ4 but do anyway. As configure.ac is
> currently coded, I think that's impossible, because we only search for
> liblz4 if the user says --with-lz4, and if they do that, then USE_LZ4
> will be set. Therefore, I don't think there is a live problem here,
> just an inconsistency.
>
> Probably still best to clean it up before an angry Andres chases me
> down, since I know he's working on the build system...
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: refactoring basebackup.c

2022-03-15 Thread Jeevan Ladhe
Thanks for the patch, Dipesh.
I had a look at the patch and also tried to take a backup. I have the
following suggestions and observations:

I get the following error at my end:

$ pg_basebackup -D /tmp/zstd_bk -Ft -Xfetch --compress=server-zstd:7@4
pg_basebackup: error: could not initiate base backup: ERROR:  could not
compress data: Unsupported parameter
pg_basebackup: removing data directory "/tmp/zstd_bk"

This is mostly because I have zstd library version v1.4.4, which does
not have default support for parallel workers. Maybe we should have a
better error, something that hints that parallelism is not supported
by the particular build.

The pg_verifybackup regression test 008_untar.pl also fails with a
similar error. Here, I think we should have some logic in the
regression test to skip it if the parameter is not supported.

+   if (ZSTD_isError(ret))
+       elog(ERROR,
+            "could not compress data: %s",
+            ZSTD_getErrorName(ret));

I think all of this can go on one line; in any case, we have to improve
the error message here.

Also, just a thought, for the versions where parallelism is not
supported, should we instead just throw a warning and fall back to
non-parallel behavior?
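
To illustrate, the worker-count setup could do something like the
following (only a sketch; apart from the libzstd calls, the names here
are illustrative):

    size_t      ret;

    ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers, workers);
    if (ZSTD_isError(ret))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("could not set compression worker count to %d: %s",
                        workers, ZSTD_getErrorName(ret)),
                 errhint("The loaded libzstd may have been built without "
                         "multithreading (ZSTD_MULTITHREAD) support.")));

For the fallback idea, the same check could instead emit a WARNING and
simply continue with the default of 0 workers.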

Regards,
Jeevan Ladhe

On Mon, 14 Mar 2022 at 21:41, Dipesh Pandit  wrote:

> Hi,
>
> I tried to implement support for parallel ZSTD compression. The
> library provides an option (ZSTD_c_nbWorkers) to specify the
> number of compression workers. The number of parallel
> workers can be set as part of compression parameter and if this
> option is specified then the library performs parallel compression
> based on the specified number of workers.
>
> User can specify the number of parallel worker as part of
> --compress option by appending an integer value after at sign (@).
> (-Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL][@WORKERS])
>
> Please find the attached patch v1 with the above changes.
>
> Note: ZSTD library version 1.5.x supports parallel compression
> by default and if the library version is lower than 1.5.x then
> parallel compression is enabled only the source is compiled with build
> macro ZSTD_MULTITHREAD. If the linked library version doesn't
> support parallel compression then setting the value of parameter
> ZSTD_c_nbWorkers to a value other than 0 will be no-op and
> returns an error.
>
> Thanks,
> Dipesh
>


Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation

2021-03-09 Thread Jeevan Ladhe
On Wed, Mar 10, 2021 at 10:44 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> Hi,
>
> While providing thoughts on [1], I observed that the error messages
> that are emitted while adding foreign, temporary and unlogged tables
> can be improved a bit from the existing [2] to [3].
>

+1 for improving the error messages here.


> Attaching a small patch. Thoughts?
>

I had a look at the patch and it looks good to me. However, now that you
have added the specific kind of table to the error message itself, the
error details seem to give redundant information; but others might have
different thoughts.

The patch itself looks good otherwise. Also, make check and the
postgres_fdw check look good.

Regards,
Jeevan Ladhe


Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-09 Thread Jeevan Ladhe
Hi,

On Mon, Nov 9, 2020 at 10:15 PM Alexey Kondratov 
wrote:

> Hi Hackers,
>
> Today I have accidentally noticed that autoprewarm feature of pg_prewarm
> used TimestampDifference()'s results in a wrong way.
>
> First, it used *seconds* result from it as a *milliseconds*. It was
> causing it to make dump file autoprewarm.blocks ~every second with
> default setting of autoprewarm_interval = 300s.
>
>
I had a look at this, and I agree that this is an issue. I also had a look
at patch 0002, and it looks good to me.

In patch 0003 there is a typo:

+ /* We have to sleep even after a successfull dump */

s/successfull/successful


Regards,
Jeevan Ladhe


Re: [PATCH] improve the pg_upgrade error message

2021-12-01 Thread Jeevan Ladhe
On Wed, Dec 1, 2021 at 3:45 PM Daniel Gustafsson  wrote:

> > On 1 Dec 2021, at 10:59, Jeevan Ladhe 
> wrote:
>
> > Was wondering if we had any barriers to getting this committed.
>
> No barrier other than available time to, I will try to get to it shortly.
>

Great! Thank you.


> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: refactoring basebackup.c

2021-12-27 Thread Jeevan Ladhe
Hi Tushar,

You need to apply Robert's v10 version patches 0002, 0003 and 0004 before
applying the LZ4 patch (v8 version).
Please let me know if you still face any issues.

Regards,
Jeevan Ladhe

On Mon, Dec 27, 2021 at 7:01 PM tushar 
wrote:

> On 11/22/21 11:05 PM, Jeevan Ladhe wrote:
> > Please find the lz4 compression patch here that basically has:
> Thanks, Could you please rebase your patch, it is failing at my end -
>
> [edb@centos7tushar pg15_lz]$ git apply /tmp/v8-0001-LZ4-compression.patch
> error: patch failed: doc/src/sgml/ref/pg_basebackup.sgml:230
> error: doc/src/sgml/ref/pg_basebackup.sgml: patch does not apply
> error: patch failed: src/backend/replication/Makefile:19
> error: src/backend/replication/Makefile: patch does not apply
> error: patch failed: src/backend/replication/basebackup.c:64
> error: src/backend/replication/basebackup.c: patch does not apply
> error: patch failed: src/include/replication/basebackup_sink.h:285
> error: src/include/replication/basebackup_sink.h: patch does not apply
>
> --
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company
>
>


Re: Data type correction in pgstat_report_replslot function parameters

2021-04-01 Thread Jeevan Ladhe
On Thu, Apr 1, 2021 at 10:20 PM vignesh C  wrote:

> Hi,
>
> While I was reviewing replication slot statistics code, I found one
> issue in the data type used for pgstat_report_replslot function
> parameters. We pass int64 variables to the function but the function
> prototype uses int type. I I felt the function parameters should be
> int64. Attached patch fixes the same.


+1 for the change. The patch LGTM.

Regards,
Jeevan Ladhe


Re: refactoring basebackup.c

2021-10-14 Thread Jeevan Ladhe
Thanks, Robert for reviewing the patch.


On Tue, Oct 12, 2021 at 11:09 PM Robert Haas  wrote:

> This is the only place where CHUNK_SIZE gets used, and I don't think I
> see any point to it. I think the 5th argument to LZ4F_compressUpdate
> could just be avail_in. And as soon as you do that then I think
> bbsink_lz4_archive_contents() no longer needs to be a loop.
>

Agree. Removed the CHUNK_SIZE and the loop.


>
> + /* First of all write the frame header to destination buffer. */
> + headerSize = LZ4F_compressBegin(mysink->ctx,
> + mysink->base.bbs_next->bbs_buffer,
> + mysink->base.bbs_next->bbs_buffer_length,
> + &mysink->prefs);
>
> + compressedSize = LZ4F_compressEnd(mysink->ctx,
> + mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
> + mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
> + NULL);
>
> I think there's some issue with these two chunks of code. What happens
> if one of these functions wants to write more data than will fit in
> the output buffer? It seems like either there needs to be some code
> someplace that ensures adequate space in the output buffer at the time
> of these calls, or else there needs to be a retry loop that writes as
> much of the data as possible, flushes the output buffer, and then
> loops to generate more output data. But there's clearly no retry loop
> here, and I don't see any code that guarantees that the output buffer
> has to be large enough (and in the case of LZ4F_compressEnd, have
> enough remaining space) either. In other words, all the same concerns
> that apply to LZ4F_compressUpdate() also apply here ... but in
> LZ4F_compressUpdate() you seem to BOTH have a retry loop and ALSO code
> to make sure that the buffer is certain to be large enough (which is
> more than you need, you only need one of those) and here you seem to
> have NEITHER of those things (which is not enough, you need one or the
> other).
>

Fair enough. I have made the change in bbsink_lz4_begin_backup() to
make sure we reserve enough extra bytes for the header and the footer,
which are written by LZ4F_compressBegin() and LZ4F_compressEnd()
respectively. LZ4F_compressBound(), when passed an input size of 0, gives
the upper bound for the output buffer needed by LZ4F_compressEnd().

> How about instead using memset() to zero the whole thing and then
> omitting the zero initializations? That seems like it would be less
> fragile, if the upstream structure definition ever changes.
>

Made this change.

Please review the patch, and let me know your comments.

Regards,
Jeevan Ladhe


lz4_compress_v5.patch
Description: Binary data


Re: refactoring basebackup.c

2021-10-15 Thread Jeevan Ladhe
Hi Robert,

> The loop is gone, but CHUNK_SIZE itself seems to have evaded the
> executioner.

I am sorry, but I did not really get that. Or is it what you have pointed
out in the following paragraphs?

> I think this is not the best way to accomplish the goal. Adding
> LZ4F_compressBound(0) to next_buf_len makes the buffer substantially
> bigger for something that's only going to happen once.

Yes, you are right. I missed this.

> We are assuming in any case, I think, that LZ4F_compressBound(0) <=
> LZ4F_compressBound(mysink->base.bbs_buffer_length), so all you need to
> do is have bbsink_end_archive() empty the buffer, if necessary, before
> calling LZ4F_compressEnd().

This is a fair enough assumption.

> With just that change, you can set
> next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
> but that's also more than you need. You can instead do next_buf_len =
> Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
> probably thinking that won't work, because bbsink_lz4_begin_archive()
> could fill up the buffer partway, and then the first call to
> bbsink_lz4_archive_contents() could overrun it. But that problem can
> be solved by reversing the order of operations in
> bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
> test whether you need to empty the buffer first, and if so, do it.

I am still not able to get - how can we survive with a mere
size of Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound).
LZ4F_HEADER_SIZE_MAX is defined as 19 in lz4 library. With this
proposal, it is almost guaranteed that the next buffer length will
be always set to 19, which will result in failure of a call to
LZ4F_compressUpdate() with the error LZ4F_ERROR_dstMaxSize_tooSmall,
even if we had called bbsink_archive_contents() before.

> That's actually less confusing than the way you've got it, because as
> you have it written, we don't really know why we're emptying the
> buffer -- is it to prepare for the next call to LZ4F_compressUpdate(),
> or is it to prepare for the call to LZ4F_compressEnd()? How do we know
> now how much space the next person writing into the buffer is going to
> need? It seems better if bbsink_lz4_archive_contents() empties the
> buffer before calling LZ4F_compressUpdate() if that call might not
> have enough space, and likewise bbsink_lz4_end_archive() empties the
> buffer before calling LZ4F_compressEnd() if that's needed. That way,
> each callback makes the space *it* needs, not the space the *next*
> caller needs. (bbsink_lz4_end_archive() still needs to ALSO empty the
> buffer after LZ4F_compressEnd(), so we don't orphan any data.)

Sure, I get your point here.
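
So, IIUC, bbsink_lz4_archive_contents() would end up looking roughly like
this (a sketch only; the names follow the patch's style but may not match
the final code exactly):

static void
bbsink_lz4_archive_contents(bbsink *sink, size_t len)
{
    bbsink_lz4 *mysink = (bbsink_lz4 *) sink;
    size_t      compressed;

    /*
     * If the space left in the output buffer might not hold the worst-case
     * output of this call, empty the buffer first.
     */
    if (mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written <
        mysink->output_buffer_bound)
    {
        bbsink_archive_contents(mysink->base.bbs_next, mysink->bytes_written);
        mysink->bytes_written = 0;
    }

    compressed = LZ4F_compressUpdate(mysink->ctx,
                                     mysink->base.bbs_next->bbs_buffer +
                                     mysink->bytes_written,
                                     mysink->base.bbs_next->bbs_buffer_length -
                                     mysink->bytes_written,
                                     mysink->base.bbs_buffer, len, NULL);
    if (LZ4F_isError(compressed))
        elog(ERROR, "could not compress data: %s",
             LZ4F_getErrorName(compressed));
    mysink->bytes_written += compressed;
}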

> On another note, if the call to LZ4F_freeCompressionContext() is
> required in bbsink_lz4_end_archive(), then I think this code is going
> to just leak the memory used by the compression context if an error
> occurs before this code is reached. That kind of sucks.

Yes, the LZ4F_freeCompressionContext() is needed to clear the
LZ4F_cctx. The structure LZ4F_cctx_s maintains internal stages
of compression, internal buffers, etc.

> The way to fix
> it, I suppose, is a TRY/CATCH block, but I don't think that can be
> something internal to basebackup_lz4.c: I think the bbsink stuff would
> need to provide some kind of infrastructure for basebackup_lz4.c to
> use. It would be a lot better if we could instead get LZ4 to allocate
> memory using palloc(), but a quick Google search suggests that you
> can't accomplish that without recompiling liblz4, and that's not
> workable since we don't want to require a liblz4 built specifically
> for PostgreSQL. Do you see any other solution?

You mean the way gzip allows us to use our own alloc and free functions
by means of providing the function pointers for them. Unfortunately,
no, LZ4 does not have that kind of provision. Maybe that makes a
good proposal for LZ4 library ;-).
I cannot think of another solution to it right away.
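
For reference, this is the kind of hook zlib exposes (a small illustrative
sketch; the palloc-based wrappers are hypothetical and not from any
patch):

#include <zlib.h>

static voidpf
gzip_palloc(voidpf opaque, uInt items, uInt size)
{
    return palloc((size_t) items * size);
}

static void
gzip_pfree(voidpf opaque, voidpf address)
{
    pfree(address);
}

    ...
    z_stream   *zs = &mysink->zstream;

    zs->zalloc = gzip_palloc;
    zs->zfree = gzip_pfree;
    zs->opaque = Z_NULL;
    if (deflateInit2(zs, compresslevel, Z_DEFLATED, MAX_WBITS + 16, 8,
                     Z_DEFAULT_STRATEGY) != Z_OK)
        elog(ERROR, "could not initialize compression library");

Whereas, as noted above, LZ4F_createCompressionContext() offers no
equivalent of zalloc/zfree, so its allocations stay outside our
memory-context machinery.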

Regards,
Jeevan Ladhe


Re: refactoring basebackup.c

2021-10-20 Thread Jeevan Ladhe
Hi Robert,

> I mean #define CHUNK_SIZE is still in the patch.
>

Oops, removed now.


> > > With just that change, you can set
> > > next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
> > > but that's also more than you need. You can instead do next_buf_len =
> > > Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
> > > probably thinking that won't work, because bbsink_lz4_begin_archive()
> > > could fill up the buffer partway, and then the first call to
> > > bbsink_lz4_archive_contents() could overrun it. But that problem can
> > > be solved by reversing the order of operations in
> > > bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
> > > test whether you need to empty the buffer first, and if so, do it.
> >
> > I am still not able to get - how can we survive with a mere
> > size of Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound).
> > LZ4F_HEADER_SIZE_MAX is defined as 19 in lz4 library. With this
> > proposal, it is almost guaranteed that the next buffer length will
> > be always set to 19, which will result in failure of a call to
> > LZ4F_compressUpdate() with the error LZ4F_ERROR_dstMaxSize_tooSmall,
> > even if we had called bbsink_archive_contents() before.
>
> Sorry, should have been Max(), not Min().
>

Ahh ok.
I looked into the code of LZ4F_compressBound(), and the header size is
already included in its calculation, so when we compare
LZ4F_HEADER_SIZE_MAX and mysink->output_buffer_bound, the latter will
always be greater, and hence a sufficient length for the output buffer. I
made this change in the attached patch, and also cleared the buffer by
calling bbsink_archive_contents() before calling LZ4F_compressUpdate().
Similarly, I cleared the buffer before calling LZ4F_compressEnd().
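
With that, the end-archive path looks roughly like this (an abbreviated
sketch, not the exact patch code; error handling and the trailing
end-of-archive forwarding are trimmed):

    /*
     * Make room for the footer if needed; LZ4F_compressBound(0, &prefs)
     * is its worst case.
     */
    if (mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written <
        LZ4F_compressBound(0, &mysink->prefs))
    {
        bbsink_archive_contents(mysink->base.bbs_next, mysink->bytes_written);
        mysink->bytes_written = 0;
    }

    compressedSize = LZ4F_compressEnd(mysink->ctx,
                                      mysink->base.bbs_next->bbs_buffer +
                                      mysink->bytes_written,
                                      mysink->base.bbs_next->bbs_buffer_length -
                                      mysink->bytes_written,
                                      NULL);
    mysink->bytes_written += compressedSize;

    /* flush again so no compressed bytes are orphaned in our buffer */
    bbsink_archive_contents(mysink->base.bbs_next, mysink->bytes_written);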

> > You mean the way gzip allows us to use our own alloc and free functions
> > by means of providing the function pointers for them. Unfortunately,
> > no, LZ4 does not have that kind of provision. Maybe that makes a
> > good proposal for LZ4 library ;-).
> > I cannot think of another solution to it right away.
>
> OK. Will give it some thought.


I have started a thread[1] in the LZ4 community for this, but so far there
has been no reply.

Regards,
Jeevan Ladhe

[1]
https://groups.google.com/g/lz4c/c/WnJkKwBWlcM/m/zszrla2mBQAJ?utm_medium=email&utm_source=footer


lz4_compress_v6.patch
Description: Binary data


Re: refactoring basebackup.c

2021-10-29 Thread Jeevan Ladhe
Thanks, Robert for the patches.

I tried to take a backup using gzip compression and got a core.

$ pg_basebackup -t server:/tmp/data_gzip -Xnone --server-compression=gzip
NOTICE:  WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

The backtrace:

(gdb) bt
#0  0x in ?? ()
#1  0x558264bfc40a in bbsink_cleanup (sink=0x55826684b5f8) at
../../../src/include/replication/basebackup_sink.h:268
#2  0x558264bfc838 in bbsink_forward_cleanup (sink=0x55826684b710) at
basebackup_sink.c:124
#3  0x558264bf4cab in bbsink_cleanup (sink=0x55826684b710) at
../../../src/include/replication/basebackup_sink.h:268
#4  0x558264bf7738 in SendBaseBackup (cmd=0x55826683bd10) at
basebackup.c:1020
#5  0x558264c10915 in exec_replication_command (
cmd_string=0x5582667bc580 "BASE_BACKUP ( LABEL 'pg_basebackup base
backup',  PROGRESS,  MANIFEST 'yes',  TABLESPACE_MAP,  TARGET 'server',
 TARGET_DETAIL '/tmp/data_g
zip',  COMPRESSION 'gzip')") at walsender.c:1731
#6  0x558264c8a69b in PostgresMain (dbname=0x5582667e84d8 "",
username=0x5582667e84b8 "hadoop") at postgres.c:4493
#7  0x558264bb10a6 in BackendRun (port=0x5582667de160) at
postmaster.c:4560
#8  0x558264bb098b in BackendStartup (port=0x5582667de160) at
postmaster.c:4288
#9  0x558264bacb55 in ServerLoop () at postmaster.c:1801
#10 0x558264bac2ee in PostmasterMain (argc=3, argv=0x5582667b68c0) at
postmaster.c:1473
#11 0x558264aa0950 in main (argc=3, argv=0x5582667b68c0) at main.c:198

bbsink_gzip_ops have the cleanup() callback set to NULL, and when the
bbsink_cleanup() callback is triggered, it tries to invoke a function that
is NULL. I think either bbsink_gzip_ops should set the cleanup callback
to bbsink_forward_cleanup or we should be calling the cleanup() callback
from PG_CATCH instead of PG_FINALLY()? But in the latter case, even if
we call from PG_CATCH, it will have a similar problem for gzip and other
sinks which may not need a custom cleanup() callback in case there is any
error before the backup could finish up normally.

I have attached a patch to fix this.
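
The gist of the fix is just to give the gzip sink a cleanup callback as
well, e.g. (a sketch; the other members of the ops struct are elided):

static const bbsink_ops bbsink_gzip_ops = {
    ...
    .cleanup = bbsink_forward_cleanup
};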

Thoughts?

Regards,
Jeevan Ladhe

On Tue, Oct 26, 2021 at 1:45 AM Robert Haas  wrote:

> On Fri, Oct 15, 2021 at 8:05 AM Robert Haas  wrote:
> > > You mean the way gzip allows us to use our own alloc and free functions
> > > by means of providing the function pointers for them. Unfortunately,
> > > no, LZ4 does not have that kind of provision. Maybe that makes a
> > > good proposal for LZ4 library ;-).
> > > I cannot think of another solution to it right away.
> >
> > OK. Will give it some thought.
>
> Here's a new patch set. I've tried adding a "cleanup" callback to the
> bbsink method and ensuring that it gets called even in case of an
> error. The code for that is untested since I have no use for it with
> the existing basebackup sink types, so let me know how it goes when
> you try to use it for LZ4.
>
> I've also added documentation for the new pg_basebackup options in
> this version, and I fixed up a couple of these patches to be
> pgindent-clean when they previously were not.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


fix_gzip_cleanup_callback_core.patch
Description: Binary data


Re: refactoring basebackup.c

2021-11-02 Thread Jeevan Ladhe
I have implemented the cleanup callback bbsink_lz4_cleanup() in the
attached patch.

Please have a look and let me know of any comments.

Regards,
Jeevan Ladhe

On Fri, Oct 29, 2021 at 6:54 PM Robert Haas  wrote:

> On Fri, Oct 29, 2021 at 8:59 AM Jeevan Ladhe
>  wrote:>
> > bbsink_gzip_ops have the cleanup() callback set to NULL, and when the
> > bbsink_cleanup() callback is triggered, it tries to invoke a function
> that
> > is NULL. I think either bbsink_gzip_ops should set the cleanup callback
> > to bbsink_forward_cleanup or we should be calling the cleanup() callback
> > from PG_CATCH instead of PG_FINALLY()? But in the latter case, even if
> > we call from PG_CATCH, it will have a similar problem for gzip and other
> > sinks which may not need a custom cleanup() callback in case there is any
> > error before the backup could finish up normally.
> >
> > I have attached a patch to fix this.
>
> Yes, this is the right fix. Apologies for the oversight.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


lz4_compress_v7.patch
Description: Binary data


Re: Teach pg_receivewal to use lz4 compression

2021-11-18 Thread Jeevan Ladhe
In dir_open_for_write() I observe that we are writing the header
and then calling LZ4F_compressEnd() in case there is an error
while writing the buffer to the file, and the output buffer of
LZ4F_compressEnd() is not written anywhere. Why should this be
necessary? To flush the contents of the internal buffer? But, then we
are calling LZ4F_freeCompressionContext() immediately after the
LZ4F_compressEnd() call. I might be missing something, will be
happy to get more insights.

Regards,
Jeevan Ladhe

On Fri, Nov 5, 2021 at 1:21 PM  wrote:

>
>
> ‐‐‐ Original Message ‐‐‐
>
> On Friday, November 5th, 2021 at 3:47 AM, Michael Paquier <
> mich...@paquier.xyz> wrote:
>
> >
> > I have spent an extra couple of hours staring at the code, and the
> > whole looked fine, so applied. While on it, I have tested the new TAP
> > tests with all the possible combinations of --without-zlib and
> > --with-lz4.
>
> Great news. Thank you very much.
>
> Cheers,
> //Georgios
>
> > --
> > Michael
>
>
>


Re: Teach pg_receivewal to use lz4 compression

2021-11-21 Thread Jeevan Ladhe
On Fri, Nov 19, 2021 at 7:37 AM Michael Paquier  wrote:

> On Thu, Nov 18, 2021 at 07:54:37PM +0530, Jeevan Ladhe wrote:
> > In dir_open_for_write() I observe that we are writing the header
> > and then calling LZ4F_compressEnd() in case there is an error
> > while writing the buffer to the file, and the output buffer of
> > LZ4F_compressEnd() is not written anywhere. Why should this be
> > necessary? To flush the contents of the internal buffer? But, then we
> > are calling LZ4F_freeCompressionContext() immediately after the
> > LZ4F_compressEnd() call. I might be missing something, will be
> > happy to get more insights.
>
> My concern here was symmetry, where IMO it makes sense to have a
> compressEnd call each time there is a successful compressBegin call
> done for the LZ4 state data, as there is no way to know if in the
> future LZ4 won't change some of its internals to do more than just an
> internal flush.
>

Fair enough. But, still I have a doubt in mind what benefit would that
really bring to us here, because we are immediately also freeing the
lz4buf without using it anywhere.

Regards,
Jeevan


Re: refactoring basebackup.c

2021-11-22 Thread Jeevan Ladhe
Hi Robert,

Please find the lz4 compression patch here that basically has:

1. Documentation
2. pgindent run over it.
3. your comments addressed for using "+="

I have not included the compression level per your comment below:
-
> "On second thought, maybe we don't need to do this. There's a thread on
> "Teach pg_receivewal to use lz4 compression" which concluded that
> supporting different compression levels was unnecessary."
-

Regards,
Jeevan Ladhe

On Wed, Nov 17, 2021 at 3:17 AM Robert Haas  wrote:

> On Mon, Nov 15, 2021 at 2:23 PM Robert Haas  wrote:
> > Yeah, that's what it should be doing. I'll commit a fix, thanks for
> > the report and diagnosis.
>
> Here's a new patch set.
>
> 0001 - When I committed the patch to add the missing 2 blocks of zero
> bytes to the tar archives generated by the server, I failed to adjust
> the documentation. So 0001 does that. This is the only new patch in
> the series. I was not sure whether to just remove the statement from
> the documentation saying that those blocks aren't included, or whether
> to mention that we used to include them and no longer do. I went for
> the latter; opinions welcome.
>
> 0002 - This adds a new COPY subprotocol for taking base backups. I've
> improved it over the previous version by adding documentation. I'm
> still seeking comments on the points I raised in
>
> http://postgr.es/m/CA+TgmobrOXbDh+hCzzVkD3weV3R-QRy3SPa=frb_rv9wf5i...@mail.gmail.com
> but what I'm leaning toward doing is committing the patch as is and
> then submitting - or maybe several patches - later to rip some this
> and a few other old things out. That way the debate - or lack thereof
> - about what to do here doesn't have to block the main patch set, and
> also, it feels safer to make removing the existing stuff a separate
> effort rather than doing it now.
>
> 0003 - This adds "server" and "blackhole" as backup targets. In this
> version, I've improved the documentation. Also, the previous version
> only let you use a backup target with -Xnone, and I realized that was
> stupid. -Xfetch is OK too. -Xstream still doesn't work, since that's
> implemented via client-side logic. I think this still needs some work
> to be committable, like adding tests, but I don't expect to make any
> major changes.
>
> 0004 - Server-side gzip compression. Similar level of maturity to 0003.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


v8-0001-LZ4-compression.patch
Description: Binary data


Re: Teach pg_receivewal to use lz4 compression

2021-11-24 Thread Jeevan Ladhe
On Wed, Nov 24, 2021 at 10:55 AM Michael Paquier 
wrote:

> On Mon, Nov 22, 2021 at 09:02:47AM -0500, Robert Haas wrote:
> > On Mon, Nov 22, 2021 at 12:46 AM Jeevan Ladhe
> >  wrote:
> >> Fair enough. But, still I have a doubt in mind what benefit would that
> >> really bring to us here, because we are immediately also freeing the
> >> lz4buf without using it anywhere.
> >
> > Yeah, I'm also doubtful about that. If we're freeng the compression
> > context, we shouldn't need to guarantee that it's in any particular
> > state before doing so. Why would any critical cleanup be part of
> > LZ4F_compressEnd() rather than LZ4F_freeCompressionContext()? The
> > point of LZ4F_compressEnd() is to make sure all of the output bytes
> > get written, and it would be stupid to force people to write the
> > output bytes even when they've decided that they no longer care about
> > them due to some error.
>
> Hmm.  I have double-checked all that, and I agree that we could just
> skip LZ4F_compressEnd() in this error code path.  From what I can see
> in the upstream code, what we have now is not broken either, but the
> compressEnd() call does some work that's not needed here.


Yes, I agree that it is not broken, but as you said we are doing an extra
bit of work here.

Regards,
Jeevan Ladhe


Re: [PATCH] improve the pg_upgrade error message

2021-12-01 Thread Jeevan Ladhe
Hi Daniel,

Was wondering if we had any barriers to getting this committed.
I believe it will be good to have this change, and it will also be more in
line with the other check functions.

Regards,
Jeevan

On Thu, Oct 21, 2021 at 3:51 PM Daniel Gustafsson  wrote:

> > On 14 Jul 2021, at 07:27, Suraj Kharage 
> wrote:
>
> > Overall patch looks good to me.
>
> Agreed, I think this is a good change and in line with how the check
> functions
> work in general.
>
> > Instead of giving suggestion about updating the pg_database catalog, can
> we give "ALTER DATABASE  ALLOW_CONNECTIONS true;" command?
>
> I would actually prefer to not give any suggestions at all, we typically
> don't
> in these error messages.  Since there are many ways to do it (dropping the
> database being one) I think leaving that to the user is per application
> style.
>
> > Also, it would be good if we give 2 spaces after full stop in an error
> message.
>
> Correct, fixed in the attached which also tweaks the language slightly to
> match
> other errors.
>
> I propose to commit the attached, which also adds a function comment while
> there, unless there are objections.
>
> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

2018-04-01 Thread Jeevan Ladhe
Hi,

> I noticed that there were no tests covering this case causing 4dba331cb3
> to not notice this failure in the first place.  I updated your patch to
> add a few tests.  Also, I revised the comment changed by your patch a bit.
>

1. A minor typo:

+-- check that violating rows are correctly reported when attching as the
s/attching/attaching


2. I think the following part of the test is already covered:

+-- trying to add a partition for 2 should fail because the default
+-- partition contains a row that would violate its new constraint which
+-- prevents rows containing 2
+create table defpart_attach_test2 partition of defpart_attach_test for
values in (2);
+ERROR:  updated partition constraint for default partition
"defpart_attach_test_d" would be violated by some row
+drop table defpart_attach_test;

IIUC, the test in create_table covers the same scenario as the above:

-- check default partition overlap
INSERT INTO list_parted2 VALUES('X');
CREATE TABLE fail_part PARTITION OF list_parted2 FOR VALUES IN ('W', 'X',
'Y');
ERROR:  updated partition constraint for default partition
"list_parted2_def" would be violated by some row

Regards,
Jeevan Ladhe


Re: let's kill AtSubStart_Notify

2019-09-27 Thread Jeevan Ladhe
Hi Robert,

> Generally, a subsystem can avoid needing a callback at subtransaction
> start (or transaction start) by detecting new levels of
> subtransactions at time of use.


Yes I agree with this argument.


> A typical practice is to maintain a
> stack which has entries only for those transaction nesting levels
> where the functionality was used. The attached patch implements this
> method for async.c.


I have reviewed your patch, and it seems to correctly implement the
per-subtransaction actions using a stack. At least, I could not find
any flaw in your implementation here.
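
For anyone following along, the pattern under discussion is roughly the
following (an illustrative sketch only, not the actual async.c code):

typedef struct NotifyActionStackEntry
{
    int         nestingLevel;   /* subtransaction level that used NOTIFY */
    List       *pendingActions; /* actions queued at that level */
} NotifyActionStackEntry;

static List *actionStack = NIL; /* innermost level first */

static NotifyActionStackEntry *
GetCurrentEntry(void)
{
    int         level = GetCurrentTransactionNestLevel();
    NotifyActionStackEntry *top =
        actionStack == NIL ? NULL : linitial(actionStack);

    /* push a new entry lazily, only on first use at this level */
    if (top == NULL || top->nestingLevel < level)
    {
        top = palloc0(sizeof(NotifyActionStackEntry));
        top->nestingLevel = level;
        actionStack = lcons(top, actionStack);
    }
    return top;
}

At subtransaction commit or abort, only entries whose nestingLevel matches
the exiting level need to be merged into the parent or discarded, so
levels that never touched NOTIFY cost nothing.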


> I was a little surprised to find that it makes a
> pretty noticeable performance difference when starting and ending
> trivial subtransactions.  I used this test case:
>
> \timing
> do $$begin for i in 1 .. 1000 loop begin null; exception when
> others then null; end; end loop; end;$$;
>

I ran your testcase and on my VM I get numbers like 3593.801 ms
without patch and 3593.801 with the patch, average of 5 runs each.
The runs were quite consistent.

Furthermore, make check also passes fine.

Regards,
Jeevan Ladhe


Re: let's kill AtSubStart_Notify

2019-09-27 Thread Jeevan Ladhe
>
> I did not read the patch but run the same case what you have given and
> I can see the similar improvement with the patch.
> With the patch 8832.988, without the patch 10252.701ms (median of three
> reading)
>

Possibly you had debug symbols enabled? With debug symbols enabled,
I also get similar numbers: 10136.839 ms with the patch vs 12900.044 ms
without the patch.

Regards,
Jeevan Ladhe


Re: let's kill AtSubStart_Notify

2019-09-27 Thread Jeevan Ladhe
Correction -

On Fri, Sep 27, 2019 at 3:11 PM Jeevan Ladhe 
wrote:

> I ran your testcase and on my VM I get numbers like 3593.801 ms
> without patch and 3593.801 with the patch, average of 5 runs each.
> The runs were quite consistent.
>

That is, 3593.801 ms without the patch and 3213.809 ms with the patch,
approx. 10% gain.

Regards,
Jeevan Ladhe


Re: WIP/PoC for parallel backup

2019-10-16 Thread Jeevan Ladhe
I quickly tried to have a look at your 0001-refactor patch.
Here are some comments:

1. The patch fails to compile.

Sorry if I am missing something, but I am not able to understand why, in
the new function collectTablespaces(), you have added an extra NULL
parameter when calling sendTablespace(); it fails the compilation:

+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;


gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
-Wno-unused-command-line-argument -g -g -O0 -Wall -Werror
-I../../../../src/include -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po
xlog.c:12253:59: error: too many arguments to function call, expected 2,
have 3
ti->size = infotbssize ? sendTablespace(fullpath, true,
NULL) : -1;
 ~~ ^~~~

2. I think the patch needs to be run through pgindent. It does not follow
the 80-column width.
e.g.

+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile, bool
infotbssize, bool needtblspcmapfile)
+{

3.
The comments in the refactored code appear to be redundant. For example,
the following comment:
 /* Setup and activate network throttling, if client requested it */
appears thrice in the code: before calling setup_throttle(), in the
prologue of the function setup_throttle(), and above the if() in that
function. Similarly for the comment:
/* Collect information about all tablespaces */
in collectTablespaces().

4.
In the function include_wal_files(), why is the TimeLineID parameter
(endtli) needed? I don't see it being used in the function at all. I think
you can safely get rid of it.

+include_wal_files(XLogRecPtr endptr, TimeLineID endtli)

Regards,
Jeevan Ladhe

On Wed, Oct 16, 2019 at 6:49 PM Asif Rehman  wrote:

>
>
> On Mon, Oct 7, 2019 at 6:35 PM Asif Rehman  wrote:
>
>>
>>
>> On Mon, Oct 7, 2019 at 6:05 PM Robert Haas  wrote:
>>
>>> On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman 
>>> wrote:
>>> > Sure. Though the backup manifest patch calculates and includes the
>>> checksum of backup files and is done
>>> > while the file is being transferred to the frontend-end. The manifest
>>> file itself is copied at the
>>> > very end of the backup. In parallel backup, I need the list of
>>> filenames before file contents are transferred, in
>>> > order to divide them into multiple workers. For that, the manifest
>>> file has to be available when START_BACKUP
>>> >  is called.
>>> >
>>> > That means, backup manifest should support its creation while
>>> excluding the checksum during START_BACKUP().
>>> > I also need the directory information as well for two reasons:
>>> >
>>> > - In plain format, base path has to exist before we can write the
>>> file. we can extract the base path from the file
>>> > but doing that for all files does not seem a good idea.
>>> > - base backup does not include the content of some directories but
>>> those directories although empty, are still
>>> > expected in PGDATA.
>>> >
>>> > I can make these changes part of parallel backup (which would be on
>>> top of backup manifest patch) or
>>> > these changes can be done as part of manifest patch and then parallel
>>> can use them.
>>> >
>>> > Robert what do you suggest?
>>>
>>> I think we should probably not use backup manifests here, actually. I
>>> initially thought that would be a good idea, but after further thought
>>> it seems like it just complicates the code to no real benefit.
>>
>>
>> Okay.
>>
>>
>>>   I
>>> suggest that the START_BACKUP command just return a result set, like a
>>> query, with perhaps four columns: file name, file type ('d' for
>>> directory or 'f' for file), file size, file mtime. pg_basebackup will
>>> ignore the mtime, but some other tools might find that useful
>>> information.
>>>
>> yes current patch already returns the result set. will add the additional
>> information.
>>
>>
>>> I wonder if we should also split START_BACKUP (which should enter
>>> non-exclusive backup mode) from GET_FILE_LIST, in case some other
>>> client program wants to use one of those but not the other.  I think
>>> that's probably a good idea, but not sure.
>>>
>>
>> Currently pg_basebackup does not enter in exclusive backup mode and other
>> tools have to
>> use pg_start_backup() and pg_stop_backup() func

Re: pgbench - refactor init functions with buffers

2019-10-22 Thread Jeevan Ladhe
>
>
> I haven't read the complete patch.  But, I have noticed that many
> places you changed the variable declaration from c to c++ style (i.e
> moved the declaration in the for loop).  IMHO, generally in PG, we
> don't follow this convention.  Is there any specific reason to do
> this?
>

+1.

The patch does not apply on master, needs rebase.
Also, I got some whitespace errors.

I think you can also refactor the function tryExecuteStatement(), and
call your newly added function executeStatementExpect() by passing
an additional flag something like "errorOK".
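
For instance, something along these lines (a sketch; the final signature
of executeStatementExpect() may differ):

static void
executeStatement(PGconn *con, const char *sql)
{
    executeStatementExpect(con, sql, PGRES_COMMAND_OK, false);
}

static void
tryExecuteStatement(PGconn *con, const char *sql)
{
    /* errorOK = true: report the problem but keep going */
    executeStatementExpect(con, sql, PGRES_COMMAND_OK, true);
}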

Regards,
Jeevan Ladhe


Re: pgbench - refactor init functions with buffers

2019-10-22 Thread Jeevan Ladhe
On Tue, Oct 22, 2019 at 4:36 PM Fabien COELHO  wrote:

>
> Hello Jeevan,
>
> >> I haven't read the complete patch.  But, I have noticed that many
> >> places you changed the variable declaration from c to c++ style (i.e
> >> moved the declaration in the for loop).  IMHO, generally in PG, we
> >> don't follow this convention.  Is there any specific reason to do
> >> this?
> >
> > +1.
>
> As I said, this C99 feature is already used extensively in pg sources, so
> it makes sense to use it when refactoring something and if appropriate,
> which IMO is the case here.


Ok, no problem.


>
>
> The patch does not apply on master, needs rebase.
>
> Hmmm. "git apply pgbench-buffer-1.patch" works for me on current master.
>
> > Also, I got some whitespace errors.
>
> It possible, but I cannot see any. Could you be more specific?
>

For me it failing, see below:

$ git log -1
commit ad4b7aeb84434c958e2df76fa69b68493a889e4a
Author: Peter Eisentraut 
Date:   Tue Oct 22 10:35:54 2019 +0200

Make command order in test more sensible

Through several updates, the CREATE USER command has been separated
from where the user is actually used in the test.

$ git apply pgbench-buffer-1.patch
pgbench-buffer-1.patch:10: trailing whitespace.
static void append_fillfactor(PQExpBuffer query);
pgbench-buffer-1.patch:18: trailing whitespace.
executeStatementExpect(PGconn *con, const char *sql, const ExecStatusType
expected)
pgbench-buffer-1.patch:19: trailing whitespace.
{
pgbench-buffer-1.patch:20: trailing whitespace.
PGresult   *res;
pgbench-buffer-1.patch:21: trailing whitespace.

error: patch failed: src/bin/pgbench/pgbench.c:599
error: src/bin/pgbench/pgbench.c: patch does not apply

$

Regards,
Jeevan Ladhe


Re: pgbench - refactor init functions with buffers

2019-10-23 Thread Jeevan Ladhe
I am able to apply the v2 patch with "patch -p1 "

-

+static void
+executeStatementExpect(PGconn *con, const char *sql, const ExecStatusType
expected, bool errorOK)
+{

I think some instances like this need 80 column alignment?

-

in initCreatePKeys():
+ for (int i = 0; i < lengthof(DDLINDEXes); i++)
+ {
+ resetPQExpBuffer(&query);
+ appendPQExpBufferStr(&query, DDLINDEXes[i]);

I think you can simply use printfPQExpBuffer() for the first append,
similar to
what you have used in createPartitions(), which is a combination of both
reset
and append.
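
That is, something like (a sketch):

    for (int i = 0; i < lengthof(DDLINDEXes); i++)
    {
        /* printfPQExpBuffer() clears the buffer before formatting into it */
        printfPQExpBuffer(&query, "%s", DDLINDEXes[i]);
        ...
    }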

-

The pgbench tap tests are also running fine.

Regards,
Jeevan Ladhe

On Tue, Oct 22, 2019 at 8:57 PM Fabien COELHO  wrote:

>
> >> The patch does not apply on master, needs rebase.
> >>
> >> Hmmm. "git apply pgbench-buffer-1.patch" works for me on current master.
> >>
> >>> Also, I got some whitespace errors.
> >>
> >> It possible, but I cannot see any. Could you be more specific?
> >
> > For me it failing, see below:
> >
> > $ git log -1
> > commit ad4b7aeb84434c958e2df76fa69b68493a889e4a
>
> Same for me, but it works:
>
>Switched to a new branch 'test'
>sh> git apply ~/pgbench-buffer-2.patch
>sh> git st
> On branch test
> Changes not staged for commit: ...
>  modified:   src/bin/pgbench/pgbench.c
>
>sh> file ~/pgbench-buffer-2.patch
>.../pgbench-buffer-2.patch: unified diff output, ASCII text
>
>sh> sha1sum ~/pgbench-buffer-2.patch
>eab8167ef3ec5eca814c44b30e07ee5631914f07 ...
>
> I suspect that your mailer did or did not do something with the
> attachment. Maybe try with "patch -p1 < foo.patch" at the root.
>
> --
> Fabien.
>


Re: block-level incremental backup

2019-07-23 Thread Jeevan Ladhe
Hi Vignesh,

This backup technology extends pg_basebackup itself, which means we can
still take online backups. This is internally done using pg_start_backup
and pg_stop_backup. pg_start_backup performs a checkpoint, and this
checkpoint is used in the recovery process while starting the cluster from
a backup image. What incremental backup modifies (as compared to
traditional pg_basebackup) is this: after doing the checkpoint, instead of
copying the entire relation files, it takes an input LSN, scans all the
blocks in all relation files, and stores the blocks having LSN >= InputLSN.
This means it considers all the changes that are already written into the
relation files, including insert/update/delete etc., up to the checkpoint
performed internally by pg_start_backup, and as Jeevan Chalke mentioned
upthread, the incremental backup will also contain a copy of the WAL
files. Once this incremental backup is combined with the parent backup by
means of a new combine process (which will be introduced as part of this
feature itself), the result should ideally look like a full pg_basebackup.
Note that any changes made by insert/update/delete operations while the
incremental backup was being taken will still be available via the WAL
files and, as in a normal restore, will be replayed from the checkpoint
onwards up to a consistent point.
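
In other words, the per-file decision boils down to something like this
(a rough sketch of the idea, not the actual patch code; variable names
are illustrative):

    /* scan every block of the relation file that was read into "buf" */
    for (blkno = 0; blkno < nblocks; blkno++)
    {
        Page        page = (Page) (buf + blkno * BLCKSZ);

        /* include the block only if it changed at or after the input LSN */
        if (PageGetLSN(page) >= input_lsn)
            changed_blocks[blkno / 8] |= (1 << (blkno % 8));
    }

If more than 90% of the blocks end up marked, the whole file is sent as in
a full backup; otherwise only the marked blocks go into the .partial file.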

My two cents!

Regards,
Jeevan Ladhe

On Sat, Jul 20, 2019 at 11:22 PM vignesh C  wrote:

> Hi Jeevan,
>
> The idea is very nice.
> When Insert/update/delete and truncate/drop happens at various
> combinations, How the incremental backup handles the copying of the
> blocks?
>
>
> On Wed, Jul 17, 2019 at 8:12 PM Jeevan Chalke
>  wrote:
> >
> >
> >
> > On Wed, Jul 17, 2019 at 7:38 PM Ibrar Ahmed 
> wrote:
> >>
> >>
> >>
> >> On Wed, Jul 17, 2019 at 6:43 PM Jeevan Chalke <
> jeevan.cha...@enterprisedb.com> wrote:
> >>>
> >>> On Wed, Jul 17, 2019 at 2:15 PM Ibrar Ahmed 
> wrote:
> >>>>
> >>>>
> >>>> At what stage you will apply the WAL generated in between the
> START/STOP backup.
> >>>
> >>>
> >>> In this design, we are not touching any WAL related code. The WAL
> files will
> >>> get copied with each backup either full or incremental. And thus, the
> last
> >>> incremental backup will have the final WAL files which will be copied
> as-is
> >>> in the combined full-backup and they will get apply automatically if
> that
> >>> the data directory is used to start the server.
> >>
> >>
> >> Ok, so you keep all the WAL files since the first backup, right?
> >
> >
> > The WAL files will anyway be copied while taking a backup (full or
> incremental),
> > but only last incremental backup's WAL files are copied to the combined
> > synthetic full backup.
> >
> >>>
> >>>>
> >>>> --
> >>>> Ibrar Ahmed
> >>>
> >>>
> >>> --
> >>> Jeevan Chalke
> >>> Technical Architect, Product Development
> >>> EnterpriseDB Corporation
> >>>
> >>
> >>
> >> --
> >> Ibrar Ahmed
> >
> >
> >
> > --
> > Jeevan Chalke
> > Technical Architect, Product Development
> > EnterpriseDB Corporation
> >
>
>
> --
> Regards,
> vignesh
> EnterpriseDB: http://www.enterprisedb.com
>
>
>


Re: block-level incremental backup

2019-07-25 Thread Jeevan Ladhe
Hi Vignesh,

Please find my comments inline below:

> 1) If relation file has changed due to truncate or vacuum.
> During incremental backup the new files will be copied.
> There are chances that both the old  file and new file
> will be present. I'm not sure if cleaning up of the
> old file is handled.
>

When an incremental backup is taken, it either copies a file in its
entirety if the file has changed by more than 90%, or writes a .partial
file containing the changed-blocks bitmap and the actual data. For files
that are unchanged, it writes 0 bytes but still creates a .partial file.
This means there is a .partial file for every file that is to be looked up
in the full backup. While composing a synthetic backup from an incremental
backup, the pg_combinebackup tool will only look for those relation files
in the full (parent) backup that have .partial files in the incremental
backup. So, if a vacuum/truncate happened between the full and incremental
backups, the incremental backup image will not have a 0-length .partial
file for that relation, and so the synthetic backup restored using
pg_combinebackup will not have that file either.


> 2) Just a small thought on building the bitmap,
> can the bitmap be built and maintained as
> and when the changes are happening in the system.
> If we are building the bitmap while doing the incremental backup,
> Scanning through each file might take more time.
> This can be a configurable parameter, the system can run
> without capturing this information by default, but if there are some
> of them who will be taking incremental backup frequently this
> configuration can be enabled which should track the modified blocks.


IIUC, this will need changes in the backend. Honestly, I think backup is
a maintenance task and hampering the backend for this does not look like a
good idea. Having said that, even if we have to provide this as a switch
for some users, it will need a different infrastructure than what we are
building here for constructing the bitmap, where we scan all the files one
by one. Maybe for the initial version we can go with the current proposal
that Robert has suggested, and add this switch at a later point as an
enhancement.
- My thoughts.

Regards,
Jeevan Ladhe


concerns around pg_lsn

2019-07-29 Thread Jeevan Ladhe
Hi all,

While reviewing some code around pg_lsn_in() I came across a couple of
(potential?) issues:

1.
Commit 21f428eb moves the lsn conversion functionality from pg_lsn_in() to
a new function, pg_lsn_in_internal(). It takes two parameters: the lsn
string and a pointer to a boolean (*have_error) to indicate whether there
was an error while converting the string format to an XLogRecPtr.

pg_lsn_in_internal() only sets *have_error to 'true' if there is an
error, and leaves it to the caller to make sure the flag was initialized
to 'false'. Currently it is only called from pg_lsn_in() and
timestamptz_in(), where care has been taken to set the flag to false
before making the call. But I think in general it opens the door to
unpredictable bugs if pg_lsn_in_internal() gets called from other places
in the future and, by mistake, the caller just checks the value of the
flag without setting it to false before the call.

I am attaching a patch that makes sure that *have_error is set to false in
pg_lsn_in_internal() itself, rather than being caller dependent.

Also, I think there might be callers who do not care whether there was an
error while converting and are happy to use InvalidXLogRecPtr as the
return value; they might want to pass just a NULL boolean pointer to this
function, but for now I have left that untouched. Maybe just adding an
Assert would improve the situation for the time being.
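
The defensive change is simply to make the function own the
initialization, e.g. (a sketch):

XLogRecPtr
pg_lsn_in_internal(const char *str, bool *have_error)
{
    ...
    Assert(have_error != NULL);
    *have_error = false;    /* don't depend on the caller initializing it */
    ...
}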

I have attached a patch (fix_have_error_flag.patch) to take care of above.

2.
I happened to peek into the test case pg_lsn.sql, and I started exploring
the macros around lsn.

Following macros:

{code}
/*
 * Zero is used indicating an invalid pointer. Bootstrap skips the first
 * possible WAL segment, initializing the first WAL page at WAL segment size,
 * so no XLOG record can begin at zero.
 */
#define InvalidXLogRecPtr   0
#define XLogRecPtrIsInvalid(r)  ((r) == InvalidXLogRecPtr)
{code}

IIUC, in the comment above we clearly want to mark 0 as an invalid lsn (also
further IIUC the comment states - lsn would start from (walSegSize + 1)).
Given
this, should not it be invalid to allow "0/0" as the value of type pg_lsn,
or
for that matter any number < walSegSize?

There is a test scenario in test case pg_lsn.sql which tests insertion of
"0/0"
in a table having a pg_lsn column. I think this is contradictory to the
comment.

I am not sure of the thought behind this and might be wrong in making the
above assumption. But I tried to look around a bit in the hackers emails
and could not locate any related discussion.

I have attached a patch (mark_lsn_0_invalid.patch) that makes the above
changes.

Thoughts?

Regards,
Jeevan Ladhe


fix_have_error_flag.patch
Description: Binary data


mark_lsn_0_invalid.patch
Description: Binary data


Re: block-level incremental backup

2019-07-29 Thread Jeevan Ladhe
Hi Jeevan,

I reviewed the first two patches -

0001-Add-support-for-command-line-option-to-pass-LSN.patch and
0002-Add-TAP-test-to-test-LSN-option.patch

from the set of incremental backup patches, and the changes look good to me.

I had some concerns around the way we are working around the fact that
pg_lsn_in() accepts 0 as a valid lsn, which I think is contradictory to the
definition of InvalidXLogRecPtr. I have started a separate new thread[1]
for the same.

Also, I observe that commit 21f428eb has already moved the lsn decoding
logic to a separate function, pg_lsn_in_internal(), so the function
decode_lsn_internal() from patch 0001 will go away and the dependent code
needs to be modified.

I shall review the rest of the patches and post my comments.

Regards,
Jeevan Ladhe


[1]
https://www.postgresql.org/message-id/CAOgcT0NOM9oR0Hag_3VpyW0uF3iCU=bdufspfk9jrwxrcwq...@mail.gmail.com

On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

> Hi Anastasia,
>
> On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova <
> a.lubennik...@postgrespro.ru> wrote:
>
>> 23.04.2019 14:08, Anastasia Lubennikova wrote:
>> > I'm volunteering to write a draft patch or, more likely, set of
>> > patches, which
>> > will allow us to discuss the subject in more detail.
>> > And to do that I wish we agree on the API and data format (at least
>> > broadly).
>> > Looking forward to hearing your thoughts.
>>
>> Though the previous discussion stalled,
>> I still hope that we could agree on basic points such as a map file
>> format and protocol extension,
>> which is necessary to start implementing the feature.
>>
>
> It's great that you too come up with the PoC patch. I didn't look at your
> changes in much details but we at EnterpriseDB too working on this feature
> and started implementing it.
>
> Attached series of patches I had so far... (which needed further
> optimization and adjustments though)
>
> Here is the overall design (as proposed by Robert) we are trying to
> implement:
>
> 1. Extend the BASE_BACKUP command that can be used with replication
> connections. Add a new [ LSN 'lsn' ] option.
>
> 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to send
> the option added to the server in #1.
>
> Here are the implementation details when we have a valid LSN
>
> sendFile() in basebackup.c is the function which mostly does the thing for
> us. If the filename looks like a relation file, then we'll need to consider
> sending only a partial file. The way to do that is probably:
>
> A. Read the whole file into memory.
>
> B. Check the LSN of each block. Build a bitmap indicating which blocks
> have an LSN greater than or equal to the threshold LSN.
>
> C. If more than 90% of the bits in the bitmap are set, send the whole file
> just as if this were a full backup. This 90% is a constant now; we might
> make it a GUC later.
>
> D. Otherwise, send a file with .partial added to the name. The .partial
> file contains an indication of which blocks were changed at the beginning,
> followed by the data blocks. It also includes a checksum/CRC.
> Currently, a .partial file format looks like:
>  - start with a 4-byte magic number
>  - then store a 4-byte CRC covering the header
>  - then a 4-byte count of the number of blocks included in the file
>  - then the block numbers, each as a 4-byte quantity
>  - then the data blocks
>
>
> We are also working on combining these incremental back-ups with the full
> backup and for that, we are planning to add a new utility called
> pg_combinebackup. Will post the details on that later once we have on the
> same page for taking backup.
>
> Thanks
> --
> Jeevan Chalke
> Technical Architect, Product Development
> EnterpriseDB Corporation
>
>


Re: concerns around pg_lsn

2019-07-30 Thread Jeevan Ladhe
Hi Michael,

Thanks for your inputs, really appreciated.

On Tue, Jul 30, 2019 at 9:42 AM Michael Paquier  wrote:

> On Mon, Jul 29, 2019 at 10:55:29PM +0530, Jeevan Ladhe wrote:
> > I am attaching a patch that makes sure that *have_error is set to false
> in
> > pg_lsn_in_internal() itself, rather than being caller dependent.
>
> Agreed about making the code more defensive as you do.  I would keep
> the initialization in check_recovery_target_lsn and pg_lsn_in_internal
> though.  That does not hurt and makes the code easier to understand,
> aka we don't expect an error by default in those paths.
>

Sure, understood. I am ok with this.

> > IIUC, in the comment above we clearly want to mark 0 as an invalid lsn
> > (also
> > further IIUC the comment states - lsn would start from (walSegSize + 1)).
> > Given this, should not it be invalid to allow "0/0" as the value of
> > type pg_lsn, or for that matter any number < walSegSize?
>
> You can rely on "0/0" as a base point to calculate the offset in a
> segment, so my guess is that we could break applications by generating
> an error.


Agree that it may break the applications.

Please note that the behavior is much older than the
> introduction of pg_lsn, as the original parsing logic has been removed
> in 6f289c2b with validate_xlog_location() in xlogfuncs.c.
>

My only concern was something that we internally treat as invalid, why do
we allow, that as a valid value for that type. While I am not trying to
reinvent the wheel here, I am trying to understand if there had been any
idea behind this and I am missing it.

Regards,
Jeevan Ladhe


Re: concerns around pg_lsn

2019-07-31 Thread Jeevan Ladhe
On Tue, Jul 30, 2019 at 6:06 PM Robert Haas  wrote:

> On Tue, Jul 30, 2019 at 4:52 AM Jeevan Ladhe
>  wrote:
> > My only concern was something that we internally treat as invalid, why do
> > we allow, that as a valid value for that type. While I am not trying to
> > reinvent the wheel here, I am trying to understand if there had been any
> > idea behind this and I am missing it.
>
> Well, the word "invalid" can mean more than one thing.  Something can
> be valid or invalid depending on context.  I can't have -2 dollars in
> my wallet, but I could have -2 dollars in my bank account, because the
> bank will allow me to pay out slightly more money than I actually have
> on the idea that I will pay them back later (and with interest!).  So
> as an amount of money in my wallet, -2 is invalid, but as an amount of
> money in my bank account, it is valid.
>
> 0/0 is not a valid LSN in the sense that (in current releases) we
> never write a WAL record there, but it's OK to compute with it.
> Subtracting '0/0'::pg_lsn seems useful as a way to convert an LSN to
> an absolute byte-index in the WAL stream.
>

Thanks Robert for such a nice and detailed explanation.
I now understand why LSN '0/0' can still be useful.

Regards,
Jeevan Ladhe


Re: concerns around pg_lsn

2019-07-31 Thread Jeevan Ladhe
Hi Michael,


> On further review of the first patch, I think that it could be a good
> idea to apply the same safeguards within float8in_internal_opt_error.
> Jeevan, what do you think?
>

Sure, agreed, it makes sense to address float8in_internal_opt_error().
There might be more occurrences of such instances in other functions as
well; if we agree, we should fix them as and when we encounter them while
touching those areas.

What is more dangerous with float8in_internal_opt_error() is that it has
the have_error flag, which is never set or used in that function. A further
risk is that the callers of this function, e.g. executeItemOptUnwrapTarget(),
pass a non-null pointer to it (initialized to false) and expect to throw an
error if float8in_internal_opt_error() reports one, *but*
float8in_internal_opt_error() never actually touches the have_error flag.
In this case it happens to be fine because the flag was initialized to
false; had it not been initialized, the garbage value could always evaluate
to true and keep throwing an error!
Here is the relevant code from executeItemOptUnwrapTarget():

{code}
if (jb->type == jbvNumeric)
{
    char       *tmp = DatumGetCString(DirectFunctionCall1(numeric_out,
                                                          NumericGetDatum(jb->val.numeric)));
    bool        have_error = false;

    (void) float8in_internal_opt_error(tmp,
                                       NULL,
                                       "double precision",
                                       tmp,
                                       &have_error);

    if (have_error)
        RETURN_ERROR(ereport(ERROR,
                             (errcode(ERRCODE_NON_NUMERIC_JSON_ITEM),
                              errmsg("jsonpath item method .%s() can only be applied to a numeric value",
                                     jspOperationName(jsp->type)))));
    res = jperOk;
}
else if (jb->type == jbvString)
{
    /* cast string as double */
    double      val;
    char       *tmp = pnstrdup(jb->val.string.val,
                               jb->val.string.len);
    bool        have_error = false;

    val = float8in_internal_opt_error(tmp,
                                      NULL,
                                      "double precision",
                                      tmp,
                                      &have_error);

    if (have_error || isinf(val))
        RETURN_ERROR(ereport(ERROR,
                             (errcode(ERRCODE_NON_NUMERIC_JSON_ITEM),
                              errmsg("jsonpath item method .%s() can only be applied to a numeric value",
                                     jspOperationName(jsp->type)))));

    jb = &jbv;
    jb->type = jbvNumeric;
    jb->val.numeric = DatumGetNumeric(DirectFunctionCall1(float8_numeric,
                                                          Float8GetDatum(val)));
    res = jperOk;
}
{code}

I will further check whether any later commits have by mistake removed
references to assignments from float8in_internal_opt_error(), evaluate it,
and send out a patch.

This is one of the reasons I was saying it can be taken as good practice
for a function that accepts an out parameter to always set it to some
definite value.
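
To illustrate, here is a minimal, self-contained sketch (not actual
PostgreSQL code; the function name is made up) of the practice I mean:

{code}
#include <stdbool.h>
#include <stdlib.h>

/* Sketch only: the callee gives the out flag a definite value on every path. */
static double
parse_double_or_flag_error(const char *str, bool *have_error)
{
    char   *endptr;
    double  val;

    *have_error = false;        /* always initialize the out parameter */

    val = strtod(str, &endptr);
    if (endptr == str || *endptr != '\0')
    {
        *have_error = true;     /* report the failure explicitly */
        return 0.0;
    }

    return val;
}
{code}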

Regards,
Jeevan Ladhe


Re: concerns around pg_lsn

2019-07-31 Thread Jeevan Ladhe
Hi Michael,

What is more dangerous with float8in_internal_opt_error() is, it has
> the have_error flag, which is never ever set or used in that function.
> Further
> more risks are - the callers of this function e.g.
> executeItemOptUnwrapTarget()
> are passing a non-null pointer to it(default set to false) and expect to
> throw
> an error if it sees some error during float8in_internal_opt_error(), *but*
> float8in_internal_opt_error() has actually never touched the have_error
> flag.
>

My bad, I see there is this macro call in float8in_internal_opt_error()
that sets the flag:

{code}
#define RETURN_ERROR(throw_error) \
do { \
if (have_error) { \
*have_error = true; \
return 0.0; \
} else { \
throw_error; \
} \
} while (0)
{code}

My patch is on the way, thanks.

Regards,
Jeevan Ladhe


Re: concerns around pg_lsn

2019-08-01 Thread Jeevan Ladhe
Hi Michael,

> I will further check if by mistake any further commits have removed
> > references to assignments from float8in_internal_opt_error(),
> > evaluate it, and set out a patch.
>
> Thanks, Jeevan!
>

Here is a patch that takes care of addressing the flag issue including
pg_lsn_in_internal() and others.

I have further also fixed a couple of other functions,
numeric_div_opt_error() and numeric_mod_opt_error(), which are basically
callers of make_result_opt_error().

Kindly do let me know if you have any comments.

Regards,
Jeevan Ladhe


0001-Make-the-have_error-flag-initialization-more-defensi.patch
Description: Binary data


Re: concerns around pg_lsn

2019-08-01 Thread Jeevan Ladhe
Hi Michael,

On Thu, Aug 1, 2019 at 1:51 PM Michael Paquier  wrote:

> On Thu, Aug 01, 2019 at 12:39:26PM +0530, Jeevan Ladhe wrote:
> > Here is a patch that takes care of addressing the flag issue including
> > pg_lsn_in_internal() and others.
>
> Your original patch for pg_lsn_in_internal() was right IMO, and the
> new one is not.  In the numeric and float code paths, we have this
> kind of pattern:
> if (have_error)
> {
> *have_error = true;
> return;
> }
> else
> elog(ERROR, "Boom. Show is over.");
>
> But the pg_lsn.c portion does not have that.  have_error cannot be
> NULL or the caller may fall into the trap of setting it to NULL and
> miss some errors at parsing-time.  So I think that keeping the
> assertion on (have_error != NULL) is necessary.
>

Thanks for your concern.

In pg_lsn_in_internal() changes, the caller will get the invalid lsn
in case there are errors:

{code}
if (len1 < 1 || len1 > MAXPG_LSNCOMPONENT || str[len1] != '/')
{
if (have_error)
*have_error = true;

return InvalidXLogRecPtr;
}
{code}

The only question is whether the caller cares about errors during the
parsing or not. For some callers, just knowing whether the given string
was a valid LSN may be enough, and the return value 'InvalidXLogRecPtr'
tells them that. Such a caller should not have to unnecessarily declare
the flag and pass a pointer to it.

Regards,
Jeevan Ladhe


Re: concerns around pg_lsn

2019-08-03 Thread Jeevan Ladhe
Sure Michael, in the attached patch I have reverted the checks from
pg_lsn_in_internal() and added Assert() per my original patch.

Regards,
Jeevan Ladhe


0001-Make-have_error-initialization-more-defensive-v2.patch
Description: Binary data


Re: concerns around pg_lsn

2019-08-04 Thread Jeevan Ladhe
On Sun, Aug 4, 2019 at 12:13 PM Michael Paquier  wrote:

> On Sat, Aug 03, 2019 at 11:57:01PM -0400, Alvaro Herrera wrote:
> > Can we please change the macro definition so that have_error is one of
> > the arguments?  Having the variable be used inside the macro definition
> > but not appear literally in the call is quite confusing.
>

Can't agree more. This is where I also got confused initially and thought
the flag was unused.

> Good idea.  This needs some changes only in float.c.


Please find attached a patch with the changes to RETURN_ERROR and
its references in float.c.
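
For reference, the change is roughly along these lines (a sketch only; the
exact hunk is in the attached patch):

{code}
/* the flag is now spelled out at the call site instead of being implicit */
#define RETURN_ERROR(throw_error, have_error) \
do { \
    if (have_error) { \
        *(have_error) = true; \
        return 0.0; \
    } else { \
        throw_error; \
    } \
} while (0)
{code}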

Regards,
Jeevan Ladhe


0001-Make-have_error-initialization-more-defensive-v3.patch
Description: Binary data


Re: concerns around pg_lsn

2019-08-04 Thread Jeevan Ladhe
>
> Thanks.  Committed after applying some tweaks to it.  I have noticed
> that you forgot numeric_int4_opt_error() in the set.


Oops. Thanks for the commit, Michael.

Regards,
Jeevan Ladhe


Re: block-level incremental backup

2019-08-08 Thread Jeevan Ladhe
Hi Jeevan,

I have reviewed the backup part at the code level and am still looking into
the restore (combine) part and the functional side of it. Here are my
comments so far:

The patches need a rebase.

+   if (!XLogRecPtrIsInvalid(previous_lsn))
+   appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: %X/%X\n",
+(uint32) (previous_lsn >> 32), (uint32)
previous_lsn);

Maybe we should rename it to something like:
"INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP START
LOCATION"
to make it more intuitive?



+typedef struct
+{
+   uint32      magic;
+   pg_crc32c   checksum;
+   uint32      nblocks;
+   uint32      blocknumbers[FLEXIBLE_ARRAY_MEMBER];
+} partial_file_header;


The file header structure is defined in both basebackup.c and
pg_combinebackup.c. I think it is better to move it to
replication/basebackup.h.



+   bool        isrelfile = false;

I think we can avoid having the flag isrelfile in sendFile().
Something like this:

if (startincrptr && OidIsValid(dboid) && looks_like_rel_name(filename))
{
    /* include the code here that is under the "if (isrelfile)" block */
}
else
{
    _tarWriteHeader(tarfilename, NULL, statbuf, false);
    while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)
    {
        ...
    }
}



Also, having isrelfile as part of the following condition:
{code}
+   while (!isrelfile &&
+          (cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)
{code}

is confusing, because even the relation files in a full backup are going to
be backed up by this loop only, but still the condition reads
'(!isrelfile &&...)'.



verify_page_checksum()
{
while(1)
{

break;
}
}

IMHO, while labels are not advisable in general, it may be better to use a
label here rather than a while(1) loop, so that we can jump to the label in
case we want to retry once. The while(1) form opens the door to future
bugs: if someone happens to add code here and ends up making the break
conditional, that will leave us in an infinite loop.
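
Just to illustrate the shape I have in mind, here is a hypothetical,
self-contained sketch (none of these names exist in the patch):

{code}
#include <stdbool.h>

/* Verify once; on a mismatch re-read the block a single time and recheck. */
static bool
verify_with_single_retry(bool (*verify) (void *), void (*reread) (void *),
                         void *blk)
{
    bool    retried = false;

recheck:
    if (verify(blk))
        return true;

    if (!retried)
    {
        retried = true;
        reread(blk);            /* re-read the block once, then recheck */
        goto recheck;
    }

    return false;               /* genuine checksum failure */
}
{code}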



+/* magic number in incremental backup's .partial file */
+#define INCREMENTAL_BACKUP_MAGIC   0x494E4352

Similar to the partial_file_header structure, I think the above macro can
also be moved to basebackup.h instead of being defined twice.



In sendFile():

+   buf = (char *) malloc(RELSEG_SIZE * BLCKSZ);

I think this is a huge memory request (1GB) and may fail on a busy/loaded
server at times. We should check for malloc() failure and throw an error,
for example when we get ENOMEM as errno.
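
For example, something along these lines in sendFile() (a sketch, reusing
the names from the quoted patch):

{code}
buf = (char *) malloc(RELSEG_SIZE * BLCKSZ);
if (buf == NULL)
    ereport(ERROR,
            (errcode(ERRCODE_OUT_OF_MEMORY),
             errmsg("out of memory")));
{code}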



+   /* Perform incremenatl backup stuff here. */
+   if ((cnt = fread(buf, 1, Min(RELSEG_SIZE * BLCKSZ,
statbuf->st_size), fp)) > 0)
+   {

Here, should we not expect statbuf->st_size < (RELSEG_SIZE * BLCKSZ), so
that it should always be safe to read just statbuf->st_size? But I am OK
with having this extra guard here.



In sendFile(), I am sorry if I am missing something, but I am not able to
understand why 'cnt' and 'i' should have different values when they are
being
passed to verify_page_checksum(). I think passing only one of them should be
sufficient.



+   XLogRecPtr  pglsn;
+
+   for (i = 0; i < cnt / BLCKSZ; i++)
+   {

Maybe we should just have a variable no_of_blocks to store the number of
blocks, rather than calculating this, say, RELSEG_SIZE (i.e. 131072) times
in the worst case.


+   len += cnt;
+   throttle(cnt);
+   }

Sorry if I am missing something, but, should not it be just:

len = cnt;



As I said in my previous email, we no longer need decode_lsn_internal(),
as that is already taken care of by the introduction of the function
pg_lsn_in_internal().

Regards,
Jeevan Ladhe


Re: block-level incremental backup

2019-08-09 Thread Jeevan Ladhe
Hi Robert,

On Fri, Aug 9, 2019 at 6:40 PM Robert Haas  wrote:

> On Thu, Aug 8, 2019 at 8:37 PM Jeevan Ladhe
>  wrote:
> > +   if (!XLogRecPtrIsInvalid(previous_lsn))
> > +   appendStringInfo(labelfile, "PREVIOUS WAL LOCATION: %X/%X\n",
> > +(uint32) (previous_lsn >> 32), (uint32)
> previous_lsn);
> >
> > May be we should rename to something like:
> > "INCREMENTAL BACKUP START WAL LOCATION" or simply "INCREMENTAL BACKUP
> START LOCATION"
> > to make it more intuitive?
>
> So, I think that you are right that PREVIOUS WAL LOCATION might not be
> entirely clear, but at least in my view, INCREMENTAL BACKUP START WAL
> LOCATION is definitely not clear.  This backup is an incremental
> backup, and it has a start WAL location, so you'd end up with START
> WAL LOCATION and INCREMENTAL BACKUP START WAL LOCATION and those sound
> like they ought to both be the same thing, but they're not.  Perhaps
> something like REFERENCE WAL LOCATION or REFERENCE WAL LOCATION FOR
> INCREMENTAL BACKUP would be clearer.
>

Agreed. How about INCREMENTAL BACKUP REFERENCE WAL LOCATION?


> > File header structure is defined in both the files basebackup.c and
> > pg_combinebackup.c. I think it is better to move this to
> replication/basebackup.h.
>
> Or some other header, but yeah, definitely don't duplicate the struct
> definition (or any other kind of definition).
>

Thanks.


> > IMHO, while labels are not advisable in general, it may be better to use
> a label
> > here rather than a while(1) loop, so that we can move to the label in
> case we
> > want to retry once. I think here it opens doors for future bugs if
> someone
> > happens to add code here, ending up adding some condition and then the
> > break becomes conditional. That will leave us in an infinite loop.
>
> I'm not sure which style is better here, but I don't really buy this
> argument.


No issues. I am ok either way.

Regards,
Jeevan Ladhe


Re: basebackup.c's sendFile() ignores read errors

2019-08-29 Thread Jeevan Ladhe
Hi Jeevan,

On Wed, Aug 28, 2019 at 10:26 PM Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

>
>
> On Tue, Aug 27, 2019 at 10:33 PM Robert Haas 
> wrote:
>
>> While reviewing a proposed patch to basebackup.c this morning, I found
>> myself a bit underwhelmed by the quality of the code and comments in
>> basebackup.c's sendFile(). I believe it's already been pointed out
>> that the the retry logic here is not particularly correct, and the
>> comments demonstrate a pretty clear lack of understanding of how the
>> actual race conditions that are possible here might operate.
>>
>> However, I then noticed another problem which I think is significantly
>> more serious: this code totally ignores the possibility of a failure
>> in fread().  It just assumes that if fread() fails to return a
>> positive value, it's reached the end of the file, and if that happens,
>> it just pads out the rest of the file with zeroes.  So it looks to me
>> like if fread() encountered, say, an I/O error while trying to read a
>> data file, and if that error were on the first data block, then the
>> entire contents of that file would be replaced with zero bytes in the
>> backup, and no error or warning of any kind would be given to the
>> user.  If it hit the error later, everything from the point of the
>> error onward would just get replaced with zero bytes.  To be clear, I
>> think it is fine for basebackup.c to fill out the rest of the expected
>> length with zeroes if in fact the file has been truncated during the
>> backup; recovery should fix it.  But recovery is not going to fix data
>> that got eaten because EIO was encountered while copying a file.
>>
>
> Per fread(3) manpage, we cannot distinguish between an error or EOF. So to
> check for any error we must use ferror() after fread(). Attached patch
> which
> tests ferror() and throws an error if it returns true.
>

You are right. This seems like the right approach to me. I can see at least
a couple of places already using ferror() to guard against fread() errors,
like CopyGetData(), read_backup_label(), etc.


> However, I think
> fread()/ferror() both do not set errno (per some web reference) and thus
> throwing a generic error here. I have manually tested it.
>

If we are interested in getting the errno, then instead of fread() we can
use read(). But here, in particular, we are not making any decision based
on errno, so I think we should be fine with your current patch.
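
So the resulting pattern after each fread() boils down to something like
this (a rough sketch reusing the names from sendFile(); the attached patch
wraps it in the CHECK_FREAD_ERROR() macro):

{code}
cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp);
if (ferror(fp))
    ereport(ERROR,
            (errmsg("could not read file \"%s\"", readfilename)));
{code}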


>
>> The logic that rereads a block appears to have the opposite problem.
>> Here, it will detect an error, but it will also fail in the face of a
>> concurrent truncation, which it shouldn't.
>>
>
> For this, I have checked for feof() assuming that when the file gets
> truncated
> we reach EOF. And if yes, getting out of the loop instead of throwing an
> error.
> I may be wrong as I couldn't reproduce this issue.
>

I had a look at the patch and this seems correct to me.

A few minor comments:

+ /* Check fread() error. */
+ CHECK_FREAD_ERROR(fp, pathbuf);
+

The comments above the macro call at both places are not necessary, as the
macro name itself is self-explanatory.

--
+ /*
+ * If file is truncated, then we will hit
+ * end-of-file error in which case we don't
+ * want to error out, instead just pad it with
+ * zeros.
+ */
+ if (feof(fp))

The if block does not do the zero-padding right away, so I think the
comment above it can be reworded to explain why we just reset cnt here.

Regards,
Jeevan Ladhe


Re: block-level incremental backup

2019-08-29 Thread Jeevan Ladhe
Due to the inherent nature of pg_basebackup, an incremental backup can also
be taken in tar and compressed format. But pg_combinebackup does not
understand how to restore this. I think we should either make
pg_combinebackup support restoration of a tar incremental backup, or
restrict taking an incremental backup in tar format until pg_combinebackup
supports that, by making the options '--lsn' and '-Ft' mutually exclusive.

It is arguable that one can take the incremental backup in tar format,
extract it manually, and then give the resultant directory as input to
pg_combinebackup, but I think that defeats the purpose of having the
pg_combinebackup utility.

Thoughts?

Regards,
Jeevan Ladhe


Re: basebackup.c's sendFile() ignores read errors

2019-08-30 Thread Jeevan Ladhe
>
> Fixed both comments in the attached patch.
>

Thanks, the patch looks good to me.

Regards,
Jeevan Ladhe


Re: block-level incremental backup

2019-08-30 Thread Jeevan Ladhe
Here are some comments:


+/* The reference XLOG position for the incremental backup. */

+static XLogRecPtr refptr;

As Robert already pointed out, we may want to pass this around as a
parameter instead of a global variable. Also, it can be renamed to
something like incr_backup_refptr. I see that in your earlier version of
the patch this was named startincrptr, which I think was more meaningful.

-

/*
 * If incremental backup, see whether the filename is a relation filename
 * or not.
 */

This can be reworded to something like:
"If incremental backup, check whether it is a relation file and can be sent
partially."

-

+   if (verify_checksum)
+   {
+       ereport(WARNING,
+               (errmsg("cannot verify checksum in file \"%s\", block "
+                       "%d: read buffer size %d and page size %d "
+                       "differ",
+                       readfilename, blkno, (int) cnt, BLCKSZ)));
+       verify_checksum = false;
+   }

For do_incremental_backup() it does not make sense to show the block number
in the warning, as it is always going to be 0 when we throw this warning.
Further, I think this can be rephrased as:
"cannot verify checksum in file \"%s\", read file size %d is not a multiple
of page size %d".

Or maybe, if checksum verification was requested, we can just say
"cannot verify checksum in file \"%s\"", disable the checksum, and leave
the rest to the following message:

+   ereport(WARNING,
+           (errmsg("file size (%d) not in multiple of page size (%d), sending whole file",
+                   (int) cnt, BLCKSZ)));

-

If you agree with the above comment about blkno, then we can move the
declaration of blkno inside the condition "if (!sendwholefile)" in
do_incremental_backup(), or avoid it altogether and just pass "i" as both
blkindex and blkno to verify_page_checksum(). Maybe add a comment
explaining why they are the same in case of incremental backup.

-

I think we should give the user a hint, in the --help output as well as in
the documentation, about where to read the input LSN for an incremental
backup from. Something like: "To take an incremental backup, please provide
the value of "--lsn" as the "START WAL LOCATION" from the backup_label file
of the previously taken full or incremental backup."

-

pg_combinebackup:

+static bool made_new_outputdata = false;
+static bool found_existing_outputdata = false;

Both of these are global; I understand that we need them to be global so
that they are accessible in cleanup_directories_atexit(). But they are also
passed to verify_dir_is_empty_or_create() as parameters, which I think is
not needed. Instead, verify_dir_is_empty_or_create() can change the globals
directly.

-

I see that checksum_failure is never set and always remains false. Maybe
it is something that you wanted to set in combine_partial_files() when a
corrupted partial file is detected?

-

I think the logic for verifying the backup chain should be moved out of the
main() function into a separate function.

-

+ /*
+ * Verify the backup chain.  INCREMENTAL BACKUP REFERENCE WAL LOCATION of
+ * the incremental backup must match with the START WAL LOCATION of the
+ * previous backup, until we reach a full backup in which there is no
+ * INCREMENTAL BACKUP REFERENCE WAL LOCATION.
+ */

The current logic assumes the incremental backup directories are to be
provided as input in the serial order in which the backups were taken. This
is a bit confusing unless it is clarified in the pg_combinebackup help
output or documentation. I think we should clarify it in both places.

-----

I think scan_directory() should rather be renamed to do_combinebackup().

Regards,
Jeevan Ladhe

On Thu, Aug 29, 2019 at 8:11 PM Jeevan Ladhe 
wrote:

> Due to the inherent nature of pg_basebackup, the incremental backup also
> allows taking backup in tar and compressed format. But, pg_combinebackup
> does not understand how to restore this. I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format,
> extract
> that manually and then give the resultant directory as input to the
> pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.
>
> Thoughts?
>
> Regards,
> Jeevan Ladhe
>


Re: block-level incremental backup

2019-09-02 Thread Jeevan Ladhe
Hi Robert,

On Sat, Aug 31, 2019 at 8:29 AM Robert Haas  wrote:

> On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe
>  wrote:
> > Due to the inherent nature of pg_basebackup, the incremental backup also
> > allows taking backup in tar and compressed format. But, pg_combinebackup
> > does not understand how to restore this. I think we should either make
> > pg_combinebackup support restoration of tar incremental backup or
> restrict
> > taking the incremental backup in tar format until pg_combinebackup
> > supports the restoration by making option '--lsn' and '-Ft' exclusive.
> >
> > It is arguable that one can take the incremental backup in tar format,
> extract
> > that manually and then give the resultant directory as input to the
> > pg_combinebackup, but I think that kills the purpose of having
> > pg_combinebackup utility.
>
> I don't agree. You're right that you would have to untar (and
> uncompress) the backup to run pg_combinebackup, but you would also
> have to do that to restore a non-incremental backup, so it doesn't
> seem much different.
>

Thanks. Yes I agree about the similarity between restoring non-incremental
and incremental backup in this case.


>  I don't think it's worth doing that at this point; I definitely don't
> think it
> needs to be part of the first patch.
>

Makes sense.

Regards,
Jeevan Ladhe


Re: Extension development

2019-11-12 Thread Jeevan Ladhe
Hi Yonatan,

Here is an attempt to explain the components of an extension:

Makefile:
The Makefile provides a way to compile your C code. Postgres provides an
infrastructure called PGXS for building extensions against an installed
Postgres server. More on this can be found in the official documentation[1].

Control file:
It specifies some properties/metadata about the extension, like version,
comment, directory, etc. See the official documentation[2].

SQL script:
This file should be named in the format extension--version.sql and contains
the functions that are either pure SQL functions or SQL interfaces for your
C functions, plus any other SQL objects that assist your functions. It is
executed internally by the “CREATE EXTENSION” command.

C code:
Your C code is the real implementation of your extension. Here you can have
the C implementations of the SQL interface functions you declared in your
.sql script file, register callbacks (e.g. things you want to do after
parsing or before execution of a query), and so on. The filename can be
anything, but you should have PG_MODULE_MAGIC declared in your C file.

Using this infrastructure, one can simply run make and make install and
then use the “CREATE EXTENSION” command to create the objects. This helps
keep track of all the extension objects together, create them at once, and
drop them at once with the “DROP EXTENSION” command. Here[3] is the complete
documentation for extensions.
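
To make this a bit more concrete, here is a hypothetical minimal sketch of
the C part (the pg_hello name and the function are made up purely for
illustration; the Makefile would set EXTENSION/MODULES/DATA and include
PGXS, and the SQL script would carry the CREATE FUNCTION shown in the
comment):

{code}
/* pg_hello.c -- hypothetical minimal extension C file, for illustration only */
#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;                /* required in exactly one C file of the module */

/*
 * The SQL script pg_hello--1.0.sql would declare the interface as:
 *   CREATE FUNCTION hello() RETURNS text
 *     AS 'MODULE_PATHNAME', 'hello' LANGUAGE C STRICT;
 */
PG_FUNCTION_INFO_V1(hello);

Datum
hello(PG_FUNCTION_ARGS)
{
    PG_RETURN_TEXT_P(cstring_to_text("hello from the extension"));
}
{code}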

Regards,
Jeevan Ladhe

[1] https://www.postgresql.org/docs/current/extend-pgxs.html
[2]
https://www.postgresql.org/docs/current/extend-extensions.html#id-1.8.3.20.12
[3] https://www.postgresql.org/docs/current/extend-extensions.html

On Tue, Nov 12, 2019 at 12:24 PM Yonatan Misgan  wrote:

> I am developed my own PostgreSQL extension for learning purpose and it is
> working correctly but I want to know to which components of the database is
> my own extension components communicate. For example I have c code, make
> file sql script, and control file after compiling the make file to which
> components of the database are each of my extension components to
> communicate. Thanks for your response.
>
>
>
> Regards,
>
> 
>
> *Yonathan Misgan *
>
> *Assistant Lecturer, @ Debre Tabor University*
>
> *Faculty of Technology*
>
> *Department of Computer Science*
>
> *Studying MSc in **Computer Science** (in Data and Web Engineering) *
>
> *@ Addis Ababa University*
>
> *E-mail: yona...@dtu.edu.et *
>
> *yonathanmisga...@gmail.com *
>
> *Tel:   **(+251)-911180185 (mob)*
>
>
>


Fix error in ECPG while connection handling

2018-03-12 Thread Jeevan Ladhe
Hi,

I came across the following error while working on an ecpg client program.

$ install/bin/ecpg ecpg_connection_ptr.pgc
ecpg_connection_ptr.pgc:26: ERROR: AT option not allowed in WHENEVER
statement

I have attached a simple ecpg program, 'ecpg_connection_ptr_issue.pgc',
that reproduces the above issue.

After doing some investigation I could see that in preproc.y, in the rule
for 'ClosePortalStmt', the function output_statement() is called, which in
turn frees the connection pointer. Here is the relevant trailing snippet
from the output_statement() function:

    whenever_action(whenever_mode | 2);
    free(stmt);
    if (connection != NULL)
        free(connection);
}

Now, when ecpg parses the following statement using the rule 'ECPGWhenever':
 EXEC SQL WHENEVER SQLERROR CONTINUE;
it checks whether the connection is set, and if it is, it throws an error
as below:

if (connection)
    mmerror(PARSE_ERROR, ET_ERROR, "AT option not allowed in WHENEVER statement");

Now, ideally the connection would have been NULL here, but as the
'ClosePortalStmt' rule freed the connection without setting it to NULL, it
still sees that there is a connection (which actually contains garbage) and
throws an error.

Similarly, I see there are other places that free this global connection
but do not set it to NULL, and all of those should be fixed. I have
attached a patch, ecpg_connection_ptr_issue_fix.patch, to fix these places.
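
Roughly, the fix at each such place looks like this (just a sketch; the
exact hunks are in the attached patch):

{code}
    whenever_action(whenever_mode | 2);
    free(stmt);
    if (connection != NULL)
    {
        free(connection);
        connection = NULL;      /* don't leave a dangling pointer behind */
    }
}
{code}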

Regards,
Jeevan Ladhe


ecpg_connection_ptr_issue_fix.patch
Description: Binary data


ecpg_connection_ptr_issue.pgc
Description: Binary data


Re: Fix error in ECPG while connection handling

2018-03-13 Thread Jeevan Ladhe
> Thanks for spotting and fixing. I will push the patch as soon as I'm
> online again.
>

Thanks Michael for taking care of this.

Regards,
Jeevan Ladhe.


Re: Make some xlogreader messages more accurate

2023-02-28 Thread Jeevan Ladhe
+1 for the changes.

>1. Why is "wanted >=%u" any better than "wanted at least %u"? IMO, the
>wording as opposed to >= symbol in the user-facing messages works
>better.

I think I agree with Bharath on this: "wanted at least %u" sounds better
for a user-facing error message than "wanted >=%u".

Regards,
Jeevan Ladhe

On Tue, 28 Feb 2023 at 11:46, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Feb 23, 2023 at 1:06 PM Peter Eisentraut
>  wrote:
> >
> > Here is a small patch to make some invalid-record error messages in
> > xlogreader a bit more accurate (IMO).
>
> +1 for these changes.
>
> > My starting point was that when you have some invalid WAL, you often get
> > a message like "wanted 24, got 0".  This is a bit incorrect, since it
> > really wanted *at least* 24, not exactly 24.  So I have updated the
> > messages to that effect, and
>
> Yes, it's not exactly "wanted", but "wanted at least" because
> xl_tot_len is the total length of the entire record including header
> and payload.
>
> > also added that detail to one message where
> > it was available but not printed.
>
> Looks okay.
>
> > Going through the remaining report_invalid_record() calls I then
> > adjusted the use of "invalid" vs. "incorrect" in one case.  The message
> > "record with invalid length" makes it sound like the length was
> > something like -5, but really we know what the length should be and what
> > we got wasn't it, so "incorrect" sounded better and is also used in
> > other error messages in that file.
>
> I have no strong opinion about this change. We seem to be using
> "invalid length" and "incorrect length" interchangeably [1] without
> distinguishing between "invalid" if length is < 0 and "incorrect" if
> length >= 0 and not something we're expecting.
>
> Another comment on the patch:
> 1. Why is "wanted >=%u" any better than "wanted at least %u"? IMO, the
> wording as opposed to >= symbol in the user-facing messages works
> better.
> +report_invalid_record(state, "invalid record offset at %X/%X:
> wanted >=%u, got %u",
> +  "invalid record length at %X/%X:
> wanted >=%u, got %u",
> +  "invalid record length at %X/%X: wanted
> >=%u, got %u",
>
> [1]
> elog(ERROR, "incorrect length %d in streaming transaction's changes
> file \"%s\"",
> "record with invalid length at %X/%X",
> (errmsg("invalid length of checkpoint record")));
> errmsg("invalid length of startup packet")));
> errmsg("invalid length of startup packet")));
> elog(ERROR, "invalid zero-length dimension array in MCVList");
> elog(ERROR, "invalid length (%d) dimension array in MCVList",
> errmsg("invalid length in external \"%s\" value",
> errmsg("invalid length in external bit string")));
> libpq_append_conn_error(conn, "certificate contains IP address with
> invalid length %zu
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>
>
>


server log inflates due to pg_logical_slot_peek_changes/pg_logical_slot_get_changes calls

2023-03-19 Thread Jeevan Ladhe
Hi,

I observed absurd behaviour while using pg_logical_slot_peek_changes()
and pg_logical_slot_get_changes(). Whenever either of these two functions
is called to read the changes using a decoder plugin, the following
messages are printed in the log for every single such call.

2023-03-19 16:36:06.040 IST [30099] LOG:  starting logical decoding for
slot "test_slot1"
2023-03-19 16:36:06.040 IST [30099] DETAIL:  Streaming transactions
committing after 0/851DFD8, reading WAL from 0/851DFA0.
2023-03-19 16:36:06.040 IST [30099] STATEMENT:  SELECT data FROM
pg_logical_slot_get_changes('test_slot1', NULL, NULL, 'format-version',
'2');
2023-03-19 16:36:06.040 IST [30099] LOG:  logical decoding found consistent
point at 0/851DFA0
2023-03-19 16:36:06.040 IST [30099] DETAIL:  There are no running
transactions.
2023-03-19 16:36:06.040 IST [30099] STATEMENT:  SELECT data FROM
pg_logical_slot_get_changes('test_slot1', NULL, NULL, 'format-version',
'2');

These log lines are printed on every single call to the peek/get functions
and bloat the server log file by a huge amount when the functions are
called in a loop to read the changes.

IMHO, printing the message every time we create the context for decoding a
slot using pg_logical_slot_get_changes() seems like overkill. I am
wondering whether, instead of LOG, we should mark these messages as DEBUG1
in SnapBuildFindSnapshot() and CreateDecodingContext() respectively. I can
produce a patch for this if we agree.
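
Roughly, what I have in mind for CreateDecodingContext() is something like
this (a sketch only, not the exact upstream code; SnapBuildFindSnapshot()
would get the same treatment):

{code}
ereport(DEBUG1,     /* was LOG */
        (errmsg("starting logical decoding for slot \"%s\"",
                NameStr(slot->data.name)),
         errdetail("Streaming transactions committing after %X/%X, reading WAL from %X/%X.",
                   LSN_FORMAT_ARGS(slot->data.confirmed_flush),
                   LSN_FORMAT_ARGS(slot->data.restart_lsn))));
{code}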

Regards,
Jeevan Ladhe


Re: server log inflates due to pg_logical_slot_peek_changes/pg_logical_slot_get_changes calls

2023-03-20 Thread Jeevan Ladhe
Thanks, Ashutosh for the reply.

I think those messages are useful when debugging logical replication
> problems (imagine missing transaction or inconsistent data between
> publisher and subscriber). I don't think pg_logical_slot_get_changes()
> or pg_logical_slot_peek_changes() are expected to be called frequently
> in a loop.


Yeah right. But can you please shed some light on when these functions
should be called, or are they used only for testing purposes?

Instead you should open a replication connection to
> continue to receive logical changes ... forever.
>

Yes, this is what I have decided to resort to now.

Why do you need to call pg_logical_slot_peek_changes() and
> pg_logical_slot_get_changes() frequently?
>

I was just playing around with logical replication and thought of doing a
quick test where, at every time interval, I read the changes using
pg_logical_slot_peek_changes(), make sure they are consumed up to a
consistent state, and only then use pg_logical_slot_get_changes() to
advance the slot.

Regards,
Jeevan Ladhe


Re: refactoring basebackup.c

2022-01-18 Thread Jeevan Ladhe
Hi,

Similar to LZ4 server-side compression, I have also tried to add ZSTD
server-side compression in the attached patch. I have done some initial
testing and things seem to be working.

Example run:
pg_basebackup -t server:/tmp/data_zstd -Xnone --server-compression=zstd

The patch surely needs some grooming, but I am expecting some initial
review, especially in the area where we are trying to close the zstd
stream in bbsink_zstd_end_archive(). We need to tell the zstd library to
end the compression by calling ZSTD_compressStream2() with the ZSTD_e_end
directive. But this also needs some input, which, per the example[1] at
line #686, I have taken as an empty ZSTD_inBuffer.
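
For clarity, the end-of-archive handling currently looks roughly like this
(only a sketch; mybuf, mybuf_size and cctx stand in for the sink's real
fields):

{code}
ZSTD_inBuffer   in = {NULL, 0, 0};  /* empty input: we only want to end the frame */
size_t          remaining;

do
{
    ZSTD_outBuffer  out = {mybuf, mybuf_size, 0};

    remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
    if (ZSTD_isError(remaining))
        elog(ERROR, "could not end compression: %s",
             ZSTD_getErrorName(remaining));

    /* hand out.pos bytes from mybuf to the next sink here */
} while (remaining > 0);            /* 0 means the frame epilogue is fully flushed */
{code}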

Thanks, Tushar for testing the LZ4 patch. I have added the LZ4 option in
the pg_basebackup help now.

Note: Before applying these patches please apply Robert's v10 version
of patches 0002, 0003, and 0004.

[1]
https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/zircon/tools/zbi/zbi.cc

Regards,
Jeevan Ladhe

On Wed, Jan 5, 2022 at 10:24 PM tushar 
wrote:

>
>
> On Tue, Dec 28, 2021 at 1:12 PM Jeevan Ladhe <
> jeevan.la...@enterprisedb.com> wrote:
>
>> Hi Tushar,
>>
>> You need to apply Robert's v10 version patches 0002, 0003 and 0004,
>> before applying the lz4 patch(v8 version).
>> Please let me know if you still face any issues.
>>
>
> Thanks, Jeevan.
> I tested —server-compression option using different other options of
> pg_basebackup, also checked -t/—server-compression from pg_basebackup of
> v15 will
> throw an error if the server version is v14 or below. Things are looking
> good to me.
> Two open  issues -
> 1)lz4 value is missing for --server-compression in pg_basebackup --help
> 2)Error messages need to improve if using -t server with -z/-Z
>
> regards,
>


v9-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


v9-0002-Add-a-ZSTD-compression-method-for-server-side-compre.patch
Description: Binary data


Re: refactoring basebackup.c

2022-01-28 Thread Jeevan Ladhe
Hi Robert,

I have attached the latest rebased version of the LZ4 server-side
compression
patch on the recent commits. This patch also introduces the compression
level
and adds a tap test.

Also, while adding the lz4 case in the pg_verifybackup/t/008_untar.pl, I
found
an unused variable {have_zlib}. I have attached a cleanup patch for that as
well.

Please review and let me know your thoughts.

Regards,
Jeevan Ladhe


0001-gzip-tap-test-remove-extra-variable.patch
Description: Binary data


v10-0002-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: refactoring basebackup.c

2022-01-28 Thread Jeevan Ladhe
On Sat, Jan 29, 2022 at 1:20 AM Robert Haas  wrote:

> On Fri, Jan 28, 2022 at 12:48 PM Jeevan Ladhe
>  wrote:
> > I have attached the latest rebased version of the LZ4 server-side
> compression
> > patch on the recent commits. This patch also introduces the compression
> level
> > and adds a tap test.
>
> In view of this morning's commit of
> d45099425eb19e420433c9d81d354fe585f4dbd6 I think the threshold for
> committing this patch has gone up. We need to make it support
> decompression with LZ4 on the client side, as we now have for gzip.
>

Fair enough. Makes sense.


> - In the new test case you set decompress_flags but according to the
> documentation I have here, -m is for multiple files (and so should not
> be needed here) and -d is for decompression (which is what we want
> here). So I'm confused why this is like this.
>
>
'-d' is the default when the file has a .lz4 extension, which is true in
our case, hence I eliminated that. About the introduction of '-m': without
any option, or even after providing the explicit '-d' option, the lz4
command was weirdly writing the decompressed tar to the console. That is
when I saw these two lines in my version of the lz4 man page, tried adding
the '-m' option, and it worked:

" It is considered bad practice to rely on implicit output in scripts.
 because the script´s environment may change. Always use explicit
 output in scripts. -c ensures that output will be stdout. Conversely,
 providing a destination name, or using -m ensures that the output will
 be either the specified name, or filename.lz4 respectively."

and

"Similarly, lz4 -m -d can decompress multiple *.lz4 files."


> This part seems clearly correct, so I have committed it.


Thanks for pushing it.

Regards,
Jeevan Ladhe


Re: refactoring basebackup.c

2022-01-31 Thread Jeevan Ladhe
Hi Robert,

I had an offline discussion with Dipesh, and he will be working on the
lz4 client side decompression part.

Please find the attached patch with the following changes:

- Even if we were going to support LZ4 only on the server side, surely
> it's not right to refuse --compress lz4 and --compress client-lz4 at
> the parsing stage. I don't even think the message you added to main()
> is reachable.
>

I think you are right. I have removed the message and put the Assert()
back in.

- In the new test case you set decompress_flags but according to the
> documentation I have here, -m is for multiple files (and so should not
> be needed here) and -d is for decompression (which is what we want
> here). So I'm confused why this is like this.
>

As explained earlier, in the tap test the 'lz4 -d base.tar.lz4' command
was writing the decompressed output to stdout. Now I have removed the
'-m', added '-d' for decompression, and also added the target file
explicitly in the command.

Regards,
Jeevan Ladhe


v11-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: refactoring basebackup.c

2022-01-31 Thread Jeevan Ladhe
> I think you are right, I have removed the message and again introduced
> the Assert() back.
>
In my previous version of the patch this was a problem: there should not
be an assert there, as the code is reachable for both server-lz4 and
client-lz4. I removed the assert and added the level range check there,
similar to gzip.

Regards,
Jeevan Ladhe


v12-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: refactoring basebackup.c

2022-02-10 Thread Jeevan Ladhe
Thanks for the patch, Dipesh.
With a quick look at the patch I have following observations:

--
In bbstreamer_lz4_compressor_new(), I think this alignment is not needed
on the client side:

/* Align the output buffer length. */
compressed_bound += compressed_bound + BLCKSZ - (compressed_bound % BLCKSZ);
--

In bbstreamer_lz4_compressor_content(), the avail_in and len variables are
both unchanged. I think we can simply change len to avail_in in the
argument list.
--

Comment:
+* Update the offset and capacity of output buffer based on based
on number
+* of bytes written to output buffer.

I think it is a thinko:

+* Update the offset and capacity of output buffer based on number
of
+* bytes written to output buffer.
--

Indentation:

+   if ((mystreamer->base.bbs_buffer.maxlen -
mystreamer->bytes_written) <=
+   footer_bound)

--
I think similar to bbstreamer_lz4_compressor_content() in
bbstreamer_lz4_decompressor_content() we can change len to avail_in.


Regards,
Jeevan Ladhe

On Thu, 10 Feb 2022 at 18:11, Dipesh Pandit  wrote:

> Hi,
>
> > On Mon, Jan 31, 2022 at 4:41 PM Jeevan Ladhe <
> jeevan.la...@enterprisedb.com> wrote:
>
>> Hi Robert,
>>
>> I had an offline discussion with Dipesh, and he will be working on the
>> lz4 client side decompression part.
>>
>
> Please find the attached patch to support client side compression
> and decompression using lz4.
>
> Added a new lz4 bbstreamer to compress the archive chunks at
> client if user has specified --compress=clinet-lz4:[LEVEL] option
> in pg_basebackup. The new streamer accepts archive chunks
> compresses it and forwards it to plain-writer.
>
> Similarly, If a user has specified a server compressed lz4 archive
> with plain format (-F p) backup then it requires decompressing
> the compressed archive chunks before forwarding it to tar extractor.
> Added a new bbstreamer to decompress the compressed archive
> and forward it to tar extractor.
>
> Note: This patch can be applied on Jeevan Ladhe's v12 patch
> for lz4 compression.
>
> Thanks,
> Dipesh
>


Re: refactoring basebackup.c

2022-02-11 Thread Jeevan Ladhe
>Jeevan, Your v12 patch does not apply on HEAD, it requires a
rebase.

Sure, please find the rebased patch attached.

Regards,
Jeevan

On Fri, 11 Feb 2022 at 14:13, Dipesh Pandit  wrote:

> Hi,
>
> Thanks for the feedback, I have incorporated the suggestions
> and updated a new patch. PFA v2 patch.
>
> > I think similar to bbstreamer_lz4_compressor_content() in
> > bbstreamer_lz4_decompressor_content() we can change len to avail_in.
>
> In bbstreamer_lz4_decompressor_content(), we are modifying avail_in
> based on the number of bytes decompressed in each iteration. I think
> we cannot replace it with "len" here.
>
> Jeevan, Your v12 patch does not apply on HEAD, it requires a
> rebase. I have applied it on commit
> 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
> to validate my v2 patch.
>
> Thanks,
> Dipesh
>


v13-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: refactoring basebackup.c

2022-02-11 Thread Jeevan Ladhe
Thanks Robert for the bravery :-)

Regards,
Jeevan Ladhe


On Fri, 11 Feb 2022, 20:31 Robert Haas,  wrote:

> On Fri, Feb 11, 2022 at 7:20 AM Dipesh Pandit 
> wrote:
> > > Sure, please find the rebased patch attached.
> >
> > Thanks, I have validated v2 patch on top of rebased patch.
>
> I'm still feeling brave, so I committed this too after fixing a few
> things. In the process I noticed that we don't have support for LZ4
> compression of streamed WAL (cf. CreateWalTarMethod). It would be good
> to fix that. I'm not quite sure whether
>
> http://postgr.es/m/pm1bMV6zZh9_4tUgCjSVMLxDX4cnBqCDGTmdGlvBLHPNyXbN18x_k00eyjkCCJGEajWgya2tQLUDpvb2iIwlD22IcUIrIt9WnMtssNh-F9k=@pm.me
> is basically what we need or whether something else is required.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: refactoring basebackup.c

2022-02-15 Thread Jeevan Ladhe
Hi,

Please find the attached updated version of patch for ZSTD server side
compression.

This patch has following changes:

- Fixes the issue Tushar reported[1].
- Adds a tap test.
- Makes document changes related to zstd.
- Updates the pg_basebackup help text. Here I have chosen Robert's
suggestion from upthread (as given below):

>> I would be somewhat inclined to leave the level-only variant
>> undocumented and instead write it like this:
>>  -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}

- pg_indent on basebackup_zstd.c.

Thanks, Tushar, for the offline help with testing the patch.

[1]
https://www.postgresql.org/message-id/6c3f1558-1e56-9946-78a2-c59340da1dbf%40enterprisedb.com

Regards,
Jeevan Ladhe

On Mon, 14 Feb 2022 at 21:30, Robert Haas  wrote:

> On Sat, Feb 12, 2022 at 1:01 AM Shinoda, Noriyoshi (PN Japan FSIP)
>  wrote:
> > Thank you for developing a great feature.
> > The current help message shown below does not seem to be able to specify
> the 'client-' or 'server-' for lz4 compression.
> >  --compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]
> >
> > The attached small patch fixes the help message as follows:
> >  --compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]
>
> Hmm. After studying this a bit more closely, I think this might
> actually need a bit more revision than what you propose here. In most
> places, we use vertical bars to separate alternatives:
>
>   -X, --wal-method=none|fetch|stream
>
> But here, we're using commas in some places and the word "or" in one
> case as well:
>
>   -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]
>
> We're also not consistently using braces for grouping, which makes the
> order of operations a bit unclear, and it makes no sense to put
> brackets around LEVEL when it's the only thing that's part of that
> alternative.
>
> A more consistent way of writing the supported syntax would be like this:
>
>   -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|LEVEL|none}
>
> I would be somewhat inclined to leave the level-only variant
> undocumented and instead write it like this:
>
>   -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


v10-0001-Add-a-ZSTD-compression-method-for-server-side-compre.patch
Description: Binary data


Re: refactoring basebackup.c

2022-02-15 Thread Jeevan Ladhe
Thanks Tushar for the testing.

I further worked on ZSTD and now have implemented client side
compression as well. Attached are the patches for both server-side and
client-side compression.

Patch 0001 is the server-side patch; it has not changed since the last
version (v10), I am just bumping the version number.

Patch 0002 is the client-side compression patch.

Regards,
Jeevan Ladhe

On Tue, 15 Feb 2022 at 22:24, tushar  wrote:

> On 2/15/22 6:48 PM, Jeevan Ladhe wrote:
> > Please find the attached updated version of patch for ZSTD server side
> Thanks, Jeevan, I again tested with the attached patch, and as mentioned
> the crash is fixed now.
>
> also, I tested with different labels with gzip V/s zstd against data
> directory size which is 29GB and found these results
>
> 
> ./pg_basebackup  -t server:/tmp/
> --compress=server-zstd:  -Xnone -n -N --no-estimate-size -v
>
> --compress=server-zstd:1 =  compress directory size is  1.3GB
> --compress=server-zstd:4 = compress  directory size is  1.3GB
> --compress=server-zstd:7 = compress  directory size is  1.2GB
> --compress=server-zstd:12 = compress directory size is 1.2GB
> 
>
> ===
> ./pg_basebackup  -t server:/tmp/
> --compress=server-gzip:  -Xnone -n -N --no-estimate-size -v
>
> --compress=server-gzip:1 =  compress directory size is  1.8GB
> --compress=server-gzip:4 = compress  directory size is  1.6GB
> --compress=server-gzip:9 = compress  directory size is  1.6GB
> ===
>
> --
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company
>
>


v11-0001-Add-a-ZSTD-compression-method-for-server-side-compre.patch
Description: Binary data


v11-0002-ZSTD-CLIENT-compression-WIP-working.patch
Description: Binary data


Re: refactoring basebackup.c

2022-02-16 Thread Jeevan Ladhe
Hi Everyone,

So, I went ahead and have now also implemented client side decompression
for zstd.

Robert separated[1] the ZSTD configure switch from my original patch
of server side compression and also added documentation related to
the switch. I have included that patch here in the patch series for
simplicity.

The server side compression patch
0002-ZSTD-add-server-side-compression-support.patch has also taken care
of Justin Pryzby's comments[2]. Also, made changes to pg_basebackup help
as suggested by Álvaro Herrera.

[1]
https://www.postgresql.org/message-id/CA%2BTgmobRisF-9ocqYDcMng6iSijGj1EZX99PgXA%3D3VVbWuahog%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/20220215175944.GY31460%40telsasoft.com

Regards,
Jeevan Ladhe

On Wed, 16 Feb 2022 at 21:46, Robert Haas  wrote:

> On Wed, Feb 16, 2022 at 11:11 AM Alvaro Herrera 
> wrote:
> > This is hard to interpret for humans though because of the nested
> > brackets and braces.  It gets considerably easier if you split it in
> > separate variants:
> >
> >-Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
> >-Z, --compress=LEVEL
> >-Z, --compress=none
> >  compress tar output with given compression
> method or level
> >
> >
> > or, if you choose to leave the level-only variant undocumented, then
> >
> >-Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
> >-Z, --compress=none
> >  compress tar output with given compression
> method or level
> >
> > There still are some nested brackets and braces, but the scope is
> > reduced enough that interpreting seems quite a bit simpler.
>
> I could go for that. I'm also just noticing that "none" is not really
> a compression method or level, and the statement that it can only
> compress "tar" output is no longer correct, because server-side
> compression can be used together with -Fp. So maybe we should change
> the sentence afterward to something a bit more generic, like "specify
> whether and how to compress the backup".
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


v12-0003-ZSTD-add-client-side-compression-support.patch
Description: Binary data


v2-0001-Add-support-for-building-with-ZSTD.patch
Description: Binary data


v12-0004-ZSTD-add-client-side-decompression-support.patch
Description: Binary data


v12-0002-ZSTD-add-server-side-compression-support.patch
Description: Binary data


improve --with-lz4 install documentation

2022-02-16 Thread Jeevan Ladhe
Now that lz4 is also used for backup compression and decompression, and
more enhancements may follow here, I think it is better to make the
--with-lz4 description more generic rather than listing specific details
there.

Attached is a patch to remove the specifics related to WAL and TOAST
compression.

Regards,
Jeevan Ladhe


improve-with-lz4-install-documentation.patch
Description: Binary data


Re: refactoring basebackup.c

2022-02-16 Thread Jeevan Ladhe
Thanks for the comments, Robert. I have addressed them in the attached
patch v13-0002-ZSTD-add-server-side-compression-support.patch.
The rest of the patches are the same as in v12; I have just bumped the
version number.

> It will be good if we can also fix
> CreateWalTarMethod to support LZ4 and ZSTD.
OK, we will see; either Dipesh or I will take care of it.

Regards,
Jeevan Ladhe


On Thu, 17 Feb 2022 at 02:37, Robert Haas  wrote:

> On Wed, Feb 16, 2022 at 12:46 PM Jeevan Ladhe 
> wrote:
> > So, I went ahead and have now also implemented client side decompression
> > for zstd.
> >
> > Robert separated[1] the ZSTD configure switch from my original patch
> > of server side compression and also added documentation related to
> > the switch. I have included that patch here in the patch series for
> > simplicity.
> >
> > The server side compression patch
> > 0002-ZSTD-add-server-side-compression-support.patch has also taken care
> > of Justin Pryzby's comments[2]. Also, made changes to pg_basebackup help
> > as suggested by Álvaro Herrera.
>
> The first hunk of the documentation changes is missing a comma between
> gzip and lz4.
>
> + * At the start of each archive we reset the state to start a new
> + * compression operation. The parameters are sticky and they would
> stick
> + * around as we are resetting with option ZSTD_reset_session_only.
>
> I don't think "would" is what you mean here. If you say something
> would stick around, that means it could be that way it isn't. ("I
> would go to the store and buy some apples, but I know they don't have
> any so there's no point.") I think you mean "will".
>
> -printf(_("  -Z,
> --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
> - " compress tar output with given
> compression method or level\n"));
> +printf(_("  -Z,
> --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"));
> +printf(_("  -Z, --compress=none\n"));
>
> You deleted a line that you should have preserved here.
>
> Overall there doesn't seem to be much to complain about here on a
> first read-through. It will be good if we can also fix
> CreateWalTarMethod to support LZ4 and ZSTD.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


v13-0004-ZSTD-add-client-side-decompression-support.patch
Description: Binary data


v13-0001-Add-support-for-building-with-ZSTD.patch
Description: Binary data


v13-0002-ZSTD-add-server-side-compression-support.patch
Description: Binary data


v13-0003-ZSTD-add-client-side-compression-support.patch
Description: Binary data


Re: Remove redundant variable from transformCreateStmt

2021-04-15 Thread Jeevan Ladhe
Thanks, Amul, this looks pretty straightforward. LGTM.
I have also run the regression tests on master and it looks good.

Regards,
Jeevan Ladhe


Re: Remove redundant variable from transformCreateStmt

2021-04-15 Thread Jeevan Ladhe
IMHO, the idea here was just to get rid of an unnecessary variable rather
than to refactor.

On Thu, Apr 15, 2021 at 5:48 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Apr 15, 2021 at 5:04 PM Amul Sul  wrote:
> >
> > Hi,
> >
> > Attached patch removes "is_foreign_table" from transformCreateStmt()
> > since it already has cxt.isforeign that can serve the same purpose.
>
> Yeah having that variable as "is_foreign_table" doesn't make sense
> when we have the info in ctx. I'm wondering whether we can do the
> following (like transformFKConstraints). It will be more readable and
> we could also add more comments on why we don't skip validation for
> check constraints i.e. constraint->skip_validation = false in case for
> foreign tables.
>

I think your concern here can be addressed by adding a comment just before
the call to transformCheckConstraints().

In transformAlterTableStmt: we can remove transformCheckConstraints
> entirely because calling transformCheckConstraints with skipValidation
> = false does nothing and has no value. This way we could save a
> function call.
>
> I prefer removing the skipValidation parameter from
> transformCheckConstraints. Others might have different opinions.
>

I think this is intentional, to keep the code consistent with the CREATE
TABLE path i.e. transformCreateStmt(). Here is what the comment atop
transformCheckConstraints() reads:

/*
 * transformCheckConstraints
 * handle CHECK constraints
 *
 * Right now, there's nothing to do here when called from ALTER TABLE,
 * but the other constraint-transformation functions are called in both
 * the CREATE TABLE and ALTER TABLE paths, so do the same here, and just
 * don't do anything if we're not authorized to skip validation.
 */

This was originally discussed in thread[1] and commit:
f27a6b15e6566fba7748d0d9a3fc5bcfd52c4a1b

[1]
https://www.postgresql.org/message-id/flat/1238779931.11913728.1449143089410.JavaMail.yahoo%40mail.yahoo.com#f2d8318b6beef37dfff06baa9a1538b7


Regards,
Jeevan Ladhe


Re: Query regarding RANGE Partitioning

2021-05-07 Thread Jeevan Ladhe
Hi Nitin,

On Fri, May 7, 2021 at 4:21 PM Nitin Jadhav 
wrote:

> Hi,
>
> I am not convinced with the following behaviour of RANGE Partitioning.
> Kindly let me know if this is expected behaviour or it should be changed.
>
> *Case-1*:
> postgres@68941=#create table r(a int, b int) partition by range(a,b);
> CREATE TABLE
> postgres@68941=#create table r1 partition of r for values from (100,0) to
> (200,100);
> CREATE TABLE
> postgres@68941=#create table r2 partition of r for values from (400,200)
> to (500,300);
> CREATE TABLE
> postgres@68941=#create table r3 partition of r for values from (0,100) to
> (100,200);
> ERROR:  partition "r3" would overlap partition "r1"
> LINE 1: ...able r3 partition of r for values from (0,100) to (100,200);
>
> As we can see here, I am trying to create a partition table with ranges
> from (0,100) to (100,200)
> which is actually not overlapped with any of the existing partitions. But
> I am getting error saying,
> it overlaps with partition 'r1'.
>
>
*Case-2:*
> postgres@68941=#\d+ r
>   Partitioned table "public.r"
>  Column |  Type   | Collation | Nullable | Default | Storage | Compression
> | Stats target | Description
>
> +-+---+--+-+-+-+--+-
>  a  | integer |   |  | | plain   |
> |  |
>  b  | integer |   |  | | plain   |
> |  |
> Partition key: RANGE (a, b)
> Partitions: r1 FOR VALUES FROM (100, 0) TO (200, 100),
> r2 FOR VALUES FROM (400, 200) TO (500, 300),
> r3 FOR VALUES FROM (200, 100) TO (300, 200)
>
> postgres@68941=#insert into r values(300, 50);
> INSERT 0 1
> postgres@68941=#select * from r3;
>   a  |  b
> -+-
>  300 |  50
> (2 rows)
>
> As per my understanding, in the range partitioned table, lower bound is
> included and upper bound is excluded.
> and in case of multi-column partition keys, the row comparison operator is
> used for tuple routing which means
> the columns are compared left to right. If the partition key value is
> equal to the upper bound of that column then
> the next column will be considered.
>
> So, In case of insertion of row (300, 50). Based on the understanding,
> partition 'r3' should have rejected it.
>
> Kindly confirm whether the above is expected or not. If expected, kindly
> explain.
>

If you describe the partition r3, you can see the way partition
constraints are formed:

postgres=# \d+ r3
   Table "public.r3"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression
| Stats target | Description
+-+---+--+-+-+-+--+-
 a  | integer |   |  | | plain   |
|  |
 b  | integer |   |  | | plain   |
|  |
Partition of: r FOR VALUES FROM (200, 100) TO (300, 200)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((a > 200)
OR ((a = 200) AND (b >= 100))) AND ((a < 300) OR ((a = 300) AND (b < 200
Access method: heap

The above constraint fits the tuple you are trying to insert, i.e.
(a, b) = (300, 50), which satisfies ((a = 300) AND (b < 200)).

Also, the table partition syntax documentation[1] clarifies this (look
for "partition_bound_expr"):

"When creating a range partition, the lower bound specified with
FROM is an inclusive bound, whereas the upper bound specified with
TO is an exclusive bound. That is, the values specified in the FROM
list are valid values of the corresponding partition key columns
for this partition, whereas those in the TO list are not. Note that
this statement must be understood according to the rules of row-wise
comparison (Section 9.24.5). For example, given PARTITION BY RANGE
(x,y), a partition bound FROM (1, 2) TO (3, 4) allows x=1 with any y>=2,
x=2 with any non-null y, and x=3 with any y<4."

So, in your case the partition bound FROM (200, 100) TO (300, 200)
translates to allowing:
a = 200 with any b >= 100, OR
a > 200 and a < 300 with any non-null b, OR
a = 300 with any b < 200.

Your particular tuple (300, 50) fits the last part of the OR, i.e.
(a = 300 with any b < 200).
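
This is easy to cross-check with a plain row-wise comparison in psql (the
query below is mine, just for illustration):

postgres=# SELECT ROW(300, 50) >= ROW(200, 100) AS above_lower,
                  ROW(300, 50) < ROW(300, 200) AS below_upper;
 above_lower | below_upper
-------------+-------------
 t           | t
(1 row)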

So, IMHO, the range partitioning is behaving as expected.

Similarly, for the case-1 you mention above:
create table r1 partition of r for values from (100,0) to (200,100);
create table r3 partition of r for values from (0,100) to (100,200);
here, the lower bound (100, 0) of r1 overlaps with the upper bound (100, 200)
of r3: for example, the tuple (100, 50) would be accepted by both partitions.


[1] https://www.postgresql.org/docs/current/sql-createtable.html

Regards,
Jeevan Ladhe


Re: Multi-Column List Partitioning

2021-05-07 Thread Jeevan Ladhe
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:305
#19 0x55779d13d287 in PortalRunSelect (portal=0x55779e398800,
forward=true, count=0, dest=0x55779e425a88) at pquery.c:912
#20 0x55779d13cec0 in PortalRun (portal=0x55779e398800,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x55779e425a88, altdest=0x55779e425a88,
qc=0x7ffddf9b14f0) at pquery.c:756
#21 0x55779d1361ce in exec_simple_query (
query_string=0x55779e3367a0 "SELECT inhparent::pg_catalog.regclass,\n
 pg_catalog.pg_get_expr(c.relpartbound, c.oid),\n  inhdetachpending,\n
 pg_catalog.pg_get_partition_constraintdef(c.oid)\nFROM pg_catalog.pg_class
c JOIN pg_catalo"...) at postgres.c:1214
#22 0x55779d13ad8b in PostgresMain (argc=1, argv=0x7ffddf9b1710,
dbname=0x55779e3626f8 "postgres", username=0x55779e3626d8 "hadoop") at
postgres.c:4476
#23 0x55779d0674d3 in BackendRun (port=0x55779e358380) at
postmaster.c:4488
#24 0x55779d066d8c in BackendStartup (port=0x55779e358380) at
postmaster.c:4210
#25 0x55779d062f9b in ServerLoop () at postmaster.c:1742
#26 0x55779d062734 in PostmasterMain (argc=3, argv=0x55779e3308b0) at
postmaster.c:1414
#27 0x55779cf5805f in main (argc=3, argv=0x55779e3308b0) at main.c:209

Regards,
Jeevan Ladhe

On Thu, May 6, 2021 at 7:33 PM Nitin Jadhav 
wrote:

> Hi,
>
> While reviewing one of the 'Table partitioning' related patches, I found
> that Postgres does not support multiple column based LIST partitioning.
> Based on this understanding, I have started working on this feature. I also
> feel that 'Multi-Column List Partitioning' can be benefited to the Postgres
> users in future.
>
> I am attaching the WIP patch for this feature here. It supports
> 'Multi-Column List Partitioning', however some tasks are still pending. I
> would like to know your thoughts about this, So that I can continue the
> work with improvising the current patch.
>
> Following things are handled in the patch.
> 1. Syntax
>
> CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);
>
> Earlier there was no provision to mention multiple columns as part of the
> 'list_of_columns' clause. Now we can mention the list of columns separated
> by comma.
>
> CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN
> list_of_values.
>
> Whereas list_of_columns can be
> a. (value [,...])
> b. (value [,...]) [,...]
>
> I would like to list a few examples here for better understanding.
> Ex-1:
> CREATE TABLE t1(a int) PARTITION BY LIST(a);
> CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);
>
> Ex-2:
> CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
> CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2, 2),(2,
> 10);
>
> Please share if any changes are required in the above syntax.
>
> 2. Modified transformation logic to support above syntax.
>
> 3. Modified the data structures to store the information caused by above
> syntax. Also modified the searching logic to route the tuple to the
> appropriate partition.
>
> 4. Done a few basic testing and verified CREATE TABLE, INSERT INTO and
> SELECT are working fine.
>
>
> Following items are pending and I am working on it.
>
> 1. Handling of 'NULL' values.
>
> 2. Support multi column case in partition pruning.
>
> 3. Add test cases to the regression test suite.
>
> Please share your thoughts.
>
>
> Thanks & Regards,
> Nitin Jadhav
>
>
>
>
>
>
>


Re: .ready and .done files considered harmful

2021-07-06 Thread Jeevan Ladhe
> I have a few suggestions on the patch
> 1.
> +
> + /*
> + * Found the oldest WAL, reset timeline ID and log segment number to
> generate
> + * the next WAL file in the sequence.
> + */
> + if (found && !historyFound)
> + {
> + XLogFromFileName(xlog, &curFileTLI, &nextLogSegNo, wal_segment_size);
> + ereport(LOG,
> + (errmsg("directory scan to archive write-ahead log file \"%s\"",
> + xlog)));
> + }
>
> If a history file is found we are not updating curFileTLI and
> nextLogSegNo, so it will attempt the previously found segment.  This
> is fine because it will not find that segment and it will rescan the
> directory.  But I think we can do better, instead of searching the
> same old segment in the previous timeline we can search that old
> segment in the new TL so that if the TL switch happened within the
> segment then we will find the segment and we will avoid the directory
> search.
>
>
>  /*
> + * Log segment number and timeline ID to get next WAL file in a sequence.
> + */
> +static XLogSegNo nextLogSegNo = 0;
> +static TimeLineID curFileTLI = 0;
> +
>
> So everytime archiver will start with searching segno=0 in timeline=0.
> Instead of doing this can't we first scan the directory and once we
> get the first segment to archive then only we can start predicting the
> next wal segment?  I think there is nothing wrong even if we try to
> look for seg 0 in timeline 0, everytime we start the archivar but that
> will be true only once in the history of the cluster so why not skip
> this until we scan the directory once?
>

+1, I like Dilip's ideas here to optimize further.

Also, one minor comment:

+   /*
+    * Log segment number already points to the next file in the sequence
+    * (as part of successful archival of the previous file). Generate the
+    * path for status file.
+    */

This comment is a bit confusing given the name of the variable nextLogSegNo.
I think the name of the variable is appropriate here, but maybe we can reword
the comment to something like:

+   /*
+    * We already have the next anticipated log segment number and the
+    * timeline; check if this WAL file is ready to be archived. If yes,
+    * skip the directory scan.
+    */

Regards,
Jeevan Ladhe


[PATCH] improve the pg_upgrade error message

2021-07-12 Thread Jeevan Ladhe
While looking into one of the pg_upgrade issues, I found it challenging to
find out which database has datallowconn set to 'false' and was therefore
causing the following error:

*"All non-template0 databases must allow connections, i.e. their
pg_database.datallowconn must be true"*

edb=# create database mydb;
CREATE DATABASE
edb=# update pg_database set datallowconn='false' where datname like 'mydb';
UPDATE 1

Now, when I try to upgrade the server, without the patch we get the above
error, which leaves no clue about which database has datallowconn set to
'false'. It can be argued that we can query the pg_database catalog and find
that out easily, but at times it is challenging to get that from the customer
environment. But anyway, I feel we have scope to improve the error message
here, per the attached patch.
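
For reference, when catalog access is available, a query along these lines
(mine, just for illustration) lists the offending databases:

select datname from pg_catalog.pg_database
where datallowconn is false and datname <> 'template0';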


With the attached patch, I now get the following error:

*"All non-template0 databases must allow connections, i.e. their
pg_database.datallowconn must be true; database "mydb" has datallowconn set
to false."*



Regards,

Jeevan Ladhe


0001-Improve-the-pg_upgrade-error-message.patch
Description: Binary data


Re: refactoring basebackup.c

2021-09-08 Thread Jeevan Ladhe
>
> 0007 adds server-side compression; currently, it only supports
> server-side compression using gzip, but I hope that it won't be hard
> to generalize that to support LZ4 as well, and Andres told me he
> thinks we should aim to support zstd since that library has built-in
> parallel compression which is very appealing in this context.
>

Thanks, Robert, for laying the foundation here.
So, I gave the LZ4 streaming API a try for server-side compression.
The LZ4 APIs are documented here[1].

With the attached WIP patch, I am now able to take the backup using lz4
compression. The attached patch applies on top of Robert's V3 patch set[2].

I could take the backup using the command:
pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4

Further, when restored the backup `/tmp/data_lz4` and started the server, I
could see the tables I created, along with the data inserted on the original
server.

When I tried to look into the binary difference between the original data
directory and the backup `data_lz4` directory here is how it looked:

$ diff -qr data/ /tmp/data_lz4
Only in /tmp/data_lz4: backup_label
Only in /tmp/data_lz4: backup_manifest
Only in data/base: pgsql_tmp
Only in /tmp/data_lz4: base.tar
Only in /tmp/data_lz4: base.tar.lz4
Files data/global/pg_control and /tmp/data_lz4/global/pg_control differ
Files data/logfile and /tmp/data_lz4/logfile differ
Only in data/pg_stat: db_0.stat
Only in data/pg_stat: global.stat
Only in data/pg_subtrans: 
Only in data/pg_wal: 00010099.0028.backup
Only in data/pg_wal: 0001009A
Only in data/pg_wal: 0001009B
Only in data/pg_wal: 0001009C
Only in data/pg_wal: 0001009D
Only in data/pg_wal: 0001009E
Only in data/pg_wal/archive_status:
00010099.0028.backup.done
Only in data/: postmaster.opts

For now, what concerns me here is the following `LZ4F_compressUpdate()`
API, which does the core work of the streaming compression:

size_t LZ4F_compressUpdate(LZ4F_cctx* cctx,
                           void* dstBuffer, size_t dstCapacity,
                           const void* srcBuffer, size_t srcSize,
                           const LZ4F_compressOptions_t* cOptPtr);

where `dstCapacity` is basically provided by an earlier call to
`LZ4F_compressBound()`, which returns the minimum `dstCapacity` required to
guarantee success of `LZ4F_compressUpdate()` for a given `srcSize` and
`preferences` in the worst-case scenario. `LZ4F_compressBound()` is:

size_t LZ4F_compressBound(size_t srcSize, const LZ4F_preferences_t*
prefsPtr);

Now, the hard lesson here is that the `dstCapacity` returned by
`LZ4F_compressBound()` even for a single byte of `srcSize` is about ~256K
(this seems to have something to do with the block size of the lz4 frame we
chose; the minimum we can have is 64K), even though the actual length of the
data compressed by `LZ4F_compressUpdate()` is far smaller. Whereas the
destination buffer length available to us, i.e.
`mysink->base.bbs_next->bbs_buffer_length`, is only 32K. In the call to
`LZ4F_compressUpdate()`, if I directly pass
`mysink->base.bbs_next->bbs_buffer + bytes_written` as `dstBuffer` and the
value returned by `LZ4F_compressBound()` as the `dstCapacity`, that seems
plainly incorrect to me, since the actual output buffer length remaining is
much less than what `LZ4F_compressBound()` calculates for the worst case.

For now, I am creating a temporary buffer of the required size, passing it
to the compression call, asserting that the actual number of compressed
bytes is less than whatever length we have available, and then copying it to
our output buffer.

To give an example, I put some logging statements, and I can see in the log:
"
bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
input size to be compressed: 512
estimated size for compressed buffer by LZ4F_compressBound(): 262667
actual compressed size: 16
"

I will really appreciate any inputs, comments, or suggestions here.

Regards,
Jeevan Ladhe

[1] https://fossies.org/linux/lz4/doc/lz4frame_manual.html
[2]
https://www.postgresql.org/message-id/CA+TgmoYgVN=-yoh71r3p9n7ekysd7_9b9s+1qfffcs3w7z-...@mail.gmail.com


lz4_compress_wip.patch
Description: Binary data


Re: refactoring basebackup.c

2021-09-13 Thread Jeevan Ladhe
Thanks, Robert for your response.

On Thu, Sep 9, 2021 at 1:09 AM Robert Haas  wrote:

> On Wed, Sep 8, 2021 at 2:14 PM Jeevan Ladhe
>  wrote:
> > To give an example, I put some logging statements, and I can see in the
> log:
> > "
> > bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
> > input size to be compressed: 512
> > estimated size for compressed buffer by LZ4F_compressBound(): 262667
> > actual compressed size: 16
> > "
>
> That is pretty lame. I don't know why it needs a ~256k buffer to
> produce 16 bytes of output.
>

As I mentioned earlier, I think it has something to do with the lz4
block size. Currently, I have chosen it as 256kB, which is 262144 bytes,
and here LZ4F_compressBound() has returned 262667 for worst-case
accommodation of 512 bytes, i.e. 262144 (256kB) + 512 + I guess some
book-keeping bytes. If I choose the block size as 64kB, then this turns
out to be 66059, which is 65536 (64kB) + 512 + book-keeping bytes.

The way the gzip APIs I used work, you tell it how big the output
> buffer is and it writes until it fills that buffer, or until the input
> buffer is empty, whichever happens first. But this seems to be the
> other way around: you tell it how much input you have, and it tells
> you how big a buffer it needs. To handle that elegantly, I think I
> need to make some changes to the design of the bbsink stuff. What I'm
> thinking is that each bbsink somehow tells the next bbsink how big to
> make the buffer. So if the LZ4 buffer is told that its buffer should
> be at least, I don't know, say 64kB. Then it can compute how large an
> output buffer the LZ4 library requires for 64kB. Hopefully we can
> assume that liblz4 never needs a smaller buffer for a larger input.
> Then we can assume that if a 64kB input requires, say, a 300kB output
> buffer, every possible input < 64kB also requires an output buffer <=
> 300 kB.
>

 I agree, this assumption is fair enough.

But we can't just say, well, we were asked to create a 64kB buffer (or
> whatever) so let's ask the next bbsink for a 300kB buffer (or
> whatever), because then as soon as we write any data at all into it
> the remaining buffer space might be insufficient for the next chunk.
> So instead what I think we should do is have bbsink_lz4 set the size
> of the next sink's buffer to its own buffer size +
> LZ4F_compressBound(its own buffer size). So in this example if it's
> asked to create a 64kB buffer and LZ4F_compressBound(64kB) = 300kB
> then it asks the next sink to set the buffer size to 364kB. Now, that
> means that there will always be at least 300 kB available in the
> output buffer until we've accumulated a minimum of 64 kB of compressed
> data, and then at that point we can flush.

I think this would be relatively clean and would avoid the need for
> the double copying that the current design forced you to do. What do
> you think?
>

I think this should work.
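
To spell out that sizing rule in code form (an illustrative helper only, not
code from the patch; the names own_buffer_length and prefs are placeholders):

#include <lz4frame.h>

/*
 * Illustrative only: how large the next sink's buffer must be so that a
 * worst-case LZ4F_compressUpdate() of one full input buffer always fits,
 * per the scheme described above.
 */
static size_t
required_next_buffer_length(size_t own_buffer_length,
							const LZ4F_preferences_t *prefs)
{
	return own_buffer_length + LZ4F_compressBound(own_buffer_length, prefs);
}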


>
> + /*
> + * If we do not have enough space left in the output buffer for this
> + * chunk to be written, first archive the already written contents.
> + */
> + if (nextChunkLen > mysink->base.bbs_next->bbs_buffer_length -
> mysink->bytes_written ||
> + mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
> + {
> + bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
> + mysink->bytes_written = 0;
> + }
>
> I think this is flat-out wrong. It assumes that the compressor will
> never generate more than N bytes of output given N bytes of input,
> which is not true. Not sure there's much point in fixing it now
> because with the changes described above this code will have to change
> anyway, but I think it's just lucky that this has worked for you in
> your testing.


I see your point. But for it to be accurate, I think we would then need to
consider the return value of LZ4F_compressBound() and check whether that
many bytes are available. But, as explained earlier, our output buffer is
already way smaller than that.


>
>
+ /*
> + * LZ4F_compressUpdate() returns the number of bytes written into output
> + * buffer. We need to keep track of how many bytes have been cumulatively
> + * written into the output buffer(bytes_written). But,
> + * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
> + * written to output buffer, set autoFlush to 1 to force the writing to
> the
> + * output buffer.
> + */
> + prefs->autoFlush = 1;
>
> I don't see why this should be necessary. Elsewhere you have code that
> caters to bytes being stuck inside LZ4's buffer, so why do we also
> require this?
>

This is needed to know the actual bytes written in the output buffer. If it
is set to 0, then LZ4F_compressUpdate() would randomly return 0 or the actual
bytes written to the output buffer, depending on whether it has buffered or
really flushed the data to the output buffer.

Re: refactoring basebackup.c

2021-09-21 Thread Jeevan Ladhe
Thanks for the newer set of the patches Robert!

I was wondering if we should change the bbs_buffer_length in bbsink to
be size_t instead of int, because that's what most of the compression
libraries have their length variables defined as.

Regards,
Jeevan Ladhe

On Mon, Sep 13, 2021 at 9:42 PM Robert Haas  wrote:

> On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar  wrote:
> > Seems like nothing has been done about the issue reported in [1]
> >
> > This one line change shall fix the issue,
>
> Oops. Try this version.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: refactoring basebackup.c

2021-09-21 Thread Jeevan Ladhe
>
> >> + /*
> >> + * LZ4F_compressUpdate() returns the number of bytes written into
> output
> >> + * buffer. We need to keep track of how many bytes have been
> cumulatively
> >> + * written into the output buffer(bytes_written). But,
> >> + * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
> >> + * written to output buffer, set autoFlush to 1 to force the writing
> to the
> >> + * output buffer.
> >> + */
> >> + prefs->autoFlush = 1;
> >>
> >> I don't see why this should be necessary. Elsewhere you have code that
> >> caters to bytes being stuck inside LZ4's buffer, so why do we also
> >> require this?
> >
> > This is needed to know the actual bytes written in the output buffer. If
> it is
> > set to 0, then LZ4F_compressUpdate() would randomly return 0 or actual
> > bytes are written to the output buffer, depending on whether it has
> buffered
> > or really flushed data to the output buffer.
>
> The problem is that if we autoflush, I think it will cause the
> compression ratio to be less good. Try un-lz4ing a file that is
> produced this way and then re-lz4 it and compare the size of the
> re-lz4'd file to the original one. Compressors rely on postponing
> decisions about how to compress until they've seen as much of the
> input as possible, and flushing forces them to decide earlier, and
> maybe making a decision that isn't as good as it could have been. So I
> believe we should look for a way of avoiding this. Now I realize
> there's a problem there with doing that and also making sure the
> output buffer is large enough, and I'm not quite sure how we solve
> that problem, but there is probably a way to do it.
>

Yes, you are right here, and I could verify this fact with an experiment:
when autoFlush is 1, the file is compressed less effectively, i.e. the
compressed file is larger than the one generated when autoFlush is set to 0.
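
For reference, the kind of check suggested above can be done with the lz4
command-line tool; the file names below are illustrative:

$ lz4 -d base.tar.lz4 base.tar        # decompress the server-produced archive
$ lz4 base.tar base_re.tar.lz4        # recompress it with the stock lz4 tool
$ ls -l base.tar.lz4 base_re.tar.lz4  # compare the two compressed sizes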
But, as of now, I cannot think of a solution, since we need to know how many
bytes were actually written in order to advance our position in the output
buffer before we can write into it again.

Regards,
Jeevan Ladhe


Re: refactoring basebackup.c

2021-09-21 Thread Jeevan Ladhe
Hi Robert,

Here is a patch for lz4 based on the v5 set of patches. The patch adapts
with the
bbsink changes, and is now able to make the provision for the required
length
for the output buffer using the new callback
function bbsink_lz4_begin_backup().

Sample command to take backup:
pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4

Please let me know your thoughts.

Regards,
Jeevan Ladhe

On Mon, Sep 13, 2021 at 9:42 PM Robert Haas  wrote:

> On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar  wrote:
> > Seems like nothing has been done about the issue reported in [1]
> >
> > This one line change shall fix the issue,
>
> Oops. Try this version.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


lz4_compress_v2.patch
Description: Binary data


Re: refactoring basebackup.c

2021-09-22 Thread Jeevan Ladhe
On Tue, Sep 21, 2021 at 10:27 PM Robert Haas  wrote:

> On Tue, Sep 21, 2021 at 9:08 AM Jeevan Ladhe
>  wrote:
> > Yes, you are right here, and I could verify this fact with an experiment.
> > When autoflush is 1, the file gets less compressed i.e. the compressed
> file
> > is of more size than the one generated when autoflush is set to 0.
> > But, as of now, I couldn't think of a solution as we need to really
> advance the
> > bytes written to the output buffer so that we can write into the output
> buffer.
>
> I don't understand why you think we need to do that. What happens if
> you just change prefs->autoFlush = 1 to set it to 0 instead? What I
> think will happen is that you'll call LZ4F_compressUpdate a bunch of
> times without outputting anything, and then suddenly one of the calls
> will produce a bunch of output all at once. But so what? I don't see
> that anything in bbsink_lz4_archive_contents() would get broken by
> that.
>
> It would be a problem if LZ4F_compressUpdate() didn't produce anything
> and also didn't buffer the data internally, and expected us to keep
> the input around. That we would have difficulty doing, because we
> wouldn't be calling LZ4F_compressUpdate() if we didn't need to free up
> some space in that sink's input buffer. But if it buffers the data
> internally, I don't know why we care.
>

If I set prefs->autoFlush to 0, then LZ4F_compressUpdate() returns an
error: ERROR_dstMaxSize_tooSmall after a few iterations.

After digging a bit into the source of LZ4F_compressUpdate() in the LZ4
repository, I see that it throws this error when the destination buffer
capacity, which in our case is mysink->base.bbs_next->bbs_buffer_length, is
less than the compress bound that it computes internally by calling
LZ4F_compressBound() for buffered_bytes + the input buffer (CHUNK_SIZE in
this case). I am not sure how we can control this.
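
In other words, the check that seems to trip is roughly the following (my
paraphrase as a sketch, not the literal lz4 source; buffered_bytes stands for
lz4's internal count of not-yet-flushed input):

#include <stdbool.h>
#include <lz4frame.h>

/*
 * Sketch of the failing condition: LZ4F_compressUpdate() reports
 * ERROR_dstMaxSize_tooSmall when the destination capacity cannot cover the
 * worst case for the internally buffered input plus the new chunk.
 */
static bool
update_would_fail(size_t dst_capacity, size_t buffered_bytes,
				  size_t chunk_size, const LZ4F_preferences_t *prefs)
{
	return dst_capacity < LZ4F_compressBound(buffered_bytes + chunk_size, prefs);
}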

Regards,
Jeevan Ladhe


Re: refactoring basebackup.c

2021-09-22 Thread Jeevan Ladhe
On Tue, Sep 21, 2021 at 10:50 PM Robert Haas  wrote:

>
> + if (opt->compression == BACKUP_COMPRESSION_LZ4)
>
> else if
>
> + /* First of all write the frame header to destination buffer. */
> + Assert(CHUNK_SIZE >= LZ4F_HEADER_SIZE_MAX);
> + headerSize = LZ4F_compressBegin(mysink->ctx,
> + mysink->base.bbs_next->bbs_buffer,
> + CHUNK_SIZE,
> + prefs);
>
> I think this is wrong. I think you should be passing bbs_buffer_length
> instead of CHUNK_SIZE, and I think you can just delete CHUNK_SIZE. If
> you think otherwise, why?
>
> + * sink's bbs_buffer of length that can accomodate the compressed input
>
> Spelling.
>
> + * Make it next multiple of BLCKSZ since the buffer length is expected so.
>
> The buffer length is expected to be a multiple of BLCKSZ, so round up.
>
> + * If we are falling short of available bytes needed by
> + * LZ4F_compressUpdate() per the upper bound that is decided by
> + * LZ4F_compressBound(), send the archived contents to the next sink to
> + * process it further.
>
> If the number of available bytes has fallen below the value computed
> by LZ4F_compressBound(), ask the next sink to process the data so that
> we can empty the buffer.
>

Thanks for your comments, Robert.
Here is the patch addressing the comments, except the one regarding the
autoFlush flag setting.

Kindly have a look.

Regards,
Jeevan Ladhe


lz4_compress_v3.patch
Description: Binary data


Re: refactoring basebackup.c

2021-09-24 Thread Jeevan Ladhe
>
> Still, there's got to be a simple way to make this work, and it can't
> involve setting autoFlush. Like, look at this:
>
> https://github.com/lz4/lz4/blob/dev/examples/frameCompress.c
>
> That uses the same APIs that we're here and a fixed-size input buffer
> and a fixed-size output buffer, just as we have here, to compress a
> file. And it probably works, because otherwise it likely wouldn't be
> in the "examples" directory. And it sets autoFlush to 0.
>

Thanks, Robert. I have seen this example, and it is similar to what we have.
I went through each of the steps, and it appears that I have done it correctly.
I am still trying to debug and figure out where it is going wrong.

I am going to try hooking the pg_basebackup with the lz4 source and
debug both the sources.

Regards,
Jeevan Ladhe


Re: refactoring basebackup.c

2021-10-05 Thread Jeevan Ladhe
Hi Robert,

I have fixed the autoFlush issue. Basically, I was wrongly initializing
the lz4 preferences in bbsink_lz4_begin_archive() instead of
bbsink_lz4_begin_backup(). I have fixed the issue in the attached
patch, please have a look at it.

Regards,
Jeevan Ladhe


On Fri, Sep 24, 2021 at 6:27 PM Jeevan Ladhe 
wrote:

> Still, there's got to be a simple way to make this work, and it can't
>> involve setting autoFlush. Like, look at this:
>>
>> https://github.com/lz4/lz4/blob/dev/examples/frameCompress.c
>>
>> That uses the same APIs that we're here and a fixed-size input buffer
>> and a fixed-size output buffer, just as we have here, to compress a
>> file. And it probably works, because otherwise it likely wouldn't be
>> in the "examples" directory. And it sets autoFlush to 0.
>>
>
> Thanks, Robert. I have seen this example, and it is similar to what we
> have.
> I went through each of the steps and appears that I have done it correctly.
> I am still trying to debug and figure out where it is going wrong.
>
> I am going to try hooking the pg_basebackup with the lz4 source and
> debug both the sources.
>
> Regards,
> Jeevan Ladhe
>


lz4_compress_v4.patch
Description: Binary data


Re: refactoring basebackup.c

2021-10-07 Thread Jeevan Ladhe
Hi Robert,

I think the patch v6-0007-Support-base-backup-targets.patch has broken
the case for multiple tablespaces. When I tried to take the backup
for target 'none' and extracted base.tar, I was not able to locate the
tablespace_map file.

I debugged and figured out that in the normal tar backup case, i.e. '-Ft',
the pg_basebackup command is sent to the server with TABLESPACE_MAP:
BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS,
TABLESPACE_MAP, MANIFEST 'yes', TARGET 'client')

But with the target command, i.e. "pg_basebackup -t server:/tmp/data_v1
-Xnone", we are not sending TABLESPACE_MAP; here is how the command
is sent:
BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST
'yes', TARGET 'server', TARGET_DETAIL '/tmp/data_none')

I am attaching a patch to fix this issue.

With the patch the command sent is now:
BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST
'yes', TABLESPACE_MAP, TARGET 'server', TARGET_DETAIL '/tmp/data_none')

Regards,
Jeevan Ladhe

On Tue, Sep 21, 2021 at 10:22 PM Robert Haas  wrote:

> On Tue, Sep 21, 2021 at 7:54 AM Jeevan Ladhe
>  wrote:
> > I was wondering if we should change the bbs_buffer_length in bbsink to
> > be size_t instead of int, because that's what most of the compression
> > libraries have their length variables defined as.
>
> I looked into this and found that I was already using size_t or Size
> in a bunch of related places, so this seems to make sense.
>
> Here's a new patch set, responding also to Sergei's comments.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


fix_missing_tablespace_map_issue.patch
Description: Binary data


Re: [PATCH] improve the pg_upgrade error message

2021-07-13 Thread Jeevan Ladhe
> The admin might fix DB123, restart their upgrade procedure, spend 5 or 15
> minutes with that, only to have it then fail on DB1234.
>

Agree with this observation.

Here is a patch that writes to a file the list of all databases other than
template0 that have pg_database.datallowconn set to false. A similar
approach is used in other functions like check_for_data_types_usage(), etc.
Thanks to Suraj Kharage for the offline suggestion.

PFA patch.

As an experiment, here is how it turns out after the fix.

postgres=# update pg_database set datallowconn='false' where datname in
('mydb', 'mydb1', 'mydb2');
UPDATE 3

$ pg_upgrade -d /tmp/v96/data -D /tmp/v13/data -b $HOME/v96/install/bin -B
$HOME/v13/install/bin
Performing Consistency Checks
-
Checking cluster versions   ok
Checking database user is the install user  ok
Checking database connection settings   fatal

All non-template0 databases must allow connections, i.e. their
pg_database.datallowconn must be true. Your installation contains
non-template0 databases with their pg_database.datallowconn set to
false. Consider allowing connection for all non-template0 databases
using:
UPDATE pg_catalog.pg_database SET datallowconn='true' WHERE datname NOT
LIKE 'template0';
A list of databases with the problem is given in the file:
databases_with_datallowconn_false.txt

Failure, exiting

$ cat databases_with_datallowconn_false.txt
mydb
mydb1
mydb2


Regards,
Jeevan Ladhe


v2-0001-Improve-the-pg_upgrade-error-message.patch
Description: Binary data


Re: .ready and .done files considered harmful

2021-07-22 Thread Jeevan Ladhe
Thanks, Dipesh. The patch LGTM.

Some minor suggestions:

+ *
+ * "nextLogSegNo" identifies the next log file to be archived in a log
+ * sequence and the flag "dirScan" specifies a full directory scan to find
+ * the next log file.


IMHO, this comment should go atop of pgarch_readyXlog() as a description

of its parameters, and not in pgarch_ArchiverCopyLoop().


 /*
+ * Interrupt handler for archiver
+ *
+ * There is a timeline switch and we have been notified by backend.
+ */


Instead, I would suggest having something like this:


+/*
+ * Interrupt handler for handling the timeline switch.
+ *
+ * A timeline switch has been notified, mark this event so that the next
+ * iteration of pgarch_ArchiverCopyLoop() archives the history file, and we
+ * set the timeline to the new one for the next anticipated log segment.
+ */


Regards,

Jeevan Ladhe

On Thu, Jul 22, 2021 at 12:46 PM Dipesh Pandit 
wrote:

> Hi,
>
> > some comments on v2.
> Thanks for your comments. I have incorporated the changes
> and updated a new patch. Please find the details below.
>
> > On the timeline switch, setting a flag should be enough, I don't think
> > that we need to wake up the archiver.  Because it will just waste the
> > scan cycle.
> Yes, I modified it.
>
> > Why do we need multi level interfaces? I mean instead of calling first
> > XLogArchiveNotifyTLISwitch and then calling PgArchNotifyTLISwitch,
> > can't we directly call PgArchNotifyTLISwitch()?
> Yes, multilevel interfaces are not required. Removed extra interface.
>
> > +if (timeline_switch)
> > +{
> > +/* Perform a full directory scan in next cycle */
> > +dirScan = true;
> > +timeline_switch = false;
> > +}
>
> > I suggest you can add some comments atop this check.
> Added comment to specify the action required in case of a
> timeline switch.
>
> > I think you should use %m in the error message so that it also prints
> > the OS error code.
> Done.
>
> > Why is this a global variable?  I mean whenever you enter the function
> > pgarch_ArchiverCopyLoop(), this can be set to true and after that you
> > can pass this as inout parameter to pgarch_readyXlog() there in it can
> > be conditionally set to false once we get some segment and whenever
> > the timeline switch we can set it back to the true.
> Yes, It is not necessary to have global scope for "dirScan". Changed
> the scope to local for "dirScan" and "nextLogSegNo".
>
> PFA patch v3.
>
> Thanks,
> Dipesh
>


Re: PostgreSQL and big data - FDW

2020-06-24 Thread Jeevan Ladhe
On Wed, Jun 24, 2020 at 6:09 PM ROS Didier  wrote:

> Hi Bruce
>
> In the following link :
> https://www.enterprisedb.com/blog/connecting-hadoop-and-edb-postgres-shrink-big-data-challenges
> We can see :
> "Support for various authentication methods (i.e. Kerberos, NOSASL, etc.)"
>
> So HDFS_FDW  support kerberos authentication . how to be sure of that  ?
> Could EDB make a clear statement on this point?
>

HDFS_FDW does not support Kerberos authentication.
The sentence you have pasted above is from the wish list, or say the TODO
list; here is what it says:
"Currently the HDFS_FDW only provides READ capabilities but EDB is planning
the following additional functionality:"

That functionality was not implemented. I think part of the confusion might
be due to the formatting of the list in the blog.

You can follow the README[1] of HDFS_FDW to get an idea of how to use it.

[1] https://github.com/EnterpriseDB/hdfs_fdw/blob/master/README.md

Regards,
Jeevan


Re: WIP/PoC for parallel backup

2020-04-21 Thread Jeevan Ladhe
Hi Asif,

On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman  wrote:

> Hi,
>
> I did some tests a while back, and here are the results. The tests were
> done to simulate
> a live database environment using pgbench.
>
> machine configuration used for this test:
> Instance Type:t2.xlarge
> Volume Type  :io1
> Memory (MiB) :16384
> vCPU #   :4
> Architecture:X86_64
> IOP :16000
> Database Size (GB) :102
>
> The setup consist of 3 machines.
> - one for database instances
> - one for pg_basebackup client and
> - one for pgbench with some parallel workers, simulating SELECT loads.
>
>basebackup | 4 workers | 8 Workers  |
> 16 workers
> Backup Duration(Min):   69.25|  20.44  | 19.86  |
> 20.15
> (pgbench running with 50 parallel client simulating SELECT load)
>


Well, that looks a bit strange. The 4-, 8- and 16-worker backup
configurations all seem to have taken about the same time. Is it because the
machine has only 4 CPUs? In that case, did you try running with 2 workers
and comparing that with the 4-worker time?

Also, just to clarify and be sure - was there anything else running on any of
these 3 machines while the backup was in progress?

Regards,
Jeevan Ladhe


> Backup Duration(Min):   154.75   |  49.28 | 45.27 | 20.35
> (pgbench running with 100 parallel client simulating SELECT load)
>
>
>
> On Tue, Apr 21, 2020 at 9:27 AM Amit Kapila 
> wrote:
>
>> On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman 
>> wrote:
>>
>>>
>>> I forgot to make a check for no-manifest. Fixed. Attached is the updated
>>> patch.
>>>
>>>
>> Have we done any performance testing with this patch to see the benefits?
>> If so, can you point me to the results? If not, then can we perform some
>> tests on large backups to see the benefits of this patch/idea?
>>
>> --
>> With Regards,
>> Amit Kapila.
>> EnterpriseDB: http://www.enterprisedb.com
>>
>
>
> --
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>


Re: ToDo: show size of partitioned table

2018-06-19 Thread Jeevan Ladhe
On Fri, Jun 1, 2018 at 8:51 PM, Pavel Stehule 
wrote:

>
>
> 2018-06-01 17:15 GMT+02:00 Ashutosh Bapat  >:
>
>> On Fri, Jun 1, 2018 at 12:56 AM, Pavel Stehule 
>> wrote:
>> > Hi
>> >
>> > postgres=# SELECT count(*) from data;
>> > ┌─┐
>> > │  count  │
>> > ╞═╡
>> > │ 100 │
>> > └─┘
>> > (1 row)
>> >
>> > \dt+ can display actual size of partitioned table data - now zero is
>> > displayed
>> >
>> > postgres=# \dt+ data
>> > List of relations
>> > ┌┬──┬───┬───┬─┬─┐
>> > │ Schema │ Name │ Type  │ Owner │  Size   │ Description │
>> > ╞╪══╪═══╪═══╪═╪═╡
>> > │ public │ data │ table │ pavel │ 0 bytes │ │
>> > └┴──┴───┴───┴─┴─┘
>> > (1 row)
>>
>> I think we should at least display "Type" as "partitioned table" for a
>> partitioned table, so that it's easy to understand why the size is 0;
>> partitioned tables do not hold any data by themselves.
>>
>
> should be.
>
>
Yes, or maybe we can add that info in "Description".


> Some is missing still - there is not any total size across all partitions.
>
> maybe new command like
>
> \dtP+ .. show partitioned tables and their size
>

+1
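
Until something like that exists, the total across the direct partitions can
be obtained with a catalog query along these lines (just a sketch; it does
not recurse into sub-partitions):

postgres=# SELECT pg_size_pretty(sum(pg_total_relation_size(inhrelid)))
           FROM pg_inherits
           WHERE inhparent = 'data'::regclass;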


Regards,
Jeevan Ladhe


Re: "Access privileges" is missing after pg_dumpall

2018-06-26 Thread Jeevan Ladhe
This seems to be a known issue; the reason, as I understand it, is that the
users/roles may already exist but with a different meaning. I see it has
been discussed[1] in the past, and there is also a wiki[2] page called
"Pg dump improvements".

[1]https://www.postgresql.org/message-id/flat/5280E2AE.8070106%40usit.uio.no
[2]https://wiki.postgresql.org/wiki/Pg_dump_improvements

Regards,
Jeevan Ladhe


On Tue, Jun 26, 2018 at 12:12 PM, Prabhat Sahu <
prabhat.s...@enterprisedb.com> wrote:

> Hi,
>
> I have taken pg_dumpall in pg-master and after restoring the dump I am not
> able to see the "Access privileges" as below:
> Same is reproducible in back branches as well, is this fine ?
>
>
> CREATE ROLE user1 PASSWORD 'user1' SUPERUSER LOGIN;
> CREATE DATABASE db1 OWNER=user1;
> GRANT ALL ON DATABASE db1 TO user1;
>
> postgres=# \l+ db1
>  List of databases
>  Name | Owner | Encoding |  Collate  |   Ctype| Access
> privileges   |  Size   | Tablespace | Description
> --+---+--+++
> ---+-++-
>  db1| user1   | UTF8   | en_US.utf8 | en_US.utf8 | =Tc/user1
>+| 7621 kB | pg_default  |
>| | ||
>   | user1=CTc/user1|   ||
> (1 row)
>
> postgres=# SELECT datname as "Relation", datacl as "Access permissions"
> FROM pg_database WHERE datname = 'db1';
>  Relation | Access permissions
> --+-
>  db1  | {=Tc/user1,user1=CTc/user1}
> (1 row)
>
>
> -- pg_dumpall
> ./pg_dumpall > /tmp/dumpall.sql
>
> -- Restore
> ./psql -a -f /tmp/dumpall.sql
>
>
> postgres=# \l+ db1
>  List of databases
>  Name | Owner | Encoding |  Collate   |   Ctype   | Access
> privileges |  Size  | Tablespace | Description
> --+---+--+++
> ---+-++-
>  db1| user1   | UTF8   | en_US.utf8 | en_US.utf8 |
>   | 7699 kB | pg_default |
> (1 row)
>
> postgres=# SELECT datname as "Relation", datacl as "Access permissions"
> FROM pg_database WHERE datname = 'db1';
>  Relation | Access permissions
> --+
>  db1  |
> (1 row)
>
> --
>
> With Regards,
>
> Prabhat Kumar Sahu
> Skype ID: prabhat.sahu1984
> EnterpriseDB Corporation
>
> The Postgres Database Company
>


Re: partition tree inspection functions

2018-06-26 Thread Jeevan Ladhe
Hi Amit,

On Tue, Jun 26, 2018 at 1:07 PM, Amit Langote  wrote:

> On 2018/06/26 14:08, Amit Langote wrote:
> > Hi.
> >
> > As discussed a little while back [1] and also recently mentioned [2],
> here
> > is a patch that adds a set of functions to inspect the details of a
> > partition tree.  There are three functions:
> >
> > pg_partition_parent(regclass) returns regclass
> > pg_partition_root_parent(regclass) returns regclass
> > pg_partition_tree_tables(regclass) returns setof regclass
> >
>

I quickly tried applying your patch, created a couple of tables and
sub-partitions with a mix of range and list partitions, and I see that these
3 functions are working as documented.
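
For reference, the quick test I ran was roughly along these lines (the table
names here are mine, not from the patch):

CREATE TABLE parent (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE child PARTITION OF parent
    FOR VALUES FROM (0) TO (100) PARTITION BY LIST (b);
CREATE TABLE grandchild PARTITION OF child FOR VALUES IN (1, 2);

SELECT pg_partition_parent('grandchild');       -- expect: child
SELECT pg_partition_root_parent('grandchild');  -- expect: parent
SELECT pg_partition_tree_tables('parent');      -- expect: parent, child, grandchild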

Also, the patch does not have any 'make check' failures.

I will do a further code review and post comments, if any.

Regards,
Jeevan Ladhe