Re: Insert into on conflict, data size up to 3 billion records

2021-02-15 Thread Karthik K
Exactly. For now, since the table is already partitioned, what I did was
create 50 different connections and update the target table by querying
directly from the source partition tables. Are there any other techniques I
can use to speed this up? Also, when we use an ON CONFLICT statement for
both insert and update, does Postgres batch internally (committing every 1
records, etc.), or does it update all records at once? In that case, does
it create a new version for each record and swap them all at once? I'm
wondering how atomicity is guaranteed. Also, if I have to batch some other
way than selecting from individual partitions, does doing it in batches of
1 records help?
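For what it's worth: a single INSERT ... ON CONFLICT statement in Postgres is atomic. It does not commit internal sub-batches; MVCC writes a new version of each updated row, and all of them become visible at once on commit. Batch-wise commits therefore have to come from the client. A minimal sketch of client-side batching (Python's stdlib sqlite3 stands in for Postgres here because its upsert syntax mirrors Postgres's; the table name and batch size are made up):

```python
import sqlite3

BATCH_SIZE = 2  # tiny for illustration; typically thousands of rows per commit

def batched_upsert(conn, rows):
    """Upsert rows in fixed-size batches, committing after each batch.

    Each commit ends one transaction, so a failure mid-run loses at most
    one batch of work -- this is the batching Postgres will NOT do for you
    inside a single INSERT ... ON CONFLICT statement.
    """
    cur = conn.cursor()
    for start in range(0, len(rows), BATCH_SIZE):
        cur.executemany(
            "INSERT INTO target(id, val) VALUES(?, ?) "
            "ON CONFLICT(id) DO UPDATE SET val = excluded.val",
            rows[start:start + BATCH_SIZE],
        )
        conn.commit()  # each batch is its own transaction

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target(id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'old')")  # pre-existing row -> update path
batched_upsert(conn, [(1, "new"), (2, "b"), (3, "c")])
```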





On Sat, Feb 13, 2021 at 12:04 PM Ron  wrote:

> On 2/12/21 12:46 PM, Karthik Kumar Kondamudi wrote:
>
> Hi,
>
> I'm looking for suggestions on how I can improve the performance of the
> merge statement below. We have a batch process that loads data into the
> _batch tables using Postgres, and the task is to update the main target
> tables if the record exists, else insert it; sometimes these batch tables
> can go up to 5 billion records. Here is the current scenario:
>
> target_table_main has 700,070,247 records and is hash-partitioned into
> 50 chunks, with an index on logical_ts. The batch table has
> 2,715,020,546 (close to 3 billion) records, so I'm dealing with a huge
> set of data and am looking to do this in the most efficient way.
>
>
> Many times, I have drastically sped up batch processing by #1 partitioning
> on the same field as an index, and #2 pre-sorting the input data by that
> field.
>
> That way, you get excellent "locality of data" (meaning lots of writes to
> the same hot bits of cache, which later get asynchronously flushed to
> disk).  Unfortunately for your situation, the purpose of hash partitioning
> is to *reduce* locality of data.  (Sometimes that's useful, but *not*
> when processing batches.)
>
> --
> Angular momentum makes the world go 'round.
>


-- 
Regards,

Karthik K Kondamudi


Re: Insert into on conflict, data size up to 3 billion records

2021-02-15 Thread Karthik K
Yes, I'm using \copy to load the batch table.

With the new design we are doing, we expect fewer updates and more inserts
going forward. One of the target columns I'm updating is indexed, so I will
drop the index and try it out. Also, from your suggestion above, splitting
the ON CONFLICT into separate insert and update statements is performant,
but in order to split the records into batches (low, high) I first need to
do a count of the primary key on the batch tables.
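One way to avoid the up-front count: derive non-overlapping (low, high) id windows from MIN(id)/MAX(id) instead, then run the update-then-insert split per window. A sketch under that assumption (stdlib sqlite3 stands in for Postgres; table and column names are invented):

```python
import sqlite3

def id_windows(conn, table, step):
    """Yield non-overlapping inclusive (low, high) id ranges covering the
    table, computed from MIN/MAX -- no COUNT(*) over the whole batch table.
    Table name is interpolated for brevity; not safe for untrusted input."""
    lo, hi = conn.execute(f"SELECT MIN(id), MAX(id) FROM {table}").fetchone()
    while lo <= hi:
        yield lo, min(lo + step - 1, hi)
        lo += step

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target(id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE batch(id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO target VALUES (1, 'old1'), (2, 'old2');
    INSERT INTO batch  VALUES (1, 'new1'), (3, 'new3'), (4, 'new4');
""")

for low, high in id_windows(conn, "batch", step=2):
    # Update first: rows present in both tables within this window.
    conn.execute("""
        UPDATE target
           SET val = (SELECT b.val FROM batch b WHERE b.id = target.id)
         WHERE target.id BETWEEN ? AND ?
           AND EXISTS (SELECT 1 FROM batch b WHERE b.id = target.id)
    """, (low, high))
    # Then insert the rows the update could not touch.
    conn.execute("""
        INSERT INTO target
        SELECT b.* FROM batch b
         WHERE b.id BETWEEN ? AND ?
           AND NOT EXISTS (SELECT 1 FROM target t WHERE t.id = b.id)
    """, (low, high))
    conn.commit()  # one transaction per window
```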


On Mon, Feb 15, 2021 at 11:06 AM Rob Sargent  wrote:

>
>
> On 2/15/21 11:41 AM, Karthik K wrote:
> > exactly, for now, what I did was, as the table is already partitioned, I
> > created 50 different connections and tried updating the target table by
> > directly querying from the source partition tables. Are there any other
> > techniques that I can use to speed this up? also when we use on conflict
> > statement for both insert and update does Postgres uses batching
> > internally (committing for every 1 records etc) or will it update
> > all records at once, in that case, does it create a version for each
> > record and do swap all at once? I'm wondering how atomicity is
> > guaranteed, also if I have to do batching other than selecting from
> > individual partitions does doing it batches of 1 records help?
> >
> >
>
> What is your ratio of inserts versus update?  Can you separate the
> inserts and updates?  Is the target table indexed other than on primary
> key? If so can they be dropped?
>
> Assuming you use \copy to load the batch tables
>
> I've found this strategy to be effective:
> index batch on id
>
> -- update first
> begin;
> update target t set "all fields" from batch b where t.id = b.id and b.id
> between "low" and "high";
> commit;
> increment low and high, avoiding overlap; repeat
>
> -- insert
> begin;
> insert into target select b.* from batch b where not exists
> (select 1 from target v where b.id = v.id) and b.id between "low" and "high";
> commit;
> increment low and high, avoiding overlap; repeat
>

-- 
Regards,

Karthik K Kondamudi


Migrating from Oracle - Implicit Casting Issue

2022-07-18 Thread Karthik K L V
Hi Team,

We are migrating from Oracle 12c to Aurora Postgres 13 and running into
implicit casting issues.

Oracle is able to implicitly cast the bind values of prepared statements
executed from the application to the appropriate type - String -> Number,
String -> Date, Number -> String, etc. - when there is a mismatch between
the Java data type and the column datatype.

For example: if the datatype of a column is defined as Number and the
application sends the bind value as a String (with single quotes in the
query), Oracle is able to implicitly cast it to Number, execute the query,
and return the results.

The same is not true with Postgres, and we get the exception below:

*org.postgresql.util.PSQLException: ERROR: operator does not exist: bigint
= character varying*
*Hint: No operator matches the given name and argument types. You might
need to add explicit type casts..*

We found a Postgres JDBC driver property - stringtype=unspecified - which
appears to solve this problem, and we have the following questions.
https://jdbc.postgresql.org/documentation/83/connect.html
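For reference, per the pgJDBC documentation linked above, the property is set on the JDBC connection URL (host and database names below are placeholders):

```
jdbc:postgresql://dbhost:5432/mydb?stringtype=unspecified
```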

Could you please let us know the following?

Q1) Will configuring this stringtype property introduce overhead on
Postgres, leading to performance issues?
Q2) Does setting this attribute have any other implications for the data in
the DB?
Q3) Is there any plan to deprecate or stop supporting this attribute in
future Aurora Postgres releases?


-- 
Karthik klv


Re: Migrating from Oracle - Implicit Casting Issue

2022-07-19 Thread Karthik K L V
Hi David,

Thanks for the quick response.
Making sure I got it right: you mean the Postgres DB server when you say
"server", right?
IIUC, by configuring this property the driver will not set the type and
will leave it to the Postgres server to map the value to the appropriate
type. Will this have any performance implications on the Postgres server
that we need to worry about?

Thanks,
Karthik K L V

On Tue, Jul 19, 2022 at 12:12 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Monday, July 18, 2022, Karthik K L V 
> wrote:
>
>> Hi Team,
>>
>> We are migrating from Oracle 12c to Aurora Postgres 13 and running into
>> implicit casting issues.
>>
>> Oracle is able to implicitly cast the bind value of prepared statements
>> executed from the application to appropriate type - String -> Number,
>> String -> Date, Number -> String etc. when there is a mismatch b/w java
>> data type and the column Datatype.
>>
>> For example: If the Datatype of a Column is defined as Number and the
>> application sends the bind value as a String (with single quotes in the
>> query) - Oracle DB is able to implicitly cast to Number and execute the
>> query and return the results.
>>
>> The same is not true with Postgres and we are getting below exception
>>
>> *org.postgresql.util.PSQLException: ERROR: operator does not exist:
>> bigint = character varying*
>> *Hint: No operator matches the given name and argument types. You might
>> need to add explicit type casts..*
>>
>> We found a Postgres Driver property - stringtype=unspecified which
>> appears to solve this problem and have the following questions.
>> https://jdbc.postgresql.org/documentation/83/connect.html
>>
>> Could you please let us know the following?
>>
>> Q1) Will configuring this stringtype property introduce overhead on
>> Postgres leading to Performance issues
>> Q2)Does setting this attribute have any other implications on the data in
>> the DB.
>> Q3)Is there any plan to deprecate / stop supporting this attribute in
>> future Aurora Postgres releases.
>>
>>
> That setting is not recognized by the server in any way; it is a driver
> concern only.  IIUC, it makes the Java driver behave in a way consistent
> with the expectations of the server, since by leaving the supplied type
> info undeclared the server can use its own logic.  If it works for you, I
> say use it; it will be less problematic than methodically fixing your
> queries up front.  Though for any queries that turn out to be bottlenecks,
> getting the type info correct may prove to make a difference.
>
> David J.
>
>

-- 
Karthik klv


operator does not exist: text = bytea

2022-07-20 Thread Karthik K L V
Hi Team,

I am getting the below error while executing a SELECT query using the
Spring Data JPA and Hibernate frameworks against Aurora PostgreSQL.



*Caused by: org.postgresql.util.PSQLException: ERROR: operator does not
exist: text = bytea  Hint: No operator matches the given name and argument
types. You might need to add explicit type casts.  Position: 1037*

We have a query with a bind value which sometimes resolves to null (no
data) depending on the application scenario.
The datatype of the bind value and the corresponding column is String.
The same query executes fine when the bind value is populated.

Could you please let me know how I can resolve this issue without making
changes to the query?
Is there any configuration available in the Postgres driver or on the
Postgres server that can be set to resolve null bind values?

Appreciate your help. Thank you.

-- 
Karthik klv


Re: operator does not exist: text = bytea

2022-07-20 Thread Karthik K L V
Update:

Followed this thread
PostgreSQL: Re: Null bind variable in where clause
<https://www.postgresql.org/message-id/4906DD3E-322A-4E26-8600-B967AFA8A58F%40fastcrypt.com>

and set *transform_null_equals to ON* in the parameter group on the AWS
Console. But no luck.

We are using Aurora PostgreSQL v13.3

On Wed, Jul 20, 2022 at 3:02 PM Karthik K L V 
wrote:

> Hi Team,
>
> I am getting the below error while executing a Select query using Spring
> DataJPA and Hibernate framework in Aurora Postgres SQL.
>
>
>
> *Caused by: org.postgresql.util.PSQLException: ERROR: operator does not
> exist: text = bytea  Hint: No operator matches the given name and argument
> types. You might need to add explicit type casts.  Position: 1037*
>
> We have a query with bind value which sometimes gets resolved to null (no
> data) depending on the application scenario.
> The datatype of the bindvalue and the corresponding column is String.
> The same query executes fine when the value of the bindvalue is populated.
>
> Could you please let me know how I can resolve this issue without making
> changes to the query?
> Is there any configuration available in the Postgres Driver or on the
> Postgres DB Server which can be set to resolve null bind values?
>
> Appreciate your help. Thank you.
>
> --
> Karthik klv
>


-- 
Karthik klv


Re: operator does not exist: text = bytea

2022-07-20 Thread Karthik K L V
Hi  depesz,

Thanks for your reply. But this issue happens only when the bind value of
the query resolves to null; I am not trying to compare text to bytes.
The same query works fine when the bind value resolves to an actual string,
so I'm looking for an option that tells the Postgres engine to read *= null*
as *is null*.

On Wed, Jul 20, 2022 at 5:29 PM hubert depesz lubaczewski 
wrote:

> On Wed, Jul 20, 2022 at 03:02:13PM +0530, Karthik K L V wrote:
> > *Caused by: org.postgresql.util.PSQLException: ERROR: operator does not
> > exist: text = bytea  Hint: No operator matches the given name and
> argument
> > types. You might need to add explicit type casts.  Position: 1037*
> > Could you please let me know how I can resolve this issue without making
> > changes to the query?
>
> I don't think it's possible.
>
> bytea is basically array of bytes.
> text on the other hand is array of characters.
>
> Do the bytes "\xbf\xf3\xb3\x77" equal text "żółw"?
>
> They actually kinda do, if we assume the bytes are text in encoding
> windows-1250 (or ISO-8859-2) - in which case the bytes mean "żółw".
>
> But if we'd assume the encoding to be, for example, iso8859-1, then the
> same sequence of bytes means "¿ó³w"
>
> That's why you can't compare bytes to characters.
>
> You have to either convert bytes to text using convert or convert_from
> functions, or change text into bytea using convert_to.
>
> In some cases you can simply cast text to bytea:
>
> $ select 'depesz'::text::bytea;
>  bytea
> 
>  \x64657065737a
> (1 row)
>
> which will work using current server encoding, afair.
>
> depesz
>


-- 
Karthik klv


Postgres SQL unable to handle Null values for Text datatype

2022-09-05 Thread Karthik K L V
Hi Team,

We are migrating from Oracle 12c to Aurora Postgres 13 and running into
query failures when the bind value for a Text-datatype column resolves to
null.

The same query works fine in Oracle without any issues. We use the Spring
Data JPA and Hibernate frameworks to connect and execute queries, and the
application uses native queries.

Here is an example query:
*Select * from A where middle_name=?1*

The above query fails with the exception below when the value of ?1
resolves to null.


*org.postgresql.util.PSQLException: ERROR: operator does not exist:
character varying = bytea  Hint: No operator matches the given name and
argument types. You might need to add explicit type casts.  Position: 64*

We debugged through the Hibernate code, comparing Oracle vs Postgres for
the same query, to understand whether the framework was doing anything
different when switched to Postgres, and didn't notice any difference in
behaviour.

We have also set *transform_null_equals* to ON in Postgres, but this
doesn't help.

Could you please let us know if there are any other configurations that
need to be set in Postgres to make it work similarly to Oracle?

This issue is impacting multiple modules in our application and any help
will be appreciated.
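If the query text can be changed after all, the usual workaround is a null-tolerant predicate - e.g. `(?1 IS NULL AND middle_name IS NULL) OR middle_name = ?1`, or Postgres's `middle_name IS NOT DISTINCT FROM ?1`. (Note that transform_null_equals only rewrites a literal `= NULL` in the query text, not a null bind parameter, which is why setting it didn't help.) A runnable sketch of the difference, using stdlib sqlite3 with made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a(id INTEGER PRIMARY KEY, middle_name TEXT)")
conn.executemany("INSERT INTO a(middle_name) VALUES (?)", [("Lee",), (None,)])

# Plain equality: NULL = NULL is unknown in SQL, so a null bind matches nothing.
eq = conn.execute(
    "SELECT COUNT(*) FROM a WHERE middle_name = ?", (None,)
).fetchone()[0]

# Null-tolerant predicate: when the bind is null, match rows whose column IS NULL.
tolerant = conn.execute(
    "SELECT COUNT(*) FROM a "
    "WHERE (? IS NULL AND middle_name IS NULL) OR middle_name = ?",
    (None, None),
).fetchone()[0]
```

With a null bind, the plain-equality query finds no rows while the null-tolerant one finds the row whose middle_name is NULL.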

-- 
Karthik klv