Too many range table entries error

2018-06-25 Thread Akshaya Acharya
Hello.


Please could you help debug the error "too many range table entries"?


This error occurs when querying a view that is dependent on many other
views (i.e. a view high up in the pyramid of views that we've constructed).


I get this error when running select * on the view, or when running an
explain analyse on the select.


Views that use a total of more than around 4 table references (in the
complete tree considering all the dependent views recursively) don't work,
but it works with 2 table references. What is the maximum number of
table references possible?


Can I increase this number somehow?


Perhaps relevant:

Postgres docs, what is range table
https://www.postgresql.org/docs/current/static/querytree.html

postgres src, error message
https://github.com/postgres/postgres/blob/master/src/backend/optimizer/plan/setrefs.c


Postgres version 10.3 from the official Docker image.


Thanks

Akshaya


Re: Too many range table entries error

2018-06-25 Thread Andres Freund
Hi,

On 2018-06-25 13:02:37 +0530, Akshaya Acharya wrote:
> Hello.
> 
> 
> Please could you help debug the error "too many range table entries”?
> 
> 
> This error occurs when querying a view that is dependent on many other
> views (i.e. a view high up in the pyramid of views that we've constructed).
> 
> 
> I get this error when running select * on the view, or when running an
> explain analyse on the select.
> 
> 
> Views that use a total of more than around 4 table references (in the
> complete tree considering all the dependent views recursively) don't work,
> but it works with 2 table references. What is the maximum number of
> table references possible?

Why are you doing this?  I can't imagine queries with that many table
references ever being something useful. I'm pretty sure there are better
solutions for what you're doing.


> Can I increase this number somehow?

It's not impossible, but it's not entirely trivial either. The relevant
variables currently are 16 bits wide, and the limit is close to the max
for that.

Greetings,

Andres Freund



Re: Too many range table entries error

2018-06-25 Thread Akshaya Acharya
On Mon, 25 Jun 2018 at 13:40, Andres Freund  wrote:

> Hi,
>
> On 2018-06-25 13:02:37 +0530, Akshaya Acharya wrote:
> > Hello.
> >
> >
> > Please could you help debug the error "too many range table entries”?
> >
> >
> > This error occurs when querying a view that is dependent on many other
> > views (i.e. a view high up in the pyramid of views that we've
> constructed).
> >
> >
> > I get this error when running select * on the view, or when running an
> > explain analyse on the select.
> >
> >
> > Views that use a total of more than around 4 table references (in the
> > complete tree considering all the dependent views recursively) don't
> work,
> > but it works with 2 table references. What is the maximum number of
> > table references possible?
>
> Why are you doing this?  I can't imagine queries with that many table
> references ever being something useful? I'm pretty sure there's better
> solutions for what you're doing.
>

Our entire application—all our business logic—is built as layers of views
inside the database. The ref counts sort of multiply at each layer, hence
the large number.
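
To illustrate how it compounds (made-up names, not our actual schema), roughly:

  CREATE TABLE t1 (id int);
  CREATE TABLE t2 (id int);

  CREATE VIEW layer1 AS              -- about 2 table references
      SELECT a.id FROM t1 a JOIN t2 b ON a.id = b.id;

  CREATE VIEW layer2 AS              -- each layer1 reference expands, so about 4
      SELECT x.id FROM layer1 x JOIN layer1 y ON x.id = y.id;

  CREATE VIEW layer3 AS              -- about 8, roughly doubling per layer
      SELECT x.id FROM layer2 x JOIN layer2 y ON x.id = y.id;

  -- Selecting from the top view makes the rewriter expand every underlying
  -- reference, which is how a deep pyramid ends up with a huge range table.
  SELECT * FROM layer3;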


>
>
> > Can I increase this number somehow?
>
> It's not impossible, it's not entirely trivial either. The relevant
> variables currently are 16bit wide, and the limit is close to the max
> for that.
>

I understand.

Slide 25 of this presentation indicates a patch. Is this relevant to
our situation?
https://www.slideshare.net/hansjurgenschonig/postgresql-joining-1-million-tables

Alternatively, will we have to optimize our views or change the architecture
of our application? Is there any other way to resolve this situation?


>
> Greetings,
>
> Andres Freund
>


Re: PostgreSQL : encryption with pgcrypto

2018-06-25 Thread Bruce Momjian
On Thu, May 17, 2018 at 07:07:00AM +, ROS Didier wrote:
> Hi
> 
>Regarding the encryption of data by pgcrypto, I would like to
> know the recommendations for the management of the key.
> 
>Is it possible to store it off the PostgreSQL server?
> 
>Is there the equivalent of Oracle "wallet" ?

Late reply, but the last presentation on this page shows how to use
cryptographic hardware with Postgres:

https://momjian.us/main/presentations/security.html

You could modify that to use a key management system (KMS).
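
As a minimal illustration of the client-held-key idea (my own sketch, not
from the presentation; the table name and the :'app_key' psql variable are
made up), the key is passed in by the client at run time and never stored
in the database:

  CREATE EXTENSION IF NOT EXISTS pgcrypto;

  CREATE TABLE secrets (id serial PRIMARY KEY, payload bytea);

  -- :'app_key' is a psql variable the client fills in, e.g. from a KMS,
  -- so the key material never has to live on the PostgreSQL server.
  INSERT INTO secrets (payload)
  VALUES (pgp_sym_encrypt('sensitive value', :'app_key'));

  -- Decryption uses the same externally supplied key.
  SELECT pgp_sym_decrypt(payload, :'app_key') AS cleartext FROM secrets;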

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Using COPY to import large xml file

2018-06-25 Thread Anto Aravinth
Thanks a lot. But I ran into a lot of challenges! It looks like the SO data
contains a lot of tabs within itself, so the tab delimiter didn't work for me.
I thought I could give a special delimiter, but it looks like PostgreSQL's
COPY allows only one character as the delimiter :(

Sad, I guess the only way is to use inserts, or do a thorough serialization of
my data into something that COPY can understand.

On Mon, Jun 25, 2018 at 8:09 AM, Tim Cross  wrote:

>
>
> On Mon, 25 Jun 2018 at 11:38, Anto Aravinth 
> wrote:
>
>>
>>
>> On Mon, Jun 25, 2018 at 3:44 AM, Tim Cross  wrote:
>>
>>>
>>> Anto Aravinth  writes:
>>>
>>> > Thanks for the response. I'm not sure, how long does this tool takes
>>> for
>>> > the 70GB data.
>>> >
>>> > I used node to stream the xml files into inserts.. which was very
>>> slow..
>>> > Actually the xml contains 40 million records, out of which 10Million
>>> took
>>> > around 2 hrs using nodejs. Hence, I thought will use COPY command, as
>>> > suggested on the internet.
>>> >
>>> > Definitely, will try the code and let you know.. But looks like it
>>> uses the
>>> > same INSERT, not copy.. interesting if it runs quick on my machine.
>>> >
>>> > On Sun, Jun 24, 2018 at 9:23 PM, Adrien Nayrat <
>>> adrien.nay...@anayrat.info>
>>> > wrote:
>>> >
>>> >> On 06/24/2018 05:25 PM, Anto Aravinth wrote:
>>> >> > Hello Everyone,
>>> >> >
>>> >> > I have downloaded the Stackoverflow posts xml (contains all SO
>>> questions
>>> >> till
>>> >> > date).. the file is around 70GB.. I wanna import the data in those
>>> xml
>>> >> to my
>>> >> > table.. is there a way to do so in postgres?
>>> >> >
>>> >> >
>>> >> > Thanks,
>>> >> > Anto.
>>> >>
>>> >> Hello Anto,
>>> >>
>>> >> I used this tool :
>>> >> https://github.com/Networks-Learning/stackexchange-dump-to-postgres
>>> >>
>>>
>>> If you are using nodejs, then you can easily use the pg-copy-streams
>>> module to insert the records into your database. I've been using this
>>> for inserting large numbers of records from NetCDF files. Takes between
>>> 40 to 50 minutes to insert 60 Million+ records and we are doing
>>> additional calculations on the values, not just inserting them,
>>> plus we are inserting into a database over the network and into a
>>> database which is
>>> also performing other processing.
>>>
>>> We found a significant speed improvement with COPY over blocks of insert
>>> transactions, which was faster than just individual inserts. The only
>>> downside with using COPY is that it either completely works or
>>> completely fails and when it fails, it can be tricky to work out which
>>> record is causing the failure. A benefit of using blocks of transactions
>>> is that you have more fine grained control, allowing you to recover from
>>> some errors or providing more specific detail regarding the cause of the
>>> error.
>>>
>>
>> Sure, let me try that.. I have a question here, COPY usually works when
>> you move data from files to your postgres instance, right? Now in node.js,
>> processing the whole file, can I use COPY
>> programmatically like COPY Stackoverflow ?
>> Because from doc:
>>
>> https://www.postgresql.org/docs/9.2/static/sql-copy.html
>>
>> I don't see its possible. May be I need to convert the files to copy
>> understandable first?
>>
>> Anto.
>>
>>>
>>>
>>
> Yes. Essentially what you do is create a stream and feed whatever
> information you want to copy into that stream. PG sees the data as if it
> were reading each line in a file, so you push data onto the stream where
> each item is separated by a tab (or whatever). Here is the basic low-level
> function I use (don't know how the formatting will go!)
>
> async function copyInsert(sql, stringifyFN, records) {
>   const logName = `${moduleName}.copyInsert`;
>   var client;
>
>   assert.ok(Array.isArray(records), "The records arg must be an array");
>   assert.ok(typeof(stringifyFN) === "function", "The stringifyFN arg must
> be a function");
>
>   return getClient()
> .then(c => {
>   client = c;
>   return new Promise(function(resolve, reject) {
> var stream, rs;
> var idx = 0;
>
> function done() {
>   releaseClient(client);
>   client = undefined;
>   resolve(idx + 1);
> }
>
> function onError(err) {
>   if (client !== undefined) {
> releaseClient(client);
>   }
>   reject(new VError(err, `${logName}: COPY failed at record
> ${idx}`));
> }
>
> function arrayRead() {
>   if (idx === records.length) {
> rs.push(null);
>   } else {
> let rec = records[idx];
> rs.push(stringifyFN(rec));
> idx += 1;
>   }
> }
>
> rs = new Readable;
> rs._read = arrayRead;
> rs.on("error", onError);
> stream = client.query(copyFrom(sql));
> stream.on("error", onError);
> stream.on("end", done);
> rs.pipe(stream);
>   });
> })
> .catch(err => {
>   th

Re: Using COPY to import large xml file

2018-06-25 Thread Nicolas Paris
2018-06-25 16:25 GMT+02:00 Anto Aravinth :

> Thanks a lot. But I do got lot of challenges! Looks like SO data contains
> lot of tabs within itself.. So tabs delimiter didn't work for me. I thought
> I can give a special demiliter but looks like Postrgesql copy allow only
> one character as delimiter :(
>
> Sad, I guess only way is to insert or do a through serialization of my
> data into something that COPY can understand.
>

The easiest way would be:
xml -> csv -> \copy

By csv, I mean regular quoted CSV: simply wrap each CSV field in double
quotes, and escape any double quotes contained inside with another double
quote.

PostgreSQL's COPY CSV parser is one of the most robust I have ever tested.


Re: Using COPY to import large xml file

2018-06-25 Thread Anto Aravinth
On Mon, Jun 25, 2018 at 8:20 PM, Nicolas Paris  wrote:

>
> 2018-06-25 16:25 GMT+02:00 Anto Aravinth :
>
>> Thanks a lot. But I do got lot of challenges! Looks like SO data contains
>> lot of tabs within itself.. So tabs delimiter didn't work for me. I thought
>> I can give a special demiliter but looks like Postrgesql copy allow only
>> one character as delimiter :(
>>
>> Sad, I guess only way is to insert or do a through serialization of my
>> data into something that COPY can understand.
>>
>
> ​easiest way would be:
> xml -> csv -> \copy
>
> ​by csv, I mean regular quoted csv (Simply wrap csv field with double
> quote, and escape
> enventually contained quotes with an other double quote.).
>

I tried, but no luck. Here is a sample of the CSV I wrote from my XML convertor:

1   "Are questions about animations or comics inspired by Japanese
culture or styles considered on-topic?"  "pExamples include a href=""
http://www.imdb.com/title/tt0417299/""; rel=""nofollow""Avatar/a, a href=""
http://www.imdb.com/title/tt1695360/""; rel=""nofollow""Korra/a and, to some
extent, a href=""http://www.imdb.com/title/tt0278238/"";
rel=""nofollow""Samurai Jack/a. They're all widely popular American
cartoons, sometimes even referred to as ema href=""
https://en.wikipedia.org/wiki/Anime-influenced_animation"";
rel=""nofollow""Amerime/a/em./p


pAre questions about these series on-topic?/p

"   "pExamples include a href=""http://www.imdb.com/title/tt0417299/"";
rel=""nofollow""Avatar/a, a href=""http://www.imdb.com/title/tt1695360/"";
rel=""nofollow""Korra/a and, to some extent, a href=""
http://www.imdb.com/title/tt0278238/""; rel=""nofollow""Samurai Jack/a.
They're all widely popular American cartoons, sometimes even referred to as
ema href=""https://en.wikipedia.org/wiki/Anime-influenced_animation"";
rel=""nofollow""Amerime/a/em./p


pAre questions about these series on-topic?/p

"   "null"

the schema of my table is:

  CREATE TABLE so2 (
id  INTEGER NOT NULL PRIMARY KEY,
title varchar(1000) NULL,
posts text,
body TSVECTOR,
parent_id INTEGER NULL,
FOREIGN KEY (parent_id) REFERENCES so1(id)
);

and when I run:

COPY so2 from '/Users/user/programs/js/node-mbox/file.csv';


I get:

ERROR:  missing data for column "body"
CONTEXT:  COPY so2, line 1: "1 "Are questions about animations or comics
inspired by Japanese culture or styles considered on-top..."

Not sure what I'm missing. I'm not sure whether the above CSV is breaking
because I have newlines within my content. But the error message is very
hard to debug.



>
> Postgresql copy csv parser is one of the most robust I ever tested
> before.
>


Re: Using COPY to import large xml file

2018-06-25 Thread Anto Aravinth
On Mon, Jun 25, 2018 at 8:54 PM, Anto Aravinth 
wrote:

>
>
> On Mon, Jun 25, 2018 at 8:20 PM, Nicolas Paris 
> wrote:
>
>>
>> 2018-06-25 16:25 GMT+02:00 Anto Aravinth :
>>
>>> Thanks a lot. But I do got lot of challenges! Looks like SO data
>>> contains lot of tabs within itself.. So tabs delimiter didn't work for me.
>>> I thought I can give a special demiliter but looks like Postrgesql copy
>>> allow only one character as delimiter :(
>>>
>>> Sad, I guess only way is to insert or do a through serialization of my
>>> data into something that COPY can understand.
>>>
>>
>> ​easiest way would be:
>> xml -> csv -> \copy
>>
>> ​by csv, I mean regular quoted csv (Simply wrap csv field with double
>> quote, and escape
>> enventually contained quotes with an other double quote.).
>>
>
> I tried but no luck. Here is the sample csv, I wrote from my xml convertor:
>
> 1   "Are questions about animations or comics inspired by Japanese
> culture or styles considered on-topic?"  "pExamples include a href=""
> http://www.imdb.com/title/tt0417299/""; rel=""nofollow""Avatar/a, a href=""
> http://www.imdb.com/title/tt1695360/""; rel=""nofollow""Korra/a and, to
> some extent, a href=""http://www.imdb.com/title/tt0278238/"";
> rel=""nofollow""Samurai Jack/a. They're all widely popular American
> cartoons, sometimes even referred to as ema href=""https://en.wikipedia.
> org/wiki/Anime-influenced_animation"" rel=""nofollow""Amerime/a/em./p
>
>
> pAre questions about these series on-topic?/p
>
> "   "pExamples include a href=""http://www.imdb.com/title/tt0417299/"";
> rel=""nofollow""Avatar/a, a href=""http://www.imdb.com/title/tt1695360/"";
> rel=""nofollow""Korra/a and, to some extent, a href=""http://www.imdb.com/
> title/tt0278238/"" rel=""nofollow""Samurai Jack/a. They're all widely
> popular American cartoons, sometimes even referred to as ema href=""
> https://en.wikipedia.org/wiki/Anime-influenced_animation"";
> rel=""nofollow""Amerime/a/em./p
>
>
> pAre questions about these series on-topic?/p
>
> "   "null"
>
> the schema of my table is:
>
>   CREATE TABLE so2 (
> id  INTEGER NOT NULL PRIMARY KEY,
> title varchar(1000) NULL,
> posts text,
> body TSVECTOR,
> parent_id INTEGER NULL,
> FOREIGN KEY (parent_id) REFERENCES so1(id)
> );
>
> and when I run:
>
> COPY so2 from '/Users/user/programs/js/node-mbox/file.csv';
>
>
> I get:
>
>
> ERROR:  missing data for column "body"

CONTEXT:  COPY so2, line 1: "1 "Are questions about animations or comics
inspired by Japanese culture or styles considered on-top..."


> CONTEXT:  COPY so2, line 1: "1 "Are questions about animations or comics
> inspired by Japanese culture or styles considered on-top..."
>
> Not sure what I'm missing. Not sure the above csv is breaking because I
> have newlines within my content. But the error message is very hard to
> debug.
>
>
>
>>
>> Postgresql copy csv parser is one of the most robust I ever tested
>> before.
>>
>
>


RE: Load data from a csv file without using COPY

2018-06-25 Thread Kevin Brannen
From: Ravi Krishna [mailto:srkris...@yahoo.com]
Sent: Tuesday, June 19, 2018 4:15 PM
To: Steve Atkins 
Cc: PG mailing List 
Subject: Re: Load data from a csv file without using COPY

>
> If performance is relevant then your app should probably be using COPY
> protocol, not line by line inserts. It's supported by most postgresql
> access libraries. If your app does that then using "\copy" from psql would be 
> an appropriate benchmark.

Actually the reluctance to not use COPY is to make the benchmark same across 
two different RDBMS in two diff env.

---

Seems like a ~10-line Perl program could handle this very easily. Use the 
Text::CSV module to make handling of the input easier. Prepare your insert 
statement, then once you've pulled each line of input in, execute the insert. 
For slightly better performance, I'd probably use transactions, add a counter, 
and commit every 10,000 rows (or something like that).

Once you have that working with 1 DB, you can copy your program, change the DBD 
driver and the connect statement to the other DB and try the other one. Unless 
you want to be really clever and make the same program do both and pick the DB 
by a command line switch. :)

HTH,
Kevin



Re: Using COPY to import large xml file

2018-06-25 Thread Nicolas Paris
2018-06-25 17:30 GMT+02:00 Anto Aravinth :

>
>
> On Mon, Jun 25, 2018 at 8:54 PM, Anto Aravinth <
> anto.aravinth@gmail.com> wrote:
>
>>
>>
>> On Mon, Jun 25, 2018 at 8:20 PM, Nicolas Paris 
>> wrote:
>>
>>>
>>> 2018-06-25 16:25 GMT+02:00 Anto Aravinth :
>>>
 Thanks a lot. But I do got lot of challenges! Looks like SO data
 contains lot of tabs within itself.. So tabs delimiter didn't work for me.
 I thought I can give a special demiliter but looks like Postrgesql copy
 allow only one character as delimiter :(

 Sad, I guess only way is to insert or do a through serialization of my
 data into something that COPY can understand.

>>>
>>> ​easiest way would be:
>>> xml -> csv -> \copy
>>>
>>> ​by csv, I mean regular quoted csv (Simply wrap csv field with double
>>> quote, and escape
>>> enventually contained quotes with an other double quote.).
>>>
>>
>> I tried but no luck. Here is the sample csv, I wrote from my xml
>> convertor:
>>
>> 1   "Are questions about animations or comics inspired by Japanese
>> culture or styles considered on-topic?"  "pExamples include a href=""
>> http://www.imdb.com/title/tt0417299/""; rel=""nofollow""Avatar/a, a
>> href=""http://www.imdb.com/title/tt1695360/""; rel=""nofollow""Korra/a
>> and, to some extent, a href=""http://www.imdb.com/title/tt0278238/"";
>> rel=""nofollow""Samurai Jack/a. They're all widely popular American
>> cartoons, sometimes even referred to as ema href=""
>> https://en.wikipedia.org/wiki/Anime-influenced_animation"";
>> rel=""nofollow""Amerime/a/em./p
>>
>>
>> pAre questions about these series on-topic?/p
>>
>> "   "pExamples include a href=""http://www.imdb.com/title/tt0417299/"";
>> rel=""nofollow""Avatar/a, a href=""http://www.imdb.com/title/tt1695360/"";
>> rel=""nofollow""Korra/a and, to some extent, a href=""
>> http://www.imdb.com/title/tt0278238/""; rel=""nofollow""Samurai Jack/a.
>> They're all widely popular American cartoons, sometimes even referred to as
>> ema href=""https://en.wikipedia.org/wiki/Anime-influenced_animation"";
>> rel=""nofollow""Amerime/a/em./p
>>
>>
>> pAre questions about these series on-topic?/p
>>
>> "   "null"
>>
>> the schema of my table is:
>>
>>   CREATE TABLE so2 (
>> id  INTEGER NOT NULL PRIMARY KEY,
>> title varchar(1000) NULL,
>> posts text,
>> body TSVECTOR,
>> parent_id INTEGER NULL,
>> FOREIGN KEY (parent_id) REFERENCES so1(id)
>> );
>>
>> and when I run:
>>
>> COPY so2 from '/Users/user/programs/js/node-mbox/file.csv';
>>
>>
>> I get:
>>
>>
>> *ERROR:  missing data for column "body"*
>
> *CONTEXT:  COPY so2, line 1: "1 "Are questions about animations or comics
> inspired by Japanese culture or styles considered on-top..."*
>
>
>> CONTEXT:  COPY so2, line 1: "1 "Are questions about animations or comics
>> inspired by Japanese culture or styles considered on-top..."
>>
>> Not sure what I'm missing. Not sure the above csv is breaking because I
>> have newlines within my content. But the error message is very hard to
>> debug.
>>
>>

What you are missing is the configuration of the COPY statement (please refer
to https://www.postgresql.org/docs/9.2/static/sql-copy.html), such as the
format, delimiter, quote, and escape options.
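
For instance, something along these lines might match the sample above (the
column list and the tab delimiter are only guesses from the pasted rows, so
adjust them to whatever your convertor actually writes; FORMAT csv is what
makes embedded newlines inside quoted fields legal):

  COPY so2 (id, title, posts)
      FROM '/Users/user/programs/js/node-mbox/file.csv'
      WITH (FORMAT csv, DELIMITER E'\t', QUOTE '"', ESCAPE '"');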


Schema/Data conversion opensource tools from MySQL to PostgreSQL

2018-06-25 Thread chiru r
Hi All,


Please suggest open-source schema/data conversion tools from MySQL to
PostgreSQL.


Thanks,
Chiranjeevi


Re: using pg_basebackup for point in time recovery

2018-06-25 Thread Bruce Momjian
On Thu, Jun 21, 2018 at 04:50:38PM -0700, David G. Johnston wrote:
> On Thu, Jun 21, 2018 at 4:26 PM, Vik Fearing 
> wrote:
> 
> On 21/06/18 07:27, Michael Paquier wrote:
> > Attached is a patch which includes your suggestion.  What do you think?
> > As that's an improvement, only HEAD would get that clarification.
> 
> Say what?  If the clarification applies to previous versions, as it
> does, it should be backpatched.  This isn't a change in behavior, it's a
> change in the description of existing behavior.
> 
> 
> Generally only actual bug fixes get back-patched; but I'd have to say this
> looks like it could easily be classified as one.

FYI, in recent discussions on the docs list:


https://www.postgresql.org/message-id/CABUevEyumGh3r05U3_mhRrEU=dfacdrr2hew140mvn7fsbm...@mail.gmail.com

there was the conclusion that:

If it's a clean backpatch I'd say it is -- people who are using
PostgreSQL 9.6 will be reading the documentation for 9.6 etc, so they
will not know about the fix then.

If it's not a clean backpatch I can certainly see considering it, but if
it's not a lot of effort then I'd say it's definitely worth it.

so the rule I have been using for backpatching doc stuff has changed
recently.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Schema/Data conversion opensource tools from MySQL to PostgreSQL

2018-06-25 Thread Pavan Teja
Hi Chiru,

You can use MySQL foreign data wrapper to achieve this.
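
A rough sketch of what that could look like with mysql_fdw (the server,
user mapping, and table names below are just examples, and the extension
has to be installed first):

  CREATE EXTENSION mysql_fdw;

  CREATE SERVER mysql_src FOREIGN DATA WRAPPER mysql_fdw
      OPTIONS (host '127.0.0.1', port '3306');

  CREATE USER MAPPING FOR CURRENT_USER SERVER mysql_src
      OPTIONS (username 'app_user', password 'app_password');

  CREATE FOREIGN TABLE legacy_customers (id integer, name text)
      SERVER mysql_src OPTIONS (dbname 'legacy', table_name 'customers');

  -- Pull the rows into a native PostgreSQL table.
  CREATE TABLE customers AS SELECT * FROM legacy_customers;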

Regards,
Pavan

On Mon, Jun 25, 2018, 10:18 PM chiru r  wrote:

> Hi All,
>
>
> Please suggest Schema/Data conversion opensource tools from MySQL to
> PostgreSQL.
>
>
> Thanks,
> Chiranjeevi
>


Re: Schema/Data conversion opensource tools from MySQL to PostgreSQL

2018-06-25 Thread James Keener
Can you explain what you're looking for? Beyond manually editing the schema
dump so it imports correctly into Postgres and then loading in INSERT dumps or
MySQL file (TSV?) dumps, what are you looking for? Do you have any
odd/incompatible data types?

Jim

On Mon, Jun 25, 2018 at 11:47 AM, chiru r  wrote:

> Hi All,
>
>
> Please suggest Schema/Data conversion opensource tools from MySQL to
> PostgreSQL.
>
>
> Thanks,
> Chiranjeevi
>


Re: [External] Re: Schema/Data conversion opensource tools from MySQL to PostgreSQL

2018-06-25 Thread Vijaykumar Jain
I have not tested this, but I have read about it somewhere.
https://github.com/the4thdoctor/pg_chameleon

if it helps
  good
else
  ignore

Thanks,
Vijay
From: Pavan Teja 
Date: Monday, June 25, 2018 at 10:25 PM
To: chiru r 
Cc: "pgsql-gene...@postgresql.org >> PG-General Mailing List" 

Subject: [External] Re: Schema/Data conversion opensource tools from MySQL to 
PostgreSQL

Hi Chiru,

You can use MySQL foreign data wrapper to achieve this.

Regards,
Pavan

On Mon, Jun 25, 2018, 10:18 PM chiru r 
mailto:chir...@gmail.com>> wrote:
Hi All,


Please suggest Schema/Data conversion opensource tools from MySQL to PostgreSQL.


Thanks,
Chiranjeevi


DB size growing exponentially when materialized view refreshed concurrently (postgres 9.6)

2018-06-25 Thread Vikas Sharma
Hi All,

I am looking for advice on an issue where two materialized views are being
refreshed concurrently and the DB size has grown from 4 GB to 150 GB in two
days.

We use two materialized views to keep processed data for faster query
results for a search function. Earlier the materialized views were refreshed
non-concurrently and all was good on the DB.

We changed the MV refresh to run concurrently, to take advantage of
simultaneous access while the MV is refreshed. Now the refresh takes slightly
longer, but the DB size has grown exponentially.

I ran full vacuum on DB and size again reduced to 4gb from 150gb.

We need to refresh the MVs every 5 minutes, so I created a script to refresh
the MV concurrently and then run VACUUM (FULL, ANALYZE) on the MV immediately.
I hoped it would solve the issue of the DB size growing exponentially, but it
hasn't and the size is still growing.
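
(Roughly, the script runs the equivalent of the following every 5 minutes;
the view name here is a placeholder:)

  REFRESH MATERIALIZED VIEW CONCURRENTLY search_data_mv;
  VACUUM (FULL, ANALYZE) search_data_mv;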

Please advise how I can refresh the MV concurrently without the DB size growing.

Much appreciated.

Regards
Vikas


Re: Too many range table entries error

2018-06-25 Thread Andres Freund
On 2018-06-25 13:46:06 +0530, Akshaya Acharya wrote:
> On Mon, 25 Jun 2018 at 13:40, Andres Freund  wrote:
> 
> > Hi,
> >
> > On 2018-06-25 13:02:37 +0530, Akshaya Acharya wrote:
> > > Hello.
> > >
> > >
> > > Please could you help debug the error "too many range table entries”?
> > >
> > >
> > > This error occurs when querying a view that is dependent on many other
> > > views (i.e. a view high up in the pyramid of views that we've
> > constructed).
> > >
> > >
> > > I get this error when running select * on the view, or when running an
> > > explain analyse on the select.
> > >
> > >
> > > Views that use a total of more than around 4 table references (in the
> > > complete tree considering all the dependent views recursively) don't
> > work,
> > > but it works with 2 table references. What is the maximum number of
> > > table references possible?
> >
> > Why are you doing this?  I can't imagine queries with that many table
> > references ever being something useful? I'm pretty sure there's better
> > solutions for what you're doing.
> >
> 
> Our entire application—all our business logic—is built as layers of views
> inside the database. The ref counts sort of multiple at each layer, hence
> the large number.

That still doesn't explain how you realistically get to 40k references,
or how that's a reasonable design. There'd be quite a massive runtime
and memory overhead for an approach like this.  What was the reasoning
leading to this architecture?


> > > Can I increase this number somehow?
> >
> > It's not impossible, it's not entirely trivial either. The relevant
> > variables currently are 16bit wide, and the limit is close to the max
> > for that.
> >
> 
> I understand.
> 
> At slide 25 of this presentation a patch is indicated. Is this relevant to
> our situation?
> https://www.slideshare.net/hansjurgenschonig/postgresql-joining-1-million-tables

Yes, but that change likely isn't sufficient.


> Alternatively we will have to optimize our views or change the architecture
> of our application? Is there any other way to resolve this situation?

Yes, I think you will have to, and no, I don't see any other way.

Greetings,

Andres Freund



Re: Schema/Data conversion opensource tools from MySQL to PostgreSQL

2018-06-25 Thread Thomas Kellerer

chiru r schrieb am 25.06.2018 um 18:47:

Please suggest Schema/Data conversion opensource tools from MySQL to PostgreSQL.


ora2pg also supports MySQL: https://github.com/darold/ora2pg






Re: DB size growing exponentially when materialized view refreshed concurrently (postgres 9.6)

2018-06-25 Thread Alban Hertroys


> On 25 Jun 2018, at 19:21, Vikas Sharma  wrote:
> 
> I am looking for advice in a issue where two materialized views are being 
> refreshed concurrently and dbsize has grown to 150gb from 4gb in two days.
> 
> We use two materialized views to keep processed data for faster query results 
> for a search function. Earlier materialized views were refreshed not 
> concurrently and all was good on DB.
> 
> We changed mv refresh to concurrently to take advantage of simultaneous 
> access when mv refreshed. Now the refresh takes slightly longer and but DB 
> size has grown exponentially.
> 
> I ran full vacuum on DB and size again reduced to 4gb from 150gb. 

You did not disable or tune down autovacuum perchance?
With materialized view refreshes that often, you probably need fairly 
aggressive autovacuuming on that table - you can tune autovacuum parameters per 
table (see Storage parameters). That probably won't put you at 4GB, more around 
double that size, but it should stay a reasonable size that way.
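
Something along these lines, for example (the view name and the values are
placeholders, not tuned recommendations):

  ALTER MATERIALIZED VIEW search_data_mv
      SET (autovacuum_vacuum_scale_factor = 0.01,
           autovacuum_vacuum_threshold = 1000,
           autovacuum_vacuum_cost_delay = 0);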

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Using COPY to import large xml file

2018-06-25 Thread Adrian Klaver

On 06/25/2018 07:25 AM, Anto Aravinth wrote:
Thanks a lot. But I do got lot of challenges! Looks like SO data 
contains lot of tabs within itself.. So tabs delimiter didn't work for 
me. I thought I can give a special demiliter but looks like Postrgesql 
copy allow only one character as delimiter :(


I use | as it is rarely found in data itself.



Sad, I guess only way is to insert or do a through serialization of my 
data into something that COPY can understand.


On Mon, Jun 25, 2018 at 8:09 AM, Tim Cross > wrote:




On Mon, 25 Jun 2018 at 11:38, Anto Aravinth
mailto:anto.aravinth@gmail.com>>
wrote:



On Mon, Jun 25, 2018 at 3:44 AM, Tim Cross
mailto:theophil...@gmail.com>> wrote:


Anto Aravinth mailto:anto.aravinth@gmail.com>> writes:

> Thanks for the response. I'm not sure, how long does this tool 
takes for
> the 70GB data.
>
> I used node to stream the xml files into inserts.. which was very 
slow..
> Actually the xml contains 40 million records, out of which 
10Million took
> around 2 hrs using nodejs. Hence, I thought will use COPY 
command, as
> suggested on the internet.
>
> Definitely, will try the code and let you know.. But looks like 
it uses the
> same INSERT, not copy.. interesting if it runs quick on my 
machine.
>
> On Sun, Jun 24, 2018 at 9:23 PM, Adrien Nayrat mailto:adrien.nay...@anayrat.info>>
> wrote:
>
>> On 06/24/2018 05:25 PM, Anto Aravinth wrote:
>> > Hello Everyone,
>> >
>> > I have downloaded the Stackoverflow posts xml (contains all SO 
questions
>> till
>> > date).. the file is around 70GB.. I wanna import the data in 
those xml
>> to my
>> > table.. is there a way to do so in postgres?
>> >
>> >
>> > Thanks,
>> > Anto.
>>
>> Hello Anto,
>>
>> I used this tool :
>> 
https://github.com/Networks-Learning/stackexchange-dump-to-postgres


>>

If you are using nodejs, then you can easily use the
pg-copy-streams
module to insert the records into your database. I've been
using this
for inserting large numbers of records from NetCDF files.
Takes between
40 to 50 minutes to insert 60 Million+ records and we are doing
additional calculations on the values, not just inserting them,
plus we are inserting into a database over the network and
into a database which is
also performing other processing.

We found a significant speed improvement with COPY over
blocks of insert
transactions, which was faster than just individual inserts.
The only
downside with using COPY is that it either completely works or
completely fails and when it fails, it can be tricky to work
out which
record is causing the failure. A benefit of using blocks of
transactions
is that you have more fine grained control, allowing you to
recover from
some errors or providing more specific detail regarding the
cause of the
error.


Sure, let me try that.. I have a question here, COPY usually
works when you move data from files to your postgres instance,
right? Now in node.js, processing the whole file, can I use COPY
programmatically like COPY Stackoverflow ? Because from doc:

https://www.postgresql.org/docs/9.2/static/sql-copy.html


I don't see its possible. May be I need to convert the files to
copy understandable first?

Anto.




Yes. Essentially what you do is create a stream and feed whatever
information you want to copy into that stream. PG sees the data as
if it were reading each line in a file, so you push data onto the
stream where each item is separated by a tab (or whatever). Here is
the basic low-level function I use (don't know how the formatting
will go!)

async function copyInsert(sql, stringifyFN, records) {
   const logName = `${moduleName}.copyInsert`;
   var client;

   assert.ok(Array.isArray(records), "The records arg must be an
array");
   assert.ok(typeof(stringifyFN) === "function", "The stringifyFN
arg must be a function");
   return getClient()
     .then(c => {
       client = c;
       return new Promise(function(resolve, reject) {
         var stream, rs;
  

Re: Schema/Data conversion opensource tools from MySQL to PostgreSQL

2018-06-25 Thread Adrian Klaver

On 06/25/2018 09:47 AM, chiru r wrote:

Hi All,


Please suggest Schema/Data conversion opensource tools from MySQL to 
PostgreSQL.




To add to the list:

pgloader:

http://pgloader.readthedocs.io/en/latest/tutorial/tutorial.html#migrating-from-mysql-to-postgresql



Thanks,
Chiranjeevi



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Load data from a csv file without using COPY

2018-06-25 Thread Adrian Klaver

On 06/25/2018 09:17 AM, Kevin Brannen wrote:

From: Ravi Krishna [mailto:srkris...@yahoo.com]
Sent: Tuesday, June 19, 2018 4:15 PM
To: Steve Atkins 
Cc: PG mailing List 
Subject: Re: Load data from a csv file without using COPY



If performance is relevant then your app should probably be using COPY
protocol, not line by line inserts. It's supported by most postgresql
access libraries. If your app does that then using "\copy" from psql would be 
an appropriate benchmark.


Actually the reluctance to not use COPY is to make the benchmark same across 
two different RDBMS in two diff env.


That is not clear from this post:

https://www.postgresql.org/message-id/em2345975f-0c51-42dd-a35f-ff88715e8bbb%40ravi-lenovo


"3. We now have a catch-22 situation.  Should we spend time porting the
app to PG without first verifying
 that PG can perform as well as DB2. In other words, if some sort of
testing rules out PG as a good
 replacement for DB2, why even bother to port.  Of course that does
not prove conclusively that if PG
 passes the test, then it would mean that the app will work just as
fine.  But at least basic test will tell
that we are not on a wrong path.
4. What I am planning is:
 4.a Get a set of large tables exported as a pipe delimited text
file.
 4.b Load them in both DB2 and PG on a similar h/w
 4.c  Run OLAP queries.

4.b is to test i/o. Our app is sensitive to the load times and some of
the tables are really wide.
4.c is to test maturity of PG in handling complex OLAP SQLs. From what I
have read, while PG
  optimizer is very good in handling OLTP, it is not, as yet, as good
in OLAP queries.

"

I read that as can Postgres replace DB2 and not lose performance? That 
would, to me, mean use whatever works best to get the job done.




---

Seems like a ~10 line Perl program could handle this very easily. Use the 
Text::CSV module to make handling of the input easier. Prepare your insert 
statement, then once you've pulled each line of input in execute the insert. 
For slightly better performance, I'd probably use transactions, add a counter, 
and commit every 10,000 rows (or something like that).

Once you have that working with 1 DB, you can copy your program, change the DBD 
driver and the connect statement to the other DB and try the other one. Unless 
you want to be really clever and make the same program do both and pick the DB 
by a command line switch. :)

HTH,
Kevin



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: DB size growing exponentially when materialized view refreshed concurrently (postgres 9.6)

2018-06-25 Thread Vikas Sharma
Hi Alban,
I haven't disabled the autovacuum task; it's running fine for other objects.
I was also getting "Error: cancelling autovacuum task" on the materialized
view when it was concurrently refreshed, so I decided to write a script and
run VACUUM (FULL, ANALYZE) on the MV immediately after the concurrent
refresh, but it's still not working as intended.

On Mon, Jun 25, 2018, 20:02 Alban Hertroys  wrote:

>
> > On 25 Jun 2018, at 19:21, Vikas Sharma  wrote:
> >
> > I am looking for advice in a issue where two materialized views are
> being refreshed concurrently and dbsize has grown to 150gb from 4gb in two
> days.
> >
> > We use two materialized views to keep processed data for faster query
> results for a search function. Earlier materialized views were refreshed
> not concurrently and all was good on DB.
> >
> > We changed mv refresh to concurrently to take advantage of simultaneous
> access when mv refreshed. Now the refresh takes slightly longer and but DB
> size has grown exponentially.
> >
> > I ran full vacuum on DB and size again reduced to 4gb from 150gb.
>
> You did not disable or tune down autovacuum perchance?
> With materialized view refreshes that often, you probably need fairly
> aggressive autovacuuming on that table - you can tune autovacuum parameters
> per table (see Storage parameters). That probably won't put you at 4GB,
> more around double that size, but it should stay a reasonable size that way.
>
> Regards,
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>
>


Re: Schema/Data conversion opensource tools from MySQL to PostgreSQL

2018-06-25 Thread chiru r
Thank you All.

On Mon, Jun 25, 2018 at 3:40 PM, Adrian Klaver 
wrote:

> On 06/25/2018 09:47 AM, chiru r wrote:
>
>> Hi All,
>>
>>
>> Please suggest Schema/Data conversion opensource tools from MySQL to
>> PostgreSQL.
>>
>>
> To add to the list:
>
> pgloader:
>
> http://pgloader.readthedocs.io/en/latest/tutorial/tutorial.
> html#migrating-from-mysql-to-postgresql
>
>
>> Thanks,
>> Chiranjeevi
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


PLPython Setup issue with EDB Windows Installer

2018-06-25 Thread Anthony DeBarros
I sent a report to EDB about this, but since I'm not a support subscriber I
am not sure it will get through. Thought I would share here and seek some
tips or feedback.

---

Installing EDB Postgres Standard 10.4 on Windows 10 does not set up
PLPython support correctly. (I also installed the Language Pack via Stack
Builder.)

Following the installation, running CREATE EXTENSION plpythonu; generates
this error:
"ERROR: could not access file "$libdir/plpython2": No such file or
directory SQL state: 58P01"

Trying CREATE EXTENSION plpython3u; generates this error:
"Could not load library "C:/Program Files/PostgreSQL/10/lib/plpython3.dll":
The specified module could not be found "

To track down the issue, I used Dependency Walker (
http://www.dependencywalker.com/) to examine plpython3.dll, which is stored
in the Postgres /lib directory. That DLL is looking for the file
python34.dll to be located in C:\windows\system32, but the EDB installer
doesn't place it there.

The solution is to copy python34.dll into C:\windows\system32 from the
language pack files. Once python34.dll is present in C:\windows\system32,
you can create the language.

This is a bit annoying and not easily solvable for a typical user!

Anyone here from EDB who can address the issue?

Thanks,
Anthony DeBarros


Re: DB size growing exponentially when materialized view refreshed concurrently (postgres 9.6)

2018-06-25 Thread Jeff Janes
On Mon, Jun 25, 2018 at 1:21 PM, Vikas Sharma  wrote:

> Hi All,
>
> I am looking for advice in a issue where two materialized views are being
> refreshed concurrently and dbsize has grown to 150gb from 4gb in two days.
>
> We use two materialized views to keep processed data for faster query
> results for a search function. Earlier materialized views were refreshed
> not concurrently and all was good on DB.
>

Where is the space going?  Does it show up in psql with \l+ ?  Does it show
up in \dm+ ?  Does it only show up using OS tools in the data directory?
Which subdirectory of the data directory?
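
If it is showing up in relations, a generic size query like this one (nothing
specific to your setup) can point at the culprits:

  SELECT c.relname,
         pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
  FROM pg_class c
  WHERE c.relkind IN ('r', 'm', 't')
  ORDER BY pg_total_relation_size(c.oid) DESC
  LIMIT 10;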

Cheers,

Jeff


Re: Using COPY to import large xml file

2018-06-25 Thread Tim Cross


Anto Aravinth  writes:

> Thanks a lot. But I do got lot of challenges! Looks like SO data contains
> lot of tabs within itself.. So tabs delimiter didn't work for me. I thought
> I can give a special demiliter but looks like Postrgesql copy allow only
> one character as delimiter :(
>
> Sad, I guess only way is to insert or do a through serialization of my data
> into something that COPY can understand.
>

The COPY command has a number of options, including setting what is used
as the delimiter - it doesn't have to be tab. You need to also look at
the logs/output to see exactly why the copy fails.

I'd recommend first pre-processing your input data to make sure it is
'clean' and all the fields actually match with whatever DDL you have
used to define your db tables etc. I'd then select a small subset and
try different parameters to the copy command until you get the right
combination of data format and copy definition.

It may take some effort to get the right combination, but the result is
probably worth it given your data set size i.e. difference between hours
and days. 
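
For example (the file path, column list, and options below are only guesses
at what might match your data), load a few hundred lines into a scratch copy
of the table and iterate on the COPY options there:

  CREATE TEMP TABLE so2_test (LIKE so2);

  COPY so2_test (id, title, posts)
      FROM '/tmp/posts_sample.csv'
      WITH (FORMAT csv, DELIMITER E'\t', QUOTE '"', ESCAPE '"');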

--
Tim Cross



Re: PostgreSQL Volume Question

2018-06-25 Thread Data Ace
I appreciate everyone's feedback and help and will consider everyone's
input. Thanks again.

On Fri, Jun 15, 2018 at 9:26 AM, Data Ace  wrote:

> Well I think my question is somewhat away from my intention cause of my
> poor understanding and questioning :(
>
>
>
> Actually, I have 1TB data and have hardware spec enough to handle this
> amount of data, but the problem is that it needs too many join operations
> and the analysis process is going too slow right now.
>
>
>
> I've searched and found that graph model nicely fits for network data like
> social data in query performance.
>
>
>
> Should I change my DB (I mean my DB for analysis)? or do I need some other
> solutions or any extension?
>
>
> Thanks
>
> On Thu, Jun 14, 2018 at 3:36 PM, Melvin Davidson 
> wrote:
>
>>
>>
>> On Thu, Jun 14, 2018 at 6:30 PM, Adrian Klaver > > wrote:
>>
>>> On 06/14/2018 02:33 PM, Data Ace wrote:
>>>
 Hi, I'm new to the community.

 Recently, I've been involved in a project that develops a social
 network data analysis service (and my client's DBMS is based on 
 PostgreSQL).
 I need to gather huge volume of unstructured raw data for this project,
 and the problem is that with PostgreSQL, it would be so dfficult to handle
 this kind of data. Are there any PG extension modules or methods that are
 recommended for my project?

>>>
>>> In addition to Ravi's questions:
>>>
>>> What does the data look like?
>>>
>>> What Postgres version?
>>>
>>> How is the data going to get from A <--> B, local or remotely or both?
>>>
>>> Is there another database or program involved in the process?
>>>
>>>
 Thanks in advance.

>>>
>>>
>>> --
>>> Adrian Klaver
>>> adrian.kla...@aklaver.com
>>>
>>>
>> In addition to Ravi's and Adrian's questions:
>>
>> What is the hardware configuration?
>>
>> --
>> *Melvin Davidson*
>> *Maj. Database & Exploration Specialist*
>> *Universe Exploration Command – UXC*
>> Employment by invitation only!
>>
>
>


Re: Too many range table entries error

2018-06-25 Thread Tom Lane
Andres Freund  writes:
> On 2018-06-25 13:46:06 +0530, Akshaya Acharya wrote:
>> Our entire application-all our business logic-is built as layers of views
>> inside the database. The ref counts sort of multiple at each layer, hence
>> the large number.

> That still doesn't explain how you realistically get to 40k references,
> and how that's a reasonable design.

The short answer here is that even if the system accepted queries with
that many tables, it's really unlikely to perform acceptably --- in fact,
I'm a bit astonished that you even found a way to reach this error without
having waited a few hours beforehand.  And we are *not* going to promise
to fix all the performance issues you will hit with a schema design like
this.  Redesign.  Please.

regards, tom lane



Re: using pg_basebackup for point in time recovery

2018-06-25 Thread Michael Paquier
On Mon, Jun 25, 2018 at 12:51:10PM -0400, Bruce Momjian wrote:
> FYI, in recent discussions on the docs list:
> 
>   
> https://www.postgresql.org/message-id/CABUevEyumGh3r05U3_mhRrEU=dfacdrr2hew140mvn7fsbm...@mail.gmail.com

I did not recall this one.  Thanks for the reminder, Bruce.

> There was the conclusion that:
> 
>   If it's a clean backpatch I'd say it is -- people who are using
>   PostgreSQL 9.6 will be reading the documentation for 9.6 etc, so they
>   will not know about the fix then.
>   
>   If it's not a clean backpatch I can certainly see considering it, but if
>   it's not a lot of effort then I'd say it's definitely worth it.
> 
> so the rule I have been using for backpatching doc stuff has changed
> recently.

In the case of this thread, I think that the patch applies cleanly
anyway, as this comes from the period when hot standby was introduced.
So that would not be a lot of work...  Speaking of which, it would be
nice to be sure about the wording folks here would prefer to use before
fixing anything ;p
--
Michael




Re: Schema/Data conversion opensource tools from MySQL to PostgreSQL

2018-06-25 Thread Arnaud L.

Le 25-06-18 à 18:47, chiru r a écrit :
Please suggest Schema/Data conversion opensource tools from MySQL to 
PostgreSQL.


Hi.
I used this PHP script, which did a pretty good job:
https://github.com/AnatolyUss/FromMySqlToPostgreSql


--
Regards



Re: pg_upgrade and wraparound

2018-06-25 Thread Alexander Shutyaev
Hello again,

I've performed another test: I've migrated to the new cluster using dump and
restore (pg_dumpall | psql). It went well, although it took 6 days, while
pg_upgrade usually took a night.

Is there any hope the issue with pg_upgrade can be resolved? If not, could
you give me some hints on how I can decrease the time needed for pg_dumpall
| psql?

Thanks in advance!

2018-06-13 0:11 GMT+03:00 Alexander Shutyaev :

> Back again,
>
> >> Alexander, could you hack things up so autovacuum logging is enabled
> >> (log_autovacuum_min_duration=0), and see whether it's triggered?
>
> I've changed this config setting in both 9.6 and 10.4 postgresql.conf,
> then I've ran pg_upgrade once more.
>
> However I'm not sure how can I see whether autovacuum was triggered or
> not. I've tried grepping the logs for lines containing both 'vacuum' and
> 'auto' (case-insensitive) - there were none. If you can be more specific, I
> can look for anything else.
>
> I've googled on how can one see that the autovacuum is working, and found
> out this query, which I ran on the 10.4 cluster:
>
> select count(*) from pg_stat_all_tables where last_autovacuum is not null;
>  count
> ---
>  0
> (1 row)
>
> So it seems autovacuum is indeed not working, just as you proposed.
>
> If I correctly summarized all your responses, the problem is that:
>
> 1) pg_restore (as part of pg_upgrade) inserts each large object in a
> different transaction
>
> That seems true to me given the log output - each time an object is
> inserted the wraparound warning decrements by 1
>
> 2) the autovacuum doesn't work while the database is restored
>
> That also seems true (see above)
>
> 3) the number of large objects is so big that as they are restored the
> transaction wraparound occurs
>
> Here's the number of large objects taken from the 9.6 cluster (spaces
> added manually for clarity):
>
> select count(*) from pg_largeobject_metadata ;
>count
> ---
>  133 635 871
> (1 row)
>
> If I've googled correctly - the transaction number is a 32bit integer so
> it's limit is 2 147 483 647 which is a lot more. I guess I'm missing
> something.
>
> This is just my attempt to summarize our progress so far.
>
> I'm further open to your suggestions.
>
> 2018-06-12 14:32 GMT+03:00 Daniel Verite :
>
>> Andres Freund wrote:
>>
>> > I'm not entirely clear why pg_restore appears to use a separate
>> > transaction for each large object, surely exascerbating the problem.
>>
>> To make sure that per-object locks don't fill up the shared
>> lock table?
>> There might be hundreds of thousands of large objects.
>> If it had to restore N objects per transaction, would it know
>> how to compute N that is large enough to be effective
>> and small enough not to exhaust the shared table?
>>
>> Best regards,
>> --
>> Daniel Vérité
>> PostgreSQL-powered mailer: http://www.manitou-mail.org
>> Twitter: @DanielVerite
>>
>
>