[GENERAL] Benchmarking partitioning triggers and rules

2015-03-08 Thread Tim Uckun
I wrote a quick benchmarking script to test various partitioning
strategies. The code is here.

https://gist.github.com/timuckun/954ab6bdce36fa14bc1c

I was astonished at the variability of the timings between the different
variations.

The test data contained 270K records.  I did a simple insert into without
any triggers, with three different trigger variations and with a rule.  The
results were like this

clean  0.00   0.00   0.00 (  3.119498)
func_1  0.00   0.00   0.00 (  7.435094)
func_2  0.00   0.00   0.00 ( 28.427617)
func_3  0.00   0.00   0.00 ( 18.348554)
Rule   0.00   0.00   0.00 (  2.901931)

A clean insert took 3.1 seconds; inserting through a rule took even less time!

A simple INSERT INTO table_name VALUES (NEW.*) trigger doubled the time it takes
to insert the records.  Using an EXECUTE with an interpolated table name took
TEN TIMES LONGER!  Making a slight change to the EXECUTE took a third off
the execution time.  WTF???

This has left me both baffled and curious.  If changing little things like
this makes a huge difference what else can I do to make this even faster?

Would using a different language help? Is JavaScript, Python or Perl
faster?  Is there some other syntax I can use?  I tried this:

EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES (' || NEW.* || ')'

but that gave me an error.
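
For reference, a minimal sketch of a trigger body that avoids that error, with
hypothetical table and column names: NEW is a composite record, so it cannot be
concatenated into the query string with ||, but it can be passed as a parameter
with USING and expanded inside the statement.

CREATE OR REPLACE FUNCTION partition_insert() RETURNS trigger AS $$
DECLARE
    partition_name text;
BEGIN
    -- hypothetical scheme: one child table per month, keyed on a created_at column
    partition_name := 'measurements_' || to_char(NEW.created_at, 'YYYY_MM');
    -- NEW cannot be spliced into the SQL string, but it can be passed with USING
    EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES (($1).*)'
        USING NEW;
    RETURN NULL;  -- suppress the insert into the parent table
END;
$$ LANGUAGE plpgsql;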


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-03-11 Thread Tim Uckun
Hey I hate to bump my post but I would really appreciate some input on this
benchmark. I am very alarmed that adding a very simple partitioning trigger
slows the insert speed by an order of magnitude.  Is there any way to speed
this up?

On Mon, Mar 9, 2015 at 11:44 AM, Tim Uckun  wrote:

> I wrote a quick benchmarking script to test various partitioning
> strategies. The code is here.
>
> https://gist.github.com/timuckun/954ab6bdce36fa14bc1c
>
> I was astonished at the variability of the timings between the different
> variations.
>
> The test data contained 270K records.  I did a simple insert into without
> any triggers, with three different trigger variations and with a rule.  The
> results were like this
>
> clean  0.00   0.00   0.00 (  3.119498)
> func_1  0.00   0.00   0.00 (  7.435094)
> func_2  0.00   0.00   0.00 ( 28.427617)
> func_3  0.00   0.00   0.00 ( 18.348554)
> Rule   0.00   0.00   0.00 (  2.901931)
>
> A clean insert 3.1 seconds,  putting a rule took less time!
>
> A simple insert into table_name values (NEW.*) doubled the time it takes
> to insert the records.  Using an EXECUTE with an interpolated table name
> took TEN TIMES LONGER!. Making a slight change to the EXECUTE took a third
> off the execution time WTF???
>
> This has left me both baffled and curious.  If changing little things like
> this makes a huge difference what else can I do to make this even faster?
>
> Would using a different language help? Is JavaScript, Python or Perl
> faster?  Is there some other syntax I can use?  I tried this
>
> EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES ('
> || NEW.* || ')' but that gave me an error.
>
>
>


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-03-11 Thread Tim Uckun
I am using postgres 9.4, the default install with "brew install postgres",
no tuning at all.  BTW if I use the Postgres.app application the benchmarks run
twice as slow!

Why do you think there is such a dramatic difference between

EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' SELECT ($1).*' USING NEW;

and

EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES (($1).*)' USING NEW;

One is thirty percent faster than the other.  Also, is there an even better
way that I don't know about?


On Thu, Mar 12, 2015 at 10:42 AM, Tomas Vondra  wrote:

> On 11.3.2015 21:43, Tim Uckun wrote:
> > Hey I hate to bump my post but I would really appreciate some input
> > on this benchmark. I am very alarmed that adding a very simple
> > partitioning trigger slows the insert speed by an order of magnitude.
> > Is there any way to speed this up?
>
> I think to actually give you a meaningful response, we really need more
> details about your configuration - what PostgreSQL version are you
> using, what configuration have you changed from the defaults and such.
>
> Anyway, you're right that triggers are not cheap. The numbers I get on
> the development version with a bit of tuning look like this:
>
> INSERT (direct)1.5 sec
> INSERT 4.0 sec
> INSERT (EXECUTE)  11.5 sec
>
> So it's ~ what you've measured. Rules have the lowest overhead, but also
> there's a lot of tricky places.
>
> There's not much you can do, except for inserting the data directly into
> the right partition (without any triggers).
>
>
> --
> Tomas Vondra    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>


[GENERAL] Name spacing functions and stored procedures

2015-03-18 Thread Tim Uckun
What do you guys do to namespace your functions so that they are not
jumbled in with the system functions and are also somewhat hierarchically
organized?

Obviously it's possible to create schemas for different namespaces but that
can lead to a lot of schemas which hold no data.   The other way is to
simply name your functions like _lib_etl_csv_import_weird_data_format but
that's not too much fun either.

Just curious how other people deal with the issue.
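
For illustration, a minimal sketch of the schema-as-namespace approach; the
schema, function, and path names here are hypothetical.

CREATE SCHEMA IF NOT EXISTS etl;

CREATE OR REPLACE FUNCTION etl.import_csv(path text) RETURNS void AS $$
BEGIN
    -- placeholder body; a real loader would COPY from the given path
    RAISE NOTICE 'importing %', path;
END;
$$ LANGUAGE plpgsql;

-- callers can resolve the name via search_path...
SET search_path = public, etl;
SELECT import_csv('/tmp/weird_data.csv');
-- ...or fully qualify it without touching search_path
SELECT etl.import_csv('/tmp/weird_data.csv');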


Re: [GENERAL] Name spacing functions and stored procedures

2015-03-19 Thread Tim Uckun
I guess I see schemas as ways to group data, not functions.

It would be very nice to be able to group your code into proper modules
though. It's something I really miss.




Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-04-07 Thread Tim Uckun
I understand that there is overhead involved in parsing the strings and
such.  The amount of overhead was surprising to me but that's another
matter.  What I am really puzzled about is the difference between the
statements

EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' SELECT ($1).*' USING NEW;

and

EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES (($1).*)' USING NEW;

They both do string interpolation but one is significantly faster than the
other.   Is there a third and even faster way?

I am using RDS so I can't really do stored procs in C.
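
One common alternative, when the set of partitions is known at the time the
function is written, is to skip EXECUTE entirely and branch to static INSERT
statements, which plpgsql can plan and cache; that corresponds to the cheaper
non-EXECUTE trigger in the earlier benchmark. A sketch, with hypothetical table
and column names:

CREATE OR REPLACE FUNCTION partition_insert_static() RETURNS trigger AS $$
BEGIN
    -- static statements avoid building and re-parsing a SQL string for every row
    IF NEW.created_at >= DATE '2015-04-01' AND NEW.created_at < DATE '2015-05-01' THEN
        INSERT INTO measurements_2015_04 VALUES (NEW.*);
    ELSIF NEW.created_at >= DATE '2015-03-01' AND NEW.created_at < DATE '2015-04-01' THEN
        INSERT INTO measurements_2015_03 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for %', NEW.created_at;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;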


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-04-08 Thread Tim Uckun
So is there a third and even faster way of doing this?

On Wed, Apr 8, 2015 at 4:08 PM, Adrian Klaver 
wrote:

> On 04/07/2015 07:49 PM, Tim Uckun wrote:
>
>> I understand that there is overhead involved in parsing the strings and
>> such.  The amount of overhead was surprising to me but that's another
>> matter.  What I am really puzzled about is the difference between the
>> statements
>>
>> EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
>> ($1).*' USING NEW ;
>>
>> and
>>
>>EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES
>> (($1).*)' USING NEW ;
>>
>
>
> Offhand I would say because in the first case you are doing a SELECT and
> in the second you are just doing a substitution.
>
>
>
>> They both do string interpolation but one is significantly faster than
>> the other.   Is there a third and even faster way?
>>
>> I am using RDS so I can't really do stored procs in C.
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: [GENERAL] Help with slow table update

2015-04-19 Thread Tim Uckun
On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov 
wrote:

> I found some dangling prepared transactions



How do you find and remove these?
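
Prepared transactions are listed in the pg_prepared_xacts system view, and each
one can be finished or discarded by its global identifier (the gid below is a
placeholder):

SELECT gid, prepared, owner, database
FROM pg_prepared_xacts;

ROLLBACK PREPARED 'the_gid_from_above';
-- or, to keep the work:
-- COMMIT PREPARED 'the_gid_from_above';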


[GENERAL] IDE for function/stored proc development.

2016-09-03 Thread Tim Uckun
Does anybody use an IDE for doing heavy duty stored proc development?
PGadmin is decent but I am looking for something better.

I have tried JetBrains with the DB browser plugin and on the surface it
seems like a good choice, but it's really buggy when working with procs.

I also tried DataGrip by JetBrains and that too seems to be all over the
place. It has some amazing features for working with the database, but some
of the simplest stuff is lacking or half baked.

I looked at atom and could not find any useful plugins for PG.

Anybody have experience with something awesome?


Re: [GENERAL] IDE for function/stored proc development.

2016-09-03 Thread Tim Uckun
I was hoping there was some IDE which made that process seamless: something
like PgAdmin but with better editing features and things like "find
definition" or "find usages".  The JetBrains products come close,
but as I said they are buggy and don't work very well with postgres.

On Sat, Sep 3, 2016 at 11:03 PM, Pavel Stehule 
wrote:

> Hi
>
> 2016-09-03 11:36 GMT+02:00 Tim Uckun :
>
>> Does anybody use an IDE for doing heavy duty stored proc development?
>> PGadmin is decent but I am looking for something better.
>>
>> I have tried jetbrains with the db browser plugin and on the surface it
>> seems like a good choice but it's really buggy when working with procs.
>>
>> I also tried datagrip by jetbrains and that too seems to be all over the
>> place. It has some amazing features for working with the database but some
>> of the simplest stuff is lacking or half baked.
>>
>> I looked at atom and could not find any useful plugins for PG.
>>
>> Anybody have experience with something awesome?
>>
>
> I am using the Emacs - but any editor should be ok. There is one rule -
> edit file first, and import to database as next step. PGadmin is pretty bad
> tool for maintaing stored procedures.
>
> Regards
>
> Pavel
>
>
>


[GENERAL] Is there a way to fix this ugliness

2016-09-09 Thread Tim Uckun
I am trying to get the child elements of a one to many table to be rolled
up into a json field in the parent table. The query I am running is

select
  ob.id,
  case when array_position(array_agg(im.image_type), null) = 1 then '[]'
       else json_agg(row_to_json(im.*)) end as images
from observations ob
  left join images im on ob.id = im.observation_id
group by 1


The reason I have the case statement there is that some observations
don't have images, and the json_agg(row_to_json(...)) call returns [NULL]
instead of the [] I really want.

Is there a more elegant way to do this?

I searched on the internet and somebody suggested coalesce but no matter
what combination I tried I could not make it happen.

Thanks.


Re: [GENERAL] Is there a way to fix this ugliness

2016-09-09 Thread Tim Uckun
I could not make coalesce in the top-level select work even though
people on the internet say it should.  I'll do the sub-select; that
seems like it would work fine.
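
For what it's worth, the reason a plain COALESCE over the aggregate does nothing
is that json_agg over the outer join returns a one-element [null] array rather
than SQL NULL. On 9.4 or later the NULL rows can be excluded with an aggregate
FILTER clause instead of the sub-select, roughly like this (columns as in the
original query):

select
  ob.id,
  coalesce(
    json_agg(row_to_json(im.*)) filter (where im.observation_id is not null),
    '[]'::json
  ) as images
from observations ob
  left join images im on ob.id = im.observation_id
group by 1;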

On Sat, Sep 10, 2016 at 3:22 AM, Karl Czajkowski  wrote:

> On Sep 10, Tim Uckun modulated:
> > I am trying to get the child elements of a one to many table to be
> > rolled up into a json field in the parent table. The query I am running
> > is...
>
> The problem is aggregating over the results of the left-outer join,
> which introduces NULLs. You can try pushing that down into a sub-query
> to create one image row per observation prior to joining:
>
>   SELECT
> ob.id,
> im.images
>   FROM observations ob
>   LEFT OUTER JOIN (
> SELECT
>   observation_id,
>   json_agg(row_to_json(im.*)) AS images
> FROM images im
> GROUP BY observation_id
>   ) im ON (ob.id = im.observation_id) ;
>
> you might use COALESCE in the top-level SELECT if you want to replace
> any NULL im.images with a different empty value constant...
>
>
> Karl
>
>


[GENERAL] Performance PLV8 vs PLPGSQL

2016-12-28 Thread Tim Uckun
I have seen various links on the internet which indicate that PLV8 is
significantly faster than PL-PGSQL sometimes an order of magnitude faster.

Is this uniformly true or is it just in certain circumstances?

Is there any benefit to choosing PL-PGSQL?

Is there work going on to make PL-PGSQL more performant, or has it gotten
significantly faster in the last two years or so (some of the links are
from a while ago)?

Thanks.


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
I am not doubting the efficacy of stored procs, just wondering which
language is better. From the sound of it string manipulation is slow in
PL-PGSQL, but looking at my procs there does seem to be a lot of string
manipulation going on, so maybe I had better do some tests.



On Thu, Dec 29, 2016 at 3:02 AM, Mike Sofen  wrote:

> *From:* Tim Uckun
> I have seen various links on the internet which indicate that PLV8 is
> significantly faster than PL-PGSQL sometimes an order of magnitude faster.
>
>
>
> Is there any benefit to choosing PL-PGSQL?
>
> 
>
> I can’t speak to PLV8.  However, I can speak to plpgsql, and specifically
> stored functions (procs).  I use it exclusively to create a database API
> for real-time web applications to hit.  My API calls (procs) are hitting
> large tables, sometimes doing complex logic within the sproc.  It allows me
> to provide a simple, standardized interface to the web devs, allowing them
> to focus on the app code work.
>
>
>
> Performance is superb and continues to surprise me (I came from the SQL
> Server world).  As others have mentioned, the natural lashup of plpgsql to
> postgres (I liked Alban’s term, “impedance”), is a key aspect.  Also:
>
>
>
> -stored procs provide another security layer against sql
> injection attacks.
>
> -Caching SEEMS to be more efficient/effective with stored procs
> (that could be wishful thinking too).
>
> -Stored procs allow skilled sql practitioners to provide far more
> sophisticated sql solutions than the typical python developer is capable
> of…my experience is that most web devs don’t really understand databases
> (or even care about them – they are a necessary evil), so providing a pure
> encapsulated sql solution (via stored procs) removes that mental impedance
> mismatch.
>
> -Performance?  Simple “get” procs that return data for a specific
> indexed query against larger tables (50m+ rows) in a few milliseconds…I can
> live with that kind of performance.
>
> -I’m also doing some heavy lifting in the sql, calculating
> histograms and boxplots for data visualizations.  This is an unusual
> scenario, but the other option is sending a massive chunk of data to
> another server for processing – just the transit time would kill the deal.
> I am mindful that at a certain point, there won’t be enough memory and i/o
> to go around, but the web app is a low user count/high user task complexity
> app, so I’ve tailored the model to match.
>
>
>
> Mike Sofen  (Synthetic Genomics)
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
I think it's awesome that postgres allows you to code in different
languages like this. It really is a unique development environment and one
that is overlooked as a development platform.  It would be nice if more
languages were delivered in the default package, especially Lua, V8 and
mruby.



On Thu, Dec 29, 2016 at 9:31 PM, Chris Travers 
wrote:

> My recommendation.  See them as tools in a toolkit, not a question of what
> is best.
>
> For places where you have SQL statements as primary do SQL or PLPGSQL
> functions.
>
> For places where you are manipulating values (parsing strings for example)
> use something else (I usually use pl/perl for string manipulation but ymmv).
>
> PLPGSQL works best where you have a large query and some procedurally
> supporting logic.  It becomes a lot less usable, performant, and
> maintainable the further you get away from that.
>
> So there is no best just different tools in a toolkit.
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
Mostly generating SQL statements to execute, for example deciding
which partition to insert into.

On Thu, Dec 29, 2016 at 10:00 PM, Pavel Stehule 
wrote:

>
>
> 2016-12-29 9:23 GMT+01:00 Tim Uckun :
>
>> I am not doubting the efficacy of stored procs, just wondering which
>> language is better. From the sound of it string manupilation is slow in
>> PL-PGSQL but looking at my procs there does seem to be a lot of string
>> manipulation going on so maybe I better do some tests.
>>
>
> It is interesting, what string operations you are doing in stored
> procedures?
>
> Regards
>
> Pavel
>
>
>>
>>
>> On Thu, Dec 29, 2016 at 3:02 AM, Mike Sofen  wrote:
>>
>>> *From:* Tim Uckun
>>> I have seen various links on the internet which indicate that PLV8 is
>>> significantly faster than PL-PGSQL sometimes an order of magnitude faster.
>>>
>>>
>>>
>>> Is there any benefit to choosing PL-PGSQL?
>>>
>>> 
>>>
>>> I can’t speak to PLV8.  However, I can speak to plpgsql, and
>>> specifically stored functions (procs).  I use it exclusively to create a
>>> database API for real-time web applications to hit.  My API calls (procs)
>>> are hitting large tables, sometimes doing complex logic within the sproc.
>>> It allows me to provide a simple, standardized interface to the web devs,
>>> allowing them to focus on the app code work.
>>>
>>>
>>>
>>> Performance is superb and continues to surprise me (I came from the SQL
>>> Server world).  As others have mentioned, the natural lashup of plpgsql to
>>> postgres (I liked Alban’s term, “impedance”), is a key aspect.  Also:
>>>
>>>
>>>
>>> -stored procs provide another security layer against sql
>>> injection attacks.
>>>
>>> -Caching SEEMS to be more efficient/effective with stored procs
>>> (that could be wishful thinking too).
>>>
>>> -Stored procs allow skilled sql practitioners to provide far
>>> more sophisticated sql solutions than the typical python developer is
>>> capable of…my experience is that most web devs don’t really understand
>>> databases (or even care about them – they are a necessary evil), so
>>> providing a pure encapsulated sql solution (via stored procs) removes that
>>> mental impedance mismatch.
>>>
>>> -Performance?  Simple “get” procs that return data for a
>>> specific indexed query against larger tables (50m+ rows) in a few
>>> milliseconds…I can live with that kind of performance.
>>>
>>> -I’m also doing some heavy lifting in the sql, calculating
>>> histograms and boxplots for data visualizations.  This is an unusual
>>> scenario, but the other option is sending a massive chunk of data to
>>> another server for processing – just the transit time would kill the deal.
>>> I am mindful that at a certain point, there won’t be enough memory and i/o
>>> to go around, but the web app is a low user count/high user task complexity
>>> app, so I’ve tailored the model to match.
>>>
>>>
>>>
>>> Mike Sofen  (Synthetic Genomics)
>>>
>>
>>
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
Honestly I don't even like JS. Having said that, I am not too crazy about
PL-PGSQL either. I am willing to put up with either given that they are
supported widely in default installs of postgres on AWS, Linux and MacOSX.

As I said before, I think postgres gives a unique and underutilized language
platform. You can code in different languages, it has a good variety of
built in types, and of course you get persistence and caching built in!
Using DBLINK you might even be able to separate out your code from the bulk
of your data in another database. Postgres all the way down!

It's fun to play around with.  There are a lot of missing pieces though. An
IDE-like thing would be good, version control would be nice, deeper
namespacing (hierarchical schemas?), and easier testing would go a long
way.

Thanks for all the input guys!

On Fri, Dec 30, 2016 at 12:14 AM, Ivan Sergio Borgonovo <
m...@webthatworks.it> wrote:

> On 12/29/2016 10:35 AM, Pavel Stehule wrote:
>
> 2016-12-29 10:03 GMT+01:00 Tim Uckun :
>>
>> I think it's awesome that postgres allows you to code in different
>> languages like this. It really is a unique development environment
>> and one that is overlooked as a development platform.  It would be
>> nice if more languages were delivered in the default package
>> especially lua, V8 and mruby.
>>
>>
>> It is about dependencies and maintenance. There are not too much people
>> who has good experience with C embedding Lua, V8 and others. Any people
>> who can do some work are welcome.
>>
>> The living outside main package has disadvantages - only enthusiast
>> knows about it, but some advantages too - you are not fixed on
>> PostgreSQL development cycle, and development can be faster.
>>
>
> I'll add my 2 cents.
>
> Postgresql and in general SQL are about integrity and coherency.
> Checking coherency is much easier with strict data type.
> PL/PGSQL gives you that, JS is far far away from that.
>
> Postgresql is a very flexible database and you can stretch it to do "MEAN
> like"[1] stuff but that's going to increase your "impedance mismatch".
>
> If you think there is some space for JS in your application stack that's
> nearer to the client rather than to the DB.
> Or possibly you need to do "MEAN like" stuff but you don't want to install
> another "database".
>
> As other said using stored procedures is a two edged sword.
> It can decouple DB schema from the application or it can increase the
> coupling.
> Choosing JS for performance in the stored procedure realm is going to
> encourage coupling and make scalability harder and it is going to become a
> mess when you'll need to refactor.
>
> [1] https://en.wikipedia.org/wiki/MEAN_(software_bundle)
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it http://www.borgonovo.net
>
>
>
>
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
I am not saying the postgres core people should work on an IDE, just that
an IDE-like thing would be nice.

On Fri, Dec 30, 2016 at 12:51 PM, Rob Sargent  wrote:

> I would hope Postgres core folk take no more than a nanosecond to reject
> the idea that they work on an IDE. Focus on reading and writing faster and
> faster ACID all the while.
>
> On Dec 29, 2016, at 5:32 PM, Tim Uckun  wrote:
>
> Honestly I don't even like JS. Having said that I am not too crazy about
> PL-PGSQL either. I am willing to put up with either given that they are
> supported widely in default installs of postgres in AWS, Linux and MacOSX,
>
> As I said before, I think posgres gives a unique and underutilized
> language platform. You can code in different languages, it has a good
> variety of built in types, and of course you get persistance and caching
> built in!  Using DBLINK you might even be able to separate out your code
> from the bulk of your data in another database. Postgres all the way down!
>
> It's fun to play around with.  There is a lot of missing pieces though. A
> good IDE like thing would be good, version control would be nice, deeper
> namespacing (hierarchical schemas?), easier testing etc would go a long
> way.
>
> Thanks for all the input guys!
>
> On Fri, Dec 30, 2016 at 12:14 AM, Ivan Sergio Borgonovo <
> m...@webthatworks.it> wrote:
>
>> On 12/29/2016 10:35 AM, Pavel Stehule wrote:
>>
>> 2016-12-29 10:03 GMT+01:00 Tim Uckun :
>>>
>>> I think it's awesome that postgres allows you to code in different
>>> languages like this. It really is a unique development environment
>>> and one that is overlooked as a development platform.  It would be
>>> nice if more languages were delivered in the default package
>>> especially lua, V8 and mruby.
>>>
>>>
>>> It is about dependencies and maintenance. There are not too much people
>>> who has good experience with C embedding Lua, V8 and others. Any people
>>> who can do some work are welcome.
>>>
>>> The living outside main package has disadvantages - only enthusiast
>>> knows about it, but some advantages too - you are not fixed on
>>> PostgreSQL development cycle, and development can be faster.
>>>
>>
>> I'll add my 2 cents.
>>
>> Postgresql and in general SQL are about integrity and coherency.
>> Checking coherency is much easier with strict data type.
>> PL/PGSQL gives you that, JS is far far away from that.
>>
>> Postgresql is a very flexible database and you can stretch it to do "MEAN
>> like"[1] stuff but that's going to increase your "impedance mismatch".
>>
>> If you think there is some space for JS in your application stack that's
>> nearer to the client rather than to the DB.
>> Or possibly you need to do "MEAN like" stuff but you don't want to
>> install another "database".
>>
>> As other said using stored procedures is a two edged sword.
>> It can decouple DB schema from the application or it can increase the
>> coupling.
>> Choosing JS for performance in the stored procedure realm is going to
>> encourage coupling and make scalability harder and it is going to become a
>> mess when you'll need to refactor.
>>
>> [1] https://en.wikipedia.org/wiki/MEAN_(software_bundle)
>>
>> --
>> Ivan Sergio Borgonovo
>> http://www.webthatworks.it http://www.borgonovo.net
>>
>>
>>
>>
>>
>
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
I have DataGrip and it's OK, but it doesn't really do everything I want.

I don't understand why it doesn't fetch all objects from the database and
then put them on disk in a directory so I can put it all under git
and then let me work on them, syncing the files back as they change.  For
example, today I just renamed a function. It didn't refactor properly by
identifying stored procs that reference it. If I was using another
JetBrains IDE it would have built an index of the project files and done a
proper refactor.

This would also allow you to make wholesale disk changes and then sync them
up properly to get around postgres dependency issues.

On Fri, Dec 30, 2016 at 1:40 PM, Michael Sheaver  wrote:

> If you want an IDE, Jetbrains, the makers of great IDEs like IntelliJ,
> PyCharm. and AppCode, among others, have recently come out with what is
> arguably the BEST IDE for DBAs, DataGrip. It runs on most major platforms,
> and is so good that I have bitten the bullet and paid the yearly
> subscription for it.
>
> Leave the Postgres core alone focus on what they do best, and that is
> making the very BEST database environment that can be had at any price,
> period. Is Postgres perfect? No, not at all. But no other group is so
> focused on performance, scalability and security as these folks are. And
> the rate of development, enhancement and continual improvement is, quite
> honestly, astounding.
>
> So here is my hat tip to the Postgres team for an awesome job they are
> doing!
>
> On Dec 29, 2016, at 7:19 PM, Tim Uckun  wrote:
>
> I am not saying the postgres core people should work on an IDE, just that
> an IDE like thing would be nice.
>
> On Fri, Dec 30, 2016 at 12:51 PM, Rob Sargent 
> wrote:
>
>> I would hope Postgres core folk take no more than a nanosecond to reject
>> the idea that they work on an IDE. Focus on reading and writing faster and
>> faster ACID all the while.
>>
>> On Dec 29, 2016, at 5:32 PM, Tim Uckun  wrote:
>>
>> Honestly I don't even like JS. Having said that I am not too crazy about
>> PL-PGSQL either. I am willing to put up with either given that they are
>> supported widely in default installs of postgres in AWS, Linux and MacOSX,
>>
>> As I said before, I think posgres gives a unique and underutilized
>> language platform. You can code in different languages, it has a good
>> variety of built in types, and of course you get persistance and caching
>> built in!  Using DBLINK you might even be able to separate out your code
>> from the bulk of your data in another database. Postgres all the way down!
>>
>> It's fun to play around with.  There is a lot of missing pieces though. A
>> good IDE like thing would be good, version control would be nice, deeper
>> namespacing (hierarchical schemas?), easier testing etc would go a long
>> way.
>>
>> Thanks for all the input guys!
>>
>> On Fri, Dec 30, 2016 at 12:14 AM, Ivan Sergio Borgonovo <
>> m...@webthatworks.it> wrote:
>>
>>> On 12/29/2016 10:35 AM, Pavel Stehule wrote:
>>>
>>> 2016-12-29 10:03 GMT+01:00 Tim Uckun :
>>>>
>>>> I think it's awesome that postgres allows you to code in different
>>>> languages like this. It really is a unique development environment
>>>> and one that is overlooked as a development platform.  It would be
>>>> nice if more languages were delivered in the default package
>>>> especially lua, V8 and mruby.
>>>>
>>>>
>>>> It is about dependencies and maintenance. There are not too much people
>>>> who has good experience with C embedding Lua, V8 and others. Any people
>>>> who can do some work are welcome.
>>>>
>>>> The living outside main package has disadvantages - only enthusiast
>>>> knows about it, but some advantages too - you are not fixed on
>>>> PostgreSQL development cycle, and development can be faster.
>>>>
>>>
>>> I'll add my 2 cents.
>>>
>>> Postgresql and in general SQL are about integrity and coherency.
>>> Checking coherency is much easier with strict data type.
>>> PL/PGSQL gives you that, JS is far far away from that.
>>>
>>> Postgresql is a very flexible database and you can stretch it to do
>>> "MEAN like"[1] stuff but that's going to increase your "impedance mismatch".
>>>
>>> If you think there is some space for JS in your application stack that's
>>> nearer to the client rather than t

Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-31 Thread Tim Uckun
Yes, I use a migrations tool as well. I like Ruby so I use either the Sequel
migrations or the ActiveRecord migrations depending on the project. That's
a great way to work. Having said that, I think it might be an interesting
thing to create the PgAdmin hierarchy of objects on disk. Not the
data itself, just the table defs, views, code etc. It might be interesting
to see what ctags could do with that. Being able to grep to find something
would be useful. A proper IDE could also keep track of dependencies and,
when you came to deploy, it could deploy the child changes first etc.

Might be an interesting way to work: treat your database schema as if it
were a programming project.

On Sat, Dec 31, 2016 at 2:32 AM, Michael Sheaver  wrote:

> The reason that you cannot use git for that is that it was never intended
> for live data and stuff that resides on a database.
>
> That said, I agree with you that all table DDLs and stored procedures
> should be kept under version control. And in fact I do keep them under VC
> on my database. How do I do this? I have a directory in which I keep all my
> SQL scripts, and this directory is a git repo. In this directory/project I
> keep:
> 1. For every table that is created on the database, I have an SQL script
> file with the DDL that creates it
> 2. For every stored proc that I create, I put it in its own SQL script as
> well
> 3. For any (relatively) static lookup tables (i.e. state-region
> associations, etc.), they are put in a SQL script
>
> This workflow gives me several important benefits, including:
> 1. Everything needed to recreate the database is kept under version
> control, including the DDL, stored procedures and lookup tables
> 2. This script folder does not need to reside on the DB server, can be
> kept anywhere, even on my local laptop; I pull them up in DataGrip and run
> them on the server (my scripts are on my laptop)
> 3. All these scripts are VC'd under git
> 4. They are easily and quickly pushed to my remote repo on Github and
> Bitbucket for backup
> 5. When I need to create a new fresh, empty copy of the database
> somewhere, I just run these scripts and it is done quickly and easily
>
> One more little trick I have is to use Gitkraken for my git GUI. It. is
> free, and is absolutely the best git GUI available. The devs made it an
> absolute joy to use, and I never need to use the command line anymore.
>
> Hope this helps!
>
> On Dec 29, 2016, at 10:43 PM, Tim Uckun  wrote:
>
> I have datagrip and it's OK but it doesn't really do everything I want.
>
> I don't understand why it doesn't fetch all objects from the database and
> then put them into the disk in a directory so I can put it all under git
> and then let me work on them syncing the files back as they change.  For
> example today I just renamed a function. It didn't refactor properly by
> identifying stored procs that reference it. If I was using another
> jetbrains IDE it would have built an index of the project files and did a
> proper refactor.
>
> This would also allow you to make wholesale disk changes and then sync
> them up properly to get around postgres dependency issues.
>
> On Fri, Dec 30, 2016 at 1:40 PM, Michael Sheaver  wrote:
>
>> If you want an IDE, Jetbrains, the makers of great IDEs like IntelliJ,
>> PyCharm. and AppCode, among others, have recently come out with what is
>> arguably the BEST IDE for DBAs, DataGrip. It runs on most major platforms,
>> and is so good that I have bitten the bullet and paid the yearly
>> subscription for it.
>>
>> Leave the Postgres core alone focus on what they do best, and that is
>> making the very BEST database environment that can be had at any price,
>> period. Is Postgres perfect? No, not at all. But no other group is so
>> focused on performance, scalability and security as these folks are. And
>> the rate of development, enhancement and continual improvement is, quite
>> honestly, astounding.
>>
>> So here is my hat tip to the Postgres team for an awesome job they are
>> doing!
>>
>> On Dec 29, 2016, at 7:19 PM, Tim Uckun  wrote:
>>
>> I am not saying the postgres core people should work on an IDE, just that
>> an IDE like thing would be nice.
>>
>> On Fri, Dec 30, 2016 at 12:51 PM, Rob Sargent 
>> wrote:
>>
>>> I would hope Postgres core folk take no more than a nanosecond to reject
>>> the idea that they work on an IDE. Focus on reading and writing faster and
>>> faster ACID all the while.
>>>
>>> On Dec 29, 2016, at 5:32 PM, Tim Uckun  wrote:
>>>
>>> Honestly I don't even like JS. Having said that I am not too crazy about
>>> PL

[GENERAL] Importing directly from BCP files

2015-11-15 Thread Tim Uckun
Does anybody have any documentation on what a BCP file from SQL server
looks like?  I have a directory full of BCP files and I would like to
import them into PG.

Has anybody done anything like this before?

Thanks


Re: [GENERAL] Importing directly from BCP files

2015-11-15 Thread Tim Uckun
>
>
> That appears to depend on the -f option
>
> 
>

I have the format files so that's good.



>
> Have you tried opening a file in a text editor to see what they look like?
>
>

Yes. It looks like a binary file with lots of non-printable characters. I
tried running it through iconv with common encodings (windows-1524, utf-16,
utf-32 and a few others) but I can't seem to get a clean UTF-8 or ASCII
conversion for some reason.


> The bulk import command in PostgreSQL is COPY, so you'll likely want to
> look at http://www.postgresql.org/docs/9.4/static/sql-copy.html
>


I need to get them into a shape the COPY command can process first, I
think.



>
>> Has anybody done anything like this before?
>>
>>
> I'd imagine that it's fairly common.
>


I googled for quite a while and couldn't find anything.  Unfortunately the
-f "format" option makes it hard to search for "BCP file format" or
something similar.


Re: [GENERAL] Importing directly from BCP files

2015-11-15 Thread Tim Uckun
>
>
> I moved a database from MS Sql Server 2000 to Postgresql a few years
> ago via BCP files.  I used a Python script to do some fixup on the
> BCP files to make them importable as CSV files into Postgresql.  I
> don't know if quirks I ran into are still an issue with newer versions
> of Sql Server but for what it's worth, here are the comments from that
> script:
>
>
>

Do you remember what the encoding of the files was, by any chance?


Re: [GENERAL] Importing directly from BCP files

2015-11-16 Thread Tim Uckun
I don't see any documentation anywhere about the BCP format and as I said
googling for "BCP format" gives a zillion links about the format files.
Every project on github just calls out the BCP command but I am on linux
and that doesn't help me at all.

Bummer.  This is going to be a huge pain to try and import these files.


Re: [GENERAL] Importing directly from BCP files

2015-11-16 Thread Tim Uckun
>
>
> On openSuSE 13.2
>
> sudo zypper install freetds-tools
>
> Ubuntu 14.04
>
> sudo apt-get install freetds-bin
>
> aklaver@killi:~> freebcp -h
> usage:  freebcp [[database_name.]owner.]table_name {in | out} datafile
> [-m maxerrors] [-f formatfile] [-e errfile]
> [-F firstrow] [-L lastrow] [-b batchsize]
> [-n] [-c] [-t field_terminator] [-r row_terminator]
> [-U username] [-P password] [-I interfaces_file] [-S server]
> [-v] [-d] [-h "hint [,...]" [-O "set connection_option on|off,
> ...]"
> [-A packet size] [-T text or image size] [-E]
> [-i input_file] [-o output_file]
>
> example: freebcp testdb.dbo.inserttest in inserttest.txt -S mssql -U guest
> -P password -c
>
>>
>>
From what I can make out, this tool reads the BCP files and puts them into
an SQL server.  I need to either put them into a postgres server or
output them as plain text CSV files.


Re: [GENERAL] Importing directly from BCP files

2015-11-16 Thread Tim Uckun
>
> the binary form of the BCP file output is undocumented. So if all the BCP
> files you have are the binary(native) version you are up this creek without
> a paddle.
>
>
Ugh.  Yes it looks like that's the creek I am on.   Thanks Microsoft!


> So, moving to another creek. It depends on the amount of data you are
> working with, but it might be worth it to spin up a VM in the cloud on AWS,
> Azure, etc that has SQL Server on it and import the BCP files there. You
> could then export the data using the character format instead of the native
> format:
>

Man that really sucks but if it's the only way then I guess it's the only
way.
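
Once the files are re-exported in character format, the load side is a plain
COPY; the column list, delimiter, null marker, encoding, and server-side path
below are assumptions that would have to match the bcp export options:

COPY target_table (id, name, created_at)
FROM '/data/exports/target_table.dat'
WITH (FORMAT csv, DELIMITER E'\t', NULL '', ENCODING 'WIN1252');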


Re: [GENERAL] Importing directly from BCP files

2015-11-16 Thread Tim Uckun
I am going to ask them that but there are tons of legacy files which might
need to be dealt with again in the future so I was hoping to use them
directly.


Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-25 Thread Tim Uckun
I'll add my two cents.

I set up something similar a while ago. Here are my suggestions for what
they are worth.

You don't ever want to delete from such a table so you need to set up
something which allows you to truncate the tables when you no longer need
them.

One way to accomplish this is with rules (not triggers; rules are blazingly
fast compared to triggers).  Set up a table inheritance scheme based on
whatever search criteria you have (date, sequence etc).  Set up a cron job
to create the child tables well ahead of time and to set up the rule
redirecting the insert. For example, let's say you have a date-partitioned
table and you want to keep a table for every day.  Your cron job would run
once a day and would create the next seven days' worth of tables (just in
case the cron job fails to run for some reason) and would rewrite the rules
to insert into the right table with if-then-else style conditions, as
sketched below.  This method is preferable to building the table name
dynamically with string concatenation because, again, it's significantly
faster.
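
A sketch of one of those cron-generated redirect rules, with hypothetical table
and column names (one conditional rule per child table):

CREATE OR REPLACE RULE q_insert_2015_11_25 AS
ON INSERT TO q
WHERE (new.created_at::date = DATE '2015-11-25')
DO INSTEAD
INSERT INTO q_2015_11_25 VALUES (NEW.*);
-- rows that match none of the conditional rules still land in the parent table q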

Another method I tried was to have one "primary" child table and "archival"
child tables and insert directly into the primary child table.  For example,
say you have a table called "Q".  You set up a table called Q_in which
inherits from Q.  Your code inserts into the Q_in table and you select from
the Q table.  On a periodic basis you do this:

BEGIN TRANSACTION
LOCK TABLE Q_IN IN EXCLUSIVE MODE;
ALTER TABLE Q_IN RENAME TO Q_SOME_DATETIME;
CREATE TABLE Q_IN
  (LIKE Q_SOME_DATETIME INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING
INDEXES);
ALTER SEQUENCE q_in_id_seq OWNED BY q_in.id;
 -- Set some constraints so the query optimizer knows what to do
END TRANSACTION
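
The constraint alluded to in the comment above would look something like this
(the timestamp column is hypothetical); it lets constraint_exclusion skip the
archived child when Q is queried by date:

ALTER TABLE Q_SOME_DATETIME
  ADD CONSTRAINT q_some_datetime_created_at_check
  CHECK (created_at >= TIMESTAMP '2015-11-25 00:00:00'
     AND created_at <  TIMESTAMP '2015-11-26 00:00:00');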


There is one other method, which is the Kafka approach (you can use this in
addition to the above methods).

Create N tables for the incoming queue, each one with a sequence for its ID
number. N should be determined by how many clients you expect to run.
Create a rule which uses a hash function, round robin, or random choice to
insert data into one of the tables.   Create a different table which keeps
track of client connections.   The clients use this table to keep track of
the last id fetched.  For example, say I have three types of processes that
run on the incoming data: p1, p2, p3.  The table logs the highest ID fetched
from each queue table for each process. When a client connects it connects
to the table with the lowest used ID for that process, and it locks it for
that client (not process, because you can have multiple clients running
each process), it processes the records, it updates the id, it unlocks the
table and it backs off for a few seconds.  The next client which wakes up
goes through the same process and so on.  Both Apache Kafka and Amazon
Kinesis use this approach.  One nice thing about this approach is that you
can put each table in its own tablespace on its own disk for higher
performance.
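
A rough sketch of that bookkeeping table and the claim step (all names are
hypothetical):

CREATE TABLE queue_offsets (
    queue_table text   NOT NULL,
    process     text   NOT NULL,            -- p1, p2, p3 ...
    last_id     bigint NOT NULL DEFAULT 0,  -- highest id already processed
    locked_by   text,                       -- client currently holding this slot
    PRIMARY KEY (queue_table, process)
);

-- a client picks the least-advanced unlocked slot for its process and locks the row
BEGIN;
SELECT queue_table, last_id
  FROM queue_offsets
 WHERE process = 'p1' AND locked_by IS NULL
 ORDER BY last_id
 LIMIT 1
 FOR UPDATE;
-- ...mark it locked, read rows with id > last_id from that queue table,
-- then advance last_id and clear the lock before COMMIT.
COMMIT;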

One other thing: this is dangerous, but you can turn off logging for the
tables, which will make the inserts much faster.
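
One way to do that is an unlogged table, which skips WAL entirely; the trade-off
is that its contents are truncated after a crash and it is not replicated. A
sketch reusing the hypothetical Q_in naming from above:

CREATE UNLOGGED TABLE q_in
  (LIKE q INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);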

Finally:

There is no need to do any of this.  Kinesis is cheap, Kafka is pretty
awesome, Rabbit is crazy useful.

Hope this helps, ping me offline if you want more details.

Cheers.


On Tue, Nov 24, 2015 at 11:51 AM, Merlin Moncure  wrote:

> On Mon, Nov 23, 2015 at 12:54 PM, John R Pierce 
> wrote:
> > On 11/23/2015 2:41 AM, Chris Withers wrote:
> >>
> >>
> >> If it's totally wrong, how should I be looking to approach the problem?
> >
> > depending on where these queue entries are coming from, I'd considering
> > using a message queueing system like AMS, MQseries, etc, rather than
> trying
> > to use a relational database table as a queue. your external data
> source(s)
> > would write messages to this queue, and you'd have 'subscriber' processes
> > that listen to the queue and process the messages, inserting persistent
> data
> > into the database as needed.
>
> I just don't agree with this generalization.  Keeping the state of the
> queue in the database has a lot of advantages and is a lot easier to
> deal with from a programming perspective especially if SQL is your
> core competency.  Being able to produce and consume in SQL based on
> other relational datasources is...elegant.
>
> Specialized queue systems are a very heavy dependency and adding a new
> server to your platform to mange queues is not something to take
> lightly.  This advice also applies to scheduling systems like quartz,
> specialized search like solr and elastisearch, and distributed data
> platforms like hadoop.  I've used all of these things and have tended
> to wish I had just used the database instead in just about every case.
>
> Also, check out pgq (https://wiki.postgresql.org/wiki/PGQ_Tutorial).
> Personally, I tend to roll my own queues.   It's not difficult.
>
> merlin
>
>

[GENERAL] Selecting top N percent of records.

2010-10-17 Thread Tim Uckun
Is there a way to select the top 10% of the values from a column?

For example the top 10% best selling items where number of sales is a column.

Thanks.



Re: [GENERAL] Selecting top N percent of records.

2010-10-17 Thread Tim Uckun
On Mon, Oct 18, 2010 at 12:53 PM, Darren Duncan  wrote:
> Tim Uckun wrote:
>>
>> Is there a way to select the top 10% of the values from a column?
>>
>> For example the top 10% best selling items where number of sales is a
>> column.
>
> The top 10% would be a variable number of records.  Is that what you want,
> or would you rather, say, just see the top N items?

Yes this is what I want. I don't care how many records I pull up (well
I might want to put a LIMIT on them). I want the top 10%



Re: [GENERAL] Selecting top N percent of records.

2010-10-17 Thread Tim Uckun
> That is a bit problematic because it necessitates knowing the number
> of rows total, and slow counting is an idiosyncrasy of postgres.
>
> http://wiki.postgresql.org/wiki/Slow_Counting
>
> To get the top 10%:
>
> SELECT * FROM table LIMIT(SELECT (COUNT(*) * 0.1)::integer FROM table)


I think I wasn't making myself clear. I don't want the top 10% of the
rows. I want the rows with the top 10% of the values in a column.

In my case there is a very non-linear set of values. The lowest value
is 1 and the vast majority of records have a 1 in the column. The
highest value might be in the tens of thousands.  I want to pull up
the records that have the top ten percent of values.



Re: [GENERAL] Selecting top N percent of records.

2010-10-17 Thread Tim Uckun
>
> OK, so you want a median-style "sort them in descending order and count down
> until you've selected the first 10% of rows" approach? In other words,
> values in the 90th percentile of the distribution?
>
> Try this. Given table "x" with single integer column "y", obtain rows of x
> in the 90th percentile of y:
>
> select ranked.y FROM (select percent_rank() over (order by y desc) as pc, y
> from x) AS ranked WHERE pc <= 0.1;
>
> or:
>
> select ranked.y from (select ntile(10) over (order by y desc) as pc, y from
> x) AS ranked WHERE pc = 1;
>


Thanks I will look into the window functions. I haven't used them
before so thanks for the tip.
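
For reference, the percent_rank() suggestion applied to the best-selling-items
example from the original question (table and column names are hypothetical):

SELECT item_id, sales
FROM (
    SELECT item_id, sales,
           percent_rank() OVER (ORDER BY sales DESC) AS pc
    FROM items
) ranked
WHERE pc <= 0.1
ORDER BY sales DESC;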



[GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-20 Thread Tim Uckun
I have a very simple update query.

update cu
set screenshot_file_name = tu.screenshot_file_name,
screenshot_content_type  = tu.screenshot_content_type,
screenshot_file_size = tu.screenshot_file_size,
screenshot_status  = tu.screenshot_status

from  cu
inner join tu on tu.cu_id = cu.id

The join has five to six thousand rows in it.

The only indexed field in the update is screenshot_status which is an integer.

I changed the checkpoint settings to

checkpoint_segments = 256
checkpoint_completion_target = 0.9

but it still does not help.

I am having similar problems with deletes and inserts. Trying to
delete even a few thousand records takes forever.  The selects seem to
be just fine.

I am running this on my laptop with no other processes hitting the
database. It's an i5 with lots of RAM, a quad core, and an IDE drive.

Where is the FAST button for postgres updates? What parameter do I
have to set in order to update 6000 records in under an hour?



Re: [GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-20 Thread Tim Uckun
On Thu, Oct 21, 2010 at 3:37 PM, Gary Chambers  wrote:
>> Where is the FAST button for postgres updates? What parameter do I
>> have to set in order to update 6000 records in under an hour?
>
> Which version of Postgres?

8.4

> Have you investigated more than just two
> performance tuning parameters?

I followed the advice I found when searching this mailing list.


> Does your MS Access version of the
> query run any faster?

I don't have access but I suspect it would.  How long should it take
to update three fields in 6000 records in your opinion.


> http://tinyurl.com/2fsjmv4

Ah yes I was expecting something snarky. Thanks.

BTW I have read many of those links and have adjusted some values but
honestly there are so many buttons to push and knobs to dial that it's
hard to know what will fix what.

Also searching the mailing list shows that this is a pretty commonly
asked question but I wasn't able to find an answer other than
increasing the parameters I listed.



Re: [GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-20 Thread Tim Uckun
>
> 1) Increase checkpoint_segments (>64, increases beyond that can be helpful
> but they eventually level out)

Changed it back to 64 from 256

> 2) Increase shared_buffers (~25% of RAM is normal)

Changed it to one gig (25% of my RAM) obviously this involved changing
the shmmax and shmall settings in the kernel.

> 3) Confirm there are no constraints or foreign keys happening at each update

There are none.

> 4) Make sure your indexes aren't filled with junk and that VACUUM is running
> effectively.  REINDEX or CLUSTER tables that haven't been well maintained in
> the past.

Autovacuum is on. I presume it's doing its job. I didn't re-index or
recluster because it's a dev database and the data rarely changes.

> 5) Upgrade to better hardware that has a battery-backed write cache

Not for my laptop.

> Disable synchronous_commit and cheat on individual commits, at the expense
> of potential lost transactions after a crash.

I will leave this as a last resort.

> Updating rows in PostgreSQL is one of the most intensive things you do to
> your disks, and it's hard to get a laptop drive to do a very good job at
> that.
>


After making the above changes I re-ran the query. It's been running
for five minutes and it's still running.

This is a database with nothing else hitting it.

So obviously something else is out of kilter.

I'll ask the same question I asked Gary.

Say I just apt-get install postgres and do nothing else. One table has
about 500K records. The other has about 5K records. The joins are on
indexed integer fields (one is the primary key). How long should it
take to update five to six thousand records in your experience?   Out
of the box with no tuning. How long should this take on an almost new
laptop, four gigs of RAM, i5 quad core processor?



Re: [GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-20 Thread Tim Uckun
To follow up...

I did a full vacuum analyze on both tables and re-ran the query. Same
story. I ended the query after eight minutes.



Re: [GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-20 Thread Tim Uckun
On Thu, Oct 21, 2010 at 5:05 PM, Tom Lane  wrote:
> Tim Uckun  writes:
>> I have a very simple update query.
>
>> update cu
>> set screenshot_file_name = tu.screenshot_file_name,
>>     screenshot_content_type  = tu.screenshot_content_type,
>>     screenshot_file_size = tu.screenshot_file_size,
>>     screenshot_status  = tu.screenshot_status
>> from  cu
>> inner join tu on tu.cu_id = cu.id
>
> That isn't actually the query you're issuing, because if it were
> you would get an error "table name "cu" specified more than once".

Sorry I shortened the names of the tables.

> I suspect that the query you're actually issuing involves an
> unconstrained cartesian product self-join between the target table
> and another instance of itself.  Postgres doesn't consider that
> the target table should be named again in FROM.  But it's hard to
> be sure about that when looking at a redacted query.

I apologize for the redacted query. I was trying to make it easier to follow.

Here is the actual query.

update consolidated_urls
set screenshot_file_name = tu.screenshot_file_name,
   screenshot_content_type  = tu.screenshot_content_type,
   screenshot_file_size = tu.screenshot_file_size,
   screenshot_status  = tu.screenshot_status

from  consolidated_urls cu
inner join trending_urls tu on tu.consolidated_url_id = cu.id

This is a simple inner join.

select count(cu.id)
from  consolidated_urls cu
inner join trending_urls tu on tu.consolidated_url_id = cu.id

yields 3657 records.



Re: [GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-20 Thread Tim Uckun
>
> No, it isn't.  This is a three-way join between consolidated_urls, cu,
> and tu --- the fact that cu is the same underlying table as

cu is an alias for consolidated_urls. tu is an alias for trending_urls.

There are only two tables in the query.

> consolidated_urls doesn't change that.  And the join is
> underconstrained, causing each row of consolidated_urls to be joined
> to every row of the cu/tu join.  That's why it's taking such an
> unreasonably long time --- you're generating many thousands of redundant
> updates to each row of consolidated_urls.  You should just write this as
>

Once again there are only two tables in the query. The join clause is
inner join trending_urls tu on tu.consolidated_url_id = cu.id


> update consolidated_urls
> set screenshot_file_name = tu.screenshot_file_name,
>   screenshot_content_type  = tu.screenshot_content_type,
>   screenshot_file_size = tu.screenshot_file_size,
>   screenshot_status  = tu.screenshot_status
> from trending_urls tu where tu.consolidated_url_id = consolidated_urls.id
>
> Postgres is a bit different from some other DBMSes in how it interprets
> UPDATE ... FROM syntax.
>


I'll try this too.

Anything to make this query complete in a reasonable amount of time.



Re: [GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-20 Thread Tim Uckun
One more follow up.

Did a vacuum full on both tables and a re-index on both tables.

Changed the wal_buffers to 16MB  (increased the kernel param as a
result) as per
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

I also turned off fsync but still no joy.



Re: [GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-21 Thread Tim Uckun
On Fri, Oct 22, 2010 at 4:33 AM, Brian Hirt  wrote:
>>
>> There are only two tables in the query.
>>
>
> Tim,
>
> No, your query is written incorrectly.  I don't understand why you come on to 
> this list all hostile and confrontational.  Regardless, people still try to 
> help you and then you still ignore the advice of people that are giving you 
> the solutions to your problems.


I don't think I was hostile. I think people were hostile to me as a
matter of fact.

I asked a question and the first reply was really snarky and unhelpful.



Re: [GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-21 Thread Tim Uckun
> True.  His only real snark was in response to the "let me google that
> for you" link.  OTOH, he's arguing with Tom Lane about whether his SQL
> is well formed.  There's arguing on the internet is stupid, then
> there's arguing with Tom Lane about SQL is stupid.

I wasn't arguing with Tom Lane about the SQL. I was using standard SQL,
the kind used by all other databases. Tom correctly pointed out that
postgres does not deal with that particular SQL statement the way
other databases do, and he wrote it the way postgres wants it.


As far as I know update table set x=y from table inner join other
table is a pretty standard way of doing things.  Tom pointed out that
in postgres you have to leave out the second mention of the table as
postgres takes that as a self join.
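
To spell it out for anyone searching the archives (column list trimmed
to a single field, and this is just the shape of it, not the exact
statement I posted):

-- What I started from; Postgres treats the consolidated_urls in FROM as
-- a second, independent instance of the table, so every target row gets
-- joined to the whole cu/tu result:
UPDATE consolidated_urls
   SET screenshot_status = tu.screenshot_status
  FROM consolidated_urls cu
       INNER JOIN trending_urls tu ON tu.consolidated_url_id = cu.id;

-- What Postgres expects; the target table is not repeated in FROM:
UPDATE consolidated_urls
   SET screenshot_status = tu.screenshot_status
  FROM trending_urls tu
 WHERE tu.consolidated_url_id = consolidated_urls.id;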

>
> Have to admit when I saw that I said to myself OP needs someone to tell
> him "whoa, big fella".  I've been in similar situations where I was
> "sure" of one thing and the problem must be elsewhere, when of course I
> was wrong about the one thing...
>
>

I have been wrong lots of times.  It's not a big deal. In this case I
was expecting postgres to act one way because most of my experience is
with other databases and that's the way other databases handle things.
 I would think that's a common occurrence here.



Re: [GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-22 Thread Tim Uckun
> Agreed.  But when Tom pointed out the problem in your query you were
> quite sure you couldn't be wrong.  When I asked you to run explain to
> see what kind of row estimate you got, I got no answer.  This was a
> query problem not a hardware tuning problem.
>
> The best step for getting a good answer is forming a good question.
>
> So, has changing the query made it faster?  Did explain show what I
> expect, a huge cartesian product?  Just guessing.
>


Yes changing the query fixed the problem.  When I replied to Tom I
was under the impression that my redaction of the query was causing
confusion.

In my defense the query I posted works fine in mssql server and mysql.
I just presumed it would work the same in postgres. I guess that was
my mistake.



Re: [GENERAL] Updates, deletes and inserts are very slow. What can I do make them bearable?

2010-10-22 Thread Tim Uckun
> It's unfortunate that the first response you got was that message from Gary
> Chambers, which was a bit unprofessional and started the whole thread off in
> a bad direction for you.  As what I've seen of Gary's posts suggests he is
> trying to be helpful but has a quirky sense of humor, I think that was meant
> to be read as a little joke rather than a malicious comment.


Really there is no need to go on about this. This is the internet, we
are all adults. I don't take it personally.  I just wanted to explain
my side of the story that's all.



[GENERAL] Postgres connection errors

2010-11-02 Thread Tim Uckun
Hello.

I have lots of ruby daemons running connected to postgres. Some of
them start getting connection errors after about a day or two of
running. The odd thing is that they don't all get the same error.

Some get this error:  PGError: lost synchronization with server: got
message type "T"
Others get this  PGError: lost synchronization with server:
got message type "e"
And sometimes this   PGError: lost synchronization with server: got
message type ""


What is postgres trying to tell me here?  This error is most likely
coming out of libpq I would think.

Thanks.



Re: [GENERAL] Postgres connection errors

2010-11-02 Thread Tim Uckun
>
> Most of the cases we've seen like that have been because multiple
> threads in the client application were trying to use the same PGconn
> connection object concurrently.  There's no cross-thread synchronization
> built into libpq, so you have to provide the interlocks yourself if
> there's any possibility of multiple threads touching the same PGconn
> concurrently.  And it will not support more than one query at a time
> in any case.


These are not threaded daemons but this does give me some sort of a
clue to work on. I noticed that there is a call to clear stale
connections which might be the culprit because in the case of these
workers there is only one connection.



[GENERAL] Adding an "and is not null" on an indexed field slows the query down immensely.

2011-01-14 Thread Tim Uckun
I have this query it runs reasonably quickly.


SELECT "consolidated_urls".* FROM "consolidated_urls" INNER JOIN "topical_urls"
 ON "consolidated_urls".id = "topical_urls".consolidated_url_id
 WHERE (("topical_urls".domain_id = 157) AND ((topical_urls.hidden = 'f')))
  ORDER BY index_delta DESC LIMIT 10

The explain for this goes like this.

"Limit  (cost=29899.43..29899.46 rows=10 width=1880)"
"  ->  Sort  (cost=29899.43..29906.04 rows=2642 width=1880)"
"Sort Key: consolidated_urls.index_delta"
"->  Nested Loop  (cost=101.29..29842.34 rows=2642 width=1880)"
"  ->  Bitmap Heap Scan on topical_urls
(cost=101.29..7490.32 rows=2642 width=4)"
"Recheck Cond: (domain_id = 157)"
"Filter: (NOT hidden)"
"->  Bitmap Index Scan on
index_topical_urls_on_domain_id_and_consolidated_url_id
(cost=0.00..100.63 rows=2643 width=0)"
"  Index Cond: (domain_id = 157)"
"  ->  Index Scan using consolidated_urls_pkey on
consolidated_urls  (cost=0.00..8.45 rows=1 width=1880)"
"Index Cond: (consolidated_urls.id =
topical_urls.consolidated_url_id)"


I add one more clause on to it to filter out index_deltas that are not
null and the query becomes unusably slow.

SELECT "consolidated_urls".* FROM "consolidated_urls" INNER JOIN "topical_urls"
 ON "consolidated_urls".id = "topical_urls".consolidated_url_id
 WHERE (("topical_urls".domain_id = 157) AND ((topical_urls.hidden = 'f')))
 AND (index_delta IS NOT NULL) ORDER BY index_delta DESC LIMIT 10

The explain for this is goes like this

"Limit  (cost=0.00..20555.33 rows=10 width=1880)"
"  ->  Nested Loop  (cost=0.00..5430717.58 rows=2642 width=1880)"
"->  Index Scan Backward using
index_consolidateds_url_on_index_delta on consolidated_urls
(cost=0.00..5316175.98 rows=15242 width=1880)"
"  Filter: (index_delta IS NOT NULL)"
"->  Index Scan using
index_topical_urls_on_domain_id_and_consolidated_url_id on
topical_urls  (cost=0.00..7.50 rows=1 width=4)"
"  Index Cond: ((topical_urls.domain_id = 157) AND
(topical_urls.consolidated_url_id = consolidated_urls.id))"
"  Filter: (NOT topical_urls.hidden)"



The index_delta field is double precision and is indexed.

Any suggestions as to how to make this query run faster?



Re: [GENERAL] Adding an "and is not null" on an indexed field slows the query down immensely.

2011-01-15 Thread Tim Uckun
2011/1/15 pasman pasmański :
> Try :
> order by index_delta+1 desc
>

I have attached the explain analyze for that below. Why does this
return instantly?




Limit  (cost=29910.05..29910.07 rows=10 width=1880) (actual
time=42.563..42.563 rows=0 loops=1)
   ->  Sort  (cost=29910.05..29916.65 rows=2642 width=1880) (actual
time=42.558..42.558 rows=0 loops=1)
 Sort Key: ((consolidated_urls.index_delta + 1::double precision))
 Sort Method:  quicksort  Memory: 17kB
 ->  Nested Loop  (cost=105.30..29852.95 rows=2642 width=1880)
(actual time=10.428..10.428 rows=0 loops=1)
   ->  Bitmap Heap Scan on topical_urls
(cost=105.30..7494.33 rows=2642 width=4) (actual time=10.424..10.424
rows=0 loops=1)
 Recheck Cond: (domain_id = 157)
 Filter: (NOT hidden)
 ->  Bitmap Index Scan on
index_topical_urls_on_domain_id_and_consolidated_url_id
(cost=0.00..104.64 rows=2643 width=0) (actual time=10.419..10.419
rows=0 loops=1)
   Index Cond: (domain_id = 157)
   ->  Index Scan using consolidated_urls_pkey on
consolidated_urls  (cost=0.00..8.45 rows=1 width=1880) (never
executed)
 Index Cond: (consolidated_urls.id =
topical_urls.consolidated_url_id)
 Filter: (consolidated_urls.index_delta IS NOT NULL)
 Total runtime: 42.932 ms
(14 rows)



Re: [GENERAL] Adding an "and is not null" on an indexed field slows the query down immensely.

2011-01-15 Thread Tim Uckun
2011/1/16 pasman pasmański :
> I think this is a planner's bug. Can you send these explains to pgsql-bugs ?
>



Sure. BTW I thought I would change the query a little by putting an AND
index_value > 100 instead of index_delta and it didn't help at all. I
thought maybe using another index would help but it didn't.

So what I did was add NULLS LAST which was more bearable.
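
That is, the same query as before with just the ordering changed:

SELECT "consolidated_urls".* FROM "consolidated_urls" INNER JOIN "topical_urls"
 ON "consolidated_urls".id = "topical_urls".consolidated_url_id
 WHERE (("topical_urls".domain_id = 157) AND ((topical_urls.hidden = 'f')))
 AND (index_delta IS NOT NULL)
 ORDER BY index_delta DESC NULLS LAST LIMIT 10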

This really should be a pretty quick query, there are only twenty
records after all and all criteria fields are indexed.



[GENERAL] Fast data, slow data

2014-06-26 Thread Tim Uckun
I have a use case in which the most recent data experiences a lot of
transactions (inserts and updates) and then the churn kind of calms down.
 Eventually the data is relatively static and will only be updated in
special and sporadic events.

I was thinking about keeping the high churn data in a different table so
that the vacuums on that table can go faster and the vacuums on the rest of
the data will rarely be needed.

Ideally the high churn data would be in an in memory table but failing that
I thought that an unlogged table would be good enough.  So now I need a way
to flush the oldest data in this table to another logged table and do it in
an efficient manner. I don't think partitioning is a good idea in this case
because the partitions will be for small time periods (5 to 15 minutes).

Anybody have a similar problem? If so how did you solve it?

Thanks.


Re: [GENERAL] Fast data, slow data

2014-06-26 Thread Tim Uckun
This is what I was thinking but I am worried about two things.

1. If there is a very large set of data in the table that needs to be moved
this will be slow and might throw locks which would impact the performance
of the inserts and the updates.
2. Constantly deleting large chunks of data might cause vacuum problems.


On Fri, Jun 27, 2014 at 2:49 AM, Shaun Thomas 
wrote:

> On 06/26/2014 04:29 AM, Tim Uckun wrote:
>
>  I don't think partitioning is a good idea in this case because the
>> partitions will be for small time periods (5 to 15 minutes).
>>
>
> Actually, partitioning might be exactly what you want, but not in the way
> you might think. What you've run into is actually a pretty common usage
> pattern. How we solve problems like this where I work is to use table
> inheritance alone. Consider this:
>
> CREATE TABLE my_table
> (
>   ... columns
> );
>
> CREATE TABLE my_table_stable () INHERITS (my_table);
>
> Then you create a job that runs however often you want, and all that job
> does, is move old rows from my_table, to my_table_stable. Like so:
>
> BEGIN;
>
> INSERT INTO my_table_stable
> SELECT * FROM ONLY my_table
>  WHERE date_col < now() - INTERVAL '15 minutes';
>
> DELETE FROM ONLY my_table
>  WHERE date_col < now() - INTERVAL '15 minutes';
>
> COMMIT;
>
> Or whatever. But you get the idea.
>
> This way, you still get all the data by selecting from my_table, but the
> data is partitioned in such a way that you can put the high turnover table
> in another tablespace, or otherwise modify it for performance reasons.
>
> --
> Shaun Thomas
> OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
> 312-676-8870
> stho...@optionshouse.com
>
>


Re: [GENERAL] Fast data, slow data

2014-06-26 Thread Tim Uckun
The database is functioning fine now but I am anticipating a much higher
workload in the future.  The table in question is probably going to have a
few million rows per day inserted into it when it gets busy; if it gets
very busy it might be in the tens of millions per day, but that's
speculation at this point.

I don't want to say that the data is not important but if I drop one or two
sensor readings it's not going to be the end of the world.



On Fri, Jun 27, 2014 at 3:45 AM, Steve Crawford <
scrawf...@pinpointresearch.com> wrote:

> On 06/26/2014 02:29 AM, Tim Uckun wrote:
>
>> I have a use case in which the most recent data experiences a lot of
>> transactions (inserts and updates) and then the churn kind of calms down.
>>  Eventually the data is relatively static and will only be updated in
>> special and sporadic events.
>>
>> I was thinking about keeping the high churn data in a different table so
>> that the vacuums on that table can go faster and the vacuums on the rest of
>> the data will rarely be needed.
>>
>> Ideally the high churn data would be in an in memory table but failing
>> that I thought that an unlogged table would be good enough.  So now I need
>> a way to flush the oldest data in this table to another logged table and do
>> it in an efficient manner. I don't think partitioning is a good idea in
>> this case because the partitions will be for small time periods (5 to 15
>> minutes).
>>
>> Anybody have a similar problem? If so how did you solve it?
>>
>
> It's not at all unusual for a database to experience a high portion of its
> activity on a small subset of the data so in that sense the "problem" is
> already solved by appropriate sizing and tuning to make appropriate
> adjustments to the caching and other properties already provided by the OS
> and DB.
>
> Having said that, there is far too little information here to provide
> specific advice on tuning and other approaches that may be of use for you.
> In particular:
>
> Is this currently a hypothetical question or do you have a running system
> that is experiencing performance problems?
>
> In either case can you provide some parameters including rows and size of
> your large table(s), what you mean by "high churn" (rate of inserts,
> deletes, updates and how they are grouped into transactions), the size of
> the table that is heavily updated, how the database is used (transactional,
> data-mining, ...) and anything else you feel might help the list understand
> your use-case.
>
> If you have a live database, any metrics/observations you can supply might
> be helpful. For example, are particular queries slow? What speed is
> required? Do you have lots of simultaneous connections or is everything
> through a single connection.
>
> Finally,  confirm that the data is not important or is easily recreated
> (implied by your in-memory/unlogged-table comments).
>
> Cheers,
> Steve
>
>
>


[GENERAL] Managing Key Value tags on rows

2014-11-12 Thread Tim Uckun
What is the most efficient way to model key value tags on records. The keys
and values will be variable and changing over time.  The values may be of
various types (numbers, dates, strings etc).   There will be filters and
group by selects based on tag names and maybe even tag values.

Postgres gives me the option of using hstore, JSON or just a linked table
with all the keys and values and I am wondering which would be most
efficient in terms of SELECT speeds. The writes are done in batches so this
is mostly for analytical purposes.

Thanks.


[GENERAL] Hierarchical rollups.

2014-11-12 Thread Tim Uckun
I am looking to model a hierarchical structure where the parent nodes on
each level will contain calculations based on immediate children's values.
For example the parent row may have an average of all the child rows (the
calculations will be more complicated than that of course).

So every time a record changes this is going to possibly trigger an
aggregation function on the parent which will trigger an aggregation on the
parent of that record etc until the root node is calculated.

I am wondering if it's more efficient to build very complicated selects
which do the calculations on the fly or whether to put the calculation
results in the rows to make selects more efficient but make the writes more
inefficient.
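
For the "calculate on the fly" option I am picturing something along
these lines (table and column names invented purely for illustration):

-- adjacency-list hierarchy in a single table; each parent's value is
-- derived from its immediate children at query time
SELECT p.id AS parent_id, avg(c.value) AS rolled_up_value
  FROM nodes p
  JOIN nodes c ON c.parent_id = p.id
 GROUP BY p.id;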

At this time I am not planning on using ltree but I would be interested in
hearing your experiences with ltree as well.

TIA


Re: [GENERAL] Managing Key Value tags on rows

2014-11-17 Thread Tim Uckun
My main worry is that the tag values will be of different types and ideally
I would be able to search using type-specific ranges. For example, if
the tag value is a date I'd want to do a date interval search, but if
the tag values are strings then do an ilike search.

I was thinking of creating different columns for different types so that I
can do a search like 'WHERE tag_name = 'blah' and date_value between '
.  In other words I would have a string_value, integer_value,
numeric_value, date_value ... columns.
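
Something like this, roughly (all names are placeholders):

CREATE TABLE record_tags (
    record_id     integer NOT NULL,
    tag_name      text    NOT NULL,
    string_value  text,
    integer_value integer,
    numeric_value numeric,
    date_value    date,
    PRIMARY KEY (record_id, tag_name)
);

-- a date-range search against a date-typed tag
SELECT record_id
  FROM record_tags
 WHERE tag_name = 'blah'
   AND date_value BETWEEN '2014-01-01' AND '2014-06-30';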



On Tue, Nov 18, 2014 at 5:40 AM, Merlin Moncure  wrote:

> On Wed, Nov 12, 2014 at 7:17 PM, Tim Uckun  wrote:
> > What is the most efficient way to model key value tags on records. The
> keys
> > and values will be variable and changing over time.  The values may be of
> > various types (numbers, dates, strings etc).   There will be filters and
> > group by selects based on tag names and maybe even tag values.
> >
> > Postgres gives me the option of using hstore, JSON or just a linked table
> > with all the keys and values and I am wondering which would be most
> > efficient in terms of SELECT speeds. The writes are done in batches so
> this
> > is mostly for analytical purposes.
>
> For pre-9.4 you should be contrasting hstore to EAV model, where each
> key value pair is stored in its own record.  Text mode json (that is,
> not jsonb) is generally not a good choice for arbitrary searching.
>
> If you'll permit a gross generalization, if the data is mostly static
> hstore should work pretty well otherwise you may want to use the 'old
> school' wasteful but flexible EAV mechanic, double so if you have to
> store other describing criteria than the type itself.
>
> 9.4 and above, I'd be looking pretty intently at jsonb with its
> fancier searching options, better type handling, and ability to store
> complicated structures for this type of work.  Frequent updates will
> still be a pain point however especially if there are a large number
> of keys per object.
>
> merlin
>


[GENERAL] Getting truncated queries from pg_stat_statements

2015-01-19 Thread Tim Uckun
I am trying to get some slow query information and the results from my
query are truncated at 2047 characters.   Some of my queries are very long
so they get truncated in the select. Is there a way around this?

Here is my query.

WITH query_stats AS (
SELECT
  query::text,
  (total_time / 1000 / 60) as total_minutes,
  (total_time / calls) as average_time,
  calls
FROM
  pg_stat_statements
INNER JOIN
  pg_database ON pg_database.oid = pg_stat_statements.dbid
WHERE
  pg_database.datname = current_database()
  )
  SELECT
query,
total_minutes,
average_time,
calls,
total_minutes * 100.0 / (SELECT SUM(total_minutes) FROM
query_stats) AS total_percent
  FROM
query_stats
  WHERE
calls >= 10
AND average_time >= 20
  ORDER BY
average_time DESC
  LIMIT 100


Re: [GENERAL] Getting truncated queries from pg_stat_statements

2015-01-19 Thread Tim Uckun
Ok thanks. I am still on 9.3 so I'll adjust that setting.
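
Presumably that means bumping something like this in postgresql.conf (it
only takes effect after a server restart):

track_activity_query_size = 16384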

On Tue, Jan 20, 2015 at 12:15 PM, Peter Geoghegan <
peter.geoghega...@gmail.com> wrote:

> On Mon, Jan 19, 2015 at 2:37 PM, Tim Uckun  wrote:
> > I am trying to get some slow query information and the results from my
> query
> > are truncated at 2047 characters.   Some of my queries are very long so
> they
> > get truncated in the select. Is there a way around this?
>
>
> Yes. This is fixed in PostgreSQL 9.4, so that query texts can be of
> virtually unlimited size and still be stored. Otherwise, it depends on
> your track_activity_query_size setting.
>
> --
> Regards,
> Peter Geoghegan
>


Re: [GENERAL] Getting truncated queries from pg_stat_statements

2015-01-19 Thread Tim Uckun
Thanks for that tip. I'll check it out.

On Tue, Jan 20, 2015 at 3:12 PM, Melvin Davidson 
wrote:

> Since you are on 9.3. you might want to consider using PgBadger as a
> better way to get information about slow queries.
> http://sourceforge.net/projects/pgbadger/
>
>
> On Mon, Jan 19, 2015 at 6:41 PM, Tim Uckun  wrote:
>
>> Ok thanks. I am still on 9.3 so I'll adjust that setting.
>>
>> On Tue, Jan 20, 2015 at 12:15 PM, Peter Geoghegan <
>> peter.geoghega...@gmail.com> wrote:
>>
>>> On Mon, Jan 19, 2015 at 2:37 PM, Tim Uckun  wrote:
>>> > I am trying to get some slow query information and the results from my
>>> query
>>> > are truncated at 2047 characters.   Some of my queries are very long
>>> so they
>>> > get truncated in the select. Is there a way around this?
>>>
>>>
>>> Yes. This is fixed in PostgreSQL 9.4, so that query texts can be of
>>> virtually unlimited size and still be stored. Otherwise, it depends on
>>> your track_activity_query_size setting.
>>>
>>> --
>>> Regards,
>>> Peter Geoghegan
>>>
>>
>>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>


[GENERAL] Postgres seems to use indexes in the wrong order

2015-01-22 Thread Tim Uckun
Take a look at this explain

http://explain.depesz.com/s/TTRN

The final number of records is very small but PG is starting out with a
massive number of records and then filtering most of them out.

I don't want to really force pg to always use the same index because in
some cases this strategy would win but I am wondering if there is anything
I need to do in order to get the planner to make better decisions.

I already did an analyze on the table.


Re: [GENERAL] Postgres seems to use indexes in the wrong order

2015-01-26 Thread Tim Uckun
The effective_cache_size is one gig. The others are not set, so they are at
the default.



On Sun, Jan 25, 2015 at 6:08 AM, Sameer Kumar 
wrote:

>
> On Fri, Jan 23, 2015 at 3:04 PM, Tim Uckun  wrote:
>
>> Take a look at this explain
>>
>> http://explain.depesz.com/s/TTRN
>>
>
> ​Adding some info on the query and table structure (and indexes) would be
> helpful here.​
>
>
>>
>>
>> The final number of records is very small but PG is starting out with a
>> massive number of records and then filtering most of them out.
>>
>> I don't want to really force pg to always use the same index because in
>> some cases this strategy would win but I am wondering if there is anything
>> I need to do in order to get the planner to make better decisions.
>>
>>
> ​What are the values for below parameters-
>
> - random_page_cost
> - seq_page_cost
> ​- effective_cache_size
>
>
>
>> I already did an analyze on the table.
>
>
>
>
> Best Regards,
>
> *Sameer Kumar | Database Consultant*
>
> *ASHNIK PTE. LTD.*
>
> 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
>
> M: *+65 8110 0350 <%2B65%208110%200350>*  T: +65 6438 3504 |
> www.ashnik.com
>


Re: [GENERAL] Postgres seems to use indexes in the wrong order

2015-01-26 Thread Tim Uckun
Sorry I forgot about the table description.

The table is pretty simple. There are about 15 fields and about 75 million
records.  This query is supposed to use three fields to narrow down the
records: one is a timestamp column, another is a macaddr type, and the
third is an integer. All three are indexed.

The query seems to first use the timestamp column which results in a huge
number of records and then filters out using the integer and the macaddr
indices.  If it was to use the integer index first it would start with a
tiny number of records.



On Sun, Jan 25, 2015 at 6:08 AM, Sameer Kumar 
wrote:

>
> On Fri, Jan 23, 2015 at 3:04 PM, Tim Uckun  wrote:
>
>> Take a look at this explain
>>
>> http://explain.depesz.com/s/TTRN
>>
>
> ​Adding some info on the query and table structure (and indexes) would be
> helpful here.​
>
>
>>
>>
>> The final number of records is very small but PG is starting out with a
>> massive number of records and then filtering most of them out.
>>
>> I don't want to really force pg to always use the same index because in
>> some cases this strategy would win but I am wondering if there is anything
>> I need to do in order to get the planner to make better decisions.
>>
>>
> ​What are the values for below parameters-
>
> - random_page_cost
> - seq_page_cost
> ​- effective_cache_size
>
>
>
>> I already did an analyze on the table.
>
>
>
>
> Best Regards,
>
> *Sameer Kumar | Database Consultant*
>
> *ASHNIK PTE. LTD.*
>
> 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
>
> M: *+65 8110 0350 <%2B65%208110%200350>*  T: +65 6438 3504 |
> www.ashnik.com
>


Re: [GENERAL] Postgres seems to use indexes in the wrong order

2015-01-27 Thread Tim Uckun
Thanks for the tips. I'll make some adjustments

On Tue, Jan 27, 2015 at 8:38 PM, Sameer Kumar 
wrote:

>
> On Tue, Jan 27, 2015 at 6:59 AM, Tim Uckun  wrote:
>
>> The query seems to first use the timestamp column which results in a huge
>> number of records and then filters out using the integer and the macaddr
>> indices.  If it was to use the integer index first it would start with a
>> tiny number of records.
>>
>
> ​May be the record distribution of quantiles is skewed.​ Have you tried to
> set more granular statistics for your int column?
>
> The effective_cache_size is one gig. The others are not set so therefore
>> the default.
>
>
> ​Ideally the effective_cache_size can be set to as much as 50-60% of your
> available memory. Also you need to tune your random_page_cost as per the
> behavior of your disk.
>
> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server​
>
> ​If these two does not work then may be you should go for setting a more
> granular statistics collection for your specific column-
>
> alter table  alter column  set  statistics 1000;
> analyze ;
>
> ​
>
>
> Best Regards,
>
> *Sameer Kumar | Database Consultant*
>
> *ASHNIK PTE. LTD.*
>
> 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
>
> M: *+65 8110 0350 <%2B65%208110%200350>*  T: +65 6438 3504 |
> www.ashnik.com
>


[GENERAL] Partioning with overlapping and non overlapping constraints

2015-02-05 Thread Tim Uckun
I have two partitioning questions I am hoping somebody can help me with.

I have a fairly busy metric(ish) table. It gets a few million records per
day, the data is transactional for a while but then settles down and is
used for analytical purposes later.

When a metric is reported both the UTC time and the local times are stored
along with the other data belonging to the metric.  I want to partition
this table to both make it faster to query and also to spread out the
writes.  Ideally the partitions would be based on the UTC timestamp and the
sending location. For example

metrics_location_X_2015_01_01

The first problem with this approach is that there could be tens of thousands
of locations, so this is going to result in hundreds of thousands of tables.
I know there is no upper limit on how many tables there can be, but I am
thinking this might really get me into trouble later.

The second and possibly more vexing problem is that the local time is often
queried.  Ideally I would like to put three constraints on the child
tables: location id, UTC timestamp and the local time, but obviously the
local timestamps would overlap with those of other locations in the same
timezone.  Even if I were to partition by UTC only, the local timestamps
would still overlap between tables.
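
To make that concrete, the kind of child table I have in mind would look
roughly like this (column names are just for illustration):

CREATE TABLE metrics_location_42_2015_01_01 (
    CHECK (location_id = 42),
    CHECK (utc_time >= '2015-01-01 00:00:00+00'
       AND utc_time <  '2015-01-02 00:00:00+00'),
    -- this range overlaps the local_time ranges of every other
    -- location in the same timezone
    CHECK (local_time >= '2015-01-01 00:00:00'
       AND local_time <  '2015-01-02 00:00:00')
) INHERITS (metrics);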

So the questions in a nutshell are.

1. Should I be worried about having possibly hundreds of thousands of
shards?
2. Is PG smart enough to handle overlapping constraints on tables and limit
its querying to only those tables that have the correct time constraint?

Thanks.


Re: [GENERAL] Partioning with overlapping and non overlapping constraints

2015-02-08 Thread Tim Uckun
Partitioning by day would result in fewer partitions but of course it would
create a "hot" table where all the writes go.

Actually I have thought of an alternative and  I'd be interested in your
opinion of it.


I leave the metrics table alone; the current code continues to read and
write from it.  Every night I create a table named metrics_YYYYMMDD which
inherits from metrics, move that day's data into it (using the "ONLY"
clause in the delete), and then set a constraint on that table for that
day. I also adjust the constraint on the metrics table itself, which
basically says "where timestamp > YYYYMMDD".

This way there is no trigger in the parent table to slow down the inserts
and I still have partitions that will speed up read queries.  I realize
that moving large amounts of data is going to be painful but perhaps I can
do it in chunks.
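
Roughly, the nightly job would be something like this (names and the
exact constraints are just a sketch):

-- create yesterday's child table and constrain it to that day
CREATE TABLE metrics_20150101 (
    CHECK (ts >= '2015-01-01' AND ts < '2015-01-02')
) INHERITS (metrics);

BEGIN;

INSERT INTO metrics_20150101
SELECT * FROM ONLY metrics
 WHERE ts >= '2015-01-01' AND ts < '2015-01-02';

DELETE FROM ONLY metrics
 WHERE ts >= '2015-01-01' AND ts < '2015-01-02';

COMMIT;

-- then adjust the parent's own constraint, e.g.
-- ALTER TABLE metrics DROP CONSTRAINT metrics_recent_only;
-- ALTER TABLE metrics ADD CONSTRAINT metrics_recent_only
--   CHECK (ts >= '2015-01-02') NO INHERIT;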



On Sat, Feb 7, 2015 at 3:09 AM, Melvin Davidson 
wrote:

> Perhaps, I do not fully understand completely, but would it not be simpler
> to just rearrange the key (and partition) by date & location?
> EG: 2015_01_01_metrics_location_X
>
> In that way, you would only have 365 partitions per year at most. But you
> also have the option to break it down by week or month, or year.
>
> EG:
>
> EXTRACT(YEAR FROM utc_time) = 2015 AND
> EXTRACT(WEEK FROM utc_time) = 1
>
> or
> EXTRACT(YEAR FROM utc_time) = 2015 AND
> EXTRACT(MONTH FROM utc_time) = 1
>
> or just
> EXTRACT(YEAR FROM utc_time) = 2015
>
>
> On Thu, Feb 5, 2015 at 10:12 PM, David G Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> Tim Uckun wrote
>> > 1. Should I be worried about having possibly hundreds of thousands of
>> > shards.
>>
>> IIRC, yes.
>>
>>
>> > 2. Is PG smart enough to handle overlapping constraints on table and
>> limit
>> > it's querying to only those tables that have the correct time
>> constraint.
>>
>> Probably yes, but seems easy enough to verify.
>>
>> All constraints are checked for each partiton and if any return false the
>> entire partiton will be excluded; which means multiple partitions can be
>> included.
>>
>> Note, this is large reason why #1 poses a problem.
>>
>> David J.
>>
>>
>>
>>
>>
>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>


Re: [GENERAL] Partioning with overlapping and non overlapping constraints

2015-02-09 Thread Tim Uckun
>Don't you have duplicate information within your UTC, location and
local_time data ? Maybe you can just attach a timezone to each location...

Yes there is duplicate information, but dealing with time zones is a PITA
and the easiest way to solve the myriad of problems I have is to store the
local time as it was at that instant.

>With smaller partitions for hot data, it should be quite fast to move them
one by one to the old data. I have no experience with the trigger based
partitioning of Postgres (we handle partitions logically at the application
level), so I'm not sure how difficult this approach is. I suppose that
you'll need a function that move data from hot to old partitons and that
fix the triggers accordingly.

This I think would be a good approach but it does involve a heavy rewrite
of the app.



On Mon, Feb 9, 2015 at 11:14 PM, Marc Mamin  wrote:

>
> >I have two partitioning questions I am hoping somebody can help me with.
> >
> >I have a fairly busy metric(ish) table. It gets a few million records per
> day, the data is transactional for a while but then settles down and is
> used for analytical purposes later.
> >
> >When a metric is reported both the UTC time and the local times are
> stored along with the other data belonging to the metric.
>
> Don't you have duplicate information within your UTC, location and
> local_time data ?
> Maybe you can just attach a timezone to each location...
>
> >I want to partition this table to both make it faster to query and also
> to spread out the writes.  Ideally the partitions would be based on the UTC
> timestamp and the sending location. For example
> >
> >metrics_location_X_2015_01_01
> >
> >First problem with this approach is that there could be tens of thousands
> of locations so this is going to result hundreds of thousands of tables.
> I know there are no upper limits to how many tables there are but I am
> thinking this might really get me into trouble later.
>
> With only a few millions rows per day, weekly or even monthly partitions
> without regard of locations should be sufficient for older data.
> It should be possible to partition your hot data differently; But Instead
> of using one partition per location, you may use a hash/modulo approach to
> keep the number of partitions in a reasonable count if required at all
> (This can be helpful: https://github.com/markokr/pghashlib). Here I would
> avoid to include time information except for the limit between old and hot
> tables. And depending on the pattern and performance requirement of your
> analytic queries this may be sufficient (i.e. don't partition on the time
> at all).
> With smaller partitions for hot data, it should be quite fast to move them
> one by one to the old data. I have no experience with the trigger based
> partitioning of Postgres (we handle partitions logically at the application
> level), so I'm not sure how difficult this approach is. I suppose that
> you'll need a function that move data from hot to old partitons and that
> fix the triggers accordingly.
>
> >
> >Second and possibly more vexing problem is that often the local time is
> queried.  Ideally I would like to put three constraints on the child
> tables. Location id, UTC timestamp and the local time but obviously the
> local timestamps would overlap with other locations in the same timezone
> Even if I was to only partition by UTC the local timestamps would overlap
> between tables.
> >
> >So the questions in a nutshell are.
> >
> >1. Should I be worried about having possibly hundreds of thousands of
> shards.
> >2. Is PG smart enough to handle overlapping constraints on table and
> limit it's querying to only those tables that have the correct time
> constraint.
>
> If you partition on the UTC time only, you don't have overlapping. When
> querying on the local time, the planner will consider all partitions, but
> an additional index or constraint on this column should be sufficient as
> long as your partition count remains small.
>
> regards,
> Marc Mamin
>


Re: [GENERAL] Partioning with overlapping and non overlapping constraints

2015-02-09 Thread Tim Uckun
This approach wouldn't work I think.  The data keeps growing in the "hot"
table.

On Tue, Feb 10, 2015 at 6:01 AM, Melvin Davidson 
wrote:

> Well, without knowing too much about your application, it certainly sounds
> like using the metricts_MMDD is the way to go. As for modifying the
> constraint daily, couldn't you just use
>
> where timestamp > current_date - Interval '1 Day'
>
> ?
>
> On Mon, Feb 9, 2015 at 5:14 AM, Marc Mamin  wrote:
>
>>
>> >I have two partitioning questions I am hoping somebody can help me with.
>> >
>> >I have a fairly busy metric(ish) table. It gets a few million records
>> per day, the data is transactional for a while but then settles down and is
>> used for analytical purposes later.
>> >
>> >When a metric is reported both the UTC time and the local times are
>> stored along with the other data belonging to the metric.
>>
>> Don't you have duplicate information within your UTC, location and
>> local_time data ?
>> Maybe you can just attach a timezone to each location...
>>
>> >I want to partition this table to both make it faster to query and also
>> to spread out the writes.  Ideally the partitions would be based on the UTC
>> timestamp and the sending location. For example
>> >
>> >metrics_location_X_2015_01_01
>> >
>> >First problem with this approach is that there could be tens of
>> thousands of locations so this is going to result hundreds of thousands of
>> tables.   I know there are no upper limits to how many tables there are but
>> I am thinking this might really get me into trouble later.
>>
>> With only a few millions rows per day, weekly or even monthly partitions
>> without regard of locations should be sufficient for older data.
>> It should be possible to partition your hot data differently; But Instead
>> of using one partition per location, you may use a hash/modulo approach to
>> keep the number of partitions in a reasonable count if required at all
>> (This can be helpful: https://github.com/markokr/pghashlib). Here I
>> would avoid to include time information except for the limit between old
>> and hot tables. And depending on the pattern and performance requirement of
>> your analytic queries this may be sufficient (i.e. don't partition on the
>> time at all).
>> With smaller partitions for hot data, it should be quite fast to move
>> them one by one to the old data. I have no experience with the trigger
>> based partitioning of Postgres (we handle partitions logically at the
>> application level), so I'm not sure how difficult this approach is. I
>> suppose that you'll need a function that move data from hot to old
>> partitons and that fix the triggers accordingly.
>>
>> >
>> >Second and possibly more vexing problem is that often the local time is
>> queried.  Ideally I would like to put three constraints on the child
>> tables. Location id, UTC timestamp and the local time but obviously the
>> local timestamps would overlap with other locations in the same timezone
>> Even if I was to only partition by UTC the local timestamps would overlap
>> between tables.
>> >
>> >So the questions in a nutshell are.
>> >
>> >1. Should I be worried about having possibly hundreds of thousands of
>> shards.
>> >2. Is PG smart enough to handle overlapping constraints on table and
>> limit it's querying to only those tables that have the correct time
>> constraint.
>>
>> If you partition on the UTC time only, you don't have overlapping. When
>> querying on the local time, the planner will consider all partitions, but
>> an additional index or constraint on this column should be sufficient as
>> long as your partition count remains small.
>>
>> regards,
>> Marc Mamin
>>
>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>


[GENERAL] What's a reasonable maximum number for table partitions?

2015-02-12 Thread Tim Uckun
The documentation says having too many partitions will end up being
unproductive as it will cause the optimizer to examine all the tables for
query planning.  So I am wondering what's a reasonable upper limit?

If I was to partition a table by day I would have 365 tables per year. Is
that too many? What if I used a different criteria that would cause a
thousand tables?

Does anybody have experience with a huge number of partitions? If so, where
did you start running into trouble?

Thanks.


Re: [GENERAL] What's a reasonable maximum number for table partitions?

2015-02-13 Thread Tim Uckun
If I used modulo arithmetic how would the query optimizer know which table
to include and exclude? For example say I did modulo 100 based on the field
client_id.  I create a base table with the trigger to insert the data into
the proper child table. Each table has the constraint (client_id % 100) = X

So if I do select from base table where client_id = 10 would postgres know
to only select from client_table_10? Normally I would always have a
client_id in my queries so hopefully this could be very efficient.
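
In other words each child would be declared roughly like this
(hypothetical names):

CREATE TABLE client_table_10 (
    CHECK ((client_id % 100) = 10)
) INHERITS (base_table);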





On Sat, Feb 14, 2015 at 5:12 AM, Vick Khera  wrote:

>
> On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun  wrote:
>
>> Does anybody have experience with huge number of partitions if so where
>> did you start running into trouble?
>>
>
> I use an arbitrary 100-way split for a lot of tracking info. Just modulo
> 100 on the ID column. I've never had any issues with that. If you can
> adjust your queries to pick the right partition ahead of time, which I am
> able to do for many queries, the number of partitions shouldn't matter
> much. Only rarely do I need to query the primary table.
>
> I don't think your plan for 365 partitions is outrageous on modern large
> hardware. For 1000 partitions, I don't know. It will depend on how you can
> optimize your queries before giving them to postgres.
>


Re: [GENERAL] What's a reasonable maximum number for table partitions?

2015-02-13 Thread Tim Uckun
This might get pretty crazy if I am doing queries like WHERE client_id in
() or when I am trying to join some table with the client table.   Maybe I
can precalculate the ids that are going to go into each partition and set
the constraint as where client_id in (some_huge_list).



On Sat, Feb 14, 2015 at 11:15 AM, Bill Moran 
wrote:

> On Sat, 14 Feb 2015 11:14:10 +1300
> Tim Uckun  wrote:
>
> > If I used modulo arithmetic how would the query optimizer know which
> table
> > to include and exclude? For example say I did modulo 100 based on the
> field
> > client_id.  I create a base table with the trigger to insert the data
> into
> > the proper child table. Each table has the constraint (client_id % 100)
> = X
> >
> > So if I do select from base table where client_id = 10  would postgres
> know
> > to only select from client_table_10? Normally I would always have a
> > client_id in my queries so hopefully the this could be very efficient.
>
> Unless the newest versions of PostgreSQL has improved on this, you have to
> give the planner just a bit of a hint ... you're query should look like:
>
> SELET ... WHERE client_id = 10 AND client_id % 100 = 10;
>
> The part after the AND looks silly and redundant, but it guarantees that
> the planner will consider the partition layout when it plans the query,
> and in every test that I've run the result will be that the planner only
> looks at the one child table.
>
> > On Sat, Feb 14, 2015 at 5:12 AM, Vick Khera  wrote:
> >
> > >
> > > On Thu, Feb 12, 2015 at 7:44 PM, Tim Uckun  wrote:
> > >
> > >> Does anybody have experience with huge number of partitions if so
> where
> > >> did you start running into trouble?
> > >>
> > >
> > > I use an arbitrary 100-way split for a lot of tracking info. Just
> modulo
> > > 100 on the ID column. I've never had any issues with that. If you can
> > > adjust your queries to pick the right partition ahead of time, which I
> am
> > > able to do for many queries, the number of partitions shouldn't matter
> > > much. Only rarely do I need to query the primary table.
> > >
> > > I don't think your plan for 365 partitions is outrageous on modern
> large
> > > hardware. For 1000 partitions, I don't know. It will depend on how you
> can
> > > optimize your queries before giving them to postgres.
> > >
>
>
> --
> Bill Moran
>


[GENERAL] Triggers and scalability in high transaction tables.

2015-02-26 Thread Tim Uckun
I want to write a trigger which runs semi-complicated code after each
insert.  I have done some reading and from what I can gather this could
cause problems because after insert triggers "don't spill to the disk" and
can cause queue problems.   Many people suggest LISTEN NOTIFY but that's
not going to help me because my daemons could be offline and I would lose
records.

I have two questions.

There are some hints out there that it could be possible to do asynchronous
triggers based on dblink but I haven't seen any documentation or examples
of this.   Is there a writeup someplace about this?

Secondly I had the idea of "partitioning" the trigger processing by
partitioning the table and then putting a trigger on each child table.
This way theoretically I could be running the triggers in parallel.  Is my
presumption correct here?  If I only have one table the trigger calls get
queued up one at a time but if I partition my table into N tables I am
running N triggers simultaneously?

Thanks.


Re: [GENERAL] Triggers and scalability in high transaction tables.

2015-02-26 Thread Tim Uckun
I just want to make sure I understood correctly.

All the triggers are firing in a single thread assigned to the connection
and will be run serially no matter how many tables are firing triggers.

If this is correct then yes I guess I have to create a queue of some sort
and process them via an external process.  Thanks.
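
For the record, the sort of queue I have in mind is just a table that the
trigger writes to and an external worker drains. A rough sketch, with
invented names:

CREATE TABLE work_queue (
    id          bigserial   PRIMARY KEY,
    payload     json        NOT NULL,
    enqueued_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION enqueue_work() RETURNS trigger AS $$
BEGIN
    -- capture the inserted row for later, asynchronous processing
    INSERT INTO work_queue (payload) VALUES (row_to_json(NEW));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER metrics_enqueue
AFTER INSERT ON metrics
FOR EACH ROW EXECUTE PROCEDURE enqueue_work();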

On Fri, Feb 27, 2015 at 11:12 AM, Jerry Sievers 
wrote:

> Tim Uckun  writes:
>
> > I want to write a trigger which runs semi-complicated code after each
> insert.  I have done some reading and from what I can gather this could
> cause problems because
> > after insert triggers "don't spill to the disk" and can cause queue
> problems.   Many people suggest LISTEN NOTIFY but that's not going to help
> me because my daemons
> > could be offline and I would lose records.
> >
> > I have two questions.
> >
> > There are some hints out there that it could be possible to do
> asynchronous triggers based on dblink but I haven't seen any documentation
> or examples of this.   Is
> > there a writeup someplace about this?
> >
> > Secondly I had the idea of "partitioning" the trigger processing by
> > partitioning the table and then putting a trigger on each child
> > table.  This way theoretically I could be running the triggers
> > in parallel.  Is my presumption correct here?  If I only
> > have one table the trigger calls get queued up one at a time but if I
> > partition my table into N tables I am running N triggers
> > simultaneously?
> >
> False on both counts.
>
> Nothing to prevent concurrent firing of same trigger on same table given
> multi session concurrent insert.
>
> Nothing to prevent contention related single-threading of any triggers
> firing for whatever reason if  the code they are running  will result in
> lock contention with other sessions.
>
> Just like 2 or more sessions trying to update the same row,  you are
> going to single thread around such an operation like it or not.
>
> You need to tell us a lot more about your problem and what the triggers
> do.
>
>
> > Thanks.
> >
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consult...@comcast.net
> p: 312.241.7800
>


Re: [GENERAL] Running update in chunks?

2013-01-21 Thread Tim Uckun
> Can you try a couple of things just to check timings. Probably worth EXPLAIN
> ANALYSE.
>
> SELECT count(*) FROM cars.imports i JOIN cars.models md ON i.model_id =
> md.id;


Takes about 300 ms

>
> CREATE TEMP TABLE tt AS SELECT i.* FROM cars.imports i JOIN cars.models md
> ON i.model_id = md.id;

Takes about 300 ms

>
> Now the first one should take half a second judging by your previous
> explain. If the second one takes 50 seconds too then that's just the limit
> of your SSD's write. If it's much faster then something else is happening.




Re: [GENERAL] Running update in chunks?

2013-01-21 Thread Tim Uckun
Just to close this up and give some guidance to future googlers...

There are no foreign keys. The table is empty when I start. I copy the
data into it from a text file.

Removing indexes didn't help much (made a very slight difference).

running a query CREATE TEMP TABLE tt AS SELECT  using a massive
join takes about 8 seconds. I presume that's the baseline for the disk
and RAM given my current postgres configuration.  Note that this is
not a satisfactory option for me because I can't do what I want in one
step (the update I specified is one of many).

running a very simple update "UPDATE imports set make_id = null"
takes over 50 seconds so that's the minimum amount of time any update
is going to take.

Running a complex update where I join all the tables together and
update all the fields takes about 106 seconds.

Just running a complex select with the joins takes no time at all.

I tried chunking the updates using chunks of 100 records and 1000
records (where ID between X and Y repeatedly) and it was even slower.
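
The chunked version was along these lines, give or take:

DO $$
DECLARE
  max_id integer;
  lo     integer := 0;
BEGIN
  SELECT max(id) INTO max_id FROM imports;
  WHILE lo <= max_id LOOP
    UPDATE imports SET make_id = NULL
     WHERE id BETWEEN lo AND lo + 999;
    lo := lo + 1000;
  END LOOP;
END;
$$;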

Conclusion.  Updates on postgres are slow (given the default
postgresql.conf).  I presume this is due to MVCC or the WAL or
something and there are probably some things I can do to tweak the
conf file to make them go faster but out of the box running an update
on a table with lots of rows is going to cost you a lot. Removing the
indexes doesn't help that much.

Suggestion for the PG team: deliver a more realistic postgresql.conf by
default. The default one seems to be aimed at ten-year-old PCs with
very little RAM and disk space. At least deliver additional conf files
for small, medium, large, and huge setups.




Re: [GENERAL] Running update in chunks?

2013-01-21 Thread Tim Uckun
>
> Nope.

If you have any suggestions I am all ears. For the purposes of this
discussion we can narrow the problem down to this update statement.

Update imports set make_id = null.

There are 98K records in the table. There is no index on the make_id
field. Standard untouched postgresql.conf from the brew install of
postgres.


> Unlikely. Do you really think that a PostgreSQL installation typically runs
> 100 times slower on updates than inserts and every other user has just said
> "oh, that's ok then"? Or is it more likely that something peculiar is broken
> on your setup.

I really don't know. That's why I am here asking.  I don't think
anything in particular is broken with my system.  As mentioned above, the
setup is really simple: standard postgres install, the default conf
file, update one field on one table. It takes fifty-plus seconds.

I concede that if I was to go into the postgres.conf and make some
changes it will probably run faster (maybe much faster) but I wanted
to exhaust other factors before I went messing with the default
install.




Re: [GENERAL] Running update in chunks?

2013-01-21 Thread Tim Uckun
> I'd be curious to see results of the same "update" on a standard HDD
> vs the SSD, and maybe on a more typical database deployment hardware
> vs a macbook air.
>


I haven't tried it on any other machine yet.  CREATE TEMP TABLE tt as
SELECT ...  takes eight seconds so presumably the disk is not the
choke point.




Re: [GENERAL] Running update in chunks?

2013-01-21 Thread Tim Uckun
> First off, what does it say for rows affected? (Hint, if you really
> are using a default configuration and it doesn't say 0 rows
> affected, please show us the actual query used.)

update imports set make_id = null

Query returned successfully: 98834 rows affected, 49673 ms execution time.


vacuum analyze imports

Query returned successfully with no result in 4138 ms.

VACUUM FULL imports;

Query returned successfully with no result in 38106 ms.

VACUUM FREEZE ANALYZE;

Query returned successfully with no result in 184635 ms

update imports set make_id = 0

Query returned successfully: 98834 rows affected, 45860 ms execution time.


So all the vacuuming saved about four seconds of execution time.

here is the postgresql.conf completely untouched from the default
install https://gist.github.com/4590590




Re: [GENERAL] Running update in chunks?

2013-01-21 Thread Tim Uckun
Oh I forgot



 SELECT version();

"PostgreSQL 9.2.2 on x86_64-apple-darwin12.2.1, compiled by Apple
clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn),
64-bit"


SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');



"application_name";"pgAdmin III - Query Tool";"client"
"bytea_output";"escape";"session"
"client_encoding";"UNICODE";"session"
"client_min_messages";"notice";"session"
"DateStyle";"ISO, DMY";"session"
"default_text_search_config";"pg_catalog.english";"configuration file"
"lc_messages";"en_NZ.UTF-8";"configuration file"
"lc_monetary";"en_NZ.UTF-8";"configuration file"
"lc_numeric";"en_NZ.UTF-8";"configuration file"
"lc_time";"en_NZ.UTF-8";"configuration file"
"log_timezone";"NZ";"configuration file"
"max_connections";"20";"configuration file"
"max_stack_depth";"2MB";"environment variable"
"search_path";"chrysler, public";"session"
"shared_buffers";"1600kB";"configuration file"
"TimeZone";"NZ";"configuration file"




Re: [GENERAL] Running update in chunks?

2013-01-21 Thread Tim Uckun
I already posted the schema earlier. It's a handful of integer fields
with one hstore field.


On Tue, Jan 22, 2013 at 1:23 PM, Steve Crawford
 wrote:
> select * from pg_stat_user_tables where relname='yourtable';



The output:

relid             | 26710
schemaname        | chrysler
relname           | imports
seq_scan          | 274
seq_tup_read      | 25280539
idx_scan          | 4
idx_tup_fetch     | 2749
n_tup_ins         | 98926
n_tup_upd         | 5757462
n_tup_del         | 92
n_tup_hot_upd     | 327542
n_live_tup        | 98834
n_dead_tup        | 0
last_vacuum       | 2013-01-22 12:28:29.01505+13
last_autovacuum   | 2013-01-22 12:32:29.249588+13
last_analyze      | 2013-01-22 12:28:29.173772+13
last_autoanalyze  | 2013-01-22 12:32:44.123493+13
vacuum_count      | 3
autovacuum_count  | 30
analyze_count     | 3
autoanalyze_count | 24




Re: [GENERAL] Running update in chunks?

2013-01-25 Thread Tim Uckun
Sorry I haven't been responsive for a little while.

I ran your script, but creating a new schema instead of using my existing
schema. My timings were similar to yours (more or less) except for the
vacuums, which took roughly 147891 ms.


On Tue, Jan 22, 2013 at 3:21 PM, Kevin Grittner  wrote:
> Kevin Grittner wrote:
>
>> update imports set make_id = 0
>>
>> Query returned successfully: 98834 rows affected, 45860 ms execution time.
>
> For difficult problems, there is nothing like a self-contained test
> case, that someone else can run to see the issue. Here's a starting
> point:
>
> create extension if not exists hstore;
> create schema cars;
> drop table if exists cars.imports;
> CREATE TABLE cars.imports
> (
> id serial NOT NULL,
> target_id integer,
> batch_id integer,
> make_id integer,
> model_id integer,
> date timestamp without time zone,
> division_id integer,
> dealer_id integer,
> data hstore,
> created_at timestamp without time zone NOT NULL,
> updated_at timestamp without time zone NOT NULL,
> CONSTRAINT imports_pkey PRIMARY KEY (id)
> );
> CREATE INDEX index_imports_on_data ON cars.imports USING gin (data);
> CREATE INDEX index_imports_on_dealer_id ON cars.imports USING btree 
> (dealer_id);
> CREATE INDEX index_imports_on_division_id ON cars.imports USING btree 
> (division_id);
> CREATE INDEX index_imports_on_make_id ON cars.imports USING btree (make_id);
> CREATE INDEX index_imports_on_model_id ON cars.imports USING btree (model_id);
> insert into cars.imports (created_at, updated_at) select now(), now() from 
> (select generate_series(1, 10)) x;
> vacuum freeze analyze;
> \timing on
> update cars.imports set make_id = 0;
> vacuum analyze;
> update cars.imports set make_id = 0;
>
> ... and here's what I get when I run it on my desktop computer with
> ordinary disk drives and a completely default configuration:
>
> test=# create extension if not exists hstore;
> CREATE EXTENSION
> Time: 48.032 ms
> test=# create schema cars;
> CREATE SCHEMA
> Time: 8.150 ms
> test=# drop table if exists cars.imports;
> NOTICE:  table "imports" does not exist, skipping
> DROP TABLE
> Time: 0.205 ms
> test=# CREATE TABLE cars.imports
> test-# (
> test(# id serial NOT NULL,
> test(# target_id integer,
> test(# batch_id integer,
> test(# make_id integer,
> test(# model_id integer,
> test(# date timestamp without time zone,
> test(# division_id integer,
> test(# dealer_id integer,
> test(# data hstore,
> test(# created_at timestamp without time zone NOT NULL,
> test(# updated_at timestamp without time zone NOT NULL,
> test(# CONSTRAINT imports_pkey PRIMARY KEY (id)
> test(# );
> NOTICE:  CREATE TABLE will create implicit sequence "imports_id_seq" for 
> serial column "imports.id"
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "imports_pkey" 
> for table "imports"
> CREATE TABLE
> Time: 152.677 ms
> test=# CREATE INDEX index_imports_on_data ON cars.imports USING gin (data);
> CREATE INDEX
> Time: 6.391 ms
> test=# CREATE INDEX index_imports_on_dealer_id ON cars.imports USING btree 
> (dealer_id);
> CREATE INDEX
> Time: 64.668 ms
> test=# CREATE INDEX index_imports_on_division_id ON cars.imports USING btree 
> (division_id);
> CREATE INDEX
> Time: 65.573 ms
> test=# CREATE INDEX index_imports_on_make_id ON cars.imports USING btree 
> (make_id);
> CREATE INDEX
> Time: 64.959 ms
> test=# CREATE INDEX index_imports_on_model_id ON cars.imports USING btree 
> (model_id);
> CREATE INDEX
> Time: 64.906 ms
> test=# insert into cars.imports (created_at, updated_at) select now(), now() 
> from (select generate_series(1, 10)) x;
> INSERT 0 10
> Time: 2516.559 ms
> test=# vacuum freeze analyze;
> VACUUM
> Time: 3357.778 ms
> test=# \timing on
> Timing is on.
> test=# update cars.imports set make_id = 0;
> UPDATE 10
> Time: 2937.241 ms
> test=# vacuum analyze;
> VACUUM
> Time: 2097.426 ms
> test=# update cars.imports set make_id = 0;
> UPDATE 10
> Time: 3935.939 ms
>
> Ubuntu 12.10
> i7-3770 CPU @ 3.40GHz with 16GB RAM
> Linux Kevin-Desktop 3.5.0-22-generic #34-Ubuntu SMP Tue Jan 8 21:47:00 UTC 
> 2013 x86_64 x86_64 x86_64 GNU/Linux
> A pair of WDC WD10EALX-009BA0 (15.01H51) 7200 RPM drives in RAID 1.
>
>  PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 
> 4.7.2-2ubuntu1) 4.7.2, 64-bit
>
> test=# SELECT name, current_setting(name), source
> test-#   FROM pg_settings
> test-#   WHERE source NOT IN ('default', 'override');
> name|  current_setting   |source
> ++--
>  application_name   | psql   | client
>  client_encoding| UTF8   | client
>  DateStyle  | ISO, MDY   | configuration file
>  default_text_search_config | pg_catalog.english | configuration file
>  lc_messages| en_US.UTF-8| configuration file
>  lc_monetary| en_US.UTF-8| configuration file
>  lc_nu

Re: [GENERAL] Running update in chunks?

2013-01-25 Thread Tim Uckun
>
> What if you do:
> alter table cars.imports set (fillfactor=50);
> Before the vacuum full, and then try the update again?


This makes a dramatic difference when combined with a vacuum.

UPDATE 98834
Time: 3408.210 ms

Ten times faster!
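
For the record, the sequence was roughly this (as suggested above; just a
sketch of what I ran):

ALTER TABLE cars.imports SET (fillfactor = 50);
VACUUM FULL cars.imports;
UPDATE cars.imports SET make_id = 0;  -- this is the 3408 ms update above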




Re: [GENERAL] Running update in chunks?

2013-01-25 Thread Tim Uckun
relid | 26710
schemaname| cars
relname   | imports
seq_scan  | 280
seq_tup_read  | 25873543
idx_scan  | 4
idx_tup_fetch | 2749
n_tup_ins | 98926
n_tup_upd | 6350466
n_tup_del | 92
n_tup_hot_upd | 625286
n_live_tup| 98834
n_dead_tup| 0
last_vacuum   | 2013-01-25 21:55:36.078614+13
last_autovacuum   | 2013-01-25 21:58:40.850546+13
last_analyze  | 2013-01-25 21:55:36.305967+13
last_autoanalyze  | 2013-01-25 21:51:54.307639+13
vacuum_count  | 6
autovacuum_count  | 32
analyze_count | 6
autoanalyze_count | 25


On Wed, Jan 23, 2013 at 7:50 AM, Steve Crawford
 wrote:
> On 01/21/2013 05:02 PM, Tim Uckun wrote:
>>
>> I already posted the schema earlier. It's a handful of integer fields
>> with one hstore field.
>
>
> Oh well. I can't find it but maybe it got lost in shipping or eaten by a
> spam filter.
>
>
>>
>>
>> On Tue, Jan 22, 2013 at 1:23 PM, Steve Crawford
>>  wrote:
>>>
>>> select * from pg_stat_user_tables where relname='yourtable';
>>
>>
>>
>> Messy output
>
> Don't know if you are in psql but if so, expanded display works for this.
> I.e.:
> steve@[local] => \x
> Expanded display is on.
> steve@[local] => select * from pg_stat_user_tables where relname='footest';
> -[ RECORD 1 ]-+--
> relid | 781691
> schemaname| public
> relname   | footest
> seq_scan  | 3
> seq_tup_read  | 609
> idx_scan  |
> idx_tup_fetch |
> n_tup_ins | 609
> n_tup_upd | 0
> n_tup_del | 0
> n_tup_hot_upd | 0
> n_live_tup| 301
> n_dead_tup| 0
> last_vacuum   |
> last_autovacuum   |
> last_analyze  |
> last_autoanalyze  | 2012-12-19 08:42:23.347368-08
> vacuum_count  | 0
> autovacuum_count  | 0
> analyze_count | 0
> autoanalyze_count | 2
>
>
>
>>
>>
>> "relid","schemaname","relname","seq_scan","seq_tup_read","idx_scan","idx_tup_fetch","n_tup_ins","n_tup_upd","n_tup_del","n_tup_hot_upd","n_live_tup","n_dead_tup","last_vacuum","last_autovacuum","last_analyze","last_autoanalyze","vacuum_count","autovacuum_count","analyze_count","autoanalyze_count"
>>
>> 26710,"chrysler","imports",274,25280539,4,2749,98926,5757462,92,327542,98834,0,"2013-01-22
>> 12:28:29.01505+13","2013-01-22 12:32:29.249588+13","2013-01-22
>> 12:28:29.173772+13","2013-01-22 12:32:44.123493+13",3,30,3,24
>>
>>
> So at least autovacuum is running (and some manual vacuum and analyze as
> well).
>
> Cheers,
> Steve
>




Re: [GENERAL] Running update in chunks?

2013-01-25 Thread Tim Uckun
> I agree that seems like the most likely cause. Each update to the
> row holding the hstore column requires adding new index entries for
> all the hstore elements, and autovacuum will need to clean up the
> old ones in the background. The best solution would be to either
> normalize the data instead of using hstore, or move the hstore to a
> separate table which is referenced by some sort of ID from the
> frequently-updated table.


That's very interesting. I can certainly split up the table, no big
deal there.  So would the index be redone even if I am not updating
the hstore field itself?




Re: [GENERAL] Running update in chunks?

2013-01-25 Thread Tim Uckun
>
> That suggests (to me, at least) that it is related to index updating. Again,
> your GIN index seems primary candidate.
>
> Try running iostat (I think that's available on a Mac) with/without the
> fillfactor and with/without the GIN index while you do the updates. It's
> possible your SSD is just behaving oddly under stress.
>


I dropped the index and the numbers shot up tenfold or more.  I don't
know why postgres feels the need to update the GIN index on the hstore
field when I am only updating an integer field but it looks like I
need to split the hstore into a different table.
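
Roughly what I have in mind for the split (just a sketch, untested; the
table and index names here are made up):

CREATE TABLE cars.import_data
(
  import_id integer PRIMARY KEY REFERENCES cars.imports (id),
  data hstore
);

CREATE INDEX index_import_data_on_data ON cars.import_data USING gin (data);

-- copy the existing values across, then drop the old column
-- (dropping the column also drops index_imports_on_data)
INSERT INTO cars.import_data (import_id, data)
  SELECT id, data FROM cars.imports WHERE data IS NOT NULL;

ALTER TABLE cars.imports DROP COLUMN data;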




Re: [GENERAL] Running update in chunks?

2013-01-26 Thread Tim Uckun
>
> That would be one solution, but I think a better one would be to not
> store "make_id" in "imports" in the first place, but instead to always
> fetch it by joining "imports" to "models" at query time.
>

My problem here is that the incoming data is quite messy, so the join
conditions become weird (lots of ORs and such). A multi-pass approach
seems to work better.




[GENERAL] Dropping default privileges.

2013-01-29 Thread Tim Uckun
I am unable to drop a user.

drop role tim;
ERROR:  role "tim" cannot be dropped because some objects depend on it
DETAIL:  owner of default privileges on new relations belonging to
role tim in schema strongmail

ALTER DEFAULT PRIVILEGES IN SCHEMA strongmail
REVOKE INSERT, SELECT, UPDATE, DELETE, TRUNCATE, REFERENCES,
TRIGGER ON TABLES
FROM tim;

drop role tim;
ERROR:  role "tim" cannot be dropped because some objects depend on it
DETAIL:  owner of default privileges on new relations belonging to
role tim in schema strongmail

reassign owned by tim to postgres;

Nothing seems to work.




Re: [GENERAL] Dropping default privileges.

2013-01-29 Thread Tim Uckun
>> drop role tim;
>> ERROR:  role "tim" cannot be dropped because some objects depend on it
>> DETAIL:  owner of default privileges on new relations belonging to
>> role tim in schema strongmail
>
> DROP OWNED BY ought to get rid of that.


Just to be clear.

I don't want to drop the schema. Will DROP OWNED BY drop only the
privileges, or the schema as well?





Re: [GENERAL] Dropping default privileges.

2013-01-30 Thread Tim Uckun
Thanks.

Worked.

On Wed, Jan 30, 2013 at 9:12 PM, Albe Laurenz  wrote:
> Tim Uckun wrote:
>>>> drop role tim;
>>>> ERROR:  role "tim" cannot be dropped because some objects depend on it
>>>> DETAIL:  owner of default privileges on new relations belonging to
>>>> role tim in schema strongmail
>>>
>>> DROP OWNED BY ought to get rid of that.
>>
>>
>> Just to be clear.
>>
>> I don't want to drop the schema. Will DROP OWNED BY only drop the
>> priviliege or the schema?
>
> It will not drop the schema unless the schema is owned by
> the role you are dropping.
>
> Yours,
> Laurenz Albe
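
For the record, the full sequence that clears it is along these lines
(a sketch; doing the REASSIGN first means anything tim still owns survives,
per Laurenz's note above):

REASSIGN OWNED BY tim TO postgres;
DROP OWNED BY tim;   -- now this only removes the default privileges and grants
DROP ROLE tim;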




[GENERAL] Upsert Functionality using CTEs

2013-02-10 Thread Tim Uckun
I am using a query like this to try and normalize a table.

WITH nd as (select * from sales order by id limit 100),

people_update as (update people p set first_name = nd.first_name from
nd where p.email = nd.email returning nd.id)

insert into people (first_name, email, created_at, updated_at)
  select first_name, email, now(), now()
  from nd
  left join people_update using(id)
  where people_update.id is null;

This works pretty well except when the top 100 records contain duplicate
email addresses (two sales for the same email address).

I am wondering what the best strategy is for dealing with this
scenario.  Doing the records one at a time would work but obviously it
would be much slower.  There are no other columns I can rely on to
make the record more unique either.




Re: [GENERAL] Upsert Functionality using CTEs

2013-02-11 Thread Tim Uckun
> The best strategy is fixing your data-model so that you have a unique key. As 
> you found out already, e-mail addresses aren't very suitable as unique keys 
> for people. For this particular case I'd suggest adding a surrogate key.
>
> Alternatively, you might try using (first_name, email) as your key. You'll 
> probably still get some duplicates, but they should be less and perhaps few 
> enough for your case.
>

Yea I'll have to see if I can do a better match.




Re: [GENERAL] Upsert Functionality using CTEs

2013-02-11 Thread Tim Uckun
>
>
> I read it that he has multiple sales from the same person? In which case
> pretending that the two sales were from different people isn't the correct
> result at all.

Actually it turns out that both cases exist.

>
> I may be missing the point of the query, but wasn't it to add an entry for
> each email address and (if it already exists in people) to update the
> firstname field to match the firstname in the order?

Yes. Basically the sales table is denormalized and I need to create or
update normalized tables from it. As you have pointed out the
difficult scenario is when the same person has multiple sales.


> Given that the results will be returned in random order I would just change
> the "nd" query to SELECT DISTINCT ON (email) , but like I said I may be
> missing the point.

I think in this case I am probably going to have to go through the
records one at a time.  I run into the same issues with other columns
such as products.
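
For reference, the suggested change amounts to something like this (a
sketch, untested; it keeps one arbitrary sale per email within each batch):

WITH nd AS (
  SELECT DISTINCT ON (email) *
  FROM (SELECT * FROM sales ORDER BY id LIMIT 100) batch
  ORDER BY email, id
),
people_update AS (
  UPDATE people p
  SET first_name = nd.first_name
  FROM nd
  WHERE p.email = nd.email
  RETURNING nd.id
)
INSERT INTO people (first_name, email, created_at, updated_at)
SELECT first_name, email, now(), now()
FROM nd
LEFT JOIN people_update USING (id)
WHERE people_update.id IS NULL;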




[GENERAL] Why is this a cross join?

2013-02-17 Thread Tim Uckun
I have a seemingly simple query that returns way too many records.
Here is the count query

select count(*) from (
select crm.*
from sap_crm_sales crm
 inner join model_configurations mc on left(crm.customer_class, 6)
= left(mc.sap_code,6)
) as s

The crm table has about 44K records and the model table has about 1K, yet
the join produces about 9 million records.

It doesn't matter whether I make it a left join or an inner join; the count
ends up the same.




Re: [GENERAL] Why is this a cross join?

2013-02-17 Thread Tim Uckun
> Apparently the first 6 characters of those fields are quite common, which
> gives you a result for every possible combination of the same 6-character
> value.


Hmm. That seems kind of weird. Is there any way to NOT have this
be a cross join? For example, if I extracted the first six characters
into a field on each table and then joined on those fields, it would not
be a cross join, right?
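
A quick way to check how common those six-character prefixes really are
(sketch):

SELECT left(sap_code, 6) AS prefix, count(*)
FROM model_configurations
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 20;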




Re: [GENERAL] Why is this a cross join?

2013-02-17 Thread Tim Uckun
>
> In some way, every join is a cross join, with the results filtered according 
> to the specificity of the join conditions. In this case:
>
> inner join model_configurations mc on left(crm.customer_class, 6) = 
> left(mc.sap_code,6)
>
> "customer_class" sounds like a fairly generic sort of field, so you'd expect 
> many matches. Truncating the fields is likely to make this even less 
> specific, returning more results.
>

I guess I am not explaining it properly...

Say I created new columns on both tables called "first_6" and
populated them with the substrings.  If I did an inner join or a left
join on those fields, would I still get a cross join?

inner join model_configurations mc on mc.first_6 = crm.first_6




Re: [GENERAL] Why is this a cross join?

2013-02-18 Thread Tim Uckun
> for every row in one of the tables?  If so, how do you want to
> determine which of the matching rows in the other table to choose,
> and which to ignore?


In this case it's a simple lookup. Any of the matching rows will do
but I just want one.
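
i.e. something along these lines, deduplicating the lookup side before
joining (just a sketch, untested):

SELECT crm.*
FROM sap_crm_sales crm
INNER JOIN (
  SELECT DISTINCT ON (left(sap_code, 6)) *
  FROM model_configurations
  ORDER BY left(sap_code, 6)
) mc ON left(crm.customer_class, 6) = left(mc.sap_code, 6);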




[GENERAL] Using varchar primary keys.

2013-03-31 Thread Tim Uckun
Consider the following scenario.

I have a typical tagging structure. There is a table called tags, there is
a table called taggings. The taggings table acts as a many to many join
table between the taggers and the tags.

The tags table has two fields id and tag. id is a serial data type.

The taggings has a tag_id field along with the context, tagger_id etc.

I don't think there is even one query in this scenario that does not join
the taggings table to the tags table, so I am wondering if there is any
advantage at all to having that id field in the tags table. Why shouldn't I
just use the tag itself as the primary key?  The tags are quite short, so if
pg is smart about it I would guess they would take up less space than an
integer in some cases.
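
Concretely, I am wondering whether something like this is a reasonable
alternative (a sketch; the varchar length and the taggings columns are
only illustrative):

CREATE TABLE tags
(
  tag varchar(100) PRIMARY KEY
);

CREATE TABLE taggings
(
  tag        varchar(100) NOT NULL REFERENCES tags (tag)
             ON UPDATE CASCADE ON DELETE CASCADE,
  tagger_id  integer NOT NULL,
  context    varchar(100)
);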

I guess the same scenario happens in many of my lookup tables. I have all
these tables with just two fields in them: an id field and some varchar
field.  Then every time I want to display a record I join with five
other tables so I can look up the values in all the associated lookup
tables and display the text to the user.

Given that you can still set foreign keys and cascade commands to adjust
child records either way the tables would be properly normalized so I am
wondering what I am gaining by using these serial ID fields.

Is this false economy?


Re: [GENERAL] Using varchar primary keys.

2013-03-31 Thread Tim Uckun
>
>
> Natural Keys have a purpose but when do they exist in the database?
>
>
In my case it's the tags. Table tags has only two columns id and tag. Each
one has a unique index.  I have many other similar lookup tables. For
example things like order_status, item_type etc.



> What about the Natural Keys of an external source? Should that be stored
> in a lookup table along with the integer based Surrogate Key?
> Maybe I'll call it "bars_pks".
>
>
I always keep those.  The API's I expose to clients and partners allow them
to modify the records they send and I let them refer to items by their own
primary keys.  This has always worked out well for me.



>
> --Spelling error.
>
> UPDATE tags SET tag = 'foo' WHERE tag = 'fu';
>
> This will fail unless you ON UPDATE CASCADE.
>
>
Yes of course you'd need the on update cascade.  Then again maybe I don't
even need that tags table. Tags could just be a view (select distinct tag
from child_tags).  I am not saying that's efficient or desirable but it's
possible.


Re: [GENERAL] Using varchar primary keys.

2013-03-31 Thread Tim Uckun
>
> how about using an enum instead of this table?
>
>
That's an interesting idea.  Are enums mutable?
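
As far as I know the answer is "partly": new values can be appended, but
existing values can't be dropped. Something like this, using one of my
lookup tables as a hypothetical enum:

ALTER TYPE order_status ADD VALUE 'on_hold';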


Re: [GENERAL] Using varchar primary keys.

2013-04-01 Thread Tim Uckun
On Tue, Apr 2, 2013 at 8:35 AM, jesusthefrog  wrote:

> On the topic of 'natural' versus 'synthetic' primary keys, I am generally
> in the camp that an extra ID field won't cost you too much, and while one
> may not need it for a simple table (i.e. id, name) one might add any number
> of columns later, and you'll be glad to have it.
>
>
Nothing prevents you from adding more columns if you use varchar primary
keys.


>
> My preferred method is to give every table an ID column of UUID type and
> generate a UUID using the uuid-ossp contrib module. This also prevents
> someone not familiar with the database design from using an ID somewhere
> they should not (as is possible with natural PKs) or treating the ID as an
> integer, not an identifier (as is all too common with serial integers).
>
>
>
This would be a concern if you had multi-master writes. As far as I know
Postgres does not have a true multi-master replication system, so all the
writes have to happen on one server, right?

As for UUIDs I use them sometimes but I tend to also use one serial column
because when I am communicating with people it makes it so much easier to
say "dealer number X" than "dealer number SOME_HUGE_UUID".  I often have to
talk to people about the data and UUIDs make it very difficult to
communicate with humans.


[GENERAL] Shortcut evaluation in OR or IN

2013-05-05 Thread Tim Uckun
Say I have a select like this.

SELECT * FROM table where field = X OR field = Y limit 1

And if I have two records, one that matches X and one that matches Y, will I
always get X because the evaluation will stop after the first clause in the
OR matches?

What about for IN (X, Y)

how about if I am doing an update

UPDATE table1 set x=table2.y where table1.field1 = table2.field1 OR
table1.field2=table2.field2

Will it update based on field1 if both fields match?


Basically I want to know if and how OR shortcuts the evaluation.

Thanks.


Re: [GENERAL] Shortcut evaluation in OR or IN

2013-05-07 Thread Tim Uckun
Thanks for the explanation.



On Mon, May 6, 2013 at 8:43 PM, Jasen Betts  wrote:

> On 2013-05-06, Tim Uckun  wrote:
> >
> > Say I have a select like this.
> >
> > SELECT * FROM table where field = X OR field = Y limit 1
> >
> > And I have two records one that matches X and one that matches Y will I
> > always get X because the evaluation will stop after the first clause in
> the
> > OR matches?
>
> no. there is no guarantee which matching row you will get. Testing may
> suggest that one answer is preferred but udating the table can change
> which one. also you may get a different row without updating the table.
>
> > What about for IN (X, Y)
>
> same deal.
>
> > how about if I am doing an update
> >
> > UPDATE table1 set x=table2.y where table1.field1 = table2.field1 OR
> > table1.field2=table2.field2
> >
> > Will it update based on field1 if both fields match?
>
> what difference does that make to the result?
>
> > Basically I want to know if and how OR shortcuts the evaluation.
>
> In a word. "unpredictably".
>
> The planner will try to do the cheapest, most useful side first.
>
> --
> ⚂⚃ 100% natural
>
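
Noting for the archive: since there is no guarantee, the way to actually
prefer the X match is to state the preference explicitly instead of relying
on the OR. A sketch with the placeholder names from my question:

SELECT *
FROM table1
WHERE field = 'X' OR field = 'Y'
ORDER BY (field = 'X') DESC  -- true sorts first, so an X match wins if both exist
LIMIT 1;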


[GENERAL] "Reverse" inheritance?

2017-04-03 Thread Tim Uckun
I am trying to make postgres tables work like an object hierarchy. As an
example I have done this.

drop table if exists os.linux cascade;
create table os.linux
(
script_name character varying(255) NOT NULL,
script_body text,
CONSTRAINT os_linux_pkey PRIMARY KEY (script_name)
);

drop table if exists os.red_hat;

CREATE TABLE os.red_hat
(
CONSTRAINT os_red_hat_pkey PRIMARY KEY (script_name)
)INHERITS (os.linux);

drop table if exists os.debian;
CREATE TABLE os.debian
(
CONSTRAINT os_debian_pkey PRIMARY KEY (script_name)
)INHERITS (os.linux);


insert into os.linux(script_name, script_body) VALUES ('package', 'tgz' );
insert into os.linux(script_name, script_body) VALUES ('awk', 'awk' );
insert into os.debian(script_name, script_body) values( 'package', 'apt');
insert into os.red_hat(script_name, script_body) values( 'package', 'yum');

When I do SELECT * from os.debian I would like to get two records: one
where 'package' is 'apt' and one where 'awk' is 'awk'.

So the 'package' row is overridden in the child, but the child still
inherits the parent's other rows.

Is there a way to do this?

Ideally I would like to have a deeper hierarchy like nix -> linux -> debian
-> ubuntu -> ubuntu_16_04

so that when I select from ubuntu_16_04 I get all the rows from all the
parent tables but properly overridden so they don't union.
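
One workaround I can think of is resolving the override at query time
rather than in the table definitions, something like this (a sketch,
untested; a deeper hierarchy would need a rank per level instead of a
single boolean):

SELECT DISTINCT ON (script_name) script_name, script_body
FROM os.linux
WHERE tableoid IN ('os.linux'::regclass, 'os.debian'::regclass)
ORDER BY script_name,
         (tableoid = 'os.debian'::regclass) DESC;  -- prefer the child's row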

Thanks.

