Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-15 Thread Rémi Cura
A little related bonus :

when doing the time-join,
the next step is to interpolate, to get a more accurate estimate:

---
DROP FUNCTION IF EXISTS range_interpolate(nr anyrange, obs anyelement);
CREATE OR REPLACE FUNCTION range_interpolate(nr anyrange, obs anyelement)
RETURNS TABLE(lower_weight NUMERIC, upper_weight NUMERIC)
AS $$
--@param nr  a range
--@param obs an observation (value) of the same type as the range
--@return the weights to apply to the lower and upper bounds of the range
--        to reconstruct the value.

--exception : a -inf or +inf bound gets weight 0, the other bound 1.
--exception : range is a single point : returns a weight of 0.5 for each
--            bound (the bounds are identical but the associated data may not be)
SELECT
    CASE
        WHEN upper(nr) = lower(nr)              THEN 0.5
        WHEN lower_inf(nr) OR lower(nr) IS NULL THEN 0
        WHEN upper_inf(nr) OR upper(nr) IS NULL THEN 1
        ELSE (upper(nr) - obs) / (upper(nr) - lower(nr))
    END AS lower_weight,
    CASE
        WHEN upper(nr) = lower(nr)              THEN 0.5
        WHEN lower_inf(nr) OR lower(nr) IS NULL THEN 1
        WHEN upper_inf(nr) OR upper(nr) IS NULL THEN 0
        ELSE (obs - lower(nr)) / (upper(nr) - lower(nr))
    END AS upper_weight;

--testing :
--SELECT * FROM range_interpolate(numrange(1,10), round(10,2));
$$
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
--
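
A usage sketch (an editorial illustration, not from the original post): after the
time-join, weight the two bracketing samples of a reference series to estimate the
value at each observation time. The tables and columns here (reference(t, val),
obs(t), both numeric) are hypothetical.

WITH ref AS (
    SELECT t, val,
           lead(t)   OVER (ORDER BY t) AS t_next,
           lead(val) OVER (ORDER BY t) AS val_next
    FROM reference
)
SELECT o.t,
       w.lower_weight * ref.val + w.upper_weight * ref.val_next AS val_estimate
FROM obs o
JOIN ref
  ON ref.t_next IS NOT NULL
 AND o.t <@ numrange(ref.t, ref.t_next)
CROSS JOIN LATERAL range_interpolate(numrange(ref.t, ref.t_next), o.t) AS w;
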
Cheers,
Rémi-C


Re: [GENERAL] streaming replication + wal shipping

2014-04-15 Thread Albe Laurenz
> Anupama Ramaswamy wrote:
>>> I would like to setup a 2 servers with streaming replication, one master 
>>> and another hot standby.
>>> I want to use the standby for read-only queries. So I want the replication 
>>> lag to be as small as
>>> possible.
>>> So I choose streaming replication over WAL shipping.
>>>
>>> When the master fails, I want the standby to take over as master. So I 
>>> would like minimal data loss,
>>> if there is a streaming replication delay.
>>>
>>> Is it possible to set this up in such a way that under normal conditions the
>>> standby replicates using streaming replication, and on failover it uses the
>>> WAL archive for syncing up with the transactions?
>>> Of course the WAL will be available on a shared storage volume. If this is 
>>> possible, what exactly do
>>> I need in my configuration files - postgresql.conf, recovery.conf ?
>> 
>> Most of this will happen automatically - WAL archives are used if recovery
>> falls behind.

> So are you saying that if I set up the following in my recovery.conf
> restore_command =.
> 
> will it be used only when the streaming replication falls behind by more than
> wal_keep_segments, or when the replication stream is not available (master
> goes down)?

Yes.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] [GENARAL] round() bug?

2014-04-15 Thread Willy-Bas Loos
Hi,
I ran into some strange behavior.
Seems like a bug to me?

wbloos=# select round(0.5::numeric), round(0.5::double precision);
 round | round
-------+-------
     1 |     0
(1 row)

wbloos=# select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1, 64-bit
(1 row)

Cheers,
-- 
Willy-Bas Loos


Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Vincent Veyron
On Mon, 14 Apr 2014 15:22:13 -0700
Robert DiFalco  wrote:

Hi Robert,

> But then I lose a bunch of data like the TIMESTAMPTZ of the call, answer,
> connection, etc. Btw, currently these tables never need to be UPDATEd. They
> are immutable in the current design.

Yes, but you wrote :

>Occasionally I will want to know things like "When was the last time a user
>answered a call" or "How many times has a user been called".

With your schema, you would have to query several tables each time. Queries 
will get messy in a hurry, updates and DDL changes too.


> And in the end I'm not sure how the
> proposal of one table and a state that is updatable changes the basic
> thrust of the question. For example, getting last call, last answered,
> total called, total answered. If the state of a call transitions from
> called to answered then making it a field loses all the data with the
> previous state, make sense?
>

If you need this, you can use a schema that accommodates it.

(off the top of my head, insert a new record instead of altering an existing 
one, and create a hierarchy with parent records that point to an original call, 
maybe with a 'parent' field and recursive queries. You probably have many 
requirements that I'm not aware of, but this one can be met)
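
A rough sketch of that idea (an editorial illustration; the table and column
names call_events, parent_id, status, event_time are assumptions, not the
poster's schema), using a recursive query to walk from any event back to the
original call:

CREATE TABLE call_events (
    id         bigserial PRIMARY KEY,
    parent_id  bigint REFERENCES call_events(id),  -- NULL for the original call
    user_id    bigint NOT NULL,
    status     text NOT NULL,                      -- 'called', 'answered', ...
    event_time timestamptz NOT NULL DEFAULT current_timestamp
);

-- walk from one event back to the original call it belongs to
WITH RECURSIVE chain AS (
    SELECT * FROM call_events WHERE id = 42        -- some event id
    UNION ALL
    SELECT p.* FROM call_events p JOIN chain c ON p.id = c.parent_id
)
SELECT * FROM chain;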


-- 
Salutations, Vincent Veyron

http://marica.fr
Gestion des contentieux juridiques, des contrats et des sinistres d'assurance


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Robert DiFalco
I'm sorry Vincent I'm not exactly sure what you are proposing. Are you
proposing that I add another table in addition to what I already have that
all the other tables JOIN to and add a state field in that parent table?
How is that different than what I have except now I have a new table with
an updatable state field? Maybe you can show a query or two to more
specifically show what you are suggesting?

Right now this seems like a simple way to get the last time John was called:

-- last answered call for John
SELECT MAX(a.answered)
FROM calls_answered a JOIN calls c ON c.id = a.id
WHERE c.user = 'John';

If I don't have a calls_answered table I'm not sure how I would get this
information if I had a single table with a mutable state. Unless you are
suggesting denormalizing all the tables into one table that would have a
lot of null fields. For example answered_date would be null if the call was
never answered.






On Tue, Apr 15, 2014 at 5:37 AM, Vincent Veyron  wrote:

> On Mon, 14 Apr 2014 15:22:13 -0700
> Robert DiFalco  wrote:
>
> Hi Robert,
>
> > But then I lose a bunch of data like the TIMESTAMPTZ of the call, answer,
> > connection, etc. Btw, currently these tables never need to be UPDATEd.
> They
> > are immutable in the current design.
>
> Yes, but you wrote :
>
> >Occasionally I will want to know things like "When was the last time a
> user
> >answered a call" or "How many times has a user been called".
>
> With your schema, you would have to query several tables each time.
> Queries will get messy in a hurry, updates and DDL changes too.
>
>
> > And in the end I'm not sure how the
> > proposal of one table and a state that is updatable changes the basic
> > thrust of the question. For example, getting last call, last answered,
> > total called, total answered. If the state of a call transitions from
> > called to answered then making it a field loses all the data with the
> > previous state, make sense?
> >
>
> If you need this, you can use a schema that accommodates it.
>
> (off the top of my head, insert a new record instead of altering an
> existing one, and create a hierarchy with parent records that point to an
> original call, may be with a 'parent' field and recursive queries. You
> probably have many requirements that I'm not aware of, but this one can be
> met)
>
>
> --
> Salutations, Vincent Veyron
>
> http://marica.fr
> Gestion des contentieux juridiques, des contrats et des sinistres
> d'assurance
>


Re: [GENERAL] Querying all documents for a company and its projects etc

2014-04-15 Thread David G Johnston
A couple of thoughts:

1) The "union" really only pertains to the entity table rows; once you
"union all" those (duplicates should not matter and probably will not even
be present so using "all" avoids an unnecessary sort) you can join that
sub-query to the document_usage table.

2) Since every entity must have an associated company, moving the company_id
field to the entity table will allow a direct search for company objects
using entity alone and remove the need to perform the union.  The "company"
table omits the implicit self-referencing company_id but it is still there
in reality.
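
For illustration, a sketch of point 1 (the table and column names entity,
company, project, document_usage, entity_id, company_id are assumptions, since
the actual schema is not shown here):

SELECT d.*
FROM (
        SELECT entity_id FROM company WHERE company_id = $1
        UNION ALL
        SELECT entity_id FROM project WHERE company_id = $1
     ) e
JOIN document_usage d USING (entity_id);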

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Querying-all-documents-for-a-company-and-its-projects-etc-tp5799967p5799978.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Vincent Veyron
On Mon, 14 Apr 2014 09:27:29 -0700
Robert DiFalco  wrote:

> I have several related tables that represent a call state. 
> 
> And so on for calls_connected, calls_completed, call_errors, etc.
> 
> So for my question -- is the choice between these a personal preference
> sort of thing or is there a right or wrong approach? Am I missing another
> approach that would be better?  

Hi Robert,

I guess a call state is subject to change, in which case you would have to 
shuffle records between tables when that happens?

ISTM you should consider using only a 'calls' table, and add an 'id_call_state' 
field to it that references the list of possible states. This would make your 
queries simpler.

create table call_state(
id_call_state text PRIMARY KEY,
libelle text);

INSERT INTO call_state (id_call_state, libelle) VALUES ('calls_connected', 
'Connected'), ('calls_completed', 'Completed'), ('call_errors', 'Error');

> CREATE TABLE calls (
>   id  BIGINT NOT NULL, // sequence generator

id_call_state text NOT NULL REFERENCES call_state,

>   user_id BIGINT NOT NULL,
>   called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
> 
>   PRIMARY KEY (id),
>   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
> );
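
With that single table, the recurring lookups reduce to something like this
sketch (the 'answered' state value is an assumption; it is not among the states
in the INSERT above):

SELECT max(called) FROM calls WHERE user_id = $1 AND id_call_state = 'answered';
SELECT count(*)    FROM calls WHERE user_id = $1;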


-- 
Regards, Vincent Veyron 

http://libremen.com/ 
Legal case, contract and insurance claim management software


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] streaming replication + wal shipping

2014-04-15 Thread senthilnathan
*postgresql.conf:*
wal_level = hot_standby
hot_standby = on
archive_mode = on
archive_command = 'cp %p ../archive/%f'
## "archive" parallel to data directory 
max_wal_senders = 10
## number of slaves 

*recovery.conf(in slave)*
restore_command = 'cp ../archive/%f "%p"'
standby_mode = 'on'
primary_conninfo = 'host=MASTERIP port=5432 user=REPLUSER password=REPLPASS'
trigger_file = '/tmp/pg_failover_trigger'
recovery_target_timeline='latest'

*pg_hba.conf*
host    replication    REPLUSER    0.0.0.0/0    md5

The above entries need to be added for replication to be configured.

-Senthil



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/streaming-replication-wal-shipping-tp5799821p5800081.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Chris Curvey
On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco
wrote:

> I have several related tables that represent a call state. Let's think of
> these as phone calls to simplify things. Sometimes I need to determine the
> last time a user was called, the last time a user answered a call, or the
> last time a user completed a call.
>
> The basic schema is something like this:
>
> CREATE TABLE calls (
>   id  BIGINT NOT NULL, // sequence generator
>   user_id BIGINT NOT NULL,
>   called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
>
>   PRIMARY KEY (id),
>   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
> );
>
> CREATE TABLE calls_answered (
>   idBIGINT NOT NULL,
>   answered  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
>
>   PRIMARY KEY (id),
>   FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE
> );
>
>
> And so on for calls_connected, calls_completed, call_errors, etc.
>
> Occasionally I will want to know things like "When was the last time a
> user answered a call" or "How many times has a user been called".
>
> I can do these queries using a combination of MAX or COUNT. But I'm
> concerned about the performance.
>
> SELECT MAX(a.id)
> FROM calls_answered a JOIN calls c ON c.id = a.id
> WHERE c.user_id = ?;
>
>
> Or the number of answered calls:
>
> SELECT MAX(a.id)
> FROM calls_answered a JOIN calls c ON c.id = a.id
> WHERE c.user_id = ?;
>
>
> Sometimes I might want to get this data for a whole bunch of users. For
> example, "give me all users whose have not answered a call in the last 5
> days." Or even "what percentage of users called actually answered a call."
> This approach could become a performance issue. So the other option is to
> create a call_summary table that is updated with triggers.
>
> The summary table would need fields like "user_id", "last_call_id",
> "call_count", "last_answered_id", "answered_count", "last_completed_id",
> "last_completed_count", etc.
>
> My only issue with a summary table is that I don't want a bunch of null
> fields. For example, if the user was *called* but they have never
> *answered* at call then the last_call_id and call_count fields on the
> summary table would be non-NULL but the last_answer_id and answer_count
> fields WOULD be NULL. But over time all fields would eventually become
> non-NULL.
>
> So that leads me to a summary table for EACH call state. Each summary
> table would have a user id, a ref_id, and a count -- one summary table for
> each state e.g. call_summary, call_answered_summary, etc.
>
> This approach has the down side that it creates a lot of tables and
> triggers. It has the upside of being pretty efficient without having to
> deal with NULL values.  It's also pretty easy to reason about.
>
> So for my question -- is the choice between these a personal preference
> sort of thing or is there a right or wrong approach? Am I missing another
> approach that would be better?  I'm okay with SQL but I'm not expert so I'm
> not sure if there is an accepted DESIGN PATTERN for this that I am missing.
>
> Thanks!
>
>
>
My initial thought is: that design is over-normalized. The thing you are
trying to model is the call, and it has several attributes, some of which
may be unknown or not applicable (which is what NULL is for).  So my
thought would be to do something like this:

CREATE TABLE calls (
  id  BIGINT NOT NULL, -- sequence generator
  user_id BIGINT NOT NULL,
  called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  answered  TIMESTAMPTZ,

  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
);


-- 
I asked the Internet how to train my cat, and the Internet told me to get a
dog.


Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Chris Curvey
On Tue, Apr 15, 2014 at 10:56 AM, Chris Curvey wrote:

> On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco  > wrote:
>
>> I have several related tables that represent a call state. Let's think of
>> these as phone calls to simplify things. Sometimes I need to determine the
>> last time a user was called, the last time a user answered a call, or the
>> last time a user completed a call.
>>
>> The basic schema is something like this:
>>
>> CREATE TABLE calls (
>>   id  BIGINT NOT NULL, // sequence generator
>>   user_id BIGINT NOT NULL,
>>   called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
>>
>>   PRIMARY KEY (id),
>>   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
>> );
>>
>> CREATE TABLE calls_answered (
>>   idBIGINT NOT NULL,
>>   answered  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
>>
>>   PRIMARY KEY (id),
>>   FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE
>> );
>>
>>
>> And so on for calls_connected, calls_completed, call_errors, etc.
>>
>> Occasionally I will want to know things like "When was the last time a
>> user answered a call" or "How many times has a user been called".
>>
>> I can do these queries using a combination of MAX or COUNT. But I'm
>> concerned about the performance.
>>
>> SELECT MAX(a.id)
>> FROM calls_answered a JOIN calls c ON c.id = a.id
>> WHERE c.user_id = ?;
>>
>>
>> Or the number of answered calls:
>>
>> SELECT MAX(a.id)
>> FROM calls_answered a JOIN calls c ON c.id = a.id
>> WHERE c.user_id = ?;
>>
>>
>> Sometimes I might want to get this data for a whole bunch of users. For
>> example, "give me all users whose have not answered a call in the last 5
>> days." Or even "what percentage of users called actually answered a call."
>> This approach could become a performance issue. So the other option is to
>> create a call_summary table that is updated with triggers.
>>
>> The summary table would need fields like "user_id", "last_call_id",
>> "call_count", "last_answered_id", "answered_count", "last_completed_id",
>> "last_completed_count", etc.
>>
>> My only issue with a summary table is that I don't want a bunch of null
>> fields. For example, if the user was *called* but they have never
>> *answered* at call then the last_call_id and call_count fields on the
>> summary table would be non-NULL but the last_answer_id and answer_count
>> fields WOULD be NULL. But over time all fields would eventually become
>> non-NULL.
>>
>> So that leads me to a summary table for EACH call state. Each summary
>> table would have a user id, a ref_id, and a count -- one summary table for
>> each state e.g. call_summary, call_answered_summary, etc.
>>
>> This approach has the down side that it creates a lot of tables and
>> triggers. It has the upside of being pretty efficient without having to
>> deal with NULL values.  It's also pretty easy to reason about.
>>
>> So for my question -- is the choice between these a personal preference
>> sort of thing or is there a right or wrong approach? Am I missing another
>> approach that would be better?  I'm okay with SQL but I'm not expert so I'm
>> not sure if there is an accepted DESIGN PATTERN for this that I am missing.
>>
>> Thanks!
>>
>>
>>
> (Sorry, fat-fingered and hit "send too early"...)

CREATE TABLE calls (
  id  BIGINT NOT NULL, -- sequence generator
  user_id BIGINT NOT NULL,
  called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  answered  TIMESTAMPTZ NULL,
  connected  TIMESTAMPTZ NULL,
  completed  TIMESTAMPTZ NULL,

  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
);

Then  your queries end up looking like this:

-- last time john answered
SELECT MAX(answered)
FROM calls
where answered is not null
and user_id = ?

-- users that have not answered a call in the last five days (I can think
-- of a few ways to interpret that phrase)
select myusers.*
from myusers
where not exists
(  select *
  from calls
  where calls.user_id = myusers.user_id
  and answered >= current_date - interval '5 days' )

-- average ring time
select avg(extract(epoch from answered - called))
from calls
where answered is not null



-- 
I asked the Internet how to train my cat, and the Internet told me to get a
dog.


Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Vincent Veyron
On Tue, 15 Apr 2014 07:21:58 -0700
Robert DiFalco  wrote:

> I'm sorry Vincent I'm not exactly sure what you are proposing. Are you
> proposing that I add another table in addition to what I already have that
> all the other tables JOIN to and add a state field in that parent table?

No: keep table 'calls' with an additional 'status' field, and drop the other
tables.

> How is that different than what I have except now I have a new table with
> an updatable state field? Maybe you can show a query or two to more
> specifically show what you are suggesting?
> 
> Right now this seems like a simple way to get the last time John was called:
> 
> // last answered called for John
> SELECT MAX(a.answered)
> FROM calls_answered a JOIN calls c ON c.id = a.id
> WHERE c.user = John;
> 

I probably misunderstood something from your first post. I am not sure what you
call calls_completed or call_errors, for instance, but I had the impression your
records would go into a different table according to their status. Do rows in
calls_answered move to calls_completed at some point?

In that case, how do you know whether you should query calls_answered instead of
calls_completed? Or, if John did not answer, do you query call_errors?

> If I don't have a calls_answered table I'm not sure how I would get this
> information if I had a single table with a mutable state. 

Checking an appropriate boolean field (call_answered boolean not null default
false) would be a way. Again, this needs studying, and it would take more
details to go on.

I stand by my earlier comment though; see Chris's answer, which is along the
same lines.

>Unless you are
> suggesting denormalizing all the tables into one table that would have a
> lot of null fields. For example answered_date would be null if the call was
> never answered.
> 

Not a big problem I should say, unless you deal with really huge data volumes.

-- 
Salutations, Vincent Veyron

http://marica.fr
Gestion des contentieux juridiques, des contrats et des sinistres d'assurance


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Robert DiFalco
Actually that was exactly the initial table design. There were more fields
because for my use case there were a lot more states and certain states
have additional data (for example when a call goes from answered to
connected it also gets the user_id of the person being connected to). So
that one table started getting a LOT of columns which started making it
hard to reason about.

The more normalized version has a couple of things going for it. COUNT,
MIN, MAX, etc are very fast because I don't have to conditionally add null
checks. Everything is inserted so for the millions of calls that get made
the normalized schema was much more efficient for writing. It was also
easier to understand. The answer table only has calls that were answered,
the error table only has calls that resulted in an error after being
connected, etc.

I know this kind of gets into a religious area when discussing NULLs and
what level of normalization is appropriate so I don't want to spark any of
that on this thread. But only doing inserts and never doing updates or
deletes performed very well for large data sets.

That said, I could explore a compromise between the monolithic table
approach and the completely normalized set of tables approach. Thanks for
your input!


On Tue, Apr 15, 2014 at 8:12 AM, Chris Curvey  wrote:

>
>
>
> On Tue, Apr 15, 2014 at 10:56 AM, Chris Curvey wrote:
>
>> On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco <
>> robert.difa...@gmail.com> wrote:
>>
>>> I have several related tables that represent a call state. Let's think
>>> of these as phone calls to simplify things. Sometimes I need to determine
>>> the last time a user was called, the last time a user answered a call, or
>>> the last time a user completed a call.
>>>
>>> The basic schema is something like this:
>>>
>>> CREATE TABLE calls (
>>>   id  BIGINT NOT NULL, // sequence generator
>>>   user_id BIGINT NOT NULL,
>>>   called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
>>>
>>>   PRIMARY KEY (id),
>>>   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
>>> );
>>>
>>> CREATE TABLE calls_answered (
>>>   idBIGINT NOT NULL,
>>>   answered  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
>>>
>>>   PRIMARY KEY (id),
>>>   FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE
>>> );
>>>
>>>
>>> And so on for calls_connected, calls_completed, call_errors, etc.
>>>
>>> Occasionally I will want to know things like "When was the last time a
>>> user answered a call" or "How many times has a user been called".
>>>
>>> I can do these queries using a combination of MAX or COUNT. But I'm
>>> concerned about the performance.
>>>
>>> SELECT MAX(a.id)
>>> FROM calls_answered a JOIN calls c ON c.id = a.id
>>> WHERE c.user_id = ?;
>>>
>>>
>>> Or the number of answered calls:
>>>
>>> SELECT MAX(a.id)
>>> FROM calls_answered a JOIN calls c ON c.id = a.id
>>> WHERE c.user_id = ?;
>>>
>>>
>>> Sometimes I might want to get this data for a whole bunch of users. For
>>> example, "give me all users whose have not answered a call in the last 5
>>> days." Or even "what percentage of users called actually answered a call."
>>> This approach could become a performance issue. So the other option is to
>>> create a call_summary table that is updated with triggers.
>>>
>>> The summary table would need fields like "user_id", "last_call_id",
>>> "call_count", "last_answered_id", "answered_count", "last_completed_id",
>>> "last_completed_count", etc.
>>>
>>> My only issue with a summary table is that I don't want a bunch of null
>>> fields. For example, if the user was *called* but they have never
>>> *answered* at call then the last_call_id and call_count fields on the
>>> summary table would be non-NULL but the last_answer_id and answer_count
>>> fields WOULD be NULL. But over time all fields would eventually become
>>> non-NULL.
>>>
>>> So that leads me to a summary table for EACH call state. Each summary
>>> table would have a user id, a ref_id, and a count -- one summary table for
>>> each state e.g. call_summary, call_answered_summary, etc.
>>>
>>> This approach has the down side that it creates a lot of tables and
>>> triggers. It has the upside of being pretty efficient without having to
>>> deal with NULL values.  It's also pretty easy to reason about.
>>>
>>> So for my question -- is the choice between these a personal preference
>>> sort of thing or is there a right or wrong approach? Am I missing another
>>> approach that would be better?  I'm okay with SQL but I'm not expert so I'm
>>> not sure if there is an accepted DESIGN PATTERN for this that I am missing.
>>>
>>> Thanks!
>>>
>>>
>>>
>> (Sorry, fat-fingered and hit "send too early"...)
>
> CREATE TABLE calls (
>   id  BIGINT NOT NULL, // sequence generator
>   user_id BIGINT NOT NULL,
>   called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
>   answered  TIMESTAMPTZ NULL,
>   connected  TIMESTAMPTZ NULL,
>   completed  TIMESTAMPTZ NULL,
>
>
>   PRIMARY KEY (id),
>   FOREIGN KEY 

Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Rob Sargent

On 04/15/2014 09:53 AM, Robert DiFalco wrote:
Actually that was exactly the initial table design. There were more 
fields because for my use case there were a lot more states and 
certain states have additional data (for example when a call goes from 
answered to connected it also gets the user_id of the person being 
connected to). So that one table started getting a LOT of columns 
which starting making it hard to reason about.


The more normalized version has a couple of things going for it. 
COUNT, MIN, MAX, etc are very fast because I don't have to 
conditionally add null checks. Everything is inserted so for the 
millions of calls that get made the normalized schema was much more 
efficient for writing. It was also easier to understand. The answer 
table only has calls that were answered, the error table only has 
calls the resulted in an error after being connected, etc.


I know this kind of gets into a religious area when discussing NULLs 
and what level of normalization is appropriate so I don't want to 
spark any of that on this thread. But only doing inserts and never 
doing updates or deletes performed very well for large data sets.


That said, I could explore a compromise between the monolithic table 
approach and the completely normalized set of tables approach. Thanks 
for your input!


I wonder if the "LOT of columns" are the bits that need to be parcelled 
off as specific to one condition of a call?



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [GENARAL] round() bug

2014-04-15 Thread David G Johnston
Willy-Bas Loos-3 wrote
> Hi,
> I ran into some strange behavior.
> Seems like a bug to me?
> 
> wbloos=# select round(0.5::numeric), round(0.5::double precision);
>  round | round
> ---+---
>  1 | 0
> (1 row)

Not a bug; and likely too simple to have escaped notice this long, so the
first reaction should be "what am I missing here?"

[google: round .5 postgresql]

http://grokbase.com/t/postgresql/pgsql-hackers/03ap11tckn/round-function-wrong/oldest

Round( numeric ) - 0.5 rounds away from zero
Round( float ) - platform dependent, IEEE 0.5 rounds toward even

You can argue the decision but at this point it's not likely to change.
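
For illustration (an editorial sketch, not from the thread): on a platform whose
C library rounds halves to even for floats, the two behaviors compare roughly
like this; the float column is platform dependent.

SELECT x::numeric        AS n,
       round(x::numeric) AS round_numeric,  -- half away from zero: 1, 2, 3
       round(x::float8)  AS round_float     -- IEEE half to even:   0, 2, 2
FROM (VALUES (0.5), (1.5), (2.5)) AS v(x);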

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/GENARAL-round-bug-tp5800087p5800118.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [GENARAL] round() bug

2014-04-15 Thread Raymond O'Donnell
On 15/04/2014 17:20, David G Johnston wrote:
> Willy-Bas Loos-3 wrote
>> Hi, I ran into some strange behavior. Seems like a bug to me?
>> 
>> wbloos=# select round(0.5::numeric), round(0.5::double precision); 
>> round | round ---+--- 1 | 0 (1 row)
> 
> Not a bug; and likely to simple to have escaped notice this long so
> the first reaction should be "what am I missing here?"
> 
> [google: round .5 postgresql]
> 
> http://grokbase.com/t/postgresql/pgsql-hackers/03ap11tckn/round-function-wrong/oldest
>
>  Round( numeric ) - 0.5 rounds away from zero Round( float ) -
> platform dependent, IEEE 0.5 rounds toward even
> 
> You can argue the decision but at this point it's not likely to
> change.

Interestingly, I get different results (on both 9.1.4 and 9.3.0) on Windows:

postgres=# select round(0.5::numeric), round(0.5::double precision);
 round | round
-------+-------
     1 |     1
(1 row)


postgres=# select version();
                           version
--------------------------------------------------------------
 PostgreSQL 9.1.4, compiled by Visual C++ build 1500, 64-bit
(1 row)


 Same on 9.3.0.

Ray.

-- 
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [GENARAL] round() bug

2014-04-15 Thread David G Johnston
Raymond O'Donnell wrote
> On 15/04/2014 17:20, David G Johnston wrote:
>> Willy-Bas Loos-3 wrote
>>> Hi, I ran into some strange behavior. Seems like a bug to me?
>>> 
>>> wbloos=# select round(0.5::numeric), round(0.5::double precision); 
>>> round | round ---+--- 1 | 0 (1 row)
>> 
>> Not a bug; and likely to simple to have escaped notice this long so
>> the first reaction should be "what am I missing here?"
>> 
>> [google: round .5 postgresql]
>> 
>> http://grokbase.com/t/postgresql/pgsql-hackers/03ap11tckn/round-function-wrong/oldest
>>
>>  Round( numeric ) - 0.5 rounds away from zero Round( float ) -
>> platform dependent, IEEE 0.5 rounds toward even
>> 
>> You can argue the decision but at this point it's not likely to
>> change.
> 
> Interestingly, I get different results (on both 9.1.4 and 9.3.0) on
> Windows:
> 
> postgres=# select round(0.5::numeric), round(0.5::double precision);
>  round | round
> ---+---
>  1 | 1
> (1 row)
> 
> 
> postgres=# select version();
>version
> -
>  PostgreSQL 9.1.4, compiled by Visual C++ build 1500, 64-bit
> (1 row)
> 
> 
>  Same on 9.3.0.

I'm not particularly surprised that Windows is not being IEEE compliant and
instead chooses the more common round-away-from-zero behavior, though I am
unsure where the platform-dependent implementation actually lives.

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/GENARAL-round-bug-tp5800087p5800121.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [GENARAL] round() bug

2014-04-15 Thread Raymond O'Donnell
On 15/04/2014 17:34, David G Johnston wrote:
> Raymond O'Donnell wrote
>> On 15/04/2014 17:20, David G Johnston wrote:
>>> Willy-Bas Loos-3 wrote
 Hi, I ran into some strange behavior. Seems like a bug to me?

 wbloos=# select round(0.5::numeric), round(0.5::double precision); 
 round | round ---+--- 1 | 0 (1 row)
>>>
>>> Not a bug; and likely to simple to have escaped notice this long so
>>> the first reaction should be "what am I missing here?"
>>>
>>> [google: round .5 postgresql]
>>>
>>> http://grokbase.com/t/postgresql/pgsql-hackers/03ap11tckn/round-function-wrong/oldest
>>>
>>>  Round( numeric ) - 0.5 rounds away from zero Round( float ) -
>>> platform dependent, IEEE 0.5 rounds toward even
>>>
>>> You can argue the decision but at this point it's not likely to
>>> change.
>>
>> Interestingly, I get different results (on both 9.1.4 and 9.3.0) on
>> Windows:
>>
>> postgres=# select round(0.5::numeric), round(0.5::double precision);
>>  round | round
>> ---+---
>>  1 | 1
>> (1 row)
>>
>>
>> postgres=# select version();
>>version
>> -
>>  PostgreSQL 9.1.4, compiled by Visual C++ build 1500, 64-bit
>> (1 row)
>>
>>
>>  Same on 9.3.0.
> 
> I'm not particularly surprised that Windows is not being IEEE compliant, and
> instead chooses the more common round-away-from-zero behavior, here though I
> am unsure where the dependent implementation would end up existing.

Oh, so does the rounding code use OS facilities, then, rather than being
implemented in Postgres? - Didn't know that, though I was aware PG does
that in other areas (collation, for example).

Ray.


-- 
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Roxanne Reid-Bennett

On 4/14/2014 12:27 PM, Robert DiFalco wrote:

And so on for calls_connected, calls_completed, call_errors, etc.

Occasionally I will want to know things like "When was the last time a 
user answered a call" or "How many times has a user been called".

...
Sometimes I might want to get this data for a whole bunch of users.
...
So the other option is to create a call_summary table that is updated 
with triggers.

...
My only issue with a summary table is that I don't want a bunch of 
null fields.

...
But over time all fields would eventually become non-NULL.

So that leads me to a summary table for EACH call state. This approach 
has the down side that it creates a lot of tables and triggers. It has 
the upside of being pretty efficient without having to deal with NULL 
values.  It's also pretty easy to reason about.

...
So for my question -- is the choice between these a personal 
preference sort of thing or is there a right or wrong approach? Am I 
missing another approach that would be better?  I'm okay with SQL but 
I'm not expert so I'm not sure if there is an accepted DESIGN PATTERN 
for this that I am missing.


There is no right or wrong - there is better, worse, best, and worst for 
any specific scenario.  In my experience, most people have time/money to 
get to an 80% "better" design than all the other answers during design 
and then it gets refined over time.  And yes, personal experience does 
play a part in how people interpret better/worse [aka religion] ;)


I didn't see anybody ask these questions - and to identify "better" -  
they have to be asked.

1. How much data are you feeding into your system, how fast?
   this directly affects your choices on distribution, parallel
   processing...  writes vs updates vs triggers for copying vs all reads
   [and if on bare metal - potentially where you place your logs,
   indexes, core lookup tables, etc]

2. How much data are you reading out of your system - how fast?
   you have given "simple" use cases (how many calls completed
   within a time frame or to a number)
   you have given very slightly more complex use cases (when was
   the last time John answered a call)
   you have given a slightly more bulky processing question of (how
   many times have these users been called)

So...
  a) How many users executing read queries do you have?
  b) What is the expected load for simple queries (per
     week/day/hour/minute - depending upon your resolution on speed)
  c) What is the expected load for your mid-line complex queries
  d) What is the "maximum" volume you expect a bulk query to go
     after (like all users in the last 18 years, or this city's users in
     the last day?)  and how frequently will that kind of query be
     executed?  How much tolerance for delay do your users have?
  e) do you have any known really complex queries that might bog
     the system down?

  f) How much lag time can you afford between capture and reporting?

Answers to the above define your performance requirements - which 
defines the style of schema you need.  Queries can be written to pull 
data from any schema design - but how fast they can perform or how 
easily they can be created...


Chris and Vincent both targeted a balance between writes and reads - 
which adequately answers 80-85% of the usages out there.  But you didn't 
give us any of the above - so their recommendation (while very likely 
valid) may not actually fit your case at all.


As to design patterns -
"Generally" a database schema is more normalized for an operational 
system because normalization results in fewer writes/updates and lowers 
the risk of corruption if a failure takes place. It also isolates 
updates for any specific value to one location minimizing internally 
caused data corruption.
Reporting systems are generally less normalized because writes are more 
one-time and reads are where the load occurs.
Sometimes you have to use data replication to have a system that 
appropriately supports both.


you have shown you are already aware of normalization.
If you weren't aware of approaches to Data Warehousing... you can review 
information about how it is accomplished
- see the blogs on kimballgroup DOT com  they cover a lot of high(er) 
level concepts with enough specificity to be of some direct use.
[that website also covers some ideas for "Big Data" which aren't 
necessarily limited to RDBMS']


Specify your performance requirements, then figure out your schema design.

FWIW I don't understand your (or any other person's) hesitancy for "lots 
of" "NULL" values.  They provide meaning in a number of different 
ways... not the least of which is that you don't know (yet) - which is 
knowledge in and of itself.


Roxanne



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [GENARAL] round() bug

2014-04-15 Thread Adrian Klaver

On 04/15/2014 10:21 AM, Raymond O'Donnell wrote:

On 15/04/2014 17:34, David G Johnston wrote:




Oh, so does the rounding code use OS facilities, then, rather than being
implemented in Postgres? - Didn't know that, though I was aware PG does
that in other areas (collation, for example).


See the thread below for that discussion:

http://www.postgresql.org/message-id/34608c0c0906150358xbc53de2pe0c7053779e7b...@mail.gmail.com




Ray.





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Robert DiFalco
1. >500K rows per day into the calls table.
2. Very rarely. The only common query is gathering users that have not been
called "today" (along with some other qualifying criteria). More analytical
queries/reports are done for internal use and it is not essential that they
be lickity-split.
a. Usually just one connection at a time executes read queries.
b. the users not called today query will be done once a day.
c. Daily
d. All users for the last year (if you are asking about retention). We will
also rarely have to run for all time.
e. Not that I'm aware of (or seen) today.
f. For the simple queries we cannot afford latency between calls and
querying who was already called.

While I don't seem to be getting much support for it here :D my write
performance (which is most essential) has been much better since I further
normalized the tables and made it so that NULL is never used and data is
never updated (i.e. it is immutable once it is written).

As for wanting to avoid NULLs I don't really know what to say. Obviously
some times NULL's are required. For this design I don't really need them
and they make the data harder to reason about (because they are kind of
open to interpretation).  They can also give you different results than you
sometimes expect (for example when looking for a non matching key, you
start having to inject some OR IS NULLs and such). Also, the absence of
null can make a lot of queries more optimal. That said, I understand where
you all are coming from with de-normalization. It's definitely the path of
least resistance. Our instinct is to want to see all related data in a
single table when possible.

The summary table was really a separate point from whether or not people
liked my schema or not -- I mean whether I de-normalize as people are
asking or not, there would still be the question of a summary table for MAX
and COUNT queries or to not have a summary table for those. I probably made
the original question too open ended.
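
For reference, a minimal sketch of the trigger-maintained summary variant being
weighed here (an editorial illustration; the call_summary columns and the
bump_call_summary function are assumptions, not the actual schema):

CREATE TABLE call_summary (
    user_id      bigint PRIMARY KEY,
    call_count   bigint NOT NULL DEFAULT 0,
    last_call_id bigint
);

CREATE OR REPLACE FUNCTION bump_call_summary() RETURNS trigger AS $$
BEGIN
    -- simple upsert; concurrent first inserts for the same user would need
    -- extra handling (retry or advisory locking)
    UPDATE call_summary
       SET call_count = call_count + 1, last_call_id = NEW.id
     WHERE user_id = NEW.user_id;
    IF NOT FOUND THEN
        INSERT INTO call_summary (user_id, call_count, last_call_id)
        VALUES (NEW.user_id, 1, NEW.id);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER calls_summary_trg
AFTER INSERT ON calls
FOR EACH ROW EXECUTE PROCEDURE bump_call_summary();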



On Tue, Apr 15, 2014 at 3:26 PM, Roxanne Reid-Bennett wrote:

> On 4/14/2014 12:27 PM, Robert DiFalco wrote:
>
>> And so on for calls_connected, calls_completed, call_errors, etc.
>>
>> Occasionally I will want to know things like "When was the last time a
>> user answered a call" or "How many times has a user been called".
>> ...
>>
>> Sometimes I might want to get this data for a whole bunch of users.
>> ...
>>
>> So the other option is to create a call_summary table that is updated
>> with triggers.
>> ...
>>
>> My only issue with a summary table is that I don't want a bunch of null
>> fields.
>> ...
>>
>> But over time all fields would eventually become non-NULL.
>>
>> So that leads me to a summary table for EACH call state. This approach
>> has the down side that it creates a lot of tables and triggers. It has the
>> upside of being pretty efficient without having to deal with NULL values.
>>  It's also pretty easy to reason about.
>> ...
>>
>> So for my question -- is the choice between these a personal preference
>> sort of thing or is there a right or wrong approach? Am I missing another
>> approach that would be better?  I'm okay with SQL but I'm not expert so I'm
>> not sure if there is an accepted DESIGN PATTERN for this that I am missing.
>>
>>  There is no right or wrong - there is better, worse, best, and worst for
> any specific scenario.  In my experience, most people have time/money to
> get to an 80% "better" design than all the other answers during design and
> then it gets refined over time.  And yes, personal experience does play a
> part in how people interpret better/worse [aka religion] ;)
>
> I didn't see anybody ask these questions - and to identify "better" -
>  they have to be asked.
> 1. How much data are you feeding into your system how fast?
>this directly affects your choices on distribution, parallel
> processing...  writes vs updates vs triggers for copying vs all reads
>[and if on bare metal - potentially where you place your logs,
> indexes, core lookup tables, etc]
> 2. How much data are you reading out of your system - how fast?
>you have given "simple" use cases (how many calls completed within
> a time frame or to a number)
>you have given very slightly more complex use cases (when was the
> last time John answered a call)
>you have given a slightly more bulky processing question of (how
> many times have these users been called)
> So...
>   a) How many users executing read queries do you have?
>   b) What is the expected load for simple queries (per
> week/day/hour/minute - depending upon your resolution on speed)
>   c) What is the expected load for your mid-line complex queries
>   d) What is the "maximum" volume you expect a bulk query to go after
> (like all users in the last 18 years, or this city's users in the last
> day?)  and how frequently will that kind of query be executed?  How much
> tolerance for delay do your users have?
>   e) do you have any

Re: [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Gavin Flower

On 16/04/14 13:10, Robert DiFalco wrote:

1. >500K rows per day into the calls table.
2. Very rarely. The only common query is gathering users that have not 
been called "today" (along with some other qualifying criteria). More 
analytical queries/reports are done for internal use and it is not 
essential that they be lickity-split.

a. Usually just one connection at a time executes read queries.
b. the users not called today query will be done once a day.
c. Daily
d. All users for the last year (if you are asking about retention). We 
will also rarely have to run for all time.

e. Not that I'm aware of (or seen) today.
f. For the simple queries we cannot afford latency between calls and 
querying who was already called.


While I don't seem to be getting much support for it here :D my write 
performance (which is most essential) has been much better since I 
further normalized the tables and made it so that NULL is never used 
and data is never updated (i.e. it is immutable once it is written).


As for wanting to avoid NULLs I don't really know what to say. 
Obviously some times NULL's are required. For this design I don't 
really need them and they make the data harder to reason about 
(because they are kind of open to interpretation).  They can also give 
you different results than you sometimes expect (for example when 
looking for a non matching key, you start having to inject some OR IS 
NULLs and such). Also, the absence of null can make a lot of queries 
more optimal). That said, I understand where you all are coming from 
with de-normalization. It's definitely the path of the least 
resistance. Our instinct is to want to see all related data in a 
single table when possible.


The summary table was really a separate point from whether or not 
people liked my schema or not -- I mean whether I de-normalize as 
people are asking or not, there would still be the question of a 
summary table for MAX and COUNT queries or to not have a summary table 
for those. I probably made the original question too open ended.



On Tue, Apr 15, 2014 at 3:26 PM, Roxanne Reid-Bennett > wrote:


On 4/14/2014 12:27 PM, Robert DiFalco wrote:

And so on for calls_connected, calls_completed, call_errors, etc.

Occasionally I will want to know things like "When was the
last time a user answered a call" or "How many times has a
user been called".
...

Sometimes I might want to get this data for a whole bunch of
users.
...

So the other option is to create a call_summary table that is
updated with triggers.
...

My only issue with a summary table is that I don't want a
bunch of null fields.
...

But over time all fields would eventually become non-NULL.

So that leads me to a summary table for EACH call state. This
approach has the down side that it creates a lot of tables and
triggers. It has the upside of being pretty efficient without
having to deal with NULL values.  It's also pretty easy to
reason about.
...

So for my question -- is the choice between these a personal
preference sort of thing or is there a right or wrong
approach? Am I missing another approach that would be better?
 I'm okay with SQL but I'm not expert so I'm not sure if there
is an accepted DESIGN PATTERN for this that I am missing.

There is no right or wrong - there is better, worse, best, and
worst for any specific scenario.  In my experience, most people
have time/money to get to an 80% "better" design than all the
other answers during design and then it gets refined over time.
 And yes, personal experience does play a part in how people
interpret better/worse [aka religion] ;)

I didn't see anybody ask these questions - and to identify
"better" -  they have to be asked.
1. How much data are you feeding into your system how fast?
   this directly affects your choices on distribution,
parallel processing...  writes vs updates vs triggers for copying
vs all reads
   [and if on bare metal - potentially where you place your
logs, indexes, core lookup tables, etc]
2. How much data are you reading out of your system - how fast?
   you have given "simple" use cases (how many calls completed
within a time frame or to a number)
   you have given very slightly more complex use cases (when
was the last time John answered a call)
   you have given a slightly more bulky processing question of
(how many times have these users been called)
So...
  a) How many users executing read queries do you have?
  b) What is the expected load for simple queries (per
week/day/hour/minute - depending upon your resolution on speed)
  c) What is the expected load for your mid-line complex queries
  d) W

Re: [GENERAL] Non-deterministic 100% CPU hang on postgres 9.3

2014-04-15 Thread Fenn Bailey
Some more information on this:

>
> > I say "hang", where really I've given up after ~12 hours execution. The
> > exact same query can then be terminated and run in <90 seconds, with none
> > of the underlying data changing.
>
> But does the plan change?
>
>
As it turns out it does. Some further examination shows that the plan
changes, at least partially due to the fact that I now see this query
attempts to modify the data that it's querying (ie: if any updates have
occurred, the subsequent query is not the same).

That said, all queries were performed in a transaction, so when they're
aborted, no modification should be made to the underlying data.

This makes it even more odd that on a subsequent run of the same query
(after the rollback) it will complete in ~140 seconds (vs the several hours
it was taking on the first run).

Is it possible that the initial run is changing the statistics available on
the table, hence providing a different query plan (even on the same data)?
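
(One quick way to test that theory, as a sketch not taken from the thread: check
whether autovacuum analyzed the tables involved between the two runs.)

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('ad_events', 'ad_events_mv', 'ad_users', 'ad_sessions');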

Query plan on first/second run follows -

First run (failure):
---
 Update on ad_events e  (cost=1426714795.64..1426749795.64 rows=100
width=237)
   ->  Hash Join  (cost=1426714795.64..1426749795.64 rows=100 width=237)
 Hash Cond: (x.ad_event_id = e.ad_event_id)
 ->  Subquery Scan on x  (cost=1426638856.33..1426651356.33
rows=100 width=144)
   ->  Sort  (cost=1426638856.33..1426641356.33 rows=100
width=108)
 Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
 ->  Nested Loop Left Join
 (cost=12046.09..1426539198.49 rows=100 width=108)
   Join Filter: ((t."timestamp" >= s.session_start)
AND (t."timestamp" <= s.session_end) AND ((s.ad_user_id = u.ad_user_id) OR
(s.ad_user_id = u.merged_id)))
   ->  Hash Join  (cost=12046.09..203878.09
rows=100 width=92)
 Hash Cond: (t.user_id = u.orig_distinct_id)
 ->  Seq Scan on ad_events_mv t
 (cost=0.00..173082.00 rows=100 width=112)
   Filter: (NOT processed)
 ->  Hash  (cost=7932.15..7932.15
rows=329115 width=67)
   ->  Seq Scan on ad_users u
 (cost=0.00..7932.15 rows=329115 width=67)
   ->  Materialize  (cost=0.00..5478.88 rows=63392
width=32)
 ->  Seq Scan on ad_sessions s
 (cost=0.00..5161.92 rows=63392 width=32)
 ->  Hash  (cost=50938.58..50938.58 rows=258 width=101)
   ->  Seq Scan on ad_events e  (cost=0.00..50938.58
rows=258 width=101)

Second run (completed in ~148 seconds):
--
 Update on ad_events e  (cost=39730372.76..39765372.76 rows=100
width=237)
   ->  Hash Join  (cost=39730372.76..39765372.76 rows=100 width=237)
 Hash Cond: (x.ad_event_id = e.ad_event_id)
 ->  Subquery Scan on x  (cost=39654433.45..39666933.45
rows=100 width=144)
   ->  Sort  (cost=39654433.45..39656933.45 rows=100
width=108)
 Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
 ->  Nested Loop Left Join  (cost=12054.20..39554775.61
rows=100 width=108)
   Join Filter: ((t."timestamp" >= s.session_start)
AND (t."timestamp" <= s.session_end))
   ->  Hash Join  (cost=12046.09..203878.09
rows=100 width=92)
 Hash Cond: (t.user_id = u.orig_distinct_id)
 ->  Seq Scan on ad_events_mv t
 (cost=0.00..173082.00 rows=100 width=112)
   Filter: (NOT processed)
 ->  Hash  (cost=7932.15..7932.15
rows=329115 width=67)
   ->  Seq Scan on ad_users u
 (cost=0.00..7932.15 rows=329115 width=67)
   ->  Bitmap Heap Scan on ad_sessions s
 (cost=8.11..39.22 rows=8 width=32)
 Recheck Cond: ((ad_user_id = u.ad_user_id)
OR (ad_user_id = u.merged_id))
 ->  BitmapOr  (cost=8.11..8.11 rows=8
width=0)
   ->  Bitmap Index Scan on
ad_sessions_ad_user_id_idx  (cost=0.00..4.05 rows=4 width=0)
 Index Cond: (ad_user_id =
u.ad_user_id)
   ->  Bitmap Index Scan on
ad_sessions_ad_user_id_idx  (cost=0.00..4.05 rows=4 width=0)
 Index Cond: (ad_user_id =
u.merged_id)
 ->  Hash  (cost=50938.58..50938.58 rows=258 width=101)
   ->  Seq Scan on ad_events e  (cost=0.00..50938.58
rows=258 width=101)
---

From a quick look, the plans diverge at:
->  Seq Scan on ad_sessions s  (cost=0.00..5161.92 rows=63392 width=32)
vs
->  Bitmap H

[GENERAL] JSONB in-place updates?

2014-04-15 Thread kelas
Are there any plans to add "in-place at-depth" update operator for JSONB
type, e.g.:

UPDATE test SET attrs->'answer' = 42 where attrs->'answer' = 41


Re: [GENERAL] Need some help in postgres locking mechanism

2014-04-15 Thread santhosh kumar
Hi
I am locking some rows using SELECT FOR UPDATE (nowait=true) inside a
transaction. If the server the code runs on crashes, the locks remain on the
rows and have to be removed manually. I do not want that; I want to set a
timeout for that transaction. Could you please help me out with a solution
for this issue? I did not find a proper solution on the internet.


On Tue, Apr 8, 2014 at 8:58 PM, Hannes Erven  wrote:

> Hi,
>
>
> On 2014-04-08 15:27, Vick Khera wrote:
> [...]
>
>  Problem 2: you cannot have a second process skip over locked rows.
>>
>
>
> In fact, you can: use "FOR UPDATE NOWAIT" and catch any errors.
>
> e.g. SELECT * FROM jobqueue WHERE id=? FOR UPDATE NOWAIT
>
>
>
> -hannes
>



-- 
K.Santhosh Kumar


Re: [GENERAL] Trouble installing Slony 2.0

2014-04-15 Thread Alberto Cabello Sánchez
On Tue, 15 Apr 2014 22:07:09 +
Samir Parikh  wrote:

> Thanks Alberto.  I could figure that out and install the software
> successfully.
> Is there any nice document you can suggest for configuring Slony (set up
> replication, sets, nodes etc.)?

Hi, Samir.

There are many. You should begin with the official docs, esp. chapters 1, 2 and 
3:
http://slony.info/documentation/2.0/index.html

When you are familiar with the terminology you can check some recipes elsewhere
to get an overview of the tasks involved:

http://raghavt.blogspot.com.es/2012/07/simple-slony-i-replication-setup.html
http://www.howtoforge.com/configuring-slony-i-cascading-replication-on-postgresql-8.3

But Slony-I documentation is still the best resource.

-- 
Alberto Cabello Sánchez



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general