Re: [SQL] How to convert unicode to string

2012-10-01 Thread Jasen Betts
On 2012-09-23, Abhijit Prusty -X (abprusty - UST Global at Cisco) 
 wrote:
>
> Hi,
>
> I have a query in oracle like this mentioned below
>
> Insert into TEST
>(TEMPLATE_ID, TEMPLATE_NAME, CREATED_BY, CREATED_DT, UPDATED_BY,
> UPDATED_DT, TEMPLATE_KEY)
> Values
>(1, UNISTR('\D3C9\BA85\B3C4 \B514\C2A4\D50C\B808\C774'), 'dmin', SYSDATE, 'admin',
> SYSDATE ,'FLOOR');
>
> Now the oracle uses the UNISTR function to convert and insert the Unicode to
> string and store in database.

In PostgreSQL, an Oracle-UNISTR-like UTF-16 string can be written like this:

 U&'\D3C9\BA85\B3C4 \B514\C2A4\D50C\B808\C774'

it's not a function, it's a way of writing strings... if you need a
function, it probably wouldn't be hard to write.
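
For what it's worth, here is a minimal sketch of such a function (the name
unistr_simple is made up; it only handles plain 4-hex-digit escapes like the
ones above, not surrogate pairs or escaped backslashes):

create or replace function unistr_simple(src text) returns text
language plpgsql immutable as $$
declare
    result text := '';
    i      int  := 1;
begin
    while i <= length(src) loop
        if substr(src, i, 1) = '\' and i + 4 <= length(src) then
            -- treat the next 4 hex digits as a Unicode code point
            result := result || chr(('x' || substr(src, i + 1, 4))::bit(16)::int);
            i := i + 5;
        else
            result := result || substr(src, i, 1);
            i := i + 1;
        end if;
    end loop;
    return result;
end;
$$;

select unistr_simple('\D3C9\BA85\B3C4 \B514\C2A4\D50C\B808\C774');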

but you can also write the string in UTF-8 (literal or escaped) or with unicode
escapes; see the docs:

u&'\+021502'          -- unicode escape (6-digit code point)
u&'\D845\DD02'        -- utf16 surrogate pair (the docs tell me this is legal with recent versions)
e'\xF0\xA1\x94\x82'   -- utf8 hex escape
e'\360\241\224\202'   -- utf8 octal escape
'𡔂'                  -- utf8 string literal

the first two forms can be intermixed, as can the last three.

http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html

select length('𡔂'), octet_length( '𡔂' ), length('test'),
octet_length('test');

 length | octet_length | length | octet_length 
--------+--------------+--------+--------------
      1 |            4 |      4 |            4
   
-- 
⚂⚃ 100% natural





Re: [SQL] Reuse temporary calculation results in an SQL update query

2012-10-01 Thread Jasen Betts
On 2012-09-29, Matthias Nagel  wrote:
> Hello,
>
> is there any way how one can store the result of a time-consuming calculation 
> if this result is needed more than once in an SQL update query? This solution 
> might be PostgreSQL specific and not standard SQL compliant. Here is an 
> example of what I want:
>
> UPDATE table1 SET
>StartTime = 'time consuming calculation 1',
>StopTime = 'time consuming calculation 2',
>Duration = 'time consuming calculation 2' - 'time consuming calculation 1'
> WHERE foo;
>
> It would be nice, if I could use the "new" start and stop time to calculate 
> the duration time. First of all it would make the SQL statement faster and 
> secondly much more cleaner and easily to understand.
>
> Best regards, Matthias


use a CTE.
http://www.postgresql.org/docs/9.1/static/queries-with.html


with a as (
   select 'time consuming calculation 1' as tcc1,
          'time consuming calculation 2' as tcc2
)
update table1
set StartTime = a.tcc1,
    StopTime  = a.tcc2,
    Duration  = a.tcc2 - a.tcc1
from a
where foo;

you may need to move foo into the CTE too.
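
something like this, perhaps (a sketch, assuming table1 has a primary key
column named id and "foo" stands for the original condition):

with a as (
    select id,
           'time consuming calculation 1' as tcc1,
           'time consuming calculation 2' as tcc2
    from table1
    where foo
)
update table1
set StartTime = a.tcc1,
    StopTime  = a.tcc2,
    Duration  = a.tcc2 - a.tcc1
from a
where table1.id = a.id;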


-- 
⚂⚃ 100% natural





Re: [SQL] Reuse temporary calculation results in an SQL update query

2012-10-01 Thread Thomas Kellerer

Matthias Nagel wrote on 29.09.2012 12:49:

Hello,

is there any way how one can store the result of a time-consuming calculation
if this result is needed more than once in an SQL update query? This solution
might be PostgreSQL specific and not standard SQL compliant. Here is an
example of what I want:

UPDATE table1 SET
StartTime = 'time consuming calculation 1',
StopTime = 'time consuming calculation 2',
Duration = 'time consuming calculation 2' - 'time consuming calculation 1'
WHERE foo;

It would be nice, if I could use the "new" start and stop time to calculate the 
duration time.
First of all it would make the SQL statement faster and secondly much more 
cleaner and easily to understand.



Something like:

with my_calc as (
    select pk,
           time_consuming_calculation_1 as calc1,
           time_consuming_calculation_2 as calc2
    from foo
)
update foo
   set startTime = my_calc.calc1,
       stopTime  = my_calc.calc2,
       duration  = my_calc.calc2 - my_calc.calc1
  from my_calc
 where foo.pk = my_calc.pk;

http://www.postgresql.org/docs/current/static/queries-with.html#QUERIES-WITH-MODIFYING






[SQL] [noob] How to optimize this double pivot query?

2012-10-01 Thread Robert Buck
I have two tables that contain key-value data that I want to combine in
pivoted form into a single result set. They are related to two separate
tables.

The tables are: test_results, test_variables, metric_def, metadata_key. The
latter two tables are enum-like tables, basic descriptors of data stored in
other tables. The former two tables are basically key-value tables (with
ids as well); these k-v tables are related to the latter two tables via
foreign keys.

The following SQL takes about 11 seconds to run on a high-end laptop. The
largest table is about 54k records, pretty puny.

Can someone provide a hint as to why this is so slow? Again, I am a noob to
SQL, so the SQL is probably poorly written.

Thanks in advance,

Bob

select

t.id_name,
max(t.begin_time) as begin_time,
max(t.end_time) as end_time,

max(case when (m.id_name = 'package-version') then v.value end) as
package_version,
max(case when (m.id_name = 'database-vendor') then v.value end) as
database_vendor,
max(case when (m.id_name = 'bean-name') then v.value end) as bean_name,
max(case when (m.id_name = 'request-distribution') then v.value end) as
request_distribution,
max(case when (m.id_name = 'ycsb-workload') then v.value end) as
ycsb_workload,
max(case when (m.id_name = 'record-count') then v.value end) as
record_count,
max(case when (m.id_name = 'transaction-engine-count') then v.value
end) as transaction_engine_count,
max(case when (m.id_name = 'transaction-engine-maxmem') then v.value
end) as transaction_engine_maxmem,
max(case when (m.id_name = 'storage-manager-count') then v.value end)
as storage_manager_count,
max(case when (m.id_name = 'test-instance-count') then v.value end) as
test_instance_count,
max(case when (m.id_name = 'operation-count') then v.value end) as
operation_count,
max(case when (m.id_name = 'update-percent') then v.value end) as
update_percent,
max(case when (m.id_name = 'thread-count') then v.value end) as
thread_count,

max(case when (d.id_name = 'tps') then r.value end) as tps,
max(case when (d.id_name = 'Memory') then r.value end) as memory,
max(case when (d.id_name = 'DiskWritten') then r.value end) as
disk_written,
max(case when (d.id_name = 'PercentUserTime') then r.value end) as
percent_user,
max(case when (d.id_name = 'PercentCpuTime') then r.value end) as
percent_cpu,
max(case when (d.id_name = 'UserMilliseconds') then r.value end) as
user_milliseconds,
max(case when (d.id_name = 'YcsbUpdateLatencyMicrosecs') then r.value
end) as update_latency,
max(case when (d.id_name = 'YcsbReadLatencyMicrosecs') then r.value
end) as read_latency,
max(case when (d.id_name = 'Updates') then r.value end) as updates,
max(case when (d.id_name = 'Deletes') then r.value end) as deletes,
max(case when (d.id_name = 'Inserts') then r.value end) as inserts,
max(case when (d.id_name = 'Commits') then r.value end) as commits,
max(case when (d.id_name = 'Rollbacks') then r.value end) as rollbacks,
max(case when (d.id_name = 'Objects') then r.value end) as objects,
max(case when (d.id_name = 'ObjectsCreated') then r.value end) as
objects_created,
max(case when (d.id_name = 'FlowStalls') then r.value end) as
flow_stalls,
max(case when (d.id_name = 'NodeApplyPingTime') then r.value end) as
node_apply_ping_time,
max(case when (d.id_name = 'NodePingTime') then r.value end) as
node_ping_time,
max(case when (d.id_name = 'ClientCncts') then r.value end) as
client_connections,
max(case when (d.id_name = 'YcsbSuccessCount') then r.value end) as
success_count,
max(case when (d.id_name = 'YcsbWarnCount') then r.value end) as
warn_count,
max(case when (d.id_name = 'YcsbFailCount') then r.value end) as
fail_count

from test as t

left join test_results as r on r.test_id = t.id
left join test_variables as v on v.test_id = t.id
left join metric_def as d on d.id = r.metric_def_id
left join metadata_key as m on m.id = v.metadata_key_id

group by t.id_name

;

"GroupAggregate  (cost=5.87..225516.43 rows=926 width=81)"
"  ->  Nested Loop Left Join  (cost=5.87..53781.24 rows=940964 width=81)"
"->  Nested Loop Left Join  (cost=1.65..1619.61 rows=17235
width=61)"
"  ->  Index Scan using test_uc on test t  (cost=0.00..90.06
rows=926 width=36)"
"  ->  Hash Right Join  (cost=1.65..3.11 rows=19 width=29)"
"Hash Cond: (m.id = v.metadata_key_id)"
"->  Seq Scan on metadata_key m  (cost=0.00..1.24
rows=24 width=21)"
"->  Hash  (cost=1.41..1.41 rows=19 width=16)"
"  ->  Index Scan using test_variables_test_id_idx
on test_variables v  (cost=0.00..1.41 rows=19 width=16)"
"Index Cond: (test_id = t.id)"
"->  Hash Right Join  (cost=4.22..6.69 rows=55 width=28)"
"  Hash Cond: (d.id = r.metric_def_id)"
"  ->  Seq Scan on metric

Re: [SQL] [noob] How to optimize this double pivot query?

2012-10-01 Thread David Johnston
From: [email protected] [mailto:[email protected]] On 
Behalf Of Robert Buck
Sent: Monday, October 01, 2012 8:47 PM
To: [email protected]
Subject: [SQL] [noob] How to optimize this double pivot query?

 

I have two tables that contain key-value data that I want to combine in pivoted 
form into a single result set. They are related to two separate tables.

The tables are: test_results, test_variables, metric_def, metadata_key. The 
latter two tables are enum-like tables, basic descriptors of data stored in 
other tables. The former two tables are basically key-value tables (with ids as 
well); these k-v tables are related to the latter two tables via foreign keys.

The following SQL takes about 11 seconds to run on a high-end laptop. The 
largest table is about 54k records, pretty puny.

Can someone provide a hint as to why this is so slow? Again, I am a noob to 
SQL, so the SQL is probably poorly written.



 

Your query, while maybe not great, isn’t the cause of your problem.  It is the 
table schema, specifically the “key-value” aspect, that is killing you.

 

You may want to try:

 

SELECT *

FROM (SELECT id FROM …) id_master

NATURAL LEFT JOIN (SELECT id, field_value AS … FROM … WHERE fieldtype = ‘’) f1

NATURAL LEFT JOIN (SELECT id, field_value AS … FROM … WHERE fieldtype = ‘’) f2

[repeat one left join for every field; though you will then need to decide 
if/how to deal with NULL – not that you are currently doing anything special 
anyway…]

 

Mainly the above avoids the use of “max()” and instead uses direct joins 
between the relevant tables.  I have no clue whether that will improve things 
but if you are going to lie in this bed you should at least try different 
positions.
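
Against the tables in your query, a hypothetical version of that shape (showing
only two of the fields, and assuming at most one row per key per test) might
look like:

SELECT *
FROM (SELECT id, id_name FROM test) id_master
NATURAL LEFT JOIN (SELECT v.test_id AS id, v.value AS package_version
                   FROM test_variables v
                   JOIN metadata_key m ON m.id = v.metadata_key_id
                   WHERE m.id_name = 'package-version') f1
NATURAL LEFT JOIN (SELECT r.test_id AS id, r.value AS tps
                   FROM test_results r
                   JOIN metric_def d ON d.id = r.metric_def_id
                   WHERE d.id_name = 'tps') f2;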

 

The better option is to educate yourself on better ways of constructing the 
tables so that you do not have to write this kind of god-awful query.  In some 
cases key-value has merit but usually only when done in moderation.  Not for 
the entire database.  You likely should simply have a table that looks like the 
result of the query below.

 

As a second (not necessarily mutually exclusive) alternative: install and use 
the hstore extension.

 

David J.

 


Thanks in advance,

Bob

select

t.id_name,
max(t.begin_time) as begin_time,
max(t.end_time) as end_time,

max(case when (m.id_name = 'package-version') then v.value end) as 
package_version,
max(case when (m.id_name = 'database-vendor') then v.value end) as 
database_vendor,
max(case when (m.id_name = 'bean-name') then v.value end) as bean_name,
max(case when (m.id_name = 'request-distribution') then v.value end) as 
request_distribution,
max(case when (m.id_name = 'ycsb-workload') then v.value end) as 
ycsb_workload,
max(case when (m.id_name = 'record-count') then v.value end) as 
record_count,
max(case when (m.id_name = 'transaction-engine-count') then v.value end) as 
transaction_engine_count,
max(case when (m.id_name = 'transaction-engine-maxmem') then v.value end) 
as transaction_engine_maxmem,
max(case when (m.id_name = 'storage-manager-count') then v.value end) as 
storage_manager_count,
max(case when (m.id_name = 'test-instance-count') then v.value end) as 
test_instance_count,
max(case when (m.id_name = 'operation-count') then v.value end) as 
operation_count,
max(case when (m.id_name = 'update-percent') then v.value end) as 
update_percent,
max(case when (m.id_name = 'thread-count') then v.value end) as 
thread_count,

max(case when (d.id_name = 'tps') then r.value end) as tps,
max(case when (d.id_name = 'Memory') then r.value end) as memory,
max(case when (d.id_name = 'DiskWritten') then r.value end) as disk_written,
max(case when (d.id_name = 'PercentUserTime') then r.value end) as 
percent_user,
max(case when (d.id_name = 'PercentCpuTime') then r.value end) as 
percent_cpu,
max(case when (d.id_name = 'UserMilliseconds') then r.value end) as 
user_milliseconds,
max(case when (d.id_name = 'YcsbUpdateLatencyMicrosecs') then r.value end) 
as update_latency,
max(case when (d.id_name = 'YcsbReadLatencyMicrosecs') then r.value end) as 
read_latency,
max(case when (d.id_name = 'Updates') then r.value end) as updates,
max(case when (d.id_name = 'Deletes') then r.value end) as deletes,
max(case when (d.id_name = 'Inserts') then r.value end) as inserts,
max(case when (d.id_name = 'Commits') then r.value end) as commits,
max(case when (d.id_name = 'Rollbacks') then r.value end) as rollbacks,
max(case when (d.id_name = 'Objects') then r.value end) as objects,
max(case when (d.id_name = 'ObjectsCreated') then r.value end) as 
objects_created,
max(case when (d.id_name = 'FlowStalls') then r.value end) as flow_stalls,
max(case when (d.id_name = 'NodeApplyPingTime') then r.value end) as 
node_apply_ping_time,
max(case when (d.id_name = 'NodePingTime') then r.value end) as 
node_ping_time,
 

Re: [SQL] [noob] How to optimize this double pivot query?

2012-10-01 Thread Robert Buck
So as you can probably glean, the tables store performance metric data. The
reason I chose to use k-v is simply to avoid having to create an additional
column every time a new metric type comes along. So those were the two
options I thought of: straight k-v, and a column for every value type.

Are there other, better options worth considering that you could point me
towards that support storing metrics with an unbounded number of metric
types, as in my case?

Bob

On Mon, Oct 1, 2012 at 9:07 PM, David Johnston  wrote:

> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Robert Buck
> *Sent:* Monday, October 01, 2012 8:47 PM
> *To:* [email protected]
> *Subject:* [SQL] [noob] How to optimize this double pivot query?
>
>
> I have two tables that contain key-value data that I want to combine in
> pivoted form into a single result set. They are related to two separate
> tables.
>
> The tables are: test_results, test_variables, metric_def, metadata_key.
> The latter two tables are enum-like tables, basic descriptors of data
> stored in other tables. The former two tables are basically key-value
> tables (with ids as well); these k-v tables are related to the latter two
> tables via foreign keys.
>
> The following SQL takes about 11 seconds to run on a high-end laptop. The
> largest table is about 54k records, pretty puny.
>
> Can someone provide a hint as to why this is so slow? Again, I am a noob
> to SQL, so the SQL is probably poorly written.
>
> 
>
>
> Your query, while maybe not great, isn’t the cause of your problem.  It is
> the table schema, specifically the “key-value” aspect, that is killing you.
> 
>
>
> You may want to try:
>
>
> SELECT *
>
> FROM (SELECT id FROM …) id_master
>
> NATURAL LEFT JOIN (SELECT id, field_value AS … FROM … WHERE fieldtype =
> ‘’) f1
>
> NATURAL LEFT JOIN (SELECT id, field_value AS … FROM … WHERE fieldtype =
> ‘’) f2
>
> [repeat one left join for every field; though you will then need to decide
> if/how to deal with NULL – not that you are currently doing anything
> special anyway…]
>
>
> Mainly the above avoids the use of “max()” and instead uses direct joins
> between the relevant tables.  I have no clue whether that will improve
> things but if you are going to lie in this bed you should at least try
> different positions.
>
>
> The better option is to educate yourself on better ways of constructing
> the tables so that you do not have to write this kind of god-awful query.
> In some cases key-value has merit but usually only when done in
> moderation.  Not for the entire database.  You likely should simply have a
> table that looks like the result of the query below.
>
>
> As a second (not necessarily mutually exclusive) alternative: install and
> use the hstore extension.
>
>
> David J.
>
>
>
> Thanks in advance,
>
> Bob
>
> select
>
> t.id_name,
> max(t.begin_time) as begin_time,
> max(t.end_time) as end_time,
>
> max(case when (m.id_name = 'package-version') then v.value end) as
> package_version,
> max(case when (m.id_name = 'database-vendor') then v.value end) as
> database_vendor,
> max(case when (m.id_name = 'bean-name') then v.value end) as bean_name,
> max(case when (m.id_name = 'request-distribution') then v.value end)
> as request_distribution,
> max(case when (m.id_name = 'ycsb-workload') then v.value end) as
> ycsb_workload,
> max(case when (m.id_name = 'record-count') then v.value end) as
> record_count,
> max(case when (m.id_name = 'transaction-engine-count') then v.value
> end) as transaction_engine_count,
> max(case when (m.id_name = 'transaction-engine-maxmem') then v.value
> end) as transaction_engine_maxmem,
> max(case when (m.id_name = 'storage-manager-count') then v.value end)
> as storage_manager_count,
> max(case when (m.id_name = 'test-instance-count') then v.value end) as
> test_instance_count,
> max(case when (m.id_name = 'operation-count') then v.value end) as
> operation_count,
> max(case when (m.id_name = 'update-percent') then v.value end) as
> update_percent,
> max(case when (m.id_name = 'thread-count') then v.value end) as
> thread_count,
>
> max(case when (d.id_name = 'tps') then r.value end) as tps,
> max(case when (d.id_name = 'Memory') then r.value end) as memory,
> max(case when (d.id_name = 'DiskWritten') then r.value end) as
> disk_written,
> max(case when (d.id_name = 'PercentUserTime') then r.value end) as
> percent_user,
> max(case when (d.id_name = 'PercentCpuTime') then r.value end) as
> percent_cpu,
> max(case when (d.id_name = 'UserMilliseconds') then r.value end) as
> user_milliseconds,
> max(case when (d.id_name = 'YcsbUpdateLatencyMicrosecs') then r.value
> end) as update_latency,
> max(case when (d.id_name = 'YcsbReadLatencyMicrosecs') then r.value
> end

Re: [SQL] [noob] How to optimize this double pivot query?

2012-10-01 Thread David Johnston
Two issues...
1. I do not know how you acquire the data or who controls how/what is generated
2. I do not know the primary means of using said data

If you capture a new metric you generally have to change quite a few things to 
actually use it so you might as well just add a column as well.  If you want to 
be able to at least capture unknown input and store it somewhere that is where 
the hstore extension comes in.  In effect you should store unknown data there 
until you decide to update the schema to actually make use of it.  In 
situations where you want to use it without altering the schema you normally 
simply list the unknowns and as such should output a row-like structure.

The query you provide will need to be updated in the same way a physical table 
would be.  So just use a table.  Or do not provide a consolidated/wide query.  
If both, then deal with the performance hit one time per id and create a 
materialized view - basically insert the results of the query into a physical 
table and for live usage query that table.  This is a cache and comes with all 
the benefits and downsides thereof.
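
A rough sketch of that cache approach (hypothetical table name test_summary;
only two of the pivot columns are shown, the real thing would carry all of them):

create table test_summary as
select t.id_name,
       max(case when m.id_name = 'package-version' then v.value end) as package_version,
       max(case when d.id_name = 'tps'             then r.value end) as tps
from test t
left join test_results   r on r.test_id = t.id
left join test_variables v on v.test_id = t.id
left join metric_def     d on d.id = r.metric_def_id
left join metadata_key   m on m.id = v.metadata_key_id
group by t.id_name;

-- to refresh, truncate and re-insert with the same select (e.g. from cron):
--   truncate test_summary;
--   insert into test_summary select ... ;

-- live usage then reads the cache instead of re-running the pivot:
select * from test_summary;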

David J.


On Oct 1, 2012, at 21:13, Robert Buck  wrote:

> So as you can probably glean, the tables store performance metric data. The 
> reason I chose to use k-v is simply to avoid having to create an additional 
> column every time a new metric type come along. So those were the two options 
> I thought of, straight k-v and column for every value type.
> 
> Are there other better options worth considering that you could point me 
> towards that supports storing metrics viz. with an unbounded number of metric 
> types in my case?
> 
> Bob
> 
> On Mon, Oct 1, 2012 at 9:07 PM, David Johnston  wrote:
> From: [email protected] [mailto:[email protected]] 
> On Behalf Of Robert Buck
> Sent: Monday, October 01, 2012 8:47 PM
> To: [email protected]
> Subject: [SQL] [noob] How to optimize this double pivot query?
> 
>  
> 
> I have two tables that contain key-value data that I want to combine in 
> pivoted form into a single result set. They are related to two separate 
> tables.
> 
> The tables are: test_results, test_variables, metric_def, metadata_key. The 
> latter two tables are enum-like tables, basic descriptors of data stored in 
> other tables. The former two tables are basically key-value tables (with ids 
> as well); these k-v tables are related to the latter two tables via foreign 
> keys.
> 
> The following SQL takes about 11 seconds to run on a high-end laptop. The 
> largest table is about 54k records, pretty puny.
> 
> Can someone provide a hint as to why this is so slow? Again, I am a noob to 
> SQL, so the SQL is probably poorly written.
> 
> 
>  
> 
> Your query, while maybe not great, isn’t the cause of your problem.  It is 
> the table schema, specifically the “key-value” aspect, that is killing you.
> 
>  
> 
> You may want to try:
> 
>  
> 
> SELECT *
> 
> FROM (SELECT id FROM …) id_master
> 
> NATURAL LEFT JOIN (SELECT id, field_value AS … FROM … WHERE fieldtype = ‘’) f1
> 
> NATURAL LEFT JOIN (SELECT id, field_value AS … FROM … WHERE fieldtype = ‘’) f2
> 
> [repeat one left join for every field; though you will then need to decide 
> if/how to deal with NULL – not that you are currently doing anything special 
> anyway…]
> 
>  
> 
> Mainly the above avoids the use of “max()” and instead uses direct joins 
> between the relevant tables.  I have no clue whether that will improve things 
> but if you are going to lie in this bed you should at least try different 
> positions.
> 
>  
> 
> The better option is to educate yourself on better ways of constructing the 
> tables so that you do not have to write this kind of god-awful query.  In 
> some cases key-value has merit but usually only when done in moderation.  Not 
> for the entire database.  You likely should simply have a table that looks 
> like the result of the query below.
> 
>  
> 
> As a second (not necessarily mutually exclusive) alternative: install and use 
> the hstore extension.
> 
>  
> 
> David J.
> 
>  
> 
> 
> Thanks in advance,
> 
> Bob
> 
> select
> 
> t.id_name,
> max(t.begin_time) as begin_time,
> max(t.end_time) as end_time,
> 
> max(case when (m.id_name = 'package-version') then v.value end) as 
> package_version,
> max(case when (m.id_name = 'database-vendor') then v.value end) as 
> database_vendor,
> max(case when (m.id_name = 'bean-name') then v.value end) as bean_name,
> max(case when (m.id_name = 'request-distribution') then v.value end) as 
> request_distribution,
> max(case when (m.id_name = 'ycsb-workload') then v.value end) as 
> ycsb_workload,
> max(case when (m.id_name = 'record-count') then v.value end) as 
> record_count,
> max(case when (m.id_name = 'transaction-engine-count') then v.value end) 
> as transaction_engine_count,
> max(case when (m.id_name = 'transaction-engine-maxmem') then v.value end) 
> as transaction_en

Re: [SQL] [noob] How to optimize this double pivot query?

2012-10-01 Thread Thomas Kellerer

Robert Buck, 02.10.2012 03:13:

So as you can probably glean, the tables store performance metric
data. The reason I chose to use k-v is simply to avoid having to
create an additional column every time a new metric type come along.
So those were the two options I thought of, straight k-v and column
for every value type.

Are there other better options worth considering that you could point
me towards that supports storing metrics viz. with an unbounded
number of metric types in my case?



Have a look at the hstore module. It's exactly meant for that scenario with the
added benefit that you can index on that column and looking up key names and
their values is blazingly fast then.

That combined with the tablefunc module (which lets you do pivot queries) might
make your queries substantially more readable (and maybe faster as well).
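
For example, a rough sketch of the hstore route (hypothetical table and column
names; assumes the hstore extension is installed):

create extension if not exists hstore;

-- one row per test, all metrics in a single hstore column
create table test_metrics (
    test_id  integer primary key references test(id),
    metrics  hstore
);

-- a GIN (or GiST) index makes key/value lookups fast
create index test_metrics_metrics_idx on test_metrics using gin (metrics);

insert into test_metrics (test_id, metrics)
values (1, 'tps => "1234.5", Memory => "2048"');

-- pull individual metrics back out
select test_id,
       (metrics -> 'tps')::numeric    as tps,
       (metrics -> 'Memory')::numeric as memory
from test_metrics
where metrics ? 'tps';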

Regards
Thomas





