Execute INSERT during a parallel operation

2019-04-13 Thread Donald Dong
Hi,

I'm trying to use SPI to save the executed plans in the ExecutorEnd hook. When
the plan involves multiple workers, the insert operation triggers an
error: cannot execute INSERT during a parallel operation.
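
Roughly what I'm doing (a minimal sketch; the hook wiring and the plan_log
table are placeholders for my extension's real code):

    #include "postgres.h"
    #include "executor/executor.h"
    #include "executor/spi.h"

    /* sketch: save the finished plan from the ExecutorEnd hook via SPI */
    static void
    my_ExecutorEnd(QueryDesc *queryDesc)
    {
        SPI_connect();
        /* the real statement serializes queryDesc->plannedstmt */
        SPI_execute("INSERT INTO plan_log(plan) VALUES ('...')", false, 0);
        SPI_finish();

        standard_ExecutorEnd(queryDesc);
    }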

I wonder if there's a different hook I can use when there's a Gather node, or
another way to get around this?

Thank you,
Donald Dong



Why could GEQO produce plans with lower costs than the standard_join_search?

2019-05-07 Thread Donald Dong
Hi,

I was expecting the plans generated by standard_join_search to have lower costs
than the plans from GEQO. But the results I have from a join order
benchmark show that GEQO produces plans with lower costs most of the time!

I wonder what is causing this observation? From my understanding,
standard_join_search is doing a complete search. So I'm not sure how the GEQO
managed to do better than that.

Thank you,
Donald Dong



Re: Why could GEQO produce plans with lower costs than the standard_join_search?

2019-05-22 Thread Donald Dong
Hi,

Thank you very much for the explanation! I think the join order
benchmark I used [1] is somewhat representative; however, I probably
didn't use the most accurate cost estimation.

I find the cost from cheapest_total_path->total_cost is different
from the cost from  queryDesc->planstate->total_cost. What I saw was
that GEQO tends to form paths with lower
cheapest_total_path->total_cost (aka the fitness of the children).
However, standard_join_search is more likely to produce a lower
queryDesc->planstate->total_cost, which is the cost we get using
explain.

I wonder why those two total costs are different? If the total_cost
from the planstate is more accurate, could we use that instead as the
fitness in geqo_eval?

[1] https://github.com/gregrahn/join-order-benchmark
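
For reference, I'm capturing the first number with something like this (a
rough sketch of my instrumentation, assuming a join_search_hook is installed;
the second number is read later from the finished plan tree, e.g.
queryDesc->plannedstmt->planTree->total_cost):

    #include "postgres.h"
    #include "optimizer/paths.h"    /* standard_join_search, join_search_hook */

    /* sketch: log what GEQO would use as fitness for the chosen join order */
    static RelOptInfo *
    my_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
    {
        RelOptInfo *final_rel = standard_join_search(root, levels_needed,
                                                     initial_rels);

        elog(LOG, "cheapest_total_path->total_cost = %.2f",
             final_rel->cheapest_total_path->total_cost);
        return final_rel;
    }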

Regards,
Donald Dong

> On May 7, 2019, at 4:44 PM, Tom Lane  wrote:
> 
> Donald Dong  writes:
>> I was expecting the plans generated by standard_join_search to have lower costs
>> than the plans from GEQO. But the results I have from a join order
>> benchmark show that GEQO produces plans with lower costs most of the time!
> 
>> I wonder what is causing this observation? From my understanding,
>> standard_join_search is doing a complete search. So I'm not sure how the GEQO
>> managed to do better than that.
> 
> standard_join_search is *not* exhaustive; there's a heuristic that causes
> it not to consider clauseless joins unless it has to.
> 
> For the most part, GEQO uses the same heuristic (cf desirable_join()),
> but given the right sort of query shape you can probably trick it into
> situations where it will be forced to use a clauseless join when the
> core code wouldn't.  It'd still be surprising for that to come out with
> a lower cost estimate than a join order that obeys the heuristic,
> though.  Clauseless joins are generally pretty awful.
> 
> I'm a tad suspicious about the representativeness of your benchmark
> queries if you find this is happening "most of the time".
> 
>   regards, tom lane





Re: Why could GEQO produce plans with lower costs than the standard_join_search?

2019-05-22 Thread Donald Dong
On May 22, 2019, at 11:42 AM, Tom Lane  wrote:
> 
> Donald Dong  writes:
>> I find the cost from cheapest_total_path->total_cost is different
>> from the cost from  queryDesc->planstate->total_cost. What I saw was
>> that GEQO tends to form paths with lower
>> cheapest_total_path->total_cost (aka the fitness of the children).
>> However, standard_join_search is more likely to produce a lower
>> queryDesc->planstate->total_cost, which is the cost we get using
>> explain.
> 
>> I wonder why those two total costs are different? If the total_cost
>> from the planstate is more accurate, could we use that instead as the
>> fitness in geqo_eval?
> 
> You're still asking us to answer hypothetical questions unsupported
> by evidence.  In what case does that really happen?

Hi,

My apologies if this is not the minimal necessary setup. But here's
more information about what I saw using the following query
(JOB/1a.sql):

SELECT MIN(mc.note) AS production_note,
   MIN(t.title) AS movie_title,
   MIN(t.production_year) AS movie_year
FROM company_type AS ct,
 info_type AS it,
 movie_companies AS mc,
 movie_info_idx AS mi_idx,
 title AS t
WHERE ct.kind = 'production companies'
  AND it.info = 'top 250 rank'
  AND mc.note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%'
  AND (mc.note LIKE '%(co-production)%'
   OR mc.note LIKE '%(presents)%')
  AND ct.id = mc.company_type_id
  AND t.id = mc.movie_id
  AND t.id = mi_idx.movie_id
  AND mc.movie_id = mi_idx.movie_id
  AND it.id = mi_idx.info_type_id;

I attached the query plan and debug_print_rel output for GEQO and
standard_join_search.

        planstate->total_cost   cheapest_total_path
GEQO    54190.13                54239.03
STD     54179.02                54273.73

Here I observe GEQO  produces a lower
cheapest_total_path->total_cost, but its planstate->total_cost is higher
than what standard_join_search produces.

Regards,
Donald Dong

 Finalize Aggregate  (cost=54190.12..54190.13 rows=1 width=68)
   ->  Gather  (cost=54189.90..54190.11 rows=2 width=68)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=53189.90..53189.91 rows=1 width=68)
               ->  Nested Loop  (cost=15318.71..53189.55 rows=46 width=45)
                     Join Filter: (mc.movie_id = t.id)
                     ->  Hash Join  (cost=15318.28..53162.39 rows=46 width=32)
                           Hash Cond: (mc.company_type_id = ct.id)
                           ->  Parallel Hash Join  (cost=15317.21..53160.33 rows=185 width=36)
                                 Hash Cond: (mc.movie_id = mi_idx.movie_id)
                                 ->  Parallel Seq Scan on movie_companies mc  (cost=0.00..37814.90 rows=7320 width=32)
                                       Filter: (((note)::text !~~ '%(as Metro-Goldwyn-Mayer Pictures)%'::text) AND (((note)::text ~~ '%(co-production)%'::text) OR ((note)::text ~~ '%(presents)%'::text)))
                                 ->  Parallel Hash  (cost=15253.60..15253.60 rows=5089 width=4)
                                       ->  Hash Join  (cost=2.43..15253.60 rows=5089 width=4)
                                             Hash Cond: (mi_idx.info_type_id = it.id)
                                             ->  Parallel Seq Scan on movie_info_idx mi_idx  (cost=0.00..13685.15 rows=575015 width=8)
                                             ->  Hash  (cost=2.41..2.41 rows=1 width=4)
                                                   ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                                                         Filter: ((info)::text = 'top 250 rank'::text)
                           ->  Hash  (cost=1.05..1.05 rows=1 width=4)
                                 ->  Seq Scan on company_type ct  (cost=0.00..1.05 rows=1 width=4)
                                       Filter: ((kind)::text = 'production companies'::text)
                     ->  Index Scan using title_pkey on title t  (cost=0.43..0.58 rows=1 width=25)
                           Index Cond: (id = mi_idx.movie_id)
 Finalize Aggregate  (cost=54179.01..54179.02 rows=1 width=68)
   ->  Gather  (cost=54178.79..54179.00 rows=2 width=68)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=53178.79..53178.80 rows=1 width=68)
               ->  Nested Loop  (cost=37881.27..53178.44 rows=46 width=45)
                     Join Filter: (mc.movie_id = t.id)
                     ->  Parallel Hash Join  (cost=37880.84..53151.28 rows=46 width=32)
                           Hash Cond: (mi_idx.movie_id = mc.movie_id)
                           ->  Hash Join  (cost=2.43..15253.60

Re: Why could GEQO produce plans with lower costs than the standard_join_search?

2019-05-23 Thread Donald Dong
On May 23, 2019, at 9:02 AM, Tom Lane  wrote:
> 
> Donald Dong  writes:
>> On May 22, 2019, at 11:42 AM, Tom Lane  wrote:
>>> You're still asking us to answer hypothetical questions unsupported
>>> by evidence.  In what case does that really happen?
> 
>> I attached the query plan and debug_print_rel output for GEQO and
>> standard_join_search.
> 
>>         planstate->total_cost   cheapest_total_path
>> GEQO    54190.13                54239.03
>> STD     54179.02                54273.73
> 
>> Here I observe GEQO  produces a lower
>> cheapest_total_path->total_cost, but its planstate->total_cost is higher
>> than what standard_join_search produces.
> 
> Well,
> 
> (1) the plan selected by GEQO is in fact more expensive than
> the one found by the standard search.  Not by much --- as Andrew
> observes, this difference is less than what the planner considers
> "fuzzily the same" --- but nonetheless 54190.13 > 54179.02.
> 
> (2) the paths you show do not correspond to the finally selected
> plans --- they aren't even the same shape.  (The Gathers are in
> different places, to start with.)  I'm not sure where you were
> capturing the path data, but it looks like you missed top-level
> parallel-aggregation planning, and that managed to find some
> plans that were marginally cheaper than the ones you captured.
> Keep in mind that GEQO only considers join planning, not
> grouping/aggregation.
> 
> Andrew's point about fuzzy cost comparison is also a good one,
> though we needn't invoke it to explain these particular numbers.

Oh, that's very good to know! I captured the path at the end of the
join_search_hook. If I understood correctly, top-level
parallel-aggregation is applied later, so GEQO is not taking it
into consideration during the join search?

By looking at the captured costs, I thought GEQO found a better join
order than standard_join_search. However, the final plan using
the join order produced by GEQO turns out to be more expensive. Would
that imply that if GEQO sees a join order identical to the one
produced by standard_join_search, it will discard it because its
cheapest_total_path has a higher cost, even though the final plan may
be cheaper?

Here is another query (JOB/27a.sql) which has more significant cost
differences:

        planstate->total_cost   cheapest_total_path
GEQO    343016.77               343016.75
STD     342179.13               344137.33

Regards,
Donald Dong




Re: Why could GEQO produce plans with lower costs than the standard_join_search?

2019-05-23 Thread Donald Dong
On May 23, 2019, at 10:43 AM, Tom Lane  wrote:
> Donald Dong  writes:
>> On May 23, 2019, at 9:02 AM, Tom Lane  wrote:
>>> (2) the paths you show do not correspond to the finally selected
>>> plans --- they aren't even the same shape.  (The Gathers are in
>>> different places, to start with.)  I'm not sure where you were
>>> capturing the path data, but it looks like you missed top-level
>>> parallel-aggregation planning, and that managed to find some
>>> plans that were marginally cheaper than the ones you captured.
>>> Keep in mind that GEQO only considers join planning, not
>>> grouping/aggregation.
> 
>> By looking at the captured costs, I thought GEQO found a better join
>> order than standard_join_search. However, the final plan using
>> the join order produced by GEQO turns out to be more expensive. Would
>> that imply that if GEQO sees a join order identical to the one
>> produced by standard_join_search, it will discard it because its
>> cheapest_total_path has a higher cost, even though the final plan may
>> be cheaper?
> 
> I suspect what's really going on is that you're looking at the wrong
> paths.  The planner remembers more paths for each rel than just the
> cheapest-total-cost one, the reason being that total cost is not the
> only figure of merit.  The plan that is winning in the end, it looks
> like, is parallelized aggregation on top of a non-parallel join plan,
> but the cheapest_total_path uses up the opportunity for a Gather on
> a parallelized scan/join.  If we were just doing a scan/join and
> no aggregation, that path would have been the basis for the final
> plan, but it's evidently not being chosen here; the planner is going
> to some other scan/join path that is not parallelized.

It seems the paths in the final rel (path list, cheapest parameterized
paths, cheapest startup path, and cheapest total path) are all the
same path for this particular query (JOB/1a.sql). Am I missing
anything?

Since the total cost of the cheapest-total-path is what GEQO
currently uses to evaluate the fitness (which it minimizes), I expect
the cheapest-total-cost to measure how good a join order is. So it is
pretty surprising to me that a join order from standard_join_search,
with a higher cheapest-total-cost, ends up being better.

Perhaps the cheapest-total-cost should not be the best/only choice
for fitness?
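
For reference, my reading of the relevant part of geqo_eval (paraphrased as a
sketch, not the literal code):

    /* paraphrased sketch of geqo_eval(): the fitness of a tour is just the
     * total cost of the cheapest path of the join relation built for it */
    joinrel = gimme_tree(root, tour, num_gene);
    if (joinrel != NULL)
        fitness = joinrel->cheapest_total_path->total_cost;
    else
        fitness = DBL_MAX;      /* invalid tour */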

Regards,
Donald Dong




Different row estimations on base rels

2019-05-26 Thread Donald Dong
Hi,

I noticed the estimated rows of the base relations during the join
search are *very* different from the estimates in the final plan.

Join search (rows of the initial_rels):
RELOPTINFO (ct): rows=1 width=4
RELOPTINFO (it): rows=1 width=4
RELOPTINFO (mc): rows=17567 width=32
RELOPTINFO (mi_idx): rows=1380035 width=8
RELOPTINFO (t): rows=2528356 width=25

The final plan:
Seq Scan on company_type ct
  (cost=0.00..1.05 rows=1 width=4)
Seq Scan on info_type it
  (cost=0.00..2.41 rows=1 width=4)
Parallel Seq Scan on movie_companies mc
  (cost=0.00..37814.90 rows=7320 width=32)
Parallel Seq Scan on movie_info_idx mi_idx
  (cost=0.00..13685.15 rows=575015 width=8)
Index Scan using title_pkey on title t
  (cost=0.43..0.58 rows=1 width=25)

By looking at joinrel->rows, I would expect relation t to have
the largest size; however, this is not true at all. I wonder what's
causing this, and how to get estimates close to the
final plan?

Thank you,
Donald Dong




Re: Different row estimations on base rels

2019-05-29 Thread Donald Dong
On May 29, 2019, at 1:36 PM, Robert Haas  wrote:
> Well, it's all there in the code.  I believe the issue is that the
> final estimates are based on the number of rows that will be returned
> from the relation, which is often less, and occasionally more, than
> the total of the rows in the relation.  The reason it's often less is
> because there might be a WHERE clause or similar which rules out some
> of the rows.  The reason it might be more is because a nested loop
> could return the same rows multiple times.

Yes, indeed. I was confused, and I guess I could've thought about it
a bit more before posting here. Thank you for answering this
question!

Regards,
Donald Dong




Print baserestrictinfo for varchar fields

2019-05-29 Thread Donald Dong
Hi,

I noticed that debug_print_rel outputs "unknown expr" when the fields
in baserestrictinfo are typed as varchar.

create table tbl_a(id int, info varchar(32));

RELOPTINFO (tbl_a): rows=4 width=86
baserestrictinfo: unknown expr = pattern

My approach is to handle the RelabelType case in print_expr. After
the patch, I get:

RELOPTINFO (tbl_a): rows=4 width=86
baserestrictinfo: tbl_a.info = pattern

I wonder if this is a proper way of fixing it?
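
Roughly, the change looks like this (a sketch of the idea; the actual diff is
in the attached patch):

    /* in print_expr(): strip the RelabelType and print its argument */
    else if (IsA(expr, RelabelType))
    {
        const RelabelType *r = (const RelabelType *) expr;

        print_expr((const Node *) r->arg, rtable);
    }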

Thank you,
Donald Dong



001_print_baserestrictinfo_varchar.patch
Description: Binary data


undefined symbol: PQgssEncInUse

2019-05-29 Thread Donald Dong
Hi,

After I make temp-install on HEAD with a clean build, I fail to start
psql (tmp_install/usr/local/pgsql/bin/psql) and get an error message:

./psql: symbol lookup error: ./psql: undefined symbol: PQgssEncInUse

However, make check and other tests still work. For me, it was fine
until commit b0b39f72b9904bcb80f97b35837ccff1578aa4b8. I wonder if
this only occurs for me?

Thank you,
Donald Dong




Re: undefined symbol: PQgssEncInUse

2019-05-29 Thread Donald Dong
On May 29, 2019, at 8:23 PM, Paul Guo  wrote:
> Have you used the correct libpq library? If yes, you might want to check the 
> build logs and related files to see where is wrong. In my environment, it's 
> ok with both gssapi enabled or disabled.

Thank you! Resetting libpq's path fixes it.

Regards,
Donald Dong



Re: Ryu floating point output patch

2019-01-19 Thread Donald Dong


> On Jan 18, 2019, at 2:05 AM, Andrew Gierth  
> wrote:
> 
> BTW, doing that in a thread about a commitfest patch confuses the
> commitfest app and cfbot (both of which think it's a new version of the
> patch under discussion), so best avoided.

Oops. Thank you. Noted.

I think the previous additional-digits behavior still exists
in the latest patch. For example:
=# set extra_float_digits = 0;
SET
=# select float4in('1.123456789');
 float4in
--
  1.12346
(1 row)

=# set extra_float_digits = 1;
SET
=# select float4in('1.123456789');
 float4in
---
 1.1234568
(1 row)

The extra_float_digits is increased by 1, but two digits are
added. Without the patch, it will render ' 1.123457' instead.


Re: Ryu floating point output patch

2019-01-19 Thread Donald Dong


> On Jan 19, 2019, at 5:12 PM, Andrew Gierth  
> wrote:
> 
>>>>>> "Donald" == Donald Dong  writes:
> 
> Donald> I think the previous additional-digits behavior still exists
> Donald> in the latest patch. For example:
> 
> Donald> =# set extra_float_digits = 0;
> Donald> SET
> Donald> =# select float4in('1.123456789');
> Donald>  float4in
> Donald> --
> Donald>   1.12346
> Donald> (1 row)
> 
> Donald> =# set extra_float_digits = 1;
> Donald> SET
> Donald> =# select float4in('1.123456789');
> Donald>  float4in
> Donald> ---
> Donald>  1.1234568
> Donald> (1 row)
> 
> Donald> The extra_float_digits is increased by 1, but two digits are
> Donald> added.
> 
> That's intentional; this patch takes any value of extra_float_digits
> greater than 0 to mean "generate the shortest precise output".
> 
> In practice setting extra_float_digits to 1 is rare to nonexistent;
> almost all users of extra_float_digits set it to either 3 (to get
> precise values), 2 (for historical reasons since we didn't allow more
> than 2 in older versions), or less than 0 (to get rounded output for
> more convenient display when precision is not required).

Makes sense! Thank you for explaining.

I wonder if it's necessary to update the docs accordingly, such as
https://www.postgresql.org/docs/11/datatype-numeric.html#DATATYPE-FLOAT
and
https://www.postgresql.org/docs/11/runtime-config-client.html




Install JIT headers

2019-01-22 Thread Donald Dong
Hi,

In standard_planner, we set the proper JIT flags on the resulting
PlannedStmt node. We also offer a planner hook so extensions can
add customized planners. The JIT flags in jit/jit.h, however, are
not installed on the system. I think it's probably better to
install the jit headers?
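
For example, an extension's planner hook that wants to look at those flags
needs the header (a minimal sketch; the hook wiring is omitted and the
function name is just an illustration):

    #include "postgres.h"
    #include "optimizer/planner.h"
    #include "jit/jit.h"            /* currently not installed */

    static PlannedStmt *
    my_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
    {
        PlannedStmt *stmt = standard_planner(parse, cursorOptions, boundParams);

        if (stmt->jitFlags & PGJIT_PERFORM)
            elog(DEBUG1, "JIT will be used for this plan");

        return stmt;
    }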


diff --git a/src/include/Makefile b/src/include/Makefile
index 6bdfd7db91..041702809e 100644
--- a/src/include/Makefile
+++ b/src/include/Makefile
@@ -19,7 +19,7 @@ all: pg_config.h pg_config_ext.h pg_config_os.h
 # Subdirectories containing installable headers
 SUBDIRS = access bootstrap catalog commands common datatype \
executor fe_utils foreign \
-   lib libpq mb nodes optimizer parser partitioning postmaster \
+   jit lib libpq mb nodes optimizer parser partitioning postmaster \
regex replication rewrite \
statistics storage tcop snowball snowball/libstemmer tsearch \
tsearch/dicts utils port port/atomics port/win32 port/win32_msvc \

 
Regards,
Donald Dong



Analyze all plans

2019-01-23 Thread Donald Dong
Hi,

I'm working on an extension which analyzes all possible plans
generated by the planner. I believe this extension would be
useful for benchmarking the planner (e.g. the performance of the
estimation and the cost model) and better understanding the cases
where the planner makes a suboptimal move.

Here are my procedures:
1. Enumerate all the plans
   1.1 Create a hook for add_path so that it won't discard the
       expensive paths from the planner's point of view.
   1.2 Collect all the paths from final_rel->pathlist, and turn them
       into PlannedStmt nodes by patching standard_planner.
   1.3 Mark the cheapest path from the planner's point of view.

2. Explain all the plans
   2.1 First explain the cheapest plan
       2.1.1 If analyzing, collect the execution time and use it to set
             a timer interrupt.
   2.2 Explain the remaining plans
       2.2.1 The other plans can be disastrous; the plans may never
             finish in a reasonable amount of time. If analyzing, the
             timer interrupt shall stop the executor.
       2.2.2 Move on to the next plan

Are those procedures reasonable?

I'm able to implement all the steps except for 2.2.1.

- Attempt 1
Our most common way of handling the timeouts is to kill the
process. However, it would terminate the entire explain statement,
yet there're still plans to be explained.

- Attempt 2
Fork before explain so it would be possible to kill the child process
without disrupting the explain statement. However, simply
allocating shared memory for the QueryDesc would not work (PANIC:
failed to re-find shared lock object). To me, this plan is more
doable, but it seems there are other mechanisms I need to be
aware of in order to execute a plan in the child process and report
the result in the parent process?

What do you think?  I will appreciate any feedback.

Thank you,
Donald Dong


Re: Analyze all plans

2019-01-23 Thread Donald Dong
On Jan 23, 2019, at 6:46 AM, Tom Lane  wrote:
> 
> Oleksandr Shulgin  writes:
>> On Wed, Jan 23, 2019 at 9:44 AM Donald Dong  wrote:
>>> 1. Enumerate all the plans
> 
>> So enumerating all possible plans stops being practical for even slightly
>> complicated queries.
> 
> Yes.  You can *not* disable the planner's aggressive pruning of losing
> paths and subpaths without ending up with something that's completely
> impractical for anything beyond toy queries.  That's just from the
> standpoint of planner runtime.  Adding on the cost of actually creating
> a finished plan and then running it for long enough to get a reliable
> runtime for each case would ... well, let's just say you'd be safely dead
> before you got any interesting answers.

Thank you for the feedback. I didn't think about this carefully. I
thought the planner would use GEQO when the number of joins is
large, but indeed it's still n! for normal path selection.

Now I keep top-k cheapest sub plans, where k is configured
by users. If the k is set up properly, setting a timeout may not
be necessary. However, if I do want to set a timeout, I would run
into the issues I had in step 2.

I'm looking forward to hearing more thoughts :)

Thanks again!



Actual Cost

2019-02-16 Thread Donald Dong
Hi,

When explaining a query, I think knowing the actual rows and pages in addition
to the operation type (e.g. seqscan) would be enough to calculate the actual
cost. The actual cost metric could be useful when we want to look into how far
off the planner's estimate is, and the correlation between time and cost. Would
it be a feature worth considering?

Thank you,
Donald Dong


Re: Actual Cost

2019-02-16 Thread Donald Dong
On Feb 16, 2019, at 6:44 PM, Tomas Vondra wrote:
> 
> On 2/17/19 3:40 AM, David Fetter wrote:
>> 
>> As someone not volunteering to do any of the work, I think it'd be a
>> nice thing to have.  How large an effort would you guess it would be
>> to build a proof of concept?
> 
> I don't quite understand what is meant by "actual cost metric" and/or
> how is that different from running EXPLAIN ANALYZE.

Here is an example:

Hash Join  (cost=3.92..18545.70 rows=34 width=32) (actual cost=3.92..18500 time=209.820..1168.831 rows=47 loops=3)

Now we have the actual time. Time can have a high variance (a change
in system load, or just noise), but I think the actual cost would be
less likely to change due to external factors.

On 2/17/19 3:40 AM, David Fetter wrote:
> On Sat, Feb 16, 2019 at 03:10:44PM -0800, Donald Dong wrote:
>> Hi,
>> 
>> When explaining a query, I think knowing the actual rows and pages
>> in addition to the operation type (e.g. seqscan) would be enough to
>> calculate the actual cost. The actual cost metric could be useful
>> when we want to look into how far off the planner's estimate is, and
>> the correlation between time and cost. Would it be a feature worth
>> considering?
> 
> As someone not volunteering to do any of the work, I think it'd be a
> nice thing to have.  How large an effort would you guess it would be
> to build a proof of concept?

Intuitively it does not feel very complicated to me, but I think the
interface when we're planning (PlannerInfo, RelOptInfo) is different
from the interface when we're explaining a query (QueryDesc). Since
I'm very new, it would probably take me many iterations to get to the
right point. But still, I'm happy to work on a proof of concept if no
one else wants to work on it.

regards,
Donald Dong


Re: Actual Cost

2019-02-16 Thread Donald Dong
On Feb 16, 2019, at 9:31 PM, Tom Lane  wrote:
> 
> Donald Dong  writes:
>> On Feb 16, 2019, at 6:44 PM, Tomas Vondra wrote:
>>> I don't quite understand what is meant by "actual cost metric" and/or
>>> how is that different from running EXPLAIN ANALYZE.
> 
>> Here is an example:
> 
>> Hash Join  (cost=3.92..18545.70 rows=34 width=32) (actual cost=3.92..18500 time=209.820..1168.831 rows=47 loops=3)
> 
>> Now we have the actual time. Time can have a high variance (a change
>> in system load, or just noise), but I think the actual cost would be
>> less likely to change due to external factors.
> 
> I'm with Tomas: you have not explained what you think those
> numbers mean.

Yeah, I was considering the actual cost to be the output of the cost
model given the actual rows and pages after we instrument the
execution: plug in the values which are no longer estimations.

For a hash join, we could use the actual inner_rows_total to get the
actual cost. For a seqscan, we can use the actual rows to get the
actual CPU cost.
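
To make the seqscan case concrete, this is roughly the computation I have in
mind (a sketch mirroring the shape of cost_seqscan, ignoring qual costs and
parallelism; the function name is mine):

    #include "postgres.h"
    #include "optimizer/cost.h"     /* seq_page_cost, cpu_tuple_cost */

    /* sketch: re-run the planner's seq scan formula with instrumented counts */
    static Cost
    actual_cost_seqscan(double actual_pages, double actual_rows)
    {
        Cost    disk_run_cost = seq_page_cost * actual_pages;
        Cost    cpu_run_cost = cpu_tuple_cost * actual_rows;

        return disk_run_cost + cpu_run_cost;
    }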

regards,
Donald Dong




Re: Actual Cost

2019-02-17 Thread Donald Dong
On Feb 17, 2019, at 10:56 AM, Tom Lane  wrote:
> Perhaps, but refactoring to get that seems impractically invasive &
> expensive, since e.g. index AM cost estimate functions would have to
> be redefined, plus we'd have to carry around some kind of cost vector
> rather than single numbers for every Path ...

Maybe we could walk through the final plan tree and fill in the expression,
with another tree structure to hold the cost vectors?

On Feb 17, 2019, at 8:29 AM, Jeff Janes  wrote:
> I don't think there is any way to assign an actual cost.  For example how do 
> you know if a buffer read was "actually" seq_page_cost or random_page_cost?  
> And if there were a way, it too would have a high variance.

Thanks for pointing that out! I think it's probably fine to use the same 
assumptions as the cost model? For example, we assume 3/4ths of accesses are 
sequential, 1/4th are not.

> What would I find very useful is a verbosity option to get the cost estimates 
> expressed as a multiplier of each *_cost parameter, rather than just as a 
> scalar.

Yeah, such an expression would also help if we want to plug in the actual values.


On Feb 17, 2019, at 4:11 AM, Tomas Vondra  wrote:
> Perhaps I'm just too used to comparing the rows/pages directly, but I
> don't quite see the benefit of having such "actual cost". Mostly because
> the cost model is rather rough anyway.

Yeah, I think the "actual cost" is only "actual" for the cost model - the cost
it would output given the exact row/page numbers. Some articles/papers have
shown that row estimation is the primary reason for planners to go wrong, so
showing the actual cost (or the updated assumptions) might also be useful if
people want to compare different plans against a more accurate quantitative
measure.

regards,
Donald Dong


Re: Actual Cost

2019-02-17 Thread Donald Dong
On Feb 17, 2019, at 11:05 AM, Donald Dong  wrote:
> 
> On Feb 17, 2019, at 10:56 AM, Tom Lane  wrote:
>> Perhaps, but refactoring to get that seems impractically invasive &
>> expensive, since e.g. index AM cost estimate functions would have to
>> be redefined, plus we'd have to carry around some kind of cost vector
>> rather than single numbers for every Path ...
> 
> Maybe we could walk through the final plan tree and fill in the expression,
> with another tree structure to hold the cost vectors?

Here is a draft patch. I added a new structure called CostInfo to the Plan
node. The CostInfo is added in create_plan, and the cost calculation is
centered on CostInfo. Is this a reasonable approach?
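
Very roughly, the shape I have in mind is something like this (hypothetical
field names for illustration; the actual definition is in the attached patch):

    /* hypothetical sketch of a per-node cost breakdown */
    typedef struct CostInfo
    {
        Cost    startup_cost;   /* cost before the first tuple */
        Cost    run_cost;       /* cost to produce all tuples */
        double  rows;           /* row count the costs were derived from */
        double  pages;          /* page count the costs were derived from */
    } CostInfo;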

Thank you,
Donald Dong



01_actual_cost_seqscan_001.patch
Description: Binary data




Implicit make rules break test examples

2018-12-31 Thread Donald Dong
Hi,

In src/test/examples, the implicit make rules produce errors:

make -C ../../../src/backend generated-headers
make[1]: Entering directory '/home/ddong/postgresql/bld/src/backend'
make -C catalog distprep generated-header-symlinks
make[2]: Entering directory '/home/ddong/postgresql/bld/src/backend/catalog'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/ddong/postgresql/bld/src/backend/catalog'
make -C utils distprep generated-header-symlinks
make[2]: Entering directory '/home/ddong/postgresql/bld/src/backend/utils'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/ddong/postgresql/bld/src/backend/utils'
make[1]: Leaving directory '/home/ddong/postgresql/bld/src/backend'
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -Wno-format-truncation -O2
-I/home/ddong/postgresql/bld/../src/interfaces/libpq
-I../../../src/include -I/home/ddong/postgresql/bld/../src/include
-D_GNU_SOURCE   -c -o testlibpq.o
/home/ddong/postgresql/bld/../src/test/examples/testlibpq.c
gcc -L../../../src/port -L../../../src/common -L../../../src/common
-lpgcommon -L../../../src/port -lpgport
-L../../../src/interfaces/libpq -lpq   -Wl,--as-needed
-Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  testlibpq.o   -o
testlibpq
testlibpq.o: In function `exit_nicely':
testlibpq.c:(.text.unlikely+0x5): undefined reference to `PQfinish'
testlibpq.o: In function `main':
testlibpq.c:(.text.startup+0x22): undefined reference to `PQconnectdb'
testlibpq.c:(.text.startup+0x2d): undefined reference to `PQstatus'
testlibpq.c:(.text.startup+0x44): undefined reference to `PQexec'
…

I think the -lpq flag does not have any effect in the middle of the
argument list. It works if I move the flag to the end:

gcc -L../../../src/port -L../../../src/common -L../../../src/common
-lpgcommon -L../../../src/port -lpgport
-L../../../src/interfaces/libpq -Wl,--as-needed
-Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  testlibpq.o   -o
testlibpq -lpq

So I added an explicit rule to rearrange the flags:

gcc testlibpq.o -o testlibpq -L../../../src/port -L../../../src/common
-L../../../src/common -lpgcommon -L../../../src/port -lpgport
-L../../../src/interfaces/libpq -lpq   -Wl,--as-needed
-Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags

Then the make command works as expected. This is my first time writing
a patch. Please let me know what you think!

Thank you,
Happy new year!
Donald Dong



Re: Implicit make rules break test examples

2018-12-31 Thread Donald Dong
On Mon, Dec 31, 2018 at 11:24 PM Donald Dong  wrote:
>
> Hi,
>
> In src/test/examples, the implicit make rules produce errors:
>
> make -C ../../../src/backend generated-headers
> make[1]: Entering directory '/home/ddong/postgresql/bld/src/backend'
> make -C catalog distprep generated-header-symlinks
> make[2]: Entering directory '/home/ddong/postgresql/bld/src/backend/catalog'
> make[2]: Nothing to be done for 'distprep'.
> make[2]: Nothing to be done for 'generated-header-symlinks'.
> make[2]: Leaving directory '/home/ddong/postgresql/bld/src/backend/catalog'
> make -C utils distprep generated-header-symlinks
> make[2]: Entering directory '/home/ddong/postgresql/bld/src/backend/utils'
> make[2]: Nothing to be done for 'distprep'.
> make[2]: Nothing to be done for 'generated-header-symlinks'.
> make[2]: Leaving directory '/home/ddong/postgresql/bld/src/backend/utils'
> make[1]: Leaving directory '/home/ddong/postgresql/bld/src/backend'
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -fexcess-precision=standard -Wno-format-truncation -O2
> -I/home/ddong/postgresql/bld/../src/interfaces/libpq
> -I../../../src/include -I/home/ddong/postgresql/bld/../src/include
> -D_GNU_SOURCE   -c -o testlibpq.o
> /home/ddong/postgresql/bld/../src/test/examples/testlibpq.c
> gcc -L../../../src/port -L../../../src/common -L../../../src/common
> -lpgcommon -L../../../src/port -lpgport
> -L../../../src/interfaces/libpq -lpq   -Wl,--as-needed
> -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  testlibpq.o   -o
> testlibpq
> testlibpq.o: In function `exit_nicely':
> testlibpq.c:(.text.unlikely+0x5): undefined reference to `PQfinish'
> testlibpq.o: In function `main':
> testlibpq.c:(.text.startup+0x22): undefined reference to `PQconnectdb'
> testlibpq.c:(.text.startup+0x2d): undefined reference to `PQstatus'
> testlibpq.c:(.text.startup+0x44): undefined reference to `PQexec'
> …
>
> I think the -lpq flag does not have any effect in the middle of the
> argument list. It works if I move the flag to the end:
>
> gcc -L../../../src/port -L../../../src/common -L../../../src/common
> -lpgcommon -L../../../src/port -lpgport
> -L../../../src/interfaces/libpq -Wl,--as-needed
> -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  testlibpq.o   -o
> testlibpq -lpq
>
> So I added an explicit rule to rearrange the flags:
>
> gcc testlibpq.o -o testlibpq -L../../../src/port -L../../../src/common
> -L../../../src/common -lpgcommon -L../../../src/port -lpgport
> -L../../../src/interfaces/libpq -lpq   -Wl,--as-needed
> -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags
>
> Then the make command works as expected. This is my first time writing
> a patch. Please let me know what you think!
>
> Thank you,
> Happy new year!
> Donald Dong



-- 
Donald Dong


explicit_make_rule_test_example_v1.patch
Description: Binary data


Re: Implicit make rules break test examples

2019-01-01 Thread Donald Dong
Thank you for the explanation! That makes sense. It is strange that it does
not work for me.


> What platform are you on exactly, and what toolchain (gcc and ld
> versions) are you using?


I'm using Ubuntu 18.04.1 LTS.

gcc version:
gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0

ld version:
GNU ld (GNU Binutils for Ubuntu) 2.30

Regards,
Donald Dong


On Tue, Jan 1, 2019 at 9:54 AM Tom Lane  wrote:

> Donald Dong  writes:
> > In src/test/example, the implicit make rules produce errors:
>
> Hm.  "make" in src/test/examples works fine for me.
>
> The only way I can account for the results you're showing is if your
> linker is preferring libpq.a to libpq.so, so that reading the library
> before the *.o files causes none of it to get pulled in.  But that
> isn't the default behavior on any modern platform AFAIK, and certainly
> isn't considered good practice these days.  Moreover, if that's what's
> happening, I don't see how you would have managed to build PG at all,
> because there are a lot of other places where our Makefiles write
> $(LDFLAGS) before the *.o files they're trying to link.  Maybe we
> shouldn't have done it like that, but it's been working for everybody
> else.
>
> What platform are you on exactly, and what toolchain (gcc and ld
> versions) are you using?
>
> regards, tom lane
>


-- 
Donald Dong


Re: Implicit make rules break test examples

2019-01-01 Thread Donald Dong
> I observe that ecpg's Makefile.regress is already doing #3:
> %: %.o
> $(CC) $(CFLAGS) $< $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
> so what we'd be talking about is moving that to some more global spot,
> probably Makefile.global.  (I wonder why the target is not specified
> as $@$(X) here?)


Thank you for pointing that out!
I think #3 is a better choice since it's less invasive and would
potentially avoid similar problems in the future. I think it may be worth
the risk of breaking some extensions. I moved the rule to
Makefile.global and added $(X) in case it's set.

I also updated the order in Makefile.linux in the same patch since
they have the same cause. I'm not sure if changes are necessary for
other platforms, and I am not able to test it, unfortunately.

I've built it again on Ubuntu and tested src/test/examples and
src/interfaces/libpq/test. There are no errors.

Thank you again for the awesome explanation,
Donald Dong

On Tue, Jan 1, 2019 at 11:54 AM Tom Lane  wrote:
>
> Donald Dong  writes:
> > Thank you for the explanation! That makes sense. It is strange that it does
> > not work for me.
>
> Yeah, I still can't account for the difference in behavior between your
> platform and mine (I tried several different ones here, and they all
> manage to build src/test/examples).  However, I'm now convinced that
> we do have an issue, because I found another place that does fail on my
> platforms: src/interfaces/libpq/test gives failures like
>
> uri-regress.o: In function `main':
> uri-regress.c:58: undefined reference to `pg_printf'
>
> the reason being that the link command lists -lpgport before
> uri-regress.o, and since we only make the .a flavor of libpgport, it's
> definitely going to be order-sensitive.  (This has probably been busted
> since we changed over to using snprintf.c everywhere, but nobody noticed
> because this test isn't normally run.)
>
> The reason we haven't noticed this already seems to be that in all the
> places where it matters, we have explicit link rules that order things
> like this:
>
> $(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
>
> However, the places that are having problems are trying to rely on
> gmake's implicit rule, which according to their manual is
>
>   Linking a single object file
>  `N' is made automatically from `N.o' by running the linker
>  (usually called `ld') via the C compiler.  The precise command
>  used is `$(CC) $(LDFLAGS) N.o $(LOADLIBES) $(LDLIBS)'.
>
> So really the problem here is that we're doing the wrong thing by
> injecting -l switches into LDFLAGS; it would be more standard to
> put them into LDLIBS (or maybe LOADLIBES, though I think that's
> not commonly used).
>
> I hesitate to try to change that though.  The places that are messing with
> that are injecting both -L and -l switches, and we want to keep putting
> the -L switches into LDFLAGS because of the strong likelihood that the
> initial (autoconf-derived) value of LDFLAGS contains -L switches; our
> switches pointing at within-tree directories need to come first.
> So the options seem to be:
>
> 1. Redefine our makefile conventions as being that internal -L switches
> go into LDFLAGS_INTERNAL but internal -l switches go into LDLIBS_INTERNAL,
> and we use the same recursive-expansion dance for LDLIBS[_INTERNAL] as for
> LDFLAGS[_INTERNAL], and we have to start mentioning LDLIBS in our explicit
> link rules.  This would be a pretty invasive patch, I'm afraid, and would
> almost certainly break some third-party extensions' Makefiles.  It'd be
> the cleanest solution in a green field, I think, but our Makefiles are
> hardly a green field.
>
> 2. Be sure to override the gmake implicit link rule with an explicit
> link rule everywhere --- basically like your patch, but touching more
> places.  This seems like the least risky alternative in the short
> run, but we'd be highly likely to reintroduce such problems in future.
>
> 3. Replace the default implicit link rule with one of our own.
> Conceivably this also breaks some extensions' Makefiles, though
> I think that's less likely.
>
> I observe that ecpg's Makefile.regress is already doing #3:
>
> %: %.o
> $(CC) $(CFLAGS) $< $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@
>
> so what we'd be talking about is moving that to some more global spot,
> probably Makefile.global.  (I wonder why the target is not specified
> as $@$(X) here?)
>
> I notice that the platform-specific makefiles in src/makefiles
> are generally also getting it wrong in their implicit rules for
> building shlibs, eg in Makefi

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-09 Thread Donald Dong
Hi Michael,

Thank you for the information!

> On Dec 11, 2018, at 9:55 PM, Michael Paquier  wrote:
> 
> Well, the conninfo and slot name accessible to the user are the values
> available only once the information of the WAL receiver in shared memory
> is ready to be displayed.  Relying more on the GUC context has the
> advantage to never have in shared memory the original string and only
> store the clobbered one, which actually simplifies what's stored in
> shared memory because you can entirely remove ready_to_display (I forgot
> to remove that in the patch actually).  If those parameters become
> reloadable, you actually rely only on what the param context has, and
> not on what the shared memory context may have or not.

I wonder why we need to have this addition?

src/backend/replication/walreceiver.c
+   memset(walrcv->slotname, 0, NAMEDATALEN);
+   if (PrimarySlotName)
+       strlcpy((char *) walrcv->slotname, PrimarySlotName, NAMEDATALEN);


src/backend/replication/walreceiver.c
288: PrimaryConnInfo == NULL || PrimaryConnInfo[0] == '\0'
291: errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined")));
392: PrimarySlotName && PrimarySlotName[0] != '\0'

src/backend/access/transam/xlog.c
11824: * If primary_conninfo is set, launch walreceiver to try
11833: PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0

I notice these patterns appear in different places. Maybe it would be better to
have a common helper such as pg_str_empty()?
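
For example (just a sketch of the suggestion; pg_str_empty is a hypothetical
name, not an existing helper):

    /* hypothetical helper: treat NULL and "" the same way */
    #define pg_str_empty(s) ((s) == NULL || (s)[0] == '\0')

Then the checks above could all be written as pg_str_empty(PrimaryConnInfo)
or !pg_str_empty(PrimarySlotName).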

Regards,
Donald Dong


Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-09 Thread Donald Dong


On Jan 9, 2019, at 4:47 PM, Michael Paquier  wrote:
> That's much cleaner to me to clean up the field for safety before
> starting the process.  When requesting a WAL receiver to start,
> slotname and conninfo gets zeroed before doing anything, you are right
> that we could do without it actually.

Cool! I agree that cleaning up the field would make the logic cleaner.

> That's a pattern used quite a lot for many GUCs, so that's quite
> separate from this patch.  Perhaps it would make sense to have a macro
> for that purpose for GUCs, I am not sure if that's worth it though.

Yeah, I think having such a macro would make the code a bit cleaner, though the
duplication of logic is not too significant.





Re: Unified logging system for command-line programs

2019-01-09 Thread Donald Dong
I think this patch is a nice improvement!

On Jan 3, 2019, at 2:08 PM, Andres Freund  wrote:
> Or we could just add an ERROR variant that doesn't exit. Years back
> I'd proposed that we make the log level a bitmask, but it could also
> just be something like CALLSITE_ERROR or something roughly along those
> lines.  There's a few cases in backend code where that'd be beneficial
> too.

I think the logging system can also be applied to pg_regress. Perhaps even
to external frontend applications?

The patch cannot be applied directly on HEAD. So I patched it on top of 
60d99797bf. When I call pg_log_error() in initdb, I see

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62
62  ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62
#1  0x55568f96 in dopr.constprop ()
#2  0x55569ddb in pg_vsnprintf ()
#3  0x55564236 in pg_log_generic ()
#4  0xc240 in main ()

I'm not sure what would be causing this behavior. I would appreciate
references or docs for testing and debugging patches more efficiently.
Now I'm having difficulties loading symbols of initdb in gdb.

Thank you,
Donald Dong



How does the planner determine plan_rows ?

2019-01-10 Thread Donald Dong
Hi,

I created some empty tables and ran `EXPLAIN ANALYZE` on `SELECT *`. I found
the results have different row estimates, but the tables are all empty.

=# CREATE TABLE t1(id INT, data INT);
=# EXPLAIN ANALYZE SELECT * FROM t1;
  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual
  time=0.003..0.003 rows=0 loops=1)

=# CREATE TABLE t2(data VARCHAR);
=# EXPLAIN ANALYZE SELECT * FROM t2;
  Seq Scan on t2  (cost=0.00..23.60 rows=1360 width=32) (actual
  time=0.002..0.002 rows=0 loops=1)

=# CREATE TABLE t3(id INT, data VARCHAR);
=# EXPLAIN ANALYZE SELECT * FROM t3;
  Seq Scan on t3  (cost=0.00..22.70 rows=1270 width=36) (actual
  time=0.001..0.001 rows=0 loops=1)

I found this behavior unexpected. I'm still trying to find out how/where the 
planner
determines the plan_rows. Any help will be appreciated!

Thank you,
Donald Dong


Re: How does the planner determine plan_rows ?

2019-01-10 Thread Donald Dong
Thank you for the great explanation!

> On Jan 10, 2019, at 7:48 PM, Andrew Gierth  
> wrote:
> 
>>>>>> "Donald" == Donald Dong  writes:
> 
> Donald> Hi,
> Donald> I created some empty tables and ran `EXPLAIN ANALYZE` on
> Donald> `SELECT *`. I found the results have different row estimates,
> Donald> but the tables are all empty.
> 
> Empty tables are something of a special case, because the planner
> doesn't assume that they will _stay_ empty, and using an estimate of 0
> or 1 rows would tend to create a distorted plan that would likely blow
> up in runtime as soon as you insert a second row.
> 
> The place to look for info would be estimate_rel_size in
> optimizer/util/plancat.c, from which you can see that empty tables get
> a default size estimate of 10 pages. Thus:
> 
> Donald> =# CREATE TABLE t1(id INT, data INT);
> Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
> Donald>   Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual
> Donald>   time=0.003..0.003 rows=0 loops=1)
> 
> An (int,int) tuple takes about 36 bytes, so you can get about 226 of
> them on a page, so 10 pages is 2260 rows.
> 
> Donald> =# CREATE TABLE t2(data VARCHAR);
> Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
> Donald>   Seq Scan on t2  (cost=0.00..23.60 rows=1360 width=32) (actual
> Donald>   time=0.002..0.002 rows=0 loops=1)
> 
> Size of a varchar with no specified length isn't known, so the planner
> determines an average length of 32 by the time-honoured method of rectal
> extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
> page.
> 
> -- 
> Andrew (irc:RhodiumToad)




Re: How does the planner determine plan_rows ?

2019-01-10 Thread Donald Dong


> On Jan 10, 2019, at 8:01 PM, Tom Lane  wrote:
> 
> ... estimate_rel_size() in plancat.c is where to look to find out
> about the planner's default estimates when it's lacking hard data.

Thank you! Now I see how the planner uses the rows to estimate the cost and
generates the best_plan.

To me, tracing the function calls is not a simple task. I'm using cscope, and I
use printf when I'm not entirely sure. I was considering using gdb, but I'm
having issues referencing the source code in gdb.

I'm very interested to learn how the professionals explore the codebase!


Re: Unified logging system for command-line programs

2019-01-11 Thread Donald Dong


> On Jan 11, 2019, at 9:14 AM, Peter Eisentraut 
>  wrote:
> 
>> The patch cannot be applied directly on HEAD. So I patched it on top of 
>> 60d99797bf.
> 
> Here is an updated patch with the merge conflicts of my own design
> resolved.  No functionality changes.
> 
>> When I call pg_log_error() in initdb, I see
>> 
>> Program received signal SIGSEGV, Segmentation fault.
>> __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62
>> 62  ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
>> (gdb) bt
>> #0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62
>> #1  0x55568f96 in dopr.constprop ()
>> #2  0x55569ddb in pg_vsnprintf ()
>> #3  0x55564236 in pg_log_generic ()
>> #4  0xc240 in main ()
> 
> What do you mean exactly by "I call pg_log_error()"?  The existing calls
> in initdb clearly work, at least some of them, that is covered by the
> test suite.  Are you adding new calls?

Thank you. I did add a new call for my local testing. There are no more errors
after re-applying the patch on master.

>> I'm not sure what would be causing this behavior. I would appreciate
>> references or docs for testing and debugging patches more efficiently.
>> Now I'm having difficulties loading symbols of initdb in gdb.
> 
> The above looks like you'd probably get a better insight by compiling
> with -O0 or some other lower optimization setting.
> 
> There is also this:
> https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

Thank you for the reference. That's very helpful!


I noticed that in some places, such as

    pg_log_error("no data directory specified");
    fprintf(stderr,
            _("You must identify the directory where the data for this database system\n"
    ...

and

    pg_log_warning("enabling \"trust\" authentication for local connections");
    fprintf(stderr, _("You can change this by editing pg_hba.conf or using the option -A, or\n"
                      "--auth-local and --auth-host, the next time you run initdb.\n"));

pg_log does not completely replace fprintf. Would it be better to use pg_log
so the logging level can also filter these messages?




Re: Ryu floating point output patch

2019-01-13 Thread Donald Dong


> On Jan 11, 2019, at 10:40 AM, Andres Freund  wrote:
> 
> And of course it'd change the dump's text contents between ryu and
> non-ryu backends even with extra_float_digits = 3, but the resulting
> floats ought to be the same. It's just that ryu is better at figuring
> out what the minimal text representation is than the current code.

+1. I spent some time reading the Ryu paper, and I'm convinced.

I think Ryu will be a good choice if we want to have as many digits as possible
(and roundtrip-safe). Since we have extra_float_digits and ndig, Ryu may
give us more digits than we asked for.

Maybe one way to respect ndig is to clear the last few digits of (a, b, c)
(this is the notation in the paper; I think the code uses different variable
names) in base 10, around
https://github.com/ulfjack/ryu/blob/b2ae7a712937711375a79f894106a0c6d92b035f/ryu/d2s.c#L264-L266

  // Step 3: Convert to a decimal power base using 128-bit arithmetic.
  uint64_t vr, vp, vm;
  int32_t e10;

so that Ryu generates the desired string?

Also, I wonder why we need to make this change:

-int        extra_float_digits = 0; /* Added to DBL_DIG or FLT_DIG */
+int        extra_float_digits = 1; /* Added to DBL_DIG or FLT_DIG */


Re: Ryu floating point output patch

2019-01-16 Thread Donald Dong

> On Jan 15, 2019, at 2:37 AM, Andrew Gierth  
> wrote:
> 
> Andres> strtod()'s job ought to computationally be significantly easier
> Andres> than the other way round, no? And if there's buggy strtod()
> Andres> implementations out there, why would they be guaranteed to do
> Andres> the correct thing with our current output, but not with ryu's?
> 
> Funny thing: I've been devoting considerable effort to testing this, and
> the one failure mode I've found is very interesting; it's not a problem
> with strtod(), in fact it's a bug in our float4in caused by _misuse_ of
> strtod().

Hi,

I'm trying to reproduce the results by calling float4in in my test code. But I
have some difficulties linking the code:

testfloat4.c:(.text+0x34): undefined reference to `float4in'
testfloat4.c:(.text+0x3c): undefined reference to `DirectFunctionCall1Coll'

I tried offering float.o to the linker in addition to my test program. I also
tried to link all the objects (*.o). But the errors still exist. I attached my 
changes as a patch.

I wonder if creating separate test programs is a good way of testing the code,
and I'm interested in learning what I missed.
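
For context, the call I'm trying to exercise looks roughly like this (a
sketch; parse_float4 is just a name I made up, and this only links when built
as backend/extension code rather than a standalone program):

    /* sketch: invoking float4in through the fmgr, as backend code would */
    static float4
    parse_float4(const char *str)
    {
        Datum   d = DirectFunctionCall1(float4in, CStringGetDatum(str));

        return DatumGetFloat4(d);
    }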

Thank you.



test_float4.patch
Description: Binary data