Hi kingofthecity,
>
> But when I change the result-mode from TABLE to CHANGELOG,
> there are 4 rows for each update like:
>
 The results are expected when the result-mode is `CHANGELOG`.

> SELECT sector, avg(`value`) as `index` FROM stock INNER JOIN metadata ON
> stock.id=metadata.id GROUP BY sector;
>
In this case, the topology is source -> join -> aggregate.
For an update for a stock source, let's say (id1, 1) -> (id1,2).
When the input record (id1, 2) arrives at join operator. the output of join
would be two record:
-U (id1,  1, 'Technology'),
+ U(id1, 2, 'Technology')
Those two record arrives at downstream aggregate, each record would leads
to two output record (-U, +U). so each update at source would lead to 4
output records.
Hope this helps.

Best,
JING ZHANG



thekingofcity <kingofthec...@protonmail.com> 于2021年10月20日周三 下午5:52写道:

> Hi,
>
> I'm working on a simple sql that contains aggregation.
>
> The results looks fine in the `SQL Query Result (Table)`
> but looks weird when I change the result-mode to CHANGELOG.
>
> ```
> CREATE TABLE stock (
>   id VARCHAR(10) NOT NULL PRIMARY KEY,
>   `value` DOUBLE NOT NULL
> ) WITH (
>   'connector' = 'pravega',
>   'controller-uri' = 'tcp://pravega:9090',
>   'scope' = 'stock',
>   'scan.execution.type' = 'streaming',
>   'scan.streams' = 'dbserver1.stock.stock',
>   'format' = 'debezium-json'
> );
>
> CREATE TABLE metadata (
>   id VARCHAR(10) NOT NULL PRIMARY KEY,
>   sector STRING NOT NULL
> ) WITH (
>   'connector' = 'pravega',
>   'controller-uri' = 'tcp://pravega:9090',
>   'scope' = 'stock',
>   'scan.execution.type' = 'streaming',
>   'scan.streams' = 'dbserver1.stock.metadata',
>   'format' = 'debezium-json'
> );
>
> SELECT sector, avg(`value`) as `index` FROM stock INNER JOIN metadata ON
> stock.id=metadata.id GROUP BY sector;
> ```
>
> The sql should return a continuously updating stock average numbers with
> small changes:
>
> And this is exactly what I see in the `SQL Query Result (Table)`.
>
> ```
>                          sector                          index
>                      Technology             115.95986429850261
>               Consumer Cyclical              482.5149917602539
>                     Industrials             167.49693044026694
> ```
>
> But when I change the result-mode from TABLE to CHANGELOG,
> there are 4 rows for each update like:
>
> ```
> -U[Technology, 115.77986653645833]
> +U[Technology, 103.54979705810547]
> -U[Technology, 103.54979705810547]
> +U[Technology, 115.81319681803386]
>
> -U[Technology, 115.81319681803386]
> +U[Technology, 103.42479705810547]
> -U[Technology, 103.42479705810547]
> +U[Technology, 115.82653299967448]
>
> -U[Technology, 115.82653299967448]
> +U[Technology, 140.48730087280273]
> -U[Technology, 140.48730087280273]
> +U[Technology, 115.84153493245442]
>
> -U[Industrials, 167.40963745117188]
> +U[Industrials, 140.22450256347656]
> -U[Industrials, 140.22450256347656]
> +U[Industrials, 167.43633524576822]
> ```
>
> > I insert blank line every 4 lines to get a better visualization.
>
> The technology sector should mostly be around 115 +/- 5,
> but there are 103, 140, and 167 which are wired since stock value can't
> fluctuate like this
> and I can be sure because TABLE and my own python code shows the same
> correct (small changes) outcome.
>
> With many thanks,
> kingofthecity
>
>

Reply via email to