Hi Maciek,

Could you bypass the MATCH_RECOGNIZE (i.e. comment it out) and check whether
the records appear in the output of the shortened query?

I suspect that they may be filtered out earlier (for example, because of
number conversion issues with 0E-18).
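
To help rule the value itself in or out, here is a minimal sketch in plain
Java (no Flink involved). It assumes that v and balance are DECIMAL columns
that arrive in Java as java.math.BigDecimal, which the thread does not
confirm; the class and method names are made up for illustration. It checks
that 0E-18 is simply how BigDecimal prints a zero with scale 18, and that the
three sample rows satisfy the conditions from the DEFINE clause:

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;

// Hypothetical helper class; it mirrors the DEFINE conditions, not Flink's matcher.
public class SampleEventCheck {

    // TRX: event_type = 'trx' AND v > 1000
    static boolean isTrx(String eventType, BigDecimal v) {
        return "trx".equals(eventType) && v.compareTo(new BigDecimal("1000")) > 0;
    }

    // APP_1: event_type = 'application' AND ts < TRX.ts + 3 days
    static boolean isApp1(String eventType, LocalDateTime ts, LocalDateTime trxTs) {
        return "application".equals(eventType) && ts.isBefore(trxTs.plusDays(3));
    }

    // APP_2: event_type = 'application' AND APP_1.ts < ts < APP_1.ts + 7 days AND balance < 100
    static boolean isApp2(String eventType, LocalDateTime ts, BigDecimal balance,
                          LocalDateTime app1Ts) {
        return "application".equals(eventType)
                && ts.isAfter(app1Ts)
                && ts.isBefore(app1Ts.plusDays(7))
                && balance.compareTo(new BigDecimal("100")) < 0;
    }

    public static void main(String[] args) {
        // "0E-18" parses without error: unscaled value 0, scale 18.
        BigDecimal zero = new BigDecimal("0E-18");
        System.out.println("plain:   " + zero.toPlainString());
        System.out.println("scale:   " + zero.scale());
        // compareTo ignores the scale, equals does not.
        System.out.println("is zero: " + (zero.compareTo(BigDecimal.ZERO) == 0));
        System.out.println("equals:  " + zero.equals(BigDecimal.ZERO));

        // Values copied from the sample event stream in the quoted message.
        LocalDateTime trxTs = LocalDateTime.of(2021, 5, 1, 4, 42, 57);
        LocalDateTime app1Ts = LocalDateTime.of(2021, 5, 1, 10, 29, 10);
        LocalDateTime app2Ts = LocalDateTime.of(2021, 5, 1, 10, 39, 2);
        BigDecimal trxAmount = new BigDecimal("4294.380000000000000000");
        BigDecimal balance = new BigDecimal("74.524950000000000000");

        System.out.println("TRX:   " + isTrx("trx", trxAmount));
        System.out.println("APP_1: " + isApp1("application", app1Ts, trxTs));
        System.out.println("APP_2: " + isApp2("application", app2Ts, balance, app1Ts));
    }
}
```

If this behaves as expected, the value 0E-18 is harmless on the Java side, and
any conversion problem would have to sit earlier in the pipeline (for example
in a source or a cast), which is what the bypass test should reveal.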

On Tue, Jul 6, 2021 at 3:26 PM Maciek Bryński <mac...@brynski.pl> wrote:

> Hi,
> I have a very strange bug when using MATCH_RECOGNIZE.
>
> I'm using some joins and unions to create an event stream. A sample event
> stream (for one user) looks like this:
>
> uuid                                  cif         event_type   v                        balance                ts
> 621456e9-389b-409b-aaca-bca99eeb43b3  0004091386  trx          4294.380000000000000000  74.524950000000000000  2021-05-01 04:42:57
> 7b2bc022-b069-41ca-8bbf-e93e3f0e85a7  0004091386  application  0E-18                    74.524950000000000000  2021-05-01 10:29:10
> 942cd3ce-fb3d-43d3-a69a-aaeeec5ee90e  0004091386  application  0E-18                    74.524950000000000000  2021-05-01 10:39:02
> 433ac9bc-d395-457n-986c-19e30e375f2e  0004091386  trx          4294.380000000000000000  74.524950000000000000  2021-05-01 04:42:57
>
> Then I'm using the following MATCH_RECOGNIZE definition (the trace function
> will be explained later):
>
> CREATE VIEW scenario_1 AS (
> SELECT * FROM events
>     MATCH_RECOGNIZE(
>         PARTITION BY cif
>         ORDER BY ts
>         MEASURES
>             TRX.v as trx_amount,
>             TRX.ts as trx_ts,
>             APP_1.ts as app_1_ts,
>             APP_2.ts as app_2_ts,
>             APP_2.balance as app_2_balance
>         ONE ROW PER MATCH
>         PATTERN (TRX ANY_EVENT*? APP_1 NOT_LOAN*? APP_2) WITHIN INTERVAL '10' DAY
>         DEFINE
>         TRX AS trace(TRX.event_type = 'trx' AND TRX.v > 1000,
>                   'TRX', TRX.uuid, TRX.cif, TRX.event_type, TRX.ts),
>         ANY_EVENT AS trace(true,
>                   'ANY_EVENT', TRX.uuid, ANY_EVENT.cif, ANY_EVENT.event_type, ANY_EVENT.ts),
>         APP_1 AS trace(APP_1.event_type = 'application'
>                    AND APP_1.ts < TRX.ts + INTERVAL '3' DAY,
>                   'APP_1', TRX.uuid, APP_1.cif, APP_1.event_type, APP_1.ts),
>         APP_2 AS trace(APP_2.event_type = 'application'
>                    AND APP_2.ts > APP_1.ts
>                    AND APP_2.ts < APP_1.ts + INTERVAL '7' DAY
>                    AND APP_2.balance < 100,
>                   'APP_2', TRX.uuid, APP_2.cif, APP_2.event_type, APP_2.ts),
>         NOT_LOAN AS trace(NOT_LOAN.event_type <> 'loan',
>                   'NOT_LOAN', TRX.uuid, NOT_LOAN.cif, NOT_LOAN.event_type, NOT_LOAN.ts)
>     ))
>
>
> This scenario should be matched by the sample events because:
> - TRX is matched by event with ts 2021-05-01 04:42:57
> - APP_1 by ts 2021-05-01 10:29:10
> - APP_2 by ts 2021-05-01 10:39:02
> Unfortunately, I'm not getting any data, and it's not the watermarks' fault.
>
> The trace function has the following code and gives me some logs:
>
> public class TraceUDF extends ScalarFunction {
>
>     public Boolean eval(Boolean condition,
>             @DataTypeHint(inputGroup = InputGroup.ANY) Object... message) {
>         log.info((condition ? "Condition true: " : "Condition false: ")
>                 + Arrays.stream(message).map(Object::toString).collect(Collectors.joining(" ")));
>         return condition;
>     }
> }
>
> And the log from this trace function is the following:
>
> 2021-07-06 13:09:43,762 INFO TraceUDF [] - Condition true: TRX 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T04:42:57
> 2021-07-06 13:12:28,914 INFO  TraceUDF [] - Condition true: ANY_EVENT 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T15:28:34
> 2021-07-06 13:12:28,915 INFO  TraceUDF [] - Condition false: APP_1 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T15:28:34
> 2021-07-06 13:12:28,915 INFO  TraceUDF [] - Condition false: TRX 433ac9bc-d395-457n-986c-19e30e375f2e 0004091386 trx 2021-05-01T15:28:34
>
> As you can see, 2 events are missing.
> What can I do?
> I failed to create a minimal example of this bug. Any other ideas?
>
