Hi Maciek,

is there a typo in the input data? Timestamp 2021-05-01 04:42:57 appears twice, but timestamp 2021-05-01T15:28:34 (from the log lines) is not there at all. I find it hard to correlate the logs with the input...
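
To make that comparison concrete, here is a minimal sketch of the cross-check. It is only an illustration: the class `TimestampCheck` and its methods are made up for this mail and are not part of your job, and the two lists are simply the `ts` column of the sample events and the timestamps from the four trace log lines quoted below, copied by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original job: it only cross-checks
// the timestamps quoted in this thread.
class TimestampCheck {

    // Values that occur more than once, in first-seen order.
    static Set<String> duplicates(List<String> values) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String v : values) {
            if (!seen.add(v)) {
                dups.add(v);
            }
        }
        return dups;
    }

    // Values present in `left` but absent from `right`, in first-seen order.
    static Set<String> missingFrom(List<String> left, List<String> right) {
        Set<String> result = new LinkedHashSet<>(left);
        result.removeAll(new LinkedHashSet<>(right));
        return result;
    }

    // Normalise "2021-05-01 04:42:57" (sample table) to the ISO form
    // "2021-05-01T04:42:57" that shows up in the trace log.
    static List<String> normalise(List<String> raw) {
        List<String> out = new ArrayList<>();
        for (String s : raw) {
            out.add(s.trim().replace(' ', 'T'));
        }
        return out;
    }

    public static void main(String[] args) {
        // ts column of the four sample events, in the order they were posted.
        List<String> input = normalise(List.of(
                "2021-05-01 04:42:57",
                "2021-05-01 10:29:10",
                "2021-05-01 10:39:02",
                "2021-05-01 04:42:57"));
        // Trailing timestamp of each of the four trace log lines.
        List<String> logged = List.of(
                "2021-05-01T04:42:57",
                "2021-05-01T15:28:34",
                "2021-05-01T15:28:34",
                "2021-05-01T15:28:34");

        System.out.println("duplicated in input: " + duplicates(input));
        System.out.println("logged but not in input: " + missingFrom(logged, input));
        System.out.println("in input but never logged: " + missingFrom(input, logged));
    }
}
```

With the values as posted, this reports 2021-05-01T04:42:57 as duplicated in the input, 2021-05-01T15:28:34 as logged but absent from the input, and the two application timestamps (10:29:10 and 10:39:02) as never logged.
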

Best regards,
Nico

On Wed, Jul 7, 2021 at 11:16 AM Arvid Heise <ar...@apache.org> wrote:

> Hi Maciek,
>
> could you bypass the MATCH_RECOGNIZE (=comment out) and check if the
> records appear in a shortcutted output?
>
> I suspect that they may be filtered out before (for example because of
> number conversion issues with 0E-18)
>
> On Tue, Jul 6, 2021 at 3:26 PM Maciek Bryński <mac...@brynski.pl> wrote:
>
>> Hi,
>> I have a very strange bug when using MATCH_RECOGNIZE.
>>
>> I'm using some joins and unions to create event stream. Sample event
>> stream (for one user) looks like this:
>>
>> uuid                                  cif         event_type   v                        balance                ts
>> 621456e9-389b-409b-aaca-bca99eeb43b3  0004091386  trx          4294.380000000000000000  74.524950000000000000  2021-05-01 04:42:57
>> 7b2bc022-b069-41ca-8bbf-e93e3f0e85a7  0004091386  application  0E-18                    74.524950000000000000  2021-05-01 10:29:10
>> 942cd3ce-fb3d-43d3-a69a-aaeeec5ee90e  0004091386  application  0E-18                    74.524950000000000000  2021-05-01 10:39:02
>> 433ac9bc-d395-457n-986c-19e30e375f2e  0004091386  trx          4294.380000000000000000  74.524950000000000000  2021-05-01 04:42:57
>>
>> Then I'm using following MATCH_RECOGNIZE definition (trace function will
>> be explained later)
>>
>> CREATE VIEW scenario_1 AS (
>>     SELECT * FROM events
>>     MATCH_RECOGNIZE(
>>         PARTITION BY cif
>>         ORDER BY ts
>>         MEASURES
>>             TRX.v as trx_amount,
>>             TRX.ts as trx_ts,
>>             APP_1.ts as app_1_ts,
>>             APP_2.ts as app_2_ts,
>>             APP_2.balance as app_2_balance
>>         ONE ROW PER MATCH
>>         PATTERN (TRX ANY_EVENT*? APP_1 NOT_LOAN*? APP_2) WITHIN INTERVAL '10' DAY
>>         DEFINE
>>             TRX AS trace(TRX.event_type = 'trx' AND TRX.v > 1000,
>>                 'TRX', TRX.uuid, TRX.cif, TRX.event_type, TRX.ts),
>>             ANY_EVENT AS trace(true,
>>                 'ANY_EVENT', TRX.uuid, ANY_EVENT.cif, ANY_EVENT.event_type, ANY_EVENT.ts),
>>             APP_1 AS trace(APP_1.event_type = 'application' AND APP_1.ts < TRX.ts + INTERVAL '3' DAY,
>>                 'APP_1', TRX.uuid, APP_1.cif, APP_1.event_type, APP_1.ts),
>>             APP_2 AS trace(APP_2.event_type = 'application' AND APP_2.ts > APP_1.ts
>>                 AND APP_2.ts < APP_1.ts + INTERVAL '7' DAY AND APP_2.balance < 100,
>>                 'APP_2', TRX.uuid, APP_2.cif, APP_2.event_type, APP_2.ts),
>>             NOT_LOAN AS trace(NOT_LOAN.event_type <> 'loan',
>>                 'NOT_LOAN', TRX.uuid, NOT_LOAN.cif, NOT_LOAN.event_type, NOT_LOAN.ts)
>> ))
>>
>> This scenario could be matched by sample events because:
>> - TRX is matched by event with ts 2021-05-01 04:42:57
>> - APP_1 by ts 2021-05-01 10:29:10
>> - APP_2 by ts 2021-05-01 10:39:02
>> Unfortunately I'm not getting any data. And it's not watermarks fault.
>>
>> Trace function has following code and gives me some logs:
>>
>> public class TraceUDF extends ScalarFunction {
>>
>>     public Boolean eval(Boolean condition, @DataTypeHint(inputGroup = InputGroup.ANY) Object ... message) {
>>         log.info((condition ? "Condition true: " : "Condition false: ")
>>             + Arrays.stream(message).map(Object::toString).collect(Collectors.joining(" ")));
>>         return condition;
>>     }
>> }
>>
>> And log from this trace function is following.
>>
>> 2021-07-06 13:09:43,762 INFO TraceUDF [] - Condition true: TRX 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T04:42:57
>> 2021-07-06 13:12:28,914 INFO TraceUDF [] - Condition true: ANY_EVENT 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T15:28:34
>> 2021-07-06 13:12:28,915 INFO TraceUDF [] - Condition false: APP_1 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T15:28:34
>> 2021-07-06 13:12:28,915 INFO TraceUDF [] - Condition false: TRX 433ac9bc-d395-457n-986c-19e30e375f2e 0004091386 trx 2021-05-01T15:28:34
>>
>> As you can see 2 events are missing.
>> What can I do ?
>> I failed with create minimal example of this bug. Any other ideas ?