Hi Maciek,

Is there a typo in the input data? Timestamp 2021-05-01 04:42:57 appears
twice, but timestamp 2021-05-01T15:28:34 (from the log lines) is not there
at all. That makes it hard to correlate the logs with the input...
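
To make the mismatch concrete, here is a minimal sketch that parses both
timestamp notations and lists the log timestamps with no counterpart in the
sample input. The class and method names (TimestampCheck, missingFromInput)
are made up for illustration; the timestamps are copied from the sample
events and the trace log in the quoted mail.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original job.
public class TimestampCheck {
    // The sample events use "yyyy-MM-dd HH:mm:ss"; the trace log uses ISO-8601 with 'T'.
    private static final DateTimeFormatter INPUT_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns the distinct log timestamps that do not occur in the input.
    static List<LocalDateTime> missingFromInput(List<String> inputTs, List<String> logTs) {
        Set<LocalDateTime> input = new HashSet<>();
        for (String ts : inputTs) {
            input.add(LocalDateTime.parse(ts, INPUT_FORMAT));
        }
        List<LocalDateTime> missing = new ArrayList<>();
        for (String ts : logTs) {
            LocalDateTime parsed = LocalDateTime.parse(ts); // ISO-8601 is the default format
            if (!input.contains(parsed) && !missing.contains(parsed)) {
                missing.add(parsed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> inputTs = List.of(
                "2021-05-01 04:42:57", "2021-05-01 10:29:10",
                "2021-05-01 10:39:02", "2021-05-01 04:42:57");
        List<String> logTs = List.of("2021-05-01T04:42:57", "2021-05-01T15:28:34");
        System.out.println(missingFromInput(inputTs, logTs)); // prints [2021-05-01T15:28:34]
    }
}
```

The only timestamp reported is 15:28:34, which the log attaches to the
second trx event (uuid 433ac9bc-...), while the sample input lists that
event with 04:42:57.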

Best regards,
Nico

On Wed, Jul 7, 2021 at 11:16 AM Arvid Heise <ar...@apache.org> wrote:

> Hi Maciek,
>
> could you bypass the MATCH_RECOGNIZE (i.e., comment it out) and check
> whether the records appear in the resulting short-circuited output?
>
> I suspect that they may be filtered out earlier (for example, because of
> number conversion issues with 0E-18).
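
As a side note on 0E-18: the sketch below only shows how plain Java treats
that literal. It is the default string form of a zero BigDecimal with scale
18, so it is a valid zero, but it is not equal() to BigDecimal.ZERO. Whether
anything in the Flink pipeline trips over it is a separate question that
this does not answer. The class and method names (ZeroScaleCheck, isZero)
are made up for illustration.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of the original job.
public class ZeroScaleCheck {
    // Scale-insensitive zero check: compareTo ignores scale, equals does not.
    static boolean isZero(String raw) {
        return new BigDecimal(raw).compareTo(BigDecimal.ZERO) == 0;
    }

    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("0E-18");
        System.out.println(v.scale());                  // prints 18
        System.out.println(v.toPlainString());          // prints 0.000000000000000000
        System.out.println(v.equals(BigDecimal.ZERO));  // prints false (scales differ)
        System.out.println(isZero("0E-18"));            // prints true
    }
}
```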
>
> On Tue, Jul 6, 2021 at 3:26 PM Maciek Bryński <mac...@brynski.pl> wrote:
>
>> Hi,
>> I have a very strange bug when using MATCH_RECOGNIZE.
>>
>> I'm using some joins and unions to create an event stream. A sample event
>> stream (for one user) looks like this:
>>
>> uuid                                  cif         event_type   v                        balance                ts
>> 621456e9-389b-409b-aaca-bca99eeb43b3  0004091386  trx          4294.380000000000000000  74.524950000000000000  2021-05-01 04:42:57
>> 7b2bc022-b069-41ca-8bbf-e93e3f0e85a7  0004091386  application  0E-18                    74.524950000000000000  2021-05-01 10:29:10
>> 942cd3ce-fb3d-43d3-a69a-aaeeec5ee90e  0004091386  application  0E-18                    74.524950000000000000  2021-05-01 10:39:02
>> 433ac9bc-d395-457n-986c-19e30e375f2e  0004091386  trx          4294.380000000000000000  74.524950000000000000  2021-05-01 04:42:57
>>
>> Then I'm using the following MATCH_RECOGNIZE definition (the trace
>> function is explained below):
>>
>> CREATE VIEW scenario_1 AS (
>> SELECT * FROM events
>>     MATCH_RECOGNIZE(
>>         PARTITION BY cif
>>         ORDER BY ts
>>         MEASURES
>>             TRX.v as trx_amount,
>>             TRX.ts as trx_ts,
>>             APP_1.ts as app_1_ts,
>>             APP_2.ts as app_2_ts,
>>             APP_2.balance as app_2_balance
>>         ONE ROW PER MATCH
>>         PATTERN (TRX ANY_EVENT*? APP_1 NOT_LOAN*? APP_2) WITHIN INTERVAL '10' DAY
>>         DEFINE
>>         TRX AS trace(TRX.event_type = 'trx' AND TRX.v > 1000,
>>                   'TRX', TRX.uuid, TRX.cif, TRX.event_type, TRX.ts),
>>         ANY_EVENT AS trace(true,
>>                   'ANY_EVENT', TRX.uuid, ANY_EVENT.cif, ANY_EVENT.event_type, ANY_EVENT.ts),
>>         APP_1 AS trace(APP_1.event_type = 'application' AND APP_1.ts < TRX.ts + INTERVAL '3' DAY,
>>                   'APP_1', TRX.uuid, APP_1.cif, APP_1.event_type, APP_1.ts),
>>         APP_2 AS trace(APP_2.event_type = 'application' AND APP_2.ts > APP_1.ts
>>                    AND APP_2.ts < APP_1.ts + INTERVAL '7' DAY AND APP_2.balance < 100,
>>                   'APP_2', TRX.uuid, APP_2.cif, APP_2.event_type, APP_2.ts),
>>         NOT_LOAN AS trace(NOT_LOAN.event_type <> 'loan',
>>                   'NOT_LOAN', TRX.uuid, NOT_LOAN.cif, NOT_LOAN.event_type, NOT_LOAN.ts)
>>     ))
>>
>>
>> This scenario could be matched by the sample events because:
>> - TRX is matched by event with ts 2021-05-01 04:42:57
>> - APP_1 by ts 2021-05-01 10:29:10
>> - APP_2 by ts 2021-05-01 10:39:02
>> Unfortunately, I'm not getting any data, and it's not the watermarks' fault.
>>
>> The trace function has the following code and gives me some logs:
>>
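>> import java.util.Arrays;
>> import java.util.stream.Collectors;
>> import org.apache.flink.table.annotation.DataTypeHint;
>> import org.apache.flink.table.annotation.InputGroup;
>> import org.apache.flink.table.functions.ScalarFunction;
>> import org.slf4j.Logger;
>> import org.slf4j.LoggerFactory;
>>
>> public class TraceUDF extends ScalarFunction {
>>     // Assumption: an SLF4J logger; the original snippet does not show how `log` is declared.
>>     private static final Logger log = LoggerFactory.getLogger(TraceUDF.class);
>>
>>     // Logs the condition result together with the trace arguments, then returns the condition unchanged.
>>     public Boolean eval(Boolean condition,
>>             @DataTypeHint(inputGroup = InputGroup.ANY) Object... message) {
>>         log.info((condition ? "Condition true: " : "Condition false: ")
>>                 + Arrays.stream(message).map(Object::toString).collect(Collectors.joining(" ")));
>>         return condition;
>>     }
>> }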
>>
>> The log from this trace function is as follows:
>>
>> 2021-07-06 13:09:43,762 INFO TraceUDF                             [] - Condition true: TRX 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T04:42:57
>> 2021-07-06 13:12:28,914 INFO  TraceUDF                             [] - Condition true: ANY_EVENT 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T15:28:34
>> 2021-07-06 13:12:28,915 INFO  TraceUDF                             [] - Condition false: APP_1 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T15:28:34
>> 2021-07-06 13:12:28,915 INFO  TraceUDF                             [] - Condition false: TRX 433ac9bc-d395-457n-986c-19e30e375f2e 0004091386 trx 2021-05-01T15:28:34
>>
>> As you can see, two events are missing.
>> What can I do?
>> I was unable to create a minimal example of this bug. Any other ideas?
>>
>
