RalfJL opened a new issue, #12949:
URL: https://github.com/apache/pinot/issues/12949
Currently I am investigating if Pinot can help us to calculate inbytes and
outbytes from Radius accounting data.
The setup is:
sysbench (produces artificial accounting data) -> freeradius -> kafka ->
pinot
The randomness of the artificial data is low so we end up with several
"sessions" having the same Acct-Unique-Session-ID.
e.g. query:
```
select a.acctuniquesessionid, a.acctstatustype, a.eventtime,
a.eventtimestamp from radius_json a
where a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82'
order by a.eventtime
```
returns:
```
acctuniquesessionid acctstatustype eventtime eventtimestamp
da0f86138ec36b2364889f048bf2ac82 Start 1712050245468 Apr 2 2024
09:30:45 UTC
da0f86138ec36b2364889f048bf2ac82 Stop 1712050293427 Apr 2 2024
09:31:33 UTC
da0f86138ec36b2364889f048bf2ac82 Start 1712759564017 Apr 10 2024
14:32:43 UTC
da0f86138ec36b2364889f048bf2ac82 Interim-Update 1712759589307 Apr 10
2024 14:33:09 UTC
da0f86138ec36b2364889f048bf2ac82 Interim-Update 1712759612449 Apr 10
2024 14:33:32 UTC
da0f86138ec36b2364889f048bf2ac82 Interim-Update 1712759616370 Apr 10
2024 14:33:36 UTC
da0f86138ec36b2364889f048bf2ac82 Stop 1712759668126 Apr 10 2024
14:34:27 UTC
```
Obviously there are two different sessions with the same acctuniquesessionid
in the data.
In real life we can not guarantee that Session ID's are always unique, so
this might also happen in real life data.
The solution is to filter out all 'Start' - 'Stop' combinations where there
is a 'Start' or a 'Stop' in between.
And here comes the problem:
query:
```
select a.acctuniquesessionid, (a.eventtime - b.eventtime)/1000 as Zeit,
a.acctstatustype, b.acctstatustype, a.eventtime, b.eventtime
,( select count(c.eventtime) from radius_json c where b.acctuniquesessionid
= c.acctuniquesessionid
and a.acctuniquesessionid =
c.acctuniquesessionid
and a.eventtime > 0 and b.eventtime > 0
and (c.acctstatustype = 'Stop' or
c.acctstatustype = 'Start' )
and c.eventtime between b.eventtime+1 and
a.eventtime-1
)
from radius_json a
join radius_json b
on b.acctuniquesessionid = a.acctuniquesessionid
and a.acctstatustype = 'Stop' and b.acctstatustype = 'Start' -- and
a._3gppimsi <> 'null'
and a.eventtime > b.eventtime
where
not exists ( select 1 from radius_json c where b.acctuniquesessionid =
c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid
and a.eventtime > 0 and b.eventtime > 0
and (c.acctstatustype = 'Stop' or
c.acctstatustype = 'Start' )
and c.eventtime between b.eventtime+1 and
a.eventtime-1
)
-- and a.acctuniquesessionid =
'fac07ae7853810946d87e868e463af2c'
-- and a.acctuniquesessionid =
'ac6baa744130522c1eb1eec161114d1b'
and a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82'
order by Zeit desc
```
Returns:
```
acctuniquesessionid Zeit acctstatustype acctstatustype eventtime
eventtime EXPR$4
da0f86138ec36b2364889f048bf2ac82 104 Stop Start 1712759668126
1712759564017 0
da0f86138ec36b2364889f048bf2ac82 47 Stop Start 1712050293427
1712050245468 0
```
The two sessions as expected.
Removing the filter a.acctuniquesessionid =
'da0f86138ec36b2364889f048bf2ac82' shows a very different result
query:
```
select a.acctuniquesessionid, (a.eventtime - b.eventtime)/1000 as Zeit,
a.acctstatustype, b.acctstatustype, a.eventtime, b.eventtime
,( select count(c.eventtime) from radius_json c where b.acctuniquesessionid
= c.acctuniquesessionid
and a.acctuniquesessionid =
c.acctuniquesessionid
and a.eventtime > 0 and b.eventtime > 0
and (c.acctstatustype = 'Stop' or
c.acctstatustype = 'Start' )
and c.eventtime between b.eventtime+1 and
a.eventtime-1
)
from radius_json a
join radius_json b
on b.acctuniquesessionid = a.acctuniquesessionid
and a.acctstatustype = 'Stop' and b.acctstatustype = 'Start' -- and
a._3gppimsi <> 'null'
and a.eventtime > b.eventtime
where
not exists ( select 1 from radius_json c where b.acctuniquesessionid =
c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid
and a.eventtime > 0 and b.eventtime > 0
and (c.acctstatustype = 'Stop' or
c.acctstatustype = 'Start' )
and c.eventtime between b.eventtime+1 and
a.eventtime-1
)
-- and a.acctuniquesessionid =
'fac07ae7853810946d87e868e463af2c'
-- and a.acctuniquesessionid =
'ac6baa744130522c1eb1eec161114d1b'
-- and a.acctuniquesessionid =
'da0f86138ec36b2364889f048bf2ac82'
order by Zeit desc
```
result (shortened by thousands of other sessions):
```
da0f86138ec36b2364889f048bf2ac82 709422 Stop Start 1712759668126
1712050245468 0
```
So here obviously both subqueries return wrong results. There is a 'Start'
and a 'Stop' between timestamp 1712759668126 and 1712050245468 but it is not
detected anymore when the filter for the acctuniquessionid is removed
Am I missing here something or am I hitting a bug in multistage engine?
I am using Pinot 1.1.0 as docker compose image (single instance of all
server processes)
The query plan for both queries is the same beside the Logical Filter
first query:
LogicalFilter(condition=[AND(=($14,
_UTF-8'da0f86138ec36b2364889f048bf2ac82'), =($12, _UTF-8'Stop'))])
second query:
LogicalFilter(condition=[=($12, _UTF-8'Stop')])
PLEASE NOTE: Some filters are redundant like "b.acctuniquesessionid =
c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid" and
are a test if the result changes. But it doesn't
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]