michaelritsema commented on issue #28610: URL: https://github.com/apache/superset/issues/28610#issuecomment-2147846348
If you had a denormalized table with a persons information called PersonSightings with a row like: person_id:1,f_name:evan,l_name:r, address:"343232 W. Palm St Jacksonville,Flordia",:sighting_timestamp:2024-01-01 imagine there was a new row for everytime a camera spotted the person You would want to take advantage of the fact that most of the columsn are functionality depenendt on person_id Currently to get a report on distinct sightings you'd need to add most of the columns as a dimension which generates: SELECT person_id,f_name,l_name, address,sighting_timestamp,count(distinct sighting_timestamp) from PersonSightings GROUP BY person_id,f_name,l_name, address,sighting_timestamp The main problem is the GROUP BY is very inefficient. I'd want SQL Like SELECT person_id,any(f_name),any(l_name), any(address),any(sighting_timestamp),count(distinct sighting_timestamp) from PersonSightings GROUP BY person_id To do this I need to turn those dimensions into metrics and add custom sql. But they should be treated as dimensions not metrics. So I proposed a way to put an aggregate around a dimension so superset can apply ANY or FIRST (whatever the database supports) In my use case I actually have a few dozen of these and using clickhouse which can be exponentially faster with smaller GROUP BYs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
