Hey Team, We're trying to implement an aggregation which involves *several trillions of rows *using apache beam sql. However I'm getting an exception Exception in thread "main" java.lang.UnsupportedOperationException: Does not support COUNT DISTINCT
Here's the code for doing the aggregation: PCollection<Row> aggregate = joinedCollection.apply("Aggregation", SqlTransform.query("SELECT" + " exchange_name as adexchange," + " strategy," + " platform," + " segment," + " auction_type," + " placement_type," + " country," + " COALESCE(loss, 0) AS loss_code," + " COUNT(DISTINCT identifier) AS uniques," + " no_bid_reason," + " SUM(1) AS auctions," + " SUM(CASE WHEN cpm_bid > 0 THEN 1 ELSE 0 END) AS bids," + " SUM(cpm_bid) AS total_bid_price," + " SUM(CASE WHEN loss = 0 THEN 1 END) AS wins," + " app_bundle AS app_bundle," + " model_id AS model_id," + " identifier_type AS identifier_type," + " promotion_id AS promotion_id," + " sub_floor_bid_min_price_cohort AS sub_floor_bid_min_price_cohort," + " bf_match_experiment AS bf_match_experiment," + " bep_matched_floor AS bep_matched_floor," + " SUM(p_ctr) AS p_ctr_total," + " SUM(p_ir) AS p_ir_total," + " SUM(p_cpa) AS p_cpa_total," + " SUM(arppu) AS arppu_total," + " SUM(spend) AS spend_total," + " SUM(cpm_price) AS cpm_price_total" + " FROM" + " PCOLLECTION" + " GROUP BY exchange_name,strategy,platform,segment,auction_type" + ",placement_type,country,loss,no_bid_reason,app_bundle" + ",model_id,identifier_type,promotion_id,sub_floor_bid_min_price_cohort" + ",bf_match_experiment,bep_matched_floor") ); Can you please guide us? Let me know in case you need any more information. Goutham Miryala Senior Data Engineer <http://chartboost.com/>