Hey Team,

We're trying to implement an aggregation which involves *several trillions
of rows *using apache beam sql.
However I'm getting an exception
Exception in thread "main" java.lang.UnsupportedOperationException: Does
not support COUNT DISTINCT

Here's the code for doing the aggregation:

PCollection<Row> aggregate = joinedCollection.apply("Aggregation",
        SqlTransform.query("SELECT" +
                "        exchange_name as adexchange," +
                "        strategy," +
                "        platform," +
                "        segment," +
                "        auction_type," +
                "        placement_type," +
                "        country," +
                "        COALESCE(loss, 0) AS loss_code," +
                "        COUNT(DISTINCT identifier) AS uniques," +
                "        no_bid_reason," +
                "        SUM(1) AS auctions," +
                "        SUM(CASE WHEN cpm_bid > 0 THEN 1 ELSE 0 END)
AS bids," +
                "        SUM(cpm_bid) AS total_bid_price," +
                "        SUM(CASE WHEN loss = 0 THEN 1 END) AS wins," +
                "        app_bundle AS app_bundle," +
                "        model_id AS model_id," +
                "        identifier_type AS identifier_type," +
                "        promotion_id AS promotion_id," +
                "        sub_floor_bid_min_price_cohort AS
sub_floor_bid_min_price_cohort," +
                "        bf_match_experiment AS bf_match_experiment," +
                "        bep_matched_floor AS bep_matched_floor," +
                "        SUM(p_ctr) AS p_ctr_total," +
                "        SUM(p_ir) AS p_ir_total," +
                "        SUM(p_cpa) AS p_cpa_total," +
                "        SUM(arppu) AS arppu_total," +
                "        SUM(spend) AS spend_total," +
                "        SUM(cpm_price) AS cpm_price_total" +
                "    FROM" +
                "        PCOLLECTION" +
                "    GROUP BY
exchange_name,strategy,platform,segment,auction_type" +
                ",placement_type,country,loss,no_bid_reason,app_bundle" +

",model_id,identifier_type,promotion_id,sub_floor_bid_min_price_cohort"
+
                ",bf_match_experiment,bep_matched_floor")
);


Can you please guide us?

Let me know in case you need any more information.

Goutham Miryala
Senior Data Engineer

<http://chartboost.com/>

Reply via email to