Hi Alex, You are right. I assumed that quantiles would have an intersection like Theta Sketches but they don't.
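For context, the Theta sketch Intersection referred to here estimates plain set intersection over unique IDs. A minimal, library-free Java sketch of the exact operation a Theta Intersection approximates (the event IDs are hypothetical; this is not the DataSketches API):

```java
import java.util.HashSet;
import java.util.Set;

public class ExactIntersection {
    // Exact analogue of a Theta sketch Intersection: keep only the IDs present in both sets.
    static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> impressions = new HashSet<>(Set.of("e1", "e2", "e3"));
        Set<String> clicks = new HashSet<>(Set.of("e2", "e3", "e4"));
        // A Theta sketch Intersection would return an estimate close to the exact answer, 2.
        System.out.println(intersect(impressions, clicks).size()); // prints 2
    }
}
```

A Theta sketch does this over hashed IDs in bounded memory, which is why the result is an estimate rather than an exact count.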
If they did, keyed on transaction ID, it would have been cool. So many traditional data mining & predictive analytics algorithms could be reimplemented with sketches. Thanks for your insight.
Vijay

On Mon, May 23, 2022, 22:18 Alexander Saydakov <sayda...@yahooinc.com> wrote:
> This seems to be a convoluted way of computing the sum of all values. This is an additive metric, easy to compute exactly, no sketches needed.
>
> On Sun, May 22, 2022 at 5:56 AM vijay rajan <vijay.sankar.ra...@gmail.com> wrote:
>> Hi folks (and Lee),
>>
>> I think I have found what I was looking for in quantile sketches, though I am not able to formulate error bounds for it.
>> I should have raised a PR, but I am going to write the code here. The code below estimates the volume of the quantiles sketch based on the example https://datasketches.apache.org/docs/Quantiles/QuantilesJavaExample.html.
>>
>> The assumption is that the values put in the sketch are all greater than 0. It works like this:
>> 1. Get 100 quantiles using getQuantiles.
>> 2. Get histogram percentage numbers via getPMF.
>> 3. The averages of the neighboring quantile boundaries (getMinValue and getMaxValue for the extremities) are combined with getN to estimate the volume.
>>
>> The code is straightforward. I think this is what I was looking for, and honestly I do not care if the max and min values are really extreme compared to the rest of the numbers, as my use case is supposed to remove outliers.
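The estimate described in steps 1-3 is a midpoint rule: volume ≈ N × Σ (mass of bin i) × (midpoint of bin i). A self-contained check of that arithmetic, with exact bin masses and midpoints standing in for the sketch's approximate ones (no sketch library involved):

```java
public class MidpointSumEstimate {
    // volume ≈ N * sum over bins of (mass fraction in bin) * (midpoint of bin)
    static double estimate(double[] pmf, double[] midpoints, long n) {
        double total = 0.0;
        for (int i = 0; i < pmf.length; i++) {
            total += pmf[i] * n * midpoints[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Values {1, 2, 3, 4} split into two bins [1, 2] and [3, 4], each holding half the mass.
        double[] pmf = {0.5, 0.5};
        double[] midpoints = {1.5, 3.5};
        System.out.println(estimate(pmf, midpoints, 4)); // prints 10.0, the exact sum 1+2+3+4
    }
}
```

With a real DoublesSketch the pmf and quantile boundaries are themselves approximate, so the volume estimate inherits the sketch's rank error on top of the binning error.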
>> private static double estimateVolume(DoublesSketch sketch) {
>>   double retVal = 0.0;
>>   int n = 100;
>>
>>   // n evenly spaced quantiles; quantiles[0] is near the min, quantiles[n-1] near the max
>>   double[] quantiles = sketch.getQuantiles(n);
>>
>>   // getPMF returns the fraction of the stream mass falling between successive split points
>>   double[] histogramCounts = sketch.getPMF(quantiles);
>>
>>   // Estimate the volume as N * sum(mass of bin i * midpoint of bin i)
>>   retVal += histogramCounts[0] * sketch.getN() * ((sketch.getMinValue() + quantiles[0]) / 2);
>>   for (int i = 1; i < n - 1; i++) {
>>     retVal += histogramCounts[i] * sketch.getN() * ((quantiles[i - 1] + quantiles[i]) / 2);
>>   }
>>   retVal += histogramCounts[n - 1] * sketch.getN() * ((sketch.getMaxValue() + quantiles[n - 1]) / 2);
>>
>>   return retVal;
>> }
>>
>> Regards
>> Vijay
>>
>> On Mon, May 2, 2022 at 10:10 PM Will Lauer <wla...@yahooinc.com> wrote:
>>> OK, this is interesting. I've got some concerns and questions that I've put inline below.
>>>
>>> Will
>>>
>>> Will Lauer
>>> Senior Principal Architect, Audience & Advertising Reporting
>>> Data Platforms & Systems Engineering
>>> M 508 561 6427
>>> Champaign Office
>>> 1908 S. First St
>>> Champaign, IL 61822
>>>
>>> On Mon, May 2, 2022 at 9:19 AM vijay rajan <vijay.sankar.ra...@gmail.com> wrote:
>>>> Thanks Lee. Please find my answers inline below in blue. I think you will find my use case very interesting. My next endeavor would be to build a decision tree with entropy / Gini impurity measures with sketches. I am amazed at some of the results I have gotten. You may find this quite interesting.
>>>>
>>>> On Sun, May 1, 2022 at 12:55 AM leerho <lee...@gmail.com> wrote:
>>>>> Vijay,
>>>>> Sorry about the delay in getting back to you.
>>>>> There is some critical information missing from your description, and that is the domain of what you are sketching.
>>>>> I presume that it is User-IDs; otherwise it doesn't make sense.
>>>>> If this is the case, I think the solution can be achieved in a couple of ways. Going forward with this assumption:
>>>>> Your raw data would look something like this:
>>>>>> {UserID, AgeGroup, Gender, Country, Impressions, Clicks, AdSpend}
>>>> Actually it would be an event_id instead of user_id. The domain for the data is a hypothetical online advertising summary.
>>>> {EventId, AgeGroup, Gender, Country, Impressions, Clicks, AdSpend}
>>>> The problem that we faced in the past was that it was impossible to "join" a click to an impression. While the raw impression data would have a unique event_id, the raw click data would also have its own event_id. The dimensions were the same for clicks and impressions. What we did back in the day was aggregate the data for impressions separately, aggregate for clicks separately, and then join.
>>>> You could assume the same here. Each dimension aggregate tuple {AgeGroup, Gender and Country in our case} would have a theta sketch for impressions and another for clicks.
>>> I don't see what a "theta sketch for impressions and another for clicks" would mean here. Theta sketches are for doing distinct counts, so I could understand a theta sketch for EventId, counting the number of distinct event ids associated with the AgeGroup/Gender/Country combination. But that doesn't seem to make sense for clicks or impressions. If your original data was {EventId, AgeGroup, Gender, Country, Impressions, Clicks, AdSpend}, it implies to me that clicks and impressions would always be 1 (that is, an event is either a single click or a single impression). When you drop the EventId dimension, the natural aggregation for these (as well as for AdSpend) is summation, giving you the total clicks, total impressions (and total AdSpend) for the Age/Gender/Country combination.
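Will's point that impressions, clicks, and AdSpend aggregate by plain summation can be sketched as follows (field names are hypothetical, mirroring the schema in the thread; no sketch library needed):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AdditiveRollup {
    // One raw event row; the field names mirror the thread's hypothetical schema.
    record Event(String ageGroup, String gender, String country,
                 long impressions, long clicks, double adSpend) {}

    // Drop EventId and sum the additive metrics per (AgeGroup, Gender, Country) cell.
    static Map<String, double[]> rollup(Iterable<Event> events) {
        Map<String, double[]> totals = new HashMap<>();
        for (Event e : events) {
            String key = e.ageGroup() + "|" + e.gender() + "|" + e.country();
            double[] t = totals.computeIfAbsent(key, k -> new double[3]);
            t[0] += e.impressions();
            t[1] += e.clicks();
            t[2] += e.adSpend();
        }
        return totals;
    }

    public static void main(String[] args) {
        var rows = List.of(
            new Event("19-25", "F", "IN", 1, 0, 0.10),
            new Event("19-25", "F", "IN", 1, 1, 0.25),
            new Event("26-35", "M", "US", 1, 0, 0.05));
        double[] t = rollup(rows).get("19-25|F|IN");
        // 2 impressions, 1 click, 0.35 ad spend for the 19-25/F/IN cell
        System.out.printf("%.0f %.0f %.2f%n", t[0], t[1], t[2]);
    }
}
```

This is the exact computation; no sketch is required because sums are mergeable without approximation.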
>>> If you were to generate a Theta sketch, it seems like it would be nonsensical, as all impressions or clicks inserted would be the same value and the theta sketch would always return "1".
>>>>> A proposed solution would be to configure a TupleSketch with 3 fields: (Impressions, Clicks, AdSpend).
>>>>> (This requires a little programming. You would need to create classes that implement the interfaces Summary, SummaryFactory and SummarySetOperations. There are examples of how to do this on the website.)
>>>> I will look into the Tuple sketch and the SummaryFactory and SummarySetOperations. I did try reading about them, but I cannot say I understood this completely. What type of sketch would we use for AdSpend? Would it be Quantile?
>>>>> Then you would process your raw data as follows:
>>>>> - Partition your raw data into 213 streams defined by the three dimensions:
>>>>>   - 10 X AgeGroup
>>>>>   - 3 X Gender
>>>>>   - 200 X Country
>>>>> - Each stream would feed a configured TupleSketch as above. (So you need only 213 sketches.)
>>>>> - These sketches are in the form of a 3D hypercube.
>>>>> - Each input tuple would feed 3 sketches, one in each dimension.
>>>> I believe I did something similar. Let me try explaining this with concrete examples.
>>>> 1. This is the raw data <https://drive.google.com/file/d/1zcwZB9KC7cASmM_8mvbPHPQaIeFJNrVw/view?usp=sharing> (please click on the link https://tinyurl.com/3d2r7xjh) for a company in India that provides car drivers for hire.
The data has most columns as categorical variables except the first, which is tripId. The schema looks something like this:
>>>> 1. { *tripId*, source, estimated_usage_bins, city, zone, zone_demand_popularity, dayType, pickUpHourOfDay, sla, booking_type, *tripOutcome* }
>>>> 2. tripId, the first column, is distinct per row; I put it into theta sketches as shown later.
>>>> 3. tripOutcome, the last column, is a classification variable which has the values "bad" or "good".
>>>> 4. The problem statement is to do data analysis of combinations of dimensions that produce an unfavourable result, that is, where "tripOutcome=bad/(tripOutcome=bad + tripOutcome=good)" is high.
>>>> 2. I was able to populate theta sketches for each dimension and value pair (example "city=Mumbai") and store them as my effective data representation.
>>>> 1. with 10 as the base and hence nominal entries of 2^10, the data is https://tinyurl.com/bkvz4xrh
>>>> 2. with 12 as the base and hence nominal entries of 2^12, the data is https://tinyurl.com/f9mzm8ef
>>>> 3. with 14 as the base and hence nominal entries of 2^14, the data is https://tinyurl.com/2p8kwj3m
>>>> 4. The data looks like this:
>>>> 1. dimension=value, Base64 encoded Theta sketch formed from unique tripId
>>> I'm not sure what you are sketching here. What is the entry being inserted into the sketch? Basically, what are you counting?
Normally, I >>> would think of sketching as taking your original data (tripId, source, estimated_usage_bins, city, zone, zone_demand_popularity, dayType, pickUpHourOfDay, sla, booking_type, tripOutcome) and picking a column (or columns) that you want to count the distinct values of, and generating a new table that contains the dimensions you care about plus that new sketch. In this case, if you are counting trips, the resulting table would look like {source, estimate_usage_bins, city, zone, zone_demand_popularity, dayType, pickupHourOfDay, sla, booking_type, tripOutcome, sketch(tripId)}. You could then report on a variety of combinations by dropping dimensions and aggregating (merging) the sketches.
>>>> 1. Next I run an Apriori-like Frequent Itemset algorithm [ECLAT, https://towardsdatascience.com/the-eclat-algorithm-8ae3276d2d17] to generate combinations of "dimension=value" with a minimum support threshold. The input to this implementation takes the files above at point 2, e.g. https://tinyurl.com/2p8kwj3m
>>>> 2. I produced frequent itemsets for both sides, i.e. tripOutcome=bad and tripOutcome=good, and then joined the results. The answers I get are listed below.
>>>> 1.
with input having dimension=value and theta sketch with 2^10 nominal entries, answers can be found here: https://tinyurl.com/yc3a3rde
>>>> 2. with input having dimension=value and theta sketch with 2^12 nominal entries, answers can be found here: https://tinyurl.com/5n7jjr32
>>>> 3. with input having dimension=value and theta sketch with 2^14 nominal entries, answers can be found here: https://tinyurl.com/3wrxp4a7
>>>> 3. The results show that 2^14 (16K) nominal sketches give immaculate results. There are errors (RMSE) in the results from the other sketches.
>>>> 4. Sample of results:
>>>> 1.
>>>> *level of combination, rule for combination, bad count, good count, total count, percentage of bad count*
>>>> 2.
>>>> 4,booking_type=one_way_trip & city=Pune & sla=Scheduled & source=android,26.0,9.0,35.0,74.0
>>>> 3,booking_type=one_way_trip & city=Pune & source=android,26.0,10.0,36.0,72.0
>>>> 5,city=Delhi NCR & pickUpHourOfDay=VERYLATE & sla=Scheduled & source=ios & zone_demand_popularity=POPULARITY_INDEX_4,17.0,9.0,26.0,65.0
>>>> 4,city=Delhi NCR & pickUpHourOfDay=VERYLATE & source=android & zone_demand_popularity=POPULARITY_INDEX_5,15.0,9.0,24.0,63.0
>>>> 5,booking_type=one_way_trip & city=Delhi NCR & pickUpHourOfDay=VERYLATE & source=ios & zone_demand_popularity=POPULARITY_INDEX_4,16.0,10.0,26.0,62.0
>>>> 3,booking_type=one_way_trip & sla=Scheduled & zone=205,14.0,9.0,23.0,61.0
>>>> I am aware from one of Lee's recent YouTube videos that theta sketches work like sets (bit sets) when they are large enough and the data is small. My use case is fine with a count of 100 being reported as 80 or 120 (20%), but not much more error than that. I could use the Jaccard formula explained by Lee to get the margin of error.
>>>> Sorry for the details, but this should help you judge whether what I am doing is indeed what tuple sketches do on their own. I am using update sketches for merging using conjunction. Code is here: *https://tinyurl.com/4pzfsj6v*
>>>>> A query would choose the desired coordinates of the 3 dimensions.
>>>>> If the query selects more than one coordinate from any one dimension, then you would need to first merge the corresponding coordinate sketches of that dimension together.
>>>>> A missing dimension would mean first merging all coordinate sketches of that dimension together.
>>>>> The final result sketch is an Intersection of the resulting 3 dimensional sketches.
>>>> Yes. The code here should serve as an example: https://tinyurl.com/4pzfsj6v
>>>>> With the resulting Tuple sketch you iterate over all the retained items (Summaries) and sum each of the 3 fields.
>>>>> Then you divide each sum by Theta. The result is the estimate of Impressions, Clicks, and AdSpend for the population of users selected by the query. The size of that population would be obtained by getEstimate().
>>>> My question remains. Which base sketch type should I use for an additive metric like Ad-spend? Would that be a *quantile sketch*?
>>> I still think this question doesn't make sense. What are you trying to understand with the sketch? Are you trying to find out the distribution of adspend for a particular set of dimension values? That sounds like a quantile sketch. Are you trying to find out how many different values of adspend there were? That's a distinct counting sketch. Are you trying to find out the most common adspend values? That would be a frequent items sketch.
>>> But really, none of those actually seem to match what you described above. What you described above sounded like you were simply trying to calculate the total AdSpend, which just requires a sum, not a sketch.
>>>>> Be aware that Intersections, by their nature, can increase error significantly. (See the website.)
>>>>> Try to avoid doing intersections with sketch coordinates that have very small populations, if possible.
>>>> Yes. However, if I use very high nominal entries like 2^20, shouldn't this get mitigated?
>>> The point of using a sketch is estimation. If you can't tolerate any error, you shouldn't be using sketches in the first place. Don't just assume that you SHOULD use high nominal entries. That said, when using theta sketches and doing intersections, we do sometimes use values like 2^20 for nominal entries, but often it is overkill. Remember, in the case of intersections, you get high error when the intersection size is small compared with the overall set sizes. The cases where you see high relative error are the cases where you are looking at small intersections. In most use cases, a relative error of 1,000% or even 10,000% on an intersection size of 10 when comparing sets of a billion is still a reasonable error given how small the _actual_ value is.
>>>>> You can get a sense of how good your results are by utilizing getUpperBound() and getLowerBound() (with respect to the user population). The bigger that spread is, the worse your estimates will be.
>>>> Thanks. This is perfect. I do not need to use the Jaccard to get my estimate.
>>>>> The other option would be to do the cross-product upfront and create 6000 Tuple sketches.
>>>>> In this case the input tuple would feed only one sketch, and the query would go to only one sketch.
>>>> Since I am not using SQL queries and have my own unique use case, my questions are quite different.
>>>>> This can quickly get unwieldy with more dimensions or high-cardinality dimensions.
>>>>> It is more accurate because it avoids the intersections. It is also possible to do a mix of these two approaches, but you would need to think it through carefully.
>>>>> I hope this helps.
>>>>> Lee.
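Lee's query procedure (union the selected coordinate sketches within a dimension, then intersect across dimensions) can be mimicked exactly with HashSets. A toy stand-in for the sketches, with hypothetical user IDs and dimension values:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HypercubeQuery {
    // Union the selected coordinate sets within one dimension
    // (exact analogue of merging coordinate sketches).
    static Set<String> mergeCoordinates(List<Set<String>> selected) {
        Set<String> union = new HashSet<>();
        for (Set<String> s : selected) union.addAll(s);
        return union;
    }

    // Intersect the per-dimension results across dimensions
    // (exact analogue of the final sketch Intersection).
    static Set<String> query(List<Set<String>> perDimension) {
        Set<String> result = new HashSet<>(perDimension.get(0));
        for (int i = 1; i < perDimension.size(); i++) {
            result.retainAll(perDimension.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> age19to25 = Set.of("u1", "u2", "u3");
        Set<String> genderF = Set.of("u2", "u3", "u4");
        Set<String> india = Set.of("u3", "u4", "u5");
        Set<String> usa = Set.of("u2", "u6");
        // Query: AgeGroup=19-25 AND Gender=F AND Country IN (India, USA).
        Set<String> country = mergeCoordinates(List.of(india, usa));
        Set<String> hits = query(List.of(age19to25, genderF, country));
        System.out.println(hits.size()); // prints 2 (u2 and u3)
    }
}
```

With real sketches the unions are cheap and nearly error-free, while the final intersection is where the relative error grows, which is exactly the caveat discussed above.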
>>>>> On Thu, Apr 28, 2022 at 1:35 AM vijay rajan <vijay.sankar.ra...@gmail.com> wrote:
>>>>>> Hi,
>>>>>> Just like theta sketches are used for distinct count metrics like impressions and clicks, is there a sketch (perhaps quantile?) that can be used for metrics like ad_spend? If so, what are the error bounds?
>>>>>> There is a big opportunity that I see in storing very little data in sketches (which I use as a set), which makes retrieval of aggregate/analytic data very fast (although it is approximate).
>>>>>> This question is best explained with an example. Say I have a fact table schema as follows:
>>>>>> 1. *Age_Groups* is a dimension with 10 bucketed distinct values {less than 18, 19-25, 26-35, ...}
>>>>>> 2. *Gender* is a dimension with 3 distinct values F, M & Unknown
>>>>>> 3. *Country* is a dimension with 200 possible values
>>>>>> 4. *Impressions* is a metric which is a long count (perhaps with Theta sketch or HLL)
>>>>>> 5. *Clicks* is a metric which is a long count (perhaps with Theta sketch or HLL)
>>>>>> 6. *Ad-Spend* is a metric which is a double which is a sum (*perhaps with quantile sketch??*)
>>>>>> The maximum possible number of entries in this table would be the cross product of the cardinalities of the dimensions, which is 10 (Age_Group) x 3 (Gender) x 200 (Country) = 6000. Now instead of storing 6000 records, I could store only (10 + 3 + 200) * *3 sketches (one each for impressions, clicks and Ad-Spend)* = 639 sketches and accomplish the group-by queries using the set operations that sketches provide.
>>>>>> For the impression metric, one sketch for Gender=F, another sketch for Gender=M, yet another for Gender=Unknown, and so on for the other dimensions as well.
>>>>>> For the click metric, one sketch for Gender=F, another sketch for Gender=M, yet another for Gender=Unknown, and so on for the other dimensions as well.
>>>>>> The question is: what sketch would I need for ad-spend?
>>>>>> On a side note, I have used Theta sketches in this manner and even implemented ECLAT for frequent itemset computations and association rule mining ... example from another project below with theta sketches used for count; I do not know how to do the same for a metric like Ad-Spend.
>>>>>> Level Conjunction Count
>>>>>> 2 H10 & MemberGender=F 74
>>>>>> 2 M15 & MemberGender=F 83
>>>>>> 2 G31 & MemberGender=F 59
>>>>>> 2 MemberGender=F & R13 66
>>>>>> *In the example above, H10, M15, etc. are ICD (International Classification of Diseases) codes for specific diseases.* https://www.aapc.com/codes/code-search/
>>>>>> Hope this is a clear representation of the problem.
>>>>>> Regards
>>>>>> Vijay