[ https://issues.apache.org/jira/browse/FLINK-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179276#comment-16179276 ]

ASF GitHub Bot commented on FLINK-7465:
---------------------------------------

Github user jparkie commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4652#discussion_r140828795
  
    --- Diff: flink-libraries/flink-table/src/main/java/org/apache/flink/table/runtime/functions/aggfunctions/cardinality/ICardinality.java ---
    @@ -0,0 +1,80 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.table.runtime.functions.aggfunctions.cardinality;
    +
    +import java.io.IOException;
    +
    +/**
    + * Interface for cardinality estimators.
    + */
    +public interface ICardinality {
    +
    +   /**
    +    * Offer a stream element to the estimator and check whether it affects the estimate.
    +    *
    +    * @param o stream element
    +    * @return false if the value returned by cardinality() is unaffected by the appearance of o in the stream.
    +    */
    +   boolean offer(Object o);
    +
    +   /**
    +    * Offer the value as a hashed long value.
    +    *
    +    * @param hashedLong - the hash of the item to offer to the estimator
    +    * @return false if the value returned by cardinality() is unaffected by the appearance of hashedLong in the stream
    +    */
    +   boolean offerHashed(long hashedLong);
    +
    +   /**
    +    * Offer the value as a hashed int value.
    +    *
    +    * @param hashedInt - the hash of the item to offer to the estimator
    +    * @return false if the value returned by cardinality() is unaffected by the appearance of hashedInt in the stream
    +    */
    +   boolean offerHashed(int hashedInt);
    +
    +   /**
    +    * @return the number of unique elements in the stream or an estimate thereof.
    +    */
    +   long cardinality();
    +
    +   /**
    +    * @return size in bytes needed for serialization.
    +    */
    +   int sizeof();
    +
    +   /**
    +    * Get the byte array used for the calculation.
    +    *
    +    * @return The byte array used for the calculation
    +    * @throws IOException
    +    */
    +   byte[] getBytes() throws IOException;
    +
    +   /**
    +    * Merges estimators to produce a new estimator for the combined streams
    +    * of this estimator and those passed as arguments.
    +    * <p>
    +    * Neither this estimator nor the ones passed as arguments are modified.
    +    *
    +    * @param estimators Zero or more compatible estimators
    +    * @throws Exception If at least one of the estimators is not compatible with this one
    +    */
    +   ICardinality merge(ICardinality... estimators) throws Exception;
    --- End diff --
    
    Wouldn't it be nicer to have a more specific Exception?
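
One way the reviewer's suggestion could look, as a minimal sketch. It assumes a hypothetical checked exception, `CardinalityMergeException`, in place of the bare `Exception` on `merge`, and trims `ICardinality` to three methods. The toy `LinearCounting` estimator (linear counting over a bitmap, with `hashCode()` as a stand-in hash) exists only to show when `merge` should fail; none of these names come from the PR.

```java
import java.util.BitSet;

/** Hypothetical checked exception for estimators that cannot be merged. */
class CardinalityMergeException extends Exception {
    CardinalityMergeException(String message) {
        super(message);
    }
}

/** Trimmed copy of ICardinality from the diff, with merge() narrowed to one exception type. */
interface ICardinality {
    boolean offer(Object o);

    long cardinality();

    ICardinality merge(ICardinality... estimators) throws CardinalityMergeException;
}

/** Toy linear-counting estimator; only here to show when merge() should fail. */
final class LinearCounting implements ICardinality {
    private final int m;         // bitmap size; estimators must agree on it to be mergeable
    private final BitSet bits;

    LinearCounting(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    @Override
    public boolean offer(Object o) {
        // hashCode() is a stand-in hash; a real estimator would use a stronger one
        int index = Math.floorMod(o.hashCode(), m);
        boolean changed = !bits.get(index);
        bits.set(index);
        return changed;
    }

    @Override
    public long cardinality() {
        int zeros = m - bits.cardinality();
        if (zeros == 0) {
            return m; // saturated bitmap: m is only a lower bound
        }
        // Linear counting: n is estimated as -m * ln(zeros / m)
        return Math.round(-m * Math.log((double) zeros / m));
    }

    @Override
    public ICardinality merge(ICardinality... estimators) throws CardinalityMergeException {
        // Work on a fresh copy so that neither this nor the arguments are modified
        LinearCounting merged = new LinearCounting(m);
        merged.bits.or(bits);
        for (ICardinality other : estimators) {
            if (!(other instanceof LinearCounting) || ((LinearCounting) other).m != m) {
                throw new CardinalityMergeException("incompatible estimator: " + other);
            }
            merged.bits.or(((LinearCounting) other).bits);
        }
        return merged;
    }
}
```

Callers can then catch the specific type rather than a blanket `Exception`, which also keeps unrelated runtime failures from being mistaken for an incompatibility between estimators.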


> Add build-in BloomFilterCount on TableAPI&SQL
> ---------------------------------------------
>
>                 Key: FLINK-7465
>                 URL: https://issues.apache.org/jira/browse/FLINK-7465
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>         Attachments: bloomfilter.png
>
>
> This JIRA uses a Bloom filter to implement counting functions.
> Bloom filter algorithm description:
> An empty Bloom filter is a bit array of m bits, all set to 0. There must also 
> be k different hash functions defined, each of which maps or hashes some set 
> element to one of the m array positions, generating a uniform random 
> distribution. Typically, k is a constant, much smaller than m, which is 
> proportional to the number of elements to be added; the precise choice of k 
> and the constant of proportionality of m are determined by the intended false 
> positive rate of the filter.
> To add an element, feed it to each of the k hash functions to get k array 
> positions. Set the bits at all these positions to 1.
> To query for an element (test whether it is in the set), feed it to each of 
> the k hash functions to get k array positions. If any of the bits at these 
> positions is 0, the element is definitely not in the set – if it were, then 
> all the bits would have been set to 1 when it was inserted. If all are 1, 
> then either the element is in the set, or the bits have by chance been set to 
> 1 during the insertion of other elements, resulting in a false positive.
> An example of a Bloom filter, representing the set {x, y, z}. The colored 
> arrows show the positions in the bit array that each set element is mapped 
> to. The element w is not in the set {x, y, z}, because it hashes to one 
> bit-array position containing 0. For this figure, m = 18 and k = 3. The 
> sketch is as follows:
> !bloomfilter.png!
> Reference:
> 1. https://en.wikipedia.org/wiki/Bloom_filter
> 2. https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hive/common/util/BloomFilter.java
> Hi [~fhueske] [~twalthr], I would appreciate it if you could give me some advice. :-)
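
The algorithm quoted above maps onto a small class. The Java sketch below is illustrative only: the class name `BloomFilterCounter`, the MurmurHash3 `fmix64` finalizer applied to `hashCode()`, and double hashing (position i = (h1 + i * h2) mod m) for deriving the k positions are assumptions, not taken from the PR or from the Hive `BloomFilter` in reference 2. It sizes the filter with the standard formulas m = ceil(-n ln p / (ln 2)^2) and k = round((m / n) ln 2) for n expected elements and false positive rate p, and counts an element as distinct when at least one of its k bits was still 0, i.e. when it was definitely not added before.

```java
import java.util.BitSet;

/** Hypothetical Bloom-filter-based distinct counter (illustrative sketch only). */
final class BloomFilterCounter {
    private final int m;        // number of bits in the array
    private final int k;        // number of hash functions
    private final BitSet bits;
    private long count;         // elements that were definitely new when added

    BloomFilterCounter(long expectedInsertions, double falsePositiveRate) {
        if (expectedInsertions <= 0 || falsePositiveRate <= 0 || falsePositiveRate >= 1) {
            throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        }
        this.m = optimalNumBits(expectedInsertions, falsePositiveRate);
        this.k = optimalNumHashes(expectedInsertions, m);
        this.bits = new BitSet(m);
    }

    /** m = ceil(-n * ln(p) / (ln 2)^2). */
    static int optimalNumBits(long n, double p) {
        return (int) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** k = round((m / n) * ln 2), at least 1. */
    static int optimalNumHashes(long n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** MurmurHash3 fmix64 finalizer: spreads the 32-bit hashCode() over 64 bits. */
    private static long mix64(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** i-th array position via double hashing; elements with equal hashCode() collide. */
    private int position(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32);
        return Math.floorMod(h1 + i * h2, m);
    }

    /** Sets the k bits for o; returns true if any bit was 0, i.e. o was definitely new. */
    boolean put(Object o) {
        long hash = mix64(o.hashCode());
        boolean changed = false;
        for (int i = 0; i < k; i++) {
            int index = position(hash, i);
            if (!bits.get(index)) {
                bits.set(index);
                changed = true;
            }
        }
        if (changed) {
            count++;
        }
        return changed;
    }

    /** False means definitely absent; true means present or a false positive. */
    boolean mightContain(Object o) {
        long hash = mix64(o.hashCode());
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(hash, i))) {
                return false;
            }
        }
        return true;
    }

    /** Approximate number of distinct elements added so far. */
    long count() {
        return count;
    }
}
```

Because a false positive makes a new element look like a repeat, `count()` can only undercount the true number of distinct elements. The boolean returned by `put` mirrors the `offer` contract in `ICardinality` (false when the estimate is unaffected).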



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
