GitHub user rmatharu opened a pull request:

    https://github.com/apache/samza/pull/796

    Enabling auto-discovery of regex input topics

    This PR makes the following changes:
    
    * Extends StreamPartitionCountMonitor to periodically match the
      configured input regexes against the actual input streams, and to stop
      the job when a new input stream is discovered.

    * Adds a new API to SysAdmin that lists all streams, e.g., Kafka topics.
      The KafkaSysAdmin implementation uses KafkaConsumer's listTopics API.
      (Even if listTopics returned 1 million topics at 100 bytes per topic,
      the temporary memory overhead would be about 100 MB.)

    * Adds the config job.coordinator.monitor-input-regex.frequency.ms for
      the monitoring frequency, and job.coordinator.monitor-input-regex.%s
      for each input system, where %s is the system name. Users can then
      choose the desired regex for each input system, e.g.,
      job.coordinator.monitor-input-regex.kafka=test-.*.

    * A later change can extend the RegexTopicGen rewriter to add a
      monitor-input-regex config, so that jobs using it get periodic
      monitoring.
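
The regex matching described in the list above can be illustrated with a minimal sketch. This is not the code in the PR: the class `InputRegexSketch` and its methods are hypothetical names, and it assumes the regex must match the whole stream name (`Matcher.matches()`), which the description does not specify. Only the config keys are taken from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of the Samza code base.
final class InputRegexSketch {
  // Config keys as named in the PR description.
  static final String REGEX_PREFIX = "job.coordinator.monitor-input-regex.";
  static final String FREQUENCY_KEY = REGEX_PREFIX + "frequency.ms";

  // Collects the per-system regexes, skipping the frequency key that
  // shares the same prefix.
  static Map<String, Pattern> regexesBySystem(Map<String, String> config) {
    Map<String, Pattern> result = new HashMap<>();
    for (Map.Entry<String, String> entry : config.entrySet()) {
      String key = entry.getKey();
      if (key.startsWith(REGEX_PREFIX) && !key.equals(FREQUENCY_KEY)) {
        String systemName = key.substring(REGEX_PREFIX.length());
        result.put(systemName, Pattern.compile(entry.getValue()));
      }
    }
    return result;
  }

  // Returns the streams that match the regex but are not yet inputs of the job.
  // Assumption: full-name match; the real monitor may match differently.
  static Set<String> newlyDiscovered(Pattern regex, Set<String> allStreams,
                                     Set<String> knownInputs) {
    Set<String> discovered = new TreeSet<>();
    for (String stream : allStreams) {
      if (regex.matcher(stream).matches() && !knownInputs.contains(stream)) {
        discovered.add(stream);
      }
    }
    return discovered;
  }
}
```

With job.coordinator.monitor-input-regex.kafka=test-.*, a known input test-a, and a topic listing of test-a, test-b and other, `newlyDiscovered` returns only test-b. A non-empty result is the condition under which the monitor described above would stop the job.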

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmatharu/samza newtopic-test

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/796.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #796
    
----
commit f33839e9b7eae354a790d0002352c732c5f6868f
Author: Ray Matharu <rmatharu@...>
Date:   2018-11-06T01:58:13Z

    Full-working logic for new topic discovery

----

