GitHub user rmatharu opened a pull request: https://github.com/apache/samza/pull/796
Enabling auto-discovery of regex input topics

This PR makes the following changes:

* Enriches StreamPartitionCountMonitor to periodically match the input regexes against the actual input streams, and to stop the job when a new input stream is discovered.
* Adds a new API to SysAdmin for listing all streams, e.g., Kafka topics. The KafkaSysAdmin implementation uses KafkaConsumer's listTopics API. (Even if listTopics returned 1 million topics at 100 bytes per topic, the temporary memory overhead would be 100 MB.)
* Adds the config job.coordinator.monitor-input-regex.frequency.ms for the monitoring frequency, and job.coordinator.monitor-input-regex.%s for each input system. Users can then choose the desired regex for each input system, e.g., job.coordinator.monitor-input-regex.kafka=test-.*.
* We can later enrich the RegexTopicGen rewriter to add a monitor-input-regex config to allow periodic monitoring.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmatharu/samza newtopic-test

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/796.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #796

----
commit f33839e9b7eae354a790d0002352c732c5f6868f
Author: Ray Matharu <rmatharu@...>
Date:   2018-11-06T01:58:13Z

    Full-working logic for new topic discovery

----
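
For reviewers, the discovery check described in the first bullet can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class name RegexTopicDiscovery and the method findNewTopics are hypothetical, and the allTopics set merely stands in for whatever the new stream-listing API returns. The regex corresponds to a value such as job.coordinator.monitor-input-regex.kafka=test-.*.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Samza: illustrates the matching step only.
final class RegexTopicDiscovery {

    // Returns the topics that fully match the regex but are not yet known inputs.
    static Set<String> findNewTopics(Set<String> allTopics,
                                     Set<String> knownInputs,
                                     Pattern inputRegex) {
        Set<String> discovered = new TreeSet<>();
        for (String topic : allTopics) {
            if (inputRegex.matcher(topic).matches() && !knownInputs.contains(topic)) {
                discovered.add(topic);
            }
        }
        return discovered;
    }

    public static void main(String[] args) {
        // Stand-in for the result of listing all streams on the input system.
        Set<String> allTopics = new HashSet<>(Arrays.asList("test-a", "test-b", "test-c", "other"));
        // Inputs the running job already consumes.
        Set<String> knownInputs = new HashSet<>(Arrays.asList("test-a", "test-b"));
        Pattern inputRegex = Pattern.compile("test-.*");

        Set<String> discovered = findNewTopics(allTopics, knownInputs, inputRegex);
        if (!discovered.isEmpty()) {
            // In the PR's design, a non-empty result is what would lead to stopping the job.
            System.out.println("New input streams discovered: " + discovered);
        }
    }
}
```

A monitor would run such a check once per configured interval. Full matching (matches rather than find) is assumed here, so the pattern test-.* does not pick up a topic like my-test-a.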