GitHub user lhaiesp opened a pull request: https://github.com/apache/samza/pull/633

    SAMZA-1870: hdfs offset comparator to handle end of stream offset

    This happens particularly when using HDFS as a bootstrap stream:

    org.apache.samza.SamzaException: Invalid offset for MultiFileHdfsReader: END_OF_STREAM
        at org.apache.samza.system.hdfs.reader.MultiFileHdfsReader.getCurFileIndex(MultiFileHdfsReader.java:64)
        at org.apache.samza.system.hdfs.HdfsSystemAdmin.offsetComparator(HdfsSystemAdmin.java:224)
        at org.apache.samza.system.chooser.BootstrappingChooser.org$apache$samza$system$chooser$BootstrappingChooser$$checkOffset(BootstrappingChooser.scala:274)
        at org.apache.samza.system.chooser.BootstrappingChooser.choose(BootstrappingChooser.scala:204)
        at org.apache.samza.system.chooser.DefaultChooser.choose(DefaultChooser.scala:294)
        at org.apache.samza.system.SystemConsumers.choose(SystemConsumers.scala:210)
        at org.apache.samza.task.AsyncRunLoop.chooseEnvelope(AsyncRunLoop.java:208)
        at org.apache.samza.task.AsyncRunLoop.run(AsyncRunLoop.java:156)
        at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:787)
        at org.apache.samza.runtime.LocalContainerRunner.run(LocalContainerRunner.java:101)
        at org.apache.samza.runtime.LocalContainerRunner.main(LocalContainerRunner.java:148)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lhaiesp/samza master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #633

----
commit 42b80cc34a999955b79997494fe078f8024c9c2c
Author: Hai Lu <halu@...>
Date:   2018-09-11T15:49:51Z

    hdfs offset comparator to handle end of stream offset

----
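
For readers without the diff at hand, here is a minimal sketch of the kind of behavior the title describes. It is not the actual patch. The only detail taken from this message is the END_OF_STREAM sentinel string in the exception. The class name, the helper names, and the assumed "fileIndex:recordOffset" offset layout are hypothetical and chosen for illustration only. The idea is to check for the sentinel before attempting to parse, and to treat it as greater than any regular offset.

```java
import java.util.Comparator;

// Hypothetical sketch, not Samza code: compares offsets of the assumed form
// "fileIndex:recordOffset" and treats the END_OF_STREAM sentinel as the
// largest possible offset instead of trying to parse it.
final class EndOfStreamAwareOffsetComparator implements Comparator<String> {
    // Sentinel string as it appears in the exception message above.
    static final String END_OF_STREAM = "END_OF_STREAM";

    @Override
    public int compare(String a, String b) {
        boolean aEnd = END_OF_STREAM.equals(a);
        boolean bEnd = END_OF_STREAM.equals(b);
        if (aEnd || bEnd) {
            // Equal if both are the sentinel; otherwise the sentinel sorts last.
            return Boolean.compare(aEnd, bEnd);
        }
        long[] pa = parse(a);
        long[] pb = parse(b);
        // Order by file index first, then by position within the file.
        int byFile = Long.compare(pa[0], pb[0]);
        return byFile != 0 ? byFile : Long.compare(pa[1], pb[1]);
    }

    // Parses the assumed "fileIndex:recordOffset" layout (hypothetical format).
    private static long[] parse(String offset) {
        String[] parts = offset.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Invalid offset: " + offset);
        }
        return new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])};
    }
}
```

With this ordering, comparing "END_OF_STREAM" against a regular offset such as "0:5" yields a positive result rather than an exception, which is the property a bootstrap check needs in order to decide that a stream has caught up.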