Hi Rob,

Yes, your use case is a good fit. You can use Samza for fault-tolerant stream processing.
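As an illustration of what a stateless step could look like, here is a minimal sketch of a cleaning function for a single document. The class `DocumentCleaner` and its normalization rules are hypothetical and not part of Samza. It keeps no state between calls, so any task instance can handle any document.

```java
import java.text.Normalizer;

// Hypothetical stateless cleaning step for one text document (illustrative only).
// No state is kept between calls, so documents can be processed independently.
public final class DocumentCleaner {

    private DocumentCleaner() {}

    // Applies Unicode NFKC normalization, replaces control characters with spaces,
    // and collapses runs of whitespace into a single space.
    public static String clean(String raw) {
        if (raw == null) {
            return "";
        }
        String normalized = Normalizer.normalize(raw, Normalizer.Form.NFKC);
        // \p{Cntrl} matches ASCII control characters, including tabs and newlines.
        String noControl = normalized.replaceAll("\\p{Cntrl}", " ");
        // Collapse repeated whitespace and trim both ends.
        return noControl.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Prints: Hello, world! New line.
        System.out.println(clean("  Hello,\tworld!\n\nNew   line. "));
    }
}
```

In Samza's high-level API, a function like this could be applied to each message with `MessageStream.map`, assuming the documents are read from Kafka as strings. The exact input descriptors and serdes depend on the Samza version in use.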
At LinkedIn, we have document standardization use cases (e.g., member profiles and articles/blogs) powered by Samza. Please let us know if you have further questions!

On Sun, Apr 28, 2019 at 7:09 AM Rob Martin <rob.mart...@gmail.com> wrote:

> I'm looking at creating a distributed streaming pipeline for processing text
> documents (e.g., cleaning, NER and machine learning). Documents will generally
> be under 1 MB and processing will be stateless. I was aiming to feed documents
> from various sources and additional data into Kafka to be streamed to the
> processing pipeline in Samza. Would this be an appropriate use case for
> Samza?

--
Jagadish V,
Graduate Student, Department of Computer Science, Stanford University