Hi,

What would be the best approach for doing "blocking" operations in Samza?
For example, we have a Kafka stream of URLs for which we need to gather external data via HTTP (such as the Alexa rank, the page title, and headers). Other scenarios include database access and decision making via a rule engine.

Samza processes messages in a single thread, and HTTP requests might take hundreds of milliseconds. With the single-threaded design the throughput would be very limited, which could be solved with an asynchronous approach. However, the Samza documentation explicitly states "*You are strongly discouraged from using threads in your job’s code*".

It seems that Samza's design suits "data transformation" scenarios very well; what is not clear is how well it can support calls to external services.

Thanks,
Michael Sklyar
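
P.S. To make the throughput concern concrete, here is a minimal sketch in plain Java, deliberately outside Samza and using no Samza API. Everything in it is an assumption for illustration: `fetchTitle` is a hypothetical stand-in that sleeps instead of doing a real HTTP call, the 100 ms latency is made up, and the thread pool is only one possible way to overlap requests. Whether something like this is acceptable inside a Samza task is exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingVsAsyncSketch {
    // Assumed latency of one external call; not a measured value.
    static final long SIMULATED_LATENCY_MS = 100;

    // Hypothetical stand-in for an HTTP lookup: sleeps to simulate the wait.
    public static String fetchTitle(String url) {
        try {
            Thread.sleep(SIMULATED_LATENCY_MS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return "title-of:" + url;
    }

    // One message at a time, as in a single-threaded processing loop.
    public static List<String> processSequentially(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            out.add(fetchTitle(url));
        }
        return out;
    }

    // Up to maxInFlight lookups overlap; results are returned in input order.
    public static List<String> processConcurrently(List<String> urls, int maxInFlight) {
        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(CompletableFuture.supplyAsync(() -> fetchTitle(url), pool));
            }
            List<String> out = new ArrayList<>();
            for (CompletableFuture<String> f : futures) {
                out.add(f.join()); // wait for each lookup to finish
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            urls.add("http://example.com/" + i);
        }

        long t0 = System.nanoTime();
        processSequentially(urls);
        long t1 = System.nanoTime();
        processConcurrently(urls, 10);
        long t2 = System.nanoTime();

        System.out.println("sequential ms: " + (t1 - t0) / 1_000_000);
        System.out.println("concurrent ms: " + (t2 - t1) / 1_000_000);
    }
}
```

With an assumed 100 ms per call, the sequential loop cannot exceed roughly 10 messages per second, however fast the rest of the processing is. Overlapping the calls removes that bound, but it relies on extra threads, which is what the documentation warns against.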