There is obviously no one-size-fits-all answer; it depends on a lot of factors: how much data will you be ingesting, what the data source is (a firehose, a web front end, or an app that batches messages), how much processing you will be doing in the Storm/Kafka layer, and of course the rate at which you will persist data to your sink. All of these factors determine your topology. Storm and Spark are memory intensive, but if you are streaming, as would be the case with Kafka, that should not be much of an issue.
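To make those factors concrete, here is a minimal back-of-envelope sketch of the kind of arithmetic involved. Every number in it (message rate, message size, per-broker write throughput, replication factor) is a hypothetical assumption for illustration, not a recommendation from this thread:

```python
import math

def brokers_needed(msgs_per_sec, avg_msg_bytes, broker_mb_per_sec, replication=1):
    """Rough estimate of Kafka broker count from ingest rate.

    All inputs are assumptions the reader must supply from their own
    measurements; broker_mb_per_sec is the sustained write throughput
    of one broker on their hardware.
    """
    ingest_mb_per_sec = msgs_per_sec * avg_msg_bytes / 1_000_000
    total_write_mb = ingest_mb_per_sec * replication
    return max(1, math.ceil(total_write_mb / broker_mb_per_sec))

# Hypothetical example: 50,000 msgs/s of ~1 KB each, 3x replication,
# assuming each broker sustains about 75 MB/s of writes.
print(brokers_needed(50_000, 1_000, 75, replication=3))  # 2
```

The same style of estimate applies to Storm worker count (processing cost per tuple versus cores per box) and to the sink's write rate; the ingest rate is usually the easiest place to start.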
On Sunday, March 8, 2015 11:26 PM, "Adaryl "Bob" Wakefield, MBA"
<[email protected]> wrote:
Let’s say you put together a real-time streaming solution using Storm, Kafka, the necessary Zookeeper, and whatever storage tech you decide on. Is it true that these applications are so resource intensive that they all need to live by themselves on their own machines? Put another way, for the ingestion portion, is the minimum number of machines required here 9?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData