On Nov 21, 2012, at 11:37 AM, German Blanco wrote:

> Hello,
>
> My problem is similar to the one in this thread:
> S4-Piper: Scalability in input adapter Fri, 12 Oct 2012
>
> The solution proposes to "distribute the connections among adapter nodes".
> Would the distribution be done in the client application that connects to the
> adaptors?
> Or else, how?

That really depends on your use case, infrastructure, and the kind of preprocessing you need to do in the adapter.

Usually you would use several adapter nodes because the input stream is large and fast, so you need more processing power to convert it into S4 events in a timely fashion.

If you control the input stream provider:
- If you can "tee" the input traffic (that would be the role of the client app in front of the adapter), then it's simple to distribute it to various adapter nodes.
- If a pub/sub messaging system (like Kafka) provides the input stream, you may configure it to split the stream so that different adapters fetch different data.

If you don't control the input stream provider:
- If you have only 1 input connection but there is a lot of work to do in the adapter (for instance, enrichment), then you'd benefit from listening to the input stream on a single adapter node while still using several adapter nodes to parallelize the processing (in keyed PEs).
- If you have only 1 input connection and the conversion is trivial, but the input stream is really big, you might try batching the data in the listening adapter node, then parallelizing the processing of the batches.

Hope this helps,

Matthieu
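
P.S. To make the batching case a bit more concrete, here is a minimal Java sketch of the grouping step in the listening node. It does not use the S4 API: `BatchingSketch`, `toBatches` and `nodeFor` are hypothetical names, and the round-robin assignment is only one assumption about how batches could be spread over the processing nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of S4: groups raw input records into
// fixed-size batches and picks a target node for each batch.
class BatchingSketch {

    // Split the records read from the single input connection into batches
    // of at most batchSize elements, preserving arrival order.
    static List<List<String>> toBatches(Iterable<String> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String record : records) {
            current.add(record);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // last, possibly smaller, batch
        }
        return batches;
    }

    // Round-robin assignment of a batch to one of nodeCount processing nodes
    // (an assumption; keying on batch content would work as well).
    static int nodeFor(int batchIndex, int nodeCount) {
        return batchIndex % nodeCount;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("e1", "e2", "e3", "e4", "e5");
        List<List<String>> batches = toBatches(input, 2);
        for (int i = 0; i < batches.size(); i++) {
            System.out.println("batch " + i + " -> node " + nodeFor(i, 2) + ": " + batches.get(i));
        }
    }
}
```

In a real app, each batch would presumably be wrapped in an event whose key plays the role of `nodeFor`, so that the platform does the routing; the sketch only shows how the stream might be cut into batches.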