What we do is the following: when we receive MOs or DLRs, Kannel posts them to a Java web service running under Tomcat (a load-balanced cluster), which does nothing else (or almost nothing else) than insert them into a message queue. That way the web servers process them very fast without putting load on their servers or killing a backend database.
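(For illustration only, a receiver in that style might look roughly like the sketch below. The class name and JNDI names are invented, and it assumes a container-managed JMS connection factory and queue; Kannel simply requests the configured URL with an HTTP GET, so the servlet only copies the parameters and enqueues them.)

// Hypothetical sketch: a DLR/MO receiver that only copies the request
// parameters and pushes them onto a JMS queue, then returns immediately.
import java.io.IOException;
import java.util.Enumeration;
import java.util.HashMap;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DlrReceiverServlet extends HttpServlet {

    private ConnectionFactory factory;   // assumed to be pooled by the container
    private Queue queue;

    @Override
    public void init() throws ServletException {
        try {
            InitialContext ctx = new InitialContext();
            factory = (ConnectionFactory) ctx.lookup("jms/DlrConnectionFactory"); // invented name
            queue = (Queue) ctx.lookup("jms/DlrQueue");                           // invented name
        } catch (Exception e) {
            throw new ServletException("JMS resources not available", e);
        }
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Copy whatever query parameters the dlr-url/MO URL carries.
        HashMap<String, String> dlr = new HashMap<String, String>();
        Enumeration<?> names = req.getParameterNames();
        while (names.hasMoreElements()) {
            String name = (String) names.nextElement();
            dlr.put(name, req.getParameter(name));
        }
        try {
            Connection conn = factory.createConnection();
            try {
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                producer.send(session.createObjectMessage(dlr)); // enqueue and forget
            } finally {
                conn.close();
            }
            resp.setStatus(HttpServletResponse.SC_OK);
        } catch (Exception e) {
            // Report an error so Kannel can retry instead of silently losing the DLR.
            resp.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
        }
    }
}

(The point is that the request thread does no database work at all; anything slow happens later, off the HTTP path.)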
Then a Java daemon consumes the MOs and DLRs from their queues, doing whatever it needs to do with them. There we use database pools, cache common information, etc.

Regards,
Juan

2010/4/1 Alejandro Guerrieri <alejandro.guerri...@gmail.com>:
> Hmm, I wouldn't venture an opinion about the technologies involved without really knowing what's going on infrastructure-wise.
> IMHO there's nothing wrong with your setup from a high-level perspective. Of course I can't tell a thing about how it is configured, dimensioned and/or being used, but it seems to be an enterprise-grade platform and it should be able to handle tons of traffic if used properly.
> To give you a real example, we use Tomcat to handle MOs and DLRs. I can tell you we handle many, many hundreds of requests per second without even noticing the load on the servers. WAY more than what we were doing with our former PHP solution.
> NOTE: However, if the application layer is poorly designed, there's nothing like Java to bring a system to its knees in no time ;) but that wouldn't be a Java problem, it would be a design problem. With enough (or too little) skill, you can trash a server using any language of choice :D
> Now seriously, Kannel doesn't spawn infinite threads to hit the application server when a DLR arrives. I'm not sure how many requests per second you're getting, but using the proper approach you should be able to deal with it with a cluster setup for sure.
> I don't know what you're doing when you receive the DLRs, but whatever it is, if you're expecting high traffic you need to adopt some strategies in order to be able to handle it. My rather uninformed recommendations for your case are:
> 1. Try to do as little as possible when receiving the DLR. You can post-process them afterwards in a cron job, a batch, a separate daemon, a pull from a queue, whatever (a consumer along those lines is sketched right after this message).
> 2. Try to finish the process as fast as possible. This is more or less the same as 1: the less time you take processing each DLR, the better.
> 3. Profile the applications involved. Sometimes a silly bug somewhere holds things longer than needed and brings everything to a crawl, e.g. a misconfigured DB pool, objects being held in memory longer than needed, or not enough memory assigned to an application.
> 4. Last but not least: load balancers, message queues (JMS) and memory caches (ehcache, memcached, etc.) are your best friends when designing high-performance platforms ;)
> Hope it helps,
> Alex
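(To make points 1 and 4 above concrete, the consuming side of the queue described at the top of the thread could look roughly like this sketch. Everything here is illustrative: the JNDI names jms/DlrConnectionFactory, jms/DlrQueue and jdbc/DlrPool and the dlr_log table are invented, SYSDATE assumes the Oracle database mentioned by Gabor, a real standalone daemon needs a jndi.properties pointing at its JMS provider, and a transacted or client-acknowledged session would be used if losing DLRs on a crash is unacceptable.)

// Hypothetical sketch of the consuming daemon: drain DLRs from the queue and
// batch-insert them through a pooled DataSource (names and table are invented).
import java.sql.PreparedStatement;
import java.util.Map;

import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class DlrConsumerDaemon {

    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();  // needs a jndi.properties for the provider
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/DlrConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/DlrQueue");
        DataSource ds = (DataSource) ctx.lookup("jdbc/DlrPool");

        javax.jms.Connection jms = factory.createConnection();
        Session session = jms.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(queue);
        jms.start();

        while (true) {
            java.sql.Connection db = ds.getConnection();
            try {
                db.setAutoCommit(false);
                PreparedStatement ps = db.prepareStatement(
                    "INSERT INTO dlr_log (msgid, status, received_at) VALUES (?, ?, SYSDATE)");
                int batched = 0;
                // Collect up to 500 DLRs, waiting at most 2 s for each, then flush in one commit.
                while (batched < 500) {
                    ObjectMessage msg = (ObjectMessage) consumer.receive(2000);
                    if (msg == null) {
                        break;                        // queue idle: flush whatever we have
                    }
                    Map<?, ?> dlr = (Map<?, ?>) msg.getObject();
                    ps.setString(1, (String) dlr.get("msgid"));
                    ps.setString(2, (String) dlr.get("status"));
                    ps.addBatch();
                    batched++;
                }
                if (batched > 0) {
                    ps.executeBatch();
                    db.commit();                      // one commit per batch, not per DLR
                }
                ps.close();
            } finally {
                db.close();                           // returns the connection to the pool
            }
        }
    }
}

(Batching a few hundred inserts per commit is usually what keeps the database comfortable even when DLRs arrive in bursts.)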
> 2010/4/1 Nikos Balkanas <nbalka...@gmail.com>
>> Big, big, big mistake. You are using a Java application server to do the work of a C web server. That is your problem. I am surprised that your system is not frozen from the many connections. You could at least invest in a JBoss, which is open source.
>>
>> Still my question stands: DLRs are not synchronous. That means you don't get them all at once when you stop sending SMS. SMS can be delivered at any time, and therefore DLRs are received at the time of delivery, i.e. any time. With 25 SMS/s, DLRs should be coming in at about the same throughput.
>>
>> Anyway, to answer your question: Kannel cannot limit its DLR output, since it doesn't have a queue specific for this, and it sends them as it receives them (OK, there is a queue at the SMSC driver, but that's not it). You have a problem in the architecture of your backend. Maybe you can talk to your SMSC and ask them to observe the same inbound throughput restrictions that they have for outbound traffic: 25 SMS/s in, 25 SMS/s out.
>>
>> Nikos
>> ----- Original Message ----- From: "Gabor Maros" <gabor.ma...@erstebank.hu>
>> To: <users@kannel.org>
>> Sent: Thursday, April 01, 2010 9:22 PM
>> Subject: Re: Too many dlr at once
>>
>> Hi Seikath,
>>
>> DLRs are very important for us; they go into an Oracle database, and the web application runs in a WebLogic cluster (which does the DB connection pooling). I cannot use other infrastructure. I think opening 40k network connections at once is not the best or most effective thing (allowing so many open files is risky), so I would just like a solution that doesn't allow Kannel to open such an amount of connections at once. I would like a solution where I can tell Kannel not to send back more than 100 DLRs/sec. This is normal throttling: a lot of applications use such tuning parameters to protect others, and I think Kannel does the same thing on the SMSC side, but I want it on the reverse side (the way flow control and windowing work).
>> The problem is that cimd2 offers the fewest possibilities to manage this, according to the documentation.
>>
>> Bye,
>> Gabor
>>
>> seikath wrote:
>>> In general, a DLR is not such important information that it has to be injected into the database right away. If you have a high load of MO/DLR, consider DB pooling and, even better, drop the HTTP requests. Apache or Lighty or even IIS can handle the traffic you have mentioned with no issues.
>>> What I do for a high load of MO/DLR is either use sqlbox to handle it, or simply write directly to simple XML files. Or you may parse the Kannel logs, which will require some regexp skills. I have implemented all of the above, depending on the specific project.
>>> The XML files can easily be loaded into a queue in the database later.
>>>
>>> On 04/01/2010 06:33 PM, Gabor Maros wrote:
>>>> Thanks Nikos,
>>>>
>>>> it may help, but there is another problem I haven't mentioned before. We have a web application that receives DLRs from Kannel. If Kannel gets 10k DLRs in one second, then Kannel tries to send all of them to the app in that same second. This behaviour kills the app (and the database behind it), because it gets 10,000 HTTP connections in one second, which is a huge amount compared to our peak time of 25 SMS/sec.
>>>> Unfortunately we are not NASA with unimaginable computing capacity, so the ideal solution for us would be a parameter that tells Kannel how many connections are allowed per second.
>>>>
>>>> Bye,
>>>> Gabor
>>>>
>>>> Nikos Balkanas wrote:
>>>>> Hi,
>>>>>
>>>>> Check whether you have an /etc/hosts file, and if you do, you should have specified your gateway host in it.
>>>>>
>>>>> Also check whether named is running (Linux).
>>>>>
>>>>> BR,
>>>>> Nikos
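(For reference, the kind of /etc/hosts entry Nikos means is a single line like the one below, so that smsbox can resolve the host the dlr-url points at without asking the resolver on every request; the address and hostnames are placeholders.)

# /etc/hosts -- pin the host that the dlr-url / application server lives on
192.0.2.10   dlr-app.example.com   dlr-app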
>>>>> ----- Original Message ----- From: "Gabor Maros" <gabor.ma...@erstebank.hu>
>>>>> To: <users@kannel.org>
>>>>> Sent: Thursday, April 01, 2010 12:58 PM
>>>>> Subject: Too many dlr at once
>>>>>
>>>>> Hi,
>>>>>
>>>>> I've got a Kannel install with an EMI SMSC connection. When we send lots of SMS to the SMSC at once, the delivery notifications only come at the end, when Kannel's queue is empty. The SMSC only accepts 10-15 SMS/sec but can send back a horrible amount at once. This is a problem for us, because Kannel gets thousands of DLRs in one second and ERROR messages appear in smsbox.log:
>>>>>
>>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>>
>>>>> ...after thousands of such normal log records we can see thousands of the following:
>>>>>
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>>
>>>>> Is there a configuration parameter that changes this behavior so that we can slow it down? I don't know why it happens, but there must be some kind of limit (I think it is not an open-file issue, but something similar). Maybe there is another side effect (but I'm not sure yet) in connection with the DLR database, because the number of SMS that are not in a final state (delivered or can't be delivered) keeps growing.
>>>>>
>>>>> Thanks,
>>>>> Gabor
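(As an aside, seikath's suggestion earlier in the thread, to skip the application server and database work in the request path and just spool DLRs to flat files that are bulk-loaded later, can be as small as the sketch below. The class name, spool path and parameter names are invented, and it writes tab-separated lines rather than the XML files seikath mentioned; the only point is that the receiver does nothing but append a line and return 200. A cron job or a separate loader can then rotate the file and bulk-load it into the database.)

// Hypothetical sketch of the "spool to a file, load later" approach: append one
// tab-separated line per DLR and return 200; no database or app-server work here.
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DlrSpoolServlet extends HttpServlet {

    private static final String SPOOL = "/var/spool/dlr/dlr-current.log"; // placeholder path

    @Override
    protected synchronized void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // synchronized keeps concurrent requests from interleaving their lines;
        // "msgid" and "status" stand for whatever parameter names the dlr-url carries.
        PrintWriter out = new PrintWriter(new FileWriter(SPOOL, true));
        try {
            out.println(System.currentTimeMillis() + "\t"
                    + req.getParameter("msgid") + "\t"
                    + req.getParameter("status"));
        } finally {
            out.close();
        }
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}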
--
Juan Nin
3Cinteractive / Mobilizing Great Brands
http://www.3cinteractive.com