Hmm, I wouldn't offer an opinion on the technologies involved without knowing what's really going on infrastructure-wise.
IMHO there's nothing wrong with your setup from a high-level perspective. Of course I can't say anything about how it is configured, dimensioned and/or used, but it seems to be an enterprise-grade platform and it should be able to handle tons of traffic if used properly.

To give you a real example, we use Tomcat to handle MOs and DLRs. I can tell you we handle many, many hundreds of requests per second without even noticing the load on the servers. WAY more than what we were doing with our former PHP solution.

NOTE: However, if the application layer is poorly designed, there's nothing like Java to bring a system to its knees in no time ;) but that wouldn't be a Java problem, it would be a design problem. With enough (or too little) skill, you can trash a server using any language of choice :D

Now seriously, Kannel doesn't spawn infinite threads to hit the application server when a DLR arrives. I'm not sure how many requests per second you're getting, but with the proper approach you should be able to deal with it with a cluster setup for sure.

I don't know what you're doing when you receive the DLRs, but whatever it is, if you're expecting high traffic you need to adopt some strategies in order to handle it. My rather uninformed recommendations for your case are:

1. Try to do as little as possible when receiving the DLR. You can post-process them afterwards in a cron job, a batch, a separate daemon, by pulling from a queue, whatever.

2. Try to finish the process as fast as possible. This is more or less the same as 1: the less time you take processing each DLR, the better.

3. Profile the applications involved. Sometimes a silly bug somewhere holds things longer than needed and brings everything to a crawl, e.g. a misconfigured DB pool, objects being held in memory longer than needed or not enough memory assigned to an application.

4. Last but not least: load balancers, message queues (JMS) and memory caches (ehcache, memcached, etc.) are your best friends when designing high-performance platforms ;)

Hope it helps,

Alex

2010/4/1 Nikos Balkanas <nbalka...@gmail.com>

> Big, big, big mistake. You are using a Java application server to do the work of a C web server. That is your problem. I am surprised that your system is not frozen from the many connections. You could at least invest in a JBoss that is open source.
>
> Still my question stands: DLRs are not synchronous. That means you don't get them all at once when you stop sending SMS. SMS can be delivered at any time, and therefore DLRs are received at the time of delivery, i.e. any time. With 25 SMS/s, DLRs should be coming in at about the same throughput.
>
> Anyway, to answer your question: Kannel cannot limit its DLR output, since it doesn't have a queue specific for this, and it sends them as it receives them (OK, there is a queue at the SMSC driver, but that's not it). You have a problem in the architecture of your backend. Maybe you can talk to your SMSC and ask them to observe the same inbound throughput restrictions that they have for outbound traffic. 25 SMS/s in, 25 SMS/s out.
>
> Nikos
> ----- Original Message -----
> From: "Gabor Maros" <gabor.ma...@erstebank.hu>
> To: <users@kannel.org>
> Sent: Thursday, April 01, 2010 9:22 PM
> Subject: Re: Too many dlr at once
>
> Hi Seikath,
>
> DLRs are very important for us. They are in an Oracle database, and the web application is running in a WebLogic cluster (it does DB connection pooling). I cannot use other infrastructure. I think opening 40k network connections at once is not the best or most effective thing (allowing so many open files is risky), so I would just like a solution that doesn't allow Kannel to open such a number of connections at once. I would like a solution where I can tell Kannel not to send back more than 100 DLRs/sec.
> This is a normal pooling system; a lot of applications use such tuning parameters to protect other systems, and I think Kannel uses the same thing on the SMSC side, but I want it on the reverse side (how flow control and window work).
> The problem is that, according to the documentation, cimd2 has the fewest options for managing this.
>
> Bye,
> Gabor
>
> seikath wrote:
>
>> In general, a DLR is not such important info that it must be injected right away into the database.
>> If you have a high load of MO/DLR, consider DB pooling and, even better, drop the HTTP requests.
>> Apache or Lighty or even IIS can handle the traffic you have mentioned with no issues.
>> What I do for a high load of MO/DLR is either use sqlbox to handle it, or simply write directly to simple XML files.
>> OR, you may parse the Kannel logs, which will require some regexp skills.
>> I have implemented all of the above, according to the specific projects.
>>
>> The XML files can easily be loaded later into a queue in the database.
>>
>> On 04/01/2010 06:33 PM, Gabor Maros wrote:
>>
>>> Thanks Nikos,
>>>
>>> it may help, but there is another problem I haven't mentioned before. We have a web application that receives DLRs from Kannel. If Kannel gets 10k DLRs in one second, then Kannel tries to send all of them to the app in the same second. This behaviour kills the app (and the database behind it) because it gets 10000 HTTP connections in one second, which is quite a huge amount compared to our peak time, when there are 25 SMs/sec.
>>> Unfortunately we are not NASA with unimaginable computing capacity, so the ideal solution for us would be a parameter that tells Kannel how many connections are allowed in one second.
>>>
>>> Bye,
>>> Gabor
>>>
>>> Nikos Balkanas wrote:
>>>
>>>> Hi,
>>>>
>>>> Check if you have /etc/hosts, and if you do, you should have specified your gateway.
>>>>
>>>> Also check if named is running (Linux).
>>>>
>>>> BR,
>>>> Nikos
>>>> ----- Original Message -----
>>>> From: "Gabor Maros" <gabor.ma...@erstebank.hu>
>>>> To: <users@kannel.org>
>>>> Sent: Thursday, April 01, 2010 12:58 PM
>>>> Subject: Too many dlr at once
>>>>
>>>> Hi,
>>>>
>>>> I've got a Kannel install with an EMI SMSC connection.
>>>> When we send lots of SMS to the SMSC at once, the delivery notifications only come at the end, when Kannel's queue is empty. The SMSC only accepts 10-15 SM/sec but can send back a horrible amount at once. This is a problem for us because Kannel gets thousands of DLRs in one second and ERROR messages appear in smsbox.log:
>>>>
>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>
>>>> ...after thousands of such normal log records we can see thousands of the following:
>>>>
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>
>>>> Is there a configuration parameter that changes this behavior so we can slow it down?
>>>> I don't know why it happens, but there must be some kind of limit (I think it is not an open file issue but something similar).
>>>> Maybe there is another side effect (but I'm not sure yet) in connection with the DLR database, because the number of SMs that are not in the end phase (delivered or can't be delivered) is growing.
>>>>
>>>> Thanks,
>>>> Gabor
>>>>
>>>> --
>>>> View this message in context: http://old.nabble.com/Too-many-dlr-at-once-tp28106589p28106589.html
>>>> Sent from the Kannel - User mailing list archive at Nabble.com.
>
> --
> View this message in context: http://old.nabble.com/Too-many-dlr-at-once-tp28106589p28112070.html
> Sent from the Kannel - User mailing list archive at Nabble.com.
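
A footnote on recommendations 1 and 2 at the top of this message (do as little as possible when the DLR arrives, post-process later): the idea can be sketched in a few lines of Python. Take it as a minimal sketch under assumptions, not a definitive implementation and not anything Kannel or WebLogic provides. DlrBuffer, receive(), drain_once(), run_forever() and the store_batch callback are all made-up names, and the in-memory queue only stands in for whatever queue (JMS, a staging table, files) fits your platform.

```python
import queue

# Hypothetical names throughout: DlrBuffer, receive(), drain_once(),
# run_forever() and store_batch are illustrations only. They are not part
# of Kannel, WebLogic or any other product mentioned in this thread.


class DlrBuffer:
    """Accept DLRs quickly and hand them to slow storage in batches."""

    def __init__(self, store_batch, max_pending=50000, batch_size=100):
        self._pending = queue.Queue(maxsize=max_pending)
        self._store_batch = store_batch  # e.g. one bulk INSERT per call
        self._batch_size = batch_size

    def receive(self, dlr):
        """Called from the HTTP handler: enqueue only, never touch the DB.

        Returns True if the DLR was queued, False if the buffer is full
        (the handler can then answer with an error status).
        """
        try:
            self._pending.put_nowait(dlr)
            return True
        except queue.Full:
            return False

    def drain_once(self, timeout=1.0):
        """Collect up to batch_size DLRs and store them in one call.

        Waits up to `timeout` seconds for the first DLR, then takes whatever
        else is already queued. Returns the number of DLRs stored.
        """
        try:
            batch = [self._pending.get(timeout=timeout)]
        except queue.Empty:
            return 0
        while len(batch) < self._batch_size:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        self._store_batch(batch)
        return len(batch)

    def run_forever(self, stop_event):
        """Worker loop for a background thread.

        `stop_event` is expected to be a threading.Event; the loop ends
        once it is set.
        """
        while not stop_event.is_set():
            self.drain_once()


if __name__ == "__main__":
    stored = []  # stands in for the database: one list entry per bulk write
    buf = DlrBuffer(store_batch=stored.append, batch_size=100)

    # Simulate a burst of 250 DLRs arriving at once.
    for i in range(250):
        buf.receive({"msg_id": i, "status": 1})

    # Drain synchronously here; in production a worker thread would do this.
    while buf.drain_once(timeout=0.01):
        pass

    print([len(b) for b in stored])  # prints [100, 100, 50]
```

The HTTP handler would call receive() and answer right away, while one background thread running run_forever() (or a cron job calling drain_once()) writes to the database at its own pace. With batch_size=100, a burst of 10k DLRs then turns into roughly 100 bulk writes instead of 10000 simultaneous ones. An in-memory queue loses its contents if the process dies, so a durable queue would be the safer choice when DLRs must not be lost.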