What we do is the following: when we receive MOs or DLRs, Kannel posts them to a Java web service running under Tomcat (a load-balanced cluster), which does nothing else (or almost nothing else) than insert them into a message queue. That way the web servers process them very fast without putting load on their servers or killing a backend database.
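(For illustration only, a receiver in that style might look roughly like the sketch below. The class name and JNDI names are invented, and it assumes a container-managed JMS connection factory and queue; Kannel simply requests the configured URL with an HTTP GET, so the servlet only copies the parameters and enqueues them.)

// Hypothetical sketch: a DLR/MO receiver that only copies the request
// parameters and pushes them onto a JMS queue, then returns immediately.
import java.io.IOException;
import java.util.Enumeration;
import java.util.HashMap;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DlrReceiverServlet extends HttpServlet {

    private ConnectionFactory factory;   // assumed to be pooled by the container
    private Queue queue;

    @Override
    public void init() throws ServletException {
        try {
            InitialContext ctx = new InitialContext();
            factory = (ConnectionFactory) ctx.lookup("jms/DlrConnectionFactory"); // invented name
            queue = (Queue) ctx.lookup("jms/DlrQueue");                           // invented name
        } catch (Exception e) {
            throw new ServletException("JMS resources not available", e);
        }
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Copy whatever query parameters the dlr-url/MO URL carries.
        HashMap<String, String> dlr = new HashMap<String, String>();
        Enumeration<?> names = req.getParameterNames();
        while (names.hasMoreElements()) {
            String name = (String) names.nextElement();
            dlr.put(name, req.getParameter(name));
        }
        try {
            Connection conn = factory.createConnection();
            try {
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                producer.send(session.createObjectMessage(dlr)); // enqueue and forget
            } finally {
                conn.close();
            }
            resp.setStatus(HttpServletResponse.SC_OK);
        } catch (Exception e) {
            // Report an error so Kannel can retry instead of silently losing the DLR.
            resp.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
        }
    }
}

(The point is that the request thread does no database work at all; anything slow happens later, off the HTTP path.)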
Then a Java daemon consumes the MOs and DLRs from their queues, doing whatever it needs to do with them. There we use database pools, cache common information, etc.

Regards,
Juan

2010/4/1 Alejandro Guerrieri <alejandro.guerri...@gmail.com>:
> Hmm, I wouldn't venture an opinion about the technologies involved without really knowing what's going on infrastructure-wise.
> IMHO there's nothing wrong with your setup from a high-level perspective. Of course I can't tell a thing about how it is configured, dimensioned and/or being used, but it seems to be an enterprise-grade platform and it should be able to handle tons of traffic if used properly.
> To give you a real example, we use Tomcat to handle MOs and DLRs. I can tell you we handle many, many hundreds of requests per second without even noticing the load on the servers. WAY more than what we were doing with our former PHP solution.
> NOTE: However, if the application layer is poorly designed, there's nothing like Java to bring a system to its knees in no time ;) but that wouldn't be a Java problem, it would be a design problem. With enough (or too little) skill, you can trash a server using any language of choice :D
> Now seriously, Kannel doesn't spawn infinite threads to hit the application server when a DLR arrives. I'm not sure how many requests per second you're getting, but using the proper approach you should be able to deal with it with a cluster setup for sure.
> I don't know what you're doing when you receive the DLRs, but whatever it is, if you're expecting high traffic you need to adopt some strategies in order to be able to handle it. My rather uninformed recommendations for your case are:
> 1. Try to do as little as possible when receiving the DLR. You can post-process them afterwards in a cron job, a batch, a separate daemon, a pull from a queue, whatever (a consumer along those lines is sketched right after this message).
> 2. Try to finish the process as fast as possible. This is more or less the same as 1: the less time you take processing each DLR, the better.
> 3. Profile the applications involved. Sometimes a silly bug somewhere holds things longer than needed and brings everything to a crawl, e.g. a misconfigured DB pool, objects being held in memory longer than needed, or not enough memory assigned to an application.
> 4. Last but not least: load balancers, message queues (JMS) and memory caches (ehcache, memcached, etc.) are your best friends when designing high-performance platforms ;)
> Hope it helps,
> Alex
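(To make points 1 and 4 above concrete, the consuming side of the queue described at the top of the thread could look roughly like this sketch. Everything here is illustrative: the JNDI names jms/DlrConnectionFactory, jms/DlrQueue and jdbc/DlrPool and the dlr_log table are invented, SYSDATE assumes the Oracle database mentioned by Gabor, a real standalone daemon needs a jndi.properties pointing at its JMS provider, and a transacted or client-acknowledged session would be used if losing DLRs on a crash is unacceptable.)

// Hypothetical sketch of the consuming daemon: drain DLRs from the queue and
// batch-insert them through a pooled DataSource (names and table are invented).
import java.sql.PreparedStatement;
import java.util.Map;

import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class DlrConsumerDaemon {

    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();  // needs a jndi.properties for the provider
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/DlrConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/DlrQueue");
        DataSource ds = (DataSource) ctx.lookup("jdbc/DlrPool");

        javax.jms.Connection jms = factory.createConnection();
        Session session = jms.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(queue);
        jms.start();

        while (true) {
            java.sql.Connection db = ds.getConnection();
            try {
                db.setAutoCommit(false);
                PreparedStatement ps = db.prepareStatement(
                    "INSERT INTO dlr_log (msgid, status, received_at) VALUES (?, ?, SYSDATE)");
                int batched = 0;
                // Collect up to 500 DLRs, waiting at most 2 s for each, then flush in one commit.
                while (batched < 500) {
                    ObjectMessage msg = (ObjectMessage) consumer.receive(2000);
                    if (msg == null) {
                        break;                        // queue idle: flush whatever we have
                    }
                    Map<?, ?> dlr = (Map<?, ?>) msg.getObject();
                    ps.setString(1, (String) dlr.get("msgid"));
                    ps.setString(2, (String) dlr.get("status"));
                    ps.addBatch();
                    batched++;
                }
                if (batched > 0) {
                    ps.executeBatch();
                    db.commit();                      // one commit per batch, not per DLR
                }
                ps.close();
            } finally {
                db.close();                           // returns the connection to the pool
            }
        }
    }
}

(Batching a few hundred inserts per commit is usually what keeps the database comfortable even when DLRs arrive in bursts.)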
> 2010/4/1 Nikos Balkanas <nbalka...@gmail.com>
>> Big, big, big mistake. You are using a Java application server to do the work of a C web server. That is your problem. I am surprised that your system is not frozen from the many connections. You could at least invest in a JBoss, which is open source.
>>
>> Still my question stands: DLRs are not synchronous. That means you don't get them all at once when you stop sending SMS. SMS can be delivered at any time, and therefore DLRs are received at the time of delivery, i.e. any time. With 25 SMS/s, DLRs should be coming in at about the same throughput.
>>
>> Anyway, to answer your question: Kannel cannot limit its DLR output, since it doesn't have a queue specific for this, and it sends them as it receives them (OK, there is a queue at the SMSC driver, but that's not it). You have a problem in the architecture of your backend. Maybe you can talk to your SMSC and ask them to observe the same inbound throughput restrictions that they have for outbound traffic: 25 SMS/s in, 25 SMS/s out.
>>
>> Nikos
>> ----- Original Message ----- From: "Gabor Maros" <gabor.ma...@erstebank.hu>
>> To: <users@kannel.org>
>> Sent: Thursday, April 01, 2010 9:22 PM
>> Subject: Re: Too many dlr at once
>>
>> Hi Seikath,
>>
>> DLRs are very important for us; they go into an Oracle database, and the web application runs in a WebLogic cluster (which does the DB connection pooling). I cannot use other infrastructure. I think opening 40k network connections at once is not the best or most effective thing (allowing so many open files is risky), so I would just like a solution that doesn't allow Kannel to open such an amount of connections at once. I would like a solution where I can tell Kannel not to send back more than 100 DLRs/sec. This is normal throttling: a lot of applications use such tuning parameters to protect others, and I think Kannel does the same thing on the SMSC side, but I want it on the reverse side (the way flow control and windowing work).
>> The problem is that cimd2 offers the fewest possibilities to manage this, according to the documentation.
>>
>> Bye,
>> Gabor
>>
>> seikath wrote:
>>> In general, a DLR is not such important information that it has to be injected into the database right away. If you have a high load of MO/DLR, consider DB pooling and, even better, drop the HTTP requests. Apache or Lighty or even IIS can handle the traffic you have mentioned with no issues.
>>> What I do for a high load of MO/DLR is either use sqlbox to handle it, or simply write directly to simple XML files. Or you may parse the Kannel logs, which will require some regexp skills. I have implemented all of the above, depending on the specific project.
>>> The XML files can easily be loaded into a queue in the database later.
>>>
>>> On 04/01/2010 06:33 PM, Gabor Maros wrote:
>>>> Thanks Nikos,
>>>>
>>>> it may help, but there is another problem I haven't mentioned before. We have a web application that receives DLRs from Kannel. If Kannel gets 10k DLRs in one second, then Kannel tries to send all of them to the app in that same second. This behaviour kills the app (and the database behind it), because it gets 10,000 HTTP connections in one second, which is a huge amount compared to our peak time of 25 SMS/sec.
>>>> Unfortunately we are not NASA with unimaginable computing capacity, so the ideal solution for us would be a parameter that tells Kannel how many connections are allowed per second.
>>>>
>>>> Bye,
>>>> Gabor
>>>>
>>>> Nikos Balkanas wrote:
>>>>> Hi,
>>>>>
>>>>> Check whether you have an /etc/hosts file, and if you do, you should have specified your gateway host in it.
>>>>>
>>>>> Also check whether named is running (Linux).
>>>>>
>>>>> BR,
>>>>> Nikos
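(For reference, the kind of /etc/hosts entry Nikos means is a single line like the one below, so that smsbox can resolve the host the dlr-url points at without asking the resolver on every request; the address and hostnames are placeholders.)

# /etc/hosts -- pin the host that the dlr-url / application server lives on
192.0.2.10   dlr-app.example.com   dlr-app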
>>>>> ----- Original Message ----- From: "Gabor Maros" <gabor.ma...@erstebank.hu>
>>>>> To: <users@kannel.org>
>>>>> Sent: Thursday, April 01, 2010 12:58 PM
>>>>> Subject: Too many dlr at once
>>>>>
>>>>> Hi,
>>>>>
>>>>> I've got a Kannel install with an EMI SMSC connection. When we send lots of SMS to the SMSC at once, the delivery notifications only come at the end, when Kannel's queue is empty. The SMSC only accepts 10-15 SMS/sec but can send back a horrible amount at once. This is a problem for us, because Kannel gets thousands of DLRs in one second and ERROR messages appear in smsbox.log:
>>>>>
>>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>> 2010-04-01 08:21:17 [4834] [4] INFO: Starting delivery report <sms> from <0036303444481>
>>>>>
>>>>> ...after thousands of such normal log records we can see thousands of the following:
>>>>>
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: gethostbyname failed
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: error connecting to server `xxxx' at port `yyy'
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Couldn't send request to <https://xyz>
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: Error while gw_gethostbyname occurs.
>>>>> 2010-04-01 08:21:18 [4834] [9] ERROR: System error 2: No such file or directory
>>>>>
>>>>> Is there a configuration parameter that changes this behavior so that we can slow it down? I don't know why it happens, but there must be some kind of limit (I think it is not an open-file issue, but something similar). Maybe there is another side effect (but I'm not sure yet) in connection with the DLR database, because the number of SMS that are not in a final state (delivered or can't be delivered) keeps growing.
>>>>>
>>>>> Thanks,
>>>>> Gabor
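(As an aside, seikath's suggestion earlier in the thread, to skip the application server and database work in the request path and just spool DLRs to flat files that are bulk-loaded later, can be as small as the sketch below. The class name, spool path and parameter names are invented, and it writes tab-separated lines rather than the XML files seikath mentioned; the only point is that the receiver does nothing but append a line and return 200. A cron job or a separate loader can then rotate the file and bulk-load it into the database.)

// Hypothetical sketch of the "spool to a file, load later" approach: append one
// tab-separated line per DLR and return 200; no database or app-server work here.
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DlrSpoolServlet extends HttpServlet {

    private static final String SPOOL = "/var/spool/dlr/dlr-current.log"; // placeholder path

    @Override
    protected synchronized void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // synchronized keeps concurrent requests from interleaving their lines;
        // "msgid" and "status" stand for whatever parameter names the dlr-url carries.
        PrintWriter out = new PrintWriter(new FileWriter(SPOOL, true));
        try {
            out.println(System.currentTimeMillis() + "\t"
                    + req.getParameter("msgid") + "\t"
                    + req.getParameter("status"));
        } finally {
            out.close();
        }
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}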
--
Juan Nin
3Cinteractive / Mobilizing Great Brands
http://www.3cinteractive.com