Hi Brett,

I've found that workaround; it was offered by Mr. Donald Jackson. You may find it at
http://www.mail-archive.com/users@kannel.org/msg15958.html
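For reference, the Tx/Rx split discussed below (two smsc groups per provider, one transmitter-only and one receiver-only) would look roughly like this in kannel.conf. This is only a sketch with placeholder host, ports, IDs and credentials, and it assumes that receive-port = 0 disables the receiver link and port = 0 disables the transmitter link; please check the semantics against the user guide for your Kannel version:

group = smsc
smsc = smpp
smsc-id = PROVIDER-TX
host = smsc.example.com
# transmitter-only bind: assumes receive-port = 0 means "no receiver connection"
port = 2775
receive-port = 0
smsc-username = username
smsc-password = password

group = smsc
smsc = smpp
smsc-id = PROVIDER-RX
host = smsc.example.com
# receiver-only bind: assumes port = 0 means "no transmitter connection"
port = 0
receive-port = 2775
smsc-username = username
smsc-password = password

MT traffic is then submitted over the TX group (for example by addressing its smsc-id when sending), while deliver_sm and DLR traffic arrives over the RX group.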
Regards

--
Abdulmnem Benaiad Almontaha
CTO
www.almontaha.ly
Tripoli-Libya

On Thu, Sep 23, 2010 at 2:58 PM, brett skinner <tatty.dishcl...@gmail.com> wrote:

> Hi
>
> We are using transmitter and receiver because apparently there are performance issues when using transceiver, due to both transmitting and receiving being handled on a single thread. We were advised to use Tx and Rx.
>
> We have contacted the provider, and while they acknowledge that they had an outage for a couple of seconds, everyone else was able to reconnect without an issue. It was just us. But this is not limited to them; it seems that for any bind that dies and comes back, there is a chance that bearerbox will start queuing.
>
> Do you have any extra information on the workaround?
>
> Regards,
>
>
> On Thu, Sep 23, 2010 at 2:24 PM, Benaiad <bena...@gmail.com> wrote:
>
>> Hi Brett,
>>
>> Which type of connection are you using? If it is not a transceiver, I suggest you use one, provided your provider supports it. There is a known bug regarding the separate connections for Tx & Rx. I believe there is another workaround for this: defining two smsc groups, one for Tx and the other for Rx.
>>
>> Regards
>>
>> --
>> Benaiad
>>
>>
>> On Thu, Sep 23, 2010 at 11:21 AM, brett skinner <tatty.dishcl...@gmail.com> wrote:
>>
>>> Hi guys
>>>
>>> We have just hit this issue AGAIN this morning. Can ANYONE please give some guidance here? I have had zero response on this critical issue for over two weeks now. Can someone please help? This issue is becoming increasingly urgent.
>>>
>>> Regards,
>>> Brett
>>>
>>> ---------- Forwarded message ----------
>>> From: brett skinner <tatty.dishcl...@gmail.com>
>>> Date: Fri, Sep 17, 2010 at 2:16 PM
>>> Subject: Re: Weird binding issue that causes queues to build up.
>>> To: Users <users@kannel.org>
>>>
>>>
>>> Hi
>>>
>>> We have experienced this problem again. A couple of our binds to one particular smsc (the rest were okay) had connectivity issues last night at 12 AM. The binds were re-established and reported as being online from the status pages. However, a queue for one of the binds built up on the bearerbox. Only when I had run a stop-smsc and start-smsc for that bind did the queue for that bind start processing again.
>>>
>>> In the logs at 12 AM we have a bunch of errors:
>>>
>>> 2010-09-16 23:59:35 [32641] [44] ERROR: Error reading from fd 57:
>>> 2010-09-16 23:59:35 [32641] [44] ERROR: System error 104: Connection reset by peer
>>> 2010-09-16 23:59:35 [32641] [44] ERROR: SMPP[XXX]: Couldn't connect to SMS center (retrying in 10 seconds).
>>> 2010-09-16 23:59:38 [32641] [38] ERROR: Error reading from fd 52:
>>> 2010-09-16 23:59:38 [32641] [38] ERROR: System error 104: Connection reset by peer
>>> 2010-09-16 23:59:38 [32641] [38] ERROR: SMPP[YYY]: Couldn't connect to SMS center (retrying in 10 seconds).
>>> 2010-09-16 23:59:39 [32641] [46] ERROR: Error reading from fd 50:
>>> 2010-09-16 23:59:39 [32641] [46] ERROR: System error 104: Connection reset by peer
>>> 2010-09-16 23:59:39 [32641] [46] ERROR: SMPP[AAA]: Couldn't connect to SMS center (retrying in 10 seconds).
>>> 2010-09-16 23:59:47 [32641] [39] ERROR: Error reading from fd 49:
>>> 2010-09-16 23:59:47 [32641] [39] ERROR: System error 104: Connection reset by peer
>>> 2010-09-16 23:59:47 [32641] [39] ERROR: SMPP[YYY]: Couldn't connect to SMS center (retrying in 10 seconds).
>>> 2010-09-16 23:59:51 [32641] [48] ERROR: Error reading from fd 61:
>>> 2010-09-16 23:59:51 [32641] [48] ERROR: System error 104: Connection reset by peer
>>> 2010-09-16 23:59:51 [32641] [48] ERROR: SMPP[BBB]: Couldn't connect to SMS center (retrying in 10 seconds).
>>> 2010-09-17 00:00:00 [32641] [47] ERROR: Error reading from fd 40:
>>> 2010-09-17 00:00:00 [32641] [47] ERROR: System error 104: Connection reset by peer
>>> 2010-09-17 00:00:00 [32641] [47] ERROR: SMPP[XXX]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>
>>> I am not sure how Kannel works internally, but it is almost as if, when the bind is re-established, the old one is disposed of and a new one is created, while the queue and the pointers for the old one are still sticking around and have not been updated. This results in messages sitting in the queue and not being routed to the bind, which reports as being online.
>>>
>>> I see that there might have been similar issues in the past: http://www.kannel.org/pipermail/users/2009-May/007166.html. It might be related, maybe not.
>>>
>>> We have already set our binds up as a transmitter and receiver. We are not running transceiver.
>>>
>>> Regards,
>>>
>>>
>>> On Thu, Sep 9, 2010 at 3:42 PM, brett skinner <tatty.dishcl...@gmail.com> wrote:
>>>
>>>> Thanks Alvaro for your response.
>>>>
>>>> I am running a build from SVN from about two weeks ago. I am a bit wary of turning the loggers to debug mode, because we are doing a lot of traffic, debug mode is very verbose, and we would eat through our disk in no time. It would be different if it were reproducible or if we could anticipate the problem, because we could just turn on the loggers at the right time. This happens so sporadically that we would have to leave the loggers in debug mode. The last time this happened was last week.
>>>>
>>>> I will go check out that tool you mentioned.
>>>>
>>>> I am not that interested in the extra TLVs. They were just making a bit of noise in our logs :)
>>>>
>>>> Thanks again for your help.
>>>>
>>>>
>>>> On Thu, Sep 9, 2010 at 3:35 PM, Alvaro Cornejo <cornejo.alv...@gmail.com> wrote:
>>>>
>>>>> Have you checked what the system logs in debug mode?
>>>>>
>>>>> Regarding the queue, there is a tool created by Alejandro Guerrieri that allows you to view the queue content and delete messages... well, Kannel does have several queues, so I don't know if it works for the one you mention. I don't remember the details, but you can check his blog: http://www.blogalex.com/archives/72
>>>>>
>>>>> About the TLVs you are receiving, you should ask your provider what they mean and what info they are sending. If it is of interest to you, you can configure meta-data so you can capture that info; otherwise you can safely ignore them. As the PDU type is deliver_sm, I suspect that it might be the DLR status... and that is why you have that queue.
>>>>>
>>>>> Also, if you upgrade to a recent version, the status page has been improved and now shows separate counters for MT and DLRs;
>>>>> in older versions, the MT and DLR counters were mixed.
>>>>>
>>>>> Hope this helps
>>>>>
>>>>> Alvaro
>>>>>
>>>>> |-----------------------------------------------------------------|
>>>>> Send and receive data and text messages (SMS) to and from any mobile phone and Nextel
>>>>> in Peru, Mexico and in more than 180 countries. Use two-way applications via SMS and GPRS online.
>>>>> Visit us at www.perusms.NET, www.smsglobal.com.mx and www.pravcom.com
>>>>>
>>>>>
>>>>> On Thu, Sep 9, 2010 at 2:42 AM, brett skinner <tatty.dishcl...@gmail.com> wrote:
>>>>> > Hi everyone
>>>>> > Just wondering if anyone has had a chance to look at this yet?
>>>>> > Thanks, and I appreciate any help.
>>>>> >
>>>>> > ---------- Forwarded message ----------
>>>>> > From: brett skinner <tatty.dishcl...@gmail.com>
>>>>> > Date: Tue, Sep 7, 2010 at 10:47 AM
>>>>> > Subject: Weird binding issue that causes queues to build up.
>>>>> > To: Users <users@kannel.org>
>>>>> >
>>>>> >
>>>>> > Hi
>>>>> >
>>>>> > We are experiencing a rather weird, occasional issue with Kannel. We have two different boxes, each with a Kannel installation. Every now and then one of the boxes stops processing SMS queues and the queues just build up. This happens to both boxes (just not at the same time). When we have a look at the status page we can see the queue, and there are SMS queued to the bearerbox. I assume that it is the bearerbox queue. It looks as follows (from the status page):
>>>>> >
>>>>> > SMS: received 123 (0 queued), sent 123 (456 queued), store size -1
>>>>> >
>>>>> > It is the "456 queued" part that we are concerned about. All the binds report as being online with 0 in their queues, but that queue of 456 does not disappear. If I sit restarting bind after bind, one of them usually does the trick and the queue disappears. The problem is we usually have no idea which bind it is, and they are all reporting as being online. Looking through the logs from our upstream applications, it appears that there was a network outage at around the same time. I have not yet confirmed this with the hosting company. Also, this is what appears in the syslog.
>>>>> > Sep 6 23:02:46 123-123-123-123 avahi-daemon[17943]: Received response from host 64.150.181.120 with invalid source port 53756 on interface 'eth0.0'
>>>>> > Sep 6 23:17:01 123-123-123-123 CRON[16934]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
>>>>> > Sep 6 23:32:46 123-123-123-123 avahi-daemon[17943]: Received response from host 64.150.181.120 with invalid source port 33895 on interface 'eth0.0'
>>>>> > Sep 7 00:02:45 123-123-123-123 avahi-daemon[17943]: Received response from host 64.150.181.120 with invalid source port 55945 on interface 'eth0.0'
>>>>> > Sep 7 00:17:01 123-123-123-123 CRON[17231]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
>>>>> > Sep 7 00:32:45 123-123-123-123 avahi-daemon[17943]: Received response from host 64.150.181.120 with invalid source port 45291 on interface 'eth0.0'
>>>>> > Sep 7 01:02:45 123-123-123-123 avahi-daemon[17943]: Received response from host 64.150.181.120 with invalid source port 39067 on interface 'eth0.0'
>>>>> > Sep 7 01:17:01 123-123-123-123 CRON[17479]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
>>>>> >
>>>>> > That IP address is not in our kannel.conf file. I am not sure what these errors are about; I might need to investigate this further. I am not a security expert, so I have no idea whether this is malicious or not.
>>>>> >
>>>>> > This is what appears in the bearerbox logs at about the same time as the outage:
>>>>> >
>>>>> > 2010-09-06 23:02:46 [32641] [12] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032503580) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:03:07 [32641] [12] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032605180) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:08:04 [32641] [10] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032113180) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:14:32 [32641] [9] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032711480) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:26:12 [32641] [6] ERROR: SMPP[AAA]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:26:12 [32641] [6] ERROR: SMPP[AAA]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:26:12 [32641] [18] WARNING: SMPP[BBB]: Not ACKED message found, will retransmit. SENT<94>sec. ago, SEQ<423861>, DST<+xxxxxxxxxxxxx>
>>>>> > 2010-09-06 23:26:12 [32641] [18] WARNING: SMPP[BBB]: Not ACKED message found, will retransmit. SENT<94>sec. ago, SEQ<423862>, DST<+xxxxxxxxxxxxx>
>>>>> > 2010-09-06 23:26:12 [32641] [18] ERROR: SMPP[BBB]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:26:12 [32641] [18] ERROR: SMPP[BBB]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated message ''+xxxxxxxxxxxxx>' '+yyyyyyyyyyy' 'EEE' '10' '2' '''. Send message parts as is.
>>>>> > 2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated message ''+xxxxxxxxxxxxx>' '+yyyyyyyyyyy' 'EEE' '85' '2' '''. Send message parts as is.
>>>>> > 2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated message ''+xxxxxxxxxxxxx>' '+yyyyyyyyyyy' 'EEE' '152' '2' '''. Send message parts as is.
>>>>> > 2010-09-06 23:26:42 [32641] [17] ERROR: SMPP[CCC]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:26:42 [32641] [17] ERROR: SMPP[CCC]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:27:08 [32641] [13] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032035180) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:27:12 [32641] [19] ERROR: SMPP[BBB]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:27:12 [32641] [19] ERROR: SMPP[BBB]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:27:14 [32641] [12] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032031280) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:27:25 [32641] [16] ERROR: SMPP[CCC]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:27:25 [32641] [16] ERROR: SMPP[CCC]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:28:11 [32641] [6] WARNING: SMPP: Unknown TLV(0x140e,0x000c,323738373735303030303200) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:55:18 [32641] [8] ERROR: SMPP[DDD]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:55:18 [32641] [8] ERROR: SMPP[DDD]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:55:21 [32641] [11] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032858280) for PDU type (deliver_sm) received!
>>>>> >
>>>>> > I am looking for any sort of guidance on where to start resolving this issue. Any comments will be most welcome. In general I would like to know:
>>>>> >
>>>>> > Is there any way that I can see what is in that queue of 456? I would like to know which bind is down. I thought store-status might be it, but it does not appear to be.
>>>>> > What could be causing this issue? (If you suspect that it is something to do with configuration, I will post the configuration file.)
>>>>> > I notice some unknown TLV warnings. Is this something we should be concerned about?
>>>>> > It seems that there was some sort of network problem and all the connections (to different smscs) disconnected and reconnected. Why does the queue not disappear after they reconnect?
>>>>> >
>>>>> > I greatly appreciate your time and effort. Thanks
>>>>> > Regards,
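A note on the "Unknown TLV" warnings above: if the 0x1406 / 0x140e values turn out to be worth keeping (per Alvaro's meta-data suggestion), recent Kannel builds let you declare such optional SMPP parameters so bearerbox recognises them instead of logging them as unknown. A rough, untested sketch for kannel.conf; the smpp-tlv group comes from the Kannel user guide, but the name, length and smsc-id values here are placeholders chosen from this thread's masked logs:

group = smpp-tlv
# hypothetical name for the 7-octet TLV logged as tag 0x1406, length 0x0007
name = provider-tlv-1406
tag = 0x1406
type = octetstring
length = 7
smsc-id = AAA;BBB

A matching group with tag = 0x140e and length = 12 would cover the other warning. The captured value should then be available via Kannel's meta-data mechanism on incoming deliver_sm PDUs.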