Hi Brett,

I've found that workaround; it was offered by Mr. Donald Jackson. You may find it at
http://www.mail-archive.com/users@kannel.org/msg15958.html
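For reference, the Tx/Rx split discussed below (two smsc groups per provider, one transmitter-only and one receiver-only) would look roughly like this in kannel.conf. This is only a sketch with placeholder host, ports, IDs and credentials, and it assumes that receive-port = 0 disables the receiver link and port = 0 disables the transmitter link; please check the semantics against the user guide for your Kannel version:

group = smsc
smsc = smpp
smsc-id = PROVIDER-TX
host = smsc.example.com
# transmitter-only bind: assumes receive-port = 0 means "no receiver connection"
port = 2775
receive-port = 0
smsc-username = username
smsc-password = password

group = smsc
smsc = smpp
smsc-id = PROVIDER-RX
host = smsc.example.com
# receiver-only bind: assumes port = 0 means "no transmitter connection"
port = 0
receive-port = 2775
smsc-username = username
smsc-password = password

MT traffic is then submitted over the TX group (for example by addressing its smsc-id when sending), while deliver_sm and DLR traffic arrives over the RX group.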
Regards

--
Abdulmnem Benaiad Almontaha
CTO
www.almontaha.ly
Tripoli-Libya

On Thu, Sep 23, 2010 at 2:58 PM, brett skinner <tatty.dishcl...@gmail.com> wrote:

> Hi
>
> We are using transmitter and receiver because apparently there are performance issues when using transceiver, due to both transmitting and receiving being handled on a single thread. We were advised to use Tx and Rx.
>
> We have contacted the provider, and while they acknowledge that they had an outage for a couple of seconds, everyone else was able to reconnect without an issue. It was just us. But this is not limited to them; it seems that for any bind that dies and comes back, there is a chance that bearerbox will start queuing.
>
> Do you have any extra information on the workaround?
>
> Regards,
>
>
> On Thu, Sep 23, 2010 at 2:24 PM, Benaiad <bena...@gmail.com> wrote:
>
>> Hi Brett,
>>
>> Which type of connection are you using? If it is not a transceiver, I suggest you use one, provided your provider supports it. There is a known bug regarding the separate connections for Tx & Rx. I believe there is another workaround for this: defining two smsc groups, one for Tx and the other for Rx.
>>
>> Regards
>>
>> --
>> Benaiad
>>
>>
>> On Thu, Sep 23, 2010 at 11:21 AM, brett skinner <tatty.dishcl...@gmail.com> wrote:
>>
>>> Hi guys
>>>
>>> We have just hit this issue AGAIN this morning. Can ANYONE please give some guidance here? I have had zero response on this critical issue for over two weeks now. Can someone please help? This issue is becoming increasingly urgent.
>>>
>>> Regards,
>>> Brett
>>>
>>> ---------- Forwarded message ----------
>>> From: brett skinner <tatty.dishcl...@gmail.com>
>>> Date: Fri, Sep 17, 2010 at 2:16 PM
>>> Subject: Re: Weird binding issue that causes queues to build up.
>>> To: Users <users@kannel.org>
>>>
>>>
>>> Hi
>>>
>>> We have experienced this problem again. A couple of our binds to one particular smsc (the rest were okay) had connectivity issues last night at 12 AM. The binds were re-established and reported as being online from the status pages. However, a queue for one of the binds built up on the bearerbox. Only when I had run a stop-smsc and start-smsc for that bind did the queue for that bind start processing again.
>>>
>>> In the logs at 12 AM we have a bunch of errors:
>>>
>>> 2010-09-16 23:59:35 [32641] [44] ERROR: Error reading from fd 57:
>>> 2010-09-16 23:59:35 [32641] [44] ERROR: System error 104: Connection reset by peer
>>> 2010-09-16 23:59:35 [32641] [44] ERROR: SMPP[XXX]: Couldn't connect to SMS center (retrying in 10 seconds).
>>> 2010-09-16 23:59:38 [32641] [38] ERROR: Error reading from fd 52:
>>> 2010-09-16 23:59:38 [32641] [38] ERROR: System error 104: Connection reset by peer
>>> 2010-09-16 23:59:38 [32641] [38] ERROR: SMPP[YYY]: Couldn't connect to SMS center (retrying in 10 seconds).
>>> 2010-09-16 23:59:39 [32641] [46] ERROR: Error reading from fd 50:
>>> 2010-09-16 23:59:39 [32641] [46] ERROR: System error 104: Connection reset by peer
>>> 2010-09-16 23:59:39 [32641] [46] ERROR: SMPP[AAA]: Couldn't connect to SMS center (retrying in 10 seconds).
>>> 2010-09-16 23:59:47 [32641] [39] ERROR: Error reading from fd 49:
>>> 2010-09-16 23:59:47 [32641] [39] ERROR: System error 104: Connection reset by peer
>>> 2010-09-16 23:59:47 [32641] [39] ERROR: SMPP[YYY]: Couldn't connect to SMS center (retrying in 10 seconds).
>>> 2010-09-16 23:59:51 [32641] [48] ERROR: Error reading from fd 61:
>>> 2010-09-16 23:59:51 [32641] [48] ERROR: System error 104: Connection reset by peer
>>> 2010-09-16 23:59:51 [32641] [48] ERROR: SMPP[BBB]: Couldn't connect to SMS center (retrying in 10 seconds).
>>> 2010-09-17 00:00:00 [32641] [47] ERROR: Error reading from fd 40:
>>> 2010-09-17 00:00:00 [32641] [47] ERROR: System error 104: Connection reset by peer
>>> 2010-09-17 00:00:00 [32641] [47] ERROR: SMPP[XXX]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>
>>> I am not sure how Kannel works internally, but it is almost as if, when the bind is re-established, the old one is disposed of and a new one is created, while the queue and the pointers for the old one are still sticking around and have not been updated. This results in messages sitting in the queue and not being routed to the bind, which reports as being online.
>>>
>>> I see that there might have been similar issues in the past: http://www.kannel.org/pipermail/users/2009-May/007166.html. It might be related, maybe not.
>>>
>>> We have already set our binds up as a transmitter and receiver. We are not running transceiver.
>>>
>>> Regards,
>>>
>>>
>>> On Thu, Sep 9, 2010 at 3:42 PM, brett skinner <tatty.dishcl...@gmail.com> wrote:
>>>
>>>> Thanks Alvaro for your response.
>>>>
>>>> I am running a build from SVN from about two weeks ago. I am a bit wary of turning the loggers to debug mode, because we are doing a lot of traffic, debug mode is very verbose, and we would eat through our disk in no time. It would be different if it were reproducible or if we could anticipate the problem, because we could just turn on the loggers at the right time. This happens so sporadically that we would have to leave the loggers in debug mode. The last time this happened was last week.
>>>>
>>>> I will go check out that tool you mentioned.
>>>>
>>>> I am not that interested in the extra TLVs. They were just making a bit of noise in our logs :)
>>>>
>>>> Thanks again for your help.
>>>>
>>>>
>>>> On Thu, Sep 9, 2010 at 3:35 PM, Alvaro Cornejo <cornejo.alv...@gmail.com> wrote:
>>>>
>>>>> Have you checked what the system logs in debug mode?
>>>>>
>>>>> Regarding the queue, there is a tool created by Alejandro Guerrieri that allows you to view the queue content and delete messages... well, Kannel does have several queues, so I don't know if it works for the one you mention. I don't remember the details, but you can check his blog: http://www.blogalex.com/archives/72
>>>>>
>>>>> About the TLVs you are receiving, you should ask your provider what they mean and what info they are sending. If it is of interest to you, you can configure meta-data so you can capture that info; otherwise you can safely ignore them. As the PDU type is deliver_sm, I suspect that it might be the DLR status... and that is why you have that queue.
>>>>>
>>>>> Also, if you upgrade to a recent version, the status page has been improved and now shows separate counters for MT and DLRs;
>>>>> in older versions, the MT and DLR counters were mixed.
>>>>>
>>>>> Hope this helps
>>>>>
>>>>> Alvaro
>>>>>
>>>>> |-----------------------------------------------------------------|
>>>>> Send and receive data and text messages (SMS) to and from any mobile phone and Nextel
>>>>> in Peru, Mexico and in more than 180 countries. Use two-way applications via SMS and GPRS online.
>>>>> Visit us at www.perusms.NET, www.smsglobal.com.mx and www.pravcom.com
>>>>>
>>>>>
>>>>> On Thu, Sep 9, 2010 at 2:42 AM, brett skinner <tatty.dishcl...@gmail.com> wrote:
>>>>> > Hi everyone
>>>>> > Just wondering if anyone has had a chance to look at this yet?
>>>>> > Thanks, and I appreciate any help.
>>>>> >
>>>>> > ---------- Forwarded message ----------
>>>>> > From: brett skinner <tatty.dishcl...@gmail.com>
>>>>> > Date: Tue, Sep 7, 2010 at 10:47 AM
>>>>> > Subject: Weird binding issue that causes queues to build up.
>>>>> > To: Users <users@kannel.org>
>>>>> >
>>>>> >
>>>>> > Hi
>>>>> >
>>>>> > We are experiencing a rather weird, occasional issue with Kannel. We have two different boxes, each with a Kannel installation. Every now and then one of the boxes stops processing SMS queues and the queues just build up. This happens to both boxes (just not at the same time). When we have a look at the status page we can see the queue, and there are SMS queued to the bearerbox. I assume that it is the bearerbox queue. It looks as follows (from the status page):
>>>>> >
>>>>> > SMS: received 123 (0 queued), sent 123 (456 queued), store size -1
>>>>> >
>>>>> > It is the "456 queued" part that we are concerned about. All the binds report as being online with 0 in their queues, but that queue of 456 does not disappear. If I sit restarting bind after bind, one of them usually does the trick and the queue disappears. The problem is we usually have no idea which bind it is, and they are all reporting as being online. Looking through the logs from our upstream applications, it appears that there was a network outage at around the same time. I have not yet confirmed this with the hosting company. Also, this is what appears in the syslog.
>>>>> > Sep 6 23:02:46 123-123-123-123 avahi-daemon[17943]: Received response from host 64.150.181.120 with invalid source port 53756 on interface 'eth0.0'
>>>>> > Sep 6 23:17:01 123-123-123-123 CRON[16934]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
>>>>> > Sep 6 23:32:46 123-123-123-123 avahi-daemon[17943]: Received response from host 64.150.181.120 with invalid source port 33895 on interface 'eth0.0'
>>>>> > Sep 7 00:02:45 123-123-123-123 avahi-daemon[17943]: Received response from host 64.150.181.120 with invalid source port 55945 on interface 'eth0.0'
>>>>> > Sep 7 00:17:01 123-123-123-123 CRON[17231]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
>>>>> > Sep 7 00:32:45 123-123-123-123 avahi-daemon[17943]: Received response from host 64.150.181.120 with invalid source port 45291 on interface 'eth0.0'
>>>>> > Sep 7 01:02:45 123-123-123-123 avahi-daemon[17943]: Received response from host 64.150.181.120 with invalid source port 39067 on interface 'eth0.0'
>>>>> > Sep 7 01:17:01 123-123-123-123 CRON[17479]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
>>>>> >
>>>>> > That IP address is not in our kannel.conf file. I am not sure what these errors are about; I might need to investigate this further. I am not a security expert, so I have no idea whether this is malicious or not.
>>>>> >
>>>>> > This is what appears in the bearerbox logs at about the same time as the outage:
>>>>> >
>>>>> > 2010-09-06 23:02:46 [32641] [12] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032503580) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:03:07 [32641] [12] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032605180) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:08:04 [32641] [10] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032113180) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:14:32 [32641] [9] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032711480) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:26:12 [32641] [6] ERROR: SMPP[AAA]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:26:12 [32641] [6] ERROR: SMPP[AAA]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:26:12 [32641] [18] WARNING: SMPP[BBB]: Not ACKED message found, will retransmit. SENT<94>sec. ago, SEQ<423861>, DST<+xxxxxxxxxxxxx>
>>>>> > 2010-09-06 23:26:12 [32641] [18] WARNING: SMPP[BBB]: Not ACKED message found, will retransmit. SENT<94>sec. ago, SEQ<423862>, DST<+xxxxxxxxxxxxx>
>>>>> > 2010-09-06 23:26:12 [32641] [18] ERROR: SMPP[BBB]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:26:12 [32641] [18] ERROR: SMPP[BBB]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated message ''+xxxxxxxxxxxxx>' '+yyyyyyyyyyy' 'EEE' '10' '2' '''. Send message parts as is.
>>>>> > 2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated message ''+xxxxxxxxxxxxx>' '+yyyyyyyyyyy' 'EEE' '85' '2' '''. Send message parts as is.
>>>>> > 2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated message ''+xxxxxxxxxxxxx>' '+yyyyyyyyyyy' 'EEE' '152' '2' '''. Send message parts as is.
>>>>> > 2010-09-06 23:26:42 [32641] [17] ERROR: SMPP[CCC]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:26:42 [32641] [17] ERROR: SMPP[CCC]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:27:08 [32641] [13] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032035180) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:27:12 [32641] [19] ERROR: SMPP[BBB]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:27:12 [32641] [19] ERROR: SMPP[BBB]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:27:14 [32641] [12] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032031280) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:27:25 [32641] [16] ERROR: SMPP[CCC]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:27:25 [32641] [16] ERROR: SMPP[CCC]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:28:11 [32641] [6] WARNING: SMPP: Unknown TLV(0x140e,0x000c,323738373735303030303200) for PDU type (deliver_sm) received!
>>>>> > 2010-09-06 23:55:18 [32641] [8] ERROR: SMPP[DDD]: I/O error or other error. Re-connecting.
>>>>> > 2010-09-06 23:55:18 [32641] [8] ERROR: SMPP[DDD]: Couldn't connect to SMS center (retrying in 10 seconds).
>>>>> > 2010-09-06 23:55:21 [32641] [11] WARNING: SMPP: Unknown TLV(0x1406,0x0007,01906032858280) for PDU type (deliver_sm) received!
>>>>> >
>>>>> > I am looking for any sort of guidance on where to start resolving this issue. Any comments will be most welcome. In general I would like to know:
>>>>> >
>>>>> > Is there any way that I can see what is in that queue of 456? I would like to know which bind is down. I thought store-status might be it, but it does not appear to be.
>>>>> > What could be causing this issue? (If you suspect that it is something to do with configuration, I will post the configuration file.)
>>>>> > I notice some unknown TLV warnings. Is this something we should be concerned about?
>>>>> > It seems that there was some sort of network problem and all the connections (to different smscs) disconnected and reconnected. Why does the queue not disappear after they reconnect?
>>>>> >
>>>>> > I greatly appreciate your time and effort. Thanks
>>>>> > Regards,
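A note on the "Unknown TLV" warnings above: if the 0x1406 / 0x140e values turn out to be worth keeping (per Alvaro's meta-data suggestion), recent Kannel builds let you declare such optional SMPP parameters so bearerbox recognises them instead of logging them as unknown. A rough, untested sketch for kannel.conf; the smpp-tlv group comes from the Kannel user guide, but the name, length and smsc-id values here are placeholders chosen from this thread's masked logs:

group = smpp-tlv
# hypothetical name for the 7-octet TLV logged as tag 0x1406, length 0x0007
name = provider-tlv-1406
tag = 0x1406
type = octetstring
length = 7
smsc-id = AAA;BBB

A matching group with tag = 0x140e and length = 12 would cover the other warning. The captured value should then be available via Kannel's meta-data mechanism on incoming deliver_sm PDUs.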