VoIP-based phone systems bring many benefits, but they also bring some problems. Not least is the annoying tendency for some calls to drop part-way through a conversation for no obvious reason. In this article I will identify the most common reasons why an established VoIP call might suddenly drop and explain how you can diagnose the cause. At the end are some pointers to solutions for these problems.
This article is not about problems setting up calls in the first place, nor about calls that have poor quality audio, no audio or 1-way audio (those problems are more likely to be explained in my other articles about SIP and NAT).
Some updates were made to this article in December 2020.
Gather information from the users
It is easy to fall into the trap of thinking you can only identify this type of problem using sophisticated technology-based solutions. However, in my experience the key to initially identifying the cause of dropping calls is to ask the users a few simple questions. The answers will often be sufficient to allow you to narrow your search down to just one, or at worst two, possibilities. Try asking them the following:
- Does the call drop after a fixed period of time? If so, how long into the call does it happen? (Get the users to measure exactly how long it takes on a number of occasions).
- Does it only happen when you are calling a particular destination (e.g. a conference service)?
- Do some of your colleagues never experience the problem and, if so, can you see anything different about their phone or the destinations they are calling?
- Do you use the microphone mute button and, if so, do you find calls mainly drop when the microphone is muted?
- Does the call only seem to drop when you are talking?
- Do you, or the other participants in the call, sometimes hear a short blast of tone coming through the phone’s earpiece while someone is talking?
Look at the answers and see if there is a clear pattern – does it point to certain phones being worse than others, or certain destinations, or both? Now read on and you should quickly see how the answers to the above questions will help to pinpoint the cause.
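The answers are easier to compare once they are tabulated. The following is a minimal sketch, assuming each user report has been noted down as a handset model, a destination and the number of seconds into the call at which it dropped. The field names, the tolerance and the sample values are invented for illustration.

```python
from collections import Counter


def summarise_reports(reports, tolerance_s=5):
    """Tally drop reports and check whether the drop time looks fixed.

    Each report is a dict with the hypothetical keys 'handset',
    'destination' and 'duration_s' (seconds into the call at the drop).
    """
    durations = [r["duration_s"] for r in reports]
    spread = max(durations) - min(durations) if durations else 0
    return {
        "by_handset": Counter(r["handset"] for r in reports),
        "by_destination": Counter(r["destination"] for r in reports),
        # A small spread over several reports suggests a fixed period.
        "fixed_period": len(durations) >= 3 and spread <= tolerance_s,
        "spread_s": spread,
    }


# Invented sample data: three reports from the same handset model.
reports = [
    {"handset": "Model A", "destination": "conference", "duration_s": 899},
    {"handset": "Model A", "destination": "mobile", "duration_s": 901},
    {"handset": "Model A", "destination": "conference", "duration_s": 900},
]
summary = summarise_reports(reports)
print(summary["by_handset"].most_common(1))
print(summary["fixed_period"], summary["spread_s"])
```

A tally dominated by one handset or one destination, or a very small spread in the drop times, is the kind of pattern the questions above are meant to expose.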
Talk-off
Talk-off is the unintended activation of a command when the human voice is mistakenly detected as a DTMF control signal. DTMF tones are normally only generated when you press a key on the phone’s keypad, but the detector in the remote server or PBX can also be triggered by similar frequencies in human speech.
This false triggering of the tone detector may not always cause the call to drop, but it is not unheard of for the signal to be misinterpreted as a request to end the call or put the call on hold. For example, a conference bridge might interpret * as meaning the user is leaving the conference.
Diagnosing talk-off
- It can happen at any time after the start of the call
- If triggered from the local end, it will happen when the user is speaking
- Certain destinations may be much more susceptible to this fault than others
- Calling/called parties may sometimes hear a DTMF tone during speech
- Certain voices are more susceptible than others – tends to happen more with female voices than male
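To see why speech can fool an in-band detector, it helps to look at what such a detector measures. The sketch below is an illustration only, not how any particular PBX does it: it uses the Goertzel recurrence to measure energy at the eight standard DTMF frequencies and reports a key when one row tone plus one column tone hold most of the energy in a block of samples. The `min_fraction` threshold, the block length and the helper names are assumptions made for this example. Real detectors add further checks (tone duration, level balance between the two tones), and talk-off happens when speech passes them anyway.

```python
import math

ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq_hz, sample_rate=8000):
    """Squared magnitude of the signal's spectrum at freq_hz (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples, sample_rate=8000, min_fraction=0.6):
    """Return the key whose two tones dominate the block, else None.

    min_fraction is an illustrative threshold: the share of the block's
    energy that must sit at one row tone plus one column tone.
    """
    energy = sum(x * x for x in samples)
    if energy == 0:
        return None
    rows = [goertzel_power(samples, f, sample_rate) for f in ROW_HZ]
    cols = [goertzel_power(samples, f, sample_rate) for f in COL_HZ]
    r, c = rows.index(max(rows)), cols.index(max(cols))
    fraction = 2.0 * (rows[r] + cols[c]) / (len(samples) * energy)
    return KEYPAD[r][c] if fraction >= min_fraction else None


def tone(freqs, n=400, sample_rate=8000):
    """Generate n samples containing equal-amplitude sine tones."""
    return [sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs)
            for i in range(n)]


print(detect_dtmf(tone([770, 1336])))  # the tone pair for key "5"
print(detect_dtmf(tone([400])))        # a single tone that is not DTMF
```

Lowering `min_fraction` makes the sketch more tolerant of distorted tones but also more willing to accept speech, which is the same trade-off that sensitivity settings on a PBX control.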
Malfunctioning SIP Session Timers
With VoIP calls, it is possible for a connection to fail and for that failure to not be detected immediately. For example, if you trip over the cable and pull the power lead out of your phone then it never gets the chance to send an end-of-call signal. The SIP Session Timers (SST) mechanism is designed to prevent such “orphan” calls from persisting for an excessive length of time. “Keep-alive” messages are sent from one end-point to the other at regular intervals (e.g. every 15 minutes). If the expected message does not arrive “on time” then it is assumed the connection to the far end has failed and the call is ended.
It is not uncommon for SIP Session Timers to go wrong, resulting in a false positive and a dropped call. Of course, they should not go wrong, but they do. This is probably due to subtle incompatibilities in the way the mechanism was implemented in the end-point devices, especially if those devices are not from the same manufacturer.
Diagnosing a problem with SIP Session Timers
- The call drops at almost exactly the same duration into the call every time, typically 10 minutes, 15 minutes or 30 minutes
- The call will normally last for at least 5 minutes
- Some makes or models of handset may be prone to the fault while others are completely immune
- It can happen whether or not speech is present and irrespective of who is talking
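If you have a packet capture, the negotiated timer values can be read from the Session-Expires and Min-SE headers and compared with the time at which calls drop. The sketch below assumes you have the raw text of a SIP message. It applies the RFC 4028 recommendations that a refresh is sent after half the session interval, and that the side not refreshing sends BYE slightly before expiry (the smaller of 32 seconds and one third of the interval). Real devices may deviate from those recommendations, and the sample message is invented.

```python
def parse_session_timer(sip_message):
    """Extract Session-Expires, refresher and Min-SE from a SIP message."""
    result = {"session_expires": None, "refresher": None, "min_se": None}
    for line in sip_message.splitlines():
        if not line.strip():
            break  # a blank line ends the headers
        name, sep, value = line.partition(":")
        if not sep:
            continue
        name = name.strip().lower()
        params = [p.strip() for p in value.split(";")]
        if name in ("session-expires", "x"):  # "x" is the compact form
            result["session_expires"] = int(params[0])
            for p in params[1:]:
                if p.lower().startswith("refresher="):
                    result["refresher"] = p.split("=", 1)[1].lower()
        elif name == "min-se":
            result["min_se"] = int(params[0])
    return result


def expected_times(session_expires):
    """Seconds at which a refresh is due and at which a BYE may follow."""
    refresh_at = session_expires // 2
    bye_at = session_expires - min(32, session_expires // 3)
    return refresh_at, bye_at


# Invented example of a 200 OK carrying a session timer.
ok_response = (
    "SIP/2.0 200 OK\r\n"
    "Session-Expires: 1800;refresher=uac\r\n"
    "Require: timer\r\n"
    "\r\n"
)
timer = parse_session_timer(ok_response)
print(timer)
print(expected_times(timer["session_expires"]))
```

With a 1800 second interval, a missed refresh would show up as a call ending just under 30 minutes after it was answered, which fits the "almost exactly the same duration" symptom.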
Over-aggressive Silence or “No RTP” Detection
Some VoIP servers may assume that a period of “no audio” means the connection to the far end has failed. This is another way that some VoIP equipment tries to detect an “orphan” call. It looks at the media stream (which uses the RTP protocol) and detects when no audio signal is present. Usually there would be enough background noise to prevent this happening, but a muted microphone might trigger a false positive. Another reason for this happening would be if the handset has “silence suppression” or “voice activity detection” (VAD) enabled. This is a mechanism that deliberately stops sending audio packets when the sound level at the microphone falls below a certain threshold. It is meant to reduce network bandwidth demand.
Diagnosing the silence detection fault
- The call drops when the user at one end of the circuit has been silent, or is using mic mute, for a period of time.
- Some makes or models of handset may be prone to the fault while others are completely immune
- Most equipment will allow at least 30 seconds of silence before dropping the call
- It is almost the opposite of talk-off
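The logic of a "no RTP" watchdog is simple enough to model. This is a rough sketch, assuming you have a list of RTP packet arrival times (in seconds from answer) taken from a capture; the function name and the sample timings are hypothetical. It reports the moment at which a server using a given timeout would give up on the call.

```python
def rtp_timeout_drop(arrival_times, timeout_s=60.0):
    """Return when a 'no RTP' watchdog would fire, or None if it never does.

    arrival_times: sorted RTP packet arrival times, in seconds from answer.
    """
    last = 0.0
    for t in arrival_times:
        if t - last > timeout_s:
            return last + timeout_s  # the gap exceeded the timeout
        last = t
    return None


# Invented timings: one packet per second for 10 s, then nothing until
# 100 s (for example, a muted handset with silence suppression enabled).
arrivals = list(range(0, 11)) + [100]
print(rtp_timeout_drop(arrivals, timeout_s=60.0))
print(rtp_timeout_drop(arrivals, timeout_s=120.0))
```

Comparing the longest gap in the media stream with the time of the drop is a quick way to test whether silence detection is a plausible cause.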
Bad routing/proxying of SIP ACK signals
The SIP protocol requires that certain timeout periods are set, within which a response or acknowledgement message must arrive from the far end. It is possible for a call to start, apparently with everything OK, but to then end, say, 10 seconds or 20 seconds later because the SIP ACK (Acknowledgement) message failed to reach the intended destination within the timeout period.
Diagnosing failed ACK signals
- Every time a call fails, it will be exactly the same number of seconds after it was answered
- It usually happens well under 1 minute into the call and could be as little as 10 seconds
- It may only happen when certain destinations are called or when certain call routes are selected
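In a packet capture, a missing ACK usually shows up as the same 200 OK being retransmitted at growing intervals, followed by a BYE. As a sketch, the function below computes the retransmission schedule described in RFC 3261 for a 2xx response, assuming the default timer values T1 = 0.5 s and T2 = 4 s: the interval starts at T1, doubles up to T2, and the session should be terminated if no ACK has arrived after 64 × T1. Equipment configured with different timers will give up sooner or later than this, which may explain the shorter times mentioned above.

```python
def ok_retransmit_times(t1=0.5, t2=4.0):
    """Return (retransmission times, give-up time) for an unacknowledged 2xx.

    Times are in seconds after the 200 OK was first sent.
    """
    deadline = 64 * t1
    times = []
    elapsed, interval = 0.0, t1
    while elapsed + interval < deadline:
        elapsed += interval
        times.append(elapsed)
        interval = min(interval * 2, t2)  # double, capped at T2
    return times, deadline


times, deadline = ok_retransmit_times()
print(times)
print(deadline)
```

If the capture shows 200 OK messages spaced like this and the call ends at the deadline, the ACK is almost certainly being sent to the wrong place or not at all.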
Maximum call time exceeded
Many service providers set a limit on the maximum duration for any call passing through their system. This is yet another way of protecting against so-called “orphan” calls which could otherwise persist on the service provider’s system for days. The maximum call duration would almost always be set to at least 1 hour, but in most cases it would be 2 or more hours. On a pre-paid system, the maximum permitted length of your call is likely to be linked to how much credit is in your account.
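When calls always drop after the same length of time, the time itself is a strong clue. The helper below is only a rule-of-thumb sketch that encodes the time ranges given in this article; the function name and the 60 second tolerance are my own assumptions. It does not cover talk-off or silence detection, because those depend on what the user is doing rather than on elapsed time.

```python
def likely_cause(drop_after_s, tolerance_s=60):
    """Map a consistent drop time (seconds) to the cause it most suggests."""
    if drop_after_s < 60:
        return "missing ACK"
    for timer_s in (600, 900, 1800):  # 10, 15 and 30 minutes
        if abs(drop_after_s - timer_s) <= tolerance_s:
            return "SIP session timers"
    if drop_after_s >= 3600:
        return "maximum call time"
    return "unclear"


for seconds in (32, 1768, 7200, 300):
    print(seconds, likely_cause(seconds))
```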
Loss of signal and other issues unrelated to VoIP
Just because you have a VoIP system, do not assume that all faults are VoIP related. Calls to or from mobile handsets (cell phones) will often drop simply because the signal on the mobile handset was lost. This type of problem happens for everyone and is no different for VoIP users than it is for users of legacy PBXs. Consider also that your IP handset and IP-PBX depend on network connections. If any part of that network relies on Wi-Fi or other non-cable based connections, it could simply be a fault in the network equipment or something as banal as a loss of a Wi-Fi signal.
How to fix dropping VoIP calls
If you are clearly able to identify the cause of the problem, various remedies may be available to you. If the cause is unclear, a packet capture can often help to prove or disprove a tentative diagnosis. In some cases you can simply proceed on the basis of your best guess and see if things get better, or at least change, when you make certain adjustments. Sometimes, the solution may be out of your hands and you will have to work with the support department of your service provider.
With talk-off problems, reducing the gain on the handset’s microphone may help, but the real solution lies further downstream in the connection chain. If you have admin access to the PBX, look for settings that reduce the sensitivity of DTMF detection. On Asterisk or FreePBX systems try setting “relaxdtmf=no” for the relevant SIP connections. It may also help if you change the method of detection, especially disabling so-called “in-band” DTMF detection. On Asterisk, look for the dtmfmode setting in the SIP configuration:
| Setting | Effect |
|---|---|
| dtmfmode=inband | Susceptible to talk-off |
| dtmfmode=auto | Susceptible to talk-off |
| dtmfmode=rfc2833 | Recommended – has a reduced chance of talk-off |
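On a system with many SIP peers it can be tedious to check each one by hand. The following sketch scans the text of a sip.conf-style file and lists the sections where the settings discussed above are set to a value associated with talk-off. It is a simple line scanner written for this article, not an Asterisk tool: it ignores templates and included files, and the sample configuration is invented.

```python
# (key, value) pairs this article associates with talk-off.
RISKY = {("dtmfmode", "inband"), ("dtmfmode", "auto"), ("relaxdtmf", "yes")}


def find_talkoff_risks(conf_text):
    """Return a list of (section, key, value) for the risky settings found."""
    section, hits = None, []
    for raw in conf_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if line.startswith("[") and "]" in line:
            section = line[1:line.index("]")]
        elif "=" in line:
            key, value = (part.strip().lower() for part in line.split("=", 1))
            if (key, value) in RISKY:
                hits.append((section, key, value))
    return hits


# Invented sample configuration.
sample = """
[general]
relaxdtmf=yes        ; relaxed detection
dtmfmode=rfc2833

[phone101]
dtmfmode=inband
;dtmfmode=auto       ; commented out, so ignored
"""
for hit in find_talkoff_risks(sample):
    print(hit)
```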
If you suspect the problem is due to SIP Session Timers, then you may need to experiment with settings. Look for settings on your IP phones. If you cannot find them, try a Google search (e.g. “Snom 360 session timers”) and, if necessary, contact the support department of the manufacturer. If the problem happens with some phones, but not others, then try to duplicate the good phone’s settings on the bad handset. Try increasing the Min-SE value to determine if it alters the time before a call drops. On an Asterisk system, try setting “session-timers=refuse” in the sip.conf file or the advanced SIP settings of FreePBX – this will disable SSTs and may instantly solve your problem.
When it looks like the problem is an over-aggressive silence detection system, the culprit is likely to be the equipment you are calling. This means you may not be able to disable it or adjust the timeout. However, there may be remedies within your reach. Some phones have settings that allow you to enable or disable “silence suppression” or “VAD” (Voice Activity Detection). Try altering the settings to see if it makes a difference. You may even find a setting that is specifically there for this problem. On Snom phones with v8 firmware, it is called “Send silent RTP packets on mute” and is in the Advanced > Audio section. I recommend you switch it on.
If you have an Asterisk system and suspect it is disconnecting calls when the voice stream goes silent, then you should consider changing the RTP Timer settings. Here is an extract from the auto-generated sip.conf file of an Asterisk 1.6 installation:
;rtptimeout=60          ; Terminate call if 60 seconds of no RTP or RTCP activity on the audio channel when
                        ; we're not on hold. This is to be able to hangup a call in the case of a phone
                        ; disappearing from the net, like a power loss or grandma tripping over a cable.
;rtpholdtimeout=300     ; Terminate call if 300 seconds of no RTP or RTCP activity on the audio channel
                        ; when we're on hold (must be > rtptimeout)
;rtpkeepalive=<secs>    ; Send keepalives in the RTP stream to keep NAT open (default is off - zero)
If you think your problem fits the symptoms of the missing ACK message, I regret that I can only provide a limited amount of “self-help” advice here. The first step would be to disable the “SIP ALG” option if it is enabled in any NAT routers or firewalls. In most consumer-grade and small-business routers, this option creates more problems than it solves. High-end commercial firewalls from the big manufacturers such as Cisco should be okay, as long as they have been configured correctly.
The next step is to obtain a packet capture using a tool such as Wireshark. This really needs to be done on the service provider’s Proxy server – a packet capture at the customer’s premises might not be adequate, but is still worth a try. In my experience, the usual reason for an ACK message to go missing is that the wrong address was given in a Contact header earlier in the SIP dialogue. If you are examining a packet capture for the call, it is easy to miss this issue because a bad address in one SIP message does not immediately result in any obvious problem. The problem shows up later, in a SIP message travelling in the opposite direction. For example, consider a call starting with a SIP INVITE request, followed by 180 Ringing, then 200 OK, then ACK. The ACK would not arrive if the wrong IP address (or port) was given in the Contact header of the 200 OK response. The address given in any Record-Route headers is also important for correct routing of later messages – errors here can be even harder to spot because the route set is established very early in the dialogue. If you need a refresher on Contact and Record-Route headers, please check out my article covering this topic.
The SIP packet capture should allow you to identify where the problem is happening. It is sometimes possible to fix this type of problem by adjusting the NAT settings on the IP phone, softphone, IP-PBX or other device at the customer’s premises. That is because the NAT settings are likely to alter the address pushed into the Contact header – it may need the external public address to be used instead of the local LAN address. Enabling STUN on the IP phone could be the solution. Defining an external address in the configuration options may do the trick. If you cannot fix the problem at the customer’s device, or the problem is in Record-Route header addresses, then there could be a bug in the provider’s SIP Proxy server or you may need a server-side solution. Either way, this would require expert help from your service provider.
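One check that is easy to automate is whether a Contact header seen on the public side of the NAT still carries a private LAN address. The sketch below assumes you have the raw text of a SIP message from the capture. It extracts the host from the first Contact header and tests it against the RFC 1918 private ranges. It deliberately handles only IPv4 literals, returning None for hostnames or anything else it cannot parse, and the sample messages are invented.

```python
import ipaddress
import re

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
# Host part of a sip: or sips: URI, after any optional "user@".
HOST_RE = re.compile(r"sips?:(?:[^@>;\s]+@)?([^:;>\s]+)", re.IGNORECASE)


def contact_host(sip_message):
    """Return the host part of the first Contact header, or None."""
    for line in sip_message.splitlines():
        if not line.strip():
            break  # a blank line ends the headers
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in ("contact", "m"):  # "m" = compact
            match = HOST_RE.search(value)
            return match.group(1) if match else None
    return None


def contact_is_private(sip_message):
    """True or False for an IPv4 literal, None if no usable address found."""
    host = contact_host(sip_message)
    if host is None:
        return None
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return None  # a hostname, or something this sketch does not parse
    return any(addr in net for net in RFC1918)


# Invented 200 OK responses as they might appear in a capture.
lan_ok = "SIP/2.0 200 OK\r\nContact: <sip:alice@192.168.1.20:5060>\r\n\r\n"
wan_ok = "SIP/2.0 200 OK\r\nContact: <sip:alice@203.0.113.5:5060>\r\n\r\n"
print(contact_is_private(lan_ok))
print(contact_is_private(wan_ok))
```

A private address in the Contact header of a message captured at the provider’s side is a strong hint that the NAT settings on the customer’s device need attention.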
If you are responsible for supporting VoIP infrastructure and are seeing unexpected errors at the end of calls using TCP or TLS, then you might find it useful to read my article about TCP Persistence. It is quite a technical article, very much aimed at VoIP professionals.
I hope this article helped you. If you are aware of other things that can cause call drops, please post details in the comments below.