VoIP-based phone systems bring many benefits, but they also bring some problems. Not least is the annoying tendency for some calls to drop part-way through your conversation for no obvious reason. In this article I will identify the most common reasons why an established VoIP call might suddenly drop and explain how you can diagnose the cause. At the end are some pointers to the solutions for these problems.
This article is not about problems setting up calls in the first place, nor about calls that have poor quality audio, no audio or 1-way audio (the latter are more likely to be explained in my other articles about SIP and NAT which can be found here).
Some updates were made to this article in December 2020.
Gather information from the users
It is easy to fall into the trap of thinking you can only identify this type of problem using sophisticated technology-based solutions. However, in my experience the key to initially identifying the cause of dropping calls is to ask the users a few simple questions. The answers will often be sufficient to allow you to narrow your search down to just one, or at worst two, possibilities. Try asking them the following:
- Does the call drop after a fixed period of time? If so, how long into the call does it happen? (Get the users to measure exactly how long it takes on a number of occasions).
- Does it only happen when you are calling a particular destination (e.g. a conference service)?
- Do some of your colleagues never experience the problem and, if so, can you see anything different about their phone or the destinations they are calling?
- Do you use the microphone mute button and, if so, do you find calls mainly drop when the microphone is muted?
- Does the call only seem to drop when you are talking?
- Do you, or the other participants in the call, sometimes hear a short blast of tone coming through the phone’s earpiece while someone is talking?
Look at the answers and see if there is a clear pattern – does it point to certain phones being worse than others, or certain destinations, or both? Now read on and you should quickly see how the answers to the above questions will help to pinpoint the cause.
Talk-off
Talk-off is an unintended command activation when the human voice is mistakenly detected as a DTMF control signal. DTMF tones are normally only generated when you press a key on the phone’s keypad. Talk-off is where the detector in the remote server or PBX gets triggered by similar frequencies in human speech.
This false triggering of the tone detector may not always cause the call to drop, but it is not unheard of for the signal to be misinterpreted as a request to end the call or put the call on hold. For example, a conference bridge might interpret * as meaning the user is leaving the conference.
Diagnosing talk-off:
- It can happen at any time after the start of the call
- If triggered from the local end, it will happen when the user is speaking
- Certain destinations may be much more susceptible to this fault than others
- Calling/called parties may sometimes hear a DTMF tone during speech
- Certain voices are more susceptible than others – tends to happen more with female voices than male
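To see why speech can fool a tone detector, here is a minimal sketch of an in-band DTMF detector built on the Goertzel algorithm. It is an illustration only, not the detector used by any particular PBX. The tone frequencies are the standard DTMF ones; the block length of 205 samples, the dominance factor of 10 and the `min_tone_fraction` guard are assumptions chosen for the demonstration.

```python
import math

ROW_FREQS = (697, 770, 852, 941)        # standard DTMF low-group tones (Hz)
COL_FREQS = (1209, 1336, 1477, 1633)    # standard DTMF high-group tones (Hz)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, sample_rate, freq):
    """Squared magnitude of the signal at one frequency (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_digit(samples, sample_rate=8000, min_tone_fraction=0.0):
    """Return the detected key, or None.

    min_tone_fraction is a hypothetical guard: the share of the block's total
    energy that must sit in the two winning tones. 0.0 disables the guard.
    """
    rows = [goertzel_power(samples, sample_rate, f) for f in ROW_FREQS]
    cols = [goertzel_power(samples, sample_rate, f) for f in COL_FREQS]
    r, c = rows.index(max(rows)), cols.index(max(cols))
    # Each winning tone must clearly dominate the other tones in its group.
    for group, best in ((rows, r), (cols, c)):
        others = [p for i, p in enumerate(group) if i != best]
        if group[best] < 10.0 * max(others):
            return None
    # Scale the time-domain energy so that a pure two-tone signal gives ~1.0.
    total = (len(samples) / 2.0) * sum(x * x for x in samples)
    if total == 0 or (rows[r] + cols[c]) / total < min_tone_fraction:
        return None
    return KEYPAD[r][c]


def tones(freqs, n=205, sample_rate=8000):
    """Sum of equal-amplitude sine waves, n samples long (synthetic test signal)."""
    return [sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs)
            for i in range(n)]


key_press = tones([697, 1209])                 # a real "1" key press
speech_like = tones([300, 697, 1209, 2000])    # crude stand-in for voiced speech
for name, signal in (("key press", key_press), ("speech-like", speech_like)):
    print(name, detect_digit(signal), detect_digit(signal, min_tone_fraction=0.8))
```

With these synthetic signals the unguarded detector should report key 1 for both inputs, which is talk-off in miniature, while the guarded version should reject the speech-like one because too much of its energy lies outside the two DTMF tones. Real detectors apply several such checks, and how strict they are is what the sensitivity settings on a PBX adjust.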
Malfunctioning SIP Session Timers
With VoIP calls, it is possible for a connection to fail and for that failure to not be detected immediately. For example, if you trip over the cable and pull the power lead out of your phone then it never gets the chance to send an end-of-call signal. The SIP Session Timers (SST) mechanism is designed to prevent such “orphan” calls from persisting for an excessive length of time. “Keep-alive” messages are sent from one end-point to the other at regular intervals (e.g. every 15 minutes). If the expected message does not arrive “on time” then it is assumed the connection to the far end has failed and the call is ended.
It is not uncommon for SIP Session Timers to go wrong, resulting in a false positive and the call being dropped. Of course, they should not go wrong, but they do. This is probably due to subtle incompatibilities in the way the mechanism was implemented in the end-point devices, especially if those devices are not from the same manufacturer.
Diagnosing a problem with SIP Session Timers
- The call drops at almost exactly the same duration into the call every time, typically 10 minutes, 15 minutes or 30 minutes
- The call will normally last for at least 5 minutes
- Some makes or models of handset may be prone to the fault while others are completely immune
- It can happen whether or not speech is present and irrespective of who is talking
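The fixed drop times follow from simple arithmetic on the negotiated session interval. The sketch below uses the timings recommended in RFC 4028: the refresher sends its refresh half-way through the interval, and the other side hangs up slightly before expiry, by the smaller of 32 seconds and one third of the interval. The function name is made up for this example, and real equipment may not follow the recommendation exactly.

```python
def session_timer_schedule(session_expires):
    """Return (refresh_at, hangup_at) in seconds, per the RFC 4028 recommendations.

    refresh_at: when the refresher is expected to send its session refresh.
    hangup_at:  when the other side gives up and sends BYE if no refresh arrived.
    """
    refresh_at = session_expires / 2
    bye_margin = min(32, session_expires / 3)
    hangup_at = session_expires - bye_margin
    return refresh_at, hangup_at


for interval in (600, 900, 1800):   # example Session-Expires values in seconds
    refresh_at, hangup_at = session_timer_schedule(interval)
    print(f"Session-Expires {interval}: refresh at {refresh_at:.0f} s, "
          f"hang up at {hangup_at:.0f} s if no refresh is seen")
```

If calls always drop a little before a round number of minutes, comparing the measured time against `hangup_at` for the interval you see in the SIP headers is a quick way to test the Session Timers theory.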
Over-aggressive Silence or “No RTP” Detection
Some VoIP servers may assume that a period of “no audio” means the connection to the far end has failed. This is another way that some VoIP equipment tries to detect an “orphan” call. It looks at the media stream (which uses the RTP protocol) and detects when no audio signal is present. Usually there would be enough background noise to prevent this happening, but a muted microphone might trigger a false positive. Another reason for this happening would be if the handset has “silence suppression” or “voice activity detection” (VAD) enabled. This is a mechanism that deliberately stops sending audio packets when the sound level at the microphone falls below a certain threshold. It is meant to reduce network bandwidth demand.
Diagnosing the silence detection fault
- The call drops when the user at one end of the circuit has been silent, or is using mic mute, for a period of time.
- Some makes or models of handset may be prone to the fault while others are completely immune
- Most equipment will allow at least 30 seconds of silence before dropping the call
- It is almost the opposite of talk-off
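The "no RTP" check is essentially a watchdog on the gaps between media packets. The sketch below is a simplified model of that idea rather than the logic of any specific server: the function name is hypothetical, packet arrival times are given in seconds from the moment the call was answered, and the 30-second timeout is just an example value.

```python
def no_rtp_hangup_time(packet_times, rtp_timeout):
    """Return the time (s) at which a 'no RTP' watchdog would hang up, or None.

    packet_times: arrival times of RTP packets, seconds since the call was answered.
    rtp_timeout:  longest tolerated gap between packets, in seconds.
    """
    last_seen = 0.0  # assumption: the watchdog starts counting when the call is answered
    for t in sorted(packet_times):
        if t - last_seen > rtp_timeout:
            return last_seen + rtp_timeout
        last_seen = t
    return None


talking = [i * 0.02 for i in range(500)]        # 10 s of speech, one packet every 20 ms
resumed = [55 + i * 0.02 for i in range(500)]   # speech resumes after about 45 s of mute
keepalives = [20.0, 30.0, 40.0, 50.0]           # occasional packets sent while muted

print(no_rtp_hangup_time(talking + resumed, rtp_timeout=30))
print(no_rtp_hangup_time(talking + keepalives + resumed, rtp_timeout=30))
```

In the first case the handset sends nothing while muted, so the watchdog fires about 30 seconds after the last speech packet. In the second case a few packets sent during the mute keep every gap under the timeout and the call survives, which is the effect that settings for sending silent RTP packets are meant to achieve.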
Bad routing/proxying of SIP ACK signals
The SIP protocol requires that certain timeout periods are set, within which a response or acknowledgement message must arrive from the far end. It is possible for a call to start, apparently with everything ok, but to then end, say, 10 seconds or 20 seconds later because the SIP ACK (Acknowledgement) message failed to reach the intended destination within the timeout period.
Diagnosing failed ACK signals
- Every time a call fails, it will be exactly the same number of seconds after it was answered
- It usually happens well under 1 minute into the call and could be as little as 10 seconds
- It may only happen when certain destinations are called or when certain call routes are selected
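Once users have timed a few failed calls, the timing rules of thumb in this article can be applied mechanically. The following sketch is only a first-pass triage under stated assumptions: the function name and the 3-second tolerance are invented for the example, and the thresholds (under 1 minute, 5 minutes or more, 1 hour or more) are taken from the diagnostic hints in this article, including the maximum call time limit described in the next section.

```python
from statistics import mean, pstdev


def triage_drop_times(durations_s, tolerance_s=3.0):
    """Suggest where to look first, given call durations (seconds) at the moment of the drop."""
    if len(durations_s) < 3:
        return "need more measurements"
    if pstdev(durations_s) > tolerance_s:
        return "variable timing: consider talk-off, silence detection or network loss"
    average = mean(durations_s)
    if average < 60:
        return "fixed and under 1 minute: suspect a missing SIP ACK"
    if average >= 3600:
        return "fixed and 1 hour or more: suspect a maximum call duration limit"
    if average >= 300:
        return "fixed and 5 minutes or more: suspect SIP Session Timers"
    return "fixed but between 1 and 5 minutes: inconclusive, get a packet capture"


print(triage_drop_times([637, 637, 638]))
print(triage_drop_times([32, 32, 33]))
print(triage_drop_times([45, 600, 130]))
```

The result is a hint about which section to read first, not a diagnosis. The answers to the other questions, such as whether the microphone was muted or who was speaking, still matter.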
Maximum call time exceeded
Many service providers set a limit on the maximum duration for any call passing through their system. This is yet another way of protecting against so-called “orphan” calls which could otherwise persist on the service provider’s system for days. The maximum call duration would almost always be set to at least 1 hour, but in most cases it would be 2 or more hours. On a pre-paid system, the maximum permitted length of your call is likely to be linked to how much credit is in your account.
Loss of signal and other issues unrelated to VoIP
Just because you have a VoIP system, do not assume that all faults are VoIP related. Calls to or from mobile handsets (cell phones) will often drop simply because the signal on the mobile handset was lost. This type of problem happens for everyone and is no different for VoIP users than it is for users of legacy PBXs. Consider also that your IP handset and IP-PBX depend on network connections. If any part of that network relies on Wi-Fi or other non-cable based connections, the cause could simply be a fault in the network equipment or something as banal as a loss of Wi-Fi signal.
How to fix dropping VoIP calls
If you are clearly able to identify the cause of the problem, various remedies may be available to you. If the cause is unclear, a packet capture can often help to prove or disprove a tentative diagnosis. In some cases you can simply proceed on the basis of your best guess and see if things get better, or at least change, when you make certain adjustments. Sometimes, the solution may be out of your hands and you will have to work with the support department of your service provider.
With talk-off problems, reducing the gain on the handset’s microphone may help, but the real solution lies further downstream in the connection chain. If you have admin access to the PBX, look for settings that reduce the sensitivity of DTMF detection. On Asterisk or FreePBX systems try setting “relaxdtmf=no” for the relevant SIP connections. It may also help if you change the method of detection, especially disabling so-called “in-band” DTMF detection. On Asterisk, look for the dtmfmode setting in the SIP configuration:
| Setting | Effect |
| --- | --- |
| dtmfmode=inband | Susceptible to talk-off |
| dtmfmode=auto | Susceptible to talk-off |
| dtmfmode=rfc2833 | Recommended – has a reduced chance of talk-off |
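If you have many peers defined, a quick scan of the configuration text can show where the risky values are set. The sketch below is an assumption-laden helper, not an Asterisk tool: the function name and the sample peer name are invented, it only understands simple `key=value` lines with `;` comments, and it treats `relaxdtmf=yes` as risky on the basis of the advice above to set it to “no”.

```python
def find_talk_off_risks(sip_conf_text):
    """Return (line_number, setting) pairs for values this article flags as talk-off prone."""
    risky = {("dtmfmode", "inband"), ("dtmfmode", "auto"), ("relaxdtmf", "yes")}
    findings = []
    for number, line in enumerate(sip_conf_text.splitlines(), start=1):
        setting = line.split(";", 1)[0].strip()      # drop ';' comments
        if "=" not in setting:
            continue                                 # section headers, blank lines
        key, value = setting.split("=", 1)
        key = key.strip().lower()
        value = value.lstrip(">").strip().lower()    # tolerate the 'key => value' form
        if (key, value) in risky:
            findings.append((number, f"{key}={value}"))
    return findings


SAMPLE = """\
[general]
relaxdtmf=yes
dtmfmode = rfc2833   ; fine

[phone-101]          ; hypothetical peer name
dtmfmode=inband
;dtmfmode=auto
"""

for line_number, setting in find_talk_off_risks(SAMPLE):
    print(f"line {line_number}: {setting}")
```

Commented-out lines are ignored, so only settings that are actually in force are reported.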
If you suspect the problem is due to SIP Session Timers, then you may need to experiment with settings. Look for settings on your IP phones. If you cannot find them, try a Google search (e.g. “Snom 360 session timers”) and, if necessary, contact the support department of the manufacturer. If the problem happens with some phones, but not others, then try to duplicate the good phone’s settings on the bad handset. Try increasing the Min-SE value to determine if it alters the time before a call drops. On an Asterisk system, try setting “session-timers=refuse” in the sip.conf file or the advanced SIP settings of FreePBX – this will disable SSTs and may instantly solve your problem.
When it looks like the problem is an over-aggressive silence detection system, the culprit is likely to be the equipment you are calling. This means you may not be able to disable it or adjust the timeout. However, there may be remedies within your reach. Some phones have settings that allow you to enable or disable “silence suppression” or “VAD” (Voice Activity Detection). Try altering the settings to see if it makes a difference. You may even find a setting that is specifically there for this problem. On Snom phones with v8 firmware, it is called “Send silent RTP packets on mute” and is in the Advanced > Audio section. I recommend you switch it on.
If you have an Asterisk system and suspect it is disconnecting calls when the voice stream goes silent, then you should consider changing the RTP Timer settings. Here is an extract from the auto-generated sip.conf file of an Asterisk 1.6 installation:
;rtptimeout=60        ; Terminate call if 60 seconds of no RTP or RTCP activity on the audio channel when
                      ; we're not on hold. This is to be able to hangup a call in the case of a phone
                      ; disappearing from the net, like a power loss or grandma tripping over a cable.
;rtpholdtimeout=300   ; Terminate call if 300 seconds of no RTP or RTCP activity on the audio channel
                      ; when we're on hold (must be > rtptimeout)
;rtpkeepalive=<secs>  ; Send keepalives in the RTP stream to keep NAT open (default is off - zero)
If you think your problem fits the symptoms of the missing ACK message, I regret that I can only provide a limited amount of “self-help” advice here. The first step would be to disable the “SIP ALG” option if it is enabled in any NAT routers or firewalls. In most business-grade firewalls, this option creates more problems than it solves. High-end commercial firewalls from the big manufacturers such as Cisco should be okay, as long as they have been configured correctly.
The next step is to obtain a packet capture using a tool such as Wireshark. This really needs to be done on the service provider’s Proxy server – a packet capture at the customer’s premises might not be adequate, but is still worth a try. In my experience, the usual reason for an ACK message to go missing is because the wrong address was given in a Contact header earlier in the SIP dialogue. If you are examining a packet capture for the call, it is easy to miss this issue because a bad address in one SIP message does not immediately result in any obvious problem. The problem shows up later, in a SIP message travelling in the opposite direction. For example, consider a call starting with a SIP INVITE request, followed by 180 Ringing, then 200 OK, then ACK. The ACK would not arrive if the wrong IP address (or port) was given in the Contact header of the 200 OK response. The address given in any Record-Route headers is also important for correct routing of later messages – errors here can be even harder to spot because the route set is established very early in the dialogue. If you need a refresher on Contact and Record-Route headers, please check out my article covering this topic:
https://kb.smartvox.co.uk/opensips/contact-and-record-route-headers-explained/
The SIP packet capture should allow you to identify where the problem is happening. It is sometimes possible to fix this type of problem by adjusting the NAT settings on the IP phone, softphone, IP-PBX or other device at the customer’s premises. That is because the NAT settings are likely to alter the address pushed into the Contact header – it may need the external public address to be used instead of the local LAN address. Enabling STUN on the IP phone could be the solution. Defining an external address in the configuration options may do the trick. If you cannot fix the problem at the customer’s device, or the problem is in Record-Route header addresses, then there could be a bug in the provider’s SIP Proxy server or you may need a server-side solution. Either way, this would require expert help from your service provider.
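When working through a packet capture, one quick check is whether the Contact header in a message carries a private LAN address, since the far end cannot route an ACK to it across the Internet. The sketch below is a rough helper under stated assumptions: the function names are invented, the SIP message is supplied as plain text copied from the capture, and only the RFC 1918 IPv4 ranges are treated as private. A private address is perfectly valid when both ends sit on the same LAN, so treat a positive result as a prompt to look closer.

```python
import ipaddress
import re

RFC1918_NETWORKS = [ipaddress.ip_network(n)
                    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def contact_host(sip_message):
    """Return the host part of the first Contact header URI, or None if not found."""
    for line in sip_message.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() in ("contact", "m"):    # "m" is the compact form
            match = re.search(r"sips?:(?:[^@>;\s]+@)?(\[[^\]]+\]|[^:;>\s]+)", value)
            return match.group(1).strip("[]") if match else None
    return None


def contact_is_private(sip_message):
    """True if the Contact host is an IPv4 literal inside an RFC 1918 range."""
    host = contact_host(sip_message)
    if host is None:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False            # a hostname rather than an IP literal; cannot judge here
    return any(address in network for network in RFC1918_NETWORKS)


# Example 200 OK with made-up addresses and user name
OK_RESPONSE = (
    "SIP/2.0 200 OK\r\n"
    "Contact: <sip:1001@192.168.1.50:5060;transport=udp>\r\n"
    "Content-Length: 0\r\n\r\n"
)
print(contact_host(OK_RESPONSE), contact_is_private(OK_RESPONSE))
```

The same check can be pointed at Record-Route headers by changing the header name, although those often legitimately contain proxy hostnames rather than IP addresses.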
If you are responsible for supporting VoIP infrastructure and getting problems arising from unexpected errors at the end of calls using TCP or TLS, then you might find it useful to read about TCP Persistence in my article here. It is quite a technical article very much aimed at VoIP professionals:
https://kb.smartvox.co.uk/opensips/NAT-Contact-and-Via-Fixing-part2
Finishing up
I hope this article helped you. If you are aware of other things that can cause call drops, please post details in the comments below.
Hi, Appreciate this very useful post of yours.
We have a GenBand SBC and have interconnects with more than 300 companies.
Recently one customer complained about call drops. We checked and it was due to the vendor not answering our UPDATE message. We shared this with them and they said it must have been temporary, related to packet loss or something (which we did not believe).
Anyway, it was fixed already and we did not see it happening again, but after a few hours we saw a few calls dropped after 202 s with an internal release code (the same release code we get when we wait too long for an answer to the UPDATE message and then drop the call).
Now it is our realm giving the CANCEL message and dropping the call.
From what I’ve investigated, it could be VAD issue.
But would you please advise if I’m right and how to start debugging this matter?
Thank you in advance.
If the calls drop exactly 202 seconds after the call started, then it is most likely to do with SIP Session Timers.
For it to be VAD, the time when the call drops would be related to the period of silence rather than the duration of the call.
Problems with Session Timers can be difficult to pin down because there are several variables involved – the time between session refresh requests, the type of request used to do the refresh and the direction of that request. That last one, direction, can even change during a call because the endpoints may exchange a parameter that assigns the task of refreshing to one or other end. Look for refresher=uac or refresher=uas in the relevant headers.
To diagnose/confirm whether Session Timers are causing your problem, are you able to adjust the value for “Session-Expires”? If so, change the time period and see if the call cut-off time also changes.
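To compare what each side negotiated, it helps to pull the relevant values out of the INVITE and the 200 OK in a capture. This is a minimal sketch that assumes the SIP message is available as plain text; the function name is invented for the example, and it only reads the Session-Expires header (or its compact form “x”) and the Min-SE header.

```python
def parse_session_timer_headers(sip_message):
    """Extract Session-Expires, its refresher parameter and Min-SE from raw SIP text."""
    info = {"session_expires": None, "refresher": None, "min_se": None}
    for line in sip_message.splitlines():
        if not line.strip():
            break                                   # a blank line ends the headers
        name, separator, value = line.partition(":")
        if not separator:
            continue                                # e.g. the status line
        name = name.strip().lower()
        parts = [part.strip() for part in value.split(";")]
        if name in ("session-expires", "x"):        # "x" is the compact form
            info["session_expires"] = int(parts[0])
            for part in parts[1:]:
                if part.lower().startswith("refresher="):
                    info["refresher"] = part.split("=", 1)[1].lower()
        elif name == "min-se":
            info["min_se"] = int(parts[0])
    return info


# Example response with made-up values
RESPONSE = (
    "SIP/2.0 200 OK\r\n"
    "Session-Expires: 1800;refresher=uas\r\n"
    "Min-SE: 90\r\n"
    "Content-Length: 0\r\n\r\n"
)
print(parse_session_timer_headers(RESPONSE))
```

Running it over the messages from both directions of a failing call makes it easier to spot a mismatch, such as each end believing the other is the refresher.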
Just switched to the “newer” version of x-lite as the previous version seems to finally be having some compatibility issues. However, we are now having issues with calls being dropped. We did not have this issue with the older version. Calls will drop at random points in the conversation–sometimes (though only sometimes) this will happen as soon as the call connects.
I have duplicated all the settings and account preferences from the older version so none of that is different. I am the “guinea pig” for the new system so the others in the office are waiting on me to see how it works before we switch.
Any ideas would be appreciated. Cheers!
Very difficult to diagnose because the issues don’t show a consistent pattern. Dropping as soon as the call connects is likely to be a call setup issue, whereas dropping at random times is more likely to be “talk-off” or silence detection. If there were other symptoms like 1-way audio then that would help to identify the issue, but what you really need is to get a packet capture and pass it to someone with the skills to analyse it.
If the old soft phone worked okay and the new version doesn’t then I would look closely at NAT handling settings and interaction with any firewalls or NAT routers you have in the path. That said, NAT problems usually cause different symptoms, not call drops. Talk-off would be consistent with random drops during the conversation, so you could look at settings for DTMF detection – don’t have in-band detection enabled. Instead, make sure it is only using RFC2833.
Consider also trying a different soft phone. That should at least allow you to confirm that it is a problem with the newer x-lite
Hi John
We are a small business and have 5 VoIP phones. Every single handset drops out after 10 min 37 sec exactly.
Our provider just has us running around in circles. Do you think changing providers and handsets/modems will fix the problem, or is it something that might keep recurring?
That has got to be SIP Session Timers. Your provider should at least be able to help you sort it out. However, it depends what they are actually responsible for. e.g. did they provide the voip handsets; if not, did you follow their recommendations for which handsets to purchase and how to configure them; are they providing a hosted PBX service or do you have an IP-PBX on your premises?
If your current provider is not able to sort this out then it suggests they are incompetent so changing to a different provider is likely to be a good move.
Hi, Thanks for your best post,
My problem is that I added a new server to my network, which already had two servers. I configured everything without any problem, but after 2 or 3 hours it reports “maximum retries on transmission XXXXXX”, cause 34, and all the other calls are dropped. I don’t know exactly where this problem comes from. Is there a possibility that someone has hacked my new server? NOTE: each time, the problem is resolved after I unplug the new server from my network, or when I restart the Asterisk service on the other 2 servers.
Cause 34 is Circuit Congestion. Could it simply be that you’re adding a new server at your end, but this results in you exceeding the capacity of the trunk to your carrier? Even if this is true, it should not impact on existing calls. Perhaps it results in other calls being dropped because of the way your configuration handles the initial failure. For example, if the initial “cause 34” congestion failure triggers further attempts to route the call in a way that will also fail – or worse still in a way that sets up a loop – then that could result in Asterisk crashing. In my opinion, Asterisk is not a product I would recommend for serious Telco-type operations or very high traffic volumes or high numbers of Calls/Second. It’s great as a PBX and useful for sandboxing, testing and supporting special applications. As a gateway it is okay for moderate loads, but FreeSWITCH is a more reliable platform for serious high capacity operations.
Thank you for this article, very helpful 🙂
John, we have VOIP at 2 locations, and digital phones at our 3rd. The internet phone server dropped out completely today 50 or more times which resulted in dropped calls. When the phones were operational they cut out intermittently at random times, the actual ring to the phone was on a delay instead of coordinating with the flashing phone light, and the ring even cut in and out. The internet phone provider says the circuits test fine. I can assure you they are not fine and no one will accept any responsibility. Any ideas?
It sounds like network problems, but the same symptoms arise when a system is under attack with denial of service or high speed password guessing attempts.
To be honest, it is impossible to problem solve your particular case with the details you’ve provided. “we have VOIP at 2 locations” could mean almost anything. “internet phone provider” suggests a hosted service, but which circuit are they testing and how? If the problem is intermittent, they might do a ping test that works fine one minute and doesn’t the next.
You need to use a systematic approach to identify where the problem is. I would start with the network connections between sites – on a Linux box, use ping, traceroute and mtr as simple basic tools to test network connectivity between two points. If you use ping, set the packet size with the -s option so it sends larger packets than the default.
ping -s 1300 <address of the remote site>
The best tool is mtr because it will keep running tests continuously and updates an on-screen table of results. Leave it running for an hour and it will report the range of results over that hour rather than just a quick snapshot.
Another approach to problem solving is to change one part of the system while keeping everything else the same. If changing the one part makes no difference then it is likely that is not where the fault lies. If changing it makes a difference, then further investigation can be done. With VoIP that might mean trying a different IP handset, trying over a different broadband connection (if one is available, though sometimes you can use a 4G SIM in a portable hotspot adaptor), or trying the same device with a different VoIP service (there are usually one or two VoIP services that you can sign up to and get one free account that is at least able to receive calls, but if you want to test both ways you may need to put a few dollars of credit on it).
I’m sure there are whole books written on the subject of problem solving techniques. I’m not able to go into more detail here.
Recently switched to VOIP at home. Had no idea that call drop-off was possible. Wishing I hadn’t bothered now. Seriously, why would you go through the above when the legacy system was bomb-proof?
Sorry to hear that you’re having problems. Call drops are not uncommon, but one would hope this is only during commissioning and testing. They should certainly not be a normal part of the everyday user experience.
You’ve installed VoIP at home. Almost all the VoIP solutions I know about are used for business and not for home. Businesses are more likely to have IT departments with support people to fix any issues, especially those concerning network config like NAT, routing and firewall rules. For issues with configuration of the customer’s equipment (usually a VoIP phone or SIP-Analogue adaptor) you should be able to get help from your VoIP service provider. Many providers will recommend (or insist) that you only use certain approved equipment. They will then be able to provide online documentation explaining exactly how that equipment should be configured to work with their service. It is much more difficult for the provider to offer support in cases where customers have chosen unapproved equipment. Obviously, I don’t know what your situation is.
Although VoIP is much more likely to be found in the workplace than at home, there are still good reasons to consider it. The prime reason being lower call costs. You may also get the benefit of extra functions like voicemail and redirection of calls to multiple destinations. To get those benefits, you may have to go through some pain in the beginning. I’m sure there are other things that this is true for in life.
I understand that the number of households using legacy landlines for their phone is dropping. That is not because they are being replaced by VoIP. It’s because people are using mobile phones and don’t see the need for a landline other than maybe as a conduit for delivering broadband Internet connectivity.
Thanks for the topic. It doesn’t give direct answers what to do, but it gives all directions.
You’re right, there are few direct answers. I tried to give them where it was possible (such as certain settings that can be changed in Asterisk). For many cases the problems may involve remote equipment and/or there are so many possibilities I could not hope to list them all.
One interesting observation I made recently: Twice now I have seen cases where incorrect setting on a router/switch port for Ethernet Duplex/Half-Duplex/Auto-negotiate was causing intermittent packet loss. When the ACK packet was lost, it did not get re-sent. This meant the call was answered (200 OK) but then it dropped after about 35 seconds because no ACK was returned in response to the 200 OK. In both the cases where this happened one of the servers was an Asterisk box. This is a good example of why I cannot list all possible remedies. It was a combination of Asterisk plus network equipment with mis-matched Ethernet port settings causing random packets to be dropped.
John