Shaping ADSL traffic with Linux

What?

The trouble: a basic ADSL connection is very asymmetric. The download direction has much more bandwidth than the upload direction. This is great while you (and everyone else in the house sharing the DSL line) are only downloading. But as soon as one program or connection is uploading (for example, this web server is serving a large file, or someone is sending an email with large attachments), downloads slow down, interactive sessions (ssh, telnet) become too choppy to be any fun, and online games' ping times get too high to be playable.

Why?

The typical ADSL upload bandwidth is 128 kbit/sec. The typical large ethernet packet is 1500 bytes. This means you can upload 128 kbit/sec / (8 bits/byte * 1500 bytes) = roughly 10 packets/sec. Whoa, that already is not good, but it gets worse. The typical DSL modem buffers packets in a FIFO that can hold 4K or 8K of data, or between a quarter and half a second of data. It does this to accommodate short bursts without dropping packets. (Dropping the packets instead of buffering them would be worse, since in most cases the sender would then retransmit, further increasing the congestion.) The trouble is that DSL modems do no reordering within this large buffer. Packets are sent up the telephone line in the order they are received from the ethernet. A large upload keeps the buffer filled with its packets, and any new packet (the TCP ACK needed to continue a download, or the TCP data containing the ssh or telnet keystroke) must first fight for a space in the upload buffer, and once there wait a quarter second or more to be transmitted. Effectively, every packet sent up the DSL line has a non-negligible chance of being dropped, and if it isn't dropped, it is delayed by a quarter second or more.
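
The arithmetic above can be checked with a few lines of shell. This is only a sketch: it assumes that 128 kbit/sec means 128000 bits/sec and that 4K and 8K mean 4096 and 8192 bytes, and the variable names are made up for illustration. Shell arithmetic is integer-only, so results are rounded down.

```sh
#!/bin/sh
# Back-of-the-envelope numbers for a 128 kbit/sec uplink (assumed: 1 kbit = 1000 bits).
uplink_bps=128000        # upload bandwidth in bits/sec
packet_bytes=1500        # a typical large ethernet packet

# how many full-size packets fit up the line per second (integer division)
packets_per_sec=$(( uplink_bps / (packet_bytes * 8) ))

# how long a completely full modem FIFO takes to drain, in milliseconds
drain_ms_4k=$(( 4096 * 8 * 1000 / uplink_bps ))
drain_ms_8k=$(( 8192 * 8 * 1000 / uplink_bps ))

echo "large packets per second: $packets_per_sec"
echo "4K buffer drains in ${drain_ms_4k} ms, 8K buffer in ${drain_ms_8k} ms"
```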

How?

The fix is to shape the uplink's traffic before it gets into the DSL modem's upload buffer. And the tool is Linux, specifically Linux's IP forwarding and traffic shaping capabilities. The cleanest way to add this to your home network is to give the Linux box two NICs and place it between everything else and the DSL modem. However, if you are a sophisticated DSL user you probably already have a router and want the Linux box to be on the same network as the rest of the machines in the house. Apart from that, placing the Linux box between the DSL modem and the rest of the house means that whenever the Linux box is down, so is the internet connection.

The way I set things up is to place the traffic shaping Linux box in the same network as everything else, and have it send gratuitous ARP (GARP) packets claiming it is the gateway. This tricks the rest of the machines in the house into sending it their packets destined for the outside world for as long as they keep receiving the GARPs. If the Linux box is down, the GARPs stop coming, the ARP entries time out, the hosts send ARP requests and receive a reply from the real router. This way, if the Linux box is down, the internet connection fails over automatically.

With this GARP trick, packets being sent from anywhere in the house to an outside destination take an extra hop through the linux traffic shaper, while packets between hosts in the house and packets from outside into the house go directly to their destinations. This is exactly the desired effect.
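
One way to see which box a host currently believes is the gateway is to look at its neighbour (ARP) table. The sketch below assumes the host has the iproute `ip` tool and that `ip neigh show` prints lines of the form shown in the sample; the function name lladdr_of and the MAC address are made up for illustration.

```sh
#!/bin/sh
# Print the link-layer (MAC) address found in `ip neigh show` style output on stdin.
lladdr_of() {
    # the MAC is the field that follows the literal word "lladdr"
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

# example with a canned line, so this runs without touching the network
sample='192.168.168.1 dev eth0 lladdr 00:11:22:33:44:55 REACHABLE'
gateway_mac=$(printf '%s\n' "$sample" | lladdr_of)
echo "gateway 192.168.168.1 is at $gateway_mac"
```

On a real host you would run `ip neigh show 192.168.168.1 | lladdr_of` and compare the result with the MAC addresses of the shaper and of the router. While the GARPs are arriving it should be the shaper's address; some seconds after the shaper goes down it should be the router's.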

HOWTO

Install a 2.4.18 or better kernel, and the Debian dsniff and iproute packages (or the equivalent on other Linux distributions). The dsniff package supplies /usr/sbin/arpspoof, and the iproute package supplies /sbin/tc.

The traffic shaping HOWTOs are getting better, but I will still explain what is going on because it isn't completely obvious. I create a 3-band priority qdisc (prio), and then completely override its default TOS-based band assignment by adding my own filters. The 1st band receives in-house traffic. The 2nd band receives small and interactive (ssh) packets destined for outside the house, and the 3rd band receives the rest. Lastly, I attach a rate limiter (a token bucket filter, tbf) to the 3rd band to limit it to less than 128kbit/sec.
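
The band assignment just described can be modelled as a small shell function. This is only a toy model for reasoning about which band a packet lands in; the real classification is done in the kernel by the tc filters in the script. The function name band_for, its argument order and the example addresses are made up for illustration, and the multicast rule and the 128 byte threshold are taken from the script that follows.

```sh
#!/bin/sh
# Toy model of the filter order: prints the prio band (1, 2 or 3) for one packet.
# Arguments: destination IP, protocol, TCP port (source or destination),
# TOS byte, total IP length in bytes.
band_for() {
    dst=$1 proto=$2 port=$3 tos=$4 len=$5

    # band 1: in-house destinations (192.168.168.0/24) and multicast (224.0.0.0/4)
    case "$dst" in
        192.168.168.*) echo 1; return ;;
    esac
    first_octet=${dst%%.*}
    if [ "$first_octet" -ge 224 ] && [ "$first_octet" -le 239 ]; then
        echo 1; return
    fi

    # band 2: interactive ssh (TCP port 22 with TOS 0x10, "minimize delay")
    if [ "$proto" = tcp ] && [ "$port" -eq 22 ] && [ $(( $tos )) -eq 16 ]; then
        echo 2; return
    fi

    # band 2: small packets, i.e. total length below 128 bytes
    if [ $(( $len & 0xff80 )) -eq 0 ]; then
        echo 2; return
    fi

    # band 3: everything else (the bulk traffic that gets rate limited)
    echo 3
}

band_for 192.168.168.7 tcp 80 0x00 1500    # in-house traffic
band_for 203.0.113.5 tcp 22 0x10 1500      # interactive ssh
band_for 203.0.113.5 tcp 80 0x00 40        # bare TCP ACK
band_for 203.0.113.5 tcp 80 0x00 1500      # bulk upload
```

The order of the checks matters: an in-house packet is claimed by band 1 before its size or TOS is ever looked at, just as the lower-numbered filters are consulted first.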

The rate limiter is enough to prevent the upload buffer in the DSL modem from containing more than one large packet (the one currently being sent). This allows small packets from the 2nd band to get ahead of the large ones, and fixes the slowdown of downloads and interactive sessions caused by a large upload.
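
To put rough numbers on this: a small packet that arrives just behind one full-size packet only has to wait for that one packet to be sent. The sketch below assumes a 128000 bit/sec uplink, 1500 byte packets and the 99kbit limiter rate used in the script; the variable names are made up for illustration, and integer arithmetic rounds down.

```sh
#!/bin/sh
uplink_bps=128000     # assumed uplink speed in bits/sec
packet_bytes=1500     # one large packet
bulk_bps=99000        # rate allowed to the bulk band (99kbit in the script)

# worst-case wait behind a single large packet already in the modem, in ms
wait_one_packet_ms=$(( packet_bytes * 8 * 1000 / uplink_bps ))

# share of the uplink the bulk band may use, and what is left for small packets
bulk_percent=$(( bulk_bps * 100 / uplink_bps ))
headroom_bps=$(( uplink_bps - bulk_bps ))

echo "wait behind one large packet: about ${wait_one_packet_ms} ms"
echo "bulk band may use ${bulk_percent}% of the uplink, leaving ${headroom_bps} bit/sec"
```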

So without further ado, here is the script I use as part of my startup. The in-house network uses 192.168.168.0/24, and the DSL router is 192.168.168.1.

#!/bin/sh


case "$1" in
    start)
	# configure eth0 so that there is a bandwidth cap on large packets going up the DSL line
	# and then garp advertising the true gateway's IP, so that other hosts use us rather than it

	# enable ip forwarding
	echo 1 >/proc/sys/net/ipv4/ip_forward 

	# disable sending of icmp redirects (after all, we are deliberately causing the hosts to use us instead of the true gateway)
	echo 0 >/proc/sys/net/ipv4/conf/all/send_redirects
	echo 0 >/proc/sys/net/ipv4/conf/eth0/send_redirects

	# clear whatever is attached to eth0
	# this can fail if there is nothing attached, btw, but that is fine
	/sbin/tc qdisc del dev eth0 root 2>/dev/null

	# add default 3-band priority qdisc to eth0
	/sbin/tc qdisc add dev eth0 root handle 1: prio

	# add a <128kbit rate limit (matches DSL upstream bandwidth) with a very deep buffer to the bulk band (#3)
	# 99 kbit/s == 8 1500 byte packets/sec, so a latency of 5 sec means we will buffer up to 40 of these big
	# ones before dropping. a buffer of 1600 tokens means that at any time we are ready to burst one of
	# these big ones (at the peakrate, 120kbit/s). the mtu of 1518 instead of 1514 is in case I ever start
	# using vlan tagging, because if mtu is too low (like 1500) then all traffic blocks
	/sbin/tc qdisc add dev eth0 parent 1:3 handle 13: tbf rate 99kbit buffer 1600 peakrate 120kbit mtu 1518 mpu 64 latency 5000ms

	# add fifos to the other two bands so we can have some stats
	#/sbin/tc qdisc add dev eth0 parent 1:1 handle 11: pfifo
	#/sbin/tc qdisc add dev eth0 parent 1:2 handle 12: pfifo

	# add a filter so destination IPs within the house go to prio band #1 instead of being assigned by TOS
	# thus traffic going to an in-house location has top priority
	/sbin/tc filter add dev eth0 parent 1:0 prio 1 protocol ip u32 match ip dst 192.168.168.0/24 flowid 1:1

	# multicasts also go into band #1, since they are all inhouse (and we don't want to delay ntp packets and mess up time)
	/sbin/tc filter add dev eth0 parent 1:0 prio 1 protocol ip u32 match ip dst 224.0.0.0/4 flowid 1:1

	# interactive session (not scp) ssh packets go to band #2.
	# (scp sets the IP TOS/diffserv flags to indicate bulk traffic, which allow us to tell it apart from ssh)
        /sbin/tc filter add dev eth0 parent 1:0 prio 2 protocol ip u32 match ip protocol 6 0xff \
                                                                       match ip sport 22 0xffff \
                                                                       match ip tos 0x10 0xff \
                                                                       flowid 1:2
        /sbin/tc filter add dev eth0 parent 1:0 prio 2 protocol ip u32 match ip protocol 6 0xff \
                                                                       match ip dport 22 0xffff \
                                                                       match ip tos 0x10 0xff \
                                                                       flowid 1:2


	# small IP packets go to band #2
	# by small I mean <128 bytes in the IP datagram, or in other words, the upper 9 bits of the iph.tot_len are 0
	# note: this completely fails to do the right thing with fragmented packets. However
	# we happen to not have many (any? icmp maybe, but tcp?) fragmented packets going out the DSL line
	/sbin/tc filter add dev eth0 parent 1:0 prio 2 protocol ip u32 match u16 0x0000 0xff80 at 2 flowid 1:2

	# a final catch-all filter that redirects all remaining ip packets to band #3
	# presumably all that is left are large packets headed out the DSL line, which are
	# precisely those we wish to rate limit in order to keep them from filling the
	# DSL modem's uplink egress queue and keeping the shorter 'interactive' packets from
	# getting through
	# the dummy match is required to make the command parse
	/sbin/tc filter add dev eth0 parent 1:0 prio 3 protocol ip u32 match u8 0 0 at 0 flowid 1:3

	# have the rest of the house think we are the gateway
	# the reason I use arpspoofing is that I want automatic failover to the real gateway
	# should this machine go offline, and since the real gateway does not do vrrp, I hack
	# the network and steal its arp address instead
	# It takes 5-10 seconds for the failback to happen, but it works :-)
	/usr/sbin/arpspoof -i eth0 192.168.168.1 >/dev/null 2>&1 &
	echo $! >/var/run/shapedsl.arpspoof.pid
	;;
    stop)
	/sbin/tc qdisc del dev eth0 root 2>/dev/null
	if [ -r /var/run/shapedsl.arpspoof.pid ]; then
	  kill "$(cat /var/run/shapedsl.arpspoof.pid)"
	  rm /var/run/shapedsl.arpspoof.pid
	fi
	;;
    restart)
	$0 stop
	$0 start
	;;
    *)
	echo "Usage: $0 [start|stop|restart]"
	exit 1
	;;
esac

exit 0
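
To check that the shaper is actually doing something once the script has been started, tc can print the per-qdisc counters and the installed filters. A minimal sketch, assuming the same eth0 device and the 1: root handle used in the script; the function name shaper_status and the TC override variable are made up for illustration (setting TC=echo only prints the commands instead of running them).

```sh
#!/bin/sh
TC=${TC:-/sbin/tc}

shaper_status() {
    dev=${1:-eth0}
    # byte/packet/drop counters for the prio qdisc and the tbf under band 3
    $TC -s qdisc show dev "$dev"
    # the u32 filters attached to the prio qdisc
    $TC filter show dev "$dev" parent 1:
}
```

Run `shaper_status eth0` during a large upload: the counters of the tbf qdisc (handle 13:) should be climbing while interactive sessions stay responsive.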

PS

Newer kernels contain a nice qdisc that combines priority bands with rate limiting. Using it would let us put both the small and the large packets under a single <128kbit/sec rate limit. However, in practice just limiting the larger packets is enough.
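
One qdisc that fits this description is HTB (hierarchical token bucket); that it is the one meant here is an assumption, not something stated above. A minimal sketch of what such a setup might look like follows. The class ids, the 40/80 kbit split, the 120kbit cap and the 100mbit in-house rate are all made up for illustration, and the TC variable exists only so that the commands can be printed (TC=echo) instead of executed.

```sh
#!/bin/sh
TC=${TC:-/sbin/tc}

shape_htb() {
    dev=${1:-eth0}
    # root htb qdisc; anything no filter claims falls into the bulk class 1:30
    $TC qdisc add dev "$dev" root handle 1: htb default 30
    # in-house traffic leaves through the same NIC and must not be limited
    $TC class add dev "$dev" parent 1: classid 1:10 htb rate 100mbit
    # everything headed up the DSL line shares one cap below 128kbit
    $TC class add dev "$dev" parent 1: classid 1:2 htb rate 120kbit
    # small and interactive packets: offered bandwidth first, may borrow up to the cap
    $TC class add dev "$dev" parent 1:2 classid 1:20 htb rate 40kbit ceil 120kbit prio 0
    # bulk packets: lower priority, may also borrow up to the cap
    $TC class add dev "$dev" parent 1:2 classid 1:30 htb rate 80kbit ceil 120kbit prio 1
}
```

The u32 filters from the script would stay as they are except for their flowid, which would point at 1:10, 1:20 and 1:30 instead of the three prio bands.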

-- 
July 2003, Nicolas Dade <ndade@nsd.dyndns.org>; fixes by Matthias Flege