Linux networking sucks. XEN Networking sucks.

Yes, it does. A while ago I started to move stuff away from physical machines onto XEN-virtualized servers. Using CentOS, this works reasonably well, except…
… well, when you use the regular XEN network-bridge and vif-bridge scripts to set up and tear down your virtual interfaces, other services suddenly start to scream:

Jan 24 19:45:33 shirley kernel: xenbr0: port 1(vif0.0) entering forwarding state
Jan 24 19:45:33 shirley ntpd[4806]: sendto(192.53.103.104) (fd=18): Network is unreachable
Jan 24 19:45:33 shirley ntpd[4806]: sendto(131.188.3.221) (fd=18): Network is unreachable

Great. The bridge setup changed the network topology, and ntpd (which is bound to eth0 for multicast) doesn’t get it. But wait, there is more.

Jan 28 22:47:04 plucky kernel: xenbr0: port 2(peth0) entering forwarding state
Jan 28 22:47:04 plucky snmpd[2278]: error on subcontainer ‘’ insert (-1)
Jan 28 22:47:04 plucky last message repeated 12 times

So snmpd chokes on topology changes?! But hey, who would run stuff like time sync and network monitoring on a production system? Well, just about everyone.

The whole XEN network setup and teardown magic is pretty much only useful for your little boot-it-every-day box at home, not for serious production needs. (I’m sure lots of people will now scream in anger that their production boxes work perfectly with the scripts. If you a) run snmpd and ntpd, b) check your log files on a regular basis and c) don’t find the bugs described above, please let me know. Otherwise, don’t. Without time sync and monitoring, you are not running in production mode.)
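If you want to check your own logs for the two symptoms shown above, a rough Python sketch could look like this. The patterns are lifted straight from the log excerpts in this post; the function name `find_bridge_fallout` is made up for illustration, and other ntpd or snmpd versions may word their messages differently.

```python
import re

# Patterns derived from the two log excerpts above (assumption: your
# ntpd/snmpd versions log the same wording).
PATTERNS = {
    "ntpd": re.compile(r"ntpd\[\d+\]: sendto\(.*\).*Network is unreachable"),
    "snmpd": re.compile(r"snmpd\[\d+\]: error on subcontainer"),
}


def find_bridge_fallout(lines):
    """Return (service, line) pairs for log lines matching a known symptom."""
    hits = []
    for line in lines:
        for service, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((service, line.rstrip("\n")))
    return hits


# Small demo on lines copied from the excerpts above.
sample = [
    "Jan 24 19:45:33 shirley kernel: xenbr0: port 1(vif0.0) entering forwarding state",
    "Jan 24 19:45:33 shirley ntpd[4806]: sendto(192.53.103.104) (fd=18): Network is unreachable",
]
for service, line in find_bridge_fallout(sample):
    print(service, "->", line)
```

To scan a real log, pass an open file object (on CentOS usually /var/log/messages) instead of the sample list.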

So what to do? Don’t let XEN set up the networking. Let the OS do it!

Because it has a marginally better understanding of what it is doing. And the scripts are written by people who are paid to think about more complex setups than the one-person, one-interface PC at home.

Set up a bridge for dom0: /etc/sysconfig/network-scripts/ifcfg-br0:

DEVICE=br0
ONBOOT=yes
BOOTPROTO=none
IPADDR=1.2.3.4
NETMASK=255.255.255.0
GATEWAY=1.2.3.1
NO_ALIASROUTING=yes
TYPE=Bridge

Now connect your physical interface to the bridge: /etc/sysconfig/network-scripts/ifcfg-eth0:

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
BRIDGE=br0
HWADDR=22:44:66:88:aa:cc

The last line should be the HW address of your physical interface. (If you don’t know what this is, please use the XEN supplied networking scripts and don’t bother…)
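Typos in these two files are easy to make and annoying to debug, so here is a small Python sketch that sanity-checks the pair. The helper names `parse_ifcfg` and `check_bridge_pair` are invented for this example, and the checks only cover the settings used in this post; treat it as a starting point, not a validator for every ifcfg option.

```python
def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg file into a dict (comments and blanks skipped)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_bridge_pair(bridge_cfg, nic_cfg):
    """Return a list of problems found in a bridge/NIC ifcfg pair (empty if none)."""
    problems = []
    if bridge_cfg.get("TYPE") != "Bridge":
        problems.append("bridge file lacks TYPE=Bridge")
    if nic_cfg.get("BRIDGE") != bridge_cfg.get("DEVICE"):
        problems.append("NIC BRIDGE= does not name the bridge DEVICE")
    if "IPADDR" in nic_cfg:
        problems.append("NIC still carries an IPADDR; it belongs on the bridge")
    return problems


# Demo with the two files from this post.
BR0 = """DEVICE=br0
ONBOOT=yes
BOOTPROTO=none
IPADDR=1.2.3.4
NETMASK=255.255.255.0
GATEWAY=1.2.3.1
NO_ALIASROUTING=yes
TYPE=Bridge
"""
ETH0 = """DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
BRIDGE=br0
HWADDR=22:44:66:88:aa:cc
"""
print(check_bridge_pair(parse_ifcfg(BR0), parse_ifcfg(ETH0)))
```

For real use, read the two files from /etc/sysconfig/network-scripts/ and feed their contents to `parse_ifcfg`.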

You must enable IP forwarding on your box. Add the following line to /etc/sysctl.conf:

net.ipv4.ip_forward = 1

Poof! Instant, stable, bridged networking when booting up.
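If you want to double-check that the sysctl.conf entry is really there (and not commented out), something like the following Python sketch would do. The function name `ip_forward_enabled` is hypothetical, and the parser only understands simple `key = value` lines with `#` or `;` comment lines.

```python
def ip_forward_enabled(sysctl_text):
    """Return True if the last net.ipv4.ip_forward assignment in the text is 1."""
    enabled = False
    for raw in sysctl_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines, and anything that is not an assignment.
        if not line or line[0] in "#;" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "net.ipv4.ip_forward":
            enabled = value == "1"
    return enabled


# Demo on the line from this post.
print(ip_forward_enabled("net.ipv4.ip_forward = 1\n"))
```

This only inspects the configuration file. The value the running kernel actually uses can be read from /proc/sys/net/ipv4/ip_forward.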

However, starting xend produces a set of ugly veth / vif0.x pairs, because it unconditionally loads the netloop module, which in turn creates four pairs by default unless you explicitly disable this. (Four? Why is it always four? Why not a nice round number like, say, zero?) Save yourself the googling. Just add

options netloop nloopbacks=0

to your /etc/modprobe.conf file. Do it. As long as the module is loaded, XEN is able to create its needed netloops on the fly. You really don’t need to have four more dangling around.
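To verify the option actually made it into the file, here is a minimal Python sketch. `netloop_nloopbacks` is a made-up helper name, and the parser assumes the plain `options <module> <name>=<value>` layout shown above.

```python
def netloop_nloopbacks(modprobe_text):
    """Return the nloopbacks value configured for netloop, or None if it is not set."""
    value = None
    for raw in modprobe_text.splitlines():
        # Drop trailing comments, then split the line into whitespace-separated fields.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "options" and fields[1] == "netloop":
            for option in fields[2:]:
                name, _, setting = option.partition("=")
                if name == "nloopbacks":
                    value = setting
    return value


# Demo on the line from this post.
print(netloop_nloopbacks("options netloop nloopbacks=0\n"))
```

A return value of `None` means the module will fall back to its default number of loopback pairs.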

Almost there. Now edit your virtual host definitions in /etc/xen/auto and change the bridge name in the vif= lines from xenbr0 to br0. For example:

vif = [ 'mac=ee:cc:aa:88:66:44, bridge=br0', ]
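With more than a handful of guests, editing every file by hand gets old. A rough Python sketch for the rename, assuming each vif definition sits on a single line as in the example above (multi-line vif lists are not handled), and with `switch_bridge` being a name invented here:

```python
import re


def switch_bridge(config_text, old="xenbr0", new="br0"):
    """Replace bridge=<old> with bridge=<new> on lines that start a vif assignment."""
    pattern = re.compile(r"(bridge\s*=\s*)" + re.escape(old) + r"\b")
    out = []
    for line in config_text.splitlines(keepends=True):
        if line.lstrip().startswith("vif"):
            # Keep the "bridge=" prefix as written and swap only the bridge name.
            line = pattern.sub(lambda match: match.group(1) + new, line)
        out.append(line)
    return "".join(out)


# Demo on a line in the style of this post.
print(switch_bridge("vif = [ 'mac=ee:cc:aa:88:66:44, bridge=xenbr0', ]\n"), end="")
```

The function only returns the new text. Writing it back to the files under /etc/xen/auto (after making a backup) is left to you.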

And finally, open the /etc/xen/xend-config.sxp file and comment out all lines beginning with (network-script …). Really. You don’t need these any longer. Because the networking is already set up correctly. Keep the (vif-script vif-bridge) line.
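The same edit as a small Python sketch, in case you have to do it on several hosts. It assumes `#` is the comment character in xend-config.sxp (as in the stock file) and that each (network-script …) entry starts on its own line; `disable_network_scripts` is a hypothetical helper name.

```python
def disable_network_scripts(sxp_text):
    """Prefix every active (network-script ...) line with '#'; leave other lines alone."""
    out = []
    for line in sxp_text.splitlines(keepends=True):
        if line.lstrip().startswith("(network-script"):
            line = "#" + line
        out.append(line)
    return "".join(out)


# Demo: the network-script line gets commented out, the vif-script line stays.
sample = "(network-script network-bridge)\n(vif-script vif-bridge)\n"
print(disable_network_scripts(sample), end="")
```

Lines that are already commented out are left untouched, so running it twice does no harm.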

==> Working, stable and reliable networking with XEN. And even snmpd and ntpd are happy.

2 February 2008 | Netstuff | Comments

5 Responses to “Linux networking sucks. XEN Networking sucks.”

  1. Henning Sprang, 16 February 2008 @ 18:48

    Yes, the default xen network scripts are truly crappy code. And that goes for the vif-scripts as well.

    The only downside I ran into when using my own network scripts / letting the OS network config set up my bridge: if you happen to do something like /etc/init.d/networking restart (on Debian…), your bridges come up again, but all the running guest domains are no longer connected to the bridge.
    But if you do run networking restart on a heavily used production machine, you should know what you are doing :)

  2. Jan Marquardt, 8 March 2008 @ 8:58

    Thanks a lot for this information. We’ve encountered some weird problems with the default Xen networking and were able to fix them by letting the OS do the bridging stuff. In the next few days I will write an entry in my blog on how to do the bridging under Debian.
    Cheers

  3. KS, 30 March 2008 @ 20:04

    I followed that explanation for setting up br0 and eth0 on CentOS 5.1, but it definitely does not work for me. eth0 comes up with no IP address (expected), but br0 doesn’t come up. The output of service network start shows the line about starting loopback (OK), then the next line says “does not seem to be present, delaying initialization - failed” (note, it doesn’t even give a name!) Then it brings up eth0 (OK). That’s it. When I stop it I see this in messages br0: port1(eth0) entering disabled state. No message for bringing it up.

  4. Wolfgang, 29 April 2008 @ 18:26

    Hello,

    thanks for that article, it looks like you already did what I was searching for.
    However, I realized that on Ubuntu the configuration via /etc/modprobe.conf does not work. To get rid of the annoying vethX/vifX pairs you need to pass the option
    netloop.nloopbacks=0 to the kernel.

  5. Henning Schmiedehausen, 30 April 2008 @ 15:41

    Ubuntu seems to have the netloop module compiled into the kernel, so you need to pass the option at boot time. Thanks for the information, I am truly a RedHat person. :-)


(C) 2005-2007 Henning Schmiedehausen