by Alessandro Rubini
Figure One: The data flow through insane (INterface SAmple for Network Errors), which simulates random packet loss, or intermittent network failure.
In the Linux (or Unix) world, most network interfaces, such as eth0 and ppp0, are associated with a physical device that is in charge of transmitting and receiving data packets. However, some logical network interfaces don't perform any physical packet transmission. The best-known examples of these "virtual" interfaces are the shaper and eql interfaces. This month, we'll look at how this kind of interface attaches to the kernel and to the packet-transmission mechanism.
From the kernel's point of view, a network interface is a software object that can process outgoing packets, with the actual transmission mechanism hidden inside the interface driver. Even though most interfaces are associated with physical devices (or, for the loopback interface, with a software-only data loop), it is possible to design network-interface drivers that rely on other interfaces to perform the actual packet transmission. A "virtual" interface is useful for implementing special-purpose processing on data packets without hacking the network subsystem of the kernel. Although some of what a virtual interface can accomplish is more easily implemented by writing a netfilter module, not everything can be implemented with netfilter, and the virtual interface is an additional tool for customizing network behavior.
To support this discussion with a real-world example, I wrote an insane (INterface SAmple for Network Errors) driver, available from ftp://ftp.linux.it/pub/People/Rubini/insane.tar.gz. The interface simulates semi-random packet loss or intermittent network failures. (This kind of functionality can be more easily accomplished with netfilter, and is shown here only to exemplify the related API.) The code fragments shown here are part of the insane driver and have been tested with Linux 2.3.42. While the following description is rather terse, the sample code is well-commented and tries to fill in some of the gaps left open by this quick tour of the topic.
How an Interface Plugs into the Kernel
Like many other kinds of device drivers, a network-interface module connects to the rest of Linux by registering its own data structure within the kernel. The insane driver, for example, registers itself by calling register_netdev(&insane_dev);.
The device structure being registered, insane_dev, is a struct net_device object (Linux 2.3.13 and earlier called it struct device), and it must feature at least two valid fields: the interface name and a pointer to its initialization function:
static struct net_device insane_dev = {
    name: "insane",
    init: insane_init,
};
The init callback is meant for internal use by the driver: It usually fills other fields of the data structure with pointers to device methods, the functions that perform the real work during the interface's lifetime. When an interface driver is linked into the kernel (instead of being loaded as a module), the first task of the init function is to check whether the interface hardware is there.
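The relationship between registration and the init callback can be modeled in plain user-space C. The following is only a sketch, not kernel code: every mock_ name is a hypothetical stand-in invented for illustration, and the registration function models a single step (calling init and propagating its result), assuming that both the name and init fields are required as described above.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct net_device. */
struct mock_net_device {
    const char *name;
    int (*init)(struct mock_net_device *dev);
    /* Device methods, left empty until init fills them in. */
    int (*open)(struct mock_net_device *dev);
    int (*stop)(struct mock_net_device *dev);
};

int mock_open(struct mock_net_device *dev) { (void)dev; return 0; }
int mock_stop(struct mock_net_device *dev) { (void)dev; return 0; }

/* Plays the role of insane_init: fill the method pointers. */
int mock_init(struct mock_net_device *dev)
{
    dev->open = mock_open;
    dev->stop = mock_stop;
    return 0;
}

/*
 * Models only one aspect of registration: the structure is handed over,
 * its init callback runs, and a failing init makes registration fail.
 */
int mock_register_netdev(struct mock_net_device *dev)
{
    if (dev == NULL || dev->name == NULL || dev->init == NULL)
        return -1;
    return dev->init(dev);
}
```

Starting from a structure that sets only name and init, a successful call to mock_register_netdev() leaves the open and stop pointers filled in, which mirrors how the two-field insane_dev declaration becomes a usable interface.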
The interface can be removed by calling unregister_netdev(), usually invoked by cleanup_module() (or not invoked at all if the driver is not modularized). The net_device structure includes, in addition to all the standardized fields, a "private" pointer (a void *) that is reserved for the driver's own use. Where virtual interfaces are concerned, the private field is the best place to store configuration information; Listing One shows how the insane sample interface follows the good practice of allocating its own priv structure at initialization time.
Listing One: insane Allocates Its Own priv Structure at Initialization
/* priv is used to host the statistics, and packet dropping policy */
dev->priv = kmalloc(sizeof(struct insane_private), GFP_USER);
if (!dev->priv) return -ENOMEM;
memset(dev->priv, 0, sizeof(struct insane_private));
The allocation is released at interface shutdown (i.e., when the module is removed from the kernel).
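The pairing of allocation at initialization and release at shutdown can be sketched in user space as follows. This is an illustrative model under stated assumptions, not the driver's code: the model_ names and the fields of the private structure are hypothetical, and calloc()/free() stand in for the kernel's kmalloc() plus memset() and kfree().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical private data: statistics plus a dropping policy. */
struct model_stats {
    unsigned long tx_packets;
    unsigned long tx_dropped;
};

struct model_private {
    struct model_stats priv_stats;
    int mode;                       /* hypothetical policy selector */
};

/* Only the field that matters here: the driver-owned pointer. */
struct model_device {
    void *priv;
};

/* Initialization: allocate a zeroed private area or report -ENOMEM. */
int model_init(struct model_device *dev)
{
    dev->priv = calloc(1, sizeof(struct model_private));
    if (!dev->priv)
        return -ENOMEM;
    return 0;
}

/* Shutdown: release the private area and clear the stale pointer. */
void model_cleanup(struct model_device *dev)
{
    free(dev->priv);
    dev->priv = NULL;
}
```

Zeroing the area at allocation time means the statistics counters and the policy start from a known state without further setup.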
Device Methods
A network-interface object, like most kernel objects, exports a list of methods so the rest of the kernel can use it. These methods are function pointers located in fields of the object data structure, here struct net_device.
An interface can be perfectly functional by exporting just a subset of all the methods; the recommended minimum subset includes open, stop (i.e., close), do_ioctl, and get_stats. These methods are directly related to system calls invoked by a user program (such as ifconfig). With the exception of ioctl, which needs some detailed discussion, their implementation is pretty trivial, and they turn out to be just a few lines of code (see Listing Two).
Listing Two: Exporting Methods
int insane_open(struct net_device *dev)
{
    dev->start = 1;
    MOD_INC_USE_COUNT;
    return 0;
}
int insane_close(struct net_device *dev)
{
    dev->start = 0;
    MOD_DEC_USE_COUNT;
    return 0;
}
struct net_device_stats *insane_get_stats(struct net_device *dev)
{
    return &((struct insane_private *)dev->priv)->priv_stats;
}
The open method is called when you run ifconfig insane up, and close is called by ifconfig insane down; get_stats returns a pointer to the local statistics structure and is used by ifconfig as well as by the /proc filesystem. The driver is responsible for filling in the statistics (although it may choose not to); the fields are defined in <linux/netdevice.h>.
Other methods are related to the low-level details of packet transmission, but they fall outside the scope of this discussion (although they are implemented in the source package). The only interesting low-level method is hard_start_xmit, which I discuss later.
ioctl
The do_ioctl call is the most important entry point for virtual interfaces. When a user program configures the behavior of the interface, it invokes the ioctl() system call. This is how shapecfg defines network shaping and how eql_enslave attaches real interfaces to the load-balancing interface eql. Similarly, the insanely application configures the insane behavior on the insane virtual interface. Unlike "normal" device drivers, such as char and block drivers, the implementation of ioctl for interfaces is pretty well-defined. The invoking file descriptor must be a socket, the only available commands are SIOCDEVPRIVATE through SIOCDEVPRIVATE+15, and the infamous "third argument" of the system call is always a struct ifreq * pointer instead of the generic void * pointer. These restrictions exist because socket ioctl commands span several logical layers and several protocols.
The predefined values are reserved for a device's private use and are unique throughout the protocol stack. Note that no other ioctl command will be delivered to the network-interface method, so you really cannot choose your own values. Passing a predefined data structure to ioctl doesn't necessarily limit the flexibility of interface configuration, however, since the ifreq structure includes the ifr_data field, a caddr_t value that can point to arbitrary configuration information.
Based on the information above, the insane interface can be controlled using these commands (defined in insane.h):