Unified Performance tool for networking (uperf)

Abstract

Unified Performance Tool, or uperf for short, is a network performance measurement tool that supports the execution of workload profiles.


Table of Contents

Introduction
Authors
Features
Using uperf
Getting uperf
Running uperf
Uperf profiles
Sample Profile
Explanation of profile
Statistics collected by uperf
Default uperf output
Frequently Asked Questions
Named connections
Using Fenxi with Uperf

Introduction

Microbenchmarks rarely measure real-world performance. This is especially true in networking, where applications can use multiple protocols, use different types of communication, interleave CPU processing with communication, and so on. However, popular microbenchmarks like iperf and netperf are quite simplistic: they support only one protocol at a time and only fixed-size messages, and they provide no way to interleave CPU processing with communication. Thus there is a need for a tool that more closely models real-world performance.

Uperf (Unified performance tool for networking) solves this problem by allowing the user to model a real-world application in a high-level description language (called a profile) and to run that model over the network. It supports multiple protocols, varying message sizes, a 1-to-many communication model, collection of CPU counter statistics, and much more.

Authors

uperf was developed by the Performance Availability Engineering group at Sun Microsystems. It was originally developed by Neelakanth Nadgir and Nitin Rammanavar. Jing Zhang added support for the uperf harness. Joy added SSL support, and Eric He ported it to Windows and is currently a core contributor. Charles Suresh, Alan Chiu, and Jan-Lung Sung have contributed significantly to its design and development.

Features

The following list is a short overview of some of the features supported by uperf:

  • Support modeling of workloads using profiles
  • Support for multiple protocols (new protocols can be easily integrated). Currently the following protocols are supported:
    • TCP
    • UDP
    • SCTP
    • SSL
  • 1-to-many host communication model
  • Support for CPU counters and lots of other detailed statistics
  • Fenxi integration (Cool graphs!).
  • Ability to choose whether to use processes or threads
  • Runs on Solaris, Linux, and Windows [1]

Using uperf

Getting uperf

uperf is open source software licensed under the GNU General Public License v2. You can download it from http://uperf.org. Binaries are available for Solaris and Linux.

Running uperf

uperf can be run as either a master (active) or a slave (passive). When run as the master, it needs the -m flag with a profile describing the test application.

Usage: /home/neel/projects/raptor/raptor1/framework/uperf [m:shgtTfkpaeE:vVx:d:]
	-m	 <profile>  Run uperf with this profile
	-s	 Slave
	-n	 Print no statistics
	-t	 Print thread statistics
	-T	 Print transaction statistics
	-f	 Print flowop statistics
	-g	 Print ThreadGroup statistics
	-k	 Collect kstat statistics
	-p	 Collect /proc stats for flowops (See NOTE)
	-e	 Collect default CPU counters for flowops (See NOTE)
	-E	 <event1,event2> Collect CPU counters for flowops (See NOTE)
	-a	 Print all statistics
	-v	 Verbose
	-V	 Version
	-h	 Print usage
	-x	 Print in xanadu format to file
	-X	 collect response times (in xanadu format)
	-i	 <interval> collect throughput every <interval>

NOTE: -p/-e/-E work only if -f is also used

        

uperf comes bundled with quite a few sample profiles in the workloads directory. You can tweak them to suit your needs or write your own. Several of these profiles pick up values (like the remote host or the protocol) from the environment. Such variables begin with a $ sign in the profile. You can either set them in the environment (for example, export h=192.168.1.4) or hardcode them in the profile.

The profiles bundled with uperf are as follows.

netperf.xml

This profile represents request-response traffic. One thread on the master writes 90 bytes of data to the slave and then reads 90 bytes back. The remote end (slave) address is specified via the $h environment variable, and $proto specifies the protocol to be used.

iperf.xml

In this profile, multiple threads simulate one-way traffic (8k-sized messages) between two hosts (similar to the iperf networking tool) for 30 seconds. $h specifies the remote host, $proto specifies the protocol, and $nthr specifies the number of threads.
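
A profile of this kind might look roughly like the following sketch. This is a hand-written illustration based on the flowop descriptions later in this document; the bundled iperf.xml may differ in its details.

```xml
<?xml version="1.0"?>
<profile name="iperf">
  <group nthreads="$nthr">
        <transaction iterations="1">
            <flowop type="connect" options="remotehost=$h protocol=$proto"/>
        </transaction>
        <transaction duration="30s">
            <!-- One-way bulk traffic: the master writes 8k messages; the slave reads them -->
            <flowop type="write" options="size=8k"/>
        </transaction>
        <transaction iterations="1">
            <flowop type="disconnect"/>
        </transaction>
  </group>
</profile>
```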

connect.xml

In this profile, multiple threads repeatedly connect to and disconnect from the remote host. This can be used to measure connection setup performance. $nthr specifies the number of threads, and $iter determines the number of connects and disconnects each thread will perform.
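
A sketch of such a connect/disconnect profile might look like this (illustrative only; the bundled connect.xml may differ):

```xml
<?xml version="1.0"?>
<profile name="connect">
  <group nthreads="$nthr">
        <transaction iterations="$iter">
            <flowop type="connect" options="remotehost=$h protocol=tcp"/>
            <flowop type="disconnect"/>
        </transaction>
  </group>
</profile>
```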

two-hosts.xml

This profile demonstrates an application in which each thread opens one connection to each of two hosts, then reads 200 bytes from the first connection and writes them to the second.
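
Such a profile might be sketched as follows, using the named connections (conn) described later in this document. The variable names $h1 and $h2 are hypothetical here; the bundled two-hosts.xml may use different names.

```xml
<?xml version="1.0"?>
<profile name="two-hosts">
  <group nthreads="1">
        <transaction iterations="1">
            <flowop type="connect" options="conn=1 remotehost=$h1 protocol=tcp"/>
            <flowop type="connect" options="conn=2 remotehost=$h2 protocol=tcp"/>
        </transaction>
        <transaction duration="30s">
            <!-- Read 200 bytes from the first host, write them to the second -->
            <flowop type="read" options="conn=1 size=200"/>
            <flowop type="write" options="conn=2 size=200"/>
        </transaction>
        <transaction iterations="1">
            <flowop type="disconnect" options="conn=1"/>
            <flowop type="disconnect" options="conn=2"/>
        </transaction>
  </group>
</profile>
```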

Uperf profiles

uperf is based on the idea that you can describe your application or workload in very general terms and the framework will run that workload for you. For example, if you are familiar with netperf or request-response microbenchmarks, this general description would be "each thread sends 100 bytes and receives 100 bytes using UDP". For a more complex application, we may also have to specify the number of connections, the number of threads, whether all threads perform the same kind of operation, which protocols are used, whether the traffic is bursty, and so on. As you can see, this gets quite complicated for any real-world application. uperf defines a language to specify all of this information in a machine-understandable format (XML) called a profile. uperf then parses the profile and runs whatever it specifies. The user has to specify the profile for the master only; uperf automatically transforms the profile for the slaves and uses it.

The profile needs to be a valid XML file. Variables that begin with a '$' are picked up from the ENVIRONMENT.

Sample Profile

A sample profile for the request-response microbenchmark is shown below.

<?xml version="1.0"?>
<profile name="netperf">
  <group nthreads="1">
        <transaction iterations="1">
            <flowop type="connect" options="remotehost=$h protocol=$proto wndsz=50k tcp_nodelay"/>
        </transaction>
        <transaction duration="30s">
            <flowop type="write" options="size=90"/>
            <flowop type="read" options="size=90"/>
        </transaction>
        <transaction iterations="1">
            <flowop type="disconnect" />
        </transaction>
  </group>
</profile>

        

Explanation of profile

Every profile begins with an XML header that declares it to be an XML file. A profile has a name; this identifies the profile and is not otherwise used by uperf. The major parts of a profile are

  • group
  • transaction
  • flowop

Let's look at each of these in detail.

Group

A profile can have multiple groups. A group is a collection of threads or processes that execute transactions contained in that group.

Transaction

A transaction is a unit of work. Transactions have either an iteration count or a duration associated with them. If <transaction iterations="1000"> is specified, the contents of the transaction are executed 1000 times. If <transaction duration="30s"> is specified, the contents of the transaction are executed for 30 seconds. By default, a transaction executes its contents only once. All threads or processes start executing transactions at the same time.
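
For example, the two forms can be written as:

```xml
<!-- Execute the enclosed flowops exactly 1000 times -->
<transaction iterations="1000">
    <flowop type="write" options="size=64"/>
</transaction>

<!-- Execute the enclosed flowops repeatedly for 30 seconds -->
<transaction duration="30s">
    <flowop type="write" options="size=64"/>
</transaction>
```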

Flowop

The contents of a transaction are called flowops. These basic operations (building blocks) are used to define a workload. The currently supported flowops are
  • connect
  • accept
  • disconnect
  • read
  • write
  • recv
  • sendto
  • sendfile
  • sendfilev
  • NOP
  • think

Every flowop has a set of options. In the XML file, these are space-separated. The supported options are listed below.

Common options

count: The number of times this flowop will be executed.
duration: The amount of time this flowop will be executed, for example duration=100ms. This option will no longer be supported in future versions of uperf; specify the duration on the transaction instead.
rate: Experimental. This option causes uperf to execute this flowop at the specified rate for the given iterations or duration.

Connect/Accept

The connect flowop specifies that a connection needs to be opened. The options parameter specifies more details regarding the connection. The following keys are supported:

remotehost: The remote host to connect to or accept a connection from.
protocol: The protocol used to connect to the remote host. Valid values are tcp, udp, ssl, and sctp.
tcp_nodelay: Controls whether TCP_NODELAY is set or not.
wndsz: The size of the socket send and receive buffers. This parameter is used to set the SO_SNDBUF and SO_RCVBUF socket options using setsockopt().
engine: The SSL engine to use.

Read, Write, Sendto and Recv flowops

size: The amount of data that is either read or written. uperf supports the exchange of
  • Fixed-size messages
  • Asymmetrical-size messages
  • Random-size messages
For fixed-size messages, the master and all slaves use the same fixed size for receives and transmits. For asymmetrical-size messages, the slaves use the size specified by the rsize parameter, while the master uses the size parameter. For random-size messages, a uniformly distributed value between the user-specified minimum and maximum is used by the transmitting side, and the receiving side uses the maximum as the message size. Example: size=64k or size=rand(4k,8k).
rsize: See the description of asymmetrical messages above.
canfail: Indicates that a failure of this flowop will not cause uperf to abort. This is especially useful with UDP, where a packet drop does not constitute a fatal error. It can also be used, for example, to test a SYN flood attack (threads connect() repeatedly, ignoring errors).
non_blocking: Use non-blocking I/O. The socket/file descriptor has the O_NONBLOCK flag set.
poll_timeout: If this option is set, the thread will first poll for the specified duration before trying to carry out the operation. A poll timeout is returned to uperf as an error.
conn: Every open connection is assigned a connection name. Currently, the name can be any valid integer; uperf could take a string in the future. conn identifies the connection to use with this flowop. Connection names are thread private.
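
As an illustration, the size-related options above might be used like this (a sketch; the sizes are arbitrary values):

```xml
<!-- Fixed-size messages: both sides use 64k -->
<flowop type="write" options="size=64k"/>

<!-- Asymmetrical sizes: the master writes 90 bytes; the slave uses rsize -->
<flowop type="write" options="size=90 rsize=4k"/>

<!-- Random size: the sender picks uniformly between 4k and 8k; the receiver uses 8k -->
<flowop type="write" options="size=rand(4k,8k)"/>
```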

Sendfile and Sendfilev flowops

The sendfile flowop uses the sendfile(3EXT) function to transfer a single file. The sendfilev flowop transfers a set of files using the sendfilev(3EXT) interface. Multiple files are randomly picked from all transferable files (see dir below) and transferred to the slave.

dir: This parameter identifies the directory from which files will be transferred. The directory is searched recursively to generate a list of all readable files. Example: dir=/space
nfiles: This parameter identifies the number of files that will be transferred with each call to the sendfilev(3EXT) function; it is used as the third argument to sendfilev(3EXT). nfiles is assumed to be 1 for the sendfile flowop. Example: nfiles=10
size: This parameter identifies the chunk size for the transfer. Instead of sending the whole file at once, uperf will send size-sized chunks one at a time. This is used only if nfiles=1.
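
For example, sendfilev flowops using these parameters might be sketched as follows (dir=/space is the example directory from the text; the other values are arbitrary):

```xml
<!-- Transfer 10 randomly chosen files from /space per sendfilev(3EXT) call -->
<flowop type="sendfilev" options="dir=/space nfiles=10"/>

<!-- Transfer a single file in 8k chunks -->
<flowop type="sendfilev" options="dir=/space nfiles=1 size=8k"/>
```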

Statistics collected by uperf

uperf collects a wide variety of statistics. By default, uperf prints the throughput every second while the test is running, and then prints the total throughput. It also prints network statistics, calculated independently from system statistics, to verify the throughput reported by uperf, as well as statistics from all hosts involved in the test to validate the output.

Some of the statistics collected by uperf are listed below

  • Throughput
  • Latency
  • Group Statistics
  • Per-Thread statistics
  • Transaction Statistics
  • Flowops Statistics
  • Netstat Statistics
  • Per-second Throughput

Default uperf output

bash$ ./framework/uperf  -m netperf.xml  -a -e -p
Starting 4 threads running profile:netperf ...   0.01 seconds
Txn0           0B/1.01   (s) =        0b/s           3txn/s     254.89ms/txn
Txn1     195.31MB/30.30  (s) =   54.07Mb/s       13201txn/s       2.30ms/txn
Txn2           0B/0.00   (s) =        0b/s
--------------------------------------------------------------------------------
netperf       195.31MB/32.31(s) =   50.70Mb/s (CPU 21.42s)

Section: Group details
--------------------------------------------------------------------------------      
         Elapsed(s)   CPU(s)       DataTx             Throughput
Group0   32.31        21.40        195.31M            50.70M

Group 0 Thread details
--------------------------------------------------------------------------------
Thread   Elapsed(s)   CPU(s)       DataTx             Throughput
0        32.31        5.30         48.83M             12.68M
1        32.31        5.31         48.83M             12.68M
2        32.31        5.44         48.83M             12.68M
3        32.31        5.36         48.83M             12.68M

Group 0 Txn details
--------------------------------------------------------------------------------
Txn  Avg(ms)    CPU(ms)    Min(ms)    Max(ms)
0    5.45       0.51       5.37       5.68
1    0.29       0.00       0.23       408.63
2    0.32       0.16       0.07       0.81

Group 0 Flowop details (ms/Flowop)
--------------------------------------------------------------------------------
Flowop       Avg(ms)  CPU(ms)  Min(ms)  Max(ms) 
Connect      5.41     0.49     5.31     5.66   
Write        0.02     0.00     0.01     0.53  
Read         0.25     0.00     0.05     408.59
Disconnect   0.30     0.14     0.06     0.79 

Netstat statistics for this run
--------------------------------------------------------------------------------
Nic       opkts/s     ipkts/s     obits/s     ibits/s
ce0         12380       12391      30.68M      30.70M
ce1             0           0           0      84.67
--------------------------------------------------------------------------------
Waiting to exchange stats with slave[s]...
Error Statistics
--------------------------------------------------------------------------------
Slave           Total(s)     DataTx   Throughput   Operations      Error %
192.9.96.101       32.25   195.31MB    50.80Mbps       800008        0.00

Master             32.31   195.31MB    50.70Mbps       800008        0.00
--------------------------------------------------------------------------------
Difference(%)      0.20%      0.00%       -0.20%        0.00%        0.00%


      

Frequently Asked Questions

Q: What is the history behind uperf?
Q: Where can I submit bugs/feedback?
Q: How do I specify which interface to use?
Q: Does the use of -a affect performance?
Q: Does uperf support socket autotuning on Linux?
Q: Where can I get the uperf harness?
Q: Why do you even have a -n option?
Q: Why do we have an option to do sendfilev with chunks?

Q:

What is the history behind uperf?

A:

uperf was developed by the Performance Availability Engineering group at Sun Microsystems circa 2004. It was originally inspired by Filebench, and developed by Neelakanth Nadgir and Nitin Rammanavar.

Q:

Where can I submit bugs/feedback?

A:

Until we have something better, please email the authors.

Q:

How do I specify which interface to use?

A:

uperf only specifies the host to connect to; it is up to the OS to determine which interface to use. You can change the default interface to that host by changing the routing tables.

Q:

Does the use of -a affect performance?

A:

Since -a collects all kinds of statistical information, there is a measurable impact when the flowop is lightweight (for example, UDP transmits of small packets).

Q:

Does uperf support socket autotuning on Linux?

A:

uperf calls setsockopt() only when a window size (wndsz) is specified in the profile. Setting the socket buffer sizes explicitly disables autotuning on Linux, so you cannot test autotuning with wndsz set. If no wndsz is specified, setsockopt() is not called by uperf, and autotuning remains enabled on Linux.

Q:

Where can I get the uperf harness?

A:

The harness is not open source, although if there is sufficient interest, we would definitely consider open-sourcing it. For more details, please contact Jing Zhang.

Q:

Why do you even have a -n option?

A:

uperf uses a global variable to count the number of bytes transferred. It is updated using the atomic_add_64() function. However, if you have thousands of threads, there is a very high likelihood that many threads will update this value simultaneously, causing higher CPU utilization. The -n option helps in this case.

Q:

Why do we have an option to do sendfilev with chunks?

A:

Pallab identified an issue where chunked sendfilev calls were faster than transferring the whole file in one go. This option helps debug that issue.

Named connections

uperf supports named connections. To specify a name, add a conn=X variable to the options of a connect or accept flowop. For example, <flowop type="connect" options="conn=2 remotehost=$h protocol=tcp"/>

If a name is not specified, the connection is an anonymous connection. For any flowop, if a connection is not specified, it uses the first anonymous connection.

Using Fenxi with Uperf

Uperf can generate data that can be post-processed by Fenxi. To use this feature, use the -x option of uperf. The output should be stored in a file whose name has the uperf prefix. For example:

$ uperf -m iperf.xml -x > uperf-iperf.out
$ fenxi process uperf-iperf.out outdir iperf
        

The processed output is now stored in outdir.



[1] Currently unsupported