Contact: [email protected]
System Tuning Info for Linux Servers

This page is about optimizing and tuning Linux-based systems for server-oriented tasks. Most of the info presented here I've used myself, and have found to be beneficial. I've tried to avoid the well-trodden ground (hdparm, turning off hostname lookups in apache, etc.) as that info is easy to find elsewhere.

Contents: Disk Tuning, File System Tuning, SCSI Tuning, Network Interface Tuning, TCP Tuning, File limits, Process limits, Threads, Apache and other web servers, Samba, OpenLDAP tuning, SysV shm, Benchmarks, System Monitoring, Utilities, System Tuning Links, Music, TODO

File and Disk Tuning

Benchmark performance is often heavily based on disk I/O performance, so getting as much disk I/O throughput as possible is the real key. Depending on the array, the disks used, and the controller, you may want to try software RAID. It is tough to beat software RAID performance on a modern CPU with a fast disk controller.

The easiest way to configure software RAID is to do it during the install. If you use the GUI installer, there are options in the disk partition screen to create an "md" (multiple-device) partition, Linux-speak for a software RAID partition. You will need to make partitions of type "linux raid" on each of the drives, and then after creating all these partitions, create a new partition, say "/test", and select md as its type. Then you can select all the partitions that should be part of it, as well as the RAID type. For pure performance, RAID 0 is the way to go. Note that by default, I believe you are limited to 12 drives in an md device, so you may be limited to that. If the drives are fast enough, that should be sufficient to get >100 MB/s pretty consistently.

File System Tuning

Some of the default kernel parameters for system performance are geared more towards workstation performance than file server/large disk I/O types of operations.
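These parameters are runtime tunables exposed under /proc/sys/vm; a quick look at that directory shows what your particular kernel lets you adjust (the exact set of files varies between kernel versions):

```shell
# The VM tunables discussed below live here as plain files you can
# cat and echo to; bdflush, freepages, etc. come and go between
# kernel versions, so check what your kernel actually exposes.
ls /proc/sys/vm
```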
The most important of these is the "bdflush" value in /proc/sys/vm/bdflush. These values are documented in detail in /usr/src/linux/Documentation/sysctl/vm.txt. A good set of values for this type of server is:

echo 100 5000 640 2560 150 30000 5000 1884 2 > /proc/sys/vm/bdflush

(You change these values by just echoing the new values to the file. This takes effect immediately. However, it needs to be reinitialized at each kernel boot. The simplest way to do this is to put the command at the end of /etc/rc.d/rc.local.)

Also, for pure file server applications like web and samba servers, you probably want to disable the "atime" option on the filesystem. This disables updating the "atime" value for a file, which indicates the last time the file was accessed. Since this info isn't very useful in this situation, and causes extra disk hits, it's typically disabled. To do this, just edit /etc/fstab and add "noatime" as a mount option for the filesystem. For example:

/dev/rd/c0d0p3 /test ext2 noatime 1 2

With these file system options, a good RAID setup, and the bdflush values, filesystem performance should be sufficient.

SCSI Tuning

SCSI tuning is highly dependent on the particular SCSI cards and drives in question. The most effective variable when it comes to SCSI card performance is tagged command queueing. For the Adaptec aic7xxx series cards (2940's, 7890's, *160's, etc.) this can be enabled with a module option like:

aic7xxx=tag_info:{{0,0,0,0}}

This enables the default tagged command queueing on the first 4 SCSI IDs of the first device. Adding:

options aic7xxx aic7xxx=tag_info:{{24,24,24,24,24,24}}

to /etc/conf.modules will set the TCQ depth to 24. You probably want to check the driver documentation for your particular SCSI modules for more info.

Network Interface Tuning

Most benchmarks benefit heavily from making sure the NICs in use are well supported, with a well-written driver. Examples include the eepro100, the tulips, newish 3Com cards, and the AceNIC and SysKonnect gigabit cards.
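A quick sanity check that the kernel actually found your NICs (and thus that a driver bound to them) is to look at /proc/net/dev; an interface missing from this list usually means the driver module never loaded:

```shell
# Every interface the kernel knows about shows up here with its
# packet counters; "lo" is the loopback and is always present.
cat /proc/net/dev
```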
Making sure the cards are running in full duplex mode is also very often critical to benchmark performance. Depending on the networking hardware used, some of the cards may not autosense properly and may not run full duplex by default. Many cards include module options that can be used to force the cards into full duplex mode. Some examples for common cards:

alias eth0 eepro100
options eepro100 full_duplex=1
alias eth1 tulip
options tulip full_duplex=1

Though full duplex gives the best overall performance, I've seen some circumstances where setting the cards to half duplex will actually increase throughput, particularly in cases where the data flow is heavily one-sided. If you think you're in a situation where that may help, I would suggest trying it and benchmarking it.

TCP Tuning

For servers that are serving up huge numbers of concurrent sessions, there are some TCP options that should probably be enabled. With a large number of clients doing their best to kill the server, it's probably not uncommon for the server to have 20000 or more open sockets. In order to optimize TCP performance for this situation, I would suggest tuning the following parameters:

echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range

This allows more local ports to be available for incoming connections. Generally not an issue, but in a benchmarking scenario you often need more ports available.

echo 0 > /proc/sys/net/ipv4/tcp_sack
echo 0 > /proc/sys/net/ipv4/tcp_timestamps

These reduce the amount of work the TCP stack has to do, so they are often helpful in this situation.

File Limits and the Like

Open TCP sockets and things like apache are prone to opening a large number of file descriptors. The default number of available FDs is 4096, but this may need to be upped for this scenario. The theoretical limit is roughly a million file descriptors, though I've never been able to get close to that many open. I'd suggest doubling the default, and trying the test.
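Before upping anything, it's worth checking how many file handles are actually in use versus the limit. A sketch that reads /proc/sys/fs/file-nr (the meaning of the middle field varies between kernel versions, but the first field is allocated handles and the last is the system-wide maximum):

```shell
# Read the three file-handle counters the kernel exports and
# report how much headroom is left before hitting file-max.
read ALLOCATED MIDDLE MAXIMUM < /proc/sys/fs/file-nr
echo "allocated=$ALLOCATED max=$MAXIMUM"
```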
If you still run out of file descriptors, double it again. For example:

echo 128000 > /proc/sys/fs/inode-max
echo 64000 > /proc/sys/fs/file-max

and as root:

ulimit -n 64000

You probably want to add these to /etc/rc.d/rc.local so they get set on each boot. There are more than a few ways to make these changes "sticky". In Red Hat Linux <http://www.redhat.com>, you can use /etc/sysctl.conf and /etc/security/limits.conf to set and save these values.

If you get errors of the variety "Unable to open file descriptor" you definitely need to up these values. You can examine the contents of /proc/sys/fs/file-nr to determine the number of allocated file handles, the number of file handles currently being used, and the max number of file handles.

Process Limits

For heavily used web servers, or machines that spawn off lots and lots of processes, you probably want to up the limit of processes for both apache and the kernel.

Apache sets a maximum number of possible processes at compile time. It is set to 256 by default, but in this kind of scenario that can often be exceeded. To change it, you will need to change the hardcoded limit in the apache source code and recompile. An example of the change is below:

--- apache_1.3.6/src/include/httpd.h.prezab Fri Aug  6 20:11:14 1999
+++ apache_1.3.6/src/include/httpd.h        Fri Aug  6 20:12:50 1999
@@ -306,7 +306,7 @@
  * the overhead. */
 #ifndef HARD_SERVER_LIMIT
-#define HARD_SERVER_LIMIT 256
+#define HARD_SERVER_LIMIT 4000
 #endif

Also, the 2.2 kernel itself has a max process limit. The default value for this is 2560, but a kernel recompile can take it as high as 4000. This is a limitation in the 2.2 kernel, and has been removed from 2.3/2.4.
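Before going the recompile route, check what the running system already allows; the ulimit check works everywhere, and on 2.4 and later kernels the system-wide task limit became a runtime tunable (the /proc path below is absent on 2.2):

```shell
# Per-user process limit for the current shell:
ulimit -u
# System-wide task limit, where the kernel exposes it at run time:
cat /proc/sys/kernel/threads-max 2>/dev/null
```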
If you're running into the limit on how many tasks the kernel can handle by default, you may have to rebuild the kernel after editing /usr/src/linux/include/linux/tasks.h. Change:

#define NR_TASKS 2560 /* On x86 Max 4092, or 4090 w/APM configured.*/

to

#define NR_TASKS 4000 /* On x86 Max 4092, or 4090 w/APM configured.*/

and:

#define MAX_TASKS_PER_USER (NR_TASKS/2)

to

#define MAX_TASKS_PER_USER (NR_TASKS)

Then recompile the kernel. Also run:

ulimit -u 4000

Note: this process limit is gone in the 2.4 kernel series.

Threads

Limitations on threads are tightly tied to both file descriptor limits and process limits. Under Linux, threads are counted as processes, so any limit on the number of processes also applies to threads. In a heavily threaded app, like a threaded TCP engine or a java server, you can quickly run out of threads.

For starters, you want to get an idea of how many threads you can open. The `thread-limit` util mentioned in the Tuning Utilities section is probably as good as any. The first step to increasing the possible number of threads is to make sure you have boosted any process limits as mentioned before.

There are a few things that can limit the number of threads, including process limits, memory limits, mutex/semaphore/shm/ipc limits, and compiled-in thread limits. For most cases, the process limit is the first one you run into, then the compiled-in thread limits, then the memory limits. To increase the compiled-in limits, you have to recompile glibc. Oh fun! And the patch is essentially two lines! Woohoo!

--- ./linuxthreads/sysdeps/unix/sysv/linux/bits/local_lim.h.akl Mon Sep  4 19:37:42 2000
+++ ./linuxthreads/sysdeps/unix/sysv/linux/bits/local_lim.h     Mon Sep  4 19:37:56 2000
@@ -64,7 +64,7 @@
 /* The number of threads per process. */
 #define _POSIX_THREAD_THREADS_MAX 64
 /* This is the value this implementation supports.
   */
-#define PTHREAD_THREADS_MAX 1024
+#define PTHREAD_THREADS_MAX 8192
 /* Maximum amount by which a process can decrease its
    asynchronous I/O priority level. */
--- ./linuxthreads/internals.h.akl Mon Sep  4 19:36:58 2000
+++ ./linuxthreads/internals.h     Mon Sep  4 19:37:23 2000
@@ -330,7 +330,7 @@
    THREAD_SELF implementation is used, this must be a power of two and
    a multiple of PAGE_SIZE. */
 #ifndef STACK_SIZE
-#define STACK_SIZE  (2 * 1024 * 1024)
+#define STACK_SIZE  (64 * PAGE_SIZE)
 #endif
 /* The initial size of the thread stack. Must be a multiple of PAGE_SIZE. */

Now just patch glibc, rebuild, and install it. ;-> If you have a package-based system, I seriously suggest making a new package and using it. Two references on how to do this are Jlinux.org <http://www.jlinux.org/server.html> and Volano <http://www.volano.com/linuxnotes.html>. Both describe how to increase the number of threads so Java apps can use them.

Apache Config

Make sure you start a ton of initial daemons if you want good benchmark scores. Something like:

MinSpareServers 20
MaxSpareServers 80
StartServers 32
# this can be higher if apache is recompiled
MaxClients 256
MaxRequestsPerChild 10000

Note: Starting a massive number of httpd processes is really a benchmark hack. In most real-world cases, setting a high number for max servers and a sane spare-server setting will be more than adequate. It's just the instant-on load that benchmarks typically generate that the high StartServers value helps with.

If apache isn't enough, you may want to investigate some other web servers that do more specific tasks faster. If you are serving large amounts of static content, you may want to investigate something like "phhttpd", a web server designed to do heavy caching of static content and spew it out as fast as possible. More info on it can be found at <http://people.redhat.com/zab/phhttpd/>.

There are also some experimental patches from SGI that accelerate apache.
More info at <http://reality.sgi.com/mja/apache/>. I haven't really had a chance to test the SGI patches yet, but I've been told they are pretty effective.

For purely static content, some of the other smaller, more lightweight web servers can offer very good performance. They aren't nearly as powerful or as flexible as apache, but for very specific performance-crucial tasks, they can be a big win.

Boa: <http://www.boa.org/>
thttpd: <http://www.acme.com/software/thttpd/>
mathopd: <http://mathop.diva.nl>

If you need even more ExtremeWebServerPerformance, you probably want to take a look at TUX, written by Ingo Molnar <http://people.redhat.com/mingo>. This is the current world record holder for SpecWeb99 <http://www.spec.org/osg/web99/results/res2000q3/web99-20000710-00057.html>. It probably owns the right to be called the world's fastest web server. Watch Ingo's page for the pending source release.

Samba Tuning

Depending on the type of tests, there are a number of tweaks you can do to samba to improve its performance over the default. The default is best for general-purpose file sharing, but for extreme uses, there are a couple of tweaks.

The first one is to rebuild it with mmap support. In cases where you are serving up a large number of small files, this seems to be particularly useful. You just need to add "--with-mmap" to the configure line. You also want to make sure the following options are enabled in the /etc/smb.conf file:

read raw = no
read prediction = true
level2 oplocks = true

OpenLDAP Tuning

The most important tuning aspect for OpenLDAP is deciding which attributes you want to build indexes on. If you add the following parameters to /etc/openldap/slapd.conf before entering the info into the database, they will all get indexed and performance will increase. I use the values:

cachesize 10000
dbcachesize 100000
sizelimit 10000
loglevel 0
dbcacheNoWsync

index cn,uid
index uidnumber
index gid
index gidnumber
index mail
SysV shm

Some applications, databases in particular, sometimes need large amounts of SHM segments and semaphores. The default limit for the number of shm segments is 128 in 2.2. This limit is set in a couple of places in the kernel, and requires a modification of the kernel source and a recompile to increase it. A sample diff to bump the limits up:

--- linux/include/linux/sem.h.save Wed Apr 12 20:28:37 2000
+++ linux/include/linux/sem.h      Wed Apr 12 20:29:03 2000
@@ -60,7 +60,7 @@
   int semaem;
 };
-#define SEMMNI  128             /* ?  max # of semaphore identifiers */
+#define SEMMNI  512             /* ?  max # of semaphore identifiers */
 #define SEMMSL  250             /* <= 512 max num of semaphores per id */
 #define SEMMNS  (SEMMNI*SEMMSL) /* ?  max # of semaphores in system */
 #define SEMOPM  32              /* ~ 100 max num of ops per semop call */
--- linux/include/asm-i386/shmparam.h.save Wed Apr 12 20:18:34 2000
+++ linux/include/asm-i386/shmparam.h      Wed Apr 12 20:28:11 2000
@@ -21,7 +21,7 @@
  * Keep _SHM_ID_BITS as low as possible since SHMMNI depends on it and
  * there is a static array of size SHMMNI.
  */
-#define _SHM_ID_BITS    7
+#define _SHM_ID_BITS    10
 #define SHM_ID_MASK     ((1<<_SHM_ID_BITS)-1)
 #define SHM_IDX_SHIFT   (_SHM_ID_BITS)

Theoretically, _SHM_ID_BITS can go as high as 11. The rule is that _SHM_ID_BITS + _SHM_IDX_BITS must be <= 24 on x86.

In addition to the number of shared memory segments, you can control the maximum amount of memory allocated to shm at run time via the /proc interface. /proc/sys/kernel/shmmax indicates the current maximum. Echo a new value to it to increase it:

echo "67108864" > /proc/sys/kernel/shmmax

to double the default value. A good resource on this is Tuning The Linux Kernel's Memory <http://ps-ax.com/shared-mem.html>.

Benchmarks

Lies, damn lies, and statistics. But aside from that, a good set of benchmarking utilities is often very helpful in doing system tuning work. It is impossible to duplicate "real world" situations, but that isn't really the goal of a good benchmark.
A good benchmark typically tries to measure the performance of one particular thing very accurately. If you understand what the benchmarks are doing, they can be very useful tools. Some of the common and useful benchmarks include:

Bonnie
No idea where the original page for bonnie is anymore, but it's pretty easy to google for source packages. This is a pretty common utility for testing drive performance. Its only drawback is that it sometimes requires the use of huge datasets on large-memory machines to get useful results, but I suppose that goes with the territory. Check Doug Ledford's list of benchmarks <http://people.redhat.com/dledford/benchmark.html> for more info on Bonnie.

Dbench
My personal favorite disk I/O benchmarking utility is `dbench`. It is designed to simulate the disk I/O load of a system running the NetBench benchmark suite. It seems to do an excellent job at making all the drive lights blink like mad. Always a good sign. Dbench is available at the Samba ftp site and mirrors <ftp://ftp.samba.org/pub/tridge/dbench/>.

http_load
A nice simple http benchmarking app that does integrity checking, parallel requests, and simple statistics. It generates load based off a test file of URLs to hit, so it is flexible. http_load is available from ACME Labs <http://www.acme.com/software/http_load/>.

dkftpbench
A (the?) ftp benchmarking utility. Designed to simulate real-world ftp usage (large number of clients, connections throttled to modem speeds, etc.). Handy. Also includes the useful dklimits utility. dkftpbench is available from Dan Kegel's page <http://www.kegel.com/dkftpbench/>.

tiobench
A multithreaded disk I/O benchmarking utility. Seems to do a good job at pounding on the disks. Comes with some useful scripts for generating reports and graphs. tiobench is available at the tiotest site <http://www.icon.fi/~mak/tiotest/>.

dt
dt does a lot: disk I/O, process creation, async I/O, etc.
dt is available at the dt page <http://www.bit-net.com/~rmiller/dt.html>.

ttcp
A tcp/udp benchmarking app. Useful for getting an idea of the max network bandwidth of a device. Tends to be more accurate than trying to guesstimate with ftp or other protocols.

General Benchmark Sites

Doug Ledford's page <http://people.redhat.com/dledford/benchmark.html>
ReiserFS benchmark page <http://devlinux.com/projects/reiserfs/>

System Monitoring

Standard, and not so standard, system monitoring tools that can be useful when trying to tune a system.

vmstat
This util is part of the procps package, and can provide lots of useful info when diagnosing performance problems. Here's sample vmstat output on a lightly used desktop:

   procs                  memory     swap         io     system        cpu
 r b w  swpd   free   buff  cache  si  so  bi  bo    in    cs  us sy id
 1 0 0  5416   2200   1856  34612   0   1   2   1   140   194   2  1 97

And here's some sample output on a heavily used server:

   procs                  memory     swap         io     system        cpu
 r  b w  swpd   free   buff  cache  si  so  bi  bo    in    cs  us sy id
16  0 0  2360 264400  96672   9400   0   0   0   1    53    24   3  1 96
24  0 0  2360 257284  96672   9400   0   0   0   6  3063 17713  64 36  0
15  0 0  2360 250024  96672   9400   0   0   0   3  3039 16811  66 34  0

The interesting number here is the first one in each row, the number of processes on the run queue. This value shows how many processes are ready to be executed, but cannot run at the moment because other processes need to finish. For lightly loaded systems, this is almost never above 1-3, and numbers consistently higher than 10 indicate the machine is getting pounded.

Other interesting values include the "system" numbers for in and cs. The in value is the number of interrupts per second the system is getting. A system doing a lot of network or disk I/O will have high values here, as interrupts are generated every time something is read from or written to the disk or network. The cs value is the number of context switches per second.
A context switch is when the kernel has to swap the executable code for one program out of the CPU and switch in another. It's actually _way_ more complicated than that, but that's the basic idea. Lots of context switches are bad, since it takes a fairly large number of cycles to perform a context switch, so if you are doing lots of them, you are spending all your time changing jobs and not actually doing any work. I think we can all understand that concept.

netstat
Since this document is primarily concerned with network servers, the `netstat` command can often be very useful. It can show the status of all incoming and outgoing sockets, which can give very handy info about the status of a network server. One of the more useful options is:

netstat -pa

The `-p` option tells it to try to determine which program has the socket open, which is often very useful info. For example, say someone nmaps their system and wants to know what is using port 666. Running netstat -pa will show you it's satand running on that tcp port.

One of the most twisted but useful invocations is:

netstat -a -n | grep -E "^(tcp)" | cut -c 68- | sort | uniq -c | sort -n

This will show you a sorted list of how many sockets are in each connection state. For example:

    9  LISTEN
   21  ESTABLISHED

ps
Okay, so everyone knows about ps. But I'll just highlight one of my favorite options:

ps -eo pid,%cpu,vsz,args,wchan

This shows every process, its pid, % of cpu, memory size, name, and what syscall it is currently executing. Nifty.

Utilities

Some simple utilities that come in handy when doing performance tuning.

dklimits
A simple util to check the actual number of file descriptors available, ephemeral ports available, and poll()-able sockets. Handy. Be warned that it can take a while to run if there are a large number of FDs available, as it will try to open that many files and then unlink them. This is part of the dkftpbench package.
fd-count
A tiny util for determining the number of file descriptors available. fd-count.c <tuning_utils/fd-count.c>

thread-limit
A util for determining the number of pthreads a system can use. This and fd-count are both from the system tuning page for Volano chat <http://www.volano.com/linuxnotes.html>, a multithreaded java-based chat server. thread-limit.c <tuning_utils/thread-limit.c>

System Tuning Links

<http://www.kegel.com>
Check out the "c10k problem" page in particular, but the entire site has _lots_ of useful tuning info.

<http://linuxperf.nl.linux.org/>
Site organized by Rik van Riel and a few other folks. Probably the best Linux-specific system tuning page.

<http://www.citi.umich.edu/projects/citi-netscape/>
Linux Scalability Project at UMich.

<http://home.att.net/~jageorge/performance.html>
Linux Performance Checklist. Some useful content.

<http://www.linux.com/tuneup/>
Miscellaneous performance tuning tips at linux.com.

<http://www.psc.edu/networking/perf_tune.html#Linux>
Summary of TCP tuning info.

Music

Careful analysis and benchmarking has shown that servers will respond positively to being played the appropriate music. For the common case, this can be about anything, but for high-performance servers, a more careful choice needs to be made. The industry standard for pumping up a server has always been "Crazy Train" by Ozzy Osbourne. While this has been proven over and over to offer increased performance, in some circumstances I recommend alternatives. A classic case is the co-located server. Nothing like packing up your pride and joy and shipping it to strange, far-off locations like Sunnyvale and Herndon, VA. It's enough to make a server homesick, so I like to suggest choosing a piece of music that will remind them of home and tide them over till the bigger servers stop picking on them.
For servers from North Carolina, I like to play the entirety of "feet in mud again" by Geezer Lake <http://www.slendermusic.com/articles/record/geezer.phtml>. Nothing like some good old NC-style avant-metal-alterna-prog.

Commentary, controversy, chatter, chit-chat: chat and irc servers have their own unique set of problems. I find the polyrhythmic and incessant restatement of purpose of Elephant Talk <http://www.elephant-talk.com/releases/discipli.htm> by King Crimson a good way to bend those servers back into shape.

btw, Xach says "Crazy Train" has the best guitar solo ever.

TODO

Add info for oracle tuning.
Any other useful server-specific tuning info I stumble across.
Add info about kernel mem limits, PAE, bigmem, LFS, and other kernel-related stuff likely to be useful.

Back to the Index