
			TUNING_NOTES FOR DIABLO

(0) Location of options

    Diablo compilation options mainly appear in two files: lib/config.h and
    lib/vendor.h.  lib/config.h is supposed to hold only permanent 
    configuration options.  The more advanced options are usually disabled
    unless it is possible to do preprocessor conditionals on the OS version.

    Diablo has numerable startup-time options specified in diablo.config.  See
    samples/diablo.config for a description.

    Generally speaking, any option overrides that you do should be done in
    lib/vendor.h
 
(I) Use of mmap()

    Diablo requires at least shared read-only file mmaps to work properly.
    This is known to work on Sun, Solaris, IRIX, AIX, and FreeBSD.  

    BSDI releases including 3.0 are known to have serious problems with mmap()
    and it is not suggested that you run diablo on it.  The most recent BSDI
    releases (as of Oct 1998) should have no problem with Diablo's use of
    mmap().

    SYSV SHARED MEMORY is required.  All major platforms support this, but
    Solaris uses ridiculously tiny kernel hard limits for shared memory and
    you will have to bump them up significantly in order to run Diablo.

    Once you get past shared read-only file maps, you get into shared 
    read-write file maps, shared read-write anonymous maps, and sys-v
    shared memory maps.  These are optional.  I believe the Sun, Solaris, 
    IRIX, and FreeBSD support shared r/w maps but SunOS does not support 
    anonymous maps (solaris does).  Most systems support sys-v shared memory.
    I have only tested advanced mmap features on FreeBSD.

    Diablo should work fine with systems which do not have a unified buffer
    cache for read+write mmaps, such as BSDI.

    Memory allocation features are defaulted in lib/config.h and can be
    overriden in lib/vendor.h, but the defaults ought to work for you.

(II) memory, disk, and cpu

					CPU

    A 100 MIPS class cpu is suggested for up to 40 feeds, a 200 MIPS class cpu
    is suggested otherwise.  A 200+ MIPS class cpu is necessary if you are 
    running a full feed into a reader and wish to support more then 500 online
    readers.

    I use FreeBSD-current on Pentium-II boxes myself.  A Pentium-II/300 would
    work just dandy.

					MEMORY

    A minimum of 128MB of ram is required (mainly to maintain the dhistory 
    file efficiently).  If you have more then 30 feeds, 192MB of ram is
    suggested.  If you have more then 70 feeds, 256MB of ram is suggested.
    The more memory the merrier.

    For a reader box, 256MB of memory is suggested but 128MB will work if you
    have fewer then, say, 100 readers.


					DISK

    A minimum of three disks (of any capacity) is recommended.  It is 
    recommended that you stripe all three disks together into a single
    big disk, and then cut pieces out of the single big disk to create your
    /news, /news/spool/news, /news/spool/group (for a reader), and
    /news/spool/cache (for a caching reader) partitions.  You can use 
    large-capacity drives and a single SCSI controller but an ultra-wide
    controller (such as the Adaptec 2940UW) is recommended.

    You should not need more then four or five disks even in a maximal
    configuration unless you are trying to slap together an insanely huge
    spool.  Three 18G drives will work for a nominal reader machine
    assuming you give a short expire to large postings.

    The machine should not ever have to swap, but swap should be configured
    to allow the machine to retire idle processes.  I suggest configuring
    128MB of swap on every disk to spread any swap activity around.

(III) sysctl tuning on FreeBSD

    FreeBSD allows you to tune certain parameters via sysctl.  You typically
    want to do the following:

    /sbin/sysctl -w kern.maxvnodes=16384
    /sbin/sysctl -w kern.maxfiles=16384
    /sbin/sysctl -w net.inet.tcp.always_keepalive=1
    /sbin/sysctl -w net.inet.tcp.keepidle=1800
    /sbin/sysctl -w net.inet.tcp.rfc1323=1
    /sbin/sysctl -w net.inet.tcp.rfc1644=0

    You can also tune VM parameters such as vm.v_cache_min, but unless you
    exactly what you are doing I suggest leaving the VM parameters alone.
    FreeBSD-3.x is especially capable of dynamicaly tuning itself without
    intervention from sysops.

    Always be careful when configuration the tcp buffer sizes (typically
    command line options to diablo and dreaderd or in diablo.config).  Due
    to the large number of active connections Diablo maintains, too large
    a tcp buffer size may waste too much kernel memory.

(IV) file descriptors, process limits, datasize resource limits

    Configure the system to support a minimum of 512 descriptors per process
    and at least 8192 descriptors for the system as a whole.  The system
    must support at least 512 processes per user and 1024 total processes.
    This may involve both kernel configuration and resource limit settings.

    The number of descriptors used by Diablo will increase 6 fold if you
    turn on reader expiration (variable expire), verses feeder expiration
    (straight FIFO expire).  If you are running a spool for a reader you
    generally MUST use the reader expiration config option in diablo.config.

    The per-process datasize limit should be at least 128MB.

    NOTE:  FreeBSD has an /etc/login.conf file.  You must ensure that 
    sufficient limits are set for daemon, default, standard, root, and
    news.  Specifically, do not set a small hard datasize limit in daemon
    or cron will not be able to re-limit the process to a higher datasize
    limit.  'datasize=...' is a HARD limit.  'datasize-curr=...' is a 
    soft limit.

    NOTE:  the rc.news file and the ~news user's .cshrc should probably unlimit
    all resources (in tcsh or csh, simply 'unlimit').

(V) Kernel Configuration for kernel builds (NBUFs, NMBCLUSTERS, etc...)

    On kernels for which filesystem buffers are static, configure a large
    number of buffers.  If you have 256MB of ram, I would dedicate half
    of it to filesystem buffers.

    On FreeBSD-2.2.x boxes you may have to increase NBUF.  However, on 
    FreeBSD-current (3.x) boxes NBUF is dynamically sized and should not have
    to be messed with.

    Since you are going to be running a large number of tcp connections,
    you should probably increase NMBCLUSTERS (again, a BSDish kernel option).
    I suggest at least 4096 and perhaps even 6144 or 8192.  You also specify 
    the system-supported maximum user data segment size in the kernel config.

    The typical FreeBSD kernel config line is:

	# FreeBSD-2.2.x only
	#options "NBUF=6144"

	# other potentially necessary options
	options "NMBCLUSTERS=4096"
	options "MAXDSIZ=(512UL*1024*1024)"
	options	SYSVSHM
	options	SYSVSEM
	options SYSVMSG

    Other experimental features you may want to try with BSD kernels include
    SOFTUPDATES (on FreeBSD, see README.softupdates in /usr/src/sys/ufs/ffs),
    AHC_ALLOW_MEMIO, and KTRACE.  You can also tune the kernel by configuring
    only those cpu classes you actually expect to use.  For example, my 
    kernel config only has "I686_CPU" defined (pentium-pro or pentium-II class
    cpu).

    I highly recommend turning softupdates on in FreeBSD-current, though it
    should be noted that you should have an up-to-date OS release as releases
    prior to September 1998 are a bit too buggy in the softupdates department.

(VI) DHistory file tuning

    Diablo should be able to handle upwards of 3000 accepted articles/min
    and message-id history lookups (check/ihave) rates between 40,000 and
    100,000 lookups/minute.  The actual performance depends heavily on
    the amount of memory you have and the number of diablo processes 
    in contention with each other.

    Full feeds are getting big enough that you may want to seriously consider
    increasing the default dhistory hash table size from 4m to 8m in 
    diablo.config.  This will reduce disk I/O on the dhistory file at the
    cost of another 16MB (32MB total) of memory dedicated to caching the
    dhistory file's hash table.  Note: the hash table size must be a power
    of 2 so you have limited options here.

    Many kernels will bog down on internal filesystem locks as the number
    of incoming feeds rises.  You need to worry once you get over 35 or so
    simultanious diablo processes.   Adding memory, reducing the size of
    the dhistory file, or increasing the hash table size will help here.

    The dhistory file defaults to a 14 day retention and will stabilize
    at between 350 and 400 MBytes given an article rate of 800,000 articles/day
    (a full feed as of this writing).  You can configure a lower expiration
    by setting the 'remember' variable in diablo.config to a lower number,
    such as 7 or 3.  

    It is recommended that you set this option to between 3 and 7 days in
    order to keep the dhistory file a reasonable size.

(VII) Tuning outgoing feeds to INN

    Please examine the samples/dnewsfeeds file.  Generally speaking, you need
    to tune any outgoing feeds to INN reader boxes.

    You should consider cutting control messages in front of articles
    and then delaying non-control messages by 5 minutes.  This will allow
    cancel controls to leap ahead of articles and reduce INN's article write
    overhead (which is usually the big bottleneck in INN).

    Typically, you separate control messages out by creating two separate
    feeds to your reader box.  The first one has a 'delgroupany control.*',
    and the second one has a 'requiregroup control.*'.  Taking the example
    from the sample dnewsfeeds file:

	# dnewsfeeds
	#
	label   nntp2a
	    alias       nntp2.best.com
	    ... other add and delgroups ...
	    delgroupany control.*
	end

	label   nntp2c
	    alias       nntp2.best.com
	    ... other add and delgroups ...
	    requiregroup control.*
	end

    Then, in dnntpspool.ctl you program the normal feed for queue-delayed,
    to delay it by 5 minutes (assuming you run dspoolout from cron every 5
    minutse), and you program the control feed as realtime.  Also, if you
    don't mind slightly longer delays, q2 may be a better choice then q1.

	# dnntpspool.ctl
	#
	nntp2a          oldnntp.best.com                500     n4 q1
	nntp2c          nntp1x.ba.best.com              500     n4 realtime

(VIII) Tuning Incoming feeds

    The main thing to remember when tuning incoming feeds is that the 
    load on your news system is related to the number of message-id check
    or ihave requests you receive.  You do not have to go overboard taking
    full feeds... three or four incoming full feeds is quite sufficient.
    Most other incoming feeds will be from smaller sites and having them
    ship you just the local postings is good enough.  The message-id load
    determines how quickly your news box can catch up after prolonged
    downtime or loss of network connectivity and it may be a good idea to
    test this by purposefully taking the machine offline for an hour, just
    to see where you stand.

    There is more then one way to ensure incoming feed redundancy.  Due to
    the way the precommit cache works, if you get offered the same article
    from N different feeds at the same time, Diablo will return a duplicate
    reply code to all but one of those incoming feeds.  If diablo were to
    crash at that point, you wind up relying on that one incoming feed to
    retry the article because the others have already marked it off. 

    It may be beneficial to purposefully lag one of your incoming full
    feeds to provide added redundancy.  This is something your peer must
    set up for you and it isn't easy unless they are running news software
    that can do it.  Diablo 1.12 or greater can through the 'q2' or 'q3'
    option in dnntpspool.ctl on the feeder.  While this virtually guarentees
    that you will never accept an article from that particular site under
    normal conditions, it gives your system added redundancy by ensuring
    that the same message-id will be offered at two different times.  If
    something does go wrong, the time delay may help you recover more quickly
    without any article loss.

    You may also wish to tune the TCP buffer size used by the sending machine
    to your machine, especially for internal feeds.  A larger buffer size will
    increase Diablo's ability to absorb disk I/O bottlenecks by increasing
    the size of the streaming pipeline.

(IX) Tuning dexpire

    There are two cron jobs that deal with dexpire.  The first is called
    quadhr.expire and nominally runs dexpire every four hours (6 times a day).
    The second is called hourly.expire and attempts to rerun dexpire if
    the quadhr cron fails.

    DExpire in Diablo is very fast.  Since diablo stores multiple-articles
    per spool file, DExpire is able to free up disk space very quickly and
    you should not be scared of running it often.  DExpire's biggest hog
    is that it must scan the dhistory file.  Unlike INN's expire, dexpire
    does not rewrite the dhistory file.  Instead, it expires entries in-
    place which is considerably faster.

    NOTE!!! If you have a small spool (8G or smaller) you may be forced
    to run dexpire once an hour rather then once every four hours.  If you
    have a larger spool you can get away with once every four hours but
    may have to increase the free space margin.  See samples/adm/hourly.expire
    and samples/adm/quadhr.expire.

    The sample expiration cron jobs adm/quadhr.expire and adm/hourly.expire
    set a free space target of 2.5 gigabytes.  This is the suggested free space
    target if you run expire every 4 hours and is designed to deal with
    large influxes of data that may occur in a 4 hour period, but not really
    designed to deal with an unfiltered full feed.

    If you are running an unfiltered full feed (15GB/day) you should run
    dexpire once an hour rather then once every four hours with a 2.5GB
    free space margin.

    You can manually retire articles from the spool if you like without running
    dexpire.  The history file will still be completely synchronized when
    dexpire does run.  The only legal way to retire articles from the spool is
    to remove an entire spool directory.  You MUST RENAME the directory before
    rm -rf'ing it or you risk creating corrupted article files due to the way
    diablo's article create/append spooling works.  ** YOU CAN NEVER RECREATE A
    DELETED DIRECTORY **.  This will cause history record corruption.  If
    your spool is configured for reader-mode expiration (see diablo.config),
    you can only legally remove spool directories in sorted order or a diablo
    restart will improperly regenerate the 'missing' directories, creating
    the potential for feed corruption.

    dexpire is now capable of retiring articles without updating the history
    file, a very fast operation that can be run on a shorter scheduler if
    you feel the history update takes too much time.  Using dexpire with
    the -h0 option is safer then expiring the spool manually.

    IF YOU ARE USING A SOFTUPDATES-MOUNTED FILESYSTEM, statfs() will not 
    necessarily return the actual amount of free space on the fs.  You should
    use the -s option to dexpire to force it to sync/sleep/sync/sleep so
    statfs() returns a more reasonable value.  If you do not do this, dexpire
    may deleted 80% of your spool before it realizes that it has freed up 
    enough space!

(X) Spam Filter

    The spam filter is turned on by default in Diablo.  You can disable or
    adjust the spam filter parameters with the -S option to diablo 
    (man diablo).  You will almost certainly want to keep the spam filter
    turned on because it does a pretty good job detecting attacks, and the
    USENET gets attacked quite often these days.

    Unfortunately, the filter's best defense is rate-checking the
    NNTP-Posting-Host: header and this will break some news sources that
    propogate news with POST rather then via a transit feed.  I consider these
    news sources broken anyway, but if you do not you may have to mess with
    either the filter parameters or with the addspam and delspam directives
    in dnewsfeeds to create exception cases.

(XI) SOLARIS SPECIFIC NOTES

    The shared memory defaults in /etc/system may have to be tuned due to
    having too low a maximum segment size, the following is suggested:

	set shmsys:shminfo_shmmni = 100
	set shmsys:shminfo_shmseg = 16
	set shmsys:shminfo_shmmax = 16777216

    The file descriptor limits may also be too low, the following is
    suggested:

	set rlim_fd_max = 4096
	set rlim_fd_cur = 1024

(XII) LINUX SPECIFIC NOTES ( work in progress )

    ( Ray Rocker <rocker@ametro.net>, in regards to dnewslink problems )

    I started having the hung dnewslink problem when I brought the kernel
    from 2.0.29 to 2.0.33. At that time, the diablo version didn't seem
    to matter. And the hang was definitely mmap-related; I traced the
    hangs to a kernel function in that area, though I don't remember
    now what it was exactly.

    I've been running 2.0.29 kernel and 1.14 diablo since then w/o problem.
    Upgrading both is high on my todo list...



