README-file for the distribution of the Norwegian dictionaries for ISPELL.

COPYING

 Copyright (C) 1998 by Rune Kleveland <runekl@math.uio.no>

    This dictionary is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This dictionary is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this dictionary; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.



DESCRIPTION

This distribution contains Norwegian dictionaries for ispell.  The big
dictionary contains 81 350 ispell roots and 338 500 words.  Since it
is so big, people who know Norwegian well will make most use of it.
The small dictionary contains 30 000 ispell roots and 104 800 words.
It is based on the original ispell_norsk from ftp.uio.no:/pub/ispell
with 66651 words and 45000 roots, but many wrong and rare words have
been removed, and expansion-flags have been added for the words left.
Among other things, this makes use of the -C flag for ispell much
safer.


The latest version is available at

http://www.uio.no/~runekl/dictionary.html

Comments, suggestions and bug-reports to runekl@math.uio.no.


This distribution contains 12 files.

README:        This file.

COPYING:       The GNU general public license.

norsk.base:    The list of words with affix flags.

norsk.orig:    A smaller list of words with affix flags based on the
               original ispell_norsk from ftp.uio.no/pub/ispell.  But
               it is substantially improved and adapted to ispell.
               This is a true subset of norsk.sml.

norsk.matte:   A small collection of words for mathematics.

norsk.olje:    More additional words, mainly from math and oil-technology.

norsk.skole:   A few words from the field of education.

norsk.7bit:    An extended! and improved(?) version of Ivar Aavatsmark's
               norsk.aff.  You *must* use this affix file and not the
               old one.  If you don't, the dictionary will be plain
               wrong.

norsk.cfg:     An addition to babel-3.6 for latex that makes the
               `"'-character active.  See the GOODIES-section.

expand.sh:     Expands a dictionary for use as a /usr/dict/words file.
               Ispell uses this file to search for words matching a
               pattern and to list possible completions of words.  It
               is not required, and it takes more than 4 MB if
               norsk.base is used.

munch.sh:      A shell script you can use if you want to re-munch
               the dictionary, perhaps after adding a lot of new
               words.  It runs munchlist and removes some redundant
               flags that munchlist does not remove.  It also checks
               that the munching was done correctly.  This script is
               closely tied to this version of norsk.aff, and must
               not be used with any other affix file.

munchlist.el:  Some really simple emacs functions I have used when
               developing the dictionary.  You might use them if you
               want to contribute a dictionary.



INSTALLATION

The affix file uses 57 flags, so ispell must be compiled with
MASKBITS=64.  Redhat-5.0's version of ispell has only 32 maskbits, so
if you use that distribution you will need to recompile.  Get the
sources from somewhere and read the instructions.
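
To find out what an installed ispell was compiled with, you can look
at the compiled-in options that `ispell -vv' prints.  The helpers
below are only a sketch: the function names are made up for this
README, and the parsing assumes the option list contains a line of the
form `MASKBITS = <number>'.

```shell
# Read the output of "ispell -vv" on stdin and print the MASKBITS value.
# Assumption: the option list contains a line like "MASKBITS = 32".
maskbits_from_options() {
    sed -n 's/.*MASKBITS[^0-9]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if the given MASKBITS value is large enough for this affix file
# (57 flags need at least MASKBITS=64).
maskbits_sufficient() {
    [ -n "$1" ] && [ "$1" -ge 64 ]
}
```

Typical use would be

    bits=$(ispell -vv | maskbits_from_options)
    maskbits_sufficient "$bits" || echo "ispell must be recompiled"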

There is a bug in ispell's parsing routine for the affix file.  This
bug prevents ispell from accepting some legal flags, so without
patching ispell you will only have access to 52 flags.  There are two
solutions: you can patch ispell, or you can use MASKBITS=128.  The
latter solution will make the hash files bigger and ispell slower, and
is not recommended.

Here is the patch you need (I picked it up on no.linux newsgroup).

-----------------------------------------------------------------------
*** parse.y     Mon Nov 21 19:26:05 1994
--- parse.y     Fri Apr 17 19:12:31 1998
***************
*** 872,874 ****
  #if MASKBITS <= 64
!                           if (!isalpha (flagbit))
                                yyerror (PARSE_Y_BAD_FLAG);
--- 872,874 ----
  #if MASKBITS <= 64
!                           if (flagbit < 'A' || flagbit > 'z')
                                yyerror (PARSE_Y_BAD_FLAG);
***************
*** 904,906 ****
  #if MASKBITS <= 64
!                           if (!isalpha (flagbit))
                                yyerror (PARSE_Y_BAD_FLAG);
--- 904,906 ----
  #if MASKBITS <= 64
!                           if (flagbit < 'A' || flagbit > 'z')
                                yyerror (PARSE_Y_BAD_FLAG);
-----------------------------------------------------------------------
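
The numbers behind the patch can be checked by counting the ASCII
characters each test accepts.  The sketch below (the function name is
made up for illustration) walks from 'A' to 'z' and counts the letters,
which is what isalpha accepts in the C locale, and the whole range,
which is what the patched test accepts.  The six extra characters are
the ones between 'Z' and 'a'.

```shell
# Print two numbers: how many characters from 'A' (65) to 'z' (122) are
# letters, and how many characters the range 'A'..'z' contains in total.
count_flag_chars() (
    LC_ALL=C            # plain ASCII ranges; the subshell keeps this local
    letters=0
    range=0
    code=65
    while [ "$code" -le 122 ]; do
        # Turn the character code into the character itself.
        char=$(printf "\\$(printf '%03o' "$code")")
        case $char in
            [A-Z]|[a-z]) letters=$((letters + 1)) ;;
        esac
        range=$((range + 1))
        code=$((code + 1))
    done
    echo "$letters $range"
)
```

Running count_flag_chars prints `52 58': 52 flags without the patch,
58 with it, which is enough for the 57 flags in this affix file.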

The following local.h works for me on my Slackware-linux-system.  You
have to adapt the file to the languages you have dictionaries for.

-----------------------------------------------------------------------
#define MINIMENU	/* Display a mini-menu at the bottom of the screen */
#define USG		/* Define this on System V */
#define CFLAGS	"-O3"

/*
 * Important directory paths
 */
#define BINDIR	"/usr/bin"
#define LIBDIR	"/usr/lib/ispell"
#define ELISPDIR "/usr/share/emacs/site-lisp"
#define TEXINFODIR "/usr/info"
#define MAN1DIR	"/usr/man/man1"
#define MAN4DIR	"/usr/man/man4"
/*
 * Place any locally-required #include statements here
 */

#define LANGUAGES "{american,MASTERDICTS=american.med+,HASHFILES=americanmed+.hash} {deutsch} {norsk} {svenska} {dansk} {francais}"

#define MASKBITS	64
-----------------------------------------------------------------------

The installation:

1.  Get ispell-3.1.20.tar.gz and unpack the sources in /usr/src
2.  Put all the files from this distribution in
    ispell-3.1/languages/norsk

    cd /usr/src/ispell-3.1/languages/norsk
    gzip -d < ispell-norsk-1.1.tar.gz | tar -xvf -

3.  Decide what you want in your dictionary, and put it in the file
    norsk.sml. You must include norsk.base *OR* norsk.orig.

    cat norsk.base norsk.matte > norsk.sml

4.  Follow the installation instructions for ispell.  Note that the
    TeXinfo file distributed with ispell does not work with
    TeXinfo-3.12.  Just remove ispell.info from the target all in the
    top makefile to get around this, and run makeinfo manually.
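
Step 3 can be wrapped in a small helper that refuses to build
norsk.sml unless the main word list is there.  This is only a sketch:
build_sml is a made-up name, and it reads the *OR* above as "exactly
one of norsk.base and norsk.orig".

```shell
# build_sml OUTPUT WORDLIST...
# Concatenate the given word lists into OUTPUT, but only if exactly one
# of norsk.base and norsk.orig is among them.
build_sml() {
    out=$1
    shift
    main=0
    for f in "$@"; do
        case ${f##*/} in
            norsk.base|norsk.orig) main=$((main + 1)) ;;
        esac
    done
    if [ "$main" -ne 1 ]; then
        echo "build_sml: include exactly one of norsk.base or norsk.orig" >&2
        return 1
    fi
    cat "$@" > "$out"
}
```

For example,

    build_sml norsk.sml norsk.base norsk.matte norsk.skole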

If you don't need to [re]install ispell (because you have already
applied the patch and have MASKBITS=64), you only need to unpack the
files somewhere and say something like

cat norsk.base norsk.matte > norsk.sml
cp norsk.7bit norsk.aff
buildhash norsk.sml norsk.aff norsk.hash
cp norsk.hash norsk.aff /usr/lib/ispell
chmod 644 /usr/lib/ispell/norsk.aff
chmod 644 /usr/lib/ispell/norsk.hash

assuming your hash files are in /usr/lib/ispell.


CHARACTER SETS

By default ispell assumes you use latin-1 encoding in your files.  To
spell-check such a file you just say

ispell -d norsk mythesis.tex

In TeX you can use `{\aa}', `{\ae}', `{\o}', `\'e' and `\^o' to
represent the special Norwegian characters.  If you do this, you have
to say

ispell -T plaintex -d norsk mythesis.tex

to spell-check a file.  The characters æøå will not be recognized
then, so you have to choose one standard.

In a plain ASCII file æøå are sometimes represented as ae, oe, aa.  Use

ispell -T ascii -d norsk mythesis.tex

to spell-check such a file.  Question: How are/should the accented
characters (é, è, ô, ò) be written in such a file?

The iso246 (?) encoding puts æøå after z in the collating sequence.
If you use this encoding, say

ispell -T iso246 -d norsk mythesis.tex
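
The four cases above can be collected in one helper, so you don't
have to remember the switches.  This is only a sketch: the function
name and the keywords latin1, tex, ascii and iso246 are made up here;
the switches are the ones shown above.

```shell
# Print the ispell arguments for a given representation of the
# Norwegian characters.  The keywords are hypothetical names.
norsk_ispell_args() {
    case $1 in
        latin1) echo "-d norsk" ;;
        tex)    echo "-T plaintex -d norsk" ;;
        ascii)  echo "-T ascii -d norsk" ;;
        iso246) echo "-T iso246 -d norsk" ;;
        *)      echo "norsk_ispell_args: unknown representation: $1" >&2
                return 1 ;;
    esac
}
```

Use it like this (the command substitution is deliberately unquoted,
so the arguments are split into separate words):

    ispell $(norsk_ispell_args tex) mythesis.tex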


COMPOUND WORDS

Ispell has two very important switches, -B and -C, controlling whether
ispell accepts words formed by a root and another word as correct.  If
the -C flag is given, ispell will accept words such as
`avdelingsbestyrerstilling', which is right, but also words such as
`benyttebenzen', which is wrong.  Since the norsk.base dictionary is
so big, it is not necessary to give the -C option to get good
performance.  What you do depends on how important it is to get
everything right.  By default ispell.el for emacs uses the -C flag.
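
To see what -C lets through in your own texts, you can compare the two
switches on a word list.  The sketch below (compound_only is a made-up
name) prints the words that ispell reports as misspelled with -B but
accepts with -C, that is, the words that pass only as run-together
compounds.  It assumes the norsk dictionary is installed and that the
file contains one word per line.

```shell
# compound_only FILE
# Print the words in FILE that are rejected with -B but accepted with -C.
compound_only() {
    strict=$(mktemp) || return 1
    loose=$(mktemp) || { rm -f "$strict"; return 1; }
    # "ispell -l" prints the misspelled words found on standard input.
    ispell -l -B -d norsk < "$1" | LC_ALL=C sort -u > "$strict"
    ispell -l -C -d norsk < "$1" | LC_ALL=C sort -u > "$loose"
    # Keep the lines that occur only in the -B list.
    LC_ALL=C comm -23 "$strict" "$loose"
    rm -f "$strict" "$loose"
}
```

Going through this list by hand shows how many wrong compounds -C
would accept in practice.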

The variable compoundmin controls how long the words have to be to be
accepted as one part of a compound word.  It is set to 4 by default in
this affix file, which is sensible.  It should NOT be set to anything
less than 3!


EMACS

The version of `ispell.el' distributed with emacs-19.34 does not
support norsk.  I suggest you get the latest ispell.el from
ftp://kdstevens.com/pub/stevens/ispell.el.gz.  This version is
included in emacs-20.3.

This is an extended version with region skipping and spelling-control
on the fly.  It also supports norsk, even the characters æøå.
You can even extend it yourself so that it, for example, skips the
amsmath environments.  It should work with emacs-19 and above.

Follow the installation instructions in the file closely.  It is not
sufficient just to load the package.

There is also a file flyspell.el around.  This also offers
spell-checking on the fly, and the interface is more like m$-word.
Flyspell-mode highlights incorrect words, and you can even click on
them to get suggestions for correct spelling.


GOODIES --- mainly for the friends of TeX

Have you ever considered the problem of hyphenating the word `villede'
in TeX?  Of course the hyphenation should be `vill-lede', thus an
extra `l' should be added.

Most languages which have such hyphenation (in particular German, with
ss) support this in babel.  Norsk does not for some unknown reason.

If you would like to be able to code villede as vi"llede to get
correct hyphenation you need to add support in babel for this.  The
file norsk.cfg is included in this distribution.  It makes the
character " active and offers you many `different' hyphen signs.  Just
place it somewhere LaTeX can find it.  Merge it with your
norsk.cfg-file if you have one.  See the top of the file for some
documentation.  It is quite likely that this functionality will be
added to babel in version 3.7.

The dictionary norsk.base will support words coded as vi"llede,
spi"sslede etc., and ispell even suggests this spelling for villede
and villlede.  You have to declare `"' as a boundarychar in ispell.el.
If you don't want this feature, remove all `"'-characters from
norsk.sml before installing.
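
Removing the `"'-characters is a one-liner with tr.  The sketch below
(strip_hyphen_marks is a made-up name) also drops lines that become
exact duplicates afterwards; that step is an addition here, not
something ispell requires.

```shell
# strip_hyphen_marks FILE
# Print FILE with all `"'-characters removed and exact duplicates dropped.
strip_hyphen_marks() {
    tr -d '"' < "$1" | LC_ALL=C sort -u
}
```

Write the result to a new file and move it into place before you run
buildhash:

    strip_hyphen_marks norsk.sml > norsk.sml.new
    mv norsk.sml.new norsk.sml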


The complete ispell-entry for norsk in ispell-dictionary-alist should
be

    ("norsk"                           ;8 bit Norwegian mode
     "[A-Za-z\351\346\370\345\350\364\362\311\306\330\305\310\324\322]"
     "[^A-Za-z\351\346\370\345\350\364\362\311\306\330\305\310\324\322]"
     "[\"]" nil ("-C" "-d" "norsk") "~list" iso-latin-1)  

where you replace -C with -B if you want to be more careful by
default.  In fact I recommend that.

If you use plain TeX, add the entry

    ("norsk7-tex"                      ;7 bit Norwegian mode
     "[A-Za-z{}\\'^`]" "[^A-Za-z{}\\'^`]"
     "[\"]" nil ("-C" "-d" "norsk") "~plaintex" nil)

and

    ("norsk7-ascii"                    ;7 bit Norwegian
     "[A-Za-z]" "[^A-Za-z]"
     "[\"]" nil ("-C" "-d" "norsk") "~ascii" nil)

if you use aa, ae and oe to write æøå.  Finally, if you use {|} to
represent æøå, add the entry

    ("norsk7-iso246" "[][A-Za-z{}|\\]" "[^][A-Za-z{}|\\]"
      "[\"]" nil ("-C" "-d" "norsk")  "~iso246" nil)


CUSTOMIZATION OF THE DICTIONARY

There are some new controversial spelling conventions in Norwegian.  I
personally strongly dislike that you can write `de steinete menneskene
i Kabul' and get away with it.  If you disagree, uncomment the
appropriate line under the M-flag in the affix-file.

You can also comment out some affixes that end with an `a', for
example in the t-flag, if you don't like these forms.  This might be
useful only to people employed by Aftenposten.

You can also uncomment the spellings kaféen etc.  Such spellings are
not allowed by default, since use of accents where they are not really
needed is not recommended in Norwegian.  Customization options are
marked with the word `valgfritt', and changing these should be safe.

When you have done this, you only need to rehash.  I cannot guarantee
that all the forms you think of will be gone, but you get rid of lots
of them, maybe even more than you planned.
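
Since the options are marked with `valgfritt', grep will find them
for you.  A minimal sketch (list_options is a made-up name):

```shell
# list_options AFFIXFILE
# Print every line of AFFIXFILE that mentions `valgfritt', with its
# line number in front.
list_options() {
    grep -n 'valgfritt' "$1"
}
```

Run `list_options norsk.aff' and look up the line numbers in your
editor before you change anything.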


WHAT ABOUT MY PRIVATE DICTIONARY?

If your private dictionary mywords.dic contains a lot of words you
wish to include in the dictionary, you can say

cat mywords.dic | ispell -l -d norsk > onlymywords.dic

or

cat mywords.dic | ispell -a -d norsk > onlymywords.information

The first command produces a list of the words in your dictionary that
are not in the main dictionary, at astonishing speed.  The second
lists proposals for correct spelling and is much slower.  If you
really want to do good work, you might want to have a look at my
really primitive but functional emacs-commands in munchlist.el.
Suppose you have the words

gjennomstrømningsmekanisme
gjennomstrømningsmekanismen
gjennomstrømningsmekanismens
gjennomstrømningsmekanismer
gjennomstrømningsmekanismene

in your onlymywords.dic.  Load munchlist.el.  If you mark this region
and press C-c b, then `gjennomstrømningsmekanisme/AEG' is returned, so
the line

gjennomstrømningsmekanisme/AEG

represents these five words.  (Of course this only works if ispell and
munchlist are correctly installed.)  Mark your whole file, and do the
same.  Then munchlist munches all your words, and you can go through
the file and add/modify the suffixes as you want.  C-c c shows the
line expanded, C-c v a region expanded.  Of course you need to read
and understand norsk.aff to do this properly.
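
Outside emacs you can check a munched line with the -e switch of
ispell, which expands affix flags.  The sketch below (expand_entry is
a made-up name) assumes the norsk dictionary is installed and that
ispell prints the expanded forms separated by blanks; it puts one
form on each line.

```shell
# expand_entry WORD/FLAGS
# Print the forms that WORD/FLAGS stands for, one per line.
expand_entry() {
    printf '%s\n' "$1" | ispell -d norsk -e | tr -s ' ' '\n'
}
```

If the output of `expand_entry word/FLAGS' contains forms that are not
real words, the flags on that line need to be corrected.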

If you know emacs-lisp and would like to contribute something better
than my functions, you are most welcome.

If you have a *correct* list of words not in the dictionary,
preferably from within some field (physics, math...), I would like
to include it in the next release.


CHANGES since 1.0

- Fixed the affix file.  Now it supports plain TeX representations of
  Norwegian characters.

- Fixed a bug in the U flag in norsk.aff.

- Reorganized the -messige flags to save one flag.

- Added the ~Z and ~_ flags.  These flags only take effect if the
  word is part of a compound word.  Useful in words like
  kostnad-s.  Added about 1000 ~Z-flags in norsk.base for better
  compound word support.

- Upgraded the munch.sh script to handle the new flags.  Munchlist does
  not do this by default, since the *J flag is similar (I think).

- Made munch.sh generate null.aff needed by the script.

- Added the *very* small script expand.sh to expand the dictionary for
  use as a /usr/dict/words file.

- The norsk.olje file has been split.  Some words have gone to
  norsk.matte, some to norsk.base and some have disappeared.

- A few common words missing in norsk.orig have been added.

- Added some roots of words in norsk.orig.  Result: Better munching.

- A new small file norsk.skole containing some words from the field of
  education has been added.  Contribute --- and it will grow!



TODO

- Remove all incorrect words.

- Add more user-includable dictionaries covering different fields of
  science.

- Add support for controlled compound words to make use of the -C
  switch safer.  See the ispell man-pages.

- Make an affix file with only 32 flags as an option to those who do
  not want to recompile ispell.  Maybe not?

- Split the dictionary in four parts, where the first part contains
  the most common words and the fourth contains the rare words.  The
  big dictionary contains far too many uncommon words that only
  confuse the average writer, and the small one doesn't contain the
  right subset.

