Custom Hunspell Dictionary
Overview
I use a custom dictionary because the Hunspell plugin for Eclipse does not read my personal dictionary, and it does not let you add words from the plugin editors I use, such as PyDev and ReST.
The solution was to copy the package dictionary to a new location and add words to it manually.
See the document "eclipse spell check" for how to install the Hunspell spelling service for Eclipse on a Linux workstation.
Create a new dictionary
Install Hunspell if you haven’t already.
$ sudo apt update
$ sudo apt install hunspell
Rather than start from scratch, we will copy the dictionary that comes with the Hunspell package. We could edit the package dictionary in place, but every update of the Hunspell package would overwrite those changes.
Note
You need to copy both the .dic and .aff files. The .aff file is discussed later.
$ cd
$ mkdir .hunspell
$ cd .hunspell/
$ cp /usr/share/hunspell/en_US* .
$ ls
en_US.aff en_US.dic
$ sudo chown billf: * # replace billf with your own user name
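The copy step above can also be wrapped in a small function. This is only a sketch: `copy_dictionary` is a name I made up for illustration, and the paths in the example call are the ones used in the steps above. It refuses to copy unless both the .aff and the .dic file are present.

```shell
# Hypothetical helper: copy one language's .aff/.dic pair into a private directory.
# Arguments: source directory, destination directory, language code (e.g. en_US).
copy_dictionary() {
    local src_dir=$1 dest_dir=$2 lang=$3 ext

    # Both halves of the dictionary must exist before anything is copied.
    for ext in aff dic; do
        if [ ! -f "$src_dir/$lang.$ext" ]; then
            echo "missing $src_dir/$lang.$ext" >&2
            return 1
        fi
    done

    mkdir -p "$dest_dir" || return 1
    cp "$src_dir/$lang.aff" "$src_dir/$lang.dic" "$dest_dir"/
}

# Example call, using the paths from the steps above:
# copy_dictionary /usr/share/hunspell "$HOME/.hunspell" en_US
```

Files copied this way as your normal user are already owned by you.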
Test your new dictionary
Check a word to test your new dictionary.
Note
Pass only the base name en_US to -d, not the full file name with its extension.
$ echo "Linux" | hunspell -d ~/.hunspell/en_US
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.7.0)
*
Hunspell returned an *, which means it found the word in the dictionary.
Add words to the dictionary
Check the spelling of the word "sudo":
$ echo "sudo" | hunspell -d ~/.hunspell/en_US
Hunspell 1.7.0
& sudo 3 0: suds, ludo, sumo
The line starts with an &, which means Hunspell did not find the word in the dictionary (a match would have returned an *). After the word come the number of suggestions (3), the offset of the word in the input line (0), and the suggestions themselves. None of them is the word we want.
Now, let's add sudo to our custom dictionary.
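The result lines can be decoded with a few lines of shell. This is a rough sketch: `describe_result` is a hypothetical helper, and the `+`, `-` and `#` markers are the ones I know from the ispell-style pipe format, which I am assuming Hunspell follows here as it does for `*` and `&`.

```shell
# Hypothetical helper: describe one result line from hunspell's output.
describe_result() {
    local line=$1 marker word count rest

    case $line in
        '*'*) echo "correct" ;;
        '+'*) echo "correct (derived from root: ${line#+ })" ;;
        '-'*) echo "correct (compound word)" ;;
        '&'*)
            # Format: & word count offset: suggestion, suggestion, ...
            read -r marker word count rest <<< "$line"
            echo "misspelled: $word ($count suggestions: ${line#*: })"
            ;;
        '#'*)
            # Format: # word offset
            read -r marker word rest <<< "$line"
            echo "misspelled: $word (no suggestions)"
            ;;
        *) echo "unrecognized line" ;;
    esac
}

describe_result '& sudo 3 0: suds, ludo, sumo'
# prints: misspelled: sudo (3 suggestions: suds, ludo, sumo)
```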
Danger
Be careful with the echo command below. It must use >> (append). With a single >, the shell overwrites the whole dictionary and leaves only the word you echoed. Always make a backup of your custom dictionary first.
$ cd ~/.hunspell
$ cp en_US.dic en_US.dic_BU_`date +"%m_%d_%Y_%I_%M_%p"`
$ echo "sudo" >> ~/.hunspell/en_US.dic
Let's check the spelling again:
$ echo "sudo" | hunspell -d ~/.hunspell/en_US
Hunspell 1.7.0
*
Hunspell now finds the word and returns an *.
You can also add several words at once. Put the words in a file, one per line.
This example uses a file named new_words.
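The backup and append steps can be combined so that the backup is never forgotten. The sketch below uses a made-up function name, `add_word`, and a backup suffix of my own choosing. It skips words that are already in the dictionary, with or without flags.

```shell
# Hypothetical helper: back up a .dic file, then append one word to it.
# Arguments: path to the .dic file, word to add.
add_word() {
    local dic=$1 word=$2

    [ -f "$dic" ] || { echo "no such dictionary: $dic" >&2; return 1; }

    # Entries look like "build/SMRZGJ", so compare only the part before "/".
    if awk -F/ -v w="$word" '$1 == w { found = 1 } END { exit !found }' "$dic"; then
        echo "already present: $word"
        return 0
    fi

    # Timestamped backup before the file is touched.
    cp -p "$dic" "$dic.bak_$(date +%Y%m%d_%H%M%S)" || return 1

    # If the last byte is not a newline, add one so the word gets its own line.
    if [ -s "$dic" ] && [ -n "$(tail -c 1 "$dic")" ]; then
        echo >> "$dic"
    fi

    # Always append with >>, never >.
    printf '%s\n' "$word" >> "$dic"
}

# Example call:
# add_word "$HOME/.hunspell/en_US.dic" sudo
```

As far as I know, the first line of a .dic file is an approximate word count. This sketch leaves it unchanged, just as the manual steps do.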
$ cd ~/.hunspell
$ cp en_US.dic en_US.dic_BU_`date +"%m_%d_%Y_%I_%M_%p"`
$ nano new_words # add your new words each on a new line
$ cat "new_words" >> en_US.dic
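A plain cat appends every line, including words the dictionary already has. The sketch below filters those out first. `merge_words` and the temporary file name are my own inventions for illustration, not part of Hunspell.

```shell
# Hypothetical helper: append only the words that are not yet in the dictionary.
# Arguments: path to the .dic file, path to the word list (one word per line).
merge_words() {
    local dic=$1 words=$2 tmp

    [ -f "$dic" ] && [ -f "$words" ] || return 1
    cp -p "$dic" "$dic.bak_$(date +%Y%m%d_%H%M%S)" || return 1

    # Collect the new entries in a temporary file first, so the dictionary
    # is not read and written at the same time.
    tmp="$dic.new_entries.tmp"
    awk -F/ '
        NR == FNR { seen[$1] = 1; next }                     # first file: existing stems
        $1 != "" && !($1 in seen) { print $0; seen[$1] = 1 } # second file: new words only
    ' "$dic" "$words" > "$tmp" || return 1

    # Make sure the dictionary ends with a newline before appending.
    if [ -s "$dic" ] && [ -n "$(tail -c 1 "$dic")" ]; then
        echo >> "$dic"
    fi
    cat "$tmp" >> "$dic"
    rm -f "$tmp"
}

# Example call:
# merge_words "$HOME/.hunspell/en_US.dic" "$HOME/.hunspell/new_words"
```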
Note
You need to restart Eclipse for the new words to show up, because Eclipse caches the dictionary when it starts.
Understanding the .dic and .aff files
The .aff (affix) file cuts down on the number of entries in the .dic dictionary file. A word is stored once in the dictionary, with flags that say which common prefixes and suffixes it accepts. For example, the single entry build also covers builders and building.
I'm going to use build as the example.
Let's first look at the build entry in the dictionary file.
$ grep build en_US.dic
bodybuilder/SM
bodybuilding/M
build/SMRZGJ
builder/M
building/M
buildup/SM
outbuilding/MS
overbuild/SG
rebuild/SG
shipbuilder/SM
shipbuilding/M
The build entry carries the flags SMRZGJ. Let's see where those come from and what they stand for.
Let's start with the "S" and grep the affix file en_US.aff for "SFX S". SFX stands for suffix; the PFX lines in the same file define prefixes.
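The grep above matches every entry that contains "build". To pull the flags for exactly one entry, and then list the affix rules behind each flag, something like the following could be used. Both function names are hypothetical, and the sketch assumes single-character flags, which is what en_US uses.

```shell
# Hypothetical helper: print the flags of one exact dictionary entry.
flags_for_word() {
    local dic=$1 word=$2
    awk -F/ -v w="$word" '$1 == w { print $2 }' "$dic"
}

# Hypothetical helper: print every SFX/PFX line that belongs to the entry's flags.
# Assumes each flag is a single character.
rules_for_word() {
    local dic=$1 aff=$2 word=$3 flags flag i

    flags=$(flags_for_word "$dic" "$word")
    for (( i = 0; i < ${#flags}; i++ )); do
        flag=${flags:i:1}
        awk -v f="$flag" '($1 == "SFX" || $1 == "PFX") && $2 == f' "$aff"
    done
}

# Example calls:
# flags_for_word en_US.dic build
# rules_for_word en_US.dic en_US.aff build
```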
$ grep "SFX S" en_US.aff
SFX S Y 4
SFX S y ies [^aeiou]y
SFX S 0 s [aeiou]y
SFX S 0 es [sxzh]
SFX S 0 s [^sxzhy]
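The first line is a header: the flag, whether it may be combined with prefixes (Y), and the number of rules (4). Each rule line then gives the flag, the characters to strip from the end of the word (0 for none), the characters to add, and the condition the end of the word must match. The other flags work the same way.
The M flag matches when 's is on the end of the word, as in build's.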
$ grep "SFX M" en_US.aff
SFX M Y 1
SFX M 0 's .
The R flag matches when er is on the end of the word, as in builder.
$ grep "SFX R" en_US.aff
SFX R Y 4
SFX R 0 r e
SFX R y ier [^aeiou]y
SFX R 0 er [aeiou]y
SFX R 0 er [^ey]
The Z flag matches when ers is on the end of the word, as in builders.
$ grep "SFX Z" en_US.aff
SFX Z Y 4
SFX Z 0 rs e
SFX Z y iers [^aeiou]y
SFX Z 0 ers [aeiou]y
SFX Z 0 ers [^ey]
The G flag matches when ing is on the end of the word, as in building.
$ grep "SFX G" en_US.aff
SFX G Y 2
SFX G e ing e
SFX G 0 ing [^e]
The J flag matches when ings is on the end of the word, as in buildings.
$ grep "SFX J" en_US.aff
SFX J Y 2
SFX J e ings e
SFX J 0 ings [^e]
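To see how the rule lines above turn a stem into its other forms, here is a small sketch that applies the rules of one flag to one word. `apply_suffix` is a name I made up. It handles only the simple five-field SFX lines shown here, and it is not how Hunspell itself is implemented.

```shell
# Hypothetical helper: apply the SFX rules of one flag to a word.
# Arguments: path to the .aff file, flag letter, word.
apply_suffix() {
    local aff=$1 flag=$2 word=$3

    awk -v f="$flag" -v w="$word" '
        # Rule lines have five fields: SFX flag strip add condition.
        # The four-field header line (e.g. "SFX S Y 4") is skipped by NF >= 5.
        $1 == "SFX" && $2 == f && NF >= 5 {
            strip = ($3 == "0") ? "" : $3
            add   = ($4 == "0") ? "" : $4
            # The condition is matched against the end of the word.
            if (w ~ ($5 "$")) {
                stem = substr(w, 1, length(w) - length(strip))
                print stem add
            }
        }
    ' "$aff"
}

# Example calls:
# apply_suffix en_US.aff S build
# apply_suffix en_US.aff G build
```

With the S and G rules listed above, build becomes builds and building.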
Here are a couple of links with more detail on the affix file.
https://zverok.github.io/blog/2021-03-16-spellchecking-dictionaries.html