numberwords

This project provides wordlists for conversion between numbers and pronounceable words, and scripts with sample implementations of the necessary methods.

The wordlists were created with several principles in mind:

In addition, the encoding of geographical coordinates is presented as an application for such wordlists.

format of wordlists

The NUMBERs must be consecutive integers starting with 0, i.e. the last one must be MAX-1.

This permits a program using these files to convert between words and integers in the range 0..(MAX-1) or alternatively in the range 1..MAX. (In the latter case, the indices of course must be increased by 1.)

A program using these lists shall convert an integer (first column) to the first of the equivalent words on the corresponding line (second column), but should accept any of the words on that line when converting words to integers.

example

# 1001 lines of four-letter words
0 able abel
1 ache
2 acid
3 acre
4 aeon
...
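
The conversion rules above can be sketched in Python as follows. This is only a minimal sketch, not the project's own script: the names `parse_wordlist`, `to_word` and `to_index` are hypothetical, and it assumes that lines beginning with `#` are comments and that the fields are separated by whitespace, as the example suggests.

```python
def parse_wordlist(text):
    """Return (words, index) for a wordlist in the format shown above.

    words[n] is the list of equivalent words for integer n (preferred first);
    index maps every accepted word to its integer.
    Assumption: lines starting with '#' are comments and fields are
    whitespace-separated.
    """
    words = []
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        number, *equivalents = line.split()
        # The NUMBERs must be consecutive integers starting with 0.
        if int(number) != len(words) or not equivalents:
            raise ValueError(f"unexpected line: {line!r}")
        words.append(equivalents)
        for word in equivalents:
            index[word] = int(number)
    return words, index


def to_word(words, n):
    """Integer -> first (preferred) word of the corresponding line."""
    return words[n][0]


def to_index(index, word):
    """Word -> integer; any of the equivalent words is accepted."""
    return index[word]


# First lines of the example above, used as inline sample data.
SAMPLE = """\
# first lines of the four-letter list
0 able abel
1 ache
2 acid
3 acre
4 aeon
"""

words, index = parse_wordlist(SAMPLE)
print(to_word(words, 0))        # able
print(to_index(index, "abel"))  # 0
print(to_index(index, "acre"))  # 3
```

For the range 1..MAX instead of 0..(MAX-1), a caller would add 1 to the result of `to_index` and subtract 1 before calling `to_word`.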

files

The scripts are explained further below.


possible applications

choice of lists

The lists four and five can be used in several ways:

  1. only one of them;
  2. both alternately;
  3. combined as fourplus.

If the shortest possible words should be used, case 1 with only four might be best suited.

The combination (case 3) is useful if the number of words must be as small as possible.

The alternating use of both lists (case 2) is safest from the viewpoint of word recognition, as the lists have no words in common and therefore provide a small implicit verification check. In addition, the change in word length might make the sequence more comfortable to pronounce and memorize.

suitable numbers for conversion

If a fixed number of digits (or long integers) is to be encoded, it might be best to use only the four list, possibly pruned to exactly 1000 or 100 entries. If chunks of two digits are to be encoded (00..99) with a list of 1000 entries, an additional verification could be implemented by selecting the third digit based on a checksum; this would effectively reduce the number of valid words by a factor of ten.
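
The two-digit scheme with a check digit could look like the following sketch. The rule for the check digit (sum of the two payload digits modulo 10) is purely an illustrative assumption, since the text does not prescribe a particular checksum, and the function names are hypothetical. The resulting index would then be looked up in a list pruned to 1000 entries.

```python
def chunk_to_index(chunk):
    """Map a two-digit chunk (0..99) to a word index in 0..999.

    The third digit is a check digit; here (assumption) the sum of the
    two payload digits modulo 10.
    """
    if not 0 <= chunk <= 99:
        raise ValueError("chunk must be in 0..99")
    tens, ones = divmod(chunk, 10)
    return chunk * 10 + (tens + ones) % 10


def index_to_chunk(index):
    """Inverse of chunk_to_index; rejects indices with a wrong check digit."""
    chunk, check = divmod(index, 10)
    tens, ones = divmod(chunk, 10)
    if not 0 <= index <= 999 or check != (tens + ones) % 10:
        raise ValueError(f"index {index} fails the check")
    return chunk


print(chunk_to_index(42))    # 426
print(index_to_chunk(426))   # 42
```

Only 100 of the 1000 indices pass the check, so a misheard word that is still in the list is rejected in nine out of ten cases.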

If real numbers with an absolute value smaller than a certain maximum must be encoded, it might be best to first divide them by this maximum, thereby mapping them to the range from -0.9999... to 0.9999..., and then to encode just the fractional part: multiply it by the MAX value of the list, take the integer part of the product as the next word index, and continue with the remaining fractional part until the desired precision is reached. (In other words, represent the number in the base given by the length of the wordlist.)

A negative sign can be indicated by prepending the word minus, which is not present in the word lists.
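
The procedure for real numbers can be sketched as follows. This is a sketch under assumptions, not the project's implementation: the function names are hypothetical, a single list with `base` entries is assumed, and the sign is returned as a flag that a caller would turn into the word minus.

```python
from fractions import Fraction


def encode_real(value, maximum, base, count):
    """Encode value (|value| < maximum) as a sign flag and `count` digits.

    Each digit is an index into a wordlist with `base` entries.
    """
    x = Fraction(str(value)) / Fraction(str(maximum))
    if not -1 < x < 1:
        raise ValueError("absolute value must be smaller than maximum")
    negative = x < 0
    x = abs(x)
    digits = []
    for _ in range(count):
        x *= base
        digit = int(x)      # integer part becomes the next digit
        digits.append(digit)
        x -= digit          # continue with the fractional remainder
    return negative, digits


def decode_real(negative, digits, maximum, base):
    """Inverse of encode_real, up to the truncation error."""
    x = Fraction(0)
    for digit in reversed(digits):
        x = (x + digit) / base
    x *= Fraction(str(maximum))
    return -x if negative else x


sign, digits = encode_real("-25.5", 100, 1001, 2)
print(sign, digits)     # True [255, 255]
print(float(decode_real(sign, digits, 100, 1001)))
```

Exact fractions are used so that no floating-point rounding interferes with the digits; the decoded value differs from the original only by the truncation after the last digit.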

checksums

In case additional verification is needed, a checksum might be calculated and, e.g., its value modulo 23 added to the encoding words, with the first 23 words of the ICAO/ITU alphabet (alfa..whiskey) corresponding to the values 0..22.

Excluding the words x-ray, yankee, zulu keeps the sequence of encoding words free from x, y, z at the beginning. This may be helpful for specific applications.
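
A check word along these lines could be computed as in the following sketch. The checksum itself (a position-weighted sum of the word indices) is an assumption made for illustration, as the text leaves the choice of checksum open; `check_word` is a hypothetical name.

```python
# The first 23 words of the ICAO/ITU alphabet, for the values 0..22.
ICAO = ("alfa bravo charlie delta echo foxtrot golf hotel india juliett kilo "
        "lima mike november oscar papa quebec romeo sierra tango uniform "
        "victor whiskey").split()


def check_word(indices):
    """Return the check word for a sequence of word indices.

    Assumption: the checksum is the sum of the indices, each weighted
    by its position (1, 2, 3, ...), reduced modulo 23.
    """
    checksum = sum((pos + 1) * n for pos, n in enumerate(indices))
    return ICAO[checksum % 23]


print(check_word([238, 271]))   # victor
print(check_word([271, 238]))   # lima
```

Weighting by position means that swapping two words usually changes the check word, which a plain sum would not detect.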


demo implementation for fractional numbers

The script numwords.sh takes as arguments either a list of words or a fractional number between 0 and 1, and converts them into the other type. The wordlists can be defined in the environment variable NUMBERWORDS.

The script uses the tool dc for arbitrary-precision arithmetic, and might therefore be difficult to understand. However, it simply implements the conversion between fractional numbers in base 10 and in a mixed base, in which the base of each successive digit is the length of the corresponding wordlist.
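
For readers who prefer something other than dc, the same idea can be sketched in Python. This is not a translation of numwords.sh: the function names are hypothetical, and the list lengths 1930 (five) and 1001 (four) are taken from the figures quoted elsewhere in this document.

```python
from fractions import Fraction

BASES = [1930, 1001]    # assumed lengths of the lists five and four


def fraction_to_digits(value, bases):
    """Convert a fraction in 0..1 to one digit (word index) per base."""
    x = Fraction(value)     # exact for decimal strings such as "0.1234567"
    digits = []
    for base in bases:
        x *= base
        digit = int(x)      # integer part is the index into this list
        digits.append(digit)
        x -= digit          # the remainder is encoded with the next list
    return digits


def digits_to_fraction(digits, bases):
    """Inverse conversion; yields the lower bound of the encoded interval."""
    x = Fraction(0)
    for digit, base in zip(reversed(digits), reversed(bases)):
        x = (x + digit) / base
    return x


digits = fraction_to_digits("0.1234567", BASES)
print(digits)                                     # [238, 271]
print(float(digits_to_fraction(digits, BASES)))   # approximately 0.1234563
```

The decoded value agrees with the one printed in the examples below, which suggests that the script truncates in the same way; which words the two indices map to depends on the wordlists themselves.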

examples

With two words, one can encode six to seven digits:

$ export NUMBERWORDS="Dict/five Dict/four"
$ ./numwords.sh

usage: ./numwords.sh [words|fractional number]
  will convert between words out of wordlists,
  and a fractional number (pattern /[.0-9]*/ i.e between 0 and 1)
  wordlists: Dict/five Dict/four
   (may be set with NUMBERWORDS from the environment)
  allowing for 7 encoded digits or less

$ ./numwords.sh 0.1234567
bidder foam
$ ./numwords.sh bidder foam
.1234563

In this case, only six digits are reliably encoded. The value given in the usage information is only a rough estimate: the effective precision depends on the lengths of the word lists and their combination.


geographical coordinates

motivation

The initial idea for the "numberwords" project is stolen from the project what3words, which assigns to every patch of 3 m by 3 m on Earth's surface a unique combination of three words, its "address".

Unfortunately, the conversion algorithm is proprietary. Although the company promises that there will always be a free way to use the conversion facility, and that the algorithm will be transferred to some other entity in case the company ceases its operations, this is not satisfying.

A free and open alternative is preferable, because that is the only future-proof way.

encoding of geographical coordinates

Different applications might require different resolutions of geographical coordinates. We therefore propose a slightly different way of encoding them, instead of cutting Earth's surface into pieces of 9 square meters.

For the discussion below, angles will be expressed in degrees, with a full circle corresponding to 360 degrees.

Geographical latitude (90 degrees south to 90 degrees north) can be trivially projected onto a flat surface, because there is a linear correspondence between its value and the distance on Earth's surface from the equator or one of its poles to the point corresponding to the latitude: the arc length given by the latitude angle and the (mean) radius of the Earth.

Geographical longitude (180 degrees west to 180 degrees east) on the other hand is not linearly dependent on the distance between meridian zero (whatever that reference may be) and the meridian passing through the corresponding point on Earth's surface: a difference in longitude of one degree corresponds to about 111 km at the equator, but reduces towards zero with increasing northern or southern latitude.

simple linearisation

If simple linearisation is acceptable, the following conversion formula will be sufficient:

cLat = (90+Lat)/180
cLong = (180+Long)/360

where positive latitude is for the northern hemisphere, negative for the southern, and positive longitude is for the eastern hemisphere, negative for the western. This results in encoded values between 0 and 1 for both coordinates.

The inverse functions are trivial in this case:

Lat = 180*cLat-90
Long = 360*cLong-180
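
The four formulas translate directly into code. The following is a minimal sketch with hypothetical function names, using plain floating-point numbers rather than the arbitrary precision of the scripts.

```python
def encode_coordinates(lat, lon):
    """Degrees (north and east positive) -> pair of values between 0 and 1."""
    return (90 + lat) / 180, (180 + lon) / 360


def decode_coordinates(c_lat, c_lon):
    """Inverse of encode_coordinates."""
    return 180 * c_lat - 90, 360 * c_lon - 180


c_lat, c_lon = encode_coordinates(-17.9246, 25.8567)
print(c_lat, c_lon)
print(decode_coordinates(c_lat, c_lon))
```

Note that the north pole and the meridian at 180 degrees east map to exactly 1, which a purely fractional encoding cannot represent; a caller would have to treat these edge cases separately.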

Although this linearisation may be acceptable for most cases, it wastes precision at higher latitudes, where a given difference in longitude corresponds to an ever shorter distance. To avoid this, one could convert the two angle values of a coordinate pair in dependence on each other. However, the gain is rarely worth the complexity of the calculations needed, and therefore this has not been implemented here.

The resulting fractional numbers now can be converted to a sequence of words, as described above, with the number of words chosen freely according to the resolution needed.

demo implementation

The subdirectory coordinates of this repository contains a shell script coconv.sh with a reference implementation of the coordinate conversion.

The script uses dc for arbitrary-precision arithmetic, and might therefore not be too easy to understand. It is, however, just the dc implementation of the functions described above.

The word lists used for word conversion are chosen in the script at the beginning; they can also be specified with the environment variable NUMBERWORDS as noted in the script source and usage information. Currently, four words will be generated for a coordinate pair, from the wordlists five, four, five, four.

The combination of wordlists five and four results in a precision of 1/(1930*1001) or approximately 5E-7. For latitude (with Earth's half circle of about 20'000 km), the absolute precision is close to 10 m. For longitude, it varies from about 21 m at the equator to 15 m at 45 degrees north or south, 10 m at 60 degrees, and tends to 0 m at the poles.
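
These figures can be reproduced with a few lines of arithmetic. The sketch below assumes a spherical Earth with a half circle of exactly 20'000 km, as in the text; the names are hypothetical.

```python
import math

STEP = 1 / (1930 * 1001)      # smallest encodable difference, about 5.2E-7
HALF_CIRCLE_M = 20_000_000    # pole to pole in metres, as assumed in the text


def latitude_precision():
    """Distance in metres covered by one encoding step in latitude."""
    return STEP * HALF_CIRCLE_M


def longitude_precision(latitude_degrees):
    """Distance in metres covered by one encoding step in longitude."""
    full_circle = 2 * HALF_CIRCLE_M * math.cos(math.radians(latitude_degrees))
    return STEP * full_circle


print(f"latitude: {latitude_precision():.1f} m")
for degrees in (0, 45, 60, 89):
    print(f"longitude at {degrees} degrees: {longitude_precision(degrees):.1f} m")
```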

examples

Convert the coordinates of the Victoria Falls according to OpenStreetMap (OSM) into a four-word sequence:

$ ./coconv.sh -17.9246 25.8567
fulcrum sole motive pest

Convert the coordinates of the post office at Livingston Way in Victoria Falls:

$ ./coconv.sh -17.9270 25.8406
fulcrum skip motive mock

Convert the last result back into a URL for direct OSM display:

$ ./coconv.sh :osm fulcrum skip motive mock
http://openstreetmap.org/?mlat=-17.9270460&mlon=25.8404400

(about 5 m south and 17 m west of the initial coordinates)


(2015 Y.C.Bonetti)