Welcome to the FreeBMD District Aliasers' home page. Starting from this page you should find all the information necessary for you to understand what we are trying to do, the process which shows how we plan to do it, the software available to help us and any reference material that is available to help us decide what to do in unclear circumstances.
URLs of reference material listing districts, sub-districts and place names
Caution - Easily Confused Districts - Some similar names in the same volumes cannot be separated using conditional aliasing if too many UCF characters are present. Confusion can also be caused by saints' names being used along with other districts.
Spelling - Deciding if an alias is misspelt or not - including a list of abbreviations of common words that are to be interpreted as correct alternative spellings - compiled by Philip Powell
Common Endings - Lists to save you working out for yourself if conditional aliasing can be used to separate spellings with common endings
File Format - Details of the file format of the files we are editing, and the matching algorithm which explains the treatment of the ? character and leading and trailing spaces. Written from emails from Dave Mayall.
Database Tables - Trying to understand the database tables and array used during district aliasing - from reading the Perl code
The files edited by the district aliasers are used during the phase of building a new database where the data from the transcribers is added into the database as one row in the table called Submissions. An extra number is added. This number links it to one row in the table Districts.
DistrictNumber added during database build
1. The A.txt file is processed - Each line causes a new entry to be created in the table Districts and it is automatically given the next number
2. Every alias to that district is entered in the table DistrictSynonyms with this number
3. The ! misspelling indicator is removed from the alias and replaced by a 1 in DistrictSynonyms
4. Every line in A.txt to Z.txt is treated in the same way
5. A new entry is added from the transcribed file - the district is looked up in the table DistrictSynonyms to find a number which is added to Submissions.
6. If the district is unknown, it is added to Districts and DistrictSynonyms first to get the next number, and then the new entry is added to Submissions
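The build steps above can be sketched in a few lines of Python. This is only an illustration of the numbering logic, not the real build software (which is written in Perl). The class and method names are hypothetical, the three tables are modelled as plain in-memory containers, and it is an assumption that the standard district name is itself recorded in DistrictSynonyms. The spelling "Castel Ward" is invented to show the unknown-district case.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical in-memory stand-in for the three tables."""
    districts: dict = field(default_factory=dict)    # DistrictNumber -> DistrictName
    synonyms: dict = field(default_factory=dict)     # spelling -> (DistrictNumber, misspelt flag)
    submissions: list = field(default_factory=list)  # (entry, transcribed district, DistrictNumber)

    def add_district(self, name, aliases=()):
        """Steps 1-3: create the district, then file every alias under its number."""
        number = len(self.districts) + 1  # "the next number"
        self.districts[number] = name
        # Assumption: the standard name is also stored as a synonym of itself.
        self.synonyms[name] = (number, 0)
        for alias in aliases:
            if alias.startswith("!"):
                # The ! indicator is dropped and recorded as a 1 instead.
                self.synonyms[alias[1:]] = (number, 1)
            else:
                self.synonyms[alias] = (number, 0)
        return number

    def add_submission(self, entry, district):
        """Steps 5-6: look the district up, creating it first if it is unknown."""
        if district not in self.synonyms:
            self.add_district(district)
        number, _misspelt = self.synonyms[district]
        self.submissions.append((entry, district, number))
        return number


db = Database()
db.add_district("Castle Ward", ["C Ward", "Castle W.", "![COQ]Ward"])
db.add_submission("Williams William", "C Ward")       # known alias, gets number 1
db.add_submission("Williams William", "Castel Ward")  # unknown, gets the next number
```

Note that the unknown spelling ends up as a district in its own right. That is why it later needs an aliaser to connect it to the correct standard district.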
During searching, the list offered in the Districts field on the search page is the list of districts held in the table DistrictSynonyms. Although the DistrictName field is displayed to the user to pick from, the search software uses the corresponding number in the DistrictSynonyms table. Every entry in the Submissions table that matches the other search criteria and also has this number is included in the search results. The information displayed in the search results is taken from the table Submissions but does not include the number. Consequently the results show the original spelling as it was transcribed.
2. Usage of DistrictNumber during search
e.g. Searching for William Williams in Castle Ward might return the following, assuming that aliases have been set up to Castle Ward for each of C Ward, Castle W. and ![COQ]Ward.
Name              District     Volume  Page
Williams William  Castle Ward  10b     123
Williams William  C Ward       24      45
Williams William  Castle W.    10b     678
Williams William  [COQ]Ward    1_b     __
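The search behaviour described above can be illustrated with a small Python sketch. It is not the real search software. The function name and the two in-memory structures are hypothetical, and the "St Thomas" row is invented purely to show an entry that is filtered out. The point is that matching is done on the district number, while the result shows the district exactly as it was transcribed.

```python
# Hypothetical miniature of DistrictSynonyms: spelling -> DistrictNumber.
district_synonyms = {
    "Castle Ward": 1,
    "C Ward": 1,
    "Castle W.": 1,
    "[COQ]Ward": 1,
    "St Thomas": 2,
}

# Hypothetical miniature of Submissions:
# (name, district as transcribed, volume, page, DistrictNumber)
submissions = [
    ("Williams William", "Castle Ward", "10b", "123", 1),
    ("Williams William", "C Ward", "24", "45", 1),
    ("Williams William", "Castle W.", "10b", "678", 1),
    ("Williams William", "[COQ]Ward", "1_b", "__", 1),
    ("Williams William", "St Thomas", "5b", "9", 2),  # invented row, different district
]


def search(name, district_name):
    """Match on the district number, but return the transcribed spelling."""
    number = district_synonyms[district_name]
    return [
        row[:4]  # the number itself is not shown to the searcher
        for row in submissions
        if row[0] == name and row[4] == number
    ]


results = search("Williams William", "Castle Ward")
for row in results:
    print(*row, sep="  ")
```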
If you are interested in learning a little more, there is a reference page that describes some of the tables in the database and how they are used.
There are two levels of problems that need to be avoided during aliasing.
what we produce must not cause the software used to build the database to fail
we must take a reasonable amount of care that we do not alias an entry to the wrong district
Syntax / Layout Errors
Since it takes several days to build the database, this software has been written assuming that no errors exist in the A.txt type of district map files. Through experience we know that the pitfalls that need to be avoided are
misplaced space characters
precautions need to be taken every time a file is edited or passed from one machine to another so that extra space characters are not introduced
when emailing files this may happen if the files are not zipped first
when editing the file with an editor
when editing the file using dmEdit
? characters being included in the district map file
they can exist in the transcribed files, which are held in the database table Submissions, and should exist here if they exist in the original
they should not exist in the A.txt type of district map files, which give the spellings that appear in the database table DistrictSynonyms for all previously known about aliases of the standard districts
duplicate entries
the same spelling can be aliased against only one district, unless the % mechanism for conditional aliasing is used. Using the % is an advanced feature that is described on the File Format page.
typos being introduced
when typing or cutting and pasting into the dmEdit single alias entry field be careful not to introduce misplaced spaces
accidental multiple insertion of | ! and % characters into aliases that include space characters due to your own way of processing files to be input into dmEdit (e.g. in an editor be careful using 'find / replace' together with 'replace all')
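As a rough illustration of the pitfalls listed above, the following Python sketch scans district map lines for them. It is not a replacement for the format checking in dmEdit. The function name is hypothetical, and the line layout it assumes (fields separated by |, with the standard name first) is a simplification; the File Format page is the authority on the real layout and on how leading and trailing spaces are treated. The sample lines are invented.

```python
def check_district_map(lines):
    """Return (line_number, message) pairs for content worth reviewing.

    Assumed layout (a simplification): fields separated by |, a leading !
    marking a misspelling, and % marking a conditional alias.
    """
    problems = []
    seen = {}  # spelling -> line number where it first appeared
    for line_number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if "?" in text:
            problems.append((line_number, "contains a ? character"))
        for marker in ("||", "!!", "%%"):
            if marker in text:
                problems.append((line_number, f"repeated marker {marker}"))
        for field in text.split("|"):
            if not field:
                continue  # empty piece before a leading | or after a trailing |
            if field != field.strip() or "  " in field:
                problems.append((line_number, f"suspicious spaces in {field!r}"))
            if "%" in field:
                continue  # conditional aliases may legitimately repeat a spelling
            spelling = field.strip().lstrip("!")
            if spelling in seen:
                problems.append(
                    (line_number, f"duplicate of {spelling!r} on line {seen[spelling]}")
                )
            else:
                seen[spelling] = line_number
    return problems


sample = [
    "Castle Ward|C Ward|Castle W.|![COQ]Ward|",
    "Exeter|St Thomas |!!St.Thomas|",
    "Birmingham|C Ward|Aston?|",
]
problems = check_district_map(sample)
for line_number, message in problems:
    print(f"line {line_number}: {message}")
```

The sketch reports findings for a person to review rather than changing anything, since a space that looks misplaced may be intentional under the real matching rules.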
Semantic / Meaning Errors
If we alias an entry to the wrong district this is equivalent to an archivist putting a document back in the wrong shelf section at a record office. When someone comes to look for the information later (using either a district or a county as part of their search) they will not find it where they expect to, and unless they find it by accident, it becomes lost.
So the rule of thumb here is if in doubt, do not alias.
At present there is no systematic activity (review) to look for incorrect aliases. Some will be spotted as new aliases are created, and others will be emailed to us by searchers. Over a period of time we may be able to draw up some guidelines on how to detect and avoid making these mistakes.
The first such example, emailed to us by a transcriber who checked their own entries in 2002, was the case of the incorrect aliases
|!St.Thomas Birmingahm|!St.Thomas Birmingham|
which are both entries for Birmingham, Warwickshire, being aliased to
St Thomas
which is in Exeter, Devon.
Further investigation showed another incorrect alias of
|!St Thomas Westmr.|
Where to get your files
As an aliaser you will need dmEdit, the software used to make the edits and check the format of the data file. Each month you will also need a set of data files to edit and a list of unmatched district spellings, compiled from the database, that are the candidates for being made district aliases.
How to get your files to edit
About 24 - 30 hours after the database update is started, the file ToBeAliased.txt is produced, which contains the spellings that need to be aliased during this round of aliasing. The co-ordinator decides which portions of this file are allocated to each aliaser. These are the spellings and corresponding volume numbers that are either to be converted into aliases or returned to the co-ordinator as 'dregs' that the aliaser cannot see how to alias.
The co-ordinator extracts the district map letter files e.g. K.txt from the CVS system, which is where the master copy is kept, and zips the corresponding files plus the allocated portion of ToBeAliased.txt and emails them to the aliaser. It is important to not use the files that you may have left over from a previous round of edits since it is possible that someone, most probably the co-ordinator but it could be anyone on the software team, has made a change to the file since you returned it at the end of a previous round of edits.
Where aliases are found that are for letters other than those allocated to the aliaser, they are emailed as cross-letter aliases to the appropriate person.
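The cross-letter split can be pictured with a short Python sketch. It assumes, for illustration only, that an alias belongs to the letter file matching the first letter of the standard district name; the function name and the sample pairs are hypothetical.

```python
from collections import defaultdict


def split_cross_letter(proposed, allocated_letters):
    """Separate proposed (district, alias) pairs into own and cross-letter.

    Assumption: the letter file is chosen by the first letter of the
    standard district name, e.g. Castle Ward belongs in C.txt.
    """
    allocated = {letter.upper() for letter in allocated_letters}
    own = []
    cross = defaultdict(list)  # letter -> pairs to email to that letter's aliaser
    for district, alias in proposed:
        letter = district[0].upper()
        if letter in allocated:
            own.append((district, alias))
        else:
            cross[letter].append((district, alias))
    return own, dict(cross)


own, cross = split_cross_letter(
    [("Castle Ward", "C Ward"), ("Birmingham", "St.Thomas Birmingham")],
    allocated_letters="C",
)
```

Here the Castle Ward alias stays with the aliaser holding C.txt, while the Birmingham alias is set aside to be emailed to whoever holds B.txt.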
Getting and using dmEdit
A special district aliasing editor called dmEdit has been written for the aliasers by Ian Brooke, the author of WinBMD.
There is a separate page about how to use dmEdit from which it can be downloaded.
The co-ordinator looks through the 'dregs' files returned by the aliasers and from them decides which are worth looking up to see if they can be aliased, and which are not worthwhile looking up since they could not be aliased anyway based on what has been transcribed in the district and volume columns. The co-ordinator then uses the http://www.freebmd.org.uk/cgi/aliasing-tools.pl page to find the full information that has been transcribed for the entry.
Those that have been identified as worth looking up are split by contiguous years and allocated in spreadsheet format among those members of the team who are available for scan lookups during the round of aliasing.
When the completed spreadsheets are returned, the co-ordinator produces a cross-letter alias file for those that can be aliased and emails it to those who have been allocated the letter files to edit, as described in the previous section of this file. For those that cannot be aliased, the co-ordinator adds them to an appropriate list, e.g. (looked up - illegible), to stop the same spellings from appearing in the ToBeAliased.txt file during the following round of aliasing.
How Corrections are Collated and sent to the Corrections Co-ordinator