Welcome to the How FreeBMD District Aliasers Work page. This page should provide all the information necessary to participate as a district alias team member.
Roles involved in the district aliasing process
From time to time it will be necessary for you to contact other members of the district alias team by email.
During each round of edits, email is the main means of communication between members of the district alias team.
For those performing lookups, whether from scan or from microfilm / microfiche at an LDS centre, the minimum email activities are:-
receiving a spreadsheet in .xls format as a mail attachment containing the entries which are to be looked up
after completing the extra columns in the spreadsheet, including indicating which corrections are to be submitted, emailing the spreadsheet to both the coordinator and the corrections collator
For those editing the district map files, the minimum email activities are:-
being informed of which letters are allocated to each team member for the upcoming round of edits
receiving your allocated files as a zipped email attachment
receiving from the co-ordinator the aliases based on the scan lookups
sending aliases found in your files that need to be included in the files being edited by another team member and receiving them from those working on the other letters
returning your edited district map files, together with the 'dregs' files of spellings that could not be aliased, as a zipped file to the co-ordinator
the co-ordinator informing the deployment manager when all edited files have been tested and are ready for the start of the next build
For the page ranger(s) the email activities are:-
sending aliases found that need to be included in the files being edited by another team member
sending the dregs to the co-ordinator
For the testers the email activities are:-
receiving a group of standard districts that are to be checked for correct use of date and volume dependent aliases
sending aliases or corrections found to be necessary to the person allocated the appropriate letter file(s) for editing
where appropriate, sending Extras_from_testing.xls style spreadsheets to the co-ordinator
sending any problems found to the co-ordinator
For the corrections collator the minimum email activities are:-
receiving copies each month of the spreadsheets after completion by
the scan lookup team
Extras.xls style files from the lookup teams
Extras_from_testing.xls style files from the testers
extracting the corrections from the spreadsheets, reformatting them into a single spreadsheet and emailing it to the Corrections Co-Ordinator at the end of each round of aliasing
if there are too many corrections in any round of aliasing, prioritizing them so that those giving the most benefit are processed first
those affecting the surname, given names and page numbers are given highest priority
those that make the difference between being able to make an alias or not are given second highest priority. For those rounds of aliasing where too many corrections are identified, the coordinator will identify these after the district map files have been tested and returned for the start of the next database update.
those that involve a correction to a volume number are given third highest priority
those that involve a substantial change to the spelling of the district are given higher priority than simple corrections such as changing an l to a t
those that are minor changes can be put aside until a later round of aliasing so that the Corrections Coordinator does not become overwhelmed by our activities
changes where the transcriber has correctly used the UCF characters will be given lowest priority for correction
receiving feedback from the Corrections Co-Ordinator and forwarding it within the aliasing team as appropriate
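The prioritization rules above can be sketched in Python as a simple ranking. This is only an illustration, not a tool the collator actually uses: the Correction record and its fields are hypothetical stand-ins for spreadsheet columns, and treating "transcriber used UCF correctly" as overriding every other rule is an assumption.

```python
from dataclasses import dataclass


# Hypothetical record for one correction; these field names are illustrative
# and are not the column headings of the real spreadsheets.
@dataclass
class Correction:
    entry: str
    affects_name_or_page: bool = False  # surname, given names or page number
    enables_alias: bool = False         # decides whether an alias can be made
    volume_change: bool = False         # corrects the volume number
    substantial_spelling: bool = False  # more than e.g. an l changed to a t
    ucf_used_correctly: bool = False    # transcriber applied UCF correctly


def priority(c: Correction) -> int:
    """Return a rank where 0 is processed first, following the list above."""
    if c.ucf_used_correctly:  # assumption: lowest priority overrides the rest
        return 5
    if c.affects_name_or_page:
        return 0
    if c.enables_alias:
        return 1
    if c.volume_change:
        return 2
    if c.substantial_spelling:
        return 3
    return 4  # minor change: a candidate for deferring to a later round


def split_round(corrections, budget):
    """Split into (send this round, defer to a later round).

    sorted() is stable, so corrections of equal rank keep their input order.
    """
    ranked = sorted(corrections, key=priority)
    return ranked[:budget], ranked[budget:]


if __name__ == "__main__":
    batch = [
        Correction("minor l/t fix"),
        Correction("UCF ok", ucf_used_correctly=True),
        Correction("surname", affects_name_or_page=True),
        Correction("volume", volume_change=True),
        Correction("alias", enables_alias=True),
    ]
    now, later = split_round(batch, budget=3)
    print([c.entry for c in now])    # ['surname', 'alias', 'volume']
    print([c.entry for c in later])  # ['minor l/t fix', 'UCF ok']
```

The budget stands in for "too many corrections in any round"; anything past it is simply held back for a later round.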
Nowadays there are usually only a handful of entries per round that require a scan lookup and so there is rarely a need to split the lookups between team members as described in the rest of this section.
Since Bob Philips is involved in a co-ordinator role for the John Slann syndicate, every effort will be made for the scan lookups of members of the John Slann syndicate to be allocated to him. If any other current or future members of the scan lookup teams have similar preferences please email the coordinator.
In the early days of scan lookups it was discovered that one individual was responsible for 5% of the errors that were targeted during that month. Further investigation showed that 94% of the entries in one of his files contained transcription errors. Diana volunteered, and agreed with the transcriber's syndicate co-ordinator, to re-transcribe his files herself. This way he does not need to be inundated with corrections as we find them one by one. His entries can either be aliased as normal or hidden by means of a special (looked up - hide) pseudo-district that has been created specifically for this purpose in the lookups.txt file. Any further scan lookups for this individual will be allocated to Diana so that she can provide the correct entry, but these corrections will not be forwarded at present.
The lists created in the file lookups.txt cannot be used until the database build has completed. It then takes a further few days before the submissions queries are run using http://www.freebmd.org.uk/cgi/aliasing-tools.pl and the results sorted by year in Excel. Where known, entries from years that have not been scanned and entries submitted as part of one-name studies are removed from the worksheet. Because aliases are made on spelling + volume combinations, removing these entries can also make some of the other pre-selected lookups a waste of time, since even if they can be read the spelling still cannot be aliased. These are also removed from the worksheet.
The remaining entries are split by year or as a group of consecutive years into separate spreadsheets and emailed to a member of the scan lookup team. Where possible smaller files are allocated to those who are busy co-ordinating their syndicates and larger files are allocated to those who are members of the scan lookup team and also transcribers.
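As a rough sketch of the splitting step, assuming each lookup has already been reduced to a (year, description) pair, the Python below drops entries from years known not to be scanned and groups the rest into batches of consecutive years. The function name, the tuple shape and the max_per_batch cap are all hypothetical; the page itself describes sorting in Excel, and this only illustrates the grouping logic.

```python
from itertools import groupby


def batch_by_years(entries, unscanned_years, max_per_batch):
    """Group (year, description) lookups into batches of consecutive years.

    entries         -- iterable of (year, description) tuples (hypothetical)
    unscanned_years -- years known not to be scanned; these entries are dropped
    max_per_batch   -- soft cap on entries per batch (hypothetical parameter);
                       a single year is never split across two batches
    """
    usable = sorted(e for e in entries if e[0] not in unscanned_years)
    batches, current, last_year = [], [], None
    for year, group in groupby(usable, key=lambda e: e[0]):
        rows = list(group)
        consecutive = last_year is not None and year == last_year + 1
        # Start a new batch when there is a gap in the years or the cap is hit.
        if current and (not consecutive or len(current) + len(rows) > max_per_batch):
            batches.append(current)
            current = []
        current.extend(rows)
        last_year = year
    if current:
        batches.append(current)
    return batches
```

Each resulting batch would then become one spreadsheet for one member of the scan lookup team, with smaller batches going to the busier syndicate co-ordinators.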
Because the way the scans to be looked up are identified does not match the way they are looked up, the aliases cannot be created until all the scans have been looked up without adding complexity and time. It is therefore important to return your scan lookups on time and, if you cannot, to return some of them to the co-ordinator for reallocation to others if necessary.
The reason for working this way is that it is faster to find the files in the GUS structure, it ensures that multiple problems in the same file end up with the same person to look up, and it minimizes the number of scan downloads that the district aliasing team needs to do.
Allocation of letters to the district map editing team
If you have a preference for editing any particular letter then email the co-ordinator while the database build is in progress. When the build has progressed far enough (approx 24 hours after the start of the build), you will receive an email from the co-ordinator advising which portion of ToBeAliased.txt is allocated to each person involved this month and who will be editing each part of the district map. The co-ordinator downloads ToBeAliased.txt from the page http://www.freebmd.org.uk/cgi/aliasing-tools.pl and splits it into portions. You will then receive an individual email with a zipped attachment containing the letters you have been allocated: your portion of ToBeAliased.txt and the data files to be edited and returned, e.g. C.txt.
The easy part of district aliasing is editing the data file (e.g. C.txt) that has been sent to you, by finding entries in your allocated portion of ToBeAliased.txt that correspond to the standard spellings of the recognized districts. A specialized editor, dmEdit, has been written for this purpose and has its own web page.
The much harder part is deciding whether a spelling should be aliased or not, and which form of aliasing should be used.
Not every unaliased spelling in the ToBeAliased.txt file needs to be aliased. Some of them are genuine place names, often listed as subdistricts, which may well have been treated as registration districts especially in the early years of aliasing. Sometimes it is necessary to use the district spelling together with the volume number in deciding which alias or aliases to create. If two or more possibilities exist, it may be possible to resolve them by means of the conditional alias mechanism described below.
Having decided which entries should be aliased, the next decision is how to alias them. The first step is to decide whether the entry is an alternative spelling or a spelling mistake. A spelling mistake is indicated by preceding the alias with a ! character.
Obvious alternative spellings
Since 1837 central index books have been compiled on a quarterly basis for each of births, marriages and deaths from the registrations made in a number of registration districts spread across England and Wales.
Within any given quarter there was a fixed number of registration districts, and many of the district names were abbreviated, or even misspelt, as the central list was made. In some cases different alternative spellings were used, and this is most noticeable in districts with long names containing several words.
For a more detailed discussion about the difference between an abbreviation and a spelling mistake, please refer to the district aliasers' page about Spelling.
There is a wide range of reasons why a spelling may be a mistake, including
Typing problems
'fat fingering' eg j or jk instead of k; n instead of b - Nlackburn
'key-board bounce' eg aa instead of a - Blaackburn
'transposing' eg Balckburn for Blackburn
typing first names in the district column
typing volume number in the district column
Type-setting problems, e.g. Kensingham, Tur6tt
Problems in reading which can result in either the uncertain character format being used or something incorrect being read instead. This problem is not limited to this transcription exercise but can also have been introduced as the index was compiled and as the type-set version was prepared
Transcription problems as the index was compiled
The first half of one entry being finished by the second half of another entry is a classic example of this
Incorrect information being recorded on the original certificate for a variety of reasons including
illiteracy
deaths being reported by non-family members, although hopefully that does not affect the registration district!
The problems shown above in red are candidates for being hidden using district aliasing.
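The typing problems listed above (a wrong key, a doubled key, transposed letters) are each a single edit, so one hypothetical way to suggest candidate standard districts for an unknown spelling is an edit-distance search. The sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance; it is not part of dmEdit or any FreeBMD tool described on this page, and any suggestion it makes still needs the judgement described in the rest of this page.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and transpositions of adjacent letters each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion ('key-board bounce')
                          d[i - 1][j - 1] + cost)  # substitution ('fat fingering')
            # Adjacent transposition, e.g. Balckburn for Blackburn.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def candidates(spelling, standard_districts, max_distance=1):
    """Standard districts within max_distance edits (case-insensitive)."""
    s = spelling.lower()
    return [d for d in standard_districts if osa_distance(s, d.lower()) <= max_distance]
```

For example, Nlackburn, Blaackburn and Balckburn are each one edit from Blackburn, while candidates("Loxdon", ["London", "Loddon", "Lexden"]) returns ["London", "Loddon"]: a near match on its own is not enough to choose an alias.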
All incomplete spellings require a ! immediately before the alias without any space or other character. Incomplete spellings should have been transcribed according to the 'uncertain character format', but in many cases this convention has not been followed. Among the more common de facto formats that have been observed are:-
single characters - usage of characters such as
*
?
.
whole entries - misuse of both square brackets and parentheses to surround an entire uncertain spelling
A subdistrict, and variant spellings and misspellings, should be aliased against the main registration district. A subdistrict should be treated as a valid alternative and not as a spelling mistake. The initial list of district equivalencies was compiled from the booklet St Catherine's House Districts by Ray Wiggins and http://www.ukbmd.org.uk/genuki/reg/.
Dealing with places other than districts and sub-districts
The GENUKI page http://www.ukbmd.org.uk/genuki/places/ may be useful in locating place names that are neither districts nor sub-districts. The original 1837 registration districts were identical to the districts covered by the Poor Law Unions, i.e. the groups (or unions) of parishes which administered the workhouses for the destitute. Many of them were reorganized within 18 months of the introduction of Civil Registration. For a list of these Poor Law Unions, you can consult http://www.workhouses.org.uk/England/UnionsEngland.shtml which is just one page of Peter Higginbotham's web-site on workhouses.
Deciding how to deal with a spelling that is not on the district list is subjective to a certain extent, and the options available are:-
use the % conditional alias mechanisms to alias those spellings from one volume to a certain standard district and those from another volume to a different standard district
choose to leave it unaliased (if in doubt, this is the preferred option) as part of the 'dregs' file and as a candidate for a scan lookup and leave it for the co-ordinator to work out what to do with it
where so little information remains in what has been transcribed that it is impossible to make an educated guess at what it could read, alias it as (illegible) to remove it from the list of spellings that need to be aliased after the next database build is completed.
Sometimes a spelling that has been partially conditionally aliased will keep recurring on the list of spellings to be looked up each month. This could be because
a new entry has been included in the latest database build that uses a volume number that has not been used before
the aliasing done in previous rounds did not cover all the cases where the spelling had been used. This is less likely to happen now that the file ToBeAliased.txt is used as the starting point for creating aliases (Aug 2003) than it was when unk.txt was used, because the volume number is now available alongside the spelling when the aliases are created, instead of just the spelling.
Use of the % conditional alias provides a mechanism for the same spelling in a file from the transcribers to be
treated as an alias of one district under a certain set of conditions (i.e. is given 1 district number)
or as an alias of another district under another set of conditions (i.e. is given a different district number).
These conditions can be based either upon a quarter + year, or on a volume number.
For date related aliasing the % occurs before the alias, and for volume related at the end of the alias.
For full details see the File Format page.
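To make the notation concrete, here is a minimal Python sketch that splits a district map line into its parts. It assumes the simplified "District |alias|alias" shape used in the examples on this page, with an optional leading ! (spelling mistake), an optional leading % (date dependent) and an optional trailing %volume (volume dependent); the real file format has further details on the File Format page, and the names Alias, parse_alias and parse_line are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Alias:
    spelling: str
    mistake: bool          # leading "!": a spelling mistake, not a variant
    date_dependent: bool   # "%" before the spelling
    volume: Optional[str]  # text after a trailing "%", e.g. "2", "1c", "8e"


def parse_alias(text: str) -> Alias:
    """Parse one alias such as '!%Loxdon%2' or 'Barrow%8e' (assumed shape)."""
    mistake = text.startswith("!")
    if mistake:
        text = text[1:]
    date_dependent = text.startswith("%")
    if date_dependent:
        text = text[1:]
    spelling, sep, volume = text.partition("%")
    return Alias(spelling.strip(), mistake, date_dependent, volume.strip() if sep else None)


def parse_line(line: str) -> Tuple[str, List[Alias]]:
    """Split 'District |alias|alias' into the district and its aliases."""
    district, *aliases = line.split("|")
    return district.strip(), [parse_alias(a.strip()) for a in aliases if a.strip()]
```

For example, parse_line("City of London |!%Loxdon%2|!%Loxdon%1c") gives the district City of London and two aliases for Loxdon, both marked as spelling mistakes and date dependent, one for volume 2 and one for volume 1c.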
A volume dependent conditional alias should be used whenever the decision to make the alias depends on the fact that the target district is the only district in that volume that matches the spelling.
Note: Before Barrie reduced the time needed to build the database by resizing the files used by MySQL we needed to keep the number of conditional aliases to a minimum, but we do not need to be limited by that constraint any longer.
Barrow upon Soar was a registration district when registration was introduced, and Barrow in Furness started in 1876. From then on both districts existed at the same time. So date dependent aliasing is not appropriate and volume dependent conditional aliases should be used where necessary.
From 1837 until mid 1876 there was only one Barrow, so clerks are likely to have used plain "Barrow", and only started to use "Barrow upon Soar" when the new Barrow in Furness district came into use. It would be rare for Barrow in Furness ever to have been abbreviated to "Barrow".
In consequence "Barrow" is aliased to Barrow upon Soar, and "Barrow%8e" to Barrow in Furness. Any examples of Barrow with a "wrong" volume will attach to Barrow upon Soar, probably correctly.
It is not always feasible to use aliases to relate non-standard district spellings to values on the standard list.
Consider a theoretical entry of Loxdon. Most people will recognize it as being only one letter different from London. So looking through the standard list for London shows
1837 - 4Q1869, vol 2 then 1c, City of London
1837 - 4Q1869, vol 2 then 1c, East London
1837 - 4Q1869, vol 2 then 1c, West London
1870 onwards London City 1c
So if investigation shows that the entry for Loxdon comes from 1870 or later, it can be aliased unconditionally to London City. But if the date was prior to 1870, there is no mechanism for aliasing it across City of London, East London and West London.
But there are less well known places that may also have been transcribed as Loxdon, and these can be discriminated by use of the volume information.
1837 onwards, vol 13 then 4b, Loddon
1837 onwards, vol 12 then 4a, Lexden
So, if there were many entries, both before and after 1870, across all of these volumes, the following could be used.
City of London |!%Loxdon%2|!%Loxdon%1c
London City |!%Loxdon%2|!%Loxdon%1c
Loddon |!Loxdon%13|!Loxdon%4b
Lexden|!Loxdon
So, by default Loxdon would be aliased to Lexden unless the volume number was 1c, 2, 4b or 13, in which case the conditional aliases would be used instead. For entries using volumes 1c and 2 the date dependent aliasing mechanism would also be applied. Note that for entries prior to 1870 in the London area (volumes 1c and 2) the aliases have been made to City of London, so any entries for East London or West London would show up there instead.
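The way the Loxdon and Barrow aliases are meant to resolve can be illustrated with the Python sketch below. It is a model of the behaviour described above, not the database update software: the alias table is copied from the examples, the district date ranges are simplified to whole years, and treating several matching conditional aliases as unresolved is an assumption.

```python
from typing import Optional

# (district, spelling, date_dependent, volume) taken from the examples above.
ALIASES = [
    ("City of London", "Loxdon", True, "2"),
    ("City of London", "Loxdon", True, "1c"),
    ("London City", "Loxdon", True, "2"),
    ("London City", "Loxdon", True, "1c"),
    ("Loddon", "Loxdon", False, "13"),
    ("Loddon", "Loxdon", False, "4b"),
    ("Lexden", "Loxdon", False, None),
    ("Barrow upon Soar", "Barrow", False, None),
    ("Barrow in Furness", "Barrow", False, "8e"),
]
# Years in which each date-dependent district is valid (simplified to years).
VALID_YEARS = {"City of London": range(1837, 1870), "London City": range(1870, 10000)}


def resolve(spelling: str, volume: str, year: int) -> Optional[str]:
    """Return the standard district for a transcribed entry, or None."""
    default = None
    matches = []
    for district, alias, date_dependent, alias_volume in ALIASES:
        if alias != spelling:
            continue
        if alias_volume is None and not date_dependent:
            default = district  # the unconditional (default) alias
            continue
        if alias_volume is not None and alias_volume != volume:
            continue  # volume condition not met
        if date_dependent and year not in VALID_YEARS.get(district, range(0)):
            continue  # date condition not met
        matches.append(district)
    if len(matches) == 1:
        return matches[0]
    if not matches:
        return default  # None means the spelling stays unaliased
    return None  # several conditional matches: treated here as unresolved
```

For example, resolve("Loxdon", "2", 1850) gives City of London, resolve("Loxdon", "13", 1845) gives Loddon, and resolve("Loxdon", "12", 1845) falls back to the default, Lexden. Likewise Barrow in volume 8e goes to Barrow in Furness and any other volume to Barrow upon Soar.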
The volume numbers in use prior to 1852 may be transcribed in Roman form (e.g. XIII) from handwritten scans or in Arabic form (e.g. 13) from typed scans. For aliasing purposes we use only the Arabic form. Thus as shown in Example 2 above
Loddon |!Loxdon%13 is the correct form
Loddon |!Loxdon%XIII would not work
This applies to any "valid" Roman number, as validated by the database update software. The way we make aliases using Roman numbers has to correspond to the way that the database update software treats them.
If it is necessary to create aliases with invalid or incomplete Roman numbers (e.g. XIVI) or with UCF characters (e.g. X*I), these should be used as they are, as follows
Loddon |!Loxdon%XIVI
Loddon |!Loxdon%X*I
There is a special case with Roman numbers, namely IIII is converted to Arabic 4.
IIII isn't a particularly peculiar conversion, although it is probably the most commonly encountered Roman numeral that breaks the "rule" against having four of the same character together instead of using subtraction.
IIII is commonly found on clocks, because it achieves a more balanced layout than IV.
Of course, real Romans were well known for breaking the "rules", and constructions such as VIIII rather than IX were seen.
For FreeBMD purposes, though, only IIII is handled outwith normal Roman numeral rules. The way it is handled ($roman=~s/IIII/IV/;), or in plain English, replacing every occurrence of IIII by IV and then converting to Arabic, means that all the following will be accepted as valid volume numbers:
There is no constraint that prevents Roman numbers over 27, so a volume of XXXIIII (34) would be accepted. XXXXIIII (44) would not, as we don't translate XXXX as XL, and it fails Roman validation.
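The IIII rule and Roman validation can be sketched in Python as follows. The regular expression is the usual pattern for well-formed Roman numerals and is an assumption about what the database update software treats as "valid"; volumes that fail it (XIVI, X*I, XXXXIIII) and Arabic volumes such as 1c are returned unchanged, which matches how such aliases are written above.

```python
import re

# Usual pattern for a well-formed Roman numeral (1 to 3999). Assumed to
# approximate the "roman validation" done by the database update software.
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert an already validated Roman numeral to an integer."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = VALUES[ch]
        # Subtract when a smaller numeral precedes a larger one (IV, XL, ...).
        if nxt in VALUES and VALUES[nxt] > value:
            total -= value
        else:
            total += value
    return total


def normalise_volume(volume: str) -> str:
    """Return the Arabic form of a valid Roman volume, else the volume as is."""
    # count=1 mirrors $roman=~s/IIII/IV/; which, having no /g flag,
    # replaces only the first IIII before validation.
    candidate = volume.replace("IIII", "IV", 1)
    if candidate and ROMAN_RE.fullmatch(candidate):
        return str(roman_to_int(candidate))
    return volume
```

So XIII becomes 13, IIII becomes 4 and XXXIIII becomes 34, while XXXXIIII is left alone because XXXXIV fails validation.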
Potential Problems Resulting from Incomplete Conditional Aliasing
Omitting to specify a default behaviour for a conditional alias can result in the same spelling appearing on the list of unknown spellings during a later build.
This can be avoided by always providing one alias without a % (the default behaviour) and adding conditional aliases to it to cater for the entries that have actually been transcribed. This technique is used regularly for spellings where some of the entries are illegible: the spelling is aliased unconditionally to (illegible), and a volume dependent conditional alias is created for each volume number that can be aliased unambiguously.
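A quick check for this problem can be sketched in Python: list every spelling that only ever appears in conditional aliases (those containing %) and so has no default. The simplified "District |alias|alias" line shape and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def spellings_without_default(lines):
    """Return spellings that only ever appear in conditional aliases.

    lines -- district map lines in the simplified 'District |alias|alias'
             shape used in the examples on this page (an assumption).
    """
    has_default = defaultdict(bool)
    for line in lines:
        _district, *aliases = line.split("|")
        for alias in aliases:
            alias = alias.strip().lstrip("!")  # the mistake marker is irrelevant here
            if not alias:
                continue
            conditional = "%" in alias
            spelling = alias.strip("%").split("%")[0].strip()
            has_default[spelling] |= not conditional
    return sorted(s for s, ok in has_default.items() if not ok)
```

With the Loxdon lines from the example above plus a lone "Barrow in Furness|Barrow%8e", only Barrow is reported, until a default such as "Barrow upon Soar|Barrow" is added.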
Dealing with other ambiguities
All spellings that cannot be identified as being any of the standard aliases are considered by the coordinator on a case by case basis as being
recognizable as a valid place name, for example a region such as one of the old hundreds (e.g. Kerrier), or a Valley, or a county (e.g. Northumberland), or a town where the correct standard district is not known (e.g. Gosport). These are collected together in the file unaliasable.txt in the category (unaliasable - filter)
a candidate for a scan lookup in the file lookups.txt
unaliasable - sometimes the entry can only be narrowed down to a small group of districts, for example
(unaliasable - but Ashford)
(unaliasable - but a Ward)
(unaliasable - Berk/East/Hemel/West Hampstead)
(unaliasable - but London county)
(unaliasable - Mitford or Mutford)
(unaliasable - but Newcastle)
(unaliasable - but Newport)
(unaliasable - but Wellington)
or as being totally illegible when it is held as (illegible) in the file illegible.txt
This activity has been renamed District Alias Testing and has been moved to a later stage in the process.
At this stage the full details of how we can do this are not clear, as the new software which was developed to help in this area returns a <search timed out> message rather than the data that was expected.
Note: This is due to the size of the database.
Correcting the category from 'Variant' to 'Spelling mistake' and vice-versa can be done using dmEdit. Any alias that includes the use of an uncertain format character should be prefixed with a ! character. If any such aliases are found without a ! dmEdit displays these as warnings (with a pop-up window) in magenta.
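dmEdit already performs this check; the Python below is only a hypothetical sketch of the same idea for readers who want to see the logic. The set of uncertain characters is an assumption (the de facto * and ? listed earlier plus _, [ and ]) and should be checked against the UCF documentation, and the line shape is the simplified one used in the examples on this page.

```python
# Assumed set of uncertain-character-format symbols; verify against the UCF page.
UNCERTAIN_CHARS = set("*?_[]")


def missing_bang(lines):
    """Yield (district, alias) pairs whose alias contains an uncertain
    character but is not prefixed with '!'. Assumes 'District |alias|alias'."""
    for line in lines:
        district, *aliases = line.split("|")
        for alias in (a.strip() for a in aliases):
            if alias and not alias.startswith("!") and UNCERTAIN_CHARS & set(alias):
                yield district.strip(), alias
```

For the line "Blackburn |Bl*ckburn|!Bl?ckburn|Blackburne" only Bl*ckburn would be reported.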
Moving from one registration district to another.
This is the usual root cause of 'phantom duplicates'. If a spelling is moved from one letter to another, and the file containing the addition is returned before the file containing the removal, a phantom duplicate will result.
This can be worked around by the co-ordinator removing all the districtmap data files [A-Z].txt from the test machine and only explicitly updating those files where the edits have been completed during this round of edits.
Note: If not all files are edited, it is still necessary to complete the set by doing a cvs update to get the full set of files and to repeat the testing prior to the start of the build. This tests for conflicting duplicate entries between new aliases created during this round of edits and unchanged files from the previous round.
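The kind of cross-file conflict described above can be sketched as a check over the full set of [A-Z].txt files. This Python is a hypothetical illustration, not the co-ordinator's test procedure: it assumes the simplified line shape used in the examples on this page and treats a "duplicate" as the same unconditional spelling appearing in more than one file.

```python
from collections import defaultdict


def cross_file_duplicates(files):
    """Find unconditional spellings aliased in more than one file.

    files -- mapping of file name (e.g. 'C.txt') to its lines, in the assumed
             'District |alias|alias' shape. Conditional aliases (containing
             '%') are skipped because they legitimately repeat a spelling.
    """
    seen = defaultdict(set)
    for name, lines in files.items():
        for line in lines:
            _district, *aliases = line.split("|")
            for alias in aliases:
                alias = alias.strip().lstrip("!")
                if alias and "%" not in alias:
                    seen[alias].add(name)
    return {s: sorted(names) for s, names in seen.items() if len(names) > 1}
```

Running it over a mix of freshly edited and unchanged files would flag a spelling that has been added to its new letter while the old letter file still contains it.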
During the review process it may be useful to refer to a number of reports that are created towards the end of each database build. These can either be viewed (but slowly!) by use of the left mouse button or saved onto your own machine by using the right mouse button and doing 'Save Target As' or 'Save link as' (for Internet Explorer and Netscape respectively)
Note: The first two pages take a long time to load and display, even on broadband
district-map-all.html - list of all 'complete list of district' entries in the 1st column with their aliases shown in the 2nd column, with misspellings shown in red (approx 2.7Mb, but increasing by a few hundred kb after each build)
district-clash.html - list of all 'complete list of district' entries in the 1st column with the years for which entries exist in the database which do not match the volume numbers for the years, as defined in the districtmap data files [A-Z].txt (approx 780kb and growing)
district-page-reverse-map-index.html - needs to be used online; a page of links to the page range information for each of births, marriages and deaths for the years 1837 through 1901. Each file is organised by quarter, within quarter by volume, and within volume by the page number of the first occurrence of a standard district, and shows the range of pages on which that district has been found. Unfortunately it is derived from data in the database, so it is distorted by the errors.
There is also a new page http://www.freebmd.org.uk/cgi/aliasing-tools.pl and a description of how these can be used will be added to this page shortly. The most important file is ToBeAliased.txt which replaces unk.txt as the file containing the spellings that are to be aliased during the round of district aliasing.
Returning files after editing
After the District Map Editor has completed editing the district map file, it needs to be tested and returned to CVS, where the complete set of district map files is tested together in time for the start of the next database update.
District Map Editor
Check for errors within single file
Check for errors across own set of files
Zip and email files to co-ordinator
Co-ordinator
The co-ordinator will comply with the development process and treat the set of files as a piece of software.
Return files to CVS
Test for errors between files
How to request improvements to these web pages or the distributed files
If you have comments for changes or improvements to these pages please send an email to the co-ordinator.