FreeBMD How District Aliasers Work

Welcome to the How FreeBMD District Aliasers Work page. This page should provide all the information necessary to participate as a district alias team member.

Roles involved in the district aliasing process

From time to time it will be necessary for you to contact other members of the district alias team by email.

If you need to email groups of people here are some email links that may be useful.

Former members of the district aliasing team are:-

Regular email activities

During each round of edits, email is the main means of communication between members of the district alias team.

For those performing lookups, whether from scan or from microfilm / microfiche at an LDS centre, the minimum email activities are:-

  1. receiving a spreadsheet in .xls format as a mail attachment containing the entries which are to be looked up
  2. after completing the extra columns in the spreadsheet, including indicating which corrections are to be submitted, emailing the spreadsheet to both the coordinator and the corrections collator

For those editing the district map files, the minimum email activities are:-

  1. being informed of which letters are allocated to each team member for the up-coming round of edits
  2. receiving allocated files as a mail message attachment as a zipped file
  3. receiving aliases based on the scan lookups from the coordinator
  4. sending aliases found in your files that need to be included in the files being edited by another team member and receiving them from those working on the other letters
  5. returning your edited district map files and the 'dregs' files of those spellings that could not be aliased as a zipped file to the co-ordinator
  6. the co-ordinator informing the deployment manager when all edited files have been tested and are ready for the start of the next build

For the page ranger(s) the email activities are:-

  1. sending aliases found that need to be included in the files being edited by another team member
  2. sending the dregs to the co-ordinator

For the testers the email activities are:-

  1. receiving a group of standard districts that are to be checked for correct use of date and volume dependent aliases
  2. sending aliases or corrections found to be necessary to the person allocated the appropriate letter file(s) for editing
  3. where appropriate, sending Extras_from_testing.xls style spreadsheets to the co-ordinator
  4. sending any problems found to the co-ordinator

For the corrections collator the minimum email activities are:-

  1. receiving copies each month of the spreadsheets after completion by
    1. the scan lookup team
    2. Extras.xls style files from the lookup teams
    3. Extras_from_testing.xls style files from the testers
  2. extracting the corrections from the spreadsheets and reformatting them into a single spreadsheet and emailing it to the Corrections Co-Ordinator at the end of each round of aliasing.
    If there are too many corrections in any round of aliasing, prioritizing the corrections so that those giving the most benefit are processed first
    1. those affecting the surname, given names and page numbers are given highest priority
    2. those that make the difference between being able to make an alias or not are given second highest priority. For those rounds of aliasing where too many corrections are identified, the coordinator will identify these after the district map files have been tested and returned for the start of the next database update.
    3. those that involve a correction to a volume number are given third highest priority
    4. those that involve a substantial change to the spelling of the district are given higher priority than simple corrections such as changing an l to a t
    5. those that are minor changes can be put aside until a later round of aliasing so that the Corrections Coordinator does not become overwhelmed by our activities
    6. changes where the transcriber has correctly used the UCF characters will be given lowest priority for correction
  3. receiving feedback from the Corrections Co-Ordinator and emailing it within the aliasing team as appropriate
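
The priority order in item 2 above can be expressed as a ranked sort. The following is a minimal sketch in Python, not the collator's actual procedure; the category labels and the dictionary layout are hypothetical and chosen only for illustration.

```python
# Hypothetical category labels; the ranks follow the priority list above.
PRIORITY = {
    "name_or_page": 1,          # surname, given names, page numbers
    "enables_alias": 2,         # makes the difference to whether an alias is possible
    "volume_number": 3,         # correction to a volume number
    "substantial_spelling": 4,  # substantial change to the district spelling
    "minor_spelling": 5,        # e.g. changing an l to a t
    "correct_ucf": 6,           # transcriber used UCF correctly
}


def prioritise(corrections):
    """Return the corrections sorted so that the most beneficial come first."""
    # sorted() is stable, so corrections of equal priority keep their order.
    return sorted(corrections, key=lambda c: PRIORITY[c["category"]])
```

Corrections near the end of the sorted list are the ones that could be held back for a later round.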

Allocation of scan lookups to team

Nowadays there are usually only a handful of entries per round that require a scan lookup and so there is rarely a need to split the lookups between team members as described in the rest of this section.

Since Bob Philips is involved in a co-ordinator role for the John Slann syndicate, every effort will be made for the scan lookups of members of the John Slann syndicate to be allocated to him. If any other current or future members of the scan lookup teams have similar preferences please email the coordinator.

In the early days of scan lookups it was discovered that one individual was responsible for 5% of the errors that were targeted during that month. Further investigation showed that 94% of the entries in one of his files contained transcription errors. Diana volunteered and agreed with the transcriber's syndicate co-ordinator that she would re-transcribe his files herself. This way it is not necessary for him to be inundated with the corrections, as we find them one by one. His entries can either be aliased as normal or hidden by means of a special (looked up - hide) hidden pseudo-district that has been created specifically for this purpose in the lookups.txt file. Any further scan lookups for this individual will be allocated to Diana for her to provide the correct entry, but these corrections will not be forwarded at present.

The lists created in the file lookups.txt cannot be used until the database build has completed. It then takes a further few days before submissions queries are done using http://www.freebmd.org.uk/cgi/aliasing-tools.pl and then sorted using Excel by year. Where known those years which are not scanned and those entries submitted as parts of one-name studies are removed from the worksheet. Because aliases are made by spelling + volume combinations, this can also result in some of the other pre-selected lookups being a waste of time since even if they can be read they cannot be aliased. These are also removed from the worksheet.

The remaining entries are split by year or as a group of consecutive years into separate spreadsheets and emailed to a member of the scan lookup team. Where possible smaller files are allocated to those who are busy co-ordinating their syndicates and larger files are allocated to those who are members of the scan lookup team and also transcribers.
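
The filtering and splitting described above is done in Excel, but the logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names `year` and `submitter`, and the way unscanned years and one-name-study submitters are supplied, are hypothetical.

```python
from collections import defaultdict


def split_lookups_by_year(entries, unscanned_years, one_name_submitters):
    """Drop entries that cannot usefully be looked up, then group the rest by year."""
    by_year = defaultdict(list)
    for entry in entries:
        if entry["year"] in unscanned_years:
            continue  # no scan exists for this year
        if entry["submitter"] in one_name_submitters:
            continue  # submitted as part of a one-name study
        by_year[entry["year"]].append(entry)
    # Order the groups by year, mirroring the sort by year described above.
    return dict(sorted(by_year.items()))
```

Each resulting group, or a run of consecutive years, would correspond to one spreadsheet sent to a member of the scan lookup team.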

Because the way the scans to be looked up are identified does not match the way the scans are looked up, the aliases cannot be created until all the scans have been looked up, at least not without added complexity and time. It is therefore important to return the scan lookups on time and, if you cannot, to return some to the co-ordinator for reallocation to others.

The reason for working this way is that it is faster to find the files in the GUS structure this way and it ensures that multiple problems in the same file end up with the same person to look up, and minimizes the number of scan downloads that need to be done by the district aliasing team.

Allocation of letters to the district map editing team

If you have a preference for editing any particular letter then email the co-ordinator while the database build is in progress. When the build has progressed far enough (approx 24 hours after the start of the build), you will receive an email from the co-ordinator advising you of which portion of the ToBeAliased.txt is allocated to each person involved this month and who will be editing each part of the district map. The file ToBeAliased.txt will be downloaded from the page http://www.freebmd.org.uk/cgi/aliasing-tools.pl by the co-ordinator and split into portions. You will receive an individual email with an attachment of a zipped file containing the letters that you have been allocated. In it will be a portion of ToBeAliased.txt and the data files to be edited and returned, e.g. C.txt
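
As a rough illustration of splitting ToBeAliased.txt into portions by letter, the sketch below groups lines by their initial letter. The layout of ToBeAliased.txt is not described on this page, so the sketch assumes, purely for illustration, that each non-empty line begins with the unaliased spelling; the `other` bucket is likewise a hypothetical choice.

```python
from collections import defaultdict


def split_by_letter(lines):
    """Group lines by the first letter of the spelling (assumed to start the line)."""
    portions = defaultdict(list)
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        first = text[0].upper()
        # Spellings that do not start with a letter go into a separate bucket.
        key = first if first.isalpha() else "other"
        portions[key].append(text)
    return dict(portions)
```

The lines gathered under `C` would then accompany C.txt in the zipped file sent to whoever is allocated that letter.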

District aliasing tasks

The easy part of district aliasing is to edit the data file (e.g. C.txt) that has been sent to you by trying to find entries in your allocated portion of ToBeAliased.txt that correspond to the standard spellings of the recognized districts. To do this a specialized editor, dmEdit, has been written. It has its own web page.

The much harder part is deciding whether a spelling should be aliased or not, and which form of aliasing should be used.

Deciding what to edit

Not every unaliased spelling in the ToBeAliased.txt file needs to be aliased. Some of them are genuine place names, often listed as subdistricts, which may well have been treated as registration districts especially in the early years of aliasing. Sometimes it is necessary to use the district spelling together with the volume number in deciding which alias or aliases to create. If two or more possibilities exist, it may be possible to resolve them by means of the conditional alias mechanism described below.

Having decided which entries should be aliased the next decision is how to alias them. The first choice is to decide if the entry is an alternative spelling or if it is a spelling mistake. A spelling mistake is indicated by being preceded by a ! character.

Obvious alternative spellings

Since 1837 central index books have been compiled on a quarterly basis for each of births, marriages and deaths from the registrations made in a number of registration districts spread across England and Wales.

Within any given quarter there were a fixed number of registration districts and many of the district names were abbreviated, or even misspelt, as the central list was made. In some cases different alternative spellings were used, and this is most noticeable in districts with long names containing several words.

For a more detailed discussion about the difference between an abbreviation and a spelling mistake, please refer to the district aliasers' page about Spelling.

Obvious spelling mistakes

There is a wide range of reasons why something is a spelling mistake, including
  1. Typing problems
    1. 'fat fingering' eg j or jk instead of k; n instead of b - Nlackburn
    2. 'key-board bounce' eg aa instead of a - Blaackburn
    3. 'transposing' eg Balckburn for Blackburn
    4. typing first names in the district column
    5. typing volume number in the district column
  2. Type-setting problems
  3. Problems in reading which can result in either the uncertain character format being used or something incorrect being read instead. This problem is not limited to this transcription exercise but can also have been introduced as the index was compiled and as the type-set version was prepared
  4. Transcription problems as the index was compiled
  5. Incorrect information being recorded on the original certificate for a variety of reasons including
    1. illiteracy
    2. deaths being reported by non-family members, although hopefully that does not affect the registration district!
The problems shown above in red are candidates for being hidden using district aliasing.
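
The three single-keystroke typing problems listed above can be told apart mechanically when a candidate standard spelling is known. The function below is a minimal sketch and is not part of dmEdit; the function name and the labels it returns are hypothetical, and it does not cover inserted neighbouring keys such as jk for k.

```python
def classify_typo(typed, standard):
    """Label a typed spelling relative to a standard spelling, or return None."""
    if len(typed) == len(standard):
        diffs = [i for i, (a, b) in enumerate(zip(typed, standard)) if a != b]
        if len(diffs) == 1:
            return "substitution"  # e.g. n instead of b
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and typed[diffs[0]] == standard[diffs[1]]
                and typed[diffs[1]] == standard[diffs[0]]):
            return "transposition"  # two adjacent letters swapped
    elif len(typed) == len(standard) + 1:
        for i in range(1, len(typed)):
            # A repeated letter whose removal gives the standard spelling.
            if typed[i] == typed[i - 1] and typed[:i] + typed[i + 1:] == standard:
                return "key bounce"
    return None
```

For example, `classify_typo("Balckburn", "Blackburn")` reports a transposition.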

Dealing with incomplete spellings

All incomplete spellings require a ! immediately before the alias without any space or other character. Incomplete spellings should have been transcribed according to the 'uncertain character format', but in many cases this convention has not been followed. Among the more common de facto formats that have been observed are:-

Dealing with sub-districts

A subdistrict, and variant spellings and misspellings, should be aliased against the main registration district. A subdistrict should be treated as a valid alternative and not as a spelling mistake. The initial list of district equivalencies was compiled from the booklet St Catherine's House Districts by Ray Wiggins and http://www.ukbmd.org.uk/genuki/reg/.

Dealing with places other than districts and sub-districts

The GENUKI page http://www.ukbmd.org.uk/genuki/places/ may be useful in locating place names that are neither districts nor sub-districts. The original 1837 registration districts were identical to the districts covered by the Poor Law Unions, i.e. the groups (or unions) of parishes which administered the workhouses for the destitute. Many of them were reorganized within 18 months of the introduction of Civil Registration. For a list of these Poor Law Unions, you can consult http://www.workhouses.org.uk/England/UnionsEngland.shtml which is just one page of Peter Higginbotham's web-site on workhouses.

Dealing with ambiguous spellings

Deciding how to deal with a spelling that is not on the district list is subjective to a certain extent, and the options available are:-
  1. use the % conditional alias mechanisms to alias those spellings from one volume to a certain standard district and those from another volume to a different standard district
  2. choose to leave it unaliased (if in doubt, this is the preferred option) as part of the 'dregs' file and as a candidate for a scan lookup and leave it for the co-ordinator to work out what to do with it
  3. where so little information remains in what has been transcribed that it is impossible to make an educated guess at what it could read, alias it as (illegible) to remove it from the list of spellings that need to be aliased after the next database build is completed.

Dealing with spellings that you've seen before

Sometimes a spelling that has been partially conditionally aliased will keep reoccurring on the list of spellings to be looked up each month. This could be because
  1. a new entry has been included in the latest database build that uses a volume number that has not been used before
  2. the aliasing done in the previous rounds of aliasing did not cover all the cases where the spelling had been used. This is less likely to happen since the file ToBeAliased.txt is used as the starting point for creating aliases (Aug 2003) than it was when unk.txt was used, since now the volume number is available alongside the spelling when the aliases are created instead of just the spelling.

Using the % conditional alias mechanisms

Use of the % conditional alias provides a mechanism for the same spelling in a file from the transcribers to be treated as an alias of one district under a certain set of conditions (i.e. is given 1 district number) or as an alias of another district under another set of conditions (i.e. is given a different district number).

These conditions can be based either upon a quarter + year, or on a volume number. For date related aliasing the % occurs before the alias, and for volume related at the end of the alias. For full details see the File Format page.
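
To make the positions of the markers concrete, the sketch below takes apart alias tokens of the forms used in the examples on this page, such as Barrow%8e and !%Loxdon%2. It is inferred only from those examples, not from the File Format page, so the order of the leading ! and %, and the assumption that a spelling never contains a % itself, should be treated as unconfirmed.

```python
def parse_alias(token):
    """Split an alias token into its parts (layout inferred from this page's examples)."""
    token = token.strip()
    mistake = token.startswith("!")          # leading ! marks a spelling mistake
    if mistake:
        token = token[1:]
    date_dependent = token.startswith("%")   # % before the alias: date related
    if date_dependent:
        token = token[1:]
    # % at the end of the alias introduces a volume number.
    spelling, separator, volume = token.partition("%")
    return {
        "spelling": spelling,
        "mistake": mistake,
        "date_dependent": date_dependent,
        "volume": volume if separator else None,
    }
```

With this reading, `parse_alias("Barrow%8e")` gives the spelling Barrow restricted to volume 8e, with neither the mistake nor the date flag set.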

A volume dependent conditional alias should be used whenever the choice of making the alias includes the fact that it is the only spelling in the volume that matches the spelling.

Note: Before Barrie reduced the time needed to build the database by resizing the files used by MySQL we needed to keep the number of conditional aliases to a minimum, but we do not need to be limited by that constraint any longer.

Example 1 of Using % - Barrow upon Soar and Barrow in Furness

Barrow upon Soar was a registration district when registration was introduced, and Barrow in Furness started in 1876. From then on both districts existed at the same time. So date dependent aliasing is not appropriate and volume dependent conditional aliases should be used where necessary.

From 1837 until mid 1876 there was only one Barrow, and clerks are likely to have used plain "Barrow", and only started to use "Barrow upon Soar" when the new Barrow in Furness came into use. It would be rare for Barrow in Furness ever to have been abbreviated "Barrow".

In consequence "Barrow" is aliased to Barrow upon Soar, and "Barrow%8e" to Barrow in Furness. Any examples of Barrow with a "wrong" volume will attach to Barrow upon Soar, probably correctly.
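
The effect of these two aliases can be sketched as a lookup that tries the volume-specific alias first and falls back to the default. This is an illustration of the behaviour described here, not the database update software; the table layout and function name are hypothetical.

```python
# (spelling, volume) -> standard district; a volume of None is the default alias.
ALIASES = {
    ("Barrow", None): "Barrow upon Soar",
    ("Barrow", "8e"): "Barrow in Furness",
}


def resolve(spelling, volume):
    """Prefer the volume dependent alias, otherwise use the default alias."""
    return ALIASES.get((spelling, volume), ALIASES.get((spelling, None)))
```

An entry of Barrow in volume 8e resolves to Barrow in Furness, while Barrow with any other volume attaches to Barrow upon Soar.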

Example 2 of Using % - Loxdon

It is not always feasible to use aliases to be able to relate non-standard district spellings to values on the standard list.

Consider a theoretical entry of Loxdon. Most people will recognize it as being only one letter different from London. So looking through the standard list for London shows

So if investigation shows that the entry for Loxdon was transcribed in 1870 or later then it can be aliased unconditionally to London City. But if the date was prior to 1870 there is no mechanism for aliasing it across City of London, East London and West London.

But there are less well known places that may also have been transcribed as Loxdon, and these can be discriminated by use of the volume information.

So, if there were many entries, both before and after 1870, across all of these volumes, the following could be used.
City of London |!%Loxdon%2|!%Loxdon%1c
London City |!%Loxdon%2|!%Loxdon%1c
Loddon |!Loxdon%13|!Loxdon%4b
Lexden|!Loxdon

So, by default Loxdon would be aliased to Lexden unless the volume number was 1c, 2, 4b or 13, when the conditional aliases would be used instead. For entries using volume 1c and 2 the date dependent aliasing mechanism would also be applied. Note that for entries prior to 1870 in the London area (volumes 1c and 2) the aliases have been made to City of London, and so any entries for East London or West London would show up there instead.
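
The combined outcome for Loxdon can be written out as a small decision function. This is a sketch of the result described in this example only; it works on the year alone, ignores the quarter, and is an assumption about how the date condition would be evaluated rather than a description of the real software.

```python
def resolve_loxdon(volume, year):
    """Return the standard district that the spelling Loxdon would be given."""
    if volume in {"1c", "2"}:
        # London area: the date decides between the two standard districts.
        return "London City" if year >= 1870 else "City of London"
    if volume in {"13", "4b"}:
        return "Loddon"
    return "Lexden"  # default alias for every other volume
```

So `resolve_loxdon("2", 1845)` gives City of London, and `resolve_loxdon("4a", 1880)` falls through to Lexden.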

Roman Volume Numbers and Volume Numbers using UCF

The volume numbers in use prior to 1852 may be transcribed in Roman form (e.g. XIII) from handwritten scans or in Arabic form (e.g. 13) from typed scans. For aliasing purposes we use only the Arabic form. Thus as shown in Example 2 above
Loddon |!Loxdon%13 is the correct form
Loddon |!Loxdon%XIII would not work

This applies to any "valid" Roman number, as validated by the database update software. The way we make the aliases using Roman numbers has to correspond to the way that the database update software treats them. If it is necessary to create aliases with invalid or incomplete Roman numbers (e.g. XIVI) or those with UCF (e.g. X*I) then these should be used as they are as follows
Loddon |!Loxdon%XIVI
Loddon |!Loxdon%X*I

Special Case - Roman Number IIII is Converted to Arabic 4

There is a special case with Roman numbers, namely IIII is converted to Arabic 4. IIII isn't a particularly peculiar conversion, although it is probably the most regularly encountered roman numeral which breaks the "rule" on never having 4 of the same character together, and using subtraction instead. IIII is commonly found on clocks, because it achieves a more balanced layout than IV. Of course, real Romans were well known for breaking the "rules", and constructions such as VIIII rather than IX were seen.

For FreeBMD purposes though, only IIII is handled outwith normal roman numeral rules. The manner in which it is handled ( $roman=~s/IIII/IV/; ), or in plain English, replace IIII by IV and then convert to Arabic, means that all the following will be accepted as valid volume numbers:

IIII (4)
VIIII (9)
XIIII (14)
XVIIII (19)
XXIIII (24)

There is no constraint that prevents roman numbers over 27, so a volume of XXXIIII (34) would be accepted. XXXXIIII (44) would not, as we don't translate XXXX as XL, and it fails roman validation.
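
When writing volume dependent aliases it can help to work out which form of a volume number to use. The sketch below is in Python and does not copy the database update software, whose validation code is not shown on this page. Instead of the substitution quoted above, it widens the units part of a conventional Roman numeral pattern to allow up to four I's, an assumption chosen to reproduce the outcomes listed in this section. Anything that fails validation is returned unchanged, matching the rule that invalid Roman numbers and those with UCF are used as they are.

```python
import re

# Conventional Roman numeral pattern, with the units part widened from
# I{0,3} to I{0,4} so that IIII, VIIII, XIIII and so on are accepted.
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,4})")
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_arabic(text):
    """Return the value of a Roman volume number, or None if it is not valid."""
    if not text or not ROMAN.fullmatch(text):
        return None
    total = 0
    for i, ch in enumerate(text):
        value = VALUES[ch]
        following = VALUES[text[i + 1]] if i + 1 < len(text) else 0
        # A smaller symbol before a larger one is subtracted (IV, IX, XL ...).
        total += -value if value < following else value
    return total


def volume_for_alias(volume):
    """Give the Arabic form for a valid Roman volume, else the volume as it is."""
    number = roman_to_arabic(volume)
    return volume if number is None else str(number)
```

Here `volume_for_alias("XIII")` gives 13, while XIVI, X*I and XXXXIIII come back unchanged.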

Potential Problems Resulting from Incomplete Conditional Aliasing

Omitting to specify a default behaviour for a conditional alias can result in the same spelling appearing on the list of unknown spellings during a later build.

This can be avoided by always providing one alias without a % (the default behaviour) and adding conditional aliases to it to cater for the entries that have actually been transcribed. This technique is used regularly for spellings in which some characters are illegible: they are aliased unconditionally to (illegible), and a volume dependent conditional alias is created for each volume number that can be unambiguously aliased.
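
A missing default can be spotted by scanning the alias lines for spellings that only ever appear with a %. The sketch below assumes the line layout used in Example 2 (a standard district followed by aliases separated by |) and is not a feature of dmEdit; the function name is hypothetical.

```python
def spellings_without_default(lines):
    """Return spellings that have conditional aliases but no default alias."""
    conditional, default = set(), set()
    for line in lines:
        _district, *aliases = line.split("|")
        for alias in aliases:
            token = alias.strip().lstrip("!")
            # Drop a leading % (date related) and anything after a later % (volume).
            spelling = token.lstrip("%").split("%")[0]
            if "%" in token:
                conditional.add(spelling)
            else:
                default.add(spelling)
    return conditional - default
```

Run against the four Loxdon lines from Example 2 it returns an empty set, because the Lexden line supplies the default.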

Dealing with other ambiguities

All spellings that cannot be identified as being any of the standard aliases are considered by the coordinator on a case by case basis as being

Reviewing the existing aliases

This activity has been renamed District Alias Testing and has been moved to a later stage in the process.

At this stage the full details of how we can do this are not clear, as the new software which was developed to help in this area returns a <search timed out> message rather than the data that was expected.

Note: This is due to the size of the database.

  1. Correcting the category from 'Variant' to 'Spelling mistake' and vice-versa can be done using dmEdit. Any alias that includes the use of an uncertain format character should be prefixed with a ! character. If any such aliases are found without a ! dmEdit displays these as warnings (with a pop-up window) in magenta.
  2. Moving from one registration district to another.
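
The warning described in item 1 amounts to a simple test on each alias. The sketch below is not dmEdit's code. Only * is confirmed on this page as an uncertain character format (UCF) marker; the other characters in the set are assumptions made for illustration.

```python
# Assumed UCF markers; only * is confirmed by this page.
UCF_CHARS = set("*_?[]{}")


def missing_mistake_marker(alias):
    """True if an alias uses a UCF character but has no leading '!'."""
    token = alias.strip()
    if token.startswith("!"):
        return False
    return any(ch in UCF_CHARS for ch in token)
```

An alias such as Bla*kburn would be flagged, whereas !Bla*kburn would not.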
During the review process it may be useful to refer to a number of reports that are created towards the end of each database build. These can either be viewed (but slowly!) by use of the left mouse button or saved onto your own machine by using the right mouse button and doing 'Save Target As' or 'Save link as' (for Internet Explorer and Netscape respectively). There is also a new page http://www.freebmd.org.uk/cgi/aliasing-tools.pl and a description of how these can be used will be added to this page shortly. The most important file is ToBeAliased.txt, which replaces unk.txt as the file containing the spellings that are to be aliased during the round of district aliasing.

Returning files after editing

After the District Map Editor has completed editing the district map file, it needs to be tested and returned to CVS, where the complete set of district map files is tested together in time for the start of the next database update.

District Map Editor

  1. Check for errors within single file
  2. Check for errors across own set of files
  3. Zip and email files to co-ordinator

Co-ordinator

The co-ordinator will comply with the development process and treat the set of files as a piece of software.
  1. Return files to CVS
  2. Test for errors between files

How to request improvements to these web pages or the distributed files

If you have comments for changes or improvements to these pages please send an email to the co-ordinator.

District aliasers' home page | dmEdit page | File Format page

FreeBMD Main Page


Search engine, layout and database Copyright © 1998-2022 Free UK Genealogy CIO, a charity registered in England and Wales, Number 1167484.
We make no warranty whatsoever as to the accuracy or completeness of the FreeBMD data.
Use of the FreeBMD website is conditional upon acceptance of the Terms and Conditions






