Migrating a site from one platform to another poses a number of problems. The content needs to be migrated, of course, and migrating resources and entities is relatively straightforward in most cases. The fun begins when the links need to remain valid after the migration has been completed.
Curveballs, wrinkles and headaches
Legacy content issues:
- External links may be broken. External resources change from time to time, leading to broken links.
- Internal links may be broken. This can be caused by incorrect markup or possibly through the deletion or removal of the target resource.
- 301 redirects may have been set up to compensate for changed links. In many cases, redirect upon redirect can pile up, so that a chain of redirects has to be followed to reach the correct resource.
- Internal resources can be referred to in various ways, sometimes by an alias (path) or an identifier (node id). Resolving these links will require understanding the context in which they are operating.
- Editors may incorrectly enter links, sometimes including the fully qualified domain in the URL.
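Two of these legacy issues lend themselves to a small amount of code: links entered with the fully qualified domain, and redirects that chain onto other redirects. The following is a minimal Python sketch, not a description of any particular tool. The hostnames in `LEGACY_HOSTS` and the function names `normalise_internal` and `resolve_redirects` are hypothetical, and the redirects are assumed to be available as a simple old-path to new-path dictionary.

```python
from urllib.parse import urlsplit

# Hypothetical legacy hostnames; replace with the real ones for the site.
LEGACY_HOSTS = {"www.example.com", "example.com"}


def normalise_internal(href):
    """Strip the scheme and host from links that point at the legacy site."""
    parts = urlsplit(href)
    if parts.hostname in LEGACY_HOSTS:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        if parts.fragment:
            path += "#" + parts.fragment
        return path
    # Relative links and genuinely external links are left untouched.
    return href


def resolve_redirects(path, redirects):
    """Follow a chain of redirects to its final target, guarding against loops."""
    seen = {path}
    while path in redirects:
        path = redirects[path]
        if path in seen:
            raise ValueError(f"redirect loop at {path}")
        seen.add(path)
    return path
```

Normalising first and resolving second means that a link such as `https://www.example.com/old-page` and a link such as `/old-page` end up being looked up in the same way.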
Migrating content will layer another set of issues on top of this:
- The identifier for the resources will most likely change as new objects are created in the system.
- The path for the resources will most likely change as the site’s IA will be different.
- New URLs will be minted and may need to be used as replacements for objects in the old system.
- New URLs will be minted in place of special paths in the legacy system, which may not be present in the new system.
- In many cases, some legacy content will not be needed in the new system and will not be migrated, which can lead to more broken links.
Finally, there are other concerns, such as:
- External bookmarks may be pointing to resources that no longer exist, requiring redirects to be in place.
- Changes in the URL structure will require a new set of redirects to be maintained, i.e. the “Google juice” needs to be retained.
- Legacy redirects may not be needed once links have been fixed.
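The new set of redirects can usually be derived from the same old-to-new path mapping that the migration produces. The sketch below assumes such a mapping exists as a dictionary, with `None` marking content that was deliberately not migrated. The name `build_redirects` and the shape of the mapping are illustrative assumptions.

```python
def build_redirects(path_map, status=301):
    """Return redirect rows for paths that moved, plus paths with no new home.

    path_map maps a legacy path to its new path, or to None when the
    content was not migrated (an assumption of this sketch).
    """
    rows = []
    unmapped = []
    for old, new in sorted(path_map.items()):
        if new is None:
            # No destination: needs an editorial decision rather than a redirect.
            unmapped.append(old)
        elif new != old:
            rows.append((old, new, status))
    return rows, unmapped


path_map = {
    "/node/12": "/about/team",
    "/contact": "/contact",   # unchanged, so no redirect is needed
    "/promo-2015": None,      # dropped content
}
rows, unmapped = build_redirects(path_map)
```

Keeping the unmapped paths in a separate list makes it easy to hand them to editors, who can decide whether each one should redirect to a related page or simply return a 404.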
This all adds up to quite a headache for the migrator who wishes to do a great job and reassure the client that the integrity of the old site will be maintained. The aim should be to improve the site so that many old issues can be healed.
An example
Recently Morpht has worked on a number of large migration projects where we have seen these problems across websites. It is not unusual to be managing migrations for hundreds of thousands of resources. The number of links to manage can also be hundreds of thousands. The source site may have tens of thousands of broken links. The migration process is about handling broken links and mapping the old links to the new destination.
Taking stock
One of the first things to do in a link migration project is to take stock of the size of the problem. Analysing the site with a crawler such as Screaming Frog is a convenient way to do this. Most importantly, this will identify the following:
- External links which are broken
- Internal links which are broken
- The number of redirects that are being used
Unfortunately, there is no magic cure for fixing broken links. For the most part, this is a task for the editors, and work should begin on the legacy site to rectify these broken links. It is best to start this work early as fixing the broken links will improve the current site immediately and ensure the new site is in a good place.
There will be cases where the same link is broken across many pages, possibly hundreds or thousands of times. Fixing the same link over and over is tedious work, so in these egregious cases the broken links can be fixed by automated means, as described below.
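One way to find those egregious cases is to count how many distinct pages each broken link appears on. The sketch below assumes the crawl results have already been reduced to `(source_page, broken_href)` pairs. That shape, and the name `most_repeated`, are assumptions for illustration and not the export format of any particular crawler.

```python
from collections import Counter


def most_repeated(broken, min_pages=2):
    """Rank broken links by the number of distinct pages they appear on.

    broken is an iterable of (source_page, broken_href) pairs.
    """
    # De-duplicate so a link repeated on one page is only counted once.
    pages_per_link = Counter(href for _, href in set(broken))
    ranked = sorted(pages_per_link.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(href, n) for href, n in ranked if n >= min_pages]


broken = [
    ("/a", "/old-report"),
    ("/b", "/old-report"),
    ("/b", "/old-report"),  # same link twice on one page
    ("/c", "/gone"),
]
candidates = most_repeated(broken)
```

Links near the top of this list are the best candidates for a mapping entry, while the long tail of one-off broken links is usually better left to editors.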
Automation to the rescue
Morpht has a “link fixer” utility that can run on the content in the newly migrated website. The utility can process all resources with HTML content, identify links and then work out the correct URL to use in place of the old one.
Utility
The utility runs as follows:
foreach content resource
  foreach html content
    foreach link in the content
      fix_link
      track the success or failure of the fix
report back the statistics on success or failure
Reporting back on the success rate of the process allows for continuous improvement, as the process can be refined over time. For example, 10% of links may still be broken after the first run, but only 0.1% after the final run.
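The loop above can be sketched in Python as follows. This is an illustration under stated assumptions and not Morpht's actual utility: each resource is assumed to be a list of HTML strings, `fix_link` is a hypothetical callable that returns the new URL or `None` when it cannot resolve the link, and links are found with a deliberately simple regular expression that only matches double-quoted `href` attributes. A real implementation would more likely use an HTML parser.

```python
import re
from collections import Counter

# Simplification: only matches href="..." with double quotes.
HREF_RE = re.compile(r'href="([^"]*)"')


def fix_links_in_html(html, fix_link, stats):
    """Rewrite every link in one HTML string and record the outcome."""

    def replace(match):
        old = match.group(1)
        new = fix_link(old)
        if new is None:
            stats["failed"] += 1
            return match.group(0)  # leave the link untouched
        stats["fixed" if new != old else "unchanged"] += 1
        return f'href="{new}"'

    return HREF_RE.sub(replace, html)


def run(resources, fix_link):
    """Process every HTML field of every resource and return the statistics."""
    stats = Counter()
    fixed = [
        [fix_links_in_html(html, fix_link, stats) for html in fields]
        for fields in resources
    ]
    return fixed, stats


# Hypothetical mapping of legacy paths to new paths.
mapping = {"/node/1": "/about"}
resources = [['<a href="/node/1">About</a> <a href="/node/9">Gone</a>']]
fixed, stats = run(resources, mapping.get)
```

Comparing `stats["failed"]` with the total number of links after each run gives the kind of success rate described above, and the failed links themselves show where the mapping needs to be extended.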
Link fixer
The link fixer runs through the following steps:
Outcome
The iterative process described here offers several advantages. Firstly, it maintains link integrity in the legacy site while using mapping files to fix a large number of broken links automatically. This reduces the reliance on legacy redirects. Furthermore, it standardises link formatting, correcting editorial lapses. Lastly, it simplifies the generation of new redirects. Overall, this process streamlines the website migration with improved link functionality.
The end result is a newly migrated site that has been uplifted in quality. Double win!