Introduction
Hi everyone, welcome to DrupalSouth and thanks for coming to my session. Today I am going to share my experience with content migration.
Let me start by introducing myself. I'm a Drupal developer at Morpht, working on both back-end and front-end development. Recently I have been focusing on web accessibility and content migration.
Migrate API in Drupal
The Migrate API is a three-step process: extract, transform and load (ETL). Extract is where we get the data from our source. The raw data may not be in a format that works in Drupal, which is why we need process plugins to transform it into something Drupal can use. Load is where destination plugins save the transformed data into Drupal, for example as nodes, users or files.
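To see how the three steps fit together in one file, here is a minimal sketch of a migration definition. It assumes the Migrate Plus `url` source plugin with its `http` fetcher and `json` parser; the migration id, URL, item selector and field names are all hypothetical.

```yaml
# Hypothetical file: config/install/migrate_plus.migration.blog_articles.yml
id: blog_articles
label: 'Blog articles'
# Extract: read JSON records from a remote URL (Migrate Plus url source).
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls:
    - 'https://example.com/articles.json'
  # Hypothetical key holding the array of article records.
  item_selector: articles
  fields:
    - name: id
      label: 'Article ID'
      selector: id
    - name: title
      label: 'Title'
      selector: title
    - name: date
      label: 'Publication date'
      selector: date
  # Unique source ID used to track each row in the migration map.
  ids:
    id:
      type: integer
# Transform: map and convert source values to destination properties.
process:
  title: title
  field_date:
    plugin: format_date
    from_format: 'j F Y'
    to_format: 'Y-m-d'
    source: date
# Load: save each row as an article node.
destination:
  plugin: 'entity:node'
  default_bundle: article
```

With the Migrate Tools module installed, a migration like this could then be run with `drush migrate:import blog_articles`.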
What can we do with migrations?
One-off migrations. Examples:
- Upgrading a site from Drupal 7 to 8.
- Rebuilding a non-Drupal site in Drupal and transferring the content across.
- Bulk content updates, such as populating data in a new field or moving data from one content type to another.
Continuous migrations. Examples:
- Pulling weather forecast information on a regular basis.
- Getting stock prices from the share market.
- Distributing content to multiple satellite sites from a centralized content repository.
Migration source
The Migrate API supports a wide range of data sources: CSV, XML, JSON, JSON:API, etc. We can also query the source database directly with SQL if we have access to it.
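As an illustration of a file-based source, a CSV source section might look like the sketch below. It assumes the contrib Migrate Source CSV module (3.x); the file path and the ID column name are hypothetical.

```yaml
source:
  # csv source plugin from the contrib migrate_source_csv module (3.x).
  plugin: csv
  # Hypothetical location of the exported file.
  path: 'public://import/articles.csv'
  # Index of the header row: 0 means the first row holds the column names.
  header_offset: 0
  # Column(s) that uniquely identify each row in the migration map.
  ids:
    - id
```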
Sometimes exporting data from the source is not possible, or the data structure is too complicated to handle. In that case, we can scrape the content instead. There are a number of libraries that let us scrape content from a website.
What's good about content scraping? “What you see is what you get.” An image is an image, rather than data in a token format that needs further processing before we can retrieve what we want. It also works for both Drupal and non-Drupal sites.
The downside, however, is that the content we get is not well structured, and it is hard to deal with things like entity references.
Migration examples
Let's say our client wants us to migrate a simple blog site like the one below.
What are we dealing with?
From the front page, we can see that each article has a title, a publication date and a summary.
Inside each article, there are tags, an HTML body, internal links, PDF downloads, etc.
There are also hidden elements, such as the publishing status, menu links, URL aliases, and 301 redirects if we want to send traffic from the old paths to the new ones.
What are the challenges?
Let's look at the date. “26 November 2019” is the format we get from the source, but to import it into a date field in Drupal we need “2019-11-26”.
For taxonomy, the source may give us a list of links or a list of tag names, but in Drupal we need to create the terms first and then populate the field with their IDs.
The same goes for images: how do we download an image, and where do we save it? Likewise, we need to create a file entity and deal with the file ID in the migration.
What tools are available?
There are a number of process plugins available for migrations; below are some handy ones.
Strings and arrays
Concat
Concat is a good choice for combining a number of fields into a single string. In this example, we combine an address with several subfields into a single address field.
Migrate
process:
  field_address:
    plugin: concat
    source:
      - street
      - suburb
      - state
      - postcode
    delimiter: ', '
Source
{ "street": "1 Davey St", "suburb": "Hobart", "state": "TAS", "postcode": "7000" }
Output
"1 Davey St, Hobart, TAS, 7000"
Explode
The explode plugin allows us to break up a single string into an array.
Migrate
process:
  field_address:
    plugin: explode
    source: address
    delimiter: ', '
Source
{ "address": "1 Davey St, Hobart, TAS" }
Output
["1 Davey St", "Hobart", "TAS"]
Substr
The substr plugin extracts a segment of a string. In this example, we extract only the state from an address string.
Migrate
process:
  field_state:
    plugin: substr
    source: address
    start: -3
    length: 3
Source
{ "address": "1 Davey St, Hobart, TAS" }
Output
"TAS"
These plugins should all feel familiar, because they work much like the corresponding PHP string functions.
Default value
When the data is not available in the source, we can use the default_value plugin. In this example, we set the author (uid) to 12 for all articles.
Migrate
process:
  title: title
  body: body
  uid:
    plugin: default_value
    default_value: 12
Source
{ "title": "Article 1", "body": "This is a good article." }
Output
Node author = User ID 12
Static map
Static map lets us create a one-to-one mapping from one set of values to another. In Drupal 7, user roles are identified by numeric role IDs (rids), while Drupal 8 uses role machine names, so we can map each rid to the matching machine name.
Migrate
process:
  roles:
    plugin: static_map
    source: rids
    map:
      3: administrator
      4: moderator
      5: editorial_board
      6: site_architect
Source
{ "rids": [3, 5, 6] }
Output
User roles: Administrator, Editorial board, Site architect
Format date
The date format we get from the source may not match the format Drupal expects on import. We can use the format_date plugin to convert from one format to the other.
Migrate
process:
  field_date:
    plugin: format_date
    from_format: 'j F Y'
    to_format: 'Y-m-d'
    source: date
Source
{ "date": "26 November 2019" }
Output
2019-11-26
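If the destination is a date and time field rather than a date-only one, the same plugin can also convert time zones. The sketch below assumes a hypothetical `published` source value such as "26 November 2019 3:30pm" in Hobart local time; Drupal stores date and time values in UTC using the `Y-m-d\TH:i:s` format.

```yaml
process:
  field_published:
    plugin: format_date
    # Hypothetical source format, e.g. "26 November 2019 3:30pm".
    from_format: 'j F Y g:ia'
    # Storage format of a Drupal date and time field.
    to_format: 'Y-m-d\TH:i:s'
    # Interpret the source value as local time and store it in UTC.
    from_timezone: 'Australia/Hobart'
    to_timezone: 'UTC'
    source: published
```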
Entity reference
Entity references are simple if we already have the target IDs in the source, which is often the case in a Drupal-to-Drupal migration.
Source
{ "tids": [14, 15, 17, 32] }
Migrate
process:
  field_tags: tids
Entity lookup
In most cases, we get a list of names in the source, and we can use the entity_lookup plugin from Migrate Plus. It takes each name and looks up the matching entity to work out its ID.
Source
{ "tags": ["tag 1", "tag 2", "tag 3"] }
Migrate
process:
  field_tags:
    plugin: entity_lookup
    source: tags
    value_key: name
    entity_type: taxonomy_term
    bundle: tags
Entity generate
If some of the items don't exist yet, the entity_generate plugin can create new terms on the fly while migrating the term field.
Note from Quietone: the entity_generate plugin creates entities that are not tracked in our migration map. The entities it creates cannot be rolled back, and they cannot be found with migration lookups. The plugin is useful, but it does not follow the ETL process.
Source
{ "tags": [ "A new tag", "tag 1"] }
Migrate
process:
  field_tags:
    plugin: entity_generate
    source: tags
    value_key: name
    entity_type: taxonomy_term
    bundle: tags
Migration lookup
Migration lookup is another awesome plugin: it lets us reference an entity that was created by another migration.
In this example, we first import a list of users. Then, when we import the articles, we can reference a user created by that first migration.
Migrate
process:
  uid:
    plugin: migration_lookup
    migration: users
    source: author
Source (users)
{ "id": 1, "first name": "Peter", "last name": "Smith" }
Source (articles)
{ "title": "Article 1", "body": "This is a good article...", "author": 1 }
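Because the articles migration looks up users, the users migration has to run first. Below is a minimal sketch of how the articles migration could declare that, assuming the migration ids `users` and `articles`; the source and destination sections are omitted for brevity.

```yaml
id: articles
process:
  title: title
  body: body
  uid:
    plugin: migration_lookup
    migration: users
    source: author
# Make sure the users migration is imported before this one.
migration_dependencies:
  required:
    - users
```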
Files
Download
The download plugin lets us fetch a remote file and save it to a destination. We can then use migration lookup to populate the file ID in a file field.
Migrate
process:
  filename: filename
  filemime: filemime
  status:
    plugin: default_value
    default_value: 1
  uri:
    plugin: download
    source:
      - file_source
      - file_destination
Source
{ "id": "1", "filename": "file1.pdf", "filemime": "application/pdf", "file_source": "http://example.com/file1.pdf", "file_destination": "public://documents/file1.pdf" }
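The download process above would normally live in its own file migration with an `entity:file` destination. A sketch of how the two pieces could fit together, assuming the file migration has the hypothetical id `files` and the article source carries a hypothetical `file_id` value matching the file source's `id`:

```yaml
# In the hypothetical "files" migration, alongside the process above:
destination:
  plugin: 'entity:file'
---
# In the articles migration: reference the file entity created
# by the "files" migration for the same source ID.
process:
  field_file/target_id:
    plugin: migration_lookup
    migration: files
    source: file_id
```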
File import
The file_import plugin (from the contrib Migrate File module) provides a much simpler way to handle file migrations. It creates the file entity and references it from the field in one go, with no separate migration lookup needed.
The source required for the import is also much simpler: all we need is the URL of the file.
Migrate
process:
  field_file:
    plugin: file_import
    source: file
Source
{ "file": "http://example.com/file1.pdf" }
Image import
The same goes for images: we can use the image_import plugin to import them.
Migrate
process:
  field_image:
    plugin: image_import
    source: image
    destination:
      plugin: default_value
      default_value: "public://images/"
    title: image_title
    alt: '!title'
Source
{ "image": "https://example.com/logo.png", "image_title": "Logo" }
URL
URL redirect
We may also need to handle URL redirects and aliases. Redirects, provided by the contrib Redirect module, are entities in Drupal 8, so we can write a simple migration with the destination plugin set to entity:redirect.
Migrate
process:
  redirect_source: old_url
  redirect_redirect: new_url
  status_code:
    plugin: default_value
    default_value: 301
  # ...
destination:
  plugin: 'entity:redirect'
Source
{ "old_url": "old-url", "new_url": "internal:/node/54" }
Output
301 redirect from old-url to /node/54
URL alias
The same goes for aliases. There is a url_alias destination plugin for URL aliases, which gives our content SEO-friendly URLs.
Migrate
process:
  source: source
  alias: alias
  langcode:
    plugin: default_value
    default_value: 'en'
destination:
  plugin: url_alias
Source
{ "source": "/node/5", "alias": "/article/good-article" }
Output
Sets the alias /article/good-article for /node/5
More
Callback
The callback plugin lets us use a PHP function to process our data. In this example, we use _filter_autop(), a function from Drupal core, to convert line breaks into <p> tags.
Migrate
process:
  body/value:
    plugin: callback
    callable: _filter_autop
    source: body
Source
{ "body": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur aliquet quam id dui posuere blandit ... " }
Output
Converts line breaks into <p> and <br>
It is the same function that core's filter_autop text filter calls.
Pipeline
A pipeline is not a plugin itself; it lets us run several plugins in sequence, with the output of each one passed to the next. In this example, after running the text filter callback, we use the str_replace plugin from Migrate Plus to fix typos.
Migrate
process:
  body/value:
    - plugin: callback
      callable: _filter_autop
      source: body
    - plugin: str_replace
      search: ["typo 1", "typo 2", ...]
      replace: ["correction 1", "correction 2", ...]
Source
{ "body": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur aliquet quam id dui posuere blandit ... " }
Output
Body text with strings replaced.
Custom plugins
If none of the available plugins can handle our data, we can create our own custom process plugin.
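<?php

// "my_module" is a placeholder for your own module's machine name.
namespace Drupal\my_module\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * @MigrateProcessPlugin(id = "transform_value")
 */
class TransformValue extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // Example transformation: reverse the string, e.g. "abc" becomes "cba".
    return strrev($value);
  }

}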
Reference: https://www.drupal.org/docs/8/api/migrate-api/migrate-process/writing-a…
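Once registered, a custom plugin is used in the process section like any core plugin. A sketch, assuming the plugin's annotation declares the hypothetical id `transform_value`:

```yaml
process:
  title:
    # Custom process plugin; the id must match the plugin annotation.
    plugin: transform_value
    source: title
```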
List of core Migrate process plugins
There are many process plugins available, and I have covered only the few highlighted ones.
Process plugins by Migrate Plus
And there are more from contrib modules.
Useful links
Migrate process overview
https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/m…
List of core Migrate process plugins
https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/l…
List of process plugins provided by Migrate Plus
https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/l…
Writing a process plugin
https://www.drupal.org/docs/8/api/migrate-api/migrate-process/writing-a…