Firstly, it is ok to wonder why you are being punished, or what you did wrong on your last project that got you relegated to this task. Take the appropriate time for self-reflection before moving on and getting this task done.
Secondly, any time spent looking at the book API is not going to give you the desired result.
For example, the 'magic' code snippet that made this migration possible is not documented anywhere.
And finally, there is precious little information on the web on how this can be easily achieved.
In this article I will show you how I did it on a recent migration project.
Assumptions
I'll assume that you have migrated all your content, including the actual book pages, from Drupal 7 to Drupal 9.
The typical way to do this is with the Migrate module. It is a fairly straightforward process of mapping the book pages into Drupal 9, as you would for any typical migration.
At this point you will have the content in your new Drupal 9 site. The main problem is that the hierarchical structure of the book pages will not have come across. Drupal 7 stores this information in separate tables (book and menu_links) rather than on the node itself, so it will not have been migrated along with the content.
I will also assume that you have the original node ID of the legacy data stored in the newly created content. This will be used to look up the correct node in the new Drupal 9 site.
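In my case the legacy path ('node/123', 'node/456' and so on) was captured during the content migration in a plain text field named field_migration_internal_path, which you will see used in the lookup() function later on; any field that records the old nid or path will do. Here is a quick spot-check that the data is in place, assuming that field name and using node/123 as an example legacy nid:
$nodes = \Drupal::entityTypeManager()
  ->getStorage('node')
  ->loadByProperties(['field_migration_internal_path' => 'node/123']);
// Expect exactly one result: the D9 node that was node/123 on the D7 site.
print count($nodes) . " matching node(s) found\n";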
Export your book hierarchy from Drupal 7
Upon reflection, this took a lot more effort than I expected. There is no easy way to do this, as Drupal 7 uses mlid and plid values for all of its book data, including the fields p1 to p9, whereas Drupal 9 uses nids.
Below is the SQL query I used to gather all of the data from the book and menu_links tables and convert those pesky plids and mlids into nids. It joins menu_links back onto itself to resolve the parent link (plid) and each of the p1 to p9 columns into node IDs, and it orders the rows so that a parent book page always appears before any of its children, ensuring the parent is already in the hierarchy before a child is added. The COALESCE function is used to make sure we get 0 for a field instead of NULL in the output.
SELECT
n.nid,
ml.link_title AS title,
n.type AS node_type,
b.bid,
ml.weight,
ml.has_children,
ml.depth,
COALESCE(REPLACE(mlp.link_path, 'node/', ''), 0) AS pid,
COALESCE(REPLACE(ml1.link_path, 'node/', ''), 0) AS p1,
COALESCE(REPLACE(ml2.link_path, 'node/', ''), 0) AS p2,
COALESCE(REPLACE(ml3.link_path, 'node/', ''), 0) AS p3,
COALESCE(REPLACE(ml4.link_path, 'node/', ''), 0) AS p4,
COALESCE(REPLACE(ml5.link_path, 'node/', ''), 0) AS p5,
COALESCE(REPLACE(ml6.link_path, 'node/', ''), 0) AS p6,
COALESCE(REPLACE(ml7.link_path, 'node/', ''), 0) AS p7,
COALESCE(REPLACE(ml8.link_path, 'node/', ''), 0) AS p8,
COALESCE(REPLACE(ml9.link_path, 'node/', ''), 0) AS p9
FROM book b
INNER JOIN node n ON b.nid=n.nid
LEFT JOIN menu_links ml ON ml.mlid=b.mlid
LEFT JOIN menu_links mlp ON ml.plid=mlp.mlid
LEFT JOIN menu_links ml1 ON ml.p1=ml1.mlid
LEFT JOIN menu_links ml2 ON ml.p2=ml2.mlid
LEFT JOIN menu_links ml3 ON ml.p3=ml3.mlid
LEFT JOIN menu_links ml4 ON ml.p4=ml4.mlid
LEFT JOIN menu_links ml5 ON ml.p5=ml5.mlid
LEFT JOIN menu_links ml6 ON ml.p6=ml6.mlid
LEFT JOIN menu_links ml7 ON ml.p7=ml7.mlid
LEFT JOIN menu_links ml8 ON ml.p8=ml8.mlid
LEFT JOIN menu_links ml9 ON ml.p9=ml9.mlid
WHERE ml.menu_name LIKE 'book-toc-%'
ORDER BY ml.depth ASC,
ml.p1 ASC,
ml.p2 ASC,
ml.p3 ASC,
ml.p4 ASC,
ml.p5 ASC,
ml.p6 ASC,
ml.p7 ASC,
ml.p8 ASC,
ml.p9 ASC
Hint 1: If you only want to export published pages in your hierarchy, you can add the following to the WHERE clause: 'AND n.status = 1'
Hint 2: If you only want to export books of certain content types, you can add the following to the WHERE clause: 'AND n.type IN ('type1', 'type2', etc)', or, if there are more inclusions than exclusions, you could use 'AND n.type NOT IN ('type1', 'type2', etc)'
Import your book hierarchy into Drupal 9
Import all of your content into the D9 site first; importing the book hierarchy is really a post-import task.
You will need a custom script to get the data out of your CSV, SQL output or whatever format you chose, and import it into the book hierarchy.
use Drupal\book\BookManagerInterface;
use Drupal\node\NodeInterface;
/* Change the following location to point to the CSV file to be used for this Import. */
$book_structure_csv_filename = 'location_of_csv_file';
// Load data from your CSV file.
$rows = array_map('str_getcsv', file($book_structure_csv_filename));
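// The first row of the CSV is the header; it is used to key each data row by column name.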
$header = array_shift($rows);
foreach ($rows as $row) {
processBookMapping(array_combine($header, $row));
}
function processBookMapping(array $item) {
if (empty($item['nid'])) {
// Nid must have a value, there is something wrong with the CSV data.
// Display a message and skip this item.
print "ERROR: " . $item['nid'] . " cannot be empty or 0. Check your CSV file for errors.\n";
}
elseif (empty($item['bid'])) {
// Bid must have a value, there is something wrong with the CSV data.
// Display a message and skip this item.
print "ERROR: " . $item['bid'] . " cannot be empty or 0. Check your CSV file for errors.\n";
}
elseif (!($d9_nid = lookup($item['nid']))) {
// Confirm that we can identify the D9 nid from the D7 nid.
// Otherwise display a message and skip this item.
print "Notice: " . $item['nid'] . " old D7 Node has no corresponding node on D9 site\n";
}
elseif (!($d9_bid = lookup($item['bid']))) {
// Confirm that we can identify the top-level D9 book node from the D7 bid.
// Otherwise display a message and skip this item.
//
// Note that pid can be 0 so there is no need to check for that value.
print "Notice: " . $item['bid'] . " old D7 Book has no corresponding node on D9 site\n";
}
elseif (!($content = \Drupal::entityTypeManager()->getStorage('node')->load($d9_nid))) {
// Load the D9 node for processing.
// If node cannot be loaded, display a message and skip this item.
print "Notice: Cannot load D9 node " . $d9_nid . "\n";
}
else {
// Now we know that the relevant nodes exist in D9,
// lets process the remaining CSV data.
$d9_pid = lookup($item['pid']);
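// p1 always points at the top-level page of the book, so we can simply reuse the D9 bid.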
$d9_p1 = $d9_bid;
$d9_p2 = lookup($item['p2']);
$d9_p3 = lookup($item['p3']);
$d9_p4 = lookup($item['p4']);
$d9_p5 = lookup($item['p5']);
$d9_p6 = lookup($item['p6']);
$d9_p7 = lookup($item['p7']);
$d9_p8 = lookup($item['p8']);
$d9_p9 = lookup($item['p9']);
$weight = $item['weight'];
$depth = $item['depth'];
$has_children = $item['has_children'];
// Create the hierarchy for a book page in the book table.
$book_link = [
'nid' => $d9_nid,
'bid' => $d9_bid,
'pid' => $d9_pid,
'weight' => $weight,
];
$parents = [
'p1' => $d9_p1,
'p2' => $d9_p2,
'p3' => $d9_p3,
'p4' => $d9_p4,
'p5' => $d9_p5,
'p6' => $d9_p6,
'p7' => $d9_p7,
'p8' => $d9_p8,
'p9' => $d9_p9,
'depth' => $depth,
'has_children' => $has_children,
];
/** @var \Drupal\book\BookOutlineStorageInterface $book_outline_storage */
$book_outline_storage = \Drupal::service('book.outline_storage');
$book_outline_storage->insert($book_link, $parents);
// Update the node with the book data.
// This is required to get the correct structure at /admin/structure/book
//
// Note: if your D9 content type has required fields that are not populated, the save will fail
// and you will get array errors when trying to save the node post-import
// at /node/xxxx/outline.
// The error message in the catch block below gives you a heads-up about any impending issues.
$content->book['nid'] = $d9_nid;
$content->book['bid'] = $d9_bid;
$content->book['pid'] = $d9_pid;
$content->book['has_children'] = $has_children;
$content->book['weight'] = $weight;
$content->book['depth'] = $depth;
$content->book['p1'] = $d9_p1;
$content->book['p2'] = $d9_p2;
$content->book['p3'] = $d9_p3;
$content->book['p4'] = $d9_p4;
$content->book['p5'] = $d9_p5;
$content->book['p6'] = $d9_p6;
$content->book['p7'] = $d9_p7;
$content->book['p8'] = $d9_p8;
$content->book['p9'] = $d9_p9;
$content->book['link_path'] = 'node/' . $d9_nid;
$content->book['link_title'] = $item['title'];
try {
$content->save();
} catch (Exception $e) {
print "ERROR: Unable to save node - " . $e->getMessage() . "\n";
}
}
}
function lookup($d7_nid) {
if (empty($d7_nid)) {
return 0;
}
else {
$database = \Drupal::database();
$query = $database->query("SELECT entity_id FROM {node__field_migration_internal_path} WHERE field_migration_internal_path_value = :value",
[
':value' => 'node/' . $d7_nid,
]);
if ($result = $query->fetchAll()) {
return $result[0]->entity_id;
}
else {
return 0;
}
}
}
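A script like this can be run in whatever way suits your workflow; one option is Drush's php:script command, for example drush php:script import_book_structure.php (the filename is just a placeholder). Wrapping the same code in a custom Drush command or an update hook would work equally well.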
How it works
The first part of the code retrieves data from the CSV file and processes each row.
The code in the lookup() function translates the D7 nids into D9 nids using a simple SQL lookup.
For example, node/123 in your D7 site may be node/4444 in your D9 site, so we use the lookup function to find the D9 nid based on our D7 nid. Feel free to tweak this bit to match your own field names that capture the D7 data.
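If you prefer not to query the field table directly, an entity query does the same job. Here is a rough drop-in alternative for lookup(), again assuming the field is called field_migration_internal_path:
function lookup($d7_nid) {
  if (empty($d7_nid)) {
    return 0;
  }
  // Find the D9 node whose migration field holds the legacy D7 path.
  $nids = \Drupal::entityQuery('node')
    ->accessCheck(FALSE)
    ->condition('field_migration_internal_path', 'node/' . $d7_nid)
    ->execute();
  return $nids ? (int) reset($nids) : 0;
}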
The various if/elseif statements check that we have all the data we need to process the CSV row; otherwise a message is displayed on-screen and that item is skipped. Failed items could be written to dblog or to a suitable CSV file; however, I found that getting instant feedback on the screen saved time and effort during development and testing.
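If you do want something more persistent, switching a print statement over to the database log is a one-liner; a sketch, where 'book_import' is simply a made-up channel name:
\Drupal::logger('book_import')->notice('Skipped D7 node @nid: no matching D9 node found.', ['@nid' => $item['nid']]);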
The code under // Create the hierarchy for a book page in the book table creates the structure of your book pages. If you run only this part of the code, you can then go to /admin/structure/book and see that all of your books have been created. However, if you click on the 'Edit order and titles' button next to any of your imported books, you will see a page without any data.
The code under // Update the node with the book data updates the node with all of the correct book data.
After this part of the code is executed, you can go back to /admin/structure/book, click on the 'Edit order and titles' button next to any of your imported books, and see all of your data and its structure.
You should also be able to view the book page at /node/xxxx/outline and then save it. If you get an array error on save, look at my note regarding required fields in the code.
Click around and confirm that your data and structure are correct, and compare a selection of books between the D7 and D9 sites as part of your testing/QA. That's pretty much it.
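If you also want a quick programmatic sanity check, you can compare the number of rows now in the D9 book table against the number of data rows in your CSV (assuming the book table started out empty); a minimal sketch:
// Count the rows now in the D9 book table; this should match the number of data rows in your CSV.
$count = \Drupal::database()
  ->select('book')
  ->countQuery()
  ->execute()
  ->fetchField();
print "Book links in D9: " . $count . "\n";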
Job done. Feel free to now reflect on your success.
Conclusion
So long as you have exported your data from D7 and can track the various nids, pids and bids in D9, this should now be a straightforward task.
I hope this helps.