Audrey Foo
2008
19
06

Database Migration Part II

Blog

In the first blog on Database Migration this post skimmed some info on starting the journey. This blog posts is a little drier, and more detailed, listing some major steps of database import and a few code samples. Shown below is my partner in crime for the second project, Harry. The current methods used are always open to new methods of improvement.


Image courtesy of Steve Krueger

Mapping tables

For all content imported, there is a mapping row that corresponds drupal's unique id to the original unique id of that content. These tables are very useful, when going back and tracking where the issue may have arrived from. Also, the mapping tables are useful when importing new data, to determine which nodes/terms/users not to re-import.

Depending on preference, each vocabulary or content type may be in its own mapping table, or if combining vocabularies and content types in the same mapping table, then the original table name is also stored.

The minimum columns in a mapping table needed are; an auto-increment primary key, drupal id (either nid, uid or tid), the old unique id. Other columns can be created as needed for quick reference and checking.

Connecting to old data source

The database import scripts used are using drupal functions, for manipulating node objects, user objects, etc. At the beginning of the PHP script, include the following:

include_once("includes/bootstrap.inc"); include_once("includes/common.inc"); drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

If the old data source is a different database type than the drupal setup, for example postgres, the db import scripts can connect directly to a postgres install, using pg_connect, pg_query, pg_fetch object, etc.

Otherwise the postgres dump (generated with correct parameters) can be converted to a mysqldump sometimes by manually altering the table definition, and running as is, as special functions are not needed for the purpose of importing data to the drupal database.

Ownership of content

If nodes to be inserted do not have an owner, usually a drupal 'archive' user is created, and nodes are created as this archive user.

Otherwise after users are imported, and the user mapping table populated, content then is imported referencing the user mapping table to save the node with appropriate uid.

Re-using import scripts

Populate an array with the old unique id of terms/nodes/users that already exist in the drupal system.

$sql = "SELECT class_id FROM map_terms WHERE tid != 0 AND vid = %d";
$result = db_query($sql, $vid);
while ($obj = db_fetch_object($result)) {
$current[$obj->class_id] = $obj->class_id;
}

Select all data rows from the original source table but skip the rows that have a mapped data item in drupal

while ($obj = pg_fetch_object($result)) {
if (! array_key_exists($obj->class_id, $current)) {
// Save taxonomy term with taxonomy_save_term and insert tid in term mapping table
// Alternatively save node using node_save() and insert nid in node mapping table
}
}

Importing custom content fields

Refer to Lullabot's article for an example of using drupal_execute and database view of cck is also a great resource.

Code for importing a simple node that only contains title information.

$n = new stdClass();
$n->uid = 36;
$n->name = 'archive';
$n->type = 'ppl_label';
if ($obj->active == 'Y') {
$n->status = 1;
}
$n->validated = 1;
$n->title = trim($obj->company_name);
node_save($n);

Installing devel module, and viewing devel load on a node is great for understanding how to structure data insertion for cck fields.

Harry discovered that for some fields, the following works $node->field_abc[0] = array('nid' => 1); but not $node->field_abc[0]['nid'] = 1.

Updating imagefields, involve adding a row to the files table, making use of basename(), filesize() functions, and determining mime type of the image. Then updating the node with $node->field_image[0] = array('fid' => 552); where 552 is the file id inserted.

Confirming drupal setup

While I would love to envision an agile world, I am not sure yet how database import can fit into the paradigm of agile methods. Database migration tends to be quite time intensive; in handling bulk data, downloading the latest drupal database, running scripts on local, checking test, making changes, re-importing latest drupal database, re-running scripts on local, running on live and checking.

When a customer decides to make a change to a content type, in comparison the development time may likely be much less to make the code change on local, refresh browser, test, commit to revision control. The steps to test data changes can be quite a bit more lengthy. If the data change is run as a hot fix, it still may require: downloading latest drupal database, running scripts on local, testing, fixing, re-importing drupal database, re-running scripts on local, committing script to version control, running on live and checking.

It is much easier, imo, to have a more complete and confirmed understanding with the customer to ensure minimum hot fixes are required, so that all the steps listed below can be run a minimum of times, containing a full understanding of content to be imported/modified to the drupal setup. I suggest face-to-face meeting, with possibly a signed confirmation that the content type setup, vocabulary, user profiles function as expected and contain the data that they expect to see.

List of steps to import content

Latest data dump
Need access to a regularly updated database dump
Images
Easiest to rsync or tarball latest images. Need access to customers server, or http
Simple non-dependent nodes
Relatively simple nodes, that are not dependent (node-reference fields) on other content types
Vocabulary terms
Insert terms into vocabularies previously created
Users
Profile fields or nodes created, depending on which modules used
User roles
Provide customer with list of likely candidates and confirm drupal roles
Other users
For example newsletter users, other modules that store user info in module specific tables
The major content that does not node-reference other content
Could contain many cck fields, including custom fields, imagefields, etc.
Other content that node-references content already created
Import content from 3rd party sources
For example, video meta data as needed.
Import content to other modules that need that data
Specific modules that might store some part of the data.
Path redirect
Path redirect module used to re-direct old urls to new drupal urls. For the new url, use the non-aliased drupal path, e.g. user/123 or node/234 rather than the pathauto generated url. So that if the customer changes their mind about the pathauto pattern, only the url_alias table needs to be modified not the path_redirect table.
$rid = db_next_id('{path_redirect}_rid');
$old_url = 'old-path';
$new_url = 'new-path';
db_query("INSERT INTO {path_redirect} (rid, path, redirect, type) VALUES (%d, '%s', '%s', '%s')", $rid, trim($old_url, '/'), trim($new_url, '/'), '301');
Pathauto
Using node_save(), user_save() should cause a url alias to be generated. However, sometimes bulk regeneration is required.
Search
Don't forget to setup cron to index content, so that testing of search functionality can begin, however, the downside is that the database starts to grow very large , and thus any further database import steps are slowed down.

Bonus! Converting vocabulary terms to nodes in a content type

There can be a time that after setting up drupal, and importing content there is a decision made to extend the functionality of a vocabulary, and thus terms need to be converted to nodes. Here is the code to do so.

$vid = 3;
$sql = "SELECT * FROM term_data WHERE vid = %d";
$result = db_query($sql, $vid);
while ($obj = db_fetch_object($result)) {
// Create the award node
$n = new stdClass();
$n->uid = 12; // Where 12 is the uid of a valid user
$n->name = 'archive'; // archive is the login of user with uid 12
$n->title = $obj->name;
$n->type = 'award';
$n->status = 1;
$n->validated = 1;
node_save($n);

// Reference all nodes that have the term to the new node
$sql2 = "SELECT nid FROM term_node WHERE tid = %d";
$result2 = db_query($sql2, $obj->tid);
while ($obj2 = db_fetch_object($result2)) {
$n2 = node_load($obj2->nid);
$n2->field_award[0]['nid'] = $n->nid;
node_save($n2);
}

//taxonomy_del_term($obj->tid);
}

After running the script, check ...

Count all nodes of type 'award' before running the script, and also show the maximum nid. For example, the count is 5, and the max nid is 123 SELECT COUNT(*) FROM node WHERE type LIKE 'award'; SELECT MAX(nid) FROM node WHERE type LIKE 'award'; Count the terms that belong to vocabulary id $vid. For example vid is 3 and the count is 100. SELECT COUNT(*) FROM term_data WHERE vid = 3; Count all nodes of type 'award' after running the script, using the maximum nid obtained before importing, show imported awards. For example, the count should be 105. SELECT COUNT(*) FROM node WHERE type LIKE 'award'; SELECT * FROM node WHERE type LIKE 'award' AND nid > 123; If all looks good, then delete the vocabulary and associated terms.
附件大小
harry_sing.jpg40.38 千字节

回复

此内容将保密,不会被其他人看见。
  • 行和段被自动切分。
  • Allowed HTML tags: <a> <p> <br> <i> <b> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Glossary terms will be automatically marked with links to their descriptions
  • 网页地址和电子邮件地址将会被自动转换为链接。

更多格式化选项信息