hyphenation

Djun Kim
2007
09
10

Tweaking Drupal Search

created on Mon, 2007-10-08 22:51

I've recently been extending and adapting Drupal's search module, and I thought I'd take the opportunity to show just how easy it is to tweak its indexing behaviour.

One of the issues I was asked to address is how search module handles hyphenated words by default. One might reasonably expect that searching for 'intuitive' or 'counter' would find a post containing the hyphenated word 'counter-intuitive'. However, by default search module returns no match unless you type 'counter-intuitive' or 'counterintuitive'.

Reading through search module's code, we find that embedded dots, underscores and hyphens are simply stripped out of words before indexing, so that URLs and acronyms can be searched meaningfully. As a result, 'counter-intuitive' is indexed as the single word 'counterintuitive'.
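To make the stripping step concrete, here is a small Python model of it. It is not Drupal's PHP code: it assumes that runs of dots, underscores and hyphens are deleted outright and that every other non-alphanumeric character acts as a word boundary (the real module uses much richer Unicode character classes). The helper names `simplify` and `index_words` are hypothetical.

```python
import re


def simplify(text: str) -> str:
    """Rough model of search module's text simplification (not Drupal's code)."""
    text = text.lower()
    # Dots, underscores and hyphens are removed outright, so 'F.B.I.' becomes
    # 'fbi' and 'counter-intuitive' becomes 'counterintuitive'.
    text = re.sub(r"[._-]+", "", text)
    # Assumption: every other non-alphanumeric character is a word boundary.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def index_words(text: str) -> set[str]:
    """Return the set of words that would end up in the (modelled) index."""
    return set(simplify(text).split())


words = index_words("A counter-intuitive memo from the F.B.I.")
print(sorted(words))  # ['a', 'counterintuitive', 'fbi', 'from', 'memo', 'the']
```

In this model 'fbi' and 'counterintuitive' are indexed, but 'counter' and 'intuitive' are not, which matches the behaviour described above.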

Fair enough: we'd like searches for 'F.B.I.' to match documents containing 'FBI', and vice versa. But it is counter-intuitive that searching for the constituents of a hyphenated word won't necessarily find posts containing that word.

Fixing this looked as if it might involve a lot of work, so I was happy to discover that search module's architecture let me get the behaviour I wanted for hyphens with half a dozen lines of code, without breaking the sensible default behaviour for acronyms and the like.

The key to doing this is the search_preprocess() function, which invokes hook_search_preprocess() in every module that implements it. The hook takes a string (initially the text to be indexed) and returns a transformed version of that text. Crucially, the hook is invoked before hyphens and the like are stripped out. Many applications of this kind of transformation are imaginable; the one we need is to append the individual words making up each hyphenated compound to the text, so that both the joined form and its parts get indexed.
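Here is a minimal Python model of that idea. A real Drupal implementation would be a PHP hook in a custom module; the names below (`split_hyphens_preprocess`, `search_preprocess`, `simplify`, and the `PREPROCESS_HOOKS` list standing in for "every module that implements the hook") are hypothetical. The sketch assumes, as described above, that preprocess hooks run before dots, underscores and hyphens are stripped, and it uses a simplified stand-in for that stripping step.

```python
import re
from typing import Callable


def split_hyphens_preprocess(text: str) -> str:
    """Hypothetical preprocess hook: append the parts of hyphenated words."""
    # A compound here is two or more runs of word characters joined by hyphens.
    compounds = re.findall(r"\w+(?:-\w+)+", text)
    parts = [part for word in compounds for part in word.split("-")]
    # The original text is kept intact, so 'counter-intuitive' is still indexed
    # as 'counterintuitive' later; the parts are simply appended after it.
    return (text + " " + " ".join(parts)) if parts else text


# Hypothetical stand-in for "all modules that implement the hook".
PREPROCESS_HOOKS: list[Callable[[str], str]] = [split_hyphens_preprocess]


def search_preprocess(text: str) -> str:
    """Run every registered hook in turn, each transforming the text."""
    for hook in PREPROCESS_HOOKS:
        text = hook(text)
    return text


def simplify(text: str) -> str:
    """Simplified model of the later step that strips dots, underscores, hyphens."""
    text = re.sub(r"[._-]+", "", text.lower())
    # Assumption: any other non-alphanumeric character is a word boundary.
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def index_words(text: str) -> set[str]:
    # Hooks run first, before punctuation is stripped.
    return set(simplify(search_preprocess(text)).split())


words = index_words("A counter-intuitive memo from the F.B.I.")
print(sorted(words))
# ['a', 'counter', 'counterintuitive', 'fbi', 'from', 'intuitive', 'memo', 'the']
```

In this model a search for 'intuitive' or 'counter' now finds the post, while 'counterintuitive' and 'fbi' still match as before, so the acronym handling is untouched.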
