Tweaking Drupal Search
I've recently been doing work extending and adapting Drupal's search module, and I thought I'd take the opportunity to show just how easy it is to tweak search module's indexing behaviour.
One of the issues I was asked to address is the default behaviour of search module with respect to hyphenated words. One might reasonably expect that searching for 'intuitive' or 'counter' would find a post containing the hyphenated word 'counter-intuitive'. By default, however, search module returns a match only if you type 'counter-intuitive' or 'counterintuitive'.
Reading through the code for search module, we find that embedded dots, underscores and hyphens are simply stripped out of words, to allow meaningful search behaviour for URLs and acronyms.
Fair enough - we'd like searches for 'F.B.I.' to match documents containing 'FBI', and vice-versa. But it is counter-intuitive that searching for the constituents of a hyphenated word won't necessarily find posts containing that word.
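To make the effect concrete, here is a rough sketch of that stripping step. The helper name example_strip_embedded_punctuation() is made up for illustration, and the one-line pattern is a simplification rather than a copy of search module's own code, which handles more cases.

```php
<?php
// Hypothetical helper, for illustration only. It is not part of Drupal,
// and search module's real simplification handles more cases than this.
function example_strip_embedded_punctuation($word) {
  // Remove every dot, underscore and hyphen from the word.
  return preg_replace('/[._-]+/', '', $word);
}

// 'F.B.I.' and 'FBI' end up as the same indexed word, which is what we want.
$acronym = example_strip_embedded_punctuation('F.B.I.');

// 'counter-intuitive' collapses into a single word, so neither 'counter'
// nor 'intuitive' is indexed on its own.
$compound = example_strip_embedded_punctuation('counter-intuitive');
```

After this step the index only knows about 'counterintuitive', which is why a search for 'counter' alone comes back empty.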
Fixing this seemed like it might involve a lot of work. I was happy to discover that the architecture of search module allowed me to enable the behaviour I wanted for hyphens, without breaking the nice default behaviour for such things as acronyms, with a half-dozen lines of code.
The key to doing this is the search_preprocess() function, which invokes the hook_search_preprocess() hook in all modules that implement it. The hook takes a string (initially the text to be indexed) and returns a transformed version of the text.
Fortunately, this hook is invoked before hyphens and the like are stripped out. Many applications of this kind of transformation are imaginable; the one we need here is to append the individual words that make up each hyphenated compound to the text.
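The chaining described above can be sketched in a few lines of plain PHP. This is a simplified stand-in, not Drupal's own code: the function example_invoke_preprocess() and the two modules 'alpha' and 'beta' are invented for the example, and the list of modules is passed in by hand where Drupal would look it up itself.

```php
<?php
// Two hypothetical modules, 'alpha' and 'beta', each implementing the hook.
function alpha_search_preprocess($text) {
  return strtolower($text);
}

function beta_search_preprocess($text) {
  return $text . ' extra';
}

// Simplified stand-in for the invoking function: each implementation
// receives the output of the previous one and returns its own version.
function example_invoke_preprocess($text, array $modules) {
  foreach ($modules as $module) {
    $function = $module . '_search_preprocess';
    if (function_exists($function)) {
      $text = $function($text);
    }
  }
  return $text;
}

$result = example_invoke_preprocess('Counter-Intuitive', array('alpha', 'beta'));
// $result is 'counter-intuitive extra': the hyphen is still present, because
// nothing has stripped punctuation at this stage.
```

The point to take away is that every implementation sees the text with its hyphens still in place, so a module has a chance to act on them before they disappear.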
Without further ado, here is an implementation of hook_search_preprocess(). Placing this in one of a Drupal application's custom modules (e.g., example.module) gives the required behaviour. Note that the original words from the text are kept, and the new words obtained by splitting hyphenated compounds are merged in.
/**
 * Implementation of hook_search_preprocess(). Returns a string derived
 * from the original text by splitting all hyphenated strings at the
 * hyphen and appending these to the set of words of the original text.
 *
 * This preserves the original hyphenated version of words.
 *
 * This uses PHP's explode() to split the strings, and hence may split
 * the text differently from Drupal's core indexing mechanism.
 * However, the set of words to be indexed is never diminished.
 */
function example_search_preprocess($text) {
  // Split the text into words at each space.
  $arr = explode(' ', $text);
  $newarr = $arr;
  foreach ($arr as $word) {
    $words = explode('-', $word);
    if (count($words) > 1) {
      // Merge into $newarr rather than $arr, so that the parts of every
      // hyphenated word are kept, not just those of the last one found.
      $newarr = array_merge($newarr, $words);
    }
  }
  return implode(' ', $newarr);
}
Another use for hook_search_preprocess() would be to introduce associated keywords that should also match a given text. For example, one could maintain a dictionary of pairs of words that name the same thing in British and American English: trainer vs. sneaker, jumper vs. sweater, etc. On finding one member of a pair in a text, one would also include the other. Another example would be to include grammatically or semantically related words: search, searching, searched, seek, seeking, sought, etc.
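As a rough sketch of the dictionary idea, an implementation might look something like the following. The module name 'example_synonyms' and the two dictionary entries are assumptions made purely for illustration; a real module would presumably load a much larger list from somewhere configurable.

```php
<?php
/**
 * Sketch of a hook_search_preprocess() implementation that appends the
 * British or American counterpart of any dictionary word it finds.
 * The module name and the dictionary contents are hypothetical.
 */
function example_synonyms_search_preprocess($text) {
  // Hypothetical dictionary: each British term maps to its American one.
  $pairs = array(
    'trainer' => 'sneaker',
    'jumper' => 'sweater',
  );
  // Make the mapping work in both directions.
  $dictionary = $pairs + array_flip($pairs);

  $extra = array();
  foreach (explode(' ', $text) as $word) {
    // Compare case-insensitively, ignoring surrounding punctuation.
    $key = strtolower(trim($word, " \t\n.,;:!?\"'()"));
    if (isset($dictionary[$key])) {
      $extra[] = $dictionary[$key];
    }
  }
  // Keep the original text intact and append the counterparts, if any.
  return $extra ? $text . ' ' . implode(' ', $extra) : $text;
}

$indexed = example_synonyms_search_preprocess('I bought a new jumper.');
// $indexed is 'I bought a new jumper. sweater'
```

As with the hyphen example, the original words are never removed, so existing searches keep working and the extra words only widen what matches.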













