Search
Search sprint goals
Following Robert Douglass' lead, I figured I would jot down a few things I'd like to focus on for the next few days while in Minneapolis. No doubt this list will get thrown out the window tomorrow when we actually sit down together, but here is my attempt, posted from the YVR airport departure lounge at 9:00AM.
Distributed search and oauth
- Research and compare distributed search modules
- Document use cases for distributed search
Looking forward to the Drupal Search Sprint in Minneapolis
Thanks to sponsorship from OpenBand (see Boris' post on some of the other work we've done with OpenBand over the years), I will be heading off to Minneapolis during the second weekend in May (May 8-11) for what promises to be an intense, interesting, productive weekend. Robert Douglass has been hard at work organizing the Drupal Search Design sprint that is taking place there. The goal of the sprint is to extend or redesign the Drupal search framework to support integrating engines such as SOLR, frameworks and strategies such as faceted search and distributed search, and other services or agents involved in the production, consumption, or intermediation of information indexing and retrieval... in other words, everything imaginable to do with search.
Tweaking Drupal Search
I've recently been doing work extending and adapting Drupal's search module, and I thought I'd take the opportunity to show just how easy it is to tweak search module's indexing behaviour.
One of the issues I was asked to address is the default behaviour of search module with respect to hyphenated words. One might reasonably expect that searching for 'intuitive' or 'counter' would find a post containing the hyphenated word 'counter-intuitive'. However, search module will by default not return a match unless you type 'counter-intuitive', or 'counterintuitive'.
Reading through the code for search module, we find that embedded dots, underscores and hyphens are simply stripped out of words, to allow meaningful search behaviour for URLs and acronyms.
Fair enough - we'd like searches for 'F.B.I.' to match documents containing 'FBI', and vice-versa. But it is counter-intuitive that searching for the constituents of a hyphenated word won't necessarily find posts containing that word.
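To make the default behaviour concrete, here is a minimal sketch in Python (not the module's actual PHP). It assumes that stripping means deleting every dot, underscore and hyphen from a word. The names `simplify_word` and `index_terms` are hypothetical, and the real search module does considerably more than this.

```python
import re

# Assumption: the three characters named above are simply deleted from each word.
STRIPPED_CHARACTERS = re.compile(r"[._-]")


def simplify_word(word: str) -> str:
    """Return the word with dots, underscores and hyphens removed."""
    return STRIPPED_CHARACTERS.sub("", word)


def index_terms(text: str) -> set[str]:
    """Split text on whitespace and simplify each word (hypothetical helper)."""
    return {simplify_word(word) for word in text.split()}


terms = index_terms("a counter-intuitive result from the F.B.I.")
print("FBI" in terms)               # the acronym is indexed without dots
print("counterintuitive" in terms)  # the compound is indexed as one word
print("counter" in terms)           # the constituent is not indexed on its own
```

Under these assumptions the acronym case works as hoped, while 'counter' and 'intuitive' never reach the index as separate terms.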
Fixing this seemed like it might involve a lot of work. I was happy to discover that the architecture of search module allowed me to enable the behaviour I wanted for hyphens, without breaking the nice default behaviour for such things as acronyms, with a half-dozen lines of code.
The key to doing this is the search_preprocess() function, which invokes the hook_search_preprocess() hook in all modules that implement it. The hook takes a string (initially the text to be indexed) and returns a transformed version of the text.
Fortunately, this hook is invoked before hyphens and such are stripped out. It is possible to imagine many applications of this kind of transformation, but it is easy to see that we can use it to append the individual words constituting a hyphenated compound to the text.
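The transformation itself can be sketched as follows. This is an illustration in Python of the text-in, text-out logic only; in Drupal it would live in a PHP implementation of hook_search_preprocess(). The function name `expand_hyphenated` and the exact regular expression are my assumptions, not code from the module.

```python
import re

# Assumption: a hyphenated compound is two or more runs of word characters
# joined by single hyphens, e.g. "counter-intuitive".
HYPHENATED = re.compile(r"\w+(?:-\w+)+")


def expand_hyphenated(text: str) -> str:
    """Append the constituents of each hyphenated compound to the text."""
    extra = []
    for compound in HYPHENATED.findall(text):
        extra.extend(compound.split("-"))
    if not extra:
        # Nothing hyphenated: return the text untouched.
        return text
    return text + " " + " ".join(extra)


print(expand_hyphenated("a counter-intuitive result"))
```

Because the original compound is left in place, the later stripping step still produces 'counterintuitive', and the appended words make 'counter' and 'intuitive' searchable as well.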
Drupal OpenSearch Aggregator
I just committed a working version of my new OpenSearch Aggregator module to Drupal Contrib CVS.
OpenSearch is a standard by Amazon which allows you to share search results through RSS. The feeds are valid RSS, they just contain extra meta-data for searching. So, you can use OpenSearch with any RSS reader to set up feeds to track tags or keywords for example.
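As a rough illustration of what that extra meta-data looks like to a consumer, here is a small Python sketch that reads a result feed. It assumes the OpenSearch 1.1 response elements and namespace; the sample feed, its URLs and the function name `parse_opensearch_feed` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed uses the OpenSearch 1.1 response namespace.
OPENSEARCH_NS = {"opensearch": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical feed: ordinary RSS 2.0 plus a few opensearch: elements.
SAMPLE_FEED = """
<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Search results for drupal</title>
    <link>http://example.com/search/drupal</link>
    <description>Hypothetical result feed</description>
    <opensearch:totalResults>2</opensearch:totalResults>
    <opensearch:startIndex>1</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item><title>First hit</title><link>http://example.com/node/1</link></item>
    <item><title>Second hit</title><link>http://example.com/node/2</link></item>
  </channel>
</rss>
""".strip()


def parse_opensearch_feed(xml_text: str) -> dict:
    """Extract the total result count and the (title, link) pairs from a feed."""
    channel = ET.fromstring(xml_text).find("channel")
    total = channel.findtext("opensearch:totalResults", namespaces=OPENSEARCH_NS)
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return {"total_results": int(total), "items": items}


print(parse_opensearch_feed(SAMPLE_FEED))
```

A plain RSS reader would ignore the opensearch: elements and still show the two items, which is why these feeds work in any reader.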
We also have an OpenSearch client module that provides these feeds, and I just updated it to send search relevance information along. So, you could set up 5 Drupal sites with OpenSearch module, and a sixth site with the OpenSearch aggregator. Now, you can search all 5 sites simultaneously, and get a single, ordered list of global results.
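The aggregation step can be pictured with a short Python sketch. It is not the module's code: the `Result` class, the `score` field and the assumption that relevance values from different sites are directly comparable are all mine.

```python
from dataclasses import dataclass


@dataclass
class Result:
    site: str
    title: str
    score: float  # hypothetical relevance value, assumed comparable across sites


def merge_results(per_site: dict[str, list[Result]]) -> list[Result]:
    """Flatten the per-site result lists into one list, most relevant first."""
    merged = [result for results in per_site.values() for result in results]
    return sorted(merged, key=lambda result: result.score, reverse=True)


# Hypothetical results from two of the five sites.
results = merge_results({
    "site-a": [Result("site-a", "Search tips", 0.9), Result("site-a", "About", 0.2)],
    "site-b": [Result("site-b", "Search module", 0.7)],
})
for result in results:
    print(result.site, result.title, result.score)
```

In practice scores from independent sites may need normalising before they can be compared; the sketch sidesteps that by assuming a shared scale.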
However, because OpenSearch is an open standard, it can be used for anything. Amazon's A9 search already offers media search for example. The possibilities really are endless.
The best part? The OpenSearch Aggregator presents its results through the normal search system. So, if you install the OpenSearch module on top of this, you automatically provide OpenSearch feeds for the aggregated search. In other words, Drupal is now a complete OpenSearch processing suite! There is no other CMS out there that can claim this.
More info is on the Drupal.org project page.
[cross-posted from acko.net]













