AJAX-SearchSuggest in Mediawiki

In a v1.15.1-mediawiki I had the following problem:

The searchsuggest-extension didn’t function properly. The suggestions only showed up for pagetitles with only capital letters at the beginning of the title. In another wiki whose MW-installation was an exact copy of this wiki the extension worked like a charme. quite curious.

To narrow the problem we made the wiki to show us the exact sql-query it uses to get the searchsuggestions. To do so we edited the file /mediawiki/includes/db/Database.php. After line 1025 in the function “select” we temporarily inserted an “echo $sql“:

function select( $table, $vars, $conds='', $fname = 'Database::select', $options = array(), $join_conds = array() )
		$sql = $this->selectSQLText( $table, $vars, $conds, $fname, $options, $join_conds );
		echo $sql;
		return $this->query( $sql, $fname );

After this the query showed up in the suggestbox and we got this:

SELECT  page_title  FROM `page`
WHERE page_namespace = '0' AND (UPPER(page_title) LIKE '%CPU%')

After some playing around with this query directly on the database it turned out, that the “UPPER(page_title)” didn’t work. The query only gave results when the pagetitle was already in capital letters – at least the part which matched the searchstring.

Comparing this db to the db of the properly working wiki, we found that the working db had been set up with “DEFAULT CHARSET=utf8” while the malfunctioning one had “DEFAULT CHARSET=binary“. So the pagetitle in the latter one was stored just as plain binary-code with no information about the encoding charset. Without this info the db is not able to find a matching uppercase letter to a given lower one.

Knowing this I wondered why Mediawiki uses the “binary”-setting as the default since some time now – because this makes all string-functions in the db useless. In earlier versions MW didn’t seem to make a fuss about charset-encodings and just used the mysql-default which was, and still is, “latin1”. So on older versions of MW, and newer ones with an explicitly set charset, the “searchsuggest” should work without any problems.

But what about the ones with “binary-encoding” enabled?

On the extension’s webpage you read this:

This extension is obsolete!
It has been replaced by core functionality in the MediaWiki software
(which was added in version 1.13.0).
See Manual:$wgEnableMWSuggest for the new core feature.

So you don’t need the extension anymore. Just put the named parameter “$wgEnableMWSuggest = TRUE;” in your LocalSettings.php and off you go.

I think this new suggestbox is even more ergonomical to use than the older one and also doesn’t blow up the layout like old SearchSuggest. But it has one discomfort: The search is casesensitive. To get around this you have to install the TitleKey-extension, which adds another table to your db. This additional table holds all the page_titles in uppercase letters to circumvent the problem with non-working UPPER/LOWER-db-functions when using “binary”-charset. TitleKey turns the searchstring into uppercase via PHP and compares it against this table – and this way makes the search case-insensitive.

Doing a “contains”-search

The normal behaviour of the searchsuggest is to match the searchstring against the beginning of the pagetitles – getting all pages that start with “searchstring”.
To get all pages that contain the searchstring somewhere in the title one could easily modify the TitleKey-Extension. To do that open the file “/extensions/TitleKey/TitleKey_body.php” and in the “static function prefixSearch” (line 188 or so) change the line ~202 like this:

old: 'tk_key LIKE \'' . $dbr->escapeLike( $key ) . '%\'',

new: 'tk_key LIKE \'%' . $dbr->escapeLike( $key ) . '%\'',

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: