What Jessie Did Next...

...being the inane ramblings of a mundane Yorkshire bird.

Tag: programming

I finally got sick of my Apple Airport Express network not quite behaving, and having to jump through hoops to play from the big fileserver holding my MP3 collection, and having to deal with sodding iTunes. My current workplace has a pair of Sonos units which are used in the office as a jukebox for the staff to play whatever music they want, and that seemed to tick most of my boxes.

Continue reading the full article…

I finally got sick of the blog spam under my very hacked up version of Nucleus, so I’ve shifted myself to using WordPress. Apologies if the RSS feed has just given you the past 10 articles as being ‘new’, or whatever.

Anyway, in porting all the data from Nucleus I happened upon nucleus2wordpress.pl which purported to be a hack to shift all the posts from old versions of Nucleus into, er, WordPress. It didn’t work first time since the categories system within WordPress has changed so I’ve frigged it to work with the new WordPress taxonomy system.

You can download my updated version (which works with WordPress 3.3) here – no guarantees implied etc. and it depends on your Nucleus database being the same as your WordPress database (I just copied the tables over and then removed the Nucleus ones when I was happy). Enjoy!

I’ll be back blogging soon. ‘Interesting times’ over the past six months have taken priority…

Addendum: After running this script and becoming more familiar with the concepts involved in the WordPress database, I discovered the comment counts were incorrect (even though the imported comments themselves appeared within the posts). To correct this, run the following on your WordPress database (remembering to back it up first!):

UPDATE wp_posts wpp
 LEFT JOIN
 (SELECT comment_post_id AS c_post_id, count(*) AS cnt FROM wp_comments
 WHERE comment_approved = 1 GROUP BY comment_post_id) wpc
 ON wpp.id=wpc.c_post_id
 SET wpp.comment_count=wpc.cnt
 WHERE wpp.post_type IN ('post', 'page')
 AND (wpp.comment_count!=wpc.cnt OR (wpp.comment_count != 0 AND wpc.cnt IS NULL));

This worked fine on WordPress 3.3 and corrected the comment counters on the front page and search results.

Once again in a contract I find myself writing a small search engine, this time for an internal Intranet content management system. The client has several use-cases of which several contain words 3 characters long such as ‘cog’ and ‘net’. Additionally, they’ve got a couple of abbreviations with punctuation in (‘H&S’ being one). And once again I find myself explaining to the boss why the MySQL fulltext search doesn’t return a full set of results because (out-of-the-box) the fulltext defaults aren’t very useful.

So here’s why, and how you might be able to fix the issues (and ‘don’t use MySQL’ is not going to be an answer here, dolphinistas)…

For a start, it only indexes words of 4 characters or more, so if you’re trying to index ‘cog’ or ‘net’ then it won’t unless you explicitly configure MySQL to enable this behaviour using the ‘ft_min_word_len config parameter. Perish the thought if you’re search for your own name and you’re called Bob, because MATCH(keywords) AGAINST(“bob”) will always return nothing, as will searching for the classic album ‘Let It Be’.

Then let’s take the stopword list. Stopwords in MySQL are words which won’t be indexed – common words such as ‘the’, ‘and’, ‘or’, etc. (there’s a full list here if you’re curious). However, it also prevents indexing of words such as ‘will’ – a real bugger if you’re indexing a list of users: pity the poor guy in sales called Will who can’t find his own name. You can give MySQL an alternative stoplist in the config, or disable it completely using ‘ft_stopword_file‘.

(It’s not looking good for poor Will or Bob is it!)

Finally, we’ve got the punctuation. This one is a little bit more silly because of the MySQL boolean search which uses various forms of punctuation to specify which words are mandatory, optional, grouped, and so on: the default list is ‘+ -><()~*:””&|‘ (note that we’ve got & in there). Personally I change & to ? since I’m less likely to index with the latter, using ‘ft_boolean_syntax‘ in the config file, and things like ‘H&S’ will get indexed.

Therefore your my.cnf gets the following extra bits:

ft_boolean_syntax=+ -><()~*:""?|
ft_min_word_len=3
ft_stopword_file=

So, mostly sortable right? Wrong: these are server-wide variables and if you alter them you must reindex all fulltext-indexed tables using REPAIR TABLE (or the dreaded myisamchk if you have a lot to do). Now read that again: it’s server-wide so will affect all databases on that server… As an example, the client I’m working with right now shares the CMS database server with a MediaWiki installation which has configured in some places therefore it comes under the banner of ‘service-impacting’ straight away. It also means if you’re on a shared hosted service, no matter how much you lick your ISP it’s unlikely that they’ll change the config on a shared box full of users (with all the reindexing that implies) just for you.

I understand that the InnoDB engine is getting fulltext capability in the next major version of MySQL – about time too: I hope that will herald fulltext configuration changes on a per-database (or even more usefully a per-table) basis.

(Postscript: Nowadays I tend to tackle most of this by creating a search index from data I need to make searchable, a simple script which assembles data and removes unneeded words in a self-configured stoplist table (that way I can also write something which warns the user that not all their search terms were used). I still need to do the changes above but it affords me a lot more control over index sizes, etc.)