Reversing MySQL’s Fulltext Index

MySQL’s Fulltext Index is very useful. I’ve written a little about it before.

With Bibleshark, we want to provide a text search and a Concordance. We tried reinventing the wheel using PHP, but this is going to be a much easier / more maintenance friendly task.

shell> sudo myisam_ftdump -dv /var/lib/mysql/database/table_name.MYI 6 > words

myisam_ftdump can take all of the words that were indexed by the fulltext index and give you a list similar to this one:


ee6818 0.8787346 zuzim
12981f0 0.8787346 zuzim
175856c 0.8699434 zuzim
20d9e44 0.8787346 zuzim
258c91c 0.8699434 zuzim
2a984c8 0.8877053 zuzim
3023d3c 0.8787346 zuzim
35cd580 0.8787346 zuzim
c0e8 0.8787346 zuzims
1c1d43c 0.8787346 zuzims

Unfortunately, the hex number on the far left does not have any meaning to our database.

Here’s how I got it working:

  • Import the words into MySQL.
  • First I took the text file from my system, imported it to Excel and exported it as a CSV. (Hint: It’s fixed width.)
  • Then, I used SQLyog to import the words into MySQL.
  • Next, I ran this php script:
<?php
/* table: scripture_words was the imported table from myisam_ftdump. 
site_scripture is our existing scripture table */
mysql_connect("host", "user", "password");
mysql_select_db("database");
$word_sql = "SELECT scripture_word from site_words";
$word_result = mysql_query($word_sql) or die(mysql_error());
while($data = mysql_fetch_assoc($word_result)) {
        $match_sql = "SELECT id FROM site_scripture WHERE 
                 MATCH(scripture) AGAINST('" . addslashes($data&#91;'scripture_word'&#93;) . "')";
        $match_result = mysql_query($match_sql) or die(mysql_error());
        while($match_data = mysql_fetch_assoc($match_result)) {
                $insert_sql = "INSERT INTO scripture_word(word, scripture_id) 
                        VALUES('" . addslashes($data&#91;'scripture_word'&#93;) . "', " . $match_data&#91;'id'&#93; . ")";
                @mysql_query($insert_sql);
        }
}
?>

Now the data is built and in a useful format for us to let Django to take care of the rest.

[08-24-2008] UPDATE: I’ve automated this process.

sudo myisam_ftdump -dv /var/lib/mysql/database/tablename.MYI COL_NUMBER 
| awk '{ print $3 }' 
| uniq
 &gt; word_list.txt
LOAD DATA INFILE '/path/to/word_list.txt' INTO TABLE database.table(column);
Advertisements

MySQL EXPLAIN Visualization with mktools-visual-explain

EXPLAIN statement visualization is something I love about Aqua Data Studio (we use it a lot with DB2) and SQL Server Management Studio. It takes the result of your EXPLAIN and displays it in an easier to understand diagram. Here’s example output from SQL Server:

visual-explain-ssms.png

Aqua Data Studio doesn’t support MySQL Visual Explains (yet). The only way I’ve been able to find EXPLAIN visualization is with maatkit and mk-visual-explain.

Here’s an explain from one of the queries in a project (columns have changed names)

Here’s the output with mk-visual-explain

Which one is easier to understand?

SQL Server: KILL Processes using a database

If you need to do some operations on a database, sometimes you’re required to be the only user. Typically you will execute this procedure:

exec sp_who

Then you’ll get the spid‘s of each one related to your database and KILL them. The problem is there’s no ability to put a WHERE clause with sp_who.

use master;
select 'kill ', spid from sysprocesses p
	inner join sysdatabases d
	on p.dbid = d.dbid
WHERE d.name = 'DBNAME'

Say spid’s 4, 8, 15, 16, 23, and 42 are using the database ‘DBNAME’. Now you can just copy/paste your results window as a set of new queries:

kill 4
kill 8
kill 15
kill 16
kill 23
kill 42

MySQL Goodies

MySQL: The Binary Log

From 5.2.3. The Binary Log, 6.2.2. Using Backups for Recovery, 6.3. Point-in-Time Recovery

Starting mysql like with --log-bin switched does this:

The binary log contains all statements that update data or potentially could have updated it (for example, a DELETE which matched no rows). Statements are stored in the form of “events” that describe the modifications. The binary log also contains information about how long each statement took that updated data.

This is near the equivalent of a transaction log in other DBMSes. You will notice a minimal (1%) loss in performance, but you’ll be able to restore your database to specified points in time.

To restore, run two commands:


shell> mysql < backup_sunday_1_PM.sql
shell> mysql mysqlbinlog host-bin.000007 host-bin.000008 | mysql

There are a few ways to deal with your logs from a maintenance standpoint.

You can periodically run this command:

PURGE MASTER LOGS

Or

shell> mysqladmin flush-logs

But my choice goes to changing expire_logs_days in /etc/my.cnf to an appropriate number (I like 30 days). You will have to bounce MySQL after updating /etc/my.cnf.

MySQL: Selecting a Random Record

From ~jk ORDER BY RAND()

SELECT col1, col2
  FROM table JOIN
       (SELECT CEIL(RAND() *
                    (SELECT MAX(id)
                       FROM table)) AS id
        ) AS r2
       USING (id);

MySQL: Full Text Search

From 11.8.1. Natural Language Full-Text Searches and MySQL SQL Syntax and Use

To examine your existing fulltext configuration directives:

show variables where variable_name like 'ft_%'

To change the minimum length of words to caught in your fulltext indexes, change the value of ft_min_word_len.

Example fulltext query:

SELECT * FROM apothegm
    WHERE MATCH(attribution, phrase)
    AGAINST('bell');

If you really want your fulltext searches to go to the next level, take a look at Stemming. This changes words into their “root” words so their variances can be found easier.

MySQL Closest GIS Location

With Latitude/Longitude stored in two seperate fields:

SELECT * FROM table 
ORDER BY SQRT(POW(fieldLong - givenLong, 2) + POW(fieldLat - givenLat, 2))