Reversing MySQL’s Fulltext Index

MySQL’s Fulltext Index is very useful. I’ve written a little about it before.

With Bibleshark, we want to provide a text search and a Concordance. We tried reinventing the wheel using PHP, but this is going to be a much easier / more maintenance friendly task.

shell> sudo myisam_ftdump -dv /var/lib/mysql/database/table_name.MYI 6 > words

myisam_ftdump can take all of the words that were indexed by the fulltext index and give you a list similar to this one:


ee6818 0.8787346 zuzim
12981f0 0.8787346 zuzim
175856c 0.8699434 zuzim
20d9e44 0.8787346 zuzim
258c91c 0.8699434 zuzim
2a984c8 0.8877053 zuzim
3023d3c 0.8787346 zuzim
35cd580 0.8787346 zuzim
c0e8 0.8787346 zuzims
1c1d43c 0.8787346 zuzims

Unfortunately, the hex number on the far left does not have any meaning to our database.

Here’s how I got it working:

  • Import the words into MySQL.
  • First I took the text file from my system, imported it to Excel and exported it as a CSV. (Hint: It’s fixed width.)
  • Then, I used SQLyog to import the words into MySQL.
  • Next, I ran this php script:
<?php
/* table: scripture_words was the imported table from myisam_ftdump. 
site_scripture is our existing scripture table */
mysql_connect("host", "user", "password");
mysql_select_db("database");
$word_sql = "SELECT scripture_word from site_words";
$word_result = mysql_query($word_sql) or die(mysql_error());
while($data = mysql_fetch_assoc($word_result)) {
        $match_sql = "SELECT id FROM site_scripture WHERE 
                 MATCH(scripture) AGAINST('" . addslashes($data&#91;'scripture_word'&#93;) . "')";
        $match_result = mysql_query($match_sql) or die(mysql_error());
        while($match_data = mysql_fetch_assoc($match_result)) {
                $insert_sql = "INSERT INTO scripture_word(word, scripture_id) 
                        VALUES('" . addslashes($data&#91;'scripture_word'&#93;) . "', " . $match_data&#91;'id'&#93; . ")";
                @mysql_query($insert_sql);
        }
}
?>

Now the data is built and in a useful format for us to let Django to take care of the rest.

[08-24-2008] UPDATE: I’ve automated this process.

sudo myisam_ftdump -dv /var/lib/mysql/database/tablename.MYI COL_NUMBER 
| awk '{ print $3 }' 
| uniq
 &gt; word_list.txt
LOAD DATA INFILE '/path/to/word_list.txt' INTO TABLE database.table(column);
Advertisements

About andyhillky
I'm cool.

2 Responses to Reversing MySQL’s Fulltext Index

  1. Stephen Mizell says:

    Is that PHP!? Noooooooo!

  2. adam says:

    steveo i was looking through some files here at the office and i found bofe’s $13.37 paypal receipt for lifetime php fan club membership

%d bloggers like this: