We launched Bibleshark’s Parallel Concordance

The concordance is live. Go look around.

It’s an early release – we know it’s a little rough. We just wanted to show everyone our collection of information.

Some of the “How we did it” is featured in this post: Reversing MySQL’s Fulltext Index.

Advertisements

Reversing MySQL’s Fulltext Index

MySQL’s Fulltext Index is very useful. I’ve written a little about it before.

With Bibleshark, we want to provide a text search and a Concordance. We tried reinventing the wheel using PHP, but this is going to be a much easier / more maintenance friendly task.

shell> sudo myisam_ftdump -dv /var/lib/mysql/database/table_name.MYI 6 > words

myisam_ftdump can take all of the words that were indexed by the fulltext index and give you a list similar to this one:


ee6818 0.8787346 zuzim
12981f0 0.8787346 zuzim
175856c 0.8699434 zuzim
20d9e44 0.8787346 zuzim
258c91c 0.8699434 zuzim
2a984c8 0.8877053 zuzim
3023d3c 0.8787346 zuzim
35cd580 0.8787346 zuzim
c0e8 0.8787346 zuzims
1c1d43c 0.8787346 zuzims

Unfortunately, the hex number on the far left does not have any meaning to our database.

Here’s how I got it working:

  • Import the words into MySQL.
  • First I took the text file from my system, imported it to Excel and exported it as a CSV. (Hint: It’s fixed width.)
  • Then, I used SQLyog to import the words into MySQL.
  • Next, I ran this php script:
<?php
/* table: scripture_words was the imported table from myisam_ftdump. 
site_scripture is our existing scripture table */
mysql_connect("host", "user", "password");
mysql_select_db("database");
$word_sql = "SELECT scripture_word from site_words";
$word_result = mysql_query($word_sql) or die(mysql_error());
while($data = mysql_fetch_assoc($word_result)) {
        $match_sql = "SELECT id FROM site_scripture WHERE 
                 MATCH(scripture) AGAINST('" . addslashes($data&#91;'scripture_word'&#93;) . "')";
        $match_result = mysql_query($match_sql) or die(mysql_error());
        while($match_data = mysql_fetch_assoc($match_result)) {
                $insert_sql = "INSERT INTO scripture_word(word, scripture_id) 
                        VALUES('" . addslashes($data&#91;'scripture_word'&#93;) . "', " . $match_data&#91;'id'&#93; . ")";
                @mysql_query($insert_sql);
        }
}
?>

Now the data is built and in a useful format for us to let Django to take care of the rest.

[08-24-2008] UPDATE: I’ve automated this process.

sudo myisam_ftdump -dv /var/lib/mysql/database/tablename.MYI COL_NUMBER 
| awk '{ print $3 }' 
| uniq
 &gt; word_list.txt
LOAD DATA INFILE '/path/to/word_list.txt' INTO TABLE database.table(column);

Re: Is SQL Server ‘keeping up’ with MySQL?

I came across this article in my DBA readings: Is SQL Server keeping up with MySQL?

Since I’ve got experience on both sides of the fence, I’ll chime in and try to not be biased.

When did MySQL finally add stored procedures?

Version 5.0. From my experience, they are young/buggy. For instance no variables for LIMITs or default parameter values.

How about transaction support, SSIS-like capabilities, a scheduled job management system, and a relatively easy-to-use consolidated GUI management tool?

Version 4.1. The rest of the question is a bit loaded.

MySQL as a query engine has lots of nice features and performs well. You should always use the platform that will meet your needs.

Simply put, MySQL’s administration tools don’t cut it.

“LIMIT” notwithstanding, if SQL Server’s functional specs weren’t already light years beyond MySQL with SQL Server 2005, it will certainly leave them in the dust when SQL Server 2008 hits the streets.

I don’t know about light years. Especially for the query engine.

SQL Server’s query engine is still greatly ahead of MySQL and the administrative tools are light years ahead.

MySQL EXPLAIN Visualization with mktools-visual-explain

EXPLAIN statement visualization is something I love about Aqua Data Studio (we use it a lot with DB2) and SQL Server Management Studio. It takes the result of your EXPLAIN and displays it in an easier to understand diagram. Here’s example output from SQL Server:

visual-explain-ssms.png

Aqua Data Studio doesn’t support MySQL Visual Explains (yet). The only way I’ve been able to find EXPLAIN visualization is with maatkit and mk-visual-explain.

Here’s an explain from one of the queries in a project (columns have changed names)

Here’s the output with mk-visual-explain

Which one is easier to understand?

Showdown: MySQL vs. SQL Server 2005 (Backups Edition)

My disaster recovery needs from a database standpoint:

  1. Ability to restore to a point in time
  2. Retention period of 4 weeks
  3. E-mail notification on failure

MySQL Backups

A basic cron job dumping your databases on a daily basis:

shell> crontab -e

Add this to your crontab:


0 2 * * * mysqldump -ubackup_user -ppasswordhere --all-databases | gzip -c > /path/to/mysql_backups/backup_`date +%Y-%m-%d`.sql.gz

This will run at 2:00 AM daily and back up all of your databases. It will produce a fancy .sql.gz file that has a date stamp. Example: backup_2008-03-12.sql.gz. For generating Cron jobs, look at this site.

This is nice, but you’ll have to pay attention to your backup folder because eventually you will run out of disk space. You will not receive any notification when the backups fail or when you run out of disk space from this method. You also see that it gives anyone who has access to your crontab access to your password. Make sure this is a database user that has READ ONLY access to all dbs/tables.

SQL Server 2005 Backups

Some will say I’m a little biased… but this is one of the parts of SQL Server that makes it worth the price. If an image is blurry in the display, click it for full resolution.

  1. Create a new Maintenance plan
  2. Click ‘Back Up Database’ on the toolbox.
    toolbox.png
  3. Drag to the blank window.
  4. Select your databases, and various options for your backups. I recommend using ‘verify backup integrity’ and having your .bak files moved off site.
    ss-maintplanoptions.png
  5. Schedule your maintenance plan.
    schedule.png
  6. Add a notify operator task (you must have Database Mail Configured) , and drag the red node to it. This will email your specified operator (I get them sent to my phone) when a backup fails.
  7. Clean up! I do separate a weekly data cleanup maintenance plan to check my indexes, delete my backups older than a month, and lots of other things. Here’s a screenshot:
    ss-maintplan-extended.png

Clearly, SQL Server 2005 meets and exceeds my disaster recovery needs over MySQL. Did I mention it was extremely easy to set up in SQL Server Management Studio?

I didn’t have to research bash datestamps or find a site to generate crontabs.

Tim suggests I use this perl script for deleting old MySQL backups (run at the same interval as the backups):


#!/usr/bin/perl
use File::Find;
my $path = shift;
my $threshold = 60*60*24*28;
find(sub { /mysqlbackup\.tar\.gz$/i and
unlink "$_" and
print "$File::Find::name deleted\n\n" and
(( ((stat($_))[9] ) + $threshold) > time);
}, $path);

Also from the comments:

find /path/to/mysql_backups/ -type f -mtime +N -exec rm -rf {} \;

USE AT YOUR OWN RISK.

MySQL Goodies

MySQL: The Binary Log

From 5.2.3. The Binary Log, 6.2.2. Using Backups for Recovery, 6.3. Point-in-Time Recovery

Starting mysql like with --log-bin switched does this:

The binary log contains all statements that update data or potentially could have updated it (for example, a DELETE which matched no rows). Statements are stored in the form of “events” that describe the modifications. The binary log also contains information about how long each statement took that updated data.

This is near the equivalent of a transaction log in other DBMSes. You will notice a minimal (1%) loss in performance, but you’ll be able to restore your database to specified points in time.

To restore, run two commands:


shell> mysql < backup_sunday_1_PM.sql
shell> mysql mysqlbinlog host-bin.000007 host-bin.000008 | mysql

There are a few ways to deal with your logs from a maintenance standpoint.

You can periodically run this command:

PURGE MASTER LOGS

Or

shell> mysqladmin flush-logs

But my choice goes to changing expire_logs_days in /etc/my.cnf to an appropriate number (I like 30 days). You will have to bounce MySQL after updating /etc/my.cnf.

MySQL: Selecting a Random Record

From ~jk ORDER BY RAND()

SELECT col1, col2
  FROM table JOIN
       (SELECT CEIL(RAND() *
                    (SELECT MAX(id)
                       FROM table)) AS id
        ) AS r2
       USING (id);

MySQL: Full Text Search

From 11.8.1. Natural Language Full-Text Searches and MySQL SQL Syntax and Use

To examine your existing fulltext configuration directives:

show variables where variable_name like 'ft_%'

To change the minimum length of words to caught in your fulltext indexes, change the value of ft_min_word_len.

Example fulltext query:

SELECT * FROM apothegm
    WHERE MATCH(attribution, phrase)
    AGAINST('bell');

If you really want your fulltext searches to go to the next level, take a look at Stemming. This changes words into their “root” words so their variances can be found easier.

MySQL Closest GIS Location

With Latitude/Longitude stored in two seperate fields:

SELECT * FROM table 
ORDER BY SQRT(POW(fieldLong - givenLong, 2) + POW(fieldLat - givenLat, 2))

Rewrite’s Complete

That wasn’t too bad. It took me probably two days total to completely overhaul the app’s model to an object based design. Those two days are going to save me many many days down the road.

More updates today… lunch time.