Reversing MySQL’s Fulltext Index

MySQL’s Fulltext Index is very useful. I’ve written a little about it before.

With Bibleshark, we want to provide a text search and a Concordance. We tried reinventing the wheel using PHP, but this is going to be a much easier / more maintenance friendly task.

shell> sudo myisam_ftdump -dv /var/lib/mysql/database/table_name.MYI 6 > words

myisam_ftdump can take all of the words that were indexed by the fulltext index and give you a list similar to this one:

ee6818 0.8787346 zuzim
12981f0 0.8787346 zuzim
175856c 0.8699434 zuzim
20d9e44 0.8787346 zuzim
258c91c 0.8699434 zuzim
2a984c8 0.8877053 zuzim
3023d3c 0.8787346 zuzim
35cd580 0.8787346 zuzim
c0e8 0.8787346 zuzims
1c1d43c 0.8787346 zuzims

Unfortunately, the hex number on the far left does not have any meaning to our database.

Here’s how I got it working:

  • Import the words into MySQL.
  • First I took the text file from my system, imported it to Excel and exported it as a CSV. (Hint: It’s fixed width.)
  • Then, I used SQLyog to import the words into MySQL.
  • Next, I ran this php script:
/* table: scripture_words was the imported table from myisam_ftdump. 
site_scripture is our existing scripture table */
mysql_connect("host", "user", "password");
$word_sql = "SELECT scripture_word from site_words";
$word_result = mysql_query($word_sql) or die(mysql_error());
while($data = mysql_fetch_assoc($word_result)) {
        $match_sql = "SELECT id FROM site_scripture WHERE 
                 MATCH(scripture) AGAINST('" . addslashes($data&#91;'scripture_word'&#93;) . "')";
        $match_result = mysql_query($match_sql) or die(mysql_error());
        while($match_data = mysql_fetch_assoc($match_result)) {
                $insert_sql = "INSERT INTO scripture_word(word, scripture_id) 
                        VALUES('" . addslashes($data&#91;'scripture_word'&#93;) . "', " . $match_data&#91;'id'&#93; . ")";

Now the data is built and in a useful format for us to let Django to take care of the rest.

[08-24-2008] UPDATE: I’ve automated this process.

sudo myisam_ftdump -dv /var/lib/mysql/database/tablename.MYI COL_NUMBER 
| awk '{ print $3 }' 
| uniq
 &gt; word_list.txt
LOAD DATA INFILE '/path/to/word_list.txt' INTO TABLE database.table(column);

Database Connectivity With Linux & PHP to IBM iSeries & DB2


I work with a local government, where the majority of our existing information systems are on an IBM iSeries machine.

The entire city runs on what is referred to as “green screen” apps. I was charged with the task of bringing our municipal utility company’s bill payment online. Well, not exactly bringing it online… but giving what is currently online a major overhaul.

The existing site was done with an abhorrent product called IBM WebSphere. Apparently people have made pretty impressive applications with this product, but I don’t see how. I spent three months in what I’ve dubbed “JSP hell” (while also working on other PHP/MySQL websites) trying to figure out what a Web Interaction is and then trying to figure out what COBOL program runs when a certain form is submitted. I’m not even going to start on COBOL.

Scattered throughout the JSP code were the one thing that made sense besides the HTML, SQL queries.

unix ODBC to the rescue

NOTE: This was all done on SUSE 10 Enterprise. Your mileage may very, blah blah blah, the usual disclaimer.

Per this wonderful, difficult to find IBM Redbook I found out that I need to install something called iSeries Access (IBM Account Required) – the software is in the form of a RPM.

[shell] rpm -ivh iSeriesAccess-5.2.0-X.YY.i386.rpm

We already had unixODBC installed.

After iSeries Access is installed, run this command:

[shell] odbcinst -j

You’ll get a path to some .ini files. Ours were in /etc/unixODBC/. Fire up your favorite text editor and edit /etc/unixODBC/odbc.ini.

Here is the sample config file that IBM references in the redbook. System is the IP or Domain name of the iSeries machine. Everything else is pretty self explanatory. Refer back to the redbook for clarification on a few of the settings if you need to.

After you’re done editing your file, verify that it works by running this command:

[shell] isql [Your DSN Name] [Your username] [Your Password]
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |

Now, you may need to reconfigure PHP to have unixODBC (use ./configure –with-unixODBC) . I didn’t.

After that, it’s simple: use odbc_connect() and you’re on your way.

Django Changed The Game

It all started with a pretty ordinary project request from our Parks & Recreation department – “we want to be able to enter/edit scores for our softball games.”

So, like any good developer, requested a meeting so I could really understand what was going on… it boiled down to this kind of these database entities:

scheduled games
completed games

Here’s a more traditional view of the design: (click it for a full view)

Keep in mind, only about four or five people (max) need to be able to have CRUD abilities with this data.

However, designing a system like this with PHP would be pretty time consuming to say the least. Even with my nifty little utils and db classes that I tout around.

Stevo had been ranting and raving about Django, specifically it’s administrative interface. So, I gave it a try. Luckily, I had been intending on doing this for some time. My server at home was ready to roll – Django was installed and I was ready to dive right in. A few hours later and with the guidance of Stevo, my model was made.

The source is simple. Python was very intuitive, except for a weird placement of a tuple ( [[ here it was ::: None, {‘fields’: (‘name’,)}),) ]] but BAM. The administrative interface was done before lunch.

PHP is unable to compete with this rapid administration capability. Now to get this put on our production server…

Push != Pull, or Why You Should RTFM

My first idea for an F8 app has tanked. I should have read those docs a little further…

I was making a simple countdown, because lot of friends of mine used their status to count down to an event.

From the FAQ:

Is there a way I can avoid pushing content individually to each user?

You can specify default FBML in your application settings, but if you want the boxes to look different for each user, you must manually push to those users. The content rendered in the profile at the time of display must come entirely from our servers to ensure speed and privacy and some level of design sensibility (hence, this is a “push” and not a “pull” model). You can make dynamic content in response to a user’s interaction with your module (via “mock ajaxv or flash). If there is content you want to share across multiple people’s profiles, consider using the tag.

Well… damn. The countdown won’t update itself upon every profile view.

So I thought, I know… I’ll use a PHP generated image, just like I have on my blog… Nope. From the FBML docs:

img tags will also be handled specially. At publish time, any URLs given for images will be requested by our servers and then served by our own image servers. The src attribute of img tags will be rewritten. This protects the privacy of our users and allows us to better control quality of service of images.

The only other way I can envision doing this is pushing out FBML updates via Cron daily or at a specified interval… think the users would go for that?

The Demo At Last

If you’re interested in a demo of what I’ve been referring to as “the app”, figure out a way to contact me about getting a URL and a Login/Pass.

Some more notes about it:

– Moving it to the server was a great learning experience. I’ve already made substantial improvements in performance and future compatibility which unfortunately are not in the version that I have somewhere on the web for demonstration purposes.

– The application’s purpose is becoming a lot more clear. It’s designed for the end users who need to be able to easily maintain the static parts of sites. It seems that everyone around here needs some type of database functionality. The directory structure I’ve been using is “site_url/content/page” for the static types of pages and “site_url/app/application” for others. I think it’s going to be pretty nice.

– It’s still early. I haven’t decided on a good way for users to include images in their pages. I originally wanted them to use some sort of 3rd photo gallery option, but that is not really sufficient regular inline images. Suggestions/ideas are welcome. I’d rather not do the uploading and thumbnailing on the server. =)

– The idea behind this is that I still maintain 100% of the control of the site’s “structure”/”look” while someone (typically a departmental secretary) maintains the content without having to talk to me. It’s very easy for me to make a new site with XHTML/CSS that uses content from this system.

They Upgraded, I’m getting it out there

So I’m working on getting everything working so I can test.

Lessons learned so far:

-Develop on the platform you’re implement on. Since I was developing this entirely locally with a WAMP server, I’ve run into a few quirks. Mainly with my inconsistent instantiation $nav = new Navigation(); versus $nav = new navigation(); — with PHP’s autoload function it was causing my stuff to die.

– Again with platform… backslashes versus forward slashes … ugh. Let’s also remember some servers have magic quotes enabled and some do not, in case you’re inserting to a database…

– Load times are a little slower than I was hoping for… I am going to try it later tonight to see if it’s an issue here, an application issue, or a dreamhost issue.

The Last 5%

A couple of days ago I mentioned the first indoctrination of “the app” is 95% done. The last 5% is hard. I’m actually spending this time removing a couple of those oh-so-nifty AJAX features.

The problem is they were taking away from page loads. I originally wanted to “deep link” to a couple of features, such as “Edit Content for this Page” and that wasn’t too possible because the WYSIWYG interface was being pulled up through an AJAX response.

So… I’m taking that (and a couple of others) out and fixing them.

I maaaaaaaay have a demo online by the beginning of the next week for review/suggestions. Wouldn’t that be special?