Sphinx Search

Full text search is often not a nicety but a necessity. Searching a database text field with a normal SQL query can easily result in a slow query, which is NOT GOOD. Now, you might not always need full text search, especially if you have a smallish data set, but once you pass a certain threshold you’ll hit serious performance problems.

That is exactly what happened on Afrigator. We started out using normal, simple SQL SELECT queries.

SELECT * FROM posts
WHERE title LIKE '%keywords%'
OR body LIKE '%keywords%'

This was doing just great in the beginning. Nowadays our blog post table is close to a gigabyte in size and, as you can imagine, that is a lot of words to filter through to find specific terms! A LIKE pattern with a leading wildcard cannot use a regular index, so the database has to scan every row. Funny thing is, a gigabyte is still tiny in web terms. What is certain, though, is that the table will only grow, and our performance issues will grow with it!
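To make the difference concrete, here is a minimal Python sketch that contrasts a row-by-row scan (roughly what the LIKE query has to do) with a lookup in a prebuilt inverted index, which is the general idea behind full text search engines. The sample rows and the function names are made up for illustration; this is not how Sphinx itself is implemented.

```python
import re
from collections import defaultdict

# Hypothetical sample rows standing in for the posts table: (id, title, body).
POSTS = [
    (1, "Sphinx Search", "Full text search with Sphinx is fast"),
    (2, "Holiday photos", "Pictures from the coast"),
    (3, "Search tips", "How to search a large blog archive"),
]


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def like_scan(posts, keyword):
    """Mimic LIKE '%keyword%': inspect the title and body of every row."""
    needle = keyword.lower()
    return [pid for pid, title, body in posts
            if needle in title.lower() or needle in body.lower()]


def build_index(posts):
    """Build an inverted index once, up front: word -> set of post ids."""
    index = defaultdict(set)
    for pid, title, body in posts:
        for word in tokenize(title + " " + body):
            index[word].add(pid)
    return index


def index_lookup(index, keyword):
    """Answer a single-word query with one dictionary lookup."""
    return sorted(index.get(keyword.lower(), set()))


INDEX = build_index(POSTS)
print(like_scan(POSTS, "search"))     # work grows with the size of the table
print(index_lookup(INDEX, "search"))  # work grows with the number of matches
```

Note that the two are not exactly equivalent: the scan matches substrings, while this simple index only matches whole words. A real engine adds stemming, ranking and much more on top of the same basic idea.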

Thank goodness for Sphinx Search! Sphinx Search is an open source (GPLv2 license) full text search engine. It is FAST. Not only is searching fast, but so is the text indexing.

According to the Sphinx Search site, here is a list of features:

  • high indexing speed (up to 10 MB/sec on modern CPUs);
  • high search speed (avg query is under 0.1 sec on 2-4 GB text collections);
  • high scalability (up to 100 GB of text, up to 100 M documents on a single CPU);
  • provides good relevance ranking through combination of phrase proximity ranking and statistical (BM25) ranking;
  • provides distributed searching capabilities;
  • provides document excerpts generation;
  • provides searching from within MySQL through pluggable storage engine;
  • supports boolean, phrase, and word proximity queries;
  • supports multiple full-text fields per document (up to 32 by default);
  • supports multiple additional attributes per document (i.e. groups, timestamps, etc);
  • supports stopwords;
  • supports both single-byte encodings and UTF-8;
  • supports English stemming, Russian stemming, and Soundex for morphology;
  • supports MySQL natively (MyISAM and InnoDB tables are both supported);
  • supports PostgreSQL natively.

As you can see, that’s quite a nifty feature set! When we implemented it on Afrigator, we chose to run Sphinx Search as a daemon, though you could compile MySQL with Sphinx support should you choose.

It was also SUPER SIMPLE to set up! All you need to get it running for your own application is the ability to compile C++ on your web server, i.e. a shell account and the ability to compile with gcc on Linux.

One tiny hiccup I experienced was that I needed the libmysqlclient-dev package installed in order for Sphinx to compile, so install it first if you haven’t already. They don’t mention that in the documentation, so just be aware of it. Other than that, you simply follow the steps in the documentation to get it up and running.

Once it has compiled successfully, you have to set it up and configure it. More on that at a later stage. For now, if you are developing web apps that are heavily dependent on full text search, I can recommend Sphinx Search as a viable solution.
