Latest Updates: sphinxsearch

  • Articles

    Sphinx API - The basics part 1

    Stii 2:33 pm on February 24, 2009
    Tags: sphinxapi, sphinxsearch

    You’ve set up Sphinx Search and everything seems to be running smoothly.

    Now you have to put it to work. This is where the API comes into play! All the API does is let you communicate with the indexes you’ve created. Various programming languages are supported, such as Java, Python, PHP and Ruby. (NOTE: The Ruby on Rails community wrote some Rails plugins for the Ruby API.)

    Let’s look at the PHP version. The API is basically a PHP class: you create an instance of it and then use its methods to interact with the search daemon.

    First you need to add the file called sphinxapi.php to your include path. In your application, require the sphinxapi.php file:

    require_once "/path/to/sphinxapi.php";
    

    Next, create a new instance:

    $sphinx = new SphinxClient();
    

    Specify the host and port where your search daemon is running:

    $sphinx->SetServer( "localhost", 3312 );
    

    Now you’re good to go and ready to start querying the indexes. More on that in the next Sphinx Search API post.


    Save Cape Town City Ballet
     
  • Articles

    Setting up Sphinx Search (Part 2)

    Stii 2:18 pm on November 25, 2008
    Tags: sphinxsearch

    I’ve split the setting up of Sphinx Search into two parts. Please read Setting up Sphinx Search (Part 1) first.

    Let’s look at a “real world” example setup:

    # The settings and SQL source for your index
    source blogposts
    {
        type = mysql
        sql_host = localhost
        sql_user = dbuser
        sql_pass = secret
        sql_db = dbname
        sql_port = 3306    # optional, default is 3306

        # Run before the main query so text is fetched as UTF-8
        sql_query_pre = SET NAMES utf8

        # The first column must be the unique document id
        sql_query = \
            SELECT id, blog_id, title, body, \
                UNIX_TIMESTAMP(publish_date) AS publish_date \
            FROM blogposts

        # Attributes: stored with each document for filtering and sorting
        sql_attr_uint = blog_id
        sql_attr_timestamp = publish_date
    }
    

    The first section is the source of your index: the SQL query whose results you’d like to index and search against. Note the sql_attr settings after the query. They declare attributes, which let you filter or order your search results. For example, if you want to order your blog post results by date, you can tell the API to sort by the publish_date attribute. Or, if you only want to search posts from the blog with blog_id = 2, you can tell the API to filter the results on blog_id 2.
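
    To picture what those two attributes buy you, here is a minimal Python sketch. It does not talk to Sphinx at all: the match list, the ts helper and both function names are made up for illustration, and it only mimics what filtering on blog_id and sorting on publish_date do to a result set.

```python
from datetime import datetime, timezone

def ts(year, month, day):
    """Unix timestamp for a UTC date, like UNIX_TIMESTAMP(publish_date)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

# Hypothetical matches: each carries the attributes declared in the source.
MATCHES = [
    {"id": 10, "blog_id": 2, "publish_date": ts(2008, 11, 1)},
    {"id": 11, "blog_id": 5, "publish_date": ts(2008, 11, 20)},
    {"id": 12, "blog_id": 2, "publish_date": ts(2008, 11, 25)},
]

def filter_by_attr(matches, attr, allowed):
    """Keep only matches whose attribute value is in the allowed set."""
    return [m for m in matches if m[attr] in allowed]

def sort_by_attr_desc(matches, attr):
    """Order matches by an attribute, highest (newest) first."""
    return sorted(matches, key=lambda m: m[attr], reverse=True)

# Posts from blog 2 only, newest first.
BLOG_TWO_NEWEST_FIRST = sort_by_attr_desc(
    filter_by_attr(MATCHES, "blog_id", {2}), "publish_date"
)
```

    In the real API the daemon does this work for you; in the PHP client the corresponding calls are SetFilter and SetSortMode.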

    The next part is setting up the index for your source.

    #The index for the source
    index blogpostsindex
    {
            source = blogposts
            path = /var/sphinx/data/blogposts
    }
    

    The source setting references the source you specified earlier, i.e. which SQL query it should index. When you use the API to query your index, you have to specify blogpostsindex as the index you’re querying. The path is the location and base file name of your index files on your server. NOTE: The directory part of the path setting (here /var/sphinx/data) needs to be writable by the user that runs the indexer!
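
    If you want to check the writable-directory requirement before running the indexer, something along these lines would do. This is only a sketch: the function name is hypothetical, and the demo uses the system temp directory as a stand-in for /var/sphinx/data.

```python
import os
import tempfile

def index_dir_is_writable(index_path):
    """Return True if the directory part of a Sphinx 'path' setting
    exists and is writable.

    The check runs as the current user, which may differ from the user
    that cron or your init script uses to run the indexer.
    """
    directory = os.path.dirname(index_path)
    return os.path.isdir(directory) and os.access(directory, os.W_OK)

# Demo against the system temp directory (a stand-in for /var/sphinx/data).
DEMO_PATH = os.path.join(tempfile.gettempdir(), "blogposts")
DEMO_OK = index_dir_is_writable(DEMO_PATH)
```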

    Next, we specify settings for your indexer command.

    indexer
    {
        mem_limit = 256M
    }
    

    The mem_limit setting is optional, but it is a good idea to set it so that your indexer doesn’t eat too much of your server’s resources when it builds your index.

    Finally, you need to set up your search daemon.

    searchd
    {
            log = /var/log/searchd.log
            query_log = /var/log/query.log
            pid_file = /var/log/searchd.pid
    }
    

    At the very least you need to specify the paths to your log files and your process id (pid) file.

    With all this set up, you’re good to go. First you would build your index:

    indexer --all
    

    The --all option builds all your indexes if you have more than one set up. Once the index has been built successfully, you can fire up the daemon:

    searchd
    

    You will probably want to set up a cron job to update your index periodically. All you need to do is add a crontab entry running the following command:

    /usr/local/bin/indexer --all --rotate
    

    The --rotate option builds a new index and, once it is complete, replaces the old one with the new one without interrupting your search daemon. NOTE: If your indexes are big, you can create delta indexes and merge them into the current ones. Delta indexes are indexes containing only the latest data. I’ll explain that at a later stage.

    That’s it! You’re good to go. Next we’ll look at the API and how to run queries.
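
    As a rough illustration of what ends up in the crontab, the Python sketch below assembles the indexer command and a full crontab line. The helper names are hypothetical, and the hourly schedule ("0 * * * *") is only an assumption; pick whatever interval suits your data.

```python
import shlex

# Location used in this post; adjust if your install differs.
INDEXER = "/usr/local/bin/indexer"

def indexer_command(indexes=None, rotate=True):
    """Build the indexer argument list.

    With no index names given, --all is used to rebuild every index.
    """
    cmd = [INDEXER]
    cmd.extend(indexes if indexes else ["--all"])
    if rotate:
        # Swap the new index in without stopping searchd.
        cmd.append("--rotate")
    return cmd

def crontab_line(schedule, cmd):
    """Join a five-field cron schedule and a command into one crontab entry."""
    return f"{schedule} {shlex.join(cmd)}"

# Assumed schedule: at minute 0 of every hour.
HOURLY = crontab_line("0 * * * *", indexer_command())
```

    Passing index names instead, e.g. indexer_command(["blogpostsindex"]), rebuilds just that index rather than all of them.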

  • Articles

    Setting up Sphinx Search (Part 1)

    Stii 2:17 pm on November 25, 2008
    Tags: sphinxsearch

    So far, I’ve introduced Sphinx Search and told you how to install it. The next step is to get it running. This is almost as simple as installing it, but takes a little more work. This is only a very basic configuration setup. I might introduce you to some of the more complex settings at a later stage.

    To get it up and running, there are three components that you need to be aware of.

    1. The sphinx.conf configuration file which is the heart and soul of Sphinx search
    2. The indexer command which will compile the index that you will be querying
    3. The searchd command which will start the search daemon (or service) on your server

    Note that by default on a Debian server, the indexer and searchd commands can be found in /usr/local/bin/ and the sphinx.conf file can be found in /usr/local/etc/.

    1. Editing the sphinx.conf file
    You can either create a sphinx.conf file locally and then secure copy it to the /usr/local/etc/ directory on your server, or you can simply use your favorite command line editor (which should be vim :) ) on your server. There is a sample config file which you could copy and then edit, should you want to. Do it like this:

    cd /usr/local/etc
    cp sphinx.conf.in sphinx.conf
    

    In the config file, there are four main configuration components:

    1. The source of the index, i.e. what it should index. Plainly put: which database, the query, the query settings, etc.
    2. The index settings. How should Sphinx use the source you’ve set up?
    3. The indexer settings. How should Sphinx build the indexes you’ve set up?
    4. The searchd settings. How should Sphinx listen for queries from your application?

    Note that a source section can be inherited, so you don’t have to repeat settings all the time. Also note that you may well want more than one index to query (let’s say one for blog posts and one for bookmarks), so for every index you need, you’ll create a source and an index. In VERY basic terms, this is what the structure of your config file should look like:

    # The settings and SQL source for your index
    source blogposts
    {
        # SQL and settings go here
    }

    # The index for the source
    index blogpostsindex
    {
        # the source of this index
        # and its settings go here
        source = blogposts
    }
    
    indexer
    {
        # settings for the indexer command
    }
    
    searchd
    {
        # settings for the sphinx daemon go here
    }
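
    The “one source plus one index per thing you want to search” idea can be sketched as a small Python generator. This is purely illustrative: the function names are made up, the settings are placeholders (a real source also needs its SQL settings and a real index needs a path), and the output is a skeleton, not a working config.

```python
def render_section(kind, name, settings):
    """Render one sphinx.conf block such as 'source blogposts { ... }'."""
    header = kind if name is None else f"{kind} {name}"
    body = "\n".join(f"    {key} = {value}" for key, value in settings.items())
    return f"{header}\n{{\n{body}\n}}\n"

def render_config(index_names):
    """Emit a source/index pair per name, then one indexer and one searchd block."""
    parts = []
    for name in index_names:
        # Placeholder settings only; the SQL connection and query go here.
        parts.append(render_section("source", name, {"type": "mysql"}))
        # Each index points back at its own source by name.
        parts.append(render_section("index", f"{name}index", {"source": name}))
    parts.append(render_section("indexer", None, {"mem_limit": "256M"}))
    parts.append(render_section("searchd", None, {"pid_file": "/var/log/searchd.pid"}))
    return "\n".join(parts)

# Two searchable things, so two source/index pairs.
CONFIG_TEXT = render_config(["blogposts", "bookmarks"])
```

    Printing CONFIG_TEXT shows two source/index pairs followed by a single indexer block and a single searchd block, which is the shape of the skeleton above.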
    

    Let’s move on to a practical example: Setting up Sphinx Search (Part 2).

  • Articles

    Installing Sphinx Search

    Stii 3:50 pm on November 17, 2008
    Tags: sphinxsearch

    Now that you have been introduced to Sphinx Search, it’s time to install it. It is extremely simple. Just have a look at the guide for Debian, to give you an idea:

    sudo apt-get update
    sudo apt-get dist-upgrade
    sudo apt-get install build-essential
    sudo apt-get install libmysqlclient15-dev
    
    tar xvzf sphinx-0.9.8-rc2.tar.gz
    cd sphinx-0.9.8-rc2/
    ./configure \
    --with-mysql-includes=/usr/include/mysql \
    --with-mysql-libs=/usr/lib/mysql
    make
    sudo make install
    

    Below are two of the best guides I could find. One for installing on Debian and the other on CentOS.

    Next time, we’ll look at configuring it to actually work. We’ll also look at how to run some queries and what the results from Sphinx Search look like.

  • Articles

    Sphinx Search introduction

    Stii 10:24 am on November 17, 2008
    Tags: sphinxsearch

    Full text search is often not a nicety but a necessity. Searching a database text field with a normal SQL query can easily result in a slow query, which is NOT GOOD. You might not always need full text search, especially if you have a smallish data set, but past a certain threshold you’ll hit serious performance problems.

    That is exactly what happened on Afrigator. We started out using normal, simple SQL SELECT queries.

    SELECT * FROM posts
    WHERE title LIKE '%keywords%'
    OR body LIKE '%keywords%'
    

    This did just fine in the beginning. Nowadays our blog post table is close to a gigabyte in size and, as you can imagine, that is a lot of words to filter through to find specific terms! Funny thing is, a gigabyte is still tiny in web terms. What is certain, though, is that the table will only grow, and our performance issues with it!

    Thank goodness for Sphinx Search! Sphinx Search is an open source (GPL2 License) full text search engine. It is FAST. Not only is search fast, but so is the text indexing.
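
    To see why this gets slow, remember that a LIKE pattern with a leading % cannot use an ordinary index, so every row has to be inspected on every search. A full text engine does the expensive work once, at indexing time. The Python sketch below contrasts the two ideas on toy data. It is not how Sphinx works internally; the sample posts and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the blog post table.
POSTS = {
    1: "Sphinx makes full text search fast",
    2: "Cape Town weather report",
    3: "Fast indexing with Sphinx and MySQL",
}

def like_scan(posts, keyword):
    """Mimic LIKE '%keyword%': every row's text is inspected on every search."""
    keyword = keyword.lower()
    return sorted(pid for pid, text in posts.items() if keyword in text.lower())

def build_inverted_index(posts):
    """Build a word -> set of post ids map once, ahead of query time."""
    index = defaultdict(set)
    for pid, text in posts.items():
        for word in text.lower().split():
            index[word].add(pid)
    return index

def index_lookup(index, keyword):
    """Answer a single-word query with one dictionary lookup."""
    return sorted(index.get(keyword.lower(), set()))

# Built once, then reused for every query.
INDEX = build_inverted_index(POSTS)
```

    Both approaches find posts 1 and 3 for “sphinx”, but the scan touches every post each time while the lookup touches one dictionary entry. Note that the toy index matches whole words only, whereas LIKE also matches substrings.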

    According to the Sphinx Search site, here is a list of features:

    • high indexing speed (up to 10 MB/sec on modern CPUs);
    • high search speed (avg query is under 0.1 sec on 2-4 GB text collections);
    • high scalability (up to 100 GB of text, up to 100 M documents on a single CPU);
    • provides good relevance ranking through combination of phrase proximity ranking and statistical (BM25) ranking;
    • provides distributed searching capabilities;
    • provides document excerpts generation;
    • provides searching from within MySQL through pluggable storage engine;
    • supports boolean, phrase, and word proximity queries;
    • supports multiple full-text fields per document (up to 32 by default);
    • supports multiple additional attributes per document (i.e. groups, timestamps, etc);
    • supports stopwords;
    • supports both single-byte encodings and UTF-8;
    • supports English stemming, Russian stemming, and Soundex for morphology;
    • supports MySQL natively (MyISAM and InnoDB tables are both supported);
    • supports PostgreSQL natively.

    As you can see, that’s quite a nifty feature set! When we implemented it on Afrigator, we chose to run Sphinx Search as a daemon, though you could compile MySQL with Sphinx support should you choose.

    It was also SUPER SIMPLE to set up! All you need to get it running for your own application is the ability to compile C++ on your web server, i.e. a shell account and the ability to compile with gcc on Linux.

    One tiny hiccup I experienced was that the libmysqlclient-dev package has to be installed (if it isn’t already) in order for Sphinx to compile. The documentation doesn’t mention that, so just be aware of it. Other than that, you simply have to follow the steps in the documentation to get it up and running.

    Once it has compiled successfully, you have to set it up and configure it. More on that at a later stage. For now, if you are developing web apps that are heavily dependent on full text search, I can recommend Sphinx Search as a viable solution.


About Me

Software developer at Afrigator.com. Love Python, do PHP.