PHP / MySQL Tag Clouds


Figure 1In this article you'll learn how to add tag support to your web content with PHP and MySQL. You can download the demo scripts from the attachment link above.

This guide covers the MySQL tables, SQL queries, PHP scripts, XHTML and CSS needed to add tag support to your website. Apache mod_rewrite rules are also covered so you can use search engine friendly URLs if you'd like to.

Introduction


Content is usually categorized using either a static set of categories or with tags (aka keywords). Choosing the most appropriate method is done on a feature-by-feature basis by assessing such things as content identification, accessibility (search and discovery), control and scalability.

What's the difference between categories and tags? From a backend standpoint they're overwhelmingly the same, although tags are slightly more involved. One of the pro's of using a fixed list of site defined categories is that they require no moderation. The potential downside is that it's a fixed list. User defined tags on the other hand allow for open content classification but have the downside of requiring a little moderating to keep things clean and consistent.

Figure 2

Tags are appropriate for user content such as photos, videos, articles and so on. Sometimes both categories and tags are used on the same content. In a tag cloud, font sizes usually depict how much content there is with the particular tag assigned to it. The larger the tag the more content there is (and potentially popular). Tag clouds offer users a visual representation of the content on a site which is quite cool.

Figure 3

MySQL Tables


Alright, let's get started by taking a look at some MySQL tables. For the sake of example, let's assume we're working with articles. Let's also assume search engine friendly URLs like the one below are being used to access articles.
  • /view/Comprehensive-Guide-to-MySQL-PHP-Tag-Clouds

The first thing you'll need to do is add a tags column to your primary content table. This way, no additional querying will be required to determine which tags have been assigned to the article(s).

articles


I've named the primary table articles and this table is to store the most static aspects of the fictitious article records. It is assumed the title, description and tags will rarely if ever change. In the real world you'd also have additional tables that store the more dynamic aspects of data such as last_modified, num_views, num_comments and that sort of thing.

  1. CREATE TABLE articles (
  2.     article_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  3.     date_published INT(10) UNSIGNED NOT NULL,
  4.     title VARCHAR(100) NOT NULL,
  5.     url VARCHAR(100) NOT NULL,
  6.     description VARCHAR(500) NOT NULL,
  7.     tags VARCHAR(359) NOT NULL,
  8.     UNIQUE KEY(title),
  9.     UNIQUE KEY(url)
  10. ) ENGINE=MYISAM;


As mentioned above we're assuming search engine friendly URLs are being used. With this in mind the url field must be an index to optimize SELECTS and a UNIQUE KEY to enforce a little integrity so MySQL will deny INSERT / UPDATE attempts that would yield 2 records with the same URL. A unique key is also applied to the title column for title based searches and integrity.

Where did I come up with the VARCHAR length on the tags column? This setup allows for up to 10 tags to be assigned to each article where each tag may be up to 35 characters. A comma delimited list of tags is stored in the tags column such as the one below. Therefore, the maximum VARCHAR length is (10 * 35) + 9.
  • web dev,php,mysql,tag clouds

articles_tags


In this table we'll store the actual tags. When a new article is added to the database you'll query this table to see if the tag already exists and if it does simply increment the num_articles field otherwise you'll need to insert a new row.

  1. CREATE TABLE articles_tags (
  2.     tag_id SMALLINT(5) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  3.     tag_name VARCHAR(35) NOT NULL,
  4.     num_articles SMALLINT(5) UNSIGNED NOT NULL DEFAULT 1,
  5.     UNIQUE KEY(tag_name)
  6. ) ENGINE=MYISAM;


UNIQUE KEY was added to the tag_name column to optimize SELECTS by tag_name and prevent duplicates at the MySQL level. Your app should of course be written to prevent duplicates as well.

The num_articles column was added for the same reason tags was added to the articles table above. When you display your tag clouds you don't want to run a COUNT() on every tag to figure out how many articles have the particular tag. Always design your tables with scalability in mind. Keep it simple and keep it light-weight!

ITNewb uses the InnoDB engine and transactions just about everywhere. This is done in part to ensure columns like num_articles remain accurate should an error occur. With the MyISAM engine (typically, this is the default) there is always the potential for these values to become inaccurate.

Engines are usually chosen on a table-by-table basis. These engines are complex (especially InnoDB) and each has "pro"s and "con"s. You should really read up on engines if you haven't yet.

articles_tagged


This table is used to search for articles that have been assigned a specific tag. When an article is added to the database a row is added to this table creating a link between the article and the tag via the tag_id and article_id columns in their respective tables above. If an article has 10 tags, there will be 10 rows in this table (1 for each).

  1. CREATE TABLE articles_tagged (
  2.     tagged_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  3.     tag_id SMALLINT(5) UNSIGNED NOT NULL,
  4.     article_id INT(10) UNSIGNED NOT NULL,
  5.     KEY(tag_id), KEY(article_id),
  6.     KEY article_tag USING BTREE (tag_id, article_id),
  7.     KEY tag_article USING BTREE (article_id, tag_id)
  8. ) ENGINE=MYISAM;


As you can see all three columns are KEYs and there are two multi-column indexes. These should optimize just about any operation you perform on the table.

Queries


The following queries can be used to look up a tag by tag_name, select a list of the most popular tags and select paged result sets with a particular tag.

Tag Info

This query does a case-insensitive search for the tag specified. Use WHERE BINARY tag_name if you want a case-sensitive search.

  1. SELECT * FROM articles_tags WHERE tag_name='$tagName';


Popular Tags

Selecting a list of popular tags is done here by sorting the entire table highest to lowest based on the num_articles column. Typically you should avoid sorting large result sets and table scans but this is fine here so long as your tags table isn't too large (greater than about 20k rows).

  1. SELECT tag_name, num_articles
  2. FROM articles_tags
  3. ORDER BY num_articles DESC
  4. LIMIT $limit;


If you're concerned about performance and scalability (you should be) consider using memcached to cache the result set and only querying when the table actually changes or every few hours. This goes for almost all database activity.

Tagged Articles

The following query can be used for paged result sets tagged with a particular tag. As you can see the query needs a tag_id, sort order (ASC or DESC), an offset (the point in the sorted results to begin selecting the portion to return) and a limit (maximum records to return).

  1. SELECT artt.article_id, title, url, description, tags
  2. FROM articles_tagged AS artt, articles AS art
  3. WHERE artt.tag_id=$tagID AND art.article_id=artt.article_id
  4. ORDER BY artt.article_id $order
  5. LIMIT $offset, $limit;


When you lookup a tag by tag_name the result will include the tag_id and num_articles columns. Here I'm sorting DESC for newest to oldest. You'll probably be adding additional tables to this query and sorting by one or more columns so the sort order is a little more meaningful.

Here is how you can determine the offset and whether a valid page number was supplied.

  1. // fetch tag info
  2. // if the result is empty output a "nothing tagged with xyz message"
  3.  
  4. $numTaggedArticles = $tagInfo['num_articles'];
  5. $offset = $page * 10 - 10; // 10 per page, adjust as desired
  6. $numPages = ceil($numTaggedArticles / 10);
  7.  
  8. if ( $page > $numPages ) {
  9.     // output an invalid page message
  10.     // or redirect to a valid page
  11. }
  12.  
  13. // fetch tagged articles