Making "clean" URLs with Apache and PHP
Nowadays we almost systematically create database-generated websites. URLs are thus written on the fly, and we usually call pages through a query string, something like http://www.example.com/article.php?date=20020322&num=6.
What I call "clean" URLs is what you can see on evolt.org, for instance. In my example the URL would be: http://www.example.com/article/200203226
Let us assume that this article was the 6th written on the 22nd of March, 2002, and that we store this info in our database (see below, Structuring your data). The purpose of this article is to show you how to switch from the former to the latter with Apache and PHP through a simple example.
Note: there's nothing polemic about this "clean" qualification. Anybody who finds a better name is welcome to use it instead...
What's the use of switching from query strings to "clean" URLs?
Dropping query strings and using these so-called "clean" URLs:
- Enables indexing by search engines. As you probably know, when a search engine spiders your site and finds a query string, it often stops. After all, how on earth can the robot know that all the pages are not the same with just a different parameter passed to them? What's the difference between index.php?style=blue and index.php?style=red? How can one expect a search engine to know the difference?
- Makes for more perennial URLs. As Tim Berners-Lee himself explained in Cool URIs don't change, "It is the duty of a Webmaster to allocate URIs which you will be able to stand by in 2 years, in 20 years, in 200 years". You can see what we're aiming at.
- Doesn't expose the server-side language in which you developed the site. No extension is provided, which lets you switch from one server-side technology to another without generating multiple 404 errors. Moreover, as a side note, it's a bit more secure because it will take potential attackers more time to find out what language you're using (but don't have too many illusions about that: the answer will eventually be found...)
Need we say more?
Structuring your data
First and foremost, think of the data stored. You have to be able to give a unique identifier to each article. So instead of (or in addition to) using an incremental id, you must think of a unique, not-to-be-changed-in-the-future identifier.
My method in our example is this: create a field uniqid and give it the date concatenated with the rank of creation within the day. In my example my article's uniqid is 200203226 (the article was the 6th created on the 22nd of March, 2002). Of course one can imagine a situation where only one article is written per day; in that case the uniqid field may be redundant if you already store the date the article was written. Or it can be YYYYMMDDHHmmss to have the whole date and no rank. [insert here any other naming method you like]
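To make the scheme concrete, here's a minimal sketch of the uniqid construction described above. The function name make_uniqid is a hypothetical helper of mine, not something from the article's database layer:

```php
<?php
// Sketch of the uniqid scheme described above: the date (YYYYMMDD)
// concatenated with the article's rank of creation within that day.
// make_uniqid() is a hypothetical helper name, chosen for this example.
function make_uniqid($year, $month, $day, $rank)
{
    // Zero-pad the date parts, then append the rank as-is.
    return sprintf("%04d%02d%02d%d", $year, $month, $day, $rank);
}

// The 6th article created on the 22nd of March, 2002:
echo make_uniqid(2002, 3, 22, 6); // 200203226
```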
Note: there are many other ways to store your articles. Maybe they're each in their own text file in a dedicated includes directory and then you call them through an inclusion, maybe they're extracted from an XML file, etc. That's why I'm not too specific on the storage of data here.
The 'article' file: parsing the URL
Now we will create the article file using standard PHP code, even if, as you noticed, the file has no .php extension. We'll come back to this issue below, in Using Apache's ForceType. Here's the code for our article file:
<?php
/* 1. parse the URL */
$expl = explode("/", $HTTP_SERVER_VARS["REQUEST_URI"]);
$article_id = $expl[count($expl) - 1];

/* 2. security check */
/* [omitted for the sake of simplicity] */

/* 3. populate the page with uniqid's content */
/* [omitted for the sake of simplicity] */
?>
Here's how it works:
- Parse the URL (REQUEST_URI) by exploding it, fetch the results in an array ($expl) and assign to $article_id the last element of the exploded array.
- Check any security needed and stop the code on error. There's no reason to parse the page any further if what's passed to $article_id is not correct. In particular we'll check the following:
- Is $article_id valid? Does it really correspond to the format I'm looking for? For instance, if I'm looking for a number of the form 'YYYYMMDDr' ('r' is the 'rank' in my example, remember?) and $article_id is a string (i.e. 'foobarwhatever'), then there's something rotten in the kingdom of URLs.
- Does $article_id correspond to an existing value of uniqid?
- In the case of a database-driven site: is there any database code passed through $article_id that could damage the database? Think of drop table instructions, for example.
- [insert here any other security check you find necessary]
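The format check above can be sketched in a few lines. This is my own hedged example, assuming the YYYYMMDDr format from earlier; the function name is_valid_article_id is hypothetical:

```php
<?php
// Hypothetical validity check for step 2: accept only ids made of an
// 8-digit date followed by a one- or two-digit daily rank (YYYYMMDDr).
// Rejecting everything else also neutralises injection attempts such
// as a "drop table" string smuggled into the URL.
function is_valid_article_id($id)
{
    return preg_match('/^\d{8}\d{1,2}$/', $id) === 1;
}

var_dump(is_valid_article_id("200203226"));      // bool(true)
var_dump(is_valid_article_id("foobarwhatever")); // bool(false)
```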
- Populate the page with the content corresponding to the uniqid fetched.
Using Apache's ForceType
At first I tried creating "clean" URLs and redirecting them through 404 errors to the index.php, for example, which would then parse the URL and write the page accordingly. I ran into unexpected trouble checking my system with Lynx: a 404 error is an error (yeah, right). So I guessed search engines would stop on this error. Damn. My world domination (© glassdog) scheme goes down the drain.
Enter ForceType. This Apache directive forces the server to consider a file as being of another type than what is expected by default (see Apache's documentation on ForceType).
We will now use the <FilesMatch> directive to specify which file is to be considered as a PHP file and not a plain ol' text file, which is the default on most servers (see Apache's documentation on <FilesMatch>).
Don't worry: even if you can't configure Apache directly because your hosting provider doesn't allow it, I've got good news: you can do it through a .htaccess file.
Here's a sample .htaccess I wrote in my root directory:
<FilesMatch "^article$">
  ForceType application/x-httpd-php
</FilesMatch>
The result is: accessing http://www.example.com/article/200203226 tells the server to consider article as PHP; the script then parses the URL and pushes the content whose uniqid is 200203226 into the template. And voilà. We're done.
To sum up this very simple example:
- Call "clean" URLs when writing your code
- Create as many templates as needed for each type of parsing
- Specify their type in your .htaccess file
Your URLs now have a chance to be as permanent as their domain name.
For more server-side fun, you can include the same template in all the ForceType'd pages so that all the formatting is written only once. You can also, as often on evolt.org, create longer URLs and parse them as a set of several parameter/value pairs. Be careful with that, though, because you may force search engines to index too many occurrences of pages. Oh, well, maybe that's what you want after all...
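The parameter/value idea can be sketched like this. Note this is my own illustration, not evolt.org's actual code: the URL shape /article/style/red/size/big and the function name parse_clean_url are assumptions for the example.

```php
<?php
// Hypothetical sketch: parse a longer "clean" URL such as
// /article/style/red/size/big into parameter/value pairs,
// in the spirit of the evolt.org URLs mentioned above.
function parse_clean_url($uri)
{
    $parts = explode("/", trim($uri, "/"));
    array_shift($parts); // drop the template name ("article")

    // Walk the remaining segments two by two: parameter, then value.
    $params = array();
    for ($i = 0; $i + 1 < count($parts); $i += 2) {
        $params[$parts[$i]] = $parts[$i + 1];
    }
    return $params;
}

// $params now holds array('style' => 'red', 'size' => 'big')
$params = parse_clean_url("/article/style/red/size/big");
```

Each pair can then be validated exactly like $article_id was above, before touching the database.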