Posts tagged ‘seo’

(Note to future readers: This is fixed in WordPress 3.3, so this article is now out-of-date. See http://core.trac.wordpress.org/changeset/18541 for the patch.)

I was not aware that other people didn’t know about this until recently, but since it seems to be little known, I thought I’d write a post on the topic.

Chain LinksIn WordPress, you should never start the custom permalink string with any of these %postname%, %category%, %tag%, or %author%. (Unless you know what you’re doing, of course. 🙂 )

Meaning that “%category%/%postname% ” is a bad custom permalink string. So is just “%postname%” for that matter.

Why? Well, it has to do with how the WordPress Rewrite system works.

Rewriting Explained

See, when you request a URL from a WordPress site, WordPress gets the URL and then has to parse it to determine what it is that you’re actually asking for.

It does this by using a series of rules that are built whenever you add new content to WordPress. Generally the list of rules is pretty small, but there are specific cases that can cause it to balloon way out of control.

Normal Rules

Let’s say you’re using a normal permalink string, like my preferred “%year%/%postname%”. The rules that are generated will look like this:

/robots.txt (for the privacy settings)
/feed/* (for normal feeds of any kind)
/comments/* (for comments feeds)
/search/* (pretty url for searches, not often used)
/category/%category% (category archives)
/tag/%tag% (tag archives)
/author/%author% (author archives)
/%year%/%month%/%day% (with each of those after year being optional)
/%year%/%postname% (this is the permalink string you define)
/%pagename% (any Page)

The way that system works is that it compares the URL it has to each of those in turn, from the top to the bottom. When one of them fits, then WordPress knows what to display and how to do it.

Note that the order I listed those in is is significant. Each one from the top down is less specific than the previous one.  For example, “robots.txt” matches only that, while “/feed/*” matches anything starting with /feed/. And so on down the list. The %postname%, %category%, %tag%, %author%, and %pagename% will match any string, but the other WordPress % ones will only match numeric fields. Like %year% is always a number.

Notice that the last one is %pagename%. This basically matches everything, because %pagename% can be anything at all. Even hierarchical pages like /plugins/whatever/something will cause this to match. It’s the fall-through position. And then, if that page doesn’t actually exist on your site, then this causes the query to trigger the 404 condition internally, which causes your theme’s 404.php to load up.

Pretty simple and straightforward, really.

Problem Rules

The problem comes in when you try to use a non-number for the beginning of your permalink string. Let’s examine those last two rules closer:

/%year%/%postname% (this is the permalink string you define)
/%pagename% (any Page)

What if you used “%category%/%postname%” for your custom permalink string? Now those last two rules are these:

/%category%/%postname% (this is the permalink string you define)
/%pagename% (any Page)

That violates our main rule, doesn’t it? That each one should be less specific than the one above it? Because %category% can match any string too, just like the %pagename% can… With this set of rules, there’s no way to view any of the Pages. Not good.

So, WordPress detects this condition and works around it. Internally, this sets a flag called “use_verbose_page_rules”, and that triggers the rewrite rebuild to make this set of rules instead:

/robots.txt (for the privacy settings)
/AAA
/BBB
/CCC (one of these for each of your Pages)
/feed/* (for normal feeds of any kind)
/comments/* (for comments feeds)
/search/* (pretty url for searches, not often used)
/category/%category% (category archives)
/tag/%tag% (tag archives)
/author/%author% (author archives)
/%year%/%month%/%day% (with each of those after year being optional)
/%category%/%postname% (this is the permalink string you define)

Now we have basically the same set of rules, except for those new ones at the top. Every Page now gets its own very specific rule, and this satisfies our main condition once again.

Pages

But what if you have a lot of Pages? I once read a post by a person who had over 50,000 Pages on his site. That is a special case obviously, but consider our lookup system. We’re going through these rules one at a time. With our first method, our rule list was only 10 rules, maximum. With this new method, you add a rule for every single Page you make. Going through 50,000 rules takes a lot longer than going through 10. And even just building that list of rules can take a long time.

Basically you’ve created a performance issue. Your Pages now won’t scale to unlimited numbers. Your site’s speed is linearly dependent on the number of Pages you have.

This is a bad thing.

Conclusion

Firstly, it’s really not any better for SEO to have the category in there, or to have just the postname there by itself. And anybody who tells you differently is wrong. If you disagree with me, then no, I’m not interested in arguing this point with you; you’re just wrong, period, end of discussion.

Secondly, shorter links are great and all, but hell, why not use a real shortlink? WordPress 3.0 now has a ShortLink API that defaults to using ?p=number links on your own domain. These will actually work for any WordPress site, even ye back unto WordPress 2.5. WordPress 3.0 just makes it nicer and easier to use these with the Shortlink API (as well as allowing plugins to make this automagically use services like wp.me or bit.ly). So use that instead.

The conclusion is, in general, just don’t do it. Leave a number, or something static, at the beginning of your permalink string and you’ll never have any sort of problems. But if you really MUST do this sort of thing, then keep your number of Pages low. Don’t try it when you have more than, say, 30-50 Pages.

Addendum

Okay, so I actually simplified things for this post. It’s actually worse than this, as verbose page rules can add much more than one rule per page, as this post demonstrates (he gets 11 per extra page!).

Shortlink: