Archive for the ‘WordPress’ Category.

So, I first wrote about this topic on the wp-hackers list back in January 2009, explaining some of the scaling issues involved with having ambiguous rewrite rules and loads of static Pages in WordPress. A year later the same topic came up again in the WPTavern Forums, and later I wrote a blog post about the issue in more detail. That post generated lots of questions and responses.

In August 2011, thanks to highly valuable input from Andy Skelton which gave me a critical insight needed to make it work, and with Jon Cave and Mark Jaquith doing testing (read: breaking my patches over and over again), I was able to create a patch which fixed the problem (note: my final patch was slightly over-complicated, Jon Cave later patched it again to simplify some of the handling). This patch is now in WordPress 3.3.

So I figured I’d write up a quick post explaining the patch, how it works, and the subsequent consequences of it.

Now you have two problems.Quick Summary of the Problem

The original underlying problem is that WordPress relies on a set of rules to determine what type of page you’re looking for, and it uses only the URL itself to do this. Basically, given /some/url/like/this, WordPress has to figure out a) what you’re trying to see and b) how to query for that data from the database. The only information it has to help it do this is called the “rewrite rules”, which is basically a big list of regular expressions that turn the “pretty” URL into variables used for the main WP_Query system.

The user of the WordPress system has direct access to exactly one of these rewrite rules, which is the “Custom Structure” on the Settings->Permalink page. This custom string can be used to change what the “single post” URLs look like.

The problem is that certain custom structures will interfere with existing structures. If you make a custom structure that doesn’t start with something easily identifiable, like a number, then the default rewrite rules wouldn’t be able to cope with it.

To work around this problem, WordPress detected it and uses a flag called “verbose_rewrite_rules”, which triggers everything into changing the list of rules into more verbose ones, making the ambiguous rules into unambiguous ones. It did this by the simple method of making all Pages into static rules.

This works fine, but it doesn’t scale to large numbers of Pages. Once you have about 50-100 static Pages or so, and you’re using an ambiguous custom structure, then the system tends to fall apart. Most of the time, the ruleset grows too large to fit into a single mySQL query, meaning that the rules can no longer be properly saved in the database and must be rebuilt each time. The most obvious effect when this happens is that the number of queries on every page load rises from the below 50 range to 2000+ queries, and the site slows down to snail speed.

The “Fix”

The solution to this problem is deeper than simple optimizations. Remember that I said “WordPress relies on a set of rules to determine what type of page you’re looking for, and it uses only the URL itself to do this”. Well, to fix the problem, we have to give WordPress more input than just the URL. Specifically, we make it able to find out what Pages exist in the database.

When you use an ambiguous custom structure, WordPress 3.3 still detects that, and it still sets the verbose_page_rules flag. However, the flag now doesn’t cause the Pages to be made unambiguous in the rules. Instead, it changes the way the rules work. Specifically, it causes the following to happen:

  1. The Page rules now get put in front of the Post rules, and
  2. The actual matching process can do database queries to determine if the Page exists.

So now what happens is that the Page matching rules are run first, and for an ambiguous case, they’ll indeed match the Page rule. However, for all Page matches, a call to the get_page_by_path function is made, to see if that Page actually exists. If the Page doesn’t exist in the database, then the rule gets skipped even though it matched, and then the Post’s custom structure rules take over and will match the URL.

The Insight

The first patch I made while at WordCamp Montreal used this same approach of calling get_page_by_path, but the problem with it was that get_page_by_path was a rather expensive function to call at the time, especially for long page URLs. It was still better than what existed already, so I submitted the patch anyway, but it was less than ideal.

When I was at WordCamp San Francisco in August, hanging around all these awesome core developers, Andy Skelton commented on it and suggested a different kind of query. His suggestion didn’t actually work out directly, but it did give me the final idea which I implemented in get_page_by_path. Basically, Andy suggested splitting the URL path up into components and then querying for each one. I realized that you could split the path up by components, query for all of them at once, and then do a loop through the URL components in reverse order to determine if the URL referred to a Page that existed in the database or not.

So basically, given a URL like /aaa/bbb/ccc/ddd, get_page_by_path now does this:

SELECT ID, post_name, post_parent FROM $wpdb->posts WHERE post_name IN ('aaa','bbb','ccc','ddd') AND (post_type = 'page' OR post_type = 'attachment')

The results of this are stored in an array of objects using the ID as the array keys (a clever trick Andrew Nacin pointed out to me at the time).

By then looping through that array only once with a foreach, and comparing to the reversed form of the URL (ddd, ccc, bbb, aaa) you can make an algorithm that basically works like this:

foreach(results as res) {
  if (res->post_name = 'ddd') {
    get the parent of res from the results array
     (if it's not in the array, then it can't be the parent of ddd, which is ccc and should be in the array)
    check to make sure parent is 'ccc',
    loop back to get the parent of ccc and repeat the process until you run out of parents
  }
}

This works because all the Pages in our /aaa/bbb/ccc/ddd hierarchy must be in the resulting array from that one query, if /aaa/bbb/ccc/ddd is a valid page. So you can quickly check, using that indexed ID key, to see if they are all there by working backwards. If they are all there, then you’ll eventually get to parent = zero (which is the root) and the post_name = ‘aaa’. If they’re not there, then the loop exits and you didn’t find the Page because it doesn’t actually exist.

So using this one query, you can check for the existence of a Page any number of levels deep fairly quickly and without lots of expensive database operations.

Consequences

There are still some drawbacks though.

In theory, you could break this by making lots and lots of Pages, if you also made their hierarchy go hundreds of levels deep and thus make the loop operation take a long time. This seems unlikely to me, or at least way more unlikely than somebody making a mere couple hundreds of Pages. Also, WordPress won’t let you use the same Page name twice on the same level, so you’d really have to try for it to make this take too long.

If you try to make a URL longer than around 900K or so, the query will break. Pretty sure it’d break before that though, and anyway most people can’t remember URLs with the contents of a whole book in them. ;)

This also adds one SQL operation to every single Post page lookup. However, this is still better than having it break and try to run a few thousand queries every time in order to build rewrite rules which it can’t ultimately save. And the SQL being used is relatively fast, since post_name and post_type are both indexed fields.

Basically, for the very few and specific cases that had the problem, the speedup is dramatic and immediate. For the cases that use unambiguous rules, nothing has changed at all.

There’s still some bits that need to be fixed. Some of the code is duplicated in a couple of places and that needs to be merged. The pagename rewrite rule is a bit of a hack to avoid clashing, but it works everywhere even if it does make the regexp purist groan with dismay (for critics of this, please know that I did indeed try to do this using a regexp comment to make the difference instead of the strange and silly expression, but it doesn’t work because the regexp needs to be in a PHP array key).

Anyway, there you have it. I wrote the patch, but at least 5 other core developers contributed ideas and put in the grunt work in testing the result. A lot of brain power from these guys went into what is such a small little thing, really. A bit obscure, but I figured some people might like to read about it. :)

 

Shortlink:

Missed the news last week or so, but Twitter added oEmbed provider support to their API. While previous methods have existed to easily post tweets (such as Blackbird Pie), oEmbed is built into the WordPress core.

However, since Twitter didn’t implement oEmbed discovery, and WP has discovery off by default anyway, you have to resort to a small bit of code to make it work. Here’s that bit of code:

add_filter('oembed_providers','twitter_oembed');
function twitter_oembed($a) {
	$a['#http(s)?://(www\.)?twitter.com/.+?/status(es)?/.*#i'] = array( 'http://api.twitter.com/1/statuses/oembed.{format}', true);
	return $a;
}

Here’s what happens when you put that code in a plugin (for example) and just paste a twitter URL into a post:

It handles RT’s pretty neatly, I think. :)

This may make it into WordPress by default in the next version. Too bad they came out with it too late for inclusion in WordPress 3.3.

Shortlink:


Wrote this quick WordPress code snippet at WordCamp Louisville. It makes a /random/ URL on your site which redirects to a random post. Thought some people might find it useful.

Not a perfect little snippet, but gets the job done. Note the use of the little-used 307 redirect for temporary redirection. This is to make browsers not cache the results of the redirect, like some of them might do with a 302.

add_action('init','random_add_rewrite');
function random_add_rewrite() {
       global $wp;
       $wp->add_query_var('random');
       add_rewrite_rule('random/?$', 'index.php?random=1', 'top');
}

add_action('template_redirect','random_template');
function random_template() {
       if (get_query_var('random') == 1) {
               $posts = get_posts('post_type=post&orderby=rand&numberposts=1');
               foreach($posts as $post) {
                       $link = get_permalink($post);
               }
               wp_redirect($link,307);
               exit;
       }
}

There’s plugins that do this sort of thing too, but this is such a simple little thing that it doesn’t really need a big amount of code to do.

Edit: Added get_permalink() optimization from @Raherian.

Shortlink:

If you read “how-to” stuff for WordPress sites around the web, then you frequently run across what many people like to call “snippets”. Short bits of code or functions to do various things. I myself post snippets frequently, usually made up on the fly to solve somebody’s specific problem.

One question I get a fair amount is “where do I add this code?”

The usual answer to this for a lot of people is “in the theme’s functions.php file”. This is a quick solution, but it is often a problematic one.

The reason this has become the more or less go-to place to add these snippets is because it’s complicated to explain to a newbie how to make a plugin and activate it, or to point out the problems with modifying core code, or plugin code. Saying to look for a specific file in their theme, on the other hand, is quick and easy, and until recently theme upgrades have been fairly rare.

However, as themes get upgrades, it becomes more and more incorrect to tell people to modify them directly. And telling people how to create child themes is complex, even if it’s easy to do.

So I’d like to start a new trend, and recommend that people start making Site-Specific Plugins. Most people who run WP sites on a serious level do this in some way already, but if you make it sorta-standard practice, then it’ll make things simpler all around.

How to create a Site-Specific Plugin

1. Create a new directory in the plugins directory. Name it after the site in some fashion. For example, /wp-content/plugins/example.com or something like that.

2. Create a new php file in that directory. Name is dealer’s choice.

3. Put this in the file:

<?php
/*
Plugin Name: Site Plugin for example.com
Description: Site specific code changes for example.com
*/

4. Finally, go activate your new blank plugin on the site.

Now you have a simple and specific place to add snippets. It will survive upgrades of any sort, and you can edit it to add new code on an as needed basis. What’s more, it’s kinda sorta break-proof. If the user uses the built-in plugin editor to edit it, and they add code that breaks the site, then the editor detects that on saving the code and deactivates the plugin, preventing the “white screen of death” for their site.

This is a much better way to use “snippets” than the theme’s functions.php file, and we should really use it more often in our replies to users.

Shortlink:

Note: Post has been updated for 3.3 beta 2

WordPress plugin and theme authors could get something interesting in WordPress 3.3: a somewhat more comprehensive help screen system.

This is actually just a small part of a more long term makeover involving unifying the admin screen code base, but it’s pretty cool nevertheless. Backward compatibility is sorta fuzzy on parts of it at the moment, but not a lot of authors used these functions previously anyway, from what I can tell.

Anyway, I spent an entertaining half hour reworking the help dropdown for my SFC plugin. Here’s what it looks like in 3.3-aortic-dissection right now (note that the look and feel of this will probably change before final):

SFC Help screen in WordPress 3.3

SFC Help screen in WordPress 3.3

As you can see, the dropdown Help menu moved into the Admin bar (along with lots of other stuff), and has some tabs on the left hand side where you can make different topics and such for different parts of the plugin.

The code for this is actually fairly straightforward, albeit in a sideways sort of way…

First we have to look at the code that adds my options page in the first place. This would work with any of the add_*_page functions, so add_theme_page and such works fine too.

add_action('admin_menu', 'sfc_admin_add_page');
function sfc_admin_add_page() {
	global $sfc_options_page;
	$sfc_options_page = add_options_page(__('Simple Facebook Connect', 'sfc'), __('Simple Facebook Connect', 'sfc'), 'manage_options', 'sfc', 'sfc_options_page');
	add_action("load-$sfc_options_page", 'sfc_plugin_help');
}

This makes an options page, and then uses the assigned identifier for that page to hook the load-* action. This means that when our page is loaded, the sfc_plugin_help function will get called.

function sfc_plugin_help() {
	global $sfc_options_page;
	$screen = get_current_screen();
	if ($screen->id != $sfc_options_page)
		return;

Here, we call get_current_screen() to get the current WP_Screen object. The WP_Screen object is new to 3.3, but essentially it encapsulates an admin screen. It’s still under active development, but in the long run, it may make the task of creating admin screens much, much easier.

Anyway, right away, I do one thing and that is to check if the ID of the current screen matches the ID I was given for my options page. If it doesn’t match, then the user isn’t on my screen, and so I return having done nothing at all. This is a sort of belt and suspenders approach, since we shouldn’t be here if load-$sfc_options_page didn’t get called, but it never hurts to be sure.

The next bit is where I add my help screens. Here’s an incomplete sample of that code:

$screen->add_help_tab( array(
	'id'      => 'sfc-base',
	'title'   => __('Connecting to Facebook', 'sfc'),
	'content' => "HTML for help content",
));

$screen->add_help_tab( array(
	'id'      => 'sfc-modules',
	'title'   => __('SFC Modules', 'sfc'),
	'content' => "HTML for help content",
));

$screen->add_help_tab( array(
	'id'      => 'sfc-login',
	'title'   => __('Login and Register', 'sfc'),
	'content' => "HTML for help content",
));

I removed the content for each one to make it clearer. Essentially, I’m just calling the object’s add_help_tab function to add my new tabs to the help screen, one by one. Simple, right?

There’s other useful bits in WP_Screen too, such as adding screen options, adding a right hand side sidebar to the help dropdown, and so forth. I haven’t figured out a use for them yet, but they’re still nice to see.

If you have a complex plugin or theme, might be worth looking into this now. Might reduce your need for end-user support. Maybe. Hard to say. People rarely look at the documentation, but inline help might be worth a shot.

Shortlink:

Have you ever looked at the add_action function in WordPress? Here it is:

function add_action($tag, $function_to_add, $priority = 10, $accepted_args = 1) {
	return add_filter($tag, $function_to_add, $priority, $accepted_args);
}

I know, right? Some people’s minds just got blown.

What are Filters?

A filter is defined as a function that takes in some kind of input, modifies it, and then returns it. This is an extremely handy little concept that PHP itself uses in a ton of different ways. About half the string functions qualify as a ‘filter’ function.

Look at strrev(). It’s a simple-stupid example. It takes a string as an argument, and then returns the reverse of that string. You could use it as a filter function in WordPress, easily. Like, to reverse all your titles.

add_filter('the_title', 'strrev');

Some filters take more than one argument, but the first argument is always the thing to be modified and returned. PHP adheres to this concept too. Take the substr() function. The first argument is the string, the second and third are the start and optional length values. The returned value is the part of the string you want.

What are Actions?

An action is just a place where you call a function, and you don’t really care what it returns. The function is performing some kind of action just by being called. If you hook a function to the init action, then it runs whenever do_action(‘init’) is called.

Now, some actions have arguments too, but again, there’s still no return value.

So in a sense, a WordPress action is just a filter without the first argument and without a return value.

So why have them both?

Because there is still a conceptual difference between an action and a filter.

Filters filter things. Actions do not. And this is critically important when you’re writing a filter.

A filter function should never, ever, have unexpected side effects.

Take a quick example. Here’s a thread on the WordPress support forums where a person found that using my own SFC plugin in combination with a contact form emailer plugin caused the email from the form to be sent 3-5 times.

Why did it do this? Basically, because the contact form plugin is sending an email inside a filter function.

One of the things SFC does is to build a description meta from the content on the page. It also looks through that content for images and video, in order to build meta information to send to Facebook. In order for this to happen at the right time, the plugin must call the_content filter.

See, what if somebody puts a link to a Flickr picture on their page? In that case, oEmbed will kick in and convert that link into a nice and pretty embedded image. Same for YouTube videos. Or maybe somebody is using a gallery and there’s lots of pictures on the resulting page, but the only thing in the post_content is the gallery shortcode.

In order to get those images from the content, SFC has to do apply_filters(‘the_content’,$post_content). This makes all the other plugins and all the other bits of the system process that $post_content and return the straight HTML. Then it can go and look for images, look for video, even make a pretty 1000 character excerpt to send to Facebook.

But SFC can’t possibly know that doing apply_filters(‘the_content’,…) will cause some other plugin to go and send a freakin’ email. That’s totally unexpected. It’s just trying to filter some content. That would be like calling the strrev() function and having it make a phone call. Totally crazy.

Shortcodes

Shortcodes are a type of filter. They take in content from the shortcode, they return replacement content of some sort. They are filters, by definition. Always, always keep that in mind.

Also keep in mind that shortcodes are supposed to return the replacement content, not just echo it out.

Conclusion

So plugin authors, please, please, I’m begging you, learn this lesson well.

Filters are supposed to filter. Actions are supposed to take action.

When you mix the two up, then you cause pain for the rest of the world trying to interact with your code. My desk is starting to get covered in dents from me repeatedly banging my head into it because of this.

Shortlink: