Archive for the ‘WordPress’ Category.

Seems that some people don’t know much about kses. It’s really not all that complicated, but there doesn’t seem to be a lot of documentation around for it, so what the hell.

The kses package is short for “kses strips evil scripts”, and it is basically an HTML filtering mechanism. It can read HTML code, no matter how malformed it is, and filter out undesirable bits. The idea is to allow some safe subset of HTML through, so as to prevent various forms of attacks.

However, by necessity, it also includes a passable HTML parser, albeit not a complete one. Bits of it can be used for your own plugins and to make things a bit easier all around.

Note that the kses included in WordPress is a modified version of the original kses. I’ll only be discussing it here, not the original package.

Filtering

The basic use of kses is as a filter. It will eliminate any HTML that is not allowed. Here’s how it works:

$filtered = wp_kses($unfiltered, $allowed_html, $allowed_protocols);

Simple, no?

The allowed html parameter is an array of HTML that you want to allow through. The array can look sorta like this:

$allowed_html = array(
	'a' => array(
		'href' => array (),
		'title' => array ()),
	'abbr' => array(
		'title' => array ()),
	'acronym' => array(
		'title' => array ()),
	'b' => array(),
	'blockquote' => array(
		'cite' => array ()),
	'cite' => array (),
	'code' => array(),
	'del' => array(
		'datetime' => array ()),
	'em' => array (), 'i' => array (),
	'q' => array(
		'cite' => array ()),
	'strike' => array(),
	'strong' => array(),
);

As you can see, it’s rather simple. The main array is a list of HTML tags. Each of those points to an array of allowable attributes for those tags. Each of those points to an empty array, because kses is somewhat recursive in this manner.

Any HTML that is not in that list will get stripped out of the string.

The allowed protocols is basically a list of protocols for links that it will allow through. The default is this:

array ('http', 'https', 'ftp', 'ftps', 'mailto', 'news', 'irc', 'gopher', 'nntp', 'feed', 'telnet')

Anything else goes away.

That $allowed_html I gave before may look familiar. It’s the default set of allowed HTML in comments on WordPress. This is stored in a WordPress global called $allowedtags. So you can use this easily like so:

global $allowedtags;
$filtered = wp_kses($unfiltered, $allowedtags);

This is so useful that WordPress 2.9 makes it even easier:

$filtered = wp_kses_data($unfiltered);
$filtered = wp_filter_kses($unfiltered); // does the same, but will also slash escape the data

That uses the default set of allowed tags automatically. There’s another set of defaults, the allowed post tags. This is the set that is allowed to be put into Posts by non-admin users (admins have the “unfiltered_html” capability, and can put anything they like in). There’s easy ways to use that too:

$filtered = wp_kses_post($unfiltered);
$filtered = wp_filter_post_kses($unfiltered); // does the same, but will also slash escape the data

Note that because of the way they are written, they make perfect WordPress filters as well.

add_filter('the_title','wp_kses_data');

This is exactly how WordPress uses them for several internal safety checks.

Now, this is all very handy, but what if I’m not filtering? What if I’m trying to get some useful information out of HTML? Well, kses can help you there too.

Parsing

As part of the filtering mechanism, kses includes a lot of functions to parse the data and to try to find HTML in there, no matter how mangled up and weird looking it might be.

One of these functions is wp_kses_split. It’s not something that is useful directly, but it is useful to understand how kses works. The wp_kses_split function basically finds anything that looks like an HTML tag, then passes it off to wp_kses_split2.

The wp_kses_split2 function takes that tag, cleans it up a bit, and perhaps even recursively calls kses on it again, just in case. But eventually, it calls wp_kses_attr. The wp_kses_attr is what parses the attributes of any HTML tag into chunks and then removes them according to your set of allowed rules. But here’s where we finally find something useful: wp_kses_hair.

The wp_kses_hair function can parse attributes of tags into PHP lists. Here’s how you can use it.

Let’s say we’ve got a post with a bunch of images in it. We’d like to find the source (src) of all those images. This code will do it:

global $post;
if ( preg_match_all('/<img (.+?)>/', $post->post_content, $matches) ) {
        foreach ($matches[1] as $match) {
                foreach ( wp_kses_hair($match, array('http')) as $attr)
                	$img[$attr['name']] = $attr['value'];
                echo $img['src'];
        }
}

What happened there? Well, quite a bit, actually.

First we used preg_match_all to find all the img tags in a post. The regular expression in preg_match_all gave us all the attributes in the img tags, in the form of a string (that is what the “(.+?)” was for). Next, we loop through our matches, and pass each one through wp_kses_hair. It returns an array of name and value pairs. A quick loop through that to set up a more useful $img array, and voila, all we have to do is to reference $img[‘src’] to get the content of the src attribute. Equally accessible is every other attribute, such as $img[‘class’] or $img[‘id’].

Here’s an example piece of code, showing how kses rejects nonsense:

$content = 'This is a test. <img src="test.jpg" class="testclass another" id="testid" fake fake... / > More';
if ( preg_match_all('/<img (.+?)>/', $content, $matches) ) {
        foreach ($matches[1] as $match) {
                foreach ( wp_kses_hair($match, array('http')) as $attr)
                	$img[$attr['name']] = $attr['value'];
                print_r($img); // show what we got
        }
}

The resulting output from the above:

Array
(
    [src] => test.jpg
    [class] => testclass another
    [id] => testid
    [fake] =>
)

Very nice and easy way to parse selected pieces of HTML, don’t you think?

Overriding kses

Want to apply some kind of filter of your own to things? WordPress kindly adds a filter hook to all wp_kses calls: pre_kses.

function my_filter($string) {
	// do stuff to string
	return $string;
}
add_filter('pre_kses', 'my_filter');

Or maybe you want to add your own tags to the allowed list? Like, what if you wanted comments to be able to have images in them, but (sorta) safely?

global $allowedtags;
$allowedtags['img'] = array( 'src' => array () );

What if you want total control? Well, there’s a CUSTOM_TAGS define. If you set that to true, then the $allowedposttags, $allowedtags, and $allowedentitynames variables won’t get set at all. Feel free to define your own globals. I recommend copying them out of kses.php and then editing them if you want to do this.

And of course, if you only want to do a small bit of quick filtering, this sort of thing is always a valid option as well:

// only allow a hrefs through
$filtered = wp_kses($unfiltered, array( 'a' => array( 'href' => array() ) ));

Hopefully that answers some kses questions. It’s not a complete HTML parser by any means, but for quick and simple tasks, it can come in very handy.

Note: kses is NOT 100% safe. It’s very good, but it’s not a full-fledged HTML parser. It’s just safer than not using it. There’s always the possibility that somebody can figure out a way to sneak bad code through. It’s just a lot harder for them to do it.

Shortlink:

(Note to future readers: This is fixed in WordPress 3.3, so this article is now out-of-date. See http://core.trac.wordpress.org/changeset/18541Β for the patch.)

I was not aware that other people didn’t know about this until recently, but since it seems to be little known, I thought I’d write a post on the topic.

Chain LinksIn WordPress, you should never start the custom permalink string with any of these %postname%, %category%, %tag%, or %author%. (Unless you know what you’re doing, of course. πŸ™‚ )

Meaning that “%category%/%postname% ” is a bad custom permalink string. So is just “%postname%” for that matter.

Why? Well, it has to do with how the WordPress Rewrite system works.

Rewriting Explained

See, when you request a URL from a WordPress site, WordPress gets the URL and then has to parse it to determine what it is that you’re actually asking for.

It does this by using a series of rules that are built whenever you add new content to WordPress. Generally the list of rules is pretty small, but there are specific cases that can cause it to balloon way out of control.

Normal Rules

Let’s say you’re using a normal permalink string, like my preferred “%year%/%postname%”. The rules that are generated will look like this:

/robots.txt (for the privacy settings)
/feed/* (for normal feeds of any kind)
/comments/* (for comments feeds)
/search/* (pretty url for searches, not often used)
/category/%category% (category archives)
/tag/%tag% (tag archives)
/author/%author% (author archives)
/%year%/%month%/%day% (with each of those after year being optional)
/%year%/%postname% (this is the permalink string you define)
/%pagename% (any Page)

The way that system works is that it compares the URL it has to each of those in turn, from the top to the bottom. When one of them fits, then WordPress knows what to display and how to do it.

Note that the order I listed those in is is significant. Each one from the top down is less specific than the previous one.Β  For example, “robots.txt” matches only that, while “/feed/*” matches anything starting with /feed/. And so on down the list. The %postname%, %category%, %tag%, %author%, and %pagename% will match any string, but the other WordPress % ones will only match numeric fields. Like %year% is always a number.

Notice that the last one is %pagename%. This basically matches everything, because %pagename% can be anything at all. Even hierarchical pages like /plugins/whatever/something will cause this to match. It’s the fall-through position. And then, if that page doesn’t actually exist on your site, then this causes the query to trigger the 404 condition internally, which causes your theme’s 404.php to load up.

Pretty simple and straightforward, really.

Problem Rules

The problem comes in when you try to use a non-number for the beginning of your permalink string. Let’s examine those last two rules closer:

/%year%/%postname% (this is the permalink string you define)
/%pagename% (any Page)

What if you used “%category%/%postname%” for your custom permalink string? Now those last two rules are these:

/%category%/%postname% (this is the permalink string you define)
/%pagename% (any Page)

That violates our main rule, doesn’t it? That each one should be less specific than the one above it? Because %category% can match any string too, just like the %pagename% can… With this set of rules, there’s no way to view any of the Pages. Not good.

So, WordPress detects this condition and works around it. Internally, this sets a flag called “use_verbose_page_rules”, and that triggers the rewrite rebuild to make this set of rules instead:

/robots.txt (for the privacy settings)
/AAA
/BBB
/CCC (one of these for each of your Pages)
/feed/* (for normal feeds of any kind)
/comments/* (for comments feeds)
/search/* (pretty url for searches, not often used)
/category/%category% (category archives)
/tag/%tag% (tag archives)
/author/%author% (author archives)
/%year%/%month%/%day% (with each of those after year being optional)
/%category%/%postname% (this is the permalink string you define)

Now we have basically the same set of rules, except for those new ones at the top. Every Page now gets its own very specific rule, and this satisfies our main condition once again.

Pages

But what if you have a lot of Pages? I once read a post by a person who had over 50,000 Pages on his site. That is a special case obviously, but consider our lookup system. We’re going through these rules one at a time. With our first method, our rule list was only 10 rules, maximum. With this new method, you add a rule for every single Page you make. Going through 50,000 rules takes a lot longer than going through 10. And even just building that list of rules can take a long time.

Basically you’ve created a performance issue. Your Pages now won’t scale to unlimited numbers. Your site’s speed is linearly dependent on the number of Pages you have.

This is a bad thing.

Conclusion

Firstly, it’s really not any better for SEO to have the category in there, or to have just the postname there by itself. And anybody who tells you differently is wrong. If you disagree with me, then no, I’m not interested in arguing this point with you; you’re just wrong, period, end of discussion.

Secondly, shorter links are great and all, but hell, why not use a real shortlink? WordPress 3.0 now has a ShortLink API that defaults to using ?p=number links on your own domain. These will actually work for any WordPress site, even ye back unto WordPress 2.5. WordPress 3.0 just makes it nicer and easier to use these with the Shortlink API (as well as allowing plugins to make this automagically use services like wp.me or bit.ly). So use that instead.

The conclusion is, in general, just don’t do it. Leave a number, or something static, at the beginning of your permalink string and you’ll never have any sort of problems. But if you really MUST do this sort of thing, then keep your number of Pages low. Don’t try it when you have more than, say, 30-50 Pages.

Addendum

Okay, so I actually simplified things for this post. It’s actually worse than this, as verbose page rules can add much more than one rule per page, as this post demonstrates (he gets 11 per extra page!).

Shortlink:

For ages, theme authors have been adding code like this to their theme’s header.php files:

<link rel="alternate" type="application/rss+xml" title="<?php bloginfo('name'); ?> RSS Feed" href="<?php bloginfo('rss2_url'); ?>" />

No need for that any more. Remove that stuff, make sure you’ve got the wp_head() call in the header (like you should anyway), then add this to the theme’s functions.php file instead:

add_theme_support( 'automatic-feed-links' );

This automatically adds the relevant feed links everywhere on the whole site. Standard feed, comments links, category and tag archives, everything as it should be.

Shortlink:

Over on my Simple Facebook Connect page, there’s lots of comments from users with problems. Having answered these for a while now, via there and via email, I’ve come to the conclusion that people don’t search for answers to their problems.

The “How to fix the Email Domain” problem is answered on that one page no less than 6 times, for example. Almost all the rest of the problems given come from the “wrong connect URL setup” issue.

So if you don’t want to do support all the time, I think you have to make your plugin smarter. Take the most common issues you see and make the plugin auto-detect the problems. That’s what I’ve done with SFC version 0.16, for example.

Error Messages

Error messages now show up when the user configured something wrong on Facebook.

The plugin now can detect these two major causes of problems and will display an error message. It also provides a link to the right place on Facebook to go and correct these problems. It can’t actually fix the problems directly (though that is possible… small steps), but I hope this will eliminate the need for me to continually have to answer the same questions over and over again.

So my tip for the day for plugin programmers: For robustness, make your plugin check for commonplace issues. And the issues that you think will be commonplace may not be the ones you expect, so figure on having to add more and more checks every time.

Shortlink:

If you subscribe to the Twitter account that is hooked up to my blog entries, then you might have saw a test tweet earlier. That’s because I was testing. Now I’m not, and I just pushed Simple Twitter Connect 0.5 out for release.

New features:

– Made the comments plugin smarter. The Settings page allows the “Send to Twitter” button on comments to be disabled. Just leave the field blank and voila, no more checkbox.

– The Tweetmeme script now uses HTML comments properly. This prevents the tweetmeme javascript stuff from showing up in weird places, like in the feed and in the Simple Facebook Connect Publish/Share sections.

– And the big one: Automatic Tweeting on Post Publish. It supports auto-tweeting to an alternate Twitter account than your own (useful if you have a multi-user blog). Manual publishing will be coming soon, but I suspect it will not support the alternate account functionality, for simplicity. It would be confusing and hard to use.

Note: that last one is a bit beta. Don’t be surprised by bugs and odd behavior. Don’t rely on it working every time, because it probably won’t. But when it does work, it works great. Here’s this post, auto-tweeted: http://twitter.com/ottodestruct/status/10934354229 πŸ™‚

Shortlink:

Too often I see themes missing the absolute minimum requirements to make the theme actually work properly. So I figured I’d make a list of things that ALL WordPress themes need to have in them, every time. These are WordPress theme-specific things. I’m not including obvious stuff like HTML and such.

Note: These are my opinions. You may not agree with every one of these. My opinion in that case is that you’re wrong, so there’s little point in arguing with me unless you have a rock-solid reason for disagreeing with me. In other words, I’m not trying to start a flame war, nor am I interested in one. This is just a checklist that I hope theme authors will start following more often. It would make me happy if all themes had these. πŸ™‚

  • wp_head() in the HEAD section.
  • wp_footer() just before the /BODY tag. (So many themes forget this simple little thing…)
  • language_attributes() in the opening HTML tag.
  • body_class() in the BODY tag.
  • post_class() in whatever surrounds each individual post (probably a DIV).
  • Use of get_header(), get_sidebar, and get_footer inside every appropriate page template.
  • The Loop inside every page template (exception: very Custom Page Templates).
  • Proper use of widgets on the sidebars (dynamic_sidebar, register_sidebar, etc).
  • A special image.php template. Image attachments can have their own template and make theme’s have built in nice gallery-like support. You should make a special one of these to fit your layout.
  • Comments must use wp_list_comments(). Preferably without using a customized callback. But if you must make a callback, be sure to support threading properly! This is tricky without also having an end-callback. And you should use a List to do it (unordered or ordered, it doesn’t really matter). If you’re using DIVs, you’re doing it wrong.
  • The Comments Reply form should have id=”commentform”. If you change this, you’re breaking plugins.
  • Similarly, you need to include do_action(‘comment_form’, $post->ID); on your comment form too.
  • A couple of useful Custom Page templates. Like a no-sidebar one, or one that has a different number of columns. Just generic ones to let your user have a few built in options.
  • New to 2.9: Thumbnail support. Come on, this is cool stuff, every theme needs to have it.
  • New to 3.0: Forget doing your own comment form at all. Just make the call to comment_form(). Then adjust it through styling or filters or what have you. Plugin authors will love you for doing this.
  • New to 3.0: Nav-menu support. It’s cool. Your users will love you for supporting it.
  • New to 3.0: add_theme_support( ‘automatic-feed-links’ ); in the function.php. This will make it do the feed links in the head for you, automagically.

This list is by no means complete. It’s just off the top of my head for now. But honestly, too many themes don’t have even the basic ones, and I’d like to see that fixed. If you’re a theme author, help everybody out, let’s make a list of standards and adhere to them. Users hate editing their themes to support their favorite plugins, and with standards like these, we could make it so that they didn’t have to.

Shortlink:

For you people who do your own theme work:

<?php if (function_exists('wp_get_shortlink')) { ?>
<div><span class="post-shortlink">Shortlink: 
<input type='text' value='<?php echo wp_get_shortlink(get_the_ID()); ?>' onclick='this.focus(); this.select();' />
</span></div>
<?php } ?>

Basically that adds a little Shortlink: input box with the shortlink in it. Click the box, and the shortlink becomes selected for easy copy and paste.

It’ll also work with any shortlink plugin that uses the “get_shortlink” filter to override the shortlink properly (like WordPress.com Stats).

Put it somewhere in The Loop or on the Single Post pages or what have you.

Shortlink:

I updated Simple Twitter Connect to version 0.4. New stuff:

Login message

The Login plugin now correctly displays an error message when somebody attempts to login as a user that isn’t recognized yet. This should prevent confusion about why “login doesn’t work” after activating the plugin.

Tweetmeme button

New Tweetmeme button plugin added. STC was already perfectly compatible with the existing TweetMeme plugins, but for completeness (and because it was easy), I added this. It’s much like the SFC Share plugin, really.

Note: In the future, an actual STC Share plugin will be created, which will send tweets directly from your own Twitter Application, instead of through TweetMeme. That will be a sorta replacement for this, in that it will have the same basic functionality.

Shortlink support (or lack thereof)

I made the plugin give more information about shortlinks and how they work. Basically, I’m not going to support shortlinks directly. There’s just too many of them. Instead, I have put in support for generic shortlink plugins. The idea here is that anybody can make a plugin to do some form of shortlink support, and this plugin can then use it automatically.

How this idea works:
1. A shortlink plugin author implements this function: “get_shortlink($post_id)”.
2. That’s it. Do that and the plugin will use it.

Any plugin author creating a shortlink plugin should be able to easily do this. In fact, one already has. If you use the WordPress.com Stats plugin, then you automatically get “wp.me” style shortlinks. STC will use them because they implement this function in a pluggable way.

This is also the best solution for people who prefer to implement their own shortlinks in some custom manner. All they need to do is to create a get_shortlink function that returns the shortlink string, and voila, it’ll be used.

So, there you go. Enjoy.

Shortlink:

Originally published here: http://ottodestruct.com/blog/2010/dont-include-wp-load-please/

Note: There is a followup post to this one, which details an even better way than the two given below: http://ottopress.com/2010/passing-parameters-from-php-to-javascripts-in-plugins/

Time for Otto’s general griping: WordPress plugin programming edition.

Here’s a practice I see in plugins far too often:

  1. Plugin.php file adds something like this to the wp_head:
    <script src='http://example.com/wp-content/plugins/my-plugin/script.js.php'>
  2. Script.js.php has code like the following:
    <?php
    include "../../../wp-load.php";
    ?>
    ... javascript code ...
    

The reason for this sort of thing is that there’s some option or code or something that the javascript needs from the database or from WordPress or whatever. This PHP file is, basically, generating the javascript on the fly.

Usually, the case for this turns out to be something minor. The code needs the value from an option, or some flag to turn it on or off. Or whatever.

Problem is that finding wp-load.php can be a bit of a chore. I’ve seen extreme efforts to find and load that file in plugins before, including searching for it, examining the directory structure to make decent guesses, etc. This sort of thing has existed even before wp-load.php came around, with people trying to load wp-config.php themselves and such.

But the real problem is simpler: This is always the wrong way to do it.
Continue reading ‘Don’t include wp-load, please.’ »

Shortlink:

Originally posted here: http://ottodestruct.com/blog/2009/wordpress-settings-api-tutorial/

When writing the Simple Facebook Connect plugin, I investigated how the Settings API worked. It’s relatively new to WordPress (introduced in version 2.7), and many things I read said that it was much easier to use.

It is much easier to use in that it makes things nice and secure almost automatically for you. No confusion about nonces or anything along those lines. However, it’s slightly more difficult to use in that there’s very little good documentation for it. Especially for the most common case: Making your own settings page.

So, here is my little documentation attempt.

Continue reading ‘WordPress Settings API Tutorial’ »

Shortlink: