Archive for November 2010

I have been trading email with several people recently, talking them through some webhosting stuff, and I just discovered how prevalent this practice was. I should have guessed it when I wrote a post about it earlier, but I didn’t know everybody was doing it this way. Most people I talked to didn’t realize there was any other way.

So it’s worth another look, I think.

Note that this post covers some basic fundamentals to start with. If you already grok DNS, you can skip ahead to the “How to Point a Domain at a Webhost” bit. For now, I’m going to use the word “server” a lot.

Why DNS is Important

So you bought a domain name for a few bucks. That’s great. What nobody told you: A domain name by itself is useless.

Really. Computers cannot connect to domain names. Computers on the internet can only connect to IP addresses. So you have to have a way to convert that domain name into an IP address. The way that happens is through DNS.

DNS? How does it work?

DNS works as a decentralized system. There’s thousands (millions?) of DNS servers in the world, all talking to each other all the time. One way to think of it is as a big tree, with connections coming from the root servers all the way up through to other servers. This is the traditional approach. But a better way to think of it is as a cloud, with connections branching every which way. DNS servers talk to other DNS servers and they don’t much care where they are on the “tree”, generally speaking.

When I make a request for my site, ottopress.com, then a few things happen.

First, my computer checks its memory to see if I’ve done this recently. I probably have, in my case, so it just uses that information if it’s pretty recent. This is known as DNS Caching, and all modern systems do it.

Next, if I don’t know the address, then I know who to ask. I ask my own DNS server. Pop open a command prompt and type ipconfig /all (or ifconfig -a) . You’ll get a big listing of your IP configuration info, and some of those are your DNS servers. Those are provided by whatever gave your computer an IP address. It might be your home router, or it might be your ISP, or maybe you entered them manually. The point is that that DNS IP is where the computer connects to in order to ask it “hey, where is ottopress.com?”.

Now, my DNS server may already know the answer because it has it in memory (DNS cache again). If not, then it knows how to find out.

Firstly, it looks at the name itself. In this case, the name ends in “.com”. That’s important. This is the “Top Level Domain” (TLD), and every TLD has its own set of servers dedicated to it. Actually, there’s a set of servers called the “root nameservers“. They live at root-servers.net. They are a set of 13 servers world wide which distribute the TLD information. (Actually, there’s a lot more computers than “servers”, since each server is separated geographically. The J server, for example, lives in 70 different places. You can see all about them at root-servers.org.)

They deliver a file called the “root zone” file. In fact, this file is rather small (it will even fit on an old-school single sided 5.25″ floppy!), but it contains some critical information that describes the functioning of the DNS system. Specifically, it specifies where things like .com and .org and all the other TLDs can be serviced from. Every DNS server on the planet needs this information, and usually has it cached for a long, long, long time. The thing rarely changes.

So, my DNS server looks at the root zone file and discovers that “.com” domains are handled by some other set of servers, so it goes to those and asks it “where can I get the info for ottopress.com?”

This is an opportune moment to talk about authority.

Authoritative Responses and What They Mean

Every domain name on the internet has to have somebody in control of it. This person is considered to be the “authority” on that domain. He in turn delegates that authority to some DNS server. That server is the only one on the whole internet who knows, for a fact, what IP addresses are connected to his names.

When I get something out of the cache of any server, the result is “non-authoritative”. That is, the DNS server gave me an answer, but it cannot guarantee that the answer is the right one. A non-authoritative answer is the fast one.

Those root servers I talked about are the authorities for the TLDs. They give out the root zone file, which says, among other things, who is the authority for all “com” domains. That server, in turn, doesn’t have the faintest clue what IP address connects to ottopress.com, but it does have information on what nameserver is the authority for ottopress.com.

So my DNS server goes and talks to this new server which the .com servers have told it is the authority. And finally, THAT server says “yes, I know for a fact that ottopress.com lives at 64.202.163.10″.

So, now that it has an answer, my DNS server relays this back to me. It also caches the information, because I’m probably going to ask it again soon, and it’s quicker if it doesn’t have to go through all that again.

How to Point a Domain at a Webhost

So, when you signed up for your webhost, if you got the domain somewhere else, then they very likely told you to “point your domain’s nameservers to X and Y”.  What does this mean, exactly?

Previously, I explained how my DNS lookup went to the .com authoritative nameservers to get the nameserver information. Well, when I change my domain’s nameservers, then what I’m actually doing is changing the information on those .com servers. I’m telling them that these new nameservers are the authorities for all DNS lookups involving ottopress.com. I’m delegating my authority to those nameservers. When I do that, what I’m saying is that those nameservers are now in control over all requests on the internet that involve my domain.

Now, this normally isn’t a bad thing. Running nameservers is difficult and tricky. The syntax is arcane and strange (albeit well worth learning for your toolbox). Plus, you’re probably not in possession of all the information. After all, you hired this web hosting company to host your website for you, and they might change IPs around and such. Better for them to manage it, yes?

No.

There’s a lot of good reasons to manage your DNS yourself. For one thing, you have total control. If you want to do some tricky DNS stuff, or set up email to the domain with MX records, or things like that, then you can do so yourself. Just the ability to edit your own CNAMEs and TXT records easily is well worth it. Heck, maybe you want to get Jabber working on your domain. Who knows?

On the other hand, you have total control, and that includes total freedom to screw it up. And anyway, most web hosts have some kind of easy interface to let you add and remove specific entries yourself, so you still have some control over it.

But now we get back to the main problem, which I was talking about in that previous post. Vendor lock-in.

DNS Propagation Delay and TTL

Remember what I mentioned earlier, when your webhost said to “point your domain’s nameservers to X and Y”? That’s the root of the problem.

DNS lives and dies by a setting called “Time-To-Live” (TTL). The time-to-live is the caching factor I mentioned several times before. When a DNS server gets some new information and stores it in its memory, it also stores the TTL, which it also receives from the other server. The TTL is a time limit on how long it can cache that information. Most DNS servers obey this value extremely well. If the TTL says to cache it for 2 hours, then it caches it for 2 hours and not a second longer.

Well, that nameserver lookup from the .com servers has a TTL too, only it’s a very LONG one. See, those second-level servers are way overloaded. Think about it, every lookup of every .com domain name goes through one server (which is actually a whole bunch of computers geographically spread out too). There’s millions, probably billions, of these lookups a day. So they offload a lot of the information. Where to? Why into everybody else’s caches, of course. The nameserver results tend to have a very long TTL, on the order of a day to a week or so (mean time is about 48 hours). Even then, many DNS servers are configured, by default, to hold these results even longer. Sometimes weeks.

This is because while the IP address corresponding to a domain name might change a lot, the nameservers for one actually rarely change. You don’t switch hosts every week, for example. But your IP might change a lot, if you’re using dynamic addressing or something along those lines.

So what happens when I change that information? Well, basically, all the other servers on the internet that have my information cached will be wrong for some period of time. That period of time is call the propagation delay, because it takes that long for my change to propagate out to the rest of the world. Those caches have to expire and all the DNS systems out there then have to ask me for the new information, assuming somebody asks them for it.

So if I change my IP, it takes a couple of hours for it to get out there, because my TTL is 2 hours. The downside to this is that when your nameserver changes, it takes a friggin’ long time to take effect.

Solving the Problem

The solution is simple: Never change your nameservers.

By that, I mean to keep your nameserver in the same place for as long as you possibly can. And this means, if at all possible, don’t delegate your authority to your web host. Instead, a better option is delegate it to your domain name provider.

I use GoDaddy for my domain names. With purchase of domain, they offer free DNS. It’s not the best interface in the world (actually it’s downright clumsy), but it works well enough. I can point my A record (that’s an “address” record, which connects names to IP addresses) to anywhere I want with relative ease. I can set my own TTL on that lookup (currently it’s 2 hours). If I were to change web hosts, my outage time would be 2 hours instead of 2 days. Why? Because all I have to do is to point my domain name at my new host, after they told me what IP address to point it to. If I instead tried to change my nameservers to theirs, then my outage would be 2 days, at least, because it usually takes at least that long for a change to the .com servers to take effect everywhere. And in some parts of the internet, that outage would be a week, at least. Minimum.

There’s also other options for owning your DNS. ZoneEdit offers both free and paid services for DNS, allowing you to point your domain to them and then controlling it all you like. This allows you to take your domain with you from one registrar to another, without having to worry about your registrar not providing DNS anymore.

Or you can even run your own DNS. That’s a super advanced topic though. Even I wouldn’t attempt that without some serious resources.

Summing up

But the point is that you want your DNS to be somewhere that it’s never going to move. Or, at least, that it’s going to move so rarely that you never have to worry about it. If I changed web hosts, it’s complex, but a simple enough matter that I could do it myself. But seriously, when am I going to move my domain names between registrars? How often does that really happen? Most people pick their registrar and stick with them forever. Unless they seriously raised the rates or something, it’s unlikely I’d ever switch them off GoDaddy.

Also, you want your DNS somewhere that you have a reasonable assurance that nobody’s going to screw with it. You own your domain, but the DNS controls where your domain goes. He who controls the DNS controls the domain, and that’s what ownership is, really. Control. Owning your DNS is the ability to control your own domains. It takes some learning, but seriously, it’s way easier than you think. More interesting too.

Shortlink:

One of the new things in 3.1 that hasn’t got a lot of discussion yet is the new Advanced Taxonomy Queries mechanism. At the moment, this is still being actively developed, but the basic structure is finalized enough to give at least a semi-coherent explanation on how to use it. Since 3.1 is now going into beta, it’s unlikely to change much.

What’s a Query?

In WordPress land, a “query” is anything that gets Posts. There’s a lot of ways to do this.

  • You can use query_posts to modify the main query of the page being displayed.
  • You can create a new WP_Query, to get some set of posts to display in its own custom Loop.
  • You can do a get_posts to get some limited set of posts for display in some special way.

Regardless of the method, you have to pass parameters to it in order to specify which posts you want. If you’ve used this at all before, then you’ve used parameters like cat=123, or tag=example, or category_name=bob and so forth. When custom taxonomies were developed, you were eventually able to specify things like taxonomy=foo and term=bar and so on.

Querying for Posts

The problem with these is that people sometimes want to specify more than one of these parameters, and not all parameters worked well together. Things like cat=a and tag=b, or cat=a and tag is not b, and so forth. This is because cat and tag are both forms of taxonomies, and the code didn’t handle that well. Sure, some of it worked, for specific cases, but those were mostly there by accident rather than by design. In other words, those cases worked because the system just happened to get it right for that particular instance.

Well, all these old methods still work, but they have been made into a more comprehensive system of generically specifying arbitrary taxonomies to match against. When you specify cat=123, it’ll actually be converting it to this new method internally.

Query Strings are for Suckers

One side effect of this new system is that it doesn’t really work with query strings very well. It can be done, but it’s a lot easier and more sensible if you just start getting into the array method of doing things instead. What’s the array method? I’ll explain:

Imagine you used to have a query that looked like this:

query_posts('cat=123&author=456');

A simple query, really. The problem with it is that WordPress has to parse that query before it can use it. But there is another way to write that query as well:

query_posts(array(
  'cat' => 123,
  'author' => 456,
) );

Essentially, you separate out each individual item into its own element in an array. This actually saves you some time in the query because it doesn’t have to parse it (there’s very little savings though).

The advantage of this is that you can build your arrays using any method of array handling you like. Here’s another way to do it:

$myquery['cat'] = 123;
$myquery['author'] = 456;
query_posts($myquery);

Simple, no? But what if you have to deal with the $query_string? The $query_string is that old variable that is built by WordPress. It comes from the “default” query for whatever page you happen to be on. One of the main uses for it was to deal with “paging”. A common method of doing it was like this:

query_posts($query_string . '&cat=123&author=456');

If you use arrays, you have to deal with that yourself a bit. There’s a couple of possible ways to do it. The easiest is probably to just parse the query string yourself, then modify the result as you see fit. For example:

$myquery = wp_parse_args($query_string);
$myquery['cat'] = 123;
$myquery['author'] = 456;
query_posts($myquery);

I started out with the $query_string, used wp_parse_args to turn it into an array, then overwrote the bits I wanted to change and performed the query. This is a handy technique I’m sure a lot of people will end up using.

On to Advanced Taxonomies

Advanced Taxonomy queries use a new parameter to the query functions called “tax_query”. The tax_query is an array of arrays, with each array describing what you want it to match on.

Let’s lead by example. We want to get everything in the category of “foo” AND a tag of “bar”. Here’s our query:

$myquery['tax_query'] = array(
	array(
		'taxonomy' => 'category',
		'terms' => array('foo'),
		'field' => 'slug',
	),
	array(
		'taxonomy' => 'post_tag',
		'terms' => array('bar'),
		'field' => 'slug',
	),
);
query_posts($myquery);

Here we’ve specified two arrays, each of which describes the taxonomy and terms we want to match it against. It’ll match against both of them, and only return the results where both are true.

There’s two things of note here:

First is that the “field” is the particular field we want to match. In this case, we have the slugs we want, so we used “slug”. You could also use “term_id” if you had the ID numbers of the terms you wanted.

Second is that the “terms” is an array in itself. It doesn’t actually have to be, for this case, as we only have one term in each, but I did it this way to illustrate that we can match against multiple terms for each taxonomy. If I used array(‘bar1′,’bar2′) for the post_tag taxonomy, then I’d get anything with a category of foo AND a tag of bar1 OR bar2.

And that second item illustrates an important point as well. The matches here are actually done using the “IN” operator. So the result is always equivalent to an “include” when using multiple terms in a single taxonomy. We can actually change that to an “exclude”, however, using the “operator” parameter:

$myquery['tax_query'] = array(
	array(
		'taxonomy' => 'category',
		'terms' => array('foo', 'bar'),
		'field' => 'slug',
		'operator' => 'NOT IN',
	),
);
query_posts($myquery);

The above query will get any post that is NOT in either the “foo” or “bar” category.

But what about terms across multiple taxonomies? So far we’ve only seen those being AND’d together. Well, the “relation” parameter takes care of that:

$myquery['tax_query'] = array(
	'relation' => 'OR',
	array(
		'taxonomy' => 'category',
		'terms' => array('foo'),
		'field' => 'slug',
	),
	array(
		'taxonomy' => 'post_tag',
		'terms' => array('bar'),
		'field' => 'slug',
	),
);
query_posts($myquery);

This gets us anything with a category of foo OR a tag of bar. Note that the relation is global to the query, so it appears outside the arrays in the tax_query, but still in the tax_query array itself. For clarity, I recommend always putting it first.

Useful Gallery Example

By combining these in different ways, you can make complex queries. What’s more, you can use it with any taxonomy you like. Here’s one I recently used:

$galleryquery = wp_parse_args($query_string);
$galleryquery['tax_query'] = array(
	'relation' => 'OR',
	array(
		'taxonomy' => 'post_format',
		'terms' => array('post-format-gallery'),
		'field' => 'slug',
	),
	array(
		'taxonomy' => 'category',
		'terms' => array('gallery'),
		'field' => 'slug',
	),
);
query_posts($galleryquery);

This gets any posts in either the gallery category OR that have a gallery post-format. Handy for making a gallery page template. I used the wp_parse_args($query_string) trick to make it able to handle paging properly, among other things.

Speed Concerns

Advanced taxonomy queries are cool, but be aware that complex queries are going to be slower. Not much slower, since the code does attempt to do things smartly, but each taxonomy you add is the equivalent of adding a JOIN. While the relevant tables are indexed, joins are still slower than non-joins. So it won’t always be a good idea to build out highly complex queries.

Still, it’s better than rolling your own complicated code to get a lot of things you don’t need and then parsing them out yourself. A whole lot easier too.

Shortlink:

Saw a few tweets by @lastraw today, asking Matt and others if they could make the Add Audio function in the WordPress editor work.

Well, @lastraw, the audio function does actually work, it just doesn’t do what you expect it to do.

Basically, the WordPress uploader does provide a few different kinds of uploader buttons: image, video, audio, and media. All of these buttons behave in different ways. The Audio button in particular lets you upload an audio file, and then insert a link to that file in your post.

WordPress upload buttons in the post editor

However, the link it inserts is just a bare link. This is because WordPress doesn’t come with a flash audio player, and HTML 5 hasn’t gotten standard enough to allow sane use of the <audio> tags.

Still, plugins can modify things to make audio files embed. I just wrote a quick plugin to take those bare audio links and turn them into embedded audio players using Google’s flash audio player. This is the same player they use on Google Voice and in several other locations in the Google-o-sphere.

Example:

Example Audio File

How did I do that? Easy, I activated my plugin, then used the Add Audio button to just insert the plain link to my audio file (which I uploaded). Naturally, this audio player will only show up on your site. People reading through an RSS reader or some other method won’t see it, they’ll just see the plain audio link and can download the file.

Couple limitations on this: It only handles MP3 formats. You could conceivably use a player that could handle more formats, I only made this as an example. MP3 is the most common format in use anyway, and I didn’t want to go searching for a more complicated player to use. Also, I made it only handle links on lines by themselves. If you put an audio link inline into a paragraph or something, it won’t convert it.

Here’s the plugin code if you want to use it or modify it or whatever. It’s not the best code in the world, but then it only took 5 minutes to create, so what do you expect? ;)

<?php
/*
Plugin Name: Google MP3 Player
Plugin URI: http://ottodestruct.com/
Description: Turn MP3 links into an embedded audio player
Author: Otto
Version: 1.0
Author URI: http://ottodestruct.com
*/

add_filter ( 'the_content', 'googlemp3_embed' );
function googlemp3_embed($text) {
	// change links created by the add audio link system
	$text = preg_replace_callback ('#^(<p>)?<a.*href=[\'"](http://.*/.*\.mp3)[\'"].*>.*</a>(</p>|<br />)?#im', 'googlemp3_callback', $text);

	return $text;
}

function googlemp3_callback($match) {
	// edit width and height here
	$width = 400;
	$height = 27;
	return "{$match[1]}
<embed src='http://www.google.com/reader/ui/3523697345-audio-player.swf' flashvars='audioUrl={$match[2]}' width='{$width}' height='{$height}' pluginspage='http://www.macromedia.com/go/getflashplayer'></embed><br />
<a href='{$match[2]}'>Download MP3 file</a>
{$match[3]}";
}

This is mainly intended as a demo. There’s more full featured plugins for this sort of thing in the plugins directory. If you need to embed audio, using one of them might be a better way to go.

Shortlink:

A lot of people have been debating back and forth lately about post formats and custom post formats. This discussion also gets all confused with post types, and custom taxonomies, and categories, and tags… It’s time for some clarity. Mark had a really good post on the topic, but I think this needs to be explored in more detail.

Also, can’t have a good explanation without a bad analogy. I mean, this is the internet, right? So I should probably try to relate it to cars somehow.

A Post Type as a Car

Okay, so let’s say we have a post type. It’s called “car”. Anything that even vaguely resembles a car (including trucks, SUVs, jeeps, fire engines) gets lumped together in this post type.

We can categorize cars by type: Truck. Van. Hummer. Ford. Whatever.

We can add tags to cars: Four-door. Premium-sound. 6-disc-changer. Etc.

We can come up with custom taxonomies for them: A Color taxonomy could contain red, blue, black, silver, white, brown, etc.

The point here is that the post type is the thing itself, the various taxonomies are merely descriptions of it. You wouldn’t have both a “car” and a “truck” post type, because those are the same type of thing. If you prefer to be generic, you could make your post_type into “automobile”, which sorta fits both. That’s just a matter of naming choice.

Post types are NOUNS. Taxonomy terms are ADJECTIVES. Taxonomies themselves are related groups of adjectives.

This is why people using post types for things like Podcasts or Comic Strips or Video or something else are just fundamentally wrong. They’re using different nouns to describe the same thing, when they should be using the adjectives to sort out what those things are.

Displaying Different Things Differently

Historically with WordPress, categories have been used for more than just ways to classify posts. They’ve often been used to define different ways of displaying something.

The classic example is an “aside”. An aside has been traditionally defined as, basically, a short form post. Matt loves asides and he uses them far more often than long format posts:

A couple of aside posts on ma.tt

A couple of aside posts on ma.tt

Matt also uses a special format for his gallery posts:

One of ma.tt's gallery posts

One of ma.tt's gallery posts

Compare these to his normal long format posts:

A normal long format post on ma.tt

A normal long format post on ma.tt

You can easily see some of the differences. Asides don’t display a title. Galleries display a photo on the left hand side and the title is shortened and to the right. Long format posts have that double line underneath them, and also show the categories (essays in the above case).

The way he does these, and the way they have traditionally been done in WordPress in the past, is to co-opt categories. So he has an “Asides” category, and a “Gallery” category. In the code for his theme, he then has code that looks kinda like this:

if (in_category('asides')) {
   // stuff to display the aside format post here
} else if (in_category('gallery')) {
   // stuff to display the gallery format post here
} else {
   // code to display the normal format post here
}

Then, when he makes a post, he just picks the right category for it and that changes how it shows up on the site.

The problem is that this is a bit lame. Categories should be adjectives describing the post, not sorting them into functionally different buckets for use by the theme. Sure, you can use them that way, but that’s confusing to some users. You could use tags in the exact same way with the has_tag() function, but that doesn’t make it a good idea.

Post Formats (coming soon to a WordPress 3.1 near you)

Enter Post Formats. Tumblr has had these for a long time:

Tumblr's post formats

Tumblr's post formats

Basically they just define a format for a post to fit into at display time. So the theme could say “asides won’t have a title displayed for them”, and voila. A theme can do something like this to define what formats it supports:

add_theme_support( 'post-formats', array( 'aside', 'gallery' ) );

And it can do something like this when displaying things differently:

if ( has_post_format( 'aside' ) ) {
   // display the aside format
}

So there we go. Theme authors can define what formats they support, and they can style those formats appropriately. And we didn’t use categories at all.

Additional: For those people trying to implement this in themes, post formats also add new styles to the post_class() call. You can use .format-XXX to style based on post formats on a post.

Custom Post Formats and why you don’t need them

As soon as this was announced, naturally theme authors got up in arms, because theme authors are a rowdy bunch of folks. They like to do things their own way. So there was instantly the question of “how do I add my own format”? The answer is: you don’t, nor should you even think about it.

Why? Why prevent customization? Think of it from the perspective of the user:

  • They’ve got an existing set of posts.
  • Those posts have formats.
  • They switch to your theme, which uses some custom formats.
  • Now their own posts don’t display properly with the new theme, because it’s using a whole different set of formats.
  • Bad user experience, that is.

Now, from the perspective of a theme author, I understand the reasoning here. You want to be able to display things differently.

The problem is that you were already able to do that before.

Custom taxonomies have been around a long time. All you had to do was to a) create a custom taxonomy (call it “mytheme_formats”), b) allow users to sort posts into your taxonomy, and c) display things differently in the theme based on the terms in that taxonomy.

Post Formats is just a taxonomy. It’s a set of adjectives, describing the nouns that are the posts. So now we have “aside posts” and “gallery posts” and “chat posts” and “video posts” and so on. If you want to make your own formats, then you’ve had that ability forever. Why have you not already used it?

The answer to why you didn’t do it before is because there was no standard set of formats.

Without a standard set to work with, users won’t have any idea what your formats mean. You have to write documentation. You have to educate the user. You have to explain what this weirdness in your theme is.

Post formats changes that. Now you have a standard set of formats, and the user, having used other themes that support those formats too, will have some idea of what they mean already. But in order for this to work, themes must all use the same basic formats. There has to be a standard set of adjectives to describe the posts.

If you want to create your own set, then create your own taxonomy and box to have your set in it. But don’t complain when users don’t understand why your theme’s formats don’t mesh with the formats of every other theme that does support them.

The point of standards is to be standard. You don’t have to support the standard, but you also will have to deal with the consequences of being non-standard.

Summing up

In the end, you want to present things to your users in a method that causes the least confusion. If your user wants to display things in a single stream, then those things need to be Posts. If the user wants different things in that stream to display in different ways, then you should use a taxonomy to do that, and the post format taxonomy provides a nice and easy way to standardize that and be compatible with other themes.

If you want to go it alone with custom things, feel free, but be aware of the risk.

Shortlink: