Rants » Otto on WordPress

Archive for the ‘Rants’ Category.

Internationalization: You’re probably doing it wrong

February 28, 2012, 3:15 pm

Fun fact of the day: about 37% of WordPress downloads are for non-English, localized versions.

So as a plugin or theme author, you should be thinking of localization and internationalization (L10N and I18N) as pretty much a fact of life by this point.

Fun total guess of the day: based on my experience in browsing through the thing, roughly, ohh… all plugins and themes in the directory are doing-it-wrong in some manner.

Yes friends, even my code is guilty of this to some degree.

It’s understandable. When you’re writing the thing, generally you’re working on the functionality, not form. So you put strings in and figure “hey, no biggie, I can come back and add in the I18N stuff later.” Sometimes you even come back and do that later.

And you know what? You probably still get it wrong. I did. I still often do.

The reason you are getting it wrong is that doing I18N right is non-obvious. There are tricks to it, and rules that apply outside of the normal PHP ways of doing things.

So here are the unbreakable laws of I18N, as they pertain to WordPress plugins and themes.

Note: This is not a tutorial, as such. You are expected to already be translating your code in some way, and to have a basic grasp on it. What I’m going to show you is stuff you are probably already doing, but which is wrong. With any luck, you will have much slapping-of-the-head during this read, since I’m hoping to give you that same insight I had, when I finally “got it”.

Also note: These are laws, folks. Not suggestions. Thou shalt not break them. They are not up for debate. What I’m going to present to you here today is provably correct. Sorry, I like a good argument as much as the next guy, but arguing against these just makes you wrong.

Basic I18N functions

First, let’s quickly cover the two top translation functions. There are more later, and the laws apply to them too, but these are the ones everybody should know, and they make the easiest examples.

The base translation function is __(). That’s the double-underscore function. It takes a string and translates it, according to the localization settings, then returns the string.

Then there’s the shortcut function of _e(). It does the same, but it echoes the result instead.

There are several functions built around these, such as esc_attr_e(). Each one behaves exactly like its component functions put together: esc_attr_e() first runs the string through __(), then through esc_attr(), then echoes it. These functions are named in a specific way so that they work with existing translation tools. All the following laws apply to them in exactly the same way.

So, right down to it then.

Law the First: Thou shalt not use PHP variables of any kind inside a translation function’s strings.

This code is obviously wrong, or it should be:

$string = __($string, 'plugin-domain');

The reason you never do this is because translation relies on looking up strings in a table and then translating them. However, that list of strings to be translated is built by an automated process. Some code scans your PHP code, without executing it, and pulls out all the __()’s it finds, then builds the list of strings to be translated. That scanning code cannot possibly know what is inside $string.
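To make that concrete, here is a deliberately simplified sketch of such a scanner. It is not how xgettext or WordPress’s own tools are implemented; it just assumes that only a single-quoted literal as the first argument to __() can be extracted, and it shows that the scanner reads source text without ever running it. The function name extract_translatable_strings() is hypothetical.

```php
<?php
// Hypothetical, simplified extractor: it reads PHP source as plain text and
// never executes it. Only a single-quoted literal directly after "__(" is
// captured (escaped quotes are not handled, to keep the sketch short).
function extract_translatable_strings(string $source): array
{
    preg_match_all('/__\(\s*\'([^\']*)\'/', $source, $matches);
    return $matches[1]; // the captured literal strings, in source order
}

// Nowdoc (<<<'CODE'), so $string below is just text, not a live variable.
$source = <<<'CODE'
$a = __('You have tacos', 'plugin-domain');
$b = __($string, 'plugin-domain');
CODE;

$found = extract_translatable_strings($source);
// $found is ['You have tacos']; the __($string, ...) call yields nothing.
```

Whatever $string holds at runtime, the scanner has no way to see it, so it can never end up in the list handed to translators.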

However, sometimes it’s more subtle than that. For example, this is also wrong:

$string = __("You have $number tacos", 'plugin-domain');

At runtime, the string handed to __() here will be something like ‘You have 12 tacos’, but the scanning code only sees the literal source text and can’t know what $number is in advance, nor is it feasible to expect your translators to translate every possible value of $number anyway.

Basically, double-quoted strings in translation functions are always suspect, and probably wrong. But that rule can’t be hard and fast, because using string operations like 'You have '.$number.' tacos' is equally wrong, for the exact same reason.

Here’s a couple of wrongs that people like to argue with:

$string = __('You have 12 tacos', $plugin_domain);
$string = __('You have 12 tacos', PLUGIN_DOMAIN);

These are both cases of the same thing. Basically, you decided that repetition is bad, so you define the plugin domain somewhere central, then reference it everywhere.

Mark Jaquith went into some detail on why this is wrong on his blog, so I will refer you to that, but I’ll also espouse a general principle here.

I said this above, and I’m going to repeat it: “that list of strings to be translated is built by an automated process”. When I’m making some code to read your code and parse it, I’m not running your code. I’m parsing it. And while the general simplistic case of building a list of strings does not require me to know your plugin’s text domain, a more complicated case might. There are legitimate reasons that we want your domain to be plain text and not some kind of variable.

For starters, what if we did something like make a system where you could translate your strings right on the wordpress.org website? Or build a system where you could enlist volunteer translators to translate your strings for you? Or made a system where people could easily download localized versions of your plugin, with the relevant translations already included?

These are but a few ideas, but for all of them, that text domain must be a plain string. Not a variable. Not a define.
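Here is a hypothetical sketch of why a tool that sorts strings per plugin needs the domain as a literal. It uses the same simplified, text-only scanning idea: strings are grouped by their text domain, and a domain written as a variable or constant can only be reported as unresolved, because the scanner never runs the code that would define it. The function name and the "(unresolved)" bucket are illustrative inventions, not part of any real tool.

```php
<?php
// Hypothetical sketch: group literal strings by text domain, the way a tool
// building per-plugin translation files might. Source is scanned as text only.
function group_strings_by_domain(string $source): array
{
    // Matches __('literal', <domain token>)
    $pattern = '/__\(\s*\'([^\']*)\'\s*,\s*([^)\s]+)\s*\)/';
    preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

    $by_domain = [];
    foreach ($matches as $m) {
        if (preg_match('/^\'([^\']*)\'$/', $m[2], $d)) {
            // Plain single-quoted domain: known without running anything.
            $by_domain[$d[1]][] = $m[1];
        } else {
            // $plugin_domain, PLUGIN_DOMAIN, ...: unknowable to a parser.
            $by_domain['(unresolved)'][] = $m[1];
        }
    }
    return $by_domain;
}

$source = <<<'CODE'
$a = __('You have 12 tacos', 'plugin-domain');
$b = __('You have 12 tacos', $plugin_domain);
$c = __('You have 12 tacos', PLUGIN_DOMAIN);
CODE;

$grouped = group_strings_by_domain($source);
// Only the first call can be filed under 'plugin-domain'.
```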

Bottom line: Inside all translation functions, no PHP variables are allowed in the strings, for any reason, ever. Plain single-quoted strings only.

Law the Second: Thou shalt always translate phrases and not words.

One way people often try to get around not using variables is like the following:

$string = __('You have ', 'plugin-domain') . $number . __(' tacos', 'plugin-domain');

No! Bad coder! Bad!

English is a language of words. Other languages are not as flexible, and their word order is often different: in some languages, the verb comes last, after the thing being counted. Your method doesn’t work there, unless the localizer makes “tacos” into “you have” and vice-versa.

This is the correct way:

$string = sprintf( __('You have %d tacos', 'plugin-domain'), $number );

The localizer doing your translation can then write the equivalent in his language, leaving the %d in the right place. Note that in this case, the %d is not a PHP variable, it’s a placeholder for the number.
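A small illustration of why the whole phrase matters, using a hypothetical stand-in for __() that looks phrases up in an array (real WordPress lookups go through loaded translation files). The Japanese entry is a rough rendering chosen only because its word order differs from English: the tacos come first and the verb comes last, after the number.

```php
<?php
// Hypothetical stand-in for __(): look the whole phrase up in a table and
// fall back to the original English when there is no entry.
function translate_phrase(string $text, array $table): string
{
    return $table[$text] ?? $text;
}

// Illustrative Japanese entry: "tacos" first, then the number, then the verb.
$ja = ['You have %d tacos' => 'タコスを%d個持っています'];

$number   = 12;
$english  = sprintf(translate_phrase('You have %d tacos', []), $number);
$japanese = sprintf(translate_phrase('You have %d tacos', $ja), $number);
// $english  is 'You have 12 tacos'
// $japanese is 'タコスを12個持っています'
```

The code stays the same for both languages; only the translator’s string moves the words around the %d.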

In fact, this is a good place to introduce a new function to deal with pluralization. Nobody has “1 tacos”. So we can write this:

$string = sprintf( _n('You have %d taco.', 'You have %d tacos.', $number, 'plugin-domain'), $number );

The _n() function is a translation function that picks the first string if $number (the third parameter to _n()) is one, or the second string otherwise, including zero. We still have to use sprintf() to replace the placeholder with the actual number, but now the pluralization can be translated separately, as part of the whole phrase. Note that the last argument to _n() is still the plugin text domain to be used.
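For English-style plurals, the choice _n() makes boils down to something like this hypothetical helper. The name pick_plural() is invented, and the sketch ignores the plural rules that a loaded translation file supplies for other languages.

```php
<?php
// Hypothetical sketch of English-style plural selection: the singular
// string for exactly 1, the plural string for everything else (including 0).
function pick_plural(string $single, string $plural, int $number): string
{
    return $number === 1 ? $single : $plural;
}

$one  = sprintf(pick_plural('You have %d taco.', 'You have %d tacos.', 1), 1);
$zero = sprintf(pick_plural('You have %d taco.', 'You have %d tacos.', 0), 0);
$many = sprintf(pick_plural('You have %d taco.', 'You have %d tacos.', 12), 12);
// 'You have 1 taco.', 'You have 0 tacos.', 'You have 12 tacos.'
```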

Note that some languages have more than just a singular and a plural form. You may need special handling sometimes, but this will get you there most of the time. Polish in particular uses one form for 1, another for numbers ending in 2, 3, or 4 (except 12, 13, and 14), and a third for everything else (0, 5 through 21, 25 through 31, and so on). That’s okay: _n() can handle these special cases with special pluralization handling in the translation files, and you generally don’t need to worry about it as long as you specify the plural form in a sane way, using the whole phrase.
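For the curious, gettext translation files usually express the Polish rule in their Plural-Forms header as nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);. The sketch below just transcribes that expression into PHP (the function name is hypothetical) so you can see which form each number selects. In practice this logic lives in the translation file, not in your plugin.

```php
<?php
// Hypothetical transcription of the usual Polish Plural-Forms expression:
//   n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
// It returns the index of the translated form to use.
function polish_plural_index(int $n): int
{
    if ($n === 1) {
        return 0; // singular
    }
    $last     = $n % 10;
    $last_two = $n % 100;
    if ($last >= 2 && $last <= 4 && ($last_two < 10 || $last_two >= 20)) {
        return 1; // 2-4, 22-24, 32-34, ... (but not 12-14)
    }
    return 2; // everything else: 0, 5-21, 25-31, ...
}

$indexes = array_map('polish_plural_index', [1, 2, 4, 5, 12, 21, 22, 25]);
// [0, 1, 1, 2, 2, 2, 1, 2]
```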

You might also note that _n() is the one and only translation function that can have a PHP variable in it. This is because that third argument is always going to be a number, not a string, so no automated process that builds the string list by scanning code will care what it is. You do need to take care that the $number in _n() is always a number, though. _n() will not insert that $number into the string; it selects which string to use based on its value.

Now, using placeholders can be complex, since sometimes things will have to be reversed. Take this example:

$string = sprintf( __('You have %d tacos and %d burritos', 'plugin-domain'), $taco_count, $burrito_count );

What if a language has some strange condition where they would never put tacos before burritos? It just wouldn’t be done. The translator would have to rewrite this to have the burrito count first. But he can’t: the placeholders are such that $taco_count is expected to be first in the sprintf(). The solution:

$string = sprintf( __('You have %1$d tacos and %2$d burritos', 'plugin-domain'), $taco_count, $burrito_count );

The %1$d and such is an alternate form that PHP allows called “argument swapping”. In this case, the translator could write it correctly, but put the burritos before the tacos by simply putting %2$d before %1$d in the string.
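Argument swapping is plain PHP sprintf() behaviour, so it is easy to see on its own. In this sketch the second format string stands in for a hypothetical translator’s reordered version; note the single quotes around both formats.

```php
<?php
$taco_count    = 3;
$burrito_count = 5;

// The original string, with numbered placeholders.
$original  = 'You have %1$d tacos and %2$d burritos';
// A hypothetical translation that puts the burritos first.
$reordered = 'You have %2$d burritos and %1$d tacos';

// The arguments are passed in the same order both times.
$a = sprintf($original, $taco_count, $burrito_count);
$b = sprintf($reordered, $taco_count, $burrito_count);
// $a is 'You have 3 tacos and 5 burritos'
// $b is 'You have 5 burritos and 3 tacos'
```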

Note that when you use argument swapping, that single-quoted string thing becomes very important. If you have “%1$s” in double quotes, then PHP will see that $s and try to put your $s variable in there. In at least one case, this has caused an accidental cross-site scripting (XSS) security issue.
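Here is that trap in isolation. $s is a hypothetical, unrelated variable that just happens to be in scope; with double quotes, PHP interpolates it before sprintf() (or any translation function) ever sees the format string.

```php
<?php
// Some unrelated variable that happens to be named $s.
$s = '<b>oops</b>';

$safe   = 'You have %1$s tacos'; // single quotes: kept exactly as typed
$broken = "You have %1$s tacos"; // double quotes: PHP substitutes $s here

$output = sprintf($safe, '12');
// $safe   is 'You have %1$s tacos', so $output is 'You have 12 tacos'
// $broken is 'You have %1<b>oops</b> tacos', markup and all
```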

So repeat after me: “I will always only use single-quoted strings in I18N functions.” There. Now you’re safe again. This probably should be a law, but since it’s safe to use double-quoted strings as long as you don’t use PHP variables (thus breaking the first law), I’ll just leave you to think about it instead. 🙂

Law the Third: Thou shalt disambiguate when needed.

When I say “comment” to you, am I talking about a comment on my site, or am I asking you to make a comment? How about “test”? Or even “buffalo”?

English has words and phrases that can have different meanings depending on context. In other languages, these same concepts can be different words or phrases entirely. To help translators out, use the _x() function for them.

The _x() function is similar to the __() function, but it takes an extra context argument where the intended meaning can be specified for the translators.

$string = _x( 'Buffalo', 'an animal', 'plugin-domain' );
$string = _x( 'Buffalo', 'a city in New York', 'plugin-domain' );
$string = _x( 'Buffalo', 'a verb meaning to confuse somebody', 'plugin-domain' );

Though these strings are identical, the translators will get separate strings, along with the explanation of what they are, and they can translate them accordingly.
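One way to picture this: gettext-style catalogs store the context and the text together as a single lookup key, joined with the EOT character ("\x04") in compiled .mo files. The sketch below assumes that convention and uses a hypothetical array as the catalog; the helper names are invented and the Spanish values are rough illustrative renderings, not real translation data.

```php
<?php
// Hypothetical sketch: a catalog keyed by context + "\x04" + original text,
// the separator gettext uses between msgctxt and msgid in .mo files.
function context_key(string $context, string $text): string
{
    return $context . "\x04" . $text;
}

function translate_with_context(array $catalog, string $text, string $context): string
{
    return $catalog[context_key($context, $text)] ?? $text;
}

// Rough Spanish renderings, for illustration only.
$catalog = [
    context_key('an animal', 'Buffalo')                          => 'Búfalo',
    context_key('a city in New York', 'Buffalo')                 => 'Buffalo',
    context_key('a verb meaning to confuse somebody', 'Buffalo') => 'Confundir',
];

$animal = translate_with_context($catalog, 'Buffalo', 'an animal');
$verb   = translate_with_context($catalog, 'Buffalo', 'a verb meaning to confuse somebody');
// Three separate entries for one English word.
```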

And just like __() has _e() for immediate echoing, _x() has _ex() for the same thing. Use as needed.

Finally, this last one isn’t a law so much as something that annoys me. You’re free to argue about it if you like. 🙂

Annoyance the First: Thou shalt not put unnecessary HTML markup into the translated string.

$string = sprintf( __('<h3>You have %d tacos</h3>', 'plugin-domain'), $number );

Why would you give the power to the translator to insert markup changes to your code? Markup should be eliminated from your translated strings wherever possible. Put it outside your strings instead.

$string = '<h3>'.sprintf( __('You have %d tacos', 'plugin-domain'), $number ).'</h3>';

Note that sometimes though, it’s perfectly acceptable. If you’re adding emphasis to a specific word, then that emphasis might be different in other languages. This is pretty rare though, and sometimes you can pull it out entirely. If I wanted a bold number of tacos, I’d use this:

$string = sprintf( __('You have %s tacos', 'plugin-domain'), '<strong>'.$number.'</strong>' );

Or more preferably, the _n version of same that I discussed above.
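That _n() version might look roughly like this. Since calling the real _n() requires WordPress, the sketch uses a hypothetical pick_plural() stand-in with the English rule only. The point is that the <strong> markup is wrapped around the argument in code, outside the translatable strings, and that %s is used because the argument is now a string.

```php
<?php
// Hypothetical stand-in for _n(): English rule only.
function pick_plural(string $single, string $plural, int $number): string
{
    return $number === 1 ? $single : $plural;
}

$number = 12;
// Translators only ever see 'You have %s taco' / 'You have %s tacos';
// the <strong> markup stays in code, around the argument.
$string = sprintf(
    pick_plural('You have %s taco', 'You have %s tacos', $number),
    '<strong>' . $number . '</strong>'
);
// 'You have <strong>12</strong> tacos'
```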

Conclusion

Like I said at the beginning, we’ve all done these. I’ve broken all these laws of I18N in the past (I know some of my plugins still do), only to figure out that I was doing-it-wrong. Hopefully, you’ve spotted something here you’ve done (or are currently doing) and have realized from reading this exactly why your code is broken. The state of I18N in plugins and themes is pretty low, and that’s something I’d really like to get fixed in the long run. With any luck, this article will help. 🙂

Disclaimer: Yes, I wrote this while hungry.


Tags: i18n, l10m, pomo, tacos, translation, WordPress
Category: Code, Rants, WordPress | 115 Comments

GoDaddy Hosting = Epic Failure. Looking for a new hosting service.

October 8, 2011, 12:43 pm

I’ve been hosting my sites on GoDaddy for years, despite everybody saying that they suck and so forth. I’ve even defended them. Their interface is crap, but it’s not terrible once you get used to it. It works well enough. Their shared servers are indeed overloaded, but with a little super-caching they tended to work alright. Their new cloud hosting service is definitely faster.

But if there’s one thing I can not stand, it’s censorship.

I recently discovered that a couple of old posts of mine about decoding code used by hackers were no longer loading up. Everything else worked, but not those posts. I couldn’t even pull them up in the WordPress post editor.

After some trial and error and back and forth, I discovered that any HTTP or FTP request that contains the string “eval(base 64_decode(” or similar variants is blocked. FTP just stops dead, as do HTTP requests, with a continual spinning loading icon. Apparently they have some form of filtering in the TCP stack somewhere that just stops those connections dead in their tracks.

(BTW, the irony here is thick. GoDaddy’s malicious code scanner was blocking my “Scanning for Malicious Code is Pointless” post.)

GoDaddy… guys, I loved your service in the past, but I have to tell you that this is a *shit* approach to security.

After some tweeting back and forth, I found out from the horse’s mouth that this is intentional and cannot be disabled.

@otto42 Ya we keep that disabled for security purposes. Sorry for any inconvenience this may cause. ^C

— Go Daddy (@GoDaddy) October 8, 2011

@otto42 At this time you can’t disable it on our shared environments. You can on our Vded and Dedicated servers. ^C

— Go Daddy (@GoDaddy) October 8, 2011

And as much as they’d like to claim this isn’t censorship:

@otto42 @heykatieben We aren’t censoring posts, the issue is that you are running into a technical limitation of our shared environment.

— Go Daddy (@GoDaddy) October 8, 2011

Guys, you’re wrong. It is censorship. I wrote that post content, and they’re refusing to serve it over HTTP. You can spin that any way you like, but GoDaddy hosting is now censoring me.

What’s more, this is a *new* problem. Those posts worked fine when I wrote them. What changed? I dunno. I did move to their 4GH hosting, but nowhere did I see in the documentation that they would be intentionally blocking my content.

Anyway, I’ve worked around the problem for now with a plugin to add spaces to the proper places in my HTML content, thus bypassing their filter. However, in the long run, this will not stand. GoDaddy thinks it’s okay to block my personal content. I disagree with them, and no amount of argument is going to make me change my mind on this topic. Blocking my own content from being served is NOT a security measure.

As you might be able to tell, I’m a bit angry.

Therefore, I am now looking for a new hosting service. Some requirements of mine:

Traffic-wise I serve about 6000 page views a day, all told. In terms of total HTTP requests, I’d say somewhere around 30,000 or so.
Bandwidth tends to be in the 1.5 GB per day range. So, 50 GB per month, say.
Obviously, any form of censorship or technical limitations is unacceptable.
SSH access is a must-have.
I don’t necessarily need dedicated hosting or virtual dedicated hosting, shared is fine if it can handle it.
Speed would be nice. GoDaddy has always sucked in terms of time-to-first-byte. Their cloud hosting made it better, but not great.
MySQL Databases. I need at least 10 of them.

So, not too heavy requirements, I’d say.

I’ve heard suggestions for DreamHost in the past, and A Small Orange has always gotten favorable reviews from people I’ve talked to, but what the heck, might as well solicit the opinions of the internets in general, yeah?

Suggestions are happily accepted. If you can provide estimated pricing or links, I’d love to take a look at them. 🙂


Tags: godaddy, hosting, sucks
Category: Other, Rants | 62 Comments

SFC will NOT require you to use HTTPS

September 27, 2011, 6:13 pm

Seen this a couple of times on various sites and had a couple people ask me on Twitter about it.

Starting October 1st, Facebook will start requiring two new things:

1. OAuth 2.0 support. SFC has this in version 1.1, which will be released shortly (tomorrow, probably).

2. Canvas and Page Tab Applications will require HTTPS/SSL support.

Both of these are true.

However, some people are interpreting that second one to mean that you need to buy an SSL certificate for your own domain to use Facebook Connect type of functionality, like my plugin Simple Facebook Connect provides. This is false.

A “Canvas” application is one that runs on your site, but shows up on http://apps.facebook.com. A “Page Tab” application is similar, but can show up anywhere on the facebook.com website, depending on how it’s written (it makes a “tab” for somebody’s “Page”). Both of those have something in common: Your website’s contents are actually showing up on *.facebook.com.

Simple Facebook Connect does neither of these things. SFC is a way for you to integrate Facebook connection functionality back into your own site. It can publish stories to Facebook, it can let users comment using Facebook, etc., but the whole point of it is to take things from FB and put them on your site. SFC does not enable your site to appear under the *.facebook.com domain name.

You do NOT need an SSL certificate to use SFC, and you will continue to not need one after October 1st.

You will, however, need to upgrade to SFC 1.1. Old versions, including 1.0, will cease to function.

SFC 1.1 will most likely be released tomorrow, and should be a painless upgrade from 1.0. Sorry for the delay.


Tags: https, sfc, ssl
Category: Other, Rants | 15 Comments

Actions and filters are NOT the same thing…

September 9, 2011, 3:31 pm

Have you ever looked at the add_action function in WordPress? Here it is:

function add_action($tag, $function_to_add, $priority = 10, $accepted_args = 1) {
	return add_filter($tag, $function_to_add, $priority, $accepted_args);
}

I know, right? Some people’s minds just got blown.

What are Filters?

A filter is defined as a function that takes in some kind of input, modifies it, and then returns it. This is an extremely handy little concept that PHP itself uses in a ton of different ways. About half the string functions qualify as a ‘filter’ function.

Look at strrev(). It’s a simple-stupid example. It takes a string as an argument, and then returns the reverse of that string. You could use it as a filter function in WordPress, easily. Like, to reverse all your titles.

add_filter('the_title', 'strrev');

Some filters take more than one argument, but the first argument is always the thing to be modified and returned. PHP adheres to this concept too. Take the substr() function. The first argument is the string, the second and third are the start and optional length values. The returned value is the part of the string you want.

What are Actions?

An action is just a place where you call a function, and you don’t really care what it returns. The function is performing some kind of action just by being called. If you hook a function to the init action, then it runs whenever do_action('init') is called.

Now, some actions have arguments too, but again, there’s still no return value.

So in a sense, a WordPress action is just a filter without the first argument and without a return value.
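A toy version of the hook machinery makes the relationship easy to see. This is a hypothetical sketch, not WordPress’s implementation (which adds priorities, argument counts, and much more), and the ToyHooks class is invented for illustration. Actions reuse the filter storage, just as add_action() calls add_filter(), but the caller throws the return values away.

```php
<?php
// Hypothetical toy hook registry: no priorities, and action callbacks get
// no arguments. WordPress's real implementation does much more.
final class ToyHooks
{
    /** @var array<string, callable[]> */
    private static array $hooks = [];

    public static function addFilter(string $tag, callable $fn): void
    {
        self::$hooks[$tag][] = $fn;
    }

    // Mirrors add_action(): actions are stored exactly like filters.
    public static function addAction(string $tag, callable $fn): void
    {
        self::addFilter($tag, $fn);
    }

    // Filter: pass the value through each callback, return the final result.
    public static function applyFilters(string $tag, $value)
    {
        foreach (self::$hooks[$tag] ?? [] as $fn) {
            $value = $fn($value);
        }
        return $value;
    }

    // Action: call each callback and discard whatever it returns.
    public static function doAction(string $tag): void
    {
        foreach (self::$hooks[$tag] ?? [] as $fn) {
            $fn();
        }
    }
}

ToyHooks::addFilter('the_title', 'strrev');
$title = ToyHooks::applyFilters('the_title', 'Tacos'); // 'socaT'

$init_ran = false;
ToyHooks::addAction('init', function () use (&$init_ran) { $init_ran = true; });
ToyHooks::doAction('init');
```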

So why have them both?

Because there is still a conceptual difference between an action and a filter.

Filters filter things. Actions do not. And this is critically important when you’re writing a filter.

A filter function should never, ever, have unexpected side effects.

Take a quick example. Here’s a thread on the WordPress support forums where a person found that using my own SFC plugin in combination with a contact form emailer plugin caused the email from the form to be sent 3-5 times.

Why did it do this? Basically, because the contact form plugin is sending an email inside a filter function.

One of the things SFC does is to build a description meta from the content on the page. It also looks through that content for images and video, in order to build meta information to send to Facebook. In order for this to happen at the right time, the plugin must call the_content filter.

See, what if somebody puts a link to a Flickr picture on their page? In that case, oEmbed will kick in and convert that link into a nice and pretty embedded image. Same for YouTube videos. Or maybe somebody is using a gallery and there’s lots of pictures on the resulting page, but the only thing in the post_content is the gallery shortcode.

In order to get those images from the content, SFC has to do apply_filters('the_content', $post_content). This makes all the other plugins and all the other bits of the system process that $post_content and return the straight HTML. Then it can go and look for images, look for video, even make a pretty 1000 character excerpt to send to Facebook.

But SFC can’t possibly know that doing apply_filters('the_content', …) will cause some other plugin to go and send a freakin’ email. That’s totally unexpected. It’s just trying to filter some content. That would be like calling the strrev() function and having it make a phone call. Totally crazy.
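Here is the failure in miniature. The two callbacks below are hypothetical: one is a well-behaved filter, the other “sends an email” (it only increments a counter) every time it runs. Running the same content filters three times, as a theme, SFC, and another plugin each legitimately might, triggers the side effect three times.

```php
<?php
$emails_sent = 0;

// Hypothetical content filters, applied in order.
$content_filters = [
    // Well-behaved: takes content, returns modified content, nothing else.
    fn(string $content): string => str_replace('burrito', 'taco', $content),
    // Badly behaved: a side effect hidden inside a filter.
    function (string $content) use (&$emails_sent): string {
        $emails_sent++; // stands in for sending an email
        return $content;
    },
];

function run_content_filters(array $filters, string $content): string
{
    return array_reduce($filters, fn(string $carry, callable $f): string => $f($carry), $content);
}

// Three legitimate reasons to filter the same post content.
$for_display     = run_content_filters($content_filters, 'I want a burrito');
$for_description = run_content_filters($content_filters, 'I want a burrito');
$for_excerpt     = run_content_filters($content_filters, 'I want a burrito');
// $for_display is 'I want a taco', and $emails_sent is now 3.
```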

Shortcodes

Shortcodes are a type of filter. They take in content from the shortcode, they return replacement content of some sort. They are filters, by definition. Always, always keep that in mind.

Also keep in mind that shortcodes are supposed to return the replacement content, not just echo it out.
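To see why returning matters, here is a hypothetical mini expander for an invented [tacos] shortcode, built on preg_replace_callback(). It is not WordPress’s do_shortcode(); it only shows that a returned string lands where the shortcode was, while an echoed one is printed immediately, wherever output happens to be at that moment.

```php
<?php
// Hypothetical mini shortcode expander: replaces each [tacos] with whatever
// the handler returns.
function expand_tacos_shortcode(string $content, callable $handler): string
{
    return preg_replace_callback(
        '/\[tacos\]/',
        fn(array $m): string => (string) $handler(),
        $content
    );
}

$returns = fn(): string => '<b>12 tacos</b>';              // correct: return it
$echoes  = function (): void { echo '<b>12 tacos</b>'; };  // wrong: echo it

ob_start(); // capture anything printed directly
$good  = expand_tacos_shortcode('Before [tacos] after', $returns);
$bad   = expand_tacos_shortcode('Before [tacos] after', $echoes);
$stray = ob_get_clean();
// $good  is 'Before <b>12 tacos</b> after'
// $bad   is 'Before  after' (nothing in place)
// $stray is '<b>12 tacos</b>' (printed out of position)
```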

Conclusion

So plugin authors, please, please, I’m begging you, learn this lesson well.

Filters are supposed to filter. Actions are supposed to take action.

When you mix the two up, then you cause pain for the rest of the world trying to interact with your code. My desk is starting to get covered in dents from me repeatedly banging my head into it because of this.


Tags: actions, code, filters, shortcode, WordPress
Category: Rants, WordPress | 28 Comments

WordPress 3.2 Beta Admin Tweak

May 31, 2011, 4:00 pm

Just upgraded to the beta of 3.2. I like the new admin interface overall. Really, I do. But relatively minor things tend to bug me sometimes.

For example, I don’t much care for the Site Title being so tiny and hidden at the top of the admin screens. I like the site’s name to be big and prominent, as it’s a link to the front end of the site. On multi-site, it’s awfully nice to see at a glance what site I’m on. I often click that link to go to the front end of the site easily. So trying to navigate to the front end became difficult and hit or miss with this title being so tiny.

I also don’t like seeing the Page Title being so big and having a big ol’ icon there beside it. The Page Title strikes me as kinda useless. I mean, I know what screen I’m on.

So I wrote a quick tweak plugin to fix it. I’m posting it in case it bugs you as much as it bugs me. On a side note, it’s a quick little demo of how to modify the WordPress admin CSS quickly and easily.

<?php 
/* 
Plugin Name: Embiggen Site Title for WordPress 3.2 beta
Description: Embiggen the Site Title in wp-admin. Debiggen the Page headers. Ditch the useless icon.
*/
// Print these CSS overrides in the <head> of every admin screen.
add_action('admin_print_styles', 'big_site_title');
function big_site_title() {
?>
<style>
.wp-admin #wphead {
	height: 42px;
}
.wp-admin #wphead h1 {
	font-size: 28px;
	/* Uncomment the next line to switch to the sans-serif font: */
	/* font-family: "HelveticaNeue-Light","Helvetica Neue Light","Helvetica Neue",Helvetica,Arial,sans-serif; */
}
.wp-admin #header-logo {
	background-image: url("images/logo.gif");
	background-size:32px 32px;
	width:32px;
	height:32px;
}
.wp-admin .wrap h2 {
	font-size:16px;
	padding-top: 1px;
	padding-bottom: 0px;
	height:20px;
}
.wp-admin .icon32 {
	display:none;
}
</style>
<?php
}

Feel free to tweak further as desired. Also, WordPress might change further before 3.2 is released, so this may stop working or need further tweaking.


Category: Code, Rants | 6 Comments

Why You Should Use GPL for Commercial Themes

March 17, 2011, 5:16 am

I recently had an exchange with a commercial theme developer who changed his terms away from the GPL because of an experience with some rude person who was redistributing his themes for free. Ultimately, I wasn’t able to convince him to stick with it, but there was a clear misunderstanding of the GPL in the first place there (I suspect that language differences played some part), and I thought this might make for an interesting blog post.

(Note that I’m talking about themes, but this all applies to commercial plugins as well as any other code you’re selling online.)

It’s All About Redistribution

The main barrier to the GPL that a lot of theme developers have expressed is the right of redistribution. That is to say that if you sell me a GPL’d theme, then I can turn around and give that theme to anybody I want, for free, and you have no recourse.

This viewpoint is entirely correct; however, it’s missing the big picture, I feel.

Why Would I Do That?

First off, why would I take something I paid for and then give it away to everybody else for free? I mean, it’s one thing to give a copy of something to a friend of mine for his use, but it’s wholly another to go to the effort of setting up a website to distribute your theme as some kind of “screw you” policy. Did you anger me in some way? What level of maliciousness would be necessary for me to want to do that? Seems a bit overboard, and most people are ultimately reasonable.

However, this ignores the existence of John Gabriel’s Greater Internet Fuckwad Theory. Which is to say that some people are just trolling bastards who will screw with you just because they can. So let’s say that somebody gets a copy of your themes, posts them online, then refuses to take them down despite your polite requests, and waves the GPL in your face for his right to redistribute them.

Technically, this sort of person is correct, he does have the right of redistribution. But that doesn’t really matter.

What Are You Selling, Anyway?

Let’s say I made a piece of code and sold it. No GPL, no license, just me selling code to people for their own use. They have no rights to the code whatsoever. So, somebody posts that code online, for free, at some pirate site. Somebody else downloads it, and uses it, without paying me. Straightforward software “piracy”.

What have I lost here? Well, I lost the cash that I could have made from an extra sale, true, assuming that said person would have bought the code instead of pirating it. If you know people who habitually pirate code, then you know that that is a rather dubious claim, at best.

More importantly, I’ve lost a contact point between me and the user of the code. When I sell something to somebody, then I now have a relationship with that person. I get their email address. They may contact me for support. Even paid support. I may have forums for purchasers of my software to talk amongst each other in a community support system. They may buy other things I wrote.

This is the real benefit to selling code, that relationship between me as a developer and them as a purchaser of what I develop. And I’m missing that connection, until they want support from me for my product. Then I may say “well, you’re using a pirated copy of my product, if you want to join my support forums and my community and get my help, then you have to buy the product from me”. Take note of the many times that software companies have offered “clemency” sales and such, to turn pirated copies into legitimate ones.

What it comes down to is simple:

You Can’t Stop Piracy, So Don’t Try.

Think about it, you’re selling a digital file here. Files can be copied. If I buy a copy of your software, strip out any identifying marks, then post it to a thousand torrent sites, what exactly can you do to stop me from doing that?

No matter what your terms and conditions are, people still can copy your files, distribute them, edit them, do whatever they want. Unless you’re actually enforcing your terms with (potentially expensive) legal actions, then your terms are really quite meaningless. Technical measures to stop piracy don’t work, as many game companies have found out over the years. DRM doesn’t (and technically cannot) work.

Instead of viewing people redistributing your code as a bad thing, view it as an opportunity. If somebody downloads a “pirated” copy of your code, and uses it, then clearly they have a use for it. And at some point, they’re going to want upgrades. They’re going to want support. They’re going to want modifications. So make sure that you are the person they come to, and then you have an opportunity to convert that pirated download into a real sale.

The GPL doesn’t screw the developer by allowing others to share his work. The GPL enables the developer to get more contacts (and potentially more sales) by allowing others to share his work along with his name, contact information, website, etc.

Don’t fight against the right of redistribution, make it work for you instead.


Tags: commercial, gpl, piracy
Category: Other, Rants | 21 Comments

How to Cope with a Hacked Site

February 2, 2011, 10:00 pm

There have been a lot of articles on this topic over the years (I even wrote one). But I’m going to tackle this from a different angle, one that I’m not used to: a non-technical one.

Fixing a website “hack” is actually a fairly heavy technical thing to do. Most bloggers are not webmasters. They are not really technical people. They’re probably people who simply purchased a web hosting account, maybe set up WordPress using a one-click install, and started blogging. In an ideal world, this sort of setup would be perfectly secure. The fact that it’s usually not is really a problem for web hosts to figure out.

But often I find that the emails/posts I see that read “help me my site was hacked what do I do” or similar don’t get a lot of help. There’s a reason for this. People who are asking this question are not usually the type of people who are technically capable of actually fixing the problem. Guiding somebody through this process is non-trivial. Frankly, it’s kind of a pain in the ass. So those of us capable of fixing such a site (and there are plenty) are reluctant to try to help and basically offer our services for free. The amount of work is high, the frustration is equally high, and there’s not a lot of benefit in it.

So, with that in mind…

Step One: Regain control of the site

By “control”, I basically mean to get the passwords back and change them. Tell your webhost to do it if you have to, and read this codex article on how to change your WordPress password even when you can’t get into WordPress. Also, change your web hosting account password, your FTP password, the database password… Any password you have even remotely related to your site: change it. Note that doing this will very likely break the site. That’s okay, down is down, and it would be better to be down than showing hacked spammy crap to the world.

And that’s another point: take the site down, immediately. Unexpected downtime sucks, but if you’re showing spam to the world, then Google is sure to notice. If you’re down for a time, then Google understands and can cope, but if you’re showing bad things, then Google will think you’re a bad person. And you don’t want that.

The idea here is to stop the bleeding. Until you do that, you haven’t done anything at all.

Step Two: Don’t do a damn thing else

Once you have the passwords and the site is offline, leave it like that.

Seriously, don’t erase anything, don’t restore from backup, don’t do anything until you do what follows next…

Anything you do at this point destroys vital information. I cannot stress this enough.

Step Three: Hire a technically competent person to fix it for you

If you know me, then you know I rarely recommend this sort of thing. I tend to offer technical knowledge and try to help people do it themselves. But hey, for some people, there are times when it’s just a hell of a lot to take in. Webserver security is a complex subject with a lot of aspects to it, and it takes a lot of background knowledge.

If you’re reading this and you don’t know how a webserver works, or config files, or you don’t know arcane SQL commands, or you don’t understand how the PHP code connects to the database and uses templates to generate HTML, then trust me when I tell you that you are not going to fix your website. Not really. Sure, you could probably get it running again, but you can’t fix it to where it won’t get hacked again.

So, find a website tech person. Somebody who knows what they’re doing.

(BTW, not me. Seriously, I’ve got enough to do as is. Just don’t even ask.)

How, you ask? I dunno. Look on the googles. How do you find anybody to do anything? There are several sites out there for offering short-term jobs to tech wizards. There is the WordPress Jobs site, but note that I said you need a website person, not necessarily a WordPress person. A lot of people who know WordPress don’t know websites and security. Many of them do, and this is not an indictment of the community; it’s more a recognition that working with servers and websites in general is not the same thing as working with WordPress. WP knowledge is useful, but generic server admin experience is much, much better in this situation.

And yes, I said HIRE. Seriously, pay up. This is a lot of work that requires special knowledge. I know that a lot of people try to run their websites cheaply... Look here, if you’re paying less than $300 a year to run a website, then why bother? How serious are you about your website anyway? Quality web hosting should cost you more than that, and hiring a specialist for a short-term job is going to run you a fair amount of money. Expect that, and don’t give him too much hassle about it. Feel free to negotiate the price, but please don’t be insulting. Offering $50 to fix your site is unfair, as that’s less than an hour’s pay for most consultants, and you need one with special skills here. This is a minimum of a day’s work, probably longer if your site is at all complicated. Just getting it running again, without doing everything that needs to be done, is probably a 4-hour job. Sure, somebody can hack together a fix in half an hour, but do you ask your mechanic to just throw oil at the engine until it runs? Have some respect for the fact that knowledge and skill are valuable, in any profession.

Basically, here’s what the website guy will be doing, if he knows his business.

First, he’ll probably backup the site. This includes the files, the databases, any logs that are available, everything. The idea is to grab a copy of the whole blamed thing, as it stands. This is a “cover-your-ass” scenario; he’s going to be making large scale changes to the site, so having a backup is a good idea, even if it is a hacked one. The person will need all of the relevant passwords, but don’t give them out in advance. He’ll ask for what he needs from you.
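As a rough illustration of that first step, here is a minimal Python sketch that snapshots a site’s files, untouched, into a timestamped archive. It assumes shell access to the server and a Python interpreter there; the paths are hypothetical, and the database and logs would still need to be copied separately (for MySQL, typically with a dump tool such as mysqldump).

```python
import tarfile
import time
from pathlib import Path


def snapshot_site(site_root: str, dest_dir: str) -> Path:
    """Archive everything under site_root, as-is, into a timestamped .tar.gz.

    The hacked files are kept on purpose: this copy is evidence and a safety
    net, not a restore point. Both paths are hypothetical examples.
    """
    root = Path(site_root)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = Path(dest_dir) / f"{root.name}-snapshot-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps the paths inside the archive relative to the site folder
        tar.add(str(root), arcname=root.name)
    return archive
```

Something like snapshot_site("/var/www/example", "/root/evidence") would do; dest_dir should sit outside the site root, so the archive is neither web-accessible nor swept into itself.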

Second, if you already have regular backups (please, start making regular backups… VaultPress is invaluable in this situation and can help the process out immensely), then he’ll probably want to restore to a backup from before the hack. And yes, you very likely WILL lose content in this restoration. However, since there is a backup, the content can be recovered later, if it’s worth the trouble.

(Note: if you don’t have any backups, then he’ll try to remove the hack manually. This is error-prone and difficult to do. It also takes longer and has a much lower chance of succeeding. It’s also difficult to know that you got everything out of the site. If anything is left behind, then the site can be re-hacked through hidden backdoors. This is why regular backups are critically important to have.)

Third, he’ll update everything to the latest versions and perform a security audit of the site. This means looking at all the plugins, themes, permissions on the files, the files themselves, everything. This is to make sure all the main security bases are covered and that it doesn’t get rehacked while he does the next step. They may talk about “hardening” the site.
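One small piece of such an audit is checking file permissions. A minimal sketch, assuming a Unix-like host: list everything under the site that any user on the machine can write to, since world-writable files and folders are usually worth a second look.

```python
import os
import stat


def world_writable(site_root: str) -> list[str]:
    """Return paths under site_root that any user on the server can write to."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(site_root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            # S_IWOTH is the "others may write" bit; symlinks always report it, so skip them
            if mode & stat.S_IWOTH and not stat.S_ISLNK(mode):
                hits.append(path)
    return sorted(hits)
```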

Fourth, from that backup he made earlier, he’ll likely try to trace where the hack started. Logs help here, as do the files themselves. This is kind of an art form: you’re looking at a static picture of a dynamic system. And unfortunately, he may not even be able to tell you what happened or how the attackers got in. Attackers often hide their traces, especially the automated tools that hack sites. With any luck, the basic upgrades to the system will be enough to prevent them getting in again, and a security audit by a knowing eye will eliminate the most common ways in. That is often enough.
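A common starting point for that tracing is to look for files that changed around the time the hack appeared. The sketch below assumes you have a rough date range and a copy of the site’s files. Keep in mind that modification times are easy for an attacker to forge, so treat any hits as leads, not proof.

```python
import os
from datetime import datetime


def modified_between(site_root: str, start: datetime, end: datetime) -> list[tuple[str, datetime]]:
    """List files whose modification time falls inside [start, end], oldest first."""
    found = []
    lo, hi = start.timestamp(), end.timestamp()
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime
            if lo <= mtime <= hi:
                found.append((path, datetime.fromtimestamp(mtime)))
    # Sorting by time gives a rough timeline of what was touched, and in what order
    return sorted(found, key=lambda item: item[1])
```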

Step Four: Prevention

Once your site is fixed, then you need to take steps to prevent it from happening again. The rules here are the same rules as any other technical system.

Regular backups. I can’t recommend VaultPress enough. After my site went offline for a day due to some issues with my webhost (not a hack), I lost some data. VaultPress had it, and restoring it was easy. There are other good backup solutions too, if you can’t afford $20 a month (but seriously, don’t cheap out on your website, folks!).
Security auditing. There are some good plugins out there that automatically scan your site on a regular basis and warn you about changes. There are good plugins to do security checks on your site’s files, and good tools to check for issues that may be invisible to you. Use them regularly, or at least install them and let them run and warn you of possible threats.
Virus scanning. My website got hacked only once. How? A trojan made it onto my computer and stole my FTP password, then an automated tool tied to that trojan tried to upload bad things to my site. It got stopped halfway (and I found and eliminated the trojan), but the point is that even tech-ninjas like me slip up every once in a while. Have good security on your home computer as well.
Strong passwords. There is no longer any reason to use the same password everywhere, and no longer any excuse for using a password that doesn’t look like total gibberish. Seriously, with recent hacks making this sort of thing obvious, everybody should be using a password storage solution. I tried several and settled on LastPass; other people I know use 1Password. This sort of thing is a requirement for secure computing, and everybody should be using something like it.
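The change-detection idea behind those auditing plugins can be sketched in a few lines: record a hash of every file while the site is known to be clean, then compare later scans against that baseline. This is a minimal illustration, not how any particular plugin works, and it assumes you can run Python against a copy of the site’s files.

```python
import hashlib
import os


def fingerprint(site_root: str) -> dict[str, str]:
    """Map each file's path (relative to site_root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digests[os.path.relpath(path, site_root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def diff(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed, or changed since the baseline was taken."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(p for p in baseline.keys() & current.keys() if baseline[p] != current[p]),
    }
```

The baseline only helps if it is stored somewhere an attacker on the server cannot rewrite it, such as your own machine.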

These are some basic thoughts on the subject, and there are probably others I haven’t considered. Security is an ever-changing thing. The person you hire may make suggestions, and if they’re good ones, it may be worth retaining him for future work. If your site is valuable to you, then it may be worth it to invest in its future.

And yes, anybody can learn how to do this sort of thing. Probably on their own. The documentation is out there, the knowledge is freely available, and many tutorials exist. But sometimes you need to ask yourself, is this the right time for me to learn how to DIY? If you need quick action, then it might just be worth paying a pro.

Tags: backups, hack, hire, pro, vaultpress, web hosting, website, WordPress
Category: Other, Rants | 38 Comments

Own Your DNS, Because It Really Does Matter

November 30, 2010, 11:34 pm

I have been trading email with several people recently, talking them through some webhosting stuff, and I just discovered how prevalent the practice of pointing a domain’s nameservers at the web host is. I should have guessed it when I wrote a post about it earlier, but I didn’t know everybody was doing it this way. Most people I talked to didn’t realize there was any other way.

So it’s worth another look, I think.

Note that this post covers some basic fundamentals to start with. If you already grok DNS, you can skip ahead to the “How to Point a Domain at a Webhost” bit. For now, I’m going to use the word “server” a lot.

Why DNS is Important

So you bought a domain name for a few bucks. That’s great. What nobody told you: A domain name by itself is useless.

Really. Computers cannot connect to domain names. Computers on the internet can only connect to IP addresses. So you have to have a way to convert that domain name into an IP address. The way that happens is through DNS.

DNS? How does it work?

DNS works as a decentralized system. There are thousands (millions?) of DNS servers in the world, all talking to each other all the time. One way to think of it is as a big tree, with connections coming from the root servers all the way up through to other servers. This is the traditional approach. But a better way to think of it is as a cloud, with connections branching every which way. DNS servers talk to other DNS servers and they don’t much care where they are on the “tree”, generally speaking.

When I make a request for my site, ottopress.com, then a few things happen.

First, my computer checks its memory to see if I’ve done this recently. I probably have, in my case, so it just uses that information if it’s pretty recent. This is known as DNS Caching, and all modern systems do it.

Next, if my computer doesn’t know the address, it knows who to ask: my own DNS server. On Windows, pop open a command prompt and type ipconfig /all. You’ll get a big listing of your IP configuration info, and some of those entries are your DNS servers. (On Linux or a Mac, ifconfig -a lists your network interfaces but not your DNS servers; look in /etc/resolv.conf instead.) Those servers are provided by whatever gave your computer an IP address. It might be your home router, or it might be your ISP, or maybe you entered them manually. The point is that this DNS server’s IP is what your computer connects to in order to ask “hey, where is ottopress.com?”.
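On Linux and macOS, the DNS servers your computer asks are typically listed as nameserver lines in /etc/resolv.conf. A minimal Python sketch that pulls them out of text in that format (on systems running systemd-resolved you may only see 127.0.0.53, a local stub that forwards to the real servers):

```python
def nameservers(resolv_conf_text: str) -> list[str]:
    """Extract nameserver addresses from resolv.conf-style text, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # resolv.conf comments start with '#' or ';'; drop them before parsing
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers
```

For example, nameservers(open("/etc/resolv.conf").read()) on a typical home machine would return the router’s or ISP’s resolver addresses.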

Now, my DNS server may already know the answer because it has it in memory (DNS cache again). If not, then it knows how to find out.

First, it looks at the name itself. In this case, the name ends in “.com”. That’s important. This is the “Top Level Domain” (TLD), and every TLD has its own set of servers dedicated to it. To find those, there’s a set of servers called the “root nameservers”. They live at root-servers.net. They are a set of 13 servers worldwide which distribute the TLD information. (Actually, there are a lot more computers than “servers”, since each server is replicated in many places around the world. The J server, for example, lives in 70 different places. You can see all about them at root-servers.org.)

They deliver a file called the “root zone” file. In fact, this file is rather small (it will even fit on an old-school single-sided 5.25″ floppy!), but it contains some critical information that describes the functioning of the DNS system. Specifically, it specifies where things like .com and .org and all the other TLDs can be serviced from. Every DNS server on the planet needs this information, and usually has it cached for a long, long, long time. The thing rarely changes.

So, my DNS server looks at the root zone file and discovers that “.com” domains are handled by some other set of servers, so it goes to those and asks them “where can I get the info for ottopress.com?”
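That chain of referrals can be modeled as a toy: each server either knows the answer or knows which server to ask next. The server names in this sketch are made up; the final address is the one this post gives for ottopress.com. Real resolvers also cache each referral and handle multiple servers per zone, which this skips.

```python
# Toy delegation data. Server names are hypothetical stand-ins.
ZONES = {
    "root-server": {"delegate": {"com": "com-server"}},
    "com-server": {"delegate": {"ottopress.com": "ottopress-ns"}},
    "ottopress-ns": {"answer": {"ottopress.com": "64.202.163.10"}},
}


def resolve(name: str, server: str = "root-server") -> tuple[str, list[str]]:
    """Follow referrals from the root down to the server that knows the answer."""
    path = [server]
    while True:
        zone = ZONES[server]
        if name in zone.get("answer", {}):
            return zone["answer"][name], path
        # Find a delegation covering this name, e.g. "com" covers "ottopress.com"
        referral = next(
            (ns for suffix, ns in zone.get("delegate", {}).items()
             if name == suffix or name.endswith("." + suffix)),
            None,
        )
        if referral is None:
            raise LookupError(f"{server} has no answer or referral for {name}")
        server = referral
        path.append(server)
```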

This is an opportune moment to talk about authority.

Authoritative Responses and What They Mean

Every domain name on the internet has to have somebody in control of it. This person is considered to be the “authority” on that domain. He in turn delegates that authority to some DNS server. That server is the only one on the whole internet that knows, for a fact, what IP addresses are connected to its names.

When I get something out of the cache of any server, the result is “non-authoritative”. That is, the DNS server gave me an answer, but it cannot guarantee that the answer is the right one. A non-authoritative answer is the fast one.

Those root servers I talked about are the authorities for the TLDs. They give out the root zone file, which says, among other things, who is the authority for all “com” domains. That server, in turn, doesn’t have the faintest clue what IP address connects to ottopress.com, but it does have information on what nameserver is the authority for ottopress.com.

So my DNS server goes and talks to this new server which the .com servers have told it is the authority. And finally, THAT server says “yes, I know for a fact that ottopress.com lives at 64.202.163.10”.

So, now that it has an answer, my DNS server relays this back to me. It also caches the information, because I’m probably going to ask it again soon, and it’s quicker if it doesn’t have to go through all that again.
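In the post’s terms, the first lookup goes all the way to the authority, and every repeat is served from cache and is therefore non-authoritative. A toy Python model of that behavior, where a plain dict stands in for the authoritative nameserver (an assumption for illustration; TTLs are ignored here):

```python
from dataclasses import dataclass


@dataclass
class Answer:
    ip: str
    from_cache: bool  # True means "non-authoritative" in the post's terms


class CachingResolver:
    """Toy resolver: the first lookup asks the authority, repeats hit the cache."""

    def __init__(self, authority: dict[str, str]):
        self.authority = authority  # stand-in for the authoritative nameserver
        self.cache: dict[str, str] = {}
        self.upstream_queries = 0

    def lookup(self, name: str) -> Answer:
        if name in self.cache:
            return Answer(self.cache[name], from_cache=True)
        self.upstream_queries += 1
        ip = self.authority[name]  # the slow trip to the authority
        self.cache[name] = ip      # remember it for the next request
        return Answer(ip, from_cache=False)
```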

How to Point a Domain at a Webhost

So, when you signed up for your webhost, if you got the domain somewhere else, then they very likely told you to “point your domain’s nameservers to X and Y”. What does this mean, exactly?

Previously, I explained how my DNS lookup went to the .com authoritative nameservers to get the nameserver information. Well, when I change my domain’s nameservers, then what I’m actually doing is changing the information on those .com servers. I’m telling them that these new nameservers are the authorities for all DNS lookups involving ottopress.com. I’m delegating my authority to those nameservers. When I do that, what I’m saying is that those nameservers are now in control over all requests on the internet that involve my domain.

Now, this normally isn’t a bad thing. Running nameservers is difficult and tricky. The syntax is arcane and strange (albeit well worth learning for your toolbox). Plus, you’re probably not in possession of all the information. After all, you hired this web hosting company to host your website for you, and they might change IPs around and such. Better for them to manage it, yes?

No.

There are a lot of good reasons to manage your DNS yourself. For one thing, you have total control. If you want to do some tricky DNS stuff, or set up email to the domain with MX records, or things like that, then you can do so yourself. Just the ability to edit your own CNAMEs and TXT records easily is well worth it. Heck, maybe you want to get Jabber working on your domain. Who knows?

On the other hand, you have total control, and that includes total freedom to screw it up. And anyway, most web hosts have some kind of easy interface to let you add and remove specific entries yourself, so you still have some control over it.

But now we get back to the main problem, which I was talking about in that previous post. Vendor lock-in.

DNS Propagation Delay and TTL

Remember what I mentioned earlier, when your webhost said to “point your domain’s nameservers to X and Y”? That’s the root of the problem.

DNS lives and dies by a setting called “Time-To-Live” (TTL). The time-to-live is the caching factor I mentioned several times before. When a DNS server gets some new information and stores it in its memory, it also stores the TTL, which it also receives from the other server. The TTL is a time limit on how long it can cache that information. Most DNS servers obey this value extremely well. If the TTL says to cache it for 2 hours, then it caches it for 2 hours and not a second longer.
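TTL-bounded caching can be sketched as a cache that stores an expiry time alongside each value. The clock is injectable so the behavior can be checked without waiting two hours; this is a simplified model, not how any specific DNS server is implemented.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Cache entries expire exactly `ttl` seconds after they were stored."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.entries: dict[str, tuple[str, float]] = {}  # name -> (value, expires_at)

    def put(self, name: str, value: str, ttl: float) -> None:
        self.entries[name] = (value, self.clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        item = self.entries.get(name)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self.entries[name]  # stale: the resolver has to ask again
            return None
        return value
```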

Well, that nameserver lookup from the .com servers has a TTL too, only it’s a very LONG one. See, those .com servers are way overloaded. Think about it: every lookup of every .com domain name goes through one set of servers (which is actually a whole bunch of computers, spread out geographically too). There are millions, probably billions, of these lookups a day. So they offload a lot of the work. Where to? Why, into everybody else’s caches, of course. The nameserver results tend to have a very long TTL, on the order of a day to a week or so (the mean is about 48 hours). Even then, many DNS servers are configured, by default, to hold these results even longer. Sometimes weeks.

This is because while the IP address corresponding to a domain name might change a lot, the nameservers for one actually rarely change. You don’t switch hosts every week, for example. But your IP might change a lot, if you’re using dynamic addressing or something along those lines.

So what happens when I change that information? Well, basically, all the other servers on the internet that have my information cached will be wrong for some period of time. That period of time is called the propagation delay, because it takes that long for my change to propagate out to the rest of the world. Those caches have to expire, and all the DNS systems out there then have to ask me for the new information, assuming somebody asks them for it.

So if I change my IP, it takes a couple of hours for it to get out there, because my TTL is 2 hours. The downside to this is that when your nameserver changes, it takes a friggin’ long time to take effect.
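The arithmetic behind that difference fits in one function: a resolver that cached the old record keeps serving it until its TTL runs out. This sketch assumes the worst case, a resolver that cached the old data at the very moment of the change, and uses the 2-hour A record TTL and roughly 48-hour .com nameserver TTL mentioned above. It ignores resolvers that hold results longer than the TTL, which the post notes some do.

```python
def stale_until(cached_at: float, ttl: float, changed_at: float) -> float:
    """Time at which a resolver that cached the old record at `cached_at` drops it.

    If the cached entry already expired before the change, the resolver picks up
    the new record on its next lookup, so the change is visible at `changed_at`.
    """
    return max(changed_at, cached_at + ttl)


HOUR = 3600
A_RECORD_TTL = 2 * HOUR  # the author's own A record TTL
COM_NS_TTL = 48 * HOUR   # typical TTL on .com nameserver referrals, per the post

# Worst case: a resolver cached the old data right as the change was made.
worst_a_hours = stale_until(cached_at=0, ttl=A_RECORD_TTL, changed_at=0) / HOUR
worst_ns_hours = stale_until(cached_at=0, ttl=COM_NS_TTL, changed_at=0) / HOUR
```

Here worst_a_hours comes out to 2.0 and worst_ns_hours to 48.0, which is the “2 hours instead of 2 days” contrast in numbers.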

Solving the Problem

The solution is simple: Never change your nameservers.

By that, I mean to keep your nameserver in the same place for as long as you possibly can. And this means, if at all possible, don’t delegate your authority to your web host. Instead, a better option is to delegate it to your domain name provider.

I use GoDaddy for my domain names. With the purchase of a domain, they offer free DNS. It’s not the best interface in the world (actually it’s downright clumsy), but it works well enough. I can point my A record (that’s an “address” record, which connects names to IP addresses) to anywhere I want with relative ease. I can set my own TTL on that lookup (currently it’s 2 hours). If I were to change web hosts, my outage time would be 2 hours instead of 2 days. Why? Because all I have to do is point my domain name at my new host, once they’ve told me what IP address to point it to. If I instead tried to change my nameservers to theirs, then my outage would be 2 days, at least, because it usually takes at least that long for a change to the .com servers to take effect everywhere. And in some parts of the internet, that outage would be a week. Minimum.

There are also other options for owning your DNS. ZoneEdit offers both free and paid DNS services, letting you point your domain to them and then control it however you like. This allows you to take your domain with you from one registrar to another, without having to worry about your registrar not providing DNS anymore.

Or you can even run your own DNS. That’s a super advanced topic though. Even I wouldn’t attempt that without some serious resources.

Summing up

But the point is that you want your DNS to be somewhere that it’s never going to move. Or, at least, somewhere it’s going to move so rarely that you never have to worry about it. Changing web hosts is complex, but a simple enough matter that I could do it myself. But seriously, when am I going to move my domain names between registrars? How often does that really happen? Most people pick their registrar and stick with them forever. Unless they seriously raised the rates or something, it’s unlikely I’d ever switch them off GoDaddy.

Also, you want your DNS somewhere that you have a reasonable assurance that nobody’s going to screw with it. You own your domain, but the DNS controls where your domain goes. He who controls the DNS controls the domain, and that’s what ownership is, really. Control. Owning your DNS is the ability to control your own domains. It takes some learning, but seriously, it’s way easier than you think. More interesting too.

Category: Rants | 22 Comments

WordPress 3.0 and Custom Post Types

May 18, 2010, 1:42 pm

There’s been a lot of talk about custom post types, and I know many people are looking forward to it. Unfortunately, I think some (perhaps many) of those people are going to be disappointed. Custom Post Types might not be what you think they are.

I blame the naming, really. “Custom Post Types” makes the implication that these are “Posts”. They’re not. “Post Type” is really referring to the internal structure of WordPress. See, all the main content in WordPress is stored in a table called “wp_posts”. It has a post_type column, and until now, that column has had two main values in it: “post” and “page”. And there’s the big secret:

“Custom Post Types” are really Pages.

Sorta.

For a long time in the early days of WordPress, it just had Posts. But hey, no big deal, because it was just running a big Blog anyway, right? The Posts appeared on the Blog page (and in the Feed) in reverse chronological order. Each Post could appear on its own URL, using the Permalink structure.

Pages came along and changed that.

Pages don’t appear on the Blog. Or in the Feed.
Pages don’t even really have dates and times on them that usually get displayed.
Pages have their own URL at the root of the website, outside the Permalink rules.
Pages even have hierarchy in their URLs, if they want.

Pages, however, do live in the wp_posts table. So post_type exists to handle that. When WordPress is building the Blog, it looks for post_type = “post”.
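To make that concrete, here is a cut-down stand-in for wp_posts in SQLite, with the Blog reduced to a single query on post_type. The columns and rows are illustrative assumptions; the real table has many more columns, and the real Blog query also filters on things like post status.

```python
import sqlite3

# A cut-down stand-in for wp_posts; real WordPress has many more columns.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT,"
    " post_date TEXT, post_type TEXT)"
)
db.executemany(
    "INSERT INTO wp_posts (post_title, post_date, post_type) VALUES (?, ?, ?)",
    [
        ("Hello world", "2010-05-01 09:00:00", "post"),
        ("About", "2010-05-02 09:00:00", "page"),
        ("Custom types", "2010-05-18 13:42:00", "post"),
        ("Episode 1", "2010-05-20 10:00:00", "podcast"),  # a hypothetical custom type
    ],
)

# Simplified Blog query: only post_type = 'post', newest first.
blog = [
    title
    for (title,) in db.execute(
        "SELECT post_title FROM wp_posts WHERE post_type = 'post' ORDER BY post_date DESC"
    )
]
```

Both the page and the “podcast” row live in the same table, yet neither shows up in blog, which holds only the two posts.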

Bring on the “Custom”.

Now we have these Custom Post Types. Or rather, custom post_types. Instead of “page” or “post”, we can have “custom”. Or “fred”. Whatever we like.

But how do these new post types get displayed? What do their URLs look like?

Well, these are custom, and they can be customized. You can give them their own space on the website. So if I want them to live at /custom/page-name, then I can. If I want them to have hierarchy, then I can do so. Justin Tadlock explains how they can be made to do this quite well.

But they are still not Posts.
They do not show up on the Blog.
They do not appear in the Feed.

This is a matter of definitions, really. See, the Blog is a reverse chronological list of the Posts. That is what it is defined to be. The Feed is basically the same thing, in feed form.

So all of you thinking of a custom “Podcast” post type, you’re in for a disappointment.

So what’s the big deal?

Well, all that said, custom types can have their own systems of doing things. They are custom, and as such, they are customizable.

If, for example, you wanted to have them appear on their own “blog” area, or in their own “feed”, then sure, that’s entirely doable. You can make a function to produce your custom feed. You can then call “add_feed” to add your feed. You can create single-{type}.php templates in your theme (single-podcast.php for a “podcast” type, say) that will be used for your custom type. You could even make a “blog” out of your custom type.
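Per-type single templates work because WordPress looks for theme templates from most to least specific. A simplified model of the single-item lookup, assuming a theme that contains only the listed files; the real template hierarchy has more steps, especially in later WordPress versions.

```python
def single_template(post_type: str, theme_files: set[str]) -> str:
    """Pick the template for a single item, most specific first (simplified)."""
    candidates = [f"single-{post_type}.php", "single.php", "index.php"]
    for candidate in candidates:
        if candidate in theme_files:
            return candidate
    raise FileNotFoundError("a theme must at least provide index.php")
```

With a theme holding index.php, single.php and single-podcast.php, a “podcast” item gets single-podcast.php while an ordinary post falls back to single.php.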

And doing it the other way is possible too. You can adjust the “Blog” to show your type. You could change the “Feed” to show your type as well.

But these things are NOT the default way of doing things. There’s no code in there to do that, and there’s very likely not going to be. If you want your type in the Blog, in the Feed, then you have to do it yourself. The URL is NOT easy to customize and play around with. The rewrite system is unforgiving, and you have to stick within a set of rules for things to work well.

However, should you? Let’s say you make a “Podcast” custom type. You can go to the effort of putting them in the feeds and making them show up specially on the blog… but you could already do that with a “podcast” category. And creating and customizing a category is much easier than working with custom types will ever be.

Something like 80% of the uses I’ve seen for “custom types” would be better served by making normal posts and using some existing method to separate them or to otherwise mark them as special.

So what are they good for?

What if you could install a forum on your site? bbPress is pretty good. But it could get all complicated to set up and such. Well, plugins are pretty easy. But all those forum posts have to go somewhere…

Custom Post Types is a way for plugins to define types of content for themselves.

A bbPress forum could quite easily store every post in the forum under its own custom post type. Or a wiki plugin could store each of its pages as a custom post type. Things like that.

See, they’d get their own URL handling automatically, and they wouldn’t need weird database handling tricks. It makes things much simpler and easier for those plugins to do their thing when they have the backend support for it in the core.

Some of you are thinking “Okay, so plugin authors can make better use of them. I won’t have to write a lick of code, I’ll just install a plugin that makes my type and handles the stuff for me.” And yeah, you can do that.

But then you’re wedded to that plugin. WordPress doesn’t know about your custom posts. If you remove the plugin, your custom posts are still there, but now they’re completely invisible. Can’t be pulled up, seen in the admin, the URLs all stop working…
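Why the posts vanish can be shown with a toy model: the rows stay in the table, but only post types that something has registered get displayed or routed. This is an illustration of the idea, not WordPress’s actual code; the rows and type names are hypothetical.

```python
# Rows as they might sit in wp_posts; "podcast" was registered by a plugin.
rows = [
    {"ID": 1, "post_title": "Hello world", "post_type": "post"},
    {"ID": 2, "post_title": "Episode 1", "post_type": "podcast"},
]


def visible(rows: list[dict], registered_types: set[str]) -> list[str]:
    """Only rows whose post_type is currently registered get shown or routed."""
    return [r["post_title"] for r in rows if r["post_type"] in registered_types]


with_plugin = visible(rows, {"post", "page", "podcast"})
without_plugin = visible(rows, {"post", "page"})
```

Deactivating the plugin removes “podcast” from the registered set: without_plugin drops Episode 1, even though its row is still sitting in rows.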

Wrap it up, son…

Using custom post types right now is, for most people, a bad idea. Only specialized usages really exist for them… for now.

For the long term, WordPress will probably use them a lot more extensively. And plugins can make great use of them for all sorts of things. But you, as a user, probably don’t need to be messing with them. Not if you’re just creating a website or writing a blog. Not right now. Wait for the plugin and core development to catch up to the potential. Using them early leaves you open for a world of confusion and grief.

Tags: 3.0, custom, custom post types, post, rant, types, WordPress
Category: Rants, WordPress | 61 Comments
