Aug 27, 2008

WP Super Cache 0.7 – the dupe content killer

WordPress.org user “definitelynot” discovered a bug in the WordPress plugin WP Super Cache that could expose blogs to duplicate content penalties. Unfortunately this affects every blog that uses the plugin in “ON” or full “Super Cache” mode and has URLs that end with the “/” (forward slash) character. If the plugin is in “half on” mode, you’ll be fine.

The problem is that when an anonymous user visits a legitimate URL ending with a slash, the plugin creates a static file out of that page, and that file is then used when people visit the same URL. Unfortunately, if someone links to that URL without the ending slash, a visiting browser or search engine bot won’t be redirected to the proper URL; it’ll be served the static HTML file instead.

For example:

  1. John visits the URL /2007/05/23/why-the-nurses-cant-go-on-strike/ on my site. WP Super Cache creates an HTML file of that page.
  2. In his enthusiasm for that post, John publishes a post about those zany doctors and links to mine, but he forgets the ending “/”.
  3. Googlebot, seeing fresh content on John’s site, crawls it and sees the link, visits my site eventually and wonders why it’s seeing the exact same page at two different URLs.

To be fair, Google is pretty good at figuring out where duplicate content is supposed to go but it’s better to avoid the issue completely. It also only matters if there are links to your site without the ending slash. The most common will probably be to your homepage as it’s likely internal URLs will be copy/pasted.
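
To make the collision concrete, here is a minimal Python sketch (not code from the plugin) of how the pre-0.7 rewrite rules map a request onto a cached file. It assumes the cache layout shown in the rules further down this post (`/wp-content/cache/supercache/<host>/<path>/index.html`); the function name `cached_file` and the host `example.com` are made up for illustration.

```python
import posixpath

# Cache location taken from the rewrite rules in this post.
CACHE_ROOT = "/wp-content/cache/supercache"

def cached_file(host: str, request_path: str) -> str:
    """Mimic the substitution <cache root>/<host>/$1/index.html (hypothetical helper)."""
    # In an .htaccess file the leading slash is stripped before the pattern is matched.
    captured = request_path.lstrip("/")
    raw = f"{CACHE_ROOT}/{host}/{captured}/index.html"
    # A filesystem treats repeated slashes as one, so normalise them away.
    return posixpath.normpath(raw)

with_slash = cached_file("example.com", "/2007/05/23/why-the-nurses-cant-go-on-strike/")
without_slash = cached_file("example.com", "/2007/05/23/why-the-nurses-cant-go-on-strike")

# Both URLs resolve to the same static file, so both get the same page.
print(with_slash == without_slash)
```

Because both forms of the URL land on the same file, the web server has no reason to hand the slashless request to WordPress, which is what would normally issue the redirect.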

How to Fix
You should update to version 0.7 of the plugin which checks if your blog is affected by this problem. It also has instructions for updating the mod_rewrite rules in your .htaccess. It’s fairly easy to fix. Thank you “andylav” for the mod rewrite magic!

  1. Edit the .htaccess in the root of your WordPress install.
  2. You’ll see two groups of rules that look like this:
    RewriteCond %{REQUEST_METHOD} !=POST
    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{QUERY_STRING} !.*wp-subscription-manager=.*
    RewriteCond %{QUERY_STRING} !.*attachment_id=.*
    RewriteCond %{HTTP:Cookie} !^.*(comment_author_|wordpress|wp-postpass_).*$
    RewriteCond %{HTTP:Accept-Encoding} .*gzip.*
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [L]
    
    RewriteCond %{REQUEST_METHOD} !=POST
    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{QUERY_STRING} !.*wp-subscription-manager=.*
    RewriteCond %{QUERY_STRING} !.*attachment_id=.*
    RewriteCond %{HTTP:Cookie} !^.*(comment_author_|wordpress|wp-postpass_).*$
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]
    
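  3. You need to add the following two conditions above each block of “RewriteCond” lines. The first stops the static file being served when the URL doesn’t end with a slash, the second when the URL contains a double slash: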
    RewriteCond %{REQUEST_URI} !^.*[^/]$
    RewriteCond %{REQUEST_URI} !^.*//.*$
    
  4. The rules should eventually look like this:
    RewriteCond %{REQUEST_URI} !^.*[^/]$
    RewriteCond %{REQUEST_URI} !^.*//.*$
    RewriteCond %{REQUEST_METHOD} !=POST
    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{QUERY_STRING} !.*wp-subscription-manager=.*
    RewriteCond %{QUERY_STRING} !.*attachment_id=.*
    RewriteCond %{HTTP:Cookie} !^.*(comment_author_|wordpress|wp-postpass_).*$
    RewriteCond %{HTTP:Accept-Encoding} .*gzip.*
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [L]
    
    RewriteCond %{REQUEST_URI} !^.*[^/]$
    RewriteCond %{REQUEST_URI} !^.*//.*$
    RewriteCond %{REQUEST_METHOD} !=POST
    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{QUERY_STRING} !.*wp-subscription-manager=.*
    RewriteCond %{QUERY_STRING} !.*attachment_id=.*
    RewriteCond %{HTTP:Cookie} !^.*(comment_author_|wordpress|wp-postpass_).*$
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]
    
  5. Or you could just delete those rules and let the plugin regenerate them for you again.
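
As a rough illustration of what the two new conditions do, the sketch below applies the same two regular expressions in Python. It is not part of the plugin: the function name `may_serve_static` is invented for this example, and it only models these two conditions, not the method, query string, cookie, or file-exists checks.

```python
import re

# The two patterns from the new RewriteCond lines. In .htaccess both are
# negated with "!", so the static file is only served when NEITHER matches.
NO_TRAILING_SLASH = re.compile(r"^.*[^/]$")
DOUBLE_SLASH = re.compile(r"^.*//.*$")

def may_serve_static(request_uri: str) -> bool:
    """Return True if the two new conditions would let the cached file be served."""
    if NO_TRAILING_SLASH.search(request_uri):
        return False  # URL does not end with "/", leave it to WordPress
    if DOUBLE_SLASH.search(request_uri):
        return False  # URL contains "//", leave it to WordPress
    return True

for uri in ["/2007/05/23/post/", "/2007/05/23/post", "/2007//05/23/post/", "/"]:
    print(uri, may_serve_static(uri))
```

Only the first and last URLs in the loop pass both checks; the other two fall through to WordPress, which can then redirect them to the proper URL.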

PS. Thanks also to Lloyd for noticing the “enable the plugin” link was pointing at the wrong URL, and to Ryan who spotted a minor problem with the admin page and was kind enough to send me a Tweet about it.
PPS. I’ve just tagged 0.7.1 to fix some problems with the updating of the .htaccess, mainly for new users. If 0.7 of the plugin works for you, there’s no need to upgrade!


37 Comments »

  • Charlie says:

    Hi,

    I’m trying to install WP Super Cache 0.7 w/ WP 2.6.1 on a ReadyNAS Duo. I can’t seem to avoid getting a 403 error.

    I’ve tried adding “Options +FollowSymLinks”
    I’ve been searching posts everywhere I can for the last 4 hours. The only way I can make my site work is to pull the .htaccess file.

    Where do I look next?

  • Alex Leonard says:

    I think I love the WordPress community. Brilliant team work and I’ll be updating to 0.7 across the board.

    Thanks for the update :)

  • Rishi says:

    Hey Donncha, prior to the update, I had a malformed URI issue (if you navigate to my site using Firefox, you’ll see that there is 1 error). After this update, the error still persists. I know it has to do with WP-Super Cache because disabling the plugin solves the issue. Any ideas?

  • Michael Hampton says:

    Once again you’ve forgotten about us nginx users. :(

    Here’s my modified /etc/nginx/wp-super-cache file which I include from all my WP virtual hosts. It takes care of all WordPress and wp-super-cache rewrites.

    # enable search for precompressed files ending in .gz
    # nginx needs to be compiled using --with-http_gzip_static_module
    # for this to work, comment out if using nginx from aptitude
    gzip_static on;

    # if the requested file exists, return it immediately
    if (-f $request_filename) {
        break;
    }

    set $supercache_file '';
    set $supercache_uri $request_uri;

    if ($request_uri ~ '^.*[^/]$') {
        set $supercache_uri '';
    }

    if ($request_uri ~ '^.*//.*$') {
        set $supercache_uri '';
    }

    if ($request_method = POST) {
        set $supercache_uri '';
    }

    # Using pretty permalinks, so bypass the cache for any query string
    if ($query_string) {
        set $supercache_uri '';
    }

    if ($http_cookie ~* "comment_author_|wordpress|wp-postpass_" ) {
        set $supercache_uri '';
    }

    # if we haven't bypassed the cache, specify our supercache file
    if ($supercache_uri ~ ^(.+)$) {
        set $supercache_file /wordpress/wp-content/cache/supercache/$http_host/$1index.html;
    }

    # only rewrite to the supercache file if it actually exists
    if (-f $document_root$supercache_file) {
        rewrite ^(.*)$ $supercache_file break;
    }

    # all other requests go to WordPress
    if (!-e $request_filename) {
        rewrite . /index.php last;
    }

  • Hander says:

    Thank you very much for finding and fixing this. I only use links without the ending slash, and it had never occurred to me that this might happen. It is very similar to the www. and non-www. URLs …

  • BlaKKJaKK says:

    Not sure if what is happening to me is what Charlie is describing, but with WP 2.6.1 + WP Super Cache 0.7, when I use the .htaccess above, it makes my site unreachable (page load failures). I tried deleting the cache, disabling the plugin, re-enabling it, etc.

    I think the fact that I have Eaccelerator may be causing a problem.

    Temporarily, I have restored the original .htaccess and am running on WP-Cache only (HALF ON option).

    Is anyone else in my boat?

  • Donncha says:

    Michael – it’d be great to support nginx out of the box but I don’t use it so I can’t test it. Perhaps a section of the readme if you’d like to contribute a patch?

    Charlie, BlaKKJaKK – I don’t know why you’d get 403 errors. There’s nothing in the plugin to stop you accessing the site. Are you using Bad Behaviour or any other plugin to stop bots attacking your site such as WP-Ban?

    Rishi – you’ll have to be clearer about your problem.

    If you have support queries, can you post to the wp-super-cache forum instead? It’s easier to keep issues separate there.

  • Michael Hampton says:

    Hey, don’t try to blame this on me. Bad Behavior doesn’t interfere with WP-Super Cache. :) Since the super cached pages are static, I can’t block access to them! Half-on/cached pages are protected only if you patched WP-Super Cache. (Maybe I’ll submit this to you, since it’s safe even when Bad Behavior isn’t installed.)

    I’d love to submit a patch for supporting nginx out of the box, but nginx doesn’t have a feature comparable to Apache’s .htaccess, so the rewrite rules always have to be added in manually. Maybe I could still display them on the screen though?

  • [...] make changes in the .htaccess. This article explains the “how”. It may be in English, but code is [...]

  • Donncha says:

    Michael – grasping at straws there, sorry :) It might be wp-ban, someone posted to the forum about it a few days ago.

    There’s probably some way of detecting that nginx is running instead of Apache, isn’t there? Even if it’s just a SERVER variable, or a function? We could use that to display those rules. Anyone running nginx will be comfortable enough with editing config files to manually add them I think!

  • Neil says:

    Is it possible that due to this error, super cache was not feeding the HTML files out and then would lead to DB errors due to too many queries?

  • Donncha says:

    Neil – doubtful, the problem is that the plugin was working too well, and caching too much. No idea why your site went down when it did. Sorry.

  • Neil says:

    Super cache was caching, both cache and super cache, and the files were in the cache folder, but for some reason when people went onto the site, cached files were not being given to them. Hope I wasn’t a bit over the top the other day, it was a bit stressful ;)

  • George says:

    Nice one, looks like it worked for me. Had a bit of a scare because it spat out a long error when I upgraded it, but then I sorta disabled and re-activated it, and it worked, so all is good.
    Thank you much, you’ve saved my GPUs :D (I’m not mad, look it up on MediaTemple)

  • bssn says:

    I activated the plugin, but it shows a blank page on the WP Super Cache manage page. It looks like it is a syntax error.

    here’s the screenshot:

    http://i3.6.cn/cvbnm/a0/13/10/eaba9ee9fc0969eef23b329246a1da00.jpg

  • James says:

    I have the feeling that soon non-cached versions will have better results in the overall performance of a site (including server). Thanks for the fix Donncha :)

  • Hi Donncha,
    Regarding the new update 0.7.1, was the bug because 0.7 appended the htaccess file instead of re-writing it?

    I noticed it and manually changed my htaccess file. If this is the only bug/error introduced on 0.7, then I don’t need to upgrade ;)

  • BlaKKJaKK says:

    Well, I gave it another try. It’s definitely something in the .htaccess. I wouldn’t think that would be cached by eAccelerator, but anyway. It works with my old rewrite rules, and then I get “this page is not accessible” for everything if I try to use the new htaccess. This is what the rewrite rules look like right now.

    # BEGIN WordPress

    RewriteEngine On
    RewriteBase /
    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{HTTP_COOKIE} !^.*comment_author_.*$
    RewriteCond %{HTTP_COOKIE} !^.*wordpress.*$
    RewriteCond %{HTTP_COOKIE} !^.*wp-postpass_.*$
    RewriteCond %{HTTP:Accept-Encoding} gzip
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html.gz -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1index.html.gz [L]

    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{HTTP_COOKIE} !^.*comment_author_.*$
    RewriteCond %{HTTP_COOKIE} !^.*wordpress.*$
    RewriteCond %{HTTP_COOKIE} !^.*wp-postpass_.*$
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1index.html [L]

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]

    # END WordPress

    I am running back in half mode now. I suppose old WP-Cache is better than it not working.

  • Donncha says:

    Michael – no, while changing the code that warns about updating the .htaccess I screwed up and it didn’t warn new users their .htaccess had to be updated in the first place, also didn’t check if the file was writable.

    BlakkJakk – did you install WP in a subdirectory? That seems to be causing problems for people. Make sure the paths in the .htaccess actually match what you have on your server.

  • Mavis Crewitt says:

    Thanks for spotting this guys and Donncha for fixing. When I checked, Google was reporting over 92 duplicates on my site.

  • Chuck says:

    Works good here. I even tried the click to upgrade feature. Holy cow did it work nice. No more FTP! Woo Hoo!

    I just can’t wait till the whole WordPress product works like this; click-to-upgrade makes old MS-DOS heads like me smile… :D

    -Chuck

  • Nihar says:

    Hi Donncha,

    Thanks for this fix. Google Webmaster was showing a lot of duplicates on my site.

    I have added the two lines.

    Nihar

  • [...] WP Super Cache 07 can expose you to duplicate content penalties A bug in the WP Super Cache plugin affects every blog that uses it in “ON” or full “Super Cache” mode, and has URLs that end with the “/” (forward slash) character.  If the plugin is on “half on” mode, you’ll be fine. Read Donncha’s post if you want to know how to fix it. [...]

  • Mavis says:

    With the fix in place (Google reported 104 duplicates) I now hope Google will stop penalising me soon. Does anyone know how long they take before putting my listings back to normal?

  • Donncha says:

    Nihar, Mavis – your duplicate pages could (and are more likely to be) category and tags pages, or even some plugin that has repeated your posts pages at multiple urls.

    It’s only if there are links to your site with the missing slash at the end that Google will count the pages affected by this bug so don’t expect a huge decrease in duplicate pages.

  • olegp says:

    Does WP Super Cache work with custom themes?
    I’ve discovered a situation where WP Super Cache doesn’t work at all.
    When I switch my blog to the standard theme “delight”, it works completely. But when I switch the blog to my own custom theme, it looks like WP Super Cache is not installed. I’m not tweaking any settings, just switching the themes.

    Which hook must be included in the theme files to switch WP Super Cache on and get it working?

  • justbloglah says:

    I have the same problem too with ver 0.7.1.
    In order to avoid this problem I have changed it to half mode.

  • Milan says:

    I have one question since I don’t understand something: should new users also make these changes in .htaccess, or only users who used this plugin before this release?

    Also, can you please check this topic?

    (and btw, it’s good that you use this theme again; when you installed previous I wanted to write that this one is much better :) )

  • Pande says:

    Hi Donncha, I’m using the latest version with the two lines added twice, but Google still has my homepage duplicated (http://www.google.es/search?hl=es&site=q%3Dpandeblog&q=pandeblog&btnG=Buscar&meta=). Should I use robots.txt to stop bots from indexing html.gz files?

    This is my “cached” homepage appearing in SERPs (www.pandeblog.org/wp-content/cache/supercache/www.pandeblog.org/index.html.gz)

    Thank you

    • Donncha says:

      That url shouldn’t show at all. I can only presume someone linked directly to that file.
      Perhaps you should add a 301 redirect back to the correct url in the cache folder?

      • Pande says:

        Mmmm,
        Am I on the right track using this in the .htaccess of the cache folder?

        # BEGIN supercache

        AddEncoding gzip .gz
        AddType text/html .gz

        SetEnvIfNoCase Request_URI \.gz$ no-gzip

        Header set Cache-Control 'max-age=300, must-revalidate'

        ExpiresActive On
        ExpiresByType text/html A300

        redirect 301 /wp-content/cache/supercache/www.pandeblog.org/index.html.gz http://www.pandeblog.org/

        # END supercache

        Thanks

      • Pande says:

        The antispam ate my previous comment…

        Should I add..

        # BEGIN supercache

        redirect 301 /wp-content/cache/supercache/www.pandeblog.org/index.html.gz http://www.pandeblog.org/

        # END supercache

        And that’s it?

      • Pande says:

        Maybe using….


        RewriteEngine On
        RewriteBase /
        redirect 301 /wp-content/cache/supercache/www.pandeblog.org/index.html.gz http://www.pandeblog.org/

        At the end of the .htaccess inside cache folder?

  • Pande says:

    grrr, your blog is eating my comments ;-)
    If I redirect in the root .htaccess I get an endless loop, and if I redirect in the cache folder I get no redirect

    Can you help me please Donncha?
    My cached home page is first in google and my real home second….
    :-(
    Thanks

  • Pande says:

    Ouch, I’m really sorry about previous comments.. :-(

    • Donncha says:

      Akismet thought your comments were spam, sorry. I don’t have time to help you wrestle with mod_rewrite, but the rules should go into wp-content/cache/.htaccess

  • Pande says:

    Thanks Donncha, nothing happens, cached page still responds… :-(
