WP Super Cache 0.7 – the dupe content killer

WordPress.org user “definitelynot” discovered a bug in the WordPress plugin WP Super Cache that could expose blogs to duplicate content penalties. Unfortunately, this affects every blog that uses the plugin in “ON” (full “Super Cache”) mode and has URLs that end with the “/” (forward slash) character. If the plugin is in “half on” mode, you’ll be fine.

The problem is this: an anonymous user visits a legitimate URL ending with a slash, and the plugin creates a static file out of that page, which is then served when people visit the same URL. Unfortunately, if someone links to that URL without the ending slash, a visiting browser or search engine bot won’t be redirected to the proper URL; it’ll be served the static HTML file instead.

For example:

  1. John visits the URL /2007/05/23/why-the-nurses-cant-go-on-strike/ on my site. WP Super Cache creates an HTML file of that page.
  2. In his enthusiasm for that post, John publishes a post of his own about those zany doctors and links to mine, but he forgets the ending “/”.
  3. Googlebot, seeing fresh content on John’s site, crawls it, sees the link, eventually visits my site, and wonders why it’s seeing the exact same page at two different URLs.
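
The scenario above can be sketched in a few lines of Python. This is only a rough model of the old rewrite rule, not code from the plugin: the helper name `old_cache_file` and the host `example.com` are made up for illustration, and the cache directory is the one that appears in the rewrite rules in this post.

```python
import posixpath

CACHE_ROOT = "wp-content/cache/supercache"  # directory used by the rules in this post


def old_cache_file(host: str, request_path: str) -> str:
    """Model of the pre-0.7 rule: capture the path, then append /index.html.

    Hypothetical helper for illustration only; it is not plugin code.
    """
    # In .htaccess context the pattern ^(.*) sees the path without its leading slash.
    captured = request_path[1:] if request_path.startswith("/") else request_path
    raw = f"{CACHE_ROOT}/{host}/{captured}/index.html"
    # A filesystem treats "post//index.html" and "post/index.html" as the same file,
    # so collapse repeated slashes before comparing.
    return posixpath.normpath(raw)


with_slash = old_cache_file("example.com", "/2007/05/23/some-post/")
without_slash = old_cache_file("example.com", "/2007/05/23/some-post")

print(with_slash)
print(without_slash)
print(with_slash == without_slash)
```

Both requests resolve to the same static file, so the final line prints `True`. That is the duplicate: two URLs, one cached page, and no redirect in between.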

To be fair, Google is pretty good at figuring out where duplicate content is supposed to go, but it’s better to avoid the issue completely. It also only matters if there are links to your site without the ending slash. The most common such links will probably point to your homepage, since internal URLs are likely to be copied and pasted intact.

How to Fix
You should update to version 0.7 of the plugin, which checks if your blog is affected by this problem. It also has instructions for updating the mod_rewrite rules in your .htaccess. It’s fairly easy to fix. Thank you “andylav” for the mod_rewrite magic!

  1. Edit the .htaccess in the root of your WordPress install.
  2. You’ll see two groups of rules that look like this:
    RewriteCond %{REQUEST_METHOD} !=POST
    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{QUERY_STRING} !.*wp-subscription-manager=.*
    RewriteCond %{QUERY_STRING} !.*attachment_id=.*
    RewriteCond %{HTTP:Cookie} !^.*(comment_author_|wordpress|wp-postpass_).*$
    RewriteCond %{HTTP:Accept-Encoding} .*gzip.*
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [L]
    
    RewriteCond %{REQUEST_METHOD} !=POST
    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{QUERY_STRING} !.*wp-subscription-manager=.*
    RewriteCond %{QUERY_STRING} !.*attachment_id=.*
    RewriteCond %{HTTP:Cookie} !^.*(comment_author_|wordpress|wp-postpass_).*$
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]
    
    
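  3. You need to add the following two conditions above each block of “RewriteCond” lines. They stop the rule from firing when the request URI lacks a trailing slash or contains a double slash: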
    RewriteCond %{REQUEST_URI} !^.*[^/]$
    RewriteCond %{REQUEST_URI} !^.*//.*$
    
    
  4. The rules should eventually look like this:
    RewriteCond %{REQUEST_URI} !^.*[^/]$
    RewriteCond %{REQUEST_URI} !^.*//.*$
    RewriteCond %{REQUEST_METHOD} !=POST
    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{QUERY_STRING} !.*wp-subscription-manager=.*
    RewriteCond %{QUERY_STRING} !.*attachment_id=.*
    RewriteCond %{HTTP:Cookie} !^.*(comment_author_|wordpress|wp-postpass_).*$
    RewriteCond %{HTTP:Accept-Encoding} .*gzip.*
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [L]
    
    RewriteCond %{REQUEST_URI} !^.*[^/]$
    RewriteCond %{REQUEST_URI} !^.*//.*$
    RewriteCond %{REQUEST_METHOD} !=POST
    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{QUERY_STRING} !.*wp-subscription-manager=.*
    RewriteCond %{QUERY_STRING} !.*attachment_id=.*
    RewriteCond %{HTTP:Cookie} !^.*(comment_author_|wordpress|wp-postpass_).*$
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]
    
  5. Or you could just delete those rules and let the plugin regenerate them for you.
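
If you want to convince yourself what the two new conditions from step 3 do, here is a minimal Python sketch that applies the same two patterns to a few sample request URIs. The function name `may_serve_static` and the sample URIs are hypothetical. The sketch covers only the two REQUEST_URI conditions and ignores the method, query string, cookie and file-exists checks that follow them in the real rules.

```python
import re

# The two patterns from step 3. Each RewriteCond is negated with "!",
# so the rewrite may only proceed when NEITHER pattern matches.
NO_TRAILING_SLASH = re.compile(r"^.*[^/]$")  # URI ends in something other than "/"
DOUBLE_SLASH = re.compile(r"^.*//.*$")       # URI contains "//" anywhere


def may_serve_static(request_uri: str) -> bool:
    """Return True if both new conditions would let the rewrite proceed.

    Hypothetical helper: models only the two REQUEST_URI conditions.
    """
    return (not NO_TRAILING_SLASH.search(request_uri)
            and not DOUBLE_SLASH.search(request_uri))


samples = [
    "/2007/05/23/some-post/",   # proper URL, may be served from the cache
    "/2007/05/23/some-post",    # missing slash, falls through to WordPress
    "/2007//05/23/some-post/",  # double slash, falls through to WordPress
    "/",                        # homepage
]

for uri in samples:
    print(uri, may_serve_static(uri))
```

Only the first and last samples pass. The other two are no longer answered from the static cache, so WordPress gets the chance to redirect them to the proper URL.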

PS. Thanks also to Lloyd for noticing the “enable the plugin” link was pointing at the wrong URL, and to Ryan who spotted a minor problem with the admin page and was kind enough to send me a Tweet about it.
PPS. I’ve just tagged 0.7.1 to fix some problems with the updating of the .htaccess, mainly for new users. If 0.7 of the plugin works for you, there’s no need to upgrade!



37 thoughts on “WP Super Cache 0.7 – the dupe content killer”


  1. Hi,

    I’m trying to install WP Super Cache 0.7 w/ WP 2.6.1 on a ReadyNAS Duo. I can’t seem to avoid getting a 403 error.

    I’ve tried adding “Options +FollowSymLinks”
    I’ve been searching posts everywhere I can for the last 4 hours. The only way I can make my site work is to pull the .htaccess file.

    Where do I look next?


  2. I think I love the WordPress community. Brilliant team work and I’ll be updating to 0.7 across the board.

    Thanks for the update :)


  3. Hey Donncha, prior to the update, I had a malformed URI issue (if you navigate to my site using Firefox, you’ll see that there is 1 error). After this update, the error still persists. I know it has to do with WP-Super Cache because disabling the plugin solves the issue. Any ideas?


  4. Once again you’ve forgotten about us nginx users. :(

    Here’s my modified /etc/nginx/wp-super-cache file which I include from all my WP virtual hosts. It takes care of all WordPress and wp-super-cache rewrites.

    # enable search for precompressed files ending in .gz
    # nginx needs to be compiled using --with-http_gzip_static_module
    # for this to work, comment out if using nginx from aptitude
    gzip_static on;

    # if the requested file exists, return it immediately
    if (-f $request_filename) {
        break;
    }

    set $supercache_file '';
    set $supercache_uri $request_uri;

    # bypass the cache if the URI does not end with a slash
    if ($request_uri ~ '^.*[^/]$') {
        set $supercache_uri '';
    }

    # bypass the cache if the URI contains a double slash
    if ($request_uri ~ '^.*//.*$') {
        set $supercache_uri '';
    }

    if ($request_method = POST) {
        set $supercache_uri '';
    }

    # Using pretty permalinks, so bypass the cache for any query string
    if ($query_string) {
        set $supercache_uri '';
    }

    if ($http_cookie ~* "comment_author_|wordpress|wp-postpass_" ) {
        set $supercache_uri '';
    }

    # if we haven't bypassed the cache, specify our supercache file
    if ($supercache_uri ~ ^(.+)$) {
        set $supercache_file /wordpress/wp-content/cache/supercache/$http_host/$1index.html;
    }

    # only rewrite to the supercache file if it actually exists
    if (-f $document_root$supercache_file) {
        rewrite ^(.*)$ $supercache_file break;
    }

    # all other requests go to WordPress
    if (!-e $request_filename) {
        rewrite . /index.php last;
    }


  5. Thank you very much for finding and fixing this. I only use links without the ending slash, and it had never occurred to me that this might happen. It is very similar to the www. and non-www. URLs …


  6. Not sure if what is happening to me is what Charlie is describing, but with WP 2.6.1 + WP Super Cache 0.7, when I use the .htaccess above, it makes my site unreachable (page load failures). I tried deleting the cache, disabling the plugin, re-enabling it, etc.

    I think the fact that I have Eaccelerator may be causing a problem.

    Temporarily, I have restored the original .htaccess and am running on WP-Cache only (HALF ON option).

    Is anyone else in my boat?


  7. Michael – it’d be great to support nginx out of the box but I don’t use it so I can’t test it. Perhaps a section of the readme, if you’d like to contribute a patch?

    Charlie, BlaKKJaKK – I don’t know why you’d get 403 errors. There’s nothing in the plugin to stop you accessing the site. Are you using Bad Behaviour or any other plugin to stop bots attacking your site such as WP-Ban?

    Rishi – you’ll have to be clearer about your problem.

    If you have support queries, can you post to the wp-super-cache forum instead? It’s easier to keep issues separate there.


  8. Hey, don’t try to blame this on me. Bad Behavior doesn’t interfere with WP-Super Cache. :) Since the super cached pages are static, I can’t block access to them! Half-on/cached pages are protected only if you patched WP-Super Cache. (Maybe I’ll submit this to you, since it’s safe even when Bad Behavior isn’t installed.)

    I’d love to submit a patch for supporting nginx out of the box, but nginx doesn’t have a feature comparable to Apache’s .htaccess, so the rewrite rules always have to be added in manually. Maybe I could still display them on the screen though?


  9. Michael – grasping at straws there, sorry :) It might be wp-ban, someone posted to the forum about it a few days ago.

    There’s probably some way of detecting that nginx is running instead of Apache, isn’t there? Even if it’s just a SERVER variable, or a function? We could use that to display those rules. Anyone running nginx will be comfortable enough with editing config files to manually add them I think!


  10. Is it possible that due to this error, super cache was not feeding the HTML files out and then would lead to DB errors due to too many queries?


  11. Neil – doubtful, the problem is that the plugin was working too well, and caching too much. No idea why your site went down when it did. Sorry.


  12. Super Cache was caching, both cache and super cache; the files were in the cache folder, but for some reason when people went onto the site, cached files were not being given to them. Hope I wasn’t a bit over the top the other day, it was a bit stressful ;)


  13. Nice one, looks like it worked for me. Had a bit of a scare because it spat out a long error when I upgraded it, but then I sorta disabled and re-activated it, and it worked, so all is good.
    Thank you so much, you’ve saved my GPUs :D (I’m not mad, look it up on MediaTemple)


  14. I have the feeling that soon non-cached versions will have better results in the overall performance of a site (including server). Thanks for the fix Donncha :)


  15. Hi Donncha,
    Regarding the new update 0.7.1, was the bug because 0.7 appended to the .htaccess file instead of rewriting it?

    I noticed it and manually changed my htaccess file. If this is the only bug/error introduced on 0.7, then I don’t need to upgrade ;)


  16. Well, I gave it another try. It’s definitely something in the .htaccess. I wouldn’t think that would be cached by Eaccelerator, but anyway. It works with my old rewrite rules, and then I get “this page is not accessible” for everything if I try to use the new .htaccess. This is what the rewrite rules look like right now.

    # BEGIN WordPress

    RewriteEngine On
    RewriteBase /
    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{HTTP_COOKIE} !^.*comment_author_.*$
    RewriteCond %{HTTP_COOKIE} !^.*wordpress.*$
    RewriteCond %{HTTP_COOKIE} !^.*wp-postpass_.*$
    RewriteCond %{HTTP:Accept-Encoding} gzip
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html.gz -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1index.html.gz [L]

    RewriteCond %{QUERY_STRING} !.*s=.*
    RewriteCond %{HTTP_COOKIE} !^.*comment_author_.*$
    RewriteCond %{HTTP_COOKIE} !^.*wordpress.*$
    RewriteCond %{HTTP_COOKIE} !^.*wp-postpass_.*$
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html -f
    RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1index.html [L]

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]

    # END WordPress

    I am back to running in half mode now. I suppose old WP-Cache is better than it not working.


  17. Michael – no, while changing the code that warns about updating the .htaccess I screwed up and it didn’t warn new users their .htaccess had to be updated in the first place, also didn’t check if the file was writable.

    BlakkJakk – did you install WP in a subdirectory? That seems to be causing problems for people. Make sure the paths in the .htaccess actually match what you have on your server.


  18. Thanks for spotting this, guys, and Donncha for fixing it. When I checked, Google was reporting over 92 duplicates on my site.


  19. Works good here. I even tried the click to upgrade feature. Holy cow did it work nice. No more FTP! Woo Hoo!

    I just can’t wait till the Whole WordPress product works like this, click to upgrades makes old MS-DOS heads like me smile… :D

    -Chuck


  20. Hi Donncha,

    Thanks for this fix. Google Webmaster was showing a lot of duplicates on my site.

    I have added the two lines.

    Nihar


  21. With the fix in place (Google reported 104 duplicates) I now hope Google will stop penalising me soon. Does anyone know how long they take before putting my listings back to normal?


  22. Nihar, Mavis – your duplicate pages could be (and are more likely to be) category and tag pages, or even the work of some plugin that has repeated your post pages at multiple URLs.

    It’s only if there are links to your site with the missing slash at the end that Google will count the pages affected by this bug so don’t expect a huge decrease in duplicate pages.


  23. Does WP Super Cache work with a custom theme?
    I’ve discovered a situation where WP Super Cache doesn’t work at all.
    When I switch my blog to the standard theme “delight”, it works completely. But when I switch the blog to my own custom theme, it looks as if WP Super Cache is not installed. I’m not tweaking any settings, just switching the themes.

    Which hook must be included in the theme files to switch WP Super Cache on and get it working?


  24. I have one question since I don’t understand something: should new users also make these changes in .htaccess, or only users who used this plugin before this release?

    Also, can you please check this topic?

    (and btw, it’s good that you use this theme again; when you installed previous I wanted to write that this one is much better :))


    1. That url shouldn’t show at all. I can only presume someone linked directly to that file.
      Perhaps you should add a 301 redirect back to the correct url in the cache folder?


      1. Mmmm,
        Am I on the right track using this in the .htaccess of the cache folder?

        # BEGIN supercache

        AddEncoding gzip .gz
        AddType text/html .gz

        SetEnvIfNoCase Request_URI \.gz$ no-gzip

        Header set Cache-Control 'max-age=300, must-revalidate'

        ExpiresActive On
        ExpiresByType text/html A300

        redirect 301 /wp-content/cache/supercache/www.pandeblog.org/index.html.gz http://www.pandeblog.org/

        # END supercache

        Thanks


      2. Maybe using….


        RewriteEngine On
        RewriteBase /
        redirect 301 /wp-content/cache/supercache/www.pandeblog.org/index.html.gz http://www.pandeblog.org/

        At the end of the .htaccess inside cache folder?


  25. grrr, your blog is eating my comments ;-)
    If I redirect in the root .htaccess I get an endless loop, and if I redirect in the cache folder I get no redirect

    Can you help me please, Donncha?
    My cached home page is first in Google and my real home second….
    :-(
    Thanks


    1. Akismet thought your comments were spam, sorry. I don’t have time to help you wrestle with mod_rewrite though, but the rules should go into wp-content/cache/.htaccess
