Duplicate Content Cure Plugin for WordPress

Surprise. Your WordPress Site Isn’t Search Engine Friendly
We have all seen the standard “WordPress SEO” articles that discuss the obvious things that you can do to optimize your WordPress blog. Most focus on optimizing titles, permalinks, meta keyword and description tags, and possibly adding a Google sitemap. All of these are important for SEO, but there is one really big one missing.

It turns out that the default WordPress site structure is extremely unfriendly to search engines. The archives, categories, built-in search, and paged home page navigation that make browsing extremely easy for people don’t feature any unique content at all. They usually feature excerpts of the full posts, or complete duplicate copies of the full posts in chronological order. All of these extra pages, with the same content repeated again and again, dilute the value of the most important pages on the site: the posts and pages.

This is actually a really easy problem to solve: just tell the search engine spiders not to index the pages with duplicate content by adding noindex meta tags. This can be done by modifying the header with a conditional tag that checks the page type and adds the noindex tag when needed.

I could post a tutorial, but I thought it would be nicer if I made a plugin for WordPress that does the work for you.


WordPress Duplicate Content Cure

UPDATE
Excluding category pages is now optional. See directions below.
——

Duplicate Content Cure is a very simple yet effective SEO plugin that prevents search engines from indexing WordPress pages that contain duplicate content, like archives and category pages.

It does this by adding the noindex,follow meta tag on the problem pages.
<meta name="robots" content="noindex,follow">

It’s really simple, so there’s not really much more to explain.

Installing WordPress Duplicate Content Cure
1. Download the plugin.
2. Place the file duplicate-content-cure.php in your plugins directory.
3. By default, category pages will have the noindex tag added. If you wish to
allow your category pages to be indexed, just change the
$index_category_pages variable in the duplicate-content-cure.php file. See the example below:

Change
$index_category_pages = false;
to
$index_category_pages = true;

4. Activate it on the Plugins page.

That’s it. Say goodbye to those pesky duplicate content pages for good.
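
For readers curious about what happens under the hood, here is a minimal sketch of the idea. It is not the plugin's actual source. The condition mirrors the one quoted in the comments below, and $index_category_pages is the variable from the install steps. The function names dcc_should_noindex and dcc_robots_meta and the $page array are hypothetical, made up for this illustration. is_single(), is_page(), is_home(), is_category(), and is_paged() are standard WordPress conditional tags, and wp_head is the standard hook for printing into the page head.

```php
<?php
// Hypothetical sketch of the approach; NOT the plugin's actual source.

// Set to true to let category pages be indexed (name taken from the install steps).
$index_category_pages = false;

// Decide whether a page should carry the noindex tag.
// $page is a hypothetical array of booleans describing the current request.
function dcc_should_noindex(array $page, $index_category_pages)
{
    // Category pages are indexable only when the option is switched on.
    if ($page['category']) {
        return !$index_category_pages;
    }
    // Posts, Pages, and the first page of the home page hold the unique content.
    $unique = ($page['single'] || $page['page'] || $page['home']) && !$page['paged'];
    return !$unique;
}

// Return the meta tag for duplicate-content pages, or an empty string otherwise.
function dcc_robots_meta(array $page, $index_category_pages)
{
    return dcc_should_noindex($page, $index_category_pages)
        ? '<meta name="robots" content="noindex,follow">' . "\n"
        : '';
}

// Inside WordPress, hook into wp_head and read the real conditional tags.
if (function_exists('add_action')) {
    add_action('wp_head', function () use ($index_category_pages) {
        echo dcc_robots_meta(array(
            'single'   => is_single(),
            'page'     => is_page(),
            'home'     => is_home(),
            'category' => is_category(),
            'paged'    => is_paged(),
        ), $index_category_pages);
    });
}
```

Keeping the decision in a plain function that takes booleans means the logic can be checked outside WordPress. The hook at the bottom only runs when WordPress has loaded.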

74 Comments

  1. Thanks for making this plugin. I have been concerned about this for some time. I went to Google and looked at all the pages they had indexed for my site. Most of them were archive pages rather than post pages. Not good.

    I didn’t know it was so easy to tell search engines not to index certain pages. This helps to solve a related issue, namely that ppl who search for something get directed to an archive page and then have trouble finding the content they are looking for. I would like to have the search engines send them to the post page using the permalink.

    One problem still remains: sometimes a search will direct them to the home page but the post they are looking for will no longer be shown there. I think that I can partially deal with this by having the home page show only excerpts. Do you have any other suggestions?

  2. Ken,
    I agree. The front page is definitely still somewhat of a problem. I think by only showing excerpts, you are definitely putting a big dent in the problem.

    I think I may be moving toward taking all of the post content off of the front page altogether, and only showing links to the latest posts. I have already created a page called Blog (http://seologs.com/blog/) that sort of replaces the home page. It shows excerpts from the latest few posts.

    So all I need to do now is make some unique content for the home page that contains some keywords that are specific to the site in general, and remove those posts altogether.

    Does that make sense?

  3. I like this seo plugin for wordpress. Maybe you can create a list of recommended wordpress plugins for seo. I would be interested in the ultimate css template and plugin combos for wordpress to be search engine optimized.

    Also btw the tab order for this leave a reply form above is out of whack :)

    Amir

  4. Pingback: TwiddleGeek
  5. Does this plugin have any negative effects? I do not have a spider.txt on my site, so will this plugin discourage good search engine ranking?

  6. Charles,
    This plugin will discourage the search engines from indexing anything other than the home page, the Posts (blog posts), and the actual Pages. Since I have released this plugin, I have had both positive and some negative feedback on this plugin. Some people argue that having the category, archive, and paged pages (when you click next on the home page) will actually bring more traffic to a site. I tend to agree more with what Ken said above
    “sometimes a search will direct them to the home page but the post they are looking for will no longer be shown there.”

    At least in my experience, I have found it to be really annoying when I click a result from a search, only to find that the phrase I was searching for isn’t there anymore, or the group of words is there, but the context makes no sense at all when I actually go to the page.

  7. This is a nice plugin but noindexing the category pages is not a good idea. Is there any way to get an option or version that does not noindex the category pages? I’m really confused why you would have even tried noindexing the category pages. Have you ever looked at your log files? You must not be a very good SEO if you are not using your category pages to rank for terms. It gives you a chance to try and rank for one specific term. It is not duplicate content. Your category pages are your biggest chance of ranking for terms. If you do it right you get lots of posts on one page all with the same theme and the keywords repeated many times.

  8. You can change line 24 in the plugin from:

    if((is_single() || is_page() || is_home()) && (!is_paged())){

    to

    if((is_single() || is_category() || is_page() || is_home()) && (!is_paged())){

    and that will put your category pages in Google. This mod is very simple and can be modified very easily.

  9. ogletree,
    Edit:
    The code you posted above will only work for the non paged (or first page) of each category. If you’d like to have all category pages indexed, see the update above.

    I may add a simple configuration to the plugin to allow for people who aren’t familiar with PHP to easily change this option.

    As for your first comment, I think your argument would make a lot more sense if each post was only allowed to be placed into one category. I wish that I had organized my categories that way, but I, like many others, place my posts into anywhere from 3 to 10 categories each.

    Let’s say I write a post, and select 6 relevant categories to put it in. Now, the most important part of that post (the title), and at least one paragraph are going to be on at least 9 different pages of my blog (the 6 category pages, 2 (maybe more) archive pages, and finally the actual post page).

    Sure, you may get traffic to those pages, but how similar can all of those posts be? At least in my case, all the posts in a single category are definitely going to be related, but they are all unique articles, and having them all on the same page, completely dilutes each one of them. Another problem with categories is that they are constantly changing, making it highly unlikely that the phrase the visitor searched for will even still be there.

    I think with so much emphasis on getting traffic, we sometimes lose sight of the importance of the content, and the quality of the landing page. For me, this is really a quality vs quantity issue.

    I don’t see why this would make me a bad SEO anyway. I’m just choosing to focus my efforts on optimizing the parts of WordPress that I have full control over.

  10. Although Google claims that it can identify the duplicated content and rank the ‘most important’ URL, it is always better to make it easy for Googlebot. So, putting ‘noindex’ on the duplicated content URLs is a solution.

  11. This is a great plugin. Several of the WP sites I run have been having problems with Google only indexing the category or archive pages and not the individual article pages.

    Ends up with a lot of search result snippets on Google being nothing but a list of post categories, author name, and publication date =P

  12. All my posts are in the supplemental results. How long does it take after installing the plugin before the posts start showing up again in the regular index and not the supplemental results area, better known as “Google Hell”?

  13. Hey Glenn.
    Once pages go supplemental, it’s pretty difficult to get them out.

    The #1 cause of supplemental results in Google, is a lack of pagerank (lack of links), so if you have a lot of posts in the supplemental results, try getting some links to them.

    I know that is easier said than done, but that’s what you need.

  14. Thanks Badi,

    What if I add links to the posts from my main site pages, many of which have PageRank? The blog is in a sub-folder of my main domain URL.

    After activating your plug-in, if I write new blog posts, do you think those will go Supplemental or get indexed in the regular Google index?

  15. Yes,
    This plugin stops both category and monthly archive pages from getting indexed, as well as paged pages.

    Glenn, I’d suggest that you try submitting your pages to directories and social bookmarking sites (like Netscape, Digg, Reddit, etc.).

    Get them out there.

  16. One thing, Glenn.
    Be careful when submitting your pages. Make sure not to abuse them. You should only submit some selected posts.

    Also the link you provided looks good. I’d be interested in hearing the results.

  17. Thanks for the tip Badi,

    With your plug-in activated do you think future blog posts I make will suffer the same “Google Hell” fate?

  18. Glenn,
    One more really important thing that might help you is to make sure you have picked either www or non www as your default domain.

    For example, I chose to use the www version, so I redirect all people who come to dnscoop.com to http://www.dnscoop.com

    This is what I use in my .htaccess

    RewriteCond %{HTTP_HOST} ^dnscoop\.com$ [NC]
    RewriteRule ^(.*)$ http://www.dnscoop.com/$1 [L,R=301]

    This helps search engines to focus all attention on one version of the domain.

  19. Again, the question…

    With this plug-in activated, will future blog posts I make suffer the same “Google Hell” fate?

    Anyone have any opinions?

    Thanks in Advance!
    Glenn

  20. One problem, though an easily-solved one, with this plugin: the code it generates is not valid XHTML. The solution is simple: just add a trailing forward-slash to the end of the meta-tags, right before the right angle bracket, like this:

    <meta name="robots" content="noindex,follow" />

    Actually, a second problem: The URL in the plugin itself doesn’t point to this page, so anyone looking for updates by following this link is going to just see your home page — and maybe assume the plugin page (like so many others out there) is gone.

  21. Thanks for the very useful tool! It seems it has some issues working together with wp_cache… So, to run DC Cure I had to deactivate wp_cache 😉

  22. This sounds dumb to me. Not having duplicate content is a theme matter. Having more pages (category archives etc…) is good for SEO. Preventing SE from indexing them is totally stupid imho.

  23. @Ozh
    Try to look more at the big picture.
    Traffic isn’t everything. In fact, it can be quite useless when the landing pages don’t contain exactly what the visitor was looking for. In most cases, the user will just hit the back button and never look back. I know that’s what I do.

    Relying on getting random traffic from a bunch of random keywords on a page may be good SEO for an MFA (made for AdSense) site where the goal is to make people leave your site (via ads), but it isn’t a good SEO strategy for a serious blog. Of course this is also just my opinion.

  24. I’ve been reading a lot about SEO these past few weeks, and there seems to be a lot of contradiction regarding plugins and follows.

    Ultimately we all know that it comes down to content, first and foremost, but I hope to discover a few ways to help out, without the risk of Google searches bringing up 404 pages.

  25. Thanks a bunch, easy to implement, works like a charm with WordPress 2.2 and does the trick efficiently.

    Tom

  26. Thanks for this useful info. I realized my blog has a lot of pages indexed in the supplemental index when I started using the Firefox SEO plugin from Aron. I hope this plugin can help me get rid of them.

  27. Hey,
    Nice plugin! I do have one question/request. How can I exclude RSS feeds as well? I give out full feeds and it would seem to be a duplicate as well?

  28. Hey Michael,
    The plugin can’t do RSS feeds. I think the best way to exclude them would be the robots.txt file:
    Something like this:
    ——–

    User-agent: *
    Disallow: /feed/
    Disallow: */feed/

  29. I suppose for search engines, we could take the referring keywords and run a search for related posts and display the top 5 links. Not an optimal solution, but one that’d work. I’m considering doing a plugin like that myself.

  30. Cool plug-in – thanks.

    If you are worried about your front page only showing excerpts and wish there was a way you could have the front page show any number of latest posts and control how much of the posts show, even choose by category and style it your own way, here is what I did: If you click on my name above it takes you to my site.

    Ok imagine this, that the RSS feeds from your site are actually parsed as html with a simple tool called CARP – cool or what?

    Just make a new page in WordPress, then assign it to be your front page under options / reading, then edit your page as you see fit. When it is time for parsing the RSS feeds (even of any specific category if you wish), download CARP and read the instructions of how to set it up. Just google it because I’m not sure if it’s cool to post a link for another tool besides this cool duplicate content cure plugin. And no, I’m not the author of CARP, but I do use it rather successfully.

    Check out my site if you want an example, for me it’s the coolest thing since… well… cool-aid.

    Hope it helps everyone that needs it.
    cheers
    Frank

  31. Thx for the plugin, however I think for most WordPress blogs, the default should be to not allow Google to follow the home/main/index.php page. After observing where my links are coming from, I can see people are looking for specific articles, click on the main page, http://nyherald.com, only to not find the article.

  32. plugins worth keeping in mind. There are some others, so try the one that works best for you, for example: Duplicate Content Cure or All in One SEO […]

  33. Interesting thought.

    I am not a web designer, nor am I a php expert. And what I know of SEO has been self taught. I am getting organic traffic, but how can I tell if this is a problem on my site, duplicate content?
    I have taught myself a TON and have designed my own wordpress site. This has been much trial and error and reading through blogs such as yours.

    My own keyword optimization approach is to use adwords, find some relevant terms and disperse them through my site.
    I know it’s not a perfect job, but it does seem to work somewhat.

    I wonder if we will always be modifying our approach for the latest ‘spider’ changes that seem ongoing?

    Thanks so much,

    Jeromy AKA Hillbilly

  34. Hi

    I’ve been looking for this plugin. This plugin would help me to check my article (for duplication) before I post on my blog.

    Thanks for this plugin.

  35. What about tag pages? I’m finding that a lot of my traffic is coming from tag pages. I didn’t see any settings for true/false with regard to tag pages. Thx.

  36. Can the plugin be used to exclude an individual page from being followed/indexed, such as a product download page?
    Cheers

Comments are closed.