WebspaceWorks Resources

Content Extract: Taking control of excerpts in WordPress

It is common to list short extracts of several posts on a single blog page, and WordPress provides two ways of doing so: the <!--more--> content break used with the the_content template tag, or the the_excerpt template tag, which pulls the post's excerpt or, if none has been written, truncates the content. This is fine, but it did not go quite far enough for a recent client, so the Content Extract plugin was born and is now available to a wider audience.

With this plugin, the extract uses the post's excerpt if one is available. If there is no excerpt, it obeys any explicit break in the content; failing that, it checks the length of the full content and decides where to cut it, according to the parameters passed in.


Content Extract is based on elements of both the the_content and the_excerpt template tags, but since it trims back what the first does while extending the second, it shares an interface with neither. The bulk of the work is done by a new filter, wswwpx_trim_extract, as described below.


Simply unzip the download, place content_extract.php in the '/wp-content/plugins/' directory of your WordPress installation, and activate it in the admin interface.

All function names begin with the prefix 'wswwpx', so there should be no conflicts with functions defined elsewhere.

WordPress 2.0 compatibility: Content Extract was built and tested against WordPress 2.0 Stable, but should also work with WordPress 1.5 and later.


Content Extract would typically replace the the_excerpt tag, which is usually found on pages that list samples or extracts from multiple posts and link on to the full-text versions. It therefore needs to be called from within The Loop, with a call that looks like this:

<?php wswwpx_content_extract ( ); ?>

or this:

<?php wswwpx_content_extract ( '(More...)', 80, 55 ); ?>

or more generically, this [Note: This changed with 1.0b2 (21 January, 2006)]:

<?php wswwpx_content_extract ( $more_link_text, $check_length, $cut_length, $addtodb, $html_before, $html_after, $striphtml, $withlink ); ?>

The arguments are:

  1. $more_link_text: The text used for the onward link to the full-text version, dflt='(More...)'
  2. $check_length: The maximum length of the post's full text to allow before the automatic truncation algorithm is used. Measured in words by default, but can be written as 'n:s' to denote a length of n sentences, dflt=80 words
  3. $cut_length: The length to which the full text is cut if automatic truncation is triggered. Measured in words by default, but can be written as 'n:s' to denote a length of n sentences, dflt=55 words
  4. $addtodb: Whether to write the auto-generated extract to the post_excerpt field of the post's database record, dflt=false. Setting it to '2' forces the excerpt to be regenerated and re-written to the database.
  5. $html_before: HTML to add before the extract, dflt=<p>
  6. $html_after: HTML to add after the extract, dflt=</p>
  7. $striphtml: Whether to strip HTML formatting from the extract, dflt=true (in keeping with the WordPress default for excerpts)
  8. $withlink: Whether to add the link to the end of the extract, dflt=true
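The 'n:s' notation accepted by $check_length and $cut_length can be illustrated with a small parser. This is a minimal sketch, not the plugin's own code: the function name parse_length_spec is hypothetical, and it assumes that a bare number means words and that a trailing ':s' means sentences.

```php
<?php
// Hypothetical helper (not part of Content Extract) that reads a length
// argument such as 80 or '3:s' and returns [count, unit].
function parse_length_spec($spec): array
{
    $spec = trim((string) $spec);
    // Assumption: 'n:s' means n sentences; anything else is a word count.
    if (preg_match('/^(\d+):s$/i', $spec, $m)) {
        return [(int) $m[1], 'sentences'];
    }
    // Fall back to words, never allowing a negative length.
    return [max(0, (int) $spec), 'words'];
}

[$count, $unit] = parse_length_spec('3:s');
echo "$count $unit\n"; // 3 sentences
```

Both the default integer form and the string form go through the same path, so a template can pass 80 or '80' interchangeably.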

Note that the check length and the truncation length can differ, and the truncation length will always be less than or equal to the check length.
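The relationship between the two lengths can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: trim_extract is a hypothetical name, the cut length is assumed to be clamped to the check length, and sentences are split naively after '.', '?' or '!' followed by whitespace, with no HTML awareness.

```php
<?php
// Hypothetical sketch (not the plugin's code) of the check/cut rule.
// $unit is 'words' or 'sentences'; returns [text, wasShortened].
function trim_extract(string $text, int $check_length, int $cut_length, string $unit = 'words'): array
{
    // Assumption: the cut length never exceeds the check length.
    $cut_length = min($cut_length, $check_length);

    // Words split on whitespace; sentences split after . ? or ! plus whitespace.
    $pattern = ($unit === 'sentences') ? '/(?<=[.?!])\s+/' : '/\s+/';
    $pieces  = preg_split($pattern, trim($text), -1, PREG_SPLIT_NO_EMPTY);

    if (count($pieces) <= $check_length) {
        return [$text, false];               // within the limit: keep it all
    }
    // Too long: keep only the first $cut_length pieces.
    return [implode(' ', array_slice($pieces, 0, $cut_length)), true];
}

[$extract, $shortened] = trim_extract('One. Two? Three! Four.', 3, 2, 'sentences');
echo $extract, "\n"; // One. Two?
```

Using a larger check length than cut length avoids chopping a post that is only slightly over the limit: an 80-word post is shown whole, while an 81-word post is cut back to 55 words.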


  • If an excerpt is available, use it
  • If there is no excerpt, look through the content to see if a <!--more--> breakpoint exists. If so, use it
  • If neither of the above applies, check the content length against the allowed $check_length
    • If it is less than or equal to this length, output the full-length article
    • If it is greater than this check length, cut it back to the specified $cut_length
    • Allow limits to be optionally set as sentences (the default is words)
  • If the article is in any way shortened,
    • optionally add a link to the end of the 'extract'
    • optionally write the new extract to the post_excerpt field of the current post in the database
  • Place the resulting extract between the $html_before and $html_after tags
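The decision order in the list above can be sketched without WordPress. This is a minimal sketch, not the plugin's code: build_extract and its parameter order are hypothetical, only word counts are handled, a stored excerpt is assumed to count as "shortened", and plain text stands in for the real link to the post's permalink. The $addtodb and $striphtml steps are left out.

```php
<?php
// Hypothetical, WordPress-free sketch of the extract decision order.
function build_extract(
    string $excerpt,
    string $content,
    int $check_length = 80,
    int $cut_length = 55,
    string $more_link_text = '(More...)',
    string $html_before = '<p>',
    string $html_after = '</p>',
    bool $withlink = true
): string {
    $shortened = false;
    $more_pos  = strpos($content, '<!--more');

    if (trim($excerpt) !== '') {
        // 1. A stored excerpt always wins (assumed to count as shortened).
        $text = $excerpt;
        $shortened = true;
    } elseif ($more_pos !== false) {
        // 2. Otherwise honour an explicit <!--more--> break.
        $text = substr($content, 0, $more_pos);
        $shortened = true;
    } else {
        // 3. Otherwise compare the word count with the check length.
        $words = preg_split('/\s+/', trim($content), -1, PREG_SPLIT_NO_EMPTY);
        if (count($words) <= $check_length) {
            $text = $content;
        } else {
            $cut  = min($cut_length, $check_length);
            $text = implode(' ', array_slice($words, 0, $cut));
            $shortened = true;
        }
    }

    $text = trim($text);
    if ($shortened && $withlink) {
        // Plain text stands in for the real onward link here.
        $text .= ' ' . $more_link_text;
    }
    return $html_before . $text . $html_after;
}

echo build_extract('', 'Intro text.<!--more-->Rest of the post.'), "\n";
// <p>Intro text. (More...)</p>
```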

New in 1.0b3: If $striphtml is set to false, HTML formatting within the excerpt is retained. This differs from normal WordPress behaviour, which strips all formatting from excerpts and leaves a single block of text that becomes harder to read as its length increases.
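The effect of the $striphtml switch can be approximated as below. This is a sketch under assumptions, not the plugin's code: format_extract is a hypothetical name, and when formatting is kept it naively removes one leading opening tag and one trailing closing tag (matching the 1.0b3 note that leading and trailing tags are still stripped), which assumes the extract is wrapped in a single element such as <p>.

```php
<?php
// Hypothetical sketch of the $striphtml behaviour (not the plugin's code).
function format_extract(string $text, bool $striphtml = true): string
{
    if ($striphtml) {
        // Default: drop all markup, as WordPress does for excerpts.
        return trim(strip_tags($text));
    }
    // Keep inner formatting, but remove the outer leading opening tag and
    // trailing closing tag so $html_before/$html_after can wrap the result.
    $text = preg_replace('/^\s*<[a-z][a-z0-9]*[^>]*>/i', '', $text);
    $text = preg_replace('/<\/[a-z][a-z0-9]*>\s*$/i', '', $text);
    return trim($text);
}

echo format_extract('<p>Some <em>bold</em> text</p>', false), "\n";
// Some <em>bold</em> text
```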


Current version (v1.0):
content_extract.php.zip [09 November, 2006].


  • 09 November, 2006: Current version (v1.0 Release version): New features/bugfix
    • Extends 'sentence' identification to include ordered and unordered lists
    • Fixes a bug where leading/trailing tags were stripped and then replaced with something different
    • Fixes a bug where truncated lists/items were not being closed in the output. Open lists/items are now tracked, and appropriate closing tags are added wherever needed
  • 11 August, 2006: (v1.0b5): Bugfix
    • Fixed another stupid bug preventing the stripping of HTML tags, thus restoring the plugin's correct default behaviour.
  • 21 April, 2006: (v1.0b4): Bugfix
    • Fixed stupid bug encountered when breaking on a word-count
    • Corrected callable function name in “Call as” section of header comments
  • 04 February, 2006: (v1.0b3): Feature enhancement and small interface change
    • Extends ‘addtodb’ to support updating of database excerpt
    • Adds an option to retain html formatting of the excerpt, but will still strip the leading and trailing tags.
    • Beefed-up sentence counting (still not perfect) to catch '. ', '? ', '! ' & '.</p>'
    • Updated header info and fixed typos.
  • 21 January, 2006: (v1.0b2): Feature enhancement and small interface change
    • Adds option to allow the auto-generated extract to be written out to the post_excerpt field in the post’s database record. This will only happen if there is no excerpt currently stored for a post, so the effect is to reduce processing for subsequent loads of post extracts: default: false
    • Adds ability to express check and extract lengths as sentences, rather than the default unit of words
  • 19 January, 2006 (v1.0b1): 1st public release
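The v1.0 fix for unclosed lists can be illustrated with a simple tag stack. This is a minimal sketch, not the plugin's implementation: close_open_lists is a hypothetical name, and it assumes list tags in the extract are written explicitly (it does not model HTML's implicit closing of <li>).

```php
<?php
// Hypothetical sketch: append closing tags for any <ul>, <ol> or <li>
// left open after an extract has been truncated.
function close_open_lists(string $html): string
{
    $stack = [];
    // Match opening or closing ul/ol/li tags, e.g. <ul>, <li class="x">, </li>.
    preg_match_all('/<(\/?)(ul|ol|li)\b[^>]*>/i', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $tag = strtolower($m[2]);
        if ($m[1] === '') {
            $stack[] = $tag;                 // opening tag: remember it
            continue;
        }
        // Closing tag: drop the matching opener and anything opened after it.
        for ($i = count($stack) - 1; $i >= 0; $i--) {
            if ($stack[$i] === $tag) {
                array_splice($stack, $i);
                break;
            }
        }
    }
    // Close whatever is still open, innermost first.
    foreach (array_reverse($stack) as $tag) {
        $html .= "</$tag>";
    }
    return $html;
}

echo close_open_lists('<ul><li>One</li><li>Two'), "\n";
// <ul><li>One</li><li>Two</li></ul>
```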


Webspace Works: Effective website design, development & search engine optimisation
