plainblack.com

How to make a Google Sitemap of your WebGUI site

I just had some fun getting Google Sitemaps (http://www.google.com/webmasters/sitemaps/) working with WebGUI, so I thought I'd document what I did here so others can do it too.

I used their Python sitemap generator ( http://www.google.com/webmasters/sitemaps/docs/en/sitemap-generator.html ) to create the sitemap.xml.gz file, but to do that you need a config.xml file full of URLs to your pages.  What better tool than WebGUI itself to generate this config.xml file!

To do this, I created a special navigation template which spits out XML rather than HTML.  This was pretty easy really.  All I had to do was create:

a) a custom style template that contains "<tmpl_var body.content>"

b) a custom page layout template that contains "<tmpl_var content>"

c) a custom navigation template that contains:

<?xml version="1.0" encoding="UTF-8"?>
<tmpl_if session.var.adminOn>
<tmpl_var controls><br />
</tmpl_if>

<site base_url="http://www.example.com" store_into="/data/domains/www.example.com/public/sitemap.xml.gz" verbose="1">

<tmpl_loop page_loop>
<url href="http://www.example.com<tmpl_var page.url>" lastmod="^LastModifiedPage(<tmpl_var page.url>,,%y-%m-%dT%h:%n:%s+00:00);" />
</tmpl_loop>

</site>

You'll need to replace +00:00 above with your local timezone's offset from GMT.  So if you're in Perth you would use +08:00, while on US Pacific time you would use -08:00.

d) You can then create a page layout with the URL "config.xml" and apply the style and page layout templates you created above to it.

Add a navigation wobject to this page and set it to display every page in your tree, then select the navigation template you created in step c).
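
For the timezone suffix in step c), the offset can be read off the server rather than worked out by hand.  Here's a minimal sketch, assuming a POSIX shell and a date command that understands %z (GNU and BSD date both do).  The function name iso_offset is made up for this example.  Note that the printed offset changes with daylight saving time, while the value in the template is fixed.

```shell
#!/bin/sh
# iso_offset: hypothetical helper that turns an offset such as +0800
# (the form printed by `date +%z`) into +08:00, the form used in lastmod.
iso_offset() {
    printf '%s\n' "$1" | sed 's/^\([+-][0-9][0-9]\)\([0-9][0-9]\)$/\1:\2/'
}

# Print the offset for this machine's current timezone.
iso_offset "$(date +%z)"
```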

Once you've got the above page, you can set up a cron job to grab the config.xml file, parse it, update the sitemap accordingly and notify Google of the changes.

You'll also need to let requests for the sitemap.xml.gz file pass through WebGUI untouched, which takes two things:

a) a passthrough setting in the www.example.com.conf file that lets /sitemap.xml.gz through:

i.e. passthruUrls = /sitemap.xml.gz

b) a Location block for /sitemap.xml.gz in the Apache 2 sites-available config for your virtual host, so that Apache lets the file through.  Restart Apache after doing this.
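
Once a sitemap.xml.gz file exists, a quick way to see whether the passthrough is working is to look at the response headers.  The sketch below assumes curl is installed (it isn't used elsewhere on this page), and status_is_200 is a made-up helper name.  It only checks the status code, so treat it as a smoke test rather than proof.

```shell
#!/bin/sh
# status_is_200: hypothetical helper; reads HTTP response headers on stdin
# and succeeds only if the first line reports status 200.
status_is_200() {
    head -n 1 | grep -q '^HTTP/[0-9.]* 200'
}

# Usage (-s silences the progress meter, -I sends a HEAD request):
#   curl -sI http://www.example.com/sitemap.xml.gz | status_is_200 && echo "served"
```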

To generate the sitemap.xml.gz file, fetch the generated config.xml and then run Google's sitemap_gen.py script on it:

/usr/bin/wget -q http://www.example.com/config.xml -O /data/domains/www.example.com/www/config.xml
/usr/bin/python /usr/local/bin/sitemap_gen.py --config=/data/domains/www.example.com/www/config.xml

Run the above commands from cron on a regular basis and Google will have a much easier time crawling your site in future!
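
The two commands above could be wrapped up something like this.  It's a sketch, not the script I actually run: the name update_sitemap.sh, the environment variable overrides and the temporary-file step are my additions, the latter so that a failed download doesn't overwrite the last good config.xml.

```shell
#!/bin/sh
# update_sitemap.sh: hypothetical wrapper around the two commands above.
# Paths mirror the ones used on this page; override them via the
# environment if yours differ.
SITE_URL="${SITE_URL:-http://www.example.com}"
CONFIG_FILE="${CONFIG_FILE:-/data/domains/www.example.com/www/config.xml}"
WGET="${WGET:-/usr/bin/wget}"
PYTHON="${PYTHON:-/usr/bin/python}"
SITEMAP_GEN="${SITEMAP_GEN:-/usr/local/bin/sitemap_gen.py}"

regenerate_sitemap() {
    tmp="${CONFIG_FILE}.tmp"
    # Download to a temporary file first; give up if the fetch fails
    # or the result is empty.
    if ! "$WGET" -q "$SITE_URL/config.xml" -O "$tmp" || [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        echo "could not fetch $SITE_URL/config.xml" >&2
        return 1
    fi
    mv "$tmp" "$CONFIG_FILE" || return 1
    "$PYTHON" "$SITEMAP_GEN" --config="$CONFIG_FILE"
}

# In the real script, finish with a call to:   regenerate_sitemap
# Example crontab entry (03:15 every night; the install path is an assumption):
#   15 3 * * * /usr/local/bin/update_sitemap.sh
```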

Hope this helps someone else trying to do the same thing!

 

Code for LastModifiedPage macro:

 

package WebGUI::Macro::LastModifiedPage;

#-------------------------------------------------------------------
# WebGUI is Copyright 2001-2006 Plain Black Corporation.
#-------------------------------------------------------------------
# Please read the legal notices (docs/legal.txt) and the license
# (docs/license.txt) that came with this distribution before using
# this software.
#-------------------------------------------------------------------
# http://www.plainblack.com                     info@plainblack.com
#-------------------------------------------------------------------

use strict;
use WebGUI::Asset;
use WebGUI::International;

=head1 NAME

Package WebGUI::Macro::LastModifiedPage

=head1 DESCRIPTION

Macro for displaying the date that the most recent revision of the given Asset was last modified.

=head2 process ( [asset, label, format] )

=head3 asset

URL of the Asset to calculate the date from.

=head3 label

Text to prepend to the date.  This can be the empty string.

=head3 format string

A string specifying how to format the date using codes similar to those used by
sprintf.  See L<WebGUI::Session::datetime/"epochToHuman"> for a list of codes.
Uses "%z" if empty.

=cut


#-------------------------------------------------------------------
sub process {
        my $session = shift;
        return '' unless $session->asset;
        my ($asseturl, $label, $format) = @_;
        # Fall back to the user's preferred date format when none is given.
        $format = '%z' unless defined $format && $format ne '';
        $label = '' unless defined $label;
        my $i18n = WebGUI::International->new($session,'Macro_LastModified');
        # Guard against a URL that does not resolve to an asset.
        my $asset = WebGUI::Asset->newByUrl($session,$asseturl);
        return $i18n->get('never') unless defined $asset;
        my ($time) = $session->dbSlave->quickArray("SELECT max(revisionDate) FROM assetData where assetId=?",[$asset->getId]);
        if ($time) {
                return $label.$session->datetime->epochToHuman($time,$format);
        }
        return $i18n->get('never');
}

1;

Keywords: google howto sitemap

© 2010 Plain Black Corporation | All Rights Reserved