In WebGUI.pm, beginning at line 133, there is a line that checks the 'If-Modified-Since' header for a value. There is also cache control logic in WebGUI::Session::Http that handles some of this.
I have had to comment out lines 132-137 & 139 in WebGUI.pm because no matter what happens, it breaks the site. I'll have users (visitors) emailing us because they haven't seen any new news on the site for a week, or a returning user (visitor) from a long time ago keeps getting the cached version of the page.
I might be missing something here, but the logic of the code reads to me as follows...
1. If the request headers (inbound) contain the header 'If-Modified-Since'
AND
2. The user is a visitor
THEN
Don't check the date on the 'If-Modified-Since' to see whether it would make sense to send an updated copy; just send back 304, Content not modified.
I've looked around the source, and unless I've missed something, or there's something going on with the Apache API, it seems to me that nothing is removing the If-Modified-Since header if the document is updated.
This has been around for a while for our site and I've just kept the lines commented out, but I'm surprised nobody else has run into an issue with more dynamic content that is regularly updated.
Any ideas?
Thanks,
Troy
I just want to add to this discussion, as we are experiencing _exactly_ the same problem, and independently found our way to the If-Modified-Since check in WebGUI.pm and ended up commenting it out, as it just doesn't make sense.
I strongly feel that the current WebGUI behaviour is incorrect, with respect to the RFC (but I'm prepared to be flogged if I'm wrong!).
I have read the relevant bits of RFC2616 (June 1999), and here are the key points:
1. In section 14.9.4 is the description of the "must-revalidate" directive that WebGUI is sending to the browser with each visitor page view. It states that a cache (i.e. the browser's cache) "MUST NOT use the [cache] entry after it becomes stale to respond to a subsequent request without first revalidating [the entry] with the origin server".
2. At the beginning of the same section, you can find that this kind of revalidation (i.e. where the cache has a stale version of the entry) is called a "Specific end-to-end revalidation".
3. The issue (it seems to me) hinges on the interpretation of what "revalidation" means. Section 13.3 of the RFC describes the Validation Model. The first 3 paragraphs describe the use of the "conditional methods" for validating a cache entry. Sending an If-Modified-Since header is one of the methods a cache may use.
It seems clear to me that it breaks the RFC specification to have WebGUI _always_ respond to a cache's attempt at a conditional validation with a 304 (Not Modified), when in fact the content may well have been modified. In fact, by responding with a 304, WebGUI has "revalidated" the cache's entry, which will presumably not go stale again until another max-age interval has passed (as set in the original headers).
JT's interpretation seems to be (apologies to JT if I've got this wrong) that because "must-revalidate, max-age=N" was sent by the server, then once the max-age has been reached the browser cache must simply "delete" the entry and request a fresh copy from the origin server. But in fact, the cache does no such thing: it attempts to "revalidate" the entry by using the If-Modified-Since conditional method.
As an aside, as a light-weight way for WebGUI to determine the current "Last-Modified" date of a page, couldn't this be achieved simply by keeping a filecache of Last-Modified dates keyed by page URL? These filecache entries would be updated whenever an asset revision was committed: you'd simply need to find the asset's "container" (Layout wobject) and use its URL as the key of the filecache entry to update with the current timestamp. Then in WebGUI.pm, you'd access this filecache whenever an If-Modified-Since (IMS) header was received, and only send a 304 if the date in the filecache was <= the date in the request.
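A minimal sketch of that comparison, just to make the idea concrete. Everything here is hypothetical and not WebGUI's actual API: the hash %last_modified_for stands in for the proposed filecache, and parse_http_date and is_not_modified are names made up for illustration. It assumes the browser sends the usual RFC 1123 date format and uses only the core Time::Piece module.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical stand-in for the proposed filecache:
# page URL => epoch time of the last committed revision.
my %last_modified_for = (
    '/home/news' => 784111777,    # Sun, 06 Nov 1994 08:49:37 GMT
);

# Parse an RFC 1123 HTTP date into epoch seconds.
# Returns undef if the header is missing or cannot be parsed.
sub parse_http_date {
    my ($value) = @_;
    return undef unless defined $value && length $value;
    my $t = eval { Time::Piece->strptime($value, '%a, %d %b %Y %H:%M:%S GMT') };
    return defined $t ? $t->epoch : undef;
}

# True (1) only when we know the page has not changed since the date
# the browser sent. Unknown pages or bad headers get the full page.
sub is_not_modified {
    my ($url, $ims_header) = @_;
    my $since   = parse_http_date($ims_header);
    my $changed = $last_modified_for{$url};
    return 0 unless defined $since && defined $changed;
    return $changed <= $since ? 1 : 0;
}

for my $ims ('Sun, 06 Nov 1994 08:49:37 GMT', 'Sat, 05 Nov 1994 08:49:37 GMT') {
    my $status = is_not_modified('/home/news', $ims) ? '304' : '200';
    print "If-Modified-Since: $ims => $status\n";
}
```

The sketch deliberately falls back to sending the full page whenever the date is missing, malformed, or the URL has no cache entry, since a stale 304 is the failure being discussed in this thread.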
Best regards,
Aman
Us too! We would be one of those sites that Troy was wondering about. Some of our content changes on a daily basis. That's exactly why we went to a CMS.
If this is a bug, then it really defeats the purpose of WebGUI as a CMS.
Reading the HTTP/1.1 RFC, it seems that WebGUI, as the origin server, is required to actually compare the page's last-modified time against the date in the If-Modified-Since header. However, there is no easy or cheap way to do this. Even the cache solution won't work correctly, since a page can pull in assets from far across the site (Navigation assets, for example).
A later (post 7.4 series) version will introduce a more flexible way to control caching, which should fix this problem.
[quote]Some of our content changes on a daily basis. That's exactly why we went to a CMS.
If this is a bug, then it really defeats the purpose of WebGUI as a CMS.[/quote]
Couldn't agree more. It may not be a bug, but it is indeed a flaw in WebGUI, which is worse.
Add our sites to the lists.
Teachers are frustrated when the homework that they are assigning cannot be seen in the browsers.
I am using 7.3.22-stable. When will this issue be resolved?
Hmmm, it seems that our site suffers from the same problem, as Mr. preaction pointed out. Sorry for the stupid question, but what is the solution for this? I couldn't figure it out so far. Thanks!
Here is the offending code for everyone who is having a problem and needs to turn this off. The code is in WebGUI.pm at the top level of the WebGUI package (i.e. /data/WebGUI/lib/WebGUI.pm for WRE users). Take note that the line numbers here are only to make it clear where lines actually start in the code, as this text will be wrapped.
I imply no sort of responsibility for anything you do on the part of managing your site. If you modify the code based off of this, it is your responsibility.
That being said, I commented out these lines (they all fall in line together in the code.) TAKE NOTE that one line is NOT commented out.
1. # if ($r->headers_in->{'If-Modified-Since'} ne "" && $session->var->get("userId") eq "1") { #display from cache if page hasn't been modified.
2. #     $http->setStatus("304","Content Not Modified");
3. #     $http->sendHeader;
4. #     $session->close;
5. #     return Apache2::Const::OK();
6. # } else { #return the page.
7.     $out = page($session);
8. # }
We've been doing this for a long time now and it works out well. Obviously traffic on your site will increase as a result of this change. Hope this helps some of you out.
Troy
Troy, thanks for the patch; Graham, thanks for applying the patch.
I'm going to reinstall WRE0.8 tonight, for note-taking, wiki-updating purposes; will that be the version I'll get?
Rather than commenting this out, I've fixed this the right way, so that it checks whether the content has been modified, displaying the cached version if it hasn't and sending a full page if it has.
Fixed in 7.4.6.