plainblack.com
Cache & If-Modified-Since  (#3182)
Issue

In WebGUI.pm, beginning at line 133, there is a line that checks the 'If-Modified-Since' header for a value.  There is also cache control logic in WebGUI::Session::Http that handles some of this.

I have had to comment out lines 132-137 & 139 in WebGUI.pm because no matter what happens, it breaks the site.  I'll have users (visitors) emailing us because they haven't seen any new news on the site for a week, or a returning user (visitor) from a long time ago keeps getting the cached version of the page.

I might be missing something here, but the logic of the code reads to me as follows...

1. If the request headers (inbound) contain the header 'If-Modified-Since'
AND
2. The user is a visitor
THEN
Don't check the date on the 'If-Modified-Since' and see if it would make sense to send an updated copy, just send back 304, Content not modified. 

I've looked around the source, and unless I've missed something, or there's something going on with the Apache API, it seems to me that nothing is removing the If-Modified-Since header if the document is updated.
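For comparison, here is a minimal sketch of the date check described above as missing. It is not WebGUI code: the function names and the idea of passing in a page's last-modified time as epoch seconds are assumptions made for illustration, and it only parses the preferred RFC 1123 date format.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviation => zero-based month number, as timegm expects.
my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Parse an RFC 1123 date such as "Wed, 13 Jun 2007 08:14:00 GMT" into
# epoch seconds. Returns undef for anything it does not recognise.
sub parse_http_date {
    my ($value) = @_;
    return unless defined $value;
    my ($day, $mon, $year, $hour, $min, $sec) =
        $value =~ /^\w{3}, (\d{2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/
        or return;
    return unless exists $MONTH{$mon};
    # timegm dies on impossible dates (e.g. 31 Feb); treat those as unparseable.
    return eval { timegm($sec, $min, $hour, $day, $MONTH{$mon}, $year) };
}

# Hypothetical decision helper: answer 304 only when the client's copy is at
# least as new as the page; a missing or unparseable header gets the full page.
sub status_for_request {
    my ($if_modified_since, $last_modified_epoch) = @_;
    my $client_epoch = parse_http_date($if_modified_since);
    return 200 unless defined $client_epoch;
    return $last_modified_epoch <= $client_epoch ? 304 : 200;
}
```

The hard part in practice is obtaining the page's last-modified time cheaply; the sketch simply assumes the caller already has it.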

This has been around for a while for our site and I've just kept the lines commented out, but I'm surprised nobody else has run into an issue with more dynamic content that is regularly updated.

Any ideas?

Thanks,

Troy

Solution Summary
Comments
JT
0
6/13/2007 8:14 am
That's because in order to check if the content has been modified it would have to instantiate the asset or assets in question and make that determination. And at that point we might as well serve up the whole page again. However, that's not to say that this is a bug. It's doing what it's supposed to do. As a content publisher you're supposed to set the cache timeout on things so that the browser automatically deletes them after a while, and then it won't send an if-modified-since, and therefore WebGUI won't have anything to reply to with content not modified. If there is indeed a bug somewhere, which I cannot verify at the moment, then it would be in the logic that sets the time to live on the page, not the if-modified-since logic. From your description, it's working as it is supposed to
aman
0
8/1/2007 10:24 pm

I just want to add to this discussion, as we are experiencing _exactly_ the same problem, and independently found our way to the If-Modified-Since check in WebGUI.pm and ended up commenting it out, as it just doesn't make sense.

I strongly feel that the current WebGUI behaviour is incorrect, with respect to the RFC (but I'm prepared to be flogged if I'm wrong!).

I have read the relevant bits of RFC 2616 (June 1999), and here are the key points:

1. Section 14.9.4 describes the "must-revalidate" directive that WebGUI is sending to the browser with each visitor page view.  It states that a cache (i.e. the browser's cache) "MUST NOT use the [cache] entry after it becomes stale to respond to a subsequent request without first revalidating [the entry] with the origin server".

2. At the beginning of the same section, you can find that this kind of revalidation (i.e. where the cache has a stale version of the entry) is called a "Specific end-to-end revalidation".

3. The issue (it seems to me) hinges on the interpretation of what "revalidation" means.  Section 13.3 of the RFC describes the Validation Model.  The first 3 paragraphs describe the use of the "conditional methods" for validating a cache entry.  Sending an If-Modified-Since header is one of the methods a cache may use.

It seems clear to me that it breaks the RFC specification to have WebGUI _always_ respond to a cache's attempt at a conditional validation with a 304 (Not Modified), when in fact the content may well have been modified.  In fact, by responding with a 304, WebGUI has "revalidated" the cache's entry, which will presumably not go stale again until another max-age interval has passed (as set in the original headers).

JT's interpretation seems to be (apologies to JT if I've got this wrong) that because "must-revalidate, max-age=N" was sent by the server, then once the max-age has been reached the browser cache must simply "delete" the entry and request a fresh copy from the origin server.  But in fact, the cache does no such thing: it attempts to "revalidate" the entry by using the If-Modified-Since conditional method.
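The cache behaviour described above can be sketched roughly as follows. This is a simplified illustration, not a model of any particular browser: the entry fields (stored_at, max_age, last_modified) and the function name are hypothetical, and the age calculation ignores the finer points of the RFC's age algorithm.

```perl
use strict;
use warnings;

# Decide what a cache does with a stored response at time $now.
# $entry is a hash ref with hypothetical fields:
#   stored_at     => epoch seconds when the response was stored
#   max_age       => freshness lifetime in seconds (from Cache-Control)
#   last_modified => the Last-Modified header value that came with it
sub cache_action {
    my ($entry, $now) = @_;
    my $age = $now - $entry->{stored_at};

    # Still fresh: the cache may answer without contacting the server.
    return { action => 'serve_from_cache' } if $age < $entry->{max_age};

    # Stale: the entry is kept, and the cache asks the origin server to
    # revalidate it with a conditional GET instead of discarding it.
    return {
        action  => 'conditional_get',
        headers => { 'If-Modified-Since' => $entry->{last_modified} },
    };
}
```

If the server answers every such conditional GET with a 304, the stale entry is treated as valid again and the visitor keeps seeing the old page.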

As an aside, as a lightweight way for WebGUI to determine the current "Last-Modified" date of a page, couldn't this be achieved simply by keeping a filecache of Last-Modified dates keyed by PageUrl?  These filecache entries would be updated whenever an asset revision was committed: you'd simply need to find the asset's "container" (Layout wobject) and use its URL as the key to the Last-Modified filecache entry to update with the current timestamp.  Then in WebGUI.pm, you'd access this filecache whenever an IMS was received, and only send a 304 if the date in the filecache was <= the date in the request.
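A rough sketch of that idea, using an in-memory hash as a stand-in for the filecache. The function names and the commit hook are hypothetical, not WebGUI APIs, and timestamps are assumed to be epoch seconds already.

```perl
use strict;
use warnings;

# Stand-in for the proposed filecache:
# page URL => epoch seconds of the most recently committed revision.
my %last_modified_by_url;

# Hypothetical hook: would be called whenever an asset revision is
# committed, with the URL of the asset's containing page.
sub record_commit {
    my ($page_url, $committed_at) = @_;
    $last_modified_by_url{$page_url} = $committed_at;
    return;
}

# Returns 1 only when a timestamp is known for the page and it is not
# newer than the date the client sent in If-Modified-Since.
sub can_send_304 {
    my ($page_url, $if_modified_since_epoch) = @_;
    my $last_modified = $last_modified_by_url{$page_url};
    return 0 unless defined $last_modified;    # unknown page: serve it in full
    return $last_modified <= $if_modified_since_epoch ? 1 : 0;
}
```

Serving the full page when no entry exists is the conservative choice: a missing timestamp then costs a page render rather than showing a visitor stale content.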

Best regards,
Aman

baylink
0
8/8/2007 8:13 pm
[ subscribing to watch this; I'm having caching troubles as well.  No way to subscribe without a reply. That's a bug.  :-)
pantah650
0
8/8/2007 10:27 pm
I'd like to add myself to the list of people experiencing this
AnthonyAddinall
0
8/9/2007 12:30 am

Us too!  We would be one of those sites that Troy was wondering about.  Some of our content changes on a daily basis.  That's exactly why we went to a CMS. 

If this is a bug, then it really defeats the purpose of WebGUI as a CMS.

preaction
0
8/14/2007 5:14 pm

Reading the HTTP/1.1 RFC, it seems that as the origin server, WebGUI is required to actually compare the If-Modified-Since time against the page's last-modified time.  However, there is no easy or cheap way to do this.  Even the cache solution won't work correctly, since a page can pull in assets from far across the site (Navigation assets, for example).

A later (post 7.4 series) version will introduce a more flexible way to control caching, which should fix this problem. 

mhsweb
0
8/27/2007 1:05 pm
I have the same exact problem.  Is the current solution to comment the lines indicated in the original
fathertorque
0
8/30/2007 4:06 am
Same problem for me too.

[quote]Some of our content changes on a daily basis.  That's exactly why we went to a CMS. 

If this is a bug, then it really defeats the purpose of WebGUI as a CMS.[/quote]

Couldn't agree more. It may not be a bug, but it's indeed a flaw in WebGUI, which is worse.

AnthonyAddinall
0
9/2/2007 8:27 pm
This problem seems to affect the PlainBlack site itself!  I'm not seeing the most up-to-date information on the home page unless I clear my browser's cache
preaction
0
9/2/2007 8:56 pm
Yes, the current solution is to disable the behavior entirely. In a future version, there will be the ability to do this from a configuration value, as well as a couple other levels of
aewhale
0
9/4/2007 2:30 pm

Add our sites to the list.

Teachers are frustrated when the homework that they are assigning cannot be seen in the browsers.

I am using 7.3.22-stable.  When will this issue be resolved?

Cirifischio
0
9/6/2007 4:06 am

Hmmm, it seems that our site suffers from the same problem, as Mr. preaction pointed out. Sorry for the stupid question, but what is the solution for this? I couldn't figure it out so far.
Thanks! 

TjECC
0
9/6/2007 9:34 am

Here is the offending code for everyone who is having a problem and needs to turn this off.  The code is in WebGUI.pm at the top level of the WebGUI package (i.e. /data/WebGUI/lib/WebGUI.pm for WRE users).  Take note that the line numbers here are only to make it clear where lines actually start in the code, as this text will be wrapped.

I imply no sort of responsibility for anything you do on the part of managing your site.  If you modify the code based off of this, it is your responsibility. 

That being said, I commented out these lines (they all fall in line together in the code.)  TAKE NOTE that one line is NOT commented out.

 

1.#                       if ($r->headers_in->{'If-Modified-Since'} ne "" && $session->var->get("userId") eq "1") { #display from cache if page hasn't been modified.
2.#                               $http->setStatus("304","Content Not Modified");
3.#                               $http->sendHeader;
4.#                               $session->close;
5.#                              return Apache2::Const::OK();
6.#                       } else {                                        #return the page.
7.                                $out = page($session);
8.#                       }

We've been doing this for a long time now and it works out well.  Obviously traffic on your site will increase as a result of this change.  Hope this helps some of you out.

Troy

Graham
0
9/7/2007 10:29 am
I've made WebGUI ignore those headers for now (7.4.6).  This will be revisited in the
baylink
0
9/7/2007 11:06 am

Troy, thanks for the patch; Graham, thanks for applying the patch.

I'm going to reinstall WRE0.8 tonight, for note-taking, wiki-updating purposes; will that be the version I'll get? 

JT
0
9/7/2007 12:54 pm

Rather than commenting this out, I've fixed this the right way so that it checks whether the content has been modified and displays the cached version if it hasn't, or sends a full page if it has changed.

fixed in 7.4.6 

Details
Ticket Status Closed
Rating 0.0
Submitted By TjECC
Date Submitted 2007-06-11
Assigned To unassigned
Date Assigned 2010-03-16
Assigned By
Severity Critical (mostly not working)
What's the bug in?
WebGUI / WRE Version current
URL bugs/tracker/cache--if-modified-since
Keywords
© 2010 Plain Black Corporation | All Rights Reserved