plainblack.com

Searching external HTML files

User knowmad
Date 7/6/2008 9:08 pm
Views 969
User Message
knowmad

While preparing for my talk next month, I've been looking over the external indexer plugins. I notice that HTML files are simply being cat'd whole, which means tags, attributes, styles and other non-text elements will get picked up by the indexer. Is this intentional?

I propose the use of w3m, lynx or a simple Perl script to strip the HTML tags to avoid them being indexed.
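To make the idea concrete, here is a minimal sketch of the kind of tag-stripping filter being proposed. It is written in Python with the standard library's `html.parser`, purely for illustration. It is not WebGUI's indexer code, and the names `TextExtractor` and `strip_html` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tags and attributes are never recorded; only track skipped elements
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside <script> and <style>
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse runs of whitespace.
        # Note: this can split a word that spans inline tags (wor<b>ld</b>).
        return " ".join(" ".join(self._chunks).split())


def strip_html(markup):
    """Return the text content of an HTML string, suitable for indexing."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = '<p class="intro">Hello <b>world</b></p><style>p {color: red}</style>'
    print(strip_html(sample))
```

A filter like this would sit between reading the file and handing its contents to the indexer, in the same spot where w3m or lynx would otherwise be called.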

 

William

----
Knowmad Technologies
http://www.knowmad.com



JT
You're incorrect about your observation. The HTML is being stripped. 

JT

http://www.plainblack.com/webgui/dev/discuss/searching-external-html-files


--

Plain Black, makers of WebGUI
http://plainblack.com





© 2012 Plain Black Corporation | All Rights Reserved