HTTrack test page
Last modified: 2005.12.12 1520 AEST
NEL = No External Links
Last tested with: HTTrack 3.40-BETA-3 (+swf)
With NEL=ON this file (redirect.php) should redirect, and the hyperlink here should change to "external.html?link=http://www.httrack.com/"
- v3.20RC6B this is now "external.html?link=http://www.tas.gov.au/"
- v3.23-rc1 has reverted to "htt.html" containing META redirect.
- v3.30-rc8 has same "htt.html" META redirect.
- v3.30-rc15 WOOT! Appears to be doing how v3.20 did it :)
- v3.33 (Test changed from htt.asp to redirect.php)
- v3.40-BETA-3 creates redirect.html containing META redirect to site (or external.html)
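When checking a mirror by hand, it can help to pull the wrapped URL back out of a rewritten link. The sketch below assumes only the "external.html?link=" form quoted in the notes above; the helper name original_target is hypothetical and is not part of HTTrack.

```python
# Prefix taken from the rewritten links quoted in the notes above.
EXTERNAL_PREFIX = "external.html?link="


def original_target(href):
    """Return the URL wrapped by an external.html link, or None (hypothetical helper)."""
    if href.startswith(EXTERNAL_PREFIX):
        return href[len(EXTERNAL_PREFIX):]
    return None


for href in ("external.html?link=http://www.httrack.com/", "htt.html"):
    print(href, "->", original_target(href))
```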
Form
Testing NEL on form action, plus email address in hidden field.
- v3.20RC6B with NEL=ON and NEL=OFF, all seems fine.
- v3.23-rc1 still looks fine.
<!-- Comments -->
Hyperlink and image should be ignored in this section
- Service unavailable results in link to non-existent image
- v3.23-rc1 points to local non-existent image.
- v3.40-BETA-3 rewrites extension and points to non-existent "nothing.html"
- <v3.20RC1 - This link will not be external with NEL=ON. External.html auto-adds "http://" so https links would not work
- v3.20RC1 actually uses "external.html?link=https://esn.gov.au/" but on the external.html page, the javascript rewrites it incorrectly.
- v3.20RC6B incorrectly removes the "https://" from links when NEL=OFF, but now handled by NEL=ON perfectly.
- v3.21-4 working correctly
- v3.23-rc1 looks fine.
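The "auto-adds http://" problem noted above can be illustrated with a small sketch. This is not the script from external.html. naive_target is an assumed reconstruction of the reported behaviour, and scheme_aware_target shows one possible fix; both names are hypothetical.

```python
from urllib.parse import urlsplit


def naive_target(link):
    """Always prefix "http://" (assumed reconstruction of the reported behaviour)."""
    return "http://" + link


def scheme_aware_target(link):
    """Only add "http://" when the link carries no scheme of its own."""
    if urlsplit(link).scheme:
        return link
    return "http://" + link


print(naive_target("https://esn.gov.au/"))         # broken: two schemes in a row
print(scheme_aware_target("https://esn.gov.au/"))  # https link left untouched
print(scheme_aware_target("esn.gov.au/"))          # bare host gets a default scheme
```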
Various protocol test links
- v3.20RC6B fails on mailto: links, seeing them as relative rather than a separate protocol.
- v3.21-4 working correctly
- v3.23-rc1 telnet:// is not external, but that's no problem.
- v3.30-rc8 same as v3.23-rc1
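The mailto: failure above comes down to telling a scheme apart from a relative path. A minimal sketch of that distinction follows, using Python's urlsplit. The link_kind helper, its category names, and the example.com addresses are assumptions for illustration and say nothing about how HTTrack itself classifies links.

```python
from urllib.parse import urlsplit


def link_kind(href):
    """Classify a link by its scheme (hypothetical categories)."""
    scheme = urlsplit(href).scheme.lower()
    if not scheme:
        return "relative"
    if scheme in ("http", "https"):
        return "web"
    return "other-protocol"


for href in ("mailto:someone@example.com", "telnet://example.com/",
             "https://example.com/", "images/picture.gif"):
    print(href, "->", link_kind(href))
```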
Badly-coded relative links
- v3.20RC6B appears to handle this well
- v3.21-4 succeeds with NEL=OFF, but not NEL=ON
- v3.23-rc1 appears to handle this with NEL ON and OFF
- v3.30-rc8 looks fine.
- v3.33 All three links have become "external.html?link=http://kauler.com/./"
- v3.40-BETA-3 nice, now all links become "external.html?link=http://kauler.com/"
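The change from "http://kauler.com/./" to "http://kauler.com/" matches what dot-segment removal would produce. The sketch below is a simplified version of that idea for absolute URL paths only. It is an illustration, not HTTrack's code, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path):
    """Drop "." and resolve ".." in an absolute path (simplified sketch)."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            # Never pop the leading empty segment that stands for the root "/".
            if len(out) > 1:
                out.pop()
            continue
        out.append(seg)
    # A trailing "." or ".." refers to a directory, so keep the final slash.
    if segments[-1] in (".", ".."):
        out.append("")
    return "/".join(out)


def normalise(url):
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


print(normalise("http://kauler.com/./"))
```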
Links with ampersands
Turn off "Hide query strings" option for this test.
- v3.33 Looks fine.
- v3.34-ALPHA-5 Extra test added.
- v3.40-BETA-3 Looks fine.
File extensions
Nothing, mp3, mid, avi, pdf, doc, xls, xml, xsl, swf, php, cfm, asp, gif, jpg, png, bmp, zip, rar, shtml, xhtm, htm, html
- v3.21-4 fails to make a number of these external, with NEL=ON
- v3.23-rc1 still handles this inconsistently.
- v3.30-rc8 same as v3.23-rc1
- v3.30-rc13 handles a lot better now.
- v3.40-BETA-3 all links are made external except gif, jpg and png, because those are requested in filters (*.gif, etc). However, the links are rewritten as "test.html", "test-2.html", "test-3.html".
Mis-matching file extensions and MIME-types
If a file has no extension, or its extension does not match the MIME-type sent, how will it be handled?
- v3.34-ALPHA-5 New test section.
- v3.40-BETA-3 creates a redirect page "directory.html" (hmm...), "text document.txt" (good), "zip.zip" (good), "zip-wrong.zip" (wrong), and "image-gif.gif" (good)
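The question in this section can be framed as a comparison between the type implied by the extension and the Content-Type the server sends. A minimal sketch of that comparison follows. The small EXPECTED_TYPES table, the extension_matches helper, and the sample file names are assumptions for illustration, not the files or logic used by this test.

```python
import os

# Tiny illustrative table (assumption); a real tool would know many more types.
EXPECTED_TYPES = {
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".zip": "application/zip",
    ".html": "text/html",
}


def extension_matches(filename, content_type):
    """True/False when the extension can be compared, None when it cannot."""
    ext = os.path.splitext(filename)[1].lower()
    expected = EXPECTED_TYPES.get(ext)
    if expected is None:
        return None  # no extension, or one this sketch does not know
    # Ignore parameters such as "; charset=utf-8".
    sent = content_type.split(";", 1)[0].strip().lower()
    return sent == expected


print(extension_matches("picture.gif", "image/gif"))
print(extension_matches("archive.zip", "text/html; charset=utf-8"))
print(extension_matches("directory", "text/html"))
```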
Dodgy Microsoft HTML
When a Word document containing images is saved as a web page, the resulting HTML can be riddled with dodgy code.
If HTTrack is unable to find the image in <v:imagedata> then Internet Explorer will fail to display an image.
The problem is that the "image" is hidden within comment tags.
<!--[if gte vml 100]>
<v:imagedata src="images/dot_red_dk.gif" o:title="sport strip 2"/>
<![endif]-->
- v3.30-rc15 does not find image because it is contained within comments.
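For comparison, here is a sketch of how a checker could look inside conditional comments for src attributes, using the snippet quoted above as input. The regular expressions and the function name are assumptions for illustration. This is not how HTTrack parses pages.

```python
import re

# Matches <!--[if ...]> ... <![endif]--> and captures the body in between.
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>(.*?)<!\[endif\]-->", re.DOTALL | re.IGNORECASE)
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def images_in_conditional_comments(html):
    """Collect src values found inside conditional comments."""
    found = []
    for body in CONDITIONAL_COMMENT.findall(html):
        found.extend(SRC_ATTR.findall(body))
    return found


sample = '''<!--[if gte vml 100]>
<v:imagedata src="images/dot_red_dk.gif" o:title="sport strip 2"/>
<![endif]-->'''
print(images_in_conditional_comments(sample))
```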
"Clever parsing": assorted Javascript and CSS tests
There are many tricky code situations where HTTrack may need to try to find "hidden" links within CSS or Javascript.
Xavier's function foo()
Tests the parsing of javascript to gather images.
- v3.33 attempts to download get_this_one[1-4].gif; get_this_one5.gif is not discovered.
- v3.40-BETA-3 same as v3.33, however their extensions are now rewritten from .gif to .html in the page.
Javascript: function argument
The following javascript code is embedded in the page here:
<script language="javascript">
function imageURL(action) {
}
</script>
- v3.23-rc1 incorrectly rewrites to
function imageURL(action.html)
- v3.30-rc8 no longer problem.
(Empty Reference!)
This stupid Adobe GoLive message can appear in numerous places on a web page.
Should HTTrack attempt to get a background URL that looks like this?
<blockquote background="(Empty Reference!)"></blockquote>
- v3.23-rc1 becomes
background="(empty%20reference%21).html"
-- a link to a non-existent file.
- v3.30-rc8 incorrect like v3.23-rc1
- v3.30-rc13 looks fine.
CSS: Background applied to a DIV's style attribute
Test in this DIV to see if HTTrack detects the image.
<div style="background-image: url(../../misc/httrack/images/sto_bg.gif); background: url(../../misc/httrack/images/sto_bg.gif);">
- v3.30-rc13 does not find the image in this <div> but will find it in the CSS block in the <head>.
- v3.30-rc15 finds all occurrences.
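To confirm which URLs a style attribute like the one above contains, a short sketch such as the following can be used. The regular expression is an assumption that covers quoted and unquoted url(...) values. It is not HTTrack's parser.

```python
import re

# Captures the value inside url(...), with or without surrounding quotes.
CSS_URL = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""", re.IGNORECASE)


def css_urls(style):
    """Return every url(...) value found in a CSS declaration string."""
    return CSS_URL.findall(style)


style = ("background-image: url(../../misc/httrack/images/sto_bg.gif); "
         "background: url(../../misc/httrack/images/sto_bg.gif);")
print(css_urls(style))
```

Both declarations point at the same file, so the list holds the same path twice.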
Javascript: Link graphic with rollover image change
This test is "fake", only containing example rollover code in the A tag, to see if HTTrack detects the images.
- v3.30-rc13 does not find the rollover image -- Invalid test due to my error.
- v3.30-rc15 finds the image.
Javascript: Preloaded images in <body> tag
It's very common to see code similar to the following on a website.
Fake test example included in this page's body, to see if HTTrack detects the images.
onLoad="MM_preloadImages('images/dot_darkonwhite.gif'"
- v3.30-rc13 does not find the images in the <body> tag -- Invalid test due to my error.
- v3.30-rc15 finds the images.
Javascript: found a slash, thinking it's part of a URL
HTTrack has been known to find code like:
document.open("text/html");
and rewrite to:
document.open("../html.html");
- v3.30-rc13 writes as
document.open('html.html');
- v3.30-rc15 no problems.
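One way to avoid mistaking "text/html" for a path is to recognise MIME-type strings before treating a slash as part of a URL. The sketch below shows that heuristic under an assumed list of common top-level media types. The helper is hypothetical, and a real directory named "text" or "image" would be misclassified by it.

```python
# Common top-level media types (assumed list for this sketch).
TOP_LEVEL_TYPES = {
    "application", "audio", "font", "image", "message",
    "model", "multipart", "text", "video",
}


def looks_like_mime_type(value):
    """Heuristic: "type/subtype" with a known top-level type and a single slash."""
    top, sep, sub = value.partition("/")
    return (sep == "/" and top.lower() in TOP_LEVEL_TYPES
            and sub != "" and "/" not in sub)


for value in ("text/html", "../html.html", "images/dot_darkonwhite.gif"):
    print(value, "->", looks_like_mime_type(value))
```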
Javascript: variable concatenation
Objects may be used in javascript... will HTTrack look "too hard" and find things like ".bgImageUp" thinking it is a file extension?
The following code is in javascript in the <head> section.
"url(" + a.Menu.bgImageUp +")"
'style="background:url('+menu.bgImageUp+');'
- v3.30-rc13 writes things like
" +%20a.menu.html")
and ('+menu.html')
- v3.30-rc15 no problems.
CSS: @import
This is a two-level CSS test. This page contains
@import url(cssimage.css);
and that file itself contains
@import url(cssimage_advanced.css);
A file called "cssimage.jpg" should be downloaded and appear top-right of the page. Should also find "toy_gold_32.gif"
- v3.34-ALPHA-5 New test.
- v3.40-BETA-3 Finds both images and correctly changes CSS code.
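The two-level lookup this test exercises can be sketched as a recursive walk over @import rules. The stylesheet bodies below are invented stand-ins, since the real contents of cssimage.css and cssimage_advanced.css are not shown here. Which file references which image is an assumption, and the sketch does not resolve relative paths.

```python
import re

URL_RE = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""", re.IGNORECASE)
IMPORT_RE = re.compile(
    r"""@import\s+url\(\s*['"]?([^'")]+?)['"]?\s*\)""", re.IGNORECASE)


def collect_assets(name, sheets, seen=None):
    """Follow @import rules from sheets[name] and list every other url(...) value."""
    if seen is None:
        seen = set()
    if name in seen or name not in sheets:
        return []  # already visited (import loop) or stylesheet not available
    seen.add(name)
    css = sheets[name]
    imports = IMPORT_RE.findall(css)
    assets = [u for u in URL_RE.findall(css) if u not in imports]
    for imported in imports:
        assets.extend(collect_assets(imported, sheets, seen))
    return assets


# Invented stylesheet bodies (assumption), keyed by file name.
sheets = {
    "page": "@import url(cssimage.css);",
    "cssimage.css": ("@import url(cssimage_advanced.css);\n"
                     "body { background: url(cssimage.jpg) no-repeat top right; }"),
    "cssimage_advanced.css": "li { list-style-image: url(toy_gold_32.gif); }",
}
print(collect_assets("page", sheets))
```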