Sorry about the repulsive names, but that's the Java standard (or was, when these were written and one was supposed to use COM in all caps).
First, then, you download the program and put it on your classpath, however that's done in your system.
When you've got it installed, go to some directory in which you won't
mind getting new stuff stored by the program. Execute the program by,
for instance,
java COM.dandrake.wlv.PressMyButtons
You will get a dialogue in the middle of your screen.
In the text entry area at the top, enter the full URL for some simple, convenient page, like your home page. Click on the Mirror button to enable the mirroring feature. Leave the rest alone, and click Go.
Two windows will appear, hogging much of your screen. The one on the left, with the label "Error listing", will display any problems found with the page, like dead pointers, #reference items with no anchor defined, and so on. Since your site naturally is error-free, this window will show nothing until it says "Finished" when the job is done.
The other new window, "Progress log", will show what's currently happening. These messages will change at my whim, but they're likely to show what's being loaded from the Internet at any moment when nothing seems to be happening.
When the job is done, or before then, look at your working directory. It will have a new subdirectory named after your Website host, and if you follow the directory structure, you'll find a copy of the page you called for. Any pages that it incorporates (pages and images, but not hyperlinked pages) will also be copied into their proper place in the directory structure; anything that's on a different host will be in a directory named after the host.
Now try mirroring that page plus everything that it directly links to on the same site. In the Depth field, enter 1. While you're up, click on the "Check all local pages" box, if you want it to check your whole site for bad pointers and the like. Then click Go. (A new pair of result windows will replace the old ones. By the way, it does no harm to close those windows at any time, but of course you'll lose the data.)
It won't load your home page a second time, because you already have a copy. But it will scan it for links, and load the things it points to, provided they're on the same site. Set Depth to 2, and it will go 2 links away; set Breadth to 1, and it will load from other sites that you link to, to the same total Depth.
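To make the Depth setting concrete, here is a minimal sketch of a depth-limited, same-site walk over a link graph. It is not the program's actual code: the class name DepthSketch, the method pagesToLoad, and the in-memory map of links are all assumptions made for illustration, and the Breadth setting and incorporated images are deliberately left out.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of WebLinkValid or PressMyButtons.
final class DepthSketch {
    /** A queued page together with its link distance from the start page. */
    private static final class Item {
        final String url;
        final int depth;
        Item(String url, int depth) { this.url = url; this.depth = depth; }
    }

    /**
     * Returns the start page plus every same-host page at most maxDepth
     * links away, visiting pages breadth-first so each is seen at its
     * shortest distance. The links map stands in for real HTML scanning.
     */
    static Set<String> pagesToLoad(Map<String, List<String>> links,
                                   String start, int maxDepth) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<Item> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(new Item(start, 0));
        while (!queue.isEmpty()) {
            Item item = queue.remove();
            if (item.depth >= maxDepth) {
                continue; // already at the limit: do not follow its links
            }
            List<String> targets =
                links.getOrDefault(item.url, Collections.<String>emptyList());
            for (String target : targets) {
                // Off-site links are skipped here (assumption: Breadth is 0).
                if (sameHost(start, target) && seen.add(target)) {
                    queue.add(new Item(target, item.depth + 1));
                }
            }
        }
        return seen;
    }

    /** True when both absolute URLs name the same host, ignoring case. */
    static boolean sameHost(String a, String b) {
        String hostA = URI.create(a).getHost();
        String hostB = URI.create(b).getHost();
        return hostA != null && hostA.equalsIgnoreCase(hostB);
    }
}
```

With Depth 0 the sketch returns only the start page; with Depth 1 it adds the pages that the start page links to directly, and so on.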
Want to save the list of problems? Pull down the File menu on a result
window, and browse for a file to save to. If you select an existing
file, you'll get to decide whether to overwrite the file, append to it,
or forget it.
What the Programs Are About
PressMyButtons has two functions, which it can execute independently or at the
same time.
It's pretty obvious why you might want a program that checks your site
for links that don't link to anything. This type of function is
available in some commercial web builders, but here's
a standalone version for anyone who wants one. And the program has two
other features that are not found in other programs I know of.
java COM.dandrake.wlv.PressMyButtons
In the text area you can enter either a full URL or a simple filename with or without a directory path. Or you can press the "Or choose a file" button to bring up a file dialog.
Now, having aimed the program at something simple for a first test, hit the GO button. Two scrollable windows will pop up: one shows a string of progress messages so that you know what's going on, while the other shows all the errors that are found. When the job is finished, the Progress window displays "DONE", and the main window gets the GO button re-enabled. You can stop the scan by pushing the STOP button; the result windows will remain, with the progress window showing "STOPPED". You can terminate the whole program at any time with the Cancel button.
The operation of the program, as distinct from its user interface, is described a little more precisely in the section on WebLinkValid.
A local file is one that has a simple directory/file reference rather
than a full URL. For instance, foo or foo/bar or ../foo/bar#baz. If the
option "Check absolute refs" is checked, it will also check pages
pointed to by full URLs, so long as they appear to be on the same host.
See Check absolute refs. The limitations are intended to keep the
program from checking the validity of everything on the World-Wide Web;
we leave that sort of thing to Alta Vista and Google.
If the box is not checked, then only the named file will be fully
checked. Other files may still be scanned for the existence of names in
...#name references; see the next section.
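The distinction between local and absolute references can be sketched as follows. This is only an illustration under stated assumptions, not the program's code: the class RefKind and both method names are made up, and "local" is taken to mean "has no protocol: prefix", which would misjudge a DOS path such as C:\foo.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of WebLinkValid or PressMyButtons.
final class RefKind {
    // A protocol prefix such as "http:" or "file:" at the start of the reference.
    private static final Pattern SCHEME =
        Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*:");

    /** True for references like foo, foo/bar, or ../foo/bar#baz. */
    static boolean isLocal(String href) {
        return !SCHEME.matcher(href).find();
    }

    /**
     * Decides whether a referenced page would be checked: local references
     * always, absolute ones only when the option is on and the host matches
     * the host where checking started.
     */
    static boolean shouldCheck(String href, String startUrl,
                               boolean checkAbsoluteRefs) {
        if (isLocal(href)) {
            return true;
        }
        if (!checkAbsoluteRefs) {
            return false;
        }
        try {
            String host = URI.create(href).getHost();
            String startHost = URI.create(startUrl).getHost();
            return host != null && host.equalsIgnoreCase(startHost);
        } catch (IllegalArgumentException e) {
            return false; // not parseable as a URI, so leave it alone
        }
    }
}
```

For example, shouldCheck("../foo/bar#baz", start, false) is true for any start URL, while an http: reference to another host is never checked.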
#someNameOrOther. It will log an error if someNameOrOther is not in the file.
If the box is not checked, then files will be scanned only if required by "Check all local pages", above.
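A check of this kind, that a #name is actually defined in the target file, might look like the sketch below. It swaps in a regular expression for the HTML tokenizer the program really uses, so it is cruder: it can be fooled by the text name= inside another attribute's value. The class AnchorNames and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; the real program uses an HTML tokenizer.
final class AnchorNames {
    // Matches the name attribute of an <A> tag, with a double-quoted,
    // single-quoted, or unquoted value (captured in groups 1, 2, or 3).
    private static final Pattern NAME_ATTR = Pattern.compile(
        "<a\\b[^>]*?\\bname\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    /** Collects every anchor name defined in the given HTML text. */
    static Set<String> definedNames(String html) {
        Set<String> names = new HashSet<>();
        Matcher m = NAME_ATTR.matcher(html);
        while (m.find()) {
            for (int group = 1; group <= 3; group++) {
                if (m.group(group) != null) {
                    names.add(m.group(group));
                }
            }
        }
        return names;
    }

    /** True when the fragment (the part after '#') is defined in the page. */
    static boolean isDefined(String html, String fragment) {
        return definedNames(html).contains(fragment);
    }
}
```

A reference to page.html#someNameOrOther would then be logged as an error whenever isDefined(pageText, "someNameOrOther") is false.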
When this option is on, pages are also checked if the reference is absolute, provided that the host is the same as the one on which the checking started. This can be problematic with large ISPs that host numerous Web sites, but it's unlikely to cause serious trouble, maybe.
If the reference to a file is via http:, the file will be loaded into a
directory tree named after the host. If it's from file:, and it's on a
DOS-like file system, the base of the tree will be named after the unit
letter, as in CCoLoN. There is no option to condense the directory tree
structure and load all files into one directory.
The things to be loaded are found in the checking process. Since it won't look at an off-site page to find references, there's no danger of trying to mirror the entire Web.
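The naming of the mirror tree can be sketched like this. The class MirrorPath and the method localPath are hypothetical, and the handling of a file: URL without a drive letter is a pure assumption, since the text above only describes the http: and DOS-like cases.

```java
import java.net.URI;

// Hypothetical illustration only; not part of WebLinkValid or PressMyButtons.
final class MirrorPath {
    /**
     * Maps an absolute URL to a relative path under the working directory:
     * http: files go under a directory named after the host, and file:
     * paths with a drive letter go under <letter>CoLoN.
     */
    static String localPath(String url) {
        URI uri = URI.create(url);
        String scheme = uri.getScheme();
        String path = uri.getPath();
        if ("http".equalsIgnoreCase(scheme)) {
            return uri.getHost() + path;
        }
        if ("file".equalsIgnoreCase(scheme)) {
            // file:///C:/docs/a.html has the path "/C:/docs/a.html".
            if (path.matches("^/[A-Za-z]:.*")) {
                return path.charAt(1) + "CoLoN" + path.substring(3);
            }
            // Assumption: without a drive letter, just drop the leading slash.
            return path.substring(1);
        }
        throw new IllegalArgumentException("Unsupported reference: " + url);
    }
}
```

Under these assumptions, http://www.dandrake.com/wlv/wlv.html lands in www.dandrake.com/wlv/wlv.html, and file:///C:/docs/a.html lands in CCoLoN/docs/a.html.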
As of November, 2000, it also has a mirroring capability. See under the
-m option. See, more relevantly, the PressMyButtons documentation.
The names can be proper URLs (of type http: or file:) or simple file
names. It generates two outputs, on stdout and stderr.
href and name attributes in <A> tags (ordinary hyperlinks) and <AREA>
tags (maps). Wherever an href is of a type that the JDK URL class
understands (a local reference or file: or http:), it tries to validate
the reference as follows:
#name spec, and the file is *.html or *.htm, scan it to check that the name is defined.
protocol: prefix), find and test all its references, recursively.
It doesn't know how to check the validity of a reference to a directory
(http://www.nobody.glump/foo/bar/) because JDK 1.0 doesn't know how, so
it ignores them. If you leave off the / it will complain that the file
doesn't exist. It's a worse bummer that the program can't verify ftp:
links, but that's the way it is.
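The three cases just described (checkable, ignored directory, unsupported protocol) can be summed up in a small sketch. The class RefClassifier, its Kind values, and the trailing-slash test are assumptions made for illustration; the program's own logic may well differ.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of WebLinkValid or PressMyButtons.
final class RefClassifier {
    enum Kind { CHECK, IGNORE_DIRECTORY, UNSUPPORTED }

    // Captures a leading protocol name such as "http" or "ftp".
    private static final Pattern SCHEME =
        Pattern.compile("^([A-Za-z][A-Za-z0-9+.-]*):");

    /** Sorts a reference into one of the three cases described above. */
    static Kind classify(String href) {
        Matcher m = SCHEME.matcher(href);
        if (m.find()) {
            String scheme = m.group(1).toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("file")) {
                return Kind.UNSUPPORTED; // for example ftp:
            }
        }
        // Drop any #name part before looking at the end of the path.
        int hash = href.indexOf('#');
        String target = hash >= 0 ? href.substring(0, hash) : href;
        if (target.endsWith("/")) {
            return Kind.IGNORE_DIRECTORY; // cannot be verified, so skipped
        }
        return Kind.CHECK;
    }
}
```

So http://www.nobody.glump/foo/bar/ is ignored, while http://www.nobody.glump/foo/bar is checked as a file and reported if it does not exist.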
Suggestions will be appreciated unless they're obvious known problems, like "You should give it a decent interface and maybe offer it as an applet."
-m option
With the -m option on the command line before any URL spec, the program
will download every referenced file on the same host.
Specifically,
.html or .htm, it examines that file in the same way. The order in
which it looks at things is unspecified.
-m option is on and the file is on the same host as the original file.
Note that the reference need not be local in the sense just given, and
the file may or may not be scanned for contents. The file may be of any
type.
file: protocol), the local directory structure is mirrored. On DOS-ish
filesystems, names like C: are rendered as CCoLoN.
-c (clobber) option is on.
ftp: files.
% sign are considered to be queries of some kind, and are stripped to
the part preceding the %. This is a naïve approach, but works for the
moment. Its crudeness may cause problems on Macs. If so, let us reason
together.
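The stripping rule is simple enough to show in a few lines. This is a sketch of the rule as stated, not the program's code; the class QueryStrip and the method stripQuery are hypothetical names.

```java
// Hypothetical illustration only; not part of WebLinkValid or PressMyButtons.
final class QueryStrip {
    /** Keeps only the part of a name that precedes the first % sign. */
    static String stripQuery(String name) {
        int percent = name.indexOf('%');
        return percent < 0 ? name : name.substring(0, percent);
    }
}
```

One consequence of a rule like this is that a name containing a percent escape, such as my%20file.html, is cut down to my.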
adc/parser, an HTML tokenizing package written by Arthur Do, available
at http://www-cs-students.stanford.edu/~do/. The classes are © 1997 and
are used under the terms of the license.
The WebLinkValid program itself is freeware, but distribution is restricted under the copyright in the preceding paragraph. Enjoy.
Document: http://www.dandrake.com/wlv/wlv.html