The desire to do such a thing will seem eccentric to applet writers. My concern, though, is with using Java as a general-purpose language.
The names foo and foo/bar, for instance, are not reliable filenames, because their meaning depends on the current working directory. By contrast, the Unix names /foo and /foo/bar, as well as the DOS names c:\foo and e:\foo\bar, are reliable.
We want a function canonize that has the following characteristic: if name1 names some unique file F1, and name2 ditto F2, then canonize(name1) is the same as canonize(name2) if and only if F1 and F2 are the same thing.
We use plain English words, rather than = signs and == signs and .equals() methods, to distance the definition from any implementation details. Likewise, we avoid the word "identical," which used to mean "the very same entity" but now has no particular meaning that's generally accepted.
And in some cases, they are the same: the isAbsolute() method will return false for foo/bar, and the getAbsolutePath() method will return a reliable version of that name, perhaps /user/baz/foo/bar.
Similarly, in Unix, the name /foo/bar is absolute, and Java will tell you so, and getAbsolutePath() will apply the identity transformation to the name: it won't change it.
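A minimal sketch of the Unix behavior just described, assuming the program runs on a Unix JVM; the class name UnixAbsolute is made up for illustration. The relative name is resolved against the user.dir value, so the second line printed depends on where the program is started.

```java
import java.io.File;

// Hypothetical demo class; assumes a Unix system ('/' separator).
class UnixAbsolute {
    public static void main(String[] args) {
        File relative = new File("foo/bar");
        File absolute = new File("/foo/bar");

        // Relative name: not absolute, and getAbsolutePath() prefixes
        // the working directory recorded in user.dir.
        System.out.println(relative.isAbsolute());      // false
        System.out.println(relative.getAbsolutePath()); // e.g. /user/baz/foo/bar

        // Absolute Unix name: getAbsolutePath() leaves it unchanged.
        System.out.println(absolute.isAbsolute());      // true
        System.out.println(absolute.getAbsolutePath()); // /foo/bar
    }
}
```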
Regrettably, the ultimate machine-independent language was designed by Unix weenies, and therefore doesn't consider DOS worthy of the effort to make it work right. Plainly, the DOS filename \foo\bar is not reliable, or in any possible meaningful sense absolute. But Java happily classes it as absolute, and lets getAbsolutePath() do nothing to it. This behavior is not a stupid bug in some implementation: it's documented in a book published by Javasoft, with Gosling as co-author.
By the way, this is an unmistakable violation of the language spec (section 22.24.15, etc.), also co-authored by James Gosling. Nobody's perfect.
Likewise, the file c:foo is absolute in no way except Java's treatment of it.
To get these things right when working in the machine-dependent code for a particular Java implementation would be an hour's implementation work -- if you worked cautiously and had forgotten the details of the proper system calls, and had to take time looking them up. Otherwise it would be quicker. No, that's wrong. What system calls? If the first character is backslash or the second is colon and the third is not backslash, then grab some bytes from user.dir. You could take an hour to do that if you had a nice coffee break in the middle, provided the nearest Starbucks is several blocks away.
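That rule can be sketched in a few lines of string handling. DosNames.makeReliable is a hypothetical helper, not part of any library, and the sketch assumes the user.dir value is itself a reliable DOS name such as c:\work\proj. It reads nothing from the operating system beyond that startup value, so it keeps to the purely syntactic character of File. One gap the heuristic cannot close: user.dir records the current directory of only one drive, so a name like e:foo on some other drive comes back empty.

```java
import java.util.Optional;

// Hypothetical helper sketching the user.dir heuristic for DOS-style names.
class DosNames {
    // userDir is assumed to be a reliable DOS name such as "c:\\work\\proj",
    // e.g. System.getProperty("user.dir") on a DOS machine.
    static Optional<String> makeReliable(String name, String userDir) {
        // UNC names such as "\\\\server\\share\\x" are left alone.
        if (name.startsWith("\\\\")) {
            return Optional.of(name);
        }
        // Already reliable: drive letter, colon, backslash, e.g. "e:\\foo\\bar".
        if (name.length() >= 3 && name.charAt(1) == ':' && name.charAt(2) == '\\') {
            return Optional.of(name);
        }
        String drive = userDir.substring(0, 2); // e.g. "c:"
        // First character is a backslash: rooted, but on the current drive.
        if (name.startsWith("\\")) {
            return Optional.of(drive + name);
        }
        // Second character is a colon, third is not a backslash (e.g. "c:foo"):
        // relative to the current directory of that particular drive.
        if (name.length() >= 2 && name.charAt(1) == ':') {
            if (name.substring(0, 2).equalsIgnoreCase(drive)) {
                return Optional.of(join(userDir, name.substring(2)));
            }
            // user.dir tells us nothing about other drives.
            return Optional.empty();
        }
        // Plain relative name such as "foo\\bar".
        return Optional.of(join(userDir, name));
    }

    private static String join(String dir, String rest) {
        return dir.endsWith("\\") ? dir + rest : dir + "\\" + rest;
    }

    public static void main(String[] args) {
        String userDir = "c:\\work\\proj"; // assumed startup directory
        System.out.println(makeReliable("\\foo\\bar", userDir)); // Optional[c:\foo\bar]
        System.out.println(makeReliable("c:foo", userDir));      // Optional[c:\work\proj\foo]
        System.out.println(makeReliable("e:foo", userDir));      // Optional.empty
    }
}
```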
Fixing this would not require violating the almost-explicit provision of the spec that the operation of File is purely syntactic and may not dynamically call for information from the operating environment beyond that which was read at startup. You must understand that provision, lest you think that the getParent() method is anything other than purely syntactic, or maybe I should say lexical. What is the parent of /foo/bar/baz/..? According to Java, it's /foo/bar/baz. This is a useful answer, but one that calls for a warning to the user who might reasonably hope to get the actual parent directory when calling getParent(). Determining the actual parent is an interesting task. On DOS, you can get the answer syntactically: /foo. On Unix, symbolic links make matters more challenging. Which leads us more or less naturally to canonical names.
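To make the contrast concrete, here is a sketch that sets File.getParent() beside a purely lexical "actual parent" which collapses . and .. before taking the parent. LexicalParent and its methods are hypothetical names, and the sketch handles only /-separated absolute names. It yields /foo for /foo/bar/baz/.., the DOS-style answer; on Unix it can be wrong whenever a component such as baz is a symbolic link.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; assumes '/'-separated absolute names and no links.
class LexicalParent {
    // Collapse "." and ".." textually. Sound only when no component is a
    // symbolic link, which is the DOS situation, not the Unix one.
    static String normalize(String name) {
        Deque<String> parts = new ArrayDeque<>();
        for (String part : name.split("/")) {
            if (part.isEmpty() || part.equals(".")) continue;
            if (part.equals("..")) {
                if (!parts.isEmpty()) parts.removeLast(); // ".." at the root stays there
            } else {
                parts.addLast(part);
            }
        }
        return "/" + String.join("/", parts);
    }

    // Parent after normalization; the root is its own parent.
    static String actualParent(String name) {
        String n = normalize(name);
        int slash = n.lastIndexOf('/');
        return slash <= 0 ? "/" : n.substring(0, slash);
    }

    public static void main(String[] args) {
        String name = "/foo/bar/baz/..";
        System.out.println(new File(name).getParent()); // /foo/bar/baz on Unix
        System.out.println(actualParent(name));          // /foo
    }
}
```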
Canonical names, of course, are harder than reliable ones. Under DOS you have a reasonable shot at making a canonical name from a merely reliable one, by converting everything to a consistent case and editing out . and .. names. Without links, either hard or symbolic, this ought to work right.
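That DOS recipe can be sketched as follows. DosCanonical.canonize is a hypothetical helper; it assumes its input is already a reliable name of the form drive letter, colon, backslash, and, like the text, it relies on DOS having no links so that textual editing is sound. Lower case is an arbitrary choice of consistent case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; input must look like "c:\\..." (already reliable).
class DosCanonical {
    static String canonize(String reliable) {
        // Fold the drive letter and every component to one case.
        String drive = reliable.substring(0, 2).toLowerCase(Locale.ROOT); // "c:"
        List<String> parts = new ArrayList<>();
        // Skip "c:\" and split the rest on single backslashes.
        for (String part : reliable.substring(3).split("\\\\")) {
            if (part.isEmpty() || part.equals(".")) continue;
            if (part.equals("..")) {
                if (!parts.isEmpty()) parts.remove(parts.size() - 1);
            } else {
                parts.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return drive + "\\" + String.join("\\", parts);
    }

    public static void main(String[] args) {
        System.out.println(canonize("C:\\Foo\\.\\BAR\\..\\baz")); // c:\foo\baz
    }
}
```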
For Unix, the job may not even be possible in the general case. On any system at all, you have a reasonable shot at a canonical filename with the simple procedure cd name1; pwd or equivalent. On Unix that will solve the problem for a lot of practical cases (and, yes, there are practical cases that need to make such a determination), but I can't say whether it will always work.
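From a Java program, the nearest way to borrow that procedure is to hand it to a shell, which of course abandons machine independence. A sketch assuming a Unix system with /bin/sh; ShellPwd.cdPwd is a hypothetical name. The -P options ask cd and pwd for the physical directory, with symbolic links resolved, rather than the path as typed. The name is passed as $1 instead of being pasted into the command string, so spaces or quotes in it cannot alter the command.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; Unix only, requires /bin/sh.
class ShellPwd {
    // Runs `cd -P -- "$1" && pwd -P` with $1 = name, returns what pwd prints.
    static String cdPwd(String name) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "/bin/sh", "-c", "cd -P -- \"$1\" && pwd -P", "sh", name);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // let cd's complaints show
        Process p = pb.start();
        String line;
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            line = out.readLine();
        }
        if (p.waitFor() != 0 || line == null) {
            throw new IOException("cannot cd to " + name);
        }
        return line;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(cdPwd(args.length > 0 ? args[0] : "."));
    }
}
```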
Now here's where Java really comes into its own: you can't do that with Java code. You've got user.dir to find the working directory from which Java was invoked, and that name really is a reliable one and probably a useful form of canonical name. But you can't change directories; and if you could, you still couldn't ask for the system's version of the name of the new directory.
We see now that there is one case in which we can get a reliable name that presumably is also canonical: the directory from which the program was executed. And there are also all its parent directories, all the way to the root. If you force the user to execute the program from within a particular directory, you're all set, so long as you aren't interested in any other directory.
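That one reliable starting point, and its ancestors, can be listed using nothing but the startup property and the textual getParentFile(). A sketch; StartupDirs is a hypothetical name. Because getParentFile() is lexical, the list is only as trustworthy as user.dir itself.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: user.dir followed by each ancestor up to the root.
class StartupDirs {
    static List<String> startupDirAndAncestors() {
        List<String> dirs = new ArrayList<>();
        // getParentFile() strips the last name textually; it returns null
        // once the root has been reached.
        for (File f = new File(System.getProperty("user.dir")); f != null; f = f.getParentFile()) {
            dirs.add(f.getPath());
        }
        return dirs;
    }

    public static void main(String[] args) {
        for (String d : startupDirAndAncestors()) {
            System.out.println(d);
        }
    }
}
```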
If for some reason you want to let the user name some other directory, you would never let anyone type in a filename; that's old-fashioned, and implies that you could possibly write a command-line program and completely bypass the elegant design of AWT. And if you do allow a name to be typed in, you could insist that the user type an entire, reliable filename. Except, of course, that Java gives you no accurate and machine-independent way of distinguishing reliable from evanescent filenames to enforce that requirement.
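Enforcing that requirement by hand means writing the test separately for each convention. A hypothetical sketch covering only the Unix and DOS conventions discussed here, judged purely by syntax: on Unix a name is reliable when it starts with /, and on DOS when it starts with a drive letter, colon and backslash. Accepting UNC names (\\server\share) is an added assumption, and DOS names written with forward slashes are not handled. Unlike isAbsolute() as described above, it rejects \foo\bar and c:foo.

```java
import java.io.File;

// Hypothetical sketch; knows only the Unix and DOS naming conventions.
class Reliability {
    static boolean isReliable(String name, char separator) {
        switch (separator) {
            case '/':
                // Unix: only rooted names are independent of the working directory.
                return name.startsWith("/");
            case '\\':
                // DOS: drive letter + ":\", or a UNC name.
                boolean driveRooted = name.length() >= 3
                        && Character.isLetter(name.charAt(0))
                        && name.charAt(1) == ':'
                        && name.charAt(2) == '\\';
                return driveRooted || name.startsWith("\\\\");
            default:
                throw new IllegalArgumentException("unknown convention: " + separator);
        }
    }

    // Uses the separator of the platform the program is running on.
    static boolean isReliable(String name) {
        return isReliable(name, File.separatorChar);
    }

    public static void main(String[] args) {
        System.out.println(isReliable("\\foo\\bar", '\\')); // false
        System.out.println(isReliable("c:\\foo", '\\'));    // true
    }
}
```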
Copyright (C) 1998 Daniel Drake. A royalty-free license to reproduce this document in whole or in part is hereby granted provided (i) all additions, omissions, and other changes are clearly marked; (ii) the work is not reproduced as, or as part of, a work for which payment is charged; (iii) this notice is reproduced without change. Quotations for critical or polemical purposes, with proper attribution, are permitted in any case, being obviously fair use.