John's HTML Tutor (indexed)

Note that this is all on one long thin loo-roll, and the hypertext links just jump up and down it (unless preceded by a symbol or "[Web]") -- this means that you don't have to be on-line to read it.
[Home] Return to my Home Page . . . [Index] Down to Alphabetic Index



What is HTML?

HTML is the acronym for HyperText Mark-up Language. The significance of HyperText is that key-words can be Marked-up to refer to other documents and sources. An HTML file is just a plain text file; in fact, it is very plain text in that it only uses the ASCII characters 32 to 126, but no top-bit-set characters. However, it does allow you to "mark" up the text to indicate where a fancy character should appear.

When an HTML page is viewed, or rendered, the reader's browser will attempt to reproduce as many of the fancy bits as it can; but the author of the page has no way of knowing what the abilities or limitations of the reader's browser might be. One thing that is important to appreciate, is that HTML is not a Page Description Language for doing DTP work: it does offer a degree of control over the final page appearance, but can never give total control. A further point is that, in principle, a browser might be text-only (no graphics), or could be a text-to-speech or text-to-braille converter; regrettably, far too many HTML-authors go for lots of gee-whiz stuff which would then be quite uselessly unreadable!

Another significance of "Mark-up" (which arises from SGML -- Standard Generalized Markup Language -- from which HTML is derived) is that it enables portions of text to be identified as, eg "this word is the author's surname", or "this phrase is a citation", or "this is a key-word, which should be referenced in the index" etc; ie descriptions which have nothing to do with the way the text actually appears on the page!

There is an international standards body for HTML:
it is the [Web] World Wide Web Consortium (W3C), hosted by MIT in the USA and INRIA in France (the Web itself originated at CERN in Geneva). HTML 2.0 is the basic standard, and revisions are an on-going activity: a draft version of HTML 3.0 was published, but "expired" without being finalised; and the current draft is HTML 3.2. I will try to keep to describing HTML 2.0 as far as possible, and note any extensions to it which creep in.

[^Top^] Up to Top of Page . . . [Index] Down to Alphabetic Index


Basic Terminology

A "Tag" is an HTML keyword enclosed in angle brackets (for which the mathematical less-than and greater-than < > symbols are used).
A "Start Tag" is a tag like <BODY> and an "End Tag" is one like </BODY>.
An End Tag is always the same as its matching Start Tag but with a preceding "/".
An "Element" is something (anything) between a matching pair of Start and End Tags,
eg <H2>Chapter One</H2>.
Note that not all Start Tags need a matching End Tag; for example, <BR> and <IMG> don't.
An "Attribute" is the equivalent of a "parameter" inside a Tag, for example <DL COMPACT>
or <A HREF="page2">.

The "tags" (and "keywords") may be in either upper-case or lower-case; but you should not use a mixture of the two, as some browsers are liable to throw a wobbly!
I have used upper-case throughout: a habit inherited from BASIC programming, where the variables and strings are (predominantly) lower-case, and the BASIC keywords are always upper-case, and therefore readily distinguishable; however, you may prefer to stick to lower-case throughout to reduce risk of RSI on whichever finger/thumb/hand/foot you use to press the SHIFT key!

An "Anchor" is either of the two ends of a HyperText Link.
<A HREF="page2#para5"> is called a "Tail Anchor" or "Source",
and <A NAME="para5"> is a "Head Anchor" or "Destination".
A "Hyperlink" is deemed to join them from the Tail to the Head.

An "Entity" is a coded representation of an otherwise-unavailable character, eg &eacute; or &#163;
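
To tie the terms above together, here is a small made-up fragment (the file name page2.html and the wording are purely illustrative):

   <H2>Chapter One</H2>
   <A HREF="page2.html">Caf&eacute; prices from &#163;5</A><BR>

Here <H2> and </H2> are a Start Tag and an End Tag enclosing an Element; HREF= is an Attribute of the <A> tag; &eacute; and &#163; are Entities (e-acute and the pound sign); and <BR> is a tag that needs no End Tag.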

A "User Agent" is what W3C call "Client-Side Software" which is what everybody else calls "my Browser"!
A "Server" is the hardware on which a web site resides and the software there which enables it to be accessed by "visitors" (anyone trying to view the site).
"CGI" stands for "Common Gateway Interface" and is software on/in the Server to process requests such as those generated by links to images of type ISMAP and certain types of Forms submission. The average HTML author usually has no control over the availability of CGI on their site.

An "Identifier" is a means of specifying the location of a particular file containing an HTML page or an image or various other items;
a "Fragment Identifier" is a means of locating (for example) a particular point or paragraph within a page.
"URI" stands for Uniform Resource Identifier, which is a more generalised term than "URL" -- Uniform Resource Locator. URLs can be "Absolute", "Base", or "Relative".
So http://www.argonet.co.uk/users/protovale would be a Base URL; john.html would be a Relative URL (relative to the Base URL); and http://www.argonet.co.uk/users/protovale/john.html is an Absolute URL.



Anatomy (HEAD and BODY)

The first tag of the document is <HTML>, and the very last tag (surprise, surprise) is </HTML>.

The tags <HEAD> and </HEAD> , and <BODY> and </BODY> , enclose the two parts of the document:
The HEADer, none of which appears on the rendered page; and
the BODY, which contains everything which should appear.
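
As a rough sketch, the overall skeleton of a minimal page looks something like this (the title and body text are placeholders of my own):

   <HTML>
   <HEAD>
   <TITLE>A Minimal Page</TITLE>
   </HEAD>
   <BODY>
   <H1>A Minimal Page</H1>
   Everything the reader sees goes in here.
   </BODY>
   </HTML>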

In some documents, you may see the <HTML> tag preceded by:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
or similar; this is a declaration of which HTML standard the document claims to conform to (HTML 2.0 in this case), and I'd suggest you leave it out unless you're sure you know what you're doing!

HEAD and TITLE

About the only thing you need in the HEAD section is a title; thus:
<HEAD>
<TITLE>Yours truly's Home Page</TITLE>
</HEAD>
which will cause your title to appear in the browser window's title bar, but not on the displayed page.

Another item you can include within the head looks like:
<BASE HREF="http://www.argonet.co.uk/users/userid/">
which in principle allows the page's true location to be recognised out of context; however, unfortunately, its presence would then not allow you to view your own page locally and off-line! You can only really use this if you have direct access to the server and can build your page(s) "on-site"; in other words, I'm advising you not to use it!

There is also a permitted tag called LINK, which however is not the sort of hyper-text link you're looking for, so forget it;
and another called META which allows information about the document to be included, which will not appear on the browser's screen, but which can be searched for by other "agent programs": the tag contains two attributes: NAME= and CONTENT= .
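
For instance, a HEAD containing a couple of META tags might look like the following; the names "author" and "keywords" are commonly-used values rather than anything laid down by the standard, and the contents are invented:

   <HEAD>
   <TITLE>Yours truly's Home Page</TITLE>
   <META NAME="author" CONTENT="A. N. Other">
   <META NAME="keywords" CONTENT="HTML, tutorial, beginners">
   </HEAD>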

BODY

In HTML 2.0, <BODY> and </BODY> are just tags which go immediately after </HEAD> and immediately before </HTML> respectively, thereby enclosing all of the displayable document.

In HTML 3.0, only one (useful) attribute is allowed inside the body tag, thus:
<BODY BACKGROUND="image_URL">
which will tile the background with the specified image.
Be careful how you use this -- I've seen many pages where the text is unreadable because of an unsuitably-dark or complicated background!

HTML 3.2 adds the attributes:
TEXT="#RrGgBb" which sets the text foreground colour,
BGCOLOR="#RrGgBb" which sets the background colour (NB: American spelling!),
LINK="#RrGgBb" which sets the colour of a hypertext link,
ALINK="#RrGgBb" which colours the link while it is Active (being accessed),
VLINK="#RrGgBb" which sets the colour of a link that has been previously Visited;
where RrGgBb are the hexadecimal values of the red, green and blue components (two digits each), eg #00FF00 is bright green.
Note that most browsers have default colours for all of the above (except for BGCOLOR which is effectively "unset", thereby allowing the BACKGROUND image to be the computer's default window background), and so these commands are used to over-ride them.

If you use BACKGROUND= to tile a background with an image, you should also precede that with a BGCOLOR= to set a colour that is the average or dominant image colour, so that the text renderer has a background colour to anti-alias towards (though of course most of the PC browsers don't implement anti-aliased text, but browsers for Acorn do).

Also, if you set any one colour, you should set them all.
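
By way of illustration, a BODY tag setting everything at once might look like the following; the file name paper.gif and the particular colours are just examples of my own choosing:

   <BODY BGCOLOR="#FFFFCC" BACKGROUND="paper.gif" TEXT="#000000"
         LINK="#0000FF" ALINK="#FF0000" VLINK="#800080">

This asks for black text on a pale cream background (which a pale "paper" image would roughly match), with blue links that turn red while active and purple once visited.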

Comments

Any number of comments to the source (which are not rendered by the browser) can be inserted using:
<!-- First comment -- -- next comment -->
Note that the structure is: a less-than, a pling, two hyphens and one space at the beginning; and one space, two hyphens and a greater-than at the end. Each comment within the tag is itself enclosed in a pair of double-hyphens, so a double-hyphen must not appear inside the comment text.
Note also that the comment text should not contain any tags or reserved characters like < (I discovered this by accident when trying to "REM out" a block of links temporarily, and it caused mayhem!)



Textual

Text will appear fairly conventionally, but is in "free-format". This means that during rendering, occurrences of spaces, tab-characters and new-lines are all treated as being "white space" (without distinction); multiple consecutive white spaces are then reduced to a single white space, and then the white space is shown as a single space character, unless the line has reached the right margin in which case the text continues on the next line.
To put it another way: you can't force multiple spaces or new lines. (In fact you can, but only in certain cases).

Headings

Any text enclosed between one of the six tags <H1>, <H2>, <H3>, <H4>, <H5> or <H6> and the matching tag </H1> to </H6> will be displayed on a line of its own in a larger-than-usual size, with <H1> being the largest and <H6> being the smallest.

The recommendation is that you should use H1 for the title heading at the top of the visible page, H2 for the main section headings, H3 for sub-headings, and H4 for paragraph headings.
Obviously, if H1 looks too big, you would start the hierarchy at H2 instead; but note that somewhere around H5 or H6 (depending on the particular browser) the font size has reduced to much the same as the ordinary body text.
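
A sketch of that hierarchy, using headings borrowed from this page (the choice of levels is only illustrative):

   <H1>John's HTML Tutor</H1>
   <H2>Textual</H2>
   <H3>Headings</H3>
   Ordinary body text follows the heading.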

Styles

There are an awful lot of these altogether, all needing a matching End Tag.

There are two official styles for emphasising text: <EM> and <STRONG>, for slightly emphasised and strongly emphasised respectively. Most browsers will render these in italic and bold respectively; though you have no way of guaranteeing this!

There are two "typographic" styles, <I> and <B> for italic and bold respectively. Officially, W3C deprecates the use of these, as they presuppose specific font-weights on the reader's browser; however, most authors nevertheless use them, and in general, browsers support them.
Note that <B><I>Look!</I></B> and <I><B>Look!</B></I> need not necessarily come out the same! There is no requirement that browsers must support BoldItalic; most do, in which case both the above would come out like that, but equally they might appear in MediumItalic and BoldUpright respectively!

A third and widely-used typographic style is <TT> (for "TeleType"), which displays in a monospaced font, eg Corpus/Courier (and is what I'm using here for all the examples).
It is usually possible to combine <TT> with <B> or <I>; but I wouldn't recommend you to try and nest any of the other ones!

Then comes a variety of "Phrase Markup Idiomatic Elements":
<ADDRESS> : typically rendered in italic, possibly indented;
<CITE> : citation or book-title; typically italic;
<CODE> : for short fragments of code; typically monospaced;
<KBD> : to indicate text typed in by a user; monospaced;
<SAMP> : for an example of literal characters; monospaced;
<VAR> : to indicate a variable placeholder to be substituted; italic.
I wouldn't like to guarantee that all browsers necessarily support all these styles; if a browser doesn't recognise a style tag, it (supposedly) just ignores it.

There exist two other styles, <LISTING> and <XMP>, which HTML2 recommends you should NOT use! However, <PRE> (see later) and <SAMP> will do the job.
You may come across <U> for Underline; not many browsers support it, so it's usually a waste of time using it!

The markups <BLOCKQUOTE> and <PRE> are described under "Layout" below.

HTML 3.0 adds several more, of which three of the possibly-useful ones are:
<Q> : a quotation, surrounded by matching sexed quote marks (unfortunately, rarely supported, and withdrawn by HTML 3.2);
<SUB>: subscript (inferior); and
<SUP>: superscript (superior).

Font Size and Colour

W3C deprecates markup which specifies typographic detail. Nevertheless, Netscape includes the elements <FONT> and </FONT> used as follows:
<FONT SIZE=n> where n is from 1 (smallest) to 7 (largest): note seven sizes, not the six of <H1> to <H6> , and in the opposite order!
<FONT SIZE=+2> and <FONT SIZE=-1> for example, which ask for text to be rendered two sizes larger or one size smaller -- respectively -- than otherwise.
Netscape and HTML3.2 also allow a tag of the form <BASEFONT SIZE=n> which however a good HTML-author would have no valid use for.
Note that none of these specify absolute point sizes.
Because some of these are non-standard Netscapisms, I would recommend restraint in their use.

<FONT COLOR="#RrGgBb"> (note American spelling) changes the colour of the text to the hexadecimal RGB value given, until the matching </FONT>.
Further details of the #RrGgBb format, and of setting the foreground and background colours for the entire document, are given in the section on <BODY> (qv).



Layout

As stated earlier, browsers ignore newlines and treat them as white spaces. However, you may force a new-line with the tag <BR> (which does not have a matching end tag).
But multiple <BR> tags do not produce multiple newlines, just a single one.
The tag <P> (for new Paragraph) will produce a blank line. You may use it either on its own as a "blank line" generator, or use it paired with </P> to enclose a paragraph as an "element"; either way seems to work.

Browsers also ignore multiple spaces and attempts at tabs; but you can enclose a block of text between <BLOCKQUOTE> and </BLOCKQUOTE> which will result in blank lines above and below the block and the block's left margin being indented.
(HTML 3.0 allows the abbreviation <BQ> but this may not be recognised by HTML 2 browsers)

A further degree of horizontal control is the use of <CENTER> and </CENTER> (note American spelling!). Text will be centre-justified; <CENTER> starts a new line; and </CENTER> will be followed by a new line.
Note that this was a Netscapism rather than W3C HTML (although it is widely supported).
The HTML 3.0 equivalent is to use <P ALIGN=CENTER> text or whatever </P>
and the preferred HTML 3.2 method is to use <DIV ALIGN=CENTER> text or whatever </DIV>

A special form of style is also available: <PRE> (for PRE-formatted).
<PRE> starts a new line and changes the font to monospaced (Corpus/Courier). Thereafter, all (multiple) spaces and new-lines are "obeyed" exactly as per the source. The closing </PRE> issues one final new-line, then reverts to the body font.
You would be advised to keep to fewer than 80 characters per line (W3C does define an attributed tag <PRE WIDTH=chars> to instruct a browser to fit that many characters on a line, but hardly any browsers support this particular use of the WIDTH attribute).
This is probably the easiest way to present tabular matter when you want things to appear laid out in columns.
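
A small table laid out with <PRE> might be sketched like this (the items and figures are invented); the columns line up because every character, including each space, is the same width:

   <PRE>
   Item        Qty   Price
   ----------  ---  ------
   Widgets      12    3.50
   Grommets      4   10.25
   </PRE>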

Finally, you can insert a very clear separator between sections of text by using <HR> (for Horizontal Rule).
Some browsers support <HR WIDTH=pixels> , but this is not defined in W3C HTML.

Lists

The two main types of lists are the "Ordered List" and the "Unordered List".
The start/end tags for these are <OL> , </OL> , <UL> & </UL> .
What they have in common are:
They enclose a sequence of "List Items", each preceded by a <LI> tag (no </LI> needed);
Each list item is displayed indented from the left margin, and with extra line space between them;
They may be mixed and nested, in which case the items indent progressively further to the right.
The difference is that Unordered List Items are preceded by a bullet, and Ordered List Items are preceded by an automatically-generated sequential number.
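
For example, an Ordered List with an Unordered List nested inside one of its items could be written along these lines (the wording of the items is made up):

   <OL>
   <LI>Write the page
   <LI>Check it in several browsers:
     <UL>
     <LI>a graphical one
     <LI>a text-only one
     </UL>
   <LI>Upload it
   </OL>

The outer items should come out numbered 1 to 3, with the two inner items bulleted and indented further.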

HTML 2.0 adds two further types of list: <DIR> (directory), and <MENU> .
These are a bit like <UL> , except that: they can't be combined or nested; and there is no extra line-space between list items.
<DIR> is supposed to arrange the list items in columns (if possible); my experience is that it usually doesn't!

HTML 3.0 withdraws <DIR> & <MENU> , but adds two attributes to the <UL> tag:
PLAIN which suppresses the bullets; and
COMPACT which suppresses the inter-item line-space (thereby emulating <MENU> ).
HTML 3.0 also introduces a tag <LH> to precede a "List Header" within the list range (no end tag needed); but this is rarely supported, and is not mentioned in HTML 3.2.

A more esoteric type of list is the "Definition List", delimited by <DL> and </DL>.
This contains two types of list items:
"Defined Term" <DT> and "Definition Data" <DD>.
The "Term" items are rendered flush-left, and the "Data" indented to an arbitrary tab-stop.
If the Terms are ridiculously short (like two or three characters), and you use <DL COMPACT> , the Term and Data are supposed to appear on the same line; otherwise they will be on separate lines.
W3C thoroughly deprecate the misuse of this markup to fudge indented but otherwise plain text (use BLOCKQUOTE instead); similarly, nesting Definition Lists is rather unpredictable!
(HTML 3.0 - but not HTML 3.2 - also adds <DH> Definition Headers; but this isn't really relevant if you're not using <DL> anyway!)
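
A legitimate use of a Definition List, as a minimal sketch with terms borrowed from the terminology section of this page:

   <DL>
   <DT>Tag
   <DD>An HTML keyword enclosed in angle brackets.
   <DT>Element
   <DD>Whatever lies between a matching pair of Start and End Tags.
   </DL>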



URLs, Reference Identifiers and Filenames

Most references will be to filenames. The server will use Unix filenames (if it isn't actually a Unix box, it'll pretend to be one for consistency). These names are made up of: letters (case-sensitive); hyphen and plus (but not as initial characters); underscore and dot. Most filenames are formed using the characters following the last dot to indicate filetype; this looks like an MS-DOS dot-extension, but the dot-bit is really part of the name, not an extension to it. To avoid having to use !LongFiles or similar, keep each filename to ten characters at most, counting the name, one for the dot, and the extension.

The directory separator in a path-name is a (forward) slash.
If you don't specify otherwise, your filenames are interpreted as being relative to the BASE directory.
So, a reference to thing.html would be that filename in the same directory as the current document;
piccies/thingy.gif would be a file thingy.gif in the directory piccies in the same directory as the current document;
and ../photos/scene.jpeg (note double-dots) would be a file in a directory in the parent of the directory containing the current document.
If all your files are in the same directory, you won't need the last two forms to link around your own site.
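
To make the three forms concrete, suppose (purely for illustration) that the current document is http://www.argonet.co.uk/users/userid/index.html; then links written as

   <A HREF="thing.html">A thing</A>
   <A HREF="piccies/thingy.gif">A picture</A>
   <A HREF="../photos/scene.jpeg">A photo</A>

would be resolved by the browser to

   http://www.argonet.co.uk/users/userid/thing.html
   http://www.argonet.co.uk/users/userid/piccies/thingy.gif
   http://www.argonet.co.uk/users/photos/scene.jpeg

respectively.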

To refer to a file on another site, you must use a full URL of the form:
http://www.domain_name/base_directory/path/filename.ext
(you could also use that form for anything on your own site, but it would be rather pointless!)



Links (Anchors)

Hyperlinks

Most uses of anchors are of the "jump to" or "fetch this" variety, typified by:
<A HREF="page2.html">Continued on p.2</A>
where page2.html is a Relative URL, the name of a file in the same directory as the current document;
or <A HREF="http://www.acorn.co.uk">Acorn's Home Page</A> where an Absolute URL has been given.

In both cases, the text between the <A> and </A> tags will be recognisable as a hyperlink on the rendered page, typically by being in blue and underlined. When the reader clicks on the hyperlink, the browser will "fetch" or "visit" the specified page.

A link may also take the form <A HREF="mailto:userid@domain.name"> in which case the browser will enable the reader to compose and send an email to that address.

Fragment Identifiers

It is also possible to "jump" to a specific point on a page. The identifier following the HREF= has a fragment identifier appended, consisting of a # symbol and the "name" of the fragment.
Thus HREF="page2.html#para5" refers to a fragment on page2, and HREF="#more" can jump to a fragment somewhere on the current page.

The actual points you want to jump to should be identified (on their respective pages), using
<A NAME="para5">Item Five</A> and <A NAME="more">Further Ideas</A> (to match the examples above).
In these cases, the text between <A> and </A> will not be highlighted by the browser.
Fragment identifier names must be unique within the page on which they occur (in NAME= attributes).
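
Putting the two halves together, a jump within a single page might be sketched like this (the link text is my own invention):

   <A HREF="#more">See Further Ideas below</A>
   <!-- the rest of the page goes here -->
   <A NAME="more">Further Ideas</A>

Clicking on the first anchor should scroll the page to the point marked by the second.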

Image Maps

This is when the browser displays a largish image with several "button areas" scattered over it (or even a literal map), and the reader clicks on a particular point on the image to activate one of several links.

The position on the image map then has to be decoded into one of several possible link URLs; and there are two ways of doing this:
'server-side' decoding, by the server and additional info on the site,
and 'client-side' decoding, by the browser using info on the page.

A Server-side image map is encoded in a manner such as:

<A HREF="http://host/directory/script"><IMG SRC="mymap.gif" ISMAP></A>

As the reader moves the pointer over the map, the browser keeps track of its coordinates in x-pixels horizontally and y-pixels downwards. When the reader clicks, the browser performs a hyperlink to the URL
http://host/directory/script?xPix,yPix
ie appends a question mark and the two comma-separated coordinates.

What happens next relies on the host machine/server containing a CGI thingy which will take those coordinates and look up the URL that is really wanted by referring to a "script" file that the author supplies.

To give a specific example: suppose you are userid@argonet.co.uk, so your home page is http://www.argonet.co.uk/users/userid/index.html; and you have created an image called "mymap.gif" which is 300 pixels wide and 100 pixels deep; and you want the left-hand half to link to a page called "page2.html" and the right half to link to "page3.html"; and you will be supplying a decoding script called "script.map".

The image should be included in your home page by the HTML-code

<A HREF="http://www.argonet.co.uk/users/userid/script.map">
<IMG SRC="mymap.gif" WIDTH=300 HEIGHT=100 ALT="Jumping map" ISMAP>
</A>
(and you can include BORDER=0 within the IMG tag to stop it drawing a box round the image).

The file script.map (which is a plain text file, where the name is not important but the extension .map is) could be

default http://www.argonet.co.uk/users/userid/index.html
rect page2.html 5,5, 145,95
rect page3.html 155,5, 295,95
rect index.html 0,0, 300,100
where the pairs of co-ordinates are the top-left and bottom-right corners of the active rectangle.

The rectangles are allowed to overlap; in which case the first-defined takes precedence in the region of overlap. In the example, the first two rectangles do not overlap each other, but they do both overlap the last rectangle which is the whole image, which would be obeyed (ie do nothing except reload the home page) if the reader clicked exactly in the middle between the first two rectangles!

Two other shapes can be defined; for example

circle page4.html x1,y1, x2,y2
where x1,y1 are the co-ordinates of the centre of the circle,
and x2,y2 are the co-ordinates of a point on the circumference; and
poly page5.html x1,y1, x2,y2, .... xn,yn
where there are as many co-ordinate pairs as there are corners/vertices of a polygon;
this enables you to define triangles, diamonds, hexagons, etc.

A Client-side image map is inserted with just an IMG tag such as

<IMG SRC="mymap.gif" WIDTH=300 HEIGHT=100 ALT="Links" USEMAP="#decode">

where USEMAP= is the magic attribute that specifies it is an image map and the "#decode" is a fragment identifier pointing to a <MAP> tag on the same page which contains the decoding information.

The decoding information might look like

   <MAP NAME="decode">
   <AREA SHAPE=RECT COORDS="5,5,145,95" HREF="page2.html">
   <AREA SHAPE=RECT COORDS="155,5,295,95" HREF="page3.html">
   </MAP>
where the name "decode" matches that in the USEMAP= attribute.

The <AREA> tag can define three other shapes, using attributes:

    SHAPE=CIRCLE  COORDS="xCentre,yCentre,radius"
    SHAPE=POLY    COORDS="x1,y1,x2,y2, .... xn,yn"
(or SHAPE=POLYGON COORDS="x1,y1,x2,y2, .... xn,yn" ?)
    SHAPE=DEFAULT
(note the different way of specifying a circle).

You can also replace HREF="url" by NOHREF, in which case that area won't link anywhere (useful for making 'holes' in the map).

You also can and should include AREA attributes of the form
ALT="textual description"
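
As a sketch of how these fit together, the earlier map could be given a small dead "hole" in the middle and a textual description for each area; the circle's position and the ALT wording are illustrative only:

   <MAP NAME="decode">
   <AREA SHAPE=CIRCLE COORDS="150,50,10" NOHREF ALT="(no link)">
   <AREA SHAPE=RECT COORDS="5,5,145,95" HREF="page2.html" ALT="Page Two">
   <AREA SHAPE=RECT COORDS="155,5,295,95" HREF="page3.html" ALT="Page Three">
   </MAP>

The NOHREF area is listed first because HTML 3.2 says that where areas overlap, the one defined first takes precedence; so the circle knocks a hole out of the edges of both rectangles.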

The Client-side method has the advantage of speed as decoding is performed locally; but it does require the browser to support it (not all do). The Server-side method is supported by all browsers; but it requires the server to contain a cgi-program to actually interpret your script file.

However, you can (and probably should) combine both methods on your page: the visitor's browser will then implement the USEMAP method if it can, or the ISMAP if it can't.
The image will be inserted along the following lines:

   <A HREF="http://www.argonet.co.uk/users/userid/script.map">
   <IMG SRC="mymap.gif" WIDTH=300 HEIGHT=100 USEMAP="#decode" ISMAP>
   </A>
   <MAP NAME="decode">
   <AREA SHAPE=RECT COORDS="5,5,145,95" HREF="page2.html">
   <AREA SHAPE=RECT COORDS="155,5,295,95" HREF="page3.html">
   </MAP>
and you also write and upload a decoding text-file "script.map" as already described.

Downloading

So what happens if the file referred to in <A HREF="filename"> or
HREF="http://www.site.name/directory/leafname" is not an HTML file? The web-site server will obediently send the file to the browser; but then what? If the browser can not render it as a page, it has two options (the exact details of which will depend very much on the software of the particular browser, and any configurable options the reader has set).

Possibility number One is that the received file can be (and will be) "run" or "obeyed"; this will require either that the operating system on the browser's platform can run the file, or else that the internet/browser package includes "helper applications" that can -- and in any case requires that the "type" of the file can be unambiguously recognised.

Possibility number Two is that the browser downloads the file, recognises that it is not a displayable HTML page, and so offers the reader an opportunity to save it to disc:-- on a RiscOS platform, this means that the browser will put up a standard "Save as" dialogue box.
In any case, it is customary for the file to be an archive ("zipped"), so that download time is minimised; the file being "unzipped" later off-line.
As implied above, the archive should be easily recognised as such: by having an appropriate "dot-extension" such as .arc , .spk or .zip ; and in the case of RiscOS platforms, having the appropriate file-type (if preserved) to be on the safe side.

It is also possible for any HREF to be an absolute URL of the form
ftp://ftp.site.name/etc/etc providing it is a valid reference, but take care:-- it does not follow that a web-site server will necessarily allow ftp-access (nor vice versa).



Images

The start tag is <IMG> (but no End Tag is required).

SRC=

The Start Tag must include the SRC= attribute, and should also include ALT= ;
the usual format is: <IMG SRC="imagefile" ALT="text string">
where imagefile is something like photo.gif and text string might say Mugshot of me.
The example imagefile given would be a file in the same directory as the HTML page; but there's nothing to stop you putting something like: SRC="http://somewhere.else.altogether/porno/much_nicer.pic" instead (apart from the longer download time the reader might have to suffer)!

The standard image filetype for Web pages is GIF (Graphics Interchange Format), and essentially all graphical browsers are also able to render JPEG (Joint Photographic Experts Group) format files.
Some browsers may be able to render TIFF (Tagged Image File Format), and some can also handle .xbm (X BitMap, from the X Window System), but it is unwise to rely on either of those;
and please remember that while Acorn browsers can usually render &FE9 SpriteFiles, none of the non-Acorn browsers have even heard of them!

ALT=

The ALT bit is only taken any notice of if the browser is text-only, or has graphics turned off;
if it's not going to display the image, then:
If ALT="Text" is present, then that text will be displayed instead of the image;
If ALT="" is present, then nothing is displayed at that point;
If no ALT attribute is present, most browsers will display a default icon to let you know there's an image missing.
I would recommend that ALT= is always included; either null or specified, as appropriate.
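
For example (bullet.gif here is a hypothetical, purely decorative image):

   <IMG SRC="photo.gif" ALT="Mugshot of me">
   <IMG SRC="bullet.gif" ALT="">

A browser that isn't displaying images should show the words "Mugshot of me" in place of the first image, and nothing at all in place of the second.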

ALIGN=

If a line is to contain both text and graphics (which is perfectly permissible: they're not called "inlined images" for nothing!), a third attribute, ALIGN= may be included.
This may take any one of the three "values" TOP MIDDLE BOTTOM, and determines the vertical alignment of the text and graphic (=BOTTOM means the bottom of the text aligns with the bottom of the image, etc).
It's a good idea to include an ALIGN attribute (but only when you have got both text and graphics on the same line), because if it's missing, the browser will have to guess how to align; and since there's no default convention for this, Murphy's Law will guarantee it'll come out the opposite to what you intended!
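
A minimal sketch of an inlined image with text alongside it (the caption is invented):

   <IMG SRC="photo.gif" ALT="Mugshot of me" ALIGN=MIDDLE> That's me, on a good day.

Here the line of text should sit level with the middle of the image, rather than with its top or bottom edge.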

In HTML 3.0, the constructs ALIGN=LEFT and ALIGN=RIGHT also exist; but I have known them not to work!
Netscape also has ALIGN=CENTER , but since this isn't W3C HTML, it's probably better to use <CENTER><IMG SRC="beauty.gif"></CENTER> .
There are also some undocumented tags <RIGHT> and </RIGHT> which can be placed around the <IMG> element, and which some browsers support; they may be worth using as well as ALIGN= to cover all eventualities.
Some oddball HTML-editors wrongly assume that ALIGN= is for horizontal alignment only, and so use VALIGN= for vertical alignment:-- however this is totally against the W3C HTML specifications, so don't you do it that way!
Note that if you have a tall but narrow image, you can only have one line of text to its side: you can't have text flowing round graphics like you can in DTP frames! (Actually, <FIG> in HTML3 is supposed to allow this; but I wouldn't like to say (m)any browsers support this!)

BORDER=

This was a Netscape extension, now included in HTML 3.2.
BORDER=1 puts a thin line around the image, BORDER=2 a thicker, etc.
Possibly its most useful application is BORDER=0, which will turn off the blue rectangle which would otherwise be drawn around an image which is (part of) a hypertext link.

In HTML 3.0, two more permissible attributes are WIDTH= and HEIGHT=, followed by the image dimensions in pixels.
The idea is that a browser can reserve a rectangular area of the screen for an image, and then get on with rendering the text that comes after it while it's waiting for the image to be fetched.
Therefore I would recommend that these attributes be included if you know the image dimensions (which you ought to, if you're the author!); however, don't rely on your reader's browser necessarily taking any notice of them!
Some browsers claim to be able to use the values to scale and resize an image:-- again, I wouldn't like to rely on it working!

Web-Counters

This is an image of some digits which is created by a utility maintained by the ISP on the web-site (which stores and increments a count every time it is accessed), and which you can insert into your web page.
There is a universal web-counter available from www.digits.com; but I'll just describe the one available to Argonet users on the Argo site.
To include the counter, you include an image element of the form
   <IMG SRC="/cgi-bin/counttest.bm?parameters">
(where you can also include other standard attributes such as ALT="many" ).
The parameters part consists of one or more terms of the form name=value separated by & if there is more than one of them. The first term must be code=userid where userid is your user name (the bit before the @ in your email address, as used to specify your web-site directory); this will produce a counter in the default style.

Subsequent terms can be one or more of:
fore=cc back=cc brdr=cc and size=n where cc is a two-digit number for the colour of the foreground digits, background, and border respectively;
and n is 0 for the default large size or 1 for a smaller size.

The colour numbers are 00 to 07 for the standard BBC colours; the six actual colours can have 8 added for a darker shade or 16 added for a lighter shade; and similarly four intermediate greys can be selected. The full colour table is:

00 Black (100%)  08 Dark grey (80%)         16 Med. light grey (40%)

01 Red           09 Dark red (brown)        17 Light red (pink)
02 Green         10 Dark green (olive)      18 Light green (lime)
03 Yellow        11 Dark yellow (khaki)     19 Light yellow (cream)
04 Blue          12 Dark blue (indigo)      20 Light blue
05 Magenta       13 Dark magenta (purple)   21 Light magenta (lilac)
06 Cyan          14 Dark cyan (turquoise)   22 Light cyan

07 White (0%)    15 Medium dark grey (60%)  23 Light grey (20%)
So, for the "protovale" site to have a large counter with brown digits on a cream background and unspecified border, I would put
 <IMG SRC="/cgi-bin/counttest.bm?code=protovale&fore=09&back=19">
Note that no image will be displayed if you are viewing your page off-line!
Even if you are logged on but viewing a local page (from your hard disc), the counter will not be accessed unless you change the reference to an absolute URL
SRC="http://www.argonet.co.uk/cgi-bin/counttest.bm?parameters" .
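
If you'd rather not work out the colour numbers by hand, the rules above can be sketched in a few lines of Python. This is only an illustration: the function names colour_number and counter_src are my own invention and have nothing to do with the Argonet utility itself.

```python
from urllib.parse import urlencode

# Offsets from the colour table: add 8 for the darker shade, 16 for the lighter.
SHADE_OFFSET = {"normal": 0, "dark": 8, "light": 16}

def colour_number(base, shade="normal"):
    """Return the two-digit colour number for a base colour 0-7 and a shade."""
    if not 0 <= base <= 7:
        raise ValueError("base colour must be 0 to 7")
    return f"{base + SHADE_OFFSET[shade]:02d}"

def counter_src(userid, **terms):
    """Build the SRC string; code=userid must be the first term."""
    pairs = [("code", userid)] + list(terms.items())
    return "/cgi-bin/counttest.bm?" + urlencode(pairs)

src = counter_src("protovale",
                  fore=colour_number(1, "dark"),    # 09, dark red (brown)
                  back=colour_number(3, "light"))   # 19, light yellow (cream)
print(src)  # /cgi-bin/counttest.bm?code=protovale&fore=09&back=19
```

The printed string is the same one used in the "protovale" example above.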



Symbols (Entities)

You may by now have spotted a potential problem: if the "angle brackets" < and > are reserved for enclosing tags, how do you represent them in ordinary text?
The answer is in the use of "entities", and those two characters are represented by &lt; and &gt; respectively (being the abbreviations for "Less Than" and "Greater Than").
Great -- but now & has become a reserved character too!!
Don't panic Mr Mainwaring! The ampersand now gets encoded as &amp; , which finally sorts everything out.
One more entity is defined: &quot; for a double (but sexless) quotation mark:-- this is vital if you need quote characters in a string which is delimited by (ordinary keyboard " ) quotes; but it is worthwhile to use the entity form everywhere in your text, and the " key only round HTML variables (such as the URL following HREF= ).
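
As a rough sketch of the four entities just described, here is how the substitution might be done mechanically in Python. The function name escape_text is made up for this example; html.escape is the standard-library routine that does the same job. Note that the ampersand has to be dealt with first, otherwise the & in a freshly-inserted &lt; would itself get encoded.

```python
import html

def escape_text(text):
    """Replace the four reserved characters by their entities.

    The ampersand must be replaced first, so that the & introduced
    by the other three entities is not encoded a second time.
    """
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))

sample = 'Use <TT> & "quotes"'
print(escape_text(sample))   # Use &lt;TT&gt; &amp; &quot;quotes&quot;
print(html.escape(sample))   # the library routine gives the same result here
```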

A second format of entity is the "numeric" format (as opposed to the "glyph" format above); whereby &#192; will produce a capital A with a grave accent (À).
The numbers used must be those defined in ISO-8859-1 and ISO 8879:1986:-- all Acorn users can smugly relax here, because that is the same as the "ISO Latin 1" set that the ARM 32-bit machines have always used! However, only the characters with character codes 192 to 255 are included in that spec:-- that includes all of the accented letters ("diacritical marks"), but not the fractions, currencies, dashes or "proper" (sexed) quotes.
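
To see where a number like 192 comes from, here is a small Python sketch. The helper name numeric_entity is hypothetical; ord, str.encode and html.unescape are standard Python. It relies on the fact that the first 256 Unicode code points coincide with ISO Latin 1.

```python
import html

def numeric_entity(char):
    """Return the numeric entity for one ISO Latin 1 character."""
    char.encode("latin-1")        # raises UnicodeEncodeError if not in Latin 1
    return f"&#{ord(char)};"

print(numeric_entity("\u00c0"))   # capital A with grave accent gives &#192;
print(html.unescape("&#192;") == "\u00c0")   # and back again: True
```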

Purely as a matter of interest:-- in order for you to read "<TT>", I actually had to type
"&quot;<TT>&lt;TT&gt;</TT>&quot;"; and even that's a lie -- if you look at the HTML source (in a text editor, SEARCH FOR "that's a lie"), it's really one stage even more complicated!

HTML 3.0 extends the number range to include character codes 160 to 191 too, which gets you the fractions and currency symbols (and also 32 to 126 for that matter; but not 127 to 159).
HTML 3.0 also allows the 192-255 range to be encoded by glyph-name entities (which saves you having to look up the Alt-codes). I won't list all 96 of them, just enough for you to get the idea:
&Aring; &Ccedil; &ecirc; &igrave; &ETH; &ntilde; &Oslash; &szlig; &Uuml; &Yacute; &thorn; which you can see are all case-sensitive.
I have discovered that ArcWeb/WebVoyage also correctly render &mdash; and &ndash; (which are the SGML names for em- and en- dashes), but I haven't found any other browsers that do; and in any case, you can't expect any browser running under MS-DOS/Windows/Netscape to render those (let alone proper quotes) as such systems have no concept of such niceties!
There also exist glyph-named entities such as &nbsp; (hard-space) and &copy; (copyright © ); but again, I don't know how many browsers support all these.

W3C also have an interesting proposal for "predefined icon-like symbols":-- the idea is that the HTML source could contain an entity such as &mail; and the browser would insert a little icon of an envelope (their suggested symbol is [envelope icon], and all of these are downloadable as .xbm files). However, this is just a suggestion so far, so current browsers would just display &mail; !



Forms

In normal use, the Web-site/HTML-Author provides information, and the Reader/Browser receives information:-- but there are times you want to do the reverse, for example to get your reader to respond to a questionnaire.

Overview and (client-side) Submission

Any block of the document can have <FORM> and </FORM> tags put around it; and within that area, three further tags can be used to enable the reader to enter information.
The FORM tag looks like <FORM METHOD=POST ACTION="mailto:email_address"> (there also exist the alternative settings METHOD=GET and ACTION="URL", which are ISP-specific and described later).
The last or penultimate INPUT element will be of the form <INPUT TYPE=SUBMIT> which puts up a button with the word "Submit" in it; when the reader clicks on that, all the various inputs are concatenated into one long string which becomes the message body to be POSTed.
This string is "url-encoded", which means that: text strings have spaces replaced by plus symbols, and several potentially-troublesome reserved characters (such as a real plus, and quotes, ampersand, equals, percent, comma etc) are "escaped" by being replaced by a percent symbol and a two-character hex value (and new-lines become %0D%0A ); the var_name and var_value strings are paired up into "fields" either side of an equals sign; and the fields are strung together separated by ampersands.
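
If you want to see what such a string looks like (or decode one that has landed in your mailbox), Python's standard urllib.parse module does the same encoding. This is only a sketch: the field names "name" and "comment" and their values are invented for the illustration, and a given browser may differ in exactly which characters it chooses to escape.

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical form contents: (var_name, var_value) pairs.
fields = [
    ("name", "John Smith"),
    ("comment", "1+1=2 & more\r\n"),
]

body = urlencode(fields)
print(body)   # name=John+Smith&comment=1%2B1%3D2+%26+more%0D%0A

# Decoding reverses the process and recovers the original pairs.
print(parse_qsl(body) == fields)   # True
```

Spaces have become plus signs, the real plus, equals and ampersand have become %2B, %3D and %26, and the new-line has become %0D%0A.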

Input Fields

Within the FORM area, three new tags are allowed: <INPUT> , <TEXTAREA> and <SELECT> .

INPUT
There are seven different types of INPUT, specified by a TYPE= attribute. Most of these additionally take NAME= and VALUE= attributes. The NAME specifies the name of a variable (that you decide), and the VALUE is the value of the variable; generally, the VALUE attribute is only included if you want to pre-set a default value for the variable (the rest of the time, it will be supplied by the reader).
<INPUT TYPE=TEXT NAME="var_name"> will set up a writable icon into which the reader can type anything. Two additional attributes can be included: SIZE= sets the size in characters of the (possibly scrollable) input window; MAXLENGTH= limits the maximum number of characters that can be typed in.
<INPUT TYPE=PASSWORD NAME="var_name"> does exactly the same except that the characters are obscured as they are entered.
<INPUT TYPE=HIDDEN NAME="var_name" VALUE="var_value"> is a sneaky one: the reader doesn't see anything, but the submitted form will contain a field of the form var_name=var_value which is useful if you've got reply forms on several sites about the place and want to identify which one a reply has come from!
<INPUT TYPE=CHECKBOX NAME="var_name" VALUE="var_value"> puts up a "check-box" which the user can click on to "tick". You may have several such elements all using the same var_name but with different var_values. You may also add the attribute CHECKED in which case the box is displayed ready-ticked (although the reader can "un-tick" it).
<INPUT TYPE=RADIO NAME="var_name" VALUE="var_value"> does the same as CHECKBOX except that only one of several items with the same var_name can be selected (like an Exclusive Selection Group). If none contain CHECKED, the first var_value is "selected" by default.
<INPUT TYPE=SUBMIT> puts up a button with the word "Submit" in it unless you include an attribute like VALUE="All done!" to legend the button as you want. This is the one the reader clicks on to actually send the form.
<INPUT TYPE=RESET> conversely clears all the user-inputs and resets them to any default states/values (and doesn't submit!).

TEXTAREA
This is an extension of the INPUT TYPE=TEXT method, but allows for multiple lines. It takes the form: <TEXTAREA NAME="var_name" ROWS=6 COLS=64></TEXTAREA> (Note the end tag).
The COLS= attribute is the equivalent to SIZE= and therefore refers to the visible width of the writable window.
There is no VALUE attribute; but if you want any default text, enclose it between the <TEXTAREA> and </TEXTAREA> tags.

SELECT
This is a variant on the CHECKBOX and RADIO inputs, and the structure of it is very similar to that of a list (see UL, MENU, etc, and LI).
The start tag is of the form <SELECT NAME="var_name"> and can have two further attributes:
MULTIPLE allows several options to be selected; otherwise only one;
SIZE=1 will usually display the first option only and the reader has to click on it to get the menu of options; SIZE=n should display n options at a time (so setting n to the number of options should display the whole list).
Next comes the list of choices or options, each of the form:
<OPTION>description where OPTION may have two further attributes:
SELECTED is the equivalent of CHECKED (pre-selected by default);
VALUE="var_value" will return var_value in the reply message instead of description which it would do otherwise (but only if the item is selected, of course!)
After the list of options, there must be a </SELECT> tag.

Server-side Submission

The other method of submission relies on a suitable cgi-thingy being present on the server holding the web-site.

For Argonet subscribers, the FORM tag now looks like

<FORM METHOD=GET ACTION="http://www.argonet.co.uk/cgi-bin/mail">
The code between <FORM> and </FORM> must include the two elements
<INPUT TYPE=HIDDEN NAME="mailto" VALUE="userid@argonet.co.uk">
<INPUT TYPE=HIDDEN NAME="linkto" VALUE="full_URL_of_a_page">
where "mailto" and "linkto" must be exactly as shown;
"userid@argonet.co.uk" is the email address to which the data in the form should be sent (ie your email address); and
"full_URL_of_a_page" should be the full URL of the page that the visitor will receive after the data has been submitted, eg "http://www.argonet.co.uk/users/userid/index.html".

When the visitor clicks on the [Submit] button, they will 'get' a standard cgi-generated "Thank you" page containing just one hyperlink: the one you specified using "linkto".

Two other permissible fields are "mailfrom" and "mailsubject", which might for example be invoked by

Please enter your email address<INPUT TYPE=TEXT NAME="mailfrom">
<INPUT TYPE=HIDDEN NAME="mailsubject" VALUE="Web-site Response">
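
With METHOD=GET the url-encoded fields are tacked onto the ACTION URL after a question mark, rather than being sent as a message body. The following Python sketch shows roughly what the browser would request; the reader's address is a made-up example, and what the Argonet cgi program actually does with the fields is not shown (I'm only illustrating the request).

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ACTION = "http://www.argonet.co.uk/cgi-bin/mail"

fields = [
    ("mailto", "userid@argonet.co.uk"),
    ("linkto", "http://www.argonet.co.uk/users/userid/index.html"),
    ("mailfrom", "reader@example.com"),       # hypothetical reader's entry
    ("mailsubject", "Web-site Response"),
]

# The browser appends the encoded fields to the ACTION URL.
url = ACTION + "?" + urlencode(fields)
print(url)

# The server end can split the fields out again.
received = parse_qs(urlsplit(url).query)
print(received["mailsubject"])   # ['Web-site Response']
```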



Tables

W3C HTML 3.0 & 3.2 (but not HTML 2.0) contain a specification for tables; and Netscape contains an implementation of tables which is similar but not exactly the same! Browsers which adhere to HTML 2.0 only will not render tables (just display all the cells in either a single row or a single column). Therefore it is not possible to guarantee 100% reliable tables!
Nevertheless, I'll have a stab at describing them.

The tabular information is enclosed between <TABLE> and </TABLE> tags.
There can be an optional Title between <CAPTION> and </CAPTION> tags.
Then come one or more blocks of row information, each preceded by a <TR> (Table Row) tag, and -- solely to keep Netscape happy -- followed by a </TR> tag.
The row information says what goes in each cell of a row:-- this will be one Table Header after a <TH> tag and one or more cells of Table Data, each with a <TD> tag (no closing tags required).
Further facilities are offered by attributes inside a tag (and this is where the HTML3/Netscape discrepancies occur!):

<TABLE> can have the attribute BORDER which draws a box round the table (otherwise not); Netscape and HTML3.2 allow BORDER=pixels for the thickness of the border.
Netscape allows an attribute of the form WIDTH=80% to specify the total width of the table relative to the document margins;
HTML 3.2 allows the width to be given in percentage or pixels.

In both, inside both of the cell item tags <TH> and <TD> the following attributes can occur:
ALIGN= taking the values LEFT, CENTER or RIGHT for horizontal alignment (otherwise Headers are centred and Data flush-left); and
COLSPAN=number and ROWSPAN=number to allow an entry to span more than one cell horizontally or vertically respectively (and also to get you very confused!)
Netscape also includes VALIGN=TOP and VALIGN=BOTTOM (mainly used when several lines of ROWSPAN have been specified).

It is also supposedly possible to "nest" tables, ie have a mini-table inside one cell of a larger table; but rumour has it that anybody who has ever tried this has finished up wearing one of those funny calico jackets in a Rest Home For The Bewildered!

There -- I should have put you off tables for life by now! That's why it's a darn sight simpler to use the (admittedly cruder) <PRE> layout!



Alphabetical Index

<!>          Comment - Comments
<A>           Anchor - Terminology        HyperLinks
 Absolute URL        - Terminology        Identifiers & Filenames
 Accents             - Symbols
 ACTION=             - Forms Overview
<ADDRESS>            - Styles
 ALIGN=              - Text Layout        Images        Tables
 ALINK=  Active Link - Body
 ALT=    Alternative - Images
 Ampersand           - Symbols
 Anchor              - Terminology
 Angle Brackets      - in Tags            in Text
<AREA>               - Image Map
 Attribute           - Terminology
<B>             Bold - Styles
 BACKGROUND=         - Body
 BASE=               - Head
<BASEFONT>           - Font Size
=BASELINE            - Tables
 Base URL            - Terminology
 BGCOLOR=            - Body
=BLEEDLEFT           - Tables
=BLEEDRIGHT          - Tables
<BLOCKQUOTE>         - Layout
<BODY>               - Body
 BORDER              - Tables
 BORDER=             - Images             Tables
=BOTTOM              - Images Align       Tables Align
<BQ>      BlockQuote - Layout
<BR>      Line Break - Layout
 Browser             - Terminology
<CAPTION>            - Tables
 Case sensitivity    - in Tags            in Filenames
<CENTER>             - Layout
=CENTER              - Text Layout        Images Align  Tables Align
"CGI"                - Terminology        Image Maps    Forms Action
=CHECKBOX            - Forms
 CHECKED             - Forms
<CITE>               - Styles
 Client              - Terminology
<CODE>               - Styles
 COLOR=              - Font Colour
 Colour              - Body/Background    Text colour
 COLS=       Columns - Forms: Textarea
 COLSPAN=            - Tables
 COLSPEC=            - Tables
 Comments            - Comments
 COMPACT             - Unordered List     Data List
 CONTENT=  (in META) - Head
 COORDS=             - Image Map
 Counter             - Images: Web-Counter
<DD> Definition Data - in Data List
=DECIMAL             - Tables
 Destination Anchor  - Terminology
<DH>   Def'n Heading - in Data List
<DIR> Directory List - Lists
<DL> Definition List - Lists
!DOCTYPE             - Head
 Downloading
 DP=   Decimal Point - Tables
<DT>    Defined Term - in Data List
 Element             - Terminology
<EM>      Emphasised - Styles
 End Tag             - Terminology
 Entity              - Terminology        Symbols
<FONT>               - Font Size & Colour
<FORM>               - Forms
 Forms
 Fragment Identifier - Links Fragment
 ftp:                - Downloading
=GET                 - Forms Submission
.gif GIF image files - Images
 Glyphs              - Symbols
 Greater-Than symbol - in Tags            in Text
<H1>-<H6>            - Headings
 Hard Space          - Symbols
<HEAD>               - Head
 Head Anchor         - Terminology
 Headings            - Styles
 HEIGHT=             - Images
 Helper applications - Downloading
=HIDDEN              - Forms
<HR> Horizontal Rule - Layout
 HREF=               - HyperLinks         Image Map
"HTML"               - Introduction
<HTML>               - Anatomy
 HTML 2.0 and 3.0    - as defined by W3C
 Hyperlink           - Terminology        Description
 HyperText           - Introduction
<I>           Italic - Styles
 Icons    (proposed) - Symbols
 Identifier          - Terminology
 Image Maps
 Images
<IMG>  inlined Image - Images
<INPUT>              - Forms
 ISMAP               - Image Maps
"JPEG"   image files - Images
<KBD>       Keyboard - Styles
 Layout
=LEFT                - Images Align       Tables Align
 Less-Than symbol    - in Tags            in Text
<LH>    List Heading - Lists
<LI>       List Item - Lists
 Line Breaks         - when Not shown     Forcing
 LINK=    (Not used) - Head
 Links
<LISTING>            - UNUSED Style
 Lists
 mailto:             - in Hyperlink       in Forms Submit
.map     (extension) - Image Maps
<MAP>                - Image Map
 MAXLENGTH=          - Forms Input
<MENU>     Menu List - Lists
<META>               - Head
 METHOD=             - Forms Submission
=MIDDLE              - Images Align
 MULTIPLE            - Forms Select
 NAME=               - in HyperLink       in Forms      in META
 Newlines            - when Not shown     Forcing
<OL>    Ordered List - Lists
<OPTION>             - Forms Select
<P>  Paragraph break - Layout
=PASSWORD            - Forms Input
 PLAIN               - Unordered List
=POST                - Forms Submission
<PRE>   Preformatted - Layout
<Q>    paired Quotes - Styles
 Questionnaires      - Forms
 Quotation marks     - Symbols            Styles
<QUOTE>              - Styles
=RADIO               - Forms Input
 Relative URL        - Terminology        HyperLinks
 Reply forms         - Forms
=RESET               - Forms Input
=RIGHT               - Images Align       Tables Align
 ROWS=               - Forms Textarea
 ROWSPAN=            - Tables
<SAMP> SampleExample - Styles
<SELECT>             - Forms
 SELECTED            - Forms Select
"SGML"               - Standard General Markup Language
 SHAPE=              - Image Map
 SIZE=               - Form Input/Select  Font
 Source Anchor       - Terminology
 Spaces              - when Not shown     in PRE        Hard Space
 SRC=   image Source - Images
 Start Tag           - Terminology
<STRONG>             - Styles
 Styles
=SUBMIT              - Forms Submission   Forms Input
 Symbols
<TABLE>              - Tables
 Tables
 Tag                 - Terminology
 Tail Anchor         - Terminology
<TD>      Table Data - Tables
 Terminology
 TEXT=               - Body Colour
 Text colour         - Body               in Font
 Text markup
=TEXT                - Forms Input
<TEXTAREA>           - Forms
<TH>   Table Heading - Tables
<TITLE>              - Head
=TOP                 - Images Align       Tables Align
<TR>       Table Row - Tables
<TT>        TeleType - Styles
 TYPE=               - Forms Input
<U>     (Underlined) - UNUSED Style
<UL>  Unordered List - Lists
 UNITS=              - Tables
"URI"                - Terminology        Identifier
"URL"                - Terminology        HyperLink
 url-encoded         - Forms Submission
 USEMAP=             - Image Maps
 User Agent          - Terminology
 VALIGN=             - NOT Images Align   Tables Align
 VALUE=              - Forms Input        Forms Select
<VAR>       Variable - Styles
 VLINK= Visited Link - Body Colour
"W3C"                - World Wide Web Consortium
 Web-Counter         - Images: Web-Counter
 White Space         - in Text
 WIDTH=              - Horizontal Rule    Images        Tables
.xbm        Bit-maps - Image Source       Icons
<XMP>      (Example) - UNUSED Style



[Home]Return to my Home Page

Written (wrotten?) by: email: <john@protovale.co.uk> John Alldred;
please mail me if you spot any disastrous errors, or have any helpful suggestions.

Last revised 31st May 1998 (anchors corrected 13th March 1999)