The Internet, the World Wide Web, and Markup

June 23, 2009

Harvard University
Summer School

Course Web Site:

Instructor email:
Course staff email:


This Lecture's Topics

A form for lecture feedback will be available from the course web site. Please take two minutes to fill it out after you have seen the lecture.

Goals of CSCI S-12

Course Description
This course provides a comprehensive overview of website development. Students explore the prevailing vocabulary, tools, and standards used in the field and learn how the various facets—including XHTML, CSS, JavaScript, Ajax, multimedia, scripting languages, HTTP, clients, servers, and databases—function together in today's web environment. The course provides a solid web development foundation, focusing on content and client-side (browser) components (XHTML, CSS, JavaScript, multimedia), with an overview of the server-side technologies. In addition, software and services that are easily incorporated into a website (for example, maps, checkout, blogs, content management) are surveyed and discussed. Students produce an interactive website on the topic of their choice for the final project and leave the course prepared for more advanced and focused web development studies.

CSCI S-12, Fundamentals of Web Site Development

The Course

Course Syllabus | Course Schedule

Lectures and Sections



In addition to the texts, there will be online readings assigned and online references cited.

Required texts:

Freeman, Elisabeth and Eric Freeman. 2005. Head First HTML with CSS & XHTML. O'Reilly & Associates. 658 p. ISBN 0-596-10197-X

Robbins, Jennifer Niederst. 2006. Web Design in a Nutshell. O'Reilly & Associates. 826 p. ISBN 0-596-00987-9

Texts are available through:

The Internet

International Network

Wikipedia: Internet

The Internet in Pictures

Web zoom in
Image from the Opte Project, used under the Creative Commons 1.0 license.

The Internet in Pictures

internet map
"A model of Internet topology using k-shell decomposition"
Carmi, Shai et al. (2007) Proc. Natl. Acad. Sci. USA 104, 11150-11154

The Internet in Numbers

Types of Traffic on the Internet

International Network

The World Wide Web

Weaving the Web

The irony is that in all its various guises -- commerce, research, and surfing -- the Web is already so much a part of our lives that familiarity has clouded our perception of the Web itself.

Tim Berners-Lee in Weaving the Web

Happy 20th Birthday to the World Wide Web!

Scientific American: Happy 20th Birthday, World Wide Web

ZDNet: The Web Turns 20. In 1989, Tim Berners-Lee proposed an information system that would later be known as the World Wide Web.

Read Write Web: Happy 20th Birthday, World Wide Web.

The Web in Numbers

web page and world populations

"The Indexed Web contains at least 25.41 billion pages (Tuesday, 27 January, 2009)."
from Maurice de Kunder's site

The Web in Numbers

About 77 million active sites in June 2009.

netcraft survey
Source: Netcraft Web Server Survey

The Web in Numbers

Home Use, May 2009 | Value
Sessions/Visits per Person per Month | 37
Domains Visited per Person per Month | 70
Web Pages per Person per Month | 1,591
Page Views per Surfing Session | 42
PC Time Spent per Month | 38:00:14
Time Spent During Surfing Session | 1:02:11
Duration of a Web Page Viewed | 0:00:51

Source: Nielsen Online

Top 25 Sites for the United States


Top 25 Sites in the United States:

Source: Top Sites, United States, from Alexa: The Web Information Company

Thumbnail screenshots from

A Web Site over Time

The White House Site (

Features of the World Wide Web

Approaching the Web

Nature of the Web

Clients and Servers

client-server computing

The interaction between two programs when they communicate across a network. A program at one site sends a request to a program at another site and awaits a response. The requesting program is called a client; the program satisfying the request is called the server. (definition from The Internet Book, 2nd edition by Douglas E. Comer)
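
For HTTP, the request and the response in this definition are plain text messages. The following is a minimal Python sketch of that exchange, working on a canned response instead of a real network connection. The function names build_request and parse_response, the host name, and the response text are all made up for illustration.

```python
def build_request(host: str, path: str = "/") -> str:
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw: str):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# Hypothetical response a server might send back.
CANNED_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<p>Hello</p>\n"
)

print(build_request("www.example.com", "/index.html"))
print(parse_response(CANNED_RESPONSE))
```

The client is the side that calls build_request and waits; the server is the side that produces something shaped like CANNED_RESPONSE.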

Client-Server Computing

Client-Server Architecture from Webopedia

HTTP Client and Server

HTTP browser and server

An HTTP Transaction

HTTP Transaction:

The HTTP client gets the content, makes additional requests for CSS, images, JavaScript, etc., and then renders the page:
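
To see which additional requests a page triggers, one can scan its markup for references to other resources. The sketch below uses Python's standard html.parser module and is only a rough approximation of what a browser does. The class name SubresourceCollector and the sample page are hypothetical, and of the file names only planets.css appears in this lecture.

```python
from html.parser import HTMLParser


class SubresourceCollector(HTMLParser):
    """Collect URLs a browser would request after receiving the page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet":
            self._add(attrs.get("href"))
        elif tag in ("img", "script"):
            self._add(attrs.get("src"))

    def _add(self, url):
        if url:
            self.urls.append(url)


# Hypothetical page used only for illustration.
PAGE = """<html><head>
<link rel="stylesheet" href="planets.css" />
<script src="planets.js"></script>
</head><body><img src="images/saturn.png" alt="Saturn" /></body></html>"""

collector = SubresourceCollector()
collector.feed(PAGE)
print(collector.urls)
# ['planets.css', 'planets.js', 'images/saturn.png']
```

Each collected URL corresponds to one more HTTP transaction before the page can be fully rendered.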

White House

URL: A Web Address

URL Anatomy

Names and Locations: URLs, URIs, and URNs


For those who truly wish to find out more of the details, see Untangle URIs, URLs, and URNs by Dan Connolly

Web Parts

HTTP Client




Client-side Web Parts: Markup, Style, Function

web parts

Our Solar System: Markup


Our Solar System: Markup + Style


Our Solar System: Markup + Style + Function



XHTML: Solar System Example

Example 1.1 - Solar System - Structure, Style, and Function - View example by itself



(X)HTML Document Structure

A Tree

XHTML document as a tree

A "tree" structure view of XHTML produced by Amaya, the open source Web editor/browser from the W3C.
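
The same kind of tree view can be produced programmatically. Here is a small Python sketch using the standard xml.etree.ElementTree module; the sample document and the function name outline are made up for illustration, and this is not how Amaya itself works.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document (no DOCTYPE or namespace).
DOC = """<html>
  <head><title>Our Solar System</title></head>
  <body>
    <h1>Our Solar System</h1>
    <ul><li>Mercury</li><li>Venus</li></ul>
  </body>
</html>"""


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(DOC))))
```

Every element has exactly one parent (except html, the root), which is what makes the document a tree.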

Amaya Screenshot

Components of XHTML Elements

A Hypertext Link

Markup for a Hypertext link:

<a href="">Harvard</a>

element anatomy

Start Tag
<a href="">Harvard</a>

Element Name
<a href="">Harvard</a>

<a href="">Harvard</a>

Attribute Value
<a href="">Harvard</a>

<a href="">Harvard</a>

End Tag
<a href="">Harvard</a>
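
An XML parser takes the element apart into the same components. A short Python sketch follows, assuming a placeholder href value (the link target on the slide is not reproduced here):

```python
import xml.etree.ElementTree as ET

# The href value is a placeholder chosen for illustration.
link = ET.fromstring('<a href="http://www.example.com/">Harvard</a>')

print(link.tag)     # element name: a
print(link.attrib)  # attribute name mapped to attribute value
print(link.text)    # content between the start tag and the end tag: Harvard
```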


The XHTML page references an external stylesheet document.

The CSS file contains style rules for the document (planets.css)

Contents of the CSS file:

Harvard Summer School

Harvard Summer School 2009

HSS site

With CSS disabled:

HSS site without CSS

Examples from: CSS Zen Garden

CSS Zen Garden: The Beauty in CSS Design. A demonstration of what can be accomplished visually through CSS-based design.

Screenshots: Zen Garden with no CSS, the Zen Garden source, and six Zen Garden CSS designs.

Benefits of Web Standards

Question: What do the White House, GE, IBM, Library of Congress, EDS, Stanford, AGFA, Abbott, and Princeton have in common?
Answer: They adhere to Web standards.
Some sites that adhere to Web standards

Standards for:

Web Standards Project


Web Standards Project
The Web Standards Project is a grassroots coalition fighting for standards which ensure simple, affordable access to web technologies for all.

What are Web Standards and why should I use them?

Creating Markup Languages: SGML and XML

relationship between sgml, xml, html and xhtml

Defining Markup Languages

Specific Markup Languages



Main differences between SGML and XML:

SGML: End tags can be defined as optional

<img src="images/lake.png" alt="Lake" >

XML: End tags are always required (even for "empty" elements)

<img src="images/lake.png" alt="Lake" />

SGML: Element and attribute names are not case-sensitive
XML: Element and attribute names are case-sensitive

SGML: Attribute values do not need to be in quotes if the values contain alphanumeric characters only
XML: Attribute values must always be in quotes
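
These XML rules can be checked mechanically by handing markup to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the wrapping p elements are made up for illustration. It tests well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


html_style = '<p><img src="images/lake.png" alt="Lake" ></p>'    # img never closed
xhtml_style = '<p><img src="images/lake.png" alt="Lake" /></p>'  # empty element closed
unquoted = '<p><img src=images/lake.png alt="Lake" /></p>'       # unquoted attribute value
mixed_case = '<P><img src="images/lake.png" alt="Lake" /></p>'   # P and p do not match

for sample in (html_style, xhtml_style, unquoted, mixed_case):
    print(is_well_formed(sample), sample)
```

Only the second sample is accepted; the other three each break one of the XML rules listed above.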

Well-formed XML documents

Well-formed and Valid XML Documents



Validate against the rules (elements, attributes, content models) of a specific grammar (e.g. XHTML 1.0 Transitional).

Ways of expressing grammars:

HTML (SGML) and XHTML (XML) Compared

A valid HTML (SGML) document is not necessarily a valid XHTML (XML) document:

html and xhtml

Valid HTML 4.01 Transitional (SGML)

Valid XHTML 1.0 Transitional (XML)

Good practices for HTML

Examples of Other Markup Languages

Markup languages created using SGML or XML are "applications" of SGML or XML.

relationship between sgml, xml, html and xhtml

Math Markup Language

MathML Markup

Displayed in a browser:

Open Document Format



Timeline of Web Markup and Style Standards

timeline of web standards

Some Markup Specifications

Markup Evolution

markup evolution

The Ghosts of Markup Past, Present and Yet to Come


Past

  • HTML 2.0
  • HTML 3.2
  • HTML 4.0 Strict
  • HTML 4.0 Transitional


Present

  • XHTML 1.1
  • XHTML 1.0 Strict
  • XHTML 1.0 Transitional
  • HTML 4.01 Strict
  • HTML 4.01 Transitional

Yet to Come

  • HTML 5 / XHTML 5
  • XHTML 2.0

"Strict" vs "Transitional"

Flavors have to do with separation of markup and presentation.

Markup and Presentation - Transitional

Markup and Presentation - Strict

Some elements in XHTML 1.0 Transitional not found in Strict

  • basefont
  • center
  • dir
  • font
  • s
  • strike
  • u

Some attributes in XHTML 1.0 Transitional not found in Strict

  • background
  • bgcolor
  • color
  • marginheight
  • marginwidth
  • face
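
As a rough aid when moving a page from Transitional to Strict, markup can be scanned for the names listed above. This Python sketch uses the standard html.parser module and covers only the elements and attributes named in these two lists, not the full difference between the DTDs. The class name and sample markup are hypothetical, and this is not a substitute for validation.

```python
from html.parser import HTMLParser

# Names taken from the two lists above (not the complete set in the DTDs).
TRANSITIONAL_ELEMENTS = {"basefont", "center", "dir", "font", "s", "strike", "u"}
TRANSITIONAL_ATTRIBUTES = {
    "background", "bgcolor", "color", "marginheight", "marginwidth", "face",
}


class PresentationalMarkupFinder(HTMLParser):
    """Record uses of presentational elements and attributes."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag in TRANSITIONAL_ELEMENTS:
            self.findings.append(f"element {tag}")
        for name, _value in attrs:
            if name in TRANSITIONAL_ATTRIBUTES:
                self.findings.append(f"attribute {name} on {tag}")


finder = PresentationalMarkupFinder()
finder.feed('<body bgcolor="#ffffff"><center><font color="red">Hi</font></center></body>')
print(finder.findings)
# ['attribute bgcolor on body', 'element center', 'element font', 'attribute color on font']
```

Each finding marks presentation that would move into CSS in a Strict document.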

Document Type Declaration and Document Type Definition (DTD)

The Document Type Declaration for an XHTML 1.0 strict document is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"

A closer look at the components follows:

Common XHTML and HTML Document Type Declarations and DTDs

Some Document Type Declarations for HTML documents. Remember that the HTML document must conform to the rules of the Document Type Definition that is referenced in the Document Type Declaration:



Effects of the Document Type Declaration

DOCTYPE statement used by:


XHTML 1.0 The Extensible HyperText Markup Language (Second Edition) from the W3C. XHTML 1.0 is a reformulation of HTML 4.0 in XML 1.0. It comes in three "flavors": strict, transitional, and frameset.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"

Documentation for XHTML 1.0

XHTML Elements

Grouped by modules defined by XHTML modularization.

Our Solar System

Example 1.2 - Our Solar System - View example by itself



Structural: html, head, body, title


Firefox Firebug HTML inspector:
structural XHTML elements

Document Type Definition (DTD) for XHTML

A DTD defines the rules (elements, attributes, content model) for a markup language.
Note that the format for a DTD is not XML or HTML.

Elements, Attributes, Entities

html element defined:

<!ELEMENT html (head, body)>

attributes for img element defined:

  src         %URI;          #REQUIRED
  alt         %Text;         #REQUIRED
  longdesc    %URI;          #IMPLIED
  height      %Length;       #IMPLIED
  width       %Length;       #IMPLIED
  usemap      %URI;          #IMPLIED
  ismap       (ismap)        #IMPLIED

Parameter Entities

<!ENTITY % coreattrs
 "id          ID             #IMPLIED
  class       CDATA          #IMPLIED
  style       %StyleSheet;   #IMPLIED
  title       %Text;         #IMPLIED">

Character Entities

<!ENTITY nbsp   CDATA "&#160;" -- no-break space = non-breaking space,
                                  U+00A0 ISOnum -->
<!ENTITY iexcl  CDATA "&#161;" -- inverted exclamation mark, U+00A1 ISOnum -->
<!ENTITY cent   CDATA "&#162;" -- cent sign, U+00A2 ISOnum -->
<!ENTITY pound  CDATA "&#163;" -- pound sign, U+00A3 ISOnum -->
<!ENTITY curren CDATA "&#164;" -- currency sign, U+00A4 ISOnum -->
<!ENTITY yen    CDATA "&#165;" -- yen sign = yuan sign, U+00A5 ISOnum -->
<!ENTITY brvbar CDATA "&#166;" -- broken bar = broken vertical bar,
                                  U+00A6 ISOnum -->
<!ENTITY sect   CDATA "&#167;" -- section sign, U+00A7 ISOnum -->
<!ENTITY uml    CDATA "&#168;" -- diaeresis = spacing diaeresis,
                                  U+00A8 ISOdia -->
<!ENTITY copy   CDATA "&#169;" -- copyright sign, U+00A9 ISOnum -->

Reading the DTD

Three main things to be concerned with:

  1. Content Model
  2. Attribute List
  3. Expanding defined entities (e.g. %Block, %Inline)

Some notations important for reading DTDs:

*, asterisk
zero or more
+, plus
one or more
( ), parentheses
grouping
|, pipe
choice ("or")


What are the content model and attributes for the element "body"?

Content Model for body

Content model for body:

%Block entity referred to:


%Block entity defined:
<!ENTITY % Block "(%block; | form | %misc;)*">
Expand the entities referred to (%block; and %misc;):

%Block fully expanded:
"(p | h1 | h2 | h3 | h4 | h5 | h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table | form | noscript | ins | del | script)*"

Content model for "body"

(p | h1 | h2 | h3 | h4 | h5 | h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table | form | noscript | ins | del | script)*
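
Expanding entities by hand is tedious, so here is a small Python sketch that does it by repeated substitution. The entity definitions are written from memory of the XHTML 1.0 Strict DTD and should be treated as assumptions to check against the DTD itself. The helper names expand and names are made up for illustration.

```python
import re

# Parameter entities as recalled from the XHTML 1.0 Strict DTD (assumption:
# verify against the DTD before relying on them).
ENTITIES = {
    "heading": "h1|h2|h3|h4|h5|h6",
    "lists": "ul | ol | dl",
    "blocktext": "pre | hr | blockquote | address",
    "block": "p | %heading; | div | %lists; | %blocktext; | fieldset | table",
    "misc.inline": "ins | del | script",
    "misc": "noscript | %misc.inline;",
    "Block": "(%block; | form | %misc;)*",
}

REFERENCE = re.compile(r"%([A-Za-z][\w.]*);")


def expand(text: str) -> str:
    """Replace %name; references until none remain."""
    while REFERENCE.search(text):
        text = REFERENCE.sub(lambda m: ENTITIES[m.group(1)], text)
    return text


def names(content_model: str):
    """List the element names in a content model, in order."""
    return re.findall(r"[A-Za-z][\w.]*", content_model)


print("(" + " | ".join(names(expand("%Block;"))) + ")*")
```

Under these assumed definitions, the printed result is the same content model for body shown above.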

Attributes for body


Attributes: onload, onunload, %attrs


%attrs is: %coreattrs; %i18n; %events;


Reading the XHTML Specification

You can start with the HTML 4.01 Specification

XHTML in particular...

DTD: Is there a better way?

Documentation produced from the DTD:

Other ways to define XML Markup Languages


The a element.
[a Element, DTDParse]
Example 1.3 - Hypertext - View example by itself


Absolute and Relative Directions

Directions to the "Greenhouse Cafe"

greenhouse cafe

Absolute:
"Go to 1 Oxford St, Cambridge, MA 02138. Go in the west entrance. Lecture Hall A will be on your right. Go straight.
In the lobby area, take a slight right and then a left."

Relative (to Science Center Lecture Hall A):
"Exit the lecture hall, take a right.
In the lobby area, take a slight right and then a left."

Greenhouse Cafe image from Flickr user Felix42 contra la censura

Absolute and Relative URLs

See Resolving Relative URIs in the Links section of the HTML 4.01 specification for more details.

Relative locations (URLs) are resolved according to the location (URL) of the containing (starting) document!

Absolute or Fully Qualified URLs

Absolute, or fully-qualified, URLs specify the complete information (scheme, host, port, path).

<a href=""> Diplomacy of Lewis and Clark stressed in exhibit</a>
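
The parts of an absolute URL can be pulled apart with Python's standard urllib.parse module. A brief sketch, using a hypothetical URL chosen only to show every component:

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
parts = urlsplit("http://www.example.com:8080/news/2009/story.html?page=2#top")

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /news/2009/story.html
print(parts.query)     # page=2
print(parts.fragment)  # top
```

When the port is omitted from an http URL, parts.port is None and the client uses the default port for the scheme (80 for http).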

Relative or Partial URLs

Relative, or partial, URLs specify only part of the information. The parts not provided are resolved from the current location (or from a base element, or from metadata in the HTTP response).

<a href="slide1.html">Slide 1</a>

Relative to Server Root

Is this relative or absolute? The scheme, host, and port are resolved from the current location, but the path is absolute.

<a href="/copyright.html">copyright information</a>
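
Resolution of the different kinds of relative URL can be tried out with urljoin from Python's standard urllib.parse module. In this sketch, the location of the containing document and the image path are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical location of the containing document.
base = "http://www.example.com/lectures/markup/index.html"

print(urljoin(base, "slide1.html"))
# http://www.example.com/lectures/markup/slide1.html
print(urljoin(base, "/copyright.html"))
# http://www.example.com/copyright.html
print(urljoin(base, "../images/lake.png"))
# http://www.example.com/lectures/images/lake.png
print(urljoin(base, "http://www.w3.org/"))
# http://www.w3.org/
```

A plain file name replaces the last path segment, a leading slash replaces the whole path, "../" climbs one directory, and an absolute URL ignores the base entirely.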

Relative Paths to Parent Locations

Relative Link Containing Document

Relative links are "transportable":

Relative Link Containing Document


Text: Heading

h1 , h2 , h3 , h4 , h5 , h6
Example 1.4 - Headings h1, h2, h3, h4, h5, h6 - View example by itself

A Third Level Heading

Lorem ipsum dolor sit amet, consectetuer adipiscing elit...

A Fourth Level Heading

Lorem ipsum dolor sit amet, consectetuer adipiscing elit...

A Fifth Level Heading

Lorem ipsum dolor sit amet, consectetuer adipiscing elit...

A Sixth Level Heading

Lorem ipsum dolor sit amet, consectetuer adipiscing elit...



Heading elements (h1,h2,etc.) combined with CSS are very powerful. Headings can remain headings in markup and CSS can style them as desired.

Harvard College Admissions page with headings highlighted

Text: Block

div , p , address , blockquote , pre

div and p

Example 1.5 - 'div' and 'p' - View example by itself
Division (div) elements are block-level and will be very useful when we discuss stylesheets.

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed feugiat nisi at sapien. Phasellus varius tincidunt ligula. Praesent nisi. Duis sollicitudin. Donec dignissim, est vel auctor blandit, ante est laoreet neque, non pellentesque mauris turpis eu purus.

Suspendisse mollis leo nec diam. Vestibulum pulvinar tellus sit amet nulla fringilla semper. Aenean aliquam, urna et accumsan sollicitudin, tellus pede lobortis velit, nec placerat dolor pede nec nibh. Donec fringilla. Duis adipiscing diam at enim. Vestibulum nibh.

Proin sollicitudin ante vel eros. Nunc tempus. Quisque vitae quam non magna mattis volutpat. Ut a risus. Fusce bibendum sagittis magna.

Curious about the Lorem Ipsum text?


blockquote, address


Example 1.6 - 'blockquote' and 'address' - View example by itself

In his I Have a Dream speech delivered in August 1963, Martin Luther King Jr. said:

I have a dream that one day this nation will rise up and live out the true meaning of its creed: We hold these truths to be self-evident that all men are created equal.

I have a dream that one day on the red hills of Georgia the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood.

I have a dream that one day even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice.

I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character. I have a dream today!

I have a dream that one day, down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of interposition and nullification; one day right there in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers. I have a dream today!

I have a dream that one day every valley shall be exalted, and every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight, and the glory of the Lord shall be revealed and all flesh shall see it together.



Example 1.7 - 'address' element - View example by itself
The Science Center is home to several large lecture halls, the Greenhouse Cafe, computer labs, the Cabot Science Library, and the Museum of Historical Scientific Instruments. The address of the Science Center is:
1 Oxford St., Cambridge, Massachusetts, 02138


Whitespace (spaces, tabs, carriage returns, and line feeds) is generally "collapsed" in XHTML. If you need a line break, you can use the br element.

Example 1.8 - Whitespace in Markup - View example by itself

Whitespace (spaces, tabs, carriage returns, and line feeds) is generally "collapsed" in XHTML. If you need
a line break,
you can use the br element.



pre: Where whitespace is important!
Example 1.9 - 'pre' element - View example by itself
Boston Forecast (°F)
     High  Low
Wed   25    11
Thu   18    11
Fri   31    27
Sat   38    20
Sun   43    34

Here is the same source, except this time in a p element.

Example 1.10 - Whitespace in 'p' - View example by itself

Boston Forecast (°F) High Low Wed 25 11 Thu 18 11 Fri 31 27 Sat 38 20 Sun 43 34
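
The difference between the two renderings comes down to collapsing every run of whitespace into a single space. The Python sketch below approximates that behavior; the function name collapse_whitespace is made up, and real browsers apply more detailed rules (for example, a non-breaking space is not collapsed).

```python
PRE_SOURCE = """Boston Forecast (°F)
     High  Low
Wed   25    11
Thu   18    11
Fri   31    27
Sat   38    20
Sun   43    34
"""


def collapse_whitespace(text: str) -> str:
    """Replace every run of spaces, tabs, and line breaks with one space."""
    return " ".join(text.split())


print(collapse_whitespace(PRE_SOURCE))
# Boston Forecast (°F) High Low Wed 25 11 Thu 18 11 Fri 31 27 Sat 38 20 Sun 43 34
```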


Text: Inline

Text Chapter from HTML 4.01 Specification

abbr , acronym , br , cite , code , dfn , em , kbd , q , samp , span , strong , var

Example 1.11 - 'strong', 'bold', 'em', and 'i' elements - View example by itself

Strong text and bold text should not be confused. They may be rendered in the same way on visual browsers. However, remember that "strong" is semantic and "bold" is presentational.

Likewise, emphasized text should not be confused with italicized text. The former (em) is semantic, the latter (i) is presentational.

Example 1.12 - 'span' element and style - View example by itself

span elements are useful in CSS. They are an inline partner with the block level div elements.

Example 1.13 - 'abbr' element - View example by itself

Web clients and servers communicate via HTTP.



Example 1.14 - 'acronym' element - View example by itself

NASA was founded in 1958.



Example 1.15 - 'q' element - View example by itself

Martin Luther King Jr. said, Injustice anywhere is a threat to justice everywhere.



List Chapter from HTML 4.01 Specification

ul , li , ol , dl , dt , dd

Example 1.16 - Unordered List - View example by itself

Some of my favorite food categories:

  • Tea
  • Bread
  • Cheese
  • Chips
  • Ice Cream
Example 1.17 - Nested Unordered List - View example by itself
  • Tea
    • Kenyan
    • Sikkim Temi
    • Formosa Oolong Fancy
  • Potato Chips
    • Dirty's
    • Art's and Mary's
    • Tim's Cascade
Example 1.18 - Ordered List - View example by itself
  1. Boil water
  2. Measure tea (approximately 1 tsp. per 6 oz. cup)
  3. Steep tea for 3 to 5 minutes
  4. Enjoy!
Example 1.19 - Dictionary Lists (terms and definitions) - View example by itself
Bread
    a usually baked and leavened food made of a mixture whose basic constituent is flour or meal
Butter
    a solid emulsion of fat globules, air, and water made by churning milk or cream and used as food

Lists and CSS

Lists combined with CSS are very powerful. Lists can remain lists in markup (navigation, content items, etc.) and CSS can style them as desired.

Harvard College Admissions page with lists highlighted


Objects, Images, Applets Chapter from HTML 4.01 Specification


HTML documents do not contain the images themselves, but merely contain references to the images to be displayed. Common image file types are:
Example 1.20 - 'img' element - View example by itself
Harvard University Extension School Logo
Example 1.21 - 'img' element wrapped in an 'a' - View example by itself


Tables Chapter from HTML 4.01 Specification

Tables are great for data.

Tables are often co-opted for page layout purposes (something better left to CSS).

Basic Tables
table , tr , td , th , caption , thead , tbody , tfoot

Example 1.22 - 'table' element - View example by itself
A table
Column 1 Heading | Column 2 Heading | Column 3 Heading
row 1 column 1   | row 1 column 2   | row 1 column 3
row 2 column 1   | row 2 column 2   | row 2 column 3
row 3 column 1   | row 3 column 2   | row 3 column 3

Table Head and Table Body

Tables are constructed with thead and tbody. JavaScript can then be used to make the table "sortable" and "striped".


Tables within Tables

Example 1.23 - Nested Tables (if you must) - View example by itself
row 1 column 1
t2 r1c1 t2 r1c2
t2 r2c1 t2 r2c2
row 1 column 2 row 1 column 3
row 2 column 1 row 2 column 2 row 2 column 3
row 3 column 1 row 3 column 2 row 3 column 3

<!-- Comments -->

Example 1.24 - Comments in Markup - View example by itself

Comments can be inserted into XHTML and HTML markup. Just as with any code, making good use of comments in your markup is a good habit that will reap rewards for the person (who may be you) who has to edit or alter the page.


XHTML/HTML Character Entities

XHTML/HTML Character entities can be defined by

Character Entities Defined for XML/SGML

Critical character entities are:

Character Entities defined specifically for XHTML/HTML

And because we have deficient input devices...

Copyright symbol ©:

List of XHTML 1.0 Entity Sets and Character Entities
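
Character entities can be produced and decoded programmatically. This short Python sketch uses the standard html module; the sample strings are made up for illustration.

```python
import html

# Escape the characters that have special meaning in markup.
print(html.escape('<a href="x.html">Tom & Jerry</a>'))
# &lt;a href=&quot;x.html&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# Decode a named entity and the equivalent numeric character reference.
print(html.unescape("&copy; 2009"))
# © 2009
print(html.unescape("&#169; 2009"))
# © 2009
```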

CDATA Sections

SGML and XML allow for "character data" (CDATA) sections, where you can have raw <, >, and & in the content. This is useful for including markup code within markup (as well as a few other use cases):
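
The effect of a CDATA section can be observed with an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module on a made-up sample. Inside the CDATA section, the markup is treated as character data and is not parsed as elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: markup quoted inside a CDATA section.
DOC = '<pre><![CDATA[<a href="x.html">Tom & Jerry</a>]]></pre>'

element = ET.fromstring(DOC)
print(element.text)  # the raw characters, including <, >, and &
print(len(element))  # number of child elements: 0
```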



Getting Equipped...

While you are a student at Harvard, you may download and use a variety of software packages through FAS Information Technology Software Downloads. Note that "keyed" software must be run from a computer that is on an on-campus network (which includes VPN).

Web Hosting

For this course, Harvard will be your web hosting service. A server is dedicated to this course. Details on accessing (SSH and FTP) will be provided in Assignment 1.

HTTP Clients

Web Browser - It isn't just for surfing anymore.

Your web browser is not only useful for "using" the web, but it can be a powerful tool for web development.

HTTP Clients - Browser Statistics

grains of salt
Grains of Salt photo from Flickr user kevindooley

Net Applications

Source: Browser Market Share from Net Applications (January 2009)

W3 Schools

Source: (January 2009)

Harvard iSites (Course Sites)

Source: Harvard, iCommons (December 2008 and January 2009)

Text or HTML Editor

Please start with your favorite text editor!





Markup Checking and Validation Resources


SSH (Remote Login)

PuTTY Screenshots



FTP and SFTP (Move files from here to there and back again)


URI to Filename Mapping

User directories

Web documents for each user are kept in the user's home directory, in a directory traditionally named public_html. For example, for the user jharvard, whose home directory is /home/courses/j/h/jharvard:
URI: /index.html
File: /home/courses/j/h/jharvard/public_html/index.html

Document Root

The Web documents are typically kept under a single directory, traditionally named htdocs. The full path to this directory is called the "document root" of the Web server, for example, /www/htdocs.
File: /www/htdocs/museumx/index.php
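
The two mappings above can be sketched as a single function. This is a simplified Python illustration, not how any particular web server is implemented. The "/~user/" URI convention for user directories and the HOME_DIRECTORIES lookup table are assumptions; the paths come from the examples above.

```python
import posixpath

DOCUMENT_ROOT = "/www/htdocs"
# Hypothetical lookup table; a real server would ask the operating system.
HOME_DIRECTORIES = {"jharvard": "/home/courses/j/h/jharvard"}


def uri_to_filename(uri_path: str) -> str:
    """Map the path part of a URI to a file on the server."""
    if uri_path.startswith("/~"):
        # Assumed "/~user/..." convention for per-user directories.
        user, _, rest = uri_path[2:].partition("/")
        root = posixpath.join(HOME_DIRECTORIES[user], "public_html")
    else:
        root, rest = DOCUMENT_ROOT, uri_path.lstrip("/")
    filename = posixpath.normpath(posixpath.join(root, rest))
    # Refuse paths such as "/../etc/passwd" that climb out of the root.
    if filename != root and not filename.startswith(root + "/"):
        raise ValueError("path escapes the document root")
    return filename


print(uri_to_filename("/museumx/index.php"))
# /www/htdocs/museumx/index.php
print(uri_to_filename("/~jharvard/index.html"))
# /home/courses/j/h/jharvard/public_html/index.html
```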

Directory Requests and "index.html"

URL paths that map to a directory. For example:

Copyright © 1998-2009 David P. Heitmeyer
