XML, VoiceXML, XLink, XHTML, XBRL,  XForm, 
XSLT, RDF and Semantic Web Watch

Bob Jensen at Trinity University

This is a threaded discussion about meta languages and extensions of SGML.   I cannot stress enough how much these newer technologies will affect e-Commerce, education, and the Web in general.

The document below is a disorganized collection of threads on XML and related topics.  For a more organized introduction to these topics, go to Jensen's Overview and Timeline of OLAP, GML, SGML, HTML, XML, RDF, and XBRL at http://www.trinity.edu/rjensen/XBRLandOLAP.htm 

WEB TIMELINE
Hypertext ---> PC ---> GUI, Mouse ---> GML, SGML ---> Internet ---> Hypermedia ---> HTML, HTTP, WWW --->
DYNAMIC WEB TIMELINE
CGI, Java, JavaScript, DHTML, ActiveX, ASP ---> XML ---> RDF ---> OLAP ---> XBRL

Overview of XML and RDF

Accounting Relational Databases Versus XML Databases

XForm News

Selected News Items

Selected Online References

Selected Offline References

Selected Software Alternatives for XML Authoring

OLAP Online Analytical Processing and Pivot Tables 

Summary of XBRL and Business Reporting on the Internet  

XBRL Demos 

Frequently Asked Questions (FAQs)

What kind of language is XSLT?

The Semantic Web

The Dark Side of XML

IETF  "standards"

VoiceXML 

Related Documents

Extended Summary of OLAP http://www.trinity.edu/rjensen/XBRLandOLAP.htm  

Extended Summary of XBRL http://www.trinity.edu/rjensen/XBRLandOLAP.htm 

Extended Summary of HTML http://www.trinity.edu/rjensen/XBRLandOLAP.htm 

Extended Summary of XML http://www.trinity.edu/rjensen/XBRLandOLAP.htm 

Bob Jensen's Technology Glossary

Working Paper 260:  Network Databases: Past, Present, and Future

Overview of XML and RDF --- The Next Big Things on the Web

A March 15, 2001 message from Neal Hannon recommends the following three references:

The best place to start (for learning about XML) is a Scientific American article, http://www.sciam.com/1999/0599issue/0599bosak.html  written by Jon Bosak and Tim Bray. Bosak and Bray were on the original XML working group. The article is short, readable and lays out the basic concepts of XML.

Next, try my XML resource page, located at http://web.bryant.edu/~nhannon/xbrl/xml.htm . At that site, I have gathered several articles that focus on the user's side of XML. For books, I recommend XML, A Manager's Guide by Kevin Dick (Addison Wesley).

Neal Hannon [nhannon@HOME.COM] 


Avoiding Information Overload: Knowledge Management on the Internet --- http://www.jisc.ac.uk/techwatch/reports/tsw_02-02.html 

Keywords: search, knowledge management, XML, metadata, RDF, ontology, agent

It is estimated that there are over two billion Web pages, and thousands of newsgroups and forums, on the Internet - covering virtually every topic imaginable. However, many users find that searching the Internet can be a time consuming and tedious process. Even experienced searchers sometimes run into difficulties. To fully benefit from the potential opportunities of the Internet, both Web site developers and users need to be aware of the tools and techniques for managing and retrieving online knowledge.

This has driven the development of improved search and information retrieval systems. However, we now need sophisticated information extraction (and/or summary) capabilities to present the user only with the information they need, rather than a large set of relevant documents to read.

Search service providers, Web portals, and amalgamations of community Web sites could all help their users to benefit today, just by adopting the current generation of knowledge management systems, particularly those with effective information extraction capabilities.

Metadata has a very useful role to play, but it has limitations with regard to information extraction.

One of the key opportunities of the XML initiative is to allow structure and (indirectly) "meaning" to be embedded into the content of the resource itself. XML provides the much needed data structure for computer-to-computer interaction. The availability of good user-friendly, and "intelligent", tools will be critical in persuading the wider community to adopt XML as an alternative to HTML.

It is probably reasonable to state that the current generation of knowledge management systems is an interim measure, to be superseded by AI systems in the long-term. Such systems will probably be able to process natural language and XML encoded content.

The success of Internet based knowledge management, and the Semantic Web, will require the development and integration of various data standards, ontology definitions, and knowledge management and agent technologies. It will take a concerted and significant effort to get there. The likely longer-term benefits are much more effective Internet searches and smart information extraction services, which present the user with concise relevant extracts.

In the meantime, perhaps we should also think about how authors represent knowledge and present information, and how users apply knowledge, in a more structured and meaningful way.


From Ecommerce Discussion Digest on August 1, 2001

What is XML? A technical definition of XML, or Extensible Markup Language, is "a document markup language for defining structured information" ---  http://html.about.com/library/weekly/aa091500a.htm#markup

http://www.compuware.com/products/fileaid/cs/  --- COMPUWARE's product supports it?

 http://www.xmlspy.com/ --- XML development environment?

 http://www.netwind.com/html/xml.html --- XML Development Courses?

 http://www.online-learning.com/  --- XML Online (Internet) Courses?

 http://www.learnkey.com/lkweb/Products/XML/index.asp  --- Courses from "Industry Experts"?

 http://www.citrixiforum.com/iForumGuest/cds/host.dll  --- Forum Topics?

 http://www.netwind.com/html/xml.html  --- Training Videos and CDs?


Paul Adams works up a lather over the Simple Object Access Protocol, a fast, easy, XML-based way for Web apps to talk to each other --- http://hotwired.lycos.com/webmonkey/02/08/index0a.html 


Deconstructing Babel: XML and application integration
XML may not yet be a true "silver bullet," but it can be used to great effect in integration projects if IT managers create a detailed plan that can overpower its weaknesses.
"Deconstructing Babel: XML and application integration," By Henry Balen, Application Development Trends, December 2000 --- http://www.adtmag.com/ 

XML: A brief description
The eXtensible Markup Language, or XML, came out of the world of the Standard Generalized Markup Language (SGML). Since its introduction a little more than three years ago, XML has spawned a set of technologies that allow users to manipulate XML documents programmatically and use style sheets to perform transformations.

Initially, XML was developed to overcome the shortcomings of HTML, a markup language containing stylistic information. The aim of XML's developers was to create a language that was easy to use over the Internet, supported by a wide variety of applications, compatible with SGML and legible to humans. Like its ancestor, SGML, XML separates content from style.

A typical XML document is hierarchical. It is made up of elements defined by tags. The code below is an example of a simple XML document. A document type definition (DTD), or XML Schema, is used to define the structure of a document. It was originally envisioned that the presentation of the information within an XML document could be viewed with an XML browser and associated style sheet. Now it is not uncommon for the style sheet to be applied on the server side and the XML translated into HTML.

 


<?xml version="1.0"?>
<!DOCTYPE REPORT SYSTEM "report.dtd" [
<!ENTITY vendor SYSTEM "vendor.xml">
]>
<REPORT>
<TITLE>Acme Corp Merger Rumours</TITLE>
<SOURCE>
     <PROVIDER TYPE="wire">Good News</PROVIDER>
     <AUTHOR>Brenda Star</AUTHOR>
</SOURCE>
<ITEM>
     <PARAGRAPH>
     The grapevine is working overtime on the subject of a possible Acme Corp
     merger and Acme would like to take everyone off the clock.
     </PARAGRAPH>
     <PARAGRAPH>
     Gossip concerning Acme has been circulating ever since the recent restructuring.
     </PARAGRAPH>
</ITEM>
</REPORT>

An XML document is referred to as well formed if it conforms to the XML standard, and correct (or valid) if it complies with a DTD or Schema. At the core of any XML application is an XML parser. All XML parsers will check that the documents they receive are well formed, and most can also check to see if the document is valid. A Simple API for XML (SAX) parser has become the de facto standard for event-driven parsers. Most XML parsers are either event-driven or produce an in-memory Document Object Model (DOM) instance of the document. The one you use depends on the application and memory requirements. Producing a DOM tree requires more memory, but it can provide greater programmatic flexibility. SAX may be suitable for applications that need smaller memory footprints, and it can process the XML document as a stream of events.
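
The difference between the two parser styles can be sketched with Python's standard library (xml.sax for event-driven parsing, xml.dom.minidom for an in-memory DOM). This is only an illustration and is not part of Balen's article; the shortened REPORT document and the names ParagraphCounter, count_with_sax, and count_with_dom are assumptions made for the example.

```python
import xml.sax
from xml.dom import minidom

# A shortened REPORT document (no DTD reference, so no external files are needed).
REPORT_XML = """<?xml version="1.0"?>
<REPORT>
  <TITLE>Acme Corp Merger Rumours</TITLE>
  <ITEM>
    <PARAGRAPH>The grapevine is working overtime.</PARAGRAPH>
    <PARAGRAPH>Gossip has been circulating.</PARAGRAPH>
  </ITEM>
</REPORT>"""


class ParagraphCounter(xml.sax.ContentHandler):
    """Event-driven (SAX) handler: sees one element at a time and keeps no tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once for every opening tag as the parser streams through the text.
        if name == "PARAGRAPH":
            self.count += 1


def count_with_sax(text):
    handler = ParagraphCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


def count_with_dom(text):
    # DOM builds the whole document in memory, then lets us query it freely.
    document = minidom.parseString(text)
    return len(document.getElementsByTagName("PARAGRAPH"))


print(count_with_sax(REPORT_XML), count_with_dom(REPORT_XML))
```

Both functions reject text that is not well formed. The SAX version holds only a counter in memory, while the DOM version keeps the full tree available for later navigation.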

Henry Balen

XML has become the lingua franca for inter-application communication. Using XML, all messages sent between applications consist of self-describing text. This makes the messages easily understandable by both humans and machines, although it does not supply an efficient packaging of the message. (XML messages can be considerably larger than a binary representation of the same information.)
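
The size trade-off mentioned in the parentheses can be made concrete with a small Python sketch. The order message, its field names, and the chosen binary layout are hypothetical; they are not taken from Balen's article.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical order message: order id, quantity, unit price.
order_id, quantity, unit_price = 1001, 12, 19.95

# Binary packaging: two unsigned 32-bit integers and one 64-bit float in
# network byte order. The receiver must already know this layout.
binary_message = struct.pack("!IId", order_id, quantity, unit_price)

# XML packaging: the same values, wrapped in tags that name each value.
order = ET.Element("ORDER")
ET.SubElement(order, "ID").text = str(order_id)
ET.SubElement(order, "QUANTITY").text = str(quantity)
ET.SubElement(order, "UNITPRICE").text = str(unit_price)
xml_message = ET.tostring(order)

print(len(binary_message), "bytes as binary")
print(len(xml_message), "bytes as XML")
print(xml_message.decode("ascii"))
```

The XML form is several times larger, but a person (or a program that has never seen the binary layout) can still tell which number is the quantity.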

There are three aspects of inter-application communication:

Transport: how to get information across the wire.
Protocol: how to package the information sent across the wire.
Message: the information itself.

The transport is usually a lower level network standard such as TCP/IP. Inter-process communications standards, such as CORBA, DCE and DCOM, have their own protocols that sit on top of such transports.

The protocol used depends on the communication mechanism. Standards may use different protocols to communicate: CORBA uses IIOP, while electronic mail uses SMTP. Each of these protocols allows you to package a message, specify a destination and get the message to the designated location. In protocols that support remote method invocation (RMI), the destination can consist of an object reference and method.

With each of these protocols, the user defines the message that is sent across the wire. In the case of CORBA, DCE, DCOM and so on, the message is defined using an Interface Definition Language (IDL). In E-mail and message-oriented middleware (MOM) it can be more fluid. No matter what you use, there is an agreement between the sender and receiver about the meaning of the message. The meaning is not transferred with the message.

So why use XML? In XML, documents contain meta-information about the information being transmitted, and can be extended easily. However, XML is less efficient than transmitting the information using a binary protocol. One advantage, though, is that humans and computers can both read the document.

To overcome the communication problem, the application can be enabled to send and receive information in the form of XML. This can be done independent of protocol, and if the meaning is agreed upon between the applications or organizations, then you just need to get the package to its intended destination. How it gets there is up to you. Of course, in these days of the Internet, the HTTP protocol is a natural choice. There are business domain-specific XML vocabularies under development.

Application integration
From the point of view of an application, there are various points of integration: data store, APIs or components, and protocol. The point of integration used depends on the nature of the application. If integration means the ability to speak XML, then you will need to acquire or build adapters for the point of integration. These adapters are responsible for getting information in and out of the application, and performing any necessary transformations along the way.

If the integration involves the sharing of information, you may want to integrate at the level of the data store. Assuming you have an existing database containing the information you want to share, your integration adapter is responsible for translating from a query's result set to an XML document. Conversely, when the application receives information in the form of XML, the adapter performs a reverse translation and maps the document elements to the appropriate database entities.
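
A data-store adapter of this kind might look roughly like the following Python sketch. It assumes an in-memory SQLite database as a stand-in for the existing database; the customer table, the CUSTOMERS/CUSTOMER/NAME element names, and the function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_database():
    """Create an empty in-memory database with a hypothetical customer table."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    return connection


def rows_to_xml(connection):
    """Translate a query's result set into an XML document."""
    root = ET.Element("CUSTOMERS")
    query = "SELECT id, name FROM customer ORDER BY id"
    for customer_id, name in connection.execute(query):
        element = ET.SubElement(root, "CUSTOMER", {"ID": str(customer_id)})
        ET.SubElement(element, "NAME").text = name
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(connection, document):
    """Reverse translation: map document elements back to database rows."""
    root = ET.fromstring(document)
    for element in root.findall("CUSTOMER"):
        connection.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)",
            (int(element.get("ID")), element.findtext("NAME")),
        )


# Sending side: export the shared data as XML.
source = make_database()
source.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme Corp')")
document = rows_to_xml(source)

# Receiving side: load the XML into a separate database.
target = make_database()
xml_to_rows(target, document)
print(document)
```

The two databases never talk to each other directly; the only thing they share is agreement on what the XML elements mean.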

Oracle Corp., Redwood Shores, Calif., sells a relational database with a degree of XML support. XML is either mapped as just described, or the database's hybrid capabilities can store XML natively. The SQL syntax has been extended with an XML Query language.

Object or network databases may provide a more natural mapping for XML to the database's representation. A persistent Document Object Model (DOM) mechanism can preserve the structure of the XML document. You should be aware that while an XML document provides a good way in which to represent information, it is not an application domain model.

In addition, some products are being marketed as XML databases. eXcelon Corp. (formerly Object Design Inc.), Burlington, Mass., has re-purposed its object database to handle XML. Conversely, there are some products, such as Tamino from Software AG, Darmstadt, Germany, that were built from the ground up to handle just XML. While each product provides an XML Query language, it has not been standardized. The World Wide Web Consortium (W3C) is currently working on a standard for XML Queries, which I expect most vendors will adopt.

Of course, most existing data is kept in hierarchical or relational databases, and you cannot ignore this if you want to integrate at the database level. If you are in this camp, take a look at tools that help with the translation to and from XML.

Integration at the API level can be achieved through handcrafted adapters. Using an off-the-shelf XML parser, an adapter can be constructed that will translate from the received XML to an object model or function/method invocation. Similarly, you can transform information from the application to an XML document for transmittal. If the application supports one of the component models, you may be able to acquire an adapter that implements a bridge to the world of XML. If you are using one of the industry XML schemas, however, you will most likely have to code a transformation; an XSL Transformations (XSLT) processor is useful. With XSLT, you can use an XML dialect to define the transformation rules.
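
A handcrafted API-level adapter can be sketched in Python as follows. The InventoryService class stands in for an existing application API, and the REQUEST/RESPONSE message format is invented for this illustration; neither comes from the article.

```python
import xml.etree.ElementTree as ET


class InventoryService:
    """Stand-in for an existing application API (hypothetical)."""

    def __init__(self):
        self.stock = {"widget": 40}

    def reserve(self, item, quantity):
        self.stock[item] -= quantity
        return self.stock[item]


# Only methods named here may be invoked from an incoming message.
ALLOWED_METHODS = {"reserve"}


def handle_request(service, request_xml):
    """Adapter: parse the request, call the matching method, answer in XML."""
    request = ET.fromstring(request_xml)
    method_name = request.get("METHOD")
    if method_name not in ALLOWED_METHODS:
        raise ValueError("unsupported method: %r" % method_name)

    method = getattr(service, method_name)
    remaining = method(request.findtext("ITEM"), int(request.findtext("QUANTITY")))

    response = ET.Element("RESPONSE")
    ET.SubElement(response, "REMAINING").text = str(remaining)
    return ET.tostring(response, encoding="unicode")


reply = handle_request(
    InventoryService(),
    '<REQUEST METHOD="reserve"><ITEM>widget</ITEM><QUANTITY>5</QUANTITY></REQUEST>',
)
print(reply)
```

The allow-list is a deliberate design choice: an adapter that dispatches on names taken from a message should not expose every method of the application.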

When the application already utilizes middleware, such as MOM or CORBA, then the adapter provides a gateway. This gateway can receive the XML messages, decide which components need to be notified, and perform the necessary translation. Commercial implementations of these gateways, such as CapeConnect from Cape Clear Software, Dublin, Ireland, are starting to ship. These XML brokers use XML for the content of the message and protocol to specify the destination. The W3C is working on an XML RPC mechanism standard. One submission, first promoted by Microsoft and now gaining wide support, is SOAP.

SOAP is a lightweight XML protocol for the exchange of information. It is probably the leading contender for adoption by the W3C. SOAP can provide synchronous and asynchronous mechanisms to send requests between applications using a variety of protocols. Robust security and transactional capabilities still need to be added to the SOAP protocol.
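
To give a feel for what a SOAP message looks like, here is a minimal Python sketch that builds and reads a SOAP 1.1 style envelope with the standard library. The envelope namespace is the one defined for SOAP 1.1; the application namespace, the GetLastTradePrice call, and the function names are hypothetical, and no transport (HTTP or otherwise) is shown.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:quotes"  # hypothetical application namespace


def build_request(symbol):
    """Wrap a method call and its argument in an Envelope/Body pair."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}GetLastTradePrice" % APP_NS)
    ET.SubElement(call, "symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")


def read_request(message):
    """Return the qualified name of the requested call and its symbol argument."""
    envelope = ET.fromstring(message)
    body = envelope.find("{%s}Body" % SOAP_ENV)
    call = body[0]  # the first element inside the body is the call itself
    return call.tag, call.findtext("symbol")


message = build_request("ACME")
print(message)
print(read_request(message))
```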

CORBA middleware users may find it interesting that the Object Management Group (OMG) has put out a call for proposals for a SOAP–CORBA mapping. Along with work on XML value types for CORBA, this can provide a natural basis for XML enabling the CORBA infrastructure.


XML Standards In Effect or In Process
Much of the material in the following table was derived from the OASIS Cover Pages, maintained by Robin Cover. These pages are a tremendous resource for anyone wanting to keep track of the rapidly changing XML marketplace. Also, you can get connected with XML developers at Cisco's XML community at www.hotdispatch.com/cisco-ip-telephony .

All links were visited and supplementary material added in December, 2000 --- http://www.stratvantage.com/directories/xmlstandards.htm 


Year 2000:  Important updates on XML at http://www.w3.org/TR/xhtml1/ 

"XML's Grand Schema XML Schema Language is a powerful feature that can be used to validate data in myriad ways, and save you time in the process" by Yasser Shohoud at http://www.xmlmag.com/upload/free/features/xml/2000/03sum00/ys0300/ys0300.asp 
This is a good review article from Summer 2000.

What's on Microsoft's wish list today?  From Newsweek, April 17, 2000, pg. 43.  You can read the entire article online at http://newsweek.com/nw-srv/printed/us/bz/a18441-2000apr9.htm  

The current rubric for this effort is "Next Generation Windows Services," with an emphasis on that final word. The Microsoft vision is to replace the bulk of its software with a collection of dynamic "services" that makes it easy for customers to access and manipulate information spread out over the Web. In Microsoft's telling, the Web you know and love is severely limited: you can view pages but can't really fool around with the information it offers. By making use of a recent standard for creating Web pages called XML, however, it's possible to use that data as smoothly as you can massage the numbers in your own little spreadsheet at home. A whole new set of possibilities open where minutiae stored in the bowels of Web-connected databases get integrated into your life. Want to travel? Your personal calendar could take into account the weather in destinations you're scheduled to visit, as well as whether seats remain for discount fares on your favorite airline. And if your stockholdings increase, you may automatically upgrade your hotel reservation to a suite. Another benefit of XML is that by unhooking data from a fixed page view, it can effortlessly display the same figures, facts and trivia in devices ranging from mobile phones to e-books.

An April 7, 2000 draft of the W3C's XML Schema Part 0: Primer is available at http://www.w3.org/TR/xmlschema-0/   For later versions, go to http://www.w3.org/TR/xmlschema-0/ 

One of the best articles on XML with a minimum of techie jargon is entitled "Is XML the answer?  Depends on the Question?" by Michael Goulde in Application Development Trends, October 1999, pp. 21-22.  The online version is at http://www.adtmag.com/Pub/oct99/d9910xml.htm 

One of the reasons XML has captured so much interest so quickly (Version 1.0 of the XML specification was released in February 1998) is that it represents a parsimonious solution to a wide variety of problems. There are three sets of users who have a very high level of interest in XML. The first group includes Webmasters and other designers of Web-based information systems who use HTML to mark up information for presentation, but have no way to structure the information they send to browsers. By providing structure to the unstructured Web data in a standard way, a Web query can deliver a much more useful set of results, increasing the value of the information.

The second group of users have toiled for years with the Standard Generalized Markup Language (SGML) to create structured documents such as training manuals and technical documentation. Although HTML is derived from SGML, SGML in general is not well suited to the Web environment because it is extremely complex -- something that has also affected its universal adoption. XML is an SGML derivative that is not only easier to use on the Web, it has garnered wider adoption. These users have also become very active in World Wide Web Consortium (W3C) working groups that are hammering out additional specifications and standards to ensure that XML meets many of the same application requirements as SGML.

The third set of users fascinated by XML are a set who were not originally targeted by the W3C's XML efforts. But very early on, application developers building distributed applications -- and faced with difficult challenges around application integration and interoperability -- saw XML as a way to free their applications from the tyranny of over-the-wire binary formats that made it impossible to link applications together in real time. These developers, many from the Java community, but equally as many using Microsoft tools, quickly realized that they could use XML syntax in their messages and, because of the self-describing nature of XML documents, applications could exchange data without having to be explicitly written or compiled to do so. Freedom! This group has now expanded to include developers who want to extend EDI, link networks of suppliers and customers, create dynamic marketplaces, and perform other heretofore impossible tasks over the Web.

The problem with networking of databases at the present time is that systems are proprietary and lack standards for efficient and effective computing around the world.  Companies would like their particular products to dominate e-commerce, but that just is not going to happen because of emerging standards for networking of databases.  The lead in standard setting is being taken by the World Wide Web Consortium (W3C).

Tim Berners-Lee led a team of physicists who invented HTML scripting and the HTTP protocol.  This creator of the WWW says there's a new revolution on the horizon for the Internet and the best way to deal with it is the Resource Description Framework (RDF).  RDF will be of monumental importance to the 21st Century Internet and millions of intranets.  It will be implemented largely through XML extensible markups that go beyond HTML.   XML will become a popular way of putting databases on networks.  XML is already supported by leading browsers such as Microsoft's Internet Explorer and Netscape's Navigator.  In reality, XML is a nested object-oriented component structure in which documents and databases can be broken into object components that can be edited, divided, and re-assembled.  The analogy is that the component structure is like adding nested sections and chapters into the (hidden) table of contents of a document or database.  For example see the POET Content Management Suite at http://www.poet.com .

A good place to start learning is "XML for the Absolute Beginner" at  www.javaworld.com/javaworld/jw-04-1999/jw-04-XML.html.    I might add the following online article entitled "XML Gains Ground:  Vendors pledge support as XML stands poised to become a universal format for data exchange" at http://www.informationweek.com/725/XML.htm .

Interested in XML? Sign up for a free weekly email full of XML news, features, downloads and reviews. http://www.zdnet.com/enterprise/lists/xml/subscribe.html 

Just about every recent technology magazine and journal carries at least one article about the looming XML and RDF.  My top recommendation, apart from my own overview mentioned above, is entitled "XML: The Last Silver Bullet" by Jack Vaughan in Application Development Trends, April 1999, pp. 24-30.  He contends that "coming as it does on the heels of the Web's great success (HTML), XML is viewed by some as having a far broader impact."  This is a nice summary article of the history of XML (it only started in 1996) and XML's tremendous future.  Vaughan also discusses RDF.  The online version of this article is at  http://www.adtmag.com/pub/apr99/f04eaix0499.htm

The first step to understanding RDF is to distinguish between data and metadata.   Metadata tags in documents and databases provide "data about data" like unseen genes provide data about body parts. One of the drawbacks of HTML is that HTML tags relate only to symbols rather than to attributes of what the symbols depict. For example, HTML tags tell us how to display the word "eyes" in a web document but there are no tags related to attributes such as eye color, eye size, vision quality, and susceptibility to various eye diseases.  

For example, HTML tags on the words red and purple appearing in a document relate only to formatting and linking.  HTML tags do not disclose that both words depict colors, because HTML does not associate words with meanings.  Metadata, on the other hand, attaches meanings to the data by attaching hidden attribute tags.  For example, attached to the word "petal" might be an invisible tag recording that the petal is a rose petal, along with color-coded numbers for its hue and saturation.   When any petal's invisible tags are read in a meta search engine, it would be possible to identify types of roses having a range of hue and saturation commonalities.   Poppies would be excluded because they do not have rose tags.   Red herrings (a term for false leads in a mystery) would be excluded because they do not have a tagged attribute for color.
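
The petal example can be sketched in a few lines of Python. The GARDEN document, its PETAL and PHRASE elements, and the FLOWER, HUE, and SATURATION attributes are invented purely for illustration; they are not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged text: the visible words are the same ("red petal"),
# but the hidden attributes say what kind of thing each one describes.
GARDEN_XML = """
<GARDEN>
  <PETAL FLOWER="rose" HUE="350" SATURATION="80">red petal</PETAL>
  <PETAL FLOWER="rose" HUE="300" SATURATION="40">purple petal</PETAL>
  <PETAL FLOWER="poppy" HUE="355" SATURATION="90">red petal</PETAL>
  <PHRASE>red herring</PHRASE>
</GARDEN>
"""


def find_roses(document, min_hue, max_hue):
    """Return the text of rose petals whose hue falls within the given range."""
    root = ET.fromstring(document)
    matches = []
    for petal in root.iter("PETAL"):  # the red herring has no PETAL tag at all
        if petal.get("FLOWER") != "rose":  # poppies lack the rose tag
            continue
        if min_hue <= int(petal.get("HUE")) <= max_hue:
            matches.append(petal.text)
    return matches


print(find_roses(GARDEN_XML, 340, 360))
```

A plain text search for "red" would return the poppy and the herring as well; the search on tagged attributes returns only the rose.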

In a sense, metadata is analogous to genetic coding of a living organism.   Attributes in hidden tags become analogous to attributes coded into genes that determine the color of a flower's petals, degree of resistance to certain diseases, etc.   If we knew the genetic "metadata" code of all flowering plants, we could quickly isolate the subsets of all known flowering plants having red petals or resistance to a particular plant disease.  In botany and genetics, the problem lies in discovering the metadata codes that nature has already programmed into the genes.  In computer documents and databases, the problem is one of programming in the metadata codes that will conform to a worldwide standard. That standard will most likely be the RDF standard that is currently being developed by the World Wide Web Consortium (W3C) having Tim Berners-Lee as its current Director. 

The examples I give above are gross simplifications of the text tagging that will actually take place under RDF.  RDF works in a more complicated fashion that will be much more efficient for meta searches.  The core of RDF will be its "RDF Schema" briefly described below:

This specification will be followed by other documents that will complete the framework. Most importantly, to facilitate the definition of metadata, RDF will have a class system much like many object-oriented programming and modeling systems. A collection of classes (typically authored for a specific purpose or domain) is called a schema. Classes are organized in a hierarchy, and offer extensibility through subclass refinement. This way, in order to create a schema slightly different from an existing one, it is not necessary to "reinvent the wheel" but one can just provide incremental modifications to the base schema. Through the sharability of schemas RDF will support the reusability of metadata definitions. Due to RDF's incremental extensibility, agents processing metadata will be able to trace the origins of schemata they are unfamiliar with back to known schemata and perform meaningful actions on metadata they weren't originally designed to process. The sharability and extensibility of RDF also allows metadata authors to use multiple inheritance to "mix" definitions, to provide multiple views to their data, leveraging work done by others. In addition, it is possible to create RDF instance data based on multiple schemata from multiple sources (i.e., "interleaving" different types of metadata). Schemas may themselves be written in RDF; a companion document to this specification, [RDF Schema], describes one set of properties and classes for describing RDF schemas. (Emphasis added).

World Wide Web Consortium (W3C)
http://web1.w3.org/TR/REC-rdf-syntax/

The term "metadata" is not synonymous with RDF.  There were various metadata systems before RDF was on the drawing boards.  Microsoft's Channel Definition Format (CDF) used in "Web Push Channels" and Netscape's Meta Content Framework (MCF) preceded RDF.  These technologies describe information resources in a manner somewhat similar to RDF and can be used to filter web sites and web documents, for example to screen out pornography and violence.  Metadata systems can be used to channel inflows of desired or undesired web information.  CDF, for example, carries information that is not displayed on computer screens but that performs metadata tasks.

RDF resources are built upon a foundation of Uniform Resource Identifiers (URIs) that are described at http://www.ietf.org/internet-drafts/draft-fielding-uri-syntax-04.txt .  The metadata structure in RDF has the following components described on Page 4 of http://web1.w3.org/TR/REC-rdf-syntax/

Resources
All things being described by RDF expressions are called resources. A resource may be an entire
Web page; such as the HTML document "http://www.w3.org/Overview.html" for example. A
resource may be a part of a Web page; e.g., a specific HTML or XML element within the
document source. A resource may also be a whole collection of pages; e.g., an entire Web site. A
resource may also be an object that is not directly accessible via the Web; e.g., a printed book.
Resources are always named by URIs plus optional anchor ids. Anything can have a
URI; the extensibility of URIs allows the introduction of identifiers for any entity imaginable.


Properties

A property is a specific aspect, characteristic, attribute, or relation used to describe a resource.
Each property has a specific meaning, defines its permitted values, the types of resources it can
describe, and its relationship with other properties. This document does not address how the
characteristics of properties are expressed; for such information, refer to the RDF Schema
specification.

Statements
A specific resource together with a named property plus the value of that property for that resource
is an RDF statement. These three individual parts of a statement are called, respectively, the
subject, the predicate, and the object. The object of a statement (i.e., the property value) can be
another resource or it can be a literal; i.e., a resource (specified by a URI) or a simple string or
other primitive datatype defined by XML. In RDF terms, a literal may have content that is XML
markup but is not further evaluated by the RDF processor.
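
The subject/predicate/object structure can be pictured with a small Python sketch. This is not RDF syntax and does not use any RDF library; the Statement type, the property names, the example.org URI, and the name "Jane Doe" are hypothetical stand-ins.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # the resource being described, named by a URI
    predicate: str  # the named property
    obj: str        # the property value: another resource's URI or a literal


statements = [
    # The object here is another resource (a hypothetical URI for a person) ...
    Statement("http://www.w3.org/Overview.html", "creator", "http://example.org/staff/85740"),
    # ... and here it is a literal string.
    Statement("http://example.org/staff/85740", "name", "Jane Doe"),
]


def values_of(resource, prop):
    """Return every value recorded for one property of one resource."""
    return [s.obj for s in statements if s.subject == resource and s.predicate == prop]


# Follow the chain: find the page's creator, then look up that creator's name.
creator = values_of("http://www.w3.org/Overview.html", "creator")[0]
print(values_of(creator, "name"))
```

Because the object of one statement can be the subject of another, statements link together into a graph that software can traverse.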

A good place to begin reading about RDF is at http://web1.w3.org/TR/REC-rdf-syntax/ .

The most likely scripting codes will be XML, although RDF can be used in other scripting systems.  The popular HTML and the emerging XML are subsets of the GML text scripting conceived in 1969 by IBM researchers as the Generalized Markup Language (and not-so-coincidentally the lead researchers were named Goldfarb, Mosher, and Lorie).  Between 1978 and 1987, Charles F. Goldfarb led the team that developed the Standard Generalized Markup Language (SGML) that became International Standard ISO 8879.  In 1990, Tim Berners-Lee led a team of particle physicists that invented the World Wide Web rooted in the rule-based text scripting markup innovations of SGML.  The World Wide Web is comprised of all web documents marked up in scripts known as Hypertext Markup Language (HTML) scripts.  SGML is tremendously powerful but inefficient and complex.  HTML is marvelously simple but not very powerful.  In 1996, Jon Bosak of Sun Microsystems spearheaded the development of the XML standard to lend power, efficiency, cross-platform standards, and simplicity to the networking of databases on the Internet.   At the time of this writing, the world is converging upon an important standard known as RDF (Resource Description Framework) rooted in XML that will be the biggest 21st Century thing to hit the Internet since HTML hit the Internet in 1991.

HTML was extremely limited in its early versions.  The rigid and limited document formatting of several early versions had a simplistic appeal because of the small number of scripting "tags."  Early versions of HTML, however, lacked styles (italics, underlines, indentations, tables, etc.) that authors prefer in documents.  In subsequent versions, HTML developers invented cascading style sheets that expanded the formatting and font capabilities at the expense of more complex scripts for HTML tags.  But HTML software "editors" such as HoTMetaL Pro, PageMill, FrontPage, and many others took over the scripting chores.  It became as easy to produce World Wide Web documents as it is to use a word processor such as Microsoft Word or WordPerfect.  In fact, newer versions of word processors added options to automatically embed HTML scripts in documents.

But HTML did not, and still does not, allow authors to "extend" or "customize" tags for application-specific tasks.   In later versions of HTML, tags were invented for creating tables.  However, HTML tables are not dynamic in the sense of a table in a relational database or object oriented database.  For example, it is not possible to perform simple arithmetic operations to fill table cells or do any other types of "computing" apart from formatting, viewing, and linking text and graphics.  It was and is still not possible to search and retrieve subsets of tables without downloading entire HTML documents containing the tables.  Common database software operations such as writing queries and revision of records within networked tables are not possible in HTML tables.

The curse of HTML was that HTML tags were not "extensible" and could not otherwise be customized for application-specific tasks such as simple database operations.  Netscape invented JavaScript to allow developers to embed customized scripts to overcome the limitations of HTML.  Dynamic HTML known as "DHTML" was invented for certain types of customizations.  Web browsers such as Microsoft's Internet Explorer and Netscape's Navigator will read JavaScripts and DHTML.  But these "extensions" of HTML have some very frustrating limitations.  The major limitation is that many lines of script must be written to perform rather simple tasks.  Writing JavaScripts to perform database operations boggles the mind.   The simplicity of HTML, thereby, gives way to coding complexities that virtually require that document authors first become computer programmers.  Even computer programmers find JavaScript and DHTML to be inefficient and ineffective extensions of HTML.  To make matters worse, standard setters could not agree on proposed standards for DHTML. 

In 1996, the World Wide Web Consortium (W3C) gave serious attention to Jon Bosak's proposed "extensible markup language" called XML.  Unlike HTML, XML lets authors define their own tags, so that meanings (e.g., attributes of a petal) can be attached to the words "red petal."  More importantly, subsets of documents and tables can be edited and transmitted over the Internet as bits of data that do not carry the accompanying excess baggage of HTML formatting information and entire documents containing entire tables.  Users can feed those data inflows into style sheets of their own choosing.  Appearances can be changed by user modifications to style sheets. 

Perhaps the best example is a networked database on the web containing 10 million names, addresses, and phone numbers.  It would be extremely inefficient to have to download the entire database merely to look up a particular phone number or to change a phone number in the database.  It would be absurd to code the database information into an HTML document that has to be downloaded with all 10 million records before a user can search for one record.  In contrast, application-specific database software is highly efficient in allowing users to use queries to retrieve only a desired subset of data in a database.  XML will do the same thing for web documents and tables.  Web meta searches become more like database queries.
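The phone-number example can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module.  This is only an illustration under stated assumptions: the element names (directory, entry, name, phone) and the three sample records are invented here, and the whole document is held in memory for simplicity.  In a real networked database, the server would run the query and send back only the matching entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-record directory; the tag names are illustrative only.
DIRECTORY_XML = """
<directory>
  <entry><name>Ann Lee</name><phone>555-0101</phone></entry>
  <entry><name>Bob Ray</name><phone>555-0102</phone></entry>
  <entry><name>Cy Poe</name><phone>555-0103</phone></entry>
</directory>
"""


def find_phone(xml_text, wanted_name):
    """Return the phone number tagged for wanted_name, or None if absent."""
    root = ET.fromstring(xml_text)
    for entry in root.findall("entry"):
        if entry.findtext("name") == wanted_name:
            return entry.findtext("phone")
    return None


def update_phone(xml_text, wanted_name, new_phone):
    """Return the XML text with the phone number of one entry replaced."""
    root = ET.fromstring(xml_text)
    for entry in root.findall("entry"):
        if entry.findtext("name") == wanted_name:
            entry.find("phone").text = new_phone
    return ET.tostring(root, encoding="unicode")


print(find_phone(DIRECTORY_XML, "Bob Ray"))
```

Because each value is tagged with its meaning, the lookup asks for "the phone of the entry named Bob Ray" rather than scanning formatted text for something that looks like a phone number.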

Another important distinction between HTML and XML lies in the ability of using XML to process information without the aid of human beings.  HTML documents are intended for human viewing.  Computers using XML can "talk" to one another without human intervention.  Charles Goldfarb and Paul Prescod describe the database aspects of XML as follows:

XML is also expected to become an important tool for interchange of database information.  Databases have typically interchanged information using simple file formats like one-record per line with semi-colons between the fields.  This is not sufficient for the new object-oriented information being produced by databases.   Objects must have internal structure and links between them.  XML can represent this using elements and attributes to provide a common format for transferring database records between databases.  You can imagine that one database might produce an XML document representing all of the toys the manufacturer produces and that document could be directly loaded into another database either within the company or at a customer's site.  This is a very interesting way of thinking about documents, because in many cases human beings will never see them.  They are documents produced by and for computer software. (Emphasis added)

C.E. Goldfarb and P. Prescod
The XML Handbook
(Upper Saddle River, N.J. Prentice Hall PTR, 1998, Page 25)
http://www.phptr.com
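The toy-manufacturer interchange that Goldfarb and Prescod describe might look roughly like the following Python sketch.  The record layout (id, name, price) and the tag names are assumptions made for illustration and are not taken from The XML Handbook.  One program writes its records as an XML document and another reads them back with no human ever viewing the document.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialize a list of toy records (dicts) into one XML document."""
    root = ET.Element("toys")
    for rec in records:
        # The record key becomes an attribute; the fields become child elements.
        toy = ET.SubElement(root, "toy", id=str(rec["id"]))
        ET.SubElement(toy, "name").text = rec["name"]
        ET.SubElement(toy, "price").text = f'{rec["price"]:.2f}'
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text):
    """Rebuild the list of toy records from the XML document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": int(toy.get("id")),
            "name": toy.findtext("name"),
            "price": float(toy.findtext("price")),
        }
        for toy in root.findall("toy")
    ]


# Hypothetical catalog produced by the manufacturer's database.
catalog = [
    {"id": 1, "name": "Wooden train", "price": 12.50},
    {"id": 2, "name": "Kite", "price": 7.00},
]
document = records_to_xml(catalog)
print(document)
print(xml_to_records(document) == catalog)
```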


"XML: Plugging into 'Standard' Hybrids," eWeek, January 7, 2002, by Renee Boucher Ferguson --- http://www.eweek.com/article/0,3658,s%253D1884%2526a%253D20656,00.asp 

It was supposed to be so simple. XML would enable companies to move beyond paper-, e-mail- and electronic data interchange-based commerce to the world of Internet transactions. Having such an open platform was supposed to provide a lower-cost way for developing applications that would be universally accessible to all of a company's business partners.

Now, more than three years after XML's introduction, IT shops implementing industry-specific variants find themselves looking at multiyear, multimillion-dollar projects that leave two fundamental obstacles unchallenged: how to shift partners from trading through traditional means to trading with XML and how to interoperate with other industries.

These vertical-industry XML flavors for many companies have created walls around their Internet trading software that require more code to be written and more expense incurred to make sure that some potential buyers or suppliers can take part in business-to- business e-commerce.

What's needed now, in the view of IT managers, software vendors and analysts, is a horizontal XML blueprint of sorts to describe a syntax and vocabulary that vertical industries can use to interoperate with B2B trading software from other verticals. ebXML (electronic business XML) is being touted as one solution—not just another XML variation but an architecture that provides a horizontal messaging framework.

Other cross-industry standards in the works include UBL (Universal Business Language) and XSL (Extensible Stylesheet Language).

However, until a universal standard or set of standards is agreed upon, vertical industries will continue to support individual XML standards that do not interoperate.

Continued at http://www.eweek.com/article/0,3658,s%253D1884%2526a%253D20656,00.asp 


From Neal Hannon on September 1, 2000

Formal work in the area of XML glossaries of terms is well documented at the www.w3c.org web site. 

I am working with the XBRL.org steering committee.  We have developed a taxonomy, or data dictionary, for identifying the elements of financial statements complying with US GAAP.  XML Schema is being used to provide more flexibility in the expressiveness of DTDs. 

DTDs are part of the XML family of standards but do not use XML document syntax.  DTDs also do not provide the mechanism for specifying the fundamental type of an element or attribute.  XML Schema, although not yet a formal W3C recommendation, provides this ability.   The entire taxonomy and examples are posted at the www.xbrl.org Web site.

Books that cover schemas simply include "Teach yourself XML in 24 hours" by Ashbacher, "XML, a Manager's Guide", by Dick.  I hope this information helps.

Neal

A "Must See" site on XBRL --- http://www.xbrlsolutions.com/ 

Try out the  XBRL Instance Document Validator!

Try out the XBRL Custom Taxonomy Builder!

XBRL is a framework that will allow the financial community a standards-based method to prepare, publish in a variety of formats, exchange and analyze financial reports and the information they contain. It will also permit the automatic exchange and reliable extraction of financial information among various software applications.

Neal Hannon pointed out this XBRL Demo --- http://www.reportingtools.com/xbrl/index.cfm 

These financial statements have been created to display the use of XBRL taxonomies utilizing the specification dated 2000-07-31. These are NOT the official financial statements of Newtec

This demo provides a view of the XBRL mapping feature contained within MultiMart™ Web Financials and a brief description of how this is processed. A ‘View Source’ element, allows you to see the underlying data (instance document in accordance with XBRL Taxonomy specification dated 2000-07-31). Notably, the taxonomy mapping needs only to be completed once as Newtec’s MultiMart™ Enterprise Datawarehouse saves the information for use in report output and this solution can accommodate any taxonomy, including future additions.

MultiMart™ Web Financials encompasses applications such as Web G/L and Web A/P with functionality that includes online drill-down capabilities. This can be seen on the XBRL financial statements and accessed from the demo site. To obtain greater details of the many features included, you may request a one-on-one, online demonstration from Newtec.

In a very nice document (for beginners) on XML, Mark Johnson lists the following benefits from extending HTML to XML:

Mark Johnson
"XML for the Absolute Beginner"
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-XML.html

XHTML:  A Bridge to the Future, Information Week, May 8, 2000, pp. 210-214.  The article is not yet posted online, but eventually you will find it at http://www.informationweek.com/maindocs/archive.htm 

XHTML: A Bridge To The Future

THE W3C'S RECOMMENDATION BLENDS XML AND HTML TO PRODUCE EXTENSIBLE WEB-PAGE FORMATTING

Hypertext Markup Language, an aging, inflexible formatting standard, has fueled the phenomenal growth of the Web.  Now a new technology, a flexible data-markup standard called Extensible Markup Language, promises nearly complete flexibility.  In a flash of brilliance, the World Wide Web Consortium (W3C) has combined HTML and XML into the new XHTML recommended standard, which reformulates HTML 4.01 -- the latest version -- with XML document type definitions (DTDs).

HTML is the language behind one of the fastest, most widespread technology adoptions ever.  Derived from Standard Generalized Markup Language (SGML), HTML is simple to learn and reasonably flexible for formatting text and graphics, but it doesn't have the extensibility to adapt to dynamic Web applications.  Most every site with valuable content is more of a Web application than a Web site, requiring code components, multimedia effects, and other features that strain the limits of HTML.

HTML is usually extended by innovations in a single browser, usually Microsoft's Internet Explorer or America Online's Netscape Communicator, and these changes gradually make their way into other browsers.  Inevitably, the implementations are different enough that Web authors have a tough time making their sites viewable from different browsers, much less older versions of those browsers.  The more popular extensions eventually make their way into the group's HTML standard -- frames and scripting languages, for example.

In the last couple of years, XML has been taking the Web by storm.  Whereas HTML formats and presents information, XML marks up data so that the individual pieces of information on a Web page are identified as being of a particular type.  In a bank's data, for example, $4,562.03 is marked as the outstanding balance of a customer's loan, and $123.90 as the monthly payment, identifying them as particular kinds of data points.  Without XML, these would be just two character strings in a sea of text on a Web page.  XML provides metadata -- data about data.

The most important feature of XML is the "X."  HTML has a fixed set of tags, but with XML you can create multiple namespaces that define custom tags.  Industries can band together and create namespaces that facilitate the exchange of information.

Continuing the bank example, <Balance> and <Payment> can identify the two character strings as being specific types of information.  This facilitates exchanging data between applications and computer systems, limiting the need for expensive, complex data-conversion programs.
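The bank example can be made concrete with a short Python sketch.  The namespace URI, the bank: prefix, the enclosing Loan element, and the currency attribute are all hypothetical and appear here only for illustration.  Only the Balance and Payment tags and the two dollar amounts come from the article.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical namespace that a banking industry group might agree upon.
BANK_NS = "http://example.com/bank"

LOAN_XML = f"""
<bank:Loan xmlns:bank="{BANK_NS}">
  <bank:Balance currency="USD">4562.03</bank:Balance>
  <bank:Payment currency="USD">123.90</bank:Payment>
</bank:Loan>
"""


def read_loan(xml_text):
    """Return (balance, payment) as Decimal values read from the tagged data."""
    root = ET.fromstring(xml_text)
    ns = {"bank": BANK_NS}  # maps the prefix used in the paths to the URI
    balance = Decimal(root.findtext("bank:Balance", namespaces=ns))
    payment = Decimal(root.findtext("bank:Payment", namespaces=ns))
    return balance, payment


balance, payment = read_loan(LOAN_XML)
print(balance, payment)
```

The receiving program never has to guess which character string is the balance and which is the payment, which is the point the article makes about limiting data-conversion programs.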


Some Messages on Traditional Relational Accounting Databases versus XML Databases

Some Questions Followed by Answers From Experts

Dear Professor Jensen, 
I have read roughly your review of XML and some XML references that you recommend in your homepage. Thank you so much for this very useful introduction. Taking this opportunity, I would like to ask you for some additional help. I would be grateful if you could comment on the following issues which arise from my reading of your review and other materials. 

My first issue concerns the possible contradiction between standardisation and customisation in the context of corporate financial reporting (CFR). Apparently, any widespread use of XML requires standardisation of tags which gives a common meaning to pieces of financial information. On the other hand, there has been a sustained call for a greater customer focus or customisation in CFR. My question is, to what extent XML's standardisation imperative contradicts the call for customisation? If any, in what ways? 

My second issue relates to XML's relative (dis)advantages over traditional databases (such as networked relational databases). It is claimed that XML is a powerful tool for data representation, storage, modeling, interoperation, and so on. Will XML simply replace traditional databases, or just operate on top of them? If comparable at all, how will you compare XML with traditional databases in terms of queries, storage, search, data import, data export, data exchange, data maintenance, data updating, data input etc? 

I am not sure that my questions themselves are valid, logical, or significant. They may appear silly to you, but I look forward to hearing from you. If you think these questions may be interesting to others and/or you wish to invite others to discuss them, you may post your comments together with my questions on the AECM. Many thanks in advance. Best wishes. Dr Jason Xiao Cardiff Business School Cardiff University, UK

Dr JASON XIAO [Xiao@Cardiff.ac.uk]
Cardiff Business School University of Wales, 
Cardiff Colum Drive Cardiff CF1 3EU Tel: 01222-875374 Fax: 01222-874419 URL: http://www.cf.ac.uk/uwcc/carbs/xiao/xiao.html 

Interested in XML? Sign up for a free weekly email full of XML news, features, downloads and reviews. http://www.zdnet.com/enterprise/lists/xml/subscribe.html 

A summary of XBRL is given at http://www.nwfusion.com/news/2000/0407xml.html 

Leading global accounting, financial and software industry groups and companies have announced the formation of a consortium aimed at promoting a new specification for exchanging financial data over the Internet.

The extensible business reporting language Project Committee aims to develop and by July to launch XBRL for Financial Statements, the first in a planned series of free XBRL products for sending financial statements over the Internet as well as across other software and technologies, the American Institute of Certified Public Accountants (AICPA) said in a statement.

Based on XML, the XBRL specification uses accepted financial reporting standards and practices, and aims to standardize how financial information is sent and viewed on computer screens. XBRL, formerly codenamed XFRML, has been in development for one year, according to the AICPA statement.

AICPA is one of more than 30 backers of the XBRL Project Committee, which among its members also counts some of the biggest names in the software industry, including IBM, Microsoft, Oracle and SAP AG.

Other members include Arthur Andersen LLP, the Canadian Institute of Chartered Accountants, Deloitte & Touche LLP, Ernst & Young LLP, the International Accounting Standards Committee, the Institute of Chartered Accountants in Australia, the Institute of Chartered Accountants in England and Wales, KPMG LLP, PricewaterhouseCoopers LLP and Reuters Group LP.

More information about XBRL can be found at http://www.xbrl.org/ . AICPA is at http://www.aicpa.org/ .

Neal Hannon's XBRL Resource Center at http://www.tiac.net/users/nhannon/xbrl.htm 

Section One: Introduction to XBRL
Section Two: XML and The Financial Community
Section Three: What is XML and XML Basics
Section Four: What is XBRL
Section Five: Why Financial Professionals Will Use XBRL
Section Six: History of XBRL
Section Seven: XBRL Instance Documents
Section Eight XBRL and XML Case Studies
Section Nine Glossary of Terms
Section Ten Questions and Problems

Thanks for visiting the XBRL Education Resource Center. Be sure to visit www.xbrl.org for more information

Neal also clued me into the following: 

Navision Software releases the first XBRL-enabled financial system, Navision Financials 2.50.

[August 02, 2000] "Navision Software Releases XBRL Solution; XML-Based Financial Reporting Language Now Available in Navision Financials 2.50." - "Navision Software, a leading worldwide provider of business management solutions to the middle market, announced today that it has released its XBRL solution, one day after the publication of the official XML-based taxonomy. XBRL (eXtensible Business Reporting Language) is a free specification that first appeared on the financial and accounting scene in October of 1999. It uses a financial reporting specification, agreed upon by key members of the financial information supply chain, that allows an open exchange of financial reporting data across all software and technologies, including the Internet. The XBRL coding contained in Navision Financials 2.50 will enable customers to more easily and efficiently connect and communicate with both competing products in the ERP space and complementary products such as Caseware. For example, a set of subsidiary offices using Navision Financials can now more quickly collaborate with a parent office using a larger ERP system, while realizing significant time and cost savings. XBRL offers several key benefits: technology independence, full interoperability, efficient preparation of financial statements and reliable extraction of financial information. Information is entered only once, allowing that same information to be rendered in any form, such as a printed financial statement, an HTML document for the company's Web site, an EDGAR filing document with the SEC, a raw XML file or other specialized reporting formats, such as credit reports or loan documents. More than 80 percent of major US public companies provide some type of financial disclosure on the Internet. Investors and users of the Internet need accurate and reliable financial information that can be delivered promptly to help them make informed financial decisions." 
See XBRL Taxonomy - "Taxonomy for the creation of XML-based instance documents for business and financial reporting of commercial and industrial companies according to US GAAP."

The main XBRL website is at http://www.xbrl.org/ 

Extensible Business Reporting Language (XBRL), formerly code-named XFRML, is an open specification which uses XML-based data tags to describe financial statements for both public and private companies. XBRL benefits all members of the financial information supply chain.

XBRL is:

For a brief history of how XBRL came to be, see the history page.  

Hi Bob,

This message appears on Robin Cover's XML Cover Pages. 

[August 22, 2000] FpML Architecture 1.0 Working Draft Advanced to Last Call. A communiqué from Cathy S. Yesenosky announces that the Financial Products Markup Language (FpML) Architecture document is now in last call review. Members and Working Groups of the FpML Consortium and other interested parties released the FpML specifications as working drafts in July 2000. The principal FpML Version 1.0 Specification (together with the FpML XML DTD) is currently in a last call review phase which ends on 25-August-2000. FpML (Financial Products Markup Language) "is a business information exchange standard for electronic dealing and processing of financial derivatives instruments. It establishes a new protocol for sharing information on, and dealing in, financial derivatives over the Internet. It is based on XML (Extensible Markup Language) and initially focuses on interest rate swaps and Forward Rate Agreements (FRAs). FpML has been designed to be modular, easy-to-use and in particular intelligible to practitioners in the financial industry. Ultimately, it will allow for the electronic integration of a range of services, from Internet-based electronic dealing and confirmations to the risk analysis of client portfolios. It is expected to become the standard for the derivatives industry in the rapidly growing field of electronic commerce. The standard, which will be freely licensed, is intended to automate the flow of information across the entire derivatives partner and client network, independent of the underlying software or hardware infrastructure supporting the activities related to these transactions." The announcement says: "the FpML Architecture Version 1.0 Working Draft has been advanced to the Last Call stage. The Last Call period is expected to end September 1, 2000. We encourage interested parties to provide comments on the specification as soon as possible. Please send comments via email to fpml-issues@egroups.com . 
Please report each issue in a separate email message. An archive of the comments is available at: http://www.egroups.com/messages/fpml-issues . An issues list is also maintained on the web site. The FpML specifications are available at http://www.fpml.org/spec/ . For description and references, see "Financial Products Markup Language (FpML)."

Neal Hannon         Mailto:nhannon@tiac.net  nhannon@bryant.edu
Bryant College      http://web.bryant.edu/~nhannon    401-232-6227
   
XBRL... Financially Speaking in XML

On April 7, 2000 Glen Gray suggested going to the story at 
http://www.computerworld.com/home/print.nsf/CWFlash/000407D2CE
 

"Big names back new XML-based financial standard"
By Maria Trombly 
ComputerWorld, April 7, 2000

Some of the world's top financial institutions have formed a consortium to promote a new, XML-based standard for exchanging financial data over the Internet.

The group, the XBRL Project Committee, expects to launch the standard by July 1, the American Institute of Certified Public Accountants (AICPA) announced yesterday.

The standard, Extensible Business Reporting Language (XBRL), is also backed by big-name financial service companies such as Standard & Poor's, Arthur Andersen LLP, Deloitte & Touche LLP, Morgan Stanley Dean Witter, Ernst & Young LLP and PricewaterhouseCoopers.

In addition, some of the biggest names in the computer industry have lined up behind XBRL, including IBM, SAP AG, Microsoft Corp. and Oracle Corp. Financial reporting companies such as EDGAR Online Inc. and Reuters Group LP, as well as the International Accounting Standards Committee, are also backing the proposed standard.

The standard will be released in stages. The first release, scheduled for July, will cover specifications for publishing companies' financial statements in XBRL, said Mike Willis, chairman of the XBRL steering committee and a partner at PricewaterhouseCoopers. Other specifications, which will cover additional types of business reports — such as regulatory reports including Securities and Exchange Commission EDGAR files, tax filings and business event reports such as press releases — will be issued within the next 18 to 24 months, he said.

Willis said that because these specifications are simply electronic dictionaries for the XML standards that are already used in a great number of software applications, they will be simple to install and use.

"We have vendors such as SAP who are already working to integrate XBRL directly into their software, so when their customers want to run their financial statements, XBRL is an option," said Christy Reichhelm, an enterprise resource planning industry manager at Microsoft and co-chair of the public relations and communications working group for XBRL.

"This will be a new feature in these software packages, so some type of software upgrade will be gone through," she added. "But it would be minor."

XBRL will be a free specification that uses accepted financial reporting standards and practices to exchange financial statements across all software and technologies, including the Internet, the AICPA said.

"XBRL . . . greatly benefits all users of financial information," said Robert Elliot, chairman of the AICPA, in the statement released yesterday. "XBRL solves two significant problems for users and preparers of financial statements by providing efficient preparation and reliable extraction of financial data across all technology formats, including the Internet."

On April 7, 2000, a leading expert replied as follows:

XBRL, XFRML, XFDL, and the rest of the alphabet soup of XML applications are mere grammars for documents of specific sort from narrow domains, so some discipline (virtually non-existent in the HTML world) is imposed to facilitate communication.

From the viewpoint of us accountants, an equally interesting area is to tie-in these documents with backend databases. The document model for XML is hierarchical whereas most accounting databases today are relational. For seamless integration of relational databases and web-based interfaces, the application layer software should be able to map the relational stuff to the hierarchical model (in DOM) of XML. This is where the action today is. One example is the IBM prototype DB2XML, which maps the results of the SQL queries into a W3C DOM object, which can be displayed on a web browser using the Document Type Definition (DTD) and an XSL (eXtensible Stylesheet Language) stylesheet. DB2XML creates on the fly the DTD for the SQL query results.

The students in my web applications development course have been building XML front-ends tied to backend relational (Oracle) databases, but the process is very tedious (and manual). It is imperative to develop technologies to map relational query results into DOM and then tie the whole thing together through something like JSP or ASP. The near future should be interesting. Those interested in the course may like to see

http://www.albany.edu/acc/courses/acc683.spring00.html

J. S. Gangolly [gangolly@CSC.ALBANY.EDU]
Associate Professor, State University of New York at Albany, Albany, NY 12222. 
Phone: (518) 442-4949 Fax: (707) 897-0601 URL: http://www.albany.edu/acc/gangolly 
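The relational-to-hierarchical mapping that Gangolly describes can be sketched in Python with the standard sqlite3 and xml.dom.minidom modules.  This is not how DB2XML works internally.  It is a minimal illustration in which the ledger table, its columns, and the resultset and row element names are all invented, and it assumes that every column name is also a valid XML element name.

```python
import sqlite3
from xml.dom.minidom import Document


def query_to_dom(conn, sql):
    """Run a SQL query and map its flat result rows into a DOM tree."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    doc = Document()
    root = doc.createElement("resultset")
    doc.appendChild(root)
    for row in cursor:
        # One <row> element per record, one child element per column.
        row_el = doc.createElement("row")
        root.appendChild(row_el)
        for col, value in zip(columns, row):
            cell = doc.createElement(col)
            cell.appendChild(doc.createTextNode(str(value)))
            row_el.appendChild(cell)
    return doc


# Hypothetical in-memory accounting table standing in for the backend database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?)",
    [("Cash", 100.0), ("Sales", 250.5)],
)
dom = query_to_dom(conn, "SELECT account, amount FROM ledger ORDER BY account")
print(dom.toprettyxml(indent="  "))
```

The resulting DOM object could then be handed to a stylesheet or a server page for display, which is the "tie the whole thing together" step mentioned in the message.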

"XBRL - The emerging electronic reporting language," by Mike Willis --- http://www.accountingeducation.com/library/library147.html 

XBRL enables the users and preparers of existing financial statements to:

·    decrease the cost of accessing the information contained within the existing financial statements,

·    decrease the preparation cost,

·    increase the distribution of, and access to, existing financial statement information, and

·    increase and enhance the statement’s analysis

Scott Bonacker replied as follows on April 8, 2000

The EAI Journal recently (in the last two or three issues) devoted a whole issue to XML and its place in the world. There is also a website for the Enterprise Application Integration magazine - www.eaijournal.com  - and the print publication is available for free if you qualify.

EAI is an important function for accountants - whether by digital or analog tools.

Scott Bonacker, CPA [scottbonacker@CLAND.NET]
CPA McCullough, Officer & Company, LLC Springfield, Missouri moccpa.com

Ron Tidd sent the following message on April 10, 2000:

Jason et al,

I am not even close to being the expert that many of the other participants in this list are. However, I think that I can address Jason's questions based on my own naive, rudimentary understanding of XML:

Standardization will/must occur within disciplines that have traditionally shared a common language. Thus, while the Web might allow everyone in the World to use XML to prepare web-based documents, each profession/discipline will/must adapt XML to its specific needs in a manner that facilitates communications between its members (e.g., math, chemistry, financial reporting, etc.). Personally, I am not too concerned that I, as an accountant, am "restricted" to the XFRML standard and that mathematicians can not derive much benefit from it.

Second, I would like to assure Jason that the questions are valid and significant. As I tell my students, you are not the only one with the question but usually, if you ask it, you are the only one with the guts. We all need to be prodded into thinking about the basics, especially with respect to emerging technologies.

Ron [rrtidd@MTU.EDU]
www.rrtidd.com 

An April 19 message on the dark side of XBRL --- J. S. Gangolly [gangolly@CSC.ALBANY.EDU]

XML is a meta-language, and XBRML is a language written in XML syntax (at least that is my understanding), i.e., XBRML is an XML application. XBRML in some sense, therefore, is an XML derivative "customised" language for business reporting, and provides a "customised" tagset and a set of "grammatical" rules that fit the business reporting application.

Therefore, XBRML provides facilities for further "customisation" for a specific situation just as HTML did. Looking at XBRML as a language provides insights. The tagset and the grammar rules in XBRML are akin to the lexicon and the grammar in English. It is possible to write all sorts of sentences in English even though the lexicon is limited (probably not more than 150,000 words or so) and the grammar, though fluid, can be taken as given.

There is nothing that prevents a company developing an XML application on its own (and might even be desirable, if one has belief in Darwinian "survival of the fittest"). My own suspicion for the bee-line for standardisation of business reporting is the aspirations of the accounting firms to keep the reporting costs down. Imagine costs of code review alone if each client had its own XML application for business reporting.

On the other hand, XBRML standardisation may lock us down to a mediocre standard in the long run. Moreover, imagine all the billings that would be created by need for, say, XARML (a fictional eXtensible Accounting Reporting Markup Language for each client) code review!

We as a profession are more worried about our liability (specially for code reviews with which traditional accountants are quite uncomfortable) than the "survival of the fittest" language.

Imagine what happened to languages with rigid standards (French being a good example vis-a-vis English) in international discourse. I do not mean to be disparaging (my favourite authors in ANY language are Romain Rolland and Moliere).

Thank you Denny Beresford for the tip on http://www.sec.gov/news/press/2000-53.txt 

SEC APPROVES ISSUANCE OF INTERPRETIVE RELEASE ON THE USE OF ELECTRONIC MEDIA

Washington, DC, April 26, 2000 - At an open meeting yesterday, the Commission approved the issuance of an interpretive release discussing the application of the federal securities laws to electronic media. The interpretations build on Commission interpretations in 1995 and 1996 and are intended to help promote the efficient dissemination of information to investors, security holders and the securities markets. In addition, the interpretations are intended to ensure that the evolving use of communication technologies to offer and sell securities is consistent with the Commission's goals of protecting investors and promoting fair and orderly markets.

Many publicly-traded companies are incorporating Internet- based technology into their routine business operations, including setting up their own web sites to furnish company and industry information. Some provide information about their securities and the markets in which their securities trade. Investment companies use the Internet to provide investors with fund-related information, as well as security holder services and educational materials. Issuers of municipal securities also are beginning to use the Internet to provide information about themselves and their outstanding bonds, as well as new offerings of their securities.

The increased use of the Internet by issuers as a means of widespread information dissemination has resulted in uncertainty about the application of the federal securities laws to these communications. Through the release, the Commission seeks to reduce this uncertainty and remove interpretively some of the barriers to use of electronic media, while preserving important investor protections.

Highlights of the Interpretations

1. Electronic Delivery

The guidance in the release resolves several issues that have arisen out of the Commission's 1995 and 1996 releases on the use of electronic media to satisfy delivery obligations. In brief, this guidance

- clarifies that, in addition to written consent, investors and security holders may consent to electronic delivery of documents telephonically, as long as the consent is obtained in a manner that assures its validity and a record of the consent is retained;

- permits market intermediaries (such as broker-dealers and banks) to obtain consent to electronic delivery of documents on a "global," multiple-issuer basis, as long as the consent is informed;

- clarifies that issuers and market intermediaries may deliver documents electronically in portable document format, or PDF, as long as investors and security holders are adequately informed of the requirements to download PDF and are provided with any necessary software and assistance;

- clarifies that a hyperlink embedded within a prospectus or any other document required to be filed or delivered under the federal securities laws causes the hyperlinked information to be a part of that document; and

- clarifies that the close proximity of information on a web site to a public offering prospectus does not, by itself, make that information an "offer to sell," "offer for sale" or "offer" within the meaning of the federal securities laws.

The first phase of an International Accounting Standards Committee research project on this topic can be downloaded from http://www.iasc.org.uk/frame/cen3_26.htm .  That phase involved developing and publishing a discussion paper, "Business Reporting on the Internet." The discussion paper was published in November 1999 and was authored by:

Prof. Andrew Lymer (University of Birmingham, a.lymer@accountingeducation.com)
Prof. Roger Debreceny (Nanyang Technological University, Singapore, rogerd@netbox.com)
Prof. Glen Gray (California State University, Northridge, glen.gray@csun.edu)
Prof. Asheq Rahman (Nanyang Technological University, Singapore, aarrahman@ntu.edu.sg)

Outline of the Discussion Paper

 

XForm News

XForms --- forwarded by J. S. Gangolly [gangolly@CSC.ALBANY.EDU]

INTERNET WORLD NEWS Tuesday, April 18, 2000 Vol. 2, Issue 75 http://www.internetworldnews.com 

Newfangled Forms from the W3C

By Nate Zelnick

It's been seven years since forms were added to the Hypertext Markup Language and, in the interim, a few things have changed.

For instance, in 1993 it was simply astounding to be able to collect user-supplied data from within a Web page itself through generic little widgets like text boxes, drop-down combo boxes, and Boolean radio buttons. The fact that doing anything with that data in the stateless Web meant submitting the form back up to the server and handing it off to some CGI script or other ancillary system -- which meant you could have one form per page that could be processed -- was a small price to pay. Later, client-side scripting helped relieve some of the tedium of this approach, but only by requiring a completely different development paradigm that would work only in the presence of the right version of JavaScript. In other words, a hack.

This week the World Wide Web Consortium ( http://www.w3.org ) published the first public view of where it wants to take the forms of the future. As with nearly everything coming out of the Consortium, the new XForms proposal ( http://www.w3.org/TR/2000/WD-xhtml-forms-req-20000329 ) begins and ends with the core value it's been promulgating since its founding: If the Internet is going to work everywhere, on every kind of device for every type of person, then information needs strict barriers between its structure, its content, and how it looks.

This meant that the HTML Activity Group that built the XForm outline had to think about what a form is and what it does in the most generic sense. Dave Raggett, one of the editors of the XForm Data Modeling Draft and the XForm Requirements document and a participant in the development of HTML from nearly the beginning, stressed that XForms is a much larger concept than merely the Web. It needs to encompass archaic media like paper, as well. A form that requires a human signature needs to exist as more than electrons, but the minute it's printed or faxed, it loses the ability for filled field values to be extracted.

But because XForms defines its data model as separate from its presentation, the position of a named field's answers can be extracted by Optical Character Recognition systems even after the electronic life has been squeezed out of it. More familiar Web-expansion problems -- like how to present a form on a cell phone, television screen, or Web-enabled blender -- are less hairy variations of the same problem.
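The separation of a form's data model from its presentation can be illustrated with a small sketch. This is not XForms syntax; it is a plain Python illustration in which the `Field` class, the field names, and the two rendering functions are all hypothetical. The same field definitions are rendered once for a browser and once for a text-only device or printed page.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Field:
    """One entry in the form's data model; it carries no layout information."""
    name: str      # machine-readable field name
    label: str     # human-readable prompt
    datatype: str  # "string" or "boolean" (hypothetical type names)


# The data model is declared once, independent of any device.
ORDER_FORM = [
    Field("customer", "Customer name", "string"),
    Field("gift_wrap", "Gift wrap?", "boolean"),
]


def render_html(fields):
    """One possible presentation: an HTML form for a desktop browser."""
    rows = []
    for f in fields:
        kind = "checkbox" if f.datatype == "boolean" else "text"
        rows.append(
            f'<label>{escape(f.label)} '
            f'<input type="{kind}" name="{escape(f.name)}"></label>'
        )
    return "<form>\n" + "\n".join(rows) + "\n</form>"


def render_text(fields):
    """Another presentation: fill-in lines for a text-only device or paper."""
    return "\n".join(f"{f.label}: ________" for f in fields)


if __name__ == "__main__":
    print(render_html(ORDER_FORM))
    print(render_text(ORDER_FORM))
```

Because both renderers read the same field names, a value collected through either presentation can be matched back to the same named field.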

Tuesday's XForm announcement includes only the broad definition of the problem that needs to be solved -- the Requirements doc -- and a first draft of an XForm Data Model. Possible collisions with XML Schemas -- an evolving spec that deals with defining data types for XML vocabularies -- may create some intraconsortium grumbling, but the XForm group was careful to make distinctions between its model and that ongoing work.

Early backing for the work thus far came from form-centered companies like Xerox, JetForm, and Cardiff Software. The long road to consensus -- required for something to become a W3C recommendation -- means predicting a done date is impossible.

Selected News Items


"The dark side of XML and privacy," by Jack Vaughan, September 5, 2002 --- AppDevTrends@101communications-news.com 

The data-describing power of XML could have a very dark side in the hand of mischievous individuals, says Ron Schmelzer, a senior analyst at industry analyst firm ZapThink, Waltham, Mass. "XML is essentially automating identity theft," said Schmelzer, a speaker at the XML Web Services One Conference in Boston.

By creating what Schmelzer described as a "human-readable, machine-processable, meta data-enhanced, text-based way of reading information that is tagged," XML has given developers a way to tag data fields that may be too efficient. With XML, developers don't really have the ability to tell DBAs to ignore the information. "It's like telling them not to think about polar bears. They're essentially drawing a big red flag" that points to those data fields holding sensitive information.

To resolve this problem, said Schmelzer, some programmers have turned to a strategy of obfuscation -- creating a field called XJ12 as the tag for credit cards, and splitting the credit card number into four fields or even hashing the number.
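A minimal sketch of the obfuscation strategy described above, written in Python with only the standard library. The tag name XJ12 comes from the article; the `_1` to `_4` suffixes, the function names, the salt, and the sample card number are assumptions made for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def _digits(card_number):
    """Strip spaces and hyphens and check that 16 digits remain."""
    digits = card_number.replace(" ", "").replace("-", "")
    if len(digits) != 16 or not digits.isdigit():
        raise ValueError("expected a 16-digit card number")
    return digits


def split_into_four(card_number):
    """Split the card number into four 4-digit pieces."""
    digits = _digits(card_number)
    return [digits[i:i + 4] for i in range(0, 16, 4)]


def hash_number(card_number, salt):
    """Return a salted SHA-256 digest instead of the number itself."""
    return hashlib.sha256((salt + _digits(card_number)).encode("utf-8")).hexdigest()


def obfuscated_record(card_number):
    """Store the pieces under uninformative tags (hypothetical XJ12_1..XJ12_4)."""
    record = ET.Element("customer")
    for i, part in enumerate(split_into_four(card_number), start=1):
        ET.SubElement(record, f"XJ12_{i}").text = part
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(obfuscated_record("4111 1111 1111 1111"))
    print(hash_number("4111 1111 1111 1111", salt="example-salt"))
```

Renaming and splitting only hide the meaning of the fields from a casual reader of the markup; the digits themselves are still stored in the clear.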

The Platform for Privacy Preferences is a popular XML-based effort that defines privacy policies in machine-readable formats and generates such policies. According to Schmelzer, attempts at offering customers P3P-based, user-centric services to store and access personal information, such as Microsoft Passport, the Liberty Alliance, CPExchange and Oasis CIQ, at best create as many questions as answers; at worst, they are doomed to failure.

All these plans have one thing in common: They use XML tags to standardize customer information. But, said Schmelzer, "if it's hard to [get agreement on] standardized simple address fields internationally, then think about how hard it will be to tag other, more complex forms of customer information."


Those of you following the tremendous impact that XML is having and will soon have upon all networking may be interested in a special insert in the November 15 Edition of The Wall Street Journal called "Technology:  The Providers."  I have not been able to find an online version of this insert.

SIDE BY SIDE

Publicly, few Microsoft officials claim that Windows will dominate the Internet, and instead say they envision a world in which Microsoft operating-system and application software coexists peaceably with that of competitors. "Windows 2000 is our intellectual property, and we will continue to drive forward with that," says Bill Anderson, head of Web application services for Microsoft, based in Redmond, Washington. But, he adds, "in a heterogeneous environment down the road, it will become increasingly difficult to interject proprietary standards in a Web-based world."

So, for instance, Microsoft has embraced an industry-wide standard for distributing data known as XML, for Extensible Markup Language. XML seems likely to become the common language of electronic commerce, making it possible for businesses to exchange in a universal format purchase orders, product descriptions and other minutiae important to e-commerce.

Microsoft has driven aggressive efforts to standardize the use of XML across the industry, even establishing a clearinghouse of XML data types called biztalk.org. Some critics have been surprised at the company's embrace of the standard; many expected Microsoft to attempt to subvert it by adding proprietary extensions that would work only on computers that run Windows.

But so far, the company's approach to XML differs substantially from its defensive reaction a few years ago to Sun's Java technology, an earlier attempt to break Microsoft's lock-in by making it possible to transfer software programs across incompatible computers without modifying them. E-mail disclosed as a result of Microsoft's numerous legal tussles has shown that officials from Bill Gates on down feared Java's threat to Windows; as one Microsoft foe put it, company officials set out a strategy to "embrace, extend and extinguish" Java by building in extensions that would tie it closely to Windows. Microsoft eventually had to abandon that strategy when Sun sued and obtained a ruling that forced Microsoft to hew to Sun's Java standards; the matter remains before the courts.

This time, Mr. Anderson insists that Microsoft will adhere to industry-wide XML standards. "It benefits us to be a good player in XML space," he says. Most Microsoft-inspired extensions to the XML standard, he says, will be accepted industry-wide; any exceptions will be "one-off" solutions tailored to solve particular problems. "We're saying we're going to take that framework, build on it and extend it, and make sure it's robust for the Windows 2000 platform," Mr. Anderson says.

Microsoft, however, envisions XML as much more than a simple data-description language; instead, it considers the standard a way of letting programs communicate with each other across networks of otherwise incompatible machines. That job currently requires the use of Java or other similar technologies that create a layer of compatible "middleware" that allows programs to communicate. Microsoft competitors such as Sun and IBM consider the company's passion for XML little more than a thinly disguised attack on Java and other middleware technologies.

Indeed, some critics scoff at the notion that Microsoft intends to cooperate with the rest of the industry indefinitely. Fear of the Internet explains "why Microsoft is rushing so quickly to embrace the Web, to extend it in proprietary ways and get people to use those proprietary extensions," says Dan Kusnetzky, an analyst with International Data Corp. "Once you do, you are tied to Microsoft, which is trying to own the Web in a way no other provider is really trying to do."

XML rift may split HR applications standards.  A schism is forming over the way developers use XML schema to integrate human resources software and services --- http://www.pcweek.com/a/pcwt9911152/2393058/ 

XHTML:  A Bridge to the Future, Information Week, May 8, 2000, pp. 210-214.  The article is not yet posted online, but eventually you will find it at http://www.informationweek.com/maindocs/archive.htm 

Selected Online References

One of the best articles on XML without all of the techie jargon is entitled "Is XML the answer?  Depends on the Question?" by Michael Goulde in Application Development Trends, October 1999, pp. 21-22.  The online version is at http://www.adtmag.com/Pub/oct99/d9910xml.htm 

One of the reasons XML has captured so much interest so quickly (Version 1.0 of the XML specification was released in February 1998) is that it represents a parsimonious solution to a wide variety of problems. There are three sets of users who have a very high level of interest in XML. The first group includes Webmasters and other designers of Web-based information systems who use HTML to mark up information for presentation, but have no way to structure the information they send to browsers. By providing structure to the unstructured Web data in a standard way, a Web query can deliver a much more useful set of results, increasing the value of the information.

The second group of users have toiled for years with the Standard Generalized Markup Language (SGML) to create structured documents such as training manuals and technical documentation. Although HTML is derived from SGML, SGML in general is not well suited to the Web environment because it is extremely complex -- something that has also affected its universal adoption. XML is an SGML derivative that is not only easier to use on the Web, it has garnered wider adoption. These users have also become very active in World Wide Web Consortium (W3C) working groups that are hammering out additional specifications and standards to ensure that XML meets many of the same application requirements as SGML.

The third set of users fascinated by XML are a set who were not originally targeted by the W3C's XML efforts. But very early on, application developers building distributed applications -- and faced with difficult challenges around application integration and interoperability -- saw XML as a way to free their applications from the tyranny of over-the-wire binary formats that made it impossible to link applications together in real time. These developers, many from the Java community, but equally as many using Microsoft tools, quickly realized that they could use XML syntax in their messages and, because of the self-describing nature of XML documents, applications could exchange data without having to be explicitly written or compiled to do so. Freedom! This group has now expanded to include developers who want to extend EDI, link networks of suppliers and customers, create dynamic marketplaces, and perform other heretofore impossible tasks over the Web.
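The "self-describing" point can be sketched in a few lines of Python. The purchase-order message and its tag names below are made up for illustration, and `flatten` is a hypothetical helper. It walks whatever XML it is given and reports each value together with the path of tags that describes it, without knowing the message layout in advance.

```python
import xml.etree.ElementTree as ET

# A made-up message; the receiving program has no built-in knowledge of it.
MESSAGE = """\
<purchaseOrder number="1001">
  <buyer>Trinity Bookstore</buyer>
  <item sku="A-17">
    <description>XML handbook</description>
    <quantity>3</quantity>
  </item>
</purchaseOrder>
"""


def flatten(element, path=""):
    """Return (path, value) pairs for every attribute and text value in the tree."""
    here = f"{path}/{element.tag}"
    pairs = [(f"{here}@{key}", value) for key, value in element.attrib.items()]
    text = (element.text or "").strip()
    if text:
        pairs.append((here, text))
    for child in element:
        pairs.extend(flatten(child, here))
    return pairs


if __name__ == "__main__":
    for path, value in flatten(ET.fromstring(MESSAGE)):
        print(path, "=", value)
```

The same function would work unchanged on an invoice or a catalog entry, which is the contrast with a fixed binary format where both ends must be compiled against the same layout.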

How XML unleashes data
PC Week Labs looks at XML, the simple and almost universally compatible language that is poised to become the key technology in e-commerce ---  http://www.pcweek.com/a/pcwt9911231/2396831/ 

Why XML is failing (in the eyes of one analyst)
According to John Taschek, the "unifying" technology is falling victim to its own popularity as big adopters bypass the standards bodies. http://www.pcweek.com/b/pcwt0004266/2551691/ 

"XML is Not Yet a Cornerstone Technology," Application Development Trends, April, 2000, pp. 55-60.  The online version is at http://www.adtmag.com/Pub/apr2000/fe401a.cfm 

Despite the promises, corporate developers need to make smart decisions about how to apply the technology as it is today to specific integration problems and challenges. Perhaps just as important, developers have to disregard some of the growing myths that surround the eXtensible Markup Language (XML). This article will show that while XML is not the cornerstone of EAI, it is an important enabler that, when used correctly, can be a key weapon in any corporation's IT arsenal.

Nevertheless, the Web as a delivery mechanism and XML as the delivery format is already a very powerful combination that can enable integration across the board for business-to-business (B2B), business-to-consumer (B2C) and application-to-application (A2A) connectivity.

Neil Hannon provided the following updates:

XML and metadata news
XML could slash forex costs...
Financial Times   Sun Aug 6 16:28:31 CDT 2000
 
Mercator Software - Upgrading to Strong Buy from Buy...
Wit Capital   Sun Aug 6 11:47:51 CDT 2000
 
Sun releases SVG tools for XML...
365java   Sun Aug 6 10:48:15 CDT 2000
 
Sun Microsystems: Sun Microsystems Delivers Tools to Help Developer Community Leverage Xml Standards for Graph...
Java Industry Connection   Sat Aug 5 02:28:05 CDT 2000
 
Major retailers back XML...
InfoWorld   Sat Aug 5 02:24:45 CDT 2000
 
Qt is still a prime choice for toolkits even if you ignore its greatest strengths...
InfoWorld   Sat Aug 5 02:24:24 CDT 2000
 
There's been a reality check, but the Internet stock bubble has not quite burst...
InfoWorld   Sat Aug 5 02:24:24 CDT 2000
 
Where in the world is Carmen Sandiego? She's spying on your network...
InfoWorld   Sat Aug 5 02:24:24 CDT 2000
 
Major retailers back XML...
IDG.net   Sat Aug 5 02:21:23 CDT 2000
 
HIT establishes EDI link with Evergreen...
Journal of Commerce   Fri Aug 4 14:11:55 CDT 2000
 
 

 


Selected Offline References

 


Selected Software Alternatives for XML Authoring

A good web site to follow for XML software updates is the Web Tools site at http://www.webtools.com/toolbox/html .

For a time, not much was available in the way of authoring software for XML, and the standards have not yet been fully established for embedding in web browser software.   However, many business firms are already using XML, and there are many software options ranging in price from free to thousands of dollars. 

XML Notepad from Microsoft Corporation

The latest release of MSXML3, the company's XML parser, offers VB access to API and better conformity to XML standards --- http://www.eweek.com/a/pcwt0008025/2610542/ 

Microsoft has a free download of XML Notepad in beta form that will perform some simple XML basics.  It is described at http://msdn.microsoft.com/XML/NOTEPAD/intro.asp .  Frequently asked questions about XML Notepad are answered at http://msdn.microsoft.com/XML/NOTEPAD/faq.asp .  I downloaded a free copy from http://msdn.microsoft.com/XML/NOTEPAD/download.asp .

Microsoft Corporation's dedication to great new things in XML is described at http://www.gca.org/memonly/XMLfiles/issue4/edit.htm .

Both Internet Explorer and Netscape have XML viewing capabilities.  See http://www.softseek.com/Internet/Web_Browsers_and_Utilities/Browsers/Review_20326_index.html .  On the heavy duty side of XML, see "SQL Server 7.0 and XML Power Microsoft’s Product Catalog" at http://www.microsoft.com/backstage/ .

XMetaL from SoftQuad

Possibly the best buy in full-featured XML authoring software packages is XMetaL from SoftQuad, the company that originated the HTML and web server software called HoTMetaL PRO. The price is only $495 for the world and $347 for poor professors (very reasonable for XML authoring). You can read the following in documents at http://www.sq.com/products/XMetaL/index.html

XMetaL is a highly customizable XML authoring tool that delivers unprecedented ease of use to authors while shielding them from the complexities of XML, lowering costs of both customization and deployment.

You can read the initial press release about XMetaL at http://www.sq.com/press/releases/pr990525.html .

Dynabase XML Software

A heavy duty XML backbone is Dynabase from INSO (800-733-5799) at http://www.inso.com/Dynabase .  It can be built on top of such relational database systems as Oracle, Sybase, Informix, SQL Server, and DB2.  (It should be pointed out, however, that XML will eventually be an object-oriented database system.)  Dynabase uses a proprietary programming language that is very close to Visual Basic and will, therefore, integrate well with Microsoft's Office 2000 products.  It is a bit early for poor professors to start experimenting with Dynabase since it carries a price tag of $50,000.  But Dynabase is already on the move in the corporate world.

ArborText XML Software

A leading company for heavy duty SGML and XML development is ArborText at http://www.arbortext.com/ .  ArborText produces a new software product called EPIC described as follows:

Because Epic connects directly to Microsoft Word, you can easily import existing product information contained in Word files and convert them to valid XML. Epic can also use Word’s filters to import product information contained in other formats including Microsoft Excel tables, Word Perfect files, and more.  After the import is finished, Epic helps you fix up anything that does not convert to valid XML.   In addition to a traditional editing view, Epic also displays the document in an editable, hierarchical view through its Document Map.  In addition, Epic contains several tools that simplify the structured XML authoring process. One example is the Insert Element panel on the right. This allows authors to find the appropriate element by first selecting a category; in this example, the author has selected the "List" category and can then choose from all the types of lists that Epic supports.

In addition, ArborText has the ADEPT Series described at http://www.arbortext.com/Products/ADEPT_Series/adept_series.html

ADEPT Series -- Supports XML and SGML authoring and page publishing on Windows-based PCs and UNIX-based workstations.

ADEPT·Editor -- Allows authors to write text, place graphics and create books, manuals, catalogs, encyclopedias, and similar types of information. Also, ADEPT’s Willow technology enables tight integration between ADEPT and document management systems.

ADEPT·Publisher -- Includes all the capabilities of ADEPT·Editor plus page composition. ADEPT·Publisher automatically lays out pages by balancing the need for page fullness with the need to keep related elements together to provide a powerful tool for increasing author productivity.

Document·Architect -- Provides an application development tool to build DTDs (Document Type Definitions), design stylesheets, and customize the behavior of ADEPT.

Pricing at ArborText appears to be negotiated, and it does not appear possible to find ballpark pricing at the company's web site.  It appears that ArborText software is not priced for poor professors.

XML Software from POET Software

POET Software is a leader in data management products and Internet solutions. POET’s object data management products are widely used in next generation packaged software applications and embedded systems. POET’s XML based Internet applications enable e-commerce and content management over the Web.

The main POET web site is at http://www.poet.com .  I highly recommend the POET Content Management Guided Tour at http://www.poet.com/products/cms_solutions/guided_tour/index.asp

Chrystal Software from Xerox Corporation

The XML News Center from Xerox is at http://www.chrystal.com/XML/XML.htm

Xerox offers two product lines for XML content management.  Astoria creates structured documents.  Canterbury is for users of Adobe FrameMaker + SGML.   My browsers have trouble reading the Java pages about these two products. 

Other XML Software Options

There are many free XML software downloads, although most require programming skills.  One such listing is available in Goldfarb and Prescod (1998, Chapter 30).  The CD-ROM that accompanies the book contains 55 types of XML freeware.


 


Frequently Asked Questions (FAQs)

Where should I go to track developments in XML software?

A good web site to follow for XML software updates is the Web Tools site at http://www.webtools.com/toolbox/html .

What are some of the existing and/or proposed applications of XML?

Consider, for example, the following applications described in Goldfarb and Prescod (1998)


Are there any financial services standards for XML?

From InformationWeek Daily, July 8, 1999
Financial-Services XML Standard Proposed

Software vendor Integral Corp. yesterday released programming details for creating documents using FinXML 1.0, a proposed standard for data interchange in the financial-services industry based on the Extensible Markup Language. XML is a promising new meta-language that is increasingly seen as a way to facilitate E-commerce by bringing better structure to data on the Web.

Integral has specified, and made publicly available via the Web, a set of data table definitions that prescribe rules for sharing data using FinXML. It plans next to make FinXML compatible with other flavors of XML being developed in the E-commerce arena, such as Microsoft’s BizTalk initiative and Ariba Inc.’s commerce XML effort. Integral says FinXML is already interoperable with the Financial Industry Exchange protocol and the Open Financial Exchange protocol, which is used for online billing and other retail transactions.

Integral is attempting to form a consortium of technology vendors and users to back FinXML, but it has not yet identified any corporate members. Sun Microsystems and Chase Manhattan Corp. have endorsed the concept of FinXML, but a spokesman for Integral says it is premature to say whether they will actually join the planned group.


Some drawbacks of not having regulation:  Why is XML more likely to be popular in business-to-business applications than business-to-consumer applications?

Before answering that question, let me relate a short story.   According to a San Antonio TV report a couple of years ago, an employee in a local Best Buy department store discovered a reporter walking down the aisles recording Best Buy prices and product model numbers.  The reporter was promptly escorted out of the store due to a supposed Best Buy policy of not allowing store prices to be recorded for comparison shopping guides.  The bottom line is that most vendors do not like to make it easy to compare prices.

The first XML book price comparison service was the Junglee Shopping Guide according to Goldfarb and Prescod (1998, Chapters 9 and 29).  However, doing so was not easy since XML markups are not provided by book sellers.  Junglee used "extractors" to automatically extract prices from unstructured (non-XML) text.  (For definitions of terms like "extractor" and "wrapper," see my Technology Glossary.)  
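To make the idea of an extractor concrete, here is a minimal sketch in Python. It is not Junglee's method, which is not described here; the page text, the placeholder ISBN, the regular expressions, the output tag names, and the assumption that prices are in U.S. dollars are all hypothetical. The function pulls an ISBN and a price out of free text and wraps them in XML tags.

```python
import re
import xml.etree.ElementTree as ET

# Made-up seller page text with a placeholder ISBN.
PAGE_TEXT = (
    "The XML Handbook (ISBN 0-00-000000-0) is on sale now. "
    "Our price: $35.99 plus shipping."
)

ISBN_PATTERN = re.compile(r"ISBN\s+([0-9][0-9Xx-]{9,16})")
PRICE_PATTERN = re.compile(r"\$(\d+\.\d{2})")


def extract_offer(text):
    """Return an <offer> element as a string, or None if nothing is found."""
    isbn = ISBN_PATTERN.search(text)
    price = PRICE_PATTERN.search(text)
    if not isbn or not price:
        return None
    offer = ET.Element("offer")
    ET.SubElement(offer, "isbn").text = isbn.group(1)
    # Assumption: every price on the page is quoted in U.S. dollars.
    ET.SubElement(offer, "price", currency="USD").text = price.group(1)
    return ET.tostring(offer, encoding="unicode")


if __name__ == "__main__":
    print(extract_offer(PAGE_TEXT))
```

An extractor like this breaks whenever the seller rewords the page, which is why markup supplied by the seller would make the job so much easier.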

In any case, Junglee Shopping Guide is now part of Amazon.  You can shop for a variety of products in this added service from Amazon, but I could not find how to shop for the best deal on books. I wonder why Amazon dropped price comparison guides for books after acquiring the Junglee Shopping Guide.

If book sellers put current prices and handling/shipping charges into XML markups for each ISBN number, it would be possible to easily compare prices for hundreds of book sellers.  If such metadata were available in vendor XML markups, the wonderful book comparison shopping guide provided by Glenn Fleishman at http://isbn.nu/ could more easily extract comparative prices for hundreds of book sellers instead of the 17 book sellers that he now scans as a public service without having the benefit of XML tags. 
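If sellers did publish such markup, the comparison itself would take only a few lines. The sketch below assumes a hypothetical `<book>` element with `<price>` and `<shipping>` children; no such common format is established in this document, and the seller names, placeholder ISBN, and amounts are made up.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Made-up feeds, one per seller, all using the same hypothetical tags.
VENDOR_FEEDS = {
    "Seller A": '<book isbn="0-00-000000-0"><price>35.99</price><shipping>3.50</shipping></book>',
    "Seller B": '<book isbn="0-00-000000-0"><price>33.00</price><shipping>7.25</shipping></book>',
    "Seller C": '<book isbn="0-00-000000-0"><price>36.50</price><shipping>0.00</shipping></book>',
}


def total_cost(feed_xml, isbn):
    """Price plus shipping for the requested ISBN, or None if the feed is for another book."""
    book = ET.fromstring(feed_xml)
    if book.get("isbn") != isbn:
        return None
    return Decimal(book.findtext("price")) + Decimal(book.findtext("shipping"))


def cheapest(feeds, isbn):
    """Return (seller, total) for the lowest total cost, or None if no seller lists the book."""
    totals = {}
    for seller, xml_text in feeds.items():
        cost = total_cost(xml_text, isbn)
        if cost is not None:
            totals[seller] = cost
    if not totals:
        return None
    seller = min(totals, key=totals.get)
    return seller, totals[seller]


if __name__ == "__main__":
    print(cheapest(VENDOR_FEEDS, "0-00-000000-0"))
```

Note that the seller with the lowest list price is not the cheapest once shipping is added, which is the kind of comparison that tagged shipping charges would make routine.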

The following message (from Glenn Fleishman) not only highlights a special problem that XML enthusiasts will have in providing shopping comparison guides, it possibly touches upon a business ethics issue as well.  We seem to be approaching a classic dilemma in which the only way to make it easier to comparison shop will be for the government to require firms to make it easier to obtain the necessary data.

Hi Dr. Jensen,
No XML here, unfortunately. It’s not necessarily in the interests of an online bookstore to provide XML tags on their data, as an easier comparison of their prices does not necessarily help stores sell their books.

I am very excited about XML, but I believe it will wind up being used primarily in business-to-business partnership applications and in applications that replace proprietary EDI systems.

It will also be an amazing tool for transferring information from heterogeneous data sources that will far, far surpass the awful tab-delimited text file format.

Glenn Fleishman [glenn@glenns.org]

Fleishman's excellent ISBN book shopping comparison guide is at http://isbn.nu/ . Although I like this guide as a consumer, such guides do tend to reduce standardized products to commodity items.  This, in turn, may stifle innovations that add to vendor overheads and require longer-term capital spending on improved services.

Note that it is possible to generate XML markups (e.g., for consumer guides) even though the vendor web sites do not have XML tags.  See the definition of wrapper and extractor at http://www.trinity.edu/~rjensen/245glosf.htm#Wrapper .


Some benefits of regulation:  Why are investors and investment analysts already experiencing benefits from XML markups?

Although vendors may not willingly provide XML markups that make it easier to conduct competitive comparisons on most anything (prices, quality, ingredients, consumer complaints, reliability tests, etc.), there are areas where industry or government regulations already require public disclosures.  Those disclosures are subject to XML markups for ease of comparison.  Great examples are the accounting disclosures required by the Securities and Exchange Commission (SEC) for the EDGAR database.  These disclosures fit into an EDGAR Document Type Definition (DTD) published at the SEC web site.  Each document submitted by a registrant required to file financial data with the SEC is supposed to conform to the SEC DTD.   The EDGAR Filer Manual can be downloaded from http://www.sec.gov/asec/ofis/filerman.htm .  The Appendices are quite interesting.  They are as follows:

APPENDICES to EDGAR Filer Manual:

A  Form Types Accepted for Electronic Filing This appendix lists the form types that EDGAR accepts and the EDGAR Submission Header type names given to each SEC form. It also provides a page reference to Appendix B, where we provide the tags appropriate for each submission header type.

B  EDGAR Tags by Submission Header Type In this appendix we provide the EDGAR tags appropriate for each submission header. Please note that we have added new tags within the General Tags section which apply to all form types.

C  Acceptable Values for Paper Forms for Electronic Filing The EDGAR system recognizes a limited set of values for certain tags. This appendix lists the values you must provide in a specified format.

D  Messages Reported by EDGAR This appendix includes information to assist you in understanding the acceptance and suspension messages that EDGAR generates.

E   Tagging for Financial Data Schedules This appendix provides information on EDGAR requirements for Financial Data Schedules processed by our Divisions of Corporation Finance and Investment Management. Financial Data Schedules require specific EDGAR tags; this appendix includes the correct format for input of tags and data by Article type.

F  Paper Forms for Electronic Filing: Form ID (Uniform Application for Access Codes to File on EDGAR); Form SE (Form for Submission of Paper Format Exhibits by Electronic Filers); Form ET (Transmittal Form for Electronic Format Documents Under the EDGAR System); Form TH (Notification of Reliance on Temporary Hardship Exemption)

G  Glossary of Commonly Used Terms, Acronyms, and Abbreviations

H  Form 13-F Special Electronic Filing Instructions

I  EDGARLink[R] Script Language

J  Instructions for Attaching HTML Documents to Electronic Filings This new appendix provides information to assist you in creating SEC-acceptable HTML documents. This appendix provides the allowed HTML tags and disallowed HTML attributes for specific HTML tags. This appendix also includes all new HTML/PDF error messages.

K  Instructions for Attaching Unofficial PDF Documents to Electronic Filings This new appendix provides information to help you create and attach SEC-acceptable PDF documents.

In addition to the SEC DTD, extensive rules and regulations of the SEC dictate what financial data are to be filed with the SEC (e.g., a complete 10-K annual set of audited financial statements).  The required submission data and the DTD facilitate using XML for filing and retrieving EDGAR data.  An entire chapter is devoted to XML submissions to the SEC in Goldfarb and Prescod (1998, Chapter 11). These ideas are elaborated upon in a 1999 paper entitled "The Electronic Dissemination of Accounting Information - Resource Discovery, Processing, and Analysis" by Roger Debreceny, Glen Gray, and Tony Barry.  I recommend that all of you contact one of these authors for a copy. In particular you may request a copy from Glen at glen.gray@csun.edu  or Roger at rogerd@netbox.com .


Why are business firms rushing so quickly away from extranets and the "old" electronic data interchange (EDI) into the new EDI based upon XML?

Extranets are not efficient in handling multiple computing platforms.  For example, a manufacturer may have to deal with hundreds of suppliers having little consistency in serving up data.  XML is going to bring more efficient compatibility between diverse systems.  See Goldfarb and Prescod (1998, Chapter 7).


Why is XML more efficient than HTML and ODBC networked database query interactions?


What are some XML search sites that will find the best deals for shopping?

Possibly the first book searching web site was the Junglee Shopping Guide, which is now a part of Amazon books at http://www.junglee.com/ .


Why is XML a type of "middleware"?  How is XML generated in the middle tier?

The middle tier referred to above is actually a middleware process that sits between remote servers and remote clients.  It is technically called message-oriented middleware (MOM).  Instead of requiring specialized programs to query (access) data, the XML MOM in the middle tier acts as a data transmission "middleman" that contains a single XML parser that separates the XML markups from the data.  Ideally, the middle tier server collects all the data from multiple sources and then transmits this data in a consolidated file to the client machine.  The old concept of an HTML "document" is extended to an XML "DocuMent" that is comprised of the data plus the markup scripts. The middle tier server parses transmissions to separate data from markups and renders the data for particular types of delivery such as screen displays, audio, Braille, etc.  In this context MOM renders POP (presentation-oriented publishing).  Not all XML is generated at the remote server level.  XML can be generated in the middle tier by Active Server Pages (ASP), as explained in Goldfarb and Prescod (1998, pp. 82-83).   ASP scripts are delimited by "<%" and "%>".
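The consolidation step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the design from Goldfarb and Prescod: the two supplier fragments, the element names, and the helper functions consolidate and render_text are all hypothetical, and real middleware would fetch its inputs over a network rather than from string constants.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML replies standing in for data from two remote servers.
SUPPLIER_A = "<catalog source='supplier-a'><item sku='A1'><price>9.50</price></item></catalog>"
SUPPLIER_B = "<catalog source='supplier-b'><item sku='B7'><price>12.00</price></item></catalog>"


def consolidate(fragments):
    """Parse each reply and merge every <item> into one consolidated document."""
    merged = ET.Element("consolidated")
    for text in fragments:
        root = ET.fromstring(text)  # one parser handles every source
        for item in root.findall("item"):
            item.set("source", root.get("source"))  # remember where the item came from
            merged.append(item)
    return merged


def render_text(doc):
    """One possible rendering of the data: plain text lines for a screen display."""
    return [
        f"{item.get('sku')} from {item.get('source')}: {item.findtext('price')}"
        for item in doc.findall("item")
    ]


doc = consolidate([SUPPLIER_A, SUPPLIER_B])
lines = render_text(doc)
print("\n".join(lines))
```

A second rendering function (for audio or Braille output, say) could consume the same consolidated document, which is the point of separating the data from its presentation.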

XML can be used to process requests and replies in a Business-to-Business (B2B) Integration Server as illustrated by Goldfarb and Prescod (1998, Chapter 6).  For example, a B2B server can integrate data and communications with virtually all vendors in the supply and/or customer chain of a business.


What is the role of markup scripts like JavaScript, VBScript,  and DHTML in the XML scheme of things?  How does this differ from Java, CGI codes, and other similar codes on server computers?

XML does not necessarily eliminate HTML, DHTML, and other types of markup scripting.  The XML is typically being marked up on a middle tier server that accepts other types of markups as inputs and outputs.

By way of illustration, I recently requested that the Social Security Administration provide me with a statement of my past earnings and estimated future benefits.  The SSA server has code to respond to the form that I transmit from my client computer.  However, it takes up to eight weeks for the SSA to process my query and return the answers via snail mail.  If the SSA server were feeding XML script extensions into a middle tier server, I could have instantly downloaded all past earnings data and other data that pertains to me.  The SSA might also allow me to download benefit-computing software that would enable me to compute future benefits based upon various elective retirement scenarios.  The point here is that XML will allow citizens to download subsets of the entire database as those subsets pertain to them.


What are wrappers and extractors?

A wrapper is a Java applet designed to extract data from web sites.  Such a wrapper makes use of XML structure.  A wrapper may use one or more "extractors" to extract data from unstructured XML files.  Extractors utilize dictionaries to achieve sophisticated linguistic processing of unstructured text.  Life is much easier for structured documents having XML markups.  An illustration in terms of a web shopping guide is provided in Goldfarb and Prescod (1998, pg. 136).
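The contrast between structured and unstructured sources can be made concrete with a small sketch. It is written in Python rather than as a Java applet, and a single regular expression stands in for the dictionary-based linguistic processing that a real extractor would perform. The sample offer, the advertising text, and both function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical inputs: the same offer as XML markup and as free text.
STRUCTURED = "<offer><title>XML Handbook</title><price currency='USD'>44.99</price></offer>"
UNSTRUCTURED = "Special this week: the XML Handbook, now only $44.99 while supplies last!"

# Crude stand-in for an extractor: a dollar sign followed by an amount.
PRICE_PATTERN = re.compile(r"\$(\d+(?:\.\d{2})?)")


def price_from_markup(xml_text):
    """Structured case: the markup says exactly which value is the price."""
    return float(ET.fromstring(xml_text).findtext("price"))


def price_from_text(text):
    """Unstructured case: guess the price from a pattern; None if nothing matches."""
    match = PRICE_PATTERN.search(text)
    return float(match.group(1)) if match else None


print(price_from_markup(STRUCTURED))
print(price_from_text(UNSTRUCTURED))
```

The markup version keeps working if the seller rewrites the advertising copy; the pattern version breaks as soon as the text mentions a second dollar amount.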


What kind of language is XSLT? --- http://www-106.ibm.com/developerworks/library/x-xslt/?dwzone=x 

An analysis and overview by Michael H. Kay (mhkay@iclway.co.uk)

February 2001

What kind of a language is XSLT, what is it for, and why was it designed the way it is? These questions get many different answers, and beginners are often confused because the language is so different from anything they are used to. This article tries to put XSLT in context. Without trying to teach you to write XSLT style sheets, it explains where the language comes from, what it's good at, and why you should use it.

I originally wrote this article to provide the necessary background for a technical article about Saxon, intended to provide insights into the implementation techniques used in a typical XSLT processor, and therefore to help users maximize the performance of their style sheets. But the editorial team at developerWorks persuaded me that this introduction would be interesting to a much wider audience, and that it was worth publishing separately as a free-standing description of the XSLT language.

What is XSLT?

The XSLT language was defined by the World Wide Web Consortium (W3C), and version 1.0 of the language was published as a Recommendation on November 16, 1999 (see Resources). I have provided a comprehensive specification and user guide for the language in my book XSLT Programmers' Reference and I don't intend to cover the same ground in this paper. Rather, the aim is simply to give an understanding of where XSLT fits in to the grand scheme of things.

The role of XSLT

XSLT has its origins in the aspiration to separate information content from presentation on the Web. HTML, as originally defined, achieved a degree of device independence by defining presentation in terms of abstractions such as paragraphs, emphasis, and numbered lists. As the Web became more commercial, publishers wanted the same control over quality of output that they had with the printed medium. This gradually led to an increasing use of concrete presentation controls such as explicit fonts and absolute positioning of material on the page. The unfortunate but entirely predictable side effect was that it became increasingly difficult to deliver the same content to alternative devices such as digital TV sets and WAP phones (repurposing in the jargon of the publishing trade).

Drawing on experience with SGML in the print publishing world, XML was defined early in 1998 as a markup language to represent structured content independent of its presentation. Unlike HTML, which uses a fixed set of concepts (such as paragraphs, lists, and tables), the tags used in XML markup are entirely user defined, and the intention is that they should relate to objects in the domain of interest (such as people, places, prices, and dates). Whereas the elements in HTML are essentially typographic (albeit at a level of abstraction), the aim with XML is that the elements should describe real-world objects. For example, Listing 1 shows an XML document representing the results of a soccer tournament.
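Listing 1 of the article is not reproduced in this excerpt. As a rough stand-in, the Python sketch below embeds a hypothetical tournament document whose tags name domain objects (matches, teams, dates) rather than typographic ones, and then reads it back. The element names, team names, and scores are invented for illustration and are not taken from Kay's listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every tag describes something in the soccer domain.
RESULTS = """
<results group="A">
  <match>
    <date>2001-06-10</date>
    <team score="2">Reds</team>
    <team score="1">Blues</team>
  </match>
  <match>
    <date>2001-06-11</date>
    <team score="0">Greens</team>
    <team score="0">Golds</team>
  </match>
</results>
"""

root = ET.fromstring(RESULTS)

# Presentation is decided here, by the consumer, not by the markup.
summary = []
for match in root.findall("match"):
    first, second = match.findall("team")
    summary.append(
        f"{first.text} {first.get('score')} - {second.get('score')} {second.text}"
    )

print("\n".join(summary))
```

Nothing in the document says how a result should look on a page; turning the same data into HTML, speech, or a WAP card is the job the article assigns to XSLT.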


The Semantic Web

Innovation of the Future --- The Semantic Web

A message from Sean Palmer

The debate on the meaning of the term is intense given the importance and source of such assertions as "the semantic web is the future of the www" [...] Requirements such as "describing every possible aspect of your data" are impossible to meet and will not be considered seriously.

You think? That is the "aim" of the SW: where everything is so connected that proof validation becomes possible. Describing every possible aspect of your data will become possible (although according to Tim Berners Lee, proof generation will not be a required part of the SW).

The nebulousness of the term "semantic web" leaves a lot to be desired if one is trying to write hard requirements for scheduled projects. A precise description of a meta-data service based on sharing RDF documents has a better chance of being taken seriously.

Are you saying the SW isn't being taken seriously? If so, I could cite all RDF/DC/W3C meta-data efforts as proof to the contrary! RDF etc. are all integral parts of the application and architecture side of the SW. They are not an end to the means, but they are a good step in the right direction. Could you give an example of a meta-data based service for me? I agree that meta-data based services will be easier to explain/describe etc. than the SW as an entity in itself; imagine trying to concisely explain the WWW. That is in essence what we are trying to do: the SW is evolving from the WWW!

Kindest Regards, Sean B. Palmer
http://xhtml.waptechinfo.com/swr/
 
"Perhaps, but let's not get bogged down in semantics."

"The Semantic Web,"  Seth Grimes, Intelligent Enterprise, March 28, 2002, pp. 16-17 --- http://knowledgemanagement.ittoolbox.com/documents/document.asp?i=1579&r=default.asp 

Tim Berners-Lee invented the World Wide Web in 1989 by creating a language for presenting and linking content, an information-interchange protocol, and basic client/server software. By 1994, the year he founded the World Wide Web Consortium (W3C) at the Massachusetts Institute of Technology, his attention had turned to embedding meaning in his creation.

This interest was long before the birth of well-loved sites (depending on your interests) such as SlashDot, The Onion, and Gap.com. Applying business intelligence (BI) and knowledge management (KM) jargon, these kinds of sites are "stovepipe" systems, designed and used primarily for a single purpose, whether furthering technology development, commentary, or mass-market retail. Regardless of their purposes, these isolated systems constitute "islands of information." Much of the work in the BI and KM worlds is to build bridges among such islands. The thought is that a whole is greater than the sum of its parts, that acquiring new, varied data sources will expand the scope of analyses and strengthen their reliability. Fully integrating, rather than just occasionally drawing from those sources and finding relationships in the integrated whole, can bring additional rewards.

Building Bridges

In the BI world, developers build bridges using harmonized metadata and coordinated access methods. KM adds value by classifying and cataloging unstructured information to create analytic metadata. But, BI bridges are often one-directional and purpose-built, designed to support ad hoc queries and periodic reports, aggregation and consolidation via scorecards and portals, and electronic data interchange (EDI) extractions from specific sources to specific users. Generalized interchange, where information can flow in more than one direction and the user and source haven't negotiated protocols in advance, requires standardized, published interfaces with common metadata structures and definitions.

Interchange standards such as the extensible markup language (XML) and efforts such as the Dublin Core metadata project - which seeks to promote "the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems" - are significant advances on the generalization road.

Searching for Meaning

Internet search engines compensate for the lack of meaning in the Web by analyzing sites for keywords and concepts that match user-supplied search terms. E-commerce exchanges attempt to meet integration and automation needs in the collaborative e-business arena - procurement, supply, logistics, and so forth - by mediating transactions via XML-encoded, domain-specific vocabulary and syntax. And, the latest much-hyped Internet hot ticket, Web services, takes a similar, server-centric approach, proposing universal description, discovery, and integration (UDDI) registries based on somewhat rigid definitions of what constitutes a service.

Search engines, exchanges, registries: The Web was supposed to be a disintermediated world, free of gatekeepers, where anyone can publish - "information wants to be free" - and agents automate routine tasks. The issue with these technologies is not only control, but also how best to accommodate and exploit the distributed, diverse, and constantly evolving nature of the Web.

These lofty concerns are not priorities for companies attempting to deliver particular products in a competitive marketplace. Despite the supposed dot-com bust, entrenched brands can still emulate Amazon.com by using bold applications of technologies that circumvent conventional production and delivery methods. Businesses often seek to protect their market positions by building fortresses, isolating themselves behind the walls of anticompetitive, proprietary platforms and business practices that together create barriers against innovative rivals.

This strategy was the shape of the Windows devolution, where technologies that support efficient, automated computing were (temporarily) displaced by interfaces that require laborious mouse work. The "personal" in personal computer is about enabling mass marketing of dumb systems, where the need for intelligence is shifted to users. Our PCs process information, but they don't understand it.

Search engines help circumvent walls and overcome isolation. Keyword search was a start. The next generation of engines examined context by automated or manual examination of links to and from pages and by translating keywords and summaries into concepts. Automated classification and categorization of results is now common; emerging search technology discerns attributes in pages and matches them to profiles that describe the properties of results to target.

The semantic Web, if realized, could support all these goals and more. The aim is to enable profile-based software agents not only to search but also to act on what they find. A semantic Web would shift the information processing burden back where it belongs, to a world of automated, peer-to-peer computing where people spend more time thinking about business and life problems and less time trying to remember whether "Preferences" is under File, Edit, or View on a program's menu bar.

Visualize a closed loop networked system that classifies and categorizes via KM and data mining techniques, applies algorithms to score and rank results, and derives and executes business rules. Achieving this ambitious synthesis will require creation of multidisciplinary standards that are independent of business domain and implementation technology. The W3C has been guiding this effort.

Continued at http://www.iemagazine.com/020328/506decision1_1.shtml 

Note that Tim Berners-Lee was the lead developer of HTML and the Web.

"The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities," by Tim Berners-Lee, James Hendler and Ora Lassila, Scientific American --- http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html 

The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. Such an agent coming to the clinic's Web page will know not just that the page has keywords such as "treatment, medicine, physical, therapy" (as might be encoded today) but also that Dr. Hartman works at this clinic on Mondays, Wednesdays and Fridays and that the script takes a date range in yyyy-mm-dd format and returns appointment times. And it will "know" all this without needing artificial intelligence on the scale of 2001's Hal or Star Wars's C-3PO. Instead these semantics were encoded into the Web page when the clinic's office manager (who never took Comp Sci 101) massaged it into shape using off-the-shelf software for writing Semantic Web pages along with resources listed on the Physical Therapy Association's site.

The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The first steps in weaving the Semantic Web into the structure of the existing Web are already under way. In the near future, these developments will usher in significant new functionality as machines become much better able to process and "understand" the data that they merely display at present.

The essential property of the World Wide Web is its universality. The power of a hypertext link is that "anything can link to anything." Web technology, therefore, must not discriminate between the scribbled draft and the polished performance, between commercial and academic information, or among cultures, languages, media and so on. Information varies along many axes. One of these is the difference between information produced primarily for human consumption and that produced mainly for machines. At one end of the scale we have everything from the five-second TV commercial to poetry. At the other end we have databases, programs and sensor output. To date, the Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically. The Semantic Web aims to make up for this.

Like the Internet, the Semantic Web will be as decentralized as possible. Such Web-like systems generate a lot of excitement at every level, from major corporation to individual user, and provide benefits that are hard or impossible to predict in advance. Decentralization requires compromises: the Web had to throw away the ideal of total consistency of all of its interconnections, ushering in the infamous message "Error 404: Not Found" but allowing unchecked exponential growth.

Knowledge Representation

For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. Artificial-intelligence researchers have studied such systems since long before the Web was developed. Knowledge representation, as this technology is often called, is currently in a state comparable to that of hypertext before the advent of the Web: it is clearly a good idea, and some very nice demonstrations exist, but it has not yet changed the world. It contains the seeds of important applications, but to realize its full potential it must be linked into a single global system.

Traditional knowledge-representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as "parent" or "vehicle." But central control is stifling, and increasing the size and scope of such a system rapidly becomes unmanageable.

Moreover, these systems usually carefully limit the questions that can be asked so that the computer can answer reliably— or answer at all. The problem is reminiscent of Gödel's theorem from mathematics: any system that is complex enough to be useful also encompasses unanswerable questions, much like sophisticated versions of the basic paradox "This sentence is false." To avoid such problems, traditional knowledge-representation systems generally each had their own narrow and idiosyncratic set of rules for making inferences about their data. For example, a genealogy system, acting on a database of family trees, might include the rule "a wife of an uncle is an aunt." Even if the data could be transferred from one system to another, the rules, existing in a completely different form, usually could not.

Semantic Web researchers, in contrast, accept that paradoxes and unanswerable questions are a price that must be paid to achieve versatility. We make the language for the rules as expressive as needed to allow the Web to reason as widely as desired. This philosophy is similar to that of the conventional Web: early in the Web's development, detractors pointed out that it could never be a well-organized library; without a central database and tree structure, one would never be sure of finding everything. They were right. But the expressive power of the system made vast amounts of information available, and search engines (which would have seemed quite impractical a decade ago) now produce remarkably complete indices of a lot of the material out there. The challenge of the Semantic Web, therefore, is to provide a language that expresses both data and rules for reasoning about the data and that allows rules from any existing knowledge-representation system to be exported onto the Web.
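The genealogy rule quoted above ("a wife of an uncle is an aunt") shows in miniature what "data plus rules for reasoning about the data" means. The Python sketch below stores facts as subject-predicate-object triples, loosely in the spirit of RDF, and applies that one rule. The names, the predicate vocabulary, and the function infer_aunts are hypothetical; this is not RDF syntax or any Semantic Web toolkit.

```python
# Facts as (subject, predicate, object) triples; all names are made up.
facts = {
    ("Bob", "uncle_of", "Carol"),
    ("Alice", "wife_of", "Bob"),
    ("Dave", "uncle_of", "Erin"),  # Dave has no recorded wife, so nothing is inferred
}


def infer_aunts(triples):
    """Rule: if U is an uncle of N and W is the wife of U, then W is an aunt of N."""
    inferred = set()
    for uncle, predicate, niece_or_nephew in triples:
        if predicate != "uncle_of":
            continue
        for wife, other_predicate, husband in triples:
            if other_predicate == "wife_of" and husband == uncle:
                inferred.add((wife, "aunt_of", niece_or_nephew))
    return inferred


new_facts = infer_aunts(facts)
print(new_facts)
```

Here the rule lives in Python code, which is exactly the portability problem the article describes: another system could import the triples but not the rule. The stated goal is a language in which both can be published.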


The Dark Side of XML

Now in its final review phase, the W3C's long-awaited XML Schema is under fire for being far too complex --- http://www.eweek.com/a/pcwt0104232/2710691/ 

From Internet Week on April 24, 2001

The original promise of Extensible Markup Language was to offer an easier way for companies to exchange data with customers and partners independent of their specific database platforms or architectures. But the proliferation of industry-specific XML dialects may create more confusion than communication as IT managers try to build B2B systems.

The problem with so many schema popping up so quickly is that there's too much variation in how they define common data fields such as "company name" or "address." Many have overlapping purposes, or are aimed at overly narrow functions. That makes it hard for any of them to gain critical mass.

"Most of these schema will be dead in two years," says Gartner Group analyst Rita Knox, noting that the specificity of first-generation XML schema is exactly what will spell their demise. "It's like giving a person 100 sentences to use for the day. That's not really how you communicate."

The best approach for companies is to stay pragmatic when evaluating the business benefits of industry-specific XML dialects. Sometimes, the flexibility of a company's XML middleware may be more important than any individual schema, because it's middleware that ultimately lets companies quickly translate and implement any given flavor of XML.
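The translation role assigned to middleware above can be illustrated with a very small sketch. Two made-up dialects name the same fields differently, and a mapping table converts one into the other. The tag names, the FIELD_MAP table, and the translate function are assumptions for illustration only; production middleware of the period would more likely use XSLT or a vendor mapping tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from dialect A field names to dialect B field names.
FIELD_MAP = {"CompanyName": "OrgName", "Addr": "StreetAddress"}

# A partner record expressed in hypothetical dialect A.
DIALECT_A = "<Partner><CompanyName>Acme</CompanyName><Addr>1 Main St</Addr></Partner>"


def translate(xml_text, field_map, new_root):
    """Copy each child element, renaming its tag when the map has an entry."""
    source = ET.fromstring(xml_text)
    target = ET.Element(new_root)
    for child in source:
        tag = field_map.get(child.tag, child.tag)  # unmapped tags pass through unchanged
        ET.SubElement(target, tag).text = child.text
    return ET.tostring(target, encoding="unicode")


translated = translate(DIALECT_A, FIELD_MAP, "Organization")
print(translated)
```

Supporting another dialect then means adding another mapping table rather than rewriting the application, which is the flexibility the article values over any individual schema.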

Read on: http://update.internetweek.com/cgi-bin4/flo?y=eDV60Bdl6n0V30NFW0AP 

Microsoft, Hyperion team up on open XML spec:  The companies expect the new specification, known as XML for Analysis, to ensure the interoperability of analytic applications across different platforms --- http://www.eweek.com/a/pcwt0104245/2711446/ 

To help users navigate through a sea of confusing XML-related specs and standards, RosettaNet is trying to start a powerful new trend: convergence --- http://www.eweek.com/a/pcwt0104253/2712187/ 


IETF  "standards"

Deep Web forwarded by The Webmonkey on May 18, 2001

Apparently search engines reach less than one percent of the sites out there. So how do we get to the other 99+ percent known as The Deep Web? It helps when the sites want to be found and we know where to look.

"How to go about digging deeper on the Web," by Jackie Loohauis,  JS Online, May 12, 2001 --- http://www.jsonline.com/enter/netlife/may01/deepweb13051101.asp 

Traditional search engines have access to only a fraction of 1% of what exists on the Web, according to BrightPlanet, an Internet search company, noting that as many as 550 billion pieces of content are hidden from most search engine scrutiny. These documents make up what is known as "The Deep Web."

Undercover and undercovered, the vast reservoir of the Deep Web is estimated to be 500 times larger than the "surface" World Wide Web. And, according to BrightPlanet, the Deep Web is the largest growing category of new information on the Net.

"There's a huge amount of information you can't find entirely or easily via a search engine," says Net search guru Gary Price, a librarian at George Washington University, and co-author of the upcoming book "The Invisible Web" (CyberAge Books, $29.95). "The material on the Web is unorganized, very ephemeral. There's no rhyme or reason, no language control. The Web is a huge directory that's very hard to get at."

What's hidden? What makes up the depths of The Deep Web? The biggest part of this invisible Web is information stored in databases - massive libraries of Web content unsearchable through such tools as Yahoo! and Google. You have to know they exist before you can search them.

Such a database would be the Government Printing Office listings at www.access.gpo.gov/su_docs/aces/aaces002.html . There are thousands more.

Other aspects of the Net remain hidden in deep waters, too.

"There are tons of things out there," says Tara Calishain of Researchbuzz.com, an online Internet guide. "Pay content sources, lots of genealogy sources. The Library of Congress ( www.loc.gov ) has fabulous collections you can't find on Alta Vista."

Several types of information are most elusive for search engines - bibliographies, multimedia files, information that comes in .pdf files (Adobe's portable document format). "News is dreadful," says Calishain. "Search engines don't cover it. It's tough to find breaking news."

Some sites, such as Amazon.com have sections so far from the surface of their home pages that they, too, can be classified as Deep Web, says David Crane, a spokesman for search engine Google ( www.google.com ). An example, says Crane, is "the section that specifically offers a 'portable compact disc player by Sony.'"

But the deepest Deep Web drop-off is in the category of government, and it's getting deeper.

"More and more city and county governments are putting their offerings on the Web. The State of Pennsylvania has a new crime reporting database ( www.ucr.psp.state.pa.us/UCR/ComMain.asp  ), and more and more of that kind of thing is coming up now," says Calishain.

. . .

Two groups of Web experts are also making it their business to provide searchers with information on Deep Web sources.

Calishain's Researchbuzz.com ( www.researchbuzz.com ) chronicles search engines, new data collections ("Online Legal Information in Denmark, Norway and Sweden"), browser software and other Deep Web mining tools that "a research librarian, journalist, educator and others would find helpful, from the perspective of someone who's really going to use it."

And in the early '90s at the University of Wisconsin-Madison, the Internet Scout Project ( www.scout.cs.wisc.edu  ) was started with funding from the National Science Foundation to "inform the higher education and research communities about resources on the Internet," says Scout Director Rachael Bower. The project posts detailed reports each Friday to keep searchers, including the general public, "up to speed" on Deep Web sources.

The Scout Project is driven by five editors who have spent years creating bookmarks and automatically checking changes in existing sources; there's a searchable archive of 11,500 sites available.

"We do supply Deep Web information. A lot of the things you get from Scout an Alta Vista search wouldn't get, or it's buried. Think of it as being a card catalog with information about information," says Bower. "It's one of the first attempts to get librarians to catalog Web resources. All of the editors doing the cataloging are graduate students in the subject or in library science."

 

"RTFM: A Guide to Online Research,"  by Steve Champeon, Webmonkey, February 23, 2001 --- http://hotwired.lycos.com/webmonkey/00/08/index2a.html 

 

So, then, why do so many feel the need to ignore the vast resources available to them, publicly and repeatedly offer up disinformation, and generally offend the basic tenets of the liberal arts education? What can be done to help these people, so obviously confused by their encounter with a badly constructed tutorial, or ruined by unmonitored self-study? I mulled the problem over a strong cup of Kenya AA and suddenly struck my fist into my palm, shouting, "Eureka! We must introduce them to the primary sources!"

 

One of the great things about being a Web designer or developer is that you have access to an enormous collection of tutorials, documentation, specifications, and related materials, no matter what part of the Web you work with.

 

. . . 

The IETF produces several different kinds of "standards":

 

. . .

Armed with this knowledge, go forth and uphold the social contract of the Internet: "Be conservative in what you do, and liberal in what you accept from others." The conservatism is the natural result of having a good reference library at your fingertips. The liberalism extends only so far and doesn't include accepting an uninformed line of bull from somebody on a mailing list.


Update on July 16, 2001

"Two Standards Bolster XML" by Jim Rapoza, eLabs Report, July 16, 2001 

As XML has progressed as the key language for e-business communication, many of the recent standards associated with the markup language have improved its portability and its ability to integrate with various systems. This is again the case with the two newest XML standards released by the World Wide Web Consortium: XLink and XML Base.

Although in most ways XML is much more powerful than HTML, HTML has had XML beat when it comes to linking and portability. XLink and XML Base, which were released as W3C recommendations late last month, both seek to fix this weakness in XML, essentially giving it many of the same capabilities as HTML, plus a few improvements.

In its most basic form, XLink gives XML documents the same capabilities that URIs (Uniform Resource Identifiers) have given HTML documents, essentially the ability to define hyperlinks internally in a document. However, XLink goes much further by allowing what are called extended links. Extended links can include multiple resources or can point to rich information about links.

To use XLink in a document, an author must define the XLink namespace (by entering xmlns:xlink="http://www.w3.org/1999/xlink"). The author can then use XLink tags within the document to define links.

XML Base is even simpler in its scope—so simple that it's somewhat of a surprise that it hasn't been included before. Essentially, as in HTML, XML Base lets a document author define a base URI for a document that can then be used as a relative reference for other links within the document.

An example of using XML Base with XLink would be something like the following. (However, a coder would need to replace every instance of "[" in the code below with an open angle bracket. We replaced every open angle bracket in order to make the code appear as text in this HTML newsletter.)

In this example, the XML Base of "eweek.com/labs" can be referenced by the review xlink at the bottom, meaning the link in full would be http://eweek.com/labs/review.xml .
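A rough sketch of the same idea (not the article's own listing), in Python: the embedded document declares an xml:base and one simple XLink, and the relative link is resolved against the base. The element names `labs` and `review` are assumptions made for illustration, the base is written with a trailing slash so that relative resolution keeps the "labs" path segment, and only an xml:base on the root element is honored.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:base
XLINK_NS = "http://www.w3.org/1999/xlink"         # namespace of xlink:href

# Hypothetical document: element names are illustrative only.
DOC = """<labs xmlns:xlink="http://www.w3.org/1999/xlink"
      xml:base="http://eweek.com/labs/">
  <review xlink:type="simple" xlink:href="review.xml">Review</review>
</labs>"""


def resolve_links(xml_text):
    """Return every xlink:href in the document, resolved against the root xml:base."""
    root = ET.fromstring(xml_text)
    base = root.get(f"{{{XML_NS}}}base", "")
    links = []
    for element in root.iter():
        href = element.get(f"{{{XLINK_NS}}}href")
        if href is not None:
            links.append(urljoin(base, href))
    return links


print(resolve_links(DOC))  # ['http://eweek.com/labs/review.xml']
```

A fuller implementation would also honor xml:base attributes on nested elements, which this sketch ignores.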

For more information on XML Base and XLink, click on these links: www.w3.org/TR/2001/REC-xlink-20010627/ 
www.w3.org/TR/2001/REC-xmlbase-20010627/
 


From XML Report on July 18, 2001

It may not give Microsoft's Internet Explorer a run for its money, but W3C has released a free downloadable version of Amaya 5.0, its Web browser and authoring tool.

For Web browsing, W3C claims Amaya offers an interface "similar to that of the most popular commercial browsers" (read IE). But if Amaya doesn't score a big hit with the user community, it does offer developers an easy way to generate either XHTML or HTML pages, along with CSS style sheets, and MathML expressions. The new version also offers enhanced support for SVG, which was lacking in earlier versions.

For teamwork, it also includes an application for collaborative annotation based on RDF, XLink, and XPointer. Further information on the annotation capabilities is available at the Annotea project home page http://www.w3c.org/2001/Annotea .

Versions are available for Solaris 8, Linux, and lest we forget, Microsoft Windows 2000, NT, 95 and 98. Amaya is an open source project, with binaries by operating system, and source code available at http://www.w3.org/Amaya/ .

For more on Tools, go to: http://www.adtmag.com/section.asp?section=tools .


"More XML Fundamentals," by Michael S. Dougherty, DB2 Magazine, Quarter 3, 2001 --- http://www.db2mag.com/db_area/archives/2001/q3/webdev.shtml 

Before you can begin to develop with XML, you've got to know the basics.

When determining how to implement XML with a database, one of the first points to consider is whether the XML implementation will encompass data storage or the overall design of Web pages or will be used primarily for document management. This question is important because using XML with document management is very different from using XML with data storage and retrieval. XML handles document management with the Document Object Model (DOM) using a native XML database (one designed specifically for XML storage) or a content management system (an application designed to manage documents that is built with native XML). DOM is an API for accessing document content, typically within a Web browser, that exposes information about document structure. DOM allows the developer to dynamically access and update the content, structure, and style of documents. Using the DOM is excellent for document management, but often is not necessary for data management. In keeping with the first installment of this article, we shall focus on data management.
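To make the DOM description concrete, here is a minimal sketch using Python's standard xml.dom.minidom module (one DOM implementation among many; the article does not name one). The contact document, the function name `update_contact`, and the sample values are all assumptions for illustration.

```python
from xml.dom.minidom import parseString


def update_contact(xml_text, new_name, phone):
    """Read, modify, and extend a small document through the DOM API."""
    doc = parseString(xml_text)

    # Access content: the first <name> element and its text node.
    name = doc.getElementsByTagName("name")[0]
    # Update content: overwrite the text node's data.
    name.firstChild.data = new_name

    # Update structure: create a new <phone> element and append it.
    phone_element = doc.createElement("phone")
    phone_element.appendChild(doc.createTextNode(phone))
    doc.documentElement.appendChild(phone_element)

    return doc.documentElement.toxml()


SAMPLE = "<contact><name>Ann</name><email>ann@example.com</email></contact>"
print(update_contact(SAMPLE, "Ann Lee", "555-0100"))
```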

XML has the advantage of being able to change data types when transferring data from one source to another. However, as the technology develops, XML databases are adopting more traditional database characteristics and traditional databases have gained XML database compatibility, so the approaches are starting to overlap.

In the last article, the author described how to use XML to connect to DB2 UDB. Because DB2 is a relational database, the most common connection mechanisms include Microsoft's Open Database Connectivity (ODBC), Sun's Java Database Connectivity (JDBC), and newer hybrids such as Object Linking and Embedding, Database (OLE DB). The main liability of using these class libraries, as well as those that access native database drivers, is that they are too complex for standard XML use.

Currently, XML interfaces provide the best support for update, delete, insert, and query messages. The interface for handling multiple objects in the database will not be much more complex. Therefore, the XML classes in the development environment provide simple functionality, and may not be sufficient for the types of connectivity requirements of some applications.

To close this gap, a persistent layer to access the database is recommended to allow the programmers to focus on programming and the database administrators to focus on managing the database. A persistent layer should encapsulate the permanent storage mechanisms, separating the programmer from changes. If a table is updated or modified, the XML code should not make direct updates; instead, updates should be made through a data dictionary, which provides the information needed to map objects to tables. Using this method, only the data dictionary changes, and the dictionary is much simpler and safer to update.
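The data dictionary idea can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: the dictionary layout, the function name `build_insert`, and the table and column names are hypothetical. The point is that application code only ever names XML elements; if a column is renamed, only the dictionary entry changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical data dictionary:
# root element name -> (table name, {child element name -> column name})
DATA_DICTIONARY = {
    "contact": ("ONLINE_CONTACT", {"name": "CONTACT_NAME", "email": "EMAIL"}),
}


def build_insert(xml_text, dictionary=DATA_DICTIONARY):
    """Translate an XML record into a parameterized INSERT via the dictionary."""
    root = ET.fromstring(xml_text)
    table, columns = dictionary[root.tag]
    column_names = list(columns.values())
    values = [root.findtext(child) for child in columns]
    placeholders = ", ".join("?" for _ in column_names)
    sql = f"INSERT INTO {table} ({', '.join(column_names)}) VALUES ({placeholders})"
    return sql, values


RECORD = "<contact><name>Ann</name><email>ann@example.com</email></contact>"
print(build_insert(RECORD))
```

The returned statement and parameter list could be handed to any database driver that accepts "?" placeholders; the placeholder style is itself an assumption and varies by driver.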

USING XML WITH DATABASES

The primary use of relational databases like DB2 UDB 7.2 with regard to XML is to integrate XML styles, tables, and object mapping to the dynamic appearance of Web pages. When mapping data with XML and relational databases, you can choose from several options. Remember that XML is basically a hybrid similar to object databases in data modeling, so it can represent data from an RDBMS adequately.

There are plenty of software products that effectively and automatically map XML objects and classes to relational database tables and directly into XML databases. Database vendors such as IBM, Microsoft, Oracle, and Sybase have developed tools to assist in converting XML documents into relational tables. (You'll find an extensive list of available products at www.rpbourret.com/xml/XMLDatabaseProds.htm.)

For DB2, IBM offers the DB2 XML Extender, which lets you store XML documents either as binary large object (BLOB)-like objects or as decomposed data in a set of tables. The latter transformation, known as XML collection, is defined in XML 1.0 syntax.

The IBM Extender provides two methods for using XML:

1. XML columns, which store entire XML documents as DB2 column data.

2. XML collections, which compose or decompose XML documents from or into a collection of relational tables.
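The difference between the two methods can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in database; it does not use DB2 or the XML Extender, and the table names, column names, and function names are assumptions. The first function stores the whole document in one column; the other two decompose a document into a relational row and compose it back.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = "<contact><name>Ann</name><email>ann@example.com</email></contact>"


def store_as_column(conn, xml_text):
    """'XML column' style: keep the entire document as one column value."""
    conn.execute("CREATE TABLE IF NOT EXISTS contact_docs (doc TEXT)")
    conn.execute("INSERT INTO contact_docs (doc) VALUES (?)", (xml_text,))


def decompose(conn, xml_text):
    """'XML collection' style: split the document into relational columns."""
    root = ET.fromstring(xml_text)
    conn.execute("CREATE TABLE IF NOT EXISTS online_contact (name TEXT, email TEXT)")
    conn.execute(
        "INSERT INTO online_contact (name, email) VALUES (?, ?)",
        (root.findtext("name"), root.findtext("email")),
    )


def compose(conn):
    """'XML collection' style in reverse: rebuild documents from rows."""
    documents = []
    for name, email in conn.execute("SELECT name, email FROM online_contact"):
        root = ET.Element("contact")
        ET.SubElement(root, "name").text = name
        ET.SubElement(root, "email").text = email
        documents.append(ET.tostring(root, encoding="unicode"))
    return documents


conn = sqlite3.connect(":memory:")
store_as_column(conn, SAMPLE)
decompose(conn, SAMPLE)
print(compose(conn))
```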

DTDs are stored in a DB2 table called DTD_REF. Each DTD in the DTD_REF table has a unique ID. The mapping between the database tables and the structure of the XML document is defined using the Document Access Definition (DAD) file. The DAD references a processed DTD, thus providing the connection between an XML document, its DTD, and DB2 database tables. A short DAD example for an online contact definition follows:


   <?xml version="1.0"?>
   <!DOCTYPE DAD SYSTEM "exampledad.dtd">
   <DAD>
     <dtdid>CONTACTS.DTD</dtdid>
     <validation>YES</validation>
     <Contact_List>
       <prolog>?xml version="1.0"?</prolog>
       <doctype>!DOCTYPE CONTACT CONTACT.DTD </doctype>
       <root_node>
         <element_node name="CONTACT">
           <RDB_node>
             <table name="ONLINE_CONTACT"/>
             <table name="REFERENCES" key="ID"/>
           </RDB_node>
         </element_node>
       </root_node>
     </Contact_List>
   </DAD>

The DAD defines a mapping between XML elements and relational database columns using element_node to RDB_node associations.
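As a hedged illustration of those element_node to RDB_node associations, the sketch below reads a trimmed copy of the DAD example above (the XML declaration, DOCTYPE, prolog, and doctype lines are left out for brevity) and lists the tables attached to each element. It uses Python's standard ElementTree parser, not any DB2 tooling, and the function name `element_to_tables` is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the DAD example; declaration and DOCTYPE omitted.
DAD_TEXT = """<DAD>
  <dtdid>CONTACTS.DTD</dtdid>
  <validation>YES</validation>
  <Contact_List>
    <root_node>
      <element_node name="CONTACT">
        <RDB_node>
          <table name="ONLINE_CONTACT"/>
          <table name="REFERENCES" key="ID"/>
        </RDB_node>
      </element_node>
    </root_node>
  </Contact_List>
</DAD>"""


def element_to_tables(dad_text):
    """Map each element_node name to the table names in its RDB_node."""
    root = ET.fromstring(dad_text)
    mapping = {}
    for node in root.iter("element_node"):
        rdb_node = node.find("RDB_node")
        if rdb_node is None:
            continue  # element_node without a relational mapping
        mapping[node.get("name")] = [t.get("name") for t in rdb_node.findall("table")]
    return mapping


print(element_to_tables(DAD_TEXT))  # {'CONTACT': ['ONLINE_CONTACT', 'REFERENCES']}
```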

For other parts of the article, go to  http://www.db2mag.com/db_area/archives/2001/q3/webdev.shtml 


VoiceXML Developer Series: Everything you ever wanted to know about VXML, but were afraid to ask --- http://www.newmedia.com/nm-ie.asp?articleID=3094 

"VoiceXML Developer Series," by NewMedia Staff, NewMedia.com, September 26, 2001 

VoiceXML is an XML format that utilizes existing telephony technology to interact with users over the telephone through speech recognition, speech synthesis, and standard Web technologies. The first edition of the VoiceXML Developer series will provide you with a synopsis of VoiceXML and a glimpse into the technology used to develop VoiceXML applications. Subsequent editions will go into the specific details of creating VoiceXML applications.

Background

The VoiceXML 1.0 specification was released in March 2000 by the VoiceXML Forum, which was founded by technologists from Lucent, AT&T, IBM, and Motorola. The group was formed out of the need to create a unified standard for voice dialogs rather than requiring customers to learn several XML specifications that had been developed internally within each of the members' respective research labs (starting as early as 1995). Other non-founders had also experimented with voice dialog XML formats, including HP's TalkML and Sun's Java Speech Markup Language (JSML).

All of this led up to October 2000, when the VoiceXML Forum released VoiceXML 1.0 to the Voice Browser Group (founded in 1998) of the World Wide Web Consortium (W3C), the recognized standards body for the Web. This independent body has been working on the second version of the specification and has announced that it will release a revised specification sometime towards the end of 2001.

The nascent industry has grown rapidly since its millennium debut into a market that is expected to reach $200 million in 2001 and $24 billion by 2005. The industry has been driven in part by an existing marketplace that has utilized Interactive Voice Response (IVR) systems for call center automation; think "Press 1 for your account balance. Press 2 to transfer funds". You've probably used such a system to check your bank or credit card balances.

So VoiceXML fills an existing need for automation by improving upon the current technology and making it simpler to implement and integrate into the rest of the enterprise. VoiceXML also provides a new opportunity for companies that have not been able to afford the cost or complexity of an IVR system, by letting them use standard telephony components and leverage their existing Web infrastructure, applications, and developer skills.

Technologies

A VoiceXML system is made up of a VoiceXML gateway that accesses static or dynamic VoiceXML content on the Web. The gateway contains a VoiceXML browser (interpreter), Text-To-Speech (TTS), Automatic Speech Recognition (ASR), and the telephony hardware that connects to the Public Switched Telephone Network (PSTN) via a T1, POTS, or ISDN telephone connection. A Plain Old Telephone Service (POTS) line is the type that's installed in your home and can only handle a single connection, whereas a T1 contains 24 individual phone lines.
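To give a feel for the "dynamic VoiceXML content" a gateway might fetch, here is a minimal sketch of a Web-side function that generates a touch-tone menu document like the bank example mentioned earlier. It is an illustration only: the function name `build_menu` and the target file names `balance.vxml` and `transfer.vxml` are hypothetical, and the markup is limited to the vxml, menu, prompt, and choice elements.

```python
import xml.etree.ElementTree as ET


def build_menu(choices):
    """Build a VoiceXML menu document from (key, spoken label, target URL) tuples."""
    vxml = ET.Element("vxml", version="1.0")
    menu = ET.SubElement(vxml, "menu")

    # One spoken prompt that reads out every option.
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = " ".join(f"Press {key} {label}." for key, label, _ in choices)

    # One <choice> per touch-tone key, pointing at the next document.
    for key, _, target in choices:
        ET.SubElement(menu, "choice", dtmf=key, next=target)

    return ET.tostring(vxml, encoding="unicode")


# Hypothetical targets; a real application would serve these documents too.
BANK_CHOICES = [
    ("1", "for your account balance", "balance.vxml"),
    ("2", "to transfer funds", "transfer.vxml"),
]
print(build_menu(BANK_CHOICES))
```

Serving the returned string from an ordinary Web application is what lets a company reuse its existing Web infrastructure for voice dialogs.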

 
