Visualization of Multidimensional Data (A Preliminary Working Draft)
Bob Jensen at Trinity University
 


IBM's Website for Data Visualization --- http://services.alphaworks.ibm.com/manyeyes/app 
IBM's site lets people collaborate to creatively visualize and discuss data on fast food, Jesus' apostles, greenhouse-gas trends, and more.

"Sharing Data Visualization," by Kate Greene, MIT's Technology Review, April 11, 2007 --- http://www.technologyreview.com/Infotech/18516/ 

IBM is showing that there's more to the social Internet than just sharing pictures and video clips. The company has launched a new website, called Many Eyes, with the hope of adding a social aspect to data visualizations like maps, network diagrams, and scatter plots. The site's users already include Christian bloggers, nutritionists, and professors.

Many Eyes teaches people how to build their own visualizations (a simple tutorial can be found here) so that they can dive into complex, multidimensional data. Since its launch in January, the site has amassed nearly 2,000 visualizations that illustrate, for example, the carbon emission of cars and the nutritional information of food on a McDonald's menu. For example, by illustrating numbers graphically, users see how Big Macs compare with double cheeseburgers in terms of calories, fat, and sodium--differences that might be harder to spot on a chart of numbers.

Many Eyes was developed by Martin Wattenberg and Fernanda Viegas, researchers at IBM's Visual Communication Lab, in Cambridge, MA. To be sure, Many Eyes is not the first, or even the most powerful, data-visualization tool available. Spotfire, for instance, is well-known software that businesses use to visualize and analyze trends. But what makes Many Eyes novel is that it's explicitly designed to be a social site for sharing visualizations and analysis; it's essentially the Flickr of data plots.

While the field of data visualization in general isn't new, it has seen a sort of rebirth in the past few years thanks to the availability of software tools that explore data sets, as well as the ubiquity of data sets themselves, says Ben Shneiderman, a professor of computer science at the University of Maryland, in College Park. "It's one of those things that after 15 years, it's an overnight success." Recently, Shneiderman says, data visualizations have gone from static charts commonly used in PowerPoint presentations to dynamic displays of multidimensional data. "Suddenly," he says, "we've been given a new eye to see things that we've never seen before."

The IBM software was built using standard software architectures, says Wattenberg; the visualizations are displayed using Java, and there are a few somewhat sophisticated algorithms that crunch numbers and produce the graph layouts. Ultimately, he says, he and Viegas wanted a simple, immersive experience. "The more that it becomes almost gamelike in its level of activity, the more fun it becomes."

Within days of Many Eyes going live, the researchers saw a big spike in traffic from a user-generated visualization. A user named "crossway" had uploaded a data set of names from the New Testament and how often they occurred near one another in the text. The user chose to visualize the data using a network diagram; the result was essentially an illustration of the social network of Jesus and his apostles. Crossway posted the network diagram on his or her well-trafficked Christian blog, and soon awareness of the visualization moved from the Christian community into the technology community, thanks to an appearance on the popular blog BoingBoing.net.
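
The network diagram described above rests on one simple computation: counting how often two names appear near one another in a text. A minimal Python sketch of that counting step follows. The function name, the window size, and the sample passage are assumptions made for illustration; the article does not say how "crossway" actually built the data set.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_edges(text, names, window=10):
    """Count how often two different names fall within `window` words of each other."""
    tokens = re.findall(r"[A-Za-z]+", text)
    # Positions of every tracked name in the token stream
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in names]
    edges = Counter()
    for (i, a), (j, b) in combinations(hits, 2):
        if a != b and j - i <= window:
            # Sort the pair so (John, Peter) and (Peter, John) share one edge
            edges[tuple(sorted((a, b)))] += 1
    return edges

# HYPOTHETICAL toy passage, not the data set discussed in the article
sample = "Peter spoke with John. Later John and James walked while Peter waited."
edges = cooccurrence_edges(sample, {"Peter", "John", "James"}, window=5)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each counted pair becomes a link in the network diagram, and the count can set the thickness of the link.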

 

 


Microsoft's Shiny New Toy Photosynth is an application that's still a work in progress.
It is dazzling, but what is it for?

Jeffrey McIntyre, MIT's Technology Review, March/April 2008 --- http://www.technologyreview.com/Infotech/20203/?nlid=915&a=f

Jensen Comment
It struck me that if a company's financial report could be visualized in a photograph then Photosynth might be used to stitch various financial reports together.


Now for College Males Seeking an Unknown Roommate
How to assess the beauty of a woman's face

"Grad Student Creates a Hot-or-Not Bot:  An Israeli computer-science grad student has designed a program that judges how attractive women are," by Catherine Rampell, Chronicle of Higher Education, April 4, 2008 ---

According to Haaretz, the program identifies basic facial features that are considered beautiful. For his master’s thesis at Tel Aviv University, Amit Kagian had human participants rate the beauty of photographed faces. He then processed the photos and mathematically mapped the faces by computer, coming up with 98 numbers that represent the geometric shape of the face, hair color, smoothness of skin, facial symmetry, and other characteristics. The computer then uses these dimensions to predict how human subjects would rate other female faces.

The study only covered female faces because “there is a greater variety of positions regarding male beauty,” Haaretz said.

 

Bob Jensen's threads on mixed gender roommates in college are at http://www.trinity.edu/rjensen/HigherEdControversies.htm#DatingRoommates


Question
What does a student's blinkless stare signify?

a. Daydreaming
b. Confusion
c. Anger
d. Drug trip

"Facial-Recognition Software Could Give Valuable Feedback to Online Professors." Jeffrey R. Young, Chronicle of Higher Education, June 27, 2008 --- http://chronicle.com/wiredcampus/index.php?id=3126&utm_source=wc&utm_medium=en

Many professors who teach online complain that they have no way of seeing whether their far-away students are following the lectures — or whether the students have fallen asleep at their desks. But researchers at the University of California at San Diego say they have a solution. They recently tested a system that can detect facial expressions of online students and determine when they find the material difficult, so that cues could be sent to the professors telling them to slow down.

Jacob Whitehill, a doctoral student at the university working on the research, presented results from the experiment this week at the Intelligent Tutoring Systems 2008 conference in Montreal.

In the experiment, eight subjects were shown short video clips of lectures while a Web cam tracked their facial expressions — looking for smiles, blinks, raised eyebrows, and the like. The subjects were then asked to report how difficult they found each section, and to take a quiz on the material. Mr. Whitehill says that the system correctly detected when students were having trouble (the most reliable indicator: students blinked less when they were struggling to understand).

The system could be used to give valuable feedback to professors teaching online, says Mr. Whitehill. “It’s not going to be perfect by any means,” he says, but it’s better than no student feedback at all. “Professors say that they can’t see the students. This could do it for them automatically.”

Bob Jensen's threads on tricks and tools of the trade in education technology are at http://www.trinity.edu/rjensen/000aaa/thetools.htm


Speak to Me Only With Thine Eyes:  The Sound of Colors for the Blind
Researchers at the Balearic Islands University in Spain are developing a device that will allow blind children to distinguish colors by associating each shade to a specific sound. The project, dubbed COL-diesis, is based on the synesthesia principle--a confusion of senses where people involuntarily relate the real information gathered by one sense with a different sensation. "Only 4 percent of the population are true synesthetes, but everybody else is influenced by associations between sounds and colors," said Jessica Rossi, one of the coordinators of the project. For example, people tend to associate light colors with high-pitched sounds. "We want to give the user a device that allows [blind children] to choose specific associations of colors and sounds based on each user's sensitivity," Rossi said. The device will include a sensor the blind kids will wear on their fingertips to touch the objects they want to know the colors of, and a bracelet that will transform the color into a sound. The researchers expect to have their prototype ready by September.
Maria José Viñas, Chronicle of Higher Education, June 23, 2008 --- http://chronicle.com/wiredcampus/index.php?id=3109&utm_source=wc&utm_medium=en
Jensen Question
Do we need multiple sounds for some colors? For example, there's Wall Street green, Al Gore's green, vegetable green, freshman green, and seasick green.

Bob Jensen's threads on technology aids for handicapped learners are at http://www.trinity.edu/rjensen/000aaa/thetools.htm#Handicapped

Jensen Comment for Accountants
Proposed (actually now optional) fair value financial statements have so many shades of accuracy regarding measurements of financial items. Cash counts are highly accurate along with cash received from sales of financial instruments. Unrealized earnings on actively traded bonds and stocks are quite accurate according to FAS 157. Value estimates of interest rate swaps may be inaccurate but inaccuracy doesn't matter much since these value changes will all wash out to zero when the swaps mature. Color them blah. Value estimates of most anything highly unique, like parcels of real estate, are highly subjective and prone to fraud among appraisal sharks. Color them scarlet!

Our Students Might Actually Like Color Book Accounting
Could we add information to fair value financial statements by colorizing them according to degrees of uncertainty and accuracy? And could we add sounds of uncertainty so that wearers of SEC-recommended bracelets could listen to the soothing waltzes of Strauss (read that cash) and the rancorous hard-rock sounds of shares in a REIT? What sounds and colors might you give to FIN 41 items, Amy?

Bob Jensen's threads on visualization of multivariate data are shown below.
I think the tidbits below are interesting, but I never get any feedback about these tidbits.
There are all sorts of research opportunities in visualization of multivariate fair value financial performance!

Bob Jensen's threads on alternative valuations in accounting are at http://www.trinity.edu/rjensen/theory01.htm#UnderlyingBases


Question
What new technology reads emotions in faces?

A demonstration version of the face detection and analysis software package is available for download at: http://www.iis.fraunhofer.de/EN/bf/bv/kognitiv/biom/dd.jsp 

"Happy, sad, angry or astonished?" PhysOrg, July 3, 2007 ---

An advertisement for a new perfume is hanging in the departure lounge of an airport. Thousands of people walk past it every day. Some stop and stare in astonishment, others walk by, clearly amused. And then there are those who seem puzzled when they look at the poster.

With the help of a small video camera, the system automatically localizes the faces of everyone who walks past the advertisement. And nothing escapes its watchful eye: Does the passerby look happy, surprised, sad or even angry?

The system for rapid facial analysis is being developed by researchers at the Fraunhofer Institute for Integrated Circuits IIS in Erlangen. Highly complex algorithms immediately localize human faces in the image, differentiate between men and women and analyze their expressions.

“The special feature of our facial analysis software is that it operates in real time,” says Dr. Christian Küblbeck, project manager at the IIS. “What’s more, it is able to localize and analyze a large number of faces simultaneously.” The most important facial characteristics used by the system are the contours of the face, the eyes, the eyebrows and the nose. First of all, the system has to go through a training phase in which it is presented with huge quantities of data containing images of faces. In normal operation, the computer compares 30,000 facial characteristics with the information that it has previously learned.

“On a standard PC, the calculations are carried out so quickly that mood changes can be tracked live,” explains Küblbeck. However, we do not need to worry about an invasion of our privacy, as the software analyzes the data on a purely statistical basis.

The software package is not only of interest to advertising psychologists; there are numerous potential applications for the system. It can be used, for example, to test the user-friendliness of computer software programs. The system monitors the facial expressions of the user in order to determine which aspects of the program arouse a particularly strong reaction. Alternatively, it can assess the reactions of the users of learning software, in order to establish the extent to which they are put under stress or challenged by the task they are performing. The system could also be used to check the levels of concentration of car drivers.

A demonstration version of the face detection and analysis software package is available for download at: http://www.iis.fraunhofer.de/EN/bf/bv/kognitiv/biom/dd.jsp 



Google's Contribution to Data Visualization

June 1, 2006 message from Brown, Curtis [cbrown@trinity.edu]

I just stumbled across some very interesting tools for visualizing data that I can't resist sharing. There's a wild play-with-it-yourself tool at http://tools.google.com/gapminder/ , and some prepackaged presentations at http://www.gapminder.org

I went through the "Human Development Trends 2005" presentation at the second link above and found it fascinating and informative (and also helpful for developing a sense of the significance of the images in the do-it-yourself tool at the first link).

A minor frustration: toward the end, the presentation includes data on income and child mortality distribution within 42 different countries (it gives the income and child mortality rates of the poorest 20% of the population of the country, the next richest 20%, etc.), but it only has average data for the United States (as far as I could see). I wonder why? Anyone know how to find comparable data for the US?

Curtis

Curtis Brown
Philosophy Department
Trinity University
One Trinity Place
San Antonio, TX 78212

 


Do you suppose we could also add CEO emotions to annual reports?
Or maybe this is the dawn of emotional corporate logos!

"The New Face of Emoticons:  Warping photos could help text-based communications become more expressive," by Duncan Graham-Rowe,  MIT's Technology Review, March 27, 2007 --- http://www.technologyreview.com/Infotech/18438/

Computer scientists at the University of Pittsburgh have developed a way to make e-mails, instant messaging, and texts just a bit more personalized. Their software will allow people to use images of their own faces instead of the more traditional emoticons to communicate their mood. By automatically warping their facial features, people can use a photo to depict any one of a range of different animated emotional expressions, such as happy, sad, angry, or surprised.

All that is needed is a single photo of the person, preferably with a neutral expression, says Xin Li, who developed the system, called Face Alive Icons. "The user can upload the image from their camera phone," he says. Then, by keying in familiar text symbols, such as ":)" for a smile, the user automatically contorts the face to reflect his or her desired expression.

"Already, people use avatars on message boards and in other settings," says Sheryl Brahnam, an assistant professor of computer information systems at Missouri State University, in Springfield. In many respects, she says, this system bridges the gap between emoticons and avatars.

This is not the first time that someone has tried to use photos in this way, says Li, who now works for Google in New York City. "But the traditional approach is to just send the image itself," he says. "The problem is, the size will be too big, particularly for low-bandwidth applications like PDAs and cell phones." Other approaches involve having to capture a different photo of the person for each unique emoticon, which only further increases the demand for bandwidth.

Li's solution is not to send the picture each time it is used, but to store a profile of the face on the recipient device. This profile consists of a decomposition of the original photo. Every time the user sends an emoticon, the face is reassembled on the recipient's device in such a way as to show the appropriate expression.

To make this possible, Li first created generic computational models for each type of expression. Working with Shi-Kuo Chang, a professor of computer science at the University of Pittsburgh, and Chieh-Chih Chang, at the Industrial Technology Research Institute, in Taiwan, Li created the models using a learning program to analyze the expressions in a database of facial expressions and extract features unique to each expression. Each of the resulting models acts like a set of instructions telling the program how to warp, or animate, a neutral face into each particular expression.

Once the photo has been captured, the user has to click on key areas to help the program identify key features of the face. The program can then decompose the image into sets of features that change and those that will remain unaffected by the warping process.

Finally, these "pieces" make up a profile that, although it has to be sent to each of a user's contacts, must only be sent once. This approach means that an unlimited number of expressions can be added to the system without increasing the file size or requiring any additional pictures to be taken.

Li says that preliminary evaluations carried out on eight subjects viewing hundreds of faces showed that the warped expressions are easily identifiable. The results of the evaluations are published in the current edition of the Journal of Visual Languages and Computing.

Continued in article

Bob Jensen's threads on visualization of multivariate data are at
http://www.trinity.edu/rjensen/352wpvisual/000datavisualization.htm 


Software that recognizes faces on your photographs
(after some training as to what face goes with what person)

"Filing Photos by Face," by Leslie Walker, The Washington Post, February 8, 2006 --- http://snipurl.com/WPFeb8

One of the best afternoon demos came from Riya, a company using face recognition and automated text-reading techniques to classify people's digital photo collections.

Its software uses image-analysis to index or "tag" photos on the fly. It tries to recognize faces and automatically label them as, say, your Uncle Rupert. Riya's software also reads text inside images, like any signs or words that appear on computer screens.

Riya chief executive Munjal Shah showed the audience how people can manually train Riya to recognize faces by uploading photos of that person to Riya's Web site and providing their name.

In the demo, Riya scanned his laptop to search for faces matching ones he'd uploaded of his son -- it even found one photo of Shah in which a framed photo of his son hung behind him on the wall.

Riya's service resides on the Web, which I gather means you have to upload your photos to a Flickr-like Web site in order for it to analyze your photos. The service is in private testing now, but will open for public testing in two weeks, Shah said.

The Riya home page is at http://www.riya.com/

Jensen Comment
This reminds me of mainframe computer software that I used to use to make Chernoff Faces from multivariate data having up to 18 variables. Professor Chernoff was a former professor of mine who gave me his mainframe computer program. One of the problems was subjectivity in clustering "similar faces." It is possible these days to make real faces rather than cartoon faces from multivariate data. I wonder if Riya software could be adapted to cluster similar faces?
 

You can scroll down this document to see examples of my Chernoff faces.
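
For readers who have not met Chernoff faces, the idea can be sketched in a few lines of Python: each variable in an observation is rescaled and assigned to one facial feature, so that similar observations yield similar faces. This is only an illustrative sketch. The four feature names, their ranges, and the sample firms are hypothetical and are not taken from Professor Chernoff's mainframe program, which handled up to 18 variables.

```python
# HYPOTHETICAL facial features and drawing ranges, chosen only for illustration
FEATURES = {
    "face_width": (0.6, 1.0),
    "eye_size": (0.05, 0.20),
    "mouth_curve": (-1.0, 1.0),   # -1 = frown, +1 = smile
    "brow_slant": (-0.5, 0.5),
}

def scale(value, lo, hi, out_lo, out_hi):
    """Linearly rescale value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return (out_lo + out_hi) / 2   # constant variable: use the midpoint
    t = (value - lo) / (hi - lo)
    return out_lo + t * (out_hi - out_lo)

def chernoff_parameters(rows, variables):
    """Assign each variable to one facial feature, one face per observation."""
    feature_names = list(FEATURES)
    if len(variables) > len(feature_names):
        raise ValueError("more variables than facial features")
    # Observed minimum and maximum of each variable across all observations
    ranges = {v: (min(r[v] for r in rows), max(r[v] for r in rows))
              for v in variables}
    faces = []
    for r in rows:
        face = {}
        for v, feat in zip(variables, feature_names):
            lo, hi = ranges[v]
            face[feat] = round(scale(r[v], lo, hi, *FEATURES[feat]), 3)
        faces.append(face)
    return faces

# HYPOTHETICAL firms with made-up measures
firms = [
    {"liquidity": 1, "leverage": 2, "profit": 5, "growth": -1},
    {"liquidity": 2, "leverage": 6, "profit": 15, "growth": 1},
    {"liquidity": 3, "leverage": 4, "profit": 25, "growth": 3},
]
faces = chernoff_parameters(firms, ["liquidity", "leverage", "profit", "growth"])
for face in faces:
    print(face)
```

A drawing routine would then turn each dictionary into a cartoon face. The subjectivity problem mentioned above arises afterward, when a viewer must judge which faces look alike.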


January 16, 2005 message from my graduate assistant

Dr. Jensen, 

I searched for some software to graph multivariate and multidimensional data, and while a lot of them cost a good sum of money or required the use of Linux or Unix OS, I found a couple that could perhaps be useful and are free to the public domain. If you want to check them out and let me know what you think, they are:

*Xgobi: http://www.research.att.com/areas/stat/xgobi/  
(by its description, looks like this program could do a lot, although I haven't downloaded it yet since its instructions are a handful)

*Vista: http://forrest.psych.unc.edu/research/  
(says it can be used in conjunction with Excel, which would be the best of both worlds)

Chris


An important observation by Phillip Long:

Why does this matter? Because we are asking our students to learn more and more from a monitor. Getting clear thoughts across on the printed page has always been a challenge. Doing it with a computer is harder, even with the unique attributes it has over the static page. But clear thinking visually is not just good teaching, it can be a matter of life and death.

The Challenger disaster, for instance, could have been avoided if the visual representation of quantitative data had been clear. The engineers knew there was a problem nearly 12 hours before the launch and voted to postpone it. But when challenged to justify their argument, the contractors presented tables and charts, none of which brought the essential point to light: the causal relationship between temperature and O-ring damage at launches.

The sad fact is that had the data been ordered by temperature, it would have shown a direct correlation with O-ring damage. The Challenger launch temperature was six standard deviations outside the range for which they had actual engineering data. It was, as they say, a disaster waiting to happen.

"The Visual Display of Data," by Phillip D. Long, Syllabus, December 2002, Page 8 --- http://www.syllabus.com/article.asp?id=6987 

Visual representation of multidimensional data should be of particular interest in accountancy in modern times as we move toward improved networking of data with OLAP, XBRL, EDGAR, and other advances in reporting of financial and non-financial measures --- http://www.trinity.edu/rjensen/XBRLandOLAP.htm 
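
Long's point about the Challenger data, that ordering the launches by temperature would have exposed the pattern, can be illustrated with a short Python sketch. The records below are HYPOTHETICAL values made up for illustration and are not the actual flight data; the proposed launch temperature is likewise an assumed figure.

```python
from statistics import mean, stdev

# HYPOTHETICAL records: (temperature in degrees F, number of damaged O-rings).
# Made-up values for illustration only, not the actual flight data.
launches = [(70, 0), (57, 1), (75, 0), (63, 1), (53, 2), (78, 0), (67, 0)]

# Tuples sort on their first element, so this orders the launches by temperature
by_temp = sorted(launches)
for temp, damage in by_temp:
    print(f"{temp:3d} F  {'#' * damage}")

def z_score(value, sample):
    """Distance of value from the sample mean, in sample standard deviations."""
    return (value - mean(sample)) / stdev(sample)

temps = [t for t, _ in launches]
proposed = 31  # HYPOTHETICAL launch-day temperature
z = z_score(proposed, temps)
print(f"proposed launch is {abs(z):.1f} standard deviations below prior experience")
```

Listed in flight order the damage counts look scattered; sorted by temperature they cluster at the cold end, which is the relationship the tables and charts failed to show.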

"The Visual Display of Data," by Phillip D. Long, Syllabus, December 2002, pp. 6-8 --- http://www.syllabus.com/article.asp?id=6987 

The computer has provided a revolutionary tool to represent information visually. Its power is clearly demonstrated by the captivating power of today's video games. While usually describing a narrative of mayhem and destruction, the stunningly seductive rendering of 3D imagery in video games draws the gamer into new visual worlds. It also has the power to bring forward data from multiple dimensions to render information.

One of the most stunning multidimensional graphical representations of human folly was created 141 years ago by Charles Joseph Minard, a French engineer and general inspector of bridges and roads. Sometimes called the "best statistical graphic ever produced," and a work that "defies the pen of the historian," Minard drew a flow-map depicting the tragic fate of Napoleon's Grand Army in the disastrous 1812 Russian campaign. Using pen and ink, Minard captured on the two-dimensional page no fewer than six dimensions of descriptive data.

Edward Tufte, an information designer who, for over three decades, has cultivated the art and science of making sense of data, has eloquently described Minard's map.

The thick band in the middle describes the size of Napoleon's army, 422,000 men strong, when he began the invasion of Russia in June of 1812 from the Polish-Russian border near the Niemen River. As the army advances, the line's thickness reflects its size, narrowing to reflect the attrition suffered during the advance on Moscow. By the time the army reached Moscow (rightmost side of the drawing), it had been reduced to 100,000 men, one-quarter of its initial size. The lower black line depicts the retreat of Napoleon's army, and the catastrophic effect of the bleak Russian winter. The line of retreat is linked to both dates and temperature at the bottom of the graphic. The harsh cold reduced the army to a mere 10,000 men by the time it re-crossed into Poland. In addition to the main army, Minard characterizes the actions of auxiliary troops who move to protect the advancing army's main flanks.

Minard's map is a tour de force of data representation, an escape from flatland. He conveys a central reality about the world: Things that are interesting are multidimensional. Minard captures and plots six variables: the size of the army (1); the army's location on a two-dimensional surface (2, 3); direction of the army's movement (4); the temperature on various dates during the retreat from Moscow (5, 6).
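
The first of Minard's six variables, army size encoded as band thickness, is easy to reproduce. The Python sketch below uses only the three troop counts quoted above and draws a text band whose width is proportional to the surviving force. The drawing width of 40 characters and the stage labels are assumptions for illustration, and the sketch makes no attempt at the other five variables.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    label: str
    troops: int

# Troop counts as quoted in the passage above
stages = [
    Stage("Niemen River, June 1812", 422_000),
    Stage("Moscow", 100_000),
    Stage("Re-crossing into Poland", 10_000),
]

MAX_WIDTH = 40  # HYPOTHETICAL drawing width, in characters

def band_width(troops, full_strength, max_width=MAX_WIDTH):
    """Band thickness proportional to army size, never thinner than one character."""
    return max(1, round(max_width * troops / full_strength))

full = stages[0].troops
for s in stages:
    share = s.troops / full
    print(f"{s.label:26s} {'#' * band_width(s.troops, full):40s} {share:.1%}")
```

Even in this crude form the band shrinks from 40 characters to 9 and then to 1, which conveys the scale of the loss faster than the three numbers alone.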

The truth is nearly everything is multidimensional. Consider giving directions. Telling someone how to get from Logan airport to Cambridge at different times of the day requires the traveler to juggle information in four dimensions.

Continued at http://www.syllabus.com/article.asp?id=6987 


"LAPD Studies Facial Recognition Software," The Associated Press, The New York Times, December 25, 2004 --- http://www.nytimes.com/aponline/technology/AP-Facial-Recognition.html  

The Los Angeles Police Department is experimenting with facial-recognition software it says will help identify suspects, but civil liberties advocates say the technology raises privacy concerns and may not identify people accurately.

``It's like a mobile electronic mug book,'' said Capt. Charles Beck of the gang-heavy Rampart Division, which has been using the software. ``It's not a silver bullet, but we wouldn't use it unless it helped us make arrests.''

But Ramona Ripston, executive director of the American Civil Liberties Union of Southern California, said the technology was unproven and could encourage profiling on the basis of race or clothing.

``This is creeping Big Brotherism. There is a long history of government misusing information it gathers,'' Ripston said.

The department is seeking about $500,000 from the federal government to expand the use of the technology, the Los Angeles Times reported Saturday. Police have been testing it on Alvarado Street just west of downtown Los Angeles.

In one recent incident, two officers suspected two men illegally riding double on a bicycle of being gang members. If they were, they may have been violating an injunction that barred those named in court documents from gathering in public and other activities.

As the officers questioned the men, Rampart Division Senior Lead Officer Mike Wang pointed a hand-held computer with an attached camera at one of the men. Facial-recognition software compared his image to those of recent fugitives, as well as dozens of members of local gangs.

Within seconds, the screen displayed nine faces that had contours similar to the man's. The computer said the image of one particular gang member subject to the injunction was 94 percent likely to be a match.

That was enough to trigger a search that yielded a small amount of methamphetamine. The man did turn out to be the gang member, and was arrested on suspicion of violating the injunction by possessing illegal drugs. The city attorney's office has not yet decided whether to charge the man.

The LAPD has been using two computers donated by their developer, Santa Monica-based Neven Vision, which wanted field-testing for its technology. The computers are still considered experimental.

The Rampart Division has used the devices about 25 times in the two months officers have been testing them. The technology has resulted in 16 arrests for alleged criminal contempt of a permanent gang injunction, and three arrests on outstanding felony warrants.

On one occasion, the computer was used to clear a man the officers suspected of being someone else, police said.

So far, the city attorney has filed seven injunction cases in arrests that involved the technology. A judge dismissed a case after questioning the technology, but it has been refiled. Suspects in two cases pleaded guilty.

Continued in article


Books and Seminars of Edward R. Tufte --- http://www.edwardtufte.com/1635855389/tufte/ 

Also see http://www-users.cs.york.ac.uk/~susan/bib/nf/t/tufte.htm 


Hi Chuck,

One of my major professors at Stanford was Yuji Ijiri. One of his major research contributions was a monograph on triple-entry accounting. But the theory never took off in practice. Perhaps all that is needed is this new Adobe Atmosphere software.

Thanks,

-----Original Message-----
From: White, Charles
Sent: Wednesday, May 07, 2003 12:07 PM
To: Jensen, Robert
Subject: 3D authoring tool

Bob:

Check this one out. The beta download is available for us to use. Our new NMC participation brought this to my attention as the consortium is looking for collaborative projects involving this software.

http://www.adobe.com/products/atmosphere/main.html 

May 8, 2003 reply from Paul Williams [williamsp@COMFS1.COM.NCSU.EDU]

Succinctly: Ijiri analogized the wealth process to Newtonian mechanics. The third dimension was force (his monograph and Star Wars were near contemporaries, so Professor Ijiri heard "let the force be with you" more times than he probably cared to). The general idea is that earnings are the first derivative of capital, so, by analogy, the second derivative (the rate of change in income) is a logical extension and the logical third dimension of an accounting recording system (Ijiri posed the problem of whether there were logically more dimensions beyond two in his Theory of Accounting Measurement, and multiple classifications did not qualify as a solution). As Bob Jensen has noted elsewhere, Professor Ijiri was intrigued by mathematical puzzles, notably the 4-color map problem, and he puzzled over whether accounting logically had more than two dimensions (causal double entry, not merely classificatory double entry). Triple entry accounting was his proposed solution to the problem. Practically speaking it likely never caught on because analogizing to the natural world, we have learned, can be dangerous to understanding, particularly when it is to 18th century models of the natural world (Adam Smith, for example). The randomness of the phenomenon accounting attempts to measure (represent) makes it doubtful that wealth has a second derivative (or first one for that matter) in any practical sense for an individual firm. I make my students in my Masters Class read Professor Ijiri's Theory of Accounting Measurement. His work, I believe sadly, has been lost to new accounting scholars. He was nearly unique as a scholar who thought deeply about accounting problems using concepts and ideas from other fields to enhance rather than replace reasoning in the terms that belong to accounting. In light of the recent accounting scandals, perhaps the SEC and FASB should visit some of Ijiri's ideas (hardness, for example, and the notion that accounting is about accountability!!!).
PFW
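
The derivative analogy in the reply above can be made concrete with period-to-period differences. The following Python sketch is illustrative only: the wealth figures are hypothetical, and the names first_difference, wealth, income, and income_change are assumptions introduced here, not Ijiri's notation.

```python
# Illustrative sketch only: a discrete analogue of "earnings are the first
# derivative of capital." All figures below are hypothetical.

def first_difference(series):
    """Return the period-to-period changes of a list of numbers."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical end-of-period wealth (capital) balances.
wealth = [100.0, 110.0, 125.0, 135.0, 140.0]

# First difference of wealth: income earned in each period.
income = first_difference(wealth)

# Second difference of wealth: the change in income from one period to the
# next, which plays the role of the proposed third dimension.
income_change = first_difference(income)

print(income)         # [10.0, 15.0, 10.0, 5.0]
print(income_change)  # [5.0, -5.0, -5.0]
```

With only a handful of noisy periods the second difference swings in sign, which is one way to see the practical doubt raised in the reply.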

Data Visualization in Accounting Richard Dull [rdull@CLEMSON.EDU]

My dissertation (Virginia Tech, 1997) used "triple entry" (aka "momentum accounting") as a problem space for looking at 2D & 3D visualizations. I found that when I talked about the "momentum accounting" part of the study, there were polar reactions -- "it's an interesting idea" and "it's totally off-the-wall".

I still believe the concept has significant merit, and believe Dr. Ijiri will someday be better recognized for his contribution, as technology makes his ideas feasible.

Far from a "succinct summary," my dissertation is available online at http://scholar.lib.vt.edu/theses/available/etd-81197-165010/ . It not only gives some background and the pros and cons of momentum accounting, it also offers some visualization ideas. (Side note: There was a paper published from it, with David Tegarden, in JIS, Fall 1999.)

Richard Dull


Learners do not need as much reality built into simulations as is commonly believed.
How Much Reality Does Simulation Need?  by Phillip D. Long, Syllabus, February 2003, Page 6 --- http://www.syllabus.com/article.asp?id=7255 

Today's students are immersed in a world of images that draw them into multi-sensory experiences. These are often provided by various entertainment genres, from video games (individual or multi-user) to movies. Young people and old find the engagement compelling, which has led to the burgeoning gaming industry and laments from the English faculty about the deterioration of linear narrative.

Developments in computer graphics have brought a new realism to video games, movies, and simulations. Blending reality with a suspension of physical constraints made possible by computer simulation has given rise to characters such as Spiderman, who swings by a thread through the canyons of Manhattan. We perceive that experience unfolding as "real." Now, while we certainly remember these scenes from the cinema, if the same computational power were applied to learning would the impact be as powerful?

Chris Dede at Harvard has been studying the impact of adding multi-sensory perceptual information to aid students struggling to understand complex scientific models. He and his colleagues have built virtual environments such as NewtonWorld and MaxwellWorld to test how they affect learning. Providing experiences that leverage human pattern recognition capabilities in three-dimensional space (e.g., shifting among various frames-of-reference and points-of-view) also extends the perceptual nature of visualization.

Their work has concentrated on middle school students who have not scored well on standardized tests of scientific understanding. Among the questions they are investigating is what motivational impact graphical multi-user simulation environments have on learning. These environments include some or all of the following characteristics: 3-D representations; multiple perspectives and frames-of-reference; multi-modal interface; simultaneous visual, auditory, and haptic feedback; and interactive experiences unavailable in the real world such as seeing through objects, flying like Superman, and teleporting.

What have they found? With careful design, the characteristics of multi-dimensional virtual environments can interact to create a deep sense of motivation and concentration, thus helping students to master complex, abstract material.

This might suggest that the more realistic the virtual environment becomes the better the learning. Maybe. Of course, these technology-infused approaches to learning are the modern day version of John Dewey's assertion that students learn by doing. Translated into today's computer-enhanced learning environment, the rich perceptual cues and multi-modal feedback (e.g., visual, auditory, and haptic) that are provided to students in virtual environments enable an easier transfer of simulation-based training to real-world skills (Dede, C., Salzman, M.C.; Loftin, R. B.; and Sprague, D., 1999).

Continued at http://www.syllabus.com/article.asp?id=7255 

 


Visual display of multidimensional data has been a special interest of mine over the years.  I devoted an entire chapter to this topic in a research monograph that I wrote in 1976.    

Quest for Types: Condensation, Display, and Numerical Taxonomy
Chapter 6 in Phantasmagoric Accounting 
by Bob Jensen at Trinity University
(American Accounting Association:  Studies in Accounting Research No. 14, 1976, pp. 103-149)

Chapter 6

All the real knowledge which we possess, depends on methods by which we distinguish the similar from the dissimilar.  The greater number of natural distinctions this method comprehends, the clearer becomes our idea of things.  The more numerous the objects which employ our attention the more difficult it becomes to form such a method and the more necessary.

For we must not join in the same genus the horse and the swine, tho' both species had been one hoof'd nor separate in different genera the goat, the reindeer and the elk, tho' they differ in the form of their horns.  We ought therefore by attentive and diligent observation to determine the limits of the genera, since they cannot be determined a priori.  This is the great work, the important labour, for should the Genera be confused, all would be confusion.  
[Carolus Linnaeus, Swedish Botanist, Genera Plantarum, 1739]

General observations drawn from particulars are the jewels of knowledge, comprehending great store in a little room.  
[John Locke, 17th Century British Philosopher]

Science is built up with facts, as a house is with stones.  But a collection of facts is no more a science than a heap of stones is a house.  
[Jules Henri Poincare, French Mathematician, La Science et l'Hypothese, 1908]

Throughout the history of the development of scientific method the only lasting theories have been those that began with good observation, with noting peculiar relations among measurements, or with firm groundwork of classificatory, taxonomic, and clinical experience.  In those cases where theory appears to have preceded observation, it will often be found that the theory that preceded measurement is the same as the post-measurement theory in name only.  
[Raymond B. Cattell, Research Professor in Psychology at the University of Illinois (Urbana), in Personality and Motivation Structure and Measurement (New York: World Book Company, 1957, p. 3)]

Comparing mills is like comparing apples and oranges.  No two are identical and the local environmental problems and priorities are different.  
[J. L. McClintock, Weyerhaeuser Corporation, as quoted in Paper Profits: Pollution in the Pulp and Paper Industry (New York: Council on Economic Priorities, 1971)]

One picture is worth more than ten thousand words.  
[Anonymous Chinese Proverb]

  In thy face I see
The map of honor, truth, and loyalty.
  
[Shakespeare, Henry VI]

His face is the worst thing about him.  
[Shakespeare, Measure for Measure]

When men are calling names and making faces,
    And all the world's ajangle and ajar,
I meditate on interstellar spaces
    And smoke a mild seegar.
  

[Bert Leston Taylor, 19th Century Poet, Canopus]

 

6.1--Introduction

The purpose of this chapter is largely to consider a number of approaches in taxonomy and the quest for empirical types.  The approaches discussed later on in this chapter are those which either (i) result in sensory displays (confined here to visual displays) enabling human observers to search for "types" in a subjective manner, or (ii) result in mathematical partitionings of entities into "types" via numerical taxonomy techniques.  The analysis may consist of more than merely searching for types on the basis of multivariate corporate social impacts such as those illustrated in Appendix A.  A point made repeatedly in earlier chapters is that corporate social accountings will typically yield masses of data, some of which are qualitative and some of which are quantitative but measured in differing units (percentages, man-hours, tons, cubic yards, dollars, etc.).  In such situations some type of parsimony is needed for both reporting and analyzing such a hodgepodge of disconnected facts.  The accustomed accounting procedure of converting everything to monetary units and then aggregating by arithmetic methods (usually addition) to achieve parsimony in social accounting is fraught with difficulties.  The usual statistical multivariate data analysis techniques (e.g., multiple regression, discriminant, factor and variance analyses) are somewhat more flexible, but frequently suffer from overly restrictive assumptions and/or difficulties in interpretation.

The major purpose of Chapter 6 is to explore some more general techniques for condensing and evaluating multivariate quantitative data, although some of the techniques may also accommodate qualitative differences.  In an effort to avoid being too abstract, such techniques are applied to a number of social accounting variables observed on twelve electric utility companies.  Particular emphasis is placed upon graphic and other visual display techniques under varying circumstances.  Several important data transformations and numerical taxonomy are also examined.

 

6.2--Theory of Types

Raymond Cattell, an authority on personality typology, once stated:
    ...The Experience of science is that a tidy taxonomy is never useless, but full of systematic profits for research.  For example, in many social psychological problems, in which one person is the stimulus situation for the behavior of another, perceptions depend on type affiliations.  Types are thus not just unnecessary intermediate concepts--not just another instance of academic punditry or compulsion--but, if properly conceived, necessary and economical operational concepts...1

The term "type" has intuitive meaning to nearly everyone, although forming a precise definition (along with related concepts such as group, pattern, cluster, configuration, factor, genus, species, etc.) is difficult.2  Entities classified as a type supposedly are "more alike" in terms of certain properties than other entities not of that type.  Different properties (attributes, traits, etc.) may give rise to different groupings of entities into types.  In addition, what constitutes a "type" depends on the basis for defining similarity (association, distance, affinity, interaction, etc.) and precise constraints imposed by the definition of what constitutes or does not constitute a "type."  For example, "types" may be mutually exclusive versus intersecting, collectively exhaustive versus selective, discrete partitions versus having gradations of belongedness, and so on.

Ball lists seven uses of cluster analysis which apply to the quest for types in general:

  1. Finding a true typology;

  2. Model fitting;

  3. Prediction based on groups;

  4. Hypothesis testing;

  5. Data exploration;

  6. Hypothesis generating;

  7. Data reduction.3

These are not necessarily mutually exclusive, and prediction seemingly may arise under any of the above purposes.  Cattell writes:
    Briefly to indicate what this second step may comprise, one should point out that Aristotelian classification permits one to make predictions of the kind: "This is a dog; therefore it may bite"; "This is a schizophrenic; therefore the prospect of remissions is not high."  In other words, a classification of objects by variables of one kind may permit prediction on others not at the time included.  Parenthetically, despite the illustrations, these predictions need not be categorical, but can be parametric.4

I do not pretend to be the first to suggest that business firms might be typed.  For many years business firms have been viewed according to industry types, size classifications, production or marketing regions, capital intensity, labor intensity, etc.  I am suggesting, however, that researchers devote more attention to classifying business firms into empirical types on the basis of social impacts.  In the next chapter (Chapter 7) some attention is devoted to classifying firms or persons on the basis of human perceptions.  In this chapter (Chapter 6) our concern will be more upon classifications based upon general statistics on businesses, e.g., earnings margins, product prices, pollution expenditures, etc.  Research along similar lines has taken place with respect to finding nation types.  Rummel, for example, writes:
    Students of comparative relations have always dealt with nation types.  One type that has played a dominant role in the theoretical and applied international relations is that of the powerful nation.  This type has become so widely recognized as implying set characteristics and international behavior that we readily employ the noun "powers" alone to refer to nations of this kind.  Such nation "types" as "modern," "underdeveloped," "Constitutional," "status quo nations," "prismatic," "aggressive," "traditional," and "nationalistic," have only to be mentioned to evidence the prevalence of typal distinctions.

The problem with the prevailing types is that the rationale underlying the categorization is not explicit (and that it is not clear whether the type really divides different kinds of variance).  If we are to deal in types, a clear and empirical basis for the distinctions must be made.5


1    R. B. Cattell, Personality and Motivation Structure and Measurement (Yonkers-on-Hudson, New York: World Book Company, 1957, p. 383).

2    Definition varieties for "type" are discussed by Cattell, Ibid, pp. 364-69.

3    G. H. Ball, Classification Analysis, Stanford Research Institute, Project 5533, Stanford, California, 1971.

4    R. B. Cattell, "Taxonomic Principles for Locating and Using Types (and the Derived Taxonome Computer Program)," in Formal Representation of Human Judgment, Edited by B. Kleinmuntz (New York: John Wiley & Sons, Inc., 1968, p. 104).

5    R. J. Rummel, The Dimensions of Nations (Beverly Hills, California: Sage Publications, 1972, p. 300).


6.3--Condensation of Data: The Need for Parsimony

In spite of the difficulties of detecting, recording, and attesting to corporate impact data, equally difficult problems arise in utilizing such data.  Decisions are made by humans (or decision rules set by humans) and, unfortunately, the human mind is easily boggled by relatively small amounts of data.  As facts and figures begin to pile up, the decision maker devises means of organizing, categorizing, and summarizing in an effort to achieve parsimony in what he or she must comprehend and evaluate.  At one end of the spectrum are masses of disconnected facts; at the other end are a few condensed statements or measures.

Within a firm, the degree of condensation of traditional accounting data varies with the manager's level in the organization and the use to which information is to be put.  In social accounting we are still at a stage where we have a basket of apples, oranges, rocks, carrots, thistles, roses, rabbits, turtles, monkeys, and so on ad infinitum.  Methods of condensation of heterogeneous social accounting items are undeveloped.

In traditional accounting, condensation typically consists of additive aggregation, e.g., operating managers may only see labor cost aggregated over people and time.  Top management examines summary reports over multiple divisions, subsidiary companies, and longer intervals of time.  The investing public receives even more parsimonious aggregations.

Another means of data condensation is the filtering process.  For example, budget or standard items may automatically be compared (by computer) with actual outcomes.  Operating managers may only act upon "exception" phenomena, e.g., aberrant phenomena which vary from standard by some predetermined amount.  The aberrant phenomena are "filtered" out and acted upon.  Similarly, public press releases are usually about aberrant events apart from routine day-to-day happenings.
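
The filtering process just described can be sketched in a few lines of Python. This is a minimal illustration, not a procedure taken from the monograph: the item names, the amounts, and the 10% tolerance are all hypothetical, and filter_exceptions is a name introduced here for the example.

```python
# Illustrative sketch only: "management by exception" as a data filter.
# Item names, amounts, and the 10% tolerance are hypothetical.

def filter_exceptions(budget, actual, tolerance=0.10):
    """Return the items whose actual outcome deviates from budget by more
    than `tolerance`, expressed as a fraction of the budgeted amount."""
    exceptions = {}
    for item, budgeted in budget.items():
        deviation = (actual[item] - budgeted) / budgeted
        if abs(deviation) > tolerance:
            exceptions[item] = deviation
    return exceptions

budget = {"labor": 1000.0, "materials": 500.0, "power": 200.0}
actual = {"labor": 1040.0, "materials": 620.0, "power": 170.0}

# Only the aberrant items pass through the filter; labor (4% over budget)
# is within tolerance and is dropped from the report.
flagged = filter_exceptions(budget, actual)
for item, deviation in flagged.items():
    print(f"{item}: {deviation:+.0%} versus budget")
```

The report shrinks from three items to two, which is the parsimony the filter is meant to provide; the choice of tolerance decides how much is condensed away.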

Typically an analysis is conducted whenever hidden or obscure relationships are suspected which are not evident in either the basic or aggregated data.  Analysis may, in turn, facilitate further condensation and parsimony, especially if the analysis yields crucial "measurements" needed to achieve further condensation.  The term "analysis" has a connotation of breaking something down into component parts, whereas "condense" implies combining component parts into a denser whole.  However, in science the term "analysis" does not necessarily imply less parsimony, e.g., one of the objectives of factor "analysis," component "analysis," cluster "analysis," regression "analysis," and other statistical analysis tools may be that of achieving parsimony.  As such, some form of "analysis" may be part of a data condensation process.  Similarly, in accounting a cost analysis may entail decomposition of "total cost" into various "component costs."  However, this is not necessarily the same as moving a step backwards on the condensation spectrum.  For example, total cost may be analyzed to break it down into fixed and variable components.  The analysis may utilize detailed data from labor and materials records, but the analysis may identify a relationship (e.g., linear) which facilitates parsimony and condensation.
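
The fixed and variable cost example can be illustrated with an ordinary least-squares line. The Python sketch below is one common way to estimate such a linear relationship and is not necessarily the method the text has in mind; the volume and cost figures are hypothetical, and fit_line is a helper name introduced here.

```python
# Illustrative sketch only: condensing many cost observations into two
# numbers (fixed cost and variable cost per unit) with a least-squares
# line. The observations below are hypothetical.

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical monthly output (units) and total cost (dollars).
units = [100, 200, 300, 400]
total_cost = [1500.0, 2000.0, 2500.0, 3000.0]

# The intercept estimates fixed cost; the slope estimates variable cost
# per unit of output.
fixed_cost, variable_cost_per_unit = fit_line(units, total_cost)
print(f"fixed cost: {fixed_cost:.2f}")
print(f"variable cost per unit: {variable_cost_per_unit:.2f}")
```

Here the analysis "breaks down" total cost, yet the result is more condensed than the underlying records: any number of detailed observations are summarized by an intercept and a slope.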

In corporate financial accounting, the highest levels of condensation (after much aggregation, filtering, and analysis) are financial statement items and various computed statistics (e.g., working capital ratios and earnings-per-share) derived from financial statement items.  For example, the total assets reported (in billions of dollars) at the bottom of a General Motors Corporation annual report is a condensed measure of the millions of heterogeneous items of value held by the company.  The condensation process which yielded such a figure for G.M. assets entailed a myriad of accounting "rules" of measurement.

At nearly every point in the accounting condensation process, accountants disagree as to the proper "rule."  As the condensations become more parsimonious, the accounting disputes are more pronounced.  One of the constant sources of difficulty is the penchant (based on centuries of tradition) of condensing on the basis of monetary units (i.e., a numeraire).  For example, cash in bank accounts, inventories, land, buildings, and all other items termed "assets" in the General Motors balance sheet are measured in dollars, which in turn, makes the heterogeneous items additive in a common scale of measurement.

Since it is even more difficult to measure most corporate social impacts in monetary units, accountants are reluctant to extend financial boundaries into unexplored social accounting territory.  Attempts to do so (e.g., the Abt Associates Social Audits6) have been highly controversial both as to method and to purpose.  Social audits have primarily been confined to descriptive listings of corporate social endeavors, with little or no attempt to measure or aggregate over heterogeneous items.  The question is whether it is possible to do more than just hold forth a basket of social accounting apples, oranges, rocks, carrots, thistles, roses, rabbits, turtles, monkeys, and so on.


6    See Chapter 3 of the book (cited at the top of this table).


6.4--Multivariate Data Analysis (MDA)

It is evident from preceding chapters (and Appendix A) that corporate social accounting entails multiple variates in areas of environmental impacts, consumer impacts, employee impacts, etc.  In this chapter I will turn to a number of multivariate data analysis (MDA) techniques employed in scientific research.  The objectives in most instances are to both achieve parsimony and to discover hidden unknown relationships.  It should be stressed, however, that rarely do MDA techniques disclose underlying causal mechanisms.  At best, the outcomes in MDA aid in prediction and possibly provide clues in the quest for discovery of causal relationships.

It should also be stressed that, in spite of intricate and complex mathematical formulations, the MDA outcomes are often not conducive to statistical inference testing.  Accordingly, MDA is usually a first exploratory step rather than a conclusive final stage in the analysis.

An extensive body of theory concerns MDA applied to continuous variates.7  Models used for such purposes include multiple regression, multiple discriminant analysis, canonical correlation, partial correlation, cluster analysis, factor analysis and related approaches.  Closely related are the classical experimental design models and analysis of variance (ANOVA) intended for analyzing a continuous criterion variate over discrete predictor variate cross-classifications.

Nominal variates may be analyzed in various ways.  Binary variates, for example, may often be included with continuous variates and treated as if they themselves are continuous, e.g., binary variates are commonly included as predictors in multiple regression equations.  Another means of nominal variate analysis is available in multivariate contingency table analysis.  For example, stepwise procedures utilizing maximum likelihood theory are available.8

Ordinal variates are usually the most difficult to analyze.  The usual procedure is either to (i) ignore the ordinal property and analyze ordinal variates in contingency tables, or (ii) ignore the discrete property and treat ordinal variates as continuous variates.  In recent years, however, multidimensional scaling (MDS) techniques have opened up a new line of approach.  In particular, MDS is useful in mapping preference or similarity orderings into metric space, and as such was a major breakthrough in analyzing subjective preferences.  This subject is taken up in greater detail later on in Chapter 7.

Few MDA techniques have been employed in corporate social accounting.  On occasion, social impact costs have been analyzed in some MDA models.  For example, studies utilizing regression techniques in air pollution impact measurement were reviewed in Chapter 4.  In the remainder of this chapter, potential applications of several other MDA tools will be explored, in particular general purpose multiple variate display and numerical taxonomy techniques.


7    References are legion.  I have compiled and abstracted thousands of MDA references on computer tape, R. E. Jensen, A Computerized Bibliography in Multivariate Data Analysis c/o South Stevens Hall, University of Maine, Orono, Maine 04473.  Also see J. L. Dolby and J. W. Tukey, The Statistics Cum Index (Los Altos, California: R&D Press, 1973).

8    See L. A. Goodman, "The Analysis of Multidimensional Contingency Tables: Stepwise Procedures and Direct Estimation Methods for Building Models for Multiple Classifications," Technometrics, Vol. 13, 1971, pp. 33-61.


6.5--An Illustration: Search for Types Among Twelve Electric Utility Companies

Throughout the remainder of this chapter, some electric utility company data will be analyzed for illustrative purposes using a variety of techniques.  It should be stressed that the intent is to illustrate the potential application of certain MDA techniques in comparing corporations in terms of multiple criteria.  In no way is this intended to be a thorough analysis of the companies involved.  It should also be noted at the outset that, although the data used in most of the illustrations in this chapter are continuous, many of the MDA approaches discussed are easily adapted to discrete data as well.

The electric utilities chosen for this section are the N=12 private power corporations listed in Table 6.1.  These were selected from the fifteen companies investigated in considerable depth by the Council on Economic Priorities.9  The three smallest companies are not included here, mainly for convenience in certain graphical displays presented later on.

 

 

 

 

 

The focal point for many of the illustrations which follow will be the Table 6.2 data on variates x1,...x10.  It might be noted that except for x1 (megawattage), the other variates x2,...x10 are not necessarily directly associated with size of the companies involved.  For example, whereas pollutant volumes would normally be expected to increase with the size of an electric power company, percentage data such as that given for x7,...x10 pollution variates need not behave in such a manner.

The reader is cautioned about some of the conclusions which are either explicitly drawn or implicitly inferred in the illustrations which follow.  These conclusions follow only from the data as tabulated in the Council on Economic Priorities Study.  The write-up for the CEP study contains many footnotes and other explanations on the nature and limitations of this data.  Most of these explanations are not repeated here but should be carefully heeded before accepting my analysis of the published data as fact.

In some of the graphical displays it is difficult to handle more than a few variates at a time.  Therefore, from among the M=10 variates in Table 6.2, a select subset of four social impact criteria was extracted, comprising:

(The Four-Variate Subset)
x3= Earnings margin;
x4= Cost per kwh;
x5= R&D proportion;
x6= State-of-the-art pollution control inadequacy.

The above four variates cut across various interest groups, including shareholders, consumers, local communities, and the public-in-general (who might be especially interested in the R&D commitment).


9    Charles Komanoff, Holly Miller, and Sandy Noyes, The Price of Power: Electric Utilities and the Environment, Edited by Joanna Underwood, (New York: The Council on Economic Priorities, 1972).


6.6--Graphic and Other Display Techniques

6.6.1--Purposes.  Numerical data are convenient to view in graphical form whenever possible.  For instance, continuous variates are often displayed in Cartesian scatter plots along one, two, and occasionally even three dimensions.  Discrete data are often represented in histograms, pie charts, etc.  Such display techniques are familiar and need not be elaborated upon here other than to mention that they might be effectively employed in corporate social accounting.  For example, wages might be displayed in relation to age, sex, race, plant location, etc.  Pollutant outputs might be plotted in relation to time, weather conditions, plant locations, etc.  Product performance and plant safety might similarly be displayed in various ways.  To date, however, graphic displays are sparingly employed in corporate social audit reports.  Conversely, in the public sector economic and social indicators are commonly displayed in graphic form.

Some of the more common purposes of graphical displays are mentioned below:

  1. (COMMUNICATION).  Frequently the major intent is to communicate to other persons as concisely and efficiently as possible.  Graphical displays are advantageous first of all because they are more likely to capture attention than are long columns of numbers or paragraphs of text.  Secondly, graphical displays are frequently among the most parsimonious means of communicating data.

  2. (DISCOVERY OF DISTRIBUTION PROPERTIES).  Sometimes the analyst constructs a graphical display of a single variate in order to discover its distributional properties, e.g., dispersion and skewness.  Following a mathematical analysis, outcomes or residuals are often plotted in order to identify violations of assumptions in the analysis.  For instance, regression residuals are frequently plotted in an effort to investigate conformance with normality, homoscedasticity, and independence assumptions.

  3. (DETECTION OF ABERRANT PHENOMENA).  Often data are plotted in order to disclose phenomena deviating from norms.  Graphic displays are often a quick and simple means of detecting awry or extreme reactions.

  4. (DETECTION OF LEVEL DIFFERENCES, SHAPES, AND CLUSTERS).  Graphical displays often disclose differences in levels of observations.  However, whereas level differences may often be discovered by merely scanning the data, hidden patterns, shapes, or clusters of phenomena may be disclosed (in graphical displays) which are almost impossible to discern by scanning the data itself.

  5. (TRANSFORMATION AND CONCATENATION).  Graphics may assist the analyst in determining what, if any, transformations of the data (e.g., translation of axes, rotation, and scaling transformations) provide more useful results.  Often these become linked in a sequence and, through concatenation in interactive computer graphics, can be combined in one procedure.

  6. (INVESTIGATION OF VARIATE AND ENTITY RELATIONSHIPS).  Another purpose of graphical displays may be to analyze the relationship between two or more variates.  For instance, scatter plots along two dimensions are frequently employed to study linear or nonlinear relations of two continuous variates.  Smooth functions may be fitted amongst data points.  If one of the variates is time, the purpose may be to identify trends, seasonal patterns, structural shifts, and drift of a variate of interest over time.

Patterns or clusters may also be detected among entities.  For instance, companies (or divisions within companies) might be first plotted according to pollutant discharges and then be partitioned into subsets according to visual scannings of plotted points.

An advantage of visual display is the tremendous ability and flexibility of humans for detecting spatially and temporally distributed features in data.  Mathematical models, though often an aid in discovering relationships, have much less flexibility and adaptive innovation ability.

6.6.2--Limitations.  Graphic displays are physical representations of properties.  One limitation is that qualitative properties are usually cumbersome to display relative to quantitative properties.  Quantitative properties, however, are also difficult to display in more than two dimensions, even though the analyst is frequently interested in detecting patterns in multivariate space.  Thirdly, in most graphical displays there is usually an upper bound on the number of entities that can be effectively plotted and compared.  Fourthly, it is a fallacy to assume that graphic displays are a substitute for mathematical analysis.  Often the detection or communication of phenomena depends upon making appropriate mathematical transformations of data to be plotted.  Developments in computer graphics have greatly facilitated the combining of mathematics and graphics.

Various approaches have been proposed for graphical display to overcome one or more of the above limitations, although usually trade-offs are encountered.  Several of these approaches are illustrated in the following discussion.  In many of these approaches an added difficulty arises in that how the variates (properties) are assigned to graphic pattern components either unintentionally or purposefully biases the outcomes.  Also, too many variates may obscure existent patterns in subsets of the variates.

6.6.3--Profile Line Plots and Shape Correlations.  Although quantitative variates are difficult to plot in more than two dimensions, various techniques may be employed.  One such technique is profile analysis in which entities are usually compared on the basis of their "profiles" on two or more variates under study.  Profile analysis is employed extensively in educational and psychological testing, i.e., persons are compared on the basis of graphical profiles of test scores.  If variates are not measured in the same scales, they are typically standardized to avoid scaling differences.

For illustrative purposes, four variates (x3, x4, x5, and x6) were selected from the Table 6.2 data presented previously.  Although the raw data could be plotted in profile charts, I elected to standardize (normalize) the variates using the customary transformation

 

 

The resultant standardized variate outcomes are shown in the STDVAR matrix in Table 6.3.  The electric utility company profiles derived from this data are shown in Exhibit 6.1.
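
The formula for the transformation did not survive in this copy of the chapter. The Python sketch below assumes it is the usual z-score, in which each variate has its mean subtracted and is then divided by its standard deviation. Whether the original divided by n or n-1 is not known, so the n-1 (sample) form used here is an assumption. The figures are hypothetical and are not the Table 6.2 data; standardize, raw, and stdvar are names introduced for this example.

```python
# Illustrative sketch only: z-score standardization of each variate so that
# profiles on differently scaled variates can share one vertical axis.
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)  # n - 1 divisor; the original's divisor is not known
    return [(v - m) / s for v in values]

# Hypothetical raw data for five companies (NOT the Table 6.2 figures).
raw = {
    "x3_earnings_margin": [12.0, 15.0, 19.0, 10.0, 14.0],
    "x4_cost_per_kwh": [2.1, 1.7, 2.4, 2.9, 1.9],
}

# One standardized column per variate, analogous to a STDVAR matrix.
stdvar = {name: standardize(column) for name, column in raw.items()}

for name, column in stdvar.items():
    print(name, [round(z, 2) for z in column])
```

After the transformation every variate has mean zero and unit standard deviation, so a company's profile line reflects its standing relative to the other companies rather than the units in which each variate happens to be measured.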

 

 

 

It is immediately evident that no single company is consistently "best" or "worst" in terms of all four of these criteria.  For instance, Oklahoma Gas and Electric (OGE) had the highest earnings margin (19.4%) and the lowest allocation to research and development (9% of revenues).  Similarly, The Southern Company (SOC) has a relatively poor performance on three criteria but generates the cheapest power (1.69¢ per kwh) for average residential users.  On two criteria (earnings margin and price per kwh) Consolidated Edison Company of N.Y. (CON) falls far below all the other companies in performance.

A careful inspection of Exhibit 6.1 reveals a number of profile similarities.  The Southern Company (SOC) and Florida Power and Light (FPL) have rather close profiles except for the x5 (R&D) criterion.  Houston Lighting and Power (HLP), Oklahoma Gas and Electric (OGE), and Virginia Electric and Power (VEP) have similar profiles, especially in terms of the first three criteria.  Commonwealth Edison (COM) and Northern States Power (NSP) have somewhat close profiles on all three criteria.  Pacific Gas and Electric (PGE) and Southern California Edison (SCE) are also similar except for the x5 criterion (R&D allocation).

These profile similarities seem to suggest certain geographic "types" since the above-mentioned likenesses are mostly between companies operating in somewhat contiguous regions.  This is interesting since some of the paired companies along these criteria have major differences as well, e.g., whereas SOC is a large holding company across various southern states and in 1970 generated electric power with 79.1% coal, 20.6% gas, and 0.3% oil, FPL is a much smaller southern company using 56% oil and 44% gas.10

When examining profiles, analysts are sometimes interested in comparing profile shapes (configurations) irrespective of differences in profile levels and/or scatter.  A transformation which facilitates such comparisons is the profile scatter transformation

 

 

This transformation eliminates both profile level (elevation) and profile scatter (standard deviation) differences.  The effect of profile elevation removal, in particular, is to bring profiles with similar configurations (at different levels) closer together.11  The profile scatter transformation yields what are called "pure shape" profiles.12  Profile charts derived after such a transformation of the data conform to the profile shape correlation coefficients computed from the formula

 


This correlation coefficient (sometimes called a Q-technique correlation) is used when the analyst is interested in comparing profile shapes aside from elevation and scatter considerations.  In other words, the profile shape correlation coefficients are invariant under profile elevation and scatter transformations.  Other pairwise coefficients (such as Euclidean distances) are not necessarily invariant under such transformations, i.e., Euclidean distances reflect differences in profile levels whereas profile shape correlations measure differences in profile shapes (configurations).13
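A minimal Python sketch of the profile scatter transformation and the resulting shape correlation may help here.  The function names and the three profiles are hypothetical, and the M - 1 divisor for the standard deviation is an assumption (it is the divisor under which the distance relation in footnote 13 holds exactly).

```python
from math import sqrt
from statistics import mean, stdev

def shape_profile(scores):
    """Remove elevation (mean) and scatter (sd) from one entity's profile."""
    level = mean(scores)
    scatter = stdev(scores)  # assumption: M - 1 divisor
    return [(s - level) / scatter for s in scores]

def shape_correlation(a, b):
    """Q-technique (profile shape) correlation between two entity profiles."""
    za, zb = shape_profile(a), shape_profile(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)

def euclidean_distance(a, b):
    """Ordinary Euclidean distance between two profiles of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical profiles on M = 4 variates.
p = [1.0, 2.0, 0.5, 1.5]
q = [11.0, 15.0, 9.0, 13.0]  # q = 4 * p + 7: same shape, different level and scatter
r = [2.0, 1.0, 1.5, 0.5]

same_shape = shape_correlation(p, q)  # invariant to elevation and scatter
d = euclidean_distance(shape_profile(p), shape_profile(r))
from_distance = 1.0 - d ** 2 / (2 * (len(p) - 1))  # relation stated in footnote 13
```

Profiles p and q differ in level and scatter yet correlate perfectly in shape, while the raw Euclidean distance between them is large.  Computed on the transformed profiles, the distance carries the same information as the shape correlation.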


10    The Council on Economic Priorities, The Price of Power: Electric Utilities and the Environment, Op. Cit., p. 144.

11    From a mathematical standpoint, the profile elevation transformation (i.e., the subtraction of entity means) projects the entity scores from N space to a hyperplane of N-1 dimensions.

12    In mathematical terms, the profile scatter transformation projects N entity scores to a hypersphere of N - 2 dimensions of constant radius lying in a hyperplane.

13    The profile shape correlation coefficients can, however, be shown to be related to Euclidean distance by the formula

$$\mathrm{CORENT}(I,H) = 1 - \frac{[\mathrm{DISENT}(I,H)]^{2}}{2(M-1)}$$

 

where DISENT(I,H) is the Euclidean distance between Entity I and Entity H using STDENT data.


The profile scatter transformation was performed on the STDVAR data in Table 6.3, yielding the STDENT standardized entity matrix also shown in Table 6.3.  The STDENT profiles are plotted in Exhibit 6.2.  One quite unexpected outcome is the near congruence of the Pacific Gas and Electric (PGE) and Baltimore Gas and Electric (BGE) profiles in Exhibit 6.2.  This indicates almost identical profile shapes for these two companies on the four criteria being analyzed, i.e., the two companies have almost identical profile "shapes" in Exhibit 6.1.  Similarly, the Oklahoma Gas and Electric (OGE) profile is closely related in shape to both the PGE and BGE profiles.  This indicates that these three companies must also have high profile shape correlation coefficients.  Another surprising likeness in profile shapes, as revealed in Exhibit 6.2, arises between Commonwealth Edison (COM) and Southern California Edison (SCE).  In this case, the two companies have similar profile shapes but differ in terms of profile elevation (in Exhibit 6.1).

 

 

The above visual conclusions from Exhibit 6.2 are borne out by the profile shape correlation coefficients shown in Table 6.4.  The five highest correlations are as follows:

 

 

 

Somewhat less obvious in Exhibit 6.2 are the least congruent profile shapes.  In Table 6.4, however, the most negative profile shape correlation coefficients are revealed as:

 

These differences are not especially surprising except for the Northern States Power (NSP) and Virginia Electric and Power (VEP) profiles.  These two companies are somewhat similar in size and in fuel usage.14  However, whereas the NSP profile in Exhibit 6.1 is relatively flat, the VEP profile moves from a high on earnings margin and cost per kwh to lows on R&D and pollution control inadequacy.


14    In 1970, the fuel use for NSP was 66% coal, 33% gas, and 1% oil.  For VEP the percentages were 53.8% coal, 46% oil, and 0.2% gas.


6.6.4--Principal Component (Factor Score) Profiles.  Profile analysis becomes clumsy when more than five or six variates (criteria) are under study, e.g., imagine trying to compare profile patterns over twenty or thirty social criteria.  Often, however, multicollinearities exist such that one, two, or several principal components or factors account for much or most of the variation in an entire system of variates.

One approach is to transform the original variates into factors and then plot entity factor scores.  For one or two principal factors, entities can be plotted in scatter plots.  For more than two factors, entity profile configurations can be examined using underlying factors in lieu of original variates.

Suppose there are M variates under study.  There are two major reasons why factor scores may be more of interest than original data:

(1) Whereas the M variates under study may be systematically intercorrelated with one another, the factors (principal components) are linearly independent (orthogonal).  This is helpful in data analysis techniques which assume linear independence.

(2) The factors (principal components) are extracted in such a manner that they successively account for smaller portions of the total variation among the M original variates.  If the first few factors account for a large share of this variation, and if they can be meaningfully interpreted, it may be possible to describe the system more parsimoniously (i.e., in fewer than M variates).

The major difficulty in principal component or factor analysis often lies in interpreting the importance and meaning of the factors extracted from the original variates.  The relative importance of successive factors can be estimated by comparing their latent roots (eigenvalues).  Finding descriptive interpretations is more difficult.  The usual approach is to examine the factor loadings (eigenvectors), which are correlations between factors and original variates.  Frequently, subsets of the original variates having highest correlations with a given factor have something in common which is suggestive of what the factor depicts.15
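The extraction of latent roots, loadings, and component scores can be sketched with NumPy as follows.  This is a rough illustration only: the data are random numbers rather than the Table 6.2 values, the function and variable names are hypothetical, and the components are left unrotated.

```python
import numpy as np

def principal_components(data):
    """Principal components of the correlation matrix of an N x M data array."""
    # Standardize each variate (column); sample sd with N - 1 divisor assumed.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    roots, vectors = np.linalg.eigh(corr)        # eigh returns ascending order
    order = np.argsort(roots)[::-1]              # re-sort largest root first
    roots, vectors = roots[order], vectors[:, order]
    # Loadings: correlations between original variates and components.
    loadings = vectors * np.sqrt(np.clip(roots, 0.0, None))
    scores = z @ vectors                         # component scores per entity
    return roots, loadings, scores

# Hypothetical data: N = 12 entities on M = 5 variates, two of them correlated.
rng = np.random.default_rng(0)
data = rng.normal(size=(12, 5))
data[:, 1] += 2.0 * data[:, 0]

roots, loadings, scores = principal_components(data)
retained = int((roots > 1.0).sum())              # latent-root-greater-than-one rule
share = 100.0 * roots / roots.sum()              # percentage of variance per component
```

The latent roots sum to M, so each root divided by M gives the share of total variance accounted for.  The columns of `scores` are mutually uncorrelated, which is the linear independence property noted in point (1).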


15    This approach was illustrated in the Chapter 4 principal component analysis of air pollution and human mortality data.  An excellent elementary example is also provided in W. W. Cooley and P. R. Lohnes, Multivariate Data Analysis (New York: John Wiley & Sons, Inc., Second Edition, 1971, pp. 133-36).


For illustrative purposes, the pairwise correlations among the variates x1,...,x10 of Table 6.2 are given in the CORVAR matrix in Table 6.5.  An underlying factor structure is not easily determinable from merely scanning this correlation matrix.  A principal component analysis on the variates x1,...,x10 in Table 6.2 yielded the outcomes in Table 6.5.  Three factors emerged with latent roots exceeding one.  These three factors account for 79.1% of the variance in the ten-variate system.  Interpretations of these factors are not at all obvious or concise.  Based upon the rotated factor loadings shown in Table 6.5, the best interpretations I could come up with are also given in Table 6.5.

I.    PAIRWISE CORRELATIONS (CORVAR) BETWEEN TEN VARIATES IN TABLE 6.2

II.    FACTOR LOADINGS

III.    FACTOR INTERPRETATIONS

(1) Factor 1 (Air Pollution Control Inadequacy): This factor loads highly on overall pollution inadequacy (x6) and sulphur dioxide control inadequacy (x8), both of which reflect air pollution under-investment in state-of-the-art controls available.  This factor also loads relatively high on coal usage (x2), suggesting that heavy coal burning companies have a more serious under-investment in such controls, although there is considerable dispute over what constitutes "state-of-the-art" control, e.g., the wet scrubber dispute is discussed later on.

(2) Factor 2 (Technology): This appears to be largely an R&D (x5) and nitrogen oxides control inadequacy (x9) factor, the two variates being highly correlated at -.818.  Size of company in terms of megawattage (x1) also loads highly on Factor 2, partly reflecting the fact that there is a tendency for larger companies to have a higher R&D proportion and lower nitrogen oxides control inadequacy.

(3) Factor 3 (Financial): This appears to be a combination of the company's earnings margin (x3) and average customer price per kwh (x4), the two being negatively correlated at -.5145.

(4) Factors 4 through 10 (Junk): These factors have latent roots less than one, and hence, are not viewed as relevant underlying factors.

IV.    LATENT ROOTS (EIGENVALUES)

Factor    Latent Root    Percentage of Variance    Cumulative Percentage
1         2.7466         27.466%                   27.466%
2         2.6882         26.882%                   54.348%
3         2.4751         24.751%                   79.099%
4-10      2.0901         20.901%                   100.000%
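The variance percentages in Part IV of Table 6.5 follow directly from the latent roots, since the roots of a correlation matrix on M=10 standardized variates sum to 10.  The short Python check below reads the tabled roots as decimal values (2.7466 and so on), which is an assumption about the original typesetting, although it agrees with the tabled percentages.

```python
# Latent roots from Part IV of Table 6.5; factors 4 through 10 are pooled.
latent_roots = {"1": 2.7466, "2": 2.6882, "3": 2.4751, "4-10": 2.0901}
M = 10  # number of standardized variates, so the roots should sum to M

rows = []
cumulative = 0.0
for factor, root in latent_roots.items():
    share = 100.0 * root / M          # percentage of total variance
    cumulative += share
    rows.append((factor, root, round(share, 3), round(cumulative, 3)))

# Percentage accounted for by the three factors with latent roots above one.
first_three = rows[2][3]
```

The cumulative figure after the third factor reproduces the 79.1% quoted in the text.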

The illustration points out one of the potential frustrations with principal component or factor analysis in general, i.e., a frequently encountered situation arises in which there is no concise and all-embracing concept for two or more rather heterogeneous variates closely correlated with a factor.  This is particularly evident in Factor 2 in Table 6.5, which loads highly on research and development (x5), nitrogen oxide control inadequacy (x9), and megawattage (x1).  It is also evident in Factor 3, which loads highly on earnings margin (x3) and cost (price) per kwh to an average residential electricity consumer (x4).

 

 

The outcomes in Table 6.5 were utilized in transforming the M=10 variates (in Table 6.2) into the major factor scores (on each entity) shown in Table 6.6.  The company (entity) profiles derived from the standardized factor scores (SFSCENT) are shown in Exhibit 6.3.  No company consistently performs highest on all criteria, although SCE performs relatively well on all three major underlying factors, brief interpretations for which were given in Table 6.5.  The inconsistent performance of CON is manifested in its somewhat reasonable performance on Factor 1 (pollution control) relative to falling far below other companies on Factor 3 (financial performance) due to a combination of having both the lowest earnings margin and the highest kwh rates.  The inconsistent performance of AEP is also evident in its poor showing on Factor 1 (pollution control) relative to the highest showing on Factor 2 (technology) due to a combination of having a relatively high R&D commitment (x5) and a low nitrogen oxides state-of-the-art underinvestment (x9).  As indicated previously, however, the AEP performance on x9 is misleading since it is the lack of technology for "state-of-the-art" pollution control rather than investment in pollution controls which gives the coal-fired AEP such a good score on x9.

 

Similarities in both level and shape among the three principal underlying factor profiles in Exhibit 6.3 are also evident.  For example, the large coal burning companies (AEP, COM, and SOC) have very similar profiles, with AEP pulling ahead on Factor 2 due to a higher R&D commitment.  In contrast, the smaller natural gas-fired OGE and HLP companies have almost congruent profiles with shapes nearly opposite those of the large coal-fired companies.  The larger SCE, however, does not succumb to the OGE and HLP drop along Factor 2 because of the exceptional performance of SCE on both R&D (x5) and nitrogen oxides (x9) criteria.

One of the most important outcomes in the factor score profiles in Exhibit 6.3 is the striking similarity between the Florida Power and Light (FPL) and Northern States Power (NSP) profiles.  In contrast, the M=10 variate raw scores for these companies (see Table 6.2) are much more divergent.16  This phenomenon provides an important illustration of how principal components or other types of factor analyses can be used to reduce a large number of variates into a more parsimonious subset of underlying principal factors.  At the same time it also illustrates "overkill" in the sense that the outcome may be too parsimonious.  For example, the primary determinants of Factor 3 appear to be quite different social impact criteria which, at least in this data, are negatively correlated.  Company scores on Factor 3 are caught between opposing forces.  For example, the FPL "poor" showing on earnings margin (x3) pulls against the FPL "good" score on electricity pricing (x4).  Similar negative correlations in performance criteria are present in other factors.  Hence, this is a case in which, because of opposing interests in given factors, less parsimony (i.e., keeping opposing criteria separated) is probably more meaningful.


16    Also note the divergent FPL and NSP profiles in Exhibit 6.1.


6.6.5--Fourier Series Profiles.  In the preceding section, a principal component analysis was reported in which M=10 variates were parsimoniously reduced to M'=3 factors (principal components).  The resultant factor scores were plotted in Exhibit 6.3.  Suppose, however, that such an analysis yielded a substantially larger number of underlying factors, e.g., suppose M=50 variates produced M'=15 factors of interest.  Profile charts are difficult to construct and evaluate for more than a few factors.

An alternate approach which is especially interesting when there are more than a handful of underlying major factors is to use a Fourier series method originally proposed by Andrews.17  The procedure for plotting multivariate observations on each entity is to compute the following Fourier series transform on each entity (e.g., each company):

 

The f(t) function is then plotted (best results are obtained from a computer plotter) for values of t over the range ±3.1416, such that each entity receives a plotted curve over this range of t.  Profiles of entities may then be compared both as to level and to configuration.  The number of variates is not a limiting constraint, i.e., the f(t) function is plotted against t rather than the xJ variates.  When the xJ variates are linearly independent and certain other assumptions are met, the f(t) outcomes have a number of interesting properties and are conducive to statistical inference testing of differences between entity profiles.
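A sketch of the curve computation in Python is given below.  It assumes the usual form of the Andrews function, f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., and the function names and factor scores are hypothetical rather than taken from Table 6.6.

```python
import math

def andrews_curve(x, t):
    """Evaluate f(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    total = x[0] / math.sqrt(2.0)
    for j, value in enumerate(x[1:], start=1):
        harmonic = (j + 1) // 2            # 1, 1, 2, 2, 3, 3, ...
        if j % 2 == 1:
            total += value * math.sin(harmonic * t)
        else:
            total += value * math.cos(harmonic * t)
    return total

def curve_points(x, steps=200):
    """Return (t, f(t)) pairs on an even grid from -pi to +pi for one entity."""
    ts = [-math.pi + 2.0 * math.pi * k / steps for k in range(steps + 1)]
    return [(t, andrews_curve(x, t)) for t in ts]

# Hypothetical factor scores for two entities on three factors.
entity_a = [0.8, -1.2, 0.3]
entity_b = [0.7, -1.1, 0.4]
curve_a = curve_points(entity_a)
curve_b = curve_points(entity_b)

# Largest vertical gap between the two curves over the plotted range.
max_gap = max(abs(fa - fb) for (_, fa), (_, fb) in zip(curve_a, curve_b))
```

Each entity contributes one curve regardless of how many variates it has, so entities with similar scores produce curves that stay close together over the whole range of t.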

Proceeding by way of illustration, consider the factor scores shown previously in Table 6.6.  These outcomes were transformed into Fourier series curves plotted in Exhibit 6.4.  Most plotted f(t) profiles yield conclusions similar to those derived previously from the profiles in Exhibit 6.3.  For example, in Exhibit 6.4 the FPL(E) and NSP(G) curves are nearly congruent, indicating that these two companies have almost identical scores on the three major underlying factors.  The similarities among the three largest coal-fired companies (AEP(A), COM(C), and SOC(K)) are also evident in their bell-shaped curves which differ markedly from the curves of the other companies.  The natural gas burning companies HLP(F) and OGE(H) also have similar profiles.  The widely differing performances of CON and SCE are also evident.

When there are only a few factors (e.g., the three factors in Exhibit 6.3) there seems to be little advantage in resorting to the more complex Fourier series profiles such as those in Exhibit 6.4.  The Fourier series approach becomes more interesting when the number of factors becomes too unwieldy for a profile analysis on all factors simultaneously.  However, both approaches (e.g., those in Exhibits 6.3 and 6.4) are cumbersome when there are very many entities, e.g., the N=12 profiles plotted in the preceding profile exhibits approach the limit of human ability to visually compare profiles.

 

 


17    D. F. Andrews, "Plots of High Dimensional Data," Biometrics, Vol. 28, March 1972, pp. 125-36.


6.6.6--Geometric Patterns and Plotted Caricatures.  Instead of plotting multivariate data as scatter plots or profile line plots, it is sometimes better to consider other geometric patterns (e.g., triangles, rectangles, etc.) or caricatures (e.g., facial sketches).  It may be particularly advantageous to do so when:

(1) The number of entities (N) is such that profile lines overlap and crisscross so much that entity comparisons are difficult, e.g., previous profile plots of N=12 electric utility companies were difficult to evaluate because of numerous intersecting line segments.

(2) There are discrete qualitative variates under study which can be depicted as varying geometric shapes or caricature components.

There is a limit to how many entities (N) can be depicted or how many variates (M) can be incorporated as features in geometric patterns or caricatures.  In recent years, however, a number of interesting innovations in these areas have arisen, some of which will be illustrated here.

For example, Edgar Anderson proposed the drawing of geometric patterns which he termed "glyphs."18  These were intended primarily for the graphical display of multiattribute discrete variates in biology.  A glyph has a base (or core) with rays pointed upward, where each ray depicts a different attribute.  For example, an attribute having three categories is depicted by Anderson as a ray having three lengths, i.e., zero, medium, and long.

A slightly modified glyph approach is illustrated in Exhibit 6.5.  In this case the standardized variates on x3, x4, x5, and x6 social impact criteria in Table 6.3 are depicted as separate rays (in clockwise order).  Each glyph corresponds to a different electric utility company.  The ray lengths are marked into unit gradations where:

 

The origin on a standardized variate (which is also the mean of a standardized variate) is marked with a "o" on those rays for which companies scored at or above the mean on the criterion in question.

 

 

In Exhibit 6.5 each glyph is plotted in a two-dimensional Euclidean space, where the horizontal axis corresponds to x1 (megawattage) and the vertical axis corresponds to x2 (coal usage) raw data scores from Table 6.2.  Note that the largest coal burning companies (AEP, COM, and SOC) are isolated by themselves in x1 and x2 space.  Smaller companies which also rely heavily on coal (NSP, BGE, and VEP) also cluster by themselves.  Companies which use little or no coal are also clustered on the x1 axis as large (SCE, PGE, and CON), medium (HLP and FPL) and small (OGE).

The net result is that in Exhibit 6.5 multivariate data in six dimensions are plotted in two-dimensional space.  The company glyphs resemble frontal views of wounded biplanes returning from battle.  Performances on the x3, x4, x5, and x6 standardized criteria appear as wings (rays) of varying lengths.  If the origin, "o," is shown on the wing (ray), the company performed at or above the mean on the criterion in question.  The "o" origins resemble engines beneath a wing.  In this context, a company has an "engine" on a wing if it performed at or above the mean performance on that criterion.

In this sense, the "best" performing companies in Exhibit 6.5 are those with the longest wings.  The only company performing above the standardized mean (zero) on all four social impact criteria (and therefore having all four "engines" intact under its glyph wings) is Pacific Gas and Electric (PGE).  Both HLP and OGE are natural gas burning companies which perform at or near the best on three criteria (x3, x4, and x6) but have little or no wing (ray) length on the x5 (R&D) criterion.  Similarly, SCE performs quite well on three criteria but falls slightly below the mean on the x4 (kwh price) criterion.  AEP and NSP are also "three-engine" glyph biplanes, where AEP falls short on x6 (pollution control inadequacy) and NSP falls short on x3 (earnings margin).
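The "engine" rule amounts to counting the criteria on which a standardized performance score is at or above zero.  The Python sketch below uses hypothetical company labels and scores (not the Table 6.3 values), and it assumes every criterion has already been oriented so that a higher score means better performance.

```python
def count_engines(performance_scores):
    """Count criteria with standardized performance at or above the mean (zero)."""
    return sum(1 for z in performance_scores if z >= 0.0)

# Hypothetical standardized performance scores on four criteria.
glyphs = {
    "Company A": [0.4, 1.1, 0.0, 0.7],
    "Company B": [-1.3, -0.2, -0.8, 0.5],
}
engines = {name: count_engines(scores) for name, scores in glyphs.items()}
```

In this made-up example Company A would be drawn with all four engines, whereas Company B would fly on one.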

In contrast, CON barely flies along on its single x6 (pollution control inadequacy) engine whereas FPL limps on its x4 (price per kwh) performer.  Other single-engine glyphs (BGE and COM) have better balance in terms of wing (ray) length on all four criteria in Exhibit 6.5.

Among all the graphic display approaches illustrated thus far, I find the glyph approach quite appealing.  Anderson's glyph rays are plotted according to discrete ordinal scales, although nominal or continuous (as illustrated in Exhibit 6.5) variates may be plotted as glyph rays.  Glyphs may also be used as geometric pattern representations without having to be plotted in Euclidean space.  Anderson recommends no more than seven rays and that rays do not extend in all directions.  He also recommends having no more than three discrete levels for ray length (a recommendation which was not followed in Exhibit 6.5).  Continuous variates may also be transformed into these three discrete ordinal categories.  Multiple rays may be used for more than three categories or complexes of related variates.  Anderson writes:

In attempting to work out complexes of related qualities, the analysis is facilitated if the ray lengths are coded in such a way that all the extreme values characteristic of one complex are assigned long rays and those characteristic of the other are assigned no rays.  For example, in studying hybridization between two subspecies of Campsis, one of the subspecies had a short tube, a wide limb, and much red in the flower; the other had a long tube, a small limb, and little red.  Redness and limb width were coded with long rays for much red and for wide limbs, tube length was coded in reverse with a long ray for short tubes.  This meant that those hybrids closely resembling the other parent as (sic.) a rayless dot.19
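The three-level ray coding, including the reverse coding Anderson describes, might be sketched in Python as follows.  The cutoff of 0.5 standard deviations and the function names are hypothetical choices made for illustration; Anderson does not prescribe them.

```python
def ray_length(z, cut=0.5, reverse=False):
    """Code a standardized score as a ray length: 0 (no ray), 1 (medium), 2 (long)."""
    if reverse:
        z = -z                      # reverse coding: low values receive long rays
    if z < -cut:
        return 0
    if z <= cut:
        return 1
    return 2

def code_glyph(z_scores, reversed_variates=()):
    """Code one entity's standardized scores as a list of ray lengths."""
    return [
        ray_length(z, reverse=(j in reversed_variates))
        for j, z in enumerate(z_scores)
    ]

# Hypothetical entity with three standardized variates; the third is reverse coded.
plain = code_glyph([1.2, -0.1, -0.9])
with_reversal = code_glyph([1.2, -0.1, -0.9], reversed_variates={2})
```

With the third variate reverse coded, its low value receives a long ray, so an entity at one extreme of a complex shows long rays on every attribute while the opposite extreme collapses to a rayless dot.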

For purposes of graphic plotting, the symbols drawn may be triangles, line segments, polygons, or most any caricature imaginable.  One of the most unique caricature plotting ideas is described by Tversky and Krantz.20  They depict alternate sketches of face shape (long versus wide), eyes (empty  versus filled-in), and mouth (straight versus curved) to represent three binary variates in two-dimensional plots.  The facial sketches were then used in a visual perception test of interdimensional additivity, i.e., that overall dissimilarity between faces could be decomposed into additive components represented by varying facial features.

A more extensive and general facial plotting program was apparently developed independently by Chernoff,21 although both Tversky-Krantz and Chernoff utilize elliptical components.  Each variate (initially the computer program developed by Chernoff can handle up to 18 variates, but the program can be modified to accommodate more variates) is represented as a feature (eye shape, eye size, mouth shape, mouth size, etc.) in a computer-sketched face.  Differing values of the variate are distinguished by different sizes and/or shapes of the feature in question.  Each entity is depicted by a particular face whose features are determined by observed values of variates on that entity.  An advantage of facial caricatures over glyph plots is that numerous features can be depicted in faces whereas Anderson found that glyphs with more than seven rays were too cumbersome.

The facial features in Chernoff's original program are listed in Table 6.7.  If there are fewer than M=18 variates under study, either (i) a given variate may be assigned to more than one feature or (ii) certain features may remain fixed.

 

 

For instance, the N=12 entities (electric utility companies) measured on M=4 social impact criteria in Table 6.3 are plotted as faces in Exhibit 6.6.  In this case each of the M=4 variates was randomly assigned to four different facial features, giving rise to 16 features which vary among the N=12 faces plotted in Exhibit 6.6.  The faces have been arranged in two-dimensional Euclidean space on x1 (megawattage) and x2 (coal usage) from Table 6.2, i.e., the exhibit depicts two Cartesian variates and sixteen facial variations determined by x3 (earnings margin), x4 (kwh pricing), x5 (R&D), and x6 (pollution control inadequacy).  Recall that the latter four criteria were also displayed in Exhibits 6.1 and 6.2 in profile charts and Exhibit 6.5 as glyph rays.
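A random assignment of this kind can be sketched in Python as shown below.  The features are identified only by hypothetical index numbers 1 through 18 (the Table 6.7 feature names are not reproduced), and the seed and function name are likewise illustrative; the assignment actually used for Exhibit 6.6 is not recoverable from this sketch.

```python
import random

def assign_features(variates, n_features=18, per_variate=4, seed=0):
    """Randomly give each variate its own set of facial feature indices."""
    rng = random.Random(seed)
    # Sample without replacement so that no feature is driven by two variates.
    chosen = rng.sample(range(1, n_features + 1), per_variate * len(variates))
    return {
        v: sorted(chosen[i * per_variate:(i + 1) * per_variate])
        for i, v in enumerate(variates)
    }

assignment = assign_features(["x3", "x4", "x5", "x6"])
used = {f for features in assignment.values() for f in features}
fixed = sorted(set(range(1, 19)) - used)   # features left at a constant setting
```

With four variates and four features each, 16 of the 18 features vary from face to face and the remaining two stay fixed.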

After plotting the faces, I had a number of students, businessmen (e.g., those who attended my N.A.A. courses on accounting for corporate social responsibility22), and other friends try to match up the faces.  For this purpose the faces were not plotted in Euclidean space on x1 and x2 as they are in Exhibit 6.6 nor was there any indication as to what the faces depicted.  Interestingly, rather consistent partitionings of these N=12 faces into G=5 clusters (groups) emerged from those subjective evaluations.