Classification interaction is empirically demonstrated, and I’m thrilled about that. For the “Big Data” workshop at SIG/CR I proposed a preliminary survey research project in which a sample of the nine million UDC numbers in the WorldCat would be used to match deconstructed components of the UDC expressions to content-designated components of the respective bibliographic records. The purpose was to learn about the interrelationship between a faceted classification and the artifacts it represents. All of the variables (except age of work) were nominal-level, so I used chi-squared tests to look for statistically significant correlations. It was thrilling to find correlations all through the study. Results (and definitions of all of these terms!) are in the paper “Big Classification: Using the Empirical Power of Classification Interaction” in the 2013 SIG/CR Proceedings (or will be). The outcome is preliminary but exciting nonetheless.
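For readers who have not met the test, here is a minimal sketch of a chi-squared test of independence on two nominal variables. It is not the analysis from the paper: the function name `chi_squared`, the category labels, and the counts are all made up for illustration, and it only computes the statistic and degrees of freedom (you would still compare the result against a chi-squared table or a statistics package to get a p-value).

```python
from collections import Counter


def chi_squared(pairs):
    """Pearson chi-squared statistic for a list of (row, column) category pairs.

    Returns (statistic, degrees_of_freedom). Hypothetical helper for illustration.
    """
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)                    # cell counts
    row_totals = Counter(r for r, _ in pairs)    # marginal counts per row category
    col_totals = Counter(c for _, c in pairs)    # marginal counts per column category

    statistic = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n     # count expected under independence
            statistic += (observed[(r, c)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Invented toy data: does a UDC number carry a place facet, and does the
# bibliographic record carry a geographic subject field?
toy = (
    [("place facet", "geographic field")] * 30
    + [("place facet", "no geographic field")] * 10
    + [("no place facet", "geographic field")] * 10
    + [("no place facet", "no geographic field")] * 30
)

stat, dof = chi_squared(toy)
print(stat, dof)
```

With one degree of freedom the usual 0.05 critical value is about 3.84, so a statistic well above that would suggest the two variables are not independent in the toy sample.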
But just when I thought it couldn’t get any better I took one more look at the largest results table and realized it was revealing a network among the correlations. I was therefore doubly thrilled (with some coaching from Laura Ridenour) to be able to create a visualization of that network structure using Gephi 0.8.2. Here is an early version (not the one that appears in the paper):
I haven’t made a new post to this blog in quite a while.
However, I have been sitting at my desk today for seven hours now working on Knowledge Organization. I probably have another seven hours to go to get caught up. Not including editing the next issue.
Just saying ….
About ten days ago there was a breathless story on the evening news about how “more information” appearing on New York City fast food menus was not being used by consumers. Told that sandwich A had 150 calories and sandwich B had 850, they were all buying sandwich B. How could it be, wondered the newscaster, that people overlooked “information”? All of the talking heads interviewed were chefs, consumer advocates, and dietitians.
Not one knowledge organization specialist. Not one commentary on “concepts” of food, or the problems of homonymy and synonymy and meronymy, not one comment on cognition or cognitive overload or navigating network pathways. A missed opportunity I think; we should’ve been right there, commenting.
In last week’s Economist is a story about “Ad scientists” with a headline image that looks an awful lot like some of my WordStat™ visualizations–lots of little boxes with network lines connecting them in pretty colors; all of it cast as a terminological catch in a fishnet. The story begins with the example of what happens when someone searches “tennis balls” using three different search engines. Some of the results are said to be “organic” and others are paid links.
Now, why are no knowledge organization scientists cited in this paper?
How could it be that searchers are thrown off by overload, in which case they turn to the first available organic link (Patrick Wilson’s “relevance as means-to-an-end”; cognitive overload, etc., etc.)?
Apologies before I begin–as I’ve pointed out before, I have my Ph.D. from the University of Chicago, and when I was there it was still the bedrock home of empirical research methods. We were learning to conceive of the applications more broadly all the time, but it was the substrate on which everything else seemed to have been built.
I wish knowledge organization were thus. One of the reasons I have been so engaged in the CIDOC-CRM http://www.cidoc-crm.org/ and FRBRoo http://www.cidoc-crm.org/frbr_inro.html operations has been the empirical basis on which both ontologies are built.
I often teach doctoral seminars in knowledge organization and I always ask the students to produce original research that will contribute to theory. They do, and I’m proud of the work they do. Often when they ask what sorts of things they ought to study I tell them I read The Economist whenever I travel by air, and that I’m always shaking my head as I read about Prof. this and Dr. that and control group this and factorial experiment that. It seems there is substantial research in the world based on empirical premises. I’m always wondering how we in knowledge organization can get on those pages.
Here is an example from The Economist dated June 22, 2013, p. 83 (Safe Driving: Keep your mind on the road) (I was in Portland, Oregon, for my 40th reunion at Lewis & Clark College), about hands-free texting and how it is more distracting than using a mobile phone. Some folks at the University of Utah divided 102 volunteers into three groups and asked them to perform a set of tasks. They wore a hat that recorded mental workload. And among the groups the treatment variable that shifted was what they did–nothing, listening to a radio, phoning a friend, texting, etc. Some sat at computers, some used simulators, and some were in actual motor vehicles. Talk about a factorial experiment! Brilliant!
I’ll leave it to you to discover the results in The Economist. But let’s think about this sort of work in our own domain. It is rare indeed. Notable exceptions include La Barre’s testing of facets in online catalogs and Milonas’ partial replication of it. We have a lot of excellent descriptive research including my own work on instantiation and Greenberg’s replication of it among botanists; and Kipp’s landmark work on social tagging.
Let’s take up the cause of creating more experimental work. Let’s get in The Economist. (The closest I’ve come so far was my study showing that social taggers display a bandwagon effect, which was picked up by the Globe and Mail from a CAIS conference, but they didn’t ever report on whatever it was that attracted their attention.)
Here is a sign I saw recently. It was in a public space and in a country I had never visited before, but then again it was in a university hall, so I can’t really say that I was so culturally shocked that I didn’t comprehend it. Still, I took its picture, didn’t I?
I had a lot of contemplative time that day because I didn’t really speak the language in which most of the discussion was taking place, so although I could read the slides people were showing and sort of follow along, I also had time to let my mind drift. I looked at this set of images, and I laughed a bit to myself and resolved to take a picture when the next break came along. Then I got to thinking about Otto Neurath and his attempt to use visualization to advance human communication, in particular to use images as a sort of universal language. One supposes it is from that impulse that we get the confusing array of icons on the dashboards of new automobiles today. The point is that even simple images, like those shown here, can be confusing.
That brings me back always to phenomenology and the notion of noesis, that humans perceive through ego acts, or, to try to put it more simply, we see new things always through a lens of those things we have experienced in our past. The reason I laughed (not quite out loud) when I looked up at this sign was that I read in my head “no cigarettes, no radios, and no hamburgers.” Well, why not? The cigarette is clear enough I suppose. But to my unfocused gaze that image in the middle looks like the kind of radio we all had when I was a teenager. You’d set it in the sand near your ear so you could listen to it but it wouldn’t bother the other people on the beach, the sound of the surf providing useful cover. And if that isn’t a hamburger on the right I don’t know what it is! Ok, with a large soda, but obviously no fries. Maybe this means “no carnivores”?
Well that’s the majority of my point I think, that we simply cannot take a simple notion of “concept” seriously as a concrete entity because there just is no such thing. All concepts, no matter how simple, are perceived along a zillion personal continua. Knowledge organizations can provide frameworks but precision will always escape us.
Which is why we need to move to faceted systems–not categorized systems, but true facets–that embrace contexts, because it is the contexts that mediate individual perceptions. A faceted KOS that permitted contextual entry first and conceptual second would allow users to gauge the parameters of noetic mediation involved in a given search, or in a given set of assigned semantic concepts. Just for fun, here is the uncropped image. I admit it isn’t the best example; still it shows a column, in fact the top of a column in an industrial structure with cinderblock walls and an air duct there on the ceiling–that makes it relatively clear this is some sort of public space, like a classroom, and that also makes it a bit more clear why those certain things are prohibited.
I know now that thing in the middle is a mobile phone, because they don’t want people chattering. The sandwich and drink on the right probably mean “no eating or drinking” (see, I did get it, after considering the context). Still, it would be more useful to show someone with a full mouth I think and that hash mark across it.
This was in Rio de Janeiro, by the way, at the recent ISKO Brazil conference held at Fundação Getulio Vargas.
I have a Ph.D. from the University of Chicago.
Let me repeat .. ok, never mind … but it does mean that a) I know the difference between a survey, which is what George Washington did, and a questionnaire, which is a survey (!) instrument; and b) I am appalled at the use of the “survey” for commercial interests. Every time I buy something I am asked to “fill out the survey, it will just take a few minutes” (add it all up, it’s hours per day!), and if I dare give someone less than a stellar rating (which I must do, if the rating is to have any meaning) I am punished by being phoned and queried mercilessly.
The phrase is either “I woke up” or “I was awakened.”