I’ve been toying with WordStat™ software from Provalis Research again. It is very useful for the kind of qualitative analysis required in domain analysis. One valuable tool in the content analysis package is a KWIC index. Ancient students of KO will recognize that acronym for “Keyword-in-Context,” a kind of indexing once thought potentially fruitful. Here is an example including three “contexts” for the word “model” from ISKO 13’s proceedings.

A functional model of information retrieval systems
A reference ontology for biomedical informatics: the Foundational Model of Anatomy
Towards a Comprehensive Model of the Cognitive Process and the Mechanisms of Individual Sensemaking

As you see, it is very useful for comprehending the precise context of those big words that show up in the center of word clouds or the foreground of MDS plots.
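For readers who want to try the idea without a content analysis package, here is a minimal sketch of a KWIC listing in Python. It is not how WordStat does it. The function name `kwic` and the `width` parameter are my own illustrative choices, and the input is simply the three titles quoted above.

```python
import re

def kwic(texts, keyword, width=30):
    """Return one line per occurrence of keyword, with the keyword aligned in a column."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            # Up to `width` characters of context on each side of the hit.
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so every keyword starts in the same column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

titles = [
    "A functional model of information retrieval systems",
    "A reference ontology for biomedical informatics: the Foundational Model of Anatomy",
    "Towards a Comprehensive Model of the Cognitive Process and the Mechanisms of Individual Sensemaking",
]

for line in kwic(titles, "model"):
    print(line)
```

Because the left context is padded to a fixed width, the keyword lines up down the middle of the listing, which is what makes a KWIC display easy to scan.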

However, the interesting thing I’ve just learned is that most of the presence of the term “information science” in our domain comes not from the keywords in research papers, but rather from the title of the third most cited journal in our domain, JASIST (forgive me for not spelling it out here and using that term again). Thus it is not that the term is a topic of critical interest; rather, as much as 20% of our research appears in a competing journal.

If our science is going to continue to thrive and grow, our authors need to stop sending their research to competing journals. Better a world in which our journal Knowledge Organization has to split into an A for ontology and a B for epistemology and a C for domain analysis, etc., than one in which the dispersion of our science hinders exploitative power and weakens the scientific structure of our domain.

The Core of Knowledge Organization

I famously wring my metaphorical hands about the number of authors who submit manuscripts to Knowledge Organization reporting research that is topically relevant, but showing absolutely no inculcation in the theories or values of the science of KO. Emotions range from demoralized to furious on these occasions. Fortunately, rational academic policies dictate manuscript acceptance, and in almost all cases we return these errant papers to the authors with instructions to go do their homework. Some of them do, happily.

I am in the midst of a domain analysis of the 75 papers presented at the recent ISKO International Conference in Krakow. The complete results of that analysis will appear in an editorial in a future issue of KO. But the interesting thing I am seeing this time is that there is, indeed, a core of knowledge organization. Seventy-five papers, 1200-some citations, from 20 countries, citing over 400 journal articles, 300 books and 200 anthologies. And yet, most of the citations are to a tightly knit, intellectually coherent core of KO. By far the largest share of journal citations (44%) are to Knowledge Organization, the majority of conference papers cited are in ISKO international conferences or regional chapter conferences, and the most-cited monographs are by Hjørland and Ranganathan.

It is good news that there is such a strong, resilient and theoretically useful core of knowledge organization. The challenge, it seems, is to require those interloping into our topical areas to encounter our theoretical base.

What a concept!

I recently completed a rich analysis of the entirety of American Documentation in order to trace the evolution of the concept of a concept across that era of the growth of the emerging field of information science. I wrote a short paper on the subject for CAIS 2014.

The “abstract” is this: A core entity of information science is the “concept.” Agreement on the basic definition as a mental construct representing a concrete instance conceals divergence in understanding of the nuances. A case study of the domain’s nascent era represented by American Documentation reveals some of the contours of the term’s evolution.

There were lots of fun things to be encountered in those years of AD, and I was going to upload some photos of things like the rapid selector and Termatrex and so on, until I went to do so and found all of those “further reproduction prohibited” notices. Oh well. The whole run is available to ASIST members in the ASIST Digital Library.

I thought it was fascinating to see how interwoven knowledge organization was in those early days of documentation into information science. There was a lengthy evolution of something called “the duality concept,” which was an expression of the dichotomies between known-fact and browsing, between simple and complex terminology, and thus between isolate and hierarchy.

Stay tuned: a lengthy journal article is forthcoming.

Who wrote Aristotle? Boyd Rayward, of Course.

As a KO scholar enamored of what domain analysis can reveal, and unfazed by the challenge of unindexed source material, I spend way too much time manually indexing things like conference proceedings. This always means reformatting and “cleaning” something like 1200 citations at a pop, to get them into some form that can be manipulated or mined for statistical parameters.
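To give a sense of what “cleaning” into a manipulable form can mean, here is a rough sketch in Python. It is not the procedure I actually follow. The pattern `ENTRY` and the function `parse_entry` are hypothetical names, and the sketch assumes every entry is shaped like “Author. Year. Rest of entry.” The first two sample entries are quoted further down in this post; the third is a deliberately malformed one.

```python
import re
from collections import Counter

# Assumed shape: author string, a four-digit year followed by a period, then the rest.
ENTRY = re.compile(r"^(?P<author>.+?)\s+(?P<year>\d{4})\.\s+(?P<rest>.+)$")

def parse_entry(entry):
    """Split one reference into author, year and rest, or return None if it does not fit."""
    match = ENTRY.match(entry.strip())
    if match is None:
        return None
    author = match["author"]
    # Drop the closing period unless it belongs to an initial such as "R."
    if not re.search(r"\b[A-Z]\.$", author):
        author = author.rstrip(".")
    return {"author": author, "year": int(match["year"]), "rest": match["rest"]}

references = [
    "Wilson, Patrick. 1978. Two kinds of power: an essay on bibliographical control. Berkeley: UC Press. Reprint of 1968 ed.",
    "Otlet, Paul. 1990. The science of bibliography and documentation. In Rayward, W. Boyd, ed., International organisation and dissemination of knowledge: selected essays of Paul Otlet. Amsterdam: Elsevier, pp. 71-86.",
    "Aristotle. 350BCE. Metaphysics.",
]

parsed = [parse_entry(reference) for reference in references]
by_year = Counter(item["year"] for item in parsed if item is not None)
needs_review = [ref for ref, item in zip(references, parsed) if item is None]

print(by_year)
print(needs_review)
```

Anything that does not fit the pattern lands in `needs_review` rather than being guessed at, which is the point: a script can sort the tidy entries, but the errant ones still need a human looking at the source.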

It’s a wonder I don’t get whiplash from all the shaking of my head that goes on during these sessions. Of course, as a journal editor I experience a lot of errant citing practice as well. At least in that case I have the prerogative to require the author to do it over and get it right.

I remember when I was a master’s student at Indiana University a zillion years ago; one of my scholarly mentors explained to me how to prepare notes when engaging in literature review. She said, open the book and place it on the table at the right of your typewriter. Put a new sheet of paper in the typewriter and before you do anything else type out the elements of the citation for the book. Then you’ll always have them. And then, as you read along (instead of highlighting or underlining, which not only destroys the text, but which you can never find again anyway), as you come to something interesting, type the page number and then just type out the text. When you’re finished, you’ve got block quotes or potential paraphrases ready to insert into your analysis, together with the appropriate material for text references. Now, I don’t expect everyone reading this to rush out and buy typewriters, but I do commend the method to you. It has served me well for decades.

What is not appropriate is to just cite willy-nilly to show you did some searching. And what is just plain wrong is to cite from online citations instead of directly from the source material. (Note to authors: we know when you’ve just plopped citations in from citation databases or from software because when we convert your text to edit it all of the citations either disappear, or, they become URLs and we can’t edit without opening hundreds of windows. Don’t do that!)

A key point to keep in mind is that the purpose of a citation is the same as the purpose of a precise methodology and that is replication. Another scholar should be able to follow your path by finding the sources you cite, precisely.

So I won’t tell you what I’ve just been indexing so as not to embarrass anybody (not that you won’t be able to guess), but here are some of the more interesting things I discovered:

§Aristotle. Aristotle is important to knowledge organization, I give you that. But nothing he wrote is likely your actual source. This resource: is cited as:

Aristotle. 1994. Metaphysics, trans. W.D. Ross. The Internet classics archive. Cambridge: MIT.

Why? The date is the date of publication of the resource, not the date of writing. “Aristotle. 350BCE” is not an appropriate reference.

§OCLC is not an author. Well, usually. You don’t have to cite OCLC if all you’ve done is make reference to it in your text. Let’s say you’ve written “Often, in bibliographic utilities like OCLC’s WorldCat, [blah, blah, blah ....].” That does not require a citation in the reference list. What you can do, although even this is not really necessary, is place the URL in parentheses in the text: “Often, in bibliographic utilities like OCLC’s WorldCat (, [blah, blah, blah ....].” Don’t litter your reference list with URLs of websites from which you have not cited or paraphrased. On the other hand, if you are citing something specific, then please follow the general instructions for doing so.

§Reprints should be described using their own details of publication. Here is the classic example from my own writing: Wilson, Patrick. 1978. Two kinds of power: an essay on bibliographical control. Berkeley: UC Press. Reprint of 1968 ed.

§Works by classic authors, contained in anthologies, are described as chapters in the books in which they appear: Otlet, Paul. 1990. The science of bibliography and documentation. In Rayward, W. Boyd, ed., International organisation and dissemination of knowledge: selected essays of Paul Otlet. Amsterdam: Elsevier, pp. 71-86.

(Note, it is very impressive that you know this was written in 1903, but the date for the citation is the date of publication of the resource in which you read the material.) (Note 2, just think how Boyd Rayward, a really nice guy, would feel seeing his name next to a 1903 publication date!)

I’m thinking it would be fun for a doctoral seminar to give them this particular set of citations and give them fifteen minutes to figure out what the real citations should have been so they can actually lay hands on the resource. Hmmmmm.


It is appalling how many manuscripts we receive for review for Knowledge Organization that are about things like ontologies and taxonomies and domain analyses, and that cite absolutely no literature from the domain of knowledge organization.

Usually my first intuitive reaction is to think the authors simply were negligent in submitting their siloed papers to us without checking that our journal is published by a scientific society that might expect its own science to be used. Sometimes I have a second intuitive reaction that the authors are so siloed they do not even know that domains other than their own exist and have their own literatures. I suppose both of these are true to some extent.

Lately I have come to see that there is increasingly no connection–no synthesis, no syndesis, not even any syncopation–in the evolution of theory. I think this has something to do with the habits of researchers to conduct so-called literature reviews online using Google Scholar, or worse just Google alone, and never bothering even to go to the many multi-disciplinary indexing services available online through most research libraries (this ought to be demonstrable empirically; perhaps one could take a random sample of published articles and actually search for relevant literature? Never mind that this is the responsibility of peer-reviewers!). Internet resources usually provide something quasi-relevant (remember Patrick Wilson’s excoriation that relevance often means “satisfactory”?–see Two Kinds of Power), enough to fill out the tiny tweet-like excuses for paragraphs most people manage to type these days. But this is no proper approach to science.

Theory requires connection and connection requires sequence in human thought. In order to make sense of an empirical observation all of the science available that can be brought to bear must be connected. To move that empirical observation forward as an hypothesis, or to move the hypothesis forward as a theory requires that observations be classified cumulatively. It all requires “syn”–synthesis, syndesis, syncopation.

If either of the people reading this blog is considering contributing to the science of knowledge organization, let them hie at once to the ISKO website and use the powerful new KO literature search tool. While they’re at it, let’s urge them to go to the ISKO members’ portal at Ergon-Verlag, where they now can find KO from 1993 to the present and AIKO from 2006 to the present (and soon will find the entire backlog).

A little mystery

Accuracy in all aspects of scholarship is critical. It seems increasingly to me, as a journal editor, that authors are taking less care with citations than ever before. It’s a bit like what we hear about pilots getting lax because they know their planes have autopilot—authors no longer make extensive files of source publications because they can view an abstract online with a couple of clicks and use one or another citation service to get automatic citations. One problem for another time is how this seems to lead to ritual citation. But more to the point of this post, it leads to errant citations, if the author is pasting from a citation service (or worse, from another paper whose author pasted it, etc., etc.) rather than keying a citation from a source document. Of course, the story I’m about to tell might just not have anything to do with any of this; I’ve no way of knowing how this happened.

When we prepare an issue of Knowledge Organization for publication we do several things that involve cross-checking for accuracy. One of them is verifying all of the citations in the text and the accompanying references in the reference list. Sometimes, despite having three different people working on this (as a cross-check, of course) something will slip through the cracks and we’ll find ourselves at the twelfth hour having to hold up production because a mystery develops. This one had to do with a citation. The issue was ready for press and we realized nobody had answered the question about what this abbreviated citation really was for:

Ranganathan, S. R. 1967. Areas for research in library and information science (development of library science. 6). Library science 4: 235-93.

Immediately one question was obvious, and that was why there was something like a series statement in the title portion of a journal article citation. I asked my colleagues to verify the citation and was told nothing like that could be found anywhere. We all tried looking it up in various ways. It seemed very curious that we could not find this citation online (but then again, 1967 was eons ago in digital journal time). It also was not possible to locate any journal with exactly the title Library Science from this period.

I decided to search the catalog of the library at the University of Illinois at Urbana-Champaign. I used to work there years ago and I knew the collection was nearly exhaustive in information science. Also, UIUC is relatively nearby, so it would be possible to actually go there or send someone (or beg someone there) to look at the source if necessary. What I found in their online catalog was a journal called Library Science With a Slant to Documentation, published in India by SRELS (Sarada Ranganathan Endowment for Library Science) beginning in 1964 and ending in 1999, all of which seemed promising. However, I could not find a digitized copy of this journal anywhere by searching online. Volume 4 was dated 1967, but there was no explanation for the odd series statement, and there was no way to find a table of contents for the journal online. (I thought briefly of those halcyon days when long tables full of bound periodical indexes were at my fingertips, with citations stretching back more than a century; and the closed stacks of bound volumes were just through that little door over there ….)

I decided to turn to our ISKO colleagues by placing a notice on ISKO-L. Within a few hours I had several responses from around the world, acknowledging that we had found the correct title, and apparently the citation had employed a formerly standard title abbreviation. Paper copies of the journal were located. And even more oddly, European colleagues were able to find the digitized article online using Google. Now, why couldn’t we do that from the U.S.? I also heard from others in the U.S. who couldn’t find it online! How bizarre!

The next mystery arising concerned the phrase “library and information science,” because several people pointed out that Ranganathan would not have used that expression. Eventually a copy of the article was received from Kothi Raghavan; I’ll reproduce the first page here:


Sure enough, there is a series statement in parentheses within the title, and the title does not say “and information science” and the journal title is Library Science With a Slant to Documentation.

The upshot is there were at least three inaccuracies in the original citation, so it was a good thing we chased it down rather than creating a bibliographic ghost by publishing it in erroneous form. But it also was a lesson in the pitfalls of relying too heavily on our digitized sources alone. As I tell my doctoral students, who inevitably groan and refuse to believe me, a scholar has to look at the actual sources to verify their veracity.

The mystery was resolved and the correct citation appeared in Knowledge Organization. Thanks to Kathryn La Barre, Gerhard Riesthuis, Thomas Dousa, Vivien Petras, Joe Tennis, F.J. Devadason and Kothi Raghavan for helping resolve this little mystery.

And remember, apparently, caveat emptor applies to citations.

Doubly thrilled

Classification interaction is empirically demonstrated, and I’m thrilled about that. For the “Big Data” workshop at SIG/CR I proposed a preliminary survey research project in which a sample of the nine million UDC numbers in WorldCat would be used to match deconstructed components of the UDC expressions to content-designated components of the respective bibliographic records. The purpose was to learn about the interrelationship between a faceted classification and the artifacts it represents. All of the variables (except age of work) were nominal-level, so I used Chi-squared to look for statistically significant correlations. It was thrilling to find correlations all through the study. Results (and definitions of all of these terms!) are in the paper “Big Classification: Using the Empirical Power of Classification Interaction” in the 2013 SIG/CR Proceedings (or will be). The outcome is preliminary but exciting nonetheless.
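For anyone unfamiliar with the test, here is a minimal sketch of a Chi-squared test of independence on a table of counts for two nominal variables. The counts are invented for illustration and are not from the study. The function names are my own, and the critical value 3.841 is the standard cutoff for one degree of freedom at the 0.05 level.

```python
import math

def chi_squared(table):
    """Return the Pearson chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

def cramers_v(table):
    """Effect size between 0 and 1 derived from the chi-squared statistic."""
    statistic, _ = chi_squared(table)
    n = sum(sum(row) for row in table)
    smaller_dim = min(len(table), len(table[0])) - 1
    return math.sqrt(statistic / (n * smaller_dim))

# Invented counts: rows and columns are categories of two hypothetical nominal variables.
observed = [
    [30, 10],
    [10, 30],
]

statistic, dof = chi_squared(observed)
CRITICAL_DOF_1_ALPHA_05 = 3.841
print(statistic, dof, statistic > CRITICAL_DOF_1_ALPHA_05)
print(cramers_v(observed))
```

The statistic only says whether an association is unlikely to be chance; a measure such as Cramér's V is one common way to say how strong it is.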

But just when I thought it couldn’t get any better I took one more look at the largest results table and realized it was revealing a network among the correlations. I was therefore doubly thrilled (with some coaching from Laura Ridenour) to be able to create a visualization of that network structure using Gephi 0.8.2. Here is an early version (not the one that appears in the paper):
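For the curious, here is a sketch of how a table of test results can be turned into an edge list that Gephi can import as a spreadsheet. This is not a record of the steps I followed. The variable names, statistics and p-values are invented, and the function names are hypothetical. The `Source`, `Target` and `Weight` column headers follow Gephi's conventions for edge tables.

```python
import csv
import io

def significant_edges(results, alpha=0.05):
    """Keep one (source, target, weight) edge per pair whose test was significant."""
    return [(a, b, stat) for a, b, stat, p_value in results if p_value < alpha]

def write_edge_list(edges, handle):
    """Write the edges as CSV with the column headers Gephi looks for."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)

# Invented results: (variable A, variable B, chi-squared statistic, p-value).
results = [
    ("udc_main_class", "language", 41.2, 0.001),
    ("udc_main_class", "format", 18.7, 0.01),
    ("language", "format", 1.3, 0.25),
]

buffer = io.StringIO()
write_edge_list(significant_edges(results), buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example self-contained; in practice you would open a file with `newline=""` and pass that handle instead.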


