Archive for November 2010

CRM as footprints of knowledge (originally posted 1/24/2009)

In the exercise of developing the CIDOC-CRM it became apparent that using the ontology to map information objects would reveal certain patterns of entities, properties, and relationships.
Furthermore, these patterns, when analyzed, reveal essential footprints of information objects. That is, like a genome, a CRM mapping records the essential informative properties of mapped
objects. The region for research here is pure theory. What categories can be observed among mapped information objects? When is a sailor’s deck-log like a terracotta hut urn? I have constituted a research team and with a very small grant from Long Island University we have begun developing techniques for mapping, and a calculus for data-mining the maps in order to generate clusters (or classes) of information objects. We had one poster at the ISKO conference this summer (“Classifying Information Objects: An Exploratory Ontological Excursion,” by Sergey Zherebchevsky, Nicolette Ceo, Michiko Tanaka, David Jank, Richard P. Smiraglia, and Stephen Stead. Poster presented at 10th International ISKO Conference, Montreal, 5-8 August 2008). The poster can be seen here (or have a look at these pdfs from ISKO and ASIST).

mining_asist08

Classify_ISKO10

Posted November 18, 2010 by lazykoblog in cultural heritage, epistemology


Bandwagons (originally posted 7-10-2010)

I suspect in the year 2010 the bandwagon is so old a metaphor few remember what they actually were like. Originally it was a wagon that carried a band in a parade. All of the recent meanings of the term derive from this, because the music is the fun thing in the parade that makes spectators want to follow along or climb aboard. Oh well.

As I read study after study of social tagging I began to wonder about the behavior of taggers. Most studies have demonstrated various properties of the tags themselves, and several studies have suggested tagging is some sort of egalitarian indexing-for-the-masses that would be ever so much more useful if the taggers would just stick to a thesaurus. But I considered both of those assumptions unlikely. For one thing, if you inhabit a social networking site just enough to watch the tags go by out of the corner of your eye each day you see a surprising number of them that are self-centered expressions (not just “todo” though there is plenty of that, but also “wtf” and so forth). Also, again watching out of the corner of your eye, the really fascinating thing about the tags is the network of associations among them–in other words, what happens if you click on one, and then when you get to that destination click on the first one there, and so on–you’ll not be following any road that a thesaurus would have led you along (stay tuned for a blog entry about my work at VKS with Wikipedia). There was a lot of discussion about the difference between the main tags and the little ones populating the outer corners of those tag clouds as well, and that reminded me of the problem of noesis, which is the ego-act of perceiving through one’s own experience–this is a hallmark of Husserl’s phenomenology.

I designed a study of tags as exploratory, with the purpose of surveying the tags assigned to a random sample of sites in Delicious.com. I wanted to compare what I would find to prior studies to see whether there was any theoretical potential (there was), and then subsequently to analyze the behavior of the taggers to look for noetic behavior. I submitted an abstract to this effect to the 11th International ISKO Conference in Rome, and also I drew my sample all in one day. I based a sample-size calculation on prior studies’ figures about the proportion of affective tags, and then in my enthusiasm drew twice as many cases (sites) as I needed for 95% confidence. I was excited to get my feet wet with this kind of research. I’m glad I drew the sample manually so I could watch the data as I downloaded the sites and their taggers and their tags. But now I know why people use crawlers for this! My abstract was accepted, and along with it came some helpful referee comments, which sent me to the literature of cognitive linguistics. Bear with me, I was on a learning curve here.
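
For readers who want to see what a sample-size calculation of this kind looks like, here is a minimal sketch using the standard formula for estimating a proportion, n = z² · p(1 − p) / e². The proportion and margin of error below are placeholders I have assumed for illustration; the post does not state the figures actually used, and the function name is hypothetical.

```python
import math

def sample_size_for_proportion(p, margin, z=1.96):
    """Cases needed to estimate a proportion p within +/- margin.

    z = 1.96 corresponds to 95% confidence. The result is rounded up.
    """
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))

# ASSUMED inputs, not taken from the study: a prior estimate that 10% of
# tags are affective, and a tolerated margin of error of 5 points.
assumed_p = 0.10
assumed_margin = 0.05

needed = sample_size_for_proportion(assumed_p, assumed_margin)
drawn = 2 * needed  # "twice as many cases as I needed"
print(needed, drawn)
```

Doubling the sample, as described above, narrows the margin of error by a factor of about 1.4 (the square root of 2) rather than halving it.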

For the conference in Rome I wrote a summary paper about the behavior I observed among the taggers. I discovered plenty of noetic behavior, and interestingly enough, although I was able to affirm the proportion of affective tags–the figure from my study fell within the confidence interval of the prediction from prior studies–the surprise was that the noetic tagging was not affective tagging. I also analyzed the entire sample to see what I could learn about co-tagging–in other words, which taggers were tagging together, and here was my first surprise. A substantial core of the taggers were, in fact, all focused on work on the same sites, and their co-tagging was nested in two clusters, which I was able to identify roughly as web designers and programmers (remember, we’re talking about Delicious.com); the web designers’ tags were descriptive and the programmers’ tags were slightly more likely to be affective.
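
The raw material for a co-tagging analysis like this is a count of how often each pair of taggers shows up on the same site. The sketch below shows one simple way to build those counts. It is only an illustration: the tagger and site names are invented, the function name is hypothetical, and the clustering step that would follow is not shown.

```python
from collections import defaultdict
from itertools import combinations

def cotagging_counts(bookmarks):
    """Count, for each pair of taggers, how many sites both of them tagged.

    bookmarks is an iterable of (tagger, site) pairs.
    """
    taggers_by_site = defaultdict(set)
    for tagger, site in bookmarks:
        taggers_by_site[site].add(tagger)

    counts = defaultdict(int)
    for taggers in taggers_by_site.values():
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(taggers), 2):
            counts[(a, b)] += 1
    return dict(counts)

# Invented example data, not from the Delicious.com sample.
bookmarks = [
    ("ann", "site1"), ("bob", "site1"), ("cy", "site1"),
    ("ann", "site2"), ("bob", "site2"),
    ("dee", "site3"),
]
pairs = cotagging_counts(bookmarks)
print(pairs)
```

Pairs with high counts are the candidates for a tightly knit core; a tagger who never shares a site with anyone (like "dee" here) simply does not appear.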

All of this convinced me I had figured it backwards–the noetic behavior was not the weird stuff in the long tail, but rather was the common ego-act perceptions of the tightly-knit group of co-taggers. In other words, here was a group of taggers all leaping on a bandwagon and in so doing classifying their commonly tagged sites with some very specific and (for taggers) relatively precise terminology. Here is a slide from the PowerPoint presentation of that paper. On the left you see the clusters of taggers, and on the right their tags. The point was that most of those tags could be seen as semantically related to two conceptual clusters–noesis as bandwagon effect. The paper is available in the ISKO Proceedings of the conference at Rome (Richard P. Smiraglia “Perception, Knowledge Organization, and Noetic Affective Social Tagging” pp. 64-7) but here is the abstract:

Knowledge organization can be postulated as existing on a continuum between classificatory activity and perception. Studying perception and its role in the identification of concepts is critical for the advancement of knowledge organization. The purpose of this research is to advance our understanding of the role of perception in knowledge organization systems. We briefly review the role of perception in knowledge organization and some preliminary evidence about affective social tagging, which is seen as a form of everyday classification. We consider how Husserlian phenomenology might be useful for analyzing the role of perception in affective social tagging. Finally, preliminary results of an empirical study are reported.

Because this was for ISKO I was intentionally focussed on the KO issue, which I here stated as a continuum between classificatory activity and perception. I gave a paper on noesis at ISKO in Montréal as well (scroll down, it’s the mailbox paper). I think that we think too often that classification is about putting things in little boxes, and therefore that we think too little about how fuzzy are the boundaries of those boxes. So here is just a glimpse at that issue.

As I said, the referees had sent me to cognitive linguistics, and I found particular resonance in the writing of Ronald Langacker (Langacker, Ronald W. 2005. Dynamicity, fictivity, and scanning: the imaginative basis of logic and linguistic meaning. In Pecher, Diane and Rolf A. Zwaan eds., Grounding cognition : the role of perception and action in memory, language, and thinking. Cambridge : Cambridge Univ. Pr., pp. 164-97). Scanning is the linguistic activity in which a kind of shorthand is used to project a landscape on which perceived activity is taking place; it results in “fictive” or at least unfactual language, but common understanding allows and even encourages this. Here’s a PowerPoint slide from my presentation at CAIS in Montreal in June.

The example is the phrase “my teacher’s books keep getting longer.” What is meant is that each time the teacher writes a book (or, one supposes, even buys a book) it is longer than the last. But that isn’t what was said at all, and obviously the idea that the teacher has a stack of books that is somehow stretching is absurd. It seemed likely that some of the variation in tagging might be due to scanning.

I wanted to complete the statistical analysis of the data and to present a fuller account of the study apart from the philosophical issue of noesis, so I submitted an abstract to CAIS for this year (2010); that abstract was accepted. To my chagrin, instead of the typical complete CAIS-paper, this year someone had decided to allow only what they called “extended abstracts,” which gave one precious little space. Nevertheless, I gave a presentation during the conference, and the “extended abstract” (Smiraglia, “Self-Reflection, Perception, Cognitive Semantics: How Social is Social Tagging?”) is in the proceedings, here: http://www.cais-acsi.ca/proceedings/2010/CAIS055_Smiraglia_Final.pdf.

This was pretty exciting for a couple of reasons. One was that the Globe and Mail got wind of it and kept asking for more text. Unfortunately all I had was the extended abstract, which must not have been enough because I never saw myself quoted. Still, for a moment there I was flirting with the thrill of being reported in the press. As I say often, oh well.

The research itself was exciting enough however. The fictive scanning was there, although once again in small proportions–less than 1% of the total. But more important was the extension of this notion of social classification. It turned out that all of the sites in the study had clusters like those we saw above. In fact, most of the tags were somehow or other associated with the bandwagon effect. There were typically 4 or 5 clusters per site, 2/3 of the tags fell into the clusters, and 1/2 of the tags fell into the two largest clusters. Voila, classification that is social.
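
Proportions like these are easy to compute once each tag on a site has been assigned to a cluster (or to none). Here is a minimal sketch; the tags and cluster labels are toy data I made up so that the arithmetic mirrors the fractions reported above, and they are not the study's data. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def cluster_shares(assignments):
    """Return (share of tags in any cluster, share in the two largest).

    assignments maps each tag to a cluster label, or None if unclustered.
    """
    total = len(assignments)
    sizes = Counter(c for c in assignments.values() if c is not None)
    clustered = sum(sizes.values())
    top_two = sum(n for _, n in sizes.most_common(2))
    return Fraction(clustered, total), Fraction(top_two, total)

# Toy data for one imaginary site: 12 tags, three clusters, 4 stragglers.
assignments = {
    "css": "A", "design": "A", "layout": "A", "webdesign": "A",
    "code": "B", "programming": "B",
    "howto": "C", "tutorial": "C",
    "todo": None, "wtf": None, "cool": None, "later": None,
}
in_clusters, in_top_two = cluster_shares(assignments)
print(in_clusters, in_top_two)
```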

I really want to go make a cup of tea but I suppose I should finish with the conclusions, which were:

The taggers collectively are generating a classification with a social basis.

Also, the clusters are not mutually exclusive, demonstrating that a natural classification is not necessarily either hierarchical, or mutually exclusive. But it does remain collectively, potentially, exhaustive.

Warrant becomes a new issue in such a classification, because there is no accountable literary warrant—rather warrant is cultural (as Beghtol predicted).

Those look like some interesting hypotheses for future research to me.

I suppose I should write this up for a journal. But what I really want to do now is look for the same effect on more social social-networking sites.

Two Kinds of Power (originally posted 1-24-2009)

Inspired by encountering quotations from Two Kinds of Power in conference papers last summer, I undertook an analysis of the domain defined by those who cite Wilson’s famous book. The paper is to be presented at the 2007 conference of the Canadian Association for Information Science/L’Association canadienne des sciences de l’information: Among Patrick Wilson’s most influential books was Two Kinds of Power, which has influenced scholars in information science, and particularly in knowledge organization. Tools of domain analysis are used to analyze the corpus of literature that cites Two kinds of power. Aboutness and relevance are demonstrated keys to this specialization.

The Proceedings are here: http://www.cais-acsi.ca/2007proceedings.htm. I am really quite fascinated by the concept that author co-citation analysis gives us a picture of symbolic interaction. That is, that what we see is how the scholarly community perceives intellectual connections among the co-cited authors. As I mentioned only briefly in the paper, there seem to be clear social networks in the map that focus on the lineage of dissertation advising. For my presentation I added a final slide using this quotation from Two Kinds of Power (p. 132): “Let us imagine a Supreme Bibliographical Council, whose task it was to evaluate the bibliographical situations ….” I decided that’s what we’re looking at here. Marcia Bates is the “chief justice” and there are two parties, one in IR represented by Belkin and Saracevic, and another in KO dominated by Hjorland, with Howard White as the swing vote. Well, it’s a metaphor ….

Fascinating to see Åström’s paper in JASIST 58(7): 947-57, in which he finds informetrics and ISR stable but that user-oriented and experimental IR research have merged into one field–ISR. This is comparable, I think, to my finding that “aboutness” was a historical node but has given way to IR and KO. Interesting ….

CYSWIK (originally posted 1-24-2009)

I attended the workshop Can You See What I Know? (http://cyswik.blogspot.com/) presented by the Virtual Knowledge Studio in Amsterdam (http://www.virtualknowledgestudio.nl/). If you have time to watch the videos you can see me eating (tuna-salad sandwich) in the first one, and in the second one I actually get to talk about disturbance as a catalyst for knowledge acquisition. At least, that’s what I meant to say.

CYSWIK was a remarkable two-day event, bringing together artists and scientists and humanists for discussion and learning. It was a socially difficult activity because of the different vocabularies and modes of thought across the domains. But I think we all learned an immense amount, about ourselves especially.

A brainstorm with Charles van den Heuvel of the VKS (http://www.virtualknowledgestudio.nl/staff/charles-van-den-heuvel/) will lead to a collaboration on what we are so far calling an “idea collider.” I hope this will incorporate the ontological footprint technique. Stay tuned for updates.

Posted November 17, 2010 by lazykoblog in interdisciplinarity


rant (originally posted 6-19-2009)

not dendogram … “Dendro” from the Greek “dendron” for tree ….

although I was tempted to say: “Dendro” named for its inventor Sir Richard Dendro the famous Irish physicist. Let’s see if this gets into Wikipedia.

Posted November 17, 2010 by lazykoblog in domain analysis


Music Information Retrieval (originally posted 1-24-2009)

The relatively new domain of Music Information Retrieval or MIR is a rapidly evolving, technology-driven recent entrant on the information retrieval scene. Generated by information scientists, computer scientists, engineers, mathematicians, and musicologists, among others, the domain has contributed new systems for automatic storage and retrieval of music. Mapping the domain is itself a fascinating business. Recently I asked “Music Information Retrieval: An Example of Bates’ Substrate?” in a paper for the Canadian Association for Information Science/L’Association canadienne des sciences de l’information. This is the abstract:

Bates suggested that the intrinsic unity of information science lies in ‘substrate’-the properties of information and its transmission. Music Information Retrieval (MIR), and ISMIR annual conferences offer a rich panoply of intellectual and cultural diversity. We map the evolution of MIR using conference papers from 2000 through 2005. Results indicate tight thematic coherence in the domain around the problems of information retrieval and classification, and the locus of most research within computer science departments.

The paper is available here: http://www.cais-acsi.ca/search.asp?year=2006.

Author co-citation analysis was also revealing: indicat[ing] tight thematic coherence in the domain around the problems of information retrieval and classification, and the locus of most research within computer science departments. Citation practice indicates the habits of a hard science. Author co-citation within the domain is abundant, J. Stephen Downie is clearly the founding focal point, but the domain is very focused, reinforcing the notion of a tightly-packed, emerging and continuously successful domain. ACA data from outside the domain provides an interesting comparison; watch for another paper soon.

UDC (originally posted 1-2-2010)

This post is courtesy of Aida Slavic (“Aida Slavic”<aida@acorweb.net>)

Hi,

The UDC Summary of around 2,000 classes has been online since October 2009 and can now be browsed in ten languages at
http://www.udcc.org/udcsummary/php/index.php (English, German, Dutch, French, Spanish, Russian, Swedish, Croatian, Slovenian, Finnish)

The UDC summary is fully aligned with the UDC MRF 2009 which is going to be released in the following months. This set is made available for free
use under the Creative Commons Attribution Share Alike 3.0 license (CC-BY-SA).

The work is very much ‘in progress’. We are adding language data and updates as we speak and changes will be visible on a daily basis.
Captions in all languages appear first and then scope notes, application notes and examples of combinations are added as updates progress.

The effort put into the UDC Summary is entirely voluntary including the programming support, the work of our language editors and translators
for which we are most grateful. Read more at the UDC blog <http://universaldecimalclassification.blogspot.com> or at the UDC Summary webpage.

Contributions and feedback are invited

Kind regards
Aida

**************************************************
Of course, Otlet saw the UDC as the classification that would underpin all of his other ideas. Where some utopians saw brilliant cities shining on hilltops Otlet saw the interweaving of the structure of knowledge and this mechanism that could approach its explanation and yield further insight. And here it comes now.

Posted November 17, 2010 by lazykoblog in classification
