Archive for November 2010

CRM as footprints of knowledge (originally posted 1/24/2009)

In the exercise of developing the CIDOC-CRM it became apparent that using the ontology to map information objects would reveal certain patterns of entities, properties, and relationships. Furthermore, these patterns, when analyzed, reveal essential footprints of information objects. That is, like a genome, a CRM mapping records the essential informative properties of mapped objects. The region for research here is pure theory. What categories can be observed among mapped information objects? When is a sailor’s deck-log like a terracotta hut urn? I have constituted a research team and with a very small grant from Long Island University we have begun developing techniques for mapping, and a calculus for data-mining the maps in order to generate clusters (or classes) of information objects. We had one poster at the ISKO conference this summer (“Classifying Information Objects: An Exploratory Ontological Excursion,” by Sergey Zherebchevsky, Nicolette Ceo, Michiko Tanaka, David Jank, Richard P. Smiraglia, and Stephen Stead. Poster presented at 10th International ISKO Conference, Montreal, 5-8 August 2008). The poster can be seen here (or have a look at these pdfs from ISKO and ASIST).

mining_asist08

Classify_ISKO10

Posted November 18, 2010 by lazykoblog in cultural heritage, epistemology


Bandwagons (originally posted 7-10-2010)

I suspect in the year 2010 the bandwagon is so old a metaphor that few remember what bandwagons were actually like. Originally it was a wagon that carried a band in a parade. All of the recent meanings of the term derive from this, because the music is the fun thing in the parade that makes spectators want to follow along or climb aboard. Oh well.

As I read study after study of social tagging I began to wonder about the behavior of taggers. Most studies have demonstrated various properties of the tags themselves, and several studies have suggested tagging is some sort of egalitarian indexing-for-the-masses that would be ever so much more useful if the taggers would just stick to a thesaurus. But I considered both of those assumptions unlikely. For one thing, if you inhabit a social networking site just enough to watch the tags go by out of the corner of your eye each day you see a surprising number of them that are self-centered expressions (not just “todo” though there is plenty of that, but also “wtf” and so forth). Also, again watching out of the corner of your eye, the really fascinating thing about the tags is the network of associations among them–in other words, what happens if you click on one, and then when you get to that destination click on the first one there, and so on–you’ll not be following any road that a thesaurus would have led you along (stay tuned for a blog entry about my work at VKS with Wikipedia). There was a lot of discussion about the difference between the main tags and the little ones populating the outer corners of those tag clouds as well, and that reminded me of the problem of noesis, which is the ego-act of perceiving through one’s own experience–this is a hallmark of Husserl’s phenomenology.

I designed a study of tags as exploratory, with the purpose of surveying the tags assigned to a random sample of sites in Delicious.com. I wanted to compare what I would find to prior studies to see whether there was any theoretical potential (there was), and then subsequently to analyze the behavior of the taggers to look for noetic behavior. I submitted an abstract to this effect to the 11th International ISKO Conference in Rome, and also I drew my sample all in one day. I based a sample-size calculation on prior studies’ figures about the proportion of affective tags, and then in my enthusiasm drew twice as many cases (sites) as I needed for 95% confidence. I was excited to get my feet wet with this kind of research. I’m glad I drew the sample manually so I could watch the data as I downloaded the sites and their taggers and their tags. But now I know why people use crawlers for this! My abstract was accepted, and along with it came some helpful referee comments, which sent me to the literature of cognitive linguistics. Bear with me, I was on a learning curve here.
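For readers who want to see the arithmetic behind a sample-size calculation of this kind, here is a minimal sketch. It assumes the standard normal-approximation formula for estimating a proportion, n = z²p(1−p)/e²; the post does not say which formula was actually used. The proportion (0.2) and margin of error (0.05) are hypothetical placeholders, not figures from this study or from the prior studies.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, margin, confidence=0.95):
    """Cases needed to estimate a proportion p within +/- margin.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical inputs: 20% affective tags expected, +/- 5 points tolerated
needed = sample_size_for_proportion(p=0.2, margin=0.05)
drawn = 2 * needed  # "twice as many cases as I needed"
print(needed, drawn)
```

Doubling the sample, as described above, narrows the margin of error by a factor of about 1.4 (the square root of 2) rather than halving it.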

For the conference in Rome I wrote a summary paper about the behavior I observed among the taggers. I discovered plenty of noetic behavior, and interestingly enough, although I was able to affirm the proportion of affective tags–the figure from my study fell within the confidence interval of the prediction from prior studies–the surprise was that the noetic tagging was not affective tagging. I also analyzed the entire sample to see what I could learn about co-tagging–in other words, which taggers were tagging together, and here was my first surprise. A substantial core of the taggers were, in fact, all focused on work on the same sites, and their co-tagging was nested in two clusters, which I was able to identify roughly as web designers and programmers (remember, we’re talking about Delicious.com); the web designers’ tags were descriptive and the programmers’ tags were slightly more likely to be affective.
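One way to look for taggers who are "tagging together" can be sketched as follows. This is only an illustration, not the method used in the study, which the post does not describe: it counts how many sites each pair of taggers has in common and then groups taggers linked by at least `min_shared` shared sites. The function names, the threshold, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_tagging_counts(taggers_by_site):
    """Count, for each pair of taggers, how many sites both of them tagged."""
    counts = Counter()
    for taggers in taggers_by_site.values():
        for pair in combinations(sorted(set(taggers)), 2):
            counts[pair] += 1
    return counts


def clusters(counts, min_shared=2):
    """Group taggers linked by at least min_shared co-tagged sites.

    Groups are the connected components of the thresholded co-tagging graph.
    """
    neighbours = {}
    for (a, b), n in counts.items():
        if n >= min_shared:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical toy data: site -> taggers who bookmarked it
sample = {
    "site1": ["ann", "bob", "cy"],
    "site2": ["ann", "bob"],
    "site3": ["dee", "eli", "cy"],
    "site4": ["dee", "eli"],
}
pair_counts = co_tagging_counts(sample)
tagger_clusters = clusters(pair_counts)
print(tagger_clusters)
```

In the toy data, "cy" shares only one site with anyone and so falls outside both groups; a real analysis would need a defensible threshold or a proper clustering technique.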

All of this convinced me I had figured it backwards–the noetic behavior was not the weird stuff in the long tail, but rather was the common ego-act perceptions of the tightly-knit group of co-taggers. In other words, here was a group of taggers all leaping on a bandwagon and in so doing classifying their commonly tagged sites with some very specific and (for taggers) relatively precise terminology. Here is a slide from the PowerPoint presentation of that paper. On the left you see the clusters of taggers, and on the right their tags. The point was that most of those tags could be seen as semantically related to two conceptual clusters–noesis as bandwagon effect. The paper is available in the ISKO Proceedings of the conference at Rome (Richard P. Smiraglia “Perception, Knowledge Organization, and Noetic Affective Social Tagging” pp. 64-7) but here is the abstract:

Knowledge organization can be postulated as existing on a continuum between classificatory activity and perception. Studying perception and its role in the identification of concepts is critical for the advancement of knowledge organization. The purpose of this research is to advance our understanding of the role of perception in knowledge organization systems. We briefly review the role of perception in knowledge organization and some preliminary evidence about affective social tagging, which is seen as a form of everyday classification. We consider how Husserlian phenomenology might be useful for analyzing the role of perception in affective social tagging. Finally, preliminary results of an empirical study are reported.

Because this was for ISKO I was intentionally focussed on the KO issue, which I here stated as a continuum between classificatory activity and perception. I gave a paper on noesis at ISKO in Montréal as well (scroll down, it’s the mailbox paper). I think that we think too often that classification is about putting things in little boxes, and therefore that we think too little about how fuzzy are the boundaries of those boxes. So here is just a glimpse at that issue.

As I said, the referees had sent me to cognitive linguistics, and I found particular resonance in the writing of Ronald Langacker (Langacker, Ronald W. 2005. Dynamicity, fictivity, and scanning: the imaginative basis of logic and linguistic meaning. In Pecher, Diane and Rolf A. Zwaan eds., Grounding cognition : the role of perception and action in memory, language, and thinking. Cambridge : Cambridge Univ. Pr., pp. 164-97). Scanning is the linguistic activity in which a kind of shorthand is used to project a landscape on which perceived activity is taking place; it results in “fictive” or at least unfactual language, but common understanding allows and even encourages this. Here’s a PowerPoint slide from my presentation at CAIS in Montreal in June.

The example is the phrase “my teacher’s books keep getting longer.” What is meant is that each time the teacher writes a book (or, one supposes, even buys a book) it is longer than the last. But that isn’t what was said at all, and obviously the idea that the teacher has a stack of books that is somehow stretching is absurd. It seemed likely that some of the variation in tagging might be due to scanning.

I wanted to complete the statistical analysis of the data and to present a fuller account of the study apart from the philosophical issue of noesis, so I submitted an abstract to CAIS for this year (2010); that abstract was accepted. To my chagrin, instead of the typical complete CAIS-paper, this year someone had decided to allow only what they called “extended abstracts,” which gave one precious little space. Nevertheless, I gave a presentation during the conference, and the “extended abstract” (Smiraglia, “Self-Reflection, Perception, Cognitive Semantics: How Social is Social Tagging?”) is in the proceedings, here: http://www.cais-acsi.ca/proceedings/2010/CAIS055_Smiraglia_Final.pdf.

This was pretty exciting for a couple of reasons. One was that the Globe and Mail got wind of it and kept asking for more text. Unfortunately all I had was the extended abstract, which must not have been enough because I never saw myself quoted. Still, for a moment there I was flirting with the thrill of being reported in the press. As I say often, oh well.

The research itself was exciting enough however. The fictive scanning was there, although once again in small proportions–less than 1% of the total. But more important was the extension of this notion of social classification. It turned out that all of the sites in the study had clusters like those we saw above. In fact, most of the tags were somehow or other associated with the bandwagon effect. There were typically 4 or 5 clusters per site, 2/3 of the tags fell into the clusters, and 1/2 of the tags fell into the two largest clusters. Voila, classification that is social.
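The proportions quoted above are simple shares, which can be computed as in the sketch below once each tag on a site has been assigned to a cluster (or to none). The function name and the toy assignments are hypothetical; the toy data is constructed to reproduce the 2/3 and 1/2 shares for illustration and is not data from the study.

```python
from collections import Counter


def cluster_coverage(cluster_of_tag):
    """Return (share of tags in any cluster, share in the two largest clusters).

    cluster_of_tag maps each tag to a cluster label, or None if unclustered.
    """
    total = len(cluster_of_tag)
    sizes = Counter(c for c in cluster_of_tag.values() if c is not None)
    in_clusters = sum(sizes.values())
    top_two = sum(n for _, n in sizes.most_common(2))
    return in_clusters / total, top_two / total


# Hypothetical site with 12 tags: four clusters plus four unclustered tags
toy = {
    "css": "A", "design": "A", "layout": "A", "webdesign": "A",
    "code": "B", "javascript": "B",
    "tools": "C",
    "reference": "D",
    "todo": None, "wtf": None, "cool": None, "later": None,
}
share_clustered, share_top_two = cluster_coverage(toy)
print(share_clustered, share_top_two)
```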

I really want to go make a cup of tea but I suppose I should finish with the conclusions, which were:

The taggers collectively are generating a classification with a social basis.

Also, the clusters are not mutually exclusive, demonstrating that a natural classification is not necessarily either hierarchical, or mutually exclusive. But it does remain collectively, potentially, exhaustive.

Warrant becomes a new issue in such a classification, because there is no accountable literary warrant—rather warrant is cultural (as Beghtol predicted).

Those look like some interesting hypotheses for future research to me.

I suppose I should write this up for a journal. But what I really want to do now is look for the same effect on more social social-networking sites.

Two Kinds of Power (originally posted 1-24-2009)

Inspired by encountering quotations from Two Kinds of Power in conference papers last summer I undertook an analysis of the domain defined by those who cite Wilson’s famous book. The paper is to be presented at the 2007 conference of the Canadian Association for Information Science/L’Association canadienne des sciences de l’information: Among Patrick Wilson’s most influential books was Two Kinds of Power, which has influenced scholars in information science, and particularly in knowledge organization. Tools of domain analysis are used to analyze the corpus of literature that cites Two Kinds of Power. Aboutness and relevance are demonstrated keys to this specialization.

The Proceedings are here: http://www.cais-acsi.ca/2007proceedings.htm. I am really quite fascinated by the concept that author co-citation analysis gives us a picture of symbolic interaction. That is, that what we see is how the scholarly community perceives intellectual connections among the co-cited authors. As I mentioned only briefly in the paper, there seem to be clear social networks in the map that focus on the lineage of dissertation advising. For my presentation I added a final slide using this quotation from Two Kinds of Power (p. 132): “Let us imagine a Supreme Bibliographical Council, whose task it was to evaluate the bibliographical situations ….” I decided that’s what we’re looking at here. Marcia Bates is the “chief justice” and there are two parties, one in IR represented by Belkin and Saracevic, and another in KO dominated by Hjorland, with Howard White as the swing vote. Well, it’s a metaphor ….

Fascinating to see Åström’s paper in JASIST 58(7): 947-57, in which he finds informetrics and ISR stable but that user-oriented and experimental IR research have merged into one field–ISR. This is comparable I think, to my finding that “aboutness” was a historical node but has given way to IR and KO. Interesting ….

CYSWIK (originally posted 1-24-2009)

I attended the workshop Can You See What I Know? (http://cyswik.blogspot.com/) presented by the Virtual Knowledge Studio in Amsterdam (http://www.virtualknowledgestudio.nl/). If you have time to watch the videos you can see me eating (tuna-salad sandwich) in the first one, and in the second one I actually get to talk about disturbance as a catalyst for knowledge acquisition. At least, that’s what I meant to say.

CYSWIK was a remarkable two-day event, bringing together artists and scientists and humanists for discussion and learning. It was a socially difficult activity because of the different vocabularies and modes of thought across the domains. But I think we all learned an immense amount, about ourselves especially.

A brainstorm with Charles van den Heuvel of the VKS (http://www.virtualknowledgestudio.nl/staff/charles-van-den-heuvel/) will lead to a collaboration on what we are so far calling an “idea collider.” I hope this will incorporate the ontological footprint technique. Stay tuned for updates.

Posted November 17, 2010 by lazykoblog in interdisciplinarity


rant (originally posted 6-19-2009)

not dendogram … “Dendro” from the Greek “dendron” for tree ….

although I was tempted to say: “Dendro” named for its inventor Sir Richard Dendro the famous Irish physicist. Let’s see if this gets into Wikipedia.

Posted November 17, 2010 by lazykoblog in domain analysis


Music Information Retrieval (originally posted 1-24-2009)

The relatively new domain of Music Information Retrieval or MIR is a rapidly evolving, technology-driven recent entrant on the information retrieval scene. Generated by information scientists, computer scientists, engineers, mathematicians, and musicologists, among others, the domain has contributed new systems for automatic storage and retrieval of music. Mapping the domain is itself a fascinating business. Recently I asked “Music Information Retrieval: An Example of Bates’ Substrate?” in a paper for the Canadian Association for Information Science/L’Association canadienne des sciences de l’information. This is the abstract:

Bates suggested that the intrinsic unity of information science lies in ‘substrate’–the properties of information and its transmission. Music Information Retrieval (MIR), and ISMIR annual conferences offer a rich panoply of intellectual and cultural diversity. We map the evolution of MIR using conference papers from 2000 through 2005. Results indicate tight thematic coherence in the domain around the problems of information retrieval and classification, and the locus of most research within computer science departments.

The paper is available here: http://www.cais-acsi.ca/search.asp?year=2006.

Author co-citation analysis was also revealing: indicat[ing] tight thematic coherence in the domain around the problems of information retrieval and classification, and the locus of most research within computer science departments. Citation practice indicates the habits of a hard science. Author co-citation within the domain is abundant; J. Stephen Downie is clearly the founding focal point, but the domain is very focused, reinforcing the notion of a tightly-packed, emerging and continuously successful domain. ACA data from outside the domain provides an interesting comparison; watch for another paper soon.

UDC (originally posted 1-2-2010)

This post is courtesy of Aida Slavic (“Aida Slavic”<aida@acorweb.net>)

Hi,

The UDC Summary of around 2,000 classes has been online since October 2009 and can now be browsed in ten languages at
http://www.udcc.org/udcsummary/php/index.php (English, German, Dutch, French, Spanish, Russian, Swedish, Croatian, Slovenian, Finnish)

The UDC summary is fully aligned with the UDC MRF 2009 which is going to be released in the following months. This set is made available for free
use under the Creative Commons Attribution Share Alike 3.0 license (CC-BY-SA).

The work is very much ‘in progress’. We are adding language data and updates as we speak and changes will be visible on a daily basis.
Captions in all languages appear first and then scope notes, application notes and examples of combinations are added as updates progress.

The effort put into the UDC Summary is entirely voluntary including the programming support, the work of our language editors and translators
for which we are most grateful. Read more at the UDC blog <http://universaldecimalclassification.blogspot.com> or at the UDC Summary webpage.

Contributions and feedback are invited

Kind regards
Aida

**************************************************
Of course, Otlet saw the UDC as the classification that would underpin all of his other ideas. Where some utopians saw brilliant cities shining on hilltops Otlet saw the interweaving of the structure of knowledge and this mechanism that could approach its explanation and yield further insight. And here it comes now.

Posted November 17, 2010 by lazykoblog in classification


Paragraphs? (originally posted 7-16-2009)

What has happened to the idea of a paragraph? That clever little invention that parses narrative into semantically related clusters, giving breathlessness to the expression of a part of an idea, and yet, by the pause it introduces at its end, allows the mind breathing room while reading a text–the paragraph is in dire peril my friends. I have grown weary of marking student papers “no 1-sentence paragraphs!!!” and yet now as editor of a so-called scientific journal I find myself deluged with these devices too. Okay, so many of them are in manuscripts that have evolved from doctoral dissertations. Still, someone’s dissertation advisor should have said “no 1-sentence paragraphs!!!” I wrack my brain trying to understand how this wonderful device, taught to most of us in third grade (or maybe even earlier) could so easily depart from academe. I suppose a lot of it has to do with word-processors, which make whatever drivel one manages to generate after staring at the monitor for hours look like elegant printed text. Rather than actually expressing an idea, we find ourselves instead filling justified space between paragraph marks. Oh my … Well, look here folks; a paragraph should be several sentences long. It should begin with a topic sentence, usually the first, which is a sort of exposition, or thesis statement. It should be followed by all of the evidence about that topic (properly supported with references, of course). And then, in good sonata form (see Music Appreciation 101) there should be development, in which you (the author, remember?) add value to the evidence by providing your own synthesis about what it means. And then a paragraph should conclude definitively.

Which reminds me, whatever happened to the literature review (stay tuned) that isn’t just a litany of “he-said, she-wrote”?

Posted November 17, 2010 by lazykoblog in writing


19th century lenses (originally posted 3-12-2010)

I guess I’m fairly often overheard saying I became a historian because I got so old I remember everything. It’s a little bit true. After all, I learned to catalog using AACR1 (blue cover) writing on 3×5 cards; then graduated to typing on 3×5 cards that would go off to another division someplace to be reproduced. I then became a librarian at Illinois where we had legions of card typists who did that part; we the catalogers would type on worksheets, which gave us more space for our reviewers to write in red ink all over our cataloging. “No double punctuation!” When major changes came–such as the shift from “negroes” to both “Blacks” and “African-Americans” in LCSH–we had to pull thousands of cards, and all of the main entry cards that went with them, erase, and re-type and refile them. We truly had armies of card filers in every division of the library whose job was all day long just to file the thousands of cards we produced every day.

That was the information society as people in cataloging knew it (in part) in the 1970s. All of that has passed away into the dimness of memory. And yet what a feat of engineering it took for those armies of people, all of whom understood the physics of the syndetic structure of the catalog, to maintain bibliographic control.

History is useful of course, not just for telling the story of the past, but for understanding the present and the future as well. We are situated historically in every moment, and the better we understand the circumstances of that situation the better job we will do pushing society and our own domain forward.

I am doing a lot of work right now on 21st century phenomena with lenses produced in the late 19th century. Their usefulness became clear only once technology brought us to this point. Yet these thinkers–specifically Otlet, Peirce, and Husserl are the folks I’m working with at the moment–saw clearly how the problems that engaged them were historically situated. Well enough that when the time came we discovered the lenses they’d provided.

Posted November 17, 2010 by lazykoblog in phenomenology


Film music (originally posted 4-1-2009)

About fifteen years ago I was asked to consult on a project called the Union Catalog of Motion Picture Music. The project has not advanced, although the research it was invented to serve has done so nicely. I’m once again visiting with these folks at a symposium at USC concerning musicological film studies.

There seem to be two essential problems for this domain–the musicological study of film music. First, that sources are missing, lost, dispersed, or non-existent. The second is that the sources are not of a traditional sort. This is to say, there probably are no definitive full scores, and of what does exist, much is in private or industry hands or has been lost. So here are two pdf’s–the symposium program, and a pdf of my PowerPoint slides (with references added at the end).

Posted November 17, 2010 by lazykoblog in domain analysis
