There has been quite a saga unfolding slowly over the last couple of months in the #code4lib IRC channel regarding Amazon’s OpenSearch technology and its use for searching across cooperating libraries’ catalogs. Everyone there seems to agree, on a basic level, that OpenSearch is a useful search interface and that it benefits everyone to expose our catalogs using it.
Recently, however, the discussion has spilled over onto the web4lib mailing list (of which I am not yet a member, but which I should perhaps join). From reading the archives I gather that there are some in the library community who are not so keen on the idea of an OpenSearch interface being used as a common entry point into a library catalog.
The main argument seems to go something like, “Look, we have this perfectly good search interface in SRU/SRW. Why would we want to spend time creating yet another Web Service interface on our catalog? Everyone should just use the superior SRW protocol.” While I can see the benefits of SRW on a purely technical level, there are some really big problems with this attitude, not the least of which is that nobody outside the library community has any interest at all in implementing SRW. My initial thought is that it basically comes down to the fact that, outside of bibliographic and citation services, there is no need whatsoever for a protocol as complicated as SRW. I’m not talking about the complexity of using an SRW source, but the implementation complexity.
Ross Singer makes another great point, supported by the frustration 😉 of Rob Casson, about the complexity of using an SRW source. Because any SRW source can supply its results in one of many different formats, and not all sources support the same output formats, the job of writing client software is extremely complex.
Let’s say, hypothetically, that Google decided to create an SRW interface for searching the web. They may decide that RSS 2.0 is the best way to deliver the results of a search. Now, a library that wants to expose its catalog via SRW comes along and decides that it will offer MARCXML and Dublin Core output formats. An online publisher then creates an SRW interface to their journal catalog and settles on MODS for its output format. Now any client software that wants to allow searching of all three sources must support not one, not two, and, to completely cover its bases, not three, but four different metadata formats.
Is this example a bit over the top? I don’t think so. There are far more than four metadata standards in use, and every SRW server implementation need only support one in order to “work”. If the dream of SRW everywhere that some on the web4lib mailing list are espousing were to come true then every single one of those (potentially infinite number of) metadata formats would need to be supported. Could this be done with plugins or some such configurable, modular setup? Sure, but all I want, as a user, is to search for “Harry Potter”. I don’t want to have to download some huge search client, or wait for my search provider to add the latest format-x plugin to some server-side app. I want to point my browser at a web page and get results. Now.
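To make that burden concrete, here is a rough sketch of what even the smallest task, pulling a title out of a result, might look like for the four formats in the hypothetical above. Nothing here comes from a real SRW client: the function names and the TITLE_EXTRACTORS table are made up for illustration, and each parser assumes it is handed one already-unwrapped record in its simplest form.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes in ElementTree's "{uri}tag" notation.
DC = "{http://purl.org/dc/elements/1.1/}"
MODS = "{http://www.loc.gov/mods/v3}"
MARC = "{http://www.loc.gov/MARC21/slim}"


def title_from_rss(record):
    # RSS 2.0: <item><title>...</title></item>
    return record.findtext("title")


def title_from_dublin_core(record):
    # Dublin Core: a dc:title child of whatever element wraps the record.
    return record.findtext(DC + "title")


def title_from_mods(record):
    # MODS: <mods><titleInfo><title>...</title></titleInfo></mods>
    return record.findtext(MODS + "titleInfo/" + MODS + "title")


def title_from_marcxml(record):
    # MARCXML: the title proper lives in field 245, subfield a.
    for field in record.iter(MARC + "datafield"):
        if field.get("tag") == "245":
            for subfield in field.iter(MARC + "subfield"):
                if subfield.get("code") == "a":
                    return subfield.text
    return None


# Hypothetical dispatch table: one entry per schema the client knows about.
TITLE_EXTRACTORS = {
    "rss": title_from_rss,
    "dc": title_from_dublin_core,
    "mods": title_from_mods,
    "marcxml": title_from_marcxml,
}


def extract_title(schema, xml_text):
    """Return the title of one record, or raise if the schema is unknown."""
    if schema not in TITLE_EXTRACTORS:
        # This is the "wait for the format-x plugin" case.
        raise ValueError("no parser for schema %r" % schema)
    return TITLE_EXTRACTORS[schema](ET.fromstring(xml_text))
```

Every new format a source chooses means another function and another table entry, and that is just for the title.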
So that’s my view of the current problem of cross-source searching in the SRW world. In my view (and Ross’ as far as I can tell), what we really need is a “lowest common denominator” search protocol. We need something that is super easy to implement on both the client and server side, something that can be extended when you really want to put the extra time into both the server and client, but that will degrade gracefully in the presence of a very simple endpoint, be it the server or the client. We need something that will provide enough information to the user so that they can decide on the usefulness of a particular result, and that helps keep all result sets more or less on the same footing. We need a title, a short description, and a link to look at more detail.
We need OpenSearch.
What I’ve described there is not a way of finding; it is a way of searching. I’ll say that again, because it’s the crux of the argument that the SRW proponents don’t seem to get:
OpenSearch == search
SRW == find
What I mean by that is OpenSearch is, to use Ross’ words, a discovery mechanism. It allows a site to quickly expose vast amounts of data to end users in a detailed enough format that it elicits click-throughs. It is a way for end users to search a variety of sources, and source types, and to quickly grab the useful bits from each source, and to dig deeper for more detail when they find something of interest.
More to the point, though, since everyone must implement their OpenSearch results in exactly the same way, every OpenSearch source is guaranteed to work with every OpenSearch client. Instant interoperability.
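As a rough illustration of that single code path, the sketch below reads the title, description, and link out of an RSS 2.0 style response and merges results from several sources. The Result type, the function names, and the idea of passing in already-fetched response text are assumptions for the example, and it ignores any OpenSearch-specific extension elements a real response may carry.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, NamedTuple


class Result(NamedTuple):
    source: str       # label for where the result came from
    title: str
    description: str
    link: str


def parse_results(source: str, rss_text: str) -> List[Result]:
    """Parse one RSS 2.0 style response into a list of Result tuples."""
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        return []
    return [
        Result(
            source,
            (item.findtext("title") or "").strip(),
            (item.findtext("description") or "").strip(),
            (item.findtext("link") or "").strip(),
        )
        for item in channel.findall("item")
    ]


def merge_results(responses: Dict[str, str]) -> List[Result]:
    """Combine responses from many sources using the same parser for all."""
    merged: List[Result] = []
    for source, rss_text in responses.items():
        merged.extend(parse_results(source, rss_text))
    return merged
```

A library catalog, a restaurant guide, and a web search engine would all go through parse_results unchanged; adding a source means adding an entry to the dictionary, not writing a parser.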
SRW, on the other hand, is about finding specific instances of one type of data. A bibliographic record. An online journal citation. A case law note. The point is that you must have a good idea of what you want to begin with in order to make good use of an SRW source, because such sources are very specialized. And such sources are so specialized because they are not as easy (relatively speaking) to implement as other, simpler protocols.
Now, I’ll make one last very important point: I like SRW. I think SRW is a great protocol for retrieving information from a very specific source type, and we will be building an SRW server into Open-ILS, not because we have to but because it is very useful for a variety of common library-oriented tasks. It will be a great adjunct to our (as yet unimplemented) Z39.50 server, and in fact will probably sit directly on top of it. But I won’t expect the Valdosta-Lowndes County Chamber of Commerce to create a portal for searching their local libraries using SRW. I wouldn’t be surprised, however, if they added an OpenSearch portal for searching their libraries, government services, public school web sites, and even a local restaurant guide. Why? Because it can be done with a single DHTML web page.
Don’t believe me? http://gapines.org/opensearchdemo.html
DeWitt Clinton says
Wow — that’s a great summary of the issues at hand. And trust me (as the author of the OpenSearch spec), I am very aware of the tensions.
OpenSearch is designed to be exactly that — a least common denominator. Or rather, it is designed to be simple to implement, simple to use, and rely on familiar technologies (such as RSS). Can it address all of the scenarios that SRW/SRU can? Of course not! As you say, those are much more powerful tools for certain tasks.
But OpenSearch is pretty good at solving one particular problem in search — how do you return a set of search results in a common format that everyone can read.
Will OpenSearch evolve into more over time? Of course it will. In fact, I recently posted about what is coming up in OpenSearch 1.1 over at the A9 Developer Blog. This release starts to address some of these concerns, but since it will preserve backward compatibility, it clearly won’t address all of them.
And while I haven’t written yet about the goals for OpenSearch 2.0, I can tell you that it will focus on three areas: One, giving semantic meaning and structure to the description documents. Right now they are simple key-value XML pairs. This could be greatly improved. Two, potentially moving the OpenSearch result format to something with more semantic meaning and structure (perhaps using Atom 1.0 instead of RSS 2.0, etc.) And three, making sure that the query side of OpenSearch is “pluggable” — i.e., giving the search engine the power to say what kinds of query formats it would like.
I encourage everyone to keep letting me know what matters to you. I try to read blog postings that discuss OpenSearch, but I obviously might miss something here and there. And when I write about the plans for OpenSearch 1.1 and 2.0, please comment and let us know what you think can be done better. It’s definitely a community effort!
Cheers,
-DeWitt