Monday, September 12, 2005

Zebra Searching

Hi everyone,

In case you haven't been following the IRC logs, we've been discussing
Zebra as a potential search engine. From Index Data's website:

Zebra is a high-performance, general-purpose structured text indexing and retrieval engine.
It reads structured records in a variety of input formats (eg. email, XML, MARC) and allows
access to them through exact boolean search expressions and relevance-ranked free-text
queries.

Zebra supports large databases (more than ten gigabytes of data, tens of millions of
records). It supports incremental, safe database updates on live systems. You can access
data stored in Zebra using a variety of Index Data tools (eg. YAZ and PHP/YAZ) as well as
commercial and freeware Z39.50 clients and toolkits.

http://indexdata.dk/zebra

I've set up a Zebra test site running on LibLime's server. It currently
has access to three Zebra datasets: Nelsonville's 150K records, LibLime's
5 million records (recently donated by sanspach), and Paul Poulain's 13K
records. (Paul is still working out some issues with indexing UNIMARC
records, so stay tuned for that one.)

http://liblime.com/zap/advanced.html

Note that search and retrieval are done via the Z39.50 protocol, using
the server that ships with Zebra. Both the index and the server can be
customized for the kinds of searches we want to perform (the site above
is just a proof of concept), so we'd have support for relevance ranking,
stemming, and the whole gamut of searching technologies.
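To make "exact boolean search expressions" over Z39.50 a bit more concrete: clients such as YAZ commonly express queries in PQF (Prefix Query Format), where Bib-1 use attributes pick the index (1=4 title, 1=1003 author, 1=21 subject, 1=1016 any) and relation attribute 2=102 requests relevance ranking. The Python sketch below is a hypothetical helper, not part of Koha, Zebra, or YAZ; it only builds query strings, and it assumes the Zebra index is configured to map these Bib-1 attributes, which depends on how the index is customized.

```python
# Hypothetical helper: builds Z39.50 PQF query strings of the kind a YAZ
# client would send to a Zebra server. Assumes the Zebra index maps these
# Bib-1 use attributes.
BIB1_USE = {"title": 4, "author": 1003, "subject": 21, "any": 1016}


def term(field: str, value: str, relevance: bool = False) -> str:
    """Return one attributed PQF term, e.g. '@attr 1=4 "dinosaurs"'."""
    attrs = f"@attr 1={BIB1_USE[field]}"
    if relevance:
        # Relation attribute 2=102 asks the server to rank by relevance.
        attrs += " @attr 2=102"
    # Simplification: drop embedded double quotes rather than escape them.
    cleaned = value.replace('"', "")
    return f'{attrs} "{cleaned}"'


def and_all(terms: list[str]) -> str:
    """Combine terms with boolean AND using PQF prefix notation."""
    if not terms:
        raise ValueError("need at least one term")
    query = terms[0]
    for t in terms[1:]:
        # Prefix form: "@and A B", nested as "@and @and A B C".
        query = f"@and {query} {t}"
    return query


if __name__ == "__main__":
    print(and_all([term("title", "dinosaurs"), term("author", "smith")]))
```

Running it prints `@and @attr 1=4 "dinosaurs" @attr 1=1003 "smith"`, a string that could be typed at a YAZ client's `find` prompt against the test server.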

In all my tests, searches have returned results in under a second.

If we decide to work with Zebra, we will need to decide what to do with
non-MARC libraries. Should we develop an export utility that will allow
Zebra to index their records (in, say, XML format)? Should we use the Koha
tables to create a basic MARC record for use with Zebra? Should we leave
the Koha 1.x searching methods unchanged and use Zebra only for
MARC libraries? Also, what should we do with the existing marc_*_table
tables?
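As a rough illustration of the "basic MARC record from the Koha tables" option, the Python sketch below turns one flat row into a minimal MARCXML record that Zebra could index. The column names (`isbn`, `author`, `title`, `subject`), the column-to-tag mapping, the blank indicators, and the fixed leader are all simplifying assumptions for illustration, not Koha's actual schema or mapping; a real converter would also need indicators, additional subfields, and repeated fields.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# Hypothetical mapping from flat Koha 1.x-style columns to MARC 21 tag and
# subfield code, listed in tag order so the output is tag-ordered.
FIELD_MAP = {
    "isbn": ("020", "a"),
    "author": ("100", "a"),
    "title": ("245", "a"),
    "subject": ("650", "a"),
}

# Minimal 24-character leader: new record, language material, monograph,
# Unicode; length and base address are left as zeros.
LEADER = "00000nam a2200000   4500"


def koha_row_to_marcxml(row: dict) -> str:
    """Build a minimal MARCXML <record> from one flat row (a dict)."""
    # Serialize MARC_NS as the default namespace (no "ns0:" prefix).
    ET.register_namespace("", MARC_NS)
    record = ET.Element(f"{{{MARC_NS}}}record")
    ET.SubElement(record, f"{{{MARC_NS}}}leader").text = LEADER
    for column, (tag, code) in FIELD_MAP.items():
        value = row.get(column)
        if not value:
            continue  # skip empty or missing columns
        # Indicators left blank for simplicity (an assumption).
        datafield = ET.SubElement(
            record, f"{{{MARC_NS}}}datafield",
            {"tag": tag, "ind1": " ", "ind2": " "},
        )
        ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": code}).text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(koha_row_to_marcxml({"title": "Koha in practice", "author": "Smith, J."}))
```

An export utility along these lines would write one such record per biblio row, and Zebra would index the resulting XML files; whether that beats keeping the Koha 1.x search for non-MARC libraries is exactly the open question above.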

So ... it's clearly time to schedule a "Koha 2.4 Searching Group Meeting" on
IRC. I'd like to pick a time when everyone can be represented. How
is Thursday, June 23 at 9:00 GMT? Here's the time in your area:
http://tinyurl.com/925c8

If you won't be able to attend, please let me know on-list, along with
a time that would work for you.

Comments, suggestions, concerns?
