5. MARC

YAZ provides a fast utility for working with MARC records. Early versions of the MARC utility only allowed decoding of ISO2709. Today the utility can both encode and decode a variety of formats.

    #include <yaz/marcdisp.h>

    /* create handler */
    yaz_marc_t yaz_marc_create(void);
    /* destroy */
    void yaz_marc_destroy(yaz_marc_t mt);

    /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
    void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
    #define YAZ_MARC_LINE      0
    #define YAZ_MARC_SIMPLEXML 1
    #define YAZ_MARC_OAIMARC   2
    #define YAZ_MARC_MARCXML   3
    #define YAZ_MARC_ISO2709   4
    #define YAZ_MARC_XCHANGE   5
    #define YAZ_MARC_CHECK     6
    #define YAZ_MARC_TURBOMARC 7
    #define YAZ_MARC_JSON      8

    /* supply iconv handle for character set conversion .. */
    void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);

    /* set debug level, 0=none, 1=more, 2=even more, .. */
    void yaz_marc_debug(yaz_marc_t mt, int level);

    /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
    On success, result in *result with size *rsize. */
    int yaz_marc_decode_buf(yaz_marc_t mt, const char *buf, int bsize,
                            const char **result, size_t *rsize);

    /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
       On success, result in WRBUF */
    int yaz_marc_decode_wrbuf(yaz_marc_t mt, const char *buf,
                              int bsize, WRBUF wrbuf);

   

Note

The synopsis is just a basic subset of all functionality. Refer to the actual header file marcdisp.h for details.

A MARC conversion handle must be created by using yaz_marc_create and destroyed by calling yaz_marc_destroy.

All other functions operate on a yaz_marc_t handle. The output format is specified by a call to yaz_marc_xml. The xmlmode must be one of

YAZ_MARC_LINE

A simple line-by-line format suitable for display but not recommended for further (machine) processing.

YAZ_MARC_MARCXML

MARCXML.

YAZ_MARC_ISO2709

ISO2709 (sometimes just referred to as "MARC").

YAZ_MARC_XCHANGE

MarcXchange.

YAZ_MARC_CHECK

Pseudo format for validation only. Does not generate any real output except diagnostics.

YAZ_MARC_TURBOMARC

XML format with same semantics as MARCXML but more compact and geared towards fast processing with XSLT. Refer to Section 5.1, “TurboMARC” for more information.

YAZ_MARC_JSON

MARC-in-JSON format.

The actual conversion functions are yaz_marc_decode_buf and yaz_marc_decode_wrbuf, which decode a MARC record and encode it in the selected output format. The former operates on simple buffers; the latter stores the resulting record in a WRBUF handle (WRBUF is a simple string type).

Example 7.18. Display of MARC record

The following program snippet illustrates how the MARC API may be used to convert a MARC record to the line-by-line format:

      #include <stdio.h>
      #include <yaz/marcdisp.h>

      void print_marc(const char *marc_buf, int marc_buf_size)
      {
         const char *result;   /* decoded record; owned by the handle */
         size_t result_len;    /* size of result in bytes */
         yaz_marc_t mt = yaz_marc_create();
         yaz_marc_xml(mt, YAZ_MARC_LINE);
         /* decode returns >0 on success; only then is result valid */
         if (yaz_marc_decode_buf(mt, marc_buf, marc_buf_size,
                                 &result, &result_len) > 0)
            fwrite(result, result_len, 1, stdout);
         yaz_marc_destroy(mt);  /* note that result is now freed... */
      }
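
The snippet above takes marc_buf_size as given. When records are read from a file or a network buffer, one common way to find that size for ISO2709 input is to read it from the record itself: the first five bytes of the 24-byte leader hold the total record length as decimal digits. The following is a minimal sketch of that step only. The helper name iso2709_record_length is hypothetical and not part of YAZ, and the sketch does no validation beyond the length field.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of the YAZ API.
   Reads the record length from the first five bytes of an ISO2709 leader.
   buf points at the start of a record; avail is the number of bytes
   available in buf. Returns the record length, or -1 if the length field
   is missing, is not five decimal digits, or is smaller than the 24-byte
   leader itself. */
static int iso2709_record_length(const char *buf, size_t avail)
{
    int len = 0;
    size_t i;

    if (avail < 5)
        return -1;
    for (i = 0; i < 5; i++)
    {
        if (buf[i] < '0' || buf[i] > '9')
            return -1;
        len = len * 10 + (buf[i] - '0');
    }
    if (len < 24)
        return -1;
    return len;
}
```

A caller would typically check that the returned length does not exceed the number of bytes actually available before passing the buffer and length on to yaz_marc_decode_buf.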

     


5.1. TurboMARC

TurboMARC is yet another XML encoding of a MARC record. The format was designed for fast processing with XSLT.

Applications like Pazpar2 use XSLT to convert an XML-encoded MARC record to an internal representation. This conversion mostly checks the tag of a MARC field to determine the basic rules of the conversion. This check is costly when the tag is encoded as an attribute, as in MARCXML. Using the tag value as the element name instead makes processing many times faster (at least for Libxslt).

TurboMARC is encoded as follows:

  • Record elements are part of namespace "http://www.indexdata.com/turbomarc".

  • A record is enclosed in element r.

  • A collection of records is enclosed in element collection.

  • The leader is encoded as element l with the leader content as its (text) value.

  • A control field is encoded as element c concatenated with the tag value of the control field if the tag value matches the regular expression [a-zA-Z0-9]*. If the tag value does not match the regular expression [a-zA-Z0-9]*, the control field is encoded as element c and attribute code will hold the tag value. This rule ensures that in the rare cases where a tag value would result in non-well-formed XML, YAZ encodes it as a coded attribute (as in MARCXML).

    The control field content is the text value of this element. Indicators are encoded as attributes named i1, i2, etc., with the corresponding value for each indicator.

  • A data field is encoded as element d concatenated with the tag value of the data field or using the attribute code as described in the rules for control fields. The children of the data field element are subfield elements. Each subfield element is encoded as s concatenated with the subfield code. The text of the subfield element is the contents of the subfield. Indicators are encoded as attributes for the data field element similar to the encoding for control fields.
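
The naming rule for control and data fields can be sketched in a few lines of C. This is only an illustration of the rule as described above, not the code YAZ uses. The function names tag_is_name_safe and turbomarc_open_tag are hypothetical, indicator attributes are left out, and the sketch refuses tags that would need XML escaping inside the code attribute rather than escaping them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT part of YAZ: returns 1 if every character of
   tag is an ASCII letter or digit, that is, tag matches [a-zA-Z0-9]*. */
static int tag_is_name_safe(const char *tag)
{
    for (; *tag; tag++)
    {
        char c = *tag;
        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
              (c >= '0' && c <= '9')))
            return 0;
    }
    return 1;
}

/* Hypothetical helper, NOT part of YAZ: writes the opening tag for a
   TurboMARC field into out. kind is 'c' for a control field or 'd' for a
   data field. Returns 1 on success, 0 if out is too small or if the tag
   contains a character this sketch does not escape. */
static int turbomarc_open_tag(char kind, const char *tag,
                              char *out, size_t outsize)
{
    int n;

    if (tag_is_name_safe(tag))
    {
        /* tag is safe as part of an element name: <c001>, <d245> */
        n = snprintf(out, outsize, "<%c%s>", kind, tag);
    }
    else
    {
        /* simplification: reject instead of escaping <, & and " */
        if (strpbrk(tag, "<&\""))
            return 0;
        /* fall back to the code attribute: <d code="24 "> */
        n = snprintf(out, outsize, "<%c code=\"%s\">", kind, tag);
    }
    return n >= 0 && (size_t) n < outsize;
}
```

With these definitions, a control field with tag 001 yields the opening tag <c001>, while a data field whose tag contains a blank, such as "24 ", falls back to <d code="24 ">.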