The EZID API, Version 2

EZID (easy-eye-dee) provides an easy way to obtain, describe, and manage long-term identifiers for digital objects. It can be accessed via a web User Interface (UI) and a web Application Programming Interface (API). A few account management functions can be accessed from the UI only, but otherwise all of EZID's functionality is available through the API. This document describes Version 2 of the EZID API.

Please send mail to the EZID discussion list (open to EZID customers only) to ask questions or report problems:

ezid-l@listserv.ucop.edu

Framework

The EZID API is available from the base URL

https://ezid.cdlib.org

Interaction is via REST-style HTTP web services. The API's central design principle is to treat an identifier as a kind of web resource. Specifically, identifier foo is represented as a resource at URL https://ezid.cdlib.org/id/foo. In this document we will refer to this URL as the identifier's "EZID URL." A client manipulates an identifier by performing HTTP operations on its EZID URL: PUT to create the identifier, GET to view it, and POST to update it.

An identifier's EZID URL should not be confused with the identifier's "URL form." The former is used to manipulate the identifier, whereas the latter is used to express the identifier as an embeddable hyperlink that redirects to the identifier's target URL. For DOI identifiers:

Identifier doi:10.nnnn/suffix
URL form https://doi.org/10.nnnn/suffix
EZID URL https://ezid.cdlib.org/id/doi:10.nnnn/suffix

For ARK identifiers:

Identifier ark:/nnnnn/suffix
URL form http://n2t.net/ark:/nnnnn/suffix
EZID URL https://ezid.cdlib.org/id/ark:/nnnnn/suffix

For UUID identifiers:

Identifier uuid:suffix
URL form http://n2t.net/uuid:suffix
EZID URL https://ezid.cdlib.org/id/uuid:suffix
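
The three forms in the tables above can be derived mechanically from an identifier string. The following is a minimal Python sketch, assuming the resolver hosts shown above; the function names ezid_url and url_form are illustrative and not part of EZID, and no percent-encoding of the identifier is attempted.

```python
EZID_BASE = "https://ezid.cdlib.org"

def ezid_url(identifier):
    """Return the EZID URL used to manipulate the identifier via the API."""
    return EZID_BASE + "/id/" + identifier

def url_form(identifier):
    """Return the resolvable URL form, following the tables above."""
    if identifier.startswith("doi:"):
        return "https://doi.org/" + identifier[len("doi:"):]
    if identifier.startswith(("ark:/", "uuid:")):
        return "http://n2t.net/" + identifier
    raise ValueError("unrecognized identifier scheme: " + identifier)

print(ezid_url("doi:10.nnnn/suffix"))
print(url_form("doi:10.nnnn/suffix"))
```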

API vs. UI

The EZID UI and API share some URLs (the base URL is the same for both) but their behavior is different. For example, in the API a GET operation on an EZID URL returns client-parseable metadata (see Operation: get identifier metadata below), but in the UI it returns an HTML page.

To distinguish between the two interfaces, EZID employs HTTP content negotiation. If a request arrives with an HTTP Accept header that expresses a preference for any form of HTML or XML, the UI is invoked; otherwise, the API is invoked. A preference for the API can be made explicit by omitting the Accept header entirely, or by setting it to something like "text/plain". If using Java, it will probably be necessary to override the default Accept header that Java sends, as follows:

import java.net.*;

URL u = new URL("https://ezid.cdlib.org/...");
URLConnection c = u.openConnection();
c.setRequestProperty("Accept", "text/plain");
c.connect();
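
For comparison, a Python sketch of the same idea follows. It only builds the request and does not send it; the helper name api_request is an assumption made for illustration. To our knowledge urllib.request does not add an Accept header of its own, so setting one is a precaution rather than a requirement.

```python
import urllib.request

def api_request(url):
    """Build a request that explicitly asks for the API rather than the UI."""
    r = urllib.request.Request(url)
    r.add_header("Accept", "text/plain")
    return r

r = api_request("https://ezid.cdlib.org/id/ark:/99999/fk4cz3dh0")
# urllib.request.urlopen(r) would then send the request.
```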

Authentication

Most requests require authentication. The EZID API supports two methods of authentication:

  1. HTTP Basic authentication. With this method, the client supplies HTTP Basic authentication credentials on every request. The authentication realm is "EZID". For example, credentials can be added manually in Python as follows:

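    import base64, urllib.request
    r = urllib.request.Request("https://ezid.cdlib.org/...")
    # b64encode operates on bytes, so encode the credentials and decode the result.
    r.add_header("Authorization", "Basic " +
      base64.b64encode(b"username:password").decode("ascii"))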
    

    But most programming libraries provide higher-level support for authentication. For example, Python provides HTTPBasicAuthHandler:

    import urllib.request
    h = urllib.request.HTTPBasicAuthHandler()
    h.add_password("EZID", "https://ezid.cdlib.org/", "username", "password")
    o = urllib.request.build_opener(h)
    o.open("https://ezid.cdlib.org/...")
    

    The downside of using higher-level authentication mechanisms is that they often do not supply credentials initially, but only in response to a challenge from EZID, thus doubling the number of HTTP transactions.

    To manually provide credentials in Java, using Apache Commons Codec to do the Base64 encoding:

    import java.net.*;
    import org.apache.commons.codec.binary.*;
    
    URL u = new URL("https://ezid.cdlib.org/...");
    URLConnection c = u.openConnection();
    c.setRequestProperty("Accept", "text/plain");
    c.setRequestProperty("Authorization", "Basic " +
      new String(Base64.encodeBase64("username:password".getBytes())));
    c.connect();
    

    Java also provides an Authenticator class:

    import java.net.*;
    
    class MyAuthenticator extends Authenticator {
      protected PasswordAuthentication getPasswordAuthentication () {
        return new PasswordAuthentication("username", "password".toCharArray());
      }
    }
    
    Authenticator.setDefault(new MyAuthenticator());
    
  2. One-time login. Perform a GET operation on https://ezid.cdlib.org/login and supply HTTP Basic credentials as above. In response, EZID returns a session cookie. Subsequent requests can be made without authentication by supplying the session cookie in HTTP Cookie headers. Here's an example interaction:

    ⇒ GET /login HTTP/1.1
    ⇒ Host: ezid.cdlib.org
    ⇒ Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
    
    ⇐ HTTP/1.1 200 OK
    ⇐ Set-Cookie: sessionid=403a1ea3b03b74f663c1cd7fc877f495; expires...
    ⇐ Content-Type: text/plain; charset=UTF-8
    ⇐ Content-Length: 32
    ⇐
    ⇐ success: session cookie returned
    

    In Python, cookies can be managed using http.cookiejar (named cookielib in Python 2), or manually captured and set using code similar to the following:

    import urllib.request
    c = urllib.request.urlopen("https://ezid.cdlib.org/login")
    cookie = c.headers["Set-Cookie"].split(";")[0]
    ...
    r = urllib.request.Request("https://ezid.cdlib.org/...")
    r.add_header("Cookie", cookie)
    

    In Java, cookies can be manually captured and set using code analogous to the Python code above or, in Java 1.6 and newer, CookieManager can be used to manage cookies.

    Perform a GET operation on https://ezid.cdlib.org/logout to invalidate a session.

If authentication is required and credentials are either missing or invalid, EZID returns a 401 HTTP status code and the status line "error: unauthorized" (see Error reporting below). If authentication is successful but the request is still not authorized, EZID returns a 403 HTTP status code and the status line "error: forbidden".

Request & response bodies

Request and response bodies are used to transmit identifier metadata. The HTTP content type for all bodies is "text/plain" using UTF-8 charset encoding. In request bodies, if no charset encoding is declared in the HTTP Content-Type header, it is assumed to be UTF-8.

EZID's data model for metadata is a dictionary of element name/value pairs. The dictionary is single-valued: an element name may not be repeated. Names and values are strings. Leading and trailing whitespace in names and values is not significant. Neither element names nor element values may be empty. (When updating an identifier, an uploaded empty value is treated as a command to delete the element entirely.)

Metadata dictionaries are serialized using a subset of A Name-Value Language (ANVL) rules:

  • One element name/value pair per line.
  • Names are separated from values by colons.

For example:

who: Proust, Marcel
what: Remembrance of Things Past
when: 1922

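In addition, two ANVL features may be used when uploading metadata to EZID (but clients can safely assume that EZID will never use these features when returning metadata):

  • Comments: lines beginning with a number sign ("#", U+0023).
  • Continuation lines: a line beginning with whitespace continues the value on the previous line.

For example: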

# The following two elements are identical:
who: Proust,
  Marcel
who: Proust, Marcel
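
A parser that accepts uploaded-style ANVL needs to handle both features. The following Python sketch is one possible approach, not EZID's own implementation; the function name parse_anvl is illustrative, and joining a continuation line to the previous line with a single space is an assumption inferred from the example above.

```python
import re

def unescape(s):
    """Decode percent-encoded characters."""
    return re.sub("%([0-9A-Fa-f][0-9A-Fa-f])",
                  lambda m: chr(int(m.group(1), 16)), s)

def parse_anvl(text):
    """Parse ANVL text, skipping comments and joining continuation lines."""
    logical = []
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        if line[0] in " \t" and logical:
            # Continuation: assumed to join with a single space.
            logical[-1] += " " + line.strip()
        else:
            logical.append(line)
    metadata = {}
    for line in logical:
        name, value = line.split(":", 1)
        metadata[unescape(name).strip()] = unescape(value).strip()
    return metadata

sample = ("# The following two elements are identical:\n"
          "who: Proust,\n"
          "  Marcel\n"
          "who: Proust, Marcel\n")
print(parse_anvl(sample))
```

Because the dictionary is single-valued, the repeated name in the sample simply overwrites the first entry with the same value.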

Care must be taken to escape structural characters that appear in element names and values, specifically, line terminators (both newlines ("\n", U+000A) and carriage returns ("\r", U+000D)) and, in element names, colons (":", U+003A). EZID employs percent-encoding as the escaping mechanism, and thus percent signs ("%", U+0025) must be escaped as well. In Python, a dictionary of Unicode metadata element names and values, metadata, is serialized into a UTF-8 encoded string, anvl, with the following code:

import re

def escape (s):
  return re.sub("[%:\r\n]", lambda c: "%%%02X" % ord(c.group(0)), s)

anvl = "\n".join("%s: %s" % (escape(name), escape(value)) for name,
  value in metadata.items()).encode("UTF-8")

Conversely, to parse a UTF-8 encoded string, anvl, producing a dictionary, metadata:

import re

def unescape (s):
  return re.sub("%([0-9A-Fa-f][0-9A-Fa-f])",
    lambda m: chr(int(m.group(1), 16)), s)

metadata = dict(tuple(unescape(v).strip() for v in l.split(":", 1)) \
  for l in anvl.decode("UTF-8").splitlines())

In Java, to serialize a HashMap of metadata element names and values, metadata, into an ANVL-formatted Unicode string, anvl:

import java.util.*;

String escape (String s) {
  return s.replace("%", "%25").replace("\n", "%0A").
    replace("\r", "%0D").replace(":", "%3A");
}

Iterator<Map.Entry<String, String>> i = metadata.entrySet().iterator();
StringBuffer b = new StringBuffer();
while (i.hasNext()) {
  Map.Entry<String, String> e = i.next();
  b.append(escape(e.getKey()) + ": " + escape(e.getValue()) + "\n");
}
String anvl = b.toString();

And conversely, to parse a Unicode ANVL-formatted string, anvl, producing a HashMap, metadata:

import java.util.*;

String unescape (String s) {
  StringBuffer b = new StringBuffer();
  int i;
  while ((i = s.indexOf("%")) >= 0) {
    b.append(s.substring(0, i));
    b.append((char) Integer.parseInt(s.substring(i+1, i+3), 16));
    s = s.substring(i+3);
  }
  b.append(s);
  return b.toString();
}

HashMap<String, String> metadata = new HashMap<String, String>();
for (String l : anvl.split("[\\r\\n]+")) {
  String[] kv = l.split(":", 2);
  metadata.put(unescape(kv[0]).trim(), unescape(kv[1]).trim());
}

The first line of an EZID response body is a status indicator consisting of "success" or "error", followed by a colon, followed by additional information. Two examples:

success: ark:/99999/fk4test
error: bad request - no such identifier
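
A client will typically split the status line before looking at anything else. A minimal Python sketch, with the function name parse_status chosen here for illustration:

```python
def parse_status(line):
    """Split a status line into (succeeded, detail)."""
    status, _, detail = line.partition(":")
    status = status.strip()
    if status not in ("success", "error"):
        raise ValueError("unexpected status line: " + line)
    return status == "success", detail.strip()

print(parse_status("success: ark:/99999/fk4test"))
print(parse_status("error: bad request - no such identifier"))
```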

Error reporting

An error is indicated by both an HTTP status code and an error status line of the form "error: reason". For example:

⇒ GET /id/ark:/99999/bogus HTTP/1.1
⇒ Host: ezid.cdlib.org

⇐ HTTP/1.1 400 BAD REQUEST
⇐ Content-Type: text/plain; charset=UTF-8
⇐ Content-Length: 39
⇐
⇐ error: bad request - no such identifier

Some programming libraries make it a little difficult to read the content following an error status code. For example, from Java, it is necessary to explicitly switch between the input and error streams based on the status code:

java.net.HttpURLConnection c;
java.io.InputStream s;
...
if (c.getResponseCode() < 400) {
  s = c.getInputStream();
} else {
  s = c.getErrorStream();
}
// read from s...
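
Python's urllib behaves similarly: an error status code raises urllib.error.HTTPError, and the response body must be read from the exception object. The sketch below shows one way to treat both cases uniformly; the function name fetch and its opener parameter (which exists only so the function can be exercised without a network connection) are assumptions made for illustration.

```python
import urllib.error
import urllib.request

def fetch(request, opener=urllib.request.urlopen):
    """Return (HTTP status code, body text) for success and error alike."""
    try:
        with opener(request) as response:
            return response.status, response.read().decode("UTF-8")
    except urllib.error.HTTPError as e:
        # HTTPError doubles as a response object, so the body is readable.
        return e.code, e.read().decode("UTF-8")
```

The returned body can then be examined for the "error: reason" status line regardless of which path was taken.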

Operation: get identifier metadata

Metadata can be retrieved for any existing identifier; no authentication is required. Simply issue a GET request to the identifier's EZID URL. Here is a sample interaction:

⇒ GET /id/ark:/99999/fk4cz3dh0 HTTP/1.1
⇒ Host: ezid.cdlib.org

⇐ HTTP/1.1 200 OK
⇐ Content-Type: text/plain; charset=UTF-8
⇐ Content-Length: 208
⇐
⇐ success: ark:/99999/fk4cz3dh0
⇐ _created: 1300812337
⇐ _updated: 1300913550
⇐ _target: http://www.gutenberg.org/ebooks/7178
⇐ _profile: erc
⇐ erc.who: Proust, Marcel
⇐ erc.what: Remembrance of Things Past
⇐ erc.when: 1922

The first line of the response body is a status line. Assuming success (see Error reporting above), the remainder of the status line echoes the canonical form of the requested identifier.

The remaining lines are metadata element name/value pairs serialized per ANVL rules; see Request & response bodies above. The order of elements is undefined. Element names beginning with an underscore ("_", U+005F) are reserved for use by EZID; their meanings are described in Internal metadata below. Some elements may be drawn from citation metadata standards; see Metadata profiles below.

EZID also supports a more flexible identifier lookup operation; see Suffix passthrough / prefix matching below.

Operation: create identifier

An identifier can be "created" by sending a PUT request to the identifier's EZID URL. Here, identifier creation means establishing a record of the identifier in EZID (to be successful, no such record can already exist). Authentication is required, and the user must have permission to create identifiers in the namespace (or "shoulder") named by the identifier's prefix. Users can view the namespaces available to them by visiting the EZID UI and navigating to the Create ID tab. For example, if the user has permission to create identifiers in the general EZID ARK (ark:/13030/c7) namespace, then the user may create identifiers beginning with "ark:/13030/c7".

A request body is optional; if present, it defines the identifier's starting metadata. There are no restrictions on what metadata elements can be submitted, but a convention has been established for naming metadata elements, and EZID has built-in support for certain sets of metadata elements; see Metadata profiles below. A few of the internal EZID metadata elements may be set; see Internal metadata below.

Here's a sample interaction creating an ARK identifier:

⇒ PUT /id/ark:/99999/fk4test HTTP/1.1
⇒ Host: ezid.cdlib.org
⇒ Content-Type: text/plain; charset=UTF-8
⇒ Content-Length: 30
⇒
⇒ _target: http://www.cdlib.org/

⇐ HTTP/1.1 201 CREATED
⇐ Content-Type: text/plain; charset=UTF-8
⇐ Content-Length: 27
⇐
⇐ success: ark:/99999/fk4test

The return is a status line. The normalized form of the identifier is returned as shown above, but if a DOI was created, the status line also includes, separated by a pipe character ("|", U+007C), the identifier's "shadow ARK" (an ARK identifier that is an alias for the created identifier; deprecated). Note that different identifier schemes have different normalization rules (e.g., DOIs are normalized to all uppercase letters). Here's a sample interaction creating a DOI identifier:

⇒ PUT /id/doi:10.9999/test HTTP/1.1
⇒ Host: ezid.cdlib.org

⇐ HTTP/1.1 201 CREATED
⇐ Content-Type: text/plain; charset=UTF-8
⇐ Content-Length: 43
⇐
⇐ success: doi:10.9999/TEST | ark:/b9999/test
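
The request and the two possible shapes of the success line can be sketched in Python as follows. This is an illustration under stated assumptions: the function names create_request and parse_created are hypothetical, authentication headers are omitted (see Authentication above), and the request is built but not sent.

```python
import urllib.request

def create_request(identifier, anvl_body=b""):
    """Build (but do not send) a PUT request that creates an identifier."""
    r = urllib.request.Request(
        "https://ezid.cdlib.org/id/" + identifier,
        data=anvl_body or None, method="PUT")
    r.add_header("Content-Type", "text/plain; charset=UTF-8")
    return r

def parse_created(detail):
    """Split the text after 'success:' into (identifier, shadow ARK or None)."""
    identifier, _, shadow = detail.partition("|")
    return identifier.strip(), shadow.strip() or None

r = create_request("ark:/99999/fk4test", b"_target: http://www.cdlib.org/")
print(r.get_method(), r.full_url)
print(parse_created("doi:10.9999/TEST | ark:/b9999/test"))
```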

Operation: mint identifier

Minting an identifier is the same as creating an identifier, but instead of supplying a complete identifier, the client specifies only a namespace (or "shoulder") that forms the identifier's prefix, and EZID generates an opaque, random string for the identifier's suffix. An identifier can be minted by sending a POST request to the URL https://ezid.cdlib.org/shoulder/shoulder, where the final path component, shoulder, stands for the desired namespace. For example:

⇒ POST /shoulder/ark:/13030/c7 HTTP/1.1
⇒ Host: ezid.cdlib.org
⇒ Content-Type: text/plain; charset=UTF-8
⇒ Content-Length: 30
⇒
⇒ _target: http://www.cdlib.org/

⇐ HTTP/1.1 201 CREATED
⇐ Content-Type: text/plain; charset=UTF-8
⇐ Content-Length: 29
⇐
⇐ success: ark:/13030/c79cz3dh0

Aside from specifying a complete identifier versus specifying a shoulder only, the create and mint operations operate identically. Authentication is required to mint an identifier; namespace permission is required; and permissions can be viewed in the EZID UI under the Create ID tab. The request and response bodies are identical.

EZID automatically embeds the newly-minted identifier in certain types of uploaded metadata. See Metadata profiles below for when this is performed. Additionally, EZID replaces all occurrences of the string "${identifier}" in the target URL with the newly-minted identifier.
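
A mint request that relies on the "${identifier}" substitution might be built as in the Python sketch below. The function name mint_request and the example.org target are hypothetical, authentication is omitted, and the target is assumed to contain no characters that require percent-encoding.

```python
import urllib.request

def mint_request(shoulder, target_template):
    """Build (but do not send) a POST request that mints an identifier."""
    body = ("_target: " + target_template).encode("UTF-8")
    r = urllib.request.Request(
        "https://ezid.cdlib.org/shoulder/" + shoulder,
        data=body, method="POST")
    r.add_header("Content-Type", "text/plain; charset=UTF-8")
    return r

# EZID is described above as replacing ${identifier} with the minted identifier.
r = mint_request("ark:/13030/c7", "http://example.org/${identifier}")
print(r.get_method(), r.full_url)
```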

Operation: update identifier

An identifier's metadata can be updated by sending a POST request to the identifier's EZID URL. Authentication is required; only the identifier's owner and certain other users may update the identifier (see Ownership model below).

Metadata elements are operated on individually. If the identifier already has a value for a metadata element included in the request body, the value is overwritten, otherwise the element and its value are added. Only a few of the reserved EZID metadata elements may be updated; see Internal metadata below. Here's a sample interaction:

⇒ POST /id/ark:/99999/fk4cz3dh0 HTTP/1.1
⇒ Host: ezid.cdlib.org
⇒ Content-Type: text/plain; charset=UTF-8
⇒ Content-Length: 30
⇒
⇒ _target: http://www.cdlib.org/

⇐ HTTP/1.1 200 OK
⇐ Content-Type: text/plain; charset=UTF-8
⇐ Content-Length: 29
⇐
⇐ success: ark:/99999/fk4cz3dh0

The return is a status line. Assuming success (see Error reporting above), the remainder of the status line echoes the canonical form of the identifier in question.

To delete a metadata element, set its value to the empty string.
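
Updates and deletions can therefore travel in the same request body. The Python sketch below reuses the escape function shown under Request & response bodies; the function name update_body and the convention of passing None to request a deletion are assumptions made for illustration.

```python
import re

def escape(s):
    """Percent-encode structural characters, as shown earlier in this document."""
    return re.sub("[%:\r\n]", lambda c: "%%%02X" % ord(c.group(0)), s)

def update_body(changes):
    """Serialize changes; a value of None requests deletion of the element."""
    lines = []
    for name, value in changes.items():
        # An empty value is the deletion command.
        lines.append("%s: %s" % (escape(name), escape(value or "")))
    return "\n".join(lines).encode("UTF-8")

print(update_body({"erc.who": "Proust, Marcel", "erc.when": None}))
```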

Operation: create or update identifier

An identifier can be created or updated in one interaction; the specific operation performed will depend on whether the identifier already exists or not. To do so, issue a create operation as described under Operation: create identifier above, but add an update_if_exists=yes URL query parameter to the PUT request. EZID returns a 201 HTTP status code if the identifier was created or a 200 HTTP status code if the identifier already existed and was successfully updated. The response body is a status line as described previously. Here's a sample request:

⇒ PUT /id/ark:/99999/fk4test?update_if_exists=yes HTTP/1.1
⇒ Host: ezid.cdlib.org
⇒ Content-Type: text/plain; charset=UTF-8
⇒ Content-Length: 30
⇒
⇒ _target: http://www.cdlib.org/
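
On the client side, the only differences from a plain create are the query parameter and the interpretation of the status code. A small Python sketch, with hypothetical helper names:

```python
import urllib.parse

def create_or_update_url(identifier):
    """Return the EZID URL with the update_if_exists=yes query parameter."""
    query = urllib.parse.urlencode({"update_if_exists": "yes"})
    return "https://ezid.cdlib.org/id/" + identifier + "?" + query

def outcome(http_status):
    """Map the HTTP status code of the response to what happened."""
    return {201: "created", 200: "updated"}.get(http_status, "failed")

print(create_or_update_url("ark:/99999/fk4test"))
print(outcome(201), outcome(200))
```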

Operation: delete identifier

An identifier that has only been reserved can be deleted by sending a DELETE request to the identifier's EZID URL. We emphasize that only reserved identifiers may be deleted; see Identifier status below. Authentication is required; only the identifier's owner and certain other users may delete the identifier (see Ownership model below).

Here's a sample interaction:

⇒ DELETE /id/ark:/99999/fk4cz3dh0 HTTP/1.1
⇒ Host: ezid.cdlib.org

⇐ HTTP/1.1 200 OK
⇐ Content-Type: text/plain; charset=UTF-8
⇐ Content-Length: 29
⇐
⇐ success: ark:/99999/fk4cz3dh0

The return is a status line. Assuming success (see Error reporting above), the remainder of the status line echoes the canonical form of the identifier just deleted.

Suffix passthrough / prefix matching

The N2T resolver—the principal resolver for ARK identifiers—supports "suffix passthrough," a capability that allows an identifier to be resolved even if it has not been explicitly registered, so long as some prefix of the identifier has. In such a case, N2T locates the longest matching prefix (the "root" identifier) and appends the extra characters in the supplied identifier (the "suffix") to the root identifier's target URL before redirecting. For example, if identifier ark:/99999/fk4/root has been registered with EZID and has target URL http://www.cdlib.org, then N2T resolves ark:/99999/fk4/root/andmore to http://www.cdlib.org/andmore. The capability is so-named because the suffix is effectively "passed through" to the receiving server.

EZID supports a similar capability. If a request to view identifier metadata (see Operation: get identifier metadata above) is accompanied by a prefix_match=yes URL query parameter, then EZID returns metadata for the longest matching identifier (if there is one). If an identifier other than the one requested is returned, the status line includes a note to that effect. Here is a sample interaction that continues the previous example:

⇒ GET /id/ark:/99999/fk4/root/andmore?prefix_match=yes HTTP/1.1
⇒ Host: ezid.cdlib.org

⇐ HTTP/1.1 200 OK
⇐ Content-Type: text/plain; charset=UTF-8
⇐ Content-Length: 244
⇐
⇐ success: ark:/99999/fk4/root in_lieu_of ark:/99999/fk4/root/andmore
⇐ _target: http://www.cdlib.org
⇐ ...
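
A client can detect that a different identifier was returned by looking for the "in_lieu_of" note. The Python sketch below assumes the note always has the form shown in the sample interaction; the function name parse_prefix_match is illustrative.

```python
def parse_prefix_match(detail):
    """Split the text after 'success:' into (returned, requested or None)."""
    found, sep, requested = detail.partition(" in_lieu_of ")
    if not sep:
        return found.strip(), None  # the requested identifier itself matched
    return found.strip(), requested.strip()

print(parse_prefix_match(
    "ark:/99999/fk4/root in_lieu_of ark:/99999/fk4/root/andmore"))
```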

Ownership model

EZID maintains ownership information about identifiers and uses that information to enforce access control.

The ownership model employed by EZID is hierarchical: each identifier has one owner, which is an EZID user; each EZID user belongs to one group; and each group belongs to one realm. Permission to create identifiers is governed by the namespaces (or "shoulders") that have been assigned to a user by an EZID administrator. But once created, permission to subsequently update an identifier is governed solely by the identifier's ownership. An identifier may be updated only by its owner, with two exceptions:

Proxies can be set up and managed in the EZID UI, Account Settings tab. Group administrators can be appointed only by an EZID administrator.

Proxies and group administrators are independent concepts. A group administrator may also be a proxy, and may also have proxies.

Identifier status

Each identifier in EZID has a status. The status is recorded as the value of the "_status" reserved metadata element (see Internal metadata below) and may be one of:

public
The default value.
reserved
The identifier is known only to EZID. This status may be used to reserve an identifier name within EZID without advertising the identifier's existence to resolvers and other external services. A reserved identifier may be deleted.
unavailable
The identifier is public, but the object referenced by the identifier is not available. A reason for the object's unavailability may optionally follow the status separated by a pipe character ("|", U+007C), e.g., "unavailable | withdrawn by author". The identifier redirects to an EZID-provided "tombstone" page (an HTML page that displays the identifier's citation metadata and the reason for the object's unavailability) regardless of its target URL.
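
When reading a "_status" value, a client may want to separate the status from the optional reason. A minimal Python sketch, with the function name parse_status_value chosen for illustration:

```python
def parse_status_value(value):
    """Split a _status value into (status, reason or None)."""
    status, _, reason = value.partition("|")
    status = status.strip()
    if status not in ("public", "reserved", "unavailable"):
        raise ValueError("unknown status: " + status)
    return status, reason.strip() or None

print(parse_status_value("unavailable | withdrawn by author"))
print(parse_status_value("public"))
```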

An identifier's status may be changed by setting a new value for the aforementioned "_status" metadata element. EZID permits only certain status transitions:

Internal metadata

Metadata element names beginning with an underscore ("_", U+005F) are reserved for use by EZID. The reserved elements below are returned by the EZID API, and have the following meanings. A check mark in the first column indicates the element is updatable by clients.

_owner
The identifier's owner. Only certain ownership changes are permitted; see Ownership model above. Example: jsmith
_ownergroup
The identifier's owning group, which is currently restricted to be the identifier's owner's group. Example: ucla
_created
The time the identifier was created expressed as a Unix timestamp. Example: 1300812337
_updated
The time the identifier was last updated expressed as a Unix timestamp. Example: 1300913550
_target
The identifier's target URL. Defaults to the identifier's EZID URL. That is, the default target URL for identifier foo is the self-referential URL https://ezid.cdlib.org/id/foo. Note that creating or updating the target URL of a DOI identifier may take up to 30 minutes to take effect in the Handle System.
_profile
The identifier's preferred metadata profile (see Metadata profiles next). Example: erc
_status
The identifier's status (see Identifier status above). Example: unavailable | withdrawn by author
_export
Determines if the identifier is publicized by exporting it to external indexing and harvesting services. Must be "yes" or "no"; defaults to "yes". Example: yes
_datacenter
DataCite DOIs only. The datacenter at which the identifier is registered (or will be registered, in the case of a reserved identifier). Example: CDL.CDL
_crossref
If returned, indicates that the identifier is registered with Crossref (or, in the case of a reserved identifier, will be registered), and also indicates the status of the registration process. When setting, must be set to "yes". See Crossref registration below for more information. Example: yes | successfully registered
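
The "_created" and "_updated" values arrive as strings holding Unix timestamps. A short Python sketch of converting them to timezone-aware datetimes follows; the function name to_datetime is illustrative.

```python
from datetime import datetime, timezone

def to_datetime(unix_timestamp):
    """Convert a Unix timestamp string (seconds since the epoch) to UTC."""
    return datetime.fromtimestamp(int(unix_timestamp), tz=timezone.utc)

created = to_datetime("1300812337")
print(created.isoformat())
```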

Metadata profiles

EZID allows "citation metadata" to be stored with an identifier, i.e., metadata that describes the object referenced by the identifier or that otherwise gives the meaning of the identifier. In certain cases certain metadata elements are required to be present; see Metadata requirements & mapping below. This section describes only the general structure and naming of citation metadata in EZID.

EZID supports several citation metadata "profiles," or standard sets of citation metadata elements. By convention, a metadata profile is referred to using a simple, lowercase name, e.g., "erc", and elements belonging to that profile are referred to using the syntax "profile.element", e.g., "erc.who".

Currently EZID treats profiles entirely separately, and thus an identifier may have values for multiple metadata profiles simultaneously. However, we anticipate that EZID will provide metadata cross-walking in the future, in which case setting a value for an element in one profile will automatically provide a value for equivalent elements in other profiles. For this reason, clients are encouraged to pick one profile to populate per identifier.

The "_profile" internal metadata element defines the identifier's preferred metadata profile (typically the only profile for which it has metadata). There is no restriction on what metadata elements may be bound to an identifier, and hence clients are free to use alternate citation profiles or no citation profile at all. However, EZID's UI is, and its future metadata cross-walking support will be, limited to those profiles that it explicitly supports.

  1. Profile "erc". These elements are drawn from Kernel Metadata and Electronic Resource Citations (ERCs). This profile aims at universal citations: any kind of object (digital, physical, abstract) or agent (person, group, software, satellite) for any purpose (research, education, entertainment, administration), any subject focus (oceanography, sales, religion, archiving), and any medium (television, newspaper, database, book). This is the default profile for ARK and UUID identifiers.
Element Definition
erc.who

The name of an entity (person, organization, or service) responsible for creating the content or making it available. For an article, this could be an author. Put name parts in "sort-friendly" order, such as:

  • van Gogh, Vincent,
  • Hu Jintao
  • Gilbert, William, Sir,,; Sullivan, Arthur, Sir,

Separate multiple names with ";". Append one or more final commas (",") to indicate that one or more internal commas can be used as inversion points to recover natural word order (if different from sort-friendly word order).

erc.what

A name or other human-oriented identifier given to the resource. For an article, this could be a title such as:

  • Moby Dick
  • Scarlet Pimpernel, The,

Use sort-friendly name parts and final commas in the same way as for the erc.who element.

erc.when

A point or period of time important in the lifecycle of the resource, often when it was created, modified, or made available. For an article, this could be the date it was written, such as:

  • 2009.04.23
  • 1924~
  • BCE0386
  • 1998-2003; 2008-

A date range (which can be open ended) may be useful, such as to indicate the years during which a periodical operated. Use ";" to separate entries and "~" to indicate approximation.

As a special case, an entire ANVL document containing ERC metadata may be bound to the metadata element "erc". Care should be taken to escape line terminators in the document (as is true for all metadata element values; see Request & response bodies above). For example, the ANVL document:

who: Proust, Marcel
what: Remembrance of Things Past

would be expressed as the single value:

erc: who: Proust, Marcel%0Awhat: Remembrance of Things Past
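
The single-line value above can be produced mechanically. The Python sketch below uses a value-only escaping function (the name escape_value is an assumption) that leaves colons alone, since the text above requires colons to be escaped only in element names.

```python
import re

def escape_value(s):
    """Percent-encode '%' and line terminators; colons are left as-is in values."""
    return re.sub("[%\r\n]", lambda c: "%%%02X" % ord(c.group(0)), s)

erc_document = "who: Proust, Marcel\nwhat: Remembrance of Things Past"
line = "erc: " + escape_value(erc_document)
print(line)
```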
  2. Profile "datacite". These elements are drawn from the DataCite Metadata Scheme for the Publication and Citation of Research Data. This is the default profile for DOI identifiers.
Element Definition
datacite.creator

The main researchers involved in producing the data, or the authors of the publication in priority order. Each name may be a corporate, institutional, or personal name. In personal names list family name before given name, as in:

  • Shakespeare, William

Separate multiple names with ";". Non-roman names should be transliterated according to the ALA-LC schemes.

datacite.title

A name or title by which the data or publication is known.

datacite.publisher

A holder of the data (e.g., an archive) or the institution which submitted the work. In the case of datasets, the publisher is the entity primarily responsible for making the data available to the research community.

datacite.publicationyear

The year when the data was or will be made publicly available. If an embargo period is in effect, use the year when the embargo period ends.

datacite.resourcetype

The general type and, optionally, specific type of the data. The general type must be one of the controlled vocabulary terms defined in the DataCite Metadata Scheme:

  • Audiovisual
  • Collection
  • Dataset
  • Event
  • Image
  • InteractiveResource
  • Model
  • PhysicalObject
  • Service
  • Software
  • Sound
  • Text
  • Workflow
  • Other

Specific types are unconstrained. If a specific type is given, it must be separated from the general type by a forward slash ("/"), as in:

  • Image/Photograph
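
A client can check a "datacite.resourcetype" value against the vocabulary above before uploading it. The following Python sketch is a client-side convenience only and says nothing about how EZID itself validates the value; the function name check_resourcetype is illustrative.

```python
GENERAL_TYPES = {
    "Audiovisual", "Collection", "Dataset", "Event", "Image",
    "InteractiveResource", "Model", "PhysicalObject", "Service",
    "Software", "Sound", "Text", "Workflow", "Other",
}

def check_resourcetype(value):
    """Return True if the general type (before any '/') is in the vocabulary."""
    general, _, _specific = value.partition("/")
    return general in GENERAL_TYPES

print(check_resourcetype("Image/Photograph"))
print(check_resourcetype("Picture"))
```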

Alternatively, an entire XML document adhering to the DataCite Metadata Scheme schema may be bound to the metadata element "datacite". Note that EZID sets the identifier embedded in the document to the identifier being operated on; thus it need not be specified by the client. The <identifier> element must still be included in the document, though, so the XML document may resemble:

<?xml version="1.0"?>
<resource xmlns="http://datacite.org/schema/kernel-4"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="...">
  <identifier identifierType="DOI">(:tba)</identifier>
  ...
</resource>

If an XML document is bound to a non-DOI identifier then, in an extension to the DataCite schema, the identifier type in the document must be set to "ARK" or "UUID" as appropriate.

Care should be taken to escape line terminators and percent signs in the document (as is true for all metadata element values; see Request & response bodies above). Once properly escaped, the uploaded metadata will resemble:

datacite: <?xml version="1.0"?>%0A<resource...
  3. Profile "dc". These elements are drawn from the Dublin Core Metadata Element Set.
Element Definition
dc.creator

An entity primarily responsible for making the content of the resource. Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.

dc.title

A name given to the resource. Typically, a Title will be a name by which the resource is formally known.

dc.publisher

An entity responsible for making the resource available. Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.

dc.date

A date associated with an event in the life cycle of the resource. Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 and follows the YYYY-MM-DD format.

dc.type

The nature or genre of the resource. Recommended best practice is to use a term from the DCMI Type Vocabulary:

  • Collection
  • Dataset
  • Event
  • Image
  • InteractiveResource
  • MovingImage
  • PhysicalObject
  • Service
  • Software
  • Sound
  • StillImage
  • Text

  4. Profile "crossref". This profile consists of a single element, "crossref", whose value is Crossref deposit metadata (an XML document). Care should be taken to escape line terminators and percent signs in the document (as is true for all metadata element values; see Request & response bodies above). See Crossref registration below for more information on usage of this profile and element.

Metadata requirements & mapping

DOI identifiers must satisfy specific metadata requirements. A DataCite DOI created by EZID must have title, creator, publisher, and publication year metadata any time its status is not reserved (see Identifier status above). A Crossref DOI must have Crossref metadata at all times (see Crossref registration below), though the metadata need not be complete if the identifier is reserved. Other than that, EZID imposes no requirements on the presence or form of citation metadata, but uploading at least minimal citation metadata to EZID is strongly encouraged in all cases to record the identifier's meaning and to facilitate its long-term maintenance. Regardless of the metadata profile used, population of the "datacite.resourcetype" element is encouraged to support broad categorization of identifiers.

To satisfy the aforementioned DataCite DOI metadata requirements, EZID looks in order for:

  1. DataCite XML metadata bound to the "datacite" element;
  2. Individual elements from the "datacite" profile as described in Profile "datacite" ("datacite.title", etc.); and lastly
  3. Elements from the identifier's preferred metadata profile (see Metadata profiles above) that EZID is able to map to DataCite equivalents. For example, if the preferred profile is "erc", then EZID will map element "erc.who" to "datacite.creator".
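
The lookup order above can be sketched in Python as follows. This is only an illustration of the precedence, not EZID's implementation: the function name and the mapping table are hypothetical, and the table contains just the one mapping mentioned in the text ("erc.who" to "datacite.creator").

```python
# Hypothetical mapping table; only the mapping named in the text is included.
PROFILE_TO_DATACITE = {"erc": {"erc.who": "datacite.creator"}}


def find_datacite_source(metadata, wanted="datacite.creator"):
    """Return (source element, value) for a DataCite element, in lookup order."""
    # 1. A complete DataCite XML record takes precedence.
    if metadata.get("datacite"):
        return ("datacite", metadata["datacite"])
    # 2. Next, an individual element from the "datacite" profile.
    if metadata.get(wanted):
        return (wanted, metadata[wanted])
    # 3. Finally, an element of the preferred profile that maps to it.
    profile = metadata.get("_profile", "")
    for element, target in PROFILE_TO_DATACITE.get(profile, {}).items():
        if target == wanted and metadata.get(element):
            return (element, metadata[element])
    return (None, None)
```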

If no meaningful value is available for a required element, clients are encouraged to supply a standard machine-readable code drawn from the Kernel Metadata and Electronic Resource Citations (ERCs) specification. These codes have the common syntactic form "(:code)" and include:

Code Definition
(:unac) temporarily inaccessible
(:unal) unallowed; intentionally suppressed
(:unap) not applicable; makes no sense
(:unas) unassigned (e.g., untitled)
(:unav) unavailable; possibly unknown
(:unkn) known to be unknown (e.g., anonymous)
(:none) never had a value, never will
(:null) explicitly and meaningfully empty
(:tba) to be assigned or announced later
(:etal) too numerous to list (et alia)
(:at) the real value is at the given URL or identifier

A code may optionally be followed by the code's human-readable equivalent or a more specific description, as in:

who: (:unkn) anonymous donor
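
A client that reads such values back may want to separate the code from the optional description. The following is a minimal Python sketch, assuming the "(:code)" prefix appears at the start of the value; the function name split_code is hypothetical.

```python
import re

# Codes listed in the table above.
KNOWN_CODES = {"unac", "unal", "unap", "unas", "unav", "unkn",
               "none", "null", "tba", "etal", "at"}

_CODE_RE = re.compile(r"^\(:([a-z]+)\)\s*(.*)$", re.DOTALL)


def split_code(value):
    """Split '(:code) description' into (code, description).

    Returns (None, value) if the value does not start with a known code.
    """
    match = _CODE_RE.match(value.strip())
    if match and match.group(1) in KNOWN_CODES:
        return match.group(1), match.group(2)
    return None, value
```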

Crossref registration

A DOI identifier may be registered with either the Crossref or the DataCite registration agency. The choice of registration agency is not selectable, but is entirely determined by the identifier's shoulder. This section discusses registration with Crossref.

Once registered, an identifier cannot be removed from Crossref. If the identifier's status is set to unavailable (recall Identifier status, above), EZID will prepend "WITHDRAWN" to the title of the resource associated with the identifier, but the identifier remains in Crossref's systems.

Registering an identifier with Crossref involves three steps, the first of which is optional:

  1. Optionally set the "_crossref" reserved metadata element to "yes".
  2. Supply Crossref deposit metadata as the value of the "crossref" element.
  3. Set the "_profile" reserved metadata element to "crossref" to be able to view the metadata in the EZID UI.

These steps are discussed in more detail next.

Crossref registration is asynchronous. Registration is initiated by a create, mint, or update identifier request, when the identifier's status is public. Setting the "_crossref" reserved metadata element to "yes" in the request is optional. In responses, the "_crossref" element is always returned and has the value "yes" followed by a pipe character ("|", U+007C) followed by the status of the registration, e.g., "yes | registration in progress" or "yes | successfully registered". The status of the registration is updated automatically by EZID and may be polled by the client. If a warning or error occurred during registration, the status is followed by another pipe character and the message received from Crossref, e.g., "yes | registration failure | xml error...". Warnings and errors may also be viewed in the EZID UI and may also be emailed to a specified mailbox. Warnings and errors can be removed only by submitting new metadata and re-registering identifiers.
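
When polling, a client has to take the returned "_crossref" value apart. A minimal Python sketch, assuming the pipe-separated layout described above (the function name is hypothetical):

```python
def parse_crossref_status(value):
    """Split a returned "_crossref" value into (flag, status, message).

    Missing parts are returned as None.
    """
    # Split at most twice so that pipes inside the Crossref message survive.
    parts = [part.strip() for part in value.split("|", 2)]
    parts += [None] * (3 - len(parts))
    return tuple(parts)
```

A client might keep polling while the status is "registration in progress" and inspect the third component when it is not None.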

Crossref deposit metadata should adhere to the Crossref Deposit Schema, version 4.3.0 or later. The metadata should consist of the immediate child element of a <body> element, for example the <book> element used in the abridged example below.

(If an outer element such as <doi_batch> or <body> is nevertheless supplied, it will be stripped off.)

Although the Crossref deposit schema is quite flexible, and supports batch operations, EZID requires that the deposit metadata specify a single DOI identifier, i.e., a single <doi_data> element. This element should contain <doi> and <resource> subelements, which may be left empty. EZID replaces the contents of the <doi> element (if any) with the identifier in question and the contents of the <resource> element with the target URL. Here is an abridged example of deposit metadata:

<?xml version="1.0"?>
<book xmlns="http://www.crossref.org/schema/4.3.4"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.crossref.org/schema/4.3.4
  http://www.crossref.org/schema/deposit/crossref4.3.4.xsd"
  book_type="monograph">
  <book_metadata>
    <contributors>
      <person_name contributor_role="author" sequence="first">
        <given_name>Marcel</given_name>
        <surname>Proust</surname>
      </person_name>
    </contributors>
    <titles>
      <title>Remembrance of Things Past</title>
    </titles>
    ...
    <doi_data>
      <doi>(:tba)</doi>
      <resource>(:tba)</resource>
    </doi_data>
  </book_metadata>
</book>

In supplying an XML document as the value of element "crossref", care should be taken to escape line terminators and percent signs in the document (as is true for all metadata element values; see Request & response bodies above).

If the identifier's preferred metadata profile is "crossref", EZID automatically creates a DataCite Metadata Scheme record from the Crossref deposit metadata for display and search purposes. Where conversion values are missing (e.g., a journal does not have a creator) EZID supplies the code "(:unav)". This automatic conversion can be overridden by supplying an entire DataCite Metadata Scheme XML record as the value of the "datacite" element (see Profile "datacite" above). Additionally, individual DataCite elements (e.g., "datacite.title") may be specified to override selected portions of the automatic conversion.

Putting it all together, uploaded metadata in a Crossref registration request will resemble:

_crossref: yes
_profile: crossref
_target: http://...
crossref: <?xml version="1.0"?>%0A<book...
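
A request body like the one above can be assembled programmatically. The Python sketch below percent-encodes percent signs and line terminators in values, and additionally colons in element names, mirroring the Perl and Java examples in this document. The function names are hypothetical.

```python
import re


def escape(text, is_name=False):
    """Percent-encode '%', CR, and LF (plus ':' when escaping an element name)."""
    pattern = r"[%:\r\n]" if is_name else r"[%\r\n]"
    return re.sub(pattern, lambda m: "%%%02X" % ord(m.group(0)), text)


def to_anvl(metadata):
    """Serialize a dict of element names and values as a UTF-8 request body."""
    lines = ["%s: %s" % (escape(name, True), escape(value))
             for name, value in metadata.items()]
    return "\n".join(lines).encode("UTF-8")
```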

Testing the API

EZID provides three namespaces (or "shoulders") for testing purposes: ark:/99999/fk4 for ARK identifiers, doi:10.5072/FK2 for DataCite DOI identifiers, and doi:10.15697/ for Crossref DOI identifiers. Identifiers in these namespaces are termed "test identifiers." They are ordinary long-term identifiers in almost all respects, except that EZID deletes them after 2 weeks.
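
A client that wants to guard against accidentally treating test identifiers as permanent can check for these shoulders. A minimal Python sketch follows; the function name is hypothetical, and the DOI comparison is done in upper case because DOIs are case-insensitive.

```python
def is_test_identifier(identifier):
    """Return True if the identifier falls under one of the test shoulders."""
    if identifier.lower().startswith("doi:"):
        # DOIs are case-insensitive, so compare in upper case.
        doi = identifier[4:].upper()
        return doi.startswith(("10.5072/FK2", "10.15697/"))
    return identifier.startswith("ark:/99999/fk4")
```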

Test ARK identifiers resolve just as real ARK identifiers do through the N2T resolver, but test DOI identifiers do not resolve and do not appear in any DataCite or Crossref systems.

All user accounts are permitted to create test identifiers. EZID also provides an "apitest" account that is permitted to create only test identifiers. Contact us for the password for this account. Additionally, please contact us before embarking on any large-scale testing, specifically, before creating more than 10,000 test identifiers.

Test identifiers and reserved identifiers are orthogonal concepts. A test identifier has a limited lifetime and is deleted by EZID when it expires. A reserved identifier may be deleted by the owner while still in its reserved state, but once made public, is permanent. As evidence of this orthogonality, it is possible to create reserved test identifiers.

Server status

The status of the EZID server can be probed by issuing a GET request to the URL https://ezid.cdlib.org/status. If the server is up the response will resemble the following:

⇒ GET /status HTTP/1.1
⇒ Host: ezid.cdlib.org

⇐ HTTP/1.1 200 OK
⇐ Content-Type: text/plain; charset=UTF-8
⇐ Content-Length: 19
⇐
⇐ success: EZID is up
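
The same probe can be scripted. The following Python sketch uses only the standard library; the function names and the 10-second timeout are illustrative choices, not part of the API.

```python
import urllib.request


def parse_status_line(body):
    """Split 'success: EZID is up' into ('success', 'EZID is up')."""
    first_line = body.splitlines()[0] if body else ""
    status, _, remainder = first_line.partition(":")
    return status.strip(), remainder.strip()


def server_is_up(url="https://ezid.cdlib.org/status", timeout=10):
    """Return True if the status URL answers with a 'success' status line."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("UTF-8")
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
    return parse_status_line(body)[0] == "success"
```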

Python command line tool

ezid.py is a command line tool, written in Python, that is capable of exercising all API functions. It serves as an example of how to use the API from Python, but it's also useful in its own right as an easy, scriptable means of accessing EZID functionality. The general usage is:

% ezid.py credentials operation [arguments...]

Run the tool with no command line arguments for a complete usage statement; additional documentation is in the source code. To give a flavor of the tool's usage and capabilities here, a few examples follow.

To mint a test ARK identifier and supply initial metadata:

% ezid.py username:password mint ark:/99999/fk4 erc.who 'Proust, Marcel' \
    erc.what 'Remembrance of Things Past' erc.when 1922
success: ark:/99999/fk4gt78tq

To get identifier metadata:

% ezid.py -dt - view ark:/99999/fk4gt78tq
success: ark:/99999/fk4gt78tq
_created: 2013-05-17T18:17:14
_export: yes
_owner: user
_ownergroup: group
_profile: erc
_status: public
_target: https://ezid.cdlib.org/id/ark:/99999/fk4gt78tq
_updated: 2013-05-17T18:17:14
erc.what: Remembrance of Things Past
erc.when: 1922
erc.who: Proust, Marcel

The tool provides two mechanisms in addition to the command line for supplying metadata. If a metadata element name is an at-sign ("@", U+0040), the subsequent value is treated as a filename and metadata elements are read from the named ANVL-formatted file. For example, if file metadata.txt contains:

erc.who: Proust, Marcel
erc.what: Remembrance of Things Past
erc.when: 1922

Then a test ARK identifier with that metadata can be minted by invoking:

% ezid.py username:password mint ark:/99999/fk4 @ metadata.txt

And if a metadata element value has the form "@filename", the named file is read and treated as a single value. For example, if file metadata.xml contains a DataCite XML document, then a test DOI identifier with that document as the value of the "datacite" element can be minted by invoking:

% ezid.py username:password mint doi:10.5072/FK2 datacite @metadata.xml
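
For clients that prefer to call the API directly from Python rather than through ezid.py, a mint request can be sketched with the standard library alone. This is a minimal illustration, not an excerpt from ezid.py: the function names are hypothetical, and credentials are sent as an HTTP Basic Authorization header, as in the curl, PHP, and Perl examples in this document.

```python
import base64
import urllib.request


def build_mint_request(shoulder, username, password, body=b"",
                       server="https://ezid.cdlib.org"):
    """Build (but do not send) a POST request that mints under a shoulder."""
    request = urllib.request.Request(
        "%s/shoulder/%s" % (server, shoulder), data=body, method="POST")
    request.add_header("Content-Type", "text/plain; charset=UTF-8")
    token = base64.b64encode(("%s:%s" % (username, password)).encode("UTF-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def mint(request):
    """Send the request and return the remainder of the 'success' status line."""
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        status_line = response.read().decode("UTF-8").splitlines()[0]
    status, _, identifier = status_line.partition(": ")
    if status != "success":
        raise RuntimeError(status_line)
    return identifier
```

The body argument is expected to be UTF-8 encoded ANVL text, for example b"erc.who: Proust, Marcel".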

PHP examples

PHP is agnostic with respect to character sets and character set encoding; it operates on bytes only. The following examples assume that input data is already UTF-8 encoded and hence can be passed directly to EZID; if this is not the case, input data will need to be converted to UTF-8 using the functions PHP provides for that purpose.

Get identifier metadata:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://ezid.cdlib.org/id/identifier');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
print curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
print $output . "\n";
curl_close($ch);
?>

Create identifier:

<?php
$input = '_target: url
element1: value1
element2: value2';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://ezid.cdlib.org/id/identifier');
curl_setopt($ch, CURLOPT_USERPWD, 'username:password');
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'PUT');
curl_setopt($ch, CURLOPT_HTTPHEADER,
  array('Content-Type: text/plain; charset=UTF-8',
        'Content-Length: ' . strlen($input)));
curl_setopt($ch, CURLOPT_POSTFIELDS, $input);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
print curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
print $output . "\n";
curl_close($ch);
?>

Mint identifier:

<?php
$input = '_target: url
element1: value1
element2: value2';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://ezid.cdlib.org/shoulder/shoulder');
curl_setopt($ch, CURLOPT_USERPWD, 'username:password');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER,
  array('Content-Type: text/plain; charset=UTF-8',
        'Content-Length: ' . strlen($input)));
curl_setopt($ch, CURLOPT_POSTFIELDS, $input);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
print curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
print $output . "\n";
curl_close($ch);
?>

Update identifier:

<?php
$input = '_target: url';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://ezid.cdlib.org/id/identifier');
curl_setopt($ch, CURLOPT_USERPWD, 'username:password');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER,
  array('Content-Type: text/plain; charset=UTF-8',
        'Content-Length: ' . strlen($input)));
curl_setopt($ch, CURLOPT_POSTFIELDS, $input);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
print curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
print $output . "\n";
curl_close($ch);
?>

Perl examples

The following Perl examples use the libwww-perl (LWP) library.

To get identifier metadata, parse and decode it, and store it in a hash, %metadata:

use LWP::UserAgent;

$ua = LWP::UserAgent->new;
$r = $ua->get("https://ezid.cdlib.org/id/identifier");
if ($r->is_success) {
  ($statusline, $m) = split(/\n/, $r->decoded_content, 2);
  %metadata = map { map { s/%([0-9A-F]{2})/pack("C", hex($1))/egi; $_ }
    split(/: /, $_, 2) } split(/\n/, $m);
} else {
  print $r->code, $r->decoded_content;
}

The following example creates an identifier, supplying initial metadata values from a hash, %metadata. Note that LWP is particular about how https URLs are expressed. In an LWP request the protocol should be included but not a port number ("https://ezid.cdlib.org/..."), but conversely when supplying credentials the https port number should be included but not a protocol ("ezid.cdlib.org:443").

use Encode;
use HTTP::Request::Common;
use LWP::UserAgent;
use URI::Escape;

sub escape {
  (my $s = $_[0]) =~ s/([%:\r\n])/uri_escape($1)/eg;
  return $s;
}

%metadata = ( "_target" => "url",
  "element1" => "value1",
  "element2" => "value2" );
$ua = LWP::UserAgent->new;
$ua->credentials("ezid.cdlib.org:443", "EZID", "username", "password");
$r = $ua->request(PUT "https://ezid.cdlib.org/id/identifier",
  "Content-Type" => "text/plain; charset=UTF-8",
  Content => encode("UTF-8", join("\n",
    map { escape($_) . ": " . escape($metadata{$_}) } keys %metadata)));
print $r->code, $r->decoded_content unless $r->is_success;

To mint an identifier (in this case supplying no metadata initially), obtaining a new identifier, $identifier:

use HTTP::Request::Common;
use LWP::UserAgent;

$ua = LWP::UserAgent->new;
$ua->credentials("ezid.cdlib.org:443", "EZID", "username", "password");
$r = $ua->request(POST "https://ezid.cdlib.org/shoulder/shoulder",
  "Content-Type" => "text/plain; charset=UTF-8");
if ($r->is_success) {
  $identifier = $r->decoded_content =~ m/success: ([^ ]*)/ && $1;
} else {
  print $r->code, $r->decoded_content;
}

To update an identifier using values from a hash, %metadata:

use Encode;
use HTTP::Request::Common;
use LWP::UserAgent;
use URI::Escape;

sub escape {
  (my $s = $_[0]) =~ s/([%:\r\n])/uri_escape($1)/eg;
  return $s;
}

%metadata = ( "_target" => "url" );
$ua = LWP::UserAgent->new;
$ua->credentials("ezid.cdlib.org:443", "EZID", "username", "password");
$r = $ua->request(POST "https://ezid.cdlib.org/id/identifier",
  "Content-Type" => "text/plain; charset=UTF-8",
  Content => encode("UTF-8", join("\n",
    map { escape($_) . ": " . escape($metadata{$_}) } keys %metadata)));
print $r->code, $r->decoded_content unless $r->is_success;

Java example

A number of Java code snippets have been presented above. In the example below we combine them all into a runnable, end-to-end program that mints a test identifier and then retrieves and prints the identifier's metadata.

import java.io.*;
import java.net.*;
import java.util.*;

class harness {

    static String SERVER = "https://ezid.cdlib.org";
    static String USERNAME = "username";
    static String PASSWORD = "password";

    static class MyAuthenticator extends Authenticator {
        protected PasswordAuthentication getPasswordAuthentication () {
            return new PasswordAuthentication(
                USERNAME, PASSWORD.toCharArray());
        }
    }

    static class Response {

        int responseCode;
        String status;
        String statusLineRemainder;
        HashMap<String, String> metadata;

        public String toString () {
            StringBuffer b = new StringBuffer();
            b.append("responseCode=");
            b.append(responseCode);
            b.append("\nstatus=");
            b.append(status);
            b.append("\nstatusLineRemainder=");
            b.append(statusLineRemainder);
            b.append("\nmetadata");
            if (metadata != null) {
                b.append(" follows\n");
                Iterator<Map.Entry<String, String>> i =
                    metadata.entrySet().iterator();
                while (i.hasNext()) {
                    Map.Entry<String, String> e = i.next();
                    b.append(e.getKey() + ": " + e.getValue() + "\n");
                }
            } else {
                b.append("=null\n");
            }
            return b.toString();
        }

    }

    static String encode (String s) {
        return s.replace("%", "%25").replace("\n", "%0A").
            replace("\r", "%0D").replace(":", "%3A");
    }

    static String toAnvl (HashMap<String, String> metadata) {
        Iterator<Map.Entry<String, String>> i =
            metadata.entrySet().iterator();
        StringBuffer b = new StringBuffer();
        while (i.hasNext()) {
            Map.Entry<String, String> e = i.next();
            b.append(encode(e.getKey()) + ": " +
                     encode(e.getValue()) + "\n");
        }
        return b.toString();
    }

    static String decode (String s) {
        StringBuffer b = new StringBuffer();
        int i;
        while ((i = s.indexOf("%")) >= 0) {
            b.append(s.substring(0, i));
            b.append((char)
                     Integer.parseInt(s.substring(i+1, i+3), 16));
            s = s.substring(i+3);
        }
        b.append(s);
        return b.toString();
    }

    static String[] parseAnvlLine (String line) {
        String[] kv = line.split(":", 2);
        kv[0] = decode(kv[0]).trim();
        kv[1] = decode(kv[1]).trim();
        return kv;
    }

    static Response issueRequest (
        String method, String path, HashMap<String, String> metadata)
        throws Exception {
        HttpURLConnection c = (HttpURLConnection)
            (new URL(SERVER + "/" + path)).openConnection();
        c.setRequestMethod(method);
        c.setRequestProperty("Accept", "text/plain");
        if (metadata != null) {
            c.setDoOutput(true);
            c.setRequestProperty("Content-Type",
                                 "text/plain; charset=UTF-8");
            OutputStreamWriter w =
                new OutputStreamWriter(c.getOutputStream(), "UTF-8");
            w.write(toAnvl(metadata));
            w.flush();
        }
        Response r = new Response();
        r.responseCode = c.getResponseCode();
        InputStream is = r.responseCode < 400? c.getInputStream() :
            c.getErrorStream();
        if (is != null) {
            BufferedReader br = new BufferedReader(
                new InputStreamReader(is, "UTF-8"));
            String[] kv = parseAnvlLine(br.readLine());
            r.status = kv[0];
            r.statusLineRemainder = kv[1];
            HashMap<String, String> d = new HashMap<String, String>();
            String l;
            while ((l = br.readLine()) != null) {
                kv = parseAnvlLine(l);
                d.put(kv[0], kv[1]);
            }
            if (d.size() > 0) r.metadata = d;
        }
        return r;
    }

    public static void main (String[] args) throws Exception {

        Authenticator.setDefault(new MyAuthenticator());

        // Sample POST request.
        System.out.println("Issuing POST request...");
        HashMap<String, String> metadata =
            new HashMap<String, String>();
        metadata.put("erc.what", "a test");
        Response r = issueRequest(
            "POST", "shoulder/ark:/99999/fk4", metadata);
        System.out.print(r);

        // Sample GET request.
        System.out.println("\nIssuing GET request...");
        String id = r.statusLineRemainder;
        r = issueRequest("GET", "id/" + URLEncoder.encode(id, "UTF-8"),
                         null);
        System.out.print(r);

    }

}

curl examples

The EZID API can be exercised using the curl command line tool. The following examples assume metadata is UTF-8 encoded throughout.

To get identifier metadata, obtaining text formatted as described in Request & response bodies above:

curl https://ezid.cdlib.org/id/identifier

To mint an identifier:

curl -u username:password -X POST https://ezid.cdlib.org/shoulder/shoulder

A single metadata element can be specified on the command line. For example, to mint an identifier and specify a target URL at the same time:

curl -u username:password -X POST -H 'Content-Type: text/plain' \
  --data-binary '_target: url' https://ezid.cdlib.org/shoulder/shoulder

To specify more than one metadata element, the metadata must be placed in a file that is formatted as described in Request & response bodies. For example, to mint an identifier and upload metadata contained in a file metadata.txt:

curl -u username:password -X POST -H 'Content-Type: text/plain' \
  --data-binary @metadata.txt https://ezid.cdlib.org/shoulder/shoulder

Creating an identifier is similar to minting one, except that the HTTP method (-X option) is changed from POST to PUT and an identifier is specified instead of a shoulder. Here are the three examples above, but now creating an identifier:

curl -u username:password -X PUT https://ezid.cdlib.org/id/identifier

curl -u username:password -X PUT -H 'Content-Type: text/plain' \
  --data-binary '_target: url' https://ezid.cdlib.org/id/identifier

curl -u username:password -X PUT -H 'Content-Type: text/plain' \
  --data-binary @metadata.txt https://ezid.cdlib.org/id/identifier

To update identifier metadata:

curl -u username:password -X POST -H 'Content-Type: text/plain' \
  --data-binary '_target: url' https://ezid.cdlib.org/id/identifier

curl -u username:password -X POST -H 'Content-Type: text/plain' \
  --data-binary @metadata.txt https://ezid.cdlib.org/id/identifier

Batch processing

The API does not directly support batch processing, but EZID does provide two client tools, linked from this documentation, that can simplify the work of scripting a batch job. First and most generally, the Python command line tool can exercise all API functions and is straightforward to script. For example, to mint and print 100 test ARK identifiers:

#! /bin/bash
for i in {1..100}; do
  ezid.py username:password mint ark:/99999/fk4 | awk '{ print $2 }'
done

Second, the batch-register.py script automates several common types of batch processing. It reads an input CSV file containing identifier metadata, one row per identifier; transforms the metadata into EZID metadata as directed by a configuration file of mappings; creates or mints identifiers, or updates existing identifiers, using that metadata; and outputs a CSV file containing the created, minted, or updated identifiers and other information. Detailed usage information is contained in the script itself, but to give a taste of what it can do, given an input CSV file with columns,

title,author,orcid,publisher_name,publisher_place,url

a possible complete mapping file to mint DOI identifiers is shown below. The mappings reference both EZID metadata elements and, using XPath expressions, DataCite Metadata Scheme elements and attributes.

_profile = datacite
/resource/titles/title = $1
/resource/creators/creator/creatorName = $2
/resource/creators/creator/nameIdentifier = $3
/resource/creators/creator/nameIdentifier@nameIdentifierScheme = ORCID
/resource/publisher = $4 ($5)
/resource/publicationYear = 2018
/resource/resourceType@resourceTypeGeneral = Dataset
_target = $6

For another example, to update the statuses of a batch of existing identifiers to public, given an input file listing the identifiers (i.e., a CSV file with just one column), a mapping file would be:

_id = $1
_status = public
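
The "$n" column references in the mapping files above can be illustrated with a short Python sketch. It is not how batch-register.py is implemented; it only shows the substitution idea, assuming 1-based column numbers and "key = value" lines. The function name and the sample row are hypothetical.

```python
import re


def apply_mapping(mapping_lines, row):
    """Expand '$n' references (1-based column numbers) in 'key = value' lines."""
    result = {}
    for line in mapping_lines:
        key, _, template = line.partition(" = ")
        result[key.strip()] = re.sub(
            r"\$(\d+)", lambda m: row[int(m.group(1)) - 1], template.strip())
    return result
```

For a row whose fourth and fifth columns are a publisher name and place, the line "/resource/publisher = $4 ($5)" yields a value of the form "name (place)".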

Batch download

The metadata for all identifiers matching a set of constraints can be downloaded in one batch operation. Authentication is required, and the scope of the identifiers that can be downloaded in this way is implicitly restricted to those that are directly owned by or otherwise updatable by the requestor.

Batch download and harvesting (see OAI-PMH harvesting below) are similar but different operations. With batch download, the identifiers returned are restricted to those updatable by the requestor as noted above, but within that scope it is possible to download all identifiers, including reserved, unavailable, and test identifiers. By contrast, with harvesting, no authentication is required and the identifiers returned are not restricted by ownership, but only those identifiers that are public and exported and that satisfy several other quality criteria are returned.

Overview

The batch download process is asynchronous. A download is requested by issuing a POST request to

https://ezid.cdlib.org/download_request

The content type of the request body must be "application/x-www-form-urlencoded" and the body must include one POST parameter, "format", specifying the download format, and may include additional parameters (see Parameters below) specifying search criteria and download format and notification options. The return is a status line indicating either error (see Error reporting above) or success. If successful, the status line includes a URL from which the download can be retrieved. Here's a sample interaction:

⇒ POST /download_request HTTP/1.1
⇒ Host: ezid.cdlib.org
⇒ Content-Type: application/x-www-form-urlencoded
⇒ Content-Length: 19
⇒
⇒ format=xml&type=ark

⇐ HTTP/1.1 200 OK
⇐ Content-Type: text/plain; charset=UTF-8
⇐ Content-Length: 58
⇐
⇐ success: https://ezid.cdlib.org/download/da543b91a0.xml.gz

The download will not be available immediately, but clients can poll the returned URL; the server returns HTTP status code 404 (Not Found) if the download is not yet ready. As part of the request, clients can also specify an email address to which a notification will be sent when the download becomes available. Downloads are retained for one week.
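
The polling step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the polling interval, and the attempt limit are arbitrary choices, and the download URL is fetched without credentials. The fetch and sleep parameters exist only so the loop can be exercised without network access.

```python
import time
import urllib.error
import urllib.request


def wait_for_download(url, fetch=urllib.request.urlopen, interval=60,
                      attempts=60, sleep=time.sleep):
    """Poll the download URL until it stops returning 404, then return the bytes."""
    for _ in range(attempts):
        try:
            with fetch(url) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            if error.code != 404:
                raise  # anything other than "not ready yet" is a real failure
        sleep(interval)
    raise TimeoutError("download not ready: " + url)
```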

Download formats

Identifier metadata is returned in one of three formats; which format is determined by the "format" parameter. In all cases, the text encoding is UTF-8 and the metadata is compressed with either gzip or ZIP as determined by the "compression" parameter.

  1. Format "anvl". This format is effectively the concatenation of performing a get metadata operation (see Operation: get identifier metadata above) on each selected identifier. Metadata is returned in ANVL format and employs percent-encoding as described in Request & response bodies. The metadata for an identifier is preceded by a header line that contains two colons (":", U+003A) followed by the identifier. Blocks of metadata are separated by blank lines. For example:

    :: ark:/99999/fk4gt78tq
    _created: 1300812337
    _export: yes
    _owner: apitest
    _ownergroup: apitest
    _profile: erc
    _status: public
    _target: http://www.gutenberg.org/ebooks/7178
    _updated: 1300913550
    erc.what: Remembrance of Things Past
    erc.when: 1922
    erc.who: Proust, Marcel
    
    :: doi:10.5072/FK2S75905Q
    _created: 1421276359
    _datacenter: CDL.CDL
    _export: yes
    _owner: apitest
    _ownergroup: apitest
    _profile: datacite
    _status: public
    _target: http://www.gutenberg.org/ebooks/26014
    _updated: 1421276359
    datacite: <?xml version="1.0"?>%0A<resource xmlns="http://...
    
  2. Format "csv". Metadata is returned as an Excel-compatible comma-separated values (CSV) table, one row per selected identifier. A header row lists column names. The columns to return must be specified using one or more "column" parameters; the order of columns in the table matches the parameter order. The columns that can be returned include all internal EZID metadata elements (refer to Internal metadata) and all citation metadata elements (refer to Metadata profiles). Additionally, the following columns may be requested:

    • _id

      The identifier.

    • _mappedCreator, _mappedTitle, _mappedPublisher, _mappedDate, _mappedType

      Creator, title, publisher, date, and type citation metadata as mapped from the identifier's preferred metadata profile.

    Continuing with the previous example, if the parameters are

    format=csv&column=_id&column=_owner&column=erc.when&column=_mappedCreator
    

    then the following table will be returned:

    _id,_owner,erc.when,_mappedCreator
    ark:/99999/fk4gt78tq,apitest,1922,"Proust, Marcel"
    doi:10.5072/FK2S75905Q,apitest,,Montagu Browne
    

    Note that for the CSV format only, line terminators in metadata values (both newlines ("\n", U+000A) and carriage returns ("\r", U+000D)) are converted to spaces.

  3. Format "xml". Metadata is returned as a single XML document. The root element, <records>, contains a <record> element for each selected identifier, and within each <record> element are <element> elements for each of the identifier's metadata elements. Thus the returned document will have the structure:

    <?xml version="1.0" encoding="UTF-8"?>
    <records>
      <record identifier="identifier">
        <element name="name">value</element>
        ...
      </record>
      ...
    </records>
    

    As a special case, XML metadata bound to a "datacite" or "crossref" element is directly embedded in the containing <element> element, i.e., the metadata will appear as an XML subelement and not as a string value.

    Continuing with the previous example, the return in XML format would be:

    <?xml version="1.0" encoding="UTF-8"?>
    <records>
      <record identifier="ark:/99999/fk4gt78tq">
        <element name="_created">1300812337</element>
        <element name="_export">yes</element>
        <element name="_owner">apitest</element>
        <element name="_ownergroup">apitest</element>
        <element name="_profile">erc</element>
        <element name="_status">public</element>
        <element name="_target">http://www.gutenberg.org/ebooks/7178</element>
        <element name="_updated">1300913550</element>
        <element name="erc.what">Remembrance of Things Past</element>
        <element name="erc.when">1922</element>
        <element name="erc.who">Proust, Marcel</element>
      </record>
      <record identifier="doi:10.5072/FK2S75905Q">
        <element name="_created">1421276359</element>
        <element name="_datacenter">CDL.CDL</element>
        <element name="_export">yes</element>
        <element name="_owner">apitest</element>
        <element name="_ownergroup">apitest</element>
        <element name="_profile">datacite</element>
        <element name="_status">public</element>
        <element name="_target">http://www.gutenberg.org/ebooks/26014</element>
        <element name="_updated">1421276359</element>
        <element name="datacite">
          <resource xmlns="http://datacite.org/schema/kernel-4">
            <identifier identifierType="DOI">10.5072/FK2S75905Q</identifier>
            <creators>
              <creator>
                <creatorName>Montagu Browne</creatorName>
              </creator>
            </creators>
            <titles>
              <title>Practical Taxidermy</title>
            </titles>
            <publisher>Charles Scribner's Sons</publisher>
            <publicationYear>1884</publicationYear>
            <resourceType resourceTypeGeneral="Text"/>
          </resource>
        </element>
      </record>
    </records>
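
Of the three formats above, "anvl" is straightforward to parse by hand. The Python sketch below assumes the download has already been decompressed and decoded as UTF-8; the function name is hypothetical. Percent-encoded sequences are decoded with urllib.parse.unquote, which interprets them as UTF-8.

```python
from urllib.parse import unquote


def parse_anvl_download(text):
    """Map each identifier to a dict of its metadata elements."""
    records = {}
    current = None
    # Line terminators inside values are percent-encoded, so "\n" only
    # ever separates lines.
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # blank lines only separate blocks
        if line.startswith("::"):
            current = records.setdefault(line[2:].strip(), {})
        elif current is not None:
            name, _, value = line.partition(":")
            current[unquote(name.strip())] = unquote(value.strip())
    return records
```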
    

Parameters

Unless otherwise noted, parameters are optional and not repeatable.

The remaining parameters are search constraints. Search constraints are logically ANDed together, but search constraint parameters that are repeated have the effect of creating a logical OR of the selected values. For example, parameter "status" can take on three possible values, "reserved", "public", or "unavailable". If no "status" parameter is specified, there is no constraint on identifier status; if "status=reserved" is specified, then only reserved identifiers are returned; and if "status=reserved&status=public" is specified, then reserved and public identifiers are returned (but not unavailable identifiers).
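
Repeating a parameter is easy to get wrong when a request body is built from a dictionary, since a dictionary holds only one value per key. The Python sketch below builds the form-encoded body from a list of pairs instead; the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_download_body(fmt, **constraints):
    """Encode "format" plus other parameters; list values repeat the parameter."""
    pairs = [("format", fmt)]
    for name, values in constraints.items():
        if isinstance(values, str):
            values = [values]
        pairs.extend((name, value) for value in values)
    return urlencode(pairs).encode("ascii")
```

For example, build_download_body("anvl", status=["reserved", "public"]) requests identifiers that are either reserved or public.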

Using curl to request a download

A batch download can easily be requested using the curl command line tool. Use curl's "-d" option to specify parameters, and use the "-u" option to supply credentials. For example:

curl -u username:password -d format=anvl -d type=ark -d type=doi \
  -d permanence=real https://ezid.cdlib.org/download_request

For even more convenience, a simple Bash script, batch-download.sh, turns a batch download into a one-step operation. The script issues a download request using curl, waits for the request to be processed, and when ready downloads to a file in the current directory. Its usage equivalent to the above example would be:

% batch-download.sh username password format=anvl type=ark type=doi permanence=real
submitting download request...
waiting......
9c02f494ab.txt.gz

OAI-PMH harvesting

EZID supports harvesting of identifiers and citation metadata via The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), version 2.0. The base URL for OAI-PMH access is

https://ezid.cdlib.org/oai

Only public, exported, non-test identifiers that have non-default target URLs and at least creator, title, and date citation metadata (in ERC terms, who/what/when metadata) are made available through OAI-PMH.

Harvesting and batch download are similar but different operations; see Batch download for the differences.

In returning an identifier's metadata, EZID maps citation metadata from the identifier's preferred metadata profile (see Metadata profiles above) to one of two delivery formats: Dublin Core (as required by the protocol) or DataCite. In the latter case, older DataCite XML metadata records stored in EZID are converted to version 4 of the DataCite schema for uniformity. Note that, in an extension to the DataCite schema, the identifier type for non-DOI identifiers is set to "ARK" or "UUID" as appropriate.
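
OAI-PMH requests are ordinary GET requests against the base URL with a "verb" argument. The Python sketch below builds such URLs; the function name is hypothetical, and "oai_dc" is the metadata prefix that OAI-PMH 2.0 defines for Dublin Core. The prefix EZID uses for the DataCite format is not assumed here; it can be discovered with the protocol's ListMetadataFormats verb.

```python
from urllib.parse import urlencode

OAI_BASE = "https://ezid.cdlib.org/oai"


def oai_url(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    return OAI_BASE + "?" + urlencode([("verb", verb)] + list(arguments.items()))
```

For example, oai_url("ListRecords", metadataPrefix="oai_dc") produces a URL that lists records in Dublin Core.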