
Is there ‘a’ standard format for citing data in papers?

Hello All,

RCUK open access policy states:

‘….the policy requires all research papers, if applicable, to include a statement on how underlying research materials, such as data, samples or models, can be accessed. However, the policy does not require that the data must be made open. If there are considered to be compelling reasons to protect access to the data, for example commercial or legitimate sensitivities around data derived from potentially identifiable human participants, these should be included in the statement’



Is anyone recommending a standard format that can be used as guidance for researchers who want to comply with this?

Just to clarify, I mean: is there a UK standard we can all refer to, safe in the knowledge that we are all referring to the same thing (in the same way as RCUK recommend the Research Information Network format for quoting funder references)? So I am not looking for a range of examples to explore – we’ve done lots of that – I’m looking for a definitive standard that is advertised to all.

A contact at the University of North Carolina suggests looking at the IASSIST guidance.



  • Author: Name(s) of each individual or organizational entity responsible for the creation of the dataset.
  • Date of Publication: Year the dataset was published or disseminated.
  • Title: Complete title of the dataset, including the edition or version number, if applicable.
  • Publisher and/or Distributor: Organizational entity that makes the dataset available by archiving, producing, publishing, and/or distributing the dataset.
  • Electronic Location or Identifier: Web address or unique, persistent, global identifier used to locate the dataset (such as a DOI). Append the date retrieved if the title and locator are not specific to the exact instance of the data you used.


  • Author: Smith, Tom W., Peter V. Marsden, and Michael Hout.
  • Date of Publication: 2011
  • Title: General Social Survey, 1972-2010 Cumulative File
  • Publisher and/or Distributor: Chicago, IL: National Opinion Research Center. Distributed by Ann Arbor, MI: Inter-university Consortium for Political and Social Research
  • Electronic Location or Identifier:  doi:10.3886/ICPSR31521.v1

At Glasgow we might add a description of the dataset to this – however, perhaps this is not necessary since it will be clear from the paper itself?

Addition of advice about what to say re protected access for some datasets might also be helpful.

Another suggestion was the CrossCite citation formatter (currently in beta) which works with both CrossRef and DataCite DOIs:


This tool takes a DOI and a target citation style (default APA) and produces (in theory) a correctly-formatted citation.
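As a rough illustration of what such a formatter does behind the scenes, here is a minimal sketch that builds (but does not send) a DOI content negotiation request for a formatted citation. It assumes the resolver at `https://doi.org/` honours the `text/x-bibliography` media type with a `style` parameter for the DOI in question; the function name `build_citation_request` is made up for this example.

```python
from urllib.parse import quote
from urllib.request import Request


def build_citation_request(doi: str, style: str = "apa") -> Request:
    """Build an HTTP request asking the DOI resolver for a formatted citation.

    Assumption: the registration agency behind the DOI supports
    content negotiation for 'text/x-bibliography'.
    """
    url = "https://doi.org/" + quote(doi)  # quote() leaves '/' and '.' intact
    accept = f"text/x-bibliography; style={style}"
    return Request(url, headers={"Accept": accept})


# Build a request for the ICPSR example above, in the default APA style.
req = build_citation_request("10.3886/ICPSR31521.v1")
print(req.full_url)
print(req.get_header("Accept"))

# To actually fetch the citation (needs network access):
#   from urllib.request import urlopen
#   text = urlopen(req).read().decode("utf-8")
```

The request is only constructed here, so the sketch runs offline; whether the returned text is "correctly formatted" still depends on the metadata held for that DOI.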

But what is the definitive specification or can we safely just pick one?

I have emailed RCUK to ask if they have any guidance.

Categories: C4D
  1. April 25, 2013 at 10:11 am

    I’d very strongly recommend using the DataCite method for citing datasets which is very similar to the IASSIST guidance in your blog post. (http://schema.datacite.org/meta/kernel-2.0/doc/DataCite-MetadataKernel_v2.0.pdf)

    “2.2 Citation
    Because many users of this scheme are members of a variety of academic disciplines, DataCite remains discipline‐agnostic concerning matters pertaining to academic style sheet requirements.
    Therefore, DataCite recommends rather than requires a particular citation format. In keeping with this approach, the following is the recommended format for rendering a DataCite citation for human readers using the first five properties of the scheme:

    Creator (PublicationYear): Title. Publisher. Identifier

    It may also be desirable to include information from two optional properties, Version and ResourceType (as appropriate). If so, the recommended form is as follows:

    Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

    For citation purposes, the Identifier may optionally appear both in its original format and in a linkable, http format, as it is practiced by the Organisation for Economic Co‐operation and Development (OECD), as shown below.

    Here are several examples:
    • Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127‐797. Geological Institute, University of Tokyo. doi:10.1594/PANGAEA.726855. http://dx.doi.org/10.1594/PANGAEA.726855
    • Geofon operator (2009): GEFON event gfz2009kciu (NW Balkan Region). GeoForschungsZentrum Potsdam (GFZ). doi:10.1594/GFG.GEOFON.gfz2009kciu.
    • Denhard, Michael (2009): dphase_mpeps: MicroPEPS LAF‐Ensemble run by DWD for the MAP D‐PHASE project. World Data Center for Climate. doi:10.1594/WDCC/dphase_mpeps. http://dx.doi.org/10.1594/WDCC/dphase_mpeps”
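To make the recommended rendering concrete, here is a minimal sketch that assembles a citation string in the "Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier" order quoted above, skipping the two optional properties when they are absent. The class and function names (`DatasetRecord`, `format_datacite_citation`) are hypothetical and not part of any DataCite tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    # The five properties used in the basic recommended form.
    creator: str
    publication_year: int
    title: str
    publisher: str
    identifier: str
    # The two optional properties mentioned in the guidance.
    version: Optional[str] = None
    resource_type: Optional[str] = None


def format_datacite_citation(rec: DatasetRecord) -> str:
    """Render a record in the order the DataCite guidance recommends."""
    parts = [rec.title, rec.version, rec.publisher, rec.resource_type]
    # Drop missing optional parts and avoid doubled full stops.
    body = ". ".join(p.strip().rstrip(".") for p in parts if p)
    return f"{rec.creator} ({rec.publication_year}): {body}. {rec.identifier}"


record = DatasetRecord(
    creator="Irino, T; Tada, R",
    publication_year=2009,
    title="Chemical and mineral compositions of sediments from ODP Site 127-797",
    publisher="Geological Institute, University of Tokyo",
    identifier="doi:10.1594/PANGAEA.726855",
)
print(format_datacite_citation(record))
```

Since DataCite only recommends this layout, a publisher's house style may still reorder or repunctuate the same elements.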

    We’re still at the stage of trying to reach consensus about how to cite data – I’m co-chair of the CODATA working group on data citation which will shortly be publishing a report on the current status of data citation, and will be coming up with a report on best practice next year.

    A vital part of data citation is that it is treated exactly the same way as a paper citation – making data a research output that’s equal in status to a paper.

    • April 25, 2013 at 10:20 am

      Thanks so much Sarah. Yes, we have included DataCite requirements at Glasgow. What we are looking for is a formal standard recommendation to go with the Open Access policy, including advice on how to cite restrictions to data. Would be good if one statement could be drawn up rather than lots of organisations creating their own. I guess this could be amended once the outcome of the CODATA working group is available next year.

  2. Caroline Wilkinson
    April 25, 2013 at 10:58 am

    Is the citation itself the right place to convey information about access/restrictions? In principle, citation should direct users to some form of landing page where this information is available. Also, access rights may change over time whereas a citation, once published, can’t be altered.

    As well as DataCite recommendations, UKDA have some good guidance on this http://data-archive.ac.uk/conditions/citing-data

    • April 25, 2013 at 11:04 am

      Hi Caroline,

      No, I personally do not think the citation is the right place to include restrictions; the landing page or equivalent is. This is borne out by a similar reply I received from an RCUK representative, so it may be that the wording needs amended in the guidance to make clear it is not necessary to have this information in the paper itself. I await confirmation from RCUK.

  3. April 25, 2013 at 12:48 pm

    I started working on this quite a few years ago, with my kcite plugin for wordpress; you mention citeproc in your blogpost. Kcite was the original motivation for the addition of citeproc metadata to crossref content negotiation services, with datacite shortly after; kcite uses this metadata to generate reference lists given a DOI (or indeed any URL nowadays).

    You can see the practical upshot of this on, for example, Henry Rzepa’s blog, where he cites his own data sets.


    And, yes, it works although with a number of different formats; one thing that is worth mentioning is that datacite stores names as a string (crossref breaks them into family and given names). So, author/year citation formats work well with crossref DOIs, less well with Datacite. You can see that problem on Henry’s site also.
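To show why the name representation matters for author/year styles, here is a small sketch over CSL JSON-style records, where a name is either split into `family`/`given` or held as a single `literal` string. The sample records are illustrative only, not real responses from Crossref or DataCite, and the function names are made up.

```python
def name_for_label(author: dict) -> str:
    """Pick the part of a CSL JSON name used in an author/year label."""
    if "family" in author:
        return author["family"]
    # No structured name: fall back to the whole unparsed string.
    return author.get("literal", "")


def author_year_label(item: dict) -> str:
    """Build a simple '(Author, Year)' label from a CSL JSON item."""
    names = [name_for_label(a) for a in item.get("author", [])]
    year = item["issued"]["date-parts"][0][0]
    lead = names[0] + " et al." if len(names) > 2 else " and ".join(names)
    return f"({lead}, {year})"


# Illustrative record with names broken into family and given parts.
split_names = {
    "author": [
        {"family": "Smith", "given": "Tom W."},
        {"family": "Marsden", "given": "Peter V."},
        {"family": "Hout", "given": "Michael"},
    ],
    "issued": {"date-parts": [[2011]]},
}

# The same authors held as single strings.
string_names = {
    "author": [
        {"literal": "Smith, Tom W."},
        {"literal": "Marsden, Peter V."},
        {"literal": "Hout, Michael"},
    ],
    "issued": {"date-parts": [[2011]]},
}

print(author_year_label(split_names))   # (Smith et al., 2011)
print(author_year_label(string_names))  # (Smith, Tom W. et al., 2011)
```

With split names the label comes out clean; with a single string the formatter has no reliable way to isolate the family name, which is the degradation described above.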

    Now, from all this I draw the conclusion that citation format is more or less entirely irrelevant. It really doesn’t matter. If you want a deeply thoughtful take on the whole issue, I would read this:


    The reason it doesn’t matter is that if the DOI (or pubmed ID, or URI, or whatever) is correct, then this should be all that is necessary; the rest is only necessary in case the author has made a mistake with the DOI. If, and only if, you directly use the identifier to generate the reference, then this necessity goes away; if the DOI is wrong, the visual form of the reference will break as well.

    From which I draw my last conclusion. Any scientist found adding or editing bibliographic metadata in their tool of choice should be flogged publicly.

    • April 26, 2013 at 8:23 am

      Thanks Phillip,

      Yes I agree that the actual DOI etc format is open to some different choices.

      So I think you are suggesting we ask the authors to put a link in their paper – DOI, URL or whatever – to where more info can be found about the dataset, and not worry too much about the specifics. E.g. if I don’t have an electronic dataset, a link to my repository record (complete with a few mandatory fields recommended by DataCite/Australian National Data Service etc) giving some info about the location of my paper file would do.

      • April 26, 2013 at 11:25 am

        Citation styles are just a nightmare. This is why there are around 3000 of them. And in many ways, they really, really do not matter. The DOI is extractable, especially if represented as a URL. If the DOI is there, then all the metadata that you might ask them to put in is duplicated anyway, because you can always get the metadata from the DOI. Make it simple. “Always put a DOI or a URL to a standard repository, and always represent DOIs as http://dx.doi.org/10.nnn”

        What is the use case for specifying how to display everything else? How likely is it that people will follow these guidelines anyway?
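The "always represent DOIs as a URL" rule could be applied mechanically. Here is a minimal sketch that normalises the common ways a DOI gets written into the `http://dx.doi.org/` form suggested above; the function name `doi_to_url` is made up, and the pattern used to recognise a DOI is a common heuristic rather than an official definition.

```python
import re

# Heuristic only: '10.', a numeric registrant code, '/', then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly put in front of a bare DOI.
PREFIX_PATTERN = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def doi_to_url(raw: str, resolver: str = "http://dx.doi.org/") -> str:
    """Turn 'doi:10.x/y', a bare DOI, or a resolver URL into one URL form."""
    doi = PREFIX_PATTERN.sub("", raw.strip())
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return resolver + doi


print(doi_to_url("doi:10.3886/ICPSR31521.v1"))
print(doi_to_url("10.1594/PANGAEA.726855"))
```

Anything that fails the check raises an error rather than producing a link that silently goes nowhere.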

  4. April 25, 2013 at 6:48 pm

    Surely part of the answer here is to use the form required by your publisher for citing data. So APA would be different from Chicago etc…

    Another answer would be to look at the ISO standard. It used to be ISO 690-2:1997, which gives you some idea how out of date it was. I know there was a longstanding effort to update it, but I don’t know what has happened. But this is really about which elements are required and which optional etc, rather than telling you how to do it. And of course, ISO standards are not easily accessible (although Glasgow probably still has a BSI subscription that would get you access). A more accessible standard is ANSI/NISO Z39.29-2005, but again it’s at the required/optional elements level, rather than any syntax.

    As usual, the ex-UKDA, now UK Data Service, has good advice on how to cite their data, and it can be adapted (in fact I think it is pretty much indistinguishable from the advice from the non-ESRC RCs, if that makes sense). See http://ukdataservice.ac.uk/use-data/citing-data.aspx.

    I do note that Zotero doesn’t seem to have a resource type for datasets, nor indeed does Mendeley (other than “generic”), although the ancient version of EndNote I have access to does support “Online databases”.

    • April 26, 2013 at 8:30 am

      Thanks Chris. My concern is really that many organisations are repeating this same pattern – so now I have to read BSI, Z39….. and various other offerings and synthesise into a standard set of advice. I certainly don’t have the time to get up to speed on all of that so I just wondered if someone had already synthesised all the info and was keeping up to speed with it on behalf of the community and could guide us rather than us all reading everything in detail and perhaps interpreting some bits differently.

      If RCUK are happy to accept a URL/DOI/narrative that directs to more info about the dataset then maybe we will just try to do that – perhaps in lots of different, but perfectly acceptable ways, and see what feedback arises.

      So maybe I worry unnecessarily.

  5. Gerry
    April 26, 2013 at 7:47 am

    Valerie – the DataCite format is “Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier” (the TR Data Citation Index also recommends this)

    Have a look at their website….

    Gerry Lawson, NERC Research Information Systems

    • April 26, 2013 at 8:37 am

      Hi Gerry,

      Thanks for your reply.

      I think I made the mistake of being unclear in my original post – I’m not looking for lots of standards to review – I’m looking for a spec for the format to insert in papers.

      We’ve already looked at DataCite and incorporated it into our plans. One of the drivers being that RCUK mentioned this as an example of how to comply.

      So where not otherwise specified do you think we could put the key metadata into the paper (as per IASSIST example above), or suggest a statement e.g.

      This data can be accessed at ‘URL/DOI/University of Glasgow Records Store email …..’

      Or either?

      Would that comply?

      Also do you agree with the exchange with Caroline above – that the restrictions could be in a landing page or alternative – or do you think they need to be stated in the paper – as suggested in the current policy guidance?

      Thanks again – I know I ask you lots of questions!
