Scientific data management is a broad term that covers all stages of the digital data lifecycle, including capture, analysis, sharing, and preservation. When done well, it can help advance science by allowing data to be used and verified more broadly than originally intended.
How much to save is a strategic decision that has to balance the costs against the benefits.
Our typical principle is that the data immediately behind the plots in the scientific manuscript is probably the most useful and the most likely to be re-used. This is, after all, part of the reason why it was plotted.
An example of what to do is provided in Mazzone2021Antiferromagnetic. Please copy the README.rst file from the example to the new repository and update the file to reflect the current manuscript.
Commit the data needed to make the plots, together with sufficient instructions that someone with no knowledge of the work could generate the plots. The easiest way to do this is to include the plotting scripts that were used to make the plots for the paper. As long as the plotting code is reasonably well written, it should be adequately clear.
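One way to keep the data behind each plot in a form that a newcomer can read is to store it as a plain-text table next to the plotting script. The sketch below is only an illustration under that assumption, not the format used in the example repository: the helper names save_plot_data and load_plot_data, the column names, and the file name are all hypothetical.

```python
import csv


def save_plot_data(path, columns):
    """Write the columns behind one plot to a CSV file with a header row.

    `columns` maps a column name to a sequence of numbers; every
    sequence must have the same length.
    """
    names = list(columns)
    lengths = {len(columns[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same length")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(names)  # header row documents what each column is
        writer.writerows(zip(*(columns[name] for name in names)))


def load_plot_data(path):
    """Read a file written by save_plot_data back into a dict of float lists."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        names = next(reader)
        rows = [[float(value) for value in row] for row in reader]
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}
```

A plotting script could then call load_plot_data("fig1_data.csv") (a hypothetical file name) and pass the returned columns straight to the plotting library, so the committed file and the published figure stay in step.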
At a minimum, then ask Mark to link the repository to Zenodo, a service run by CERN that stores data and links it to a specific citable item. Each time a release is created in the GitHub repository, Zenodo stores the data and creates a new citable item. In most cases, we will create just one release. We suggest titling it First release and tagging it v1.0.0. If, for any reason, an update is needed, more releases can be created.
A final desirable step is to specify how to run the plotting scripts. We suggest doing this by including a requirements.txt file and using the tool jupyter-repo2docker. In this way, the code should run by following the instructions in the README.rst file. For most purposes, it will be enough to just copy the requirements.txt files. If more packages are needed, update requirements.in and then run pip-compile requirements.in to generate a new requirements.txt file. This pins the exact version of every Python package in use, which should help ensure long-term reproducibility of your code even if future versions of those packages include changes that would break it.
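Before creating the release, it can be worth confirming that the generated requirements.txt really does pin every package. The following is a minimal sketch of such a check, assuming the usual name==version lines that pip-compile writes; the function name unpinned_requirements is hypothetical, and the pattern is deliberately simple, so it does not handle direct URL requirements.

```python
import re

# Matches "name==version", optionally with extras, e.g. "package[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?==[^\s;]+")


def unpinned_requirements(text):
    """Return the requirement lines in `text` that are not pinned with '=='."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

Reading requirements.txt and passing its contents to unpinned_requirements should give an empty list for a file produced by pip-compile; any line it returns is a package whose version could drift over time.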