Nearly 120 IT professionals, scientists and managers from the Photon and Neutron (PaN) community attended the 2nd European PaN EOSC Symposium organised jointly by PaNOSC and ExPaNDS on 26th October 2021.
The first session focused on the grants’ main goals, challenges and achievements, with a joint presentation by the project coordinators, Patrick Fuhrmann and Andy Götz. It was followed by a showcase of selected use cases relating to some of the tools and services developed in the EOSC projects for FAIR data catalogues, data analysis and simulation.
In the second session, PaN EOSC sustainability models were discussed, with contributions from the Chairs of the LEAPS and LENS initiatives, Caterina Biscari and Robert McGreevy, who shared their views on the outcomes and future of the two EOSC projects and their proposed future data strategy.
After the event, PaNOSC Project Officer, Flavius Pana, congratulated all the speakers for the very informative and interesting talks, and expressed his appreciation for the free exchange of ideas at the end of the session.
PaNOSC and ExPaNDS – Main achievements
After the welcome address by Alun Ashton, Head of Science IT at the Paul Scherrer Institute (PSI), main organiser of the event, Patrick Fuhrmann talked about the major activities carried out within PaNOSC and ExPaNDS towards making FAIR data a reality for PaN facilities in Europe. After outlining the big picture, Patrick went through the main results, starting with the consultation process carried out with various PaN facilities. This process identified a first set of elements and regulations, which were included in a common research data policy framework to be used as a reference for drafting and adopting FAIR data policies across PaN facilities. Progress towards making FAIR data a reality for PaN has also been achieved by identifying a set of rules for the setup of ontologies, i.e., a set of keywords for the definition of metadata catalogues and data format standards, to make data and metadata machine-readable, and consequently findable and accessible by the PaN scientific community.
Patrick highlighted the work carried out to set up standardised OAI-PMH interfaces, a PaN data portal and the remote data and compute access platform, in what he described as the “new generation PaN facility infrastructure”. In particular, he pointed to the remote data analysis platform, offering both Jupyter Notebooks and the VISA platform, which allows access to data and to a number of services and tools for data visualisation, analysis and simulation.
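To illustrate what a standardised OAI-PMH interface offers a harvester, the following minimal sketch builds a ListRecords request and extracts identifiers and titles from a response. The endpoint URL, the record identifier and the sample response are hypothetical placeholders, not taken from any PaN facility; only the verb, the oai_dc metadata prefix and the XML namespaces come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: a placeholder, not a real facility URL.
BASE_URL = "https://data.example-facility.eu/oai"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    return records


# Invented sample response, used here instead of a network call.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example-facility.eu:dataset-1</identifier>
        <datestamp>2021-10-26</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example tomography dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL, from_date="2021-10-01"))
print(parse_records(SAMPLE_RESPONSE))
```

Because every facility would expose the same verbs and metadata prefix, a harvester such as a common data portal could reuse this kind of client across endpoints and change only the base URL.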
For the education of future generations of PaN users, the projects have been working on the development of two platforms, pan-training and pan-learning, which include courses and moodles on photon and neutron science topics, also linking to the original source of data at PaN facilities.
For seamless access to the data services developed in the projects, a common federated AAI (Authentication and Authorisation Infrastructure) based on UmbrellaID and integrated with eduTEAMS has been set up.
Use Cases – Overview
To further engage with the community of PaN users, a selection of specific examples was presented, showcasing how the services developed in both grants can serve their needs.
Use Case #1 – Petr Čermák – Describe data by scripts for future reuse
There is a long-lasting discussion in the PaN community about how to properly describe the data and which metadata are useful. To fulfil the last letter in FAIR, data needs to be reusable, which is often the most difficult task for users of large research infrastructures.
Petr Čermák presented an easy and convenient way of describing the data through user scripts: using publicly available data at PaNOSC ILL, treating them with open-source software and publishing the scripts in a GitHub repository. The repository was mirrored at Figshare to obtain a citable entity, and Petr showed how to use Binder to re-evaluate the data from any computer in the world “even after 100 years”.
This approach documents how processed data is obtained, through a transparent evaluation. Referees of the upcoming publication can easily verify the data treatment process, other scientists can easily learn how the data can be treated and, most importantly, the data treatment process is designed to keep working indefinitely.
Use Case #2 – Kamel Madi – Tomography Case Study
The tomography work undertaken at research facilities such as Beamline I13 at Diamond is generating a vast quantity of valuable in-situ radiography and tomography data. The work and results are published in open access journals, and the custom applications have been published on Zenodo to allow free and open access to the analysis package. However, Zenodo’s dataset size limit means that the team cannot make the data open in the same manner. As a result, the raw data is available to peers upon request to the Principal Investigator (PI), and transfers are done manually using commercial services (e.g. Dropbox).
The two PaN EOSC projects can help by:
- Minting a DOI for the dataset and offering a data landing page where authorised users can download the dataset;
- Associating the dataset, publication and analysis workflow, and having each of them reference the others using persistent identifiers;
- Tagging the dataset and analysis workflow application using the common ExPaNDS taxonomy to make the elements easier to find by interested parties;
- Making the data findable via the common search API (and FAIR) and available in EOSC;
- Testing the value of common search APIs and sample datasets for a FAIR commercial use case;
- Allowing for community engagement to gain collaborators to work on the data analysis;
- Exploring options around the publishing and long-term preservation of data;
- Supporting the Science team in using best practice for open science.
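The cross-referencing between dataset, publication and analysis workflow listed above can be pictured as a small graph of persistent identifiers. The sketch below is only an illustration under stated assumptions: the `Record` class, its field names and the DOIs are hypothetical placeholders, not the projects' actual metadata schema or real identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    pid: str    # persistent identifier, e.g. a DOI (placeholders below)
    kind: str   # "dataset", "publication" or "workflow"
    related: set = field(default_factory=set)  # PIDs this record references


def link_all(records):
    """Make every record reference every other record by PID."""
    pids = {r.pid for r in records}
    for r in records:
        r.related |= pids - {r.pid}


def missing_links(records):
    """Return (source, target) PID pairs where a cross-reference is absent."""
    return [
        (a.pid, b.pid)
        for a in records
        for b in records
        if a is not b and b.pid not in a.related
    ]


# Placeholder identifiers invented for this example.
records = [
    Record("10.5072/example-dataset", "dataset"),
    Record("10.5072/example-paper", "publication"),
    Record("10.5072/example-workflow", "workflow"),
]

print(len(missing_links(records)))  # cross-references missing before linking
link_all(records)
print(missing_links(records))       # empty once each record references the others
```

A check like `missing_links` could be run before publication to confirm that a reader landing on any one of the three items can reach the other two.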
Use Case #3 – Mousumi Upadhyay Kahaly, Neutron diffraction from Boro-carbon for efficient structural analysis and defect detection
Neutron scattering is considered a complementary technique to electron microscopy, which unveils detailed information on the defect structure in real space over tiny localised volumes in the specimen. Boron-doped diamond (BDD) is a conductive material and is considered a potential candidate for electrode materials with large cell voltages. However, the exact role of boron and its location within the crystal have not been investigated so far. Within the scope of this PaN use case, inelastic neutron scattering experiments and ab-initio calculations have been used to investigate the location-dependent response of defects in diamond and BDD structures. Ab-initio tools from the Atomic Simulation Environment (ASE) have been used to obtain structural and electronic properties and relaxed nuclear positions. Based on these nuclear positions, neutron scattering is simulated with the McStas code in a well-known experimental environment.
The origin of the diffraction peaks was identified, correlating them to individual system geometries. Our approach can correlate the appropriate ‘micro atomistic scenario’ among a manifold of possibilities to reproduce the observed ‘experimental macro features’.
Use Case #4 – Jan-Christoph Deinert – TELBE data analysis workflow and the PaN training platform UX
The TELBE user experiment is part of the ELBE Center for High-Power Radiation Sources, where the superconducting linear electron accelerator ELBE, serving two free electron lasers and sources for intense coherent THz radiation, mono-energetic positrons, electrons, γ-rays and a neutron time-of-flight system, is collocated with two synchronised ultra-short pulsed Petawatt laser systems. The characteristics of these beams make the ELBE center a unique research instrument for scientists of the Helmholtz-Zentrum Dresden-Rossendorf (HZDR) as well as for a variety of external users in fields ranging from materials science through nuclear physics to cancer research.
Jan introduced the ultrafast experiments (temporal resolution << 100 fs) he is performing with his team at TELBE, along with the associated challenges and the need for specialised data correction algorithms. The overall data workflow, from access to data in RODARE to data correction using a Jupyter notebook, is quite complex. Jan showed how the workflow feature of the PaN training platform can help in such cases to organise the training resources available for each step of the workflow.
Use Case #5 – Yue Sun – Machine Learning-based Spectra Classification
Spectroscopy techniques are widely used and produce huge amounts of data, especially in facilities with very high repetition rates. At the European XFEL, X-ray pulses can be generated with only 220 ns separation in time and a maximum of 27,000 pulses per second. In experiments (e.g. SCS, FXE, MID, and HED) at European XFEL, spectral changes can indicate a change in the system under investigation and so the progress of the experiment. Immediate feedback on the actual status (e.g., time-resolved status of the sample) would be essential to quickly judge how to proceed with the experiment. The major spectral changes that we aim to capture are either a change in the intensity distribution of peaks at certain locations (e.g., a drop or appearance) or a shift of those peaks along the spectrum.
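The two figures quoted above already indicate why on-the-fly feedback is demanding. The short calculation below uses only those two numbers: a 220 ns spacing corresponds to a peak rate of roughly 4.5 MHz, while 27,000 pulses at that spacing would fill only about 6 ms of each second, so the pulses must arrive in short, dense bursts rather than evenly spread. The variable names are illustrative.

```python
# Figures quoted in the text above.
PULSE_SEPARATION_S = 220e-9      # 220 ns between consecutive pulses
MAX_PULSES_PER_SECOND = 27_000

# Instantaneous rate while pulses are being delivered.
peak_rate_hz = 1.0 / PULSE_SEPARATION_S

# Time the pulses would occupy if delivered back to back at that spacing.
occupied_time_s = MAX_PULSES_PER_SECOND * PULSE_SEPARATION_S

# How much faster the peak rate is than the average rate over one second.
peak_to_average = peak_rate_hz / MAX_PULSES_PER_SECOND

print(f"peak rate: {peak_rate_hz / 1e6:.2f} MHz")
print(f"occupied time per second: {occupied_time_s * 1e3:.2f} ms")
print(f"peak-to-average ratio: {peak_to_average:.0f}")
```

Any detector readout or classification step that is meant to keep up within a burst therefore has to cope with the peak rate, not the much lower average rate.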
Machine Learning (ML) opens up new avenues for data-driven analysis in spectroscopy by offering the possibility to quickly recognise such specific changes on-the-fly during data collection, but it usually requires large amounts of clearly annotated data. Hence, it is important that research outputs align with the FAIR principles. For XFEL experiments, it is suggested that NeXus data format standards be introduced in future experiments.
Yue Sun presented an example to show how Neural Network-based ML can be used for accurately classifying the system state if data is properly provided. A solution has been demonstrated that automatically finds the regions (or bins) with high separability, where the spectra classes differ significantly. By training individual neural networks for each bin and combining them with a weighting technique, a robust classification of any new spectral curve can be quickly obtained.
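The idea of per-bin classifiers combined by weighting can be sketched in a few lines. This is not the method presented at the Symposium: to stay dependency-free, each per-bin neural network is replaced here by a simple nearest-class-mean rule, the separability score is an assumed Fisher-style ratio, and the spectra are synthetic data invented for the example.

```python
import random
from statistics import mean, pvariance

N_POINTS, N_BINS = 40, 8
BIN_WIDTH = N_POINTS // N_BINS


def make_spectrum(label, rng):
    """Synthetic spectrum: shared peak at points 10-14; class 1 adds a peak at 25-29."""
    spectrum = [rng.gauss(0.0, 0.05) for _ in range(N_POINTS)]
    for i in range(10, 15):
        spectrum[i] += 1.0
    if label == 1:
        for i in range(25, 30):
            spectrum[i] += 1.0
    return spectrum


def bin_features(spectrum):
    """Mean intensity of each bin."""
    return [mean(spectrum[b * BIN_WIDTH:(b + 1) * BIN_WIDTH]) for b in range(N_BINS)]


def fit(spectra, labels):
    """Per bin: class means plus a weight proportional to class separability."""
    feats = [bin_features(s) for s in spectra]
    model = []
    for b in range(N_BINS):
        f0 = [f[b] for f, y in zip(feats, labels) if y == 0]
        f1 = [f[b] for f, y in zip(feats, labels) if y == 1]
        m0, m1 = mean(f0), mean(f1)
        # Assumed separability score: squared mean gap over summed variances.
        score = (m0 - m1) ** 2 / (pvariance(f0) + pvariance(f1) + 1e-12)
        model.append((m0, m1, score))
    total = sum(score for _, _, score in model)
    return [(m0, m1, score / total) for m0, m1, score in model]


def predict(model, spectrum):
    """Weighted vote of the per-bin classifiers (stand-ins for neural networks)."""
    vote = 0.0
    for (m0, m1, weight), x in zip(model, bin_features(spectrum)):
        bin_label = 1 if abs(x - m1) < abs(x - m0) else 0
        vote += weight * bin_label
    return 1 if vote > 0.5 else 0


rng = random.Random(0)
train_labels = [i % 2 for i in range(100)]
train_spectra = [make_spectrum(y, rng) for y in train_labels]
model = fit(train_spectra, train_labels)

weights = [w for _, _, w in model]
print("most separable bin:", max(range(N_BINS), key=lambda b: weights[b]))
print("prediction for a new class-1 spectrum:", predict(model, make_spectrum(1, rng)))
```

The weighting means that bins where the classes look alike, including the bin holding the shared peak, contribute almost nothing to the decision, which is what makes the combined vote robust to noise in uninformative regions.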
Use Case #6 – Frank von Delft – DOI, FAIR and MX COVID-19 use case
The COVID Moonshot is a worldwide consortium that includes academic and industrial groups, working to identify new compounds that could block SARS-CoV-2 and to develop an antiviral drug that would be safe, affordable and easily manufactured.
The COVID Moonshot consortium aims to release all the data generated by the projects, to help researchers in the discovery of new antiviral molecules against coronaviruses.
Contributions by the PaN EOSC projects:
- A large amount of data has been created and analysed, before being released into the Protein Data Bank (PDB). PDB identification is used as PID.
- The usage of Zenodo was suggested as a data repository for raw diffraction data of proteins, allowing the community to access and process the data at will.
- A clear example of the good usage of Zenodo to serve a specific problem or community was presented, and both projects identified it as a possible solution for some of the PaN facilities that may not be able to afford a data catalogue of their own.
- Zenodo is a practical solution to find data. In this case, Zenodo data are linked to PDB data to maximise dissemination of information. The benefit of Zenodo could potentially be increased further through direct involvement in the development direction of the tool.
Views on the future of PaN data
Another use case, on the Human Organs Atlas, was presented by PaNOSC coordinator, Andy Götz, to provide an example of how the PaN data commons is meant to work for the wider PaN community, to search for data, download data and reuse them.
Shifting to the other main topic of the Symposium, sustainability, Andy went through the results of the survey submitted to the members of the LEAPS initiative on their views on the outcomes of the two PaN EOSC projects, and showcased the possible options and funding actions identified for sustaining the PaN Open Science Commons.
Andy concluded that the PaNOSC and ExPaNDS communities need to keep collaborating after the end of the two projects, for instance on issues around data management, and also by creating a PaN data management working group, working towards enabling open science through the PaN common data portal via the EOSC.
Caterina Biscari (LEAPS Chair and director of ALBA) presented the LEAPS perspective on the sustainability of the projects’ outcomes. She explained that LEAPS and LENS cover many of the partners involved in PaNOSC and ExPaNDS, and that LEAPS has a working group focused on IT. She finished her presentation by saying that LEAPS members are considering the suggestions from PaNOSC and ExPaNDS regarding sustainability and will bring the subject forward at the next General Assembly.
Robert McGreevy (LENS Chair and director of ISIS) presented the LENS perspective, which is that neutron and photon facilities should collaborate, focusing on the priorities for science and not on techniques, and bear in mind that not all facilities are in the same position when it comes to budget, schedule or available resources. Robert encouraged the project coordinators to come forward with a proposal detailing what it means to make the project outcomes sustainable.
The very last session of the Symposium was dedicated to answering the questions from the audience, allowing a constructive discussion.