A set of Common Software Quality Assurance Baseline Criteria for Research Projects

Pablo Orviz; Mario David; Isabel Campos; Jorge Gomes; Germán Moltó; Vyacheslav Tykhonov; Doina Cristina Duma; Alvaro Lopez; Giacinto Donvito

The purpose of this document is to define a set of quality standards, procedures and best practices that make up a Software Quality Assurance plan. The plan is intended as a reference for projects in the European research ecosystem, supporting the adequate development and timely delivery of software products.

Acknowledgements

The INDIGO-DataCloud, DEEP-Hybrid-DataCloud, eXtreme-DataCloud and EOSC-Synergy projects have received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement numbers 653549, 777435, 777367 and 857647, respectively.

Document Log

1. Introduction and Purpose

This document builds upon the recommendations and requirements found in the Initial Plan for Software Management and Pilot Services deliverable [1], produced by the INDIGO-DataCloud project.

These guidelines evolved throughout the project’s lifetime and were carried forward by the subsequent DEEP-Hybrid-DataCloud and eXtreme-DataCloud projects. They are now being extended in EOSC-Synergy [2].

The result is a consolidated set of Software Quality Assurance (SQA) baseline criteria emanating from the European Open Science Cloud (EOSC). It aims to outline the SQA principles to be considered in upcoming software development efforts within the European research community, and to evolve continuously in order to stay aligned with future software engineering practices and security recommendations.

2. Goals

3. Notational Conventions

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 [4].

4. Quality Criteria

The following sections describe the quality conventions and best practices that apply to the development phase of a software component within the EOSC ecosystem. These guidelines governed the software development process of the former European Commission-funded project INDIGO-DataCloud, where they proved valuable for improving the reliability of software produced in the European scientific arena.

The next sections describe a development process driven by a change-based strategy, followed by a continuous integration approach. Changes in the source code trigger automated builds that analyze the new contributions in order to validate them before they are added to the software component code base. Consequently, software components are better suited for deployment in production infrastructures, reducing the likelihood of service disruption.

4.1. Code handling

4.1.1. Code Accessibility [QC.Acc]

4.1.2. Code Workflow [QC.Wor]

4.1.3. Code Management [QC.Man]

4.1.4. Code Review [QC.Rev]

Code review implies the informal, non-automated, peer review of any change in the source code [5]. It appears as the last step in the change management pipeline, once the candidate change has successfully passed the required set of change-based tests.

4.1.5. Semantic Versioning [QC.Ver]

4.2. Files and documents

4.2.1. Licensing [QC.Lic]

4.2.2. Code metadata [QC.Met]

Metadata for the software component provides a way to achieve its full identification, thus making software citation viable [9]. It allows the assignment of a Digital Object Identifier (DOI) and is key towards preservation, discovery, reuse, and attribution of the software component.
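As an illustration of what such metadata might look like in practice, the following is a minimal sketch that checks a metadata record for the fields typically needed to cite a software component. The field names in `REQUIRED_FIELDS`, the function `missing_metadata` and all values in `record` are assumptions made for this example. They are not prescribed by this document, and the identifier shown is a placeholder rather than a real DOI.

```python
import json

# Hypothetical minimal field set; this document does not prescribe these names.
REQUIRED_FIELDS = ("name", "version", "authors", "license", "identifier")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder values for illustration only; the identifier is not a real DOI.
record = {
    "name": "example-component",
    "version": "1.2.0",
    "authors": ["Jane Doe"],
    "license": "Apache-2.0",
    "identifier": "https://doi.org/10.0000/placeholder",
}

if __name__ == "__main__":
    # Serialise the record and report any gaps that would hinder citation.
    print(json.dumps(record, indent=2))
    print("missing fields:", missing_metadata(record))
```

A check of this kind could run as part of the automated tests, so that a release without complete metadata is detected before a DOI is requested.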

4.2.3. Documentation [QC.Doc]

4.3. Static testing - White box testing

4.3.1. Code Style [QC.Sty]

Code style requirements support the maintainability of the source code through a commonly agreed set of style conventions. These vary based on the programming language being used.

4.3.2. Unit Testing [QC.Uni]

Unit testing evaluates all the possible flows in the internal design of the code, so that its behavior becomes apparent. It is a key type of testing for early detection of failures in the development cycle.
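The idea of exercising every flow through a unit can be sketched as follows, using Python's standard `unittest` module. The function `classify_exit_code` and its behavior are hypothetical and were chosen only because it has three distinct paths; each path gets its own test case.

```python
import unittest


def classify_exit_code(code: int) -> str:
    """Map a process exit code to a coarse status label (illustrative only)."""
    if code < 0:
        raise ValueError("exit code must be non-negative")
    if code == 0:
        return "success"
    return "failure"


class ClassifyExitCodeTest(unittest.TestCase):
    """One test per flow through classify_exit_code."""

    def test_zero_is_success(self):
        self.assertEqual(classify_exit_code(0), "success")

    def test_positive_is_failure(self):
        self.assertEqual(classify_exit_code(2), "failure")

    def test_negative_is_rejected(self):
        with self.assertRaises(ValueError):
            classify_exit_code(-1)


def run_tests() -> unittest.TestResult:
    """Load and run the test case, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClassifyExitCodeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```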

4.3.3. Test Harness [QC.Har]

In software development, a test harness [10] is a collection of software and test data used by developers to unit test software models during development. A test harness specifically refers to test doubles, which are programs that interact with the software being tested. Once a test harness has been used to execute a test, it can also use a test library to generate reports.

It is also a simple form of Integration Testing, where interaction and integration with external components are substituted by a Double.

Test Double is a generic term for any case where you replace a production object for testing purposes. There are various kinds of double [11]:
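One such kind, a stub that returns canned answers, can be sketched as below. The class `ArtifactRepositoryStub`, the function `latest_version` and the package name are hypothetical names introduced for this example. The stub replaces what would otherwise be a call to an external service, so the test runs without network access.

```python
class ArtifactRepositoryStub:
    """Stub standing in for a remote artifact repository (hypothetical)."""

    def __init__(self, known_versions):
        self._known_versions = list(known_versions)

    def list_versions(self, package):
        # Canned answer: no network access takes place.
        return self._known_versions


def latest_version(repository, package):
    """Return the highest MAJOR.MINOR.PATCH version the repository reports."""
    versions = repository.list_versions(package)
    # Compare numerically so that 1.10.0 ranks above 1.9.0.
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


def test_latest_version_uses_numeric_ordering():
    stub = ArtifactRepositoryStub(["1.9.0", "1.10.0", "1.2.3"])
    assert latest_version(stub, "example-component") == "1.10.0"


if __name__ == "__main__":
    test_latest_version_uses_numeric_ordering()
```

Because `latest_version` only depends on the `list_versions` method, the production object and the stub are interchangeable from its point of view.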

4.3.4. Test-Driven Development [QC.Tdd]

Test-Driven Development [12] is a software development process in which software requirements are converted to test cases before the software is fully developed, and all development is tracked by repeatedly testing the software against all test cases. This contrasts with developing the software first and creating the test cases later.
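The ordering described above can be sketched in a single file: the test cases come first, the implementation second, and the full set of tests is rerun after each change. The requirement (turning a title into a lowercase, hyphen-separated slug) and the names `slugify` and `run_all_tests` are assumptions made purely for illustration.

```python
# Step 1: the requirement is captured as test cases before any implementation.
def test_slug_is_lowercase():
    assert slugify("Quality Criteria") == "quality-criteria"


def test_slug_collapses_whitespace():
    assert slugify("  Code   Review ") == "code-review"


# Step 2: the simplest implementation that makes the test cases pass.
def slugify(title: str) -> str:
    """Lowercase the title and join its words with hyphens."""
    return "-".join(title.lower().split())


# Step 3: rerun every test case after each change to the implementation.
def run_all_tests() -> bool:
    for test in (test_slug_is_lowercase, test_slug_collapses_whitespace):
        test()
    return True


if __name__ == "__main__":
    run_all_tests()
```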

4.3.5. Security [QC.Sec]

Security assessment is essential for any production software. Security requirements apply to every stage of the Software Development Life Cycle (SDLC), and they are especially effective when implemented at the source code level.
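As one example of a source-level concern, evaluating untrusted text with `eval()` is a pattern that static security analysers commonly report. The sketch below shows a safer alternative based on `ast.literal_eval` from the Python standard library. The function name `parse_setting` and the scenario of parsing a user-supplied setting are assumptions for illustration, not a requirement of this document.

```python
import ast


def parse_setting(text: str):
    """Parse a literal (number, string, list, dict) from untrusted text.

    ast.literal_eval only accepts Python literals, so expressions such as
    function calls are rejected instead of being executed, unlike eval().
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a plain literal: {text!r}") from exc


if __name__ == "__main__":
    print(parse_setting("[1, 2, 3]"))
    try:
        parse_setting("__import__('os').getcwd()")
    except ValueError as error:
        print("rejected:", error)
```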

4.4. Delivery

4.4.1. Automated Delivery [QC.Del]

Automated delivery comprises building the software into an artifact, uploading or registering it in a public repository of such artifacts, and notifying the success of the process.

4.5. Deployment

4.5.1. Automated Deployment [QC.Dep]

5. Glossary

A1. Annex

The Quality Criteria described in this document follow a technology-agnostic approach. As such, the choice of tools and services to implement the code workflow, followed by the verification & validation processes, is up to the team or community developing and/or using a given software.

This annex describes an implementation approach to help software developers cover the Quality Criteria detailed in this document.

A1.1. Code workflow

The code workflow is shown in Figure 1. It depicts a real case example in which GitHub is used for several purposes that are described below. The workflow starts when the developer branches the code to implement a given new feature or fix. Once the implementation is done, the Pull Request triggers a CI pipeline in the Jenkins service.

In the Jenkins service, several checks are performed: static checks on the code, such as code style linting, and dynamic tests, such as unit and harness tests. To perform functional tests, which may include API tests in the case of a service, the packages need to be built (delivery) and automatically deployed into a running state. In the case shown, the configuration is done with Ansible roles and playbooks.

The Pull Request is updated with the results of the tests, thus notifying the developer team that they can proceed with the review and, if approved, with the release process, where the branch is merged into the main production branch.

A1.2. Verification & Validation (V&V)

Once the code workflow is in place, the next step is to shape the V&V processes in order to ensure that the software does “the right things right”. One way to address them is to rely on CI/CD pipelines that encompass the series of checks to be executed automatically for each change made in the code, driven by the agreed code workflow (see A1.1).
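Conceptually, such a pipeline is an ordered series of checks in which a failing check stops the run. The following is a minimal sketch of that idea only. It does not reflect how Jenkins or any other CI service is implemented, and the names `run_pipeline` and `example_checks`, as well as the stage names, are hypothetical.

```python
from typing import Callable, List, Tuple

# A check is a name paired with a callable that returns True on success.
Check = Tuple[str, Callable[[], bool]]


def run_pipeline(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run checks in order and stop at the first failure.

    Returns whether all checks passed and the names of the checks executed.
    """
    executed = []
    for name, check in checks:
        executed.append(name)
        if not check():
            return False, executed
    return True, executed


# Hypothetical stand-ins for real stages (style, unit tests, security scan).
example_checks = [
    ("style", lambda: True),
    ("unit", lambda: False),
    ("security", lambda: True),
]

if __name__ == "__main__":
    passed, executed = run_pipeline(example_checks)
    print("passed:", passed, "executed:", executed)
```

In this sketch the security stage never runs because the unit stage fails first, which mirrors the intent of validating a change before it reaches later stages.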

There are multiple technological solutions to compose CI/CD pipelines. The SQA as a Service (SQAaaS) platform¹ aids developers in this task, with the additional benefit of being fully aligned with the criteria described in this document. The SQAaaS solution relies on Jenkins code pipelines [16] and allows developers to use either a graphical (web-based) or code-based approach to define the work to be performed in each stage of the CI/CD pipeline.

A1.3. Services

As can be seen in Figure 1, multiple services take part in the execution of the CI/CD pipelines. Table 1 shows the list of services used for source code management as well as to implement the V&V of the Quality Criteria detailed in this document.

One of the most popular services for software source code management is GitHub. It uses Git as the Version Control System, providing branching and tag management. Additionally, it has an issue tracker for bugs, requests and enhancements, and the Pull Request mechanism for code review and discussion of changes.

Ansible and Ansible-Galaxy are used in many software frameworks as deployment and configuration tools.

The software is packed/built into executable artifacts that can be RPMs (for Red Hat and derivative operating systems), DEBs (for Debian/Ubuntu and derivatives) and, in many cases, containers such as Docker images.

The artifacts are provided, in general, by public repositories, most notably Docker Hub [17] in the case of Docker images. Other common repositories are PyPI [18] for Python software and Maven [19] for Java.

Regarding the CI/CD automation, Jenkins pipelines can be easily composed through the SQAaaS platform and put into the Git repositories to be used by the Jenkins CI service to perform the tests. The tools used in the CI automation are shown in section A1.4.

Table 1: Tools and services used to implement the QA criteria, together with the criteria they cover where applicable. * All criteria listed in the tooling definition: https://github.com/EOSC-synergy/sqaaas-tooling/blob/staging/tooling.json#L305. The QC.FAIR criteria (“FAIRness level of Digital Objects”) are not covered in this document.
Service	Usage	Criteria	Repo URL or documentation	Comment
Github	VCS	QC.Acc	https://docs.github.com/	Source code repository - git
Github	API	QC.Wor	https://docs.github.com/en/rest	git branching management and version tagging
Github	Issue tracker	QC.Man	https://docs.github.com/en/issues	Track issues, bugs, new features, etc.
Github	Pull Requests (PR)	QC.Man, QC.Rev	https://docs.github.com/en/pull-requests	Code review through PRs
Github	Documentation	QC.Doc		Documentation present in VCS repository
Ansible, Galaxy	Install, Config	QC.Dep	https://docs.ansible.com/	Automated deployment and configuration
SQAaaS platform	Pipeline composition	All *	https://sqaaas.eosc-synergy.eu	Pipeline composition for automatic tests
SQAaaS platform	Assessment & awarding	QC.Acc, QC.Doc, QC.Lic, QC.Met	https://sqaaas.eosc-synergy.eu	Bronze badge awarding
Jenkins CI service	Automated tests	All *	https://www.jenkins.io/	Execution of automatic tests
Docker Hub	Docker images	QC.Del	https://hub.docker.com/	Public repository of Docker images
PyPI	Python packages	QC.Del	https://pypi.org/	Find, install & publish Python packages
Maven	Java artifacts	QC.Del	https://mvnrepository.com/	Repository for Java dependencies

A1.4. Tools for CI/CD

References

1. Initial Plan for Software Management and Pilot Services. Members of the INDIGO-DataCloud collaboration (2015). https://owncloud.indigo-datacloud.eu/index.php/s/yDklCrWjKnjutVA

2. EOSC-SYNERGY. EU DELIVERABLE: D3.1 Software Maturity baseline. Mario David, Isabel Campos, Valentin Kozlov, Amanda Calatrava, EOSC-SYNERGY. Digital.CSIC (2020-06-29). DOI: 10.20350/digitalcsic/12607. https://doi.org/gm97ww

3. The Open Source Definition. Open Source Initiative. https://opensource.org/osd

4. Key words for use in RFCs to Indicate Requirement Levels. Scott O Bradner. Internet Engineering Task Force (1997-03). https://datatracker.ietf.org/doc/rfc2119/

5. OWASP Code Review Guide | OWASP Foundation. https://owasp.org/www-project-code-review-guide/migrated_content.html

6. Semantic Versioning 2.0.0. Tom Preston-Werner. https://semver.org

7. Licenses & Standards. Open Source Initiative. https://opensource.org/licenses

8. SPDX License List | Software Package Data Exchange (SPDX). https://spdx.org/licenses/

9. Software citation principles. Arfon M Smith, Daniel S Katz, Kyle E Niemeyer, FORCE11 Software Citation Working Group. PeerJ Computer Science (2016-09-19). DOI: 10.7717/peerj-cs.86. https://doi.org/bw3g

10. What is test harness? | Definition from TechTarget. Software Quality. https://www.techtarget.com/searchsoftwarequality/definition/test-harness

11. xUnit test patterns: refactoring test code. Gerard Meszaros. Addison-Wesley (2007). ISBN: 9780131495050

12. Test-driven development: by example. Kent Beck. Addison-Wesley (2003). ISBN: 9780321146533

13. https://owasp.org/www-project-secure-coding-practices-quick-reference-guide/migrated_content

14. Source Code Analysis Tools | OWASP Foundation. https://owasp.org/www-community/Source_Code_Analysis_Tools

15. EOSC-SYNERGY EU DELIVERABLE: D3.2 First prototype of the Service Integration platform. Pablo Orviz, Isabel Campos, EOSC-SYNERGY. Digital.CSIC (2020-10-08). DOI: 10.20350/digitalcsic/12721. https://doi.org/gm97w9

16. Pipeline. https://www.jenkins.io/doc/book/pipeline/

17. Docker. https://hub.docker.com/

18. PyPI · The Python Package Index. https://pypi.org/

19. Maven – Welcome to Apache Maven. https://maven.apache.org/

¹ The SQAaaS platform is being developed in the framework of the EOSC-Synergy project and is described in Deliverable 3.2 [15], which details the architecture and set of components of the platform.

Issue	Date	Comment
v1.0	31/01/2018	First draft version
v2.0	05/02/2018	Updated criteria
v3.0	20/12/2019	Code management section, metadata for software
v3.1	05/03/2020	Add tags/names for each criteria
v3.2	23/04/2020	Add EOSC-Synergy to copyright
v3.3	15/10/2020	Fix issues: #32, #46, #47, #48, #49, #51
v4.0	02/02/2022	Add annex, spell check, Fix issues: #7, #35, #50, #57, #59, #60, #61, #63, #64, #65, #66, #67, #69, #70, #71, #72, #73, #74, #75, #76, #77, #78, #80, #81
v4.1	24/10/2022	update pipeline tools, Fix issues: #68, #83, #85, #88, #90, #91, #92

Tool	Criteria	Programming Language	Repo URL or documentation	Summary
git	QC.Acc	Agnostic	https://git-scm.com/	Code Accessibility
pylint	QC.Sty	Python	https://pylint.org/	Code style
hadolint	QC.Sty	Dockerfile	https://github.com/hadolint/hadolint	Code style
checkstyle	QC.Sty	Java	https://github.com/checkstyle/checkstyle	Code style
jsonlint	QC.Sty	JSON	https://github.com/zaach/jsonlint	Code style
pycodestyle	QC.Sty	Python	https://pycodestyle.pycqa.org/en/latest/	Code style
flake8	QC.Sty	Python	https://flake8.pycqa.org/en/latest/	Code style
tox	QC.Sty, QC.Uni, QC.Sec, QC.Doc	Python	https://tox.readthedocs.io/	Automated test framework
bandit	QC.Sec	Python	https://bandit.readthedocs.io/	Static security
licensee	QC.Lic	Agnostic	https://github.com/licensee/licensee	Check license
cff-converter-python	QC.Met	Agnostic	https://github.com/citation-file-format/cff-converter-python	Code metadata
checkCitable	QC.Met	Agnostic	https://github.com/EOSC-synergy/sqa-composer-templates/tree/main/QC.Met	Code metadata
markdownlint	QC.Doc	Markdown	https://github.com/markdownlint/markdownlint	Documentation
restructuredtext-lint	QC.Doc	reStructuredText	https://github.com/twolfson/restructuredtext-lint	Documentation
find_doc_files.py	QC.Doc	Agnostic	https://github.com/EOSC-synergy/sqa-composer-templates/tree/main/QC.Doc/find_doc_files.py	Documentation
get_git_tags.py	QC.Ver	Agnostic	https://github.com/EOSC-synergy/sqaaas-tooling/blob/release/1.7.0/QC.Ver/get_git_tags.py	Semantic Version