A set of Common Software Quality Assurance Baseline Criteria for Research Projects


              

A DOI-citable version of this manuscript is available at http://hdl.handle.net/10261/160086.

Authors

Abstract

The purpose of this document is to define a set of quality standards, procedures and best practices that make up a Software Quality Assurance plan. The plan is meant to serve as a reference for projects within the European research ecosystem, supporting the adequate development and timely delivery of software products.

Copyright Notice

Copyright © Members of the INDIGO-DataCloud, DEEP Hybrid-DataCloud, eXtreme DataCloud and EOSC-Synergy collaborations, 2015-2021.

Acknowledgements

The INDIGO-DataCloud, DEEP-Hybrid-DataCloud, eXtreme-DataCloud and EOSC-Synergy projects have received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement numbers 653549, 777435, 777367 and 857647, respectively.

Document Log

Issue | Date | Comment
--- | --- | ---
v1.0 | 31/01/2018 | First draft version
v2.0 | 05/02/2018 | Updated criteria
v3.0 | 20/12/2019 | Code management section, metadata for software
v3.1 | 05/03/2020 | Add tags/names for each criterion
v3.2 | 23/04/2020 | Add EOSC-Synergy to copyright
v3.3 | 15/10/2020 | Fix issues: #32, #46, #47, #48, #49, #51
v4.0 | 02/02/2022 | Add annex, spell check, fix issues: #7, #35, #50, #57, #59, #60, #61, #63, #64, #65, #66, #67, #69, #70, #71, #72, #73, #74, #75, #76, #77, #78, #80, #81
v4.1 | 24/10/2022 | Update pipeline tools, fix issues: #68, #83, #85, #88, #90, #91, #92

1. Introduction and Purpose

This document builds upon the recommendations and requirements found in the Initial Plan for Software Management and Pilot Services deliverable [1], produced by the INDIGO-DataCloud project.

These guidelines evolved throughout the project’s lifetime. They were extended in the subsequent DEEP-Hybrid-DataCloud and eXtreme-DataCloud projects, and are being further extended in EOSC-Synergy [2].

The result is a consolidated set of Software Quality Assurance (SQA) baseline criteria emanating from the European Open Science Cloud (EOSC). It aims to outline the SQA principles to be considered in upcoming software development efforts within the European research community, and to evolve continuously in order to stay aligned with future software engineering practices and security recommendations.

2. Goals

  1. Set the base of minimum SQA criteria that software developed within an EOSC development project MUST fulfill.

  2. Enhance the visibility, accessibility and distribution of the produced source code through the alignment to the Open Source Definition [3].

  3. Promote code style standards to deliver good quality source code emphasizing its readability and reusability.

  4. Improve the quality and reliability of software by covering different testing methods at development and pre-production stages.

  5. Propose a change-driven scenario in which every update to the source code is continuously validated by the automated execution of the relevant tests.

  6. Adopt an agile approach to effectively produce timely and audience-specific documentation.

  7. Lower the barriers to software adoption by delivering quality documentation and using automated deployment solutions.

  8. Encourage secure coding practices and security static analysis at the development phase while providing recommendations on external security assessment.

3. Notational Conventions

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 [4].

4. Quality Criteria

The following sections describe the quality conventions and best practices that apply to the development phase of a software component within the EOSC ecosystem. These guidelines governed the software development process of the former European Commission-funded project INDIGO-DataCloud, where they proved valuable for improving the reliability of software produced in the European scientific arena.

The next sections describe the development process driven by a change-based strategy, followed by a continuous integration approach. Changes in the source code trigger automated builds that analyze the new contributions in order to validate them before they are added to the software component code base. Consequently, software components are better suited for deployment in production infrastructures, reducing the likelihood of service disruption.

4.1. Code handling

4.1.1. Code Accessibility [QC.Acc]

4.1.2. Code Workflow [QC.Wor]

A change-based approach is accomplished with a branching model.

4.1.3. Code Management [QC.Man]

4.1.4. Code Review [QC.Rev]

Code review implies the informal, non-automated peer review of any change in the source code [5]. It is the last step in the change management pipeline, performed once the candidate change has successfully passed the required set of change-based tests.

4.1.5. Semantic Versioning [QC.Ver]

4.2. Files and documents

4.2.1. Licensing [QC.Lic]

4.2.2. Code metadata [QC.Met]

Metadata for the software component provides a way to achieve its full identification, thus making software citation viable [9]. It allows the assignment of a Digital Object Identifier (DOI) and is key towards preservation, discovery, reuse, and attribution of the software component.

4.2.3. Documentation [QC.Doc]

4.3. Static testing - White box testing

4.3.1. Code Style [QC.Sty]

Code style requirements support the maintainability of the source code through a commonly agreed set of style conventions. These vary with the programming language being used.
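As a minimal sketch of what an automated style check does, the following Python snippet tests a source text against two simple conventions. The function name `style_violations`, the choice of conventions and the 79-character limit are assumptions made for illustration; in practice a dedicated linter for the language in question would be used, and those tools check far more than this.

```python
MAX_LINE_LENGTH = 79  # assumption: the limit recommended by PEP 8 for Python


def style_violations(source: str):
    """Return (line number, message) pairs for two simple conventions."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append((number, "line too long"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems


# The first line of the sample ends with a stray space.
sample = "x = 1 \ny = 2\n"
violations = style_violations(sample)
```

An empty result means the text satisfies both conventions, which is the condition a change-based pipeline would require before accepting the change.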

4.3.2. Unit Testing [QC.Uni]

Unit testing evaluates all the possible flows in the internal design of the code, so that its behavior becomes apparent. It is a key type of testing for early detection of failures in the development cycle.
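A minimal sketch of covering the flows of a single unit, using Python's standard `unittest` module, is given here. The function `normalize_tag` is a hypothetical unit invented for illustration and is not part of any tool named in this document.

```python
import unittest


def normalize_tag(tag: str) -> str:
    """Hypothetical unit under test: strip a leading 'v' from a release tag."""
    if not tag:
        raise ValueError("empty tag")
    return tag[1:] if tag.startswith("v") else tag


class NormalizeTagTest(unittest.TestCase):
    """One test per flow through normalize_tag."""

    def test_prefixed_tag(self):
        self.assertEqual(normalize_tag("v1.2.3"), "1.2.3")

    def test_unprefixed_tag(self):
        self.assertEqual(normalize_tag("1.2.3"), "1.2.3")

    def test_empty_tag_is_rejected(self):
        with self.assertRaises(ValueError):
            normalize_tag("")


# Run the suite programmatically so the outcome is available as an object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTagTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each branch of the function (prefixed input, unprefixed input, invalid input) has its own test, so a failure points directly at the flow that broke.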

4.3.3. Test Harness [QC.Har]

In software development, a test harness [10] is a collection of software and test data used by developers to unit test software models during development. A test harness specifically refers to test doubles, which are programs that interact with the software being tested. Once a test harness has been used to execute a test, developers can also use a test library to generate reports.

It is also a simple form of Integration Testing, where interaction and integration with external components are substituted by a Double.

Test Double is a generic term for any case where you replace a production object for testing purposes. There are various kinds of double [11]:
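Two kinds of double described in [11], a stub (which returns canned answers) and a mock (which verifies how it was called), can be sketched with Python's standard `unittest.mock` module. The function `publish_release` and the `registry` collaborator with its `list_versions` and `upload` methods are hypothetical names introduced only for this example.

```python
from unittest.mock import Mock


def publish_release(registry, name: str, version: str) -> bool:
    """Hypothetical unit under test: upload only versions not yet published."""
    if version in registry.list_versions(name):
        return False
    registry.upload(name, version)
    return True


# The double stands in for a real artifact registry (an external component).
registry = Mock()
registry.list_versions.return_value = ["1.0.0"]  # stub: canned answer

already_there = publish_release(registry, "demo", "1.0.0")
newly_added = publish_release(registry, "demo", "1.1.0")

# Mock-style verification: check the interaction, not just the return value.
registry.upload.assert_called_once_with("demo", "1.1.0")
```

Because the external registry is replaced by the double, the test runs without network access and still exercises the integration point of the unit.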

As such, the following criteria are defined for Test Harness:

4.3.4. Test-Driven Development [QC.Tdd]

Test-Driven Development [12] is a software development process in which software requirements are converted to test cases before the software is fully developed, and all software development is tracked by repeatedly testing the software against all test cases. This is opposed to software being developed first and test cases created later.
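The order of work described above can be sketched in Python as follows. The requirement used here (a release version consists of exactly three dot-separated non-negative integers) is a deliberate simplification chosen for illustration, and the names `CASES`, `failing_cases`, `placeholder` and `is_release_version` are hypothetical.

```python
import re

# Step 1: requirements are captured as test cases before any implementation.
CASES = [
    ("1.2.3", True),
    ("0.0.1", True),
    ("1.2", False),
    ("1.2.x", False),
]


def failing_cases(is_valid):
    """Return the inputs for which a candidate implementation is wrong."""
    return [text for text, expected in CASES if is_valid(text) != expected]


# Step 2 ("red"): a placeholder implementation fails some of the cases.
def placeholder(text: str) -> bool:
    return False


red = failing_cases(placeholder)


# Step 3 ("green"): write just enough code to satisfy every case.
def is_release_version(text: str) -> bool:
    """Simplified check: exactly three dot-separated non-negative integers."""
    return re.fullmatch(r"\d+\.\d+\.\d+", text) is not None


green = failing_cases(is_release_version)
```

The same test cases are run against every candidate implementation, so progress is measured by the list of failing cases shrinking to empty.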

4.3.5. Security [QC.Sec]

Security assessment is essential for any production software. An effective implementation of the security requirements applies to every stage in the Software Development Life Cycle (SDLC), and is especially effective at the source code level.
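To illustrate what a security check at the source code level can look like, here is a minimal static analysis sketch built on Python's standard `ast` module. It flags direct calls to `eval` and `exec`; the set `RISKY_CALLS` and the function `find_risky_calls` are assumptions made for this example, and real SAST tools apply far broader rule sets than this.

```python
import ast

# Assumption for this sketch: these built-ins are treated as risky.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source: str):
    """Return (line number, name) for each direct call to a risky built-in."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


# The sample is only parsed, never executed.
sample = "x = input()\nprint(eval(x))\n"
findings = find_risky_calls(sample)
```

The analysis works on the syntax tree alone, so it can run on every change without executing the code under inspection.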

4.4. Delivery

4.4.1. Automated Delivery [QC.Del]

Automated delivery comprises the build of software into an artifact, its upload/registration into a public repository of such artifacts, and the notification of the success of the process.
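The three steps (build, upload/registration, notification) can be sketched with the Python standard library alone. In this sketch a local directory stands in for the public repository and the notification is simply a returned message; the function `deliver` and the names `demo-1.0.0`, `src` and `repo` are hypothetical.

```python
import hashlib
import shutil
import tarfile
import tempfile
from pathlib import Path


def deliver(source_dir: Path, repository: Path, name: str) -> str:
    """Build an archive, copy it into `repository`, return a notification."""
    with tempfile.TemporaryDirectory() as build_dir:
        artifact = Path(build_dir) / f"{name}.tar.gz"
        # Build: pack the sources into a single artifact.
        with tarfile.open(artifact, "w:gz") as archive:
            archive.add(source_dir, arcname=name)
        digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
        # Upload/registration: a local directory stands in for the repository.
        repository.mkdir(parents=True, exist_ok=True)
        shutil.copy2(artifact, repository / artifact.name)
    # Notification: here simply a message for the caller to forward.
    return f"delivered {artifact.name} sha256={digest}"


with tempfile.TemporaryDirectory() as workspace:
    src = Path(workspace) / "src"
    src.mkdir()
    (src / "module.py").write_text("print('hello')\n")
    repo = Path(workspace) / "repo"
    message = deliver(src, repo, "demo-1.0.0")
    published = sorted(p.name for p in repo.iterdir())
```

Including a checksum in the notification lets consumers verify that the artifact they download is the one that was built.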

4.5. Deployment

4.5.1. Automated Deployment [QC.Dep]

5. Glossary

API
Application Programming Interface
CLI
Command Line Interface
EOSC
European Open Science Cloud
OWASP
Open Web Application Security Project
SAST
Static Application Security Testing
SCM
Software Configuration Management
SQA
Software Quality Assurance
TDD
Test-Driven Development
VCS
Version Control System

A1. Annex

The Quality Criteria described in this document follow a technology-agnostic approach. As such, the choice of tools and services to implement the code workflow, followed by the verification & validation processes, is up to the team or community developing and/or using a given software.

This annex describes an implementation approach to help software developers cover the Quality Criteria detailed in this document.

A1.1. Code workflow

The code workflow is shown in Figure 1. It depicts a real-case example in which GitHub is used for several purposes, described below. The workflow starts when the developer branches the code to implement a new feature or fix. After the implementation, the Pull Request triggers a CI pipeline in the Jenkins service.

In the Jenkins service, several checks are performed: static checks on the code, such as code style linting, and dynamic tests, such as unit and harness tests. To perform functional tests, which may include API tests in the case of a service, the packages need to be built (delivery) and automatically deployed into a running state. In the case shown, the configuration is done with Ansible roles and playbooks.

The Pull Request is updated with the results of the tests, thus notifying the developer team that they can proceed with the review and, if approved, with the release process, where the branch is merged into the main production branch.

Figure 1: Code workflow

A1.2. Verification & Validation (V&V)

Once the code workflow is in place, the next step is to shape the V&V processes in order to ensure that the software does “the right things right”. One way to address them is to rely on CI/CD pipelines that encompass the series of checks executed automatically for each change made to the code, driven by the agreed code workflow (see A1.1).
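The idea of a pipeline as an ordered series of checks can be sketched in a few lines of Python. This is not how Jenkins or any other CI service is implemented; the function `run_pipeline`, the stage names and the placeholder checks are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; after the first failure the rest are skipped."""
    report = []
    failed = False
    for name, check in stages:
        if failed:
            report.append((name, "skipped"))
            continue
        ok = check()
        report.append((name, "passed" if ok else "failed"))
        failed = not ok
    return report


# Placeholder checks: a real pipeline would invoke linters, test runners, etc.
stages = [
    ("style", lambda: True),
    ("unit", lambda: False),
    ("security", lambda: True),
]
report = run_pipeline(stages)
```

Stopping at the first failing stage gives the developer fast feedback and avoids spending resources on a change that already cannot be accepted.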

There are multiple technological solutions to compose CI/CD pipelines. The SQA as a Service (SQAaaS) platform¹ aids developers in this task, with the additional benefit of being fully aligned with the criteria described in this document. The SQAaaS solution relies on Jenkins code pipelines [16] and allows developers to use either a graphical (web-based) or code-based approach to define the work to be performed in each stage of the CI/CD pipeline.

A1.3. Services

As can be seen in Figure 1, multiple services take part in the execution of the CI/CD pipelines. Table 1 shows the list of services used for source code management as well as to implement the V&V of the Quality Criteria detailed in this document.

One of the most popular services for software source code management is GitHub. It uses Git as the Version Control System and supports branching and tag management. Additionally, it has an issue tracker for bugs, requests and enhancements, and the Pull Request mechanism for code review and discussion of changes.

Ansible and Ansible-Galaxy are used in many software frameworks as deployment and configuration tools.

The software is packed/built into executable artifacts, which can be RPMs (for Red Hat and derivative operating systems), DEBs (for Debian/Ubuntu and derivatives) and, in many cases, containers such as Docker images.

The artifacts are generally provided by public repositories, most notably Docker Hub [17] in the case of Docker images. Other common repositories are PyPI [18] for Python software and Maven [19] for Java.

Regarding CI/CD automation, Jenkins pipelines can be easily composed through the SQAaaS platform and stored in the git repositories, where the Jenkins CI service uses them to perform the tests. The tools used in the CI automation are listed in section A1.4.

Table 1: Tools and services used to implement the QA criteria, together with the criteria they apply to where applicable. * "All" refers to all the criteria in the tooling: https://github.com/EOSC-synergy/sqaaas-tooling/blob/staging/tooling.json#L305. QC.FAIR (criteria on the “FAIRness level of Digital Objects”) is not covered in this document.
Service | Usage | Criteria | Repo URL or documentation | Comment
--- | --- | --- | --- | ---
GitHub | VCS | QC.Acc | https://docs.github.com/ | Source code repository - git
GitHub | API | QC.Wor | https://docs.github.com/en/rest | git branching management and version tagging
GitHub | Issue tracker | QC.Man | https://docs.github.com/en/issues | Track issues, bugs, new features, etc.
GitHub | Pull Requests (PR) | QC.Man, QC.Rev | https://docs.github.com/en/pull-requests | Code review through PRs
GitHub | Documentation | QC.Doc | | Documentation present in VCS repository
Ansible, Galaxy | Install, Config | QC.Dep | https://docs.ansible.com/ | Automated deployment and configuration
SQAaaS platform | Pipeline composition | All * | https://sqaaas.eosc-synergy.eu | Pipeline composition for automatic tests
SQAaaS platform | Assessment & awarding | QC.Acc, QC.Doc, QC.Lic, QC.Met | https://sqaaas.eosc-synergy.eu | Bronze badge awarding
Jenkins CI service | Automated tests | All * | https://www.jenkins.io/ | Execution of automatic tests
Docker Hub | Docker images | QC.Del | https://hub.docker.com/ | Public repository of Docker images
PyPI | Python packages | QC.Del | https://pypi.org/ | Find, install & publish Python packages
Maven | Java artifacts | QC.Del | https://mvnrepository.com/ | Repository for Java dependencies

A1.4. Tools for CI/CD

This section lists the tools used in the CI pipelines, the criteria each one verifies and the programming language it applies to. The list is based on the template file at https://github.com/EOSC-synergy/sqa-composer-templates/blob/main/tooling.json.

Tool | Criteria | Programming Language | Repo URL or documentation | Summary
--- | --- | --- | --- | ---
git | QC.Acc | Agnostic | https://git-scm.com/ | Code Accessibility
pylint | QC.Sty | Python | https://pylint.org/ | Code style
hadolint | QC.Sty | Dockerfile | https://github.com/hadolint/hadolint | Code style
checkstyle | QC.Sty | Java | https://github.com/checkstyle/checkstyle | Code style
jsonlint | QC.Sty | JSON | https://github.com/zaach/jsonlint | Code style
pycodestyle | QC.Sty | Python | https://pycodestyle.pycqa.org/en/latest/ | Code style
flake8 | QC.Sty | Python | https://flake8.pycqa.org/en/latest/ | Code style
tox | QC.Sty, QC.Uni, QC.Sec, QC.Doc | | https://tox.readthedocs.io/ | Automated test framework
bandit | QC.Sec | Python | https://bandit.readthedocs.io/ | Static security
licensee | QC.Lic | Agnostic | https://github.com/licensee/licensee | Check license
cff-converter-python | QC.Met | Agnostic | https://github.com/citation-file-format/cff-converter-python | Code metadata
checkCitable | QC.Met | Agnostic | https://github.com/EOSC-synergy/sqa-composer-templates/tree/main/QC.Met | Code metadata
markdownlint | QC.Doc | Markdown | https://github.com/markdownlint/markdownlint | Documentation
restructuredtext-lint | QC.Doc | reStructuredText | https://github.com/twolfson/restructuredtext-lint | Documentation
find_doc_files.py | QC.Doc | Agnostic | https://github.com/EOSC-synergy/sqa-composer-templates/tree/main/QC.Doc/find_doc_files.py | Documentation
get_git_tags.py | QC.Ver | Agnostic | https://github.com/EOSC-synergy/sqaaas-tooling/blob/release/1.7.0/QC.Ver/get_git_tags.py | Semantic Version

References

1.
INDIGO-DataCloud collaboration, Initial Plan for Software Management and Pilot Services
Members of the INDIGO-DataCloud collaboration
(2015) https://owncloud.indigo-datacloud.eu/index.php/s/yDklCrWjKnjutVA
2.
EOSC-SYNERGY. EU DELIVERABLE: D3.1 Software Maturity baseline
Mario David, Isabel Campos, Valentin Kozlov, Amanda Calatrava, EOSC-SYNERGY
Digital.CSIC (2020-06-29) https://doi.org/gm97ww
3.
The Open Source Definition
Open Source Initiative
https://opensource.org/osd
4.
Key words for use in RFCs to Indicate Requirement Levels
Scott O Bradner
Internet Engineering Task Force (1997-03) https://datatracker.ietf.org/doc/rfc2119/
5.
6.
Semantic Versioning 2.0.0
Tom Preston-Werner
https://semver.org
7.
Licenses & Standards
Open Source Initiative
https://opensource.org/licenses
8.
SPDX License List | Software Package Data Exchange (SPDX) https://spdx.org/licenses/
9.
Software citation principles
Arfon M Smith, Daniel S Katz, Kyle E Niemeyer, FORCE11 Software Citation Working Group
PeerJ Computer Science (2016-09-19) https://doi.org/bw3g
10.
What is test harness? | Definition from TechTarget
Software Quality
https://www.techtarget.com/searchsoftwarequality/definition/test-harness
11.
xUnit test patterns: refactoring test code
Gerard Meszaros
Addison-Wesley (2007)
ISBN: 9780131495050
12.
Test-driven development: by example
Kent Beck
Addison-Wesley (2003)
ISBN: 9780321146533
13.
14.
Source Code Analysis Tools | OWASP Foundation https://owasp.org/www-community/Source_Code_Analysis_Tools
15.
EOSC-SYNERGY EU DELIVERABLE: D3.2 First prototype of the Service Integration platform
Pablo Orviz, Isabel Campos, EOSC-SYNERGY
Digital.CSIC (2020-10-08) https://doi.org/gm97w9
16.
17.
18.
PyPI · The Python Package Index
PyPI
https://pypi.org/
19.
Maven – Welcome to Apache Maven https://maven.apache.org/

  1. The SQAaaS platform is being developed in the framework of the EOSC-Synergy project and is described in Deliverable 3.2 [15], which details the architecture and set of components of the platform.