The Importance of Free and Open Source Software (FOSS) in the Biosciences.

Modern life science runs on software, yet much of it remains closed. From statistical packages such as SPSS to systems like AlphaFold 3 (Abramson et al., 2024) - whose outputs are shared under a Creative Commons licence while the underlying training infrastructure remains proprietary - the tools that underpin biological research are often controlled, restricted, or expensive.

Open source is not a foreign concept in academia. Programming languages such as Python and R are used extensively in research and are both free and open source. This removes financial barriers and allows libraries such as Biopython, Matplotlib, and ggplot2 to evolve alongside scientific needs. In contrast, platforms such as SPSS or GraphPad Prism operate under restrictive licensing models. While powerful, they limit modification, prevent inspection of underlying implementations, and often require costly institutional subscriptions. Users are consumers rather than contributors.

The open development model does more than reduce costs; it strengthens scientific integrity. When code is accessible, algorithms can be inspected, challenged, and improved. This aligns closely with the core principles of science itself: transparency, critique, and reproducibility. Projects that are deprecated can be forked to maintain compatibility with modern systems, and the availability of source code enables the creation of supplementary tools - from API extensions to something as simple as a graphical interface layered over a command-line program.

A prime example of this model functioning effectively is the National Center for Biotechnology Information (NCBI). Because NCBI is a U.S. government institution, much of its software and data, including resources such as BLAST (Figure 1) and GenBank, is placed in the public domain. The code, databases, and interfaces are freely accessible, allowing researchers worldwide to build pipelines, mirror datasets, and integrate these tools into their own systems without restriction, including using their own datasets for specialised workflows. This openness makes NCBI resources foundational to modern biology, enabling accessibility, reproducibility, and most importantly, accountability in research. It is difficult to imagine modern molecular biology without tools such as BLAST or COBALT. Yet if such foundational tools were proprietary, access to sequence comparison - one of the most basic analytical methods in genomics - would be subject to commercial gatekeeping and profit margins.

Figure 1 - NCBI BLAST website running on GNU/Linux

Imagine if BLAST were not a public domain tool provided by the NCBI, but a proprietary product owned by a private corporation - let’s call it ‘GenBlast’. In this scenario, a basic sequence alignment might cost $10 per run, or require a $5,000 annual institutional seat. Graduate students working on novel organisms would be unable to ‘mirror datasets’ locally to experiment freely; instead, they would be forced to upload their private, potentially sensitive data to a corporate cloud server to be processed. Worse, if ‘GenBlast’ released an update that slightly altered how E-values were calculated to improve speed (convenience), researchers would have no way to verify the change. They could not inspect the code to see why their matches suddenly differed from previous years. The basic analytical method of genomics would become a trade secret, and the reproducibility of decades of genetic research would rely entirely on the solvency and goodwill of a single company.
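To make concrete what an open implementation lets a researcher check: BLAST E-values follow Karlin-Altschul statistics, E = K·m·n·e^(−λS), where S is the raw alignment score, m and n are the effective query and database lengths, and K and λ depend on the scoring system. The sketch below is a minimal illustration of that formula, not NCBI's implementation; the K and λ values, the score, and the lengths are placeholder assumptions chosen only for demonstration.

```python
import math

# Placeholder statistical parameters (assumed for illustration only);
# real values depend on the scoring matrix and gap penalties in use.
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw alignment score into a normalised bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def e_value_from_bits(bits, query_len, db_len):
    """The same expectation, expressed through the bit score."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    s, m, n = 100, 250, 5_000_000  # hypothetical score and lengths
    print(f"bit score: {bit_score(s):.1f}")
    print(f"E-value:   {e_value(s, m, n):.3e}")
```

Because the calculation is a few lines of arithmetic, any change to it would show up in a diff of the source. With a closed ‘GenBlast’, the same change could only be inferred from results that no longer match.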

My own bioinformatics project, PicoMol, serves as a case study in this regard. While it functions as a plasmid viewer and structural analysis toolkit, its real value lies in how it was built. Proprietary suites, like SnapGene, often create walled gardens, locking features behind tiered subscriptions. PicoMol, conversely, demonstrates the power of composability. By combining the computational libraries of Biopython with the graphical capabilities of Qt6 (once I’ve finished the migration from the now-deprecated Qt5), a single undergraduate can build a tool that rivals expensive commercial suites. This is not a claim that open-source developers are superior engineers. It is a recognition that open collaboration compounds progress: the open model allows us to stand on the shoulders of developers who choose to share their work, rather than facing financial barriers to access. The corporate model monetises fragmentation. Open development enables integration. PicoMol’s combined suite shows that when the motive shifts from profit to discovery, scientific tools naturally tend towards connection, not fragmentation.

It would be naive, however, to ignore the advantages that proprietary software can offer. Analysing raw biological data is a complex task, and for many wet-lab scientists, the steep learning curve of command-line tools is a significant barrier to entry. Commercial platforms such as GraphPad Prism, SPSS, or specialised software designed to operate a complex piece of equipment often provide polished interfaces, dedicated customer support, and tightly integrated workflows that hide away the underlying code, allowing researchers to generate publication-ready visualisations without needing to become programmers first. In fast-paced research environments, convenience matters. Industry investment can also accelerate development, particularly in areas requiring significant computational infrastructure, as seen with advanced AI-driven systems in structural biology, such as the invaluable contributions towards AlphaFold from Google’s DeepMind.

However, these benefits come at a cost beyond licensing fees. With proprietary software operating as a black box, algorithms cannot be meaningfully inspected, modified, or independently validated. Researchers must trust that implementations are correct and that updates will not silently alter analytical behaviour. Consider the implications for a PhD student. If a proprietary model is updated halfway through a thesis, and results suddenly shift, the student has no way to prove whether the change is a scientific discovery or a software patch, potentially invalidating years of work. When licences expire or companies pivot, access can disappear entirely, stranding the researchers, both academic and industrial, who rely on such workflows. In such cases, methodology depends on corporate continuity rather than reproducibility. Science, by contrast, is meant to outlive the products that support it.
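With open tooling, one modest defence against silent updates is to record exactly which versions produced a result. The following is a minimal sketch using only the Python standard library; the function name `capture_environment` and the package list are illustrative assumptions, not part of any particular project.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Return a snapshot of interpreter, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; substitute whatever the analysis imports.
    snapshot = capture_environment(["biopython", "matplotlib", "numpy"])
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Saving a snapshot like this alongside each set of results means that, if the numbers shift later, the versions can be compared and the older ones reinstalled to test whether the software or the biology changed.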

While commercial tools may offer convenience, convenience for convenience’s sake is not a scientific virtue. Transparency is.

The dependence on proprietary systems in academia extends beyond specialised scientific software. Much of modern research infrastructure runs atop closed operating systems and productivity suites such as Microsoft Windows and Microsoft Office. These platforms are often treated as default infrastructure. They are the systems researchers actually rely on to run experiments, store data, and write papers. When the infrastructure itself is proprietary, researchers are subject to licensing terms, update cycles, file format constraints, and long-term corporate strategy decisions that lie entirely outside the academic sphere, serving only executives and shareholders whose priority is market dominance over scientific longevity.

Scientific inquiry demands scepticism and verification. It is inconsistent, then, for research institutions to depend on digital infrastructure that they cannot inspect, modify, or control.

The risk of proprietary infrastructure becomes particularly visible in systems such as Microsoft Windows and Microsoft Office. Windows update cycles are controlled entirely by Microsoft, and major updates have, at times, introduced instability, deprecated features, or removed support for older hardware. High-value instruments - such as electron microscopes and plate readers - often possess a mechanical lifespan that far exceeds the support window of their proprietary drivers. It is a common reality in academia to find six-figure instruments tethered to isolated computers running Windows XP or Windows 7, simply because the vendor has ceased software support and the hardware is incompatible with modern operating systems. This forces a wasteful choice: maintain insecure, air-gapped machines that hinder data transfer, or decommission perfectly functional scientific equipment due to software obsolescence. In an open ecosystem, such as the Linux kernel, community-maintained drivers could extend the life of these instruments indefinitely, ensuring that grant funding is spent on new discoveries rather than replacing tools that already work.

If a vendor deprecates the software for a £500,000 electron microscope, a university running Linux is not forced to scrap the machine. The university, or a hired developer, can update the kernel driver to keep it running on modern infrastructure. This restores control to the laboratory without placing the burden of coding on the researcher: writing or maintaining drivers in low-level languages such as C or Rust is not a skill expected of biologists. The focus is instead on Right-to-Repair. While navigating university procurement and grant funding to contract a specialist is rarely simple, the maths is hard to dispute: paying for a few weeks of custom development is far cheaper than replacing a working microscope whenever an operating system drops support for it.

This brings me to my critical point about education. To reclaim control over our infrastructure, we must integrate computational literacy, not just coding, into the undergraduate curriculum. This does not mean every biologist must learn to write kernel drivers. Rather, the goal is to shift students from being passive ‘consumers’ of software to active architects of their digital environments. Just as a researcher understands the optical principles of a microscope without needing to know how to grind the lenses, they should understand the functional nature of their operating systems (such as filesystems, command-line integration, or even learning how to install tools such as BLAST locally). By teaching the fundamentals of how open systems function, we empower the next generation to manage their infrastructure rather than be held hostage by it. A biologist with this systems thinking knows enough to deploy a community patch, migrate data from a legacy format, or collaborate with a developer to keep a six-figure instrument running, ensuring that research is limited only by physics, not by a licensing agreement.
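As an illustration of the kind of command-line integration meant here, the sketch below wraps a locally installed `blastn` (from NCBI BLAST+) in a few lines of Python. The helper names and the file and database names are hypothetical; the flags used (`-query`, `-db`, `-out`, `-outfmt`, `-evalue`) are standard BLAST+ options. It assumes a nucleotide database has already been built locally.

```python
import shutil
import subprocess


def build_blastn_command(query, database, output, evalue=1e-5):
    """Assemble a blastn invocation that writes tabular (format 6) output."""
    return [
        "blastn",
        "-query", str(query),
        "-db", str(database),
        "-out", str(output),
        "-outfmt", "6",
        "-evalue", str(evalue),
    ]


def run_local_blast(query, database, output):
    """Run blastn locally if it is installed; return True on success."""
    if shutil.which("blastn") is None:
        print("blastn not found on PATH; install NCBI BLAST+ first.")
        return False
    command = build_blastn_command(query, database, output)
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


if __name__ == "__main__":
    # Hypothetical file and database names, for illustration only.
    print(" ".join(build_blastn_command("query.fasta", "my_genomes", "hits.tsv")))
```

Nothing here requires writing an algorithm. It only requires knowing that the tool exists on the local machine, where its inputs live, and how to call it, which is the level of literacy argued for above.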

This struggle for sovereignty extends beyond the laboratory bench and into the cloud. In the last decade, Microsoft Office has evolved from a standalone tool into a subscription-based ecosystem increasingly integrated with AI-driven features such as Copilot. This creates a new crisis of data sovereignty. Cloud-based tools often require researchers to upload sensitive, unpublished data to infrastructure they do not own, governed by agreements few read. This raises a critical question: does the data remain the exclusive property of the scientist, or does it become training data for the corporation? In a closed system, there is no technical guarantee that a novel protein sequence, confidential patient dataset, or new drug discovery will not be used to train commercial models.

Closed infrastructure requires trust in authority. Open systems distribute that trust across a community.

This is not to imply that open systems are effortless. They are constantly maintained, tested, and iterated upon by the communities and organisations that develop them. Their stability does not emerge from corporate control, but from continuous collaborative refinement. The sense of community that arises within open-source spaces fosters shared responsibility; bugs are reported publicly, improvements are discussed transparently, and progress is visible to all.

Ultimately, the reliance on proprietary software introduces a systemic fragility into the heart of bioscience. When methodology is embedded in closed systems, reproducibility depends on corporate stability rather than scientific principle. We risk a future where critical instruments are rendered obsolete by arbitrary driver updates, where novel datasets become training fodder for corporate AI, and where reproducibility is limited by the lifespan of a subscription. To reclaim our independence, we must stop treating software as a mere utility and start treating it as a foundational scientific discipline. This shift begins in the classroom. By integrating open systems and computational literacy into the undergraduate curriculum, we empower the next generation of biologists to be architects of their own workflows rather than consumers of restricted products. In the computational era, if we cannot inspect the code, we cannot fully verify the science. One silent update is enough to introduce a bug that can spiral into career-ending errors. Open source is not just a software model; it is the infrastructure most compatible with the scientific method.

Science requires us to show our work. In the computational era, that includes the code.

The choice is not simply one of convenience or cost, but of who controls the means of knowledge production. If we want science to remain reproducible and independent, FOSS must become the standard, not the exception, in bioscience research, education, and industry.

Open Source Projects

Biopython – A library for bioinformatics in Python.
PicoMol – Your bioinformatics toolkit.
NCBI Tools – Publicly available bioinformatics resources.
The Linux Kernel – Open-source operating system kernel.
LibreOffice – Open-source office suite.

GitHub Resources

Matplotlib – Python plotting library.
Qt6 – Open-source GUI framework (up-to-date version).
Open-Source Molecular Biology Projects – Community-maintained bioinformatics projects.

Useful Citations & Further Sources

On AlphaFold and Open Source Licensing

AlphaFold 3 licensing terms and restrictions – The official GitHub repository showing AlphaFold 3’s Creative Commons Attribution-NonCommercial-ShareAlike licence and terms of use for model weights, which restrict commercial use.
Accurate structure prediction of biomolecular interactions with AlphaFold 3 (Nature) – The peer‑reviewed paper describing AlphaFold 3’s architecture and its ability to predict structures of proteins, nucleic acids, and other biomolecular complexes.

On Open Source, Reproducibility, and Scientific Software

Publishing computational research and open infrastructure – Review showing how open access to code and data supports reproducible computational science.
Importance of open source and open standards in scientific publishing – Article discussing why reliance on proprietary software and formats can negatively affect scientific communication.
ReScience C journal - reproducibility with FOSS – A journal dedicated to replications done with free and open‑source software, highlighting the role of FOSS in scientific verification.

Author

Jack Magson

Hi, I’m Jack. I’m a 21-year-old undergraduate biochemist at the University of Bristol with a passion for understanding the molecular intricacies of life. Beyond the lab, I’m an avid musician and proud president of the University of Bristol Big Band Society, where I get to indulge my love for jazz and big band music. When I’m not studying or making music, you’ll find me exploring open source projects and contributing to some of my own. This blog is where I share my thoughts on biochemistry, music, software, and everything in between.
