data42 — Life(science), the universe, and everything agile

Pascal Bouquet
Jul 2, 2021

Starting a project with a vision like data42 requires a modern technological approach, one that is still a very new addition to the DNA of a big pharma company. It brings many of the benefits of a tech startup mentality while leveraging the time-tested backing of the enterprise. Striking this balance is sometimes difficult, but almost always worth the effort. data42 is Novartis’ ambitious initiative to change the paradigm of healthcare data in Research & Development (R&D).

We started out with a vision and a team of high-velocity and high-agility resources from Engineers to Agile coaches and Leadership. To put it mildly — we learned a lot.

While defining architecture principles and making technology choices were key, we also had to learn to become agile and find the right balance between working on use cases and developing products. The following lines will give you a summary of our journey, seen from a technological point of view.

Becoming Agile

As most of us from the data42 tech team are from the world of big pharma and used to the traditional implementation of technology solutions, we had to learn the new agile ways of working. Agile coaches have been essential to this journey.

Starting in 2019, we quickly created multidisciplinary “two pizza teams” with fully dedicated resources. We soon built 12-week missions into our set-up, with intermissions between them to address tech debt and adjust mission targets. At the program level, we defined workstreams that drive towards well-designed program-level Objectives and Key Results (OKRs). In this methodology, anything that does not progress towards an OKR is slightly de-prioritized in favor of work that advances one.

This concept has enabled us to regularly redefine our priorities and to pivot a number of times as necessary to ensure constant delivery of quality PoCs (proof of concept), MVPs (minimum viable product), and Products.

The two-week intermissions have been instrumental in aligning dependencies between teams and preparing the backlog for the next mission. As of today, our ~12 teams work in 3-week sprints, a cadence that has proven optimal in our context.

It sometimes takes a lot of learning to adopt new ways of working. After almost three years of working in this environment, I am impressed by what the teams have accomplished, and especially by how they actively live empowerment and the unbossed culture every day. We no longer need to think about agility, as it has become an integral part of our working DNA.

Learning: Look at Agile transformation as a journey! Agile is about speed, but without rushing changes. You can implement tools quickly but changing ways of working takes months. Having agile coaches is key to achieving the necessary cadence while also giving time for people to digest new ideas and new ways of working. It allows teams to think them through, discuss and see for themselves if they work. When the day comes that people no longer speak about agile, but instead speak of the tasks, stories, and product release schedules, you have made your culture change!

Reflection on data, use cases and products

At the beginning of our data42 journey, the first need was to ingest data from a number of geographically dispersed data silos into a common R&D data lake. We quickly learned that the ‘data hunt’ was a never-ending journey, and that it did not necessarily deliver the high value we were aiming for. Very quickly, we saw the need to focus on concrete use cases. The first selected ones were those that gave priority to the OKRs of our data and engineering teams. However, we realized that even with the best selection, the value you can provide with a single use case is limited and very specific to one hypothesis.

That is when we identified the need to create a number of products that would make it easier for both technical and non-technical scientists to discover our data and get access to secured and anonymized patient data in a harmonized and often pre-aggregated state.

This pattern has enabled the scientific community to reach the point of insight generation very quickly. What was a massive undertaking in the past is now something that ‘just works’.

As a result, we have developed our product and demand management capabilities, and can now identify one-off use cases more easily, as opposed to the repetitive use cases that can lead to the creation of a good product.

Learning: Finding the right balance between use cases and products comes only with experience. Developing product management capabilities takes time. And understanding the unique use case versus the right product is a necessity for success.

Good technology principles matter!

When you start with a vision like data42’s, technology and platform matter. Where do you start?

We started with a number of “technology principles” that we defined with the program architects and the technical leads. We all agreed that to support the bold vision of a program like data42, we needed bold principles:

  • Embrace native and in-house designed cloud services
  • Create a loosely coupled, API-first (Application Programming Interface) architecture
  • Separate compute from storage
  • Implement infrastructure as code
  • Bet on Spark as an engine
  • Adopt a policy-based access control
  • Security first
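To make the policy-based access control principle a little more concrete, here is a minimal sketch in Python. It is not the data42 implementation: the `Policy` and `Request` classes, the role names and the dataset tags are all hypothetical, chosen only to illustrate the idea that access is decided by evaluating declared policies, with everything denied by default.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule: a role may perform an action on datasets carrying a tag."""
    role: str
    action: str
    dataset_tag: str


@dataclass(frozen=True)
class Request:
    """Hypothetical access request: who is asking, for what, on which kind of data."""
    roles: frozenset[str]
    action: str
    dataset_tags: frozenset[str]


def is_allowed(request: Request, policies: list[Policy]) -> bool:
    """Deny by default; allow only if at least one policy matches the request."""
    return any(
        policy.role in request.roles
        and policy.action == request.action
        and policy.dataset_tag in request.dataset_tags
        for policy in policies
    )


# Illustrative policies only; real policies would live in a managed policy store.
POLICIES = [
    Policy(role="data_scientist", action="read", dataset_tag="anonymized"),
    Policy(role="data_engineer", action="write", dataset_tag="raw"),
]

scientist_read = Request(
    roles=frozenset({"data_scientist"}),
    action="read",
    dataset_tags=frozenset({"anonymized", "clinical"}),
)
scientist_write = Request(
    roles=frozenset({"data_scientist"}),
    action="write",
    dataset_tags=frozenset({"anonymized"}),
)

print(is_allowed(scientist_read, POLICIES))   # matches the first policy
print(is_allowed(scientist_write, POLICIES))  # no policy grants this action
```

The point of the design is that access rules are data rather than code scattered across services, so they can be reviewed and changed without redeploying the components that enforce them.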

We had several internal meetings to explain the rationale for a cloud approach, and especially the way we would secure data in this paradigm. We took advice from numerous external consulting firms in conjunction with our internal experts, and even went so far as to benchmark competing principles to make sure we derived the most value from them. We abandoned those that did not help us reach our goals.

Learning: Technology is changing faster every day. If you want to build for the future, you can’t be conservative. Also pay attention to benchmarking with peers across your industry and beyond. Technology may have changed since you last looked. In fact, it almost certainly has!

Architecture matters and User Journeys matter as well

As technology changes rapidly, it is important to architect your platform based on capabilities rather than vendors or specific technologies, so that it can actively evolve as new insights are gained. Therefore, we architected the data42 platform on the assumption that at any given time we may need to adapt or replace some of the technology components. It has also been important to consider that products and services evolve rapidly as well, following the changing demands of the user base and triggering new capability and technology requirements.

Now, if you would like your platform to really be used, it is not only about architecture and technology: you need to make it simple for users, and this requires thoughtful user journeys. Those user journeys are not set in stone. Journey maps are living artefacts that evolve continuously, as the insights we gain from our first users tell us more about them and help us develop a deeper empathy for them.

Given all of the above, it is essential to design an architecture with decoupling in mind and to leverage APIs.
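As an illustration of architecting around capabilities rather than vendors, the sketch below defines a capability as a small contract and plugs two interchangeable components behind it. Everything here is an assumption made for the example: `SearchCapability`, `InMemorySearch`, `IdSearch` and `find_datasets` are hypothetical names, not parts of the data42 platform.

```python
from __future__ import annotations

from typing import Protocol


class SearchCapability(Protocol):
    """Hypothetical capability contract: what the platform needs, not who provides it."""

    def search(self, query: str) -> list[str]:
        ...


class InMemorySearch:
    """Stand-in for one component: matches the query against dataset descriptions."""

    def __init__(self, documents: dict[str, str]) -> None:
        self._documents = documents

    def search(self, query: str) -> list[str]:
        needle = query.lower()
        return sorted(
            doc_id
            for doc_id, text in self._documents.items()
            if needle in text.lower()
        )


class IdSearch:
    """Stand-in for a replacement component: matches the query against dataset ids."""

    def __init__(self, dataset_ids: list[str]) -> None:
        self._dataset_ids = dataset_ids

    def search(self, query: str) -> list[str]:
        needle = query.lower()
        return sorted(d for d in self._dataset_ids if needle in d.lower())


def find_datasets(backend: SearchCapability, query: str) -> list[str]:
    """Product code depends only on the capability, so the backend can be swapped."""
    return backend.search(query)


descriptions = {"ds-1": "Oncology trial results", "ds-2": "Cardiology registry"}
print(find_datasets(InMemorySearch(descriptions), "oncology"))
print(find_datasets(IdSearch(["onc-2019", "cardio-2020"]), "ONC"))
```

Because `find_datasets` only knows the contract, replacing one component with another, for example an in-house service with a managed one, does not require changes in the products built on top.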

Learning: Move some of the most experienced architects where they are needed most — to software development teams that are designing complex technology. Once redeployed and empowered to drive change, architects can help simplify technical stacks and create technical agility.

Platform choices

When we started our data42 initiative, we began by developing our platform using existing cloud services as much as possible. In the absence of a mature data lake blueprint, we put a lot of effort into building an MVP platform that allowed us to integrate our data and manage metadata, and that provided a data catalog, a search layer, policy-based access control and an API/SDK (software development kit) layer to access the data.

This worked well and generated a lot of passion in our technical teams while meeting the business needs, and it gave the first missions of data42 a solid base.

We then realized that the effort to industrialize and continue to develop such a platform was huge, and that we were not an engineering company. Indeed, most of the conversations of our technical teams were focused on the evolution of the technical platform, when our desire was to use these resources to create products or to staff use cases.

While retaining our architecture principles, we decided to move towards a managed platform as our central storage and processing engine. This would allow us to leverage an industrial platform and the talents of an engineering company, while focusing our internal teams more on the development of products and use cases, as well as on the connected environment of tooling that supplements the offerings from our vendors.

After a successful PoC that ran for 6+ months, we made the pivot in early 2020 and in May 2020, the entire data42 team started working on the new platform. Our bet was to see an acceleration of the product development, which was demonstrated in November 2020 with the creation of the first successful MVP.

What’s next?

Beyond these learnings, we also learned a lot by industrializing our MVP and defining ways to move more quickly from PoC to MVP to industrialized products. We have also learned to manage our growing community of engineers and to implement a continuous innovation process through multiple Tech Spikes and Proofs of Concept.

If you are interested in being part of our team of highly skilled engineers, and if you are passionate about changing the way we develop medicines, apply for a role on the data42 technology team!

Stay tuned for more exciting news from data42!


The significant advances we have been able to make are essentially the result of the teamwork of talented engineers who are passionate about the impact they can have on society.

I would like to thank them all for their relentless efforts.

About data42

data42 is Novartis’ transformational program to leverage our resource of patient, clinical and research data — one of the largest and most diverse datasets in the pharma industry — with the ultimate moonshot goal of changing the way we develop medicines.

Contact us at

Read more about data42:

Leadership lessons from a grand endeavor

Building the Map of Life, our single source of healthcare R&D data



Pascal Bouquet

Technology leader passionate about #innovation, #digitalhealth, #startups, #data, #ai and … #triathlon. Opinions are my own and not the views of my employer