Simulating Sterility Breaches from Non-Parametric Data - (2023-US-PO-1400)

Briana Russo, Senior Statistician, Merck

Sterility breaches of pre-filled syringes of a drug product are not directly measured but are known to be a function of syringe dimensions, plunger movement and fill weight. Fill weight is dynamically controlled, so a non-parametric fit in the JMP Distribution platform was used to fit a kernel density based on real-world data. JSL was used to simulate 10 million iterations based on the non-parametric fit, along with plunger movement simulations based on dimension specifications and measured frictional forces. Process time for the simulations was reduced over three-fold by using invisible tables, simplifying the output and eliminating saved formulas.

My name is Briana Russo, and I'm a senior statistician

at the Center for Mathematical Sciences at Merck.

Today I'll be going over simulating sterility breaches

with non-parametric data.

At Merck, we often deliver our liquid formulated drugs in prefilled syringes.

A group at Merck that specializes

in these came to me and asked whether I could simulate

the risk of sterility breaches in them, based on historical data

and some different scenarios they wanted to look at.

There were two interesting parts of this

that I wanted to go over today in my poster and discuss a little further.

The first was that some of the historical data,

specifically the fill weight, was non-normally distributed.

When filling the syringes, the process isn't necessarily held to a target.

The fill weight is able to move within a range

and even drift outside of that range for a bit before being corrected.

That often results in some heavy tailing of the data,

which you can see in the bottom left here.

That's an example of that.

We wanted to make sure that we were capturing that heavy tailing,

because obviously that's where the highest risk is going to be.

The other interesting part, which goes specifically into some JSL scripting,

is that I was dealing with a large number of iterations requested by the customer.

They were looking for 10 million per scenario because that's the order

of magnitude of syringes they were expecting to produce.

During the project, I was able to discover some techniques

to reduce the processing load on JMP

that significantly reduced the process time when I was running

the simulations and prevented any crashing or anything like that from memory issues.

I'll touch on both of those things.

But first, I wanted to go into a little bit more background

on the prefilled syringes and what we were looking at.

As I mentioned, we have the fill weight data.

That's the amount of liquid that's filled into the syringe.

That again, I wanted to look at non-parametrically

using a density function.

I was able to find that that was very easy to do in JSL.

I'll show how I did that.

Then the other aspect was the plunger insertion depth.

How deep is the plunger being inserted and how close is that to the liquid fill?

Then the dimensions of the prefilled syringe.

There is some variability from the manufacturer,

and I wanted to make sure that was being captured.

There were two key outputs, and they were a yes or no output for each.

The first was, we want to make sure that we were maintaining a gap

between the liquid fill and the plunger.

Because if we don't, then we're going to be getting

liquid up on the plunger, and that could be a sterility risk.

We wanted to make sure that the air gap length was always greater than zero.

The other one was we also don't want that air gap to be too big

because when we're shipping the syringes, say, on an airplane,

they might be exposed to lower atmospheric pressures,

which can cause the plunger to move up.

If it moved up too much, it could go beyond a sterile barrier

that was created when the plunger was inserted.

We don't want the air gap to be too small, and we don't want it to be too big.

But there's a lot that goes into the plunger movement:

not only the air gap, which is a function of the dimensions

of the plunger and how deep the plunger was inserted

and how close it is to the fill,

but also the different atmospheric pressures

and the cross-sectional area, that is, the dimensions of the syringe.
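A minimal Python sketch of how these inputs could combine into the two yes/no outputs. The talk does not give the actual equations, so the physics here is an assumption: the air gap is treated as an isothermal ideal gas (Boyle's law), and the plunger stays put until the pressure difference overcomes a single friction force. The function names, parameters, and units are all hypothetical.

```python
def plunger_movement_mm(air_gap_mm, sealed_kpa, ambient_kpa,
                        friction_n, barrel_area_mm2):
    """Distance the plunger moves away from the liquid, in mm.

    Assumption: isothermal ideal gas in the air gap and a constant
    barrel cross-section, so pressure times gap length is conserved.
    """
    # 1 N/mm^2 equals 1000 kPa: the pressure that friction can hold back.
    friction_kpa = friction_n / barrel_area_mm2 * 1000.0
    # The plunger comes to rest once the trapped-air pressure has dropped
    # to the ambient pressure plus the friction contribution.
    resting_kpa = ambient_kpa + friction_kpa
    if sealed_kpa <= resting_kpa:
        return 0.0  # friction holds the plunger in place
    gap = max(air_gap_mm, 0.0)  # no trapped air means nothing to expand
    expanded_gap_mm = gap * sealed_kpa / resting_kpa  # P1 * L1 = P2 * L2
    return expanded_gap_mm - gap


def breaches(air_gap_mm, movement_mm, barrier_margin_mm):
    """Return the two yes/no outputs as a pair of booleans."""
    no_gap = air_gap_mm <= 0.0                      # liquid reaches the plunger
    past_barrier = movement_mm > barrier_margin_mm  # plunger leaves sterile zone
    return no_gap, past_barrier
```

In a simulation, each input would be drawn from its own distribution for every iteration and the two flags recorded per row.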

There's a lot of different inputs

and different sources of variability potentially

to that plunger movement.

I wanted to be able to simulate all of those.

That meant that I knew that my data table in JMP

that I wanted to simulate into was going to be very big.

The first change that I was able to make to get these simulations

running a lot more efficiently was actually just opening up the historical data

that I was going to use, the data table I was going to use,

as invisible.

This meant that JMP didn't have to render the table,

this potentially massive table I was going to create,

which really reduced process time

and also prevented JMP from crashing. At times,

it had said the memory of my laptop was exceeded.

Once I opened up the historical data as invisible,

I then would add enough rows to that, just blank rows, to get me to 10 million,

because obviously my historical data wasn't that big.

But I wanted to make sure that the data table had 10 million rows,

so then I could go ahead and simulate 10 million iterations.

Specifically, what I did for the non-parametric aspect

of the data was I fit the data in the Distribution platform in JMP,

and then I was able to just very easily use the Fit Smooth Curve function

to save simulations from that non-parametric data

to 10 million iterations.

It's a super simple and easy way to get simulated values

from what is essentially a kernel density fit.
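For readers outside JMP, the idea of simulating from a kernel density can be sketched in a few lines of Python. This is not the JMP implementation: it assumes a Gaussian kernel and Silverman's rule-of-thumb bandwidth, which may differ from what the smooth curve fit uses, and the function names are hypothetical. Drawing from a Gaussian kernel density is equivalent to picking an observed value at random and adding normal noise whose standard deviation is the bandwidth.

```python
import random
import statistics


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth for a Gaussian kernel (an assumption here)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    # Use the smaller of the two spread estimates so that heavy tails
    # do not inflate the bandwidth.
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def sample_kde(data, n_samples, bandwidth=None, rng=None):
    """Draw n_samples values from a Gaussian kernel density over data."""
    if rng is None:
        rng = random.Random()
    h = silverman_bandwidth(data) if bandwidth is None else bandwidth
    # Pick an observed point at random, then jitter it by the kernel.
    return [rng.choice(data) + rng.gauss(0.0, h) for _ in range(n_samples)]
```

Because every draw starts from a real observation, heavy tails in the historical data carry over into the simulated values, which is the property the talk cares about.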

The other two things that really improved my simulation

came from the fact that, as I mentioned, there were a lot of different calculations

that I was doing within a data table, and over 20 different scenarios,

for example, plunger depth targets we wanted to look at.

As part of my JSL script, I wanted to be looping over different scenarios.

But if I was just going to create a column that then referenced previous columns

in a loop, that could cause reference issues

for each iteration of the loop, because I would end up with essentially

all of the new columns having the same formula

because they'd all just end up referencing whatever the last

iteration of the loop was.

To prevent that, if I wanted to use a formula

for the column, I would then need to delete the formula once the values were calculated.

Again, very inefficient.

One very simple and easy way that I could get around this

was instead of saving a formula for a new column,

just use Set Each Value.

This means that JMP didn't need to save the formula at all.

It eliminated that issue with the looping reference

and then also, again, reduced process time.
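The same kind of looping-reference problem shows up in other languages, which may help make it concrete. The Python sketch below is only an analogy, not JSL: deferred functions stand in for saved column formulas, and plain lists stand in for stored column values. The names and the "target minus fill height" calculation are hypothetical.

```python
def deferred_gaps(targets, fill_heights):
    """Build one deferred 'formula' per scenario, then evaluate them all."""
    formulas = []
    for target in targets:
        # The lambda keeps a reference to the name `target`,
        # not the value it holds on this pass of the loop.
        formulas.append(lambda h: target - h)
    # The loop has finished, so every formula now sees the last target.
    return [[f(h) for h in fill_heights] for f in formulas]


def stored_gaps(targets, fill_heights):
    """Evaluate each scenario immediately and keep only the numbers."""
    columns = []
    for target in targets:
        columns.append([target - h for h in fill_heights])
    return columns
```

With three targets, `deferred_gaps` returns three identical columns, while `stored_gaps` returns one distinct column per scenario and leaves nothing to re-evaluate later.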

The final improvement that I made was by really working with my customer

in this case, and really figuring out what exactly they needed,

I was able to streamline things a lot.

Because initially, I was just giving them the kitchen sink.

Giving them distributions and histograms of every single parameter and output,

which they thought was interesting but was not really worth the effort

and worth the process time.

What they really just wanted was the percent failure rate for these two outputs.

I was able to make delivering that a lot more efficient

by eliminating the need to open up, say,

a distribution platform and trying to fit 10 million rows.

Instead, for any sterility breach,

I just created a column where, if a sterility breach occurred,

it was a one, and if it didn't, it was a zero.

Then it was very easy to just calculate the column mean to give

the percentage of failure for any scenario and directly output that to a journal.

That way, the journal also didn't have to be massive from saving

so much information from the data table in order to create graphs from it.
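The indicator-column trick works because the mean of a column of ones and zeros is the fraction of ones. A minimal Python sketch, with hypothetical function names and made-up example values:

```python
def breach_indicator(air_gaps_mm):
    """1 where the air gap has closed (a breach), 0 otherwise."""
    return [1 if gap <= 0.0 else 0 for gap in air_gaps_mm]


def failure_rate(indicator):
    """Mean of a 0/1 column, which is the fraction of failures."""
    return sum(indicator) / len(indicator)


# Made-up air gaps for four simulated syringes; one has no gap left.
flags = breach_indicator([2.1, 0.4, -0.3, 1.7])
rate_percent = 100.0 * failure_rate(flags)
```

Only a single number per scenario and output needs to be written out, rather than a fitted distribution or a graph over every row.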

Overall, initially in this project,

I was able to deliver it, but using the platform outputs,

visible tables, and saved formulas, it was taking at least three hours.

Often, I was letting it run overnight,

so I don't know the exact timing, but at least three hours.

By simplifying the output alone, so going directly to the journal instead

of saving from, say, the Distribution platform in JMP,

I was able to get this down to an hour and 49 minutes.

Then just those two simple changes

of making sure that the data table was invisible

and saving values instead of saving the formula got me down

to 52 minutes, despite the volume of calculations

that needed to be made.

Overall, it can be very simple

and easy to simulate non-parametric data within JMP

using these data tables and using the Fit Smooth Curve function.

Then also, if you are simulating really big data sets in JMP,

if you are simplifying the output, if you're making sure

that JMP isn't rendering things it doesn't need to or calculating

and saving things it doesn't need to, it can actually be very efficient

in creating the simulations and giving you the outputs.

In this particular case, using those techniques,

I was able to reduce my simulation time by over three-fold.

That's all I have. Thanks for listening.