From Coping Strategies to a Piece of Cake: Making Sense of Curves with the Functional Data Explorer (2021-US-30MP-890)

Level: Intermediate

Andrea Coombs, Sr. Systems Engineer, JMP

Ten years ago, I gave a JMP Discovery talk entitled, "JSL to the Rescue: Coping Strategies for Making Sense of Real-Time Production Data in Vaccine Manufacturing," which proposed some approaches for creating a custom workflow to make sense of curve data from a manufacturing process. The amount of data, data processing, and modeling was overwhelming, and a considerable amount of scripting was necessary to monitor, characterize, and improve the process. Ten years later, we now have the Functional Data Explorer, which would have made this effort not only much easier, but also more informative in terms of process performance and process understanding. In this presentation, I show how making sense of bacterial fermentation data in the Functional Data Explorer is a piece of cake and how the results can bring peace of mind -- no coping strategies necessary!

The end of this presentation includes some thoughts on monitoring fermentation growth curves over time. One option discussed is to calculate the integrated difference of each curve from a standard curve. The following JMP Community discussion shows how to calculate the area between two spline fits.

https://community.jmp.com/t5/Discussions/Calculating-area-between-two-spline-fits-bivariate/td-p/118...

Auto-generated transcript...

Speaker	Transcript
Andrea Coombs, JMP	Hello, my name is Andrea Coombs, and I am a systems engineer with JMP. So today I'm going to be telling the story from my years when I worked in the industry, specifically in the pharmaceutical industry.
	So here's what I'm going to be talking about today in a nutshell. Ten years ago, I gave a JMP Discovery talk on
	coping strategies for making sense
	of real time production data in manufacturing...in vaccine manufacturing. And in that presentation I discussed ways of dealing with curve data
	that we were getting in real time from sensors on our manufacturing floor.
	And today, what I want to talk about are some of the new tools that we have in JMP, specifically the functional data explorer, and how that would have made all of this work a piece of cake.
	And I'll be going into ways to use the functional data explorer for process understanding, process optimization, and process monitoring.
	So let's first start by talking about the process and the curves that we were dealing with. So here on the left, we have an example of a hypothetical vaccine manufacturing process. It starts with material prep.
	In this case, we're doing bacterial fermentation in two stages, then the product is filtered, absorbed, and harvested,
	and downstream the product is centrifuged, and then finally, multiple sublots of material are formulated together into one lot of material.
	And today I'm going to be focusing on the fermentation stages of this process.
	So while bacteria are fermenting, they're going through growth and the growth curves look like something over here on the right. So during growth, there is a lag phase, there's an exponential growth phase, a stationary phase and, finally, a logarithmic death phase.
	And while I was at this company, we were going through an evolution of process under...of process understanding, specifically around these growth curves.
	So we started off early on, by just taking five samples from these two fermentation stages, sending those samples down to...to the QC lab and waiting for results. And we got
	optical density, so we could tell the density of cells in those samples, and the results we got looked something like what you see here on the right.
	We have a distribution of results at those specific time points, but we really didn't know what the growth curves looked like. They could have looked like any of the curves that I drew in over those distributions down below. We just really didn't have any idea.
	But then we implemented inline probes, so we implemented turbidity probes so we can actually see what those growth curves look like. However, we had to gown up and go into the manufacturing suite
	to manually download the data, bring it back to our desks, and then try to figure out what we were going to do with all of this data.
	And then, finally, we upgraded our PLC to where we could now have all of the data from all of our sensors at our desk, no need to gown up and go back...and back into manufacturing; we had all of the data
	at our fingertips, which was great. But through careful analysis, we were able to determine that we had too much data and we had a lot of data to deal with.
	So that's why...
	so that's why I gave this presentation on coping strategies for making sense of real time data.
	We put a lot of work into making sense of our data, and I wanted to share some strategies you could take to make sense of your real time data.
	The first thing I talked about was parameterizing those growth curves. So we had these curves but we wanted to be able to take a look at the growth and
	understand what was going on, so we parameterized them. In other words, we took the slope of the growth, the slope of the death rate, we were looking at duration of time that the bacteria were in each of the phases. We were looking at
	the turbidity at certain time points, and now we had a big collection of these growth...what we called growth variables.
	And these growth variables were very useful for us. We used them for process understanding, for process optimization and process monitoring.
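As a rough illustration of what parameterizing a curve into growth variables can look like, here is a minimal Python sketch. It is not the JSL used in the original project; the function name `growth_variables`, the choice of four summary values, and the made-up curve are all assumptions for illustration.

```python
import numpy as np

def growth_variables(time, turbidity):
    """Summarize one growth curve with a few hand-picked growth variables."""
    time = np.asarray(time, dtype=float)
    turbidity = np.asarray(turbidity, dtype=float)
    slope = np.gradient(turbidity, time)   # local rate of change at each time point
    peak = int(np.argmax(turbidity))       # index of the highest turbidity reading
    return {
        "max_growth_slope": float(slope.max()),
        "max_death_slope": float(slope.min()),
        "peak_turbidity": float(turbidity[peak]),
        "time_to_peak": float(time[peak]),
    }

# Made-up curve: linear rise to a peak at t = 10, then a slower linear decline.
t = np.linspace(0.0, 20.0, 41)
y = np.where(t <= 10.0, 0.2 * t, 2.0 - 0.1 * (t - 10.0))
gv = growth_variables(t, y)
```

Each curve collapses to a handful of numbers, which is convenient for modeling but discards most of the measured points.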
	Now the second group of coping strategies I talked about was just the processing of all this real time data
	that we had. So of course, once we started finding all of this data useful, we wanted access to it right away.
	So the first thing we had to figure out is all of our data sources and making sure we could get access to everything we had, not only data from our historian, but we had data from our
	ERP system, in our laboratory information management systems. We had to bring that data all into one place. We used a super journal, so this is kind of the command central of accessing all of our data and running all of our scripts.
	This was really a lot of work that we did. We were pretty proud of ourselves, but you know in hindsight,
	knowing what tools we have in JMP, specifically now with JMP 16, there are a couple things that would have made this work a lot easier.
	With the growth curves, we could have used the functional data explorer to analyze our curves. We wouldn't have to parameterize them,
	where we were leaving a lot of the data behind. We can use all of the data in the functional data explorer. And then, of course, action recording in the enhanced log,
	that would have made the efforts in the scripting so much easier. But today I want to just focus on how I would have used the functional data explorer.
	Of course, I don't have access to that data anymore, so I simulated some growth curves. Here I have 20 different curves, which represent some variability you could see during bacterial fermentation.
	And with the functional data explorer, it really boils down to these four steps. The first thing is you're going to take your collection of curves
	and you're going to smooth all of the individual curves. The second step is to determine a mean shape and shape components that describe all of the variability that you see in these curves.
	The next step is to extract the magnitude of the shape components for each curve.
	And once you have these two shape components, they're actually called functional principal components,
	then you can use those for however you want to analyze your data, whether your curves are inputs to models, if they're outputs to models, if you want to do clustering analysis.
	Really, you can do any kind of analysis now with these functional principal components. In the example today, all of my curves are outputs to a model so I'm going to be using functional DOE
	to take a closer look at these curves. So let's get into JMP.
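The mean-shape, shape-component, and score steps described above can be sketched outside JMP with a plain principal component analysis on curves sampled on a common time grid. This is only an approximation of the idea, not the Functional Data Explorer's algorithm: the smoothing step is skipped, and the simulated curves and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)

# Simulated stand-in data: 20 curves built from a mean shape,
# two random shape deviations, and a little measurement noise.
mean_shape = np.sin(np.pi * t)
shape_a = np.sin(2 * np.pi * t)
shape_b = np.cos(2 * np.pi * t)
a = rng.normal(0.0, 1.0, size=20)
b = rng.normal(0.0, 0.3, size=20)
noise = rng.normal(0.0, 0.01, size=(20, 50))
curves = mean_shape + np.outer(a, shape_a) + np.outer(b, shape_b) + noise

# Mean curve, then shape components (rows of vt) from the centered curves.
mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Magnitude (score) of every shape component for every curve.
scores = u * s   # row i holds the scores for curve i
```

Each row of `scores` can then feed whatever analysis comes next, such as regression or clustering.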
	Here I have my data table, where I have the batch numbers, I have the time and the turbidity, which make up the curves.
	And I also have a bunch of process parameters. And the first thing I want to do is, I want to do some process understanding to see if any of these process parameters
	are impacting the variability I'm seeing in the curves. So let's go into the functional data explorer. We'll put in time as our X, turbidity is our Y. We'll put in our ID function and then we're going to take all of these process parameters and enter them in as supplementary variables.
	Now, the first step that we need to do is, we need to fit the flexible model to all of our curves and I'm going to use directional...direct functional PCA to do that.
	Here are those flexible models. The second step, which JMP has already done for me, is determine the mean curve
	and our shape components. And JMP has identified six different shape components for me. Over here on the left, I can see those six different shape components.
	And here's the cumulative amount of variability those shape components are explaining in my data.
	You can see, just with the first two shape components, I'm already describing 92% of the variability in my curve, which is great.
	And the remaining shape components here, I'm going to exclude, because they're probably just explaining random variability in my data. So let's go ahead and customize our number of FPCs down to two and I'm left with those two shape components.
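Choosing how many shape components to keep comes down to reading the cumulative share of variance. A minimal sketch follows, assuming the variance attributed to each component is available as a list of eigenvalues. The eigenvalues here are made up so that two components reach 92%, echoing the figure quoted above; they are not the values from the demo, and `n_components_for` is a hypothetical helper.

```python
import numpy as np

def n_components_for(eigenvalues, threshold):
    """Return the smallest number of leading components whose cumulative
    share of variance reaches the threshold, plus the cumulative shares."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return k, cumulative

# Made-up eigenvalues for six shape components.
eigs = [6.0, 3.2, 0.3, 0.2, 0.2, 0.1]
k, cum = n_components_for(eigs, 0.90)
```

The threshold itself is a judgment call; the talk's reasoning is that the trailing components most likely describe random variability.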
	Now, the next step is determining the magnitude of each of those shape components for each of those curves, and I have them here in my score plot.
	On the X axis, I have the Shape Component 1, and on the Y axis, I have Shape Component 2.
	Below I have a profiler where I can take a look at my curve, which I've highlighted here in yellow, and you can see how the first shape component impacts the shape of the curve,
	mainly the growth phase of the curve, and the second shape component is impacting mainly the death phase of the curve.
	And let's take a closer look at a couple of these batches. Let's take a look at Batch 15. Here's the curve for Batch 15, and we can see the magnitude of Shape Component 1 and the magnitude of Shape Component 2. I can enter these in right into my profiler.
	Now you can see I'm able to reproduce the curve for Batch 15. Let's just take a look at one more. Let's take a look at this Batch 2, and you can see this looks very different compared to Batch 15 and the shape...the magnitude of shape components are very different as well. So let's
	take a look at those.
	And you can see now we've been...we're able to replicate that curve as well.
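Entering the two magnitudes into the profiler amounts to adding scaled shape components to the mean curve. A minimal sketch, with a hypothetical five-point mean curve, two hypothetical shape components, and made-up scores for an imaginary batch:

```python
import numpy as np

# Hypothetical mean shape and two shape components on a five-point time grid.
mean_curve = np.array([0.0, 1.0, 2.0, 1.5, 1.0])
components = np.array([
    [0.0, 0.5, 1.0, 0.5, 0.0],   # shape component 1: acts on the middle of the curve
    [0.0, 0.0, 0.0, 0.5, 1.0],   # shape component 2: acts on the tail of the curve
])

def reconstruct(scores):
    """Rebuild a curve as mean + score1 * component1 + score2 * component2."""
    return mean_curve + np.asarray(scores, dtype=float) @ components

# Made-up scores for one imaginary batch.
batch_curve = reconstruct([1.0, -0.5])
```

With only two components kept, the rebuilt curve approximates the measured one; the gap corresponds to the variability carried by the excluded components.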
	So, knowing the mean shape, the two shape components that represent the variability around the curves, we can use now these magnitudes of those shape components to do an analysis.
	And we can do the...when your curves are the response in your model, you can do that analysis right here in the functional data explorer.
	So down here below, I have a new profiler. And this time in my profiler, I have...I have my curve again, which I'll highlight in yellow again.
	And then I have the pH, temperature, pressure, and mixing speed parameters. You can see here for mixing speed, it's pretty flat, so it's not impacting that shape.
	Same for pressure, but when I start to look at temperature, you can see it changing the shape of my curve, mainly the growth phase of my curve. And as I change pH, it's mainly impacting the death phase of my curve.
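The idea behind relating process parameters to curve shape can be sketched as a regression of the shape-component scores on the parameters. The ordinary least-squares fit below is a stand-in and should not be read as the model JMP fits in Functional DOE; the simulated pH, temperature, and score values are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
ph = rng.uniform(6.8, 7.4, n)       # simulated pH per batch
temp = rng.uniform(36.0, 38.0, n)   # simulated temperature per batch

# Hypothetical scores: component 1 driven by temperature, component 2 by pH.
score1 = 2.0 * (temp - 37.0) + rng.normal(0.0, 0.01, n)
score2 = -3.0 * (ph - 7.1) + rng.normal(0.0, 0.01, n)

# Least-squares fit of both scores on an intercept, pH, and temperature.
X = np.column_stack([np.ones(n), ph, temp])
Y = np.column_stack([score1, score2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # rows: intercept, pH, temp

def predict_scores(ph_value, temp_value):
    """Predicted shape-component scores for given process settings."""
    return np.array([1.0, ph_value, temp_value]) @ coef
```

A near-zero coefficient plays the role of the flat profiler trace: that parameter is not moving that shape component.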
	So that's the approach I would take for process understanding. Well, what about process optimization? And by process optimization, I mean,
	you know, if there is a particular curve that we want to standardize on, what would be the pH and temperature we would need in order to get that desired curve? Well, the first thing you need to do is you need to define what that
	desired curve looks like. And I just have a few data points here at the bottom of my data table that describe that shape I'm looking for, and you'll see them here on my...
	in the graph builder. And in this case, I'm interested in getting a curve where I have a really fast growth rate and then a really slow crash rate.
	And this is really easy to add into the functional data explorer. We just need to do one additional step; I'll recall what I did before.
	And the one additional step is to load that target function, identify which
	function represents our target. And all I am doing now is I'm going to replicate what I did before. I'll do my direct functional PCA.
	I'm going to customize my FPCs down to 2 and this...these results are exactly what we were seeing before, and then I'm going to come up and do my functional DOE.
	And again, I'll highlight my curve here. And now that that target curve has been identified, I can maximize the desirability, so in other words, I can find the set points for pH and temperature
	and pressure and mixing speed that will give me a curve that looks most like that target curve. So there is
	a really fast growth rate, a slow crash rate. And I would just need to
	start with media with a pH of 7.2 and use the temperature of 37.9 on my fermenters. So that's the approach I would have used for process understanding and process optimization.
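Matching a target curve can be pictured as a search over settings for the predicted curve closest to the target. The sketch below swaps in a plain grid search on the summed squared difference, which is not JMP's desirability maximization. The curve model, its coefficients, the grids, and the settings used to build the target are all invented for illustration.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 21)
mean_curve = np.sin(np.pi * t)
comp1 = t * (1.0 - t)   # hypothetical shape component 1
comp2 = t ** 2          # hypothetical shape component 2

def predicted_curve(ph, temp):
    """Hypothetical model: scores depend linearly on pH and temperature."""
    s1 = 2.0 * (temp - 37.0)
    s2 = -3.0 * (ph - 7.0)
    return mean_curve + s1 * comp1 + s2 * comp2

# Target built from known settings so the search result can be checked.
target = predicted_curve(7.2, 37.5)

ph_grid = np.linspace(6.8, 7.4, 7)       # candidate pH values, step 0.1
temp_grid = np.linspace(36.0, 38.0, 5)   # candidate temperatures, step 0.5

# Pick the settings whose predicted curve has the smallest squared distance to the target.
best = min(
    ((ph, temp) for ph in ph_grid for temp in temp_grid),
    key=lambda s: float(np.sum((predicted_curve(*s) - target) ** 2)),
)
```

A grid search is only practical for a few factors; it is used here because it makes the objective explicit.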
	When we're talking about process monitoring of growth curves or curve shapes, there are some different options out there and I have a couple different thoughts here.
	If this is something that you want to do, I highly recommend that you check out this 2020 presentation, "Statistical Process Control for Process Variables That Have a Functional Form."
	This was a presentation that was recorded and it's really great. They talked about taking those functional principal components and putting them into model driven multivariate control charts.
	They did a great demonstrate...demonstration of that and then also talked about some maybe drawbacks of taking that approach.
	So I want to discuss another approach that you might want to consider and that's looking at the integrated difference from a standard.
	So just like as before, when we were looking at that target curve, you could use that target curve as a standard and then for each batch, calculate the difference from that standard. So for example, here I have
	a curve from one batch that's represented here in blue...or sorry, in red. The standard curve is in blue and I'm taking the difference and the difference here is on this green curve down below.
	And you can just integrate over that curve to find the area under the curve, and then you could use that in your statistical process control program. And I just want to quickly take you through the steps of how to do that.
	So here is my time and my standard turbidity values, my turbidity for that first batch, and I've just calculated the difference between those two. And then what you can do is you can
	use fit Y by X to build a spline model for that difference over time, and then you can save the coefficients for that model to a data table.
	What you'll get is your X values and the four coefficients that you can use to build the spline model. And here's that formula to build the spline model,
	and then you integrate this formula. So you can either do this through scripting, I chose to do it in the formula editor. So I've just determined the area for each slice of the curve here by just integrating that previous formula by hand.
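The by-hand integration can be written out as follows, assuming the saved coefficients describe each spline piece as a cubic in the distance from its left knot. That form, and the symbols $x_i$, $a_i$, $b_i$, $c_i$, $d_i$, and $h_i$, are assumptions for illustration and are not taken from the presentation.

```latex
% Spline piece on the interval [x_i, x_{i+1}], with four coefficients per piece:
S_i(x) = a_i + b_i (x - x_i) + c_i (x - x_i)^2 + d_i (x - x_i)^3

% Area of one slice, with h_i = x_{i+1} - x_i:
A_i = \int_{x_i}^{x_{i+1}} S_i(x)\,dx
    = a_i h_i + \frac{b_i h_i^2}{2} + \frac{c_i h_i^3}{3} + \frac{d_i h_i^4}{4}

% Integrated difference over any range of slices, e.g. the whole curve or one phase:
A = \sum_{i} A_i
```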
	So once I have these area slices, then I just need to add up the areas that I'm interested in. So I can do the area for the total curve.
	I can also do it for just the growth phase and just the death phase, and I've done that up here
	in my table variable. So I have now a variable for my total integrated difference, for my integrated difference just during the growth phase, and my integrated difference during the death phase.
	And essentially you just go through and do that for each of the batches. Here I've done this for all 20 batches, and now you can do your statistical process control on those variables.
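For comparison, the integrated difference from a standard can also be approximated without a spline fit. The sketch below swaps in the trapezoid rule on the raw differences, which is a simpler technique than the spline integration described above. The function name, the time windows, and the made-up standard and batch curves are assumptions.

```python
import numpy as np

def integrated_difference(time, batch, standard, t_min=None, t_max=None):
    """Trapezoid-rule integral of (batch - standard) over [t_min, t_max]."""
    time = np.asarray(time, dtype=float)
    diff = np.asarray(batch, dtype=float) - np.asarray(standard, dtype=float)
    keep = np.ones(time.shape, dtype=bool)
    if t_min is not None:
        keep &= time >= t_min
    if t_max is not None:
        keep &= time <= t_max
    t, d = time[keep], diff[keep]
    # Sum of trapezoid areas between consecutive time points.
    return float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(t)))

# Made-up standard curve (peak at t = 5) and a batch that drifts above it over time.
time = np.linspace(0.0, 10.0, 11)
standard = np.where(time <= 5.0, time, 10.0 - time)
batch = standard + 0.1 * time

total = integrated_difference(time, batch, standard)
growth = integrated_difference(time, batch, standard, t_max=5.0)
death = integrated_difference(time, batch, standard, t_min=5.0)
```

Running this once per batch gives three numbers per batch that could be charted, in the same spirit as the total, growth-phase, and death-phase variables in the talk.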
	So those are kind of my approaches that I would have taken to look at process monitoring and also process understanding and process optimization using the functional data explorer. I hope you check it out and it's a piece of cake for you as well. Thank you.