Survival Prediction of Patients with Heart Failure from Limited Clinical Laboratory Data Using JMP (2021-US-30MP-881)

Level: Intermediate

Stanley Siranovich, Principal Analyst, Crucial Connection LLC

Cardiovascular diseases (CVDs) are a leading cause of death, accounting for about 30% of all deaths worldwide. Hence, early detection and treatment, especially for high-risk cases, would be helpful in preventing loss of life. In this session, we examine a data set with 299 rows and 13 columns, recently published on Kaggle and used in a study published in BMC Medical Informatics and Decision Making, 20, Article number 16 (2020). In particular, we examine a number of predictors, such as age, hypertension, smoking, and creatinine, and their ability to predict heart failure. We use several different platforms in the predictor screening and multivariate methods menus. We also conduct the analyses as a live demo to demonstrate how visualization, interactivity, and analytical flow can lead to a shorter time to discovery than more traditional methods.

Auto-generated transcript...

Speaker	Transcript
Peter Polito	Hello Stan are you there.
	hey Peter.
Peter Polito	How are you doing.
	i'm doing good.
	Wonderful.
Peter Polito	How are you feeling about doing your abstract today.
	yeah i'm feeling pretty good.
	All right, I don't feel like putting had another week to practice.
Peter Polito	Now sure.
	i've always thought that perfect is the enemy of the good.
	yeah yeah you're right about that yeah I clicked on the wrong link.
	Oh i'm sorry about that yeah it's okay to click going to space i'm recording to for this one.
Peter Polito	Ah, I see.
	Where you're we're.
Peter Polito	sorry about that.
	All right, let me pull up.
	Your listening to confirm a few things before we start, and it is in a very unhelpful to see excel file and you think it'd be in JMP.
	Just a moment how's your day going.
	So far, so good.
Peter Polito	Great and where are you calling from.
	I'm calling in from Jeffersonville, Indiana, which is right across the river from Louisville, Kentucky.
Peter Polito	Okay, I'm going to Lexington and Louisville in November.
	Oh really, for one of the conferences?
Peter Polito	No, for pleasure. That's my
	brother in law's 40th birthday and a bunch of.
	guys are going to meet for some whiskey tasting.
	Good good they got the Bourbon trail now are you going yeah.
Peter Polito	that's exactly what we're doing.
	All right, just a moment here Stan.
	Okay, and when you get done with that, let me know, let me give you my cell phone number.
Peter Polito	Alright, are we looking at.
	The survival prediction of patients with heart failure or your other abstract.
	No that's it the heart failure.
Peter Polito	Okay, so the title is just to confirm survival prediction of patients with heart failure from limited clinical laboratory data using JMP.
	that's it.
Peter Polito	Okay, and I have to go through a few things here.
	You have no co presenters correct.
	Correct. And then I need to verbally confirm this: you understand this is being recorded for use in the JMP Discovery Summit conference and will be available publicly in the JMP User Community. Do you give permission for this recording and its use?
	I do great Thank you very much, all right you're.
	Can you just say a little few words, I just want to make sure the audio is coming through clearly.
	Okay, well, let me just start. My name is Stan Siranovich and I am principal analyst at Crucial Connection LLC.
	And I am located in Jeffersonville, Indiana, right across the Ohio River from famous Louisville, Kentucky, and today I'm going to talk to you about heart failure prediction.
Peter Polito	Perfect alright that's coming through Nice and clear.
	Okay.
Peter Polito	Let me just so there's a few things we need to do so that the visuals come through clearly we need to check your display are you on a windows machine.
	i'm on a windows 10 machine and I am going to be working off my outboard monitor.
Peter Polito	Okay, so if you could just go to the search bar at the bottom and type in display.
	search bar bar at the bottom oh.
Peter Polito	yeah if you would mind sharing your screen, while you're doing this, I can.
	check things yeah.
	yeah had some trouble with this one, not to say.
	We share screen but i'd lose my screen share every once in a while.
Peter Polito	Oh, you can check that.
	Okay.
Peter Polito	So at the bottom you'll see the share screen button Oh, you are i'm sorry it's on my father.
	Okay yeah so type click on that magnifying glass.
	And type in display.
	Okay.
	Where.
Peter Polito	You can just start typing it's the cursor is already in place.
	Oh there there it is.
Peter Polito	Oh, I don't think you want to hit enter when you try that one more time.
	Okay.
Peter Polito	So go to change the resolution.
	And Microsoft management.
Peter Polito	said oh it's you know I bet you, is it on your other monitor.
	yeah it's on my other monitor.
Peter Polito	Okay Would you mind dragging it to this one.
	Okay.
	Yes.
Peter Polito	I can't see it yet i'm only seeing your browser window.
	And that's.
Peter Polito	Because he was saying.
	yeah same same problem we had.
Peter Polito	So did you share your whole screen, or do you share just a.
	window, I thought, a shared the whole screen the.
	New share.
	problem is a lot of times it'll cover up my.
	Control bar for the zoom meeting.
	Ah.
	Which is why i'm trying to do it like this.
	Can you see it scheduling maybe see how did that get in there.
Peter Polito	that's what I see is the scheduling made easy.
	yeah free to that.
	And I probably looking at my outlook.
	yep.
Peter Polito	And we're going to want to close outlook anyway, before we start.
	yeah yeah now now that we're connected I don't need any extra windows, let me shut that I go to all the windows open.
	Okay, there we go.
	And that's all I need.
	New share i'll share screen to.
	Let me drag that over.
	Here, when I click on hitting a new share button.
	You see my desktop.
Peter Polito	I see a green tent.
	With a beautiful.
	sunset in the Milky.
	way yeah yeah let me.
	Go back here display all search on display again.
	You see my search bar.
	yep.
Peter Polito	And then chant you want to click on change the resolution of the display so thanks for the option down.
	Okay.
Peter Polito	Here we go, so you were at 1920 but that's okay perfect that's what we want to be at.
	Okay, and then you can go and close that.
	All right.
Peter Polito	And then right click on your taskbar at the bottom.
	and go to task bar settings.
	click it.
	yep.
Peter Polito	yep.
	Okay.
Peter Polito	So it's the second option down.
	automatically test for it there, we go.
	All right, anything else here.
	um.
Peter Polito	can go and close that.
	And then the last thing we want to do is go down to.
	Do you like.
	Notification tabs or anything to make sure any anything that might give a pop up is closed.
	yeah I closed everything before I logged on.
	Okay, the only problem is my control bar is still on my laptop monitor and I'm still looking at the taskbar on my outboard monitor, and that's the one I want to present from.
Peter Polito	Okay um.
	let's go back there and see if it automatically change itself.
	Okay, so maybe I just go and try clicking on the desktop and maybe that will make it go away.
	yeah there we go.
	There we go there, we go.
Peter Polito	Alright, so once we.
	We are recording now so.
	Now I go to be careful on that.
Peter Polito	yeah if it.
	If it pops up it's not the end of the world.
	Okay.
Peter Polito	You will.
	want to.
	Once you start presenting i'm going to go mute my camera is going to be off i'm going to be a ghost you're going to think you're not just talking to yourself.
	If you.
	feel like you've just made some cataclysmic error and you want to start all over, that's totally fine. Unfortunately,
	The way they're having us do this if you there's we're not going to really be able to edit something in the middle it's basically we got to go through in one run if in the future, you finish this and then you realize Oh, I really wanted to say this, instead of that.
	When these are actually presented.
	you're going to be, you have the option to be live, and so we can always pause the video, and you can say Okay, you know I misspoke here what I really was trying to convey the X, Y or Z and you'll have the opportunity to kind of in real time.
	set the record straight so that's if, like you realize tomorrow or the next day or something, but if you in the moment you feel like I wanted to say this completely differently, we can just start again.
	Okay, well, hopefully we won't have to do that I was having trouble with Sharon to state and Nick stopped at once, and I asked them stop at once.
	And that be edited out, you know.
Peter Polito	um yeah yeah.
	Okay, so.
	Good.
	All right, i'll leave it to to my colleagues that are producing this to go ahead and work that.
	That for you let's let's pause the recording temporarily.
	Okay.
Peter Polito	All right, i'm going to turn off my video.
	For a second.
	Before we start, I want to check the screen share.
	Perfect okay.
	Now I want to have to bring up the taskbar.
Peter Polito	No that's fine that's fine.
	Okay, the issue is is we just want maximum.
	yeah yeah okay.
Peter Polito	square inch while you're presenting.
	Okay, and here's the other deal. I ran into problems with this, both on Tuesday and last week, when I presented to Research Triangle Analysts. That's kind of a bit of a story, but I'm not going to do presentation mode.
Peter Polito	Okay that's fine.
	Okay, and so I have that and let's see do you see my PowerPoint up there.
Peter Polito	I do see your PowerPoint.
	Okay now how about if I come over here.
Peter Polito	And I see my screen.
	super Let me close this and open up the other job screen.
	Come on.
	You see, this is a new JMP screen with.
	scripts and script window.
	I do.
	All right, yeah part of part of the way through the presentation and quit doing that and I had to stop the share and then reshare so hope that that.
Peter Polito	Okay.
	it's fine all right well i'm I am now going to mute my microphone.
	And you.
	Just you do you. All right, good luck, Stan.
	Okay, thank you.
	Good morning, good afternoon, or good evening, everyone. My name is Stan Siranovich, and I am principal analyst at Crucial Connection LLC.
	And we are located in Jeffersonville, Indiana, right across the Ohio River from historic Louisville, Kentucky.
	And as you can see on screen, today I'm going to talk to you about the survival prediction of patients with heart failure from limited clinical laboratory data using JMP Pro 16.
	Now what these are,
	are a collection of medical records. 299 medical records to be exact of heart failure patients. And they were collected at the facility [unintelligible] Institute of Cardiology and the [unintelligible] in
	[unintelligible] Pakistan. And they were collected between April and December of 2015. Now, they were published,
	originally, in the BMC Medical Informatics and Decision Making journal, Volume 20, in 2020, by two authors, Chicco and Jurman.
	And what the authors wanted to do was create a model to assess the likelihood of death by heart failure.
	And the model had to be a relatively simple model that could, hopefully, be used by hospitals and practitioners to assess the severity of cardiovascular disease.
	Now they go into the history a little bit, and I'm going to do it too, but very briefly. To state the reason for doing this: the authors noticed that
	the data they looked at and the models that were derived from the data had, number one, low reproducibility, and they also had a number of different factors in the top five or top six
	for predicting heart failure. But two kept reappearing, and they were ejection fraction and creatinine. And I'm going to go over
	what those are in just a minute or two, once we start the analysis. Now what the authors did was to use 10 different, according to them, advanced machine learning methods to predict heart failure.
	So they ran 10 machine learning scripts. They used random forest, decision trees, gradient boosting, all the normal suspects. Excuse me. And what they did was evaluate them on seven separate machine learning
	metrics. And they found, not to their surprise, that serum creatinine and ejection fraction were significant variables. And they managed to rule out all the other variables.
	And that wasn't the case when we did the analysis in JMP Pro 16. So, we'll be taking a look at that, and I'm going to talk about the variable age a little bit, but,
	before we do that, let me bring up JMP.
	And here is the JMP data table that we're going to be working with.
	If you're not familiar with JMP, let me go over it very briefly. Up here, where I have my cursor, we have the script window, followed by the column window. You can see we have 13 columns with nothing selected, and here's the rows window, and we have 299 rows.
	This I downloaded. The way it works is I went to Kaggle. They published everything to Kaggle
	for other people to practice on, so to speak. And in the intervening years since they put it up there, there were 177 separate submissions. They were all open source, and in the dozen or so that I looked at, they were all
	in what we normally think of as the top two most popular open-source platforms. They also put the original analysis in there, which was done in the R statistical programming language, and they published all the code to GitHub. So, let's see what we can do with JMP.
	Now, let's go over the variables here, which I alluded to earlier. First of all, we have creatinine phosphokinase. That's an enzyme,
	and it's an indicator of muscle damage. We also have serum creatinine, which is a chemical that's in the blood.
	And it's an indicator of both muscle damage and renal failure, and when we talk about muscle damage here, we're thinking about heart failure.
	We've also got ejection fraction, which is the percentage of blood leaving the heart, and having too low an ejection fraction is an indicator of heart muscle damage. We've also got serum sodium,
	which is a routine measurement done with blood work. And very low values here may be an indicator of heart failure.
	We should also note that we have what we could refer to as a data set imbalance. That is,
	we had a...well, I shouldn't call it positive...a death event in 32% of the cases here and survival in 68%. So it's, yeah, one to two:
	we've got one third of the people who suffered heart failure and two thirds who did not. We should also mention that we have a relatively small data set here:
	299 rows. Very small for heart records. So let's start with the analysis, and let me bring up another JMP data table.
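The imbalance described above can be put into numbers with a short sketch. This is an editorial illustration in Python, not part of the JMP demo; the counts are an assumption derived by rounding the stated 32% / 68% split of 299 records, and `class_shares` is a hypothetical helper name.

```python
# Assumed counts: 32% of 299 records rounded to the nearest whole patient.
n_total = 299
n_death = round(0.32 * n_total)   # roughly 96 death events
n_survived = n_total - n_death    # roughly 203 survivors

def class_shares(counts):
    """Return each class's share of the total."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

shares = class_shares({"death": n_death, "survived": n_survived})

# A model that always predicts the majority class already reaches this accuracy,
# which is why raw accuracy alone is a weak yardstick on unbalanced data.
baseline_accuracy = max(shares.values())
print(shares, round(baseline_accuracy, 3))
```

Any screening result is worth comparing against that majority-class baseline before it is read as evidence of predictive skill.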
	And that is this data table right here, and let me get rid of this. I'll drag it over to my other screen.
	Okay, we're looking at the data table, and I already did a little bit of work on this. I rearranged some things, and you'll notice down here
	anaemia, diabetes, high blood pressure, sex, and smoking. When I originally imported it to JMP, it gave them a continuous
	distribution, or continuous
	data type, and what I did was change them. For example, if you come over here and look at anaemia, or better yet high blood pressure.
	High blood pressure - we've got zeros and ones, and they didn't really say where the cutoff was. It would have been nice to have some continuous data here with the
	systolic and diastolic. That's not the case. So that's what we have to work with, and I changed all those. And you'll also see that platelets
	is hidden.
	And that's because I have hidden and excluded it, I should say, and that is because I had to clean them. And there's one other thing I want to do.
	And, to show you how to do this, I purposely did not do it yet. I go over here to age and I'm going to hide it.
	And I'm also going to exclude it. That is, it's hidden from the graphing, from viewing, and it's going to be excluded from any calculations, in case I accidentally drag it in with something else. I'll explain the reasons for doing that in a little bit. So,
	we're going to do this a little bit differently. I'm going to start, excuse me, with the analysis, and then after I've finished the analysis I'm going to go back and do some cleaning, and I'll explain as I go along.
	So we come up here to
	Analyze
	and
	to follow something else for a second. Okay, by the way, the PowerPoint deck is going to be up on the Discovery Summit website, and I've got some screen captures in there and a little bit of explanation, so that you can follow along if you'd like to do it for yourself.
	So, the first thing I'm going to do is go up here to the Analyze menu, click on it, and go down here to Predictive Modeling.
	And we have a bunch of choices here, including Model Comparison and Model Screening. It says here for Model Screening - "fits many different predictive models so that you can select the best."
	So, that sounds like what we want to do; we would like to parallel
	the original analysis. And there is this window that we get, and if you notice down here on the left, we have a box
	and it's got some checkboxes in it, and it tells us which models we are going to be running. This is all the default.
	So, it's going to run, for example, Decision Tree, Bootstrap Forest, Boosted Tree, Fit Least Squares, all that sort of thing. All of our favorites. And there are 11 of them for us in there, and JMP is going to do it automatically. So, DEATH_EVENT,
	that is going to be my Y response. And what I'm going to do is just
	click-drag to select all the variables here, and I'm going to put them in as X factors.
	And I'm going to click OK.
	And we sit back and let JMP do its thing.
	And there it is. Here is the result that we get.
	So let's just work off of this. Here are the details. We can see the various
	analyses that we're running if we click on this grey triangle.
	We can see more details, and the details that are present in the window depend on the particular analysis that was run. If we want to go further, of course, we can click on the red hot spot, and we have some additional choices.
	So, that's what we have down here, and when it did that it also opened this box here, and it's labeled Training. And what this is, is the results from running all the data as a training set. In other words, there wasn't any training.
	There wasn't any test set. Excuse me, there was a training set, but there wasn't any test set. And there wasn't any validation set, and that's because we didn't make any selection.
	This is just screening, and it tells us which had the best results in training. Okay, so, we see a number of columns here, so let's
	go through those briefly. Entropy R squared is
	simply an R squared, and it is,
	well, it's a measure that compares the log-likelihoods of the fitted model, so it's a log value.
	And we have the misclassification rate, and the area under the curve; here, obviously, higher is going to be better.
	And we've got the RASE, which is the root average squared prediction error. So, this is the difference between one and p, the probability.
	That is the fitted probability for the model that JMP came up with. And we have the generalized R squared; that is what everybody thinks about as R squared. It's a
	measure...it's applied to regression models, and if we get a perfect fit, we have a one, and if we don't fit we have zero. They are ranked here, so you see Bootstrap Forest came out on top.
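The columns described above can be written out for a binary response. The following Python sketch is an editorial illustration using commonly cited definitions (entropy R squared as one minus the ratio of the fitted to the constant-probability log-likelihood, the Nagelkerke form of generalized R squared, and RASE built from one minus the fitted probability of the outcome that occurred). JMP's exact computations may differ in detail, and the toy data and the name `fit_measures` are hypothetical.

```python
import math

def fit_measures(y, p):
    """y: list of 0/1 outcomes; p: fitted probability that the outcome is 1."""
    n = len(y)
    # Probability the model gave to the outcome that actually occurred.
    p_actual = [pi if yi == 1 else 1.0 - pi for yi, pi in zip(y, p)]
    loglik_model = sum(math.log(q) for q in p_actual)

    # Constant-probability (intercept-only) model used as the reference.
    base = sum(y) / n
    loglik_null = sum(math.log(base if yi == 1 else 1.0 - base) for yi in y)

    entropy_r2 = 1.0 - loglik_model / loglik_null
    # Generalized R squared, rescaled so that a perfect fit gives 1.
    generalized_r2 = (1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)) / (
        1.0 - math.exp(2.0 * loglik_null / n)
    )
    rase = math.sqrt(sum((1.0 - q) ** 2 for q in p_actual) / n)
    misclassification = sum((pi >= 0.5) != (yi == 1) for yi, pi in zip(y, p)) / n
    return {
        "entropy_r2": entropy_r2,
        "generalized_r2": generalized_r2,
        "rase": rase,
        "misclassification": misclassification,
    }

# Toy example with made-up probabilities, not values from the talk.
measures = fit_measures([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(measures)
```

Because every row here is scored on the data the model was trained on, these numbers describe fit, not out-of-sample performance, which matches the caveat about the missing validation and test sets.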
	And let's see what else do I want to say check my notes.
	I think that's about it, we could
	come up here again
	and open up the Bootstrap Forest and check the results, and we could see for our training set down here, let's look at the confusion matrix. It gives us
	the training data here, which is the same data that we saw, but this is just for the Bootstrap Forest, and we have the confusion matrix here. We have predicted count.
	And down here we have the predicted rate for the Bootstrap Forest. And if you're not familiar with this or your memory is a bit rusty, what this does is it shows us the count for
	death events that were actual death events with the value of zero, and that is the patient survived. Down here, the 83,
	we had death event with a value of one, which means the patient did not survive. And this is the absolute count, so we want to look at the diagonal going from the upper left to the
	lower right, and in the diagonal here from the lower left to the upper right, those are the incorrect counts. So, we had 13
	times here that were predicted to be survival, and they weren't survival. They were deaths. And then in the second grey box here we have the same data, but it deals with rate, which, in our case, is a little bit better measure, because we do have an unbalanced data set. So I will close that.
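The relationship between the count table and the rate table can be sketched in a few lines of Python. This is an editorial illustration with made-up labels, not the counts from the demo; `confusion` is a hypothetical helper, and rates are computed per actual class (each row sums to one), which is what makes them easier to read on unbalanced data.

```python
from collections import Counter

def confusion(actual, predicted):
    """Return (count_table, rate_table), both indexed as table[actual][predicted]."""
    pair_counts = Counter(zip(actual, predicted))
    labels = sorted(set(actual) | set(predicted))
    count_table = {a: {p: pair_counts[(a, p)] for p in labels} for a in labels}
    rate_table = {}
    for a in labels:
        row_total = sum(count_table[a].values())
        rate_table[a] = {
            p: (count_table[a][p] / row_total if row_total else 0.0) for p in labels
        }
    return count_table, rate_table

# Made-up example: 0 = survived, 1 = death event.
actual    = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
count_table, rate_table = confusion(actual, predicted)

# Upper-left to lower-right diagonal holds the correct predictions;
# the other diagonal holds the misclassified rows.
print(count_table)
print(rate_table)
```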
	And next,
	what we're going to do is go to Analyze and Fit Model. And the reason we want to do this is we're going to fit a regression model to it, which is a whole lot easier for people to understand, and it'll
	give us a chance to show off
	the ability to interact in JMP. So, what I'm going to do is close that and go up here to Analyze, here we are, to
	Fit Model.
	And, well,
	We know that the variables now, first of all let me go up here
	will put to DEATH_EVENT in there as what we want to predict. And we know and we saw that the Creatinine, no that's the wrong one. The ejection fraction
	and the here it is.
	Serum creatinine
	are important variables. And, age also popped up in a couple of the analyses. So, let me put that in if I can find it.
	Hang on.
	Let me start that one over again something got left out.
	hey Peter, can we pause for a second.
	So, what i'm going to do is come up here to Analyze,
	Fit Model. And, Y is going to be the DEATH_EVENT.
	And what I'm going to do is, since a number of the models found age as an important variable, I'm going to add age. And I'm going to put in
	ejection fraction and serum creatinine.
	Now the authors didn't find age as a significant variable, but I did, so I'm thinking there might be an interaction in there, and JMP allows us to handle that very easily. So, the way I do that, I click on age
	and ejection fraction, and instead of clicking Add,
	I come down here
	click Cross and I'm going to go to age.
	and
	serum creatinine.
	And that's control click.
	And we're going to add that so we've got.
	Age, ejection fraction, wait a minute.
	I think I just fat-fingered that one.
	There we go.
	Now it's in there, so it's got the two interactions and we can see up here it gave us a choice for Personality. And, what we're going to do is accept the default, which is Nominal Logistic.
	And we have some other choices here but we'll just stick with that.
	And I will click run.
	And here's what we get.
	Okay, so.
	We have the LogWorth here, which is basically minus
	log base 10 of this probability value over here. And the reason we do that is it makes it easier to work with, excuse me, easier to graph and easier for human beings to understand. So, we see that number one on the list
	is ejection fraction, excuse me, which is what the authors found. Number two is serum creatinine, but age is still number three and highly significant.
	And we look on down to the
	interactions and we see they are not significant. By the way, this blue line here is at the value of two. It's two.
	And they put that in simply to make it easier to see. And the reason they chose two is because that is minus log base 10 of .01, which is the significance level that is chosen by JMP as default.
	But we come down here and we see that the LogWorth, or value in the regression equation, is going to be relatively low, but we're over here at .05, which could be a cutoff in a lot of experimental designs. It could be a cutoff in a lot of
	analytics work, so we're going to take note of that, even though we're going to ignore it right here, because it doesn't pass our limit of .01.
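The LogWorth transformation described above is a one-line calculation. A minimal Python sketch, added as an editorial illustration (the function name `logworth` is ours, not a JMP identifier):

```python
import math

def logworth(p_value):
    """LogWorth as described in the talk: minus log base 10 of the p-value."""
    return -math.log10(p_value)

# The two cutoffs mentioned in the talk.
for cutoff in (0.01, 0.05):
    print(cutoff, round(logworth(cutoff), 3))
```

A p-value of .01 maps to a LogWorth of about 2, the position of the blue reference line, while .05 maps to about 1.3. That is why an effect can sit below the line and still clear the looser .05 cutoff.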
	It may be of interest to clinicians. It may give them something else to look out for and it very well may be significant in a different data set. And, we have all the usual data down here. Let's see what can we look at, let me scroll down okay.
	We have the chi square probabilities which I won't go into, and we have the estimates down here which, which would be the coefficient in our regression equation.
	So that was what we found.
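To show how estimates like these combine into a regression equation with the two crossed terms, here is a hedged Python sketch. Every coefficient value below is hypothetical and chosen only for illustration; none comes from the JMP report. It also assumes the equation is written for the probability of a death event, whereas a given software report may model the other response level, which flips the signs.

```python
import math

def death_probability(age, ejection_fraction, serum_creatinine, coef):
    """Logistic model with main effects plus the two crossed (interaction) terms."""
    linear = (
        coef["intercept"]
        + coef["age"] * age
        + coef["ejection_fraction"] * ejection_fraction
        + coef["serum_creatinine"] * serum_creatinine
        + coef["age*ejection_fraction"] * age * ejection_fraction
        + coef["age*serum_creatinine"] * age * serum_creatinine
    )
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients, for illustration only.
coef = {
    "intercept": -2.0,
    "age": 0.04,
    "ejection_fraction": -0.07,
    "serum_creatinine": 0.6,
    "age*ejection_fraction": 0.0,
    "age*serum_creatinine": 0.0,
}

low_ef = death_probability(60, 20, 1.4, coef)
high_ef = death_probability(60, 60, 1.4, coef)
print(round(low_ef, 3), round(high_ef, 3))
```

With a negative ejection fraction coefficient, the lower ejection fraction gives the higher predicted probability. A nonzero crossed coefficient would make that effect depend on age.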
	So let me minimize that, and we're going to go on to cleaning and exploration now. The reason I did that without cleaning first is because in the original article there was no mention of any
	cleaning or any sort of data munging whatsoever. They just took the data and they ran their models on it.
	And I like to look at my data first. So, let's look at a couple of things here. First of all, if we come up over here,
	and click on that little icon, we can show the header graphs. And I like to do that. So, let's click on 1 here, which is
	a positive death event, if I may use those terms.
	And what that does is highlight all the rows with a death event of 1, if we'd like to take a closer look at it, and we can see if there is any sort of correlation with the other variables that might pop out at us. And there's not a whole lot there
	in my window, but we notice one thing: we've got some really, really skewed distributions for the
	creatinine phosphokinase and for the serum creatinine which looks a little strange. Nothing else out there, we can clear those
	by clicking anywhere else in the graph and, by the way, if you're not familiar with it, these these sorts of things are
	dragable if we want to expand anything.
	Now from there let's go to Graph Builder.
	Let's see.
	I'll tell you what since we're concerned about that select that column, which is creatinine phosphokinase and shift click.
	For serum creatinine and, oh, I almost forgot before I do that
	let's right click
	And we can go right from here into Distribution, and this is the same thing we'd get if we went up to Analyze and Distribution. We can see we have a highly skewed distribution in both of them.
	And we have a lot of outliers. Here's just the quantiles here, and I'll close that. Let me close that. We don't really need that.
	And let's look at the Summary Statistics, let me.
	There we go. Well, look at creatinine phosphokinase: we've got a mean of 581 and a standard deviation of 970. So, that's a little weird. That's obviously a long way off from what would be an ideal case. And over here for serum creatinine, I mean, we have a mean
	of, I'll call that 1.4, and a standard deviation of 1.0. So that's not an ideal situation, either. And it's given us some other information too. It's a little hard to see on this graph, but if you're not
	used to looking at box plots, real quick: the whiskers are 1.5 times the interquartile range, which is the difference between the 25th percentile, which is the bottom of the box here, and the 75th percentile, which is the top.
	The diamond gives us the mean and the line gives us the median. And, to make that a bit more clear, I can expand that, and you can see it a little bit better, and there's a whole lot of outliers there.
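The box plot rule just described can be sketched in Python. This is an editorial illustration with made-up values, not the study data; the quartile convention used here (Python's "inclusive" method) is an assumption and may not match the one JMP uses, so fence values can differ slightly between tools.

```python
import statistics

def outlier_fences(values):
    """Return (lower, upper) fences at 1.5 times the interquartile range beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Made-up, right-skewed sample: one value sits far out in the tail.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower, upper = outlier_fences(values)
outliers = [v for v in values if v < lower or v > upper]

# In a skewed sample the mean is pulled toward the tail while the median is not,
# which is the same pattern as a mean of 581 next to a standard deviation of 970.
print(lower, upper, outliers)
print(statistics.mean(values), statistics.median(values))
```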
	So,
	keep in mind that the serum creatinine was one of our very strongly predictive variables. We could also examine that a bit more detail
	by going to Graph Builder.
	So let's go up to Graph, Graph Builder.
	And this is the window we get.
	Well, we know from doing our research and
	some subject matter knowledge, if not subject matter expertise, that creatinine phosphokinase is continuously produced by the body. And it acts on
	a precursor of what will become the serum creatinine, so maybe there's some relationship between the two of them, and we'd also like to examine those, well, unsatisfactory distributions, shall we say. So over here, creatinine phosphokinase.
	And we'll put that in the X axis simply by clicking down here.
	And we'll do the same with serum creatinine, put that in the Y axis, and here's what we get. So, we notice the Y, serum creatinine, goes from zero to 10, while
	creatinine phosphokinase, man, that goes way out here
	to almost 8000. Now one thing we can do
	is click-drag, and we can spread that out a little bit more. But the other thing we can do
	is double-click on our axis, and we get this window. I won't go into detail on this, but we come up here and look at the scale and see the maximum is just past 8000. And, from looking at the graph down here, let me drag that out of the way.
	Certainly, most of our values for the creatinine phosphokinase are less than 1000. In fact, it looks like a good portion of them are less than maybe five or six hundred.
	So,
	I'm going to
	change that to,
	let's pick 800.
	Okay.
	And that gives us a graph of just those data points. We cut all the other ones off, and we see it's basically a straight line. There's no real relationship between them, and we also have some outliers here on the
	serum creatinine, way up here at 10. And that looks like [unintelligible] here, so it may behoove us to go back and take another look at those, maybe get rid of some of the outliers, and see if we get different results.
	And that is pretty much it, let me minimize that.
	Take one last look at the data set.
	Oh yeah serum creatinine.
	We put that in as continuous, but if you look at that here in the column, our least count is out here in the first decimal place, so we have a possibility of 10 values there plus another 10 values. Most of them
	look like they are under 2 or 3. So there may be some question there about how we want to treat that, depending on the analysis.
	And we certainly want it for our analysis. We're going to use that. Do we really want to consider that as a continuous variable, or should we consider that as an ordinal variable with maybe,
	I don't know,
	20 different values for that ordinal variable, 20 or 25 or whatever. That's frequently done in marketing analyses, but it's not usually done here, but it is something to consider. So, let me
	go back over here to a summary. Go back to the PowerPoint.
	And let me expand it and I hope that everyone can see that so here's what we had.
	We have 11 different models that JMP picked by default, and we got those 11 different models and ran them all just by going to Analyze, Predictive Modeling, Model Screening. We found serum creatinine, age, and ejection fraction all to be significant.
	And we found out that the age cross ejection fraction interaction may warrant further clinical study, even though it wasn't really significant for our study at our
	decided-upon value. And we saw some highly skewed values, especially in the creatinine distribution, which was used in our model, and all with quite a small and unbalanced data set. And now, I mentioned age before. Let me go back to the original.
	Drag that away. Here is age.
	And we see we have a somewhat skewed distribution, and I didn't put that into the model, and the reason I didn't put that into the model is the way that data was collected.
	We know from reading the article that the way the age was collected was they sent some sort of healthcare workers to check on the people after
	the original data was developed for the study. So, when you see a 65 here, they went back 65 months afterwards.
	Oh, excuse me, that wasn't the age. I'm talking about time. Let me go back here. Go to time; let me unhide it.
	And there is time. A little weird; let me pull that out a little bit.
	What they did
	was they went back after the original study. So, when you see 16 here, they went back 16 months after the original study and they checked: was there a death event or no death event? That is, was the patient still alive, or is he not still alive, or she.
	So after 16 months this patient here, number 21,
	was still alive. Now, if we go down here to death event, and let me highlight that one.
	And come over here.
	We see we have time of 20.
	Well, we don't know if the death event took place the day before they checked, or the week before, or a month before, or two months before. The original authors, who published the paper and put it up on Kaggle, did look at that data.
	And they couldn't find any positive conclusions from it. That person could have died in month one, but nobody checked at month one.
	And we don't know the frequency of the follow-up checks. So that is why I had that hidden and excluded, and if we would have had better data there, we could come up here to Analyze,
	Reliability and Survival and we could have done a Life Distribution or a Fit Life by X, which is the way a lot of these are done but we couldn't do that in this particular case.
	So that pretty much
	ends the presentation. I'd like to just go back to the PowerPoint deck, which once again will be posted on the site, and here are the links to the original journal article,
	the Kaggle
	pages, the Kaggle data set, which is a CSV, and
	the authors' GitHub repository with the R code. And I hope you all enjoyed that.
	And with that
	we will end this presentation.
Peter Polito	Good job, Stan. That was very interesting.
	Thank you.
Peter Polito	Alright, well, let me make check my checklist make.
	sure.
Peter Polito	i'm not.
	I can't believe that with the age and time.
	Pleased that I caught it.
Peter Polito	yeah and if they come back and say Oh, we don't we don't like the pausing what we'll do is when you're alive, you can just we can you can talk to whoever's presenting it, and you can skip over that part and just say this is what happened and it's.
	Okay do.
	I don't think they're going to ask us to re record it but.
	well.
	Okay yeah if they do it's no big problem.
Peter Polito	Okay well i'm everything else i'm going to do i'm going to do.
	Once we've stopped recording, so we are good to go.
	and good.
Peter Polito	that's great thanks for letting me watch.
	Yeah, well, the good thing about it, even though I had a couple of flubs in there, was we didn't have any troubles with the software.
	Now everything looked great yeah yeah dragged everything over and stayed there so that's good.
Peter Polito	All right, well.
	i'll probably cross paths with you at the discovery summit and have a great day.
	Okay, you too, Peter.
Peter Polito	All right, bye bye, Stan.
	Bye.