Using Matrices

Started ‎11-08-2022 by
Modified ‎11-08-2022 by

This demonstration illustrates some of the ways that you can use matrices. In the course journal, I'm going to click on Cure Time Data. Let's suppose that we need to find the average cure time for each date. So we'll add up all of the cure time values in each row and divide by the number of samples. I'm going to use matrices to perform this computation, because vectorized operations make it more efficient.
Back in the course journal, I've got a pre-written script called Compute Statistics With Matrices. Let me move this so we can see both the Cure Time table, which we'll make a little wider, and our script.
The script begins by creating a variable called s, which stores the result of the J function. The J function creates a matrix with a specified number of rows, a specified number of columns, and a value that fills every element. In this case, I'm using the N Row function with no arguments for the number of rows, so it's just going to look at the number of rows in the current data table. So I'll run that line. You can see in the log that what's now stored in s is a column vector of zeros. I'm going to use that to add in place each of the cure time values.
So I'm using a for loop. I initialized c equal to 2 because, with the cure time data, I want to start with the second column. I test that c is less than or equal to the number of columns. Again, a best practice, especially if you might be working with larger data tables, would be to do this counting outside of the for loop, store it in a variable, and compare c to that variable.
Next, the parser evaluates the body of the for loop. Look at the right-hand side of the assignment first. I'm sending the message Get As Matrix to column 2 this first time around and then adding that in place to s. Then we update c with the post-increment operator, which adds one in place to c.
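
As a small illustration of the J function and in-place addition described above, here is a minimal JSL sketch. The variable name z and the numbers are made up for illustration and are not taken from the course script.

```jsl
// J( nrows, ncols, value ) builds a matrix filled with a single value.
z = J( 4, 1, 0 );        // 4 x 1 column vector of zeros

// Add two made-up column vectors in place, the same way the loop
// adds each cure time column to s.
z += [1, 2, 3, 4];
z += [10, 20, 30, 40];

Show( z );               // z now holds 11, 22, 33, 44
```

Because the addition is element-wise, each position of the vector accumulates its own running total without any row-by-row loop.
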
We test to make sure that c is still less than or equal to the number of columns. And now we get column 3 as a matrix and add that in place to s. So I'm going to go ahead and run the lines of the for loop. Remember that the For function returns a missing value to the log. But you can hover over s to see that it now contains these fairly large numbers, the sum across all five cure time columns.
Lastly, I'm going to send a message to the current data table, the New Column message. The title of the new column will be Average. For the values (and again, this could be Set Values or Values), I'm actually going to do a computation. I can supply functions to the Values argument without having to create variables to store the results first. The expression here uses a divide function and a subtract function. I'm taking s, which is the sum across the cure time columns, and dividing it by c minus 2. It's c minus 2 because we started counting c at 2, so that takes one off the number of items that we've added up. And of course, c had to increment past the test; it had to fail the test. So the final value of c, which I can see by hovering, is one more than the number of columns, and yet we only actually added five things. That's why we divide by c minus 2.
So I'll go ahead and run this last line. You can see that the column Average has been created in the cure time data table. In the log, you can see that I have created an instance of a column object. In fact, the result of everything on that line could be assigned to a variable if I wanted to continue working with that column.
Now, for the next step of the demonstration, I want to work with all of the Cure Time data at once, performing computations. So I'm going to extract the entire data table as a matrix rather than one column at a time, as I did in this pre-written script. Before I do that, I want to get rid of the column that I just created. So I'm going to select the entire contents of this Script Editor window.
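
The pre-written script walked through above might look roughly like the following sketch. This is not the course's actual script: the table name, column names, and values are hypothetical stand-ins (three measurement columns instead of five), and it uses an explicit table reference dt and a column count stored in nc, following the best practices mentioned in the narration.

```jsl
// Hypothetical stand-in for the Cure Time table (made-up values):
// one Day column followed by three measurement columns.
dt = New Table( "Cure Time Demo",
	New Column( "Day", Numeric, Continuous, Set Values( [1, 2, 3] ) ),
	New Column( "Sample 1", Numeric, Continuous, Set Values( [10, 20, 30] ) ),
	New Column( "Sample 2", Numeric, Continuous, Set Values( [12, 22, 32] ) ),
	New Column( "Sample 3", Numeric, Continuous, Set Values( [14, 24, 34] ) )
);

s = J( N Row( dt ), 1, 0 );  // running row sums, one element per row
nc = N Col( dt );            // count the columns once, outside the loop

// Start at column 2 to skip Day; add each remaining column to s in place.
For( c = 2, c <= nc, c++,
	s += (Column( dt, c ) << Get As Matrix)
);

// c is now nc + 1, so c - 2 is the number of columns that were summed.
dt << New Column( "Average", Numeric, Continuous, Set Values( s / (c - 2) ) );
```

With these made-up values the new Average column holds 12, 22, and 32, the row means of the three measurement columns.
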
And I'm going to send a message to the current data table: Delete Columns. The column I want to delete is the one called Average. I would note, again, that Current Data Table is very convenient when you are testing things out or presenting, as I am here. But in a production script, one that you plan to deploy, maintain over versions of JMP, and use across multiple situations, we really want an explicit data table reference rather than Current Data Table. So I'm going to go ahead and run this. That deleted the column that I wanted to get rid of, and it returned that scriptable result to the log.
So now let me clear the contents of the Script Editor window. I'm going to extract the entire data table as a matrix and store that in the variable m. So m will be assigned the result of sending a message to the current data table, and that message is Get As Matrix. I'll run that. You can see in the log, or by hovering over m, that I've got this matrix now. The first column contains these really huge numbers. That's because when I send Get As Matrix to a data table, it pulls out all of the numeric columns, and that includes dates. So Day here is being treated as numeric. You can tell by the modeling type icon: because it's continuous, the data type has to be numeric. Also, if I make the column wider, you can see that even though Day appears to have characters in it, that's just formatting. It's right-justified because it is a number. So Get As Matrix extracted that column as well.
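
The two messages just described can be sketched as follows, using an explicit table reference rather than Current Data Table. The table, its column names, and all of its values (including the date numbers) are made up for illustration.

```jsl
// Hypothetical demo table: a date-formatted Day column, two measurement
// columns, and an Average column to delete. All values are made up.
dt = New Table( "Cure Time Demo",
	New Column( "Day", Numeric, Continuous, Format( "m/d/y", 10 ),
		Set Values( [3750710400, 3750796800, 3750883200] )
	),
	New Column( "Sample 1", Numeric, Continuous, Set Values( [10, 20, 30] ) ),
	New Column( "Sample 2", Numeric, Continuous, Set Values( [12, 22, 32] ) ),
	New Column( "Average", Numeric, Continuous, Set Values( [11, 21, 31] ) )
);

// Remove the column by name, addressing the table explicitly through dt.
dt << Delete Columns( "Average" );

// Pull every numeric column into one matrix. Day is numeric underneath its
// date format, so its large values appear as the first column of m.
m = dt << Get As Matrix;
Show( m );
```
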
If I want to do computations on Cure Time, and in fact the computations I want to do involve finding a minimum and maximum value, I don't want these big values for the day in that matrix. So I'm going to go to a new line. I want to keep the code I've already put in the window visible as I go through this, but I don't want to run the whole thing over, so I'm going to be very careful to run one line at a time.
To get rid of that first column, I'm going to use the variable m and subscript it with a zero and a one. If you recall, a zero subscript just means don't select any particular row. In other words, give me all the rows in the first column. And I'm going to assign that the empty matrix, which is just the square brackets with nothing in them. Again, to run just that one line, I can highlight the line and use any of the methods that we use to run an entire script, and it will run only the highlighted portion. Or, on the numeric keypad of a standard desktop keyboard, I can use the Enter key on the far right side. What's returned to the log is just the empty matrix. But if I hover over m now, or highlight m and run it, you can see that we only have Cure Time values. We've gotten rid of that first column with those large date numbers.
I'll go to a new line and look at some of the other matrix functions that we can use. I'm going to create a variable, a, which will be assigned the result of the Loc function, L-O-C. I'm going to find the positions in this matrix, m, where the values are greater than a cutoff value. When I run just that line, we treat the matrix m as though it were a row vector, count out from the first position, and return the single-subscript positions of all the elements in the matrix that are greater than that cutoff. Now, if I want the actual values, I'm going to create a variable called avals.
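
Here is a small sketch of the column removal and the Loc call described above. The matrix values and the cutoff of 4 are made up for illustration; they are not the values used in the demonstration.

```jsl
// Made-up matrix: the first column plays the role of the large date numbers.
m = [3750710400 1 5, 3750796800 7 2, 3750883200 4 10];

// A row subscript of 0 means all rows, so m[0, 1] is the whole first column.
// Assigning the empty matrix deletes it.
m[0, 1] = [];
Show( m );            // m is now [1 5, 7 2, 4 10]

// Loc returns single-subscript positions, counted row by row,
// of the elements where the condition is true.
a = Loc( m > 4 );     // hypothetical cutoff of 4
Show( a );            // positions 2, 3, and 6 (the elements 5, 7, and 10)
```
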
And I'm going to assign that the result of subscripting the matrix stored in m with the variable a. I'll run just that line, and there are the actual cure time values that exceed the cutoff. Of course, to streamline this script, if I didn't need to store the positions of all those values, I could simply have subscripted the matrix m with the Loc function directly (let me just copy this). I don't need the extra step of storing the locations and then using them, unless I have some other reason to work with the positions that I stored in a in this example.
The next thing I want to do with this matrix is compute some transformed values. This transformation takes each observation of Cure Time minus the minimum across the entire table, divides that by the range (the maximum minus the minimum Cure Time value), and multiplies the result by a scaling constant. So it's like taking the distance of each observation from the minimum cure time as a proportion of the range.
This illustrates another advantage of working with matrices. To find a minimum or maximum value across an entire data table by working with the individual columns, whether I was scripting or working interactively, would take a lot of steps. But working with a matrix, I can do it in one very short line of code.
So in my Script Editor window (let me go ahead and clear the contents now), I'm going to create a variable minCT, for min cure time, and assign to it the result of the Min function with m as its argument. I'll go ahead and run that line, which returns the minimum value. Then I'll also create a variable maxCT and assign it the result of the Max function with m as its argument. I'll run just that line, which returns the maximum. Now I can use these variables in my calculation of the transformation. I'll create a variable, t, to store the transformed values. Let me go to a new line.
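
The subscripting shortcut and the Min and Max calls above can be sketched like this. The matrix values and the cutoff of 4 are again made up, and avals2 is a hypothetical name added only to show the one-step form next to the two-step form.

```jsl
// Made-up stand-in for the matrix of cure time values.
m = [1 5, 7 2, 4 10];

// Two steps: store the positions, then subscript with them.
a = Loc( m > 4 );            // hypothetical cutoff of 4
avals = m[a];                // the values 5, 7, and 10

// One step: subscript directly with the result of Loc.
avals2 = m[Loc( m > 4 )];    // same values, positions not stored

// Min and Max with a matrix argument look across every element at once.
minCT = Min( m );            // 1
maxCT = Max( m );            // 10
Show( avals, avals2, minCT, maxCT );
```
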
So t will be assigned the result of the scaling constant times, in parentheses for the numerator, m minus minCT, then the divide sign, and then, in parentheses for the denominator, maxCT minus minCT. I can run just that line, and I get these transformed values: each observation minus the minimum, as a proportion of the range of cure time values.
Now, that's a lot of decimal places. We really don't have that much precision, so we can round these values. I'll go to a new line again. I'm going to overwrite the values in t by assigning the result of the Round function back to t. You could do this in a new variable if you wanted to keep the original. I'm going to type t, then the assignment operator, then the Round function. The first argument is what I'm rounding, the values stored in t. The second is how many decimal places I want, and I want zero. Now, this actually changes the values in the matrix. It's not the same as formatting a data column to show zero decimal places, where the full values are still stored in the background. So I'm not formatting here; the extra digits are discarded. I'll go ahead and run that, and now I've got integer values for the transformed values.
Now, if you wanted to put these values back into the cure time data table, you could loop through the matrix and create new columns. If I wanted to just create a new data table entirely, there's a very simple function called As Table. There are optional arguments for naming the table and for naming the columns. I'm going to keep things simple for this demonstration by using just the variable t as the argument to As Table. When I run that, I get this untitled table with generic column names, populated with all of the values from the transformed matrix. Again, I could specify the name and the column headings as part of the As Table call, or I could use the script to modify the table after the fact.
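
The transformation, rounding, and As Table steps above can be sketched as follows. The matrix values are made up, and the scale factor k = 100 is an assumption chosen for illustration; the constant used in the demonstration is not recoverable from the narration. The name dtNew is also hypothetical.

```jsl
// Made-up stand-in for the matrix of cure time values.
m = [1 5, 7 2, 4 10];

minCT = Min( m );
maxCT = Max( m );

k = 100;  // assumed scale factor, for illustration only

// Element-wise: distance from the minimum as a proportion of the range, scaled.
t = k * (m - minCT) / (maxCT - minCT);

// Round changes the stored values themselves (unlike a column format,
// which only changes how full-precision values are displayed).
t = Round( t, 0 );
Show( t );

// Create a new, untitled data table from the matrix, with generic column names.
dtNew = As Table( t );
```

Keeping the unrounded result in a separate variable instead of overwriting t would preserve the full-precision values if they were needed later.
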
So we've looked at a couple of different ways to extract numeric values from a data table, either one column at a time or all of the numeric columns in the entire data table at once, and at some manipulations that we can do with matrices.