tuo88138
Level II

data splitting

I want to feed data to JMP as three sets: training, validation, and test. How can I do this? Does JMP use a test set, or just training and validation?

4 REPLIES
P_Bartell
Level VIII

Re: data splitting

JMP Pro has a native capability to designate any observation as training, validation, or test. The analysis platforms then handle the observations appropriately. If you only have JMP, you can still follow this modeling best practice; it just takes far more keystrokes and menu selections, with a lot of manual data handling. I think my former colleague (I'm a SAS/JMP retiree) @Jeff_Perkinson wrote a blog post many years ago with suggestions for a workflow in standard JMP. Or maybe I'm misremembering.

 

Here's a link on how to make a validation column in JMP and JMP Pro: Creating a validation column

tuo88138
Level II

Re: data splitting

Thank you so much.

Re: data splitting

Hello @tuo88138,

 

Welcome to the JMP Community!  

 

As a rule, standard JMP uses two-level validation (training and validation), while JMP Pro has three-level validation (training, validation, and test) built into the platforms under Analyze > Predictive Modeling. In standard JMP, you can always hide and exclude your test set rows while building your model, then bring them back in to see how they affect the model and how well the model built on the training and validation sets performs.

 

You can also build your own validation columns manually in JMP Pro. Create a new column, open Column Info, and go to the Initialize Data drop-down (the one showing Missing/Empty). Select Random and then Random Indicator to split the data into two- or three-level validation sets. To see how to build a validation column in JMP Pro, check out the link below.

Creating a Validation Column (Holdout Sample) | JMP
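
For readers who want to see the idea behind a random validation column outside the JMP menus, here is a minimal Python sketch (not JSL and not anything JMP runs internally). It assigns each row a label of 0, 1, or 2, which I am assuming stands for training, validation, and test in that order. The function name make_validation_column, the 60/20/20 default proportions, and the seed are all illustrative choices of mine, not JMP defaults.

```python
import random


def make_validation_column(n_rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return a shuffled list of labels, one per row.

    Assumed coding (illustrative): 0 = training, 1 = validation, 2 = test.
    `fractions` gives the share of rows for each of the three sets.
    """
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be three values that sum to 1")

    # Fix the set sizes first so the split proportions are exact,
    # then give whatever rows remain to the test set.
    n_train = min(round(n_rows * fractions[0]), n_rows)
    n_valid = min(round(n_rows * fractions[1]), n_rows - n_train)
    n_test = n_rows - n_train - n_valid

    labels = [0] * n_train + [1] * n_valid + [2] * n_test

    # Shuffle with a seeded generator so the split is reproducible.
    random.Random(seed).shuffle(labels)
    return labels


if __name__ == "__main__":
    column = make_validation_column(20)
    print(column)
```

The resulting list could be pasted into a new column of the data table. For a two-level split, pass a zero test share, for example fractions=(0.75, 0.25, 0.0).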

 

HTH

Bill

 

 

P_Bartell
Level VIII

Re: data splitting

Just to add a bit to both my initial reply and @Bill_Worley's: I'm not sure if you are aware, but JMP also supports other cross-validation methods, such as k-fold and leave-one-out, among others. These methods are most commonly set up within the analysis platform's launch/specification workflow rather than through specific columns or train/validate/test designations for observations. So depending on the modeling method, the practical problem at hand, and the data, one of these methods might be useful as well. Like so much in statistics: 'it depends'.
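
To make the k-fold idea concrete, here is a rough Python sketch of how rows get partitioned. It only illustrates the general technique and says nothing about how any JMP platform implements it. The function name k_fold_splits and its parameters are hypothetical. Setting k equal to the number of rows gives leave-one-out.

```python
import random


def k_fold_splits(n_rows, k=5, seed=0):
    """Yield (training_rows, held_out_rows) index lists for each of k folds.

    Every row is held out exactly once; k == n_rows is leave-one-out.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")

    # Shuffle the row indices once so fold membership is random
    # but reproducible for a given seed.
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled rows into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    for i, held_out in enumerate(folds):
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield training, held_out


if __name__ == "__main__":
    for fold_number, (training, held_out) in enumerate(k_fold_splits(10, k=5), 1):
        print(fold_number, sorted(held_out))
```

In each pass the model is fit on the training rows and scored on the held-out rows, and the k scores are then combined. Unlike a fixed train/validation/test column, no row is permanently set aside.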