gchesterton
Level IV

I have lingering confusion about blocking variables in my design and analysis.

Hi all, I'm using JMP Pro 15.2.1.

I have lingering confusion about blocking in a designed experiment. In particular, whether I am indeed blocking at all.

In my experiment, I have three 2-level (on/off) factors of interest. The response variable is the time to perform a task. I want to do a 2x2x2 factorial, and I'm interested in testing the significance of main effects and 2-way interactions. I have 24 runs available, i.e., 3 replicates of the 8 treatments.

Suppose further that I have 6 different operators performing the runs. I am not interested in the operator effect, but I want to account for this source of variation. Complicating matters, due to a constraint I can't control, each operator will "show up" and be assigned a different number of runs: anywhere from 2 to 6 runs of the randomized 24-run table.

Suppose I conducted the experiment in this manner:

  • Operator 1 showed up, available to perform, say, 6 runs; I assigned 6 random treatments from the 24-run table to operator 1.
  • Operator 2 showed up, available to perform, say, 4 runs; I assigned 4 random treatments from the remaining 18 runs to operator 2.
  • Repeat with the remaining operators until the 24 runs are complete (a rough code sketch of this assignment scheme follows below).
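
For concreteness, here is a minimal JSL sketch of that layout: build the 2^3 factorial with 3 replicates (24 runs), then hand each arriving operator a random set of the still-unassigned runs. All column names, level labels, and the per-operator run counts are illustrative assumptions, not my actual table.

// Build the 2x2x2 factorial with 3 replicates (24 runs).
dt = New Table( "Factorial 24 runs",
    New Column( "A", Character, "Nominal" ),
    New Column( "B", Character, "Nominal" ),
    New Column( "C", Character, "Nominal" ),
    New Column( "Operator", Character, "Nominal" ),
    New Column( "Time", Numeric, "Continuous" )   // response: time to perform the task
);
levels = {"Off", "On"};
For( rep = 1, rep <= 3, rep++,
    For( i = 1, i <= 2, i++,
        For( j = 1, j <= 2, j++,
            For( k = 1, k <= 2, k++,
                dt << Add Rows( 1 );
                r = N Rows( dt );
                dt:A[r] = levels[i];
                dt:B[r] = levels[j];
                dt:C[r] = levels[k];
            )
        )
    )
);

// A random permutation of run numbers 1..24.
order = Rank( J( 24, 1, Random Uniform() ) );

// As each operator shows up able to do k runs, give them the next k
// still-unassigned runs from the random order. The counts are placeholders
// and must sum to 24.
counts = {6, 4, 5, 3, 4, 2};
next = 1;
For( op = 1, op <= N Items( counts ), op++,
    For( c = 1, c <= counts[op], c++,
        dt:Operator[order[next]] = "Op " || Char( op );
        next++;
    )
);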

So, this is not exactly a blocked design. The Operator was not a constraint on randomization. It's more like a categorical covariate. The operators differ in skill, presumably, but I don't have any quantitative proxy measure for their skill (such as years of experience or a baseline performance test).

When I conduct the analysis, do I simply include the Operator term (as a random effect, since the six operators are a sample from the population) to account for this source of variation? Do I assign a design role (using JMP column properties) to the Operator column? If so, do I assign it a blocking role or a covariate role, or is that inappropriate given how I conducted the experiment? Or do I just enter the Operator factor in the model like any other main effect and disregard its significance, since I'm not interested in it per se?

Should I be looking at a balanced incomplete block design (BIBD)? The JMP module for that did not seem to fit my constraints.

Thanks all.

 

5 REPLIES

Re: I have lingering confusion about blocking variables in my design and analysis.

I would ignore the Operator when you design the experiment. Treat it like a covariate as you suggested, but add it later. I would add Operator as a new Nominal data column after you click Make Table. Use this column to record the operator for each run. Add this column as a term in the model and add the Random Effect attribute to this term.

This won't be optimal, but it is pragmatic, and it will work. This approach allows you to focus on the design of the main experiment but still account for the added variation across operators.
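
To make that concrete, here is a minimal JSL sketch of the resulting model, assuming the factors are named A, B, and C, the response is Time, and the operator is recorded in a nominal Operator column (all names are placeholders):

dt = Current Data Table();

fit = dt << Fit Model(
    Y( :Time ),
    Effects( :A, :B, :C, :A * :B, :A * :C, :B * :C ),  // main effects and 2-way interactions
    Random Effects( :Operator ),                       // Operator carries the Random Effect attribute
    Personality( "Standard Least Squares" ),
    Method( "REML" ),                                   // REML estimates the operator variance component
    Run
);

The REML report then shows a variance component for Operator alongside the residual, which is where the operator-to-operator variation is accounted for.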

Re: I have lingering confusion about blocking variables in my design and analysis.

Thanks @Mark_Bailey. I guess I was on the right track. Does giving the Operator a design role make any difference when I add it as a random-effect term in the model?

Re: I have lingering confusion about blocking variables in my design and analysis.

The column properties for DOE factors won't help with the Operator covariate in your case.

statman
Super User

Re: I have lingering confusion about blocking variables in my design and analysis.

Here are my thoughts:

1. I'm confused by your statement "I am not interested in the operator effect, but I want to account for this source of variation." While I understand you cannot control operator (noise), I would think you want to understand the operator effect and, more importantly, whether the effects of the factors in your study are consistent across changing operators.

2. Perhaps too late, but I'm not sure why you can't plan to include the operator effect in the study (e.g., only use certain operators that meet your criteria when they "show up" in the experiment). It may be inconvenient to run and may take more time, but you may get "better" information.

3. Do you have hypotheses as to why operators might influence the mean or variation of the response variable? What specifically do operators do differently, or what varies between operators? If you can hypothesize more specifically about what is actually changing (e.g., experience, technique, dexterity), you might have a more efficient means of including the effect in the experiment. If so, you can use blocking or cross-product arrays effectively.

4. Something to keep in mind: while randomization is an excellent strategy to minimize bias, increase inference, and quantitatively estimate error, randomized errors are unassignable.

5. For your situation, somewhat along the lines of Mark's suggestion: create a column called Operator and code the different operators. Start with practical and graphical analysis (vs. ANOVA). When you plot the data, use Rows > Color or Mark by Column and choose the Operator column to look for interesting patterns (a sketch follows below). Realize the operator effects may be collinear with treatment effects.
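
A minimal JSL sketch of that graphical first pass, assuming the response is Time, one of the factors is A, and the new column is Operator (all names are placeholders):

dt = Current Data Table();

dt << Color by Column( :Operator );   // Rows > Color or Mark by Column

dt << Variability Chart(              // response by treatment factor, split by operator
    Y( :Time ),
    X( :A, :Operator )
);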

"All models are wrong, some are useful" G.E.P. Box

Re: I have lingering confusion about blocking variables in my design and analysis.

A few responses, but I'll clarify that in our experiment we actually had teams of five operators. Operators are nested within the teams; they don't mix across teams. I included a 'team' term in the model, since each mission was conducted by a team.

1. Yes, I want to know whether the factor effects (if any) hold across changing teams of operators. But team is a random-effect nuisance factor. My interest is solely in claiming that a factor effect observed in the experiment generalizes to the theoretical population of teams.

2. Operator screening like this is not an option at this time. However, see #5 below for other considerations.

3. You can guess the sorts of factors that make some gamers better or worse than others. But those skills don't become apparent until after they have executed a few runs, and as answered in #2, we don't have the luxury of advance screening of their performance. It would be reasonable, though, to elicit from participants during sign-up their gaming experience in terms that could be quantified as a covariate -- years of gaming, etc. More thoughts in #5 below.

4. No comment.

5. We already did track each operator (or, more specifically, the teams, which are groups of 5 operators). If we had kept each participant throughout the experiment (we didn't), we could have mixed operators across teams to reduce the effect of poor individual performance within a team. Importantly, we observe greater within-team variance in the response variable than across-team variance, so including a 'team' term in the model does little to manage error. There's no correlation between the 'team' factor and any other factor, which is good. There's also little evidence that a team's score (the response variable) improved during later runs (i.e., in their third and fourth runs), nor is greater variance associated with earlier runs. Nevertheless, I think rather than having 4 replicates with a ton of noise, we could have had 3 'runs' of training to ensure we had eliminated any learning effect and that our record run was optimal and more comparable with respect to team performance capabilities. Hard to say.
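
For what it's worth, the learning-effect check mentioned in #5 can be done with a quick plot. A minimal JSL sketch, assuming placeholder columns Score, Run Order, and Team:

dt = Current Data Table();

dt << Color by Column( :Team );       // distinguish teams in the plot

dt << Bivariate(                      // score vs. run order; a flat fit suggests little learning effect
    Y( :Score ),
    X( :Run Order ),
    Fit Line
);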