How to perform gene expression data analysis in JMP® Pro 17: Part 2 - Control an...

Valerie_Nedbal · Aug 8, 2023 10:30 AM

In the first part of this blog series, I introduced a workflow for the control and quality check of gene expression data in JMP Pro 17. Keep reading to see how to apply this step.

Control and Quality

Before doing any type of analysis, the samples need to be quality checked to ensure a coherent outcome.

Start with the Hierarchical Cluster platform to check whether samples of the same origin cluster together. The platform groups samples and/or genes into clusters based on the similarity of their expression patterns. If samples cluster according to their origin, we can be confident that the data give a reliable and accurate representation of the grouping variables (Tissue, Stage and Animal Type).

JMP 17 introduced a new computational approach to Ward's method, called Hybrid Ward. It is especially useful for omics data, where there may be tens or even hundreds of thousands of items to cluster.
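For readers who want to see what Ward's criterion does, here is a minimal Python sketch of classic (non-hybrid) Ward agglomerative clustering, assuming a small samples-by-genes table and Euclidean distances. It is not JMP's Hybrid Ward implementation, whose internals are not described here; the function names `ward_linkage` and `cut` and the toy expression values are hypothetical. At each step it merges the two clusters whose union increases the within-cluster variance the least, using the Lance-Williams update for Ward distances.

```python
import math


def _key(i, j):
    # Store each pair of cluster ids once, smallest id first.
    return (i, j) if i < j else (j, i)


def ward_linkage(points):
    """Classic agglomerative clustering with Ward's criterion (naive, O(n^3)).

    This is plain Ward's method, NOT JMP's Hybrid Ward. Returns merges as
    (cluster_a, cluster_b, distance, new_size). Original points have ids
    0..n-1; the k-th merge creates cluster id n + k.
    """
    n = len(points)
    size = {i: 1 for i in range(n)}
    # For singletons, the Ward distance is the Euclidean distance.
    dist = {(i, j): math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)}
    active = set(range(n))
    merges = []
    for new in range(n, 2 * n - 1):
        # Merge the pair whose union increases within-cluster variance least.
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        del dist[(a, b)]
        active -= {a, b}
        na, nb = size[a], size[b]
        for k in active:
            nk = size[k]
            dak = dist.pop(_key(a, k))
            dbk = dist.pop(_key(b, k))
            # Lance-Williams update for Ward; max() guards against rounding below 0.
            dist[(k, new)] = math.sqrt(max(0.0,
                ((na + nk) * dak ** 2 + (nb + nk) * dbk ** 2 - nk * d ** 2) / (na + nb + nk)))
        size[new] = na + nb
        active.add(new)
        merges.append((a, b, d, na + nb))
    return merges


def cut(merges, n, k):
    """Flat cluster labels (0..k-1) after applying the first n - k merges."""
    members = {i: [i] for i in range(n)}
    for step, (a, b, _, _) in enumerate(merges[: n - k]):
        members[n + step] = members.pop(a) + members.pop(b)
    labels = [0] * n
    for label, idx in enumerate(members.values()):
        for i in idx:
            labels[i] = label
    return labels


# Hypothetical toy data: 6 samples x 3 genes (not the blog's data set).
samples = [
    [1.0, 2.0, 0.5], [1.1, 2.1, 0.4], [0.9, 1.9, 0.6],
    [5.0, 0.2, 3.0], [5.2, 0.1, 3.1], [4.9, 0.3, 2.8],
]
print(cut(ward_linkage(samples), len(samples), 2))
```

Because this naive version searches every remaining pair at each step, its cost grows roughly with the cube of the number of items, which illustrates why faster approaches matter when there are tens of thousands of items to cluster.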

[Figure: Dendrogram of the samples from the Hierarchical Cluster platform]

Each line represents one sample, and the samples are color coded by Cell/Tissue Stage. The samples form homogeneous clusters, except for a few from the Selected and Differentiated Cell/Tissue Stages.
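One simple way to put a number on how well the clusters match the groups is cluster purity: cut the dendrogram into k clusters and count how many samples share their cluster's majority label. The sketch below is an illustration, not a JMP feature; `cluster_purity` is a hypothetical helper, and the cluster assignments and stage labels are toy values chosen to mirror the pattern described above, where Selected and Differentiated samples mix.

```python
from collections import Counter


def cluster_purity(clusters, labels):
    """Share of samples whose label equals the majority label of their cluster."""
    groups = {}
    for c, lab in zip(clusters, labels):
        groups.setdefault(c, []).append(lab)
    # For each cluster, count the samples carrying its most common label.
    majority = sum(Counter(labs).most_common(1)[0][1] for labs in groups.values())
    return majority / len(labels)


# Hypothetical toy values: flat clusters from cutting a dendrogram, and stage labels.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
stages = ["Luteinized"] * 4 + ["Selected", "Differentiated", "Selected", "Differentiated"]
print(cluster_purity(clusters, stages))  # 0.75

# Treating Selected and Differentiated as one group removes the mismatch.
merged = ["Selected/Differentiated" if s != "Luteinized" else s for s in stages]
print(cluster_purity(clusters, merged))  # 1.0
```

A purity close to 1 means the clusters reproduce the grouping variable; a lower value flags groups, like the two stages here, that the clustering does not separate.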

The Multivariate Embedding platform uses a different method from Hierarchical Clustering to group samples. It maps very high dimensional data into a low dimensional space so that points that are near each other in the high dimensional space remain near each other in the low dimensional space. Typically, the low dimensional space has two or three dimensions so that it can be shown in a 2D or 3D graph. Here I used the t-SNE method to show how the data cluster.
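To show the idea behind t-SNE, here is a minimal pure-Python sketch, assuming Euclidean input distances, a fixed perplexity, plain gradient descent with early exaggeration, and a tiny hypothetical data set. It is not JMP's implementation, and the parameter values (`perplexity`, `lr`, `n_iter`, `exaggeration`) are assumptions tuned for this toy example. High dimensional neighbors get Gaussian affinities calibrated to the perplexity, and the 2D layout is moved until its heavy-tailed Student-t similarities match them.

```python
import math
import random


def conditional_row(d2_row, i, perplexity, tol=1e-5, max_iter=100):
    """Gaussian conditional probabilities p_{j|i} with the requested perplexity.

    Binary-searches the precision beta = 1 / (2 * sigma_i^2) so that the
    entropy of row i equals log(perplexity), measured in nats.
    """
    target = math.log(perplexity)
    d_min = min(d for j, d in enumerate(d2_row) if j != i)
    lo, hi, beta = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        # Subtracting d_min avoids underflow; it cancels in the normalization.
        w = [0.0 if j == i else math.exp(-beta * (d - d_min)) for j, d in enumerate(d2_row)]
        total = sum(w)
        p = [x / total for x in w]
        h = -sum(x * math.log(x) for x in p if x > 0)
        if abs(h - target) < tol:
            break
        if h > target:  # distribution too flat: raise the precision
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:  # distribution too peaked: lower the precision
            hi = beta
            beta = (beta + lo) / 2
    return p


def tsne(X, dims=2, perplexity=2.0, n_iter=500, lr=0.2, exaggeration=4.0, seed=0):
    """Minimal exact t-SNE with plain gradient descent (toy sizes only)."""
    n = len(X)
    d2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    cond = [conditional_row(d2[i], i, perplexity) for i in range(n)]
    # Symmetrize so that the joint affinities p_ij sum to 1.
    P = [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    Y = [[rng.gauss(0.0, 1e-2) for _ in range(dims)] for _ in range(n)]
    for it in range(n_iter):
        ex = exaggeration if it < 100 else 1.0  # early exaggeration phase
        # Student-t kernel in the embedding: 1 / (1 + ||y_i - y_j||^2).
        num = [[0.0 if i == j else 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
                for j in range(n)] for i in range(n)]
        z = sum(map(sum, num))
        grad = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    # dC/dy_i = 4 * sum_j (p_ij - q_ij) * num_ij * (y_i - y_j), q_ij = num_ij / z
                    coef = 4.0 * (ex * P[i][j] - num[i][j] / z) * num[i][j]
                    for k in range(dims):
                        grad[i][k] += coef * (Y[i][k] - Y[j][k])
        Y = [[y - lr * g for y, g in zip(Y[i], grad[i])] for i in range(n)]
    return Y


# Hypothetical toy data: two groups of 4 samples x 4 genes.
X = [
    [0.0, 0.1, 0.2, 0.0], [0.2, 0.0, 0.1, 0.1], [0.1, 0.2, 0.0, 0.2], [0.3, 0.1, 0.1, 0.0],
    [5.0, 5.1, 4.9, 5.0], [5.2, 4.9, 5.0, 5.1], [4.9, 5.0, 5.2, 4.8], [5.1, 5.2, 5.0, 4.9],
]
for row in tsne(X):
    print([round(v, 2) for v in row])
```

This version computes all pairwise terms on every iteration and has no momentum or adaptive step sizes, so it only suits a handful of points; for real omics data an optimized implementation such as the one in the platform is the practical choice.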

[Figure: t-SNE plot of the samples from the Multivariate Embedding platform]

In this graph, each circle (Cows) or triangle (Heifers) represents one sample. The samples form homogeneous clusters, particularly by Cell Type, Granulosa versus Theca (separated by the dashed red line), and by Cell/Tissue Stage, Luteinized versus Selected/Differentiated. Cows and Heifers do not cluster separately. As we will most likely see, there is no significant difference in gene expression between Cows and Heifers, nor between the Selected and Differentiated tissue stages. Knowing this, it makes sense to compare the differential expression of genes either between Tissue Types or between Tissue Stages.
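The visual reading of the t-SNE plot can be complemented with a silhouette score, which compares each sample's average distance to its own group with its average distance to the nearest other group. The following Python sketch is an illustration rather than part of the JMP workflow: `silhouette` is a hypothetical helper, and the 2D coordinates and labels are toy values chosen so that one grouping (cell type) separates and the other (animal type) does not.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width: near 1 = well separated groups, near 0 or below = overlap."""
    n = len(points)
    scores = []
    for i in range(n):
        same = [math.dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # common convention for a single-member group
            continue
        a = sum(same) / len(same)  # mean distance to the sample's own group
        # Mean distance to the closest other group.
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n


# Hypothetical 2D embedding coordinates and labels (not the blog's data).
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
cell_type = ["Granulosa"] * 4 + ["Theca"] * 4
animal = ["Cow", "Heifer"] * 4
print(round(silhouette(coords, cell_type), 2))
print(round(silhouette(coords, animal), 2))
```

In this toy example the cell-type grouping scores roughly 0.89, while the animal-type grouping scores slightly below zero, the pattern expected when a grouping variable does not drive the clustering.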

In the next blog, I will show how to compare groups using the Response Screening and Fit Y by X platforms.