Data visualization helps engineers “see” more

As single-cell technologies continue to improve, it has become possible to measure multiple parameters simultaneously at the cellular and even subcellular level. Flow cytometry, for example, can measure hundreds of properties for each cell, including features related to its shape, size, and protein expression levels. This new information has enabled the discovery of behaviors that could not be seen with population-level measures such as Western blotting.

Example of data visualization (Source: http://www.flickr.com/photos/luc/5418037955/)

Along with this new information, however, comes a challenge. With multiple dimensions of data, it is difficult to perceive and properly interpret the message presented. Because of this, much of the information gained from single-cell resolution can be obscured or completely lost depending on the way the data are presented.

This perception problem creates a need for dimension reduction: a method that transforms multi-dimensional data into a dataset with only two or three dimensions. One such technique, “t-SNE” (t-distributed stochastic neighbor embedding), was developed to visualize multi-dimensional data so that groups of similar data points appear as clusters. t-SNE first measures how similar each pair of data points is in the original high-dimensional space. It then places the points in a low-dimensional map and adjusts their positions to minimize the mismatch between the similarities in the map and those in the original data, so that points that were similar stay close together. For visualization, the data are shown with three visual dimensions: two arbitrary axes, plus color for visual separation of clusters. This remapping allows visual identification of differences between data points that would be nearly impossible to show with the data in its original form.
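To make the idea concrete, here is a minimal sketch of a t-SNE-style optimization in plain Python. It is not the published algorithm in full and not any group's actual pipeline. It assumes a single fixed Gaussian bandwidth (`sigma`) instead of the per-point bandwidths that t-SNE normally tunes from a "perplexity" setting, and it uses plain gradient descent. The function names, parameter values, and the toy two-group dataset are all made up for illustration.

```python
import math
import random


def squared_distances(points):
    """Pairwise squared Euclidean distances as a nested list."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def normalize(weights):
    """Scale a weight matrix so that all entries sum to 1."""
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


def high_dim_similarities(points, sigma=1.0):
    """Gaussian similarities p_ij in the original space.
    Assumption: one fixed bandwidth `sigma` for every point."""
    d2 = squared_distances(points)
    n = len(points)
    return normalize([[0.0 if i == j else math.exp(-d2[i][j] / (2 * sigma ** 2))
                       for j in range(n)] for i in range(n)])


def student_t_weights(embedding):
    """Heavy-tailed weights 1 / (1 + distance^2) between 2-D map points."""
    d2 = squared_distances(embedding)
    n = len(embedding)
    return [[0.0 if i == j else 1.0 / (1.0 + d2[i][j]) for j in range(n)]
            for i in range(n)]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q): the mismatch between the two similarity tables."""
    return sum(pij * math.log((pij + eps) / (qij + eps))
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0.0)


def embed_2d(points, n_iter=400, learning_rate=0.5, seed=0):
    """Move 2-D map points downhill on the KL mismatch (illustrative settings)."""
    rng = random.Random(seed)
    n = len(points)
    p = high_dim_similarities(points)
    # Start from a tiny random cloud around the origin.
    y = [[rng.gauss(0.0, 0.01), rng.gauss(0.0, 0.01)] for _ in range(n)]
    history = []
    for _ in range(n_iter):
        w = student_t_weights(y)
        q = normalize(w)
        history.append(kl_divergence(p, q))
        grad = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # Pairs that are too far apart in the map attract; too close repel.
                c = 4.0 * (p[i][j] - q[i][j]) * w[i][j]
                grad[i][0] += c * (y[i][0] - y[j][0])
                grad[i][1] += c * (y[i][1] - y[j][1])
        y = [[yi[0] - learning_rate * g[0], yi[1] - learning_rate * g[1]]
             for yi, g in zip(y, grad)]
    return y, history


# Toy data (made up): two groups of six "cells", five measurements each.
data_rng = random.Random(1)
group_a = [[data_rng.gauss(0.0, 0.2) for _ in range(5)] for _ in range(6)]
group_b = [[data_rng.gauss(5.0, 0.2) for _ in range(5)] for _ in range(6)]
embedding, kl_history = embed_2d(group_a + group_b)
```

Each entry of `embedding` is a pair of coordinates that could be drawn on two arbitrary axes, with color added afterward to mark groups or a measured property. For real datasets, an established implementation such as `sklearn.manifold.TSNE` in scikit-learn is a more practical choice than this sketch.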

One current application of t-SNE, a joint effort by groups at Stanford and Columbia University, uses this data visualization technique to show the heterogeneity in leukemia, and even the tumor subtypes the researchers believe are responsible for relapse. The team took bone marrow samples from healthy individuals and from individuals with leukemia, and compared these samples using flow cytometry.

With flow cytometry, they were able to study the expression of 29 different proteins, as well as the morphologic features of each cell. After processing the data using t-SNE, the group was able to not only distinguish the healthy cells from the cancerous cells, but also identify the protein expression profiles that were associated with relapse in these patients.

Data visualization is an increasingly important area of research as the amount of information gained from each experiment continues to grow. t-SNE is only one of many techniques that aim to improve the perception of high-dimensional data and maximize its impact. This highlights the need for researchers not only to design well-planned experiments, but also to be creative in presenting their data. Creativity, combined with interesting data, will allow for a more thorough presentation of information and ultimately foster a better understanding of many areas of biological research today, from genomic data to single-cell technologies.

Jacob Sarneki is a second-year PhD student in Dr. Wirtz’s lab working on quantification of signal transduction at single-cell resolution.