Sister blog of Physicists of the Caribbean in which I babble about non-astronomy stuff, because everyone needs a hobby

Wednesday 30 May 2018

The importance of visualising data and user interfaces

Amen to this.

When tools to produce refined-looking graphics are only accessible to, or usable by, professionals and/or when expert-to-public translation leads to inaccuracies, a fear of elegant-looking graphics, and a concomitant exploratory-explanatory divide is understandable. The divide often leaves behind the aforementioned idea that exceptionally fancy graphics, and the time invested to make them, are only for public consumption, and not really useful for serious scientists or others pursuing deep quantitative analysis.

As the “democratization of data” continues, more and more services are making data behind both scholarly journal figures and public outreach graphics freely accessible. These open data sets represent a wealth of new information that researchers can combine with more traditional data acquisitions in their inquiries. If it’s quick and easy to get the data behind explanatory graphics, scientists will use those data, and learn more.

To generalise on that, take a particularly obvious lesson from nudge theory : software needs to be easy to use if people are going to use it. Visualisation is innately fun, but installing software is not. Interface design really matters - I'm far more likely to experiment with something if the basics just involve pressing a button. If I have to pause to write even ten lines of code, well, I'm not going to do that. Case in point : HI source extraction. If the only software available doesn't let you record detected galaxies very easily, you might go away thinking that human-based detection is very difficult. In reality it is not; it's simply the lack of a sensible recording interface that makes it tedious. Humans are great at this, but they get bored by having to write down long numbers. Visualisation software should make it as easy as possible for humans to do what they're good at and ease the burden of the less interesting tasks.

This problem is particularly acute in fields where everyone writes their own code. Also, on a related point, visualisation software should have a freakin' GUI. No, I don't want to have to type commands to generate a plot, that's just plain silly. Code should be used to manipulate data, and not - wherever possible - be used to visualise it. Major caveat : it should always be possible to access the underlying code for experimentation with non-standard, custom techniques. Modern versions of Blender make it very easy to access the appropriate commands to control each module, thus giving the best of both worlds.

Sometimes, even though tools permit easy explanatory-exploratory travel, sociology or culture prohibits it. By way of a very simple example, consider color. To a physicist portraying temperature, the color blue encodes “hot,” since bluer photons have higher energy, but in popular Western culture, blue is used to mean cold. So, a figure colored correctly for a physicist will not necessarily work for public outreach. Still, though, a physicist’s figure produced in an exploratory system like the one portrayed in Figure 2 would work fine as an explanatory graphic for other physicists reading a scholarly report on the new findings.

It might be interesting to have some app/website that lets people play with the raw data behind images to compile them themselves. After a while, you start to lose the bias against thinking that what you can see with your eyes is an especially privileged view of the Universe, e.g. http://www.rhysy.net/the-hydrogen-sky.html

In 2006, no one quite knew what a “data scientist” was, but today, those words describe one of the most in-demand, high-paying, professions of the 21st century. Data volume is rising faster and faster, as is the diversity of data sets available – both in the commercial and academic sectors. Despite the rise of data science, though, today’s students are typically not trained–at any level of their education–in data visualization. Even the best graduate students in science at Harvard typically arrive completely naive about what visualization researchers have learned about how humans perceive graphical displays of information.

Over the past decade or so, more and more PhD students in science fields are taking computer science and data science courses. These courses often focus almost entirely on purely statistical approaches to data analysis, and they foster the idea that machine learning and AI are all that is needed for insight. They do not foster the ideas that one of the 20th century's greatest statisticians, John Tukey, put forward about visualization: 1) having the potential to give unanticipated insight to later be followed up with quantitative, statistical, analysis; or 2) that algorithms can make errors easily discovered and understood with visualization.

Exactly. It's true that human pattern recognition is fallible. However, it's at least equally true that statistical analyses can be fallible too. Having an objective procedure is not at all the same as being objectively correct. Working in concert, visualisation and statistical measurements are more than the sum of their parts. Finding a pattern suggests new ways to measure data, which in turn forces you to consider what it is you're actually measuring.
https://arxiv.org/abs/1805.11300
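
As a toy illustration of the point about fallible statistics (my own made-up example, not anything from the paper) : take points lying exactly on a parabola. The Pearson correlation coefficient is a perfectly objective procedure, and here it comes out as zero, which you might naively read as "no relationship". Even the crudest plot shows otherwise. The function names `pearson_r` and `text_plot` are just invented for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def text_plot(xs, ys):
    """Very crude scatter plot: one text row per distinct y value.

    Assumes xs is already sorted, so each column is one x position.
    """
    lines = []
    for level in sorted(set(ys), reverse=True):
        row = "".join("*" if y == level else " " for y in ys)
        lines.append(f"{level:3d} |{row}")
    return "\n".join(lines)

# Synthetic data: y depends on x completely, but not linearly.
xs = list(range(-5, 6))
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(f"Pearson r = {r:.3f}")
print(text_plot(xs, ys))
```

The statistic isn't wrong, it's just answering a narrower question (is there a *linear* trend ?) than the one you probably meant to ask. Seeing the shape is what tells you to go and measure something else.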

1 comment:

  1. "Having an objective procedure is not at all the same as being objectively correct." Amen. Added immediately to quotes.txt
