The other day I was registering on a website. Their sloppy data entry form got me thinking about what is really important if we want to derive information from data.
Among other data, the website wanted to know where I was from. First, there was a country drop-down list where I quickly chose my country, Slovenia. Then they wanted to know what region I was from. Again there was a drop-down list, but I couldn’t find my region on it. Surprisingly, Žalec, a small town, was highlighted as the default choice for the region. What was going on? This certainly piqued my interest, and further exploration was required.
I scrolled down and on closer inspection discovered that the drop-down list was actually composed of two lists, one after the other: the first representing administrative regions and the second representing geographical regions. The same place could therefore be entered as one of two different values, depending on whether the user chose the administrative or the geographical interpretation. It seemed as if someone had loaded the regions from two different sources without realizing that they overlapped. What a mess! What were they thinking?
Each of the two lists seemed to be sorted in alphabetical order, and that’s probably why Žalec came first: the sort wasn’t done according to the rules of the local language, so names starting with special characters showed up first. After examining the drop-down list and not finding my region (it wasn’t in the first half of the list, but I later found it in the second half), I was tempted to do what many other users probably do as well: give up on finding the correct choice and just leave the default value. And at some point in the future, when someone decides to analyze users by geographical region, they will come to the conclusion that the most active users from Slovenia are from Žalec.
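To illustrate the sorting problem, here is a minimal sketch in Python. I don't know what sort the website actually used, so this only shows the general effect: a naive sort compares raw character codes, while a sort that follows the Slovenian alphabet puts Č right after C and Ž right after Z. The alphabet string, the function name and the list of towns are my own assumptions for the example, not anything taken from the site.

```python
# Assumed for illustration: the 25 letters of the Slovenian alphabet, in order.
SLOVENIAN_ALPHABET = "abcčdefghijklmnoprsštuvzž"
RANK = {ch: i for i, ch in enumerate(SLOVENIAN_ALPHABET)}


def slovenian_key(name: str) -> list[int]:
    """Map each character to its position in the Slovenian alphabet.

    Characters outside the alphabet (spaces, q, w, x, y, ...) are ranked
    after all alphabet letters, ordered by their code point.
    """
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in name.lower()]


towns = ["Žalec", "Celje", "Domžale", "Črnomelj", "Zagorje"]

# Naive sort: compares Unicode code points, so Č and Ž land after plain Z.
naive = sorted(towns)

# Alphabet-aware sort: Č follows C, Ž follows Z.
collated = sorted(towns, key=slovenian_key)

print(naive)     # ['Celje', 'Domžale', 'Zagorje', 'Črnomelj', 'Žalec']
print(collated)  # ['Celje', 'Črnomelj', 'Domžale', 'Zagorje', 'Žalec']
```

A real application would more likely rely on the collation support of its database or on something like Python's locale.strxfrm, which depends on the right locale being installed; the hand-written key above is only meant to make the difference visible.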
So I wonder: what is the point of such random data entry? Who needs it, and why? It seems that not much thought was put into designing the user interface. If users can’t easily find their location in the drop-down list, they won’t enter it. And if the default value is not set to “unknown” but rather to an arbitrary value from the list, that arbitrary value is what will end up in the data.
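What I have in mind could look roughly like the following Python sketch. It merges the two source lists into one list without repeated names, puts an empty "nothing selected" entry first so that it becomes the default, and later counts unselected entries as missing rather than as a real region. Everything here is hypothetical: the function names, the empty-string sentinel and the sample region names are made up for the example, and removing identical names is only the simplest part of reconciling two overlapping sources.

```python
from collections import Counter

UNKNOWN = ""  # hypothetical sentinel meaning "nothing selected"


def build_region_options(*sources):
    """Merge region lists, drop duplicate names, put an empty placeholder first."""
    merged = sorted({name.strip() for source in sources for name in source})
    # Each option is a (stored value, displayed label) pair.
    return [(UNKNOWN, "Select your region")] + [(name, name) for name in merged]


def count_by_region(submitted_values):
    """Count users per region; unselected entries are counted under None."""
    return Counter(
        value if value != UNKNOWN else None for value in submitted_values
    )


# Illustrative names only, standing in for the two overlapping sources.
administrative = ["Savinjska", "Gorenjska"]
geographical = ["Gorenjska", "Primorska"]

options = build_region_options(administrative, geographical)
counts = count_by_region(["Gorenjska", "", "", "Primorska"])

print(options[0])    # ('', 'Select your region')
print(counts[None])  # 2
```

With a default like this, a user who gives up simply leaves the field empty, and the later analysis shows two users with an unknown region instead of two more users from whichever entry happened to be at the top of the list.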
When designing entry screens, each field should be considered for its purpose, then tested and verified to ensure that it delivers the required information, so that useful analyses can be performed in the future. No analysis software or tool will derive useful information from data that was poorly acquired in the first place.