Working with data

A data analysis has real value only if we pay attention to three basic areas. Keeping all three in view is necessary for processing data correctly.

The first area is the questions we want answered.


In essence, we hold a conversation with the data: we ask it questions and it gives us answers. A person being interviewed can answer a question only if they have the right information. Likewise, a data set can answer only if it contains the right records and variables. We therefore need to consider carefully which questions we want answered before we start collecting data. In other words, we work backwards.


First, we list the statements, supported by real data, that we want to obtain. Then we decide which records and variables we need to collect and analyze to produce each statement or output.
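
Working backwards can be sketched in a few lines of Python. This is only an illustration: the questions, the variable names and the list of available columns are all hypothetical and would have to be replaced with those of the actual system.

```python
# Hypothetical questions and the variables each one needs (assumptions).
REQUIRED_VARIABLES = {
    "average length of stay per month": {"arrival_date", "departure_date"},
    "share of bookings by country": {"booking_id", "country"},
}


def missing_variables(available_columns):
    """For each question, list the needed variables the data set lacks."""
    available = set(available_columns)
    return {
        question: sorted(needed - available)
        for question, needed in REQUIRED_VARIABLES.items()
        if needed - available
    }


# Hypothetical export that lacks the "country" column.
gaps = missing_variables(["booking_id", "arrival_date", "departure_date"])
print(gaps)
```

A question that appears in the result cannot be answered from the data as it stands, so the missing variables have to be requested before the analysis starts.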


In summary, it is always better to request and acquire all the data available in the system (database). A partial extract that provides information for only one particular output may raise further questions, and we will then be forced to request and collect additional data.

The second area is that the data is raw and needs to be cleaned.

This is usually the largest and most demanding part of the job. A simple example shows what is involved. Setting up a client directory in a hotel booking system usually includes filling in a "form of address" field (titles are left aside here). The field typically contains the values "(not filled in)", "Mr." and "Mrs.". If the form has no drop-down menu, we depend on how clients fill it in. Starting from the full form of address, we quickly get to abbreviations and typos, so variants such as "p.", "P", "pn", the English "Mr." or "M" appear. First of all, therefore, we must standardize data acquisition: each item of data gets a single permitted way of being entered. Existing data does not have to be modified to introduce this form.
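
A minimal sketch of such a standardization in Python follows. The raw variants are the ones listed above; the canonical label "Mr." and the decision to return None for unrecognized entries are assumptions made for illustration.

```python
# Raw variants taken from the text; the canonical label is an assumption.
CANONICAL_FORMS = {
    "p.": "Mr.",
    "p": "Mr.",
    "pn": "Mr.",
    "mr.": "Mr.",
    "mr": "Mr.",
    "m": "Mr.",
}


def normalize_form_of_address(raw):
    """Map a raw entry to its canonical form.

    Returns "(not filled in)" for empty entries and None for entries
    that are not recognized, so that they can be reviewed by hand.
    """
    cleaned = (raw or "").strip().lower()
    if not cleaned:
        return "(not filled in)"
    return CANONICAL_FORMS.get(cleaned)


print(normalize_form_of_address(" P "))
print(normalize_form_of_address("pn"))
print(normalize_form_of_address("xyz"))
```

Returning None instead of guessing keeps unrecognized entries visible rather than silently assigning them to a category.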


To find out quickly how inconsistent the database is, we can export it to a spreadsheet. For a simple field such as the form of address, the result is visible immediately.
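
The same check can be done on the exported rows with a short script instead of a spreadsheet. The sketch below assumes the export has already been loaded as a list of dictionaries; the field name "form_of_address" and the sample rows are hypothetical.

```python
from collections import Counter


def count_variants(records, field):
    """Count how often each distinct raw value of `field` occurs."""
    return Counter((record.get(field) or "").strip() for record in records)


# Hypothetical rows as they might look after export.
clients = [
    {"client_id": 1, "form_of_address": "p."},
    {"client_id": 2, "form_of_address": "P"},
    {"client_id": 3, "form_of_address": "Mr."},
    {"client_id": 4, "form_of_address": "p."},
    {"client_id": 5, "form_of_address": ""},
]

variants = count_variants(clients, "form_of_address")
for value, count in variants.most_common():
    print(repr(value), count)
```

The number of distinct values gives a quick impression of how many spellings exist for what should be only a few options.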


For data that is transferred automatically from the online booking system to the hotel directory, it is a good idea to use drop-down menus from which the client selects one of the given options. For names and addresses, it is advisable to add a hint describing the form in which the entry should be written, e.g. the first name and surname exactly as written in the travel document or ID card (OP), without abbreviating middle and other names; the same applies to the address. If a piece of information is not known, the client should choose "not known" or not fill in the field at all. This prevents duplicated and multiplied client cards and other misinformation. If any stored data does not match the data provided at check-in, it must always be corrected.
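
Where duplicates already exist, a rough first pass at finding them might look like the sketch below. It assumes that two cards belong to the same client when name and address match after lower-casing and collapsing whitespace, which is a deliberately simple rule; the field names and sample cards are hypothetical.

```python
from collections import defaultdict


def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def find_duplicate_cards(cards):
    """Group card ids whose normalized name and address are identical."""
    groups = defaultdict(list)
    for card in cards:
        key = (normalize(card["name"]), normalize(card["address"]))
        groups[key].append(card["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical client cards; cards 1 and 2 differ only in case and spacing.
cards = [
    {"id": 1, "name": "Jan Novak", "address": "Main Street 1"},
    {"id": 2, "name": "jan  novak ", "address": "main street 1"},
    {"id": 3, "name": "Eva Novak", "address": "Main Street 1"},
]

duplicates = find_duplicate_cards(cards)
print(duplicates)
```

Such a rule will miss abbreviated or misspelled names, which is one more reason to prescribe the exact form of the entry in the booking form itself.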

The third area is that data may have unknown issues.

Before processing, we therefore need to make sure that no new code was introduced during data entry that is not listed in the directive, that the system operator did not fill in the "unknown" field with some other data "just to have something there", that there was no functional change in the system itself, and so on. For these reasons, we must first look at the result of the analysis with common sense: "Does it make sense? Does this conclusion seem plausible?"
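
The check for codes that are not listed in the directive can be partly automated. The following is a minimal sketch in which the field name "room_type", the set of permitted codes and the sample bookings are all hypothetical; it does not replace the common-sense review of the result.

```python
def find_unlisted_codes(records, field, allowed_codes):
    """Count the values of `field` that are not among `allowed_codes`."""
    unlisted = {}
    for record in records:
        value = record.get(field)
        if value not in allowed_codes:
            unlisted[value] = unlisted.get(value, 0) + 1
    return unlisted


# Hypothetical list of codes permitted by the directive.
ALLOWED_ROOM_TYPES = {"SGL", "DBL", "unknown"}

# Hypothetical bookings; "XX" and "n/a" are not permitted codes.
bookings = [
    {"room_type": "SGL"},
    {"room_type": "DBL"},
    {"room_type": "XX"},
    {"room_type": "n/a"},
    {"room_type": "XX"},
]

unlisted = find_unlisted_codes(bookings, "room_type", ALLOWED_ROOM_TYPES)
print(unlisted)
```

Any value reported here should be traced back to its origin before the analysis continues, since it may be a new code, a typing error or a filler entry.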