How to Incorporate Text Analytics into the Voice of the Employee

Many users who want to incorporate text analytics into their Voice of the Employee (VoE) analysis are not sure how to translate their business requirements into something compatible with their current processes.

1.   Entity extraction

It's very useful to be able to automatically extract structured information from a text, such as names, topics, dates, places, or amounts of money.
This task is often known as Named Entity Recognition.
At first, it may seem that it only deals with finding names that appear in the text, but there's quite a bit more to it. Think of synonyms: there are many ways of referring to the same concept. If I talk about attrition, I can also call it "turnover," "talent flight," "job abandonment," "leaving the company," "going to another organization," "quitting," or several other things.
You can use dictionaries to extract the specific entities or concepts from your own domain.
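As a rough illustration of the dictionary idea, here is a minimal Python sketch that maps several surface forms to one canonical concept. The dictionary contents, the function names, and the sample comment are all made up for this example; a production system would rely on much richer linguistic resources.

```python
import re

# Hypothetical dictionary: canonical concept -> surface forms (synonyms).
CONCEPT_DICTIONARY = {
    "attrition": [
        "attrition", "turnover", "talent flight", "job abandonment",
        "leaving the company", "going to another organization", "quitting",
    ],
    "salary": ["salary", "pay", "wages", "compensation"],
}

def build_patterns(dictionary):
    """Compile one case-insensitive regex per concept, longest synonym first."""
    patterns = {}
    for concept, synonyms in dictionary.items():
        ordered = sorted(synonyms, key=len, reverse=True)
        alternation = "|".join(re.escape(s) for s in ordered)
        patterns[concept] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns

def extract_concepts(text, patterns):
    """Return (concept, matched text, start offset) tuples sorted by position."""
    hits = []
    for concept, pattern in patterns.items():
        for match in pattern.finditer(text):
            hits.append((concept, match.group(0), match.start()))
    return sorted(hits, key=lambda hit: hit[2])

PATTERNS = build_patterns(CONCEPT_DICTIONARY)
comment = "Turnover is high because the pay is low, so people keep quitting."
for concept, surface, offset in extract_concepts(comment, PATTERNS):
    print(f"{offset:3d}  {surface!r} -> {concept}")
```

Both "Turnover" and "quitting" resolve to the same concept, which is the point of the synonym dictionary: downstream counts are made per concept, not per wording.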
A real example taken from exit surveys
An exit interview is a survey taken by someone who is leaving an organization. The organization can use the information obtained from the exit interview to evaluate what it should improve, change, or maintain as is.
The analysis results provide valuable information that can help reduce turnover, optimize recruitment and hiring, reduce absences, improve innovation, maintain performance, and reduce possible conflicts.
Here are some examples taken from actual exit interviews. The names of people and companies have been anonymized. I'll talk about exit surveys in more depth in another part of this chapter.

2.   Classification or categorization

Classification consists of assigning a text to one or more categories within a predefined taxonomy, keeping the overall content of the text in mind. In general, it requires you to have previously trained or configured a classification model that is specific to the taxonomy that you want to use. Classification is used to identify the topic (or topics) addressed in the text as a whole.
Hence, a classification model contains the list of categories, as well as the necessary means to classify the documents in the defined classes. For example, a model can classify the reasons for leaving that the employees give in exit interviews.
The quality of the analysis is generally evaluated in terms of precision (the proportion of detected elements that are relevant) and recall, also called coverage (the proportion of relevant elements that are detected). Given a particular analysis technology, precision and recall are often antagonistic; an increase in one tends to decrease the other. For this reason, the key is to find the balance between the two that's optimal for the application.
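The two measures just described are commonly called precision and recall (coverage). The short Python sketch below shows how they could be computed for a single text; the label sets are invented for illustration, and only "Reward>Salary" is a category name taken from this chapter.

```python
def precision_recall(predicted, relevant):
    """Compute precision and recall (coverage) for one set of detections.

    predicted: set of items the system detected
    relevant:  set of items a human reviewer marked as correct
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: categories assigned to one exit-interview answer.
system_labels = {"Reward>Salary", "Career>Promotion", "Environment>Colleagues"}
gold_labels = {"Reward>Salary", "Career>Promotion",
               "Management>Supervisor", "Workload>Hours"}

p, r = precision_recall(system_labels, gold_labels)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the three detected labels are correct, and two of the four correct labels were found. A stricter system would typically detect fewer labels, which tends to raise the first measure and lower the second.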

Classification model for social media

This simplified model has been applied with great results to:
·       Private forums
·       Public forums
·       Internal social networks
·       Publicly accessible social networks or ones accessed by company users

Taxonomy: Categories for VoE

Policies&Practices>CompanyCommitment 35
Each category of the model (like "Reward>Salary") includes training documents that allow the algorithm to learn from text examples that genuinely belong to that category. The model also includes rules based on linguistic resources that improve the accuracy in the "Reward>Salary" category.
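To make the combination of training documents and rules more concrete, here is a small Python sketch. It is not the model described in this chapter: it assumes a multinomial Naive Bayes classifier for the learned part and lets a matching rule decide the category outright, which is only one of several ways to combine the two. The class name, the training sentences, the rule, and the "Environment>Colleagues" category are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class HybridClassifier:
    """Multinomial Naive Bayes trained on example texts, plus override rules."""

    def __init__(self, rules=None):
        self.rules = rules or {}              # category -> compiled regex
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocabulary = set()

    def train(self, examples):
        for text, category in examples:
            tokens = tokenize(text)
            self.word_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocabulary.update(tokens)

    def _log_score(self, tokens, category):
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[category] / total_docs)
        counts = self.word_counts[category]
        denominator = sum(counts.values()) + len(self.vocabulary)
        for token in tokens:
            score += math.log((counts[token] + 1) / denominator)  # Laplace smoothing
        return score

    def classify(self, text):
        # Assumption: a matching rule takes precedence over the learned model.
        for category, pattern in self.rules.items():
            if pattern.search(text):
                return category
        tokens = tokenize(text)
        return max(self.doc_counts, key=lambda c: self._log_score(tokens, c))

# Made-up training documents and one made-up rule.
TRAINING = [
    ("my salary was below the market rate", "Reward>Salary"),
    ("the pay rise never arrived", "Reward>Salary"),
    ("great colleagues and a friendly team", "Environment>Colleagues"),
    ("I enjoyed working with my team mates", "Environment>Colleagues"),
]
RULES = {"Reward>Salary": re.compile(r"\b(?:underpaid|salary|wages?)\b", re.IGNORECASE)}

clf = HybridClassifier(RULES)
clf.train(TRAINING)
print(clf.classify("I felt underpaid"))                       # decided by the rule
print(clf.classify("a friendly team with great colleagues"))  # decided by the learned model
```

The word "underpaid" never appears in the training documents, so the learned part alone has nothing to go on; the rule covers that gap.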
An example of classification from an intranet forum
The data source was a series of threads in an employee forum within the company's intranet. The insight extraction model was applied to demonstrate the model's capacity. Figure 1 shows the frequencies of the insights gained, in descending order.

Figure 1 - Insight extraction
The topics that appear most in VoE analysis are information technologies (within the people category) and technology (within the work category).

An example of classification for exit surveys and the causes of talent flight

The exit interviews from the last year were analyzed using the VoE analysis tool. The company regarded talent flight as having a very significant impact. The questions asked in these interviews revolve around:
·       Longevity in the company: Is the company unfriendly toward new arrivals? Or do people leave after several years because they don't see their career progressing as they had expected? Depending on which longevity segment the problem is in, the company can propose different actionable measures, like improving the onboarding plan for new arrivals, modifying the hiring criteria, or encouraging internal promotion to improve the career plans of veteran employees.
·       Department and projects: Sometimes the problem is located in particular projects or departments, or even with managers who provoke employee burnout. These cases are difficult to identify; the cycle of discontent that leads to job abandonment can take two to three years. In this case, you have to think about putting the employee under a different manager or taking other, more drastic measures to nip the discontent in the bud.
·       Training and benefits: Do training and benefits offered to employees really add to their job satisfaction? Are there really measures that can be taken to help make employees loyal? Are some of these measures a waste of resources?
·       Opinions: The answer to the question "Why have you decided to leave the company?" is the most relevant and most complicated area to analyze. It's an open field containing the causes that the employee states on their exit survey. Because these are open text fields with unstructured and often emotionally charged data, most HR departments don't analyze them appropriately, especially in large companies where departures and new hires are frequent. However, that's exactly where you'll find the most relevant information that can help reduce turnover.
Since it's anonymous, the only fields included in this survey are longevity in the company and positive and negative aspects of the company (both fields are an open response).
As a preliminary step, you apply unstructured information analysis technology that has been developed to fully automate the analysis of all the exit surveys provided, specifically the answers to the "positive aspects" or "strengths" questions and the "negative aspects" or "weaknesses" questions. This technology extracts multi-level labels that allow text to be classified along different dimensions or points of view. It systematically codifies what the employee is saying, the so-called "voice of the employee."

Classification model developed for talent flight

A classification model was developed specifically for talent flight. The model comprises a list of categories (taxonomy) and the resources necessary to classify the documents into the defined classes.
Taxonomy: Categories of Talent Flight
The specific model applied was the classification model developed for talent flight. It has two depth levels with seven main categories and seven sub-categories.
In this case, all the responses were in Spanish, but this technology allows for text analysis in other languages, or even mixed languages, an essential condition in companies with branches and customers in different countries.

Figure 2 - Diagram with the main reasons for job abandonment

Positive/Negative aspects analysis

Figure 3 shows a bar graph displaying the frequency of the terms mentioned in the surveys. You can see that the motives most frequently mentioned are professional improvement (56% of the surveys), better salary (47%), and new experiences (21%).

Figure 3 - Frequencies of the terms mentioned in the surveys
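Percentages like these are shares of surveys that mention each motive, so they can add up to more than 100% when one survey cites several motives. A minimal Python sketch of that calculation follows; the five sample surveys are invented and do not reproduce the figures from the study.

```python
from collections import Counter

def mention_shares(survey_labels):
    """Share of surveys that mention each motive at least once.

    survey_labels: list of sets, one set of motive labels per survey.
    Shares can sum to more than 1.0 because a survey may cite several motives.
    """
    counts = Counter()
    for labels in survey_labels:
        counts.update(set(labels))
    total = len(survey_labels)
    return [(motive, count / total) for motive, count in counts.most_common()]

# Made-up labels for five surveys (not the data behind the figure).
surveys = [
    {"professional improvement", "better salary"},
    {"professional improvement"},
    {"better salary", "new experiences"},
    {"professional improvement", "better salary"},
    {"professional improvement"},
]

for motive, share in mention_shares(surveys):
    print(f"{motive:<26}{share:.0%}")
```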

Segmentation by longevity

By dividing employees into longevity groups (those who've been employed with the company less than twelve months, one to three years, and more than three years), very valuable conclusions were drawn.
Employees with less than twelve months at the company (26%) mainly saw the working environment as the company's strength, while their complaints were mainly directed at the lack of internal communication.
Among those who had been with the company one to three years (32%), the strong point continued to be the working environment, along with positive relationships with colleagues. But in addition to the lack of internal communication, the negative factors were salary and growth opportunities.
Among the people with the most seniority in the company (42%), the results change radically. The strengths that stood out were their colleagues and internal communication, while the weak points were employee motivation, co-worker expertise, and salary (as a third factor).
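The segmentation described above can be sketched in a few lines of Python: assign each survey to a longevity group, then find the most frequent strength and weakness in each group. The records, field names, and exact month boundaries below are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

def tenure_segment(months):
    """Map months of service to three longevity groups (boundaries assumed)."""
    if months < 12:
        return "under 12 months"
    if months <= 36:
        return "1 to 3 years"
    return "over 3 years"

def top_aspects(records, field):
    """Most frequent label per longevity group for the given field."""
    by_segment = defaultdict(Counter)
    for record in records:
        by_segment[tenure_segment(record["months"])].update(record[field])
    return {segment: counts.most_common(1)[0][0]
            for segment, counts in by_segment.items() if counts}

# Invented, already-classified survey records.
records = [
    {"months": 5, "strengths": ["working environment"],
     "weaknesses": ["internal communication"]},
    {"months": 8, "strengths": ["working environment"],
     "weaknesses": ["internal communication", "salary"]},
    {"months": 20, "strengths": ["working environment", "colleagues"],
     "weaknesses": ["salary"]},
    {"months": 30, "strengths": ["working environment"],
     "weaknesses": ["salary", "growth opportunities"]},
    {"months": 48, "strengths": ["colleagues"],
     "weaknesses": ["employee motivation", "salary"]},
    {"months": 60, "strengths": ["colleagues", "internal communication"],
     "weaknesses": ["employee motivation"]},
]

for field in ("strengths", "weaknesses"):
    print(field, top_aspects(records, field))
```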

A few conclusions

This analysis objectively and tangibly proves how a single aspect (such as internal communication, specialized knowledge, or salary range) is perceived in a radically different way depending on the employee profile or how long they've been with the company.
As a result of this study, the company can improve the effectiveness of employee loyalty plans by focusing on improving internal communication and support for new hires, and technical training plans for those who have been with the company three or more years, all thanks to previously unobtainable actionable information.

3. Text classification and clustering

Text clustering: "grouping a set of texts that are more similar to each other than to the rest of the groups." In machine learning, clustering belongs to the category of unsupervised algorithms: it uses unlabeled data, and the goal is to explore that data to find a structure or a way to organize it. Unsupervised systems therefore have no previously classified examples available; instead, they use the properties of the examples themselves to form clusters according to similarity.
Classification or categorization is a form of supervised learning in which you work off of a previous classification and assign each document to one or more available classes.
A frequent use of text clustering consists of applying it to texts that you're classifying in order to identify new categories that can be added to the model. Clustering helps make discoveries that can enrich the taxonomy. Once again, you discover what Donald Rumsfeld noted: "what we don't know that we don't know."
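As a simple illustration of grouping unlabeled texts by similarity, the Python sketch below uses a single-pass ("leader") clustering with cosine similarity over word counts. This is a deliberately basic technique chosen for brevity, not the method used in the study; the stopword list, the threshold, and the comments are all made up.

```python
import re
from collections import Counter
from math import sqrt

STOPWORDS = {"the", "a", "is", "was", "to", "and", "of", "my", "i", "in",
             "for", "too", "at"}

def vectorize(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def leader_cluster(texts, threshold=0.25):
    """Single pass: join the most similar existing cluster, else start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [text, ...]}
    for text in texts:
        vector = vectorize(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vector), "members": [text]})
        else:
            best["centroid"].update(vector)  # summed term counts act as the centroid
            best["members"].append(text)
    return [cluster["members"] for cluster in clusters]

# Invented comments that no existing category covers.
comments = [
    "The commute to the office was too long",
    "Parking at the office is impossible",
    "Long commute and no remote work",
    "My manager never gave feedback",
    "Feedback from my manager was rare",
]

for number, members in enumerate(leader_cluster(comments), start=1):
    print(f"cluster {number}: {members}")
```

If a group such as the commute and office comments keeps appearing, it is a candidate for a new category in the taxonomy. Note that the result of a single-pass method depends on the order of the texts and on the threshold, so it is best treated as an exploratory aid.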

4. Sentiment analysis

Sentiment analysis is, in effect, a large-scale document classification task that's done automatically, according to the positive or negative connotations in each document.
In general terms, sentiment analysis attempts to determine a person's attitude regarding a topic or the general contextual polarity of a document. The polarity can be obtained on a global level from the complete text, or you can go deeper and see the polarity expressed in each of the phrases that make up the text, or even about a specific topic.
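The difference between global and sentence-level polarity can be shown with a toy lexicon-based scorer in Python. The lexicon, the negation handling, and the sample text are assumptions for illustration only; real sentiment analysis uses far richer resources, and topic-level polarity is not covered here.

```python
import re

# Tiny hypothetical polarity lexicon; a real system would use a far larger one.
LEXICON = {"great": 1, "good": 1, "friendly": 1, "supportive": 1,
           "poor": -1, "bad": -1, "low": -1, "unfair": -1, "stressful": -1}
NEGATORS = {"not", "no", "never"}

def score(text):
    """Sum word polarities, flipping a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        total += polarity
    return total

def label(value):
    return "positive" if value > 0 else "negative" if value < 0 else "neutral"

def analyze(text):
    """Return the global polarity and the polarity of each sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return label(score(text)), [(s, label(score(s))) for s in sentences]

answer = ("My colleagues were friendly and supportive. "
          "The salary was low. The schedule was not bad.")
overall, per_sentence = analyze(answer)
print("overall:", overall)
for sentence, polarity in per_sentence:
    print(f"  {polarity:<9}{sentence}")
```

The overall label hides the negative remark about salary, which only shows up at the sentence level. That is why going deeper than the global polarity is often worthwhile.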

Classification is likely the most frequently used task for understanding the voice of the employee, and sentiment analysis is itself a form of classification.