8/29/2017

Big Data and People Analytics


Nowadays we have access to more and more data about the tastes, opinions, and behaviors of employees. It's very useful for making decisions about performance, wages, turnover, leaves of absence, etc.
That's why we sometimes use "Big data for Human Resources" and "People Analytics" interchangeably.
But is this really big data? When can we say we've arrived at "Big?"

The BIG Confusion

I quite agree with Seth Grimes, Senior Consultant at Alta Plana Corporation, when he claims:
Big data has taken a beating in recent years, the accusation being that marketers and analysts have stretched and squeezed the term to cover a multitude of disparate problems, technologies, and products.
It's definitely a confusing term. The "big" in "big data" is just as inappropriate as using "new technologies" in the 21st century to refer to the internet and other digital environments. Are these technologies still new, after all these years?
So, I prefer to talk about just "data" and not "big data", just as you should use "technology" and never "new technologies".
There is strikingly little consensus on what the "big" means in the famous and oft-mentioned "big data". And I'd like to add my own two cents to the bank of confusion that's already out there.
Dan Ariely, the author of Predictably Irrational, grappled with this confusion when he said in 2011:
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…

What I found on Wikipedia was:
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them.
The bold formatting is mine. Because who's to say what "traditional software" is? Traditional when? Danger, slippery slope! What is traditional software? They want to refer to it vaguely, without daring to mention the relational databases that are often unable to manage "large" datasets.
In that same vein of going beyond "traditional" means, one author even says that big data starts when MS Excel doesn't have the capacity to manage that data.
The most frequently found definition is from 2001. Doug Laney associates the "big" with the volume, velocity, and variety of the datasets (the three V's). Many moons have passed since 2001, and what once counted as large volume, high velocity, and great variety no longer does. So we're still in the same boat.

Big data solved once and for all?

To resolve the situation once and for all, the Berkeley Department of Data Science published the enlightening post "What is big data?" on its blog datascience@berkeley. In their own words, this is how they decided to go about defining "big data":
"To settle the question once and for all, we asked more than 40 thought leaders in publishing, fashion, food, automobiles, medicine, marketing, and every industry in between how exactly they would define the phrase “big data.”"
At the risk of seeming petty but only to highlight how funny it is for a data scientist, allow me to point out that in the post they collected 43 definitions, not 40. I'm sure they just rounded down to 40, and that's fine. But it could be that the remaining three were added later, and someone forgot to update the number. Or perhaps 43 was too large a number to be processed by typically used software and, understandably, calculation errors occurred.
The definitions provided by those who were consulted have all the smells, colors, and flavors imaginable. So, it's anything but "settled once and for all." I think that, despite the large Volume and Variety of these definitions, there is more and more confusion.
To quote Roy Batty, the blond guy in Blade Runner, I've seen uses of 'big data' you wouldn't believe.

Furthermore, all these definitions based on typically used software clearly have very limited validity. For me, the "big" in big data is always small and outdated, so much so that if datascience@berkeley asked me, I would answer:
I'm glad you asked me that question. "Big data" seems like a confusing term. The "big" in "big data" is just as inappropriate as using "new technologies" in the 21st century to refer to the Internet and other digital environments. Are they still new, after all these years? "Big data" is now used to mean almost anything related to data, from collecting and storing it to the analytical process that extracts economic value from it. So I prefer to talk about just "data" and not "big data," in the same way that I talk about technology and never "new technologies."
And I say "prefer" because you'll actually see me use "big data" more often than I'd like.
Because, as my dear friend Gotzon Zaratiegi used to say accompanied by one of his contagious laughs,
"We all have our contradictions, my friend."

The unbearable task of defining

I don't like definitions, and I often distrust dictionaries and other reference materials. Every time I read or hear the beginnings of articles, books, or lectures with phrases like "Dictionary X (Wikipedia, Oxford, or whatever) defines the word... ('big data,' 'analytics,' 'measurement,' 'marketing,' or whatever) as blah blah blah", it's like re-opening an old wound.
And I have good reason to be skeptical of definitions and dictionaries. In the last century, almost in another life prior to my reincarnation as a consultant in 1995, I was a linguist and worked for several years as a dictionary editor. I'm very familiar with the pitfalls that are hidden in these definitions, and the messes that were—and are—made because of the naivety of gullible readers.
Obi Wan Kenobi:
"Who's more foolish, the fool or the fool who follows him?"
Star Wars Episode IV: A New Hope
Even though I don't feel like it, I can't move on without marking the ground where I've stepped. So now marks the beginning of my spiel on definitions.

What does big data really mean for data professionals?


Big Data is about Hadoop, Spark, and similar technologies, which allow you to store and process data through clusters of computers, usually cheap ones, that work together adding storage and power.
With large amounts of data currently available, companies in almost every industry are storing and analyzing it in order to gain a competitive edge.
Data is often used to understand customer behavior in order to minimize the risk of losing customers and maximize the value of each one. We call it "Customer Analytics."
People Analytics, as you probably know by now, is the science and art of applying data science techniques to HR in order to better understand our collaborators and increase their degree of satisfaction and productivity.
However, data and analytics aren't the same as big data, even though the media insists on mixing them up. So, when does data start to earn the adjective "big"?
Among data professionals, there's an almost universal consensus when it comes to defining big data. And, get this, it has nothing to do with what you see every day in the media.
For those of us who work with data, and without getting very technical, a big data project is simply one where more than one computer is needed to process the data. A big data project always involves distributed computing: technologies such as Hadoop, Spark, and the like, which allow you to store and process data across clusters of computers, usually cheap ones, that work together to pool storage and power.
In analytics, we call it "big data" technology when, because of the volume of data, the need for quick responses, or the variety of the data types, you have to use a group of computers that work together as if they were one.
So yes, big data is essentially what Doug Laney described in 2001 with the three V's (volume, velocity, and variety): it involves challenges that demand so many resources and so much computing that computers have to divide the work.
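The idea of computers dividing the work can be illustrated on a single machine. What follows is only a minimal sketch, not how Hadoop or Spark are actually programmed: the absence records are invented, the function names are my own, and each "chunk" merely stands in for one computer of a cluster. Each worker summarizes its own chunk, and the partial summaries are then merged.

```python
from collections import Counter

def partition(records, n_workers):
    """Split the records into n_workers chunks (round-robin)."""
    return [records[i::n_workers] for i in range(n_workers)]

def map_partial(chunk):
    """One 'worker' sums absence days per department for its own chunk only."""
    totals = Counter()
    for department, days in chunk:
        totals[department] += days
    return totals

def reduce_totals(partials):
    """Merge the partial results from all workers into one final answer."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts together
    return combined

# Hypothetical (department, days_absent) records, invented for illustration.
records = [("sales", 2), ("it", 1), ("sales", 3), ("hr", 4), ("it", 5)]

chunks = partition(records, n_workers=2)
result = reduce_totals(map_partial(chunk) for chunk in chunks)
print(dict(result))
```

In a real cluster the chunks would live on different machines and the merge step would travel over the network, but the shape of the computation is the same.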

More and more companies are using data and analytics. Few people actually use big data.

In fact, very few organizations need to rely on big data technologies to extract value from HR data. All the professionals in the field would back me up: In all the projects we work on, a shared server is able to process the data in the appropriate time. Thus, it isn't necessary to resort to big data technologies, as long as you're working with structured data...
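A back-of-the-envelope calculation shows why a single server is usually enough. This is a rough sketch under loudly stated assumptions: every figure below (headcount, records per day, bytes per record, server memory) is hypothetical and chosen only for illustration, not taken from any real project.

```python
def estimated_size_gb(employees, records_per_employee_per_day, days, bytes_per_record):
    """Rough size of a structured HR dataset, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = employees * records_per_employee_per_day * days * bytes_per_record
    return total_bytes / 1e9

# All numbers are hypothetical assumptions for illustration.
size_gb = estimated_size_gb(
    employees=5_000,                  # assumed headcount
    records_per_employee_per_day=20,  # clock-ins, meetings, e-mails sent, etc.
    days=250,                         # assumed working days in a year
    bytes_per_record=200,             # assumed size of one structured row
)

SERVER_MEMORY_GB = 64  # assumed memory of one shared server

fits_on_one_server = size_gb < SERVER_MEMORY_GB
print(f"Estimated size: {size_gb:.1f} GB; fits on one server: {fits_on_one_server}")
```

Under these assumptions a full year of activity data fits comfortably in the memory of one machine, with no need for a cluster.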

Sources of Data

Every day, employees in a company show multiple signs that can be used to measure their performance, their degree of job satisfaction, and their commitment to the organization. Technology has enormously simplified the measurement and collection of objective parameters on work activities (attendance records or "clocking in," periodic employee performance reviews, number of e-mails sent, time spent in meetings, conference room reservations, shared calendars, etc.). It can also measure subjective parameters (employee feedback about work relationships or the company, its products, or its services) with different metrics specific to the organization or more standardized ones like the Employee Net Promoter Score.
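As an example of a standardized metric, the Employee Net Promoter Score is usually computed from answers on a 0 to 10 scale to a question such as "How likely are you to recommend this company as a place to work?": the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). A minimal sketch, in which the function name and the survey responses are invented for illustration:

```python
def enps(scores):
    """Employee Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("at least one response is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Hypothetical survey responses, invented for illustration.
responses = [10, 9, 8, 7, 6, 3, 9, 10, 5, 8]
score = enps(responses)
print(score)
```

The result ranges from -100 (everyone is a detractor) to +100 (everyone is a promoter); the passive answers (7 and 8) count only in the denominator.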

Employees could also be sharing very relevant information on a daily basis about their state of mind regarding the organization through their interaction with co-workers on different channels of internal or external social media networks, which could be tapped into and analyzed.
Text Analytics is another important contributor to economic value. Seth Grimes is a well-known specialist in text analytics. In a presentation he stated that, when trying to collect the Voice of the Customer, humans perform poorly, and you need tools to help interpret and capitalize on the large amounts of data found on social networks, web chats, or customer calls.
Companies like MeaningCloud are bringing the technology and the knowledge that text analytics provides from the VoC (Voice of the Customer) to the VoE (Voice of the Employee).