Our Big Cleanup of Our Big Data

Informatica, the company I work for, deals in big data challenges every day. It’s what we do: help customers turn their data into actionable business insights. When I took the helm as VP of global talent acquisition, I was surprised to learn that the data within the talent acquisition function was not up to the standards Informatica lives by. Clearly, talent acquisition was not seeing the huge competitive advantage that data could bring, at least not the way sales, marketing, and research were viewing it. And that, to me, seemed like a major problem, but also a terrific opportunity!

This is the story of how Informatica Talent Acquisition became data-centric and used that focus to fix the problem.

Go to the Source

No matter how big or small your company, talent data comes from a variety of roles within the talent acquisition function. The role may be named Researcher, Sourcer, Talent Lead Generator, or even Recruiter. Putting the name aside, the data comes from the first person to connect with a potential candidate. Usually that person, or in Informatica’s case, that team, is the one who finds the data and captures it. Because talent acquisition in the past was largely about making a single hire, our data was captured haphazardly and stored with, let’s say, less than best practices. In addition, we didn’t know big data was about to hit us square in the face with more social data points than yesteryear’s Talent Sourcer could believe. I went to our sourcing team as well as our research department to begin assessing how we were acquiring, storing, and streamlining our data.

Get Help

Data is at the heart of so many recruiting conversations today. But it’s not just about the data; access to the right data at the right time by the right person is paramount to making good business or hiring decisions. This led me to Dave Mendoza, a talent acquisition strategy consultant, who had developed a process called “talent mapping,” which we applied to help us identify, retrieve, and categorize our talent data. From that point he was able to create our Talent Knowledge Library. This library allows us to store and access our data, and finally to develop a talent data methodology aptly named Future-casting. This methodology defines a process wherein Informatica can use its talent acquisition data for competitive intelligence, workforce planning, and candidate nurturing.

Get Centralized

The most valuable part of our transformation process was the implementation of our Talent Knowledge Library. The weakest point in our new approach was not the capturing or categorizing of our data; it was that we had no central repository that would allow unstructured data to be housed, amended, and retrieved by multiple talent sourcers. To solve this issue we implemented a candidate relationship management application, Avature. This tool allowed us to build a talent library: a single-source repository of our global talent pools that can be accessed by all the roles within the talent acquisition organization. Having a centralized database has improved our hiring efficiency, for example by decreasing the time and cost to fill requisitions.

Take Ownership

Because Informatica is a global company, it doesn’t make sense for us to house all of our data in a third party’s proprietary system. While the new social sourcing platforms are fast and powerful, the data no longer belongs to the company once it is entered. That didn’t work for us, especially given that we had teams all over the world working with different tools. With a practical approach to data capture and retrieval, we now have a central databank of very specific competitive intelligence that can withstand the test of time: because the tool can capture social and mobile data, it is built to be future-proof. Because the data is ours, we retain our competitive advantage, even during talent acquisition transition periods.

Set Standards

One truth became very clear as we took on this data-centric approach to talent acquisition: if you don’t set standards for processes and protocols around your data, you may as well use a bucket, because no repository will be of much use without accurate and usable data that everyone can access consistently. Being able to search the data according to company-wide standards was both obvious and mind-blowing. These are the four standards we put into place when creating our talent library: 1) Data must be usable and searchable; 2) Extraction and leverage of data must be easy; 3) Data can be migrated from multiple lead generation platforms; 4) Data can be categorized, tagged, and mapped to talent for ease of segmentation.
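
As a rough illustration of the fourth standard, here is a minimal Python sketch of tag-based segmentation. It is not how Avature or any Informatica product works internally; the record layout, the "tags" field, and the segment function are all hypothetical names chosen for this example.

```python
def segment(records, *required_tags):
    """Return the records that carry every requested tag.

    Hypothetical layout: each record is a dict with an optional
    "tags" list. Matching ignores case so "Java" and "java" agree.
    """
    wanted = {tag.lower() for tag in required_tags}
    return [
        rec for rec in records
        if wanted <= {tag.lower() for tag in rec.get("tags", [])}
    ]


# Made-up sample data, for illustration only.
library = [
    {"name": "Ann Lee", "tags": ["Java", "EMEA", "Passive"]},
    {"name": "Bo Kim", "tags": ["java", "APAC"]},
    {"name": "Cy Roe", "tags": ["Sales", "EMEA"]},
]

java_in_emea = segment(library, "java", "emea")
```

The point of the sketch is only that consistent tags make a talent pool sliceable with a single query; inconsistent or missing tags would make the same question unanswerable.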

Embrace Social

In today’s globalized world, people frequently change their physical address, their employer, and their email addresses, but they rarely change their Twitter handle or Facebook name. This is why “people data” quickly becomes outdated and social data is the new commodity within the enterprise. People who use social networks leave a living, always-fresh data shadow, making it easy for us to capture their most relevant contact data. It sounds a bit like we’ve become online stalkers, but marketers and business development professionals have been doing it for years. And just as we are moving toward predictive modeling on these pieces of personal data, so too are our competitors for talent.

By configuring our CRM systems to accurately capture and search these social data points, we have made our sourcing team more efficient and effective. Doing so has also reduced the duplicate entries that caused candidate fatigue in our recruiting processes.
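
To show why a stable social handle helps with duplicates, here is a minimal Python sketch that merges records sharing the same Twitter handle even when the email address differs. It assumes a hypothetical record layout (plain dicts with "name", "email", "twitter", and "cell" fields) and hypothetical function names; it is not a description of any particular CRM's matching logic.

```python
def normalize_handle(handle):
    """Lower-case a handle and drop a leading '@' so variants compare equal."""
    return handle.strip().lstrip("@").lower()


def dedupe_by_handle(records):
    """Keep one record per normalized handle.

    The first record seen for a handle wins; later duplicates only
    fill in fields that are still blank. Records without a handle
    are passed through unchanged, after the merged ones.
    """
    by_handle = {}
    unkeyed = []
    for rec in records:
        key = normalize_handle(rec.get("twitter", ""))
        if not key:
            unkeyed.append(dict(rec))
            continue
        if key not in by_handle:
            by_handle[key] = dict(rec, twitter=key)
            continue
        kept = by_handle[key]
        for field, value in rec.items():
            if field != "twitter" and value and not kept.get(field):
                kept[field] = value
    return list(by_handle.values()) + unkeyed


# Made-up sample data: the same person entered twice with different emails.
records = [
    {"name": "Ann Lee", "email": "ann@old.example", "twitter": "@AnnLee", "cell": ""},
    {"name": "Ann Lee", "email": "ann@new.example", "twitter": "annlee ", "cell": "555-0100"},
    {"name": "Bo Kim", "email": "bo@example.com", "twitter": ""},
]

cleaned = dedupe_by_handle(records)
```

Keying on email alone would have kept both Ann Lee entries; keying on the normalized handle collapses them into one record and carries the cell number across. A real system would also need a policy for conflicting values, which this sketch sidesteps by letting the first record win.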

I think Dave says it perfectly in his recent white paper “Future-casting: How the rise of Big Social Data API is set to Transform the Business of Recruiting”: “Future-casting has the ability to review the career progression of both internal employees and external candidates. This stems directly from the ability to track candidates more accurately via their social data. Now, more than ever before, corporations and the talent acquisition professionals within them can keep fresh data on every candidate in their system, with a few simple tweaks. This new philosophy of future-casting puts dynamic data into the hands of the organization, reducing dependency on job boards and even social platforms so they can create their own convergent model that combines all three.”

Results Will Come

At Informatica we saw results very quickly because we had an expert dedicated to addressing the challenges, and we were committed to making our data work for us. But if you don’t have a global sourcing team or a full-time consultant, you can still begin at the top of this list. Talk to your CRM or ATS vendors about how you can tweak your tracking systems. Assess and map your current talent process. Begin using products that allow you to own your own data. Finally, set standards such as the ones I mentioned previously and make sure everyone adheres to them.


8 Comments on “Our Big Cleanup of Our Big Data”

  1. I enjoyed your article and agree with you about the necessity to own your own data. I also believe that allows for an organization to own the relationship with the prospect as well. Owning that relationship is an imperative when we are building some of our relationships in the social channels which are outside of our control.

    It sounds like “Super Dave” made some great contributions. Well done.

  2. Finally an article from someone who has been there done that and is knowledgeable about the topic of “Big Data”. So many times I hear the terms “Big Data” and come to find out they are referring to 10,000 records and a handful of data points.

    This is an excellent follow up to the incredible white paper Dave wrote. I think the next article needs to lay out the exact methods/steps used in how he completed the “talent mapping” process.

    Come on guys pull back the curtain even more!!!

  3. Brad, you should have lunch/dinner with the big data guys at LinkedIn. Seeing their heat maps of talent really helped me see the future of visual and predictive analytics in recruiting. Our HRIS systems can’t provide us the data.

    You know what’s sad? It’s easier for me to paint a picture of internal capabilities at my company by looking at the employees’ LI profiles. I have to go to LinkedIn to get data on my own company.

  4. @ Brad: Thank you for the article- I find the topic fascinating and potentially very disruptive to the concept of sourcing. If I may ask:
    1) How many individual records do you have?
    2) How often do you update/”clean” them, and is it done automatically?

    Looking forward to Informatica’s RIS presentation in the Talent Demand Forecasting Section of Innovation.

    Keith keithsrj@sbcglobal.net

  5. Thanks to all for the comments. @Keith, to answer your questions regarding size: in this situation big data is not really about size, but more about data complexity. Big data is the ability to take transactional data and merge it with interaction data (i.e., social media) in an environment that is too complex for a relational database to cope with. LinkedIn certainly has a big data play with the volumes of data they have at their fingertips, but alas, the rest of us need to pull data from our own data sources (ATS, CRM, HRIS) and try to map it all together. Informatica has a large number of cloud connector products that we are using to help consolidate the data; then we can start to look for trends. Your IT groups are able to download a subscription service from http://www.informaticacloud.com/products/integration-platform-as-a-service-ipaas/connector-toolkit.html. There are many connectors for many different applications available. If the connector you’re looking for is not there, Informatica has partners that can also create connectors for your specific systems.

    Regarding cleaning the data, that’s a never-ending process, and we are automating the flow of data from system to system as much as possible. One process we are working on now is negation filters. This is when we run a report that tells us when data is not present rather than when it is. A good example is running a report from our CRM that lists all the records that are missing social URLs or a cell phone number. This provides us with a to-do list that our data team focuses on to help continually cleanse the data.

  6. Thanks, Brad. Would it be meaningful for Informatica to (hypothetically) say :
    “As of COB yesterday, we have 657,212 unique individual resumes/‘external personnel records’ for which the contact/social information has been verified as valid no more than 30 days ago” or something similar?



  7. Though we are certainly not anywhere near those volumes, that’s the direction. Having a cell phone number and social links provides pipeline longevity, and with all the data quality processes in place you end up being aligned with the future-casting methodology.


