Using Zero-Shot Classification for identifying advanced company attributes to allow targeted B2B sampling

The B2B database of Sample Solutions covers 200M company records combined with contact data, which can be leveraged for a myriad of research applications across qualitative and quantitative studies and across different survey modes.

While Sample Solutions covers over 60 data points in its B2B sample, which is used for various establishment surveys, there are still cases where these data points do not cover all the requirements.

By default, the following items can be derived from a company website if they are not available in our standard B2B data package.

  1. Company size (number of employees, revenue, etc.)
  2. Industry or sector
  3. Products or services offered
  4. Location(s) of the company’s offices or facilities
  5. Contact information (phone number, email address, physical address)
  6. Social media presence and engagement

However, researchers often want to narrow down the companies they target using more specific criteria such as ISO certifications, technologies employed, open job positions, stance on ESG, company growth, internationalisation, remote teams and so on.

In this article we go through some of the ways we leverage Big Data and zero-shot classification to find suitable respondents for your next research project, the so-called needle in the haystack.

What are the steps to enrich additional data points?

Use cases for zero-shot classifications

Zero-shot classification is a type of machine learning technique that allows a model to recognize and classify objects without prior training on those specific objects. In other words, the model can classify objects it has never seen before, without any additional training.

Traditional machine learning models require large amounts of labeled data to be trained on specific objects or classes. However, zero-shot classification models can generalize their knowledge to new classes by leveraging semantic relationships between different classes.

For example, if a model has been trained to recognize different types of animals such as dogs, cats, and birds, it can still classify an image of a zebra even though it has never seen one before. This is because the model understands that zebras are similar to horses, which are similar to other animals it has already been trained on.

Example of a zero-shot classification

The example above shows how a specific text can be classified into the right category.

Zero-shot classification is particularly useful in scenarios where there is limited labeled data available or when new classes need to be added to an existing model without retraining. It can also be used to identify and classify uncommon or rare objects that may not have been included in the original training dataset.

Certifications

ISO certifications are a set of standards that are internationally recognized for ensuring quality management in various industries. ISO stands for International Organization for Standardization, which is a non-governmental organization that develops and publishes these standards.

ISO certifications cover a wide range of areas, including quality management, environmental management, occupational health and safety, information security, and more. These certifications are designed to help organizations establish and maintain effective management systems that meet the needs of their customers while also complying with relevant regulatory requirements.

Within market research, companies are most often certified to ISO 20252, the quality standard for market, opinion and social research, and to ISO 27001 for information security. Needless to say, Sample Solutions is certified for both.

Technologies

In today’s digital age, having a website is crucial for businesses to thrive and succeed. However, it’s not just the design and content of a website that matters, but also the underlying technologies that power it. In this section, we’ll explore some common website technologies that a company may have installed.

First and foremost, most websites are built using a content management system (CMS). A CMS is a software application that allows users to create, manage, and publish digital content, such as web pages and blog posts. Some popular CMS options include WordPress, Drupal, and Joomla.

Another important technology that companies may use on their websites is a CRM system or a marketing automation platform. If you would like to conduct research among specific users of enterprise technology, this can be extremely useful.
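
As a rough illustration of how installed technologies can be read from a website, the sketch below scans a page's HTML for a handful of fingerprints. The fingerprint patterns and the function name `detect_technologies` are illustrative assumptions, not a complete or verified list and not a description of our production setup.

```python
from __future__ import annotations

import re

# Illustrative fingerprints only (assumptions): real detection needs a much
# larger, regularly maintained rule set.
TECH_FINGERPRINTS = {
    "WordPress": re.compile(r"/wp-content/|/wp-includes/", re.IGNORECASE),
    "Drupal": re.compile(r"/sites/default/files/|Drupal\.settings", re.IGNORECASE),
    "Joomla": re.compile(r'<meta[^>]+name="generator"[^>]+content="Joomla', re.IGNORECASE),
    "HubSpot": re.compile(r"js\.hs-scripts\.com", re.IGNORECASE),
}


def detect_technologies(html: str) -> set[str]:
    """Return the names of all technologies whose fingerprint appears in the HTML."""
    return {name for name, pattern in TECH_FINGERPRINTS.items() if pattern.search(html)}
```

Calling `detect_technologies` on the HTML of a company homepage returns a set of technology names, which can then be stored as an additional data point for sampling.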

Technology Adoption

Beyond the technologies a company already uses, it is also possible to look at the adoption of new technologies.

This could be a new CRM system, a chatbot or a marketing tracking application, any of which can indicate an increased focus on growth within a company.

By tracking the HTML code and the CNAME records of a company's domain, it is possible to detect changes within the company over the last 30 days.
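
One way to picture this kind of tracking is to compare dated snapshots of what was detected on a domain. The sketch below assumes each snapshot is simply a set of labels (technology names or CNAME targets) keyed by the date it was taken; the function name `technology_changes` and this data layout are hypothetical.

```python
from datetime import date, timedelta


def technology_changes(snapshots, today, window_days=30):
    """Compare the latest snapshot inside the window with the last one before it.

    snapshots: dict mapping a date to the set of labels detected on that day
    (assumed layout). Returns the labels adopted and dropped within the window.
    """
    cutoff = today - timedelta(days=window_days)
    before = [d for d in snapshots if d <= cutoff]
    inside = [d for d in snapshots if cutoff < d <= today]
    if not before or not inside:
        # Without a baseline and a recent snapshot there is nothing to compare.
        return {"adopted": set(), "dropped": set()}
    baseline = snapshots[max(before)]
    latest = snapshots[max(inside)]
    return {"adopted": latest - baseline, "dropped": baseline - latest}
```

A company whose latest snapshot contains a marketing automation tool that was missing from the baseline would show up under `adopted`.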

Growing Companies

The Sample Solutions data goes beyond static data. We can track historic employment and churn data of staff members to accurately showcase growing or accelerating companies in combination with companies that are expanding into new territories. By analyzing historic employee data, we can track the expansion of a company’s workforce over time, which is a strong indicator of growth. Additionally, monitoring new job postings can provide insights into the roles and skills that are in demand, suggesting areas where the company is investing. Other valuable information includes employee endorsements and skill sets, which can highlight emerging trends and key competencies within the company. Combining these data points allows for a comprehensive understanding of a company’s growth trajectory and strategic direction.

Another way to identify growing companies is to look at recruitment for new positions in sales and business development, especially job titles that mention regions such as MENA or APAC. Recruiting for sales can often indicate accelerated growth.
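
The two growth signals described in this section can be sketched in a few lines. The input formats, the function names and the keyword lists are assumptions for illustration; only MENA and APAC are taken from the text, the other region codes are added as examples.

```python
import re

# Region codes: MENA and APAC come from the article, the rest are illustrative.
REGION_PATTERN = re.compile(r"\b(MENA|APAC|EMEA|LATAM|DACH)\b")
# Hypothetical keyword list for sales-related job titles.
SALES_PATTERN = re.compile(r"\b(sales|business development|account executive)\b", re.IGNORECASE)


def headcount_growth(headcounts):
    """Relative change between the oldest and newest employee count.

    headcounts: list of (period_label, employee_count), ordered oldest to newest.
    """
    first, last = headcounts[0][1], headcounts[-1][1]
    if first <= 0:
        raise ValueError("the oldest headcount must be positive")
    return (last - first) / first


def expansion_signals(job_titles):
    """Count sales-related openings and collect the regions they mention."""
    sales = [title for title in job_titles if SALES_PATTERN.search(title)]
    regions = sorted({code for title in sales for code in REGION_PATTERN.findall(title)})
    return {"sales_openings": len(sales), "regions": regions}
```

A company with rising headcount and several sales openings that name new regions would be a candidate for a "growing and internationalising" sample segment.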

Stance on Remote Work

To identify remote working companies, we usually start by examining the company’s profile for keywords like “remote,” “distributed team,” or “flexible work” in its description or posts. We also check for mentions of remote work policies or global hiring practices in the “About” section. Additionally, we review employee profiles and job titles for indications of remote work, such as “Remote Worker” or location-independent roles. Employee posts and company updates often highlight remote work culture, virtual events, and team collaborations across different time zones, providing further evidence of a remote working environment.

In addition, we analyze the network of employees at a company to check the degree of centralization within a country or even at an international level.

By doing so, we can target companies that are heavily focused on remote work, which allows researchers to find suitable respondents for studies on, e.g., remote working software.
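
Both remote work signals can be approximated with simple code. The keyword list below is taken from the text; the concentration measure is an assumption on our part: a Herfindahl-style index over employee locations is one possible way to express the degree of centralization, not necessarily the measure used in practice.

```python
from collections import Counter

# Keywords as listed in the article.
REMOTE_KEYWORDS = ("remote", "distributed team", "flexible work")


def remote_keyword_hits(text):
    """Return the remote work keywords found in a profile text (case-insensitive)."""
    lowered = text.lower()
    return [keyword for keyword in REMOTE_KEYWORDS if keyword in lowered]


def location_concentration(employee_locations):
    """Herfindahl-style index over employee locations (assumed measure).

    1.0 means every employee is in the same location; values near 1/n mean
    employees are spread evenly over n locations.
    """
    if not employee_locations:
        raise ValueError("at least one employee location is required")
    counts = Counter(employee_locations)
    total = len(employee_locations)
    return sum((count / total) ** 2 for count in counts.values())
```

A low concentration value combined with several keyword hits would point towards a distributed, remote-friendly company.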

Environmental, Social and Governance

Environmental, Social, and Governance (ESG) is a set of criteria used to evaluate a company’s performance in three key areas: environmental impact, social responsibility, and corporate governance. These factors are becoming increasingly important for investors who want to make informed decisions about where to invest their money.

The environmental component of ESG looks at a company’s impact on the environment. This includes things like the company’s carbon footprint, energy usage, waste management practices, and water usage. Companies that are committed to reducing their environmental impact are more likely to be viewed favorably by investors who are concerned about sustainability.

The challenge

How can we identify companies that claim to have ISO certifications or genuinely care about ESG without reaching out and asking them directly?

To retrieve this information, we make use of the zero-shot classification model BART. Before we can actually classify content, we first need to find suitable content.
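
Finding suitable content can start with something as simple as pulling out the passages of a website that mention an ISO standard, so that only those passages are passed on for classification. The sketch below is a minimal example under that assumption; the pattern, the function name `find_iso_mentions` and the snippet length are illustrative choices.

```python
import re

# Matches forms such as "ISO 27001", "ISO 9001:2015" or "ISO/IEC 27001".
ISO_PATTERN = re.compile(r"\bISO(?:/IEC)?[\s-]*(\d{4,5})(?::\d{4})?\b")


def find_iso_mentions(text, context_chars=80):
    """Return each ISO standard mentioned in the text with its surrounding snippet."""
    mentions = []
    for match in ISO_PATTERN.finditer(text):
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        mentions.append({
            "standard": f"ISO {match.group(1)}",
            "snippet": text[start:end].strip(),
        })
    return mentions
```

A mention alone does not prove a certification (a page may only say that a company is "working towards" a standard), which is why the snippets still need to be classified afterwards.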

The BART model

BART (Bidirectional and Auto-Regressive Transformers) is a language model developed by Facebook AI Research. It is a neural network-based model that uses the transformer architecture to generate text. The BART model is trained on a combination of unsupervised and supervised learning methods, which makes it suitable for a wide range of natural language processing tasks.

One of the key features of BART is its ability to perform bidirectional encoding of text. This means that the model can take into account both the left and right context of a word when generating text. This feature allows BART to produce more coherent and fluent text compared to traditional auto-regressive models. As a side note, this article was not written by BART 🙂

In contrast to transfer learning, zero-shot learning does not require a model that has been trained for the specific use case; instead, we can use the general model directly for classification.
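
To make this more concrete, here is a minimal sketch of zero-shot classification with BART using the Hugging Face `transformers` library. The checkpoint `facebook/bart-large-mnli`, the candidate labels, the threshold and the helper names are assumptions for illustration and not a description of our production pipeline. Under the hood, the pipeline turns each label into a hypothesis such as "This example is ISO certified." and scores how strongly the text entails it.

```python
def build_bart_classifier():
    """Load a zero-shot pipeline (needs the `transformers` package and a model download)."""
    # Imported lazily so that tag_company_text can be used and tested without it.
    from transformers import pipeline
    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


def tag_company_text(text, labels, classifier, threshold=0.8):
    """Return the labels whose score reaches the threshold, with their scores.

    classifier: any callable with the zero-shot pipeline interface, returning
    a dict with "labels" and "scores" lists of equal length.
    """
    # multi_label=True scores every label independently, so one text can
    # carry several attributes at once (for example ISO and ESG).
    result = classifier(text, candidate_labels=labels, multi_label=True)
    return {
        label: score
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    }
```

In use, one would call `classifier = build_bart_classifier()` once and then `tag_company_text(snippet, ["ISO certified", "committed to ESG", "remote-first"], classifier)` for each website snippet. The threshold of 0.8 is an arbitrary starting point that should be tuned against manually checked examples.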

Conclusion

Having a great B2B database is one thing, and it significantly helps us in sampling offline and online B2B projects. Nevertheless, there are limits to the kinds of variables that can be stored.

In these cases we can use extended company relations, company network data (e.g. job posts, historic data), website tracking data that can serve as a predictor, and secondary data to sample the most likely respondents for your survey.

At Sample Solutions we believe, on the one hand, in automating the majority of B2B projects; on the other hand, some projects are so niche that they are difficult to automate. This is where we are different: we come up with tech-driven solutions to sample your next B2B project.

Carsten is the founder of Sample Solutions and Lifepanel with over a decade of sampling and social research experience. A trained aerospace engineer who discovered his love for random phone numbers.
