imaga: Combining two major E-commerce platforms on different stacks into a unified CRM.

Daniel Solovev

Head of PHP Department

Hey there! I'm Daniel Solovev, head of the PHP division, the largest unit in Imaga's development wing. I'd like to tell you how we integrated a CRM system into two entirely unfamiliar IT environments and saved the sales teams of two major online stores from drowning in endless spreadsheets sent by email. I'll go into the nitty-gritty of what data we exported, how we de-duplicated it, and the services we used for validation.

Background. How did the task come about?

Two online retailers approached us; let's call them A and B for simplicity. Each had its own database of legal-entity clients, accumulated over its entire operating history. B2B clients were handled on an "everything via email" basis: sales managers passed around numerous outdated Excel spreadsheets. Few people were prepared to work this way, and new managers often quit within their first few days.

The business was determined to overhaul this process: automate everything that could be automated, develop the B2B segment, and actively invest in its growth.

Limitations

The primary limitations were:

MVP timeframe: six months (from June-July to New Year's). The client approached us during the summer, and we aimed for a release freeze before the New Year.
Budget: Building a massive, fail-safe, and consequently expensive behemoth was out of the question. The business wanted to test hypotheses quickly. The faster, the better.

Additionally, there were other constraints:

Unified Interface: Sales managers needed a single tool to manage clients efficiently. It had to cover email correspondence, IP telephony, report generation, and access to all client profiles and orders.
Low Entry Barrier for Managers: The system interface should be straightforward and intuitive for managers to use effortlessly.

Key task

Establish a unified client database for legal entities and enhance it with data from the existing databases of two online stores for future sales purposes.

To achieve this, we need to:

Acquire the necessary dataset and bring it into our new system so that managers can access and work with it.
Turn that data into CRM entities. We opted for Bitrix24 as our CRM.

The choice of Bitrix24 was driven by the constraints mentioned earlier, along with the fact that the platform provides a broad range of features:

Integration with AD/LDAP, as required for SSO;
Integration with IP Phone systems;
Integration with email;
An intuitive and animated interface.

What data are we exporting?

Let's start collecting information about what we want to see in our CRM system. We'll identify the key entities and their necessary attributes for input while also setting up criteria to validate the data. This process will help us define the desired outcome of our efforts.

In the context of Bitrix24, we'll be engaging with two primary entities:

Company — Details regarding a legal entity, encompassing its legal structure and banking information.

Contact — Information about an individual contact person, such as a procurement manager. This contact is associated with a specific company.

Next, we establish the business requirements for Bitrix24:

Only active companies are exported.
Duplicate contacts and companies are excluded.

Technical software development stage: Working within an external infrastructure.

We were stepping into the infrastructure of two massive e-commerce platforms that we first had to get to know. So, at this stage, we needed to answer the following questions:

Who can assist from the e-commerce side with integration (specific teams and individuals)? Who should we reach out to? Who can provide insights into network connectivity, products, and systems within each store?
How will data exchange occur? Real-time? Large file uploads? Message brokers? REST?
Will any modifications be required from the e-commerce side?
What technology stack will be needed (aside from Bitrix24, PHP, and MySQL)?
How swiftly can we receive data (technical capabilities)? Crucially, we shouldn't overload either the e-commerce stores or ourselves with our exchanges.

To grasp the infrastructure, we convened with representatives from the online stores, addressing pivotal details:

How do we procure data for each entity (via REST, brokers, files, etc.)?
In what format (JSON, XML, CSV, etc.)?
How fast?
How often can exchanges occur?

Design. Implementation choice. Architecture.

During the software development phase, we determined:

Integration methods.
Data upload modes. Spoiler alert: there are two (full and delta).
Ways to validate companies to avoid transferring "junk" and duplicates to the new system.
How to transform "raw" data into CRM entities.

Mechanism: Extraction. Transformation. Loading.

In this section, I'll outline how we retrieved data and go into the technical aspects. To tackle the challenge, we needed to establish two modes of data retrieval:

Full extraction: containing all data from the current and preceding years.
Delta (update) extraction: periodically obtaining data about new companies from the online stores.

The data volume ended up being as follows:

In online store A: 75,000 contacts and 53,000 companies.
In online store B: 600,000 contacts and 125,000 companies.

These figures do not add up to the final totals, because we de-duplicated the data and excluded irrelevant companies. We chose ETL (Extract, Transform, Load) as the basis for the implementation. Let's look at each stage in more detail.

Extract

This stage is likely the most challenging of the three we faced. At this point, we need to set up integration with the online stores to obtain "raw" data. It's essential to grasp that this data is only "raw" for the new CRM; for the online store, it represents the complete dataset in its database.

For the full extraction:

Store A: MySQL slave replica (Symfony console command)
Store B: JSON files (halaxa/json-machine + Symfony console command)

We also added the Symfony Console component to our project to run the full extraction from MySQL and handle other background tasks.

For the delta extraction:

Store A: MySQL slave replica (cronjob + Symfony console command)
Store B: Kafka (PHP extension RdKafka + Supervisor + Symfony console command)
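
The article does not say how the cron job for Store A decided which rows were new. One common approach is a timestamp watermark: remember the latest change seen and ask only for rows newer than that. Below is a minimal sketch of that idea in Python (the project itself was written in PHP), using an in-memory SQLite database as a stand-in for the MySQL replica. The table, its columns, and the fetch_delta helper are all hypothetical.

```python
import sqlite3


def fetch_delta(conn, watermark):
    """Return companies changed after `watermark`, plus the new watermark.

    Assumes a hypothetical `companies` table whose `updated_at` column holds
    ISO-8601 strings, so plain string comparison follows time order.
    """
    rows = conn.execute(
        "SELECT id, tin, name, updated_at FROM companies "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Keep the old watermark when nothing changed since the last run.
    new_watermark = rows[-1][3] if rows else watermark
    return rows, new_watermark


def demo():
    conn = sqlite3.connect(":memory:")  # stand-in for the read replica
    conn.execute(
        "CREATE TABLE companies (id INTEGER, tin TEXT, name TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO companies VALUES (?, ?, ?, ?)",
        [
            (1, "7707083893", "Old Co", "2023-06-01T10:00:00"),
            (2, "500100732259", "New Co", "2023-07-15T09:30:00"),
        ],
    )
    # Pretend the previous cron run stored this watermark.
    return fetch_delta(conn, "2023-07-01T00:00:00")
```

Each scheduled run would persist the returned watermark and pass it to the next run, so the replica is only asked for rows it has not already handed over.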

I'll focus specifically on Store B. They were using SAP CRM to store all their client data. SAP CRM had a web service capable of pushing its updates to Kafka. They provided us with a separate topic. On our end, we built a consumer, which required us to install the PHP extension RdKafka. This consumer ran in an infinite loop, so we used Supervisor to oversee it. If it crashed, Supervisor would restart it. We communicated with Supervisor using Symfony Console.
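
To illustrate the restart behaviour described above, here is a sketch of what a Supervisor program section for such a consumer might look like. The program name, console command name, working directory, and log paths are hypothetical; only the option names are standard Supervisor settings.

```ini
; Hypothetical program name, command, and paths; adjust to the real project.
[program:store_b_consumer]
command=php bin/console app:consume-store-b
directory=/var/www/crm
autostart=true
autorestart=true
startretries=10
stopwaitsecs=30
stdout_logfile=/var/log/supervisor/store_b_consumer.out.log
stderr_logfile=/var/log/supervisor/store_b_consumer.err.log
```

With autorestart=true, Supervisor starts the process again whenever it exits, which is what keeps a long-running consumer loop alive after a crash.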

Transform

Now, it's crucial to clean and transform the data to align with the requirements of the business model. Here's what we did:

Removed companies without a Taxpayer Identification Number (TIN).
Converted encoding from Windows-1251 to UTF-8.
Normalized phone numbers to a single format.
Eliminated extra spaces from text fields.
Organized contacts and companies into DTOs.
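
The steps above can be sketched roughly as follows. This is an illustration in Python rather than the project's PHP code. The CompanyDTO fields, the input field names, and the target phone format (digits with a leading "+") are assumptions, since the article does not specify them.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompanyDTO:
    # Hypothetical minimal set of fields for a company.
    tin: str
    name: str
    phone: str


def clean_text(raw: bytes) -> str:
    """Decode Windows-1251 bytes and collapse runs of whitespace."""
    return " ".join(raw.decode("cp1251").split())


def normalize_phone(raw: str) -> str:
    """Keep digits only and prefix '+' (an assumed target format)."""
    digits = re.sub(r"\D", "", raw)
    return "+" + digits if digits else ""


def to_company(row: dict) -> Optional[CompanyDTO]:
    """Build a DTO from a raw row, or return None when the TIN is missing."""
    tin = clean_text(row.get("tin", b""))
    if not tin:
        return None
    return CompanyDTO(
        tin=tin,
        name=clean_text(row.get("name", b"")),
        phone=normalize_phone(clean_text(row.get("phone", b""))),
    )
```

Python strings are Unicode internally, so the "convert to UTF-8" step reduces to decoding the Windows-1251 bytes here; the actual UTF-8 encoding would happen when the value is written out.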

Load

Now that we have cleaned data, we can store it in the database as CRM entities, including companies and contacts.

This step was consistent across all three sources (database, Kafka, files), as we are consolidating everything into a single repository and thus need to ensure uniform organization.

At the same time, we validated each company to ensure that only active entities were included. To do this, we used the Unified State Register of Legal Entities service, which provides the necessary company information. It's worth mentioning that Bitrix24 offers a pre-built module for retrieving this data by Taxpayer Identification Number (TIN).

However, the data arrived in a "raw" state, so before querying the Unified State Register of Legal Entities we needed to make sure each Taxpayer Identification Number (TIN) was valid. Each register lookup took around 0.4 seconds, which slowed processing noticeably. Therefore, we implemented a basic check of the TIN's length and characters: it must consist of 10 or 12 digits. We also added a check-digit test, which verifies the TIN using a mathematical formula that is standardized across all TINs.
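
A pre-check of this kind might look like the sketch below. It is in Python rather than the project's PHP, and it assumes the TIN is a Russian INN, which the 10-or-12-digit rule suggests; the weights are those of the standard INN check-digit algorithm. The function name is hypothetical.

```python
# Weights of the INN check-digit algorithm (assumes the TIN is a Russian INN).
_W10 = (2, 4, 10, 3, 5, 9, 4, 6, 8)          # 10-digit TIN, 10th digit
_W11 = (7, 2, 4, 10, 3, 5, 9, 4, 6, 8)       # 12-digit TIN, 11th digit
_W12 = (3, 7, 2, 4, 10, 3, 5, 9, 4, 6, 8)    # 12-digit TIN, 12th digit


def _check_digit(digits, weights):
    # zip() stops at the shorter sequence, so only the leading digits are used.
    return sum(d * w for d, w in zip(digits, weights)) % 11 % 10


def is_valid_tin(tin: str) -> bool:
    """Cheap local check to run before any call to the external register."""
    if not (tin.isascii() and tin.isdigit()) or len(tin) not in (10, 12):
        return False
    d = [int(c) for c in tin]
    if len(d) == 10:
        return d[9] == _check_digit(d, _W10)
    return d[10] == _check_digit(d, _W11) and d[11] == _check_digit(d, _W12)
```

For example, is_valid_tin("7707083893") returns True, while changing the last digit to 4 makes it return False, so the record can be rejected without a network call.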

As a result, only TINs that passed these checks were sent to the Unified State Register of Legal Entities, which shortened the loading stage by 30–40%. Companies that turned out to be invalid during loading were stored in a separate table.

Releases

First release: we used a shared loader to extract from online store A.

Second release: we released the delta extraction from online store B.

Third release: we fully extracted historical data from online store B.

In this manner, even during the development phase, we consistently supplied sales managers with fresh data.

Final Stack and Architecture

Throughout the project, we used the following tech stack:

B2B CRM: Bitrix24.
Database: MySQL.
Message broker: Kafka.
ETL: PHP package flow-php/etl.
Handling large JSON files: PHP package halaxa/json-machine.
Reading from Slave Replica: symfony/console + cron.
Kafka Consumers: symfony/console + supervisor.

In the resulting architecture, we had three extractors: one shared for store A and two separate ones for store B (one for Kafka, the other for JSON). Two transformers, one for each store, produced identical DTOs and passed them to the loader. Then, the loader pushed everything into the B2B CRM.
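
The shape of this pipeline can be sketched in a few lines. The sketch is in Python rather than PHP and does not use flow-php/etl. The raw field names are hypothetical, a dictionary stands in for the CRM, and the rule "the first record per TIN wins" is an assumed deduplication strategy, since the article does not describe the actual one.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, str]
Source = Tuple[Iterable[Record], Callable[[Record], Record]]


def transform_a(raw: Record) -> Record:
    # Store A field names are hypothetical.
    return {"tin": raw["inn"].strip(), "name": raw["title"].strip()}


def transform_b(raw: Record) -> Record:
    # Store B field names are hypothetical.
    return {"tin": raw["TaxId"].strip(), "name": raw["CompanyName"].strip()}


def run_pipeline(sources: List[Source]) -> List[Record]:
    """Feed every extractor through its store's transformer into one loader."""
    loaded: Dict[str, Record] = {}  # stand-in for the CRM, keyed by TIN
    for extractor, transformer in sources:
        for raw in extractor:
            dto = transformer(raw)
            # Assumed deduplication rule: the first record per TIN wins.
            loaded.setdefault(dto["tin"], dto)
    return list(loaded.values())
```

Because both transformers emit records of the same shape, the loader never needs to know which store or transport a record came from; adding another source would mean adding an extractor and, if needed, a transformer.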

Conclusion

Bitrix24 has been effectively customized. We've uploaded over 170,000 active companies and more than 264,000 contacts from both online stores. Sales managers now have access to an extensive client database from these two online stores, enabling them to efficiently work with existing clients and drive repeat sales using Bitrix24 CRM functionality. All reports and analytical data are now available with just one click, eliminating the need for spreadsheets in emails.

We accomplished our objectives thanks to the effective collaboration between the business and developers.
