
A guide to using the Spryker DataImport module

In an earlier post we explored some of the similarities and differences between Spryker Commerce and Magento, one of the world’s most popular ecommerce platforms. Here, in our first Spryker tutorial, Andrey Astakhov talks us through the Spryker DataImport module.

Simplifying data migration to Spryker

The attraction of moving to a commerce framework like Spryker lies in the ability to innovate quickly without having to tear up your existing systems. But, as with any ecommerce platform move, migrating data to your new Spryker platform can be time-consuming and complex.

Thankfully, Spryker has a module that’s designed to simplify the process. The DataImport module allows you to upload data from existing external sources – such as CSV files or a product information management (PIM) system – to your online store.

Like all other Spryker modules, the module follows the principles of package design and communicates with other modules via internal APIs. It uses a console command as its entry point for importing data from CSV files to the database.  

The DataImport module has been invaluable in helping our clients, such as German office supplies company Certeo, to import data to their new Spryker stores. Like any other ecommerce tool, it has its pros and cons, although most of the cons can be addressed by making tweaks to your project code.

In this tutorial, I’ll explain the concepts and structure of the DataImport module, and show you how to use and extend it in your ecommerce projects.

Installing Spryker DataImport

If your web shop is based on Demoshop, then the DataImport module is already installed.

If not, run this command: composer require spryker/data-import

Module classes and interfaces

All projects are different. For this reason, the Spryker DataImport module provides a very generic solution that allows you to customise data imports in your project code.

The main building blocks of the DataImport module are as follows:

  • Data reader
  • Import step
  • Step broker
  • Import hooks

 

The core of the module is the DataImporter class. The idea behind DataImporter is to retrieve import data using a data reader and process data with steps that are combined in brokers. Hooks allow you to apply additional operations before and / or after import.

Data reader

The DataImport module comes with a CsvReader class. If your data is not stored in CSV format, it’s a good idea to implement a data reader in your project that provides a way to get data sequentially from a data source. Later on I’ll give an example of a reader for XML files.

The following diagram shows two important interfaces: DataReaderInterface and ConfigurableDataReaderInterface.

[Diagram: DataReaderInterface and ConfigurableDataReaderInterface]

Now let’s take a look at the purpose of each interface:

Interface | Purpose
DataReaderInterface | Allows you to read data from the data source
ConfigurableDataReaderInterface | Allows you to configure data reader with an instance of DataImporterReaderConfigurationTransfer

Data steps

A data step is an operation on a single entry of imported data. Inserting product data into the database, or updating it there, is a typical example of a data step.

As you can see here, the Spryker DataImport module provides you with three interfaces:

[Diagram: the three data step interfaces]

Again, let’s explore the respective purposes of those interfaces:

Interface | Purpose
DataImportStepInterface | Executes an operation on data. Examples: persist (store) product information to product tables; store category data to database
DataImportStepBeforeExecuteInterface | Performs operations before processing a line of imported data
DataImportStepAfterExecuteInterface | Performs operations after processing a line of imported data. Example: touch updated entities

 

In your project you’ll probably implement many data steps. The most important steps will be those that write data to a database. By convention they are named Writers.

TouchAwareStep

The DataImport module contains one implementation of a step. TouchAwareStep is a helper class that you can use in your project to simplify communication with the Touch module. The main idea of this step is to send the identifiers of modified entities to TouchFacade using a bulk update.

Demoshop data steps

Demoshop is an example of an ecommerce application based on the Spryker framework. Take a look at how it implements data steps and use it as inspiration for your project.

In the Demoshop you can find several data steps:

Class | Purpose
AddLocalesStep | Fetches locales from the database and adds their identifiers and names to a processed dataset. It allows other data steps to reuse locale information without querying database
LocalizedAttributesExtractorStep | Groups attributes from a dataset by locale

 

Since data steps may get data from a database or perform other time- and computer-resource-intensive operations, it makes sense to fetch some data only once and cache it for later reuse.

One possible way to cache data is to store it directly in the dataset. As you can see in AddLocalesStep, the step queries the locales only when it processes the very first record from the data reader, keeps them in a property, and then adds the cached locales to every dataset:

// src/Pyz/Zed/DataImport/Business/Model/Locale/AddLocalesStep.php

class AddLocalesStep implements DataImportStepInterface
{
   public function execute(DataSetInterface $dataSet)
   {
       // Query the database only once; later data sets reuse the cached map.
       if (empty($this->locales)) {
           $localeEntityCollection = SpyLocaleQuery::create()
               ->filterByLocaleName($this->availableLocales, Criteria::IN)
               ->find();

           foreach ($localeEntityCollection as $localeEntity) {
               $this->locales[$localeEntity->getLocaleName()] = $localeEntity->getIdLocale();
           }
       }

       // Expose the cached locales (locale name => locale ID) to the following steps.
       $dataSet[static::KEY_LOCALES] = $this->locales;
   }
}

Data step broker

A data step broker is nothing more than a collection of data steps with an execute method that calls all embedded data steps:

[Diagram: data step broker]

In this way you can create a new step broker as detailed below:

$dataSetStepBroker = new DataSetStepBroker();
$dataSetStepBroker
   ->addStep(new AddLocalesStep())
   ->addStep(new LocalizedAttributesExtractorStep([]));

DataSetStep

DataSetStepBrokerTransactionAware wraps data modifications in transactions. This data step broker opens a transaction at the beginning of an import and commits the data in portions.
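The idea of committing in portions can be sketched in plain PHP. This is only an illustration under stated assumptions, not the module's implementation: SketchConnection and SketchTransactionAwareBroker are hypothetical stand-ins, the connection lives in memory, and the portion size of 2 is arbitrary.

```php
<?php

declare(strict_types=1);

// Hypothetical in-memory connection; a real project would use its database connection.
final class SketchConnection
{
    public $commits = 0;
    public $inTransaction = false;

    public function beginTransaction(): void
    {
        $this->inTransaction = true;
    }

    public function commit(): void
    {
        $this->inTransaction = false;
        $this->commits++;
    }
}

// Hypothetical broker: runs its steps inside a transaction and commits every $bulkSize data sets.
final class SketchTransactionAwareBroker
{
    private $connection;
    private $steps;
    private $bulkSize;
    private $pending = 0;

    /**
     * @param callable[] $steps
     */
    public function __construct(SketchConnection $connection, array $steps, int $bulkSize)
    {
        $this->connection = $connection;
        $this->steps = $steps;
        $this->bulkSize = $bulkSize;
    }

    public function execute(ArrayObject $dataSet): void
    {
        if (!$this->connection->inTransaction) {
            $this->connection->beginTransaction();
        }

        foreach ($this->steps as $step) {
            $step($dataSet);
        }

        // Commit once a full portion has been processed.
        if (++$this->pending >= $this->bulkSize) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->connection->inTransaction) {
            $this->connection->commit();
        }

        $this->pending = 0;
    }
}

$connection = new SketchConnection();
$written = [];

$writerStep = function (ArrayObject $dataSet) use (&$written): void {
    $written[] = $dataSet['id'];
};

$broker = new SketchTransactionAwareBroker($connection, [$writerStep], 2);

foreach ([1, 2, 3, 4, 5] as $id) {
    $broker->execute(new ArrayObject(['id' => $id]));
}

$broker->flush(); // commit the last, incomplete portion

echo $connection->commits, PHP_EOL;
```

With five data sets and a portion size of 2, the sketch commits three times: twice for full portions and once for the remainder. Fewer, larger commits usually mean less transaction overhead, at the cost of losing more work if a portion fails.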

Data import hooks

A data import hook is a data operation that the Spryker DataImporter executes before or after an import.

In the Spryker core DataImport module you’ll find two interfaces:

[Diagram: the two data import hook interfaces]

Interface | Purpose
DataImporterBeforeImportInterface | Runs before data import
DataImporterAfterImportInterface | Runs after data import. Example: call touch engine for modified data after import

How DataImporter runs data steps and hooks 

At the beginning of a data import, DataImporter runs the before-import hooks. It then passes each data set from the reader through the available data step brokers, and finalises the import by running the after-import hooks:

[Diagram: the order in which DataImporter runs hooks, data step brokers and data steps]
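The order of execution can also be sketched in plain PHP. The following is a minimal, self-contained illustration rather than the module's code: every Sketch* name is a hypothetical stand-in, a plain array plays the role of the data reader, and an ArrayObject plays the role of the data set.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins: NOT the Spryker interfaces, only their rough shape.
interface SketchHookInterface
{
    public function run(): void;
}

interface SketchStepInterface
{
    public function execute(ArrayObject $dataSet): void;
}

final class SketchStepBroker
{
    /** @var SketchStepInterface[] */
    private $steps = [];

    public function addStep(SketchStepInterface $step): self
    {
        $this->steps[] = $step;

        return $this;
    }

    public function execute(ArrayObject $dataSet): void
    {
        // Every step receives the same data set, so earlier steps can enrich it for later ones.
        foreach ($this->steps as $step) {
            $step->execute($dataSet);
        }
    }
}

final class SketchImporter
{
    private $reader;
    private $brokers;
    private $beforeHooks;
    private $afterHooks;

    public function __construct(iterable $reader, array $brokers, array $beforeHooks, array $afterHooks)
    {
        $this->reader = $reader;
        $this->brokers = $brokers;
        $this->beforeHooks = $beforeHooks;
        $this->afterHooks = $afterHooks;
    }

    public function import(): int
    {
        foreach ($this->beforeHooks as $hook) {
            $hook->run();
        }

        $imported = 0;
        foreach ($this->reader as $row) {
            $dataSet = new ArrayObject($row);
            foreach ($this->brokers as $broker) {
                $broker->execute($dataSet);
            }
            $imported++;
        }

        foreach ($this->afterHooks as $hook) {
            $hook->run();
        }

        return $imported;
    }
}

// Records the order of calls so the flow becomes visible.
final class SketchRecorder implements SketchHookInterface, SketchStepInterface
{
    public static $log = [];
    private $label;

    public function __construct(string $label)
    {
        $this->label = $label;
    }

    public function run(): void
    {
        self::$log[] = $this->label;
    }

    public function execute(ArrayObject $dataSet): void
    {
        self::$log[] = $this->label . ':' . $dataSet['sku'];
    }
}

$broker = (new SketchStepBroker())
    ->addStep(new SketchRecorder('step1'))
    ->addStep(new SketchRecorder('step2'));

$importer = new SketchImporter(
    [['sku' => 'A'], ['sku' => 'B']],
    [$broker],
    [new SketchRecorder('before')],
    [new SketchRecorder('after')]
);

$importedCount = $importer->import();

// Prints: before > step1:A > step2:A > step1:B > step2:B > after
echo implode(' > ', SketchRecorder::$log), PHP_EOL;
```

Hooks run once per import, while every step runs once per data set. That is why per-import work, such as notifying the touch engine in bulk, fits better in a hook than in a step.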

Below: a simplified class diagram of the DataImport module

[Class diagram: DataImport module, simplified]

Using DataImport in a project

DataImport is a tiny framework that gives you basic functionality to implement data import in your project. 

Create a simple data importer

The following steps explain how to create and register a straightforward importer with a single data step.

The first thing to do in your project is to create a data step class in the Pyz\Zed\DataImport\Business\Model namespace that implements DataImportStepInterface.


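// src/Pyz/Zed/DataImport/Business/Model/YourWriterStep.php

namespace Pyz\Zed\DataImport\Business\Model;

use Spryker\Zed\DataImport\Business\Model\DataImportStep\DataImportStepInterface;
use Spryker\Zed\DataImport\Business\Model\DataSet\DataSetInterface;

class YourWriterStep implements DataImportStepInterface
{
   // Called once for every data set delivered by the data reader.
   public function execute(DataSetInterface $dataSet): void
   {
       // process $dataSet and persist data
   }
}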

Additionally, you can implement DataImportStepBeforeExecuteInterface and / or DataImportStepAfterExecuteInterface in the very same or in a different class.
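As a rough illustration of a single class that combines all three roles, here is a self-contained sketch. It does not use the Spryker interfaces: the Sketch* interfaces, the beforeExecute and afterExecute method names, and the in-memory storage are assumptions made so the example can run on its own.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins that mirror the shape of the three step interfaces.
interface SketchStepInterface
{
    public function execute(ArrayObject $dataSet): void;
}

interface SketchBeforeExecuteInterface
{
    public function beforeExecute(): void;
}

interface SketchAfterExecuteInterface
{
    public function afterExecute(): void;
}

final class SketchProductWriterStep implements
    SketchStepInterface,
    SketchBeforeExecuteInterface,
    SketchAfterExecuteInterface
{
    /** @var array<string, string> in-memory replacement for a product table */
    public $storage = [];

    /** @var string[] SKUs that a real step might hand over to the touch engine */
    public $touched = [];

    /** @var string|null */
    private $currentSku;

    public function beforeExecute(): void
    {
        // Reset per-line state so nothing leaks from the previous data set.
        $this->currentSku = null;
    }

    public function execute(ArrayObject $dataSet): void
    {
        $this->currentSku = $dataSet['sku'];
        $this->storage[$this->currentSku] = $dataSet['name'];
    }

    public function afterExecute(): void
    {
        if ($this->currentSku !== null) {
            $this->touched[] = $this->currentSku;
        }
    }
}

// Mimics what a broker could do for one step: optional hooks around execute().
function runSketchStep(SketchStepInterface $step, ArrayObject $dataSet): void
{
    if ($step instanceof SketchBeforeExecuteInterface) {
        $step->beforeExecute();
    }

    $step->execute($dataSet);

    if ($step instanceof SketchAfterExecuteInterface) {
        $step->afterExecute();
    }
}

$step = new SketchProductWriterStep();
runSketchStep($step, new ArrayObject(['sku' => 'CHAIR-1', 'name' => 'Office chair']));
runSketchStep($step, new ArrayObject(['sku' => 'DESK-1', 'name' => 'Standing desk']));

print_r($step->touched);
```

Keeping the before and after logic in separate interfaces means a step only opts in to the callbacks it needs; a plain writer can implement the execute method alone.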

Now that you have a data step, create a data step broker and a new importer in your factory:

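// src/Pyz/Zed/DataImport/Business/DataImportBusinessFactory.php

namespace Pyz\Zed\DataImport\Business;

use Pyz\Zed\DataImport\Business\Model\YourWriterStep;
use Spryker\Zed\DataImport\Business\DataImportBusinessFactory as SprykerDataImportBusinessFactory;
use Spryker\Zed\DataImport\Business\Model\DataImporterInterface;
use Spryker\Zed\DataImport\Business\Model\DataSet\DataSetStepBrokerInterface;

class DataImportBusinessFactory extends SprykerDataImportBusinessFactory
{
   protected function createNewDataStep(): YourWriterStep
   {
       return new YourWriterStep();
   }

   // Wraps the new step in a transaction-aware broker.
   protected function getNewDataStepBroker(): DataSetStepBrokerInterface
   {
       $dataSetStepBroker = $this->createTransactionAwareDataSetStepBroker();
       $dataSetStepBroker->addStep($this->createNewDataStep());
       return $dataSetStepBroker;
   }

   // Builds a CSV importer from the project configuration and attaches the broker.
   protected function createNewImporter(): DataImporterInterface
   {
       $dataImporter = $this->getCsvDataImporterFromConfig($this->getConfig()->getDataImporterConfiguration());
       $dataSetStepBroker = $this->getNewDataStepBroker();
       $dataImporter->addDataSetStepBroker($dataSetStepBroker);
       return $dataImporter;
   }
}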

Last, but not least, register your new data importer in the collection of importers:

// src/Pyz/Zed/DataImport/Business/DataImportBusinessFactory.php

public function getImporter(): DataImporterInterface
{
   $dataImporterCollection = $this->createDataImporterCollection();
   $dataImporterCollection->addDataImporter($this->createNewImporter());

   return $dataImporterCollection;
}

How to add a new data reader

If your data is not in CSV files but in another format, such as XML, implement your own XmlDataReader as shown in this example:

// src/Pyz/Zed/DataImport/Business/Model/DataReader/XmlDataReader.php

namespace Pyz\Zed\DataImport\Business\Model\DataReader;

use SimpleXMLIterator;
use Spryker\Zed\DataImport\Business\Model\DataReader\DataReaderInterface;
use Spryker\Zed\DataImport\Business\Model\DataSet\DataSet;
use Spryker\Zed\DataImport\Business\Model\DataSet\DataSetInterface;

class XmlDataReader implements DataReaderInterface
{
   /**
    * @var \SimpleXMLIterator
    */
   private $iterator;

   public function __construct(string $filename)
   {
       // Loads the whole file into memory; fine for small files, costly for very large ones.
       $this->iterator = new SimpleXMLIterator(file_get_contents($filename));
   }

   public function current(): DataSetInterface
   {
       $data = $this->iterator->current();

       // Cast the XML element to an array so that its child elements become data set keys.
       return new DataSet((array)$data);
   }

   public function next(): void
   {
       $this->iterator->next();
   }

   public function key()
   {
       return $this->iterator->key();
   }

   public function valid(): bool
   {
       return $this->iterator->valid();
   }

   public function rewind(): void
   {
       $this->iterator->rewind();
   }
}

And then change how the data reader is created in a factory:

// src/Pyz/Zed/DataImport/Business/DataImportBusinessFactory.php

protected function createGlossaryImporter()
{
   $dataImporterConfigurationTransfer = $this->getConfig()->getGlossaryDataImporterConfiguration();
   $xmlReader = new XmlDataReader($dataImporterConfigurationTransfer->getReaderConfiguration()->getFileName());
  
   $dataImporter = $this->createDataImporter($dataImporterConfigurationTransfer->getImportType(), $xmlReader);
  
   // create data step broker here
  
   return $dataImporter;
}

How to output import progress to console

If your data import takes longer than a couple of seconds and you need to visualise how the import process is progressing, you’ll need to print some intermediate result in the meantime.

One option is to log messages and show them in the console output. The Spryker Log module, which uses the well-known Monolog library, is a good starting point.

First, create a LoggerConfig class that implements LoggerConfigInterface from the Log module.

// src/Pyz/Zed/DataImport/Business/Logger/LoggerConfig.php

namespace Pyz\Zed\DataImport\Business\Logger;

use Monolog\Handler\HandlerInterface;
use Monolog\Handler\StreamHandler;
use Monolog\Logger;
use Spryker\Shared\Log\Config\LoggerConfigInterface;

class LoggerConfig implements LoggerConfigInterface
{
   public function getChannelName(): string
   {
       return 'DataImport';
   }

   /**
    * @return \Monolog\Handler\HandlerInterface[]
    */
   public function getHandlers(): array
   {
       return [$this->createStreamHandler()];
   }

   /**
    * @return callable[]
    */
   public function getProcessors(): array
   {
       return [];
   }

   protected function createStreamHandler(): HandlerInterface
   {
       return new StreamHandler('php://stdout', Logger::INFO);
   }
}

Then use LoggerTrait to output messages from a data import step:

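// src/Pyz/Zed/DataImport/Business/Model/YourWriterStep.php

namespace Pyz\Zed\DataImport\Business\Model;

use Pyz\Zed\DataImport\Business\Logger\LoggerConfig;
use Spryker\Shared\Log\LoggerTrait;
use Spryker\Zed\DataImport\Business\Model\DataImportStep\DataImportStepInterface;
use Spryker\Zed\DataImport\Business\Model\DataSet\DataSetInterface;

class YourWriterStep implements DataImportStepInterface
{
   use LoggerTrait;

   public function execute(DataSetInterface $dataSet): void
   {
       // execute() runs once per data set, so this message is printed for every imported record.
       $this->getLogger(new LoggerConfig())->info('Start import...');

       // process data set
   }
}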

You might need to change the standard Monolog output format to print pretty messages with timestamps:

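// src/Pyz/Zed/DataImport/Business/Logger/LoggerConfig.php
// Requires an additional import: use Monolog\Formatter\LineFormatter;

protected function createStreamHandler(): HandlerInterface
{
   $handler = new StreamHandler('php://stdout', Logger::INFO);

   // "\033[1;30m" switches to bold dark grey for the timestamp, "\033[0m" resets the colour.
   $formatter = new LineFormatter("\033[1;30m%datetime%\033[0m %message%" . PHP_EOL, 'H:i:s');
   $handler->setFormatter($formatter);

   return $handler;
}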

How to use the DataImport console command

The DataImport module provides a console command to execute import:

vendor/bin/console data:import

By default, it runs every registered importer, even if some of the steps fail. As a result, you can see the following output at the end, without knowing the exact cause of the failure:

Importer type: category

Importable DataSets: 379

Imported DataSets: 0

Import status: Failed

To discover what caused an import to fail, run the import console command with the -t option:

vendor/bin/console data:import -t

Enabling this option will stop the data import immediately and will print an error message with full trace information.

Apart from that, you can use the console command to import a particular data subset using offset and limit. This command will import the first ten categories:

vendor/bin/console data:import:category -o 1 -l 10

Suggestions for improving DataImport

The following three points address some issues I encountered working with the DataImport module, along with some suggestions for how Spryker developers can improve the module going forward:

1. Batch processing

One of the biggest issues with DataImport is that it does not let you work with batches of import data. Data steps can only access a single data item at a time, even though batch operations are more efficient, especially if you work with a huge amount of data.

In my opinion, developers need something like DataImportStepInterface::executeForCollection, with a collection of configurable size passed as an argument of this method.
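To illustrate what such a batch-oriented step could look like, here is a minimal sketch. Nothing in it exists in the module: SketchCollectionStepInterface, SketchBatchingBroker and SketchBulkWriter are hypothetical names, and the batch size of 3 is arbitrary.

```php
<?php

declare(strict_types=1);

// Hypothetical contract: it only illustrates the suggestion, it is not part of the module.
interface SketchCollectionStepInterface
{
    /**
     * @param ArrayObject[] $dataSets
     */
    public function executeForCollection(array $dataSets): void;
}

// Buffers single data sets and hands them to the step in batches of $batchSize.
final class SketchBatchingBroker
{
    private $step;
    private $batchSize;
    private $buffer = [];

    public function __construct(SketchCollectionStepInterface $step, int $batchSize)
    {
        $this->step = $step;
        $this->batchSize = $batchSize;
    }

    public function execute(ArrayObject $dataSet): void
    {
        $this->buffer[] = $dataSet;

        if (count($this->buffer) >= $this->batchSize) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->buffer === []) {
            return;
        }

        $this->step->executeForCollection($this->buffer);
        $this->buffer = [];
    }
}

final class SketchBulkWriter implements SketchCollectionStepInterface
{
    /** @var int[] size of every batch received, kept only for demonstration */
    public $batchSizes = [];

    public function executeForCollection(array $dataSets): void
    {
        // A real writer could build one multi-row INSERT here instead of one query per item.
        $this->batchSizes[] = count($dataSets);
    }
}

$writer = new SketchBulkWriter();
$broker = new SketchBatchingBroker($writer, 3);

for ($i = 1; $i <= 7; $i++) {
    $broker->execute(new ArrayObject(['id' => $i]));
}

$broker->flush(); // hand over the last, incomplete batch

echo implode(', ', $writer->batchSizes), PHP_EOL;
```

Seven data sets with a batch size of 3 reach the writer as batches of 3, 3 and 1. The final flush matters: without it, the last incomplete batch would never be written.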

2. Communicating import progress

Another challenge is that the Spryker DataImport module does not provide a simple way to communicate with the console. The Demoshop import console command does not show any progress information and only prints the import results at the end of execution.

We’ve already discussed one possible solution for writing to console using logger. Another possible option is to use EventDispatcher.
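The event-based option might look roughly like the sketch below. It is an assumption-laden illustration: SketchEventDispatcher is a hypothetical minimal dispatcher written for this example (a real project would more likely rely on an existing event dispatcher library), and the event name data_import.progress and the reporting interval of 100 are made up.

```php
<?php

declare(strict_types=1);

// Minimal hypothetical dispatcher, written only to keep the example self-contained.
final class SketchEventDispatcher
{
    /** @var array<string, callable[]> */
    private $listeners = [];

    public function addListener(string $eventName, callable $listener): void
    {
        $this->listeners[$eventName][] = $listener;
    }

    public function dispatch(string $eventName, array $payload): void
    {
        foreach ($this->listeners[$eventName] ?? [] as $listener) {
            $listener($payload);
        }
    }
}

$dispatcher = new SketchEventDispatcher();
$lines = [];

// The console side subscribes and decides how to render progress.
$dispatcher->addListener('data_import.progress', function (array $payload) use (&$lines): void {
    $lines[] = sprintf('%d/%d imported', $payload['done'], $payload['total']);
});

// The import side only reports facts; it knows nothing about the console.
$total = 250;
for ($done = 1; $done <= $total; $done++) {
    // ... process one data set here ...

    if ($done % 100 === 0 || $done === $total) {
        $dispatcher->dispatch('data_import.progress', ['done' => $done, 'total' => $total]);
    }
}

echo implode(PHP_EOL, $lines), PHP_EOL;
```

Compared with logging from inside a step, this keeps the import code free of output concerns: the same events could feed a progress bar, a log file or a monitoring tool.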

3. More examples for other import data sources 

DataImport is so CSV-oriented that it seems to have been designed mostly for importing from CSV files. You can see this in DataImporterReaderConfigurationTransfer, where almost all fields relate to that specific file format: csvDelimiter, csvEnclosure, etc.

Wrapping up

As we’ve explored in this guide, the DataImport module is a useful way to simplify the process of importing data to your Spryker online shop. Thanks to an architecture that follows SOLID principles, it’s possible to configure a data import from any data source, whether static files or a PIM solution such as Akeneo.

For these reasons the Spryker DataImport module is flexible, capable of meeting the requirements of nearly every Spryker project, and is well worth checking out.

Proven to perform in the most complex and demanding environments, Spryker Commerce separates frontend apps and backend capabilities, so that retailers, multi-channel players, or marketplaces can very quickly adapt to changing consumer behaviours and markets. Talk to an ecommerce consultant today to explore if Spryker could be right for you.
