ETL Tool Selection

When evaluating ETL tools, look for the following characteristics:

Functional capability & complexity of data: This includes both the ‘transformation’ piece and the ‘cleansing’ piece. In general, typical ETL tools are geared towards either strong transformation capabilities or strong cleansing capabilities, but they are seldom very strong in both. As a result, if you know your data is going to be dirty coming in, make sure your ETL tool has strong cleansing capabilities. If you know there are going to be a lot of different data transformations, it makes sense to pick a tool that is strong in transformation. The more complex the data transformation / data cleansing requirements, the stronger the case for purchasing an ETL tool. However, if the data transformations are simple, or little data cleansing is required, then it may be sufficient to simply build the ETL routine from scratch.
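
To give a sense of what "building the ETL routine from scratch" can look like when the requirements are simple, here is a minimal sketch in Python using only the standard library. The column names (id, name, amount), the customers table and the cleansing rules are assumptions made up for illustration; they are not taken from any particular tool or data set.

```python
import csv
import io
import sqlite3

def extract(source):
    """Extract: read rows from a CSV file-like object as dictionaries."""
    return list(csv.DictReader(source))

def transform(rows):
    """Cleanse and transform rows (hypothetical rules, for illustration only)."""
    cleaned = []
    for row in rows:
        record_id = (row.get("id") or "").strip()
        if not record_id:
            continue  # cleansing: drop rows that have no identifier
        # transformation: trim whitespace and normalize capitalization
        name = (row.get("name") or "").strip().title()
        try:
            amount = round(float(row.get("amount") or 0), 2)
        except ValueError:
            amount = 0.0  # assumption: unparseable amounts default to zero
        cleaned.append((record_id, name, amount))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id TEXT PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(source, conn):
    """Run extract, transform and load; return the number of rows loaded."""
    rows = transform(extract(source))
    load(rows, conn)
    return len(rows)

if __name__ == "__main__":
    sample = io.StringIO("id,name,amount\n1, alice smith ,10.5\n,ghost,3\n2,BOB,abc\n")
    connection = sqlite3.connect(":memory:")
    print(run_etl(sample, connection), "rows loaded")
```

Once the cleansing rules grow well beyond a handful of checks like these, maintaining such a script by hand becomes harder, which is the point at which a purchased tool starts to pay off.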

Ability to read directly from your data source: For each organization, there is a different set of data sources. Make sure the ETL tool you select can connect directly to your source data / legacy databases. Ideally, the ETL tool selected should be able to read data from flat files, XML files, Excel files, Ideation services and some common databases like Oracle, SQL Server, MS Access, Sybase, IBM DB2, MySQL and PostgreSQL. It should also support connection to databases using ODBC and JDBC connectors.
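
One simple way to apply this criterion is to list the sources your organization actually uses and compare that list with the connectors a candidate tool advertises. The sketch below assumes both lists are plain strings that you have collected yourself; the function name missing_connectors and the sample values are hypothetical.

```python
def missing_connectors(required_sources, supported_connectors):
    """Return the required sources that the tool cannot read directly.

    The comparison ignores case and surrounding whitespace.
    """
    supported = {name.strip().lower() for name in supported_connectors}
    return sorted(
        source for source in required_sources
        if source.strip().lower() not in supported
    )

if __name__ == "__main__":
    # Hypothetical inventory of sources and a hypothetical vendor connector list.
    required = ["Oracle", "Excel", "XML", "Sybase"]
    supported = ["oracle", "xml", "ODBC", "JDBC"]
    print(missing_connectors(required, supported))
```

The sketch does not model generic connectors: a source that shows up as missing may still be reachable through ODBC or JDBC, so treat the result as a list of items to check with the vendor rather than a final verdict.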

Speed / performance & data volume: Available commercial tools typically have features that can speed up data movement. If the data volume is only a few gigabytes, then an open-source tool or even an in-house ETL process is sufficient. But if the data volume is more than 10 gigabytes or runs into terabytes, then a commercial ETL tool with built-in performance-boosting features should be used.

Metadata support: The ETL tool plays a key role in your metadata because it maps the source data to the destination, which is an important piece of the metadata. In fact, some organizations have come to rely on the documentation of their ETL tool as their metadata source. As a result, it is very important to select an ETL tool that works with your overall metadata strategy.
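
The source-to-destination mapping mentioned above can be pictured as a small data structure. The following sketch is an illustration only: the ColumnMapping class, the table and column names and the rule descriptions are all hypothetical, and real ETL tools store this information in their own repositories and formats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMapping:
    """One source-to-target mapping entry (hypothetical structure)."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    rule: str  # human-readable description of the transformation

# Hypothetical mappings for a small customer dimension.
MAPPINGS = [
    ColumnMapping("crm_customers", "cust_nm", "dim_customer", "customer_name", "trim, title case"),
    ColumnMapping("crm_customers", "cust_id", "dim_customer", "customer_id", "copy as is"),
    ColumnMapping("billing", "amt", "fact_sales", "amount", "convert to USD"),
]

def lineage(target_table, target_column, mappings):
    """Answer 'where does this target column come from?' using the mappings."""
    return [
        m for m in mappings
        if m.target_table == target_table and m.target_column == target_column
    ]

def describe(mappings):
    """Render the mappings as documentation lines, one per column."""
    return [
        f"{m.source_table}.{m.source_column} -> "
        f"{m.target_table}.{m.target_column} ({m.rule})"
        for m in mappings
    ]

if __name__ == "__main__":
    for line in describe(MAPPINGS):
        print(line)
```

The same list drives both the lineage lookup and the generated documentation, which is why the mapping held by an ETL tool can end up serving as an organization's metadata source.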

Cost: When deciding on an ETL tool, its total cost of ownership must be taken into account. This includes the cost of licenses, support, training and consultation from the vendor. Some tools, like Informatica PowerCenter and Oracle Warehouse Builder, require separate medium to high-end servers to be deployed. The cost of the hardware required to deploy the ETL tool should also be accounted for when calculating the total cost.
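
As a rough worked example of the total cost calculation, the sketch below adds up the cost components named above. It assumes that licenses and support are paid annually while training, consultation and hardware are one-time costs; actual vendor pricing models vary, and all figures are invented for illustration.

```python
def total_cost_of_ownership(license_per_year, support_per_year,
                            training, consulting, hardware, years):
    """Estimate total cost of ownership over a number of years.

    Assumption: license and support recur every year; training,
    consulting and hardware are paid once.
    """
    recurring = (license_per_year + support_per_year) * years
    one_time = training + consulting + hardware
    return recurring + one_time

if __name__ == "__main__":
    # Invented figures for a three-year comparison horizon.
    cost = total_cost_of_ownership(
        license_per_year=50_000,
        support_per_year=10_000,
        training=8_000,
        consulting=12_000,
        hardware=30_000,
        years=3,
    )
    print(cost)  # (50000 + 10000) * 3 + 8000 + 12000 + 30000 = 230000
```

Running the same calculation for each candidate over the same number of years makes the comparison fairer than looking at license prices alone.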

Learning curve / implementation time: This is the time users need to become familiar with the tool's development environment and start using it productively.

Vendor history & support: Should you decide to purchase an existing third-party ETL tool, you must then decide which one to buy. Often there are a number of choices to pick from; some are well known while others are less so. When buying a third-party tool, you should consider the history of the tool vendor, its market standing, product reviews and the support it provides.
