DB comparison tool
Posted by Bill Ramos | On January 27th, 2017 | In Database Migration, Database Upgrades | Tags: data migration, Database Compare Suite, database migration, DB2, MySQL, Netezza, oracle, PostgreSQL, Redshift, sql server, sybase, Teradata
Many of us regularly use Database Compare Suite — a tool developed by DB Best — for database migration and upgrade projects. This powerful and handy software solution helps confirm successful database migrations by verifying that the data made it to the destination table.
Customers reported to us that comparing data between two huge tables simply took too long just to see if they were different. We listened and did something about it. This latest version of Database Compare Suite introduces a new feature called Fast Data Comparison to solve this problem.
Data comparison has always been the core of the Database Compare Suite product. Originally, we included two data comparison operations in this application:
- Comparing Data. A basic operation that displays the number of differences in two tables but does not show where they exactly are.
- Comparing Data in Detail. Advanced operation that can show exactly which rows are different.
But they both were not suitable for faster comparison of big data sets in data warehouses.
So, we are proud to introduce the newest Fast Data Comparison feature capable of comparing the tables with millions of rows in just minutes, not hours or days.
Check the following video to learn more about how the Fast Data Comparison operation works.
Background on Fast Data Comparison
Let’s explore how this feature works and discover why the Fast Data Comparison operation is actually so fast. First, you should remember that the Fast Data Comparison can only tell whether the tables are equal or not. It cannot tell the number of differences or show exactly where they are found.
Fast Data Comparison operation uses a checksum algorithm with a hash function. Fast Data Comparison calculates the hash value for each compared table using a specific hash function. Then it compares these hash values, but not the tables’ data row by row as in the regular Data Comparison operation.
This is why the Fast Data Comparison operation delivers the result so quickly. The checksum is calculated directly on the database server. The execution speed of this operation doesn’t depend on the speed of internet connection and the overall performance of your computer. Whereas, the regular and detailed data comparison operations are handled on the client computer running Database Compare Suite.
So, what does this mean really in terms of performance? We tested Fast Data Comparison and the regular Data Comparison operations on a table with 600 million rows. The first operation finished in just over an hour, and the second one lasted for 2 weeks! This is a 330x boost in performance!
For smaller tables the difference is not as spectacular. In our test comparing two tables with 1 million rows, Fast Data Comparison completed in 11 seconds versus 70 seconds for a 7x boost in performance.
Fast Data Comparison Limitations
However, the amazing Fast Data Comparison operation has some restrictions. It’s available only for a limited set of source and target database environment combinations. You can use Fast Data Comparison for all homogenous operations (we mean comparing two databases in the same environment). Also, the impressive Fast Data Comparison feature is supported for the following heterogeneous data comparison operations:
- Amazon Redshift — Oracle;
- Amazon Redshift — Teradata;
- Oracle — Microsoft SQL Server.
The option becomes unavailable for other source and targets.
Calculating the checksum, one should keep in mind that some data fields can be excluded from comparison or partially truncated. For example, date or time columns can be excluded from the comparison operation or maximum length of string and binary columns can be set, what will be considered during checksum calculation. These preferences are to be managed in the operation settings page.
Database Compare Suite now supports 9 database environments. They include Amazon Redshift, Microsoft Azure SQL Database, IBM DB2, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, Sybase ASE. This means that there are 45 pairs of databases available for homogenous and heterogeneous database operations (schema and data comparison, migration and synchronization).
We are going to expand the number of supported databases by adding Greenplum to the current list of 9 database environments in the next release. So, stay tuned for our updates of this awesome product for database administrators and developers.
The Database Compare Suite tool is the DB Best’s ultimate solution for comparing and synchronizing schemas and data along with many migration possibilities between homogeneous and heterogeneous database environments.