org.afcs.warts.db
Class DataComparison

java.lang.Object
  extended by org.afcs.warts.db.DataComparison
All Implemented Interfaces:
TabularData

public final class DataComparison
extends java.lang.Object
implements TabularData

A DataComparison instance analyses and stores the differences between two datasets. Datasets can only be compared if the tables from which they were loaded are "equivalent", which means that they have the same columns, even if they are in different schemas.

The first step in comparing datasets is to build a list containing the union of the primary keys in the first and second datasets. We then iterate through these keys. If a row with a given key exists in the first dataset but not in the second, that row is marked as a required insertion. If a row with the key exists in the second dataset but not in the first, that row is marked as a required deletion.
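
The key-union step described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the class name KeyUnionSketch, the use of maps keyed by a single String primary key, the constant values, and the ROW_PRESENT_IN_BOTH placeholder are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class KeyUnionSketch {
    // Hypothetical values; the real ROW_* constant values are not documented here.
    static final int ROW_INSERTION = 0;
    static final int ROW_DELETION = 1;
    // Placeholder for "present in both"; column comparison would then decide
    // between unchanged and updated.
    static final int ROW_PRESENT_IN_BOTH = 2;

    /** Classifies every key in the sorted union of the two key sets. */
    static Map<String, Integer> classify(Map<String, ?> setOne, Map<String, ?> setTwo) {
        TreeSet<String> keys = new TreeSet<>(setOne.keySet());
        keys.addAll(setTwo.keySet());
        Map<String, Integer> result = new TreeMap<>();
        for (String key : keys) {
            boolean inOne = setOne.containsKey(key);
            boolean inTwo = setTwo.containsKey(key);
            if (inOne && !inTwo) {
                result.put(key, ROW_INSERTION);        // only in the first dataset
            } else if (!inOne && inTwo) {
                result.put(key, ROW_DELETION);         // only in the second dataset
            } else {
                result.put(key, ROW_PRESENT_IN_BOTH);  // needs column-by-column comparison
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> one = new HashMap<>();
        one.put("a", Arrays.asList("x"));
        one.put("b", Arrays.asList("y"));
        Map<String, List<String>> two = new HashMap<>();
        two.put("b", Arrays.asList("y"));
        two.put("c", Arrays.asList("z"));
        System.out.println(classify(one, two));
    }
}
```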

If the row exists in both datasets, then we iterate through all of the columns looking for differences. If one of the values is null and one isn't, then that's an obvious difference. If both values are non-null and are instances of DataHighBitAnalysis, then we compare the strings contained in the analysis. If the 'compare encoding' flag was set to true at initialisation, then we also compare the 'data class' calculated by the analysis. For any other data type, we simply use Object.equals(java.lang.Object) to compare them.
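
The per-column rules above might look roughly like the sketch below. TextAnalysis is a hypothetical stand-in for DataHighBitAnalysis (whose real API is not shown on this page), with an assumed text field and an assumed integer 'data class'. Treating two null values as equal is also an assumption, since the description only covers the case where exactly one value is null.

```java
final class CellCompareSketch {
    /** Hypothetical stand-in for DataHighBitAnalysis: a string plus a computed data class. */
    static final class TextAnalysis {
        final String text;
        final int dataClass;

        TextAnalysis(String text, int dataClass) {
            this.text = text;
            this.dataClass = dataClass;
        }
    }

    /** Returns true if the two cell values should be reported as different. */
    static boolean differs(Object a, Object b, boolean compareEncoding) {
        if (a == null || b == null) {
            // Different only when exactly one side is null (assumption for the both-null case).
            return a != b;
        }
        if (a instanceof TextAnalysis && b instanceof TextAnalysis) {
            TextAnalysis x = (TextAnalysis) a;
            TextAnalysis y = (TextAnalysis) b;
            if (!x.text.equals(y.text)) {
                return true;
            }
            // Same text: only an encoding difference remains possible.
            return compareEncoding && x.dataClass != y.dataClass;
        }
        // Any other data type falls back to Object.equals.
        return !a.equals(b);
    }

    public static void main(String[] args) {
        TextAnalysis latin = new TextAnalysis("caf\u00e9", 1);
        TextAnalysis utf8 = new TextAnalysis("caf\u00e9", 2);
        System.out.println(differs(latin, utf8, false));
        System.out.println(differs(latin, utf8, true));
    }
}
```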

LICENSE: This code is released to the public domain and may be used for any purpose whatsoever without permission or acknowledgment.

Version:
Last Modified 17 September 2003
Author:
Warren Hedley ( whedley at sdsc dot edu )

Field Summary
static int ROW_DELETION
          This may be returned from getRowClassification(int) to indicate that the specified row requires deletion (i.e., was in the second dataset but not in the first).
static int ROW_INSERTION
          This may be returned from getRowClassification(int) to indicate that the specified row requires insertion (i.e., was in the first dataset but not in the second).
static int ROW_UNCHANGED
          This may be returned from getRowClassification(int) to indicate that all columns in the specified row have identical values in both datasets.
static int ROW_UPDATED
          This may be returned from getRowClassification(int) to indicate that the specified row was in both datasets but does not have identical values in all non-primary key columns (although some differences may just be encoding differences).
 
Constructor Summary
DataComparison(DataSet setOne, DataSet setTwo)
          Initialises a comparison of the two data sets.
DataComparison(DataSet setOne, DataSet setTwo, boolean compareEncoding)
          Initialises a comparison of the two data sets.
DataComparison(DataSet setOne, DataSet setTwo, boolean[] compareColumns, boolean compareEncoding)
          Initialises a comparison of the two data sets, optionally excluding comparison of certain columns, and optionally ignoring encoding differences.
 
Method Summary
 boolean foundDifferences()
          Returns true if differences were found between the datasets provided at initialisation.
 java.util.List getAllColumnValues(int rowIndex)
          Returns a list containing the values of both the primary key columns and the non-primary key columns for the specified row.
 TableDescription getDestTableDescription()
          Returns the table description for the second dataset, which also provides access to account information.
 java.util.List getNonPrimaryKeyColumnValues(int rowIndex)
          Returns an unmodifiable list containing the non-primary key column values for the specified row.
 int getNumRows()
          Returns the total number of rows in the comparison, which is defined by the union of the primary keys from the first and second datasets.
 int getNumRowsDeleted()
          Returns the number of rows that were in dataset two and not in dataset one (i.e., rows classified as ROW_DELETION), based on the presence of primary keys.
 int getNumRowsInserted()
          Returns the number of rows that were in dataset one and not in dataset two (i.e., rows classified as ROW_INSERTION), based on the presence of primary keys.
 int getNumRowsUnchanged()
          Returns the number of rows that were in both datasets (based on the presence of identical primary keys) with unchanged non-primary key column values (considering only the columns that were compared).
 int getNumRowsUpdated()
          Returns the number of rows that were in both datasets (based on the presence of identical primary keys) with changed non-primary key column values (considering only the columns that were compared), where these changes might be just encoding differences.
 int getNumValuesInColumnUpdated(int columnIndex, boolean justEncodingDifferences)
          Returns the number of values in the columnIndex'th non-primary key column that differed between the source and destination data sets.
 java.util.List getPrimaryKeyColumnValues(int rowIndex)
          Returns an unmodifiable list containing the primary key column values for the specified row.
 int getRowClassification(int rowIndex)
          Returns the row classification (which will be equal to one of the ROW constants defined in this class) for the specified row.
 TableDescription getSourceTableDescription()
          Returns the table description for the first dataset, which also provides access to account information.
 TableDescription getTableDescription()
          Returns the table description for the first dataset, which also provides access to account information.
 java.lang.Object getValueAt(int rowIndex, int columnIndex)
          Returns the object stored during the comparison of the data at the specified row and column indices.
 boolean rowHasEncodingDifferences(int rowIndex)
          Returns true if the specified row contains differences between the two datasets that are just encoding differences.
 void rowSynchronized(int rowIndex)
          This method is called by DataSynchronizationAction after synchronization has been performed on the specified row. It marks the row as unchanged, replacing any CellDifference instances in the row with the new value.
 java.lang.String toString()
          Returns a text description of the current instance that can be used for debugging purposes.
 boolean wasColumnCompared(int columnIndex)
          Returns true if the values in the columnIndex'th non-primary key column of the source and destination data sets were compared during analysis.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

ROW_INSERTION

public static final int ROW_INSERTION
This may be returned from getRowClassification(int) to indicate that the specified row requires insertion (i.e., was in the first dataset but not in the second).

See Also:
Constant Field Values

ROW_DELETION

public static final int ROW_DELETION
This may be returned from getRowClassification(int) to indicate that the specified row requires deletion (i.e., was in the second dataset but not in the first).

See Also:
Constant Field Values

ROW_UPDATED

public static final int ROW_UPDATED
This may be returned from getRowClassification(int) to indicate that the specified row was in both datasets but does not have identical values in all non-primary key columns (although some differences may just be encoding differences).

See Also:
Constant Field Values

ROW_UNCHANGED

public static final int ROW_UNCHANGED
This may be returned from getRowClassification(int) to indicate that all columns in the specified row have identical values in both datasets.

See Also:
Constant Field Values
Constructor Detail

DataComparison

public DataComparison(DataSet setOne,
                      DataSet setTwo)
Initialises a comparison of the two data sets. All columns will be compared and encoding comparisons will be performed for all textual data.

Parameters:
setOne - The first data set to look at.
setTwo - The second data set to look at.
Throws:
java.lang.IllegalArgumentException - If the two datasets cannot be compared because the table descriptions associated with each dataset are not equivalent.
java.lang.NullPointerException - If either argument is null.

DataComparison

public DataComparison(DataSet setOne,
                      DataSet setTwo,
                      boolean compareEncoding)
Initialises a comparison of the two data sets. All columns will be compared; encoding comparisons will be performed for textual data only if compareEncoding is true.

Parameters:
setOne - The first data set to look at.
setTwo - The second data set to look at.
compareEncoding - Whether to compare the encoding of textual data.
Throws:
java.lang.IllegalArgumentException - If the two datasets cannot be compared because the table descriptions associated with each dataset are not equivalent.
java.lang.NullPointerException - If either argument is null.

DataComparison

public DataComparison(DataSet setOne,
                      DataSet setTwo,
                      boolean[] compareColumns,
                      boolean compareEncoding)
Initialises a comparison of the two data sets, optionally excluding comparison of certain columns, and optionally ignoring encoding differences.

Parameters:
setOne - The first data set to look at.
setTwo - The second data set to look at.
compareColumns - An array with a flag for each non-primary key column (the array must be that length) indicating whether comparisons should be run for that column. If null, all columns will be compared.
compareEncoding - Whether to compare the encoding of textual data.
Throws:
java.lang.IllegalArgumentException - If the two datasets cannot be compared because the table descriptions associated with each dataset are not equivalent.
java.lang.NullPointerException - If either dataset is null (compareColumns may be null).
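
The compareColumns contract described for the third constructor (one flag per non-primary key column, with null meaning "compare everything") can be illustrated with a small helper. This is only a sketch: the class CompareMaskSketch and its normalise method are hypothetical, and throwing IllegalArgumentException for a wrong-length array is an assumption, since this page only states that the array must be that length.

```java
import java.util.Arrays;

final class CompareMaskSketch {
    /**
     * Returns a mask with one flag per non-primary key column.
     * A null input is expanded to "compare every column".
     */
    static boolean[] normalise(boolean[] compareColumns, int numNonPrimaryKeyColumns) {
        if (compareColumns == null) {
            boolean[] all = new boolean[numNonPrimaryKeyColumns];
            Arrays.fill(all, true);
            return all;
        }
        if (compareColumns.length != numNonPrimaryKeyColumns) {
            // Assumed behaviour; the documented contract does not name the exception.
            throw new IllegalArgumentException("expected " + numNonPrimaryKeyColumns
                    + " flags but received " + compareColumns.length);
        }
        // Defensive copy so later changes by the caller do not affect the mask.
        return compareColumns.clone();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalise(null, 3)));
        System.out.println(Arrays.toString(normalise(new boolean[] {true, false}, 2)));
    }
}
```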
Method Detail

getValueAt

public java.lang.Object getValueAt(int rowIndex,
                                   int columnIndex)

Returns the object stored during the comparison of the data at the specified row and column indices. The row index references into a row list that is the union of all primary keys in both data sets, sorted in primary key order. The columns are stored in the same order as the table description.

If the specified row is an insertion, then the value returned will be from the first dataset. If the specified row is a deletion, then the value returned will be from the second dataset. A CellDifference instance will be returned if the indices reference a non-primary key column value that differed between the two datasets received at initialisation.

Specified by:
getValueAt in interface TabularData
Parameters:
rowIndex - The row to return data from.
columnIndex - The column to return data from.
Returns:
The object stored during the comparison of the data at the specified row and column indices.
Throws:
java.lang.IllegalArgumentException - If either index is out of bounds.

getAllColumnValues

public java.util.List getAllColumnValues(int rowIndex)
Returns a list containing the values of both the primary key columns and the non-primary key columns for the specified row. The list is constructed in this method and no internal reference is kept, so it may be modified with impunity.

Parameters:
rowIndex - The row to return the list of column values for.
Returns:
A list containing the values of both the primary key columns and the non-primary key columns for the specified row.

getPrimaryKeyColumnValues

public java.util.List getPrimaryKeyColumnValues(int rowIndex)
Returns an unmodifiable list containing the primary key column values for the specified row.

Parameters:
rowIndex - The row to return the list of primary key columns for.
Returns:
An unmodifiable list containing the primary key column values for the specified row.

getNonPrimaryKeyColumnValues

public java.util.List getNonPrimaryKeyColumnValues(int rowIndex)
Returns an unmodifiable list containing the non-primary key column values for the specified row.

Parameters:
rowIndex - The row to return the list of non-primary key column values for.
Returns:
An unmodifiable list containing the non-primary key column values for the specified row.

foundDifferences

public boolean foundDifferences()
Returns true if differences were found between the datasets provided at initialisation.

Returns:
True if differences were found between the datasets provided at initialisation.

getNumRows

public int getNumRows()
Returns the total number of rows in the comparison, which is defined by the union of the primary keys from the first and second datasets.

Specified by:
getNumRows in interface TabularData
Returns:
The total number of rows in the comparison.

getNumRowsDeleted

public int getNumRowsDeleted()
Returns the number of rows that were in dataset two and not in dataset one (i.e., rows classified as ROW_DELETION), based on the presence of primary keys.

Returns:
The number of rows that were in dataset two and not in dataset one.

getNumRowsInserted

public int getNumRowsInserted()
Returns the number of rows that were in dataset one and not in dataset two (i.e., rows classified as ROW_INSERTION), based on the presence of primary keys.

Returns:
The number of rows that were in dataset one and not in dataset two.

getNumRowsUnchanged

public int getNumRowsUnchanged()
Returns the number of rows that were in both datasets (based on the presence of identical primary keys) with unchanged non-primary key column values (considering only the columns that were compared).

Returns:
The number of rows that were in both datasets with unchanged non-primary key column values.

getNumRowsUpdated

public int getNumRowsUpdated()
Returns the number of rows that were in both datasets (based on the presence of identical primary keys) with changed non-primary key column values (considering only the columns that were compared), where these changes might be just encoding differences.

Returns:
The number of rows that were in both datasets with changed non-primary key column values.

getRowClassification

public int getRowClassification(int rowIndex)
Returns the row classification (which will be equal to one of the ROW constants defined in this class) for the specified row.

Parameters:
rowIndex - The row to examine.
Returns:
The row classification for the specified row.
Throws:
java.lang.IllegalArgumentException - If rowIndex is out of range.
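
A caller can walk the comparison row by row using getNumRows() and getRowClassification(int). The sketch below tallies rows per classification code. The RowClassifier interface is a hypothetical minimal view containing only those two methods, introduced so the example compiles without the rest of the package; it is not part of the documented API.

```java
import java.util.Map;
import java.util.TreeMap;

final class ClassificationTallySketch {
    /** Hypothetical minimal view of the two documented methods used below. */
    interface RowClassifier {
        int getNumRows();

        int getRowClassification(int rowIndex);
    }

    /** Counts how many rows carry each classification code. */
    static Map<Integer, Integer> tally(RowClassifier comparison) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int row = 0; row < comparison.getNumRows(); row++) {
            int code = comparison.getRowClassification(row);
            Integer previous = counts.get(code);
            counts.put(code, previous == null ? 1 : previous + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up classification codes, used only to exercise the loop.
        final int[] codes = {3, 1, 3, 2, 3};
        RowClassifier fake = new RowClassifier() {
            public int getNumRows() {
                return codes.length;
            }

            public int getRowClassification(int rowIndex) {
                return codes[rowIndex];
            }
        };
        System.out.println(tally(fake));
    }
}
```

With a real instance, the per-code totals would be expected to agree with getNumRowsInserted(), getNumRowsDeleted(), getNumRowsUpdated() and getNumRowsUnchanged(), although this page does not state that explicitly.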

rowHasEncodingDifferences

public boolean rowHasEncodingDifferences(int rowIndex)
Returns true if the specified row contains differences between the two datasets that are just encoding differences.

Parameters:
rowIndex - The row to examine.
Returns:
True if the specified row contains differences between the two datasets that are just encoding differences.
Throws:
java.lang.IllegalArgumentException - If rowIndex is out of range.

wasColumnCompared

public boolean wasColumnCompared(int columnIndex)
Returns true if the values in the columnIndex'th non-primary key column of the source and destination data sets were compared during analysis.

Parameters:
columnIndex - The index of the column to check (primary key columns are never compared, so indexing covers the non-primary key columns only).
Returns:
True if the values in the columnIndex'th non-primary key column of the source and destination data sets were compared during analysis.
Throws:
java.lang.IllegalArgumentException - If columnIndex is out of range.

getNumValuesInColumnUpdated

public int getNumValuesInColumnUpdated(int columnIndex,
                                       boolean justEncodingDifferences)
Returns the number of values in the columnIndex'th non-primary key column that differed between the source and destination data sets.

Parameters:
columnIndex - The index of the column to return results for (primary key columns are never compared, so indexing covers the non-primary key columns only).
justEncodingDifferences - If true, only the number of encoding differences is returned.
Returns:
The number of values in the columnIndex'th non-primary key column that differed between the source and destination data sets.
Throws:
java.lang.IllegalArgumentException - If columnIndex is out of range.

getTableDescription

public TableDescription getTableDescription()
Returns the table description for the first dataset, which also provides access to account information. This method is needed to implement the TabularData interface.

Specified by:
getTableDescription in interface TabularData
Returns:
The table description for the first dataset.

getSourceTableDescription

public TableDescription getSourceTableDescription()
Returns the table description for the first dataset, which also provides access to account information.

Returns:
The table description for the first dataset.

getDestTableDescription

public TableDescription getDestTableDescription()
Returns the table description for the second dataset, which also provides access to account information.

Returns:
The table description for the second dataset.

rowSynchronized

public void rowSynchronized(int rowIndex)
This method is called by DataSynchronizationAction after synchronization has been performed on the specified row. It marks the row as unchanged, replacing any CellDifference instances in the row with the new value. It is assumed that all text values were inserted into the second database as UTF-8, so where the source was Latin-1, this may be marked as an encoding difference.

Parameters:
rowIndex - The row to mark as synchronized.
Throws:
java.lang.IllegalArgumentException - If rowIndex is out of range.

toString

public java.lang.String toString()
Returns a text description of the current instance that can be used for debugging purposes.

Returns:
A text description of the current instance that can be used for debugging purposes.