org.afcs.warts.db
Class TableHighBitAnalysis

java.lang.Object
  extended by org.afcs.warts.db.TableHighBitAnalysis
All Implemented Interfaces:
TabularData

public final class TableHighBitAnalysis
extends java.lang.Object
implements TabularData

The TableHighBitAnalysis class compiles a report on the non-ASCII characters (or "high bits") found in an entire table's worth of data. An instance is constructed via findHighBits(DataSet) and wraps the DataSet it receives, providing access to the source data and table description, so that references to the original DataSet are no longer needed.
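
As a rough illustration of what "high bits" means here, the sketch below flags the columns of a table whose cells contain any byte with the top bit (0x80) set. It is a minimal stand-in under stated assumptions, not this class's implementation: the HighBitScanSketch class, its method names, and the byte[][][] table layout (rows, then columns, then cell bytes) are all hypothetical, and no DataSet is involved.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration; not part of org.afcs.warts.db. */
final class HighBitScanSketch {

    /** Returns true if any byte in the cell has its high bit (0x80) set. */
    static boolean hasHighBit(byte[] cell) {
        for (byte b : cell) {
            if ((b & 0x80) != 0) {
                return true;
            }
        }
        return false;
    }

    /** Flags, per column, whether any cell in that column holds a non-ASCII byte. */
    static boolean[] columnsWithHighBits(byte[][][] rows, int numColumns) {
        boolean[] flagged = new boolean[numColumns];
        for (byte[][] row : rows) {
            for (int col = 0; col < numColumns; col++) {
                // Null cells are treated as empty (an assumption of this sketch).
                if (row[col] != null && hasHighBit(row[col])) {
                    flagged[col] = true;
                }
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        byte[][][] rows = {
            { "abc".getBytes(StandardCharsets.US_ASCII),
              "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1) },
        };
        boolean[] flagged = columnsWithHighBits(rows, 2);
        System.out.println("column 0: " + flagged[0] + ", column 1: " + flagged[1]);
    }
}
```

The per-column flags loosely correspond to what containsNonAsciiBytes(int) reports, and a logical OR over them to what foundNonAsciiBytes() reports.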

LICENSE: This code is released to the public domain and may be used for any purpose whatsoever without permission or acknowledgment.

Version:
Last Modified 17 September 2003
Author:
Warren Hedley ( whedley at sdsc dot edu )

Field Summary
static int PROP_2_BYTE_UTF_8_CHAR
          Used to obtain a count of 2 byte UTF-8 characters or rows containing 2 byte UTF-8 characters in a specific column.
static int PROP_3_BYTE_UTF_8_CHAR
          Used to obtain a count of 3 byte UTF-8 characters or rows containing 3 byte UTF-8 characters in a specific column.
static int PROP_AMBIGUOUS_BYTE
          Used to obtain a count of ambiguous bytes or rows containing ambiguous bytes in a specific column.
static int PROP_ILLEGAL_BYTE
          Used to obtain a count of illegal bytes or rows containing illegal bytes in a specific column.
static int PROP_LATIN_1_CHAR
          Used to obtain a count of Latin-1 characters or rows containing Latin-1 characters in a specific column.
static int PROP_OVERSIZED
          Used to obtain a count of the oversized rows in a specific column.
 
Method Summary
 boolean containsNonAsciiBytes(int columnIndex)
          Returns true if the specified column contains any non-ASCII bytes.
static TableHighBitAnalysis findHighBits(DataSet tableData)
          Constructs a new instance containing the summation of all non-ASCII character analysis objects contained in the specified data set.
 boolean foundNonAsciiBytes()
          Returns true if non-ASCII character analysis found any non-ASCII bytes in the table.
 int getByteAnalysisProperty(int columnIndex, int propertyIndex)
          Returns the specified property of the byte-wise analysis of the specified column.
 int getNumRows()
          Returns the number of rows in the dataset analysed by this instance.
 int getRowAnalysisProperty(int columnIndex, int propertyIndex)
          Returns the specified property of the row-wise analysis of the specified column.
 TableDescription getTableDescription()
          Returns the description of the table from which the dataset analysed by this instance was loaded.
 java.lang.Object getValueAt(int rowIndex, int columnIndex)
          Returns the value at the specified row and column from the dataset analysed by this instance.
 boolean isColumnAnalysed(int columnIndex)
          Returns true if the specified column contains character data and is thus analysed.
 java.lang.String toString()
          Returns a text description of the current instance that can be used for debugging purposes.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PROP_LATIN_1_CHAR

public static final int PROP_LATIN_1_CHAR
Used to obtain a count of Latin-1 characters or rows containing Latin-1 characters in a specific column.

See Also:
Constant Field Values

PROP_2_BYTE_UTF_8_CHAR

public static final int PROP_2_BYTE_UTF_8_CHAR
Used to obtain a count of 2 byte UTF-8 characters or rows containing 2 byte UTF-8 characters in a specific column.

See Also:
Constant Field Values

PROP_3_BYTE_UTF_8_CHAR

public static final int PROP_3_BYTE_UTF_8_CHAR
Used to obtain a count of 3 byte UTF-8 characters or rows containing 3 byte UTF-8 characters in a specific column.

See Also:
Constant Field Values
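
To make the 2-byte and 3-byte UTF-8 counts concrete, the following sketch counts multi-byte sequences by inspecting the lead byte (110xxxxx for 2-byte, 1110xxxx for 3-byte) and checking that the right number of continuation bytes (10xxxxxx) follow. It is an assumption about how such a count could be made, not the analysis performed by this package: the Utf8LeadByteSketch class and its methods are hypothetical, and the check deliberately ignores overlong encodings and surrogate ranges.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration; not part of org.afcs.warts.db. */
final class Utf8LeadByteSketch {

    /** Counts sequences of exactly {@code length} bytes (2 or 3) that look like UTF-8. */
    static int countSequences(byte[] data, int length) {
        int count = 0;
        int i = 0;
        while (i < data.length) {
            int b = data[i] & 0xFF;
            int expected;
            if ((b & 0xE0) == 0xC0) {
                expected = 2; // lead byte 110xxxxx
            } else if ((b & 0xF0) == 0xE0) {
                expected = 3; // lead byte 1110xxxx
            } else {
                expected = 1; // ASCII, continuation, or other byte: skip it
            }
            if (expected > 1 && hasContinuations(data, i, expected)) {
                if (expected == length) {
                    count++;
                }
                i += expected;
            } else {
                i++;
            }
        }
        return count;
    }

    /** Returns true if the bytes after the lead byte are all of the form 10xxxxxx. */
    static boolean hasContinuations(byte[] data, int start, int expected) {
        if (start + expected > data.length) {
            return false;
        }
        for (int k = 1; k < expected; k++) {
            if ((data[start + k] & 0xC0) != 0x80) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8 = "a\u00e9\u20ac\u00fc".getBytes(StandardCharsets.UTF_8);
        System.out.println("2-byte sequences: " + countSequences(utf8, 2));
        System.out.println("3-byte sequences: " + countSequences(utf8, 3));
    }
}
```

A lone Latin-1 byte such as 0xE9 matches the 3-byte lead pattern but has no continuation bytes after it, so this sketch does not count it as a UTF-8 character.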

PROP_AMBIGUOUS_BYTE

public static final int PROP_AMBIGUOUS_BYTE
Used to obtain a count of ambiguous bytes or rows containing ambiguous bytes in a specific column. See DataHighBitAnalysis for a definition of ambiguity in this case.

See Also:
Constant Field Values

PROP_ILLEGAL_BYTE

public static final int PROP_ILLEGAL_BYTE
Used to obtain a count of illegal bytes or rows containing illegal bytes in a specific column. See DataHighBitAnalysis for a definition of an illegal byte.

See Also:
Constant Field Values

PROP_OVERSIZED

public static final int PROP_OVERSIZED
Used to obtain a count of the oversized rows in a specific column. An oversized row contains a value encoded using Latin-1 that would overflow the column if the string were re-encoded using UTF-8 and the column had a fixed size in bytes.

See Also:
Constant Field Values
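
The oversized condition described for PROP_OVERSIZED can be sketched as follows: decode the stored bytes as Latin-1, re-encode them as UTF-8, and compare the resulting length with the column's fixed byte size. The OversizedSketch class, the isOversized method, and the columnSizeBytes parameter are hypothetical names chosen for this example; how this package actually obtains the column size is not shown here.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration; not part of org.afcs.warts.db. */
final class OversizedSketch {

    /**
     * Returns true if a Latin-1 encoded value would no longer fit in a column
     * of columnSizeBytes bytes once re-encoded as UTF-8.
     */
    static boolean isOversized(byte[] latin1Bytes, int columnSizeBytes) {
        String decoded = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        // Each Latin-1 character above 0x7F takes two bytes in UTF-8.
        int utf8Length = decoded.getBytes(StandardCharsets.UTF_8).length;
        return utf8Length > columnSizeBytes;
    }

    public static void main(String[] args) {
        byte[] value = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1); // 4 bytes
        System.out.println("fits in 4 bytes as UTF-8: " + !isOversized(value, 4));
    }
}
```

Because every Latin-1 character above 0x7F grows from one byte to two, a value that exactly fills its column and contains at least one such character is always oversized under this check.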

Method Detail

findHighBits

public static TableHighBitAnalysis findHighBits(DataSet tableData)
Constructs a new instance containing the summation of all non-ASCII character analysis objects contained in the specified data set.

Parameters:
tableData - The data set to analyse.
Returns:
A new instance containing the analysis of the specified dataset.
Throws:
java.lang.NullPointerException - If tableData is null.

foundNonAsciiBytes

public boolean foundNonAsciiBytes()
Returns true if non-ASCII character analysis found any non-ASCII bytes in the table.

Returns:
True if non-ASCII character analysis found any non-ASCII bytes in the table.

containsNonAsciiBytes

public boolean containsNonAsciiBytes(int columnIndex)
Returns true if the specified column contains any non-ASCII bytes.

Parameters:
columnIndex - The column to enquire about.
Returns:
True if the specified column contains any non-ASCII bytes.
Throws:
java.lang.IllegalArgumentException - If columnIndex is out of bounds.

getByteAnalysisProperty

public int getByteAnalysisProperty(int columnIndex,
                                   int propertyIndex)
Returns the specified property of the byte-wise analysis of the specified column.

Parameters:
columnIndex - The column to return the property for.
propertyIndex - The property index. This should be one of the PROP_* constants defined in this class, other than PROP_OVERSIZED, which is a row-wise property only.
Returns:
The specified property of the byte-wise analysis of the specified column.
Throws:
java.lang.IllegalArgumentException - If either argument is out of bounds.

getRowAnalysisProperty

public int getRowAnalysisProperty(int columnIndex,
                                  int propertyIndex)
Returns the specified property of the row-wise analysis of the specified column.

Parameters:
columnIndex - The column to return the property for.
propertyIndex - The property index. This should be one of the PROP_* constants defined in this class.
Returns:
The specified property of the row-wise analysis of the specified column.
Throws:
java.lang.IllegalArgumentException - If either argument is out of bounds.
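
The difference between a byte-wise and a row-wise count can be illustrated with a single column of values: the byte-wise figure totals every matching byte, while the row-wise figure counts each row at most once. The sketch below uses "has the high bit set" as the property being counted; the ByteVsRowCountSketch class and its methods are hypothetical and only approximate what the two accessors above might return.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration; not part of org.afcs.warts.db. */
final class ByteVsRowCountSketch {

    /** Counts the high-bit bytes in one value. */
    static int highBytesIn(byte[] value) {
        int count = 0;
        for (byte b : value) {
            if ((b & 0x80) != 0) {
                count++;
            }
        }
        return count;
    }

    /** Byte-wise count: total number of high-bit bytes across all rows of a column. */
    static int countHighBytes(byte[][] columnValues) {
        int total = 0;
        for (byte[] value : columnValues) {
            total += highBytesIn(value);
        }
        return total;
    }

    /** Row-wise count: number of rows holding at least one high-bit byte. */
    static int countRowsWithHighBytes(byte[][] columnValues) {
        int rows = 0;
        for (byte[] value : columnValues) {
            if (highBytesIn(value) > 0) {
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[][] column = {
            "\u00e9 \u00e9".getBytes(StandardCharsets.ISO_8859_1),
            "abc".getBytes(StandardCharsets.ISO_8859_1),
            "\u00fc".getBytes(StandardCharsets.ISO_8859_1),
        };
        System.out.println("byte-wise: " + countHighBytes(column));
        System.out.println("row-wise:  " + countRowsWithHighBytes(column));
    }
}
```

A property such as PROP_OVERSIZED describes a whole value rather than individual bytes, which is consistent with it being available only through the row-wise accessor.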

isColumnAnalysed

public boolean isColumnAnalysed(int columnIndex)
Returns true if the specified column contains character data and is thus analysed.

Parameters:
columnIndex - The index of the column to check.
Returns:
True if the specified column contains character data.

getTableDescription

public TableDescription getTableDescription()
Returns the description of the table from which the dataset analysed by this instance was loaded.

Specified by:
getTableDescription in interface TabularData
Returns:
The description of the table from which the dataset analysed by this instance was loaded.

getNumRows

public int getNumRows()
Returns the number of rows in the dataset analysed by this instance.

Specified by:
getNumRows in interface TabularData
Returns:
The number of rows in the dataset analysed by this instance.

getValueAt

public java.lang.Object getValueAt(int rowIndex,
                                   int columnIndex)
Returns the value at the specified row and column from the dataset analysed by this instance.

Specified by:
getValueAt in interface TabularData
Parameters:
rowIndex - The row to retrieve the data from.
columnIndex - The column to retrieve the data from.
Returns:
The value at the specified row and column from the dataset analysed by this instance.
Throws:
java.lang.IllegalArgumentException - If either argument is out of bounds.

toString

public java.lang.String toString()
Returns a text description of the current instance that can be used for debugging purposes.

Returns:
A text description of the current instance that can be used for debugging purposes.