External Materialization

What is it

In Teiid, a view is a virtual table based on the computing(loading/transforming/federating) of a complex SQL statement across heterogeneous data sources. Teiid external materialization process can cache the View data to an external data source systems on a periodic basis. When a user issues queries against this View, the request will be redirected to this external data source system where cached results will be returned, rather than re-computing results from source systems. Materialization can prove to be time and resource saving if your View transformation is complex and/or access to the source systems is constrained.

External Materialization

Materialized View - Materialized view is just like other views, with additional options in View Options, to enable pre-computing and caching data to an external data source system.

Materialized Table - Materialized table represents the target table for the materialized View, has the same structure as the materialized view, but exists on the external data source system.

MatView Status Table - Each materialized view has a reference to 'Status' table, this used to save the Materialized views' refresh status. This table typically exists on the same physical source with the Materialized Table.

An external materialized view gives the administrator full control over the loading and refresh strategies. Refer to Materialization Management for details.

External Materialized Data Source Systems

The following are the types of data sources that have been tested to work in the external materialization process:

RDBMS Systems

  • RDBMS - a relational database should work. Example databases; Oracle, Postgresql, MySQL, MS SqlServer, SAP Hana, etc.

If the database supports a transactional rename operation, you can use the default load strategy that uses a staging table and rely on renaming the staging table to the live table in the after load script.

Note
TEIID-4294 raises that not every database supports a transactional rename, either as separate or a block of statements. If this is the case you should consider using a LOADNUMBER column, or a custom load strategy that maintains only a single table.

JBoss Data Grid (JDG)

  • JBoss Data Grid (JDG) - for in-memory caching of results.

View Options

The following View properties are extension properties that used in the management of the Materialized View.

Property Name Description Optional Default

MATERIALIZED

Set the value to 'TRUE' for the View to be materialized.

false

n/a

MATERIALIZED_TABLE

Defines the name of target table, this also hints the materialization is using external materialization. Omitting this property and setting the MATERIALIZED proeprty true, invokes internal materialization.

false

n/a

UPDATABLE

Allow updating Materialized View via DML UPDATE

true

false

teiid_rel:ALLOW_MATVIEW_MANAGEMENT

Allow Teiid based automatic management of load/refresh strategies of View.

true

false

teiid_rel:MATVIEW_STATUS_TABLE

Fully qualified Status Table Name to manage the load/refresh of the View. See below for table structure and DDL for it.

false

n/a

teiid_rel:MATVIEW_LOADNUMBER_COLUMN

Name of column in the MATERIALIZED_TABLE that can hold status information about load/refresh load process. The column type MUST be long, and typically named as "LoadNumber".

false

NONE

teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT

DDL/DML command to run before the actual load of the cache

true

When not defined, no script will be run

teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT

DDL/DML command to run after the actual load of the cache. teiid_rel:MATVIEW_STAGE_TABLE to MATVIEW table

true

When not defined, no script will be run

teiid_rel:MATVIEW_SHARE_SCOPE

Allowed values are {IMPORTED, FULL}, which define if the cached contents are shared among different VDB versions and different imported VDBs and parent VDB.

true

IMPORTED

teiid_rel:ON_VDB_START_SCRIPT

DDL/DML command to run start of vdb

true

n/a

teiid_rel:ON_VDB_DROP_SCRIPT

DDL/DML command to run at VDB un-deploy; typically used for cleaning the cache/status tables. DO NOT use this script to delete the contents of Status table, when cache scope settings are configured for {FULL} scope, if another version of the VDB is still active. Deletion of this information will reload the materialization table.

true

n/a

teiid_rel:MATVIEW_ONERROR_ACTION

Action to be taken when mat view contents are requested but cache is invalid. Allowed values are (THROW_EXCEPTION = throws an exception, IGNORE = ignores the warning and supplied invalidated data, WAIT = waits until the data is refreshed and valid then provides the updated data)

true

WAIT

teiid_rel:MATVIEW_TTL

time to live in milliseconds. Provide property or cache hint on view transformation - property takes precedence.

true

2^63 milliseconds - effectively the table will not refresh, but will be loaded a single time initially

teiid_rel:MATVIEW_WRITE_THROUGH

When true Teiid will perform both the underlying update and the corresponding update against the materialization target for an insert/update/delete issued against the view.

true

false

Tip
for scripts that need more than one statement executed, use a procedure block BEGIN statement; statement; …​ END
Important
When a vdb is imported into another vdb, materializied views are automatically shared across these vdbs. The teiid_rel:MATVIEW_SHARE_SCOPE property must be set to 'IMPORTED' or 'FULL' on importing VDB’s materialized views to enable sharing across the both vdbs. The below table shows an example of how this property works

For example: Table A is in VDB X.1 and Table C in VDB Y.1 Table A & B in VDB X.2 and imports Y.1 then depending on scope setting the system will cache sharing will work as

Scope X.1 Y.1 X.2

IMPORTED

A-own copy

C-Shared w/X.2

A-own copy,B-own copy,C-Shared from Y.1

FULL

A-Shared with/X.*

C-Shared w/X.2

A-Shared with/ X,B-Shared w/X,C-Shared from/Y.1

Materialization Management

When designing Views, you can define additional metadata and extension properties(refer to above section) on the views to control the loading and refreshing of external materialization cache. This option provides a limited, but a powerful way to manage the materialization views. Below we will list steps need to take to configure a View to be materialized.

1. Creation of Status Table

To manage and report the loading and refreshing activity of materialization of the view, a Materialized Table and Status Table need be be defined in one of the source models in the VDB. Create these tables on the physical database, before you deploy the VDB.

The below defines the DDL for creating the Status table.

CREATE TABLE status
(
  VDBName varchar(50) not null,
  VDBVersion varchar(50) not null,
  SchemaName varchar(50) not null,
  Name varchar(256) not null,
  TargetSchemaName varchar(50),
  TargetName varchar(256) not null,
  Valid boolean not null,
  LoadState varchar(25) not null,
  Cardinality long,
  Updated timestamp not null,
  LoadNumber long not null,
  PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
);

Appendix-1: DDL for creating MatView Status Table contains a series of verified schemas against different RDBMS sources. These can be modified to suit your database, please make sure the names and data types match exactly.

Warning
Some databases, such as MySQL with the InnoDB backend, may not allow a large primary key such as the one for the status table. If you experience this, you should consider making the field sizes shorter (such as the table name), using a different database to hold the status, or using a smaller index (for example just over vdbname and vdbversion).

2. Creation of View and Materialized Table

Define the View and its transformation either using the Designer or directly in DDL in a VDB’s model. Then provide the extension properties on the View as defined in View Options

Set the MATERIALIZED to 'TRUE' and the MATERIALIZED_TABLE point to a target table is necessary for external materialization, UPDATABLE is optional, set it to 'TRUE' if want the external materialized view be updatable, this must be set to true, if you want to issue incremental eager updates to the view. Define the TTL to define the load/refresh semantics.

In an another PHYSICAL model in the VDB (where the Status table defined), define the Materialized table, where the Materialized Table should have the same structure as View it is representing, with additional "LoadNumber" column with "long" data type.

Once a View, which is defined with the above properties, is deployed, the following sequence of events will take place:

Tip
Example VDB based on DDL is defined below for reference.

Materialization Table Loading

Upon deployment of the VDB to the Teiid server, SYSADMIN.loadMatView used to perform a complete refresh of materialized table, this procedure reads the extension properties defined from View Options to customize the load. The following describes the sequence of events that occur inside this procedure

  1. Inserts/updates an entry in teiid_rel:MATVIEW_STATUS_TABLE, which indicates that the cache is being loaded.

  2. Executes teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT if defined.

  3. Runs a query to load the cache contents. This makes use of View’s transformation to load the contents.

  4. Executes teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT if defined.

  5. Updates teiid_rel:MATVIEW_STATUS_TABLE entry to set materialized view status status to "LOADED" and valid. If failure happens it will be marked as such.

Tip
The start/stop scripts are not cluster aware - that is they will run on each cluster member as the VDB is deployed. When deploying into a clustered environment, the scripts should be written in such a way as to be cluster safe.

Eager Materialization Table Update

SYSADMIN.updateMatView used to perform a eager incremental update based on criteria provided. If you know that certain data points in the source system were changed after last full refresh of the materialized view, you can call this procedure with a criteria based on the view that cover those changed values, and this procedure will update only those affected rows in the materialized table.

Note: This script is not invoked automatically by Teiid, as the source update events may be occurring outside of Teiid. This procedure needs to be invoked by user, when he/she knows that there is change in the source systems.

Using JDG as Materialization Store

The JDG can be used as a materialization target to save cache contents of the View in a JDG server. JDG can be cluster aware and is accessed using the JDG Hot Rod Client. See the JDG Hot Rod DataSource for how to configure accessing the remote cache.

To configure for external materialization, see the HotRod Translator. This explains how to configure Materialization Management that is specific for using JDG remote cache and is essential for managing the underlying multiple caches needed in order to perform materialization.

Once the Materialized View is completely defined, deploy it to a Teiid Server. Then create a new session and issue a query against Materialized View. You will find it’s time-saving and cpu-saving if your query sentences are complex and across multiple, heterogeneous data stores.

Define Materialized View in Designer

  • Create materialized views and corresponding physical materialized target tables in Designer. This can be done through setting the materialized and target table manually, or by selecting the desired views, right clicking, then selecting Modeling→"Create Materialized Views"

  • Generate the DDL for your physical model materialization target tables. This can be done by selecting the model, right clicking, then choosing Export→"Metadata Modeling"→"Data Definition Language (DDL) File". This script can be used to create the desired schema for your materialization target on whatever source you choose.

Materialization with Embedded Teiid

The definition of the view and materialized table, status table all of the steps defined above are same as they are part of the VDB. Follow same steps above and deploy the VDB in the embedded Teiid as shown below.

EmbeddedServer server = new EmbeddedServer();
…
server.addConnectionFactory("name", Object);
…
server.addTranslator("name", ExecutionFactory);
EmbeddedConfiguration config = new EmbeddedConfiguration();
config.setTransactionManager(EmbeddedHelper.getTransactionManager());
server.start(config);
server.deployVDB("matView-vdb.xml");

Appendix-1: DDL for creating MatView Status Table

h2
CREATE TABLE status
(
  VDBName varchar(50) not null,
  VDBVersion varchar(50) not null,
  SchemaName varchar(50) not null,
  Name varchar(256) not null,
  TargetSchemaName varchar(50),
  TargetName varchar(256) not null,
  Valid boolean not null,
  LoadState varchar(25) not null,
  Cardinality long,
  Updated timestamp not null,
  LoadNumber long not null,
  PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
) ;
MariaDB
CREATE TABLE status
(
  VDBName varchar(50) not null,
  VDBVersion varchar(50) not null,
  SchemaName varchar(50) not null,
  Name varchar(256) not null,
  TargetSchemaName varchar(50),
  TargetName varchar(256) not null,
  Valid boolean not null,
  LoadState varchar(25) not null,
  Cardinality bigint,
  Updated timestamp not null,
  LoadNumber bigint not null,
  PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
) OPTIONS (UPDATABLE true);

Appendix-2: Example VDB with External Materialized View Options

The below VDB defines three models, one "Source" model that defines your source database where your business data is in, "ViewModel" defines a "Person" view which is derived from subset of the data from your table in the "Source" model’s table(s). Note that view table also marked with few extension properties to allow external materialization. The "materialized" model defines a source database model, where it has a table with exact table structure as the ViewModel’s materialized view with additional column called "LoadNumber". Note the "materialized table also contains the "status" table. Both these tables must be created manually on the source database before VDB is deployed to the server.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<vdb name="example" version="1">
    <model name="Source">
        <property name="importer.useFullSchemaName" value="false" />
        <source name="source" translator-name="h2" connection-jndi-name="java:/my-ds" />
    </model>

    <model name="ViewModel" type="VIRTUAL">
        <metadata type="DDL"><![CDATA[
          CREATE VIEW Person (
            id varchar,
            name varchar,
            dob date,
            PRIMARY KEY (id)
          ) OPTIONS (
            MATERIALIZED 'TRUE', UPDATABLE 'TRUE',
            MATERIALIZED_TABLE 'materialized.PersonCached',
            "teiid_rel:MATVIEW_TTL" 20000,
            "teiid_rel:ALLOW_MATVIEW_MANAGEMENT" 'true',
            "teiid_rel:MATVIEW_LOADNUMBER_COLUMN" 'LoadNumber',
            "teiid_rel:MATVIEW_STATUS_TABLE" 'materialized.status'
          )
          AS
            SELECT p.id, p.name, p.dob FROM Source.Person AS p;
        ]]>
        </metadata>
    </model>

    <model name="materialized" type="PHYSICAL">
        <source name="matview" translator-name="h2" connection-jndi-name="java:/matview-ds" />
        <metadata type="DDL"><![CDATA[
          CREATE VIEW PersonCached (
            id varchar,
            name varchar,
            dob date,
            LoadNumber long,
            PRIMARY KEY (id)
          );
          CREATE TABLE status (
            VDBName varchar(50) not null,
            VDBVersion varchar(50) not null,
            SchemaName varchar(50) not null,
            Name varchar(256) not null,
            TargetSchemaName varchar(50),
            TargetName varchar(256) not null,
            Valid boolean not null,
            LoadState varchar(25) not null,
            Cardinality long,
            Updated timestamp not null,
            LoadNumber long not null,
            PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
          );
        ]]>
        </metadata>
    </model>
</vdb>

results matching ""

    No results matching ""