External Materialization

What is it

In Teiid, a view is a virtual table based on the computing(loading/transforming/federating) of a complex SQL statement across heterogeneous data sources. Teiid External Materialization caches the view data to an external physical database system. When a new session issues a query against this view, the request will be redirected to the external physical database system and the cached results will be returned, rather than re-computing. This can prove time-saving and cpu-saving if your query sentences are complex and across multiple, heterogeneous data stores.

External Materialization

Materialized View - Materialized view is just like other views, with additional options in View Options, to enable pre-computing and caching data to an external database system.

Materialized Table - Materialized table represents the target table for the Materialized view, has the same structure as the Materialized view, but lives on the physical database system, need to create manually and referenced with the Materialized views via MATERIALIZED_TABLE option in designing. The data between Materialized table and physical table is kept synchronized by teiid system with a time interval.

MatView Status Table - Each Materialized view has to reference a status table, used to save the Materialized views' status, the status table usually lives on the same physical source with the Materialized Table, need to create manually and referenced with the Materialized view via teiid_rel:MATVIEW_STATUS_TABLE in designing.

Materialized Stage Table - Each Materialized view can have a stage table, it has the same structure with Materialized View and Materialized Table, need to create manually and referenced with the Materialized view via teiid_rel:MATERIALIZED_STAGE_TABLE in designing. Primary purpose of stage table is for better management. If stage table defined, the source data is first populated to the stage table, then stage table will be renamed to Materialized Table.

An external materialized view gives the administrator full control over the loading and refresh strategies. Refer to Materialization Management for details.

External Materialized Data Source Systems

The following are the types of data sources that have been tested to work in the external materialization process:

RDBMS Systems

  • RDBMS - a relational database should work, as long as the user has the correct permissions to rename a table. Example databases; Oracle, Postgresql, MySQL, MS SqlServer, SAP Hana, etc.

JBoss Data Grid (JDG)

  • JBoss Data Grid (JDG) - for in-memory caching of results.

View Options

The following View options, which start with teiid_rel: prefix, are extension properties used in the management of the Materialized View.

Property Name Description Optional Default

MATERIALIZED

Set the value to 'TRUE' for the view to be materialized

false

n/a

MATERIALIZED_TABLE

Define the name of target source table, this also hints the materialization is external

false

n/a

UPDATABLE

Allow updating Materialized View via DML UPDATE

true

false

teiid_rel:ALLOW_MATVIEW_MANAGEMENT

Allow Teiid based management

true

false

teiid_rel:MATVIEW_STATUS_TABLE

fully qualified Status Table Name defined above

false

n/a

teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT

DDL/DML command to run before the actual load of the cache, typically used to truncate staging table

true

When not defined, no script will be run

teiid_rel:MATVIEW_LOAD_SCRIPT

command to run for loading of the cache

true

will be determined based on view transformation

teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT

DDL/DML command to run after the actual load of the cache. Typically used to rename staging table to actual cache table. Required when MATVIEW_LOAD_SCRIPT not defined to copy data from teiid_rel:MATVIEW_STAGE_TABLE to MATVIEW table

true

When not defined, no script will be run

teiid_rel:MATVIEW_SHARE_SCOPE

Allowed values are \{NONE, VDB, SCHEMA}, which define if the cached contents are shared among different VDB versions and different VDBs as long as schema names match

true

NONE

teiid_rel:MATERIALIZED_STAGE_TABLE

When MATVIEW_LOAD_SCRIPT property not defined, Teiid loads the cache contents into this table. Required when MATVIEW_LOAD_SCRIPT not defined

true

n/a

teiid_rel:ON_VDB_START_SCRIPT

DDL/DML command to run start of vdb

true

n/a

teiid_rel:ON_VDB_DROP_SCRIPT

DDL/DML command to run at VDB un-deploy; typically used for cleaning the cache/status tables

true

n/a

teiid_rel:MATVIEW_ONERROR_ACTION

Action to be taken when mat view contents are requested but cache is invalid. Allowed values are (THROW_EXCEPTION = throws an exception, IGNORE = ignores the warning and supplied invalidated data, WAIT = waits until the data is refreshed and valid then provides the updated data)

true

WAIT

teiid_rel:MATVIEW_TTL

time to live in milliseconds. Provide property or cache hint on view transformation - property takes precedence.

true

2^63 milliseconds - effectively the table will not refresh, but will be loaded a single time initially

Tip
for scripts that need more than one statement executed, use a procedure block BEGIN statement; statement; …​ END

Set the MATERIALIZED to 'TRUE' and the MATERIALIZED_TABLE point to a target table is necessary for external materialization, UPDATABLE is optional, set it to 'TRUE' if want the external materialized view be updatable.

The above options are used in the management based system procedure. Once a View, which is defined with the above properties, is deployed, the following sequence of events will take place:

  1. Upon the VDB deployment, teiid_rel:ON_VDB_START_SCRIPT will be run on completion of the deployment.

  2. Based on the teiid_rel:MATVIEW_TTL defined ttl, a Scheduler run/queue JobTask be created, which keeps the cache contents synchronized with source table. JobTask will first run SYSADMIN.matViewStatus procedure, refine the next JobTask’s deplay by retrived Materialized view status and the ttl, then run SYSADMIN.loadMatView procedure, which loads the cache contents. More details refer to Materialization View Loading.

  3. When VDB is un-deployed (not when server is restarted) the teiid_rel:ON_VDB_DROP_SCRIPT script will be run.

Tip
The start/stop scripts are not cluster aware - that is they will run on each cluster member as the VDB is deployed. When deploying into a clustered environment, the scripts should be written in such a way as to be cluster safe.

Materialization View Loading

SYSADMIN.loadMatView used to perform a complete refresh of materialized table, it’s base on extension properties in View Options:

  1. Inserts/updates a entry in teiid_rel:MATVIEW_STATUS_TABLE, which indicates that the cache is being loaded.

  2. Initializes teiid_rel:MATVIEW_LOAD_SCRIPT to insert into stage_table select * from matview option nocache matview if not defined.

  3. Executes teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT if defined.

  4. Executes teiid_rel:MATVIEW_LOAD_SCRIPT.

  5. Executes teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT if defined.

  6. Updates teiid_rel:MATVIEW_STATUS_TABLE entry to set materialized view status status to "LOADED" and valid.

Materialization Management

Users, when they are designing their views, can define additional metadata and extension properties(refer to above section) on their views to control the loading and refreshing of external materialization cache. This option provides a limited, but a powerful way to manage the materialization views. For this purpose, SYSADMIN Schema defines three procedures (loadMatView, updateMatView, matViewStatus) to cooperate with the Materialization Management.

Based on the defined metadata, extension properties(refer to above section)on the view, and management system procedures, a Scheduler automatically starts during the VDB deployment and will load and keep the cache freshed.

Typical RDBMS Usage Steps

1. Create Tables

To manage and report the loading and refreshing activity of materialization view, a Materialized Table, a MatView Status Table and a Materialized Stage Table need be be defined in any one of the source models. Create these tables on the physical database, before you start designing Materialized View.

The Materialized Table and MatView Status Table is necessary, Materialized Stage Table is optional. The Materialized Table and Materialized Stage Table should have the same structure with the Materialized View. The MatView Status Table must create with below schema:

CREATE TABLE status
(
  VDBName varchar(50) not null,
  VDBVersion varchar(50) not null,
  SchemaName varchar(50) not null,
  Name varchar(256) not null,
  TargetSchemaName varchar(50),
  TargetName varchar(256) not null,
  Valid boolean not null,
  LoadState varchar(25) not null,
  Cardinality long,
  Updated timestamp not null,
  LoadNumber long not null,
  PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
);

Appendix-1: DDL for creating MatView Status Table contains a series of verified schemas against diverse physical sources.

Warning
Some databases, such as MySQL with the InnoDB backend, may not allow a large primary key such as the one for the status table. If you experience this, you should consider making the fields shorter (such as the table name), using a different database to hold the status, or using a smaller index (for example just over vdbname and vdbversion).

2. Create Materialized View

For better management and control of the loading and refreshing strategies of the materialized view, the extension properties in View Options, this can be done either through Designer, or edit the DDL. Refer to Define Materialized View in Designer if through Designer.

The loading and refreshing strategies controled by load scripts, there are two kinds of load scripts:

  • VDB Scope Scripts - VDB start script and VDB stop script which defined by teiid_rel:ON_VDB_START_SCRIPT and teiid_rel:ON_VDB_DROP_SCRIPT correspondently, its executed in VDB deploying and removing.

  • Procedure Scope Scripts - before load script, load script and after load script which defined by teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT, teiid_rel:MATVIEW_LOAD_SCRIPT and teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT correspondently, these scripts be executed in a sequence by SYSADMIN.loadMatView, refer to Materialization View Loading for details.

To refresh/load the materialized view, the basic principles to define load script is:

  • Define truncate target/staging table sql in teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT

  • Define insert into target/staging table sql in teiid_rel:MATVIEW_LOAD_SCRIPT

  • Define alter stage table to target table logic in teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT.

If stage table is defined, you can define truncate staging table in before load script, define insert staging table in load script(if not defined, insert into stageTable select * from matview option nocache matview will be set in runtime), define alter staging table to materialized table in after load script:

"teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT" 'execute accounts.native(''truncate table stageTable'');',
"teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT"  'begin execute accounts.native(''ALTER TABLE matTable RENAME TO matTable_temp'');execute accounts.native(''ALTER TABLE stageTable RENAME TO matTable'');execute accounts.native(''ALTER TABLE matTable_temp RENAME TO stageTable''); end',

If stage table is not defined, you can define truncate target table in before load script and define insert target table in load script:

"teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT" 'execute accounts.native(''truncate table matTable'');',
"teiid_rel:MATVIEW_LOAD_SCRIPT" 'insert into matTable select * from matview option nocache matview',
Note
That however may be too simplistic because your index creation may be more performant if deferred until after the table has been created. Also full snapshot refreshes are best done to a staging table then swapping it for the existing physical table to ensure that the refresh does not impact user queries and to ensure that the table is valid prior to use.

For some sources, the effect of a RENAME operations may not be performed atomically with the other after load script operations. If this is the case, it will be possible for the materialization target table to not exist while being queried. Possible workarounds include:

  • It’s possible to send a set of commands to the source rather than as individual native queries. For example: 'execute accounts.native(''ALTER TABLE matTable RENAME TO matTable_temp; ALTER TABLE stageTable RENAME TO matTable; ALTER TABLE matTable_temp RENAME TO stageTable'')'

  • Use an alternative strategy based upon maintaining just a single table, such as updating based upon an UPSERT/MERGE and then a purge of entries that are no longer valid.

3. Use Materialized View

Once the Materialized View is completely defined, deploy it to a Teiid Server. Then create a new session and issue a query against Materialized View. You will find it’s time-saving and cpu-saving if your query sentences are complex and across multiple, heterogeneous data stores.

Typical JDG Usage Steps

1. Define JDG Caches

The typical usage of JDG in materialization is to configure a JDG server, which can be cluster aware, that will be accessed using the JDG Hot Rod Client. See the JDG Hot Rod DataSource for how to configure accessing the remote cache.

2. Create Materialized View

To configure for external materialization, see the HotRod Translator. This explains how to configure Materialization Management that is specific for using JDG remote cache and is essential for managing the underlying multiple caches needed in order to perform materialization.

3. Use Materialized View

Once the Materialized View is completely defined, deploy it to a Teiid Server. Then create a new session and issue a query against Materialized View. You will find it’s time-saving and cpu-saving if your query sentences are complex and across multiple, heterogeneous data stores.

Loading And Refreshing

System Procedures are used to Loading And Refreshing Materialized view in any time:

  • SYSADMIN.loadMatView - complete refresh the cache contents, reload the materialized table.

  • SYSADMIN.updateMatView - to partially update the cache contents, update a subset of the materialized table. When partial update is run the cache expiration time is renewed for new term based on Cache Hint again.

In Teiid, once a VDB be started, all External Materialized Views will be refreshed by SYSADMIN.loadMatView periodically with a time interval.

A sample VDB with external materialized view options

teiid-mat-example-vdb.xml is a sample VDB definition which contains six materialized view:

  • Model_A.VIEW_A - demonstrates MATVIEW_BEFORE_LOAD_SCRIPT, MATVIEW_AFTER_LOAD_SCRIPT with MATERIALIZED_STAGE_TABLE

  • Model_B.VIEW_B - demonstrates MATVIEW_LOAD_SCRIPT with MATERIALIZED_STAGE_TABLE

  • Model_C.VIEW_C - demonstrates MATVIEW_BEFORE_LOAD_SCRIPT, MATVIEW_LOAD_SCRIPT without MATERIALIZED_STAGE_TABLE

  • Model_D.VIEW_D - demonstrates MATVIEW_AFTER_LOAD_SCRIPT with MATERIALIZED_STAGE_TABLE

  • Model_E.VIEW_E - demonstrates the minmum options in materialized view

  • Model_F.VIEW_F - demonstrates the maxmum options in materialized view

Define Materialized View in Designer

Typical Usage Steps

  • Create materialized views and corresponding physical materialized target tables in Designer. This can be done through setting the materialized and target table manually, or by selecting the desired views, right clicking, then selecting Modeling→"Create Materialized Views"

  • Generate the DDL for your physical model materialization target tables. This can be done by selecting the model, right clicking, then choosing Export→"Metadata Modeling"→"Data Definition Language (DDL) File". This script can be used to create the desired schema for your materialization target on whatever source you choose.

  • Determine a load and refresh strategy. With the schema created the most simplistic approach is to just load the data. The load can even be done through Teiid with

insert into target_table select * from matview option nocache matview
Note
The Designer tooling for this feature is lacking at this moment but this will be added in coming releases.

Materialization with Embedded Server

Views with extension properties in View Options and load scripts in Materialization Management defined, you can set up External Materialization with Embedded Server as below

EmbeddedServer server = new EmbeddedServer();
…
server.addConnectionFactory("name", Object);
…
server.addTranslator("name", ExecutionFactory);
EmbeddedConfiguration config = new EmbeddedConfiguration();
config.setTransactionManager(EmbeddedHelper.getTransactionManager());
server.start(config);
server.deployVDB("matView-vdb.xml");

Appendix-1: DDL for creating MatView Status Table

h2
CREATE TABLE status
(
  VDBName varchar(50) not null,
  VDBVersion varchar(50) not null,
  SchemaName varchar(50) not null,
  Name varchar(256) not null,
  TargetSchemaName varchar(50),
  TargetName varchar(256) not null,
  Valid boolean not null,
  LoadState varchar(25) not null,
  Cardinality long,
  Updated timestamp not null,
  LoadNumber long not null,
  PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
);
MariaDB
CREATE TABLE status
(
  VDBName varchar(50) not null,
  VDBVersion integer not null,
  SchemaName varchar(50) not null,
  Name varchar(256) not null,
  TargetSchemaName varchar(50),
  TargetName varchar(256) not null,
  Valid boolean not null,
  LoadState varchar(25) not null,
  Cardinality bigint,
  Updated timestamp not null,
  LoadNumber bigint not null,
  PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
);

results matching ""

    No results matching ""