CREATE TABLE status
(
VDBName varchar(50) not null,
VDBVersion varchar(50) not null,
SchemaName varchar(50) not null,
Name varchar(256) not null,
TargetSchemaName varchar(50),
TargetName varchar(256) not null,
Valid boolean not null,
LoadState varchar(25) not null,
Cardinality long,
Updated timestamp not null,
LoadNumber long not null,
PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
);
External Materialization
This document will explain what Teiid External Materialization is and how to use it.
What is it
In Teiid, a view is a virtual table based on the computing(loading/transforming/federating) of a complex SQL statement across heterogeneous data sources. Teiid External Materialization caches the view data to an external physical database system. When a new session issues a query against this view, the request will be redirected to the external physical database system and the cached results will be returned, rather than re-computing. This can prove time-saving and cpu-saving if your query sentences are complex and across multiple, heterogeneous data stores.
Materialized View - Materialized view is just like other views, with additional options in View Options, to enable pre-computing and caching data to an external database system.
Materialized Table - Materialized table represents the target table for the Materialized view, has the same structure as the Materialized view, but lives on the physical database system, need to create manually and referenced with the Materialized views via MATERIALIZED_TABLE
option in designing. The data between Materialized table and physical table is kept synchronized by teiid system with a time interval.
MatView Status Table - Each Materialized view has to reference a status table, used to save the Materialized views' status, the status table usually lives on the same physical source with the Materialized Table
, need to create manually and referenced with the Materialized view via teiid_rel:MATVIEW_STATUS_TABLE
in designing.
Materialized Stage Table - Each Materialized view can have a stage table, it has the same structure with Materialized View
and Materialized Table
, need to create manually and referenced with the Materialized view via teiid_rel:MATERIALIZED_STAGE_TABLE
in designing. Primary purpose of stage table is for better management. If stage table defined, the source data is first populated to the stage table, then stage table will be renamed to Materialized Table
.
An external materialized view gives the administrator full control over the loading and refresh strategies. Refer to Materialization Management for details.
External Materialized Data Source Systems
The following are the types of data sources that have been tested to work in the external materialization process:
RDBMS Systems
-
RDBMS - a relational database should work, as long as the user has the correct permissions to rename a table. Example databases; Oracle, Postgresql, MySQL, MS SqlServer, SAP Hana, etc.
JBoss Data Grid (JDG)
-
JBoss Data Grid (JDG) - for in-memory caching of results.
View Options
The following View options, which start with teiid_rel: prefix, are extension properties used in the management of the Materialized View.
Property Name | Description | Optional | Default |
---|---|---|---|
MATERIALIZED |
Set the value to 'TRUE' for the view to be materialized |
false |
n/a |
MATERIALIZED_TABLE |
Define the name of target source table, this also hints the materialization is external |
false |
n/a |
UPDATABLE |
Allow updating Materialized View via DML |
true |
false |
teiid_rel:ALLOW_MATVIEW_MANAGEMENT |
Allow Teiid based management |
true |
false |
teiid_rel:MATVIEW_STATUS_TABLE |
fully qualified Status Table Name defined above |
false |
n/a |
teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT |
DDL/DML command to run before the actual load of the cache, typically used to truncate staging table |
true |
When not defined, no script will be run |
teiid_rel:MATVIEW_LOAD_SCRIPT |
command to run for loading of the cache |
true |
will be determined based on view transformation |
teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT |
DDL/DML command to run after the actual load of the cache. Typically used to rename staging table to actual cache table. Required when MATVIEW_LOAD_SCRIPT not defined to copy data from teiid_rel:MATVIEW_STAGE_TABLE to MATVIEW table |
true |
When not defined, no script will be run |
teiid_rel:MATVIEW_SHARE_SCOPE |
Allowed values are \{NONE, VDB, SCHEMA}, which define if the cached contents are shared among different VDB versions and different VDBs as long as schema names match |
true |
NONE |
teiid_rel:MATERIALIZED_STAGE_TABLE |
When MATVIEW_LOAD_SCRIPT property not defined, Teiid loads the cache contents into this table. Required when MATVIEW_LOAD_SCRIPT not defined |
true |
n/a |
teiid_rel:ON_VDB_START_SCRIPT |
DDL/DML command to run start of vdb |
true |
n/a |
teiid_rel:ON_VDB_DROP_SCRIPT |
DDL/DML command to run at VDB un-deploy; typically used for cleaning the cache/status tables |
true |
n/a |
teiid_rel:MATVIEW_ONERROR_ACTION |
Action to be taken when mat view contents are requested but cache is invalid. Allowed values are (THROW_EXCEPTION = throws an exception, IGNORE = ignores the warning and supplied invalidated data, WAIT = waits until the data is refreshed and valid then provides the updated data) |
true |
WAIT |
teiid_rel:MATVIEW_TTL |
time to live in milliseconds. Provide property or cache hint on view transformation - property takes precedence. |
true |
2^63 milliseconds - effectively the table will not refresh, but will be loaded a single time initially |
Tip
|
for scripts that need more than one statement executed, use a procedure block BEGIN statement; statement; … END |
Set the MATERIALIZED
to 'TRUE' and the MATERIALIZED_TABLE
point to a target table is necessary for external materialization, UPDATABLE
is optional, set it to 'TRUE' if want the external materialized view be updatable.
The above options are used in the management based system procedure. Once a View, which is defined with the above properties, is deployed, the following sequence of events will take place:
-
Upon the VDB deployment,
teiid_rel:ON_VDB_START_SCRIPT
will be run on completion of the deployment. -
Based on the
teiid_rel:MATVIEW_TTL
defined ttl, a Scheduler run/queue JobTask be created, which keeps the cache contents synchronized with source table. JobTask will first run SYSADMIN.matViewStatus procedure, refine the next JobTask’s deplay by retrived Materialized view status and the ttl, then run SYSADMIN.loadMatView procedure, which loads the cache contents. More details refer to Materialization View Loading. -
When VDB is un-deployed (not when server is restarted) the
teiid_rel:ON_VDB_DROP_SCRIPT
script will be run.
Tip
|
The start/stop scripts are not cluster aware - that is they will run on each cluster member as the VDB is deployed. When deploying into a clustered environment, the scripts should be written in such a way as to be cluster safe. |
Materialization View Loading
SYSADMIN.loadMatView used to perform a complete refresh of materialized table, it’s base on extension properties in View Options:
-
Inserts/updates a entry in
teiid_rel:MATVIEW_STATUS_TABLE
, which indicates that the cache is being loaded. -
Initializes
teiid_rel:MATVIEW_LOAD_SCRIPT
toinsert into stage_table select * from matview option nocache matview
if not defined. -
Executes
teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT
if defined. -
Executes
teiid_rel:MATVIEW_LOAD_SCRIPT
. -
Executes
teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT
if defined. -
Updates
teiid_rel:MATVIEW_STATUS_TABLE
entry to set materialized view status status to "LOADED" and valid.
Materialization Management
Users, when they are designing their views, can define additional metadata and extension properties(refer to above section) on their views to control the loading and refreshing of external materialization cache. This option provides a limited, but a powerful way to manage the materialization views. For this purpose, SYSADMIN Schema defines three procedures (loadMatView, updateMatView, matViewStatus) to cooperate with the Materialization Management.
Based on the defined metadata, extension properties(refer to above section)on the view, and management system procedures, a Scheduler automatically starts during the VDB deployment and will load and keep the cache freshed.
Typical RDBMS Usage Steps
1. Create Tables
To manage and report the loading and refreshing activity of materialization view, a Materialized Table
, a MatView Status Table
and a Materialized Stage Table
need be be defined in any one of the source models. Create these tables on the physical database, before you start designing Materialized View
.
The Materialized Table
and MatView Status Table
is necessary, Materialized Stage Table
is optional. The Materialized Table
and Materialized Stage Table
should have the same structure with the Materialized View
. The MatView Status Table
must create with below schema:
Appendix-1: DDL for creating MatView Status Table contains a series of verified schemas against diverse physical sources.
Warning
|
Some databases, such as MySQL with the InnoDB backend, may not allow a large primary key such as the one for the status table. If you experience this, you should consider making the fields shorter (such as the table name), using a different database to hold the status, or using a smaller index (for example just over vdbname and vdbversion). |
2. Create Materialized View
For better management and control of the loading and refreshing strategies of the materialized view, the extension properties in View Options, this can be done either through Designer, or edit the DDL. Refer to Define Materialized View in Designer if through Designer.
The loading and refreshing strategies controled by load scripts, there are two kinds of load scripts:
-
VDB Scope Scripts - VDB start script and VDB stop script which defined by
teiid_rel:ON_VDB_START_SCRIPT
andteiid_rel:ON_VDB_DROP_SCRIPT
correspondently, its executed in VDB deploying and removing. -
Procedure Scope Scripts - before load script, load script and after load script which defined by
teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT
,teiid_rel:MATVIEW_LOAD_SCRIPT
andteiid_rel:MATVIEW_AFTER_LOAD_SCRIPT
correspondently, these scripts be executed in a sequence by SYSADMIN.loadMatView, refer to Materialization View Loading for details.
To refresh/load the materialized view, the basic principles to define load script is:
-
Define truncate target/staging table sql in
teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT
-
Define insert into target/staging table sql in
teiid_rel:MATVIEW_LOAD_SCRIPT
-
Define alter stage table to target table logic in
teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT
.
If stage table is defined, you can define truncate staging table in before load script, define insert staging table in load script(if not defined, insert into stageTable select * from matview option nocache matview
will be set in runtime), define alter staging table to materialized table in after load script:
"teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT" 'execute accounts.native(''truncate table stageTable'');',
"teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT" 'begin execute accounts.native(''ALTER TABLE matTable RENAME TO matTable_temp'');execute accounts.native(''ALTER TABLE stageTable RENAME TO matTable'');execute accounts.native(''ALTER TABLE matTable_temp RENAME TO stageTable''); end',
If stage table is not defined, you can define truncate target table in before load script and define insert target table in load script:
"teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT" 'execute accounts.native(''truncate table matTable'');',
"teiid_rel:MATVIEW_LOAD_SCRIPT" 'insert into matTable select * from matview option nocache matview',
Note
|
That however may be too simplistic because your index creation may be more performant if deferred until after the table has been created. Also full snapshot refreshes are best done to a staging table then swapping it for the existing physical table to ensure that the refresh does not impact user queries and to ensure that the table is valid prior to use. |
For some sources, the effect of a RENAME operations may not be performed atomically with the other after load script operations. If this is the case, it will be possible for the materialization target table to not exist while being queried. Possible workarounds include:
-
It’s possible to send a set of commands to the source rather than as individual native queries. For example: 'execute accounts.native(''ALTER TABLE matTable RENAME TO matTable_temp; ALTER TABLE stageTable RENAME TO matTable; ALTER TABLE matTable_temp RENAME TO stageTable'')'
-
Use an alternative strategy based upon maintaining just a single table, such as updating based upon an UPSERT/MERGE and then a purge of entries that are no longer valid.
3. Use Materialized View
Once the Materialized View is completely defined, deploy it to a Teiid Server. Then create a new session and issue a query against Materialized View. You will find it’s time-saving and cpu-saving if your query sentences are complex and across multiple, heterogeneous data stores.
Typical JDG Usage Steps
1. Define JDG Caches
The typical usage of JDG in materialization is to configure a JDG server, which can be cluster aware, that will be accessed using the JDG Hot Rod Client. See the JDG Hot Rod DataSource for how to configure accessing the remote cache.
2. Create Materialized View
To configure for external materialization, see the HotRod Translator. This explains how to configure Materialization Management that is specific for using JDG remote cache and is essential for managing the underlying multiple caches needed in order to perform materialization.
3. Use Materialized View
Once the Materialized View is completely defined, deploy it to a Teiid Server. Then create a new session and issue a query against Materialized View. You will find it’s time-saving and cpu-saving if your query sentences are complex and across multiple, heterogeneous data stores.
Loading And Refreshing
System Procedures are used to Loading And Refreshing Materialized view in any time:
-
SYSADMIN.loadMatView - complete refresh the cache contents, reload the materialized table.
-
SYSADMIN.updateMatView - to partially update the cache contents, update a subset of the materialized table. When partial update is run the cache expiration time is renewed for new term based on Cache Hint again.
In Teiid, once a VDB be started, all External Materialized Views will be refreshed by SYSADMIN.loadMatView periodically with a time interval.
A sample VDB with external materialized view options
teiid-mat-example-vdb.xml is a sample VDB definition which contains six materialized view:
-
Model_A.VIEW_A - demonstrates
MATVIEW_BEFORE_LOAD_SCRIPT
,MATVIEW_AFTER_LOAD_SCRIPT
withMATERIALIZED_STAGE_TABLE
-
Model_B.VIEW_B - demonstrates
MATVIEW_LOAD_SCRIPT
withMATERIALIZED_STAGE_TABLE
-
Model_C.VIEW_C - demonstrates
MATVIEW_BEFORE_LOAD_SCRIPT
,MATVIEW_LOAD_SCRIPT
withoutMATERIALIZED_STAGE_TABLE
-
Model_D.VIEW_D - demonstrates
MATVIEW_AFTER_LOAD_SCRIPT
withMATERIALIZED_STAGE_TABLE
-
Model_E.VIEW_E - demonstrates the minmum options in materialized view
-
Model_F.VIEW_F - demonstrates the maxmum options in materialized view
Define Materialized View in Designer
Typical Usage Steps
-
Create materialized views and corresponding physical materialized target tables in Designer. This can be done through setting the materialized and target table manually, or by selecting the desired views, right clicking, then selecting Modeling→"Create Materialized Views"
-
Generate the DDL for your physical model materialization target tables. This can be done by selecting the model, right clicking, then choosing Export→"Metadata Modeling"→"Data Definition Language (DDL) File". This script can be used to create the desired schema for your materialization target on whatever source you choose.
-
Determine a load and refresh strategy. With the schema created the most simplistic approach is to just load the data. The load can even be done through Teiid with
insert into target_table select * from matview option nocache matview
Note
|
The Designer tooling for this feature is lacking at this moment but this will be added in coming releases. |
Materialization with Embedded Server
Views with extension properties in View Options and load scripts in Materialization Management defined, you can set up External Materialization with Embedded Server as below
EmbeddedServer server = new EmbeddedServer();
…
server.addConnectionFactory("name", Object);
…
server.addTranslator("name", ExecutionFactory);
EmbeddedConfiguration config = new EmbeddedConfiguration();
config.setTransactionManager(EmbeddedHelper.getTransactionManager());
server.start(config);
server.deployVDB("matView-vdb.xml");
Appendix-1: DDL for creating MatView Status Table
CREATE TABLE status
(
VDBName varchar(50) not null,
VDBVersion varchar(50) not null,
SchemaName varchar(50) not null,
Name varchar(256) not null,
TargetSchemaName varchar(50),
TargetName varchar(256) not null,
Valid boolean not null,
LoadState varchar(25) not null,
Cardinality long,
Updated timestamp not null,
LoadNumber long not null,
PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
);
CREATE TABLE status
(
VDBName varchar(50) not null,
VDBVersion integer not null,
SchemaName varchar(50) not null,
Name varchar(256) not null,
TargetSchemaName varchar(50),
TargetName varchar(256) not null,
Valid boolean not null,
LoadState varchar(25) not null,
Cardinality bigint,
Updated timestamp not null,
LoadNumber bigint not null,
PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
);