Query planner

For each sub-command in the user command an appropriate kind of sub-planner is used (relational, XML, procedure, etc).

Each planner has three primary phases:

  1. Generate canonical plan

  2. Optimization

  3. Plan to process converter — Converts plan data structure into a processing form.

Relational planner

A relational processing plan is created by the optimizer after the logical plan is manipulated by a series of rules. The application of rules is determined both by the query structure and by the rules themselves. The node structure of the debug plan resembles that of the processing plan, but the node types more logically represent SQL operations.

Canonical plan and all nodes

As described in the Planning overview, a SQL statement submitted to the query engine is parsed, resolved, validated, and rewritten before it is converted into a canonical plan form. The canonical plan form most closely resembles the initial SQL structure. A SQL select query has the following possible clauses (all but SELECT are optional): WITH, SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT. These clauses are logically executed in the following order:

  1. WITH (create common table expressions) — Processed by a specialized PROJECT NODE.

  2. FROM (read and join all data from tables) — Processed by a SOURCE node for each from clause item, or a Join node (if >1 table).

  3. WHERE (filter rows) — Processed by a SELECT node.

  4. GROUP BY (group rows into collapsed rows) — Processed by a GROUP node.

  5. HAVING (filter grouped rows) — Processed by a SELECT node.

  6. SELECT (evaluate expressions and return only requested rows) — Processed by a PROJECT node and DUP_REMOVE node (for SELECT DISTINCT).

  7. INTO — Processed by a specialized PROJECT with a SOURCE child.

  8. ORDER BY (sort rows) — Processed by a SORT node.

  9. LIMIT (limit result set to a certain range of results) — Processed by a LIMIT node.

For example, a SQL statement such as SELECT max(pm1.g1.e1) FROM pm1.g1 WHERE e2 = 1 creates a logical plan:

images/dv_query_plan.png

Project(groups=[anon_grp0], props={PROJECT_COLS=[anon_grp0.agg0 AS expr1]})
  Group(groups=[anon_grp0], props={SYMBOL_MAP={anon_grp0.agg0=MAX(pm1.G1.E1)}})
    Select(groups=[pm1.G1], props={SELECT_CRITERIA=pm1.G1.E2 = 1})
      Source(groups=[pm1.G1])

Here the Source corresponds to the FROM clause, the Select corresponds to the WHERE clause, the Group corresponds to the implied grouping to create the max aggregate, and the Project corresponds to the SELECT clause.

Note
The effect of grouping generates what is effectively an inline view, anon_grp0, to handle the projection of values created by the grouping.
Table 1. Node Types
Type Name Description

ACCESS

A source access or plan execution.

DUP_REMOVE

Removes duplicate rows

JOIN

A join (LEFT OUTER, FULL OUTER, INNER, CROSS, SEMI, and so forth).

PROJECT

A projection of tuple values

SELECT

A filtering of tuples

SORT

An ordering operation, which may be inserted to process other operations such as joins.

SOURCE

Any logical source of tuples including an inline view, a source access, XMLTABLE, and so forth.

GROUP

A grouping operation.

SET_OP

A set operation (UNION/INTERSECT/EXCEPT).

NULL

A source of no tuples.

TUPLE_LIMIT

Row offset / limit

Node properties

Each node has a set of applicable properties that are typically shown on the node.

Table 2. Access properties
Property Name Description

ATOMIC_REQUEST

The final form of a source request.

MODEL_ID

The metadata object for the target model/schema.

PROCEDURE_CRITERIA/PROCEDURE_INPUTS/PROCEDURE_DEFAULTS

Used in planning procedureal relational queries.

IS_MULTI_SOURCE

set to true when the node represents a multi-source access.

SOURCE_NAME

used to track the multi-source source name.

CONFORMED_SOURCES

tracks the set of conformed sources when the conformed extension metadata is used.

SUB_PLAN/SUB_PLANS

used in multi-source planning.

Table 3. Set operation properties
Property Name Description

SET_OPERATION/USE_ALL

defines the set operation(UNION/INTERSECT/EXCEPT) and if all rows or distinct rows are used.

Table 4. Join properties
Property Name Description

JOIN_CRITERIA

All join predicates.

JOIN_TYPE

Type of join (INNER, LEFT OUTER, and so forth).

JOIN_STRATEGY

The algorithm to use (nested loop, merge, and so forth).

LEFT_EXPRESSIONS

The expressions in equi-join predicates that originate from the left side of the join.

RIGHT_EXPRESSIONS

The expressions in equi-join predicates that originate from the right side of the join.

DEPENDENT_VALUE_SOURCE

set if a dependent join is used.

NON_EQUI_JOIN_CRITERIA

Non-equi join predicates.

SORT_LEFT

If the left side needs sorted for join processing.

SORT_RIGHT

If the right side needs sorted for join processing.

IS_OPTIONAL

If the join is optional.

IS_LEFT_DISTINCT

If the left side is distinct with respect to the equi join predicates.

IS_RIGHT_DISTINCT

If the right side is distinct with respect to the equi join predicates.

IS_SEMI_DEP

If the dependent join represents a semi-join.

PRESERVE

If the preserve hint is preserving the join order.

Table 5. Project properties
Property Name Description

PROJECT_COLS

The expressions projected.

INTO_GROUP

The group targeted if this is a select into or insert with a query expression.

HAS_WINDOW_FUNCTIONS

True if window functions are used.

CONSTRAINT

The constraint that must be met if the values are being projected into a group.

UPSERT

If the insert is an upsert.

Table 6. Select properties
Property Name Description

SELECT_CRITERIA

The filter.

IS_HAVING

If the filter is applied after grouping.

IS_PHANTOM

True if the node is marked for removal, but temporarily left in the plan.

IS_TEMPORARY

Inferred criteria that may not be used in the final plan.

IS_COPIED

If the criteria has already been processed by rule copy criteria.

IS_PUSHED

If the criteria is pushed as far as possible.

IS_DEPENDENT_SET

If the criteria is the filter of a dependent join.

Table 7. Sort properties
Property Name Description

SORT_ORDER

The order by that defines the sort.

UNRELATED_SORT

If the ordering includes a value that is not being projected.

IS_DUP_REMOVAL

If the sort should also perform duplicate removal over the entire projection.

Table 8. Source properties
Property Name Description

SYMBOL_MAP

The mapping from the columns above the source to the projected expressions. Also present on Group nodes.

PARTITION_INFO

The partitioning of the union branches.

VIRTUAL_COMMAND

If the source represents an view or inline view, the query that defined the view.

MAKE_DEP

Hint information.

PROCESSOR_PLAN

The processor plan of a non-relational source(typically from the NESTED_COMMAND).

NESTED_COMMAND

The non-relational command.

TABLE_FUNCTION

The table function (XMLTABLE, OBJECTTABLE, and so forth.) defining the source.

CORRELATED_REFERENCES

The correlated references for the nodes below the source.

MAKE_NOT_DEP

If make not dep is set.

INLINE_VIEW

If the source node represents an inline view.

NO_UNNEST

If the no_unnest hint is set.

MAKE_IND

If the make ind hint is set.

SOURCE_HINT

The source hint. See Federated optimizations.

ACCESS_PATTERNS

Access patterns yet to be satisfied.

ACCESS_PATTERN_USED

Satisfied access patterns.

REQUIRED_ACCESS_PATTERN_GROUPS

Groups needed to satisfy the access patterns. Used in join planning.

Note
Many source properties also become present on associated access nodes.
Table 9. Group properties
Property Name Description

GROUP_COLS

The grouping columns.

ROLLUP

If the grouping includes a rollup.

Table 10. Tuple limit properties
Property Name Description

MAX_TUPLE_LIMIT

Expression that evaluates to the max number of tuples generated.

OFFSET_TUPLE_COUNT

Expression that evaluates to the tuple offset of the starting tuple.

IS_IMPLICIT_LIMIT

If the limit is created by the rewriter as part of a subquery optimization.

IS_NON_STRICT

If the unordered limit should not be enforced strictly.

Table 11. General and costing properties
Property Name Description

OUTPUT_COLS

The output columns for the node. Is typically set after rule assign output elements.

EST_SET_SIZE

Represents the estimated set size this node would produce for a sibling node as the independent node in a dependent join scenario.

EST_DEP_CARDINALITY

Value that represents the estimated cardinality (amount of rows) produced by this node as the dependent node in a dependent join scenario.

EST_DEP_JOIN_COST

Value that represents the estimated cost of a dependent join (the join strategy for this could be Nested Loop or Merge).

EST_JOIN_COST

Value that represents the estimated cost of a merge join (the join strategy for this could be Nested Loop or Merge).

EST_CARDINALITY

Represents the estimated cardinality (amount of rows) produced by this node.

EST_COL_STATS

Column statistics including number of null values, distinct value count, and so forth.

EST_SELECTIVITY

Represents the selectivity of a criteria node.

Rules

Relational optimization is based upon rule execution that evolves the initial plan into the execution plan. There are a set of pre-defined rules that are dynamically assembled into a rule stack for every query. The rule stack is assembled based on the contents of the user’s query and the views/procedures accessed. For example, if there are no view layers, then rule Merge Virtual, which merges view layers together, is not needed and will not be added to the stack.  This allows the rule stack to reflect the complexity of the query.

Logically the plan node data structure represents a tree of nodes where the source data comes up from the leaf nodes (typically Access nodes in the final plan), flows up through the tree and produces the user’s results out the top. The nodes in the plan structure can have bidirectional links, dynamic properties, and allow any number of child nodes. Processing plans in contrast typically have fixed properties.

Plan rule manipulate the plan tree, fire other rules, and drive the optimization process. Each rule is designed to perform a narrow set of tasks. Some rules can be run multiple times. Some rules require a specific set of precursors to run properly.

  • Access Pattern Validation — Ensures that all access patterns have been satisfied.

  • Apply Security — Applies row and column level security.

  • Assign Output Symbol — This rule walks top down through every node and calculates the output columns for each node. Columns that are not needed are dropped at every node, which is known as projection minimization. This is done by keeping track of both the columns needed to feed the parent node and also keeping track of columns that are "created" at a certain node.

  • Calculate Cost — Adds costing information to the plan

  • Choose Dependent — This rule looks at each join node and determines whether the join should be made dependent and in which direction. Cardinality, the number of distinct values, and primary key information are used in several formulas to determine whether a dependent join is likely to be worthwhile. The dependent join differs in performance ideally because a fewer number of values will be returned from the dependent side.

    Also, we must consider the number of values passed from independent to dependent side. If that set is larger than the maximum number of values in an IN criteria on the dependent side, then we must break the query into a set of queries and combine their results. Executing each query in the connector has some overhead and that is taken into account. Without costing information a lot of common cases where the only criteria specified is on a non-unique (but strongly limiting) field are missed.

    A join is eligible to be dependent if:

    • There is at least one equi-join criterion, for example, tablea.col = tableb.col

    • The join is not a full outer join and the dependent side of the join is on the inner side of the join.

The join will be made dependent if one of the following conditions, listed in precedence order, holds:

  • There is an unsatisfied access pattern that can be satisfied with the dependent join criteria.

  • The potential dependent side of the join is marked with an option makedep.

  • (4.3.2) if costing was enabled, the estimated cost for the dependent join (5.0+ possibly in each direction in the case of inner joins) is computed and compared to not performing the dependent join. If the costs were all determined (which requires all relevant table cardinality, column ndv, and possibly nnv values to be populated) the lowest is chosen.

  • If key metadata information indicates that the potential dependent side is not "small" and the other side is "not small" or (5.0.1) the potential dependent side is the inner side of a left outer join.

Dependent join is the key optimization we use to efficiently process multi-source joins. Instead of reading all of source A and all of source B and joining them on A.x = B.x, we read all of A, and then build a set of A.x that are passed as a criteria when querying B. In cases where A is small and B is large, this can drastically reduce the data retrieved from B, thus greatly speeding the overall query.

  • Choose Join Strategy — Choose the join strategy based upon the cost and attributes of the join.

  • Clean Criteria — Removes phantom criteria.

  • Collapse Source — Takes all of the nodes below an access node and creates a SQL query representation.

  • Copy Criteria — This rule copies criteria over an equality criteria that is present in the criteria of a join. Since the equality defines an equivalence, this is a valid way to create a new criteria that may limit results on the other side of the join (especially in the case of a multi-source join).

  • Decompose Join — This rule performs a partition-wise join optimization on joins of a partitioned union. For more information, see Partitioned unions in Federated optimizations. The decision to decompose is based upon detecting that each side of the join is a partitioned union (note that non-ANSI joins of more than 2 tables may cause the optimization to not detect the appropriate join). The rule currently only looks for situations where at most 1 partition matches from each side.

  • Implement Join Strategy — Adds necessary sort and other nodes to process the chosen join strategy

  • Merge Criteria — Combines select nodes

  • Merge Virtual — Removes view and inline view layers

  • Place Access — Places access nodes under source nodes. An access node represents the point at which everything below the access node gets pushed to the source or is a plan invocation. Later rules focus on either pushing under the access or pulling the access node up the tree to move more work down to the sources. This rule is also responsible for placing access patterns. For more information, see Access patterns in Federated optimizations

  • Plan Joins — This rule attempts to find an optimal ordering of the joins performed in the plan, while ensuring that access pattern dependencies are met. This rule has three main steps.

    1. It must determine an ordering of joins that satisfy the access patterns present.

    2. It will heuristically create joins that can be pushed to the source (if a set of joins are pushed to the source, we will not attempt to create an optimal ordering within that set. More than likely it will be sent to the source in the non-ANSI multi-join syntax and will be optimized by the database).

    3. It will use costing information to determine the best left-linear ordering of joins performed in the processing engine. This third step will do an exhaustive search for 7 or less join sources and is heuristically driven by join selectivity for 8 or more sources.

  • Plan Outer Joins — Reorders outer joins as permitted to improve push down.

  • Plan Procedures — Plans procedures that appear in procedural relational queries.

  • Plan Sorts — Optimizations around sorting, such as combining sort operations or moving projection.

  • Plan Subqueries — New for Teiid 12. Generalizes the subquery optimization that was performed in Merge Criteria to allow for the creation of join plans from subqueries in both projection and filtering.

  • Plan Unions — Reorders union children for more pushdown.

  • Plan Aggregates — Performs aggregate decomposition over a join or union.

  • Push Limit — Pushes the affect of a limit node further into the plan.

  • Push Non-Join Criteria — This rule will push predicates out of an on clause if it is not necessary for the correctness of the join.

  • Push Select Criteria — Push select nodes as far as possible through unions, joins, and views layers toward the access nodes. In most cases movement down the tree is good as this will filter rows earlier in the plan. We currently do not undo the decisions made by Push Select Criteria.  However in situations where criteria cannot be evaluated by the source, this can lead to sub-optimal plans.

  • Push Large IN — Push IN predicates that are larger than the translator can process directly to be processed as a dependent set.

One of the most important optimization related to pushing criteria, is how the criteria will be pushed through join. Consider the following plan tree that represents a subtree of the plan for the query select * from A inner join b on (A.x = B.x) where B.y = 3

    SELECT (B.y = 3)
           |
          JOIN - Inner Join on (A.x = B.x)
         /     \
      SRC (A)   SRC (B)
Note
SELECT nodes represent criteria, and SRC stands for SOURCE.

It is always valid for inner join and cross joins to push (single source) criteria that are above the join, below the join. This allows for criteria originating in the user query to eventually be present in source queries below the joins. This result can be represented visually as:

    JOIN - Inner Join on (A.x = B.x)
          /    \
         /   SELECT (B.y = 3)
        |        |
      SRC (A)   SRC (B)

The same optimization is valid for criteria specified against the outer side of an outer join. For example:

     SELECT (B.y = 3)
           |
          JOIN - Right Outer Join on (A.x = B.x)
         /     \
      SRC (A)   SRC (B)

Becomes

          JOIN - Right Outer Join on (A.x = B.x)
          /    \
         /   SELECT (B.y = 3)
        |        |
      SRC (A)   SRC (B)

However criteria specified against the inner side of an outer join needs special consideration. The above scenario with a left or full outer join is not the same. For example:

      SELECT (B.y = 3)
           |
          JOIN - Left Outer Join on (A.x = B.x)
         /     \
      SRC (A)   SRC (B)

Can become (available only after 5.0.2):

    JOIN - Inner Join on (A.x = B.x)
          /    \
         /   SELECT (B.y = 3)
        |        |
      SRC (A)   SRC (B)

Since the criterion is not dependent upon the null values that may be populated from the inner side of the join, the criterion is eligible to be pushed below the join — but only if the join type is also changed to an inner join. On the other hand, criteria that are dependent upon the presence of null values CANNOT be moved. For example:

    SELECT (B.y is null)
           |
          JOIN - Left Outer Join on (A.x = B.x)
         /     \
      SRC (A)   SRC (B)

The preceding plan tree must have the criteria remain above the join, becuase the outer join may be introducing null values itself.

  • Raise Access — This rule attempts to raise the Access nodes as far up the plan as posssible. This is mostly done by looking at the source’s capabilities and determining whether the operations can be achieved in the source or not.

  • Raise Null — Raises null nodes. Raising a null node removes the need to consider any part of the old plan that was below the null node.

  • Remove Optional Joins — Removes joins that are marked as or determined to be optional.

  • Substitute Expressions — Used only when a function based index is present.

  • Validate Where All — Ensures criteria is used when required by the source.

Cost calculations

The cost of node operations is primarily determined by an estimate of the number of rows (also referred to as cardinality) that will be processed by it. The optimizer will typically compute cardinalities from the bottom up of the plan (or subplan) at several points in time with planning — once generally with rule calculate cost, and then specifically for join planning and other decisions. The cost calculation is mainly directed by the statistics set on physical tables (cardinality, NNV, NDV, and so forth) and is also influenced by the presence of constraints (unique, primary key, index, and so forth). If there is a situation that seems like a sub-optimal plan is being chosen, you should first ensure that at least representative table cardinalities are set on the physical tables involved.

Reading a debug plan

As each relational sub plan is optimized, the plan will show what is being optimized and it’s canonical form:

OPTIMIZE:
SELECT e1 FROM (SELECT e1 FROM pm1.g1) AS x

----------------------------------------------------------------------------
GENERATE CANONICAL:
SELECT e1 FROM (SELECT e1 FROM pm1.g1) AS x

CANONICAL PLAN:
Project(groups=[x], props={PROJECT_COLS=[e1]})
  Source(groups=[x], props={NESTED_COMMAND=SELECT e1 FROM pm1.g1, SYMBOL_MAP={x.e1=e1}})
    Project(groups=[pm1.g1], props={PROJECT_COLS=[e1]})
      Source(groups=[pm1.g1])

With more complicated user queries, such as a procedure invocation or one containing subqueries, the sub-plans may be nested within the overall plan. Each plan ends by showing the final processing plan:

----------------------------------------------------------------------------
OPTIMIZATION COMPLETE:
PROCESSOR PLAN:
AccessNode(0) output=[e1] SELECT g_0.e1 FROM pm1.g1 AS g_0

The affect of rules can be seen by the state of the plan tree before and after the rule fires. For example, the debug log below shows the application of rule merge virtual, which will remove the "x" inline view layer:

EXECUTING AssignOutputElements

AFTER:
Project(groups=[x], props={PROJECT_COLS=[e1], OUTPUT_COLS=[e1]})
  Source(groups=[x], props={NESTED_COMMAND=SELECT e1 FROM pm1.g1, SYMBOL_MAP={x.e1=e1}, OUTPUT_COLS=[e1]})
    Project(groups=[pm1.g1], props={PROJECT_COLS=[e1], OUTPUT_COLS=[e1]})
      Access(groups=[pm1.g1], props={SOURCE_HINT=null, MODEL_ID=Schema name=pm1, nameInSource=null, uuid=3335, OUTPUT_COLS=[e1]})
        Source(groups=[pm1.g1], props={OUTPUT_COLS=[e1]})


============================================================================
EXECUTING MergeVirtual

AFTER:
Project(groups=[pm1.g1], props={PROJECT_COLS=[e1], OUTPUT_COLS=[e1]})
  Access(groups=[pm1.g1], props={SOURCE_HINT=null, MODEL_ID=Schema name=pm1, nameInSource=null, uuid=3335, OUTPUT_COLS=[e1]})
    Source(groups=[pm1.g1])

Some important planning decisions are shown in the plan as they occur as an annotation. For example, the following code snippet shows that the access node could not be raised, because the parent SELECT node contained an unsupported subquery.

Project(groups=[pm1.g1], props={PROJECT_COLS=[e1], OUTPUT_COLS=null})
  Select(groups=[pm1.g1], props={SELECT_CRITERIA=e1 IN /*+ NO_UNNEST */ (SELECT e1 FROM pm2.g1), OUTPUT_COLS=null})
    Access(groups=[pm1.g1], props={SOURCE_HINT=null, MODEL_ID=Schema name=pm1, nameInSource=null, uuid=3341, OUTPUT_COLS=null})
      Source(groups=[pm1.g1], props={OUTPUT_COLS=null})


============================================================================
EXECUTING RaiseAccess
LOW Relational Planner SubqueryIn is not supported by source pm1 - e1 IN /*+ NO_UNNEST */ (SELECT e1 FROM pm2.g1) was not pushed

AFTER:
Project(groups=[pm1.g1])
  Select(groups=[pm1.g1], props={SELECT_CRITERIA=e1 IN /*+ NO_UNNEST */ (SELECT e1 FROM pm2.g1), OUTPUT_COLS=null})
    Access(groups=[pm1.g1], props={SOURCE_HINT=null, MODEL_ID=Schema name=pm1, nameInSource=null, uuid=3341, OUTPUT_COLS=null})
      Source(groups=[pm1.g1])
Procedure planner

The procedure planner is fairly simple. It converts the statements in the procedure into instructions in a program that will be run during processing. This is mostly a 1-to-1 mapping and very little optimization is performed.

XQuery

XQuery is eligible for specific optimizations. For more information, see XQuery optimization. Document projection is the most common optimization. It will be shown in the debug plan as an annotation. For example, with the user query that contains "xmltable('/a/b' passing doc columns x string path '@x', val string path '.')", the debug plan would show a tree of the document that will effectively be used by the context and path XQuerys:

MEDIUM XQuery Planning Projection conditions met for /a/b - Document projection will be used
child element(Q{}a)
  child element(Q{}b)
    attribute attribute(Q{}x)
      child text()
    child text()

results matching ""

    No results matching ""