MetaMatrix Features
SQL Support
MetaMatrix supports SQL, both for issuing federated queries to MetaMatrix Query and MetaMatrix Enterprise, and for defining view transformations in MetaMatrix Designer; see also Procedure Language for how SQL is used in virtual procedures and update procedures.
Data Types
MetaMatrix uses runtime types when executing queries. These are different than semantic types defined in type fields in the MetaMatrix Designer. The runtime type is specified in the runtime type property in the Designer. The ODBC type is also used in API type mappings on top of the JDBC type from which they are derived. Data types can be converted as described in the Data Type Conversions page.
MetaMatrix runtime datatypes consist of the following:
| Type | Value | Java Class | JDBC Type | ODBC Type |
|---|---|---|---|---|
| string | string of characters | java.lang.String | VARCHAR | VARCHAR |
| char | a single Unicode character | java.lang.Character | JAVA_OBJECT | VARCHAR |
| boolean | a single bit, or Boolean, with two possible values | java.lang.Boolean | BIT | SMALLINT |
| byte | numeric, integral type, signed 8-bit | java.lang.Byte | TINYINT | SMALLINT |
| short | numeric, integral type, signed 16-bit | java.lang.Short | SMALLINT | SMALLINT |
| integer | numeric, integral type, signed 32-bit | java.lang.Integer | INTEGER | INTEGER |
| long | numeric, integral type, signed 64-bit | java.lang.Long | BIGINT | INTEGER |
| biginteger | numeric, integral type, arbitrary precision | java.math.BigInteger | NUMERIC | NUMERIC |
| float | numeric, floating point type, 32-bit IEEE 754 floating-point numbers | java.lang.Float | REAL | FLOAT |
| double | numeric, floating point type, 64-bit IEEE 754 floating-point numbers | java.lang.Double | DOUBLE | DOUBLE |
| bigdecimal | numeric, floating point type, arbitrary precision | java.math.BigDecimal | NUMERIC | NUMERIC |
| date | datetime, representing a single day (year, month, day) | java.sql.Date | DATE | DATE |
| time | datetime, representing a single time (hours, minutes, seconds, milliseconds) | java.sql.Time | TIME | TIME |
| timestamp | datetime, representing a single date and time (year, month, day, hours, minutes, seconds, milliseconds, nanoseconds) | java.sql.Timestamp | TIMESTAMP | TIMESTAMP |
| object | any arbitrary Java object, must implement java.lang.Serializable | java.lang.Object | JAVA_OBJECT | VARCHAR |
| blob | binary large object, representing a stream of bytes | java.sql.Blob | BLOB | BLOB |
| clob | character large object, representing a stream of characters | java.sql.Clob | CLOB | CLOB |
| xml | XML document, which may be retrieved either as a string (in materialized form) or a clob (in stream form) | java.lang.String (default) or java.sql.Clob | VARCHAR | VARCHAR |
Identifiers
SQL commands contain references to tables and columns. These references are in the form of identifiers, which uniquely identify the tables and columns in the context of the command. All queries in MetaMatrix are processed in the context of a virtual database, or VDB. Because MetaMatrix federates information across multiple sources, tables and columns must be scoped in some manner to avoid conflicts. This scoping is provided by models, which contain the information for each data source or set of views.
Fully-qualified table and column names are of the following form, where the separate 'parts' of the identifier are delimited by periods.
- TABLE: <model_name>.<table_spec>
- COLUMN: <model_name>.<table_spec>.<column_name>
Because different data sources organize tables in different ways, some prepending catalog or schema or user information, MetaMatrix allows the 'table_spec' to be a dot-delimited construct.
Some examples of valid fully-qualified table identifiers are:
- MyModel.MySchema.Portfolios
- MyModel.Inventory
- MyModel.MyCatalog.dbo.Authors
Some examples of valid fully-qualified column identifiers are:
- MyModel.MySchema.Portfolios.portfolioID
- MyModel.Inventory.totalPallets
- MyModel.MyCatalog.dbo.Authors.lastName
Fully-qualified identifiers can always be used in SQL commands. Partially- or unqualified forms can also be used, as long as the resulting names are unambiguous in the context of the command. Different forms of qualification can be mixed in the same query.
Examples:
- "MyModel"."MySchema.Portfolios"."portfolioID"
- "MyModel"."Inventory"."totalPallets"
- "MyModel"."MyCatalog.dbo.Authors"."lastName"
Expressions
Identifiers, literals, and functions can be combined into expressions. Expressions can be used almost anywhere in a query -- SELECT, FROM (if specifying join criteria, WHERE, GROUP BY, HAVING. You cannot use expressions in an ORDER BY clause.
MetaMatrix supports the following types of expressions:
Column Identifiers
Column identifiers are used to specify the output columns in SELECT statements, the columns and their values for INSERT and UPDATE statements, and criteria used in WHERE and FROM clauses. They are also used in GROUP BY, HAVING, and ORDER BY clauses.
The syntax for column identifiers was defined in the Identifiers section above.
Aliases
Both tables and columns may be aliased in a SELECT query. Alias names must follow the same naming rule as identifier names (as they will be used as identifiers). Tables are aliased in the FROM clause and columns are aliased in the SELECT clause. To alias an identifier, use the AS keyword and specify the alias after the identifier. The AS keyword is optional.
Table names used in a query can be aliased in the FROM clause. Once a table is aliased, the table alias must be used to identify columns elsewhere in the query; the original table name is not accepted.
Column names used in a query can be aliased in the SELECT clause. These aliases apply only to the output column names of the overall query and the ORDER BY clause. They cannot be used in other clauses of the query.
In the example above, notice the following things:
- Logistics.Orders table is aliased to 'o' using the AS keyword
- Inventory.Parts table is aliased to 'p' and the AS keyword is omitted
- Column o.orderID is aliased as ID using the AS keyword
- Column p.color is aliased as PartColor and the AS keyword is omitted
- The aliased column PartColor is used in the ORDER BY clause
- The unaliased column p.color is used in the WHERE clause
Literals and Nulls
Literal values represent fixed values. These can any of the 'standard' data types: string, integer, double, boolean, date, time, timestamp. Some examples of simple literal values are:
- ‘abc’
- ‘isn’’t true’ - use an extra single tick to escape a tick in a string with single ticks
- 5
- -37.75e01
- 100.0
- true
- false
- '\27'
Integer values will be assigned an integral data type big enough to hold the value (integer, long, or biginteger). Floating point values will always be parsed as a double.
MetaMatrix provides a syntax for specifying some non-string literal values using a special literal syntax:
- {b'true'} - boolean
- {d'2006-10-12'} - date (YMD)
- {t'10:59:30'} - time, second precision (HMS)
- {ts'2006-12-24 11:59:59.999'} - timestamp, millisecond precision
The keyword 'null' is used to represent an absent or unknown value and is inherently untyped. In many cases, a null literal value will be assigned an implied type based on context. For example, in the function '5 + null', the null value will be assigned the type 'integer' to match the type of the value '5'. A null literal used in the SELECT clause of a query with no implied context will be assigned to type 'string'.
Scalar Functions
MetaMatrix supports the use of scalar functions -- functions that return a single scalar value -- in SQL statements. MetaMatrix provides an extensive set of built-in functions, covering all the functions commonly used in JDBC and SQL. In addition, MetaMatrix provides the capability for users to define their own custom functions, and subsequently use those in queries.
Aggregate Functions
Aggregate functions take sets of values from a group produced by an explicit or implicit GROUP BY and return a single scalar value computed from the group. MetaMatrix supports the following aggregate functions:
- COUNT(*) – count the number of values (including nulls and duplicates) in a group
- COUNT(expression) – count the number of values (excluding nulls) in a group
- SUM(expression) – sum of the values (excluding nulls) in a group
- AVG(expression) – average of the values (excluding nulls) in a group
- MIN(expression) – minimum value in a group (excluding null)
- MAX(expression) – maximum value in a group (excluding null)
Additionally, some aggregate functions may contain a keyword 'DISTINCT' before the expression, indicating that duplicate expression values should be ignored. DISTINCT is not allowed in COUNT(*) and is not meaningful in MIN or MAX (result would be unchanged), so it can be used in COUNT, SUM, and AVG.
Aggregate functions may only be used in the HAVING or SELECT clauses and may not be nested within another aggregate function.
Aggregate functions may be nested inside other functions.
For more information on aggregates, see the sections on GROUP BY or HAVING.
Case and Searched Case Expressions
MetaMatrix supports two forms of the CASE expression which allows conditional logic in a scalar expression. The two forms are:
- CASE <expr> ( WHEN <expr> THEN <expr>)+ [ELSE expr] END
- CASE ( WHEN <criteria> THEN <expr>)+ [ELSE expr] END
Each form allows for an output based on conditional logic. The first form starts with an initial expression and evaluates WHEN expressions until the values match, and outputs the THEN expression. If no WHEN is matched, the ELSE expression is output. If no WHEN is matched and no ELSE is specified, a null literal value is output.
The second form (the searched case expression) searches the WHEN clauses, which specify an arbitrary criteria to evaluate. If any criteria evaluates to true, the THEN expression is evaluated and output. If no WHEN is true, the ELSE is evaluated or NULL is output if none exists.
Scalar Subqueries
Subqueries can be used to produce a single scalar value in the WHERE or HAVING clauses only. A scalar subquery must have a single column in the SELECT clause and should return either 0 or 1 row. If no rows are returned, null will be returned as the scalar subquery value.
For other types of subqueries, see the Subqueries section below.
Parameter References
Parameters are specified using a '?' symbol. Parameters may only be used with PreparedStatement or CallableStatements in JDBC. Each parameter is linked to a value specified by 1-based index in the JDBC API.
Criteria
Criteria are of two basic forms:
- Predicates that evaluate to true or false
- Logical criteria that combine predicates (AND, OR, NOT)
MetaMatrix supports the following predicates:
- expression (=|<>|!=|<|>|<=|>=) expression
- expression (=|<>|!=|<|>|<=|>=) (ANY|ALL|SOME) expression
- expression [NOT] IS NULL
- expression [NOT] LIKE expression [ESCAPE char]
- expression [NOT] IN (expression, ...)
- expression [NOT] IN (subquery)
- EXISTS(subquery)
- expression BETWEEN minExpression AND maxExpression
All predicates and criteria can be used in the FROM clause (for join criteria), WHERE clause, HAVING clause, or CASE expressions.
Some examples of valid criteria are:
- (balance > 2500.0)
- 100*(50 - x)/(25 - y) > z
- concat(areaCode,concat('-',phone)) LIKE '314'
SQL Commands
There are 4 basic commands for manipulating data in SQL, corresponding to the CRUD create, read, update, and delete operations: INSERT, SELECT, UPDATE, and DELETE. In addition, procedures can be executed using the EXECUTE command.
SELECT Command
The SELECT command is used to retrieve records from one or more tables.
A SELECT command has a number of clauses:
- SELECT ...
- [FROM ...]
- [WHERE ...]
- [GROUP BY ...]
- [HAVING ...]
- [ORDER BY ...]
- [LIMIT [offset,] limit]
- [OPTION ...]
The order of the clauses in a query is as follows:
All of these clauses other than OPTION are defined by the SQL specification. The specification also specifies the order that these clauses will be logically processed. Below is the processing order where each stage passes a set of rows to the following stage. Note that this processing model is logical and does not represent the way any actual database engine performs the processing, although it is a useful model for understanding questions about SQL.
- FROM stage - gathers all rows from all tables involved in the query and logically joins them with a Cartesian product, producing a single large table with all columns from all tables. Joins and join criteria are then applied to filter rows that do not match the join structure.
- WHERE stage - applies a criteria to every output row from the FROM stage, further reducing the number of rows
- GROUP BY stage - groups sets of rows with matching values in the group by columns.
- HAVING stage - applies criteria to each group of rows. Criteria can only be applied to columns that will have constant values within a group (those in the grouping columns or aggregate functions applied across the group).
- SELECT stage - specifies the column expressions that should be returned from the query. Expressions are evaluated, including aggregate functions based on the groups of rows, which will no longer exist after this point. The output columns are named using either column aliases or an implicit name determined by the engine. If SELECT DISTINCT is specified, duplicate removal will be performed on the rows being returned from the SELECT stage.
- ORDER BY stage - sorts the rows returned from the SELECT stage as desired. Supports sorting on multiple columns in specified order, ascending or descending. The output columns will be identical to those columns returned from the SELECT stage and will have the same name.
- LIMIT stage - returns only the specified rows (with skip and limit values).
This model can be used to understand many questions about SQL. For example, columns aliased in the SELECT clause can only be referenced by alias in the ORDER BY clause. Without knowledge of the processing model, this can be somewhat confusing. Seen in light of the model, it is clear that the ORDER BY stage is the only stage occurring after the SELECT stage, which is where the columns are named. Because the WHERE clause is processed before the SELECT, the columns have not yet been named and the aliases are not yet known.
INSERT Command
The INSERT command is used to add a record to a table.
- INSERT INTO table (column,...) VALUES (value,...)
UPDATE Command
The UPDATE command is used to modify records in a table. The operation may result in 1 or more records being updated, or in no records being updated if none match the criteria.
- UPDATE table SET (column=value,...) [WHERE criteria]
DELETE Command
The DELETE command is used to remove records from a table. The operation may result in 1 or more records being deleted, or in no records being deleted if none match the criteria.
- DELETE FROM table [WHERE criteria]
EXECUTE Command
The EXECUTE command is used to execute a procedure, such as a virtual procedure or a stored procedure. Procedures may have zero or more scalar input parameters. The return value from a procedure is a result set, the same as is returned from a SELECT. Note that EXEC can be used as a short form of this command.
- EXECUTE proc()
- EXECUTE proc(param1)
- EXECUTE proc(param1,param2)
- EXECUTE proc(name1=param1,name4=param4,name7=param7)
- EXEC proc( param1, param2, param3 )
Temp Tables
MetaMatrix supports creating temporary,or "temp", tables. Temp tables are dynamically created, but are treated as any other physical table.
Temp tables can be defined implicitly by referencing them in a SELECT INTO or in an INSERT statement or explcitly with a CREATE TABLE statement. Implicitly created temp tables must have a name that starts with '#'.
Creation syntax:
- CREATE LOCAL TEMPORARY TABLE <temporary table name> (<column name> <data type>,...)
- SELECT <element name>,...INTO <temporary table name> FROM <table name>
- INSERT INTO <temporary table name> ((<column name>,...)VALUES (<value>,...)
Drop syntax:
- DROP TABLE <temporary table name>
Limitations:
- With the CREATE TABLE syntax only basic table definition (column name and type information) is supported.
- The "ON COMMIT" clause is not supported in the CREATE TABLE statement.
- "drop behavior" option is not supported in the drop statement.
- Only local temporary tables are supported. This implies that the scope of temp table will be either to the sesssion or the block of a virtual procedure that creates it.
- Session level temp tables are not fail-over safe.
- temp tables are non-transactional.
- Temp tables do not support update or delete operations.
The following example loads a temporary table with data from 2 sources, and with a manually inserted record, and then uses that temp table in a subsequent query. Note that the #temp temporary table is not accessable outside of the procedure block that creates it.
SQL Clauses
This section describes the clauses that are used in the various SQL commands described in the previous section. Nearly all these features follow standard SQL syntax and functionality, so any SQL reference can be used for more information.
SELECT Clause
SQL queries start with the SELECT keyword and are often referred to as "SELECT statements" MetaMatrix supports most of the standard SQL query constructs.
- SELECT *
- SELECT table.*
- SELECT [DISTINCT] expression [AS name]
- SELECT CASE [expression] {WHEN criteria THEN expression}* END
- SELECT COUNT *
- SELECT {COUNT|SUM|AVG|MIN|MAX} ([DISTINCT] expression)
FROM Clause
The FROM clause specifies the target table(s) for SELECT, UPDATE, and DELETE statements.
- FROM {table [AS alias]}*
- FROM table1 [INNER|LEFT OUTER|RIGHT OUTER|FULL OUTER] JOIN table1 ON join-criteria
- FROM join-spec [INNER|LEFT OUTER|RIGHT OUTER|FULL OUTER] JOIN table1 ON join-criteria
- FROM table1 CROSS JOIN table1
- FROM (subquery) AS alias
- FROM table1 JOIN table2 MAKEDEP ON join-criteria
- FROM table1 JOIN table2 MAKENOTDEP ON join-criteria (NEW in 5.0)
WHERE Clause
The WHERE clause defines the criteria to limit the records affected by SELECT, UPDATE, and DELETE statements.
The general form of the WHERE is
- WHERE [NOT] criteria
- WHERE (([NOT]criteria) {AND|OR} ([NOT] criteria))*
where criteria is of the form:
- expression1 {=|<>|<|>|<=|>=} expression2
- expression1 [NOT]{LIKE|IN} expression2
- expression1 [NOT]BETWEEN expression2
- expression IS [NOT] NULL
- expression IN (subquery)
- expression {=|<>|<|>|<=|>=} (scalar subquery)
- expression {=|<>|<|>|<=|>=} [ANY|SOME] (subquery)
- EXISTS (subquery)
GROUP BY Clause
The GROUP BY clause denotes that rows should be grouped according to the specified expression values. One row will be returned for each group, after optionally filtering those aggregate rows based on a HAVING clause.
- GROUP BY (column1,)*
- GROUP BY expression
If an aggregate function is used in the SELECT clause and no GROUP BY is specified, an implicit GROUP BY will be performed with the entire result set as a single group. In this case, every column in the SELECT must be an aggregate function as no other column value will be fixed across the entire group.
HAVING Clause
The HAVING clause operates exactly as a WHERE clause although it operates on the output of a GROUP BY. It supports the same syntax as the WHERE clause.
Expressions can contain aggregate functions: COUNT, AVG, SUM, MIN, MAX.
ORDER BY Clause
The ORDER BY clause specifies how the returned records from a SELECT should be sorted. The options are ASC (ascending) and DESC (descending).
- ORDER BY column1 [ASC|DESC], ...
- ORDER BY 3 [ASC|DESC], ...
UNION Clause
The UNION keyword denotes the unioning of two sets of records with the same column structures.
- SELECT ... UNION [ALL] SELECT ... [ORDER BY...]
The output columns will be named by the output columns of the first SELECT in the union. Each SELECT must have the same number of output columns and compatible data types for each relative column. Data type conversion will be performed if data types are inconsistent and implicit conversions exist.
LIMIT Clause
The LIMIT keyword specifies a limit on the number of records returned from the SELECT command. An optional offset (the number of rows to skip) can be specified. The syntax of the limit clause is:
- LIMIT [offset,] limit
Some examples are:
- LIMIT 100 - returns the first 100 records (rows 1-100)
- LIMIT 500, 100 - skips 500 records and returns the next 100 records (rows 501-600)
INTO Clause
When the into clause is specified with a SELECT, the results of the query are inserted into the specified table. This is often used to insert records into a temporary table. The INTO clause immediately precedes the FROM clause, with the following syntax:
- INTO <table name>
Note that the INTO clause is logically applied last in processing, after the ORDER BY and LIMIT clauses.
OPTION Clause
The OPTION keyword denotes options the user can pass in with the command. These options are MetaMatrix-specific.
- OPTION SHOWPLAN - returns the query plan along with the results
- OPTION PLANONLY - returns the query plan, but does not execute the command
- OPTION MAKEDEP - specifies source tables that should be made dependent in the join
- OPTION MAKENOTDEP - prevents a dependent join from being used (NEW in 5.0)
- OPTION DEBUG - prints query planner debug information in the log and returns it through the JDBC API
Command Hints
Optional joins - is a feature that allows joined tables or nested joins to be removed from a query plan under some conditions. The syntax of an optional join is an inline comment /* optional */ preceding the joined table or nested join. Some example from clauses are:
- from /* optional */ tableA, tableB
- from /*optional */ tableA inner join tableB on ...
- from tableA inner join /* optional */ (tableB inner join tableC on ...)
The optional side of the join will be removed if no columns from that join are directly used in the output columns of the user query even if the cardinality of the results are effected.
Subqueries
A subquery is a SQL query embedded within another SQL query. The query containing the subquery is the outer query.
MetaMatrix supports four basic types of subqueries:
Subqueries in the FROM Clause
Subqueries in the FROM clause of the outer query (also known as "inline views") can return any number of rows and columns. This type of subquery must always be given an alias.
Subqueries in the WHERE Clause
Subqueries in the criteria of the outer query include subqueries in an IN clause, subqueries using the ANY/SOME or ALL predicate quantifier, and subqueries using the EXISTS predicate.
The following usages of subqueries must each select only one column, but can return any number of rows.
Scalar Subqueries
A scalar subquery is a subquery that returns exactly one value. Scalar subqueries are a type of expression. They can be used anywhere in a MetaMatrix query where an expression can be used, such as the SELECT, FROM, WHERE, or HAVING clauses.
The following examples show the use of scalar subqueries in the SELECT clause of an outer query and in the WHERE clause of an outer query.
Scalar subqueries were also discussed in the Expression section above.
Correlated Subqueries
Correlated subqueries are subqueries that refer back to elements of the outer query. Correlated subqueries are supported in subqueries of the SELECT, WHERE, and HAVING clauses of an outer query.
In the above example, X.b is a correlated reference from the subquery to the outer query group X. Conceptually, this means that the subquery will be re-evaluated during processing for every value of X.b in the outer query.