Modeling functions
Functions allow you to create complex calculations, filters or aggregations.
Aggregate functions
Average
Outputs the average of the values in the Input Column, grouped by the categories in the Partition Columns provided.
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing input values. The average will be computed for the values in the Input Column, grouped by the categories in the Partition Columns provided. |
Order By - (ORDER BY clause) | |||
Order by column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Count
Counts the number of non-NULL values in the Input Column, grouped by the Partition Column(s) provided.
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Counts the number of items in the Input Column, grouped by the categories in the Grouping Columns provided. |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline Column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Maximum
Outputs the maximum value in the Input Column, grouped by the categories in the Partition Columns provided (optional).
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select a column to be evaluated using the Maximum function. |
Order By - (ORDER BY clause) | |||
Order by column | Pipeline column (Select from the drop-down list. | Optional | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Minimum
Outputs the minimum value in the Input Column, grouped by the categories in the Partition Columns. provided (optional).
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select a column to be evaluated using the Minimum function. |
Order By - (ORDER BY clause) | |||
Order by column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Standard deviation
Outputs the standard deviation of values in the Input Column, treated as a sample (n = 'number of items' - 1) and grouped by the categories in the Partition Columns provided.
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select a column to be evaluated. |
Order By - (ORDER BY clause) | |||
Order by column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Standard deviation (Population)
Outputs the standard deviation of values in the Input Column, treated as a population (n = 'number of items') and grouped by the categories in the Partition Columns provided.
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select a column to be evaluated. |
Order By - (ORDER BY clause) | |||
Order by column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list). | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Sum
Outputs the sum of values in the Input Column, grouped by the categories in the Partition Columns provided.
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select the column to be evaluated. |
Order By - (ORDER BY clause) | |||
Order by column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Variance
Outputs the variance of values in the Input Column, treated as a sample (n = 'number of items' - 1) and grouped by the categories in the Partition Columns provided.
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select the column to be evaluated. |
Order By - (ORDER BY clause) | |||
Order by column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Variance (Population)
Outputs the variance of values in the Input Column, treated as a population (n = 'number of items') and grouped by the categories in the Partition Columns provided.
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select the column to be evaluated. |
Order By - (ORDER BY clause) | |||
Order by column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order Direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY BY clause) | |||
Grouping column | Pipeline Column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Functions explorer
The FUNCTIONS explorer provides access to functions in Data Hub.
The following additional features are available with the FUNCTIONS Explorer:
Refresh the list, to verify that you are viewing the most recent version, using the Refresh button.
Filter the list using the Search text box. This feature allows you to search below the highlighted node or through all nodes if one is not selected.
Note
Functions can also be viewed in the Public directory in RESOURCE EXPLORER. All relative time functions (for example, Current Month) are also available under the Date hierarchies in the using the DIMENSION TREE.
Analytic functions
Cumulative distribution
Returns the relative position of rows based on a cumulative distribution for the column(s) selected in the Order By step.
Output values range between 0.0 and 1.0 (inclusive).
Field | Input Type | Required? | Description |
Order By - (ORDER BYclause) | |||
Order by column | Pipeline Column (Select from the drop-down list.) | Required | Select the column to be evaluated. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
First value
This function will return the first value in the Input Column after sorting (Order By) and grouping (Partitioning).
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list.) | Required | Select the Input Column for evaluation. |
Order By - (ORDER BY BY clause) | |||
Order by column | Pipeline Column (Select from the drop-down list.) | Required | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Last value
This function will return the last value in the Input Column after sorting (Order By) and grouping (Partitioning).
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select the Input Column for evaluation. |
Order By - (ORDER BYclause) | |||
Order by column | Pipeline Column (Select from the drop-down list.) | Required | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Lag (Offset)
Returns data from preceding rows in the input column selected.
For example:
ID | ID (Lag=3) |
1 | NULL |
2 | NULL |
3 | NULL |
4 | 1 |
5 | 2 |
6 | 3 |
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select the column to be evaluated. |
Offset | Expression | Required | Enter the number of values to offset the rows in the Input Column. If the offset is 1, then the output will be the value from the preceding row in the input column. |
Default | Expression | Required | Provide a default value to be used when the row offset lies outside the table. For example: if the offset is set to 10, then the first 10 rows of the calculated column will use the default value. (Default is NULL). |
Order By - (ORDER BY clause) | |||
Order by column | Pipeline column (Select from the drop-down list.) | Required | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order Direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline Column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
The lead (Offset)
Returns data from subsequent rows in the input column selected.
For example:
ID | ID (Lead=3) |
1 | 4 |
2 | 5 |
3 | 6 |
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing input values. |
Offset | Expression | Required | Enter the number of values to offset the rows in the Input Column. If the offset is 1, then the output will be the value from the next row in the input column. |
Default | Expression | Required | Provide a default value to be used when the row offset lies outside the table. For example: if the offset is set to 10, then the first 10 rows of the calculated column will use the default value. (Default is NULL). |
Order By - (ORDER BY clause) | |||
Order by column | Pipeline column (Select from the drop-down list.) | Required | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order Direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline Column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Percentile (Continuous Distribution)
Returns values representing the Percentile for the Order By column(s) within each partition (optional). This function uses a continuous distribution to determine percentiles and returns a computed value that may not exist in the input column.
Note
A percentile of 0.5 represents the Median value of the input column within a group of values (partition).
Field | Input Type | Required? | Description |
Percentile | A value is greater than 0.0 and less than or equal 1.0. | Required | Input the percentile to be computed. Input the percentile as a value between 0.0 and 1.0. |
Order By - (ORDER BYclause) | |||
Order by column | Pipeline Column (Select from the drop-down list.) | Required | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Percentile (Discrete Distribution)
Returns values representing the Percentile for the Order By column(s) within each partition (optional). This function uses a discrete distribution to determine percentiles and returns a value from the input column.
Note
A percentile of 0.5 represents the Median value of the input column within a group of values (partition).
Field | Input Type | Required? | Description |
Percentile | A value is greater than 0.0 and less than or equal 1.0. | Required | Input the percentile to be computed. Input the percentile as a value between 0.0 and 1.0. |
Order By - (ORDER BYclause) | |||
Order by column | Pipeline Column (Select from the drop-down list.) | Required | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
Percentile rank
Returns the rank of rows as a percentile for the column(s) selected in the Order By step.
Output values range between 0.0 and 1.0 (inclusive).
Field | Input Type | Required? | Description |
Order By - (ORDER BYclause) | |||
Order by column | Pipeline Column (Select from the drop-down list.) | Required | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
JSON functions
JSON String Validity Test
This function determines if there is valid JSON strings in the selected column.
0 - indicates the input string is not valid JSON.
1 - indicates the input string is a valid JSON string.
Note
This function requires SQL Server 2016 or later.
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select the Input column (or columns) for evaluation. |
Modify JSON Value
This function returns the input JSON column, with the New Value at the JSON Path specified.
Note
This function requires SQL Server 2016 or later.
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing JSON values to be updated. |
JSON Path | Expression (valid JSON path) | Required | Provide the path to the value to be updated. |
New Value | Expression | Required | Provide a new value at the provided JSON path. |
Query JSON (Object)
This function returns the result of the Input JSON Query Expression.
Note
This function requires SQL Server 2016 or later.
Field | Input Type | Required? | Description |
Input JSON Query Expression | Expression | Required | Input the JSON query expression to be evaluated. |
JSON Path | Expression (valid JSON path) | Optional | Provide the path to the JSON object or array. |
Query JSON (Value)
Returns the value at the specified JSON Path for the Input Column.
Note
This function requires SQL Server 2016 or later.
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list.) | Required | Select the input column containing JSON to be queried for values. |
JSON Path | Expression (valid JSON path) | Required | Provide the path to the value. |
Ranking functions
Provides several ranking functions:
Rank: Returns a numerical rank, using duplicate numbers where there are duplicate entries in the Input Column (such as 1, 2, 2, 2, 5, 6).
Dense Rank: Similar to Rank but values are not skipped after duplicates (such as 1, 2, 2, 2, 3, 4).
Row Number: Similar to Rank but numbers all rows sequentially, regardless of duplicates (such as 1, 2, 3, 4, 5, 6).
Field | Input Type | Required? | Description |
Rank function | Select from:
| Required | Select the column containing input values. The average will be computed for the values in the Input Column, grouped by the categories in the Partition Columns provided. |
Order By - (ORDER BYclause) | |||
Order by column | Pipeline Column (Select from the drop-down list.) | Required | Select a column (or columns) for sorting (ordering) the input column prior to applying the current function. |
Order direction | Select from the drop-down list. | Options:
| |
Group By - (PARTITION BY clause) | |||
Grouping column | Pipeline column (Select from the drop-down list.) | Optional | Select a column (or columns) for grouping (or partitioning) the input column prior to applying the current function. |
String functions
Concatenate
Concatenates (joins) the Input Columns, separated by the Delimiter provided.
Field | Input Type | Required? | Description |
Delimiter /Separator | Expression | Optional | Provide a separator for the concatenated columns. This can be a single character (such as ','), a string (such as 'and'), or no separator (''). |
Input Column | Pipeline Column (Select from the drop-down list.) | Minimum 2 columns required | Select the columns to be concatenated in the order they are to be concatenated. |
Left (Truncate)
This function is used for truncating text, retaining the number of characters Length specified, and will retain characters from the left of the string.
Example
if an initial string in the input column was Expenses, a Length of 3 would return Exp.
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list. | Required | Select the column containing the strings to be truncated. |
Length | Integer | Required | Enter the length of the input column strings to be retained. |
Replace string
This function is used to find and replace content in the Input Column. When the Search Pattern is discovered in the Input Column, it will be replaced with the Replacement String
Example
Input String: qwerty
Search Pattern: 'ert'
Replacement: 'zzz'
Result: qwzzzy
This function will not replace the values in the original column but will create a duplicate version of the original column then perform the replacements.
Field | Input Type | Required? | Description |
Input column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing the data strings to be modified by replacement. |
Search pattern | Expression | Required | Provide a search pattern. When this string is discovered in the Input Column, it will be replaced with the Replacement String. |
Replacement String | Expression | Required | When the Search Pattern is discovered in the Input column, it will be replaced with the Replacement String. |
Right (Truncate)
This function is used for truncating text, retaining the number of characters Length specified, and will retain characters from the right of the string.
Example
If an initial string in the input column was Expenses, a Length of 5 would return enses.
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing the strings to be truncated. |
Length | Integer | Required | Enter the length of the input column strings to be retained. |
Trim after pattern
This function truncates the input column based on the search pattern provided. All content after (and including) the first match for the input search pattern will be removed for each cell.
Example
Cell contents: ABC,DEF,GHI,JKL
Search Pattern: ','
Result: ABC
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing characters (such as text) to be truncated. |
Search pattern | Expression | Required | Enter a search term to indicate breakpoints in the cells of the input column. This may be a delimiter (such as ',' , '_' , ':' , ' ' , etc), character, or string of characters. For example Cell Contents: ABC.z.DEF.z.GHI.z.JKL Search Pattern: '.z.' Result: ABC |
Trim before and after pattern
This function truncates the input column based on the search pattern provided. All content before the first match and after the last match for the input search pattern will be removed for each cell (including the matched content).
Example
Cell contents: ABC,DEF,GHI,JKL
Search Pattern: ','
Result: DEF,GHI,
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing characters (such as text) to be truncated. |
Search pattern | Expression | Required | Enter a search term to indicate breakpoints in the cells of the input column. This may be a delimiter (such as ',' , '_' , ':' , ' ' , etc), character or string of characters. For example Cell Contents: ABC.z.DEF.z.GHI.z.JKL Search Pattern: '.z.' Result: DEF.z.GHI |
Trim to pattern
This function truncates the input column based on the search pattern provided. All content before (and including) the first match for the input search pattern will be removed for each cell.
Example
Cell contents: ABC,DEF,GHI,JKL
Search Pattern: ','
Result: DEF,GHI,JKL
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing characters (such as text) to be truncated. |
Search pattern | Expression | Required | Enter a search term to indicate breakpoints in the cells of the input column. This may be a delimiter (such as ',' , '_' , ':' , ' ' , etc), character or string of characters. For example Cell Contents: ABC.z.DEF.z.GHI.z.JKL Search Pattern: '.z.' Result: DEF.z.GHI.z.JKL |
Trim whitespace
Trims leading and trailing whitespace from data in the Input Column.
Note
This function requires SQL Server 2017 or later.
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list.) | Required | Select the column to be evaluated. |
Time functions
Create date
Joins the input Year, Month, and Day columns provided to produce a date.
Field | Input Type | Required? | Description |
Input year column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing the year (YYYY) value for the output date. |
Search month column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing the month (MM) value for the output date. |
Input day column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing the day (DD) value for the output date. |
Date offset (Date Add)
The date in the Input Date Column will be offset based on the Amount of Offset and the Measure of Time (or Units of Time).
For example:
Input Date: 2000-01-01
Units: days
Amount of Offset: 5
Result: 2000-01-06
Field | Input Type | Required? | Description |
Measure of Time (Units) | Select from:
| Required | Select the element of the Input Date Column to be offset. |
Amount of Offset | Integer or Float | Required | Enter the size of the offset. |
Input Date Column | Pipeline Column (Select from the drop-down list.) | Required | Select an input date column for the function. |
Day from date column
Returns the day (DD) value from the Input Date Column.
The Input Date Column can be:
A Date type column.
A DateTime type column.
A Text type column containing a recognized date or date-time format.
Field | Input Type | Required? | Description |
Input Date Column | Pipeline Column (Select from the drop-down list.) | Required | Select an input date column for the function. |
Difference between dates
Calculates the difference between two date, time, or date-time columns. The result is returned in the Measure of Time (Units) selected.
For example:
Start Date: 2000-01-01
End Date: 2000-02-01
Units: Day
Results: 31
Field | Input Type | Required? | Description |
Measure of Time (Units) | Select from:
| Required | Select the unit of time results will be expressed in. |
Start Date/Time Colum | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing the start date. |
End Date/Time Colum | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing the end date. |
Extract date component
This function returns the Date Component selected of the input Date, Time, or DateTime data.
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list.) | Required | Select the column containing Date, Time, or Datetime data. |
Date Component | Select from:
| Required | Select the Component of the Date, Time, or DateTime string to be extracted from the Input Column. |
Month from date column
Returns the month (MM) value from the Input Date Column.
The Input Date Column can be:
A Date type column.
A DateTime type column.
A Text type column containing a recognized date or date-time format.
Field | Input Type | Required? | Description |
Input Date Column | Pipeline Column (Select from the drop-down list.) | Required | Select an input date column for the function. |
Time zone conversion
Converts the source date-time column to another time zone.
Example
Source date-time: 01 Jan 2000 00:00:00
Source Time Zone: UTC (Coordinated Universal Time)
Target Time Zone: (UTC+10:00) Brisbane
Result: 01 Jan 2000 10:00:00
Field | Input Type | Required? | Description |
Source column | Pipeline Column (Select from the drop-down list.) | Required | Select an input date time column for the function. |
Source Time Zone - Select from the following options. | |||
Source Time Zone | Radio selection | Required | Select one of the following methods to indicate the time zone of the Source Column. |
Column | Uses the time zone of the Source Column's Date Time data type. | ||
Parameter | Overrides the Source Column's time zone with the value from the selected parameter. | ||
Time Zone | Overrides the Source Column's time zone with the selected time zone (selected from the drop-down list). | ||
Target Time Zone | |||
Target Time Zone | Select from the drop-down list. | Required | Select the output time zone. |
Year from date column
Returns the year (YYYY) value from the Input Date Column.
The Input Date Column can be:
A Date type column.
A DateTime type column.
A Text type column containing a recognized date or date-time format.
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list.) | Required | Select the column for the function. |
Other functions
First Non-NULL value
Returns the first non-NULL value in each row for the selected Input Columns.
Field | Input Type | Required? | Description |
Input Column | Pipeline Column (Select from the drop-down list.) | Required (minimum 2 columns) | Select the column tp be evaluated. Columns will be evaluated in the order they are added. |
If (Complex)
The If (Complex) function will evaluate the When, Then, and Else expressions provided.
Example
When [Water_Temp] <= 0 Then 'Solid'
When [Water_Temp] >= 100 Then 'Gas'
When [Water_Temp] > 0 AND [Water_Temp] < 100 Then 'Liquid'
For simple equality statements with a single input column, consider using If (Simple).
Field | Input Type | Required? | Description |
Condition - add a condition pair | |||
When | Expression | Required | Enter a boolean expression to evaluate the corresponding Then expression when true. |
Then | Expression | Required | Enter an expression to evaluate when the corresponding When expression is met. |
The Else Clause | |||
Else | Expression | Required | Enter an expression to evaluate when no When expressions are met. |
If (Simple)
The If (Simple) function will evaluate the Input Expression based on the When, Then, and Else expressions provided.
The IF (Simple) only accepts equality statements and is best suited for categorical inputs.
Example
Input Expression: [Motor_Vehicle]
When 'T' Then 'Truck'
When 'C' Then 'Car'
When 'B' Then 'Motorcycle'
Else 'Other MV'
For more complex if conditions, use If (Complex).
Field | Input Type | Required? | Description |
Input | Expression | Required | Provide an input expression or column to be evaluated. |
Condition - add a condition pair | |||
When | Expression | Required | Enter a boolean expression to evaluate the corresponding Then expression when true. |
Then | Expression | Required | Enter an expression to evaluate when the corresponding When expression is met. |
The Else Clause | |||
Else | Expression | Required | Enter an expression to evaluate when no When expressions are met. |
Random number between
Returns random integers between Upper and Lower limits provided. The Upper and Lower limits are included in the range of values generated.
For example:
Upper Limit: 10
Lower Limit: 5
Result: Each row is randomly assigned one of the following numbers: 5 , 6 , 7 , 8 , 9 , 10.
Field | Input Type | Description | ||||
Lower Limit | Expression |
| ||||
Upper Limit | Expression | Required | Enter the Integer (whole number) to be used as the upper limit for the random numbers to be generated. Only Integers are supported for this function. The upper limit is included in the range of values generated. |
SQL expression
You can enter a scalar (single value) SQL expression to define a new column using the +Calculation button.
Example
If you want to calculate a column called "Discount Amount", the following SQL expression could be used: [Sales Amount] * [Discount Pct]
Click the + Calculation button.
A new calculated column, called New Calculated Column, is added to the end of the pipeline schema.
On the right side of the screen, alter the column's settings in the Properties panel.
In the Properties panel, enter a meaningful name for the calculated column.
Enter a valid SQL expression for the column’s calculation.
As you type, simple SQL syntax and column reference checking is dynamically performed. This feature saves time as typing errors are found more quickly. The validation also verifies that columns in the current pipeline referred to in the SQL expression are correct and that the SQL has correct syntax.
Note
If you change a column's physical name, any calculated columns in the current pipeline that refer to the column are automatically updated. Calculations in other pipelines that refer to the column will display an error indication. Clicking in the SQL pane for each offending calculation will force the column name to be updated.
Use defined parameters in the expression.
Use the code completion options to view lists of available code and syntax options as you type an entry.
Drag-and-drop columns from the Out column on the COLUMNS pipeline tab's column list, as shown below.
Note
If you need to enter a conditional expression, use the SQL CASE expression.
Click Apply to test run the SQL code.
This action locates any run-time errors, such as divide by zero, that are not found by the dynamic syntax checking (which was performed as you typed the code). The associated column is automatically profiled and the information is immediately displayed in the Profile area on the Properties pane.
The color of the triangle icon on the Apply button represents the status of the SQL expression:
Green. The expression has been successfully validated and may be executed.
Yellow. The expression is invalid or an error was encountered when executing the SQL expression.
Dark gray. The expression is currently being validated.
Light gray. The expression has already been executed.
Note
For examples of these colors, hover over the Apply button in the Data Hub interface and view the corresponding info tip.
The calculation’s result appears in the Preview pane, at the bottom of the tab. The example below illustrates the calculation of sick leave days from sick leave hours in the employee pipeline (assuming an 8 hour workday).
Note
Lookups are always done before calculations in each step. You can define a lookup and a calculation in the same step, which allows you to define a calculation on a lookup column.
If you wish to perform a lookup on a calculated column, you must configure the calculation in one step, and then add a lookup on the calculated column in a subsequent step.
Defining a calculated column with a subquery
You can define a calculated column by entering SQL to perform a correlated subquery. A correlated subquery is a nested query that uses values from the outer query. The correlated subquery is evaluated once for each row returned by the outer query.
In the context of calculated columns in a data model, the "outer" query is performed transparently within Data Hub, and the subquery is executed within it, as part of the SELECT clause. You need only enter the subquery text in the calculated column.
The @IN and @INALIAS tokens are provided to support this functionality. The @IN token may be used to refer to the input table view to the current step within the pipeline. The @INALIAS token may be used to refer to the table in the otherwise-unseen "outer" section of the query.
In the following example, a complete SQL query to return information, including the amount (for each detail line on an invoice) together with the average value of the detail lines for that invoice, would appear as follows:
Note
The subquery text is highlighted in blue.
Example
Select SalesOrderId, SalesOrderDetailId, LineTotal,
(select avg(LineTotal)
from Sales.SalesOrderDetail as [INNER]
where [INNER].SalesOrderID = [OUTER].SalesOrderID)
As AvgLineForSalesOrder
from Sales.SalesOrderDetail as [OUTER]
In a data model, the "outer" query would be implicit. To enter the subquery as a calculated column, use the @IN and @INALIAS tokens to replace the table names, as shown below:
Example
(select avg(LineTotal)
from @IN as [INNER]
where [INNER].SalesOrderID = @INALIAS.SalesOrderID)
Reference: Common fields
The ORDER BY clause and the PARTITION BY clause are common to most functions in the following groups:
Aggregate Functions
Analytic Functions
Ranking Functions
Order By - (ORDER BY clause)
Field | Input Type | Description | |
Order By Column | Pipeline Column (Select from the drop-down list.) | Select an input date column for the function. | |
Order Direction | Select from the drop-down list. | Options:
|
Group By - (PARTITION BY clause)
Field | Input Type | Description | |
Grouping Column | Pipeline Column (Select from the drop-down list.) | Select a column (or columns) for sorting the input column prior to applying the current function. |