Semantic layer processing
Semantic layer processing is an important part of preparing a model for use.
This task involves creating a physical structure in the data warehouse and semantic layer that matches the model's logical structure.
When making minor changes to a processed model (e.g. altering a calculated column), re-processing the model (via the Process button) affects only the pipelines impacted by the change, rather than the entire semantic model, which speeds up the overall operation.
A process action is not always successful, as there may be errors in the model that prevent the creation of a valid semantic model.
Changes to security are updated each time a model is processed.
Incremental processing
Set standard process options
Process Semantic Layer: Select this check box and specify the following setting options:
Offline: When selected, each part of the semantic layer is processed in a separate transaction. The semantic layer will not be reportable during processing.
If this setting is not selected, the cube structure changes are processed and cube data is updated in a single transaction. If an error occurs during that process, the transaction is rolled back, so the cube is always available to be queried. This maintains continuous access to the cube but requires a large amount of memory on the SQL Server Analysis Services (SSAS) server.
When this setting is selected, the cube structure changes are processed and data is updated in many separate transactions. The staging database updates in one transaction. The cube structure is updated in another. Each measure group and dimension is processed separately.
Using this option reduces the amount of memory that SSAS uses during the update and process stages. However, once the cube structure changes are completed, the cube cannot be queried until all data processing transactions are complete, as the cube is offline. If there are errors in the data update, the cube will remain offline until they are corrected and the data updates are completed.
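As a conceptual illustration of the single-transaction behavior, the pattern resembles the following T-SQL sketch, in which an error rolls back the entire update. This is only a sketch of the general idea (the procedure names are hypothetical placeholders), not the SQL that Data Hub actually runs:

    BEGIN TRANSACTION;
    BEGIN TRY
        EXEC dbo.UpdateStaging;    -- hypothetical placeholder for the staging load
        EXEC dbo.UpdateWarehouse;  -- hypothetical placeholder for the warehouse load
        COMMIT TRANSACTION;        -- all changes become visible together
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;      -- nothing is partially applied; existing data stays queryable
        THROW;
    END CATCH;

With the Offline option, the equivalent work is instead committed in several separate transactions, which is why the semantic layer is unavailable until the last of them completes.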
Create backup: Select this option to create a backup copy of the semantic model (if the semantic model has been previously created).
Set pipeline incremental processing properties
Specify the incremental load behavior for individual pipelines using the two groups of Incremental Process settings on the pipeline's Properties panel.
Note
For source table and SQL source pipelines, changing these pipeline-level settings overrides the corresponding, inherited data source settings. Warehouse, warehouse SQL, and Date pipelines do not inherit these settings from the data source.
These groups of settings allow you to control pipeline incremental processing in the following ways:
Source group options. These settings allow you to specify how data is loaded from the data source and sent to the warehouse.
Warehouse & Cube group options. These settings allow you to specify how data is loaded from the warehouse (no source involvement).
Incremental process options
Depending on the type of pipeline you are using, one or two of the following groups of properties appear:
Source table pipelines and source SQL pipelines.
Two groups (Source and Warehouse & Cube) appear.
Warehouse pipelines and warehouse SQL pipelines.
Only the Warehouse & Cube group appears as these pipeline types do not communicate directly with the source data.
Date pipelines.
Only the Warehouse & Cube group appears.
Warehouse incremental process options
Specify how data is loaded from the tables in the warehouse (with no direct source involvement) using the options in the Incremental Process (Warehouse & Cube) group on the pipeline's Properties panel.
Important
These options, unlike the Source options, are available for all pipeline types, including Date pipelines.
Create Warehouse Table - A pipeline is implemented in the data warehouse as a series of nested SQL views. The end result of the series of views is used as the basis for the measures and dimensions in the cube. Enabling this setting causes the final result of the series of pipeline views to be written to the data warehouse as a table. To maximize performance and minimize data warehouse size, by default this setting is disabled for all pipelines, except for pipelines representing financial dimensions.
Note
This setting is also available from the Database group on the Properties panel. Both buttons perform the same operation and are synced.
Enabling this setting generally reduces the time to process the cube slightly, but significantly increases the time needed to process the data warehouse and also increases the storage requirement within the data warehouse. However, for pipelines that make heavy use of lookups and subqueries, such as those used for financial dimensions, processing performance may be greatly improved when this setting is enabled, because the reduction in cube processing time outweighs the increase in warehouse processing time.
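As a conceptual illustration of the relationship between a pipeline's nested views and a materialized warehouse table, consider the following T-SQL sketch; the schema, view, and column names are hypothetical, and this is not the SQL that Data Hub actually generates:

    -- Hypothetical pipeline steps implemented as nested views.
    CREATE VIEW wh.Customer_Step1 AS
        SELECT CustomerCode, CompanyID, CustomerName
        FROM stg.Customer;
    GO
    CREATE VIEW wh.Customer_Final AS
        SELECT s.CustomerCode, s.CompanyID, s.CustomerName, c.CompanyName
        FROM wh.Customer_Step1 AS s
        JOIN wh.Company AS c ON c.CompanyID = s.CompanyID;
    GO
    -- With Create Warehouse Table enabled, the result of the final view is persisted
    -- as a physical table during warehouse processing, so cube processing reads a
    -- precomputed table instead of re-evaluating the whole view chain and its lookups.
    SELECT *
    INTO wh.Customer_Table
    FROM wh.Customer_Final;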
Warehouse Incremental Load - If selected, only rows not previously loaded from the source are loaded into the warehouse table, saving processing time. If the pipeline contains complex SQL steps or calculations that aggregate, disable this setting to ensure that the warehouse table is populated correctly. The warehouse table must have Enforce Key Uniqueness enabled for this feature to work.
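As a rough sketch of what an incremental warehouse load amounts to (reusing the hypothetical objects from the previous example), only rows whose key is not already present in the warehouse table are inserted:

    -- Illustrative only: insert only rows whose key does not yet exist in the
    -- warehouse table. This is why the feature depends on a unique key.
    INSERT INTO wh.Customer_Table (CustomerCode, CompanyID, CustomerName, CompanyName)
    SELECT src.CustomerCode, src.CompanyID, src.CustomerName, src.CompanyName
    FROM wh.Customer_Final AS src
    WHERE NOT EXISTS (
        SELECT 1
        FROM wh.Customer_Table AS tgt
        WHERE tgt.CustomerCode = src.CustomerCode
          AND tgt.CompanyID    = src.CompanyID
    );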
Synchronize Deletes - Deleting rows from data tables can be slow. To optimize performance, ZAP disables Synchronize Deletes by default. When an incremental load is performed, only new rows are added and changed rows are updated; deleted rows are left in the staging and warehouse tables (a sketch of the kind of delete this setting performs appears after the notes below).
Rows deleted from the source can be removed from the Data Hub warehouse in two ways:
When a Full Refresh is performed.
When a full source load is performed with Synchronize Deletes enabled.
Important
Synchronize Deletes is not recommended when using source incremental load, because Data Hub can only determine with certainty that a row has been removed when it performs a full source load.
For smaller and simpler datasets from sources such as Excel and CSV files, the performance cost of synchronizing deletes may not be a problem. In these cases, Synchronize Deletes can be enabled so that deleted records in the warehouse stay more closely in sync with the source.
Note
Not removing deleted rows may affect the accuracy of reports produced from the cube. To minimize this problem, you should perform a full load of the source data periodically to effectively remove the deleted rows. A full load can be achieved by creating a model process schedule with the Disable Incremental check box selected.
In many data sources (especially ERP), this problem is alleviated by not deleting rows, but rather flagging them as disabled. In this case, the need for periodic full loads to maintain report accuracy is eliminated.
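The delete synchronization described above amounts to removing warehouse rows whose keys no longer appear in the fully loaded source data, which is the step that can be slow on large tables. A rough sketch, again using the hypothetical table names from the earlier examples:

    -- Illustrative only: after a full source load, delete warehouse rows whose key
    -- no longer exists in the staging data.
    DELETE tgt
    FROM wh.Customer_Table AS tgt
    WHERE NOT EXISTS (
        SELECT 1
        FROM stg.Customer AS src
        WHERE src.CustomerCode = tgt.CustomerCode
          AND src.CompanyID    = tgt.CompanyID
    );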
Enforce key uniqueness - This setting applies to the columns marked as part of the pipeline's primary key on the final step of the pipeline. These columns appear with a key symbol at the right of their entries in the pipeline schema. By default, this option is selected, which ensures the key values are unique. Clear (unselect) this option to allow duplicate keys, which may be necessary when the duplicate rows in the pipeline are needed for a lookup within the warehouse.

To understand how this setting works, consider the example of a customer pipeline with keys CustomerCode and CompanyID (the legal entity that created the customer record). There could be rows in the pipeline with the same CustomerCode but different CompanyIDs, representing branches of a customer organization in different geographic regions that are serviced by different companies.

If you need to distinguish between customers from different companies, you could include both CustomerCode and CompanyID in the key and select the Enforce key uniqueness check box. Customers with entries under more than one company will then appear as separate members in the cube. Depending on the column used to name the customer members, they may display with the same name, making it difficult to distinguish them and making some calculations more complex. This situation can be improved by creating a hierarchy with a company level above the customer level, which lets you easily distinguish the customers from different companies.

If you don't need to distinguish between customers from different companies in your model, the best solution is to add an aggregate step to combine the duplicate customer rows; the aggregate step uses CustomerCode as the key field, which will then be unique (a sketch of such a step appears after the note below).

However, if you don't need to distinguish between customers associated with different companies in the cube, but you still want to distinguish between them for lookups in the warehouse (e.g. to find their addresses), you can set only CustomerCode as the key and ignore the duplicates by clearing the Enforce key uniqueness check box. The only risk with this strategy is the unlikely situation where rows that use the same CustomerCode for different customers will not be identified as errors.
Note
If any columns are marked with the key symbol in the initial step, their uniqueness will always be enforced, regardless of this setting.
This option cannot be deselected if Warehouse Incremental Load is enabled, as a unique key is required to identify new rows.
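The aggregate step mentioned above effectively collapses the duplicate customer rows so that CustomerCode alone becomes a unique key. A rough sketch, with hypothetical column names:

    -- Illustrative only: combine rows that share a CustomerCode so that the code is unique.
    SELECT
        CustomerCode,
        MAX(CustomerName) AS CustomerName,   -- pick a representative name for the cube member
        SUM(CreditLimit)  AS CreditLimit     -- aggregate numeric columns as appropriate
    FROM wh.Customer_Final
    GROUP BY CustomerCode;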
Override settings from data source - For source table pipelines and source SQL pipelines, the default settings for the two incremental options (Default Incremental Type and Quick Incremental Type) are determined by the corresponding settings in the pipeline's data source. To adjust these settings for the current pipeline, select this check box and specify the options for the Default Incremental Type and Quick Incremental Type drop-down lists (described below).
Note
The Override settings from data source check box does not appear for warehouse, warehouse SQL, or Date pipelines, since they are not directly linked to the source and do not inherit settings from it.
Default Incremental Type and Quick Incremental Type drop-down lists - Specify the pipeline-level behavior for the two available types of incremental loading. The following options are available from each drop-down:
Add & Update - The default setting for the Default Incremental Type. When selected, new data from this pipeline is added and existing data is updated, as needed.
Add Only - The default setting for the Quick Incremental Type. When selected, only new data from this pipeline is added. No existing data is updated.
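Conceptually, the two incremental types correspond to the following T-SQL patterns; this is an illustrative sketch using the hypothetical objects from the earlier examples, not the statements Data Hub actually issues:

    -- Add & Update: insert new keys and update rows that already exist.
    MERGE wh.Customer_Table AS tgt
    USING wh.Customer_Final AS src
        ON tgt.CustomerCode = src.CustomerCode AND tgt.CompanyID = src.CompanyID
    WHEN MATCHED THEN
        UPDATE SET tgt.CustomerName = src.CustomerName
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerCode, CompanyID, CustomerName)
        VALUES (src.CustomerCode, src.CompanyID, src.CustomerName);

    -- Add Only: insert new keys and leave existing rows untouched.
    INSERT INTO wh.Customer_Table (CustomerCode, CompanyID, CustomerName)
    SELECT src.CustomerCode, src.CompanyID, src.CustomerName
    FROM wh.Customer_Final AS src
    WHERE NOT EXISTS (
        SELECT 1
        FROM wh.Customer_Table AS tgt
        WHERE tgt.CustomerCode = src.CustomerCode
          AND tgt.CompanyID    = src.CompanyID
    );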
Modules
Specify module functionality
Once a module has been added to your model, you can determine where it will be visible. Settings made here by a solution designer are carried into the model, and the customer can change them in the resulting model.
Dimension Browser checkboxes.
When modules are selected, a drop-down list will appear in the Dimension Tree, allowing you to choose a module.
The dimension tree will then display only measures and dimensions for the selected module, making it easier for you to find the cube items you want. An item named the same as the model always appears at the top of the drop-down list, letting you display all measures and dimensions.
You may disable these check boxes (e.g. if your modules are used only for cube security, rather than to control the dimension tree display).