Partitioning tables in Azure Synapse
Table Partition
- Enables you to divide data into smaller groups of data.
- Mostly created on a date column.
- Supported on all table types:
- clustered columnstore
- clustered index
- heap
- Supported on all distribution types:
- hash
- round robin
- Can be done on one column only.
Benefits
Load operations
When leveraging partitions to load data into a table, you can avoid use of transaction logs, significantly improving performance.
Inserts/Updates/Deletes
Leverage Partition Switching to move entire partitions between tables. This is a metadata-only operation i.e. no physical movement of data is involved.
Partition switching is executed using ALTER TABLE SWITCH statement.
General requirements for Switching Partitions
- Both tables must exist before the SWITCH operation.
- The receiving partition must exist and it must be empty.
- The receiving non-partitioned table must exist and it must be empty.
- Partitions must be on the same column.
- Source and target tables must share the same filegroup.
exhaustive list of requirements
Partition Switching patterns
1. Switch from non-partitioned table to non-partitioned table
ALTER TABLE src_tbl SWITCH TO tgt_tbl
Note: tgt_tbl MUST be empty and should have exact same schema as src_tbl, or SQL server will throw an error.
2. Switch from non-partitioned table to partitioned table
This is not supported in Azure Synapse, as check constraints are not supported to enforce the range of values in a table.
ALTER TABLE src_tbl SWITCH TO tgt_tbl PARTITION 1
Above SQL statement will fail if (non-partitioned) src_tbl doesn’t have check constraints to validate that it only contains data with values that are allowed in partition 1 on the (partitioned) tgt_tbl.
If such a constraint was not added while creating the table, you can alter the table to add the constraint.
ALTER TABLE src_tbl
WITH CHECK ADD CONSTRAINT orderdate_check
CHECK(order_date IS NOT NULL AND order_date <'2008-01-01')
3. Switch from partitioned table to non-partitioned table
ALTER TABLE src_tbl SWITCH PARTITION 1 TO tgt_tbl;
4. Switch from partitioned table to partitioned table
ALTER TABLE src_tbl SWITCH PARTITION 2 TO tgt_tbl PARTITION 2
In our example, tgt_tbl didn’t had any rows in partition # 2 but if it had rows in that partition, sql server will throw this error message:
ALTER TABLE SWITCH statement failed. The specified partition 2 of target table ‘Distribution_16.dbo.Table_54e7fg12e67a45d9859e904b723a9ae7_16’ must be empty.
There are two ways to deal with the scenario when target table’s partition is not empty:
Solution 1
--Clear partition # 2 by switching it to a dummy table (partitioned to non-partitioned table switching pattern)
ALTER TABLE tgt_tbl PARTITION 2 SWITCH TO stg_tbl;
--Switch from partitioned table to partitioned table
ALTER TABLE src_tbl SWITCH PARTITION 2 to tgt_tbl PARTITION 2;
--truncate dummy table
TRUNCATE TABLE stg_tbl;
Solution 2
ALTER TABLE src_tbl SWITCH PARTITION 2 to tgt_tbl PARTITION WITH (TRUNCATE_TARGET = ON)
Query operations
Partitioning can improve query performance by limiting the scan to only the qualifying partitions. This method of filtering can avoid a full table scan and only scan a smaller subset of data.
Note: Clustered columnstore indexes make predicate elimination performance benefits less beneficial, but in some cases there can be a benefit to queries.
For more information refer Microsoft Documentation
Comments