Regular indexed keys and the
serve very similar purposes in Google Cloud Spanner.
For example, let's consider this table which describes completable tasks.
CREATE TABLE tasks created_by_user_id STRING(36) NOT NULL, # e.g. "Buy milk" content STRING(MAX), # "HIGH" | "LOW" priority STRING(MAX), completed_at TIMESTAMP,
Let's imagine we want to show the current user all their tasks for a given priority. One way we can speed up this query is by creating the following index.
CREATE INDEX active_tasks_by_priority_idx ON user ( created_by_user_id, completed_at, priority )
Alternatively, we could have
priority in a
STORING clause instead of in the index keys.
CREATE INDEX active_tasks_by_priority_idx ON user ( created_by_user_id, completed_at ) STORING ( priority )
The advantage of adding
STORING over an index with just the other two keys is that it removes the
need to join on the base
tasks table which makes the query cheaper as
priority is colocated with
the index. If we did not store
priority, Spanner would have to look at the values in our index then
go to the base
tasks table to look up the
priority to the indexed keys would also avoid having to look up the value in the base
table. Let's imagine our app limits the user to three active (
compelted_at IS NOT NULL) HIGH
priority tasks and five active MEDIUM priority tasks (focus is important). This means that when we
create a new task, we need to first do a read inside a read-write transaction to avoid a task being
created which exceeds the task limit, then do our write to create the task. What is the difference
between using indexed keys and
STORING if we create one HIGH priority task then one MEDIUM priority
STORING is used and we create a new HIGH priority task, the transaction will lock all rows
matching the specified
completed_at values. i.e. All user task creation
will be locked, regardless of priority. This means that the subsequent MEDIUM priority task creation
will be blocked.
priority is an indexed key, on the other hand, the transaction will only lock rows which match
priority. Consequently, the MEDIUM priority task creation will not be blocked.
This is a trivial example where the blocking is minimal. However, with larger amounts of data, this difference can cause serious performance issues.
Also note that read-only transactions do not acquire locks so this difference only applies to read-write transactions.
STORINGcomes at the cost of extra storage space vs not using
STORING. However, this is also true of additional index keys. What is the difference in cost here?
INTERLEAVEfit in here? Like
STORING, it is an approach to reduce join costs.