Regular indexed keys and the
STORING clause
serve very similar purposes in Google Cloud Spanner.
For example, let's consider this table which describes completable tasks.
CREATE TABLE tasks
created_by_user_id STRING(36) NOT NULL,
# e.g. "Buy milk"
content STRING(MAX),
# "HIGH" | "LOW"
priority STRING(MAX),
completed_at TIMESTAMP,Let's imagine we want to show the current user all their tasks for a given priority. One way we can speed up this query is by creating the following index.
CREATE INDEX
active_tasks_by_priority_idx
ON
user (
created_by_user_id,
completed_at,
priority
)Alternatively, we could have priority in a STORING clause instead
of in the index keys.
CREATE INDEX
active_tasks_by_priority_idx
ON
user (
created_by_user_id,
completed_at
) STORING (
priority
)The advantage of adding STORING over an index with just the other
two keys is that it removes the need to join on the base tasks table
which makes the query cheaper as priority is colocated with the
index. If we did not store priority, Spanner would have to look at
the values in our index then go to the base tasks table to look up
the priority value.
But adding priority to the indexed keys would also avoid having to
look up the value in the base table. Let's imagine our app limits the
user to three active (compelted_at IS NOT NULL) HIGH priority tasks
and five active MEDIUM priority tasks (focus is important). This means
that when we create a new task, we need to first do a read inside a
read-write transaction to avoid a task being created which exceeds the
task limit, then do our write to create the task. What is the
difference between using indexed keys and STORING if we create one
HIGH priority task then one MEDIUM priority task?
If STORING is used and we create a new HIGH priority task, the
transaction will lock all rows matching the specified
created_by_user_id and completed_at values. i.e. All user task
creation will be locked, regardless of priority. This means that the
subsequent MEDIUM priority task creation will be blocked.
If priority is an indexed key, on the other hand, the transaction
will only lock rows which match the queried priority. Consequently,
the MEDIUM priority task creation will not be blocked.
This is a trivial example where the blocking is minimal. However, with larger amounts of data, this difference can cause serious performance issues.
Also note that read-only transactions do not acquire locks so this difference only applies to read-write transactions.
STORING comes at the cost of extra storage space vs not using
STORING. However, this is also true of additional index keys. What
is the difference in cost here?INTERLEAVE fit in here? Like STORING, it is an approach
to reduce join costs.