Best index for column where values are mostly the same
We have an integer column that currently consists only of 0 or 1 values. This column has now been used by a developer to store a unique 32-bit identifier on some occasions, and we need to be able to efficiently pull out rows containing any one of these identifiers.
Given that the value will be 0 or 1 about 99% of the time (I don't have exact figures yet), how might it best be indexed to query against the minority case? Am I even right in thinking the volume of common values will be an issue?

               Column           |  Type   | Modifiers
    ----------------------------+---------+-----------
     event_value                | integer | not null

There are currently no indexes on this column. And I don't envisage the need to regularly select just the 0 or 1 values.
The table is of a reasonable size, currently 30 million rows and growing fast.
I appreciate this isn't the best use of the column, but that can't change in the short term.
postgresql index postgresql-9.6
Maybe a filtered index?

    create index on the_table (...) where event_value > 1;

– a_horse_with_no_name
2 days ago
What @a_horse_with_no_name suggests. But you should then use modified filters:

    where (event_value > 1) and event_value = @XYZ

so that the filtered index is used.
– ypercubeᵀᴹ
2 days ago
@a_horse_with_no_name Ah, ok. I didn't even realise you could have a where clause attached to the index. Thanks.
– whoasked
2 days ago
Yes. The benefit compared to a common index is that it has a much smaller size (1% in your case) as it will store only the rows with values that differ from 0,1. @a_horse_with_no_name, add an answer?
– ypercubeᵀᴹ
2 days ago
@ypercubeᵀᴹ You should not even need the modified filters. I can construct situations in which you do, but they are unlikely to occur in practice in this context.
– jjanes
2 days ago
asked 2 days ago by whoasked
edited 2 days ago by MDCCL
1 Answer
First off, like you said yourself, this is not the best use of the column. It should be a separate `boolean` and an `integer` column for your "32-bit identifiers". If that's `NULL` 99% of the time, that is no problem. `NULL` storage is very cheap.
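The split suggested above could look roughly like this. It is only a sketch: the column names `is_flagged` and `event_id` are hypothetical, and it assumes that every value other than 0 or 1 is an identifier.

```sql
-- Hypothetical column names; adjust to your schema.
ALTER TABLE tbl
  ADD COLUMN is_flagged boolean,
  ADD COLUMN event_id   integer;   -- stays NULL for rows without an identifier

-- Assumption: 0 and 1 are flags, anything else is an identifier.
UPDATE tbl
SET    is_flagged = CASE WHEN event_value IN (0, 1) THEN event_value = 1 END
     , event_id   = CASE WHEN event_value NOT IN (0, 1) THEN event_value END;

-- Index only the rows that actually carry an identifier.
CREATE INDEX tbl_event_id_part_idx ON tbl (event_id)
WHERE  event_id IS NOT NULL;
```

A single `UPDATE` like this writes a new version of every row, so on a table with 30 million rows it would probably be run in batches.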
Either way, you should definitely use a partial index. (That's the proper term as used in the manual.) Excluding 99 % of the rows from the index makes it massively smaller, which matters for performance with millions of rows.
While the rare values are "32-bit identifiers", it may be incorrect to assume those are all > 1. Postgres uses signed integers, and 32-bit entities would also cover negative numbers. (Can we even rule out `0` or `1` as one of those identifiers?) If there can be negative values, too:

    CREATE INDEX tbl_event_value_part_idx ON tbl (event_value)
    WHERE  event_value > 1 OR event_value < 0;  -- or similar

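To get a feel for how small the partial index ends up, its size can be compared with the size of the table itself. A minimal sketch, assuming the index and table names used above:

```sql
-- Size of the partial index next to the size of the table's main data.
SELECT pg_size_pretty(pg_relation_size('tbl_event_value_part_idx')) AS partial_index_size
     , pg_size_pretty(pg_relation_size('tbl'))                      AS table_size;
```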
`event_value` does not have to be an index column, regardless of its use in the `WHERE` clause. That entirely depends on the kinds of queries to expect. Either way, the safe bet is to add the same `WHERE` conditions literally to any query supposed to use the index, even if that's logically redundant. Postgres can make very basic logical conclusions to determine applicable indexes, but it is no AI and does not try to be (would get too expensive quickly). Like:

    SELECT * FROM tbl WHERE event_value > 1 OR event_value < 0

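To check whether a lookup for a single identifier actually uses the partial index, `EXPLAIN` is the usual tool. A sketch, with `123456` as a made-up identifier:

```sql
-- Index predicate repeated literally, as recommended above.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   tbl
WHERE  (event_value > 1 OR event_value < 0)
AND    event_value = 123456;   -- hypothetical identifier

-- Same lookup without the redundant predicate, for comparison.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   tbl
WHERE  event_value = 123456;   -- hypothetical identifier
```

If the plan names `tbl_event_value_part_idx` in an index scan or bitmap index scan, the index is being used. With a plain constant, as in the second query, the planner can often prove the implication on its own, but it is worth verifying on your own data rather than assuming it, for instance when the value is passed as a parameter of a prepared statement.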
Related:
- Postgres partial index on IS NULL not working
- PostgreSQL partial index unused when created on a table with existing data
- How to index a query with `WHERE field IS NULL`?
answered 4 mins ago by Erwin Brandstetter