PostgreSQL 9.4: Index Only Used For Lower INET RangesOptimizing queries on a range of timestamps (two...
Intern applicant asking for compensation equivalent to that of permanent employee
Incorporating research and background: How much is too much?
One Half of Ten; A Riddle
Why publish a research paper when a blog post or a lecture slide can have more citation count than a journal paper?
CREATE ASSEMBLY System.DirectoryServices.AccountManagement.dll without enabling TRUSTWORTHY
what does しにみえてる mean?
Why is mind meld hard for T'pol in Star Trek: Enterprise?
Early credit roll before the end of the film
Why zero tolerance on nudity in space?
Why is the copy constructor called twice in this code snippet?
Would a National Army of mercenaries be a feasible idea?
If I delete my router's history can my ISP still provide it to my parents?
How can I get my players to come to the game session after agreeing to a date?
Why would the Pakistan airspace closure cancel flights not headed to Pakistan itself?
Can a hotel cancel a confirmed reservation?
How to say "Brexit" in Latin?
How to deal with an incendiary email that was recalled
Math Saturation Symbol
Am I a Rude Number?
Why avoid shared user accounts?
Which one of these password policies is more secure?
If I deleted a game I lost the disc for, can I reinstall it digitally?
Why would space fleets be aligned?
How do Chazal know that the descendants of a Mamzer may never marry into the general populace?
PostgreSQL 9.4: Index Only Used For Lower INET Ranges
Optimizing queries on a range of timestamps (two columns)PostgreSQL 9.2 (PostGIS) performance problemHow to index WHERE (start_date >= '2013-12-15')How can I speed up a Postgres query containing lots of Joins with an ILIKE conditionpostgres explain plan with giant gaps between operationsSlow fulltext search due to wildly inaccurate row estimatesIndex for numeric field is not usedpostgresql 9.2 hash join issueSorting killing my postgresql queryWhy is this query with WHERE, ORDER BY and LIMIT so slow?
For some reason queries on the high range of are extremely slow and queries on the low range are extremely fast.
My index is an adapted version of the answer in Optimizing queries on a range of timestamps (two columns):
CREATE INDEX idx_ip_range_inversed ON ip_range_domains(low, high);
but it only works for low ranges for some reason.
Table
CREATE TABLE IF NOT EXISTS ip_range_domains (
ip_range_domain_id BIGSERIAL PRIMARY KEY,
domain_id BIGINT REFERENCES domains NOT NULL,
source_type_id INTEGER REFERENCES source_types NOT NULL,
low INET NOT NULL,
high INET NOT NULL,
auto_high_conf BOOLEAN NOT NULL DEFAULT FALSE,
invalidation_reason_id INTEGER REFERENCES invalidation_reasons DEFAULT NULL,
invalidated_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NULL,
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT current_timestamp
);
CREATE INDEX domain_id_btree ON ip_range_domains (domain_id);
CREATE INDEX idx_ip_range_inversed ON ip_range_domains(low, high);
This is fast:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '8.8.8.8'::INET BETWEEN low AND high;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_range_domains (cost=25411.02..278369.96 rows=948529 width=55) (actual time=61.514..61.567 rows=55 loops=1)
Recheck Cond: (('8.8.8.8'::inet >= low) AND ('8.8.8.8'::inet <= high))
Heap Blocks: exact=23
Buffers: shared hit=3613
-> Bitmap Index Scan on idx_ip_range_inversed (cost=0.00..25173.89 rows=948529 width=0) (actual time=61.493..61.493 rows=55 loops=1)
Index Cond: (('8.8.8.8'::inet >= low) AND ('8.8.8.8'::inet <= high))
Buffers: shared hit=3590
Planning time: 0.537 ms
Execution time: 61.631 ms
This is slow:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '200.8.8.8'::INET BETWEEN low AND high;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Seq Scan on ip_range_domains (cost=0.00..586084.02 rows=1016801 width=55) (actual time=14090.840..21951.343 rows=1 loops=1)
Filter: (('200.8.8.8'::inet >= low) AND ('200.8.8.8'::inet <= high))
Rows Removed by Filter: 23156868
Buffers: shared hit=21232 read=217499
Planning time: 0.111 ms
Execution time: 21951.376 ms
After a bit of manual investigation I found that 74.181.234.146 uses the index but 74.181.234.147 does not. Interestingly, as I get higher, the queries that use the index start taking 600-700ms. Perhaps that's just an issue of finding the data on disk. That's an acceptable response time but faster would be better.
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '74.181.234.146'::INET BETWEEN low AND high;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_range_domains (cost=256258.42..580278.67 rows=5685950 width=55) (actual time=593.066..593.068 rows=3 loops=1)
Recheck Cond: (('74.181.234.146'::inet >= low) AND ('74.181.234.146'::inet <= high))
Heap Blocks: exact=3
Buffers: shared hit=38630
-> Bitmap Index Scan on idx_ip_range_inversed (cost=0.00..254836.93 rows=5685950 width=0) (actual time=593.057..593.057 rows=3 loops=1)
Index Cond: (('74.181.234.146'::inet >= low) AND ('74.181.234.146'::inet <= high))
Buffers: shared hit=38627
Planning time: 0.108 ms
Execution time: 593.094 ms
Lowest query that doesn't use an index:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '74.181.234.147'::INET BETWEEN low AND high;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on ip_range_domains (cost=0.00..586084.02 rows=5685950 width=55) (actual time=5723.461..21914.826 rows=3 loops=1)
Filter: (('74.181.234.147'::inet >= low) AND ('74.181.234.147'::inet <= high))
Rows Removed by Filter: 23156866
Buffers: shared hit=21864 read=216867
Planning time: 0.108 ms
Execution time: 21914.850 ms
Version:
=> SELECT version();
version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
postgresql query-performance postgresql-9.4
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
For some reason queries on the high range of are extremely slow and queries on the low range are extremely fast.
My index is an adapted version of the answer in Optimizing queries on a range of timestamps (two columns):
CREATE INDEX idx_ip_range_inversed ON ip_range_domains(low, high);
but it only works for low ranges for some reason.
Table
CREATE TABLE IF NOT EXISTS ip_range_domains (
ip_range_domain_id BIGSERIAL PRIMARY KEY,
domain_id BIGINT REFERENCES domains NOT NULL,
source_type_id INTEGER REFERENCES source_types NOT NULL,
low INET NOT NULL,
high INET NOT NULL,
auto_high_conf BOOLEAN NOT NULL DEFAULT FALSE,
invalidation_reason_id INTEGER REFERENCES invalidation_reasons DEFAULT NULL,
invalidated_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NULL,
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT current_timestamp
);
CREATE INDEX domain_id_btree ON ip_range_domains (domain_id);
CREATE INDEX idx_ip_range_inversed ON ip_range_domains(low, high);
This is fast:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '8.8.8.8'::INET BETWEEN low AND high;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_range_domains (cost=25411.02..278369.96 rows=948529 width=55) (actual time=61.514..61.567 rows=55 loops=1)
Recheck Cond: (('8.8.8.8'::inet >= low) AND ('8.8.8.8'::inet <= high))
Heap Blocks: exact=23
Buffers: shared hit=3613
-> Bitmap Index Scan on idx_ip_range_inversed (cost=0.00..25173.89 rows=948529 width=0) (actual time=61.493..61.493 rows=55 loops=1)
Index Cond: (('8.8.8.8'::inet >= low) AND ('8.8.8.8'::inet <= high))
Buffers: shared hit=3590
Planning time: 0.537 ms
Execution time: 61.631 ms
This is slow:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '200.8.8.8'::INET BETWEEN low AND high;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Seq Scan on ip_range_domains (cost=0.00..586084.02 rows=1016801 width=55) (actual time=14090.840..21951.343 rows=1 loops=1)
Filter: (('200.8.8.8'::inet >= low) AND ('200.8.8.8'::inet <= high))
Rows Removed by Filter: 23156868
Buffers: shared hit=21232 read=217499
Planning time: 0.111 ms
Execution time: 21951.376 ms
After a bit of manual investigation I found that 74.181.234.146 uses the index but 74.181.234.147 does not. Interestingly, as I get higher, the queries that use the index start taking 600-700ms. Perhaps that's just an issue of finding the data on disk. That's an acceptable response time but faster would be better.
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '74.181.234.146'::INET BETWEEN low AND high;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_range_domains (cost=256258.42..580278.67 rows=5685950 width=55) (actual time=593.066..593.068 rows=3 loops=1)
Recheck Cond: (('74.181.234.146'::inet >= low) AND ('74.181.234.146'::inet <= high))
Heap Blocks: exact=3
Buffers: shared hit=38630
-> Bitmap Index Scan on idx_ip_range_inversed (cost=0.00..254836.93 rows=5685950 width=0) (actual time=593.057..593.057 rows=3 loops=1)
Index Cond: (('74.181.234.146'::inet >= low) AND ('74.181.234.146'::inet <= high))
Buffers: shared hit=38627
Planning time: 0.108 ms
Execution time: 593.094 ms
Lowest query that doesn't use an index:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '74.181.234.147'::INET BETWEEN low AND high;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on ip_range_domains (cost=0.00..586084.02 rows=5685950 width=55) (actual time=5723.461..21914.826 rows=3 loops=1)
Filter: (('74.181.234.147'::inet >= low) AND ('74.181.234.147'::inet <= high))
Rows Removed by Filter: 23156866
Buffers: shared hit=21864 read=216867
Planning time: 0.108 ms
Execution time: 21914.850 ms
Version:
=> SELECT version();
version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
postgresql query-performance postgresql-9.4
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
What's the distribution of data, is it roughly uniform? What is sample data for low and high?
– Jakub Kania
Jan 27 '16 at 20:02
add a comment |
For some reason queries on the high range of are extremely slow and queries on the low range are extremely fast.
My index is an adapted version of the answer in Optimizing queries on a range of timestamps (two columns):
CREATE INDEX idx_ip_range_inversed ON ip_range_domains(low, high);
but it only works for low ranges for some reason.
Table
CREATE TABLE IF NOT EXISTS ip_range_domains (
ip_range_domain_id BIGSERIAL PRIMARY KEY,
domain_id BIGINT REFERENCES domains NOT NULL,
source_type_id INTEGER REFERENCES source_types NOT NULL,
low INET NOT NULL,
high INET NOT NULL,
auto_high_conf BOOLEAN NOT NULL DEFAULT FALSE,
invalidation_reason_id INTEGER REFERENCES invalidation_reasons DEFAULT NULL,
invalidated_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NULL,
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT current_timestamp
);
CREATE INDEX domain_id_btree ON ip_range_domains (domain_id);
CREATE INDEX idx_ip_range_inversed ON ip_range_domains(low, high);
This is fast:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '8.8.8.8'::INET BETWEEN low AND high;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_range_domains (cost=25411.02..278369.96 rows=948529 width=55) (actual time=61.514..61.567 rows=55 loops=1)
Recheck Cond: (('8.8.8.8'::inet >= low) AND ('8.8.8.8'::inet <= high))
Heap Blocks: exact=23
Buffers: shared hit=3613
-> Bitmap Index Scan on idx_ip_range_inversed (cost=0.00..25173.89 rows=948529 width=0) (actual time=61.493..61.493 rows=55 loops=1)
Index Cond: (('8.8.8.8'::inet >= low) AND ('8.8.8.8'::inet <= high))
Buffers: shared hit=3590
Planning time: 0.537 ms
Execution time: 61.631 ms
This is slow:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '200.8.8.8'::INET BETWEEN low AND high;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Seq Scan on ip_range_domains (cost=0.00..586084.02 rows=1016801 width=55) (actual time=14090.840..21951.343 rows=1 loops=1)
Filter: (('200.8.8.8'::inet >= low) AND ('200.8.8.8'::inet <= high))
Rows Removed by Filter: 23156868
Buffers: shared hit=21232 read=217499
Planning time: 0.111 ms
Execution time: 21951.376 ms
After a bit of manual investigation I found that 74.181.234.146 uses the index but 74.181.234.147 does not. Interestingly, as I get higher, the queries that use the index start taking 600-700ms. Perhaps that's just an issue of finding the data on disk. That's an acceptable response time but faster would be better.
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '74.181.234.146'::INET BETWEEN low AND high;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_range_domains (cost=256258.42..580278.67 rows=5685950 width=55) (actual time=593.066..593.068 rows=3 loops=1)
Recheck Cond: (('74.181.234.146'::inet >= low) AND ('74.181.234.146'::inet <= high))
Heap Blocks: exact=3
Buffers: shared hit=38630
-> Bitmap Index Scan on idx_ip_range_inversed (cost=0.00..254836.93 rows=5685950 width=0) (actual time=593.057..593.057 rows=3 loops=1)
Index Cond: (('74.181.234.146'::inet >= low) AND ('74.181.234.146'::inet <= high))
Buffers: shared hit=38627
Planning time: 0.108 ms
Execution time: 593.094 ms
Lowest query that doesn't use an index:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '74.181.234.147'::INET BETWEEN low AND high;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on ip_range_domains (cost=0.00..586084.02 rows=5685950 width=55) (actual time=5723.461..21914.826 rows=3 loops=1)
Filter: (('74.181.234.147'::inet >= low) AND ('74.181.234.147'::inet <= high))
Rows Removed by Filter: 23156866
Buffers: shared hit=21864 read=216867
Planning time: 0.108 ms
Execution time: 21914.850 ms
Version:
=> SELECT version();
version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
postgresql query-performance postgresql-9.4
For some reason queries on the high range of are extremely slow and queries on the low range are extremely fast.
My index is an adapted version of the answer in Optimizing queries on a range of timestamps (two columns):
CREATE INDEX idx_ip_range_inversed ON ip_range_domains(low, high);
but it only works for low ranges for some reason.
Table
CREATE TABLE IF NOT EXISTS ip_range_domains (
ip_range_domain_id BIGSERIAL PRIMARY KEY,
domain_id BIGINT REFERENCES domains NOT NULL,
source_type_id INTEGER REFERENCES source_types NOT NULL,
low INET NOT NULL,
high INET NOT NULL,
auto_high_conf BOOLEAN NOT NULL DEFAULT FALSE,
invalidation_reason_id INTEGER REFERENCES invalidation_reasons DEFAULT NULL,
invalidated_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NULL,
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT current_timestamp
);
CREATE INDEX domain_id_btree ON ip_range_domains (domain_id);
CREATE INDEX idx_ip_range_inversed ON ip_range_domains(low, high);
This is fast:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '8.8.8.8'::INET BETWEEN low AND high;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_range_domains (cost=25411.02..278369.96 rows=948529 width=55) (actual time=61.514..61.567 rows=55 loops=1)
Recheck Cond: (('8.8.8.8'::inet >= low) AND ('8.8.8.8'::inet <= high))
Heap Blocks: exact=23
Buffers: shared hit=3613
-> Bitmap Index Scan on idx_ip_range_inversed (cost=0.00..25173.89 rows=948529 width=0) (actual time=61.493..61.493 rows=55 loops=1)
Index Cond: (('8.8.8.8'::inet >= low) AND ('8.8.8.8'::inet <= high))
Buffers: shared hit=3590
Planning time: 0.537 ms
Execution time: 61.631 ms
This is slow:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '200.8.8.8'::INET BETWEEN low AND high;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Seq Scan on ip_range_domains (cost=0.00..586084.02 rows=1016801 width=55) (actual time=14090.840..21951.343 rows=1 loops=1)
Filter: (('200.8.8.8'::inet >= low) AND ('200.8.8.8'::inet <= high))
Rows Removed by Filter: 23156868
Buffers: shared hit=21232 read=217499
Planning time: 0.111 ms
Execution time: 21951.376 ms
After a bit of manual investigation I found that 74.181.234.146 uses the index but 74.181.234.147 does not. Interestingly, as I get higher, the queries that use the index start taking 600-700ms. Perhaps that's just an issue of finding the data on disk. That's an acceptable response time but faster would be better.
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '74.181.234.146'::INET BETWEEN low AND high;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_range_domains (cost=256258.42..580278.67 rows=5685950 width=55) (actual time=593.066..593.068 rows=3 loops=1)
Recheck Cond: (('74.181.234.146'::inet >= low) AND ('74.181.234.146'::inet <= high))
Heap Blocks: exact=3
Buffers: shared hit=38630
-> Bitmap Index Scan on idx_ip_range_inversed (cost=0.00..254836.93 rows=5685950 width=0) (actual time=593.057..593.057 rows=3 loops=1)
Index Cond: (('74.181.234.146'::inet >= low) AND ('74.181.234.146'::inet <= high))
Buffers: shared hit=38627
Planning time: 0.108 ms
Execution time: 593.094 ms
Lowest query that doesn't use an index:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '74.181.234.147'::INET BETWEEN low AND high;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on ip_range_domains (cost=0.00..586084.02 rows=5685950 width=55) (actual time=5723.461..21914.826 rows=3 loops=1)
Filter: (('74.181.234.147'::inet >= low) AND ('74.181.234.147'::inet <= high))
Rows Removed by Filter: 23156866
Buffers: shared hit=21864 read=216867
Planning time: 0.108 ms
Execution time: 21914.850 ms
Version:
=> SELECT version();
version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
postgresql query-performance postgresql-9.4
postgresql query-performance postgresql-9.4
edited Apr 13 '17 at 12:42
Community♦
1
1
asked Jan 27 '16 at 16:51
Josh HorowitzJosh Horowitz
1133
1133
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
What's the distribution of data, is it roughly uniform? What is sample data for low and high?
– Jakub Kania
Jan 27 '16 at 20:02
add a comment |
What's the distribution of data, is it roughly uniform? What is sample data for low and high?
– Jakub Kania
Jan 27 '16 at 20:02
What's the distribution of data, is it roughly uniform? What is sample data for low and high?
– Jakub Kania
Jan 27 '16 at 20:02
What's the distribution of data, is it roughly uniform? What is sample data for low and high?
– Jakub Kania
Jan 27 '16 at 20:02
add a comment |
1 Answer
1
active
oldest
votes
Your statistics are way off. The planner thinks it's going to retrieve 948529, 1016801, 5685950 and 5685950 rows, respectively, when infact it retrieves 55, 1, 3 and 3.
In the second example, that is why it's preferring a seqscan (I am guessing it expects to have about 5e+6 values in the table and 1e+6, being 20% of the entire table, is not worth an index scan).
What you need is increasing the statistics target on the two columns (ALTER TABLE ip_range_domains ALTER COLUMN high SET STATISTICS 1000
) and run ANALYZE ip_range_domains
. This will give the planner a much better idea of the data contained in this table (specifically, the two columns you are filtering on). Try playing with different values (10000 is the maximum).
The decision the planner needs to make is that between a sequential scan of a table, where each individual read is quite cheap (not least because the OS can prefetch that for you, but also because of the mechanics of the storage medium), and a random read (which is believed to be much more expensive because of... well, random, unpredictable reading pattern).
If the planner's conclusion is it will have to read 30% of the index to come up with 20% of the table, which it will all have to retrieve using random fetches, that is simply just too costly, so it will prefer a full sequential scan.
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f127500%2fpostgresql-9-4-index-only-used-for-lower-inet-ranges%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Your statistics are way off. The planner thinks it's going to retrieve 948529, 1016801, 5685950 and 5685950 rows, respectively, when infact it retrieves 55, 1, 3 and 3.
In the second example, that is why it's preferring a seqscan (I am guessing it expects to have about 5e+6 values in the table and 1e+6, being 20% of the entire table, is not worth an index scan).
What you need is increasing the statistics target on the two columns (ALTER TABLE ip_range_domains ALTER COLUMN high SET STATISTICS 1000
) and run ANALYZE ip_range_domains
. This will give the planner a much better idea of the data contained in this table (specifically, the two columns you are filtering on). Try playing with different values (10000 is the maximum).
The decision the planner needs to make is that between a sequential scan of a table, where each individual read is quite cheap (not least because the OS can prefetch that for you, but also because of the mechanics of the storage medium), and a random read (which is believed to be much more expensive because of... well, random, unpredictable reading pattern).
If the planner's conclusion is it will have to read 30% of the index to come up with 20% of the table, which it will all have to retrieve using random fetches, that is simply just too costly, so it will prefer a full sequential scan.
add a comment |
Your statistics are way off. The planner thinks it's going to retrieve 948529, 1016801, 5685950 and 5685950 rows, respectively, when infact it retrieves 55, 1, 3 and 3.
In the second example, that is why it's preferring a seqscan (I am guessing it expects to have about 5e+6 values in the table and 1e+6, being 20% of the entire table, is not worth an index scan).
What you need is increasing the statistics target on the two columns (ALTER TABLE ip_range_domains ALTER COLUMN high SET STATISTICS 1000
) and run ANALYZE ip_range_domains
. This will give the planner a much better idea of the data contained in this table (specifically, the two columns you are filtering on). Try playing with different values (10000 is the maximum).
The decision the planner needs to make is that between a sequential scan of a table, where each individual read is quite cheap (not least because the OS can prefetch that for you, but also because of the mechanics of the storage medium), and a random read (which is believed to be much more expensive because of... well, random, unpredictable reading pattern).
If the planner's conclusion is it will have to read 30% of the index to come up with 20% of the table, which it will all have to retrieve using random fetches, that is simply just too costly, so it will prefer a full sequential scan.
add a comment |
Your statistics are way off. The planner thinks it's going to retrieve 948529, 1016801, 5685950 and 5685950 rows, respectively, when infact it retrieves 55, 1, 3 and 3.
In the second example, that is why it's preferring a seqscan (I am guessing it expects to have about 5e+6 values in the table and 1e+6, being 20% of the entire table, is not worth an index scan).
What you need is increasing the statistics target on the two columns (ALTER TABLE ip_range_domains ALTER COLUMN high SET STATISTICS 1000
) and run ANALYZE ip_range_domains
. This will give the planner a much better idea of the data contained in this table (specifically, the two columns you are filtering on). Try playing with different values (10000 is the maximum).
The decision the planner needs to make is that between a sequential scan of a table, where each individual read is quite cheap (not least because the OS can prefetch that for you, but also because of the mechanics of the storage medium), and a random read (which is believed to be much more expensive because of... well, random, unpredictable reading pattern).
If the planner's conclusion is it will have to read 30% of the index to come up with 20% of the table, which it will all have to retrieve using random fetches, that is simply just too costly, so it will prefer a full sequential scan.
Your statistics are way off. The planner thinks it's going to retrieve 948529, 1016801, 5685950 and 5685950 rows, respectively, when infact it retrieves 55, 1, 3 and 3.
In the second example, that is why it's preferring a seqscan (I am guessing it expects to have about 5e+6 values in the table and 1e+6, being 20% of the entire table, is not worth an index scan).
What you need is increasing the statistics target on the two columns (ALTER TABLE ip_range_domains ALTER COLUMN high SET STATISTICS 1000
) and run ANALYZE ip_range_domains
. This will give the planner a much better idea of the data contained in this table (specifically, the two columns you are filtering on). Try playing with different values (10000 is the maximum).
The decision the planner needs to make is that between a sequential scan of a table, where each individual read is quite cheap (not least because the OS can prefetch that for you, but also because of the mechanics of the storage medium), and a random read (which is believed to be much more expensive because of... well, random, unpredictable reading pattern).
If the planner's conclusion is it will have to read 30% of the index to come up with 20% of the table, which it will all have to retrieve using random fetches, that is simply just too costly, so it will prefer a full sequential scan.
answered Jan 27 '16 at 21:48
Grega BremecGrega Bremec
744
744
add a comment |
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f127500%2fpostgresql-9-4-index-only-used-for-lower-inet-ranges%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
What's the distribution of data, is it roughly uniform? What is sample data for low and high?
– Jakub Kania
Jan 27 '16 at 20:02