Can indices save a giant `WHERE foo IN (…)` query with thousands of entries in the query list?Optimizing...

Coworker asking me to not bring cakes due to self control issue. What should I do?

How to write Muḥammad ibn Mūsā al-Khwārizmī?

Was there a pre-determined arrangment for division of Germany in case it surrendered before any Soviet forces entered its territory?

How can I put a period right after the algorithm's number in the algorithm's title?

Can you say "leftside right"?

What could cause an entire planet of humans to become aphasic?

Using Ansible, how can I take actions on each file in a specific location?

What is wrong with my use of "find -print0"?

How to deal with an underperforming subordinate?

What if I miss a connection and don't have money to book next flight?

Why is it that Bernie Sanders is always called a "socialist"?

I am a giant among ants

How do I add a strong "onion flavor" to the biryani (in restaurant style)?

Promise.all returning empty objects

Including proofs of known theorems in master's thesis

Minimum Viable Product for RTS game?

What does an unprocessed RAW file look like?

How long can the stop in a stop-and-go be?

Sing Baby Shark

Why do single electrical receptacles exist?

Probability X1 ≥ X2

How can I prevent an oracle who can see into the past from knowing everything that has happened?

Is there a way to pause a running process on Linux systems and resume later?

Can I use a single resistor for multiple LED with different +ve sources?



Can indices save a giant `WHERE foo IN (…)` query with thousands of entries in the query list?


Optimizing ORDER BY in a full text search queryHow can I speed up a Postgres query containing lots of Joins with an ILIKE conditionSlow fulltext search due to wildly inaccurate row estimatesHow to index a query with `WHERE field IS NULL`?How can I create a database where entries are compressed files searchable by index?Indices for a join query with datetime rangeDuplicate entries in query with sumHow can I query for terms like “@foo” with Postgres full text search?Scalable way to manage aggregated statistics over time in hierarchy?How to index two tables for JOINed query optimisation













2















I have some client code hitting my postgres instance with a giant SELECT statement that is matching against a list of IDs. The query ends up looking like this...



SELECT "id","name","status","regionid" FROM "public"."store" WHERE "regionid" IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,$94,$95,$96,$97,$98,$99,$100,$101,$102,$103,$104,$105,$106,$107,$108,$109,$110,$111,$112,$113,$114,$115,$116,$117,$118,$119,$120,$121,$122,$123,$124,$125,$126,$127,$128,$129,$130,$131,$132,$133,$134,$135,$136,$137,$138,$139,$140,$141,$142,$143,$144,$145,$146,$147,$148,$149,$150,$151,$152,$153,$154,$155,$156,$157,$158,$159,$160,$161,$162,$163,$164,$165,$166,$167,$168,$169,$170,$171,$172,$173,$174,$175,$176,$177,$178,$179,$180,$181,$182,$183,$184,$185,$186,$187,$188,$189,$190,$191,$192,$193,$194,$195,$196,$197,$198,$199,$200,$201,$202,$203,$204,$205,$206,$207,$208,$209,$210,$211,$212,$213,$214,$215,$216,$217,$218,$219,$220,$221,$222,$223,$224,$225,$226,$227,$228,$229,$230,$231,$232,$233,$234,$235,$236,$237,$238,$239,$240,$241,$242,$243,$244,$245,$246,$247,$248,$249,$250,$251,$252,$253,$254,$255,$256,$257,$258,$259,$260,$261,$262,$263,$264,$265,$266,$267,$268,$269,$270,$271,$272,$273,$274,$275,$276,$277,$278,$279,$280,$281,$282,$283,$284,$285,$286,$287,$288,$289,$290,$291,$292,$293,$294,$295,$296,$297,$298,$299,$300,$301,$302,$303,$304,$305,$306,$307,$308,$309,$310,$311,$312,$313,$314,$315,$316,$317,$318,$319,$320,$321,$322,$323,$324,$325,$326,$327,$328,$329,$330,$331,$332,$333,$334,$335,$336,$337,$338,$339,$340,$341,$342,$343,$344,$345,$346,$347,$348,$349,$350,$351,$352,$353,$354,$355,$356,$357,$358,$359,$360,$361,$362,$363,$364,$365,$366,$367,$368,$369,$370,$371,$372,$373,$374,$375,$376,$377,$378,$379,$380,$381,$382,$383,$384,$385,$386,$387,$388,$389,$390,$391,$392,$393,$394,$395,$396,$397,$398,$399,$400,$401,$402,$403,$404,$405,$406,$407,$408,$409,$410,$411,$412,$413,$414,$415,$416,$417,$418,$419,$420,$421,$422,$423,$424,$425,$426,$427,$428,$429,$430,$431,$432,$433,$434,$435,$436,$437,$438,$439,$440,$441,$442,$443,$444,$445,$446,$447,$448,$449,$450,$451,$452,$453,$454,$455,$456,$457,$458,$459,$460,$461,$462,$463,$464,$465,$466,$467,$468,$469,$470,$471,$472,$473,$474,$475,$476,$477,$478,$479,$480,$481,$482,$483,$484,$485,$486,$487,$488,$489,$490,$491,$492,$493,$494,$495,$496,$497,$498,$499,$500,$501,$502,$503,$504,$505,$506,$507,$508,$509,$510,$511,$512,$513,$514,$515,$516,$517,$518,$519,$520,$521,$522,$523,$524,$525,$526,$527,$528,$529,$530,$531,$532,$533,$534,$535,$536,$537,$538,$539,$540,$541,$542,$543,$544,$545,$546,$547,$548,$549,$550,$551,$552,$553,$554,$555,$556,$557,$558,$559,$560,$561,$562,$563,$564,$565,$566,$567,$568,$569,$570,$571,$572,$573,$574,$575,$576,$577,$578,$579,$580,$581,$582,$583,$584,$585,$586,$587,$588,$589,$590,$591,$592,$593,$594,$595,$596,$597,$598,$599,$600,$601,$602,$603,$604,$605,$606,$607,$608,$609,$610,$611,$612,$613,$614,$615,$616,$617,$618,$619,$620,$621,$622,$623,$624,$625,$626,$627,$628,$629,$630,$631,$632,$633,$634,$635,$636,$637,$638,$639,$640,$641,$642,$643,$644,$645,$646,$647,$648,$649,$650,$651,$652,$653,$654,$655,$656,$657,$658,$659,$660,$661,$662,$663,$664,$665,$666,$667,$668,$669,$670,$671,$672,$673,$674,$675,$676,$677,$678,$679,$680,$681,$682,$683,$684,$685,$686,$687,$688,$689,$690,$691,$692,$693,$694,$695,$696,$697,$698,$699,$700,$701,$702,$703,$704,$705,$706,$707,$708,$709,$710,$711,$712,$713,$714,$715,$716,$717,$718,$719,$720,$721,$722,$723,$724,$725,$726,$727,$728,$729,$730,$731,$732,$733,$734,$735,$736,$737,$738,$739,$740,$741,$742,$743,$744,$745,$746,$747,$748,$749,$750,$751,$752,$753,$754,$755,$756,$757,$758,$759,$760,$761,$762,$763,$764,$765,$766,$767,$768,$769,$770,$771,$772,$773,$774,$775,$776,$777,$778,$779,$780,$781,$782,$783,$784,$785,$786,$787,$788,$789,$790,$791,$792,$793,$794,$795,$796,$797,$798,$799,$800,$801,$802,$803,$804,$805,$806,$807,$808,$809,$810,$811,$812,$813,$814,$815,$816,$817,$818,$819,$820,$821,$822,$823,$824,$825,$826,$827,$828,$829,$830,$831,$832,$833,$834,$835,$836,$837,$838,$839,$840,$841,$842,$843,$844,$845,$846,$847,$848,$849,$850,$851,$852,$853,$854,$855,$856,$857,$858,$859,$860,$861,$862,$863,$864,$865,$866,$867,$868,$869,$870,$871,$872,$873,$874,$875,$876,$877,$878,$879,$880,$881,$882,$883,$884,$885,$886,$887,$888,$889,$890,$891,$892,$893,$894,$895,$896,$897,$898,$899,$900,$901,$902,$903,$904,$905,$906,$907,$908,$909,$910,$911,$912,$913,$914,$915,$916,$917,$918,$919,$920,$921,$922,$923,$924,$925,$926,$927,$928,$929,$930,$931,$932,$933,$934,$935,$936,$937,$938,$939,$940,$941,$942,$943,$944,$945,$946,$947,$948,$949,$950,$951,$952,$953,$954,$955,$956,$957,$958,$959,$960,$961,$962,$963,$964,$965,$966,$967,$968,$969,$970,$971,$972,$973,$974,$975,$976,$977,$978,$979,$980,$981,$982,$983,$984,$985,$986,$987,$988,$989,$990,$991,$992,$993,$994,$995,$996,$997,$998,$999,$1000,$1001,$1002,$1003,$1004,$1005,$1006,$1007,$1008) ORDER BY "id" [ '00178340-4467-11e6-8a37-219b2acdc346',
'003be4e0-0260-11e6-af27-2d5012c820c0',
'006a6460-db4f-11e5-9c34-e3c0bb24c767',
// snip ]


It seems like this query is especially slow. My initial intuition is to just slap an index on regionid within the store table, but I wonder if the database will really continue to scale as the number of regionid elements that are queried over grows over time.



What is the typical way to scale this over time? Will a simple index on regionid scalably save me going forwards, or is the WHERE IN ultimately going to be a crux?



EDIT: Here is the result of an EXPLAIN query on a shortened dataset:



[
{
"Plan": {
"Startup Cost": 211.01,
"Plans": [
{
"Startup Cost": 34.7,
"Plan Width": 110,
"Plans": [
{
"Startup Cost": 0,
"Plan Width": 0,
"Node Type": "Bitmap Index Scan",
"Index Cond": "(store.regionid = ANY ('{00178340-4467-11e6-8a37-219b2acdc346,003be4e0-0260-11e6-af27-2d5012c820c0,006a6460-db4f-11e5-9c34-e3c0bb24c767,00811b60-4d37-11e6-80df-5b5d32d23050,008d2270-8694-11e6-9802-ed42b1afc98c,014d2350-4d4a-11e6-b004-9d13a9078e9a,01516370-94a3-11e6-8ece-0f437fd4c46e,015654a0-4874-11e6-8a37-219b2acdc346}'::text[]))",
"Plan Rows": 55,
"Parent Relationship": "Outer",
"Index Name": "unique_stores",
"Total Cost": 34.69
}
],
"Recheck Cond": "(store.regionid = ANY ('{00178340-4467-11e6-8a37-219b2acdc346,003be4e0-0260-11e6-af27-2d5012c820c0,006a6460-db4f-11e5-9c34-e3c0bb24c767,00811b60-4d37-11e6-80df-5b5d32d23050,008d2270-8694-11e6-9802-ed42b1afc98c,014d2350-4d4a-11e6-b004-9d13a9078e9a,01516370-94a3-11e6-8ece-0f437fd4c46e,015654a0-4874-11e6-8a37-219b2acdc346}'::text[]))",
"Node Type": "Bitmap Heap Scan",
"Plan Rows": 55,
"Relation Name": "store",
"Alias": "store",
"Output": [
"id",
"name",
"status",
"regionid",
"storeid"
],
"Parent Relationship": "Outer",
"Total Cost": 209.42,
"Schema": "public"
}
],
"Sort Key": [
"store.id",
"store.storeid"
],
"Plan Rows": 55,
"Node Type": "Sort",
"Output": [
"id",
"name",
"status",
"regionid",
"storeid"
],
"Plan Width": 110,
"Total Cost": 211.15
}
}
]









share|improve this question
















bumped to the homepage by Community 18 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.











  • 1





    How many distinct regionid values are in the store table? If the application is listing out 1000 regions and there are only 1500 regions in total, it's almost certainly going to be faster to do a table scan than to use an index on regionid. If there are 150,000 different regions, on the other hand, using an index on regionid would almost certainly be more efficient even if the list grew to a few thousand entries.

    – Justin Cave
    Nov 2 '16 at 19:17






  • 1





    Separate from that, though, long before I got to 1000 elements, I'd seriously think about loading the regions I want to search into a temp table or some other structure so that I wasn't generating thousands of different queries that differed only by the number of bind variables in the IN list.

    – Justin Cave
    Nov 2 '16 at 19:17











  • @JustinCave There is a typical ratio of about 20 stores per region...so it will scale ~20:1. With respect to your second comment, I'll have to do some reading up on that one.

    – Jim
    Nov 2 '16 at 19:21











  • For some context (I am still trying to sort this out, myself a bit!), it appears that a client ORM is generating this query through a permissions check. So, a user has permission (enforced at an application level) to access stores from regions with those IDs, and then they are querying for the accessible stores through this WHERE IN query (generated with an ORM)

    – Jim
    Nov 2 '16 at 19:22











  • That's not the ratio to be concerned with. It doesn't matter whether there are 1, 10, 100, or 1000 stores per region. The important question is what fraction of the rows in the table the query is returning. If there are tens or hundreds of thousands of regions, then an IN list with 1000 regions is only going to return a few percent of the rows in the table and an index would make sense. If the query is going to return most of the rows in the table because the IN list is so long, then an index won't be helpful.

    – Justin Cave
    Nov 2 '16 at 19:25
















2















I have some client code hitting my postgres instance with a giant SELECT statement that is matching against a list of IDs. The query ends up looking like this...



SELECT "id","name","status","regionid" FROM "public"."store" WHERE "regionid" IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,$94,$95,$96,$97,$98,$99,$100,$101,$102,$103,$104,$105,$106,$107,$108,$109,$110,$111,$112,$113,$114,$115,$116,$117,$118,$119,$120,$121,$122,$123,$124,$125,$126,$127,$128,$129,$130,$131,$132,$133,$134,$135,$136,$137,$138,$139,$140,$141,$142,$143,$144,$145,$146,$147,$148,$149,$150,$151,$152,$153,$154,$155,$156,$157,$158,$159,$160,$161,$162,$163,$164,$165,$166,$167,$168,$169,$170,$171,$172,$173,$174,$175,$176,$177,$178,$179,$180,$181,$182,$183,$184,$185,$186,$187,$188,$189,$190,$191,$192,$193,$194,$195,$196,$197,$198,$199,$200,$201,$202,$203,$204,$205,$206,$207,$208,$209,$210,$211,$212,$213,$214,$215,$216,$217,$218,$219,$220,$221,$222,$223,$224,$225,$226,$227,$228,$229,$230,$231,$232,$233,$234,$235,$236,$237,$238,$239,$240,$241,$242,$243,$244,$245,$246,$247,$248,$249,$250,$251,$252,$253,$254,$255,$256,$257,$258,$259,$260,$261,$262,$263,$264,$265,$266,$267,$268,$269,$270,$271,$272,$273,$274,$275,$276,$277,$278,$279,$280,$281,$282,$283,$284,$285,$286,$287,$288,$289,$290,$291,$292,$293,$294,$295,$296,$297,$298,$299,$300,$301,$302,$303,$304,$305,$306,$307,$308,$309,$310,$311,$312,$313,$314,$315,$316,$317,$318,$319,$320,$321,$322,$323,$324,$325,$326,$327,$328,$329,$330,$331,$332,$333,$334,$335,$336,$337,$338,$339,$340,$341,$342,$343,$344,$345,$346,$347,$348,$349,$350,$351,$352,$353,$354,$355,$356,$357,$358,$359,$360,$361,$362,$363,$364,$365,$366,$367,$368,$369,$370,$371,$372,$373,$374,$375,$376,$377,$378,$379,$380,$381,$382,$383,$384,$385,$386,$387,$388,$389,$390,$391,$392,$393,$394,$395,$396,$397,$398,$399,$400,$401,$402,$403,$404,$405,$406,$407,$408,$409,$410,$411,$412,$413,$414,$415,$416,$417,$418,$419,$420,$421,$422,$423,$424,$425,$426,$427,$428,$429,$430,$431,$432,$433,$434,$435,$436,$437,$438,$439,$440,$441,$442,$443,$444,$445,$446,$447,$448,$449,$450,$451,$452,$453,$454,$455,$456,$457,$458,$459,$460,$461,$462,$463,$464,$465,$466,$467,$468,$469,$470,$471,$472,$473,$474,$475,$476,$477,$478,$479,$480,$481,$482,$483,$484,$485,$486,$487,$488,$489,$490,$491,$492,$493,$494,$495,$496,$497,$498,$499,$500,$501,$502,$503,$504,$505,$506,$507,$508,$509,$510,$511,$512,$513,$514,$515,$516,$517,$518,$519,$520,$521,$522,$523,$524,$525,$526,$527,$528,$529,$530,$531,$532,$533,$534,$535,$536,$537,$538,$539,$540,$541,$542,$543,$544,$545,$546,$547,$548,$549,$550,$551,$552,$553,$554,$555,$556,$557,$558,$559,$560,$561,$562,$563,$564,$565,$566,$567,$568,$569,$570,$571,$572,$573,$574,$575,$576,$577,$578,$579,$580,$581,$582,$583,$584,$585,$586,$587,$588,$589,$590,$591,$592,$593,$594,$595,$596,$597,$598,$599,$600,$601,$602,$603,$604,$605,$606,$607,$608,$609,$610,$611,$612,$613,$614,$615,$616,$617,$618,$619,$620,$621,$622,$623,$624,$625,$626,$627,$628,$629,$630,$631,$632,$633,$634,$635,$636,$637,$638,$639,$640,$641,$642,$643,$644,$645,$646,$647,$648,$649,$650,$651,$652,$653,$654,$655,$656,$657,$658,$659,$660,$661,$662,$663,$664,$665,$666,$667,$668,$669,$670,$671,$672,$673,$674,$675,$676,$677,$678,$679,$680,$681,$682,$683,$684,$685,$686,$687,$688,$689,$690,$691,$692,$693,$694,$695,$696,$697,$698,$699,$700,$701,$702,$703,$704,$705,$706,$707,$708,$709,$710,$711,$712,$713,$714,$715,$716,$717,$718,$719,$720,$721,$722,$723,$724,$725,$726,$727,$728,$729,$730,$731,$732,$733,$734,$735,$736,$737,$738,$739,$740,$741,$742,$743,$744,$745,$746,$747,$748,$749,$750,$751,$752,$753,$754,$755,$756,$757,$758,$759,$760,$761,$762,$763,$764,$765,$766,$767,$768,$769,$770,$771,$772,$773,$774,$775,$776,$777,$778,$779,$780,$781,$782,$783,$784,$785,$786,$787,$788,$789,$790,$791,$792,$793,$794,$795,$796,$797,$798,$799,$800,$801,$802,$803,$804,$805,$806,$807,$808,$809,$810,$811,$812,$813,$814,$815,$816,$817,$818,$819,$820,$821,$822,$823,$824,$825,$826,$827,$828,$829,$830,$831,$832,$833,$834,$835,$836,$837,$838,$839,$840,$841,$842,$843,$844,$845,$846,$847,$848,$849,$850,$851,$852,$853,$854,$855,$856,$857,$858,$859,$860,$861,$862,$863,$864,$865,$866,$867,$868,$869,$870,$871,$872,$873,$874,$875,$876,$877,$878,$879,$880,$881,$882,$883,$884,$885,$886,$887,$888,$889,$890,$891,$892,$893,$894,$895,$896,$897,$898,$899,$900,$901,$902,$903,$904,$905,$906,$907,$908,$909,$910,$911,$912,$913,$914,$915,$916,$917,$918,$919,$920,$921,$922,$923,$924,$925,$926,$927,$928,$929,$930,$931,$932,$933,$934,$935,$936,$937,$938,$939,$940,$941,$942,$943,$944,$945,$946,$947,$948,$949,$950,$951,$952,$953,$954,$955,$956,$957,$958,$959,$960,$961,$962,$963,$964,$965,$966,$967,$968,$969,$970,$971,$972,$973,$974,$975,$976,$977,$978,$979,$980,$981,$982,$983,$984,$985,$986,$987,$988,$989,$990,$991,$992,$993,$994,$995,$996,$997,$998,$999,$1000,$1001,$1002,$1003,$1004,$1005,$1006,$1007,$1008) ORDER BY "id" [ '00178340-4467-11e6-8a37-219b2acdc346',
'003be4e0-0260-11e6-af27-2d5012c820c0',
'006a6460-db4f-11e5-9c34-e3c0bb24c767',
// snip ]


It seems like this query is especially slow. My initial intuition is to just slap an index on regionid within the store table, but I wonder if the database will really continue to scale as the number of regionid elements that are queried over grows over time.



What is the typical way to scale this over time? Will a simple index on regionid scalably save me going forwards, or is the WHERE IN ultimately going to be a crux?



EDIT: Here is the result of an EXPLAIN query on a shortened dataset:



[
{
"Plan": {
"Startup Cost": 211.01,
"Plans": [
{
"Startup Cost": 34.7,
"Plan Width": 110,
"Plans": [
{
"Startup Cost": 0,
"Plan Width": 0,
"Node Type": "Bitmap Index Scan",
"Index Cond": "(store.regionid = ANY ('{00178340-4467-11e6-8a37-219b2acdc346,003be4e0-0260-11e6-af27-2d5012c820c0,006a6460-db4f-11e5-9c34-e3c0bb24c767,00811b60-4d37-11e6-80df-5b5d32d23050,008d2270-8694-11e6-9802-ed42b1afc98c,014d2350-4d4a-11e6-b004-9d13a9078e9a,01516370-94a3-11e6-8ece-0f437fd4c46e,015654a0-4874-11e6-8a37-219b2acdc346}'::text[]))",
"Plan Rows": 55,
"Parent Relationship": "Outer",
"Index Name": "unique_stores",
"Total Cost": 34.69
}
],
"Recheck Cond": "(store.regionid = ANY ('{00178340-4467-11e6-8a37-219b2acdc346,003be4e0-0260-11e6-af27-2d5012c820c0,006a6460-db4f-11e5-9c34-e3c0bb24c767,00811b60-4d37-11e6-80df-5b5d32d23050,008d2270-8694-11e6-9802-ed42b1afc98c,014d2350-4d4a-11e6-b004-9d13a9078e9a,01516370-94a3-11e6-8ece-0f437fd4c46e,015654a0-4874-11e6-8a37-219b2acdc346}'::text[]))",
"Node Type": "Bitmap Heap Scan",
"Plan Rows": 55,
"Relation Name": "store",
"Alias": "store",
"Output": [
"id",
"name",
"status",
"regionid",
"storeid"
],
"Parent Relationship": "Outer",
"Total Cost": 209.42,
"Schema": "public"
}
],
"Sort Key": [
"store.id",
"store.storeid"
],
"Plan Rows": 55,
"Node Type": "Sort",
"Output": [
"id",
"name",
"status",
"regionid",
"storeid"
],
"Plan Width": 110,
"Total Cost": 211.15
}
}
]









share|improve this question
















bumped to the homepage by Community 18 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.











  • 1





    How many distinct regionid values are in the store table? If the application is listing out 1000 regions and there are only 1500 regions in total, it's almost certainly going to be faster to do a table scan than to use an index on regionid. If there are 150,000 different regions, on the other hand, using an index on regionid would almost certainly be more efficient even if the list grew to a few thousand entries.

    – Justin Cave
    Nov 2 '16 at 19:17






  • 1





    Separate from that, though, long before I got to 1000 elements, I'd seriously think about loading the regions I want to search into a temp table or some other structure so that I wasn't generating thousands of different queries that differed only by the number of bind variables in the IN list.

    – Justin Cave
    Nov 2 '16 at 19:17











  • @JustinCave There is a typical ratio of about 20 stores per region...so it will scale ~20:1. With respect to your second comment, I'll have to do some reading up on that one.

    – Jim
    Nov 2 '16 at 19:21











  • For some context (I am still trying to sort this out, myself a bit!), it appears that a client ORM is generating this query through a permissions check. So, a user has permission (enforced at an application level) to access stores from regions with those IDs, and then they are querying for the accessible stores through this WHERE IN query (generated with an ORM)

    – Jim
    Nov 2 '16 at 19:22











  • That's not the ratio to be concerned with. It doesn't matter whether there are 1, 10, 100, or 1000 stores per region. The important question is what fraction of the rows in the table the query is returning. If there are tens or hundreds of thousands of regions, then an IN list with 1000 regions is only going to return a few percent of the rows in the table and an index would make sense. If the query is going to return most of the rows in the table because the IN list is so long, then an index won't be helpful.

    – Justin Cave
    Nov 2 '16 at 19:25














2












2








2








I have some client code hitting my postgres instance with a giant SELECT statement that is matching against a list of IDs. The query ends up looking like this...



SELECT "id","name","status","regionid" FROM "public"."store" WHERE "regionid" IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,$94,$95,$96,$97,$98,$99,$100,$101,$102,$103,$104,$105,$106,$107,$108,$109,$110,$111,$112,$113,$114,$115,$116,$117,$118,$119,$120,$121,$122,$123,$124,$125,$126,$127,$128,$129,$130,$131,$132,$133,$134,$135,$136,$137,$138,$139,$140,$141,$142,$143,$144,$145,$146,$147,$148,$149,$150,$151,$152,$153,$154,$155,$156,$157,$158,$159,$160,$161,$162,$163,$164,$165,$166,$167,$168,$169,$170,$171,$172,$173,$174,$175,$176,$177,$178,$179,$180,$181,$182,$183,$184,$185,$186,$187,$188,$189,$190,$191,$192,$193,$194,$195,$196,$197,$198,$199,$200,$201,$202,$203,$204,$205,$206,$207,$208,$209,$210,$211,$212,$213,$214,$215,$216,$217,$218,$219,$220,$221,$222,$223,$224,$225,$226,$227,$228,$229,$230,$231,$232,$233,$234,$235,$236,$237,$238,$239,$240,$241,$242,$243,$244,$245,$246,$247,$248,$249,$250,$251,$252,$253,$254,$255,$256,$257,$258,$259,$260,$261,$262,$263,$264,$265,$266,$267,$268,$269,$270,$271,$272,$273,$274,$275,$276,$277,$278,$279,$280,$281,$282,$283,$284,$285,$286,$287,$288,$289,$290,$291,$292,$293,$294,$295,$296,$297,$298,$299,$300,$301,$302,$303,$304,$305,$306,$307,$308,$309,$310,$311,$312,$313,$314,$315,$316,$317,$318,$319,$320,$321,$322,$323,$324,$325,$326,$327,$328,$329,$330,$331,$332,$333,$334,$335,$336,$337,$338,$339,$340,$341,$342,$343,$344,$345,$346,$347,$348,$349,$350,$351,$352,$353,$354,$355,$356,$357,$358,$359,$360,$361,$362,$363,$364,$365,$366,$367,$368,$369,$370,$371,$372,$373,$374,$375,$376,$377,$378,$379,$380,$381,$382,$383,$384,$385,$386,$387,$388,$389,$390,$391,$392,$393,$394,$395,$396,$397,$398,$399,$400,$401,$402,$403,$404,$405,$406,$407,$408,$409,$410,$411,$412,$413,$414,$415,$416,$417,$418,$419,$420,$421,$422,$423,$424,$425,$426,$427,$428,$429,$430,$431,$432,$433,$434,$435,$436,$437,$438,$439,$440,$441,$442,$443,$444,$445,$446,$447,$448,$449,$450,$451,$452,$453,$454,$455,$456,$457,$458,$459,$460,$461,$462,$463,$464,$465,$466,$467,$468,$469,$470,$471,$472,$473,$474,$475,$476,$477,$478,$479,$480,$481,$482,$483,$484,$485,$486,$487,$488,$489,$490,$491,$492,$493,$494,$495,$496,$497,$498,$499,$500,$501,$502,$503,$504,$505,$506,$507,$508,$509,$510,$511,$512,$513,$514,$515,$516,$517,$518,$519,$520,$521,$522,$523,$524,$525,$526,$527,$528,$529,$530,$531,$532,$533,$534,$535,$536,$537,$538,$539,$540,$541,$542,$543,$544,$545,$546,$547,$548,$549,$550,$551,$552,$553,$554,$555,$556,$557,$558,$559,$560,$561,$562,$563,$564,$565,$566,$567,$568,$569,$570,$571,$572,$573,$574,$575,$576,$577,$578,$579,$580,$581,$582,$583,$584,$585,$586,$587,$588,$589,$590,$591,$592,$593,$594,$595,$596,$597,$598,$599,$600,$601,$602,$603,$604,$605,$606,$607,$608,$609,$610,$611,$612,$613,$614,$615,$616,$617,$618,$619,$620,$621,$622,$623,$624,$625,$626,$627,$628,$629,$630,$631,$632,$633,$634,$635,$636,$637,$638,$639,$640,$641,$642,$643,$644,$645,$646,$647,$648,$649,$650,$651,$652,$653,$654,$655,$656,$657,$658,$659,$660,$661,$662,$663,$664,$665,$666,$667,$668,$669,$670,$671,$672,$673,$674,$675,$676,$677,$678,$679,$680,$681,$682,$683,$684,$685,$686,$687,$688,$689,$690,$691,$692,$693,$694,$695,$696,$697,$698,$699,$700,$701,$702,$703,$704,$705,$706,$707,$708,$709,$710,$711,$712,$713,$714,$715,$716,$717,$718,$719,$720,$721,$722,$723,$724,$725,$726,$727,$728,$729,$730,$731,$732,$733,$734,$735,$736,$737,$738,$739,$740,$741,$742,$743,$744,$745,$746,$747,$748,$749,$750,$751,$752,$753,$754,$755,$756,$757,$758,$759,$760,$761,$762,$763,$764,$765,$766,$767,$768,$769,$770,$771,$772,$773,$774,$775,$776,$777,$778,$779,$780,$781,$782,$783,$784,$785,$786,$787,$788,$789,$790,$791,$792,$793,$794,$795,$796,$797,$798,$799,$800,$801,$802,$803,$804,$805,$806,$807,$808,$809,$810,$811,$812,$813,$814,$815,$816,$817,$818,$819,$820,$821,$822,$823,$824,$825,$826,$827,$828,$829,$830,$831,$832,$833,$834,$835,$836,$837,$838,$839,$840,$841,$842,$843,$844,$845,$846,$847,$848,$849,$850,$851,$852,$853,$854,$855,$856,$857,$858,$859,$860,$861,$862,$863,$864,$865,$866,$867,$868,$869,$870,$871,$872,$873,$874,$875,$876,$877,$878,$879,$880,$881,$882,$883,$884,$885,$886,$887,$888,$889,$890,$891,$892,$893,$894,$895,$896,$897,$898,$899,$900,$901,$902,$903,$904,$905,$906,$907,$908,$909,$910,$911,$912,$913,$914,$915,$916,$917,$918,$919,$920,$921,$922,$923,$924,$925,$926,$927,$928,$929,$930,$931,$932,$933,$934,$935,$936,$937,$938,$939,$940,$941,$942,$943,$944,$945,$946,$947,$948,$949,$950,$951,$952,$953,$954,$955,$956,$957,$958,$959,$960,$961,$962,$963,$964,$965,$966,$967,$968,$969,$970,$971,$972,$973,$974,$975,$976,$977,$978,$979,$980,$981,$982,$983,$984,$985,$986,$987,$988,$989,$990,$991,$992,$993,$994,$995,$996,$997,$998,$999,$1000,$1001,$1002,$1003,$1004,$1005,$1006,$1007,$1008) ORDER BY "id" [ '00178340-4467-11e6-8a37-219b2acdc346',
'003be4e0-0260-11e6-af27-2d5012c820c0',
'006a6460-db4f-11e5-9c34-e3c0bb24c767',
// snip ]


It seems like this query is especially slow. My initial intuition is to just slap an index on regionid within the store table, but I wonder if the database will really continue to scale as the number of regionid elements that are queried over grows over time.



What is the typical way to scale this over time? Will a simple index on regionid scalably save me going forwards, or is the WHERE IN ultimately going to be a crux?



EDIT: Here is the result of an EXPLAIN query on a shortened dataset:



[
{
"Plan": {
"Startup Cost": 211.01,
"Plans": [
{
"Startup Cost": 34.7,
"Plan Width": 110,
"Plans": [
{
"Startup Cost": 0,
"Plan Width": 0,
"Node Type": "Bitmap Index Scan",
"Index Cond": "(store.regionid = ANY ('{00178340-4467-11e6-8a37-219b2acdc346,003be4e0-0260-11e6-af27-2d5012c820c0,006a6460-db4f-11e5-9c34-e3c0bb24c767,00811b60-4d37-11e6-80df-5b5d32d23050,008d2270-8694-11e6-9802-ed42b1afc98c,014d2350-4d4a-11e6-b004-9d13a9078e9a,01516370-94a3-11e6-8ece-0f437fd4c46e,015654a0-4874-11e6-8a37-219b2acdc346}'::text[]))",
"Plan Rows": 55,
"Parent Relationship": "Outer",
"Index Name": "unique_stores",
"Total Cost": 34.69
}
],
"Recheck Cond": "(store.regionid = ANY ('{00178340-4467-11e6-8a37-219b2acdc346,003be4e0-0260-11e6-af27-2d5012c820c0,006a6460-db4f-11e5-9c34-e3c0bb24c767,00811b60-4d37-11e6-80df-5b5d32d23050,008d2270-8694-11e6-9802-ed42b1afc98c,014d2350-4d4a-11e6-b004-9d13a9078e9a,01516370-94a3-11e6-8ece-0f437fd4c46e,015654a0-4874-11e6-8a37-219b2acdc346}'::text[]))",
"Node Type": "Bitmap Heap Scan",
"Plan Rows": 55,
"Relation Name": "store",
"Alias": "store",
"Output": [
"id",
"name",
"status",
"regionid",
"storeid"
],
"Parent Relationship": "Outer",
"Total Cost": 209.42,
"Schema": "public"
}
],
"Sort Key": [
"store.id",
"store.storeid"
],
"Plan Rows": 55,
"Node Type": "Sort",
"Output": [
"id",
"name",
"status",
"regionid",
"storeid"
],
"Plan Width": 110,
"Total Cost": 211.15
}
}
]









share|improve this question
















I have some client code hitting my postgres instance with a giant SELECT statement that is matching against a list of IDs. The query ends up looking like this...



SELECT "id","name","status","regionid" FROM "public"."store" WHERE "regionid" IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,$94,$95,$96,$97,$98,$99,$100,$101,$102,$103,$104,$105,$106,$107,$108,$109,$110,$111,$112,$113,$114,$115,$116,$117,$118,$119,$120,$121,$122,$123,$124,$125,$126,$127,$128,$129,$130,$131,$132,$133,$134,$135,$136,$137,$138,$139,$140,$141,$142,$143,$144,$145,$146,$147,$148,$149,$150,$151,$152,$153,$154,$155,$156,$157,$158,$159,$160,$161,$162,$163,$164,$165,$166,$167,$168,$169,$170,$171,$172,$173,$174,$175,$176,$177,$178,$179,$180,$181,$182,$183,$184,$185,$186,$187,$188,$189,$190,$191,$192,$193,$194,$195,$196,$197,$198,$199,$200,$201,$202,$203,$204,$205,$206,$207,$208,$209,$210,$211,$212,$213,$214,$215,$216,$217,$218,$219,$220,$221,$222,$223,$224,$225,$226,$227,$228,$229,$230,$231,$232,$233,$234,$235,$236,$237,$238,$239,$240,$241,$242,$243,$244,$245,$246,$247,$248,$249,$250,$251,$252,$253,$254,$255,$256,$257,$258,$259,$260,$261,$262,$263,$264,$265,$266,$267,$268,$269,$270,$271,$272,$273,$274,$275,$276,$277,$278,$279,$280,$281,$282,$283,$284,$285,$286,$287,$288,$289,$290,$291,$292,$293,$294,$295,$296,$297,$298,$299,$300,$301,$302,$303,$304,$305,$306,$307,$308,$309,$310,$311,$312,$313,$314,$315,$316,$317,$318,$319,$320,$321,$322,$323,$324,$325,$326,$327,$328,$329,$330,$331,$332,$333,$334,$335,$336,$337,$338,$339,$340,$341,$342,$343,$344,$345,$346,$347,$348,$349,$350,$351,$352,$353,$354,$355,$356,$357,$358,$359,$360,$361,$362,$363,$364,$365,$366,$367,$368,$369,$370,$371,$372,$373,$374,$375,$376,$377,$378,$379,$380,$381,$382,$383,$384,$385,$386,$387,$388,$389,$390,$391,$392,$393,$394,$395,$396,$397,$398,$399,$400,$401,$402,$403,$404,$405,$406,$407,$408,$409,$410,$411,$412,$413,$414,$415,$416,$417,$418,$419,$420,$421,$422,$423,$424,$425,$426,$427,$428,$429,$430,$431,$432,$433,$434,$435,$436,$437,$438,$439,$440,$441,$442,$443,$444,$445,$446,$447,$448,$449,$450,$451,$452,$453,$454,$455,$456,$457,$458,$459,$460,$461,$462,$463,$464,$465,$466,$467,$468,$469,$470,$471,$472,$473,$474,$475,$476,$477,$478,$479,$480,$481,$482,$483,$484,$485,$486,$487,$488,$489,$490,$491,$492,$493,$494,$495,$496,$497,$498,$499,$500,$501,$502,$503,$504,$505,$506,$507,$508,$509,$510,$511,$512,$513,$514,$515,$516,$517,$518,$519,$520,$521,$522,$523,$524,$525,$526,$527,$528,$529,$530,$531,$532,$533,$534,$535,$536,$537,$538,$539,$540,$541,$542,$543,$544,$545,$546,$547,$548,$549,$550,$551,$552,$553,$554,$555,$556,$557,$558,$559,$560,$561,$562,$563,$564,$565,$566,$567,$568,$569,$570,$571,$572,$573,$574,$575,$576,$577,$578,$579,$580,$581,$582,$583,$584,$585,$586,$587,$588,$589,$590,$591,$592,$593,$594,$595,$596,$597,$598,$599,$600,$601,$602,$603,$604,$605,$606,$607,$608,$609,$610,$611,$612,$613,$614,$615,$616,$617,$618,$619,$620,$621,$622,$623,$624,$625,$626,$627,$628,$629,$630,$631,$632,$633,$634,$635,$636,$637,$638,$639,$640,$641,$642,$643,$644,$645,$646,$647,$648,$649,$650,$651,$652,$653,$654,$655,$656,$657,$658,$659,$660,$661,$662,$663,$664,$665,$666,$667,$668,$669,$670,$671,$672,$673,$674,$675,$676,$677,$678,$679,$680,$681,$682,$683,$684,$685,$686,$687,$688,$689,$690,$691,$692,$693,$694,$695,$696,$697,$698,$699,$700,$701,$702,$703,$704,$705,$706,$707,$708,$709,$710,$711,$712,$713,$714,$715,$716,$717,$718,$719,$720,$721,$722,$723,$724,$725,$726,$727,$728,$729,$730,$731,$732,$733,$734,$735,$736,$737,$738,$739,$740,$741,$742,$743,$744,$745,$746,$747,$748,$749,$750,$751,$752,$753,$754,$755,$756,$757,$758,$759,$760,$761,$762,$763,$764,$765,$766,$767,$768,$769,$770,$771,$772,$773,$774,$775,$776,$777,$778,$779,$780,$781,$782,$783,$784,$785,$786,$787,$788,$789,$790,$791,$792,$793,$794,$795,$796,$797,$798,$799,$800,$801,$802,$803,$804,$805,$806,$807,$808,$809,$810,$811,$812,$813,$814,$815,$816,$817,$818,$819,$820,$821,$822,$823,$824,$825,$826,$827,$828,$829,$830,$831,$832,$833,$834,$835,$836,$837,$838,$839,$840,$841,$842,$843,$844,$845,$846,$847,$848,$849,$850,$851,$852,$853,$854,$855,$856,$857,$858,$859,$860,$861,$862,$863,$864,$865,$866,$867,$868,$869,$870,$871,$872,$873,$874,$875,$876,$877,$878,$879,$880,$881,$882,$883,$884,$885,$886,$887,$888,$889,$890,$891,$892,$893,$894,$895,$896,$897,$898,$899,$900,$901,$902,$903,$904,$905,$906,$907,$908,$909,$910,$911,$912,$913,$914,$915,$916,$917,$918,$919,$920,$921,$922,$923,$924,$925,$926,$927,$928,$929,$930,$931,$932,$933,$934,$935,$936,$937,$938,$939,$940,$941,$942,$943,$944,$945,$946,$947,$948,$949,$950,$951,$952,$953,$954,$955,$956,$957,$958,$959,$960,$961,$962,$963,$964,$965,$966,$967,$968,$969,$970,$971,$972,$973,$974,$975,$976,$977,$978,$979,$980,$981,$982,$983,$984,$985,$986,$987,$988,$989,$990,$991,$992,$993,$994,$995,$996,$997,$998,$999,$1000,$1001,$1002,$1003,$1004,$1005,$1006,$1007,$1008) ORDER BY "id" [ '00178340-4467-11e6-8a37-219b2acdc346',
'003be4e0-0260-11e6-af27-2d5012c820c0',
'006a6460-db4f-11e5-9c34-e3c0bb24c767',
// snip ]


It seems like this query is especially slow. My initial intuition is to just slap an index on regionid within the store table, but I wonder if the database will really continue to scale as the number of regionid elements that are queried over grows over time.



What is the typical way to scale this over time? Will a simple index on regionid scalably save me going forwards, or is the WHERE IN ultimately going to be a crux?



EDIT: Here is the result of an EXPLAIN query on a shortened dataset:



[
{
"Plan": {
"Startup Cost": 211.01,
"Plans": [
{
"Startup Cost": 34.7,
"Plan Width": 110,
"Plans": [
{
"Startup Cost": 0,
"Plan Width": 0,
"Node Type": "Bitmap Index Scan",
"Index Cond": "(store.regionid = ANY ('{00178340-4467-11e6-8a37-219b2acdc346,003be4e0-0260-11e6-af27-2d5012c820c0,006a6460-db4f-11e5-9c34-e3c0bb24c767,00811b60-4d37-11e6-80df-5b5d32d23050,008d2270-8694-11e6-9802-ed42b1afc98c,014d2350-4d4a-11e6-b004-9d13a9078e9a,01516370-94a3-11e6-8ece-0f437fd4c46e,015654a0-4874-11e6-8a37-219b2acdc346}'::text[]))",
"Plan Rows": 55,
"Parent Relationship": "Outer",
"Index Name": "unique_stores",
"Total Cost": 34.69
}
],
"Recheck Cond": "(store.regionid = ANY ('{00178340-4467-11e6-8a37-219b2acdc346,003be4e0-0260-11e6-af27-2d5012c820c0,006a6460-db4f-11e5-9c34-e3c0bb24c767,00811b60-4d37-11e6-80df-5b5d32d23050,008d2270-8694-11e6-9802-ed42b1afc98c,014d2350-4d4a-11e6-b004-9d13a9078e9a,01516370-94a3-11e6-8ece-0f437fd4c46e,015654a0-4874-11e6-8a37-219b2acdc346}'::text[]))",
"Node Type": "Bitmap Heap Scan",
"Plan Rows": 55,
"Relation Name": "store",
"Alias": "store",
"Output": [
"id",
"name",
"status",
"regionid",
"storeid"
],
"Parent Relationship": "Outer",
"Total Cost": 209.42,
"Schema": "public"
}
],
"Sort Key": [
"store.id",
"store.storeid"
],
"Plan Rows": 55,
"Node Type": "Sort",
"Output": [
"id",
"name",
"status",
"regionid",
"storeid"
],
"Plan Width": 110,
"Total Cost": 211.15
}
}
]






postgresql index






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 2 '16 at 19:36







Jim

















asked Nov 2 '16 at 19:11









JimJim

1112




1112





bumped to the homepage by Community 18 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 18 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.










  • 1





    How many distinct regionid values are in the store table? If the application is listing out 1000 regions and there are only 1500 regions in total, it's almost certainly going to be faster to do a table scan than to use an index on regionid. If there are 150,000 different regions, on the other hand, using an index on regionid would almost certainly be more efficient even if the list grew to a few thousand entries.

    – Justin Cave
    Nov 2 '16 at 19:17






  • 1





    Separate from that, though, long before I got to 1000 elements, I'd seriously think about loading the regions I want to search into a temp table or some other structure so that I wasn't generating thousands of different queries that differed only by the number of bind variables in the IN list.

    – Justin Cave
    Nov 2 '16 at 19:17











  • @JustinCave There is a typical ratio of about 20 stores per region...so it will scale ~20:1. With respect to your second comment, I'll have to do some reading up on that one.

    – Jim
    Nov 2 '16 at 19:21











  • For some context (I am still trying to sort this out, myself a bit!), it appears that a client ORM is generating this query through a permissions check. So, a user has permission (enforced at an application level) to access stores from regions with those IDs, and then they are querying for the accessible stores through this WHERE IN query (generated with an ORM)

    – Jim
    Nov 2 '16 at 19:22











  • That's not the ratio to be concerned with. It doesn't matter whether there are 1, 10, 100, or 1000 stores per region. The important question is what fraction of the rows in the table the query is returning. If there are tens or hundreds of thousands of regions, then an IN list with 1000 regions is only going to return a few percent of the rows in the table and an index would make sense. If the query is going to return most of the rows in the table because the IN list is so long, then an index won't be helpful.

    – Justin Cave
    Nov 2 '16 at 19:25














  • 1





    How many distinct regionid values are in the store table? If the application is listing out 1000 regions and there are only 1500 regions in total, it's almost certainly going to be faster to do a table scan than to use an index on regionid. If there are 150,000 different regions, on the other hand, using an index on regionid would almost certainly be more efficient even if the list grew to a few thousand entries.

    – Justin Cave
    Nov 2 '16 at 19:17






  • 1





    Separate from that, though, long before I got to 1000 elements, I'd seriously think about loading the regions I want to search into a temp table or some other structure so that I wasn't generating thousands of different queries that differed only by the number of bind variables in the IN list.

    – Justin Cave
    Nov 2 '16 at 19:17











  • @JustinCave There is a typical ratio of about 20 stores per region...so it will scale ~20:1. With respect to your second comment, I'll have to do some reading up on that one.

    – Jim
    Nov 2 '16 at 19:21











  • For some context (I am still trying to sort this out, myself a bit!), it appears that a client ORM is generating this query through a permissions check. So, a user has permission (enforced at an application level) to access stores from regions with those IDs, and then they are querying for the accessible stores through this WHERE IN query (generated with an ORM)

    – Jim
    Nov 2 '16 at 19:22











  • That's not the ratio to be concerned with. It doesn't matter whether there are 1, 10, 100, or 1000 stores per region. The important question is what fraction of the rows in the table the query is returning. If there are tens or hundreds of thousands of regions, then an IN list with 1000 regions is only going to return a few percent of the rows in the table and an index would make sense. If the query is going to return most of the rows in the table because the IN list is so long, then an index won't be helpful.

    – Justin Cave
    Nov 2 '16 at 19:25








1




1





How many distinct regionid values are in the store table? If the application is listing out 1000 regions and there are only 1500 regions in total, it's almost certainly going to be faster to do a table scan than to use an index on regionid. If there are 150,000 different regions, on the other hand, using an index on regionid would almost certainly be more efficient even if the list grew to a few thousand entries.

– Justin Cave
Nov 2 '16 at 19:17





How many distinct regionid values are in the store table? If the application is listing out 1000 regions and there are only 1500 regions in total, it's almost certainly going to be faster to do a table scan than to use an index on regionid. If there are 150,000 different regions, on the other hand, using an index on regionid would almost certainly be more efficient even if the list grew to a few thousand entries.

– Justin Cave
Nov 2 '16 at 19:17




1




1





Separate from that, though, long before I got to 1000 elements, I'd seriously think about loading the regions I want to search into a temp table or some other structure so that I wasn't generating thousands of different queries that differed only by the number of bind variables in the IN list.

– Justin Cave
Nov 2 '16 at 19:17





Separate from that, though, long before I got to 1000 elements, I'd seriously think about loading the regions I want to search into a temp table or some other structure so that I wasn't generating thousands of different queries that differed only by the number of bind variables in the IN list.

– Justin Cave
Nov 2 '16 at 19:17













@JustinCave There is a typical ratio of about 20 stores per region...so it will scale ~20:1. With respect to your second comment, I'll have to do some reading up on that one.

– Jim
Nov 2 '16 at 19:21





@JustinCave There is a typical ratio of about 20 stores per region...so it will scale ~20:1. With respect to your second comment, I'll have to do some reading up on that one.

– Jim
Nov 2 '16 at 19:21













For some context (I am still trying to sort this out, myself a bit!), it appears that a client ORM is generating this query through a permissions check. So, a user has permission (enforced at an application level) to access stores from regions with those IDs, and then they are querying for the accessible stores through this WHERE IN query (generated with an ORM)

– Jim
Nov 2 '16 at 19:22





For some context (I am still trying to sort this out, myself a bit!), it appears that a client ORM is generating this query through a permissions check. So, a user has permission (enforced at an application level) to access stores from regions with those IDs, and then they are querying for the accessible stores through this WHERE IN query (generated with an ORM)

– Jim
Nov 2 '16 at 19:22













That's not the ratio to be concerned with. It doesn't matter whether there are 1, 10, 100, or 1000 stores per region. The important question is what fraction of the rows in the table the query is returning. If there are tens or hundreds of thousands of regions, then an IN list with 1000 regions is only going to return a few percent of the rows in the table and an index would make sense. If the query is going to return most of the rows in the table because the IN list is so long, then an index won't be helpful.

– Justin Cave
Nov 2 '16 at 19:25





That's not the ratio to be concerned with. It doesn't matter whether there are 1, 10, 100, or 1000 stores per region. The important question is what fraction of the rows in the table the query is returning. If there are tens or hundreds of thousands of regions, then an IN list with 1000 regions is only going to return a few percent of the rows in the table and an index would make sense. If the query is going to return most of the rows in the table because the IN list is so long, then an index won't be helpful.

– Justin Cave
Nov 2 '16 at 19:25










1 Answer
1






active

oldest

votes


















0














Since Postgres 9.2 index-only scan is supported so, in answer to your question whether an index can help in this situation, I think it definitely might.



Instead of just indexing the regionid field, create an index on (regionid, id, name, status) and the query can scan the index instead of the table. Assuming that the table has many more fields than just those four, then this operation should be much quicker.



Don't forget to test whether this impacts on the UPDATE, DELETE and INSERT operations.



More info at Use The Index Luke






share|improve this answer


























  • @Jim this was an interesting question and I'm keen to hear back whether my proposed solution was any help. Could you share your findings?

    – mendosi
    Nov 10 '16 at 0:26











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f154081%2fcan-indices-save-a-giant-where-foo-in-query-with-thousands-of-entries-in%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














Since Postgres 9.2 index-only scan is supported so, in answer to your question whether an index can help in this situation, I think it definitely might.



Instead of just indexing the regionid field, create an index on (regionid, id, name, status) and the query can scan the index instead of the table. Assuming that the table has many more fields than just those four, then this operation should be much quicker.



Don't forget to test whether this impacts on the UPDATE, DELETE and INSERT operations.



More info at Use The Index Luke






share|improve this answer


























  • @Jim this was an interesting question and I'm keen to hear back whether my proposed solution was any help. Could you share your findings?

    – mendosi
    Nov 10 '16 at 0:26
















0














Since Postgres 9.2 index-only scan is supported so, in answer to your question whether an index can help in this situation, I think it definitely might.



Instead of just indexing the regionid field, create an index on (regionid, id, name, status) and the query can scan the index instead of the table. Assuming that the table has many more fields than just those four, then this operation should be much quicker.



Don't forget to test whether this impacts on the UPDATE, DELETE and INSERT operations.



More info at Use The Index Luke






share|improve this answer


























  • @Jim this was an interesting question and I'm keen to hear back whether my proposed solution was any help. Could you share your findings?

    – mendosi
    Nov 10 '16 at 0:26














0












0








0







Since Postgres 9.2 index-only scan is supported so, in answer to your question whether an index can help in this situation, I think it definitely might.



Instead of just indexing the regionid field, create an index on (regionid, id, name, status) and the query can scan the index instead of the table. Assuming that the table has many more fields than just those four, then this operation should be much quicker.



Don't forget to test whether this impacts on the UPDATE, DELETE and INSERT operations.



More info at Use The Index Luke






share|improve this answer















Since Postgres 9.2 index-only scan is supported so, in answer to your question whether an index can help in this situation, I think it definitely might.



Instead of just indexing the regionid field, create an index on (regionid, id, name, status) and the query can scan the index instead of the table. Assuming that the table has many more fields than just those four, then this operation should be much quicker.



Don't forget to test whether this impacts on the UPDATE, DELETE and INSERT operations.



More info at Use The Index Luke







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 3 '16 at 9:38









dezso

22.2k116096




22.2k116096










answered Nov 3 '16 at 3:23









mendosimendosi

1,972520




1,972520













  • @Jim this was an interesting question and I'm keen to hear back whether my proposed solution was any help. Could you share your findings?

    – mendosi
    Nov 10 '16 at 0:26



















  • @Jim this was an interesting question and I'm keen to hear back whether my proposed solution was any help. Could you share your findings?

    – mendosi
    Nov 10 '16 at 0:26

















@Jim this was an interesting question and I'm keen to hear back whether my proposed solution was any help. Could you share your findings?

– mendosi
Nov 10 '16 at 0:26





@Jim this was an interesting question and I'm keen to hear back whether my proposed solution was any help. Could you share your findings?

– mendosi
Nov 10 '16 at 0:26


















draft saved

draft discarded




















































Thanks for contributing an answer to Database Administrators Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f154081%2fcan-indices-save-a-giant-where-foo-in-query-with-thousands-of-entries-in%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Szabolcs (Ungheria) Altri progetti | Menu di navigazione48°10′14.56″N 21°29′33.14″E /...

Discografia di Klaus Schulze Indice Album in studio | Album dal vivo | Singoli | Antologie | Colonne...

How to make inet_server_addr() return localhost in spite of ::1/128RETURN NEXT in Postgres FunctionConnect to...