Best index for column where values are mostly the same


We have an integer column that currently consists only of 0 or 1 values. This column has now been used by a developer to store a unique 32-bit identifier on some occasions, and we need to be able to efficiently pull out rows containing any one of these identifiers.



Given that the value will be 0 or 1, say, 99% of the time (I don't have figures yet), how might it best be indexed to query against the minority case? Am I even right in thinking the volume of common values will be an issue?



   Column    |  Type   | Modifiers
-------------+---------+-----------
 event_value | integer | not null


There are currently no indexes on this column. And I don't envisage the need to regularly select just the 0 or 1 values.



The table is of a reasonable size, currently 30 million rows and growing fast.



I appreciate this isn't the best use of the column, but that can't change in the short term.










  • Maybe a filtered index? create index on the_table (...) where event_value > 1;

    – a_horse_with_no_name
    2 days ago

  • What @a_horse_with_no_name suggests. But you should then use modified filters: where (event_value > 1) and event_value = @XYZ, so that the filtered index is used.

    – ypercubeᵀᴹ
    2 days ago

  • @a_horse_with_no_name Ah, ok. I didn't even realise you could have a where clause attached to the index. Thanks.

    – whoasked
    2 days ago

  • Yes. The benefit compared to a common index is that it has a much smaller size (1% in your case), as it will store only the rows with values that differ from 0 and 1. @a_horse_with_no_name, add an answer?

    – ypercubeᵀᴹ
    2 days ago

  • @ypercubeᵀᴹ You should not even need the modified filters. I can construct situations in which you do, but they are unlikely to occur in practice in this context.

    – jjanes
    2 days ago


















postgresql index postgresql-9.6






edited 2 days ago by MDCCL

asked 2 days ago by whoasked








1 Answer
First off, as you said yourself, this is not the best use of the column. There should be a separate boolean column and an integer column for your "32-bit identifiers". If the latter is NULL 99% of the time, that is no problem. NULL storage is very cheap.
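For whenever the schema can change, the split suggested here might look roughly like the following. This is only a sketch under assumptions: the column names event_flag and event_id are made up for illustration, the existing 0/1 values are assumed to carry a yes/no meaning, and a single UPDATE over 30 million rows would likely need to be batched in practice.

```sql
-- Hypothetical column names; adjust to the real schema.
ALTER TABLE tbl
  ADD COLUMN event_flag boolean,
  ADD COLUMN event_id   integer;  -- stays NULL for rows without an identifier

-- Move the overloaded values into the new columns.
UPDATE tbl
SET    event_flag = CASE WHEN event_value IN (0, 1) THEN event_value = 1 END
     , event_id   = CASE WHEN event_value NOT IN (0, 1) THEN event_value END;

-- Index only the rows that actually carry an identifier.
CREATE INDEX tbl_event_id_idx ON tbl (event_id)
WHERE  event_id IS NOT NULL;
```

With that layout, a lookup such as WHERE event_id = 123456 can use the small partial index, since an equality match already implies the value is not NULL.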



Either way, you should definitely use a partial index. (That's the proper term as used in the manual.) Excluding 99% of the rows from the index makes it massively smaller, which matters for performance with millions of rows.



While the rare values are "32-bit identifiers", it may be incorrect to assume those are all > 1. Postgres uses signed integers, so 32-bit identifiers could also be negative numbers. (Can we even rule out 0 or 1 as one of those identifiers?) If there can be negative values, too:



CREATE INDEX tbl_event_value_part_idx ON tbl (event_value)
WHERE event_value > 1 OR event_value < 0; -- or similar
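To check how much smaller the partial index actually is, its size can be compared with that of a full index on the same column. A minimal sketch, assuming the index above has been created; tbl_event_value_full_idx is a throwaway name used only for the comparison, ideally on a test copy of the table:

```sql
-- Size of the partial index created above.
SELECT pg_size_pretty(pg_relation_size('tbl_event_value_part_idx')) AS partial_idx_size;

-- For comparison only: a hypothetical full index on the same column.
CREATE INDEX tbl_event_value_full_idx ON tbl (event_value);
SELECT pg_size_pretty(pg_relation_size('tbl_event_value_full_idx')) AS full_idx_size;
DROP INDEX tbl_event_value_full_idx;
```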


event_value does not have to be an index column, regardless of its use in the index's WHERE clause. That depends entirely on the kinds of queries to expect. In any case, the safe bet is to repeat the same WHERE conditions literally in any query that is supposed to use the index, even if that is logically redundant. Postgres can draw very basic logical conclusions to determine applicable indexes, but it is no AI and does not try to be (that would get too expensive quickly). Like:



SELECT * FROM tbl WHERE event_value > 1 OR event_value < 0
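A lookup for one specific identifier, which is what the question asks for, could combine the repeated index predicate with an equality filter. The value 123456 below is a made-up identifier; prefixing the query with EXPLAIN is one way to confirm that the planner picks the partial index:

```sql
EXPLAIN
SELECT *
FROM   tbl
WHERE  (event_value > 1 OR event_value < 0)  -- repeats the index predicate verbatim
AND    event_value = 123456;                 -- hypothetical identifier
```

With a literal constant like this, the planner can often work out on its own that the equality implies the index predicate, as one of the comments notes. The manual points out that this matching happens at planning time, so a parameterized clause in a prepared statement does not by itself qualify for a partial index; repeating the predicate covers that case.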


Related:




  • Postgres partial index on IS NULL not working

  • PostgreSQL partial index unused when created on a table with existing data

  • How to index a query with `WHERE field IS NULL`?





answered 4 mins ago by Erwin Brandstetter





























