Database & Application design to support Reporting, Analytics and Stats


I am trying to determine the best way to structure my database (MS SQL Server) and web application (Java) so that we can provide lots of reports and aggregated results in close to real time. Think "Google Analytics".



Right now, we have a 12 million row table that records views and conversions and some other analytic data. This table is optimized for writes & updates (it has a limited set of indexes), but it is slow when querying for a date range and getting aggregated values.



We want a dashboard showing stats as close to real time as possible, but reading from that giant table on demand is too slow. Plus, we want to show a bunch of other aggregated and summary information on this web-app dashboard.



So, I am trying to figure out how I should set this up at the application and database level.



Which of these ideas sound good? What are the pitfalls? Does anybody have suggestions that would help?



Ideas:




  1. Run a background process, potentially on a dedicated server, that compiles and aggregates the analytics data (from the 12-million-plus-row table) and populates a summary table. The dashboard would then read mostly from this 'cache' table, with the current day's or hour's metrics (or however long has elapsed since stats were last compiled) added on top.


  2. Use log shipping to ship SQL Server log data to a partner server and do all the reporting against this secondary database, so that the intensive reads do not affect write/update performance (which must stay high to avoid noticeable delays when pages load on the site).


  3. Run a daily process to trim rows from the 12-million-plus-row table and move them to an archive table. I have tried doing this, but it slows all of the writes/updates on the table to a crawl. Is a partitioned table the answer here?
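Idea 1 can be sketched in T-SQL. This is a minimal sketch under assumed names — `PageViews`, `EventTime`, `IsConversion`, and `ViewStatsHourly` are all hypothetical, not from the actual schema: a pre-aggregated hourly table that a background job refreshes incrementally, re-computing only the most recent (possibly partial) bucket rather than scanning the whole table.

```sql
-- Hypothetical summary ("cache") table for Idea 1: one row per hour.
CREATE TABLE dbo.ViewStatsHourly (
    StatHour    datetime2(0) NOT NULL,
    Views       int          NOT NULL,
    Conversions int          NOT NULL,
    CONSTRAINT PK_ViewStatsHourly PRIMARY KEY CLUSTERED (StatHour)
);

-- Incremental refresh, run by the background job: drop the last (possibly
-- partial) hourly bucket, then re-aggregate from that point forward only.
DECLARE @from datetime2(0) =
    (SELECT ISNULL(MAX(StatHour), '19000101') FROM dbo.ViewStatsHourly);

DELETE FROM dbo.ViewStatsHourly WHERE StatHour = @from;

INSERT INTO dbo.ViewStatsHourly (StatHour, Views, Conversions)
SELECT DATEADD(hour, DATEDIFF(hour, 0, EventTime), 0),
       COUNT(*),
       SUM(CASE WHEN IsConversion = 1 THEN 1 ELSE 0 END)
FROM   dbo.PageViews
WHERE  EventTime >= @from
GROUP BY DATEADD(hour, DATEDIFF(hour, 0, EventTime), 0);
```

The dashboard then reads from the small summary table, and only the current hour ever has to be computed against the live 12M-row table.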



Do any of these sound like feasible solutions? Does anybody have any suggestions? Thank you for your help!
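On whether partitioning answers Idea 3: a sketch, with illustrative names and boundary dates, of how monthly partitions turn archiving into a metadata-only operation instead of a slow row-by-row move that stalls writes.

```sql
-- Monthly partition function and scheme (boundary dates are illustrative).
CREATE PARTITION FUNCTION pfMonthly (datetime2(0))
    AS RANGE RIGHT FOR VALUES ('2015-07-01', '2015-08-01', '2015-09-01');

CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly ALL TO ([PRIMARY]);

-- With the analytics table created ON psMonthly(EventTime), an entire month
-- can be moved to an identically structured archive table as a metadata-only
-- change, e.g.:
--   ALTER TABLE dbo.PageViews
--       SWITCH PARTITION 2 TO dbo.PageViewsArchive PARTITION 2;
-- This avoids the write stalls caused by bulk DELETE/INSERT archiving.
```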










migrated from serverfault.com Sep 15 '15 at 20:50


This question came from our site for system and network administrators.























      sql-server






      asked Sep 15 '15 at 20:13









      Brook

          1 Answer






          Things I would suggest: a server-side caching layer such as memcached (or similar) to reduce database requests, and persisting the calculated data. The next thing that can reduce the lag is a priority queue system that runs tasks when time and resources allow, though this works against true real time. Another way to make searches faster is a search engine such as Elasticsearch or Solr, which handle big data sets better. If the data is calculated per week/day/hour, you can "cache" the results for past periods, so only the current week/day/hour has to be computed live, and archive the old rows in another table to keep history and still be able to serve individual records. One important thing is to know when the server has free resources and to queue the hard tasks into those windows. And the last idea is to keep the full 12M+ rows in a second database, do the calculations there, and run a task that imports the summaries into your real application database to reduce write-lock lag.



          I hope this helps a bit. For this much data, which I think will keep growing, MySQL without very custom optimization is not the best solution; there are much faster data management options, such as Bigtable, Elasticsearch/Solr, cache-based systems, and so on.
          Cache and queue are the keywords, I think.
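The last idea in this answer — computing on a second database and importing only the summaries — might look like this in T-SQL. Database and table names here are made up purely for illustration:

```sql
-- Run on a schedule: copy newly computed daily summaries from the reporting
-- database into the application database, so only aggregates cross over and
-- the heavy scans never touch the write-optimized application tables.
INSERT INTO AppDb.dbo.DashboardStats (StatDate, Views, Conversions)
SELECT s.StatDate, s.Views, s.Conversions
FROM   ReportingDb.dbo.DailyStats AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   AppDb.dbo.DashboardStats AS d
                   WHERE  d.StatDate = s.StatDate);
```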






                answered Sep 15 '15 at 21:11









                Gummibeer
