Database & Application design to support Reporting, Analytics and Stats
I am trying to determine the best way to structure my database (MS SQL Server) and web application (Java) so that we can provide lots of reports and aggregated results in close to real time. Think "Google Analytics".
Right now, we have a 12-million-row table that records views, conversions, and some other analytics data. This table is optimized for writes and updates (it has a limited set of indexes), but it is slow when querying a date range for aggregated values.
We want a dashboard that shows stats as close to real time as possible, but reading from that giant table is too slow to do on demand. We also want to show a number of other aggregates and summaries on this web app dashboard.
So, I am trying to figure out how I should set this up at the application and database level.
Which of these ideas sound good? What are the pitfalls? Does anybody have suggestions that would help?
Ideas:
Run a background process, potentially on a dedicated server, that compiles and aggregates the analytics data (from the 12-million-plus-row table) and populates a table with the aggregates. The dashboard would read from this 'cache' table, and the metrics for the current day or the last hour (or however long has elapsed since the stats were last compiled) would be added on.
Use log shipping to ship SQL Server log data to a partner server and do all the reporting on this secondary database, so that the intensive reads do not affect write/update performance (which must stay high to avoid noticeable delays on the site when pages are loaded, etc.).
Run a daily process to trim the rows in our 12-million-plus-row table and move them to an archive table. I have tried doing this, but it slows all of the writes/updates on this table to a crawl. Is a partitioned table the answer here?
Do any of these sound like feasible solutions? Does anybody have any suggestions? Thank you for your help!
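The first idea above (a background job that fills a 'cache' table, with the not-yet-compiled tail added on at read time) can be sketched roughly as follows. This is only an in-memory illustration: `HourlyRollup`, `hour_bucket`, the `(timestamp, kind)` event shape, and the hourly granularity are all assumptions made for the example, and in the real system both loops over `events` would be range queries against the analytics table.

```python
from collections import defaultdict
from datetime import datetime


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


class HourlyRollup:
    """Hypothetical stand-in for the 'cache' table and the job that fills it."""

    def __init__(self):
        # bucket start -> counts; plays the role of the aggregate table
        self.summary = defaultdict(lambda: {"views": 0, "conversions": 0})
        # events with ts < watermark are already folded into summary
        self.watermark = None

    def compile(self, events, now):
        """Background job: fold every complete hour before `now` into summary."""
        cutoff = hour_bucket(now)
        for ts, kind in events:
            already_done = self.watermark is not None and ts < self.watermark
            if not already_done and ts < cutoff:
                self.summary[hour_bucket(ts)][kind] += 1
        self.watermark = cutoff

    def totals_since(self, events, start):
        """Dashboard read: summarized hours plus the raw tail after the watermark.

        Assumes `start` is aligned to an hour boundary.
        """
        totals = {"views": 0, "conversions": 0}
        for bucket, counts in self.summary.items():
            if bucket >= start:
                for kind, n in counts.items():
                    totals[kind] += n
        for ts, kind in events:
            in_tail = self.watermark is None or ts >= self.watermark
            if in_tail and ts >= start:
                totals[kind] += 1
        return totals


# Example: compile at 12:30, then read everything since 10:00.
events = [
    (datetime(2015, 9, 15, 10, 5), "views"),
    (datetime(2015, 9, 15, 10, 40), "conversions"),
    (datetime(2015, 9, 15, 11, 10), "views"),
    (datetime(2015, 9, 15, 12, 3), "views"),
]
rollup = HourlyRollup()
rollup.compile(events, now=datetime(2015, 9, 15, 12, 30))
print(rollup.totals_since(events, datetime(2015, 9, 15, 10, 0)))
```

The watermark is what keeps the two sources from double counting: each event is either in a summarized hour or in the live tail, never both. One pitfall of this sketch is that a row written with a timestamp older than the watermark after compilation would be missed.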
sql-server
migrated from serverfault.com Sep 15 '15 at 20:50
This question came from our site for system and network administrators.
asked Sep 15 '15 at 20:13
Brook
1 Answer
I would suggest a server-side caching layer such as memcached to reduce the number of database requests, and saving the calculated data rather than recomputing it.
The next thing that can reduce lag is a queue system with priorities, so that tasks run when time and resources are available. Be aware that this will be a "killer" for real-time results.
Another way to search faster is to use a search engine such as Elasticsearch or Solr, which can handle big data better.
If you have a fixed set of values calculated per week/day/hour, you can "cache" them for past periods, so that you only have to calculate live for the current week/day/hour. Archive the old records in another table to keep the history and still be able to deliver single records.
It is also important to know when the server has free resources, so that hard tasks queued during working hours can run in that period.
The last idea is to create a second database that holds the real 12M+ rows and does the calculations, plus a task that imports the summaries into your real application database, to reduce write-lock lag.
I hope this helps a bit. For this much data, which I think will keep growing, MySQL is not the best solution without very custom optimization. There are much faster and better data management and storage solutions, such as Bigtable, Elasticsearch/Solr, cache-based systems, and so on.
Cache and queue are the keywords, I think.
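As a rough illustration of caching past periods while computing only the current one live, here is a minimal sketch. The `DailyStatsCache` class and the `compute` callback are hypothetical names introduced for the example; the plain dict stands in for memcached or a similar store, and `compute` stands in for whatever expensive aggregate query produces one day's stats.

```python
from datetime import date


class DailyStatsCache:
    """Cache per-day aggregates; only closed (past) days are ever cached."""

    def __init__(self, compute):
        # compute(day) is assumed to run the expensive aggregation for one day
        self._compute = compute
        # stand-in for memcached or another cache store
        self._cache = {}

    def get(self, day, today):
        if day >= today:
            # The current day is still receiving writes, so always recompute it.
            return self._compute(day)
        if day not in self._cache:
            # A past day no longer changes: compute once, then serve from cache.
            self._cache[day] = self._compute(day)
        return self._cache[day]


# Example with a fake aggregation that just records which days were computed.
computed_days = []


def fake_compute(day):
    computed_days.append(day)
    return {"day": day.isoformat(), "views": 0}


cache = DailyStatsCache(fake_compute)
today = date(2015, 9, 15)
cache.get(date(2015, 9, 14), today)  # computed and cached
cache.get(date(2015, 9, 14), today)  # served from the cache
cache.get(today, today)              # always computed live
```

The design choice here is that staleness is impossible for past days and never an issue for the current day, at the cost of one expensive query per dashboard read for the current period.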
answered Sep 15 '15 at 21:11
Gummibeer