Python to write multiple dataframes and highlight rows inside an excel file
I am trying to write multiple dataframes into an Excel file, one after another, with the same logic. Nothing changes from one dataframe to the next except the number of columns or the number of records; the functions applied are the same.
For example,

    writer = pd.ExcelWriter(OutputName)
    Emp_ID_df.to_excel(writer,'Sheet1',index = False)
    Visa_df.to_excel(writer,'Sheet2',index = False)
    custom_df_1.to_excel(writer,'Sheet3',index = False)
    writer.save()

Once the file is written, I want to highlight a row if any of its Boolean columns contains False. I am not aware of any library in Anaconda that can highlight cells in Excel, so I am going with the native one.

    from datetime import datetime

    import numpy as np
    from win32com.client import Dispatch  # to work with excel files

    # colnum_num_string is a helper not shown in the post; it presumably
    # maps a column number to its Excel column letter
    Pre_Out_df_ncol = Emp_ID_df.shape[1]
    Pre_Out_df_nrow = Emp_ID_df.shape[0]
    RequiredCol_let = colnum_num_string(Pre_Out_df_ncol)
    arr = (Emp_ID_df.select_dtypes(include=[bool])).eq(False).any(axis=1).values
    ReqRows = np.arange(1, len(Emp_ID_df)+ 1)[arr].tolist()

    Pre_Out_df_ncol_2 = Visa_df.shape[1]
    Pre_Out_df_nrow_2 = Visa_df.shape[0]
    RequiredCol_let_2 = colnum_num_string(Pre_Out_df_ncol_2)
    arr_2 = (Visa_df.select_dtypes(include=[bool])).eq(False).any(axis=1).values
    ReqRows_2 = np.arange(1, len(Visa_df)+ 1)[arr_2].tolist()

    Pre_Out_df_ncol_3 = custom_df_1.shape[1]
    Pre_Out_df_nrow_3 = custom_df_1.shape[0]
    RequiredCol_let_3 = colnum_num_string(Pre_Out_df_ncol_3)
    arr_3 = (custom_df_1.select_dtypes(include=[bool])).eq(False).any(axis=1).values
    ReqRows_3 = np.arange(1, len(custom_df_1)+ 1)[arr_3].tolist()

    xlApp = Dispatch("Excel.Application")
    xlwb1 = xlApp.Workbooks.Open(OutputName)
    xlApp.visible = False
    print ("\n...Highlighting the Output File at " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))

    for i in range(len(ReqRows)):
        j = ReqRows[i] + 1
        xlwb1.sheets('Sheet1').Range('A' + str(j) + ":" + RequiredCol_let + str(j)).Interior.ColorIndex = 6
    xlwb1.sheets('Sheet1').Columns.AutoFit()

    for i in range(len(ReqRows_2)):
        j = ReqRows_2[i] + 1
        xlwb1.sheets('Sheet2').Range('A' + str(j) + ":" + RequiredCol_let_2 + str(j)).Interior.ColorIndex = 6
    xlwb1.sheets('Sheet2').Columns.AutoFit()

    for i in range(len(ReqRows_3)):
        j = ReqRows_3[i] + 1
        xlwb1.sheets('Sheet3').Range('A' + str(j) + ":" + RequiredCol_let_3 + str(j)).Interior.ColorIndex = 6
    xlwb1.sheets('Sheet3').Columns.AutoFit()

Finally, I rename the sheets:

    xlwb1.Sheets("Sheet1").Name = "XXXXA"
    xlwb1.Sheets("Sheet2").Name = "XXXXASDAD"
    xlwb1.Sheets("Sheet3").Name = "SADAD"
    xlwb1.Save()

There are a few problems here:
1) The number of dataframes keeps growing, which means I am writing the same code again and again.
2) The highlighting works, but it is too slow. Sometimes 90% of the rows need to be highlighted. There are 1 million rows, and highlighting them one after another takes 35 minutes.
Kindly help me with this.
python python-3.x pandas
edited 2 hours ago by Mathias Ettinger
asked 2 hours ago by Sid29 (new contributor)
Did you have a look at xlsxwriter.readthedocs.io/example_pandas_conditional.html and xlsxwriter.readthedocs.io/working_with_conditional_formats.html?
– Graipher, 2 hours ago
2 Answers
First, looking at your code, you should notice that you are repeating yourself three times. This goes against the principle Don't Repeat Yourself (DRY).
The only real differences between processing your three sheets are their names and the underlying dataframes, so you could split this into two functions:

    import numpy as np
    import pandas as pd
    from win32com.client import Dispatch


    def highlight_false(df):
        """Return the 1-based positions of rows with False in any boolean column."""
        arr = df.select_dtypes(include=[bool]).eq(False).any(axis=1).values
        return np.arange(1, len(df) + 1)[arr].tolist()


    def color_rows(sheet, rows, col):
        for row in rows:
            # +1 skips the header row written by to_excel
            cells = f"A{row+1}:{col}{row+1}"
            sheet.Range(cells).Interior.ColorIndex = 6  # yellow in the default palette
        sheet.Columns.AutoFit()


    if __name__ == "__main__":
        Emp_ID_df = ...
        writer = pd.ExcelWriter(OutputName)
        Emp_ID_df.to_excel(writer, sheet_name='Sheet1', index=False)
        # write the file before Excel opens it (writer.save() on pandas versions before 2.0)
        writer.close()

        excel_app = Dispatch("Excel.Application")
        workbook = excel_app.Workbooks.Open(OutputName)
        excel_app.visible = False
        sheet_names = ["Sheet1"]
        dfs = [Emp_ID_df]
        for sheet_name, df in zip(sheet_names, dfs):
            sheet = workbook.Sheets(sheet_name)
            rows = highlight_false(df)
            col = colnum_num_string(df.shape[1])
            color_rows(sheet, rows, col)
        workbook.Save()

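The snippets in this thread call `colnum_num_string`, which is never shown. As a minimal sketch only, assuming the helper takes a 1-based column number and returns the Excel column letters, it could look like the following. The name `column_letter` is hypothetical and is not the asker's implementation.

```python
def column_letter(number: int) -> str:
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    if number < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while number > 0:
        # Excel columns are a bijective base-26 system, hence the "- 1".
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


# Last column letter of a sheet holding a dataframe with 28 columns.
last_col = column_letter(28)
```

If xlsxwriter is installed anyway, its `xlsxwriter.utility.xl_col_to_name` function does the same job, but note that it counts columns from zero.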
However, there is an even easier method using xlsxwriter:

    import pandas as pd

    Emp_ID_df = ...
    writer = pd.ExcelWriter(OutputName, engine='xlsxwriter')
    Emp_ID_df.to_excel(writer, sheet_name="Sheet1", index=False)
    workbook = writer.book
    format1 = workbook.add_format({'bg_color': '#FFC7CE',
                                   'font_color': '#9C0006'})
    dfs = [Emp_ID_df]
    for df, sheet in zip(dfs, writer.sheets.values()):
        nrow, ncol = df.shape
        # letter of the last data column, as in the win32com version
        col_letter = colnum_num_string(ncol)
        cells = f"A1:{col_letter}{nrow+1}"
        sheet.conditional_format(cells, {"type": "cell",
                                         "criteria": "==",
                                         "value": 0,
                                         "format": format1})
    writer.close()  # writer.save() on pandas versions before 2.0

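The conditional format above tests each cell on its own. If whole rows should be highlighted, as in the question, one possible variation is a formula-based rule that looks only at the boolean columns. The sketch below just builds the formula text; the function name `false_row_rule` and the column letters are assumptions made for illustration, and the commented call shows how such a rule might be handed to xlsxwriter's `conditional_format` with `"type": "formula"`.

```python
def false_row_rule(bool_columns, first_data_row=2):
    """Build an Excel formula that is TRUE when any listed column is FALSE.

    bool_columns: letters of the boolean columns, e.g. ["C", "E"].
    first_data_row: worksheet row of the first data row (2 when row 1 is the header).
    """
    if not bool_columns:
        raise ValueError("at least one boolean column is required")
    # "$C2" pins the column but lets the row move with each formatted cell.
    tests = [f"${col}{first_data_row}=FALSE" for col in bool_columns]
    return "=OR(" + ",".join(tests) + ")"


# Hypothetical layout: booleans in columns C and E, data starting in row 2.
rule = false_row_rule(["C", "E"])

# With a real worksheet the rule could be applied once to the whole data range:
# sheet.conditional_format(f"A2:{col_letter}{nrow+1}",
#                          {"type": "formula", "criteria": rule, "format": format1})
```

Because the rule is stored once per sheet rather than once per row, the cost of writing it should not grow with the number of highlighted rows.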
You have to make sure that the sheets do not get out of sync with the dataframes, or simply keep track of the sheet name each dataframe is saved to.
In addition, I followed Python's official style guide, PEP8, which recommends lower_case names for functions and variables, and I renamed your variables so they are a lot clearer.
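One way to keep sheets and dataframes in sync is to pair them by sheet name instead of relying on the order of two separate sequences. The sketch below assumes the dataframes are held in a dict keyed by sheet name and that the worksheets come from a dict such as `writer.sheets`; the helper name `pair_by_name` is hypothetical, and plain strings stand in for real dataframes and worksheets.

```python
def pair_by_name(frames, sheets):
    """Pair each dataframe with the worksheet of the same name.

    frames: dict mapping sheet name -> dataframe (any object works here).
    sheets: dict mapping sheet name -> worksheet, such as writer.sheets.
    """
    missing = [name for name in frames if name not in sheets]
    if missing:
        raise KeyError(f"no worksheet for: {', '.join(missing)}")
    return [(name, frames[name], sheets[name]) for name in frames]


# Stand-in strings replace real dataframes and worksheets in this illustration.
frames = {"XXXXA": "Emp_ID_df", "SADAD": "custom_df_1"}
sheets = {"SADAD": "worksheet 3", "XXXXA": "worksheet 1"}
pairs = pair_by_name(frames, sheets)
```

The same dict can drive the writing step as well, by looping over `frames.items()` and passing each key as `sheet_name` to `to_excel`, so every sheet name is written down in exactly one place.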
answered 1 hour ago by Graipher (edited 1 hour ago)
> 1) My number of dataframe increases and which means I am writing up the same code again and again

You are repeating a lot of code that does essentially the same thing, with only a couple of variables changing. For this, I think you should wrap it in a function.
I've taken your code and put it inside a function:

    from datetime import datetime

    import numpy as np
    from win32com.client import Dispatch


    def highlight_false_cells(sheetName, dataFrame, OutputName):
        # sheetName must be the name the sheet currently has in the workbook
        Pre_Out_df_ncol = dataFrame.shape[1]
        Pre_Out_df_nrow = dataFrame.shape[0]  # Is this required? It doesn't look to be used.
        RequiredCol_let = colnum_num_string(Pre_Out_df_ncol)
        arr = (dataFrame.select_dtypes(include=[bool])).eq(False).any(axis=1).values
        ReqRows = np.arange(1, len(dataFrame) + 1)[arr].tolist()

        xlApp = Dispatch("Excel.Application")
        xlwb1 = xlApp.Workbooks.Open(OutputName)
        xlApp.visible = False
        print("\n...Highlighting the Output File at " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))

        for i in range(len(ReqRows)):
            j = ReqRows[i] + 1
            xlwb1.sheets(sheetName).Range('A' + str(j) + ":" + RequiredCol_let + str(j)).Interior.ColorIndex = 6
        xlwb1.sheets(sheetName).Columns.AutoFit()
        xlwb1.Save()

To call this for your dataframes:

    highlight_false_cells("XXXXA", Emp_ID_df, OutputName)
    highlight_false_cells("XXXXASDAD", Visa_df, OutputName)
    highlight_false_cells("SADAD", custom_df_1, OutputName)

I'm not really familiar with the packages you're using, so there may be a mistake in the logic. Hopefully, though, it gives you a good example of how to take your work and put it into a function.
I'd also recommend looking into a programming principle called "DRY", which stands for "Don't Repeat Yourself". If you learn to spot areas like this, where you have a lot of repeated lines, I believe it will make things easier.
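The function above still colors one row per call into Excel, which is likely the slow part described in the question. A possible way to cut the number of calls, sketched here under the assumption that the row numbers arrive sorted in ascending order, is to merge consecutive rows into runs and build one range address per run. The helper names `consecutive_runs` and `run_addresses` are hypothetical.

```python
def consecutive_runs(rows):
    """Collapse sorted row numbers into (first, last) runs of consecutive rows."""
    runs = []
    for row in rows:
        if runs and row == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], row)  # extend the current run
        else:
            runs.append((row, row))  # start a new run
    return runs


def run_addresses(rows, last_col, header_rows=1):
    """Turn 1-based data row numbers into Excel range addresses, one per run."""
    return [
        f"A{first + header_rows}:{last_col}{last + header_rows}"
        for first, last in consecutive_runs(rows)
    ]


# Rows 1-3, 7 and 9-10 need highlighting; the sheet's last column is D.
addresses = run_addresses([1, 2, 3, 7, 9, 10], "D")

# Each address would then need a single assignment, for example:
# for address in addresses:
#     sheet.Range(address).Interior.ColorIndex = 6
```

When 90% of the rows are highlighted, most of them fall into long runs, so the number of range assignments can drop well below the number of rows. How much time that saves in practice has not been measured here.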
$begingroup$
1) My number of dataframe increases and which means I am writing up
the same code again and again
You're repeating a lot of code which is essentially doing the same thing with just a couple of variables.
I think for this you should wrap it in a function.
I've taken your code and put it inside a function:
def highlight_false_cells(sheetName, dataFrame, OutputName):
Pre_Out_df_ncol = dataFrame.shape[1]
Pre_Out_df_nrow = dataFrame.shape[0] # Is this required? It doesn't look to be used.
RequiredCol_let = colnum_num_string(Pre_Out_df_ncol)
arr = (dataFrame.select_dtypes(include=[bool])).eq(False).any(axis=1).values
ReqRows = np.arange(1, len(dataFrame) + 1)[arr].tolist()
xlApp = Dispatch("Excel.Application")
xlwb1 = xlApp.Workbooks.Open(OutputName)
xlApp.visible = False
print("n...Highlighting the Output File at " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
for i in range(len(ReqRows)):
j = ReqRows[i] + 1
xlwb1.sheets(sheetName).Range('A' + str(j) + ":" + RequiredCol_let + str(j)).Interior.ColorIndex = 6
xlwb1.sheets(sheetName).Columns.AutoFit()
xlwb1.Save()
To call this for your dataframes:
highlight_false_cells("XXXXA", Emp_ID_df, OutputName)
highlight_false_cells("XXXXASDAD", Visa_df, OutputName)
highlight_false_cells("SADAD", custom_df_1, OutputName)
I'm not really familiar with the packages you're using, so there may be a mistake in logic within there. However hopefully it gives you a good example how to take your work and put it into a function.
I'd also recommend looking into a programming principle called "DRY", which stands for "Don't Repeat Yourself". If you can learn to spot areas like this where you have a lot of repeated lines then it will make stuff easier I believe.
New contributor
$endgroup$
1) My number of dataframe increases and which means I am writing up
the same code again and again
You're repeating a lot of code which is essentially doing the same thing with just a couple of variables.
I think for this you should wrap it in a function.
I've taken your code and put it inside a function:
def highlight_false_cells(sheetName, dataFrame, OutputName):
Pre_Out_df_ncol = dataFrame.shape[1]
Pre_Out_df_nrow = dataFrame.shape[0] # Is this required? It doesn't look to be used.
RequiredCol_let = colnum_num_string(Pre_Out_df_ncol)
arr = (dataFrame.select_dtypes(include=[bool])).eq(False).any(axis=1).values
ReqRows = np.arange(1, len(dataFrame) + 1)[arr].tolist()
xlApp = Dispatch("Excel.Application")
xlwb1 = xlApp.Workbooks.Open(OutputName)
xlApp.visible = False
print("n...Highlighting the Output File at " + datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
for i in range(len(ReqRows)):
j = ReqRows[i] + 1
xlwb1.sheets(sheetName).Range('A' + str(j) + ":" + RequiredCol_let + str(j)).Interior.ColorIndex = 6
xlwb1.sheets(sheetName).Columns.AutoFit()
xlwb1.Save()
To call this for your dataframes:
highlight_false_cells("XXXXA", Emp_ID_df, OutputName)
highlight_false_cells("XXXXASDAD", Visa_df, OutputName)
highlight_false_cells("SADAD", custom_df_1, OutputName)
I'm not really familiar with the packages you're using, so there may be a mistake in logic within there. However hopefully it gives you a good example how to take your work and put it into a function.
I'd also recommend looking into a programming principle called "DRY", which stands for "Don't Repeat Yourself". If you can learn to spot areas like this where you have a lot of repeated lines then it will make stuff easier I believe.
New contributor
New contributor
answered 1 hour ago
cphilip (new contributor)
$begingroup$
Did you have a look at xlsxwriter.readthedocs.io/example_pandas_conditional.html and xlsxwriter.readthedocs.io/working_with_conditional_formats.html?
$endgroup$
– Graipher
2 hours ago