Excel is arguably the most used format for documentation by businesses organizations. One can speculate why that is the case: loosely-structured text format, tendency of people to think of complex data sets in table format, available on almost every desktop. While most of us can think of better ways to capture and manage business data, the fact remains, Excel is here to stay and a lot of information used in IT projects is in that format. This article is about my experience in processing Excel files using Java.
A few years ago (2006), I evaluated two open-source packages for processing data in Excel format: JExcelApi (http://jexcelapi.sourceforge.net) and Apache-POI (http://poi.apache.org/spreadsheet). I was looking for specific capabilities: read/write the cell content, set the color of a row, freeze panes. Also, I needed cell formatting capabilities: wrap text, set the type to "General" or "Numeric." Based on these requirements, and how intuitive was the API, I decided to use JExcelApi. Using this library I was able to embed import/export capabilities in several Java standalone applications that handle large Excel files with tens of sheets and tens of thousands or rows.
I will mention at this point that since 2007 Microsoft has introduced a new format: Excel-2007 or Excel-OOXML. This format is not backwards compatible with the previous format (Excel 97-2003). This creates new problems for the Java developer to support both formats. JExcelApi does not support the new format, Apache-POI does it with similar but different classes.
The JExcelApi has separate classes for reading and writing Excel files. The following code reads data from the second column of the first sheet:
WorkbookSettings ws = new WorkbookSettings();
ws.setLocale(new Locale("en", "EN"));
ws.setEncoding("ISO-8859-1");
Workbook workbook = Workbook.getWorkbook(xlsFile, ws);
Sheet sheet = workbook.getSheet(0);
for (int row = 1; row < sheet.getRows(); row++) {
String val = sheet.getCell(1, row).getContents().trim();
[...]
}
workbook.close();
Creating an Excel file is straightforward. Notice that we are using classes starting with Writable:
WorkbookSettings ws = new WorkbookSettings();
ws.setLocale(new Locale("en", "EN"));
WritableWorkbook workbook = Workbook.createWorkbook(xlsFile, ws);
WritableSheet s = workbook.createSheet("Start", 0);
s.addCell(new Label(0, 0, "AAA"));
workbook.write();
workbook.close();
There is no API to update an existing file, but there is a workaround: create a new file using the original as a "template:"
WorkbookSettings ws = new WorkbookSettings();
ws.setLocale(new Locale("en", "EN"));
ws.setEncoding("ISO-8859-1");
File workFile = File.createTempFile("UpdateExcelFile", ".xls");
Workbook template = Workbook.getWorkbook(original, ws);
WritableWorkbook workbook = Workbook.createWorkbook(workFile, template);
template.close();
WritableSheet sheet = workbook.getSheet(0);
.... process the cells in this sheeet
workbook.write();
workbook.close();
FileUtils.copyFile(workFile, original);
workFile.delete();
Things get more complicated when the files are very large. On reading, instead of getting the entire Excel file in memory, it is better to process one sheet at a time. Rather than using the following code:
Sheet[] sheets = workbook.getSheets();
for (Sheet sheet : sheets) {
[...]
}
use the following:
int numOfSheets = workbook.getNumberOfSheets();
for (int i = 0 ; i < numOfSheets; i++) {
Sheet sheet = workbook.getSheet(i);
[...]
}
Also, when creating large files, writing is much faster if some features are disabled:
WorkbookSettings ws = new WorkbookSettings();
ws.setLocale(new Locale("en", "EN"));
ws.setEncoding(("ISO-8859-1");
ws.setDrawingsDisabled(true);
ws.setNamesDisabled(true);
ws.setMergedCellChecking(false);
ws.setFormulaAdjust(false);
ws.setCellValidationDisabled(true);
WritableWorkbook workbook = Workbook.createWorkbook(xlsFile, ws);
[...]
The information about disabling features cames out of the JExcelApi forum, which I highly recommend for developers: http://tech.groups.yahoo.com/group/JExcelApi.
One useful capability in my programs is to launch Excel after building the file. While most Java developers know about to use Runtime.getRuntime().exec(…), there are subtleties in making it to work smoothly. First, read the article on JavaWorld about how to use Runtime.exec correctly: http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html. Based on it, the code to launch Excel has the following skeleton:
String cmd = new String[] {....}
ProcessBuilder procBuilder = new ProcessBuilder(cmd);
[...]
procBuidler.start();
[...]
int exitCode = procBuilder.waitFor();
But what should be the command to launch? First impulse is to use: excel.exe <xls-file> - the problem is that we don’t know where the Excel binary file is on the file system is. There are some options – check the PATH, use configuration files – but there is a better option: use the command CMD which can launch Excel by using the file association in Windows between the extension (.xls) and the program to handle it (excel.exe). To launch Excel for a specific file, the command is:
CMD /C start [<title>] <xls-file>
The Java code is:
String title == "\"ExcelLauncher\"";
String[] cmd = new String[] { "cmd.exe", "/C", "start",
title, xlsFile.getPath() };
The code above has a few tricks in it: first, although the syntax of CMD specifies that the <title> is optional, it is always set in the Java invocation, otherwise the command fails in the cases when the path to the Excel file contains spaces (e.g. "C:\Documents and Settings\joe.smith\a.xls"). Another detail is that the title must start and end with quotes, otherwise the command fails.
The code snippets shown in this article can easily be used to slice and dice the Excel files that so often capture the business and system information for IT projects. From here it is possible to go in different directions: link the external information with the data stored in the IT systems, mass-update IT systems with information stored in Excel files, reconcile data between external Excel files and system data.
No Comments
No comments yet.