[Practice project] A TMX/Excel converter and standalone template generator
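It's pretty quiet around here, so I'm posting a small program I wrote a while back for a neighbor, to liven up the study atmosphere a bit, heh.
Translation Memory eXchange (TMX)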
--------------------------------------------------------------
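TMX stands for "Translation Memory eXchange". It is a vendor-neutral, open XML standard whose purpose is to ease the exchange of translation memory (TM) data created by different computer-aided translation (CAT) and localization tools. When tools comply with TMX, translation memory files created by different tools and different localization vendors can be exchanged easily.
The first discussions of a TMX standard date back to June 1997. Attendees of that year's Localization Industry Standards Association (LISA) conference, including localization clients, tool vendors and localization service providers, held a small meeting to discuss the growing incompatibility of translation memory data between localization tools. After the meeting these members formed a special interest group within LISA, OSCAR (Open Standards for Container/Content Allowing Re-use), and the TMX specification is OSCAR's most important result.
TMX was released in 1998 and was one of the earliest XML-based standards. So far about 200,000 users worldwide use it. TMX is applied through certification: a localization tool that passes LISA's TMX conformance tests may carry the TMX logo. To keep their products in line with the industry standard, tool vendors have been releasing TMX-certified tools; tools without certification are hard to sell to localization service companies.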
Download page for the TMX specification:
http://www.lisa.org/standards/tmx/specification.html
Goals of the software:
---------------------------------------------
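The purpose of the program is simple: convert between TMX and Excel so that users can work in whichever environment suits them (with or without a dedicated editor, a grammar checker, and so on), make the most of the strengths of different software, and exchange files with others easily. It can also generate empty Excel and TMX files that contain only the preset values.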
Design approach and analysis:
------------------------------------------------
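The program is custom-built for a single user, so all preset values are hard-coded. Because the functionality is extremely narrow, no graphical interface is needed and only a command-line interface is provided. Because the requirements are limited and only a small part of the specification is used (a small subset of TMX 1.1 features), no extra flexibility is built in.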
Technology choices and implementation approach:
----------------------------------------------------
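This is a small program, so not much deliberation is needed: log4j for easier debugging, jdom for XML handling, and Apache POI for Excel handling.
For convenient command-line use, the program must be able to append the date and time to the original file name, followed by the extension of the new file type.
For convenient mouse operation under Windows, the plan was to provide a context-menu entry through a registry file. This was not implemented because the user did not ask for it; it can be added later.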
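The timestamped target name described above could be sketched roughly as follows. This is only an illustration, not the code used in the program: the class `TimestampedName` and the method `withTimestamp` are made-up names, and it relies on `java.time` (Java 8 or later) instead of the `Calendar` arithmetic in the listing further down.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the original program.
final class TimestampedName {
    // Same layout as the "yyyymmddhhmiss" pattern mentioned in the usage text.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Builds "<source name without extension>.<timestamp><new extension>".
    static String withTimestamp(String sourceName, String newExtension, LocalDateTime now) {
        int dot = sourceName.lastIndexOf('.');
        int separator = Math.max(sourceName.lastIndexOf('/'), sourceName.lastIndexOf('\\'));
        // Only treat the dot as an extension marker when it belongs to the file name itself.
        String base = dot > separator ? sourceName.substring(0, dot) : sourceName;
        return base + "." + STAMP.format(now) + newExtension;
    }
}
```

Passing the clock value in as a parameter keeps the helper easy to test; a caller would normally hand over `LocalDateTime.now()`. Looking for the last dot also avoids assuming that every extension is exactly four characters long.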
Deployment and execution considerations
------------------------------------------------------
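The program is shipped as an executable Java jar, which is easy to copy and run. Any machine with a Java runtime can run it from the command line. Because TMX files are processed, the XML parser also requires tmx11.dtd next to the jar file (strictly speaking, in the current working directory or on the path; keeping it in the same directory is simply more convenient).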
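If carrying tmx11.dtd around is a nuisance, one common workaround is to give the parser an entity resolver that answers the DTD request with empty content. The sketch below uses only the JDK's own DOM parser; `DtdFreeTmxParser` and its methods are hypothetical names, and the program itself goes through jdom's SAXBuilder, which I believe accepts an EntityResolver in a similar way (not verified here). The price is that the file is no longer checked against the DTD and any attribute defaults declared there are not applied.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original program.
final class DtdFreeTmxParser {
    // Parses TMX content without needing tmx11.dtd on disk.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Every external entity request (the DTD included) gets an empty stream,
            // so the parser never tries to open tmx11.dtd.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse TMX content", e);
        }
    }

    // Convenience accessor used to check that parsing worked.
    static String rootName(String xml) {
        return parse(xml).getDocumentElement().getTagName();
    }
}
```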
Command-line usage
TMX2XLS Usage:
Invocation:
java -jar tmx2xls.jar %parameters%
or
java -cp tmx2xls.jar com.greenflute.tmx.TMX2XLS %parameters%
Parameters:
1st param: 2xls|2tmx|newxls|newtmx
2nd param: <source file name>|<full path file name>
when newtmx or newxls was used, this file will be created.
3rd param: optional, <target file name>|<full path file name>,
autogenerated filename: <source file name>.yyyymmddhhmiss.
Reg file:
Adds a context menu for tmx|xls file, only for convenience under Windows.
Warning:
It's a free tool, no guarantee for data corruption or errors.
Program design and analysis
----------------------------------------------------
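To simplify command-line handling, maximize code reuse, and keep functionality separate from control (so that the program can later be given another interface or execution mode, or be embedded in another package), the program is split into two classes.
TMX2XLS acts as the controller. It handles command-line input, prints the usage text, and drives the main execution flow, using a piece of code loosely similar to a factory method to control parsing, generating and saving the different file types. It provides three methods: main, usage, and getUniqName, which generates a unique file name.
TMX is the class that does the actual work; its life cycle is controlled by TMX2XLS. It offers only four core methods in two groups (parseTMX and saveTMX; parseXLS and saveXLS). The core data carriers are a Hashtable holding the file properties and an ArrayList (whose elements are String arrays) holding the actual multilingual data. These two carriers make the conversion work in both directions: whatever the source file is, it is first converted into the format-independent central data carrier, and everything else follows from there. Generating a new file works the same way: the central carrier is handled first, then the rest. This simplifies the program structure and control logic considerably.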
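The "central data carrier" idea can be shown in isolation with a rough sketch. It is not the TMX class itself: `MemoryModel`, `fromRows` and `toRows` are invented names, plain lists of String arrays stand in for the Excel sheet, and generics replace the raw Hashtable/ArrayList. The fallback values "DE" and "LT" are the hard-coded presets from the listing below.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the central data carrier, not the original TMX class.
final class MemoryModel {
    final Map<String, String> properties = new LinkedHashMap<>(); // file-level attributes
    final List<String[]> segments = new ArrayList<>();            // {source, target} pairs

    // Returns the stored property, or the preset when nothing was parsed.
    String property(String key, String fallback) {
        return properties.getOrDefault(key, fallback);
    }

    // "Reader": the first row holds the language labels, the rest are translation pairs.
    static MemoryModel fromRows(List<String[]> rows) {
        MemoryModel model = new MemoryModel();
        for (int i = 0; i < rows.size(); i++) {
            String[] row = rows.get(i);
            if (i == 0) {
                model.properties.put("header.srclang", row[0]);
                model.properties.put("header.targetlang", row[1]);
            } else {
                model.segments.add(new String[] {row[0], row[1]});
            }
        }
        return model;
    }

    // "Writer": works for a parsed model and for a brand-new, empty one alike.
    List<String[]> toRows() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {property("header.srclang", "DE"), property("header.targetlang", "LT")});
        rows.addAll(segments);
        return rows;
    }
}
```

Because every writer only reads the model, `new MemoryModel().toRows()` already yields the "empty template" case without any extra code path.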
TMX2XLS.java
package com.greenflute.tmx;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;
import java.io.File;
import java.io.IOException;
import java.util.Calendar;
import java.util.Properties;
/**
* TMX2XLS v0.1
* <p/>
* convert tmx file to xls, or vice versa
* only work with tmx11.dtd, and only very simple functions were provided
* <p/>
* XLS Format Specification
* 1st sheet:
* Name: body
* Content: body of tmx file
* Columns: 2
* 1st Column: Source language content
* 2nd Column: Target language content
* Rows: 2+
* 1st Row: Label of language name
* 2nd Row and following: translation content,
* every row is a tu,
* every cell is a tuv and its embedded seg
* last row: tmx2xls tmx2xls
* <p/>
* 2nd sheet:
* Name: header
* Content: tmx header infos
* 1st Column: header name
* 2nd Column: header content
* Rows: fixed
* 1: tmx2xls tmx2xls
* 2: xml.version 1.0
* 3: xml.encoding UTF-8
* <?xml version="1.0" encoding="UTF-8"?>
* 4: tmx.doctype
* <!DOCTYPE tmx SYSTEM "tmx11.dtd">
* 5: tmx.version 1.1
* 6: header.creationtool TMX2XLS
* 7: header.creationtoolversion 0.1
* 8: header.segtype sentence
* segtype="sentence"
* 9: header.o-tmf TMX2XLS
* 10: header.adminlang EN-US
* adminlang="EN-US"
* 11: srclang xxx
* it's better to make a reference formula to cell body.A1
* 12: header.datatype plaintext
* datatype="plaintext"
*/
public class TMX2XLS {
/**
* 1st param: 2xls or 2tmx
* 2nd param: original file name
* 3rd param: optional, target file name, when not provided, file name will be: originalname.yyyymmddhhmiss.
*
* @param args
*/
public static void main(String[] args) { //todo popup dialog to show error
//init logger
logger = Logger.getLogger(TMX2XLS.class);
Properties prop = new Properties();
try {
prop.load(TMX2XLS.class.getResourceAsStream("/log4j.properties"));
PropertyConfigurator.configure(prop);
} catch (IOException e) {
//not a fatal error!
e.printStackTrace();
}
prop = null;
int task = NOTIMPLAMENTED;
String srcname = null, targetname = null;
if (args.length >= 1 && args.length <= 3) {
//validate the command parameter (args[0])
task = args[0].equalsIgnoreCase(TOTMX_TEXT) ? TOTMX :
args[0].equalsIgnoreCase(TOXLS_TEXT) ? TOXLS :
args[0].equalsIgnoreCase(NEWTMX_TEXT) ? NEWTMX :
args[0].equalsIgnoreCase(NEWXLS_TEXT) ? NEWXLS : NOTIMPLAMENTED;
//validate file existence
if (task == TOTMX || task == TOXLS) {
//must provide at least source file
if (args.length >= 2) {
//srcname (args[1])
if (!new File(args[1]).exists()) {
task = NOTIMPLAMENTED;
logger.error("Source file must exist!");
} else {
srcname = args[1];
//targetname (args[2], optional)
if (args.length == 3) {
if (new File(args[2]).exists()) {
task = NOTIMPLAMENTED;
logger.error("Target file exists, choose another name!");
} else {
targetname = args[2];
}
} else {
//make a new name: drop the 4-character extension (".tmx"/".xls"), append a timestamp
targetname = args[1].substring(0, args[1].length() - 4) + "." +
getUniqName() + (task == TOTMX ? ".tmx" : ".xls");
}
}
} else {
task = NOTIMPLAMENTED;
logger.error("Please specify source file!");
}
} else if (task == NEWTMX || task == NEWXLS) {
if (args.length == 1) {
//no name given, generate one from the timestamp
targetname = getUniqName() + (task == NEWTMX ? ".tmx" : ".xls");
} else {
//for newtmx/newxls the 2nd parameter (args[1]) is the file to create
if (new File(args[1]).exists()) {
task = NOTIMPLAMENTED;
logger.error("Target file exists, choose another name!");
} else {
targetname = args[1];
}
}
}
}
//realwork start here
if (task != NOTIMPLAMENTED) {
TMX tmx = new TMX();
//parse
if (task == TOTMX) {
tmx.parseXLS(srcname);
} else if (task == TOXLS) {
tmx.parseTMX(srcname);
}
//write
if (task == TOTMX || task == NEWTMX) {
tmx.saveTMX(targetname);
} else if (task == TOXLS || task == NEWXLS) {
tmx.saveXLS(targetname);
}
} else {
//usage
usage();
}
logger.debug("" + task);
logger.debug(srcname);
logger.debug(targetname);
}
private static void usage() {
System.out.println("*******************************************************************************");
System.out.println("TMX2XLS Usage: ");
System.out.println(" ");
System.out.println("Invocation: ");
System.out.println("java -jar tmx2xls.jar %parameters% ");
System.out.println("or ");
System.out.println("java -cp tmx2xls.jar com.greenflute.tmx.TMX2XLS %parameters% ");
System.out.println(" ");
System.out.println("Parameters: ");
System.out.println("1st param: 2xls|2tmx|newxls|newtmx ");
System.out.println("2nd param: <source file name>|<full path file name> ");
System.out.println(" when newtmx or newxls was used, this file will be created. ");
System.out.println("3rd param: optional, <target file name>|<full path file name>, ");
System.out.println(" autogenerated filename: <source file name>.yyyymmddhhmiss. ");
System.out.println(" ");
System.out.println("Reg file: ");
System.out.println("Adds a context menu for tmx|xls file, only for convenience under Windows. ");
System.out.println(" ");
System.out.println("Warning: ");
System.out.println("It's a free tool, no guarantee for data corruption or errors. ");
System.out.println("*******************************************************************************");
}
private static String getUniqName() {
Calendar cal = Calendar.getInstance();
return cal.get(Calendar.YEAR) +
((cal.get(Calendar.MONTH) + 1) <= 9 ? ("0" + (cal.get(Calendar.MONTH) + 1)) : (cal.get(Calendar.MONTH) + 1) + "") +
(cal.get(Calendar.DAY_OF_MONTH) <= 9 ? ("0" + cal.get(Calendar.DAY_OF_MONTH)) : cal.get(Calendar.DAY_OF_MONTH) + "") +
(cal.get(Calendar.HOUR_OF_DAY) <= 9 ? ("0" + cal.get(Calendar.HOUR_OF_DAY)) : cal.get(Calendar.HOUR_OF_DAY) + "") +
(cal.get(Calendar.MINUTE) <= 9 ? ("0" + cal.get(Calendar.MINUTE)) : cal.get(Calendar.MINUTE) + "") +
(cal.get(Calendar.SECOND) <= 9 ? ("0" + cal.get(Calendar.SECOND)) : cal.get(Calendar.SECOND) + "");
}
private static Logger logger = null;
private static final String TOTMX_TEXT = "2tmx";
private static final String TOXLS_TEXT = "2xls";
private static final String NEWTMX_TEXT = "newtmx";
private static final String NEWXLS_TEXT = "newxls";
private static final int TOTMX = 1;
private static final int TOXLS = 2;
private static final int NEWTMX = 3;
private static final int NEWXLS = 4;
private static final int NOTIMPLAMENTED = 0;
}
Main control logic
//realwork start here
if (task != NOTIMPLAMENTED) {
TMX tmx = new TMX();
//parse
if (task == TOTMX) {
tmx.parseXLS(srcname);
} else if (task == TOXLS) {
tmx.parseTMX(srcname);
}
//write
if (task == TOTMX || task == NEWTMX) {
tmx.saveTMX(targetname);
} else if (task == TOXLS || task == NEWXLS) {
tmx.saveXLS(targetname);
}
}
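The int constants plus the chained `?:` lookup in main could also be expressed as an enum, which keeps the keyword, the "needs a source file" rule and the target extension in one place. This is only a sketch of an alternative, not what the program does; `ConversionTask` and its members are invented names, and it needs Java 8 for `Optional`.

```java
import java.util.Optional;

// Hypothetical alternative to the TOTMX/TOXLS/NEWTMX/NEWXLS int constants.
enum ConversionTask {
    TO_XLS("2xls", true, ".xls"),
    TO_TMX("2tmx", true, ".tmx"),
    NEW_XLS("newxls", false, ".xls"),
    NEW_TMX("newtmx", false, ".tmx");

    private final String keyword;          // first command-line parameter
    private final boolean needsSource;     // conversions read an existing file, templates do not
    private final String targetExtension;  // extension of the file that gets written

    ConversionTask(String keyword, boolean needsSource, String targetExtension) {
        this.keyword = keyword;
        this.needsSource = needsSource;
        this.targetExtension = targetExtension;
    }

    // Case-insensitive lookup; an empty result plays the role of NOTIMPLAMENTED.
    static Optional<ConversionTask> fromArgument(String argument) {
        for (ConversionTask task : values()) {
            if (task.keyword.equalsIgnoreCase(argument)) {
                return Optional.of(task);
            }
        }
        return Optional.empty();
    }

    boolean needsSource() {
        return needsSource;
    }

    String targetExtension() {
        return targetExtension;
    }
}
```

With this in place the controller would print the usage text whenever `fromArgument` comes back empty, and parse a source file only when `needsSource()` is true.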
TMX.java
package com.greenflute.tmx;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.Element;
import org.jdom.DocType;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import org.jdom.output.Format;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.*;
public class TMX {
private Hashtable properties = null;
private ArrayList segments = null;
public TMX() {
logger = Logger.getLogger(TMX.class);
properties = new Hashtable();
segments = new ArrayList();
}
public void parseTMX(String filename) {
try {
SAXBuilder builder = new SAXBuilder();
Document document = builder.build(filename);
//doctpye
DocType doctype = document.getDocType();
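//the key spelling "doctpye.name" is intentional here: it matches the lookups in saveTMX and saveXLS
properties.put("doctpye.name", doctype.getElementName());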
properties.put("doctpye.systemid", doctype.getSystemID());
logger.debug(doctype.getElementName());
logger.debug(doctype.getPublicID());
logger.debug(doctype.getSystemID());
logger.debug(doctype.toString());
//root element
Element root = document.getRootElement();
properties.put("tmx.version", root.getAttributeValue("version"));
//header
Element header = root.getChild("header");
properties.put("header.creationtool", header.getAttributeValue("creationtool"));
properties.put("header.creationtoolversion", header.getAttributeValue("creationtoolversion"));
properties.put("header.segtype", header.getAttributeValue("segtype"));
properties.put("header.o-tmf", header.getAttributeValue("o-tmf"));
properties.put("header.adminlang", header.getAttributeValue("adminlang"));
properties.put("header.srclang", header.getAttributeValue("srclang"));
properties.put("header.datatype", header.getAttributeValue("datatype"));
//body
Element body = root.getChild("body");
List tus = body.getChildren();
Iterator it = tus.iterator();
//loop through every tu,tuv,seg
List tuvs = null;
String src = null, target = null;
Object temp = null;
Element tu = null;
boolean targetlangok = false; //targetlanguage
while (it.hasNext()) {
tu = (Element) it.next();
tuvs = tu.getChildren();
temp = tuvs.get(0);
src = (temp == null) ? "" : ((Element) (temp)).getChild("seg").getText();
temp = tuvs.get(1);
target = (temp == null) ? "" : ((Element) (temp)).getChild("seg").getText();
if (!targetlangok) {
//remember the target language once, taken from the first tu
properties.put("header.targetlang", ((Element) (temp)).getAttributeValue("lang"));
targetlangok = true;
}
//every tu carries content, including the first one, so always keep the segment
segments.add(new String[]{src, target});
}
document = null;
builder = null;
} catch (JDOMException e) {
logger.error("Error occured in parsing TMX file!", e);
} catch (IOException e) {
logger.error("Error occured in parsing TMX file!", e);
}
}
public void saveTMX(String filename) {
Document document = new Document();
Object temp = null;
String src = null, target = null; //language
//doctype
temp = properties.get("doctpye.name");
DocType doctype = new DocType(temp == null ? "tmx" : (String) temp);
temp = properties.get("doctpye.systemid");
doctype.setSystemID(temp == null ? "tmx11.dtd" : (String) temp);
document.setDocType(doctype);
//root
Element root = document.setRootElement(new Element("tmx")).getRootElement();
temp = properties.get("tmx.version");
root.setAttribute("version", (temp == null ? "1.1" : (String) temp));
//header
Element header = new Element("header");
temp = properties.get("header.creationtool");
header.setAttribute("creationtool", temp == null ? "TMX2XLS" : (String) temp);
temp = properties.get("header.creationtoolversion");
header.setAttribute("creationtoolversion", temp == null ? "0.1" : (String) temp);
temp = properties.get("header.segtype");
header.setAttribute("segtype", temp == null ? "sentence" : (String) temp);
temp = properties.get("header.o-tmf");
header.setAttribute("o-tmf", temp == null ? "TMX2XLS" : (String) temp);
temp = properties.get("header.adminlang");
header.setAttribute("adminlang", temp == null ? "EN-US" : (String) temp);
temp = properties.get("header.srclang");
header.setAttribute("srclang", temp == null ? "DE" : (String) temp);
src = temp == null ? "DE" : (String) temp; //srclanguage
temp = properties.get("header.datatype");
header.setAttribute("datatype", temp == null ? "plaintext" : (String) temp);
root.addContent(header);
temp = properties.get("header.targetlang");
target = temp == null ? "LT" : (String) temp;//targetlanguage
//body
Element body = new Element("body");
Element tu = null, tuv = null;
String[] sentence = null;
for (int i = 0; i < segments.size(); i++) {
sentence = (String[]) segments.get(i);
tu = new Element("tu");
//src tuv
tuv = new Element("tuv").
setAttribute("lang", src).
addContent(new Element("seg").setText(sentence[0]));
tu.addContent(tuv);
//target tuv
tuv = new Element("tuv").
setAttribute("lang", target).
addContent(new Element("seg").setText(sentence[1]));
tu.addContent(tuv);
//add to body
body.addContent(tu);
}
root.addContent(body);
try {
XMLOutputter outputter = new XMLOutputter();
outputter.setFormat(Format.getPrettyFormat());
outputter.output(document, new FileOutputStream(filename));
//outputter.output(document, System.out);
outputter = null;
document = null;
} catch (IOException e) {
logger.error("Error occured in saving TMX file!", e);
}
}
public void parseXLS(String filename) {
try {
HSSFWorkbook book = new HSSFWorkbook(new FileInputStream(filename));
HSSFSheet header = book.getSheet("header");
HSSFRow row = null;
//check the file format
if (header == null) {
throw new Exception("Invalid XLS Format, Can't transform to TMX file!");
}
Iterator it = header.rowIterator();
while (it.hasNext()) {
row = (HSSFRow) it.next();
properties.put(row.getCell((short) 0).getStringCellValue(),
row.getCell((short) 1).getStringCellValue());
logger.debug(row.getCell((short) 0).getStringCellValue());
logger.debug(row.getCell((short) 1).getStringCellValue());
}
HSSFSheet body = book.getSheet("body");
it = body.rowIterator();
boolean targetlangok = false; //targetlanguage
while (it.hasNext()) {
//this is the first row!!
if (!targetlangok) {
properties.put("header.targetlang", ((HSSFRow) it.next()).getCell((short) 1).getStringCellValue());
targetlangok = true;
} else {
row = (HSSFRow) it.next();
segments.add(new String[]{row.getCell((short) 0).getStringCellValue(),
row.getCell((short) 1).getStringCellValue()});
logger.debug(row.getCell((short) 0).getStringCellValue());
logger.debug(row.getCell((short) 1).getStringCellValue());
}
}
} catch (IOException e) {
logger.error("Error occured in parsing XLS file!", e);
} catch (Exception e) {
logger.error("Error occured in parsing XLS file!", e);
}
}
public void saveXLS(String filename) {
Object temp = null;
String src = null, target = null;
temp = properties.get("header.srclang");
src = temp == null ? "DE" : (String) temp; //srclanguage
temp = properties.get("header.targetlang");
target = temp == null ? "LT" : (String) temp;//targetlanguage
HSSFWorkbook book = new HSSFWorkbook();
HSSFSheet body = book.createSheet("body");
HSSFRow row = null;
HSSFCell cell = null;
//first row, language title
row = body.createRow(0);
row.createCell((short) 0).setCellValue(src);
row.createCell((short) 1).setCellValue(target);
//sentences
String[] sentence = null;
for (int i = 0; i < segments.size(); i++) {
sentence = (String[]) segments.get(i);
row = body.createRow(i + 1);
cell = row.createCell((short) 0);
cell.setEncoding((short) 1);
cell.setCellValue(sentence[0]);
cell = row.createCell((short) 1);
cell.setEncoding((short) 1);
cell.setCellValue(sentence[1]);
logger.debug(sentence[0]);
logger.debug(sentence[1]);
}
//header
HSSFSheet header = book.createSheet("header");
temp = properties.get("doctpye.name");
row = header.createRow(0);
row.createCell((short) 0).setCellValue("doctpye.name");
row.createCell((short) 1).setCellValue(temp == null ? "tmx" : (String) temp);
row = header.createRow(1);
temp = properties.get("doctpye.systemid");
row.createCell((short) 0).setCellValue("doctpye.systemid");
row.createCell((short) 1).setCellValue(temp == null ? "tmx11.dtd" : (String) temp);
row = header.createRow(2);
temp = properties.get("tmx.version");
row.createCell((short) 0).setCellValue("tmx.version");
row.createCell((short) 1).setCellValue(temp == null ? "1.1" : (String) temp);
row = header.createRow(3);
temp = properties.get("header.creationtool");
row.createCell((short) 0).setCellValue("header.creationtool");
row.createCell((short) 1).setCellValue(temp == null ? "TMX2XLS" : (String) temp);
row = header.createRow(4);
temp = properties.get("header.creationtoolversion");
row.createCell((short) 0).setCellValue("header.creationtoolversion");
row.createCell((short) 1).setCellValue(temp == null ? "0.1" : (String) temp);
row = header.createRow(5);
temp = properties.get("header.segtype");
row.createCell((short) 0).setCellValue("header.segtype");
row.createCell((short) 1).setCellValue(temp == null ? "sentence" : (String) temp);
row = header.createRow(6);
temp = properties.get("header.o-tmf");
row.createCell((short) 0).setCellValue("header.o-tmf");
row.createCell((short) 1).setCellValue(temp == null ? "TMX2XLS" : (String) temp);
row = header.createRow(7);
temp = properties.get("header.adminlang");
row.createCell((short) 0).setCellValue("header.adminlang");
row.createCell((short) 1).setCellValue(temp == null ? "EN-US" : (String) temp);
row = header.createRow(8);
//temp = properties.get("header.srclang");
row.createCell((short) 0).setCellValue("header.srclang");
row.createCell((short) 1).setCellFormula("body!a1");
row = header.createRow(9);
temp = properties.get("header.datatype");
row.createCell((short) 0).setCellValue("header.datatype");
row.createCell((short) 1).setCellValue(temp == null ? "plaintext" : (String) temp);
row = header.createRow(10);
row.createCell((short) 0).setCellValue("header.targetlang");
row.createCell((short) 1).setCellFormula("body!b1");
try {
book.write(new FileOutputStream(filename));
} catch (IOException e) {
logger.error("Error occured in saving XLS file!", e);
}
}
private static Logger logger = null;
}
Custom Excel data structure
* XLS Format Specification
* 1st sheet:
* Name: body
* Content: body of tmx file
* Columns: 2
* 1st Column: Source language content
* 2nd Column: Target language content
* Rows: 2+
* 1st Row: Label of language name
* 2nd Row and following: translation content,
* every row is a tu,
* every cell is a tuv and its embedded seg
* last row: tmx2xls tmx2xls
* <p/>
* 2nd sheet:
* Name: header
* Content: tmx header infos
* 1st Column: header name
* 2nd Column: header content
* Rows: fixed
* 1: tmx2xls tmx2xls
* 2: xml.version 1.0
* 3: xml.encoding UTF-8
* <?xml version="1.0" encoding="UTF-8"?>
* 4: tmx.doctype
* <!DOCTYPE tmx SYSTEM "tmx11.dtd">
* 5: tmx.version 1.1
* 6: header.creationtool TMX2XLS
* 7: header.creationtoolversion 0.1
* 8: header.segtype sentence
* segtype="sentence"
* 9: header.o-tmf TMX2XLS
* 10: header.adminlang EN-US
* adminlang="EN-US"
* 11: srclang xxx
* it's better to make a reference formula to cell body.A1
* 12: header.datatype plaintext
* datatype="plaintext"
Comments and corrections are very welcome, heh.
log4j.properties
Almost forgot this one:
#OFF, FATAL, ERROR, WARN, INFO, DEBUG, ALL
log4j.threshold=ALL
log4j.rootLogger=,filelog
# For appender named appenderName, set its class.
# Note: The appender name can contain dots.
log4j.appender.filelog=org.apache.log4j.DailyRollingFileAppender
log4j.appender.filelog.File=${java.io.tmpdir}/TMX2XLS_log.html
log4j.appender.filelog.DatePattern='.'yyyy-MM-dd'.html'
# For each named appender you can configure its Layout. The syntax for configuring an appender's layout is:
log4j.appender.filelog.layout=org.apache.log4j.HTMLLayout
log4j.appender.filelog.layout.Title=TMX2XLS
Packaging the jar
To simplify deployment, the jdom, log4j and poi jars are unpacked and bundled into one jar together with the program.
manifest
Manifest-Version: 1.0
Created-By: 1.4.2 (Sun Microsystems Inc.)
Main-Class: com.greenflute.tmx.TMX2XLS
Done.