The Problem with Translating DITA
Today, I was reading the 188th Tool Kit – Basic Edition newsletter by Jost Zetzsche. I love reading Jost’s blog because he isn’t afraid to say what he thinks, even if it is controversial. He is also very well-studied and very well-connected.
In his most recent newsletter, Jost discusses the demise of LISA (Localization Industry Standards Association), the various groups that have stepped up to assume LISA’s charter (interoperability standards), and some of the problems that he sees facing the translation industry with the current proposed replacements. This is an interesting discussion and Jost correctly describes the issues facing the smaller LSP and the individual translator.
I was fascinated by this comment in his post:
“…there are some standards that work very well for translation buyers but are certainly against our [individual translators/small LSP] interests. In my opinion, one of those standards is DITA, an XML-based standard that provides the ability to segment the source text into small chunks that can be used in a variety of ways and allow for a great reuse of data; however, this works much to the detriment of the translator who often lacks the necessary context.”
Wow! In all of my years of dealing with XML and DITA, and in my years of watching and being part of the translation community, I have never considered the impact of DITA on translation. Of course, it makes perfect sense. In the DITA world, content is broken into small chunks so that they can be reused and repurposed in a variety of ways. When a chunk of content is sent to translation, it is not necessarily combined with any of the other chunks to which it relates. The chunk is completely out of context. Kind of like my children eavesdropping on my phone calls, but only catching errant sentences, for which they later want a full explanation.
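To make the problem concrete, here is a minimal sketch in Python. It is not how any real DITA tool or CMS works. The topic store, the map names, and both functions are hypothetical, invented only to show one chunk being reused in two deliverables and then handed off for translation on its own.

```python
# Hypothetical topic store: topic id -> source text.
topics = {
    "press-start": "Press it to begin.",
    "coffee-intro": "The brew button is on the front panel.",
    "treadmill-intro": "The start key is on the console.",
}

# Hypothetical maps: each deliverable lists its topics in reading order.
maps = {
    "coffee-maker-guide": ["coffee-intro", "press-start"],
    "treadmill-guide": ["treadmill-intro", "press-start"],
}


def translation_package(topic_id):
    """Return what a translator might receive: the chunk text alone."""
    return topics[topic_id]


def contexts_for(topic_id):
    """Return the text that precedes the chunk in every map that reuses it.

    This is the view available to someone who can see the maps.
    """
    found = {}
    for map_name, order in maps.items():
        if topic_id in order:
            position = order.index(topic_id)
            found[map_name] = topics[order[position - 1]] if position > 0 else None
    return found


print(translation_package("press-start"))
print(contexts_for("press-start"))
```

In this toy example, the "it" in the reused chunk is a button in one guide and a key in the other. Anyone who can see the maps can tell. Someone who receives only the chunk cannot.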
Creating a small chunk of content that will live in multiple places, be surrounded by a variety of other pieces of content, and be consumed via a variety of media is a difficult task in itself. However, the writer creating this small chunk usually has access to the CMS that houses the related information.
The translator is often an individual, contracted by an LSP, who may or may not be able to access the related pieces of content. I repeat: Wow! This seems like an incredibly daunting task, particularly if the content was not created to be global-ready. By this I mean, the writer was not following the basic rules of writing text that will be translated. I’ve written lots of posts on these basic rules. For example, making sure that you use a noun with the words “this, that, these, and those,” so the translator knows if the noun is masculine or feminine. Or, making sure that your sentences do not contain idiomatic phrases that have no meaning in another language. And, my favorite, keeping your sentences as short and simple as possible.
I have to wonder how helpful translation memory is if the context of the segment is unavailable. Sure, perhaps that segment has been translated before. But how would the translator know whether the earlier translation is appropriate for the context in which the segment appears this time?
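Here is a rough sketch of that worry, again in Python. It does not model any real translation memory product or the TMX format. The `TMEntry` class, the `context` labels, and the `lookup` function are my own assumptions for illustration, and the German strings are only sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TMEntry:
    source: str   # source-language segment
    context: str  # hypothetical label for where the segment appeared
    target: str   # stored translation


def lookup(memory, source, context=None):
    """Return (target, match_kind) for a segment.

    Without a context, the best we can do is return the first stored
    translation whose source text matches.
    """
    same_text = [entry for entry in memory if entry.source == source]
    if not same_text:
        return None, "no match"
    if context is not None:
        for entry in same_text:
            if entry.context == context:
                return entry.target, "context match"
    return same_text[0].target, "text-only match"


# Illustrative memory: the same English segment stored with two translations.
memory = [
    TMEntry("Open", "menu command", "Öffnen"),
    TMEntry("Open", "door status", "Offen"),
]

print(lookup(memory, "Open"))
print(lookup(memory, "Open", context="door status"))
```

When the context is missing, the lookup still returns something, but it is just the first entry with the same text. Nothing in the result tells the translator whether that entry fits.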
As I’ve mentioned countless times, I am not an expert on translation. I am a student, though. And I am very curious to understand more about this topic. If you have information to add to this discussion, or if I am missing a point, or if I am just incorrect, please let me know.