Taking Special Care of Special Issues...
Feb. 12th, 2007 12:13 pm
[Edited: Fixed my st00pid markup so you can now, uh, see what I actually meant to say.]
In response to my entry about Idunna 10, a couple people offered assistance.
I thought I'd post a sample page to show the kind of good, but difficult, material we're dealing with here.
Follow me behind the curtain.
This is Idunna 11, page 1--the original scans were done by gnowun at 300 dpi so the illos (fancy schmancy talk for "illustrations") will still look good when printed back out. For comparison, your monitor and almost all graphics designed for Internet consumption are 72 dpi. Also, the original is a TIFF to losslessly preserve quality, while this example is a low-quality JPEG.
Still, the picture below should give you a good idea of the size of the problem here:

The funny-looking letters are in a style called Fraktur. Here's a shot of the whole alphabet done up this way, nicked from Wikipedia's Fraktur article:

The typical issue winds through a mishmash of US English, UK English, the occasional Fraktur word or phrase, a healthy flavoring of loanwords from more properly Germanic languages, and several authors with a fond habit for "Saxon English"--i.e., using as little Romance vocabulary as possible, which usually leads to dusty, archaic corners of our beautifully bastard tongue.
Not only will you learn a lot about Troth history by being on this project, you also need to know a bit before you start so you can chew on words like "foresib" and "busyship" and not boggle. Crucial, too, is the ability to deal with þ and not ask, "What happened to that p? Is a slipped hump painful?"
My proofreading guy just asked me not to bother with the OCR run, figuring it takes him as long to proofread its text as it does him to just type the goldurned thing in in the first place. As Adobe Acrobat's OCR engine thought we worshipped the "Yanir"--with the much-simpler layout that 10 had--I'm inclined to agree. Frankly, volunteer hours are cheaper than software, and manual typing that actually is getting done is preferable to OCR feeding that isn't.
-- Lorrie
no subject
Date: 2007-02-12 07:45 am (UTC)
:)
L
no subject
Date: 2007-02-12 08:09 pm (UTC)
-- Lorrie
no subject
Date: 2007-02-12 03:41 pm (UTC)
HAHAHAHAHAHAHAHAHAHAHAHAHAHAHA. What would they say upon sight of an eth? :P
Re: Ð/ð
Date: 2007-02-12 08:10 pm (UTC)
-- Lorrie
Re: Ð/ð
Date: 2007-02-12 08:11 pm (UTC)
no subject
Date: 2007-02-25 12:10 pm (UTC)
M
no subject
Date: 2007-02-26 08:35 pm (UTC)
Thanks for responding!
-- Lorrie