Text Cleanup

JavaScript for Adobe InDesign
Latest update 05/27/2017

The script repairs common flaws in text in one operation, rather than through repeated visits to the InDesign Find/Change dialog. It was created to clean up text imported from word processing applications, which is often loaded with unwanted formatting: extra spaces between words, multiple spaces used to align text, unwanted line breaks, and more.

  • Apply to an entire document, a selected story, or selected text
  • Confirmation to proceed or skip changes
  • Replace a defined number of spaces with a tab character
  • Remove spaces before or after tabs
  • Remove spaces and tabs at paragraph begin or end
  • Reduce multiple spaces or tabs to one
  • If desired, keep two spaces between sentences
  • Replace double hyphens with em or en dash
  • Remove forced line breaks
  • Remove empty paragraphs
  • Save and restore all settings

Free to download and use. Contributions of any amount are appreciated but not required.

Download
Text Cleanup

Instructions for use

The interface is divided into five sections: Scope, Spaces, Tabs, Other, and Settings. When the OK button is clicked, processing begins. The first match discovered is selected in the layout and a confirmation dialog appears on screen.

OK — the selected text is removed or replaced as indicated, and the next proposed change is selected.

OK to all — the selected text, and all subsequent changes, are performed without further user intervention.

Skip — the selected text remains unchanged, and the next proposed change is selected.

Cancel — processing ceases without further changes. Any changes previously accepted remain. Use undo to back out of all changes, or revert the document.

Once processing is complete, the user is notified.
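
The four buttons can be pictured as a simple review loop. The sketch below is illustrative only: `reviewChanges`, the answer strings, and the `decide` callback are hypothetical names, and it works on a plain list of matches rather than on InDesign text.

```javascript
// Hypothetical sketch: walk through proposed changes and ask a callback
// what to do with each one. None of these names come from the script.
function reviewChanges(matches, decide) {
  const applied = [];
  let applyAll = false;
  for (const match of matches) {
    const answer = applyAll ? "ok" : decide(match);
    if (answer === "cancel") break;             // stop; earlier changes remain
    if (answer === "skip") continue;            // leave this match unchanged
    if (answer === "okToAll") applyAll = true;  // no further prompts
    applied.push(match);
  }
  return applied;
}

// Scripted answers stand in for the dialog buttons.
const scriptedAnswers = ["ok", "skip", "okToAll"];
const appliedChanges = reviewChanges(
  ["a", "b", "c", "d", "e"],
  () => scriptedAnswers.shift()
);
console.log(appliedChanges); // [ 'a', 'c', 'd', 'e' ]
```

Cancel ends the loop without undoing anything, which matches the note above that previously accepted changes remain.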

Some notes:

Changes are performed in the order listed on screen. For example, replacing spaces with a tab is done before reducing multiple spaces to one; otherwise the runs of spaces would already be gone and the change to a tab would never occur. Though changes are performed in the most logical order, particularly odd formatting may warrant a second run of the script. If the desired result is not achieved the first time through, try a second pass that targets the particular flaw that remains.
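
Why the order matters can be shown with two plain regular expressions. This is a sketch only: `runsToTab` and `squeezeSpaces` are hypothetical stand-ins for two of the options, not the script's own code.

```javascript
// Assumed stand-ins for two of the script's options, as plain regexes.
const runsToTab = (text, min) =>
  text.replace(new RegExp(" {" + min + ",}", "g"), "\t");
const squeezeSpaces = (text) => text.replace(/ {2,}/g, " ");

const alignedLine = "Name    Price";

// Listed order: the run of four spaces becomes a tab.
const listedOrder = squeezeSpaces(runsToTab(alignedLine, 3));
// Reversed order: the run is collapsed first, so no tab is ever created.
const reversedOrder = runsToTab(squeezeSpaces(alignedLine), 3);

console.log(JSON.stringify(listedOrder));   // "Name\tPrice"
console.log(JSON.stringify(reversedOrder)); // "Name Price"
```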

The confirmation function of the script selects the text that will be changed and brings it into view on screen. If a story includes overset text, a proposed change that falls within it cannot be shown, because overset text is hidden from view. When this condition exists, a warning is presented and the user may continue regardless, or decline and remedy the overset text before trying again. Declining is the recommended choice, because every change can then be seen and confirmed before continuing.

Section 1: Scope

Document — changes apply to the entire document that is currently open, or to the document in the top-most window if multiple documents are open.

Selected story — changes apply strictly to the selected story, or the story containing text currently selected. If no story is selected, the choice is disabled. The user may also choose Document to increase the scope of text affected.

Selection — changes apply strictly to the selected text. If no text is selected, the choice is disabled. The user may also choose Selected story or Document to increase the scope of text affected.

Section 2: Spaces

Replace all special with normal — all instances of special space characters are replaced with normal space characters. Examples of special space characters are non-breaking, thin, en, and em spaces, among others. Use with caution for completed documents, in which case the designer has likely used special space characters for intended purposes. This option exists for imported text from unreliable sources such as word processing applications, in which case the use of special space characters is usually by mistake and may cause undesired results while typesetting the text.
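
As a rough illustration, the replacement can be expressed as one character class. The set of code points below is an assumption (no-break space, the U+2000 to U+200A fixed-width spaces, and narrow no-break space); the script may cover a different set, and `normalizeSpecialSpaces` is a hypothetical name.

```javascript
// Assumed set of "special" spaces; the script's actual list may differ.
const SPECIAL_SPACES = /[\u00A0\u2000-\u200A\u202F]/g;

function normalizeSpecialSpaces(text) {
  // Every special space becomes one normal space (U+0020).
  return text.replace(SPECIAL_SPACES, " ");
}

// No-break space, em space, and thin space in one sample.
const spacedSample = "10\u00A0km\u2003east\u2009side";
console.log(normalizeSpecialSpaces(spacedSample)); // 10 km east side
```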

Replace with tab — the user defines a minimum number of consecutive spaces. Any run of at least that many spaces is replaced with a single tab character. The change may also be restricted to runs of spaces at the beginning of paragraphs. Use with care, as the results will be dramatic. Best used for newly imported text that will be styled afterward.
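
A minimal sketch of the idea, assuming paragraphs end with a carriage return (`\r`) as they typically do in InDesign text contents. The function name and parameters are hypothetical.

```javascript
// Hypothetical sketch: replace runs of at least `minSpaces` spaces with a tab.
function spacesToTabs(text, minSpaces, onlyAtParagraphStart) {
  const run = " {" + minSpaces + ",}";
  // With the "m" flag, ^ also matches right after a paragraph end (\r).
  const pattern = onlyAtParagraphStart
    ? new RegExp("^" + run, "gm")
    : new RegExp(run, "g");
  return text.replace(pattern, "\t");
}

const importedText = "    Item   one\r  Item two";
console.log(JSON.stringify(spacesToTabs(importedText, 3, false)));
// "\tItem\tone\r  Item two"
console.log(JSON.stringify(spacesToTabs(importedText, 3, true)));
// "\tItem   one\r  Item two"
```

The two leading spaces in the second paragraph fall below the minimum of three, so they are left alone in both cases.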

Remove before or after tab — any run of space characters immediately before or after a tab character is removed. The option is relatively safe, but in some cases may upset the text flow. Use with care for completed documents.
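
In regular-expression terms this amounts to matching a tab together with any spaces on either side of it. The sketch below uses a hypothetical function name and handles normal spaces only.

```javascript
// Hypothetical sketch: strip normal spaces that touch a tab character.
function trimSpacesAroundTabs(text) {
  return text.replace(/ *\t */g, "\t");
}

console.log(JSON.stringify(trimSpacesAroundTabs("Name  \t  Price")));
// "Name\tPrice"
```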

Remove at paragraph begin — any number of space characters discovered at the beginning of any paragraph are removed.

Between words, replace two or more with one — any instance of two or more space characters is reduced to a single space, unless the choice Keep two spaces between sentences is enabled. See below for more details.

Include special space characters — the replacement of multiple spaces will include any special space characters such as non-breaking, en, or em space, etc. Any instance that includes special space characters will be replaced by a normal space character.

Keep two spaces between sentences — preserves instances of two spaces, but only when the instance directly follows a period, signaling the end of a sentence. This choice will not add spaces between sentences, only keep them if they already exist. Please note, if it were up to me (a typesetter of many years), this would not be an option. Unfortunately, clients exist who are adamant that sentences be separated by double spaces, regardless of any explanation of typography and how it differs from typewriters and lessons learned in high school. These clients write the check, so I’m in no position to argue. If you have similar clients, you’ll appreciate the option to preserve their double spaces despite our distaste for the practice.
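
The interaction between reducing multiple spaces and keeping two after a period can be sketched as follows. The names are hypothetical, special space characters are left out, and one detail is an assumption: runs of three or more spaces after a period are reduced to two here, which the description above does not specify.

```javascript
// Hypothetical sketch: collapse runs of normal spaces to one, optionally
// keeping two spaces when the run directly follows a period.
function collapseSpaces(text, keepTwoAfterPeriod) {
  return text.replace(/(\.?)( {2,})/g, (match, period) =>
    period && keepTwoAfterPeriod ? ".  " : period + " "
  );
}

const draft = "End.  Next   word.   Last";
console.log(JSON.stringify(collapseSpaces(draft, false)));
// "End. Next word. Last"
console.log(JSON.stringify(collapseSpaces(draft, true)));
// "End.  Next word.  Last"
```

A single space after a period is never matched, so the option only keeps double spaces that already exist and does not add any.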

Section 3: Tabs

Remove at paragraph begin — any number of tab characters discovered at the beginning of any paragraph are removed.

Replace two or more with one — any instance of two or more consecutive tab characters is reduced to a single tab.
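
Both tab options reduce to short patterns. This is a sketch under the assumption that paragraphs end with a carriage return (`\r`); `cleanTabs` is a hypothetical name.

```javascript
// Hypothetical sketch of the two tab options applied in sequence.
function cleanTabs(text) {
  return text
    .replace(/^\t+/gm, "")     // tabs at the beginning of any paragraph
    .replace(/\t{2,}/g, "\t"); // two or more consecutive tabs become one
}

console.log(JSON.stringify(cleanTabs("\t\tTitle\t\t\t12\r\tNext\t3")));
// "Title\t12\rNext\t3"
```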

Section 4: Other

Replace double hyphen with em dash or en dash — any instance of two hyphens is replaced with a single em dash or en dash, as chosen by the user.
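
A minimal sketch of the replacement, with a hypothetical function name and a `dash` parameter standing in for the user's choice. Runs of three or more hyphens are not treated specially here.

```javascript
// Hypothetical sketch: swap each pair of hyphens for the chosen dash.
function replaceDoubleHyphens(text, dash) {
  const replacement = dash === "en" ? "\u2013" : "\u2014"; // en or em dash
  return text.replace(/--/g, replacement);
}

console.log(replaceDoubleHyphens("pages 10--12", "en")); // pages 10–12
```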

Remove forced line breaks — all forced line breaks are removed. The script will judge if a space character precedes or follows each line break, and for instances where a space is absent, one will be inserted to ensure that words do not crash together once the line break is removed. Use with care for completed documents, as forced line breaks added by the designer are likely intended and their removal may upset the text flow. This option exists for imported text from unreliable sources such as word processing applications, in which case the insertion of forced line breaks is usually by mistake and may cause undesired results while typesetting the text.
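
The space check described above might look like this in outline. It is a sketch that assumes a forced line break appears as `\n` in the text; the function name is hypothetical, and edge cases such as consecutive breaks are left to the other options.

```javascript
// Hypothetical sketch: remove forced line breaks (\n), inserting a space
// only when neither neighbor is already a space.
function removeForcedLineBreaks(text) {
  return text.replace(/( ?)\n( ?)/g, (match, before, after) =>
    before || after ? before + after : " "
  );
}

console.log(removeForcedLineBreaks("one\ntwo"));  // one two
console.log(removeForcedLineBreaks("one \ntwo")); // one two
console.log(removeForcedLineBreaks("one\n two")); // one two
```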

Remove excess at paragraph and story end — any combination of spaces or tabs immediately before the end of a paragraph is removed. The same applies at the end of a story, where forced line breaks and excess paragraph ends are removed as well. If one or more paragraph ends exist at story end, one paragraph end remains; otherwise the story concludes with the story end marker. This option is safe to use for any document, as the result has no effect on text flow.
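
The two stages can be sketched on a plain string, assuming `\r` marks a paragraph end, `\n` a forced line break, and the end of the string stands for the story end. The function name is hypothetical.

```javascript
// Hypothetical sketch: trim paragraph ends, then tidy the story end.
function trimParagraphAndStoryEnds(story) {
  // Spaces and tabs directly before a paragraph end or the story end.
  const trimmed = story.replace(/[ \t]+(?=\r|$)/g, "");
  // At story end, drop trailing whitespace, forced breaks, and paragraph
  // ends, keeping a single paragraph end if at least one was present.
  return trimmed.replace(/[ \t\n\r]+$/, (tail) =>
    tail.includes("\r") ? "\r" : ""
  );
}

console.log(JSON.stringify(
  trimParagraphAndStoryEnds("First \t\rSecond  \r \n\r\r")
)); // "First\rSecond\r"
```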

Remove empty paragraphs — each instance of two consecutive paragraph ends is replaced with one. Use with caution for completed documents, as the result will be dramatic, eliminating space between paragraphs in cases where empty paragraphs have been used to create it. It’s not the right way to create space between paragraphs, but the practice is common even in completed documents. The option exists to prepare imported text for proper configuration of space between paragraphs (if desired) by giving the paragraph style a value for space before and/or after rather than inserting empty paragraphs between paragraphs.
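
On a plain string, and again assuming `\r` marks a paragraph end, the change is a one-line pattern. The function name is hypothetical, and an empty paragraph at the very start of a story is not covered by this sketch.

```javascript
// Hypothetical sketch: collapse runs of paragraph ends so that no empty
// paragraphs remain between paragraphs.
function removeEmptyParagraphs(story) {
  return story.replace(/\r{2,}/g, "\r");
}

console.log(JSON.stringify(removeEmptyParagraphs("One\r\r\rTwo\rThree")));
// "One\rTwo\rThree"
```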

Section 5: Settings

Current choices may be saved and restored later. Select from the Load drop-down list to choose saved settings, which will then update the current choices. Click the Delete button and the saved settings selected in the Load drop-down list will be permanently removed. Click the Save button, provide a name for the settings, and the current choices will be preserved. If the name already exists, the user may choose to replace the saved settings.

Each time processing occurs, the current settings are preserved, and the next time the script is launched, settings are restored to the last values used.

Note that saving settings requires a file to store them, which sits in the script folder alongside the script file itself. It has the same name as the script but a different extension, “json”. The file may or may not be visible, depending on the Scripts panel option Display unsupported files. Normally only script files are visible, but when this option is enabled in the Scripts panel fly-out menu, all files are visible.
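
For orientation, here is a sketch of how named settings could be kept in such a file. Everything in it is hypothetical: the script file name, the `last` and `saved` keys, and the individual setting names are invented for illustration, and the actual layout of the json file is not documented here. InDesign's scripting engine may also need a JSON helper library, whereas this sketch relies on the built-in `JSON` object of modern JavaScript.

```javascript
// Hypothetical names throughout; the real file layout is not documented.
function settingsFileName(scriptFileName) {
  // Same base name as the script, with the extension swapped for "json".
  return scriptFileName.replace(/\.[^.]+$/, "") + ".json";
}

function saveNamedSettings(store, name, choices) {
  // Saving under an existing name replaces the earlier entry.
  store.saved[name] = Object.assign({}, choices);
  return store;
}

function deleteNamedSettings(store, name) {
  delete store.saved[name];
  return store;
}

const settingsStore = { last: {}, saved: {} };
saveNamedSettings(settingsStore, "Imported Word files", {
  scope: "document",
  removeEmptyParagraphs: true,
});

// Round trip through the text that would be written to the file.
const fileText = JSON.stringify(settingsStore, null, 2);
const reloaded = JSON.parse(fileText);

console.log(settingsFileName("TextCleanup.jsx"));         // TextCleanup.json
console.log(reloaded.saved["Imported Word files"].scope); // document
```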

For help installing scripts, see How to install and use scripts in Adobe Creative Cloud applications.

IMPORTANT: by downloading the script you agree that the software is provided without any warranty, express or implied. USE AT YOUR OWN RISK. Always make backups of important data.