How to instantly remove unwanted line breaks when copying from a PDF

Web Editors' Community, 21 Jun 2011

Sometimes you need to copy text from a PDF document into the website. If you just grab the text and paste it straight in, you will often find that the lines break early and don't reach the right-hand edge of the page, like this:

In order to make the process relevant and

accessible to industry, stage two has seen the

group working on two case studies with a

Or:

The opinions expressed in this publication are those of the authors, and do not represent the views of the Institute
of Materials, Minerals and Mining, its Council or its officers except where explicitly identified as such. This publication
is copyright under the Berne Convention and the international Copyright Convention

Here is a useful trick to resolve this quickly without having to remove every line break by hand. It replaces each unwanted line break with a single space, so the text runs together into a single paragraph:

  1. copy the text you want from the PDF
  2. paste into a new Word document
  3. click “edit” then “replace”
  4. make sure you’re in the “find what” field
  5. click “more” then “special”
  6. select “paragraph mark” (top of the list)
  7. click into the “replace with” field
  8. press the space bar once
  9. click “replace all”
  10. click “ok” then close the “find & replace” box.


All the line breaks have now been replaced with single spaces.
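If you would rather script this than go through Word, the same idea can be sketched in a few lines of Python. This is only an illustration of the technique, not part of the Word process: the function name `join_lines` is made up for this example, and unlike the Word steps it also collapses several consecutive line breaks into one space rather than one space per break.

```python
import re


def join_lines(text: str) -> str:
    """Replace line breaks with single spaces so the text becomes one paragraph."""
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Each run of line breaks, plus any spaces or tabs around it, becomes one space.
    return re.sub(r"[ \t]*\n+[ \t]*", " ", text).strip()


if __name__ == "__main__":
    pasted = (
        "In order to make the process relevant and\n\n"
        "accessible to industry, stage two has seen the\n\n"
        "group working on two case studies with a"
    )
    print(join_lines(pasted))
```

The result is plain text, so it can be pasted into the website without carrying any Word formatting along with it.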

Now, copy the text from the Word doc and paste into the website – but please remember to remove the Word formatting, either by using the Notepad process (copy from Word, paste into Notepad, copy from Notepad, paste into website) or by pasting direct from Word into the website but applying the “format eraser” to the text afterwards.

The drawback of the above technique shows up when you have a large chunk of text, spread across several paragraphs, that you want to paste from the PDF into the website: if you remove all the line breaks, all your paragraphing goes as well. So how do you retain the paragraphs but remove all the unwanted line breaks? Here's how:

  1. copy the text off the PDF
  2. paste into a new Word document
  3. go through the text, manually inserting a ` symbol (top left of standard keyboards) at the beginning of each paragraph. (NB this can be any symbol you like, e.g. @ ~ $ etc - but it must be one that isn't used legitimately within the text)
  4. click “edit” then “replace”
  5. make sure you’re in the “find what” field
  6. click “more” then “special”
  7. select “paragraph mark” (top of the list)
  8. click into the “replace with” field
  9. press the space bar once
  10. click “replace all”
  11. click “edit” then “replace”
  12. make sure you’re in the “find what” field
  13. type a ` character (or whatever symbol you chose to denote the start of a paragraph in step 3)
  14. click into the “replace with” field
  15. click “special” (if this isn't showing, click the “more” button first)
  16. click “paragraph mark” TWICE
  17. click “replace all”

Now, all instances of your ` (or other chosen symbol) will have been replaced with two line breaks, reproducing your paragraphs.
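The marker technique can also be sketched as a short Python script. As before, this is an illustrative assumption rather than part of the Word process: the function name `join_lines_keep_paragraphs` is invented for this example, and it assumes you have already typed the marker at the start of each paragraph, as in step 3. One small difference from the Word steps is that it does not leave blank lines before the first paragraph.

```python
import re


def join_lines_keep_paragraphs(text: str, marker: str = "`") -> str:
    """Join broken lines, then turn each marker into a paragraph break."""
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # First pass: replace every run of line breaks with a single space.
    flat = re.sub(r"[ \t]*\n+[ \t]*", " ", text)
    # Second pass: split at the marker and rejoin with two line breaks,
    # dropping the empty piece that sits before the first marker.
    paragraphs = [part.strip() for part in flat.split(marker)]
    return "\n\n".join(p for p in paragraphs if p)


if __name__ == "__main__":
    pasted = (
        "`First paragraph, broken\n"
        "across two lines.\n"
        "`Second paragraph, also\n"
        "broken."
    )
    print(join_lines_keep_paragraphs(pasted))
```

As with the Word version, the marker must be a character that does not appear legitimately anywhere in the text, otherwise you will get paragraph breaks where you did not want them.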

Richard Cooper is Web & New Media Development Manager for IOM3. Twitter: @iom3webmanager