java.lang.Object
org.apache.pdfbox.util.PDFStreamEngine
org.apache.pdfbox.util.PDFTextStripper
com.java4less.pdf.PDFToTextConverter
public class PDFToTextConverter extends org.apache.pdfbox.util.PDFTextStripper
This class converts a PDF file to text. It works only if the PDF file contains text and not just an image. For example, some scanners and fax machines create PDF files that contain only an image of the scanned page. Such files cannot be converted to text.
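Because image-only PDFs cannot be converted, a caller may want to detect that case before using the result. This documentation does not state what `convertToString` returns for such files. The sketch below assumes the result is null, empty, or whitespace-only, and the class and helper names (`ImageOnlyCheck`, `hasNoText`) are hypothetical.

```java
class ImageOnlyCheck {

    /**
     * Hypothetical helper: returns true when the extracted text contains
     * no visible characters. Assumption: an image-only PDF yields null,
     * empty, or whitespace-only text (not confirmed by this documentation).
     */
    static boolean hasNoText(String extracted) {
        return extracted == null || extracted.trim().isEmpty();
    }

    public static void main(String[] args) {
        // In real use, the string would be the value returned by
        // PDFToTextConverter.convertToString(java.io.InputStream).
        System.out.println(hasNoText("  \n\n "));
        System.out.println(hasNoText("Invoice 42"));
    }
}
```

If the check reports no text, the file most likely needs OCR, which this class does not provide.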
Constructor Summary |
---|
PDFToTextConverter() |
Method Summary | |
---|---|
java.lang.String | convertToString(java.io.InputStream is): converts a PDF input stream to text. |
int | getPageColumns(): returns the number of characters per line in the text output. |
boolean | isAddEmptyLines(): returns whether empty lines are kept; if false, empty lines are removed. |
boolean | isPreserveSpaces(): returns whether spaces are kept; if false, spaces are removed. |
void | setAddEmptyLines(boolean addEmptyLines): if false, empty lines are removed. |
void | setPageColumns(int pageColumns): sets the number of characters per line in the text output. |
void | setPreserveSpaces(boolean preserveSpaces): if false, spaces are removed. |
Methods inherited from class org.apache.pdfbox.util.PDFTextStripper |
---|
getAverageCharTolerance, getEndBookmark, getEndPage, getLineSeparator, getPageSeparator, getSpacingTolerance, getStartBookmark, getStartPage, getText, getText, getWordSeparator, inspectFontEncoding, resetEngine, setAverageCharTolerance, setEndBookmark, setEndPage, setLineSeparator, setPageSeparator, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, shouldSeparateByBeads, shouldSortByPosition, shouldSuppressDuplicateOverlappingText, writeText, writeText |
Methods inherited from class org.apache.pdfbox.util.PDFStreamEngine |
---|
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, processEncodedText, processOperator, processStream, processSubStream, registerOperatorProcessor, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public PDFToTextConverter() throws java.io.IOException
Throws: java.io.IOException
Method Detail |
---|
public java.lang.String convertToString(java.io.InputStream is) throws java.io.IOException
Throws: java.io.IOException
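Since `convertToString` takes an `InputStream` and can throw `IOException`, the caller has to decide who closes the stream. This documentation does not say whether the method closes it, so the sketch below closes it explicitly with try-with-resources. The `PdfToText` interface, the `ConvertSketch` class, and the `convertAndClose` helper are hypothetical; the interface only mirrors the documented signature so the example runs without the library on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ConvertSketch {

    /** Hypothetical stand-in with the same shape as convertToString(InputStream). */
    interface PdfToText {
        String convertToString(InputStream is) throws IOException;
    }

    /** Runs the conversion and closes the stream afterwards, even on failure. */
    static String convertAndClose(PdfToText converter, InputStream in) throws IOException {
        try (InputStream pdf = in) {
            return converter.convertToString(pdf);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stub converter so the sketch runs on its own. With the library
        // available, one could pass a method reference instead, e.g.
        // new PDFToTextConverter()::convertToString (the constructor is
        // documented to throw IOException as well).
        PdfToText stub = is -> "stub text, " + is.available() + " bytes";
        System.out.println(convertAndClose(stub, new ByteArrayInputStream(new byte[3])));
    }
}
```

Passing the converter as a parameter also makes the surrounding code easy to test without real PDF files.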
public int getPageColumns()
public void setPageColumns(int pageColumns)
Parameters: pageColumns
public boolean isPreserveSpaces()
public void setPreserveSpaces(boolean preserveSpaces)
Parameters: preserveSpaces
public boolean isAddEmptyLines()
public void setAddEmptyLines(boolean addEmptyLines)
Parameters: addEmptyLines