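I have the following problem. I have a PDF with an XML file attached to it as an annotation: not as an embedded file, but as an annotation. I tried to read it using the code from the following link: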

iTextSharp - how to open/read/extract a file attachment?

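It works for embedded files, but not for file attachments stored as annotations.

I searched Google for extracting annotations from a PDF and found the following link: Reading PDF Annotations with iText

So the annotation type is "File Attachment Annotation".

Can someone show a working example?

Thanks in advance for any help.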

Please refer to the following approach:

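As so often with questions about iText and iTextSharp, one should first look at the keyword list on itextpdf.com. There, the entry "File attachment, extract attachments" referenced two Java samples from iText in Action, 2nd Edition.

The old keyword list no longer exists; the itextpdf.com site now offers other ways to search for samples, but I will not describe them, in case the site changes again and I end up with dead links once more...

The relevant iText samples based on iText in Action, Second Edition are: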

  • part4.chapter16.KubrickDvds
  • part4.chapter16.KubrickDocumentary

Here are the Samples from iText5.

(I have not yet found ports of these samples to .NET with iText 7, but judging by other sources, such a port should not be too difficult...)

KubrickDvds contains the following method, extractAttachments/ExtractAttachments, for extracting file attachment annotations:

Java, iText 5.x:

/** 
 * Extracts attachments from an existing PDF. 
 * @param src   the path to the existing PDF 
 */ 
public void extractAttachments(String src) throws IOException { 
    PdfReader reader = new PdfReader(src); 
    PdfArray array; 
    PdfDictionary annot; 
    PdfDictionary fs; 
    PdfDictionary refs; 
    for (int i = 1; i <= reader.getNumberOfPages(); i++) { 
        array = reader.getPageN(i).getAsArray(PdfName.ANNOTS); 
        if (array == null) continue; 
        for (int j = 0; j < array.size(); j++) { 
            annot = array.getAsDict(j); 
            if (PdfName.FILEATTACHMENT.equals(annot.getAsName(PdfName.SUBTYPE))) { 
                fs = annot.getAsDict(PdfName.FS); 
                refs = fs.getAsDict(PdfName.EF); 
                for (PdfName name : refs.getKeys()) { 
                    FileOutputStream fos 
                        = new FileOutputStream(String.format(PATH, fs.getAsString(name).toString())); 
                    fos.write(PdfReader.getStreamBytes((PRStream)refs.getAsStream(name))); 
                    fos.flush(); 
                    fos.close(); 
                } 
            } 
        } 
    } 
    reader.close(); 
} 
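
As a side note, the Java sample above builds the output path with String.format(PATH, ...) from a name read out of the PDF itself, and it closes the stream by hand. A minimal sketch of writing the bytes a bit more defensively, using only the JDK, might look like this. AttachmentWriter, safeName, write, and the fallback name attachment.bin are hypothetical names chosen for illustration; none of them come from iText or from the book samples.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class AttachmentWriter {

    /**
     * Reduces a name taken from the PDF to a bare file name, so that a value
     * such as "../../evil.xml" cannot point outside the output directory.
     */
    static String safeName(String raw) {
        // Treat both separator styles as path separators, whatever the host OS is.
        String normalized = raw.replace('\\', '/');
        String base = normalized.substring(normalized.lastIndexOf('/') + 1);
        if (base.isEmpty() || base.equals(".") || base.equals("..")) {
            return "attachment.bin"; // hypothetical fallback name
        }
        return base;
    }

    /** Writes the bytes below outputDir and returns the path that was written. */
    static Path write(Path outputDir, String rawName, byte[] data) throws IOException {
        Files.createDirectories(outputDir);
        Path target = outputDir.resolve(safeName(rawName));
        // Files.write opens, writes, and closes the file in one call.
        return Files.write(target, data);
    }
}
```

In the extraction loop, the bytes returned by PdfReader.getStreamBytes(...) would be passed as data and fs.getAsString(name).toString() as rawName.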

Java, iText 7.x:

public void extractAttachments(String src) throws IOException {
    // One PdfDocument is enough; it owns the PdfReader and closes it.
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src));
    PdfArray array;
    PdfDictionary annot;
    PdfDictionary fs;
    PdfDictionary refs;
    for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
        // Annotations are stored per page in the /Annots array.
        array = pdfDoc.getPage(i).getPdfObject().getAsArray(PdfName.Annots);
        if (array == null) continue;
        for (int j = 0; j < array.size(); j++) {
            annot = array.getAsDictionary(j);
            if (PdfName.FileAttachment.equals(annot.getAsName(PdfName.Subtype))) {
                // /FS is the file specification, /EF holds the embedded file streams.
                fs = annot.getAsDictionary(PdfName.FS);
                refs = fs.getAsDictionary(PdfName.EF);
                for (PdfName name : refs.keySet()) {
                    FileOutputStream fos
                            = new FileOutputStream(String.format(PATH, fs.getAsString(name).toString()));
                    fos.write(refs.getAsStream(name).getBytes());
                    fos.flush();
                    fos.close();
                }
            }
        }
    }
    pdfDoc.close();
}

C#, iText 5.x:

/** 
 * Extracts attachments from an existing PDF. 
 * @param src the path to the existing PDF 
 * @param zip the ZipFile object to add the extracted files to 
 */ 
public void ExtractAttachments(byte[] src, ZipFile zip) { 
  PdfReader reader = new PdfReader(src); 
  for (int i = 1; i <= reader.NumberOfPages; i++) { 
    PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS); 
    if (array == null) continue; 
    for (int j = 0; j < array.Size; j++) { 
      PdfDictionary annot = array.GetAsDict(j); 
      if (PdfName.FILEATTACHMENT.Equals( 
          annot.GetAsName(PdfName.SUBTYPE))) 
      { 
        PdfDictionary fs = annot.GetAsDict(PdfName.FS); 
        PdfDictionary refs = fs.GetAsDict(PdfName.EF); 
        foreach (PdfName name in refs.Keys) { 
          zip.AddEntry( 
            fs.GetAsString(name).ToString(),  
            PdfReader.GetStreamBytes((PRStream)refs.GetAsStream(name)) 
          ); 
        } 
      } 
    } 
  } 
} 

KubrickDocumentary contains the following method, extractDocLevelAttachments/ExtractDocLevelAttachments, for extracting document-level attachments:

Java, iText 5.x:

/** 
 * Extracts document level attachments 
 * @param filename     a file from which document level attachments will be extracted 
 * @throws IOException 
 */ 
public void extractDocLevelAttachments(String filename) throws IOException { 
    PdfReader reader = new PdfReader(filename); 
    PdfDictionary root = reader.getCatalog(); 
    PdfDictionary documentnames = root.getAsDict(PdfName.NAMES); 
    PdfDictionary embeddedfiles = documentnames.getAsDict(PdfName.EMBEDDEDFILES); 
    PdfArray filespecs = embeddedfiles.getAsArray(PdfName.NAMES); 
    PdfDictionary filespec; 
    PdfDictionary refs; 
    FileOutputStream fos; 
    PRStream stream; 
    for (int i = 0; i < filespecs.size(); ) { 
      filespecs.getAsString(i++); 
      filespec = filespecs.getAsDict(i++); 
      refs = filespec.getAsDict(PdfName.EF); 
      for (PdfName key : refs.getKeys()) { 
        fos = new FileOutputStream(String.format(PATH, filespec.getAsString(key).toString())); 
        stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key)); 
        fos.write(PdfReader.getStreamBytes(stream)); 
        fos.flush(); 
        fos.close(); 
      } 
    } 
    reader.close(); 
} 

Java, iText 7.x:

public void extractDocLevelAttachments(String src) throws IOException { 
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src)); 
    PdfDictionary root = pdfDoc.getCatalog().getPdfObject(); 
    PdfDictionary documentnames = root.getAsDictionary(PdfName.Names); 
    PdfDictionary embeddedfiles = documentnames.getAsDictionary(PdfName.EmbeddedFiles); 
    PdfArray filespecs = embeddedfiles.getAsArray(PdfName.Names); 
    PdfDictionary filespec; 
    PdfDictionary refs; 
    FileOutputStream fos; 
    PdfStream stream; 
    for (int i = 0; i < filespecs.size(); ) { 
        filespecs.getAsString(i++); 
        filespec = filespecs.getAsDictionary(i++); 
        refs = filespec.getAsDictionary(PdfName.EF); 
        for (PdfName key : refs.keySet()) { 
            fos = new FileOutputStream(String.format(PATH, filespec.getAsString(key).toString())); 
            stream = refs.getAsStream(key); 
            fos.write(stream.getBytes()); 
            fos.flush(); 
            fos.close(); 
        } 
    } 
    pdfDoc.close(); 
} 
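
One caveat on the document-level Java samples: they read the /Names array directly from the EmbeddedFiles dictionary. According to the PDF specification, a name tree may instead spread its entries over child nodes listed under /Kids, and in that case the array is absent at the root. The following sketch only illustrates the traversal. Plain java.util maps and lists stand in for PdfDictionary and PdfArray, and NameTreeWalk and collect are hypothetical names, not iText API.

```java
import java.util.List;
import java.util.Map;

final class NameTreeWalk {

    /**
     * Collects all (name, value) pairs at or below a name tree node.
     * A node is modelled as a Map with an optional "Names" list of the form
     * [key1, value1, key2, value2, ...] and an optional "Kids" list of child nodes.
     */
    @SuppressWarnings("unchecked")
    static void collect(Map<String, Object> node, Map<String, Object> out) {
        List<Object> names = (List<Object>) node.get("Names");
        if (names != null) {
            // Entries come in pairs, hence the step of 2.
            for (int i = 0; i + 1 < names.size(); i += 2) {
                out.put((String) names.get(i), names.get(i + 1));
            }
        }
        List<Map<String, Object>> kids = (List<Map<String, Object>>) node.get("Kids");
        if (kids != null) {
            for (Map<String, Object> kid : kids) {
                collect(kid, out); // descend into child nodes
            }
        }
    }
}
```

With real iText objects, the same recursion would replace the map lookups with getAsArray/getAsDictionary calls on the corresponding names.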

C#, iText 5.x:

/** 
 * Extracts document level attachments 
 * @param pdf the PDF from which document level attachments will be extracted 
 * @param zip the ZipFile object to add the extracted files to 
 */ 
public void ExtractDocLevelAttachments(byte[] pdf, ZipFile zip) { 
  PdfReader reader = new PdfReader(pdf); 
  PdfDictionary root = reader.Catalog; 
  PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES); 
  PdfDictionary embeddedfiles =  
      documentnames.GetAsDict(PdfName.EMBEDDEDFILES); 
  PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES); 
  for (int i = 0; i < filespecs.Size; ) { 
    filespecs.GetAsString(i++); 
    PdfDictionary filespec = filespecs.GetAsDict(i++); 
    PdfDictionary refs = filespec.GetAsDict(PdfName.EF); 
    foreach (PdfName key in refs.Keys) { 
      PRStream stream = (PRStream) PdfReader.GetPdfObject( 
        refs.GetAsIndirectObject(key) 
      ); 
      zip.AddEntry( 
        filespec.GetAsString(key).ToString(),  
        PdfReader.GetStreamBytes(stream) 
      ); 
    } 
  } 
} 

(For some reason the C# samples put the extracted files into a ZIP file, whereas the Java versions write them to the file system... oh well...)
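
If the ZIP behaviour of the C# samples is wanted on the Java side as well, the JDK's java.util.zip package can provide it. Here is a minimal sketch, assuming the attachment names and bytes have already been collected into a map by one of the extraction methods above. AttachmentZip and toZip are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class AttachmentZip {

    /** Packs name -> content pairs into a ZIP archive held in memory. */
    static byte[] toZip(Map<String, byte[]> attachments) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : attachments.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } // closing the stream finishes the archive
        return buffer.toByteArray();
    }
}
```

Because the map keys are unique, no entry name is added twice, which ZipOutputStream would otherwise reject with a ZipException.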

