Converting InfoPath to PDF in BizTalk

by eliasen 10. October 2009 14:32

Hi all

So, the other day I had this requirement for a BizTalk pipeline component:

Take an InfoPath form and convert it into a PDF that is to be sent out via email. This seemed easy enough. I searched a bit, and found that three simple steps were needed:

  1. Install this: 2007 Microsoft Office Add-in: Microsoft Save as PDF 
  2. In my code, reference Microsoft.Office.InfoPath.dll and Microsoft.Office.InfoPath.FormControl.dll
  3. Write these lines of code:
     FormControl formControl = new FormControl();
     formControl.Open(pInMsg.Data);
     string output = Path.GetTempFileName();
     formControl.XmlForm.CurrentView.Export(output, Microsoft.Office.InfoPath.ExportFormat.Pdf);

Of course, this would also mean writing some code to read the PDF file back in and then create the output message. But hey, that was just the price I had to pay.

BUT… I was being naive… As the more clever of my readers have probably already realized, if something is called FORMcontrol, then it is for programs that have a UI. The code crashed big time at runtime with some ActiveX exception :-(

Then I remembered that I have a colleague who had previously told me that she had done this at some point, so I emailed her for her code.

Unfortunately, her code involved taking the form, extracting the XSL from the XSN file, performing a transformation on the XML using the XSL to generate HTML, and then using some utility to convert this into PDF. This was more complex than I had hoped, but I saw no other way. Unfortunately, her code had this line in it:

    StreamReader stream = new StreamReader(XmlFormView.XmlForm.Template.OpenFileFromPackage("View1.xsl"));

which, as you might have guessed, also requires a UI; in this case it is used in a web application. So no go.

So, it seems that I will have to do a lot of dirty work myself :-(

This turned into quite a list of subtasks:

  • Take the XML document that comes through the pipeline component
  • Take the value of the processing instruction called “mso-infoPathSolution”. This processing instruction is always present in an InfoPath form and it looks something like this:
    <?mso-infoPathSolution solutionVersion="1.0.0.2" productVersion="12.0.0" PIVersion="1.0.0.0" href="http://path.to/form.xsn" name="urn:schemas-microsoft-com:office:infopath:MyForm:-myXSD-2009-09-21T15-43-10" ?>
  • Take the value of the href “attribute” that is in the value of the processing instruction. The href is a URI that points to the XSN that this XML is an instance of, you see.
  • Get the XSN file that is located at the URI.
  • Extract the XSL file that matches the view of the form you want to convert into PDF.
  • Perform the transformation
  • Convert into PDF

 

So I am now going from the few lines of code I was hoping for to a more complex solution… so let's look at the code:

First of all, I need the value of the processing instruction. This is easily done:

    // Assumed definition, not shown in the original listing: the marker that precedes the href value.
    private const string Href = "href=\"";

    private static string GetHrefFromXml(XmlDocument infoPathForm)
    {
        // Select the top-level processing instruction named mso-infoPathSolution.
        XmlNode piNode = infoPathForm.SelectSingleNode("/processing-instruction(\"mso-infoPathSolution\")");
        if (piNode != null && piNode is XmlProcessingInstruction)
        {
            var pi = (XmlProcessingInstruction)piNode;
            string href = pi.Value;
            int location = href.IndexOf(Href);
            if (location != -1)
            {
                // Keep everything after href=" and cut at the closing quote.
                href = href.Substring(location + Href.Length);
                href = href.Substring(0, href.IndexOf("\""));
                return href;
            }
            throw new ApplicationException("No href attribute was found in the processing instruction (mso-infoPathSolution). Without this, the location of the form cannot be detected and without the form no PDF can be generated.");
        }
        throw new ApplicationException("Required XML processing instruction (mso-infoPathSolution) not found. Without this, the location of the form cannot be detected and without the form no PDF can be generated.");
    }

The most annoying part is that the value of a processing instruction can be anything. In this case, it appears to be a list of attributes like “normal” XML, but since this is not guaranteed, there is no built-in support for getting the value of the href “attribute”. So I chose to use string manipulation to get the value.
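
If the data of the processing instruction really is well-formed as a list of attributes, as it is in the example above, one alternative to string manipulation is to wrap the data in a dummy element and let the XML parser read it. The following is only a sketch and not the code from the component: the names PseudoAttributeReader and GetPseudoAttribute are made up for illustration, and it assumes .NET 3.5 or later for System.Xml.Linq. XElement.Parse throws an XmlException if the data does not look like attributes.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the original component.
public static class PseudoAttributeReader
{
    // Returns the value of the named pseudo-attribute in the
    // processing-instruction data, or null if it is not present.
    public static string GetPseudoAttribute(string piData, string name)
    {
        // Wrap the data in a throwaway element so the XML parser
        // takes care of quoting and entity handling.
        XElement wrapper = XElement.Parse("<pi " + piData + " />");
        XAttribute attribute = wrapper.Attribute(name);
        return attribute == null ? null : attribute.Value;
    }
}
```

Called with pi.Value and the name "href", this returns http://path.to/form.xsn for the example processing instruction shown earlier.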

After getting the href, I need to get the XSN file from SharePoint Server, where the form is published. This turned out to be a challenge also.

My first approach was quite simple:

    private static byte[] GetFormByUrl(string href)
    {
        // Authenticate as the account the process runs under.
        using (var wc = new WebClient { Credentials = CredentialCache.DefaultCredentials })
        {
            return wc.DownloadData(href);
        }
    }

This turned out to be something silly, though. When SharePoint and Forms Server get a request for the XSN file, they assume someone is trying to fill out the form. So what I got back was the HTML that Forms Server sends to a user who wants to fill out the form. Then I thought I’d try to do this:

    private static byte[] GetFormByUrl(string href)
    {
        HttpWebRequest wr = (HttpWebRequest)WebRequest.Create(href);
        wr.AllowAutoRedirect = false; // tell .NET not to follow redirects
        using (WebResponse resp = wr.GetResponse())
        using (Stream stream = resp.GetResponseStream())
        using (MemoryStream ms = new MemoryStream())
        {
            byte[] buffer = new byte[1024];
            int bytes;
            // Stream.Read returns 0 at the end of the stream, never -1.
            while ((bytes = stream.Read(buffer, 0, buffer.Length)) > 0)
                ms.Write(buffer, 0, bytes);
            return ms.ToArray();
        }
    }

Basically, using an HttpWebRequest I could ask it to not redirect. This didn’t work either, since what I then got back was some HTML that basically just said that the page has moved. Bummer.

But then another colleague who apparently is better at searching than I am found out that I can add a noredirect parameter to my request that will instruct SharePoint to not redirect. This is different from my current approach because my current approach instructs .NET to not follow redirects, whereas this new approach instructs SharePoint to not ask me to redirect.

So I ended up with something as simple as this:

    private static byte[] GetFormByUrl(string href)
    {
        // noredirect=true asks SharePoint not to redirect the request.
        string url = href + "?noredirect=true";
        using (var wc = new WebClient { Credentials = CredentialCache.DefaultCredentials })
        {
            return wc.DownloadData(url);
        }
    }

Simple and beautiful :-)
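
One thing to keep in mind is that appending "?noredirect=true" assumes the href has no query string of its own. That held for my forms, but if it does not, the separator has to be "&" instead. A small sketch of that, where the names NoRedirectUrl and AppendNoRedirect are hypothetical and not part of the component:

```csharp
using System;

// Hypothetical helper, not part of the original component.
public static class NoRedirectUrl
{
    // Appends noredirect=true, using '&' when the URL already has a query string.
    public static string AppendNoRedirect(string href)
    {
        string separator = href.Contains("?") ? "&" : "?";
        return href + separator + "noredirect=true";
    }
}
```

For http://path.to/form.xsn this gives http://path.to/form.xsn?noredirect=true, the same URL as the code above builds.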

Now I have the XSN file and the next issue pops up, naturally: how do I get the XSL extracted from the XSN file? The XSN file is just a cabinet file with another extension, so I thought this must be easy. I found out it is not. I searched and searched and ended up finding all sorts of weird stuff where people used P/Invoke and whatnot. I am surprised that Microsoft has not added at least extraction functionality to the .NET Framework, but they haven’t.

I ended up doing this:

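    private static string ExtractCabFile(string cabFile)
    {
        // CreateTmp is a helper (not shown) that creates a directory in the user's Temp directory.
        string destDir = CreateTmp(true, "");

        // Shell, Folder and FolderItem come from the Shell32 COM library
        // (Microsoft Shell Controls and Automation).
        var sh = new Shell();
        Folder fldr = sh.NameSpace(destDir);
        // Copy every item in the cabinet file to the destination directory.
        foreach (FolderItem f in sh.NameSpace(cabFile).Items())
            fldr.CopyHere(f, 0);
        return destDir;
    }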

This code assumes that the XSN file has been written to a temporary file with the extension .CAB. This is very important, since the shell opens the .CAB file with its default program, which is then Explorer. After that, all files in the cabinet file are copied to “destDir”, which is just a directory created in the user’s Temp directory.
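
Writing the downloaded XSN to a temporary .CAB file could look something like this. It is a minimal sketch rather than the code from the component, and the names CabTempFile and WriteAsCab are hypothetical:

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original component.
public static class CabTempFile
{
    // Writes the XSN bytes to a uniquely named .cab file in the
    // user's Temp directory and returns the full path.
    public static string WriteAsCab(byte[] xsnBytes)
    {
        string fileName = Guid.NewGuid().ToString("N") + ".cab";
        string path = Path.Combine(Path.GetTempPath(), fileName);
        File.WriteAllBytes(path, xsnBytes);
        return path;
    }
}
```

The returned path is what would be passed to ExtractCabFile. The temporary file and the extracted directory should be deleted once the PDF has been generated.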

I am quite annoyed to have to go through all this, but that’s how things go sometimes.

So now I have found the href of the form, downloaded the form and extracted its files. Time for the transformation:

    private static MemoryStream PerformTransformation(XmlDocument xmldoc, string destDir, string view)
    {
        var transform = new XslCompiledTransform();
        // Load the XSL of the chosen view from the extracted XSN files.
        using (var stream = new StreamReader(Path.Combine(destDir, view + ".xsl")))
        using (XmlReader xmlReader = XmlReader.Create(stream))
        {
            transform.Load(xmlReader);
        }

        var outputMemStream = new MemoryStream();
        transform.Transform(xmldoc, null, outputMemStream);
        // Rewind so the caller can read the HTML from the start.
        outputMemStream.Seek(0, SeekOrigin.Begin);
        return outputMemStream;
    }

So just a normal XSLT transformation, resulting in some HTML that is returned in a stream.

After this, I need to convert it into PDF, which is really simple using a tool we bought for this:

    private static byte[] GetPdfFromHtml(Parameters param)
    {
        var pdfConverter = new PdfConverter
        {
            LicenseKey = "SomethingElse - You are not getting the correct License Key :-)"
        };

        // The base directory must end with a backslash.
        string baseDir = param.DestDir.EndsWith(@"\") ? param.DestDir : param.DestDir + @"\";
        byte[] pdfBytes = pdfConverter.GetPdfBytesFromHtmlStream(param.HtmlStream, Encoding.UTF8, baseDir);
        return pdfBytes;
    }

We are using the ExpertPDF library for this. The third parameter for the GetPdfBytesFromHtmlStream method call is the directory where the cabinet file was extracted to, since this is where all images used in the form are also kept and they are needed for the PDF to include them.

All in all, the component now works, but it turned out to be a lot more difficult than I had hoped.

As a last detail, I added a property to my pipeline component that the developer can use to decide which view to use for the transformation from XML to HTML.

The complete code for the pipeline component will not be available for download, since this was done for a customer, but I might do something a bit smaller and simpler and add it to my pipeline component collection later on.

--

eliasen

Comments (1) -

Mikael Sand
10/12/2009 6:19:25 AM #

This is soooo cool. Really! I can't wait to get an assignment like this.


About the author

Jan Eliasen is 37 years old, divorced and has 2 sons, Andreas (July 2004) and Emil (July 2006).

Jan has a masters degree in computer science and is currently employed at Logica Denmark as an IT architect.

Jan is a 6 times Microsoft MVP in BizTalk Server (not currently an MVP) and proud co-author of the BizTalk 2010 Unleashed book.
