
Injecting code into pages with PHP DOMDocument

September 30, 2014 Difficulty: 25 / 50

Many content management systems let developers alter the content of a web page through plugins or other types of extensions. In today's tutorial I will show you how an article that is loaded from an external source can be modified using the DOMDocument class in PHP.

DOMDocument is part of the DOM PHP extension, which allows operations on XML / HTML documents through its API. A DOMDocument object represents an entire HTML or XML document. We will be loading it from an existing URL or file. If you are loading the HTML from a database, just replace the loadHTMLFile() method with loadHTML($html_string), which takes an HTML string as its parameter.
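As a rough illustration of the loadHTML() variant, here is a minimal sketch. The $html_string value is a made-up placeholder standing in for whatever your database returns.

```php
<?php
// Hypothetical HTML string; in practice this might come from a database column.
$html_string = '<html><body><h3>First title</h3><p>Some text</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html_string); // parse the string instead of a URL or file

// Read back the first <h3> to confirm the string was parsed.
$first_h3 = $dom->getElementsByTagName('h3')->item(0);
echo $first_h3->nodeValue, "\n"; // prints "First title"
```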

One other thing that should be mentioned is that if you are dealing with HTML instead of XML, the string doesn't need to be well formed. However, if DOMDocument encounters bad markup while loading the HTML string, it will spit out a lot of warnings, and we all know that there's a lot of bad HTML out there. So I suggest you begin your script with libxml_use_internal_errors(true), which stops libxml from emitting those warnings and stores the errors internally instead (you can still read them with libxml_get_errors()). Now let's have a look at how you can load and alter an HTML document using DOM functions in PHP.
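If you want to inspect the suppressed errors rather than discard them, something along these lines should work. The markup in $bad_html is invented for this sketch, and which problems get reported depends on the libxml version installed, so the list may even be empty.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical sloppy markup: a stray closing tag and an unclosed <p>.
$bad_html = '<html><body></div><p>Unclosed paragraph</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($bad_html);

// Each entry is a LibXMLError object exposing level, code, line and message.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo 'Line ', $error->line, ': ', trim($error->message), "\n";
}

// Empty the buffer so errors don't pile up across documents.
libxml_clear_errors();
```

Clearing the buffer after each document keeps memory use flat when a script processes many pages in one run.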

  
    <?php
      libxml_use_internal_errors(true); // keep libxml warnings about bad markup out of the output
      $dom = new DOMDocument(); // create an empty document

      // if you are getting a string from the database then use loadHTML($htmlcode)
      $dom->loadHTMLFile("http://www.codepunker.com"); // load the codepunker front page
      $all_h3s = $dom->getElementsByTagName('h3'); // get all h3 tags from the document
      foreach ($all_h3s as $h3) {
        // append a text node to every h3 in the DOM
        $h3->appendChild( $dom->createTextNode(' (This text was modified) ') );
      }
      $out = $dom->saveHTML(); // serialize the modified document back to an HTML string
      var_dump( $out );
    ?>
  

The above code loads the HTML generated by www.codepunker.com into the $dom variable, then it collects all <h3> elements into a traversable object - an instance of the "DOMNodeList" class. Going through each element, the code creates a new text node and appends it to the current <h3>. At the very end the document is serialized back into an HTML string and the result is sent to the browser.

Another cool DOM-related method is DOMDocument::getElementById. It returns a DOMElement object (or null when no element has that id), which in turn can be used to "get" or "set" attributes using getAttribute() and setAttribute() respectively. The DOMElement can also be altered with methods inherited from DOMNode - appendChild() being the most commonly used.
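Here is a small sketch of those calls working together. The markup, the "intro" id and the class names are all invented for the example.

```php
<?php
// Hypothetical markup; the id "intro" and the class names are made up for this sketch.
$html = '<html><body><div id="intro" class="old">Hello</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$intro = $dom->getElementById('intro'); // DOMElement, or null when no such id exists
if ($intro !== null) {
    $old_class = $intro->getAttribute('class'); // "old"
    $intro->setAttribute('class', 'new');       // overwrite the attribute value
    // appendChild() is inherited from DOMNode
    $intro->appendChild($dom->createTextNode(' world'));
}

// Passing a node to saveHTML() serializes just that element (PHP 5.3.6+).
echo $dom->saveHTML($intro), "\n";
```

Checking for null before touching the element avoids a fatal error on pages where the id is missing.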

Another important use case is the ability to retrieve the innerHTML or innerText from a DOMElement or DOMNode. If you simply want to get the text inside a tag, the DOMNode::nodeValue property comes in handy, but if you want to "mimic" the functionality that innerHTML has in JavaScript - load all child tags and text nodes - then you can use a helper function like the one below (thanks to Junior).

  
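  <?php
    // Returns the markup of all children of $element, similar to innerHTML in JavaScript.
    function DOMinnerHTML($element)
    {
      $innerHTML = "";
      $children = $element->childNodes;
      foreach ($children as $child)
      {
          // copy each child (with its descendants) into a temporary document and serialize it
          $tmp_dom = new DOMDocument();
          $tmp_dom->appendChild($tmp_dom->importNode($child, true));
          $innerHTML .= trim($tmp_dom->saveHTML());
      }
      return $innerHTML;
    }

    libxml_use_internal_errors(true); // keep libxml warnings out of the output
    $dom = new DOMDocument();
    $dom->loadHTMLFile("someHTML.html");
    $all_h3s = $dom->getElementsByTagName('h3');
    foreach ($all_h3s as $h3) {
      var_dump(DOMinnerHTML($h3)); // dump the inner markup of every h3
    }
  ?>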
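
As a possible alternative to the helper above, DOMDocument::saveHTML() accepts a node argument since PHP 5.3.6, so the temporary document can be skipped. This is only a sketch under that assumption; the function name innerHTMLViaSaveHTML and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper name; relies on saveHTML() accepting a node (PHP 5.3.6+).
function innerHTMLViaSaveHTML(DOMNode $element)
{
    $innerHTML = '';
    foreach ($element->childNodes as $child) {
        // Serialize each child using the document that owns it.
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><h3>Title with <em>emphasis</em></h3></body></html>');
$h3 = $dom->getElementsByTagName('h3')->item(0);

echo $h3->nodeValue, "\n";            // text only: "Title with emphasis"
echo innerHTMLViaSaveHTML($h3), "\n"; // child markup included
```

Unlike the helper above, this version does not trim each piece, so whitespace between child nodes is kept as it appears in the document.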
  