Joel Verhagen

a computer programming blog

Visiting the DOM with CefGlue

tl;dr

Take a look at the whole example project. I wrote it to be as self-explanatory as possible. This code uses the same versions of all of the dependencies as my previous CEF post.

Instructions

This post is a follow-up to my previous post. If you are not familiar with CefGlue, then you should probably go read that post first!

A commenter on my previous CefGlue post mentioned that the example I linked to about visiting the WebKit DOM (i.e. the actual in-memory tree) was not very helpful. Well, maybe there's something I can do about it! I whipped together a fun little demo which dumps the top 30 Hacker News titles to a text file.

The main difficulty when accessing a rendered page's DOM is that you can only do so in the same process as the associated renderer for that page. Remember, Chromium (and by extension, CEF and CefGlue) uses multiple processes to run the browser. Therefore, the basic flow that you will have to follow is:

  1. Send a CefProcessMessage to the renderer process.
  2. Receive the message in the renderer process.
  3. Point a CefDomVisitor at a CefFrame.

Send the Message

To send a message to a renderer process, you need a reference to a CefBrowser instance. This can be easily obtained from the OnLoadEnd callback in your CefLoadHandler subclass. Remember, you inject this specific handler in your CefClient subclass.

Sending this message gets the ball rolling: it hands the job off to the renderer process, which does the actual DOM work.

internal class DemoCefLoadHandler : CefLoadHandler
{
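    protected override void OnLoadEnd(CefBrowser browser, CefFrame frame, int httpStatusCode)
    {
        // OnLoadEnd fires once per frame, so only react to the main frame.
        if (!frame.IsMain)
        {
            return;
        }

        // Ask the renderer process to start visiting the DOM.
        browser.SendProcessMessage(
            CefProcessId.Renderer,
            CefProcessMessage.Create("GetHackerNewsTitles")
        );
    }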
}

Note the CefProcessId.Renderer argument: it sets the destination process for the message.

Receive the Message

To receive the message, override the GetRenderProcessHandler method on your CefApp subclass. The method must return a CefRenderProcessHandler, so you need to create your own subclass and override its OnProcessMessageReceived method.

internal class DemoCefApp : CefApp
{
    private readonly DemoCefRenderProcessHandler _renderProcessHandler;

    public DemoCefApp()
    {
        _renderProcessHandler = new DemoCefRenderProcessHandler();
    }

    protected override CefRenderProcessHandler GetRenderProcessHandler()
    {
        return _renderProcessHandler;
    }
}

internal class DemoCefRenderProcessHandler : CefRenderProcessHandler
{
    protected override bool OnProcessMessageReceived(CefBrowser browser, CefProcessId sourceProcess, CefProcessMessage message)
    {
        // This code runs in the renderer process.
        // Return true once the message has been handled.
        return false;
    }
}

Visit the DOM

Inside the OnProcessMessageReceived method, point a CefDomVisitor instance at a frame. In most cases, you will point the visitor at the main frame (browser.GetMainFrame()).

internal class DemoCefRenderProcessHandler : CefRenderProcessHandler
{
    protected override bool OnProcessMessageReceived(CefBrowser browser, CefProcessId sourceProcess, CefProcessMessage message)
    {
        if (message.Name == "GetHackerNewsTitles")
        {
            CefFrame mainFrame = browser.GetMainFrame();
            mainFrame.VisitDom(new DemoCefDomVisitor());
            return true;
        }

        return false;
    }
}

internal class DemoCefDomVisitor : CefDomVisitor
{
    protected override void Visit(CefDomDocument document)
    {
        File.WriteAllLines(
            "HackerNewsTitles.txt",
            GetHackerNewsTitles(document.Root)
        );
    }

    private IEnumerable<string> GetHackerNewsTitles(CefDomNode node)
    {
        if (IsHackerNewsTitle(node))
        {
            yield return node.FirstChild.InnerText;
        }

        CefDomNode child = node.FirstChild;
        while (child != null)
        {
            foreach (string title in GetHackerNewsTitles(child))
            {
                yield return title;
            }
            child = child.NextSibling;
        }
    }

    private bool IsHackerNewsTitle(CefDomNode node)
    {
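        // Require at least two child nodes; the null check on FirstChild
        // avoids a NullReferenceException on empty cells.
        return
            node.NodeType == CefDomNodeType.Element &&
            node.ElementTagName == "TD" &&
            node.HasAttribute("class") &&
            node.GetAttribute("class") == "title" &&
            node.FirstChild != null &&
            node.FirstChild.NextSibling != null;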
    }
}
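
If you want to poke at the traversal logic without spinning up CEF, here is a minimal sketch of the same first-child/next-sibling walk over a stand-in node type. SketchNode and TitleWalker are hypothetical names made up for this illustration and are not part of CefGlue; the sketch only assumes a node exposes a tag name, a class, some text, FirstChild, and NextSibling.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for CefDomNode: only the members the walk needs.
public sealed class SketchNode
{
    public string TagName { get; }
    public string CssClass { get; }
    public string Text { get; }
    public SketchNode FirstChild { get; private set; }
    public SketchNode NextSibling { get; private set; }

    public SketchNode(string tagName, string cssClass = "", string text = "")
    {
        TagName = tagName;
        CssClass = cssClass;
        Text = text;
    }

    // Appends a child to the end of the sibling chain; returns this node for chaining.
    public SketchNode Add(SketchNode child)
    {
        if (FirstChild == null)
        {
            FirstChild = child;
        }
        else
        {
            SketchNode last = FirstChild;
            while (last.NextSibling != null)
            {
                last = last.NextSibling;
            }
            last.NextSibling = child;
        }
        return this;
    }
}

public static class TitleWalker
{
    // Depth-first walk in document order, yielding the text of the first
    // child of every cell that looks like a title cell.
    public static IEnumerable<string> GetTitles(SketchNode node)
    {
        if (IsTitleCell(node))
        {
            yield return node.FirstChild.Text;
        }

        for (SketchNode child = node.FirstChild; child != null; child = child.NextSibling)
        {
            foreach (string title in GetTitles(child))
            {
                yield return title;
            }
        }
    }

    // A title cell is a TD with class "title" and at least two children.
    private static bool IsTitleCell(SketchNode node)
    {
        return
            node.TagName == "TD" &&
            node.CssClass == "title" &&
            node.FirstChild != null &&
            node.FirstChild.NextSibling != null;
    }
}
```

Build a row with a single-child rank cell and a two-child title cell, pass it to TitleWalker.GetTitles, and only the link text from the title cell comes back.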

Final Notes

I would recommend doing as much work as you can in the renderer process instead of constantly relaying things back and forth between the processes. Although process messages are in-memory, there is still a cost (especially if you are sending large blobs of data).
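
If you do need to ship results to another process, one way to keep the chatter down is to batch everything into a single payload and send it once. What follows is a rough sketch, not code from the example project: TitlePayload is a hypothetical helper, and it assumes titles never contain a newline, since that is the separator.

```csharp
using System.Collections.Generic;

// Hypothetical helper: packs many titles into one string so that a single
// message can carry all of them, instead of one message per title.
public static class TitlePayload
{
    // Assumption: no title contains a newline character.
    private const char Separator = '\n';

    public static string Pack(IEnumerable<string> titles)
    {
        return string.Join(Separator.ToString(), titles);
    }

    public static string[] Unpack(string payload)
    {
        // An empty payload is treated as "no titles".
        return payload.Length == 0 ? new string[0] : payload.Split(Separator);
    }
}
```

The renderer side would call Pack once after the DOM visit, and the receiving side would call Unpack on the single string it gets.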

Also, remember you can use Visual Studio to debug the non-primary CEF processes by manually attaching your debugger.