Visiting the DOM with CefGlue
tl;dr
Take a look at the whole example project. I wrote it to be as self-explanatory as possible. This code uses the same versions of all of the dependencies as my previous CEF post.
Instructions
This post is a follow-up on a previous post. If you are not familiar with CefGlue, then you should probably go read that post first!
A commenter on my previous CefGlue post mentioned that the example I linked to about visiting the WebKit DOM (i.e. the actual in-memory tree) was not very helpful. Well, maybe there’s something I can do about it! I whipped up a fun little demo which dumps the top 30 Hacker News titles to a text file.
The main difficulty when accessing a rendered page’s DOM is that you can only do so in the same process as the associated renderer for that page. Remember, Chromium (and by extension, CEF and CefGlue) uses multiple processes to run the browser. Therefore, the basic flow that you will have to follow is:
- Send a CefProcessMessage to the renderer process.
- Receive the message in the render process.
- Point a CefDomVisitor to a CefFrame.
Send the Message
To send a message to a renderer process, you need a reference to a CefBrowser instance. This can be easily obtained from the OnLoadEnd callback in your CefLoadHandler subclass. Remember, you inject this specific handler in your CefClient subclass.
Sending the message gets the ball rolling: it queues work for another process to get’r’done.
Note the CefProcessId.Renderer argument. It sets the destination of the message.
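Here is a minimal sketch of the sending side. It assumes the older CefGlue API that this post targets, where SendProcessMessage lives on CefBrowser (newer CEF releases move it to CefFrame, so check your version). The class names and the "DumpTitles" message name are hypothetical and only there for illustration.

```csharp
using Xilium.CefGlue;

// Hypothetical load handler: called when a frame has finished loading.
internal sealed class DemoLoadHandler : CefLoadHandler
{
    protected override void OnLoadEnd(CefBrowser browser, CefFrame frame, int httpStatusCode)
    {
        // Only react to the top-level page, not to every iframe.
        if (!frame.IsMain)
        {
            return;
        }

        // "DumpTitles" is a made-up name; the renderer side must check for the same string.
        CefProcessMessage message = CefProcessMessage.Create("DumpTitles");

        // CefProcessId.Renderer routes the message to the render process of this browser.
        browser.SendProcessMessage(CefProcessId.Renderer, message);
    }
}

// Hypothetical client that injects the load handler above.
internal sealed class DemoClient : CefClient
{
    private readonly CefLoadHandler _loadHandler = new DemoLoadHandler();

    protected override CefLoadHandler GetLoadHandler()
    {
        return _loadHandler;
    }
}
```

The message here carries nothing but its name, which is enough to tell the renderer what to do.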
Receive the Message
To receive the message, you need to override the GetRenderProcessHandler method on your CefApp subclass. The method requires that a CefRenderProcessHandler is returned. This means you need to create your own subclass and override the OnProcessMessageReceived method.
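The receiving side could be sketched as follows. Again, this assumes the older three-argument OnProcessMessageReceived signature (newer CEF versions add a CefFrame parameter), and the class names and the "DumpTitles" message name are hypothetical.

```csharp
using Xilium.CefGlue;

// Hypothetical handler that runs inside the render process.
internal sealed class DemoRenderProcessHandler : CefRenderProcessHandler
{
    protected override bool OnProcessMessageReceived(
        CefBrowser browser,
        CefProcessId sourceProcess,
        CefProcessMessage message)
    {
        // Match on the (made-up) name chosen by the sender.
        if (message.Name == "DumpTitles")
        {
            // The DOM work for this page goes here.
            return true; // the message was handled
        }

        return false; // not ours; let CEF treat it as unhandled
    }
}

// Hypothetical CefApp subclass that hands the handler above to CEF.
internal sealed class DemoApp : CefApp
{
    private readonly CefRenderProcessHandler _renderProcessHandler = new DemoRenderProcessHandler();

    protected override CefRenderProcessHandler GetRenderProcessHandler()
    {
        return _renderProcessHandler;
    }
}
```

Returning true or false tells CEF whether the message was handled, so it is worth checking the name rather than accepting everything.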
Visit the DOM
Inside the OnProcessMessageReceived method, point a CefDomVisitor instance to a frame. In most cases, you will be pointing the visitor to the main frame (browser.GetMainFrame()).
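A visitor could look roughly like the sketch below. It is not the demo project's actual visitor: it simply collects the text of every anchor element, and the class names and the "link-texts.txt" file name are hypothetical. The CefDomNode members used here (IsElement, ElementTagName, InnerText, FirstChild, NextSibling) are the ones I would expect from CefGlue, but double-check them against the version you are using.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Xilium.CefGlue;

// Hypothetical visitor: gathers the text of every <a> element in the document.
internal sealed class DemoDomVisitor : CefDomVisitor
{
    protected override void Visit(CefDomDocument document)
    {
        var texts = new List<string>();
        CollectLinkTexts(document.Root, texts);

        // DOM objects are only valid while Visit is running, so copy out
        // plain strings here instead of holding on to nodes.
        File.WriteAllLines("link-texts.txt", texts); // made-up file name
    }

    // Depth-first walk over the node and all of its descendants.
    private static void CollectLinkTexts(CefDomNode node, List<string> texts)
    {
        if (node == null)
        {
            return;
        }

        if (node.IsElement &&
            string.Equals(node.ElementTagName, "A", StringComparison.OrdinalIgnoreCase))
        {
            texts.Add(node.InnerText);
        }

        for (CefDomNode child = node.FirstChild; child != null; child = child.NextSibling)
        {
            CollectLinkTexts(child, texts);
        }
    }
}

internal static class DomVisiting
{
    // Call this from the render process, e.g. when the process message arrives.
    public static void VisitMainFrame(CefBrowser browser)
    {
        browser.GetMainFrame().VisitDom(new DemoDomVisitor());
    }
}
```

Doing the filtering inside the visitor keeps all DOM access in the render process, where it has to happen anyway.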
Final Notes
I would recommend doing as much work as you can in the render process instead of constantly relaying things back and forth between the processes. Although process messages are in-memory, there is still a cost (especially if you are sending large blobs of data).
Also, remember you can use Visual Studio to debug the non-primary CEF processes by manually attaching your debugger.