To automate a browser, I need to list all user interactions with it. I divided them into two groups: first, webpage interactions, i.e. keyboard, mouse and focus events; second, interactions with the browser window, tabs and address bar.
First, let’s list the webpage interactions, because those (apart from text selection) are fairly simple.
The browser has three keyboard events: keydown, keypress (deprecated) and keyup.
And a bunch of mouse events:
- mousedown - interested - can indicate the start of a selection
- mouseup - interested
- mousemove - interested
- click - interested
- wheel - interested
- mouseenter - important, but I can use mouseover instead
- mouseleave - important, but I can use mouseover instead
- mouseout - important, but I can use mouseover instead
- mouseover - interested
Finally focus events:
- focus - I can use it to detect a Tab key press that moves focus, or use the two events below instead (focus does not bubble; focusin and focusout do)
- focusin - not so important
- focusout - not so important
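A content script could record the events marked as interesting above by listening on `document` in the capture phase. This is a minimal sketch under my own assumptions: `attachRecorder`, `EventLike`, `ListenerTarget` and the recorded fields are hypothetical names, and the target is passed in so that in a content script you would simply pass `document`.

```typescript
// Hypothetical minimal shapes; real KeyboardEvent/MouseEvent/WheelEvent objects have these fields.
interface EventLike {
  type: string;
  key?: string; // KeyboardEvent.key
  clientX?: number; // MouseEvent.clientX
  clientY?: number; // MouseEvent.clientY
  deltaY?: number; // WheelEvent.deltaY
}

// Anything with addEventListener/removeEventListener, e.g. `document` in a content script.
interface ListenerTarget {
  addEventListener(type: string, listener: (e: EventLike) => void, useCapture?: boolean): void;
  removeEventListener(type: string, listener: (e: EventLike) => void, useCapture?: boolean): void;
}

interface RecordedDomEvent {
  type: string;
  time: number;
  key?: string;
  x?: number;
  y?: number;
  deltaY?: number;
}

// Events marked "interested" above, with mouseover standing in for mouseenter/mouseleave/mouseout
// and focusin for focus. A real recorder would probably throttle mousemove, which fires very often.
const RECORDED_TYPES = [
  "keydown", "keyup",
  "mousedown", "mouseup", "mousemove", "click", "wheel", "mouseover",
  "focusin",
];

function attachRecorder(
  target: ListenerTarget,
  record: (entry: RecordedDomEvent) => void,
  now: () => number = Date.now,
): () => void {
  const listener = (e: EventLike): void => {
    const entry: RecordedDomEvent = { type: e.type, time: now() };
    if (e.key !== undefined) entry.key = e.key;
    if (e.clientX !== undefined && e.clientY !== undefined) {
      entry.x = e.clientX;
      entry.y = e.clientY;
    }
    if (e.deltaY !== undefined) entry.deltaY = e.deltaY;
    record(entry);
  };
  // Capture phase (third argument `true`): a listener on document runs before handlers
  // on the target element, so their stopPropagation() cannot hide the event.
  for (const type of RECORDED_TYPES) target.addEventListener(type, listener, true);
  // The returned function stops recording.
  return () => {
    for (const type of RECORDED_TYPES) target.removeEventListener(type, listener, true);
  };
}
```

Returning a stop function keeps the recorder easy to pause and resume, which will matter once recording and replay have to be interleaved.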
Those are the basic events I want to record to automate browser interactions. For each one I need to know where in the DOM the interaction occurred and what its effects were. Sometimes, to observe those effects, I need to listen to the browser itself.
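To record *where* an interaction happened, the event target has to be turned into something that can be found again during replay. One simple option (my assumption, not necessarily what the project will use) is a CSS selector path built by walking up the tree. `cssPath` and `ElementLike` are hypothetical names; a real DOM `Element` has all the fields `ElementLike` asks for.

```typescript
// Hypothetical minimal element shape; a real DOM Element satisfies it.
interface ElementLike {
  tagName: string;
  id: string;
  parentElement: ElementLike | null;
  children: ArrayLike<ElementLike>;
}

// Builds a selector such as "form#login > input:nth-child(1)" by walking up
// the tree until an element with an id (or the root element) is reached.
function cssPath(el: ElementLike): string {
  const parts: string[] = [];
  let node: ElementLike | null = el;
  while (node) {
    const tag = node.tagName.toLowerCase();
    if (node.id) {
      // Ids with special characters would need CSS.escape() in a real page.
      parts.unshift(`${tag}#${node.id}`);
      break;
    }
    const parentEl: ElementLike | null = node.parentElement;
    if (!parentEl) {
      parts.unshift(tag);
      break;
    }
    // :nth-child counts element siblings, starting from 1.
    const index = Array.from(parentEl.children).indexOf(node) + 1;
    parts.unshift(`${tag}:nth-child(${index})`);
    node = parentEl;
  }
  return parts.join(" > ");
}
```

Stopping at the nearest id keeps selectors short, but `:nth-child` paths break when the page layout changes, so a replayer would probably need fallbacks (text content, attributes) on top of this.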
The great thing is that the browser extension API, WebExtensions, is now mostly cross-browser compatible, so I can write once and deploy everywhere with small modifications, for example between Chrome/Opera and Firefox.
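One well-known difference is the top-level namespace: Firefox exposes the APIs as the promise-based `browser` object (and also accepts `chrome`), while Chrome and Opera expose `chrome`. A tiny shim can pick whichever exists. `getExtensionApi` and `ExtensionScope` are hypothetical names for this sketch.

```typescript
// Hypothetical helper. Firefox exposes `browser` (and also `chrome`);
// Chrome and Opera expose `chrome`.
interface ExtensionScope<T> {
  browser?: T;
  chrome?: T;
}

function getExtensionApi<T>(scope: ExtensionScope<T>): T {
  // Prefer `browser` when present, otherwise fall back to `chrome`.
  const api = scope.browser ?? scope.chrome;
  if (api === undefined) {
    throw new Error("No WebExtensions API found in this context");
  }
  return api;
}

// In an extension script: const api = getExtensionApi(globalThis as any);
```

This only papers over the name; Mozilla's `webextension-polyfill` library goes further and provides the promise-based `browser` API in Chrome as well.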
But the WebExtensions API is not the point of this post, so let’s get back to browser interactions.
Let’s start with webNavigation, so I know the current status of the webpage:
- onBeforeNavigate - interested - something is going on with the address bar
- onCommitted - interested - I know that the browser wants this document
- onDOMContentLoaded - interested - I can interact with the DOM
- onCompleted - interested - I know I can do other actions on the webpage
- onCreatedNavigationTarget - probably interested - because it indicates the user opened a new tab or window for a navigation, but let’s leave it for now
- onHistoryStateUpdated - not interested - we can get the same from onCompleted or from window.location.href in the webpage
- onReferenceFragmentUpdated - not interested - I can stick with webpage interactions
- onTabReplaced - not interested - nope
- onErrorOccurred - not interested - let’s not focus on user errors
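The webNavigation events marked "interested" could be wired up in the background script roughly like this. It's a sketch under my own assumptions: `recordNavigation`, `WebNavigationLike` and `NavRecord` are hypothetical names, and the API object is passed in (in an extension that would be `chrome.webNavigation` or `browser.webNavigation`, which needs the `"webNavigation"` permission). Only top-level frames are kept.

```typescript
// Minimal fields used here; the real details objects carry more fields.
interface NavDetails {
  tabId: number;
  frameId: number;
  url: string;
  timeStamp: number;
}

interface WebEvent<T> {
  addListener(callback: (details: T) => void): void;
}

// Hypothetical shape covering only the events marked "interested" above.
interface WebNavigationLike {
  onBeforeNavigate: WebEvent<NavDetails>;
  onCommitted: WebEvent<NavDetails>;
  onDOMContentLoaded: WebEvent<NavDetails>;
  onCompleted: WebEvent<NavDetails>;
}

type NavStage = keyof WebNavigationLike;

interface NavRecord {
  stage: NavStage;
  tabId: number;
  url: string;
  timeStamp: number;
}

function recordNavigation(nav: WebNavigationLike, log: NavRecord[]): void {
  const stages: NavStage[] = ["onBeforeNavigate", "onCommitted", "onDOMContentLoaded", "onCompleted"];
  for (const stage of stages) {
    nav[stage].addListener((details) => {
      // frameId 0 is the top-level frame; navigations inside iframes are skipped.
      if (details.frameId !== 0) return;
      log.push({ stage, tabId: details.tabId, url: details.url, timeStamp: details.timeStamp });
    });
  }
}
```

Filtering on `frameId` matters in practice, because pages with ads or embedded widgets produce navigation events for every iframe.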
Then there are tabs events, so I can detect whether someone wants to work across two applications in different tabs, or whether something happened, e.g. a popup showed up.
- onActivated - interested - I know that someone switched tabs
- onActiveChanged - not interested - deprecated
- onAttached - probably interested - someone attached a tab to a window
- onCreated - interested - someone created a tab
- onDetached - probably interested - someone detached a tab from this window
- onHighlightChanged - not interested - deprecated
- onHighlighted - not interested - I have never seen anyone highlight multiple browser tabs, so let’s not focus on that for now
- onMoved - not interested - fires when a tab is moved within a window; let’s not track tab reordering
- onRemoved - interested - there might be a use case where someone closes a tab
- onReplaced - not interested - let’s not focus on it
- onSelectionChanged - not interested - deprecated
- onUpdated - interested - yes, I want to track tab updates
- onZoomChange - probably interested - but let’s not focus on that now
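Recording the tab events marked "interested" could look like the sketch below. `recordTabs`, `TabsEventsLike` and `TabAction` are hypothetical names; the callback arguments follow the tabs API (`onRemoved` passes `tabId` and a `removeInfo` with `isWindowClosing`, `onUpdated` passes `tabId` and a `changeInfo`). In an extension you would pass `chrome.tabs` or `browser.tabs`; in Chrome, `changeInfo.url` is only filled in when the extension has the `"tabs"` permission or host permission for the page.

```typescript
// Hypothetical minimal shapes of the tabs events used here.
interface TabsEventsLike {
  onActivated: { addListener(cb: (info: { tabId: number; windowId: number }) => void): void };
  onCreated: { addListener(cb: (tab: { id?: number; windowId: number }) => void): void };
  onRemoved: {
    addListener(cb: (tabId: number, info: { windowId: number; isWindowClosing: boolean }) => void): void;
  };
  onUpdated: {
    addListener(cb: (tabId: number, changeInfo: { status?: string; url?: string }) => void): void;
  };
}

type TabAction =
  | { kind: "activated"; tabId: number }
  | { kind: "created"; tabId: number | undefined }
  | { kind: "removed"; tabId: number; windowClosing: boolean }
  | { kind: "updated"; tabId: number; status?: string; url?: string };

function recordTabs(tabs: TabsEventsLike, record: (action: TabAction) => void): void {
  tabs.onActivated.addListener((info) => record({ kind: "activated", tabId: info.tabId }));
  tabs.onCreated.addListener((tab) => record({ kind: "created", tabId: tab.id }));
  tabs.onRemoved.addListener((tabId, info) =>
    record({ kind: "removed", tabId, windowClosing: info.isWindowClosing }),
  );
  tabs.onUpdated.addListener((tabId, changeInfo) => {
    // onUpdated also fires for title, favicon, audible... changes; keep only URL and load status.
    if (changeInfo.url === undefined && changeInfo.status === undefined) return;
    record({ kind: "updated", tabId, status: changeInfo.status, url: changeInfo.url });
  });
}
```

Keeping `isWindowClosing` lets a replay tell "the user closed this tab" apart from "the whole window was closed".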
The last group is webRequest events on the webpage, so we can store data for later use. Sometimes we also need to delay the next browser interaction until the data is loaded; by recording webRequest events we know when to wait and when to simply interact with the webpage (hope that makes sense). Let’s not forget that webpage actions are mostly asynchronous.
- onAuthRequired - interested - I want to know when some basic authentication is needed
- onBeforeRedirect - interested - nasty stuff: redirects caused by e.g. advertisements or tracking on pages
- onBeforeRequest - interested - so we know a request is about to happen, probably related to some interaction
- onBeforeSendHeaders - probably not interested - unless we want to modify headers later
- onCompleted - interested
- onErrorOccurred - probably interested - to detect errors
- onHeadersReceived - probably not interested - unless we want to modify response headers
- onResponseStarted - not interested - informational only; onCompleted is more important
- onSendHeaders - not interested - informational only; onBeforeSendHeaders is more important
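To know when to wait, one option is to count the requests in flight per tab: add on `onBeforeRequest`, remove on `onCompleted` or `onErrorOccurred`. `PendingRequestTracker` is a hypothetical sketch; `requestId` and `tabId` are real fields of the webRequest details objects, with `tabId` set to -1 for requests not related to a tab. The wiring at the bottom only runs where a `chrome.webRequest` object exists, and it needs the `"webRequest"` permission plus host permissions.

```typescript
// Minimal fields used here; webRequest details objects carry many more.
interface RequestDetails {
  requestId: string;
  tabId: number; // -1 when the request is not related to a tab
}

// Hypothetical helper: tracks in-flight request ids per tab.
class PendingRequestTracker {
  private readonly pending = new Map<number, Set<string>>();

  started(details: RequestDetails): void {
    if (details.tabId < 0) return; // ignore requests not tied to a tab
    let ids = this.pending.get(details.tabId);
    if (!ids) {
      ids = new Set<string>();
      this.pending.set(details.tabId, ids);
    }
    ids.add(details.requestId); // a Set ignores a repeated requestId
  }

  finished(details: RequestDetails): void {
    const ids = this.pending.get(details.tabId);
    if (!ids) return;
    ids.delete(details.requestId);
    if (ids.size === 0) this.pending.delete(details.tabId);
  }

  // True when no tracked request is in flight for the tab.
  isIdle(tabId: number): boolean {
    return !this.pending.has(tabId);
  }
}

// Wiring, only where a Chrome-style webRequest API exists (i.e. inside the extension).
const chromeApi = (globalThis as any).chrome;
if (chromeApi?.webRequest) {
  const tracker = new PendingRequestTracker();
  const filter = { urls: ["<all_urls>"] };
  chromeApi.webRequest.onBeforeRequest.addListener((d: RequestDetails) => tracker.started(d), filter);
  chromeApi.webRequest.onCompleted.addListener((d: RequestDetails) => tracker.finished(d), filter);
  chromeApi.webRequest.onErrorOccurred.addListener((d: RequestDetails) => tracker.finished(d), filter);
}
```

Being idle at one instant does not guarantee the page is done (a script may start the next request a moment later), so a player would probably wait until the tab stays idle for a short quiet period before performing the next action.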
There are more browser APIs as well. In the future I will also consider the ones that look most interesting: contextMenus, cookies, history, runtime, sessions and windows.
But for now I will focus on the events listed above as the foundation of the browser automator project.
My goal is to create a browser automator that works locally, without any cloud service or internet connection. Every action will be stored in the extension’s localStorage. I already know that webpage and browser interactions can be recorded as a set of user actions and then saved under a name. What I will focus on next is replaying those webpage interactions as well as the browser actions.
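Saving a named set of actions could be as simple as serializing it to JSON, since `localStorage` only stores strings. `saveRecording`, `loadRecording`, the `"recording:"` key prefix and the `RecordedAction` shape are hypothetical; the storage is passed in so that `window.localStorage` (or an in-memory stand-in) can be used.

```typescript
// Hypothetical shape of one recorded action.
interface RecordedAction {
  kind: string; // e.g. "click", "keydown", "tabActivated"
  time: number; // timestamp in milliseconds
  selector?: string;
  value?: string;
}

// The subset of the Web Storage API used here; window.localStorage satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const KEY_PREFIX = "recording:"; // hypothetical key naming scheme

function saveRecording(storage: StorageLike, name: string, actions: RecordedAction[]): void {
  storage.setItem(KEY_PREFIX + name, JSON.stringify(actions));
}

// Returns null when no recording with that name exists.
function loadRecording(storage: StorageLike, name: string): RecordedAction[] | null {
  const raw = storage.getItem(KEY_PREFIX + name);
  return raw === null ? null : (JSON.parse(raw) as RecordedAction[]);
}
```

Note that `localStorage` is not available in a Manifest V3 background service worker; there the (asynchronous) `chrome.storage.local` would be the equivalent.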
Important points to consider are: creating pauses between actions and being able to pause the recorder or player, allowing manual edits of a set of actions, and creating some sort of universal pseudo-language that describes those interactions. So stay tuned for more insights from my struggles building browser automation.
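The pauses between actions could be derived from the recorded timestamps. In this sketch (hypothetical names; the 50 ms and 2000 ms bounds are arbitrary assumptions) each gap is clamped, so long idle stretches during recording don't slow the replay down, while very short gaps still give the page a moment to react.

```typescript
interface TimedAction {
  kind: string;
  time: number; // recorded timestamp in milliseconds
}

// Delay before each action = gap to the previous recorded action, clamped to [minMs, maxMs].
function replayDelays(actions: TimedAction[], minMs = 50, maxMs = 2000): number[] {
  return actions.map((action, i) => {
    if (i === 0) return 0;
    const gap = action.time - actions[i - 1].time;
    return Math.min(maxMs, Math.max(minMs, gap));
  });
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical player loop: `perform` would dispatch the action in a content script or via the tabs API.
async function replay(
  actions: TimedAction[],
  perform: (action: TimedAction) => Promise<void>,
): Promise<void> {
  const delays = replayDelays(actions);
  for (let i = 0; i < actions.length; i++) {
    await sleep(delays[i]); // pause before each action
    await perform(actions[i]);
  }
}
```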