User interactions with browser

2017-06-28 00:19:00

To automate a browser I need to list all user interactions with it. I divided them into two groups. The first is webpage interactions: keyboard, mouse, and focus events. The second is interactions with the browser itself: windows, tabs, and the address bar.

First let's list the webpage interactions, because those (besides selection) are fairly simple.

Browsers have three keyboard events:

keydown - interested
keyup - interested
keypress - not needed - deprecated; fires after keydown for keys that produce a character

And a bunch of mouse events:

mousedown - interested - can indicate selection start
mouseup - interested
mousemove - interested
click - interested
wheel - interested
mouseenter - important but I can use mouseover
mouseleave - important but I can use mouseover
mouseout - important but I can use mouseover
mouseover - interested

Finally, focus events:

focus - I can use it to detect a Tab key press that moves focus, or use the two events below instead
focusin - not so important
focusout - not so important

Those are the basic things I want to record to automate browser interactions. When recording, I need to know the place in the DOM where the interaction occurred and what its effects were. Sometimes, to get the effects, I need to listen to the browser itself.
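A minimal sketch of how recording the "interested" events together with their place in the DOM could look. The helper names `describeTarget` and `startRecording`, the record shape, and the path format are my assumptions for illustration, not a finished design.

```javascript
// Events marked "interested" in the lists above (plus focus).
const RECORDED_EVENTS = [
  "keydown", "keyup",
  "mousedown", "mouseup", "mousemove", "click", "wheel", "mouseover",
  "focus",
];

// Hypothetical helper: builds a CSS-like path to the element from tag names,
// ids, and the position among siblings of the same tag.
function describeTarget(element) {
  const parts = [];
  let node = element;
  while (node && node.tagName) {
    const tag = node.tagName.toLowerCase();
    if (node.id) {
      // Assumes ids are unique on the page, so the path can stop here.
      parts.unshift(tag + "#" + node.id);
      break;
    }
    let index = 1;
    let sibling = node.previousElementSibling;
    while (sibling) {
      if (sibling.tagName === node.tagName) index += 1;
      sibling = sibling.previousElementSibling;
    }
    parts.unshift(tag + ":nth-of-type(" + index + ")");
    node = node.parentElement;
  }
  return parts.join(" > ");
}

// Registers capturing listeners on `root` and appends one record per event
// to `log`. Capturing is used because focus does not bubble.
function startRecording(root, log) {
  const listener = (event) => {
    log.push({
      type: event.type,
      target: describeTarget(event.target),
      time: Date.now(),
    });
  };
  for (const type of RECORDED_EVENTS) {
    root.addEventListener(type, listener, true);
  }
  // The returned function stops the recording.
  return () => {
    for (const type of RECORDED_EVENTS) {
      root.removeEventListener(type, listener, true);
    }
  };
}
```

In a content script this could be started with `const stop = startRecording(document, actions);` and ended with `stop();`. Recording every mousemove will produce a lot of records, so some throttling would probably be needed in practice.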

The great thing is that the browser API named WebExtensions is now mostly cross-browser compatible, so I can write once and deploy everywhere with small modifications. For example, between Chrome/Opera and Firefox it would be at least:

// Chrome and Opera expose the API as `chrome`; Firefox also provides `browser`.
if (typeof browser === "undefined") {
    var browser = chrome;
}

But the WebExtensions API is not the point of this post, so let's get back to browser interactions.

Let's start with webNavigation, so I can know the current status of the webpage.

onBeforeNavigate - interested - something is going on with the address bar
onCommitted - interested - I know that the browser wants this document
onDOMContentLoaded - interested - I can interact with the DOM
onCompleted - interested - I know I can do other actions on the webpage
onCreatedNavigationTarget - probably interested - because it indicates the user opened a link in a new tab or window, but let's leave it for now
onHistoryStateUpdated - not interested - we can get the same from onCompleted or window.location.href from webpage
onReferenceFragmentUpdated - not interested - I can stick with webpage interactions
onTabReplaced - not interested - nope
onErrorOccurred - not interested - let's not focus on user errors
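Listening to the four webNavigation events I marked as interesting could be sketched like this. The function name `watchNavigation` and the shape of the reported object are assumptions; the API object is passed in as a parameter so the same code works with `browser.webNavigation` or `chrome.webNavigation`.

```javascript
// The webNavigation events marked "interested" in the list above.
const NAVIGATION_EVENTS = [
  "onBeforeNavigate",
  "onCommitted",
  "onDOMContentLoaded",
  "onCompleted",
];

// Calls `onStage` with a small record whenever the top-level frame of a tab
// reaches one of the navigation stages.
function watchNavigation(webNavigation, onStage) {
  for (const name of NAVIGATION_EVENTS) {
    webNavigation[name].addListener((details) => {
      // frameId 0 is the top-level frame; iframes are skipped in this sketch.
      if (details.frameId !== 0) return;
      onStage({ stage: name, tabId: details.tabId, url: details.url });
    });
  }
}
```

In a background script with the "webNavigation" permission this could be wired up as `watchNavigation(browser.webNavigation, (stage) => actions.push(stage));`.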

Then we have tabs, so I can detect when someone wants to work across two applications in different tabs, or when something happened, e.g. a popup showed up.

onActivated - interested - know that someone switched tabs
onActiveChanged - not interested - deprecated
onAttached - probably interested - someone attached a tab to the window
onCreated - interested - someone created a tab
onDetached - probably interested - someone detached a tab from this window
onHighlightChanged - not interested - deprecated
onHighlighted - not interested - I have never seen someone highlighting multiple browser tabs, so let's not focus on that for now
onMoved - not interested - fires when a tab is moved within its window; let's not track tab reordering
onRemoved - interested - there might be some use case where someone closes a tab for some reason
onReplaced - not interested - let's not focus on it
onSelectionChanged - not interested - deprecated
onUpdated - interested - yes I want to track updates on tabs
onZoomChange - probably interested - but let's not focus on that now
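The tabs events pass their data in different shapes (an info object, a tab object, or a bare tab id), so a small sketch of normalizing the "interested" ones into one kind of record may help. The name `watchTabs` and the record types such as `"tab-activated"` are made up for illustration.

```javascript
// Normalizes the tabs events marked "interested" above into plain records.
function watchTabs(tabs, onAction) {
  // onActivated passes an object with tabId and windowId.
  tabs.onActivated.addListener((activeInfo) => {
    onAction({ type: "tab-activated", tabId: activeInfo.tabId });
  });
  // onCreated passes the new tab object.
  tabs.onCreated.addListener((tab) => {
    onAction({ type: "tab-created", tabId: tab.id });
  });
  // onRemoved passes the tab id first, then a removeInfo object.
  tabs.onRemoved.addListener((tabId) => {
    onAction({ type: "tab-removed", tabId });
  });
  // onUpdated passes the tab id, a changeInfo object, and the tab.
  tabs.onUpdated.addListener((tabId, changeInfo) => {
    // Only loading-status changes are kept in this sketch.
    if (changeInfo.status) {
      onAction({ type: "tab-updated", tabId, status: changeInfo.status });
    }
  });
}
```

From a background script this would be called as `watchTabs(browser.tabs, (action) => actions.push(action));`.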

The last group is webRequest events, so we can store request data for later use. Sometimes we also need to delay the next browser interaction until the data is loaded, so by recording webRequest events we would know when to wait and when to simply interact with the webpage (hope that makes sense). Let's not forget that webpage actions are mostly asynchronous.

onAuthRequired - interested - want to know when we need basic authentication
onBeforeRedirect - interested - nasty stuff - redirects, e.g. through advertisements or tracking on pages
onBeforeRequest - interested - so we know there will be a request, probably related to some interaction
onBeforeSendHeaders - probably not interested - unless we want to modify headers later
onCompleted - interested
onErrorOccurred - probably interested - detect some errors
onHeadersReceived - probably not interested - unless we want to modify response headers
onResponseStarted - not interested - informational event; onCompleted is more important
onSendHeaders - not interested - informational event; onBeforeSendHeaders is more important
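To know when to wait and when to simply interact, one option is to keep track of requests that have started but not yet finished. This is only a sketch under that assumption; `createRequestTracker` and its `isIdle` / `pendingCount` methods are hypothetical names.

```javascript
// Tracks requests between onBeforeRequest and onCompleted / onErrorOccurred.
function createRequestTracker(webRequest) {
  const pending = new Set();
  // webRequest listeners need a filter; this one matches every URL.
  const filter = { urls: ["<all_urls>"] };

  webRequest.onBeforeRequest.addListener((details) => {
    // A redirected request keeps its requestId, so the Set holds it once.
    pending.add(details.requestId);
  }, filter);

  const finish = (details) => {
    pending.delete(details.requestId);
  };
  webRequest.onCompleted.addListener(finish, filter);
  webRequest.onErrorOccurred.addListener(finish, filter);

  return {
    isIdle: () => pending.size === 0,
    pendingCount: () => pending.size,
  };
}
```

A player could then check `tracker.isIdle()` before performing the next recorded action. This needs the "webRequest" permission and host permissions for the matched URLs.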

There are also more browser APIs. In the future I will consider these as the most interesting: contextMenus, cookies, history, runtime, sessions, windows.

But for now I will focus on the events listed above as the foundation of the browser automator project.

As you can see, it takes a bit of work if you want to listen to browser actions. I also hope that replaying those actions can be done entirely in JavaScript.

My focus is to create a browser automator that works locally, without any cloud or internet connection. Every action will be stored in the localStorage of the extension. I already know that webpage and browser interactions can be recorded into a set of user actions and then saved under a name. What I will focus on next is replaying those webpage interactions and browser actions.
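Saving a set of actions under a name could be as small as the sketch below. The key prefix `recording:` and the function names are assumptions; `storage` is anything with `getItem` / `setItem`, such as `localStorage`.

```javascript
// Hypothetical key prefix that keeps recordings apart from other stored data.
const KEY_PREFIX = "recording:";

// Stores the list of actions as JSON under the given name.
function saveRecording(storage, name, actions) {
  storage.setItem(KEY_PREFIX + name, JSON.stringify(actions));
}

// Returns the stored list of actions, or null if the name is unknown.
function loadRecording(storage, name) {
  const raw = storage.getItem(KEY_PREFIX + name);
  return raw === null ? null : JSON.parse(raw);
}
```

Usage would look like `saveRecording(localStorage, "login", actions);` and later `loadRecording(localStorage, "login");`. Since everything goes through JSON, the action records have to stay plain data (no DOM nodes or functions).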

Important points to consider are adding pauses between actions, being able to pause the recorder and the player, allowing manual modification of a set of actions, and creating some sort of universal pseudo description language for those interactions. So stay tuned for more insights from the struggles of building browser automation.
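One possible way to get pauses between actions is to derive them from the recorded timestamps and cap them. This is a sketch, assuming every recorded action carries a `time` field in milliseconds; `planReplay`, `replay`, and `perform` are hypothetical names, and `perform` stands for whatever will actually execute an action.

```javascript
// Turns recorded actions into replay steps: each step waits as long as the
// original gap to the previous action, capped at maxWaitMs.
function planReplay(actions, maxWaitMs) {
  const steps = [];
  let previousTime = null;
  for (const action of actions) {
    const gap = previousTime === null ? 0 : action.time - previousTime;
    steps.push({ waitMs: Math.min(Math.max(gap, 0), maxWaitMs), action });
    previousTime = action.time;
  }
  return steps;
}

// Runs the steps one by one, pausing before each action.
async function replay(steps, perform) {
  for (const step of steps) {
    await new Promise((resolve) => setTimeout(resolve, step.waitMs));
    await perform(step.action);
  }
}
```

Because the steps are plain data, they can also be edited by hand before replaying, for example to lengthen a single pause.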
