Data anomalies in Search Console

In the past 5-6 years or so, a lot of online businesses, especially those hungry for growth, have relied on organic traffic as one of their key sources. Growth could mean an increase in pure numbers (traffic, sessions, users) … or it could mean an increase in more tangible business parameters (revenues, profits). One of the things that I have learnt is that depending on which success metrics we chase, our own identity undergoes a shift.

Search as a major source of traffic

The major contributors to organic traffic are search and social. Wherever a site has loads of great, unique content, there is a chance of driving organic traffic.

At different points in time, I have been skeptical about Social Media and me-too posting that most brand pages do on platforms such as Facebook. However, Search for me has always been fascinating and I still have faith in Search :).

SEO can’t be a method

Search Engine Optimization (SEO) has evolved over a period of time and I have blogged about it on multiple occasions. Unfortunately, the number of times the algorithm changes and the rate at which Google (the market leader in this space) revises what it construes as quality content ensure that you can’t have a steady SEO “process”.

Having said that, SEO involves a fair amount of design thinking.

The reason is that the problem of search visibility (and the factors that control it) keeps changing. It’s a wicked problem. Design thinking can solve such problems because of its test-and-iterate mechanism.

Data to drive Design Thinking

This is where having the correct data to decide on next steps is crucial. A data driven design thinking approach would entail periodic reviews of what kind of data we have available to make the right choices.

Search data has always been plagued with incomplete information, starting from the 2011 encrypted search announcement, after which a bulk of the keyword data in Google Analytics was reported as (not provided). There have been ample approaches to clarify this data. Unfortunately, as Google Search moves further towards handhelds and as digital privacy increases, the percentage of data with clear visibility will keep going down.

This can’t be helped. What can be done is to take these “anomalies” into account and factor them in while doing your analysis.

So what kind of Data anomalies in Search Console do we expect to find?

Google Support has compiled this list. They keep updating their data reporting logic and keep updating this page as well.

One of the major changes that you can see is that last month, they started reporting more data in Google Webmaster Tools. Please bear in mind that this is just a change in the data that is being reported and not a change in the actual search traffic to your site.

The link also explains why there is data disparity between Google Analytics and Google Webmaster Tools and any other third party tool that you could be using to generate keyword data.

So, my data is incomplete, what to do?

Don’t panic.

Work with the list of data anomalies and identify which ones are impacting you the most. Having visibility on which parts of data are not available to you is also better than not knowing anything and assuming that the data you have is complete.

In iterations, the first comparison is always with your previous state. In both cases the data being made available to you is pretty much the same. Hence, a week on week comparison report is much more valuable than a comparison report with your closest competitor.

As long as the measures of success come from the same tool, the data anomalies should largely cancel out. Please bear in mind that for most of our data work, we do not need precise data but can work with coarse data.
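
A week on week comparison can be sketched in a few lines of Python. This is only a minimal illustration, assuming you have exported daily click counts from a single tool; the function name and the sample numbers are hypothetical.

```python
from typing import Sequence


def week_on_week_change(daily_clicks: Sequence[int]) -> float:
    """Return the percentage change between the last two full weeks.

    Assumes daily_clicks holds at least 14 daily values, oldest first,
    all exported from the same tool so its reporting gaps hit both weeks.
    """
    if len(daily_clicks) < 14:
        raise ValueError("need at least 14 days of data")
    previous_week = sum(daily_clicks[-14:-7])
    current_week = sum(daily_clicks[-7:])
    if previous_week == 0:
        raise ValueError("previous week has no clicks to compare against")
    return (current_week - previous_week) / previous_week * 100


# Hypothetical sample data: two weeks of daily clicks, oldest first.
clicks = [100, 120, 110, 90, 95, 60, 65, 110, 130, 125, 100, 105, 70, 64]
print(f"Week-on-week change: {week_on_week_change(clicks):+.1f}%")
```

Because both weeks come from the same source, whatever the tool fails to report is missing from both sides of the comparison, which is why the trend stays usable even when the absolute numbers are incomplete.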

A simple approach to identify this would be – if you work with charts and graphs more, then you can work with coarse data and absorb the anomalies. If you work with more than 4 decimals, then you might want to add 3-4 lines of disclaimer below your data.

ḷ.com, one more shenanigan in Referral Spam

Spamming the Analytics data of websites is now an old practice. It’s better known as Referral Spam, and I have written about this multiple times in the past. Purely a black hat practice, I doubt whether it would give great returns.

Yes, it would give traffic to the spammer, but how does that really translate into revenue? Or is the tactic hoping to drive gullible folks by the hordes?

The referral spam industry for some reason also loves to send the geographical position as Samara. For those of you who are noticing this now, here’s how the tactic works.

How Referral Spam works

  1. The bot hits a particular site multiple times a day
  2. The analyst checks his Google Analytics account, and gets surprised by a spike in traffic. Who would mind seeing such a spike :)
  3. The obvious report to check this out would be the Source / Medium in the Acquisition section.
  4. There, staring at you in all its glory, is the spamming domain
  5. The analyst gets curious, and visits the site

The rest would not be history; it would be a scam.

How should I combat this?

Raven Tools has a comprehensive article on combating Referral spam. They have listed multiple methods to ensure that this spammy data is not accounted for in your analytics data.

Personally, I allow the data to reside in my Master Data View. The reason behind that is – since I do not look at aggregate data anyway (I prefer lots and lots of custom segments), I am not too bothered with that data! I do, however, mark it as an annotation on my GA. That’s the advice I would give to anyone.
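
If you would rather strip the spam out of an exported Source / Medium report than leave it in, the idea can be sketched as below. This is a rough illustration and not a Google Analytics feature: the row layout, the function name and the blocklist entries are all hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical blocklist; fill it with the spam domains you actually see.
SPAM_SOURCES = {"spam-example.com", "free-traffic.example"}


def drop_referral_spam(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only rows whose referral source is not on the blocklist."""
    clean = []
    for row in rows:
        source = str(row["source"]).lower()
        is_spam = row["medium"] == "referral" and source in SPAM_SOURCES
        if not is_spam:
            clean.append(row)
    return clean


# Hypothetical rows from an exported Source / Medium report.
rows = [
    {"source": "google", "medium": "organic", "sessions": 420},
    {"source": "spam-example.com", "medium": "referral", "sessions": 300},
    {"source": "news.example.org", "medium": "referral", "sessions": 35},
]
clean_rows = drop_referral_spam(rows)
print(sum(row["sessions"] for row in clean_rows))
```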

Is there a point to Social Media Management?

Life is short. It is time to point out an ugly truth, and to be the brave person that you are, the intelligent rational assessor of reality that you are, and kill all the organic social media activity by your company. All of it. Seems radical, but let’s take it one step at a time.…

via Stop All Social Media Activity (Organic) | Solve For A Profitable Reality — Occam’s Razor by Avinash Kaushik

Any Social Media Marketer would take this as an affront, but the wealth of insights based on pure data that’s being shared by Avinash in the above article is something to think about.

Social Media Platforms are not to be confused as Owned Platforms

There are platforms which we build, such as our very own discussion forum or a blog. These are Owned Platforms … and then there are platforms where people already exist and we simply establish our brand’s presence there, such as Social Media sites, e.g. Facebook and Twitter.

In such cases, your brand’s outreach is subject to the policies dictated by that platform. Zuck’s Death Spiral (ZDS) is one such example that Avinash is talking about.

Shouldn’t brands adopt Social?

By all means adopt social and engage with your customers online. However, keep in mind that when in Rome, you do as the Romans do. That means, on Facebook – you follow the rules that Zuck lays out. Ergo, the same rinse repeat formula of posting 4-5 Social Media posts a day may not work.

What is required instead, is a concerted effort to truly wow your fans. If you do not wish to do that and want to instead rely on the same well worn formula of doing selfies of your brand, then your social media team is doing you a grave injustice.

A Success/Failure method for Analytics

When identifying the Key Performance Indicators (KPI) of your business, it makes sense to choose the proper measures of success. I have written about choosing the proper measures of success in the past. Since most of the work that I do is in the realm of the web, the principles via which we operate and do reports are more or less the same.

The only thing that changes is the conversion … or the success metric. In other words, the reason for which the website is built, the purpose of that site. Hence, the measure of success approach works.

Designing for new paradigms

However, what would happen if the product being built is not meant for the web, or is not based on the same principles? How would we go about identifying metrics and actionable reports?

For that we would have to go to the very reason why we need analytics.

The Purpose of Analytics

If I were to define the reason why we use analytics in any product, it would be to –

  1. Identify the wins, celebrate them and try to find the rules which get us more wins
  2. Identify the failures, and figure out ways to fix those failures so that we can improve

This view helps us do two things primarily, one to find out and scale the good things, and the other to find out and weed out the bad things in our product.

To do this, we would need metrics (or KPIs) that would indicate a success or a failure.

Measures of Success

Measure of success metrics help in identifying the clear wins and celebrating them within the team. They also help in figuring out what worked for you in the past and how to re-create those wins. One definitive thing that needs to be done (and I have learnt this the hard way) is that wins, or measure of success metrics, need to be shared with a broader audience to give the entire team a sense of purpose about what they are working on.

A good measure of success is task completion rate, or conversion rate, or profitability.

Measures of Failure

Measure of failure metrics help in identifying failures within a certain activity. These are also metrics which help in identifying opportunities for improvement. Measure of failure metrics should help us root out problems within our current design/product. I say root out, because once you identify the failure, you have to act and ensure that the failure does not happen again.

An example of measure of failure could be bounce rate.

Unlike measures of success, measures of failure may not be shared with large teams. Rather I feel (and I want your opinion on this) that they are much more effective when communicated to the right localized teams.
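
To make the two kinds of metrics concrete, here is a small sketch that computes one measure of success (conversion rate) and one measure of failure (bounce rate) from raw counts. The function names and the numbers are hypothetical.

```python
def conversion_rate(conversions: int, sessions: int) -> float:
    """Measure of success: percentage of sessions that converted."""
    if sessions == 0:
        return 0.0
    return conversions / sessions * 100


def bounce_rate(single_page_sessions: int, sessions: int) -> float:
    """Measure of failure: percentage of sessions that saw only one page."""
    if sessions == 0:
        return 0.0
    return single_page_sessions / sessions * 100


# Hypothetical counts for one week.
sessions = 1200
print(f"Conversion rate: {conversion_rate(30, sessions):.1f}%")
print(f"Bounce rate: {bounce_rate(540, sessions):.1f}%")
```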

Importance of Context in Analyzing data

Recently, I was analyzing some user generated data in a mobile app. The app was sending content on specific categories to a niche audience, and at the end of each content piece, there was a simple 5 star rating feedback for users to rate the piece.

The assumption of the design team who thought of this was that the feedback data was an objective metric.

Objective metric for Subjective behavior

Unfortunately, the behavior of users and how they understood the content piece is a very subjective topic. By subjective, I mean to say that for two different users, the value they would associate with the usefulness of the same piece varies.

We could always say ceteris paribus, but I would say – “Let’s not fool ourselves here”.

In the world of subjectivity, ceteris paribus doesn’t exist

There could be so many factors associated with my giving a 5/5 to a piece v/s 4/5 to the same piece, that in the end, I’d be forced to say it depends, and then list out a whole new set of variables.

Slicing the Data with new variables

This is a problem, since my existing data set does not have these new variables. So, from analyzing – now I am back to collecting data. To be frank, there’s no end to this cycle … collect data, realize that you might want more data and rinse, repeat.

Where do we divine the new rules and new variables? We start from the context.

Ergo, the simple and freeing realization is that the answer to the questions we were asking of the data sometimes lies partially in the data points, and partially in the context.

Let me illustrate this

Let’s take a fairly popular metric – Bounce rate.

Now, if I were to say that my website’s bounce rate is 100%, what would you say?

Sucks, right??

.

.

.

Now, what if I were to tell you that my website is a single page website where I want my users to watch a product launch video? That bounce rate suddenly pales, and aren’t you itching to ask me about the number of users who played the video up to a certain point?

If you have been working with Google Analytics, then some of you might even suggest adding an event in GA when the play button is hit: a non-interaction event if the bounce rate should stay as it is, or a regular event if a play should stop the session from counting as a bounce.
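
The single page example can be put into numbers with a short sketch. It assumes each session record carries a pageview count and a flag for whether the video was played; the `Session` class and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    pageviews: int
    played_video: bool  # hypothetical flag, e.g. derived from a play event


def raw_bounce_rate(sessions: List[Session]) -> float:
    """Percentage of sessions with exactly one pageview."""
    bounces = sum(1 for s in sessions if s.pageviews == 1)
    return bounces / len(sessions) * 100


def video_play_rate(sessions: List[Session]) -> float:
    """Percentage of sessions in which the video was played."""
    plays = sum(1 for s in sessions if s.played_video)
    return plays / len(sessions) * 100


# On a single page site every session has one pageview.
sessions = [
    Session(1, True),
    Session(1, True),
    Session(1, False),
    Session(1, True),
]
print(f"Bounce rate: {raw_bounce_rate(sessions):.0f}%")
print(f"Video play rate: {video_play_rate(sessions):.0f}%")
```

The same sessions give a bounce rate of 100% and a play rate of 75%; which of the two numbers matters depends entirely on what the page is for.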

One more example

Let’s take one more metric. Pages/Session to measure how much content the user is consuming on a site.

.

.

.

Let’s look at this from a different angle. A user is on your site, searching for content, is not able to find what he wants, and keeps visiting different pages. After going through 8-9 pages, he finally gives up and leaves the site. That 8.5 pages/session doesn’t seem that sexy now, does it?

 

Understand the context

Therefore staring at a pure data puke may not help. Understanding the context under which that data was collected is as important as going through excel sheets or powerpoint presentations.

TL;DR – Data without context is open to too many interpretations and is a waste of time.

Shifting from __utmz to _ga

Some years back I had written about the __utmz cookie that Google Analytics uses to identify source attribution for visitors. If you are interested in reading that post, click here on Understanding the __utmz cookie.

Google evolves beyond Urchin

Google Analytics is based on the Urchin tracking management system and has been improving on that system over a period of time. I have seen this product evolve and gain many features that were not there in Urchin … and one of the major changes has been in the usage of cookies.

That makes my earlier post defunct.

The utmz Cookie

The utmz cookie used to contain information about where the user had come from: which campaign, source and medium the user responded to in order to arrive at the site. This information could be read and stored in a separate system (such as a CRM whenever a lead is captured). This could help in attribution of paying customers, and bring in all the crunchy goodness that you wanted.

Unfortunately, the utmz cookie no longer exists. The cookies have changed. If you are interested in knowing which cookies Google Analytics uses now, you can read this support article.

Where does that leave us?

So how do we go about finding more information about the user? The attribution information is no longer readable from the cookie. What we do still have on hand, however, is a unique identifier in the cookie. That much hasn’t changed.

So let’s take a look under the hood, shall we?

The _ga cookie contains a value. This is the client id of the user. If you see the cookies collection, there are multiple _ga cookies, however, when you match it with the domain column, for every user – domain combination, there is a single _ga cookie.

This cookie is accessible by your server side script as well as your client side JavaScript. Therefore, we can get access to the _ga cookie value and store the client id within.
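
As a minimal sketch of the server side part, the client id can be pulled out of the raw cookie value like this. It assumes the cookie looks like `GA1.2.<random number>.<timestamp>`, with the client id being the last two dot-separated fields; the function name is hypothetical and the `GA1.2.` prefix in the sample is an assumption.

```python
def client_id_from_ga_cookie(cookie_value: str) -> str:
    """Extract the client id from a _ga cookie value.

    Assumes the format 'GA1.<depth>.<random>.<timestamp>' and returns
    '<random>.<timestamp>'.
    """
    parts = cookie_value.strip().split(".")
    if len(parts) < 4:
        raise ValueError(f"unexpected _ga cookie format: {cookie_value!r}")
    return ".".join(parts[-2:])


# The 'GA1.2.' prefix here is an assumed example prefix.
print(client_id_from_ga_cookie("GA1.2.129754452.1496423206"))
```

The returned string is what you would store against the lead in your CRM.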

What is a client id?

To understand this, let’s go to Google Analytics. In GA, under the Audience section, we have a User Explorer report. Here’s a screenshot from my GA –

Check the value – 129754452.1496423206

This is available in the _ga cookie as well as in the user explorer. I can now identify specific users and leads in my CRM based on their client ids.

Therefore, I can even start checking their user behavior on the site, like so –

This is how the user has been visiting the site over a period of time. Notice the source is changing for different visits.

In a world where I would have been storing just the final source in the CRM, now I have a much more detailed view of how the user keeps coming to my site. This allows me to explore other attribution models and share the credit of the user’s conversion across channels.
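
One simple way to share the credit is a linear model, where every visit before the conversion gets an equal slice. The sketch below is only an illustration of that idea, not how Google Analytics computes its own attribution reports; the function name and the visit history are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def linear_attribution(touchpoints: List[str], conversion_value: float = 1.0) -> Dict[str, float]:
    """Split the conversion value equally across all visits in the path."""
    if not touchpoints:
        return {}
    share = conversion_value / len(touchpoints)
    credit: Dict[str, float] = defaultdict(float)
    for source in touchpoints:
        credit[source] += share
    return dict(credit)


# Hypothetical visit history for one client id, oldest visit first.
visits = [
    "google / organic",
    "facebook / social",
    "google / organic",
    "(direct) / (none)",
]
for source, credit in linear_attribution(visits).items():
    print(f"{source}: {credit:.2f}")
```

With last-click attribution the direct visit would have taken all the credit; here the two organic visits together take half of it.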

This brings me one step closer to the World of And.

The World of And

In case you haven’t already watched this, you need to watch this –

 

Taking a look at Jetpack Stats

Let me state upfront that I love Google Analytics. I use it at work in 13 Llama Interactive to measure the effectiveness of the campaigns that my team runs.

That being said, I will try and not be too biased about comparing Jetpack Stats to Google Analytics. As a marketer, the way I look at an analytics package is from an ability to extract a fair amount of data.

Jetpack Stats, however, is built on top of WordPress and is available to all WordPress based sites which are connected to WordPress.com. This makes bloggers the primary user base of Jetpack Stats.

Let’s see what Jetpack Stats has to offer.

The wp-admin Dashboard Integration

Jetpack Stats puts a nice, pretty looking graph on the wp-admin Dashboard. This is how it looks for my site –

Jetpack-Stats-on-wp-admin-Dashboard

Now, this is fairly similar to the Audience Overview you get when you check out Google Analytics.

Google-Analytics-Dashboard

Straight off the bat, I prefer the Jetpack Stats overview to the one given by Google Analytics. Jetpack Stats also shows me how my posts have performed on the day. In GA, this report would be available within the Behavior section, in the Site Content report.

The Top Searches that you see in the screenshot would have been helpful had it been accurate. Unfortunately, Google accounts for the majority of organic traffic on my site, and most of that traffic is encrypted. Thus, these keywords that you see (really, I rank for ‘big ass girl dunes’) are not a complete set!

Jetpack Stats does not talk to Google Webmaster Tools, which now is the only source of this keyword data.

Jetpack Stats Posting Activity

One awesome feature about Jetpack Stats is the posting activity screen –

Jetpack-Posting_Activity

This data is shown with a correlation of average traffic per day as well as traffic per month. You could always get this data in Google Analytics (here is a useful post I had written some time back – Google Analytics for Content Marketers).

It’s just this kind of insight that makes me keep Jetpack around for my measurement requirements.

Jetpack Stats vs Google Analytics

Jetpack Stats is a very lightweight tool and it would be useful for a simple blog. However, the minute we enter the realm of measuring user engagement and performance marketing, Jetpack simply does not have those features yet.

This is where Google Analytics shines through with its Event tracking.

Having said that, Jetpack Stats is an apt solution for a user who is more focused on the publishing process.