Getting started with R

Back in 2017-18, I started teaching a course in a business school – instead of including a lot of theoretical frameworks, I opted to go with the basics and some implementation and tooling concepts. One of the tools that I chose to teach was how to use R in business analytics.

For those of you who do not know R, here is a helpful wiki article about it.

Teaching what is broadly a scripting language to graduate students who haven't written a single line of code is a humbling experience. You cannot take concepts such as variables, control loops, and libraries for granted, since their exposure to these has been minimal. So how does one pack all that information into simple, usable instructions?

This is what I did.

Use free MOOCs

For students to understand the basic concepts of R – vectors, assignment, matrices, simple functions, etc. – I prefer a free MOOC such as the Introduction to R by DataCamp. The course is simple, it has an R IDE within the practice area, and it gives you easy practice sessions and a playground to test your scripts while you are reading through the course material. I find this super useful for application-oriented courses.

Right before jumping into the actual concepts of how R can be used for business analysis, this basic introduction course helps establish a good solid base for participants who want to get started with R.

Use multiple datasets

I typically start these sessions with a financial data set – credit card usage statistics, or some such information. However, I realized that students do better when they can relate to the data. Over the length of the course, I found that switching to a data set such as movie data (thanks to IMDb for opening up their database) or cricket data made a lot more sense. It became easier for the participants to apply their conceptual learning to the data sets.

See and Do

We incorporated several practice sessions into the class: getting the basics down, writing scripts, and getting started with R.

Some of the easier ways to get R up and running are –

  1. Use the R-Studio IDE installer
  2. Use the Anaconda Navigator
  3. Use an online tool like Rdrr

Building your Custom Connector on Google Data Studio

Disclaimer – This is going to be a slightly technical post. If code scares you, then I'd suggest you skip this one. However, if data excites you, read on, fellow adventurer!

When Google launched Google Data Studio, I wrote an in-depth post about how one can create dashboards in Data Studio and some of the easy-to-use dashboard templates it has to offer. As the product evolved, one of the most powerful features it added was the ability to create custom data connectors to your own datasets.

What does a custom connector do?

A custom connector enables a user to access their own data source within Google Data Studio. Let's take the example of a marketing research associate who wants to present her findings. One approach she could use would be to put all that data in Google Sheets, and then use one of the built-in connectors.

However, what would she do if her data set were too large to fit in Google Sheets or Excel? Or if it included multiple surveys that are inter-related?

What if this data lived in a database, or was available as an API? This is where a custom connector for Google Data Studio comes in.

I wrote my first connector a year ago, and I had to do some digging around. I thought I should pen down my notes so that more people can do this more easily. Here they are.

Building a custom connector

Before you jump into the implementation, know that this is based on JavaScript and you need to be comfortable with Google Apps Script. It's okay if you do not know Apps Script, but JavaScript is a must.

Google has official documentation on its developer site on how to build a Community Connector; it is a pretty good resource to start with, and includes a step-by-step video and instruction guide as well.

Let's look at the different parts that make up a connector; here is a link to sample connector code on GitHub.

Community Connector sections

Each community connector is a separate script that you write and deploy using Google Apps Script. The connector itself is made up of the following sections (a short sketch of the configuration and schema sections follows the list) –

  • Configuration section – This fleshes out all the meta information about the community connector. Use this section to take any inputs from the user, e.g. an API key and secret, if you don't wish to store these in your code.
  • Authentication section – This authorizes the Apps Script. If your data sits behind a secure mechanism, use this section to authorize the script to access the data. OAuth2 is supported as well.
  • Schema section – This defines the structure of the data you are importing into Data Studio. Use this section to outline which fields you expose and what their data types are. You can also specify what kind of aggregation you want for each field (Sum, Average, Min, Max, etc.).
  • Data section – This section fetches the data that you are importing. Use it for data validations or any last-minute tweaks (e.g. converting date strings into proper dates).
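
To make the configuration and schema sections concrete, here is a minimal sketch using the plain-object response format you will see in the sample connectors. The "accountId" input and the two fields below are hypothetical stand-ins; your own connector would expose whatever your data source actually provides.

```javascript
// Configuration section: one hypothetical text input from the user.
function getConfig(request) {
  return {
    configParams: [
      {
        type: 'TEXTINPUT',
        name: 'accountId',
        displayName: 'Account ID',
        helpText: 'Identifier of the account you want to report on.'
      }
    ]
  };
}

// Schema section: shared between getSchema() and getData() so the two stay consistent.
var CONNECTOR_SCHEMA = [
  {
    name: 'day',
    label: 'Day',
    dataType: 'STRING',
    semantics: { conceptType: 'DIMENSION', semanticType: 'YEAR_MONTH_DAY' }
  },
  {
    name: 'revenue',
    label: 'Revenue',
    dataType: 'NUMBER',
    semantics: { conceptType: 'METRIC', isReaggregatable: true }
  }
];

function getSchema(request) {
  return { schema: CONNECTOR_SCHEMA };
}
```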

That’s all there is to it. Now, let us get into the actual flow of the script.

Connector code flow

When you are writing your connector, be sure to go through the developer reference first. In your script, you will have to include the following functions –

  1. getConfig() – returns the configurable user options for the connector; these are shown to the user when they add the connector to their Google Data Studio account.
  2. getAuthType() – the function used to check whether any authentication is required. If OAuth is set, the community connector interface will ask for the OAuth details.
  3. getSchema() – returns the schema of the data being accessed; this is shown to the user when the data is being explored (where we can see the dimensions and metrics).
  4. getData() – the function used to access the data; the expected data format is outlined here. Normally, it is advisable to write a separate function for fetching the data and a post-processing function for setting up the return values, and then call those in the correct order from this function.

Do note that these functions will be called in the same order as they are listed. As long as you have these functions in your code, you have a functioning connector (a minimal sketch of getAuthType() and getData() follows). Once you have this, you will have to deploy the code.
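
To round off the flow, here is a hedged sketch of getAuthType() and getData() in the same style. The https://example.com endpoint is a made-up stand-in for whatever API or database you are fetching from, and the snippet reuses the CONNECTOR_SCHEMA variable from the earlier sketch.

```javascript
// No authentication in this sketch; use OAUTH2, USER_PASS or KEY instead
// if your data sits behind a secure endpoint.
function getAuthType() {
  return { type: 'NONE' };
}

function getData(request) {
  // Return only the fields Data Studio asked for, in the order requested.
  var requestedSchema = request.fields.map(function (field) {
    return CONNECTOR_SCHEMA.filter(function (entry) {
      return entry.name === field.name;
    })[0];
  });

  // Hypothetical endpoint: swap in your own API call or database fetch here.
  var response = UrlFetchApp.fetch(
    'https://example.com/api/sales?account=' + request.configParams.accountId
  );
  var records = JSON.parse(response.getContentText());

  // Post-processing: map each record into a row of values matching the
  // requested schema (string-to-date conversions and similar tweaks go here).
  var rows = records.map(function (record) {
    return {
      values: requestedSchema.map(function (entry) {
        return record[entry.name];
      })
    };
  });

  return { schema: requestedSchema, rows: rows };
}
```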

That's it. Now, add this community connector to your Google Data Studio account, and build the reports you want!

Data, Reporting and doing what’s right

Data is being used to showcase that value has been generated. In order to do this, the most beautiful reports have to be eked out. Now, if you are a follower of Avinash Kaushik and don't like data pukes, then you would be aghast at some of the reports that agencies in India tend to dish out.

I was, and 13 Llama Interactive was born out of that need to do better at both data-driven marketing and reporting with transparency.

The road to hell is paved with good intentions

If you’ve been providing paid marketing services to clients for any extended period of time, you know that every person you work with has a different level of online marketing knowledge. Some people might be experienced account managers, others might know basics, while others still might not know the industry at all. It can be easy…

via 5 Agency Reporting Tips to Prove Your Value to Clients — WordStream RSS Feed

Apparently "agency reporting" is a thing. This is where, every week or every month, the agency handling the brand account (or the performance account, if you may) sends across reams of PDFs (or Excel sheets) meant to prove that whatever harebrained plan they cooked up the previous period has worked.

The most common method of justifying their existence is to keep throwing boatloads of data reports from all the tools and then talk about worthless metrics. Each of the tools mentioned in the article I have shared helps agencies do this at scale, effortlessly.

Is too much data a bad thing?

It can be. If all that data is leading to Analysis Paralysis… or if it leads to falling in love with data analysis itself while forgetting real business outcomes (the reason you got the money to fund the collection of all that data in the first place).

If no one is using this mountain of data for solving problems, then it’s better that the data not be collected at all.

Yes, you are letting go of possibilities, but so be it. The damage to the business from wasting resources on gathering more liabilities instead of assets is much worse.

That’s what creates a paradox. Should we or shouldn’t we collect data?

Here’s a great video from Superweek that makes the case pretty well.

Data the new Oil

Any analytics team will work day and night to justify the reason for its being. There are enough articles being shared on the internet about arriving at a Return on Investment for Analytics (RoIA). However, the main service any of these teams provided was to crunch business data into A-has. This hasn't changed over the years, and a lot of analysts derive job satisfaction from this very hunt for the A-ha! from their audiences.

The switch to being a core business

Data and business analysis has until now been a support function, one that needed business data in order to thrive and be effective. Aside from a very few models (those that sold business-critical data such as ratings, organizational data, etc.), the data was never used as the primary product.

There was always a pre-activity and an analysis activity required for that data to be useful. Over the years, however, I have seen that change. Data is now being presented and sold as the main product.

Data as the product

Those of you who know Bloomberg, Hoovers, S&P or CRISIL would know that the data-as-a-product business model works. Now that you know the pattern, let's take a look at how this business model works.

Data collection as an ancillary service

There is one function of the business which works with the entire industry it caters to in order to collect data. More often than not, this is made available as a freemium or free service.

Some examples of this would be – Alexa Certified metrics, Google Analytics, Walnut app, Swaggerhub, etc.

You get the general idea here. If a good product or service is offering you a free plan, more often than not the data you enter on that platform will be used for multiple use cases, not just your primary one.

Data aggregation and visualization

This is akin to the marketing function, and it most probably gets a lot of early adopters saying good things about the product.

E.g. a blogger singing paeans about Google Analytics, an industry benchmark visualization being shared, a data report about a competitor, etc.

This way, the inherent value in the data is presented.

Data access and pricing plans

This is how the business monetizes the data: by selling access to it, often on a pay-per-use or per-data-point basis. Note that there might be multiple reports given to the user; however, the user has to do the analysis on their own.

E.g. SEMrush, SimilarWeb, Alexa, etc.

Wait, these are all old products

Yes. They have been around for quite some time. However, I am seeing that other industries are copying this model too. I recently spoke to someone in the pharma industry who was selling aggregated prescription data to pharma companies.

The credit industry has been doing this for many years; TransUnion is a perfect example. In India, most working professionals are familiar with their CIBIL score. What few people realize is that CIBIL is a TransUnion company. Similarly, the CRIF score (which is an alternative bureau) belongs to Experian.

What gets my goat in this scenario is that the firm collecting the data is based out of another country! This firm now claims to own and know the data of citizens belonging to another country.

Shut up and take my data

Let's go back 300 years or so. The British killed the Indian textile industry by mutilating the weavers who used to make cloth. Then they bought cotton and other crops at throwaway prices; that raw cotton is similar to the data that is being collected today. The mill-made cloth that was then imported back into India is similar to the data aggregation and reports that are being sold.

The only difference is that 300 years ago, we were scared of the East India Company. This time around, we are welcoming the data traders with open arms. Should we not be a bit more aware of who is using our data, and how?

The reason why the EU is taking such a harsh stance with GDPR becomes a bit clearer. Where is our call for privacy and better data-sharing protocols?

🎯 Why You Need to Stop Tracking These 5 Metrics

This article was written as part of the SEMrush Big Blogging Contest.

One of the things that going digital does for any brand is that it suddenly gives it access to a lot of data. Data that opens up a world of possibilities.

Possibilities which had not earlier been anticipated or even thought of. Somehow, it propels teams to start thinking in terms of achieving certain data metrics … and that seems to justify the sheer obsession with data.

Continue reading “🎯 Why You Need to Stop Tracking These 5 Metrics”

Blind spots in Analytics

April 10, 2018. Dark social, even though we can’t see it or know what it is, is here. And we should fear it.

via Dark Social is Dangerous — Gareth Roberts

I read through the post and realized that the title is a bit off. It's not that social media is sending dangerous traffic, but that the traffic being sent is incorrectly measured as Direct traffic and is therefore difficult to act upon. This misdirection can lead to a lot of tactical mistakes.

What's more interesting is the story about World War II that Gareth has nicely illustrated. The deaths during a D-Day rehearsal were more than on D-Day itself, and the reason behind this was people coming to the wrong conclusions because of the data made available to them.

A light skim of that article might have put me off social media as a marketing channel. As it is, I am a bit biased against it, and this would have put the final nail in the coffin. However… this is the blind spot that I am referring to.

A bit of misinformation, and there we go, jumping to the wrong conclusions. As an analyst, something you might want to keep in mind is the quality and the veracity of the data that you analyze.

More information about Samara Oblast

I had blogged about getting traffic from bots leaving referral signatures, and it seemed as if the whole internet saw this happening on their sites. After I published that post, Moz.com came out with suggestions on putting filters in Google Analytics to clean up your analytics data.

Continue reading “More information about Samara Oblast”