Lightweight Alternatives to Google Analytics | Hacker News

I have my own self-hosted Matomo instance [1].

Via Docker & docker-compose it’s quite easy to install and keep up to date, and Matomo is open source, well maintained, very well behaved and pretty hands-off.
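
Roughly the kind of docker-compose.yml I mean: the official matomo image plus a MariaDB container (service names, volumes and passwords here are just placeholders; check the image docs for the exact options):

    version: "3"
    services:
      db:
        image: mariadb:10
        environment:
          MYSQL_ROOT_PASSWORD: change-me
          MYSQL_DATABASE: matomo
          MYSQL_USER: matomo
          MYSQL_PASSWORD: change-me-too
        volumes:
          - db-data:/var/lib/mysql
      app:
        image: matomo
        ports:
          - "8080:80"
        volumes:
          - matomo-data:/var/www/html
        depends_on:
          - db
    volumes:
      db-data:
      matomo-data:

The web installer then asks for the database host, which in this setup is just db.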

And I configured it on my websites with cookies turned off [2] and with IP anonymization [3]. In such an instance you don’t need consent, or even a cookie banner, because you’re not dropping cookies, or collecting personal info. Profiling visitors is no longer possible, but you still get valuable data on visits.
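
For reference, the tracking snippet then looks roughly like the standard Matomo one with disableCookies pushed before trackPageView (the host and site ID below are placeholders; the IP anonymization part is a server-side privacy setting, not something in the snippet):

    var _paq = window._paq = window._paq || [];
    _paq.push(['disableCookies']);        // no first-party cookies are set at all
    _paq.push(['trackPageView']);
    _paq.push(['enableLinkTracking']);
    (function() {
      var u = 'https://analytics.example.com/';   // placeholder Matomo host
      _paq.push(['setTrackerUrl', u + 'matomo.php']);
      _paq.push(['setSiteId', '1']);              // placeholder site ID
      var d = document, g = d.createElement('script'),
          s = d.getElementsByTagName('script')[0];
      g.async = true;
      g.src = u + 'matomo.js';
      s.parentNode.insertBefore(g, s);
    })();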

Note that if you want to self-host Matomo, you don’t need more than a VPS with 1 GB of RAM (even less would do, but let’s assume significant traffic), so it’s cheap to self-host too.

And I disagree with another commenter here saying Analytics is just for vanity. That’s not true — even for a personal blog analytics are useful to see which articles are still being visited and thus need to be kept up to date, or in case content is deprecated, the least you could do is to put up a warning.

And if you write that blog with a purpose (e.g. promoting yourself or your projects) then you need to get a sense of how well your articles are received. You can’t do marketing without a feedback loop.

[1] https://matomo.org/

[2] https://matomo.org/faq/general/faq_157/

[3] https://matomo.org/docs/privacy/

> And I disagree with another commenter here saying Analytics is just for vanity. That’s not true — even for a personal blog analytics are useful to see which articles are still being visited and thus need to be kept up to date, or in case content is deprecated, the least you could do is to put up a warning.

Some examples: I maintained a Vim ChangeLog for a while (which is quite some work), and it turned out no one was reading it, so … why bother?

In another case, I wrote an article about “how to detect automatically generated emails”. I thought it wasn’t actually that interesting, and no one read it, so I considered archiving it. But it turned out quite a few people end up there through Google searches etc., so I updated it instead of archiving it, as it was clearly useful to people.

Three times I found out that a blog post of mine was on the HN frontpage because my Matomo was going bonkers. This might count as “vanity metrics”, but above all it taught me how valuable “being here” is. It has a clear and undeniable “ink-stain” effect: other blogs, news outlets and so on follow HN. For my niche, though, HN goes hand-in-hand with Reddit.

My wife has several e-commerce sites, and our weekly “walking through the Matomo screens over a glass of wine” has taught us that there are important niches, and what those niches are (vegan smartphone covers, Fairphone flip cases, fairtrade and environmentally friendly mouth masks).

Sure, you also need customer interviews and old-fashioned market research, but your webapp and website are telling you a lot about your users.

And sure, often, you don’t need any metrics. But just like Carpetsmoker above, I see a lot of value in metrics. Just don’t fall into the trap of collecting “you-never-know” metrics: that is privacy-invading, a liability, and requires a scale beyond anything you really need. You don’t need a data lake, distributed ETL processes and whatnot to find out that there are products in your webshop selling better than others because you are doing well on natural searches for that product.

I self-hosted Matomo for a year and a half (and took over the AUR package for it and improved it in the process). It was no trouble to run, but I ended up uninstalling it late last year, for a few reasons: its interface is painfully slow (and that’s nothing to do with my 1GB/1 vCPU VPS—I’ve interacted with a decent-sized instance at innocraft.cloud and it was similar), and I seldom looked at it, and I couldn’t think of any way in which anything I found in the analytics would change my behaviour, and server-side analytics are good enough (better in some ways, worse in others), and I value speed. So all up, I figured: why am I slowing all my users down with this 50KB of JavaScript (of which I frankly need less than 1KB), and why am I keeping this software going?

So now I pull out GoAccess (which reads the server logs) from time to time. I find that my Atom feed is the vast majority of traffic to my site, which Matomo couldn’t tell me. I should implement pagination on the feed and see if that helps. (Or limit the number of items in the feed, but conceptually I rather like everything being accessible from the feed. Wonder how many feed readers support pagination?)

My websites are behind Cloudflare (and if not Cloudflare, I’d use another CDN), so I don’t have logs.

Also I disagree about the slowness.

The script is loaded asynchronously, it does not block the page, and I measure my loading times, which are really good actually. Just did a measurement and my front-page loads in 271 ms and this includes all network requests, including Matomo.
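
For the curious, that kind of number is easy to reproduce in the browser console with the Navigation Timing API (a rough sketch; run it after the page has finished loading):

    // total time from navigation start until the load event finished
    const [nav] = performance.getEntriesByType('navigation');
    console.log('page load:', Math.round(nav.loadEventEnd - nav.startTime), 'ms');

    // per-resource timings, e.g. to see what the tracker script itself cost
    for (const r of performance.getEntriesByType('resource')) {
      if (r.name.includes('matomo')) {
        console.log(r.name, Math.round(r.duration), 'ms');
      }
    }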

I don’t think this is a real concern, but rather a premature optimization. If GoAccess works for you, great, but that’s not something I can use due to the CDN.

The painfully slow interface I’m speaking of is Matomo’s app, the part you as the site administrator look at; not piwik.js or whatever they call it now.

But since you’ve raised the script part, 50KB of JS loaded from a new host is perhaps a surprising amount of work, especially on slower devices. I find the difference between running no JavaScript at all and running Matomo’s client script, even asynchronously, to be easily visible.

> Just did a measurement and my front-page loads in 271 ms and this includes all network requests

Is your audience just the people in your locality, or the entire world?

Exact same reasoning here. I was burdening my users with slower load times, for something that didn’t ever impact my “product” decisions. For any meaningful analysis, I always pulled up server side logs of things.

So I changed to GoAccess too. I don’t check it too often, just when I want to see the impact of some spam/publicity posting around.
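
For reference, a one-shot run against the access log is all I need, something like this (the --log-format value depends on what your server writes; there is also a real-time HTML mode if you want a live dashboard):

    # parse a combined-format nginx/Apache log and write a static HTML report
    goaccess /var/log/nginx/access.log --log-format=COMBINED -o /var/www/html/report.html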

Is pagination supported in feeds?

It would be very useful. Like you said, it’s nice when everything is accessible from the feed.

  

Deliberate semantics were defined for this as part of AtomPub, https://tools.ietf.org/html/rfc5023#section-10.1 (before that, it made sense that it would mean this because of the relations registry, but nothing had been defined). It’s clearly applicable to Atom syndication in general, but it’s definitely more useful to AtomPub. I have no idea how wide client support is.
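
Concretely, a paged feed just carries the standard link relations inside the feed element, something like this (URLs are illustrative):

    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>Example blog</title>
      <link rel="self" href="https://example.com/feed?page=2"/>
      <link rel="first" href="https://example.com/feed?page=1"/>
      <link rel="previous" href="https://example.com/feed?page=1"/>
      <link rel="next" href="https://example.com/feed?page=3"/>
      <!-- ...entries for this page... -->
    </feed>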

> And I configured it on my websites with cookies turned off [2] and with IP anonymization [3].

Do you have a way to filter out your own visits in this case? On small pages I find that my own clicks and events during testing contaminate the statistics.

> And I configured it on my websites with cookies turned off [2] and with IP anonymization [3]. In such an instance you don’t need consent, or even a cookie banner, because you’re not dropping cookies, or collecting personal info. Profiling visitors is no longer possible, but you still get valuable data on visits.

Does this mean each page hit cannot be linked to any other? For example, can I see that a visitor viewed a particular sequence of pages?

Just spitballing here, but could you use the Referrer header to track sequences of pages?

> And I disagree with another commenter here saying Analytics is just for vanity. That’s not true — even for a personal blog analytics are useful to see which articles are still being visited and thus need to be kept up to date, or in case content is deprecated, the least you could do is to put up a warning.

I mean, I’m not making the argument that analytics are useless, but this seems like the worst possible example. You can do this trivially with a script to analyze your server (e.g. Apache) logs. And you don’t need “a VPS with 1 GB of RAM” for that – which is four times the RAM of the VPS my personal website has run on for the last half decade.

This approach also uses no client-side javascript to collect data, so you wouldn’t have to alarm users with potential privacy threats, because nothing is stored other than what’s in the HTTP headers.
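
To give a sense of “trivially”: something like this against a combined-format access log already gives per-URL hit counts (the log path is an example, and bots aren’t filtered here):

    #!/usr/bin/env node
    // Count hits per requested path from an Apache/nginx "combined" access log.
    const fs = require('fs');
    const readline = require('readline');

    const logFile = process.argv[2] || '/var/log/apache2/access.log'; // adjust to your setup
    const counts = new Map();

    // combined format: IP - - [date] "METHOD /path HTTP/x.x" status size "referer" "user-agent"
    const requestRe = /"(?:GET|POST|HEAD) (\S+) HTTP/;

    const rl = readline.createInterface({ input: fs.createReadStream(logFile) });
    rl.on('line', (line) => {
      const m = line.match(requestRe);
      if (m) counts.set(m[1], (counts.get(m[1]) || 0) + 1);
    });
    rl.on('close', () => {
      // print the 20 most requested paths
      [...counts.entries()]
        .sort((a, b) => b[1] - a[1])
        .slice(0, 20)
        .forEach(([path, n]) => console.log(String(n).padStart(8), path));
    });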

For a lot or maybe even most people, all of that entails some mix of: possibly having to set up server logs in the first place, redirecting them somewhere for analysis and possibly archival, having access to a server in the first place, hooking up the analytics apparatus to consume the logs yet also be able to serve a dashboard, setting up yet another moving part, finally getting around to that “TODO: setup real logging instead of println()” issue, etc.

There’s a lot taken for granted there. It’s no wonder copying and pasting a JavaScript snippet is the more popular option.

There may actually be quite a lot of PII in the HTTP headers: IP address, Accept-Language, and User-Agent being the most obvious (especially mobile browsers often send a ridiculous amount of PII in the User-Agent header).

There’s also some very useful information that’s hard to get from HTTP headers, screen size being the most obvious one.

I don’t think it’s fundamentally more privacy-friendly. The real problem with JavaScript trackers from GA and ad networks is that they’ll try to profile you with tricks like font metrics, audio API, set cross-domain cookies, and whatnot. But that doesn’t apply to either Plausible or GoatCounter (and in the case of GoatCounter, the count.js script is intentionally unminified so it’s very easy to see what exactly it does if you care to do so).

GoatCounter runs fine on constrained environments; it has a RES memory size of about 30M, and for the first few months goatcounter.com ran on a $5/month VPS which was mostly just sitting idle.

Right, I found the official docker image https://github.com/matomo-org/docker

But I was hoping for a simple DIY guide for setting everything up? I mean, with Google Analytics a dummy can set it up very quickly. I know it’ll take more work with matomo, but I need a little more detail than just “use docker”.

I know I need a VPS, something like Digital Ocean.

I know I need the matomo docker image.

Sorry for misleading you. You don’t need docker to run matomo (docker makes it convenient to install matomo – especially if you are running other containers on your server – but there are shortcomings, mainly the container setup).

Matomo is just a set of php files. Upload them to any php hosting with a mysql database, into a matomo or matomoanalytics folder, point your browser to your-domain-name/matomo/, and the install setup should begin.

If you have 0 experience with docker and just need matomo, then forget about docker and start from the php files with a standard php/mysql host.

Matomo needs a database server (MySQL) and a way to execute cronjobs – that’s about it. There are some gotchas if you’re running in a HA setup or if the database runs on a separate server.
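
The cron job in question is the report archiver; an /etc/cron.d entry along these lines (paths and schedule are examples, core:archive is Matomo’s console command):

    # re-archive reports every hour, run as the web server user
    5 * * * * www-data /usr/bin/php /var/www/matomo/console core:archive --url=https://matomo.example.com/ > /dev/null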

I have a full deployment of Matomo in Kubernetes on gitlab [1] if anyone would find it useful (includes correct settings for running in multiple pods).

1: https://gitlab.com/pcgamingwiki/webanalytics/-/tree/master/

I haven’t used the docker image, but Matomo is PHP in its simplest form.

You unzip it. Click through a setup wizard. Done.

(I also run in readonly noexec in an fpm chroot but that’s not necessary.) I set it up for a couple of clients since the Piwik days and it’s been pretty much set and forget, apart from the occasional upgrades.

There is plenty of advanced functionality which probably few people understand and use. For my personal projects I am fine with log analytics, which I mostly use goaccess for.

I am not familiar with matomo, but aren’t those instructions enough? You go to DO, create a new droplet from that image, and done? I assume once you have it installed you will get more info on how to add the tracker to your site from their interface?

I turned off Google Analytics, because I realized that it doesn’t actually report any useful or actionable data, just vanity metrics, and many of them of dubious quality.

I run a SaaS and what matters for me is paid subscriptions. “Visits” (even if by humans, which is hard to tell) really do not matter much. Yes, I do want to increase conversion rates, and run bandit experiments, but I’m better off doing that myself.

What also matters are search terms, but Google’s search console (or tools, or whatever it’s called this week) provides that.

Turning off Google Analytics was hard to do psychologically — the Fear Of Missing Out is strong. But it turns out I’m not missing out on anything, except some dubious vanity data. And I’m making the web a better place in the process.

Analytics may give you the number of paid subscriptions but that is not the reason it exists. Modern analytics systems are built to give you a statistical view of all events/interactions leading up to a conversion or, more importantly, to a conversion that did not happen. With this information it is possible to optimize all touchpoints a user has with your services.

It is easy to track all aspects of subscriptions, including recurring revenue. The data quality depends on the data you send to google analytics and is not “dubious” but rather your own responsibility. And google analytics is really good in separating bots from human visits.

If google analytics did only report vanity metrics to you, you most likely did not use it the right way. Maybe you missed the segmentation tools to find groups of users for whom your service did not work out?

> google analytics is really good in separating bots from human visits

That certainly wasn’t my experience.

But more generally: I optimize touchpoints using automated Bernoulli bandits with multiple variants. And I track the metrics that really matter (like signups, MRR, churn, etc) very, very carefully. My point was that just adding GA doesn’t bring much value, and makes the web worse.
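
Roughly the kind of thing I mean, stripped down to a generic Thompson-sampling sketch (variant names and the conversion event are placeholders; this is an illustration, not my production code):

    // Thompson sampling for a Bernoulli bandit: each variant keeps a
    // Beta(successes + 1, failures + 1) posterior over its conversion rate.
    // For integer a and b, a Beta(a, b) draw is the a-th smallest of
    // (a + b - 1) independent uniform(0, 1) samples.
    function sampleBeta(a, b) {
      const u = Array.from({ length: a + b - 1 }, Math.random).sort((x, y) => x - y);
      return u[a - 1];
    }

    class BernoulliBandit {
      constructor(variantNames) {
        this.stats = new Map(variantNames.map((v) => [v, { success: 0, failure: 0 }]));
      }
      choose() {                      // pick the variant whose posterior sample is highest
        let best = null, bestDraw = -1;
        for (const [name, s] of this.stats) {
          const draw = sampleBeta(s.success + 1, s.failure + 1);
          if (draw > bestDraw) { bestDraw = draw; best = name; }
        }
        return best;
      }
      record(name, converted) {       // record whether the shown variant converted
        const s = this.stats.get(name);
        converted ? s.success++ : s.failure++;
      }
    }

    // const bandit = new BernoulliBandit(['headline-a', 'headline-b', 'headline-c']);
    // const variant = bandit.choose();  ...later:  bandit.record(variant, didSignUp);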

Another example of how page visits don’t matter: it’s easy to get a huge spike of HN users clicking through. But if my SaaS has no relevance to HN users (except as a technical curiosity), this doesn’t matter at all. It won’t change my revenue, so it’s irrelevant.

Unless you run a site with ads, focusing on page visits doesn’t make sense: it’s like measuring the performance of a supermarket and getting excited about increasing traffic on a nearby highway. Could it influence your sales? Possibly. Is it actionable? Nope.

It’s more like the supermarket keeping track of what aisles people are going down and what items they are looking at. Which supermarkets would do if they could because it’s extremely valuable data.

> Which supermarkets would do if they could

They can, and they do. Here[1] is a random example of applying that to a supermarket (the other pages on that vendor’s site shows some of the other ways the tech can be used).

Solutions tend to use a mix of video camera data and cellular data (wifi and bluetooth beacons[2]) to track in-store movements and behavior.

Done well, they can also connect individual store journeys to specific point-of-sale transactions, and understand how variations of in-store behavioral patterns ultimately influenced actual transactions. Which opens the door to running in-store optimization experiments very similarly to how you’d run A/B optimization tests for websites/apps.

[1] https://retailerin.com/en/retailerINfor/customer-behaviour-a…

[2] https://www.thinkwithgoogle.com/marketing-resources/retail-m…

GA used to be good in separating the bots from the humans, but now I don’t think so. I see a huge number of visitors from Dublin that seem to be AWS-sourced bots (and Ashburn; both have bounce rates like no other city). Millions of visits, from a place where we would expect thousands.

GA’s “Filter bots” setting simply applies the IAB bot filtering list[1]. The resources required to operate a page rendering bot[2] used to be a high enough bar that the ones doing so were large actors and would make their way onto that list pretty quickly.

The IAB list is still a helpful baseline (if overpriced if you want to lease the list itself[3]), but all it’s doing is applying suppression based on things like known bot IP ranges and user agents. It’s far less effective than it used to be since it’s so incredibly easy and cheap nowadays for anyone with an interest to spin up a rendering bot. Now you’ve got to supplement it actively with your own set of filters and heuristics if you really want to get rid of bot traffic polluting your data.

[1] https://iabtechlab.com/software/iababc-international-spiders…

[2] Scraping bots were cheap and easy to run before, but didn’t render the GA javascript code so never showed up in analytics. It’s only bots that use something like a headless browser to render the page that show up in GA, and those have only become commoditized and cheap/easy relatively recently.

[3] Google applies the IAB list to your traffic for free if you check the setting for it, but if you want to use the IAB list yourself you have to pay $4k – $14k annually to lease it from IAB.

The bots got better and harder to detect.

You can actually set analytics to exclude traffic from a certain ISP or set of IPs on the View’s Filters page. There is some legit traffic that comes from data centers, but only a tiny fraction.

I also don’t think google analytics is that useful. It’s one of many possible tools for user tracking, and far from the best one. Though it’s “free”, so I guess that’s why it’s more popular than much better tools.

But the fundamental problem is that analytics provides data and information, when what people want is knowledge and wisdom. But you need to do the work to get it. No amount of analytics is going to tell you that your Facebook ads are underperforming their potential because your buy button is hidden by an overlay in the built-in Facebook browser, especially if Facebook is your best performing channel overall. Analytics can’t tell you what’s not there. There’s no substitute for having developers and marketers actually purchase your product, themselves, on the channels your customers use. And too many companies don’t do that.

Just a hypothesis I have, that most e-commerce companies are leaving millions to tens of millions of dollars on the table by not giving all their employees a corporate credit card and having them purchase their product on it, say, once a month. They’re losing out on far more than they’d end up paying for the few instances of fraud.

For a personal project of mine, everyone basically ignored every post I made about it. I used server logs to generate analytics and noticed people hit the first page and left (I assume they spent seconds and left).

I then broke up the main page into many smaller pages. I noticed people still leaving right away, but my documentation page got more overall clicks since the main page didn’t massively overload them.

I guess analytics don’t give you answers, but if you know what pages people tend to click on and what happens when you make a page simpler or heavier, you can figure out a solution.

However, I didn’t end up finding a solution and I’m planning to rewrite my project. I have a better idea of how I should introduce it to people next time.

I feel the same way… mostly. I want one thing from GA and that is user flow. I’ve seen this first-hand help a startup run landing page experiments and very quickly improve their conversion rates. (The startup was selling a luxury consumer product.)

This won’t work for everyone and will be of little value to many. However that tool just doesn’t seem to exist on most if not all of the lightweight alternatives. Matomo is the only alternative I have seen implement this feature, though those with more experience with the alternatives will hopefully show me I am wrong on that.

The main product I work on does not use google analytics, but we do use mixpanel to see what our paid users are actually doing with the SaaS. We actually don’t care who each user is, we just want the aggregate data. We believe this data is important for retaining paid subscriptions and attracting new ones. Let me explain.

There’s one feature that 100% of our subscribers use. I mean, it’s the main thing we do and everyone who subscribes needs that function. Even without analytics, we know that we have to continually make that feature faster and smarter to stay ahead of our competition. We know that if we fall behind our competitors, our subscriptions will dwindle.

But, then we have a bunch of other features that help customers solve some edge case problems around that main thing. It’s very important for us to know which of those functions are being tried, used, and reused (or not). Not all customers use these features, but some are tools we have that nobody else in our space does. So, having insights on which ones are getting traction and which ones need improvement help us spend our marketing and engineering time better to attract new customers and retain our existing ones.

I can’t imagine not having any analytics. I feel like we need them to continually make small course corrections that ensure we’re providing the best value to our existing customers.

> We actually don’t care who each user is, we just want the aggregate data.

> It’s very important for us to know which of those functions are being tried, used, and reused (or not)

Aren’t these 2 at odds with each other, or else how can you tell when the same person re-uses a feature? Surely you need some kind of user identifier for that?

Yes, good point, I should have been more clear. You’re right that we track unique users, but we do so in abstract. I suppose with some work we could pull together enough data from different sources to determine who a particular user was in a particular session. I just meant that we don’t do that and it’s not easy for us to do that because we don’t care about that type of data. We only care about what each abstract user does in the sense that we want to know the aggregate of how many users did that thing.

Additionally, unlike google analytics, we do support the browser’s “do not track” flag. So if a user doesn’t want to be tracked at all, we completely respect that.

So far GA is answering two important questions for me – which marketing strategies are actually working (because it’s hard to tell when you’ve got multiple going at once), and also making sure my marketing is actually hitting the geo-regions I need it to.

That said though once I know which marketing tools are effective, there’s nothing more that GA does that CloudFlare couldn’t just tell me anyway (i.e. am I getting more or less traffic) and I’ll probably drop it as it’s one less dashboard to look at – like you said that conversion to subscriber _is_ the ultimate metric for success.

This was my experience as well. Early on, when I was still learning, it helped me learn the importance of extremely low Time-To-First-Paint. But now, I just default to good designs that are easy to read. There’s nothing new to be learned that GA could provide insight on.

Also, early on, I found the Referrer tracking to be useful for discovering the reach of my projects and to help me get into conversations with users on other sites to help them with using my software. But that feature of GA eventually became useless when Google did nothing to address Referrer spam.

> I realized that it doesn’t actually report any useful or actionable data

The actionable part never occurred to me, but makes so much sense. What action could anyone really take, based on the data presented by Google Analytics? Off the top of my head I can’t actually think of anything you couldn’t easily get from server logs.

Examples I saw on a website I launched last year:
– High bounce rate showed that people did not find the content engaging / not what they were looking for, confirming a hypothesis I had; this was particularly true for certain sections of the website.
– Time spent on pages was another metric that was helpful to find problems in some sections.
– Easy slicing of traffic sources by referrer and country showed me that most of my “good” traffic was coming from facebook, so I invested more time there.

To optimize the bounce rate metric even further, a funnel can be created and measured. This allows you to see bounces for each part of the funnel up to your conversion(s). It is often really valuable to see how a change on your website improves one part of the funnel but impairs another one, and then to think about the reason why this happens.

There are millions of possible answers to this question… Examples:

– Conversion rate grouped by browser and browser version to find browser specific bugs.
– Total revenue per user per marketing channel / search keyword / … to optimize budget allocation.
– Revenue by mobile OS version share to decide testing procedures to not optimize for users that don’t contribute to your bottom line.
– Client side loading times per user location and provider to optimize infrastructure placement.
– …

One of the reasons Google Analytics (and logging solutions like it) is popular is that people don’t have access to server logs.

Server side logs aren’t an accurate way of doing analytics and bring up compliance challenges (storing access logs and using them for purposes other than security etc.)

> Server side logs aren’t an accurate way of doing analytics …

Based on what? Server-side logs record what the server itself sent. That sounds like it should offer far more accuracy than what a JS-only solution would do.

That being said, if the code for analysing the server logs is lousy, it’s not going to help. 🙂

The identifier in server logs is IP address, which is useless for distinguishing users. On the corporate side, NAT hides all the visits of many people behind a single IP. On the consumer side, ISPs change IP addresses all the time so a single person will show up as multiple IP addresses.

Note that I’m just talking about distinguishing people, not identifying or tracking. A basic question like “how many people came to my website today” is more accurately answered with client-side analytics than by analyzing web server logs.

> A basic question like “how many people came to my website today” is more accurately answered with client-side analytics than by analyzing web server logs.

How can client side analytics be more accurate, when ad blockers stop visits from even being registered by client-side analytics?

Before looking at accuracy, you already have compliance challenges with logging IPs / user agents and using it for analytics purposes.

Please don’t try and change the goal posts. You said:

> Server side logs aren’t an accurate way of doing analytics…

What is your thinking behind that statement, as it sounds incorrect?

Can you talk a little bit more about why server side logs aren’t accurate for analytics? I was planning on doing basic analytics (pageviews mostly) using Cloudfront access logs stored in an S3 bucket and queried through Athena for a site I’m working on which uses the AWS stack.

I know Cloudfront logs can sometimes drop, but is there a more important reason you’re talking about?

I think with server-side only it is harder to filter out bots and crawlers, and to get accurate bounce rates or session lengths.

Popular. Because adding lines of javascript is much easier than processing server log files.

I find time spent on page is a great way to measure the performance of page redesigns. There are numerous actionable points.

It might be, assuming two things:

1) That you actually care about this metric. I don’t, I do not get paid by the number of minutes spent on pages, I get paid by the number of signed-up subscribers who use my software to make their workflow easier. I can (and prefer to) use bandit testing to measure the performance of redesigns.

2) that it can be reliably measured, which I don’t think it can.

We are a SaaS company also, and time spent reading manual/tutorial pages is important to us.

With regards to point two, “being reliably measured” sounds to me like perfection being the enemy of adequate. Perhaps in low volumes you can’t measure certain stats like this reliably, but in large volumes I think it’s useful.

Tom Gilb once said that “everything can be measured so accurately that the result is better than having no measurement at all” or words to that effect. I’m sure he phrased it better. In this case, some users aren’t measured, which drags down the accuracy of the measurement.

Even if the numbers are off by quite a few per cent, I can easily see how some site operators might benefit from knowing e.g. that visitors close one tutorial much quicker than the others.

As long as your measurement is a random sample everything is okay. Even if it is not, it is much more information than you had before. You just need to keep it in mind when evaluating conclusions. We are not talking about drug trials here and no one dies if the measurement is not 100% accurate.

I am able to measure everything 100% accurately. But this is really, really expensive. It’s a trade-off.

Unless your page is google.com, where more time spent on page means the experience is worse, not better.

Context is important, and it’s one metric amongst others that should be used.

The point is, there are clearly actionable points GA can offer. It’s disingenuous to think there are not.

I agree, saying that there are no actionable points in GA is like saying analytics in general are useless.

Counter example: I search for information on a subject. I get many interesting and relevant results and therefore spend more time on the page.

This makes it even more obvious that a longer or shorter session length is not clearly better or worse, it depends on a lot of other things.

Hook into onbeforeunload and call your own javascript to post back to your own site to let you know when they’ve left.
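
Something like this; in practice navigator.sendBeacon fired from pagehide is more reliable than a plain request from onbeforeunload, which browsers often cancel ('/collect' is a placeholder endpoint on your own site):

    // report time-on-page when the user leaves; sendBeacon queues the
    // request so it survives the page being torn down
    const start = Date.now();

    window.addEventListener('pagehide', () => {
      const payload = JSON.stringify({
        path: location.pathname,
        secondsOnPage: Math.round((Date.now() - start) / 1000),
      });
      navigator.sendBeacon('/collect', payload);
    });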

Well, you don’t specifically need GA for it, but tracking referers to sales, for example, is often actionable. Similar for a funnel view to see where visitors drop off. Especially combined with some A/B testing.

Good for you! This is the right method. Analytics — in the sense of watching mouse clicks and reference urls — are only trying to close the feedback loop FAR sooner than when it actually matters.

Depends how big you plan on getting. It’ll be infinitely easier to hire someone who knows how to use GA to make decisions, and historical data will be useful at that point.

Do you not find value in adding events to your pages so you have user funnels for conversions? Or do you think bots cause too much noise for this?

I run a reasonably large website and about two years ago it dawned on me that I never checked Google Analytics. It was completely useless. It wasn’t telling me anything useful. I also knew that it was marginally user hostile (or at least perceived as such) and affecting page performance, even if only slightly.

Removing it felt momentous and insane. But in November 2018 I finally plucked up the courage and removed it. The crazy thing is, until this article appeared on the top of Hacker News reminded me, I had completely forgotten that I had removed it. Far from the world ending, it turned out to be the most inconsequential thing imaginable.

(I remember poring over web server logs in Analog and AWStats 15+ years ago. Now I honestly can’t remember why. I think it was some combination of vanity… and because everyone else was doing it. I suspect for most web developers GA was just the natural evolution of that muscle memory.)

GA and AWStats are both awful products for a lot of people. For us, we check our Fathom dashboard daily to see referrers and popular content. And virality (right now we can see a ton of traffic coming from HN). When I used GA, I never checked it.

I’ve looked at many reporting tools, most of them are probably great for corporate/enterprise stuff.

I’m self-employed, so I have no boss or shareholders that need pretty reports with bar charts. In my case my site is deeply database driven and I can build engagement statistics directly from real data using complex SQL queries.

And while there’s only a few such ‘reports’ that I check regularly, most of them are temporally incongruous—I think that’s how you’d describe it—in that they look at what happened in the past contextualised by what’s known in the present. (E.g. tracking engagements from new/irregular users, while they were new/irregular users, but which subsequently became regular users.)

Well that’s a different story then. It sounds like you measure things your own way, so I agree that analytics are pointless in your edge case.

For us, we have generated a lot of revenue by measuring what works and what doesn’t. That’s why analytics are worth it for a lot of people.

Fathom being so quick to load and simple to use, I glance at it here and there. I had given up on trying to find a good ‘light’ way to navigate Analytics.

> GA and AWStats are both awful products for a lot of people.

Just wondering, what’s awful about AWStats?

Sure, it’s dated, and “analog” in the sense that it’s log-based, not JS. But it does not send tracking data to a third party, and it can be used offline.

I run fathom self hosted. I don’t love it and am looking for an alternative. But this is because they don’t update the self hosted version.

I get it that I’m asking for a free service, I just kind of wish they never offered it if they were going to ditch it. I don’t make money off my sites. I wish they had a less than X income version to self host. Oh well.

If you are making money and willing to pay I have a feeling Fathom is great.

You can check out https://usertrack.net, but it’s currently $99/life. I do plan to add a free version with some limited features (no heatmaps, no session recordings, no multi-domain, etc.), but the free version should be anyway better than most of the alternatives (fast interface, segments, quick user filtering, easy to install).

If you want, you can contact me on Twitter to tell me what the free version should contain for you to consider using it, or if the current price for the lifetime version is too high for you.

> No tracking

Personally, I think that Fathom strikes a good balance between privacy and usability, but it does still use tracking (or at least it did when I was looking at it a few weeks back) – the difference is that it uses fingerprinting instead of cookies. I think it’s implemented in a privacy-focused way, but it does look like they are ignoring some of the EU ePrivacy guidance, which explicitly states that consent should be obtained before using fingerprinting, even if PII can’t be reverse-engineered from the fingerprint.

As I say, I think their implementation makes a lot of sense, and even as a privacy advocate myself I think those particular pieces of ePrivacy guidance focused on fingerprinting are excessive. But the EU doesn’t seem to agree.

We’re not ignoring the guidance, it’s just such a grey area when it comes to PECR / ePrivacy. Even the ICO’s guidance, it talks about “cookie-like” technology. Our technology isn’t cookie-like. And our processing isn’t cookie-like either. We’ve had lawyers look at our documentation and all of them have said it’s a grey area.

You’ll know this but some people reading might not: Under GDPR, there are multiple legal bases for processing and we rely on legitimate interest. PECR / ePrivacy is the grey area for us and other services.

Having said all of this, we’re fortunately moving away from requiring any compliance at all… by avoiding the complexities altogether. We’re rolling out a refactor of our data collector over the next few weeks, and we won’t have to have these conversations about grey areas anymore 🙂 We’ve hired a top-tier privacy consultant and are going to be deploying a huge update, putting us at the top of the list for compliant analytics. Every single privacy-focused analytics service is in a grey area right now (some think they’re not but they are). We will be the first to move out of this GDPR / ePrivacy grey area dance.

As you say, you see the logic behind the implementation we had, but we’re dealing with politicians who don’t understand the difference between Google Analytics and privacy-focused analytics. And that’s fine, the work they’ve done has led to better privacy for everyone, so we appreciate them.

> We’re not ignoring the guidance, it’s just such a grey area when it comes to PECR / ePrivacy. Even the ICO’s guidance, it talks about “cookie-like” technology. Our technology isn’t cookie-like. And our processing isn’t cookie-like either. We’ve had lawyers look at our documentation and all of them have said it’s a grey area.

That sounds like you are trying to pick and choose the bits you want to hear 🙂

There have been several amendments since the original ePrivacy guidance. There is at least one such directive that is very explicit about fingerprinting specifically. It doesn’t use ambiguous language; it states clearly that consent is required for fingerprinting.

As I said, I personally think it’s just bonkers, and I think your service is absolutely in the spirit of the ePrivacy rules. But you can’t say the rules on fingerprinting are not clear.

I’m keen to see what you’ve got coming, as the only way I see to avoid consent is not to associate identifiers with users at all – so each page hit would be a completely independent object. Can you say anything about your plans here?

Like I say, we’ve had lawyers review our docs. Even the term “fingerprinting” has more nuance to it. Fingerprinting is used as a way to attempt to set a permanent cookie / identify an individual, and their actions. We don’t do this.

And we definitely agree that it’s bonkers.

I can’t say anything here until we’ve got our press release out.

> Like I say, we’ve had lawyers review our docs. Even the term “fingerprinting” has more nuance to it. Fingerprinting is used as a way to attempt to set a permanent cookie / identify an individual, and their actions. We don’t do this.

Ouch, I kind of wish you hadn’t said that, because it sounds like you’re straying dangerously close to weasel words and deliberately incorrect interpretations. Sorry if that sounds harsh, but what I’ve read is very clear.

As before I like your solution, and I think it’s absolutely in the spirit of privacy. But the guidance is really clear here, and gives examples of fingerprinting. Nobody said a fingerprint has to be a permanent identifier; as far as I recall, Fathom does use fingerprinting to identify individuals, so that a sequence of page views can be attributed to a single visitor. I understand that those fingerprints include a timestamp, and so are only valid for some time (2 hours, or whatever it is).
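
To make the mechanism concrete, the approach these tools describe is usually a hash over request attributes plus a salt that rotates every couple of hours; a generic illustration (emphatically not Fathom’s actual code):

    // Same visitor hashes to the same ID within a window, but cannot be
    // followed across windows (or across sites, since the hostname is mixed in).
    const crypto = require('crypto');

    const SECRET = process.env.ANALYTICS_SECRET;   // per-deployment secret
    const WINDOW_MS = 2 * 60 * 60 * 1000;          // e.g. a 2-hour rotation

    function visitorId(ip, userAgent, hostname, now = Date.now()) {
      const window = Math.floor(now / WINDOW_MS);  // changes every 2 hours
      return crypto
        .createHash('sha256')
        .update([SECRET, window, hostname, ip, userAgent].join('|'))
        .digest('hex');
    }
    // Only the short-lived hash is stored, never the raw IP or user agent,
    // which is enough to group a sequence of page views into one visit.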

Thanks for your input here, Gordon. It doesn’t sound harsh at all, you clearly care about privacy regulations and you’re trying to help. Ultimately, we had moved based on conversations with lawyers. But as I say, we are rolling out changes this week & next, so it doesn’t matter what we think about the regulation 🙂 And thanks again for the challenge.

Fathom started out with much fanfare as being Open Source, but as soon as their paid service gained enough traction they went proprietary only.

Can’t recommend a company which pulls crap like that. 🙁

Interesting take. For every star we had on GitHub we did 21 cents in monthly revenue (around $1,300 in total). We needed to turn Fathom into a viable competitor to Google Analytics. We can’t do that if we are working a few hours a week on it; we needed to be full time.

Here’s a comparison I’ve always looked at. Here are 2 solid database tools:

Table Plus – https://tableplus.com/
SQLite Browser – https://sqlitebrowser.org/

Table Plus is about 3 years old, SQLite is 6 years old. I have used both but Table Plus is 10000x better and more popular. Everyone talks about it and they innovate fast. They work full time on it too, $59 a license. They are so active, and have so many great features. Full time salaries help with that. SQLite has $102 in Patreon support. Table Plus makes that in 2 sales.

Look at Sequel Pro. One of the best products in the game but it never built a sustainable business model and it wasn’t economically viable to continue, and look at it now.

I’m happy to say that we’re achieving our goal every day. Our goal is to build a sustainable, profitable business (profitable businesses don’t typically close down) that we can work full time on every day, moving people away from Google Analytics. For every 1 negative comment we get, hundreds or thousands of people sign up for Fathom.

We’ve spoken about it here too: https://usefathom.com/podcast/opensource

> For every star we had on GitHub we did 21 cents in monthly revenue …

Who gives a crap? You guys started out as OSS and said it would continue to be. That turned out to be bullcrap, just looking for exposure.

> I have used both but Table Plus is 10000x better and more popular.

sigh Table Plus supports many different types of database. If people are looking for a macOS specific solution that does that, Table Plus might be a decent choice.

“SQLite Browser” on the other hand is specific to a single database. Though it is cross-platform and fairly popular in its niche.

That being said, where are you getting “more popular” for Table Plus from though? Measured by what things?

> SQLite is 6 years old

Huh?

> SQLite has $102 in Patreon support.

You seem to be meaning DB Browser for SQLite.

Do we seem like a commercial project to you?

We recently added a Patreon link to our download page, after forgetting to put it out there much. The Patreon was originally created to cover our server costs (about US$75/mo from memory) some years ago, a goal we reached in 3 days.

That being said, we’re trying out various things for a SaaS model as well because – like yourselves – we recognise success there will help with sustainability.

But we’re sure as hell not going to bait and switch anyone.

> Look at Sequel Pro. … and look at it now.

Seems like a commercial product whose organisation went broke, released the software as OSS, and couldn’t get traction for that either?

People do appear to have picked up the project again recently, in an attempt to move it forwards:

https://github.com/sequelpro/sequelpro/issues/3679#issuecomm…

> Our goal is to…

Sure. You guys started out by bullshitting people. Unfortunately, the lack of integrity doesn’t seem to be directly hurting. 🙁

One can hope it’ll mean missed important opportunities for yourselves later, but who knows. You seem to be really leaning into the sales thing here on HN though.

> We’ve spoken about it here too …

With an established history of lying, who cares?

We are 2 indie developers trying to build a sustainable business. We are trying to compete against Google. The idea that the open-source package is responsible for our success is ludicrous. Our success came from thousands of hours of work. You’ll believe the narrative you choose. You’re entitled to that. We will always be “liars” to you because we chose to spend our time building up the paid service. You feel like we should open source all of our code, that’s fine. This conversation isn’t productive, and you seem spiteful, so I’m retiring. And I hadn’t realized that you were the creator of DB browser, that was just a recent example I’d been discussing with friends. Good luck with your chosen path. Please hug your family and take some time to relax.

I think the biggest peeve people have isn’t so much that Fathom v2/PRO is closed source, but rather the “we’ll open source it soon” kind of “dangling the carrot” stuff.

In the GitHub issue you mentioned “since people are confused [..]”, but people are confused because your communication has quite frankly not been very good. Even today just looking at the Fathom README it’s not at all clear that Fathom v2 (or “PRO”) is an entirely different codebase/product which merely shares the name and branding, and not much else, and it’s not clear that the “Lite” version is essentially unmaintained either.

We’re going to make our stance clear with our next release of Lite. We definitely made some mistakes with regards to V2. Originally we had planned to open-source it, but then we decided against it.

I’m still trying to understand why your current “production” (eg SaaS) version of Fathom can’t also be FOSS.

Other places (eg Redash) do so, and are doing extremely well.

The change from FOSS to non-FOSS, especially throwing away the goodwill you’d started out with… makes no sense. 🙁

Popularity of DB Browser for SQLite wise, you might want to check our stats page:

https://sqlitebrowser.org/stats/

We’re doing decently well, and our GitHub organisation has more active developers now than ever before. 🙂

Note that Fathom had a change of ownership, and that the original author doesn’t seem entirely happy with the change of direction either.

Ahhh. I thought the original owner sold the project, but didn’t realise they didn’t have further involvement.

Is that info about the original author being unhappy somewhere public?

My company switched to Fathom from GA about 4 days ago.

We build privacy software so it felt slightly hypocritical to use a privacy-intrusive service like GA. So far so good.

I went from 0 to Fathom in under 20 mins and for our basic requirements it works really well.

Good job Fathom team 🙂

When I try to view your demo page I get ‘Secure Connection Failed’ every time. Firefox 77.0.1 on Windows 10.

So strange. I’ve run it through multiple checkers and all of them are valid. No other issues reported, just this one.

From the site:

> Our on-demand, auto-scaling servers will never slow your site down. Our tracker file is served via our super-fast CDN, with endpoints located around the world to ensure fast page loads.

This suggests that this solution is not self hosted. Is there a solution like this which is really self hosted? This service is one small change away from actually tracking.

Edit: Piwik/Matomo[1] appears to be the most mature one.
[1]: https://matomo.org/

Fathom Lite is self-hosted. Lots of people start off self-hosting but it’s typically useful for people with low traffic, or for people whose time is worth less than money, or even those who enjoy it. Because you have to maintain anything you self-host. We like to cater for both.

I am also building something similar: https://usertrack.net/

I think the main difference compared to Matomo is that it’s simpler (fewer features), but it provides some of their premium features (heatmaps, session recordings) for much cheaper.

Let me know if you have any questions about userTrack or any suggestions! 🙂

That’s the old project- they have decided not to open source the new one.

The open source project is barely maintained at this point- they update the readme and get the occasionally pull request, but it’s not really being developed.

I unfortunately switched to Fathom back when they were telling people they were committed to open source, so now I’m looking to migrate off to something a bit more trustworthy.

You’ve been saying this since last year. If I can help you migrate off of Fathom Lite to something else, please let me know.

Apparently this repo contains “Fathom Lite”, a (from a codebase perspective) unrelated predecessor of what is currently being sold as a SaaS.

And it seems that Fathom Lite misses one of the main selling points of Fathom – cookieless tracking.

SA is good, and Adrian prices their services responsibly. Fathom charges $24 / month as our 2nd tier and I do believe Adrian should offer a middle tier too. But I know nothing about his business behind the scenes, so I can’t comment. Ultimately, you can be confident that he prices his service to be sustainable, which we really respect.

I have zero use for your product but I just want to say that I love your website! From the minimal design to the clear and concise copy, to the signup process. It’s all frictionless and smooth, you nailed it 🙂

That means a lot, thank you. We’ve been running Fathom since 2018, and we’ve put a lot of thought into the user experience 🙂

> “We’re not announcing anything just yet but it’ll be our best release to date”

not a knock on you or fathom, but it seems like you’re in fast-response sales mode here (which is totally fine)… the above is a particularly empty statement. why would any next release not be the best to date?

maybe say it should be an exciting release, which is similarly anticipatory without being meaningless sales-speak.

Good point, heh. We’re going to be improving speed of aggregation, real time dashboard, page / ref level metrics, more advanced goals and various other pieces.

thanks! in a background thread, i’m on the lookout for a privacy-focused analytics offering. it’s for small personal things for now, so leaning toward simple and free, but who knows what the future holds.

> Hadn’t heard of Plausible, maybe that’s the one for me!

Plausible is pretty good, found it useful to monitor traffic and usage for small projects.

Thanks for mentioning Simple Analytics [1]. We are at this point indeed only cloud based. We believe we need to make a business case/profit first before putting a lot of extra work into an open source version and maybe failing with the business. It’s a dream to make it open source, but not at this time.

We are very firm on our values. We will never sell your data. We have many ways to get your raw data out of our system (API, download links, …).

Our collection script [2] is open source and today we are also adding source maps to our public scripts. Open source does not guarantee that a business runs that same software as their cloud-based option. We are looking into services that can validate what we collect on our servers. We never collect any IPs or personal data [3].

Great to see more products that care about privacy, I hope they will really care and commit to their values for a long time.

[1] https://simpleanalytics.com

[2] https://github.com/simpleanalytics/scripts

[3] https://docs.simpleanalytics.com/what-we-collect

What kind of server-side analytics are people using today, for personal blogs and things? Projects like GoAccess which eat an nginx log file and output some analytics seem like a nice middle ground for those of us who want some feedback on how people are using a website, without needing all the bells and whistles of something more like Google Analytics (not to mention the fact that it doesn’t need any Javascript loaded or anything). Personally I’ve found GoAccess pretty good, but the interface a little difficult to use and understand, so I’m looking for projects like it.

Server side was how it was always done back in the early days of the web, and analog[0] was state of the art.

Around 1999/2000 there was a rise of ISPs needing to install reverse proxy caches because the growth of consumer access meant they were getting seriously contended on upstream access. I was working at the time at a UK 0845 white label ISP called Telinco (was behind Connect Free, Totalise, Current Bun and other 0845 ISPs), and to my knowledge we were the first in the UK to install a Netapps cache. It was the moment we realised (by checking the logs to see if it was working), just how much porn our customers were accessing.

Those caches blow server-side analytics to pieces, because frequently you wouldn’t even know the user had hit the page. What server-side analytics were useful for is what we’d now call Observability: they gave reasonable Latency, Error Rate and Throughput metrics, which combined with some other system logs might also give you a sense of Saturation.

As such, they were not too useful for marketing. Google Analytics was the first product that allowed high fidelity analytics even if reverse proxy caches (and even browser caches) were all over the place.

And here we are. In a World where we are tightly surveilled by corporate entities in order to try and get us to click on things. Bit sad really.

I’d encourage people to think about what they need these analytics for.

If it’s marketing, you might just as well use GA: it’s the best product out there. We just need to lobby for better regulation (at least GDPR and cookie-setting popovers give us choices in that regard now).

If you’re stroking your ego, consider whether such an invasive technology is worth the price, and if you need those numbers.

If you’re making sure your infrastructure can handle the traffic, use server side analytics alone. Parse your logs using the huge number of tools out there able to do that in near-realtime, and leave your users’ browsers free of tracking cookies and javascript.

[0] https://www.web42.com/analog/

Yes, but those are caches that you control, so when it comes to analytics you would get the logs off them too in order to get accurate metrics.

I’ve setup GoAccess for a client’s site, the problem is it doesn’t have a great HA solution.

You either ship all your logs to one place (and hope that place doesn’t go offline) or ship your logs to multiple places and hope both destinations are in sync. We’ve opted for #2 right now (hint: it’s not perfect) but it’s made me think about writing an alternative.

Rather than shipping all the logs all around, my plan is to have each source (i.e. web server) run a process on its own logs, and use something like Redis to store the aggregated statistics.

GoatCounter author here: yeah, that’s not perfect; this also came up in the HN discussion for the article a while ago[1]. I’ve been in touch with one of the authors of the EUPL since and the short of it is that they don’t really think it’s an issue.

I’ve thought about this for quite some time, and decided I’ll use a slightly modified version of the EUPL which removes GPL from the compatible license appendix. Just haven’t gotten around to that for no reason in particular.

[1]: https://news.ycombinator.com/item?id=21914245

how can it be both copyleft and non-viral? isn’t virality a defining feature of copyleft?

That’s interesting, wouldn’t that mean that the licenses themselves aren’t in some sense ‘legal’ in the EU? How does the GPL prevent this from invalidating down the whole license?

That’s surprising, considering the virality is part of the license text, not the law; does the law prohibit that clause?

That’s an interesting analysis.

That directive is usually understood to be about reverse engineering in order to build compatible software: “to obtain the necessary information to achieve the interoperability of an independently created program with other programs” being a key bit.

It’s not immediately clear to me – a programmer but not a lawyer – that this has any bearing on whether linking creates a derivative work.

Have any other experts, or courts, weighed in on whether this analysis is sound?

If you decide to migrate off GA, there’s very little reason to not use self-hosted analytics.

The only case when you’d get better analytics from a _service_ is exactly a GA-like setup that can track people as they go from one website to another. That is, the real value of an analytics service is derived directly from its ability to invade people’s privacy, at scale.

Granted, migrating to another service is usually simpler, but it offers NO insights into the traffic that you can’t get from parsing server logs and in-page pingbacks. You do however get a 3rd party dependency and a subscription fee.

Server logs only tell you about things that happen on your server. If you are using JavaScript it’s likely there are plenty of events that might be valuable to you that never leave a trace in your logs.

For example, if you validate forms with JS you might want to track form submissions and validation errors.

My reason is the server not being able to handle the traffic. We used Piwik but I couldn’t trust that it’d be able to handle big eventual spikes of traffic (which the site itself could, being static and on a CDN) or that it wouldn’t slow the site down (if I remember correctly I had the option to call Piwik asynchronously and not slow down the site, but at the risk that it’d be less accurate if people closed the window / navigated to another page quickly).

Of course you can run your own analytics on AWS or similar and have no issues with handling traffic, but that means higher costs / difficulty in setting up and maintaining it.

> that can track people as they go from one website to another

Note that even in Google Analytics, this requires extra set-up, has limitations, and tends to be pretty fragile in practice. GA identifies users by a first-party cookie and tracking cross-site visits requires decorating links with cookie values.

If you’re interested just in aggregate traffic from one of your sites to another, rather than something that requires full-path analysis (like marketing attribution), then you can get that from looking at referrers. This should be more or less equally available in GA and server logs.

> If you decide to migrate off GA, there’s very little reason to not use self-hosted analytics.

My personal domain[0] was taken by domain squatters (forgotten bill in a debit card shuffle, bought up within seconds of expiry), so for now I have to host on github.io. Thoughts on an analytics service?

[0]: http://www.aarontag.com/

>The only case when you’d get better analytics from a _service_ is exactly a GA-like setup that can track people as they go from one website to another.

I was once making a service that provided cross site widgets for companies to embed. Obviously it was beneficial to track people as they go from one website to another, but at that point it was beneficial to do it with our own service.

I’ve tried GoAccess in the past, but I remember the documentation not being very thorough on certain topics like the database, the websocket connection, or the log syntax. So it was a bit of a pain to set up.

It also had some weird quirks like generating duplicate entries or randomly failing to parse some log lines (you seem to have quite a few of those “failed requests” yourself, by the way).

There also doesn’t seem to be a good way to display statistics for multiple virtual hosts. Even if you change your log format to include the host, you just get an additional table in the dashboard, but still can’t look at the other metrics for each host separately. You’d have to run multiple GoAccess instances to achieve that.

Yeah I definitely had to open some issues to understand how it works. I have multiple virtual servers as well and wasn’t able to get it to break out links by virtual server.

I figured it’s fine for my needs since I literally have nothing on my domains. I could see it being frustrating for power users.

Tangential but…

“Analytics” is rarely useful or useless because of the tool. These tools need to be treated as data collection, not reporting.

If your goal is to inform certain decisions, track success or identify problems… a spreadsheet (or napkin) is usually where that happens.

Say you do analysis systematically, make a list of questions and use your tools to answer them… usually you find that the tool itself doesn’t matter much, and GA doesn’t answer most of your questions out-of-the-box anyway.

Say you want a “funnel.” That usually consists of a handful of data points. GA usually doesn’t have them by default without tinkering with the configuration, etc. Decide what they are beforehand. Understand them. Use GA (or whatever) to get the data.

Finding the tool for the job is much easier once you know what the job is. GA is extremely noisy, bombarding users with half-accurate, half-understood reports.
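To make the “funnel” point above concrete: it really is just a handful of counts and the drop-off between them, which fits on a napkin (the numbers below are made up):

```typescript
// A "funnel" is just a handful of counts and the ratio between adjacent steps.
const funnel = [
  { step: "landing page", count: 5000 },
  { step: "started signup", count: 800 },
  { step: "completed signup", count: 240 },
];

funnel.forEach((s, i) => {
  const prev = i === 0 ? s.count : funnel[i - 1].count;
  const pct = ((s.count / prev) * 100).toFixed(1);
  console.log(`${s.step}: ${s.count} (${pct}% of previous step)`);
});
```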

I’ve been pretty happy with Matomo (formerly Piwik), especially their non-cookie mode. But the interface is ugly, confusing, and makes finding information much more difficult than Google Analytics does.

Edit: One major thing I am unhappy with in Matomo is event tracking. GA makes it much easier (in my experience) to track conversions and events, and presents the data in a better way.

I found the Matomo interface to be a breath of fresh air compared to Google Analytics! As a non-power user, GA was too heavy and enterprise-like. Matomo is much cleaner, simpler, and more efficient for me to work with.

Hi Cenk, I have been building a tool[0] similar to Matomo, but the plan was to make the UI much simpler to use while also providing premium functionality (heatmaps, session recordings) for cheaper.

My idea was to focus everything on “segments”. So for all the data you can quickly create user segments and instantly filter the data to see only stats for the users you want, or compare stats between segments.

There is a public dashboard that you can check, I would love some feedback if you have the time 🙂

[0]: https://usertrack.net/

Off topic, but I used to run a website that had Google Analytics. This site and domain are now 100% down and have been for over a year.

I still get monthly emails from Google about the analytics for this website. Apparently it’s still getting 200-300 visitors per month. I have replied to Google via email about this several times but never heard anything back.
I wonder what site they are tracking?

It is quite possible to take a GA tracking code for one site and put it on another site. This has happened to me quite a lot where people have lifted content or copied code from my site. You can see the hostnames in GA (you have to dig for it), which could explain this.

Thanks for the tutorial. Looks interesting.

I’m a fan of GoAccess. Unfortunately the queries are pre-made and there are nearly no options whatsoever. You can’t (yet) filter by date for example.

One thing I realized is that on small sites, like my blog, an overwhelming amount of traffic comes from search engines or bots which are looking for vulnerabilities. Filtering them out takes a lot of time in any self-hosted or self-made solution.

I use GoAccess too. If I want to filter by date (for example) I just run the log file through sed before feeding it to GoAccess.

I’m honestly curious: are all the analytics tools which rely on making third-party queries still effective these days, given the extensive use of ad blocking?

If not, then web server logs are the only 100% reliable source (if available, of course), so old-style tools like AWStats, Webalizer, etc. [1] should see a rise in popularity again.

[1] https://en.wikipedia.org/wiki/List_of_web_analytics_software

The issue is that people often use free reverse proxies like Cloudflare, which do caching, so not all requests reach the origin server.

In this case, the source of truth is Cloudflare’s load balancer, but you have to pay them to get full analytics.

Good ad blockers do a DNS lookup to see if a subdomain points to a tracking server anyway.

Only completely self-hosted solutions (your domain, your tracking server) are resilient to ad blocking.

Interesting. Then, as a workaround, trackers may create an image with a pseudo-random src pointing to pageXXXwebsiteYYYclientZZZ.ad.yoursite.com (yoursite.com being the site which serves the actual content), while asking the owner to point the NS record for ad.yoursite.com at the IP of their DNS server. The HTTP(S) request or the DNS request can then reach them anyway. Obviously DNS caching will prevent them from knowing how many times this particular page was accessed by this particular client, but at least they will know that it was accessed at least once.

You can set up a proxy, which isn’t too hard (and operating a proxy is a lot simpler than operating an analytics service)
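A bare-bones sketch of what that proxy can look like (Node/TypeScript; the analytics.internal hostname and the /stats prefix are placeholders, not any particular product’s setup):

```typescript
// Same-origin proxy: the page requests /stats/... on your own domain and this
// forwards it to the analytics backend, so ad blockers see only first-party
// requests. Hostnames, port and prefix are placeholders.
import http from "node:http";

const UPSTREAM = { host: "analytics.internal", port: 80 };

http
  .createServer((req, res) => {
    if (!req.url?.startsWith("/stats/")) {
      res.writeHead(404);
      res.end();
      return;
    }
    const upstream = http.request(
      {
        ...UPSTREAM,
        path: req.url.replace(/^\/stats/, ""), // /stats/count -> /count
        method: req.method,
        headers: { ...req.headers, host: UPSTREAM.host },
      },
      (upstreamRes) => {
        res.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
        upstreamRes.pipe(res);
      }
    );
    req.pipe(upstream);
  })
  .listen(8080);
```

In practice you’d usually just add an equivalent location/rewrite rule to the web server that already serves the site, so no extra process is needed.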

I really hope an analytics genius can come up with a technique (like differential privacy, but I’m no expert here) that would give advertisers what they want (unique visitor counts, and very few other metrics) to place ads on sites, yet doesn’t give away too much privacy, nor lead to enslavement under a single central entity. I guess if something like that doesn’t come along, then only old-school content-based ads (site sponsoring) without any tracking can be considered ethical (or no ads, of course). The argument against content-based ads was always that they don’t suffice to finance even web hosting, let alone content production. But with ad prices going to the bottom, I wonder if the figures still add up in favour of targeted ads today.

Snowplow is also an option. It’s an open-source data collection solution that, unlike GA, gives you full ownership of your event-level data and the freedom to define your own data structures. Not exactly what you’d call ‘lightweight’ but quite a few Snowplow users/customers have come from GA for the level of flexibility and control they can have over their data sets.

(Full disclosure: I work for Snowplow Analytics)

– https://github.com/snowplow/snowplow

– https://snowplowanalytics.com/

I’ve set up the Snowplow collector and tracker on some of my sites because that part is very straightforward (and the tutorial on the wiki is great), but I’ve never gotten past those steps to analyse the data collected.

Is there a highly-opinionated tutorial that shows how one can get some vanity metrics out from Snowplow?

I still can’t see any solid reasons why a site owner would not use GA.

Other products:

– Objectively lack features

– Potentially incur extra costs in money/time

– May be a small barrier in M&A

– May carry additional risks/attack vectors if self hosted

Trying to wean off big tech is commendable, but likely detrimental to a business.

Relatively high risk, low reward.

I’m happy to have my mind changed. I can see a case for user hostility, but most sites I imagine don’t have an audience sensitive to this at the moment anyway.

From an ideological standpoint, other cloud stat-tracking services would only function if not many people used them. And I would also imagine feature creep would be inevitable and lead them to become an inferior version of GA.

Some issues with GA versus going self-hosted:
– Privacy of your users: for a specific user, Google knows all the websites he visits.
– Privacy of your data: if Google knows the visitors of most websites, your competitors can leverage that advantage (using Google Ads, for example) to steal your potential customers.
– Google Analytics is bloated and slow (both in terms of the tracking script and the dashboard UI, where it takes several seconds for each graph/page to load).
– You don’t own your data: at any point Google can, however unlikely, block your account (for breaking the ToS of some other service of theirs) and you lose all your data.
– If everyone uses GA, it will become (already is) an analytics monopoly, which has many other drawbacks (lack of innovation, for example).

I do think that for the average user, using GA might be fine because it’s free, easy to set up and does its job. That is, unless they care about all the possible consequences.

GA alternatives are a fairly new thing. When I looked at this in May last year there was essentially only one alternative: Matomo. It seems some sort of “critical mass” has been reached, and in the last year quite a few people have independently started working on alternatives.

I agree that many features are still lacking, but as a counter-argument: 1) not everyone needs those features (not every product needs to solve 100% of the use cases), and 2) a lot of these products are still quite new and are actively working on adding a number of those features.

If your site actively competes with a Google product, you might not want to give them access to your user data.

GDPR compliance.

GDPR is the European privacy law. It protects European citizens, so it applies not only to European companies but to any company that does business in Europe (having offices or advertising/selling there).

Google does not give much assurance regarding their GDPR compliance… their text on that subject is mostly CYA and then they make it your responsibility to decide how to use it in compliance (if at all possible).

The GDPR gives you a small window to count visitors through cookies as long as all private information (even IP) is anonymized… OR you can do more traditional tracking with visitors’ explicit consent. This last use case is completely useless in terms of visitor statistics, but analytics companies sometimes dare suggest it (as in “this is the way to do things right… so our product is compliant and it’s not our responsibility if you break the law”).

That aside, I run international non-profit sites and GA is a bad look… and with good reason: Using social network sharing buttons, GA, CDNs, etc. gives too much power to track people to a few companies.

> GA is a bad look

Totally agree, but are there “acceptable” CDNs, like unpkg? What about Google Fonts?

> The GDPR gives you a small window to count visitors through cookies as long as all private information (even IP) is anonymized.

If it’s not too much to ask, could you expand on that a bit, or share a link? I guess you mean it’s ok to send a browser fingerprint for unique visitor stats without having to ask for permission, but I’m not aware of any legal debate let alone court decision with respect to that.

Edit: obviously I can’t read (“through cookies”), but cookies for unique visitor counts aren’t “functional”, are they? So my interpretation is that those cookies need consent; I’d love to hear otherwise, though.

>GDPR is the European privacy law. It protects European citizens

The GDPR does not discriminate based on citizenship.

It applies if the organisation providing the service is in the EU/EEA, OR if the user of the service is in the EU/EEA (to the extent that the data reference their activity in the EU/EEA).

And the UK, but thanks to Brexit there’s a parallel UK GDPR in place, so… take that into account.

Unfortunately common on projects like these. Instead of guiding admins on how to properly configure SELinux it’s easiest to just throw your hands up and say “disable it”.

Some of my friends switched from GA to Countly. They are very satisfied, and I am thinking of using it in my next project.

Yeah, ahoy is pretty awesome! In fact everything by ankane is inkanely great – I have no idea how he manages to be so productive….

It’s nice to see Plausible gaining traction. Here is a blog post [1] about how they were asked about using it on a site with tens or hundreds of millions of page views.

I am wondering if HN would be interested in hosting analytics like Plausible that are open for us to see. Sometimes I do wonder how many page views HN gets per day, where we are all from, etc. For example, on the Plausible demo site, 35% are using macOS but only 15% use Safari.

[1] https://plausible.io/blog/april-2020-recap

It’s great to see more alternatives to GA, and to see those alternatives getting attention.

For those interested, one other FOSS analytics tool is Shynet [0]. Modern, privacy-friendly, and detailed web analytics that works without cookies or JS. It also looks pretty slick. Disclosure: I’m a maintainer.

[0] https://github.com/milesmcc/shynet

I was recently looking for a good tool that supports both web site analytics and app analytics (custom events, typically pushed by SPAs). I looked at GA, Amplitude and finally Matomo (which I ended up with). GA and Amplitude either did not offer or made it hard to work down to the micro level, essentially tracking known individual users down to the singular event level. Matomo makes this easy, although it certainly looks a bit dated compared to the competition. And the free parts are somewhat limited (you need to buy stuff or hosting).

I would have thought that there would be several decent packages offering www + app analytics by now, but as I wrote, options were quite limited. Some of the options mentioned here look good for plain website analytics, but I’m not seeing much as far as “app analytics” (custom events) goes.

Thanks for the tip. One of Heap’s selling points seems to be that tracking events “manually” is over, everything is automatic. That might work if all “work” is defined as “stuff users do”. For other types of “work” (calculation pipelines, job delegation etc) I’m sure being able to “micro manage” events can be useful. But sure, my use case might be different.

Fewer companies are focusing on user-level tracking because a single user is not a meaningful statistical group.

Companies focusing on user-level tracking today provide a different set of tools than one might be used to, and those can of course be compliant; see https://www.hotjar.com/.

And I’m sure that makes sense if you have lots of users and low revenue per customer. If your use case is the opposite, tracking individual usage becomes more important. At least until you have lots of those users. After that, who cares! 😛

We launched https://everytwoyears.org today. It was my first project where I felt analytics was necessary, but also a moral quandary. For personal reasons, I’m very against PII big data. For project reasons, the project is literally about stopping mass surveillance so shipping a tool like Google Analytics was firmly off the table.

I went with https://app.usefathom.com which tracks _aggregate anonymized_ data.

They have the option to self host, but I’m sending them money to support the project. With today’s launch, I’m really happy with the product. Will continue using it.

We really appreciate people like you, thank you. And on a personal level, I respect your “BREAKING THROUGH TWO PARTY POLITICS” tagline.

I can also add mine, even though more complex, it’s still lightweight: https://usertrack.net

I tried bringing together the most useful analytics features (user segments, heatmaps, session recordings, tags/events) in a self-hosted platform with a simple UI. An A/B testing feature is also coming soon. I built the platform with the primary use case of improving conversion rates on landing pages.

My goal now is to prove and teach (even to non-technical users) that self-hosting is easy nowadays when you can create a VPS running your desired software in just a few clicks.

I would love to hear some criticism or why you wouldn’t want to try something like this.

I’d like to ditch Google Analytics for a new small side project I’m building. I live in a small country in Europe, and for me the most important features of these alternatives are the cookieless tracking and the lightweight scripts. However, the pricing is too steep for a project that won’t gain more than a few thousand users.

Fathom Analytics and Simple Analytics cost ~$100/year.

Plausible costs ~$50.

I really liked and almost settled with plausible but I just saw goatcounter right now. It’s free for personal / open source projects. That’s so nice for small projects like many people here are building.

You should check out Matomo (formerly Piwik), which is a free self-hosted GA alternative. I’m very satisfied with it so far. It disables tracking by default, I think, but this is configurable in the Matomo dashboard.

I have been using GA for my side projects but have been unhappy with Google’s direction on privacy, so I started researching others. There are just so many: I think a lot of developers (including myself) think it is easy to do, start rolling their own, and then try to productize it.

Here is my research: https://til.marcuse.info/webmaster/alt-analytics.html

I ended up going with GoatCounter.

Maybe stop spying on people?

EU be like: you have to put up this banner to tell them you are spying… it’s super annoying and everyone hates it… and you be like “sure, I love me some spying”.

How do you define ‘spying’? It’s near impossible to invent or discover almost anything, without something tangible to observe.

I suppose the purpose of why certain collections of data are put together is where all the anger rightfully comes from, but calling for a complete armistice on all (even innocuous) forms of data collection is a little churlish.

You do it in microcosm too, you know. How many pictures do you have on your phone, that contain people you don’t know?

Counting how many people come into your store and how many of them actually buy something is not “spying”.

You don’t need anything, but it’s a useful way to do it without any significant drawbacks. In many cases, it’s more or less the only option.

It is a significant drawback!
And it is literally not the only option in the situation you described.

Spying is the only solution for spying, yes.
Spying is the only solution to over-eager people wanting mostly useless metrics.

It’s not much different than me checking my Twitter “likes” constantly.

Has anyone used any of these with a proxy to avoid ad blocker blocking? What I mean is, I installed matomo and then saw my ad blocker blocked it. Is there a way to make any of these work by proxying through the same domain as the site, so those analytics requests look just like all other ajax requests?

I was surprised matomo wasn’t listed here. Does anyone know if that was intentional? Seems like it fits the criteria of the post and the goals of open source.

> I was surprised matomo wasn’t listed here. Does anyone know if that was intentional? Seems like it fits the criteria of the post and the goals of open source.

Yes, this was intentional as this article was focused on “light-weight” analytics. I think they’ll do an article about Matomo at some point in the future as well.

Another shameless plug, if you are just using it for finding out where visitors are coming from and page hits, I wrote https://geo-yak.com for that. Doubles as an ip geolocation API.

I’m putting the finishing touches on a `tag=XXX` parameter that allows you to record a tag (like a pageid), and then filter the maps by it (not publicly documented yet, but will be in the next couple weeks).

I switched to GoatCounter for my personal blog and it’s more than capable. All I want is pageviews with timestamps per page and referrer info.

A lot of people may not care about this, but Google Analytics (and other 3rd-party, hosted analytics platforms) are very important when trying to sell your website. Basically, they allow the buyer to access reliable historical data about your website, which in turn makes it easier to arrive at a valuation.

Can I ask why any private website even needs analytics? I just don’t see the point. Is it just vanity, or making sure anyone actually reads the articles/content, or trying to adapt to audience preferences/search queries/…?

I get the point for any commercial venture, but for most sites I simply don’t see the added value. If you want to know whether people like your work, add a comment section or a newsletter signup – why do you need to spy on your users with intrusive tools and send their data around the globe to a kraken like Google, just to have a few statistics?

What do you mean by private?

I assume many people have sites for their small businesses. Imagine you have a restaurant, you want to be able to know how many people reach your site, how they reach it, why they don’t contact you (do they leave after seeing the menu? or the opening times? or the location? or photos?), are your contact forms properly working, is your website loading fast enough, etc.

Here’s the most lightweight alternative to Google Analytics:

Don’t use analytics. You really don’t need it. No. You really don’t. No, No. I promise you. Just stop.

All tracking is evil. All ads (except those inside a store for a product inside the same store) are evil.

Didn’t know about GoatCounter. I think it’s the only free hosted alternative that I’ve seen.

Having a small static blog hosted on GitHub Pages, GA was the only option for me. (Not going to pay for analytics while my blog gets like 10 visits a week.)

I just installed GoatCounter on my GitHub Pages Jekyll site. Two minutes of work, and it works great.

Similar experience. I switched my blog off of GA to Goat. I don’t think I’ll switch back. I prefer how simple Goat is.

For work though, we use GA and I can’t really imagine switching. We actually use the event stuff, so it would be hard to switch away.

I briefly tried Matomo, but didn’t want the JavaScript component and really just wanted log analysis. It’s okay at log analysis, but it doesn’t really shine unless you do JavaScript live tracking.

So I disabled it and went back to awstats. I’ve been using awstats for over a decade, and for my personal site and projects, it pretty much gives me the majority of the data I really care about.

I might look at shipping more complex nginx JSON logs to Logstash/Elasticsearch, but then I’d need to visualize them in Kibana, and that just seems like a lot of heavyweight containers to run for stats I don’t really need.

Google Analytics has a “bot filtering” option that works pretty well (even though it’s not perfect). Do the alternatives also have similar features? There is a lot of automated traffic on the internet.

It looks like GoatCounter does; it uses a library by the same author, https://github.com/zgoat/isbot, to detect bots.

Detecting and deterring spam and sketchy behaviour while using open source software could be an interesting technical problem area.
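Just to illustrate the flavour of the problem, the simplest form of this is a user-agent check (this is only an illustrative sketch, not the API of the Go library linked above, which goes well beyond a single regex):

```typescript
// Crude user-agent based bot filtering; real libraries use much longer
// pattern lists and additional signals.
const BOT_PATTERN = /bot|crawler|spider|curl|wget|headless|python-requests/i;

function looksLikeBot(userAgent: string): boolean {
  return BOT_PATTERN.test(userAgent);
}

console.log(looksLikeBot("Mozilla/5.0 (compatible; Googlebot/2.1)")); // true
console.log(looksLikeBot("Mozilla/5.0 (X11; Linux x86_64) Firefox/76.0")); // false
```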

Matomo has the same, works pretty well. Via plugin they can also track bots to show that separately but they’re filtered by default.

I’m not sure if it’s actually filtered – I think they’re just tracked and classified. At least that’s what I’m seeing in my instance.

Of course. Fathom filters bots on the client side and then looks for bot signs on the server. If it looks questionable, it doesn’t process it.

I actually use GoAccess on my personal site, which runs on GitHub Pages. Obviously you don’t get the actual web server logs (only GitHub has those), but I have a few lines of JavaScript that hits a 1×1 image on CloudFront with the page and referrer information, and then I download the CloudFront logs and use those with GoAccess. Read more here: https://benhoyt.com/writings/replacing-google-analytics/
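For anyone curious, the “few lines of JavaScript” can be as simple as a decorated image request (the CloudFront hostname and query parameter names here are placeholders, not the exact script from the post):

```typescript
// Pixel beacon: request a 1x1 image with the page and referrer attached.
const params = new URLSearchParams({
  page: location.pathname,
  referrer: document.referrer,
});
new Image(1, 1).src = `https://example.cloudfront.net/pixel.gif?${params}`;
```

CloudFront’s own access logs then record each hit, and GoAccess parses those instead of origin server logs.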
