I don't understand the metric they're using. Which is maybe to be expected of an article that looks LLM-written. But they started with ~250 URLs; that's a weirdly small sample. I'm sure there are tens of thousands of malicious websites cropping up monthly. And I bet that Safe Browsing flags more than 16% of those?
So how did they narrow it down to that small number? Why these sites specifically?... what's the false positive / negative rate of both approaches? What's even going on?
>what's the false positive / negative rate of both approaches
the false positive rate is 100%. they just say everything is phishing:
"When we ran the full dataset through the deep scan, it caught every single confirmed phishing site with zero false negatives. The tradeoff is that it flagged all 9 of the legitimate sites in our dataset as suspicious, which is worth it when you're actively investigating a link you don't trust."
A very long time ago, I had the idea to set up a joke site advertising "SpamZero, the world's best spam filter", with a bunch of hype about how it never, ever misses spam. When you clicked the download link, the joke would be revealed: you would get a file consisting of `function isSpam(msg) { return true; }`.
Apparently that's not a joke anymore?!
it's 100% for what they call "deep scan", it's 66.7% for the "automatic scan". Practically unusable anyway
Probably could have been a bit more descriptive around the dataset. Our tooling pulls in a lot more than 250 URLs, but since we are manually confirming them, that means a smaller dataset. In other words, out of the URLs we pulled in, these 250 were confirmed (by a human) as phishing. We did not do any selection beyond that. As for the article, LLMs were used to help with the graphs and grammatical checks, but that's it. This was our first month of going through this exercise and we definitely want to have larger datasets going forward as we expand capacity for review.
As for Safe Browsing catching more than 16%: it depends on the timeline. At the time these attacks are launched, it's likely Safe Browsing catches closer to 0%, but as time goes on that number definitely climbs.
I never loved the idea of GSB or centralized blocklists in general due to the consequences of being wrong, or the implications for censorship.
So for my master's thesis about 6-7 years ago now (sheesh) I proposed some alternative, privacy-preserving methods to help keep users safe with their web browsers: https://scholarsarchive.byu.edu/etd/7403/
I think Chrome adopted one or two of the ideas. Nowadays the methods might need to be updated especially in a world of LLMs, but regardless, my hope was/is that the industry will refine some of these approaches and ship them.
Block lists will always be used for one reason or another. In this case these are verified malicious sites; there is no subjective analysis element in the equation that could be misconstrued as censorship. But even if there were, censorship implies a right to speech, and in this case Google has the right to restrict the speech of its users if it so wishes. As a matter of fact, through extensions there are many parties that do censor their users using Chrome.
I know for a fact that GSB contains non-malicious sites in its dataset.
It is possible for sure. What's your point? Spamhaus does too with IPs, abuse.ch does too, every enterprise firewall's reputation list does too. That's the whole point of reputation; if it was 100% reliable it wouldn't be "reputation".
You claimed they all are malicious sites or they wouldn’t be included but that’s factually incorrect
I assumed a human review is always in place, if not then you're right and I was wrong.
> censorship implies a right to speech, in this case Google has the right to restrict the speech of it's users
I don't follow. Even if Google does have the legal right [1], that does not make the censorship less problematic, or morally right. And even if it's hard to make a legislative fix ("You want to ban companies from trying to protect their users from phishing?") [2], that doesn't undo the problems of the current state, or mean we should be silent about it.
[1] This is far from certain, as it could be argued to be tortious interference, abuse of market power, defamation if they call something phishing when it's not.. Then there's the question of jurisdiction..
[2] It's a very common debating tactic to assert that a solution is difficult, to avoid admitting a problem exists.
Certainly they have the legal right as you pointed out. Freedom of speech is a legal right not a moral prerogative or entitlement.
HN bans users that violate its rules for example. If I were to insult you severely, HN mods have every right to protect you from my speech and censor me by deleting my message and banning me. The threats posed by these malicious sites are far worse than insults on a forum.
Companies like Google are expected by the public and governments alike to protect their users. They would even be entitled to require that every site a user visits have an EV cert and age verification enabled, if they wanted. It isn't just their legal right; everyone, not just corporations, has the right to pursue what they feel is the correct way of doing things. Their responsibility is to their investors first, users second, governmental regimes third, and everyone else after that. Your presumed entitlement here is the same as everyone else's.
For #2, I don't recall claiming a solution being difficult (unless you thought banning companies from protecting their users was somehow a thing I was saying should be done). Matter of fact, I am near incensed that HN users are utterly and shamefully ignorant of the harm users suffer. You should be ashamed of your ignorance. Not only this, but I've had long debates on HN along similar lines when it came to topics like the Play Store requiring developer authentication. It almost makes me wish your freedom of speech was entirely taken away from you so you could have some understanding of the suffering people undergo, and what such measures are trying to prevent. Freedom of speech has never been a right obtained at the expense of harm to others. The moment someone is harmed, you lose your freedom of speech; that is the case in a public arena where such laws exist, but even more so on private platforms. But I did say almost! I think you're just used to problems being of a technical nature, whereas in this case it is a human threat (crime) problem.
Furthermore, I am constantly disappointed at the sheer dereliction of duty exhibited by HNers when it comes to security. Your product must protect your users by default, there is absolutely no acceptable amount of harm users should experience for the sake of non-users. Site owners have no entitlements to browsers, they only have privileges. Browsers can and do absolutely play gatekeepers to websites.
As far as #1, I have argued tortious interference about Google's practices myself before. I am not a lawyer, so I don't know if this qualifies or not, but can I also claim tortious interference if HN bans me, if I miss out on HN job posts or exposure to the startup scene? Can I claim defamation for being banned on HN wrongfully? Is HN abusing its market power because of the sheer number of Silicon Valley types that aggregate on this site? And I suspect you're not a lawyer either, because jurisdiction is a concept that applies to a judicial body (hence: juris); Google is not a judicial body, and they're not handing out a judicial sentence.
I wonder, are you aware of the CA/B forum? hmm..
The fact is, a browser is a software used to access network resources. Part of that feature set, as advertised explicitly to users, is that it will make attempts to keep their access to the network safe and secure. In other words, all of your claims of entitlement are nullified by the simple fact that the "censorship" is an advertised feature, one that not only most browser users use, but it is an opt-out-able optional feature. Not only that, there is always an option to click through the safebrowsing warning and visit the site anyways.
Both from a moral and legal perspective, I challenge you to make yourself liable to all damages people suffer as a result of not having safebrowsing enabled. Insure them free of charge. Next thing I know you'll be claiming enterprise networks shouldn't "censor" either, or better yet, they can but people who can't afford multi-million-dollar firewalls shouldn't be protected for the sake of access you feel entitled to.
As far as libel, simply being incorrect doesn't make it libel; it needs to be intentional. So long as they can back up the reasonable cause of your site making it onto their list, it isn't libel. Just the same as your IP can land on their lists and Gmail will refuse to accept email from you (the same as with every public email provider).
Freedom of speech is not freedom of access, both morally and legally. You dilute actual freedoms when you try to abuse them to gain advantages like this. It is important to understand when being able to do something is a right versus a privilege. It is also important to solve the root cause of problems. Even though I disagree with you on this topic, Google's monopoly is a big problem, as is Microsoft's and other companies', but your solution being "there shouldn't be a solution" is (I'd dare say) morally objectionable and abhorrent considering the types of harm people suffer as a result. Perhaps appeals to block lists could have a more legally regulated process? But there are more pressing issues, like payment processors banning merchants and users alike all the same (worse than browsers blocking sites, in terms of impact?), and not a single government would dare claim that is out of line, let alone regulate it. The right of companies to do business how they want is highly protected in free market economies, and something like Chrome isn't even a paid product or service where you could have a commercial or contractual claim over it.
Since this is a long comment, I'll add this finally to it: if you seriously think Google can't block arbitrary sites on its free software and service, then by that logic users should also have entitlements against bans on sites like HN, and even on things like your open source project. You couldn't just decline pull requests or ignore them; if it affected a user who was relying on your project, your features preventing them from doing things would be tortious interference, and claiming negative things about pull requests would be libel.
> Freedom of speech is a legal right not a moral prerogative or entitlement.
No, the 1st amendment of the US constitution is a legal right. Free speech in general is a much broader concept, not limited to its legal implementation.
> For #2, I don't recall claiming a solution being difficult
You didn't, but it is how these discussions usually develop, and I thought of saving some time. And indeed that's how it went.
It did not go that way. The problem is not the difficulty of doing anything; a private corporation offering a free product has the right to do whatever it wants with that product. Their reasoning behind GSB is not for you to debate.
Free speech in general is a legal concept. Rights in general are not moral concepts; when you say you have a right to do something, it is always in the context of a rules-based framework. When you say something is right (same word, different meaning) or wrong, that is morality. Speech can be right or wrong. Prohibiting someone from speaking can also be right or wrong, but it isn't called "freedom of speech" or "censorship". If you can't articulate why something is morally wrong without referring to a right under some rules-based framework, then you're not talking about morality, you're talking about not liking some rule.
When you are in someone's house, they have the right to decide what you can talk about or not talk about, because it is their home and your presence there is a privilege. Replace home with business, and then replace business with a free product that you're not even paying for and that's this situation.
"I don't like it" is not a moral reasoning. You need to be able to articulate why something is immoral if you're going to use morality as a reason. Similarly, you need to explain what specific laws grant you an entitlement if you feel like a legal entitlement is violated.
> Replace home with business, and then replace business with a free product that you're not even paying for and that's this situation.
And then replace business with country and society that enables that business' existence, and in whose sovereign land that business is located (i.e. in whose house it is), and that's still this situation.
> Free speech in general is a legal concept.
So if someone says "free speech", you just have no idea whatsoever what they're talking about, until they also tell you which country/jurisdiction they're talking about, do you?
And I didn't make a moral argument - I said that there is a moral (not just legal) argument to be made. I don't have the time or inclination to walk you through why free expression is desirable, or why letting a handful of giant entities crush speech and smaller businesses is undesirable. If you need that explained to you, I don't think we'll see eye to eye no matter how long we debate.
> And then replace business with country and society that enables that business' existence, and in whose sovereign land that business is located (i.e. in whose house it is), and that's still this situation.
Yes, so it is a legal construct then? Countries and societies generally exist under the rule of law. In the US, both legally and socially, we've decided to accept a free-market capitalist way. Under that social agreement, both individuals and companies have certain rights and entitlements over their products and services.
Under a more universal moral regime, if you have good reason to believe someone might come into harm's way, you have an obligation to do something about it, so long as it is within your means. Preventing others from coming to harm supersedes the presumed entitlements of third parties. In this case, Google is nice enough to let users disable GSB or bypass GSB warnings. When a certificate for a website expires, for example, every browser shows a warning, similar to GSB. Almost every single time, the site isn't compromised and there is no MITM attack happening, but we accept that this is the best course of action. I don't see you protesting that, because you understand it is the right thing to do. But in this case you just don't like GSB and you're looking for some moral ground to stand on, because no other ground will let you.
> So if someone says "free speech", you just have no idea whatsoever what they're talking about, until they also tell you which country/jurisdiction they're talking about, do you?
You just said it isn't a legal concept, so why does that matter? But context does matter; in this case we're on a US-based website talking about a US-based company.
> why free expression is desirable, or why letting a handful of giant entities crush speech and smaller businesses is undesirable.
Aha! You don't need to walk me through anything, but I think you confuse what is desirable and undesirable with what is moral and immoral. For desirable and undesirable, you use the law to enact your preferences. Your desires, however, have no bearing on morality.
I don't think we'll see eye to eye either, but because I suspect our understanding of morality and the rule of law is not aligned.
Just yesterday I marked another Gmail phishing scam. This wouldn't be worth mentioning but they are using Google's own service for it. It has to be intentional, there is no other explanation. https://news.ycombinator.com/item?id=46665414
Seen similar with CloudFlare
Maybe I’m an outlier but I’d rather this than accidentally block legit sites.
Otherwise this becomes just another tool for Google to wall in the subset of the internet they like.
The most dangerous links recently have been from sharepoint.com, dropbox.com, etc. and nobody is going to block those.
One thing that often gets overlooked in these comparisons is distribution latency.
Detecting a phishing domain internally is one problem, but pushing a verified block to billions of browsers worldwide is a completely different operational challenge.
Systems like Safe Browsing have to worry about propagation time, cache layers, update intervals, and the risk of pushing a false positive globally. A specialized vendor can update instantly for a much smaller customer base.
That difference alone can easily look like a “miss” in snapshot-style measurements.
If you are not a bot, I suggest changing your voice so that you are distinguishable from one. You're not wrong, just like you weren't wrong about "one thing that trips people up about asyncio" yesterday, but I noticed the slop-speak immediately. I'm sure others have as well.
Multiple comments that start w/ "what's interesting about" by this user and very similar formatting kind of answers that question on human vs bot. Weird internet we live in these days.
I'm all for stopping phishing - and the tool sounds great - but I have to say the Web Store Extension listing is very concerning - even with a new company/offering - there's only 4 users - and 1 rating (a 5 of course) - I'd like to try - but seems phishy :-(
Is it possible to disable Safe Browsing AND also not have to manually click to confirm that "yes, I actually do want to keep the file I just downloaded, thank you" every. single. time.
The answer you probably don't want: Linux + Firefox. Or just copy link and wget.
No. I keep checking every year.
> When we ran the full dataset through the deep scan, it caught every single confirmed phishing site with zero false negatives. The tradeoff is that it flagged all 9 of the legitimate sites in our dataset as suspicious
Huh? Does this mean it just flagged everything as suspicious?
indeed... it seems like it just says everything is phishing... which they go on to say is desirable?
"The tradeoff is that it flagged all 9 of the legitimate sites in our dataset as suspicious, which is worth it when you're actively investigating a link you don't trust."
so, you don't really need the scanning product at all. If you just assume every website is a phishing website, you will have the same performance as the scanner!
Yeah, we probably could have done better at describing the methodology. The dataset is just the confirmed (manually, by a human) phishing URLs. We only included the FPs to show that the tooling isn't perfect; there were many TNs that we did not include. Going forward we could definitely frame these results better.
lol, return false;
So, the false negative rate was 84%, but what was the false positive rate?
They have a table "AUTOMATIC SCAN RESULTS (263 URLS)" that sort of presents this information. Of the 9 sites that were negatives, they say they incorrectly flagged 6 as phishing.
With a false positive rate of 66%, it's not surprising they were able to drive down their false negative rate. Also, the test set of 254 phishing sites with 9 legitimate ones is a strange choice.
(Or maybe they need to work on how they present data in tables; tl;dr the supporting text.)
The false positive rate was 66% for "automatic scan" and 100% (!) for "deep scan".
In other words, you can get these numbers if your deep scan filter is isSuspicious() { return true; }.
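For concreteness, both rates quoted in this thread can be reproduced from the counts in the post (254 confirmed phishing, 9 legitimate; automatic scan caught 238 phish and flagged 6 legit pages; deep scan caught all 254 and flagged all 9). A quick sketch of the arithmetic:

```python
def rates(phish_total, phish_caught, legit_total, legit_flagged):
    """Return (false negative rate, false positive rate) as percentages."""
    fnr = 100 * (phish_total - phish_caught) / phish_total  # missed phish
    fpr = 100 * legit_flagged / legit_total                 # legit sites flagged
    return round(fnr, 1), round(fpr, 1)

print(rates(254, 238, 9, 6))  # automatic scan: (6.3, 66.7)
print(rates(254, 254, 9, 9))  # deep scan: (0.0, 100.0)
```

So the deep scan's perfect recall comes packaged with a 100% false positive rate on the (tiny) set of legitimate sites.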
I think there might be a confusion here? The 100% seems like the true positive rate (correct detection), not the false positive rate?
Nope, 9 of 9 legit sites were incorrectly flagged:
> The tradeoff is that it flagged all 9 of the legitimate sites in our dataset as suspicious
Sorry, I think I had my wires crossed somewhere. Yeah, I see now. That's crazy/hilarious.
Brb, applying for YC funding for my new AI-based phishing detection system.
(‘return true’ is just a very optimized neural network after all!)
I'm getting some kind of chrome security warning when using zscaler now. Discussing all of this with non-techies, I think folks are overwhelmed by all of the security warnings they get and have stopped paying attention to them.
So what's the point of doing all of this if there isn't some kind of corresponding education on responsible computer use? There needs to be some personal responsibility here, you can't protect people against everything.
>We also ran the full dataset of 263 URLs (254 phishing, 9 confirmed legitimate) through Muninn's automatic scan. This is the scan that runs on every page you visit without any action on your part. On its own, the automatic scan correctly identified 238 of the 254 phishing sites and only incorrectly flagged 6 legitimate pages.
...so it has a false positive rate of 67%? On a ridiculously small dataset?
Fair point; in isolation that number doesn't look good. The important context is that this dataset was built to test phishing detection, not to measure false positive rates on normal traffic. It's sourced from our threat intelligence tooling, so it's almost entirely malicious URLs by design. The 9 clean sites aren't a random sample of everyday browsing; they're sites that were submitted as suspicious and turned out to be legitimate, so they're basically the hardest possible set of clean pages to correctly classify. This seems like a common critique and we definitely could have done a better job of explaining the methodology. Going forward we will include numbers from daily use to give a better picture of the FP rate.
Let me give you a simple detection algorithm. Apply OCR to the screenshot, because they often use logos. Also, parse the text from the HTML and compare it to the URL. You can catch a lot of spam this way. You can also examine many parameters in the JS/HTML code.
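The HTML-text-vs-URL half of that idea can be sketched in a few lines. This is only an illustration; the brand list here is hypothetical, and a real detector would pair it with the OCR step for logos and far better URL canonicalization:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical brand -> expected-domain map; a real system would use a
# much larger, curated list.
BRANDS = {"microsoft": "microsoft.com",
          "paypal": "paypal.com",
          "facebook": "facebook.com"}

class TextExtractor(HTMLParser):
    """Collect the visible text chunks from an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data.lower())

def looks_like_phish(url, html):
    """Flag pages whose text name-drops a brand the URL doesn't belong to."""
    host = urlparse(url).hostname or ""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    for brand, domain in BRANDS.items():
        if brand in text and not host.endswith(domain):
            return True  # brand mentioned on a foreign domain: suspicious
    return False

print(looks_like_phish("https://evil.example/new2/",
                       "<h1>Sign in to your Microsoft account</h1>"))  # True
```

Crude as it is, this heuristic catches exactly the "Microsoft login page on an azurefd.net subdomain" case discussed elsewhere in this thread.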
Default deny and only permitting what you explicitly allow stops 90% of this in a corporate environment.
You don’t just leave all your ports open on the firewall and only close the ones exploited. You default deny and only allow the bare minimum you need in.
Why should I trust that "Norn Labs" knows what is and is not a phishing site?
On a tangent - gmail has a feature to report phishing emails, but it seems like it’s only available on the website. Their mobile app doesn’t seem to have the option (same with “mark as unread”). Is it hidden or just not available?
The mobile app definitely has mark as unread. It's the envelope icon next to the trashcan (the exact same icon as in the web interface). Never realized there was a report phishing option. I just mark those emails as spam, which is available in the app.
They put them directly in front of search results, why would they not miss them?
Almost all email phishing attempts we receive come from GMail.
Anecdotal and loosely related, but I can say that since Gemini was forced into Gmail, much more obvious spam passes the filter.
Criminals can easily show Google crawlers "good" websites.
The fact that Safe Browsing even works is already good enough.
It would be interesting to see how many of the sites safe browsing does block are false positives.
Yeah, maybe let's change the title to remove that 84% rate. It's meaningless because it's just 254 websites, given the scale of what Google Safe Browsing deals with.
How is this serious? This is marketing slop. If the title isn't enough of an indicator, the ending should be:
> If you're interested in trying Muninn, it's available as a Chrome extension. We're in an early phase and would genuinely appreciate feedback from anyone willing to give it a shot. And if you run across phishing in the wild, consider submitting it to Yggdrasil so the data can help protect others.
But it hits 100% of your browsing for tracking
Educate yourself on how it works before you say something like this.
Pun aside, I cannot fully trust a centralized URL checker on a remote server that I don’t own, even if they guarantee that my privacy is safe
Glass is half empty, I see.
How about GSB stopped 16% of phishing sites? that's still huge.
Would you use anything that was only 16% effective for its claimed purpose?
“Tylenol stops headaches in 16% of people” - it’s huge, right? That’s millions of people we’re talking about.
Would you use it?
You don't have to use Tylenol, just like GSB. Since we're picking random analogies, I think bankruptcy and/or violent crime are closer to what this is stopping. I'd say if it stops just one person from losing their life's savings, suffering physical harm, or psychological trauma, then yeah, your blog being on GSB is worth it.
99% of users don't even know they're being protected. There's no promise except "we work to make browsing safer" and cutting even 5% of malicious sites from a user's experience is an unmitigated win for that user at the low false positive rate Safe Browsing offers.
That doesn't make a difference; they're still being protected. 99% of users don't know that Defender saved them multiple times from having their lives destroyed either. Same with spam filters, app store rejections, etc.
I don't get why there is such a lack of critical thinking on this topic here.
If the other options would just straight up kill innocent bystanders (e.g. false positives for legit shops) I think that is a tradeoff I am willing to make.
Countless medications have <16% efficacy rate.
Idk why not? What’re the side effects?
I guess the glass is 16% full.
Stop locking your door if someone can just break through the window, then. I think you and the author are conflating "16% effective" with "84% of sites on GSB are false positives"; that's not what that stat means.
Blocklists assume you can separate malicious infrastructure from legitimate infrastructure. Once phishing moves to Google Sites and Weebly that model just doesn't work.
But why does Apple choose to use this in Safari?
When will Google remove scams, phishing and other nonsense from their advertising? Especially the scareware stuff, where AI videos say someone might be listened to / hacked and here is the software that will help block it / find it, whatnot. Then they collect personal data.
There's probably like one engineer maintaining this as a side project at the company
Yeah, it would be interesting to know how much work is spent on it. I sometimes submit sites when I am targeted by a campaign, but I'm not sure if they end up in their deny-list.
> ...full dataset of 263 URLs (254 phishing, 9 confirmed legitimate)
> ... automatic scan is optimized for precision (keeping false alarms low...
really?
> When we ran the full dataset through the deep scan, ... it flagged all 9 of the legitimate sites in our dataset as suspicious
lol
So I tested out the extension.. First the extension spammed me with "login required".. So I click the notification to be taken to a login page.. Great? Now I have to create an account and verify a link.. Now I can test how great this is against a "fresh" facebook phishing page being actively promoted via Facebook Ads..
hxxps://r7ouhcqzdgae76-fsc0fydmbecefrap.z03.azurefd.net/new2/?utm_medium=paid&utm_source=fb&utm_id=6900429311725&utm_content=6900429312725&utm_term=6900429314125&utm_campaign=6900429311725
The "extension" did a "scan". {"url":"https://r7ouhcqzdgae76-fsc0fydmbecefrap.z03.azurefd.net/new2..."}
response: {"classification":"clean"}
great work?
If I click "Deep scan".. I see a screenshot blob being sent over.. response: { "classification": "phish", "reasons": [ "Our system has previously flagged this webpage as malicious." ] }
So if the site were already flagged, why does the "light" scan not show that?
The purpose of "Safe Browsing" is to send your URLs to Google.
"If you're interested in trying Muninn, it's available as a Chrome extension. We're in an early phase " Domain is less than 4 months old.. Software is "early phase".. Already making misleading marketing claims of usefulness..
These statistics would be a lot better if they were compared directly to the same measurements taken from dedicated cloud SWGs/SSEs like Zscaler. My somewhat subjective sense is that the whole industry is in a bit of a rough patch, the miss rate seems to be noticeably climbing all across the board.
Having spent some time in the anti-abuse and Trust & Safety space, I always take these vendor reports with a massive grain of salt. It’s a classic case of comparing apples to vendor-marketing oranges. A headline screaming about an 84% miss rate sounds like a systemic collapse until you look at the radically different constraint envelopes a global default like GSB and a specialized enterprise vendor operate under.
The biggest factor here is the false-positive cliff. Google Safe Browsing is the default safety net for billions of clients across Chrome, Safari, and Firefox. If GSB’s false-positive rate ticks up by even a fraction of a percent, they end up accidentally nuking legitimate small businesses, SaaS platforms, or municipal portals off the internet. Because of that massive blast radius, GSB fundamentally has to be deeply conservative. A boutique security vendor, on the other hand, can afford to be highly aggressive because an over-block in a corporate environment just results in a routine IT support ticket.
You also have to factor in the ephemeral nature of modern phishing infrastructure and basic selection bias. Threat actors heavily rely on automated DGAs and compromised hosts where the time-to-live for a payload is measured in hours, if not minutes. If a specialized vendor detects a zero-day phishing link at 10:00 AM, and GSB hasn't confidently propagated a global block to billions of edge clients by 10:15 AM, the vendor scores it as a "miss." Add in the fact that vendors naturally test against the specific subset of threats their proprietary engines are tuned to find, and that 84% number starts to make a lot more sense as a top-of-funnel marketing metric rather than a scientific baseline.
None of this is to say GSB is perfect right now. It has absolutely struggled to keep up with the recent explosion of automated, highly targeted spear-phishing and MFA-bypass proxy kits. But we should read this report for what it really is: a smart marketing push by a security vendor trying to sell a product, not a sign that the internet's baseline immune system is totally broken.
> We also ran the full dataset of 263 URLs (254 phishing, 9 confirmed legitimate) through Muninn's automatic scan. This is the scan that runs on every page you visit without any action on your part. On its own, the automatic scan correctly identified 238 of the 254 phishing sites and only incorrectly flagged 6 legitimate pages. [...] The tradeoff is that it flagged all 9 of the legitimate sites in our dataset as suspicious, ...
Am I missing something, or is that a 66%/100% false positive rate on legitimate sites?
If GSB had that ratio, it would be absolutely unusable. So comparing these two is absolutely wrong...
The 9/9 is actually crazy, and then they posted about it as if they found something? What they did was find a major issue in their own process and then tell the world about it; that just doesn't seem right.
Crazy, and also like, 9? The sample size in that part of your test suite is 9?
It would seem their service identifies only phishing sites as legitimate ones. It would seem 100% of sites they deem legitimate are phishing sites. Incredible.
The deep scan detected all phishing sites correctly with the unfortunate tagging of legit sites as phishing too. I imagine their code looks something like isPhishing = true.
lol
> I always take these vendor reports with a massive grain of salt. It’s a classic case of comparing apples to vendor-marketing oranges. A headline screaming about an 84% miss rate sounds like a systemic collapse until...
I've seen this before in the ip blocklist space... if you're layering up firewall rules, you're bound to see the higher priority layers more often.
That doesn't mean the other layers suck, security isn't always an A or B situation...
On the other hand, I don't know how I feel about how GSB is implemented... you're telling google every website you go to, but chances are the site already has google analytics or SSO...
i thought it checks against a local list of hashes? with frequent updates
this is how Firefox does it. can't speak for the rest.
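Roughly, yes: in the Safe Browsing Update API model, the client keeps a local set of short hash prefixes and only contacts the server when a prefix matches, so the server never learns most of the URLs you visit. A simplified sketch (real canonicalization, prefix-length negotiation, and full-hash verification are more involved):

```python
import hashlib

def url_prefix(url):
    """4-byte SHA-256 prefix of a (here, already canonicalized) URL expression."""
    return hashlib.sha256(url.encode()).digest()[:4]

# Hypothetical local database of 4-byte prefixes, refreshed periodically
# from the server's incremental updates.
local_prefixes = {url_prefix("evil.example/"), url_prefix("bad.test/login")}

def needs_server_check(url):
    """True if the URL's hash prefix is in the local list.

    Only on a prefix hit does the client ask the server for the matching
    full hashes to confirm; a clean miss never leaves the machine."""
    return url_prefix(url) in local_prefixes

print(needs_server_check("evil.example/"))  # True: prefix hit, confirm with server
```

The privacy tradeoff the parent comments are debating is exactly the prefix-hit case: an occasional lookup does reach the server, but only as a short hash prefix shared by many possible URLs.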
> I always take these vendor reports with a massive grain of salt.
Yeah. "Here's a blog post with some casually collected numbers about our product [...] It turns out that it's great!" is sorta boring.
But couple that with a headline framed as "Google [...] Bad" and straight to the top of the HN front page it goes!
These are fair points and I agree with a lot of them. GSB operates at a scale we don't, and the conservatism that comes with being the default for billions of users is a real constraint. The post tries to acknowledge that ("the takeaway from all of this is not that Google Safe Browsing is bad") and we're upfront about the timing caveat since these were checked at time of scan.
Where I'd push back is on what this means for the average person. Most people have no protection against phishing beyond what their email provider and browser give them. If that protection is fundamentally reactive, catching threats hours or days after they go live, that's a real limitation worth talking about honestly. The 84% number isn't meant to say GSB is broken. It's meant to say there's a gap, and that gap has consequences for real users regardless of the engineering reasons behind it.
On the marketing angle, we aren't currently selling anything. The extension is free and so is submitting URLs for verification. We recognize it would be disingenuous to say we never will, but at the very least the data and the ability to check URLs (similar to PhishTank before they closed registration) will always be free. The dataset is also sourced from public threat intelligence feeds, not a curated set designed to make our tool look good. We think publishing findings like this is valuable even if you set aside everything about our tools.
> We think publishing findings like this is valuable even if you set aside everything about our tools.
In what way is it valuable?
Their example is really dumb. Eventually, you get a fake Microsoft login page, but they clip out the address bar, which clearly isn't a Microsoft address, so your autofilled password isn't going to be put into the form. You'd have to be pretty dumb to type it in by hand, or even to know your Microsoft password; it should be some random thing generated by Safari or whatever your password manager is. Not to mention two-factor authentication.
Most people are "really dumb" by your standards then. Not only are most people not going to check the URL, but many people don't know how password managers work. The only reason they use the browser password manager is because it is on by default, and it is saving their collection of 3 reused passwords that they manually type at each site when it doesn't auto-populate.