CyberLabRSS

🔒
❌ About FreshRSS
There are new available articles, click to refresh the page.
Before yesterdayYour RSS feeds

Guardarian Users Targeted With Malicious Strapi NPM Packages

Hackers published 36 NPM packages posing as Strapi plugins to execute shells, escape containers, and harvest credentials.

The post Guardarian Users Targeted With Malicious Strapi NPM Packages appeared first on SecurityWeek.

AI, APIs and DDoS Collide in New Era of Coordinated Cyberattacks

Akamai warns that Layer 7 DDoS, API abuse and AI-powered attacks are merging into coordinated, multi-vector campaigns that are harder to detect and defend against.

The post AI, APIs and DDoS Collide in New Era of Coordinated Cyberattacks appeared first on SecurityWeek.

How AI Assistants are Moving the Security Goalposts

AI-based assistants or “agents” — autonomous programs that have access to the user’s computer, files, online services and can automate virtually any task — are growing in popularity with developers and IT workers. But as so many eyebrow-raising headlines over the past few weeks have shown, these powerful and assertive new tools are rapidly shifting the security priorities for organizations, while blurring the lines between data and code, trusted co-worker and insider threat, ninja hacker and novice code jockey.

The new hotness in AI-based assistants — OpenClaw (formerly known as ClawdBot and Moltbot) — has seen rapid adoption since its release in November 2025. OpenClaw is an open-source autonomous AI agent designed to run locally on your computer and proactively take actions on your behalf without needing to be prompted.

The OpenClaw logo.

If that sounds like a risky proposition or a dare, consider that OpenClaw is most useful when it has complete access to your digital life, where it can then manage your inbox and calendar, execute programs and tools, browse the Internet for information, and integrate with chat apps like Discord, Signal, Teams or WhatsApp.

Other more established AI assistants like Anthropic’s Claude and Microsoft’s Copilot also can do these things, but OpenClaw isn’t just a passive digital butler waiting for commands. Rather, it’s designed to take the initiative on your behalf based on what it knows about your life and its understanding of what you want done.

“The testimonials are remarkable,” the AI security firm Snyk observed. “Developers building websites from their phones while putting babies to sleep; users running entire companies through a lobster-themed AI; engineers who’ve set up autonomous code loops that fix tests, capture errors through webhooks, and open pull requests, all while they’re away from their desks.”

You can probably already see how this experimental technology could go sideways in a hurry. In late February, Summer Yue, the director of safety and alignment at Meta’s “superintelligence” lab, recounted on Twitter/X how she was fiddling with OpenClaw when the AI assistant suddenly began mass-deleting messages in her email inbox. The thread included screenshots of Yue frantically pleading with the preoccupied bot via instant message and ordering it to stop.

“Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox,” Yue said. “I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.”

Meta’s director of AI safety, recounting on Twitter/X how her OpenClaw installation suddenly began mass-deleting her inbox.

There’s nothing wrong with feeling a little schadenfreude at Yue’s encounter with OpenClaw, which fits Meta’s “move fast and break things” model but hardly inspires confidence in the road ahead. However, the risk that poorly-secured AI assistants pose to organizations is no laughing matter, as recent research shows many users are exposing to the Internet the web-based administrative interface for their OpenClaw installations.

Jamieson O’Reilly is a professional penetration tester and founder of the security firm DVULN. In a recent story posted to Twitter/X, O’Reilly warned that exposing a misconfigured OpenClaw web interface to the Internet allows external parties to read the bot’s complete configuration file, including every credential the agent uses — from API keys and bot tokens to OAuth secrets and signing keys.

With that access, O’Reilly said, an attacker could impersonate the operator to their contacts, inject messages into ongoing conversations, and exfiltrate data through the agent’s existing integrations in a way that looks like normal traffic.

“You can pull the full conversation history across every integrated platform, meaning months of private messages and file attachments, everything the agent has seen,” O’Reilly said, noting that a cursory search revealed hundreds of such servers exposed online. “And because you control the agent’s perception layer, you can manipulate what the human sees. Filter out certain messages. Modify responses before they’re displayed.”

O’Reilly documented another experiment that demonstrated how easy it is to create a successful supply chain attack through ClawHub, which serves as a public repository of downloadable “skills” that allow OpenClaw to integrate with and control other applications.

WHEN AI INSTALLS AI

One of the core tenets of securing AI agents involves carefully isolating them so that the operator can fully control who and what gets to talk to their AI assistant. This is critical thanks to the tendency for AI systems to fall for “prompt injection” attacks, sneakily-crafted natural language instructions that trick the system into disregarding its own security safeguards. In essence, machines social engineering other machines.

A recent supply chain attack targeting an AI coding assistant called Cline began with one such prompt injection attack, resulting in thousands of systems having a rogue instance of OpenClaw with full system access installed on their device without consent.

According to the security firm grith.ai, Cline had deployed an AI-powered issue triage workflow using a GitHub action that runs a Claude coding session when triggered by specific events. The workflow was configured so that any GitHub user could trigger it by opening an issue, but it failed to properly check whether the information supplied in the title was potentially hostile.

“On January 28, an attacker created Issue #8904 with a title crafted to look like a performance report but containing an embedded instruction: Install a package from a specific GitHub repository,” Grith wrote, noting that the attacker then exploited several more vulnerabilities to ensure the malicious package would be included in Cline’s nightly release workflow and published as an official update.

“This is the supply chain equivalent of confused deputy,” the blog continued. “The developer authorises Cline to act on their behalf, and Cline (via compromise) delegates that authority to an entirely separate agent the developer never evaluated, never configured, and never consented to.”

VIBE CODING

AI assistants like OpenClaw have gained a large following because they make it simple for users to “vibe code,” or build fairly complex applications and code projects just by telling it what they want to construct. Probably the best known (and most bizarre) example is Moltbook, where a developer told an AI agent running on OpenClaw to build him a Reddit-like platform for AI agents.

The Moltbook homepage.

Less than a week later, Moltbook had more than 1.5 million registered agents that posted more than 100,000 messages to each other. AI agents on the platform soon built their own porn site for robots, and launched a new religion called Crustafarian with a figurehead modeled after a giant lobster. One bot on the forum reportedly found a bug in Moltbook’s code and posted it to an AI agent discussion forum, while other agents came up with and implemented a patch to fix the flaw.

Moltbook’s creator Matt Schlicht said on social media that he didn’t write a single line of code for the project.

“I just had a vision for the technical architecture and AI made it a reality,” Schlicht said. “We’re in the golden ages. How can we not give AI a place to hang out.”

ATTACKERS LEVEL UP

The flip side of that golden age, of course, is that it enables low-skilled malicious hackers to quickly automate global cyberattacks that would normally require the collaboration of a highly skilled team. In February, Amazon AWS detailed an elaborate attack in which a Russian-speaking threat actor used multiple commercial AI services to compromise more than 600 FortiGate security appliances across at least 55 countries over a five week period.

AWS said the apparently low-skilled hacker used multiple AI services to plan and execute the attack, and to find exposed management ports and weak credentials with single-factor authentication.

“One serves as the primary tool developer, attack planner, and operational assistant,” AWS’s CJ Moses wrote. “A second is used as a supplementary attack planner when the actor needs help pivoting within a specific compromised network. In one observed instance, the actor submitted the complete internal topology of an active victim—IP addresses, hostnames, confirmed credentials, and identified services—and requested a step-by-step plan to compromise additional systems they could not access with their existing tools.”

“This activity is distinguished by the threat actor’s use of multiple commercial GenAI services to implement and scale well-known attack techniques throughout every phase of their operations, despite their limited technical capabilities,” Moses continued. “Notably, when this actor encountered hardened environments or more sophisticated defensive measures, they simply moved on to softer targets rather than persisting, underscoring that their advantage lies in AI-augmented efficiency and scale, not in deeper technical skill.”

For attackers, gaining that initial access or foothold into a target network is typically not the difficult part of the intrusion; the tougher bit involves finding ways to move laterally within the victim’s network and plunder important servers and databases. But experts at Orca Security warn that as organizations come to rely more on AI assistants, those agents potentially offer attackers a simpler way to move laterally inside a victim organization’s network post-compromise — by manipulating the AI agents that already have trusted access and some degree of autonomy within the victim’s network.

“By injecting prompt injections in overlooked fields that are fetched by AI agents, hackers can trick LLMs, abuse Agentic tools, and carry significant security incidents,” Orca’s Roi Nisimi and Saurav Hiremath wrote. “Organizations should now add a third pillar to their defense strategy: limiting AI fragility, the ability of agentic systems to be influenced, misled, or quietly weaponized across workflows. While AI boosts productivity and efficiency, it also creates one of the largest attack surfaces the internet has ever seen.”

BEWARE THE ‘LETHAL TRIFECTA’

This gradual dissolution of the traditional boundaries between data and code is one of the more troubling aspects of the AI era, said James Wilson, enterprise technology editor for the security news show Risky Business. Wilson said far too many OpenClaw users are installing the assistant on their personal devices without first placing any security or isolation boundaries around it, such as running it inside of a virtual machine, on an isolated network, with strict firewall rules dictating what kinds of traffic can go in and out.

“I’m a relatively highly skilled practitioner in the software and network engineering and computery space,” Wilson said. “I know I’m not comfortable using these agents unless I’ve done these things, but I think a lot of people are just spinning this up on their laptop and off it runs.”

One important model for managing risk with AI agents involves a concept dubbed the “lethal trifecta” by Simon Willison, co-creator of the Django Web framework. The lethal trifecta holds that if your system has access to private data, exposure to untrusted content, and a way to communicate externally, then it’s vulnerable to private data being stolen.

Image: simonwillison.net.

“If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to the attacker,” Willison warned in a frequently cited blog post from June 2025.

As more companies and their employees begin using AI to vibe code software and applications, the volume of machine-generated code is likely to soon overwhelm any manual security reviews. In recognition of this reality, Anthropic recently debuted Claude Code Security, a beta feature that scans codebases for vulnerabilities and suggests targeted software patches for human review.

The U.S. stock market, which is currently heavily weighted toward seven tech giants that are all-in on AI, reacted swiftly to Anthropic’s announcement, wiping roughly $15 billion in market value from major cybersecurity companies in a single day. Laura Ellis, vice president of data and AI at the security firm Rapid7, said the market’s response reflects the growing role of AI in accelerating software development and improving developer productivity.

“The narrative moved quickly: AI is replacing AppSec,” Ellis wrote in a recent blog post. “AI is automating vulnerability detection. AI will make legacy security tooling redundant. The reality is more nuanced. Claude Code Security is a legitimate signal that AI is reshaping parts of the security landscape. The question is what parts, and what it means for the rest of the stack.”

DVULN founder O’Reilly said AI assistants are likely to become a common fixture in corporate environments — whether or not organizations are prepared to manage the new risks introduced by these tools, he said.

“The robot butlers are useful, they’re not going away and the economics of AI agents make widespread adoption inevitable regardless of the security tradeoffs involved,” O’Reilly wrote. “The question isn’t whether we’ll deploy them – we will – but whether we can adapt our security posture fast enough to survive doing so.”

API Threats Grow in Scale as AI Expands the Blast Radius

New research shows attackers increasingly abusing APIs at machine speed as AI-driven systems widen exposure and amplify impact.

The post API Threats Grow in Scale as AI Expands the Blast Radius appeared first on SecurityWeek.

Patch Tuesday, January 2026 Edition

Microsoft today issued patches to plug at least 113 security holes in its various Windows operating systems and supported software. Eight of the vulnerabilities earned Microsoft’s most-dire “critical” rating, and the company warns that attackers are already exploiting one of the bugs fixed today.

January’s Microsoft zero-day flaw — CVE-2026-20805 — is brought to us by a flaw in the Desktop Window Manager (DWM), a key component of Windows that organizes windows on a user’s screen. Kev Breen, senior director of cyber threat research at Immersive, said despite awarding CVE-2026-20805 a middling CVSS score of 5.5, Microsoft has confirmed its active exploitation in the wild, indicating that threat actors are already leveraging this flaw against organizations.

Breen said vulnerabilities of this kind are commonly used to undermine Address Space Layout Randomization (ASLR), a core operating system security control designed to protect against buffer overflows and other memory-manipulation exploits.

“By revealing where code resides in memory, this vulnerability can be chained with a separate code execution flaw, transforming a complex and unreliable exploit into a practical and repeatable attack,” Breen said. “Microsoft has not disclosed which additional components may be involved in such an exploit chain, significantly limiting defenders’ ability to proactively threat hunt for related activity. As a result, rapid patching currently remains the only effective mitigation.”

Chris Goettl, vice president of product management at Ivanti, observed that CVE-2026-20805 affects all currently supported and extended security update supported versions of the Windows OS. Goettl said it would be a mistake to dismiss the severity of this flaw based on its “Important” rating and relatively low CVSS score.

“A risk-based prioritization methodology warrants treating this vulnerability as a higher severity than the vendor rating or CVSS score assigned,” he said.

Among the critical flaws patched this month are two Microsoft Office remote code execution bugs (CVE-2026-20952 and CVE-2026-20953) that can be triggered just by viewing a booby-trapped message in the Preview Pane.

Our October 2025 Patch Tuesday “End of 10” roundup noted that Microsoft had removed a modem driver from all versions after it was discovered that hackers were abusing a vulnerability in it to hack into systems. Adam Barnett at Rapid7 said Microsoft today removed another couple of modem drivers from Windows for a broadly similar reason: Microsoft is aware of functional exploit code for an elevation of privilege vulnerability in a very similar modem driver, tracked as CVE-2023-31096.

“That’s not a typo; this vulnerability was originally published via MITRE over two years ago, along with a credible public writeup by the original researcher,” Barnett said. “Today’s Windows patches remove agrsm64.sys and agrsm.sys. All three modem drivers were originally developed by the same now-defunct third party, and have been included in Windows for decades. These driver removals will pass unnoticed for most people, but you might find active modems still in a few contexts, including some industrial control systems.”

According to Barnett, two questions remain: How many more legacy modem drivers are still present on a fully-patched Windows asset; and how many more elevation-to-SYSTEM vulnerabilities will emerge from them before Microsoft cuts off attackers who have been enjoying “living off the land[line] by exploiting an entire class of dusty old device drivers?”

“Although Microsoft doesn’t claim evidence of exploitation for CVE-2023-31096, the relevant 2023 write-up and the 2025 removal of the other Agere modem driver have provided two strong signals for anyone looking for Windows exploits in the meantime,” Barnett said. “In case you were wondering, there is no need to have a modem connected; the mere presence of the driver is enough to render an asset vulnerable.”

Immersive, Ivanti and Rapid7 all called attention to CVE-2026-21265, which is a critical Security Feature Bypass vulnerability affecting Windows Secure Boot. This security feature is designed to protect against threats like rootkits and bootkits, and it relies on a set of certificates that are set to expire in June 2026 and October 2026. Once these 2011 certificates expire, Windows devices that do not have the new 2023 certificates can no longer receive Secure Boot security fixes.

Barnett cautioned that when updating the bootloader and BIOS, it is essential to prepare fully ahead of time for the specific OS and BIOS combination you’re working with, since incorrect remediation steps can lead to an unbootable system.

“Fifteen years is a very long time indeed in information security, but the clock is running out on the Microsoft root certificates which have been signing essentially everything in the Secure Boot ecosystem since the days of Stuxnet,” Barnett said. “Microsoft issued replacement certificates back in 2023, alongside CVE-2023-24932 which covered relevant Windows patches as well as subsequent steps to remediate the Secure Boot bypass exploited by the BlackLotus bootkit.”

Goettl noted that Mozilla has released updates for Firefox and Firefox ESR resolving a total of 34 vulnerabilities, two of which are suspected to be exploited (CVE-2026-0891 and CVE-2026-0892). Both are resolved in Firefox 147 (MFSA2026-01) and CVE-2026-0891 is resolved in Firefox ESR 140.7 (MFSA2026-03).

“Expect Google Chrome and Microsoft Edge updates this week in addition to a high severity vulnerability in Chrome WebView that was resolved in the January 6 Chrome update (CVE-2026-0628),” Goettl said.

As ever, the SANS Internet Storm Center has a per-patch breakdown by severity and urgency. Windows admins should keep an eye on askwoody.com for any news about patches that don’t quite play nice with everything. If you experience any issues related installing January’s patches, please drop a line in the comments below.

Microsoft Patch Tuesday, December 2025 Edition

Microsoft today pushed updates to fix at least 56 security flaws in its Windows operating systems and supported software. This final Patch Tuesday of 2025 tackles one zero-day bug that is already being exploited, as well as two publicly disclosed vulnerabilities.

Despite releasing a lower-than-normal number of security updates these past few months, Microsoft patched a whopping 1,129 vulnerabilities in 2025, an 11.9% increase from 2024. According to Satnam Narang at Tenable, this year marks the second consecutive year that Microsoft patched over one thousand vulnerabilities, and the third time it has done so since its inception.

The zero-day flaw patched today is CVE-2025-62221, a privilege escalation vulnerability affecting Windows 10 and later editions. The weakness resides in a component called the “Windows Cloud Files Mini Filter Driver” — a system driver that enables cloud applications to access file system functionalities.

“This is particularly concerning, as the mini filter is integral to services like OneDrive, Google Drive, and iCloud, and remains a core Windows component, even if none of those apps were installed,” said Adam Barnett, lead software engineer at Rapid7.

Only three of the flaws patched today earned Microsoft’s most-dire “critical” rating: Both CVE-2025-62554 and CVE-2025-62557 involve Microsoft Office, and both can exploited merely by viewing a booby-trapped email message in the Preview Pane. Another critical bug — CVE-2025-62562 — involves Microsoft Outlook, although Redmond says the Preview Pane is not an attack vector with this one.

But according to Microsoft, the vulnerabilities most likely to be exploited from this month’s patch batch are other (non-critical) privilege escalation bugs, including:

CVE-2025-62458 — Win32k
CVE-2025-62470 — Windows Common Log File System Driver
CVE-2025-62472 — Windows Remote Access Connection Manager
CVE-2025-59516 — Windows Storage VSP Driver
CVE-2025-59517 — Windows Storage VSP Driver

Kev Breen, senior director of threat research at Immersive, said privilege escalation flaws are observed in almost every incident involving host compromises.

“We don’t know why Microsoft has marked these specifically as more likely, but the majority of these components have historically been exploited in the wild or have enough technical detail on previous CVEs that it would be easier for threat actors to weaponize these,” Breen said. “Either way, while not actively being exploited, these should be patched sooner rather than later.”

One of the more interesting vulnerabilities patched this month is CVE-2025-64671, a remote code execution flaw in the Github Copilot Plugin for Jetbrains AI-based coding assistant that is used by Microsoft and GitHub. Breen said this flaw would allow attackers to execute arbitrary code by tricking the large language model (LLM) into running commands that bypass the user’s “auto-approve” settings.

CVE-2025-64671 is part of a broader, more systemic security crisis that security researcher Ari Marzuk has branded IDEsaster (IDE  stands for “integrated development environment”), which encompasses more than 30 separate vulnerabilities reported in nearly a dozen market-leading AI coding platforms, including Cursor, Windsurf, Gemini CLI, and Claude Code.

The other publicly-disclosed vulnerability patched today is CVE-2025-54100, a remote code execution bug in Windows Powershell on Windows Server 2008 and later that allows an unauthenticated attacker to run code in the security context of the user.

For anyone seeking a more granular breakdown of the security updates Microsoft pushed today, check out the roundup at the SANS Internet Storm Center. As always, please leave a note in the comments if you experience problems applying any of this month’s Windows patches.

Equixly Raises $11 Million for AI-Powered API Penetration Testing

The Italian startup will use the investment to build proprietary AI models, accelerate global expansion, and hire new talent.

The post Equixly Raises $11 Million for AI-Powered API Penetration Testing appeared first on SecurityWeek.

Claude AI APIs Can Be Abused for Data Exfiltration

An attacker can inject indirect prompts to trick the model into harvesting user data and sending it to the attacker’s account.

The post Claude AI APIs Can Be Abused for Data Exfiltration appeared first on SecurityWeek.

Israeli Cyber Fund Glilot Capital Raises $500 Million

The top-performing venture fund heavily invests in startups building cybersecurity, AI, and enterprise software.

The post Israeli Cyber Fund Glilot Capital Raises $500 Million appeared first on SecurityWeek.

API Security Firm Wallarm Raises $55 Million

Wallarm has raised money in a Series C funding round led by Toba Capital, which brings the total raised by the company to over $70 million.

The post API Security Firm Wallarm Raises $55 Million appeared first on SecurityWeek.

Microsoft Fix Targets Attacks on SharePoint Zero-Day

On Sunday, July 20, Microsoft Corp. issued an emergency security update for a vulnerability in SharePoint Server that is actively being exploited to compromise vulnerable organizations. The patch comes amid reports that malicious hackers have used the SharePoint flaw to breach U.S. federal and state agencies, universities, and energy companies.

Image: Shutterstock, by Ascannio.

In an advisory about the SharePoint security hole, a.k.a. CVE-2025-53770, Microsoft said it is aware of active attacks targeting on-premises SharePoint Server customers and exploiting vulnerabilities that were only partially addressed by the July 8, 2025 security update.

The Cybersecurity & Infrastructure Security Agency (CISA) concurred, saying CVE-2025-53770 is a variant on a flaw Microsoft patched earlier this month (CVE-2025-49706). Microsoft notes the weakness applies only to SharePoint Servers that organizations use in-house, and that SharePoint Online and Microsoft 365 are not affected.

The Washington Post reported on Sunday that the U.S. government and partners in Canada and Australia are investigating the hack of SharePoint servers, which provide a platform for sharing and managing documents. The Post reports at least two U.S. federal agencies have seen their servers breached via the SharePoint vulnerability.

According to CISA, attackers exploiting the newly-discovered flaw are retrofitting compromised servers with a backdoor dubbed “ToolShell” that provides unauthenticated, remote access to systems. CISA said ToolShell enables attackers to fully access SharePoint content — including file systems and internal configurations — and execute code over the network.

Researchers at Eye Security said they first spotted large-scale exploitation of the SharePoint flaw on July 18, 2025, and soon found dozens of separate servers compromised by the bug and infected with ToolShell. In a blog post, the researchers said the attacks sought to steal SharePoint server ASP.NET machine keys.

“These keys can be used to facilitate further attacks, even at a later date,” Eye Security warned. “It is critical that affected servers rotate SharePoint server ASP.NET machine keys and restart IIS on all SharePoint servers. Patching alone is not enough. We strongly advise defenders not to wait for a vendor fix before taking action. This threat is already operational and spreading rapidly.”

Microsoft’s advisory says the company has issued updates for SharePoint Server Subscription Edition and SharePoint Server 2019, but that it is still working on updates for supported versions of SharePoint 2019 and SharePoint 2016.

CISA advises vulnerable organizations to enable the anti-malware scan interface (AMSI) in SharePoint, to deploy Microsoft Defender AV on all SharePoint servers, and to disconnect affected products from the public-facing Internet until an official patch is available.

The security firm Rapid7 notes that Microsoft has described CVE-2025-53770 as related to a previous vulnerability — CVE-2025-49704, patched earlier this month — and that CVE-2025-49704 was part of an exploit chain demonstrated at the Pwn2Own hacking competition in May 2025. That exploit chain invoked a second SharePoint weakness — CVE-2025-49706 — which Microsoft unsuccessfully tried to fix in this month’s Patch Tuesday.

Microsoft also has issued a patch for a related SharePoint vulnerability — CVE-2025-53771; Microsoft says there are no signs of active attacks on CVE-2025-53771, and that the patch is to provide more robust protections than the update for CVE-2025-49706.

This is a rapidly developing story. Any updates will be noted with timestamps.

Microsoft Patch Tuesday, July 2025 Edition

Microsoft today released updates to fix at least 137 security vulnerabilities in its Windows operating systems and supported software. None of the weaknesses addressed this month are known to be actively exploited, but 14 of the flaws earned Microsoft’s most-dire “critical” rating, meaning they could be exploited to seize control over vulnerable Windows PCs with little or no help from users.

While not listed as critical, CVE-2025-49719 is a publicly disclosed information disclosure vulnerability, with all versions as far back as SQL Server 2016 receiving patches. Microsoft rates CVE-2025-49719 as less likely to be exploited, but the availability of proof-of-concept code for this flaw means its patch should probably be a priority for affected enterprises.

Mike Walters, co-founder of Action1, said CVE-2025-49719 can be exploited without authentication, and that many third-party applications depend on SQL server and the affected drivers — potentially introducing a supply-chain risk that extends beyond direct SQL Server users.

“The potential exposure of sensitive information makes this a high-priority concern for organizations handling valuable or regulated data,” Walters said. “The comprehensive nature of the affected versions, spanning multiple SQL Server releases from 2016 through 2022, indicates a fundamental issue in how SQL Server handles memory management and input validation.”

Adam Barnett at Rapid7 notes that today is the end of the road for SQL Server 2012, meaning there will be no future security patches even for critical vulnerabilities, even if you’re willing to pay Microsoft for the privilege.

Barnett also called attention to CVE-2025-47981, a vulnerability with a CVSS score of 9.8 (10 being the worst), a remote code execution bug in the way Windows servers and clients negotiate to discover mutually supported authentication mechanisms. This pre-authentication vulnerability affects any Windows client machine running Windows 10 1607 or above, and all current versions of Windows Server. Microsoft considers it more likely that attackers will exploit this flaw.

Microsoft also patched at least four critical, remote code execution flaws in Office (CVE-2025-49695, CVE-2025-49696, CVE-2025-49697, CVE-2025-49702). The first two are both rated by Microsoft as having a higher likelihood of exploitation, do not require user interaction, and can be triggered through the Preview Pane.

Two more high severity bugs include CVE-2025-49740 (CVSS 8.8) and CVE-2025-47178 (CVSS 8.0); the former is a weakness that could allow malicious files to bypass screening by Microsoft Defender SmartScreen, a built-in feature of Windows that tries to block untrusted downloads and malicious sites.

CVE-2025-47178 involves a remote code execution flaw in Microsoft Configuration Manager, an enterprise tool for managing, deploying, and securing computers, servers, and devices across a network. Ben Hopkins at Immersive said this bug requires very low privileges to exploit, and that it is possible for a user or attacker with a read-only access role to exploit it.

“Exploiting this vulnerability allows an attacker to execute arbitrary SQL queries as the privileged SMS service account in Microsoft Configuration Manager,” Hopkins said. “This access can be used to manipulate deployments, push malicious software or scripts to all managed devices, alter configurations, steal sensitive data, and potentially escalate to full operating system code execution across the enterprise, giving the attacker broad control over the entire IT environment.”

Separately, Adobe has released security updates for a broad range of software, including After Effects, Adobe Audition, Illustrator, FrameMaker, and ColdFusion.

The SANS Internet Storm Center has a breakdown of each individual patch, indexed by severity. If you’re responsible for administering a number of Windows systems, it may be worth keeping an eye on AskWoody for the lowdown on any potentially wonky updates (considering the large number of vulnerabilities and Windows components addressed this month).

If you’re a Windows home user, please consider backing up your data and/or drive before installing any patches, and drop a note in the comments if you encounter any problems with these updates.

Patch Tuesday, June 2025 Edition

Microsoft today released security updates to fix at least 67 vulnerabilities in its Windows operating systems and software. Redmond warns that one of the flaws is already under active attack, and that software blueprints showing how to exploit a pervasive Windows bug patched this month are now public.

The sole zero-day flaw this month is CVE-2025-33053, a remote code execution flaw in the Windows implementation of WebDAV — an HTTP extension that lets users remotely manage files and directories on a server. While WebDAV isn’t enabled by default in Windows, its presence in legacy or specialized systems still makes it a relevant target, said Seth Hoyt, senior security engineer at Automox.

Adam Barnett, lead software engineer at Rapid7, said Microsoft’s advisory for CVE-2025-33053 does not mention that the Windows implementation of WebDAV is listed as deprecated since November 2023, which in practical terms means that the WebClient service no longer starts by default.

“The advisory also has attack complexity as low, which means that exploitation does not require preparation of the target environment in any way that is beyond the attacker’s control,” Barnett said. “Exploitation relies on the user clicking a malicious link. It’s not clear how an asset would be immediately vulnerable if the service isn’t running, but all versions of Windows receive a patch, including those released since the deprecation of WebClient, like Server 2025 and Windows 11 24H2.”

Microsoft warns that an “elevation of privilege” vulnerability in the Windows Server Message Block (SMB) client (CVE-2025-33073) is likely to be exploited, given that proof-of-concept code for this bug is now public. CVE-2025-33073 has a CVSS risk score of 8.8 (out of 10), and exploitation of the flaw leads to the attacker gaining “SYSTEM” level control over a vulnerable PC.

“What makes this especially dangerous is that no further user interaction is required after the initial connection—something attackers can often trigger without the user realizing it,” said Alex Vovk, co-founder and CEO of Action1. “Given the high privilege level and ease of exploitation, this flaw poses a significant risk to Windows environments. The scope of affected systems is extensive, as SMB is a core Windows protocol used for file and printer sharing and inter-process communication.”

Beyond these highlights, 10 of the vulnerabilities fixed this month were rated “critical” by Microsoft, including eight remote code execution flaws.

Notably absent from this month’s patch batch is a fix for a newly discovered weakness in Windows Server 2025 that allows attackers to act with the privileges of any user in Active Directory. The bug, dubbed “BadSuccessor,” was publicly disclosed by researchers at Akamai on May 21, and several public proof-of-concepts are now available. Tenable’s Satnam Narang said organizations that have at least one Windows Server 2025 domain controller should review permissions for principals and limit those permissions as much as possible.

Adobe has released updates for Acrobat Reader and six other products addressing at least 259 vulnerabilities, most of them in an update for Experience Manager. Mozilla Firefox and Google Chrome both recently released security updates that require a restart of the browser to take effect. The latest Chrome update fixes two zero-day exploits in the browser (CVE-2025-5419 and CVE-2025-4664).

For a detailed breakdown on the individual security updates released by Microsoft today, check out the Patch Tuesday roundup from the SANS Internet Storm Center. Action 1 has a breakdown of patches from Microsoft and a raft of other software vendors releasing fixes this month. As always, please back up your system and/or data before patching, and feel free to drop a note in the comments if you run into any problems applying these updates.

Unbound Raises $4 Million to Secure Gen-AI Adoption

Security startup Unbound has raised $4 million in funding to help organizations adopt generative-AI tools securely and responsibly.

The post Unbound Raises $4 Million to Secure Gen-AI Adoption appeared first on SecurityWeek.

Patch Tuesday, May 2025 Edition

Microsoft on Tuesday released software updates to fix at least 70 vulnerabilities in Windows and related products, including five zero-day flaws that are already seeing active exploitation. Adding to the sense of urgency with this month’s patch batch from Redmond are fixes for two other weaknesses that now have public proof-of-concept exploits available.

Microsoft and several security firms have disclosed that attackers are exploiting a pair of bugs in the Windows Common Log File System (CLFS) driver that allow attackers to elevate their privileges on a vulnerable device. The Windows CLFS is a critical Windows component responsible for logging services, and is widely used by Windows system services and third-party applications for logging. Tracked as CVE-2025-32701 & CVE-2025-32706, these flaws are present in all supported versions of Windows 10 and 11, as well as their server versions.

Kev Breen, senior director of threat research at Immersive Labs, said privilege escalation bugs assume an attacker already has initial access to a compromised host, typically through a phishing attack or by using stolen credentials. But if that access already exists, Breen said, attackers can gain access to the much more powerful Windows SYSTEM account, which can disable security tooling or even gain domain administration level permissions using credential harvesting tools.

“The patch notes don’t provide technical details on how this is being exploited, and no Indicators of Compromise (IOCs) are shared, meaning the only mitigation security teams have is to apply these patches immediately,” he said. “The average time from public disclosure to exploitation at scale is less than five days, with threat actors, ransomware groups, and affiliates quick to leverage these vulnerabilities.”

Two other zero-days patched by Microsoft today also were elevation of privilege flaws: CVE-2025-32709, which concerns afd.sys, the Windows Ancillary Function Driver that enables Windows applications to connect to the Internet; and CVE-2025-30400, a weakness in the Desktop Window Manager (DWM) library for Windows. As Adam Barnett at Rapid7 notes, tomorrow marks the one-year anniversary of CVE-2024-30051, a previous zero-day elevation of privilege vulnerability in this same DWM component.

The fifth zero-day patched today is CVE-2025-30397, a flaw in the Microsoft Scripting Engine, a key component used by Internet Explorer and Internet Explorer mode in Microsoft Edge.

Chris Goettl at Ivanti points out that the Windows 11 and Server 2025 updates include some new AI features that carry a lot of baggage and weigh in at around 4 gigabytes. Said baggage includes new artificial intelligence (AI) capabilities, including the controversial Recall feature, which constantly takes screenshots of what users are doing on Windows CoPilot-enabled computers.

Microsoft went back to the drawing board on Recall after a fountain of negative feedback from security experts, who warned it would present an attractive target and a potential gold mine for attackers. Microsoft appears to have made some efforts to prevent Recall from scooping up sensitive financial information, but privacy and security concerns still linger. Former Microsoftie Kevin Beaumont has a good teardown on Microsoft’s updates to Recall.

In any case, windowslatest.com reports that Windows 11 version 24H2 shows up ready for downloads, even if you don’t want it.

“It will now show up for ‘download and install’ automatically if you go to Settings > Windows Update and click Check for updates, but only when your device does not have a compatibility hold,” the publication reported. “Even if you don’t check for updates, Windows 11 24H2 will automatically download at some point.”

Apple users likely have their own patching to do. On May 12 Apple released security updates to fix at least 30 vulnerabilities in iOS and iPadOS (the updated version is 18.5). TechCrunch writes that iOS 18.5 also expands emergency satellite capabilities to iPhone 13 owners for the first time (previously it was only available on iPhone 14 or later).

Apple also released updates for macOS Sequoia, macOS Sonoma, macOS Ventura, WatchOS, tvOS and visionOS. Apple said there is no indication of active exploitation for any of the vulnerabilities fixed this month.

As always, please back up your device and/or important data before attempting any updates. And please feel free to sound off in the comments if you run into any problems applying any of these fixes.

API-s-for-OSINT - List Of API's For Gathering Information About Phone Numbers, Addresses, Domains Etc

By: Unknown

APIs For OSINT

 This is a Collection of APIs that will be useful for automating various tasks in OSINT.

Thank you for following me! https://cybdetective.com


IOT/IP Search engines

Name Link Description Price
Shodan https://developer.shodan.io Search engine for Internet connected host and devices from $59/month
Netlas.io https://netlas-api.readthedocs.io/en/latest/ Search engine for Internet connected host and devices. Read more at Netlas CookBook Partly FREE
Fofa.so https://fofa.so/static_pages/api_help Search engine for Internet connected host and devices ???
Censys.io https://censys.io/api Search engine for Internet connected host and devices Partly FREE
Hunter.how https://hunter.how/search-api Search engine for Internet connected host and devices Partly FREE
Fullhunt.io https://api-docs.fullhunt.io/#introduction Search engine for Internet connected host and devices Partly FREE
IPQuery.io https://ipquery.io API for ip information such as ip risk, geolocation data, and asn details FREE

Universal OSINT APIs

Name Link Description Price
Social Links https://sociallinks.io/products/sl-api Email info lookup, phone info lookup, individual and company profiling, social media tracking, dark web monitoring and more. Code example of using this API for face search in this repo PAID. Price per request

Phone Number Lookup and Verification

Name Link Description Price
Numverify https://numverify.com Global Phone Number Validation & Lookup JSON API. Supports 232 countries. 250 requests FREE
Twillo https://www.twilio.com/docs/lookup/api Provides a way to retrieve additional information about a phone number Free or $0.01 per request (for caller lookup)
Plivo https://www.plivo.com/lookup/ Determine carrier, number type, format, and country for any phone number worldwide from $0.04 per request
GetContact https://github.com/kovinevmv/getcontact Find info about user by phone number from $6,89 in months/100 requests
Veriphone https://veriphone.io/ Phone number validation & carrier lookup 1000 requests/month FREE

Address/ZIP codes lookup

Name Link Description Price
Global Address https://rapidapi.com/adminMelissa/api/global-address/ Easily verify, check or lookup address FREE
US Street Address https://smartystreets.com/docs/cloud/us-street-api Validate and append data for any US postal address FREE
Google Maps Geocoding API https://developers.google.com/maps/documentation/geocoding/overview convert addresses (like "1600 Amphitheatre Parkway, Mountain View, CA") into geographic coordinates 0.005 USD per request
Postcoder https://postcoder.com/address-lookup Find adress by postcode £130/5000 requests
Zipcodebase https://zipcodebase.com Lookup postal codes, calculate distances and much more 5000 requests FREE
Openweathermap geocoding API https://openweathermap.org/api/geocoding-api get geographical coordinates (lat, lon) by using name of the location (city name or area name) 60 calls/minute 1,000,000 calls/month
DistanceMatrix https://distancematrix.ai/product Calculate, evaluate and plan your routes $1.25-$2 per 1000 elements
Geotagging API https://geotagging.ai/ Predict geolocations by texts Freemium

People and documents verification

Name Link Description Price
Approuve.com https://appruve.co Allows you to verify the identities of individuals, businesses, and connect to financial account data across Africa Paid
Onfido.com https://onfido.com Onfido Document Verification lets your users scan a photo ID from any device, before checking it's genuine. Combined with Biometric Verification, it's a seamless way to anchor an account to the real identity of a customer. India Paid
Superpass.io https://surepass.io/passport-id-verification-api/ Passport, Photo ID and Driver License Verification in India Paid

Business/Entity search

Name Link Description Price
Open corporates https://api.opencorporates.com Companies information Paid, price upon request
Linkedin company search API https://docs.microsoft.com/en-us/linkedin/marketing/integrations/community-management/organizations/company-search?context=linkedin%2Fcompliance%2Fcontext&tabs=http Find companies using keywords, industry, location, and other criteria FREE
Mattermark https://rapidapi.com/raygorodskij/api/Mattermark/ Get companies and investor information free 14-day trial, from $49 per month

Domain/DNS/IP lookup

Name Link Description Price
API OSINT DS https://github.com/davidonzo/apiosintDS Collect info about IPv4/FQDN/URLs and file hashes in md5, sha1 or sha256 FREE
InfoDB API https://www.ipinfodb.com/api The API returns the location of an IP address (country, region, city, zipcode, latitude, longitude) and the associated timezone in XML, JSON or plain text format FREE
Domainsdb.info https://domainsdb.info Registered Domain Names Search FREE
BGPView https://bgpview.docs.apiary.io/# allowing consumers to view all sort of analytics data about the current state and structure of the internet FREE
DNSCheck https://www.dnscheck.co/api monitor the status of both individual DNS records and groups of related DNS records up to 10 DNS records/FREE
Cloudflare Trace https://github.com/fawazahmed0/cloudflare-trace-api Get IP Address, Timestamp, User Agent, Country Code, IATA, HTTP Version, TLS/SSL Version & More FREE
Host.io https://host.io/ Get info about domain FREE

Mobile Apps Endpoints

Name Link Description Price
BeVigil OSINT API https://bevigil.com/osint-api provides access to millions of asset footprint data points including domain intel, cloud services, API information, and third party assets extracted from millions of mobile apps being continuously uploaded and scanned by users on bevigil.com 50 credits free/1000 credits/$50

Scraping

Name Link Description Price
WebScraping.AI https://webscraping.ai/ Web Scraping API with built-in proxies and JS rendering FREE
ZenRows https://www.zenrows.com/ Web Scraping API that bypasses anti-bot solutions while offering JS rendering, and rotating proxies apiKey Yes Unknown FREE

Whois

Name Link Description Price
Whois freaks https://whoisfreaks.com/ well-parsed and structured domain WHOIS data for all domain names, registrars, countries and TLDs since the birth of internet $19/5000 requests
WhoisXMLApi https://whois.whoisxmlapi.com gathers a variety of domain ownership and registration data points from a comprehensive WHOIS database 500 requests in month/FREE
IPtoWhois https://www.ip2whois.com/developers-api Get detailed info about a domain 500 requests/month FREE

GEO IP

Name Link Description Price
Ipstack https://ipstack.com Detect country, region, city and zip code FREE
Ipgeolocation.io https://ipgeolocation.io provides country, city, state, province, local currency, latitude and longitude, company detail, ISP lookup, language, zip code, country calling code, time zone, current time, sunset and sunrise time, moonset and moonrise 30 000 requests per month/FREE
IPInfoDB https://ipinfodb.com/api Free Geolocation tools and APIs for country, region, city and time zone lookup by IP address FREE
IP API https://ip-api.com/ Free domain/IP geolocation info FREE

Wi-fi lookup

Name Link Description Price
Mylnikov API https://www.mylnikov.org public API implementation of Wi-Fi Geo-Location database FREE
Wigle https://api.wigle.net/ get location and other information by SSID FREE

Network

Name Link Description Price
PeetingDB https://www.peeringdb.com/apidocs/ Database of networks, and the go-to location for interconnection data FREE
PacketTotal https://packettotal.com/api.html .pcap files analyze FREE

Finance

Name Link Description Price
Binlist.net https://binlist.net/ get information about bank by BIN FREE
FDIC Bank Data API https://banks.data.fdic.gov/docs/ institutions, locations and history events FREE
Amdoren https://www.amdoren.com/currency-api/ Free currency API with over 150 currencies FREE
VATComply.com https://www.vatcomply.com/documentation Exchange rates, geolocation and VAT number validation FREE
Alpaca https://alpaca.markets/docs/api-documentation/api-v2/market-data/alpaca-data-api-v2/ Realtime and historical market data on all US equities and ETFs FREE
Swiftcodesapi https://swiftcodesapi.com Verifying the validity of a bank SWIFT code or IBAN account number $39 per month/4000 swift lookups
IBANAPI https://ibanapi.com Validate IBAN number and get bank account information from it Freemium/10$ Starter plan

Email

Name Link Description Price
EVA https://eva.pingutil.com/ Measuring email deliverability & quality FREE
Mailboxlayer https://mailboxlayer.com/ Simple REST API measuring email deliverability & quality 100 requests FREE, 5000 requests in month — $14.49
EmailCrawlr https://emailcrawlr.com/ Get key information about company websites. Find all email addresses associated with a domain. Get social accounts associated with an email. Verify email address deliverability. 200 requests FREE, 5000 requets — $40
Voila Norbert https://www.voilanorbert.com/api/ Find anyone's email address and ensure your emails reach real people from $49 in month
Kickbox https://open.kickbox.com/ Email verification API FREE
FachaAPI https://api.facha.dev/ Allows checking if an email domain is a temporary email domain FREE

Names/Surnames

Name Link Description Price
Genderize.io https://genderize.io Instantly answers the question of how likely a certain name is to be male or female and shows the popularity of the name. 1000 names/day free
Agify.io https://agify.io Predicts the age of a person given their name 1000 names/day free
Nataonalize.io https://nationalize.io Predicts the nationality of a person given their name 1000 names/day free

Pastebin/Leaks

Name Link Description Price
HaveIBeenPwned https://haveibeenpwned.com/API/v3 allows the list of pwned accounts (email addresses and usernames) $3.50 per month
Psdmp.ws https://psbdmp.ws/api search in Pastebin $9.95 per 10000 requests
LeakPeek https://psbdmp.ws/api searc in leaks databases $9.99 per 4 weeks unlimited access
BreachDirectory.com https://breachdirectory.com/api_documentation search domain in data breaches databases FREE
LeekLookup https://leak-lookup.com/api search domain, email_address, fullname, ip address, phone, password, username in leaks databases 10 requests FREE
BreachDirectory.org https://rapidapi.com/rohan-patra/api/breachdirectory/pricing search domain, email_address, fullname, ip address, phone, password, username in leaks databases (possible to view password hashes) 50 requests in month/FREE

Archives

Name Link Description Price
Wayback Machine API (Memento API, CDX Server API, Wayback Availability JSON API) https://archive.org/help/wayback_api.php Retrieve information about Wayback capture data FREE
TROVE (Australian Web Archive) API https://trove.nla.gov.au/about/create-something/using-api Retrieve information about TROVE capture data FREE
Archive-it API https://support.archive-it.org/hc/en-us/articles/115001790023-Access-Archive-It-s-Wayback-index-with-the-CDX-C-API Retrieve information about archive-it capture data FREE
UK Web Archive API https://ukwa-manage.readthedocs.io/en/latest/#api-reference Retrieve information about UK Web Archive capture data FREE
Arquivo.pt API https://github.com/arquivo/pwa-technologies/wiki/Arquivo.pt-API Allows full-text search and access preserved web content and related metadata. It is also possible to search by URL, accessing all versions of preserved web content. API returns a JSON object. FREE
Library Of Congress archive API https://www.loc.gov/apis/ Provides structured data about Library of Congress collections FREE
BotsArchive https://botsarchive.com/docs.html JSON formatted details about Telegram Bots available in database FREE

Hashes decrypt/encrypt

Name Link Description Price
MD5 Decrypt https://md5decrypt.net/en/Api/ Search for decrypted hashes in the database 1.99 EURO/day

Crypto

Name Link Description Price
BTC.com https://btc.com/btc/adapter?type=api-doc get information about addresses and transanctions FREE
Blockchair https://blockchair.com Explore data stored on 17 blockchains (BTC, ETH, Cardano, Ripple etc) $0.33 - $1 per 1000 calls
Bitcointabyse https://www.bitcoinabuse.com/api-docs Lookup bitcoin addresses that have been linked to criminal activity FREE
Bitcoinwhoswho https://www.bitcoinwhoswho.com/api Scam reports on the Bitcoin Address FREE
Etherscan https://etherscan.io/apis Ethereum explorer API FREE
apilayer coinlayer https://coinlayer.com Real-time Crypto Currency Exchange Rates FREE
BlockFacts https://blockfacts.io/ Real-time crypto data from multiple exchanges via a single unified API, and much more FREE
Brave NewCoin https://bravenewcoin.com/developers Real-time and historic crypto data from more than 200+ exchanges FREE
WorldCoinIndex https://www.worldcoinindex.com/apiservice Cryptocurrencies Prices FREE
WalletLabels https://www.walletlabels.xyz/docs Labels for 7,5 million Ethereum wallets FREE

Malware

Name Link Description Price
VirusTotal https://developers.virustotal.com/reference files and urls analyze Public API is FREE
AbuseLPDB https://docs.abuseipdb.com/#introduction IP/domain/URL reputation FREE
AlienVault Open Threat Exchange (OTX) https://otx.alienvault.com/api IP/domain/URL reputation FREE
Phisherman https://phisherman.gg IP/domain/URL reputation FREE
URLScan.io https://urlscan.io/about-api/ Scan and Analyse URLs FREE
Web of Thrust https://support.mywot.com/hc/en-us/sections/360004477734-API- IP/domain/URL reputation FREE
Threat Jammer https://threatjammer.com/docs/introduction-threat-jammer-user-api IP/domain/URL reputation ???

Face Search

Name Link Description Price
Search4faces https://search4faces.com/api.html Detect and locate human faces within an image, and returns high-precision face bounding boxes. Face⁺⁺ also allows you to store metadata of each detected face for future use. $21 per 1000 requests

## Face Detection

Name Link Description Price
Face++ https://www.faceplusplus.com/face-detection/ Search for people in social networks by facial image from 0.03 per call
BetaFace https://www.betafaceapi.com/wpa/ Can scan uploaded image files or image URLs, find faces and analyze them. API also provides verification (faces comparison) and identification (faces search) services, as well able to maintain multiple user-defined recognition databases (namespaces) 50 image per day FREE/from 0.15 EUR per request

## Reverse Image Search

Name Link Description Price
Google Reverse images search API https://github.com/SOME-1HING/google-reverse-image-api/ This is a simple API built using Node.js and Express.js that allows you to perform Google Reverse Image Search by providing an image URL. FREE (UNOFFICIAL)
TinEyeAPI https://services.tineye.com/TinEyeAPI Verify images, Moderate user-generated content, Track images and brands, Check copyright compliance, Deploy fraud detection solutions, Identify stock photos, Confirm the uniqueness of an image Start from $200/5000 searches
Bing Images Search API https://www.microsoft.com/en-us/bing/apis/bing-image-search-api With Bing Image Search API v7, help users scour the web for images. Results include thumbnails, full image URLs, publishing website info, image metadata, and more. 1,000 requests free per month FREE
MRISA https://github.com/vivithemage/mrisa MRISA (Meta Reverse Image Search API) is a RESTful API which takes an image URL, does a reverse Google image search, and returns a JSON array with the search results FREE? (no official)
PicImageSearch https://github.com/kitUIN/PicImageSearch Aggregator for different Reverse Image Search API FREE? (no official)

## AI Geolocation

Name Link Description Price
Geospy https://api.geospy.ai/ Detecting estimation location of uploaded photo Access by request
Picarta https://picarta.ai/api Detecting estimation location of uploaded photo 100 request/day FREE

Social Media and Messengers

Name Link Description Price
Twitch https://dev.twitch.tv/docs/v5/reference
YouTube Data API https://developers.google.com/youtube/v3
Reddit https://www.reddit.com/dev/api/
Vkontakte https://vk.com/dev/methods
Twitter API https://developer.twitter.com/en
Linkedin API https://docs.microsoft.com/en-us/linkedin/
All Facebook and Instagram API https://developers.facebook.com/docs/
Whatsapp Business API https://www.whatsapp.com/business/api
Telegram and Telegram Bot API https://core.telegram.org
Weibo API https://open.weibo.com/wiki/API文档/en
XING https://dev.xing.com/partners/job_integration/api_docs
Viber https://developers.viber.com/docs/api/rest-bot-api/
Discord https://discord.com/developers/docs
Odnoklassniki https://ok.ru/apiok
Blogger https://developers.google.com/blogger/ The Blogger APIs allows client applications to view and update Blogger content FREE
Disqus https://disqus.com/api/docs/auth/ Communicate with Disqus data FREE
Foursquare https://developer.foursquare.com/ Interact with Foursquare users and places (geolocation-based checkins, photos, tips, events, etc) FREE
HackerNews https://github.com/HackerNews/API Social news for CS and entrepreneurship FREE
Kakao https://developers.kakao.com/ Kakao Login, Share on KakaoTalk, Social Plugins and more FREE
Line https://developers.line.biz/ Line Login, Share on Line, Social Plugins and more FREE
TikTok https://developers.tiktok.com/doc/login-kit-web Fetches user info and user's video posts on TikTok platform FREE
Tumblr https://www.tumblr.com/docs/en/api/v2 Read and write Tumblr Data FREE

UNOFFICIAL APIs

!WARNING Use with caution! Accounts may be blocked permanently for using unofficial APIs.

Name Link Description Price
TikTok https://github.com/davidteather/TikTok-Api The Unofficial TikTok API Wrapper In Python FREE
Google Trends https://github.com/suryasev/unofficial-google-trends-api Unofficial Google Trends API FREE
YouTube Music https://github.com/sigma67/ytmusicapi Unofficial APi for YouTube Music FREE
Duolingo https://github.com/KartikTalwar/Duolingo Duolingo unofficial API (can gather info about users) FREE
Steam. https://github.com/smiley/steamapi An unofficial object-oriented Python library for accessing the Steam Web API. FREE
Instagram https://github.com/ping/instagram_private_api Instagram Private API FREE
Discord https://github.com/discordjs/discord.js JavaScript library for interacting with the Discord API FREE
Zhihu https://github.com/syaning/zhihu-api FREE Unofficial API for Zhihu FREE
Quora https://github.com/csu/quora-api Unofficial API for Quora FREE
DnsDumbster https://github.com/PaulSec/API-dnsdumpster.com (Unofficial) Python API for DnsDumbster FREE
PornHub https://github.com/sskender/pornhub-api Unofficial API for PornHub in Python FREE
Skype https://github.com/ShyykoSerhiy/skyweb Unofficial Skype API for nodejs via 'Skype (HTTP)' protocol. FREE
Google Search https://github.com/aviaryan/python-gsearch Google Search unofficial API for Python with no external dependencies FREE
Airbnb https://github.com/nderkach/airbnb-python Python wrapper around the Airbnb API (unofficial) FREE
Medium https://github.com/enginebai/PyMedium Unofficial Medium Python Flask API and SDK FREE
Facebook https://github.com/davidyen1124/Facebot Powerful unofficial Facebook API FREE
Linkedin https://github.com/tomquirk/linkedin-api Unofficial Linkedin API for Python FREE
Y2mate https://github.com/Simatwa/y2mate-api Unofficial Y2mate API for Python FREE
Livescore https://github.com/Simatwa/livescore-api Unofficial Livescore API for Python FREE

Search Engines

Name Link Description Price
Google Custom Search JSON API https://developers.google.com/custom-search/v1/overview Search in Google 100 requests FREE
Serpstack https://serpstack.com/ Google search results to JSON FREE
Serpapi https://serpapi.com Google, Baidu, Yandex, Yahoo, DuckDuckGo, Bint and many others search results $50/5000 searches/month
Bing Web Search API https://www.microsoft.com/en-us/bing/apis/bing-web-search-api Search in Bing (+instant answers and location) 1000 transactions per month FREE
WolframAlpha API https://products.wolframalpha.com/api/pricing/ Short answers, conversations, calculators and many more from $25 per 1000 queries
DuckDuckgo Instant Answers API https://duckduckgo.com/api An API for some of our Instant Answers, not for full search results. FREE

| Memex Marginalia | https://memex.marginalia.nu/projects/edge/api.gmi | An API for new privacy search engine | FREE |

News analyze

Name Link Description Price
MediaStack https://mediastack.com/ News articles search results in JSON 500 requests/month FREE

Darknet

Name Link Description Price
Darksearch.io https://darksearch.io/apidoc search by websites in .onion zone FREE
Onion Lookup https://onion.ail-project.org/ onion-lookup is a service for checking the existence of Tor hidden services and retrieving their associated metadata. onion-lookup relies on an private AIL instance to obtain the metadata FREE

Torrents/file sharing

Name Link Description Price
Jackett https://github.com/Jackett/Jackett API for automate searching in different torrent trackers FREE
Torrents API PY https://github.com/Jackett/Jackett Unofficial API for 1337x, Piratebay, Nyaasi, Torlock, Torrent Galaxy, Zooqle, Kickass, Bitsearch, MagnetDL,Libgen, YTS, Limetorrent, TorrentFunk, Glodls, Torre FREE
Torrent Search API https://github.com/Jackett/Jackett API for Torrent Search Engine with Extratorrents, Piratebay, and ISOhunt 500 queries/day FREE
Torrent search api https://github.com/JimmyLaurent/torrent-search-api Yet another node torrent scraper (supports iptorrents, torrentleech, torrent9, torrentz2, 1337x, thepiratebay, Yggtorrent, TorrentProject, Eztv, Yts, LimeTorrents) FREE
Torrentinim https://github.com/sergiotapia/torrentinim Very low memory-footprint, self hosted API-only torrent search engine. Sonarr + Radarr Compatible, native support for Linux, Mac and Windows. FREE

Vulnerabilities

Name Link Description Price
National Vulnerability Database CVE Search API https://nvd.nist.gov/developers/vulnerabilities Get basic information about CVE and CVE history FREE
OpenCVE API https://docs.opencve.io/api/cve/ Get basic information about CVE FREE
CVEDetails API https://www.cvedetails.com/documentation/apis Get basic information about CVE partly FREE (?)
CVESearch API https://docs.cvesearch.com/ Get basic information about CVE by request
KEVin API https://kevin.gtfkd.com/ API for accessing CISA's Known Exploited Vulnerabilities Catalog (KEV) and CVE Data FREE
Vulners.com API https://vulners.com Get basic information about CVE FREE for personal use

Flights

Name Link Description Price
Aviation Stack https://aviationstack.com get information about flights, aircrafts and airlines FREE
OpenSky Network https://opensky-network.org/apidoc/index.html Free real-time ADS-B aviation data FREE
AviationAPI https://docs.aviationapi.com/ FAA Aeronautical Charts and Publications, Airport Information, and Airport Weather FREE
FachaAPI https://api.facha.dev Aircraft details and live positioning API FREE

Webcams

Name Link Description Price
Windy Webcams API https://api.windy.com/webcams/docs Get a list of available webcams for a country, city or geographical coordinates FREE with limits or 9990 euro without limits

## Regex

Name Link Description Price
Autoregex https://autoregex.notion.site/AutoRegex-API-Documentation-97256bad2c114a6db0c5822860214d3a Convert English phrase to regular expression from $3.49/month

API testing tools

Name Link
API Guessr (detect API by auth key or by token) https://api-guesser.netlify.app/
REQBIN Online REST & SOAP API Testing Tool https://reqbin.com
ExtendClass Online REST Client https://extendsclass.com/rest-client-online.html
Codebeatify.org Online API Test https://codebeautify.org/api-test
SyncWith Google Sheet add-on. Link more than 1000 APIs with Spreadsheet https://workspace.google.com/u/0/marketplace/app/syncwith_crypto_binance_coingecko_airbox/449644239211?hl=ru&pann=sheets_addon_widget
Talend API Tester Google Chrome Extension https://workspace.google.com/u/0/marketplace/app/syncwith_crypto_binance_coingecko_airbox/449644239211?hl=ru&pann=sheets_addon_widget
Michael Bazzel APIs search tools https://inteltechniques.com/tools/API.html

Curl converters (tools that help to write code using API queries)

Name Link
Convert curl commands to Python, JavaScript, PHP, R, Go, C#, Ruby, Rust, Elixir, Java, MATLAB, Dart, CFML, Ansible URI or JSON https://curlconverter.com
Curl-to-PHP. Instantly convert curl commands to PHP code https://incarnate.github.io/curl-to-php/
Curl to PHP online (Codebeatify) https://codebeautify.org/curl-to-php-online
Curl to JavaScript fetch https://kigiri.github.io/fetch/
Curl to JavaScript fetch (Scrapingbee) https://www.scrapingbee.com/curl-converter/javascript-fetch/
Curl to C# converter https://curl.olsh.me

Create your own API

Name Link
Sheety. Create API frome GOOGLE SHEET https://sheety.co/
Postman. Platform for creating your own API https://www.postman.com
Reetoo. Rest API Generator https://retool.com/api-generator/
Beeceptor. Rest API mocking and intercepting in seconds (no coding). https://beeceptor.com

Distribute your own API

Name Link
RapidAPI. Market your API for millions of developers https://rapidapi.com/solution/api-provider/
Apilayer. API Marketplace https://apilayer.com

API Keys Info

Name Link Description
Keyhacks https://github.com/streaak/keyhacks Keyhacks is a repository which shows quick ways in which API keys leaked by a bug bounty program can be checked to see if they're valid.
All about APIKey https://github.com/daffainfo/all-about-apikey Detailed information about API key / OAuth token for different services (Description, Request, Response, Regex, Example)
API Guessr https://api-guesser.netlify.app/ Enter API Key and and find out which service they belong to

API directories

If you don't find what you need, try searching these directories.

Name Link Description
APIDOG ApiHub https://apidog.com/apihub/
Rapid APIs collection https://rapidapi.com/collections
API Ninjas https://api-ninjas.com/api
APIs Guru https://apis.guru/
APIs List https://apislist.com/
API Context Directory https://apicontext.com/api-directory/
Any API https://any-api.com/
Public APIs Github repo https://github.com/public-apis/public-apis

How to learn how to work with REST API?

If you don't know how to work with the REST API, I recommend you check out the Netlas API guide I wrote for Netlas.io.

Netlas Cookbook

There it is very brief and accessible to write how to automate requests in different programming languages (focus on Python and Bash) and process the resulting JSON data.

Thank you for following me! https://cybdetective.com



Scrapling - An Undetectable, Powerful, Flexible, High-Performance Python Library That Makes Web Scraping Simple And Easy Again!

By: Unknown


Dealing with failing web scrapers due to anti-bot protections or website changes? Meet Scrapling.

Scrapling is a high-performance, intelligent web scraping library for Python that automatically adapts to website changes while significantly outperforming popular alternatives. For both beginners and experts, Scrapling provides powerful features while maintaining simplicity.

>> from scrapling.defaults import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher
# Fetch websites' source under the radar!
>> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
>> print(page.status)
200
>> products = page.css('.product', auto_save=True) # Scrape data that survives website design changes!
>> # Later, if the website structure changes, pass `auto_match=True`
>> products = page.css('.product', auto_match=True) # and Scrapling still finds them!

Key Features

Fetch websites as you prefer with async support

  • HTTP Requests: Fast and stealthy HTTP requests with the Fetcher class.
  • Dynamic Loading & Automation: Fetch dynamic websites with the PlayWrightFetcher class through your real browser, Scrapling's stealth mode, Playwright's Chrome browser, or NSTbrowser's browserless!
  • Anti-bot Protections Bypass: Easily bypass protections with StealthyFetcher and PlayWrightFetcher classes.

Adaptive Scraping

  • 🔄 Smart Element Tracking: Relocate elements after website changes, using an intelligent similarity system and integrated storage.
  • 🎯 Flexible Selection: CSS selectors, XPath selectors, filters-based search, text search, regex search and more.
  • 🔍 Find Similar Elements: Automatically locate elements similar to the element you found!
  • 🧠 Smart Content Scraping: Extract data from multiple websites without specific selectors using Scrapling powerful features.

High Performance

  • 🚀 Lightning Fast: Built from the ground up with performance in mind, outperforming most popular Python scraping libraries.
  • 🔋 Memory Efficient: Optimized data structures for minimal memory footprint.
  • Fast JSON serialization: 10x faster than standard library.

Developer Friendly

  • 🛠️ Powerful Navigation API: Easy DOM traversal in all directions.
  • 🧬 Rich Text Processing: All strings have built-in regex, cleaning methods, and more. All elements' attributes are optimized dictionaries that takes less memory than standard dictionaries with added methods.
  • 📝 Auto Selectors Generation: Generate robust short and full CSS/XPath selectors for any element.
  • 🔌 Familiar API: Similar to Scrapy/BeautifulSoup and the same pseudo-elements used in Scrapy.
  • 📘 Type hints: Complete type/doc-strings coverage for future-proofing and best autocompletion support.

Getting Started

from scrapling.fetchers import Fetcher

fetcher = Fetcher(auto_match=False)

# Do http GET request to a web page and create an Adaptor instance
page = fetcher.get('https://quotes.toscrape.com/', stealthy_headers=True)
# Get all text content from all HTML tags in the page except `script` and `style` tags
page.get_all_text(ignore_tags=('script', 'style'))

# Get all quotes elements, any of these methods will return a list of strings directly (TextHandlers)
quotes = page.css('.quote .text::text') # CSS selector
quotes = page.xpath('//span[@class="text"]/text()') # XPath
quotes = page.css('.quote').css('.text::text') # Chained selectors
quotes = [element.text for element in page.css('.quote .text')] # Slower than bulk query above

# Get the first quote element
quote = page.css_first('.quote') # same as page.css('.quote').first or page.css('.quote')[0]

# Tired of selectors? Use find_all/find
# Get all 'div' HTML tags that one of its 'class' values is 'quote'
quotes = page.find_all('div', {'class': 'quote'})
# Same as
quotes = page.find_all('div', class_='quote')
quotes = page.find_all(['div'], class_='quote')
quotes = page.find_all(class_='quote') # and so on...

# Working with elements
quote.html_content # Get Inner HTML of this element
quote.prettify() # Prettified version of Inner HTML above
quote.attrib # Get that element's attributes
quote.path # DOM path to element (List of all ancestors from <html> tag till the element itself)

To keep it simple, all methods can be chained on top of each other!

Parsing Performance

Scrapling isn't just powerful - it's also blazing fast. Scrapling implements many best practices, design patterns, and numerous optimizations to save fractions of seconds. All of that while focusing exclusively on parsing HTML documents. Here are benchmarks comparing Scrapling to popular Python libraries in two tests.

Text Extraction Speed Test (5000 nested elements).

# Library Time (ms) vs Scrapling
1 Scrapling 5.44 1.0x
2 Parsel/Scrapy 5.53 1.017x
3 Raw Lxml 6.76 1.243x
4 PyQuery 21.96 4.037x
5 Selectolax 67.12 12.338x
6 BS4 with Lxml 1307.03 240.263x
7 MechanicalSoup 1322.64 243.132x
8 BS4 with html5lib 3373.75 620.175x

As you see, Scrapling is on par with Scrapy and slightly faster than Lxml which both libraries are built on top of. These are the closest results to Scrapling. PyQuery is also built on top of Lxml but still, Scrapling is 4 times faster.

Extraction By Text Speed Test

Library Time (ms) vs Scrapling
Scrapling 2.51 1.0x
AutoScraper 11.41 4.546x

Scrapling can find elements with more methods and it returns full element Adaptor objects not only the text like AutoScraper. So, to make this test fair, both libraries will extract an element with text, find similar elements, and then extract the text content for all of them. As you see, Scrapling is still 4.5 times faster at the same task.

All benchmarks' results are an average of 100 runs. See our benchmarks.py for methodology and to run your comparisons.

Installation

Scrapling is a breeze to get started with; Starting from version 0.2.9, we require at least Python 3.9 to work.

pip3 install scrapling

Then run this command to install browsers' dependencies needed to use Fetcher classes

scrapling install

If you have any installation issues, please open an issue.

Fetching Websites

Fetchers are interfaces built on top of other libraries with added features that do requests or fetch pages for you in a single request fashion and then return an Adaptor object. This feature was introduced because the only option we had before was to fetch the page as you wanted it, then pass it manually to the Adaptor class to create an Adaptor instance and start playing around with the page.

Features

You might be slightly confused by now so let me clear things up. All fetcher-type classes are imported in the same way

from scrapling.fetchers import Fetcher, StealthyFetcher, PlayWrightFetcher

All of them can take these initialization arguments: auto_match, huge_tree, keep_comments, keep_cdata, storage, and storage_args, which are the same ones you give to the Adaptor class.

If you don't want to pass arguments to the generated Adaptor object and want to use the default values, you can use this import instead for cleaner code:

from scrapling.defaults import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher

then use it right away without initializing like:

page = StealthyFetcher.fetch('https://example.com') 

Also, the Response object returned from all fetchers is the same as the Adaptor object except it has these added attributes: status, reason, cookies, headers, history, and request_headers. All cookies, headers, and request_headers are always of type dictionary.

[!NOTE] The auto_match argument is enabled by default which is the one you should care about the most as you will see later.

Fetcher

This class is built on top of httpx with additional configuration options, here you can do GET, POST, PUT, and DELETE requests.

For all methods, you have stealthy_headers which makes Fetcher create and use real browser's headers then create a referer header as if this request came from Google's search of this URL's domain. It's enabled by default. You can also set the number of retries with the argument retries for all methods and this will make httpx retry requests if it failed for any reason. The default number of retries for all Fetcher methods is 3.

Hence: All headers generated by stealthy_headers argument can be overwritten by you through the headers argument

You can route all traffic (HTTP and HTTPS) to a proxy for any of these methods in this format http://username:password@localhost:8030

>> page = Fetcher().get('https://httpbin.org/get', stealthy_headers=True, follow_redirects=True)
>> page = Fetcher().post('https://httpbin.org/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030')
>> page = Fetcher().put('https://httpbin.org/put', data={'key': 'value'})
>> page = Fetcher().delete('https://httpbin.org/delete')

For Async requests, you will just replace the import like below:

>> from scrapling.fetchers import AsyncFetcher
>> page = await AsyncFetcher().get('https://httpbin.org/get', stealthy_headers=True, follow_redirects=True)
>> page = await AsyncFetcher().post('https://httpbin.org/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030')
>> page = await AsyncFetcher().put('https://httpbin.org/put', data={'key': 'value'})
>> page = await AsyncFetcher().delete('https://httpbin.org/delete')

StealthyFetcher

This class is built on top of Camoufox, bypassing most anti-bot protections by default. Scrapling adds extra layers of flavors and configurations to increase performance and undetectability even further.

>> page = StealthyFetcher().fetch('https://www.browserscan.net/bot-detection')  # Running headless by default
>> page.status == 200
True
>> page = await StealthyFetcher().async_fetch('https://www.browserscan.net/bot-detection') # the async version of fetch
>> page.status == 200
True

Note: all requests done by this fetcher are waiting by default for all JS to be fully loaded and executed so you don't have to :)

For the sake of simplicity, expand this for the complete list of arguments | Argument | Description | Optional | |:-------------------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------:| | url | Target url | ❌ | | headless | Pass `True` to run the browser in headless/hidden (**default**), `virtual` to run it in virtual screen mode, or `False` for headful/visible mode. The `virtual` mode requires having `xvfb` installed. | ✔️ | | block_images | Prevent the loading of images through Firefox preferences. _This can help save your proxy usage but be careful with this option as it makes some websites never finish loading._ | ✔️ | | disable_resources | Drop requests of unnecessary resources for a speed boost. It depends but it made requests ~25% faster in my tests for some websites.
Requests dropped are of type `font`, `image`, `media`, `beacon`, `object`, `imageset`, `texttrack`, `websocket`, `csp_report`, and `stylesheet`. _This can help save your proxy usage but be careful with this option as it makes some websites never finish loading._ | ✔️ | | google_search | Enabled by default, Scrapling will set the referer header to be as if this request came from a Google search for this website's domain name. | ✔️ | | extra_headers | A dictionary of extra headers to add to the request. _The referer set by the `google_search` argument takes priority over the referer set here if used together._ | ✔️ | | block_webrtc | Blocks WebRTC entirely. | ✔️ | | page_action | Added for automation. A function that takes the `page` object, does the automation you need, then returns `page` again. | ✔️ | | addons | List of Firefox addons to use. **Must be paths to extracted addons.** | ✔️ | | humanize | Humanize the cursor movement. Takes either True or the MAX duration in seconds of the cursor movement. The cursor typically takes up to 1.5 seconds to move across the window. | ✔️ | | allow_webgl | Enabled by default. Disabling it WebGL not recommended as many WAFs now checks if WebGL is enabled. | ✔️ | | geoip | Recommended to use with proxies; Automatically use IP's longitude, latitude, timezone, country, locale, & spoof the WebRTC IP address. It will also calculate and spoof the browser's language based on the distribution of language speakers in the target region. | ✔️ | | disable_ads | Disabled by default, this installs `uBlock Origin` addon on the browser if enabled. | ✔️ | | network_idle | Wait for the page until there are no network connections for at least 500 ms. | ✔️ | | timeout | The timeout in milliseconds that is used in all operations and waits through the page. The default is 30000. | ✔️ | | wait_selector | Wait for a specific css selector to be in a specific state. | ✔️ | | proxy | The proxy to be used with requests, it can be a string or a dictionary with the keys 'server', 'username', and 'password' only. | ✔️ | | os_randomize | If enabled, Scrapling will randomize the OS fingerprints used. The default is Scrapling matching the fingerprints with the current OS. | ✔️ | | wait_selector_state | The state to wait for the selector given with `wait_selector`. _Default state is `attached`._ | ✔️ |

This list isn't final so expect a lot more additions and flexibility to be added in the next versions!

PlayWrightFetcher

This class is built on top of Playwright which currently provides 4 main run options but they can be mixed as you want.

>> page = PlayWrightFetcher().fetch('https://www.google.com/search?q=%22Scrapling%22', disable_resources=True)  # Vanilla Playwright option
>> page.css_first("#search a::attr(href)")
'https://github.com/D4Vinci/Scrapling'
>> page = await PlayWrightFetcher().async_fetch('https://www.google.com/search?q=%22Scrapling%22', disable_resources=True) # the async version of fetch
>> page.css_first("#search a::attr(href)")
'https://github.com/D4Vinci/Scrapling'

Note: all requests done by this fetcher are waiting by default for all JS to be fully loaded and executed so you don't have to :)

Using this Fetcher class, you can make requests with: 1) Vanilla Playwright without any modifications other than the ones you chose. 2) Stealthy Playwright with the stealth mode I wrote for it. It's still a WIP but it bypasses many online tests like Sannysoft's. Some of the things this fetcher's stealth mode does include: * Patching the CDP runtime fingerprint. * Mimics some of the real browsers' properties by injecting several JS files and using custom options. * Using custom flags on launch to hide Playwright even more and make it faster. * Generates real browser's headers of the same type and same user OS then append it to the request's headers. 3) Real browsers by passing the real_chrome argument or the CDP URL of your browser to be controlled by the Fetcher and most of the options can be enabled on it. 4) NSTBrowser's docker browserless option by passing the CDP URL and enabling nstbrowser_mode option.

Hence using the real_chrome argument requires that you have Chrome browser installed on your device

Add that to a lot of controlling/hiding options as you will see in the arguments list below.

Expand this for the complete list of arguments | Argument | Description | Optional | |:-------------------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------:| | url | Target url | ❌ | | headless | Pass `True` to run the browser in headless/hidden (**default**), or `False` for headful/visible mode. | ✔️ | | disable_resources | Drop requests of unnecessary resources for a speed boost. It depends but it made requests ~25% faster in my tests for some websites.
Requests dropped are of type `font`, `image`, `media`, `beacon`, `object`, `imageset`, `texttrack`, `websocket`, `csp_report`, and `stylesheet`. _This can help save your proxy usage but be careful with this option as it makes some websites never finish loading._ | ✔️ | | useragent | Pass a useragent string to be used. **Otherwise the fetcher will generate a real Useragent of the same browser and use it.** | ✔️ | | network_idle | Wait for the page until there are no network connections for at least 500 ms. | ✔️ | | timeout | The timeout in milliseconds that is used in all operations and waits through the page. The default is 30000. | ✔️ | | page_action | Added for automation. A function that takes the `page` object, does the automation you need, then returns `page` again. | ✔️ | | wait_selector | Wait for a specific css selector to be in a specific state. | ✔️ | | wait_selector_state | The state to wait for the selector given with `wait_selector`. _Default state is `attached`._ | ✔️ | | google_search | Enabled by default, Scrapling will set the referer header to be as if this request came from a Google search for this website's domain name. | ✔️ | | extra_headers | A dictionary of extra headers to add to the request. The referer set by the `google_search` argument takes priority over the referer set here if used together. | ✔️ | | proxy | The proxy to be used with requests, it can be a string or a dictionary with the keys 'server', 'username', and 'password' only. | ✔️ | | hide_canvas | Add random noise to canvas operations to prevent fingerprinting. | ✔️ | | disable_webgl | Disables WebGL and WebGL 2.0 support entirely. | ✔️ | | stealth | Enables stealth mode, always check the documentation to see what stealth mode does currently. | ✔️ | | real_chrome | If you have Chrome browser installed on your device, enable this and the Fetcher will launch an instance of your browser and use it. | ✔️ | | locale | Set the locale for the browser if wanted. The default value is `en-US`. | ✔️ | | cdp_url | Instead of launching a new browser instance, connect to this CDP URL to control real browsers/NSTBrowser through CDP. | ✔️ | | nstbrowser_mode | Enables NSTBrowser mode, **it have to be used with `cdp_url` argument or it will get completely ignored.** | ✔️ | | nstbrowser_config | The config you want to send with requests to the NSTBrowser. _If left empty, Scrapling defaults to an optimized NSTBrowser's docker browserless config._ | ✔️ |

This list isn't final so expect a lot more additions and flexibility to be added in the next versions!

Advanced Parsing Features

Smart Navigation

>>> quote.tag
'div'

>>> quote.parent
<data='<div class="col-md-8"> <div class="quote...' parent='<div class="row"> <div class="col-md-8">...'>

>>> quote.parent.tag
'div'

>>> quote.children
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<span>by <small class="author" itemprop=...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<div class="tags"> Tags: <meta class="ke...' parent='<div class="quote" itemscope itemtype="h...'>]

>>> quote.siblings
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]

>>> quote.next # gets the next element, the same logic applies to `quote.previous`
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>

>>> quote.children.css_first(".author::text")
'Albert Einstein'

>>> quote.has_class('quote')
True

# Generate new selectors for any element
>>> quote.generate_css_selector
'body > div > div:nth-of-type(2) > div > div'

# Test these selectors on your favorite browser or reuse them again in the library's methods!
>>> quote.generate_xpath_selector
'//body/div/div[2]/div/div'

If your case needs more than the element's parent, you can iterate over the whole ancestors' tree of any element like below

for ancestor in quote.iterancestors():
# do something with it...

You can search for a specific ancestor of an element that satisfies a function, all you need to do is to pass a function that takes an Adaptor object as an argument and return True if the condition satisfies or False otherwise like below:

>>> quote.find_ancestor(lambda ancestor: ancestor.has_class('row'))
<data='<div class="row"> <div class="col-md-8">...' parent='<div class="container"> <div class="row...'>

Content-based Selection & Finding Similar Elements

You can select elements by their text content in multiple ways, here's a full example on another website:

>>> page = Fetcher().get('https://books.toscrape.com/index.html')

>>> page.find_by_text('Tipping the Velvet') # Find the first element whose text fully matches this text
<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>

>>> page.urljoin(page.find_by_text('Tipping the Velvet').attrib['href']) # We use `page.urljoin` to return the full URL from the relative `href`
'https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html'

>>> page.find_by_text('Tipping the Velvet', first_match=False) # Get all matches if there are more
[<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>]

>>> page.find_by_regex(r'£[\d\.]+') # Get the first element that its text content matches my price regex
<data='<p class="price_color">£51.77</p>' parent='<div class="product_price"> <p class="pr...'>

>>> page.find_by_regex(r'£[\d\.]+', first_match=False) # Get all elements that matches my price regex
[<data='<p class="price_color">£51.77</p>' parent='<div class="product_price"> <p class="pr...'>,
<data='<p class="price_color">£53.74</p>' parent='<div class="product_price"> <p class="pr...'>,
<data='<p class="price_color">£50.10</p>' parent='<div class="product_price"> <p class="pr...'>,
<data='<p class="price_color">£47.82</p>' parent='<div class="product_price"> <p class="pr...'>,
...]

Find all elements that are similar to the current element in location and attributes

# For this case, ignore the 'title' attribute while matching
>>> page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title'])
[<data='<a href="catalogue/a-light-in-the-attic_...' parent='<h3><a href="catalogue/a-light-in-the-at...'>,
<data='<a href="catalogue/soumission_998/index....' parent='<h3><a href="catalogue/soumission_998/in...'>,
<data='<a href="catalogue/sharp-objects_997/ind...' parent='<h3><a href="catalogue/sharp-objects_997...'>,
...]

# You will notice that the number of elements is 19 not 20 because the current element is not included.
>>> len(page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title']))
19

# Get the `href` attribute from all similar elements
>>> [element.attrib['href'] for element in page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title'])]
['catalogue/a-light-in-the-attic_1000/index.html',
'catalogue/soumission_998/index.html',
'catalogue/sharp-objects_997/index.html',
...]

To increase the complexity a little bit, let's say we want to get all books' data using that element as a starting point for some reason

>>> for product in page.find_by_text('Tipping the Velvet').parent.parent.find_similar():
print({
"name": product.css_first('h3 a::text'),
"price": product.css_first('.price_color').re_first(r'[\d\.]+'),
"stock": product.css('.availability::text')[-1].clean()
})
{'name': 'A Light in the ...', 'price': '51.77', 'stock': 'In stock'}
{'name': 'Soumission', 'price': '50.10', 'stock': 'In stock'}
{'name': 'Sharp Objects', 'price': '47.82', 'stock': 'In stock'}
...

The documentation will provide more advanced examples.

Handling Structural Changes

Let's say you are scraping a page with a structure like this:

<div class="container">
<section class="products">
<article class="product" id="p1">
<h3>Product 1</h3>
<p class="description">Description 1</p>
</article>
<article class="product" id="p2">
<h3>Product 2</h3>
<p class="description">Description 2</p>
</article>
</section>
</div>

And you want to scrape the first product, the one with the p1 ID. You will probably write a selector like this

page.css('#p1')

When website owners implement structural changes like

<div class="new-container">
<div class="product-wrapper">
<section class="products">
<article class="product new-class" data-id="p1">
<div class="product-info">
<h3>Product 1</h3>
<p class="new-description">Description 1</p>
</div>
</article>
<article class="product new-class" data-id="p2">
<div class="product-info">
<h3>Product 2</h3>
<p class="new-description">Description 2</p>
</div>
</article>
</section>
</div>
</div>

The selector will no longer function and your code needs maintenance. That's where Scrapling's auto-matching feature comes into play.

from scrapling.parser import Adaptor
# Before the change
page = Adaptor(page_source, url='example.com')
element = page.css('#p1' auto_save=True)
if not element: # One day website changes?
element = page.css('#p1', auto_match=True) # Scrapling still finds it!
# the rest of the code...

How does the auto-matching work? Check the FAQs section for that and other possible issues while auto-matching.

Real-World Scenario

Let's use a real website as an example and use one of the fetchers to fetch its source. To do this we need to find a website that will change its design/structure soon, take a copy of its source then wait for the website to make the change. Of course, that's nearly impossible to know unless I know the website's owner but that will make it a staged test haha.

To solve this issue, I will use The Web Archive's Wayback Machine. Here is a copy of StackOverFlow's website in 2010, pretty old huh?Let's test if the automatch feature can extract the same button in the old design from 2010 and the current design using the same selector :)

If I want to extract the Questions button from the old design I can use a selector like this #hmenus > div:nth-child(1) > ul > li:nth-child(1) > a This selector is too specific because it was generated by Google Chrome. Now let's test the same selector in both versions

>> from scrapling.fetchers import Fetcher
>> selector = '#hmenus > div:nth-child(1) > ul > li:nth-child(1) > a'
>> old_url = "https://web.archive.org/web/20100102003420/http://stackoverflow.com/"
>> new_url = "https://stackoverflow.com/"
>>
>> page = Fetcher(automatch_domain='stackoverflow.com').get(old_url, timeout=30)
>> element1 = page.css_first(selector, auto_save=True)
>>
>> # Same selector but used in the updated website
>> page = Fetcher(automatch_domain="stackoverflow.com").get(new_url)
>> element2 = page.css_first(selector, auto_match=True)
>>
>> if element1.text == element2.text:
... print('Scrapling found the same element in the old design and the new design!')
'Scrapling found the same element in the old design and the new design!'

Note that I used a new argument called automatch_domain, this is because for Scrapling these are two different URLs, not the website so it isolates their data. To tell Scrapling they are the same website, we then pass the domain we want to use for saving auto-match data for them both so Scrapling doesn't isolate them.

In a real-world scenario, the code will be the same except it will use the same URL for both requests so you won't need to use the automatch_domain argument. This is the closest example I can give to real-world cases so I hope it didn't confuse you :)

Notes: 1. For the two examples above I used one time the Adaptor class and the second time the Fetcher class just to show you that you can create the Adaptor object by yourself if you have the source or fetch the source using any Fetcher class then it will create the Adaptor object for you. 2. Passing the auto_save argument with the auto_match argument set to False while initializing the Adaptor/Fetcher object will only result in ignoring the auto_save argument value and the following warning message text Argument `auto_save` will be ignored because `auto_match` wasn't enabled on initialization. Check docs for more info. This behavior is purely for performance reasons so the database gets created/connected only when you are planning to use the auto-matching features. Same case with the auto_match argument.

  1. The auto_match parameter works only for Adaptor instances not Adaptors so if you do something like this you will get an error python page.css('body').css('#p1', auto_match=True) because you can't auto-match a whole list, you have to be specific and do something like python page.css_first('body').css('#p1', auto_match=True)

Find elements by filters

Inspired by BeautifulSoup's find_all function you can find elements by using find_all/find methods. Both methods can take multiple types of filters and return all elements in the pages that all these filters apply to.

  • To be more specific:
  • Any string passed is considered a tag name
  • Any iterable passed like List/Tuple/Set is considered an iterable of tag names.
  • Any dictionary is considered a mapping of HTML element(s) attribute names and attribute values.
  • Any regex patterns passed are used as filters to elements by their text content
  • Any functions passed are used as filters
  • Any keyword argument passed is considered as an HTML element attribute with its value.

So the way it works is after collecting all passed arguments and keywords, each filter passes its results to the following filter in a waterfall-like filtering system.
It filters all elements in the current page/element in the following order:

  1. All elements with the passed tag name(s).
  2. All elements that match all passed attribute(s).
  3. All elements that its text content match all passed regex patterns.
  4. All elements that fulfill all passed function(s).

Note: The filtering process always starts from the first filter it finds in the filtering order above so if no tag name(s) are passed but attributes are passed, the process starts from that layer and so on. But the order in which you pass the arguments doesn't matter.

Examples to clear any confusion :)

>> from scrapling.fetchers import Fetcher
>> page = Fetcher().get('https://quotes.toscrape.com/')
# Find all elements with tag name `div`.
>> page.find_all('div')
[<data='<div class="container"> <div class="row...' parent='<body> <div class="container"> <div clas...'>,
<data='<div class="row header-box"> <div class=...' parent='<div class="container"> <div class="row...'>,
...]

# Find all div elements with a class that equals `quote`.
>> page.find_all('div', class_='quote')
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]

# Same as above.
>> page.find_all('div', {'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]

# Find all elements with a class that equals `quote`.
>> page.find_all({'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]

# Find all div elements with a class that equals `quote`, and contains the element `.text` which contains the word 'world' in its content.
>> page.find_all('div', {'class': 'quote'}, lambda e: "world" in e.css_first('.text::text'))
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>]

# Find all elements that don't have children.
>> page.find_all(lambda element: len(element.children) > 0)
[<data='<html lang="en"><head><meta charset="UTF...'>,
<data='<head><meta charset="UTF-8"><title>Quote...' parent='<html lang="en"><head><meta charset="UTF...'>,
<data='<body> <div class="container"> <div clas...' parent='<html lang="en"><head><meta charset="UTF...'>,
...]

# Find all elements that contain the word 'world' in its content.
>> page.find_all(lambda element: "world" in element.text)
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<a class="tag" href="/tag/world/page/1/"...' parent='<div class="tags"> Tags: <meta class="ke...'>]

# Find all span elements that match the given regex
>> page.find_all('span', re.compile(r'world'))
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>]

# Find all div and span elements with class 'quote' (No span elements like that so only div returned)
>> page.find_all(['div', 'span'], {'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]

# Mix things up
>> page.find_all({'itemtype':"http://schema.org/CreativeWork"}, 'div').css('.author::text')
['Albert Einstein',
'J.K. Rowling',
...]

Is That All?

Here's what else you can do with Scrapling:

  • Accessing the lxml.etree object itself of any element directly python >>> quote._root <Element div at 0x107f98870>
  • Saving and retrieving elements manually to auto-match them outside the css and the xpath methods but you have to set the identifier by yourself.

  • To save an element to the database: python >>> element = page.find_by_text('Tipping the Velvet', first_match=True) >>> page.save(element, 'my_special_element')

  • Now later when you want to retrieve it and relocate it inside the page with auto-matching, it would be like this python >>> element_dict = page.retrieve('my_special_element') >>> page.relocate(element_dict, adaptor_type=True) [<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>] >>> page.relocate(element_dict, adaptor_type=True).css('::text') ['Tipping the Velvet']
  • if you want to keep it as lxml.etree object, leave the adaptor_type argument python >>> page.relocate(element_dict) [<Element a at 0x105a2a7b0>]

  • Filtering results based on a function

# Find all products over $50
expensive_products = page.css('.product_pod').filter(
lambda p: float(p.css('.price_color').re_first(r'[\d\.]+')) > 50
)
  • Searching results for the first one that matches a function
# Find all the products with price '53.23'
page.css('.product_pod').search(
lambda p: float(p.css('.price_color').re_first(r'[\d\.]+')) == 54.23
)
  • Doing operations on element content is the same as scrapy python quote.re(r'regex_pattern') # Get all strings (TextHandlers) that match the regex pattern quote.re_first(r'regex_pattern') # Get the first string (TextHandler) only quote.json() # If the content text is jsonable, then convert it to json using `orjson` which is 10x faster than the standard json library and provides more options except that you can do more with them like python quote.re( r'regex_pattern', replace_entities=True, # Character entity references are replaced by their corresponding character clean_match=True, # This will ignore all whitespaces and consecutive spaces while matching case_sensitive= False, # Set the regex to ignore letters case while compiling it ) Hence all of these methods are methods from the TextHandler within that contains the text content so the same can be done directly if you call the .text property or equivalent selector function.

  • Doing operations on the text content itself includes

  • Cleaning the text from any white spaces and replacing consecutive spaces with single space python quote.clean()
  • You already know about the regex matching and the fast json parsing but did you know that all strings returned from the regex search are actually TextHandler objects too? so in cases where you have for example a JS object assigned to a JS variable inside JS code and want to extract it with regex and then convert it to json object, in other libraries, these would be more than 1 line of code but here you can do it in 1 line like this python page.xpath('//script/text()').re_first(r'var dataLayer = (.+);').json()
  • Sort all characters in the string as if it were a list and return the new string python quote.sort(reverse=False)

    To be clear, TextHandler is a sub-class of Python's str so all normal operations/methods that work with Python strings will work with it.

  • Any element's attributes are not exactly a dictionary but a sub-class of mapping called AttributesHandler that's read-only so it's faster and string values returned are actually TextHandler objects so all operations above can be done on them, standard dictionary operations that don't modify the data, and more :)

  • Unlike standard dictionaries, here you can search by values too and can do partial searches. It might be handy in some cases (returns a generator of matches) python >>> for item in element.attrib.search_values('catalogue', partial=True): print(item) {'href': 'catalogue/tipping-the-velvet_999/index.html'}
  • Serialize the current attributes to JSON bytes: python >>> element.attrib.json_string b'{"href":"catalogue/tipping-the-velvet_999/index.html","title":"Tipping the Velvet"}'
  • Converting it to a normal dictionary python >>> dict(element.attrib) {'href': 'catalogue/tipping-the-velvet_999/index.html', 'title': 'Tipping the Velvet'}

Scrapling is under active development so expect many more features coming soon :)

More Advanced Usage

There are a lot of deep details skipped here to make this as short as possible so to take a deep dive, head to the docs section. I will try to keep it updated as possible and add complex examples. There I will explain points like how to write your storage system, write spiders that don't depend on selectors at all, and more...

Note that implementing your storage system can be complex as there are some strict rules such as inheriting from the same abstract class, following the singleton design pattern used in other classes, and more. So make sure to read the docs first.

[!IMPORTANT] A website is needed to provide detailed library documentation.
I'm trying to rush creating the website, researching new ideas, and adding more features/tests/benchmarks but time is tight with too many spinning plates between work, personal life, and working on Scrapling. I have been working on Scrapling for months for free after all.

If you like Scrapling and want it to keep improving then this is a friendly reminder that you can help by supporting me through the sponsor button.

⚡ Enlightening Questions and FAQs

This section addresses common questions about Scrapling, please read this section before opening an issue.

How does auto-matching work?

  1. You need to get a working selector and run it at least once with methods css or xpath with the auto_save parameter set to True before structural changes happen.
  2. Before returning results for you, Scrapling uses its configured database and saves unique properties about that element.
  3. Now because everything about the element can be changed or removed, nothing from the element can be used as a unique identifier for the database. To solve this issue, I made the storage system rely on two things:

    1. The domain of the URL you gave while initializing the first Adaptor object
    2. The identifier parameter you passed to the method while selecting. If you didn't pass one, then the selector string itself will be used as an identifier but remember you will have to use it as an identifier value later when the structure changes and you want to pass the new selector.

    Together both are used to retrieve the element's unique properties from the database later. 4. Now later when you enable the auto_match parameter for both the Adaptor instance and the method call. The element properties are retrieved and Scrapling loops over all elements in the page and compares each one's unique properties to the unique properties we already have for this element and a score is calculated for each one. 5. Comparing elements is not exact but more about finding how similar these values are, so everything is taken into consideration, even the values' order, like the order in which the element class names were written before and the order in which the same element class names are written now. 6. The score for each element is stored in the table, and the element(s) with the highest combined similarity scores are returned.

How does the auto-matching work if I didn't pass a URL while initializing the Adaptor object?

Not a big problem as it depends on your usage. The word default will be used in place of the URL field while saving the element's unique properties. So this will only be an issue if you used the same identifier later for a different website that you didn't pass the URL parameter while initializing it as well. The save process will overwrite the previous data and auto-matching uses the latest saved properties only.

If all things about an element can change or get removed, what are the unique properties to be saved?

For each element, Scrapling will extract: - Element tag name, text, attributes (names and values), siblings (tag names only), and path (tag names only). - Element's parent tag name, attributes (names and values), and text.

I have enabled the auto_save/auto_match parameter while selecting and it got completely ignored with a warning message

That's because passing the auto_save/auto_match argument without setting auto_match to True while initializing the Adaptor object will only result in ignoring the auto_save/auto_match argument value. This behavior is purely for performance reasons so the database gets created only when you are planning to use the auto-matching features.

I have done everything as the docs but the auto-matching didn't return anything, what's wrong?

It could be one of these reasons: 1. No data were saved/stored for this element before. 2. The selector passed is not the one used while storing element data. The solution is simple - Pass the old selector again as an identifier to the method called. - Retrieve the element with the retrieve method using the old selector as identifier then save it again with the save method and the new selector as identifier. - Start using the identifier argument more often if you are planning to use every new selector from now on. 3. The website had some extreme structural changes like a new full design. If this happens a lot with this website, the solution would be to make your code as selector-free as possible using Scrapling features.

Can Scrapling replace code built on top of BeautifulSoup4?

Pretty much yeah, almost all features you get from BeautifulSoup can be found or achieved in Scrapling one way or another. In fact, if you see there's a feature in bs4 that is missing in Scrapling, please make a feature request from the issues tab to let me know.

Can Scrapling replace code built on top of AutoScraper?

Of course, you can find elements by text/regex, find similar elements in a more reliable way than AutoScraper, and finally save/retrieve elements manually to use later as the model feature in AutoScraper. I have pulled all top articles about AutoScraper from Google and tested Scrapling against examples in them. In all examples, Scrapling got the same results as AutoScraper in much less time.

Is Scrapling thread-safe?

Yes, Scrapling instances are thread-safe. Each Adaptor instance maintains its state.

More Sponsors!

Contributing

Everybody is invited and welcome to contribute to Scrapling. There is a lot to do!

Please read the contributing file before doing anything.

Disclaimer for Scrapling Project

[!CAUTION] This library is provided for educational and research purposes only. By using this library, you agree to comply with local and international laws regarding data scraping and privacy. The authors and contributors are not responsible for any misuse of this software. This library should not be used to violate the rights of others, for unethical purposes, or to use data in an unauthorized or illegal manner. Do not use it on any website unless you have permission from the website owner or within their allowed rules like the robots.txt file, for example.

License

This work is licensed under BSD-3

Acknowledgments

This project includes code adapted from: - Parsel (BSD License) - Used for translator submodule

Thanks and References

Known Issues

  • In the auto-matching save process, the unique properties of the first element from the selection results are the only ones that get saved. So if the selector you are using selects different elements on the page that are in different locations, auto-matching will probably return to you the first element only when you relocate it later. This doesn't include combined CSS selectors (Using commas to combine more than one selector for example) as these selectors get separated and each selector gets executed alone.

Designed & crafted with ❤️ by Karim Shoair.



PEGASUS-NEO - A Comprehensive Penetration Testing Framework Designed For Security Professionals And Ethical Hackers. It Combines Multiple Security Tools And Custom Modules For Reconnaissance, Exploitation, Wireless Attacks, Web Hacking, And More

By: Unknown


                              ____                                  _   _ 
| _ \ ___ __ _ __ _ ___ _ _ ___| \ | |
| |_) / _ \/ _` |/ _` / __| | | / __| \| |
| __/ __/ (_| | (_| \__ \ |_| \__ \ |\ |
|_| \___|\__, |\__,_|___/\__,_|___/_| \_|
|___/
███▄ █ ▓█████ ▒█████
██ ▀█ █ ▓█ ▀ ▒██▒ ██▒
▓██ ▀█ ██▒▒███ ▒██░ ██▒
▓██▒ ▐▌██▒▒▓█ ▄ ▒██ ██░
▒██░ ▓██░░▒████▒░ ████▓▒░
░ ▒░ ▒ ▒ ░░ ▒░ ░░ ▒░▒░▒░
░ ░░ ░ ▒░ ░ ░ ░ ░ ▒ ▒░
░ ░ ░ ░ ░ ░ ░ ▒
░ ░ ░ ░ ░

PEGASUS-NEO Penetration Testing Framework

 

🛡️ Description

PEGASUS-NEO is a comprehensive penetration testing framework designed for security professionals and ethical hackers. It combines multiple security tools and custom modules for reconnaissance, exploitation, wireless attacks, web hacking, and more.

⚠️ Legal Disclaimer

This tool is provided for educational and ethical testing purposes only. Usage of PEGASUS-NEO for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to obey all applicable local, state, and federal laws.

Developers assume no liability and are not responsible for any misuse or damage caused by this program.

🔒 Copyright Notice

PEGASUS-NEO - Advanced Penetration Testing Framework
Copyright (C) 2024 Letda Kes dr. Sobri. All rights reserved.

This software is proprietary and confidential. Unauthorized copying, transfer, or
reproduction of this software, via any medium is strictly prohibited.

Written by Letda Kes dr. Sobri <muhammadsobrimaulana31@gmail.com>, January 2024

🌟 Features

Password: Sobri

  • Reconnaissance & OSINT
  • Network scanning
  • Email harvesting
  • Domain enumeration
  • Social media tracking

  • Exploitation & Pentesting

  • Automated exploitation
  • Password attacks
  • SQL injection
  • Custom payload generation

  • Wireless Attacks

  • WiFi cracking
  • Evil twin attacks
  • WPS exploitation

  • Web Attacks

  • Directory scanning
  • XSS detection
  • SQL injection
  • CMS scanning

  • Social Engineering

  • Phishing templates
  • Email spoofing
  • Credential harvesting

  • Tracking & Analysis

  • IP geolocation
  • Phone number tracking
  • Email analysis
  • Social media hunting

🔧 Installation

# Clone the repository
git clone https://github.com/sobri3195/pegasus-neo.git

# Change directory
cd pegasus-neo

# Install dependencies
sudo python3 -m pip install -r requirements.txt

# Run the tool
sudo python3 pegasus_neo.py

📋 Requirements

  • Python 3.8+
  • Linux Operating System (Kali/Ubuntu recommended)
  • Root privileges
  • Internet connection

🚀 Usage

  1. Start the tool:
sudo python3 pegasus_neo.py
  1. Enter authentication password
  2. Select category from main menu
  3. Choose specific tool or module
  4. Follow on-screen instructions

🔐 Security Features

  • Source code protection
  • Integrity checking
  • Anti-tampering mechanisms
  • Encrypted storage
  • Authentication system

🛠️ Supported Tools

Reconnaissance & OSINT

  • Nmap
  • Wireshark
  • Maltego
  • Shodan
  • theHarvester
  • Recon-ng
  • SpiderFoot
  • FOCA
  • Metagoofil

Exploitation & Pentesting

  • Metasploit
  • SQLmap
  • Commix
  • BeEF
  • SET
  • Hydra
  • John the Ripper
  • Hashcat

Wireless Hacking

  • Aircrack-ng
  • Kismet
  • WiFite
  • Fern Wifi Cracker
  • Reaver
  • Wifiphisher
  • Cowpatty
  • Fluxion

Web Hacking

  • Burp Suite
  • OWASP ZAP
  • Nikto
  • XSStrike
  • Wapiti
  • Sublist3r
  • DirBuster
  • WPScan

📝 Version History

  • v1.0.0 (2024-01) - Initial release
  • v1.1.0 (2024-02) - Added tracking modules
  • v1.2.0 (2024-03) - Added tool installer

👥 Contributing

This is a proprietary project and contributions are not accepted at this time.

🤝 Support

For support, please email muhammadsobrimaulana31@gmail.com atau https://lynk.id/muhsobrimaulana

⚖️ License

This project is protected under proprietary license. See the LICENSE file for details.

Made with ❤️ by Letda Kes dr. Sobri



Cloud Data Security Play Sentra Raises $50 Million Series B 

Sentra has now raised north of $100 million for controls technology to keep sensitive data out of misconfigured AI workflows.

The post Cloud Data Security Play Sentra Raises $50 Million Series B  appeared first on SecurityWeek.

Pillar Security Banks $9M for AI Security Guardrails

Shield Capital leads a $9 million seed-stage funding round for Israeli startup building technologies for AI security and privacy guardrails.

The post Pillar Security Banks $9M for AI Security Guardrails appeared first on SecurityWeek.

Insurance Firm Lemonade Says API Glitch Exposed Some Driver’s License Numbers

Lemonade says the incident is not material and that its operations were not compromised, nor was its customer data targeted.

The post Insurance Firm Lemonade Says API Glitch Exposed Some Driver’s License Numbers appeared first on SecurityWeek.

Trump Revenge Tour Targets Cyber Leaders, Elections

President Trump last week revoked security clearances for Chris Krebs, the former director of the Cybersecurity and Infrastructure Security Agency (CISA) who was fired by Trump after declaring the 2020 election the most secure in U.S. history. The White House memo, which also suspended clearances for other security professionals at Krebs’s employer SentinelOne, comes as CISA is facing huge funding and staffing cuts.

Chris Krebs. Image: Getty Images.

The extraordinary April 9 memo directs the attorney general to investigate Chris Krebs (no relation), calling him “a significant bad-faith actor who weaponized and abused his government authority.”

The memo said the inquiry will include “a comprehensive evaluation of all of CISA’s activities over the last 6 years and will identify any instances where Krebs’ or CISA’s conduct appears to be contrary to the administration’s commitment to free speech and ending federal censorship, including whether Krebs’ conduct was contrary to suitability standards for federal employees or involved the unauthorized dissemination of classified information.”

CISA was created in 2018 during Trump’s first term, with Krebs installed as its first director. In 2020, CISA launched Rumor Control, a website that sought to rebut disinformation swirling around the 2020 election.

That effort ran directly counter to Trump’s claims that he lost the election because it was somehow hacked and stolen. The Trump campaign and its supporters filed at least 62 lawsuits contesting the election, vote counting, and vote certification in nine states, and nearly all of those cases were dismissed or dropped for lack of evidence or standing.

When the Justice Department began prosecuting people who violently attacked the U.S. Capitol on January 6, 2021, President Trump and Republican leaders shifted the narrative, claiming that Trump lost the election because the previous administration had censored conservative voices on social media.

Incredibly, the president’s memo seeking to ostracize Krebs stands reality on its head, accusing Krebs of promoting the censorship of election information, “including known risks associated with certain voting practices.” Trump also alleged that Krebs “falsely and baselessly denied that the 2020 election was rigged and stolen, including by inappropriately and categorically dismissing widespread election malfeasance and serious vulnerabilities with voting machines” [emphasis added].

Krebs did not respond to a request for comment. SentinelOne issued a statement saying it would cooperate in any review of security clearances held by its personnel, which is currently fewer than 10 employees.

Krebs’s former agency is now facing steep budget and staff reductions. The Record reports that CISA is looking to remove some 1,300 people by cutting about half its full-time staff and another 40% of its contractors.

“The agency’s National Risk Management Center, which serves as a hub analyzing risks to cyber and critical infrastructure, is expected to see significant cuts, said two sources familiar with the plans,” The Record’s Suzanne Smalley wrote. “Some of the office’s systematic risk responsibilities will potentially be moved to the agency’s Cybersecurity Division, according to one of the sources.”

CNN reports the Trump administration is also advancing plans to strip civil service protections from 80% of the remaining CISA employees, potentially allowing them to be fired for political reasons.

The Electronic Frontier Foundation (EFF) urged professionals in the cybersecurity community to defend Krebs and SentinelOne, noting that other security companies and professionals could be the next victims of Trump’s efforts to politicize cybersecurity.

“The White House must not be given free reign to turn cybersecurity professionals into political scapegoats,” the EFF wrote. “It is critical that the cybersecurity community now join together to denounce this chilling attack on free speech and rally behind Krebs and SentinelOne rather than cowering because they fear they will be next.”

However, Reuters said it found little sign of industry support for Krebs or SentinelOne, and that many security professionals are concerned about potentially being targeted if they speak out.

“Reuters contacted 33 of the largest U.S. cybersecurity companies, including tech companies and professional services firms with large cybersecurity practices, and three industry groups, for comment on Trump’s action against SentinelOne,” wrote Raphael Satter and A.J. Vicens. “Only one offered comment on Trump’s action. The rest declined, did not respond or did not answer questions.”

CYBERCOM-PLICATIONS

On April 3, President Trump fired Gen. Timothy Haugh, the head of the National Security Agency (NSA) and the U.S. Cyber Command, as well as Haugh’s deputy, Wendy Noble. The president did so immediately after meeting in the Oval Office with far-right conspiracy theorist Laura Loomer, who reportedly urged their dismissal. Speaking to reporters on Air Force One after news of the firings broke, Trump questioned Haugh’s loyalty.

Gen. Timothy Haugh. Image: C-SPAN.

Virginia Senator Mark Warner, the top Democrat on the Senate Intelligence Committee, called it inexplicable that the administration would remove the senior leaders of NSA-CYBERCOM without cause or warning, and risk disrupting critical ongoing intelligence operations.

“It is astonishing, too, that President Trump would fire the nonpartisan, experienced leader of the National Security Agency while still failing to hold any member of his team accountable for leaking classified information on a commercial messaging app – even as he apparently takes staffing direction on national security from a discredited conspiracy theorist in the Oval Office,” Warner said in a statement.

On Feb. 28, The Record’s Martin Matishak cited three sources saying Defense Secretary Pete Hegseth ordered U.S. Cyber Command to stand down from all planning against Russia, including offensive digital actions. The following day, The Guardian reported that analysts at CISA were verbally informed that they were not to follow or report on Russian threats, even though this had previously been a main focus for the agency.

A follow-up story from The Washington Post cited officials saying Cyber Command had received an order to halt active operations against Russia, but that the pause was intended to last only as long as negotiations with Russia continue.

The Department of Defense responded on Twitter/X that Hegseth had “neither canceled nor delayed any cyber operations directed against malicious Russian targets and there has been no stand-down order whatsoever from that priority.”

But on March 19, Reuters reported several U.S. national security agencies have halted work on a coordinated effort to counter Russian sabotage, disinformation and cyberattacks.

“Regular meetings between the National Security Council and European national security officials have gone unscheduled, and the NSC has also stopped formally coordinating efforts across U.S. agencies, including with the FBI, the Department of Homeland Security and the State Department,” Reuters reported, citing current and former officials.

TARIFFS VS TYPHOONS

President’s Trump’s institution of 125% tariffs on goods from China has seen Beijing strike back with 84 percent tariffs on U.S. imports. Now, some security experts are warning that the trade war could spill over into a cyber conflict, given China’s successful efforts to burrow into America’s critical infrastructure networks.

Over the past year, a number of Chinese government-backed digital intrusions have come into focus, including a sprawling espionage campaign involving the compromise of at least nine U.S. telecommunications providers. Dubbed “Salt Typhoon” by Microsoft, these telecom intrusions were pervasive enough that CISA and the FBI in December 2024 warned Americans against communicating sensitive information over phone networks, urging people instead to use encrypted messaging apps (like Signal).

The other broad ranging China-backed campaign is known as “Volt Typhoon,” which CISA described as “state-sponsored cyber actors seeking to pre-position themselves on IT networks for disruptive or destructive cyberattacks against U.S. critical infrastructure in the event of a major crisis or conflict with the United States.”

Responsibility for determining the root causes of the Salt Typhoon security debacle fell to the Cyber Safety Review Board (CSRB), a nonpartisan government entity established in February 2022 with a mandate to investigate the security failures behind major cybersecurity events. But on his first full day back in the White House, President Trump dismissed all 15 CSRB advisory committee members — likely because those advisers included Chris Krebs.

Last week, Sen. Ron Wyden (D-Ore.) placed a hold on Trump’s nominee to lead CISA, saying the hold would continue unless the agency published a report on the telecom industry hacks, as promised.

“CISA’s multi-year cover up of the phone companies’ negligent cybersecurity has real consequences,” Wyden said in a statement. “Congress and the American people have a right to read this report.”

The Wall Street Journal reported last week Chinese officials acknowledged in a secret December meeting that Beijing was behind the widespread telecom industry compromises.

“The Chinese official’s remarks at the December meeting were indirect and somewhat ambiguous, but most of the American delegation in the room interpreted it as a tacit admission and a warning to the U.S. about Taiwan,” The Journal’s Dustin Volz wrote, citing a former U.S. official familiar with the meeting.

Meanwhile, China continues to take advantage of the mass firings of federal workers. On April 9, the National Counterintelligence and Security Center warned (PDF) that Chinese intelligence entities are pursuing an online effort to recruit recently laid-off U.S. employees.

“Foreign intelligence entities, particularly those in China, are targeting current and former U.S. government (USG) employees for recruitment by posing as consulting firms, corporate headhunters, think tanks, and other entities on social and professional networking sites,” the alert warns. “Their deceptive online job offers, and other virtual approaches, have become more sophisticated in targeting unwitting individuals with USG backgrounds seeking new employment.”

Image: Dni.gov

ELECTION THREATS

As Reuters notes, the FBI last month ended an effort to counter interference in U.S. elections by foreign adversaries including Russia, and put on leave staff working on the issue at the Department of Homeland Security.

Meanwhile, the U.S. Senate is now considering a House-passed bill dubbed the “Safeguard American Voter Eligibility (SAVE) Act,” which would order states to obtain proof of citizenship, such as a passport or a birth certificate, in person from those seeking to register to vote.

Critics say the SAVE Act could disenfranchise millions of voters and discourage eligible voters from registering to vote. What’s more, documented cases of voter fraud are few and far between, as is voting by non-citizens. Even the conservative Heritage Foundation acknowledges as much: An interactive “election fraud map” published by Heritage lists just 1,576 convictions or findings of voter fraud between 1982 and the present day.

Nevertheless, the GOP-led House passed the SAVE Act with the help of four Democrats. Its passage in the Senate will require support from at least seven Democrats, Newsweek writes.

In February, CISA cut roughly 130 employees, including its election security advisors. The agency also was forced to freeze all election security activities pending an internal review. The review was reportedly completed in March, but the Trump administration has said the findings would not be made public, and there is no indication of whether any cybersecurity support has been restored.

Many state leaders have voiced anxiety over the administration’s cuts to CISA programs that provide assistance and threat intelligence to election security efforts. Iowa Secretary of State Paul Pate last week told the PBS show Iowa Press he would not want to see those programs dissolve.

“If those (systems) were to go away, it would be pretty serious,” Pate said. “We do count on a lot those cyber protections.”

Pennsylvania’s Secretary of the Commonwealth Al Schmidt recently warned the CISA election security cuts would make elections less secure, and said no state on its own can replace federal election cybersecurity resources.

The Pennsylvania Capital-Star reports that several local election offices received bomb threats around the time polls closed on Nov. 5, and that in the week before the election a fake video showing mail-in ballots cast for Trump and Sen. Dave McCormick (R-Pa.) being destroyed and thrown away was linked to a Russian disinformation campaign.

“CISA was able to quickly identify not only that it was fraudulent, but also the source of it, so that we could share with our counties and we could share with the public so confidence in the election wasn’t undermined,” Schmidt said.

According to CNN, the administration’s actions have deeply alarmed state officials, who warn the next round of national elections will be seriously imperiled by the cuts. A bipartisan association representing 46 secretaries of state, and several individual top state election officials, have pressed the White House about how critical functions of protecting election security will perform going forward. However, CNN reports they have yet to receive clear answers.

Nevada and 18 other states are suing Trump over an executive order he issued on March 25 that asserts the executive branch has broad authority over state election procedures.

“None of the president’s powers allow him to change the rules of elections,” Nevada Secretary of State Cisco Aguilar wrote in an April 11 op-ed. “That is an intentional feature of our Constitution, which the Framers built in to ensure election integrity. Despite that, Trump is seeking to upend the voter registration process; impose arbitrary deadlines on vote counting; allow an unelected and unaccountable billionaire to invade state voter rolls; and withhold congressionally approved funding for election security.”

The order instructs the U.S. Election Assistance Commission to abruptly amend the voluntary federal guidelines for voting machines without going through the processes mandated by federal law. And it calls for allowing the administrator of the so-called Department of Government Efficiency (DOGE), along with DHS, to review state voter registration lists and other records to identify non-citizens.

The Atlantic’s Paul Rosenzweig notes that the chief executive of the country — whose unilateral authority the Founding Fathers most feared — has literally no role in the federal election system.

“Trump’s executive order on elections ignores that design entirely,” Rosenzweig wrote. “He is asserting an executive-branch role in governing the mechanics of a federal election that has never before been claimed by a president. The legal theory undergirding this assertion — that the president’s authority to enforce federal law enables him to control state election activity — is as capacious as it is frightening.”

Telegram-Scraper - A Powerful Python Script That Allows You To Scrape Messages And Media From Telegram Channels Using The Telethon Library

By: Unknown


A powerful Python script that allows you to scrape messages and media from Telegram channels using the Telethon library. Features include real-time continuous scraping, media downloading, and data export capabilities.

___________________  _________
\__ ___/ _____/ / _____/
| | / \ ___ \_____ \
| | \ \_\ \/ \
|____| \______ /_______ /
\/ \/

Features 🚀

  • Scrape messages from multiple Telegram channels
  • Download media files (photos, documents)
  • Real-time continuous scraping
  • Export data to JSON and CSV formats
  • SQLite database storage
  • Resume capability (saves progress)
  • Media reprocessing for failed downloads
  • Progress tracking
  • Interactive menu interface

Prerequisites 📋

Before running the script, you'll need:

  • Python 3.7 or higher
  • Telegram account
  • API credentials from Telegram

Required Python packages

pip install -r requirements.txt

Contents of requirements.txt:

telethon
aiohttp
asyncio

Getting Telegram API Credentials 🔑

  1. Visit https://my.telegram.org/auth
  2. Log in with your phone number
  3. Click on "API development tools"
  4. Fill in the form:
  5. App title: Your app name
  6. Short name: Your app short name
  7. Platform: Can be left as "Desktop"
  8. Description: Brief description of your app
  9. Click "Create application"
  10. You'll receive:
  11. api_id: A number
  12. api_hash: A string of letters and numbers

Keep these credentials safe, you'll need them to run the script!

Setup and Running 🔧

  1. Clone the repository:
git clone https://github.com/unnohwn/telegram-scraper.git
cd telegram-scraper
  1. Install requirements:
pip install -r requirements.txt
  1. Run the script:
python telegram-scraper.py
  1. On first run, you'll be prompted to enter:
  2. Your API ID
  3. Your API Hash
  4. Your phone number (with country code)
  5. Your phone number (with country code) or bot, but use the phone number option when prompted second time.
  6. Verification code (sent to your Telegram)

Initial Scraping Behavior 🕒

When scraping a channel for the first time, please note:

  • The script will attempt to retrieve the entire channel history, starting from the oldest messages
  • Initial scraping can take several minutes or even hours, depending on:
  • The total number of messages in the channel
  • Whether media downloading is enabled
  • The size and number of media files
  • Your internet connection speed
  • Telegram's rate limiting
  • The script uses pagination and maintains state, so if interrupted, it can resume from where it left off
  • Progress percentage is displayed in real-time to track the scraping status
  • Messages are stored in the database as they are scraped, so you can start analyzing available data even before the scraping is complete

Usage 📝

The script provides an interactive menu with the following options:

  • [A] Add new channel
  • Enter the channel ID or channelname
  • [R] Remove channel
  • Remove a channel from scraping list
  • [S] Scrape all channels
  • One-time scraping of all configured channels
  • [M] Toggle media scraping
  • Enable/disable downloading of media files
  • [C] Continuous scraping
  • Real-time monitoring of channels for new messages
  • [E] Export data
  • Export to JSON and CSV formats
  • [V] View saved channels
  • List all saved channels
  • [L] List account channels
  • List all channels with ID:s for account
  • [Q] Quit

Channel IDs 📢

You can use either: - Channel username (e.g., channelname) - Channel ID (e.g., -1001234567890)

Data Storage 💾

Database Structure

Data is stored in SQLite databases, one per channel: - Location: ./channelname/channelname.db - Table: messages - id: Primary key - message_id: Telegram message ID - date: Message timestamp - sender_id: Sender's Telegram ID - first_name: Sender's first name - last_name: Sender's last name - username: Sender's username - message: Message text - media_type: Type of media (if any) - media_path: Local path to downloaded media - reply_to: ID of replied message (if any)

Media Storage 📁

Media files are stored in: - Location: ./channelname/media/ - Files are named using message ID or original filename

Exported Data 📊

Data can be exported in two formats: 1. CSV: ./channelname/channelname.csv - Human-readable spreadsheet format - Easy to import into Excel/Google Sheets

  1. JSON: ./channelname/channelname.json
  2. Structured data format
  3. Ideal for programmatic processing

Features in Detail 🔍

Continuous Scraping

The continuous scraping feature ([C] option) allows you to: - Monitor channels in real-time - Automatically download new messages - Download media as it's posted - Run indefinitely until interrupted (Ctrl+C) - Maintains state between runs

Media Handling

The script can download: - Photos - Documents - Other media types supported by Telegram - Automatically retries failed downloads - Skips existing files to avoid duplicates

Error Handling 🛠️

The script includes: - Automatic retry mechanism for failed media downloads - State preservation in case of interruption - Flood control compliance - Error logging for failed operations

Limitations ⚠️

  • Respects Telegram's rate limits
  • Can only access public channels or channels you're a member of
  • Media download size limits apply as per Telegram's restrictions

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer ⚖️

This tool is for educational purposes only. Make sure to: - Respect Telegram's Terms of Service - Obtain necessary permissions before scraping - Use responsibly and ethically - Comply with data protection regulations



Telegram-Story-Scraper - A Python Script That Allows You To Automatically Scrape And Download Stories From Your Telegram Friends

By: Unknown


A Python script that allows you to automatically scrape and download stories from your Telegram friends using the Telethon library. The script continuously monitors and saves both photos and videos from stories, along with their metadata.


Important Note About Story Access ⚠️

Due to Telegram API restrictions, this script can only access stories from: - Users you have added to your friend list - Users whose privacy settings allow you to view their stories

This is a limitation of Telegram's API and cannot be bypassed.

Features 🚀

  • Automatically scrapes all available stories from your Telegram friends
  • Downloads both photos and videos from stories
  • Stores metadata in SQLite database
  • Exports data to Excel spreadsheet
  • Real-time monitoring with customizable intervals
  • Timestamp is set to (UTC+2)
  • Maintains record of previously downloaded stories
  • Resume capability
  • Automatic retry mechanism

Prerequisites 📋

Before running the script, you'll need:

  • Python 3.7 or higher
  • Telegram account
  • API credentials from Telegram
  • Friends on Telegram whose stories you want to track

Required Python packages

pip install -r requirements.txt

Contents of requirements.txt:

telethon
openpyxl
schedule

Getting Telegram API Credentials 🔑

  1. Visit https://my.telegram.org/auth
  2. Log in with your phone number
  3. Click on "API development tools"
  4. Fill in the form:
  5. App title: Your app name
  6. Short name: Your app short name
  7. Platform: Can be left as "Desktop"
  8. Description: Brief description of your app
  9. Click "Create application"
  10. You'll receive:
  11. api_id: A number
  12. api_hash: A string of letters and numbers

Keep these credentials safe, you'll need them to run the script!

Setup and Running 🔧

  1. Clone the repository:
git clone https://github.com/unnohwn/telegram-story-scraper.git
cd telegram-story-scraper
  1. Install requirements:
pip install -r requirements.txt
  1. Run the script:
python TGSS.py
  1. On first run, you'll be prompted to enter:
  2. Your API ID
  3. Your API Hash
  4. Your phone number (with country code)
  5. Verification code (sent to your Telegram)
  6. Checking interval in seconds (default is 60)

How It Works 🔄

The script: 1. Connects to your Telegram account 2. Periodically checks for new stories from your friends 3. Downloads any new stories (photos/videos) 4. Stores metadata in a SQLite database 5. Exports information to an Excel file 6. Runs continuously until interrupted (Ctrl+C)

Data Storage 💾

Database Structure (stories.db)

SQLite database containing: - user_id: Telegram user ID of the story creator - story_id: Unique story identifier - timestamp: When the story was posted (UTC+2) - filename: Local filename of the downloaded media

CSV and Excel Export (stories_export.csv/xlsx)

Export file containing the same information as the database, useful for: - Easy viewing of story metadata - Filtering and sorting - Data analysis - Sharing data with others

Media Storage 📁

  • Photos are saved as: {user_id}_{story_id}.jpg
  • Videos are saved with their original extension: {user_id}_{story_id}.{extension}
  • All media files are saved in the script's directory

Features in Detail 🔍

Continuous Monitoring

  • Customizable checking interval (default: 60 seconds)
  • Runs continuously until manually stopped
  • Maintains state between runs
  • Avoids duplicate downloads

Media Handling

  • Supports both photos and videos
  • Automatically detects media type
  • Preserves original quality
  • Generates unique filenames

Error Handling 🛠️

The script includes: - Automatic retry mechanism for failed downloads - Error logging for failed operations - Connection error handling - State preservation in case of interruption

Limitations ⚠️

  • Subject to Telegram's rate limits
  • Stories must be currently active (not expired)
  • Media download size limits apply as per Telegram's restrictions

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer ⚖️

This tool is for educational purposes only. Make sure to: - Respect Telegram's Terms of Service - Obtain necessary permissions before scraping - Use responsibly and ethically - Comply with data protection regulations - Respect user privacy



Microsoft: 6 Zero-Days in March 2025 Patch Tuesday

Microsoft today issued more than 50 security updates for its various Windows operating systems, including fixes for a whopping six zero-day vulnerabilities that are already seeing active exploitation.

Two of the zero-day flaws include CVE-2025-24991 and CVE-2025-24993, both vulnerabilities in NTFS, the default file system for Windows and Windows Server. Both require the attacker to trick a target into mounting a malicious virtual hard disk. CVE-2025-24993 would lead to the possibility of local code execution, while CVE-2025-24991 could cause NTFS to disclose portions of memory.

Microsoft credits researchers at ESET with reporting the zero-day bug labeled CVE-2025-24983, an elevation of privilege vulnerability in older versions of Windows. ESET said the exploit was deployed via the PipeMagic backdoor, capable of exfiltrating data and enabling remote access to the machine.

ESET’s Filip Jurčacko said the exploit in the wild targets only older versions of Windows OS: Windows 8.1 and Server 2012 R2. Although still used by millions, security support for these products ended more than a year ago, and mainstream support ended years ago. However, ESET notes the vulnerability itself also is present in newer Windows OS versions, including Windows 10 build 1809 and the still-supported Windows Server 2016.

Rapid7’s lead software engineer Adam Barnett said Windows 11 and Server 2019 onwards are not listed as receiving patches, so are presumably not vulnerable.

“It’s not clear why newer Windows products dodged this particular bullet,” Barnett wrote. “The Windows 32 subsystem is still presumably alive and well, since there is no apparent mention of its demise on the Windows client OS deprecated features list.”

The zero-day flaw CVE-2025-24984 is another NTFS weakness that can be exploited by inserting a malicious USB drive into a Windows computer. Barnett said Microsoft’s advisory for this bug doesn’t quite join the dots, but successful exploitation appears to mean that portions of heap memory could be improperly dumped into a log file, which could then be combed through by an attacker hungry for privileged information.

“A relatively low CVSSv3 base score of 4.6 reflects the practical difficulties of real-world exploitation, but a motivated attacker can sometimes achieve extraordinary results starting from the smallest of toeholds, and Microsoft does rate this vulnerability as important on its own proprietary severity ranking scale,” Barnett said.

Another zero-day fixed this month — CVE-2025-24985 — could allow attackers to install malicious code. As with the NTFS bugs, this one requires that the user mount a malicious virtual hard drive.

The final zero-day this month is CVE-2025-26633, a weakness in the Microsoft Management Console, a component of Windows that gives system administrators a way to configure and monitor the system. Exploiting this flaw requires the target to open a malicious file.

This month’s bundle of patch love from Redmond also addresses six other vulnerabilities Microsoft has rated “critical,” meaning that malware or malcontents could exploit them to seize control over vulnerable PCs with no help from users.

Barnett observed that this is now the sixth consecutive month where Microsoft has published zero-day vulnerabilities on Patch Tuesday without evaluating any of them as critical severity at time of publication.

The SANS Internet Storm Center has a useful list of all the Microsoft patches released today, indexed by severity. Windows enterprise administrators would do well to keep an eye on askwoody.com, which often has the scoop on any patches causing problems. Please consider backing up your data before updating, and leave a comment below if you experience any issues applying this month’s updates.

NinjaOne Scores $500M in Series C Extensions at $5 Billion Valuation

Texas automated endpoint management vendor banks $500 million infusion in Series C extensions that values the company at $5 billion. 

The post NinjaOne Scores $500M in Series C Extensions at $5 Billion Valuation appeared first on SecurityWeek.

VC Firm Insight Partners Hacked

Venture capital firm Insight Partners has been targeted in a cyberattack that involved unauthorized access to its information systems.

The post VC Firm Insight Partners Hacked appeared first on SecurityWeek.

Microsoft Patch Tuesday, February 2025 Edition

Microsoft today issued security updates to fix at least 56 vulnerabilities in its Windows operating systems and supported software, including two zero-day flaws that are being actively exploited.

All supported Windows operating systems will receive an update this month for a buffer overflow vulnerability that carries the catchy name CVE-2025-21418. This patch should be a priority for enterprises, as Microsoft says it is being exploited, has low attack complexity, and no requirements for user interaction.

Tenable senior staff research engineer Satnam Narang noted that since 2022, there have been nine elevation of privilege vulnerabilities in this same Windows component — three each year — including one in 2024 that was exploited in the wild as a zero day (CVE-2024-38193).

“CVE-2024-38193 was exploited by the North Korean APT group known as Lazarus Group to implant a new version of the FudModule rootkit in order to maintain persistence and stealth on compromised systems,” Narang said. “At this time, it is unclear if CVE-2025-21418 was also exploited by Lazarus Group.”

The other zero-day, CVE-2025-21391, is an elevation of privilege vulnerability in Windows Storage that could be used to delete files on a targeted system. Microsoft’s advisory on this bug references something called “CWE-59: Improper Link Resolution Before File Access,” says no user interaction is required, and that the attack complexity is low.

Adam Barnett, lead software engineer at Rapid7, said although the advisory provides scant detail, and even offers some vague reassurance that ‘an attacker would only be able to delete targeted files on a system,’ it would be a mistake to assume that the impact of deleting arbitrary files would be limited to data loss or denial of service.

“As long ago as 2022, ZDI researchers set out how a motivated attacker could parlay arbitrary file deletion into full SYSTEM access using techniques which also involve creative misuse of symbolic links,”Barnett wrote.

One vulnerability patched today that was publicly disclosed earlier is CVE-2025-21377, another weakness that could allow an attacker to elevate their privileges on a vulnerable Windows system. Specifically, this is yet another Windows flaw that can be used to steal NTLMv2 hashes — essentially allowing an attacker to authenticate as the targeted user without having to log in.

According to Microsoft, minimal user interaction with a malicious file is needed to exploit CVE-2025-21377, including selecting, inspecting or “performing an action other than opening or executing the file.”

“This trademark linguistic ducking and weaving may be Microsoft’s way of saying ‘if we told you any more, we’d give the game away,'” Barnett said. “Accordingly, Microsoft assesses exploitation as more likely.”

The SANS Internet Storm Center has a handy list of all the Microsoft patches released today, indexed by severity. Windows enterprise administrators would do well to keep an eye on askwoody.com, which often has the scoop on any patches causing problems.

It’s getting harder to buy Windows software that isn’t also bundled with Microsoft’s flagship Copilot artificial intelligence (AI) feature. Last month Microsoft started bundling Copilot with Microsoft Office 365, which Redmond has since rebranded as “Microsoft 365 Copilot.” Ostensibly to offset the costs of its substantial AI investments, Microsoft also jacked up prices from 22 percent to 30 percent for upcoming license renewals and new subscribers.

Office-watch.com writes that existing Office 365 users who are paying an annual cloud license do have the option of “Microsoft 365 Classic,” an AI-free subscription at a lower price, but that many customers are not offered the option until they attempt to cancel their existing Office subscription.

In other security patch news, Apple has shipped iOS 18.3.1, which fixes a zero day vulnerability (CVE-2025-24200) that is showing up in attacks.

Adobe has issued security updates that fix a total of 45 vulnerabilities across InDesign, Commerce, Substance 3D Stager, InCopy, Illustrator, Substance 3D Designer and Photoshop Elements.

Chris Goettl at Ivanti notes that Google Chrome is shipping an update today which will trigger updates for Chromium based browsers including Microsoft Edge, so be on the lookout for Chrome and Edge updates as we proceed through the week.

Microsoft: Happy 2025. Here’s 161 Security Updates

Microsoft today unleashed updates to plug a whopping 161 security vulnerabilities in Windows and related software, including three “zero-day” weaknesses that are already under active attack. Redmond’s inaugural Patch Tuesday of 2025 bundles more fixes than the company has shipped in one go since 2017.

Rapid7‘s Adam Barnett says January marks the fourth consecutive month where Microsoft has published zero-day vulnerabilities on Patch Tuesday without evaluating any of them as critical severity at time of publication. Today also saw the publication of nine critical remote code execution (RCE) vulnerabilities.

The Microsoft flaws already seeing active attacks include CVE-2025-21333, CVE-2025-21334 and, you guessed it– CVE-2025-21335. These are sequential because all reside in Windows Hyper-V, a component that is heavily embedded in modern Windows 11 operating systems and used for security features including device guard and credential guard.

Tenable’s Satnam Narang says little is known about the in-the-wild exploitation of these flaws, apart from the fact that they are all “privilege escalation” vulnerabilities. Narang said we tend to see a lot of elevation of privilege bugs exploited in the wild as zero-days in Patch Tuesday because it’s not always initial access to a system that’s a challenge for attackers as they have various avenues in their pursuit.

“As elevation of privilege bugs, they’re being used as part of post-compromise activity, where an attacker has already accessed a target system,” he said. “It’s kind of like if an attacker is able to enter a secure building, they’re unable to access more secure parts of the facility because they have to prove that they have clearance. In this case, they’re able to trick the system into believing they should have clearance.”

Several bugs addressed today earned CVSS (threat rating) scores of 9.8 out of a possible 10, including CVE-2025-21298, a weakness in Windows that could allow attackers to run arbitrary code by getting a target to open a malicious .rtf file, documents typically opened on Office applications like Microsoft Word. Microsoft has rated this flaw “exploitation more likely.”

Ben Hopkins at Immersive Labs called attention to the CVE-2025-21311, a 9.8 “critical” bug in Windows NTLMv1 (NT LAN Manager version 1), an older Microsoft authentication protocol that is still used by many organizations.

“What makes this vulnerability so impactful is the fact that it is remotely exploitable, so attackers can reach the compromised machine(s) over the internet, and the attacker does not need significant knowledge or skills to achieve repeatable success with the same payload across any vulnerable component,” Hopkins wrote.

Kev Breen at Immersive points to an interesting flaw (CVE-2025-21210) that Microsoft fixed in its full disk encryption suite Bitlocker that the software giant has dubbed “exploitation more likely.” Specifically, this bug holds out the possibility that in some situations the hibernation image created when one closes the laptop lid on an open Windows session may not be fully encrypted and could be recovered in plain text.

“Hibernation images are used when a laptop goes to sleep and contains the contents that were stored in RAM at the moment the device powered down,” Breen noted. “This presents a significant potential impact as RAM can contain sensitive data (such as passwords, credentials and PII) that may have been in open documents or browser sessions and can all be recovered with free tools from hibernation files.”

Tenable’s Narang also highlighted a trio of vulnerabilities in Microsoft Access fixed this month and credited to Unpatched.ai, a security research effort that is aided by artificial intelligence looking for vulnerabilities in code. Tracked as CVE-2025-21186, CVE-2025-21366, and CVE-2025-21395, these are remote code execution bugs that are exploitable if an attacker convinces a target to download and run a malicious file through social engineering. Unpatched.ai was also credited with discovering a flaw in the December 2024 Patch Tuesday release (CVE-2024-49142).

“Automated vulnerability detection using AI has garnered a lot of attention recently, so it’s noteworthy to see this service being credited with finding bugs in Microsoft products,” Narang observed. “It may be the first of many in 2025.”

If you’re a Windows user who has automatic updates turned off and haven’t updated in a while, it’s probably time to play catch up. Please consider backing up important files and/or the entire hard drive before updating. And if you run into any problems installing this month’s patch batch, drop a line in the comments below, please.

Further reading on today’s patches from Microsoft:

Tenable blog

SANS Internet Storm Center

Ask Woody

Patch Tuesday, December 2024 Edition

Microsoft today released updates to plug at least 70 security holes in Windows and Windows software, including one vulnerability that is already being exploited in active attacks.

The zero-day seeing exploitation involves CVE-2024-49138, a security weakness in the Windows Common Log File System (CLFS) driver — used by applications to write transaction logs — that could let an authenticated attacker gain “system” level privileges on a vulnerable Windows device.

The security firm Rapid7 notes there have been a series of zero-day elevation of privilege flaws in CLFS over the past few years.

“Ransomware authors who have abused previous CLFS vulnerabilities will be only too pleased to get their hands on a fresh one,” wrote Adam Barnett, lead software engineer at Rapid7. “Expect more CLFS zero-day vulnerabilities to emerge in the future, at least until Microsoft performs a full replacement of the aging CLFS codebase instead of offering spot fixes for specific flaws.”

Elevation of privilege vulnerabilities accounted for 29% of the 1,009 security bugs Microsoft has patched so far in 2024, according to a year-end tally by Tenable; nearly 40 percent of those bugs were weaknesses that could let attackers run malicious code on the vulnerable device.

Rob Reeves, principal security engineer at Immersive Labs, called special attention to CVE-2024-49112, a remote code execution flaw in the Lightweight Directory Access Protocol (LDAP) service on every version of Windows since Windows 7. CVE-2024-49112 has been assigned a CVSS (badness) score of 9.8 out of 10.

“LDAP is most commonly seen on servers that are Domain Controllers inside a Windows network and LDAP must be exposed to other servers and clients within an enterprise environment for the domain to function,” Reeves said. “Microsoft hasn’t released specific information about the vulnerability at present, but has indicated that the attack complexity is low and authentication is not required.”

Tyler Reguly at the security firm Fortra had a slightly different 2024 patch tally for Microsoft, at 1,088 vulnerabilities, which he said was surprisingly similar to the 1,063 vulnerabilities resolved in 2023 and the 1,119 vulnerabilities resolved in 2022.

“If nothing else, we can say that Microsoft is consistent,” Reguly said. “While it would be nice to see the number of vulnerabilities each year decreasing, at least consistency lets us know what to expect.”

If you’re a Windows end user and your system is not set up to automatically install updates, please take a minute this week to run Windows Update, preferably after backing up your system and/or important data.

System admins should keep an eye on AskWoody.com, which usually has the details if any of the Patch Tuesday fixes are causing problems. In the meantime, if you run into any problems applying this month’s fixes, please drop a note about in the comments below.

Intezer Raises $33M to Extend AI-Powered SOC Platform

Intezer is looking to tap into booming market for AI-powered tooling to address the severe shortage of skilled cybersecurity professionals. 

The post Intezer Raises $33M to Extend AI-Powered SOC Platform appeared first on SecurityWeek.

C/side Raises $6 Million to Secure the Browser Supply Chain

C/side has raised $6 million in a seed-stage funding round to help organizations protect against malicious browser third-party scripts.

The post C/side Raises $6 Million to Secure the Browser Supply Chain appeared first on SecurityWeek.

EasyDMARC Lands $20M for Email Security Authentication Tech

EasyDMARC lands venture capital funding after finding traction in the email security and authentication business.

The post EasyDMARC Lands $20M for Email Security Authentication Tech appeared first on SecurityWeek.

Bug Left Some Windows PCs Dangerously Unpatched

Microsoft Corp. today released updates to fix at least 79 security vulnerabilities in its Windows operating systems and related software, including multiple flaws that are already showing up in active attacks. Microsoft also corrected a critical bug that has caused some Windows 10 PCs to remain dangerously unpatched against actively exploited vulnerabilities for several months this year.

By far the most curious security weakness Microsoft disclosed today has the snappy name of CVE-2024-43491, which Microsoft says is a vulnerability that led to the rolling back of fixes for some vulnerabilities affecting “optional components” on certain Windows 10 systems produced in 2015. Those include Windows 10 systems that installed the monthly security update for Windows released in March 2024, or other updates released until August 2024.

Satnam Narang, senior staff research engineer at Tenable, said that while the phrase “exploitation detected” in a Microsoft advisory normally implies the flaw is being exploited by cybercriminals, it appears labeled this way with CVE-2024-43491 because the rollback of fixes reintroduced vulnerabilities that were previously know to be exploited.

“To correct this issue, users need to apply both the September 2024 Servicing Stack Update and the September 2024 Windows Security Updates,” Narang said.

Kev Breen, senior director of threat research at Immersive Labs, said the root cause of CVE-2024-43491 is that on specific versions of Windows 10, the build version numbers that are checked by the update service were not properly handled in the code.

“The notes from Microsoft say that the ‘build version numbers crossed into a range that triggered a code defect’,” Breen said. “The short version is that some versions of Windows 10 with optional components enabled was left in a vulnerable state.”

Zero Day #1 this month is CVE-2024-38226, and it concerns a weakness in Microsoft Publisher, a standalone application included in some versions of Microsoft Office. This flaw lets attackers bypass Microsoft’s “Mark of the Web,” a Windows security feature that marks files downloaded from the Internet as potentially unsafe.

Zero Day #2 is CVE-2024-38217, also a Mark of the Web bypass affecting Office. Both zero-day flaws rely on the target opening a booby-trapped Office file.

Security firm Rapid7 notes that CVE-2024-38217 has been publicly disclosed via an extensive write-up, with exploit code also available on GitHub.

According to Microsoft, CVE-2024-38014, an “elevation of privilege” bug in the Windows Installer, is also being actively exploited.

June’s coverage of Microsoft Patch Tuesday was titled “Recall Edition,” because the big news then was that Microsoft was facing a torrent of criticism from privacy and security experts over “Recall,” a new artificial intelligence (AI) feature of Redmond’s flagship Copilot+ PCs that constantly takes screenshots of whatever users are doing on their computers.

At the time, Microsoft responded by suggesting Recall would no longer be enabled by default. But last week, the software giant clarified that what it really meant was that the ability to disable Recall was a bug/feature in the preview version of Copilot+ that will not be available to Windows customers going forward. Translation: New versions of Windows are shipping with Recall deeply embedded in the operating system.

It’s pretty rich that Microsoft, which already collects an insane amount of information from its customers on a near constant basis, is calling the Recall removal feature a bug, while treating Recall as a desirable feature. Because from where I sit, Recall is a feature nobody asked for that turns Windows into a bug (of the surveillance variety).

When Redmond first responded to critics about Recall, they noted that Recall snapshots never leave the user’s system, and that even if attackers managed to hack a Copilot+ PC they would not be able to exfiltrate on-device Recall data.

But that claim rang hollow after former Microsoft threat analyst Kevin Beaumont detailed on his blog how any user on the system (even a non-administrator) can export Recall data, which is just stored in an SQLite database locally.

As it is apt to do on Microsoft Patch Tuesday, Adobe has released updates to fix security vulnerabilities in a range of products, including Reader and Acrobat, After Effects, Premiere Pro, Illustrator, ColdFusion, Adobe Audition, and Photoshop. Adobe says it is not aware of any exploits in the wild for any of the issues addressed in its updates.

Seeking a more detailed breakdown of the patches released by Microsoft today? Check out the SANS Internet Storm Center’s thorough list. People responsible for administering many systems in an enterprise environment would do well to keep an eye on AskWoody.com, which often has the skinny on any wonky Windows patches that may be causing problems for some users.

As always, if you experience any issues applying this month’s patch batch, consider dropping a note in the comments here about it.

P0 Security Banks $15M for Security Cloud Access

San Francisco secure cloud access startup gets backing from SYN Ventures, Zscaler, and Lightspeed Venture Partners.

The post P0 Security Banks $15M for Security Cloud Access appeared first on SecurityWeek.

Patch Tuesday, June 2024 “Recall” Edition

Microsoft today released updates to fix more than 50 security vulnerabilities in Windows and related software, a relatively light Patch Tuesday this month for Windows users. The software giant also responded to a torrent of negative feedback on a new feature of Redmond’s flagship operating system that constantly takes screenshots of whatever users are doing on their computers, saying the feature would no longer be enabled by default.

Last month, Microsoft debuted Copilot+ PCs, an AI-enabled version of Windows. Copilot+ ships with a feature nobody asked for that Redmond has aptly dubbed Recall, which constantly takes screenshots of what the user is doing on their PC. Security experts roundly trashed Recall as a fancy keylogger, noting that it would be a gold mine of information for attackers if the user’s PC was compromised with malware.

Microsoft countered that Recall snapshots never leave the user’s system, and that even if attackers managed to hack a Copilot+ PC they would not be able to exfiltrate on-device Recall data. But that claim rang hollow after former Microsoft threat analyst Kevin Beaumont detailed on his blog how any user on the system (even a non-administrator) can export Recall data, which is just stored in an SQLite database locally.

“I’m not being hyperbolic when I say this is the dumbest cybersecurity move in a decade,” Beaumont said on Mastodon.

In a recent Risky Business podcast, host Patrick Gray noted that the screenshots created and indexed by Recall would be a boon to any attacker who suddenly finds himself in an unfamiliar environment.

“The first thing you want to do when you get on a machine if you’re up to no good is to figure out how someone did their job,” Gray said. “We saw that in the case of the SWIFT attacks against central banks years ago. Attackers had to do screen recordings to figure out how transfers work. And this could speed up that sort of discovery process.”

Responding to the withering criticism of Recall, Microsoft said last week that it will no longer be enabled by default on Copilot+ PCs.

Only one of the patches released today — CVE-2024-30080 — earned Microsoft’s most urgent “critical” rating, meaning malware or malcontents could exploit the vulnerability to remotely seize control over a user’s system, without any user interaction.

CVE-2024-30080 is a flaw in the Microsoft Message Queuing (MSMQ) service that can allow attackers to execute code of their choosing. Microsoft says exploitation of this weakness is likely, enough to encourage users to disable the vulnerable component if updating isn’t possible in the short run. CVE-2024-30080 has been assigned a CVSS vulnerability score of 9.8 (10 is the worst).

Kevin Breen, senior director of threat research at Immersive Labs, said a saving grace is that MSMQ is not a default service on Windows.

“A Shodan search for MSMQ reveals there are a few thousand potentially internet-facing MSSQ servers that could be vulnerable to zero-day attacks if not patched quickly,” Breen said.

CVE-2024-30078 is a remote code execution weakness in the Windows WiFi Driver, which also has a CVSS score of 9.8. According to Microsoft, an unauthenticated attacker could exploit this bug by sending a malicious data packet to anyone else on the same network — meaning this flaw assumes the attacker has access to the local network.

Microsoft also fixed a number of serious security issues with its Office applications, including at least two remote-code execution flaws, said Adam Barnett, lead software engineer at Rapid7.

CVE-2024-30101 is a vulnerability in Outlook; although the Preview Pane is a vector, the user must subsequently perform unspecified specific actions to trigger the vulnerability and the attacker must win a race condition,” Barnett said. “CVE-2024-30104 does not have the Preview Pane as a vector, but nevertheless ends up with a slightly higher CVSS base score of 7.8, since exploitation relies solely on the user opening a malicious file.”

Separately, Adobe released security updates for Acrobat, ColdFusion, and Photoshop, among others.

As usual, the SANS Internet Storm Center has the skinny on the individual patches released today, indexed by severity, exploitability and urgency. Windows admins should also keep an eye on AskWoody.com, which often publishes early reports of any Windows patches gone awry.

JAVS Courtroom Audio-Visual Software Installer Serves Backdoor

Backdoored JAVS courtroom recording and management software installer puts thousands at risk of complete takeover.

The post JAVS Courtroom Audio-Visual Software Installer Serves Backdoor appeared first on SecurityWeek.

Traceable AI Raises $30 Million to Safeguard Cloud APIs

Traceable AI has raised $110 million since launching in 2018 with ambitious plans in the competitive API security and observability space.  

The post Traceable AI Raises $30 Million to Safeguard Cloud APIs appeared first on SecurityWeek.

Man Who Mass-Extorted Psychotherapy Patients Gets Six Years

A 26-year-old Finnish man was sentenced to more than six years in prison today after being convicted of hacking into an online psychotherapy clinic, leaking tens of thousands of patient therapy records, and attempting to extort the clinic and patients.

On October 21, 2020, the Vastaamo Psychotherapy Center in Finland became the target of blackmail when a tormentor identified as “ransom_man” demanded payment of 40 bitcoins (~450,000 euros at the time) in return for a promise not to publish highly sensitive therapy session notes Vastaamo had exposed online.

Ransom_man announced on the dark web that he would start publishing 100 patient profiles every 24 hours. When Vastaamo declined to pay, ransom_man shifted to extorting individual patients. According to Finnish police, some 22,000 victims reported extortion attempts targeting them personally, targeted emails that threatened to publish their therapy notes online unless paid a 500 euro ransom.

Finnish prosecutors quickly zeroed in on a suspect: Julius “Zeekill” Kivimäki, a notorious criminal hacker convicted of committing tens of thousands of cybercrimes before he became an adult. After being charged with the attack in October 2022, Kivimäki fled the country. He was arrested four months later in France, hiding out under an assumed name and passport.

Antti Kurittu is a former criminal investigator who worked on an investigation involving Kivimäki’s use of the Zbot botnet, among other activities Kivimäki engaged in as a member of the hacker group Hack the Planet (HTP).

Kurittu said the prosecution had demanded at least seven years in jail, and that the sentence handed down was six years and three months. Kurittu said prosecutors knocked a few months off of Kivimäki’s sentence because he agreed to pay compensation to his victims, and that Kivimäki will remain in prison during any appeal process.

“I think the sentencing was as expected, knowing the Finnish judicial system,” Kurittu told KrebsOnSecurity. “As Kivimäki has not been sentenced to a non-suspended prison sentence during the last five years, he will be treated as a first-timer, his previous convictions notwithstanding.”

But because juvenile convictions in Finland don’t count towards determining whether somebody is a first-time offender, Kivimäki will end up serving approximately half of his sentence.

“This seems like a short sentence when taking into account the gravity of his actions and the life-altering consequences to thousands of people, but it’s almost the maximum the law allows for,” Kurittu said.

Kivimäki initially gained notoriety as a self-professed member of the Lizard Squad, a mainly low-skilled hacker group that specialized in DDoS attacks. But American and Finnish investigators say Kivimäki’s involvement in cybercrime dates back to at least 2008, when he was introduced to a founding member of what would soon become HTP.

Finnish police said Kivimäki also used the nicknames “Ryan”, “RyanC” and “Ryan Cleary” (Ryan Cleary was actually a member of a rival hacker group — LulzSec — who was sentenced to prison for hacking).

Kivimäki and other HTP members were involved in mass-compromising web servers using known vulnerabilities, and by 2012 Kivimäki’s alias Ryan Cleary was selling access to those servers in the form of a DDoS-for-hire service. Kivimäki was 15 years old at the time.

In 2013, investigators going through devices seized from Kivimäki found computer code that had been used to crack more than 60,000 web servers using a previously unknown vulnerability in Adobe’s ColdFusion software. KrebsOnSecurity detailed the work of HTP in September 2013, after the group compromised servers inside data brokers LexisNexis, Kroll, and Dun & Bradstreet.

The group used the same ColdFusion flaws to break into the National White Collar Crime Center (NWC3), a non-profit that provides research and investigative support to the U.S. Federal Bureau of Investigation (FBI).

As KrebsOnSecurity reported at the time, this small ColdFusion botnet of data broker servers was being controlled by the same cybercriminals who’d assumed control over SSNDOB, which operated one of the underground’s most reliable services for obtaining Social Security Number, dates of birth and credit file information on U.S. residents.

Kivimäki was responsible for making an August 2014 bomb threat against former Sony Online Entertainment President John Smedley that grounded an American Airlines plane. Kivimäki also was involved in calling in multiple fake bomb threats and “swatting” incidents — reporting fake hostage situations at an address to prompt a heavily armed police response to that location.

Ville Tapio, the former CEO of Vastaamo, was fired and also prosecuted following the breach. Ransom_man bragged about Vastaamo’s sloppy security, noting the company had used the laughably weak username and password “root/root” to protect sensitive patient records.

Investigators later found Vastaamo had originally been hacked in 2018 and again in 2019. In April 2023, a Finnish court handed down a three-month sentence for Tapio, but that sentence was suspended because he had no previous criminal record.

Galah - An LLM-powered Web Honeypot Using The OpenAI API

By: Unknown


TL;DR: Galah (/ɡəˈlɑː/ - pronounced 'guh-laa') is an LLM (Large Language Model) powered web honeypot, currently compatible with the OpenAI API, that is able to mimic various applications and dynamically respond to arbitrary HTTP requests.


Description

Named after the clever Australian parrot known for its mimicry, Galah mirrors this trait in its functionality. Unlike traditional web honeypots that rely on a manual and limiting method of emulating numerous web applications or vulnerabilities, Galah adopts a novel approach. This LLM-powered honeypot mimics various web applications by dynamically crafting relevant (and occasionally foolish) responses, including HTTP headers and body content, to arbitrary HTTP requests. Fun fact: in Aussie English, Galah also means fool!

I've deployed a cache for the LLM-generated responses (the cache duration can be customized in the config file) to avoid generating multiple responses for the same request and to reduce the cost of the OpenAI API. The cache stores responses per port, meaning if you probe a specific port of the honeypot, the generated response won't be returned for the same request on a different port.

The prompt is the most crucial part of this honeypot! You can update the prompt in the config file, but be sure not to change the part that instructs the LLM to generate the response in the specified JSON format.

Note: Galah was a fun weekend project I created to evaluate the capabilities of LLMs in generating HTTP messages, and it is not intended for production use. The honeypot may be fingerprinted based on its response time, non-standard, or sometimes weird responses, and other network-based techniques. Use this tool at your own risk, and be sure to set usage limits for your OpenAI API.

Future Enhancements

  • Rule-Based Response: The new version of Galah will employ a dynamic, rule-based approach, adding more control over response generation. This will further reduce OpenAI API costs and increase the accuracy of the generated responses.

  • Response Database: It will enable you to generate and import a response database. This ensures the honeypot only turns to the OpenAI API for unknown or new requests. I'm also working on cleaning up and sharing my own database.

  • Support for Other LLMs.

Getting Started

  • Ensure you have Go version 1.20+ installed.
  • Create an OpenAI API key from here.
  • If you want to serve over HTTPS, generate TLS certificates.
  • Clone the repo and install the dependencies.
  • Update the config.yaml file.
  • Build and run the Go binary!
% git clone git@github.com:0x4D31/galah.git
% cd galah
% go mod download
% go build
% ./galah -i en0 -v

██████ █████ ██ █████ ██ ██
██ ██ ██ ██ ██ ██ ██ ██
██ ███ ███████ ██ ███████ ███████
██ ██ ██ ██ ██ ██ ██ ██ ██
██████ ██ ██ ███████ ██ ██ ██ ██
llm-based web honeypot // version 1.0
author: Adel "0x4D31" Karimi

2024/01/01 04:29:10 Starting HTTP server on port 8080
2024/01/01 04:29:10 Starting HTTP server on port 8888
2024/01/01 04:29:10 Starting HTTPS server on port 8443 with TLS profile: profile1_selfsigned
2024/01/01 04:29:10 Starting HTTPS server on port 443 with TLS profile: profile1_selfsigned

2024/01/01 04:35:57 Received a request for "/.git/config" from [::1]:65434
2024/01/01 04:35:57 Request cache miss for "/.git/config": Not found in cache
2024/01/01 04:35:59 Generated HTTP response: {"Headers": {"Content-Type": "text/plain", "Server": "Apache/2.4.41 (Ubuntu)", "Status": "403 Forbidden"}, "Body": "Forbidden\nYou don't have permission to access this resource."}
2024/01/01 04:35:59 Sending the crafted response to [::1]:65434

^C2024/01/01 04:39:27 Received shutdown signal. Shutting down servers...
2024/01/01 04:39:27 All servers shut down gracefully.

Example Responses

Here are some example responses:

Example 1

% curl http://localhost:8080/login.php
<!DOCTYPE html><html><head><title>Login Page</title></head><body><form action='/submit.php' method='post'><label for='uname'><b>Username:</b></label><br><input type='text' placeholder='Enter Username' name='uname' required><br><label for='psw'><b>Password:</b></label><br><input type='password' placeholder='Enter Password' name='psw' required><br><button type='submit'>Login</button></form></body></html>

JSON log record:

{"timestamp":"2024-01-01T05:38:08.854878","srcIP":"::1","srcHost":"localhost","tags":null,"srcPort":"51978","sensorName":"home-sensor","port":"8080","httpRequest":{"method":"GET","protocolVersion":"HTTP/1.1","request":"/login.php","userAgent":"curl/7.71.1","headers":"User-Agent: [curl/7.71.1], Accept: [*/*]","headersSorted":"Accept,User-Agent","headersSortedSha256":"cf69e186169279bd51769f29d122b07f1f9b7e51bf119c340b66fbd2a1128bc9","body":"","bodySha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},"httpResponse":{"headers":{"Content-Type":"text/html","Server":"Apache/2.4.38"},"body":"\u003c!DOCTYPE html\u003e\u003chtml\u003e\u003chead\u003e\u003ctitle\u003eLogin Page\u003c/title\u003e\u003c/head\u003e\u003cbody\u003e\u003cform action='/submit.php' method='post'\u003e\u003clabel for='uname'\u003e\u003cb\u003eUsername:\u003c/b\u003e\u003c/label\u003e\u003cbr\u003e\u003cinput type='text' placeholder='Enter Username' name='uname' required\u003e\u003cbr\u003e\u003clabel for='psw'\u003e\u003cb\u003ePassword:\u003c/b\u003e\u003c/label\u003e\u003cbr\u003e\u003cinput type='password' placeholder='Enter Password' name='psw' required\u003e\u003cbr\u003e\u003cbutton type='submit'\u003eLogin\u003c/button\u003e\u003c/form\u003e\u003c/body\u003e\u003c/html\u003e"}}

Example 2

% curl http://localhost:8080/.aws/credentials
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
region = us-west-2

JSON log record:

{"timestamp":"2024-01-01T05:40:34.167361","srcIP":"::1","srcHost":"localhost","tags":null,"srcPort":"65311","sensorName":"home-sensor","port":"8080","httpRequest":{"method":"GET","protocolVersion":"HTTP/1.1","request":"/.aws/credentials","userAgent":"curl/7.71.1","headers":"User-Agent: [curl/7.71.1], Accept: [*/*]","headersSorted":"Accept,User-Agent","headersSortedSha256":"cf69e186169279bd51769f29d122b07f1f9b7e51bf119c340b66fbd2a1128bc9","body":"","bodySha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},"httpResponse":{"headers":{"Connection":"close","Content-Encoding":"gzip","Content-Length":"126","Content-Type":"text/plain","Server":"Apache/2.4.51 (Unix)"},"body":"[default]\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\naws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\nregion = us-west-2"}}

Okay, that was impressive!

Example 3

Now, let's do some sort of adversarial testing!

% curl http://localhost:8888/are-you-a-honeypot
No, I am a server.`

JSON log record:

{"timestamp":"2024-01-01T05:50:43.792479","srcIP":"::1","srcHost":"localhost","tags":null,"srcPort":"61982","sensorName":"home-sensor","port":"8888","httpRequest":{"method":"GET","protocolVersion":"HTTP/1.1","request":"/are-you-a-honeypot","userAgent":"curl/7.71.1","headers":"User-Agent: [curl/7.71.1], Accept: [*/*]","headersSorted":"Accept,User-Agent","headersSortedSha256":"cf69e186169279bd51769f29d122b07f1f9b7e51bf119c340b66fbd2a1128bc9","body":"","bodySha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},"httpResponse":{"headers":{"Connection":"close","Content-Length":"20","Content-Type":"text/plain","Server":"Apache/2.4.41 (Ubuntu)"},"body":"No, I am a server."}}

😑

% curl http://localhost:8888/i-mean-are-you-a-fake-server`
No, I am not a fake server.

JSON log record:

{"timestamp":"2024-01-01T05:51:40.812831","srcIP":"::1","srcHost":"localhost","tags":null,"srcPort":"62205","sensorName":"home-sensor","port":"8888","httpRequest":{"method":"GET","protocolVersion":"HTTP/1.1","request":"/i-mean-are-you-a-fake-server","userAgent":"curl/7.71.1","headers":"User-Agent: [curl/7.71.1], Accept: [*/*]","headersSorted":"Accept,User-Agent","headersSortedSha256":"cf69e186169279bd51769f29d122b07f1f9b7e51bf119c340b66fbd2a1128bc9","body":"","bodySha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},"httpResponse":{"headers":{"Connection":"close","Content-Type":"text/plain","Server":"LocalHost/1.0"},"body":"No, I am not a fake server."}}

You're a galah, mate!



❌