Where Do Intelligence Platforms Find Cleartext Breach Data?

Member
Joined
April 7, 2025
Messages
21
Reaction score
2
Points
3
<div class="bbWrapper">Question for the InfoSec Community<br /> <br /> I've been exploring platforms like Intelligence X, where you can search for a domain or email and get results from leaked databases (sometimes in cleartext).<br /> I'm curious: where do such platforms gather this data from?<br /> <br /> Do they:<br /> <br /> 1. Monitor breach forums (like BreachForums)?<br /> 2. Pull from dark web marketplaces?<br /> 3. Scrape paste sites (e.g., Pastebin)?<br /> 4. Use public dumps shared on GitHub, Telegram, or other leak sites?<br /> <br /> Or something else entirely?<br /> <br /> <b>If there are any links or PDFs for learning more, please drop them in the comments. I would like to explore further.</b><br /> <br /> I would love to hear insights on which data sources are commonly used by tools like Intelligence X, DeHashed, Scylla, LeakCheck, etc.</div>
 
Reactions: bokachan

Premium Member
Joined
April 23, 2025
Messages
14
Reaction score
2
Points
3
<div class="bbWrapper">I think it's pretty clear they do a bit of all of the above.<br /> <div class="bbImageWrapper js-lbImage" title="1749214800629.png" data-src="https://dna.fail/attachments/1749214800629-png.2236/" data-lb-sidebar-href="" data-lb-caption-extra-html="" data-single-image="1"> <img src="https://dna.fail/attachments/1749214800629-png.2236/" data-url="" class="bbImage" data-zoom-target="1" style="" alt="1749214800629.png" title="1749214800629.png" width="1131" height="253" loading="lazy" /> </div> <br /> <br /> More interesting to me is what data architecture they use to store, tag, and index what I imagine is a vast ocean of data, along with its provenance. Most leaks contain some level of dirty data: missing columns and fields, duplicates, and so on, plus trash data if the leak was a full DB dump. The ETL process alone is a pain for these multi-GB data sets.<br /> <br /> I don't think they are much different from most of the more commercial data brokers, who gather data from wherever they can, whether scraped, &quot;permissioned&quot;, leaked, or otherwise. Almost all of them operate in the grey IMO.</div>
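To make the ETL point above concrete, here is a minimal Python sketch of the kind of cleanup step the post describes: mapping inconsistent column names onto one schema, dropping trash rows, collapsing duplicates, and tagging each record with its provenance. Everything in it is an assumption for illustration (the `CANONICAL_FIELDS` schema, the `ALIASES` header map, the function names, and the synthetic sample rows); how any real platform actually does this is not public.

```python
import csv
import hashlib
import io

# Hypothetical canonical schema; real platforms' schemas are not public.
CANONICAL_FIELDS = ("email", "username", "name")

# Hypothetical header aliases for differently shaped dumps.
ALIASES = {
    "e-mail": "email",
    "mail": "email",
    "user": "username",
    "login": "username",
    "full_name": "name",
}


def normalize_row(row):
    """Map one raw CSV row onto the canonical schema, or return None for trash rows."""
    clean = {field: "" for field in CANONICAL_FIELDS}
    for key, value in row.items():
        # DictReader uses a None key for surplus cells and None values for missing ones.
        if key is None or value is None:
            continue
        name = key.strip().lower()
        field = ALIASES.get(name, name)
        if field in clean:
            clean[field] = value.strip()
    clean["email"] = clean["email"].lower()
    # A row with no usable canonical field is treated as trash.
    return clean if any(clean.values()) else None


def fingerprint(clean):
    """Stable hash of the canonical fields, used as the deduplication key."""
    joined = "\x1f".join(clean[field] for field in CANONICAL_FIELDS)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def ingest(csv_text, source, store):
    """Load one CSV dump into `store`; return how many new records were added."""
    added = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        clean = normalize_row(row)
        if clean is None:
            continue
        key = fingerprint(clean)
        if key in store:
            # Duplicate record: only extend its provenance.
            store[key]["sources"].add(source)
        else:
            store[key] = {**clean, "sources": {source}}
            added += 1
    return added


# Synthetic sample data with mismatched headers, a duplicate, a trash row,
# and a short row with a missing column.
SAMPLE_A = "E-Mail,User,junk\nAlice@Example.com ,alice,x\nalice@example.com,alice,y\n,,z\n"
SAMPLE_B = "login,mail\nalice,alice@example.com\nbob\n"

if __name__ == "__main__":
    store = {}
    ingest(SAMPLE_A, "dump_a", store)
    ingest(SAMPLE_B, "dump_b", store)
    for record in store.values():
        print(record)
```

Keeping the source label out of the fingerprint is a deliberate choice in this sketch: the same record seen in two dumps collapses into one entry whose `sources` set records both origins, which is one simple way to preserve provenance while deduplicating.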
 
Reactions: Fritz12
Member
Joined
April 7, 2025
Messages
21
Reaction score
2
Points
3
<div class="bbWrapper">Thanks for the detailed information. Now I have no doubt. <a href="https://dna.fail/members/61634/" class="username" data-xf-init="member-tooltip" data-user-id="61634" data-username="@AllosOnama">@AllosOnama</a> <img class="smilie smilie--emoji" loading="lazy" alt="😀" title="Grinning face :grinning:" src="https://cdn.jsdelivr.net/joypixels/assets/6.6/png/unicode/64/1f600.png" data-shortname=":grinning:" /></div>
 
Advanced Member
Joined
January 1, 2025
Messages
221
Reaction score
26
Points
28
<div class="bbWrapper">The freebie or leaks sections of clearnet/darknet forums, plus OSINT (using Google dorks).</div>
 
New Member
Joined
June 28, 2025
Messages
3
Reaction score
0
Points
1
<div class="bbWrapper"><blockquote data-attributes="member: 57157" data-quote="a909us3r" data-source="post: 409470" class="bbCodeBlock bbCodeBlock--expandable bbCodeBlock--quote js-expandWatch"> <div class="bbCodeBlock-title"> <a href="/goto/post?id=409470" class="bbCodeBlock-sourceJump" rel="nofollow" data-xf-click="attribution" data-content-selector="#post-409470">a909us3r said:</a> </div> <div class="bbCodeBlock-content"> <div class="bbCodeBlock-expandContent js-expandContent "> Question for the InfoSec Community<br /> <br /> I've been exploring platforms like Intelligence X, where you can search for a domain or email and get results from leaked databases (sometimes in cleartext).<br /> I'm curious: where do such platforms gather this data from?<br /> <br /> Do they:<br /> <br /> 1. Monitor breach forums (like BreachForums)?<br /> 2. Pull from dark web marketplaces?<br /> 3. Scrape paste sites (e.g., Pastebin)?<br /> 4. Use public dumps shared on GitHub, Telegram, or other leak sites?<br /> <br /> Or something else entirely?<br /> <br /> <b>If there are any links or PDFs for learning more, please drop them in the comments. I would like to explore further.</b><br /> <br /> I would love to hear insights on which data sources are commonly used by tools like Intelligence X, DeHashed, Scylla, LeakCheck, etc. </div> </div> </blockquote>ty</div>
 
  • Tags
    breach data data breach data leak intelligence