<div class="bbWrapper">Question for the InfoSec Community<br />
<br />
I've been exploring platforms like Intelligence X, where you can search for a domain or email and get results from leaked databases (sometimes in cleartext).<br />
I'm curious: where do such platforms get this data?<br />
<br />
Do they:<br />
<br />
1. Monitor breach forums (like BreachForums)?<br />
2. Pull from dark web marketplaces?<br />
3. Scrape from paste sites (e.g., Pastebin)?<br />
4. Use public dumps shared on GitHub, Telegram, or other leak sites?<br />
<br />
Or something else entirely?<br />
<br />
<b>If there are any links or PDFs for learning more about this, please drop them in the comments. I would like to explore further.</b><br />
<br />
Would love to hear insights on what data sources are commonly used by tools like Intelligence X, DeHashed, Scylla, LeakCheck, etc.</div>
<div class="bbWrapper">I think it's pretty clear they do a bit of all of the above.<br />
<div class="bbImageWrapper js-lbImage" title="1749214800629.png"
data-src="https://dna.fail/attachments/1749214800629-png.2236/" data-lb-sidebar-href="" data-lb-caption-extra-html="" data-single-image="1">
<img src="https://dna.fail/attachments/1749214800629-png.2236/"
data-url=""
class="bbImage"
data-zoom-target="1"
style=""
alt="1749214800629.png"
title="1749214800629.png"
width="1131" height="253" loading="lazy" />
</div>
<br />
<br />
More interesting to me is the data architecture they use to store, tag, and index what I imagine is a vast ocean of data while keeping track of its provenance. Most leaks contain some dirty data: missing columns and fields, duplicates, and so on, plus trash data if the leak was a full DB dump. The ETL process alone is a pain for these multi-GB data sets.<br />
<br />
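To make the ETL pain concrete, here is a minimal Python sketch of the kind of cleanup step that would be needed. It is not how any of these platforms actually works. The header aliases, the <code>Record</code> schema, and the function names are all assumptions made up for illustration. It maps inconsistent CSV headers onto one schema, drops rows with no usable email, and merges duplicates while remembering which dump each row came from.<br />
<br />

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical header aliases; real dumps vary far more than this.
ALIASES = {
    "email": "email", "e-mail": "email", "mail": "email",
    "user": "username", "username": "username", "login": "username",
    "password": "password", "pass": "password", "pwd": "password",
}


@dataclass(frozen=True)
class Record:
    email: str
    username: str
    password: str
    source: str  # provenance tag: which dump this row came from


def normalize(raw: str, source: str) -> list[Record]:
    """Parse one CSV dump, map its headers to the common schema, skip trash rows."""
    records = []
    for row in csv.DictReader(io.StringIO(raw)):
        clean = {}
        for key, value in row.items():
            # Extra columns arrive under a None key, short rows yield None values.
            field = ALIASES.get((key or "").strip().lower())
            if field and value:
                clean[field] = value.strip()
        email = clean.get("email", "").lower()
        if "@" not in email:
            continue  # no usable email, treat the row as trash
        records.append(
            Record(email, clean.get("username", ""), clean.get("password", ""), source)
        )
    return records


def dedupe(records: list[Record]) -> dict[tuple[str, str, str], set[str]]:
    """Merge identical rows, keeping the set of sources each one appeared in."""
    merged: dict[tuple[str, str, str], set[str]] = {}
    for r in records:
        merged.setdefault((r.email, r.username, r.password), set()).add(r.source)
    return merged
```

<br />
Keying the merge on the cleaned fields and storing a set of sources keeps provenance without storing the same row once per dump. At multi-GB scale this would have to run as a streaming or database-backed job rather than in memory.<br />
<br />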
I don't think they are much different from most of the more commercial data brokers, who gather data from wherever they can: scraped, "permissioned", leaked, or otherwise. Almost all of them operate in the grey, IMO.</div>
<div class="bbWrapper">Thanks for the detailed information. Now I have no doubt. <a href="https://dna.fail/members/61634/" class="username" data-xf-init="member-tooltip" data-user-id="61634" data-username="@AllosOnama">@AllosOnama</a> <img class="smilie smilie--emoji" loading="lazy" alt="😀" title="Grinning face :grinning:" src="https://cdn.jsdelivr.net/joypixels/assets/6.6/png/unicode/64/1f600.png" data-shortname=":grinning:" /></div>
<div class="bbWrapper">ty</div>