republican-creole
site Search:


 
    All Forums Hot Topics Gallery






how-to block ads


 
Search Topic:
Uniqs:
3825
Share Topic
Posting?
Post a:
Post a:
Links: ·Hijack This logs? ·Panda Free Tools ·Vundo Removal
page: 1 · 2 · 3
AuthorAll Replies


Steve
I know your IP address
Consultant
join:2001-03-10
Yorba Linda, CA
kudos:5

1 edit

Understanding crypto hashes

With all the hubbub regarding the crypto hashes I thought I would give a bit of background on how they work. I'm not really addressing the flaws in particular, mainly because I simply cannot hack the math involved, but the "big picture" might be useful.

MD5 and SHA-* are in the family of "cryptographic one-way hashes", which means they can take a large chunk of data and produce a "fingerprint" that you could say identifies the stream of data. This stream of data could be "a file", "an email message" or - curiously - "a password". This fixed-length fingerprint is also called a "digest", a "hash" or a "checksum"

Two very common uses of crypto hashes:
  • you download a file from a website, and they post the checksums of the data (example here), and you can run a checksum after you've downloaded it. If the checksum doesn't match, then either the file was damaged in transit, or the file was tampered with at the download site. Good thing you verified the checksum!

  • when Linux stores user passwords in the password file, it doesn't store them in cleartext (or anybody could easily get all the passwords). Instead, it can run an MD5 checksum on the password and produce a 128-bit checksum:
    testuser:$1$pTJhZVZ6$iLoDMnbPSekU.lot59fQx/:
    This long string (starting with $1$) is the MD5 checksum, and when you're attempting to login, the password you type is hashed and compared with this string: if they match, you must have typed the right password so it lets you in.
There are two key attributes that make these important:
  1. they are "one way" hashes, meaning that given a hash, you can't recover any knowledge of the original input data (even if the input data was smaller than the output data). If you know somebody's MD5 password hash, you can't get the cleartext password back.
  2. given a random checksum, you can't easily create a different stream of data taht produces the same checksum
It should be self-evident that if there are more bits in the input data than in the checksum, there will be more than one input stream that maps to the same checksum, this is known as a "hash collision", and it's at the center of the current news.

I'll note here that this is not "encryption", where you convert your data stream into a safe format while retaining the ability to get it back later with the right key. Hashes are strictly one way algorithms.

Even though hash collisions are inherently possible, it's supposed to be really hard to intentionally create a data stream with a specific final checksum. Let's see what could happen if somebody figures out a way to generate a collission on purpose.

In the news recently was the whole Bit Torrent/ XP/SP2 mess, where Microsoft XP Service Pack 2 was being distributed on a torrent. Legal/copyright issues aside, the way you could know that the file you got from a torrent was good is by comparing the checksums - I think Microsoft published the sums on their website somewhere.

But if I'm a bad guy, I could take the XP/SP2 distribution, modify it to contain bad stuff, and tweak it to create the same checksum and post it on a download site somewhere. Because the checksums match, everybody would assume the file was OK. Oops.

The security of one-way crypto hashes positively relies on the inability to intentionally create a collision, and this is what the researchers seem to be able to do. I suspect it's always been possible to create a collision with brute-force means, but it requires iterating over 2^128 possible values, and this is simply computationally infeasable.

My reading of the research says that the researchers have found ways to do this "faster than brute force", but still not quite "fast". I get a little bleary-eyed with the math, but it looks like there is a "family" of collisions that are available in 2^40 (a bit more than a trillion) hash steps.

The difference between 2^40 steps and 2^128 steps is about 28 orders of magnitude (!).

Many of our crypto algorithms rely on "things being hard in the opposite direction" (e.g., "one way" algorithms), such as the anti-collision properties of secure hashes, and the inability to factor very very very large nubmers (used for RSA public key crypto) in anything other than brute-force time. When these are called into question - as the hashes are - it's very disruptive.

In the crypto world, this development appears to be Earth shattering: the fundamental underpinnings of some very popular algorithms are in question. I am not qualified enough in crypto to weigh in with more than this hand-waving overview here, so I'm waiting for the cryptorati like my hero Bruce Schneier to provide practical, informed, non-hyped guidance on this.

Steve

--
Stephen J. Friedl * Security Consultant * Tustin, California USA * my web site


MrFixIT
Premium
join:2002-04-12
here

Excellent summary Steve!


x539

join:2003-08-23
Oklahoma City, OK

reply to Steve

quote:
But if I'm a bad guy, I could take the XP/SP2 distribution, modify it to contain bad stuff, and tweak it to create the same checksum and post it on a download site somewhere. Because the checksums match, everybody would assume the file was OK. Oops.

The security of one-way crypto hashes positively relies on the inability to intentionally create a collision, and this is what the researchers seem to be able to do. I suspect it's always been possible to create a collision with brute-force means, but it requires iterating over 2^128 possible values, and this is simply computationally infeasable.
Not unless you're a bad guy with a time machine or a bad guy that knows more than any living cryptographer at this point in time.

Just because you can generate something that has the same checksum as the XP/SP2 distribution (create a collision) does not mean that you can create something of your choosing that matches the checksum. You can generate a garbage file, but that's not going to be useful either to you or the intended victim unless you just want to annoy people by getting them to download a large garbage file instead of the real thing.


Steve
I know your IP address
Consultant
join:2001-03-10
Yorba Linda, CA
kudos:5

said by x539:
Not unless you're a bad guy with a time machine or a bad guy that knows more than any living cryptographer at this point in time.
I generally agree with your overall sentiment, though I think that once things start falling apart at the edges of an algorithm, it's hard to say what happens next.

I agree that "make badware match goodware's MD5" is still a very hard problem.

Steve
--
Stephen J. Friedl * Security Consultant * Tustin, California USA * my web site

x539

join:2003-08-23
Oklahoma City, OK

quote:
I generally agree with your overall sentiment, though I think that once things start falling apart at the edges of an algorithm, it's hard to say what happens next.
Completely agreed. My point was not that there aren't problems here, but that the specific example you gave is not feasible at this point in time. Don't want people panicking that they might have installed a trojaned version of SP2 even if the md5 matched ;-D

The biggest risk right now (I would think) is for programs that use checksums to verify more or less arbitrary input. Something along the lines of password checking, where generated garbage would be as acceptable in terms of input as the true password.


Steve
I know your IP address
Consultant
join:2001-03-10
Yorba Linda, CA
kudos:5

said by x539:
My point was ...
... that Steve is not really a crypto guy and that you're helping fill out the rough edges in his analysis

Steve
--
Stephen J. Friedl * Security Consultant * Tustin, California USA * my web site


KAD Imaging
Just Shoot It
Premium
join:2002-09-21
Hialeah, FL

reply to Steve
What's really funny to me is that I used to say that this was possible (even somewhere on BBR) and was chided quite brutally for it. Thing is...my "modest" 14 years in all facets of IT has taught me one thing...

"That which is done, can always be undone. The problem is time."

It may not be prophetic, but I think it holds true. How can the human mind stump itself?? It can't. Anything someone can create, someone else can destroy. (More so if you by into the whole "4th dimension" thought process where certain persons can think beyond the boundaries of our understanding.)

Software language and distribution is going to get interesting from here out...
--
-CK
Q: "What does a cold air intake, headers, catback, highflow cat , & Port Polish give you??
A: "Ricer on a plate!" lol. Visit SportCompactMiami.com



rchandra
Stargate Universe fan
Premium
join:2000-11-09
14225-2105

reply to Steve
I would like to clarify some things.

A "checksum," as its name implies, is a sum. It is really only used to refer to that class of hashes that use simple summing of data. The algorithms used to generate a checksum vary a little (number of bits in the result, number of data bits in each datum, whether signed or unsigned arithmetic is used to compute the checksum, what is done with an overflow/carry when the output bit size is exceeded, etc.), but it's all basically the same. This is not suitable for any more than a preliminary check of the data because it is susceptible to lots of errors. For example, if a bit is inverted in one datum, and the same bit is also inverted in the next datum, the checksum may come out the same despite two bits being inverted in the data. Similarly, if the order of two data are rearranged, the checksum will still be the same. CRCs (cyclical redundancy checks), MD5s, SHAs and so on make the order of the data significant.

Ironically enough, "SHA" means "secure hash algorithm"

Considering the recent discoveries, if arbitrary data are allowed, it may be possible (albeit difficult) to produce a collision (also sometimes called a "clash") by altering data in more than one place in the file. But what would be difficult in that case would be keeping the file size the same. Consider too that some of the data could be compressed from the original, which would provide a bigger "canvas" for one's arbitrary data to make the computed hash the same while retaining the same file size.

I'll also second another poster's notion that security it seems is at best long term but nonetheless temporary. I am not aware of a lock that's been devised that can't be picked if given enough time. _The Adolenscence Of P-1_ is a novel where this is one of the central themes. Thomas J. Ryan describes security as being a dynamic delaying device. Invention of new locks (hash algorithms) is needed constantly to stay as many steps ahead of the pickers (crackers) as possible. Moore's Law has held for a while, so that which is computationally infeasible now (or too expensive) might not necessarily be so in a few years or a decade.
--
English is a difficult enough language to interpret correctly when its rules are followed, let alone when a writer chooses not to follow those rules. Blog is here
Jeopardy! replies REALLY suck!



jig

join:2001-01-05
Hacienda Heights, CA

reply to x539


i guess i'm not so much worried about someone inserting malware (into a service pack, for example) as someone just inserting incorrect data. if this news is saying that the latter is possible, then even just that is enough to get in a little tizzy about. a program not working correctly can be just as bad or worse than someone's attempt at a working program that purposefully does harm (malware).



keith2468
Premium,MVM
join:2001-02-03
Winnipeg, MB

3 edits

reply to Steve
This is maybe stating the obvious, but if the data whose hash-key you are trying to emulate includes something free-form, like an image or icon, or a compressed image file, the task of making hask keys match after a specific code change will be much simpler than if the data is all executable code and constant data.

Whether it would take 1 week or 10 years elapsed time on a dedicated 3GHz P4 I have no idea.


ghost16825
Use security metrics
Premium
join:2003-08-26

reply to jig

said by jig:


i guess i'm not so much worried about someone inserting malware (into a service pack, for example) as someone just inserting incorrect data. if this news is saying that the latter is possible, then even just that is enough to get in a little tizzy about. a program not working correctly can be just as bad or worse than someone's attempt at a working program that purposefully does harm (malware).

But as has been stated repeatedly, would it still be a working program (executable)?
More that likely nothing would execute at all and you would get "This program is not a valid application" kind of errors. The result is a "junk" file which simply doesn't execute. (full stop) (period) (dot)


Steve
I know your IP address
Consultant
join:2001-03-10
Yorba Linda, CA
kudos:5

said by ghost16825:
But as has been stated repeatedly, would it still be a working program (executable)?
More that likely nothing would execute at all and you would get "This program is not a valid application" kind of errors. The result is a "junk" file which simply doesn't execute. (full stop) (period) (dot)
If it's a PE executable - most are - there is a whole infrastructure for supporting "sections", so you can park huge amounts of data that won't get in the way of anything.

Steve
--
Stephen J. Friedl * Security Consultant * Tustin, California USA * my web site


Snowy
mIRC unix.ro UnderNet
Premium
join:2003-04-05
Kailua, HI
kudos:5
Reviews:
·RoadRunner Cable
·Clearwire Wireless

reply to ghost16825
"But as has been stated repeatedly, would it still be a working program (executable)?

I'd guess not always, but I believe it would be reasonable to expect something, to unexpectedly occur.
Interesting topic.
--
Dave said "By the way, 4294967295 is just another way to write -1".



Steve
I know your IP address
Consultant
join:2001-03-10
Yorba Linda, CA
kudos:5

reply to Steve
In my research this evening I've found a bit more on attributes of cryptographic hash functions, and I'll summarize what I found. I love learning new terms

Note that "very hard" means "computationally infeasible", which is pretty much the same as "brute force".

Collision resistance means that it's very hard to find two inputs that have the same hash value, but you get to pick both inputs.

Preimage resistance means it's very hard to create an input that matches a given hash (but you don't know the input that led to the hash of interest).

Second preimage resistance means it's very hard to create an input that matches a given hash, but you do know the input stream that led to that hash.
I am going out on a limb here to find actual applications for each of them, but I'll try.

I want to beat collision resistance to create two messages that have the same hash: "I agree to the contract" and "I do not agree with the contract", sign the hash with my private key, and send the message. I can later swap out the message itself with the new one and convince everybody that I signed the "other" thing becausethe hash matches.

I want to beat preimage resistance if I am given a hash (an MD5 hash of a UNIX password) and I want to find a word that creates that hash. Remember that when dealing with hashed passwords, I don't have to find the same word you used: I just have to find any word that generates the same hash.

I want to beat second preimage resistance if I want to tamper with a tarball with published MD5 sums. I generate a different tarball with bad stuff inside, but manage to give it the same sum so careful people believe it's genuine.

The researchers only appear to have addressed collision resistance, and by all accounts they have done stunning work. And though they don't seem to have touch the other ones, once an algorithm is wounded, the sharks come from far and wide, and it's a bad bet that these other areas won't receive a little unwanted attention.

Steve
--
Stephen J. Friedl * Security Consultant * Tustin, California USA * my web site

B
Premium,MVM
join:2000-10-28


Very nice thread, Steve. Thank you!

-- B
--
In a realm outside causality and function



Steve
I know your IP address
Consultant
join:2001-03-10
Yorba Linda, CA
kudos:5

I'm here only to serve...

would you like fries with that?


B
Premium,MVM
join:2000-10-28

reply to Steve

said by Steve:
so I'm waiting for the cryptorati like my hero Bruce Schneier to provide practical, informed, non-hyped guidance on this.

And now we have
»Bruce Schneier's Thoughts on the Hash Stories !

-- B
--
In a realm outside causality and function


Steve
I know your IP address
Consultant
join:2001-03-10
Yorba Linda, CA
kudos:5

reply to Steve
Whew, I think I'm finished.

I have spent much of the last three days working on a much more comprehensive Tech Tip to provide background that anybody can understand crypto hashes, how they are used, what "breaking them" means, and what the implications are for the latest news.

Unixwiz.net Tech Tip: An Illustrated Guide to Cryptographic Hashes

Feedback and comments - even typos and spelling errors - are welcome.

Steve
--
Stephen J. Friedl * Security Consultant * Tustin, California USA * my web site



atangel
Now What??
Premium
join:2002-02-18
Bronx, NY

Nicely done. And thanks.



Steve
I know your IP address
Consultant
join:2001-03-10
Yorba Linda, CA
kudos:5

reply to Steve

Wow, digging around in Prof. Edward Felten's weblog there is a bit of perl code that shows two 1024-bit numbers that produce the same hash:

#!/usr/bin/perl -w

use strict;
use Digest::MD5 qw(md5_hex);

# Create a stream of bytes from hex.
my $bytes1 = map {chr(hex($_))} qw(
d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
2f ca b5 87 12 46 7e ab 40 04 58 3e b8 fb 7f 89
55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 71 41 5a
08 51 25 e8 f7 cd c9 9f d9 1d bd f2 80 37 3c 5b
d8 82 3e 31 56 34 8f 5b ae 6d ac d4 36 c9 19 c6
dd 53 e2 b4 87 da 03 fd 02 39 63 06 d2 48 cd a0
e9 9f 33 42 0f 57 7e e8 ce 54 b6 70 80 a8 0d 1e
c6 98 21 bc b6 a8 83 93 96 f9 65 2b 6f f7 2a 70
);

# Create a second stream of bytes from hex.
my $bytes2 = map {chr(hex($_))} qw(
d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
2f ca b5 07 12 46 7e ab 40 04 58 3e b8 fb 7f 89
55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 f1 41 5a
08 51 25 e8 f7 cd c9 9f d9 1d bd 72 80 37 3c 5b
d8 82 3e 31 56 34 8f 5b ae 6d ac d4 36 c9 19 c6
dd 53 e2 34 87 da 03 fd 02 39 63 06 d2 48 cd a0
e9 9f 33 42 0f 57 7e e8 ce 54 b6 70 80 28 0d 1e
c6 98 21 bc b6 a8 83 93 96 f9 65 ab 6f f7 2a 70
);

# Print MD5 hashes
print md5_hex($bytes1), "\n";
print md5_hex($bytes2), "\n";

--
Stephen J. Friedl * Security Consultant * Tustin, California USA * my web site

Wednesday, 23-May 05:58:53 Terms of Use & Privacy | feedback | contact | Hosting by nac.net - DSL,Hosting & Co-lo
over 12.5 years online © 1999-2012 dslreports.com.
Most commented news this week
Hot Topics