On Thu, Jan 19, 2012 at 09:21:17AM +0100, valentino.angeletti (at) enel (dot) com [email concealed] wrote:
> may ask you what software (and how it works brute force ecc) you used?

John the Ripper, indeed - generating a custom .chr file (which is based
on trigraph frequencies) from a sample of 1 million of pwgen'ed
passwords and then using this file to crack another (non-overlapping)
sample of pwgen'ed passwords. My initial notification to oss-security
and Bugtraq included these links, which describe this in more detail:

However, as I wrote in a followup posting to oss-security 2 days ago:

"I might update/revise my analysis on this issue in a few days.

Specifically, I now suspect that a (large) part of the apparent
non-uniformity of the distribution was in fact an artifact of my
analysis approach. I only analyzed sets of 1 million of pwgen'ed
passwords, so I could not directly check the distribution of full
passwords (1 million is too little, even compared to the small keyspace
of these passwords), whereas JtR only uses trigraph frequencies.

I am now generating 1 billion of pwgen'ed passwords, which should take a
couple of days to complete. [...]"

$ ./pwgen -1cn 8 1000000000 | dd obs=10M > 1g
17578125+0 records in
858+1 records out
9000000000 bytes (9.0 GB) copied, 147496 seconds, 61.0 kB/s

And I analyzed this larger sample briefly:

$ time ~/john/john-1.7.9-jumbo-5/run/unique -v -mem=25 1gu < 1g
Total lines read 1000000000 Unique lines written 697066573

real 144m40.619s
user 142m8.727s
sys 0m39.645s

So that's 697 million unique passwords in 1 billion, which for a uniform
distribution would correspond to a total keyspace size of 1.3 billion:

$ ./solve 697066573 1000000000
1296935185

I've attached the solve.c program to this message. [ BTW, I verified
that there's no fatal precision loss in its expected_different()
function (despite of the risky expression) for the value ranges on which
it is called here. I did so by also computing the expected different
value with a different (much slower) algorithm - just not as part of
equation solving (which would be slower yet). ]

However, let's see what numbers we get for smaller samples (actually,
subsets of the 1 billion sample above, but that's OK in this case):

Total lines read 100000000 Unique lines written 89163247
Total lines read 10000000 Unique lines written 9811335
Total lines read 1000000 Unique lines written 997978

As we can see, the guess for the total keyspace size keeps increasing as
we increase the sample size. That's under assumption that we have a
uniform distribution. Hence, our distribution is non-uniform.

That said, the keyspace may in fact be smaller than I had expected,
although I haven't hit it with my 1 billion sample yet. So we have a
mix of two problems here: likely small keyspace and non-uniform
distribution.

My John the Ripper pwgen.chr attack was probably testing a lot of
passwords that are actually impossible, so a much faster attack (even
more specifically focused on pwgen'ed passwords) should be possible.

I think I underestimated just how much smaller pwgen's pronounceable
passwords keyspace is compared to the full {62 different, length 8}
keyspace, although we still do not have the exact number.

I continue to think that the primary problem in terms of pwgen use is
that these passwords look much stronger than they actually are. For
example:

Looking at these, how many people would realize that the keyspace for
them may be thousands of times smaller than the full {62 different,
length 8} keyspace and that the distribution may be non-uniform?

Based on the 1 billion sample, the keyspace is 168,350 times smaller,
although this estimate has the non-uniformity "factored in" (a larger
sample would show a somewhat larger keyspace estimate).

A partial fix may be for pwgen to print a warning each time it is used
in this mode and with output to a tty (it already behaves differently
based on whether its output is a tty or not, so that won't be a new
drawback). Also, the default mode may be changed to the "secure" one,
with the weak alternative available via a non-default option.

<plug>
Or indeed people can just use pwqgen instead:

http://www.openwall.com/passwdqc/

$ for n in {1..10}; do pwqgen; done
Warm5Claw4Blame
hungry5tomato3Yeah
Midst_Vowel9Spate
Ohio7steak$Mild
Taxi&desert+gorge
fond-Pint=easy
mode6oldest5chief
Defeat7Oval-Anew
vomit+ate2Slid
tehran8hang3ritual

These are 47-bit (similar to the full {62 different, length 8} keyspace,
but at a longer length and easier to memorize). This can easily be
adjusted from the command-line:

$ for n in {1..3}; do pwqgen random=30; done
spend!deep
Alkali-self
decay9your
$ for n in {1..3}; do pwqgen random=64; done
meet-draft9Gun*wire
inner+Rusty4dogma3tape
Switch8Sword9even=Viral

The 30-bit ones above are comparable to pwgen's in security (about as
weak). In fact, they may be slightly better due to uniform distribution
(as long as /dev/urandom works well). The 64-bit ones are unreasonable
for most users/uses. The default of 47 bits is reasonable, although as
always this depends on threat model.
</plug>

Alexander
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

min = 0; max = pow(62, 8);
do {
keyspace = (min + max) * 0.5;
if (expected_different(keyspace, total) > unique)
max = keyspace;
else
min = keyspace;
} while (max - min > 0.5);

> may ask you what software (and how it works brute force ecc) you used?

John the Ripper, indeed - generating a custom .chr file (which is based

on trigraph frequencies) from a sample of 1 million of pwgen'ed

passwords and then using this file to crack another (non-overlapping)

sample of pwgen'ed passwords. My initial notification to oss-security

and Bugtraq included these links, which describe this in more detail:

http://www.openwall.com/lists/john-users/2010/11/17/7

http://www.openwall.com/lists/john-users/2010/11/22/5

http://www.openwall.com/lists/john-users/2010/11/28/1

http://www.openwall.com/lists/john-users/2010/12/06/1

However, as I wrote in a followup posting to oss-security 2 days ago:

"I might update/revise my analysis on this issue in a few days.

Specifically, I now suspect that a (large) part of the apparent

non-uniformity of the distribution was in fact an artifact of my

analysis approach. I only analyzed sets of 1 million of pwgen'ed

passwords, so I could not directly check the distribution of full

passwords (1 million is too little, even compared to the small keyspace

of these passwords), whereas JtR only uses trigraph frequencies.

I am now generating 1 billion of pwgen'ed passwords, which should take a

couple of days to complete. [...]"

http://www.openwall.com/lists/oss-security/2012/01/17/14

This has in fact completed by now:

$ ./pwgen -1cn 8 1000000000 | dd obs=10M > 1g

17578125+0 records in

858+1 records out

9000000000 bytes (9.0 GB) copied, 147496 seconds, 61.0 kB/s

And I analyzed this larger sample briefly:

$ time ~/john/john-1.7.9-jumbo-5/run/unique -v -mem=25 1gu < 1g

Total lines read 1000000000 Unique lines written 697066573

real 144m40.619s

user 142m8.727s

sys 0m39.645s

So that's 697 million unique passwords in 1 billion, which for a uniform

distribution would correspond to a total keyspace size of 1.3 billion:

$ ./solve 697066573 1000000000

1296935185

I've attached the solve.c program to this message. [ BTW, I verified

that there's no fatal precision loss in its expected_different()

function (despite of the risky expression) for the value ranges on which

it is called here. I did so by also computing the expected different

value with a different (much slower) algorithm - just not as part of

equation solving (which would be slower yet). ]

However, let's see what numbers we get for smaller samples (actually,

subsets of the 1 billion sample above, but that's OK in this case):

Total lines read 100000000 Unique lines written 89163247

Total lines read 10000000 Unique lines written 9811335

Total lines read 1000000 Unique lines written 997978

$ ./solve 89163247 100000000

427419891

$ ./solve 9811335 10000000

261676022

$ ./solve 997978 1000000

246946702

As we can see, the guess for the total keyspace size keeps increasing as

we increase the sample size. That's under assumption that we have a

uniform distribution. Hence, our distribution is non-uniform.

That said, the keyspace may in fact be smaller than I had expected,

although I haven't hit it with my 1 billion sample yet. So we have a

mix of two problems here: likely small keyspace and non-uniform

distribution.

My John the Ripper pwgen.chr attack was probably testing a lot of

passwords that are actually impossible, so a much faster attack (even

more specifically focused on pwgen'ed passwords) should be possible.

I think I underestimated just how much smaller pwgen's pronounceable

passwords keyspace is compared to the full {62 different, length 8}

keyspace, although we still do not have the exact number.

I continue to think that the primary problem in terms of pwgen use is

that these passwords look much stronger than they actually are. For

example:

$ pwgen

athu9Bee Vae0jexa rae2Oa1c Aim8Ku3c No5aep0F OhY5quee ieVae2ti wah1aiM2

oaNg1oth baePule5 sod8oH6i ohfoh5Du Pai9Uch7 AeG3bies Maev6tae iKievae9

zo9eiSai Xito9aid iGh3ay8s owib0Ub8 Yahm0oaC Wu3VaiK7 IeK3sah2 xai7Eico

...

Looking at these, how many people would realize that the keyspace for

them may be thousands of times smaller than the full {62 different,

length 8} keyspace and that the distribution may be non-uniform?

Based on the 1 billion sample, the keyspace is 168,350 times smaller,

although this estimate has the non-uniformity "factored in" (a larger

sample would show a somewhat larger keyspace estimate).

A partial fix may be for pwgen to print a warning each time it is used

in this mode and with output to a tty (it already behaves differently

based on whether its output is a tty or not, so that won't be a new

drawback). Also, the default mode may be changed to the "secure" one,

with the weak alternative available via a non-default option.

<plug>

Or indeed people can just use pwqgen instead:

http://www.openwall.com/passwdqc/

$ for n in {1..10}; do pwqgen; done

Warm5Claw4Blame

hungry5tomato3Yeah

Midst_Vowel9Spate

Ohio7steak$Mild

Taxi&desert+gorge

fond-Pint=easy

mode6oldest5chief

Defeat7Oval-Anew

vomit+ate2Slid

tehran8hang3ritual

These are 47-bit (similar to the full {62 different, length 8} keyspace,

but at a longer length and easier to memorize). This can easily be

adjusted from the command-line:

$ for n in {1..3}; do pwqgen random=30; done

spend!deep

Alkali-self

decay9your

$ for n in {1..3}; do pwqgen random=64; done

meet-draft9Gun*wire

inner+Rusty4dogma3tape

Switch8Sword9even=Viral

The 30-bit ones above are comparable to pwgen's in security (about as

weak). In fact, they may be slightly better due to uniform distribution

(as long as /dev/urandom works well). The 64-bit ones are unreasonable

for most users/uses. The default of 47 bits is reasonable, although as

always this depends on threat model.

</plug>

Alexander

#include <math.h>

#include <stdio.h>

#include <stdlib.h>

static double expected_different(double keyspace, double subset)

{

return keyspace * (1.0 - pow((keyspace - 1.0) / keyspace, subset));

}

int main(int argc, char **argv) {

double unique, total, keyspace, min, max;

if (argc != 3)

return 1;

unique = atof(argv[1]);

total = atof(argv[2]);

min = 0; max = pow(62, 8);

do {

keyspace = (min + max) * 0.5;

if (expected_different(keyspace, total) > unique)

max = keyspace;

else

min = keyspace;

} while (max - min > 0.5);

printf("%.0f\n", keyspace);

return 0;

}

[ reply ]