X-Git-Url: https://www.fleuret.org/cgi-bin/gitweb/gitweb.cgi?a=blobdiff_plain;f=finddup.1;h=faaef4de3dc324f2786f6ebb38cf48608c5a926f;hb=d2131aa73229a6e713cca1ca95b714ff8e68ca12;hp=9cc21b4f13e9f2c5536f0a89423ab9c37bc0a240;hpb=a61c9478f31b957e0d4007df9feddd6f0139ccf8;p=finddup.git

diff --git a/finddup.1 b/finddup.1
index 9cc21b4..faaef4d 100644
--- a/finddup.1
+++ b/finddup.1
@@ -63,16 +63,28 @@ show the real path of the files
 files with same inode are considered as different
 .TP
 \fB-m\fR, \fB--md5\fR
-use MD5 hashing
+use MD5 hashing (if compiled with the option)
 
 .SH "BUGS"
 
 None known, probably many. Valgrind does not complain though.
 
-The MD5 hashing often hurts more than it helps, hence it is off by
-default. The only case when it should really be useful is when you
-have plenty of different files of same size, which does not happen
-often.
+The MD5 hashing is not satisfactory. It is computed for a file only if
+the said file has to be read fully for a comparison (i.e. two files
+match and we have to read them completely).
+
+Hence, in practice lot of partial MD5s are computed, which costs a lot
+of cpu and is useless. This often hurts more than it helps. The only
+case when it should really be useful is when you have plenty of
+different files of same size, and lot of similar ones, which does not
+happen often.
+
+Forcing the files to be read fully so that the MD5s are properly
+computed is not okay neither, since it would fully read certain files,
+even if we will never need their MD5s.
+
+Anyway, it has to be compiled in with 'make WITH_MD5=yes', and even in
+that case it will be off by default
 
 .SH "WISH LIST"
 