X-Git-Url: https://www.fleuret.org/cgi-bin/gitweb/gitweb.cgi?p=finddup.git;a=blobdiff_plain;f=finddup.1;h=896cc88a4090a8fc9e85f965bd52424d56d60c8e;hp=dbe96411cbde0e264ce3cf04e944ce2035586e4d;hb=HEAD;hpb=53e31a1b26de7b33880e6860c1096f7d0284e0eb

diff --git a/finddup.1 b/finddup.1
index dbe9641..896cc88 100644
--- a/finddup.1
+++ b/finddup.1
@@ -95,7 +95,7 @@ file content.
 
 Here are the things I tried, which did not help at all: (1) Computing
 md5s on the whole files, which is not satisfactory because files are
-often not read entirely, hence the md5s can not be properly computed,
+often not read entirely, hence the md5s cannot be properly computed,
 (2) computing XORs of the first 4, 16 and 256 bytes with rejection as
 soon as one does not match, (3) reading files in parts of increasing
 sizes so that rejection could be done with only a small fraction read
@@ -105,7 +105,7 @@ when possible, (4) using mmap instead of open/read.
 
 The format of the output should definitely be improved. Not clear how.
 
-Their could be some fancy option to link two instances of the command
+There could be some fancy option to link two instances of the command
 running on different machines to reduce network disk accesses. This
 may not help much though.
 