CVS log for pgsql/contrib/fuzzystrmatch/fuzzystrmatch.c

Keyword substitution: kv
Default branch: MAIN


Revision 1.34
Mon Aug 2 23:20:23 2010 UTC (17 months, 3 weeks ago) by rhaas
Branches: MAIN
CVS tags: REL9_1_ALPHA1, HEAD
Changes since revision 1.33: +118 -19 lines
Teach levenshtein() about multi-byte characters.

Based on a patch by, and further ideas from, Alexander Korotkov.
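
As a rough illustration of what character-aware (rather than byte-aware) comparison means, here is a minimal sketch. It assumes valid UTF-8 input and uses hypothetical helper names (utf8_char_len, split_chars, levenshtein_chars); it is not the committed code, which works with the server's own encoding routines.

```c
#include <stdlib.h>
#include <string.h>

/* Byte length of the UTF-8 character whose first byte is c (valid input assumed). */
static int utf8_char_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Store each character's byte offset in offs[0..n-1], the end offset in offs[n]; return n. */
static int split_chars(const char *s, int len, int *offs)
{
    int n = 0, i = 0;

    while (i < len) {
        offs[n++] = i;
        i += utf8_char_len((unsigned char) s[i]);
        if (i > len)
            i = len;            /* clamp a truncated trailing sequence */
    }
    offs[n] = len;
    return n;
}

/* Unit-cost edit distance counted in characters, not bytes; -1 on allocation failure. */
int levenshtein_chars(const char *s, const char *t)
{
    int slen = (int) strlen(s), tlen = (int) strlen(t);
    int *so = malloc((size_t) (slen + 1) * sizeof(int));
    int *to = malloc((size_t) (tlen + 1) * sizeof(int));
    int *prev = malloc((size_t) (tlen + 1) * sizeof(int));
    int *curr = malloc((size_t) (tlen + 1) * sizeof(int));
    int m, n, i, j, result = -1;

    if (so && to && prev && curr) {
        m = split_chars(s, slen, so);
        n = split_chars(t, tlen, to);
        for (j = 0; j <= n; j++)
            prev[j] = j;
        for (i = 1; i <= m; i++) {
            int sl = so[i] - so[i - 1];     /* byte length of i-th source character */
            int *tmp;

            curr[0] = i;
            for (j = 1; j <= n; j++) {
                int tl = to[j] - to[j - 1];
                int same = (sl == tl &&
                            memcmp(s + so[i - 1], t + to[j - 1], (size_t) sl) == 0);
                int best = prev[j - 1] + (same ? 0 : 1);

                if (prev[j] + 1 < best) best = prev[j] + 1;
                if (curr[j - 1] + 1 < best) best = curr[j - 1] + 1;
                curr[j] = best;
            }
            tmp = prev; prev = curr; curr = tmp;
        }
        result = prev[n];
    }
    free(so); free(to); free(prev); free(curr);
    return result;
}
```

With this sketch, comparing "café" (é encoded as two bytes) with "cafe" yields 1, where a byte-wise comparison would report 2.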

Revision 1.33
Thu Jul 29 20:11:48 2010 UTC (18 months ago) by rhaas
Branches: MAIN
Changes since revision 1.32: +11 -11 lines
Avoid using text_to_cstring() in levenshtein functions.

Operating directly on the underlying varlena saves palloc and memcpy
overhead, which testing shows to be significant.

Extracted from a larger patch by Alexander Korotkov.

Revision 1.32
Sat Jan 2 16:57:32 2010 UTC (2 years ago) by momjian
Branches: MAIN
CVS tags: REL9_0_STABLE, REL9_0_RC1, REL9_0_BETA4, REL9_0_BETA3, REL9_0_BETA2, REL9_0_BETA1, REL9_0_ALPHA5_BRANCH, REL9_0_ALPHA5, REL9_0_ALPHA4_BRANCH, REL9_0_ALPHA4, REL9_0_0
Changes since revision 1.31: +2 -2 lines
Update copyright for the year 2010.

Revision 1.30.2.1
Thu Dec 10 01:54:21 2009 UTC (2 years, 1 month ago) by rhaas
Branches: REL8_4_STABLE
CVS tags: REL8_4_4, REL8_4_3, REL8_4_2
Changes since revision 1.30: +7 -7 lines
Fix levenshtein with costs.  The previous code multiplied by the cost in only
3 of the 7 relevant locations.

Marcin Mank, slightly adjusted by me.

Revision 1.31
Thu Dec 10 01:54:17 2009 UTC (2 years, 1 month ago) by rhaas
Branches: MAIN
CVS tags: REL8_5_ALPHA3_BRANCH, REL8_5_ALPHA3
Changes since revision 1.30: +7 -7 lines
Fix levenshtein with costs.  The previous code multiplied by the cost in only
3 of the 7 relevant locations.

Marcin Mank, slightly adjusted by me.
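
For readers unfamiliar with the cost-weighted variant, the sketch below shows one way the three costs can enter the recurrence: in the first-row and first-column initialisation as well as in each of the three per-cell choices. The function name levenshtein_costs and its signature are assumptions made for illustration, and the code is byte-wise; it does not reproduce the committed implementation or its seven locations.

```c
#include <stdlib.h>
#include <string.h>

/* Smallest of three ints. */
static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;

    return m < c ? m : c;
}

/*
 * Cost of turning src into dst, byte by byte, with separate insertion,
 * deletion and substitution costs. Returns -1 on allocation failure.
 */
int levenshtein_costs(const char *src, const char *dst,
                      int ins_c, int del_c, int sub_c)
{
    int m = (int) strlen(src);
    int n = (int) strlen(dst);
    int *prev = malloc((size_t) (n + 1) * sizeof(int));
    int *curr = malloc((size_t) (n + 1) * sizeof(int));
    int i, j, result;

    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    /* Empty source prefix: reaching j target bytes takes j insertions. */
    for (j = 0; j <= n; j++)
        prev[j] = j * ins_c;

    for (i = 1; i <= m; i++) {
        int *tmp;

        /* Empty target prefix: dropping i source bytes takes i deletions. */
        curr[0] = i * del_c;
        for (j = 1; j <= n; j++) {
            int sub = prev[j - 1] + (src[i - 1] == dst[j - 1] ? 0 : sub_c);

            curr[j] = min3(prev[j] + del_c, curr[j - 1] + ins_c, sub);
        }
        tmp = prev; prev = curr; curr = tmp;
    }
    result = prev[n];
    free(prev);
    free(curr);
    return result;
}
```

Forgetting to scale any one of these terms goes unnoticed while all costs are 1, which is why unequal costs make better test inputs: with costs (1, 1, 5), turning "a" into "b" should cost 2 (delete plus insert), not 5.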

Revision 1.30
Thu Jun 11 14:48:51 2009 UTC (2 years, 7 months ago) by momjian
Branches: MAIN
CVS tags: REL8_5_ALPHA2_BRANCH, REL8_5_ALPHA2, REL8_5_ALPHA1_BRANCH, REL8_5_ALPHA1, REL8_4_RC2, REL8_4_RC1, REL8_4_1, REL8_4_0
Branch point for: REL8_4_STABLE
Changes since revision 1.29: +40 -39 lines
8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.

Revision 1.29
Tue Apr 7 15:53:54 2009 UTC (2 years, 9 months ago) by tgl
Branches: MAIN
CVS tags: REL8_4_BETA2, REL8_4_BETA1
Changes since revision 1.28: +28 -10 lines
Defend against non-ASCII letters in fuzzystrmatch code.  The functions
still don't behave very sanely for multibyte encodings, but at least
they won't be indexing off the ends of static arrays.
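
The kind of defence described here can be sketched as a range check before any letter-indexed table lookup. The table and function below (vowel_table, is_ascii_vowel) are hypothetical and exist only to show the pattern; they are not taken from fuzzystrmatch.c.

```c
/* Hypothetical 26-entry table indexed by letter: 1 for A, E, I, O, U; 0 otherwise. */
static const char vowel_table[26] = {
    1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,
    0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0
};

/* Look up a byte in the table, but only after proving it is an ASCII letter. */
static int is_ascii_vowel(char c)
{
    unsigned char uc = (unsigned char) c;

    /* Fold ASCII lower case to upper case without consulting the locale. */
    if (uc >= 'a' && uc <= 'z')
        uc = (unsigned char) (uc - 'a' + 'A');

    if (uc >= 'A' && uc <= 'Z')
        return vowel_table[uc - 'A'];

    /* Anything else (digits, punctuation, bytes >= 0x80) never touches the table. */
    return 0;
}
```

Without the range check, a byte such as 0xE9 would produce an index far outside the 26-element array.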

Revision 1.28
Thu Jan 1 17:23:32 2009 UTC (3 years ago) by momjian
Branches: MAIN
Changes since revision 1.27: +2 -2 lines
Update copyright for 2009.

Revision 1.27
Thu Apr 3 21:13:07 2008 UTC (3 years, 9 months ago) by tgl
Branches: MAIN
Changes since revision 1.26: +218 -123 lines
Add a variant of the Levenshtein string-distance function that lets the user
specify the cost values to use, instead of always using 1's.
Volkan Yazici

In passing, remove fuzzystrmatch.h, which contained a bunch of stuff that had
no business being in a .h file; fold it into its only user, fuzzystrmatch.c.

Revision 1.26
Tue Mar 25 22:42:41 2008 UTC (3 years, 10 months ago) by tgl
Branches: MAIN
Changes since revision 1.25: +17 -35 lines
Simplify and standardize conversions between TEXT datums and ordinary C
strings.  This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString.  A number of
existing macros that provided variants on these themes were removed.

Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed.  There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).

This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring.  We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.

Brendan Jurd and Tom Lane

Revision 1.25
Tue Jan 1 19:45:45 2008 UTC (4 years ago) by momjian
Branches: MAIN
CVS tags: REL8_3_STABLE, REL8_3_RC2, REL8_3_RC1, REL8_3_9, REL8_3_8, REL8_3_7, REL8_3_6, REL8_3_5, REL8_3_4, REL8_3_3, REL8_3_2, REL8_3_11, REL8_3_10, REL8_3_1, REL8_3_0
Changes since revision 1.24: +2 -2 lines
Update copyrights in source tree to 2008.

Revision 1.24
Tue Feb 13 18:00:35 2007 UTC (4 years, 11 months ago) by momjian
Branches: MAIN
CVS tags: REL8_3_BETA4, REL8_3_BETA3, REL8_3_BETA2, REL8_3_BETA1
Changes since revision 1.23: +4 -4 lines
Update /contrib/fuzzystrmatch error message to mention bytes, not just
'length', which can be characters.

Revision 1.23
Fri Jan 5 22:19:18 2007 UTC (5 years ago) by momjian
Branches: MAIN
Changes since revision 1.22: +2 -2 lines
Update CVS HEAD for 2007 copyright.  Back branches are typically not
back-stamped for this.

Revision 1.22
Mon Jul 10 18:40:16 2006 UTC (5 years, 6 months ago) by momjian
Branches: MAIN
CVS tags: REL8_2_STABLE, REL8_2_RC1, REL8_2_BETA3, REL8_2_BETA2, REL8_2_BETA1, REL8_2_9, REL8_2_8, REL8_2_7, REL8_2_6, REL8_2_5, REL8_2_4, REL8_2_3, REL8_2_2, REL8_2_17, REL8_2_16, REL8_2_15, REL8_2_14, REL8_2_13, REL8_2_12, REL8_2_11, REL8_2_10, REL8_2_1, REL8_2_0
Changes since revision 1.21: +4 -4 lines
Remove a few baby-C macros in fuzzystrmatch.  Add a few missing includes.

Revision 1.21
Tue May 30 22:12:13 2006 UTC (5 years, 8 months ago) by tgl
Branches: MAIN
Changes since revision 1.20: +3 -1 lines
Magic blocks don't do us any good unless we use 'em ... so install one
in every shared library.

Revision 1.20
Sun Mar 19 22:22:56 2006 UTC (5 years, 10 months ago) by neilc
Branches: MAIN
Changes since revision 1.19: +1 -5 lines
Fix a few places that were checking for the return value of palloc() to be
non-NULL: palloc() ereports on OOM, so we can safely assume it returns a
valid pointer.

Revision 1.19
Sat Mar 11 04:38:29 2006 UTC (5 years, 10 months ago) by momjian
Branches: MAIN
Changes since revision 1.18: +1 -0 lines
Add CVS tag lines to files that were lacking them.

Revision 1.18
Sun Mar 5 15:58:19 2006 UTC (5 years, 10 months ago) by momjian
Branches: MAIN
Changes since revision 1.17: +1 -1 lines
Update copyright for 2006.  Update scripts.

Revision 1.17
Sat Oct 15 02:49:05 2005 UTC (6 years, 3 months ago) by momjian
Branches: MAIN
CVS tags: REL8_1_STABLE, REL8_1_9, REL8_1_8, REL8_1_7, REL8_1_6, REL8_1_5, REL8_1_4, REL8_1_3, REL8_1_21, REL8_1_20, REL8_1_2, REL8_1_19, REL8_1_18, REL8_1_17, REL8_1_16, REL8_1_15, REL8_1_14, REL8_1_13, REL8_1_12, REL8_1_11, REL8_1_10, REL8_1_1, REL8_1_0RC1, REL8_1_0BETA4, REL8_1_0
Changes since revision 1.16: +36 -37 lines
Standard pgindent run for 8.1.

Revision 1.16
Fri Sep 30 22:38:44 2005 UTC (6 years, 3 months ago) by momjian
Branches: MAIN
CVS tags: REL8_1_0BETA3
Changes since revision 1.15: +1 -1 lines
One of the web pages mentioned in dmetaphone.c has moved.  Also fix
a few typos in comments.

The dictionaries I checked list "altho" as a variant of "although,"
but I didn't find any other instances of the former in the source
tree so I changed it.

Michael Fuhr

Revision 1.15
Wed Jan 26 08:04:04 2005 UTC (7 years ago) by neilc
Branches: MAIN
CVS tags: REL8_1_0BETA2, REL8_1_0BETA1
Changes since revision 1.14: +20 -0 lines
The attached patch implements the soundex difference function which
compares two strings' soundex values for similarity, from Kris Jurka.
Also mark the text_soundex() function as STRICT, to avoid crashing
on NULL input.
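
To make the idea of a soundex difference concrete, the sketch below computes two 4-character soundex codes and counts the positions at which they agree, giving a score from 0 (no match) to 4 (identical codes). The digit table is the classic soundex assignment; the function names and the exact handling of edge cases are assumptions for illustration and may differ from the committed code.

```c
#include <ctype.h>
#include <string.h>

#define SOUNDEX_LEN 4

/* Classic soundex digit for each letter A..Z ('0' means "not coded"). */
static const char soundex_table[] = "01230120022455012623010202";

/* Soundex digit for one byte; anything outside A..Z is treated as uncoded. */
static char soundex_code(char c)
{
    int up = toupper((unsigned char) c);

    if (up >= 'A' && up <= 'Z')
        return soundex_table[up - 'A'];
    return '0';
}

/* Write the soundex code of in to out; out must hold SOUNDEX_LEN + 1 bytes. */
static void soundex(const char *in, char *out)
{
    int count = 0;
    char prev;

    memset(out, 0, SOUNDEX_LEN + 1);

    /* Skip leading non-alphabetic bytes; an all-junk input gives an empty code. */
    while (*in && !isalpha((unsigned char) *in))
        in++;
    if (!*in)
        return;

    /* The first letter is kept as is (upper-cased). */
    out[count++] = (char) toupper((unsigned char) *in);
    prev = soundex_code(*in++);

    for (; *in && count < SOUNDEX_LEN; in++) {
        char code;

        if (!isalpha((unsigned char) *in))
            continue;
        code = soundex_code(*in);
        /* Emit a digit only when it is coded and differs from the previous letter's. */
        if (code != prev && code != '0')
            out[count++] = code;
        prev = code;
    }

    /* Pad short codes with zeros. */
    while (count < SOUNDEX_LEN)
        out[count++] = '0';
}

/* Number of positions (0..4) at which the two soundex codes agree. */
static int soundex_difference(const char *a, const char *b)
{
    char sa[SOUNDEX_LEN + 1], sb[SOUNDEX_LEN + 1];
    int i, same = 0;

    soundex(a, sa);
    soundex(b, sb);
    for (i = 0; i < SOUNDEX_LEN; i++)
        if (sa[i] == sb[i])
            same++;
    return same;
}
```

Under this sketch "Anne" codes to A500 and "Andrew" to A536, so their difference is 2, while "Robert" and "Rupert" both code to R163 and score 4.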

Revision 1.14
Sat Jan 1 05:43:06 2005 UTC (7 years ago) by momjian
Branches: MAIN
CVS tags: REL8_0_STABLE, REL8_0_9, REL8_0_8, REL8_0_7, REL8_0_6, REL8_0_5, REL8_0_4, REL8_0_3, REL8_0_25, REL8_0_24, REL8_0_23, REL8_0_22, REL8_0_21, REL8_0_20, REL8_0_2, REL8_0_19, REL8_0_18, REL8_0_17, REL8_0_16, REL8_0_15, REL8_0_14, REL8_0_13, REL8_0_12, REL8_0_11, REL8_0_10, REL8_0_1, REL8_0_0RC5, REL8_0_0RC4, REL8_0_0
Changes since revision 1.13: +1 -1 lines
Update copyrights that were missed.

Revision 1.13
Sun Aug 29 04:12:17 2004 UTC (7 years, 5 months ago) by momjian
Branches: MAIN
CVS tags: REL8_0_0RC3, REL8_0_0RC2, REL8_0_0RC1, REL8_0_0BETA5, REL8_0_0BETA4, REL8_0_0BETA3, REL8_0_0BETA2
Changes since revision 1.12: +1 -1 lines
Update copyright to 2004.

Revision 1.12
Thu Jul 1 03:25:48 2004 UTC (7 years, 6 months ago) by joe
Branches: MAIN
CVS tags: REL8_0_0BETA1
Changes since revision 1.11: +6 -0 lines
Add double metaphone code from Andrew Dunstan. Also change metaphone so that
an empty input string causes an empty output string to be returned, instead of
throwing an ERROR -- per complaint from Aaron Hillegass, and consistent with
double metaphone. Fix examples in README.soundex pointed out by James Robinson.

Revision 1.11
Mon Aug 4 23:59:37 2003 UTC (8 years, 5 months ago) by tgl
Branches: MAIN
CVS tags: WIN32_DEV, REL7_4_STABLE, REL7_4_RC2, REL7_4_RC1, REL7_4_BETA5, REL7_4_BETA4, REL7_4_BETA3, REL7_4_BETA2, REL7_4_BETA1, REL7_4_9, REL7_4_8, REL7_4_7, REL7_4_6, REL7_4_5, REL7_4_4, REL7_4_3, REL7_4_29, REL7_4_28, REL7_4_27, REL7_4_26, REL7_4_25, REL7_4_24, REL7_4_23, REL7_4_22, REL7_4_21, REL7_4_20, REL7_4_2, REL7_4_19, REL7_4_18, REL7_4_17, REL7_4_16, REL7_4_15, REL7_4_14, REL7_4_13, REL7_4_12, REL7_4_11, REL7_4_10, REL7_4_1, REL7_4
Changes since revision 1.10: +1 -1 lines
Fix some copyright notices that weren't updated.  Improve copyright tool
so it won't miss 'em again.

Revision 1.10
Mon Aug 4 00:43:10 2003 UTC (8 years, 5 months ago) by momjian
Branches: MAIN
Changes since revision 1.9: +3 -3 lines
pgindent run.

Revision 1.9
Thu Jul 24 17:52:25 2003 UTC (8 years, 6 months ago) by tgl
Branches: MAIN
Changes since revision 1.8: +24 -5 lines
Error message editing in contrib (mostly by Joe Conway --- thanks Joe!)

Revision 1.8
Tue Jun 24 22:59:46 2003 UTC (8 years, 7 months ago) by momjian
Branches: MAIN
Changes since revision 1.7: +6 -5 lines
Jim C. Nasby wrote:
> Second argument to metaphone is supposed to set the limit on the
> number of characters to return, but it breaks on some phrases:
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'Hello world'::varchar AS a) a;
> HLW       | HLWR      | HLWRLT
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
> AKM       | AKMKS     | AKMKSMMRL
>
> In every case I've found that does this, the 4th and 5th letters are
> always 'KS'.

Nice catch.

There was a bug in the original metaphone algorithm from CPAN. Patch
attached (while I was at it I updated my email address, changed the
copyright to PGDG, and removed an unnecessary palloc). Here's how it
looks now:

regression=# select metaphone(a,4) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMK
(1 row)

regression=# select metaphone(a,5) from (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMKS
(1 row)

Joe Conway

Revision 1.7
Mon Mar 10 22:28:17 2003 UTC (8 years, 10 months ago) by tgl
Branches: MAIN
Changes since revision 1.6: +3 -3 lines
This patch fixes a bunch of spelling mistakes in comments throughout the
PostgreSQL source code.

Neil Conway

Revision 1.6
Sun Dec 30 23:09:41 2001 UTC (10 years, 1 month ago) by tgl
Branches: MAIN
CVS tags: REL7_3_STABLE, REL7_3_9, REL7_3_8, REL7_3_7, REL7_3_6, REL7_3_5, REL7_3_4, REL7_3_21, REL7_3_20, REL7_3_2, REL7_3_19, REL7_3_18, REL7_3_17, REL7_3_16, REL7_3_15, REL7_3_14, REL7_3_13, REL7_3_12, REL7_3_11, REL7_3_10, REL7_2_STABLE, REL7_2_RC2, REL7_2_RC1, REL7_2_BETA5, REL7_2_8, REL7_2_7, REL7_2_6, REL7_2_5, REL7_2_4, REL7_2_3, REL7_2
Changes since revision 1.5: +10 -9 lines
Make sure that all <ctype.h> routines are called with unsigned char
values; it's not portable to call them with signed chars.  I recall doing
this for the last release, but a few more uncasted calls have snuck in.
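
The portability rule behind this change is that the <ctype.h> routines take an int whose value must be representable as unsigned char (or equal EOF); where plain char is signed, a byte above 0x7F becomes negative and the call is undefined. A minimal sketch of the casting pattern follows; the helper names count_alpha and upcase_in_place are made up for illustration.

```c
#include <ctype.h>

/* Count alphabetic bytes; the cast keeps bytes >= 0x80 from turning negative. */
static int count_alpha(const char *s)
{
    int n = 0;

    for (; *s; s++)
        if (isalpha((unsigned char) *s))
            n++;
    return n;
}

/* Upper-case a string in place, again casting before each <ctype.h> call. */
static void upcase_in_place(char *s)
{
    for (; *s; s++)
        *s = (char) toupper((unsigned char) *s);
}
```

The cast costs nothing at run time, so applying it at every call site is cheaper than auditing which inputs can contain high-bit bytes.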

Revision 1.5
Mon Oct 29 19:41:54 2001 UTC (10 years, 3 months ago) by momjian
Branches: MAIN
CVS tags: REL7_2_BETA4, REL7_2_BETA3, REL7_2_BETA2
Changes since revision 1.4: +2 -2 lines
Add trailing semicolon for Joe Conway

Revision 1.4
Thu Oct 25 05:49:19 2001 UTC (10 years, 3 months ago) by momjian
Branches: MAIN
CVS tags: REL7_2_BETA1
Changes since revision 1.3: +263 -232 lines
pgindent run on all C files.  Java run to follow.  initdb/regression
tests pass.

Revision 1.3
Thu Oct 25 01:29:37 2001 UTC (10 years, 3 months ago) by momjian
Branches: MAIN
Changes since revision 1.2: +2 -2 lines
Add do { ... } while (0) to more bad macros.
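
The do { ... } while (0) wrapper turns a multi-statement macro into a single statement that still needs its trailing semicolon, so the macro behaves like a function call inside if/else. The macro and function below (SWAP_INT, sort2) are hypothetical examples, not macros from this file.

```c
/*
 * Multi-statement macro wrapped so that "SWAP_INT(a, b);" is exactly one
 * statement. With bare braces instead, the semicolon after the macro call
 * would end the if statement and leave the following "else" dangling.
 */
#define SWAP_INT(a, b) \
    do { \
        int swap_tmp_ = (a); \
        (a) = (b); \
        (b) = swap_tmp_; \
    } while (0)

/* Put *x and *y in ascending order; return 1 if a swap was needed, else 0. */
static int sort2(int *x, int *y)
{
    if (*x > *y)
        SWAP_INT(*x, *y);
    else
        return 0;
    return 1;
}
```

Had SWAP_INT been defined with plain braces, the if/else in sort2 would fail to compile.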

Revision 1.2
Tue Aug 7 18:16:01 2001 UTC (10 years, 5 months ago) by momjian
Branches: MAIN
Changes since revision 1.1: +68 -0 lines
Sorry - I should have gotten to this sooner. Here's a patch which you should
be able to apply against what you just committed. It rolls soundex into
fuzzystrmatch.

Remove soundex/metaphone and merge into fuzzystrmatch.

Joe Conway

Revision 1.1
Tue Aug 7 16:47:43 2001 UTC (10 years, 5 months ago) by momjian
Branches: MAIN
Per this discussion, here's a patch to implement both levenshtein() and
metaphone() in a contrib. There seem to be a fair number of different
approaches to both of these algorithms. I used the simplest case for
levenshtein, which has a cost of 1 for any character insertion, deletion, or
substitution. For metaphone, I adapted the same code from CPAN that the PHP
folks did.

A couple of questions:
1. Does it make sense to fold the soundex contrib together with this one?

2. I was debating trying to add multibyte support to levenshtein (it would
make no sense at all for metaphone), but a quick search through the contrib
directory found no hits on the word MULTIBYTE. Should I worry about adding
multibyte support to levenshtein()?

Joe Conway
