occurrencesOf

20 novembre 2018

Comment trouver le nombre d’occurrences d’une chaîne de caractères dans MySQL?  J’en avais déjà parlé ici en français et aussi en anglais.

Pour se faciliter la vie, rien de mieux qu’une fonction pour faire le travail!

 

SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='STRICT_TRANS_TABLES';

DROP FUNCTION IF EXISTS occurrencesOf;

DELIMITER //
CREATE FUNCTION occurrencesOf ( sourceString VARBINARY(255), searchString VARBINARY(255))
RETURNS MEDIUMINT UNSIGNED
DETERMINISTIC
LANGUAGE SQL
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT 'Returns the number of occurrences of <searchString> inside <sourceString>'

BEGIN
DECLARE occ MEDIUMINT UNSIGNED;

SELECT FLOOR(( CHAR_LENGTH(sourceString) - CHAR_LENGTH(REPLACE(sourceString, searchString, '')) ) / (CHAR_LENGTH(searchString))) INTO occ;

RETURN occ;
END
//

DELIMITER ;
SET SQL_MODE=@OLD_SQL_MODE;

P.S.  J’ai l’intention de publier plusieurs fonctions utilitaires du genre.  D’ici à ce que je m’organise sur GitHub, je publierai le code de ces procédures et/ou fonctions ici en attendant!

Mise à jour (2019-03-07)



Count occurrences of a string using MySQL

20 juillet 2016

This was originally posted in French here.

There’s no string function in MySQL (and many other databases!) to help you find the number of occurrences of a string within another string.  For example, how many times does « abc »  appear in « abcbcbabcbacbcabcababcabacb » ?

I was asked this question on IRC a long time ago. Some poor soul was trying to find a particular subsequence in a genomic string (for instance « TAT ») in the following sequence :

ATTGGTGGGCTCTACTAAGATATCAACGGGACTTCGGAGCGTGCCGCACTATTT

Obviously, you can use your favorite programming language and do this kind of search programmatically but is there a way to do it in SQL?

Luckily, the answer is yes!  The solution is simple and looks like this:

SELECT FLOOR(( LENGTH(source) - LENGTH(REPLACE(source, target, '')) ) / (LENGTH(target))) as occ

To come back to our example, « source » being the genomic sequence and « target » being « TAT », you’d have :

SELECT FLOOR(( LENGTH('ATTGGTGGGCTCTACTAAGATATCAACGGGACTTCGGAGCGTGCCGCACTATTT') - LENGTH(REPLACE('ATTGGTGGGCTCTACTAAGATATCAACGGGACTTCGGAGCGTGCCGCACTATTT', 'TAT', '')) ) / (LENGTH('TAT'))) as occ

Here’s the answer!

Fortunately, in life there are way more many solutions than problems!  And sometimes, long SQL queries!

P.S.  This new post will probably interest you a lot!

Update (2019-03-07)

Save

Save

Save

Save

Save