PHP String

Having a PHPString source type seems silly at first but in the context of some spellcheckers requiring the encoding to function properly and computation of line numbers and offset characters position, you might want to attach a specific encoding to your string.

A PHP string is just a sequence of bytes, with no encoding tagged to it whatsoever. String values can come from various sources: the client (over HTTP), a database, a file, or from string literals in your source code. PHP reads all these as byte sequences, and it never extracts any encoding information.

PHPString class is meant to circumvent this issue by keeping the encoding information attached to the sequence of bytes representing your string.

In a UTF-8 encoded php file, if you run this script, you'll see the output difference:


$misspellingFinder = new MisspellingFinder(
    Aspell::create(), // Creates aspell spellchecker pointing to "aspell" as it's binary path
    new EchoHandler() // Handles all the misspellings found by echoing their information

$misspellings = $misspellingFinder->find(
    $string = new \PhpSpellcheck\Source\PHPString(
        mb_convert_encoding('ça éxagèrre', 'windows-1252'),
    'iso-8859-1' // Aspell will consider the input text encoded in iso-8859-1 which is roughly equivalent to windows-1252
// Output:
// word: �xag�rre | line: 1 | offset: 3 | suggestions: exag�rer,exag�re,exag�r�

/** @var \PhpSpellcheck\Misspelling[]|\Generator $misspellings */
$misspellings = $misspellingFinder->find(
    $string = new \PhpSpellcheck\Source\PHPString('ça éxagèrre'),
// Output:
// word: éxagèrre | line: 1 | offset: 3 | suggestions: exagérer,exagère,exagéré