This method identifies Arabic text in a UTF-8 multilingual document and returns an array of start and end positions for each Arabic text segment. Knowing the language and encoding of a document is an essential step in working with unstructured multilingual text. Without this basic knowledge, applications such as information retrieval and text mining cannot process the data accurately, and important information may be missed entirely or misrouted.
Any application that works with Arabic in multilingual documents can benefit from this functionality. Applications can use it to process Arabic text in a fully automated way by quickly and accurately locating the Arabic text segments within a multilingual document.
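To make the shape of the returned array concrete, here is a minimal sketch that does not use the library at all. The helper `findArabicSegments()` is hypothetical and is not how ArPHP detects Arabic internally. It assumes the same flat layout the example code relies on (start, end, start, end, ...), with offsets counted in bytes and the end offset exclusive, and it approximates "Arabic text" with the PCRE script property `\p{Arabic}`.

```php
<?php
// Hypothetical helper for illustration only; NOT part of ArPHP.
// Returns a flat array [start0, end0, start1, end1, ...] of byte offsets
// for each run of Arabic-script words in $text.
function findArabicSegments(string $text): array
{
    $positions = [];
    // \p{Arabic} matches Arabic-script characters; /u enables UTF-8 mode.
    // Whitespace between Arabic words is kept inside one segment.
    $pattern = '/\p{Arabic}+(?:\s+\p{Arabic}+)*/u';
    if (preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as [$segment, $offset]) {
            $positions[] = $offset;                    // start (byte offset)
            $positions[] = $offset + strlen($segment); // end (exclusive)
        }
    }
    return $positions;
}

$text = 'Peace سلام שלום';
$p = findArabicSegments($text);

// Walk the pairs front to back and print each Arabic segment.
for ($i = 0; $i + 1 < count($p); $i += 2) {
    echo substr($text, $p[$i], $p[$i + 1] - $p[$i]), PHP_EOL;
}
```

Because the offsets are byte positions, they pair with `substr()` and `substr_replace()` rather than with the `mb_*` functions, which count characters.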
Peace سلام שלום Hasîtî शान्ति Barış 和平 Мир
Some Authors:
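<?php
require '../src/arabic.php';

// Sample input, assumed here so the snippet runs on its own;
// in practice $html holds the document to scan.
$html = 'Peace سلام שלום Hasîtî शान्ति Barış 和平 Мир';

$Arabic = new \ArPHP\I18N\Arabic();
// Flat array of positions: start, end, start, end, ...
$p = $Arabic->arIdentify($html);

// Walk the pairs from last to first so that earlier offsets stay valid
// after each replacement lengthens the string.
for ($i = count($p) - 1; $i > 0; $i -= 2) {
    $start  = $p[$i - 1];
    $length = $p[$i] - $start;
    $arStr  = substr($html, $start, $length);
    $html   = substr_replace($html, '<mark>' . $arStr . '</mark>', $start, $length);
}
echo $html;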