Problem description
Convert PDF to text using PHP and Drupal
I am using this code to convert a PDF to text. It works, but it does not support Swedish characters, for example:
correct Swedish word = incorrect output
Förnamn = Fšrnamn,
Försäljningsdatum = FšrsŠljningsdatum,
varumärket = varumŠrket,
terförsäljaruppgifter = terfšrsŠljaruppgifter
Code:
<?php
require_once "pdf.pdf2text.inc";
$filename = "customerfile.pdf";
$pdf = new Pdf(urldecode($filename));
print utf8_decode($pdf->getText()); // with utf-8
print $pdf->getText(); // without utf-8
?>
I added UTF-8 encoding/decoding, but it does not work.
Can someone help me get the proper text (words) using this code?
Thanks in advance.
Reference solutions
Method 1:
iconv() might be a possibility (see http://php.net/manual/fr/function.utf8-decode.php):
$myUnicodeString = "Åäö"; echo iconv("UTF-8", "ISO-8859-1", $myUnicodeString);
As some comments say, utf8_decode() is not enough to handle accents.
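The sample pairs in the question (ö shown as š, ä shown as Š) match what appears when Mac OS Roman bytes (0x9A, 0x8A) are displayed as Windows-1252. If that is what getText() returns here, converting from UTF-8 would not help, because the text is not UTF-8 to begin with. The following is only a minimal sketch under that assumption: macRomanSwedishToUtf8() is a hypothetical helper, not part of pdf2text, and its table covers only the Swedish letters. The file must be saved as UTF-8.

```php
<?php
// Sketch: decode text whose bytes are ASSUMED to be Mac OS Roman.
// Only the Swedish letters are mapped; the table is deliberately partial.

/**
 * Hypothetical helper (not part of pdf2text): convert Mac OS Roman bytes
 * for Swedish letters into UTF-8.
 */
function macRomanSwedishToUtf8(string $bytes): string
{
    // Valid UTF-8 (including plain ASCII) is returned unchanged so that
    // text that is already correct is not corrupted.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes;
    }

    // Mac OS Roman byte => UTF-8 character.
    static $map = [
        "\x80" => 'Ä',
        "\x81" => 'Å',
        "\x85" => 'Ö',
        "\x8A" => 'ä',
        "\x8C" => 'å',
        "\x9A" => 'ö',
    ];

    // strtr() with an array replaces each listed byte in a single pass.
    return strtr($bytes, $map);
}

// The bytes behind "FšrsŠljningsdatum" from the question.
echo macRomanSwedishToUtf8("F\x9Ars\x8Aljningsdatum"), PHP_EOL;
// prints: Försäljningsdatum
```

In the question's code this would wrap the extractor call, as in print macRomanSwedishToUtf8($pdf->getText()); and the page would then have to be served as UTF-8. If the extracted text turns out to use a different encoding, the table would need to change accordingly.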
According to a comment on Drupal.org from Saubhagya:
Add the octal and Unicode equivalents of the desired characters to the array $_pdfDocToUni (line 18 of initialize.pdf2text.inc); remember that the octal values need to be 3 digits, as in the other entries of the array.
Then just go to line 335 of pdf2text.module and add your character in the same format as the other ones.
https://www.drupal.org/node/1079780
I'm not sure about the use of the word "just", but it might help.
This appears to be the module he is talking about, and it does have the array he mentioned. Perhaps your version has modules missing; there seem to be a lot of them on offer:
http://cgit.drupalcode.org/pdf2text/tree/pdf2text.module?id=a15059bc1531aa336fef255397ba362c81c9fce5
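For the array edit described in the Drupal.org comment, the two values needed per character are a 3-digit octal byte code and a Unicode code point. The exact entry format of $_pdfDocToUni is not shown here, so the sketch below only computes those two values and leaves the formatting to match the existing entries. octalAndUnicode() is a hypothetical helper, and the example byte values assume Mac OS Roman (ä = 0x8A, ö = 0x9A), which is a guess based on the symptoms in the question.

```php
<?php
// Sketch: compute the 3-digit octal byte code and the Unicode code point
// for a character, as two strings ready to copy into an array entry.

/**
 * Hypothetical helper (not part of pdf2text).
 *
 * @param int $byte      Byte value the PDF uses for the character (0..255).
 * @param int $codePoint Unicode code point of the intended character.
 */
function octalAndUnicode(int $byte, int $codePoint): array
{
    if ($byte < 0 || $byte > 0xFF) {
        throw new InvalidArgumentException('Byte value must be 0..255');
    }

    return [
        'octal'   => sprintf('%03o', $byte),      // zero-padded to 3 digits
        'unicode' => sprintf('%04X', $codePoint), // 4 hex digits, upper case
    ];
}

// ASSUMED byte values (Mac OS Roman); adjust to what the PDF really uses.
$examples = [
    'ä' => [0x8A, 0x00E4],
    'ö' => [0x9A, 0x00F6],
];

foreach ($examples as $char => [$byte, $codePoint]) {
    $entry = octalAndUnicode($byte, $codePoint);
    echo $char, ': octal ', $entry['octal'], ', U+', $entry['unicode'], PHP_EOL;
}
```

The zero padding in '%03o' is what keeps small byte values at three digits, which is the requirement the comment calls out.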