CF9 Solr Hangs on Corrupt PDFs

February 25, 2015, 5:57 am

I am indexing 34,000+ documents stored on the server's local hard drive.

Windows Server 2008 SP2

CF9

Oracle

Thanks to advice in another thread I started, I am indexing the folders one at a time, with an update after each. Some of the PDFs are huge (130 MB), but the average is closer to 1 MB. Occasionally I hit a PDF that is corrupt: if I copy it to my desktop and try to open it, Acrobat Pro reports that it is corrupt.

I tried using cfpdf to read the header info inside a cftry block, with the catch writing a log entry. That should work, but it hangs trying to read the document (I assume the same thing is happening with Solr). I get no log entry, and the request hangs until it times out.
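
A minimal sketch of the kind of check described above, assuming the getInfo action of cfpdf is what reads the header info. The variable pdfPath and the log file name pdfIndexErrors are placeholders for illustration, not names from the actual code:

```cfml
<!--- pdfPath is a placeholder for the full path of the file being checked --->
<cftry>
    <!--- Try to read the document info; a readable PDF populates pdfInfo --->
    <cfpdf action="getInfo" source="#pdfPath#" name="pdfInfo">
<cfcatch type="any">
    <!--- On failure, record the file so it can be skipped during indexing --->
    <cflog file="pdfIndexErrors" type="error"
           text="Unreadable PDF: #pdfPath# (#cfcatch.message#)">
</cfcatch>
</cftry>
```

With a corrupt file, control never reaches the cfcatch block, because the cfpdf call hangs rather than throwing an exception.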

Can anyone think of a way to break out of a hung file and continue indexing the remaining files?

Thanks